Gigascience. 2019 May 1;8(5):giz046.
doi: 10.1093/gigascience/giz046.

PseudoFuN: Deriving functional potentials of pseudogenes from integrative relationships with genes and microRNAs across 32 cancers

Travis S Johnson et al. Gigascience.

Abstract

Background: Long thought to be "relics" of evolution, pseudogenes have only recently attracted medical interest for their regulatory roles in cancer. Often, these regulatory roles are a direct by-product of their close sequence homology to protein-coding genes. Novel pseudogene-gene (PGG) functional associations can be identified through the integration of biomedical data, such as sequence homology, functional pathways, gene expression, pseudogene expression, and microRNA expression. However, not all of this information has been integrated, and almost all previous pseudogene studies relied on 1:1 pseudogene-parent gene relationships without leveraging other homologous genes/pseudogenes.

Results: We produce PGG families that expand beyond the current 1:1 paradigm. First, we construct expansive PGG databases by (i) CUDAlign graphics processing unit (GPU) accelerated local alignment of all pseudogenes to gene families (totaling 1.6 billion individual local alignments and >40,000 GPU hours) and (ii) BLAST-based assignment of pseudogenes to gene families. Second, we create an open-source web application (PseudoFuN [Pseudogene Functional Networks]) to search for integrative functional relationships of sequence homology, microRNA expression, gene expression, pseudogene expression, and gene ontology. We produce four "flavors" of CUDAlign-based databases (>462,000,000 PGG pairwise alignments and 133,770 PGG families) that can be queried and downloaded using PseudoFuN. These databases are consistent with previous 1:1 PGG annotation and are also much more powerful, including millions of de novo PGG associations. For example, we find multiple known (e.g., miR-20a-PTEN-PTENP1) and novel (e.g., miR-375-SOX15-PPP4R1L) microRNA-gene-pseudogene associations in prostate cancer. PseudoFuN provides a "one stop shop" for identifying and visualizing thousands of potential regulatory relationships related to pseudogenes in The Cancer Genome Atlas cancers.
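
To make the idea of families beyond 1:1 concrete, the following is a minimal sketch of how a matrix of pseudogene-to-gene-family alignment scores could be turned into PGG families with a percentile cutoff. It is an illustration under stated assumptions, not the authors' pipeline: the names `percentile`, `assign_families`, `pg_A`, `fam_1`, and so on are hypothetical, the scores are toy values, and the choice of a single global percentile over all scores is an assumption (the abstract only states that percentile cutoffs are used).

```python
from typing import Dict, List


def percentile(values: List[float], q: float) -> float:
    """Return the q-th percentile (q in [0, 100]) using linear interpolation."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def assign_families(
    scores: Dict[str, Dict[str, float]], q: float
) -> Dict[str, List[str]]:
    """Group pseudogenes under every gene family they align to at or above the cutoff.

    Assumption: the cutoff is the q-th percentile of all scores in the matrix.
    A pseudogene may land in several families, which is the point of moving
    beyond a 1:1 pseudogene-parent gene mapping.
    """
    all_scores = [s for row in scores.values() for s in row.values()]
    cutoff = percentile(all_scores, q)
    families: Dict[str, List[str]] = {}
    for pseudogene, row in scores.items():
        for family, score in row.items():
            if score >= cutoff:
                families.setdefault(family, []).append(pseudogene)
    return families


# Toy alignment scores (hypothetical, not taken from the paper).
scores = {
    "pg_A": {"fam_1": 950.0, "fam_2": 500.0},
    "pg_B": {"fam_1": 15.0, "fam_2": 400.0},
    "pg_C": {"fam_1": 20.0, "fam_2": 18.0},
}
families = assign_families(scores, q=60.0)
print(families)
```

In this toy input, `pg_A` joins both families while `pg_C` joins none, so raising or lowering `q` trades family size against alignment stringency.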

Conclusions: Thousands of new PGG associations can be explored in the context of microRNA-gene-pseudogene co-expression and differential expression with a simple-to-use online tool by bioinformaticians and oncologists alike.

Keywords: competing endogenous RNA; database; functional prediction; gene regulation; graphics processing unit; high-performance computing; network analysis; pseudogenes.

Figures

Figure 1:
Workflow for both CUDAlign and BLAST databases. Left side: PGG families are produced using the BLAST matches. Right side: PGG families are produced using the PGG family alignment matrix with percentile cutoffs using CUDAlign.
Figure 2:
The number of pseudogenes that align to gene families. The x-axis is the number of gene families that have an alignment score above a specified cutoff (the different colored lines). The y-axis is the number of pseudogenes with an alignment score higher than the cutoff to the number of gene families on the x-axis. The inset gray box is a closer view of the low-range gene family numbers (1–10) to show higher-resolution patterns.
Figure 3:
Comparison of database members. The top six plots are comparisons between the CUDAlign databases using different cutoffs, the BLAST database, and the Pseudogene.org parent genes. The bottom row shows intra-database comparisons, left: Pseudogene.org, middle: CUDAlign database of different alignment score cutoffs, right: relative size of all databases.
Figure 4:
PseudoFuN online output for SOX15 PGG family. A, Interactive graph visualization of the SOX15 PGG network. B, TCGA prostate co-expression matrix for SOX15 PGG family genes and pseudogenes across normal samples. C, TCGA prostate co-expression matrix for SOX15 PGG family genes and pseudogenes across tumor samples. D, Negatively correlated miRNAs for all members of the SOX15 PGG family. E, Differential gene and pseudogene expression for tumor and normal samples for each member of the SOX15 PGG family in the prostate cancer TCGA dataset. FPKM: fragments per kilobase million.
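
As a rough illustration of the kind of analysis summarized in panel D of Figure 4, the sketch below ranks miRNAs by how negatively their expression correlates with a family member across samples. This is a hedged example only: the use of Pearson correlation, the `-0.5` threshold, and the names `pearson`, `negatively_correlated`, `mir_x`, `mir_y`, and `mir_z` are assumptions for illustration, and the expression values are toy numbers rather than TCGA data.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant vector carries no correlation signal
    return cov / sqrt(var_x * var_y)


def negatively_correlated(
    mirnas: Dict[str, List[float]],
    target: List[float],
    threshold: float = -0.5,  # assumed cutoff, not from the paper
) -> List[Tuple[str, float]]:
    """Return (miRNA, r) pairs with r <= threshold, most negative first."""
    hits = [(name, pearson(values, target)) for name, values in mirnas.items()]
    return sorted((h for h in hits if h[1] <= threshold), key=lambda h: h[1])


# Toy expression values across four samples (hypothetical).
target_expression = [1.0, 2.0, 3.0, 4.0]
mirna_expression = {
    "mir_x": [8.0, 6.0, 4.0, 2.0],  # falls as the target rises
    "mir_y": [1.0, 2.0, 3.0, 4.0],  # rises with the target
    "mir_z": [5.0, 5.0, 5.0, 5.0],  # constant
}
hits = negatively_correlated(mirna_expression, target_expression)
print(hits)
```

Only `mir_x` passes the threshold here; in practice the same ranking would be repeated for each gene and pseudogene in a family.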
