Integrating transcriptomic and proteomic data for accurate assembly and annotation of genomes
- PMID: 28003436
- PMCID: PMC5204337
- DOI: 10.1101/gr.201368.115
Abstract
Complementing genome sequence with deep transcriptome and proteome data could enable more accurate assembly and annotation of newly sequenced genomes. Here, we provide a proof-of-concept of an integrated approach for analysis of the genome and proteome of Anopheles stephensi, which is one of the most important vectors of the malaria parasite. To achieve broad coverage of genes, we carried out transcriptome sequencing and deep proteome profiling of multiple anatomically distinct sites. Based on transcriptomic data alone, we identified and corrected 535 events of incomplete genome assembly involving 1196 scaffolds and 868 protein-coding gene models. This proteogenomic approach enabled us to add 365 genes that were missed during genome annotation and identify 917 gene correction events through discovery of 151 novel exons, 297 protein extensions, 231 exon extensions, 192 novel protein start sites, 19 novel translational frames, 28 events of joining of exons, and 76 events of joining of adjacent genes as a single gene. Incorporation of proteomic evidence allowed us to change the designation of more than 87 predicted "noncoding RNAs" to conventional mRNAs coded by protein-coding genes. Importantly, extension of the newly corrected genome assemblies and gene models to 15 other newly assembled Anopheline genomes led to the discovery of a large number of apparent discrepancies in assembly and annotation of these genomes. Our data provide a framework for how future genome sequencing efforts should incorporate transcriptomic and proteomic analysis in combination with simultaneous manual curation to achieve near complete assembly and accurate annotation of genomes.
© 2017 Prasad et al.; Published by Cold Spring Harbor Laboratory Press.