Recognition of the polycistronic nature of human genes is critical to understanding the genotype-phenotype relationship

Marie A Brunet; Sébastien A Levesque; Darel J Hunting; Alan A Cohen; Xavier Roucou

doi:10.1101/gr.230938.117

Genome Res. 2018 May;28(5):609-624. doi: 10.1101/gr.230938.117. Epub 2018 Apr 6.

Authors

Marie A Brunet¹ ² ³, Sébastien A Levesque⁴, Darel J Hunting⁵, Alan A Cohen², Xavier Roucou¹ ³

Affiliations

¹ Biochemistry Department, Université de Sherbrooke, Quebec J1E 4K8, Canada.
² Groupe de recherche PRIMUS, Department of Family and Emergency Medicine, Quebec J1H 5N4, Canada.
³ PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Université Laval, Quebec G1V 0A6, Canada.
⁴ Pediatric Department, Centre Hospitalier de l'Université de Sherbrooke, Quebec J1H 5N4, Canada.
⁵ Department of Nuclear Medicine & Radiobiology, Université de Sherbrooke, Quebec J1H 5N4, Canada.

Abstract

Technological advances promise unprecedented opportunities for whole exome sequencing and proteomic analyses of populations. Currently, data from genome and exome sequencing or proteomic studies are searched against reference genome annotations. This provides the foundation for research and clinical screening for genetic causes of pathologies. However, current genome annotations substantially underestimate the proteomic information encoded within a gene. Numerous studies have now demonstrated the expression and function of alternative (mainly small, sometimes overlapping) ORFs within mature gene transcripts. This has important consequences for the correlation of phenotypes and genotypes. Most alternative ORFs are not yet annotated because of a lack of evidence, and this absence from databases precludes their detection by standard proteomic methods, such as mass spectrometry. Here, we demonstrate how current approaches tend to overlook alternative ORFs, hindering the discovery of new genetic drivers and fundamental research. We discuss available tools and techniques to improve identification of proteins from alternative ORFs and finally suggest a novel annotation system to permit a more complete representation of the transcriptomic and proteomic information contained within a gene. Given the crucial challenge of distinguishing functional ORFs from random ones, the suggested pipeline emphasizes both experimental data and conservation signatures. The addition of alternative ORFs in databases will render identification less serendipitous and advance the pace of research and genomic knowledge. This review highlights the urgent medical and research need to incorporate alternative ORFs in current genome annotations and thus permit their inclusion in hypotheses and models, which relate phenotypes and genotypes.
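To make the idea of several ORFs within one mature transcript concrete, the following is a minimal sketch, not the annotation pipeline proposed in the review. It scans the three forward reading frames of a transcript for ATG-initiated ORFs. The function name `find_orfs`, the toy sequence, and the default `min_codons` threshold of 30 are illustrative assumptions; real annotation efforts would add experimental evidence and conservation signatures, as the abstract stresses.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(transcript, min_codons=30):
    """List ATG-initiated ORFs in the three forward frames of a transcript.

    Returns sorted (start, end, frame) tuples. Coordinates are 0-based,
    `end` is exclusive and includes the stop codon. `min_codons` counts
    codons from the ATG up to, but not including, the stop codon.
    Hypothetical helper for illustration only.
    """
    seq = transcript.upper().replace("U", "T")  # accept RNA or DNA letters
    orfs = []
    for frame in range(3):
        start = None  # position of the first ATG since the last stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)


# Toy transcript (an assumption, not real sequence data) with two
# overlapping ORFs in different frames.
toy = "ATGCATGGCTAAATGA"
for start, end, frame in find_orfs(toy, min_codons=1):
    print(f"frame {frame}: {start}-{end} {toy[start:end]}")
```

On the toy sequence, the scan reports one ORF in frame 0 and a second, overlapping ORF in frame 1. Under a one-CDS-per-transcript annotation, only one of the two would be recorded and searchable.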

Publication types

Research Support, Non-U.S. Gov't
Review

MeSH terms

Alternative Splicing / genetics*
Exons / genetics*
Genetic Association Studies*
Genomics / methods
Humans
Introns / genetics*
Models, Genetic
Open Reading Frames / genetics*
Promoter Regions, Genetic / genetics*
Proteomics / methods
