Comparative analysis of 1196 orthologous mouse and human full-length mRNA and protein sequences

Genome Res. 1996 Sep;6(9):846-57. doi: 10.1101/gr.6.9.846.

Abstract

A large set of mRNA and encoded protein sequences from orthologous murine and human genes was compiled to analyze statistical, biological, and evolutionary properties of coding and noncoding transcribed sequences. Protein sequence conservation varied between 36% and 100% identity, with an average value of 85%. The average degree of nucleotide sequence identity for the corresponding coding sequences was also approximately 85%, whereas the 5' and 3' untranslated regions (UTRs) were less conserved, with aligned identities of 67% and 69%, respectively. For some mouse and human genes, the nucleotide sequences are more highly conserved than the encoded protein sequences. A subset of 32 mouse/human protein pairs, in each of which the human sequence represents a positionally cloned disease gene, had properties very similar to those of the larger data set, suggesting that our data are representative of the genome as a whole. With respect to sequence conservation, two interesting outliers are the breast cancer (BRCA1) gene product and the testis-determining factor (SRY), both of which display among the lowest degrees of sequence identity. The occurrence of both introns and repetitive elements (e.g., Alu, B1) in 5' and 3' UTRs was also studied. These results provide one benchmark for the "comparative genomics" of mice and humans, with practical implications for the cross-referencing of transcript maps. They should also prove useful in estimating the additional sampling diversity provided by mouse EST sequencing projects designed to complement the existing human cDNA collection.
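The identity figures above are percentages computed over aligned sequence pairs. The abstract does not say exactly how gaps were handled, so the following is only a minimal sketch under one common convention (an assumption, not necessarily the authors' method): identity equals identical columns divided by columns in which neither sequence has a gap. The function names `percent_identity` and `mean_identity` are hypothetical, and the inputs are assumed to be already aligned (equal length, `-` for gaps).

```python
from statistics import mean


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences (assumed convention).

    Assumption: gapped columns are excluded from the denominator, so
    identity = identical columns / ungapped columns * 100.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # skip columns containing a gap under this convention
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


def mean_identity(pairs: list[tuple[str, str]]) -> float:
    """Unweighted average of per-pair identities (one hypothetical way to summarize a data set)."""
    return mean(percent_identity(a, b) for a, b in pairs)


if __name__ == "__main__":
    # Toy aligned fragments for illustration only; not data from the study.
    print(percent_identity("ATG-CCA", "ATGACGA"))  # 5 of 6 ungapped columns match
    print(mean_identity([("MKV", "MKV"), ("MKVL", "MRVL")]))
```

Note that other conventions (for example, dividing by the full alignment length or by the shorter sequence's length) give different values for the same alignment, so identity percentages are only comparable when computed the same way.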

Publication types

  • Comparative Study

MeSH terms

  • Amino Acid Sequence
  • Animals
  • BRCA1 Protein / genetics
  • Base Sequence
  • Breast Neoplasms / genetics
  • Conserved Sequence
  • DNA-Binding Proteins / genetics
  • Female
  • Humans
  • Male
  • Mice / genetics*
  • Molecular Sequence Data
  • Nuclear Proteins*
  • Proteins / chemistry
  • Proteins / genetics*
  • RNA, Messenger / chemistry
  • RNA, Messenger / genetics*
  • Sex Determination Analysis
  • Sex-Determining Region Y Protein
  • Testis
  • Transcription Factors*

Substances

  • BRCA1 Protein
  • DNA-Binding Proteins
  • Nuclear Proteins
  • Proteins
  • RNA, Messenger
  • SRY protein, human
  • Sex-Determining Region Y Protein
  • Sry protein, mouse
  • Transcription Factors

Associated data

  • GENBANK/J02984
  • GENBANK/J04615
  • GENBANK/L10838
  • GENBANK/L19761
  • GENBANK/M16707
  • GENBANK/M17085
  • GENBANK/M18533
  • GENBANK/M19311
  • GENBANK/M24194
  • GENBANK/M28209
  • GENBANK/M29474
  • GENBANK/M29870
  • GENBANK/M57298
  • GENBANK/M61108
  • GENBANK/M62302
  • GENBANK/M85289
  • GENBANK/U03271
  • GENBANK/U14394
  • GENBANK/X00090
  • GENBANK/X03342
  • GENBANK/X06499
  • GENBANK/X13916
  • GENBANK/X15643
  • GENBANK/X16312
  • GENBANK/X16940
  • GENBANK/X60484
  • GENBANK/X61123
  • GENBANK/X79238
  • GENBANK/X80910