Identification of protein coding regions in the human genome by quadratic discriminant analysis

Proc Natl Acad Sci U S A. 1997 Jan 21;94(2):565-8. doi: 10.1073/pnas.94.2.565.

Abstract

A new method for predicting internal coding exons in genomic DNA sequences has been developed. The method is based on a prediction algorithm that uses the quadratic discriminant function for multivariate statistical pattern recognition. Using only 9 discriminant variables, it achieves substantial improvements over two existing methods: HEXON [Solovyev, V. V., Salamov, A. A. & Lawrence, C. B. (1994) Nucleic Acids Res. 22, 5156-5163] (based on linear discriminant analysis) and GRAIL2 [Uberbacher, E. C. & Mural, R. J. (1991) Proc. Natl. Acad. Sci. USA 88, 11261-11265] (based on neural networks). A computer program called MZEF is freely available to the genome community; it allows users to adjust the prior probability and to output alternative overlapping exons.
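As an illustration of the general technique named in the abstract, the sketch below shows a two-class quadratic discriminant with a user-adjustable prior. It is not the MZEF implementation. The two features, the synthetic training data, and the names `fit_class`, `log_score`, and `exon_posterior` are assumptions made for this example, and the nine discriminant variables used by the method are not reproduced here. Each class gets its own mean vector and covariance matrix, which is what makes the decision boundary quadratic rather than linear.

```python
import math
import random


def fit_class(rows):
    """Estimate one class's mean vector and the Cholesky factor of its covariance."""
    n, d = len(rows), len(rows[0])
    mu = [sum(col) / n for col in zip(*rows)]
    cov = [[0.0] * d for _ in range(d)]
    for r in rows:
        diff = [x - m for x, m in zip(r, mu)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j] / (n - 1)
    # Cholesky factorisation cov = L L^T (needs a positive-definite covariance).
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = cov[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return mu, L


def log_score(x, params, prior):
    """Quadratic discriminant score: -0.5*log|S| - 0.5*(x-mu)^T S^-1 (x-mu) + log(prior)."""
    mu, L = params
    d = len(mu)
    # Forward substitution solves L z = (x - mu); then z.z is the Mahalanobis term.
    z = []
    for i in range(d):
        s = x[i] - mu[i] - sum(L[i][k] * z[k] for k in range(i))
        z.append(s / L[i][i])
    half_log_det = sum(math.log(L[i][i]) for i in range(d))
    return -half_log_det - 0.5 * sum(v * v for v in z) + math.log(prior)


def exon_posterior(x, exon_params, other_params, prior_exon=0.5):
    """Posterior probability of the 'exon' class, computed in a numerically stable way."""
    diff = (log_score(x, exon_params, prior_exon)
            - log_score(x, other_params, 1.0 - prior_exon))
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


# Synthetic two-feature data (hypothetical; stands in for real discriminant variables).
rng = random.Random(0)
exons = [[rng.gauss(1.0, 0.5), rng.gauss(1.0, 0.5)] for _ in range(200)]
others = [[rng.gauss(-1.0, 1.5), rng.gauss(-1.0, 1.5)] for _ in range(200)]
exon_params, other_params = fit_class(exons), fit_class(others)

# The same candidate scored under different prior probabilities of being an exon.
for prior in (0.1, 0.5, 0.9):
    print(prior, exon_posterior([0.0, 0.0], exon_params, other_params, prior))
```

Raising `prior_exon` shifts the decision boundary so that more candidates are called exons, and lowering it does the opposite. An adjustable prior of this kind lets a user trade sensitivity against specificity.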

Publication types

  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Base Sequence
  • Exons*
  • Genes*
  • Human Genome Project
  • Humans
  • Molecular Sequence Data
  • Multivariate Analysis
  • Proteins / genetics
  • Sequence Analysis, DNA / methods*
  • Software

Substances

  • Proteins

Associated data

  • GENBANK/J02843
  • GENBANK/J02846
  • GENBANK/J02933
  • GENBANK/J03059
  • GENBANK/J03930
  • GENBANK/J04038
  • GENBANK/J04617
  • GENBANK/J04988
  • GENBANK/J05096
  • GENBANK/J05451
  • GENBANK/K00650
  • GENBANK/K03021
  • GENBANK/L05072
  • GENBANK/L10615
  • GENBANK/L10641
  • GENBANK/L11910
  • GENBANK/L13470
  • GENBANK/L14565
  • GENBANK/L14927
  • GENBANK/M10612
  • GENBANK/M11228
  • GENBANK/M12523
  • GENBANK/M13792
  • GENBANK/M15205
  • GENBANK/M15840
  • GENBANK/M16110
  • GENBANK/M17262
  • GENBANK/M19645
  • GENBANK/M20543
  • GENBANK/M24461