Bioinformatics, 26 (23), 2952-60

ALCHEMY: A Reliable Method for Automated SNP Genotype Calling for Small Batch Sizes and Highly Homozygous Populations



Mark H Wright et al. Bioinformatics.

Abstract

Motivation: The development of new high-throughput genotyping products requires a significant investment in testing and training samples to evaluate and optimize the product before it can be used reliably on new samples. One reason for this is that current methods for automated genotype calling are based on clustering approaches, which require either a large number of samples to be analyzed simultaneously or an extensive training dataset to seed the clusters. In systems where inbred samples are of primary interest, current clustering approaches perform poorly because they cannot clearly identify a heterozygote cluster.

Results: As part of the development of two custom single nucleotide polymorphism (SNP) genotyping products for Oryza sativa (domestic rice), we have developed a new genotype calling algorithm, 'ALCHEMY', based on statistical modeling of the raw intensity data rather than model-free clustering. A novel feature of the model is its ability to estimate and incorporate inbreeding information on a per-sample basis, allowing accurate genotyping of both inbred and heterozygous samples even when they are analyzed simultaneously. Because clustering is not used explicitly, ALCHEMY performs well on small sample sizes, with accuracy exceeding 99% with as few as 18 samples.
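To illustrate why a per-sample inbreeding estimate matters when calling genotypes, here is a minimal sketch; it is not ALCHEMY's actual model. It assumes the standard inbreeding-adjusted Hardy-Weinberg genotype frequencies, P(AA) = p^2(1 - F) + pF, P(AB) = 2pq(1 - F), P(BB) = q^2(1 - F) + qF, combined with a hypothetical Gaussian likelihood for a single allele-contrast value per genotype. The contrast centres and spread (CONTRAST_MEANS, CONTRAST_SD) are illustrative placeholders, not fitted parameters.

```python
import math

# Hypothetical per-genotype centres of an allele contrast such as
# (A - B) / (A + B); placeholder values, NOT parameters from ALCHEMY.
CONTRAST_MEANS = {"AA": 0.8, "AB": 0.0, "BB": -0.8}
CONTRAST_SD = 0.25


def genotype_prior(p, f):
    """Inbreeding-adjusted Hardy-Weinberg genotype frequencies.

    p: frequency of allele A at this SNP.
    f: the sample's inbreeding coefficient (0 = outbred, 1 = fully inbred).
    """
    q = 1.0 - p
    return {
        "AA": p * p * (1.0 - f) + p * f,
        "AB": 2.0 * p * q * (1.0 - f),
        "BB": q * q * (1.0 - f) + q * f,
    }


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def genotype_posterior(contrast, p, f):
    """Posterior over AA/AB/BB: prior(p, f) times the assumed Gaussian likelihood, normalized."""
    prior = genotype_prior(p, f)
    joint = {
        g: prior[g] * normal_pdf(contrast, CONTRAST_MEANS[g], CONTRAST_SD)
        for g in prior
    }
    total = sum(joint.values())
    return {g: v / total for g, v in joint.items()}


if __name__ == "__main__":
    # A contrast exactly halfway between the assumed AB and AA centres,
    # so the likelihood alone cannot separate the two genotypes.
    x = 0.4
    for label, f in [("outbred", 0.0), ("inbred", 0.95)]:
        post = genotype_posterior(x, p=0.5, f=f)
        best = max(post, key=post.get)
        print(label, best, {g: round(v, 3) for g, v in post.items()})
```

With the same ambiguous intensity, the outbred prior favours the heterozygote, while a high inbreeding coefficient pushes the call toward a homozygote. This is the kind of per-sample adjustment that a pure clustering approach cannot make.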

Availability: ALCHEMY is available free of charge for both commercial and academic use and is distributed under the GNU General Public License at http://alchemy.sourceforge.net/

Contact: mhw6@cornell.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1.
Density plot of log intensities across all A allele probes for one sample (black solid line) and fit of Gaussian mixture distribution (gray dashed line).
Fig. 2.
Effect of increasing the number of samples analyzed simultaneously, for ALCHEMY and BRLMM-P (Affymetrix 44K).
Fig. 3.
Trade-off between accuracy and completeness of the dataset generated by varying the threshold at which genotypes with lower posterior probabilities are declared ‘no call’ and dropped from the final dataset. Note the limited range of the y-axis.
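The accuracy/completeness trade-off described for Fig. 3 can be sketched by sweeping a posterior-probability cutoff over calls whose true genotypes are known. This is a minimal illustration, assuming each call is a hypothetical (called genotype, posterior probability, true genotype) triple; the function names and input format are illustrative assumptions, not part of ALCHEMY.

```python
def call_rate_and_accuracy(calls, threshold):
    """Return (completeness, accuracy) after declaring low-confidence calls 'no call'.

    calls: iterable of (called_genotype, posterior, true_genotype) tuples
           (a hypothetical format chosen for this sketch).
    completeness: fraction of genotypes kept (posterior >= threshold).
    accuracy: fraction of kept genotypes matching the truth, or None if none are kept.
    """
    calls = list(calls)
    kept = [(g, t) for g, post, t in calls if post >= threshold]
    completeness = len(kept) / len(calls) if calls else 0.0
    if not kept:
        return completeness, None
    correct = sum(1 for g, t in kept if g == t)
    return completeness, correct / len(kept)


def sweep(calls, thresholds):
    """Evaluate several cutoffs; returns (threshold, completeness, accuracy) rows."""
    calls = list(calls)
    return [(th, *call_rate_and_accuracy(calls, th)) for th in thresholds]


if __name__ == "__main__":
    # Made-up calls used only to exercise the functions; not data from the paper.
    toy = [
        ("AA", 0.999, "AA"),
        ("BB", 0.990, "BB"),
        ("AB", 0.700, "AA"),
        ("AA", 0.950, "AA"),
        ("BB", 0.600, "AB"),
    ]
    for th, comp, acc in sweep(toy, [0.5, 0.8, 0.99]):
        acc_text = "n/a" if acc is None else f"{acc:.3f}"
        print(f"threshold={th:.2f} completeness={comp:.2f} accuracy={acc_text}")
```

Raising the cutoff drops the least confident calls first, so completeness can only fall while accuracy tends to rise; where to stop is a choice that depends on how costly missing data is relative to miscalls.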



