SNPLims: a data management system for genome wide association studies

BMC Bioinformatics. 2008 Mar 26;9 Suppl 2(Suppl 2):S13. doi: 10.1186/1471-2105-9-S2-S13.

Abstract

Background: Recent progress in genotyping technologies allows the generation of high-density genetic maps using hundreds of thousands of genetic markers for each DNA sample. The availability of such a large amount of genotype data facilitates genome-wide searches for the genetic basis of diseases. A suitable information management system is needed to efficiently manage the data flow produced by whole-genome genotyping and to make the data available for further analyses.

Results: We have developed an information system devoted mainly to storing and managing SNP genotype data produced by the Illumina platform, loading them from the raw genotyping output into a relational database. The database can be accessed to import any existing data and to export user-defined formats compatible with many different genetic analysis programs. Once family-based or case-control association analyses have been computed, their results can be imported into SNPLims. A key feature is that users can rapidly identify and annotate statistically relevant polymorphisms within the large volume of analyzed data. Results can be visualized graphically or written to comma-separated ASCII output files, which can serve as input for further analyses.
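The workflow above (genotypes stored relationally, association results imported, significant SNPs exported as comma-separated text) can be illustrated with a minimal sketch. It assumes a hypothetical four-table schema (`sample`, `snp`, `genotype`, `association_result`) and a hypothetical helper `export_significant_snps`; none of these names come from SNPLims, and its actual schema and export formats may differ.

```python
import csv
import io
import sqlite3

# Hypothetical minimal schema for illustration only; NOT the actual SNPLims schema.
SCHEMA = """
CREATE TABLE sample (sample_id TEXT PRIMARY KEY, phenotype TEXT);
CREATE TABLE snp (snp_id TEXT PRIMARY KEY, chromosome TEXT, position INTEGER);
CREATE TABLE genotype (
    sample_id TEXT REFERENCES sample(sample_id),
    snp_id TEXT REFERENCES snp(snp_id),
    call TEXT,              -- e.g. 'AA', 'AB', 'BB', or NULL for a failed call
    PRIMARY KEY (sample_id, snp_id)
);
CREATE TABLE association_result (
    snp_id TEXT REFERENCES snp(snp_id),
    test TEXT,              -- e.g. 'case-control' or 'family-based'
    p_value REAL
);
"""

def export_significant_snps(conn, threshold):
    """Return CSV text listing SNPs whose association p-value is below threshold."""
    rows = conn.execute(
        """
        SELECT s.snp_id, s.chromosome, s.position, r.test, r.p_value
        FROM association_result AS r
        JOIN snp AS s ON s.snp_id = r.snp_id
        WHERE r.p_value < ?
        ORDER BY r.p_value
        """,
        (threshold,),
    ).fetchall()
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["snp_id", "chromosome", "position", "test", "p_value"])
    writer.writerows(rows)
    return buf.getvalue()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO snp VALUES (?, ?, ?)",
                     [("rs1", "1", 1000), ("rs2", "2", 2000), ("rs3", "3", 3000)])
    conn.executemany("INSERT INTO association_result VALUES (?, ?, ?)",
                     [("rs1", "case-control", 1e-8), ("rs2", "case-control", 0.2),
                      ("rs3", "case-control", 3e-6)])
    # Prints the header followed by rs1 and rs3, ordered by ascending p-value.
    print(export_significant_snps(conn, 1e-5))
```

Keeping association results in their own table, joined to SNP annotations at query time, lets the same genotype data be reused for several analyses without duplication; the p-value threshold here is purely illustrative.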

Conclusions: The proposed infrastructure can manage a relatively large number of genotypes per sample and an arbitrary number of samples and phenotypes. Moreover, it enables users to control data quality, to perform the most common screening analyses, and to identify "candidate" genes for the disease under consideration.
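One common form of genotype quality control is filtering samples and SNPs by call rate. The sketch below assumes that technique (the abstract does not say which checks SNPLims performs) and uses hypothetical helper names and an illustrative 0.95 threshold; a missing call is represented as `None`.

```python
from typing import Dict, List, Optional

# genotypes[sample_id][snp_id] -> call string such as "AB", or None if the call failed
Genotypes = Dict[str, Dict[str, Optional[str]]]

def sample_call_rates(genotypes: Genotypes) -> Dict[str, float]:
    """Fraction of non-missing calls per sample."""
    rates = {}
    for sample, calls in genotypes.items():
        n = len(calls)
        called = sum(1 for c in calls.values() if c is not None)
        rates[sample] = called / n if n else 0.0
    return rates

def snp_call_rates(genotypes: Genotypes) -> Dict[str, float]:
    """Fraction of samples with a non-missing call, per SNP."""
    totals: Dict[str, int] = {}
    called: Dict[str, int] = {}
    for calls in genotypes.values():
        for snp, c in calls.items():
            totals[snp] = totals.get(snp, 0) + 1
            if c is not None:
                called[snp] = called.get(snp, 0) + 1
    return {snp: called.get(snp, 0) / totals[snp] for snp in totals}

def failing(rates: Dict[str, float], min_rate: float) -> List[str]:
    """Sorted IDs whose call rate falls below min_rate (hypothetical threshold)."""
    return sorted(k for k, r in rates.items() if r < min_rate)

if __name__ == "__main__":
    genotypes = {
        "S1": {"rs1": "AA", "rs2": "AB", "rs3": None},
        "S2": {"rs1": "AB", "rs2": "BB", "rs3": None},
        "S3": {"rs1": "AA", "rs2": "BB", "rs3": "AB"},
    }
    print(failing(snp_call_rates(genotypes), 0.95))     # ['rs3']
    print(failing(sample_call_rates(genotypes), 0.95))  # ['S1', 'S2']
```

In practice SNP-level and sample-level filters interact (a poor SNP lowers every sample's rate), so pipelines often apply them iteratively; the order used here is an illustrative choice.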

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Chromosome Mapping / methods*
  • DNA Mutational Analysis / methods*
  • Database Management Systems*
  • Databases, Genetic*
  • Information Storage and Retrieval / methods*
  • Linkage Disequilibrium / genetics*
  • Polymorphism, Single Nucleotide / genetics*
  • User-Computer Interface*