Representing genetic sequence data for pharmacogenomics: an evolutionary approach using ontological and relational models

Bioinformatics. 2002;18 Suppl 1:S207-15. doi: 10.1093/bioinformatics/18.suppl_1.s207.

Abstract

Motivation: The information model chosen to store biological data affects the types of queries that are possible, database performance, and the difficulty of updating that model. Genetic sequence data for pharmacogenetics studies can be complex, and the best information model to use may change over time: as experimental and analytical methods change and biological knowledge advances, the data storage requirements and the types of queries needed may change as well.

Results: We developed a model for genetic sequence and polymorphism data, and used XML Schema to specify the elements and attributes required for this model. We implemented this model as an ontology in a frame-based representation and as a relational model in a database system. We collected genetic data from two pharmacogenetics resequencing studies, and formulated queries useful for analysing these data. We compared the ontology and relational models in terms of query complexity, performance, and difficulty in changing the information model. Our results demonstrate benefits of evolving the schema for storing pharmacogenetics data: ontologies perform well in early design stages as the information model changes rapidly and simplify query formulation, while relational models offer improved query speed once the information model and types of queries needed stabilize.
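
To illustrate the relational side of this comparison, the following minimal sketch uses Python's built-in sqlite3 module. Every table name, column name, gene label, and data value here is a hypothetical placeholder: the abstract does not give the authors' schema, their queries, or the database system they used. The sketch only shows the general shape of a relational polymorphism model and one analysis-style query (variant allele frequency per site).

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical polymorphism schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE subject (
            subject_id TEXT PRIMARY KEY
        );
        CREATE TABLE polymorphism (
            poly_id    INTEGER PRIMARY KEY,
            gene       TEXT    NOT NULL,
            position   INTEGER NOT NULL,
            ref_allele TEXT    NOT NULL
        );
        -- One row per observed allele, so a diploid subject has two rows per site.
        CREATE TABLE genotype (
            subject_id TEXT    NOT NULL REFERENCES subject(subject_id),
            poly_id    INTEGER NOT NULL REFERENCES polymorphism(poly_id),
            allele     TEXT    NOT NULL
        );
        """
    )
    # Made-up illustration data, not results from the study.
    conn.executemany("INSERT INTO subject VALUES (?)", [("S1",), ("S2",)])
    conn.executemany(
        "INSERT INTO polymorphism VALUES (?, ?, ?, ?)",
        [(1, "GENE_A", 101, "A"), (2, "GENE_A", 250, "C")],
    )
    conn.executemany(
        "INSERT INTO genotype VALUES (?, ?, ?)",
        [
            ("S1", 1, "A"), ("S1", 1, "G"),
            ("S2", 1, "G"), ("S2", 1, "G"),
            ("S1", 2, "C"), ("S1", 2, "C"),
            ("S2", 2, "C"), ("S2", 2, "T"),
        ],
    )
    conn.commit()
    return conn


def variant_frequencies(conn):
    """Return (gene, position, fraction of observed alleles differing from the reference)."""
    query = """
        SELECT p.gene,
               p.position,
               AVG(CASE WHEN g.allele <> p.ref_allele THEN 1.0 ELSE 0.0 END)
        FROM polymorphism AS p
        JOIN genotype AS g ON g.poly_id = p.poly_id
        GROUP BY p.poly_id, p.gene, p.position
        ORDER BY p.position
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for gene, position, freq in variant_frequencies(build_demo_db()):
        print(gene, position, freq)
```

In a fixed schema like this, adding a new attribute means altering tables and rewriting queries, which is the kind of change cost the comparison above weighs against query speed.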

Publication types

  • Comparative Study
  • Evaluation Study
  • Research Support, U.S. Gov't, P.H.S.
  • Validation Study

MeSH terms

  • Algorithms
  • Database Management Systems*
  • Databases, Genetic*
  • Gene Expression Profiling / methods*
  • Hypermedia
  • Information Storage and Retrieval / methods*
  • Models, Genetic*
  • Pharmacogenetics / methods*
  • Sequence Alignment / methods
  • Sequence Analysis, DNA / methods*