Evol Bioinform Online. 2009 Jul 3;5:47-66.
doi: 10.4137/ebo.s2320.

Initial Implementation of a Comparative Data Analysis Ontology

Free PMC article

Francisco Prosdocimi et al. Evol Bioinform Online. 2009.


Comparative analysis is used throughout biology. When entities under comparison (e.g. proteins, genomes, species) are related by descent, evolutionary theory provides a framework that, in principle, allows N-ary comparisons of entities, while controlling for non-independence due to relatedness. Powerful software tools exist for specialized applications of this approach, yet it remains under-utilized in the absence of a unifying informatics infrastructure. A key step in developing such an infrastructure is the definition of a formal ontology. The analysis of use cases and existing formalisms suggests that a significant component of evolutionary analysis involves a core problem of inferring a character history, relying on key concepts: "Operational Taxonomic Units" (OTUs), representing the entities to be compared; "character-state data" representing the observations compared among OTUs; "phylogenetic tree", representing the historical path of evolution among the entities; and "transitions", the inferred evolutionary changes in states of characters that account for observations. Using the Web Ontology Language (OWL), we have defined these and other fundamental concepts in a Comparative Data Analysis Ontology (CDAO). CDAO has been evaluated for its ability to represent token data sets and to support simple forms of reasoning. With further development, CDAO will provide a basis for tools (for semantic transformation, data retrieval, validation, integration, etc.) that make it easier for software developers and biomedical researchers to apply evolutionary methods of inference to diverse types of data, so as to integrate this powerful framework for reasoning into their research.

Keywords: character data; comparative method; evolution; ontology; phylogeny.
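As an illustration of the core problem named in the abstract (inferring a character history from OTUs, character-state data and a phylogenetic tree), the following Python sketch counts the minimum number of transitions for one discrete character using Fitch parsimony. Fitch parsimony is a standard method chosen here purely for illustration; the abstract does not say that CDAO prescribes it, and the tree, OTU names and states are hypothetical.

```python
from typing import Dict, Set, Tuple, Union

# A rooted binary tree as nested tuples; leaves are (hypothetical) OTU names.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Return the Fitch state set at this node and the minimum number of transitions below it."""
    if isinstance(tree, str):
        # A tip (OTU): its state set is just the observed state.
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        # The two subtrees can share a state: no extra transition is needed here.
        return common, left_cost + right_cost
    # The subtrees disagree: keep both candidate states and count one transition.
    return left_set | right_set, left_cost + right_cost + 1


tree = (("protA", "protB"), ("protC", "protD"))
observed = {"protA": "G", "protB": "G", "protC": "T", "protD": "G"}
root_states, transitions = fitch(tree, observed)
print(root_states, transitions)  # prints: {'G'} 1
```

Here the single inferred transition (G to T on the branch leading to protC) is one candidate history that accounts for the observations; model-based methods would instead weigh alternative histories.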


Figure 1
Ontology development strategy. The strategy for development of CDAO was modified from that suggested by Stevens et al. We began by studying use cases. After deciding on a representation system, we conceptualized domain knowledge by identifying, defining, and classifying terms for key concepts and relations. These concepts and relations were formalized and then subjected to evaluation, as described in the text.
Figure 2
Illustration of some key concepts in evolutionary analysis. These data on a hypothetical family of proteins may be used to illustrate various concepts that are familiar in the domain of comparative evolutionary analysis. Phylogenetic trees have tips that typically represent currently existing biological entities (here proteins) that are referred to as OTUs, and that are associated with character-state data. The tips of the tree are linked to their ancestors (internal nodes) by branches or edges. Aligned sites in a protein-coding sequence are a type of character with a coordinate system (1 … 10) and with discrete states comprising nucleotides (A, T, C, G) or an alignment gap (−). Individual characters can be combined to form a compound character, e.g. 3 consecutive base-pairs combined to represent a single codon. The cellular location represented by a Gene Ontology (GO) term is also a discrete character that can be analyzed using the comparative evolutionary approach. An example of a continuous character would be the response of the protein to a chemical inhibitor (here shown as an IC50 value in micromolar). ND indicates that the state of a character is unknown for a given OTU.
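The character types in this caption map naturally onto a simple data structure. The sketch below uses hypothetical sequences and IC50 values (not the figure's actual data): aligned sites are stored per OTU, each run of three consecutive sites is grouped into a compound codon character, and None stands in for ND.

```python
from typing import Dict, List, Optional

# Hypothetical aligned sites (characters 1..6) for three OTUs; '-' is an alignment gap.
sites: Dict[str, str] = {
    "prot1": "ATGGCA",
    "prot2": "ATG-CA",
    "prot3": "ATGGCT",
}

# A continuous character (IC50 in micromolar); None marks ND (state unknown).
ic50_um: Dict[str, Optional[float]] = {"prot1": 2.5, "prot2": None, "prot3": 0.8}


def codons(seq: str) -> List[str]:
    """Combine each run of 3 consecutive sites into one compound 'codon' character.

    Any trailing sites that do not fill a complete codon are ignored.
    """
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


for otu, seq in sites.items():
    ic50 = ic50_um[otu]
    print(otu, codons(seq), "ND" if ic50 is None else ic50)
```

Keeping the site-level states alongside the derived codons mirrors the caption's point that individual characters and the compound character built from them are both valid units of comparison.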
Figure 3
Some key concepts and relations formalized in CDAO. Domain-specific terms in CDAO represent either classes, shown by ovals and boxes, or properties (also called “relations”), shown by lines with arrows. The subsumption property “is_a” relates a class to its superclass (solid lines). Other properties defined in CDAO and discussed in the text are shown as dashed lines.
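Because is_a is transitive, even a simple hierarchy supports a basic form of reasoning: an instance of a subclass is also an instance of every class above it. The sketch below performs that inference over a hand-written hierarchy in plain Python, as a stand-in for an OWL reasoner. The class names are hypothetical, loosely based on the character types in Figure 2, and are not necessarily CDAO's actual identifiers.

```python
from typing import Dict, Set

# Hypothetical is_a edges (subclass -> direct superclasses); names are illustrative
# and not necessarily the identifiers used in CDAO itself.
IS_A: Dict[str, Set[str]] = {
    "DiscreteCharacter": {"Character"},
    "ContinuousCharacter": {"Character"},
    "CompoundCharacter": {"Character"},
    "NucleotideSiteCharacter": {"DiscreteCharacter"},
    "CodonCharacter": {"CompoundCharacter"},
}


def superclasses(cls: str) -> Set[str]:
    """All classes that subsume `cls`, following is_a transitively."""
    found: Set[str] = set()
    stack = [cls]
    while stack:
        for parent in IS_A.get(stack.pop(), set()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def is_subclass_of(cls: str, other: str) -> bool:
    """True if `cls` equals `other` or is subsumed by it via is_a."""
    return cls == other or other in superclasses(cls)


print(is_subclass_of("NucleotideSiteCharacter", "Character"))  # True
print(is_subclass_of("CodonCharacter", "DiscreteCharacter"))   # False
```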
Figure 4
Annotation of rooted and unrooted evolutionary trees using CDAO concepts and relations. a) An example of a rooted tree showing how the concepts and relations defined in CDAO can be used to represent the topology of the tree and its associated data. In particular, important evolutionary concepts such as the Most Recent Common Ancestor (MRCA) can be specified. In a rooted tree, the edges (or branches) are directed, and the relations has_parent_node and has_child_node are used. b) The representation of an unrooted tree using CDAO. Here, the direction of the edges is unknown, and the relations has_Left_node and has_Right_node are used. Unrooted trees may contain subtrees whose ancestral node is known; in this case, a rooted subtree can be specified using the has_Root relation.
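The rooted-tree case in panel a) can be sketched with plain subject-predicate-object triples. In this illustration each edge is an individual linked to its endpoints by has_parent_node and has_child_node (relation names taken from the caption); the node and edge identifiers, the tuple encoding and the helper functions are hypothetical. Joining the two relations on the edge recovers parent links, from which an MRCA can be computed.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical rooted tree ((A,B)n2,C)n1 encoded as subject-predicate-object triples.
# Each edge is an individual linked to its endpoints by has_parent_node / has_child_node.
TRIPLES: List[Tuple[str, str, str]] = [
    ("e1", "has_parent_node", "n1"), ("e1", "has_child_node", "n2"),
    ("e2", "has_parent_node", "n1"), ("e2", "has_child_node", "C"),
    ("e3", "has_parent_node", "n2"), ("e3", "has_child_node", "A"),
    ("e4", "has_parent_node", "n2"), ("e4", "has_child_node", "B"),
]


def parent_map(triples: List[Tuple[str, str, str]]) -> Dict[str, str]:
    """Map each child node to its parent by joining the two relations on the edge."""
    parents = {s: o for s, p, o in triples if p == "has_parent_node"}
    children = {s: o for s, p, o in triples if p == "has_child_node"}
    return {children[e]: parents[e] for e in children}


def ancestors(node: str, up: Dict[str, str]) -> List[str]:
    """Path from `node` up to the root, including both ends."""
    path = [node]
    while path[-1] in up:
        path.append(up[path[-1]])
    return path


def mrca(a: str, b: str, up: Dict[str, str]) -> str:
    """Most Recent Common Ancestor: first node on a's path that is also on b's path."""
    b_path: Set[str] = set(ancestors(b, up))
    return next(n for n in ancestors(a, up) if n in b_path)


up = parent_map(TRIPLES)
print(mrca("A", "B", up), mrca("A", "C", up))  # prints: n2 n1
```

For an unrooted tree as in panel b), the same triple layout would use has_Left_node and has_Right_node, but without a direction the parent map above could not be derived until a root is chosen.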
Figure 5
An example of instance data in the NEXUS format used commonly in phylogenetics.
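As a rough illustration of how NEXUS instance data could be read before being mapped onto CDAO concepts, the sketch below extracts the character matrix from a NEXUS DATA block. It assumes a simple, non-interleaved MATRIX with unquoted taxon names and no bracketed comments; the taxa and states are hypothetical, and a real tool would rely on a full NEXUS parser.

```python
import re
from typing import Dict

# A hypothetical, minimal NEXUS DATA block. Real files may use interleaving,
# quoted taxon names, [bracketed comments] and other blocks (e.g. TREES).
NEXUS = """#NEXUS
BEGIN DATA;
  DIMENSIONS NTAX=3 NCHAR=6;
  FORMAT DATATYPE=DNA MISSING=? GAP=-;
  MATRIX
    prot1 ATGGCA
    prot2 ATG-CA
    prot3 ATGGC?
  ;
END;
"""


def parse_matrix(text: str) -> Dict[str, str]:
    """Extract taxon -> character-state row from a simple, non-interleaved MATRIX command."""
    match = re.search(r"\bMATRIX\b(.*?);", text, flags=re.IGNORECASE | re.DOTALL)
    if match is None:
        raise ValueError("no MATRIX command found")
    rows: Dict[str, str] = {}
    for line in match.group(1).strip().splitlines():
        if not line.strip():
            continue  # skip blank lines inside the matrix
        name, *chunks = line.split()
        rows[name] = "".join(chunks)  # a row may be split into whitespace-separated chunks
    return rows


matrix = parse_matrix(NEXUS)
print(matrix["prot3"])  # prints: ATGGC?
```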

Cited by 15 articles

  • BioHackathon 2015: Semantics of data for life sciences and reproducible research.
    Vos RA, Katayama T, Mishima H, Kawano S, Kawashima S, Kim JD, Moriya Y, Tokimatsu T, Yamaguchi A, Yamamoto Y, Wu H, Amstutz P, Antezana E, Aoki NP, Arakawa K, Bolleman JT, Bolton E, Bonnal RJP, Bono H, Burger K, Chiba H, Cohen KB, Deutsch EW, Fernández-Breis JT, Fu G, Fujisawa T, Fukushima A, García A, Goto N, Groza T, Hercus C, Hoehndorf R, Itaya K, Juty N, Kawashima T, Kim JH, Kinjo AR, Kotera M, Kozaki K, Kumagai S, Kushida T, Lütteke T, Matsubara M, Miyamoto J, Mohsen A, Mori H, Naito Y, Nakazato T, Nguyen-Xuan J, Nishida K, Nishida N, Nishide H, Ogishima S, Ohta T, Okuda S, Paten B, Perret JL, Prathipati P, Prins P, Queralt-Rosinach N, Shinmachi D, Suzuki S, Tabata T, Takatsuki T, Taylor K, Thompson M, Uchiyama I, Vieira B, Wei CH, Wilkinson M, Yamada I, Yamanaka R, Yoshitake K, Yoshizawa AC, Dumontier M, Kosaki K, Takagi T. Vos RA, et al. F1000Res. 2020 Feb 24;9:136. doi: 10.12688/f1000research.18236.1. eCollection 2020. F1000Res. 2020. PMID: 32308977 Free PMC article.
  • The Orthology Ontology: development and applications.
    Fernández-Breis JT, Chiba H, Legaz-García Mdel C, Uchiyama I. Fernández-Breis JT, et al. J Biomed Semantics. 2016 Jun 4;7(1):34. doi: 10.1186/s13326-016-0077-x. J Biomed Semantics. 2016. PMID: 27259657 Free PMC article.
  • Evolutionary relationships among barley and Arabidopsis core circadian clock and clock-associated genes.
    Calixto CP, Waugh R, Brown JW. Calixto CP, et al. J Mol Evol. 2015 Feb;80(2):108-19. doi: 10.1007/s00239-015-9665-0. Epub 2015 Jan 22. J Mol Evol. 2015. PMID: 25608480 Free PMC article.
  • Big data and other challenges in the quest for orthologs.
    Sonnhammer EL, Gabaldón T, Sousa da Silva AW, Martin M, Robinson-Rechavi M, Boeckmann B, Thomas PD, Dessimoz C; Quest for Orthologs consortium. Sonnhammer EL, et al. Bioinformatics. 2014 Nov 1;30(21):2993-8. doi: 10.1093/bioinformatics/btu492. Epub 2014 Jul 26. Bioinformatics. 2014. PMID: 25064571 Free PMC article.
  • Advancing data reuse in phyloinformatics using an ontology-driven Semantic Web approach.
    Panahiazar M, Sheth AP, Ranabahu A, Vos RA, Leebens-Mack J. Panahiazar M, et al. BMC Med Genomics. 2013;6 Suppl 3(Suppl 3):S5. doi: 10.1186/1755-8794-6-S3-S5. Epub 2013 Nov 11. BMC Med Genomics. 2013. PMID: 24565381 Free PMC article.


