An ontology approach to comparative phenomics in plants

Anika Oellrich; Ramona L Walls; Ethalinda Ks Cannon; Steven B Cannon; Laurel Cooper; Jack Gardiner; Georgios V Gkoutos; Lisa Harper; Mingze He; Robert Hoehndorf; Pankaj Jaiswal; Scott R Kalberer; John P Lloyd; David Meinke; Naama Menda; Laura Moore; Rex T Nelson; Anuradha Pujar; Carolyn J Lawrence; Eva Huala

doi:10.1186/s13007-015-0053-y

Plant Methods. 2015 Feb 25:11:10. doi: 10.1186/s13007-015-0053-y. eCollection 2015.

Authors

Anika Oellrich^#¹, Ramona L Walls^#², Ethalinda Ks Cannon³, Steven B Cannon⁴,⁵, Laurel Cooper⁶, Jack Gardiner⁷, Georgios V Gkoutos⁸, Lisa Harper⁴, Mingze He⁷, Robert Hoehndorf⁹, Pankaj Jaiswal⁶, Scott R Kalberer⁴, John P Lloyd¹⁰, David Meinke¹¹, Naama Menda¹², Laura Moore⁶, Rex T Nelson⁴, Anuradha Pujar¹², Carolyn J Lawrence⁵,⁷, Eva Huala¹³

Affiliations

¹ Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA UK.
² iPlant Collaborative, University of Arizona, 1657 E. Helen St., Tucson, Arizona 85721 USA.
³ Department of Electrical and Computer Engineering, Iowa State University, 1018 Crop Informatics Lab, Ames, Iowa 50011 USA.
⁴ USDA-ARS Corn Insects and Crop Genetics Research Unit, Iowa State University, Crop Genome Informatics Lab, Iowa State University, Ames, IA 50011 USA.
⁵ Department of Agronomy, Agronomy Hall, Iowa State University, Ames, IA 50010 USA.
⁶ Department of Botany and Plant Pathology, 2082 Cordley Hall, Oregon State University, Corvallis, OR 97331 USA.
⁷ Department of Genetics, Development and Cell Biology, Roy J Carver Co-Laboratory, Iowa State University, Ames, IA 50010 USA.
⁸ Department of Computer Science, Aberystwyth University, Llandinam Building, Aberystwyth, SY23 3DB UK.
⁹ Computer, Electrical and Mathematical Sciences & Engineering Division and Computational Bioscience Research Center, King Abdullah University of Science and Technology, 4700 King Abdullah University of Science and Technology, P.O. Box 2882, Thuwal, 23955-6900 Kingdom of Saudi Arabia.
¹⁰ Department of Plant Biology, Michigan State University, 220 Trowbridge Rd, East Lansing, MI 48824 USA.
¹¹ Department of Botany, Oklahoma State University, 301 Physical Sciences, Stillwater, OK 74078 USA.
¹² Boyce Thompson Institute for Plant Research, 533 Tower Road, Ithaca, NY 14853 USA.
¹³ Phoenix Bioinformatics, 643 Bair Island Rd Suite 403, Redwood City, CA 94063 USA.

^# Contributed equally.

Abstract

Background: Plant phenotype datasets include many different types of data, formats, and terms from specialized vocabularies. Because these datasets were designed for different audiences, they frequently contain language and details tailored to investigators with different research objectives and backgrounds. Although phenotype comparisons across datasets have long been possible on a small scale, comprehensive queries and analyses that span a broad set of reference species, research disciplines, and knowledge domains continue to be severely limited by the absence of a common semantic framework.

Results: We developed a workflow to curate and standardize existing phenotype datasets for six plant species, encompassing both model species and crop plants with established genetic resources. Our effort focused on mutant phenotypes associated with genes of known sequence in Arabidopsis thaliana (L.) Heynh. (Arabidopsis), Zea mays L. subsp. mays (maize), Medicago truncatula Gaertn. (barrel medic or Medicago), Oryza sativa L. (rice), Glycine max (L.) Merr. (soybean), and Solanum lycopersicum L. (tomato). We applied the same ontologies, annotation standards, formats, and best practices across all six species, thereby ensuring that the shared dataset could be used for cross-species querying and semantic similarity analyses. Curated phenotypes were first converted into a common format using taxonomically broad ontologies such as the Plant Ontology, Gene Ontology, and Phenotype and Trait Ontology. We then compared ontology-based phenotypic descriptions with an existing classification system for plant phenotypes and evaluated our semantic similarity dataset for its ability to enhance predictions of gene families, protein functions, and shared metabolic pathways that underlie informative plant phenotypes.
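As an illustration of what ontology-based semantic similarity between phenotypes can look like, the sketch below scores two phenotype annotations by the Jaccard overlap of their ontology terms plus all ancestors. It is a minimal toy example, not the method used in this study: the term names, the tiny hierarchy, the gene labels, and the function names are all hypothetical, and real analyses would use actual Plant Ontology, Gene Ontology, and Phenotype and Trait Ontology identifiers. As a simplification, it also pools entity and quality terms into one flat set rather than keeping them paired.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical toy hierarchy (child -> direct parents); not real ontology content.
TOY_PARENTS: Dict[str, Set[str]] = {
    "leaf": {"plant organ"},
    "root": {"plant organ"},
    "plant organ": {"plant structure"},
    "plant structure": set(),
    "decreased size": {"size"},
    "increased size": {"size"},
    "size": {"quality"},
    "quality": set(),
}


def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Return the term itself plus every ancestor reachable via parent links."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def closure(annotations: Iterable[Tuple[str, str]],
            parents: Dict[str, Set[str]]) -> Set[str]:
    """Expand (entity, quality) pairs into one set of terms and ancestors."""
    terms: Set[str] = set()
    for entity, quality in annotations:
        terms |= ancestors(entity, parents)
        terms |= ancestors(quality, parents)
    return terms


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index of two term sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity(x, y, parents=TOY_PARENTS) -> float:
    """Semantic similarity of two annotation lists under the toy hierarchy."""
    return jaccard(closure(x, parents), closure(y, parents))


# Hypothetical mutant phenotypes, each described as (entity, quality).
gene_a = [("leaf", "decreased size")]
gene_b = [("leaf", "increased size")]
gene_c = [("root", "decreased size")]

if __name__ == "__main__":
    print(similarity(gene_a, gene_b))
    print(similarity(gene_b, gene_c))
```

Because shared ancestors count toward the overlap, two phenotypes that use different specific terms can still score above zero when those terms fall under a common parent, which is what makes comparisons across species with different vocabularies feasible in principle.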

Conclusions: The use of ontologies, annotation standards, shared formats, and best practices for cross-taxon phenotype data analyses represents a novel approach to plant phenomics that enhances the utility of model genetic organisms and can be readily applied to species with fewer genetic resources and less well-characterized genomes. In addition, these tools should enhance future efforts to explore the relationships among phenotypic similarity, gene function, and sequence similarity in plants, and to make genotype-to-phenotype predictions relevant to plant biology, crop improvement, and potentially even human health.