Cyberinfrastructure to Improve Forest Health and Productivity: The Role of Tree Databases in Connecting Genomes, Phenomes, and the Environment

Front Plant Sci. 2019 Jun 25;10:813. doi: 10.3389/fpls.2019.00813. eCollection 2019.

Authors

Jill L Wegrzyn¹, Margaret A Staton², Nathaniel R Street³, Dorrie Main⁴, Emily Grau¹, Nic Herndon¹, Sean Buehler¹, Taylor Falk¹, Sumaira Zaman¹, Risharde Ramnath¹, Peter Richter¹, Lang Sun¹, Bradford Condon², Abdullah Almsaeed², Ming Chen², Chanaka Mannapperuma³, Sook Jung⁴, Stephen Ficklin⁴

Affiliations

¹ Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT, United States.
² Department of Entomology and Plant Pathology, University of Tennessee, Knoxville, Knoxville, TN, United States.
³ Umeå Plant Science Centre, Department of Plant Physiology, Umeå University, Umeå, Sweden.
⁴ Department of Horticulture, Washington State University, Pullman, WA, United States.

Abstract

Despite tremendous advancements in high-throughput sequencing, the vast majority of tree genomes, in particular those of forest trees, remain elusive. Although primary databases store genetic resources for just over 2,000 forest tree species, these are largely focused on sequence storage, basic genome assemblies, and functional assignment through existing pipelines. The tree databases reviewed here serve as secondary repositories for community data. They vary in their focal species, the data they curate, and the analytics they provide, but they are united in moving toward a goal of centralizing both data access and analysis. They provide frameworks to view and update annotations for complex genomes, interrogate systems-level expression profiles, curate data for comparative genomics, and perform real-time analysis with genotype and phenotype data. The organism databases of today are no longer simply catalogs or containers of genetic information. These repositories represent integrated cyberinfrastructure that supports cross-site queries and analysis in web-based environments. These resources are striving to integrate across diverse experimental designs, sequence types, and related measures through ontologies, community standards, and web services. Efficient, simple, and robust platforms that enhance the data generated by the research community contribute to improving forest health and productivity.

Keywords: bioinformatics; content management system; database; forest tree; web services.