A most wanted list of conserved microbial protein families with no known domains

PLoS One. 2018 Oct 17;13(10):e0205749. doi: 10.1371/journal.pone.0205749. eCollection 2018.

Abstract

The number and proportion of genes with no known function are growing rapidly. To quantify this phenomenon and provide criteria for prioritizing genes for functional characterization, we developed a bioinformatics pipeline that identifies robustly defined protein families with no annotated domains, ranks these with respect to phylogenetic breadth, and identifies them in metagenomics data. We applied this approach to 271 965 protein families from the SFams database and discovered many with no functional annotation, including >118 000 families lacking any known protein domain. From these, we prioritized 6 668 conserved protein families with at least three sequences from organisms in at least two distinct classes. These Function Unknown Families (FUnkFams) are present in Tara Oceans Expedition and Human Microbiome Project metagenomes, with distributions associated with sampling environment. Our findings highlight the extent of functional novelty in sequence databases and establish an approach for creating a "most wanted" list of genes to prioritize for further characterization.
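The prioritization criteria stated in the abstract (no annotated domain, at least three sequences, organisms from at least two distinct classes) can be illustrated with a minimal sketch. This is not the authors' pipeline: the `Family` record, its field names, and the helper functions are hypothetical, and the ranking uses the count of distinct taxonomic classes as a simple stand-in for phylogenetic breadth, which the study may measure differently.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Family:
    """Hypothetical record for one protein family (not the SFams schema)."""
    family_id: str
    # One taxonomic class label per member sequence.
    member_classes: List[str]
    # Identifiers of annotated domains; an empty set means no known domain.
    domains: Set[str] = field(default_factory=set)


def is_prioritized(fam: Family, min_sequences: int = 3, min_classes: int = 2) -> bool:
    """Apply the three criteria from the abstract to a single family."""
    return (
        not fam.domains
        and len(fam.member_classes) >= min_sequences
        and len(set(fam.member_classes)) >= min_classes
    )


def rank_by_breadth(families: List[Family]) -> List[Family]:
    """Keep prioritized families and sort them by distinct-class count.

    Assumption: distinct-class count is used here as a proxy for
    phylogenetic breadth. Ties are broken by family size, then by ID.
    """
    kept = [f for f in families if is_prioritized(f)]
    return sorted(
        kept,
        key=lambda f: (-len(set(f.member_classes)), -len(f.member_classes), f.family_id),
    )
```

Under these assumptions, a family with three sequences from a single class, or one carrying any domain annotation, is excluded before ranking.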

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bacterial Proteins / chemistry*
  • Bacterial Proteins / genetics
  • Computational Biology
  • Databases, Nucleic Acid*
  • Humans
  • Metagenome / genetics*
  • Metagenomics
  • Microbiota / genetics*
  • Phylogeny
  • Protein Domains / genetics*
  • Sequence Homology, Nucleic Acid

Substances

  • Bacterial Proteins

Grant support

This work was supported by the Gordon & Betty Moore Foundation, grant #3300, https://www.moore.org/initiative-strategy-detail?initiativeId=marine-microbiology-initiative (KSP); National Science Foundation, grant #DMS-1563159, https://www.nsf.gov/funding/pgm_summ.jsp?pims_id=5300 (KSP); Lab support from Gladstone Institutes (KSP). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.