Review
Cell Syst. 2021 Sep 22;12(9):842-859.
doi: 10.1016/j.cels.2021.06.005.

Path to improving the life cycle and quality of genome-scale models of metabolism


Yara Seif et al. Cell Syst. 2021.

Abstract

Genome-scale models of metabolism (GEMs) are key computational tools for the systems-level study of metabolic networks. Here, we describe the "GEM life cycle," which we subdivide into four stages: inception, maturation, specialization, and amalgamation. We show how different types of GEM reconstruction workflows fit into each stage and highlight two fundamental bottlenecks for GEM quality improvement: GEM maturation and content removal. Drawing on past independent GEM maturation efforts, we identify common characteristics that contribute to the increasing quality of maturing GEMs. We then shed much-needed light on the pervasive but largely unrecognized issue of content removal, demonstrating the substantial effects of model pruning on the model's solution space. Finally, we propose a novel framework for content removal and associated confidence-level assignment that will help guide future GEM development efforts, reduce duplication of effort across groups, potentially aid automated reconstruction platforms, and boost the reproducibility of model development.

Keywords: functional annotation; metabolic modeling; metabolic reconstructions; systems biology.


Conflict of interest statement

Declaration of interests: The authors declare no competing interests.

Figures

Figure 1. The GEM life cycle can be subdivided into four phases:
1) GEM inception: metabolic models are built for the first time by drawing from existing reference databases and models and iterating through several curation cycles; 2) GEM maturation: an existing manually curated GEM is continually updated in the years after its inception as new knowledge comes to light; 3) GEM specialization: an existing high-quality GEM is tailored to a specific strain, cell line, or disease state using an ‘omics’ data set; and 4) GEM amalgamation: high-quality GEMs are joined together to form a multi-cellular model.
Figure 2. The eight GEM maturation characteristics:
A) increasing degree of compartmentalization: an increasing number of cellular compartments are modeled, accounting for the sectionalization of metabolism; B) advances in knowledge: as novel pathways and finer details of known pathways are uncovered, a maturing GEM reflects increasing organism specificity, with network structure features driving improvements and discovery; C) increasingly informed modeling assumptions: as more data covering a specific organism are generated, the biomass function of a maturing GEM increases in complexity; D) technology-driven advances: with the emergence of ‘omics’ data, gene knockout screens, and an increasing number of available Biolog phenotype microarray data sets, the diversity of approaches to validate a model increases; E) enriched object-associated information: with the emergence and expansion of diverse reference databases, objects in GEMs are increasingly associated with cross-references; F) crowd-sourcing: a greater diversity of expertise is used through direct (GitHub repositories, Jamborees) and indirect (sequential maturing of GEMs) crowd-sourcing.
Figure 3. Content removal forms part of every stage in the GEM life cycle.
The number of removed or added instances is normalized across the plots by the total number of instances in the oldest database or model included in the analysis. A) Reference database updates. Fluctuations in both removed and added pathways are shown for each update from 2003 to 2017. Numbers were obtained from the reports in the corresponding publications (Karp et al., 2002; Krieger, 2004; Caspi et al., 2008, 2015, 2018). B) GEM inception. Addition and removal of genes, reactions, and metabolites are plotted for the 773 reconstructions of gut microbes. ModelSEED draft reconstructions were obtained courtesy of Thiele et al., and we compared their modeled content with the corresponding curated reconstructions. Counts are normalized with respect to the number of instances in the ModelSEED reconstructions. C) GEM maturation. Content removal occurs at every update, with genes being affected the least. Models were downloaded for E. coli (Reed et al., 2003; Feist et al., 2007; Orth et al., 2011; Monk et al., 2017; Fang, Lloyd and Palsson, 2020), H. sapiens (Duarte et al., 2007; Thiele et al., 2013; Swainston et al., 2016; Brunk et al., 2018), and S. cerevisiae (Duarte, Herrgård and Palsson, 2004; Mo, Palsson and Herrgård, 2009; Heavner et al., 2012, 2013; Aung, Henry and Walker, 2013; Lu et al., 2019) from the BiGG database (Norsigian et al., 2020), VMH (Noronha et al., 2019), BioModels (Chelliah et al., 2015), yeast.sf.net (Aung, Henry and Walker, 2013), and SilicoLife (Home - SilicoLife, no date). iMM904 is shaded to mark a change of namespace, which caused the sudden increase in the percentage of metabolites and reactions removed. D) GEM specialization. The 410 strain-specific models of Salmonella (Seif et al., 2018; Seif, Monk, Machado, et al., 2019) were obtained from the BiGG database, and the 126 tissue-specific models were downloaded from Wang et al. (Wang, Eddy and Price, 2012). We compared the content of each model with the content of the corresponding starting reference model, STM.v1.0 and Human Recon 1, respectively.
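The normalization used in this figure can be sketched in a few lines. This is a minimal Python sketch, assuming each model version is reduced to a set of identifiers (reaction, metabolite, or gene IDs) and that identical IDs denote identical content across versions; the function names and the toy reaction IDs are hypothetical and are not taken from the authors' analysis code.

```python
from typing import Dict, List, Set


def content_changes(previous: Set[str], current: Set[str], baseline_size: int) -> Dict[str, float]:
    """Fractions of content removed and added between two versions,
    normalized by the size of a fixed baseline (e.g. the oldest model)."""
    if baseline_size <= 0:
        raise ValueError("baseline_size must be positive")
    removed = previous - current  # present before, absent now
    added = current - previous    # absent before, present now
    return {
        "removed": len(removed) / baseline_size,
        "added": len(added) / baseline_size,
    }


def track_versions(versions: List[Set[str]]) -> List[Dict[str, float]]:
    """Compare each consecutive pair of versions, always normalizing by
    the number of instances in the oldest version (versions[0])."""
    baseline = len(versions[0])
    return [content_changes(prev, curr, baseline) for prev, curr in zip(versions, versions[1:])]


# Hypothetical reaction IDs for three successive versions of one model.
v1 = {"PGI", "PFK", "FBA", "TPI"}
v2 = {"PGI", "PFK", "FBA", "GAPD"}
v3 = {"PGI", "PFK", "GAPD", "PGK", "ENO"}

for step, change in enumerate(track_versions([v1, v2, v3]), start=1):
    print(step, change)
# prints:
# 1 {'removed': 0.25, 'added': 0.25}
# 2 {'removed': 0.25, 'added': 0.5}
```

Because the comparison is purely ID-based, a namespace change (as with iMM904) appears as mass removal plus mass addition even when the underlying biochemistry is unchanged; mapping identifiers to a common namespace before comparing avoids that artifact.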
Figure 4. Types of content removal can be subdivided into: A) Content duplication. From left to right, duplicate reactions caused by: inconsistent metabolite identifiers (pi_e and pi__e both identify extracellular phosphate), diverging stoichiometries (three phosphate molecules transported to the cytoplasm versus one), reaction identifier mismatch (Plt1 and Plt2 are the same transport reaction), multiple directionalities (bi-directional transport versus import of hydrogen across the outer membrane), compartment promiscuity (erroneous assignment of a phosphoglucomutase, which converts D-glucose-1-phosphate to D-glucose-6-phosphate, to the periplasmic compartment), and duplicate pathways (conversion of isocitrate to citrate via a two-step pathway and via a lumped reaction). N.B., these cases serve as examples of possible content duplication but should be checked against the literature. For example, an organism could encode two phosphate transporters, each catalyzing a transport process but with a different stoichiometry. B) Organism-nonspecific pathways. Left: a heme pathway initially added to the S. aureus GEM with incorrect differential cofactor usage due to lack of knowledge, and subsequently corrected as a result of recent discoveries (Lobo et al., 2015; Seif, Monk, Mih, et al., 2019); right: a plant pathway for vitamin K1 (phylloquinone) biosynthesis that was added to the GEM of a Gram-positive bacterium, likely due to insufficient curation. C) Low-confidence modeling assumptions. From left to right: an orphan taurocholate amidohydrolase reaction annotated as being catalyzed by a C59 family penicillin amidase; promiscuous assignment of plsX as a phosphate acyltransferase, an isohexadecanoyl-glycerol-3-phosphate O-acyltransferase, and an isopentadecanoyl-glycerol-3-phosphate O-acyltransferase (among others); and a gap-filled putrescine biosynthesis pathway with erroneous gene assignment.
D) Low-confidence gene annotation: a transport reaction assigned to a low-confidence transporter, and a generic reaction with a long “OR”-based gene-reaction rule. E) Data-driven evidence: a reaction catalyzed by a gene carrying a loss-of-function mutation, and a reaction catalyzed by a gene that is not expressed in a cell. Abbreviations: X_c = cytoplasmic, X_e = extracellular, X_p = periplasmic, h = hydrogen, pi = phosphate, icit = isocitrate, acon__C = cis-aconitate, cit = citrate, cpppg3 = coproporphyrinogen III, amet = S-adenosyl-L-methionine, met__L = L-methionine, fum = fumarate, succ = succinate, pppg9 = protoporphyrinogen IX, o2 = oxygen, h2o = water, co2 = carbon dioxide, ppp9 = protoporphyrin, sbzcoa = O-succinylbenzoyl-CoA, dhna = 1,4-dihydroxy-2-naphthoate, phyQ = phylloquinone, 14dhncoa = 1,4-dihydroxy-2-naphthoyl-CoA, fa3coa = fatty acid (Iso-C15:0)-CoA, fa3coa = fatty acid (Iso-C16:0)-CoA, 1ipsg3p = 1-isopentadecanoyl-sn-glycerol 3-phosphate, 1ihgly3p = 1-isohexadecanoyl-sn-glycerol 3-phosphate, arg = L-arginine, agm = agmatine, ptrc = putrescine, arbt = arbutin.
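Two of the content-duplication cases in panel A (inconsistent metabolite identifiers and reaction identifier mismatch), as well as the same reaction written in opposite directions, can be screened for automatically by comparing reactions on their stoichiometry rather than on their IDs. The following is a minimal Python sketch, assuming each reaction is given as a dictionary mapping metabolite ID to coefficient; the identifier-normalization rule (collapsing repeated underscores so that pi__e and pi_e coincide) and all reaction IDs are hypothetical. It does not catch diverging stoichiometries, compartment promiscuity, or lumped pathways, and each hit is only a candidate to be checked against the literature.

```python
import re
from typing import Dict, FrozenSet, List, Tuple

Stoichiometry = Dict[str, float]  # metabolite ID -> coefficient (negative = consumed)
Key = FrozenSet[Tuple[str, float]]


def normalize_metabolite(met_id: str) -> str:
    # Hypothetical normalization rule: collapse runs of underscores,
    # so "pi__e" and "pi_e" map to the same key.
    return re.sub(r"_+", "_", met_id)


def reaction_key(stoich: Stoichiometry) -> Key:
    """Order-independent key built from normalized metabolite IDs."""
    merged: Dict[str, float] = {}
    for met, coeff in stoich.items():
        norm = normalize_metabolite(met)
        merged[norm] = merged.get(norm, 0.0) + coeff
    return frozenset((met, coeff) for met, coeff in merged.items() if coeff != 0)


def reversed_key(key: Key) -> Key:
    # The same reaction written in the opposite direction.
    return frozenset((met, -coeff) for met, coeff in key)


def find_duplicate_candidates(reactions: Dict[str, Stoichiometry]) -> List[Tuple[str, str, str]]:
    """Return (first_id, duplicate_id, reason) for reactions sharing a stoichiometry."""
    seen: Dict[Key, str] = {}
    hits: List[Tuple[str, str, str]] = []
    for rxn_id, stoich in reactions.items():
        key = reaction_key(stoich)
        if key in seen:
            hits.append((seen[key], rxn_id, "same stoichiometry"))
        elif reversed_key(key) in seen:
            hits.append((seen[reversed_key(key)], rxn_id, "opposite direction"))
        else:
            seen[key] = rxn_id
    return hits


# Hypothetical phosphate/proton symport reactions.
reactions = {
    "PIt_a": {"pi_e": -1, "h_e": -1, "pi_c": 1, "h_c": 1},
    "PIt_b": {"pi__e": -1, "h_e": -1, "pi_c": 1, "h_c": 1},  # inconsistent metabolite ID
    "PIt_c": {"pi_c": -1, "h_c": -1, "pi_e": 1, "h_e": 1},   # written in reverse
    "PIt_d": {"pi_e": -3, "h_e": -1, "pi_c": 3, "h_c": 1},   # diverging stoichiometry: not flagged
}
for hit in find_duplicate_candidates(reactions):
    print(hit)
# prints:
# ('PIt_a', 'PIt_b', 'same stoichiometry')
# ('PIt_a', 'PIt_c', 'opposite direction')
```

Keying on a frozenset of (metabolite, coefficient) pairs makes the comparison independent of the order in which metabolites are listed and of the reaction ID, which is exactly what the identifier-mismatch case requires.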
Figure 5. Effect of content removal on A) the solution space: we illustrate the effect of deleting a reaction in a dummy model containing two pathways (p1 and p2). The removal of reaction 4 translates into two added zeros in the stoichiometric matrix (S) and two modified constraints. B) Reaction essentiality: for each of three GEM sequels, S. aureus (Bosi et al., 2016; Seif, Monk, Mih, et al., 2019), M. tuberculosis (Jamshidi and Palsson, 2007; Kavvas et al., 2018), and P. putida (Nogales, Palsson and Thiele, 2008; Nogales et al., 2020), we extracted two iterations and identified the set of reactions removed between the latest model and its previous version. We then re-introduced randomly sampled subsets of the deleted reactions into the latest model. As deleted content is added back to the reconstruction, fewer reactions are essential. C) Thermodynamic consistency: here, we limited our analysis to the S. aureus GEM sequel. As in B, we iteratively added randomly sampled subsets of the deleted reactions of varying size. However, as reactions are reintroduced, we test the model’s ability to freely produce cofactors with all nutrient exchanges blocked (Fritzemeier et al., 2017). As deleted content is added back to the reconstruction, more cofactors can be freely generated in the model. Abbreviations: v1 = flux through reaction 1, p1 = pathway 1, S = stoichiometric matrix, ATP = adenosine triphosphate, NADH = reduced nicotinamide adenine dinucleotide, NADPH = reduced nicotinamide adenine dinucleotide phosphate, MQL8 = menaquinol 8.
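The essentiality trend in panel B can be reproduced qualitatively on a toy network. The sketch below is a simplification and not the authors' method: instead of flux balance analysis on the stoichiometric matrix, it uses a topological network-expansion test (a reaction can fire once all of its substrates are available) and calls a reaction essential if deleting it makes the target unreachable from the seed metabolites. The six-reaction network, its two parallel pathways, and all names are hypothetical, loosely modeled on the dummy model of panel A, and every reaction is assumed to be irreversible.

```python
import random
from typing import Dict, FrozenSet, Set, Tuple

# (substrates, products); every reaction is assumed irreversible.
Reaction = Tuple[FrozenSet[str], FrozenSet[str]]


def rxn(substrates: Set[str], products: Set[str]) -> Reaction:
    return frozenset(substrates), frozenset(products)


def reachable(reactions: Dict[str, Reaction], seeds: Set[str]) -> Set[str]:
    """Network expansion: fire any reaction whose substrates are all available
    until no new metabolite can be produced."""
    available = set(seeds)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions.values():
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available


def essential_reactions(reactions: Dict[str, Reaction], seeds: Set[str], target: str) -> Set[str]:
    """Reactions whose single deletion makes the target unreachable.
    Returns an empty set if the target is unreachable to begin with."""
    if target not in reachable(reactions, seeds):
        return set()
    essential = set()
    for rxn_id in reactions:
        knockout = {r: rx for r, rx in reactions.items() if r != rxn_id}
        if target not in reachable(knockout, seeds):
            essential.add(rxn_id)
    return essential


def reintroduce(latest: Dict[str, Reaction], deleted: Dict[str, Reaction],
                k: int, rng: random.Random) -> Dict[str, Reaction]:
    """Add back a random subset of k previously deleted reactions."""
    model = dict(latest)
    for rxn_id in rng.sample(sorted(deleted), k):
        model[rxn_id] = deleted[rxn_id]
    return model


# Hypothetical earlier version: uptake R1, pathway p1 (R2, R3), pathway p2 (R4, R5), biomass R6.
previous = {
    "R1": rxn({"A_ext"}, {"A"}),
    "R2": rxn({"A"}, {"C"}),
    "R3": rxn({"C"}, {"B"}),
    "R4": rxn({"A"}, {"D"}),
    "R5": rxn({"D"}, {"B"}),
    "R6": rxn({"B"}, {"biomass"}),
}
# Hypothetical latest version in which pathway p2 was pruned.
latest = {r: rx for r, rx in previous.items() if r not in ("R4", "R5")}
deleted = {r: previous[r] for r in ("R4", "R5")}

rng = random.Random(0)
for k in range(len(deleted) + 1):
    model = reintroduce(latest, deleted, k, rng)
    print(k, len(essential_reactions(model, {"A_ext"}, "biomass")))
# prints:
# 0 4
# 1 4
# 2 2
```

Pruning p2 makes both steps of p1 essential, and only reintroducing the complete pathway restores the redundancy. A flux-based version would instead solve a linear program over S with the deleted reaction's flux bounds fixed to zero.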


References

    1. Agren R et al. (2014) ‘Identification of anticancer drugs for hepatocellular carcinoma through personalized genome-scale metabolic modeling’, Molecular Systems Biology, 10(3), p. 721.
    2. Altman T et al. (2013) ‘A systematic comparison of the MetaCyc and KEGG pathway databases’, BMC Bioinformatics, 14, p. 112.
    3. Arkin AP et al. (2018) ‘KBase: The United States Department of Energy Systems Biology Knowledgebase’, Nature Biotechnology, 36(7), pp. 566–569.
    4. Aung HW, Henry SA and Walker LP (2013) ‘Revising the Representation of Fatty Acid, Glycerolipid, and Glycerophospholipid Metabolism in the Consensus Model of Yeast Metabolism’, Industrial Biotechnology, 9(4), pp. 215–228.
    5. Aurich MK, Fleming RMT and Thiele I (2016) ‘MetaboTools: A Comprehensive Toolbox for Analysis of Genome-Scale Metabolic Models’, Frontiers in Physiology, 7, p. 327.
