Microbial comparative pan-genomics using binomial mixture models
- PMID: 19691844
- PMCID: PMC2907702
- DOI: 10.1186/1471-2164-10-385
Abstract
Background: The sizes of the core- and pan-genomes of bacterial species are a topic of increasing interest due to the growing number of sequenced prokaryote genomes, many from the same species. Attempts to estimate these quantities have been made using regression methods or mixture models. We extend the latter approach by using statistical ideas developed for capture-recapture problems in ecology and epidemiology.
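In capture-recapture terms, each of the G sequenced genomes acts as a sampling occasion, and a gene family is "captured" in y of them; families that exist in the species but are absent from every sequenced genome (y = 0) are never observed. A minimal sketch of a K-component binomial mixture under this framing follows. The notation (G genomes, n observed gene families, mixing weights π_k, detection probabilities p_k) and the choice to define the core genome as a component with detection probability fixed at 1 are assumptions made here for illustration, not necessarily the paper's exact parameterization.

```latex
\begin{aligned}
P(Y = y) &= \sum_{k=1}^{K} \pi_k \binom{G}{y} p_k^{\,y} (1 - p_k)^{G - y}, && y = 0, 1, \dots, G \\
P(Y = y \mid Y \ge 1) &= \frac{P(Y = y)}{1 - P(Y = 0)}, && y = 1, \dots, G \\
\hat{N}_{\mathrm{pan}} &= \frac{n}{1 - \hat{P}(Y = 0)}, \qquad
\hat{N}_{\mathrm{core}} = \hat{\pi}_{\mathrm{core}} \, \hat{N}_{\mathrm{pan}}
\end{aligned}
```

Only the zero-truncated distribution in the second line can be fitted to data, since unseen families contribute no observations; the pan-genome estimate then inflates the observed count n by the estimated probability of a family being seen at least once. Heterogeneous detection probabilities (K > 1) are what allow the model to express dependence between genomes.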
Results: We estimate core- and pan-genome sizes for 16 different bacterial species. The results reveal a complex dependency structure for most species, manifested as heterogeneous detection probabilities. Estimated pan-genome sizes range from small (around 2600 gene families) in Buchnera aphidicola to large (around 43000 gene families) in Escherichia coli. Results for Escherichia coli show that as more data become available, a larger diversity is estimated, indicating an extensive pool of rarely occurring genes in the population.
Conclusion: Analyzing pan-genomics data with binomial mixture models is a way to handle dependencies between genomes, which we find are always present. A bottleneck in the estimation procedure is the annotation of rarely occurring genes.
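The estimation step can be sketched as an EM fit of a zero-truncated binomial mixture. This is a minimal illustration under stated assumptions, not the authors' code: the function names `binom_pmf` and `fit_pan_genome` are hypothetical, `counts[y]` is assumed to hold the number of gene families seen in exactly y of G genomes, one component is fixed at detection probability 1 to represent the core genome, and unseen families are handled with the standard EM treatment of truncated data (imputing their expected number each iteration). Choosing the number of components K is not shown.

```python
import math


def binom_pmf(y, n, p):
    # Binomial(n, p) probability of y successes; handles p = 0 or 1 exactly.
    if p == 0.0:
        return 1.0 if y == 0 else 0.0
    if p == 1.0:
        return 1.0 if y == n else 0.0
    return math.comb(n, y) * p**y * (1 - p) ** (n - y)


def fit_pan_genome(counts, G, init_p, core=True, iters=10000, tol=1e-10):
    """Hypothetical EM fit of a zero-truncated binomial mixture.

    counts[y]: number of gene families seen in exactly y of G genomes
    (y = 1..G); counts[0] is ignored because unseen families are unobserved.
    init_p: starting detection probabilities for the free components.
    core=True appends one component fixed at p = 1 (assumed core genome).
    Returns (pan_size, core_size, weights, detection_probs).
    """
    n_obs = sum(counts[1:])
    ps = list(init_p) + ([1.0] if core else [])
    K = len(ps)
    pis = [1.0 / K] * K
    for _ in range(iters):
        # Probability that a family is missed by all G genomes.
        p0 = sum(pi * binom_pmf(0, G, p) for pi, p in zip(pis, ps))
        # Impute the expected number of unseen families (y = 0).
        n0 = n_obs * p0 / (1.0 - p0)
        full = [n0] + list(counts[1:])
        # E-step: split each count across components by responsibility.
        w = [[0.0] * K for _ in range(G + 1)]
        for y in range(G + 1):
            dens = [pi * binom_pmf(y, G, p) for pi, p in zip(pis, ps)]
            tot = sum(dens)
            if tot > 0:
                w[y] = [full[y] * d / tot for d in dens]
        # M-step: update weights for all components, p only for free ones.
        total = sum(full)
        new_pis = [sum(w[y][k] for y in range(G + 1)) / total for k in range(K)]
        new_ps = list(ps)
        for k in range(len(init_p)):
            mass = sum(w[y][k] for y in range(G + 1))
            if mass > 0:
                new_ps[k] = sum(y * w[y][k] for y in range(G + 1)) / (G * mass)
        delta = max(abs(a - b) for a, b in zip(new_pis + new_ps, pis + ps))
        pis, ps = new_pis, new_ps
        if delta < tol:
            break
    p0 = sum(pi * binom_pmf(0, G, p) for pi, p in zip(pis, ps))
    pan = n_obs / (1.0 - p0)
    core_size = pan * pis[-1] if core else 0.0
    return pan, core_size, pis, ps


if __name__ == "__main__":
    # Synthetic example: expected counts from a known mixture of
    # (weight, detection probability) pairs over G = 10 genomes.
    G = 10
    mix = [(0.4, 0.2), (0.3, 0.5), (0.3, 1.0)]
    counts = [10000 * sum(w * binom_pmf(y, G, p) for w, p in mix) for y in range(G + 1)]
    pan, core_size, pis, ps = fit_pan_genome(counts, G, init_p=[0.15, 0.6])
    print(f"pan-genome ~ {pan:.0f}, core-genome ~ {core_size:.0f}")
```

On exact expected counts from a known mixture, the fit should recover the generating pan- and core-genome sizes. On real data, the pan-genome estimate is driven largely by the low-detection component, which is consistent with the observation that annotating rarely occurring genes is the bottleneck.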
Similar articles
- Robust identification of orthologues and paralogues for microbial pan-genomics using GET_HOMOLOGUES: a case study of pIncA/C plasmids. Methods Mol Biol. 2015;1231:203-32. doi: 10.1007/978-1-4939-1720-4_14. PMID: 25343868
- A process for analysis of microarray comparative genomics hybridisation studies for bacterial genomes. BMC Genomics. 2008 Jan 29;9:53. doi: 10.1186/1471-2164-9-53. PMID: 18230148
- Different evolutionary trends form the twilight zone of the bacterial pan-genome. Microb Genom. 2021 Sep;7(9):000670. doi: 10.1099/mgen.0.000670. PMID: 34559043
- Comparative genomics: the bacterial pan-genome. Curr Opin Microbiol. 2008 Oct;11(5):472-7. doi: 10.1016/j.mib.2008.09.006. PMID: 19086349. Review.
- Optimizing the Parametrization of Homologue Classification in the Pan-Genome Computation for a Bacterial Species: Case Study Streptococcus pyogenes. Methods Mol Biol. 2022;2449:299-324. doi: 10.1007/978-1-0716-2095-3_13. PMID: 35507269. Review.
Cited by
- Under-Appreciated Phylogroup Diversity of Escherichia coli within and between Animals at the Urban-Wildland Interface. Appl Environ Microbiol. 2023 Jun 28;89(6):e0014223. doi: 10.1128/aem.00142-23. Epub 2023 May 16. PMID: 37191541
- Text-Mining to Identify Gene Sets Involved in Biocorrosion by Sulfate-Reducing Bacteria: A Semi-Automated Workflow. Microorganisms. 2023 Jan 3;11(1):119. doi: 10.3390/microorganisms11010119. PMID: 36677411
- Salmonella enterica serovar Cerro displays a phylogenetic structure and genomic features consistent with virulence attenuation and adaptation to cattle. Front Microbiol. 2022 Nov 30;13:1005215. doi: 10.3389/fmicb.2022.1005215. eCollection 2022. PMID: 36532462
- Dissecting Complex Traits Using Omics Data: A Review on the Linear Mixed Models and Their Application in GWAS. Plants (Basel). 2022 Nov 28;11(23):3277. doi: 10.3390/plants11233277. PMID: 36501317. Review.
- The draft genome of Andean Rhodopseudomonas sp. strain AZUL predicts genome plasticity and adaptation to chemical homeostasis. BMC Microbiol. 2022 Dec 9;22(1):297. doi: 10.1186/s12866-022-02685-w. PMID: 36494611
