Genome Biol Evol. 2018 Aug 1;10(8):1920-1926.
doi: 10.1093/gbe/evy143.

Recombination Signal in Mycobacterium tuberculosis Stems from Reference-guided Assemblies and Alignment Artefacts

Maxime Godfroid et al. Genome Biol Evol.

Abstract

DNA acquisition via genetic recombination is considered advantageous as it has the potential to bring together beneficial mutations that emerge independently within a population. Furthermore, recombination is considered to contribute to the maintenance of genome stability by purging slightly deleterious mutations. The prevalence of recombination differs among prokaryotic species and depends on the accessibility of DNA transfer mechanisms. An exceptional example is the human pathogen Mycobacterium tuberculosis (MTB), for which no clear transfer mechanisms have so far been characterized and the presence of recombination is questioned. Here, we analyze completely assembled MTB genomes in search of evidence of recombination. We find that putative recombination events are enriched in strains reconstructed by reference-guided assembly and in regions with unreliable alignments. In addition, assembly and alignment artefacts introduce phylogenetic signals that conflict with the established MTB phylogeny. Our results reveal that the recombination events reported so far in MTB are likely to stem from methodological artefacts. We conclude that no reliable signal of recombination is observed in the currently available MTB genomes. Moreover, our study demonstrates the limitations of reference-guided genome assembly for phylogenetic reconstructions. Rigorously de novo assembled, high-quality genomes are mandatory in order to distinguish true evolutionary signal from noise, in particular for low-diversity species such as MTB.

Figures

Fig. 1. Phylogenetic tree inferred from the concatenated amino acid alignments of the 2,650 complete single-copy protein families. The total alignment length is 786,198 amino acids; 4,305 (0.55%) of the sites are variable and 1,714 (0.22%) are parsimony informative. The number of strains grouped in each collapsed clade is shown in brackets. See supplementary figure S2, Supplementary Material online for the complete phylogeny. The root position is estimated using midpoint rooting, the MAD method (Tria et al. 2017), and the outgroup method with Mycobacterium canettii based on 2,557 complete single-copy protein families. All approaches infer the root position on the branch splitting lineage 1 from the others; this result is in agreement with previous analyses (Comas et al. 2013).
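The site counts quoted in the Fig. 1 caption can be illustrated with a minimal sketch. It uses the standard textbook definitions (a site is variable if it shows at least two states, and parsimony informative if at least two states each occur in at least two sequences). The function name `site_stats`, the decision to ignore gap and ambiguity characters, and the toy alignment are assumptions made for illustration; they are not taken from the paper's pipeline.

```python
from collections import Counter


def site_stats(rows, ignore="-X?"):
    """Count variable and parsimony-informative columns in an alignment.

    rows   : list of equal-length aligned sequences (strings).
    ignore : characters skipped when counting states (assumption:
             gaps and ambiguity codes are not treated as states).
    """
    if not rows or any(len(r) != len(rows[0]) for r in rows):
        raise ValueError("rows must be non-empty and of equal length")

    variable = 0
    informative = 0
    for column in zip(*rows):
        counts = Counter(c for c in column if c not in ignore)
        # Variable: at least two distinct states observed.
        if len(counts) >= 2:
            variable += 1
        # Parsimony informative: at least two states, each seen twice or more.
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            informative += 1
    return variable, informative


# Toy alignment (hypothetical data), four sequences of length four.
toy = ["MKVL", "MKIL", "MRIL", "MRVA"]
variable, informative = site_stats(toy)

# Percentages as reported in the caption, recomputed from its raw counts.
pct_variable = round(100 * 4305 / 786198, 2)
pct_informative = round(100 * 1714 / 786198, 2)
```

With the caption's raw counts, the recomputed percentages agree with the reported 0.55% and 0.22%.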
Fig. 2. Properties of recombined segments. Of the 1,297 recombined segments, 993 (76.56%) are inferred on terminal branches and 304 (23.44%) on internal branches. Of the terminal-branch segments, 505 (50.85%) are found in problematic assemblies and 488 (49.15%) in other strains. (A) Distribution of the 993 recombined segments on terminal branches. Colors denote lineages as follows: blue for lineage 1, red for lineage 2, black for lineage 3, and green for lineage 4 (see also fig. 1). Problematic assemblies are marked with a yellow star. (B) Distribution of gap contribution to recombined segments. The proportion of gapped positions in a recombined segment is calculated as the number of alignment positions where at least two strains have no gap and at least one strain has a gap, divided by the length of the segment (excluding positions where only one strain has no gap).
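The gapped-position proportion defined for Fig. 2B can be sketched as follows. This is an illustrative reading of the caption, not the authors' code: the function name `gapped_proportion`, the `-` gap character, and the toy segment are assumptions, and columns in which every strain has a gap are excluded from the denominator together with the single-strain columns (the caption does not address that case).

```python
def gapped_proportion(segment_rows, gap="-"):
    """Proportion of gapped positions in one aligned segment.

    segment_rows : list of equal-length strings, one per strain,
                   covering only the recombined segment.
    A position counts as gapped when at least two strains have a
    residue there and at least one strain has a gap. Positions with
    fewer than two residues are left out of the denominator.
    """
    if not segment_rows or any(len(r) != len(segment_rows[0]) for r in segment_rows):
        raise ValueError("segment_rows must be non-empty and of equal length")

    gapped = 0
    counted = 0
    for column in zip(*segment_rows):
        n_gap = sum(1 for c in column if c == gap)
        n_present = len(column) - n_gap
        if n_present < 2:
            continue  # excluded: at most one strain has no gap here
        counted += 1
        if n_gap >= 1:
            gapped += 1
    return gapped / counted if counted else 0.0


# Toy segment (hypothetical data): three strains, five columns.
segment = ["AC-GT", "A--GT", "ACTG-"]
proportion = gapped_proportion(segment)
```

In the toy segment the third column is dropped because only one strain has a residue there, leaving four counted columns, two of which contain a gap.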
Fig. 3. Examples of incongruent phylogenies. (A) Phylogeny of universal genomic region 35 inferred from the complete alignment (1,164,225 nt, 1.2% variable sites, 0.92% parsimony informative sites, HoT score: 50.41%). (B) Phylogeny of universal genomic region 35 estimated from the alignment with unreliable positions removed (586,857 nt, 0.71% variable sites, 0.43% parsimony informative sites). See also supplementary figure S3, Supplementary Material online. (C) Phylogeny of universal genomic region 40 inferred from the complete alignment (611,340 nt, 0.68% variable sites, 0.46% parsimony informative sites, HoT score: 98.28%). (D) Splits network of universal genomic region 40 with unreliable positions removed (600,805 nt, 0.62% variable sites, 0.41% parsimony informative sites). See also supplementary figure S5, Supplementary Material online. Five problematic strains are identified: NZ_CP010340.1 and NZ_CP010338.1 were removed from RefSeq, and NZ_CP009100.1, NZ_CP009101.1, and NC_021054.1 from lineage 2 are H37Rv-guided assemblies.

References

    1. Achtman M. 2008. Evolution, population structure, and phylogeography of genetically monomorphic bacterial pathogens. Annu Rev Microbiol. 62:53–70.
    2. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local alignment search tool. J Mol Biol. 215:403–410.
    3. Baltrus DA, Guillemin K, Phillips PC. 2008. Natural transformation increases the rate of adaptation in the human pathogen Helicobacter pylori. Evolution. 62:39–49.
    4. Benjamini Y, Speed TP. 2012. Summarizing and correcting the GC content bias in high-throughput sequencing. Nucleic Acids Res. 40(10):e72.
    5. Bertels F, Silander OK, Pachkov M, Rainey PB, van Nimwegen E. 2014. Automated reconstruction of whole-genome phylogenies from short-sequence reads. Mol Biol Evol. 31(5):1077–1088.