BMC Genomics. 2016 May 10;17:342.
doi: 10.1186/s12864-016-2670-x.

Next-generation Sequencing-Based Detection of Germline L1-mediated Transductions

Free PMC article


Jelena Tica et al. BMC Genomics.

Abstract

Background: Active LINE-1 (L1) elements can mobilize flanking sequences to other genomic loci through a process termed transduction, thereby influencing genomic content and structure. However, an approach for detecting polymorphic germline non-reference transductions in massively parallel sequencing data has been lacking.

Results: Here we present the computational approach TIGER (Transduction Inference in GERmline genomes), which enables the discovery of non-reference L1-mediated transductions by combining L1 discovery with the detection of unique insertion sequences and detailed characterization of insertion sites. We employed TIGER to characterize polymorphic transductions in fifteen genomes from non-human primate species (chimpanzee, orangutan and rhesus macaque), as well as in a human genome. We achieved high accuracy, as confirmed by PCR and two single-molecule DNA sequencing techniques, and uncovered differences in the relative rates of transduction between primate species.

Conclusions: By enabling the detection of polymorphic transductions, TIGER makes this relevant form of structural variation amenable to population and personal genome analysis.

Keywords: Bioinformatics; Genetics; Genome; L1; NGS; Primates; Retrotransposon; Single-molecule sequencing; Transductions.

Figures

Fig. 1
TIGER approach. a L1-mediated transduction insertions are typically composed of flanking target site duplications (TSDs, purple triangles), L1 sequence and unique transduction sequence (TS), followed by a non-reference polyA tail. To detect such events in paired-end NGS data, candidate regions are chosen based on an overlap between L1 insertion loci, paired-end reads indicative of an insertion of unique sequence copied from a distal locus (as evident from translocation (TL)-supporting read pairs), and remapped single-anchored (SA) reads in the reference genome. b A combination of reads indicative of L1 insertion and of unique duplicative sequence insertion, together with single-anchored reads, is used to discover L1-mediated transduction insertions. TL and SA read pairs are realigned to ensure correct placement onto the reference genome. Additional filtering steps remove low-confidence calls.
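The candidate-selection step in panel a amounts to an interval-overlap filter: keep only L1 insertion calls that sit near both a TL read-pair cluster and SA reads. The sketch below illustrates that idea under stated assumptions; it is not TIGER's code. The `Interval` class, the `candidate_regions` function and the `slop` padding parameter are hypothetical, TL clusters are represented only by their target-side anchor positions, and the mate's source locus is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A genomic interval on one chromosome (0-based start, exclusive end)."""
    chrom: str
    start: int
    end: int

    def near(self, other: "Interval", slop: int = 0) -> bool:
        """True if the intervals overlap after padding self by `slop` bp on each side."""
        return (
            self.chrom == other.chrom
            and self.start - slop < other.end
            and other.start < self.end + slop
        )


def candidate_regions(l1_calls, tl_clusters, sa_reads, slop=500):
    """Keep L1 insertion calls supported by both a TL read-pair cluster and SA reads.

    `slop` (bp of padding around each call) is a hypothetical parameter,
    not a documented TIGER setting.
    """
    candidates = []
    for call in l1_calls:
        has_tl = any(call.near(tl, slop) for tl in tl_clusters)
        has_sa = any(call.near(sa, slop) for sa in sa_reads)
        if has_tl and has_sa:
            candidates.append(call)
    return candidates
```

With toy intervals, a call on chr10 flanked by a TL cluster and an SA read within the padding is kept, while a call with no TL support is dropped. Requiring all three evidence types at one locus is what separates a transduction candidate from a plain L1 insertion or an unrelated translocation signal.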
Fig. 2
Computational analysis of the chr7:6620368-6620628 insertion into the chr10:54643580-54643593 region in the chimpanzee sample PR01171. a Depiction of the chr10:54643580-54643593 region in the Integrative Genomics Viewer (IGV) [57] before read realignment (upper panel). After realignment with BLAT, many initially single-anchored reads were placed correctly, facilitating the ascertainment of this L1-mediated transduction, which clusters to a region on the source chromosome 7 with an average uniqueness of 1 (reads mapping exactly once to the reference genome). b A detailed view of L1-mediated 3′ transduction read placements: one read maps to the target locus on chromosome 10, and its mate maps either to a non-reference L1 element (top panel) or within a cluster of reads mapping uniquely to the source on chromosome 7 (lower panel). Of 29 reads, 7 carried parts of a non-reference polyA tail (only a subset of reads is shown).
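The caption summarizes a read cluster by two quantities: its average uniqueness and the number of reads carrying a non-reference polyA tail. A minimal sketch of both follows, assuming uniqueness is scored per read as 1 divided by its number of reference alignments (so a cluster of uniquely mapping reads averages exactly 1) and that a polyA tail is a terminal run of at least `min_len` A's, or T's at the start for reverse-strand reads. Both definitions and the `min_len=10` threshold are assumptions for illustration, not the paper's stated criteria.

```python
def average_uniqueness(hit_counts):
    """Mean of 1/hits per read; 1.0 means every read in the cluster maps exactly once.

    This per-read definition is an assumption made for illustration.
    """
    if not hit_counts or any(h < 1 for h in hit_counts):
        raise ValueError("need at least one read, each with >= 1 alignment")
    return sum(1.0 / h for h in hit_counts) / len(hit_counts)


def has_polya_tail(seq, min_len=10):
    """True if the read ends in >= min_len A's, or starts with >= min_len T's (reverse strand)."""
    seq = seq.upper()
    trailing_a = len(seq) - len(seq.rstrip("A"))
    leading_t = len(seq) - len(seq.lstrip("T"))
    return max(trailing_a, leading_t) >= min_len


def count_polya_reads(reads, min_len=10):
    """Number of reads in a cluster that carry a polyA (or polyT) stretch at one end."""
    return sum(has_polya_tail(r, min_len) for r in reads)
```

Checking both strands matters because paired-end reads overlapping the tail may be reported on either strand, so a polyA tail can appear as a leading polyT run.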
Fig. 3
Experimental verification of TIGER-based L1-mediated 3′ transductions by PCR. a General primer design: outer primers (grey arrows) were placed outside the event in the target locus to amplify the L1-mediated sequence transduction insertion allele and/or the reference genome allele. On the left side of the locus, the corresponding sequence (dotted line) uniquely matches the target site and then matches multiple positions in the genome, consistent with the presence of an L1 element. Further to the right, the sequence again matches uniquely to the target site and ends with a polyA stretch not seen in the reference genome. To confirm the presence and origin of the transduced sequence (source locus), we employed a second set of primers (purple arrows) inside the predicted unique transduction sequence. b Example PCRs verifying rhesus macaque L1-mediated sequence transductions, based on outer primers, are shown for inferred carrier (C) and non-carrier (NC) samples. In the presence of an L1-mediated transduction sequence insertion, a band larger than the reference band in NC is seen; heterozygotes show both bands, whereas homozygous L1-mediated sequence transduction insertions show only one (i.e. the higher) band. c A Circos plot shows the distribution of all inferred rhesus macaque L1-mediated sequence transductions (for orangutan and chimpanzee, see Additional file 1: Figure S6); insertions experimentally validated by PCR and MinION single-molecule sequencing are depicted in green. Arrowheads indicate directionality towards the target locus.
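The band-interpretation rule in panel b maps directly to a small decision function: the reference allele gives a product of the reference size, the insertion allele gives a larger product, and the combination of bands present determines the genotype. The sketch below assumes gel band sizes are available in bp; the function name `genotype_from_bands` and the 10% relative sizing tolerance `tol` are hypothetical choices for illustration.

```python
def genotype_from_bands(band_sizes, ref_size, ins_size, tol=0.1):
    """Call a genotype from outer-primer PCR band sizes (bp).

    ref_size: expected product from the reference allele.
    ins_size: expected product from the insertion allele (ref_size + insertion length).
    tol: hypothetical relative sizing tolerance for gel-based size estimates.
    """
    def matches(observed, expected):
        return abs(observed - expected) <= tol * expected

    has_ref = any(matches(b, ref_size) for b in band_sizes)
    has_ins = any(matches(b, ins_size) for b in band_sizes)
    if has_ref and has_ins:
        return "heterozygous carrier"   # both bands
    if has_ins:
        return "homozygous carrier"     # only the higher band
    if has_ref:
        return "non-carrier"            # only the reference band
    return "no call"                    # no band matches either expectation
```

For example, with a 400 bp reference product and a 1600 bp insertion product, bands at 400 and 1600 bp give "heterozygous carrier" and a single band near 1600 bp gives "homozygous carrier". The tolerance only works when the insertion is large relative to sizing error; a short insertion would need a tighter `tol`.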
Fig. 4
Pacific Biosciences (a) and Oxford Nanopore MinION (b) long-read verification of L1-mediated transduction insertions. a Left panel: alignment dotplot, with the surrounding reference genome sequence for the human chr4:104210671-104214687 region on the x-axis and the PacBio read on the y-axis; a ~1000 bp shift shows the presence of an insertion. Right panel: inspection of the inserted sequence verified the presence of the L1 element (in blue) and the transduced sequence including the new polyA tail (in red; based on the consensus sequence created from all PacBio reads); the new polyadenylation signal is underlined. b Dotplot with the reference genome sequence on the x-axis and the MinION read on the y-axis: a ~1200 bp shift shows the presence of an insertion. The inserted sequence verified both the presence of an L1 element (in blue) and additional transduced sequence including the new polyA tail (in red; based on the consensus sequence created from a subset of MinION reads). c Alignment of the inserted L1 sequence to the ~6 kb L1 consensus sequence shows that the integrated L1 is 5′-truncated (pairwise alignment performed with BLAST).
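The "shift" in these dotplots is the offset between two diagonals: sequence upstream of the insertion lies on one diagonal, and sequence downstream lies on a parallel diagonal displaced by the insertion length. A minimal k-mer dotplot sketch of that reasoning follows, assuming error-free reads and same-strand orientation. Real PacBio and MinION reads are error-prone, so exact k-mer matches are sparser and `min_dots` would need lowering. The helper names and the `k=15` and `min_dots=50` defaults are hypothetical, and `random_dna` exists only to build toy data.

```python
import random
from collections import Counter


def random_dna(n, seed):
    """Deterministic random DNA string, used here only to build toy test data."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(n))


def kmer_index(seq, k):
    """Map every k-mer in seq to the list of positions where it starts."""
    index = {}
    for i in range(len(seq) - k + 1):
        index.setdefault(seq[i:i + k], []).append(i)
    return index


def diagonal_shift(ref, read, k=15, min_dots=50):
    """Estimate insertion length as the offset between the two most populated diagonals.

    Each shared k-mer is a dotplot dot at (ref position i, read position j);
    collinear stretches share the diagonal j - i. An insertion in the read moves
    the downstream flank onto a diagonal shifted by the insertion length.
    Diagonals with fewer than `min_dots` dots are treated as spurious matches.
    """
    ref_index = kmer_index(ref, k)
    diagonals = Counter()
    for j in range(len(read) - k + 1):
        for i in ref_index.get(read[j:j + k], ()):
            diagonals[j - i] += 1
    top = [d for d, count in diagonals.most_common(2) if count >= min_dots]
    if len(top) < 2:
        return 0  # a single dominant diagonal: no insertion detected
    return abs(top[0] - top[1])
```

Building a reference from two random 2000 bp flanks and a read with a 1000 bp segment inserted between them yields a shift of 1000, mirroring the ~1000 bp offset read off panel a.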
Fig. 5
L1 subfamilies associated with L1-mediated transductions: P values are based on Fisher’s exact test per subfamily using 2 × 2 contingency tables.
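Fisher's exact test on a 2 × 2 table computes the hypergeometric probability of every table with the observed row and column totals and sums those no more likely than the observed one. A self-contained sketch of the two-sided test follows. The table layout is an assumption for illustration: rows are "this subfamily" versus "all other subfamilies", and columns are L1 insertions with versus without an associated transduction. In practice, `scipy.stats.fisher_exact` computes the same two-sided p-value.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Hypothetical layout: a = subfamily insertions with transduction,
    b = subfamily insertions without, c and d = the same counts for all
    other subfamilies.
    """
    if min(a, b, c, d) < 0:
        raise ValueError("counts must be non-negative")
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def p_table(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = p_table(a)
    x_min = max(0, col1 - row2)
    x_max = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one;
    # the small relative tolerance guards against floating-point ties.
    p_value = sum(
        p for p in (p_table(x) for x in range(x_min, x_max + 1))
        if p <= p_obs * (1 + 1e-7)
    )
    return min(1.0, p_value)
```

For the table [[3, 1], [1, 3]], the possible top-left counts 0 to 4 have probabilities 1, 16, 36, 16 and 1 out of 70, so the two-sided p-value is 34/70 ≈ 0.486. Running one such test per subfamily, as the caption describes, yields one p-value per subfamily.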

References

1. Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, et al. Initial sequencing and analysis of the human genome. Nature. 2001;409:860–921. doi: 10.1038/35057062.
2. Chimpanzee Sequencing Analysis Consortium. Initial sequence of the chimpanzee genome and comparison with the human genome. Nature. 2005;437:69–87. doi: 10.1038/nature04072.
3. Rhesus Macaque Genome Sequencing Analysis Consortium. Evolutionary and biomedical insights from the rhesus macaque genome. Science. 2007;316:222–34. doi: 10.1126/science.1139247.
4. Locke DP, Hillier LW, Warren WC, Worley KC, Nazareth LV, Muzny DM, Yang SP, Wang Z, Chinwalla AT, Minx P, et al. Comparative and demographic analysis of orang-utan genomes. Nature. 2011;469:529–33. doi: 10.1038/nature09687.
5. Xing J, Zhang Y, Han K, Salem AH, Sen SK, Huff CD, Zhou Q, Kirkness EF, Levy S, Batzer MA, Jorde LB. Mobile elements create structural variation: analysis of a complete human genome. Genome Res. 2009;19:1516–26. doi: 10.1101/gr.091827.109.
