2012 Jan 3;13:1.
doi: 10.1186/1471-2164-13-1.

Optimizing Illumina Next-Generation Sequencing Library Preparation for Extremely AT-biased Genomes

Free PMC article

Samuel O Oyola et al. BMC Genomics.


Background: Massively parallel sequencing technology is revolutionizing approaches to genomic and genetic research. Since its advent, the scale and efficiency of Next-Generation Sequencing (NGS) have rapidly improved. In spite of this success, sequencing genomes or genomic regions with extremely biased base composition is still a great challenge to the currently available NGS platforms. The genomes of some important pathogenic organisms like Plasmodium falciparum (high AT content) and Mycobacterium tuberculosis (high GC content) display extremes of base composition. The standard library preparation procedures that employ PCR amplification have been shown to cause uneven read coverage, particularly across AT- and GC-rich regions, leading to problems in genome assembly and variation analyses. Alternative library-preparation approaches that omit PCR amplification require large quantities of starting material and hence are not suitable for small amounts of DNA/RNA such as those from clinical isolates. We have developed and optimized library-preparation procedures suitable for low-quantity starting material and tolerant to extremely high AT content sequences.

Results: We have used our optimized conditions in parallel with standard methods to prepare Illumina sequencing libraries from a non-clinical and a clinical isolate (containing ~53% host contamination). By analyzing and comparing the quality of the sequence data generated, we show that our optimized conditions, which involve a PCR additive (TMAC), produce amplified libraries with improved coverage of extremely AT-rich regions and reduced bias toward GC-neutral templates.

Conclusion: We have developed a robust and optimized Next-Generation Sequencing library amplification method suitable for extremely AT-rich genomes. The new amplification conditions significantly reduce bias and retain the complexity of both extremes of base composition. This development will greatly benefit the sequencing of clinical samples, which often require amplification because of the low mass of DNA starting material.


Figure 1
Screening for tolerance to an AT-rich template using conventional PCR amplification. Top panel: PCR amplification of a 540 bp locus (Pf3D7_11:1294982-1295521) with a relatively balanced (70% AT) base composition (positive control) in the presence or absence of TMAC. Bottom panel: PCR amplification of a 1217 bp locus (Pf3D7_01:55900-57116) with extreme AT content (84%) in the presence or absence of TMAC. M, 100 bp DNA ladder (NEB); (1) PWO master; (2) PWO master + TMAC; (3) PfuULTRA; (4) PfuULTRA + TMAC; (5) Kapa HiFi; (6) Kapa HiFi + TMAC; (7) AccuPrime Taq HiFi; (8) AccuPrime Taq HiFi + TMAC; (9) AccuPrime pfx SuperMix; (10) Phusion; (11) Phusion + TMAC; (12) Platinum HiFi; (13) Platinum HiFi + TMAC; (14) Platinum pfx; (15) Platinum pfx + TMAC; (16) Ex Taq; (17) Ex Taq + TMAC; (18) Kapa2G Robust; (19) Kapa2G Robust + TMAC.
Figure 2
A plot of genome coverage against normalized average depth. Duplicate data sets were normalized and pooled. Variance in coverage above and below the normalized average depth (red vertical line) across the genome is shown. Deviation of sample curves from the average depth indicates the level of evenness in coverage depth distribution across the genome. The closer the sample curve is to the vertical line, the more even the coverage. The theoretical curve represents average normalized depth at 100% genome coverage. A) Coverage by libraries made from P. falciparum 3D7 (1 normalized depth represents 21×). B) Coverage by libraries made from the clinical isolate PK0076 (1 normalized depth represents 11×). Kapa HiFi, Kapa2G and Platinum pfx enzymes were used in the presence of TMAC.
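The kind of curve described in the Figure 2 caption can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes that normalized depth means per-base depth divided by the mean depth, and that the plotted quantity is the fraction of positions at or above each normalized depth. The function name `coverage_curve` and its inputs are hypothetical.

```python
def coverage_curve(depths, thresholds):
    """Fraction of positions whose normalized depth is >= each threshold.

    depths     -- per-base read depths (hypothetical input, one value per position)
    thresholds -- normalized depths at which to evaluate the curve
    Assumption: normalized depth = depth / mean depth, so 1.0 is the average.
    """
    if not depths:
        raise ValueError("depths must be non-empty")
    mean_depth = sum(depths) / len(depths)
    if mean_depth == 0:
        raise ValueError("mean depth is zero; nothing to normalize")
    normalized = [d / mean_depth for d in depths]
    total = len(normalized)
    return [sum(1 for n in normalized if n >= t) / total for t in thresholds]


if __name__ == "__main__":
    # Toy data only; not values from the paper.
    print(coverage_curve([10, 20, 30, 20], [0.5, 1.0, 1.5]))
```

Under these assumptions, a perfectly even library would stay at 1.0 up to a normalized depth of 1 and then drop to 0, so a curve that hugs that step more closely indicates more even coverage.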
Figure 3
GC profile analysis of sequenced data. The GC content distributions for different library preparation methods are shown alongside theoretical data for comparison. A) GC content analysis on libraries prepared from P. falciparum 3D7 with mapped reads normalized to 21× genome coverage. B) GC content of libraries prepared from a clinical isolate (PK0076) with mapped reads normalized to 11× genome coverage. Libraries with GC content above 19.4% (the GC content of the P. falciparum 3D7 reference genome) indicate amplification bias towards templates with neutral GC composition. C) Artemis [9,10] screen view of coverage (mapped reads normalized to 21× genome coverage) for a PCR-free library and four other libraries under test on P. falciparum 3D7 chromosome 1 (zoomed in to show coverage on the GC-rich telomere). Kapa HiFi, Kapa2G and Platinum pfx enzymes were used in the presence of TMAC. See additional file 1, Figure S2 A & B for coverage on the entire chromosome 1 and AT-rich locus.
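The comparison against the 19.4% reference GC content in the Figure 3 caption can be illustrated with a short sketch. It assumes reads are available as plain sequence strings and that ambiguous bases (such as N) are ignored; the function names are hypothetical and this is not the analysis code used in the paper.

```python
REFERENCE_GC_PERCENT = 19.4  # GC content of the P. falciparum 3D7 reference, from the caption


def gc_fraction(seq):
    """GC fraction of one read, counting only called bases (A, C, G, T)."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return None  # read contains no called bases
    return (seq.count("G") + seq.count("C")) / called


def mean_gc_percent(reads):
    """Mean per-read GC content, in percent, over reads with called bases."""
    values = [g for g in (gc_fraction(r) for r in reads) if g is not None]
    if not values:
        raise ValueError("no reads with called bases")
    return 100 * sum(values) / len(values)


def gc_excess(reads, reference=REFERENCE_GC_PERCENT):
    """Positive values mean the library is more GC-rich than the reference."""
    return mean_gc_percent(reads) - reference


if __name__ == "__main__":
    toy_reads = ["AATTAATT", "ATGCATAT", "NNNN"]  # toy data, not from the paper
    print(round(gc_excess(toy_reads), 2))
```

A positive excess would be read, as in the caption, as a sign of amplification bias toward GC-neutral templates; a real analysis would normally use mapped reads and a full distribution rather than a single mean.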
Figure 4
Box plots showing coverage analysis of P. falciparum chromosome 11. (i) P. falciparum 3D7; mapped reads normalized to 21× genome coverage (1 normalized depth represents 21×). (ii) Clinical isolate PK0076; mapped reads normalized to 11× genome coverage (1 normalized depth represents 11×). Subplots B, C and D in both i & ii show coverage of sub-regions of the P. falciparum 3D7 chromosome 11. A) Coverage depth variability plotted for each library on the entire chromosome. B) Distribution of base coverage depth for each library over gene Pf11_0074 and its neighboring introns. C) Distribution of base coverage depth at positions 259985-260864 (extreme AT-region). D) Distribution of base coverage depth at positions 29092-30361 (VAR gene and introns). The top and bottom sides of a box plot represent the 75th and 25th percentiles of the base coverage-depth distribution, respectively. The middle line represents the 50th percentile. A narrow box indicates less variation in coverage depth across that locus, and a wide box indicates more. Kapa HiFi, Kapa2G and Platinum pfx enzymes were used in the presence of TMAC. All P. falciparum 3D7 and most clinical isolate libraries were prepared in duplicate, and the data for each replicate were plotted independently as shown.
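The box-plot quantities named in the Figure 4 caption (25th, 50th and 75th percentiles of per-base depth over a locus) can be computed as in the sketch below. It assumes a list of per-base depths for one chromosome and 1-based inclusive coordinates; the helper `box_stats` is hypothetical, and the interpolation method is a choice made here, since the caption does not say how percentiles were calculated.

```python
import statistics


def box_stats(depths, start, end):
    """Quartiles of per-base depth over positions start..end (1-based, inclusive).

    depths -- per-base depths for one chromosome (hypothetical input)
    Assumption: quartiles use the 'inclusive' interpolation method.
    """
    region = depths[start - 1:end]
    if len(region) < 2:
        raise ValueError("region must contain at least two positions")
    q1, median, q3 = statistics.quantiles(region, n=4, method="inclusive")
    # A smaller interquartile range corresponds to a narrower box,
    # i.e. less variation in coverage depth across the locus.
    return {"q1": q1, "median": median, "q3": q3, "iqr": q3 - q1}


if __name__ == "__main__":
    toy_depths = [9, 1, 2, 3, 4, 5, 9]  # toy data, not from the paper
    print(box_stats(toy_depths, 2, 6))
```

Applied to real data, the same call would take coordinates such as 259985 and 260864 for the extreme AT region named in the caption.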



    1. Doolittle RF. The grand assault. Nature. 2002;419(6906):493–494. doi: 10.1038/419493a.
    2. Gardner MJ, Hall N, Fung E, White O, Berriman M, Hyman RW, Carlton JM, Pain A, Nelson KE, Bowman S, et al. Genome sequence of the human malaria parasite Plasmodium falciparum. Nature. 2002;419(6906):498–511. doi: 10.1038/nature01097.
    3. Dohm JC, Lottaz C, Borodina T, Himmelbauer H. Substantial biases in ultra-short read data sets from high-throughput DNA sequencing. Nucleic Acids Res. 2008;36(16):e105. doi: 10.1093/nar/gkn425.
    4. Aird D, Ross MG, Chen WS, Danielsson M, Fennell T, Russ C, Jaffe DB, Nusbaum C, Gnirke A. Analyzing and minimizing PCR amplification bias in Illumina sequencing libraries. Genome Biol. 2011;12(2):R18. doi: 10.1186/gb-2011-12-2-r18.
    5. Kieleczawa J, Mazaika E. Optimization of protocol for sequencing of difficult templates. J Biomol Tech. 2010;21(2):97–102.
