Comparative Study
BMC Bioinformatics. 2012 Apr 19;13 Suppl 6(Suppl 6):S4.
doi: 10.1186/1471-2105-13-S6-S4.

Biases in Read Coverage Demonstrated by Interlaboratory and Interplatform Comparison of 117 mRNA and Genome Sequencing Experiments

Free PMC article
Ekaterina E Khrameeva et al. BMC Bioinformatics.

Abstract

High-throughput sequencing of whole genomes and transcriptomes generates large amounts of sequence data rapidly and at low cost. The goal of most mRNA sequencing studies is to compare expression levels between samples. However, given the broad variety of modern sequencing protocols, platforms, and versions thereof, it is unclear to what extent the results are consistent across platforms and laboratories. A comparison of 117 human mRNA and genome high-throughput sequencing experiments performed on the Illumina and SOLiD platforms at 26 institutions worldwide showed that gene coverage profiles depend strongly on the laboratory that produced them. The profiles showed laboratory-specific non-uniformity that persisted after 3'-bias correction and mappability normalization, suggesting that other, as yet unknown, mRNA-associated biases exist.

Figures

Figure 1
Correlation of per-nucleotide coverage profiles between all pairs of sequencing experiments. The heat map colors represent the Pearson correlation coefficients. Shades of blue correspond to the interval (-0.1, 0.2); shades of red correspond to (0.2, 1.0). Individual experiments are clustered by gene coverage. The labels contain the following information about experiments: SRA study ID; SRA experiment ID; institution short name; genome ("G") or transcriptome ("R") sequencing; platform ("I" stands for Illumina, "S" for SOLiD); fragment length if reads are paired; read length; individual ID, nationality, cell line and/or tissue. Additional information about experiments can be found in Additional file 6.
Figure 2
Distribution of Pearson's correlation coefficients within the same laboratory. Pearson's correlation coefficients were calculated between all possible pairs of different experiments from the same laboratory, independently for each single-exon gene coverage profile, and then averaged over these genes.
Figure 3
Distribution of Pearson's correlation coefficients between laboratories. Pearson's correlation coefficients were calculated between all possible pairs of different experiments from different laboratories, independently for each single-exon gene coverage profile, and then averaged over these genes.
Figure 4
Single-exon gene coverage distribution over gene length. The distribution was calculated after normalization for 5' - 3' coverage bias. Points of different color represent different experiments grouped by laboratory.
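
The correlation procedure described in the captions of Figures 2 and 3 can be illustrated with a minimal Python sketch. This is not the authors' code. It assumes that each experiment is stored as a mapping from gene name to a per-nucleotide coverage vector, that the per-gene coefficients are averaged for each pair of experiments, and that genes with a flat profile are skipped because the coefficient is undefined for them. All names (pearson, mean_gene_correlation, pairwise_by_lab) and the toy coverage values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length coverage vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return None  # flat profile: correlation is undefined
    return cov / sqrt(vx * vy)


def mean_gene_correlation(exp_a, exp_b):
    """Average the per-gene coefficient over genes present in both experiments."""
    rs = []
    for gene in exp_a.keys() & exp_b.keys():
        r = pearson(exp_a[gene], exp_b[gene])
        if r is not None:
            rs.append(r)
    return sum(rs) / len(rs) if rs else None


def pairwise_by_lab(experiments, labs):
    """Split pairwise mean correlations into same-lab and between-lab lists."""
    within, between = [], []
    for a, b in combinations(sorted(experiments), 2):
        r = mean_gene_correlation(experiments[a], experiments[b])
        if r is None:
            continue
        (within if labs[a] == labs[b] else between).append(r)
    return within, between


# Toy data (assumed for illustration): two single-exon genes, three experiments.
experiments = {
    "exp1": {"geneA": [1, 2, 3, 4], "geneB": [4, 3, 2, 1]},
    "exp2": {"geneA": [2, 4, 6, 8], "geneB": [8, 6, 4, 2]},
    "exp3": {"geneA": [4, 3, 2, 1], "geneB": [1, 2, 3, 4]},
}
labs = {"exp1": "lab1", "exp2": "lab1", "exp3": "lab2"}

within, between = pairwise_by_lab(experiments, labs)
print("same laboratory:", within)
print("between laboratories:", between)
```

In this toy input the two experiments from the same laboratory have identically shaped profiles, while the experiment from the other laboratory has the reversed shape, so the same-laboratory list holds a high coefficient and the between-laboratory list holds low ones. Real coverage vectors would come from read alignments, which the sketch does not cover.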
