PeerJ. 2019 Feb 4;7:e6222.
doi: 10.7717/peerj.6222. eCollection 2019.

LiBiNorm: an htseq-count analogue with improved normalisation of Smart-seq2 data and library preparation diagnostics

Nigel P Dyer et al. PeerJ. 2019.

Abstract

Protocols for preparing RNA sequencing (RNA-seq) libraries, most prominently "Smart-seq" variations, introduce global biases that can have a significant impact on the quantification of gene expression levels. This global bias can lead to drastic over- or under-representation of RNA in a non-linear, length-dependent fashion due to enzymatic reactions during cDNA production. It is currently not corrected by any RNA-seq software; existing tools mostly focus on local bias in coverage along RNAs. This paper describes LiBiNorm, a simple command-line program that mimics the popular htseq-count software and allows diagnostics, quantification, and global bias removal. LiBiNorm outputs gene expression data that have been normalized to correct for the global bias introduced by the Smart-seq2 protocol. In addition, it produces data and several plots that provide insight into the experimental history underlying library preparation. The LiBiNorm package includes an R script for visualizing the main results. LiBiNorm is the first software application to correct for the global bias that is introduced by the Smart-seq2 protocol. It is freely downloadable at http://www2.warwick.ac.uk/fac/sci/lifesci/research/libinorm.

Keywords: Gene expression; Global bias; Normalization; RNA-seq; Smart-seq2.
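To make the distinction between conventional and bias-corrected TPM concrete, here is a minimal Python sketch. Conventional TPM divides each gene's read count by its transcript length, which assumes that reads per molecule grow linearly with length; a global length-bias correction instead divides by a non-linear function of length. The saturating function `hypothetical_bias` and its `l_half` parameter are illustrative assumptions, not one of LiBiNorm's fitted models (such as model BD), and none of the names below are part of LiBiNorm's interface.

```python
import numpy as np


def tpm(counts, effective_lengths):
    """Transcripts per million: divide counts by a per-gene length term, then rescale to sum to 1e6."""
    counts = np.asarray(counts, dtype=float)
    effective_lengths = np.asarray(effective_lengths, dtype=float)
    rates = counts / effective_lengths
    return rates / rates.sum() * 1e6


def hypothetical_bias(lengths, l_half=5000.0):
    """Illustrative saturating length response (an assumption, NOT a LiBiNorm model).

    Relative to a linear model, long transcripts are assumed to yield fewer reads per molecule.
    """
    lengths = np.asarray(lengths, dtype=float)
    return lengths / (1.0 + lengths / l_half)


lengths = np.array([1000, 10000])  # transcript lengths in nt (made-up example)
counts = np.array([100, 100])      # read counts per gene (made-up example)

linear = tpm(counts, lengths)                        # conventional TPM, ≈ [909090.9, 90909.1]
corrected = tpm(counts, hypothetical_bias(lengths))  # non-linear length term, ≈ [800000.0, 200000.0]
```

Under this assumed response, the 10 kb transcript's share rises from about 9% to 20% of the total, which matches in direction (not in magnitude) how the correction in Figure 2B raises the expression estimates of long genes.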

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Figure 1. Example plots of read bias (SRA accession SRR1743160) produced with LiBiNorm.
(A) Detected transcripts aligned at their 5′ and 3′ ends and ordered by length, shortest at the top. Read density along RNAs is indicated by color intensity (the darker, the higher). (B) Predicted bias for each model as a function of transcript length, relative to a linear length model. (C) Negative log likelihood values (the lower, the better the fit) for each of the six models, with parameters determined for the SRR1743160 dataset. (D–G) Estimated values of model parameters d, h, t1 & t2, and a, respectively. See text for interpretation of parameters. (H) Read coverage along transcripts aligned at 5′ and 3′ ends and separated into length classes (colors). Experimental data are shown as solid lines and model fits (model BD) as dashed lines.
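Panel C ranks the six models by negative log likelihood. As a rough illustration of that kind of comparison, and not LiBiNorm's actual likelihood, the sketch below scores binned read positions along a transcript under two made-up positional densities using a multinomial likelihood. The bin counts, the model names, and the function `multinomial_nll` are all hypothetical.

```python
import numpy as np


def multinomial_nll(bin_counts, bin_probs):
    """Negative log likelihood of binned read positions under a positional density.

    The multinomial coefficient is omitted because it is the same for every model
    scored on the same data, so it does not affect the ranking.
    """
    bin_counts = np.asarray(bin_counts, dtype=float)
    bin_probs = np.asarray(bin_probs, dtype=float)
    bin_probs = bin_probs / bin_probs.sum()  # normalise to a probability distribution
    return float(-np.sum(bin_counts * np.log(bin_probs)))


# Hypothetical observed coverage: reads falling in 5 equal-width bins, 5' to 3'.
observed = np.array([10, 15, 20, 25, 30])

# Two made-up candidate models of relative read density along the transcript.
candidate_models = {
    "uniform": np.ones(5),
    "three_prime_skew": np.array([1.0, 1.5, 2.0, 2.5, 3.0]),
}

nll = {name: multinomial_nll(observed, p) for name, p in candidate_models.items()}
best = min(nll, key=nll.get)  # the lower the NLL, the better the fit
print(best)  # three_prime_skew
```

Models with more free parameters tend to reach lower NLL on the same data, so when parameter counts differ, a penalised criterion such as AIC may be worth reporting alongside the raw values.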
Figure 2. Evaluation of bias correction.
(A) Scatter plot of gene expression values derived from RNA-seq using TruSeq (SRR1743167) and Smart-seq2 (SRR1743160), based on conventional (linear; equivalent to FPKM) TPM. (B) Same as (A), but using LiBiNorm (model BD) to calculate TPM for the Smart-seq2 sample, which improves R² compared to conventional TPM. Red dots mark genes with mRNA lengths between 10 and 10.1 kb, showing how the bias correction compensates for the underestimated expression levels of these genes. (C) Change in R² (%; y-axis) when systematically comparing gene expression between the Smart-seq2 and TruSeq protocols, relative to a linear TPM reference (x-axis). For each of the 14 Smart-seq2 samples and each of the software packages indicated, the average across the four TruSeq samples is plotted.
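Panel C summarises cross-protocol agreement as a change in R². A minimal sketch of such a summary follows, assuming that R² is the squared Pearson correlation of log10(TPM + 1) values and that "change in R² (%)" means the relative change against the linear-TPM baseline; the caption specifies neither, so both are assumptions. The function names and the toy TPM values are hypothetical.

```python
import numpy as np


def r_squared(x, y):
    """Squared Pearson correlation coefficient between two vectors."""
    r = np.corrcoef(x, y)[0, 1]
    return float(r * r)


def log_expr(tpm_values, pseudocount=1.0):
    """log10(TPM + pseudocount); the pseudocount avoids log(0) for unexpressed genes."""
    return np.log10(np.asarray(tpm_values, dtype=float) + pseudocount)


def r2_change_percent(truseq_tpm, smartseq_linear_tpm, smartseq_corrected_tpm):
    """Relative change (%) in R^2 versus TruSeq when switching from linear to corrected TPM."""
    reference = log_expr(truseq_tpm)
    r2_linear = r_squared(reference, log_expr(smartseq_linear_tpm))
    r2_corrected = r_squared(reference, log_expr(smartseq_corrected_tpm))
    return 100.0 * (r2_corrected - r2_linear) / r2_linear


# Made-up TPM values for five genes, purely to exercise the functions.
truseq = np.array([5.0, 50.0, 500.0, 80.0, 1200.0])
smartseq_linear = np.array([6.0, 45.0, 150.0, 90.0, 300.0])
smartseq_corrected = np.array([5.5, 48.0, 420.0, 85.0, 1000.0])

change = r2_change_percent(truseq, smartseq_linear, smartseq_corrected)  # positive here
```

One plausible reading of the averaging in panel C is to call `r2_change_percent` once per TruSeq sample for a given Smart-seq2 sample and take the mean of the four results.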

Cited by

  • Anti-bias training for (sc)RNA-seq: experimental and computational approaches to improve precision.
    Davies P, Jones M, Liu J, Hebenstreit D. Brief Bioinform. 2021 Nov 5;22(6):bbab148. doi: 10.1093/bib/bbab148. PMID: 33959753. Free PMC article.
  • CYP19A1 mediates severe SARS-CoV-2 disease outcome in males.
    Stanelle-Bertram S, Beck S, Mounogou NK, Schaumburg B, Stoll F, Al Jawazneh A, Schmal Z, Bai T, Zickler M, Beythien G, Becker K, de la Roi M, Heinrich F, Schulz C, Sauter M, Krasemann S, Lange P, Heinemann A, van Riel D, Leijten L, Bauer L, van den Bosch TPP, Lopuhaä B, Busche T, Wibberg D, Schaudien D, Goldmann T, Lüttjohann A, Ruschinski J, Jania H, Müller Z, Pinho Dos Reis V, Krupp-Buzimkic V, Wolff M, Fallerini C, Baldassarri M, Furini S, Norwood K, Käufer C, Schützenmeister N, von Köckritz-Blickwede M, Schroeder M, Jarczak D, Nierhaus A, Welte T, Kluge S, McHardy AC, Sommer F, Kalinowski J, Krauss-Etschmann S, Richter F, von der Thüsen J, Baumgärtner W, Klingel K, Ondruschka B; GEN-COVID Multicenter Study Group; Renieri A, Gabriel G. Cell Rep Med. 2023 Sep 19;4(9):101152. doi: 10.1016/j.xcrm.2023.101152. Epub 2023 Aug 12. PMID: 37572667. Free PMC article.
  • RWP-RK Domain 3 (OsRKD3) induces somatic embryogenesis in black rice.
    Purwestri YA, Lee YS, Meehan C, Mose W, Susanto FA, Wijayanti P, Fauzia AN, Nuringtyas TR, Hussain N, Putra HL, Gutierrez-Marcos J. BMC Plant Biol. 2023 Apr 19;23(1):202. doi: 10.1186/s12870-023-04220-z. PMID: 37076789. Free PMC article.
  • Accelerated aging of the brain transcriptome by the common chemotherapeutic doxorubicin.
    Cavalier AN, Clayton ZS, Hutton DA, Wahl D, Lark DS, Reisz JA, Melov S, Campisi J, Seals DR, LaRocca TJ. Exp Gerontol. 2021 Sep;152:111451. doi: 10.1016/j.exger.2021.111451. Epub 2021 Jun 18. PMID: 34147619. Free PMC article.
  • 3′-5′ crosstalk contributes to transcriptional bursting.
    Cavallaro M, Walsh MD, Jones M, Teahan J, Tiberi S, Finkenstädt B, Hebenstreit D. Genome Biol. 2021 Feb 4;22(1):56. doi: 10.1186/s13059-020-02227-5. PMID: 33541397. Free PMC article.

Grants and funding

This work has been supported by BBSRC research grants BB/L006340/1 and BB/M017982/1. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
