PLoS One. 2011;6(5):e19838.
doi: 10.1371/journal.pone.0019838. Epub 2011 May 13.

The Sensitivity of Massively Parallel Sequencing for Detecting Candidate Infectious Agents Associated With Human Tissue

Free PMC article

Richard A Moore et al. PLoS One. 2011.


Massively parallel sequencing technology now provides the opportunity to sample the transcriptome of a given tissue comprehensively. Transcripts present at only a few copies per cell are readily detectable, allowing the discovery of low-abundance viral and bacterial transcripts in human tissue samples. Here we describe an approach for mining large sequence data sets for the presence of microbial sequences. Further, we demonstrate the sensitivity of this approach by sequencing human RNA-seq libraries spiked with decreasing amounts of an RNA virus. At a modest depth of sequencing, viral transcripts can be detected at frequencies of less than 1 in 1,000,000. With current sequencing platforms approaching outputs of one billion reads per run, this is a highly sensitive method for detecting putative infectious agents associated with human tissues.
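To make the sensitivity claim concrete, the following minimal sketch estimates how many viral reads to expect at a given sequencing depth and the chance of seeing at least a few of them. It assumes that viral reads are sampled independently and that their count follows a Poisson distribution; that model, the function names, and the example depths are illustrative assumptions, not part of the published method.

```python
import math


def expected_viral_reads(total_reads: int, viral_fraction: float) -> float:
    """Expected number of viral reads at a given depth (simple proportion)."""
    return total_reads * viral_fraction


def prob_at_least(total_reads: int, viral_fraction: float, k: int = 1) -> float:
    """P(at least k viral reads), assuming a Poisson-distributed read count.

    The Poisson model is an assumption made for this sketch only.
    """
    lam = total_reads * viral_fraction
    # P(X >= k) = 1 - sum_{i < k} exp(-lam) * lam^i / i!
    below_k = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return 1.0 - below_k


if __name__ == "__main__":
    fraction = 1e-6  # 1 viral read per 1,000,000 reads, as in the abstract
    for depth in (10**6, 10**7, 10**8, 10**9):
        mean = expected_viral_reads(depth, fraction)
        p = prob_at_least(depth, fraction, k=1)
        print(f"depth={depth:>13,d}  expected viral reads={mean:8.1f}  P(>=1)={p:.4f}")
```

Under these assumptions, a run of one billion reads would be expected to contain on the order of a thousand reads from a transcript present at 1 in 1,000,000, which is consistent with the abstract's description of the method as highly sensitive.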

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.


Figure 1. Flow chart of subtraction methodology.
Paired-end reads from a human sequence library are first filtered to remove low-quality reads (including reads with homopolymers of ≥ 20 nt) and reads composed of artifactual adapter or primer sequences. Then, using BWA, read pairs are aligned to databases of human ribosomal sequences, transcript sequences, and genomic sequences. Remaining reads are then aligned sequentially to the genome, transcriptome and human rRNA using BWA. Reads that remain unaligned after comparison to the various human sequence databases are then aligned to a custom database (IAdb) of all known viral and bacterial complete genome sequences, using Novoalign, with the requirement of correct pairing logic. Although not considered here, remaining reads can be characterized further by de novo assembly.
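The subtraction logic in this caption can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' pipeline: the real workflow uses BWA and Novoalign, which are replaced here by a hypothetical exact-substring matcher, and all function names, the adapter sequence, and the example reads are invented for illustration. The 20 nt homopolymer threshold comes from the caption; treating it as inclusive is an assumption.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# (pair_id, mate 1 sequence, mate 2 sequence)
ReadPair = Tuple[str, str, str]


def longest_run(seq: str) -> int:
    """Length of the longest homopolymer run in seq (0 for an empty string)."""
    best = run = 0
    prev = None
    for base in seq:
        run = run + 1 if base == prev else 1
        best = max(best, run)
        prev = base
    return best


def prefilter(pairs: Sequence[ReadPair], adapters: Sequence[str],
              min_run: int = 20) -> List[ReadPair]:
    """Drop pairs where either mate has a long homopolymer or contains an adapter.

    Substring matching on adapters is a simplification of real adapter trimming.
    """
    kept = []
    for pair in pairs:
        mates = pair[1:]
        if any(longest_run(m) >= min_run for m in mates):
            continue
        if any(a in m for a in adapters for m in mates):
            continue
        kept.append(pair)
    return kept


def substring_matcher(reference: str) -> Callable[[ReadPair], bool]:
    """Hypothetical stand-in for an aligner: both mates must occur in reference."""
    def aligns(pair: ReadPair) -> bool:
        return pair[1] in reference and pair[2] in reference
    return aligns


def subtract_sequentially(
    pairs: Sequence[ReadPair],
    stages: Sequence[Tuple[str, Callable[[ReadPair], bool]]],
) -> Tuple[Dict[str, List[str]], List[ReadPair]]:
    """Pass reads through each stage in order; only unaligned pairs move on."""
    hits: Dict[str, List[str]] = {}
    remaining = list(pairs)
    for name, aligns in stages:
        hits[name] = [p[0] for p in remaining if aligns(p)]
        remaining = [p for p in remaining if not aligns(p)]
    return hits, remaining


if __name__ == "__main__":
    human_ref = "ACGTACGTTAGCCGATAGGCTTAACCGGTT"   # made-up sequence
    iadb_ref = "TTGACCATGGCATCGATCGGAAGCTTCCAA"    # made-up sequence
    pairs = [
        ("h1", "ACGTACGTTA", "GGCTTAACCG"),
        ("v1", "TTGACCATGG", "GGAAGCTTCC"),
        ("poly", "A" * 25, "ACGTACGTTA"),
        ("u1", "CATCATCATC", "GTGTGTGTGT"),
    ]
    kept = prefilter(pairs, adapters=["AGATCGGAAGAGC"])
    stages = [("human", substring_matcher(human_ref)),
              ("IAdb", substring_matcher(iadb_ref))]
    hits, remaining = subtract_sequentially(kept, stages)
    print(hits, [p[0] for p in remaining])
```

In a fuller version, the single "human" stage would be split into the separate rRNA, transcriptome, and genome stages named in the caption, and pairs left in `remaining` would be the candidates for de novo assembly.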
Figure 2. Circos plot detailing HaRNAV sequence recovery.
The red and blue lines represent reads aligning to the minus and plus strands, respectively. The Heterosigma akashiwo RNA virus (HaRNAV) has an 8,587 bp linear ssRNA genome with a single CDS, shown in green on the Circos plot. The read depth of coverage is shown in the centre of the plot. The genome is depicted by alternating black and white arcs, each 500 bp in size.



