Virus Evol. 2017 Nov 14;3(2):vex030.
doi: 10.1093/ve/vex030. eCollection 2017 Jul.

On the effective depth of viral sequence data

Christopher J R Illingworth et al. Virus Evol.

Abstract

Genome sequence data are of great value in describing evolutionary processes in viral populations. However, in such studies, the extent to which the data accurately describe the viral population is a matter of importance. Multiple factors may influence the accuracy of a dataset, including the quantity and nature of the sample collected, and the subsequent steps in viral processing. To investigate this phenomenon, we sequenced replica datasets spanning a range of viruses, in which the point at which samples were split was different in each case, from a dataset in which independent samples were collected from a single patient to another in which all processing steps up to sequencing were applied to a single sample before it was split and each replicate sequenced. We conclude that neither a high read depth nor a high template number in a sample guarantees the precision of a dataset. Measures of consistency calculated from within a single biological sample may also be insufficient; distortion of the composition of a population by the experimental procedure or genuine within-host diversity between samples may each affect the results. Where possible, data from replicate samples should be collected to validate the consistency of short-read sequence data.

Keywords: evolutionary modelling; population genetics; sequence data.

Figures

Figure 1.
Pathways via which a short-read dataset may not accurately represent a within-host population. A viral sample collected from a patient may contain a population of viruses that do not fully represent the genetic diversity of the population within the host. Further, when sequenced, the output data may provide a distorted view of the material contained within the sample.
Figure 2.
(A) Effective read depths given a sample of a finite number of viral particles and a constant read depth in sequencing of 10^4 at each site. Black dots show harmonic mean effective read depths across a set of 100 sets of simulations, while error bars show 95% high and low ranges for this statistic. At low particle numbers, the inferred values are close to the red dashed line, which indicates equality between the effective read depth and the number of viral particles in the sample. At high particle numbers, the inferred values are close to the blue dashed line, which indicates equality between the effective, and absolute, read depths. More than 2 × 10^5 particles were required to get a mean effective depth within 95% of the actual read depth. (B) Effective read depths given a sample of a fixed number of viral particles and a range of depths of sequencing.
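The two limits described in panel (A) can be illustrated with a minimal sketch. It assumes a simplified two-stage binomial model that is not taken from the paper: n particles are drawn from a population with allele frequency q, and N reads are then drawn from those particles. Under that assumption the variance of the observed frequency is q(1-q)[1/n + (1 - 1/n)/N], which equals the variance of a single binomial draw of depth nN/(n + N - 1). The function name `effective_depth` and this closed form are illustrative only; the authors infer effective depth from simulated data rather than from this expression.

```python
def effective_depth(n_particles: int, read_depth: int) -> float:
    """Effective depth under an assumed two-stage binomial sampling model.

    Stage 1: n_particles are sampled from the within-host population.
    Stage 2: read_depth reads are sampled from those particles.
    The result is the depth of a single binomial draw with the same variance.
    This is an illustrative simplification, not the paper's inference method.
    """
    n, N = n_particles, read_depth
    return n * N / (n + N - 1)


if __name__ == "__main__":
    read_depth = 10**4  # read depth assumed for panel (A)
    for n in (10, 10**3, 10**5, 2 * 10**5, 10**7):
        depth = effective_depth(n, read_depth)
        print(f"particles={n:>9}  effective depth={depth:10.1f}  "
              f"fraction of read depth={depth / read_depth:.3f}")
```

In this simplified model the effective depth approaches the particle number when particles are scarce and the read depth when they are abundant. Reaching 95% of the read depth requires about 19(N - 1) particles, roughly 1.9 × 10^5 for N = 10^4, which is broadly in line with the threshold quoted in the caption.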
Figure 3.
(A) Outline sequencing approach. In the standard protocol, cDNA was synthesised from RNA extracts prior to library preparation. Points at which the splitting of replicates occurred for different datasets in the study are marked. In the HSV1 set, independent replicate samples were collected. The HIV extracts were split with one aliquot processed by the standard cDNA synthesis/SureSelect method and the other processed with a SureSelect RNA sequencing approach. Depletion of host genomic DNA prior to cDNA synthesis was performed on aliquots of the HCV01 extracts. In the Noro set, replicates were split before the second round of PCR amplification during the library preparation. In the HCV02 set, the final library was split, then sequenced on two independent MiSeq runs. (B) Mean absolute (black) and effective (white) read depths for replicates within four sets of samples that have been repeatedly sequenced according to different protocols. Labels below each set of depths indicate the dataset; labels above each set indicate the mode of difference between replicates. In our approach, the two methods of processing replicate HIV samples produced inconsistent results, leading to low effective depths.
Figure 4.
Allele frequencies derived for representative sets of viral samples. Data are shown for samples from the HIV, Noro, HCV01, and HCV02 sets, respectively. The frequency shown is that of the minority allele in the first replicate. The red dashed line shows perfect agreement between frequencies.
Figure 5.
Mean absolute (black) and effective (white) read depths for replicates within four sets of samples that have been repeatedly sequenced according to different protocols. Labels below each set of depths indicate the dataset; labels above each set indicate the mode of difference between replicates. Codes refer to viruses collected via nasal wash (NW), nasal turbinate (NT), and bronchoalveolar lavage (BL), and from the soft palate (SP), right lung (RL), left lung (LL), and trachea (TR) of an animal.
