Genet Med. 2019 Apr;21(4):972-981.
doi: 10.1038/s41436-018-0278-z. Epub 2018 Oct 5.

Standard Operating Procedure for Somatic Variant Refinement of Sequencing Data With Paired Tumor and Normal Samples

Free PMC article

Erica K Barnell et al. Genet Med.

Abstract

Purpose: Following automated variant calling, manual review of aligned read sequences is required to identify a high-quality list of somatic variants. Despite the widespread use of manual review in analyzing sequence data, methods to standardize it have not been described, resulting in high inter- and intralab variability.

Methods: This manual review standard operating procedure (SOP) consists of methods to annotate variants with four different calls and 19 tags. The calls indicate a reviewer's confidence in each variant and the tags indicate commonly observed sequencing patterns and artifacts that inform the manual review call. Four individuals were asked to classify variants prior to, and after, reading the SOP and accuracy was assessed by comparing reviewer calls with orthogonal validation sequencing.

Results: After reading the SOP, average accuracy in somatic variant identification increased by 16.7% (p value = 0.0298) and average interreviewer agreement increased by 12.7% (p value < 0.001). Manual review conducted after reading the SOP did not significantly increase reviewer time.

Conclusion: This SOP supports and enhances manual somatic variant detection by improving reviewer accuracy while reducing the interreviewer variability for variant calling and annotation.

Keywords: manual review; somatic variant refinement.
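
The accuracy comparison described in the Methods (reviewer calls checked against orthogonal validation sequencing) can be sketched roughly as follows. This is only an illustration, not the authors' analysis code: the function name, the variant identifiers, and the call label "somatic" are assumptions, and the abstract does not name the four calls.

```python
def reviewer_accuracy(calls, validated):
    """Return the fraction of variants whose call agrees with validation.

    calls:     dict mapping a variant id to the reviewer's call (a string).
               The label "somatic" is an assumed name, not taken from the source.
    validated: dict mapping a variant id to True if orthogonal sequencing
               confirmed the variant as somatic, else False.
    """
    shared = calls.keys() & validated.keys()
    if not shared:
        raise ValueError("no variants in common between calls and validation")
    # A call counts as correct when "called somatic" matches "validated somatic".
    correct = sum((calls[v] == "somatic") == validated[v] for v in shared)
    return correct / len(shared)


# Hypothetical data for four variants.
example_calls = {"v1": "somatic", "v2": "fail", "v3": "somatic", "v4": "ambiguous"}
example_truth = {"v1": True, "v2": False, "v3": False, "v4": True}
example_accuracy = reviewer_accuracy(example_calls, example_truth)
```

Running the same function on calls made before and after reading the SOP would give the per-reviewer accuracies whose difference the Results report.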

Conflict of interest statement

The authors declare no conflicts of interest.

Figures

Fig. 1
Example of the Integrative Genomics Viewer (IGV) interface with associated features relevant to manual review. The IGV interface is divided into three parts. The Genome Ruler details information about the genome assembly being visualized (Reference Genome), the coordinates currently being visualized (Variant Coordinates), and other navigation/display controls (e.g., Popup Text Behavior, Zoom In and Out, etc.). In this example, a portion of human chromosome 1 (build 37) is shown. The central section of IGV displays Data Tracks. In this case, short read DNA alignment data (e.g., BAM files) are shown for normal and tumor samples and are colored by read strand. Mismatches with the reference genome are highlighted by base: adenine (green), cytosine (blue), guanine (orange), and thymine (red). Coverage tracks summarize the total read depth at each base position. The Genome Features section shows the reference sequence itself, the amino acids for the three possible reading frames, and the gene associated with this locus (PTCHD2 in this example). The default gene track available with IGV is shown (RefSeq). Many other data formats and sources can be loaded as data tracks or genome features.
Fig. 2
Example of the Integrative Genomics Viewer Navigator (IGVNav) interface, associated features, and input/output files. a IGVNav is a simple plugin for IGV that provides a separate application window for recording results of manual review. The 1-Base? button can be selected for 1-base input files (default is 0-base). The “S” button will sort the read sequences in the data tracks so that mismatches appear at the top. The navigation bar displays variant information and allows for movement between variants. The Call, Tags, and Notes sections allow manual reviewers to annotate variants (Table 1), which is reflected in the output file. The Save button is used to update the output file. b An IGVNav input file consists of a header line and data for the first five columns (chromosome [chr], start coordinate [start], stop coordinate [stop], reference allele [ref], and variant allele [var]). Each line represents a variant that will be individually visualized using IGV. c During manual review, the input file is updated by clicking on the Save button. This will print the call, tags, and notes associated with individual variants to the original input file.
Fig. 3
Step-by-step instructions for setting up and executing somatic variant refinement via manual review. a Method for setting up Integrative Genomics Viewer (IGV) and Integrative Genomics Viewer Navigator (IGVNav) for manual review. b Method for analyzing each variant during manual review.
Fig. 4
Validation of the manual review standard operating procedure (SOP). a Sequencing data from an acute myeloid leukemia (AML) case was used to test the impact of the SOP on accurately identifying somatic variants. A total of 300 variants that had genome sequencing and orthogonal sequencing were identified for the experiment. Four novice reviewers assessed 200 variants prior to and after reading the SOP to determine improvement in accuracy, reduction in interreviewer variability, change in reviewer time per variant, and appropriate use of tags. b Reviewer accuracy was assessed before and after reading the SOP. The bar plot shows accuracy stratified by reviewer and the box plot shows the reviewers’ cumulative median accuracy. c Box plot showing the median interreviewer agreement before and after reading the SOP. Agreement for each variant was calculated by assessing the correlation between the four reviewer calls using a correlation matrix as described in the Methods. d Box plot showing the median time required to conduct manual review before and after reading the SOP. e Frequency diagram showing the number of reviewers that correctly annotated false positive variants with gold standard tags, parsed by tag. AI Adjacent Indel, D Directional, DN Dinucleotide repeat, E End of reads, HDR High Discrepancy Region, LM Low Mapping, LVF Low Variant Frequency, MM Multiple Mismatches, MN Mononucleotide repeat, MV Multiple Variants, SSE Same Start End, TN Tumor in Normal, TR Tandem Repeat.
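
The IGVNav input file described in the Fig. 2 caption (a header line followed by chr, start, stop, ref, and var for each variant) could be prepared with a short script such as the one below. This is a minimal sketch under several assumptions: the file is tab-separated, the header spellings for the call, tags, and notes columns are guesses, 0-base means BED-style half-open coordinates, and the example variant is hypothetical.

```python
import csv
import io

# The first five names follow the figure caption; the last three are assumed.
HEADER = ["chr", "start", "stop", "ref", "var", "call", "tags", "notes"]


def to_zero_based(chrom, pos, ref, var):
    """Convert a 1-based position into an assumed 0-based, half-open row."""
    start = pos - 1
    stop = start + len(ref)
    return [chrom, start, stop, ref, var]


def write_review_file(variants, handle):
    """Write the header and one row per (chrom, pos, ref, var) tuple."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(HEADER)
    for chrom, pos, ref, var in variants:
        # Call, tags, and notes are left empty for the reviewer to fill in.
        writer.writerow(to_zero_based(chrom, pos, ref, var) + ["", "", ""])


def build_review_text(variants):
    """Return the file contents as a string, convenient for inspection."""
    buffer = io.StringIO()
    write_review_file(variants, buffer)
    return buffer.getvalue()


# Hypothetical single-nucleotide variant given as a 1-based position.
example_text = build_review_text([("1", 11561526, "G", "A")])
```

If the coordinates are kept 1-based instead, the conversion step would be skipped and the 1-Base? button selected in IGVNav, as the caption notes.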
