J Mol Diagn. 2017 May;19(3):417-426.
doi: 10.1016/j.jmoldx.2016.12.001. Epub 2017 Mar 18.

Principles and Recommendations for Standardizing the Use of the Next-Generation Sequencing Variant File in Clinical Settings

Free PMC article

Ira M Lubin et al. J Mol Diagn.

Abstract

A national workgroup convened by the Centers for Disease Control and Prevention identified principles and made recommendations for standardizing the description of sequence data contained within the variant file generated during the course of clinical next-generation sequence analysis for diagnosing human heritable conditions. The specifications for variant files were initially developed to be flexible with regard to content representation to support a variety of research applications. This flexibility allows the same sequence finding to be described in different ways, depending in part on the conventions used. For clinical laboratory testing, this poses a problem because these differences can compromise the capability to compare sequence findings among laboratories to confirm results and to query databases to identify clinically relevant variants. To provide for a more consistent representation of sequence findings described within variant files, the workgroup made several recommendations that considered alignment to a common reference sequence, variant caller settings, use of genomic coordinates, and gene and variant naming conventions. These recommendations were considered with regard to the existing variant file specifications presently used in the clinical setting. Adoption of these recommendations is anticipated to reduce the potential for ambiguity in describing sequence findings and facilitate the sharing of genomic data among clinical laboratories and other entities.

Figures

Figure 1
Next-generation sequencing workflow and associated data files (designated with a dashed-line box). Machine sequencing of the patient sample produces a large number of short reads deposited in a file with associated quality scores (eg, FASTQ). These reads are aligned to a reference assembly or sequence and the results are deposited in an alignment file (eg, BAM). Variants are called and their properties relevant to the sequence (eg, type of variant) are annotated and deposited in the variant file [eg, variant call format (VCF)]. The data in the variant file are further analyzed to determine what findings are clinically relevant and reportable to the physician to inform medical decision making.
Figure 2
Variant representation in three common variant file specifications. A: The two variants are listed from sample NA12878 from the 1000Genomes database. Variant 1 is a deletion of A, and variant 2 is a substitution of A to G. The dbSNP identifiers, chromosome number, nucleotide change, and the predicted effect are shown. The Human Genome Variation Society nomenclature for the change is shown relative to the genomic DNA, the mRNA, and protein RefSeq sequences. B: Contrast the differences among the variant file specifications for each of the two variants. The genome variant call format (gVCF) includes the invariant regions, not typically reported by the VCF. The genome variation format (GVF) includes additional annotation of the effect of the variant on the reference annotated features. SNV, single-nucleotide variants.
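As a rough illustration of how a deletion and a single-nucleotide substitution might appear as VCF data lines, the sketch below parses two records and classifies each by allele length. The positions, alleles, and quality values are hypothetical placeholders, not the NA12878 variants shown in the figure, and the helper names `classify` and `parse_record` are invented for this example. It assumes one ALT allele per line and does not handle multi-allelic records.

```python
# Hypothetical VCF data lines (tab-separated). Coordinates and alleles are
# illustrative only and are NOT taken from the article or from NA12878.
VCF_LINES = [
    "1\t1000\t.\tCA\tC\t50\tPASS\t.",  # deletion of A, anchored on the preceding base
    "1\t2000\t.\tA\tG\t50\tPASS\t.",   # substitution of A to G
]


def classify(ref: str, alt: str) -> str:
    """Classify a single REF/ALT pair by comparing allele lengths."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) > len(alt):
        return "deletion"
    if len(ref) < len(alt):
        return "insertion"
    return "MNV"  # equal length, more than one base


def parse_record(line: str) -> dict:
    """Split the eight fixed VCF columns and attach a variant type."""
    chrom, pos, vid, ref, alt, qual, flt, info = line.split("\t")[:8]
    return {
        "chrom": chrom,
        "pos": int(pos),  # 1-based position of the first REF base
        "ref": ref,
        "alt": alt,
        "type": classify(ref, alt),
    }


records = [parse_record(line) for line in VCF_LINES]
for rec in records:
    print(rec["chrom"], rec["pos"], rec["ref"], rec["alt"], rec["type"])
```

Note that the deletion is written with the preceding reference base included in both REF and ALT, so the recorded position refers to that anchor base rather than to the deleted nucleotide. This is one of the representation choices that can differ among conventions.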
Figure 3
Origin of genomic coordinates. Genomic coordinates of sequence contained within the variant file are made in reference to a genomic build/reference and assigned based on a 5′ to 3′ numbering of the positive strand. Genomic coordinates can change during major updates to the reference assembly, as illustrated in comparing GRCh37 to GRCh38.
Figure 4
Phasing data are required to establish the cis or trans relationship of two nonadjacent variants (bold and underlined). A: The true diplotype describing an individual having one allele with two nonadjacent variants that are not found in the same allele of the homologous chromosome. B: Output from a variant caller. C: Interpretation of the output of the variant caller. In the presence of phasing data, the correct haplotypes are established. In the absence of phasing data, the cis or trans association of the two variants cannot be distinguished.
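The distinction drawn in this caption can be sketched with VCF genotype strings, where `|` separates phased alleles and `/` separates unphased ones. The function below is a minimal illustration, not a method from the article. The name `relationship` is hypothetical, and it assumes diploid, biallelic, heterozygous calls that belong to the same phase set.

```python
def relationship(gt1: str, gt2: str) -> str:
    """Return 'cis', 'trans', or 'undetermined' for two heterozygous calls.

    Assumes diploid, biallelic genotypes ("0" = reference, "1" = alternate)
    and, when phased, that both calls share one phase set.
    """
    # Without phasing for both variants, cis and trans cannot be told apart.
    if "|" not in gt1 or "|" not in gt2:
        return "undetermined"

    hap1 = gt1.split("|")
    hap2 = gt2.split("|")
    for hap in (hap1, hap2):
        if sorted(hap) != ["0", "1"]:
            raise ValueError("this sketch only handles heterozygous 0/1 calls")

    # The index of "1" says which haplotype carries the alternate allele.
    return "cis" if hap1.index("1") == hap2.index("1") else "trans"


print(relationship("1|0", "1|0"))  # both alternates on the first haplotype
print(relationship("0|1", "1|0"))  # alternates on opposite haplotypes
print(relationship("0/1", "0/1"))  # unphased calls
```

In practice, phased genotypes are only comparable when they fall in the same phase set, so a real implementation would also need to check that information before reporting cis or trans.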
