Case Reports
Nat Commun. 2019 Apr 23;10(1):1869.
doi: 10.1038/s41467-019-09637-5.

Sequencing of Human Genomes With Nanopore Technology

Free PMC article
Rory Bowden et al. Nat Commun.

Abstract

Whole-genome sequencing (WGS) is becoming widely used in clinical medicine in diagnostic contexts and to inform treatment choice. Here we evaluate the potential of the Oxford Nanopore Technologies (ONT) MinION long-read sequencer for routine WGS by sequencing the reference sample NA12878 and the genome of an individual with ataxia-pancytopenia syndrome and severe immune dysregulation. We develop and apply a novel reference panel-free analytical method to infer and then exploit phase information, which improves single-nucleotide variant (SNV) calling performance from otherwise modest levels. In the clinical sample, we identify and directly phase two non-synonymous de novo variants in SAMD9L (OMIM #159550), inferring that they lie on the same paternal haplotype. Whilst consensus SNV-calling error rates from ONT data remain substantially higher than those from short-read methods, we demonstrate the substantial benefits of analytical innovation. Ongoing improvements to base-calling and SNV-calling methodology must continue for nanopore sequencing to establish itself as a primary method for clinical WGS.

Conflict of interest statement

R.B., G.L. and D.B. have been members of the MinION access program (MAP) in connection with which they have received free-of-charge flow cells from Oxford Nanopore Technologies. The Wellcome Centre for Human Genetics has been a member of Oxford Nanopore’s PromethION early access program. R.W.D., A.H., G.L., M.A.S. and P.D. are or have been employees of Genomics plc. The remaining authors declare no competing interests.

Figures

Fig. 1
Characteristics of sequencing using ONT for NA12878. a Yield per flow cell, with flow cells organized left to right by run date. The total size of the bar represents the number of reads from each flow cell and is split into the proportion of reads that have been mapped in a single alignment (single alignment), mapped in multiple alignments (multiple alignments), have been base-called, but not been mapped (unmapped) and reads that have not been base-called (not basecalled). b Average read length per flow cell. c Yield (base pairs) per flow cell. d Distribution of per-read substitution, insertion and deletion error rates in the high-quality read set. e Distribution of genomic coverage. f Proportion of sites with a read depth less than 40 binned by G + C content of the surrounding window. Shown are read depth in windows of size 100 bp and 6000 bp
Fig. 2
Investigation of residual errors in NA12878 data. Annotation of called or truth SNPs using genomic features or sequencing context in NA12878. Results across columns give the different sets of SNPs, either pre- or post-phasing, and for post-phasing, optionally all SNPs or those at high local depth (≥60× coverage). Results across rows give SNP classes of true positives, false positives and false negatives. Bars are broken horizontally to reflect multiple possible annotations, while vertical splits represent SNPs with multiple annotations. Annotations are: homopolymer, SNP intersects a homopolymer of length at least 5 bases; Coverage <40×, per-base coverage of less than 40×; 40%
Fig. 3
Phasing clinical sample using ONT. Top of figure shows Illumina unphased genotypes for the mother (M I), father (F I) and proband (P I), as well as phased genotypes for the proband using ONT, at bi-allelic PASS SNVs identified by Illumina sequencing that have a heterozygote genotype in at least one member of the trio. Unphased genotypes are represented with triangles in boxes where blue = alt and orange = ref. Phased proband genotypes (P N) are represented by two rows of vertical bars, where each row is an arbitrarily labelled haplotype, and each bar is split by colour according to the probability of that haplotype having the reference or alternate base. Middle of figure shows two rows with the reads for haplotype 1 or haplotype 2, where for each read, bases are rectangles, and read span is given by a horizontal line. Gaps represent either a deletion or a base that corresponds to neither the reference nor the alternate allele. Bottom shows physical position, with sites of interest in red. Note that some of the phase set containing the sites of interest extends another 150 kb distally but is not shown in the interests of clarity. Based on GRCh37 and NM_152703.3, 92761932 T>C corresponds to c.3353A>G, whilst 92764209 C>T corresponds to c.1076G>A.
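The genomic and coding-sequence alleles in the Fig. 3 caption are base complements of one another, which is what one would expect if the transcript is read from the reverse strand of the reference. The caption does not state the strand, so that is an assumption here. A minimal Python sketch of the allele conversion follows; the helper name `to_transcript_strand` is hypothetical, and the sketch converts alleles only, not genomic positions to cDNA positions.

```python
# Complement table for the four DNA bases.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def to_transcript_strand(ref: str, alt: str) -> tuple[str, str]:
    """Complement a single-base substitution (hypothetical helper).

    Assumes the transcript lies on the reverse strand of the reference,
    so each forward-strand base is replaced by its complement.
    """
    return COMPLEMENT[ref], COMPLEMENT[alt]


# Genomic (GRCh37, forward-strand) alleles as given in the caption.
variants = [(92761932, "T", "C"), (92764209, "C", "T")]

for pos, ref, alt in variants:
    c_ref, c_alt = to_transcript_strand(ref, alt)
    print(f"{pos} {ref}>{alt} (forward strand) -> {c_ref}>{c_alt} (transcript strand)")
```

Under that assumption, T>C becomes A>G and C>T becomes G>A, matching the coding changes quoted in the caption.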
