Nucleic Acids Res., 40 (14), e107

Identification of High-Confidence Somatic Mutations in Whole Genome Sequence of Formalin-Fixed Breast Cancer Specimens


Shawn E Yost et al. Nucleic Acids Res.


The use of archived, formalin-fixed paraffin-embedded (FFPE) tumor samples for massively parallel sequencing has been challenging because of DNA damage and contamination with normal stroma. Here, we perform whole genome sequencing of DNA isolated from two triple-negative breast cancer tumors archived for >11 years as 5 µm FFPE sections, together with matched germline DNA. The tumor samples show differing amounts of FFPE-damaged sequencing reads, revealed as relatively high alignment mismatch rates enriched for C·G > T·A substitutions compared to germline samples. This increase in mismatch rate is observable with as few as one million reads, allowing for an upfront evaluation of sample integrity before whole genome sequencing. By applying innovative quality filters incorporating global nucleotide mismatch rates and local mismatch rates, we present a method to identify high-confidence somatic mutations even in the presence of FFPE-induced DNA damage. The result is a breast cancer mutational profile that is consistent with previous studies and reveals potentially important functional mutations. Our study demonstrates the feasibility of performing genome-wide deep sequencing analysis of FFPE archived tumors of limited sample size, such as residual cancer after treatment or metastatic biopsies.
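The upfront integrity check described above, a read-based mismatch rate broken down by substitution type, can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes gapless alignments supplied as (read, reference) string pairs of equal length, assumes mismatches are normalized by the total number of aligned A/C/G/T bases, and uses hypothetical function names (`substitution_class`, `mismatch_profile`). A real analysis would read alignments from BAM files and account for indels, clipping and base quality.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_class(ref, alt):
    """Collapse a base change into one of six strand-symmetric classes.

    C>T and G>A are both reported as "C·G > T·A", and so on.
    """
    if ref in ("G", "T"):
        # Express the change on the opposite strand so ref is A or C.
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}·{COMPLEMENT[ref]} > {alt}·{COMPLEMENT[alt]}"


def mismatch_profile(aligned_pairs):
    """Summarize mismatches for (read, reference) pairs of equal length.

    Assumption: alignments are gapless; positions where either base is
    not A/C/G/T (for example N) are skipped.
    """
    per_read = Counter()   # reads with 0, 1, 2 or >=3 mismatches
    by_class = Counter()   # mismatches per substitution class
    total_bases = 0
    for read, ref in aligned_pairs:
        if len(read) != len(ref):
            raise ValueError("read and reference must have equal length")
        n_mismatch = 0
        for read_base, ref_base in zip(read.upper(), ref.upper()):
            if read_base not in COMPLEMENT or ref_base not in COMPLEMENT:
                continue
            total_bases += 1
            if read_base != ref_base:
                n_mismatch += 1
                by_class[substitution_class(ref_base, read_base)] += 1
        per_read[str(n_mismatch) if n_mismatch < 3 else ">=3"] += 1
    n_reads = sum(per_read.values())
    if n_reads == 0 or total_bases == 0:
        raise ValueError("no usable aligned bases")
    return {
        "read_fractions": {k: v / n_reads for k, v in per_read.items()},
        "global_rate": sum(by_class.values()) / total_bases,
        "class_rates": {k: v / total_bases for k, v in by_class.items()},
    }


if __name__ == "__main__":
    # Toy input: four 4-base reads aligned to the same reference.
    pairs = [("ACGT", "ACGT"), ("ATGT", "ACGT"),
             ("ACAT", "ACGT"), ("TTTT", "ACGT")]
    profile = mismatch_profile(pairs)
    for name, rate in sorted(profile["class_rates"].items()):
        print(name, round(rate, 4))
```

Running the same function on a random subset of reads from a tumor sample and from its matched germline sample, then comparing the "C·G > T·A" rate between the two, would mirror the comparison described in the abstract.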


Figure 1.
(A) Frequency of mismatches within sequencing reads for germline and FFPE tumor samples. The distribution of reads with 0, 1, 2 or ≥3 mismatches to the reference genome is shown for all sequencing data (All) and for random subsets of 50 M, 5 M and 1 M sequencing reads. (B) Read-based global nucleotide mismatch rate for all base substitutions. (C) Read-based global nucleotide mismatch rate for each substitution type.
Figure 2.
Distribution of substitution types for variants passing Filter 2.1 in germline (G) and FFPE tumor (T) samples, called homozygous alternate (Alt) or heterozygous (Het). Variants are distinguished as identified in a public SNP repository (Known) or novel for both patients in this study (Novel), and as passing in both germline and FFPE tumor samples (Paired) or in only one sample (Unique). The fraction of novel heterozygous C·G > T·A variants differs substantially between the tumor and germline samples of patient 02542.
Figure 3.
Flow diagram describing the number of variants passing each filtering step for both patients 06408 (blue) and 02542 (red).
Figure 4.
Filters 2.5 and 2.6 remove false positive somatic variants due to formalin fixation and other systematic and random errors in the process. Shown is the fraction of substitution types for somatic variants after Filter 2.4, after Filter 2.5 and after Filter 2.6 for the 06408 and 02542 FFPE tumors. After Filter 2.6, the novel somatic C·G > T·A variants called in the 02542 tumor have a profile similar to that observed for novel germline variants in the matched sample (Figure 2).
