Determination of Disease Phenotypes and Pathogenic Variants From Exome Sequence Data in the CAGI 4 Gene Panel Challenge

Hum Mutat. 2017 Sep;38(9):1201-1216. doi: 10.1002/humu.23249. Epub 2017 Jun 27.


The use of gene panel sequencing for diagnostic and prognostic testing is now widespread, but so far there have been few objective tests of methods for interpreting these data. We describe the design and implementation of a gene panel sequencing data analysis pipeline (VarP) and its assessment in a CAGI4 community experiment. The method was applied to clinical gene panel sequencing data from 106 patients, with the goal of determining which of 14 disease classes each patient has and the corresponding causative variant(s). The disease class was correctly identified for 36 cases, including 10 in which the original clinical pipeline did not find causative variants. For a further seven cases, we found strong evidence of a disease other than the one tested for. Many of the potentially causative variants are missense variants with no previous association with disease, and these proved the hardest to classify correctly as pathogenic or not. Post hoc analysis showed that three-dimensional structure data could have helped in up to half of these cases. Over-reliance on HGMD annotation led to a number of incorrect disease assignments. We used a largely ad hoc method to assign a probability of pathogenicity to each variant, and much work remains to be done in this area.
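
The abstract describes assigning each patient one of 14 disease classes from per-variant probabilities of pathogenicity, but not how those probabilities are combined. The following is a minimal, hypothetical sketch of one simple way to do this; it is not VarP's method. It assumes a hypothetical gene-to-class mapping (`gene_to_class`), per-variant probabilities that have already been computed, independence between variants, and an arbitrary reporting cutoff (`min_score`). The gene names and variant descriptions in the test are placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass
from math import prod
from typing import Optional


@dataclass
class Variant:
    gene: str
    hgvs: str            # variant description, e.g. "c.100A>G" (placeholder)
    p_pathogenic: float  # hypothetical per-variant probability of pathogenicity


def assign_disease_class(
    variants: list[Variant],
    gene_to_class: dict[str, str],  # hypothetical mapping: gene -> disease class
    min_score: float = 0.5,         # arbitrary cutoff, an assumption
) -> Optional[tuple[str, float, Variant]]:
    """Return (disease_class, score, top_variant), or None if no class reaches min_score.

    A class's score is the probability that at least one of its variants is
    causative, treating variants as independent: 1 - prod(1 - p).
    """
    # Group variants by the disease class of the gene they fall in.
    by_class: dict[str, list[Variant]] = defaultdict(list)
    for v in variants:
        cls = gene_to_class.get(v.gene)
        if cls is not None:  # skip genes with no class assignment
            by_class[cls].append(v)

    best: Optional[tuple[str, float, Variant]] = None
    for cls, vs in by_class.items():
        score = 1.0 - prod(1.0 - v.p_pathogenic for v in vs)
        top = max(vs, key=lambda v: v.p_pathogenic)  # candidate causative variant
        if best is None or score > best[1]:
            best = (cls, score, top)

    # Report no diagnosis when the best class is not convincing enough.
    if best is None or best[1] < min_score:
        return None
    return best
```

The noisy-OR style score lets several moderately suspicious variants in one disease class outweigh a single variant elsewhere. It ignores mode of inheritance (for example, recessive disorders that need two damaging alleles), which a real pipeline would have to model.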

Keywords: CAGI; VarP analysis pipeline; gene panel sequencing; missense mutations; monogenic disease.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Computational Biology
  • Databases, Genetic
  • Disease / classification*
  • Disease / genetics
  • Genetic Predisposition to Disease
  • Genetic Variation*
  • High-Throughput Nucleotide Sequencing / methods*
  • Humans
  • Models, Molecular
  • Mutation, Missense
  • Phenotype
  • Proteins / chemistry
  • Proteins / genetics
  • Whole Exome Sequencing / methods*


Substances

  • Proteins