Hum Mutat. 2015 Oct;36(10):931-40.
doi: 10.1002/humu.22851. Epub 2015 Aug 31.

PhenomeCentral: A Portal for Phenotypic and Genotypic Matchmaking of Patients With Rare Genetic Diseases

Free PMC article

Orion J Buske et al. Hum Mutat. 2015.

Abstract

The discovery of disease-causing mutations typically requires confirmation of the variant or gene in multiple unrelated individuals, and a large number of rare genetic diseases remain unsolved due to difficulty identifying second families. To enable the secure sharing of case records by clinicians and rare disease scientists, we have developed the PhenomeCentral portal (https://phenomecentral.org). Each record includes a phenotypic description and relevant genetic information (exome or candidate genes). PhenomeCentral identifies similar patients in the database based on semantic similarity between clinical features, automatically prioritized genes from whole-exome data, and candidate genes entered by the users, enabling both hypothesis-free and hypothesis-driven matchmaking. Users can then contact other submitters to follow up on promising matches. PhenomeCentral incorporates data for over 1,000 patients with rare genetic diseases, contributed by the FORGE and Care4Rare Canada projects, the US NIH Undiagnosed Diseases Program, the EU Neuromics and ANDDIrare projects, as well as numerous independent clinicians and scientists. Though the majority of these records have associated exome data, most lack a molecular diagnosis. PhenomeCentral has already been used to identify causative mutations for several patients, and its ability to find matching patients and diagnose these diseases will grow with each additional patient that is entered.
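
As an illustration of what "semantic similarity between clinical features" can mean in practice, the following minimal sketch scores two patients with a Resnik-style information content measure combined by a symmetric best-match average. It is not PhenomeCentral's scoring code: the ontology, the term names, and the term frequencies are all hypothetical stand-ins for the HPO and its annotations, and the choice of this particular measure is an assumption made for the example.

```python
import math
from functools import lru_cache

# Toy is-a hierarchy (child -> parents). Hypothetical terms, not real HPO IDs.
PARENTS = {
    "root": [],
    "abnormal_face": ["root"],
    "abnormal_heart": ["root"],
    "small_jaw": ["abnormal_face"],
    "cleft_palate": ["abnormal_face"],
    "septal_defect": ["abnormal_heart"],
}

# Hypothetical fraction of annotated cases carrying each term or a descendant.
FREQ = {
    "root": 1.0,
    "abnormal_face": 0.4,
    "abnormal_heart": 0.5,
    "small_jaw": 0.1,
    "cleft_palate": 0.2,
    "septal_defect": 0.25,
}


@lru_cache(maxsize=None)
def ancestors(term):
    """Return the term together with all of its ancestors."""
    result = {term}
    for parent in PARENTS[term]:
        result |= ancestors(parent)
    return frozenset(result)


def ic(term):
    """Information content: rarer terms are more informative."""
    return -math.log(FREQ[term])


def term_similarity(a, b):
    """Resnik similarity: IC of the most informative common ancestor."""
    return max(ic(t) for t in ancestors(a) & ancestors(b))


def patient_similarity(p, q):
    """Symmetric best-match average over two lists of phenotype terms."""
    def one_way(src, dst):
        return sum(max(term_similarity(s, d) for d in dst) for s in src) / len(src)
    return 0.5 * (one_way(p, q) + one_way(q, p))


if __name__ == "__main__":
    facial = ["small_jaw", "cleft_palate"]
    print(patient_similarity(facial, ["small_jaw"]))      # shares facial features
    print(patient_similarity(facial, ["septal_defect"]))  # shares only the root
```

In this toy setup, two patients who share only the ontology root score zero, while patients sharing a rare term score higher than those sharing only a common ancestor.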

Keywords: HPO; Matchmaker Exchange; deep phenotyping; patient matchmaking; semantic similarity.

Conflict of interest statement

Disclosure statement: The authors declare no conflict of interest.

Figures

Figure 1
Finding similar patients in PhenomeCentral. Patient data can be contributed to PhenomeCentral through the PhenoTips user interface, including the phenotype quick search box that enables rapid entry of phenotype terms from the HPO (A), or selected records can be automatically deidentified and transferred from any PhenoTips instance. The patient record can contain both present and absent phenotypic features (B) as well as genetic information, including candidate genes and VCF files (C). The patient’s features are then immediately compared with all other patients in PhenomeCentral (D), and the best matches are shown to the user. A detailed breakdown of the phenotypic (E) and genotypic (F) similarity is shown for each match, enabling the user to see the underlying reasons for the match and determine whether or not the match is worth following up. A customizable email template (G) facilitates contacting the (potentially undisclosed) submitter of another patient record.
Figure 2
A: The number of patient records (red solid line) and user accounts (blue dashed line) on PhenomeCentral over time. B: The locations of PhenomeCentral users, estimated from the domain name of institutional email addresses associated with user accounts. The approximate region was identified by querying freegeoip.net with the IP address associated with the domain name of each email address. One point is plotted per domain name, with the color corresponding to the number of users with that domain (the darker the color, the more users with email addresses on that domain).
Figure 3
A: Comparison of the performance of 13 semantic similarity measures at finding similar patients in real PhenomeCentral cases (N = 720; panels 1 and 2) and simulated cases (N = 1,000; panels 3–6). For simulated patients, either all disease-associated phenotype terms were selected (panels 3 and 4), or five terms were randomly selected (panels 5 and 6). Noise (40% additional random phenotype terms) and imprecision (replacing terms with a random ancestor) were then introduced (panels 2, 4, 6). Cases were considered similar if they were sampled from the same OMIM disease in simulated cases, and if they shared a candidate gene, shared a diagnosis, or were submitted as part of the same cohort for real cases. To control for variable cohort size (2–12 for simulated cases, 2–14 for real cases), two cases were randomly selected from each cohort for each of 10 iterations. The performance of each measure is the fraction of cases for which the matching case was ranked within the top one (red/dark) or five (blue/light) most similar cases. The box extends from the first to third data quartile, with whiskers extending to the most extreme data point at most 1.5 times the interquartile range away from the box. Measures were ordered by mean top-five performance across all experiments. B: Comparison of the performance of six methods at prioritizing causal and candidate genes in 112 cases from PhenomeCentral (top; panel 7) and 1,000 simulated cases with noise and imprecision introduced (bottom; panel 8; same parameters as panel 4). As a baseline method, the Exomiser was run on each case individually and genes ordered by their PHIVE score. This was compared with five methods that first identify the most phenotypically similar patients (using the PhenoDigm score), and then score genes separately for each match. The performance of each method was measured in two ways: the fraction of cases where one of the causal or candidate genes was ranked as the top gene for the most similar patient (red/dark) or among the top five genes (blue/light; either the top gene for one of the four most similar patients or the top gene from the Exomiser directly). C: Execution time of each class of similarity measure on the 1,000 simulated cases (N = 499,500 pairwise comparisons). Measures were implemented in Python using memoization and executed on a single thread of a 32-core Intel Xeon 2.70GHz CPU.
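
The top-one and top-five performance figures described in this caption can be sketched as a simple rank-based metric. The snippet below is a hedged illustration, not the authors' evaluation code: the function names, the example ranks, and the set-overlap similarity are hypothetical. It also checks that 1,000 cases yield the 499,500 unordered pairs quoted in panel C.

```python
from math import comb


def rank_of_match(query, match, candidates, similarity):
    """1-based rank of `match` among the other cases, sorted by
    decreasing similarity to `query` (the query itself is excluded)."""
    ordered = sorted(
        (c for c in candidates if c != query),
        key=lambda c: similarity(query, c),
        reverse=True,
    )
    return ordered.index(match) + 1


def top_k_fraction(ranks, k):
    """Fraction of query cases whose true match ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Number of unordered pairs among 1,000 simulated cases: 1000 * 999 / 2.
assert comb(1000, 2) == 499_500

if __name__ == "__main__":
    example_ranks = [1, 3, 7, 2, 12]         # hypothetical ranks of the true match
    print(top_k_fraction(example_ranks, 1))  # top-one performance
    print(top_k_fraction(example_ranks, 5))  # top-five performance
```

With the hypothetical ranks above, one of five matches is ranked first and three of five fall within the top five.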
Figure 4
Two validated PhenomeCentral matches, each showing a breakdown of the phenotypic similarity between the two patients on the left, and the genotypic similarity between the two patients on the right. The phenotypes are grouped via a greedy iterative process. In each iteration, the most informative common ancestor is found and all descendants of that term in each patient are removed and displayed as a group. A: The match between two patients with EFTUD2 mutations, where only one was classified as having mandibulofacial dysostosis with microcephaly (MFDM) at the outset (the other was described as “CHARGE-like”). B: The match between two patients with STIM1 mutations, subsequently diagnosed with York platelet syndrome.
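
The greedy grouping described in this caption could look roughly like the sketch below. It assumes that "most informative" means highest information content and that grouping stops once only the ontology root is shared; both are assumptions, as are the toy ontology, the term names, and the frequencies, none of which come from the HPO or from PhenomeCentral.

```python
import math

# Toy is-a hierarchy (child -> parents). Hypothetical terms, not real HPO IDs.
PARENTS = {
    "root": [],
    "abnormal_face": ["root"],
    "abnormal_heart": ["root"],
    "abnormal_growth": ["root"],
    "small_jaw": ["abnormal_face"],
    "cleft_palate": ["abnormal_face"],
    "septal_defect": ["abnormal_heart"],
    "short_stature": ["abnormal_growth"],
}

# Hypothetical term frequencies used to derive information content.
FREQ = {
    "root": 1.0, "abnormal_face": 0.4, "abnormal_heart": 0.5,
    "abnormal_growth": 0.3, "small_jaw": 0.1, "cleft_palate": 0.2,
    "septal_defect": 0.25, "short_stature": 0.15,
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    result = {term}
    for parent in PARENTS[term]:
        result |= ancestors(parent)
    return result


def ic(term):
    return -math.log(FREQ[term])


def group_phenotypes(patient_a, patient_b):
    """Greedily group two patients' terms under shared ancestors."""
    remaining_a, remaining_b = set(patient_a), set(patient_b)
    groups = []
    while remaining_a and remaining_b:
        # Ancestors reachable from at least one remaining term in each patient.
        common = (set().union(*(ancestors(t) for t in remaining_a))
                  & set().union(*(ancestors(t) for t in remaining_b)))
        # Highest IC wins; the term name breaks ties deterministically.
        best = max(common, key=lambda t: (ic(t), t))
        if ic(best) <= 0:  # only the root is shared: stop grouping
            break
        in_a = {t for t in remaining_a if best in ancestors(t)}
        in_b = {t for t in remaining_b if best in ancestors(t)}
        groups.append((best, sorted(in_a), sorted(in_b)))
        remaining_a -= in_a
        remaining_b -= in_b
    return groups, sorted(remaining_a), sorted(remaining_b)


if __name__ == "__main__":
    result = group_phenotypes(
        ["small_jaw", "septal_defect", "short_stature"],
        ["cleft_palate", "septal_defect"],
    )
    print(result)
```

In this toy example the exact match on septal_defect is grouped first, then small_jaw and cleft_palate are grouped under abnormal_face, and short_stature is left unmatched.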
