A likelihood ratio-based method to predict exact pedigrees for complex families from next-generation sequencing data

Verena Heinrich; Tom Kamphans; Stefan Mundlos; Peter N Robinson; Peter M Krawitz

doi:10.1093/bioinformatics/btw550

Bioinformatics. 2017 Jan 1;33(1):72-78. doi: 10.1093/bioinformatics/btw550. Epub 2016 Aug 26.

Authors

Verena Heinrich¹,², Tom Kamphans³, Stefan Mundlos¹,², Peter N Robinson², Peter M Krawitz²

Affiliations

¹ Max Planck Institute for Molecular Genetics, Ihnestraße 63-73, 14195 Berlin.
² Institute for Medical Genetics and Human Genetics, Charité Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin.
³ Smart Algos, Berlin, Germany.

Abstract

Motivation: Next-generation sequencing technology has considerably changed the way we screen for pathogenic mutations in rare Mendelian disorders. However, identifying the disease-causing mutation among thousands of variants of partly unknown relevance is still challenging, and efficient techniques that reduce the genomic search space play a decisive role. Segregation or linkage analysis is often used to prioritize candidates, but these approaches require correct information about the degree of relationship among the sequenced samples. For quality assurance, an automated check of pedigree structures and sample assignment is therefore highly desirable in order to detect label mix-ups that might otherwise corrupt downstream analysis.

Results: We developed an algorithm based on likelihood ratios that discriminates between different classes of relationship for an arbitrary number of genotyped samples. By identifying the most likely class, we are able to reconstruct entire pedigrees iteratively, even for highly consanguineous families. We tested our approach on exome data from different sequencing studies and achieved high precision for all pedigree predictions. By analyzing the precision for varying degrees of relatedness or inbreeding, we could show that a prediction is robust down to the order of a few hundred loci.
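The abstract does not spell out the likelihood model, so the following is only a minimal sketch of how a likelihood-ratio comparison between relationship classes can work for a single pair of samples. It is not the authors' implementation. It assumes the classic pairwise model based on identity-by-descent (IBD) coefficients (k0, k1, k2), independent biallelic loci with known alternate-allele frequencies strictly between 0 and 1, no genotyping error, and no inbreeding (so it does not cover the consanguineous case the paper addresses). All function and variable names are hypothetical.

```python
import math

# Assumed IBD coefficients (k0, k1, k2) for a few outbred relationship classes.
RELATIONSHIP_CLASSES = {
    "unrelated": (1.0, 0.0, 0.0),
    "parent-offspring": (0.0, 1.0, 0.0),
    "full siblings": (0.25, 0.5, 0.25),
    "second degree": (0.5, 0.5, 0.0),
}


def genotype_prob(g, p):
    """Hardy-Weinberg probability of genotype g (alt-allele count 0, 1 or 2)."""
    q = 1.0 - p
    return (q * q, 2.0 * p * q, p * p)[g]


def ibd1_joint_prob(g1, g2, p):
    """Joint genotype probability when the pair shares exactly one allele IBD."""
    q = 1.0 - p
    table = {
        (0, 0): q ** 3,
        (0, 1): p * q * q, (1, 0): p * q * q,
        (1, 1): p * q,
        (1, 2): p * p * q, (2, 1): p * p * q,
        (2, 2): p ** 3,
    }
    # Opposite homozygotes (0, 2) and (2, 0) cannot share an allele.
    return table.get((g1, g2), 0.0)


def pair_prob(g1, g2, p, k):
    """P(g1, g2 | relationship) as a mixture over the three IBD states."""
    k0, k1, k2 = k
    ibd0 = genotype_prob(g1, p) * genotype_prob(g2, p)
    ibd1 = ibd1_joint_prob(g1, g2, p)
    ibd2 = genotype_prob(g1, p) if g1 == g2 else 0.0
    return k0 * ibd0 + k1 * ibd1 + k2 * ibd2


def log_likelihoods(genos1, genos2, freqs):
    """Log-likelihood of the observed genotype pairs under each class."""
    result = {}
    for name, k in RELATIONSHIP_CLASSES.items():
        total = 0.0
        for g1, g2, p in zip(genos1, genos2, freqs):
            prob = pair_prob(g1, g2, p, k)
            if prob <= 0.0:
                total = float("-inf")  # class is incompatible with the data
                break
            total += math.log(prob)
        result[name] = total
    return result


def most_likely_class(genos1, genos2, freqs):
    """Return the best class and its log-likelihood ratio versus 'unrelated'."""
    lls = log_likelihoods(genos1, genos2, freqs)
    best = max(lls, key=lls.get)
    return best, lls[best] - lls["unrelated"]
```

For example, `most_likely_class(genos_a, genos_b, alt_freqs)` takes two equally long lists of alternate-allele counts and a list of population allele frequencies. With real data, genotyping errors would make a hard zero probability too strict, so an error term would usually be added; that refinement is left out here to keep the sketch short.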

Availability and implementation: A Java standalone application that computes the relationships between multiple samples and an R script that visualizes the pedigree information are available for download, along with a web service, at www.gene-talk.de.

Contact: heinrich@molgen.mpg.de

Supplementary information: Supplementary data are available at Bioinformatics online.

MeSH terms

Algorithms
Exome
Female
Genetic Linkage
Genome, Human*
Genomics / methods
High-Throughput Nucleotide Sequencing / methods
Humans
Male
Mutation*
Pedigree*
Sequence Analysis, DNA / methods*
Software*