Somalier: rapid relatedness estimation for cancer and germline studies using efficient genome sketches
- PMID: 32664994
- PMCID: PMC7362544
- DOI: 10.1186/s13073-020-00761-2
Abstract
Background: When interpreting sequencing data from multiple spatial or longitudinal biopsies, detecting sample mix-ups is essential, yet more difficult than in studies of germline variation. In most genomic studies of tumors, genetic variation is detected through pairwise comparisons of the tumor and a matched normal tissue from the same donor. In many cases, only somatic variants are reported, which hinders the use of existing tools that detect sample swaps solely based on genotypes of inherited variants. To address this problem, we have developed Somalier, a tool that operates directly on alignments and does not require jointly called germline variants. Instead, Somalier extracts a small sketch of informative genetic variation for each sample. Sketches from hundreds of germline or somatic samples can then be compared in under a second, making Somalier a useful tool for measuring relatedness in large cohorts. Somalier produces both text output and an interactive visual report that facilitates the detection and correction of sample swaps using multiple relatedness metrics.
Results: We introduce the tool and demonstrate its utility on a cohort of five glioma samples each with a normal, tumor, and cell-free DNA sample. Applying Somalier to high-coverage sequence data from the 1000 Genomes Project also identifies several related samples. We also demonstrate that it can distinguish pairs of whole-genome and RNA-seq samples from the same individuals in the Genotype-Tissue Expression (GTEx) project.
Conclusions: Somalier is a tool that can rapidly evaluate relatedness from sequencing data. It can be applied to diverse sequencing data types and genome builds and is available under an MIT license at github.com/brentp/somalier.
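To make the idea of comparing per-sample sketches concrete, the following Python sketch computes a simple pairwise relatedness score from two genotype vectors taken at the same set of sites. It is an illustration under stated assumptions, not Somalier's implementation: the genotype encoding (0, 1, 2 alternate-allele counts, -1 for unknown), the function name `relatedness`, and the particular metric (shared heterozygous sites minus twice the number of opposite-homozygote sites, scaled by the smaller heterozygote count) are all assumed here and are not specified in the abstract.

```python
from typing import Sequence


def relatedness(a: Sequence[int], b: Sequence[int]) -> float:
    """Illustrative relatedness score between two genotype sketches.

    ASSUMPTIONS (not from the abstract): genotypes are alternate-allele
    counts 0, 1, or 2 at the same ordered sites, and -1 marks an unknown
    genotype. Sites unknown in either sample are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sketches must cover the same sites")

    shared_hets = 0  # sites where both samples are heterozygous
    ibs0 = 0         # sites with opposite homozygotes (0 vs 2)
    hets_a = 0       # heterozygous sites in sample a
    hets_b = 0       # heterozygous sites in sample b

    for ga, gb in zip(a, b):
        if ga < 0 or gb < 0:
            continue  # skip sites that are unknown in either sample
        if ga == 1:
            hets_a += 1
        if gb == 1:
            hets_b += 1
        if ga == 1 and gb == 1:
            shared_hets += 1
        if abs(ga - gb) == 2:
            ibs0 += 1

    denom = min(hets_a, hets_b)
    if denom == 0:
        return float("nan")  # no heterozygous sites to scale by
    return (shared_hets - 2 * ibs0) / denom


if __name__ == "__main__":
    tumor = [0, 1, 2, 1, 1, 0]
    normal = [0, 1, 2, 1, 1, 0]
    other = [2, 1, 0, 0, 2, 2]
    print(relatedness(tumor, normal))  # identical sketches
    print(relatedness(tumor, other))   # many opposite homozygotes
```

Under this assumed metric, two samples from the same individual score near 1, while pairs with many opposite-homozygote sites score well below 0, which is the kind of signal a sample-swap check would look for.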
Conflict of interest statement
Brent S. Pedersen and Aaron R. Quinlan are co-founders of Base2 Genomics. The remaining authors declare that they have no competing interests.
