In the age of personalized medicine, genetic testing by targeted sequencing has taken on a key role. However, targeted sequencing data sets from different sources often show a considerable lack of harmonization: laboratories follow their own best practices and analyze their own target regions. The question of how best to integrate data from different sites remains unanswered. Using myelodysplastic syndrome (MDS) as an example, we analyzed 11 targeted sequencing data sets collected from six different centers (n = 831). An intersecting target region of 43,076 bp (30 genes) was identified, whereas the original target regions covered up to 499,097 bp (117 genes). When a region of interest in the context of MDS was considered, a target region of 55,969 bp (31 genes) was identified. For each gene, coverage and sequencing data quality were evaluated and a sequencing score was calculated. The analyses revealed substantial differences between data sets as well as between genes. Analysis of the relation between sequencing score and mutation frequency in MDS showed that most genes frequently mutated in MDS could be sequenced without expecting low coverage or quality. Still, no gene appeared consistently unproblematic across all data sets. To allow for comparable results in a multicenter setting analyzing MDS, we propose using a predefined target region of interest and performing centralized data analysis with harmonized criteria.
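The idea of an intersecting target region can be illustrated with a minimal sketch. It is not the pipeline used in the study. It assumes each panel's target region is given as BED-style `(chromosome, start, end)` tuples with 0-based, half-open coordinates, and the function names (`merge`, `intersect`, `common_region`, `total_bp`) and the toy panels are hypothetical.

```python
from functools import reduce


def merge(intervals):
    """Sort intervals and merge those that overlap or touch on the same chromosome."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1] = (chrom, merged[-1][1], max(merged[-1][2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def intersect(a, b):
    """Intersect two merged, sorted interval lists with a two-pointer sweep."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        (ca, sa, ea), (cb, sb, eb) = a[i], b[j]
        if ca != cb:
            # Advance whichever list is behind in chromosome order.
            if ca < cb:
                i += 1
            else:
                j += 1
            continue
        start, end = max(sa, sb), min(ea, eb)
        if start < end:
            out.append((ca, start, end))
        # Advance the interval that ends first.
        if ea <= eb:
            i += 1
        else:
            j += 1
    return out


def common_region(panels):
    """Region covered by every panel (assumes at least one panel)."""
    return reduce(intersect, (merge(p) for p in panels))


def total_bp(intervals):
    """Total number of base pairs in a list of half-open intervals."""
    return sum(end - start for _, start, end in intervals)


# Hypothetical toy panels, not data from the study.
panel_a = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 0, 50)]
panel_b = [("chr1", 250, 400), ("chr2", 10, 20), ("chr2", 40, 60)]
panel_c = [("chr1", 0, 280), ("chr2", 0, 100)]

shared = common_region([panel_a, panel_b, panel_c])
print(shared, total_bp(shared))
```

Because every additional panel can only shrink the result, the shared region is bounded by the smallest panel, which is consistent with the gap between the largest original target region and the intersecting one reported above.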
Copyright © 2021 Association for Molecular Pathology and American Society for Investigative Pathology. Published by Elsevier Inc. All rights reserved.