We present an extension of the EuroForMix model that improves genotype prediction for unknown contributors in Massively Parallel Sequencing (MPS) mixture data by incorporating marker amplification efficiency (MAE) parameters. The proposed two-step procedure first estimates MAE directly from the tested mixture profile. It then treats these estimates as fixed inputs to the extended EuroForMix model, so that the remaining parameters are inferred as in the original framework. This eliminates the need for external calibration data. Fully joint inference is theoretically appealing but computationally impractical for large MPS panels; the two-step strategy offers a tractable alternative. We evaluated four approaches for estimating MAE: two empirical methods, which are susceptible to overfitting, and two Bayesian methods that apply shrinkage to mitigate it. For the Bayesian variants, variational inference was used to obtain accurate posterior MAE estimates, enabling efficient analysis of profiles with thousands of markers. Predictive performance was assessed on five DNA mixture datasets containing either 2- or 3-person mixtures, without conditioning on any of the contributors' profiles. Four datasets were based on MPS data: two SNP datasets (one high-density with ∼10,000 markers and one with ∼100 markers), one microhaplotype (MH) dataset, and one STR dataset. The fifth dataset comprised capillary electrophoresis (CE) STR data. Performance was evaluated using the Brier score, accuracy, and calibration, where calibration reflects how well the predicted probabilities can be trusted. For practical forensic relevance, we also applied a high-certainty prediction threshold (p ≥ 0.95) and calculated the corresponding coverage (the proportion of predictions meeting the threshold) and the accuracy of those predictions. Across all datasets, the MAE-based methods improved genotype prediction compared with the default EuroForMix model, with the largest gains observed for the MPS datasets.
Empirical approaches produced more decisive predictions and higher high-certainty coverage, whereas Bayesian approaches yielded more conservative and better-calibrated probabilities. Improvement for CE-STR data was modest. The binned version of EuroForMix provided a competitive alternative, particularly for high-density SNP panels.
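The evaluation criteria named above (Brier score, accuracy, calibration, and coverage/accuracy at p ≥ 0.95) can be illustrated with a minimal Python sketch. It assumes, hypothetically, that each genotype prediction is a mapping from candidate genotype labels to probabilities summing to 1, and that the true genotype is known. The multi-class Brier score shown (sum of squared differences over candidate genotypes, averaged over predictions) is one common convention, and the binned reliability check is a generic calibration diagnostic; neither is necessarily the exact procedure used in this study.

```python
"""Illustrative evaluation metrics for genotype predictions.

Assumption (hypothetical, not from the paper): each prediction is a dict
{genotype_label: probability} whose values sum to 1.
"""
from typing import Dict, List, Sequence, Tuple

Prediction = Dict[str, float]


def brier_score(preds: Sequence[Prediction], truths: Sequence[str]) -> float:
    """Mean over predictions of sum_k (p_k - y_k)^2 (one multi-class convention)."""
    total = 0.0
    for probs, truth in zip(preds, truths):
        total += sum((p - (1.0 if g == truth else 0.0)) ** 2 for g, p in probs.items())
        if truth not in probs:  # true genotype was given probability 0
            total += 1.0
    return total / len(preds)


def accuracy(preds: Sequence[Prediction], truths: Sequence[str]) -> float:
    """Fraction of predictions whose most probable genotype is the true one."""
    hits = sum(max(p, key=p.get) == t for p, t in zip(preds, truths))
    return hits / len(preds)


def high_certainty(preds: Sequence[Prediction], truths: Sequence[str],
                   threshold: float = 0.95) -> Tuple[float, float]:
    """Coverage: share of predictions whose top probability >= threshold.
    Accuracy: computed only over those high-certainty predictions."""
    confident = [(p, t) for p, t in zip(preds, truths) if max(p.values()) >= threshold]
    coverage = len(confident) / len(preds)
    if not confident:
        return coverage, float("nan")
    hits = sum(max(p, key=p.get) == t for p, t in confident)
    return coverage, hits / len(confident)


def reliability_bins(preds: Sequence[Prediction], truths: Sequence[str],
                     n_bins: int = 10) -> List[Tuple[float, float, int]]:
    """Bin predictions by top probability and return (mean confidence,
    observed accuracy, count) for each non-empty bin; a well-calibrated
    method has mean confidence close to observed accuracy in every bin."""
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for probs, truth in zip(preds, truths):
        top = max(probs, key=probs.get)
        conf = probs[top]
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, top == truth))
    return [(sum(c for c, _ in b) / len(b), sum(h for _, h in b) / len(b), len(b))
            for b in bins if b]


# Hypothetical toy data: three single-locus genotype predictions.
example_preds = [
    {"AA": 0.97, "AB": 0.03},
    {"AA": 0.2, "AB": 0.7, "BB": 0.1},
    {"AB": 0.96, "BB": 0.04},
]
example_truths = ["AA", "AB", "BB"]

if __name__ == "__main__":
    print("Brier:", brier_score(example_preds, example_truths))
    print("Accuracy:", accuracy(example_preds, example_truths))
    print("High-certainty (coverage, accuracy):", high_certainty(example_preds, example_truths))
    print("Reliability bins:", reliability_bins(example_preds, example_truths))
```

In this toy example the third prediction is confidently wrong, so it lowers the high-certainty accuracy to 0.5 and dominates the Brier score. This shows why decisive predictions raise coverage but must also be well calibrated to be trusted.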
Keywords: DNA mixture deconvolution; MPS; Marker amplification efficiency; Variational inference.
Copyright © 2026 The Authors. Published by Elsevier B.V. All rights reserved.