Prediction of Human Induced Pluripotent Stem Cell Cardiac Differentiation Outcome by Multifactorial Process Modeling

Front Bioeng Biotechnol. 2020 Jul 23;8:851. doi: 10.3389/fbioe.2020.00851. eCollection 2020.

Abstract

Human cardiomyocytes (CMs) have potential for use in cell therapy and high-throughput drug screening. Because of the inability to expand adult CMs, their large-scale production from human pluripotent stem cells (hPSCs) has been suggested. Significant improvements have been made in understanding directed differentiation processes of CMs from hPSCs and their suspension culture-based production under chemically defined conditions. However, optimization experiments are costly, time-consuming, and highly variable, leading to challenges in developing reliable and consistent protocols for the generation of large CM numbers at high purity. This study examined the ability of data-driven modeling with machine learning to identify key experimental conditions and predict final CM content using data collected during hPSC-cardiac differentiation in advanced stirred tank bioreactors (STBRs). Through feature selection, we identified process conditions, features, and patterns that are the most influential on and predictive of the CM content at the process endpoint, on differentiation day 10 (dd10). Process-related features were extracted by feature engineering from experimental data collected from 58 differentiation experiments. These features included data continuously collected online by the bioreactor system, such as dissolved oxygen concentration and pH patterns, as well as data determined offline, including the cell density, cell aggregate size, and nutrient concentrations. The selected features were used as inputs to construct models to classify the resulting CM content as being "sufficient" or "insufficient" relative to pre-defined thresholds. The models built using random forests and Gaussian process modeling predicted insufficient CM content for a differentiation process with 90% accuracy and precision on dd7 of the protocol and with 85% accuracy and 82% precision at a substantially earlier stage: dd5.
These models provide insight into potential key factors affecting hPSC cardiac differentiation to aid in selecting future experimental conditions and can predict the final CM content at earlier process timepoints, providing cost and time savings. This study suggests that data-driven models and machine learning techniques can be employed using existing data for understanding and improving production of a specific cell type, which is potentially applicable to other lineages and critical for realization of their therapeutic applications.
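The accuracy and precision figures quoted above treat "insufficient" CM content as the class being predicted. As a minimal sketch of how those two metrics are computed for such a binary classification, the following Python example uses only the standard library. The function names, the threshold value, and all run data are hypothetical and chosen for illustration; they are not taken from the study, whose thresholds and per-run results are not given in this abstract.

```python
import math
from typing import Sequence, Tuple


def label_cm_content(cm_percent: float, threshold: float) -> str:
    """Label an endpoint CM content as 'sufficient' or 'insufficient'.

    The threshold is supplied by the caller; the value used below is a
    placeholder, not the study's pre-defined threshold.
    """
    return "sufficient" if cm_percent >= threshold else "insufficient"


def accuracy_and_precision(
    y_true: Sequence[str],
    y_pred: Sequence[str],
    positive: str = "insufficient",
) -> Tuple[float, float]:
    """Return (accuracy, precision) with `positive` as the predicted class of interest."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    correct = sum(1 for t, p in pairs if t == p)
    true_pos = sum(1 for t, p in pairs if p == positive and t == positive)
    false_pos = sum(1 for t, p in pairs if p == positive and t != positive)
    accuracy = correct / len(pairs)
    # Precision is undefined when the positive class is never predicted.
    predicted_pos = true_pos + false_pos
    precision = true_pos / predicted_pos if predicted_pos else math.nan
    return accuracy, precision


# Hypothetical endpoint CM contents (%) for ten runs, and a placeholder threshold.
HYPOTHETICAL_THRESHOLD = 80.0
cm_content_dd10 = [92.0, 45.0, 60.0, 85.0, 30.0, 71.0, 88.0, 55.0, 95.0, 20.0]
y_true = [label_cm_content(c, HYPOTHETICAL_THRESHOLD) for c in cm_content_dd10]

# Hypothetical early-timepoint model predictions for the same ten runs.
y_pred = [
    "sufficient", "insufficient", "insufficient", "insufficient", "insufficient",
    "sufficient", "sufficient", "insufficient", "sufficient", "insufficient",
]

accuracy, precision = accuracy_and_precision(y_true, y_pred)
print(f"accuracy={accuracy:.2f}, precision={precision:.2f}")
```

In this reading, high precision for the "insufficient" class means that a run flagged as failing at an early timepoint rarely turns out to be a good one, which is the property that matters if flagged runs are to be stopped early.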

Keywords: bioreactor; cardiomyocytes; cell production; classification; directed differentiation; feature selection; human induced pluripotent stem cells; machine learning.