Application of Virtual Sample Generation and Screening in Process Parameter Optimization of Botanical Medicinal Materials

Curr Top Med Chem. 2023;23(8):618-626. doi: 10.2174/1568026623666230117121531.


Background: The small sample problem is widespread in the chemical industry, chemistry, biology, medicine, and the food industry, and it has long been an obstacle in process modeling and system optimization. This study focuses on the small sample problem in modeling. The process parameters for the ultrasonic extraction of botanical medicinal materials can be obtained by optimizing an extraction rate model. However, the difficulty of acquiring data leads to a small sample size, which ultimately reduces the prediction accuracy of the model.

Methods: A virtual sample generation method based on full factorial design (FFD) is proposed to solve the problem of a small sample size. Experiments are first conducted according to the Box-Behnken Design (BBD) to obtain a small set of samples, and the response surface function is established accordingly. Then, virtual sample inputs are obtained by the FFD, and the corresponding virtual sample outputs are calculated by the response surface function. Furthermore, a screening method for virtual samples is proposed based on an extreme learning machine (ELM): the connection weights of the ELM are used to further optimize and screen the generated virtual samples.
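The generation step described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the three coded factors, the five FFD levels, the full second-order response surface form, and the function `placeholder_yield` (a made-up stand-in for measured extraction rates) are all assumptions chosen for illustration, and the ELM-based screening step is not shown because the abstract does not specify it.

```python
from itertools import combinations, product

def bbd_points(k=3, n_center=3):
    """Box-Behnken design in coded units (-1, 0, +1) for k factors."""
    pts = []
    for i, j in combinations(range(k), 2):
        for a, b in product((-1.0, 1.0), repeat=2):
            p = [0.0] * k
            p[i], p[j] = a, b
            pts.append(tuple(p))
    pts.extend([(0.0,) * k] * n_center)  # replicated center runs
    return pts

def quad_features(x):
    """Second-order model terms: 1, x_i, x_i*x_j (i<j), x_i^2."""
    k = len(x)
    feats = [1.0] + list(x)
    feats += [x[i] * x[j] for i, j in combinations(range(k), 2)]
    feats += [xi * xi for xi in x]
    return feats

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for t in range(c, n + 1):
                M[r][t] -= f * M[c][t]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][t] * x[t] for t in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def fit_response_surface(X, y):
    """Least-squares fit of the quadratic model via the normal equations."""
    F = [quad_features(x) for x in X]
    m = len(F[0])
    AtA = [[sum(f[i] * f[j] for f in F) for j in range(m)] for i in range(m)]
    Aty = [sum(f[i] * yi for f, yi in zip(F, y)) for i in range(m)]
    return solve(AtA, Aty)

def predict(beta, x):
    """Evaluate the fitted response surface at coded point x."""
    return sum(b * f for b, f in zip(beta, quad_features(x)))

def ffd_points(k=3, levels=5):
    """Full factorial grid over [-1, 1] with the given number of levels."""
    grid = [-1.0 + 2.0 * i / (levels - 1) for i in range(levels)]
    return list(product(grid, repeat=k))

def placeholder_yield(x):
    """ASSUMPTION: synthetic stand-in for measured extraction rates."""
    return (80.0 + 3.0 * x[0] - 2.0 * x[1] + 1.5 * x[2] + 1.0 * x[0] * x[1]
            - 4.0 * x[0] ** 2 - 3.0 * x[1] ** 2 - 2.0 * x[2] ** 2)

# 1) Small experimental set from the BBD (responses are placeholders here).
X_real = bbd_points()
y_real = [placeholder_yield(x) for x in X_real]

# 2) Response surface fitted to the small experimental set.
beta = fit_response_surface(X_real, y_real)

# 3) Virtual inputs from the FFD, virtual outputs from the response surface.
X_virtual = ffd_points(levels=5)
y_virtual = [predict(beta, x) for x in X_virtual]

# 4) Semi-synthetic training set: experimental plus virtual samples.
X_semi = X_real + X_virtual
y_semi = y_real + y_virtual
print(len(X_real), len(X_virtual), len(X_semi))
```

In this sketch, 15 BBD runs are expanded with 125 virtual samples. In practice the virtual outputs inherit any error in the fitted response surface, which is presumably why a screening step such as the ELM-based one follows generation.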

Results: The results show that virtual sample data can effectively expand the sample size. The precision of the model trained on semi-synthetic samples (small-size experimental samples and virtual samples) is higher than that of the model trained only on small-size experimental samples.

Conclusion: The virtual sample generation and screening methods proposed in this paper can effectively solve the problem of modeling with small samples. Reliable process parameters can be obtained by optimizing the model trained on the semi-synthetic samples.

Keywords: Botanical medicinal materials; Extreme learning machine; Optimization of process parameters; Response surface methodology (RSM); Support vector regression (SVR); Virtual samples.