A Genome-Wide Association Study and Machine-Learning Algorithm Analysis on the Prediction of Facial Phenotypes by Genotypes in Korean Women

Clin Cosmet Investig Dermatol. 2022 Mar 11:15:433-445. doi: 10.2147/CCID.S339547. eCollection 2022.

Abstract

Purpose: Changes in facial appearance are affected by various intrinsic and extrinsic factors, which vary from person to person. Therefore, each person needs to determine their skin condition accurately to care for their skin accordingly. Recently, genetic identification by skin-related phenotypes has become possible using genome-wide association studies (GWAS) and machine-learning algorithms. However, because most GWAS have focused on populations with American or European skin pigmentation, large-scale GWAS are needed for Asian populations. This study aimed to evaluate the correlation of facial phenotypes with candidate single-nucleotide polymorphisms (SNPs) to predict phenotype from genotype using machine learning.

Materials and methods: A total of 749 Korean women aged 30-50 years were enrolled in this study and evaluated for five facial phenotypes (melanin, gloss, hydration, wrinkle, and elasticity). To find highly related SNPs with each phenotype, GWAS analysis was used. In addition, phenotype prediction was performed using three machine-learning algorithms (linear, ridge, and linear support vector regressions) using five-fold cross-validation.

Results: Using GWAS analysis, we found 46 novel highly associated SNPs (p < 1×10-05): 3, 20, 12, 6, and 5 SNPs for melanin, gloss, hydration, wrinkle, and elasticity, respectively. On comparing the performance of each model based on phenotypes using five-fold cross-validation, the ridge regression model showed the highest accuracy (r2 = 0.6422-0.7266) in all skin traits. Therefore, the optimal solution for personal skin diagnosis using GWAS was with the ridge regression model.

Conclusion: The proposed facial phenotype prediction model in this study provided the optimal solution for accurately predicting the skin condition of an individual by identifying genotype information of target characteristics and machine-learning methods. This model has potential utility for the development of customized cosmetics.

Keywords: customized cosmetics; genome-wide association study; machine-learning algorithm; microarray; single-nucleotide polymorphism.

Grants and funding

This study was supported by Kolmar Korea Co. and Eone Diagnomics Genome Center Co. equally.