Explanatory Model of Dry Eye Disease Using Health and Nutrition Examinations: Machine Learning and Network-Based Factor Analysis From a National Survey

JMIR Med Inform. 2020 Feb 20;8(2):e16153. doi: 10.2196/16153.


Background: Dry eye disease (DED) is a complex disease of the ocular surface, and its associated factors are important for understanding and effectively treating DED.

Objective: This study aimed to provide an integrative and personalized model of DED by making an explanatory model of DED using as many factors as possible from the Korea National Health and Nutrition Examination Survey (KNHANES) data.

Methods: Using KNHANES data for 2012 (4391 sample cases), a point-based scoring system was created for ranking factors associated with DED and assessing patient-specific DED risk. First, decision trees and lasso were used to classify continuous factors and to select important factors, respectively. Next, a survey-weighted multiple logistic regression was trained using these factors, and points were assigned using the regression coefficients. Finally, network graphs of partial correlations between factors were utilized to study the interrelatedness of DED-associated factors.

Results: The point-based model achieved an area under the curve of 0.70 (95% CI 0.61-0.78), and 13 of 78 factors considered were chosen. Important factors included sex (+9 points for women), corneal refractive surgery (+9 points), current depression (+7 points), cataract surgery (+7 points), stress (+6 points), age (54-66 years; +4 points), rhinitis (+4 points), lipid-lowering medication (+4 points), and intake of omega-3 (0.43%-0.65% kcal/day; -4 points). Among these, the age group 54 to 66 years had high centrality in the network, whereas omega-3 had low centrality.

Conclusions: Integrative understanding of DED was possible using the machine learning-based model and network-based factor analysis. This method for finding important risk factors and identifying patient-specific risk could be applied to other multifactorial diseases.

Keywords: dry eye disease; epidemiology; machine learning; patient-specific modeling; systems analysis.