Construction and Evaluation of a Metabolic Syndrome Prediction Model Based on Classification Algorithms
Proceedings of 2024 International Conference on Machine Learning and Intelligent Computing, PMLR 245:443-455, 2024.
Abstract
This study explores the risk factors for metabolic syndrome (MS), constructs risk prediction models based on multiple classification algorithms, compares their predictive performance, and interprets the models to derive specific MS prediction rules, providing a scientific basis for MS prediction. A retrospective analysis was conducted on clinical data from 2,193 patients. Based on whether they developed MS, the patients were divided into a healthy group and an MS group. Statistical correlation tests were used to identify the risk factors associated with MS. Six classification algorithms (decision trees, logistic regression, random forests, naive Bayes, K-nearest neighbors, and support vector machines) were used to build MS prediction models. Model performance was evaluated in R using receiver operating characteristic (ROC) curves. Among the 2,193 patients, the incidence rate of MS was 34.66%. Significant differences (P < 0.05) were observed between the healthy group and the MS group in age, marital status, income, ethnicity, waist circumference, body mass index (BMI), uric acid, blood glucose, high-density lipoprotein, and triglyceride levels. Blood glucose, waist circumference, BMI, and triglycerides showed a significant linear correlation with MS. In the ROC analysis, the random forest algorithm achieved an area under the curve (AUC) of 0.94 (95% CI: 0.914-0.957), logistic regression achieved an AUC of 0.90 (95% CI: 0.867-0.925), support vector machines achieved an AUC of 0.89 (95% CI: 0.859-0.920), decision trees achieved an AUC of 0.87 (95% CI: 0.831-0.905), K-nearest neighbors achieved an AUC of 0.81 (95% CI: 0.770-0.850), and naive Bayes achieved an AUC of 0.74 (95% CI: 0.694-0.785).
The results indicate that age, marital status, waist circumference, BMI, blood glucose levels, and triglyceride levels are risk factors for developing MS. Furthermore, the random forest and logistic regression models performed best in predicting MS, with AUCs of 0.94 and 0.90, respectively.
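To make the AUC comparison concrete, the following is a minimal Python sketch of how an area under the ROC curve can be computed from predicted scores. It is not the study's code: the authors report using R, and the function name `roc_auc`, the labels, and the scores here are hypothetical toy values chosen only for illustration. The sketch uses the rank interpretation of AUC, namely the probability that a randomly chosen MS case receives a higher score than a randomly chosen healthy case, with ties counted as one half.

```python
def roc_auc(labels, scores):
    """Return the ROC AUC for binary labels (1 = MS, 0 = healthy).

    Computed as the fraction of (positive, negative) pairs in which the
    positive case has the higher score; tied pairs count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, NOT taken from the study.
labels = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.10, 0.35, 0.40, 0.45, 0.60, 0.80, 0.20, 0.90]

print(roc_auc(labels, scores))  # 15 of 16 pairs ranked correctly -> 0.9375
```

On this reading, an AUC of 0.5 corresponds to ranking cases no better than chance and 1.0 to perfect separation, which is why the reported values can be compared directly across the six classifiers. The pairwise loop is quadratic in the sample size; it is kept here for clarity rather than efficiency.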