DECISION TREE-BASED CLASSIFICATION APPROACH TO DISCOVER FACTORS AFFECTING VITAMIN D LEVEL WITH MACHINE LEARNING
Abstract
Purpose: Vitamin D level is an important biomarker for determining risk factors for various diseases. Vitamin D is essential for human health, and its deficiency is associated with serious health problems. Detecting vitamin D deficiency, which can be easily prevented and treated, is therefore of great importance. The possible relationship between vitamin D deficiency and musculoskeletal pain, osteoporosis, diabetes mellitus, and hypertension is frequently discussed in the literature. The increased availability of health data and lower data processing costs facilitate the extraction of valuable vitamin D-related patterns from large datasets. Decision trees, for example, are commonly used for explainability and explainable AI (XAI) purposes. This research aims to analyze the factors that determine vitamin D level and the decision rules related to them.
Materials and Methods: A descriptive framework based on one machine learning technique, the decision tree, was followed. The data used to create the decision rules were obtained from volunteers aged 18-85 who presented to the Infectious Diseases and Family Medicine Polyclinics of Izmir Katip Çelebi University Atatürk Training and Research Hospital and agreed to participate in the study between 01.03.2017 and 01.09.2017. The sample size was calculated as 172 with 80% power and a 5% error margin using NCSS and PASS software. The following parameters were examined: AST, ALT, ALP, BUN, creatine, total protein, albumin, 25(OH)D, PTH, TSH, Ca, Mg, phosphate, uric acid and VDR gene polymorphism. An investigator-designed socio-demographic questionnaire was administered in in-person interviews with the 172 participants. The validity of the models was assessed according to the accuracy score of each model.
Results: Age, gender and laboratory test values were observed to be strong predictors of vitamin D level. The two CART (Classification and Regression Trees) models achieved predictive accuracies of 90.47% and 95%, respectively. The three most important factors affecting vitamin D level were uric acid, age and creatine in the first model, and TSH, ALP and smoking (yes) in the second model.
Conclusion: The collected features give a comprehensive list of variables that influence vitamin D in the dataset under consideration. Important findings of the study include not only the identification of these variables but also the effective classification procedures. Final decision tree models were constructed using two distinct feature sets. The first model was created with the 12 features (Age, ALP, TSH, URICACID, PHOSPHATE, AST, Cigarette Consumption, CA, CREATIN, TOTALPROTEIN, MG, BUN) that had over 4% importance, resulting in a classification accuracy of 92.7%. The second model was built using all features in the dataset and achieved a classification accuracy of 88.37%. In contrast to previous research, the Age variable is the most influential factor within the scope of this dataset, which includes demographic information on patients and their existing disorders.
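Illustrative sketch: the abstract does not state which splitting criterion or software was used to build the CART models. The snippet below assumes the Gini impurity criterion commonly associated with CART and shows how a single decision rule (a threshold on one numeric feature) could be chosen. The function names `gini` and `best_split`, the ages and the class labels are all made up for illustration and are not taken from the study data.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(values, labels):
    """Find the rule `value <= threshold` with the lowest weighted Gini impurity.

    Returns (threshold, weighted_impurity); threshold is None when no
    candidate split improves on the unsplit node.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # identical values cannot be separated by a threshold
        threshold = (lo + hi) / 2  # midpoint between neighbouring values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical toy data: ages and a made-up vitamin D status per person.
ages = [22, 35, 41, 58, 63, 70]
status = ["sufficient", "sufficient", "sufficient",
          "deficient", "deficient", "deficient"]

threshold, impurity = best_split(ages, status)
print(f"rule: age <= {threshold} (weighted Gini = {impurity})")
```

A full tree would apply this search to every feature at each node and recurse on the two resulting subsets. Feature importance scores such as the 4% cut-off mentioned above are typically derived from the total impurity reduction each feature contributes across those splits.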