Exploring the weightage of correlates of Diabetes prediction using Machine learning
DOI:
https://doi.org/10.58445/rars.3204
Keywords:
Machine Learning, Diabetes prediction
Abstract
The prevalence of Type 1 and Type 2 diabetes is increasing around the world. This alarming trend highlights the need for prediction tools that can help identify risk factors associated with these chronic diseases, so that interventions can begin at a much earlier stage of their development. This study investigates two distinct datasets drawn from Kaggle, one focusing on clinical factors and the other on lifestyle factors. We first built several machine learning models to predict the likelihood of diabetes from these factors: logistic regression with an L1 penalty, logistic regression with an L2 penalty, a random forest, and a dummy classifier that serves as a baseline for accuracy. The models achieve high predictive accuracy: 95.297% for logistic regression with the L2 penalty, 96% for logistic regression with the L1 penalty, and 97% for the random forest. The key contribution of this study, however, is the interpretation of these models to determine the most important drivers of their predictions. We find that well-known factors such as HbA1c level, hypertension, and heart disease are strongly associated with diabetes, while factors such as mental health, although they rank below BMI and HbA1c level, still have moderate predictive power.
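The comparison described in the abstract (a penalized logistic regression judged against a dummy baseline, then inspected for which factors drive its predictions) can be illustrated with a minimal, dependency-free sketch. This is not the study's code: the data are synthetic, the two features (`strong`, `weak`) are hypothetical stand-ins for a strong and a weaker predictor, and the hyperparameters (`lam`, `lr`, `epochs`) are illustrative assumptions. In practice a library such as scikit-learn would normally be used instead of hand-written gradient descent.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def make_synthetic(n, seed=0):
    """Synthetic stand-in data (an assumption, not the Kaggle datasets)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        # Two hypothetical standardized features: one strong, one weak predictor.
        strong, weak = rng.gauss(0, 1), rng.gauss(0, 1)
        p = sigmoid(3.0 * strong + 0.5 * weak - 1.0)
        X.append([strong, weak])
        y.append(1 if rng.random() < p else 0)
    return X, y

def fit_logreg_l2(X, y, lam=0.01, lr=0.5, epochs=300):
    """Logistic regression with an L2 penalty, fitted by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # The penalty (lam / 2) * ||w||^2 adds lam * w to the gradient;
        # the intercept b is left unpenalized.
        w = [wj - lr * (gj / n + lam * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

X, y = make_synthetic(400)
X_train, y_train = X[:300], y[:300]
X_test, y_test = X[300:], y[300:]

# Dummy baseline: always predict the majority class seen in training.
majority = 1 if 2 * sum(y_train) >= len(y_train) else 0
baseline_acc = accuracy(y_test, [majority] * len(y_test))

weights, bias = fit_logreg_l2(X_train, y_train)
preds = [
    1 if sigmoid(sum(wj * xj for wj, xj in zip(weights, xi)) + bias) >= 0.5 else 0
    for xi in X_test
]
model_acc = accuracy(y_test, preds)

print(f"baseline accuracy: {baseline_acc:.3f}")
print(f"L2 logistic regression accuracy: {model_acc:.3f}")
# On standardized features, a larger |weight| suggests a stronger driver.
print("weights (strong, weak):", [round(wj, 3) for wj in weights])
```

Because both synthetic features are on the same scale, the magnitude of each fitted weight gives a rough ranking of how strongly that feature drives the prediction, which is the kind of interpretation the abstract refers to. Comparing against the majority-class baseline shows how much of the reported accuracy reflects real signal rather than class imbalance.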
License
Copyright (c) 2025 Aarush Raheja

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.