Preprint / Version 1

Comparative analysis on efficacy of machine learning models in predicting type 2 diabetes


  • Tanvi Das, Whitney High



Keywords: machine learning models, bioinformatics, artificial neural network, support vector machines, linear regression


Although the indicators and symptoms of diabetes have been widely researched, understanding them remains essential. According to the World Economic Forum's 2019 report, approximately 463 million people worldwide aged 20 to 79 were affected by diabetes, and this number is projected to rise to 700 million by 2045. In the Americas, around 11.3% of the population is diagnosed with diabetes, and the Middle East has the next highest percentage. The goal of this paper is to investigate and model the predictors of diabetes using mathematical and machine learning methods, both to better understand the disease and to propose a process that hospitals and practitioners can use to predict diabetes and intervene before its onset. The algorithms, models, and procedures are reviewed in detail in the body of the paper. The purpose of this study was to determine whether there is a significant difference in diabetes-prediction performance among a support vector machine (SVM), a linear classifier, a decision tree classifier, and an artificial neural network. The decision tree achieved the highest classification accuracy (76.66%), slightly ahead of the SVM (75.33%) and well ahead of the linear classifier (67.67%). These results indicate that decision trees and SVMs predict diabetes in a patient population more accurately than linear classifiers.
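The model comparison summarized above can be sketched as follows. This is a minimal sketch, not the author's pipeline, and it rests on several assumptions: it uses scikit-learn; synthetic data from `make_classification` stands in for the diabetes dataset; `SGDClassifier` (hinge loss) plays the role of the "linear classifier"; and the hyperparameters (`max_depth=5`, `hidden_layer_sizes=(16,)`) and the helper `compare_models` are hypothetical illustrations. The accuracies it prints will not reproduce the paper's figures.

```python
# Sketch: compare four classifier families by accuracy on one shared held-out split.
# ASSUMPTION: synthetic data replaces the real diabetes dataset; model choices and
# hyperparameters are illustrative, not the settings used in the paper.
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def compare_models(X, y, seed=0):
    """Hypothetical helper: fit each model on a training split, return test accuracy per model."""
    # Stratify so both classes appear in the same proportion in train and test sets.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )
    models = {
        # Margin- and gradient-based models are scaled; tree splits are unaffected by scaling.
        "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
        "linear": make_pipeline(StandardScaler(), SGDClassifier(random_state=seed)),
        "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=seed),
        "neural_network": make_pipeline(
            StandardScaler(),
            MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=seed),
        ),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        # Accuracy = fraction of test patients whose label is predicted correctly.
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


if __name__ == "__main__":
    # ASSUMPTION: synthetic binary labels (1 = diabetes, 0 = no diabetes) for illustration only.
    X, y = make_classification(n_samples=1000, n_features=20, n_informative=6, random_state=0)
    for name, acc in sorted(compare_models(X, y).items(), key=lambda kv: -kv[1]):
        print(f"{name:15s} {acc:.2%}")
```

Scoring every model on the same stratified test split keeps the accuracies directly comparable. Because the gap between the decision tree and the SVM is only 1.33 percentage points, repeating the comparison with cross-validation (for example `sklearn.model_selection.cross_val_score`) would show whether that ranking is stable.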


