Preprint / Version 1

Analyzing the Probability of Diabetes Through Machine Learning

Authors

  • Anish Subramanian, Adlai E. Stevenson High School

DOI:

https://doi.org/10.58445/rars.3621

Keywords:

Diabetes, Machine Learning, Disease prediction

Abstract

Many diseases affect people around the world every day, and diabetes is among the most widespread, affecting hundreds of millions of individuals. Machine learning may be able to help diagnose diabetes from patients' medical records. Using three models, Logistic Regression, Random Forest, and a Neural Network, we found that it is possible to predict the probability that a patient has diabetes. The neural network achieved the highest AUROC (Area Under the Receiver Operating Characteristic curve), AUPRC (Area Under the Precision-Recall Curve), and accuracy, at 0.8330, 0.4316, and 0.8597, respectively, making it the best-performing of the three models. These results suggest that machine learning models, particularly neural networks, may be useful in diabetes diagnosis.
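For readers unfamiliar with the three metrics named in the abstract, the following is a minimal sketch of how each can be computed from true labels and predicted probabilities. It is not the paper's implementation: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, and AUPRC is estimated here by average precision, which is one common estimator among several.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_prob: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of cases whose thresholded prediction matches the label.

    The 0.5 threshold is an assumption, not a value taken from the paper.
    """
    preds = [1 if p >= threshold else 0 for p in y_prob]
    return sum(int(p == t) for p, t in zip(preds, y_true)) / len(y_true)


def auroc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Probability that a random positive case is scored above a random
    negative case, counting ties as half (pairwise formulation)."""
    pos = [p for p, t in zip(y_prob, y_true) if t == 1]
    neg = [p for p, t in zip(y_prob, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auprc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Average precision: the mean of the precision measured at the rank
    of each positive case. Assumes no tied scores."""
    ranked = sorted(zip(y_prob, y_true), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Hypothetical toy data: 1 = diabetes, 0 = no diabetes.
toy_labels = [0, 0, 1, 0, 1, 0, 0, 1]
toy_scores = [0.10, 0.35, 0.80, 0.20, 0.40, 0.60, 0.05, 0.90]

if __name__ == "__main__":
    print("accuracy:", accuracy(toy_labels, toy_scores))
    print("AUROC:   ", auroc(toy_labels, toy_scores))
    print("AUPRC:   ", auprc(toy_labels, toy_scores))
```

Accuracy depends on the chosen threshold, whereas AUROC and AUPRC depend only on how the model ranks the cases, which is why the three numbers are reported separately.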



References

Bloomgarden, Zachary T. “What Will We See in Diabetes in the next 10 Years?” Journal of Diabetes, vol. 16, no. 6, June 2024, p. e13594. Crossref, https://doi.org/10.1111/1753-0407.13594.

Qin, Yifan, et al. “Machine Learning Models for Data-Driven Prediction of Diabetes by Lifestyle Type.” International Journal of Environmental Research and Public Health, vol. 19, no. 22, Nov. 2022, p. 15027. Crossref, https://doi.org/10.3390/ijerph192215027.

Habehh, Hafsa, and Suril Gohel. “Machine Learning in Healthcare.” Current Genomics, vol. 22, no. 4, Dec. 2021, pp. 291–300. Crossref, https://doi.org/10.2174/1389202922666210705124359.

Khare, Akshay Dattatray. “Diabetes Dataset.” Kaggle, 2022, www.kaggle.com/datasets/akshaydattatraykhare/diabetes-dataset.

Serrano, Luis. Grokking Machine Learning. Manning Publications Co. LLC, 2021.

Schonlau, Matthias, and Rosie Yuyan Zou. “The Random Forest Algorithm for Statistical Learning.” The Stata Journal: Promoting Communications on Statistics and Stata, vol. 20, no. 1, Mar. 2020, pp. 3–29. Crossref, https://doi.org/10.1177/1536867X20909688.

Hancock, John T., et al. “Evaluating Classifier Performance with Highly Imbalanced Big Data.” Journal of Big Data, vol. 10, no. 1, Apr. 2023, p. 42. Crossref, https://doi.org/10.1186/s40537-023-00724-5.

Posted

2026-01-31