Improving Diabetes Prediction Accuracy Using Ensemble Machine Learning Models
DOI:
https://doi.org/10.58445/rars.3258
Keywords:
HbA1c, diabetes prediction, machine learning, Random Forest, Voting Classifier, Kaggle, classification, glycemic control, ensemble learning, predictive modeling
Abstract
This study investigates the prediction of HbA1c level, a principal biomarker of diabetes control, from patient demographic and health data in a publicly accessible dataset [1]. I first trained regression models, namely Linear Regression [2], Decision Tree Regressor [3], and Random Forest Regressor [4], to predict exact HbA1c values. These models performed poorly, most likely because of data bias and insufficient features, so I restructured the task as a classification problem by grouping HbA1c values into meaningful ranges. I then implemented a Random Forest Classifier [5], a Decision Tree Classifier [6], K-Nearest Neighbors [7], and an ensemble Voting Classifier [8]. The Voting Classifier achieved the best accuracy, 72.5%, improving on the standalone Random Forest's 68.1% [5]. Model tuning focused on parameters such as the number of trees and maximum depth. Variance Inflation Factor analysis was performed to assess feature multicollinearity and indicated that it was not a major issue. The results show that classification models are better suited to this dataset than regression models and underline the importance of feature engineering and hyperparameter tuning. They also suggest how predictive tools could help medical personnel approximate HbA1c values without relying solely on costly or time-consuming laboratory testing.
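The workflow summarized in the abstract (binning HbA1c into classes, combining Random Forest, Decision Tree, and K-Nearest Neighbors in a Voting Classifier, and checking multicollinearity with the Variance Inflation Factor) can be illustrated with a minimal sketch. It is not the study's code. The data is synthetic, the column names `age`, `bmi`, and `glucose` are hypothetical, the cut points 5.7 and 6.5 are illustrative, and hard voting and all hyperparameter values are assumptions. The `vif` helper is also written for this example and computes 1 / (1 - R^2) directly.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; column names are hypothetical, not the Kaggle schema.
rng = np.random.default_rng(0)
n = 600
X = pd.DataFrame({
    "age": rng.integers(20, 80, n),
    "bmi": rng.normal(28, 5, n),
    "glucose": rng.normal(120, 30, n),
})
hba1c = 3.0 + 0.02 * X["glucose"] + 0.02 * X["bmi"] + rng.normal(0, 0.6, n)

# Recast the regression target as classes by binning HbA1c.
# The cut points 5.7 and 6.5 are illustrative assumptions.
y = pd.cut(hba1c, bins=[-np.inf, 5.7, 6.5, np.inf], labels=False, right=False)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Majority-vote ensemble of the three classifier families named in the abstract.
# Hard voting and the hyperparameter values are assumptions for this sketch.
voter = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=200, max_depth=6, random_state=0)),
        ("dt", DecisionTreeClassifier(max_depth=4, random_state=0)),
        ("knn", KNeighborsClassifier(n_neighbors=7)),
    ],
    voting="hard",
)
voter.fit(X_train, y_train)
accuracy = voter.score(X_test, y_test)


def vif(frame):
    """Variance Inflation Factor per column: 1 / (1 - R^2),
    where R^2 comes from regressing that column on all the others."""
    scores = {}
    for col in frame.columns:
        others = frame.drop(columns=col)
        r2 = LinearRegression().fit(others, frame[col]).score(others, frame[col])
        scores[col] = 1.0 / (1.0 - r2)
    return pd.Series(scores)


vif_scores = vif(X)
print(f"voting accuracy: {accuracy:.3f}")
print(vif_scores.round(2))
```

A VIF close to 1 means a feature is nearly uncorrelated with the others. Values above roughly 5 to 10 are a common rule-of-thumb warning sign. The accuracy printed here reflects only the synthetic data and is not comparable to the figures reported in the abstract.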
References
[1] AravindPCoder. (2023, November 18). Diabetes dataset. Kaggle.
https://www.kaggle.com/datasets/aravindpcoder/diabetes-dataset?resource=download
[2] Scikit-Learn Developers. (2025). Linear Regression documentation. Scikit-Learn.
https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html
[3] Scikit-Learn Developers. (2025). Decision Tree Regressor documentation. Scikit-Learn.
https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeRegressor.html
[4] Scikit-Learn Developers. (2025). Random Forest Regressor documentation. Scikit-Learn.
https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html
[5] Scikit-Learn Developers. (2025). Random Forest Classifier documentation. Scikit-Learn.
https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
[6] Scikit-Learn Developers. (2025). Decision Tree Classifier documentation. Scikit-Learn.
https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html
[7] Kartik. (2025, August 23). K-nearest neighbors (KNN). GeeksforGeeks.
https://www.geeksforgeeks.org/machine-learning/k-nearest-neighbours/
[8] Scikit-Learn Developers. (2025). Voting Classifier documentation. Scikit-Learn.
https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.VotingClassifier.html
[9] Alhassan, Z., et al. (2021, May 24). Improving current glycated hemoglobin prediction in adults: Use of machine learning algorithms with electronic health records. JMIR Medical Informatics.
https://pmc.ncbi.nlm.nih.gov/articles/PMC8185616/
[10] Tao, X., Jiang, M., Liu, Y., Hu, Q., Zhu, B., Hu, J., et al. (2023, September 30). Predicting three-month fasting blood glucose and glycated hemoglobin changes in patients with Type 2 diabetes mellitus based on multiple machine learning algorithms. Scientific Reports.
https://doi.org/10.1038/s41598-023-43240-5
[11] Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., et al. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.
https://doi.org/10.48550/arXiv.1201.0490
[12] GraphPad by Dotmatics. (n.d.). Linear regression calculator.
https://www.graphpad.com/quickcalcs/linear1/
[13] Tablas-Mejia, I. (2025). Conclusion section for research papers. San José State University Writing Center.
https://www.sjsu.edu/writingcenter/docs/handouts/Conclusion%20Section%20for%20Research%20Papers.pdf
License
Copyright (c) 2025 Aadit Singh

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.