Preprint / Version 1

Analyzing the Performance of Brain Stroke Prediction Using Various Machine Learning Classification Algorithms

Authors

  • Mohammad Zuhayr Mahrus Kabir Student

DOI:

https://doi.org/10.58445/rars.2478

Keywords:

Machine Learning, Logistic Regression, SVM, KNN, XGBoost, Random Forest Classifier, Decision Trees, Confusion Matrix

Abstract

Stroke is regarded as a leading cause of death and disability worldwide, according to the World Health Organization (WHO). A stroke, also known as a cerebrovascular accident, is a medical emergency that occurs when blood flow to part of the brain is interrupted or reduced, leading to damage or death of brain cells. In many cases, a stroke is not diagnosed until after its onset, which often has severe or fatal consequences. Prompt medical attention is critical when treating a stroke to minimize brain damage and prevent long-term disability or death. Recently, machine learning (ML) has been viewed as a significant advance toward preemptive stroke diagnosis. ML algorithms can analyze large amounts of medical data, including electronic health records, medical imaging, genetic information, and real-time patient monitoring data, to uncover patterns and insights that were previously difficult to obtain. This paper investigates the application of ML models at a fundamental level for stroke prediction. It takes a supervised learning approach, applying classification algorithms to a patient dataset comprising demographic, clinical, and lifestyle factors. Various classifiers, including logistic regression, decision trees, support vector machines (SVM), k-nearest neighbors (KNN), and random forest, were used to develop predictive models. The study aimed to assess the performance of these classifiers and identify the most accurate model for stroke prediction. Results indicated that the random forest classifier achieved the highest accuracy among the models evaluated, at 99.81%. This finding underscores the efficacy of ensemble learning techniques in capturing complex interactions and non-linear relationships within the data. The research highlights the potential of ML-based approaches for identifying individuals at high risk of stroke and guiding targeted preventive interventions in clinical practice.
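The abstract compares classifiers by accuracy, and the keywords list the confusion matrix. As a minimal sketch of how the two relate, the snippet below derives accuracy, recall, and precision from confusion-matrix counts. The labels are hypothetical values chosen for illustration (1 = stroke, 0 = no stroke), not the paper's data, and the helper names `confusion_counts` and `summarize` are assumptions introduced here.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for binary labels."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            # Predicted positive: correct if the true label is also positive.
            counts["tp" if truth == positive else "fp"] += 1
        else:
            # Predicted negative: a miss if the true label is positive.
            counts["fn" if truth == positive else "tn"] += 1
    return counts["tp"], counts["fp"], counts["fn"], counts["tn"]


def summarize(y_true, y_pred):
    """Derive accuracy, recall, and precision from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
    }


# Hypothetical labels for ten patients (illustrative only).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

print(confusion_counts(y_true, y_pred))
print(summarize(y_true, y_pred))
```

On these made-up labels the accuracy is 0.8 while recall for the stroke class is 0.5, which is why a confusion matrix is often reported alongside accuracy when one class is much rarer than the other.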

References

World Health Organization, “World Stroke Day 2022,” WHO, Oct. 29, 2022. [Online]. Available: https://www.who.int/srilanka/news/detail/29-10-2022-world-stroke-day-2022

E. M. Alanazi et al., “Predicting Risk of Stroke from Lab Tests Using Machine Learning Algorithms: Development and Evaluation of Prediction models,” JMIR Form. Res., vol. 5, no. 12, p. e23440, Dec. , doi: 10.2196/23440.

D. Atallah, M. Badawy, A. El-Sayed, and M. Ghoneim, “Predicting Kidney Transplantation Outcome Based on Hybrid Feature Selection and KNN Classifier,” Multimedia Tools Appl., vol. 78, pp. –20407, 2019, doi: 10.1007/s11042-019-7370-5.

“How Many People are Affected by/at Risk for Stroke?,” Eunice Kennedy Shriver National Institute of Child Health and Human Development, U.S. Department of Health and Human Services. [Online]. Available: https://www.nichd.nih.gov/health/topics/stroke/conditioninfo/risk [Accessed: Jul. 31, 2024].

World Health Organization, “Stroke, Cerebrovascular Accident,” WHO, 2024. [Online]. Available: https://www.emro.who.int/health-topics/stroke-cerebrovascular-accident/index.html

Stroke Awareness Foundation, “Stroke Facts & Statistics,” Jul. 11, 2023. [Online]. Available: https://www.strokeinfo.org/stroke-facts-statistics/

G. G. Sailasya and G. L. A. Kumari, “Analyzing the Performance of Stroke Prediction using ML Classification Algorithms,” Int. J. Adv. Comput. Sci. Appl., vol. 12, 2021. [Online]. Available: https://thesai.org/Downloads/Volume12No7/Paper_65-Analyzing_the_Performance_of_Stroke_Prediction.pdf [Accessed: Jul. 31, 2024].

E. Dritsas and M. Trigka, “Stroke Risk Prediction with Machine Learning Techniques,” Sensors, vol. , no. 13, p. 4670, Jun. 2022, doi: 10.3390/s22134670.

R. Islam, S. Debnath, and T. Islam, “Predictive Analysis for Risk of Stroke using Machine Learning Techniques,” in Proc. IC4ME, 2021, pp. 1–4, doi: 10.1109/IC4ME253898.2021.9768524.

A. Suresh, “What is a confusion matrix?,” Medium, Analytics Vidhya, Jan. 18, 2024. [Online]. Available: https://medium.com/analytics-vidhya/what-is-a-confusion-matrix-d1c0f8feda5

“Logistic Regression in Machine Learning - Javatpoint,” Javatpoint. [Online]. Available: https://www.javatpoint.com/logistic-regression-in-machine-learning [Accessed: Dec. 11, 2023].

M. Song, “Logistic regression explained,” Medium, Sep. 24, 2023. [Online]. Available: https://medium.com/@msong507/logistic-regression-explained-2d1b8babe6c1

“Decision Tree Algorithm in Machine Learning - Javatpoint,” Javatpoint. [Online]. Available: https://www.javatpoint.com/machine-learning-decision-tree-classification-algorithm [Accessed: Dec. , 2023].

C. V. Nicholson, “Decision tree,” Pathmind. [Online]. Available: https://wiki.pathmind.com/decision-tree [Accessed: Jul. 30, 2024].

M. Attard, “8 Key Advantages and Disadvantages of Decision Trees,” Inside Learning Machines, May 1, 2024. [Online]. Available: https://insidelearningmachines.com/advantages_and_disadvantages_of_decision_trees/#2_Robust_to_Outliers

D. Gunay, “Random forest,” Medium, Sep. 14, 2023. [Online]. Available: https://medium.com/@denizgunay/random-forest-af5bde5d7e1e

“Demystifying the Random Forest Algorithm for Accurate Predictions,” Spotfire. [Online]. Available: https://www.spotfire.com/glossary/what-is-a-random-forest [Accessed: Dec. 11, 2023].

Sachinsoni, “K Nearest Neighbours — Introduction to Machine Learning Algorithms,” Medium, Jun. , 2023. [Online]. Available: https://medium.com/@sachinsoni600517/k-nearest-neighbours-introduction-to-machine-learning-algorithms-9dbc9d9fb3b2

A. Alkhaled, A. Kabutey, K. Selvi, Č. Mizera, P. Hrabě, and D. Herak, “Application of Computational Intelligence in Describing the Drying Kinetics of Persimmon Fruit (Diospyros kaki) During Vacuum and Hot Air Drying Process,” Processes, vol. 8, p. 544, 2020, doi: 10.3390/pr8050544.

R. Guo, Z. Zhao, T. Wang, G. Liu, J. Zhao, and D. Gao, “Degradation State Recognition of Piston Pump Based on ICEEMDAN and XGBoost,” Appl. Sci., vol. 10, p. 6593, 2020, doi: 10.3390/app10186593.

M. Singh, S. Verma, and P. Singhal, “A Comparative Study of Stroke Prediction Algorithms using Machine Learning,” in Advances in Intelligent Systems and Computing, 2023, doi: 10.1007/978-3-031-35641-4_22.

A. Tazin et al., “Stroke Disease Detection and Prediction using Robust Learning Approaches,” J. Healthc. Eng., Nov. 26, 2021, doi: 10.1155/2021/7633381.

M. Wiryaseputra, “Stroke Prediction using Machine Learning Classification Algorithm,” 2022. [Online]. Available: https://www.researchgate.net/publication/362175348_Stroke_Prediction_Using Machine_Learning_Classification_Algorithm

Posted

2025-04-18