Preprint / Version 3

An Intelligent System for Early Prediction of Cardiovascular Disease using Machine Learning

Authors

  • Aarush Kachhawa, Saint Francis High School

DOI:

https://doi.org/10.58445/rars.75

Keywords:

machine learning, classification models, cardiovascular disease prediction, supervised machine learning

Abstract

Cardiovascular disease (CVD) remains the leading cause of death, responsible for 18.6 million deaths globally in 2019. Because several effective treatment options are widely available, early diagnosis of CVD is critical for timely intervention and for slowing the progression of the disease. CVD is associated with many risk markers that interact non-linearly, which makes accurate diagnosis challenging, especially for non-specialist clinicians and under-resourced facilities in developing countries. In recent years, machine learning techniques have shown great promise as diagnostic tools. The goal of this research is to apply several machine learning methods (random forest, gradient boosting, logistic regression, and an artificial neural network) and to evaluate their predictive performance. This study also evaluates whether combining multiple UCI datasets improves the prediction accuracy of the models. On a merged dataset of over 700 patients from the UCI Machine Learning Repository, the most accurate model was the random forest classifier, with an accuracy and F1 score of 94% and an AUC of 0.98. Ensemble learning methods, combined with data optimization and hyperparameter tuning, achieved higher accuracy than prior published studies on these datasets. Finally, this study proposes how these machine learning workloads can be incorporated into a distributed, cloud-connected healthcare system, making them widely accessible to practicing doctors for assessing their patients' CVD risk.
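
The abstract reports three evaluation metrics: accuracy, F1 score, and AUC. As a minimal sketch of how such metrics are commonly computed for a binary classifier, the Python below implements them from scratch. The function names, the 0.5 decision threshold, and the toy labels and scores are illustrative assumptions; they are not taken from the study, its data, or its code.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as half a win)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (made-up values, not data from the study):
# 1 = disease present, 0 = disease absent.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.7, 0.4, 0.6, 0.1, 0.8, 0.3]   # hypothetical model probabilities
y_pred = [1 if s >= 0.5 else 0 for s in scores]     # assumed 0.5 decision threshold

print("accuracy:", accuracy(y_true, y_pred))
print("F1 score:", f1_score(y_true, y_pred))
print("AUC:", roc_auc(y_true, scores))
```

On this toy data the three values are 0.75, 0.75, and 0.9375. Accuracy and F1 depend on the chosen threshold, while AUC is computed from the raw scores and so summarizes ranking quality across all thresholds.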



Posted

2022-12-05 — Updated on 2022-12-24
