Determining the most effective machine learning model on segmented vs. unsegmented patient data to assign Drug Classification

Nino Saeki

doi:10.58445/rars.1861

Authors:

Nino Saeki, Tesla STEM High School

Keywords:

Machine Learning, k-Neighbors, Random Forest, Decision Tree, Support Vector Machine, Drug Classification, Male vs. Female, Patient Data

Abstract

Accurate drug prescription has long been one of the most important tasks in a medical professional's daily work. Recent advances in machine learning allow algorithms and models to predict the physiological activity of drugs more accurately and to classify drugs based on their physiological properties. This project focuses on the latter: it runs logistic regression, k-neighbors, support vector machine (SVM), naive Bayes, decision tree, and random forest models on a sample patient dataset and compares their accuracy. The data is then segmented by sex, the models are applied to the male and female datasets separately, and the resulting accuracies are compared. On the entire dataset, the model accuracies were, from greatest to least: decision tree (100%), random forest (100%), SVM (98%), naive Bayes (83%), logistic regression (83%), and k-neighbors (66.67%). Segmentation had the smallest effect on the random forest and decision tree models, both of which showed a 0% difference in accuracy between the male and female datasets, and the largest effect on the k-neighbors model, which showed a 38.03% difference in accuracy between the male and female datasets.
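The comparison described in the abstract (score a model on the full dataset, score it again on the male and female subsets, then take the difference) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the author's implementation: the toy rows, the column layout (age, sex, sodium-to-potassium ratio, drug label), the drug label names, the hand-written k-nearest-neighbors classifier, and the simple every-fourth-row holdout split are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy rows: (age, sex, Na-to-K ratio, drug label).
# Invented for illustration only; these are NOT the study's data.
ROWS = [
    (23, "F", 25.4, "drugY"), (47, "M", 13.1, "drugC"),
    (47, "M", 10.1, "drugC"), (28, "F", 7.8, "drugX"),
    (61, "F", 18.0, "drugY"), (22, "F", 8.6, "drugX"),
    (49, "F", 16.3, "drugY"), (41, "M", 11.0, "drugC"),
    (60, "M", 15.2, "drugY"), (43, "M", 19.4, "drugY"),
    (34, "F", 9.4, "drugX"), (65, "M", 12.9, "drugC"),
    (31, "M", 30.4, "drugY"), (57, "F", 19.1, "drugY"),
    (26, "F", 7.3, "drugX"), (52, "M", 9.9, "drugC"),
]


def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def holdout_accuracy(rows, k=3, test_every=4):
    """Hold out every `test_every`-th row and score k-NN on those rows."""
    data = [((age, na_to_k), drug) for age, _sex, na_to_k, drug in rows]
    test = data[::test_every]
    train = [d for i, d in enumerate(data) if i % test_every != 0]
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)


def compare_segments(rows, k=3):
    """Accuracy on the full data, on each sex subset, and the M/F gap."""
    scores = {"all": holdout_accuracy(rows, k)}
    for sex in ("M", "F"):
        scores[sex] = holdout_accuracy([r for r in rows if r[1] == sex], k)
    scores["gap"] = abs(scores["M"] - scores["F"])
    return scores


scores = compare_segments(ROWS)
for name, value in scores.items():
    print(f"{name}: {value:.2%}")
```

The `gap` entry corresponds to the male-versus-female accuracy difference reported per model in the abstract. In practice the hand-written classifier would presumably be replaced by library implementations of each of the six models, with the same three-way scoring repeated for each one.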
