Exploring the Use of Synthetic Data from AI Chatbots for Predicting Alzheimer's Disease: Methods for Validation and Barriers to Real-World Implementation
DOI: https://doi.org/10.58445/rars.3564
Keywords: data science, artificial intelligence, chatbots, synthetic data, Alzheimer's disease, machine learning
Abstract
Artificial intelligence has advanced rapidly over the last decade, with healthcare applications ranging from diagnostic tools to predictive analytics. Machine learning, one subset of the wide range of AI tools now being implemented, learns to recognize patterns in data and applies them to new datasets, which can help enable the early detection of chronic illnesses such as Alzheimer's disease. A persistent challenge in developing machine learning models is their reliance on real-world data, which raises ethical concerns over privacy, security, and bias. Synthetic data offers a potential solution by mimicking real-world medical datasets while avoiding these privacy issues. Traditional approaches generate viable datasets through statistical analysis and simulation; this research instead proposes a new source of synthetic data: ChatGPT. AI-generated synthetic data shows promise for enhancing early diagnostic capabilities, and, if successful, its applications could extend far beyond Alzheimer's disease.
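For contrast with the chatbot-based approach, the traditional statistical route can be sketched as sampling each synthetic record from distributions estimated on reference data. The minimal sketch below is not the method used in this research: it assumes a categorical cognitive group drawn from observed proportions and an age drawn from a per-group normal distribution. The `GROUP_PARAMS` values are hypothetical placeholders, not figures from any real dataset.

```python
import random
from collections import Counter

# Hypothetical per-group parameters (illustrative only, NOT estimated from real data):
# share of records in each cognitive group, plus mean and SD of age within that group.
GROUP_PARAMS = {
    "CN":   {"p": 0.40, "age_mean": 72.0, "age_sd": 6.0},
    "EMCI": {"p": 0.25, "age_mean": 71.0, "age_sd": 7.0},
    "LMCI": {"p": 0.20, "age_mean": 73.0, "age_sd": 7.5},
    "MCI":  {"p": 0.15, "age_mean": 74.0, "age_sd": 7.0},
}


def generate_synthetic(n, params, seed=0):
    """Sample n records: a group from a categorical distribution,
    then an age from a normal distribution conditioned on that group."""
    rng = random.Random(seed)  # seeded so the output is reproducible
    groups = list(params)
    weights = [params[g]["p"] for g in groups]
    records = []
    for _ in range(n):
        g = rng.choices(groups, weights=weights, k=1)[0]
        age = rng.gauss(params[g]["age_mean"], params[g]["age_sd"])
        # Clamp to a plausible range so tail draws cannot produce impossible ages.
        age = min(max(age, 50.0), 100.0)
        records.append({"group": g, "age": round(age, 1)})
    return records


if __name__ == "__main__":
    data = generate_synthetic(1000, GROUP_PARAMS)
    print(Counter(r["group"] for r in data))
```

Because each field is sampled independently given the group, this baseline reproduces only the relationships that are written into `GROUP_PARAMS`; any other correlation in the real data is lost.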
A dataset from the University of Southern California’s Image and Data Archive was selected as a trusted reference source and used as the foundation for model testing. Key variables, particularly the age distribution and the cognitive classification groups (CN, cognitively normal; EMCI, early mild cognitive impairment; LMCI, late mild cognitive impairment; and MCI, mild cognitive impairment), were analyzed to identify patterns associated with Alzheimer’s disease progression. These observed patterns then guided the generation of synthetic data via a chatbot, so that the synthetic training dataset would reflect realistic demographic distributions and relationships between cognitive stages.
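Whether a synthetic dataset actually reflects the reference distributions can be checked directly. The sketch below is one illustrative validation step, not the author's procedure: it assumes each record is a dict with hypothetical `group` and `age` keys, measures the gap between group proportions with total variation distance, and reports the per-group difference in mean age. The toy records are invented for illustration.

```python
from collections import Counter
from statistics import mean


def group_proportions(records):
    """Fraction of records in each cognitive group."""
    counts = Counter(r["group"] for r in records)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}


def total_variation_distance(p, q):
    """Half the L1 distance between two categorical distributions
    (0 = identical proportions, 1 = no overlap at all)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def mean_age_by_group(records):
    """Mean age within each cognitive group."""
    by_group = {}
    for r in records:
        by_group.setdefault(r["group"], []).append(r["age"])
    return {g: mean(ages) for g, ages in by_group.items()}


def compare(reference, synthetic):
    """Return the TVD of group proportions and, for groups present in both
    datasets, the synthetic-minus-reference difference in mean age."""
    tvd = total_variation_distance(group_proportions(reference), group_proportions(synthetic))
    ref_age = mean_age_by_group(reference)
    syn_age = mean_age_by_group(synthetic)
    age_gap = {g: syn_age[g] - ref_age[g] for g in ref_age if g in syn_age}
    return tvd, age_gap


if __name__ == "__main__":
    # Made-up toy records, not drawn from the USC archive.
    reference = [
        {"group": "CN", "age": 70}, {"group": "CN", "age": 74},
        {"group": "EMCI", "age": 71}, {"group": "LMCI", "age": 76},
    ]
    synthetic = [
        {"group": "CN", "age": 72}, {"group": "EMCI", "age": 69},
        {"group": "EMCI", "age": 73}, {"group": "LMCI", "age": 78},
    ]
    print(compare(reference, synthetic))
```

Matching these marginal summaries is necessary but not sufficient: a synthetic table can reproduce group shares and mean ages while still distorting the joint relationships (for example, between cognitive scores and group) that a classifier relies on.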
The machine learning model was trained on the synthetic dataset to predict Alzheimer’s diagnosis from symptom-related variables. It achieved an accuracy of 1.00, with perfect precision, recall, and F1 scores, indicating no misclassifications. Although impressive, these results suggest overfitting, driven by limited data diversity and by dominant features such as the MMSE (Mini-Mental State Examination) score and hippocampal volume. In addition, the inclusion of Patient_ID as a feature likely led the model to memorize individual records rather than learn generalizable patterns.
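The memorization effect attributed to Patient_ID can be illustrated with a deliberately leaky toy model. The sketch below is not the author's model: `IdLookupModel` is a hypothetical class that simply stores the label seen for each Patient_ID and falls back to the majority training label for unseen IDs, and `binary_metrics` computes accuracy, precision, recall, and F1 for a binary diagnosis label. The data are invented.

```python
from collections import Counter


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


class IdLookupModel:
    """Hypothetical leaky 'model': memorizes the label for each Patient_ID and
    predicts the majority training label for any ID it has not seen."""

    def fit(self, rows, labels):
        self.table = {r["Patient_ID"]: y for r, y in zip(rows, labels)}
        self.majority = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, rows):
        return [self.table.get(r["Patient_ID"], self.majority) for r in rows]


if __name__ == "__main__":
    # Invented toy data: label 1 = Alzheimer's diagnosis, 0 = none.
    train = [{"Patient_ID": i} for i in range(5)]
    y_train = [1, 0, 0, 1, 0]
    unseen = [{"Patient_ID": i} for i in range(100, 104)]
    y_unseen = [1, 1, 0, 1]

    model = IdLookupModel().fit(train, y_train)
    print(binary_metrics(y_train, model.predict(train)))    # perfect on memorized rows
    print(binary_metrics(y_unseen, model.predict(unseen)))  # collapses on new patients
```

Perfect scores on rows the model has already seen say little about new patients. The usual safeguards are to drop identifier columns before training and to report metrics on patients held out from training entirely.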
License
Copyright (c) 2026 Siddhi Ananya

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.