Preprint / Version 1

Unmasking Misinformation: A Machine Learning Approach to Detecting Fake News

Authors

  • Maya Hussain, The Bush School

DOI:

https://doi.org/10.58445/rars.3224

Keywords:

AI, Artificial Intelligence, Misinformation, Fake News Detection

Abstract

Misinformation has increased in recent years, with misleading content shared as real news to deceive and manipulate public opinion. Its dissemination, particularly in areas with global implications such as politics and health, can have severe consequences for society as a whole. For example, widespread misinformation surrounding recent US elections has been shown to deepen polarization and erode trust in both democratic institutions and the news media. Misleading reports during crises such as the Ebola outbreak, and COVID-19 misinformation about vaccines and treatments, spread unnecessary fear, created barriers for public health response teams, and resulted in many preventable deaths. Social media further amplifies fake news, making it difficult for fact-checking efforts to keep pace. This paper applies machine learning techniques to distinguish misinformation from credible reporting and to detect fake news with greater accuracy. We analyze datasets containing both fake and real news articles to uncover linguistic patterns and differences between the two. Natural Language Processing (NLP) techniques such as Term Frequency-Inverse Document Frequency (TF-IDF) are used to convert text data into numerical features for training machine learning models. Several classification algorithms, including Logistic Regression, Random Forest, and XGBoost, are then trained to differentiate fake from real news. To further explore how the two classes differ, we also compare the sentiment of fake and real articles. By drawing on data from everyday news, politics, and health sources, we keep the work grounded in the real-world implications of fake news disguised as fact. The goal is to develop an AI-powered automated fact-checking system that distinguishes real from fake sources, thereby contributing to ongoing efforts to protect the public from the harms of misinformation and uphold its trust in news media.
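
To make the TF-IDF and classification steps described in the abstract concrete, the following is a minimal sketch, not the paper's implementation. It uses only the Python standard library. The four headlines, their labels, the helper names (`tokenize`, `fit_idf`, `tfidf_vector`, `train_logreg`, `predict_proba`), the smoothed IDF variant, and the learning rate and epoch count are all assumptions chosen for illustration. The paper itself works with real datasets and also trains Random Forest and XGBoost models, which are not shown here.

```python
import math
from collections import Counter

def tokenize(text):
    # Naive whitespace tokenizer (an assumption; real pipelines do more cleaning).
    return text.lower().split()

def fit_idf(docs):
    # Smoothed inverse document frequency: ln((1 + N) / (1 + df)) + 1.
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(tokenize(doc)))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}

def tfidf_vector(doc, idf, vocab):
    # Term count times IDF, then L2-normalized; terms outside vocab are ignored.
    counts = Counter(tokenize(doc))
    vec = [counts[t] * idf[t] for t in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def predict_proba(x, weights, bias):
    # Logistic (sigmoid) function applied to the linear score.
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.5, epochs=500):
    # Plain stochastic gradient descent on the logistic loss.
    weights, bias = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, target in zip(X, y):
            error = predict_proba(x, weights, bias) - target
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias

# Hypothetical toy headlines: 1 = fake, 0 = real (not taken from the paper's data).
texts = [
    "shocking miracle cure doctors hate",
    "secret miracle cure they hide",
    "city council approves annual budget",
    "health agency publishes annual report",
]
labels = [1, 1, 0, 0]

idf = fit_idf(texts)
vocab = sorted(idf)
X = [tfidf_vector(t, idf, vocab) for t in texts]
weights, bias = train_logreg(X, labels)

train_preds = [int(predict_proba(x, weights, bias) > 0.5) for x in X]
new_score = predict_proba(tfidf_vector("miracle cure revealed", idf, vocab), weights, bias)
print(train_preds, round(new_score, 3))
```

In this toy setting the classifier only learns which words co-occur with each label, so it says nothing about how such a model would behave on real articles. A library such as scikit-learn would normally replace the hand-written vectorizer and optimizer.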

Posted

2025-10-12