A Minimal Approach to Fake News Detection
DOI:
https://doi.org/10.58445/rars.1454
Keywords:
fake news, xgboost, machine learning, journalism
Abstract
The need for efficient categorization of fake and real media grows as the ubiquity of generative AI and motivated bad actors makes producing fake news ever easier. Researchers estimated that in 2021, $2.6 billion of ad revenue could be attributed to misinformation-publishing sites (Skibinski, 2021), providing ample motivation for these bad actors to fabricate stories. This paper develops an effective machine learning solution that lets readers classify the articles they want to read as fake or real, enabling the consumption of solely accurate news. Because users tend to prefer simple solutions, we provide a parsimonious model consisting of only five features that still achieves 71% testing accuracy. Among the most effective predictors of a real article is “perceived effort,” measured by an article’s length, number of authors, and readability.
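As a minimal sketch of the kind of parsimonious, five-feature gradient-boosted classifier the abstract describes, the Python snippet below trains an XGBoost model on a handful of article-level features and reports test accuracy. The feature names, the CSV path, and the hyperparameters are illustrative assumptions, not the paper's actual feature set, data, or configuration.

    # Minimal sketch of a five-feature fake/real news classifier.
    # Column names, "articles.csv", and hyperparameters are assumptions
    # for illustration; they are not taken from the paper itself.
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score
    from xgboost import XGBClassifier

    # Hypothetical dataset: one row per article, a binary "label" column
    # (1 = real, 0 = fake), and five numeric features.
    FEATURES = ["word_count", "num_authors", "readability",
                "title_length", "num_images"]

    df = pd.read_csv("articles.csv")  # assumed input file
    X_train, X_test, y_train, y_test = train_test_split(
        df[FEATURES], df["label"], test_size=0.2, random_state=42)

    # A small gradient-boosted tree ensemble keeps both the feature set
    # and the model itself simple.
    model = XGBClassifier(n_estimators=200, max_depth=3, learning_rate=0.1)
    model.fit(X_train, y_train)

    print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))

The point of the sketch is the workflow, not the numbers: with only a few coarse "effort" signals such as length, author count, and readability, a shallow boosted model can be trained and evaluated in a few lines.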
References
Allcott, H., & Gentzkow, M. (2017). Social Media and Fake News in the 2016 Election. Journal of Economic Perspectives, 31(2), 211–236. https://doi.org/10.1257/jep.31.2.211
Azzimonti, M., & Fernandes, M. (2023). Social media networks, fake news, and polarization. European Journal of Political Economy, 76, 102256. https://doi.org/10.1016/j.ejpoleco.2022.102256
Banic, V., & Smith, A. (2016). Fake News: How a Partying Macedonian Teen Earns Thousands Publishing Lies. NBC News. Retrieved from https://www.nbcnews.com/news/world/fake-news-how-partying-macedonian-teen-earns-thousands-publishing-lies-n692451
Burgess, J. (2022). The ‘digital town square’? What does it mean when billionaires own the online spaces where we gather? The Conversation. https://theconversation.com/the-digital-town-square-what-does-it-mean-when-billionaires-own-the-online-spaces-where-we-gather-18204
Butcher, S. (2024). 2024 may be the year online disinformation finally gets the better of us. Politico. Retrieved from https://www.politico.eu/article/eu-elections-online-disinformation-politics/
Chall, J. S., & Dale, E. (1995). Readability revisited. Brookline Books.
Conradi, P. (2023). Was Slovakia election the first swung by deepfakes? The Times. Retrieved from https://www.thetimes.com/world/russia-ukraine-war/article/was-slovakia-election-the-first-swung-by-deepfakes-7t8dbfl9b
Dale, E., & Chall, J. S. (1948). A Formula for Predicting Readability. Educational Research Bulletin, 27, 11–20, 28.
David, A. (2024, June 18). Misinformation might sway elections — but not in the way that you think. Nature. https://www.nature.com/articles/d41586-024-01696-z
Dawber, A., & Tomlinson, H. (2023). Deepfakes of Donald Trump ‘arrest’ spread on social media. The Times. Retrieved from https://www.thetimes.com/business-money/technology/article/donald-trump-deepfakes-ai-twitter-g50n7vnbm
DeVoe, K. M. (2009). Bursts of Information: Microblogging. The Reference Librarian, 50(2), 212–214. https://doi.org/10.1080/02763870902762086
Editors at Sky News. (2023, October 9). Deepfake audio of Sir Keir Starmer released on first day of Labour conference. Sky News. https://news.sky.com/story/labour-faces-political-attack-after-deepfake-audio-is-posted-of-sir-keir-starmer-12980181
Gao, Y., Liu, F., & Gao, L. (2023). Echo chamber effects on short video platforms. Scientific Reports, 13(1), 6282. https://doi.org/10.1038/s41598-023-33370-1
Gottfried, J., & Shearer, E. (2017). News Use Across Social Media Platforms 2017. Pew Research Center. Retrieved from https://www.pewresearch.org/journalism/2017/09/07/news-use-across-social-media-platforms-2017/
Hermida, A. (2010). TWITTERING THE NEWS: The emergence of ambient journalism. Journalism Practice, 4(3), 297–308. https://doi.org/10.1080/17512781003640703
Hooi, B., Shah, N., Beutel, A., Gunnemann, S., Akoglu, L., Kumar, M., Makhija, D., & Faloutsos, C. (2015). BIRDNEST: Bayesian Inference for Ratings-Fraud Detection. https://doi.org/10.48550/arXiv.1511.06030
Jindal, N., & Liu, B. (2008). Opinion spam and analysis. In Proceedings of the 2008 International Conference on Web Search and Data Mining, 219–230. https://doi.org/10.1145/1341531.1341560
Kaplan, A., & Haenlein, M. (2011). The early bird catches the news: Nine things you should know about micro-blogging. Business Horizons, 54, 105–113. https://doi.org/10.1016/j.bushor.2010.09.004
Kumar, S., West, R., & Leskovec, J. (2016). Disinformation on the Web: Impact, Characteristics, and Detection of Wikipedia Hoaxes. In Proceedings of the 25th International Conference on World Wide Web, 591–602. https://doi.org/10.1145/2872427.2883085
Kumar, S., & Shah, N. (2018). False Information on Web and Social Media: A Survey.
Kumar, S., Hooi, B., Makhija, D., Kumar, M., Faloutsos, C., & Subrahmanian, V. (2018). REV2: Fraudulent User Prediction in Rating Platforms. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining (pp. 333–341). Association for Computing Machinery. https://dl.acm.org/doi/10.1145/3159652.315972
Levendusky, M. (2013). Partisan Media Exposure and Attitudes Toward the Opposition. Political Communication, 30(4), 565–581. https://doi.org/10.1080/10584609.2012.737435
Dahl, M., Magesh, V., Suzgun, M., & Ho, D. E. (2024). Large Legal Fictions: Profiling Legal Hallucinations in Large Language Models. https://doi.org/10.48550/arXiv.2401.01301
Pérez-Rosas, V., Kleinberg, B., Lefevre, A., & Mihalcea, R. (2017). Automatic Detection of Fake News. https://doi.org/10.48550/arXiv.1708.07104
Sandulescu, V., & Ester, M. (2015). Detecting Singleton Review Spammers Using Semantic Similarity. In Proceedings of the 24th International Conference on World Wide Web. ACM. https://doi.org/10.48550/arXiv.1609.02727
Shah, N., Beutel, A., Hooi, B., Akoglu, L., Gunnemann, S., Makhija, D., Kumar, M., & Faloutsos, C. (2015). EdgeCentric: Anomaly Detection in Edge-Attributed Networks. https://doi.org/10.48550/arXiv.1510.05544
Shu, K., Mahudeswaran, D., Wang, S., Lee, D., & Liu, H. (2018). FakeNewsNet: A Data Repository with News Content, Social Context and Dynamic Information for Studying Fake News on Social Media. arXiv preprint arXiv:1809.01286. https://doi.org/10.48550/arXiv.1809.01286
Shu, K., Sliva, A., Wang, S., Tang, J., & Liu, H. (2017). Fake News Detection on Social Media: A Data Mining Perspective. ACM SIGKDD Explorations Newsletter, 19(1), 22–36. https://doi.org/10.48550/arXiv.1708.01967
Shu, K., Wang, S., & Liu, H. (2017). Exploiting Tri-Relationship for Fake News Detection. arXiv preprint arXiv:1712.07709. https://doi.org/10.48550/arXiv.1712.07709
Skibinski, M. (2021). Special Report: Top brands are sending $2.6 billion to misinformation websites each year. NewsGuard. Retrieved from https://www.newsguardtech.com/special-reports/brands-send-billions-to-misinformation-websites-newsguard-comscore-report/
Subrahmanian, V., Azaria, A., Durst, S., Kagan, V., Galstyan, A., Lerman, K., Zhu, L., Ferrara, E., Flammini, A., & Menczer, F. (2016). The DARPA Twitter Bot Challenge. Computer, 49(6), 38–46. https://doi.org/10.48550/arXiv.1601.05140
Vasist, P. N., Chatterjee, D., & Krishnan, S. (2023). The Polarizing Impact of Political Disinformation and Hate Speech: A Cross-country Configural Narrative. Information Systems Frontiers. Advance online publication. https://doi.org/10.1007/s10796-023-10390-w
License
Copyright (c) 2024 Daniel Markusson
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.