Misinformation model research paper
DOI:
https://doi.org/10.58445/rars.3444

Keywords:
automated fact-checking, claim verification, evidence retrieval

Abstract
We introduce a modular pipeline for automated fact‑checking that integrates neural text understanding with retrieval‑based ranking (e.g., BM25). Claims from four public corpora (FEVER, LIAR, PolitiFact, and GossipCop) are unified under the three‑class FEVER label scheme (SUPPORTS, REFUTES, NOT ENOUGH INFO), ensuring a balanced training pool. We fine‑tune a BERT‑based verifier on claim–evidence pairs for semantic verification and, in parallel, train a lightweight CNN on claim‑only text as a complementary classifier. Their probability outputs are then fused in a stacked ensemble via a logistic regression meta‑classifier.
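The fusion step can be illustrated with a short sketch (this is not the authors' released code; the base‑model probabilities and gold labels below are random placeholders): the 3‑way class‑probability vectors from BERT and the CNN are concatenated per claim and passed to a scikit‑learn logistic regression meta‑classifier.

# Minimal sketch of the stacked ensemble described above. The base models'
# probability outputs are replaced by random placeholders; in practice the
# meta-classifier would be fit on held-out (out-of-fold) base-model predictions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_val = 600                                    # size of the validation split
LABELS = ["SUPPORTS", "REFUTES", "NOT ENOUGH INFO"]

bert_probs = rng.dirichlet(np.ones(3), size=n_val)   # placeholder BERT output
cnn_probs = rng.dirichlet(np.ones(3), size=n_val)    # placeholder CNN output
y_true = rng.integers(0, 3, size=n_val)              # placeholder gold labels

# Stacking: concatenate the two 3-way probability vectors into a 6-dimensional
# feature vector per claim and fit the logistic regression meta-classifier.
meta_features = np.hstack([bert_probs, cnn_probs])
meta = LogisticRegression(max_iter=1000)
meta.fit(meta_features, y_true)
ensemble_pred = meta.predict(meta_features)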
On a 600‑claim validation set, the individual models reach accuracies of 47.6% (BERT), 45.0% (CNN), and 39.6% (BM25). The ensemble of BERT and CNN achieves 55.0% accuracy (macro‑F1 = 0.550), a 7.4‑percentage‑point improvement over the best single model (BERT). Confusion‑matrix analysis shows that REFUTES claims are detected most reliably, while SUPPORTS and NOT ENOUGH INFO remain challenging. Our findings confirm that a simple, interpretable ensemble can effectively leverage the complementary strengths of neural models and retrieval methods, providing a strong foundation for scalable fact‑checking.
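For reference, the metrics reported above (accuracy, macro‑F1, and a per‑class confusion matrix) can be computed with scikit‑learn as in the following sketch; the label vectors are tiny placeholders, with 0/1/2 standing for SUPPORTS/REFUTES/NOT ENOUGH INFO.

# Sketch of the evaluation metrics used above, on placeholder predictions.
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

y_true = [0, 1, 2, 1, 0, 2, 1, 1, 2, 0]   # placeholder gold labels
y_pred = [0, 1, 1, 1, 2, 2, 1, 0, 2, 0]   # placeholder ensemble predictions

print("accuracy :", accuracy_score(y_true, y_pred))
print("macro-F1 :", f1_score(y_true, y_pred, average="macro"))
print("confusion matrix (rows = gold, columns = predicted):")
print(confusion_matrix(y_true, y_pred, labels=[0, 1, 2]))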
References
Diaz Ruiz, Carlos, and Tomas Nilsson. “Disinformation and Echo Chambers: How Disinformation Circulates in Social Media through Identity-Driven Controversies.” Journal of Public Policy & Marketing, vol. 42, no. 1, 16 May 2022, pp. 18–35. https://doi.org/10.1177/07439156221103852.
Gamage, Dilrukshi, et al. “Designing Credibility Tools to Combat Mis/Disinformation: A Human-Centered Approach.” CHI Conference on Human Factors in Computing Systems Extended Abstracts, 27 Apr. 2022. https://doi.org/10.1145/3491101.3503700.
Jones, Dominic Zaun Eu, and Eshwar Chandrasekharan. “Measuring Epistemic Trust: Towards a New Lens for Democratic Legitimacy, Misinformation, and Echo Chambers.” Proceedings of the ACM on Human-Computer Interaction, vol. 8, no. CSCW2, 7 Nov. 2024, pp. 1–33. https://doi.org/10.1145/3687001.
“Fake News Challenge.” Fakenewschallenge.org, 2016, www.fakenewschallenge.org/.
“CheckThat!” Checkthat.gitlab.io, checkthat.gitlab.io/clef2024/.
Wang, William Yang. “‘Liar, Liar Pants on Fire’: A New Benchmark Dataset for Fake News Detection.” Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Association for Computational Linguistics, July 2017. aclanthology.org/P17-2067/.
Thorne, James, et al. “FEVER: A Large-Scale Dataset for Fact Extraction and Verification.” Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics, June 2018. aclanthology.org/N18-1074/.
Shu, Kai, et al. “FakeNewsNet: A Data Repository with News Content, Social Context and Spatiotemporal Information for Studying Fake News on Social Media.” arXiv, 2018. https://doi.org/10.48550/arxiv.1809.01286.
Robertson, Stephen, and Hugo Zaragoza. “The Probabilistic Relevance Framework: BM25 and Beyond.” Foundations and Trends in Information Retrieval, vol. 3, no. 4, 2009, pp. 333–389. https://doi.org/10.1561/1500000019.
Karpukhin, Vladimir, et al. “Dense Passage Retrieval for Open-Domain Question Answering.” Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020. https://doi.org/10.18653/v1/2020.emnlp-main.550.
Devlin, Jacob, et al. “BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding.” Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics, vol. 1, 2019, pp. 4171–4186. https://doi.org/10.18653/v1/n19-1423.
Dietterich, Thomas G. “Ensemble Methods in Machine Learning.” Multiple Classifier Systems: Lecture Notes in Computer Science, edited by Josef Kittler and Fabio Roli, vol. 1857, Springer, 2000, pp. 1–15.
License
Copyright (c) 2025 Seokyoon Kong

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.