Preprint / Version 1

Misinformation model research paper

Authors

  • Seokyoon Kong, Leigh High School

DOI:

https://doi.org/10.58445/rars.3444

Keywords:

automated fact-checking, claim verification, evidence retrieval

Abstract

We introduce a modular pipeline for automated fact-checking that combines neural text understanding with retrieval-based ranking (e.g., BM25). Claims from four public corpora (FEVER, LIAR, PolitiFact, and GossipCop) are unified under the three-class FEVER label scheme (SUPPORTS, REFUTES, NOT ENOUGH INFO), yielding a balanced training pool. We fine-tune a BERT-based verifier on claim–evidence pairs for semantic verification and, in parallel, train a lightweight CNN on claim-only text as a complementary classifier. The probability outputs of the two models are then fused in a stacked ensemble by a logistic regression meta-classifier.
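The stacking step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the base-model probabilities are synthetic stand-ins for real BERT and CNN outputs, and the helper names (`fake_base_probs`, `fit_meta_classifier`, `predict_proba`) and hyperparameters are hypothetical. The meta-classifier is a multinomial logistic regression trained with plain batch gradient descent on the concatenated probability vectors.

```python
import math
import random

LABELS = ["SUPPORTS", "REFUTES", "NOT ENOUGH INFO"]

def softmax(scores):
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(weights, biases, x):
    """Class probabilities of the meta-classifier for one stacked feature vector."""
    scores = [b + sum(wj * xj for wj, xj in zip(w, x))
              for w, b in zip(weights, biases)]
    return softmax(scores)

def fit_meta_classifier(features, targets, n_classes=3, lr=0.5, epochs=300):
    """Multinomial logistic regression fitted by batch gradient descent."""
    n, d = len(features), len(features[0])
    weights = [[0.0] * d for _ in range(n_classes)]
    biases = [0.0] * n_classes
    for _ in range(epochs):
        grad_w = [[0.0] * d for _ in range(n_classes)]
        grad_b = [0.0] * n_classes
        for x, y in zip(features, targets):
            probs = predict_proba(weights, biases, x)
            for k in range(n_classes):
                err = probs[k] - (1.0 if k == y else 0.0)  # cross-entropy gradient
                grad_b[k] += err
                for j in range(d):
                    grad_w[k][j] += err * x[j]
        for k in range(n_classes):
            biases[k] -= lr * grad_b[k] / n
            for j in range(d):
                weights[k][j] -= lr * grad_w[k][j] / n
    return weights, biases

def fake_base_probs(rng, true_label, strength):
    """ASSUMPTION: synthetic stand-in for a base model's 3-class probabilities."""
    scores = [rng.gauss(0.0, 1.0) for _ in LABELS]
    scores[true_label] += strength  # noisy preference for the true class
    return softmax(scores)

rng = random.Random(0)
targets = [rng.randrange(len(LABELS)) for _ in range(300)]
# Stacked features: [simulated BERT probs (3)] + [simulated CNN probs (3)].
features = [fake_base_probs(rng, y, 1.0) + fake_base_probs(rng, y, 0.8)
            for y in targets]

weights, biases = fit_meta_classifier(features, targets)
fused = predict_proba(weights, biases, features[0])
prediction = LABELS[max(range(len(LABELS)), key=lambda k: fused[k])]
print(prediction, [round(p, 3) for p in fused])
```

In practice the meta-classifier is usually fitted on base-model predictions for held-out claims, so that it does not learn from probabilities the base models produced on their own training data.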

On a 600-claim validation set, the individual models reach accuracies of 47.6% (BERT), 45.0% (CNN), and 39.6% (BM25). The ensemble of BERT and CNN achieves an accuracy of 55.0% (macro-F1 = 0.550), a 7.4-percentage-point improvement over the best single model. Confusion-matrix analysis shows that REFUTES claims are detected most reliably, while SUPPORTS and NOT ENOUGH INFO remain challenging. Our findings indicate that a simple, interpretable ensemble can leverage the complementary strengths of neural models and retrieval methods, providing a foundation for scalable fact-checking.
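For readers unfamiliar with the reported metrics, the sketch below shows how accuracy and macro-F1 are typically derived from a three-class confusion matrix, and how the 7.4-point gain follows from the accuracies quoted above. The confusion-matrix counts are invented purely for illustration and are not the paper's results; the function names are likewise hypothetical.

```python
def accuracy(cm):
    """Fraction of claims on the diagonal (rows = true label, cols = predicted)."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total

def macro_f1(cm):
    """Unweighted mean of per-class F1, with F1 = 2TP / (2TP + FP + FN)."""
    n = len(cm)
    f1_scores = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                       # true k, predicted otherwise
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, true otherwise
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / n

# ASSUMPTION: made-up counts for illustration only, NOT the paper's matrix.
# Class order: SUPPORTS, REFUTES, NOT ENOUGH INFO.
toy_cm = [
    [5, 3, 2],
    [1, 8, 1],
    [3, 2, 5],
]
toy_accuracy = accuracy(toy_cm)
toy_macro_f1 = macro_f1(toy_cm)

# Gain of the ensemble over the best single model, using the accuracies
# reported in the abstract (55.0% ensemble vs. 47.6% BERT).
gain_points = round(55.0 - 47.6, 1)
print(toy_accuracy, round(toy_macro_f1, 3), gain_points)
```

Macro-F1 weights the three classes equally, so on a balanced validation set it tends to track accuracy closely, which is consistent with the matching 55.0% and 0.550 figures above.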

References

Diaz Ruiz, Carlos, and Tomas Nilsson. “Disinformation and Echo Chambers: How Disinformation Circulates in Social Media through Identity-Driven Controversies.” Journal of Public Policy & Marketing, vol. 42, no. 1, 16 May 2022, pp. 18–35. https://doi.org/10.1177/07439156221103852.

Gamage, Dilrukshi, et al. “Designing Credibility Tools to Combat Mis/Disinformation: A Human-Centered Approach.” CHI Conference on Human Factors in Computing Systems Extended Abstracts, 27 Apr. 2022. https://doi.org/10.1145/3491101.3503700.

Jones, Dominic Zaun Eu, and Eshwar Chandrasekharan. “Measuring Epistemic Trust: Towards a New Lens for Democratic Legitimacy, Misinformation, and Echo Chambers.” Proceedings of the ACM on Human-Computer Interaction, vol. 8, no. CSCW2, 7 Nov. 2024, pp. 1–33. https://doi.org/10.1145/3687001.

“Fake News Challenge.” Fakenewschallenge.org, 2016, www.fakenewschallenge.org/.

“CheckThat!” Checkthat.gitlab.io, checkthat.gitlab.io/clef2024/.

Wang, William Yang. “‘Liar, Liar Pants on Fire’: A New Benchmark Dataset for Fake News Detection.” ACLWeb, Association for Computational Linguistics, 1 July 2017, www.aclweb.org/anthology/P17-2067/.

Thorne, James, et al. “FEVER: A Large-Scale Dataset for Fact Extraction and Verification.” ACLWeb, Association for Computational Linguistics, 1 June 2018, aclanthology.org/N18-1074/.

Shu, Kai, et al. “FakeNewsNet: A Data Repository with News Content, Social Context and Spatiotemporal Information for Studying Fake News on Social Media.” arXiv, 1 Jan. 2018. https://doi.org/10.48550/arxiv.1809.01286.

Robertson, Stephen, and Hugo Zaragoza. “The Probabilistic Relevance Framework: BM25 and Beyond.” Foundations and Trends in Information Retrieval, vol. 3, no. 4, 2010, pp. 333–389. https://doi.org/10.1561/1500000019.

Karpukhin, Vladimir, et al. “Dense Passage Retrieval for Open-Domain Question Answering.” Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020. https://doi.org/10.18653/v1/2020.emnlp-main.550.

Devlin, Jacob, et al. “BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding.” Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics, vol. 1, 2019, pp. 4171–4186. aclanthology.org/N19-1423/. https://doi.org/10.18653/v1/n19-1423.

Dietterich, Thomas G. “Ensemble Methods in Machine Learning.” Multiple Classifier Systems: Lecture Notes in Computer Science, edited by Josef Kittler and Fabio Roli, vol. 1857, Springer, 2000, pp. 1–15.

Posted

2025-11-24