Phisher - A Multimodal Approach for Phishing Detection: Phisher is a novel multimodal approach to phishing detection. It combines a BERT-based language model for lexical analysis of textual content, ResNet50 for image processing, and semantic features extracted from URLs to improve phishing classification.

Shubham Bhadra

doi:10.58445/rars.3223

Authors:

Shubham Bhadra, Mission San Jose High

DOI:

https://doi.org/10.58445/rars.3223

Keywords:

cybersecurity, phishing detection, multimodal learning, BERT, ResNet50, TR-OP dataset

Abstract

Phishing is an illegal method of tricking people into revealing confidential information, such as login details, credit card numbers, and Social Security numbers. Most phishing attacks duplicate the appearance of authentic websites or emails and exploit people's trust rather than technical vulnerabilities. Numerous awareness campaigns and technical countermeasures aim to alert individuals to the dangers of phishing, yet it remains one of the most effective forms of cyberattack because it adapts quickly and keeps growing in complexity. Many single-modal models are effective to a degree, but they cannot identify advanced phishing techniques that incorporate dynamic web content, obfuscated scripts, and sophisticated visual mimicry. We introduce a novel multimodal approach called Phisher. It combines a BERT-based language model for lexical analysis of textual content, ResNet50 for image processing, and semantic features extracted from URLs to improve phishing classification. Combining these signals yields better accuracy, precision, and F1 score, which makes detection of phishing sites more effective. To test our multimodal model, we used the real-world TR-OP dataset, which contains 10,000 labeled phishing and legitimate websites, including HTML content, URLs, and website snapshots. The results show a significant improvement in accuracy and precision compared to other models.
Aside from the technical benefits, this research also demonstrates how multimodal learning can create more resilient defenses against evolving cybercrime and phishing, and how it can offer practical applications for enterprises and security providers building a safer digital ecosystem.
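
The abstract reports accuracy, precision, and F1 score without restating how they are computed. As a minimal sketch, assuming the standard binary-classification definitions with phishing as the positive class (label 1), the metrics can be derived from confusion-matrix counts as follows. The names `BinaryMetrics` and `evaluate` are hypothetical and chosen for illustration, and the toy labels are invented; none of this is taken from the paper's code or results.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> BinaryMetrics:
    """Score binary predictions, where 1 = phishing and 0 = legitimate."""
    if len(y_true) != len(y_pred):
        raise ValueError("label and prediction sequences must have equal length")
    if len(y_true) == 0:
        raise ValueError("cannot evaluate an empty set of labels")

    # Confusion-matrix counts, with phishing as the positive class.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    # Fall back to 0.0 when a denominator is zero (no positive predictions or labels).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return BinaryMetrics(accuracy, precision, recall, f1)


# Toy labels for illustration only; these are not results from the paper.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
predictions = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = evaluate(labels, predictions)
print(metrics)
```

Precision measures how many sites flagged as phishing really are phishing, which matters when false alarms on legitimate sites are costly. F1 balances precision against recall, so it also penalizes phishing sites that the classifier misses.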

