Enhancing Fine-Grained Image Classification with Attention Mechanisms and Transfer Learning: A Case Study on the Stanford Dogs Dataset
DOI: https://doi.org/10.58445/rars.2355

Keywords: computer vision, deep learning, transfer learning, attention mechanisms, fine-grained classification

Abstract
This research explores enhancing fine-grained image classification using transfer learning and attention mechanisms. The study applies a ResNet50 architecture pretrained on ImageNet and augmented with Convolutional Block Attention Modules (CBAM) to the Stanford Dogs dataset. Results show significant improvements in classification accuracy, with the model achieving 86.92% accuracy on the test set. This performance gain demonstrates the effectiveness of combining transfer learning and attention mechanisms in overcoming challenges in fine-grained image classification, paving the way for more accurate systems in domains requiring detailed visual analysis.
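The paper's exact implementation is not reproduced here, but the core mechanism it describes, a CBAM block that applies channel attention followed by spatial attention to a convolutional feature map (Woo et al., 2018), can be sketched in plain NumPy. This is a minimal illustrative forward pass only; the weight shapes, reduction ratio, and kernel size are assumptions, and in practice the block would be implemented in a deep-learning framework and inserted after ResNet50's residual stages.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, w1, w2):
    """Gate each channel of x (shape C, H, W) by a learned importance score.

    A shared two-layer MLP (w1: hidden x C, w2: C x hidden) is applied to
    both the average-pooled and max-pooled channel descriptors, and the
    two outputs are summed before the sigmoid, as in CBAM.
    """
    avg = x.mean(axis=(1, 2))                       # (C,) average descriptor
    mx = x.max(axis=(1, 2))                         # (C,) max descriptor
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)    # shared MLP with ReLU
    scale = sigmoid(mlp(avg) + mlp(mx))             # per-channel gate in (0, 1)
    return x * scale[:, None, None]

def spatial_attention(x, kernel):
    """Gate each spatial location by convolving stacked avg/max channel maps."""
    avg = x.mean(axis=0)                            # (H, W)
    mx = x.max(axis=0)                              # (H, W)
    stacked = np.stack([avg, mx])                   # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2                                    # "same" padding
    padded = np.pad(stacked, ((0, 0), (pad, pad), (pad, pad)))
    h, w = avg.shape
    out = np.zeros((h, w))
    for i in range(h):                              # naive 2D convolution
        for j in range(w):
            out[i, j] = np.sum(kernel * padded[:, i:i + k, j:j + k])
    return x * sigmoid(out)[None, :, :]             # per-pixel gate in (0, 1)

def cbam(x, w1, w2, kernel):
    """CBAM: channel attention first, then spatial attention (sequential)."""
    return spatial_attention(channel_attention(x, w1, w2), kernel)
```

Because both attention maps are sigmoid-gated, the block only rescales the feature map (output shape equals input shape), which is what lets CBAM be dropped into a pretrained backbone such as ResNet50 without altering the rest of the architecture.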
References
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., ... & Houlsby, N. (2021). An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations (ICLR 2021). Available at: https://openreview.net/forum?id=YicbFdNTTy.
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 770-778). Available at: https://www.semanticscholar.org/paper/Deep-Residual-Learning-for-Image-Recognition-He-Zhang/2c03df8b48bf3fa39054345bafabfeff15bfd11d.
Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. Available at: https://www.semanticscholar.org/paper/Very-Deep-Convolutional-Networks-for-Large-Scale-Simonyan-Zisserman/eb42cf88027de515750f230b23b1a057dc782108.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems (pp. 5998-6008). Available at: https://www.semanticscholar.org/paper/Attention-is-All-you-Need-Vaswani-Shazeer/204e3073870fae3d05bcbc2f6a8e263d9b72e776.
Woo, S., Park, J., Lee, J. Y., & Kweon, I. S. (2018). CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV) (pp. 3-19). Available at: https://openaccess.thecvf.com/content_ECCV_2018/html/Sanghyun_Woo_Convolutional_Block_Attention_ECCV_2018_paper.html.
Zheng, H., Fu, J., Mei, T., & Luo, J. (2017). Learning multi-attention convolutional neural networks for fine-grained image recognition. In Proceedings of the IEEE International Conference on Computer Vision (pp. 5209-5217). Available at: https://openaccess.thecvf.com/content_ICCV_2017/papers/Zheng_Learning_Multi-Attention_Convolutional_ICCV_2017_paper.pdf.
License
Copyright (c) 2025 Jay Vishal Mehta

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.