Preprint / Version 1

Enhancing Fine-Grained Image Classification with Attention Mechanisms and Transfer Learning: A Case Study on the Stanford Dogs Dataset


Authors

  • Jay Vishal Mehta, Polygence

DOI:

https://doi.org/10.58445/rars.2355

Keywords:

computer vision, deep learning, transfer learning, attention mechanisms, fine-grained classification

Abstract

This research explores enhancing fine-grained image classification with transfer learning and attention mechanisms. The study applies a ResNet50 architecture pretrained on ImageNet and augmented with the Convolutional Block Attention Module (CBAM) to the Stanford Dogs dataset. The combined model achieves 86.92% classification accuracy on the test set, a significant gain that demonstrates the effectiveness of pairing transfer learning with attention mechanisms to overcome the challenges of fine-grained image classification, paving the way for more accurate systems in domains that require detailed visual analysis.
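
The page does not include the implementation, but the architecture described in the abstract can be sketched in a few lines of PyTorch. The CBAM block below follows the channel-then-spatial attention design of Woo et al. (2018); the insertion points (after each residual stage), the reduction ratio of 16, and the helper name resnet50_cbam are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch: ImageNet-pretrained ResNet50 augmented with CBAM for
# the 120-class Stanford Dogs task. Assumes torch and torchvision >= 0.13.
import torch
import torch.nn as nn
from torchvision import models

class CBAM(nn.Module):
    """Convolutional Block Attention Module (Woo et al., 2018):
    channel attention followed by spatial attention."""
    def __init__(self, channels: int, reduction: int = 16, kernel_size: int = 7):
        super().__init__()
        # Channel attention: shared MLP applied to avg- and max-pooled descriptors.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )
        # Spatial attention: 7x7 conv over channel-wise avg and max maps.
        self.spatial = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Channel attention: weight each channel by a sigmoid gate.
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))
        x = x * torch.sigmoid(avg + mx)
        # Spatial attention: weight each spatial location by a sigmoid gate.
        s = torch.cat([x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))

def resnet50_cbam(num_classes: int = 120) -> nn.Module:
    """Pretrained ResNet50 with a CBAM block after each residual stage
    (an assumed placement) and a new head for the 120 Stanford Dogs classes."""
    backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
    for name, channels in [("layer1", 256), ("layer2", 512),
                           ("layer3", 1024), ("layer4", 2048)]:
        stage = getattr(backbone, name)
        setattr(backbone, name, nn.Sequential(stage, CBAM(channels)))
    backbone.fc = nn.Linear(backbone.fc.in_features, num_classes)
    return backbone

model = resnet50_cbam()
logits = model(torch.randn(1, 3, 224, 224))  # -> shape (1, 120)
```

Wrapping each stage in an nn.Sequential leaves the pretrained ResNet50 weights intact, so only the CBAM blocks and the new 120-way classification head start from scratch during fine-tuning.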

References

Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., ... & Houlsby, N. (2021). An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations (ICLR). Available at: https://openreview.net/forum?id=YicbFdNTTy.

He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 770-778). Available at: https://www.semanticscholar.org/paper/Deep-Residual-Learning-for-Image-Recognition-He-Zhang/2c03df8b48bf3fa39054345bafabfeff15bfd11d.

Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. Available at: https://www.semanticscholar.org/paper/Very-Deep-Convolutional-Networks-for-Large-Scale-Simonyan-Zisserman/eb42cf88027de515750f230b23b1a057dc782108.

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems (pp. 5998-6008). Available at: https://www.semanticscholar.org/paper/Attention-is-All-you-Need-Vaswani-Shazeer/204e3073870fae3d05bcbc2f6a8e263d9b72e776.

Woo, S., Park, J., Lee, J. Y., & Kweon, I. S. (2018). CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV) (pp. 3-19). Available at: https://openaccess.thecvf.com/content_ECCV_2018/html/Sanghyun_Woo_Convolutional_Block_Attention_ECCV_2018_paper.html.

Zheng, H., Fu, J., Mei, T., & Luo, J. (2017). Learning multi-attention convolutional neural networks for fine-grained image recognition. In Proceedings of the IEEE International Conference on Computer Vision (pp. 5209-5217). Available at: https://openaccess.thecvf.com/content_ICCV_2017/papers/Zheng_Learning_Multi-Attention_Convolutional_ICCV_2017_paper.pdf.

Posted:

2025-03-23