How the Sigmoid Activation Function Causes Vanishing Gradients in Deep Neural Networks
DOI:
DOI: https://doi.org/10.58445/rars.3471
Keywords: Sigmoid Activation, Neural Networks, Gradient Flow
Abstract
Backpropagation is the fundamental algorithm that enables neural networks to learn; networks rely on it to update their parameters. The effectiveness of this algorithm, however, depends on maintaining a sufficiently large gradient across layers, and some activation functions, the sigmoid in particular, significantly reduce gradient flow. This review examines how the derivative of the sigmoid function contributes to the vanishing gradient problem: it analyses the derivative's mathematical form, shows that it flattens toward zero over a large range of input values, and explains why this makes gradient propagation inefficient in deep networks. A small experiment demonstrates that networks using the sigmoid train more slowly than networks using ReLU. The review concludes by highlighting why the sigmoid has largely been replaced in modern neural networks.
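The mechanism described in the abstract can be sketched in a few lines of Python (a minimal illustration, not taken from the paper's experiment): the sigmoid derivative sigma(x)(1 - sigma(x)) peaks at 0.25 and decays rapidly for large |x|, so a chain of sigmoid layers multiplies the backpropagated gradient by at most 0.25 per layer.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x: float) -> float:
    """Derivative sigma(x) * (1 - sigma(x)); maximum value 0.25 at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)

# Near the origin the derivative reaches its peak of 0.25;
# for saturated inputs it is already close to zero.
print(sigmoid_derivative(0.0))   # 0.25, the maximum
print(sigmoid_derivative(5.0))   # ~0.0066, near-saturated input

# Because each sigmoid layer contributes a factor of at most 0.25,
# the gradient through n such layers is bounded above by 0.25 ** n.
print(0.25 ** 10)                # upper bound after 10 layers, ~9.5e-07
```

Even in the best case (all pre-activations exactly zero), ten stacked sigmoid layers shrink the gradient by roughly six orders of magnitude, which is the core of the vanishing-gradient argument the paper develops.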
License
Copyright (c) 2025 Archit Roy

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
You are free to:
- Share — copy and redistribute the material in any medium or format.
- The licensor cannot revoke these freedoms as long as you follow the license terms.
Under the following terms:
- Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
- NonCommercial — You may not use the material for commercial purposes.
- NoDerivatives — If you remix, transform, or build upon the material, you may not distribute the modified material.
- No additional restrictions — You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.