How the Sigmoid Activation Function Causes Vanishing Gradients in Deep Neural Networks
DOI:
DOI: https://doi.org/10.58445/rars.3471
Keywords: Sigmoid Activation, Neural Networks, Gradient Flow
Abstract
Backpropagation is the fundamental algorithm that enables neural networks to learn; networks rely on it to update their parameters. The effectiveness of this algorithm, however, depends on maintaining a sufficiently large gradient across layers, and some activation functions, the sigmoid in particular, significantly reduce gradient flow. This review examines how the derivative of the sigmoid function contributes to the vanishing gradient problem: it analyses the derivative's mathematical form, shows that it flattens toward zero over a large range of input values, and explains why this makes gradient propagation inefficient in deep networks. A small experiment demonstrates that networks using the sigmoid train more slowly than networks using ReLU. The review concludes by highlighting why the sigmoid has largely been replaced in modern neural networks.
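The mechanism described in the abstract can be sketched in a few lines of Python (a minimal illustration, not taken from the paper's experiment): the sigmoid derivative sigma(x)(1 - sigma(x)) peaks at 0.25 and decays rapidly for large |x|, so a chain of sigmoid layers multiplies the backpropagated gradient by at most 0.25 per layer.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x: float) -> float:
    """Derivative sigma(x) * (1 - sigma(x)); maximum value 0.25 at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)

# Near the origin the derivative reaches its peak of 0.25;
# for saturated inputs it is already close to zero.
print(sigmoid_derivative(0.0))   # 0.25, the maximum
print(sigmoid_derivative(5.0))   # ~0.0066, near-saturated input

# Because each sigmoid layer contributes a factor of at most 0.25,
# the gradient through n such layers is bounded above by 0.25 ** n.
print(0.25 ** 10)                # upper bound after 10 layers, ~9.5e-07
```

Even in the best case (all pre-activations exactly zero), ten stacked sigmoid layers shrink the gradient by roughly six orders of magnitude, which is the core of the vanishing-gradient argument the paper develops.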
License
Copyright (c) 2025 Archit Roy

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
You are free to:
- Share — copy and redistribute the material in any medium or format.
- The licensor cannot revoke these freedoms as long as you follow the license terms.
Under the following terms:
- Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
- NonCommercial — You may not use the material for commercial purposes.
- NoDerivatives — If you remix, transform, or build upon the material, you may not distribute the modified material.
- No additional restrictions — You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.