ADAPTIVE RECTIFIED LINEAR UNIT FOR DEEP NEURAL NETWORK ARCHITECTURE
DOI: https://doi.org/10.61591/jslhu.26.1075
Keywords: Image Classification, Deep Learning, Computer Vision, Activation Function
Abstract
The activation function is a crucial component of many deep neural network structures because it helps the network generalize when learning more complex mappings between inputs and outputs. Many activation functions can be used to build a neural network model, including the Rectified Linear Unit (ReLU) and its extended forms. By setting negative values to zero, ReLU produces sparse representations and provides nonlinearity during training. Although ReLU works well in almost all Deep Neural Network structures, it still suffers from the problem of dying neurons, especially when weights are randomly initialized. This paper introduces a novel method for activating neurons in a Deep Neural Network layer. Instead of always setting negative values to zero, the method automatically calculates the distribution of positive and negative values in each layer before applying the activation function. The values that account for the larger share of that distribution are then suppressed. Experiments with a simple Deep Neural Network structure on the public MNIST, CIFAR10, and CIFAR100 datasets show that the proposed method converges much more stably than ReLU. It can therefore be used as a default activation function when training a Deep Neural Network model.
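The abstract does not give the exact formulation, so the following is only a minimal sketch of one possible reading of the idea, not the authors' implementation. It assumes that "distribution" means the counts of positive and negative pre-activations in a layer, that "suppressed" means set to zero, and that ties fall back to ordinary ReLU. The function name `adaptive_relu_sketch` is hypothetical, and NumPy is used for brevity.

```python
import numpy as np


def relu(x):
    """Standard ReLU: negative values are always set to zero."""
    return np.maximum(np.asarray(x, dtype=float), 0.0)


def adaptive_relu_sketch(x):
    """Zero out whichever sign is more frequent in the layer (assumed reading).

    Assumptions, none confirmed by the abstract:
      * the distribution is measured by counting positive and negative values
      * suppression means replacing the value with zero
      * on a tie (or when negatives dominate) the function behaves like ReLU
    """
    x = np.asarray(x, dtype=float)
    n_pos = int(np.count_nonzero(x > 0))
    n_neg = int(np.count_nonzero(x < 0))
    if n_pos > n_neg:
        # Positive values dominate: suppress them and keep the negatives.
        return np.where(x > 0, 0.0, x)
    # Negative values dominate, or the counts are equal: same result as ReLU.
    return np.where(x < 0, 0.0, x)


if __name__ == "__main__":
    mostly_negative = [-2.0, -1.0, -0.5, 3.0]
    mostly_positive = [2.0, 1.0, 0.5, -3.0]
    print(adaptive_relu_sketch(mostly_negative))
    print(adaptive_relu_sketch(mostly_positive))
    print(relu(mostly_positive))
```

Under these assumptions the function matches ReLU on a layer dominated by negative values and differs only when positive values are in the majority, in which case the minority negative values pass through unchanged. How the published method handles ties, scaling, or gradients may differ.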