Experimental exploration on loss surface of deep neural network |
| |
Authors: | Qunyong Yuan, Nanfeng Xiao |
| |
Affiliation: | School of Computer Science and Engineering, South China University of Technology, Guangzhou, China |
| |
Abstract: | The loss function of a deep neural network is high-dimensional, nonconvex, and complex, and the geometric properties of its loss surface are still not well understood. Unlike most studies of the loss surface, which are theoretical, this article explores the loss surface of deep neural networks experimentally. It examines the trajectories of various adaptive optimization algorithms, the Hessian matrix of the loss function, and the curvature of the loss surface along those trajectories. The gradient directions of the adaptive optimization algorithms are found to be almost perpendicular to the direction of maximum curvature of the loss surface, whereas the gradient directions of the stochastic gradient descent (SGD) algorithm show no such regularity. The Hessian matrix of the loss surface along the trajectory of the optimization algorithm is degenerate, which is inconsistent with the assumption of a nonsingular Hessian matrix made in many theoretical studies of deep learning. In addition, this article proposes a new ensemble learning method for neural networks based on the scaling invariance of ReLU neural networks and on mode connectivity. |
| |
Keywords: | loss surface of deep neural network; Hessian matrix; deep neural network; the trajectories of the various adaptive optimizations; curvature of the loss surface; ensemble learning |
|
|
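The scaling invariance of ReLU networks mentioned in the abstract can be illustrated with a small numerical check. The sketch below is not the authors' ensemble method, which the abstract does not describe in detail. It only shows the underlying property on a hypothetical two-layer network: because relu(a*z) = a*relu(z) for any a > 0, multiplying the first layer's weights and bias by a and dividing the second layer's weights by a leaves the output unchanged. The layer sizes, the random seed, the value of `alpha`, and the helper names `relu` and `forward` are all assumptions made for illustration.

```python
import numpy as np


def relu(z):
    # Elementwise ReLU activation.
    return np.maximum(z, 0.0)


def forward(x, W1, b1, W2, b2):
    # Hypothetical two-layer network: linear -> ReLU -> linear.
    return W2 @ relu(W1 @ x + b1) + b2


# Random parameters and input (sizes chosen arbitrarily for the demo).
rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 4))
b1 = rng.normal(size=8)
W2 = rng.normal(size=(3, 8))
b2 = rng.normal(size=3)
x = rng.normal(size=4)

alpha = 2.5  # any positive scale factor

y_original = forward(x, W1, b1, W2, b2)
# Scale the hidden layer up by alpha and the output weights down by alpha.
y_rescaled = forward(x, alpha * W1, alpha * b1, W2 / alpha, b2)

# The two parameter settings differ but define the same function,
# so the difference should be at floating-point rounding level.
max_abs_diff = float(np.max(np.abs(y_original - y_rescaled)))
print(max_abs_diff)
```

Two distinct points in parameter space therefore give the same loss value, which is one reason the loss surface contains flat directions and the Hessian can be singular along them.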