20 similar documents were found (search time: 0 ms)
1.
Stochastic gradient descent (SGD)-based optimizers play a key role in most deep learning models, yet the learning dynamics of such complex models remain obscure. SG...
2.
Stochastic gradient descent (SGD) is a widely adopted iterative method for optimizing differentiable objective functions. In this paper, we propose and discuss a novel approach to scale up SGD in applications involving non-convex functions and large datasets. We address the bottleneck problem arising when using both shared and distributed memory. Typically, the former is bounded by limited computation resources and bandwidth, whereas the latter suffers from communication overheads. We propose a unified distributed and parallel implementation of SGD (named DPSGD) that relies on both asynchronous distribution and lock-free parallelism. By combining the two strategies into a unified framework, DPSGD is able to strike a better trade-off between local computation and communication. The convergence properties of DPSGD are studied for non-convex problems such as those arising in statistical modelling and machine learning. Our theoretical analysis shows that DPSGD leads to speed-up with respect to the number of cores and number of workers while guaranteeing an asymptotic convergence rate of \(O(1/\sqrt{T})\), given that the number of cores is bounded by \(T^{1/4}\) and the number of workers is bounded by \(T^{1/2}\), where T is the number of iterations. The potential gains that can be achieved by DPSGD are demonstrated empirically on a stochastic variational inference problem (Latent Dirichlet Allocation) and on a deep reinforcement learning (DRL) problem (advantage actor-critic, A2C), resulting in two algorithms: DPSVI and HSA2C. Empirical results validate our theoretical findings. Comparative studies are conducted to show the performance of the proposed DPSGD against state-of-the-art DRL algorithms.
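The abstract does not spell out DPSGD at the code level, so the following is only a minimal sketch of the lock-free (Hogwild-style) shared-memory ingredient, with a hypothetical least-squares objective, thread count, and step size; Python's GIL serializes the updates, so this illustrates the access pattern rather than a real speed-up.

```python
import numpy as np
import threading

# Hypothetical least-squares problem standing in for a general objective.
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 20))
y = X @ rng.normal(size=20) + 0.1 * rng.normal(size=10_000)

w = np.zeros(20)      # shared parameter vector, updated by all workers without locks
lr = 1e-3

def worker(n_steps: int, batch: int = 32) -> None:
    """Repeatedly sample a mini-batch and apply an SGD step straight to shared memory."""
    local_rng = np.random.default_rng()
    for _ in range(n_steps):
        idx = local_rng.integers(0, X.shape[0], size=batch)
        grad = X[idx].T @ (X[idx] @ w - y[idx]) / batch
        w[:] -= lr * grad             # lock-free in-place update of the shared vector

threads = [threading.Thread(target=worker, args=(2_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("final training loss:", float(np.mean((X @ w - y) ** 2)))
```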
4.
We present two classes of convergent algorithms for learning continuous functions and regressions that are approximated by feedforward networks. The first class of algorithms, applicable to networks with unknown weights located only in the output layer, is obtained by utilizing the potential function methods of Aizerman et al. (1970). The second class, applicable to general feedforward networks, is obtained by utilizing the classical Robbins-Monro style stochastic approximation methods (1951). Conditions relating the sample sizes to the error bounds are derived for both classes of algorithms using martingale-type inequalities. For concreteness, the discussion is presented in terms of neural networks, but the results are applicable to general feedforward networks, in particular to wavelet networks. The algorithms can be directly adapted to concept learning problems.
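The Robbins-Monro scheme mentioned above has a simple generic form that may help fix ideas; the following toy sketch (not the paper's algorithm) finds the root of an expectation that is only observable through noisy samples, with an assumed linear target and the classical 1/k step sizes.

```python
import numpy as np

rng = np.random.default_rng(1)

def noisy_observation(theta: float) -> float:
    """Noisy measurement of g(theta) = theta - 2; the true root is theta* = 2."""
    return (theta - 2.0) + rng.normal(scale=0.5)

theta = 0.0
for k in range(1, 5001):
    a_k = 1.0 / k        # Robbins-Monro step sizes: sum a_k = inf, sum a_k^2 < inf
    theta -= a_k * noisy_observation(theta)

print(theta)             # converges to roughly 2.0
```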
5.
Inversion answers the question of which input patterns to a trained multilayer neural network approximate a given output target. The method is a tool for visualizing the information-processing capability that a network stores in its weights. This knowledge about the network enables one to make informed decisions on improving the training task and choosing training sets. An inversion algorithm for multilayer perceptrons is derived from the backpropagation scheme. We apply inversion to networks for digit recognition. We observe that the multilayer perceptrons perform well with respect to generalization, i.e. correct classification of untrained digits. However, they are poor at rejecting counterexamples, i.e. random patterns. Inversion gives an explanation for this drawback. We suggest an improved training scheme, and we show that a trade-off exists between generalization and rejection of counterexamples.
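A minimal sketch of the iterative inversion idea, i.e. gradient descent on the input of a fixed network, derived from the backpropagation scheme; the one-hidden-layer sigmoid network with random weights below is a hypothetical stand-in for a trained digit classifier.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Stand-in for a trained network (weights would normally come from training).
W1, b1 = rng.normal(size=(30, 64)) * 0.3, np.zeros(30)
W2, b2 = rng.normal(size=(10, 30)) * 0.3, np.zeros(10)

def forward(x):
    h = sigmoid(W1 @ x + b1)
    y = sigmoid(W2 @ h + b2)
    return h, y

target = np.zeros(10)
target[3] = 1.0                      # ask: which inputs produce "digit 3"?

x = rng.uniform(0.4, 0.6, size=64)   # start the inversion from a neutral gray pattern
eta = 0.5
for _ in range(500):
    h, y = forward(x)
    delta2 = (y - target) * y * (1 - y)      # backprop through the output layer
    delta1 = (W2.T @ delta2) * h * (1 - h)   # ... and through the hidden layer
    grad_x = W1.T @ delta1                   # gradient w.r.t. the input; weights stay fixed
    x = np.clip(x - eta * grad_x, 0.0, 1.0)  # keep the inverted input in a valid pixel range

print("network output for the inverted input:", np.round(forward(x)[1], 2))
```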
6.
Conjugate gradient methods have many advantages in real numerical experiments, such as fast convergence and low memory requirements. This paper considers a class of conjugate gradient learning methods for backpropagation neural networks with three layers. We propose a new learning algorithm for almost cyclic learning of neural networks based on the PRP (Polak-Ribière-Polyak) conjugate gradient method. We then establish the deterministic convergence properties for three different learning modes, i.e., batch, cyclic, and almost cyclic learning. The two deterministic convergence properties are weak and strong convergence, indicating respectively that the gradient of the error function goes to zero and that the weight sequence converges to a fixed point. It is shown that the deterministic convergence results depend on the learning mode and on the strategy used to select the learning rate. Illustrative numerical examples are given to support the theoretical analysis.
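For reference, the textbook Polak-Ribière-Polyak direction update the abstract refers to is sketched below on an assumed quadratic error function with an exact line search; the real algorithm operates on a three-layer network error, so treat this as an illustration of the PRP formula only.

```python
import numpy as np

# Illustrative quadratic error E(w) = 0.5 w^T A w - b^T w (an assumption for the demo).
A = np.diag([1.0, 4.0, 9.0])
b = np.array([1.0, 1.0, 1.0])
grad_E = lambda w: A @ w - b

w = np.zeros(3)
g = grad_E(w)
d = -g                                       # first search direction: steepest descent
for _ in range(50):
    alpha = -(g @ d) / (d @ A @ d)           # exact line search, possible because E is quadratic
    w = w + alpha * d
    g_new = grad_E(w)
    if np.linalg.norm(g_new) < 1e-12:        # converged
        g = g_new
        break
    beta = g_new @ (g_new - g) / (g @ g)     # PRP: beta_k = g_k^T (g_k - g_{k-1}) / ||g_{k-1}||^2
    d = -g_new + beta * d                    # new conjugate direction
    g = g_new

print("gradient norm:", np.linalg.norm(g))   # approaches zero ("weak convergence")
```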
7.
A gradient method with momentum for two-layer feedforward neural networks is considered. The learning rate is set to a constant and the momentum factor to an adaptive variable. Both weak and strong convergence results are proved, as well as convergence rates for the error function and for the weights. Compared to existing convergence results, our results are more general since we do not require the error function to be quadratic.
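A minimal sketch of the gradient-with-momentum iteration on an assumed non-quadratic, two-parameter error function; the constant learning rate matches the setting above, while the fixed momentum factor is a simplification of the paper's adaptive one.

```python
import numpy as np

# Toy non-quadratic error function and its gradient (illustrative assumption).
E = lambda w: (w[0] - 1.0) ** 2 + 2.0 * np.sin(w[1]) ** 2
grad_E = lambda w: np.array([2.0 * (w[0] - 1.0), 2.0 * np.sin(2.0 * w[1])])

w = np.array([3.0, 1.0])
delta_w = np.zeros_like(w)
eta, tau = 0.1, 0.5              # constant learning rate; fixed momentum factor (simplified)

for _ in range(300):
    delta_w = -eta * grad_E(w) + tau * delta_w   # momentum reuses the previous weight change
    w = w + delta_w

print("E(w):", E(w), "   gradient norm:", np.linalg.norm(grad_E(w)))
```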
8.
The problem of inverting trained feedforward neural networks is to find the inputs which yield a given output. In general, this problem is an ill-posed problem. We present a method for dealing with the inverse problem by using mathematical programming techniques. The principal idea behind the method is to formulate the inverse problem as a nonlinear programming problem, a separable programming (SP) problem, or a linear programming problem according to the architectures of networks to be inverted or the types of network inversions to be computed. An important advantage of the method over the existing iterative inversion algorithm is that various designated network inversions of multilayer perceptrons and radial basis function neural networks can be obtained by solving the corresponding SP problems, which can be solved by a modified simplex method. We present several examples to demonstrate the proposed method and applications of network inversions to examine and improve the generalization performance of trained networks. The results show the effectiveness of the proposed method.
9.
Recurrent neural networks can be used to map input sequences to output sequences, such as for recognition, production or prediction problems. However, practical difficulties have been reported in training recurrent neural networks to perform tasks in which the temporal contingencies present in the input/output sequences span long intervals. We show why gradient-based learning algorithms face an increasingly difficult problem as the duration of the dependencies to be captured increases. These results expose a trade-off between efficient learning by gradient descent and latching onto information for long periods. Based on an understanding of this problem, alternatives to standard gradient descent are considered.
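A toy numerical illustration (not taken from the paper) of why the problem gets harder with the span of the dependency: backpropagation through a simple recurrent cell multiplies the gradient by one Jacobian per time step, and in the stable "latching" regime the norm of this product shrinks roughly exponentially.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
# Random recurrent weights scaled so the spectral radius is about 0.8 (stable/latching regime).
W = rng.normal(scale=0.8 / np.sqrt(n), size=(n, n))

# Forward pass of h_t = tanh(W h_{t-1}) for T steps.
T = 100
h = [rng.normal(size=n)]
for _ in range(T):
    h.append(np.tanh(W @ h[-1]))

# Backpropagate a unit gradient from the final step and watch its norm decay.
g = rng.normal(size=n)
g /= np.linalg.norm(g)
for steps_back, t in enumerate(range(T, 0, -1), start=1):
    g = W.T @ ((1.0 - h[t] ** 2) * g)     # multiply by the transposed Jacobian of step t
    if steps_back in (1, 10, 50, 100):
        print(f"{steps_back:4d} steps back: gradient norm = {np.linalg.norm(g):.3e}")
```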
10.
Non-convex models, like deep neural networks, have been widely used in machine learning applications. Training non-convex models is a difficult task owing to the s...
11.
Optimizing the training speed of support vector machines (SVMs) is one of the most important topics in SVM research. In this paper, we propose an algorithm in which the size of the working set is reduced to one in order to obtain faster training. Instead of complex heuristic criteria, a random order is adopted for selecting elements into the working set. The proposed algorithm shows better performance in linear SVM training, especially in large-scale scenarios.
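A minimal sketch of what a size-one working set with random selection order can look like, using the standard single-coordinate update on the dual of a linear L1-loss SVM; this is a generic textbook update on assumed toy data, not necessarily the exact rule of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linearly separable data with labels in {-1, +1} (illustrative assumption).
X = np.vstack([rng.normal(1.5, 1.0, size=(100, 2)), rng.normal(-1.5, 1.0, size=(100, 2))])
y = np.hstack([np.ones(100), -np.ones(100)])
C = 1.0

n = X.shape[0]
alpha = np.zeros(n)
w = np.zeros(X.shape[1])
sq_norm = np.einsum("ij,ij->i", X, X)          # precomputed ||x_i||^2

for epoch in range(20):
    for i in rng.permutation(n):               # working set of size one, chosen in random order
        G = y[i] * (w @ X[i]) - 1.0            # gradient of the dual objective w.r.t. alpha_i
        alpha_new = np.clip(alpha[i] - G / sq_norm[i], 0.0, C)
        w += (alpha_new - alpha[i]) * y[i] * X[i]   # keep w = sum_j alpha_j y_j x_j up to date
        alpha[i] = alpha_new

print("training accuracy:", np.mean(np.sign(X @ w) == y))
```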
12.
Data Mining and Knowledge Discovery - Matrix tri-factorization subject to binary constraints is a versatile and powerful framework for the simultaneous clustering of observations and features, also...
13.
Recently, some researchers have focused on applications of neural networks to system identification problems. In this letter we describe how to use the gradient descent (GD) technique with single-layer neural networks to identify the parameters of a linear dynamical system whose states and state derivatives are given. It is shown that using the GD technique for identifying a linear time-invariant dynamical system is simpler and cheaper to implement than the technique using the Hopfield network discussed by Chu, because it involves less hardware. The circuit is considered to be faster and is recommended for online computation because of its parallel architecture and the possibility of using analog circuit components. A mathematical formulation of the technique is presented and simulation results of the network are included.
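A minimal sketch of the underlying estimation problem, under the assumption of a toy linear time-invariant system \(\dot{x} = A x\) whose states and state derivatives are available as data; plain gradient descent on the squared derivative residual recovers the system matrix (the analog-circuit aspects are of course not captured here).

```python
import numpy as np

rng = np.random.default_rng(0)

# True system x_dot = A_true x (the 2x2 matrix is an illustrative assumption).
A_true = np.array([[0.0, 1.0],
                   [-2.0, -0.5]])
X = rng.normal(size=(2, 200))        # sampled states
X_dot = A_true @ X                   # corresponding state derivatives (assumed measured)

A_hat = np.zeros((2, 2))
eta = 1e-3
for _ in range(2000):
    residual = X_dot - A_hat @ X     # prediction error on the derivatives
    A_hat += eta * residual @ X.T    # gradient step on E(A) = 0.5 * ||X_dot - A X||_F^2

print(np.round(A_hat, 3))            # approaches A_true
```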
14.
The problem of the necessary complexity of neural networks is of interest in applications. In this paper, learning capability and storage capacity of feedforward neural networks are considered. We markedly improve recent results by logically introducing neural-network modularity. This paper rigorously proves, by a constructive method, that two-hidden-layer feedforward networks (TLFNs) with \(2\sqrt{(m+2)N}\) (\(\ll N\)) hidden neurons can learn any N distinct samples \((x_i, t_i)\) with arbitrarily small error, where m is the required number of output neurons. It implies that the number of hidden neurons needed in feedforward networks can be decreased significantly compared with previous results. Conversely, a TLFN with Q hidden neurons can store at least \(Q^2/(4(m+2))\) distinct data points \((x_i, t_i)\) with any desired precision.
15.
This paper presents an approach to learning polynomial feedforward neural networks (PFNNs). The approach suggests, first, finding the polynomial network structure by means of a population-based search technique relying on the genetic programming paradigm and, second, further adjusting the weights of the best discovered network with a specially derived backpropagation algorithm for higher-order networks with polynomial activation functions. These two stages of the PFNN learning process enable us to identify networks with good training as well as generalization performance. Empirical results show that this approach finds PFNNs which considerably outperform some previous constructive polynomial network algorithms on benchmark time series.
16.
Boltzmann-based models with asymmetric connections are investigated. Although they are initially unstable, these networks spontaneously self-stabilize as a result of learning. Moreover, pairs of weights symmetrize during learning; however, the symmetry is not enough to account for the observed stability. To characterize the system it is useful to consider how its entropy is affected by learning and the entropy of the information stream. The stability of an asymmetric network is confirmed with an electronic model.
17.
In this paper, optimal control for a stochastic linear singular system with a quadratic performance index is obtained using neural networks. The goal is to provide optimal control with reduced computational effort by comparing the solutions of the matrix Riccati differential equation (MRDE) obtained from the well-known traditional Runge-Kutta (RK) method and from a non-traditional neural network method. To obtain the optimal control, the solution of the MRDE is computed by a feedforward neural network (FFNN). The accuracy of the neural network solution is qualitatively better. The advantage of the proposed approach is that, once the network is trained, it allows instantaneous evaluation of the solution at any desired number of points while spending negligible computing time and memory. The computation time of the proposed method is shorter than that of the traditional RK method. An illustrative numerical example is presented for the proposed method.
18.
Networks of linear units are the simplest kind of networks, where the basic questions related to learning, generalization, and self-organization can sometimes be answered analytically. We survey most of the known results on linear networks, including: 1) backpropagation learning and the structure of the error function landscape, 2) the temporal evolution of generalization, and 3) unsupervised learning algorithms and their properties. The connections to classical statistical ideas, such as principal component analysis (PCA), are emphasized as well as several simple but challenging open questions. A few new results are also spread across the paper, including an analysis of the effect of noise on backpropagation networks and a unified view of all unsupervised algorithms.
19.
Jankowski et al. (1996) proposed a complex-valued neural network (CVNN) that is capable of storing and recalling gray-scale images. The convergence property of the CVNN has also been proven by means of the energy-function approach. However, the memory capacity of the CVNN is very low because a generalized Hebb rule is used to construct the connection matrix. In this letter, a modified gradient descent learning rule (MGDR) is proposed to enhance the capacity of the CVNN. The proposed technique is derived by applying gradient search over a complex error surface. Simulations show that the capacity of the CVNN with MGDR is greatly improved.
20.
The problem of training feedforward neural networks is considered. To solve it, new algorithms are proposed. They are based on the asymptotic analysis of the extended Kalman filter (EKF) and on a separable network structure. Linear weights are interpreted as diffusion random variables with zero expectation and a covariance matrix proportional to an arbitrarily large parameter λ. Asymptotic expressions for the EKF are derived as λ → ∞. They are called diffusion learning algorithms (DLAs). It is shown that, in contrast to their prototype, the EKF with a large but finite λ, they are robust with respect to the accumulation of rounding errors, and that, under certain simplifying assumptions, an extreme learning machine (ELM) algorithm can be obtained from a DLA. A numerical example shows that the accuracy of a DLA may be higher than that of an ELM algorithm.
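The diffusion learning algorithms cannot be reconstructed from the abstract alone, but the ELM baseline they reduce to has a well-known basic form: random, untrained hidden-layer weights and output weights fitted by linear least squares. A minimal sketch under those standard assumptions, on a hypothetical regression task:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression task: learn y = sin(x) on [-3, 3] (illustrative assumption).
x = np.linspace(-3.0, 3.0, 200).reshape(-1, 1)
t = np.sin(x)

n_hidden = 40
W = rng.normal(size=(1, n_hidden))     # random input-to-hidden weights, never trained
b = rng.normal(size=n_hidden)          # random hidden biases, never trained

H = np.tanh(x @ W + b)                 # hidden-layer outputs: the model is linear in the output weights
beta, *_ = np.linalg.lstsq(H, t, rcond=None)   # output weights by linear least squares

pred = H @ beta
print("training RMSE:", float(np.sqrt(np.mean((pred - t) ** 2))))
```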