Similar Literature
20 similar documents found (search time: 31 ms)
1.
We consider the use of a curvature-adaptive step size in gradient-based iterative methods, including quasi-Newton methods, for minimizing self-concordant functions, extending an approach first proposed for Newton's method by Nesterov. This step size has a simple expression that can be computed analytically; hence, line searches are not needed. We show that using this step size in the BFGS method (and quasi-Newton methods in the Broyden convex class other than the DFP method) results in superlinear convergence for strongly convex self-concordant functions. We present numerical experiments comparing gradient descent and BFGS methods using the curvature-adaptive step size to traditional methods on deterministic logistic regression problems, and to versions of stochastic gradient descent on stochastic optimization problems.
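For reference, here is a minimal Python sketch of the Nesterov damped-Newton scheme that the abstract cites as its starting point: the step size 1/(1 + lam) is computed analytically from the Newton decrement, so no line search is needed. The extension to gradient and quasi-Newton directions studied in the paper uses an analogous analytic step that is not reproduced here; the function names and tolerances below are illustrative assumptions.

import numpy as np

def damped_newton(grad, hess, x0, tol=1e-10, max_iter=100):
    """Nesterov's damped Newton step for a self-concordant f: no line search,
    the step size 1/(1 + lam) comes from the Newton decrement
    lam = sqrt(g^T H^{-1} g)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        d = np.linalg.solve(hess(x), g)        # Newton direction H^{-1} g
        lam = np.sqrt(g @ d)                   # Newton decrement
        if lam < tol:
            break
        x = x - d / (1.0 + lam)                # curvature-adaptive step size
    return x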

2.
Research Progress on Stochastic Gradient Descent Algorithms   Cited by: 6 (self-citations: 1, citations by others: 5)
In machine learning, gradient descent is the most important and fundamental method for solving optimization problems. As data scales keep growing, traditional gradient descent can no longer solve large-scale machine learning problems effectively. Stochastic gradient descent (SGD) replaces the full gradient with the gradient of one or a few randomly selected samples at each iteration, so as to reduce the computational cost. In recent years, SGD has become a focus of research in machine learning and, in particular, deep learning. Continued exploration of search directions and step sizes has produced numerous improved variants of SGD, and this paper surveys the main progress on these algorithms. The improvement strategies are roughly divided into four categories: momentum, variance reduction, incremental gradients, and adaptive learning rates. The first three mainly correct the gradient or search direction, while the fourth adapts the step size separately for each component of the parameter vector. The core ideas and principles of the SGD variants under each strategy are described, and the differences and connections among the algorithms are discussed. The main SGD algorithms are applied to machine learning tasks such as logistic regression and deep convolutional neural networks, and their practical performance is compared quantitatively. The paper closes with a summary of the work and an outlook on future directions for stochastic gradient descent.
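As a minimal illustration of the momentum strategy discussed in the survey, the following Python sketch implements mini-batch SGD with heavy-ball momentum; the learning rate, momentum coefficient, and the assumed interface (data as a NumPy array of samples, grad_batch(w, batch) returning the mini-batch gradient) are illustrative, not taken from the paper.

import numpy as np

def sgd_momentum(grad_batch, w0, data, lr=0.01, beta=0.9, epochs=10, batch=32):
    """Mini-batch SGD with heavy-ball momentum: v <- beta*v + g, w <- w - lr*v."""
    w = np.asarray(w0, dtype=float)
    v = np.zeros_like(w)
    n = len(data)
    for _ in range(epochs):
        idx = np.random.permutation(n)
        for start in range(0, n, batch):
            g = grad_batch(w, data[idx[start:start + batch]])  # stochastic gradient
            v = beta * v + g
            w = w - lr * v
    return w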

3.
In adaptive beamforming, the conjugate gradient method is a common way to solve the underlying optimization problem, while the steepest descent method seeks the optimal weight vector recursively without requiring matrix inversion. This paper combines the steepest descent and conjugate gradient methods into a hybrid optimization algorithm. At each update iteration, the method searches along the negative gradient direction with an optimal adaptive step size, which both accelerates the convergence of the conjugate gradient algorithm and mitigates the slow convergence that steepest descent suffers as the spread of the correlation-matrix eigenvalues increases; the resulting algorithm converges quickly at low computational cost. A computer simulation of a digital beamforming system with a five-element uniform linear array is presented, and the method is compared with the conventional LMS algorithm in terms of beam pattern, error convergence, and optimal weights. The results demonstrate the feasibility and effectiveness of the approach.
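The following is a hedged sketch of a steepest-descent weight update with an exact (optimal) step size for a quadratic beamforming cost J(w) = 0.5 w^H R w - Re(w^H p); it illustrates only the optimal-adaptive-step idea and omits the paper's conjugate-gradient hybridization and array-specific details.

import numpy as np

def steepest_descent_weights(R, p, w0, tol=1e-10, max_iter=200):
    """Steepest descent for the quadratic cost J(w) = 0.5 w^H R w - Re(w^H p),
    with the exact step size mu = (g^H g)/(g^H R g) recomputed every iteration."""
    w = np.asarray(w0, dtype=complex)
    for _ in range(max_iter):
        g = R @ w - p                                     # gradient of the cost
        if np.linalg.norm(g) < tol:
            break
        mu = np.real(np.vdot(g, g) / np.vdot(g, R @ g))   # optimal adaptive step
        w = w - mu * g
    return w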

4.
The Barzilai–Borwein (BB) gradient method compares favourably with the classical steepest descent method both in theory and in real computations. This method takes a ‘fixed’ step size rather than following a set of line search rules to ensure convergence. Along this line, we present a new approach for the two-point approximation to the quasi-Newton equation within the BB framework on the basis of a well-known least change result for the Davidon–Fletcher–Powell update and propose a new gradient method that belongs to the same class of BB gradient methods, in which the line search procedure is replaced by a fixed step size. Some preliminary numerical results suggest that improvements have been achieved.
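As a point of reference, a minimal sketch of the classical BB gradient method with the standard step size alpha = (s^T s)/(s^T y); the new DFP-based two-point step size proposed in the paper is not reproduced here, and the safeguards below are illustrative.

import numpy as np

def bb_gradient(grad, x0, alpha0=1e-3, tol=1e-8, max_iter=1000):
    """Gradient method with the Barzilai-Borwein step alpha = (s^T s)/(s^T y),
    where s = x_k - x_{k-1} and y = g_k - g_{k-1}; no line search is used."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    alpha = alpha0
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        x_new = x - alpha * g
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g
        sty = s @ y
        alpha = (s @ s) / sty if sty > 1e-16 else alpha0   # BB1 step size
        x, g = x_new, g_new
    return x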

5.
In this paper a general class of fast learning algorithms for feedforward neural networks is introduced and described. The approach exploits the separability of each layer into linear and nonlinear blocks and consists of two steps. The first step is the descent of the error functional in the space of the outputs of the linear blocks (descent in the neuron space), which can be performed using any preferred optimization strategy. In the second step, each linear block is optimized separately by using a least squares (LS) criterion. To demonstrate the effectiveness of the new approach, a detailed treatment of a gradient descent in the neuron space is conducted. The main properties of this approach are a higher speed of convergence with respect to methods that employ an ordinary gradient descent in the weight space, such as backpropagation (BP), better numerical conditioning, and lower computational cost compared to techniques based on the Hessian matrix. Numerical stability is assured by the use of robust LS linear system solvers, operating directly on the input data of each layer. Experimental results obtained in three problems are described, which confirm the effectiveness of the new method.
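A hedged sketch of the two-step idea for a single linear block (here, a linear output layer on top of fixed hidden activations): first a gradient step on the block's outputs (descent in the neuron space), then a least-squares refit of the block's weights. The tiny interface and the squared-error assumption are illustrative only.

import numpy as np

def train_output_block(H, T, W, eta=0.5, steps=20):
    """One linear block (the output layer) trained by the two-step scheme:
    (1) gradient descent on the block's outputs Z = H @ W (neuron space),
    (2) least-squares refit of W so that H @ W matches the updated outputs.
    H: hidden activations (n_samples x n_hidden), T: targets, W: block weights."""
    for _ in range(steps):
        Z = H @ W
        Z_target = Z - eta * (Z - T)                         # descent step in output space
        W, *_ = np.linalg.lstsq(H, Z_target, rcond=None)     # LS refit of the linear block
    return W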

6.
For gradient descent learning to yield connectivity consistent with real biological networks, the simulated neurons would have to include more realistic intrinsic properties such as frequency adaptation. However, gradient descent learning cannot be used straightforwardly with adapting rate-model neurons because the derivative of the activation function depends on the activation history. The objectives of this study were to (1) develop a simple computational approach to reproduce mathematical gradient descent and (2) use this computational approach to provide supervised learning in a network formed of rate-model neurons that exhibit frequency adaptation. The results of mathematical gradient descent were used as a reference in evaluating the performance of the computational approach. For this comparison, standard (nonadapting) rate-model neurons were used for both approaches. The only difference was the gradient calculation: the mathematical approach used the derivative at a point in weight space, while the computational approach used the slope for a step change in weight space. Theoretically, the results of the computational approach should match those of the mathematical approach as the step size is reduced, but floating-point accuracy imposed a lower limit on usable step sizes. A systematic search for an optimal step size yielded a computational approach that faithfully reproduced the results of mathematical gradient descent. The computational approach was then used for supervised learning of both connection weights and intrinsic properties of rate-model neurons to convert a tonic input into a phasic-tonic output pattern. Learning produced biologically realistic connectivity that relied essentially on a monosynaptic connection from the tonic input neuron to an output neuron with strong frequency adaptation, in contrast to the complex network obtained with nonadapting neurons. Thus, more biologically realistic connectivity was achieved by implementing rate-model neurons with more realistic intrinsic properties. Our computational approach could be applied to learning of other neuron properties.
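A minimal sketch of the "computational" gradient described above: the slope of the error for a finite step change in each weight, as opposed to the analytic derivative; the forward-difference form and the step size h are illustrative, and h is exactly the kind of parameter whose usable range is bounded below by floating-point accuracy.

import numpy as np

def slope_gradient(error_fn, w, h=1e-5):
    """Approximate dE/dw_i by the slope of the error over a step change h in
    each weight (forward difference), rather than by the analytic derivative."""
    w = np.asarray(w, dtype=float)
    e0 = error_fn(w)
    g = np.zeros_like(w)
    for i in range(w.size):
        w_step = w.copy()
        w_step[i] += h
        g[i] = (error_fn(w_step) - e0) / h
    return g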

7.
An iterative gradient-like technique, referred to as the double operator gradient method, is described. The method is shown to be generally much more efficient than either the fixed-step or steepest-descent-type gradient methods. In addition, the method is applied to a well-known singular optimal control problem which has recently been efficiently solved using a control-averaging technique. The double operator gradient method is shown to give a good suboptimal solution in relatively small computational time, being considerably more efficient than the control-averaging technique for the extensive range of cases considered.

8.
In level-set-based structural optimization, the semi-Lagrange method has the advantage of allowing a large time step without the limitation of the Courant–Friedrichs–Lewy (CFL) condition for numerical stability. In this paper, a line search algorithm and a sensitivity modulation scheme are introduced for the semi-Lagrange method. The line search attempts to adaptively determine an appropriate time step in each iteration of optimization. With consideration of some practical characteristics of the topology optimization process, incorporating the line search into the semi-Lagrange optimization method can yield fewer design iterations and thus improve the overall computational efficiency. The sensitivity modulation is inspired by the conjugate gradient method in finite dimensions, and provides an alternative to the standard steepest descent search in level-set-based optimization. Two benchmark examples are presented to compare the sensitivity modulation and the steepest descent techniques, with and without the line search respectively.

9.
李瑞涵, 王耀南, 谭建豪. 《机器人》, 2018, 40(6): 852-859
The step size of the conventional gradient-descent attitude-fusion algorithm for UAVs is difficult to determine; the algorithm converges slowly, has poor dynamic performance, and is sensitive to non-gravitational acceleration. To address these shortcomings, a Nesterov accelerated gradient attitude-fusion method is proposed, which fuses accelerometer and gyroscope data while suppressing non-gravitational acceleration; multiple comparative experiments were carried out on the Pixhawk open-source flight-control platform. The results show that the Nesterov accelerated gradient attitude-fusion algorithm keeps the error within 0.05° when the airframe is stationary and within 0.5° in horizontal sliding experiments, tracks angles well with no noticeable lag in rotation experiments about an axis, and also performs well in actual flight tests. The algorithm therefore converges noticeably faster than the ordinary gradient-descent attitude solver, suppresses non-gravitational acceleration better than complementary filtering and plain gradient descent, and can effectively track the true attitude of the UAV.
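For reference, a minimal sketch of one Nesterov accelerated gradient (NAG) update, the optimizer underlying the proposed attitude-fusion method; the generic cost interface below ignores the quaternion and IMU details of the paper.

import numpy as np

def nesterov_step(grad, x, v, lr=0.01, mu=0.9):
    """One Nesterov accelerated gradient update: evaluate the gradient at the
    look-ahead point x + mu*v, then update velocity and position."""
    g = grad(x + mu * v)       # gradient at the look-ahead point
    v = mu * v - lr * g
    x = x + v
    return x, v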

10.
The recursive-descent-based multi-output support vector regression algorithm converges slowly and yields low prediction accuracy when fitting model parameters. To address this, the model parameters of multi-output support vector regression are fitted with the BFGS quasi-Newton algorithm, which uses a rank-2 correction rule and exhibits second-order convergence, and an inexact line search is applied to determine the step factor so as to guarantee sufficient decrease at each iteration and global convergence. By analyzing the geometric structure of kernel functions in support vector machines (SVM), a data-dependent kernel is constructed to replace the traditional kernel, yielding a multi-output data-dependent-kernel support vector regression model. The model is compared with multi-output support vector regression models fitted by gradient descent and by a modified Newton method. Experimental results show that with 200 samples the proposed algorithm takes 72.98 s to fit, the modified Newton method takes 116.34 s, and the recursive descent method takes 2065.22 s. The proposed algorithm therefore reduces fitting time and converges faster.
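A hedged sketch of BFGS with the rank-2 inverse-Hessian update and a simple Armijo (inexact) backtracking line search; the multi-output SVR objective and the data-dependent kernel of the paper are not modelled, and the constants are illustrative.

import numpy as np

def bfgs_minimize(f, grad, x0, tol=1e-6, max_iter=200):
    """BFGS with the rank-2 inverse-Hessian update and Armijo backtracking."""
    x = np.asarray(x0, dtype=float)
    n = x.size
    H = np.eye(n)                              # inverse-Hessian approximation
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        d = -H @ g
        t = 1.0                                # inexact (Armijo) line search
        while t > 1e-12 and f(x + t * d) > f(x) + 1e-4 * t * (g @ d):
            t *= 0.5
        s = t * d
        x_new = x + s
        g_new = grad(x_new)
        y = g_new - g
        sy = s @ y
        if sy > 1e-12:                          # curvature condition
            rho = 1.0 / sy
            V = np.eye(n) - rho * np.outer(s, y)
            H = V @ H @ V.T + rho * np.outer(s, s)   # rank-2 BFGS update
        x, g = x_new, g_new
    return x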

11.
Gradient-descent-based supervised learning algorithms for spiking neurons minimize the error between the target spike train and the actual output spike train so that the neuron learns to fire the target spikes. However, because the error function in such algorithms is constructed dynamically from the actual output spike train and the corresponding target spike train, the algorithm may converge with the number of actual output spikes differing from the desired number. To remedy this deficiency, an improved gradient-descent learning algorithm for spiking neurons is proposed: during learning it monitors the numbers of target and actual spikes, and introduces virtual actual spikes and virtual desired spikes into the error function to handle, respectively, the cases of too few and too many emitted spikes. Experimental results show that the algorithm effectively prevents learning from terminating prematurely when the output spike counts differ, enabling the neuron to fire the target spike train accurately.

12.
Optimization theory and methods profoundly impact numerous engineering designs and applications. The gradient descent method is simpler and more extensively used to solve numerous optimization problems than other search methods. However, the gradient descent method is easily trapped in a local minimum and converges slowly. This work presents a Gradient Forecasting Search Method (GFSM) for enhancing the performance of the gradient descent method when solving optimization problems. GFSM is based on the gradient descent method and on the universal Discrete Difference Equation Prediction Model (DDEPM) proposed herein. In addition, the concept of the universal DDEPM is derived from the grey prediction model. The original grey prediction model uses a mathematical hypothesis and approximation to transform a continuous differential equation into a discrete difference equation. This is not a logical approach because the forecasting sequence data is invariably discrete. To construct a more precise prediction model, this work adopts a discrete difference equation. GFSM can accurately predict the searching direction and trend of the gradient descent method via the universal DDEPM and can adjust prediction steps dynamically using the golden section search algorithm. Experimental results indicate that the proposed method can accelerate the searching speed of the gradient descent method as well as help it escape from local minima. Our results further demonstrate that applying the golden section search method to achieve dynamic prediction steps of the DDEPM is an efficient approach for this search algorithm.
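For reference, a minimal sketch of golden-section search, the routine GFSM uses to adjust the prediction step dynamically; the bracket [a, b] and the assumption that phi is unimodal on it are illustrative.

import math

def golden_section(phi, a, b, tol=1e-6):
    """Golden-section search for the minimizer of a unimodal function phi on [a, b]."""
    r = (math.sqrt(5.0) - 1.0) / 2.0           # golden ratio factor, about 0.618
    c, d = b - r * (b - a), a + r * (b - a)
    fc, fd = phi(c), phi(d)
    while (b - a) > tol:
        if fc < fd:                            # minimizer lies in [a, d]
            b, d, fd = d, c, fc
            c = b - r * (b - a)
            fc = phi(c)
        else:                                  # minimizer lies in [c, b]
            a, c, fc = c, d, fd
            d = a + r * (b - a)
            fd = phi(d)
    return (a + b) / 2.0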

13.
In this paper, we propose an implicit gradient descent algorithm for the classic k-means problem. The implicit gradient step, or backward Euler step, is solved via stochastic fixed-point iteration, in which we randomly sample a mini-batch gradient in every iteration. It is the average of the fixed-point trajectory that is carried over to the next gradient step. We draw connections between the proposed stochastic backward Euler and the recent entropy stochastic gradient descent for improving the training of deep neural networks. Numerical experiments on various synthetic and real datasets show that the proposed algorithm provides better clustering results compared to k-means algorithms, in the sense that it achieves a lower value of the objective function (the clustering energy) and is much more robust to initialization.
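A hedged sketch of one implicit-gradient (backward Euler) step solved by stochastic fixed-point iteration with trajectory averaging, as described above; the step size, the number of inner iterations, and the assumed callables grad_batch and sample_batch are illustrative.

import numpy as np

def backward_euler_step(grad_batch, sample_batch, x, h=0.1, inner_iters=20):
    """One implicit gradient (backward Euler) step x_next = x - h*grad(x_next),
    solved by stochastic fixed-point iteration; the average of the fixed-point
    trajectory is returned as the next iterate. x is a NumPy array."""
    z = x.copy()
    z_avg = np.zeros_like(x)
    for j in range(1, inner_iters + 1):
        g = grad_batch(z, sample_batch())       # mini-batch gradient at current z
        z = x - h * g                           # fixed-point map for the implicit step
        z_avg += (z - z_avg) / j                # running average of the trajectory
    return z_avg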

14.
Bilingual word embeddings are usually obtained by mapping from the source-language space to the target-language space, learning the linear transformation that minimizes the distance between the mapped source embeddings and the target embeddings. Large parallel corpora are hard to obtain, however, which limits the accuracy of the embeddings. For cross-lingual word embedding with unequal amounts of data and scarce bilingual resources, this paper proposes a method based on a small dictionary and non-parallel corpora. The monolingual word vectors are first normalized, and an orthogonal optimal linear transformation over the small-dictionary word pairs provides the initial value for gradient descent. The large source-language (English) corpus is then clustered; the small dictionary is used to find the source-language words corresponding to each cluster, and the mean vector of each cluster, together with the mean vectors of the corresponding source- and target-language words, is used to establish new bilingual vector correspondences. These new correspondences are added to the small dictionary, so that the dictionary is generalized and extended. Finally, the extended dictionary is used to fit the cross-lingual embedding mapping model by gradient descent. Experiments on English-Italian, English-German, and English-Finnish show that the method reduces the number of gradient-descent iterations and the training time while achieving good accuracy in cross-lingual word embedding.
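For the initialization step, a minimal sketch of the orthogonal optimal linear map over the dictionary word pairs, i.e. the closed-form orthogonal Procrustes solution via SVD; matrix names are illustrative and the embeddings are assumed to be already length-normalized.

import numpy as np

def orthogonal_map(X_src, Y_tgt):
    """Orthogonal Procrustes: W = U V^T minimizes ||X_src @ W - Y_tgt||_F over
    orthogonal W. X_src, Y_tgt: row-aligned source/target embeddings for the
    dictionary word pairs."""
    U, _, Vt = np.linalg.svd(X_src.T @ Y_tgt)
    return U @ Vt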

15.
We present a stochastic gradient descent optimisation method for image registration with adaptive step size prediction. The method is based on the theoretical work by Plakhov and Cruz (J. Math. Sci. 120(1):964–973, 2004). Our main methodological contribution is the derivation of an image-driven mechanism to select proper values for the most important free parameters of the method. The selection mechanism employs general characteristics of the cost functions that commonly occur in intensity-based image registration. Also, the theoretical convergence conditions of the optimisation method are taken into account. The proposed adaptive stochastic gradient descent (ASGD) method is compared to a standard, non-adaptive Robbins-Monro (RM) algorithm. Both ASGD and RM employ a stochastic subsampling technique to accelerate the optimisation process. Registration experiments were performed on 3D CT and MR data of the head, lungs, and prostate, using various similarity measures and transformation models. The results indicate that ASGD is robust to these variations in the registration framework and is less sensitive to the settings of the user-defined parameters than RM. The main disadvantage of RM is the need for a predetermined step size function. The ASGD method provides a solution for that issue.
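A hedged sketch of stochastic gradient descent with a Robbins-Monro-style gain a/(t + A)^alpha in which an artificial time t is adapted from the agreement of successive stochastic gradients; the specific feedback rule below is a simplified stand-in for the image-driven mechanism of ASGD, and all constants are illustrative.

import numpy as np

def adaptive_sgd(grad_stoch, x0, a=1.0, A=20.0, alpha=0.602, iters=500):
    """SGD with gain gamma(t) = a/(t + A)**alpha. The artificial time t is
    advanced using the inner product of successive stochastic gradients:
    disagreement (oscillation) increases t and shrinks the step, agreement
    decreases t. This feedback rule is a simplified illustration only."""
    x = np.asarray(x0, dtype=float)
    g_prev = np.zeros_like(x)
    t = 0.0
    for _ in range(iters):
        g = grad_stoch(x)                        # stochastic (subsampled) gradient
        t = max(0.0, t + (1.0 if g @ g_prev < 0 else -0.5))
        x = x - (a / (t + A) ** alpha) * g
        g_prev = g
    return x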

16.
Learning the parameters of a probabilistic model is a necessary step in machine learning tasks. We present a method to improve learning from small datasets by using monotonicity conditions. Monotonicity simplifies learning and is often required by users. We present an algorithm for Bayesian network parameter learning. The algorithm and the monotonicity conditions are described, and it is shown that with the monotonicity conditions we can better fit the underlying data. Our algorithm is tested on artificial and empirical datasets. We use different methods satisfying monotonicity conditions: the proposed gradient descent, isotonic regression EM, and non-linear optimization. We also provide results of unrestricted EM and gradient descent methods. Learned models are compared with respect to their ability to fit data in terms of log-likelihood and their fit of the parameters of the generating model. Our proposed method outperforms other methods for small datasets, and provides better or comparable results for larger datasets.

17.
Botta, Marco; Piola, Roberto. Machine Learning, 2000, 38(1-2): 109-131
This paper proposes a method for refining numerical constants occurring in the rules of a knowledge base expressed in a first-order logic language. The method consists of tuning numerical parameters by performing error gradient descent. The knowledge base to be refined can be manually handcrafted or automatically acquired by a symbolic relational learner able to deal with numerical features. The results of an experimental analysis performed on four case studies show that the refinement step can be effective in improving classification performance.

18.
The conjugate gradient method is an effective method for large-scale unconstrained optimization problems. Recent research has proposed conjugate gradient methods based on secant conditions to establish fast convergence of the methods. However, these methods do not always generate a descent search direction. In contrast, Y. Narushima, H. Yabe, and J.A. Ford [A three-term conjugate gradient method with sufficient descent property for unconstrained optimization, SIAM J. Optim. 21 (2011), pp. 212–230] proposed a three-term conjugate gradient method which always satisfies the sufficient descent condition. This paper makes use of both ideas to propose descent three-term conjugate gradient methods based on particular secant conditions, and then shows their global convergence properties. Finally, numerical results are given.

19.
This paper proposes a nonmonotone scaled conjugate gradient algorithm for solving large-scale unconstrained optimization problems, which combines the idea of the scaled memoryless Broyden–Fletcher–Goldfarb–Shanno preconditioned conjugate gradient method with the nonmonotone technique. An attractive property of the proposed method is that the search direction always provides a sufficient descent step at each iteration. This property is independent of the line search used. Under appropriate assumptions, the method is proven to possess global convergence for nonconvex smooth functions, and R-linear convergence for strongly convex functions. Preliminary numerical results and related comparisons show the efficiency of the proposed method in practical computation.

20.
The conjugate gradient method for optimal control problems   Cited by: 1 (self-citations: 0, citations by others: 1)
This paper extends the conjugate gradient minimization method of Fletcher and Reeves to optimal control problems. The technique is directly applicable only to unconstrained problems; if terminal conditions and inequality constraints are present, the problem must be converted to an unconstrained form, e.g., by penalty functions. Only the gradient trajectory, its norm, and one additional trajectory, the actual direction of search, need be stored. These search directions are generated from past and present values of the objective and its gradient. Successive points are determined by linear minimization down these directions, which are always directions of descent. Thus, the method tends to converge, even from poor approximations to the minimum. Since, near its minimum, a general nonlinear problem can be approximated by one with a linear system and quadratic objective, the rate of convergence is studied by considering this case. Here, the directions of search are conjugate and hence the objective is minimized over an expanding sequence of sets. Also, the distance from the current point to the minimum is reduced at each step. Three examples are presented to compare the method with the method of steepest descent. Convergence of the proposed method is much more rapid in all cases. A comparison with a second variational technique is also given in Example 3.
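For reference, a hedged finite-dimensional sketch of the Fletcher-Reeves conjugate gradient iteration that the paper extends to optimal control; the one-dimensional minimization along each search direction is done here with a SciPy routine rather than the trajectory-space search of the paper.

import numpy as np
from scipy.optimize import minimize_scalar

def fletcher_reeves(f, grad, x0, tol=1e-8, max_iter=200):
    """Fletcher-Reeves CG: d_{k+1} = -g_{k+1} + beta_k d_k with
    beta_k = ||g_{k+1}||^2 / ||g_k||^2, and a linear minimization along d."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        alpha = minimize_scalar(lambda t: f(x + t * d)).x   # line minimization
        x = x + alpha * d
        g_new = grad(x)
        beta = (g_new @ g_new) / (g @ g)        # Fletcher-Reeves coefficient
        d = -g_new + beta * d
        g = g_new
    return x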
