Similar Documents
20 similar documents found (search time: 46 ms).
1.
赵杰, 张春元, 刘超, 周辉, 欧宜贵, 宋淇. 《自动化学报》(Acta Automatica Sinica), 2022, 48(8): 2050-2061
To address the low learning efficiency of first-order optimization algorithms and the excessive time and space overhead of second-order algorithms for recurrent neural networks (RNNs), a new mini-batch recursive least squares (RLS) optimization algorithm is proposed. The algorithm backpropagates a pre-activation linear output error instead of the conventional activated output error and, combined with the equivalent gradient of a weighted linear least squares objective with respect to each hidden layer's linear output, derives mini-batch recursive least squares solutions for the RNN parameters layer by layer. Compared with stochastic gradient descent (SGD), the proposed algorithm adds only one covariance matrix each to the hidden layer and the output layer, so its time and space complexity are only about three times those of SGD. In addition, solutions are given for adapting the forgetting factor and for mitigating overfitting. Simulation results show that, for both classification and prediction tasks on sequential data, the proposed algorithm converges faster than mainstream first-order optimization algorithms and is robust to hyperparameter settings.
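For readers unfamiliar with the recursion this method builds on, below is a minimal sketch of the classical recursive least squares update with a forgetting factor for a single linear layer, the per-layer building block the paper extends to mini-batch RNN training. The class name and interface are illustrative, not the authors' code.

```python
import numpy as np

class RLSLayer:
    """Minimal sketch: exponentially weighted RLS for one linear layer."""

    def __init__(self, n_in, n_out, lam=0.99, delta=1e2):
        self.W = np.zeros((n_out, n_in))   # layer weights
        self.P = delta * np.eye(n_in)      # inverse-covariance estimate
        self.lam = lam                     # forgetting factor

    def update(self, x, target):
        """One RLS step: x is the layer input, target the desired linear output."""
        Px = self.P @ x
        k = Px / (self.lam + x @ Px)       # gain vector
        err = target - self.W @ x          # linear (pre-activation) error
        self.W += np.outer(err, k)         # rank-one weight update
        self.P = (self.P - np.outer(k, Px)) / self.lam
```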

2.
Tuning support vector machine (SVM) hyperparameters is an important step in building a high-performance learning machine. It is usually done by minimizing an estimate of generalization error, based either on leave-one-out (LOO) bounds such as the radius-margin bound, or on performance measures such as generalized approximate cross-validation (GACV) and the empirical error. These standard automatic tuning methods require inverting the Gram matrix or solving an additional quadratic programming problem; on large data sets they add substantial memory and CPU time to the already significant resources consumed by SVM training. In this paper, we propose a fast method based on an approximation of the gradient of the empirical error, combined with incremental learning, which reduces both processing time and storage requirements. We tested our method on several benchmarks, with promising results that confirm our approach. Notably, the time savings grow as the data set gets larger.
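As a rough illustration of gradient-style hyperparameter tuning (a finite-difference stand-in, not the authors' analytic gradient approximation), the sketch below descends an estimated gradient of the validation error over log-scaled C and gamma using scikit-learn; all names and settings are illustrative.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

def val_error(theta, Xtr, ytr, Xva, yva):
    """Validation error of an RBF SVM at hyperparameters theta = (log C, log gamma)."""
    C, gamma = np.exp(theta)
    clf = SVC(C=C, gamma=gamma).fit(Xtr, ytr)
    return 1.0 - clf.score(Xva, yva)

def tune(X, y, theta=np.zeros(2), lr=0.5, steps=20, h=1e-2):
    """Descend a central finite-difference estimate of the error gradient."""
    Xtr, Xva, ytr, yva = train_test_split(X, y, test_size=0.3, random_state=0)
    for _ in range(steps):
        grad = np.zeros_like(theta)
        for i in range(len(theta)):
            e = np.zeros_like(theta)
            e[i] = h
            grad[i] = (val_error(theta + e, Xtr, ytr, Xva, yva)
                       - val_error(theta - e, Xtr, ytr, Xva, yva)) / (2 * h)
        theta = theta - lr * grad
    return np.exp(theta)   # tuned (C, gamma)
```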

3.
Benchmarking Least Squares Support Vector Machine Classifiers
In Support Vector Machines (SVMs), the solution of the classification problem is characterized by a (convex) quadratic programming (QP) problem. In a modified version of SVMs, called Least Squares SVM classifiers (LS-SVMs), a least squares cost function is proposed so as to obtain a linear set of equations in the dual space. While the SVM classifier has a large margin interpretation, the LS-SVM formulation is related in this paper to a ridge regression approach for classification with binary targets and to Fisher's linear discriminant analysis in the feature space. Multiclass categorization problems are represented by a set of binary classifiers using different output coding schemes. While regularization is used to control the effective number of parameters of the LS-SVM classifier, the sparseness property of SVMs is lost due to the choice of the 2-norm. Sparseness can be imposed in a second stage by gradually pruning the support value spectrum and optimizing the hyperparameters during the sparse approximation procedure. In this paper, twenty public domain benchmark datasets are used to evaluate the test set performance of LS-SVM classifiers with linear, polynomial and radial basis function (RBF) kernels. Both the SVM and LS-SVM classifier with RBF kernel in combination with standard cross-validation procedures for hyperparameter selection achieve comparable test set performances. These SVM and LS-SVM performances are consistently very good when compared to a variety of methods described in the literature including decision tree based algorithms, statistical algorithms and instance based learning methods. We show on ten UCI datasets that the LS-SVM sparse approximation procedure can be successfully applied.
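The "linear set of equations in the dual space" takes, in the standard LS-SVM formulation, the form of a single (n+1)-dimensional linear system. The sketch below solves it with an RBF kernel; function names are illustrative.

```python
import numpy as np

def rbf(X, Z, sigma=1.0):
    """RBF kernel matrix between row-wise sample sets X and Z."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def lssvm_fit(X, y, gamma=1.0, sigma=1.0):
    """Solve the LS-SVM dual system:
       [ 0      y^T          ] [ b     ]   [ 0 ]
       [ y  Omega + I/gamma  ] [ alpha ] = [ 1 ]
    with Omega_ij = y_i y_j K(x_i, x_j)."""
    n = len(y)
    Omega = np.outer(y, y) * rbf(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = Omega + np.eye(n) / gamma
    rhs = np.concatenate([[0.0], np.ones(n)])
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]   # bias b, support values alpha

def lssvm_predict(X, y, alpha, b, Xnew, sigma=1.0):
    return np.sign(rbf(Xnew, X, sigma) @ (alpha * y) + b)
```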

4.
Kernel functions are used in support vector machines (SVMs) to compute the inner product in a higher-dimensional feature space, and SVM classification performance depends on the chosen kernel. The radial basis function (RBF) kernel is a distance-based kernel that has been applied successfully in many tasks. This paper focuses on improving the accuracy of SVM by proposing a non-linear combination of multiple RBF kernels to obtain more flexible kernel functions: multi-scale RBF kernels are weighted and combined, allowing better discrimination in the feature space, and the new kernel is proved to be a Mercer kernel. Furthermore, evolution strategies (ESs) are used to adjust the hyperparameters of the SVM, with training accuracy, a bound on the generalization error, and subset cross-validation on training accuracy considered as objective functions in the evolutionary process. The experimental results show that the accuracy of multi-scale RBF kernels is better than that of a single RBF kernel, and that subset cross-validation on training accuracy is the most suitable objective, yielding good results on benchmark datasets.
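A hedged sketch of multi-scale RBF mixing: here a simple non-negative weighted sum, which is automatically a Mercer kernel; the paper's combination is more flexible and non-linear, and the weights and scales below are illustrative rather than ES-tuned.

```python
import numpy as np
from sklearn.svm import SVC

def multi_scale_rbf(X, Z, weights=(0.5, 0.3, 0.2), sigmas=(0.5, 1.0, 2.0)):
    """Weighted sum of RBF kernels at several scales."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return sum(w * np.exp(-d2 / (2 * s ** 2)) for w, s in zip(weights, sigmas))

# Usage with scikit-learn's precomputed-kernel interface:
#   K = multi_scale_rbf(X_train, X_train)
#   clf = SVC(kernel="precomputed").fit(K, y_train)
#   preds = clf.predict(multi_scale_rbf(X_test, X_train))
```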

5.
We consider the minimum-compliance formulation of the truss topology problem with additional linear constraints on the displacements: the so-called displacement constraints. We propose a new bilevel programming approach to this problem. Our primal (upper-level) goal is to satisfy the displacement constraints as well as possible: we minimize the gap between the actual and prescribed displacements. Our secondary (lower-level) goal is to minimize the compliance: we still want to find the stiffest structure satisfying the displacement constraints. On the lower level we solve a standard truss topology problem, and hence can solve it in a formulation suited to fast interior-point algorithms. The overall bilevel problem is solved by means of the so-called implicit programming approach, which leads to a nonsmooth optimization problem that is finally solved by a nonsmooth solver.

6.
In this paper we provide a detailed analysis of the iteration complexity of dual first-order methods for solving conic convex problems. When it is difficult to project onto the primal feasible set described by conic and convex constraints, we use Lagrangian relaxation to handle the conic constraints and then apply dual first-order algorithms to the corresponding dual problem. We give a convergence analysis for dual first-order algorithms (dual gradient and fast gradient algorithms), providing sublinear or linear estimates on the primal suboptimality and feasibility violation of the generated approximate primal solutions. Our analysis relies on the Lipschitz property of the gradient of the dual function or on an error bound property of the dual. Furthermore, the iteration complexity analysis is based on two types of approximate primal solutions: the last primal iterate or an averaged primal sequence.
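A minimal instance of the dual gradient scheme analyzed above, for the toy problem min ½‖x‖² subject to Ax = b, where the Lagrangian minimizer and the dual gradient are available in closed form; the step size and problem are illustrative.

```python
import numpy as np

def dual_gradient(A, b, steps=500, alpha=None):
    """Dual gradient ascent for min 0.5*||x||^2 s.t. Ax = b."""
    m, _ = A.shape
    lam = np.zeros(m)
    # The dual gradient A x(lam) - b is Lipschitz with constant ||A||_2^2.
    alpha = alpha or 1.0 / np.linalg.norm(A, 2) ** 2
    for _ in range(steps):
        x = -A.T @ lam                    # argmin_x L(x, lam), in closed form
        lam = lam + alpha * (A @ x - b)   # ascent step on the dual function
    return x, lam                         # last primal iterate, dual multipliers
```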

7.
In this paper, we propose a new unconstrained twin support vector regression model in the primal space (UPTSVR). With the addition of a regularization term in the formulation of the problem, the structural risk is minimized. The proposed formulation solves two smaller-sized unconstrained minimization problems having continuous, piecewise quadratic objective functions by gradient-based iterative methods. However, since the objective functions contain the non-smooth 'plus' function, two approaches are taken: (i) replace the non-smooth 'plus' function with a smooth approximation; (ii) apply a generalized derivative of the non-smooth 'plus' function. These lead to five algorithms, whose pseudo-codes are also given. Experimental results obtained on a number of interesting synthetic and real-world benchmark datasets, comparing these algorithms with standard support vector regression (SVR) and twin SVR (TSVR), clearly demonstrate the effectiveness of the proposed method.
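Approach (i) replaces max(x, 0) with a smooth surrogate. A standard choice, assumed here for illustration (the paper's exact smoothing may differ), is the Lee-Mangasarian function p(x, a) = x + (1/a)·log(1 + exp(-a·x)), which converges to the 'plus' function as a grows:

```python
import numpy as np

def plus(x):
    """The non-smooth 'plus' function max(x, 0)."""
    return np.maximum(x, 0.0)

def smooth_plus(x, a=5.0):
    """Smooth approximation p(x, a) = x + (1/a) * log(1 + exp(-a*x)).
    np.logaddexp(0, -a*x) computes log(1 + exp(-a*x)) stably for large |a*x|."""
    return x + np.logaddexp(0.0, -a * x) / a
```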

8.
Gradient-based optimization of hyperparameters
Bengio Y. Neural Computation, 2000, 12(8): 1889-1900
Many machine learning algorithms can be formulated as the minimization of a training criterion that involves a hyperparameter. This hyperparameter is usually chosen by trial and error with a model selection criterion. In this article we present a methodology to optimize several hyperparameters, based on the computation of the gradient of a model selection criterion with respect to the hyperparameters. In the case of a quadratic training criterion, the gradient of the selection criterion with respect to the hyperparameters is efficiently computed by backpropagating through a Cholesky decomposition. In the more general case, we show that the implicit function theorem can be used to derive a formula for the hyperparameter gradient involving second derivatives of the training criterion.
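For the quadratic case the article describes, ridge regression gives a closed-form illustration: implicit differentiation of the normal equations yields the gradient of a validation criterion with respect to the hyperparameter. A sketch under that assumption (names and the squared-error validation criterion are illustrative):

```python
import numpy as np

def hypergrad(Xtr, ytr, Xva, yva, lam):
    """Gradient of the validation loss 0.5*||Xva w - yva||^2 w.r.t. lam,
    where w(lam) = (Xtr'Xtr + lam*I)^{-1} Xtr'ytr is the ridge solution.
    Implicit differentiation gives dw/dlam = -(Xtr'Xtr + lam*I)^{-1} w."""
    n = Xtr.shape[1]
    H = Xtr.T @ Xtr + lam * np.eye(n)
    w = np.linalg.solve(H, Xtr.T @ ytr)
    dw_dlam = -np.linalg.solve(H, w)
    resid = Xva @ w - yva                 # validation residual
    return (Xva.T @ resid) @ dw_dlam      # chain rule: grad_w(L_val) . dw/dlam
```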

9.
Indefinite kernel support vector machines (IKSVMs) have recently attracted increasing attention in machine learning. Since IKSVM is essentially a non-convex problem, existing algorithms either modify the spectrum of the indefinite kernel directly, at the risk of losing valuable information, or solve the dual form of IKSVM, which suffers from a duality gap problem. In this paper, we propose a primal perspective for solving the problem: we directly address the primal form of IKSVM and present a novel algorithm, termed IKSVM-DC, for binary and multi-class classification. Concretely, according to the characteristics of the spectrum of the indefinite kernel matrix, IKSVM-DC decomposes the primal function into the subtraction of two convex functions and treats it as difference of convex functions (DC) programming. To accelerate the convergence rate, IKSVM-DC combines the classical DC algorithm with a line search step along the descent direction at each iteration. Furthermore, we construct a multi-class IKSVM model that can classify multiple classes in a unified form. A theoretical analysis is then presented to show that IKSVM-DC converges to a local minimum. Finally, we conduct experiments on both binary and multi-class datasets; the results show that IKSVM-DC is superior to other state-of-the-art IKSVM algorithms.
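The DC decomposition exploits the kernel spectrum directly: an indefinite Gram matrix splits into a difference of two positive semidefinite matrices via its eigendecomposition, which yields the convex-minus-convex structure. A minimal sketch of that building block (not the full IKSVM-DC solver):

```python
import numpy as np

def dc_split(K):
    """Split a symmetric indefinite kernel matrix K into K_plus - K_minus,
    where both parts are PSD, using the eigendecomposition K = V diag(vals) V'."""
    vals, vecs = np.linalg.eigh(K)
    K_plus = (vecs * np.maximum(vals, 0.0)) @ vecs.T    # positive part
    K_minus = (vecs * np.maximum(-vals, 0.0)) @ vecs.T  # negative part, negated
    return K_plus, K_minus
```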

10.
We introduce a new variational formulation for the problem of reconstructing a watertight surface defined by an implicit equation, from a finite set of oriented points; a problem which has attracted a lot of attention for more than two decades. As in the Poisson Surface Reconstruction approach, discretizations of the continuous formulation reduce to the solution of sparse linear systems of equations. But rather than forcing the implicit function to approximate the indicator function of the volume bounded by the implicit surface, in our formulation the implicit function is forced to be a smooth approximation of the signed distance function to the surface. Since an indicator function is discontinuous, its gradient does not exist exactly where it needs to be compared with the normal vector data. The smooth signed distance has approximate unit slope in the neighborhood of the data points. As a result, the normal vector data can be incorporated directly into the energy function without implicit function smoothing. In addition, rather than first extending the oriented points to a vector field within the bounding volume, and then approximating the vector field by a gradient field in the least squares sense, here the vector field is constrained to be the gradient of the implicit function, and a single variational problem is solved directly in one step. The formulation allows for a number of different efficient discretizations, reduces to a finite least squares problem for all linearly parameterized families of functions, and does not require boundary conditions. The resulting algorithms are significantly simpler and easier to implement, and produce results of quality comparable with state-of-the-art algorithms. An efficient implementation based on a primal-graph octree-based hybrid finite element-finite difference discretization, and the Dual Marching Cubes isosurface extraction algorithm, is shown to produce high quality crack-free adaptive manifold polygon meshes.
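In symbols, a hedged sketch of the energy described above, in the usual smooth-signed-distance form (the authors' exact weights and regularizer may differ): the first term drives f toward zero at the samples p_i, the second aligns its gradient with the normals n_i, and the third penalizes the Hessian over the bounding volume V.

```latex
E(f) = \lambda_0 \sum_i f(p_i)^2
     + \lambda_1 \sum_i \left\| \nabla f(p_i) - n_i \right\|^2
     + \lambda_2 \int_V \left\| H f(x) \right\|_F^2 \, dx
```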

11.
Model-based localization, the task of estimating an object's pose from sensed features and corresponding model features, is a fundamental task in machine vision. Exact constant-time localization algorithms have been developed for the case where the sensed features and the model features are of the same type. Still, it is not uncommon for them to be of different types, i.e., sensed data points may correspond to model faces or edges. Previous localization approaches have handled features of dissimilar types by sampling and synthesizing virtual features, reducing the problem to one of matching features of similar types; unfortunately, these approaches may be suboptimal because they introduce artificial errors. Other approaches have reformulated object localization as a nonlinear least squares problem in which the error is measured between the sensed data and the model features in image coordinates (the Euclidean image error metric). Unfortunately, all previous approaches minimizing this metric relied on gradient descent methods to find the global minimum, and gradient descent may get trapped in local minima. In this paper, we describe an exact, efficient solution to the nonlinear least squares minimization problem based upon resultants, linear algebra, and numerical techniques. On a SPARC 20, our localization algorithm runs in a few microseconds for rectilinear polygonal models, a few milliseconds for generic polygonal models, and one second for generalized polygonal models (models composed of linear edges and circular arcs).

12.
In this paper, we propose a fast algorithm for the well-known total variation (TV) inpainting model. Classically, the Euler-Lagrange equation derived from the TV inpainting model is solved by gradient descent and discretized with an explicit scheme, which yields a slow inpainting process. An implicit scheme is sometimes used instead; although it is several times faster than the explicit one, it remains too slow for many practical applications. Here we propose an operator splitting method: by introducing new variables into the Euler-Lagrange equation of the TV inpainting model, the equation is split into a few very simple subproblems, which we then solve by an alternating iteration. Numerically, the proposed algorithm is very easy to implement. In the numerical experiments, we mainly compare our algorithm with existing implicit TV inpainting algorithms and show that it is about ten to twenty times faster with similar inpainting quality. A comparison with the harmonic inpainting algorithm also reveals some advantages and disadvantages of the TV inpainting model.
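The "very simple subproblems" that splitting produces are typified by a per-pixel shrinkage (soft-thresholding) step for the auxiliary variable that stands in for the image gradient. A sketch of the anisotropic version (the paper's exact splitting may differ):

```python
import numpy as np

def shrink(x, t):
    """Soft-thresholding: the closed-form minimizer of t*|d| + 0.5*(d - x)^2,
    applied elementwise; this solves the auxiliary-variable subproblem."""
    norm = np.maximum(np.abs(x), 1e-12)       # guard against division by zero
    return np.maximum(norm - t, 0.0) * (x / norm)
```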

13.
In this paper, we develop a new algorithm for solving nonconvex large-scale problems. The new algorithm performs explicit matrix modifications adaptively, mimicking the implicit modifications used by trust-region methods; thus it shares the theoretical strength of trust-region approaches without needing to accommodate an explicit step-size constraint. We show that the algorithm is well suited to very large-scale nonconvex problems whenever Hessian-vector products are available. Numerical results on the CUTEr problems demonstrate the effectiveness of this approach in the context of a line-search method for large-scale unconstrained nonconvex optimization. Moreover, applications to deep-learning problems further illustrate the usefulness of this algorithm: it does not share the prohibitive traits that popular matrix-free algorithms such as truncated conjugate gradient (CG) exhibit on difficult deep-learning problems. The proposed algorithm thus serves to bridge the gap between the needs of the data-mining community and existing state-of-the-art approaches embraced foremost by the optimization community. Moreover, it can be realized with minimal modification to the CG algorithm itself, with negligible storage and computational overhead.

14.
A high-order condensation expression is proposed to generalize Duffin's condensation formula. Based on this higher-order expression, two second-order primal algorithms are constructed to solve generalized geometric programming (GGP) problems. To overcome the difficulty of solving structural optimization with multi-parameter elements, two methods are proposed and implemented in a computer program. The first method takes one geometrical property of the cross-section as a component of the design variables for each element and adopts sequential quadratic programming (SQP) to solve the optimization model. The second method takes the geometrical sizes of the cross-section as the design variables for each element and adopts the primal GGP algorithms to solve the optimization model.

15.
To implement unconstrained image segmentation models, a fast numerical algorithm based on block coordinate descent is proposed. The algorithm converts the dual of the model into a set of constrained univariate or bivariate quadratic extremum problems, which not only avoids difficulties in solving the primal problem, such as local non-differentiability and strong nonlinearity, making the solution procedure simple and easy to implement, but also, compared with existing gradient-descent-based algorithms, enjoys unconditional global convergence and a markedly faster convergence rate. Simulation results demonstrate the effectiveness of the proposed algorithm and its superior segmentation efficiency.
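As a generic illustration of the building block (not the authors' specific dual problem), coordinate descent on a box-constrained convex quadratic reduces each update to a clipped one-dimensional quadratic minimization:

```python
import numpy as np

def coord_descent(Q, b, l, u, x=None, sweeps=100):
    """Cyclic coordinate descent for min 0.5*x'Qx - b'x s.t. l <= x <= u,
    with Q symmetric positive definite. Each coordinate minimizer has a
    closed form and is projected onto its box interval."""
    n = len(b)
    x = np.clip(np.zeros(n) if x is None else np.asarray(x, float), l, u)
    for _ in range(sweeps):
        for i in range(n):
            # Exact minimizer over x_i with the other coordinates fixed:
            r = b[i] - Q[i] @ x + Q[i, i] * x[i]
            x[i] = np.clip(r / Q[i, i], l[i], u[i])
    return x
```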

16.
A general mathematical model for the optimal design of structural and mechanical systems is defined, and finite element techniques for the analysis of these systems are incorporated into the model. Optimality conditions for the model are derived; it is shown that an analytical solution of these conditions is impossible except for trivial design problems, so numerical methods for solving them are presented. A direct method based on the gradient projection concept is derived, and its numerical aspects are discussed, including step-size selection, calculation of Lagrange multipliers, identification of dependent design sensitivity vectors of constraint functions, and the convergence criterion. Both the direct and indirect approaches require total derivatives of the cost and the various constraint functions with respect to the design variables; this is accomplished by integrating into both approaches the state space design sensitivity analysis, which has proven to be very general and efficient. It is shown that the other major calculations of the two approaches are essentially the same, so there is potential for continuous transition between various algorithms for structural optimization. Optimal solutions for two design examples are obtained by hybrid methods to show the potential for further development of these methods.

17.
In this paper, two modified spectral conjugate gradient methods that satisfy the sufficient descent property are developed for unconstrained optimization problems. For uniformly convex problems, the first modified spectral conjugate gradient algorithm is proposed under the Wolfe line search rule, and its search direction is a sufficient descent direction for uniformly convex functions. Furthermore, based on the Dai–Liao conjugacy condition, the second spectral conjugate gradient algorithm generates a sufficient descent direction at each iteration for general functions, and can therefore be regarded as a modified version of the Dai–Liao algorithm. Under suitable conditions, the proposed algorithms are globally convergent for uniformly convex functions and for general functions, respectively. The numerical results show that the approaches presented in this paper are feasible and efficient.
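Generically, a spectral conjugate gradient method combines a spectral scaling θ with a conjugate gradient parameter β in its search direction, and the sufficient descent property referred to above is the second relation below. This is only the common template; the specific θ and β of the two proposed methods differ.

```latex
d_{k+1} = -\theta_{k+1}\, g_{k+1} + \beta_k\, d_k, \qquad
g_k^{\top} d_k \le -c\, \lVert g_k \rVert^2 \quad \text{for some constant } c > 0
```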

18.
Blind equalizers based on complex-valued feedforward neural networks yield better performance than linear equalizers for both linear and nonlinear communication channels. Their learning algorithms are generally based on stochastic gradient descent, as it is simple to implement; however, these algorithms converge slowly. In the blind equalization problem, the unavailability of the desired output signal and the presence of nonlinear activation functions make it difficult to apply the recursive least squares algorithm. In this letter, a new scheme using the recursive least squares algorithm is proposed for blind equalization. The weights of the output layer are learned using a modified version of the constant modulus algorithm cost function, while the weights of the hidden layer are learned using a neuron-space adaptation approach. The proposed scheme results in faster convergence of the equalizer.

19.
We generalize discrete variational models involving the infimal convolution (IC) of first- and second-order differences and the total generalized variation (TGV) to manifold-valued images, proposing both extrinsic and intrinsic approaches. The extrinsic models are based on embedding the manifold into a Euclidean space of higher dimension with manifold constraints, and an alternating direction method of multipliers can be employed to find the minimizers. However, the components within the extrinsic IC or TGV decompositions live in the embedding space, which makes their interpretation difficult. Therefore, we investigate two intrinsic approaches: for Lie groups, we employ the group action within the models; for more general manifolds, our IC model is based on recently developed absolute second-order differences on manifolds, while our TGV approach uses an approximation of the parallel transport by the pole ladder. For computing the minimizers of the intrinsic models, we apply gradient descent algorithms. Numerical examples demonstrate that our approaches work well for certain manifolds.

20.
International Journal of Computer Mathematics, 2012, 89(11): 2552-2567
This paper is concerned with the minimal-norm least squares solution of general linear matrix equations, which include the well-known Lyapunov and Sylvester matrix equations as special cases. Two iterative algorithms are proposed to solve this problem: the first is based on the gradient search principle for solving optimization problems, and the second can be regarded as its dual form. For both algorithms, necessary and sufficient conditions guaranteeing convergence are presented, and the optimal step sizes maximizing the convergence rates are established in terms of the singular values of a coefficient matrix. It is believed that the proposed methods can play an important role in many analysis and design problems in systems theory.
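A minimal sketch in the spirit of the first (gradient search) algorithm, for the model problem min over X of ||AXB - C||_F; general linear matrix equations reduce to sums of such terms. The step size below is just a safe constant derived from the spectral norms, echoing (but not reproducing) the singular-value-based optimal step sizes in the abstract.

```python
import numpy as np

def grad_solve(A, B, C, iters=2000):
    """Gradient iteration for min_X ||A X B - C||_F^2.
    The (half-)gradient is A'(AXB - C)B', whose Lipschitz constant is
    ||A||_2^2 * ||B||_2^2, so this fixed step size guarantees convergence."""
    mu = 1.0 / (np.linalg.norm(A, 2) ** 2 * np.linalg.norm(B, 2) ** 2)
    X = np.zeros((A.shape[1], B.shape[0]))
    for _ in range(iters):
        X -= mu * A.T @ (A @ X @ B - C) @ B.T
    return X
```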
