Similar Documents
A total of 18 similar documents were found.
1.
Stochastic gradient descent (SGD) is a simple and efficient method for solving large-scale optimization problems. Recent studies show that, when solving strongly convex problems, its convergence rate can be effectively improved by the α-suffix averaging technique. However, SGD is a black-box method and has difficulty producing the structural effects expected of regularized optimization problems. On the other hand, COMID (composite objective mirror descent) is a sparse stochastic algorithm that preserves the structure of L1 regularization, but its convergence rate for strongly convex problems is only O(log T / T). This paper focuses on the "L1 + Hinge" optimization problem: an L2 strongly convex term is first introduced to turn it into a strongly convex problem, and COMID is then combined with the α-suffix averaging technique to obtain the L1MD-α algorithm. L1MD-α is proved to have an O(1/T) convergence rate and achieves better sparsity than COMID. Experiments on large-scale datasets verify the correctness of the theoretical analysis and the effectiveness of the proposed algorithm.
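As a rough illustration of the kind of update described above, the sketch below combines a COMID-style composite step (soft thresholding for the L1 term) with α-suffix averaging on an "L1 + Hinge" objective augmented with an L2 strong-convexity term. All function names, parameters, and step-size choices are illustrative assumptions, not the authors' L1MD-α implementation.

```python
# Hypothetical sketch of a COMID-style update with alpha-suffix averaging for an
# "L1 + Hinge" problem with an added L2 strong-convexity term; an illustration
# of the idea, not the paper's exact L1MD-alpha algorithm.
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (keeps the L1 sparsity structure)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def l1md_alpha_sketch(X, y, lam1=1e-3, lam2=1e-2, T=1000, alpha=0.5, seed=None):
    """X: (n, d) features, y: (n,) labels in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    suffix_sum, suffix_count = np.zeros(d), 0
    for t in range(1, T + 1):
        i = rng.integers(n)
        margin = y[i] * X[i] @ w
        # Subgradient of the hinge loss plus the L2 strong-convexity term.
        g = (-y[i] * X[i] if margin < 1 else np.zeros(d)) + lam2 * w
        eta = 1.0 / (lam2 * t)          # step size for a lam2-strongly convex objective
        # Composite (mirror-descent) step: gradient step on the smooth part,
        # proximal step on the L1 regularizer.
        w = soft_threshold(w - eta * g, eta * lam1)
        # alpha-suffix averaging: only the last alpha*T iterates are averaged.
        if t > (1 - alpha) * T:
            suffix_sum += w
            suffix_count += 1
    return suffix_sum / max(suffix_count, 1)
```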

2.
An Analysis of Solution Methods for L1-Regularized Machine Learning Problems
孔康  汪群山  梁万路 《计算机工程》2011,37(17):175-177
Taking sparse learning as the main thread and starting from the perspective of multi-stage and multi-step optimization, this paper classifies the currently popular algorithms for solving L1-regularized problems, and compares multi-step methods based on subgradients, multi-stage methods based on coordinate optimization, and soft L1 regularization methods in terms of convergence behavior, time and space complexity, and the sparsity of the solutions. The analysis shows that learning algorithms that exploit the special structure of machine learning problems can attain better sparsity and faster convergence.

3.
The Pegasos algorithm is an effective method for solving large-scale support vector machine problems; embedding multi-stage loop steps into its stochastic gradient descent procedure allows it to achieve the optimal convergence rate O(1/T). The COMID algorithm is a regularized stochastic form obtained by generalizing mirror descent and can preserve the structure of the regularizer, but for strongly convex optimization problems its convergence rate is only O(log T / T). To address this, multi-stage loop steps are introduced into the COMID algorithm, and an optimal regularized mirror descent algorithm is proposed for problems with a mixed L1 + L2 regularizer. It is proved to achieve the optimal convergence rate O(1/T) while retaining the same sparsity as COMID. Experimental results on large-scale datasets verify the correctness of the theoretical analysis and the effectiveness of the proposed algorithm.
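The sketch below illustrates the multi-stage idea mentioned above under hedged assumptions: COMID-style proximal steps are run in stages whose lengths double while the step size is halved, and each stage restarts from the previous stage's average. All names and constants are illustrative; this is a sketch of the mechanism, not the paper's algorithm.

```python
# Hypothetical multi-stage (restarted) COMID loop for an L1 + L2 regularized
# hinge loss; doubling stage lengths and halving step sizes is the usual way
# such multi-stage schemes recover an O(1/T) rate. Illustrative only.
import numpy as np

def soft_threshold(v, tau):
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def multistage_comid_sketch(X, y, lam1=1e-3, lam2=1e-2,
                            stages=6, first_len=100, eta0=1.0, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    stage_len, eta = first_len, eta0
    for _ in range(stages):
        avg = np.zeros(d)
        for _ in range(stage_len):
            i = rng.integers(n)
            margin = y[i] * X[i] @ w
            g = (-y[i] * X[i] if margin < 1 else np.zeros(d)) + lam2 * w
            w = soft_threshold(w - eta * g, eta * lam1)  # prox keeps L1 sparsity
            avg += w
        w = avg / stage_len        # restart the next stage from the stage average
        stage_len *= 2             # double the stage length
        eta /= 2.0                 # halve the step size
    return w
```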

4.
刘宇翔  程禹嘉  陶卿 《软件学报》2020,31(4):1051-1062
Stochastic optimization methods have become the method of choice for large-scale regularized and deep learning optimization problems, and their convergence rates are usually established under the assumption that the gradient of the objective is estimated without bias. In machine learning, however, many phenomena lead to biased gradients. Unlike the unbiased case, the well-known Nesterov accelerated gradient (NAG) method gradually accumulates the gradient bias of each iteration, so that the optimal convergence rate cannot be attained and even convergence itself cannot be guaranteed. Recent results show that NAG is also an accelerated algorithm for the individual convergence of projected subgradient methods on non-smooth problems, but the effect of biased subgradients on it has not been studied. For non-smooth optimization problems, this paper proves that NAG attains a stable individual convergence bound when the subgradient bias is bounded, and that NAG still attains the optimal individual convergence rate when the bias decays at a suitable rate. As an application, a projected subgradient method that does not require exact projections is obtained, which reaches a stable learning accuracy quickly while preserving convergence. Experiments verify the correctness of the theoretical analysis and the performance of the inexact method.
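The following sketch, with illustrative names and parameters, shows the general form of a NAG-style projected subgradient update for a non-smooth (hinge) objective: the subgradient is taken at an auxiliary momentum sequence and the iterate is projected after each step. It is not the paper's method and does not model the bias analysis.

```python
# Hypothetical NAG-style projected subgradient step for a non-smooth objective
# (hinge loss) with projection onto an L2 ball; an illustration of the update
# form discussed above, not the paper's algorithm.
import numpy as np

def project_l2_ball(w, radius):
    norm = np.linalg.norm(w)
    return w if norm <= radius else w * (radius / norm)

def nag_projected_subgradient(X, y, radius=10.0, T=1000, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)      # the individual iterate of interest
    z = np.zeros(d)      # auxiliary (momentum) sequence
    for t in range(1, T + 1):
        i = rng.integers(n)
        margin = y[i] * X[i] @ z
        g = -y[i] * X[i] if margin < 1 else np.zeros(d)   # hinge subgradient at z
        eta = 1.0 / np.sqrt(t)
        w_next = project_l2_ball(z - eta * g, radius)
        beta = (t - 1) / (t + 2)                          # Nesterov momentum weight
        z = w_next + beta * (w_next - w)
        w = w_next
    return w
```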

5.
陶卿  高乾坤  姜纪远  储德军 《软件学报》2013,24(11):2498-2507
Machine learning is facing the serious challenge of ever-growing data scale, and how to handle large-scale or even very-large-scale data is a key scientific problem that statistical learning urgently needs to solve. The training sets of large-scale machine learning problems are often redundant and sparse, and the regularizers and loss functions in machine learning optimization problems carry special structural meaning; batch black-box methods that directly use the gradient of the whole objective not only have difficulty handling large-scale problems but also fail to meet machine learning's structural requirements. At present, coordinate optimization, online, and stochastic optimization methods, which have developed rapidly by exploiting the characteristics of machine learning itself, have become effective tools for large-scale problems. For L1-regularized problems, this paper surveys some recent progress on these large-scale algorithms.

6.
In big-data applications, missing entries of high-dimensional sparse matrices are usually predicted by building a latent factor model with stochastic gradient descent. Regularization terms are commonly added while solving the model with SGD to improve its performance; since the L1 regularizer is non-differentiable, existing latent factor models (SGD_LF) are mainly built by adding an L2 regularizer. Because the L1 regularizer improves the sparsity of the model and strengthens its solving capability, a latent factor model with both L1 and L2 regularization constraints (SPGD_LF) is proposed. When constructing the objective function, L1 and L2 regularizers are introduced simultaneously. Since the objective satisfies a Lipschitz condition, it is approximated by a second-order Taylor expansion to construct an SGD solver, and soft thresholding is used during the SGD iterations to handle the boundary optimization problem induced by the L1 regularizer. With this scheme, the features of the known entries of the target matrix in the latent space, and the corresponding community relations, are better represented, which improves the generalization ability of the model. Experiments on large industrial datasets show that the SPGD_LF model achieves significantly better prediction accuracy, sparsity, and convergence speed.
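A minimal sketch of the kind of update described above is given below: one SGD pass over the observed entries of a rating matrix, with the L2 term handled by weight decay and the L1 term handled by soft thresholding. The names, defaults, and data layout are assumptions for illustration, not the SPGD_LF implementation.

```python
# Hypothetical SGD epoch of an L1+L2-regularized latent factor model over the
# observed entries of a sparse matrix; soft thresholding handles the
# non-differentiable L1 term. Illustrative only.
import numpy as np

def soft_threshold(v, tau):
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def sgd_epoch(P, Q, observed, lam1=1e-4, lam2=1e-2, eta=0.01):
    """P: (num_users, k), Q: (num_items, k), observed: list of (u, i, r) triples."""
    for u, i, r in observed:
        p, q = P[u].copy(), Q[i].copy()
        err = r - p @ q
        # Gradient step on the squared error plus the L2 term, then the L1 prox.
        P[u] = soft_threshold(p + eta * (err * q - lam2 * p), eta * lam1)
        Q[i] = soft_threshold(q + eta * (err * p - lam2 * q), eta * lam1)
    return P, Q
```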

7.
A new coordinate descent algorithm is proposed for non-smooth loss problems, which solves each subproblem in closed form by a sorting-based search. The time complexity of the algorithm is analyzed, and three practical techniques for speeding up convergence are given. Experiments show that the algorithm performs well on regularized hinge loss problems and achieves the expected results.

8.
Targeting the characteristics of CT reconstruction images in the large-scale integrated circuit domain, a reconstruction model that uses the l1 norm as the regularizer under a TV constraint is proposed, together with a solution algorithm based on Bregman iteration. The algorithm has two steps: 1) Bregman iteration is used to solve the constrained extremum problem with the image's l1 norm as the regularizer and the weighted l2 norm of the error as the fidelity term; 2) the TV constraint is used to correct the reconstructed image obtained in step 1). By solving the l1-regularized, TV-constrained reconstruction model in separate steps, the algorithm reduces complexity and speeds up convergence, and it can reconstruct CT images quickly and with good quality from sparse projection data. Simulation experiments on the classical Shepp-Logan phantom and reconstructions from real circuit-board projection data show that the algorithm meets the reconstruction quality requirements with a substantially improved reconstruction speed.

9.
The Tikhonov-regularized multi-class support vector machine is a new type of SVM that reduces a multi-class problem to a single optimization problem. Because it builds the kernel matrix from the samples of all classes, it is unsuitable for pattern classification on large-scale datasets. For this reason, a sparse Tikhonov-regularized multi-class support vector machine is constructed: its training algorithm first defines a criterion for evaluating sample importance, uses iterative learning under this criterion to obtain a reduced set, and finally builds the kernel matrix from the reduced set and trains the SVM. Simulation results show that the sparse Tikhonov-regularized multi-class SVM has great advantages in training speed and sparsity.

10.
Regularized image restoration ultimately leads to a large-scale optimization problem. An image restoration method based on Bregman iteration with dual regularization is proposed, in which the objective function considers total-variation regularization and wavelet-domain sparsity regularization simultaneously; the restoration problem is solved within the Bregman framework, and a split Bregman iteration algorithm is given for its solution. The algorithm turns the complex optimization problem into a few dozen simple iterations, each of which requires only a few fast Fourier transforms and shrinkage operations. Experimental results show that the proposed restoration algorithm achieves good results both in objective SNR improvement and in subjective visual quality, and converges faster than current restoration algorithms.
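As a hedged illustration of the "a few FFTs and shrinkage operations per iteration" structure, the sketch below runs split Bregman iterations for plain anisotropic TV denoising with periodic boundaries; it omits the wavelet-domain term and the deblurring operator, and all parameters are illustrative rather than the paper's settings.

```python
# Hypothetical split Bregman iterations for anisotropic TV denoising:
# min_u TV(u) + mu/2 * ||u - f||^2. Each iteration is one FFT-based quadratic
# solve plus shrinkage and Bregman-variable updates. Illustrative only.
import numpy as np

def shrink(v, tau):
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def split_bregman_tv(f, mu=10.0, lam=5.0, iters=30):
    """f: noisy 2-D image; returns a TV-denoised image."""
    H, W = f.shape
    dx, dy, bx, by = (np.zeros_like(f) for _ in range(4))
    # Fourier symbol of grad^T grad for periodic forward differences.
    ky, kx = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    lap = (2 - 2 * np.cos(2 * np.pi * ky / H)) + (2 - 2 * np.cos(2 * np.pi * kx / W))
    grad = lambda a, ax: np.roll(a, -1, axis=ax) - a          # forward difference
    gradT = lambda a, ax: np.roll(a, 1, axis=ax) - a          # its adjoint
    u = f.copy()
    for _ in range(iters):
        rhs = mu * f + lam * (gradT(dx - bx, 0) + gradT(dy - by, 1))
        u = np.real(np.fft.ifft2(np.fft.fft2(rhs) / (mu + lam * lap)))  # FFT solve
        dx = shrink(grad(u, 0) + bx, 1.0 / lam)               # shrinkage steps
        dy = shrink(grad(u, 1) + by, 1.0 / lam)
        bx += grad(u, 0) - dx                                  # Bregman updates
        by += grad(u, 1) - dy
    return u
```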

11.
An ensemble of multiple classifiers is widely considered to be an effective technique for improving the accuracy and stability of a single classifier. This paper proposes a framework of sparse ensembles and deals with new linear weighted combination methods for sparse ensembles. A sparse ensemble sparsely combines the outputs of multiple classifiers using a sparse weight vector. When the continuous outputs of multiple classifiers are provided in our methods, the problem of solving for the sparse weight vector can be formulated as a linear programming problem in which the hinge loss and/or the 1-norm regularization are exploited. Both the hinge loss and the 1-norm regularization are sparsity-inducing techniques used in machine learning. We only ensemble classifiers with nonzero weight coefficients. In these LP-based methods, the ensemble training error is minimized while the weight vector of ensemble learning is controlled, which can be thought of as implementing the structural risk minimization rule and naturally explains the good performance of these methods. Promising experimental results over UCI data sets and radar high-resolution range profile data are presented.
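A small sketch of one possible LP formulation is shown below (assuming SciPy is available): hinge slacks are minimized together with a 1-norm penalty on the ensemble weights, written in standard linear-programming form. Variable names and the penalty constant C are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical LP for learning a sparse ensemble weight vector with the hinge
# loss and 1-norm regularization. F: (n, m) continuous outputs of m base
# classifiers on n samples; y: labels in {-1, +1}. Illustrative only.
import numpy as np
from scipy.optimize import linprog

def sparse_ensemble_weights(F, y, C=0.1):
    n, m = F.shape
    # Variables: w_plus (m), w_minus (m), slacks xi (n); w = w_plus - w_minus.
    c = np.concatenate([C * np.ones(2 * m), np.ones(n)])
    # Hinge constraints  y_i * (F_i @ w) >= 1 - xi_i  rewritten as A_ub x <= b_ub.
    A_ub = np.hstack([-(y[:, None] * F), (y[:, None] * F), -np.eye(n)])
    b_ub = -np.ones(n)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * (2 * m + n))
    w = res.x[:m] - res.x[m:2 * m]
    return w  # classifiers with (near-)zero weight are dropped from the ensemble
```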

12.
Stochastic optimization methods are the mainstream approach for solving large-scale machine learning problems, and the focal research questions are whether an algorithm attains the optimal convergence rate and whether it preserves the structure of the learning problem. Many forms of stochastic optimization algorithms have been obtained for regularized loss problems, but most of them discuss convergence rates only for averaged-iterate outputs and cannot even guarantee the most typical sparse structure. In contrast, individual solutions preserve sparsity well, and their optimal convergence rate has been widely explored as an open problem. Moreover, the unbiased-gradient assumption commonly adopted in stochastic optimization often does not hold, and in the biased case the bias in the convergence bounds of accelerated methods accumulates over iterations, making them inapplicable. This paper surveys the state of research on first-order stochastic gradient methods and the remaining problems, including individual convergence rates, biased gradients, and non-convex optimization, and on this basis points out several questions worth studying.

13.
Sparsity of a classifier is a desirable condition for high-dimensional data and large sample sizes. This paper investigates the two complementary notions of sparsity for binary classification: sparsity in the number of features and sparsity in the number of examples. Several different losses and regularizers are considered: the hinge loss and ramp loss, and ℓ2, ℓ1, approximate ℓ0, and capped ℓ1 regularization. We propose three new objective functions that further promote sparsity: the capped ℓ1 regularization with hinge loss, and the ramp loss versions of approximate ℓ0 and capped ℓ1 regularization. We derive difference of convex functions algorithms (DCA) for solving these novel non-convex objective functions. The proposed algorithms are shown to converge in a finite number of iterations to a local minimum. Using simulated data and several data sets from the University of California Irvine (UCI) machine learning repository, we empirically investigate the fraction of features and examples required by the different classifiers.
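The sketch below illustrates a DCA of this kind for a capped-ℓ1 regularized hinge loss: the penalty min(|w_j|, a) is split as |w_j| minus a convex term, the concave part is linearized at the current iterate, and each convex subproblem is solved approximately by proximal subgradient steps. Names, constants, and the inner solver are assumptions for illustration, not necessarily the paper's exact algorithm.

```python
# Hypothetical DCA for the capped-L1 regularized hinge loss. Illustrative only.
import numpy as np

def soft_threshold(v, tau):
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def convex_subproblem(X, y, s, lam, w0, eta=0.01, steps=500):
    """Approximately minimize mean hinge(w) + lam*||w||_1 - s @ w."""
    w = w0.copy()
    for _ in range(steps):
        margins = y * (X @ w)
        g = -(X.T @ (y * (margins < 1))) / len(y) - s   # hinge subgradient minus s
        w = soft_threshold(w - eta * g, eta * lam)      # prox step on the L1 term
    return w

def dca_capped_l1(X, y, lam=0.1, a=1.0, outer_iters=10):
    w = np.zeros(X.shape[1])
    for _ in range(outer_iters):
        # Subgradient of the concave part: lam * sum_j max(|w_j| - a, 0).
        s = lam * np.sign(w) * (np.abs(w) > a)
        w = convex_subproblem(X, y, s, lam, w)
    return w
```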

14.
Objective: Alzheimer's disease (AD) is one of the major diseases of old age and is increasingly affecting younger people. Identifying the stage of AD early from magnetic resonance imaging (MRI) scans helps clinicians take timely intervention and treatment measures in the early phase and control or delay the deterioration of the disease. To this end, a smooth group L1/2 sparse regularization (SGL1/2) method based on a smoothing function is proposed. Method: Smooth group L1/2 regularization is introduced to achieve within-group sparsity, and the non-smooth absolute value function in the original group L1/2 method is approximated by a smooth function, which resolves the numerical oscillation and convergence difficulties of the group L1/2 method. SGL1/2 speeds up the solution of the model while maintaining classification accuracy. In the classification step, a calibrated hinge (Chinge) function is introduced in place of the hinge function of the standard support vector machine (SVM), yielding a calibrated SVM (C-SVM) for disease classification; samples near the separating plane are pushed toward the correct side, so that hard-to-distinguish samples are classified better. Results: Compared with other group-level regularization methods, the classification model combining SGL1/2 with the calibrated SVM achieves higher classification performance for AD recognition, with an accuracy of up to 94.70%. Conclusion: The proposed group-sparse classification model combines the advantages of between-group and within-group sparsity and provides an objective reference for future automatic diagnosis of AD.

15.
王一宾  裴根生  程玉胜 《智能系统学报》2019,14(4):831-842
Applying regularized extreme learning machine or kernel extreme learning machine theory to multi-label classification improves the stability of the algorithms to some extent. However, the regularization terms these algorithms add to the loss function are all based on the L2 norm, so the models lack sparse representations. Elastic net regularization both ensures model robustness and provides sparse learning, yet how an extreme learning machine combined with the elastic net can solve multi-label problems has rarely been studied. On this basis, this paper proposes a multi-label learning algorithm that adds elastic net regularization to the kernel extreme learning machine. First, the feature space of the multi-label data is mapped with a radial basis kernel; then, an elastic net regularizer is imposed on the loss function of the kernel extreme learning machine; finally, coordinate descent is used to iteratively solve for the output weights and obtain the final predicted labels. Comparative experiments and statistical analysis show that the proposed algorithm performs better.
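A minimal sketch of the final step, coordinate descent for an elastic-net-regularized least-squares output-weight problem, is given below for a single label column; the kernel matrix K and target vector t are assumed given, and all names and defaults are illustrative rather than the proposed algorithm's implementation.

```python
# Hypothetical coordinate descent for
# min_beta 0.5*||K @ beta - t||^2 + lam1*||beta||_1 + 0.5*lam2*||beta||^2,
# as could arise for one label column of a kernel ELM. Illustrative only.
import numpy as np

def soft_threshold(v, tau):
    return np.sign(v) * max(abs(v) - tau, 0.0)

def elastic_net_cd(K, t, lam1=0.1, lam2=0.1, sweeps=100):
    n = K.shape[1]
    beta = np.zeros(n)
    resid = t - K @ beta
    col_sq = (K ** 2).sum(axis=0)
    for _ in range(sweeps):
        for j in range(n):
            rho = K[:, j] @ resid + col_sq[j] * beta[j]  # partial residual correlation
            new_bj = soft_threshold(rho, lam1) / (col_sq[j] + lam2)
            resid += K[:, j] * (beta[j] - new_bj)        # keep the residual in sync
            beta[j] = new_bj
    return beta
```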

16.

A dictionary plays an important role in multi-instance data representation: it maps bags of instances to histograms. Earth mover's distance (EMD) is the most effective histogram distance metric for multi-instance retrieval. However, up to now there has been no multi-instance dictionary learning method designed for EMD-based histogram comparison. To fill this gap, we develop the first EMD-optimal dictionary learning method using a stochastic optimization method. In the stochastic learning framework, we have one triplet of bags, including one basic bag, one positive bag, and one negative bag. These bags are mapped to histograms using a multi-instance dictionary. We argue that the EMD between the basic histogram and the positive histogram should be smaller than that between the basic histogram and the negative histogram. Based on this condition, we design a hinge loss. By minimizing this hinge loss and some regularization terms of the dictionary, we update the dictionary instances. Experiments on multi-instance retrieval applications show the method's effectiveness compared to other dictionary learning methods on medical image retrieval and natural language relation classification.


17.
Sparsity-driven classification technologies have attracted much attention in recent years due to their capability of providing more compressive representations and clearer interpretation. The two most popular classification approaches are support vector machines (SVMs) and kernel logistic regression (KLR), each having its own advantages. The sparsification of SVMs has been well studied, and many sparse versions of the 2-norm SVM, such as the 1-norm SVM (1-SVM), have been developed; the sparsification of KLR has been studied less. Existing sparsifications of KLR are mainly based on L1-norm and L2-norm penalties, which lead to sparse versions whose solutions are not as sparse as they should be. A very recent study on L1/2 regularization theory in compressive sensing shows that L1/2 sparse modeling can yield solutions sparser than those of the 1-norm and 2-norm and, furthermore, that the model can be solved efficiently by a simple iterative thresholding procedure. The objective function dealt with in L1/2 regularization theory is, however, of square form, the gradient of which is linear in its variables (such an objective function is the so-called linear gradient function). In this paper, by extending the linear gradient function of the L1/2 regularization framework to the logistic function, we propose a novel sparse version of KLR, the 1/2 quasi-norm kernel logistic regression (1/2-KLR). The new version integrates the advantages of KLR and L1/2 regularization and defines an efficient implementation scheme for sparse KLR. We suggest a fast iterative thresholding algorithm for 1/2-KLR and prove its convergence. We provide a series of simulations to demonstrate that 1/2-KLR can often obtain sparser solutions than the existing sparsity-driven versions of KLR, at the same or better accuracy level. The conclusion also holds in comparison with sparse SVMs (1-SVM and 2-SVM). We show an exclusive advantage of 1/2-KLR: the regularization parameter in the algorithm can be set adaptively whenever the sparsity (correspondingly, the number of support vectors) is given, which suggests a methodology for comparing the sparsity promotion capability of different sparsity-driven classifiers. As an illustration of the benefits of 1/2-KLR, we give two applications in semi-supervised learning, showing that 1/2-KLR can be applied successfully to classification tasks in which only a few data are labeled.
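The sketch below illustrates an iterative-thresholding scheme of this flavor for an L1/2-penalized kernel logistic regression. To avoid relying on remembered closed-form half-thresholding constants, the scalar thresholding subproblem is solved by a dense 1-D search; all names and parameters are assumption-laden illustrations, not the 1/2-KLR algorithm.

```python
# Hypothetical proximal/iterative-thresholding loop for L1/2-penalized kernel
# logistic regression. Illustrative only.
import numpy as np

def half_threshold_scalar(z, lam, grid=2000):
    """argmin_x 0.5*(x - z)**2 + lam*|x|**0.5, solved by a dense 1-D search."""
    xs = np.linspace(0.0, abs(z), grid)          # the minimizer shares z's sign
    vals = 0.5 * (xs - abs(z)) ** 2 + lam * np.sqrt(xs)
    return np.sign(z) * xs[np.argmin(vals)]

def iterative_thresholding_klr(K, y, lam=0.05, eta=0.1, iters=200):
    """K: (n, n) kernel matrix, y: labels in {0, 1}; returns dual coefficients."""
    n = K.shape[0]
    alpha = np.zeros(n)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(K @ alpha)))   # logistic probabilities
        grad = K.T @ (p - y) / n                 # gradient of the logistic loss
        z = alpha - eta * grad                   # gradient step
        alpha = np.array([half_threshold_scalar(zi, eta * lam) for zi in z])
    return alpha
```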

18.
This paper studies the use of fast and exact unidimensional L2–L1 minimization as a line search for accelerating iterative reconstruction algorithms. In L2–L1 minimization reconstruction problems, the squared Euclidean (L2) norm measures signal-data discrepancy and the L1 norm stands for a sparsity-preserving regularization term. Functionals of this kind arise in important applications such as compressed sensing and deconvolution. Optimal unidimensional L2–L1 minimization has only recently been studied, by Li and Osher for denoising problems and by Wen et al. for line search. A fast L2–L1 optimization procedure can be adapted for line search and used in iterative algorithms, improving convergence speed with little increase in computational cost. This paper proposes a new method for exact L2–L1 line search and compares it with Li and Osher's, Wen et al.'s, and a standard line search algorithm, the method of false position. The proposed line search improves the convergence speed of different iterative algorithms for L2–L1 reconstruction, such as iterative shrinkage, iteratively reweighted least squares, and nonlinear conjugate gradient. This assertion is validated experimentally in applications to signal reconstruction in compressed sensing and sparse signal deblurring.
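As an illustration of what an exact unidimensional L2–L1 minimization can look like, the sketch below minimizes phi(alpha) = 0.5*||A(x + alpha*d) - b||^2 + lam*||x + alpha*d||_1 along a direction d by enumerating the breakpoints where coordinates of x + alpha*d change sign and the stationary points of each quadratic piece. It is a generic sketch under these assumptions, not the specific procedures compared in the paper.

```python
# Hypothetical exact 1-D L2-L1 line search: phi is convex and piecewise
# quadratic in alpha, so its minimizer is either a breakpoint or the stationary
# point of one quadratic piece; we enumerate those candidates. Illustrative only.
import numpy as np

def exact_l2l1_linesearch(A, b, x, d, lam):
    Ad, r = A @ d, A @ x - b
    a, c = Ad @ Ad, Ad @ r                       # smooth part of phi'(alpha): a*alpha + c
    mask = d != 0
    bps = np.sort(-x[mask] / d[mask])            # sign-change breakpoints of the L1 term
    edges = np.concatenate([[-np.inf], bps, [np.inf]])
    phi = lambda t: 0.5 * np.sum((r + t * Ad) ** 2) + lam * np.sum(np.abs(x + t * d))
    candidates = [0.0] + list(bps)
    for lo, hi in zip(edges[:-1], edges[1:]):
        if lo == hi:
            continue
        # A representative point strictly inside the piece to read its sign pattern.
        if np.isfinite(lo) and np.isfinite(hi):
            mid = 0.5 * (lo + hi)
        elif np.isfinite(hi):
            mid = hi - 1.0
        elif np.isfinite(lo):
            mid = lo + 1.0
        else:
            mid = 0.0
        signs = np.sign(x + mid * d)             # fixed on this piece
        if a > 0:
            t_star = -(c + lam * (signs @ d)) / a  # stationary point of the piece
            candidates.append(float(np.clip(t_star, lo, hi)))
    return min(candidates, key=phi)
```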
