Similar Documents
19 similar documents found (search time: 140 ms)
1.
The fuzzy least squares twin support vector machine combines fuzzy membership functions with the least squares twin SVM to cope with outlier noise in the training data and low computational efficiency. Following the structural risk minimization principle of statistical learning, the regression model is improved with L2-norm regularization. To address training efficiency on large-scale datasets, the original model is further improved with L1-norm regularization. Exploiting incremental learning, the training process selects and accumulates samples incrementally to speed up training. Experiments on UCI datasets verify the advantages of the improved algorithms.
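The least squares SVM family reduces training to small linear systems, which is what makes fuzzy weighting cheap to add. The sketch below is a minimal illustration of that idea on a plain (non-twin) LSSVR: a per-sample fuzzy membership s_i rescales the regularization so low-membership outliers pull the fit less. The kernel choice, membership values, and all function names are illustrative assumptions, not the authors' exact model.

```python
import numpy as np

def rbf_kernel(X, Z, gamma=0.5):
    """Gaussian (RBF) kernel matrix between row sets X and Z."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def fuzzy_lssvr_fit(X, y, s, C=10.0, gamma=0.5):
    """Fuzzy-weighted LSSVR: solve the KKT linear system
    [[0, 1^T], [1, K + diag(1/(C*s))]] [b; a] = [0; y]."""
    n = len(y)
    K = rbf_kernel(X, X, gamma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.diag(1.0 / (C * s))   # low membership => weaker fit
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]                   # bias b, dual coefficients a

def fuzzy_lssvr_predict(X_train, alpha, b, X_new, gamma=0.5):
    return rbf_kernel(X_new, X_train, gamma) @ alpha + b

# toy usage: a low membership value downweights a planted outlier
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (60, 1))
y = np.sin(X[:, 0]) + 0.05 * rng.standard_normal(60)
y[0] += 5.0                                  # outlier
s = np.ones(60); s[0] = 0.01                 # low fuzzy membership
b, alpha = fuzzy_lssvr_fit(X, y, s)
print(fuzzy_lssvr_predict(X, alpha, b, np.array([[0.0]])))
```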

2.
Incremental support vector machine learning is an important class of online learning methods. Traditional single-increment SVM learning updates the model with one data sample at a time. When many samples are added or removed, this update mode is extremely time-consuming, because every inserted or deleted sample triggers a separate model parameter update check. This paper proposes a multiple-increment SVM training algorithm based on parametric programming, which greatly reduces training time for batched updates. Experiments on synthetic and real test datasets show that the proposed method substantially lowers the computational complexity of multiple-increment SVM training and improves classifier accuracy.
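The paper's parametric-programming update is involved; as a rough stand-in for the batching idea, the sketch below checks an entire incoming batch against the current decision function and retrains once, on the previous support vectors plus only those new samples that violate the KKT conditions (margin below 1). Retraining from scratch with scikit-learn's SVC is an assumption standing in for the exact multi-increment update.

```python
import numpy as np
from sklearn.svm import SVC

def batch_increment(model, X_old, y_old, X_new, y_new):
    """One update for a whole batch (labels in {-1, +1}): retrain once on
    the current support vectors plus the new KKT-violating samples."""
    margins = y_new * model.decision_function(X_new)
    violators = margins < 1.0              # samples that change the solution
    if not violators.any():
        return model, X_old, y_old         # batch is already consistent
    sv = model.support_                    # indices of current SVs in X_old
    X_keep = np.vstack([X_old[sv], X_new[violators]])
    y_keep = np.concatenate([y_old[sv], y_new[violators]])
    model = SVC(kernel="rbf", C=1.0).fit(X_keep, y_keep)
    return model, X_keep, y_keep

# usage sketch:
#   model = SVC(kernel="rbf", C=1.0).fit(X0, y0)
#   model, X0, y0 = batch_increment(model, X0, y0, X_batch, y_batch)
```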

3.
程昊翔, 王坚. 《控制与决策》 (Control and Decision), 2016, 31(4): 755-758
To let the intrinsic distribution of the dataset better shape the trained model, a density-weighted twin support vector regression (TSVR) algorithm is proposed. The algorithm uses the k-nearest-neighbor algorithm to compute, for each data point, a density weight based on the local data density, and introduces these weights into the standard TSVR formulation. The method faithfully reflects the intrinsic distribution of the training data, so that each point influences the model in proportion to its density. Experiments on six UCI datasets verify the effectiveness of the proposed algorithm.
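The abstract does not spell out the weighting formula, so the following is one common k-NN density weight (inverse mean distance to the k nearest neighbours) that could then multiply the error terms of a TSVR objective; the normalization and the choice of k are illustrative.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_density_weights(X, k=10):
    """Density weight per point: inverse mean distance to the k nearest
    neighbours, scaled to (0, 1]. Dense regions get weights near 1,
    sparse outlying points get small weights."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dist, _ = nn.kneighbors(X)        # column 0 is the point itself
    mean_d = dist[:, 1:].mean(axis=1)
    w = 1.0 / (1.0 + mean_d)
    return w / w.max()
```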

4.
This paper studies large-scale SVM training in depth and proposes an SMO-like block-incremental algorithm. The algorithm learns each incoming data block in turn through an increase procedure and a decrease procedure, avoiding the computational cost of traditional SVM training algorithms, which grows sharply on large datasets. Theoretical analysis shows that the new algorithm converges to an approximately optimal solution. Experiments on the KDD dataset show that the algorithm achieves a near-linear training rate, with generalization performance and support vector counts close to those of LIBSVM.
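A hedged sketch of the increase/decrease structure described above, with scikit-learn's SVC standing in for the exact block solver: "increase" absorbs a new block into the working set, "decrease" carries forward only the resulting support vectors.

```python
import numpy as np
from sklearn.svm import SVC

def block_incremental_svm(X, y, block_size=2000, C=1.0):
    """Learn block by block: 'increase' absorbs a new data block,
    'decrease' carries forward only the support vectors."""
    carry_X, carry_y = X[:0], y[:0]        # empty carry-over buffers
    model = None
    for start in range(0, len(y), block_size):
        Xb = X[start:start + block_size]
        yb = y[start:start + block_size]
        Xt = np.vstack([carry_X, Xb])      # increase step
        yt = np.concatenate([carry_y, yb])
        model = SVC(kernel="rbf", C=C).fit(Xt, yt)
        carry_X, carry_y = Xt[model.support_], yt[model.support_]  # decrease
    return model
```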

5.
This paper introduces support vector machines, surveys the state of research on incremental SVM learning, and analyzes how support vectors and non-support vectors interchange after new samples are added. To address the inefficiency of existing elimination mechanisms, an improved elimination algorithm for incremental SVM learning, the two-pass elimination algorithm, is proposed. After two effective elimination passes that discard samples useless for classification, each new round of incremental training runs on the reduced effective dataset rather than the full, unwieldy training set, which markedly shortens subsequent training time. Theoretical analysis and experimental results show that the algorithm effectively speeds up training while preserving classification accuracy.

6.
An elimination algorithm for SVM incremental training (total citations: 8; self-citations: 0; by others: 8)
Based on the KKT conditions, this paper analyzes how the support vector set changes after samples are added, studies the distribution of support vectors in depth, and proposes a new elimination mechanism for SVM incremental training: the "挖心" (core-removal) elimination algorithm. With only a single parameter to set, the algorithm effectively forgets and discards training data. Experiments on standard datasets show that incremental training with this method effectively increases training speed and reduces storage usage while preserving training accuracy.
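The published rule is not given in the abstract; the sketch below is only a guess at the geometry the name suggests: discard samples that sit deep inside the correctly classified region, keeping only those near (or violating) the margin. The single threshold theta mirrors the paper's one-parameter design.

```python
import numpy as np

def core_removal(model, X, y, theta=1.5):
    """Discard samples deep inside the correctly classified region:
    keep x only if y * f(x) <= theta, i.e. near or violating the margin.
    theta is the single tuning parameter."""
    margins = y * model.decision_function(X)
    keep = margins <= theta
    return X[keep], y[keep]
```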

7.
To address the low efficiency of SVM-based Web text classification, a fast incremental classification algorithm for Web text, FVI-SVM, is proposed. The algorithm keeps only the Web text feature vectors in the incremental training set that violate the KKT conditions, overcoming the low training efficiency caused by the huge size of Web text training sets. By computing the shared-nearest-neighbor similarity between support vectors, it removes redundant support vectors, countering the growing training time and falling classification efficiency caused by continually adding similar text feature vectors during incremental learning. Experimental results show that the method effectively improves both the training efficiency and the classification efficiency of the SVM while maintaining classification accuracy.
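A minimal sketch of the shared-nearest-neighbour pruning step: two support vectors that share most of their k nearest neighbours are treated as near-duplicates and one is dropped. The thresholds k and max_shared are illustrative assumptions.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def snn_prune(X_sv, k=10, max_shared=7):
    """Drop support vectors that share >= max_shared of their k nearest
    neighbours with an already-kept support vector (near-duplicates)."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_sv)
    _, idx = nn.kneighbors(X_sv)
    neigh = [set(row[1:]) for row in idx]      # drop the point itself
    keep = np.ones(len(X_sv), dtype=bool)
    for i in range(len(X_sv)):
        if not keep[i]:
            continue
        for j in range(i + 1, len(X_sv)):
            if keep[j] and len(neigh[i] & neigh[j]) >= max_shared:
                keep[j] = False                # j is redundant
    return X_sv[keep]
```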

8.
To reduce the computational complexity and training time of support vector regression (SVR), the proximal support vector machine (PSVM), originally designed for classification, is extended to regression. The primal optimization problem is solved directly rather than by conversion to its dual, yielding linear and nonlinear proximal support vector regression (PSVR) algorithms. PSVR is compared with least squares support vector regression (LSSVR), which is likewise based on equality constraints. Tests on one- and two-dimensional function regression and on benchmark datasets of various sizes show that PSVR is simple and fast to train, with particular advantages on large-scale datasets.
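For the linear case, a direct primal solve in the PSVR spirit fits in a few lines. The objective below (squared error plus a ridge penalty on both w and b, solved by one (d+1)-dimensional linear system) is an illustrative reconstruction, not necessarily the paper's exact formulation.

```python
import numpy as np

def psvr_linear_fit(A, y, nu=10.0):
    """Direct primal solve: minimise
    (nu/2) * ||A w + b - y||^2 + (1/2) * (||w||^2 + b^2).
    Setting the gradient to zero gives one (d+1) x (d+1) linear system."""
    G = np.hstack([A, np.ones((len(y), 1))])   # augment with a bias column
    d1 = G.shape[1]
    z = np.linalg.solve(np.eye(d1) / nu + G.T @ G, G.T @ y)
    return z[:-1], z[-1]                       # weights w, bias b
```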

9.
To overcome the difficulty of fast, effective incremental learning with classical support vector machines, an incremental learning algorithm based on the KKT conditions and hull vectors is proposed. The algorithm first selects the hull vectors, which contain all support vectors, then uses the KKT conditions to discard useless samples from the newly added data, reducing the number of samples involved in training, and finally trains an SVM quickly on the new training set for incremental learning. Applied to UCI datasets and circuit-board fault classification, the algorithm preserves the learner's accuracy and good generalization ability while learning faster than the classical SMO algorithm, and it supports incremental learning.
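A sketch of the two ingredients, under the assumption (reasonable for separable, low-dimensional data) that support vectors lie on the convex hull of each class; scipy's ConvexHull and the margin-based KKT test are stand-ins for the paper's exact procedure.

```python
import numpy as np
from scipy.spatial import ConvexHull

def hull_vectors(X, y):
    """Indices of per-class convex hull vertices: for separable data in
    low dimensions these contain all candidate support vectors."""
    keep = []
    for c in np.unique(y):
        pts = np.where(y == c)[0]
        keep.extend(pts[ConvexHull(X[pts]).vertices])
    return np.array(sorted(keep))

def kkt_useful(model, X_new, y_new):
    """Boolean mask of new samples that violate the KKT conditions."""
    return y_new * model.decision_function(X_new) < 1.0
```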

10.
An elimination algorithm for SVM incremental learning (total citations: 1; self-citations: 1; by others: 1)
Based on the KKT conditions of the SVM optimization problem and the relationships among samples, this paper analyzes how the support vector set changes after samples are added and how support vectors behave during incremental learning, and proposes a new forgetting mechanism for SVM incremental learning: the counter-based elimination algorithm. With only a single parameter to set, the algorithm effectively forgets and discards training data. Experiments on standard datasets show that incremental learning with this method effectively increases training speed and reduces storage usage while preserving training accuracy.
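The abstract names a counter but not its update rule, so the following is only a plausible reading: each round a sample fails to become a support vector, its counter is incremented, and it is forgotten once the counter passes the single user-set limit.

```python
import numpy as np

def counter_eviction(counters, sv_mask, limit=3):
    """Reset the counter of every sample that is currently a support
    vector, bump the others, and forget those past the single limit."""
    counters = np.where(sv_mask, 0, counters + 1)
    return counters, counters <= limit         # new counters, keep-mask
```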

11.
Several algorithms have been proposed in the literature for building decision trees (DTs) from large datasets, but almost all of them have memory restrictions: they need to keep the whole training set, or a large part of it, in main memory. Algorithms that avoid this restriction by choosing a subset of the training set either need extra time for the selection or have parameters that can be very difficult to determine. In this paper, we introduce a new algorithm that builds decision trees using fast splitting attribute selection (DTFS) for large datasets. The proposed algorithm builds a DT without storing the whole training set in main memory, and it has only one parameter, with respect to which it is very stable. Experimental results on both real and synthetic datasets show that our algorithm is faster than three of the most recent algorithms for building decision trees from large datasets, while achieving competitive accuracy.

12.
To speed up SVM training on large-scale problems, a new working set selection strategy, the prepared working set strategy, is proposed: while SMO extracts the maximal violating pair via the feasible direction strategy, a series of samples with the largest KKT violations is extracted from the kernel cache to form a prepared working set, which supplies working sets for subsequent SMO iterations. The method raises the kernel cache hit rate and reduces the cost of working set selection. Theoretical analysis and experimental results show that the prepared working set serves well as the working set to be optimized and accelerates large-scale SVM training.
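A hedged sketch of the ranking step: score every cached sample by how badly it violates the KKT conditions (using margins y_i * f(x_i) and the box constraints on alpha), and reserve the top scorers as the prepared working set for later SMO iterations. The scoring details are illustrative.

```python
import numpy as np

def prepared_working_set(margins, alpha, C, size=20):
    """Rank samples by KKT violation (margins[i] = y_i * f(x_i)):
    alpha = 0 requires margin >= 1, 0 < alpha < C requires margin == 1,
    alpha = C requires margin <= 1. Return the `size` worst offenders."""
    v = np.zeros_like(margins)
    low = alpha <= 0
    mid = (alpha > 0) & (alpha < C)
    high = alpha >= C
    v[low] = np.maximum(0.0, 1.0 - margins[low])
    v[mid] = np.abs(1.0 - margins[mid])
    v[high] = np.maximum(0.0, margins[high] - 1.0)
    return np.argsort(v)[::-1][:size]           # most violating first
```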

13.
Instance selection is becoming more and more relevant due to the huge amount of data that is constantly being produced. Although current algorithms are useful for fairly large datasets, they run into scaling problems when the number of instances reaches hundreds of thousands or millions. In the best case these algorithms have efficiency O(n²), n being the number of instances, so for truly huge problems scalability is an issue and most algorithms are not applicable. This paper presents a divide-and-conquer recursive approach to instance selection for instance-based learning on very large problems. Our method divides the original training set into small subsets, applies the instance selection algorithm to each, rejoins the selected instances into a new training set, and repeats the partitioning and selection procedure; in this way the divide-and-conquer philosophy is applied recursively. The proposed method matches, and for storage reduction even improves on, the results of well-known standard algorithms with a very significant reduction in execution time. An extensive comparison on 30 datasets from the UCI Machine Learning Repository shows the usefulness of our method. Additionally, the method is applied to 5 huge datasets ranging from 300,000 to more than a million instances, with very good results and fast execution times.
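The recursive scheme translates almost directly into code. In the sketch below, `select` is any base instance selection algorithm returning a boolean keep-mask (e.g. a condensed-nearest-neighbour pass); the subset size and round count are illustrative parameters.

```python
import numpy as np

def dnc_instance_selection(X, y, select, subset_size=1000, rounds=3):
    """Partition the data, run `select(X, y) -> bool mask` on each part,
    pool the survivors, and repeat on the rejoined set."""
    for _ in range(rounds):
        if len(y) <= subset_size:
            break
        order = np.random.permutation(len(y))
        parts = np.array_split(order, max(1, len(y) // subset_size))
        keep = np.concatenate([p[select(X[p], y[p])] for p in parts])
        X, y = X[keep], y[keep]
    return X, y
```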

14.
An optimized relevance vector machine algorithm based on fast estimation (total citations: 1; self-citations: 0; by others: 1)
赵榈, 苏一丹, 覃华. 《计算机工程》 (Computer Engineering), 2012, 38(9): 205-207
To address the slow training of relevance vector machines (RVMs) on large-scale datasets, an optimized RVM algorithm based on fast estimation is proposed. Using a threshold coefficient and a reduced upper bound combined with iterative estimation, the hyperparameters of the training samples are quickly pre-estimated, removing a large number of irrelevant vectors from the training set, shrinking the training sample size, and cutting training time. Experiments on UCI and other datasets show that the algorithm trains faster while maintaining training accuracy.
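A crude illustration of pre-estimating per-sample hyperparameters: run a few standard RVM evidence updates (alpha_i <- gamma_i / mu_i^2) and drop the basis functions whose precision blows up before the full fit. The threshold and iteration budget are assumptions; the paper's own estimator is more refined.

```python
import numpy as np

def rvm_prescreen(Phi, y, alpha_max=1e6, iters=5, noise=0.1):
    """Run a few standard RVM evidence updates of the per-basis
    precisions alpha and keep only bases whose alpha stays bounded
    (large alpha drives the corresponding weight to zero)."""
    m = Phi.shape[1]
    alpha, beta = np.ones(m), 1.0 / noise ** 2
    for _ in range(iters):
        Sigma = np.linalg.inv(np.diag(alpha) + beta * Phi.T @ Phi)
        mu = beta * Sigma @ Phi.T @ y
        gamma = 1.0 - alpha * np.diag(Sigma)    # effective parameter counts
        alpha = np.maximum(gamma, 1e-12) / np.maximum(mu ** 2, 1e-12)
    return alpha < alpha_max                    # mask of retained bases
```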

15.
For multi-label classification with incomplete label information, a new multi-label algorithm, MCWD, is proposed. By effectively recovering the missing labels in the training data, it produces better classification results. During the training phase, MCWD iteratively updates the weight of each training instance and exploits pairwise label correlations to recover the missing labels; once recovery is complete, a classification model is trained on the resulting training set and used to predict the test set. Experimental results show that the algorithm holds an advantage on 14 multi-label datasets.
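A stand-in sketch for the recovery step: estimate pairwise label correlations from the observed entries and let them vote on the missing ones, iterating a few times. The {+1, 0, -1} encoding, the thresholds, and the plain correlation vote are illustrative simplifications of MCWD's weighted scheme.

```python
import numpy as np

def recover_missing_labels(Y, iters=3, thr=0.5):
    """Y entries: +1 relevant, -1 irrelevant, 0 missing. Fill missing
    entries by a correlation-weighted vote over the other labels."""
    Y = Y.astype(float).copy()
    L = Y.shape[1]
    for _ in range(iters):
        obs = Y != 0
        C = np.zeros((L, L))
        for j in range(L):
            for k in range(j + 1, L):
                both = obs[:, j] & obs[:, k]
                if both.sum() > 2:
                    c = np.corrcoef(Y[both, j], Y[both, k])[0, 1]
                    C[j, k] = C[k, j] = 0.0 if np.isnan(c) else c
        scores = Y @ C                     # missing entries (0) cast no vote
        fill = Y == 0
        Y[fill & (scores > thr)] = 1.0
        Y[fill & (scores < -thr)] = -1.0
    return Y
```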

16.
Inductive Logic Programming (ILP) combines rule-based and statistical artificial intelligence methods by learning a hypothesis comprising a set of rules, given background knowledge and constraints on the search space. We focus on extending the XHAIL algorithm for ILP, which is based on Answer Set Programming, and we evaluate our extensions on the Natural Language Processing task of sentence chunking. With respect to processing natural language, ILP can cater for the constant change in how we use language on a daily basis. At the same time, ILP does not require the huge numbers of training examples demanded by other statistical methods, and it produces interpretable results: a set of rules that can be analysed and tweaked if necessary. As contributions we extend XHAIL with (i) a pruning mechanism within the hypothesis generalisation algorithm that enables learning from larger datasets, (ii) better use of modern solver technology through recently developed optimisation methods, and (iii) a time budget that permits the use of suboptimal results. We evaluate these improvements on the task of sentence chunking using three datasets from a recent SemEval competition. Results show that our improvements allow learning on bigger datasets, with results of similar quality to state-of-the-art systems on the same task. Moreover, we compare the hypotheses obtained on the datasets to gain insights into the structure of each dataset.

17.
Information Fusion, 2008, 9(1): 41-55
Ensemble methods for classification and regression have attracted a great deal of attention in recent years. They have been shown, both theoretically and empirically, to perform substantially better than single models across a wide range of tasks. We adapt an ensemble method to the problem of predicting future values of time series, using recurrent neural networks (RNNs) as base learners. The improvement comes from combining a large number of RNNs, each generated by training on a different set of examples. The algorithm is based on boosting, in which difficult points of the time series receive extra attention during learning; unlike the original algorithm, however, we introduce a new parameter for tuning the influence of boosting on the available examples. We test our boosting algorithm for RNNs on single-step-ahead and multi-step-ahead prediction problems and compare the results with other regression methods, including several local approaches. The overall results obtained with our ensemble method are more accurate than those obtained with the standard method, backpropagation through time, on these datasets, and remain significantly better even when long-range dependencies play an important role.
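The extra parameter mentioned above controls how strongly boosting concentrates on hard examples. Below is a minimal AdaBoost-style weight update for regression showing where such a parameter k could enter; the exact update rule is illustrative, not the paper's.

```python
import numpy as np

def boosting_weights(residuals, D, k=0.5):
    """One boosting round for regression: scale each example's sampling
    weight by exp(k * normalised loss), so larger k concentrates the
    next learner more strongly on hard points."""
    loss = np.abs(residuals) / (np.abs(residuals).max() + 1e-12)
    D = D * np.exp(k * loss)
    return D / D.sum()                          # renormalise to a distribution
```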

18.
Cross-domain object detection is an emerging research direction aimed at generalizing from the training set to the test set. Among existing methods, applying image style transfer and training the model on the transformed dataset is effective, but it cannot be trained end to end, is inefficient, and involves a cumbersome pipeline. We therefore propose a new cross-domain object detection algorithm based on image style transfer that combines style transfer and object detection for end-to-end training, greatly simpl…

19.
The training of neural classifiers with condensed datasets (total citations: 2; self-citations: 0; by others: 2)
In this paper we apply a k-nearest-neighbor-based data condensing algorithm to the training set of multilayer perceptron neural networks. By removing overlapping data and retaining only training exemplars adjacent to the decision boundary, we are able to speed up network training significantly while keeping the misclassification rate no worse than that of a network trained on the unedited training set. We report results on a range of synthetic and real datasets which indicate that a training speed-up of an order of magnitude is typical.
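A minimal sketch of the condensing idea, assuming a plain k-NN neighbourhood test: points whose label disagrees with their local majority are "overlapping" and removed, while points whose neighbourhood is entirely label-pure lie far from the boundary and are also removed, leaving boundary-adjacent exemplars. The parameters are illustrative.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def condense_training_set(X, y, k=5):
    """Keep only boundary-adjacent exemplars: drop points whose label
    disagrees with the local majority (overlap) and points whose whole
    neighbourhood shares their label (far from the boundary)."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)
    agree = (y[idx[:, 1:]] == y[:, None]).mean(axis=1)
    keep = (agree >= 0.5) & (agree < 1.0)
    return X[keep], y[keep]
```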
