Similar documents
 10 similar documents found (search time: 109 ms)
1.
The choice of the kernel function and its parameters is an important problem in support vector machines, and it strongly affects the model's generalization ability. When a large number of samples take part in training, the grid-search algorithm for finding the optimal parameters consumes too much time. To address this problem, a strategy is proposed that discards sample points that are not support vectors, thereby shrinking the training set. It cuts the search time roughly in half while essentially preserving the original test accuracy.

2.
The choice of the number of data blocks is one of the basic model-selection problems in parallel/distributed machine learning, directly affecting both the generalization ability and the running efficiency of the learning algorithm. Existing parallel/distributed methods typically choose the number of blocks by experience or by the number of processors, without an explicit selection criterion. This paper proposes a parallel-efficiency-aware criterion for selecting the number of data blocks, which improves computational efficiency while preserving the test accuracy of the parallel/distributed model. First, the relationship between the model's generalization error and the number of blocks is derived. On this basis, a selection criterion is proposed that trades off generalization against parallel efficiency. Finally, a large-scale support vector machine implementation adopting this criterion is given in a random Fourier feature space under the ADMM framework, and the criterion's effectiveness is verified experimentally on a high-performance computing cluster with large-scale standard datasets.
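The trade-off can be illustrated with a toy criterion (the paper's derived bound differs): a generalization penalty that grows as samples per block shrink, plus an inverse-efficiency term under a simple quadratic-training-cost model. All constants (`t_flop`, `t_comm`, `c_gen`, `eff_min`) are hypothetical.

```python
import math

def block_count(n, p, t_flop=1e-8, t_comm=1e-3, c_gen=1.0, eff_min=0.6):
    """Pick the number of data blocks m by trading generalization against
    parallel efficiency (toy criterion, not the paper's exact bound):
      - generalization penalty ~ c_gen * sqrt(m / n): fewer samples per
        block loosen the error bound;
      - parallel efficiency = t_serial / (m * t_par) under a quadratic
        per-block training cost plus per-round communication overhead.
    """
    t_serial = t_flop * n * n                        # O(n^2) serial cost
    best_m, best_score = 1, float("inf")
    for m in range(1, p + 1):
        t_par = t_flop * (n / m) ** 2 + t_comm * m   # block cost + comm
        eff = t_serial / (m * t_par)
        if eff < eff_min:                            # efficiency floor
            continue
        score = c_gen * math.sqrt(m / n) + 1.0 / eff
        if score < best_score:
            best_m, best_score = m, score
    return best_m

m = block_count(n=100_000, p=64)
print(m)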

3.
Support vector machines (SVMs) are a class of popular classification algorithms owing to their high generalization ability. However, training SVMs on a large set of learning samples is time-consuming, so improving learning efficiency is one of the most important research tasks on SVMs. Although some learning tasks offer many candidate training samples, only the samples near the decision boundary, called support vectors, affect the optimal classification hyperplanes; finding these samples and training SVMs with them alone greatly decreases training time and space complexity. Based on this observation, we introduce a neighborhood-based rough set model to search for boundary samples. Using the model, we first divide the sample space into three subsets: positive region, boundary, and noise. Furthermore, we partition the input features into four subsets: strongly relevant features, weakly relevant and indispensable features, weakly relevant and superfluous features, and irrelevant features. We then train SVMs only with the boundary samples in the relevant and indispensable feature subspaces, so feature and sample selection are conducted simultaneously with the proposed model. Experimental results show that the model selects very few features and samples for training, while classification performance is preserved or even improved.
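The three-way split can be sketched with a simplified neighborhood rule (not the paper's exact rough-set model): a sample whose δ-neighbors all share its label falls in the positive region, mixed neighbor labels put it on the boundary, and unanimous disagreement marks it as noise. The dataset and the δ value are illustrative.

```python
def neighborhood_partition(data, delta=1.0):
    """Split sample indices into positive region / boundary / noise
    using a neighborhood rough-set style rule (simplified sketch)."""
    def neighbor_labels(i):
        (xi, yi), out = data[i], []
        for j, (xj, yj) in enumerate(data):
            if j != i and sum((a - b) ** 2
                              for a, b in zip(xi, xj)) ** 0.5 <= delta:
                out.append(yj)
        return out

    positive, boundary, noise = [], [], []
    for i, (x, y) in enumerate(data):
        labels = neighbor_labels(i)
        if labels and all(l == y for l in labels):
            positive.append(i)            # consistent neighborhood
        elif labels and all(l != y for l in labels):
            noise.append(i)               # label contradicts all neighbors
        else:
            boundary.append(i)            # mixed labels or isolated point
    return positive, boundary, noise

data = [((0.0, 0.0), 0), ((0.2, 0.0), 0), ((0.5, 0.1), 0),
        ((0.9, 0.0), 1), ((1.1, 0.1), 1), ((1.4, 0.0), 1),
        ((0.1, 0.1), 1)]                  # last point: mislabelled
pos, bnd, noi = neighborhood_partition(data, delta=0.45)
print(pos, bnd, noi)
```

Only the `bnd` indices would then be passed to SVM training, which is the sample-selection step the abstract describes.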

4.
Kernel selection is a key issue in support vector machine (SVM) modeling. Although SVMs have good generalization performance, that performance depends noticeably on the kernel function, and for a given problem it is usually difficult to choose a suitable kernel and its parameters. A kernel-selection method based on SVM ensembles is proposed: sub-SVM learners are constructed with different kernel functions, and the sub-learners' predictions are then combined. The proposed method performs SVM ensemble learning and kernel selection simultaneously, which not only avoids the impact that a single SVM's kernel choice has on generalization but also achieves good generalization ability. Results on UCI benchmark datasets demonstrate the method's effectiveness.
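A minimal sketch of the idea: one sub-learner per kernel, combined by majority vote. For brevity the sub-learners here are kernel perceptrons rather than true SVMs (same kernel-trick decision function, much simpler to fit); kernels, data, and names are illustrative.

```python
import math

def linear_k(a, b):  return sum(x * y for x, y in zip(a, b))
def poly_k(a, b):    return (1 + linear_k(a, b)) ** 2
def rbf_k(a, b):     return math.exp(-sum((x - y) ** 2 for x, y in zip(a, b)))

def kernel_perceptron(data, kernel, epochs=10):
    """Stand-in for a sub-SVM: dual perceptron with a kernel."""
    alpha = [0] * len(data)
    for _ in range(epochs):
        for i, (xi, yi) in enumerate(data):
            s = sum(a * yj * kernel(xj, xi)
                    for a, (xj, yj) in zip(alpha, data) if a)
            if yi * s <= 0:                  # mistake: strengthen this point
                alpha[i] += 1
    def predict(x):
        s = sum(a * yj * kernel(xj, x)
                for a, (xj, yj) in zip(alpha, data) if a)
        return 1 if s >= 0 else -1
    return predict

def ensemble(data, kernels=(linear_k, poly_k, rbf_k)):
    """One sub-learner per kernel, combined by majority vote."""
    learners = [kernel_perceptron(data, k) for k in kernels]
    return lambda x: 1 if sum(p(x) for p in learners) >= 0 else -1

xor = [((0, 0), -1), ((1, 1), -1), ((0, 1), 1), ((1, 0), 1)]
vote = ensemble(xor)
preds = [vote(x) for x, _ in xor]
print(preds)
```

On XOR the linear-kernel learner cannot succeed, but the polynomial and RBF learners can, so the vote still classifies all four points correctly; this is the kernel-sensitivity the ensemble is meant to absorb.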

5.
Support vector machines (SVMs) are effective machine-learning methods based on the structural risk minimization (SRM) principle, an approach that minimizes an upper bound on the risk functional related to generalization performance. Parameter selection is an important factor affecting the performance of SVMs. The Covariance Matrix Adaptation Evolution Strategy (CMA-ES) is an evolutionary optimization strategy, used in this paper to optimize the parameters of SVMs. Compared with traditional SVMs, the CMA-ES-optimized SVMs are more accurate in predicting the Lorenz signal. An industry case illustrates that the proposed method is very successful in forecasting short-term faults of large machinery.
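The optimization loop can be illustrated with a deliberately simplified (1+1) evolution strategy using the 1/5th-success rule for step-size adaptation; full CMA-ES additionally adapts a covariance matrix. The quadratic `cv_error` surface is a synthetic stand-in for an SVM's cross-validation error over (log C, log γ); a real run would train and validate an SVM at each evaluation.

```python
import math, random

def one_plus_one_es(f, x0, sigma=1.0, iters=300, seed=0):
    """Minimal (1+1)-ES with 1/5th-success-rule step-size control
    (a simplified stand-in for CMA-ES)."""
    rng = random.Random(seed)
    x, fx = list(x0), f(x0)
    for _ in range(iters):
        cand = [xi + sigma * rng.gauss(0, 1) for xi in x]
        fc = f(cand)
        if fc <= fx:
            x, fx = cand, fc
            sigma *= 1.22            # success: widen the search
        else:
            sigma *= 0.82            # failure: shrink the step
    return x, fx

# Synthetic stand-in for SVM cross-validation error over (log C, log gamma),
# minimized at log C = 1, log gamma = -2.
def cv_error(t):
    log_c, log_g = t
    return (log_c - 1.0) ** 2 + 2.0 * (log_g + 2.0) ** 2

(best_log_c, best_log_g), err = one_plus_one_es(cv_error, [0.0, 0.0])
C, gamma = math.exp(best_log_c), math.exp(best_log_g)
print(round(best_log_c, 2), round(best_log_g, 2), round(err, 4))
```

Searching in log space is the usual choice for SVM hyperparameters because C and γ act multiplicatively.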

6.
Type-2 fuzzy logic-based classifier fusion for support vector machines
As a machine-learning tool, support vector machines (SVMs) have been gaining popularity due to their promising performance. However, the generalization ability of SVMs often depends on whether the selected kernel functions suit the real classification data. To lessen the sensitivity of SVM classification to different kernels and improve SVM generalization ability, this paper proposes a fuzzy fusion model that combines multiple SVM classifiers. To better handle the uncertainties in real classification data and in the membership functions (MFs) of a traditional type-1 fuzzy logic system (FLS), we apply interval type-2 fuzzy sets to construct a type-2 SVM fusion FLS. This type-2 fusion architecture takes into consideration the classification results from individual SVM classifiers and outputs the combined classification decision. Besides the distances of data examples to the SVM hyperplanes, the type-2 fuzzy SVM fusion system also considers the accuracy of the individual SVMs. Our experiments show that the type-2-based SVM fusion classifiers outperform individual SVM classifiers in most cases, and that the type-2 fuzzy logic-based SVM fusion model is generally better than the type-1-based fusion model.
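A very rough sketch of the two fusion inputs the abstract names, distances to the hyperplanes and per-classifier accuracy, combined with interval weights (not the paper's full type-2 FLS): each accuracy-based weight becomes an interval (a crude footprint of uncertainty), the weighted vote is bounded below and above, and the two bounds are averaged as a crude type-reduction. All numbers are illustrative.

```python
def type2_fusion(distances, accuracies, spread=0.1):
    """Interval-weighted fusion of SVM outputs (simplified sketch).

    distances:  signed distances of one example to each SVM hyperplane.
    accuracies: each classifier's validation accuracy in [0, 1].
    """
    lo = hi = 0.0
    for d, a in zip(distances, accuracies):
        w_lo = max(a - spread, 0.0)          # lower membership weight
        w_hi = min(a + spread, 1.0)          # upper membership weight
        lo += min(d * w_lo, d * w_hi)        # interval lower bound
        hi += max(d * w_lo, d * w_hi)        # interval upper bound
    score = (lo + hi) / 2.0                  # crude type-reduction
    return (1 if score >= 0 else -1), score

label, score = type2_fusion(distances=[0.8, -0.3, 1.2],
                            accuracies=[0.9, 0.6, 0.7])
print(label, round(score, 3))
```

More accurate classifiers carry tighter, heavier weight intervals, so their hyperplane distances dominate the fused decision.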

7.
A parallel mixture of SVMs for very large scale problems
Support vector machines (SVMs) are the state-of-the-art models for many classification problems, but they suffer from the complexity of their training algorithm, which is at least quadratic in the number of examples. Hence, it is hopeless to try to solve real-life problems with more than a few hundred thousand examples using a single SVM. This article proposes a new mixture of SVMs that can easily be implemented in parallel and in which each SVM is trained on a small subset of the whole dataset. Experiments on a large benchmark dataset (Forest) yielded a significant time improvement (the time complexity appears empirically to grow locally linearly with the number of examples). In addition, and surprisingly, a significant improvement in generalization was observed.
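The mixture structure can be sketched as follows: split the data into blocks, train one expert per block (each fit is independent, hence trivially parallel), and combine the experts' outputs. The experts below are toy nearest-centroid linear classifiers standing in for per-subset SVMs, and a plain mean replaces the trained neural gater the article uses; the data are synthetic.

```python
import random

def train_expert(subset):
    """Toy linear expert (nearest centroid), a stand-in for the
    per-subset SVM in the mixture."""
    pos = [x for x, y in subset if y > 0]
    neg = [x for x, y in subset if y < 0]
    cp = [sum(v) / len(pos) for v in zip(*pos)]
    cn = [sum(v) / len(neg) for v in zip(*neg)]
    w = [a - b for a, b in zip(cp, cn)]                  # normal direction
    b = -sum(wi * (a + c) / 2 for wi, a, c in zip(w, cp, cn))
    return lambda x: sum(wi * xi for wi, xi in zip(w, x)) + b

def mixture(data, n_experts=4, seed=0):
    """Split the data into n_experts blocks, train one expert per block,
    and average their raw outputs (mean gater, a simplification)."""
    rng = random.Random(seed)
    shuffled = rng.sample(data, len(data))
    blocks = [shuffled[i::n_experts] for i in range(n_experts)]
    experts = [train_expert(b) for b in blocks]
    return lambda x: 1 if sum(e(x) for e in experts) >= 0 else -1

rng = random.Random(1)
data = ([((rng.gauss(2, 0.6), rng.gauss(2, 0.6)), +1) for _ in range(200)] +
        [((rng.gauss(-2, 0.6), rng.gauss(-2, 0.6)), -1) for _ in range(200)])
predict = mixture(data)
acc = sum(predict(x) == y for x, y in data) / len(data)
print(acc)
```

Because each expert sees only a fraction of the data, per-expert training cost drops super-linearly for any super-linear base learner, which is the source of the reported speedup.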

8.
Zhang Geng, Zhang Guixin. 《微机发展》 (Microcomputer Development), 2007, 17(7): 24-27
The support vector machine (SVM) algorithm is the youngest branch of statistical learning theory, and the structural risk minimization principle gives it good generalization ability. In practice, however, slow training remains one of the pressing problems of SVM theory, and this is especially evident when SVMs are extended to multi-class problems. Starting from sample distribution and the number of classes, this paper analyzes the training-time performance of the traditional one-against-one (OAO) multi-class SVM algorithm, and, by introducing a hierarchical idea, proposes H-OAO-SVMs, an improved model of the traditional OAO-SVMs algorithm. Comparison with the training times of other common multi-class SVMs shows that the improved H-OAO-SVMs model has better training-time performance.
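The baseline that H-OAO-SVMs improves on can be sketched as plain one-against-one voting: train k(k-1)/2 pairwise learners and predict by majority vote. The pairwise learners below are toy nearest-centroid classifiers standing in for pairwise SVMs; the hierarchical grouping itself is not implemented here, and the data are illustrative.

```python
from itertools import combinations

def pairwise_classifier(data, a, b):
    """Toy binary learner (nearest class centroid) standing in for the
    pairwise SVM trained only on classes a and b."""
    def centroid(c):
        pts = [x for x, y in data if y == c]
        return [sum(v) / len(pts) for v in zip(*pts)]
    ca, cb = centroid(a), centroid(b)
    def vote(x):
        da = sum((xi - ci) ** 2 for xi, ci in zip(x, ca))
        db = sum((xi - ci) ** 2 for xi, ci in zip(x, cb))
        return a if da <= db else b
    return vote

def oao_predict(data, classes, x):
    """One-against-one: k(k-1)/2 pairwise learners, majority vote.
    (H-OAO additionally groups similar classes into a hierarchy so that
    fewer pairwise machines are trained and consulted.)"""
    tally = {c: 0 for c in classes}
    for a, b in combinations(classes, 2):
        tally[pairwise_classifier(data, a, b)(x)] += 1
    return max(classes, key=lambda c: tally[c])

data = [((0, 0), "A"), ((0.2, 0.1), "A"),
        ((3, 3), "B"), ((3.1, 2.9), "B"),
        ((0, 3), "C"), ((0.1, 3.2), "C")]
pred = oao_predict(data, ["A", "B", "C"], (0.1, 2.9))
print(pred)
```

The quadratic growth of the pairwise-machine count with the number of classes is exactly the training-time cost the hierarchical variant targets.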

9.
Support vector machines (SVMs) are one of the most popular classification tools and show the most potential for addressing under-sampled noisy data (a large number of features and a relatively small number of samples). However, the computational cost is too expensive, even for modern-scale samples, and the performance depends largely on the proper setting of parameters. As the data scale increases, the improvement in speed becomes increasingly challenging; as the dimension (feature count) grows while the sample size remains small, avoiding overfitting becomes a significant challenge. In this study, we propose a two-phase sequential minimal optimization (TSMO) to largely reduce the training cost for large-scale data (tested with 3186–70,000-sample datasets) and a two-phased-in differential-learning particle swarm optimization (tDPSO) to ensure accuracy for under-sampled data (tested with 2000–24,481-feature datasets). Because the purpose of training SVMs is to identify the support vectors that determine a hyperplane, TSMO quickly selects support-vector candidates from the entire dataset and then identifies the support vectors among those candidates, largely reducing the computational burden (a 29.4%–65.3% reduction rate). The proposed tDPSO uses topology variation and differential learning to solve PSO's premature-convergence issue: population diversity is ensured through dynamic topology until a ring connection is reached (topology-variation phases), and particles then initiate chemo-type simulated-annealing operations, with the global-best particle taking a two-turn diversion in response to stagnation (event-induced phases). The proposed tDPSO-embedded SVMs were tested on several under-sampled noisy cancer datasets and showed performance superior to various methods, even methods that preprocess the data with feature selection.
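The ring topology the abstract mentions is the end state of tDPSO's topology-variation phase; the sketch below shows only that starting point, a basic particle swarm with a ring neighborhood (tDPSO's differential learning and annealing-style restarts are not implemented). The sphere objective is a synthetic stand-in for the SVM cross-validation error surface.

```python
import random

def ring_pso(f, dim=2, n=12, iters=150, seed=0):
    """Basic PSO with a ring neighborhood topology."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n)]
    vel = [[0.0] * dim for _ in range(n)]
    pbest = [p[:] for p in pos]
    pcost = [f(p) for p in pos]
    w, c1, c2 = 0.72, 1.49, 1.49                 # standard PSO constants
    for _ in range(iters):
        for i in range(n):
            # neighborhood best over the ring {i-1, i, i+1}
            ring = [(i - 1) % n, i, (i + 1) % n]
            lbest = pbest[min(ring, key=lambda j: pcost[j])]
            for d in range(dim):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (lbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            c = f(pos[i])
            if c < pcost[i]:
                pbest[i], pcost[i] = pos[i][:], c
    k = min(range(n), key=lambda j: pcost[j])
    return pbest[k], pcost[k]

# Sphere function as a stand-in for a parameter-tuning objective.
best, cost = ring_pso(lambda p: sum(x * x for x in p))
print([round(x, 3) for x in best], round(cost, 6))
```

The ring limits how fast information about the global best spreads, which is what preserves diversity longer than a fully connected (gbest) swarm.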

10.
GA-based learning bias selection mechanism for real-time scheduling systems
The use of machine-learning technologies to develop knowledge bases (KBs) for real-time scheduling (RTS) problems has produced encouraging results in recent research. However, few studies focus on how to select proper learning biases in the early development stage of an RTS system so as to enhance the generalization ability of the resulting KBs. The selected learning bias usually assumes a set of proper system features that are known in advance, and the machine-learning algorithm for developing scheduling KBs is predetermined. The purpose of this study is to develop a genetic algorithm (GA)-based learning bias selection mechanism that determines an appropriate learning bias, including the machine-learning algorithm, the feature subset, and the learning parameters. Three machine-learning algorithms are considered: the back-propagation neural network (BPNN), C4.5 decision tree (DT) learning, and support vector machines (SVMs). The proposed mechanism can search for the best machine-learning algorithm while simultaneously determining the optimal subset of features and the learning parameters used to build the RTS system KBs. In terms of prediction accuracy on unseen data under various performance criteria, it also offers better generalization ability than the case where the learning bias selection mechanism is not used. Furthermore, the proposed approach to building RTS system KBs can improve system performance compared with other classifier KBs under various performance criteria over a long period.
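The chromosome structure described above, algorithm choice plus feature mask plus a learning parameter, can be sketched with a tiny GA. The fitness function here is a synthetic stand-in (a real run would train the chosen learner and return validation accuracy); the algorithm names, feature count, and GA settings are illustrative.

```python
import random

def ga_select_bias(fitness, n_feat=6, algos=("BPNN", "C4.5", "SVM"),
                   pop=20, gens=30, seed=0):
    """Tiny GA over a learning bias = (algorithm, feature mask, parameter),
    using truncation selection plus mutation (a simplified sketch)."""
    rng = random.Random(seed)

    def rand_ind():
        return (rng.randrange(len(algos)),
                [rng.random() < 0.5 for _ in range(n_feat)],
                rng.uniform(0.0, 1.0))

    def mutate(ind):
        a, mask, p = ind
        if rng.random() < 0.2:
            a = rng.randrange(len(algos))           # swap the algorithm
        mask = [m ^ (rng.random() < 1.0 / n_feat) for m in mask]
        return a, mask, min(1.0, max(0.0, p + rng.gauss(0, 0.1)))

    population = [rand_ind() for _ in range(pop)]
    for _ in range(gens):
        scored = sorted(population, key=fitness, reverse=True)
        parents = scored[:pop // 2]                 # keep the top half
        population = parents + [mutate(rng.choice(parents))
                                for _ in range(pop - len(parents))]
    return max(population, key=fitness)

# Synthetic fitness: prefers SVM (index 2), features {0, 2}, parameter ~0.7.
def fitness(ind):
    a, mask, p = ind
    want = [i in (0, 2) for i in range(6)]
    return ((a == 2) + sum(m == w for m, w in zip(mask, want)) / 6.0
            - abs(p - 0.7))

algo, mask, param = ga_select_bias(fitness)
print(algo, mask, round(param, 2))
```

Encoding the algorithm choice on the chromosome is what lets the GA compare BPNN, C4.5, and SVM biases within a single search, rather than tuning each learner separately.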

