首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 62 毫秒
1.
Adaptive binary tree for fast SVM multiclass classification   总被引:1,自引:0,他引:1  
Jin  Cheng  Runsheng   《Neurocomputing》2009,72(13-15):3370
This paper presents an adaptive binary tree (ABT) to reduce the test computational complexity of multiclass support vector machine (SVM). It achieves a fast classification by: (1) reducing the number of binary SVMs for one classification by using separating planes of some binary SVMs to discriminate other binary problems; (2) selecting the binary SVMs with the fewest average number of support vectors (SVs). The average number of SVs is proposed to denote the computational complexity to exclude one class. Compared with five well-known methods, experiments on many benchmark data sets demonstrate our method can speed up the test phase while remain the high accuracy of SVMs.  相似文献   

2.
3.
Large-margin methods, such as support vector machines (SVMs), have been very successful in classification problems. Recently, maximum margin discriminant analysis (MMDA) was proposed that extends the large-margin idea to feature extraction. It often outperforms traditional methods such as kernel principal component analysis (KPCA) and kernel Fisher discriminant analysis (KFD). However, as in the SVM, its time complexity is cubic in the number of training points m, and is thus computationally inefficient on massive data sets. In this paper, we propose an (1+epsilon)(2)-approximation algorithm for obtaining the MMDA features by extending the core vector machine. The resultant time complexity is only linear in m, while its space complexity is independent of m. Extensive comparisons with the original MMDA, KPCA, and KFD on a number of large data sets show that the proposed feature extractor can improve classification accuracy, and is also faster than these kernel-based methods by over an order of magnitude.  相似文献   

4.
Support vector machines (SVMs) have good accuracy and generalization properties, but they tend to be slow to classify new examples. In contrast to previous work that aims to reduce the time required to fully classify all examples, we present a method that provides the best-possible classification given a specific amount of computational time. We construct two SVMs: a “full” SVM that is optimized for high accuracy, and an approximation SVM (via reduced-set or subset methods) that provides extremely fast, but less accurate, classifications. We apply the approximate SVM to the full data set, estimate the posterior probability that each classification is correct, and then use the full SVM to reclassify items in order of their likelihood of misclassification. Our experimental results show that this method rapidly achieves high accuracy, by selectively devoting resources (reclassification) only where needed. It also provides the first such progressive SVM solution that can be applied to multiclass problems.  相似文献   

5.
More than two decades ago the imbalanced data problem turned out to be one of the most important and challenging problems. Indeed, missing information about the minority class leads to a significant degradation in classifier performance. Moreover, comprehensive research has proved that there are certain factors increasing the problem’s complexity. These additional difficulties are closely related to the data distribution over decision classes. In spite of numerous methods which have been proposed, the flexibility of existing solutions needs further improvement. Therefore, we offer a novel rough–granular computing approach (RGA, in short) to address the mentioned issues. New synthetic examples are generated only in specific regions of feature space. This selective oversampling approach is applied to reduce the number of misclassified minority class examples. A strategy relevant for a given problem is obtained by formation of information granules and an analysis of their degrees of inclusion in the minority class. Potential inconsistencies are eliminated by applying an editing phase based on a similarity relation. The most significant algorithm parameters are tuned in an iterative process. The set of evaluated parameters includes the number of nearest neighbours, complexity threshold, distance threshold and cardinality redundancy. Each data model is built by exploiting different parameters’ values. The results obtained by the experimental study on different datasets from the UCI repository are presented. They prove that the proposed method of inducing the neighbourhoods of examples is crucial in the proper creation of synthetic positive instances. The proposed algorithm outperforms related methods in most of the tested datasets. The set of valid parameters for the Rough–Granular Approach (RGA) technique is established.  相似文献   

6.
Support vector machines (SVMs), initially proposed for two-class classification problems, have been very successful in pattern recognition problems. For multi-class classification problems, the standard hyperplane-based SVMs are made by constructing and combining several maximal-margin hyperplanes, and each class of data is confined into a certain area constructed by those hyperplanes. Instead of using hyperplanes, hyperspheres that tightly enclosed the data of each class can be used. Since the class-specific hyperspheres are constructed for each class separately, the spherical-structured SVMs can be used to deal with the multi-class classification problem easily. In addition, the center and radius of the class-specific hypersphere characterize the distribution of examples from that class, and may be useful for dealing with imbalance problems. In this paper, we incorporate the concept of maximal margin into the spherical-structured SVMs. Besides, the proposed approach has the advantage of using a new parameter on controlling the number of support vectors. Experimental results show that the proposed method performs well on both artificial and benchmark datasets.  相似文献   

7.
为解决网络入侵检测系统中检测算法分类精度不高训练样本数需要较多以及训练学习时间较长等问题,在基于支持向量机的基础上,提出一种新的利用隐空间支持向量机设计IDS的检测算法.仿真实验结果表明本算法较基于支持向量机的检测算法具有更良好的泛化性能,更快的迭代速度,更高的检测精度和更低的误报率.  相似文献   

8.
传统支持向量机通常关注于数据分布的边缘样本,支持向量通常在这些边缘样本中产生。本文提出一个新的支持向量算法,该算法的支持向量从全局的数据分布中产生,其稀疏性能在大部分数据集上远远优于经典支持向量机算法。该算法在多类问题上的时间复杂度仅等价于原支持向量机算法的二值问题,解决了设计多类算法时变量数目庞大或者二值子分类器数目过多的问题。  相似文献   

9.
Support vector machines (SVMs) are a class of popular classification algorithms for their high generalization ability. However, it is time-consuming to train SVMs with a large set of learning samples. Improving learning efficiency is one of most important research tasks on SVMs. It is known that although there are many candidate training samples in some learning tasks, only the samples near decision boundary which are called support vectors have impact on the optimal classification hyper-planes. Finding these samples and training SVMs with them will greatly decrease training time and space complexity. Based on the observation, we introduce neighborhood based rough set model to search boundary samples. Using the model, we firstly divide sample spaces into three subsets: positive region, boundary and noise. Furthermore, we partition the input features into four subsets: strongly relevant features, weakly relevant and indispensable features, weakly relevant and superfluous features, and irrelevant features. Then we train SVMs only with the boundary samples in the relevant and indispensable feature subspaces, thus feature and sample selection is simultaneously conducted with the proposed model. A set of experimental results show the model can select very few features and samples for training; in the mean time the classification performances are preserved or even improved.  相似文献   

10.
Identifying statistically significant anomalies in an unlabeled data set is of key importance in many applications such as financial security and remote sensing. Rare category detection (RCD) helps address this issue by passing candidate data examples to a labeling oracle (e.g., a human expert) for labeling. A challenging task in RCD is to discover all categories without any prior information about the given data set. A few approaches have been proposed to address this issue, which are on quadratic or cubic time complexities w.r.t. the data set size N and require considerable labeling queries involving time-consuming and expensive labeling effort of a human expert. In this paper, aiming at solutions with lower time complexity and less labeling queries, we propose two prior-free (i.e., without any prior information about a given data set) RCD algorithms, namely (1) iFRED which achieves linear time complexity w.r.t. N, and (2) vFRED which substantially reduces the number of labeling queries. This is done by tabulating each dimension of the data set into bins, followed by zooming out to shrink each bin down to a position and conducting wavelet analysis on the data density function to fast locate the position (i.e., a bin) of a rare category, and zooming in the located bin to select candidate data examples for labeling. Theoretical analysis guarantees the effectiveness of our algorithms, and comprehensive experiments on both synthetic and real data sets further verify the effectiveness and efficiency.  相似文献   

11.
Cutting-plane training of structural SVMs   总被引:4,自引:0,他引:4  
Discriminative training approaches like structural SVMs have shown much promise for building highly complex and accurate models in areas like natural language processing, protein structure prediction, and information retrieval. However, current training algorithms are computationally expensive or intractable on large datasets. To overcome this bottleneck, this paper explores how cutting-plane methods can provide fast training not only for classification SVMs, but also for structural SVMs. We show that for an equivalent “1-slack” reformulation of the linear SVM training problem, our cutting-plane method has time complexity linear in the number of training examples. In particular, the number of iterations does not depend on the number of training examples, and it is linear in the desired precision and the regularization parameter. Furthermore, we present an extensive empirical evaluation of the method applied to binary classification, multi-class classification, HMM sequence tagging, and CFG parsing. The experiments show that the cutting-plane algorithm is broadly applicable and fast in practice. On large datasets, it is typically several orders of magnitude faster than conventional training methods derived from decomposition methods like SVM-light, or conventional cutting-plane methods. Implementations of our methods are available at .  相似文献   

12.
Rare-category detection helps discover new rare classes in an unlabeled data set by selecting their candidate data examples for labeling. Most of the existing approaches for rare-category detection require prior information about the data set without which they are otherwise not applicable. The prior-free algorithms try to address this problem without prior information about the data set; though, the compensation is high time complexity, which is not lower than $O(dN^2)$ where $N$ is the number of data examples in a data set and $d$ is the data set dimension. In this paper, we propose CLOVER a prior-free algorithm by introducing a novel rare-category criterion known as local variation degree (LVD), which utilizes the characteristics of rare classes for identifying rare-class data examples from other types of data examples and passes those data examples with maximum LVD values to CLOVER for labeling. A remarkable improvement is that CLOVER’s time complexity is $O(dN^{2-1/d})$ for $d > 1$ or $O(N\log N)$ for $d = 1$ . Extensive experimental results on real data sets demonstrate the effectiveness and efficiency of our method in terms of new rare classes discovery and lower time complexity.  相似文献   

13.
Incremental training of support vector machines   总被引:13,自引:0,他引:13  
We propose a new algorithm for the incremental training of support vector machines (SVMs) that is suitable for problems of sequentially arriving data and fast constraint parameter variation. Our method involves using a "warm-start" algorithm for the training of SVMs, which allows us to take advantage of the natural incremental properties of the standard active set approach to linearly constrained optimization problems. Incremental training involves quickly retraining a support vector machine after adding a small number of additional training vectors to the training set of an existing (trained) support vector machine. Similarly, the problem of fast constraint parameter variation involves quickly retraining an existing support vector machine using the same training set but different constraint parameters. In both cases, we demonstrate the computational superiority of incremental training over the usual batch retraining method.  相似文献   

14.
Choosing Multiple Parameters for Support Vector Machines   总被引:158,自引:0,他引:158  
The problem of automatically tuning multiple parameters for pattern recognition Support Vector Machines (SVMs) is considered. This is done by minimizing some estimates of the generalization error of SVMs using a gradient descent algorithm over the set of parameters. Usual methods for choosing parameters, based on exhaustive search become intractable as soon as the number of parameters exceeds two. Some experimental results assess the feasibility of our approach for a large number of parameters (more than 100) and demonstrate an improvement of generalization performance.  相似文献   

15.
Hong Qiao 《Pattern recognition》2007,40(9):2543-2549
Support vector machines (SVMs) are a new and important tool in data classification. Recently much attention has been devoted to large scale data classifications where decomposition methods for SVMs play an important role.So far, several decomposition algorithms for SVMs have been proposed and applied in practice. The algorithms proposed recently and based on rate certifying pair/set provide very attractive features compared with many other decomposition algorithms. They converge not only with finite termination but also in polynomial time. However, it is difficult to reach a good balance between low computational cost and fast convergence.In this paper, we propose a new simple decomposition algorithm based on a new philosophy on working set selection. It has been proven that the working set selected by the new algorithm is a rate certifying set. Further, compared with the existing algorithms based on rate certifying pair/set, our algorithm provides a very good feature in combination of lower computational complexity and faster convergence.  相似文献   

16.
Random forest classifier for remote sensing classification   总被引:4,自引:0,他引:4  
Growing an ensemble of decision trees and allowing them to vote for the most popular class produced a significant increase in classification accuracy for land cover classification. The objective of this study is to present results obtained with the random forest classifier and to compare its performance with the support vector machines (SVMs) in terms of classification accuracy, training time and user defined parameters. Landsat Enhanced Thematic Mapper Plus (ETM+) data of an area in the UK with seven different land covers were used. Results from this study suggest that the random forest classifier performs equally well to SVMs in terms of classification accuracy and training time. This study also concludes that the number of user‐defined parameters required by random forest classifiers is less than the number required for SVMs and easier to define.  相似文献   

17.
杨静  于旭  谢志强 《计算机学报》2012,35(5):1002-1010
针对基于向量投影的支持向量预选取方法选取投影直线过于简单粗糙,导致需要选取较多的边界向量才能包含原始问题的支持向量的问题,提出了一种新的支持向量预选取方法.该方法通过定义好的投影直线具备的3个必要特征,提出:对于线性可分情况,利用Fisher线性判别算法来获取最佳的投影直线;对于非线性可分情况,利用特征空间中心向量所在直线作为相应的投影直线.由于该方法确定的投影直线可以更好地对样本投影进行分离,因此,与基于向量投影的支持向量预选取方法相比,该方法可用更少的原始样本来构造边界向量集合,可有效降低支持向量机算法的时空复杂度.在两个人工数据集和一个现实数据集上的实验表明,所提方法不仅可以达到以往各种实用的支持向量机算法分类精度,而且更为高效.  相似文献   

18.
一种基于聚类的PU主动文本分类方法   总被引:1,自引:0,他引:1  
刘露  彭涛  左万利  戴耀康 《软件学报》2013,24(11):2571-2583
文本分类是信息检索的关键问题之一.提取更多的可信反例和构造准确高效的分类器是PU(positive andunlabeled)文本分类的两个重要问题.然而,在现有的可信反例提取方法中,很多方法提取的可信反例数量较少,构建的分类器质量有待提高.分别针对这两个重要步骤提供了一种基于聚类的半监督主动分类方法.与传统的反例提取方法不同,利用聚类技术和正例文档应与反例文档共享尽可能少的特征项这一特点,从未标识数据集中尽可能多地移除正例,从而可以获得更多的可信反例.结合SVM 主动学习和改进的Rocchio 构建分类器,并采用改进的TFIDF(term frequency inverse document frequency)进行特征提取,可以显著提高分类的准确度.分别在3 个不同的数据集中测试了分类结果(RCV1,Reuters-21578,20 Newsgoups).实验结果表明,基于聚类寻找可信反例可以在保持较低错误率的情况下获取更多的可信反例,而且主动学习方法的引入也显著提升了分类精度.  相似文献   

19.
Support vector machines (SVMs) have been applied successfully to construction knowledge domains. However, SVMs, as a baseline model, still have a potential improvement space by integrating hybrid intelligence. This work compares the performance of various classification models using the combination of fuzzy logic, a fast and messy genetic algorithm, and SVMs. A set of public–private partnership projects was collected as a real case study in construction management. The data were split into mutually independent folds for cross validation. Experimental results demonstrate that the proposed hybrid artificial intelligence system has the best and most reliable classification accuracy at 77.04%, a 24.76% improvement compared with that of SVMs in predicting project dispute resolution (PDR) outcomes (i.e., mediation, arbitration, litigation, negotiation, and administrative appeals) when the dispute category and phase in which a dispute occurs are known during project execution. This work demonstrates the improvement capability of hybrid intelligence in classifying PDR predictions related to public infrastructure projects.  相似文献   

20.
Support vector machines (SVMs) have been broadly applied to classification problems. However, a successful application of SVMs depends heavily on the determination of the right type and suitable hyperparameter settings of kernel functions. Recently, multiple-kernel learning (MKL) algorithms have been developed to deal with these issues by combining different kernels together. The weight with each kernel in the combination is obtained through learning. Lanckriet et al. proposed a way of deriving the weights by transforming the learning into a semidefinite programming (SDP) problem with a transduction setting. However, the amount of time and space required by this method is demanding. In this paper, we reformulate the SDP problem with an induction setting and incorporate two strategies to reduce the search complexity of the learning process, based on the comments discussed in the Lanckriet et al. paper. The primal and dual forms of SDP are derived. A discussion on computation complexity is given. Experimental results obtained from synthetic and benchmark datasets show that the proposed method runs efficiently in multiple-kernel learning.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号