1.
2.
《计算机应用与软件》2013,(4)
Because of its high time and space complexity, the traditional support vector machine (SVM) algorithm has difficulty handling large-scale data effectively. To reduce that complexity, a fast SVM classification algorithm based on distance sorting is proposed. The algorithm first computes the sample center of each of the two classes; then, for every sample, it computes the distance to the center of the opposite class; finally, it sorts by distance and selects a fixed proportion of the smallest-distance samples as boundary samples. Because the boundary sample set contains the support vectors well and is much smaller than the original sample set, the algorithm effectively shortens training time and saves storage space while preserving the learning accuracy of the SVM. Experiments on UCI benchmark data sets and the 20-Newsgroups text classification data set show that the algorithm preselects support vectors faster and more accurately than earlier support-vector preselection algorithms.
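The preselection step described above can be sketched in a few lines. This is a minimal toy illustration, not the paper's implementation: the data, the `select_boundary` helper, and the 50% retention ratio are all assumptions, and the subsequent SVM training step is omitted.

```python
# Sketch of distance-sorting boundary-sample preselection (entry 2).
# All names and the toy data below are hypothetical illustrations.

def class_center(points):
    """Component-wise mean of a list of points (tuples of floats)."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def select_boundary(samples, other_center, ratio):
    """Keep the `ratio` fraction of samples closest to the other class's center."""
    ranked = sorted(samples, key=lambda p: dist(p, other_center))
    k = max(1, int(len(ranked) * ratio))
    return ranked[:k]

pos = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
neg = [(5.0, 5.0), (6.0, 6.0), (7.0, 7.0), (8.0, 8.0)]
pos_boundary = select_boundary(pos, class_center(neg), 0.5)
neg_boundary = select_boundary(neg, class_center(pos), 0.5)
print(pos_boundary)  # the positive samples nearest the negative center
print(neg_boundary)  # the negative samples nearest the positive center
```

Only `pos_boundary` and `neg_boundary` would then be passed to SVM training, which is where the claimed time and memory savings come from.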
3.
To address the drawback that the transductive support vector machine (TSVM) must traverse all unlabeled samples, which is time-consuming, a TSVM learning algorithm based on an improved k-nearest-neighbor method, k2TSVM, is proposed. The algorithm first partitions the unlabeled samples into clusters with k-means, then finds the k nearest neighbors of each cluster center and prunes the unlabeled samples according to the numbers of positive and negative samples among those neighbors; the pruned data set is then fed into a TSVM for training. k2TSVM removes the traditional TSVM's need to traverse all unlabeled data, effectively reduces the training set size, and improves running speed. Experimental results show that k2TSVM reduces running time while achieving better classification results than comparable TSVM variants.
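The pruning step of k2TSVM can be sketched as follows. The cluster centers are assumed to be precomputed by k-means, and the "keep only clusters whose labeled neighborhood is mixed" criterion is an assumption on my part; the abstract only says pruning is based on the positive/negative counts among the neighbors.

```python
# Sketch of the k2TSVM pruning idea (entry 3): inspect the labeled k-nearest
# neighbours of each cluster centre and drop clusters dominated by one class.
# The exact pruning threshold is a hypothetical choice, not from the paper.

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def knn_labels(center, labeled, k):
    """Labels of the k labeled samples nearest to `center`."""
    ranked = sorted(labeled, key=lambda xy: dist(xy[0], center))
    return [y for _, y in ranked[:k]]

def prune_clusters(clusters, labeled, k):
    """Keep unlabeled members only of clusters with a mixed labeled neighbourhood."""
    kept = []
    for center, members in clusters:
        labels = knn_labels(center, labeled, k)
        frac_pos = labels.count(1) / len(labels)
        if 0 < frac_pos < 1:  # neighbourhood contains both classes: near the boundary
            kept.extend(members)
    return kept

labeled = [((0.0, 0.0), 1), ((1.0, 0.0), 1), ((5.0, 0.0), -1), ((6.0, 0.0), -1)]
clusters = [  # (cluster centre, unlabeled members), assumed output of k-means
    ((0.5, 0.0), [(0.2, 0.0), (0.8, 0.0)]),
    ((2.8, 0.0), [(2.7, 0.0), (2.9, 0.0)]),
    ((5.5, 0.0), [(5.4, 0.0)]),
]
kept = prune_clusters(clusters, labeled, k=2)
print(kept)  # only the middle, boundary-straddling cluster survives
```

The surviving unlabeled samples would then be handed to the TSVM, which is what shrinks the traversal cost.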
4.
Traditional support vector machines focus on samples at the edges of the data distribution, where the support vectors usually arise. This paper proposes a new support vector algorithm whose support vectors are produced from the global data distribution; its sparsity is far better than that of the classical SVM on most data sets. On multiclass problems, the algorithm's time complexity is only that of the original SVM on a binary problem, avoiding the huge number of variables, or the excessive number of binary sub-classifiers, incurred when designing multiclass algorithms.
5.
《计算机应用与软件》2015,(11)
To address the inefficiency and class-imbalance problems of multiclass SVM algorithms, a multiclass method combining a directed acyclic graph with twin support vector machines, DAG-TWSVM (directed acyclic graph and twin support vector machine), is proposed. The algorithm combines the strengths of twin SVMs and DAG-SVMs, achieving good classification accuracy while greatly reducing training time; its time advantage is most pronounced on larger multiclass data sets. The algorithm is validated on the UCI (University of California Irvine) machine learning repository and the Statlog database. Experimental results show that DAG-TWSVM trains much faster than other multiclass SVMs, outperforms them on imbalanced samples, and resolves the undecidable regions that can arise in the classical one-versus-one multiclass SVM.
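The DAG half of this design can be illustrated by the standard list-elimination traversal over pairwise classifiers. The sketch below uses hypothetical nearest-center stubs in place of trained twin-SVM binary classifiers; only the elimination structure, in which each pairwise test removes one candidate class, is the point.

```python
# Sketch of the DAG decision procedure behind DAG-style multiclass SVMs
# (entry 5). `pairwise` maps a class pair (a, b) with a < b to a binary
# classifier; the stubs here are hypothetical stand-ins for twin SVMs.

def ddag_predict(x, classes, pairwise):
    """Eliminate one class per pairwise test until a single class remains."""
    remaining = list(classes)
    while len(remaining) > 1:
        a, b = remaining[0], remaining[-1]
        winner = pairwise[(a, b)](x)
        remaining.remove(a if winner == b else b)  # drop the losing class
    return remaining[0]

# Toy 1-D problem: class 0 near 0, class 1 near 5, class 2 near 10.
centers = {0: 0.0, 1: 5.0, 2: 10.0}
pairwise = {
    (a, b): (lambda x, a=a, b=b: a if abs(x - centers[a]) < abs(x - centers[b]) else b)
    for a in centers for b in centers if a < b
}
print(ddag_predict(4.2, [0, 1, 2], pairwise))
```

With N classes, only N-1 of the N(N-1)/2 binary classifiers are evaluated per test point, and every input reaches exactly one leaf, which is why the DAG layout avoids the one-versus-one undecidable regions.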
6.
王睿 《计算机与数字工程》2013,(12):1900-1902
The traditional transductive SVM makes effective use of unlabeled samples and achieves high classification accuracy, but its computational complexity is high. To address this, a heuristic transductive SVM learning algorithm based on kernel clustering is proposed. The unlabeled samples are first partitioned with a kernel clustering algorithm; every sample within a cluster is then assigned the same class label; finally, classification learning on the new sample set proceeds with the traditional transductive SVM algorithm. By giving all unlabeled samples in a cluster the same label, the method greatly reduces the computational complexity of the traditional transductive SVM. Experiments on the MNIST handwritten digit recognition data set show that the proposed algorithm largely preserves the high classification accuracy of the traditional transductive SVM.
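The cluster-labeling step can be sketched as below. The abstract does not say how each cluster's single label is chosen, so assigning the label of the labeled point nearest the cluster mean is purely my assumption, and the kernel clustering itself is taken as given.

```python
# Sketch of per-cluster pseudo-labeling (entry 6). The clusters are assumed to
# come from a kernel clustering step; the nearest-labeled-point rule for the
# cluster label is a hypothetical choice, not stated in the paper.

def cluster_mean(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def label_clusters(clusters, labeled):
    """Give every member of a cluster the label of the labeled point nearest its mean."""
    pseudo = []
    for members in clusters:
        c = cluster_mean(members)
        _, y = min(labeled, key=lambda xy: dist(xy[0], c))
        pseudo.extend((x, y) for x in members)
    return pseudo

labeled = [((0.0,), 1), ((10.0,), -1)]
clusters = [[(1.0,), (2.0,)], [(8.0,), (9.0,)]]
pseudo = label_clusters(clusters, labeled)
print(pseudo)
```

Because whole clusters share one label, the TSVM no longer searches over per-sample label assignments, which is where the complexity reduction comes from.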
7.
李琳 《数字社区&智能家居》2014,(1):115-119
Incremental SVM learning is an important online learning method. The traditional single-increment algorithm updates the SVM model with one data sample at a time; when many samples are added or removed, this update scheme is extremely time-consuming, because every inserted or deleted sample triggers its own model-parameter update check. This paper proposes a multiple-increment SVM training algorithm based on parametric programming, which greatly reduces the training time of multiple-increment SVMs. Experimental results on synthetic and real test data sets show that the proposed method substantially lowers the computational complexity of multiple-increment SVM training and improves classifier accuracy.
8.
Traditional semi-supervised SVM training spends a large amount of time optimizing non-support vectors. To address this, a working-set sample preselection method using genetic FCM (genetic fuzzy C-means) is proposed for the concave semi-supervised SVM. During semi-supervised SVM optimization, a working set (unlabeled data) is added to the original training set (labeled data) to form a new training set. The method first partitions the unknown data into a number of subsets with the genetic FCM algorithm, then trains the concave semi-supervised SVM on the new data to obtain the decision boundary and support vectors, and finally classifies the unlabeled data. By shrinking the working set and adding to the training set only those boundary vectors likely to become support vectors, the total number of training samples and the memory overhead are reduced. An analysis on random three-dimensional data shows that when the working set is reduced to within a certain fraction of the original, the classification accuracy and the number of support vectors differ little from those obtained with the full working set, while classification time drops sharply, giving a satisfactory sample preselection effect.
9.
《计算机工程与应用》2017,(3):169-173
To address the long training time and degraded generalization of the support vector machine (SVM) on large-scale data sets, an SVM acceleration algorithm based on boundary sample selection is proposed. Unsupervised K-means clustering is performed first; then, within each cluster, the K-nearest-neighbor algorithm removes non-boundary samples according to the cluster's mixture degree and support degree, and the resulting class-boundary samples are used to train the SVM model. Experimental results on benchmark data sets show that the algorithm significantly reduces model training time while preserving the classification generalization ability of the traditional SVM.
10.
11.
Adaptive binary tree for fast SVM multiclass classification
This paper presents an adaptive binary tree (ABT) to reduce the test-time computational complexity of multiclass support vector machines (SVMs). It achieves fast classification by (1) reducing the number of binary SVMs evaluated per classification, reusing the separating planes of some binary SVMs to discriminate other binary problems, and (2) selecting the binary SVMs with the lowest average number of support vectors (SVs); the average number of SVs is proposed as a measure of the computational cost of excluding one class. Compared with five well-known methods, experiments on many benchmark data sets demonstrate that the method speeds up the test phase while retaining the high accuracy of SVMs.
12.
Large-margin methods, such as support vector machines (SVMs), have been very successful in classification problems. Recently, maximum margin discriminant analysis (MMDA) was proposed that extends the large-margin idea to feature extraction. It often outperforms traditional methods such as kernel principal component analysis (KPCA) and kernel Fisher discriminant analysis (KFD). However, as in the SVM, its time complexity is cubic in the number of training points m, and is thus computationally inefficient on massive data sets. In this paper, we propose a (1+ε)²-approximation algorithm for obtaining the MMDA features by extending the core vector machine. The resultant time complexity is only linear in m, while its space complexity is independent of m. Extensive comparisons with the original MMDA, KPCA, and KFD on a number of large data sets show that the proposed feature extractor can improve classification accuracy, and is also faster than these kernel-based methods by over an order of magnitude.
13.
In this paper, a novel automatic image annotation system is proposed, which integrates two sets of support vector machines (SVMs), namely multiple-instance-learning (MIL)-based and global-feature-based SVMs, for annotation. The MIL-based bag features are obtained by applying MIL to the image blocks, where an enhanced diversity density (DD) algorithm and a faster searching algorithm are applied to improve efficiency and accuracy. These features are input to one set of SVMs for finding the optimal hyperplanes to annotate training images. Similarly, global color and texture features, including a color histogram and a modified edge histogram, are fed into another set of SVMs for categorizing training images. Consequently, two sets of image features are constructed for each test image and are, respectively, sent to the two sets of SVMs, whose outputs are combined by an automatic weight estimation method to obtain the final annotation results. The proposed annotation approach demonstrates promising performance on an image database of 12,000 general-purpose COREL images, as compared with several peer systems in the literature.
14.
Given its importance, the problem of predicting rare classes in large-scale multi-labeled data sets has attracted great attention in the literature. However, rare class analysis remains a critical challenge, because there is no natural way developed for handling imbalanced class distributions. This paper thus fills this crucial void by developing a method for classification using local clustering (COG). Specifically, for a data set with an imbalanced class distribution, we perform clustering within each large class and produce sub-classes with relatively balanced sizes. Then, we apply traditional supervised learning algorithms, such as support vector machines (SVMs), for classification. Along this line, we explore key properties of local clustering for a better understanding of the effect of COG on rare class analysis. Also, we provide a systematic analysis of the time and space complexity of the COG method. Indeed, the experimental results on various real-world data sets show that COG produces significantly higher prediction accuracies on rare classes than state-of-the-art methods and that the COG scheme can greatly improve the computational performance of SVMs. Furthermore, we show that COG can also improve the performance of traditional supervised learning algorithms on data sets with balanced class distributions. Finally, as two case studies, we have applied COG to two real-world applications: credit card fraud detection and network intrusion detection.
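The COG idea of splitting large classes into balanced sub-classes can be sketched in one dimension. This is a deliberately simplified illustration: a sorted chunking stands in for the paper's local clustering, a nearest-centroid rule stands in for the SVM, and all data and names are hypothetical.

```python
# Sketch of the COG scheme (entry 14): split each large class into sub-classes
# of roughly balanced size, learn on sub-class labels, then map predictions
# back to the original class. Chunking and nearest-centroid are stand-ins.

def split_class(samples, n_sub):
    """Partition one class's 1-D samples into n_sub contiguous chunks."""
    samples = sorted(samples)
    size = -(-len(samples) // n_sub)  # ceiling division
    return [samples[i:i + size] for i in range(0, len(samples), size)]

def centroid(xs):
    return sum(xs) / len(xs)

# Imbalanced toy data: big class "A", rare class "B".
big = [0.0, 1.0, 2.0, 3.0, 9.0, 10.0, 11.0, 12.0]
rare = [5.0, 5.5]

sub_classes = {("A", i): chunk for i, chunk in enumerate(split_class(big, 2))}
sub_classes[("B", 0)] = rare
centroids = {label: centroid(xs) for label, xs in sub_classes.items()}

def predict(x):
    label = min(centroids, key=lambda l: abs(x - centroids[l]))
    return label[0]  # map the sub-class label back to its original class

print(predict(5.2), predict(2.0), predict(10.0))
```

Splitting the big class keeps each sub-class comparable in size to the rare class, so the rare class is no longer swamped by a single dominant centroid (or, in the paper's setting, a single dominant SVM class).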
15.
16.
We study support vector machines (SVMs) for which the kernel matrix is not specified exactly and is only known to belong to a given uncertainty set. We consider uncertainties that arise from two sources: (i) data measurement uncertainty, which stems from the statistical errors of input samples; and (ii) kernel combination uncertainty, which stems from the weights of individual kernels that must be optimized in the multiple kernel learning (MKL) problem. Much prior work has studied uncertainty sets that allow the corresponding SVMs to be reformulated as semi-definite programs (SDPs), which are, however, computationally expensive. Our focus in this paper is to identify uncertainty sets that allow the corresponding SVMs to be reformulated as second-order cone programs (SOCPs), since both the worst-case complexity and the practical computational effort required to solve SOCPs are at least an order of magnitude less than those needed to solve SDPs of comparable size. In the main part of the paper we propose four uncertainty sets that meet this criterion. Experimental results are presented to confirm the validity of these SOCP reformulations.
17.
A parallel mixture of SVMs for very large scale problems
Support vector machines (SVMs) are the state-of-the-art models for many classification problems, but they suffer from the complexity of their training algorithm, which is at least quadratic with respect to the number of examples. Hence, it is hopeless to try to solve real-life problems having more than a few hundred thousand examples with SVMs. This article proposes a new mixture of SVMs that can be easily implemented in parallel and where each SVM is trained on a small subset of the whole data set. Experiments on a large benchmark data set (Forest) yielded significant time improvement (time complexity appears empirically to locally grow linearly with the number of examples). In addition, and surprisingly, a significant improvement in generalization was observed.
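The split-train-combine structure of this mixture can be sketched as follows. Tiny 1-D threshold classifiers stand in for the SVM experts, and a uniform vote stands in for the gating network the paper trains; the data and function names are hypothetical.

```python
# Sketch of the mixture-of-experts structure (entry 17): partition the data,
# train one "expert" per subset in isolation (hence parallelizable), and
# combine the experts' outputs. Threshold stubs replace real SVMs here.

def train_expert(subset):
    """Fit a 1-D threshold stub: the midpoint between the two class means."""
    pos = [x for x, y in subset if y == 1]
    neg = [x for x, y in subset if y == -1]
    t = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x > t else -1

def mixture_predict(x, experts):
    """Uniform average of expert votes (the paper learns a gating network instead)."""
    score = sum(e(x) for e in experts) / len(experts)
    return 1 if score > 0 else -1

data = [(0.0, -1), (1.0, -1), (2.0, -1), (8.0, 1), (9.0, 1), (10.0, 1)]
subsets = [data[0::2], data[1::2]]  # each expert sees only half the data
experts = [train_expert(s) for s in subsets]
print(mixture_predict(1.5, experts), mixture_predict(8.5, experts))
```

Because each expert trains on a fraction of the examples, and SVM training is superlinear in the number of examples, training the experts in parallel on subsets is much cheaper than training one SVM on everything.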
18.
Support vector machines (SVMs) are state-of-the-art tools for classification. However, the limited explanation capability of SVMs is also their main weakness, which is why SVMs are typically regarded as incomprehensible black-box models. In the present study, a rule extraction algorithm is proposed to extract comprehensible rules from SVMs and enhance their explanation capability. The algorithm uses the support vectors from a trained SVM model and combines them with genetic algorithms to construct rule sets. The proposed method can not only generate rule sets from SVMs over mixed discrete and continuous variables but can also select the important variables in the rule set simultaneously. Measurements of accuracy, sensitivity, specificity, and fidelity are used to compare the performance of the proposed method with direct rule learners and several rule-extraction techniques for SVMs. The results indicate that the proposed method performs at least as well as the most successful direct rule learners. Finally, a real-world pressure ulcer case was studied, and the results indicated the practicality of the proposed method in real applications.
19.
CHEN Jian-Xue 《计算机科学》2004,31(Z2):242-244
This paper presents a novel active learning approach for transductive support vector machines, with applications to text classification. The concept of the centroid of the support vectors is proposed, so that selective sampling based on measuring the distance from each unlabeled sample to the centroid is feasible and simple to compute. Under an additional hypothesis, active learning offers better performance than regular inductive SVMs and transductive SVMs with random sampling, and it is even competitive with transductive SVMs trained on all available training data. Experimental results show that the approach is efficient and easy to implement.
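The selective-sampling rule described above reduces to a short computation. In this sketch the support vectors are given directly (in the paper they come from a trained transductive SVM), and the pool, the `budget` parameter, and the helper names are illustrative assumptions.

```python
# Sketch of support-vector-centroid selective sampling (entry 19): query the
# unlabeled samples nearest the centroid of the current support vectors.

def centroid(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def select_queries(unlabeled, support_vectors, budget):
    """Pick the `budget` unlabeled samples nearest the support-vector centroid."""
    c = centroid(support_vectors)
    return sorted(unlabeled, key=lambda p: dist(p, c))[:budget]

svs = [(1.0, 1.0), (3.0, 3.0)]  # assumed support vectors; centroid is (2, 2)
pool = [(0.0, 0.0), (2.1, 2.0), (5.0, 5.0), (1.9, 2.2)]
print(select_queries(pool, svs, 2))
```

The centroid is a single fixed point per round, so scoring the whole pool costs one distance computation per unlabeled sample, which is the "feasible and simple to compute" property the abstract claims.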