Similar Literature
 20 similar documents found (search time: 20 ms)
1.
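To test whether the theoretical training sample size (10~30 p) applies to parametric classifiers (e.g., maximum likelihood classification, MLC) and non-parametric classifiers (e.g., support vector machines, SVM), and to examine how sample characteristics (spectral statistics and spatial distribution) affect classification accuracy, training sets of different sizes were used for MLC and SVM classification, and the relationship between accuracy and the samples was analyzed. The experiments show that the accuracy of both MLC and SVM rises with sample size and then levels off, and that MLC accuracy grows faster than SVM accuracy. MLC is strongly affected by sample size: with small samples (5) its accuracy is unstable, and it stabilizes once there are more than 30 samples. The SVM classifier is accurate and stable even with small samples (5), so SVM is suitable for small-sample classification and is not bound by the theoretical sample size. Once the sample size exceeds the minimum theoretical value (30), MLC is more accurate than SVM, mainly because additional samples readily give MLC more useful information, whereas they add few samples near the margin for SVM. The results provide an experimental basis for further optimizing samples for classification.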

2.
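Traditional semi-supervised SVM training spends much of its time optimizing non-support vectors. To address this, a method is proposed that uses genetic FCM (Genetic Fuzzy C-Means) to pre-select working-set samples in the concave semi-supervised support vector machine. In semi-supervised SVM learning, a working set (unlabeled data) is added to the original training set (labeled data) to form a new training set. The method first uses the genetic FCM algorithm to partition the unlabeled data into a number of subsets, then trains a concave semi-supervised SVM on the new data to obtain the decision boundary and support vectors, and finally classifies the unlabeled data. Shrinking the working set in this way, by adding to the training set only those boundary vectors likely to become support vectors, reduces the total number of training samples and hence the memory overhead. Experiments on random three-dimensional data show that, when the working set is reduced to within a certain range of its original size, the classification accuracy and number of support vectors after proportional reduction differ little from those obtained with the original working set, while classification time drops substantially, so the sample pre-selection is effective.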

3.
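Large numbers of labeled samples are hard to obtain in supervised learning, and labeling is costly. To address this, an SVM active learning algorithm based on Tri-training semi-supervised learning and convex hull vectors is proposed, combining active learning with semi-supervised learning. The hull vectors of the sample set are computed, and those most likely to become support vectors are selected for labeling. Earlier active learning algorithms make no further use of unlabeled samples after selecting the most informative samples for labeling; to fix this, Tri-training semi-supervised learning is introduced into the SVM active learning process, so that unlabeled samples with high label confidence are added to the training set and the information in the unlabeled set that benefits the learner is exploited. Experiments on UCI datasets show that, with few labeled samples, the algorithm obtains an SVM classifier with high classification accuracy and good generalization, reducing the labeling cost of SVM training.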

4.
5.
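When an SVM handles the class imbalance problem, its results tend to favor the majority class. To address this, a staged support vector machine based on clustering weights (WSVM) is proposed that considers both between-class and within-class imbalance. In preprocessing, the K-means algorithm is used to obtain a weight for each majority-class sample. Classification proceeds in three stages: the first selects, according to the weights, as many samples from the boundary regions of the majority-class clusters as there are minority-class samples; the second trains an initial classifier on the selected samples and the minority-class samples; the third uses the unselected majority-class samples to tune the initial classifier, and the final classifier is obtained when the stopping condition is met. Extensive experiments on UCI datasets show that WSVM outperforms traditional classification algorithms in both minority-class recognition rate and overall classifier performance.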

6.
This paper presents the implementation of a new text document classification framework, named Euclidean-SVM, that uses the Support Vector Machine (SVM) approach in the training phase and the Euclidean distance function in the classification phase. The SVM constructs a classifier by generating a decision surface, namely the optimal separating hyper-plane, to partition different categories of data points in the vector space. The concept of the optimal separating hyper-plane can be generalized to non-linearly separable cases by introducing kernel functions that map the data points from the input space into a high-dimensional feature space, where they can be separated by a linear hyper-plane. As a result, the choice of kernel function has a high impact on the classification accuracy of the SVM. Besides the kernel function, the value of the soft margin parameter C is another critical factor in the performance of the SVM classifier. Hence, one of the critical problems of the conventional SVM classification framework is the need to determine the appropriate kernel function and the appropriate value of C for datasets of varying characteristics in order to guarantee high classifier accuracy. In this paper, we introduce a distance measurement technique that uses the Euclidean distance function to replace the optimal separating hyper-plane as the classification decision function in the SVM. In our approach, the support vectors for each category are identified from the training data points during the training phase using the SVM. In the classification phase, when a new data point is mapped into the original vector space, the average distances between the new data point and the support vectors of the different categories are measured using the Euclidean distance function. The classification decision is based on the category whose support vectors have the lowest average distance to the new data point, which makes the decision independent of the efficacy of the hyper-plane formed by a particular kernel function and soft margin parameter. We tested the proposed framework on several text datasets. The experimental results show that with this approach the accuracy of the Euclidean-SVM text classifier depends only weakly on the choice of kernel function and soft margin parameter C.
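The classification phase described in item 6 can be illustrated with a minimal sketch. It assumes the per-category support vectors have already been identified by SVM training; the function names, the `support_vectors` dictionary, and the toy two-dimensional vectors are hypothetical and do not come from the paper.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify_by_support_vectors(point, support_vectors):
    """Return the category whose support vectors are, on average, closest.

    support_vectors maps a category label to a list of vectors
    (assumed to come from a previously trained SVM).
    """
    best_label, best_avg = None, float("inf")
    for label, vectors in support_vectors.items():
        avg = sum(euclidean(point, v) for v in vectors) / len(vectors)
        if avg < best_avg:
            best_label, best_avg = label, avg
    return best_label


# Hypothetical support vectors for two text categories.
support_vectors = {
    "sports": [(0.0, 0.0), (1.0, 0.0)],
    "politics": [(4.0, 4.0), (5.0, 4.0)],
}
print(classify_by_support_vectors((0.5, 0.5), support_vectors))
```

Because the decision uses only distances in the original vector space, the kernel and C influence the result only through which training points end up as support vectors.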

7.
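SVM-based methods cannot detect shilling attacks effectively when labeled user data are insufficient. To address this, a semi-supervised shilling attack detection method based on SVM-KNN is proposed. An initial SVM classifier is trained on a small amount of labeled user data and used to classify a large amount of unlabeled user data; sample points near the classification boundary that may become support vectors are picked out; a KNN classifier is used to improve the label quality of these boundary vectors; and the relabeled boundary vectors are merged into the training set. Iterative training gradually improves the SVM classification boundary and finally yields the system decision function. Experiments show that, with little labeled user data, the method effectively improves the accuracy and efficiency of shilling attack detection and generalizes well.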

8.
In pattern classification problems, one trains a classifier on a training dataset to recognize future unseen samples. In practice, one should not expect the trained classifier to correctly recognize samples dissimilar to the training dataset. Therefore, measuring the generalization capability of a classifier on such dissimilar unseen samples may not help in improving the classifier's accuracy. The localized generalization error model was proposed to give an upper bound on the generalization mean square error for only those unseen samples that are similar to the training dataset. This error model is derived from the stochastic sensitivity measure (ST-SM) of the classifier. In this paper we present the ST-SMs for various Gaussian-based classifiers: radial basis function neural networks and support vector machines. At the end of this work, we compare the decision boundary visualization obtained from the training samples with the largest sensitivity measures against the one obtained from the support vectors in the input space.

9.
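Support vector machines offer good classification accuracy, stability, and generalization and have seen initial use in network traffic classification, but for large-scale traffic classification they suffer from high computational complexity and slow classifier training. To address this, a fast SVM method based on bit compression is proposed. A bit compression algorithm aggregates and compresses the initial training set into a new sample set carrying weight information, shrinking the set while losing as little of the original sample information as possible; a weight-based SVM algorithm is then used to train the traffic classifier. Comparative experiments on large-scale traffic sample sets show that the fast SVM method greatly reduces both classifier training time and prediction time for unknown samples at a small cost in accuracy; moreover, provided compression is not excessive, its accuracy is better than that of random-sampling SVM at the same compression ratio. The method retains the good stability and generalization of SVM while effectively improving its ability to handle large-scale traffic classification.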
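The abstract of item 9 does not spell out its bit compression step, so the following is only one plausible reading, offered as a sketch. It assumes non-negative integer features, drops the lowest `bits` bits of each feature, merges samples that share a compressed key and a label, and uses the bin size as the weight. The function name `bit_compress` and the choice of the bin centroid as the representative are assumptions, not the paper's stated method.

```python
from collections import defaultdict


def bit_compress(samples, labels, bits):
    """Aggregate samples whose features agree after dropping `bits` low bits.

    Returns a list of (representative_vector, label, weight) triples, where
    the representative is the mean of the merged samples (an assumption) and
    the weight is how many original samples fell into the bin.
    """
    bins = defaultdict(list)
    for x, y in zip(samples, labels):
        key = (tuple(v >> bits for v in x), y)  # coarse key plus class label
        bins[key].append(x)

    compressed = []
    for (_, y), members in bins.items():
        n = len(members)
        centroid = tuple(sum(col) / n for col in zip(*members))
        compressed.append((centroid, y, n))
    return compressed


# Hypothetical integer-valued flow features.
samples = [(8, 9), (9, 10), (40, 41), (9, 9)]
labels = [0, 0, 1, 1]
for rep, label, weight in bit_compress(samples, labels, bits=2):
    print(rep, label, weight)
```

The resulting triples could then be passed to any SVM trainer that accepts per-sample weights; a larger `bits` value gives a smaller set at the cost of more information loss.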

10.
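Research on the SVM-KNN Classification Algorithm   Total citations: 1, self-citations: 0, citations by others: 1
The SVM-KNN classification algorithm is a new classification method that combines support vector machine (SVM) classification with nearest neighbor (NN) classification. To address problems with traditional SVM classifiers, the algorithm trains on the dataset with the sequential minimal optimization (SMO) training algorithm for SVMs, and passes samples whose distance difference is below a given threshold to a K-nearest-neighbor classifier that uses all support vectors of each class as representative points. Experiments on UCI datasets show that the classifier is more accurate than an SVM classifier alone, is to some extent insensitive to the choice of kernel parameters, and is fairly robust.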
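The switching rule in item 10 can be sketched as follows. This sketch makes several assumptions that the abstract does not confirm: the "distance difference" is taken to be the absolute value of the SVM decision value, the decision function is linear with hypothetical parameters `w` and `b`, labels are +1 and -1, and the support vectors are supplied by hand rather than produced by SMO training.

```python
import math
from collections import Counter


def decision_value(x, w, b):
    """Linear SVM decision value g(x) = w.x + b (linear form is an assumption)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def knn_on_support_vectors(x, support_vectors, k):
    """KNN vote using the support vectors as the only reference points."""
    nearest = sorted(support_vectors, key=lambda sv: math.dist(x, sv[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def svm_knn_predict(x, w, b, support_vectors, threshold, k=1):
    """Use the SVM far from the boundary and KNN on support vectors near it."""
    g = decision_value(x, w, b)
    if abs(g) >= threshold:
        return 1 if g > 0 else -1
    return knn_on_support_vectors(x, support_vectors, k)


# Hypothetical model: boundary x1 = 0, three support vectors as (vector, label).
w, b = (1.0, 0.0), 0.0
svs = [((1.0, 1.0), 1), ((1.0, -1.0), 1), ((-1.0, 0.0), -1)]
print(svm_knn_predict((2.0, 0.0), w, b, svs, threshold=0.5))  # far from boundary
print(svm_knn_predict((0.2, 0.0), w, b, svs, threshold=0.5))  # near boundary
```

In the second call the point lies inside the threshold band, so the label comes from the nearest support vector rather than from the sign of the decision value.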

11.
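Hyperspectral remote sensing images have high dimensionality and few samples, which lowers classification accuracy. To address this, an ensemble classification algorithm for hyperspectral images based on DS clustering (DSCEA) is proposed. First, in view of the characteristics of hyperspectral data, a number of bands are randomly selected from the full set of bands to form different training samples. Second, the spatial-spectral information of the image is analyzed, an undirected weighted graph is constructed, and dominant set (DS) clustering is used to obtain the band subsets with the greatest feature difference. Finally, support vector machines are trained on the different samples to obtain diverse individual classifiers, which are combined into the final classifier by majority voting to classify the hyperspectral image. DSCEA reaches a classification accuracy of up to 84.61% on the Indian Pines dataset and up to 91.89% on the Pavia University dataset; the experiments show that DSCEA can effectively solve the hyperspectral classification problem.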
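Two of the steps in item 11, random band selection and majority voting, are simple enough to sketch. The DS clustering step and the SVM training are not reproduced here; the per-classifier predictions are assumed to be given, and the function names and parameters are hypothetical.

```python
import random
from collections import Counter


def random_band_subsets(n_bands, subset_size, n_subsets, seed=0):
    """Draw n_subsets random band index subsets, each without repetition."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(n_bands), subset_size))
            for _ in range(n_subsets)]


def majority_vote(predictions_per_classifier):
    """Fuse per-pixel labels from several classifiers by majority vote.

    predictions_per_classifier is a list of equal-length label lists, one
    list per classifier. Ties go to the label that was seen first.
    """
    fused = []
    for votes in zip(*predictions_per_classifier):
        fused.append(Counter(votes).most_common(1)[0][0])
    return fused


# Hypothetical outputs of three band-subset classifiers for three pixels.
predictions = [
    ["corn", "grass", "woods"],
    ["corn", "grass", "grass"],
    ["grass", "grass", "woods"],
]
print(random_band_subsets(n_bands=200, subset_size=5, n_subsets=3))
print(majority_vote(predictions))
```

Each band subset would feed one base classifier, and `majority_vote` then combines their per-pixel outputs into the final map.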

12.
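Traffic Classification Method Based on Support Vector Machines*   Total citations: 2, self-citations: 0, citations by others: 2
林森  徐鹏  刘琼 《计算机应用研究》 (Application Research of Computers), 2008, 25(8): 2488-2490
Existing traffic classification methods suffer from low accuracy, limited applicability, and high computational complexity. To address these problems, the support vector machine method is proposed for traffic classification. A publicly available, manually labeled dataset serves as the training and test sets, and an SVM traffic classifier is built by supervised learning. Experiments further analyze how training set size, kernel function, and penalty factor affect SVM classification performance. The results show that the SVM classifier achieves a flow classification accuracy above 98%.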

13.
This study evaluates the potential of object-based image analysis in combination with supervised machine learning to identify urban structure type patterns from Landsat Thematic Mapper (TM) images. The main aim is to assess the influence of several critical choices commonly made during the training stage of a learning machine on the classification performance and to give recommendations for classifier-dependent intelligent training. Particular emphasis is given to assessing the influence of the size and class distribution of the training data, the approach to training data sampling (user-guided or random), and the type of training samples (squares or segments) on the classification performance of a Support Vector Machine (SVM). Different feature selection algorithms are compared, and segmentation and classifier parameters are dynamically tuned for the specific image scene, classification task, and training data. The performance of the classifier is measured against a set of reference data sets from manual image interpretation and is furthermore compared, on the basis of landscape metrics, to a very high resolution reference classification derived from light detection and ranging (lidar) measurements. The study highlights the importance of a careful design of the training stage and of dynamically tuned classifier parameters, especially when dealing with noisy data and small training data sets. For the given experimental set-up, the study concludes that, given an optimized feature space and classifier parameters, training an SVM with segment-shaped samples that were sampled in a guided manner and are balanced between the classes provided the best classification results. If square-shaped samples are used, random sampling provided better results than guided selection. Equally balanced sample distributions outperformed unbalanced training sets.

14.
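Imbalanced datasets are often encountered in machine learning classification problems. To classify them more effectively, an oversampling classification algorithm based on quotient space theory, QMSVM, is proposed. The majority-class samples in the training set are partitioned by clustering structure, and the partition results are merged with the minority-class samples for linear support vector machine (SVM) learning, which yields the support vectors and misclassified sample granules of the majority class. In addition, the support vectors and misclassified samples of the minority class are obtained and oversampled with SMOTE. Finally, the two sets of samples obtained above are merged for SVM learning. This rebalances the training dataset and gives a more reasonable separating hyperplane. Experiments show that, compared with several other algorithms, the proposed algorithm has a somewhat lower correct classification rate but substantially better g_means and acc+ values, and it works better on datasets with a larger imbalance ratio.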
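The abstract of item 14 reports g_means and acc+ without defining them. The sketch below assumes the definitions commonly used in the imbalanced-learning literature: acc+ is the accuracy on the minority (positive) class, acc- is the accuracy on the majority (negative) class, and g_means is the geometric mean of the two. The confusion-matrix counts are made up for illustration.

```python
import math


def imbalance_metrics(tp, fn, tn, fp):
    """Compute acc+, acc-, g_means and overall accuracy from confusion counts.

    tp/fn count minority (positive) samples, tn/fp count majority samples.
    """
    acc_plus = tp / (tp + fn)    # minority-class accuracy (recall)
    acc_minus = tn / (tn + fp)   # majority-class accuracy (specificity)
    return {
        "acc+": acc_plus,
        "acc-": acc_minus,
        "g_means": math.sqrt(acc_plus * acc_minus),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Hypothetical results on 10 minority and 100 majority samples.
always_majority = imbalance_metrics(tp=0, fn=10, tn=100, fp=0)
rebalanced = imbalance_metrics(tp=8, fn=2, tn=90, fp=10)
print(always_majority)
print(rebalanced)
```

With these made-up counts, the classifier that always predicts the majority class has the higher overall accuracy but a g_means of zero, which is the kind of trade-off the abstract describes.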

15.
The Internet has been flooded with spam emails, and during the last decade there has been an increasing demand for reliable anti-spam email filters. The problem of filtering emails can be considered a classification problem in the field of supervised learning. Theoretically, many mature technologies, for example support vector machines (SVM), can be used to solve this problem. However, in real enterprise applications, the training data are typically collected via honeypots and are therefore very large in volume and highly biased towards spam emails. This challenges both the efficiency and the effectiveness of conventional technologies. In this article, we propose an undersampling method to compress and balance the training set used for the conventional SVM classifier with minimal information loss. The key observation is that we can make a trade-off between training set size and information loss by carefully defining a similarity measure between data samples. Our experiments show that the SVM classifier performs better when our compressing and balancing approach is applied.

16.
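Improving the Face Detection Accuracy of the Adaboost Algorithm with SVM   Total citations: 1, self-citations: 0, citations by others: 1
An SVM classification method is proposed to improve the face detection accuracy of the Adaboost algorithm. The method first uses Adaboost to find candidate face regions in an image and trains a support vector machine (SVM) classifier on the face and non-face samples in the training set; the SVM classifier then makes the final decision on which candidate regions are faces. Experiments show that SVM classification improves detection accuracy and gives the detection algorithm better results.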

17.
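Research on an SVM-Based Classification Algorithm for High-Dimensional Multispectral Images and Its Properties   Total citations: 4, self-citations: 0, citations by others: 4
夏建涛  何明一 《计算机工程》 (Computer Engineering), 2003, 29(13): 27-28, 89
Traditional pattern classification algorithms face difficulties in processing high-dimensional multispectral images. This paper applies the support vector machine (SVM) to high-dimensional multispectral image classification, which effectively mitigates the Hughes phenomenon and achieves better classification accuracy than traditional methods. The relationship between SVM classification performance and both the number of training samples and the data dimensionality is studied. Experiments show that, compared with traditional pattern classification methods, SVM offers high classification accuracy and strong generalization, and its advantage is most pronounced when training samples are few and the data dimensionality is high.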

18.
Monitoring tool wear status is essential for guaranteeing workpiece quality and improving manufacturing efficiency. In some cases, a classifier trained on a small number of samples is preferred because the tool wear process is complex and sample collection is time-consuming. In this paper, a tool wear monitoring system based on a relevance vector machine (RVM) classifier is constructed to perform multi-category classification of tool wear status during milling. As a Bayesian alternative to the support vector machine (SVM), the RVM generalizes better with small training sets. Moreover, an RVM classifier yields fewer relevance vectors (RVs) than an SVM classifier yields support vectors, so classification runs much faster than with the SVM. To show the advantages of the RVM classifier, a milling experiment on titanium alloy was carried out, and multi-category classification of tool wear status was performed with both SVM and RVM classifiers under different numbers of training and test samples. The comparison shows that the RVM gives more accurate results across the different small training set sizes and also classifies faster than the SVM. The method sheds new light on tool condition monitoring in industrial environments.

19.
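Support vector machines are introduced, the state of research on incremental SVM learning algorithms is reviewed, and the conversion between support vectors and non-support vectors after new samples are added to the support vector set is analyzed. Because existing discarding mechanisms are inefficient, an improved discarding algorithm for incremental SVM learning, the two-pass discarding algorithm, is proposed. Through two effective rounds of discarding, the algorithm drops samples that are useless for classification, so that new incremental training runs on the reduced, effective dataset rather than on the entire training set, which is complex and hard to handle; this markedly reduces subsequent training time. Theoretical analysis and experiments show that the algorithm effectively increases training speed while maintaining classification accuracy.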

20.
An adaptive feature fusion framework is proposed for multi-class classification based on SVM. In a manner similar to one-versus-all (OVA), one of the multi-class SVM schemes, the proposed approach decomposes a multi-class classification problem into several binary classifications. The main difference is that each classifier is built with the feature vectors best suited to discriminating one class from all the other classes. The feature vectors of unknown samples are selected adaptively by each classifier, and recognition is carried out accordingly. In addition, novel evaluation criteria are defined to deal with the frequent small-sample problem. A writer recognition experiment is carried out to implement this framework with three kinds of feature vectors: texture, structure, and morphological features. Finally, the performance of the proposed approach is compared with that of OVA using the same feature vectors for all classes.
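The decomposition in item 20 can be sketched as an OVA decision in which each binary classifier reads only its own feature subset. The score functions below are hand-written stand-ins for trained binary SVM decision functions, and the class names, feature assignments, and numbers are hypothetical. Taking the class with the largest score is the usual OVA rule and is assumed here; the paper's own evaluation criteria are not reproduced.

```python
def ova_predict(sample, classifiers):
    """One-versus-all prediction with a per-class feature selection.

    sample maps feature names to values. classifiers maps a class label to
    (feature_names, score_fn), where score_fn takes the selected feature
    values and returns a "this class versus the rest" score.
    """
    best_label, best_score = None, float("-inf")
    for label, (feature_names, score_fn) in classifiers.items():
        features = [sample[name] for name in feature_names]
        score = score_fn(features)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical per-writer classifiers, each using different feature types.
classifiers = {
    "writer_A": (["texture"], lambda f: f[0] - 0.5),
    "writer_B": (["structure", "morphology"], lambda f: f[0] + f[1] - 1.0),
}
sample = {"texture": 0.9, "structure": 0.2, "morphology": 0.4}
print(ova_predict(sample, classifiers))
```

Plain OVA corresponds to giving every entry in `classifiers` the same feature list, which is the baseline the abstract compares against.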
