10 similar documents found (search time: 78 ms)
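An improved SVM (support vector machine) active learning method is proposed. Over multiple iterations, it presents the user with the most informative samples and adds them to the training set, which greatly reduces the cost of manually labeling samples. To evaluate classifier performance, the experiments classified 801 music samples covering five genres (dance, lyrical, jazz, folk, and rock). The effectiveness of the proposed SVM active learning method was verified in two respects: the convergence speed of classification accuracy, and the number of samples that must be labeled to reach the same accuracy.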
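The query step of such an active learning loop can be sketched as follows. The abstract does not say how "most informative" is measured, so this sketch assumes the common margin criterion (smallest absolute SVM decision value); the function name `select_most_uncertain` and the sample scores are hypothetical.

```python
from typing import List, Sequence


def select_most_uncertain(decision_values: Sequence[float], k: int) -> List[int]:
    """Return indices of the k unlabeled samples closest to the SVM hyperplane."""
    # Assumption: "most informative" is approximated by the smallest |f(x)|.
    ranked = sorted(range(len(decision_values)), key=lambda i: abs(decision_values[i]))
    return ranked[:k]


# Hypothetical decision values f(x) for six unlabeled music clips.
scores = [1.8, -0.1, 0.6, -2.3, 0.05, -0.9]
query = select_most_uncertain(scores, k=2)
print(query)  # indices to hand to the annotator in this round
```

In a full loop, the returned samples would be labeled by the user, appended to the training set, and the SVM retrained before the next round.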

Support Vector Machines (SVM) have been developed for Chinese official document classification under the One-against-All (OAA) multi-class scheme. Several data retrieval techniques, including sentence segmentation, term weighting, and feature extraction, are used in preprocessing. We observe that most documents whose contents are indistinguishable yield poor classification results. The traditional solution is to add misclassified documents to the training set in order to adjust the classification rules. In this paper, indistinguishable documents are observed to be informative for strengthening prediction performance, since their labels are predicted by the current model with low confidence. A general approach is proposed that uses SVM decision values to identify indistinguishable documents. Based on verified classification results and the distinguishability of documents, four learning strategies that select certain documents for the training sets are proposed to improve classification performance. Experiments show that indistinguishable documents can be identified with high probability and are informative for the learning strategies. Furthermore, LMID, which adds both misclassified and indistinguishable documents to the training sets, is the most effective learning strategy for SVM classification of a large set of Chinese official documents in terms of computing efficiency and classification accuracy.
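A minimal sketch of how decision values might flag indistinguishable documents and feed an LMID-style selection. The abstract does not give the actual criterion, so the "gap between the two highest OAA scores" rule, the threshold of 0.5, the class names, and all function names here are assumptions for illustration.

```python
from typing import Dict, List


def is_indistinguishable(decision_values: Dict[str, float], gap_threshold: float) -> bool:
    """Flag a document whose two highest one-against-all scores are nearly tied."""
    top, runner_up = sorted(decision_values.values(), reverse=True)[:2]
    return (top - runner_up) < gap_threshold


def select_for_retraining(docs: List[dict], gap_threshold: float = 0.5) -> List[str]:
    """Pick documents that are misclassified or indistinguishable (LMID-style union)."""
    chosen = []
    for doc in docs:
        # The OAA prediction is the class with the largest decision value.
        predicted = max(doc["scores"], key=doc["scores"].get)
        misclassified = predicted != doc["label"]
        if misclassified or is_indistinguishable(doc["scores"], gap_threshold):
            chosen.append(doc["id"])
    return chosen


# Hypothetical verified documents with per-class decision values.
docs = [
    {"id": "d1", "label": "notice", "scores": {"notice": 1.4, "report": -0.9, "letter": -1.2}},
    {"id": "d2", "label": "report", "scores": {"notice": 0.2, "report": 0.3, "letter": -1.0}},
    {"id": "d3", "label": "letter", "scores": {"notice": 0.8, "report": -0.5, "letter": -0.7}},
]
selected = select_for_retraining(docs)
print(selected)
```

Here `d2` is correctly classified but nearly tied between two classes, while `d3` is misclassified outright; both would be added to the training set under this reading of LMID.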

Consider a supervised learning problem in which examples contain both numerical- and text-valued features. To use traditional feature-vector-based learning methods, one could treat the presence or absence of a word as a Boolean feature and use these binary-valued features together with the numerical features. However, using a text-classification system on such data is more problematic: in the most straightforward approach, each number would be considered a distinct token and treated as a word. This paper presents an alternative approach for applying text classification methods to supervised learning problems with numerical-valued features, in which the numerical features are converted into bag-of-words features, thereby making them directly usable by text classification methods. We show that even on purely numerical-valued data, text classification on the derived text-like representation outperforms the more naive numbers-as-tokens representation and, more importantly, is competitive with mature numerical classification methods such as C4.5, Ripper, and SVM. We further show that on mixed-mode data, adding numerical features using our approach can improve performance over not adding those features.
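One way to picture the conversion of a numerical feature into bag-of-words features is sketched below. The abstract does not describe how the paper chooses split points or names its tokens, so the threshold list, the `le`/`gt` token scheme, and the function name are assumptions.

```python
from typing import List, Sequence


def numeric_to_tokens(name: str, value: float, thresholds: Sequence[float]) -> List[str]:
    """Emit one word-like token per threshold, encoding which side the value falls on."""
    tokens = []
    for t in thresholds:
        side = "le" if value <= t else "gt"
        tokens.append(f"{name}_{side}_{t}")
    return tokens


# Hypothetical split points for an "age" feature.
thresholds = [30.0, 50.0]
tokens = numeric_to_tokens("age", 42.0, thresholds)
print(tokens)  # ['age_gt_30.0', 'age_le_50.0']
```

Unlike the numbers-as-tokens representation, nearby values (say 42.0 and 43.0) share all of their tokens here, so a text classifier can generalize across them.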

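Research on Remote Sensing Image Classification with a Support Vector Machine Classifier   Cited by: 1 (self-citations: 0, citations by others: 1)
The choice of kernel function and the parameter settings of an SVM classifier directly affect the system's generalization ability and running speed. Cross-validation and grid search are introduced, and the performance of the radial basis function (RBF), polynomial, and sigmoid kernels in multi-class image classification is derived theoretically, tested, and analyzed. The performance of the three kernels in the SVM classifier is obtained, and the effectiveness of grid search for finding optimal parameters is demonstrated. Finally, a classification comparison on 6-band TM remote sensing images in BSQ format demonstrates the feasibility and efficiency of SVM kernel functions for TM image classification.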
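The three kernels and a parameter grid can be written out as a short sketch. The kernel forms below follow the parameterization commonly used by SVM libraries (with `gamma`, `coef0`, and `degree`); the abstract does not state the exact forms or grid ranges the paper used, so these and the power-of-two grid are assumptions. Cross-validated scoring of each grid point is left to the caller.

```python
import math
from itertools import product
from typing import List, Sequence, Tuple


def dot(x: Sequence[float], y: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float) -> float:
    """K(x, y) = exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def poly_kernel(x, y, gamma: float, coef0: float, degree: int) -> float:
    """K(x, y) = (gamma * <x, y> + coef0) ** degree."""
    return (gamma * dot(x, y) + coef0) ** degree


def sigmoid_kernel(x, y, gamma: float, coef0: float) -> float:
    """K(x, y) = tanh(gamma * <x, y> + coef0)."""
    return math.tanh(gamma * dot(x, y) + coef0)


def parameter_grid(c_exponents: Sequence[int], gamma_exponents: Sequence[int]) -> List[Tuple[float, float]]:
    """Enumerate (C, gamma) pairs on a power-of-two grid for grid search."""
    return [(2.0 ** c, 2.0 ** g) for c, g in product(c_exponents, gamma_exponents)]


grid = parameter_grid(range(-1, 2), range(-2, 0))
print(len(grid), grid[0])
```

Each `(C, gamma)` pair in `grid` would be scored by k-fold cross-validation, and the pair with the best mean validation accuracy kept.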

This paper proposes a novel SVM-based classification method for binary classification of homogeneous data. The method predicts the binary labeling of a sequence of observation samples in the test set by the following procedure: we first make different assumptions about the class labeling of the sequence, then use SVM to obtain a classification error for each of the two assumptions, and finally determine the binary labeling by comparing the two classification errors. The proposed method leverages the homogeneity within each class and exploits the differences between classes, and hence achieves effective classification of homogeneous data. Experimental results demonstrate the effectiveness of the proposed method.

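An SVM-Based P2P Network Traffic Classification Method   Cited by: 10 (self-citations: 1, citations by others: 9)
An SVM-based method for classifying P2P network traffic is proposed. The method uses statistical features of network traffic together with SVM, which is grounded in statistical learning theory, to classify P2P traffic from different application types. The study focuses on four kinds of P2P traffic: BitTorrent (file sharing), PPLive (streaming media), Skype (Internet telephony), and MSN (instant messaging). The overall framework of SVM-based P2P traffic classification is introduced, the acquisition and processing of traffic samples are described, and the construction of the classifier and the experimental results are presented. The experimental results verify the effectiveness of the proposed method, with an average classification precision of 92.38%.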

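When classifying imbalanced data, the traditional support vector machine (SVM) has a low recognition rate for minority-class samples. Inspired by the ability of SVM+ to exploit hidden information among samples, a multi-task learning imbalanced SVM+ algorithm (MTL-IC-SVM+) is proposed. Based on SVM+, MTL-IC-SVM+ formulates the classification of imbalanced data as a multi-task learning problem. To correct the shift of the separating hyperplane, it assigns different misclassification penalty factors to majority-class and minority-class samples, and requires the distance from minority-class samples to the separating hyperplane to be greater than that of majority-class samples. Experimental results on UCI datasets show that MTL-IC-SVM+ achieves high classification accuracy on imbalanced data classification problems.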
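The idea of class-specific misclassification penalties can be illustrated with a small sketch. It is not MTL-IC-SVM+ itself: it covers only the penalty-factor part, omits the multi-task SVM+ formulation and the asymmetric margins, and assumes penalties inversely proportional to class frequency, which the abstract does not specify. All names are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def class_penalties(labels: Sequence[int], base_c: float = 1.0) -> Dict[int, float]:
    """Assign each class a penalty factor inversely proportional to its frequency."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: base_c * n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def weighted_hinge_loss(labels: Sequence[int], decision_values: Sequence[float],
                        penalties: Dict[int, float]) -> float:
    """Sum of hinge losses, each scaled by the penalty factor of the sample's class."""
    return sum(penalties[y] * max(0.0, 1.0 - y * f)
               for y, f in zip(labels, decision_values))


# Hypothetical imbalanced labels: 2 minority (+1) and 8 majority (-1) samples.
labels = [1, 1] + [-1] * 8
penalties = class_penalties(labels)
print(penalties)
```

With these factors, a margin violation on a minority-class sample costs four times as much as the same violation on a majority-class sample, which pushes the separating hyperplane away from the minority class.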

Examining past near-miss reports can provide information on how to mitigate and control hazards that materialise on construction sites. Yet analysing near-miss reports can be time-consuming and labour-intensive. Automatic text classification using machine learning and ontology-based approaches can be used to mine reports of this nature, but such approaches tend to suffer from weak generalisation, which can adversely affect classification performance. To address this limitation and improve classification accuracy, we develop an improved deep learning-based approach that automatically classifies near-miss information contained within safety reports using Bidirectional Encoder Representations from Transformers (BERT). Our proposed approach is designed to pre-train deep bi-directional representations by jointly extracting context features in all layers. We validate the effectiveness and feasibility of our approach using a database of near-miss reports derived from actual construction projects, which was used to train and test our model. The results demonstrate that our approach can accurately classify 'near misses' and outperforms prevailing state-of-the-art automatic text classification approaches. Understanding the nature of near misses can help site managers identify work areas and instances where an accident is likely to occur.

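Application of SVM to the Classification of Gene Microarray Cancer Data   Cited by: 1 (self-citations: 0, citations by others: 1)
Building on a review of applications of binary support vector machines, a new method is proposed for analyzing gene microarray cancer data: features are selected with the t-test and the Wilcoxon test, and a support vector machine (SVM) serves as the classifier. Classification experiments on a leukemia dataset and a colon cancer dataset show that the proposed method not only achieves a high recognition rate but also requires only a small selected feature subset and classifies quickly, improving both classification accuracy and speed.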
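The t-test feature selection step can be sketched as ranking genes by the absolute value of a two-sample t statistic. The abstract does not say which t-test variant was used, so the Welch (unequal-variance) form is an assumption here, as are the gene names and expression values; the Wilcoxon-based selection is not shown.

```python
import math
from statistics import mean, variance
from typing import Dict, List, Sequence


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample t statistic with unequal variances (Welch form)."""
    standard_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


def rank_genes(expr_pos: Dict[str, List[float]], expr_neg: Dict[str, List[float]],
               top_k: int) -> List[str]:
    """Return the top_k genes with the largest |t| between the two classes."""
    scores = {gene: abs(welch_t(expr_pos[gene], expr_neg[gene])) for gene in expr_pos}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Hypothetical expression levels per gene for the two classes.
expr_pos = {"gene_a": [5.0, 5.2, 4.8], "gene_b": [2.0, 3.0, 4.0], "gene_c": [1.0, 2.0, 3.0]}
expr_neg = {"gene_a": [1.0, 1.2, 0.8], "gene_b": [2.5, 3.5, 3.0], "gene_c": [2.0, 3.0, 4.0]}
selected_genes = rank_genes(expr_pos, expr_neg, top_k=2)
print(selected_genes)
```

Only the selected genes would then be passed to the SVM, which is what keeps the feature subset small and classification fast.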

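Building a traditional classifier requires both positive and negative samples. In remote sensing image classification, a common situation is that only one land-cover type is of interest. Because labeling samples is time-consuming and laborious, whereas unlabeled samples are often easy to obtain and contain useful information, a remote sensing image classification method based on positive and unlabeled samples (PUL) is proposed. First, reliable positive and negative samples are screened from the unlabeled set using the intrinsic features of the positive samples together with support vector data description (SVDD), and are then removed from the unlabeled set. Next, these samples are used to train an SVM; a threshold is set according to how the unlabeled set behaves under the classifier, and relatively reliable positive and negative samples are again screened from the unlabeled set. The last step is a weighted SVM: the initial positive samples and the extracted reliable positive and negative samples receive a weight of 1, while samples selected through SVM training receive weights in the range 0 to 1. To verify the effectiveness of PUL, classification experiments were carried out on remote sensing images, with comparisons against one-class SVM (OC-SVM), Gaussian data description (GDD), SVDD, biased SVM, and multi-class SVM. The results show that PUL improves classification performance and outperforms the above one-class methods and the multi-class SVM.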
