首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 140 毫秒
1.
张秋余  竭洋  李凯 《计算机应用》2008,28(12):3227-3230
针对模糊支持向量机在文本分类应用中的隶属度函数确定问题,提出了一种基于模糊支持向量机与决策树的文本分类器的构建方法。该方法不仅考虑了样本与类中心之间的关系,还根据传统支持向量机中包含支持向量且平行于分类面的平面构建切球,来确定类中各个样本之间的关系,由样本点与球的位置关系计算其隶属度,可以合理地区分有效样本和噪音、孤立点样本。并与决策树方法相结合,实现多类分类。实验结果表明,该方法具有良好的分类效果。  相似文献   

2.
对支持向量机的多类分类问题进行研究,提出了一种基于核聚类的多类分类方法。利用核聚类方法将原始样本特征映射到高维特征进行聚类分组,对每一组使用一个支持向量机二值分类器进行分类,并用这些二值分类器组成决策树的节点,构成了一个决策分类树。给出决策树的生成算法,提出了利用交叠系数来控制交叠,从而克服错分积累,提高分类准确率。实验结果表明,采用该方法,手写体汉字识别速度和正确率都达到了实用的要求。  相似文献   

3.
针对传统对支持向量机多类分类算法(Multi-TWSVM)中出现的模糊性问题,提出了一种基于遗传算法的决策树对支持向量机(GA-DTTSVM)多类分类算法。GA-DTTSVM用遗传算法对特征数据建立决策树,通过构建决策树可以分离样本的模糊区域,提高模糊区域样本的识别率。在决策树的每个节点上用对支持向量机(TWSVM)训练分类器,最后用训练的分类器进行分类和预测。实验结果表明,与决策树对支持向量机(DTTSVM)多类分类算法以及Multi-TWSVM相比,GA-DTTSVM多类分类算法具有较高的分类精度和较快的训练速度。  相似文献   

4.
针对现有支持向量机多类分类算法在分类精度上的不足,提出一种改进的支持向量机决策树多类分类算法。为了最大限度地减少误差积累的影响,该算法利用投影向量的思想作为衡量类分离性的标准,由此构建非平衡决策树,并且在决策树节点处对正负样本选取不同的惩罚因子来处理不平衡数据集的影响,最后引入KNN算法与SVM共同识别数据集。通过在手写体数字识别数据集上的仿真实验,分析比较各种方法,表明该方法能有效提高分类精度。  相似文献   

5.
为了在标记样本数目有限时尽可能地提高支持向量机的分类精度,提出了一种基于聚类核的半监督支持向量机分类方法。该算法依据聚类假设,即属于同一类的样本点在聚类中被分为同一类的可能性较大的原则去对核函数进行构造。采用K-均值聚类算法对已有的标记样本和所有的无标记样本进行多次聚类,根据最终的聚类结果去构造聚类核函数,从而更好地反映样本间的相似程度,然后将其用于支持向量机的训练和分类。理论分析和计算机仿真结果表明,该方法充分利用了无标记样本信息,提高了支持向量机的分类精度。  相似文献   

6.
一种半监督支持向量机优化方法   总被引:1,自引:1,他引:0  
针对半监督支持向量机在采用间隔最大化思想对有标签样本和无标签样本进行分类时面临的非凸优化问题,提出了一种采用分布估计算法进行半监督支持向量机优化的方法EDA_S3VM。该方法把无标签样本的标签作为需要优化的参数,从而得到一个在标准支持向量机上的组合优化问题,利用分布估计算法通过概率模型的学习和采样来对问题进行求解。在人工数据集和公共数据集上的实验结果表明,EDA_S3VM与其它一些半监督支持向量机算法相比有更高的分类准确率。  相似文献   

7.
支持向量机作为一种新的机器学习方法,由于其建立在结构风险最小化准则之上,而不是仅仅使经验风险达到最小,从而使对支持向量分类器具有较好的推广能力。本文分析了支持向量机在解决无监督分类问题上的不足,提出一种基于支持向量机思想的最大间距的聚类新方法。实验结果表明.该算法能成功地解决很多非监督分类问题。  相似文献   

8.
提出了一种将无监督聚类和支持向量机相结合的新的入侵检测方法。算法具有无监督聚类速度快和支持向量机精度高的优点,其基本思想是通过将网络数据包和聚类中心的比较确定是否需要进一步的采用支持向量机进行分类,从而减少了通过支持向量机的数据量,达到速度与精度的统一。实验采用KDD99的测试数据,结果表明,该方法能够有效的检测网络数据中的已知和未知入侵行为。  相似文献   

9.
针对标签均值半监督支持向量机在图像分类中随机选取无标记样本会导致分类正确率不高,以及算法的稳定性较低的问题,提出了基于聚类标签均值的半监督支持向量机算法。该算法修改了原算法对于无标记样本的惩罚项,对选取的无标记样本聚类,使用聚类标签均值替换标签均值。实验结果表明,使用聚类标签均值训练的分类器大大减少了背景与目标的错分情况,提高了分类的正确率以及算法的稳定性,适合用于图像分类。  相似文献   

10.
提出一种基于超椭球支持向量机的多类文本分类算法。对每一类样本,利用超椭球支持向量机方法在特征空间求得一个超椭球,使其包含该类尽可能多的样本,同时将噪音点排除在外。分类时,利用待分类样本映射到每个超椭球球心的马氏距离确定其类别。在标准数据集Reuters 21578上的实验结果表明,该算法有效地提高了分类精度。  相似文献   

11.
We propose a method for hierarchical clustering based on the decision tree approach. As in the case of supervised decision tree, the unsupervised decision tree is interpretable in terms of rules, i.e., each leaf node represents a cluster, and the path from the root node to a leaf node represents a rule. The branching decision at each node of the tree is made based on the clustering tendency of the data available at the node. We present four different measures for selecting the most appropriate attribute to be used for splitting the data at every branching node (or decision node), and two different algorithms for splitting the data at each decision node. We provide a theoretical basis for the approach and demonstrate the capability of the unsupervised decision tree for segmenting various data sets. We also compare the performance of the unsupervised decision tree with that of the supervised one.  相似文献   

12.
一种SOM和GRNN结合的模式全自动分类新方法   总被引:1,自引:0,他引:1  
非监督学习算法的分类精度通常很难令人满意,而监督的学习算法需要人工选取训练样本,这有时很难得到,并且其分类精度直接依赖于所选取的学习样本。针对这些缺陷,提出一种非监督自组织神经网络(SOMNN)和监督的广义回归网络(GRNN)结合的全自动模式分类新方法。新方法首先通过SOMNN将原始数据进行自动聚类,再用所得的聚类中心以及中心邻近数据点训练GRNN,然后根据GRNN的分类结果重新计算聚类中心,再根据新的聚类中心和中心邻近点训练GRNN,如此反复,直至得到稳定的中心为止。Iris数据,Wine数据的实验结果都验证了新方法的可行性。  相似文献   

13.
In this paper, we present a new algorithm for learning oblique decision trees. Most of the current decision tree algorithms rely on impurity measures to assess the goodness of hyperplanes at each node while learning a decision tree in top-down fashion. These impurity measures do not properly capture the geometric structures in the data. Motivated by this, our algorithm uses a strategy for assessing the hyperplanes in such a way that the geometric structure in the data is taken into account. At each node of the decision tree, we find the clustering hyperplanes for both the classes and use their angle bisectors as the split rule at that node. We show through empirical studies that this idea leads to small decision trees and better performance. We also present some analysis to show that the angle bisectors of clustering hyperplanes that we use as the split rules at each node are solutions of an interesting optimization problem and hence argue that this is a principled method of learning a decision tree.  相似文献   

14.
This paper describes a general fuzzy min-max (GFMM) neural network which is a generalization and extension of the fuzzy min-max clustering and classification algorithms of Simpson (1992, 1993). The GFMM method combines supervised and unsupervised learning in a single training algorithm. The fusion of clustering and classification resulted in an algorithm that can be used as pure clustering, pure classification, or hybrid clustering classification. It exhibits a property of finding decision boundaries between classes while clustering patterns that cannot be said to belong to any of existing classes. Similarly to the original algorithms, the hyperbox fuzzy sets are used as a representation of clusters and classes. Learning is usually completed in a few passes and consists of placing and adjusting the hyperboxes in the pattern space; this is an expansion-contraction process. The classification results can be crisp or fuzzy. New data can be included without the need for retraining. While retaining all the interesting features of the original algorithms, a number of modifications to their definition have been made in order to accommodate fuzzy input patterns in the form of lower and upper bounds, combine the supervised and unsupervised learning, and improve the effectiveness of operations. A detailed account of the GFMM neural network, its comparison with the Simpson's fuzzy min-max neural networks, a set of examples, and an application to the leakage detection and identification in water distribution systems are given  相似文献   

15.
Semi-supervised classification methods can perform even worse than the supervised counterparts in some cases. It undoubtedly reduces their confidence in real applications, and it is desired to improve the safety of semi-supervised classification such that it never performs worse than the supervised counterpart. Considering that the cluster assumption may not well reflect the real data distribution, which can be one possible cause of unsafe learning, we develop a safe semi-supervised support vector machine method in this paper by adjusting the cluster assumption (ACA-S3VM for short). Specifically, when samples from different classes are seriously overlapped, the real boundary actually lies not in the low density region, which will not be found by the cluster assumption. However, an unsupervised clustering method is able to detect the real boundary in this case. As a result, we design ACA-S3VM by adjusting the cluster assumption with the help of clustering, which considers the distances of individual unlabeled instances to the distribution boundary in learning. Empirical results show the competition of ACA-S3VM compared with the off-the-shelf safe semi-supervised classification methods.  相似文献   

16.
This paper presents a novel host-based combinatorial method based on k-Means clustering and ID3 decision tree learning algorithms for unsupervised classification of anomalous and normal activities in computer network ARP traffic. The k-Means clustering method is first applied to the normal training instances to partition it into k clusters using Euclidean distance similarity. An ID3 decision tree is constructed on each cluster. Anomaly scores from the k-Means clustering algorithm and decisions of the ID3 decision trees are extracted. A special algorithm is used to combine results of the two algorithms and obtain final anomaly score values. The threshold rule is applied for making the decision on the test instance normality. Experiments are performed on captured network ARP traffic. Some anomaly criteria has been defined and applied to the captured ARP traffic to generate normal training instances. Performance of the proposed approach is evaluated using five defined measures and empirically compared with the performance of individual k-Means clustering and ID3 decision tree classification algorithms and the other proposed approaches based on Markovian chains and stochastic learning automata. Experimental results show that the proposed approach has specificity and positive predictive value of as high as 96 and 98%, respectively.  相似文献   

17.
一种基于混合重取样策略的非均衡数据集分类算法   总被引:1,自引:0,他引:1  
非均衡数据是分类中的常见问题,当一类实例远远多于另一类实例,则代表类非均衡,真实世界的分类问题存在很多类别非均衡的情况并得到众多专家学者的重视,非均衡数据的分类问题已成为数据挖掘和模式识别领域中新的研究热点,是对传统分类算法的重大挑战。本文提出了一种新型重取样算法,采用改进的SMOTE算法对少数类数据进行过取样,产生新的少数类样本,使类之间数据量基本均衡,然后再根据SMO算法的特点,提出使用聚类的数据欠取样方法,删除冗余或噪音数据。通过对数据集的过取样和清理之后,一些有用的样本被保留下来,减少了数据集规模,增强支持向量机训练执行的效率。实验结果表明,该方法在保持整体分类性能的情况下可以有效地提高少数类的分类精度。  相似文献   

18.
张永  浮盼盼  张玉婷 《计算机应用》2013,33(10):2801-2803
针对大规模数据的分类问题,将监督学习与无监督学习结合起来,提出了一种基于分层聚类和重采样技术的支持向量机(SVM)分类方法。该方法首先利用无监督学习算法中的k-means聚类分析技术将数据集划分成不同的子集,然后对各个子集进行逐类聚类,分别选出各类中心邻域内的样本点,构成最终的训练集,最后利用支持向量机对所选择的最具代表样本点进行训练建模。实验表明,所提方法可以大幅度降低支持向量机的学习代价,其分类精度比随机欠采样更优,而且可以达到采用完整数据集训练所得的结果  相似文献   

19.
Currently cluster analysis techniques are used mainly to aggregate objects into groups according to similarity measures. Whether the number of groups is pre-defined (supervised clustering) or not (unsupervised clustering), clustering techniques do not provide decision rules or a decision tree for the associations that are implemented. The current study proposes and evaluates a new technique to define decision tree based on cluster analysis. The proposed model was applied and tested on two large datasets of real life HR classification problems. The results of the model were compared to results obtained by conventional decision trees. It was found that the decision rules obtained by the model are at least as good as those obtained by conventional decision trees. In some cases the model yields better results than decision trees. In addition, a new measure is developed to help fine-tune the clustering model to achieve better and more accurate results.  相似文献   

20.
基于核聚类方法的多层次支持向量机分类树   总被引:2,自引:0,他引:2  
针对解决多类模式识别问题的SVM方法进行研究。在比较几种常用的多类SVM分类算法的基础上,提出一种基于核聚类方法的多层次SVM分类树,将核空问中的无监督学习方法和有监督学习方法结合起来,实现了一种结构更加简洁清晰、计算效率更高的多层SVM分类树算法,并在实验中取得了良好的结果.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号