Similar literature
 20 similar documents found
1.
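To apply the support vector machine (SVM) algorithm to multi-class classification, this paper builds on binary SVM classification and proposes classifying multi-class data with the idea behind bubble sort. Experiments on selected UCI data sets show that, while keeping accuracy high, the method substantially reduces classification time compared with the traditional one-versus-one multi-class approach, which makes it a practical SVM multi-class classification method.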
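
The bubble-sort idea can be sketched as follows. This is only one plausible reading of the abstract, not the authors' implementation: the current winner is compared with the next class, and the winner of each binary decision moves on, so a k-class prediction needs k-1 binary evaluations instead of the k(k-1)/2 used by one-versus-one voting. The names `centroid_decider` and `bubble_predict` are hypothetical, and a nearest-centroid rule stands in for each trained binary SVM.

```python
from typing import Callable, Dict, Sequence, Tuple

Point = Sequence[float]
# A pairwise decider takes a sample and returns the winning label of its pair.
Pairwise = Dict[Tuple[str, str], Callable[[Point], str]]


def centroid_decider(a: str, ca: Point, b: str, cb: Point) -> Callable[[Point], str]:
    """Hypothetical stand-in for a trained binary SVM: pick the nearer centroid."""
    def decide(x: Point) -> str:
        da = sum((xi - ci) ** 2 for xi, ci in zip(x, ca))
        db = sum((xi - ci) ** 2 for xi, ci in zip(x, cb))
        return a if da <= db else b
    return decide


def bubble_predict(x: Point, labels: Sequence[str], pairwise: Pairwise) -> Tuple[str, int]:
    """One bubble pass: the current winner meets the next label (k-1 evaluations)."""
    winner = labels[0]
    evaluations = 0
    for challenger in labels[1:]:
        # Deciders are stored once per unordered pair, so try both key orders.
        key = (winner, challenger) if (winner, challenger) in pairwise else (challenger, winner)
        winner = pairwise[key](x)
        evaluations += 1
    return winner, evaluations


# Toy data (assumed): four classes with well-separated centroids.
centroids = {"a": (0.0, 0.0), "b": (4.0, 0.0), "c": (0.0, 4.0), "d": (4.0, 4.0)}
labels = sorted(centroids)
pairwise = {
    (p, q): centroid_decider(p, centroids[p], q, centroids[q])
    for i, p in enumerate(labels) for q in labels[i + 1:]
}
label, used = bubble_predict((3.6, 3.9), labels, pairwise)
print(label, used)
```

All k(k-1)/2 binary classifiers are still trained; the saving is at prediction time, where only k-1 of them are consulted.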

2.
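For automatic hierarchical classification of large-scale text, the K-nearest-neighbor (KNN) algorithm is efficient but not very accurate on samples near class boundaries. The support vector machine (SVM) is more accurate, but many earlier multi-class SVM algorithms are assembled from several independent binary classifiers, so training is slow and they are poorly suited to hierarchical class structures. This paper proposes an automatic classification method that combines KNN with a hierarchical SVM. First, the KNN algorithm is improved so that the class labels of the K nearest neighbors are obtained quickly and used to filter the candidate classes of a document. Then a jointly learned multi-class sparse hierarchical SVM classifier assigns classes from the top down, giving efficient and accurate document classification. Experiments show that, on both single-level and multi-level data sets, the method is more accurate than either component used alone, while its classification time is close to that of the faster of the two.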

3.
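A multi-feature fusion method based on support vector machines and k-nearest-neighbor classifiers   Total citations: 1, self-citations: 0, citations by others: 1
陈丽  陈静 《计算机应用》 (Journal of Computer Applications) 2009, 29(3): 833-835
Traditional classification methods rely on a single classifier, which makes them one-sided and limits their accuracy, and SVMs tend to misclassify points near the separating hyperplane. To address these problems, a multi-feature fusion method based on the support vector machine (SVM) and k-nearest neighbor (KNN) is proposed. In this algorithm the features of the sample set are divided into L groups. First, SVM builds one separating hyperplane from each group of features in the training set, L in total. Next, the SVM-KNN method is applied to the test set, yielding a decision profile matrix made up of L groups of posterior probabilities. Finally, multi-feature fusion is applied to this matrix to output the final classification. Numerical experiments on the Iris data show that the SVM-KNN-based multi-feature fusion method improves average prediction accuracy by 28.7% and 1.9% over a single SVM and a single SVM-KNN, respectively.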
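
The last step, fusing a decision profile matrix, can be illustrated with a short sketch. The abstract does not say which fusion rule is used, so the column-wise mean below is an assumption, and both the function name `fuse_decision_profile` and the numbers in `profile` are made up for illustration.

```python
from typing import List, Sequence


def fuse_decision_profile(profile: Sequence[Sequence[float]]) -> int:
    """Average the L rows of posteriors column-wise and return the arg-max class index."""
    n_classes = len(profile[0])
    if any(len(row) != n_classes for row in profile):
        raise ValueError("every row must cover the same classes")
    mean_support: List[float] = [
        sum(row[c] for row in profile) / len(profile) for c in range(n_classes)
    ]
    return max(range(n_classes), key=mean_support.__getitem__)


# Hypothetical posteriors from L = 3 feature groups (rows) over 3 classes (columns).
profile = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.6, 0.3],
]
predicted = fuse_decision_profile(profile)
print(predicted)
```

Here the first feature group alone would pick class 0, but the averaged support favors class 1, which is the kind of correction that fusing several feature groups is meant to provide.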

4.
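A text classification method combining CTM and SVM   Total citations: 1, self-citations: 0, citations by others: 1
王燕霞  邓伟 《计算机工程》 (Computer Engineering) 2010, 36(22): 203-205
This paper studies a text classification method that combines the correlated topic model (CTM) with the support vector machine (SVM). The method models the data set with CTM to reduce its dimensionality and then classifies the simplified text data with SVM. So that CTM can model the data set well, the data are first clustered with DBSCAN, and the number of cluster centers found determines the topic parameter of the CTM model. Experimental results show that the method speeds up classification and improves classification accuracy.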

5.
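A feature selection method based on an improved F-score and support vector machines   Total citations: 1, self-citations: 0, citations by others: 1
The traditional F-score, which measures how well a feature discriminates between two classes, is generalized into an improved F-score that measures a feature's discriminative power not only between two classes but also among multiple classes. The improved F-score serves as the feature selection criterion, and a support vector machine (SVM) evaluates the effectiveness of the selected feature subset, giving an effective feature selection procedure. Experiments on six data sets from the UCI machine learning repository, with comparisons against SVM and PCA+SVM, show that the feature selection method based on the improved F-score and SVM not only improves classification accuracy but also generalizes well, and that it trains faster than PCA+SVM.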
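
As a rough illustration of a multi-class F-score, the sketch below scores one feature by the spread of its class means around the overall mean, divided by the sum of the within-class sample variances. The abstract does not give the formula, so this particular form is an assumption (a natural extension of the two-class F-score), and `multiclass_f_score` is a hypothetical name.

```python
from typing import Dict, List, Sequence


def multiclass_f_score(values_by_class: Dict[str, Sequence[float]]) -> float:
    """Between-class spread of the class means over the summed within-class variances."""
    all_values: List[float] = [v for vals in values_by_class.values() for v in vals]
    overall_mean = sum(all_values) / len(all_values)
    between = 0.0
    within = 0.0
    for vals in values_by_class.values():
        if len(vals) < 2:
            raise ValueError("each class needs at least two samples")
        class_mean = sum(vals) / len(vals)
        between += (class_mean - overall_mean) ** 2
        # Unbiased sample variance of this feature inside the class.
        within += sum((v - class_mean) ** 2 for v in vals) / (len(vals) - 1)
    if within == 0:
        return float("inf")
    return between / within


# Toy feature values per class (assumed): one separating feature, one useless one.
good = multiclass_f_score({"a": [1.0, 1.2, 0.8], "b": [5.0, 5.2, 4.8], "c": [9.0, 9.2, 8.8]})
bad = multiclass_f_score({"a": [1.0, 5.0, 9.0], "b": [1.1, 5.1, 9.1], "c": [0.9, 4.9, 8.9]})
print(good > bad)
```

Features would then be ranked by this score, with an SVM judging each candidate subset as the abstract describes.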

6.
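Traditional classifiers perform poorly on imbalanced data. To improve classification performance on imbalanced data sets, and on minority-class samples in particular, an imbalanced-data classification algorithm based on BSMOTE and inverse under-sampling is proposed. The algorithm over-samples with BSMOTE to artificially increase the number of minority-class samples, then gives priority to removing redundant and noisy samples and uses inverse under-sampling to invert the ratio of minority-class to majority-class samples. Repeating this sampling several times produces multiple training sets, and Bagging combines the classifiers trained on them to make better use of the available information. Experiments show that, compared with several existing algorithms, the proposed algorithm improves both minority-class classification performance and overall classification accuracy.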

7.
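杨婷  孟相如  温祥西  伍文 《计算机应用》 (Journal of Computer Applications) 2013, 33(9): 2553-2556
When a support vector machine (SVM) is trained on imbalanced data, the optimal separating surface shifts, which reduces the generalization ability of the classification model. To address this, a method is proposed that corrects the separating surface using the average distribution ratio of the Fisher within-class scatter. The normal vector of the separating surface is obtained by SVM training on the sample data; the distribution of the two classes is then assessed by computing their Fisher within-class scatter along the direction of that normal vector; finally, the position of the optimal separating surface is corrected according to the average distribution ratio, which combines the within-class scatter with the number of samples. Experiments on benchmark data sets show that the method raises the recognition rate of the SVM classification model on the minority class of imbalanced data sets and thereby helps improve the generalization ability of the model.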

8.
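This paper proposes an automatic clustering method based on algorithm selection and result evaluation. For a given data set, the method first analyzes the latent cluster structure of the data and, based on the structure found, selects a suitable set of candidate clustering algorithms. It then evaluates the clustering results of the algorithms in this set with cluster validity indices to ensure a high-quality result. Experimental results show that the method automatically selects clustering algorithms suited to the data set and obtains high-quality clustering results.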

9.
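A multi-class SVM classifier based on a directed acyclic graph   Total citations: 1, self-citations: 0, citations by others: 1
This paper proposes ACDMSVM, a multi-class SVM classifier based on a decision directed acyclic graph and active constraints. For a k-class problem it combines k(k-1)/2 modified binary SVM classifiers. To speed up both training and decision making, the standard binary SVM classifier is modified in three ways: it uses the large-margin approach, takes the 2-norm of the soft-margin error variables, and applies active constraints. In the training stage, nodes are selected using a rooted binary directed acyclic graph with k(k-1)/2 internal nodes and k leaf nodes. Numerical experiments show that this is a fast multi-class SVM classifier.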
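
The decision side of such a graph can be sketched in a few lines. The code below follows the commonly described decision-DAG traversal (compare the first and last remaining classes, discard the loser), which is an assumption about how ACDMSVM decides; the modified binary SVMs themselves are not reproduced, and `nearest_center` is a hypothetical stand-in for them.

```python
from typing import Callable, List, Sequence, Tuple

# decide(i, j, x) returns i or j; it stands in for the binary SVM trained on classes i and j.
Decide = Callable[[int, int, Sequence[float]], int]


def ddag_predict(x: Sequence[float], k: int, decide: Decide) -> Tuple[int, List[Tuple[int, int]]]:
    """Walk from the root (first vs last class) down to a leaf, dropping one class per node."""
    remaining = list(range(k))
    path: List[Tuple[int, int]] = []
    while len(remaining) > 1:
        i, j = remaining[0], remaining[-1]
        path.append((i, j))
        if decide(i, j, x) == i:
            remaining.pop()      # class j is ruled out
        else:
            remaining.pop(0)     # class i is ruled out
    return remaining[0], path


def nearest_center(i: int, j: int, x: Sequence[float]) -> int:
    """Hypothetical binary rule: class c is centred at the 1-D point c."""
    return i if abs(x[0] - i) <= abs(x[0] - j) else j


k = 5
label, path = ddag_predict([2.2], k, nearest_center)
internal_nodes = k * (k - 1) // 2
print(label, len(path), internal_nodes)
```

The graph holds k(k-1)/2 internal nodes, but any single prediction visits only k-1 of them, which is where the decision speed comes from.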

10.
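In SVM-based classification, once a classifier model has been trained it is applied to all test samples without distinction. In high-speed train fault diagnosis, however, classification accuracy differs from one region of the sample space to another. This paper proposes an SVM multi-classifier fusion algorithm based on selective ensemble learning. The method selects the k training samples nearest to a test sample, then selects and fuses the SVM classifiers that classify those samples well, in order to improve classification accuracy. Experiments on high-speed train fault data compare the method with the AdaBoost, KNN, Bayes and SVM classification methods. The results show that the algorithm improves classification accuracy.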
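
A minimal sketch of this selective fusion, under stated assumptions: classifiers are plain callables rather than trained SVMs, "classifies well" is taken to mean a local accuracy at or above a hypothetical threshold `min_local_acc`, and fusion is a majority vote. None of these details come from the abstract.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]
Classifier = Callable[[Sequence[float]], int]


def selective_predict(x: Sequence[float], train: List[Sample],
                      classifiers: List[Classifier],
                      k: int = 3, min_local_acc: float = 0.6) -> int:
    """Vote only with classifiers that do well on the k training samples nearest to x."""
    def sq_dist(p: Sequence[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(p, x))

    neighbours = sorted(train, key=lambda s: sq_dist(s[0]))[:k]
    scored = [
        (sum(clf(p) == y for p, y in neighbours) / len(neighbours), clf)
        for clf in classifiers
    ]
    chosen = [clf for acc, clf in scored if acc >= min_local_acc]
    if not chosen:  # fall back to the locally best classifier
        chosen = [max(scored, key=lambda t: t[0])[1]]
    votes = Counter(clf(x) for clf in chosen)
    return votes.most_common(1)[0][0]


# Toy 1-D data (assumed): class 0 near 0, class 1 near 1.
train = [([0.0], 0), ([0.1], 0), ([0.2], 0), ([0.9], 1), ([1.0], 1), ([1.1], 1)]
classifiers = [
    lambda p: 1 if p[0] > 0.5 else 0,  # accurate everywhere
    lambda p: 0,                       # only right near class 0
    lambda p: 0,                       # only right near class 0
]
x = [0.95]
selected = selective_predict(x, train, classifiers)
plain = Counter(clf(x) for clf in classifiers).most_common(1)[0][0]
print(selected, plain)
```

In this toy case an unfiltered majority vote is outvoted by the two weak classifiers, while the selective vote keeps only the classifier that is reliable in the neighborhood of the test sample.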

11.
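Time series data are found throughout daily life and have drawn increasing research attention. Time series classification is an important research area within time series analysis, and hundreds of classification algorithms have been proposed so far. These methods fall roughly into distance-based methods, feature-based methods and deep-learning-based methods. The first two require manual feature processing and a hand-picked classifier, whereas most deep learning methods are end-to-end, and in time series classification...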

12.
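As a key component of a wind turbine, the rolling bearing is decisive for the safe operation of the whole unit. For rolling bearing fault diagnosis, a method based on a node-optimized directed acyclic graph large margin distribution machine (O-DAG-LDM) is proposed. Combining the multi-class extensibility of the DAG with the generalization ability of the binary LDM classifier, a DAG-structured LDM multi-class classifier is built for rolling bearing fault diagnosis. Within the DAG-LDM framework, an optimization algorithm arranges the DAG nodes so as to reduce the accumulated error caused by random ordering and improve the fault classification accuracy of the LDM. Experiments show that, compared with other mainstream intelligent diagnosis methods, the proposed node-optimized DAG-LDM method achieves higher accuracy and better noise robustness.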

13.
Decision trees for hierarchical multi-label classification   Total citations: 3, self-citations: 0, citations by others: 3
Hierarchical multi-label classification (HMC) is a variant of classification where instances may belong to multiple classes at the same time and these classes are organized in a hierarchy. This article presents several approaches to the induction of decision trees for HMC, as well as an empirical study of their use in functional genomics. We compare learning a single HMC tree (which makes predictions for all classes together) to two approaches that learn a set of regular classification trees (one for each class). The first approach defines an independent single-label classification task for each class (SC). Obviously, the hierarchy introduces dependencies between the classes. While they are ignored by the first approach, they are exploited by the second approach, named hierarchical single-label classification (HSC). Depending on the application at hand, the hierarchy of classes can be such that each class has at most one parent (tree structure) or such that classes may have multiple parents (DAG structure). The latter case has not been considered before and we show how the HMC and HSC approaches can be modified to support this setting. We compare the three approaches on 24 yeast data sets using as classification schemes MIPS’s FunCat (tree structure) and the Gene Ontology (DAG structure). We show that HMC trees outperform HSC and SC trees along three dimensions: predictive accuracy, model size, and induction time. We conclude that HMC trees should definitely be considered in HMC tasks where interpretable models are desired.
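
The difference between a tree-shaped and a DAG-shaped class hierarchy can be made concrete with a small sketch. It assumes the usual hierarchy constraint, namely that an instance labeled with a class also belongs to every ancestor of that class; the abstract itself only says the classes are organized in a hierarchy. The function `with_ancestors` and the toy `parents` map are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def with_ancestors(labels: Iterable[str], parents: Dict[str, List[str]]) -> Set[str]:
    """Close a label set upward: a predicted class implies every ancestor in the DAG."""
    closed: Set[str] = set()
    stack = list(labels)
    while stack:
        node = stack.pop()
        if node in closed:
            continue  # already reached through another parent
        closed.add(node)
        stack.extend(parents.get(node, []))
    return closed


# Hypothetical hierarchy; "c" has two parents, so this is a DAG rather than a tree.
parents = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"]}
result = with_ancestors({"d"}, parents)
print(sorted(result))
```

In a tree each class contributes a single chain of ancestors; with multiple parents the closure has to follow every parent link, which is why the `closed` check is needed to avoid revisiting shared ancestors.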

14.
This research presents the augmentation of the original contour preserving classification technique to support multi-class data and to reduce the number of synthesized vectors, called multi-class outpost vectors (MCOVs). The technique has been shown to function correctly on both synthetic-problem data sets and real-world data sets. The technique also includes three methods to reduce the number of MCOVs by using minimum vector distance selection between fundamental multi-class outpost vectors and additional multi-class outpost vectors to select only MCOVs located at the decision boundary between consecutive classes of data. The three MCOV reduction methods are the FF-AA reduction method, the FA-AF reduction method, and the FAF-AFA reduction method. An evaluation has been conducted to show the reduction capability, the contour preservation capability, and the levels of classification accuracy of the three MCOV reduction methods on both non-overlapping and highly overlapping synthetic-problem data sets and highly overlapping real-world data sets. For non-overlapping problems, the experimental results show that the FA-AF reduction method can partially reduce the number of MCOVs while preserving the contour of the problem most accurately and obtaining similar levels of classification accuracy as when the whole set of MCOVs is used. For highly overlapping problems, the experimental results show that the FF-AA reduction method can partially reduce the number of MCOVs while preserving the contour of the problem most accurately and obtaining similar levels of classification accuracy as when the whole set of MCOVs is used.

15.
J. Li  X. Tang  J. Liu  J. Huang  Y. Wang 《Pattern Recognition》 2008, 41(6): 1975-1984
Various microarray experiments are now done in many laboratories, resulting in the rapid accumulation of microarray data in public repositories. One of the major challenges of analyzing microarray data is how to extract and select efficient features from it for accurate cancer classification. Here we introduce a new feature extraction and selection method based on information gene pairs that change significantly across different tissue samples. Experimental results on five public microarray data sets demonstrate that the feature subset selected by the proposed method performs well and achieves higher classification accuracy on several classifiers. We perform an extensive experimental comparison of the features selected by the proposed method and features selected by other methods using different evaluation methods and classifiers. The results confirm that the proposed method performs as well as other methods on acute lymphoblastic-acute myeloid leukemia, adenocarcinoma and breast cancer data sets using fewer information genes, and leads to a significant improvement in classification accuracy on colon and diffuse large B cell lymphoma cancer data sets.

16.
Many gene selection methods have been proposed to select a subset of genes that gives high prediction accuracy for cancer classification, and most set the same preference for all genes. However, many biological reports have pointed out that mutated or flawed genes, known as risk genes, can be one of the major causes of a specific disease. This study proposes a gene selection method based on the risk genes found in biological reports. The information provided by risk genes can reduce the time complexity of gene selection and increase the accuracy of cancer classification. This gene selection method is composed of two stages. Since all risk genes must be chosen, the first stage is to remove the genes that have similar expression levels or functions to risk genes. The next stage is to perform gene selection and gene replacement based on the results of a process that divides the remaining genes into clusters. Based on the test results from four microarray data sets, our gene selection method outperforms those proposed by previous studies, and genes that have the potential to be new risk genes are presented.

17.
Since most cancer treatments come with a certain degree of toxicity, it is essential to identify a cancer type correctly and then administer the relevant therapy. With the arrival of powerful tools such as gene expression microarrays, the basis of cancer classification is slowly changing from morphological properties to molecular signatures. Several recent studies have demonstrated a marked improvement in the prediction accuracy of tumor types based on gene expression microarray measurements over clinical markers. The main challenge in working with gene expression microarrays is that there is a huge number of genes to work with, of which only a small fraction are actually relevant for differentiating between different types of cancer. A Bayesian nearest neighbor model equipped with an integrated variable selection technique is proposed to overcome this challenge. This classification and gene selection model is able to classify different cancer types accurately and simultaneously identify the relevant or important genes. The proposed model is completely automatic in the sense that it adaptively picks the neighborhood size and the important covariates. The method is successfully applied to three simulated data sets and four well-known real data sets. To demonstrate the competitiveness of the method, a comparative study is also done with several other popular “off the shelf” classification methods. For all the simulated and real-life data sets, the proposed method produced highly competitive if not better results. While the standard approach is two-step model building, with gene selection followed by tumor prediction, this novel adaptive gene selection technique automatically selects the relevant genes along with the tumor class prediction in one go. The biological relevance of the selected genes is also discussed to validate the claim.

18.
Hepatitis is a disease seen in all age groups. Hepatitis on its own does not have a lethal effect, but its early diagnosis and treatment are crucial because it triggers other diseases. In this study, a new hybrid medical decision support system based on rough set (RS) and extreme learning machine (ELM) is proposed for the diagnosis of hepatitis. RS-ELM consists of two stages. In the first, redundant features are removed from the data set through the RS approach. In the second, classification is carried out through ELM using the remaining features. The hepatitis data set taken from the UCI machine learning repository is used to test the proposed hybrid model. A major part of the data set (48.3%) includes missing values. As removing missing values from the data set leads to data loss, feature selection is done in the first stage without deleting missing values. In the second stage, classification is performed through ELM after the removal of missing values from the sub-featured data sets that were reduced to different dimensions. The results show that RS-ELM achieved a highest classification accuracy of 100.00%, and the RS-ELM model is considerably more successful than the other methods in the literature. Furthermore, the most significant features for the diagnosis of hepatitis have been determined in this study. The proposed method is expected to be useful in similar medical applications.

19.
The monitoring of the expression profiles of thousands of genes has proved to be particularly promising for biological classification. DNA microarray data have recently been used for the development of classification rules, particularly for cancer diagnosis. However, microarray data present major challenges due to their complex, multiclass nature and the overwhelming number of variables characterizing gene expression profiles. A regularized form of sliced inverse regression (REGSIR) is proposed. It allows the simultaneous development of classification rules and the selection of those genes that are most important in terms of classification accuracy. The method is illustrated on some publicly available microarray data sets. Furthermore, an extensive comparison with other classification methods is reported. The REGSIR performance is comparable with the best classification methods available, and when appropriate feature selection is made the performance can be considerably improved.

20.
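The support vector machine is one of the most effective classification techniques, with high classification accuracy and good generalization ability, but its training process is still very expensive on large data sets. To address this, a classification method based on the one-class support vector machine is proposed. A random selection algorithm reduces the training set in order to speed up training; at the same time, the classification accuracy of the support vector machine is preserved by restoring, from the original data, the neighborhoods of the samples that lie in the intersection of the hyperspheres. Experiments show that the method considerably reduces computational complexity and thereby speeds up training on large data sets.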
