Similar literature
 20 similar documents found
1.
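Classification of imbalanced data sets is one of the active research topics in machine learning. In recent years, researchers have proposed many theories and algorithms to improve the performance of traditional classification techniques on imbalanced data sets; determining the threshold of a neural network by means of a threshold criterion is one of the important approaches. Commonly used threshold criteria have certain shortcomings: for example, they cannot make the classification accuracies of the minority and majority classes reach their best simultaneously, or they overly favor the accuracy of the majority class. A new threshold criterion is therefore proposed, under which the minority-class and majority-class accuracies can reach their best simultaneously without being affected by the class ratio of the samples. When a classifier is trained by combining a neural network with a genetic algorithm, the new criterion, used both as the threshold selection condition and as the evaluation criterion for the classifier, yields good results.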
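
The abstract does not state the proposed criterion, so the following is only a minimal sketch of threshold selection on an imbalanced problem. It assumes the geometric mean of the minority-class and majority-class accuracies (G-mean) as a stand-in criterion; the function names, scores, and labels are hypothetical.

```python
import math

def gmean_at_threshold(scores, labels, threshold):
    """G-mean of minority (label 1) and majority (label 0) accuracy.

    Assumption: G-mean is used here as an illustrative criterion; it is
    not the criterion proposed in the abstract above.
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    # A sample is predicted as minority when its score reaches the threshold.
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    return math.sqrt((tp / pos) * (tn / neg))

def best_threshold(scores, labels):
    """Pick the observed score that maximizes the G-mean criterion."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: gmean_at_threshold(scores, labels, t))

# Hypothetical network outputs: six majority samples, two minority samples.
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.45, 0.60]
labels = [0, 0, 0, 0, 0, 0, 1, 1]

chosen = best_threshold(scores, labels)
```

On this toy data a fixed threshold of 0.5 misses one of the two minority samples, whereas the selected threshold separates both classes, which is the kind of behavior a class-ratio-independent criterion is meant to encourage.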

2.
We propose three model-free feature extraction approaches for solving the multi-class classification problem; we use multi-objective genetic programming (MOGP) to derive (near-)optimal feature extraction stages as a precursor to classification with a simple and fast-to-train classifier. Statistically founded comparisons are made between our three proposed approaches and seven conventional classifiers over seven datasets from the UCI Machine Learning database. We also make comparisons with other reported evolutionary computation techniques. On almost all the benchmark datasets, the MOGP approaches give better or identical performance to the best of the conventional methods. Of our proposed MOGP-based algorithms, we conclude that hierarchical feature extraction performs best on multi-class classification problems.

3.
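Data produced in many real-world domains typically have multiple classes and are imbalanced. In multi-class imbalanced classification, problems such as class overlap, noise, and multiple minority classes degrade classifier performance, and solving the multi-class imbalance problem effectively has become an important research topic in machine learning and data mining. Based on the recent literature on multi-class imbalanced classification, the methods are analyzed and summarized from two perspectives, data preprocessing and algorithm-level classification methods, and all algorithms are analyzed in detail in terms of their advantages, disadvantages, and data sets. For data preprocessing, oversampling, undersampling, hybrid sampling, and feature selection methods are introduced, and the performance of algorithms evaluated on the same data sets is compared. Algorithm-level classification methods are introduced and analyzed from three aspects: base classifier optimization, ensemble learning, and multi-class decomposition techniques. Finally, future research directions for multi-class imbalanced data classification are summarized.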

4.
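Generative and discriminative approaches are two different frameworks for solving classification problems, each with its own advantages. To exploit the advantages of both, this paper proposes a generative-discriminative linear hybrid classification model and designs a genetic-algorithm-based learning algorithm for it. The algorithm treats learning the mixing parameter of the linear hybrid classifier as an optimization problem: taking the posterior probabilities that the two base classifiers assign to each training sample as its data, it uses a genetic algorithm to find the optimal value of the mixing parameter. Experimental results show that, on most data sets, the classification accuracy of the generative-discriminative linear hybrid classifier is better than or close to that of the better of its two base classifiers.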
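
The linear hybrid described above can be sketched as a convex combination of the two base classifiers' posteriors. This is a hedged illustration only: it assumes a single mixing parameter `lam` in [0, 1] and swaps the paper's genetic algorithm for a plain grid search; the posteriors and labels are made up.

```python
def mix_posteriors(p_gen, p_disc, lam):
    """Linear mixture: lam * generative + (1 - lam) * discriminative."""
    return [lam * g + (1.0 - lam) * d for g, d in zip(p_gen, p_disc)]

def accuracy(gen_post, disc_post, labels, lam):
    """Training accuracy of the mixed classifier for a given lam."""
    correct = 0
    for p_gen, p_disc, y in zip(gen_post, disc_post, labels):
        mixed = mix_posteriors(p_gen, p_disc, lam)
        pred = max(range(len(mixed)), key=mixed.__getitem__)
        correct += (pred == y)
    return correct / len(labels)

def search_lambda(gen_post, disc_post, labels, steps=10):
    """Grid search over lam. Assumption: this replaces the genetic
    algorithm used in the paper, purely to keep the sketch short."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda lam: accuracy(gen_post, disc_post, labels, lam))

# Hypothetical posteriors of the two base classifiers on three samples.
gen_post = [[0.9, 0.1], [0.6, 0.4], [0.7, 0.3]]
disc_post = [[0.4, 0.6], [0.1, 0.9], [0.6, 0.4]]
labels = [0, 1, 0]

best_lam = search_lambda(gen_post, disc_post, labels)
```

In this toy case each base classifier alone gets two of three samples right, while an intermediate mixing value classifies all three correctly.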

5.
In this work, we propose two novel classifiers for multi-class classification problems using mathematical programming optimisation techniques. A hyper box-based classifier (Xu & Papageorgiou, 2009) that iteratively constructs hyper boxes to enclose samples of different classes has been adopted. We first propose a new solution procedure that updates the sample weights during each iteration, which tweaks the model to favour difficult samples in the next iteration and therefore achieves a better final solution. Through a number of real-world data classification problems, we demonstrate that the proposed refined classifier results in consistently good classification performance, outperforming the original hyper box classifier and a number of other state-of-the-art classifiers. Furthermore, we introduce a simple data space partition method to reduce the computational cost of the proposed sample re-weighting hyper box classifier. The partition method divides the original dataset into two disjoint regions, and a sample re-weighting hyper box classifier is then trained for each region. Through some real-world datasets, we demonstrate that the data space partition method considerably reduces the computational cost while maintaining the level of prediction accuracy.

6.
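This paper discusses a land cover classification method for high-resolution remote sensing images based on the Fuzzy ARTMAP neural network, and its application. The principle of the Fuzzy ARTMAP neural network is introduced first, and land cover classification is then performed on SPOT XS image test data. The results are compared with those of traditional maximum likelihood supervised classification (MLC) and a back propagation (BP) neural network. An accuracy assessment of the three classification results using 500 sample points shows that Fuzzy ARTMAP improves classification accuracy over both of the other methods to different degrees and gives better classification results, with overall classification accuracy 17.41% and 7.32% higher than MLC and BP, respectively. Finally, the influence of the different classification methods on the land cover classification results is evaluated and analyzed. The experiments show that the Fuzzy ARTMAP neural network can achieve relatively good results in land cover classification of high-resolution images.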

7.
Improving the accuracy of machine learning algorithms is vital in designing high-performance computer-aided diagnosis (CADx) systems. Research has shown that the performance of a base classifier might be enhanced by ensemble classification strategies. In this study, we construct rotation forest (RF) ensemble classifiers of 30 machine learning algorithms to evaluate their classification performance using Parkinson's, diabetes and heart disease datasets from the literature. In the experiments, first the feature dimension of the three datasets is reduced using the correlation based feature selection (CFS) algorithm. Second, the classification performance of 30 machine learning algorithms is calculated for the three datasets. Third, 30 classifier ensembles are constructed based on the RF algorithm to assess the performance of the respective classifiers with the same disease data. All the experiments are carried out with a leave-one-out validation strategy, and the performance of the 60 algorithms is evaluated using three metrics: classification accuracy (ACC), kappa error (KE) and area under the receiver operating characteristic (ROC) curve (AUC). Base classifiers achieved average accuracies of 72.15%, 77.52% and 84.43% for the diabetes, heart and Parkinson's datasets, respectively. The RF classifier ensembles produced average accuracies of 74.47%, 80.49% and 87.13% for the respective diseases. RF, a newly proposed classifier ensemble algorithm, might be used to improve the accuracy of miscellaneous machine learning algorithms to design advanced CADx systems.
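
The leave-one-out validation strategy mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the study's pipeline: a 1-nearest-neighbour rule on one-dimensional toy data stands in for the 30 classifiers, and all names and values are hypothetical.

```python
def loo_accuracy(samples, labels, fit_predict):
    """Leave-one-out: hold out each sample once, train on all the others."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if fit_predict(train_x, train_y, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)

def one_nn(train_x, train_y, x):
    """Hypothetical stand-in classifier: label of the nearest training sample."""
    nearest = min(range(len(train_x)), key=lambda k: abs(train_x[k] - x))
    return train_y[nearest]

# Toy one-dimensional data; the last sample lies closer to the other class.
samples = [0.0, 0.1, 0.2, 1.0, 1.1, 0.7]
labels = [0, 0, 0, 1, 1, 0]

acc = loo_accuracy(samples, labels, one_nn)
```

Each sample is scored by a model that never saw it, so the estimate uses all the data for training while keeping the evaluation out-of-sample; the cost is one training run per sample.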

8.
Support Vector Machines (SVM) represent one of the most promising Machine Learning (ML) tools that can be applied to the problem of traffic classification in IP networks. In the case of SVMs, there are still open questions that need to be addressed before they can be generally applied to traffic classifiers. Having been designed essentially as techniques for binary classification, their generalization to multi-class problems is still under research. Furthermore, their performance is highly sensitive to the correct optimization of their working parameters. In this paper we describe an approach to traffic classification based on SVM. We apply one of the approaches to solving multi-class problems with SVMs to the task of statistical traffic classification, and describe a simple optimization algorithm that allows the classifier to perform correctly with as few as a few hundred training samples. The accuracy of the proposed classifier is then evaluated over three sets of traffic traces, coming from different topological points in the Internet. Although the results are relatively preliminary, they confirm that SVM-based classifiers can be very effective at discriminating traffic generated by different applications, even with reduced training set sizes.

9.
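Ensemble learning algorithm for multi-label cost-sensitive classification   Total citations: 12, self-citations: 2, citations by others: 10
Fu Zhongliang. Acta Automatica Sinica, 2014, 40(6): 1075-1085
Although a multi-label classification problem can be converted into an ordinary multi-class classification problem, a multi-label cost-sensitive classification problem is difficult to convert into a multi-class cost-sensitive one. By analyzing the problems encountered when extending multi-class cost-sensitive learning algorithms to multi-label cost-sensitive learning, an ensemble learning algorithm for multi-label cost-sensitive classification is proposed. The average misclassification cost of the algorithm is the sum of the cost of falsely detected labels and the cost of missed labels. The procedure resembles adaptive boosting (AdaBoost): it automatically learns multiple weak classifiers and combines them into a strong classifier whose average misclassification cost decreases gradually as weak classifiers are added. The differences between the proposed algorithm and the multi-class cost-sensitive AdaBoost algorithm are analyzed in detail, including the basis for outputting labels and the meaning of the misclassification cost. Unlike in the usual multi-class cost-sensitive classification problem, the misclassification costs in the multi-label cost-sensitive problem are subject to certain restrictions, which are analyzed in detail and stated explicitly. Simplifying the algorithm yields a multi-label AdaBoost algorithm and a multi-class cost-sensitive AdaBoost algorithm. Both theoretical analysis and experimental results show that the proposed algorithm is effective and can minimize the average misclassification cost. In particular, for multi-class problems in which the misclassification costs of different classes differ greatly, it performs markedly better than the existing multi-class cost-sensitive AdaBoost algorithm.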
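
The average misclassification cost described above, false-detection cost plus missed-label cost, can be written out as a short sketch. The per-label cost tables and the example label sets are assumptions made for illustration; the paper's exact cost definition and its restrictions are not reproduced here.

```python
def average_multilabel_cost(true_sets, pred_sets, fp_cost, fn_cost):
    """Mean over samples of (falsely detected label cost + missed label cost).

    fp_cost[label]: assumed cost of predicting a label that is absent.
    fn_cost[label]: assumed cost of missing a label that is present.
    """
    total = 0.0
    for truth, pred in zip(true_sets, pred_sets):
        total += sum(fp_cost[label] for label in pred - truth)  # false detections
        total += sum(fn_cost[label] for label in truth - pred)  # missed labels
    return total / len(true_sets)

# Hypothetical costs: missing label "a" is five times as costly as the rest.
fp_cost = {"a": 1.0, "b": 1.0, "c": 2.0}
fn_cost = {"a": 5.0, "b": 1.0, "c": 1.0}

true_sets = [{"a", "b"}, {"c"}, {"a"}, {"b", "c"}]
pred_sets = [{"a"}, {"b", "c"}, {"b"}, {"b", "c"}]

avg_cost = average_multilabel_cost(true_sets, pred_sets, fp_cost, fn_cost)
```

Here the third sample dominates the total because it both misses the expensive label "a" and falsely reports "b"; a cost-sensitive learner would be pushed to avoid exactly that kind of error.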

10.
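Cost-sensitive AdaBoost algorithm for multi-class classification problems   Total citations: 8, self-citations: 2, citations by others: 6
Fu Zhongliang. Acta Automatica Sinica, 2011, 37(8): 973-983
To address the cost-merging problem that arises when multi-class cost-sensitive classification problems are converted into binary cost-sensitive ones, a cost-sensitive AdaBoost algorithm that can be applied directly to multi-class problems is studied and constructed. The algorithm has a procedure and an error estimate similar to those of the real-valued (continuous) AdaBoost algorithm. When all costs are equal, it becomes a new multi-class real-valued AdaBoost algorithm. It guarantees that the training error rate decreases as the number of trained classifiers increases without directly requiring the classifiers to be mutually independent; more precisely, the independence condition can be ensured by the rules of the algorithm, whereas the derivation of existing multi-class real-valued AdaBoost algorithms must assume that the classifiers are mutually independent. Experimental data show that the algorithm genuinely biases classification results toward the class with the smaller misclassification cost. In particular, when the costs of misclassifying each class as the other classes are unbalanced but the average costs are equal, existing multi-class cost-sensitive learning algorithms fail, while the new method still achieves the minimum misclassification cost. The approach offers a new line of thought for further research on ensemble learning algorithms, and yields an easy-to-use AdaBoost algorithm for multi-label classification that approximately minimizes the classification error rate.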

11.
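To address the multi-class imbalance problem, a new method based on the one-versus-one (OVO) decomposition strategy is proposed. First, the multi-class imbalance problem is decomposed into multiple binary classification problems using the OVO strategy; binary classifiers are then built using algorithms designed for imbalanced binary classification; next, the original data set is processed with the SMOTE oversampling technique; the redundant classifiers are then handled by a distance-based relative competence weighting method; finally, the output is obtained by weighted voting. Extensive experiments on the KEEL imbalanced data sets show that the proposed algorithm has a significant advantage over other classical methods.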
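
The final weighted-voting step of an OVO scheme can be sketched as below. This covers only the aggregation: the pairwise confidences and the per-classifier weights are hypothetical inputs, and the weights merely stand in for the distance-based relative competence weights named in the abstract, whose computation is not shown.

```python
def ovo_weighted_vote(pair_scores, pair_weights=None):
    """Aggregate pairwise outputs into per-class votes.

    pair_scores[(i, j)]: confidence in [0, 1] that the sample is class i
    rather than class j, from the binary classifier trained on (i, j).
    pair_weights[(i, j)]: assumed competence weight of that classifier;
    all weights are 1.0 when omitted (plain soft voting).
    """
    votes = {}
    for (i, j), p_i in pair_scores.items():
        w = 1.0 if pair_weights is None else pair_weights[(i, j)]
        votes[i] = votes.get(i, 0.0) + w * p_i
        votes[j] = votes.get(j, 0.0) + w * (1.0 - p_i)
    return max(votes, key=votes.get), votes

# Hypothetical outputs of the three pairwise classifiers for one sample.
pair_scores = {(0, 1): 0.6, (0, 2): 0.4, (1, 2): 0.9}

plain_pred, _ = ovo_weighted_vote(pair_scores)

# Down-weighting the (1, 2) classifier, e.g. because the sample lies far
# from both of its classes, changes the outcome.
pair_weights = {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 0.1}
weighted_pred, _ = ovo_weighted_vote(pair_scores, pair_weights)
```

A classifier trained on classes 1 and 2 still emits a confident answer for a sample of class 0, and that answer carries no information; reducing its weight is one way to keep such redundant classifiers from dominating the vote.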

12.
In recent years, heuristic algorithms have been successfully applied to solve clustering and classification problems. In this paper, the gravitational search algorithm (GSA), one of the newest swarm-based heuristic algorithms, is used to provide a prototype classifier for the classification of instances in multi-class data sets. The proposed method employs GSA as a global searcher to find the best positions of the representatives (prototypes). The proposed GSA-based classifier is applied to some well-known benchmark data sets. Its performance is compared with the artificial bee colony (ABC), the particle swarm optimization (PSO), and nine other classifiers from the literature. The experimental results on twelve data sets from the UCI machine learning repository confirm that the GSA can successfully be applied as a classifier to classification problems.

13.
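Feature selection is an important text preprocessing technique in text classification that can effectively improve classifier accuracy and efficiency. The key to feature selection in text classification is finding an effective feature evaluation metric. In general, the same metric performs differently with different classifiers, so a good feature evaluation metric should take the characteristics of the classifier into account. Because the naive Bayes classifier is simple, efficient, and highly sensitive to feature selection, research on feature selection methods for this classifier is of considerable importance. Accordingly, an effective multi-class text feature evaluation metric for the Bayes classifier, CDM, is proposed. Experiments were carried out with a Bayes classifier on two multi-class text data sets, and the results show that the proposed CDM metric selects features better than other feature evaluation metrics.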

14.
15.
This paper aims at automatic classification of power quality events using the Wavelet Packet Transform (WPT) and Support Vector Machines (SVM). The features of the disturbance signals are extracted using WPT and given to the SVM for effective classification. Recent literature dealing with power quality establishes that support vector machine methods generally outperform traditional statistical and neural methods in classification problems involving power disturbance signals. However, two vital issues, namely the determination of the most appropriate feature subset and the model selection, could, if suitably addressed, pave the way for further improvement in classification accuracy and computation time. This paper addresses these issues through a classification system using two optimization techniques, genetic algorithms and simulated annealing. The system detects the best discriminative features and estimates the best SVM kernel parameters in a fully automatic way. The effectiveness of the proposed detection method is shown in comparison with conventional parameter optimization methods discussed in the literature, such as grid search, with neural classifiers such as the Probabilistic Neural Network (PNN), and with the fuzzy k-nearest neighbor classifier (FkNN); the proposed method proves reliable, as it produces consistently better results.

16.
In this paper, a classifier motivated by statistical learning theory, i.e., the support vector machine, with a new approach based on a multiclass directed acyclic graph, is proposed for classification of four types of electrocardiogram signals. The motivation for selecting the Directed Acyclic Graph Support Vector Machine (DAGSVM) is to obtain a more accurate classifier at lower computational cost. Empirical mode decomposition and subsequently singular value decomposition are used for computing the feature vector matrix. Further, fivefold cross-validation and particle swarm optimization are used for optimal selection of SVM model parameters to improve the performance of DAGSVM. A comparison is made between the proposed algorithm and two other classifiers, i.e., K-Nearest Neighbor (KNN) and Artificial Neural Network (ANN). The DAGSVM yielded an average accuracy of 98.96% against 95.83% and 96.66% for the KNN and the ANN, respectively. The results obtained clearly confirm the superiority of the DAGSVM approach over the other classifiers.
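
The lower computational cost of a decision DAG comes from evaluating only k - 1 pairwise classifiers per sample instead of all k(k - 1)/2. The sketch below shows the elimination walk only; the pairwise decider is a hypothetical nearest-centre rule standing in for trained binary SVMs, and the class names and centres are invented.

```python
def dag_predict(classes, prefers_first, x):
    """Walk a decision DAG: compare the first and last remaining classes,
    drop the loser, and repeat until one class is left."""
    remaining = list(classes)
    evaluations = 0
    while len(remaining) > 1:
        a, b = remaining[0], remaining[-1]
        if prefers_first(a, b, x):
            remaining.pop()      # class b is eliminated
        else:
            remaining.pop(0)     # class a is eliminated
        evaluations += 1
    return remaining[0], evaluations

# Hypothetical stand-in for the pairwise SVMs: pick the nearer class centre.
centers = {"N": 0.0, "A": 1.0, "B": 2.0, "C": 3.0}

def nearer_center(a, b, x):
    return abs(x - centers[a]) <= abs(x - centers[b])

label, used = dag_predict(["N", "A", "B", "C"], nearer_center, 2.2)
```

With four classes, as in the four signal types above, a one-versus-one scheme would train six pairwise classifiers, but the walk consults only three of them for any given sample.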

17.
The vulnerabilities in the communication (TCP/IP) protocol stack and the availability of more sophisticated attack tools encourage more and more hackers to attack networks, intentionally or unintentionally, leading to Distributed Denial of Service (DDoS) attacks. DDoS attacks can be detected using existing machine learning techniques such as neural classifiers. These classifiers lack generalization capabilities, which results in poorer performance and high false positive rates. This paper evaluates the performance of a comprehensive set of machine learning algorithms for selecting the base classifier using the publicly available KDD Cup dataset. Based on the outcome of the experiments, Resilient Back Propagation (RBP) was chosen as the base classifier for our research. Improving the performance of the RBP classifier is the focus of this paper. Our proposed classification algorithm, RBPBoost, combines an ensemble of classifier outputs with a Neyman-Pearson cost minimization strategy for the final classification decision. Publicly available datasets such as KDD Cup, DARPA 1999, DARPA 2000, and CONFICKER were used for the simulation experiments. RBPBoost was trained and tested with DARPA, CONFICKER, and our own lab datasets. Detection accuracy and cost per sample were the two metrics evaluated to analyze the performance of the RBPBoost classification algorithm. The simulation results show that RBPBoost achieves high detection accuracy (99.4%) with fewer false alarms and outperforms the existing ensemble algorithms, with a maximum gain of 6.6% and a minimum gain of 0.8%.

18.
In this work we present the first efficient algorithm for unsupervised training of multi-class regularized least-squares classifiers. The approach is closely related to the unsupervised extension of the support vector machine classifier known as maximum margin clustering, which recently has received considerable attention, though mostly considering the binary classification case. We present a combinatorial search scheme that combines steepest descent strategies with powerful meta-heuristics for avoiding bad local optima. The regularized least-squares based formulation of the problem allows us to use matrix algebraic optimization, enabling constant-time checks for the intermediate candidate solutions during the search. Our experimental evaluation indicates the potential of the novel method and demonstrates its superior clustering performance over a variety of competing methods on real-world datasets. Both time complexity analysis and experimental comparisons show that the method can scale well to practical-sized problems.

19.
Group decision making is a multi-criteria decision-making method applied in many fields. However, the use of group decision-making techniques in multi-class classification problems and rule generation has not been explored widely. This investigation developed a group decision classifier with particle swarm optimization (PSO) and decision tree (GDCPSODT) for analyzing students' mathematical and scientific achievement, which is a multi-class classification problem involving rule generation. The PSO technique is employed to determine the weights of condition attributes; the decision tree is used to generate rules. To demonstrate the performance of the developed GDCPSODT model, other classifiers such as the Bayesian classifier, the k-nearest neighbor (KNN) classifier, the back propagation neural network classifier with particle swarm optimization (BPNNPSO) and the radial basis function neural network classifier with PSO (RBFNNPSO) are applied to the same data. Experimental results indicated that the testing accuracy of GDCPSODT is higher than that of the other four classifiers. Furthermore, rules and some directions for improving academic achievement are provided by the GDCPSODT model. Therefore, the GDCPSODT model is a feasible and promising alternative for analyzing student-related mathematical and scientific achievement data.

20.
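Shadow is a key factor affecting the accuracy of remote sensing identification of mountain coniferous forests. Taking an area of about 10 000 km² in the Tianshan Mountains as a case study, and using two Sentinel-2 images acquired with markedly different solar elevation and azimuth angles, a comprehensive analysis is carried out from three aspects: the temporal characteristics of shadow distribution in the remote sensing data, the classification features, and the choice of classifier. On this basis, an integrated remote sensing classification scheme suited to the mountain coniferous forests of the Tianshan Mountains is proposed. The scheme first performs shadow identification and shadow reclassification to eliminate the influence of shadow on coniferous forest identification; it then selects elevation, the normalized difference vegetation index (NDVI), the slope from the red to the near-infrared band, the blue band, the red band, the shortwave infrared band, and terrain slope as the important features for distinguishing Tianshan mountain coniferous forest; finally, it compares the classification performance of three classifiers: Support Vector Machine (SVM), Random Forest (RF), and Back Propagation Neural Network (BPNN). The results show that topographic correction not only has little effect in removing mountain shadow but also causes over-correction, which impairs subsequent coniferous forest identification; by contrast, performing shadow identification and shadow reclassification with two images of markedly different solar elevation and azimuth angles raises the overall accuracy for coniferous forest by 1.3% to 3.7%. All three classifiers achieve good identification accuracy for mountain coniferous forest, but SVM is the most accurate, with an overall classification accuracy of 93.33% and a Kappa coefficient of 0.87. With parameter adjustment, the scheme is expected to be applicable to other mountain coniferous forest areas in northern arid and semi-arid regions.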
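
Of the features listed above, NDVI has a standard closed form, (NIR - Red) / (NIR + Red), which for Sentinel-2 is usually computed from bands B8 and B4. The sketch below is illustrative only: the reflectance values are invented, and returning 0.0 for an all-zero pixel is an assumption of this example rather than part of the study.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index for one pixel.

    nir, red: surface reflectance in the near-infrared and red bands
    (Sentinel-2 B8 and B4). Returns 0.0 when both are zero, which is an
    assumed convention for empty pixels.
    """
    denom = nir + red
    if denom == 0:
        return 0.0
    return (nir - red) / denom

# Hypothetical reflectance pairs (NIR, red) for three kinds of pixel.
forest_pixel = ndvi(0.5, 0.1)     # dense vegetation: strongly positive
bare_pixel = ndvi(0.3, 0.3)       # no contrast between bands: zero
shadow_pixel = ndvi(0.02, 0.015)  # dark pixel: ratio is noise-sensitive
```

Because the index is a ratio, very dark shadow pixels can yield unstable values, which is consistent with the scheme's choice to identify and reclassify shadow before relying on spectral features.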
