Similar Documents
20 similar documents were retrieved.
1.
An ensemble learning algorithm for multi-label cost-sensitive classification   Total citations: 12 (self-citations: 2, others: 10)
付忠良 《自动化学报》2014,40(6):1075-1085
Although multi-label classification problems can be converted into ordinary multi-class classification problems, multi-label cost-sensitive classification is difficult to convert into multi-class cost-sensitive classification. By analyzing the issues that arise when extending multi-class cost-sensitive learning algorithms to the multi-label setting, this paper proposes an ensemble learning algorithm for multi-label cost-sensitive classification. The algorithm's average misclassification cost is the sum of the cost of falsely detected labels and the cost of missed labels, and its procedure resembles Adaptive Boosting (AdaBoost): it automatically learns multiple weak classifiers and combines them into a strong classifier whose average misclassification cost decreases as weak classifiers are added. The differences between the proposed algorithm and multi-class cost-sensitive AdaBoost are analyzed in detail, including the basis for outputting labels and the meaning of the misclassification costs. Unlike ordinary multi-class cost-sensitive classification, the misclassification costs in multi-label cost-sensitive classification must satisfy certain constraints, which are analyzed and stated explicitly. Simplifying the algorithm yields a multi-label AdaBoost algorithm and a multi-class cost-sensitive AdaBoost algorithm. Both theoretical analysis and experimental results show that the proposed ensemble algorithm is effective and can minimize the average misclassification cost. In particular, for multi-class problems in which the misclassification costs of different classes differ greatly, it clearly outperforms existing multi-class cost-sensitive AdaBoost algorithms.
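To make the cost definition above concrete, the following sketch (in Python, with invented label sets and per-label costs, not values from the paper) computes an average misclassification cost as the sum of false-detection and missed-detection label costs.

```python
import numpy as np

# Hypothetical per-label costs: cost_fp[j] is charged when label j is
# predicted but absent; cost_fn[j] when label j is present but missed.
cost_fp = np.array([1.0, 2.0, 0.5])
cost_fn = np.array([3.0, 1.0, 4.0])

# Binary indicator matrices (samples x labels): ground truth and predictions.
Y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 0]])
Y_pred = np.array([[1, 1, 0],
                   [0, 1, 0],
                   [0, 1, 1]])

false_detect = ((Y_pred == 1) & (Y_true == 0)).astype(float)  # labels wrongly asserted
missed       = ((Y_pred == 0) & (Y_true == 1)).astype(float)  # labels wrongly dropped

# Per-sample cost is the sum of the two kinds of label errors;
# the average over samples is the quantity the ensemble tries to drive down.
per_sample_cost = false_detect @ cost_fp + missed @ cost_fn
print("average misclassification cost:", per_sample_cost.mean())
```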

2.
A cost-sensitive AdaBoost algorithm for multi-class classification problems   Total citations: 8 (self-citations: 2, others: 6)
付忠良 《自动化学报》2011,37(8):973-983
To address the cost-merging problem that arises when multi-class cost-sensitive classification is converted into binary cost-sensitive classification, this paper studies and constructs a cost-sensitive AdaBoost algorithm that can be applied directly to multi-class problems. The algorithm has a procedure and error estimate similar to real (continuous) AdaBoost. When all costs are equal, it becomes a new multi-class real AdaBoost algorithm that guarantees the training error decreases as more classifiers are trained, without directly requiring the individual classifiers to be mutually independent; the independence condition can instead be ensured by the algorithm's rules, whereas the derivation of existing multi-class real AdaBoost algorithms requires mutual independence. Experimental data show that the algorithm genuinely biases classification results toward classes with smaller misclassification costs. In particular, when the costs of misclassifying each class into other classes are unbalanced but the average costs are equal, existing multi-class cost-sensitive learning algorithms fail, while the new method still achieves the minimum misclassification cost. The approach offers a new line of thought for further research on ensemble learning and yields an easy-to-implement AdaBoost algorithm for multi-label classification that approximately minimizes the classification error rate.
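The paper's update rule is not reproduced here; as a rough, generic illustration of biasing boosting toward classes that are expensive to misclassify, the sketch below feeds cost-derived sample weights into a standard scikit-learn AdaBoost run. The cost matrix and the weighting heuristic are assumptions, not the author's algorithm.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.datasets import make_classification

# Hypothetical cost matrix C[i, j]: cost of predicting class j when the true
# class is i (diagonal is zero).  Row sums give a per-class "cost of being wrong".
C = np.array([[0.0, 1.0, 5.0],
              [1.0, 0.0, 1.0],
              [2.0, 1.0, 0.0]])
class_cost = C.sum(axis=1)

X, y = make_classification(n_samples=600, n_classes=3, n_informative=6,
                           random_state=0)

# Heuristic: start boosting from sample weights proportional to the cost of
# misclassifying each sample's true class, so costly classes get more attention.
w = class_cost[y]
w = w / w.sum()

clf = AdaBoostClassifier(n_estimators=100, random_state=0)  # default stump base learner
clf.fit(X, y, sample_weight=w)
print("training accuracy:", clf.score(X, y))
```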

3.
郭冰楠  吴广潮 《计算机应用》2019,39(10):2888-2892
In online-loan user datasets, the numbers of users with successful and failed loans are severely imbalanced, and traditional machine learning algorithms focus on overall classification accuracy when handling such problems, which leads to low prediction accuracy for users with successful loans. To address this, the class distribution is incorporated into the computation of the cost-sensitive decision tree's sensitivity function to weaken the influence of the numbers of positive and negative samples on the misclassification cost, yielding an improved cost-sensitive decision tree. This tree serves as the base classifier; base classifiers that perform well are selected using classification accuracy as the criterion and are combined with the classifier generated in the final stage to obtain the final ensemble. Experimental results show that, compared with algorithms commonly used for this type of problem (such as MetaCost, cost-sensitive decision trees, and AdaCost), the improved cost-sensitive decision tree reduces the overall misclassification error rate for classifying online-loan users and generalizes better.
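A loose approximation of the general idea, folding an assumed cost ratio and the empirical class frequencies into the tree's class weights and then aggregating several such trees, might look as follows; the cost values, depths, and data are all illustrative and this is not the paper's sensitivity function.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification

rng = np.random.default_rng(0)

# Imbalanced two-class data standing in for loan-failure (0) vs. loan-success (1) users.
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)

# Assumed misclassification costs (illustrative only).
cost = {0: 1.0, 1: 10.0}
freq = np.bincount(y) / len(y)
# Fold the class distribution into the cost so weights reflect frequencies, not raw counts.
class_weight = {c: cost[c] / freq[c] for c in cost}

# Train several cost-sensitive trees on bootstrap samples and keep them all
# (the paper additionally filters base classifiers by accuracy).
trees = []
for _ in range(15):
    idx = rng.integers(0, len(X), len(X))
    t = DecisionTreeClassifier(class_weight=class_weight, max_depth=6)
    t.fit(X[idx], y[idx])
    trees.append(t)

# Majority vote of the ensemble.
votes = np.mean([t.predict(X) for t in trees], axis=0)
pred = (votes >= 0.5).astype(int)
print("minority-class recall:", pred[y == 1].mean())
```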

4.
Cost-sensitive learning with conditional Markov networks   Total citations: 1 (self-citations: 0, others: 1)
There has been a recent, growing interest in classification and link prediction in structured domains. Methods such as conditional random fields and relational Markov networks support flexible mechanisms for modeling correlations due to the link structure. In addition, in many structured domains, there is an interesting structure in the risk or cost function associated with different misclassifications. There is a rich tradition of cost-sensitive learning applied to unstructured (IID) data. Here we propose a general framework which can capture correlations in the link structure and handle structured cost functions. We present two new cost-sensitive structured classifiers based on maximum entropy principles. The first determines the cost-sensitive classification by minimizing the expected cost of misclassification. The second directly determines the cost-sensitive classification without going through a probability estimation step. We contrast these approaches with an approach that employs a standard 0/1-loss structured classifier to estimate class-conditional probabilities, followed by minimization of the expected cost of misclassification, and with a cost-sensitive IID classifier that does not utilize the correlations present in the link structure. We demonstrate the utility of our cost-sensitive structured classifiers with experiments on both synthetic and real-world data.
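The first of the two proposed classifiers chooses the label that minimizes the expected misclassification cost given estimated class probabilities. A minimal, library-free illustration of that decision rule (with invented probabilities and costs, and without any Markov-network structure) is:

```python
import numpy as np

# C[i, j]: cost of predicting class j when the true class is i (assumed values).
C = np.array([[0.0, 1.0, 4.0],
              [2.0, 0.0, 1.0],
              [6.0, 3.0, 0.0]])

# Estimated class-conditional probabilities P(true class | x) for a few inputs.
P = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.3, 0.6],
              [0.4, 0.4, 0.2]])

# Expected cost of each possible prediction: E[cost | predict j] = sum_i P(i|x) * C[i, j].
expected_cost = P @ C
pred = expected_cost.argmin(axis=1)
print(pred)               # cost-sensitive labels
print(P.argmax(axis=1))   # compare with plain 0/1-loss (most probable class)
```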

5.
刘丽倩  董东 《计算机科学》2018,45(Z11):497-500
Long Method is a software design problem in which a method is so long that it needs refactoring. To improve the detection rate of traditional machine learning methods for Long Method, and given that code-smell data are imbalanced, a cost-sensitive ensemble classifier algorithm is proposed. Taking the traditional decision tree algorithm as the base learner, samples are resampled with an undersampling strategy to generate multiple balanced subsets; these subsets are used to train multiple base classifiers of the same type, which are then combined into an ensemble classifier. Finally, a misclassification cost determined by cognitive complexity is introduced into the ensemble so that the classifier is biased toward accurately classifying the minority class. Compared with traditional machine learning algorithms, this method improves both the precision and the recall of Long Method detection.
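A rough sketch of the resampling-plus-voting scheme described above, undersampling the majority class into several balanced subsets and training one base tree per subset, is given below on synthetic data; the cognitive-complexity-based cost term is omitted.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification

rng = np.random.default_rng(0)
# Imbalanced data standing in for code-smell (minority) vs. clean (majority) methods.
X, y = make_classification(n_samples=1500, weights=[0.9, 0.1], random_state=1)

minority = np.where(y == 1)[0]
majority = np.where(y == 0)[0]

# Build several balanced subsets: all minority samples plus an equal-sized
# random draw from the majority class, and train one base tree per subset.
trees = []
for _ in range(10):
    sub = np.concatenate([minority,
                          rng.choice(majority, size=len(minority), replace=False)])
    t = DecisionTreeClassifier(max_depth=5).fit(X[sub], y[sub])
    trees.append(t)

# Combine base classifiers by (soft) majority vote.
votes = np.mean([t.predict(X) for t in trees], axis=0)
pred = (votes >= 0.5).astype(int)
print("recall on the minority (long-method) class:", pred[y == 1].mean())
```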

6.
Imbalanced classification is a common challenge in practical classification problems built on traditional models. Because traditional classifiers have difficulty learning the intrinsic structure within the minority class, they tend to favor the majority class, so minority-class samples are misclassified as majority-class samples. At the same time, redundant and noisy data in the sample set also confuse the classifier. To handle these problems effectively, a new imbalanced classification framework, SSIC, is proposed. The framework fully exploits the statistical properties of the data, adaptively selects valuable samples from both the majority and minority classes, and combines cost-sensitive learning to build a classifier for imbalanced data. First, SSIC constructs several balanced data subsets by combining part of the majority-class instances with all minority-class instances. On each subset, SSIC makes full use of the data's characteristics to extract discriminative high-level features and adaptively select important samples, thereby removing redundant and noisy data. Second, SSIC introduces a cost-sensitive support vector machine (SVM) that automatically assigns an appropriate weight to each sample so that the minority class is treated as equal to the majority class.
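SSIC's final step, an SVM with automatically assigned per-sample weights, can be approximated with scikit-learn's sample_weight mechanism; the inverse-frequency weighting below is an assumption, not SSIC's actual rule.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

# Assumed weighting: each sample weighted by the inverse frequency of its class,
# so the minority class contributes as much total weight as the majority class.
freq = np.bincount(y) / len(y)
sample_weight = 1.0 / freq[y]

clf = SVC(kernel="rbf", gamma="scale")
clf.fit(X, y, sample_weight=sample_weight)
print("minority-class recall:", clf.predict(X)[y == 1].mean())
```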

7.
Machine learning techniques are widely used for network intrusion detection (NID). However, they must cope with the imbalance of training samples across classes, since it is hard to collect samples of some intrusion classes; this produces false positives for those classes. Meanwhile, because there are many types of intrusions, the classification boundaries between classes are strongly nonlinear. Given the huge amount of training data, computational efficiency is also required. This paper therefore proposes an efficient cascaded classifier for NID. The classifier consists of a collection of binary base classifiers that are serially connected, each corresponding to one type of intrusion. The order of the base classifiers is determined automatically from the number of false positives to cope with the imbalance of training samples. The extreme learning machine algorithm, which has low computational cost, is used to train the base classifiers to delineate the nonlinear boundaries between classes. The proposed NID method is evaluated on the KDD99 data set. Experimental results show that it outperforms other state-of-the-art methods, including decision trees, back-propagation neural networks, and support vector machines.
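A much-reduced sketch of the cascade idea follows: one binary classifier per class, stages ordered by their false-positive count on held-out data, and each sample passed down the chain until a stage fires. Logistic regression stands in for the extreme learning machine and the data are synthetic.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1500, n_classes=4, n_informative=8, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# One binary (one-vs-rest) classifier per class.
stages = []
for c in np.unique(y_tr):
    clf = LogisticRegression(max_iter=1000).fit(X_tr, (y_tr == c).astype(int))
    fp = np.sum((clf.predict(X_val) == 1) & (y_val != c))  # false positives on validation
    stages.append((fp, c, clf))
stages.sort()  # classifiers with fewer false positives fire first

def cascade_predict(x_row):
    """Walk the chain; the first stage that fires decides the class."""
    for _, c, clf in stages:
        if clf.predict(x_row.reshape(1, -1))[0] == 1:
            return c
    return stages[-1][1]  # fallback: class of the last stage

preds = np.array([cascade_predict(x) for x in X_val])
print("cascade accuracy:", (preds == y_val).mean())
```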

8.
This paper presents a cluster-based ensemble classifier, an approach toward generating an ensemble of classifiers using multiple clusters within classified data. Clustering is incorporated to partition the data set into multiple clusters of highly correlated data that are difficult to separate otherwise, and different base classifiers are used to learn class boundaries within the clusters. As the different base classifiers engage on different difficult-to-classify subsets of the data, the learning of the base classifiers is more focused and accurate. A selection rather than fusion approach achieves the final verdict on patterns of unknown classes. The impact of clustering on the learning parameters and accuracy of a number of learning algorithms, including neural network, support vector machine, decision tree and k-NN classifiers, is investigated. A number of benchmark data sets from the UCI machine learning repository were used to evaluate the cluster-based ensemble classifier, and the experimental results demonstrate its superiority over bagging and boosting.
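A compact sketch of the cluster-then-classify idea: partition the data with k-means, train one base classifier per cluster, and at prediction time select the classifier of the nearest cluster instead of fusing outputs. The number of clusters, the base learner, and the data are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1200, n_classes=3, n_informative=6, random_state=0)

k = 4
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)

# One base classifier per cluster, trained only on that cluster's (more
# homogeneous, harder-to-separate) subset of the data.
experts = {}
for c in range(k):
    mask = km.labels_ == c
    experts[c] = DecisionTreeClassifier(max_depth=5).fit(X[mask], y[mask])

# Selection (not fusion): route each point to the classifier of its nearest cluster.
def predict(X_new):
    cluster = km.predict(X_new)
    return np.array([experts[c].predict(x.reshape(1, -1))[0]
                     for c, x in zip(cluster, X_new)])

print("training accuracy of the cluster-based ensemble:", (predict(X) == y).mean())
```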

9.
师彦文  王宏杰 《计算机科学》2017,44(Z11):98-101
To classify imbalanced datasets effectively, a classifier combining cost-sensitive learning with the random forest algorithm is proposed. First, a new impurity measure is introduced that considers not only the total cost of the decision tree but also the cost differences between samples at the same node. Next, the random forest procedure samples the dataset K times to build K base classifiers. Decision trees are then grown with the classification and regression tree (CART) algorithm using the proposed impurity measure, forming a forest, and the forest makes its final classification decisions by voting. Experiments on UCI datasets show that, compared with the conventional random forest and an existing cost-sensitive random forest, the proposed classifier performs well on all three evaluation measures: classification accuracy, AUC, and the Kappa coefficient.
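The paper's impurity measure is not given in the abstract; purely to illustrate the general idea of folding per-class misclassification costs into a node impurity, here is one conventional cost-weighted Gini variant (the cost values are invented and this is not the authors' formula).

```python
import numpy as np

def cost_weighted_gini(labels, cost):
    """Gini-style impurity where each class's probability is scaled by the cost
    of misclassifying that class, then renormalized (illustrative variant)."""
    classes, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    w = np.array([cost[c] for c in classes]) * p
    w = w / w.sum()
    return 1.0 - np.sum(w ** 2)

# Hypothetical node with a 90/10 class split and a 1:8 cost ratio.
node_labels = np.array([0] * 90 + [1] * 10)
print(cost_weighted_gini(node_labels, cost={0: 1.0, 1: 8.0}))   # ~0.50: node looks impure
print(cost_weighted_gini(node_labels, cost={0: 1.0, 1: 1.0}))   # plain Gini ~0.18
```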

10.
This study aims to evaluate the effect of heart rate variability (HRV) indices on the New York Heart Association (NYHA) classification of patients with congestive heart failure and to test the effectiveness of different machine learning algorithms. Twenty-nine long-term RR interval recordings from subjects (aged 34 to 79) with congestive heart failure (NYHA classes I, II, and III) in the MIT-BIH Database were studied. We first removed unreasonable RR intervals and segmented the recordings with a 300-RR-interval window. Multiple HRV indices were then calculated for each RR segment. Support vector machine (SVM) and classification and regression tree (CART) methods were used separately to distinguish patients of different NYHA classes based on the selected HRV indices. Receiver operating characteristic (ROC) curve analysis was employed as the evaluation indicator to compare the performance of the two classifiers. The SVM classifier achieved accuracy, sensitivity, and specificity of 84.0%, 71.2%, and 83.4%, respectively, whereas the CART classifier achieved 81.4%, 66.5%, and 81.6%. The area under the ROC curve was 86.4% and 84.7% for the two classifiers, respectively. It is thus possible to accurately classify NYHA functional classes I, II, and III by combining HRV indices with machine learning algorithms, and the SVM classifier performed better than the CART classifier using the same HRV indices.

11.
Multiple classifier systems (MCS) are attracting increasing interest in the field of pattern recognition and machine learning. Recently, MCS have also been introduced in the remote sensing field, where the importance of classifier diversity for image classification problems has not been examined. In this article, Satellite Pour l'Observation de la Terre (SPOT) IV panchromatic and multispectral satellite images are classified into six land cover classes using five base classifiers: contextual classifier, k-nearest neighbour classifier, Mahalanobis classifier, maximum likelihood classifier and minimum distance classifier. The five base classifiers are trained with the same feature sets throughout the experiments, and a posteriori probabilities, derived from the confusion matrices of these base classifiers, are applied to five Bayesian decision rules (product rule, sum rule, maximum rule, minimum rule and median rule) to construct different combinations of classifier ensembles. The performance of these classifier ensembles is evaluated in terms of overall accuracy and kappa statistics. Three statistical tests, McNemar's test, Cochran's Q test and Looney's F-test, are used to examine the diversity of the classification results of the base classifiers compared with the results of the classifier ensembles. The experimental comparison reveals that (a) significant diversity amongst the base classifiers cannot enhance the performance of classifier ensembles; (b) accuracy improvement of classifier ensembles can only be found by using base classifiers with similar and low accuracy; (c) increasing the number of base classifiers cannot improve the overall accuracy of the MCS; and (d) none of the Bayesian decision rules outperforms the others.
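The five fixed combination rules named above operate directly on the base classifiers' posterior estimates; a minimal numpy illustration with invented posteriors from three classifiers over four classes:

```python
import numpy as np

# posteriors[k, c]: class-c probability estimated by base classifier k for one pixel.
posteriors = np.array([[0.50, 0.20, 0.20, 0.10],
                       [0.40, 0.30, 0.20, 0.10],
                       [0.10, 0.60, 0.20, 0.10]])

rules = {
    "product": posteriors.prod(axis=0),
    "sum":     posteriors.sum(axis=0),
    "maximum": posteriors.max(axis=0),
    "minimum": posteriors.min(axis=0),
    "median":  np.median(posteriors, axis=0),
}

# Each rule scores the classes differently; the ensemble label is the argmax.
for name, scores in rules.items():
    print(f"{name:8s} rule picks class {scores.argmax()}")
```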

12.
Multi-class classification problems can be addressed with a decomposition strategy. One of the most popular decomposition techniques is the One-vs-One (OVO) strategy, which divides a multi-class classification problem into one easier-to-solve binary sub-problem for each pair of classes. To account for classes with different costs, this paper examines the behavior of an ensemble of Cost-Sensitive Back-Propagation Neural Networks (CSBPNN) with OVO binarization for multi-class problems. To implement this, the original multi-class cost-sensitive problem is decomposed into one sub-problem per pair of classes, and each sub-problem is learnt independently using CSBPNN. A combination method is then used to aggregate the binary cost-sensitive classifiers. To verify the synergy of the binarization technique and CSBPNN for multi-class cost-sensitive problems, we carry out a thorough experimental study. Specifically, we first check the effectiveness of the OVO strategy for multi-class cost-sensitive learning problems. We then compare several well-known aggregation strategies in our scenario. Finally, we explore whether further improvement can be achieved by managing non-competent classifiers. The experimental study is performed with three types of cost matrices, and proper statistical analysis is employed to extract the meaningful findings.
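A stripped-down sketch of the OVO decomposition with per-class cost weights follows; logistic regression stands in for the cost-sensitive back-propagation network and plain voting stands in for the aggregation strategies compared in the paper, with all costs illustrative.

```python
import numpy as np
from itertools import combinations
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=900, n_classes=3, n_informative=6, random_state=0)
cost = {0: 1.0, 1: 5.0, 2: 2.0}   # assumed per-class misclassification costs

# One binary sub-problem per pair of classes, trained only on those two classes,
# with samples weighted by the cost of their own class.
pairwise = []
for a, b in combinations(np.unique(y), 2):
    mask = np.isin(y, [a, b])
    w = np.array([cost[c] for c in y[mask]])
    clf = LogisticRegression(max_iter=1000).fit(X[mask], y[mask], sample_weight=w)
    pairwise.append(clf)

# Aggregate by simple voting over the pairwise decisions.
def predict(X_new):
    votes = np.stack([clf.predict(X_new) for clf in pairwise])  # (n_pairs, n_samples)
    return np.array([np.bincount(col, minlength=3).argmax() for col in votes.T])

print("training accuracy:", (predict(X) == y).mean())
```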

13.
Over the last two decades, automatic speaker recognition has been an interesting and challenging problem for speech researchers. It can be divided into two categories, speaker identification and speaker verification. In this paper, a newer classifier, the extreme learning machine, is examined on the text-independent speaker verification task and compared with an SVM classifier. Extreme learning machine (ELM) classifiers have been proposed for generalized single-hidden-layer feedforward networks with a wide variety of hidden nodes. They are extremely fast to train and perform well on many artificial and real regression and classification applications. The database used to evaluate the ELM and SVM classifiers is the ELSDSR corpus, and Mel-frequency cepstral coefficients were extracted and used as the input to the classifiers. Empirical studies show that ELM classifiers and their variants can outperform SVM classifiers on this dataset while requiring less training time.
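The core of an extreme learning machine is a single hidden layer whose input weights are drawn at random, with only the output weights solved in closed form by least squares. A minimal sketch on synthetic two-class data (the ELSDSR corpus and MFCC extraction are not reproduced) is:

```python
import numpy as np
from sklearn.datasets import make_classification

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=800, n_features=13, n_informative=8, random_state=0)

n_hidden = 200
# Random input weights and biases are never trained; only the output layer is solved for.
W = rng.normal(size=(X.shape[1], n_hidden))
b = rng.normal(size=n_hidden)

def hidden(X):
    return np.tanh(X @ W + b)          # random feature map

# Closed-form output weights via least squares (pseudo-inverse of the hidden matrix).
H = hidden(X)
targets = 2.0 * y - 1.0                # map {0, 1} labels to {-1, +1}
beta = np.linalg.pinv(H) @ targets

pred = (hidden(X) @ beta > 0).astype(int)
print("ELM training accuracy:", (pred == y).mean())
```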

14.
The management of atmospheric pollution using video is not yet widespread. However, it is an efficient way to evaluate the polluting emissions from large industrial facilities when traditional sensors are not usable. This paper presents a comparison of different classifiers for a monitoring system of polluting smokes. The data used in this work come from a system of video analysis and signal processing. The database includes the pollution level of puffs of smoke as labelled by an expert. Six machine learning techniques are tested and compared for classifying the puffs of smoke: k-nearest neighbour, naïve Bayes classifier, artificial neural network, decision tree, support vector machine and a fuzzy model. The parameters of each type of classifier are split into three categories: learned parameters, parameters determined by a first step of the experimentation, and parameters set by the programmer. We compare the results of the best classifier of each type depending on the size of the learning set. Part of the discussion concerns the robustness of the classifiers when classes of interest are under-represented, such as the high pollution level in our data.

15.
Various methods exist for texture image classification, but their descriptions of texture are not comprehensive, and when features extracted by new methods are added, such systems scale and generalize poorly. To address these problems, this paper proposes a decision-level fusion model that combines Dempster-Shafer (D-S) evidence theory with extreme learning machines for texture image classification. Three different methods are used to extract features so as to obtain richer, more complete representations of texture; an extreme learning machine classifier is built for each extracted feature vector; and D-S evidence theory, with its strengths in representing, measuring, and combining uncertainty, is used for decision-level fusion. Because the basic probability assignment function (BPAF) of evidence theory is difficult to obtain effectively, and because the extreme learning machine learns quickly, generalizes well, and yields a unique optimal solution, the ELM is used to construct the basic probability assignment function. Experimental results show that this approach achieves higher recognition accuracy than any single classifier and reduces the uncertainty of recognition.
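Decision-level fusion with D-S evidence theory hinges on Dempster's combination rule; below is a minimal sketch for mass functions restricted to singleton hypotheses, with invented basic probability assignments rather than ones constructed from ELM outputs as in the paper.

```python
import numpy as np

def dempster_combine(m1, m2):
    """Dempster's rule for mass functions defined on singleton hypotheses only:
    mass assigned to conflicting pairs is discarded and the rest renormalized."""
    joint = m1 * m2                     # agreement on each singleton hypothesis
    conflict = 1.0 - joint.sum()        # mass assigned to incompatible pairs
    return joint / (1.0 - conflict)     # assumes the evidence is not totally conflicting

# Invented basic probability assignments from three classifiers over three texture classes.
m_a = np.array([0.6, 0.3, 0.1])
m_b = np.array([0.5, 0.4, 0.1])
m_c = np.array([0.2, 0.7, 0.1])

fused = dempster_combine(dempster_combine(m_a, m_b), m_c)
print("fused masses:", fused, "-> decision:", fused.argmax())
```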

16.
This paper presents a method for combining domain knowledge and machine learning (CDKML) for classifier generation and online adaptation. The method exploits the advantages of domain knowledge and machine learning as complementary information sources. Whereas machine learning may discover patterns in domains of interest that are too subtle for humans to detect, domain knowledge may contain information about a domain that is not present in the available dataset. CDKML has three steps. First, prior domain knowledge is enriched with relevant patterns obtained by machine learning to create an initial classifier. Second, genetic algorithms refine the classifier. Third, the classifier is adapted online on the basis of user feedback using a Markov decision process. CDKML was applied to fall detection. Tests showed that the classifiers developed by CDKML perform better than machine-learning classifiers generated on a training dataset that does not adequately represent all real-life cases of the learned concept. The accuracy of the initial classifier was 10 percentage points higher than that of the best machine-learning classifier, refinement added 3 percentage points, and online adaptation improved the accuracy of the refined classifier by a further 15 percentage points.

17.
In many real-world applications, misclassifying samples of different classes incurs losses of different magnitudes, and these losses can be characterized by non-uniform costs. The goal of cost-sensitive learning is to minimize the total cost. This paper proposes a new cost-sensitive classification method, the cost-sensitive large margin distribution machine (CS-LDM). Unlike traditional large-margin methods that try to maximize the minimum margin, CS-LDM optimizes the margin distribution while minimizing the total cost, and it optimizes its objective function with dual coordinate descent to perform cost-sensitive learning efficiently. Experimental results show that CS-LDM significantly outperforms the cost-sensitive support vector machine (CS-SVM), reducing the average total cost by 24%.

18.
The ability to accurately predict business failure is a very important issue in financial decision-making. Incorrect decisions in financial institutions are very likely to cause financial crises and distress. Bankruptcy prediction and credit scoring are two important problems facing financial decision support. Although many related studies develop financial distress models with machine learning techniques, more advanced techniques such as classifier ensembles and hybrid classifiers have not been fully assessed. The aim of this paper is to develop a novel hybrid financial distress model that combines clustering and classifier ensembles. In addition, single baseline classifiers, hybrid classifiers, and classifier ensembles are developed for comparison. In particular, two clustering techniques, Self-Organizing Maps (SOMs) and k-means, and three classification techniques, logistic regression, multilayer perceptron (MLP) neural networks, and decision trees, are used to develop these four types of bankruptcy prediction models. As a result, 21 different models are compared in terms of average prediction accuracy and Type I and II errors. On five related datasets, combining Self-Organizing Maps with MLP classifier ensembles performs best, providing higher prediction accuracy and lower Type I and II errors.

19.
Using neural network ensembles for bankruptcy prediction and credit scoring   Total citations: 2 (self-citations: 0, others: 2)
Bankruptcy prediction and credit scoring have long been regarded as critical topics and have been studied extensively in the accounting and finance literature. Artificial intelligence and machine learning techniques have been used to solve these financial decision-making problems. The multilayer perceptron (MLP) network trained by the back-propagation learning algorithm is the most widely used technique for financial decision-making problems, and it is usually superior to traditional statistical models. Recent studies suggest that combining multiple classifiers (classifier ensembles) should be better than single classifiers. However, the performance of multiple classifiers in bankruptcy prediction and credit scoring is not fully understood. In this paper, we use neural networks on three datasets to compare a single classifier, taken as the baseline, with multiple classifiers and diversified multiple classifiers. Compared with the single-classifier benchmark in terms of average prediction accuracy, the multiple classifiers perform better on only one of the three datasets. The diversified multiple classifiers, trained not only with different classifier parameters but also with different sets of training data, perform worse on all datasets. For the Type I and Type II errors, however, there is no clear winner. We suggest that it is better to consider all three classifier architectures to make the optimal financial decision.

20.
Cultural modeling aims at developing behavioral models of groups and analyzing the impact of cultural factors on group behavior using computational methods. Machine learning methods, and in particular classification, play a central role in such applications. In modeling cultural data, standard classifiers are expected to yield good performance under the assumption that different classification errors have uniform costs. However, this assumption is often violated in practice, so the performance of standard classifiers is severely hindered. To handle this problem, this paper empirically studies cost-sensitive learning in cultural modeling. We take the cost factor into account when building the classifiers, with the aim of minimizing total misclassification costs. We conduct experiments to investigate four typical cost-sensitive learning methods, combine them with six standard classifiers, and evaluate their performance under various conditions. Our empirical study verifies the effectiveness of cost-sensitive learning in cultural modeling. Based on the experimental results, we gain a thorough insight into the problem of non-uniform misclassification costs, as well as into the selection of cost-sensitive methods, base classifiers, and method-classifier pairs for this domain. Furthermore, we propose an improved algorithm that outperforms the best method-classifier pair on the benchmark cultural datasets.
