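Heart disease classification based on active imbalance multi-class AdaBoost algorithm
Cite this article: WANG Lili, FU Zhongliang, TAO Pan, HU Xin. Heart disease classification based on active imbalance multi-class AdaBoost algorithm[J]. Journal of Computer Applications (计算机应用), 2017, 37(7): 1994-1998.
Authors: WANG Lili (王莉莉)  FU Zhongliang (付忠良)  TAO Pan (陶攀)  HU Xin (胡鑫)
Affiliation: 1. Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu 610041; 2. University of Chinese Academy of Sciences, Beijing 100049
Funding: Sichuan Science and Technology Support Program (2016JZ0035); West Light program of the Chinese Academy of Sciences.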
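Abstract (translated from the Chinese): To address the low recognition rate of minority-class samples in imbalanced classification, an improved imbalanced multi-class AdaBoost algorithm based on active learning is proposed. First, active learning is used to select, through repeated rounds of sampling, a small number of the samples most valuable to the classifier as the training set. Second, a sample selection strategy based on dynamic-margin uncertainty reduces the imbalance of the training set. Finally, the multi-class AdaBoost algorithm is made cost-sensitive: different classes are assigned different misclassification costs, which adjusts how quickly sample weights are updated and forces the weak classifiers to focus on minority-class samples. Experiments on a clinical transthoracic echocardiography (TTE) measurement data set show that, compared with a multi-class support vector machine (SVM), the overall heart disease recognition rate rises by 5.9%, G-mean by 18.2%, the valvular heart disease (VHD) recognition rate by 0.8%, the infective endocarditis (IE, minority class) recognition rate by 12.7%, and the coronary artery disease (CAD, minority class) recognition rate by 79.73%. Compared with SMOTE-Boost, the overall recognition rate rises by 6.11%, G-mean by 0.64%, the VHD recognition rate by 11.07%, and the congenital heart disease (CHD) recognition rate by 3.69%. Results on the TTE data set and four UCI data sets show that, in imbalanced multi-class classification, the algorithm effectively raises the minority-class recognition rate without a large drop in the recognition rate of the other classes, improving overall classifier performance.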
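The abstract reports G-mean alongside overall accuracy but does not spell out its formula. A minimal sketch follows, assuming the usual multi-class definition (the geometric mean of per-class recall); the function name `g_mean` and the toy labels are illustrative and not taken from the paper.

```python
import math
from collections import Counter

def g_mean(y_true, y_pred):
    """Geometric mean of per-class recall over the classes present in y_true.

    Assumed definition: the abstract does not state the exact formula.
    """
    total = Counter(y_true)  # number of samples per true class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    recalls = [correct[c] / total[c] for c in total]
    return math.prod(recalls) ** (1.0 / len(recalls))

# Toy labels (hypothetical): class 1 is a minority class and half of it is missed.
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 0, 1, 0, 2, 2]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = g_mean(y_true, y_pred)
print(f"accuracy={accuracy:.3f}  g_mean={score:.3f}")
```

Because the per-class recalls are multiplied, one poorly recognized minority class pulls G-mean down much more than it pulls accuracy down, which is why the metric suits imbalanced problems.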

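Keywords: active learning; imbalanced classification; multi-class AdaBoost; multi-class classification; heart disease classification
Received: 2017-01-12
Revised: 2017-02-27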

Heart disease classification based on active imbalance multi-class AdaBoost algorithm
WANG Lili, FU Zhongliang, TAO Pan, HU Xin. Heart disease classification based on active imbalance multi-class AdaBoost algorithm[J]. Journal of Computer Applications, 2017, 37(7): 1994-1998.
Authors: WANG Lili  FU Zhongliang  TAO Pan  HU Xin
Affiliation: 1. Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu Sichuan 610041, China; 2. University of Chinese Academy of Sciences, Beijing 100049, China
Abstract: An imbalanced multi-class AdaBoost algorithm with active learning is proposed to improve the recognition accuracy of minority classes in imbalanced classification. First, active learning is used to select the samples most informative for the classifier through multiple iterations of sampling. Second, a new sample selection strategy based on dynamic-margin uncertainty is proposed to reduce data imbalance in the multi-class case. Finally, a cost-sensitive method is used to improve the multi-class AdaBoost algorithm: different classes are assigned different misclassification costs, which adjusts the speed of the sample weight update and forces the weak learners to focus on the minority classes. Experimental results on a clinical TransThoracic Echocardiography (TTE) data set show that, compared with a multi-class Support Vector Machine (SVM), the overall recognition accuracy for heart disease increases by 5.9%, G-mean by 18.2%, the recognition accuracy of Valvular Heart Disease (VHD) by 0.8%, that of Infective Endocarditis (IE, minority class) by 12.7%, and that of Coronary Artery Disease (CAD, minority class) by 79.73%. Compared with SMOTE-Boost, the overall recognition accuracy increases by 6.11%, G-mean by 0.64%, the recognition accuracy of VHD by 11.07%, and that of Congenital Heart Disease (CHD) by 3.67%. Experimental results on the TTE data and 4 UCI data sets show that, in imbalanced multi-class classification, the proposed algorithm effectively improves the recognition accuracy of minority classes and raises overall classifier performance without a dramatic decrease in the recognition accuracy of the other classes.
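The abstract names two mechanisms without giving formulas: margin-based uncertainty sampling and a cost-sensitive weight update. The sketch below illustrates the general idea of each under stated assumptions and is not the authors' algorithm. It uses plain smallest-margin sampling (not the paper's dynamic-margin strategy) and a SAMME-style boosting round in which the per-class cost scales the exponent for misclassified samples. The names `smallest_margin_indices`, `cost_sensitive_update`, and `class_cost`, and all numbers, are hypothetical.

```python
import math

def smallest_margin_indices(probabilities, n_select):
    """Return indices of the n_select samples with the smallest top-two margin.

    A small gap between the two highest class scores means the classifier
    is uncertain, so the sample is a candidate for labeling.
    """
    margins = []
    for i, probs in enumerate(probabilities):
        top = sorted(probs, reverse=True)
        margins.append((top[0] - top[1], i))
    margins.sort()
    return [i for _, i in margins[:n_select]]

def cost_sensitive_update(weights, y_true, y_pred, class_cost, n_classes):
    """One SAMME-style round with a class-dependent cost (assumed form).

    Misclassified samples are multiplied by exp(alpha * cost of their true
    class), so costly (minority) classes gain weight faster.
    """
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p) / sum(weights)
    if not 0.0 < err < 1.0:
        raise ValueError("weighted error must lie strictly between 0 and 1")
    alpha = math.log((1.0 - err) / err) + math.log(n_classes - 1)
    raw = [w * math.exp(alpha * class_cost[t]) if t != p else w
           for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(raw)
    return [w / total for w in raw], alpha

# Hypothetical class scores for three unlabeled samples.
probabilities = [[0.90, 0.05, 0.05], [0.40, 0.35, 0.25], [0.50, 0.30, 0.20]]
picked = smallest_margin_indices(probabilities, n_select=2)

# Hypothetical boosting round: classes 1 and 2 are minority classes with cost 2.
weights = [0.25, 0.25, 0.25, 0.25]
y_true = [0, 0, 1, 2]
y_pred = [0, 1, 0, 2]
class_cost = {0: 1.0, 1: 2.0, 2: 2.0}
new_weights, alpha = cost_sensitive_update(weights, y_true, y_pred, class_cost, n_classes=3)
print(picked, [round(w, 3) for w in new_weights])
```

In this toy round both a majority-class and a minority-class sample are misclassified, and the minority-class one ends up with the larger weight, which is the "focus on the minority classes" effect the abstract describes.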
Keywords: active learning; imbalance classification; multi-class AdaBoost; multi-class classification; heart disease classification