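A Gradual Learning Algorithm for Imbalanced Data
Cite this article: DONG Yuan-fang, LI Xiong-fei, LI Jun. A Gradual Learning Algorithm for Imbalanced Data [J]. Computer Engineering, 2010, 36(24): 161-163.
Authors: DONG Yuan-fang  LI Xiong-fei  LI Jun
Affiliation: (1. Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, Jilin University, Changchun 130012; 2. Changchun University of Science and Technology, a. School of Economics and Management; b. Department of Mathematics, Changchun 130022)
Funding: Supported by the National Key Technology R&D Program of China and the Jilin Province Science and Technology Development Program
Abstract: To address the problem of learning from imbalanced data, a classification algorithm that learns gradually is proposed. Based on the value-range distribution of the attributes, synthetic minority-class examples are added step by step, and whenever the stage classifier misclassifies a synthetic example, that example is promptly deleted. Once the data reach the expected degree of balance, the learning algorithm is trained on the original data together with the synthetic data to obtain the final classifier. Experimental results show that the algorithm outperforms C4.5 and, on most data sets, outperforms SMOTEBoost and DataBoost-IM.

Keywords: classification  imbalanced data  gradual learning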

Gradually Learning Algorithm for Imbalanced Data
DONG Yuan-fang, LI Xiong-fei, LI Jun. Gradually Learning Algorithm for Imbalanced Data [J]. Computer Engineering, 2010, 36(24): 161-163.
Authors:DONG Yuan-fang  LI Xiong-fei  LI Jun
Affiliation: (1. Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, Jilin University, Changchun 130012, China; 2a. School of Economics and Management; 2b. Dept. of Mathematics, Changchun University of Science and Technology, Changchun 130022, China)
Abstract: To address the problem of learning from imbalanced data, a classification algorithm that learns gradually is proposed. The algorithm adds synthetic minority-class examples step by step according to the value-range distribution of the attributes, and promptly removes any synthetic example that the stage classifier misclassifies. Once the data reach the desired degree of balance, the learning algorithm is trained on the original data together with the synthetic data to produce the final classifier. Experimental results show that the algorithm outperforms C4.5 and, on most data sets, outperforms SMOTEBoost and DataBoost-IM.
Keywords:classification  imbalanced data  gradually learning
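
The loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the attribute value-range distribution is used, so this sketch assumes each attribute is drawn uniformly between the minimum and maximum observed in the minority class, and it substitutes a simple nearest-centroid model for the stage and final classifiers. All function names and parameters (`gradual_learn`, `target_ratio`, `batch`, `max_rounds`) are hypothetical.

```python
import math
import random


def fit_centroids(X, y):
    """Stand-in learner (an assumption, not the paper's): per-class centroids."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict(centroids, row):
    """Return the label of the closest centroid (squared Euclidean distance)."""
    def dist2(label):
        return sum((a - b) ** 2 for a, b in zip(row, centroids[label]))
    return min(centroids, key=dist2)


def synthesize(minority, rng):
    """Assumed generator: sample each attribute uniformly within the
    minority class's observed [min, max] range for that attribute."""
    return [rng.uniform(min(col), max(col)) for col in zip(*minority)]


def gradual_learn(X, y, minority_label, target_ratio=1.0, batch=5,
                  max_rounds=50, seed=0):
    """Add synthetic minority examples in small batches, drop those the
    stage classifier misclassifies, and stop once the minority count
    reaches target_ratio times the majority count."""
    rng = random.Random(seed)
    X = [list(row) for row in X]
    y = list(y)
    minority = [row for row, label in zip(X, y) if label == minority_label]
    n_major = sum(1 for label in y if label != minority_label)
    goal = math.ceil(target_ratio * n_major)
    synthetic = []
    for _ in range(max_rounds):
        needed = goal - len(minority) - len(synthetic)
        if needed <= 0:
            break
        # Add at most `batch` new synthetic examples in this stage.
        synthetic += [synthesize(minority, rng) for _ in range(min(batch, needed))]
        # Train the stage classifier on original plus synthetic data.
        stage = fit_centroids(X + synthetic, y + [minority_label] * len(synthetic))
        # Keep only synthetic examples the stage classifier gets right.
        synthetic = [s for s in synthetic if predict(stage, s) == minority_label]
    # Final classifier: trained on original data plus surviving synthetic data.
    final = fit_centroids(X + synthetic, y + [minority_label] * len(synthetic))
    return final, synthetic
```

On a toy data set with 20 majority points near the origin and 4 minority points near (5, 5), calling `gradual_learn(X, y, minority_label=1)` adds synthetic minority points until both classes have the same size. A different stage learner (for example a decision tree) or a different sampling scheme can be swapped in without changing the loop structure.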
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).