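Research on a multi-granularity ensemble classification algorithm for imbalanced data (不平衡数据多粒度集成分类算法研究)
Cite this article: CHEN Li-fang, DAI Qi, ZHAO Jia-liang. Research on a multi-granularity ensemble classification algorithm for imbalanced data [J]. Computer Engineering & Science (计算机工程与科学), 2021, 43(5): 917-925.
Authors: CHEN Li-fang (陈丽芳)  DAI Qi (代琪)  ZHAO Jia-liang (赵佳亮)
Affiliation: (College of Science, North China University of Science and Technology, Tangshan 063210, Hebei, China)
Funding: Natural Science Foundation of Hebei Province (F2014209086)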
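Abstract: Traditional models suffer from low accuracy, poor stability, and weak generalization when classifying imbalanced data. To address these problems, a multi-granularity ensemble classification algorithm based on sequential three-way decisions, MGE-S3WD, is proposed. A binary relation is used to partition the granular layers dynamically. Thresholds are computed from a cost matrix and a multi-level granular structure is built, dividing the data of each granular layer into a positive region, a boundary region, and a negative region. The partitions on each layer are then recombined as positive with negative, positive with boundary, and negative with boundary regions to form new data subsets, and a base classifier is built on each subset to perform ensemble classification of imbalanced data. Simulation results show that the algorithm effectively reduces the imbalance ratio of the data subsets and increases the diversity of the base classifiers in ensemble learning. Under the two evaluation metrics G-mean and F-measure1, its classification performance is better or partially better than that of other ensemble classification algorithms. It effectively improves the classification accuracy and stability of the classification model and offers a new research direction for ensemble learning on imbalanced data sets.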
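
The threshold and recombination steps described in the abstract can be illustrated with a minimal sketch. It assumes the standard decision-theoretic rough set formulas for the thresholds alpha and beta, which the abstract does not spell out, so this is not necessarily the authors' exact procedure. The cost values, the membership probabilities, and the function names `thresholds`, `partition`, and `recombine` are all hypothetical, and only a single granular layer is shown.

```python
# Sketch of one granular layer of a three-way decision split (assumptions noted below).

def thresholds(cost):
    """Compute (alpha, beta) from a 3x2 cost matrix.

    cost[action][state]: action is "P" (accept), "B" (defer), "N" (reject);
    state is "pos" (object belongs to the class) or "neg" (it does not).
    Assumption: standard decision-theoretic rough set formulas.
    """
    l_pp, l_pn = cost["P"]["pos"], cost["P"]["neg"]
    l_bp, l_bn = cost["B"]["pos"], cost["B"]["neg"]
    l_np, l_nn = cost["N"]["pos"], cost["N"]["neg"]
    alpha = (l_pn - l_bn) / ((l_pn - l_bn) + (l_bp - l_pp))
    beta = (l_bn - l_nn) / ((l_bn - l_nn) + (l_np - l_bp))
    return alpha, beta


def partition(probs, alpha, beta):
    """Assign each sample index to the positive, boundary, or negative region."""
    regions = {"POS": [], "BND": [], "NEG": []}
    for i, p in enumerate(probs):
        if p >= alpha:
            regions["POS"].append(i)
        elif p <= beta:
            regions["NEG"].append(i)
        else:
            regions["BND"].append(i)
    return regions


def recombine(regions):
    """Pair the regions as in the abstract: POS+NEG, POS+BND, NEG+BND."""
    pairs = [("POS", "NEG"), ("POS", "BND"), ("NEG", "BND")]
    return {a + "+" + b: regions[a] + regions[b] for a, b in pairs}


# Hypothetical cost matrix: misclassification costs more than deferring.
COST = {
    "P": {"pos": 0.0, "neg": 8.0},
    "B": {"pos": 2.0, "neg": 2.0},
    "N": {"pos": 6.0, "neg": 0.0},
}

if __name__ == "__main__":
    alpha, beta = thresholds(COST)
    # Hypothetical conditional probabilities of belonging to the target class.
    probs = [0.9, 0.8, 0.5, 0.4, 0.2, 0.1]
    subsets = recombine(partition(probs, alpha, beta))
    for name, indices in subsets.items():
        print(name, indices)
```

Each index list returned by `recombine` would then serve as the training subset for one base classifier; how the conditional probabilities are estimated and how the layers are refined sequentially are left out of this sketch.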

Keywords: sequential three-way decision  multi-granularity  cost sensitive  imbalanced data  ensemble learning
Received: 2020-03-07
Revised: 2020-05-13

A multi-granularity ensemble classification algorithm for imbalanced data
CHEN Li-fang, DAI Qi, ZHAO Jia-liang. A multi-granularity ensemble classification algorithm for imbalanced data [J]. Computer Engineering & Science, 2021, 43(5): 917-925.
Authors: CHEN Li-fang  DAI Qi  ZHAO Jia-liang
Affiliation: (College of Science, North China University of Science and Technology, Tangshan 063210, China)
Abstract: Traditional models exhibit low accuracy, poor stability, and weak generalization ability when classifying imbalanced data. To address these problems, a multi-granularity ensemble classification algorithm based on sequential three-way decisions is proposed. A binary relation is used to partition the granular layers dynamically. Thresholds are calculated from the cost matrix and a multi-layer granular structure is constructed. The data in each granular layer are divided into a positive region, a boundary region, and a negative region, and the partitions on each granular layer are recombined as positive and negative regions, positive and boundary regions, and negative and boundary regions to form new data subsets. A base classifier is built on each data subset to achieve ensemble classification of imbalanced data. Simulation results show that the algorithm effectively reduces the imbalance ratio of the data subsets and increases the diversity of the base classifiers in ensemble learning. Under the two evaluation metrics G-mean and F-measure1, its classification performance is better or partially better than that of other ensemble classification algorithms. The new algorithm effectively improves the classification accuracy and stability of the classification model and provides a new research direction for ensemble learning on imbalanced data sets.
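
To make the two evaluation metrics concrete, the following sketch computes G-mean and the F-measure from binary predictions using their usual definitions: G-mean is the geometric mean of recall and specificity, and the F-measure is the weighted harmonic mean of precision and recall. The choice of beta = 1, the function names, and the toy labels are assumptions for illustration and are not taken from the paper.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t != positive and p == positive:
            fp += 1
        elif t == positive and p != positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of recall (minority class) and specificity (majority class)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(recall * specificity)


def f_measure(y_true, y_pred, positive=1, beta=1.0):
    """Weighted harmonic mean of precision and recall (beta=1 is assumed here)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = beta ** 2 * precision + recall
    return (1 + beta ** 2) * precision * recall / denom if denom else 0.0


if __name__ == "__main__":
    # Toy imbalanced labels: 2 positives, 8 negatives (hypothetical data).
    y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
    print("G-mean:", g_mean(y_true, y_pred))
    print("F-measure:", f_measure(y_true, y_pred))
```

On the toy labels, plain accuracy is 0.8 even though only half of the minority class is recovered, which is why metrics of this kind are preferred over accuracy for imbalanced data.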
Keywords: sequential three-way decision  multi-granularity  cost sensitive  imbalanced data  ensemble learning
This article is indexed in Wanfang Data (万方数据) and other databases.