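Symmetric Inverting Algorithm for Imbalanced Datasets Based on Gaussian Mixture Model
Cite this article: CHEN Gang, WANG Lijuan. Symmetric Inverting Algorithm for Imbalanced Datasets Based on Gaussian Mixture Model[J]. Information and Control, 2020(2): 203-209, 218.
Authors: CHEN Gang  WANG Lijuan
Affiliation: School of Science, Dalian Maritime University
Funding: National Natural Science Foundation of China (11571056).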
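Abstract: To address the poor performance of traditional classifiers on imbalanced data, a symmetric inverting algorithm based on the Gaussian mixture model and expectation maximization (GMM-EM) is proposed. The core idea of the algorithm is to balance the data using the "3σ rule" from probability theory. First, the density functions of the majority and minority classes are obtained with a Gaussian mixture model and the EM algorithm. Second, taking the mean of the minority-class data as the center of symmetry, the inverting boundary where the majority class intrudes into the minority class is determined according to the "3σ rule", the data are inverted, and points that duplicate original minority-class data within the inverting interval are removed. If the two classes are still imbalanced at this point, a probability density enhancement method is applied within the inverting region to balance the data. Finally, 14 datasets selected from the UCI and KEEL repositories are balanced and then classified with a decision tree classifier; the case analysis demonstrates the effectiveness of the algorithm.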
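As a rough illustration of the inverting step described in the abstract, the following Python sketch reflects one-dimensional minority samples about their mean. It is one possible reading of that step, not the authors' implementation: the function name `symmetric_flip`, the multiplier `k`, and the use of a single mean and standard deviation in place of the fitted GMM densities are all assumptions, and the majority-class boundary and the density enhancement step are omitted.

```python
from statistics import fmean, pstdev


def symmetric_flip(minority, k=3.0):
    """Reflect 1-D minority samples about their mean (illustrative only).

    Assumption: the inverting region is the interval mean +/- k * sigma,
    with k = 3 standing in for the "3-sigma rule" named in the abstract.
    """
    mu = fmean(minority)        # center of symmetry: the minority mean
    sigma = pstdev(minority)    # population standard deviation
    radius = k * sigma          # assumed radius of the inverting region

    existing = set(minority)    # original minority points
    seen = set()
    flipped = []
    for x in minority:
        if abs(x - mu) > radius:
            continue            # outside the assumed inverting region
        mirrored = 2 * mu - x   # point reflection through the mean
        # Drop reflections that duplicate an original minority point
        # (exact float comparison; a tolerance may be needed in practice).
        if mirrored in existing or mirrored in seen:
            continue
        seen.add(mirrored)
        flipped.append(mirrored)
    return flipped


samples = [1.0, 2.0, 3.0, 6.0]
print(symmetric_flip(samples))  # [5.0, 4.0, 0.0]
```

In this toy input the mean is 3.0, so the sample at 3.0 reflects onto itself and is discarded as a duplicate, while the other three samples each yield one new synthetic point.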

Keywords: imbalanced data  data classification  symmetric inverting  GMM-EM algorithm

Symmetric Inverting Algorithm for Imbalanced Datasets Based on Gaussian Mixture Model
CHEN Gang, WANG Lijuan. Symmetric Inverting Algorithm for Imbalanced Datasets Based on Gaussian Mixture Model[J]. Information and Control, 2020(2): 203-209, 218.
Authors: CHEN Gang  WANG Lijuan
Affiliation: School of Science, Dalian Maritime University, Dalian 116026, China
Abstract: To address the poor classification performance on imbalanced datasets, we propose a symmetric inverting algorithm based on the Gaussian mixture model and expectation maximization (GMM-EM). The algorithm balances the datasets using the "3σ rule" from probability theory. First, we obtain the density functions of the minority class and the majority class using the GMM and the EM algorithm. Second, we apply a symmetric transformation to the minority class after obtaining the centers and the radius of the inverting region according to the "3σ rule". After the inverting process, we eliminate points that duplicate the original data of the minority class. If the two classes are still imbalanced at this point, additional minority-class samples are generated using the probability density enhancing method. Finally, we evaluate our algorithm and other methods with a decision tree classifier on 14 imbalanced datasets chosen from the UCI and KEEL repositories. Experimental results show that our algorithm is more effective than the other methods.
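The density estimation step mentioned in the abstract can be sketched, under simplifying assumptions, as a plain EM loop for a one-dimensional Gaussian mixture. This is a minimal illustration rather than the authors' code: the names `fit_gmm_1d` and `gmm_density`, the initialization from sorted chunks, the fixed iteration count, and the variance floor `min_var` are all hypothetical choices.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(data, k=2, iters=100, min_var=1e-6):
    """Fit a k-component 1-D Gaussian mixture with EM (illustrative only)."""
    n = len(data)
    if n < k:
        raise ValueError("need at least k samples")
    # Assumed initialization: means from equal-sized chunks of sorted data.
    ordered = sorted(data)
    chunk = n // k
    means = [sum(ordered[i * chunk:(i + 1) * chunk]) / chunk for i in range(k)]
    overall = sum(data) / n
    var0 = max(sum((x - overall) ** 2 for x in data) / n, min_var)
    variances = [var0] * k
    weights = [1.0 / k] * k

    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in data:
            p = [w * gaussian_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300  # guard against underflow to zero
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weight, mean, and variance per component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2
                        for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, min_var)
    return weights, means, variances


def gmm_density(x, weights, means, variances):
    """Mixture density: weighted sum of the component densities."""
    return sum(w * gaussian_pdf(x, m, v)
               for w, m, v in zip(weights, means, variances))


data = [0.0, 0.1, -0.1, 0.2, -0.2, 10.0, 10.1, 9.9, 10.2, 9.8]
weights, means, variances = fit_gmm_1d(data, k=2)
print(weights, means, variances)
```

One such mixture would be fitted per class, giving a density function for the majority class and another for the minority class. Real datasets are multivariate, so a library implementation with full covariance matrices would normally replace this hand-written loop.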
Keywords:imbalanced dataset  data classification  symmetric inverting  GMM-EM algorithm
This article is indexed in VIP (维普) and other databases.