
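Imbalanced Online Sequential Extreme Learning Machine Based on Principal Curve
Cite this article: WANG Jin-wan, MAO Wen-tao, WANG Li-yun, HE Ling. Imbalanced Online Sequential Extreme Learning Machine Based on Principal Curve[J]. Computer Science, 2016, 43(3): 62-67.
Authors: WANG Jin-wan, MAO Wen-tao, WANG Li-yun, HE Ling
Affiliation: College of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China (all authors); Engineering Technology Research Center for Computing Intelligence & Data Mining in Henan Province, Xinxiang 453007, China (MAO Wen-tao)
Funding: This work was supported by the National Natural Science Foundation of China (U1204609) and the Basic and Frontier Technology Research Program of Henan Province (132300410430).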
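Abstract: Existing machine learning algorithms struggle to improve the classification accuracy of minority-class samples in imbalanced online sequential data. To address this, an imbalanced online sequential extreme learning machine based on principal curves is proposed. The core idea is to balance the samples of each class according to the distribution of the online sequential data, so as to reduce the blindness of minority-sample synthesis. The method consists of an offline stage and an online stage. In the offline stage, a principal curve is fitted to each class to model its distribution, the minority class is over-sampled with the synthetic minority over-sampling technique (SMOTE), and each sample is assigned a membership degree according to its projection distance to the corresponding principal curve. Majority samples and synthetic minority samples are then pruned according to the membership interval, and the initial model is built. In the online stage, sequentially arriving minority samples are over-sampled, the sequential samples are balanced according to the membership interval, and the network weights are updated dynamically. Theoretical analysis proves that the information loss of the proposed algorithm has an upper bound. Simulation experiments on UCI benchmark datasets and real-world Macau meteorological data show that, compared with existing typical algorithms, the proposed algorithm achieves higher prediction accuracy on minority-class samples and better numerical stability.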
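The minority over-sampling step named in the abstract can be illustrated with a minimal NumPy sketch of SMOTE-style interpolation. This is only an illustration under stated assumptions: the function name `smote_oversample` and its parameters are hypothetical, and the principal-curve fitting and membership-based pruning described in the abstract are not shown.

```python
import numpy as np

def smote_oversample(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    Hypothetical helper for illustration; needs at least two minority samples.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = X_min.shape[0]
    k = min(k, n - 1)  # cannot have more neighbours than other samples

    # Pairwise squared Euclidean distances; a sample is not its own neighbour.
    d2 = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)
    neighbors = np.argsort(d2, axis=1)[:, :k]

    # Pick a random base sample and one of its k nearest neighbours.
    base = rng.integers(0, n, size=n_new)
    nb = neighbors[base, rng.integers(0, k, size=n_new)]

    # Place each synthetic point at a random position on the connecting segment.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[nb] - X_min[base])
```

Because every synthetic point lies on a segment between two real minority samples, plain interpolation can land in regions dominated by the majority class. That is the "blindness" the abstract aims to reduce by pruning synthetic samples according to their membership degree.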

Keywords: Online sequential extreme learning machine; Imbalanced data; Principal curve; Synthetic minority over-sampling
Received: 2015/3/20
Revised: 2015/6/20

Imbalanced Online Sequential Extreme Learning Machine Based on Principal Curve
WANG Jin-wan, MAO Wen-tao, WANG Li-yun, HE Ling. Imbalanced Online Sequential Extreme Learning Machine Based on Principal Curve[J]. Computer Science, 2016, 43(3): 62-67.
Authors: WANG Jin-wan, MAO Wen-tao, WANG Li-yun, HE Ling
Affiliation: College of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China (all authors); Engineering Technology Research Center for Computing Intelligence & Data Mining in Henan Province, Xinxiang 453007, China (MAO Wen-tao)
Abstract: Many traditional machine learning methods tend to produce biased classifiers, which lowers classification accuracy for the minority class in imbalanced sequential data. To improve the classification accuracy of the minority class, an imbalanced online sequential extreme learning machine based on principal curves is proposed. The core idea is to balance the samples according to the distribution of the online sequential data, reducing the blindness of minority-sample synthesis. The method has two stages. In the offline stage, principal curves are used to model the distribution of each of the two classes, and the minority class is over-sampled with SMOTE. Each sample is then assigned a membership degree according to its projection distance, majority samples and synthetic minority samples are removed according to the membership interval, and the initial model is built. In the online stage, sequentially arriving minority samples are over-sampled with SMOTE, the samples are balanced according to the membership interval, and the network weights are updated dynamically. A theoretical analysis shows that the information loss of the proposed algorithm has an upper bound. Experiments on three UCI datasets and a real-world air pollutant forecasting dataset show that the proposed method outperforms traditional methods in prediction accuracy and numerical stability.
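The dynamic weight update in the online stage can be sketched with the standard recursive least-squares update of an online sequential extreme learning machine. This is a minimal illustration, not the paper's full method: the class name `OSELM`, the sigmoid activation, and the small `ridge` term added for numerical stability are assumptions made here, and the balancing of each arriving chunk is assumed to have happened before `update` is called.

```python
import numpy as np

class OSELM:
    """Minimal OS-ELM sketch: fixed random hidden layer, recursively updated output weights."""

    def __init__(self, n_in, n_hidden, ridge=1e-2, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(size=(n_in, n_hidden))  # random input weights, never trained
        self.b = rng.normal(size=n_hidden)          # random hidden biases, never trained
        self.ridge = ridge                          # assumption: regulariser, not from the source
        self.P = None                               # inverse of the regularised Gram matrix
        self.beta = None                            # output weights

    def hidden(self, X):
        # Sigmoid hidden-layer output matrix H.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit_initial(self, X0, T0):
        # Offline stage: solve the regularised least-squares problem once.
        H = self.hidden(X0)
        self.P = np.linalg.inv(H.T @ H + self.ridge * np.eye(H.shape[1]))
        self.beta = self.P @ H.T @ T0

    def update(self, X, T):
        # Online stage: fold a new chunk into P and beta without revisiting old data.
        H = self.hidden(X)
        K = np.linalg.inv(np.eye(H.shape[0]) + H @ self.P @ H.T)
        self.P = self.P - self.P @ H.T @ K @ H @ self.P
        self.beta = self.beta + self.P @ H.T @ (T - H @ self.beta)

    def predict(self, X):
        return self.hidden(X) @ self.beta
```

With this update rule, the output weights after processing all chunks match the batch ridge solution on the pooled data, so the model never needs to store earlier samples.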
Keywords:Online sequential extreme learning machine  Imbalanced data  Principal curve  Synthetic minority over-sampling
This article is indexed in Wanfang Data and other databases.