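Research of Outlier Ensemble Mining Based on Active Learning
Cite this article:ZHAO Xiaoyong, WANG Ningning, WANG Lei. Research of Outlier Ensemble Mining Based on Active Learning[J]. Computer Engineering and Applications, 2020, 56(12): 112-117.
Authors:ZHAO Xiaoyong  WANG Ningning  WANG Lei
Affiliation:School of Information Management, Beijing Information Science and Technology University, Beijing 100129
Funding:General Project of the Science and Technology Plan of the Beijing Municipal Education Commission; National Natural Science Foundation of China
Abstract:Outlier detection tasks usually lack usable labeled data, and outliers make up only a very small fraction of the whole dataset. Compared with other data mining tasks, outlier detection is harder, and no single algorithm yet suits every scenario. Therefore, combining diverse model ensembles with the idea of active learning, this paper proposes OMAL (Outlier Mining based on Active Learning), an ensemble outlier detection method based on active learning. Guided by the active learning framework and a comparative analysis of candidate base learners, three unsupervised models, based respectively on statistics, similarity, and subspace partitioning, are chosen as base learners. The data that each base learner places on the boundary between outlier and normal are consolidated and presented to human experts for labeling, so as to maximize the information gained from expert feedback. Samples are then drawn from the expert-labeled dataset and from the dataset produced by base-learner voting, a supervised binary classification model is trained on them with a GBM (Gradient Boosting Machine), and the model is applied to the full dataset to produce the final mining results. Experiments show that the proposed method clearly improves AUC and runs efficiently, giving it good practical value.

Keywords:outlier detection  active learning  model ensemble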

Research of Outlier Ensemble Mining Based on Active Learning
ZHAO Xiaoyong,WANG Ningning,WANG Lei.Research of Outlier Ensemble Mining Based on Active Learning[J].Computer Engineering and Applications,2020,56(12):112-117.
Authors:ZHAO Xiaoyong  WANG Ningning  WANG Lei
Affiliation:School of Information and Management, Beijing Information Science & Technology University, Beijing 100129, China
Abstract:Outlier detection tasks usually lack available labeled data, and outliers account for only a small part of the entire dataset. Compared with other data mining tasks, outlier detection is more difficult, and no single algorithm is suitable for all scenarios. Therefore, combining the ideas of diverse model ensembles and active learning, this paper proposes an outlier ensemble detection method named Outlier Mining based on Active Learning (OMAL). Under the guidance of the active learning framework, five unsupervised models based on statistics, similarity and axis-parallel subspaces are selected as the base learners according to a comparative analysis of various learners. The data that each base learner places on the boundary between outlier and normal are then integrated, filtered and presented to human experts for labeling, to maximize the information gained from expert feedback. Samples are drawn from the labeled dataset and from the dataset generated by the voting of the base learners, and a supervised binary classification model based on Gradient Boosting Machine (GBM) is trained on them and applied to the full dataset to obtain the final mining results. Experiments show that OMAL clearly improves AUC while running efficiently, which gives it good practical value.
Keywords:outlier detection  active learning  model ensemble  
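
The query-selection and voting steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 0.5 threshold, the unanimous-vote rule, the sample scores, and the expert answers are all assumptions made for illustration, and it merges the two label sources directly where the abstract describes sampling from them. It assumes each base learner has already produced an outlier score in [0, 1] for every point.

```python
def boundary_queries(scores, threshold=0.5, per_learner=1):
    """Collect, for each base learner, the points it scores closest to the threshold.

    scores maps a learner name to a list of outlier scores in [0, 1].
    The union of these ambiguous points is what an expert would be asked to label.
    """
    picked = set()
    for learner_scores in scores.values():
        ranked = sorted(range(len(learner_scores)),
                        key=lambda i: abs(learner_scores[i] - threshold))
        picked.update(ranked[:per_learner])
    return sorted(picked)


def vote_labels(scores, threshold=0.5):
    """Pseudo-label points by vote (assumed rule: only unanimous votes count).

    Returns 1 (outlier), 0 (normal), or None when the learners disagree.
    """
    n_points = len(next(iter(scores.values())))
    labels = []
    for i in range(n_points):
        votes = [s[i] > threshold for s in scores.values()]
        if all(votes):
            labels.append(1)
        elif not any(votes):
            labels.append(0)
        else:
            labels.append(None)
    return labels


def merge_labels(voted, expert):
    """Let expert answers override the voted pseudo-labels."""
    merged = list(voted)
    for index, label in expert.items():
        merged[index] = label
    return merged


# Hypothetical scores from three base learners for five points.
sample_scores = {
    "statistical": [0.05, 0.48, 0.95, 0.10, 0.60],
    "similarity":  [0.10, 0.70, 0.90, 0.45, 0.65],
    "subspace":    [0.02, 0.40, 0.99, 0.20, 0.52],
}

queries = boundary_queries(sample_scores)   # [1, 3, 4]
voted = vote_labels(sample_scores)          # [0, None, 1, 0, 1]
expert_answers = {1: 1, 3: 0, 4: 1}         # hypothetical expert feedback
training_labels = merge_labels(voted, expert_answers)  # [0, 1, 1, 0, 1]
```

The resulting labels, together with the original features, could then be passed to any gradient boosting classifier to train the supervised model that the abstract applies to the full dataset.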