
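Outlier detection algorithm based on random projection and ensemble learning
Cite this article: Guo Yiyang, Yu Jiong, Du Xusheng, Cao Ming. Outlier detection algorithm based on random projection and ensemble learning[J]. Application Research of Computers, 2022, 39(9).
Authors: Guo Yiyang, Yu Jiong, Du Xusheng, Cao Ming
Affiliations: Xinjiang University; Xinjiang University; Xinjiang University; Ocean University of China
Funding: National Natural Science Foundation of China (61862060, 61462079, 61562086, 61562078)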
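Abstract: Traditional similarity-based outlier detection algorithms perform poorly on high-dimensional unbalanced datasets. To address this, the paper proposes a novel ensemble learning and random projection-based outlier detection (EROD) framework. The algorithm first integrates several random projection methods to reduce the dimensionality of the high-dimensional data, which increases data diversity. It then integrates several different traditional outlier detectors into a heterogeneous ensemble model, which increases robustness. Finally, the heterogeneous model is trained on the reduced-dimensional data, and the trained models are combined in two optimization stages to reduce the generalization error and output a final outlier score for each object; objects with high outlier scores are judged to be outliers. Comparative experiments on four real high-dimensional unbalanced datasets from different domains show that, compared with traditional outlier detection algorithms and ensemble-learning-based outlier detection algorithms, EROD improves AUC and precision@n by 3.6% and 14.45% on average, respectively, which indicates that EROD has an advantage in handling anomalies in high-dimensional unbalanced data.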
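The two metrics reported in the abstract can be computed roughly as in the sketch below. It assumes binary labels (1 = outlier) and scores where a higher value means more anomalous. The function names `precision_at_n` and `roc_auc` are illustrative and do not come from the paper, and letting n default to the number of true outliers is a common convention that is assumed here rather than confirmed by the source.

```python
import numpy as np

def precision_at_n(y_true, scores, n=None):
    """Fraction of true outliers among the n highest-scoring objects.

    Assumption: when n is not given, it is set to the number of true outliers.
    """
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    if n is None:
        n = int(y_true.sum())
    top = np.argsort(-scores, kind="stable")[:n]  # indices of the n largest scores
    return float(y_true[top].sum()) / n

def roc_auc(y_true, scores):
    """AUC via the rank-sum (Mann-Whitney U) formulation, with tied scores
    sharing their average rank. Requires at least one outlier and one inlier."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores, kind="stable")
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    for value in np.unique(scores):      # average the ranks of tied scores
        tied = scores == value
        ranks[tied] = ranks[tied].mean()
    n_pos = int(y_true.sum())
    n_neg = len(y_true) - n_pos
    u = ranks[y_true].sum() - n_pos * (n_pos + 1) / 2
    return float(u / (n_pos * n_neg))
```

AUC measures how well the scores rank outliers above inliers overall, while precision@n looks only at the top of the ranking, which is why the two can move by different amounts on unbalanced data.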

Keywords: data mining; outlier detection; random projection; ensemble learning
Received: 2022/2/10
Revised: 2022/8/20

Outlier detection algorithm based on random projection and ensemble learning
Guo Yiyang, Yu Jiong, Du Xusheng, Cao Ming. Outlier detection algorithm based on random projection and ensemble learning[J]. Application Research of Computers, 2022, 39(9).
Authors: Guo Yiyang, Yu Jiong, Du Xusheng, Cao Ming
Affiliation: Xinjiang University; Xinjiang University; Xinjiang University; Ocean University of China
Abstract: To address the poor performance of traditional similarity-based outlier detection algorithms on high-dimensional unbalanced datasets, this paper proposes a novel ensemble learning and random projection-based outlier detection (EROD) framework. First, EROD integrates several random projection methods to reduce the dimensionality of high-dimensional data, which improves data diversity. Second, it integrates several different traditional outlier detectors to build a heterogeneous ensemble model, which increases the robustness of the algorithm. Finally, EROD trains the heterogeneous ensemble model on the reduced-dimensional data and combines the trained models in two optimization stages to reduce the generalization error, producing a final outlier score for each object; objects with high outlier scores are identified as outliers. Experiments on four real high-dimensional unbalanced datasets from different domains show that, compared with traditional outlier detection algorithms and outlier detection algorithms based on ensemble learning, EROD improves AUC and precision@n by 3.6% and 14.45% on average, respectively. EROD therefore has an advantage in handling anomalies in high-dimensional unbalanced data.
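The three stages described in the abstract (several random projections, several heterogeneous detectors, a two-stage combination) can be illustrated with the minimal sketch below. It is not the authors' implementation: the abstract does not name the projection methods, the base detectors, or the combination rules. The Gaussian and sparse projections, the k-nearest-neighbour and centroid-distance detectors, the z-score normalization, and the plain averaging used in both combination stages are all assumptions chosen for brevity, as are the function names.

```python
import numpy as np

def gaussian_projection(X, k, rng):
    # Dense Gaussian random projection to k dimensions.
    R = rng.normal(size=(X.shape[1], k)) / np.sqrt(k)
    return X @ R

def sparse_projection(X, k, rng):
    # Sparse random projection: entries +1, 0, -1 with probabilities 1/6, 2/3, 1/6.
    R = rng.choice([1.0, 0.0, -1.0], size=(X.shape[1], k), p=[1 / 6, 2 / 3, 1 / 6])
    return X @ (np.sqrt(3.0 / k) * R)

def knn_score(Z, k=5):
    # Distance to the k-th nearest neighbour (column 0 is the point itself).
    # Builds the full pairwise distance matrix, so it only suits small datasets.
    d = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
    d.sort(axis=1)
    return d[:, k]

def centroid_score(Z):
    # Euclidean distance to the data centroid.
    return np.linalg.norm(Z - Z.mean(axis=0), axis=1)

def zscore(s):
    # Put scores from different detectors on a comparable scale.
    return (s - s.mean()) / (s.std() + 1e-12)

def ensemble_scores(X, n_components=8, n_rounds=5, seed=0):
    rng = np.random.default_rng(seed)
    per_view = []
    for _ in range(n_rounds):
        for project in (gaussian_projection, sparse_projection):
            Z = project(X, n_components, rng)
            detector_scores = [zscore(knn_score(Z)), zscore(centroid_score(Z))]
            # Stage 1 (assumed rule): average the detectors within one projection.
            per_view.append(np.mean(detector_scores, axis=0))
    # Stage 2 (assumed rule): average across all projections.
    return np.mean(per_view, axis=0)
```

Calling `ensemble_scores(X)` on an array of shape `(n_samples, n_features)` returns one score per object, with higher values indicating more likely outliers. Other combination rules, such as taking the maximum across projections, would fit the same structure.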
Keywords: data mining; outlier detection; random projection; ensemble learning