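Design of parallelized anomaly detection based on Isolation Forest
Cite this article: HOU Yong-xu, DUAN Lei, QIN Jiang-long, QIN Pan, TANG Chang-jie. Design of parallelized anomaly detection based on Isolation Forest[J]. Computer Engineering & Science, 2017, 39(2): 236-244.
Authors: HOU Yong-xu  DUAN Lei  QIN Jiang-long  QIN Pan  TANG Chang-jie
Affiliations: 1. School of Computer Science, Sichuan University; 2. West China School of Public Health, Sichuan University; 3. School of Software, Yunnan University
Funding: National Natural Science Foundation of China (61572332, 61379032); China Postdoctoral Science Foundation Special Funding (2016T90850); Fundamental Research Funds for the Central Universities (2016SCU04A22)
Abstract: Anomaly detection has a wide range of applications and has drawn attention from both industry and academia. Among the many anomaly detection methods, the Isolation Forest algorithm is efficient and accurate, and has been widely applied. However, the conventional Isolation Forest algorithm has difficulty handling large-scale data. To address this problem, an algorithm based on a cloud computing platform is designed. Specifically, PIFH, a parallel anomaly detection algorithm based on Isolation Forest, is designed and implemented using the Hadoop distributed storage system and the MapReduce distributed computing framework. Parallelizing both the construction of the detection model and the anomaly evaluation of the data improves the efficiency with which PIFH detects anomalies and extends its range of application. Real-world data sets are used to verify the efficiency and scalability of the proposed algorithm.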

Keywords: anomaly detection; cloud computing; parallelization
Received: 2016-09-11
Revised: 2017-02-25

Parallel anomaly detection based on Isolation Forest
HOU Yong-xu, DUAN Lei, QIN Jiang-long, QIN Pan, TANG Chang-jie. Parallel anomaly detection based on Isolation Forest[J]. Computer Engineering & Science, 2017, 39(2): 236-244.
Authors: HOU Yong-xu  DUAN Lei  QIN Jiang-long  QIN Pan  TANG Chang-jie
Affiliation: (1. School of Computer Science, Sichuan University, Chengdu 610065; 2. West China School of Public Health, Sichuan University, Chengdu 610041; 3. School of Software, Yunnan University, Kunming 650091, China)
Abstract: Anomaly detection is used in a variety of applications and attracts attention in both industry and academia. Among the numerous methods for anomaly detection, the Isolation Forest algorithm combines high efficiency with good detection accuracy and is widely applied in practice. However, the conventional Isolation Forest algorithm can hardly deal with large-scale data sets. To overcome this limitation, we propose an algorithm based on a cloud computing platform. Specifically, we design and implement PIFH, a parallel anomaly detection algorithm based on Isolation Forest, using the Hadoop distributed storage system and the MapReduce distributed computational framework. Parallelizing both detection model construction and anomaly evaluation improves the efficiency of PIFH and extends its range of application. Experiments on real-world data sets demonstrate that the proposed algorithm is efficient and scalable.
Keywords: anomaly detection; cloud computing; parallelization
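
The two stages that the abstract says are parallelized, model construction and anomaly evaluation, can be illustrated with a minimal in-process sketch. This is not the PIFH implementation, which is not shown in this record: Hadoop and MapReduce are replaced here by plain Python `map` calls, and the names `map_build_tree`, `map_score`, and `detect`, as well as the defaults `n_trees=100` and `psi=64`, are hypothetical choices made for illustration. The sketch only relies on two properties of the standard Isolation Forest algorithm: each tree is built independently from its own random subsample, and each record is scored independently as `2 ** (-E[h] / c(psi))`, where `E[h]` is its mean path length over the trees.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    # ln(n - 1) + gamma approximates the harmonic number H(n - 1).
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, rng, max_depth, depth=0):
    """Isolate points with random axis-parallel splits until the depth limit."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # nothing left to split on in this dimension
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {
        "dim": dim,
        "split": split,
        "left": build_tree(left, rng, max_depth, depth + 1),
        "right": build_tree(right, rng, max_depth, depth + 1),
    }


def path_length(tree, x):
    """Depth at which x reaches a leaf, plus c(size) for the unsplit remainder."""
    depth = 0
    while "size" not in tree:
        tree = tree["left"] if x[tree["dim"]] < tree["split"] else tree["right"]
        depth += 1
    return depth + c(tree["size"])


def map_build_tree(task):
    """Stand-in for a map task (assumption): build one tree from one subsample."""
    seed, data, psi = task
    rng = random.Random(seed)
    sample = rng.sample(data, psi)
    max_depth = math.ceil(math.log2(psi))
    return build_tree(sample, rng, max_depth)


def map_score(record, forest, psi):
    """Stand-in for a map task (assumption): score one record against all trees."""
    mean_depth = sum(path_length(tree, record) for tree in forest) / len(forest)
    return 2.0 ** (-mean_depth / c(psi))


def detect(data, n_trees=100, psi=64, seed=0):
    """Run both stages; requires at least two records. Higher score = more anomalous."""
    psi = min(psi, len(data))
    tasks = [(seed + i, data, psi) for i in range(n_trees)]
    forest = list(map(map_build_tree, tasks))  # stage 1: trees are independent
    return [map_score(r, forest, psi) for r in data]  # stage 2: records are independent
```

Because no tree depends on another tree and no record's score depends on another record, both `map` steps could in principle be distributed across workers; how PIFH actually partitions the work on Hadoop is described in the full paper, not in this sketch.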