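Detection Method for Abnormal Water Withdrawal Data Based on the Isolation Forest Algorithm
Cite this article: ZHAO Chenxiao, XUE Huifeng, WANG Lei, WAN Yi. Detection method for abnormal water withdrawal data based on the isolation forest algorithm[J]. Journal of China Institute of Water Resources and Hydropower Research, 2020, 18(1): 31-39.
Authors: ZHAO Chenxiao, XUE Huifeng, WANG Lei, WAN Yi
Affiliations: China Aerospace Academy of Systems Science and Engineering, Beijing 100048 (ZHAO Chenxiao, XUE Huifeng, WANG Lei); Water Resources Management Center, Ministry of Water Resources, Beijing 100053 (WAN Yi)
Funding: Key Program of the National Natural Science Foundation of China (U1501253)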
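Abstract: Water resources management systems store massive amounts of water withdrawal and use data, and screening these data for outliers to locate abnormal water withdrawal behavior is an important tool for water resources regulation. Outliers in such data generally lack a clear definition, and traditional outlier detection algorithms fall short in real-time performance and stability. Building on a summary of the types and characteristics of abnormal water withdrawal data at the present stage, the method first applies mean interpolation to preprocess outliers that can be identified by direct inspection. It then draws random samples from the preprocessed data for training and builds multiple isolation binary trees that together form an isolation forest, which serves as the tool for detecting outliers in the data samples. An empirical analysis of two consecutive years of daily water withdrawal monitoring data from a water supply company shows that the isolation-forest-based method stores the characteristics of the data samples in the forest through unsupervised learning and is therefore more stable; it accurately detects the outliers in the data samples and achieves a higher detection rate than the traditional least-squares fitting method.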

Keywords: water resources monitoring; abnormal data; mean interpolation; isolation forest; least-squares fitting
Received: 2018-10-18
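The pipeline in the abstract (mean-interpolate obvious bad readings, then build an isolation forest from random subsamples and score points by how quickly they are isolated) can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: it treats each daily withdrawal reading as a single scalar feature, lets the caller decide what counts as an "obvious" outlier (the paper's exact preprocessing rule is not given here), and replaces each such reading with the mean of its nearest valid neighbours. The tree construction and the score 2^(-E[h(x)]/c(ψ)) follow the standard isolation forest of Liu, Ting and Zhou; the function and class names, and the defaults of 100 trees and a subsample size of 256, are assumptions chosen for illustration.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def mean_interpolate(series, is_obvious_outlier):
    """Replace obviously invalid readings with the mean of the nearest valid neighbours.

    Hypothetical rule: the caller decides what counts as 'obvious' (e.g. negative
    or missing values); the paper's exact preprocessing criteria are not given here.
    """
    valid = [i for i, v in enumerate(series) if not is_obvious_outlier(v)]
    if not valid:
        raise ValueError("no valid readings to interpolate from")
    out = list(series)
    for i, v in enumerate(series):
        if is_obvious_outlier(v):
            left = [series[j] for j in valid if j < i][-1:]   # nearest valid on the left
            right = [series[j] for j in valid if j > i][:1]   # nearest valid on the right
            neighbours = left + right
            out[i] = sum(neighbours) / len(neighbours)
    return out


def c(n):
    """Average path length of an unsuccessful BST search over n points (iForest normaliser)."""
    if n > 2:
        return 2 * (math.log(n - 1) + EULER_GAMMA) - 2 * (n - 1) / n
    return 1.0 if n == 2 else 0.0


def build_tree(data, depth, max_depth, rng):
    """Recursively isolate 1-D points with random split values between min and max."""
    if depth >= max_depth or len(data) <= 1 or min(data) == max(data):
        return ("leaf", len(data))
    split = rng.uniform(min(data), max(data))
    left = [x for x in data if x < split]
    right = [x for x in data if x >= split]
    return ("node", split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(x, node, depth=0):
    """Depth at which x reaches a leaf, plus c(size) to account for unsplit leaf points."""
    if node[0] == "leaf":
        return depth + c(node[1])
    _, split, left, right = node
    return path_length(x, left if x < split else right, depth + 1)


class IsolationForest:
    """Hypothetical minimal isolation forest for scalar readings."""

    def __init__(self, n_trees=100, sample_size=256, seed=0):
        self.n_trees = n_trees
        self.sample_size = sample_size
        self.rng = random.Random(seed)

    def fit(self, data):
        if len(data) < 2:
            raise ValueError("need at least two readings")
        self.psi = min(self.sample_size, len(data))
        max_depth = math.ceil(math.log2(self.psi))  # height limit ceil(log2(psi))
        self.trees = [build_tree(self.rng.sample(data, self.psi), 0, max_depth, self.rng)
                      for _ in range(self.n_trees)]
        return self

    def score(self, x):
        """Anomaly score in (0, 1]; values near 1 indicate likely outliers."""
        mean_h = sum(path_length(x, t) for t in self.trees) / self.n_trees
        return 2 ** (-mean_h / c(self.psi))


if __name__ == "__main__":
    gen = random.Random(1)
    # Synthetic daily intake around 100 units; not the paper's monitoring data.
    readings = [100 + gen.gauss(0, 5) for _ in range(300)]
    readings[150] = -1.0          # an "obvious" bad reading, e.g. a meter fault
    readings.append(500.0)        # a subtler spike left for the forest to find
    cleaned = mean_interpolate(readings, lambda v: v < 0)
    forest = IsolationForest(seed=0).fit(cleaned)
    print(round(forest.score(500.0), 2), round(forest.score(100.0), 2))
```

Because each tree sees only a small random subsample, the forest is cheap to build and to query, which is consistent with the abstract's emphasis on stability and suitability for ongoing monitoring; a reading is flagged by comparing its score against a threshold that would have to be chosen for the data at hand.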

Water Consumption Abnormal Data Detection Method based on Isolation Forest
ZHAO Chenxiao, XUE Huifeng, WANG Lei and WAN Yi. Water Consumption Abnormal Data Detection Method based on Isolation Forest[J]. Journal of China Institute of Water Resources and Hydropower Research, 2020, 18(1): 31-39.
Authors: ZHAO Chenxiao, XUE Huifeng, WANG Lei and WAN Yi
Affiliations: China Aerospace Academy of Systems Science and Engineering, Beijing 100048, China (ZHAO Chenxiao, XUE Huifeng, WANG Lei); Water Resources Management Center, The Ministry of Water Resources of the People's Republic of China, Beijing 100053, China (WAN Yi)
Abstract: Water resources management systems store huge amounts of water consumption data, and screening these data for abnormal values to locate abnormal water intake behavior is an important means of water resources regulation. These outliers lack an effective classification, and traditional outlier detection algorithms have shortcomings in real-time performance and stability. Based on a summary of the types and characteristics of abnormal water consumption data at the present stage, the average interpolation method is first used to preprocess the outliers; random sampling is then performed on the preprocessed data for training, building multiple isolation binary trees that form an isolation forest. The forest is used to perform outlier detection on data samples. An empirical analysis of the daily water intake monitoring data of a water supply company shows that the outlier detection method based on the isolation forest algorithm stores the characteristics of the data samples in the forest through unsupervised learning, which gives it higher stability. It accurately detects the outliers in the data samples, achieves a higher detection rate than the traditional least squares fitting method, and is suitable for real-time monitoring of water resources data.
Keywords: water resources monitoring; abnormal data; average interpolation; isolation forest; least squares
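The abstract benchmarks the isolation forest against a "traditional least squares fitting method" but does not specify that baseline here. The sketch below is one hypothetical version of such a baseline, assuming a straight-line trend fitted over the day index by ordinary least squares and a k-sigma rule on the residuals (k = 3); the function name `least_squares_outliers` and both modelling choices are illustrative assumptions, not the paper's method.

```python
import statistics


def least_squares_outliers(values, k=3.0):
    """Flag indices whose residual from a least-squares linear trend exceeds k sigma.

    Assumption: a straight-line trend over the day index and a k-sigma residual rule;
    the paper's actual least-squares baseline may use a different model or threshold.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two readings")
    t = list(range(n))
    t_mean = sum(t) / n
    y_mean = sum(values) / n
    # Closed-form ordinary least-squares slope and intercept.
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, values))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    residuals = [yi - (intercept + slope * ti) for ti, yi in zip(t, values)]
    sigma = statistics.pstdev(residuals)
    return [i for i, r in enumerate(residuals) if abs(r) > k * sigma]


if __name__ == "__main__":
    daily = [100 + 0.1 * day for day in range(50)]  # synthetic trend, not real data
    daily[20] = 400.0                               # synthetic spike
    print(least_squares_outliers(daily))  # [20]
```

One design point worth noting: every outlier both pulls the fitted line toward itself and inflates the residual standard deviation, so when several anomalies are present a global-fit rule like this can mask some of them. The isolation forest avoids fitting a global model, which offers one plausible reading of the higher detection rate the abstract reports.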
This article is indexed in CNKI and other databases.