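Unsupervised Anomaly Detection Method for High-dimensional Big Data Analysis
Cite this article: ZOU Cheng-ming, CHEN De. Unsupervised Anomaly Detection Method for High-dimensional Big Data Analysis[J]. Computer Science, 2021, 48(2): 121-127.
Authors: ZOU Cheng-ming  CHEN De
Affiliations: Hubei Key Laboratory of Transportation Internet of Things Technology, Wuhan 430070; School of Computer Science and Technology, Wuhan University of Technology, Wuhan 430070; Peng Cheng Laboratory, Shenzhen, Guangdong 518000; School of Computer Science and Technology, Wuhan University of Technology, Wuhan 430070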
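Abstract: Unsupervised anomaly detection on high-dimensional data is one of the important challenges in machine learning. Although earlier methods based on a single deep auto-encoder and density estimation have made notable progress, they generate the low-dimensional representation with only one deep auto-encoder, which means there is not enough information to carry out the subsequent density estimation task. To address this problem, this paper proposes a Mixed Auto-encoding Gaussian Mixture Model (MAGMM). MAGMM replaces the single deep auto-encoder with a mixture of auto-encoders that generates concatenated low-dimensional representations, so it can preserve key information from the specific cluster of each input sample. In addition, it uses an allocation network to constrain the mixture of auto-encoders so that each sample is assigned to one dominant auto-encoder. With these mechanisms, MAGMM avoids getting trapped in local optima and reduces the reconstruction error, which facilitates the density estimation task and improves the accuracy of anomaly detection on high-dimensional data. Experimental results show that the method outperforms DAGMM and improves the standard F1 score by 29%.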

Keywords: data mining  unsupervised anomaly detection  dimensionality reduction  Gaussian mixture model  density estimation

Unsupervised Anomaly Detection Method for High-dimensional Big Data Analysis
ZOU Cheng-ming, CHEN De. Unsupervised Anomaly Detection Method for High-dimensional Big Data Analysis[J]. Computer Science, 2021, 48(2): 121-127.
Authors: ZOU Cheng-ming  CHEN De
Affiliation: (Hubei Key Laboratory of Transportation Internet of Things Technology, Wuhan 430070, China; School of Computer Science and Technology, Wuhan University of Technology, Wuhan 430070, China; Peng Cheng Laboratory, Shenzhen, Guangdong 518000, China)
Abstract: Unsupervised anomaly detection on high-dimensional data is one of the most significant challenges in machine learning. Although previous approaches based on a single deep auto-encoder and density estimation have made significant progress, they generate the low-dimensional representation with only a single deep auto-encoder, which leaves insufficient information for the subsequent density estimation task. To address this challenge, a mixed auto-encoding Gaussian mixture model (MAGMM) is proposed in this paper. MAGMM replaces the single deep auto-encoder with a mixture of auto-encoders that generates concatenated low-dimensional representations, so that it can preserve key information from the specific cluster of each input sample. In addition, it uses an allocation network to constrain the mixture of auto-encoders, so that each sample is assigned to a dominant auto-encoder. With these mechanisms, MAGMM avoids getting trapped in local optima and reduces the reconstruction error, which facilitates the density estimation task and improves the accuracy of anomaly detection on high-dimensional data. Experimental results show that the proposed method performs better than DAGMM and achieves up to 29% improvement in the standard F1 score.
Keywords: Data mining  Unsupervised anomaly detection  Dimensionality reduction  Gaussian mixture model  Density estimation
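
The data flow described in the abstract (several auto-encoders, an allocation network that weights them per sample, a concatenated low-dimensional representation, and a Gaussian-mixture energy used as the anomaly score) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the class `TinyMixedAutoEncoder`, the function `gmm_energy`, the linear untrained encoders with random weights, the reuse of the allocation weights as mixture memberships, and the 95% quantile threshold are all assumptions made for illustration only.

```python
import numpy as np


def softmax(z):
    # Row-wise softmax with max subtraction for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


class TinyMixedAutoEncoder:
    """Hypothetical stand-in: K linear auto-encoders plus a linear allocation
    network. Weights are random and untrained (illustration only)."""

    def __init__(self, n_features, latent_dim, n_encoders, seed=0):
        rng = np.random.default_rng(seed)
        self.enc = rng.normal(scale=0.1, size=(n_encoders, n_features, latent_dim))
        self.dec = rng.normal(scale=0.1, size=(n_encoders, latent_dim, n_features))
        self.alloc = rng.normal(scale=0.1, size=(n_features, n_encoders))

    def forward(self, x):
        weights = softmax(x @ self.alloc)                    # (n, K) allocation weights
        codes = np.einsum("nd,kdl->nkl", x, self.enc)        # (n, K, L) one code per encoder
        recons = np.einsum("nkl,kld->nkd", codes, self.dec)  # (n, K, d) one reconstruction each
        errs = np.linalg.norm(x[:, None, :] - recons, axis=2)  # (n, K) per-encoder error
        recon_error = (weights * errs).sum(axis=1)           # (n,) allocation-weighted error
        # Concatenate the weighted codes of all encoders and append the error.
        weighted = (weights[:, :, None] * codes).reshape(len(x), -1)
        z = np.concatenate([weighted, recon_error[:, None]], axis=1)
        return z, weights, recon_error


def gmm_energy(z, gamma, eps=1e-6):
    """Negative log-likelihood of each row of z under a Gaussian mixture whose
    parameters are estimated from the soft memberships gamma (assumption:
    the allocation weights are reused as memberships for brevity)."""
    n, m = z.shape
    phi = gamma.mean(axis=0)                             # mixture proportions
    mu = (gamma.T @ z) / gamma.sum(axis=0)[:, None]      # component means
    log_terms = []
    for k in range(gamma.shape[1]):
        diff = z - mu[k]
        cov = (gamma[:, k, None] * diff).T @ diff / gamma[:, k].sum()
        cov += eps * np.eye(m)                           # keep covariance invertible
        _, logdet = np.linalg.slogdet(cov)
        maha = np.einsum("ni,ni->n", diff @ np.linalg.inv(cov), diff)
        log_terms.append(np.log(phi[k]) - 0.5 * (maha + logdet + m * np.log(2 * np.pi)))
    log_terms = np.stack(log_terms, axis=1)              # (n, K)
    top = log_terms.max(axis=1, keepdims=True)           # log-sum-exp trick
    return -(top[:, 0] + np.log(np.exp(log_terms - top).sum(axis=1)))


# Synthetic data, not taken from the paper.
rng = np.random.default_rng(1)
x = rng.normal(size=(200, 8))
model = TinyMixedAutoEncoder(n_features=8, latent_dim=2, n_encoders=2)
z, weights, recon_error = model.forward(x)
dominant = weights.argmax(axis=1)          # dominant auto-encoder per sample
energy = gmm_energy(z, weights)            # higher energy = less likely sample
flagged = energy > np.quantile(energy, 0.95)
```

In a trained model the encoders, decoders, and allocation network would be learned jointly; here they are random, so the flagged samples carry no meaning beyond showing how an energy threshold would be applied.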
This article is indexed in VIP (维普), Wanfang Data (万方数据), and other databases.