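An interpolation-based outlier detection method for high-dimensional sparse data
Cite this article: 陈旺虎, 田真, 张礼智, 梁小燕, 高雅琼. An interpolation-based outlier detection method for high-dimensional sparse data [J]. 计算机工程与科学 (Computer Engineering & Science), 2020, 42(6): 966-972
Authors: 陈旺虎 (CHEN Wang-hu), 田真 (TIAN Zhen), 张礼智 (ZHANG Li-zhi), 梁小燕 (LIANG Xiao-yan), 高雅琼 (GAO Ya-qiong)
Affiliation: (College of Computer Science and Engineering, Northwest Normal University, Lanzhou, Gansu 730070)
Abstract: The data in an outlier detection problem can be viewed as a dense mixture of normal and abnormal points in a space. Provided that the loss of normal points is kept low, the outliers are usually contained in the set of samples farthest from the cluster centers. Inspired by this idea, an interpolation-based outlier detection method for high-dimensional sparse data is proposed. Building on K-means, the method applies a genetic algorithm to interpolate the original data, which addresses the problem that sparse data tend to be merged during K-means clustering. Experimental results show that, compared with outlier detection based on traditional K-means clustering and with several typical detection methods based on improved K-means, the proposed method loses fewer normal points and improves the accuracy and precision of detection.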

Keywords: sparse data, outlier detection, interpolation, clustering, genetic algorithm
Received: 2019-04-22
Revised: 2019-12-11

An interpolation based outlier detection method of sparse high-dimensional data
CHEN Wang-hu, TIAN Zhen, ZHANG Li-zhi, LIANG Xiao-yan, GAO Ya-qiong. An interpolation based outlier detection method of sparse high-dimensional data[J]. Computer Engineering & Science, 2020, 42(6): 966-972
Authors: CHEN Wang-hu, TIAN Zhen, ZHANG Li-zhi, LIANG Xiao-yan, GAO Ya-qiong
Affiliation: (College of Computer Science & Engineering, Northwest Normal University, Lanzhou 730070, China)
Abstract: The data in an outlier detection problem can be regarded as a mixture of normal and abnormal points in a space. Provided that the loss of normal points is kept low, outliers are usually contained in the set of samples farthest from all cluster centroids. Inspired by this idea, this paper proposes an interpolation-based outlier detection method for sparse high-dimensional data. The method builds on k-means clustering and applies a genetic algorithm to interpolate the original data, addressing the problem that sparse data tend to be merged during k-means clustering. Experimental results show that, compared with traditional outlier detection methods based on k-means clustering and several typical detection methods based on improved k-means clustering, the proposed method loses fewer normal points and improves both the accuracy and the precision of detection.
Keywords: sparse data, outlier detection, interpolation, clustering, genetic algorithm
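
The starting idea in the abstract, that outliers tend to lie among the samples farthest from every cluster centroid, can be illustrated with a minimal sketch. This shows only the plain k-means baseline that the paper compares against, not the proposed method: the abstract does not give the details of the genetic-algorithm interpolation step, so it is not reproduced here. The function names (`kmeans`, `farthest_points`, `accuracy_and_precision`), the explicit `init` centroids, and the toy data are all assumptions made for illustration.

```python
import math


def kmeans(points, init, iters=50):
    """Plain Lloyd's k-means from given initial centroids; returns centroids."""
    centroids = [list(c) for c in init]
    k = len(centroids)
    for _ in range(iters):
        # Assignment step: group each point with its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[j].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            [sum(col) / len(cl) for col in zip(*cl)] if cl else centroids[j]
            for j, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids


def farthest_points(points, centroids, n_outliers):
    """Indices of the n_outliers points farthest from their nearest centroid."""
    score = [min(math.dist(p, c) for c in centroids) for p in points]
    ranked = sorted(range(len(points)), key=lambda i: score[i], reverse=True)
    return set(ranked[:n_outliers])


def accuracy_and_precision(flagged, true_outliers, n_points):
    """Accuracy = (TP + TN) / N; precision = TP / (TP + FP)."""
    tp = len(flagged & true_outliers)
    fp = len(flagged - true_outliers)
    fn = len(true_outliers - flagged)
    tn = n_points - tp - fp - fn
    precision = tp / (tp + fp) if flagged else 0.0
    return (tp + tn) / n_points, precision


# Hypothetical toy data: two tight groups plus one isolated point (index 8).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 30)]
centroids = kmeans(points, init=[points[0], points[4]])
flagged = farthest_points(points, centroids, n_outliers=1)
print(flagged)                                            # {8}
print(accuracy_and_precision(flagged, {8}, len(points)))  # (1.0, 1.0)
```

In this toy run the isolated point is absorbed into the second cluster and drags its centroid away from the normal points. That is a small-scale picture of the merging problem the abstract attributes to k-means on sparse data. A normal point wrongly flagged in such a setting counts as a false positive, which lowers precision.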
This article is indexed in databases including 维普 (VIP) and 万方数据 (Wanfang Data).