
Algorithm for Outlier Detection in Large Dataset Based on Weighted KNN
Cite this article: WANG Qian, YANG Zheng-kuan. Algorithm for Outlier Detection in Large Dataset Based on Weighted KNN[J]. Computer Science, 2011, 38(10): 177-180.
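Authors: WANG Qian, YANG Zheng-kuan
Affiliation: College of Computer Science, Chongqing University, Chongqing 400044, China
Funding: Supported by the National Natural Science Foundation of China (61073058)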
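Abstract: The traditional KNN algorithm builds on distance-based outlier detection and is used to mine outliers in large datasets. However, it decides whether a point is an outlier solely by the distance to that point's k-th nearest neighbor, and this criterion is sometimes inaccurate. This paper presents a KNN-based outlier detection algorithm for large datasets. The algorithm extends the traditional KNN method by assigning each data point a weight equal to its average distance to its k nearest neighbors. Outliers are the points with the largest distance to their k-th nearest neighbor and, among points equal on that distance, the largest weight. The algorithm improves the accuracy of outlier detection. Experiments verify its feasibility and compare its performance with that of the traditional KNN algorithm.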

Keywords: Outlier, Data mining, Weight, Partition
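The ranking rule described in the abstract can be sketched as follows. This is a minimal brute-force illustration under stated assumptions, not the authors' implementation.

- **Distance:** Euclidean distance is assumed.
- **Score:** D^k(p) is the distance from p to its k-th nearest neighbor. The weight is w(p) = (1/k) * sum of the distances from p to its k nearest neighbors.
- **Ties:** "among points equal on that distance" is read as exact ties on D^k, broken by w(p).
- **Output:** the top n points are reported.
- **Names:** the function names `knn_scores` and `weighted_knn_outliers` are hypothetical.
- **Scaling:** the sketch compares every pair of points, so it runs in O(N^2) time. It does not attempt the partition-based pruning suggested by the keyword "Partition".

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def knn_scores(points: List[Point], k: int) -> List[Tuple[float, float]]:
    """Hypothetical helper: for each point return (D^k, weight), where D^k is the
    distance to its k-th nearest neighbour and weight is the mean distance to
    its k nearest neighbours (Euclidean distance assumed)."""
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < number of points")
    scores = []
    for i, p in enumerate(points):
        # Brute force: distances from p to every other point, sorted ascending.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        nearest = dists[:k]
        scores.append((nearest[-1], sum(nearest) / k))
    return scores


def weighted_knn_outliers(points: List[Point], k: int, n: int) -> List[int]:
    """Hypothetical helper: indices of the top-n outliers, ranked by D^k
    descending, with exact ties on D^k broken by the weight, also descending."""
    scores = knn_scores(points, k)
    # Tuples compare element-wise: D^k first, then weight.
    order = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return order[:n]


if __name__ == "__main__":
    # Six clustered points, a close pair at x=11 and x=10, and an isolated point at x=-5.
    pts = [(float(x), 0.0) for x in (0, 1, 2, 3, 4, 5, 11, 10, -5)]
    print(weighted_knn_outliers(pts, k=2, n=3))  # -> [8, 6, 7]
```

In this example k = 2. The points at x = 11 (index 6) and x = -5 (index 8) both have D^k = 6, so ranking by D^k alone cannot order them. Their weights are 3.5 and 5.5, so the weight places the isolated point at x = -5 first.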

This article is indexed in CNKI, Wanfang Data, and other databases.