
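Finding Outliers in High-Dimensional Space
Cite this article: WEI Li, GONG Xue-qing, QIAN Wei-ning, ZHOU Ao-ying. Finding Outliers in High-Dimensional Space[J]. Journal of Software, 2002, 13(2): 280-290.
Authors: WEI Li, GONG Xue-qing, QIAN Wei-ning, ZHOU Ao-ying
Affiliation: Department of Computer Science and Engineering, Fudan University, Shanghai 200433
Funding: National Natural Science Foundation of China (60003016; 60003008); National Key Basic Research Development Program (973 Program) of China (G1998030404)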
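Abstract: In many KDD (knowledge discovery in databases) applications, such as fraud detection in e-commerce, finding exceptions or outliers is more meaningful than finding common knowledge. Most existing outlier detection methods target numerical attributes, and they can only find outliers without explaining what they mean. This paper proposes a hypergraph-based definition of outliers that captures the notion of locality and also explains the meaning of the outliers well. It also presents the HOT (hypergraph-based outlier test) algorithm, which detects outliers by computing the support, belongingness, and deviation of size of each point. The algorithm handles both numerical and categorical attributes. Analysis shows that it can effectively find outliers in high-dimensional data.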

Keywords: data mining; outlier; hypergraph model; clustering
Article ID: 1000-9825/2002/13(02)0280-11
Received: 2001/4/20
Revised: 2001/4/20

Finding Outliers in High-Dimensional Space
WEI Li, GONG Xue-qing, QIAN Wei-ning and ZHOU Ao-ying. Finding Outliers in High-Dimensional Space[J]. Journal of Software, 2002, 13(2): 280-290.
Authors: WEI Li, GONG Xue-qing, QIAN Wei-ning and ZHOU Ao-ying
Abstract: For many KDD (knowledge discovery in databases) applications, such as fraud detection in e-commerce, finding exceptional instances or outliers is more interesting than finding common knowledge. Most existing work on outlier detection deals with numerical attributes, and these methods give no explanation of the outliers they find. In this paper, a hypergraph-based outlier definition is presented, which considers the locality of the data and can explain the outliers well. An algorithm called HOT (hypergraph-based outlier test) is also given, which finds outliers by computing three measures for each vertex in the hypergraph: the support, the belongingness, and the deviation of size. This algorithm can handle both numerical and categorical attributes. Analysis shows that this approach can find outliers in high-dimensional space effectively.
Keywords: data mining; outlier; hypergraph model; clustering