The <Emphasis Type="Italic">k</Emphasis>-Nearest Neighbour Join: Turbo Charging the KDD Process期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

The <Emphasis Type="Italic">k</Emphasis>-Nearest Neighbour Join: Turbo Charging the KDD Process

Authors:	Email author" target="_blank">Christian?B?hm Email author Florian?Krebs

Affiliation:	(1) University of Munich, Oettingenstr. 67, 80538 München, Germany;(2) University of Munich, München, Germany

Abstract:	The similarity join has become an important database primitive for supporting similarity searches and data mining. A similarity join combines two sets of complex objects such that the result contains all pairs of similar objects. Two types of the similarity join are well-known, the distance range join, in which the user defines a distance threshold for the join, and the closest pair query or k-distance join, which retrieves the k most similar pairs. In this paper, we propose an important, third similarity join operation called the k-nearest neighbour join, which combines each point of one point set with its k nearest neighbours in the other set. We discover that many standard algorithms of Knowledge Discovery in Databases (KDD) such as k-means and k-medoid clustering, nearest neighbour classification, data cleansing, postprocessing of sampling-based data mining, etc. can be implemented on top of the k-nn join operation to achieve performance improvements without affecting the quality of the result of these algorithms. We propose a new algorithm to compute the k-nearest neighbour join using the multipage index (MuX), a specialised index structure for the similarity join. To reduce both CPU and I/O costs, we develop optimal loading and processing strategies.

Keywords:	Data mining Knowledge discovery in databases (KDD) Similarity join Nearest neighbour Multimedia database High-dimensional indexing
本文献已被 SpringerLink 等数据库收录！

设为首页 | 免责声明 | 关于勤云 | 加入收藏