首页 | 本学科首页   官方微博 | 高级检索  
     

利用分区和距离实现高维空间快速KNN查询
引用本文:梁俊杰,王长磊.利用分区和距离实现高维空间快速KNN查询[J].计算机研究与发展,2007,44(11):1980-1985.
作者姓名:梁俊杰  王长磊
作者单位:1. 湖北大学数学与计算机科学学院,武汉,430062;华中科技大学计算机科学与技术学院,武汉,430074
2. 湖北省公安厅行动技术总队,武汉,430072
基金项目:国家高技术研究发展计划(863计划)
摘    要:在高维空间KNN查询算法中,近似向量和一维转换表示法能有效克服维数灾难,结合这两种思想,提出一种基于区位码和距离的索引结构(BD)以实现快速KNN查询.根据高维空间向量分布特点,合理分区使得大量分布在空间表面的点尽可能地划分到不同的分区中,提高检索剪枝效率.引入区位码概念和转换函数,将高维向量近似表示并转换为一维数值形式,组织成B 树索引.利用快速KNN查询算法,实现两层过滤,缩小搜索范围,降低树搜索代价.采用模拟数据和真实数据,大量实验验证了BD比其他同类索引具有更高的检索效率.

关 键 词:高维向量空间  KNN查询  区位码  近似向量  索引结构  利用分区  距离  高维空间  快速  查询  Spaces  Search  Fast  Distance  检索效率  树索引  实验验证  真实数据  模拟数据  搜索代价  搜索范围  过滤  组织  数值  近似表示
修稿时间:2006-05-25

Indexing Bit-Code and Distance for Fast KNN Search in High-Dimensional Spaces
Liang Junjie,Wang Changlei.Indexing Bit-Code and Distance for Fast KNN Search in High-Dimensional Spaces[J].Journal of Computer Research and Development,2007,44(11):1980-1985.
Authors:Liang Junjie  Wang Changlei
Abstract:In the recent literature, a variety of index structures have been proposed to facilitate high-dimensional KNN queries, among which the techniques of approximate vector presentation and one dimensional transformation can efficiently break the curse of dimensionality. Based on the two techniques above, a novel high-dimensional index, called bit-code and distance based index (BD) is proposed. On the basis of the data distribution in high-dimensional spaces, the BD partitions the surface of dimensionality in a special way, such that all the data points are divided into lots of partitions according to some cluster centroids. By the definitions of bit code and transformation function, a high-dimensional vector can be first approximately represented and then transformed into a one-dimensional vector, the key managed by a special B -tree. During the process of KNN search, the BD enables two levels filtering: the transformation function prunes away points that do not share similar distance ranges, while the bit code component filters away points by the lower bounded distance. A fast algorithm is presented for KNN search, by which one can greatly reduce the number of distance computations and the cost of the tree search. By using both synthetic and real data, the results of extensive experiments demonstrate that the BD outperforms the existing index structures for KNN search in high-dimensional spaces.
Keywords:high-dimensional vector space  KNN search  bit code  approximate vector  index structure
本文献已被 CNKI 维普 万方数据 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号