首页 | 本学科首页   官方微博 | 高级检索  
     

MapReduce框架下的优化高维索引与KNN查询
引用本文:梁俊杰,李凤华,刘琼妮,尹利. MapReduce框架下的优化高维索引与KNN查询[J]. 电子学报, 2016, 44(8): 1873-1880. DOI: 10.3969/j.issn.0372-2112.2016.08.015
作者姓名:梁俊杰  李凤华  刘琼妮  尹利
作者单位:1. 湖北大学计算机与信息工程学院, 湖北武汉 430062;2. 中国科学院信息工程研究所信息安全国家重点实验室, 北京 100093;3. 北京电子科技学院, 北京 100070
基金项目:国家发改委2012年信息安全专项(No.发改办高技[2013]1309);国家自然科学基金(No.61170251);湖北省自然科学基金重点项目(No.2013CFA115);武汉市科技攻关计划(2013012401010851)
摘    要:针对大规模高维数据近似查询效率低下的问题,利用MapReduce编程模型在大规模集群上的数据与任务的并行计算与处理优势,提出MapReduce框架下大规模高维数据索引及KNN查询方法(iPBM),重点突破MapReduce数据块(block)的优化划分与各数据块对计算的共同贡献两大难题,利用两阶段数据划分策略并依据相关性与并行性原则将数据均匀分配到各数据块中,设计分布式的双层空间索引结构与并行KNN查询算法,检索时利用全局索引、局部索引与二维位码索引实现三层数据过滤,大幅缩小搜索范围并降低高维向量计算代价,实验表明iPBM对大规模高维数据的近似查询具有准确性、高效性和扩展性.

关 键 词:云计算  MapReduce  KNN查询  高维索引  
收稿时间:2014-12-31

Optimized High-Dimensional Index and KNN Query in MapReduce
LIANG Jun-jie,LI Feng-hua,LIU Qiong-ni,YIN Li. Optimized High-Dimensional Index and KNN Query in MapReduce[J]. Acta Electronica Sinica, 2016, 44(8): 1873-1880. DOI: 10.3969/j.issn.0372-2112.2016.08.015
Authors:LIANG Jun-jie  LI Feng-hua  LIU Qiong-ni  YIN Li
Affiliation:1. Department of Computer Science and Technology, Hubei University, Wuhan, Hubei 430062, China;2. State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China;3. Beijing Electronic Science & Technology Institute, Beijing 100070, China
Abstract:To address the low efficiency problem caused by the approximate large-scale high-dimensional data query, we propose a novel high-dimensional index and KNN query method,called iPBM,which exploits two main problems,inclu-ding the optimal division on the MapReduce’s data block and their contributions to the computing.Specifically,based on the principles of relativity and parallelity,iPBM employs a two-phase partitioning scheme of clustering and zoning to equally split the data to the available blocks,then we design a distributed two-layer index structure and parallel KNN query algo-rithm.With fully considering the global index,local index and two-dimensional bitcode property,iPBM achieves triple-layers filtering,and thus the number of queried area and the computing cost on the high-dimensional data is minimized.The accura-cy,efficiency and scalability of the proposed iPBM are thoroughly evaluated via detailed simulations.
Keywords:cloud  MapReduce  KNN query  high-dimensional index
本文献已被 万方数据 等数据库收录!
点击此处可从《电子学报》浏览原始摘要信息
点击此处可从《电子学报》下载免费的PDF全文
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号