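High dimensional index based on B+ tree in MapReduce framework
Cite this article: Liang Junjie, Xiao Yao, Yu Dunhui. High dimensional index based on B+ tree in MapReduce framework[J]. Application Research of Computers, 2016, 33(3).
Authors: Liang Junjie, Xiao Yao, Yu Dunhui
Affiliation: Hubei University (all authors)
Funding: Provincial Natural Science Foundation project; other
Abstract: Targeting MapReduce's data-block processing mechanism, the distribution characteristics of high-dimensional data, and the requirements of KNN queries, this paper designs a B+-tree-based high-dimensional index structure (iPartition). It proposes an optimized data partitioning strategy based on principal component discrimination, together with the principle of storing adjacent data regions on separate nodes, so that data is divided evenly across the Slave nodes and as many data regions as possible contribute jointly to the computation, which increases the parallelism of MapReduce task processing. B+-trees are used to build a distributed two-level index that quickly filters the data range at query time and reduces the cost of high-dimensional computation. Experiments show that iPartition offers good performance and scalability for approximate queries over high-dimensional data.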

Keywords: big data; MapReduce; KNN query; high-dimensional index
Received: 2014/11/3
Revised: 2015/1/17

High dimensional index based on B+ tree in MapReduce framework
Liang Junjie, Xiao Yao and Yu Dunhui. High dimensional index based on B+ tree in MapReduce framework[J]. Application Research of Computers, 2016, 33(3).
Authors: Liang Junjie, Xiao Yao and Yu Dunhui
Affiliation: Hubei University, Hubei University, Hubei University
Abstract: Based on MapReduce's block-oriented processing mechanism, the distribution characteristics of high-dimensional data, and the requirements of KNN queries, a novel B+-tree-based high-dimensional index called iPartition is proposed. For efficient parallel computing in MapReduce, the data is partitioned over multiple servers, each of which autonomously stores a fraction of the data. To this end, the data partitioning strategy is optimized using principal component discrimination, and adjacent data blocks are stored on different nodes, so that the data is split evenly across the available blocks and the nodes have roughly equal runtime. A distributed index, consisting of a global index file on the Master machine and a set of local index files on a cluster of Slave machines, is used to efficiently identify the servers that store the KNN results. Furthermore, the B+-tree is extended to organize multi-dimensional data points in PCA space, which reduces the number of queried servers and the amount of transferred data. Experiments show that iPartition performs efficiently and scales well, providing a viable solution for high-dimensional data.
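
The abstract gives no implementation details, so the following is only a minimal single-process sketch of the general idea it describes, under several assumptions that are not taken from the paper: points are keyed by their projection onto the first principal component only (the paper's "principal component discrimination" criterion is not specified here), partitions are equal-count ranges of that key, adjacent partitions are placed on different nodes by simple round-robin, and a sorted list searched with `bisect` stands in for the B+-tree of the global index. All function names (`first_principal_axis`, `build_index`, `knn`, `make_demo_points`) are hypothetical.

```python
import bisect
import math
import random


def first_principal_axis(points, iters=100):
    """Estimate the first principal component by power iteration (pure Python)."""
    dim = len(points[0])
    mean = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    centered = [[p[d] - mean[d] for d in range(dim)] for p in points]
    axis = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iters):
        # Apply the (unnormalized) covariance matrix: X^T (X v).
        scores = [sum(c[d] * axis[d] for d in range(dim)) for c in centered]
        nxt = [sum(s * c[d] for s, c in zip(scores, centered)) for d in range(dim)]
        norm = math.sqrt(sum(x * x for x in nxt)) or 1.0
        axis = [x / norm for x in nxt]
    return mean, axis


def project(point, mean, axis):
    """1-D key of a point: its coordinate along the principal axis."""
    return sum((x - m) * a for x, m, a in zip(point, mean, axis))


def build_index(points, num_partitions, num_nodes):
    """Assumed scheme: equal-count key ranges, round-robin node placement."""
    mean, axis = first_principal_axis(points)
    keyed = sorted((project(p, mean, axis), p) for p in points)
    size = math.ceil(len(keyed) / num_partitions)
    partitions = [keyed[i:i + size] for i in range(0, len(keyed), size)]
    # "Global index": lower-bound key per partition (sorted list, not a real B+-tree).
    bounds = [part[0][0] for part in partitions]
    # Adjacent partitions land on different nodes when num_nodes >= 2.
    placement = [i % num_nodes for i in range(len(partitions))]
    return {"mean": mean, "axis": axis, "bounds": bounds,
            "partitions": partitions, "placement": placement}


def knn(index, query, k, spread=1):
    """Approximate KNN: scan the home partition and `spread` neighbors per side."""
    key = project(query, index["mean"], index["axis"])
    home = max(bisect.bisect_right(index["bounds"], key) - 1, 0)
    lo = max(home - spread, 0)
    hi = min(home + spread, len(index["partitions"]) - 1)
    candidates = [p for part in index["partitions"][lo:hi + 1] for _, p in part]
    candidates.sort(key=lambda p: math.dist(p, query))
    nodes = {index["placement"][i] for i in range(lo, hi + 1)}
    return candidates[:k], nodes


def make_demo_points(n, dim, seed=0):
    """Synthetic data with one dominant direction, for illustration only."""
    rng = random.Random(seed)
    return [tuple(rng.gauss(0, 3 if d == 0 else 1) for d in range(dim))
            for _ in range(n)]
```

For example, `build_index(make_demo_points(800, 5), num_partitions=8, num_nodes=4)` yields eight partitions of 100 points each, and `knn(index, query, k=5)` returns the candidate neighbors along with the set of nodes that would have been contacted. Because neighboring key ranges sit on different nodes, a query that touches several adjacent partitions spreads its work across several nodes, which is the parallelism benefit the abstract refers to.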