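A Load-Balanced Top-k Join Query Algorithm in the MapReduce Framework
Cite this article: Hu Dongming, Liu Xumin, Xu Weixiang. A Load-Balanced Top-k Join Query Algorithm in the MapReduce Framework [J]. Computer Measurement & Control (计算机测量与控制), 2018, 26(8): 238-242.
Authors: Hu Dongming (胡东明), Liu Xumin (刘旭敏), Xu Weixiang (徐维祥)
Affiliations: College of Information Engineering, Capital Normal University; College of Information Engineering, Capital Normal University; College of Traffic and Transportation, Beijing Jiaotong University
Funding: National Natural Science Foundation of China (61672002); Beijing Great Wall Scholars Program (CIT TCD20170322)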
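Abstract: To address the poor time efficiency of traditional Top-k join query algorithms on massive data, a load-balanced parallel Top-k join query algorithm based on the MapReduce framework (P-TKJ) is proposed. Data are stored in histogram form, which helps raise CPU utilization. An early termination strategy and selective access to on-disk data are also built in to improve the performance of HDFS data access. In addition, a load balancing strategy based on the longest processing time first (LPT) algorithm is proposed to balance the Reduce tasks, yielding an efficient parallel Top-k join algorithm. Results from a cluster experiment show that the method effectively shortens the algorithm's execution time.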

Keywords: Top-k join query; MapReduce framework; load balancing; execution time
Received: 2018/1/11
Revised: 2018/1/29

A Load Balancing Top-k Join Query Algorithm in MapReduce Framework
Affiliation: College of Information Engineering, Capital Normal University; College of Information Engineering, Capital Normal University; College of Traffic and Transportation, Beijing Jiaotong University
Abstract: Traditional Top-k join algorithms suffer from poor time efficiency when dealing with massive data. To address this, a load-balanced parallel Top-k join query algorithm (P-TKJ) based on the MapReduce framework is proposed. It stores data in histograms, which helps to increase CPU utilization. An early termination strategy and a mechanism for selective access to disk data are incorporated to improve the performance of HDFS data access. In addition, a load balancing strategy based on the longest processing time first (LPT) algorithm is proposed to balance the Reduce tasks, so as to obtain an efficient parallel Top-k join algorithm. A cluster experiment shows that this method can effectively shorten the execution time of the algorithm.
Keywords: Top-k join query; MapReduce framework; Load balancing; Execution time
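
The abstract names LPT as the basis of the load balancing strategy but does not give details. As a rough illustration only, the classic LPT greedy rule can be sketched as follows: sort the work units by estimated cost in descending order, then hand each one to the currently least-loaded Reduce task. The function name `lpt_assign`, the `group_costs` input (imagined here as per-group cost estimates, for example derived from histogram bucket counts), and the sample numbers are all assumptions made for this sketch, not the paper's actual implementation.

```python
import heapq


def lpt_assign(group_costs, num_reducers):
    """Greedy LPT: assign each group to the least-loaded reducer,
    visiting groups from largest to smallest estimated cost.

    group_costs: dict mapping a (hypothetical) group id to its estimated cost.
    Returns (assignment, loads), both keyed by reducer index.
    """
    # Min-heap of (current load, reducer index); all reducers start empty.
    heap = [(0, r) for r in range(num_reducers)]
    heapq.heapify(heap)
    assignment = {r: [] for r in range(num_reducers)}

    # Largest cost first; ties are broken by group id for determinism.
    ordered = sorted(group_costs.items(), key=lambda kv: (-kv[1], kv[0]))
    for group, cost in ordered:
        load, r = heapq.heappop(heap)       # least-loaded reducer so far
        assignment[r].append(group)
        heapq.heappush(heap, (load + cost, r))

    loads = {r: sum(group_costs[g] for g in groups)
             for r, groups in assignment.items()}
    return assignment, loads


if __name__ == "__main__":
    # Made-up cost estimates for five join groups, spread over two reducers.
    costs = {"g1": 7, "g2": 5, "g3": 4, "g4": 3, "g5": 1}
    assignment, loads = lpt_assign(costs, 2)
    print(assignment)
    print(loads)
```

In a Hadoop job, a mapping of this kind would typically be consulted by a custom partitioner so that each join group is routed to its assigned Reduce task; how P-TKJ wires this in is not stated in the abstract.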