Study of multi-keywords sorting method based on Hadoop
Citation: ZHOU Guojun. Study of multi-keywords sorting method based on Hadoop[J]. Computer Engineering and Applications, 2016, 52(17): 79-83
Author: ZHOU Guojun
Affiliation: School of Mathematics and Information Science, Yulin Normal University, Yulin, Guangxi 537000, China
Abstract: Sorting big data by multiple keywords on a single machine takes a long time. To improve the efficiency of such sorting, two multi-keyword sorting methods are given, both built on Hadoop's MapReduce model. Method one uses a chain radix sort algorithm in the Reduce function to sort big data by multiple keywords in parallel, drawing on the computing power of multiple nodes. Method two defines a composite key and a comparator so that the multiple keywords of records are compared byte by byte, which saves the time otherwise spent deserializing byte streams into objects. The performance of the two methods was tested experimentally, and the results show that both achieve high sorting efficiency and good scalability.

Keywords: Hadoop  MapReduce model  multi-keywords sort  radix sort
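
The abstract names chain radix sort as the algorithm applied inside the Reduce function but gives no listing. The following is a minimal single-process sketch of that general technique, not the paper's implementation: each record is passed through queues once per keyword, starting with the least significant keyword. The function name `chain_radix_sort`, the tuple record layout, and the assumption that every keyword is an integer in `range(radix)` are all illustrative choices, not details from the source.

```python
from collections import deque


def chain_radix_sort(records, num_keys, radix):
    """Sort records by their first num_keys fields (field 0 most significant).

    Assumption of this sketch: each of those fields is an integer in
    range(radix). Records with equal keywords keep their input order.
    """
    chain = deque(records)
    # Process the least significant keyword first (LSD order).
    for k in range(num_keys - 1, -1, -1):
        buckets = [deque() for _ in range(radix)]
        # Distribute: append each record to the queue for its k-th keyword.
        while chain:
            rec = chain.popleft()
            buckets[rec[k]].append(rec)
        # Collect: link the queues back into one chain in keyword order.
        for bucket in buckets:
            chain.extend(bucket)
    return list(chain)


# Hypothetical records: three keywords followed by a payload.
records = [(2, 1, 3, "a"), (1, 9, 0, "b"), (2, 1, 1, "c"), (1, 0, 5, "d")]
print(chain_radix_sort(records, num_keys=3, radix=10))
```

In a MapReduce setting, each reducer would presumably run a routine like this over its own partition, so partitions are sorted concurrently on different nodes. How the paper partitions records across reducers is not stated in the abstract.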
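
Method two in the abstract rests on comparing serialized composite keys byte by byte instead of deserializing them first. The sketch below illustrates that idea under stated assumptions and is not the paper's code: a composite key is taken to be two signed 32-bit integers, and the helpers `encode_key`, `decode_key`, and `compare_raw` are hypothetical names. The argument layout of `compare_raw` (buffer plus start offset for each side) is loosely modelled on the shape of Hadoop's `RawComparator.compare`, which likewise receives byte arrays and offsets.

```python
import struct
from functools import cmp_to_key

SIGN_BIAS = 0x80000000  # shifts signed 32-bit values into unsigned range


def encode_key(k1, k2):
    """Serialize two signed 32-bit keywords into 8 order-preserving bytes.

    Big-endian layout plus the sign bias makes unsigned byte order agree
    with numeric order, with k1 as the more significant keyword.
    """
    return struct.pack(">II", k1 + SIGN_BIAS, k2 + SIGN_BIAS)


def decode_key(raw):
    """Inverse of encode_key; the comparison itself never calls this."""
    u1, u2 = struct.unpack(">II", raw)
    return (u1 - SIGN_BIAS, u2 - SIGN_BIAS)


def compare_raw(b1, s1, b2, s2, length=8):
    """Compare two serialized keys in place, byte by byte."""
    for i in range(length):
        x, y = b1[s1 + i], b2[s2 + i]
        if x != y:
            return -1 if x < y else 1
    return 0


# Hypothetical composite keys, sorted without deserializing any of them.
pairs = [(3, -1), (-5, 7), (3, -2), (0, 0), (-5, -9)]
encoded = [encode_key(*p) for p in pairs]
encoded.sort(key=cmp_to_key(lambda a, b: compare_raw(a, 0, b, 0)))
print([decode_key(raw) for raw in encoded])
```

The design point is that the byte encoding must preserve order. With a little-endian layout or without the sign bias, the byte-level comparison would disagree with the numeric order of the keywords.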