
A Parallel ETL Tool Based on an Improved Chain-MapReduce Framework
Cite this article: Bin Wu, Xinguang Liu. A Parallel ETL Tool Based on an Improved Chain-MapReduce Framework[J]. Telecommunications Science, 2013, 29(12): 1-8. DOI: 10.3969/j.issn.1000-0801.2013.12.001
Authors: Bin Wu (吴斌), Xinguang Liu (刘心光)
Affiliation: Telecommunication and Software Engineering Center, School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China
Funding: National Natural Science Foundation of China
Abstract: This paper reviews related work on parallel ETL and common methods for handling workflows made up of multiple MapReduce jobs. It then presents an improved chain-MapReduce framework and applies it to a parallel ETL tool. It also proposes several flow-level optimization rules for ETL processing, so that an ETL flow generates fewer MapReduce jobs and therefore incurs less I/O and network transfer cost. The tool was compared with Hive in large-data experiments on mobile Internet access data from one province. The results show that the tool is on average 10% to 20% faster than Hive.

Keywords: ETL; optimization rule; improved chain-MapReduce
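
The abstract's central idea, that packing more ETL steps into each MapReduce job cuts the number of jobs and hence the intermediate I/O, can be illustrated with a small sketch. This is not the authors' framework or their optimization rules, which the abstract does not detail. It assumes the job shape supported by Hadoop's standard ChainMapper/ChainReducer (one or more map steps, at most one reduce, then optional map steps), and the `Op` class, the `plan_jobs` function, and the operator names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Op:
    """A hypothetical ETL operator in a linear flow."""
    name: str
    needs_shuffle: bool  # True for steps such as aggregation, join, or sort


def plan_jobs(ops: List[Op]) -> List[List[str]]:
    """Greedily pack operators into jobs of shape MAP* [REDUCE MAP*].

    Assumption: record-level operators can be fused into the map side
    before or after a single reduce, as in a chained MapReduce job.
    """
    jobs: List[List[str]] = []
    current: List[str] = []
    has_reduce = False
    for op in ops:
        if op.needs_shuffle and has_reduce:
            # A second shuffle cannot share the job, so close it and start anew.
            jobs.append(current)
            current, has_reduce = [], False
        current.append(op.name)
        has_reduce = has_reduce or op.needs_shuffle
    if current:
        jobs.append(current)
    return jobs


# Illustrative flow; the operator names are made up for this example.
flow = [
    Op("filter", False),
    Op("project", False),
    Op("aggregate", True),
    Op("derive_column", False),
    Op("join", True),
    Op("sort", True),
]

naive_job_count = len(flow)   # one MapReduce job per operator
chained = plan_jobs(flow)     # operators fused into chained jobs

print(naive_job_count, len(chained))
for job in chained:
    print(" -> ".join(job))
```

Under these assumptions, each job boundary that disappears is one less round of writing intermediate results to the distributed file system and reading them back, which is the kind of saving the abstract attributes to generating fewer MapReduce jobs.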