
A BUFFER-BASED PARALLEL ETL DATA FLOW PROCESSING FRAMEWORK
Citation: Luo Houqi, Zhou Wei, Ye Dan, Yu Jinwei. A BUFFER-BASED PARALLEL ETL DATA FLOW PROCESSING FRAMEWORK[J]. Computer Applications and Software, 2012, 0(1): 88-91,144
Authors: Luo Houqi  Zhou Wei  Ye Dan  Yu Jinwei
Affiliation: 1. Technology Center of Software Engineering, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China; 2. Graduate University of Chinese Academy of Sciences, Beijing 100190, China; 3. Jiangsu Rongsheng Heavy Industry Group Co., Ltd., Rugao 226532, Jiangsu, China
Funding: National Science and Technology Major Project, "Core Electronic Devices, High-end Generic Chips and Basic Software" program (2009ZX01043-003-001, 2010ZX01045-001-010); National Key Technology R&D Program (2009BAG18B00)
Abstract: ETL (Extraction-Transformation-Loading) is a key technology for exchanging and sharing information resources within and between enterprises. As enterprise data volumes grow rapidly, improving data processing capacity and execution efficiency has become one of the hard problems that ETL must solve. This paper proposes a buffer-based parallel ETL data flow processing framework. The framework uses a buffer reuse technique based on component classification to reduce memory consumption and the number of data copies. It also uses a parallel scheduling and execution strategy for data processing flows, which provides parallelism at multiple granularities: task, pipeline, and data processing. The method has been implemented and validated in ONCE DQ on the 网驰 platform.

Keywords: Data integration  Data flow  Parallel  Buffer reuse
This article is indexed in CNKI and other databases.