Research and implementation on acquisition scheme of telecom big data based on Hadoop
Cite this article: WANG Baoyou, QIAN Jing, YUAN Shijin. Research and implementation on acquisition scheme of telecom big data based on Hadoop[J]. Telecommunications Science, 2017(1): 135-142.
Authors: WANG Baoyou, QIAN Jing, YUAN Shijin
Abstract: ETL is a critical step in implementing a data warehouse, so designing an ETL flow that processes big data effectively and improves the acquisition efficiency of the operations platform has real practical value. The paper first briefly describes the main data collected by a telecom operator's big data platform. To improve the efficiency of massive data acquisition, it then proposes a hybrid architecture that combines Hadoop and Oracle. Next, it proposes a dynamically triggered ETL scheduling flow and algorithm. Compared with timer-started ETL scheduling, this approach shortens the excessively long waiting time of some flows and avoids congestion caused by resource contention. Finally, based on the system run logs of the Hadoop platform and the Oracle database, it compares how acquisition efficiency relates to data volume on the two platforms. Practice shows that the two platforms in the hybrid architecture complement each other, improve the timeliness of data acquisition, and achieve good results in application.
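The abstract contrasts timer-started scheduling with dynamically triggered scheduling but does not give the algorithm itself. The following is only a minimal sketch of the general idea, not the authors' method: the function `schedule`, the task names, the durations, and the clock times are all hypothetical, and resource contention is deliberately not modeled.

```python
from collections import defaultdict, deque


def schedule(durations, deps, fixed_start=None):
    """Return {task: (start, finish)} in minutes (hypothetical helper).

    fixed_start=None  -> triggered mode: a task starts the moment its
                         last upstream task finishes.
    fixed_start={...} -> timer mode: a task additionally waits for its
                         configured clock time.
    Assumes unlimited workers, so resource contention is not modeled.
    """
    fixed_start = fixed_start or {}
    downstream = defaultdict(list)
    pending = {task: len(deps.get(task, ())) for task in durations}
    for task, upstreams in deps.items():
        for up in upstreams:
            downstream[up].append(task)

    # Earliest moment each task is allowed to start.
    ready_at = {task: fixed_start.get(task, 0) for task in durations}
    queue = deque(task for task, n in pending.items() if n == 0)
    times = {}
    while queue:
        task = queue.popleft()
        start = ready_at[task]
        finish = start + durations[task]
        times[task] = (start, finish)
        # Finishing a task is the event that triggers its dependents.
        for nxt in downstream[task]:
            ready_at[nxt] = max(ready_at[nxt], finish)
            pending[nxt] -= 1
            if pending[nxt] == 0:
                queue.append(nxt)
    return times


# Illustrative three-step flow; all numbers are made up.
durations = {"extract": 10, "transform": 20, "load": 5}
deps = {"transform": ["extract"], "load": ["transform"]}
timer_slots = {"extract": 0, "transform": 30, "load": 60}

triggered = schedule(durations, deps)
timed = schedule(durations, deps, fixed_start=timer_slots)
print(triggered["load"])  # (30, 35)
print(timed["load"])      # (60, 65)
```

In this toy flow the timer-started run finishes at minute 65 because each step idles until its slot, while the triggered run finishes at minute 35. A production scheduler would also need to cap concurrent jobs, which is where the contention the abstract mentions would come into play.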
This article is indexed in Wanfang Data and other databases.