High Energy Physics Data Analysis System Based on MapReduce
Cite this article: ZANG Dong-song, HUO Jing, LIANG Dong, SUN Gong-xing. High Energy Physics Data Analysis System Based on MapReduce[J]. Computer Engineering, 2014, 0(2): 1-5
Authors: ZANG Dong-song  HUO Jing  LIANG Dong  SUN Gong-xing
Affiliations: 1. Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China; 2. University of Chinese Academy of Sciences, Beijing 100049, China
Funding: Key Program of the National Natural Science Foundation of China (90912004)
Abstract: This paper applies the MapReduce approach to high energy physics data analysis and proposes a data analysis system built on the Hadoop framework. By building a database of event TAG information, the system reduces the number of events that require further analysis by 2 to 3 orders of magnitude, which eases I/O pressure and improves the efficiency of analysis jobs. Using an event pre-selection model based on TAG information together with a MapReduce model of event analysis, the paper designs MapReduce libraries for the ROOT framework that handle data splitting, event reading, and result merging. The system is implemented for the BESIII experiment at the Beijing Electron Positron Collider and tested on an 8-node cluster. The results show that it shortens the analysis time for 4×10^6 events by 23%, and that as nodes are added, the number of events analyzed concurrently per second is roughly proportional to the number of cluster nodes, indicating that the event analysis cluster scales well.

Keywords: high energy physics  big data  data analysis  MapReduce model  cluster  distributed computing
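
The workflow summarized in the abstract (TAG-based pre-selection, data splitting, per-split event analysis, result merging) can be illustrated with a minimal pure-Python sketch. This is not the authors' Hadoop/ROOT implementation: the `Tag` fields, the cut values, the toy event store, and every function name below are hypothetical and chosen only to show the shape of the data flow.

```python
from collections import Counter
from dataclasses import dataclass
from functools import reduce
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Tag:
    """Hypothetical per-event summary record, standing in for a TAG database row."""
    event_id: int
    n_charged: int       # illustrative field: number of charged tracks
    total_energy: float  # illustrative field: total energy in GeV


def preselect(tags: Iterable[Tag], min_charged: int, min_energy: float) -> List[int]:
    """Pre-selection: keep only IDs whose TAG passes the cuts, so the
    full event data is read for a small fraction of all events."""
    return [t.event_id for t in tags
            if t.n_charged >= min_charged and t.total_energy >= min_energy]


def split(event_ids: List[int], n_splits: int) -> List[List[int]]:
    """Data splitting: divide the selected IDs into one list per map task."""
    return [event_ids[i::n_splits] for i in range(n_splits)]


def map_task(event_ids: List[int], events: Dict[int, float]) -> Counter:
    """Map: read each selected event and fill a partial histogram
    (1-GeV-wide bins of a toy per-event quantity)."""
    hist: Counter = Counter()
    for eid in event_ids:
        hist[int(events[eid])] += 1
    return hist


def reduce_task(partials: Iterable[Counter]) -> Counter:
    """Reduce: merge the partial histograms into the final result."""
    return reduce(lambda a, b: a + b, partials, Counter())


if __name__ == "__main__":
    # Toy data: 100 events with made-up TAG values and a made-up observable.
    tags = [Tag(i, n_charged=i % 5, total_energy=float(i % 4)) for i in range(100)]
    events = {i: 3.0 + (i % 3) * 0.5 for i in range(100)}

    selected = preselect(tags, min_charged=4, min_energy=3.0)
    partials = [map_task(s, events) for s in split(selected, n_splits=2)]
    print(len(selected), dict(reduce_task(partials)))
```

In a real deployment the map tasks would run on separate worker nodes and the histograms would be ROOT objects; the sketch only shows why pre-selection shrinks the input that the map phase has to read.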
This article is indexed in CNKI, VIP, and other databases.