Implementation and Evaluation of a Massive Data Processing Platform for High Performance Computers
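Cite this article: Huang He, Yi Xiaodong, Li Shanshan, Liao Xiangke. Implementation and evaluation of a massive data processing platform for high performance computers[J]. Journal of Computer Research and Development, 2012(Z1): 357-361.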
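Authors: Huang He, Yi Xiaodong, Li Shanshan, Liao Xiangke
Affiliation: College of Computer, National University of Defense Technology
Funding: National High Technology Research and Development Program of China ("863" Program) (2010ZX01045-001-002-5)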
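Abstract: High performance computers are mainly used in traditional scientific computing. In the cloud computing era, however, data-intensive applications have emerged as a large new class of applications and are becoming increasingly important. This paper explores how to process massive data efficiently on high performance computers, so that they can support data-intensive applications well while continuing to run scientific computations, thereby extending their range of applications. After analyzing the feasibility of implementing and deploying the MapReduce model on high performance computers, we conducted experiments in a high performance computing environment. The results show that the main bottleneck preventing the system from running efficiently is that the parallel I/O capability of the storage system cannot be fully exploited. This bottleneck is caused by contention and conflicts over cluster file system resources under high concurrency. Finally, we propose several approaches to resolving these cluster file system resource conflicts, which are the directions of our future research.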

Keywords: high performance computer; massive data processing; MapReduce programming model

Implementation and Evaluation of Massive Data Processing Paradigm on High Performance Computers
Huang He, Yi Xiaodong, Li Shanshan, and Liao Xiangke. Implementation and Evaluation of Massive Data Processing Paradigm on High Performance Computers[J]. Journal of Computer Research and Development, 2012(Z1): 357-361.
Authors: Huang He, Yi Xiaodong, Li Shanshan, and Liao Xiangke
Affiliation: College of Computer, National University of Defense Technology, Changsha 410073
Abstract: High Performance Computers (HPCs) usually deal with problems in traditional scientific computing, i.e., computation-intensive applications. In the era of cloud computing, data-intensive applications are becoming increasingly important in many fields. This paper explores how to make HPCs process massive data efficiently, so that they support both traditional computation-intensive applications and emerging data-intensive applications. After analyzing the feasibility of deploying the MapReduce paradigm on HPCs, we conduct experiments in a real HPC environment. The results show that the bottleneck of the MapReduce paradigm on HPCs is the storage subsystem. The degradation of its parallel I/O performance is caused by contention and conflicts over cluster file system resources under high concurrency. Finally, we propose several approaches to the cluster file system resource contention problem, which are the directions of our future work.
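For readers unfamiliar with the MapReduce paradigm named in the abstract, the following is a minimal single-process sketch of its map, shuffle, and reduce phases, using word counting as the example. It is not the authors' implementation: the function names `map_reduce`, `word_count_mapper`, and `sum_reducer` are hypothetical, and everything runs in memory on one machine. In a real deployment many map and reduce tasks read inputs and write outputs concurrently through the storage system, which is the layer the abstract identifies as the bottleneck on HPCs.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Hypothetical in-memory MapReduce driver (illustration only)."""
    # Map phase: each input record emits zero or more (key, value) pairs.
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            # Shuffle phase: group all values that share the same key.
            groups[key].append(value)
    # Reduce phase: fold each key's list of values into a single result.
    return {key: reducer(key, values) for key, values in groups.items()}


def word_count_mapper(line):
    # Emit (word, 1) for every whitespace-separated word, case-folded.
    for word in line.split():
        yield word.lower(), 1


def sum_reducer(key, counts):
    # Total the per-word counts produced by the mappers.
    return sum(counts)


if __name__ == "__main__":
    lines = ["MapReduce on HPC", "HPC storage on HPC"]
    print(map_reduce(lines, word_count_mapper, sum_reducer))
    # → {'mapreduce': 1, 'on': 2, 'hpc': 3, 'storage': 1}
```

Because the mapper and reducer are plain functions passed to the driver, the same driver can run other jobs by swapping them out, which mirrors how the paradigm separates user logic from the execution framework.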
Keywords: high performance computer; massive data processing; MapReduce paradigm
This article has been indexed by CNKI and other databases.