
An Iterative Clustering Based Approach for Parallel Performance Analysis
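Cite this article: ZHU Peng, LI Wei, LI Yun-Chun. An Iterative Clustering Based Approach for Parallel Performance Analysis[J]. Journal of Software, 2010, 21(Z1): 284-289.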
Authors: ZHU Peng, LI Wei, LI Yun-Chun
Affiliation (all three authors): Beijing Key Laboratory of Network Technology, Beihang University, Beijing 100191, China
Funding: Supported by the National High-Tech Research and Development Plan of China (863 Program) under Grant No. 2007AA01A127
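Abstract: As supercomputers evolve, their core counts are approaching several hundred thousand, and the applications running on them are becoming increasingly complex. Developers therefore need to measure and analyze the performance of parallel applications so that they can optimize the source code and improve execution efficiency. However, because core counts have grown so sharply, measuring the performance of a parallel program produces massive amounts of performance data, and processing this data for performance analysis has become a major challenge. This paper presents an iterative clustering based approach for parallel performance analysis (ICAPPA). The approach uses data-mining clustering algorithms to process the massive performance data and can re-run the clustering iteratively when given conditions are met, in order to identify the functions and processes that dominate the performance of a parallel program. The clustering results are then evaluated with the Bayesian Information Criterion (BIC) to determine whether the iterative clustering is reliable. Finally, experiments demonstrate the effectiveness of the approach.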

Keywords: massive data; parallel application; clustering analysis; performance measurement; performance analysis
Received: 2010-06-15
Revised: 2010-12-10
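The abstract does not specify which clustering algorithm, iteration condition, or BIC formulation ICAPPA uses. The following Python sketch illustrates the general idea under stated assumptions: each process is described by a vector of time spent per function; k-means with k = 2 is the clustering step; a group of processes is split again only while the split raises an X-means-style BIC score for a spherical Gaussian model; and the function names, the `min_size` parameter, and the `dominant_function` heuristic are hypothetical, not taken from the paper.

```python
import math


def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(points, k):
    """Farthest-first initialisation (assumption): deterministic, spreads seeds apart."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sqdist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k, iters=100):
    """Lloyd's k-means; returns (centroids, labels)."""
    centroids = init_centroids(points, k)
    labels = None
    for _ in range(iters):
        # Assignment step: each process joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sqdist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def bic(points, centroids, labels):
    """BIC of a hard-assignment spherical Gaussian mixture (higher is better).

    Assumed X-means-style model: shared variance, K mixing weights, K means.
    """
    n, dim, k = len(points), len(points[0]), len(centroids)
    sse = sum(sqdist(p, centroids[lab]) for p, lab in zip(points, labels))
    var = max(sse / (n * dim), 1e-12)  # pooled MLE variance, floored to avoid log(0)
    sizes = [labels.count(j) for j in range(k)]
    loglik = sum(s * math.log(s / n) for s in sizes if s > 0)  # mixing weights
    loglik += -0.5 * n * dim * math.log(2 * math.pi * var) - sse / (2 * var)
    params = (k - 1) + k * dim + 1  # weights + means + shared variance
    return loglik - 0.5 * params * math.log(n)


def iterative_clustering(points, ids=None, min_size=2):
    """Recursively split a group of processes while a 2-way split improves BIC."""
    ids = list(range(len(points))) if ids is None else ids
    if len(ids) < 2 * min_size:
        return [ids]
    sub = [points[i] for i in ids]
    one_c, one_l = kmeans(sub, 1)
    two_c, two_l = kmeans(sub, 2)
    left = [i for i, lab in zip(ids, two_l) if lab == 0]
    right = [i for i, lab in zip(ids, two_l) if lab == 1]
    if (min(len(left), len(right)) < min_size
            or bic(sub, two_c, two_l) <= bic(sub, one_c, one_l)):
        return [ids]  # split rejected: keep this group as one cluster
    return (iterative_clustering(points, left, min_size)
            + iterative_clustering(points, right, min_size))


def dominant_function(points, groups, names):
    """Hypothetical heuristic: the function whose group means differ the most."""
    dims = range(len(names))
    means = [[sum(points[i][d] for i in g) / len(g) for d in dims] for g in groups]
    spread = [max(m[d] for m in means) - min(m[d] for m in means) for d in dims]
    return names[spread.index(max(spread))]


if __name__ == "__main__":
    # Hypothetical profile: seconds spent in two functions by 8 MPI ranks.
    names = ["compute", "MPI_Wait"]
    profiles = [[10.0, 2.0]] * 6 + [[10.0, 9.0]] * 2
    groups = iterative_clustering(profiles)
    print(groups)                                      # [[0, 1, 2, 3, 4, 5], [6, 7]]
    print(dominant_function(profiles, groups, names))  # MPI_Wait
```

Running the script separates the six balanced ranks from the two ranks that spend much longer in `MPI_Wait`, and the heuristic reports `MPI_Wait` as the function that distinguishes the groups. Splitting only while BIC improves, and only into groups of at least `min_size` processes, is what stops the recursion on homogeneous groups.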
