Similar Documents
1.
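To facilitate performance analysis of parallel programs on heterogeneous platforms, a performance visualization model for heterogeneous environments was designed, building on research into visualization techniques and a parallel computing and control/display platform. Based on the characteristics of this model, and using monitoring-code instrumentation and post-mortem analysis of performance data, the paper presents concrete methods and an implementation for acquiring, converting, and plotting parallel performance data, providing a simple way to collect and convert parallel performance data across platforms. Experimental results demonstrate the feasibility and effectiveness of the method for visualizing parallel performance data in heterogeneous environments.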

2.
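Event-trace-based performance analysis of parallel programs exposes performance problems by analyzing the event records collected on each processor, computing the execution times of program objects, and examining the relationships between events. This requires the event timestamps collected on different processors to be comparable. For various reasons, measured event timestamps are often unsynchronized, which directly hampers performance analysis. The paper introduces the concept and causes of processor clock error, measurement error in parallel program performance analysis, clock conditions, and the requirements for timestamp synchronization, and finally presents a linear error interpolation technique based on constant clock drift that partially solves the timestamp synchronization problem in parallel program performance analysis.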
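The linear interpolation idea can be sketched as follows. This is a minimal Python illustration, not the paper's implementation; it assumes each processor's offset from a reference clock has been measured once near the start and once near the end of the run (for example by a ping-pong exchange with a reference processor) and that the drift is constant in between. `OffsetSample` and `correct_timestamp` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class OffsetSample:
    local_time: float  # timestamp read from this processor's own clock
    offset: float      # measured (local clock - reference clock) at that moment


def correct_timestamp(t_local: float, start: OffsetSample, end: OffsetSample) -> float:
    """Map a local event timestamp onto the reference clock.

    Assumes constant clock drift, so the offset grows linearly between the
    two measurements and can be linearly interpolated (or extrapolated).
    """
    span = end.local_time - start.local_time
    if span <= 0:
        raise ValueError("end sample must be taken after start sample")
    frac = (t_local - start.local_time) / span
    offset = start.offset + frac * (end.offset - start.offset)
    return t_local - offset
```

With the offset measured as 0.5 at local time 0 and 1.5 at local time 100, an event stamped 50.0 locally maps to 49.0 on the reference clock.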

3.
Zhu Peng, Li Wei, Li Yunchun. Journal of Software (软件学报), 2010, 21(Z1): 284-289
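As supercomputers have developed, the number of cores they use has reached hundreds of thousands, and the applications running on them keep growing in complexity. Developers therefore need to measure and analyze the performance of parallel applications in order to optimize the source code and improve execution efficiency. However, because core counts have grown so much, measuring parallel program performance yields massive amounts of performance data, and processing this data to analyze parallel program performance has become a difficult problem. This paper presents an iterative-clustering-based performance analysis method for parallel applications. The method uses data-mining clustering algorithms to process the massive performance data and can be run iteratively according to given conditions to identify the functions and processes that affect parallel program performance; the clustering results are then evaluated with the Bayesian information criterion (BIC) to establish the reliability of the iterative clustering. Experiments demonstrate the effectiveness of the method.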

4.
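A major design approach for parallel program performance analysis tools is source-code instrumentation, and the performance monitoring library is an important component of such tools. This paper proposes an implementation technique for an event-based parallel program performance monitoring library and describes how the monitoring library of a performance analysis tool for an SVM system was implemented.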
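A minimal sketch of an event-based monitoring library, assuming Python and using a decorator to stand in for a probe inserted into the source code. The `EventMonitor`, `probe`, and `durations` names are hypothetical; a real library for an SVM system would typically write events to a per-processor buffer or trace file rather than an in-memory list.

```python
import time
from dataclasses import dataclass, field
from functools import wraps


@dataclass
class Event:
    kind: str         # "enter" or "exit"
    name: str         # name of the instrumented routine
    timestamp: float  # seconds, from time.perf_counter()


@dataclass
class EventMonitor:
    events: list = field(default_factory=list)

    def record(self, kind: str, name: str) -> None:
        self.events.append(Event(kind, name, time.perf_counter()))

    def probe(self, func):
        """Decorator playing the role of an instrumentation probe."""
        @wraps(func)
        def wrapper(*args, **kwargs):
            self.record("enter", func.__name__)
            try:
                return func(*args, **kwargs)
            finally:
                # The exit event is recorded even if the routine raises.
                self.record("exit", func.__name__)
        return wrapper

    def durations(self):
        """Pair enter/exit events into (name, elapsed) tuples.

        Assumes a routine is not re-entered (no recursion) while open.
        """
        open_calls, result = {}, []
        for ev in self.events:
            if ev.kind == "enter":
                open_calls[ev.name] = ev.timestamp
            else:
                result.append((ev.name, ev.timestamp - open_calls.pop(ev.name)))
        return result
```

Decorating a routine with `@monitor.probe` produces one enter and one exit event per call, which `durations()` turns into per-call elapsed times.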

5.
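This paper presents VP^4, a general-purpose performance visualization tool for PVM parallel programs. Tailored to the characteristics of workstation clusters, it uses a multi-level performance data collection method and an event-based collection strategy, so that all performance data can be collected and aggregated while minimizing the "intrusion effect". After processing the aggregated performance data, VP^4 uses graphics and animation to generate a variety of easy-to-use performance views. Experiments show that the tool can effectively help users find performance bottlenecks and develop high-performance parallel programs.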

6.
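To address the shortcomings of collectors designed with traditional methods (a single sensor channel, high power consumption, poor noise immunity, and low acquisition efficiency), a multi-sensor-channel, multi-point parallel high-speed data collector was designed. After analyzing the collector's working principle, the two main interference sources that hinder efficient operation, high-frequency noise and high-frequency carriers, are first filtered out. Data signals are then acquired synchronously and checked, and data acquisition and processing complete the design of the multi-sensor-channel, multi-point parallel high-speed collector. Experimental results show that the improved data collector has low power consumption, strong noise immunity, and high acquisition efficiency.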

7.
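VP^4: A Cluster-Based Performance Visualization Tool for PVM Parallel Programs   Total citations: 1 (self-citations: 0; citations by others: 1)
This paper studies and implements VP^4, a general-purpose performance visualization tool for PVM parallel programs. Tailored to the characteristics of workstation clusters, it uses a multi-level performance data collection method and an event-based collection strategy, so that all performance data can be collected and aggregated while minimizing the "intrusion effect". After processing the aggregated performance data, VP^4 uses graphics and animation to generate a variety of easy-to-use performance views. Experiments show that the tool can effectively help users find performance bottlenecks and develop high-performance parallel programs.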

8.
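To let users quickly and intuitively understand the performance of parallel applications on cluster systems, this paper combines a Linux computing cluster with a Windows control/display platform and proposes an event-based method for visualizing parallel program performance on heterogeneous platforms. The method uses MPI as the underlying programming environment and MPE at the higher level, obtaining program execution information through dynamic performance monitoring; an integrated tool built with C# and the Jumpshot log visualization and analysis tool was designed to visualize parallel program performance. Experimental results show that the method reflects program performance information accurately and intuitively, helps programmers perform quantitative analysis of parallel programs simply and effectively, and has considerable practical value for improving the usability of cluster systems and the performance and efficiency of programs.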

9.
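To meet the needs of engineering practice, this paper designs RFDAQ-Ⅱ, a multi-channel parallel signal acquisition system based on wireless transmission. RFDAQ-Ⅱ consists of one data receiving unit and several parallel data acquisition units; the acquisition units communicate with the receiving unit wirelessly, and each acquisition unit can acquire data from up to four channels in parallel. The paper describes the principle and structure of RFDAQ-Ⅱ and details its hardware structure, wireless communication protocol, and software design.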

10.
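Design of a Wireless Multi-Channel Parallel Data Acquisition System   Total citations: 1 (self-citations: 0; citations by others: 1)
To meet the needs of engineering practice, this paper designs RFDAQ-Ⅱ, a multi-channel parallel signal acquisition system based on wireless transmission. RFDAQ-Ⅱ consists of one data receiving unit and several parallel data acquisition units; the acquisition units communicate with the receiving unit wirelessly, and each acquisition unit can acquire data from up to four channels in parallel. The paper describes the principle and structure of RFDAQ-Ⅱ and details its hardware structure, wireless communication protocol, and software design.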

11.
Tools for performance monitoring and analysis have become indispensable parts of programming environments for parallel computers. As the number of processors increases, conventional techniques for monitoring the performance of parallel programs produce large amounts of data in the form of event trace files. However, this wealth of information is a problem both for the programmer, who is forced to navigate through it, and for the tools that must store and process it. What makes this situation worse is that, most of the time, much of the data is irrelevant to understanding the performance of an application. In this paper, we present a new approach for collecting performance data. By tracing all the events but storing only performance statistics, our approach can provide accurate and useful performance information while requiring far less data to be stored. In addition, this approach also supports real-time performance monitoring.
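The "trace every event, store only statistics" idea can be illustrated with a small Python sketch. It assumes each event arrives as an event type plus a duration and keeps a running count, mean, variance, minimum, and maximum per type using Welford's online algorithm, so memory grows with the number of event types rather than the number of events. The class and method names are hypothetical, and the statistics the paper actually keeps may differ.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RunningStats:
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations (Welford)
    minimum: float = math.inf
    maximum: float = -math.inf

    def add(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)
        self.minimum = min(self.minimum, x)
        self.maximum = max(self.maximum, x)

    @property
    def variance(self) -> float:
        # Sample variance; defined as 0.0 until two samples exist.
        return self.m2 / (self.count - 1) if self.count > 1 else 0.0


class StatisticsTracer:
    """Sees every event but stores O(number of event types) state."""

    def __init__(self) -> None:
        self.stats = defaultdict(RunningStats)

    def on_event(self, event_type: str, duration: float) -> None:
        self.stats[event_type].add(duration)

    def snapshot(self):
        """(count, mean, variance) per event type; can be polled while running."""
        return {k: (s.count, s.mean, s.variance) for k, s in self.stats.items()}
```

Because `snapshot()` only reads the running summaries, it can be called at any time during execution, which is what makes a real-time view cheap in this design.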

12.
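Optimization and Performance Evaluation of Parallel Programs   Total citations: 5 (self-citations: 0; citations by others: 5)
This paper discusses the optimization of parallel programs and argues that optimization should address three aspects: data partitioning, communication optimization, and serial optimization. To overcome the shortcomings of the traditional speedup metric, we propose an optimized-speedup model for evaluating the performance of optimized parallel programs; we optimized the NAS benchmark programs MG and FT and used the optimized-speedup model to analyze their performance on an IBM SP2.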

13.
Parallel simulation of parallel programs for large datasets has been shown to offer a significant reduction in the execution time of many discrete event models. The paper describes the design and implementation of MPI-SIM, a library for the execution-driven parallel simulation of task and data parallel programs. MPI-SIM can be used to predict the performance of existing programs written using MPI for message passing, or written in UC, a data parallel language, compiled to use message passing. The simulation models can be executed sequentially or in parallel. Parallel execution of the models is synchronized using a set of asynchronous conservative protocols. The paper demonstrates how protocol performance is improved by the use of application-level, runtime analysis. The analysis targets the communication patterns of the application. We show the application-level analysis for message passing and data parallel languages. We present validation and performance results for the simulator on a set of applications that includes the NAS Parallel Benchmark suite. The application-level optimization described in the paper yielded significant performance improvements in the simulation of parallel programs, and in some cases completely eliminated the synchronizations in the parallel execution of the simulation model.

14.
Using runtime information on load distributions and processor affinity, the authors propose an adaptive scheduling algorithm and several variations based on different control mechanisms. The proposed algorithm applies different degrees of aggressiveness to adjust loop scheduling granularity, aiming to improve the execution performance of parallel loops by making scheduling decisions that match the actual workload distribution at runtime. They experimentally compared the performance of the algorithm and its variations with several existing scheduling algorithms on two parallel machines: the KSR-1 and the Convex Exemplar. The kernel application programs used for performance evaluation were carefully selected to cover different classes of parallel loops. The results show that using runtime information to adaptively adjust scheduling granularity is an effective way to handle loops with a wide range of load distributions when no prior knowledge of the execution can be used. The overhead of collecting runtime information is insignificant compared with the performance improvement. The experiments show that the adaptive algorithm and its five variations outperformed the existing scheduling algorithms.
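For contrast with the adaptive approach, here is how one well-known fixed-rule scheme, guided self-scheduling (GSS), sizes its chunks: each idle processor takes the ceiling of the remaining iterations divided by the number of processors. This is a Python sketch of GSS, not the authors' adaptive algorithm, and the abstract does not say whether GSS was among the algorithms compared. The authors' algorithm instead adjusts granularity from runtime load information, which this sketch does not model; `guided_chunks` and `min_chunk` are hypothetical names.

```python
import math


def guided_chunks(n_iterations: int, n_procs: int, min_chunk: int = 1) -> list[int]:
    """Chunk sizes handed out, in order, by guided self-scheduling.

    Each request receives ceil(remaining / n_procs) iterations, but never
    fewer than min_chunk (except for the final leftover chunk).
    """
    if n_procs < 1 or min_chunk < 1:
        raise ValueError("n_procs and min_chunk must be positive")
    chunks, remaining = [], n_iterations
    while remaining > 0:
        chunk = max(min_chunk, math.ceil(remaining / n_procs))
        chunk = min(chunk, remaining)  # do not hand out more than is left
        chunks.append(chunk)
        remaining -= chunk
    return chunks
```

For 100 iterations on 4 processors this yields chunks of 25, 19, 14, 11, 8, 6, 5, 3, 3, 2, 1, 1, 1, 1: large chunks early keep scheduling overhead low, and small chunks late help balance the load.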

15.
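An Evaluation Model for Predicting the Efficiency of Parallel Programs   Total citations: 2 (self-citations: 0; citations by others: 2)
Chen Changsheng, Sun Yongqiang, He Jifeng. Journal of Software (软件学报), 2000, 11(11): 1485-1491
Performance analysis of parallel programs, and efficiency analysis in particular, often requires running the program first and then, based on the experimental results, optimizing the parallel algorithm, changing the data distribution strategy, or even choosing a different parallel algorithm. Building on the general-purpose BSP (bulk-synchronous parallel) model of parallel computation, this paper proposes an effective model for evaluating parallel program efficiency, which lets programmers analyze and evaluate program efficiency during the design and analysis phases and optimize programs accordingly. Experimental results show that the model's predictions are accurate.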
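The standard BSP cost model charges each superstep its maximum local work, plus g times its h-relation (the largest number of words any processor sends or receives), plus the barrier latency l. The Python sketch below computes that textbook cost and a predicted efficiency T_seq / (p * T_BSP). It is an illustration under those assumptions, not the authors' model; the `Superstep` structure, function names, and example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Superstep:
    work: list[float]      # local computation time on each processor
    h_relation: list[int]  # per processor: max(words sent, words received)


def bsp_cost(steps: list[Superstep], g: float, l: float) -> float:
    """Textbook BSP cost: sum over supersteps of max(w) + g * max(h) + l."""
    return sum(max(s.work) + g * max(s.h_relation) + l for s in steps)


def predicted_efficiency(sequential_time: float, steps: list[Superstep],
                         g: float, l: float, p: int) -> float:
    """Predicted efficiency E = T_seq / (p * T_BSP) before running the program."""
    return sequential_time / (p * bsp_cost(steps, g, l))
```

With g = 2 and l = 6, a superstep with work (10, 12, 8, 10) and h = 4 costs 12 + 8 + 6 = 26, and a second one with work 5 on every processor and h = 2 costs 15, so the program is predicted to take 41 time units.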

16.
The VMMP (virtual machine for multiprocessors) software package is presented. It provides a coherent set of services for parallel application programs running on diverse multiple instruction, multiple data (MIMD) multiprocessors, including shared memory and message passing multiprocessors. The communication, synchronization, and data distribution requirements of parallel algorithms are analyzed. Related languages and tools are described. VMMP services are identified. VMMP implementation, coding, and portability are discussed. Some measurements of the performance of VMMP application programs and of VMMP overhead are given. Several hints for improving the performance of application programs are described.

17.
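This paper uses speedup as the performance metric for parallel programs and analyzes how the performance of parallel programs differs when they run in guest virtual machines of different types and in different numbers. Experiments show that MPI parallel programs in the xVM virtualization environment perform close to a non-virtualized native host, and that parallel programs perform better under paravirtualization than under full virtualization.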
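Speedup is taken here to be the usual ratio S = T_1 / T_p of serial to parallel run time, with efficiency E = S / p. A minimal Python sketch for tabulating it across environments follows; the environment labels are hypothetical placeholders, and the paper may define T_1 differently (for instance, measured on the native host).

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Classic speedup S = T_1 / T_p."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("run times must be positive")
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, n_procs: int) -> float:
    """Parallel efficiency E = S / p."""
    return speedup(t_serial, t_parallel) / n_procs


def speedup_table(t_serial: float, parallel_times: dict[str, float]) -> dict[str, float]:
    """Speedup per environment, e.g. native host vs. different guest VM setups."""
    return {env: speedup(t_serial, tp) for env, tp in parallel_times.items()}
```

Using the same serial baseline for every environment keeps the speedups directly comparable, so a virtualized run that is slower shows up as a proportionally smaller speedup.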

18.
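Design and Implementation of a Fingerprint Processing Module Board for Embedded Applications   Total citations: 1 (self-citations: 0; citations by others: 1)
This paper describes how a fingerprint detection and processing module board is implemented with a digital signal processor (DSP). On the board, an FPS200 fingerprint sensor chip performs fingerprint detection, and a TMS320VC5402 processor handles fingerprint image acquisition, feature extraction, and feature matching. External Flash stores all programs and the fingerprint features to be retained, and external RAM is added to serve as both data memory and program memory. The system also provides keypad input, and the board can communicate with a PC through an expanded serial port. The paper also explains the memory mapping method, the implementation of the CPLD decoding logic, and the method for parallel booting of the program stored in Flash.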

19.
Current and future processor generations are based on multicore architectures, where the performance increase comes from an increasing number of cores on a chip. To utilize the performance potential of multicore architectures, programs also need to be parallel, but writing parallel programs is a non-trivial task. Transactional memory tries to ease parallel program development by providing atomic and isolated execution of code sequences, enabling software composability and protected access to shared data. In addition, transactional memory can execute atomic code sequences in parallel as long as no data conflicts occur. Transactional memory implementation proposals exist for both hardware and software, as well as hybrid solutions. This special issue on transactional memory introduces transactional memory as a concept, presents an overview of some of the most important approaches so far, and finally includes five articles that advance the state of the art in transactional memory research.
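The optimistic execution described above can be illustrated with a toy software transactional memory in Python. It is a sketch under simplifying assumptions, not any of the proposals in this issue: every shared variable carries a version number, a transaction buffers its writes, and at commit time a single global lock is held while the versions of everything it read are validated; on a conflict the transaction is simply re-executed. `TVar`, `Transaction`, and `atomically` are hypothetical names, and CPython's global interpreter lock means this demonstrates the semantics rather than real parallel speedup.

```python
import threading

_commit_lock = threading.Lock()  # one global lock guarding validation + commit


class TVar:
    """A shared variable with a version number bumped on every commit."""

    def __init__(self, value):
        self.value = value
        self.version = 0


class ConflictError(Exception):
    """Raised when a transaction observes an inconsistent snapshot."""


class Transaction:
    def __init__(self):
        self.reads = {}   # TVar -> version seen at first read
        self.writes = {}  # TVar -> buffered new value

    def read(self, var):
        if var in self.writes:  # read-your-own-writes
            return self.writes[var]
        with _commit_lock:      # read value and version as a consistent pair
            value, version = var.value, var.version
        if self.reads.setdefault(var, version) != version:
            raise ConflictError()  # var changed since this transaction first read it
        return value

    def write(self, var, value):
        self.writes[var] = value  # invisible to others until commit

    def commit(self) -> bool:
        with _commit_lock:
            if any(var.version != v for var, v in self.reads.items()):
                return False  # someone committed a conflicting write
            for var, value in self.writes.items():
                var.value = value
                var.version += 1
            return True


def atomically(fn):
    """Run fn(tx) as a transaction, retrying until it commits."""
    while True:
        tx = Transaction()
        try:
            result = fn(tx)
        except ConflictError:
            continue
        if tx.commit():
            return result
```

Transactions that touch disjoint variables never invalidate each other, which mirrors the point in the abstract that atomic sections can proceed in parallel as long as no data conflicts occur.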

20.
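This paper discusses how to use idle workstations effectively to solve a compute-intensive task, matrix multiplication, on a platform of high-performance heterogeneous workstations connected by a high-speed LAN. To obtain good parallel performance, it presents a task scheduling model and algorithm for heterogeneous workstation clusters that account for the communication time between cooperating tasks, data loading time, result collection time, and the computation time of each heterogeneous workstation. Using this model, the most suitable subset of all available workstations can be found to achieve the shortest execution time.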
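A hypothetical, simplified model of this kind is sketched below in Python for C = A x B with n x n matrices. It assumes rows of A are split in proportion to workstation speed (fractional rows, for simplicity); all of B plus each worker's rows of A are sent one worker at a time over the shared LAN (data loading); all workers then compute concurrently; and the result rows are gathered back one at a time (result collection), with each transfer paying a fixed latency. Under these assumptions the estimate depends only on how many workstations are used and their total speed, so the best subset of size k is simply the k fastest machines. The model, parameters, and names are illustrative, not the paper's.

```python
from dataclasses import dataclass


@dataclass
class Workstation:
    name: str
    speed: float  # multiply-add operations per second (hypothetical unit)


def estimated_time(group, n, bandwidth, latency):
    """Estimated run time of C = A x B (n x n) on a group of workstations."""
    total_speed = sum(w.speed for w in group)
    rows = [n * w.speed / total_speed for w in group]  # proportional, fractional split
    # Data loading: ship all of B plus this worker's rows of A, one worker at a time.
    load = sum(latency + (n * n + r * n) / bandwidth for r in rows)
    # Computation: workers run concurrently, so the slowest one determines the time.
    compute = max(r * n * n / w.speed for r, w in zip(rows, group))
    # Result collection: gather each worker's rows of C, one at a time.
    collect = sum(latency + r * n / bandwidth for r in rows)
    return load + compute + collect


def best_subset(stations, n, bandwidth, latency):
    """Try the k fastest workstations for every k; return (group, time) of the best."""
    ranked = sorted(stations, key=lambda w: w.speed, reverse=True)
    best = None
    for k in range(1, len(ranked) + 1):
        group = ranked[:k]
        t = estimated_time(group, n, bandwidth, latency)
        if best is None or t < best[1]:
            best = (group, t)
    return best
```

The trade-off is visible in the formula: each added workstation shortens the computation term but adds another copy of B to the serialized loading phase, so beyond some point a slow machine costs more than it contributes.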
