1.
崔焕庆 《计算机工程与设计》2007,28(17):4079-4081,4088
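Parallel program development mostly follows a "develop, execute, verify and analyze" workflow, which makes the development cycle long and inefficient, yet correctness and high performance are the primary requirements for using parallel programs. To address this, a development workflow is proposed that spans the whole process from algorithm design and program development to result analysis, and that allows correctness verification and performance analysis to be carried out at the same time. Principles and methods for designing fairly complete computer-aided development tools are given, and a prototype auxiliary tool for message-passing parallel programming is developed. Experiments show that the workflow and method improve the efficiency of parallel program development and simplify the programmer's work.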
2.
尚月强 《计算机工程与设计》2007,28(13):3100-3102,3129
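Network parallel computing is one of the most important development directions of parallel and distributed computing. Based on concrete numerical experiments, this paper discusses several important factors that affect the performance of PVM parallel programs in PVM-based network parallel numerical computing under the Windows operating system, including load balance, communication overhead, network performance, task granularity, number of processors, precision requirements, and processor memory capacity. Corresponding strategies for improving the performance of PVM parallel programs are proposed so that problems can be solved efficiently and quickly.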
3.
The performance of a conventional parallel application is often degraded by load‐imbalance on heterogeneous clusters. Although it is simple to invoke multiple processes on fast processing elements to alleviate load‐imbalance, the optimal process allocation is not obvious. Kishimoto and Ichikawa presented performance models for high‐performance Linpack (HPL), with which the sub‐optimal configurations of heterogeneous clusters were actually estimated. Their results on HPL are encouraging, whereas their approach is not yet verified with other applications. This study presents some enhancements of Kishimoto's scheme, which are evaluated with four typical scientific applications: computational fluid dynamics (CFD), finite‐element method (FEM), HPL (linear algebraic system), and fast Fourier transform (FFT). According to our experiments, our new models (NP‐T models) are superior to Kishimoto's models, particularly when the non‐negative least squares method is used for parameter extraction. The average errors of the derived models were 0.2% for the CFD benchmark, 2% for the FEM benchmark, 1% for HPL, and 28% for the FFT benchmark. This study also emphasizes the importance of predictability in clusters, listing practical examples derived from our study. Copyright © 2008 John Wiley & Sons, Ltd.
4.
The GAMMA paradigm was recently proposed by Banatre and Metayer to describe the systematic construction of parallel programs without introducing artificial sequentiality. This paper presents two synchronous execution models for GAMMA and discusses how to implement them on the MasPar MP-1, a massively data-parallel computer. The results show that the GAMMA paradigm can be implemented very naturally on data-parallel machines, and that a very high-level language such as GAMMA, in which parallelism is left implicit, is suitable for specifying massively parallel applications.
5.
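Numerical simulation of large-scale scientific and engineering problems in the aerospace field both depends on the support of high-performance parallel computing and drives its development. This paper reviews research progress in high-performance parallel computing in the aerospace field, briefly introduces high-performance parallel computing environments, and classifies and describes in detail the related research areas, including aerodynamic forces, aerodynamic heating, chemical non-equilibrium, structural strength, thermal protection, Monte Carlo methods, and turbulence research. It summarizes two contradictions in aerospace high-performance parallel computing: high parallel efficiency in scientific computing versus low practical value in engineering computing, and the diversity of parallel applications versus the lack of scientific parallelization methods. It also points out directions for further research.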
6.
Parallel execution of a program R (intuitively regarded as a partial order) is usually modeled by sequentially executing one of the total orders (interleavings) into which it can be embedded. Our work deviates from this serialization principle by using true concurrency to model parallel execution. True concurrency is represented via completions of R to semi-total orders, called time diagrams. These orders are characterized via a set of conditions (denoted by Ct), yielding orders or time diagrams which preserve some degree of the intended parallelism in R. Another way to express semi-total orders is to use re-writing or derivation rules (denoted by Cx) which for any program R generate a set of semi-total orders. This paper includes a classification of parallel execution into three classes according to three different types of Ct conditions. For each class a suitable Cx is found and a proof of equivalence between the set of all time diagrams satisfying Ct and the set of all terminal Cx derivations of R is devised. This equivalence between time diagram conditions and derivation rules is used to define a novel notion of correctness for parallel programs. This notion is demonstrated by showing that a specific asynchronous program enforces synchronous execution, which always halts, showing that true concurrency can be useful in the context of parallel program verification.
7.
Slicing of concurrent programs is a compute‐intensive task. To speed up the slicing process, we have developed a parallel algorithm. For this purpose we used the concurrent control flow graph (CCFG) as the intermediate representation. We used a network of communicating processes to develop our parallel algorithm. We have implemented our parallel algorithm and the experimental results appear promising. Copyright © 2004 John Wiley & Sons, Ltd.
8.
Hardware monitoring through performance counters is available on almost all modern processors. Although these counters are originally designed for performance tuning, they have also been used for evaluating power consumption. We propose two approaches for modelling and understanding the behaviour of high performance computing (HPC) systems relying on hardware monitoring counters. We evaluate the effectiveness of our system modelling approach considering both optimizing the energy usage of HPC systems and predicting HPC applications’ energy consumption as target objectives. Although hardware monitoring counters are used for modelling the system, other methods–including partial phase recognition and cross platform energy prediction–are used for energy optimization and prediction. Experimental results for energy prediction demonstrate that we can accurately predict the peak energy consumption of an application on a target platform; whereas, results for energy optimization indicate that with no a priori knowledge of workloads sharing the platform we can save up to 24% of the overall HPC system’s energy consumption under benchmarks and real-life workloads.
9.
10.
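Scalability is an important performance metric to consider when designing parallel computing systems and parallel algorithms. This paper analyzes the characteristics of several scalability models for parallel systems (isoefficiency, isospeed, average latency, and the iso-parallel-computing-overhead ratio) and proposes a new, more effective scalability metric. Analysis of experimental results shows that the model evaluates the scalability of parallel computing systems well.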
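As a rough illustration of two of the scalability notions named in this abstract, the sketch below computes parallel efficiency and an isospeed scalability value. The function names and sample numbers are hypothetical. The isospeed formula follows the commonly cited definition ψ(p, p') = p'·W / (p·W'), which is an assumption here and not necessarily the form used in the paper. It does not implement the new metric the paper proposes.

```python
def efficiency(t_serial: float, t_parallel: float, p: int) -> float:
    """Parallel efficiency E = T1 / (p * Tp)."""
    if p <= 0 or t_parallel <= 0:
        raise ValueError("p and t_parallel must be positive")
    return t_serial / (p * t_parallel)


def isospeed_scalability(p: int, work: float,
                         p_scaled: int, work_scaled: float) -> float:
    """Isospeed scalability psi = (p' * W) / (p * W').

    W' is the amount of work needed on p' processors to keep the
    average speed per processor the same as with work W on p
    processors. A value of 1.0 is ideal; smaller values mean the
    problem had to grow faster than the machine.
    """
    if p <= 0 or p_scaled <= 0 or work <= 0 or work_scaled <= 0:
        raise ValueError("all arguments must be positive")
    return (p_scaled * work) / (p * work_scaled)


if __name__ == "__main__":
    # Hypothetical measurements, for illustration only.
    print(efficiency(t_serial=100.0, t_parallel=30.0, p=4))
    print(isospeed_scalability(p=4, work=1000.0,
                               p_scaled=8, work_scaled=2500.0))
```

With these made-up inputs, doubling the processor count required 2.5 times the work to hold per-processor speed constant, so the isospeed value falls below the ideal of 1.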
11.
Wenhao ZHOU Juan CHEN Chen CUI Qian WANG Dezun DONG Yuhua TANG 《Frontiers of Computer Science》2016,10(5):797-811
The performance and energy consumption of high performance computing (HPC) interconnection networks are of great significance to the whole supercomputer, and building an HPC interconnection network simulation platform is very important for research on HPC software and hardware technologies. To effectively evaluate the performance and energy consumption of HPC interconnection networks, this article designs and implements a detailed, clock-driven HPC interconnection network simulation platform called HPC-NetSim. HPC-NetSim uses application-driven workloads and inherits the characteristics of a detailed and flexible cycle-accurate network simulator. In addition, it offers a large set of configurable network parameters in terms of topology and routing, and supports router on/off states. We compare the simulated execution time with the real execution time of a Tianhe-2 subsystem, and the mean error is only 2.7%. We also simulate network behavior with different network structures and low-power modes. The results are consistent with the theoretical analyses.
12.
《International Journal of Parallel, Emergent and Distributed Systems》2012,27(4):293-301
In this paper, we present a new, easy to implement algorithm for detecting the termination of a parallel asynchronous computation on distributed-memory MIMD computers. We demonstrate that it operates concurrently with the main computation, adding minimal overhead, and we prove that it correctly detects termination when it occurs. Experimental results confirm that the termination detection routine imposes an overhead smaller than the experimental uncertainty.
13.
Diwakar Krishnamurthy Mehrnoush Alemzadeh Mahmood Moussavi 《Concurrency and Computation》2011,23(15):1723-1748
High performance computing (HPC) systems allow researchers and businesses to harness large amounts of computing power needed for solving complex problems. In such systems a job scheduler prioritizes the execution of jobs belonging to users of the system in a manner that allows the system to satisfy performance objectives for various groups of users while simultaneously making efficient use of available resources. Typically, system administrators have the responsibility of manually configuring or tuning the job scheduler such that the performance objectives of user groups as well as system‐level performance objectives are met. Modern job schedulers used in production systems are quite complex. Through detailed trace‐driven simulations, we show that manually tuning the configuration of production schedulers in an environment characterized by multiple performance objectives is very challenging and may not be feasible. To alleviate this problem, this paper describes a toolset that can help a system administrator to automatically configure a scheduler such that the performance objectives for various classes of users in the system as well as other system‐level performance objectives can be satisfied. A unique aspect of this work that differentiates it from the existing work on scheduler tuning is that it has been implemented to work with a widely used production scheduler. Furthermore, in contrast to the existing work it considers the challenging real‐world problem of delivering different levels of performance to different classes of users. System administrators can exploit the toolset to react quickly to changes in performance objectives and workload conditions. Case studies using synthetic and real HPC workloads demonstrate the effectiveness of the technique. Copyright © 2011 John Wiley & Sons, Ltd.
14.
15.
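As computing power grows and the scale and complexity of applications increase, high-performance computers place ever higher demands on parallel file system performance. In scenarios with massive numbers of small files and large-scale concurrent I/O operations, file system metadata throughput becomes the key factor limiting performance. This paper designs and implements a metadata delegation service (MDDS) that ensures high availability of the metadata cluster by reducing the coupling between metadata services, and that manages the metadata delegation namespace as directory subtrees to avoid the complexity and inefficiency of the distributed atomic operations introduced by cross-node directories. For the I/O forwarding architecture used in high-performance computing, two job scheduling strategies based on metadata delegation are proposed: one in which a single job has exclusive use of a single metadata delegate, and one in which multiple jobs share multiple metadata delegates. Together they achieve load balancing both between jobs and within a job. MDDS is evaluated on 116 storage servers. The experimental results show that metadata delegation provides quasi-linear metadata performance and scales better than the Lustre CMD scheme in large-scale environments. The two scheduling strategies effectively spread the metadata load of jobs and alleviate the metadata bottleneck in high-performance computing.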
16.
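This paper first introduces the current state of high-performance computer applications and the current operational model system at the Wuhan Institute of Heavy Rain, China Meteorological Administration. To meet the higher demand for computing power created by finer-resolution weather prediction models, the institute upgraded its existing cluster system with a Sugon (曙光) high-performance computer cluster. After the upgrade, the compute-node CPUs provide 11.40 TFlops of double-precision floating-point performance. Second, the paper discusses the current state of several key technologies of the upgraded high-performance computer and offers an outlook on the future. Finally, taking the WRF model as an example, it analyzes the performance of the upgraded high-performance computer and obtains a good speedup. The results show that the newly upgraded cluster system will greatly reduce the running time of regional high-resolution numerical prediction models and help improve the efficiency of turning research results into practice.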
17.
18.
The main issues when supporting fault tolerance based on checkpointing and rollback recovery for High‐Performance applications are related to the scalability of the introduced support, the possibility of analyzing the induced overhead and, in more general terms, the optimization of the trade‐off between failure‐free and recovery performances. In this paper we describe our contribution in fault tolerance for high‐level structured parallelism models. We take a different viewpoint w.r.t. existing contributions, by introducing a methodology to derive interesting properties to support fault tolerance. We show how to apply this methodology to a general data parallel model, deriving useful properties to introduce a class of checkpointing protocols. Thanks to this methodology, this class of protocols is not affected by the described issues. We exemplify two checkpointing protocols and the related rollback recovery techniques. For each protocol we also derive cost models statically describing the failure‐free performance, which can be used for performance tuning or to target some Quality of Service parameter. To assess the innovation of the results we analytically and experimentally compare the introduced protocols with two literature protocols. Results show that while the protocols introduced in this paper permit the definition of cost models and have a good scalability, the literature protocols do not always have these properties. Copyright © 2010 John Wiley & Sons, Ltd.
19.
20.
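This paper analyzes the performance metrics of failure detection algorithms and the factors that affect their performance, points out the shortcomings and limitations of traditional evaluation methods, and proposes a cost-based method for evaluating failure detection performance.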