首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 187 毫秒
1.
曾丽芳  曾劲松等 《计算机工程》2002,28(10):102-104,188
为了研究基于软件DSM系统的OpenMP实现,该文以一类具有代表性的用户题为例,分别测试了其基于JIAJIA系统的两种实现力的加速比:一种是用JIAJIA提供的消息传递系统调用,实现了个类MPI版本(方式1):另一种是用多个处理机对共享数组的读写来替代消息传递系统调用(方式2),测试结果发现,对少量处理机系统,两种方式还具有可比性,但是,随着处理机数的增多,共享存储应用的性能急剧下降。通过对测试结果的分析及对用户题的进一步测试,发现方式2的时间主要花费在做一致性处理和缺页中断处理而导致的大量小消息通信上,测试表明,JIAJIA共享存储程序一般会比MPI程序导致更重的网络负载,要在JIAJIA共享存储基础之上建立一种实用的共享并行计算环境,尤其在支持OpenMP等共享编程语言方面,还有待进一步工作。  相似文献   

2.
并行计算技术是计算机技术发展的重要方向之一。当前并行程序模型主要有消息传递模型和共享存储模型两种。随着处理器多核技术的发展,在一枚多核处理器中集成两个或多个完整的计算引擎(内核),并充分利用多核计算机的特性,发挥多核计算机的性能成为一个很重要的研究方向。介绍一种新的MPI实现机制,这种机制集成了共享存储模型和消息通信模型的优点,在节点内使用共享存储模型,在节点间使用消息传递模型,并且通过自动生成线程级的任务来获得更好的性能。.  相似文献   

3.
基于SMP集群系统的并行编程模式研究与分析   总被引:4,自引:1,他引:4  
并行计算技术是计算机技术发展的重要方向之一,SMP与集群是当前主流的并行体系结构。当前并行程序设计方法主要采用基于消息传递模型的MPI和基于共享存储模型的OpenMP,两种编程模式各有特点和适用范围。对SMP集群以及MPI和OpenMP的特点进行了分析,介绍了在SMP集群系统中利用MPI和OpenMP混合编程的可行性方法。  相似文献   

4.
为了利用细观力学方法研究复合固体推进剂材料的力学性能,需要建立具有代表性的推进剂细观胞元模型,针对当前算法普遍存在的计算效率低下问题,依据分子动力学思想生成颗粒堆积模型的性能特性,通过分析负载均衡和消息通信,提出了并行模型的三个准则,设计了区域分解的并行策略,并利用共享存储并行和分布式存储并行两级并行手段实现了并行算法。最后在IBMBladeCenter集群平台上通过实例证明算法可以缓解负载均衡并缩减通信开销,上述试验数据验证了算法的高效性,达到了提高胞元生成效率的目的。  相似文献   

5.
存储模型仿真器的设计与实现   总被引:2,自引:1,他引:1  
存储一致性问题和高速缓存一致性问题是共享存储并行计算机中两个最关键的问题,通过仿真器对它们进行了量化研究,设计并实现了一个存储模型仿真器MMS.基于MMS仿真了不同并行机结构模型下多种存储一致性模型的行为;针对不同类型的计算问题比较了不同的存储一致性模型,并对实验结果进行了分析;实现了几个不同的高速缓存一致性协议,并比较了它们的性能.  相似文献   

6.
SAN异构存储共享系统实现技术研究   总被引:5,自引:0,他引:5  
以SAN异构存储共享系统的实现技术为核心,提出了一个基于SAN的用户级应用型存储共享系统模型,无需改变开放系统内核,可利用开放系统的系统调用来优化磁盘I/O性能,易于实现、维护和移植。详细描述了系统的设计思想和实现方法,并重点对动态库模块进行了研究。在与日本某大公司的合作项目中已实现了该系统,性能评测表明其具有接近于本地存储系统的性能。  相似文献   

7.
宋伟  宋玉 《微机发展》2007,17(2):164-167
并行计算技术是计算机技术发展的重要方向之一,SMP与集群是当前主流的并行体系结构。当前并行程序设计方法主要采用基于消息传递模型的MPI和基于共享存储模型的OpenMP,两种编程模式各有特点和适用范围。对SMP集群以及MPI和OpenMP的特点进行了分析,介绍了在SMP集群系统中利用MPI和OpenMP混合编程的可行性方法。  相似文献   

8.
并行计算技术是计算机技术发展的重要方向之一.当前并行程序模型主要有消息传递模型和共享存储模型两种.随着处理器多核技术的发展,在一枚多核处理器中集成两个或多个完整的计算引擎(内核),并充分利用多核计算机的特性,发挥多核计算机的性能成为一个很重要的研究方向.介绍一种新的MPI实现机制,这种机制集成了共享存储模型和消息通信模型的优点,在节点内使用共享存储模型,在节点间使用消息传递模型,并且通过自动生成线程级的任务来获得更好的性能.  相似文献   

9.
多核处理器的新特性使多核机群的存储层次更加复杂,同时也给MPI程序带来了新的优化空间.国内外学者提出了许多多核机群下MPI程序的优化方法和技术.测试了3个不同多核机群的通信性能,并分别在Intel与AMD多核机群下实验评估了几种具有普遍意义的优化技术:混合MPI/Op)MP、优化MPI运行时参数以及优化MPI进程摆放,同时对实验结果和优化性能进行了分析.  相似文献   

10.
伴随着可重构器件的快速发展与使用,共享存储可重构计算机软硬件通信逐渐成为国际计算机领域的新热点。因为可重构器件不仅具有硬件电路计算效率高的良好性能,同时具有可多次编程、方便修正的灵活性,硬件任务和软件任务等同的概念出现在系统设计中,共享存储可以灵活地完成一些稠密进行计算任务,打破了传统的软硬件协同设计领域,使其发生了巨大的变革。传统的硬件任务不支持程序上下文的切换、没有虚拟内存机制、无法唤醒系统服务等特点。共享存储通过虚拟地址映射机制,在硬件中实现了进行虚拟地址映射行为的模块,使硬件任务同软件任务一样具有虚拟地址,并利用这种机制实现了软硬件任务之间的通信。  相似文献   

11.
This paper presents ibvdev a scalable and efficient low-level Java message-passing communication device over InfiniBand. The continuous increase in the number of cores per processor underscores the need for efficient communication support for parallel solutions. Moreover, current system deployments are aggregating a significant number of cores through advanced network technologies, such as InfiniBand, increasing the complexity of communication protocols, especially when dealing with hybrid shared/distributed memory architectures such as clusters. Here, Java represents an attractive choice for the development of communication middleware for these systems, as it provides built-in networking and multithreading support. As the gap between Java and compiled languages performance has been narrowing for the last years, Java is an emerging option for High Performance Computing (HPC). The developed communication middleware ibvdev increases Java applications performance on clusters of multicore processors interconnected via InfiniBand through: (1) providing Java with direct access to InfiniBand using InfiniBand Verbs API, somewhat restricted so far to MPI libraries; (2) implementing an efficient and scalable communication protocol which obtains start-up latencies and bandwidths similar to MPI performance results; and (3) allowing its integration in any Java parallel and distributed application. In fact, it has been successfully integrated in the Java messaging library MPJ Express. The experimental evaluation of this middleware on an InfiniBand cluster of multicore processors has shown significant point-to-point performance benefits, up to 85% start-up latency reduction and twice the bandwidth compared to previous Java middleware on InfiniBand. Additionally, the impact of ibvdev on message-passing collective operations is significant, achieving up to one order of magnitude performance increases compared to previous Java solutions, especially when combined with multithreading. Finally, the efficiency of this middleware, which is even competitive with MPI in terms of performance, increments the scalability of communications intensive Java HPC applications.  相似文献   

12.
MPJ Express is a messaging system that allows application developers to parallelize their compute-intensive sequential Java codes on High Performance Computing clusters and multicore processors. In this paper, we extend MPJ Express software to provide two new communication devices. The first device—called hybrid—enables MPJ Express to exploit hybrid parallelism on cluster of multicore processors by sitting on top of existing shared memory and network communication devices. The second device—called native—uses JNI wrappers in interfacing MPJ Express to native MPI implementations like MPICH and Open MPI. We evaluate performance of these devices on a range of interconnects including 1G/10G Ethernet, 10G Myrinet and 40G InfiniBand. In addition, we analyze and evaluate the cost of MPJ Express buffering layer and compare it with the performance numbers of other Java MPI libraries. Our performance evaluation reveals that the native device allows MPJ Express to achieve comparable performance to native MPI libraries—for latency and bandwidth of point-to-point and collective communications—which is a significant gain in performance compared to existing communication devices. The hybrid communication device—without any modifications at application level—also helps parallel applications achieve better speedups and scalability by exploiting multicore architecture. Our performance evaluation quantifies the cost incurred by buffering and its impact on overall performance of software. We witnessed comparative performance as both new devices improve application performance and achieve upto 90 % of the theoretical bandwidth available without application rewriting effort—including NAS Parallel Benchmarks, point-to-point and collective communication.  相似文献   

13.
Hybrid parallel programming with the message passing interface (MPI) for internode communication in conjunction with a shared-memory programming model to manage intranode parallelism has become a dominant approach to scalable parallel programming. While this model provides a great deal of flexibility and performance potential, it saddles programmers with the complexity of utilizing two parallel programming systems in the same application. We introduce an MPI-integrated shared-memory programming model that is incorporated into MPI through a small extension to the one-sided communication interface. We discuss the integration of this interface with the MPI 3.0 one-sided semantics and describe solutions for providing portable and efficient data sharing, atomic operations, and memory consistency. We describe an implementation of the new interface in the MPICH2 and Open MPI implementations and demonstrate an average performance improvement of 40 % to the communication component of a five-point stencil solver.  相似文献   

14.
Memory hierarchy on multi-core clusters has twofold characteristics: vertical memory hierarchy and horizontal memory hierarchy. This paper proposes new parallel computation model to unitedly abstract memory hierarchy on multi-core clusters in vertical and horizontal levels. Experimental results show that new model can predict communication costs for message passing on multi-core clusters more accurately than previous models, only incorporated vertical memory hierarchy. The new model provides the theoretical underpinning for the optimal design of MPI collective operations. Aimed at horizontal memory hierarchy, our methodology for optimizing collective operations on multi-core clusters focuses on hierarchical virtual topology and cache-aware intra-node communication, incorporated into existing collective algorithms in MPICH2. As a case study, multi-core aware broadcast algorithm has been implemented and evaluated. The results of performance evaluation show that the above methodology for optimizing collective operations on multi-core clusters is efficient.  相似文献   

15.
王洁  衷璐洁  曾宇 《计算机科学》2011,38(10):281-284
多核处理器的新特性使多核机群的存储层次更加复杂,同时也给MPI程序带来了新的优化空间。国内外学 者提出了许多多核机群下MPI程序的优化方法和技术。测试了3个不同多核机群的通信性能,并分别在Intel与 AMD多核机群下实验评估了几种具有普遍意义的优化技术:混合MPI/OpcnMP、优化MPI运行时参数以及优化 MPI进程摆放,同时对实验结果和优化性能进行了分析。  相似文献   

16.
针对水声传播模型的计算量大,难以满足实时化、精细化水下声传播信息保障需求的难题,基于MPI+OpenMP混合并行编程方法,开展了WKBZ简正波模型混合并行计算方法研究,实现了水下声场2级混合并行计算。该方法通过节点间消息传递、节点内内存共享的方式,有效克服了MPI并行编程模型通信开销大和OpenMP并行编程环境可扩展性差的缺点,较好地解决了水下声传播快速计算的问题。测试结果表明,该方法能够较好地利用SMP集群节点间和节点内多级并行机制,充分发挥消息传递编程模型和共享内存编程模型各自的优势,大幅降低MPI进程间通信带来的时间开销,有效提升程序的可扩展性和并行效率。  相似文献   

17.
The message passing interface (MPI) has become a de facto standard for programming models of highperformance computing, but its rich and flexible interface semantics makes the program easy to generate communication deadlock, which seriously affects the usability of the system. However, the existing detection tools for MPI communication deadlock are not scalable enough to adapt to the continuous expansion of system scale. In this context, we propose a framework for MPI runtime communication deadlock detection, namely MPI-RCDD, which contains three kinds of main mechanisms. Firstly, MPI-RCDD has a message logging protocol that is associated with deadlock detection to ensure that the communication messages required for deadlock analysis are not lost. Secondly, it uses the asynchronous processing thread provided by the MPI to implement the transfer of dependencies between processes, so that multiple processes can participate in deadlock detection simultaneously, thus alleviating the performance bottleneck problem of centralized analysis. In addition, it uses an AND⊕OR model based algorithm named AODA to perform deadlock analysis work. The AODA algorithm combines the advantages of both timeout-based and dependency-based deadlock analysis approaches, and allows the processes in the timeout state to search for a deadlock circle or knot in the process of dependency transfer. Further, the AODA algorithm cannot lead to false positives and can represent the source of the deadlock accurately. The experimental results on typical MPI communication deadlock benchmarks such as Umpire Test Suit demonstrate the capability of MPIRCDD. Additionally, the experiments on the NPB benchmarks obtain the satisfying performance cost, which show that the MPI-RCDD has strong scalability.  相似文献   

18.
19.
This paper gives an overview of two related tools that we have developed to provide more accurate measurement and modelling of the performance of message-passing communication and application programs on distributed memory parallel computers. MPIBench uses a very precise, globally synchronised clock to measure the performance of MPI communication routines. It can generate probability distributions of communication times, not just the average values produced by other MPI benchmarks. This allows useful insights to be made into the MPI communication performance of parallel computers, and in particular how performance is affected by network contention. The Performance Evaluating Virtual Parallel Machine (PEVPM) provides a simple, fast and accurate technique for modelling and predicting the performance of message-passing parallel programs. It uses a virtual parallel machine to simulate the execution of the parallel program. The effects of network contention can be accurately modelled by sampling from the probability distributions generated by MPIBench. These tools are particularly useful on clusters with commodity Ethernet networks, where relatively high latencies, network congestion and TCP problems can significantly affect communication performance, which is difficult to model accurately using other tools. Experiments with example parallel programs demonstrate that PEVPM gives accurate performance predictions on commodity clusters. We also show that modelling communication performance using average times rather than sampling from probability distributions can give misleading results, particularly for programs running on a large number of processors.  相似文献   

20.
使用多核处理器已成为构建高性能计算机系统的主流方式。结合多核高性能计算机系统集共享内存结构和分布式内存结构于一体的体系结构特点,对AREM模式开展MPI/OpenMP混合并行计算研究与实现。性能测试结果表明,使用MPI/OpenMP混合并行计算可以将并行应用扩展至更大处理机规模,缩短计算时间,不对原程序结构做大的改动、以增量方式和较小的并行化代价,取得比较好的并行计算效果。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号