Similar literature
1.
The performance of the Global Array shared-memory, nonuniform memory-access programming model is explored in a wide-area-network (WAN) distributed supercomputer environment. The Global Array model is extended by introducing the concept of mirrored arrays, which, thanks to caching and user-controlled consistency of the shared data structures, can reduce the application's sensitivity to network latency. Latencies and bandwidths for remote memory access are studied, and the performance of a large computational chemistry application is evaluated using both fully distributed and mirrored arrays. Excellent performance can be obtained with mirroring if even modest (0.5 MB/s) network bandwidth is available.

2.
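WAPM: A Parallel Programming Model Suited to Wide-Area Distributed Computing   Total citations: 1, self-citations: 0, citations by others: 1
Early programming models such as MPI and OpenMP are ill-suited to large-scale, dynamic, wide-area Internet environments because of scalability limits or mismatched parallel granularity. This paper proposes a wide-area parallel programming model (WAPM) that offers a new, feasible approach to programming distributed computing applications. WAPM consists of a communication library, a communication protocol, and an application programming interface, and it features general-purpose programming, adaptive parallelism, and fault tolerance. Combined with a suitable programming language, it forms a parallel programming environment that spans a wide area. Using the distributed computing platform P2HP as the working platform, the paper describes how distributed computing is carried out with WAPM. Experimental results show that WAPM is a general, feasible programming model with good performance.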

3.
Programming for large‐scale, multicore‐based architectures requires adequate tools that offer ease of programming and do not hinder application performance. StarSs is a family of parallel programming models based on automatic function‐level parallelism that targets productivity. StarSs deploys a data‐flow model: it analyzes dependencies between tasks and manages their execution, exploiting their concurrency as much as possible. This paper introduces Cluster Superscalar (ClusterSs), a new StarSs member designed to execute on clusters of SMPs (Symmetric Multiprocessors). ClusterSs tasks are asynchronously created and assigned to the available resources with the support of the IBM APGAS runtime, which provides an efficient and portable communication layer based on one‐sided communication. We present the design of ClusterSs on top of APGAS, as well as the programming model and execution runtime for Java applications. Finally, we evaluate the productivity of ClusterSs in terms of both programmability and performance, and compare it to that of the IBM X10 language. Copyright © 2012 John Wiley & Sons, Ltd.

4.
The fast multipole method (FMM) is a complex, multi‐stage algorithm over a distributed tree data structure, with multiple levels of parallelism and inherent data locality. X10 is a modern partitioned global address space language with support for asynchronous activities. The parallel tasks comprising FMM may be expressed in X10 by using a scalable pattern of activities. This paper demonstrates the use of X10 to implement FMM for simulation of electrostatic interactions between ions in a cyclotron resonance mass spectrometer. X10's task‐parallel model is used to express parallelism by using a pattern of activities mapping directly onto the tree. X10's work‐stealing runtime handles load balancing of fine‐grained parallel activities, avoiding the need for explicit work sharing. The use of global references and active messages to create and synchronize parallel activities over a distributed tree structure is also demonstrated. In contrast to previous simulations of ion trajectories in cyclotron resonance mass spectrometers, our code enables both simulation of realistic particle numbers and guaranteed error bounds. Single‐node performance is comparable with the fastest published FMM implementations, and critical expansion operators are faster for high‐accuracy calculations. A comparison of parallel and sequential codes shows that the overhead of activity management and work stealing in this application is low. Scalability is evaluated for 8k cores on a Blue Gene/Q system and 512 cores on a Nehalem/InfiniBand cluster. Copyright © 2013 John Wiley & Sons, Ltd.
