共查询到20条相似文献,搜索用时 0 毫秒
1.
基于分布/共享内存层次结构的并行程序设计 总被引:1,自引:0,他引:1
分布内存结构和共享内存结构各具特点,又有很强的互补性,分布/共享内存层次结构将两种结构相结合,以充分发挥其优势。文中主要讨论基于分布/共享内存层次结构的并行程序设计问题,介绍了MPI和OpenMP混合并行程序设计模式。 相似文献
2.
数据竞争是共享存储程序中的一类难于调试的错误 .在支持域存储一致性模型的软件 DSM系统 JIAJIA上 ,通过采用汇编代码装配技术来获得程序所读写的共享变量集合的方法 ,实现了基于锁集合的动态数据竞争检测算法 .利用本文方法 ,在 TSP和 Barnes程序中找到了数据竞争情况 ,并根据找到的数据竞争 ,修正了 Barnes中的错误 .实际使用经验表明 ,本文方法易于用户使用 ,达到了实用水平 相似文献
3.
软件DSM(distributed shared memory)系统在机群上构造了共享存储编程环境,结合了共享存储的易编程性和机群的可扩展性,引起了广泛的研究.由于软件DSM系统是一个分布式系统,系统失败风险大,需要实现容错技术以促进其实用化.利用用户级检查点技术,在支持域存储一致模型的软件DSM系统JIAJIA的基础上,设计并实现了一个可恢复的高可移植的软件DSM系统JIACKPT(JIAjia with ChecKPoinTing).由于采用适合软件DSM系统的强全局一致状态以及多种优化措施,JIACKPT易于实现且获得很好的性能.在一个8节点的PC机群上的应用测试表明,即使每分钟做一次检查点,大部分应用的检查点开销也小于10%.此外,JIACKPT还具有高可移植性.这些都表明JIACKPT已经成为一个比较实用的系统. 相似文献
4.
N. P. Manoj K. V. Manjunath R. Govindarajan 《International journal of parallel programming》2004,32(2):77-122
Traditional software Distributed Shared Memory (DSM) systems rely on the virtual memory management mechanisms to detect accesses to shared memory locations and maintain their consistency. The resulting involvement of the OS (kernel) and the associated overhead which is significant, can be avoided by careful compile time analysis and code instrumentation. In this paper, we propose such a Compiler Assisted Software support approach (CAS-DSM). In the CAS-DSM implementation, the involvement of the OS kernel is avoided by instrumenting the application code at the source level. The overhead caused by the execution of the instrumented code is reduced through several aggressive compile time optimizations. Finally, we also address the issue of reducing certain overheads in polling-based implementation of receiving asynchronous messages. We used SUIF, a public domain compiler tool, to implement compile time analysis, instrumentation and optimizations. We modified CVM, a publicly available software DSM to support the instrumentation inserted by the compiler. Detailed performance evaluation of CAS-DSM is reported using a set of Splash/Splash2 parallel application benchmarks on a distributed memory IBM SP-2 machine. CAS-DSM achieved moderate to good performance improvements for most of the applications compared to the original CVM implementation. Reducing the overheads in polling-based implementation improves the performance of CAS-DSM significantly resulting in an overall improvement of 12–52% over the original CVM implementation. 相似文献
5.
软件DSM系统的并行调试环境已经成为制约其广泛应用的一个重要因素,重放方法使得用户能用循环调试技术来调试具有执行不确定性的软件DSM程序,本文定义了软件DSM程序执行的happen-before-1关系,并依据其提出一种在软件DMS系统JIAJIA上实现重放的方法,实际应用测试表明,该方法产生很小的空间和时间开销。 相似文献
6.
袁道华 《计算机研究与发展》1994,31(9):56-60
本文概述了分布式并行系统和分布式共享存储器的一般概念,讨论了使用共享对象和可靠广播的并行程序设计模型,最后给出了我们的改进模型。 相似文献
7.
Y. Charlie Hu Honghui Lu Alan L. Cox Willy Zwaenepoel 《Journal of Parallel and Distributed Computing》2000,60(12):160
In this paper, we present the first system that implements OpenMP on a network of shared-memory multiprocessors. This system enables the programmer to rely on a single, standard, shared-memory API for parallelization within a multiprocessor and between multiprocessors. It is implemented via a translator that converts OpenMP directives to appropriate calls to a modified version of the TreadMarks software distributed shared-memory (SDSM) system. In contrast to previous SDSM systems for SMPs, the modified TreadMarks system uses POSIX threads for parallelism within an SMP node. This approach greatly simplifies the changes required to the SDSM in order to exploit the intranode hardware shared memory. We present performance results for seven applications (Barnes-Hut, CLU, and Water from SPLASH-2, 3D-FFT from NAS, Red-Black SOR, TSP, and MGS) running on an SP2 with four four-processor SMP nodes. A comparison between the thread implementation and the original implementation of TreadMarks shows that using the hardware shared memory within an SMP node significantly reduces the amount of data and the number of messages transmitted between nodes and consequently achieves speedups that are up to 30% better than the original versions. We also compare SDSM against message passing. Overall, the speedups of multithreaded TreadMarks programs are within 7–30% of the MPI versions. 相似文献
8.
针对非规则应用的OpenMP制导扩展 总被引:1,自引:0,他引:1
许多非规则应用的棱心是稀疏矩阵运算.稀疏矩阵运算的特点是对一个数组元素的引用依赖于另两个数组的元素值,因此具有非规则访存特点.本文针对稀疏矩阵运算特点,提出一种新的OpenMP制导子句indirect,并在机群OpenMP系统OpenMP/JIAJIA上进行了实现.采用一个实的OpenMP应用Equake进行了测试,测试结果表明该制导扩展很有效,对于直接使用该制导子句的函数代码,其性能改进了18%,而整个应用的性能改进了15%. 相似文献
9.
Distributed shared memory (DSM) systems provide a simple programming paradigm for networks of workstations, which are gaining popularity due to their cost-effective high computing power. However, DSM systems usually exhibit poor performance due to the large communication delay between the nodes; and a lot of different memory consistency models have been proposed to mask the network delay. In this paper, we propose an asynchronous protocol for the release consistent memory model, which we call an Asynchronous Release Consistency (ARC) protocol. Unlike other protocols where the communication adheres to the synchronous request/receive paradigm, the ARC protocol is asynchronous, such that the necessary pages are broadcast before they are requested. Hence, the network delay can be reduced by proper prefetching of necessary pages. We have also compared the performance of the ARC protocol with the lazy release protocol by running standard benchmark programs; and the experimental results showed that the ARC protocol achieves a performance improvement of up to 29%. 相似文献
10.
适合机群OpenMP系统的制导扩展 总被引:1,自引:0,他引:1
OpenMP以其易用性和支持增量并行的特点成为共享存储体系结构的编程标准.机群OpenMP系统在机群上实现了OpenMP计算环境,它将OpenMP的易编程性和机群的可扩展性结合起来,是很有意义的.OpenMP的编程方式主要有循环级和SPMD两种,其中循环级方式易于编程而SPMD方式难于编程.然而在机群OpenMP系统中获得高性能OpenMP程序,必需采用SPMD方式.该文描述了适合机群OpenMP系统的一个简单的OpenMP制导扩展子集(包括数据分布制导、循环调度模式),并在机群OpenMP系统OpenMP/JIAJIA上进行了实现.应用测试表明,利用这些制导扩展进行编程,既保持循环级方式的易编程性又获得与SPMD方式相当的性能,是有效的编程方式. 相似文献
11.
DSM系统中内存一致性模型的研究 总被引:1,自引:0,他引:1
文章首先回顾了分布式共享存储器(DSM)系统中主要的内存一致性协议,重点分析了释放一致性(RC)模型。在此模型的基础上对其进行了改进,提出了基于动态减少无效副本集的RC模型。 相似文献
12.
分布式共享存储系统在分布式存储器的基础上构造逻辑上的共享存储模型。提出了在操作系统层实现分布式共享存储的系统框架,并以Linux操作系统为平台介绍了其实现。该系统提供简单的调用接口,并与Linux内存管理框架紧密结合。通过采用合适的DSM一致性协议提高了整体性能。 相似文献
13.
14.
This paper presents an efficient, writer-based logging scheme for recoverable distributed shared memory systems, in which logging of a data item is performed by its writer process, instead of every process that accesses the item logging it. Since the writer process maintains the log of data items, volatile storage can be used for logging. Only the readers' access information needs to be logged into the stable storage of the writer process to tolerate multiple failures. Moreover, to reduce the frequency of stable logging, only the data items accessed by multiple processes are logged with their access information when the items are invalidated, and also semantic-based optimization in logging is considered. Compared with the earlier schemes in which stable logging was performed whenever a new data item was accessed or written by a process, the size of the log and the logging frequency can be significantly reduced in the proposed scheme. 相似文献
15.
A common approach to fault-tolerant software DSM is to take checkpoints with message logging. Our remote logging has low overhead
because each node saves the coherence-related data into the memory of a remote node through a high-speed system area network.
For more lightweight fault-tolerant DSM, in this paper, we mainly focused on eliminating shared memory checkpointing during
failure-free execution. Each node independently takes the checkpoints of execution states and non-shared data only. When a
node fails, it regenerates its pages from the remote copies in live nodes. In order to efficiently reconstruct pages, we also
introduced a XOR-diffing technique. The diff logs, which have been created by XOR operations during failure-free execution,
can be applicable to any version of remote copies either backward or forward for recovery. Our scheme reduces the checkpointing
overhead and also alleviates the imbalance in execution times among nodes due to independent checkpointing.
This research is supported by KISTEP under the National Research Laboratory program. 相似文献
16.
When using a shared memory multiprocessor, the programmer faces the issue of selecting the portable programming model which will provide the best performance. Even if they restricts their choice to the standard programming environments (MPI and OpenMP), they have to select a programming approach among MPI and the variety of OpenMP programming styles. To help the programmer in their decision, we compare MPI with three OpenMP programming styles (loop level, loop level with large parallel sections, SPMD) using a subset of the NAS benchmark (CG, MG, FT, LU), two dataset sizes (A and B), and two shared memory multiprocessors (IBM SP3 NightHawk II, SGI Origin 3800). We have developed the first SPMD OpenMP version of the NAS benchmark and gathered other OpenMP versions from independent sources (PBN, SDSC and RWCP). Experimental results demonstrate that OpenMP provides competitive performance compared with MPI for a large set of experimental conditions. Not surprisingly, the two best OpenMP versions are those requiring the strongest programming effort. MPI still provides the best performance under some conditions. We present breakdowns of the execution times and measurements of hardware performance counters to explain the performance differences. Copyright © 2005 John Wiley & Sons, Ltd. 相似文献
17.
A transparent distributed shared memory (DSM) system must achieve complete transparency in data distribution, workload distribution,
and reconfiguration respectively. The transparency of data distribution allows programmers to be able to access and allocate
shared data using the same user interface as is used in shared-memory systems. The transparency of workload distribution and
reconfiguration can optimize the parallelism at both the user-level and the kernel-level, and also improve the efficiency
of run-time reconfiguration. In this paper, a transparent DSM system referred to as Teamster is proposed and is implemented for clustered symmetric multiprocessors. With the transparency provided by Teamster, programmers can exploit all the computing power of the clustered SMP nodes in a transparent way as they do in single SMP
computer. Compared with the results of previous researches, Teamster can realize the transparency of cluster computing and obtain satisfactory system performance. 相似文献
18.
Journal of Computer Science and Technology - The multicore evolution has stimulated renewed interests in scaling up applications on shared-memory multiprocessors, significantly improving the... 相似文献
19.
20.