20 similar documents found.
1.
Distributed shared memory (DSM) systems provide a simple programming paradigm for networks of workstations, which are gaining popularity due to their cost-effective high computing power. However, DSM systems usually perform poorly because of the large communication delay between nodes, and many memory consistency models have been proposed to mask this network delay. In this paper, we propose an asynchronous protocol for the release consistent memory model, which we call the Asynchronous Release Consistency (ARC) protocol. Unlike other protocols, in which communication adheres to the synchronous request/receive paradigm, the ARC protocol is asynchronous: the necessary pages are broadcast before they are requested. Hence, the network delay can be reduced by proper prefetching of necessary pages. We also compared the performance of the ARC protocol with the lazy release protocol by running standard benchmark programs; the experimental results show that the ARC protocol achieves a performance improvement of up to 29%.
2.
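Distributed shared memory systems build a logically shared memory model on top of distributed memory. This paper proposes a system framework for implementing distributed shared memory at the operating-system level and describes its implementation on the Linux platform. The system provides a simple call interface and is tightly integrated with the Linux memory-management framework. Adopting a suitable DSM consistency protocol improves overall performance.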
3.
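Building on a dynamic load-balancing model based on M/M/1 queueing theory, this paper proposes a Nash-equilibrium-based load-balancing scheme that combines dynamic and static load balancing. A comparison of the improved method with the original load-balancing model shows that the new scheme performs well when system communication overhead is high, and that it achieves a good expected response time when the system load exceeds 45%.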
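For reference, the expected response time of a stable M/M/1 queue has the standard closed form T = 1 / (μ − λ), where λ is the arrival rate and μ is the service rate. The sketch below only evaluates that textbook formula; the function names are hypothetical, and it does not reproduce the paper's Nash-equilibrium scheme, which the abstract does not specify.

```python
def mm1_utilization(arrival_rate, service_rate):
    """Fraction of time the server is busy (rho = lambda / mu)."""
    return arrival_rate / service_rate


def mm1_response_time(arrival_rate, service_rate):
    """Expected time a job spends in a stable M/M/1 queue (waiting + service).

    Textbook result: T = 1 / (mu - lambda), valid only when lambda < mu.
    """
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrival_rate must be below service_rate")
    return 1.0 / (service_rate - arrival_rate)


if __name__ == "__main__":
    # Illustrative numbers only (not taken from the paper): a node at 45% load.
    lam, mu = 4.5, 10.0
    print(mm1_utilization(lam, mu), mm1_response_time(lam, mu))
```

Because T grows without bound as λ approaches μ, moving work away from heavily loaded nodes shortens the expected response time far more than the same shift would between lightly loaded nodes.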
4.
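The parallel debugging environment of software DSM systems has become an important factor limiting their wide adoption. Replay lets users apply cyclic debugging techniques to software DSM programs whose execution is nondeterministic. This paper defines the happen-before-1 relation for the execution of software DSM programs and, based on it, proposes a method for implementing replay on the software DSM system JIAJIA. Tests on real applications show that the method incurs very small space and time overhead.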
5.
Because sending and receiving data incurs significant communication overhead, loop partitioning approaches on distributed memory systems must guarantee not just computation load balance but computation+communication load balance. Previous approaches to loop partitioning have achieved a communication-free, computation load balanced iteration space partitioning solution for a limited subset of DOALL loops. But a large category of DOALL loops inevitably results in communication, and the trade-offs between computation and communication must be carefully analyzed for these loops in order to balance the combined computation time and communication overheads. In this work, we describe a partitioning approach based on this motivation for the general case of DOALL loops. Our goal is to achieve a computation+communication load balanced partitioning through static data and iteration space distribution. Our approach first partitions the iteration and data spaces of a loop nest by analyzing communication and parallelism; it then performs architecture-dependent analysis to adjust the granularity of partitions, load balances each partition with respect to total computation+communication, and maps the partitions onto the available number of processors. This multiphase partitioning method works as follows. First, the code partitioning phase analyzes the references in the body of the DOALL loop nest and determines a set of directions for reducing a larger degree of communication by trading a lesser degree of parallelism. The partitioning is carried out in the iteration space of the loop by cyclically following a set of direction vectors such that the data references are maximally localized and reused, eliminating more communication volume than the parallelism given up.
We then perform data space partitioning based on a new "larger partition owns" rule, which minimizes the communication overhead of a compute-intensive partition by localizing its references relatively more than those of a smaller, non-compute-intensive partition. A partition interaction graph is then constructed, which the architecture-dependent analysis phase uses to merge partitions to achieve granularity adjustment, computation+communication load balance, and mapping onto the actual number of available processors. Relevant theory and algorithms are developed along with a performance evaluation on the Cray T3D.
6.
Min Seung-Jai, Basumallik Ayon, Eigenmann Rudolf. International Journal of Parallel Programming, 2003, 31(3): 225-249
This paper describes compiler techniques that can translate standard OpenMP applications into code for distributed computer systems. OpenMP has emerged as an important model and language extension for shared-memory parallel programming. However, despite OpenMP's success on these platforms, it is not currently being used on distributed systems. The long-term goal of our project is to quantify the degree to which such a use is possible and to develop supporting compiler techniques. Our present compiler techniques translate OpenMP programs into a form suitable for execution on a Software DSM system. We have implemented a compiler that performs this basic translation, and we have studied a number of hand optimizations that improve the baseline performance. Our approach complements related efforts that have proposed language extensions for efficient execution of OpenMP programs on distributed systems. Our results show that, while kernel benchmarks can show high efficiency of OpenMP programs on distributed systems, full applications need careful consideration of shared data access patterns. A naive translation (similar to OpenMP compilers for SMPs) leads to acceptable performance in only a few applications. However, additional optimizations, including access privatization, selective touch, and dynamic scheduling, result in a 31% average improvement on our benchmarks.
7.
This paper presents an efficient, writer-based logging scheme for recoverable distributed shared memory systems, in which a data item is logged by its writer process rather than by every process that accesses it. Since the writer process maintains the log of data items, volatile storage can be used for logging. Only the readers' access information needs to be logged into the stable storage of the writer process to tolerate multiple failures. Moreover, to reduce the frequency of stable logging, only the data items accessed by multiple processes are logged with their access information when the items are invalidated, and semantic-based optimization of logging is also considered. Compared with earlier schemes, in which stable logging was performed whenever a new data item was accessed or written by a process, the proposed scheme significantly reduces both the size of the log and the logging frequency.
8.
A transparent distributed shared memory (DSM) system must achieve complete transparency in data distribution, workload distribution, and reconfiguration. The transparency of data distribution allows programmers to access and allocate shared data using the same user interface as is used in shared-memory systems. The transparency of workload distribution and reconfiguration can optimize parallelism at both the user level and the kernel level, and also improve the efficiency of run-time reconfiguration. In this paper, a transparent DSM system referred to as Teamster is proposed and implemented for clustered symmetric multiprocessors. With the transparency provided by Teamster, programmers can exploit all the computing power of the clustered SMP nodes transparently, as they would on a single SMP computer. Compared with the results of previous research, Teamster realizes the transparency of cluster computing and obtains satisfactory system performance.
9.
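Data races are a class of hard-to-debug errors in shared-memory programs. On JIAJIA, a software DSM system that supports the scope consistency memory model, a lockset-based dynamic data-race detection algorithm is implemented by using assembly-code instrumentation to obtain the set of shared variables that the program reads and writes. With this method, data races were found in the TSP and Barnes programs, and an error in Barnes was fixed based on the races found. Practical experience shows that the method is easy for users to apply and has reached a practical level.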
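The general idea of lockset-based race detection can be sketched as follows. This is a simplified illustration in the style of the well-known Eraser algorithm, not the JIAJIA implementation: the class and method names are hypothetical, the event stream is fed in by hand rather than obtained through assembly-code instrumentation, and reads are not distinguished from writes.

```python
class LocksetDetector:
    """Simplified lockset race detector (hypothetical names, illustrative only)."""

    def __init__(self):
        self.held = {}        # thread id -> set of locks currently held
        self.candidates = {}  # variable -> locks held on every access so far
        self.accessors = {}   # variable -> set of threads that accessed it
        self.races = set()    # variables with no common protecting lock

    def acquire(self, thread, lock):
        self.held.setdefault(thread, set()).add(lock)

    def release(self, thread, lock):
        self.held.get(thread, set()).discard(lock)

    def access(self, thread, var):
        locks = self.held.get(thread, set())
        if var not in self.candidates:
            # First access: every lock held now is a candidate protector.
            self.candidates[var] = set(locks)
        else:
            # Later accesses: keep only locks held on all accesses so far.
            self.candidates[var] &= locks
        self.accessors.setdefault(var, set()).add(thread)
        # Report when several threads touch var and no common lock remains.
        if len(self.accessors[var]) > 1 and not self.candidates[var]:
            self.races.add(var)


if __name__ == "__main__":
    d = LocksetDetector()
    # "x" is always accessed under lock "L"; "y" is accessed with no lock.
    for t in (1, 2):
        d.acquire(t, "L")
        d.access(t, "x")
        d.release(t, "L")
        d.access(t, "y")
    print(d.races)
```

A detector of this kind flags variables whose accesses share no common lock, so it can report potential races even when the observed interleaving happened to be harmless.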
10.
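Load balancing in distributed and parallel systems is an important factor affecting system performance. This paper proposes a prediction-based dynamic load-balancing algorithm. Using local load information, the algorithm predicts when a node will become idle and issues a task request before the node reaches the idle state, which keeps every node in the system busy, raises the utilization of system resources, and improves system performance.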
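The core decision described here, requesting work before the node actually runs dry, can be sketched under strong assumptions. The abstract does not say how the idle time is predicted, so the linear estimate below (remaining work divided by processing rate) and all names are hypothetical.

```python
def predicted_idle_time(queued_work, processing_rate):
    """Assumed predictor: time until the local queue drains at a constant rate."""
    return queued_work / processing_rate


def should_request_work(queued_work, processing_rate, request_latency):
    """Request new tasks once the node would go idle before a reply could arrive."""
    return predicted_idle_time(queued_work, processing_rate) <= request_latency


if __name__ == "__main__":
    # Illustrative values: 4 units of work left, 5 units/s, 1 s round trip.
    print(should_request_work(4.0, 5.0, 1.0))
```

Issuing the request one round trip ahead of the predicted idle point lets the new tasks arrive roughly as the local queue empties, which is what keeps the node busy.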
11.
12.
Research on Load-Balancing Strategies for Distributed ETL. Total citations: 1, self-citations: 0, citations by others: 1
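Based on an analysis of the importance of load balancing in distributed ETL, and to address the low efficiency of traditional ETL when applied to distributed data warehouses, this paper proposes a strategy that partitions the data extracted by distributed ETL nodes according to the type of data each node extracts, together with a new load-balancing model, the R-CN model, which combines a chain-network model with Routers. On this basis, it proposes a load scheduling and balancing strategy for distributed ETL nodes that combines ETL data sharding with the R-CN model. The strategy greatly improves the data-processing capacity of ETL nodes and effectively increases the efficiency of distributed ETL.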
13.
Distributed object computing systems are widely envisioned to be the desired distributed software development paradigm in the near future due to their higher modularity and their capability of handling machine and operating system heterogeneity. Indeed, enabled by the tremendous advancements in processor and networking technologies, complex operations such as object serialization and data marshalling have become very efficient, and thus distributed object systems are being built for many different applications. As the system scales up (e.g., with a larger number of server and client objects, and more machines), a judicious load balancing system is required to efficiently distribute the workload (e.g., queries and message/object passing) among different servers in the system. Several such load balancing schemes have recently been proposed in the literature. However, while the rationales and mechanisms employed are dramatically different, the relative strengths and weaknesses of these approaches are unknown, making it difficult for a practitioner to choose an appropriate approach for the problem at hand. In this paper, we describe in detail three representative approaches, which are all practicable, and present a quantitative comparison using a real experimental distributed object computing platform. Among these three approaches, namely JavaSpaces based, request redirection based, and fuzzy decision based, we find that the fuzzy decision-based algorithm outperforms the other two considerably under a wide range of practical scenarios.
14.
N. P. Manoj, K. V. Manjunath, R. Govindarajan. International Journal of Parallel Programming, 2004, 32(2): 77-122
Traditional software Distributed Shared Memory (DSM) systems rely on virtual memory management mechanisms to detect accesses to shared memory locations and maintain their consistency. The resulting involvement of the OS kernel, and the significant overhead associated with it, can be avoided by careful compile time analysis and code instrumentation. In this paper, we propose such a Compiler Assisted Software support approach (CAS-DSM). In the CAS-DSM implementation, the involvement of the OS kernel is avoided by instrumenting the application code at the source level. The overhead caused by the execution of the instrumented code is reduced through several aggressive compile time optimizations. Finally, we also address the issue of reducing certain overheads in the polling-based implementation of receiving asynchronous messages. We used SUIF, a public domain compiler tool, to implement compile time analysis, instrumentation and optimizations. We modified CVM, a publicly available software DSM, to support the instrumentation inserted by the compiler. A detailed performance evaluation of CAS-DSM is reported using a set of Splash/Splash2 parallel application benchmarks on a distributed memory IBM SP-2 machine. CAS-DSM achieved moderate to good performance improvements for most of the applications compared to the original CVM implementation. Reducing the overheads in the polling-based implementation improves the performance of CAS-DSM significantly, resulting in an overall improvement of 12–52% over the original CVM implementation.
15.
16.
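To speed up parallel training of large-scale neural networks, this paper analyzes a large-scale row-partitioning method for the BP neural network algorithm in terms of the algorithm's internal structure and proposes a dynamic load-balancing scheme. Experiments with the parallel algorithm on a PC cluster show that this parallel partitioning improves speedup and is of practical value.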
17.
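A Load-Balancing Clustering Algorithm for Multi-Level Energy-Heterogeneous Sensor Networks. Total citations: 2, self-citations: 0, citations by others: 2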
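In multi-level energy-heterogeneous wireless sensor networks, the initial energy of nodes is randomly distributed within a certain range, and balancing load while reducing energy consumption is a major challenge for clustering algorithms in energy-heterogeneous networks. Existing distributed clustering algorithms are mainly designed for energy-homogeneous or two-level heterogeneous networks and cannot balance load when node energy is heterogeneous across multiple levels. This paper therefore proposes LBCA (load balance clustering algorithm), a load-balancing clustering algorithm for multi-level energy-heterogeneous sensor networks. LBCA selects cluster-head nodes according to the energy distribution of the sensor network so as to achieve load balancing, which can effectively extend the network's stable period. During cluster-head selection, when energy in the sensing region is evenly distributed, nodes with lower average communication energy consumption are preferred as cluster heads, which helps reduce the total communication energy consumption in the region; when energy is unevenly distributed, nodes with higher residual energy are preferred, which helps balance load in the region. LBCA is compared with the main distributed clustering schemes, and simulation results show that, in multi-level energy-heterogeneous sensor networks, LBCA achieves better load balancing and greatly extends the network's stable period.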
18.
A Measurement-Based High-Dimensional Dynamic Load-Balancing Method. Total citations: 3, self-citations: 0, citations by others: 3
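To address strongly irregular load problems in large-scale scientific computing, the authors developed a measurement-based dynamic load-balancing method. First, the simulation domain, which consists of regular structured grids, is divided into multiple blocks. Second, the high-dimensional coordinates of each block are converted into a one-dimensional Hilbert space-filling curve (HSFC) index. Then, based on measured information, a multilevel equal-weight method partitions the blocks ordered by their one-dimensional HSFC index. Finally, the blocks are redistributed according to the partition to balance the load. The method extends the multilevel equal-weight method, which applies only to one dimension, to two and three dimensions, and introduces more measured information and a block data structure. Compared with the ISP method, it improves load-balancing efficiency by 10% on 64 CPUs; when simulating a strongly irregular load problem on 500 CPUs of an MPP, it achieves 88% load-balancing efficiency and 84% parallel efficiency.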
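The first three steps (blocks, Hilbert index, one-dimensional split by measured weight) can be illustrated in two dimensions. This is a sketch under stated assumptions, not the paper's method: `hilbert_index` is the widely used bitwise conversion for an n-by-n grid with n a power of two, and `partition_by_weight` is a simple single-level equal-weight split standing in for the multilevel equal-weight method, whose details the abstract does not give. All names are hypothetical.

```python
def hilbert_index(n, x, y):
    """Map block (x, y) in an n-by-n grid (n a power of two) to its Hilbert index."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the curve stays continuous.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def partition_by_weight(weights, parts):
    """Assign consecutive blocks to `parts` owners with roughly equal total weight.

    `weights` are measured per-block costs (assumed positive), already ordered
    by Hilbert index. A block goes to the owner whose share contains the
    midpoint of the block's weight interval.
    """
    total = float(sum(weights))
    owners = []
    acc = 0.0
    for w in weights:
        mid = acc + w / 2.0
        owners.append(min(parts - 1, int(mid * parts / total)))
        acc += w
    return owners


if __name__ == "__main__":
    n = 4
    blocks = sorted(((x, y) for x in range(n) for y in range(n)),
                    key=lambda b: hilbert_index(n, b[0], b[1]))
    # Hypothetical measured costs: blocks in the right half cost three times more.
    costs = [3.0 if x >= n // 2 else 1.0 for (x, y) in blocks]
    for block, owner in zip(blocks, partition_by_weight(costs, 4)):
        print(block, owner)
```

Ordering blocks along the Hilbert curve keeps blocks that are adjacent in the index adjacent in space, so each contiguous slice of the one-dimensional ordering forms a compact region of the grid.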
19.
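This paper analyzes the technical principles by which CORBA-based multi-tier distributed application systems achieve fault tolerance and load balancing, discusses concrete methods for implementing them, and describes in detail their application in distributed database development with Delphi. Practical use shows that distributed application systems built with this approach have stronger fault-tolerance and load-balancing capabilities and operate more stably.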
20.
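Parallel compilers for distributed-memory systems must solve the problem of distributing data among local memories and the problem of optimizing communication among processors. This paper discusses the key techniques of parallel compilation for distributed-memory systems from four aspects (parallel programming models, code and data distribution, communication optimization, and code generation) and identifies the problems that further research needs to address.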