
1.

An Asynchronous Protocol for Release Consistent Distributed Shared Memory Systems

Yeo Jaeheung Yeom Heon Y. Park Taesoon 《The Journal of supercomputing》2003,24(1):25-41

Distributed shared memory (DSM) systems provide a simple programming paradigm for networks of workstations, which are gaining popularity due to their cost-effective high computing power. However, DSM systems usually exhibit poor performance because of the large communication delay between nodes, and many memory consistency models have been proposed to mask the network delay. In this paper, we propose an asynchronous protocol for the release consistent memory model, which we call the Asynchronous Release Consistency (ARC) protocol. Unlike other protocols, in which communication adheres to the synchronous request/receive paradigm, the ARC protocol is asynchronous: the necessary pages are broadcast before they are requested. Hence, the network delay can be reduced by proper prefetching of necessary pages. We also compared the performance of the ARC protocol with the lazy release protocol by running standard benchmark programs; the experimental results show that the ARC protocol achieves a performance improvement of up to 29%.

2.

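A Linux-Based Software DSM Implementation

李鹏 王雷 《计算机工程》2006,32(4):58-60

A distributed shared memory system builds a logically shared memory model on top of distributed memory. This paper proposes a system framework for implementing distributed shared memory at the operating-system level and describes its implementation on the Linux platform. The system provides a simple call interface and is tightly integrated with the Linux memory management framework. Overall performance is improved by adopting a suitable DSM consistency protocol.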

3.

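Research on Nash-Equilibrium-Based Load Balancing for Multi-User Distributed Systems

王龙 田野 《软件》2012,33(12)

Building on a dynamic load balancing model based on the M/M/1 queue from queueing theory, this paper proposes a Nash-equilibrium-based load balancing scheme that combines dynamic and static load balancing. A comparison of the improved method with the original load balancing model shows that the new scheme performs well when the system has high communication overhead, and that it achieves a good expected response time when the system load exceeds 45%.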
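As a rough illustration of the M/M/1 quantity this abstract refers to, the sketch below computes the textbook expected response time of a single M/M/1 server, T = 1 / (mu - lambda), together with its utilization. It is not the paper's load balancing scheme: the function names and the example rates are assumptions made for illustration, and the Nash-equilibrium allocation itself is not shown.

```python
import math


def mm1_utilization(arrival_rate: float, service_rate: float) -> float:
    """Fraction of time the server is busy (rho = lambda / mu)."""
    if arrival_rate < 0 or service_rate <= 0:
        raise ValueError("arrival_rate must be >= 0 and service_rate > 0")
    return arrival_rate / service_rate


def mm1_expected_response_time(arrival_rate: float, service_rate: float) -> float:
    """Textbook M/M/1 mean response time T = 1 / (mu - lambda).

    Returns infinity when the queue is unstable (lambda >= mu).
    """
    if arrival_rate < 0 or service_rate <= 0:
        raise ValueError("arrival_rate must be >= 0 and service_rate > 0")
    if arrival_rate >= service_rate:
        return math.inf
    return 1.0 / (service_rate - arrival_rate)


if __name__ == "__main__":
    # Hypothetical node: serves 10 jobs/s and receives 4.5 jobs/s (45% load).
    print(mm1_utilization(4.5, 10.0))
    print(mm1_expected_response_time(4.5, 10.0))
```

The response time grows without bound as the arrival rate approaches the service rate, which is why a balancing scheme tries to keep every node away from saturation.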

4.

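Replay in Software DSM Systems

章隆兵 张福新 陈意云 《小型微型计算机系统》2003,24(3):340-343

The parallel debugging environment for software DSM systems has become an important factor limiting their wide adoption. Replay allows users to apply cyclic debugging techniques to software DSM programs whose execution is nondeterministic. This paper defines the happen-before-1 relation for software DSM program executions and, based on it, proposes a method for implementing replay on the software DSM system JIAJIA. Tests in practical use show that the method incurs very small space and time overhead.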

5.

A Computation+Communication Load Balanced Loop Partitioning Method for Distributed Memory Systems

Santosh Pande Tareq Bali 《Journal of Parallel and Distributed Computing》1999,58(3):251

Due to the significant communication overhead of sending and receiving data, loop partitioning approaches on distributed memory systems must guarantee not just computation load balance but computation+communication load balance. Previous approaches to loop partitioning have achieved a communication-free, computation load balanced iteration space partitioning solution for a limited subset of DOALL loops. But a large category of DOALL loops inevitably results in communication, and the trade-offs between computation and communication must be carefully analyzed for these loops in order to balance the combined computation time and communication overheads. In this work, we describe a partitioning approach based on the above motivation for the general case of DOALL loops. Our goal is to achieve a computation+communication load balanced partitioning through static data and iteration space distribution. Our approach first partitions the iteration and data spaces of a loop nest by analyzing communication and parallelism; it then performs architecture-dependent analysis to adjust the granularity of partitions, load balances each partition with respect to total computation+communication, and then maps the partitions onto the available number of processors. This multiphase partitioning method works as follows. First, the code partitioning phase analyzes the references in the body of the DOALL loop nest and determines a set of directions for reducing a larger degree of communication by trading a lesser degree of parallelism. The partitioning is carried out in the iteration space of the loop by cyclically following a set of direction vectors such that the data references are maximally localized and reused, eliminating a larger communication volume than parallelism.
We then perform data space partitioning based on a new larger-partition-owns rule, which minimizes the communication overhead for a compute-intensive partition by localizing its references relatively more than those of a smaller, non-compute-intensive partition. A partition interaction graph is then constructed, which the architecture-dependent analysis phase uses to merge partitions in order to achieve granularity adjustment, computation+communication load balance, and mapping onto the actual number of available processors. Relevant theory and algorithms are developed along with a performance evaluation on the Cray T3D.

6.

Optimizing OpenMP Programs on Software Distributed Shared Memory Systems

Min Seung-Jai Basumallik Ayon Eigenmann Rudolf 《International journal of parallel programming》2003,31(3):225-249

This paper describes compiler techniques that can translate standard OpenMP applications into code for distributed computer systems. OpenMP has emerged as an important model and language extension for shared-memory parallel programming. However, despite OpenMP's success on these platforms, it is not currently being used on distributed systems. The long-term goal of our project is to quantify the degree to which such a use is possible and to develop supporting compiler techniques. Our present compiler techniques translate OpenMP programs into a form suitable for execution on a Software DSM system. We have implemented a compiler that performs this basic translation, and we have studied a number of hand optimizations that improve the baseline performance. Our approach complements related efforts that have proposed language extensions for efficient execution of OpenMP programs on distributed systems. Our results show that, while kernel benchmarks can show high efficiency of OpenMP programs on distributed systems, full applications need careful consideration of shared data access patterns. A naive translation (similar to OpenMP compilers for SMPs) leads to acceptable performance in only very few applications. However, additional optimizations, including access privatization, selective touch, and dynamic scheduling, result in a 31% average improvement on our benchmarks.

7.

A Low Overhead Logging Scheme for Fast Recovery in Distributed Shared Memory Systems

Park Taesoon Yeom Heon Y. 《The Journal of supercomputing》2000,15(3):295-320

This paper presents an efficient, writer-based logging scheme for recoverable distributed shared memory systems, in which a data item is logged by its writer process rather than by every process that accesses the item. Since the writer process maintains the log of data items, volatile storage can be used for logging. Only the readers' access information needs to be logged into the stable storage of the writer process to tolerate multiple failures. Moreover, to reduce the frequency of stable logging, only the data items accessed by multiple processes are logged with their access information when the items are invalidated, and semantic-based optimization in logging is also considered. Compared with earlier schemes, in which stable logging was performed whenever a new data item was accessed or written by a process, the proposed scheme can significantly reduce the size of the log and the logging frequency.

8.

A Transparent Distributed Shared Memory for Clustered Symmetric Multiprocessors

Jyh-Biau Chang Ce-Kuen Shieh Tyng-Yeu Liang 《The Journal of supercomputing》2006,37(2):145-160

A transparent distributed shared memory (DSM) system must achieve complete transparency in data distribution, workload distribution, and reconfiguration. The transparency of data distribution allows programmers to access and allocate shared data using the same user interface as is used in shared-memory systems. The transparency of workload distribution and reconfiguration can optimize parallelism at both the user level and the kernel level, and also improve the efficiency of run-time reconfiguration. In this paper, a transparent DSM system referred to as Teamster is proposed and implemented for clustered symmetric multiprocessors. With the transparency provided by Teamster, programmers can exploit all the computing power of the clustered SMP nodes in a transparent way, as they do on a single SMP computer. Compared with the results of previous research, Teamster can realize the transparency of cluster computing and obtain satisfactory system performance.

9.

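Dynamic Data Race Detection in Software DSM Systems

章隆兵 吴少刚 张福新 《小型微型计算机系统》2004,25(12):2070-2074

Data races are a class of errors in shared-memory programs that are hard to debug. On JIAJIA, a software DSM system that supports the scope consistency memory model, a lockset-based dynamic data race detection algorithm is implemented by using assembly code instrumentation to obtain the set of shared variables that a program reads and writes. With this method, data races were found in the TSP and Barnes programs, and the error in Barnes was corrected based on the races found. Practical experience shows that the method is easy for users to apply and has reached a practical level.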
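The lockset idea named in this abstract can be sketched as follows. This is a minimal, generic lockset check in the style commonly described in the literature, not the JIAJIA implementation: the class `LocksetDetector`, its method names, and the rule of reporting only once a second thread has touched the variable are all assumptions made for illustration, and the assembly-level instrumentation that would feed it events is not shown.

```python
from typing import Dict, List, Set, Tuple


class LocksetDetector:
    """Tracks, per shared variable, the locks held on every access so far."""

    def __init__(self) -> None:
        self.held: Dict[str, Set[str]] = {}        # thread -> locks currently held
        self.candidates: Dict[str, Set[str]] = {}  # variable -> candidate lock set
        self.accessors: Dict[str, Set[str]] = {}   # variable -> threads that accessed it
        self.races: List[Tuple[str, str]] = []     # (thread, variable) reports

    def acquire(self, thread: str, lock: str) -> None:
        self.held.setdefault(thread, set()).add(lock)

    def release(self, thread: str, lock: str) -> None:
        self.held.setdefault(thread, set()).discard(lock)

    def access(self, thread: str, var: str) -> None:
        held = self.held.get(thread, set())
        if var not in self.candidates:
            # First access: every lock held now is a candidate protector.
            self.candidates[var] = set(held)
        else:
            # Later accesses: keep only locks held on every access so far.
            self.candidates[var] &= held
        self.accessors.setdefault(var, set()).add(thread)
        # No common lock and more than one thread involved: possible race.
        if not self.candidates[var] and len(self.accessors[var]) > 1:
            self.races.append((thread, var))


if __name__ == "__main__":
    d = LocksetDetector()
    d.acquire("t1", "L")
    d.access("t1", "x")
    d.release("t1", "L")
    d.access("t2", "x")  # t2 holds no lock, so no common lock remains
    print(d.races)
```

A lockset check reports potential races even when the conflicting accesses never overlapped in the observed run, which is what makes it useful for programs whose execution is nondeterministic.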

10.

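A Prediction-Based Load Balancing Strategy

李庆华 尹社红 《计算机与数字工程》2003,31(4):54-57,44

Load balancing in distributed and parallel systems is an important factor affecting system performance. This paper proposes a prediction-based dynamic load balancing algorithm. Using local load information, the algorithm predicts when a node will become idle and issues a task request before the node reaches the idle state, so that all nodes in the system stay busy, which improves resource utilization and system performance.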
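The idea of requesting work before a node runs dry can be sketched as below. The abstract does not give the paper's predictor, so everything here is an assumption made for illustration: the idle-time estimate (remaining work divided by processing rate), the `request_latency` threshold, and both function names.

```python
def predicted_time_to_idle(remaining_work: float, processing_rate: float) -> float:
    """Estimate, from local load only, how long until the node has no work left."""
    if processing_rate <= 0:
        raise ValueError("processing_rate must be positive")
    if remaining_work < 0:
        raise ValueError("remaining_work must be non-negative")
    return remaining_work / processing_rate


def should_request_work(remaining_work: float,
                        processing_rate: float,
                        request_latency: float) -> bool:
    """Ask for more tasks once the node would go idle before new work could arrive."""
    return predicted_time_to_idle(remaining_work, processing_rate) <= request_latency


if __name__ == "__main__":
    # Hypothetical node: 10 units of queued work, 5 units/s, 3 s to obtain new tasks.
    print(should_request_work(10.0, 5.0, 3.0))
    # Same node with 100 units queued: far from idle.
    print(should_request_work(100.0, 5.0, 3.0))
```

Sending the request early overlaps the transfer of new tasks with the tail of the current work, which is the effect the abstract describes as keeping every node busy.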

11.

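A Load Balancing Method for Distributed Websites

刘昌华 《计算机与数字工程》2003,31(5):54-57

This paper introduces the basic concepts of load balancing for distributed Internet websites and presents a distributed website load balancing method based on load balancing devices, together with a network deployment example.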

12.

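A Study of Load Balancing Strategies for Distributed ETL (cited by: 1; self-citations: 0; citations by others: 1)

张亮 夏秀峰 《计算机与现代化》2011,(9):201-204

After analyzing the importance of load balancing in distributed ETL, and to address the low efficiency of traditional ETL when applied to distributed data warehouses, this paper proposes a strategy that partitions the data extracted by distributed ETL nodes according to the data types each node extracts, and a new load balancing model, the R-CN model, which combines a chain-network model with Routers. On this basis, it proposes a load scheduling and balancing strategy for distributed ETL nodes that combines ETL data sharding with the R-CN model. The strategy greatly improves the data processing capacity of ETL nodes and effectively improves the efficiency of distributed ETL.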

13.

On Load Balancing Approaches for Distributed Object Computing Systems (cited by: 1; self-citations: 0; citations by others: 1)

Cheung Lap-sun Kwok Yu-kwok 《The Journal of supercomputing》2004,27(2):149-175

Distributed object computing systems are widely envisioned to be the desired distributed software development paradigm in the near future because of their higher modularity and their ability to handle machine and operating system heterogeneity. Indeed, enabled by the tremendous advancements in processor and networking technologies, complex operations such as object serialization and data marshalling have become very efficient, and thus distributed object systems are being built for many different applications. As the system scales up (e.g., with a larger number of server and client objects, and more machines), a judicious load balancing system is required to efficiently distribute the workload (e.g., queries and message/object passing) among the servers in the system. Several such load balancing schemes have been proposed recently in the literature. However, while the rationales and mechanisms employed are dramatically different, the relative strengths and weaknesses of these approaches are unknown, making it difficult for a practitioner to choose an appropriate approach for the problem at hand. In this paper, we describe in detail three representative approaches, all of which are practicable, and present a quantitative comparison using a real experimental distributed object computing platform. Among these three approaches, namely JavaSpaces based, request redirection based, and fuzzy decision based, we find that the fuzzy decision-based algorithm outperforms the other two considerably under a wide range of practical scenarios.

14.

CAS-DSM: A Compiler Assisted Software Distributed Shared Memory

N. P. Manoj K. V. Manjunath R. Govindarajan 《International journal of parallel programming》2004,32(2):77-122

Traditional software Distributed Shared Memory (DSM) systems rely on virtual memory management mechanisms to detect accesses to shared memory locations and maintain their consistency. The resulting involvement of the OS (kernel) and the associated overhead, which is significant, can be avoided by careful compile time analysis and code instrumentation. In this paper, we propose such a Compiler Assisted Software support approach (CAS-DSM). In the CAS-DSM implementation, the involvement of the OS kernel is avoided by instrumenting the application code at the source level. The overhead caused by the execution of the instrumented code is reduced through several aggressive compile time optimizations. Finally, we also address the issue of reducing certain overheads in the polling-based implementation of receiving asynchronous messages. We used SUIF, a public domain compiler tool, to implement compile time analysis, instrumentation and optimizations. We modified CVM, a publicly available software DSM, to support the instrumentation inserted by the compiler. A detailed performance evaluation of CAS-DSM is reported using a set of Splash/Splash2 parallel application benchmarks on a distributed memory IBM SP-2 machine. CAS-DSM achieved moderate to good performance improvements for most of the applications compared to the original CVM implementation. Reducing the overheads in the polling-based implementation improves the performance of CAS-DSM significantly, resulting in an overall improvement of 12–52% over the original CVM implementation.

15.

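Design and Implementation of a Random Fuzzy Scheduling Method for Streaming Media

林光国 刘晓冬 戴琼海 《计算机工程》2005,31(14):178-180,224

Based on the characteristics of streaming media services, a random fuzzy scheduling algorithm suited to dynamic load scheduling in streaming media clusters is designed and implemented. It mainly addresses two problems: evaluating machine load and designing the scheduling policy.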

16.

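A Dynamic Load Balancing Scheme for Parallel BP Neural Networks

赵莉 程荣 《微机发展》2006,16(7):67-69

To speed up parallel training of large-scale neural networks, this paper analyzes the large-scale row partitioning of the BP neural network algorithm from the internal structure of the BP algorithm and proposes a dynamic load balancing scheme. Experiments with the parallel algorithm in a PC cluster environment show that this parallel partitioning improves the speedup and is of practical significance.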

17.

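A Load-Balanced Clustering Algorithm for Multi-Level Energy-Heterogeneous Sensor Networks (cited by: 2; self-citations: 0; citations by others: 2)

王向辉 张国印 谢晓芹 《计算机研究与发展》2008,45(3):392-399

In multi-level energy-heterogeneous wireless sensor networks, the initial energy of nodes is randomly distributed within a certain range, and balancing load while reducing energy consumption is a major challenge for clustering algorithms in energy-heterogeneous networks. Existing distributed clustering algorithms are designed mainly for energy-homogeneous or two-level heterogeneous networks and cannot balance load when node energy is heterogeneous across multiple levels. This paper therefore proposes LBCA (load balance clustering algorithm), a load-balanced clustering algorithm for multi-level energy-heterogeneous sensor networks. LBCA selects cluster head nodes according to the energy distribution of the sensor network in order to achieve load balancing, and can effectively extend the stable period of the network. During cluster head selection, when energy in the sensing area is evenly distributed, nodes with lower average communication energy consumption are preferred as cluster heads, which helps reduce the total communication energy consumption in the area; when energy is unevenly distributed, nodes with higher residual energy are preferred as cluster heads, which helps balance load in the area. LBCA is compared with the main distributed clustering schemes; simulation results show that, in multi-level energy-heterogeneous sensor networks, LBCA achieves better load balance and greatly extends the stable period of the network.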

18.

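A Measurement-Based High-Dimensional Dynamic Load Balancing Method (cited by: 3; self-citations: 0; citations by others: 3)

曹小林 莫则尧 《计算机学报》2005,28(9):1440-1446

To address strongly irregular load in large-scale scientific computing, the authors developed a measurement-based dynamic load balancing method. First, the simulation domain, which is composed of regular structured grids, is divided into multiple blocks. Second, the high-dimensional coordinates of each block are converted into a one-dimensional Hilbert space-filling curve (HSFC) index. Then, based on measured information, the blocks ordered by their one-dimensional HSFC index are partitioned with a multi-level averaging weight method. Finally, the blocks are redistributed according to the partition to balance the load. The method extends the multi-level averaging weight method, previously applicable only in one dimension, to two and three dimensions, and introduces more measured information and a block data structure. Compared with the ISP method, it improves load balancing efficiency by 10% on 64 CPUs; when simulating a strongly irregular load problem on 500 CPUs of an MPP, it achieves 88% load balancing efficiency and 84% parallel efficiency.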
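The pipeline described in this abstract (block coordinates to a one-dimensional Hilbert index, then a weighted split of the ordered blocks) can be sketched in two dimensions as follows. This is only an illustration under stated assumptions: `hilbert_index` is the widely used iterative 2D mapping for an n x n grid with n a power of two, and `split_by_weight` is a simple single-pass split that stands in for the paper's multi-level averaging weight method, which the abstract does not specify. Both function names and the per-block costs are hypothetical.

```python
from typing import List, Sequence


def hilbert_index(n: int, x: int, y: int) -> int:
    """Map cell (x, y) of an n x n grid (n a power of two) to its Hilbert index."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def split_by_weight(weights: Sequence[float], parts: int) -> List[int]:
    """Assign each block (already in curve order) to one of `parts` contiguous chunks.

    A block goes to the chunk that contains the midpoint of its weight interval,
    so chunk weights are roughly equal. Stand-in for the paper's method.
    """
    total = float(sum(weights))
    if parts <= 0 or total <= 0:
        raise ValueError("need parts > 0 and positive total weight")
    owners: List[int] = []
    running = 0.0
    for w in weights:
        mid = running + w / 2.0
        owners.append(min(parts - 1, int(mid * parts / total)))
        running += w
    return owners


if __name__ == "__main__":
    n = 4
    blocks = sorted(((x, y) for x in range(n) for y in range(n)),
                    key=lambda c: hilbert_index(n, c[0], c[1]))
    measured = [1.0] * len(blocks)  # hypothetical measured cost per block
    print(list(zip(blocks, split_by_weight(measured, 4))))
```

Ordering blocks along the curve keeps blocks that are adjacent in the index adjacent in space, so each contiguous chunk of the one-dimensional sequence is a spatially compact set of blocks.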

19.

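Implementing Fault Tolerance and Load Balancing in CORBA-Based Distributed Systems

杨林楠 李永强 《控制工程》2004,11(5):449-451

This paper analyzes the technical principles by which CORBA-based multi-tier distributed application systems achieve fault tolerance and load balancing, discusses concrete methods for implementing them, and describes in detail their application in Delphi distributed database development. Practical use shows that distributed application systems built with this method have stronger fault tolerance and load balancing capabilities and operate more stably.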

20.

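Key Techniques of Parallel Compilation for Distributed Memory Systems

贾明飞 董渭清 黄泳翔 《计算机工程与应用》2003,39(22):103-106,152

Parallel compilers for distributed memory systems must solve the problem of data distribution among local memories and the problem of communication optimization among processors. This paper discusses the key techniques of parallel compilation for distributed memory systems from four aspects (parallel programming models, code and data distribution, communication optimization, and code generation) and identifies problems to be solved in further research.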