首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
We present a distributed memory parallel implementation of the unbalanced tree search (UTS) benchmark using MPI and investigate MPI’s ability to efficiently support irregular and nested parallelism through continuous dynamic load balancing. Two load balancing methods are explored: work sharing using a centralized work server and distributed work stealing using explicit polling to service steal requests. Experiments indicate that in addition to a parameter defining the granularity of load balancing, message-passing paradigms require additional techniques to manage the volume of communication and mitigate runtime overhead. Using additional parameters, we observed an improvement of up to 3–4X in parallel performance. We report results for three distributed memory parallel computer systems and use UTS to characterize the performance and scalability on these systems. Overall, we find that the simpler work sharing approach with a single work server achieves good performance on hundreds of processors and that our distributed work stealing implementation scales to thousands of processors and delivers more robust performance that is less sensitive to the particular workload and load balancing parameters.  相似文献   

2.
A modal logic for message passing processes   总被引:5,自引:0,他引:5  
A first-order modal logic is given for describing properties of processes which may send and receive values or messages along communication ports. We give two methods for proving that a process enjoys such a property. The first is the construction, for each processP and formulaF, of acharacteristic formula P satF such thatP enjoys the propertyF if and only if the formulaP satF is logically equivalent to tt. The second is a sound and complete proof system whose judgements take the formB P: F, meaning: under the assumptionB the processP enjoys the propertyF.The notion ofsymbolic operational semantics plays a crucial role in the design of both the characteristic formulae and the proof system.This work was been supported by the SERC grant GR/H16537 and the ESPRIT BRA CONCUR II  相似文献   

3.
In this paper we describe a new algorithm for maintaining a balanced search tree on a message-passing MIMD architecture; the algorithm is particularly well suited for implementation on a small number of processors. We introduce a (2B-2, 2B) search tree that uses a bidirectional ring of O(log n) processors to store n entries. Update operations use a bottom-up node-splitting scheme, which performs significantly better than top-down search tree algorithms. The bottom-up algorithm requires many fewer messages and results in less blocking due to synchronization than top-down algorithms. Additionally, for a given cost ratio of computation to communication the value of B may be varied to maximize performance. Implementations on a parallel-architecture simulator are described  相似文献   

4.
We present the design and implementation of a loosely coupled multiprocessor built from off-the-shelf parts. Message passing is used as the communication paradigm. Several novel techniques are used to reduce the demands on the kernel from the message passing subsystem. We achieve message passing times of the same order for messages within processors and interprocessor messages, allowing transparent interprocess communication. Because it is possible to achieve these performance results, we conclude that process allocation need not be a critical problem in efficient multiprocessor design, at least for small scale multiprocessors.  相似文献   

5.
Message passing notations (language, package, etc.) typically include some form of asynchronous or synchronous invocation. In a synchronous invocation, the invoker waits for the invocation's servicer to pass back results. Some message passing notations also include early reply or deferred reply (including forwarding), which alters how and when the servicer passes back its results; this additional flexibility is useful in realistic applications. It is well known how to transform a synchronous invocation into only asynchronous invocations. This paper extends such transformations to early reply and forward. This paper also describes the use of these transformations within the implementations of programming notations. Using the transformation simplifies the implementation without significantly affecting run‐time costs. Copyright © 2015 John Wiley & Sons, Ltd.  相似文献   

6.
机载雷达组网航迹融合需要解决目标跟踪、数据关联与航迹管理3个子问题, 然而这3个子问题相互耦合,采用开环序贯估计算法会导致性能下降. 本文提出了一种基于消息传递的机载雷达组网航迹融合方法, 该方法在联合优化框架下解决目标跟踪、数据关联与航迹管理3个子问题. 首先, 建立机载雷达组网航迹融合的联合概率密度函数, 并将其转换为因子图. 其次, 将因子图分解为置信传播区域与平均场近似区域. 目标运动状态的统计模型服从共轭指数模型, 因此采用平均场近似以获得简单的消息传递更新公式. 数据关联包含一对一约束, 因此采用置信传播. 目标存在状态同样采用置信传播, 以获得更好的近似结果. 最后, 可以通过闭环迭代框架近似估计后验分布, 从而有效处理目标跟踪、数据关联与航迹管理之间的耦合问题. 仿真结果表明, 所提算法的性能优于多假设跟踪算法和联合概率密度关联算法.  相似文献   

7.
Communication overhead is the key obstacle to reaching hardware performance limits. The majority is associated with software overhead, a significant portion of which is attributed to message copying. To reduce this copying overhead, we have devised techniques that do not require to copy a received message in order for it to be bound to its final destination. Rather, a late-binding mechanism, which involves address translation and a dedicated cache, facilitates fast access to received messages by the consuming process/thread.We have introduced two policies namely Direct to Cache Transfer (DTCT) and lazy DTCT that determine whether a message after it is bound needs to be transferred into the data cache. We have studied the proposed methods in simulation and have shown their effectiveness in reducing access times to message payloads by the consuming process.  相似文献   

8.
Determining consistent global checkpoints is common to many distributed problems such as fault-tolerance, distributed debugging, properties detection, etc. Uncoordinated and coordinated checkpointing algorithms have been traditionally used for such determinations. This paper addresses a third technique, namely adaptive checkpointing, that has recently emerged. This technique assumes processes take local checkpoints independently and requires them to take additional local checkpoints in order that all local checkpoints be members of some consistent global checkpoint. We first study the characteristics of such adaptive algorithms. Then, a general adaptive checkpointing algorithm is designed from a condition, first stated by Netzer and Xu, that answers the following question: ‘does a given local checkpoint belong to a consistent global checkpoint’' (such a local checkpoint is not useless). The resulting algorithm has the nice property to reduce the number of additional local checkpoints taken to ensure the property ‘no local checkpoint is useless’. Futhermore, it provides each local checkpoint with a consistent global checkpoint to which it belongs. Compared to uncoordinated and coordinated checkpointing algorithms, this algorithm combines the advantages of both without inheriting their drawbacks.  相似文献   

9.
《Computers in Industry》1986,7(3):237-247
We are developing a communication oriented interactive environment for robot programming. A set of independent software modules for robot motoring control, low level sensoring processing, kinematics simulation and world model construction has been implemented as separate tools. They have been merged in a software environment whose implementation is based on the operating system Accent and can be accessed through an advanced graphics workstation. Each module interacts with the others through message exchange; each module may be organized into new submodules as new operational behaviours belonging to its functional area become necessary.  相似文献   

10.
Presents a systematic approach to the development of message passing programs. Our programming model is SPMD, with communications restricted to collective operations: scan, reduction, gather, etc. The design process in such an architecture-independent language is based on correctness-preserving transformation rules that are provable in a formal functional framework. We develop a set of design rules for composition and decomposition. For example, scan followed by reduction is replaced by a single reduction, and global reduction is decomposed into two faster operations. The impact of the design rules on the target performance is estimated analytically and tested in machine experiments. As a case study, we design two provably correct, efficient programs using the Message Passing Interface (MPI) for the famous maximum segment sum problem, starting from an intuitive, but inefficient, algorithm specification  相似文献   

11.
One of the most important abstractions for designing distributed programs is the broadcast facility. In this paper, we study the interconnection of distributed message passing systems. We have shown that totally ordered systems cannot be properly interconnected in any form. However, we have provided a simple protocol to properly interconnect FIFO ordered systems.  相似文献   

12.
13.
We propose extensions to the message passing interface (MPI) that generalize the MPI communicator concept to allow multiple communication endpoints per process, dynamic creation of endpoints, and the transfer of endpoints between processes. The generalized communicator construct can be used to express a wide range of interesting communication structures, including collective communication operations involving multiple threads per process, communications between dynamically created threads or processes, and object-oriented applications in which communications are directed to specific objects. Furthermore, this enriched functionality can be provided in a manner that preserves backward compatibility with MPI. We describe the proposed extensions, illustrate their use with examples, and describe a prototype implementation in the popular MPI implementation MPICH  相似文献   

14.
15.
This paper presents a directive-based programming environment for master–slave message passing applications that enables the efficient execution of the same code on both shared and distributed memory multiprocessors. The environment exports an extension of the OpenMP workqueuing model, supports multiple levels of task parallelism and more than one master and provides transparent load balancing with a combination of static and dynamic scheduling of tasks. In addition, it operates exclusively through the available hardware on shared-memory machines and exploits MPI for explicit communication on clusters. Experimental results on a Linux-cluster demonstrate the successful combination of ease of programming with the performance of message passing.  相似文献   

16.
Message Passing Interface (MPI) is the most popular standard for writing portable and scalable parallel applications for distributed memory architectures. Writing efficient parallel applications using MPI is a complex task, mainly due to the extra burden on programmers to explicitly handle all the complexities of message-passing (viz., inter-process communication, data distribution, load-balancing, and synchronization). The main goal of our research is to raise the level of abstraction of explicit parallelization using MPI such that the effort involved in developing parallel applications is significantly reduced in terms of the reduction in the amount of code written manually while avoiding intrusive changes to existing sequential programs. In this research, generative programming tools and techniques are combined with a domain-specific language, Hi-PaL (High-Level Parallelization Language), for automating the process of generating and inserting the required code for parallelization into the existing sequential applications. The results show that the performance of the generated applications is comparable to the manually written versions of the applications, while requiring no explicit changes to the existing sequential code.  相似文献   

17.
We describe a single-copy mechanism which enables an efficient message passing among UNIX processes on shared memory multiprocessors. A special version of PVMe, IBM's AIX implementation of the PVM message passing programming model, has been built based on this approach. Some preliminary results here reported show the clear advantage of the single-copy with respect to more conventional schemes.  相似文献   

18.
Recent publications prove that runtime systems oriented to the Bulk Synchronous Parallel Model usually achieve remarkable accuracy in their predictions. That accuracy can be seen in the capacity of the software for packing the messages generated during the superstep and their capability to find a rearrangement of the messages sent at the end of the superstep. Unfortunately, barrier synchronisation imposes some limits both in the range of available algorithms and in their performance. The asynchronous nature of many MPI/PVM programs makes their expression difficult or infeasible using a BSP oriented library. Through the generalisation of the concept of superstep we propose two extensions of the BSP model: the BSP Without Barriers (BSPWB) and the Message Passing Machine (MPM) models. These new models are oriented to MPI/PVM parallel programming. The parameters of the models and their quality are evaluated on four standard parallel platforms. The use of these BSP extensions is illustrated using the Fast Fourier Transform and the Parallel Sorting by Regular Sampling algorithms. Copyright © 1999 John Wiley & Sons, Ltd.  相似文献   

19.
We introduce a runtime, nontrace-based algorithm to compute the critical path profile of the execution of message passing and shared-memory parallel programs. Our algorithm permits starting or stopping the critical path computation during program execution and reporting intermediate values. We also present an online algorithm to compute a variant of critical path, called critical path zeroing, that measures the reduction in application execution time that improving a selected procedure will have. Finally, we present a brief case study to quantify the runtime overhead of our algorithm and to show that online critical path profiling can be used to find program bottlenecks  相似文献   

20.
The performance of a parallel program executed on a message passing MIMD computer is determined mainly by the efficiency of the communication among the processors and the efficiency of the calculation carried out in each processor. In this paper we present the results of the experiments related to the efficiency of the communication of a T800 transputer based system. The results of these experiments are used to determine the basic hardware parameters for the communication capabilities of the system. Such parameters are the asymptotic rate of data transfer (r) and the message length required to obtain half the asymptotic rate (n1/2). These performance results will help us to evaluate new implementations or new architectures.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号