1.
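MPI (Message Passing Interface), a well-known low-level parallel programming model, has been proposed as a basis for grid programming. This paper describes MPICH-G2, a grid-enabled implementation of MPI built on MPICH and the Globus Toolkit. It hides heterogeneity during startup and management and delivers good communication performance across heterogeneous systems. An example shows how to create and run an MPI computation with MPICH-G2 in a computational grid environment built with Globus.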
2.
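Data redistribution is an important means of achieving load balance in message-passing environments. This paper proposes a model problem for interleaved data distribution together with a parallel computation model for it, analyzes how the model problem is implemented in a message-passing environment, discusses performance and the conditions under which the approach applies, presents the analysis results, and discusses the overlap of communication and computation time. The interleaved data redistribution load-balancing technique is applied to the parallel computation of nonequilibrium stiff kinetic equation systems, where it achieves very good load balance.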
3.
One of the most important abstractions for designing distributed programs is the broadcast facility. In this paper, we study the interconnection of distributed message passing systems. We show that totally ordered systems cannot be properly interconnected in any form. However, we provide a simple protocol that properly interconnects FIFO-ordered systems.
4.
Communication overhead is the key obstacle to reaching hardware performance limits. Most of it is software overhead, a significant portion of which is attributed to message copying. To reduce this copying overhead, we have devised techniques that do not require copying a received message in order to bind it to its final destination. Instead, a late-binding mechanism, which involves address translation and a dedicated cache, gives the consuming process or thread fast access to received messages. We introduce two policies, Direct to Cache Transfer (DTCT) and lazy DTCT, that determine whether a message needs to be transferred into the data cache after it is bound. We have studied the proposed methods in simulation and shown their effectiveness in reducing the consuming process's access times to message payloads.
5.
6.
Connecting tools using message passing in the Field environment (total citations: 4; self-citations: 0; citations by others: 4)
An overview is given of the Field environment, which was developed to show that highly integrated, interactive environments like those on PCs can be implemented on workstations and can be used for classical-language and large-scale programming. Field connects tools with selective broadcasting, which follows the Unix philosophy of letting independent tools cooperate through simple conventions, and demonstrates that this simple approach is feasible and desirable. Field achieves its goals by providing a consistent graphical front end and a simple integration framework that lets existing and new Unix tools cooperate. The front end is based on a tool set called the Brown workstation environment. The framework combines selective broadcasting with an annotation editor that provides consistent access to the source code in multiple contexts and with a set of specialized interactive analysis tools. Field's integration framework and message facility are described.
7.
The Journal of Supercomputing - Hpcfolder is a user-friendly high-performance computing tool that can be used to analyze the performance of algorithms parallelized using MPI. It is possible to view...
8.
Kolmogorov V 《IEEE transactions on pattern analysis and machine intelligence》2006,28(10):1568-1583
Algorithms for discrete energy minimization are of fundamental importance in computer vision. In this paper, we focus on the recent technique proposed by Wainwright et al. [33]: tree-reweighted max-product message passing (TRW). It was inspired by the problem of maximizing a lower bound on the energy. However, the algorithm is not guaranteed to increase this bound; the bound may actually go down. In addition, TRW does not always converge. We develop a modification of this algorithm which we call sequential tree-reweighted message passing. Its main property is that the bound is guaranteed not to decrease. We also give a weak tree agreement condition which characterizes local maxima of the bound with respect to TRW algorithms. We prove that our algorithm has a limit point that achieves weak tree agreement. Finally, we show that our algorithm requires half as much memory as traditional message passing approaches. Experimental results demonstrate that on certain synthetic and real problems, our algorithm outperforms both ordinary belief propagation and the tree-reweighted algorithm in [33]. In addition, on stereo problems with Potts interactions, we obtain a lower energy than graph cuts.
9.
《Simulation Modelling Practice and Theory》2008,16(9):1177-1189
We present a distributed memory parallel implementation of the unbalanced tree search (UTS) benchmark using MPI and investigate MPI’s ability to efficiently support irregular and nested parallelism through continuous dynamic load balancing. Two load balancing methods are explored: work sharing using a centralized work server and distributed work stealing using explicit polling to service steal requests. Experiments indicate that in addition to a parameter defining the granularity of load balancing, message-passing paradigms require additional techniques to manage the volume of communication and mitigate runtime overhead. Using additional parameters, we observed an improvement of up to 3–4X in parallel performance. We report results for three distributed memory parallel computer systems and use UTS to characterize the performance and scalability on these systems. Overall, we find that the simpler work sharing approach with a single work server achieves good performance on hundreds of processors and that our distributed work stealing implementation scales to thousands of processors and delivers more robust performance that is less sensitive to the particular workload and load balancing parameters.
10.
A modal logic for message passing processes (total citations: 5; self-citations: 0; citations by others: 5)
A first-order modal logic is given for describing properties of processes which may send and receive values or messages along communication ports. We give two methods for proving that a process enjoys such a property. The first is the construction, for each process P and formula F, of a characteristic formula P sat F such that P enjoys the property F if and only if the formula P sat F is logically equivalent to tt. The second is a sound and complete proof system whose judgements take the form B ⊢ P : F, meaning: under the assumption B the process P enjoys the property F. The notion of symbolic operational semantics plays a crucial role in the design of both the characteristic formulae and the proof system. This work has been supported by the SERC grant GR/H16537 and the ESPRIT BRA CONCUR II.
11.
Presents a systematic approach to the development of message passing programs. Our programming model is SPMD, with communications restricted to collective operations: scan, reduction, gather, etc. The design process in such an architecture-independent language is based on correctness-preserving transformation rules that are provable in a formal functional framework. We develop a set of design rules for composition and decomposition. For example, scan followed by reduction is replaced by a single reduction, and global reduction is decomposed into two faster operations. The impact of the design rules on the target performance is estimated analytically and tested in machine experiments. As a case study, we design two provably correct, efficient programs using the Message Passing Interface (MPI) for the famous maximum segment sum problem, starting from an intuitive, but inefficient, algorithm specification.
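The idea of computing the maximum segment sum with a single reduction can be illustrated with a small sketch. It uses the standard tuple-based formulation, which is an assumption here and not necessarily the derivation the authors arrive at: each element is lifted to a 4-tuple, and an associative operator merges the tuples of two adjacent blocks. The names `lift`, `combine`, and `max_segment_sum` are hypothetical, and the sequential `functools.reduce` stands in for a parallel reduction over per-process blocks.

```python
from functools import reduce


def lift(x):
    # For a one-element block, the best segment, best prefix,
    # best suffix, and total sum are all the element itself.
    return (x, x, x, x)


def combine(left, right):
    # Merge the summaries of two adjacent blocks (left comes first).
    mss_l, pre_l, suf_l, tot_l = left
    mss_r, pre_r, suf_r, tot_r = right
    return (
        max(mss_l, mss_r, suf_l + pre_r),  # best segment, possibly spanning both blocks
        max(pre_l, tot_l + pre_r),         # best prefix of the merged block
        max(suf_r, suf_l + tot_r),         # best suffix of the merged block
        tot_l + tot_r,                     # total sum of the merged block
    )


def max_segment_sum(values):
    # One reduction with an associative operator; segments are non-empty,
    # so `values` must contain at least one element.
    return reduce(combine, map(lift, values))[0]
```

Because `combine` is associative, the blocks can be merged in any grouping, which is the property a parallel reduction relies on; for instance, `max_segment_sum([2, -4, 2, -1, 6, -3])` evaluates the segment `2, -1, 6`.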
12.
We study the renaming problem in a fully connected synchronous network with Byzantine failures. We show that when the original namespace of the processors is unbounded, this problem cannot be solved in an a priori bounded number of rounds for , where n is the size of the network and t is the number of failures. On the other hand, for n > 3t, we present a Byzantine renaming algorithm that runs in O(lg n) rounds. In addition, we present a fast, efficient strong renaming algorithm for n > t, which runs in rounds, where N_0 is the value of the highest identifier among all the correct processors.
13.
Massimo Bernaschi 《Future Generation Computer Systems》1998,13(6):443-449
We describe a single-copy mechanism which enables efficient message passing among UNIX processes on shared memory multiprocessors. A special version of PVMe, IBM's AIX implementation of the PVM message passing programming model, has been built based on this approach. Preliminary results reported here show the clear advantage of the single-copy mechanism over more conventional schemes.
14.
We introduce a runtime, nontrace-based algorithm to compute the critical path profile of the execution of message passing and shared-memory parallel programs. Our algorithm permits starting or stopping the critical path computation during program execution and reporting intermediate values. We also present an online algorithm to compute a variant of critical path, called critical path zeroing, that measures the reduction in application execution time that improving a selected procedure will have. Finally, we present a brief case study to quantify the runtime overhead of our algorithm and to show that online critical path profiling can be used to find program bottlenecks.
15.
Colbrook A. Brewer E.A. Dellarocas C.N. Weihl W.E. 《Parallel and Distributed Systems, IEEE Transactions on》1996,7(2):97-108
In this paper we describe a new algorithm for maintaining a balanced search tree on a message-passing MIMD architecture; the algorithm is particularly well suited for implementation on a small number of processors. We introduce a (2B-2, 2B) search tree that uses a bidirectional ring of O(log n) processors to store n entries. Update operations use a bottom-up node-splitting scheme, which performs significantly better than top-down search tree algorithms. The bottom-up algorithm requires many fewer messages and results in less blocking due to synchronization than top-down algorithms. Additionally, for a given cost ratio of computation to communication the value of B may be varied to maximize performance. Implementations on a parallel-architecture simulator are described.
16.
The performance of a parallel program executed on a message passing MIMD computer is determined mainly by the efficiency of the communication among the processors and the efficiency of the calculation carried out in each processor. In this paper we present the results of the experiments related to the efficiency of the communication of a T800 transputer based system. The results of these experiments are used to determine the basic hardware parameters for the communication capabilities of the system. Such parameters are the asymptotic rate of data transfer (r_∞) and the message length required to obtain half the asymptotic rate (n_{1/2}). These performance results will help us to evaluate new implementations or new architectures.
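The two parameters named in this abstract are usually related through a linear timing model, commonly attributed to Hockney. The abstract does not state the model, so the formulas below are an assumption given only to show how the two quantities fit together; $t(n)$ and $t_0$ are symbols introduced here for the transfer time of an $n$-unit message and the start-up time.

```latex
\begin{align}
  t(n) &= t_0 + \frac{n}{r_\infty} = \frac{n + n_{1/2}}{r_\infty},
  \qquad n_{1/2} = t_0 \, r_\infty, \\
  r(n) &= \frac{n}{t(n)} = \frac{r_\infty}{1 + n_{1/2}/n},
  \qquad r\!\left(n_{1/2}\right) = \frac{r_\infty}{2}.
\end{align}
```

Under this model, the achieved rate $r(n)$ approaches $r_\infty$ for long messages and equals exactly half of it at $n = n_{1/2}$, which matches the definitions given in the abstract.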
17.
18.
Message Passing Interface (MPI) is the most popular standard for writing portable and scalable parallel applications for distributed memory architectures. Writing efficient parallel applications using MPI is a complex task, mainly due to the extra burden on programmers to explicitly handle all the complexities of message passing (viz., inter-process communication, data distribution, load balancing, and synchronization). The main goal of our research is to raise the level of abstraction of explicit parallelization using MPI so that the effort of developing parallel applications, measured by the amount of code written manually, is significantly reduced while intrusive changes to existing sequential programs are avoided. In this research, generative programming tools and techniques are combined with a domain-specific language, Hi-PaL (High-Level Parallelization Language), to automate the process of generating the code required for parallelization and inserting it into existing sequential applications. The results show that the performance of the generated applications is comparable to that of the manually written versions, while requiring no explicit changes to the existing sequential code.
19.
20.
Kumar M.J. Patnaik L.M. 《IEEE transactions on systems, man, and cybernetics. Part B, Cybernetics》1996,26(6):822-835
Various Artificial Neural Networks (ANNs) have been proposed in recent years to mimic the human brain in solving problems involving human-like intelligence. Efficient mapping of ANNs comprising a large number of neurons onto various distributed MIMD architectures is discussed in this paper. The massive interconnection among neurons demands a communication-efficient architecture. Issues related to the suitability of MIMD architectures for simulating neural networks are discussed. Performance analysis of ring, torus, binary tree, hypercube, and extended hypercube topologies for simulating artificial neural networks is presented. Our studies reveal that the extended hypercube performs better than the ring, torus, binary tree, and hypercube topologies.