首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
It is a known problem that state spaces can grow very big, which makes operating with them (including reducing them) difficult because of memory shortage. In the attempt to extend the size of the state spaces that can be dealt with, we designed and implemented a bisimulation reduction algorithm for distributed memory settings using message passing communication. By using message passing, the same implementation can be used on both large SMP machines and clusters of workstations. The algorithm performs reduction of large labeled transition systems modulo strong bisimulation. We justify its correctness and termination. We provide an evaluation of the worst-case time and message complexity and some performance data from a prototype implementation. Both theory and practice show that the algorithm scales up with the number of workers.  相似文献   

2.
It is a known problem that state spaces can grow very large, which makes operating on them (including reducing them) difficult because of operational memory shortage. In an attempt to extend the size of the state spaces that can be dealt with, we designed and implemented a bisimulation reduction algorithm for distributed memory settings using message passing communication. By using message passing, the same implementation can be used on both clusters of workstations and large shared memory machines. The algorithm performs reduction of large labeled transition systems modulo strong bisimulation. We justify its correctness and termination and provide an evaluation of the worst-case time and message complexity and some performance data from a prototype implementation. Both theory and practice show that the algorithm scales up with the number of workstations.  相似文献   

3.
A linear scaling parallel clustering algorithm implementation and its application to very large datasets for cluster analysis is reported. WaveCluster is a novel clustering approach based on wavelet transforms. Despite this approach has an ability to detect clusters of arbitrary shapes in an efficient way, it requires considerable amount of time to collect results for large sizes of multi-dimensional datasets. We propose the parallel implementation of the WaveCluster algorithm based on the message passing model for a distributed-memory multiprocessor system. In the proposed method, communication among processors and memory requirements are kept at minimum to achieve high efficiency. We have conducted the experiments on a dense dataset and a sparse dataset to measure the algorithm behavior appropriately. Our results obtained from performed experiments demonstrate that developed parallel WaveCluster algorithm exposes high speedup and scales linearly with the increasing number of processors.  相似文献   

4.
Full waveform inversion (FWI) is an appealing seismic data-fitting procedure for the derivation of high-resolution quantitative models of the subsurface at various scales. Full modelling and inversion of visco-elastic waves from multiple seismic sources allow for the recovering of different physical parameters, although they remain computationally challenging tasks. An efficient massively parallel, frequency-domain FWI algorithm is implemented here on large-scale distributed-memory platforms for imaging two-dimensional visco-elastic media. The resolution of the elastodynamic equations, as the forward problem of the inversion, is performed in the frequency domain on unstructured triangular meshes, using a low-order finite element discontinuous Galerkin method. The linear system resulting from discretization of the forward problem is solved with a parallel direct solver. The inverse problem, which is presented as a non-linear local optimization problem, is solved in parallel with a quasi-Newton method, and this allows for reliable estimation of multiple classes of visco-elastic parameters. Two levels of parallelism are implemented in the algorithm, based on message passing interfaces and multi-threading, for optimal use of computational time and the core-memory resources available on modern distributed-memory multi-core computational platforms. The algorithm allows for imaging of realistic targets at various scales, ranging from near-surface geotechnic applications to crustal-scale exploration.  相似文献   

5.
Many of the world’s leading supercomputer architectures are a hybrid of shared memory and network-distributed memory. Such an architecture lends itself to a hybrid MPI-thread programming model. We first present an implementation of inter-thread message passing based on the MPI and pthread libraries. In addition, we present an efficient implementation of termination detection for communication rounds. We use the term phased message passing to denote the communication interface based on this termination detection. This interface is then used to implement parallel operations for adaptive unstructured meshes, and the performance of resulting applications is compared to pure MPI operation. We also present new workflows enabled by the ability to vary the number of threads during runtime.  相似文献   

6.
基于消息传递的LDPC码硬判决解码算法建模   总被引:1,自引:0,他引:1  
提出了一种以奇偶校验和作为消息传递的LDPC码硬判决的解码方案.该方案以奇偶校验方程是否满足约束为条件,从而决定接收分组中的错误位,并对错误位进行翻转.分析了迭代消息流传递机制和迭代解码过程,最后提出一种具体可实现的解码算法模型。  相似文献   

7.
There are a number of models that were proposed in recent years for message passing parallel systems. Examples are the postal model and its generalization the LogP model. In the postal model a parameter λ is used to model the communication latency of the message-passing system. Each node during each round can send a fixed-size message and, simultaneously, receive a message of the same size. Furthermore, a message sent out during round r will incur a latency of λ and will arrive at the receiving node at round r+λ-1. Our goal in this paper is to bridge the gap between the theoretical modeling and the practical implementation. In particular, we investigate a number of practical issues related to the design and implementation of two collective communication operations, namely, the broadcast operation and the global combine operation. Those practical issues include, for example, (1) techniques for measurement of the value of λ on a given machine, (2) creating efficient broadcast algorithms that get the latency h and the number of nodes n as parameters and (3) creating efficient global combine algorithms for parallel machines with λ which is not an integer. We propose solutions that address those practical issues and present results of an experimental study of the new algorithms on the Inter Delta machine. Our main conclusion is that the postal model can help in performance prediction and tuning, for example, a properly tuned broadcast improves the known implementation by more than 20%  相似文献   

8.
Message passing programs commonly use message buffers to avoid unnecessary synchronizations and to improve performance by overlapping communication with computation. Unfortunately, using buffers can introduce portability problems and can lead to deadlock problems on systems without a sufficient number of message buffers.We explore a variety of problems related to buffer allocation for safe and efficient execution of message passing programs. We show that determining the minimum number of message buffers or verifying that each process has a sufficient number of message buffers are intractable problems. However, we give a polynomial time algorithm to determine the minimum number of message buffers needed to ensure that no send operation is unnecessarily delayed due to lack of message buffers. We extend these results to several different buffering schemes, which in some cases make the problems tractable.  相似文献   

9.
We present a simple and efficient mutual exclusion algorithm whose optimal message passing complexity isO(N), whereNis the number of processors in the network. The message complexity is measured by counting the number of communication hops in a network for a given topology. This algorithm reduces its message passing complexity by a token-chasing method, and enhances its effectiveness by dynamically adjusting state information stored in each processor. Moreover, this algorithm shortens the request delay by fully taking advantage of the network dynamic status information. The performance of the algorithm is also modeled for analytical evaluation. We have conducted a group of experiments on a network of workstations for comparisons between our algorithm and two other existing mutual exclusion algorithms. The experimental results show the effectiveness of our algorithm, especially when a large number of requests access the critical region in a distributed system. Finally, the token-chasing algorithm is further enhanced for fault tolerance under message loss and link crash conditions.  相似文献   

10.
In this paper, we describe various methods of deriving a parallel version of Stone's Strongly Implicit Procedure (SIP) for solving sparse linear equations arising from finite difference approximation to partial differential equations (PDEs). Sequential versions of this algorithm have been very successful in solving semi‐conductor, heat conduction and flow simulation problems and an efficient parallel version would enable much larger simulations to be run. An initial investigation of various parallelizing strategies was undertaken using a version of high performance Fortran (HPF) and the best methods were reprogrammed using the MPI message passing libraries for increased efficiency. Early attempts concentrated on developing a parallel version of the characteristic wavefront computation pattern of the existing sequential SIP code. However, a red‐black ordering of grid points, similar to that used in parallel versions of the Gauss–Seidel algorithm, is shown to be far more efficient. The results of both the wavefront and red‐black MPI based algorithms are reported for various size problems and number of processors on a sixteen node IBM SP2. Copyright © 2001 John Wiley & Sons, Ltd.  相似文献   

11.
This article focuses on the effect of both process topology and load balancing on various programming models for SMP clusters and iterative algorithms. More specifically, we consider nested loop algorithms with constant flow dependencies, that can be parallelized on SMP clusters with the aid of the tiling transformation. We investigate three parallel programming models, namely a popular message passing monolithic parallel implementation, as well as two hybrid ones, that employ both message passing and multi-threading. We conclude that the selection of an appropriate mapping topology for the mesh of processes has a significant effect on the overall performance, and provide an algorithm for the specification of such an efficient topology according to the iteration space and data dependencies of the algorithm. We also propose static load balancing techniques for the computation distribution between threads, that diminish the disadvantage of the master thread assuming all inter-process communication due to limitations often imposed by the message passing library. Both improvements are implemented as compile-time optimizations and are further experimentally evaluated. An overall comparison of the above parallel programming styles on SMP clusters based on micro-kernel experimental evaluation is further provided, as well.  相似文献   

12.
Many multi-agent applications based on mobile agents require message propagation among group of agents. A fast and scalable group communication mechanism can considerably improve performance of these applications. Unfortunately, most of the existing approaches do not scale well and disseminate messages slowly when the number of agents grows.In this paper, we propose Sama, a new group communication mechanism, to speed up message delivery for a group of mobile agents on a heterogeneous internetwork. The main contribution of Sama is distribution and parallelization of message propagation in an efficient way to achieve scalability and high-speed of message delivery to group members. Sama uses message dispatcher objects (MDOs), which are stationary agents on each host, to propagate messages concurrently. The proposed mechanism is independent of agent locations and transparently delivers messages to the group using constant number of remote messages. It also transparently recovers from host failures. We also present a Hop-Ring protocol that considerably improves the performance of message dissemination in Sama. Our experimental results show that message propagation in Sama is significantly fast compared to the previously proposed methods.  相似文献   

13.
In irregular all-to-all communication, messages are exchanged between every pair of processors. The message sizes vary from processor to processor and are known only at run time. This is a fundamental communication primitive in parallelizing irregularly structured scientific computations. Our algorithm reduces the total number of message start-ups. It also reduces node contention by smoothing out the lengths of the messages communicated. As compared to the earlier approaches, our algorithm provides deterministic performance and also reduces the buffer space at the nodes during message passing. The performance of the algorithm is characterised using a simple communication model of high-performance computing (HPC) platforms. We show the implementation on T3D and SP2 using C and the message passing interface standard. These can be easily ported to other HPC platforms. The results show the effectiveness of the proposed technique as well as the interplay among the machine size, the variance in message length, and the network interface.  相似文献   

14.
We develop and experiment with a new parallel algorithm to approximate the maximum weight cut in a weighted undirected graph. Our implementation starts with the recent (serial) algorithm of Goemans and Williamson for this problem. We consider several different versions of this algorithm, varying the interior-point part of the algorithm in order to optimize the parallel efficiency of our method. Our work aims for an efficient, practical formulation of the algorithm with close-to-optimal parallelization. We analyze our parallel algorithm in the LogP model and predict linear speedup for a wide range of the parameters. We have implemented the algorithm using the message passing interface (MPI) and run it on several parallel machines. In particular, we present performance measurements on the IBM SP2, the Connection Machine CM5, and a cluster of workstations. We observe that the measured speedups are predicted well by our analysis in the LogP model. Finally, we test our implementation on several large graphs (up to 13,000 vertices), particularly on large instances of the Ising model.  相似文献   

15.
We describe a single-copy mechanism which enables an efficient message passing among UNIX processes on shared memory multiprocessors. A special version of PVMe, IBM's AIX implementation of the PVM message passing programming model, has been built based on this approach. Some preliminary results here reported show the clear advantage of the single-copy with respect to more conventional schemes.  相似文献   

16.
Inclined planes system optimization (IPO) is a new optimization algorithm inspired by the sliding motion dynamic along a frictionless inclined surface. In this paper, with the aim of create a powerful trade-off between the concepts of exploitation and exploration, and rectify the complexity of their structural parameters in the standard IPO, a modified version of IPO (called MIPO) is introduced as an efficient optimization algorithm for digital infinite-impulse-response (IIR) filters model identification. The IIR model identification is a complex and practical challenging problem due to multimodal error surface entanglement that many researches have been reported for it. In this work, MIPO utilizes an appropriate mechanism based on the executive steps of algorithm with the constant damp factors. To do this, unknown filter parameters are considered as a vector to be optimized. In implementation, at first, to demonstrate the effectiveness of the proposed method, 10 well-known benchmark functions have been considered for evaluating and testing. In addition, statistical analysis on the powerfulness, efficiency and applicability of the MIPO algorithm are presented. Obtained results in compared to some other popular methods, confirm the efficiency of the MIPO algorithm that makes the best optimal solutions and has a better performance and acceptable solutions.  相似文献   

17.
A strategy for dealing with communication conflicts that occur in omega networks is presented. The strategy operates by implementing the dual path property of omega networks, which allows the source and destination processors to reverse roles for some of the messages that are being transmitted. For certain message patterns, such a reversal produces a modified message pattern for which the network routes are disjoint. For a circuit switching mode in which the network links and switches are bidirectional, the disjoint set of routes for modified message pattern can be used to achieve conflict-free message transmission for the original message pattern. This strategy is investigated, and an efficient algorithm to determine whether it can be successfully applied to a given message pattern is presented  相似文献   

18.
In order to solve structural multi-damage identification problem, a two-stage damage detection method based on Bayesian theory and immune genetic algorithm (IGA) is presented. First, structural modal strain energy and frequency are considered as two kinds of information sources, and Bayesian theory is utilized to integrate the two information sources and preliminarily identify structural damage locations. After the damaged locations are determined, immune genetic algorithm is used to identify structural damage extent. Considering the search efficiency of the simple IGA is still not very good, some improved strategies are presented, such as culture vaccine, concentration control of the best antibody, and two termination conditions etc. Simulation results show that the two-stage method can precisely identify structural damage locations and extent, and the calculated results of the proposed improved IGA are obviously better than those of both the simple IGA and the genetic algorithm with elitist strategy.  相似文献   

19.
This paper reports on the experience of implementing Shiloach and Vishkin's parallel Maxflow algorithm [7] in Concurrent PROLOG. The major difficulties in this endeavor were understanding the algorithm, which is intricate, and adapting it to the computational model of Concurrent PROLOG. In spite of the difficulties, we were able to produce a Concurrent PROLOG program that implements the algorithm and achieves the expected complexity bounds. The lack of destructive assignment in the logic program's computation model prevents PROLOG from being an efficient implementation language for many sequential algorithms. Our main conclusion is that, in concurrent algorithms, message passing is a powerful substitute for destructive assignment. It is therefore possible to write efficient Concurrent PROLOG implementations of concurrent algorithms.  相似文献   

20.
王波  刘德亮 《计算机应用》2019,39(2):523-527
针对近场源波达方向(DOA)和距离的联合估计问题,提出一种近场迭代自适应算法(NF-IAA)。首先通过划分二维网格表示出近场区域内信源所有可能的位置,每个位置都看作存在一个潜在的信源入射到阵列上,表示出阵列输出的数据模型;然后通过循环迭代利用上一次谱估计的结果构建信号的协方差矩阵,将协方差矩阵的逆作为加权矩阵估计出每个位置对应的潜在信源能量;最后绘制出三维能量谱图,由于只有真实存在的信源能量不为0,因此谱峰对应的位置即为真实存在信源的位置。仿真实验表明在10个快拍条件下,NF-IAA的DOA分辨概率达到了90%,而二维多重信号分类(2D-MUSIC)算法只有40%;当快拍数降至2时,2D-MUSIC算法已经失效,而NF-IAA仍然能很好地分辨出3个入射信源并且准确地估计出位置参数。随着快拍数和信噪比(SNR)的增加,NF-IAA的估计性能一直优于2D-MUSIC。实验结果表明,NF-IAA具备少快拍条件下高精度、高分辨地估计近场源二维位置参数的能力。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号