Similar literature
1.
We describe a single-copy mechanism that enables efficient message passing among UNIX processes on shared memory multiprocessors. A special version of PVMe, IBM's AIX implementation of the PVM message passing programming model, has been built on this approach. Preliminary results reported here show a clear advantage of the single-copy approach over more conventional schemes.

2.
We present three parallel sorting algorithms suitable for implementation on tightly coupled multiprocessors and compare their performance on the Denelcor HEP. Two of the algorithms implemented, parallel Shellsort and quickmerge, are new. Shellsort is amenable to parallelization; however, since Shellsort has higher complexity than quicksort, parallel Shellsort is inferior to parallel quicksort. The second new parallel algorithm, quickmerge, is based on both quicksort and mergesort. Our implementation of quickmerge achieves significantly higher speedup than our implementation of parallel quicksort.
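The claim that Shellsort is amenable to parallelization can be illustrated with a sequential sketch. It is not the HEP implementation from the abstract; the halving gap sequence and the name `shellsort` are assumptions made for illustration. Within one pass, the `gap` interleaved chains touch disjoint elements, so each chain could in principle be handed to a different processor.

```python
def shellsort(items):
    """Return a sorted copy of items using Shellsort with halving gaps."""
    a = list(items)
    gap = len(a) // 2
    while gap > 0:
        # The chains start, start+gap, start+2*gap, ... are disjoint,
        # so the iterations of this outer loop are independent of each other.
        for start in range(gap):
            # Plain insertion sort restricted to one chain.
            for i in range(start + gap, len(a), gap):
                value = a[i]
                j = i
                while j >= gap and a[j - gap] > value:
                    a[j] = a[j - gap]
                    j -= gap
                a[j] = value
        gap //= 2
    return a
```

The final pass with `gap == 1` is an ordinary insertion sort over the whole array, which is the sequential bottleneck a parallel variant has to deal with.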

3.
This paper explores the macro data flow approach for solving numerical applications on distributed memory systems. We discuss the problems of this approach with a sophisticated ‘real life’ algorithm—the adaptive full multigrid method.

It is shown that the nonnumeric parts of the algorithm—the initialization, the termination and the mapping of processes to processors—are very important for the overall performance.

To avoid unnecessary global synchronization points, we propose using distributed supervisors. We compare this solution with more centralized algorithms. The performance evaluation is done for nearest-neighbour and bus-connected multiprocessors using a simulation system.


4.
A discussion is presented of two ways of mapping the cells in a two-dimensional area of a chip onto processors in an n-dimensional hypercube such that both small and large cell moves can be applied. Two types of move are allowed: cell exchanges and cell displacements. The computation of the cost function in parallel among all the processors in the hypercube is described, along with a distributed data structure that needs to be stored in the hypercube to support such a parallel cost evaluation. A novel tree broadcasting strategy is presented for the hypercube that is used extensively in the algorithm for updating cell locations in the parallel environment. A dynamic parallel annealing schedule is proposed that estimates the errors due to interacting parallel moves and adapts the rate of synchronization automatically. Two novel approaches to controlling error in parallel algorithms are described: heuristic cell coloring and adaptive sequence control. The performance on an Intel iPSC-2/D4/MX hypercube is reported.

5.
This paper describes new parallel algorithms that solve Lyapunov equations for the Cholesky factor using Hammarling's method on message passing multiprocessors. The algorithms build on previous work on the parallel solution of triangular linear systems using a row block data distribution and a wavefront of antidiagonals. The algorithms are analyzed theoretically, and experimental results obtained on an SGI Power Challenge and a Cray T3D are presented.

6.
Two classes of algorithms for equation solving are presented and analyzed. These algorithms have been devised in recent years to exploit the computational capabilities of multiprocessors. The first class consists of parallel search methods, while the second consists of asynchronous methods. The methods of the first class are fail-safe: they always provide an approximation to the root as well as the smallest possible interval (for the work done) guaranteed to contain the root. The second class relaxes the intrinsically interlocked nature of the more complicated forms of algorithms designed for multiprocessors by omitting the synchrony usually demanded in computation.
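One simple member of the parallel search family is multisection, sketched below under stated assumptions: the function name `multisection`, the parameter `p` (number of simultaneous evaluations) and the stopping tolerance are hypothetical, and the `p` evaluations are simulated sequentially. Each round evaluates `f` at `p` interior points, which are independent of each other, and keeps the subinterval with a sign change, so the bracket shrinks by a factor of `p + 1` per round and always contains a root of a continuous `f`.

```python
def multisection(f, a, b, p=3, tol=1e-10):
    """Return (lo, hi) bracketing a root of continuous f, with hi - lo <= tol."""
    fa, fb = f(a), f(b)
    if fa == 0:
        return (a, a)
    if fb == 0:
        return (b, b)
    if (fa < 0) == (fb < 0):
        raise ValueError("f(a) and f(b) must have opposite signs")
    while b - a > tol:
        h = (b - a) / (p + 1)
        xs = [a + i * h for i in range(1, p + 1)]
        fs = [f(x) for x in xs]  # p independent evaluations: one per processor
        points = [(a, fa)] + list(zip(xs, fs)) + [(b, fb)]
        for (x0, f0), (x1, f1) in zip(points, points[1:]):
            if f1 == 0:
                return (x1, x1)
            if (f0 < 0) != (f1 < 0):
                # Sign change: the root lies in [x0, x1].
                a, fa, b, fb = x0, f0, x1, f1
                break
    return (a, b)
```

With `p = 1` this reduces to ordinary bisection; larger `p` trades more simultaneous function evaluations for fewer rounds.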

7.
8.
Parallel Computing, 2002, 28(7-8): 1079-1093
Vector quantization (VQ) is a widely used algorithm in speech and image data compression. One of the problems of the VQ methodology is that it requires long computation times, especially for large codebooks. This paper addresses two issues. The first is the parallel construction of the VQ codebook, which can drastically reduce the training time. A master/worker parallel implementation of a VQ algorithm is proposed. The algorithm is executed on the DM-MIMD Alex AVX-2 machine using a pipeline architecture. The second issue is the ability to accurately predict the machine performance. Using communication and computation models, expected and real performance are compared. Results show that the two models can accurately predict the performance of the machine for image data compression. Metrics normally used in parallel implementations are also analyzed.

9.
The matrix sign function is the basis of a parallel algorithm for solving the generalized algebraic Riccati equation. Three forms of the algorithm were implemented and tested on a distributed memory hypercube multiprocessor. Performance results indicate that the method is an excellent means of solving large-scale problems on a parallel computer.

10.
Improved algorithms for finding denominators of rational solutions of linear difference and q-difference equations with polynomial coefficients are proposed. The improved efficiency of these algorithms is achieved as a result of a more efficient implementation of the Abramov algorithm (due to the use of the Man and Wright algorithm for calculating the dispersion, which is extended for the case of q-dispersion) and of the improvement of this algorithm by using an additional procedure for minimizing the degree of the denominator (similar to the Migushov algorithm). The case of difference equations is analyzed in detail, whereas q-difference equations are considered by analogy with the first case. The algorithms described were implemented in Maple V.

11.
We introduce a Steffensen-type method (STTM) for solving nonlinear equations in a Banach space setting. We then present a local convergence analysis for STTM using recurrence relations. Numerical examples validating our theoretical results are also provided, showing that STTM is faster than other methods [I.K. Argyros, J. Ezquerro, J.M. Gutiérrez, M. Hernández, and S. Hilout, On the semilocal convergence of efficient Chebyshev-Secant-type methods, J. Comput. Appl. Math. 235 (2011), pp. 3195–3206; J.A. Ezquerro and M.A. Hernández, An optimization of Chebyshev's method, J. Complexity 25 (2009), pp. 343–361] under similar convergence conditions.
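For orientation, the classical scalar Steffensen iteration is sketched below. This is the textbook method on the real line, swapped in for illustration; it is not the Banach-space STTM variant analyzed in the abstract, and the names `steffensen`, `tol` and `max_iter` are assumptions. The iteration replaces the derivative in Newton's method with the divided difference (f(x + f(x)) - f(x)) / f(x), so no derivative is needed.

```python
def steffensen(f, x0, tol=1e-12, max_iter=50):
    """Approximate a root of f starting from x0 with Steffensen's iteration."""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            return x
        # f(x + f(x)) - f(x) equals f(x) times the divided difference f[x, x + f(x)].
        denom = f(x + fx) - fx
        if denom == 0:
            raise ZeroDivisionError("divided difference vanished; try another x0")
        x = x - fx * fx / denom
    return x  # best iterate if the tolerance was not reached
```

For example, `steffensen(lambda x: x * x - 2, 1.5)` approximates the square root of 2. Like Newton's method, the iteration is only locally convergent, which is why convergence analyses such as the one in the abstract matter.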

12.
A new algorithm is proposed to solve the fuzzy relation equation PQ=R with max–min composition and max–product composition. The algorithm operates systematically and graphically on a matrix pattern to get all the solutions of P. An example is given to illustrate its effectiveness.
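The two compositions named above can be written out directly. The sketch below only evaluates the forward composition R from given P and Q; it does not implement the solution algorithm of the abstract, and the function name `compose` and its `combine` parameter are assumptions. Each entry of R is the maximum over k of `combine(P[i][k], Q[k][j])`, with `combine` being `min` for max-min and multiplication for max-product.

```python
def compose(P, Q, combine=min):
    """Fuzzy composition of matrices P (m x n) and Q (n x r) given as nested lists."""
    inner = len(Q)
    if any(len(row) != inner for row in P):
        raise ValueError("column count of P must equal row count of Q")
    cols = len(Q[0])
    return [
        [max(combine(P[i][k], Q[k][j]) for k in range(inner)) for j in range(cols)]
        for i in range(len(P))
    ]


def max_product(P, Q):
    """Max-product composition: same shape rule, with min replaced by multiplication."""
    return compose(P, Q, combine=lambda x, y: x * y)
```

A candidate P can be checked against the equation by comparing `compose(P, Q)` with R entry by entry.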

13.
The proposed algorithm is an efficient parallel implementation of the Fedorenko multigrid method and is intended for solving three-dimensional elliptic equations. Scalability is provided by the use of Chebyshev iterations for solving the coarsest grid equations and for constructing the smoothing procedures. Computational results are given that confirm the efficiency of the algorithm and the scalability of the parallel code.

14.
When the matrix A is in companion form, the essential step in solving the Lyapunov equation PA + A^T P = −Q involves a linear n × n system for the first column of the solution matrix P. The complex dependence on the data matrices A and Q renders this system unsuitable for actual computation. In this paper we derive an equivalent system which exhibits simpler dependence on A and Q as well as improved complexity and robustness characteristics. A similar result is also obtained for the Stein equation P − A^T P A = Q.

15.
International Journal of Computer Mathematics, 2012, 89(7): 1538-1554
This paper suggests four different methods to solve nonlinear integro-differential equations, namely, He's variational iteration method, the Adomian decomposition method, He's homotopy perturbation method and the differential transform method. To assess the accuracy of each method, a test example with a known exact solution is used. The study outlines significant features of these methods and sheds some light on the advantages of one method over another. The results show that these methods are very efficient and convenient, and can be adapted to fit a larger class of problems. The comparison reveals that, although the numerical results of these methods are similar, He's homotopy perturbation method is the easiest, most efficient and most convenient. Moreover, we applied modified forms of He's variational iteration method and the differential transform method to solve a mathematical model which describes the accumulated effect of toxins on populations living in a closed system.

16.
The N-Queens problem has three variants: obtaining a specific solution, obtaining a set of solutions and obtaining all solutions. Variant I seeks a constructive solution and has been solved. Variant III seeks all solutions; the largest number of queens resolved so far is 26. Variant II, which seeks a set of solutions for larger-scale problems, relies on various intelligent algorithms. In this paper, we use a master-slave model genetic algorithm that combines ideas from evolutionary algorithms and simulated annealing to solve Variant II, and use a parallel fitness function based on compute unified device architecture. Experimental results show that our scheme achieves a maximum 60-fold speedup over the single-CPU counterpart. On this basis, a two-level parallel genetic algorithm based on the island model and the master-slave model is implemented on a GPU cluster using message passing interface technology. Using two-node and three-node GPU clusters, average speedups of 1.46 and 2.01 over a single node are obtained, respectively. Compared with the sequential genetic algorithm, the two-level parallel genetic algorithm makes full use of the parallel computing power of the GPU cluster in solving N-Queens Variant II and improves performance by 99.19 times in the best case.
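A typical fitness function for a genetic algorithm on N-Queens counts attacking pairs. The sketch below is an assumption about what such a function might look like, since the abstract does not specify the encoding or the fitness definition; the name `conflicts` and the one-queen-per-column encoding are hypothetical. Each individual is scored independently of the others, which is the property that makes fitness evaluation easy to spread over GPU threads.

```python
def conflicts(rows):
    """Count attacking queen pairs; rows[c] is the row of the queen in column c."""
    n = len(rows)
    count = 0
    for i in range(n):
        for j in range(i + 1, n):
            same_row = rows[i] == rows[j]
            same_diagonal = abs(rows[i] - rows[j]) == j - i
            if same_row or same_diagonal:
                count += 1
    return count


def evaluate_population(population):
    """Score every individual; the calls are independent and could run in parallel."""
    return [conflicts(individual) for individual in population]
```

A board is a valid solution exactly when `conflicts` returns 0, so a genetic algorithm would minimize this value.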

17.
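A parallel algorithm is proposed for solving systems of nonlinear equations in a distributed environment. The algorithm applies a suitable splitting to the Jacobian matrix in Newton's iteration so that the iteration parallelizes well. A theoretical convergence analysis is given. Numerical experiments on an HP rx2600 cluster show a parallel efficiency above 70%.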

18.
This paper introduces a rather specific metalgorithm (or meta-program) for a class of algorithms for adaptive quadrature on parallel (MIMD) computers. This class includes all the current approaches to adaptive quadrature. The main result is that any member of this metalgorithm satisfies the conditions of a traditional numerical analysis convergence theorem from [2]. The algorithm structure in this metalgorithm is specified in some detail and 32 Attributes are assumed. These Attributes and structure serve to guide the design of particular algorithms. They also facilitate establishing algorithm correctness by providing a detailed set of algorithm properties (most of which are like assertions in program proving) that are sufficient for correctness.
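To make the term adaptive quadrature concrete, here is a minimal sequential sketch built around a collection of intervals ordered by error estimate. It is not the metalgorithm of the abstract and encodes none of its 32 Attributes; the Simpson rule, the error estimate and the names `simpson`, `adaptive_quadrature`, `tol` and `max_intervals` are all assumptions. A parallel version would let several workers take intervals from the shared collection at once.

```python
import heapq


def simpson(f, lo, hi):
    """Simpson's rule on a single interval."""
    return (hi - lo) / 6.0 * (f(lo) + 4.0 * f(0.5 * (lo + hi)) + f(hi))


def adaptive_quadrature(f, a, b, tol=1e-8, max_intervals=10_000):
    """Integrate f over [a, b] by repeatedly bisecting the worst interval."""

    def refine(lo, hi):
        mid = 0.5 * (lo + hi)
        coarse = simpson(f, lo, hi)
        fine = simpson(f, lo, mid) + simpson(f, mid, hi)
        return abs(fine - coarse), fine  # (error estimate, value)

    err, val = refine(a, b)
    heap = [(-err, a, b, val)]  # max-heap on error via negated key
    total_err = err
    while total_err > tol and len(heap) < max_intervals:
        neg_err, lo, hi, _ = heapq.heappop(heap)
        total_err += neg_err  # remove the popped interval's estimate
        mid = 0.5 * (lo + hi)
        for left, right in ((lo, mid), (mid, hi)):
            e, v = refine(left, right)
            heapq.heappush(heap, (-e, left, right, v))
            total_err += e
    return sum(item[3] for item in heap)
```

The stopping rule uses the sum of per-interval estimates, which is a heuristic rather than a guaranteed bound on the true error.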

19.
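This paper proposes a parallel algorithm for solving periodic block tridiagonal linear systems in a distributed environment. After a single preprocessing step on the coefficient matrix, the algorithm exploits the special structure of that matrix so that communication takes place only twice, and only between neighbouring processors. A sufficient condition for the convergence of the algorithm is established theoretically. Finally, numerical experiments on an HP rx2600 cluster show that the computed results agree with the theory and that the algorithm parallelizes well.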

20.
This paper is a survey of universal algorithms for solving the matrix Bellman equations over semirings, especially tropical and idempotent semirings. Some original algorithms are also presented. Applications and software implementations are discussed.
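As a small illustration of a Bellman equation over a tropical semiring, consider x = (A ⊗ x) ⊕ b in the min-plus semiring, where ⊕ is `min` and ⊗ is `+`. Its least solution is x = A* ⊗ b, with A* the Kleene star (closure) of A. The sketch below computes A* with a Floyd-Warshall style elimination; it assumes A has no negative cycles, and the names `minplus_star` and `solve_bellman` are hypothetical rather than taken from the survey.

```python
INF = float("inf")  # the semiring zero of min-plus


def minplus_star(A):
    """Kleene star of a square min-plus matrix, assuming no negative cycles."""
    n = len(A)
    # Start from I (+) A: the min-plus identity has 0 on the diagonal.
    D = [[min(A[i][j], 0.0) if i == j else A[i][j] for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = D[i][k] + D[k][j]
                if via_k < D[i][j]:
                    D[i][j] = via_k
    return D


def solve_bellman(A, b):
    """Least solution of x = (A (x) x) (+) b over min-plus, i.e. x = A* (x) b."""
    star = minplus_star(A)
    return [min(star[i][j] + b[j] for j in range(len(b))) for i in range(len(b))]
```

Read as a graph problem, A holds edge weights and `solve_bellman(A, b)` gives shortest distances to the nodes where `b` is finite. The same elimination scheme works over other idempotent semirings once `min` and `+` are replaced by the corresponding operations, which is the sense in which such algorithms are called universal.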
