Similar literature
Found 20 similar documents (search time: 640 ms)
1.
We present three parallel sorting algorithms suitable for implementation on tightly coupled multiprocessors and compare their performance on the Denelcor HEP. Two of the algorithms implemented, parallel Shellsort and quickmerge, are new. Shellsort is amenable to parallelization; however, since Shellsort has higher complexity than quicksort, parallel Shellsort is inferior to parallel quicksort. A second new parallel algorithm, called quickmerge, is based upon both quicksort and mergesort. Our implementation of quickmerge achieves significantly higher speedup than our implementation of parallel quicksort.

2.
This paper explores the macro data flow approach for solving numerical applications on distributed memory systems. We discuss the problems of this approach with a sophisticated ‘real life’ algorithm—the adaptive full multigrid method.

It is shown that the nonnumeric parts of the algorithm—the initialization, the termination and the mapping of processes to processors—are very important for the overall performance.

To avoid unnecessary global synchronization points we propose using distributed supervisors. We compare this solution with more centralized algorithms. The performance evaluation is done for nearest neighbour and bus connected multiprocessors using a simulation system.


3.
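Intelligent optimization algorithms based on iterative population search have been widely applied in agriculture, transportation, industry, and many other fields. However, the iterative nature of their search usually makes them inefficient, so they are hard to apply to complex optimization problems that are large-scale, high-dimensional, or subject to strict real-time requirements. With the development of parallel and distributed technology, many researchers in China and abroad have begun to study the parallelization of intelligent optimization algorithms. This paper first introduces the basic concepts of parallel intelligent optimization algorithms. It then surveys several common classes of parallel intelligent optimization algorithms along three dimensions (cooperation mechanism, parallel model, and hardware architecture) and analyzes their strengths and weaknesses in detail. Finally, it offers an outlook on future research in parallel intelligent optimization algorithms.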

4.
The field of parallel metaheuristics is continuously evolving as a result of new technologies and needs that researchers have been encountering. In the last decade, new models of algorithms, new hardware for parallel execution/communication, and new challenges in solving complex problems have been advancing rapidly. Here we summarize the state of the art to help address some of these growing topics. These topics include the utilization of classic parallel models on recent platforms (such as grid/cloud architectures and GPU/APU). However, porting existing algorithms to new hardware is not enough as a scientific goal; researchers are therefore looking for new parallel optimization and learning models that are targeted to these new architectures. Parallel metaheuristics have also been applied in recent years to new problem domains, such as dynamic optimization and multiobjective problem solving. In this article, we review these recent research areas in connection with parallel metaheuristics, and we identify future trends and possible open research lines for groups and PhD students.

5.
A new method of classification for numerical stability of parallel algorithms is proposed based on the theoretical foundation of forward error analysis. It partitions the algorithms according to their asymptotic stability, a measure introduced to relate the limiting behavior of the stability to the size of the problem. Using this method, the stability aspect of the pipelined solution technique for first-order and second-order linear recurrences (the core of a tridiagonal linear equation solver) is studied. In particular, it shows that the pipelined solution method for first-order linear recurrences has the same degree of stability as the commonly used sequential evaluation algorithms. The stability problems of sequential and pipelined solution methods for second-order linear recurrences are also studied.

6.
The complexity of performing matrix computations, such as solving a linear system, inverting a nonsingular matrix or computing its rank, has received a lot of attention by both the theory and the scientific computing communities. In this paper we address some “nonclassical” matrix problems that find extensive applications, notably in control theory. More precisely, we study the matrix equations $AX + XA^T = C$ and $AX - XB = C$, the “inverse” of the eigenvalue problem (called pole assignment), and the problem of testing whether the matrix $[B \;\; AB \;\; \cdots \;\; A^{n-1}B]$ has full row rank. For these problems we show two kinds of PRAM algorithms: on one side very fast, i.e. polylog time, algorithms and on the other side almost linear time and processor efficient algorithms. In the latter case, the algorithms rely on basic matrix computations that can be performed efficiently also on realistic machine models.

7.
Probability-one homotopy methods are a class of methods for solving nonlinear systems of equations that are globally convergent from an arbitrary starting point. The essence of all such algorithms is the construction of an appropriate homotopy map $\rho_a(\lambda, x)$ and subsequent tracking of some smooth curve $\gamma$ in the zero set of the homotopy map. Tracking a homotopy curve involves finding the unit tangent vector at different points along the zero curve, which amounts to calculating the kernel of the $n \times (n + 1)$ Jacobian matrix $D\rho_a(\lambda, x)$. While computing the tangent vector is just one part of the curve tracking algorithm, it can require a significant percentage of the total tracking time. This note presents computational results showing the performance of several different parallel orthogonal factorization/triangular system solving algorithms for the tangent vector computation on a hypercube.

8.
Mackworth and Freuder have analyzed the time complexity of several constraint satisfaction algorithms (1). Mohr and Henderson have given new algorithms, AC-4 and PC-3, for arc and path consistency, respectively, and have shown that the arc consistency algorithm is optimal in time complexity and of the same order space complexity as the earlier algorithms (2). In this paper, we give parallel algorithms for solving node and arc consistency. We show that any parallel algorithm for enforcing arc consistency in the worst case must have O(na) sequential steps, where n is the number of nodes and a is the number of labels per node. We give several parallel algorithms to do arc consistency. It is also shown that they all have optimal time complexity. The results of running the parallel algorithms on a BBN Butterfly multiprocessor are also presented. This work was partially supported by NSF Grants MCS-8221750, DCR-8506393, and DMC-8502115.

9.
That the influence of the PRAM model is ubiquitous in parallel algorithm design is as clear as the fact that it is technologically infeasible for the foreseeable future. The current generation of parallel hardware prominently features distributed memory and high-performance interconnection networks, very much the antithesis of the shared memory required for the PRAM model. It has been shown that, in spite of communication costs, for some problems very fast parallel algorithms are available for distributed-memory machines, from embarrassingly parallel problems to sorting and numerical analysis. In contrast it is known that for other classes of problem PRAM-style shared-memory simulation on a distributed-memory machine can, in theory, produce solutions of comparable performance to the best possible for such architectures. The Bulk Synchronous Parallel (BSP) model accurately represents most parallel machines, theoretical and actual, in an execution and cost model. We introduce a scalable portable PRAM realization appropriate for BSP computers and a methodology for usage. Our system is fast and built upon the familiar sequential C++ coupled with the new standard BSP library of parallel computation and communication primitives. It is portable to and predictable on a vast number of parallel computers including workstation clusters, a 256-processor Cray T3D, an 8-node IBM SP/2 and a 4-node shared-memory SGI Power Challenge machine. Our approach achieves simplicity of programming over direct-mode BSP programming for reasonable overhead cost. We objectively compare optimized BSP and PRAM algorithms implemented with our C++ PRAM library and provide encouraging experimental results for our new style of programming. Copyright © 2000 John Wiley & Sons, Ltd.

10.
In this paper, we propose some algorithms to solve the topological ordering problem, the breadth-first search problem and the connected component problem under the broadcast communication model. The basic idea of our algorithms is to divide a graph into several layers. Only after all vertices in one layer are processed do we begin to process the vertices in another layer. Thus, the number of broadcast conflicts is reduced. We also propose a randomized conflict resolution scheme to resolve conflicts. We show that the average time complexity of our algorithms is Θ(n), where n is the number of available processors and also the number of vertices in the graph.
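
The layer-by-layer idea described in this abstract can be illustrated with a small sequential sketch in Python. This is not the authors' broadcast-model algorithm: it does not model broadcast conflicts or the randomized conflict resolution scheme, and the function name `bfs_layers` and the adjacency-dictionary input are assumptions made only for illustration. It shows the one property the abstract states, namely that every vertex of one layer is processed before any vertex of the next.

```python
def bfs_layers(adj, source):
    """Return the BFS layers reachable from `source`.

    `adj` maps each vertex to a list of its neighbours (hypothetical input
    format). Layer k holds the vertices at distance k from `source`.
    """
    seen = {source}
    layers = []
    frontier = [source]
    while frontier:
        layers.append(frontier)
        next_frontier = []
        # The whole current layer is finished before the next one starts.
        for u in frontier:
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return layers


if __name__ == "__main__":
    graph = {0: [1, 2], 1: [3], 2: [3], 3: [], 4: []}
    print(bfs_layers(graph, 0))
```

In a parallel setting each vertex of the current layer would be handled by its own processor; the loop over `frontier` stands in for that step.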

11.
Threads provides a mechanism for simulating the execution of parallel algorithms on a simplified model of a shared-memory multiprocessor. The algorithms can be expressed in a high-level block-structured language, which supports multiple threads of execution within a common body of program code. Results show an ability to achieve good speedup for small problems using algorithms derived by simple modifications of sequential algorithms. In addition, a sibling thread synchronisation feature provides the basis for the synchronous execution of threads. k-parallel algorithms, tailored to the machine size and implemented as synchronously executing iterations, can provide near linear speedup as the problem size is increased. The techniques described in this paper seem to promise an effective synchronous execution mode for shared-memory MIMD architectures.

12.
We substantially improve the known algorithms for approximating all the complex zeros of an $n$th degree polynomial $p(x)$. Our new algorithms save both Boolean and arithmetic sequential time, versus the previous best algorithms of Schönhage [1], Pan [2], and Neff and Reif [3]. In parallel (NC) implementation, we dramatically decrease the number of processors, versus the parallel algorithm of Neff [4], which was the only NC algorithm known for this problem so far. Specifically, under the simple normalization assumption that the variable $x$ has been scaled so as to confine the zeros of $p(x)$ to the unit disc $\{x : |x| \le 1\}$, our algorithms (which promise to be practically effective) approximate all the zeros of $p(x)$ within the absolute error bound $2^{-b}$, by using on the order of $n$ arithmetic operations and on the order of $(b + n)n^2$ Boolean (bitwise) operations (in both cases up to within polylogarithmic factors). The algorithms allow their optimal (work preserving) NC parallelization, so that they can be implemented by using polylogarithmic time and on the order of $n$ arithmetic processors or $(b + n)n^2$ Boolean processors. All the cited bounds on the computational complexity are within polylogarithmic factors from the optimum (in terms of $n$ and $b$) under both arithmetic and Boolean models of computation (in the Boolean case, under the additional (realistic) assumption that $n = O(b)$).

13.
Multigrid methods are powerful techniques to accelerate the solution of computationally-intensive problems arising in a broad range of applications. Used in conjunction with iterative processes for solving partial differential equations, multigrid methods speed up iterative methods by moving the computation from the original mesh covering the problem domain through a series of coarser meshes. But this hierarchical structure leaves domain-parallel versions of the standard multigrid algorithms with a deficiency of parallelism on coarser grids. To compensate, several parallel multigrid strategies with more parallelism, but also more work, have been designed. We examine these parallel strategies and compare them to simpler standard algorithms to try to determine which techniques are more efficient and practical. We consider three parallel multigrid strategies: (1) domain-parallel versions of the standard V-cycle and F-cycle algorithms; (2) a multiple coarse grid algorithm, proposed by Fredrickson and McBryan, which generates several coarse grids for each fine grid; and (3) two Rosendale algorithms, which allow computation on all grids simultaneously. We study an elliptic model problem on simple domains, discretized with finite difference techniques on block-structured meshes in two or three dimensions with up to $10^6$ or $10^9$ points, respectively. We analyze performance using three models of parallel computation: the PRAM and two bridging models. The bridging models reflect the salient characteristics of two kinds of parallel computers: SIMD fine-grain computers, which contain a large number of small (bitserial) processors, and SPMD medium-grain computers, which have a more modest number of powerful (single chip) processors. Our analysis suggests that the standard algorithms are substantially more efficient than algorithms utilizing either parallel strategy. Both parallel strategies need too much extra work to compensate for their extra parallelism. They require a highly impractical number of processors to be competitive with simpler, standard algorithms. The analysis also suggests that the F-cycle, with the appropriate optimization techniques, is more efficient than the V-cycle under a broad range of problem, implementation, and machine characteristics, despite the fact that it exhibits even less parallelism than the V-cycle. Research at Princeton University partially supported by the National Science Foundation, Grant No. CCR-8920505, and the Office of Naval Research, Contract No. N0014-91-J-1463.

14.
A parallel two-list algorithm for the knapsack problem   Cited by: 10 (self-citations: 0, citations by others: 10)
An $n$-element knapsack problem has $2^n$ possible solutions to search over, so the task can be accomplished in $2^n$ trials if an exhaustive search is used. Because solving the knapsack problem takes exponential time, the problem is considered to be very hard. In the past decade, much effort has been made to find techniques which could lead to practical algorithms with reasonable running time. In 1994, Chang et al. proposed a brilliant parallel algorithm, which needs $O(2^{n/8})$ processors to solve the knapsack problem in $O(2^{n/2})$ time; that is, the cost of Chang et al.'s parallel algorithm is $O(2^{5n/8})$. In this paper, we propose a parallel algorithm that improves on Chang et al.'s parallel algorithm by reducing the time complexity to $O(2^{3n/8})$ with the same $O(2^{n/8})$ processors. Thus, the proposed parallel algorithm has a cost of $O(2^{n/2})$, which improves on previous results in the literature. We believe that the proposed parallel algorithm is pragmatically feasible now that multiprocessor systems are becoming more and more popular.
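
For orientation, the sequential two-list idea that the title refers to can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the parallel algorithm of this paper or of Chang et al.: it assumes the subset-sum form of the knapsack problem (find items whose weights add up exactly to a target), and the function names `subset_sums` and `two_list_knapsack` are hypothetical. Splitting the items into two halves gives two lists of $2^{n/2}$ subset sums each, which are then scanned from opposite ends.

```python
def subset_sums(weights):
    """Return (sum, bitmask) for every subset of `weights` (2**len entries)."""
    sums = [(0, 0)]
    for i, w in enumerate(weights):
        # Each existing subset is kept, and also extended with item i.
        sums += [(s + w, mask | (1 << i)) for s, mask in sums]
    return sums


def two_list_knapsack(weights, target):
    """Return indices of items summing to `target`, or None if none exist."""
    half = len(weights) // 2
    left = sorted(subset_sums(weights[:half]))                  # ascending
    right = sorted(subset_sums(weights[half:]), reverse=True)   # descending
    i = j = 0
    while i < len(left) and j < len(right):
        total = left[i][0] + right[j][0]
        if total == target:
            mask = left[i][1] | (right[j][1] << half)
            return [k for k in range(len(weights)) if (mask >> k) & 1]
        if total < target:
            i += 1   # need a larger sum from the ascending list
        else:
            j += 1   # need a smaller sum from the descending list
    return None


if __name__ == "__main__":
    print(two_list_knapsack([3, 34, 4, 12, 5, 2], 9))
```

The sketch does on the order of $2^{n/2}$ work after sorting, compared with $2^n$ for exhaustive search; the parallel algorithms discussed in the abstract distribute the generation and search of the two lists across processors.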

15.
We consider a range of single machine and identical parallel machine pre-emptive scheduling models with controllable processing times. For each model we study a single criterion problem to minimize the compression cost of the processing times subject to the constraint that all due dates should be met. We demonstrate that each single criterion problem can be formulated in terms of minimizing a linear function over a polymatroid, and this justifies the greedy approach to its solution. A unified technique allows us to develop fast algorithms for solving both single criterion problems and bicriteria counterparts.

16.
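Research on implementation techniques for parallel databases based on a cluster architecture   Cited by: 1 (self-citations: 1, citations by others: 0)
After summarizing existing implementation models for parallel databases, we implemented a prototype parallel database system based on the "semi-rewrite transformation" model [1]. Through experimental analysis of key operations such as data partitioning/repartitioning, parallel selection, parallel sorting, and parallel join, we identify the shortcomings of the "semi-rewrite transformation" model and propose an improved hybrid model. In theory, for implementing a parallel database system on a cluster architecture, this hybrid model has advantages over any single model.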

17.
Rapid advances in semiconductor technology have made it possible to build massively parallel processors. In addition, optical 3D storage and optical interconnections open a new opportunity due to inherent massive parallelism and non-interference of light beams. The approaches used in current parallel database research cannot take advantage of massive parallelism which can be provided by the emerging technologies, due to speedup and scaleup limitations.

In this paper, we present a computational paradigm for database machines which takes advantage of the opening opportunity for massive parallelism and discuss the validity and feasibility of the paradigm. The approach we take is based on associative computing and fine grained data parallelism which allow unlimited speedup and scaleup. Additionally, an asymptotically fast data-parallel join algorithm, which can efficiently deal with the joins in which multiple relations share a common join field, is presented. The algorithm is based on parallel sorting and parallel binary search, and performs a multiway join in $O(s + \sum \log r)$ where $s$ is the cost of sorting an intermediate relation and $r$ is the size of an input relation. The cost $s$ of sorting is kept to a minimum by the algorithm.
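
The combination of sorting and binary search mentioned in this abstract can be illustrated with a small sequential Python sketch. It is only an assumption-laden illustration: the authors' data-parallel, associative algorithm and its $O(s + \sum \log r)$ bound are not reproduced here, rows are assumed to be tuples whose first field is the shared join key, and the names `join_on_key` and `multiway_join` are hypothetical.

```python
from bisect import bisect_left, bisect_right


def join_on_key(left_rows, right_rows):
    """Equi-join two lists of tuples on their first field.

    The right relation is sorted once; each left row then locates its
    matches with two binary searches.
    """
    right_sorted = sorted(right_rows, key=lambda row: row[0])
    keys = [row[0] for row in right_sorted]
    joined = []
    for row in left_rows:
        lo = bisect_left(keys, row[0])
        hi = bisect_right(keys, row[0])
        for match in right_sorted[lo:hi]:
            joined.append(row + match[1:])   # keep the key only once
    return joined


def multiway_join(relations):
    """Join several relations that all share the same join field."""
    result = relations[0]
    for relation in relations[1:]:
        result = join_on_key(result, relation)
    return result


if __name__ == "__main__":
    r1 = [(1, "a"), (2, "b")]
    r2 = [(2, "x"), (1, "y"), (3, "z")]
    r3 = [(1, "p"), (1, "q")]
    print(multiway_join([r1, r2, r3]))
```

In the data-parallel setting described in the abstract, the per-row binary searches would run simultaneously, one per tuple, rather than in a loop.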


18.
In the last two decades several NC algorithms for solving basic linear algebraic problems have appeared in the literature. This interest was clearly motivated by the emergence of a parallel computing technology and by the wide applicability of matrix computations. The traditionally adopted computation model, however, ignores the arithmetic aspects of the applications, and no analysis is currently available demonstrating the concrete feasibility of many of the known fast methods. In this paper we give strong evidence to the contrary, on the sole basis of the issue of robustness, indicating that some theoretically brilliant solutions fail the severe test of the “Engineering of Algorithms.” We perform a comparative analysis of several well-known numerical matrix inversion algorithms under both fixed- and variable-precision models of arithmetic. We show that, for most methods investigated, a typical input leads to poor numerical performance, and that in the exact-arithmetic setting no benefit derives from conditions usually deemed favorable in standard scientific computing. Under these circumstances, the only algorithm admitting sufficiently accurate NC implementations is Newton's iterative method, and the word size required to guarantee worst-case correctness appears to be the critical complexity measure. Our analysis also accounts for the observed instability of the considered superfast methods when implemented with the same floating-point arithmetic that is perfectly adequate for the fixed-precision approach. Received March 28, 1998; revised February 2, 1999, and April 21, 1999.

19.
We consider the calculation, on a local memory parallel computer, of all the zeros of an $n$th degree polynomial $P_n(x)$ which has real coefficients. We describe a generic parallel algorithm, which approximates all the zeros simultaneously, and we give three specific examples of this algorithm which have orders of convergence two, three and four. We report extensive numerical tests of the algorithms; the fourth order algorithm is not robust, with many failures to converge, whereas the other two algorithms are reliable and display very respectable parallel speedups for higher degree polynomials.
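
The abstract does not name its three methods. As an illustration of what a simultaneous zero-finding iteration looks like, the following Python sketch uses the Durand-Kerner (Weierstrass) iteration, a well-known second-order method that updates all approximations at once. Treating it as representative of the paper's generic algorithm is an assumption, as are the function name `durand_kerner`, the starting points, and the stopping rule.

```python
def durand_kerner(coeffs, iterations=200, tol=1e-12):
    """Approximate all zeros of a polynomial simultaneously.

    `coeffs` lists the coefficients from the highest degree down.
    Returns a list of complex approximations, one per zero.
    """
    n = len(coeffs) - 1
    monic = [c / coeffs[0] for c in coeffs]   # normalise the leading coefficient

    def p(x):
        value = 0
        for c in monic:                       # Horner evaluation
            value = value * x + c
        return value

    # Conventional starting points: powers of a non-real complex number.
    zeros = [(0.4 + 0.9j) ** k for k in range(n)]
    for _ in range(iterations):
        updated = []
        for i, z in enumerate(zeros):
            denom = 1
            for j, w in enumerate(zeros):
                if j != i:
                    denom *= z - w
            # Every update uses only the previous iterates, so the n
            # updates are independent and could run on n processors.
            updated.append(z - p(z) / denom)
        converged = max(abs(a - b) for a, b in zip(updated, zeros)) < tol
        zeros = updated
        if converged:
            break
    return zeros


if __name__ == "__main__":
    # x^3 - 6x^2 + 11x - 6 = (x - 1)(x - 2)(x - 3)
    for z in durand_kerner([1, -6, 11, -6]):
        print(z)
```

Because each approximation is updated from the previous iterate only, the work per step splits naturally across processors, which is the property that makes this family of methods attractive on local memory machines.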

20.
Effective design of parallel matrix multiplication algorithms relies on the consideration of many interdependent issues based on the underlying parallel machine or network upon which such algorithms will be implemented, as well as on the type of methodology utilized by an algorithm. In this paper, we determine the parallel complexity of multiplying two (not necessarily square) matrices on parallel distributed-memory machines and/or networks. In other words, we provide an achievable parallel run-time that cannot be beaten by any algorithm (known or unknown) for solving this problem. In addition, any algorithm that claims to be optimal must attain this run-time. In order to obtain results that are general and useful throughout a span of machines, we base our results on the well-known LogP model. Furthermore, three important criteria must be considered in order to determine the running time of a parallel algorithm; namely, (i) local computational tasks, (ii) the initial data layout, and (iii) the communication schedule. We provide optimality results by first proving general lower bounds on parallel run-time. These lower bounds lead to significant insights on (i)–(iii) above. In particular, we present what types of data layouts and communication schedules are needed in order to obtain optimal run-times. We prove that no one data layout can achieve optimal running times for all cases. Instead, optimal layouts depend on the dimensions of each matrix, and on the number of processors. Lastly, optimal algorithms are provided.
