首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 46 毫秒
1.
We present a systolic algorithm to generate all the n! permutations of n given items. The computational model used is a linear systolic array consisting of n identical PEs. This algorithm requires n! time steps to solve this problem. Since any PE is identical and executes the same program, it is suitable for VLSI implementation. The correctness of the algorithm is proved. We also consider the ranking and unranking functions of permutations in this parallel algorithm  相似文献   

2.
For an arbitrary n × n matrix A and an n × 1 column vector b, we present a systolic algorithm to solve the dense linear equations Ax = b. An important consideration is that the pivot row can be changed during the execution of our systolic algorithm. The computational model consists of n linear systolic arrays. For 1 ≤ in, the ith linear array is responsible to eliminate the ith unknown variable xi of x. This algorithm requires 4n time steps to solve the linear system. The elapsed time unit within a time step is independent of the problem size n. Since the structure of a PE is simple and the same type PE executes the identical instructions, it is very suitable for VLSI implementation. The design process and correctness proof are considered in detail. Moreover, this algorithm can detect whether A is singular or not.  相似文献   

3.
In this paper, we derive time-minimal systolic arrays for Gaussian elimination and the Algebraic Path Problem (APP) that use a minimal number of processors. For a problem of size n, we obtain an execution time T(n) = 3n −1 using A(n) = n2/4+O(n) processors for Gaussian elimination, and T(n) = 5n −2 and A(n) = n3/+O(n) for the APP.  相似文献   

4.
A parallel sorting algorithm using cooperating heaps in a linear array of processors is presented. It can sort a sequence whose length is much larger than the number of processors. Because the output begins one step after all the items have been input, sorting n items requires 2n + 1 steps. Two independent modifications of the algorithm are possible; one tries to reduce the number of processors used, and the other can sort more items on the same array.  相似文献   

5.
With the advent of VLSI it has become possible to map parallel algorithms for compute-bound problems directly on silicon. Systolic architecture is very good candidate for VLSI implementation because of its regular and simple design, and regular communication pattern. In this paper, a systolic algorithm and corresponding systolic architecture, a linear systolic array, for the scanline-based hidden surface removal problem in three-dimensional computer graphics have been proposed. The algorithm is based on the concept of sample spans or intervals. The worst case time taken by the algorithm is O(n), n being the number of segments in a scanline. The time taken by the algorithm for a given scene depends on the scene itself, and on an average considerable improvement over the worst case behaviour is expected. A pipeline scheme for handling the I/O process has also been proposed which is suitable for VLSI implementation of the algorithm.  相似文献   

6.
The paper presents an optimal systolic array architecture for rapid solution of dense systems of linear equations. The array solves a system of size n×n in 4n + 1 time units including I/0 time. Data communications are strictly local and the processing elements (PEs) are simple. The complete three-phase solution algorithm is executed on a single array, employing about 3n2/2 PEs without any need for costly inter-phase I/0. Due to a novel data steering mechanism, the three algorithmic phases are maximally overlapped. Design optimality is established using systolic precedence diagrams. It is also shown that merging the functions of two adjacent PEs into a single PE is possible resulting in maximal PE utilization. An interesting result regarding cascading phase-optimal arrays is obtained.  相似文献   

7.
A systolic array is presented to improve numerical approximations to integrals using Richardson's extrapolation procedure in the form of Romberg integration. Two designs are presented, the first is an intuitive linear systolic array, the second, a systolic ring using approximately 1/3 of the cells of the first array. Both systolic arrays have a computation time of 3n cycles, which is a significant improvement on the O(n2) steps required to construct the extrapolation table sequentially.  相似文献   

8.
A parallel algorithm for generating all combinations of m items out of n given items in lexicographic order is presented. The computational model is a linear systolic array consisting of m identical processing elements. It takes (mn) time-steps to generate all the (mn) combinations. Since any processing element is identical and executes the same procedure, it is suitable for VLSI implementation. Based on mathematical induction, such algorithm is proved to be correct.  相似文献   

9.
Two parallel algorithms for finding minimum spanning forest (MSF) of a weighted undirected graph on hypercube computers, consisting of a fixed number of processors, are presented. One algorithm is suited for sparse graphs, the other for dense graphs. Our design strategy is based on successive elimination of non-MSF edges. The input graph is partitioned equally among different processors, which then repeatedly eliminate non-MSF edges and merge results to gradually construct the desired MSF of the entire graph. Low communication overhead is achieved by restricting the message-flow to between the neighboring processors in the hypercube topology. The correctness of our approach is due to a theorem which states that with total-ordered edges, if an edge of an arbitrary subgraph does not belong to its MSF, then it does not belong to the MSF of the entire graph. For a graph of n vertices and m edges, our first algorithm finds an MSF in O(m log m)/p) time using p processors for p ≤ (mlog m)/n(1+log(m/n)). The second algorithm, efficient for dense graphs, requires O(n2/p) time for pn/log n.  相似文献   

10.
《Parallel Computing》1990,15(1-3):133-145
This paper describes a parallel algorithm for the LU decomposition of band matrices using Gaussian elimination. The matrix dimension is n × n with 2r−1 diagonals. In the case when 1 r 2 p an optimal number of the processors, , is determined according to the equation . When 2 p r n a number of processors, p, statged by Veldhorst is adopted (see [7]). For band matrix with 2r-1 diagonals (1 r 2p) the task scheduling procedure with the aim to obtain maximal parallelism in system operation, i.e. good load balancing, is defined. The architecture of the system is of MIMD type. The connection between the processors is realised via a common bus. Communication and synchronization is performed by message passing technique.  相似文献   

11.
In this paper, we propose a new I/O overhead free Givens rotations based parallel algorithm for solving a system of linear equations. The algorithm uses a new technique called two-sided elimination and requires an N×(N+1) mesh-connected processor array to solve N linear equations in (5N-log N-4) time steps. The array is well suited for VLSI implementation as identical processors with simple and regular interconnection pattern are required. We also describe a fault-tolerant scheme based on an algorithm based fault tolerance (ABFT) approach. This scheme has small hardware and time overhead and can tolerate up to N processor failures  相似文献   

12.
Our approach combines the method of inexact steepest descent with the method of contractor directions to obtain an algorithm for solving systems of linear equations. In order to enhance the scope of applicability, we consider an iterative method with variable step-size iterations. We prove the convergence and given an error estimate for our method.

The algorithm is well-suited for parallel computation. In fact, for systems with m equations and n unknowns, each iteration may be computed in parallel time O(log m + log n), on an EREW PRAM with O(mn) processors.  相似文献   


13.
The paper describes the implementation of the Successive Overrelaxation (SOR) method on an asynchronous multiprocessor computer for solving large, linear systems. The parallel algorithm is derived by dividing the serial SOR method into noninterfering tasks which are then combined with an optimal schedule of a feasible number of processors. The important features of the algorithm are: (i) achieves a speedup Sp O(N/3) and an efficiency Ep 2/3 using P = [N/2] processors, where N is the number of the equations, (ii) contains a high level of inherent parallelism, whereas on the other hand, the convergence theory of the parallel SOR method is the same as its sequential counterpart and (iii) may be modified to use block methods in order to minimise the overhead due to communication and synchronisation of the processors.  相似文献   

14.
Parallel algorithms for solving the satisfaction problem of non-trivial functional and multivalued data dependencies (FDs and MVDs) in a relation of N tuples by M processors are developed in this paper. Algorithms performing, in a parallel manner, batch or interactive checking of these data dependencies are also discussed. The M processors are organized as a linear systolic array. The time complexities of the first two algorithms for solving the FD satisfaction problem under M N are both O(N), and that of Algorithm (3) or (4) for solving the FD or MVD satisfaction problem under N M is O(N2/M). The latter complexity reduced to O(N) if N = M and is at least not worse than O(N log N) if N = M (N/log N).  相似文献   

15.
A new tridiagonal Toeplitz linear system (TTLS) solver is proposed. The solver first decomposes an n-dimensional strictly diagonally dominant TTLS equation into a number of m-dimensional subsystems employing a modified Gaussian elimination method. An analytic solution of a continued fraction is obtained to derive the solver. The solver based on the modified Gaussian elimination method fully exploits parallelism. Computation and communication complexities of the proposed algorithm are all shown to be O(n/m).  相似文献   

16.
A longest common subsequence (LCS) of two strings is a common subsequence of the two strings of maximal length. The LCS problem is to find an LCS of two given strings and the length of the LCS (LLCS). In this paper, a fast linear systolic algorithm that improves on previous systolic algorithms for solving the LCS problem is presented. For two given strings of length m and n, where m n, the LLCS and an LCS can be found in m + 2n – 1 time steps. This algorithm achieves the tight lower bound of the time complexity under the situation where symbols are input sequentially to a linear array of n processors. The systolic algorithm can be modified to take only m + n steps on multicomputers by using the scatter operation.  相似文献   

17.
A systolic array for the solution of the assignment problem is presented. The algorithm requires O(n2) time on an orthogonally connected array of (n + 2) * (n + 2) cells consisting of simple adders and control logic. The design is area efficient and incorporates the new concept of a Systolic Control Ring (SCR) to generate the necessary systolic wavefronts in any orientation within the design, while special cells are positioned only on the periphery of the design. The design was simulated and tested by an OCCAM program.  相似文献   

18.
A parallel two-list algorithm for the knapsack problem   总被引:10,自引:0,他引:10  
An n-element knapsack problem has 2n possible solutions to search over, so a task which can be accomplished in 2″ trials if an exhaustive search is used. Due to the exponential time in solving the knapsack problem, the problem is considered to be very hard. In the past decade, much effort has been done in order to find techniques which could lead to practical algorithms with reasonable running time. In 1994, Chang et al. proposed a brilliant parallel algorithm, which needs O(2n/8) processors to solve the knapsack problem in O(2n/2) time; that is, the cost of Chang et al.'s parallel algorithm is O(25n/8). In this paper, we propose a parallel algorithm to improve Chang et al.'s parallel algorithm by reducing the time complexity to be O(23n/8) under the same O(2n/8) processors available. Thus, the proposed parallel algorithm has a cost of O(2n/2). It is an improvement over previous literature. We believe that the proposed parallel algorithm is pragmatically feasible at the moment when multiprocessor systems become more and more popular.  相似文献   

19.
We first show how to transform the solution of an n × n tridiagonal system into suffix computations of continued fractions. Then a parallel substitution scheme is introduced to compute the suffix values. The derived parallel algorithm allows the tridiagonal system to be solved in O(log n) time on an unshuffle network with Θ(n /log n) processors. It is cost-optimal in the sense that processor number times execution time is minimized. Our solver is conceptually simple and easy for implementation.  相似文献   

20.
We substantially improve the known algorithms for approximating all the complex zeros of an nth degree polynomial p(x). Our new algorithms save both Boolean and arithmetic sequential time, versus the previous best algorithms of Schönhage [1], Pan [2], and Neff and Reif [3]. In parallel (NC) implementation, we dramatically decrease the number of processors, versus the parallel algorithm of Neff [4], which was the only NC algorithm known for this problem so far. Specifically, under the simple normalization assumption that the variable x has been scaled so as to confine the zeros of p(x) to the unit disc x : |x| ≤ 1, our algorithms (which promise to be practically effective) approximate all the zeros of p(x) within the absolute error bound 2b, by using order of n arithmetic operations and order of (b + n)n2 Boolean (bitwise) operations (in both cases up to within polylogarithmic factors). The algorithms allow their optimal (work preserving) NC parallelization, so that they can be implemented by using polylogarithmic time and the orders of n arithmetic processors or (b + n)n2 Boolean processors. All the cited bounds on the computational complexity are within polylogarithmic factors from the optimum (in terms of n and b) under both arithmetic and Boolean models of computation (in the Boolean case, under the additional (realistic) assumption that n = O(b)).  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号