
1.

Parallel generation of permutations on systolic arrays

Chau-Jy Lin 《Parallel Computing》1990,15(1-3):267-276

We present a systolic algorithm to generate all the n! permutations of n given items. The computational model used is a linear systolic array consisting of n identical PEs. This algorithm requires n! time steps to solve this problem. Since every PE is identical and executes the same program, it is suitable for VLSI implementation. The correctness of the algorithm is proved. We also consider the ranking and unranking functions of permutations in this parallel algorithm.
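The ranking and unranking functions mentioned in the abstract above can be illustrated with a minimal sequential sketch. It assumes lexicographic order on permutations of 0..n-1; the abstract does not state which order the paper uses, and the names `rank` and `unrank` are hypothetical. This is not the systolic algorithm itself.

```python
from math import factorial

def rank(perm):
    """Lexicographic rank of a permutation of 0..n-1 (assumed ordering)."""
    n = len(perm)
    r = 0
    for i, p in enumerate(perm):
        # Count later entries smaller than p (the Lehmer code digit).
        smaller = sum(1 for q in perm[i + 1:] if q < p)
        r += smaller * factorial(n - 1 - i)
    return r

def unrank(r, n):
    """Inverse of rank: the permutation of 0..n-1 with lexicographic rank r."""
    items = list(range(n))
    perm = []
    for i in range(n, 0, -1):
        idx, r = divmod(r, factorial(i - 1))
        perm.append(items.pop(idx))
    return perm
```

Calling `unrank(r, n)` for r = 0, 1, ..., n! - 1 enumerates all n! permutations, one per step.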

2.

A systolic algorithm for solving dense linear systems

Chau-Jy Lin 《Computers & Mathematics with Applications》1996,32(12):77-91

For an arbitrary n × n matrix A and an n × 1 column vector b, we present a systolic algorithm to solve the dense linear equations Ax = b. An important consideration is that the pivot row can be changed during the execution of our systolic algorithm. The computational model consists of n linear systolic arrays. For 1 ≤ i ≤ n, the i^th linear array is responsible for eliminating the i^th unknown variable x_i of x. This algorithm requires 4n time steps to solve the linear system. The elapsed time unit within a time step is independent of the problem size n. Since the structure of a PE is simple and PEs of the same type execute identical instructions, it is very suitable for VLSI implementation. The design process and correctness proof are considered in detail. Moreover, this algorithm can detect whether A is singular or not.
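As a point of reference for the abstract above, here is a minimal sequential sketch of Gaussian elimination with row pivoting and singularity detection. The function name `solve_dense` and the tolerance `eps` are assumptions made for illustration; the sketch shows the computation being parallelized, not the systolic design.

```python
def solve_dense(A, b, eps=1e-12):
    """Solve Ax = b by Gaussian elimination with partial pivoting.

    Returns the solution as a list, or None if A is (numerically) singular.
    """
    n = len(A)
    # Augmented matrix [A | b], copied so the inputs are not modified.
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for k in range(n):
        # Choose the pivot row: largest absolute value in column k.
        pivot = max(range(k, n), key=lambda i: abs(M[i][k]))
        if abs(M[pivot][k]) < eps:
            return None  # no usable pivot: A is singular
        M[k], M[pivot] = M[pivot], M[k]
        # Eliminate unknown x_k from the rows below.
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x
```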

3.

Spacetime-minimal systolic arrays for Gaussian elimination and the Algebraic path problem

Abdelhamid Benaini Yves Robert 《Parallel Computing》1990,15(1-3):211-225

In this paper, we derive time-minimal systolic arrays for Gaussian elimination and the Algebraic Path Problem (APP) that use a minimal number of processors. For a problem of size n, we obtain an execution time T(n) = 3n − 1 using A(n) = n²/4 + O(n) processors for Gaussian elimination, and T(n) = 5n − 2 and A(n) = n³/+O(n) for the APP.

4.

Parallel sorting with cooperating heaps in a linear array of processors

Yen-Chun Lin Ferng-Ching Lin 《Parallel Computing》1990,16(2-3):273-278

A parallel sorting algorithm using cooperating heaps in a linear array of processors is presented. It can sort a sequence whose length is much larger than the number of processors. Because the output begins one step after all the items have been input, sorting n items requires 2n + 1 steps. Two independent modifications of the algorithm are possible; one tries to reduce the number of processors used, and the other can sort more items on the same array.

5.

A systolic algorithm for hidden surface removal

S. R. Das N. H. Vaidya L. M. Patnaik P. C. Mathias 《Parallel Computing》1990,15(1-3):277-289

With the advent of VLSI it has become possible to map parallel algorithms for compute-bound problems directly onto silicon. Systolic architecture is a very good candidate for VLSI implementation because of its regular and simple design and its regular communication pattern. In this paper, a systolic algorithm and a corresponding systolic architecture, a linear systolic array, are proposed for the scanline-based hidden surface removal problem in three-dimensional computer graphics. The algorithm is based on the concept of sample spans or intervals. The worst-case time taken by the algorithm is O(n), n being the number of segments in a scanline. The time taken by the algorithm for a given scene depends on the scene itself, and on average a considerable improvement over the worst-case behaviour is expected. A pipeline scheme for handling the I/O process, suitable for VLSI implementation of the algorithm, is also proposed.

6.

Solution of dense linear systems on an optimal systolic architecture

Ahmed El-Amawy 《Computers & Electrical Engineering》1987,13(3-4):177-193

The paper presents an optimal systolic array architecture for rapid solution of dense systems of linear equations. The array solves a system of size n × n in 4n + 1 time units including I/O time. Data communications are strictly local and the processing elements (PEs) are simple. The complete three-phase solution algorithm is executed on a single array, employing about 3n²/2 PEs without any need for costly inter-phase I/O. Due to a novel data steering mechanism, the three algorithmic phases are maximally overlapped. Design optimality is established using systolic precedence diagrams. It is also shown that the functions of two adjacent PEs can be merged into a single PE, resulting in maximal PE utilization. An interesting result regarding cascading phase-optimal arrays is obtained.

7.

Romberg integration using systolic arrays

D. J. Evans G. M. Megson 《Parallel Computing》1986,3(4):289-304

A systolic array is presented to improve numerical approximations to integrals using Richardson's extrapolation procedure in the form of Romberg integration. Two designs are presented: the first is an intuitive linear systolic array; the second is a systolic ring using approximately 1/3 of the cells of the first array. Both systolic arrays have a computation time of 3n cycles, which is a significant improvement on the O(n²) steps required to construct the extrapolation table sequentially.
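The sequential construction of the extrapolation table that the abstract above compares against can be sketched as follows. This is the textbook Romberg scheme, not the systolic designs; the function name `romberg` and its argument `levels` (the number of table rows, n in the abstract) are illustrative assumptions.

```python
def romberg(f, a, b, levels):
    """Romberg integration of f over [a, b] with a levels x levels table."""
    R = [[0.0] * levels for _ in range(levels)]
    h = b - a
    R[0][0] = 0.5 * h * (f(a) + f(b))  # one-interval trapezoidal rule
    for i in range(1, levels):
        h /= 2
        # Refine the trapezoidal rule using only the new midpoints.
        new_points = sum(f(a + (2 * k - 1) * h)
                         for k in range(1, 2 ** (i - 1) + 1))
        R[i][0] = 0.5 * R[i - 1][0] + h * new_points
        # Richardson extrapolation along row i: O(levels^2) entries in total.
        for j in range(1, i + 1):
            R[i][j] = R[i][j - 1] + (R[i][j - 1] - R[i - 1][j - 1]) / (4 ** j - 1)
    return R[levels - 1][levels - 1]
```

The inner loop over `j` is the part whose O(n²) sequential cost the systolic arrays reduce to a linear number of cycles.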

8.

A parallel algorithm for generating combinations

C.-J. Lin 《Computers & Mathematics with Applications》1989,17(12):1523-1533

A parallel algorithm for generating all combinations of m items out of n given items in lexicographic order is presented. The computational model is a linear systolic array consisting of m identical processing elements. It takes C(n, m) time steps to generate all the C(n, m) combinations. Since every processing element is identical and executes the same procedure, it is suitable for VLSI implementation. The algorithm is proved correct by mathematical induction.
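Generating combinations in lexicographic order, one per step, can be sketched sequentially with a successor function. The code below is a standard textbook approach under the assumption that the items are 0..n-1; the names `next_combination` and `all_combinations` are hypothetical, and this is not the paper's systolic procedure.

```python
def next_combination(c, n):
    """Lexicographic successor of the sorted m-subset c of 0..n-1, or None."""
    m = len(c)
    c = list(c)
    # Find the rightmost position that can still be increased.
    i = m - 1
    while i >= 0 and c[i] == n - m + i:
        i -= 1
    if i < 0:
        return None  # c was the last combination
    c[i] += 1
    # Reset every position to the right to the smallest legal values.
    for j in range(i + 1, m):
        c[j] = c[j - 1] + 1
    return c

def all_combinations(n, m):
    """All m-subsets of 0..n-1 in lexicographic order."""
    out = []
    c = list(range(m))
    while c is not None:
        out.append(tuple(c))
        c = next_combination(c, n)
    return out
```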

9.

Two minimum spanning forest algorithms on fixed-size hypercube computers

Sajal K. Das Narsingh Deo Sushil Prasad 《Parallel Computing》1990,15(1-3):179-187

Two parallel algorithms for finding a minimum spanning forest (MSF) of a weighted undirected graph on hypercube computers, consisting of a fixed number of processors, are presented. One algorithm is suited for sparse graphs, the other for dense graphs. Our design strategy is based on successive elimination of non-MSF edges. The input graph is partitioned equally among different processors, which then repeatedly eliminate non-MSF edges and merge results to gradually construct the desired MSF of the entire graph. Low communication overhead is achieved by restricting the message flow to neighboring processors in the hypercube topology. The correctness of our approach follows from a theorem which states that, with totally ordered edges, if an edge of an arbitrary subgraph does not belong to the subgraph's MSF, then it does not belong to the MSF of the entire graph. For a graph of n vertices and m edges, our first algorithm finds an MSF in O((m log m)/p) time using p processors for p ≤ (m log m)/(n(1 + log(m/n))). The second algorithm, efficient for dense graphs, requires O(n²/p) time for p ≤ n/log n.
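The elimination theorem in the abstract above can be illustrated with a small sequential sketch: split the edges into p parts, keep only each part's MSF edges, then compute the MSF of the survivors. Kruskal's algorithm is swapped in here as the local MSF routine, and the round-robin partition stands in for the hypercube layout; both are assumptions, as are the function names.

```python
def msf(n, edges):
    """Kruskal's algorithm. Edges are (weight, u, v) tuples; sorting the
    tuples gives the total order on edges that the theorem requires."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    forest = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            forest.append((w, u, v))
    return forest

def msf_by_parts(n, edges, p):
    """Discard non-MSF edges within each of p parts, then merge."""
    parts = [edges[i::p] for i in range(p)]  # hypothetical partition
    survivors = [e for part in parts for e in msf(n, part)]
    return msf(n, survivors)
```

Because an edge rejected inside a part can never belong to the global MSF, `msf_by_parts` returns the same forest as `msf` on the full edge list.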

10.

An optimal algorithm for Gaussian elimination of band matrices on an MIMD computer

《Parallel Computing》1990,15(1-3):133-145

This paper describes a parallel algorithm for the LU decomposition of band matrices using Gaussian elimination. The matrix dimension is n × n with 2r−1 diagonals. In the case when 1 r 2 p an optimal number of the processors, , is determined according to the equation . When 2 p r n a number of processors, p, stated by Veldhorst is adopted (see [7]). For a band matrix with 2r−1 diagonals (1 r 2p), a task scheduling procedure is defined with the aim of obtaining maximal parallelism in system operation, i.e. good load balancing. The architecture of the system is of MIMD type. The processors are connected via a common bus. Communication and synchronization are performed by message passing.

11.

A new algorithm based on Givens rotations for solving linear equations on fault-tolerant mesh-connected processors

Murthy K.N.B. Bhuvaneswari K. Ram Murthy C.S. 《Parallel and Distributed Systems, IEEE Transactions on》1998,9(8):825-832

In this paper, we propose a new I/O-overhead-free parallel algorithm, based on Givens rotations, for solving a system of linear equations. The algorithm uses a new technique called two-sided elimination and requires an N × (N+1) mesh-connected processor array to solve N linear equations in (5N − log N − 4) time steps. The array is well suited for VLSI implementation, as identical processors with a simple and regular interconnection pattern are required. We also describe a fault-tolerant scheme based on an algorithm-based fault tolerance (ABFT) approach. This scheme has small hardware and time overhead and can tolerate up to N processor failures.

12.

A method of inexact steepest descent for systems of linear equations

T. Altman 《Computers & Mathematics with Applications》1990,19(12):65-69

Our approach combines the method of inexact steepest descent with the method of contractor directions to obtain an algorithm for solving systems of linear equations. In order to enhance the scope of applicability, we consider an iterative method with variable step-size iterations. We prove the convergence and give an error estimate for our method.

The algorithm is well-suited for parallel computation. In fact, for systems with m equations and n unknowns, each iteration may be computed in parallel time O(log m + log n), on an EREW PRAM with O(mn) processors.

13.

Scheduling parallel iterative methods on multiprocessor systems

Nikolaos M. Missirlis 《Parallel Computing》1987,5(3):295-302

The paper describes the implementation of the Successive Overrelaxation (SOR) method on an asynchronous multiprocessor computer for solving large linear systems. The parallel algorithm is derived by dividing the serial SOR method into noninterfering tasks, which are then combined with an optimal schedule of a feasible number of processors. The important features of the algorithm are: (i) it achieves a speedup S_p O(N/3) and an efficiency E_p 2/3 using P = [N/2] processors, where N is the number of equations; (ii) it contains a high level of inherent parallelism, while the convergence theory of the parallel SOR method is the same as that of its sequential counterpart; and (iii) it may be modified to use block methods in order to minimise the overhead due to communication and synchronisation of the processors.
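For orientation, the serial SOR method that the abstract above takes as its starting point can be sketched as below. The relaxation factor `omega`, the stopping tolerance, and the function name `sor` are illustrative assumptions; the task decomposition and scheduling described in the paper are not reproduced.

```python
def sor(A, b, omega=1.25, tol=1e-10, max_iter=10000):
    """Serial Successive Overrelaxation for Ax = b, starting from x = 0."""
    n = len(b)
    x = [0.0] * n
    for _ in range(max_iter):
        largest_change = 0.0
        for i in range(n):
            # Gauss-Seidel value for x_i using the most recent neighbours.
            sigma = sum(A[i][j] * x[j] for j in range(n) if j != i)
            gs = (b[i] - sigma) / A[i][i]
            # Overrelax: blend the old value with the Gauss-Seidel value.
            new = (1.0 - omega) * x[i] + omega * gs
            largest_change = max(largest_change, abs(new - x[i]))
            x[i] = new
        if largest_change < tol:
            break
    return x
```

Each update of `x[i]` depends on values already updated in the same sweep, which is the dependency a parallel schedule has to respect.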

14.

Parallel algorithms for solving the satisfaction problem of functional and multivalued data dependencies

Chao-Chih Yang Weicong Shen 《Data & Knowledge Engineering》1989,3(4):323-338

Parallel algorithms for solving the satisfaction problem of non-trivial functional and multivalued data dependencies (FDs and MVDs) in a relation of N tuples by M processors are developed in this paper. Algorithms that check these data dependencies in parallel, in batch or interactive mode, are also discussed. The M processors are organized as a linear systolic array. The time complexities of the first two algorithms for solving the FD satisfaction problem under M N are both O(N), and that of Algorithm (3) or (4) for solving the FD or MVD satisfaction problem under N M is O(N²/M). The latter complexity reduces to O(N) if N = M and is no worse than O(N log N) if N = M (N/log N).

15.

A parallel algorithm solving a tridiagonal Toeplitz linear system

Hyoung Joong Kim Jang Gyu Lee 《Parallel Computing》1990,13(3):289-294

A new tridiagonal Toeplitz linear system (TTLS) solver is proposed. The solver first decomposes an n-dimensional strictly diagonally dominant TTLS equation into a number of m-dimensional subsystems employing a modified Gaussian elimination method. An analytic solution of a continued fraction is obtained to derive the solver. The solver based on the modified Gaussian elimination method fully exploits parallelism. Computation and communication complexities of the proposed algorithm are all shown to be O(n/m).

16.

An Efficient Systolic Algorithm for the Longest Common Subsequence Problem

Lin Yen-Chun Chen Jyh-Chian 《The Journal of supercomputing》1998,12(4):373-385

A longest common subsequence (LCS) of two strings is a common subsequence of the two strings of maximal length. The LCS problem is to find an LCS of two given strings and the length of the LCS (LLCS). In this paper, a fast linear systolic algorithm that improves on previous systolic algorithms for solving the LCS problem is presented. For two given strings of length m and n, where m n, the LLCS and an LCS can be found in m + 2n – 1 time steps. This algorithm achieves the tight lower bound of the time complexity under the situation where symbols are input sequentially to a linear array of n processors. The systolic algorithm can be modified to take only m + n steps on multicomputers by using the scatter operation.
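The two outputs named in the abstract above, the LLCS and one LCS, can be computed sequentially with the classic dynamic-programming table. This is a baseline sketch, not the systolic algorithm; the function name `lcs` is an assumption.

```python
def lcs(a, b):
    """Return (LLCS, one LCS) of strings a and b by dynamic programming."""
    m, n = len(a), len(b)
    # L[i][j] = length of an LCS of a[:i] and b[:j].
    L = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    # Trace back from the bottom-right corner to recover one LCS.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return L[m][n], "".join(reversed(out))
```

The table takes O(mn) sequential steps to fill, which is the cost the linear array reduces to a number of time steps linear in m and n.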

17.

An orthogonal systolic design for the assignment problem

G. M. Megson D. J. Evans 《Parallel Computing》1990,16(2-3):253-267

A systolic array for the solution of the assignment problem is presented. The algorithm requires O(n²) time on an orthogonally connected array of (n + 2) × (n + 2) cells consisting of simple adders and control logic. The design is area efficient and incorporates the new concept of a Systolic Control Ring (SCR) to generate the necessary systolic wavefronts in any orientation within the design, while special cells are positioned only on the periphery of the design. The design was simulated and tested by an OCCAM program.

18.

A parallel two-list algorithm for the knapsack problem (total citations: 10; self-citations: 0; citations by others: 10)

Der-Chyuan Lou Chin-Chen Chang 《Parallel Computing》1997,22(14):1985-1996

An n-element knapsack problem has 2ⁿ possible solutions to search over, so an exhaustive search needs 2ⁿ trials. Because of this exponential time, the problem is considered to be very hard. In the past decade, much effort has been devoted to finding techniques that could lead to practical algorithms with reasonable running time. In 1994, Chang et al. proposed a brilliant parallel algorithm, which needs O(2^(n/8)) processors to solve the knapsack problem in O(2^(n/2)) time; that is, the cost of Chang et al.'s parallel algorithm is O(2^(5n/8)). In this paper, we propose a parallel algorithm that improves on Chang et al.'s by reducing the time complexity to O(2^(3n/8)) with the same O(2^(n/8)) processors. Thus, the proposed parallel algorithm has a cost of O(2^(n/2)). It is an improvement over previous literature. We believe that the proposed parallel algorithm is pragmatically feasible now that multiprocessor systems are becoming more and more popular.
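The sequential two-list idea that parallel two-list algorithms build on can be sketched as follows, assuming the subset-sum form of the knapsack problem (find a subset of the weights that sums exactly to a target). The abstract does not spell out the formulation, so that form, along with the names `subset_sums` and `two_list_knapsack`, is an assumption. The sketch builds two lists of about 2^(n/2) subset sums each and scans them from opposite ends.

```python
def subset_sums(items):
    """All (sum, chosen_indices) pairs for a list of (index, weight) items."""
    sums = [(0, ())]
    for idx, w in items:
        sums += [(s + w, chosen + (idx,)) for s, chosen in sums]
    return sums

def two_list_knapsack(weights, target):
    """Return indices of a subset of weights summing to target, or None."""
    indexed = list(enumerate(weights))
    half = len(indexed) // 2
    A = sorted(subset_sums(indexed[:half]))                # ascending sums
    B = sorted(subset_sums(indexed[half:]), reverse=True)  # descending sums
    i = j = 0
    while i < len(A) and j < len(B):
        total = A[i][0] + B[j][0]
        if total == target:
            return sorted(A[i][1] + B[j][1])
        if total < target:
            i += 1  # need a larger sum from A
        else:
            j += 1  # need a smaller sum from B
    return None
```

The search phase touches each list entry at most once, so the time is dominated by generating and sorting the two lists rather than by 2ⁿ trials.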

19.

A cost-optimal parallel tridiagonal system solver

Ferng-Ching Lin Kuo-Liang Chung 《Parallel Computing》1990,15(1-3):189-199

We first show how to transform the solution of an n × n tridiagonal system into suffix computations of continued fractions. Then a parallel substitution scheme is introduced to compute the suffix values. The derived parallel algorithm allows the tridiagonal system to be solved in O(log n) time on an unshuffle network with Θ(n/log n) processors. It is cost-optimal in the sense that processor number times execution time is minimized. Our solver is conceptually simple and easy to implement.
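To see where continued fractions arise in a tridiagonal solve, consider the standard sequential elimination (the Thomas algorithm), sketched below. Each modified pivot `denom` equals b[i] - a[i]*c[i-1]/(previous pivot), so unrolling the recurrence gives a continued fraction. This is a sequential illustration under the assumption that no pivot vanishes (for example, a diagonally dominant system); the name `thomas` and the array layout are assumptions, and the paper's parallel substitution scheme is not shown.

```python
def thomas(a, b, c, d):
    """Solve a tridiagonal system.

    a: sub-diagonal (a[0] unused), b: main diagonal,
    c: super-diagonal (c[-1] unused), d: right-hand side.
    All four lists have length n. Assumes every pivot is nonzero.
    """
    n = len(b)
    cp = [0.0] * n  # scaled super-diagonal
    dp = [0.0] * n  # scaled right-hand side
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):
        # Modified pivot: one more level of the continued fraction.
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    # Back substitution.
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x
```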

20.

Optimal and nearly optimal algorithms for approximating polynomial zeros

V.Y. Pan 《Computers & Mathematics with Applications》1996,31(12):97-138

We substantially improve the known algorithms for approximating all the complex zeros of an n^th degree polynomial p(x). Our new algorithms save both Boolean and arithmetic sequential time, versus the previous best algorithms of Schönhage [1], Pan [2], and Neff and Reif [3]. In parallel (NC) implementation, we dramatically decrease the number of processors, versus the parallel algorithm of Neff [4], which was the only NC algorithm known for this problem so far. Specifically, under the simple normalization assumption that the variable x has been scaled so as to confine the zeros of p(x) to the unit disc {x : |x| ≤ 1}, our algorithms (which promise to be practically effective) approximate all the zeros of p(x) within the absolute error bound 2^−b, by using order of n arithmetic operations and order of (b + n)n² Boolean (bitwise) operations (in both cases up to within polylogarithmic factors). The algorithms allow optimal (work-preserving) NC parallelization, so that they can be implemented by using polylogarithmic time and order of n arithmetic processors or (b + n)n² Boolean processors. All the cited bounds on the computational complexity are within polylogarithmic factors of the optimum (in terms of n and b) under both arithmetic and Boolean models of computation (in the Boolean case, under the additional (realistic) assumption that n = O(b)).