首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 78 毫秒
1.
A systolic array for the solution of the assignment problem is presented. The algorithm requires O(n2) time on an orthogonally connected array of (n + 2) * (n + 2) cells consisting of simple adders and control logic. The design is area efficient and incorporates the new concept of a Systolic Control Ring (SCR) to generate the necessary systolic wavefronts in any orientation within the design, while special cells are positioned only on the periphery of the design. The design was simulated and tested by an OCCAM program.  相似文献   

2.
For an arbitrary n × n matrix A and an n × 1 column vector b, we present a systolic algorithm to solve the dense linear equations Ax = b. An important consideration is that the pivot row can be changed during the execution of our systolic algorithm. The computational model consists of n linear systolic arrays. For 1 ≤ in, the ith linear array is responsible to eliminate the ith unknown variable xi of x. This algorithm requires 4n time steps to solve the linear system. The elapsed time unit within a time step is independent of the problem size n. Since the structure of a PE is simple and the same type PE executes the identical instructions, it is very suitable for VLSI implementation. The design process and correctness proof are considered in detail. Moreover, this algorithm can detect whether A is singular or not.  相似文献   

3.
A novel algorithm for fast computation of Zernike moments   总被引:7,自引:0,他引:7  
J.  H. Z.  C.  L. M. 《Pattern recognition》2002,35(12):2905-2911
Zernike moments (ZMs) have been successfully used in pattern recognition and image analysis due to their good properties of orthogonality and rotation invariance. However, their computation by a direct method is too expensive, which limits the application of ZMs. In this paper, we present a novel algorithm for fast computation of Zernike moments. By using the recursive property of Zernike polynomials, the inter-relationship of the Zernike moments can be established. As a result, the Zernike moment of order n with repetition m, Znm, can be expressed as a combination of Zn−2,m and Zn−4,m. Based on this relationship, the Zernike moment Znm, for n>m, can be deduced from Zmm. To reduce the computational complexity, we adopt an algorithm known as systolic array for computing these latter moments. Using such a strategy, the multiplication number required in the moment calculation of Zmm can be decreased significantly. Comparison with known methods shows that our algorithm is as accurate as the existing methods, but is more efficient.  相似文献   

4.
A systolic array is presented to improve numerical approximations to integrals using Richardson's extrapolation procedure in the form of Romberg integration. Two designs are presented, the first is an intuitive linear systolic array, the second, a systolic ring using approximately 1/3 of the cells of the first array. Both systolic arrays have a computation time of 3n cycles, which is a significant improvement on the O(n2) steps required to construct the extrapolation table sequentially.  相似文献   

5.
In this paper, we derive time-minimal systolic arrays for Gaussian elimination and the Algebraic Path Problem (APP) that use a minimal number of processors. For a problem of size n, we obtain an execution time T(n) = 3n −1 using A(n) = n2/4+O(n) processors for Gaussian elimination, and T(n) = 5n −2 and A(n) = n3/+O(n) for the APP.  相似文献   

6.
A linear rotation based algorithm is proposed for solving linear system equations, Ax = b. This algorithm modified the conventional Gaussian elimination method and can avoid the problems of numerical singularity and ill condition. In this study, the implementation of a trapezoidal systolic array of n2/2 + n −2 processors as well as a linear array of n processors are accomplished for this algorithm. The trapezoidal systolic array performs the triangularization of a matrix A by using the modified linear rotation algorithm; while the linear array performs the backward substitution for evaluating the solution of x. The computing time for solving a linear equation system will be O(5n) time units. Also an implicit representation of the elimination factor by means of the sign parameter sequence instead of an numerical value is introduced for simplifying the hardware complexity. It is clear that this systolic architecture is simple, uniform, and regular, and therefore well suitable for the implementation of a VLSI chip.  相似文献   

7.
A parallel algorithm for generating all combinations of m items out of n given items in lexicographic order is presented. The computational model is a linear systolic array consisting of m identical processing elements. It takes (mn) time-steps to generate all the (mn) combinations. Since any processing element is identical and executes the same procedure, it is suitable for VLSI implementation. Based on mathematical induction, such algorithm is proved to be correct.  相似文献   

8.
The paper presents an optimal systolic array architecture for rapid solution of dense systems of linear equations. The array solves a system of size n×n in 4n + 1 time units including I/0 time. Data communications are strictly local and the processing elements (PEs) are simple. The complete three-phase solution algorithm is executed on a single array, employing about 3n2/2 PEs without any need for costly inter-phase I/0. Due to a novel data steering mechanism, the three algorithmic phases are maximally overlapped. Design optimality is established using systolic precedence diagrams. It is also shown that merging the functions of two adjacent PEs into a single PE is possible resulting in maximal PE utilization. An interesting result regarding cascading phase-optimal arrays is obtained.  相似文献   

9.
With the advent of VLSI it has become possible to map parallel algorithms for compute-bound problems directly on silicon. Systolic architecture is very good candidate for VLSI implementation because of its regular and simple design, and regular communication pattern. In this paper, a systolic algorithm and corresponding systolic architecture, a linear systolic array, for the scanline-based hidden surface removal problem in three-dimensional computer graphics have been proposed. The algorithm is based on the concept of sample spans or intervals. The worst case time taken by the algorithm is O(n), n being the number of segments in a scanline. The time taken by the algorithm for a given scene depends on the scene itself, and on an average considerable improvement over the worst case behaviour is expected. A pipeline scheme for handling the I/O process has also been proposed which is suitable for VLSI implementation of the algorithm.  相似文献   

10.
We present an optimum bit-parallel/word-sequential systolic convolver. Our design is the best one among the previous many convolvers in the sense that its optimality in time and space performances is simultaneously attained without augmenting any global control, broadcasting, initial-data-preloading, and/or multi-sequential or parallel I/O ports which were allowed in most of the previous designs. As an application of our convolver we give a systolic polynomial divider which can compute the polynomial division in exactly n + O(1) optimum steps on [min(nm, m)/2]+O(1) systolic cells for the division of any degree n polynomial by any degree m polynomial (n m).  相似文献   

11.
We show that the notoriously difficult problem of finding and reporting the smallest number of vertex-disjoint paths that cover the vertices of a graph can be solved time- and work-optimally for cographs. Our result implies that for this class of graphs the task of finding a Hamiltonian path can be solved time- and work-optimally in parallel.

It was open for more than 10 years to find a time- and work-optimal parallel solution for this important problem. Our contribution is to offer an optimal solution to this important problem. We begin by showing that any algorithm that solves an instance of size n of the problem must take Ω(log n) time on the CREW, even if an infinite number of processors are available. We then go on to show that this time lower bound is tight by devising an EREW algorithm that, given an n-vertex cograph G represented by its cotree, finds and reports all the paths in a minimum path cover in O(log n) time using n/log n processors.  相似文献   


12.
In this paper we present an O(log n) time parallel algorithm for arithmetic expression evaluation, on an n × n processor array with reconfigurable bus system, where n is the sum of the number of operators and constants in the expression. The basic technique involved here is leaves-cutting (rake operation), as in the case of PRAM model algorithms available in the literature for this problem. The input to our algorithm is assumed to be the binary tree associated with a given expression (also known as expression tree with n number of nodes). Our algorithm is faster compared to the previous best time for expression evaluation on mesh connected computers which is O(√n).  相似文献   

13.
In this paper we consider the unbounded single machine parallel batch scheduling problem with family jobs and release dates to minimize makespan. We show that this problem is strongly NP-hard, and give an O(n(n/m+1)m) time dynamic programming algorithm and an O(mkk+1P2k−1) time dynamic programming algorithm, where n is the number of jobs, m is the number of families, k is the number of distinct release dates and P is the sum of the processing times of all families. We further give a heuristic with a performance ratio 2. We also give a polynomial-time approximation scheme for the problem.  相似文献   

14.
In this paper, we consider the edge searching and node searching problems on trees. Given a tree, we show a transformation from an optimal node-search strategy to an optimal edge-search strategy. Using our transformation, we simplify a previous linear-time algorithm for determining the edge-search number of a tree, and improve the running time of a previous algorithm for constructing an optimal edge-search strategy of an n-vertex tree from O(nlogn) to O(n). We also improve the running time of a previous algorithm for constructing an optimal min-cut linear layout of an n-vertex tree with the maximum degree 3 from O(nlogn) to O(n).  相似文献   

15.
We first show how to transform the solution of an n × n tridiagonal system into suffix computations of continued fractions. Then a parallel substitution scheme is introduced to compute the suffix values. The derived parallel algorithm allows the tridiagonal system to be solved in O(log n) time on an unshuffle network with Θ(n /log n) processors. It is cost-optimal in the sense that processor number times execution time is minimized. Our solver is conceptually simple and easy for implementation.  相似文献   

16.
Mian  Haibin   《Computer Communications》2007,30(18):3787-3795
Most data structures for packet forwarding are optimized for IPv4, and less capable of handling IPv6 routing tables. Shape-Shifting Trie (SST) is specifically designed to be scalable to IPv6. To reduce the worst-case lookup time that is proportional to the height of SST, it is desirable to construct a minimum-height SST. Breadth-First Pruning (BFP) algorithm takes O(n2) time to construct a minimum-height SST, where n is the number of nodes in the binary trie corresponding to the routing table. In this paper, we propose a Post-Order Minimum-Height Pruning (POMHP) algorithm that takes only O(n) time to construct a minimum-height SST. We further propose nodes merging algorithm to cut down SST size without affecting SST height.  相似文献   

17.
We present an optimal distributed algorithm to record a global state of a distributed system with causally ordered message delivery. The message complexity of our algorithm is O(n) bits where n is the number of processes in the system.  相似文献   

18.
We propose an adaptive finite element algorithm for shells which, in addition to the usual h-p adaption also shows adaptivity with respect to the order n of the dimension reduction. The idea of the algorithm is to adaptively capture and resolve the various length scales that may occur in shells. The algorithm presented in the paper is limited to axisymmetric problems, which reduces the h-p part of the problem to one dimension only. The performance of the algorithm is tested in some example cases where the shell is cylindrical. For comparison, we test the algorithm also when n is limited so that n k, where K = 1, 2 or 3. Choosing k = 2 essentially corresponds to the classical shell models.  相似文献   

19.
In this paper, we study the problem of finding routing algorithms on the multirate rearrangeable Clos networks which use as few number of middle-stage switches as possible. We propose a new routing algorithm called the “grouping algorithm”. This is a simple algorithm which uses fewer middle-stage switches than all known strategies, given that the number of input-stage switches and output-stage switches are relatively small compared to the size of input and output switches. In particular, the grouping algorithm implies that m = 2n+(n−1)/2k is a sufficient number of middle-stage switches for the symmetric three-stage Clos network C(n,m,r) to be multirate rearrangeable, where k is any positive integer and rn/(2k−1).  相似文献   

20.
This paper presents an efficient algorithm for enumerating all minimal a-b separators separating given non-adjacent vertices a and b in an undirected connected simple graph G = (V, E), Our algorithm requires O(n3Rab) time, which improves the known result of O(n4Rab) time for solving this problem, where ¦V¦= n and Rab is the number of minimal a-b separators. The algorithm can be generalized for enumerating all minimal A-B separators that separate non-adjacent vertex sets A, B < V, and it requires O(n2(nnAnb)RAB) time in this case, where na = ¦A¦, nB = ¦B¦ and rAB is the number of all minimal AB separators. Using the algorithm above as a routine, an efficient algorithm for enumerating all minimal separators of G separating G into at least two connected components is constructed. The algorithm runs in time O(n3R+Σ + n4RΣ), which improves the known result of O(n6RΣ) time, where Rσ is the number of all minimal separators of G and RΣR+Σ = ∑1i, vj) ERvivj n − 1)/2 − m)RΣ. Efficient parallelization of these algorithms is also discussed. It is shown that the first algorithm requires at most O((n/log n)Rab) time and the second one runs in time O((n/log n)R+Σ+n log nRΣ) on a CREW PRAM with O(n3) processors.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号