Similar literature
 20 similar documents found (search time: 23 ms)
1.
Using a directed acyclic graph (DAG) model of algorithms, the paper focuses on time-minimal multiprocessor schedules that use as few processors as possible. Such a processor-time-minimal scheduling of an algorithm's DAG is first illustrated using a triangular 2-D directed mesh (representing, for example, an algorithm for solving a triangular system of linear equations). Then, algorithms represented by an n×n×n directed mesh are investigated. This cubical directed mesh is fundamental; it represents the standard algorithm for computing a matrix product as well as many other algorithms. Completion of the cubical mesh requires 3n-2 steps. It is shown that the number of processing elements needed to achieve this time bound is at least ⌈3n²/4⌉. A systolic array for the cubical directed mesh is then presented. It completes the mesh using the minimum number of steps and exactly ⌈3n²/4⌉ processing elements; that is, it is processor-time-minimal. The systolic array's topology is that of a hexagonally shaped, cylindrically connected, 2-D directed mesh.
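The step count and the processor lower bound quoted above can be checked numerically for small n. The sketch below is an illustration, not the paper's construction. It assumes the mesh has nodes (i, j, k) with 0 ≤ i, j, k < n and arcs to (i+1, j, k), (i, j+1, k), and (i, j, k+1); the function names are hypothetical. Under that assumption every node lies on a longest path, so a schedule that finishes in the minimum number of steps must run node (i, j, k) at step i + j + k, and the largest level is a lower bound on the processor count.

```python
from collections import Counter
from itertools import product


def mesh_level_sizes(n):
    """Count nodes of the n x n x n directed mesh by forced step i + j + k."""
    return Counter(i + j + k for i, j, k in product(range(n), repeat=3))


def steps_and_width(n):
    """Return (number of steps, size of the largest level) for the mesh."""
    levels = mesh_level_sizes(n)
    return len(levels), max(levels.values())


def ceil_three_quarters_n_squared(n):
    """Integer ceiling of 3n^2/4."""
    return -(-3 * n * n // 4)


for n in range(1, 7):
    steps, width = steps_and_width(n)
    # Columns: n, steps (expected 3n-2), widest level, ceil(3n^2/4)
    print(n, steps, width, ceil_three_quarters_n_squared(n))
```

For the small cases tried here the widest level coincides with ⌈3n²/4⌉, which matches the bound stated in the abstract.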

2.
Using a directed acyclic graph (DAG) model of algorithms, the authors focus on processor-time-minimal multiprocessor schedules: time-minimal multiprocessor schedules that use as few processors as possible. The Kung, Lo, and Lewis (KLL) algorithm for computing the transitive closure of a relation over a set of n elements requires at least 5n-4 parallel steps. As originally reported, their systolic array comprises n² processing elements. It is shown that any time-minimal multiprocessor schedule of the KLL algorithm's DAG needs at least n²/3 processing elements. Then a processor-time-minimal systolic array realizing the KLL DAG is constructed. Its processing elements are organized as a cylindrically connected 2-D mesh when n ≡ 0 (mod 3). When n ≢ 0 (mod 3), the 2-D mesh is connected as a torus.

3.
An algorithm for convolving a k×k window of weighting coefficients with an n×n image matrix on a pyramid computer of O(n²) processors in time O(log n + k²), excluding the time to load the image matrix, is presented. If k = Ω(√(log n)), which is typical in practice, the algorithm has a processor-time product of O(n²k²), which is optimal with respect to the usual sequential algorithm. A feature of the algorithm is that the mechanism for controlling the transmission and distribution of data in each processor is finite state, independent of the values of n and k. Thus, for convolving two {0, 1}-valued matrices using Boolean operations rather than the typical sum and product operations, the processors of the pyramid computer are finite-state.

4.
A novel discrete relaxation architecture   Total citations: 1, self-citations: 0, citations by others: 1
The discrete relaxation algorithm (DRA) is a computational technique that enforces arc consistency (AC) in a constraint satisfaction problem (CSP). The original sequential AC-1 algorithm suffers from O(n³m³) time complexity, and even the optimal sequential AC-4 algorithm is O(n²m²) for an n-object and m-label DRA problem. Sample problem runs show that these algorithms are all too slow to meet the needs of any useful real-time CSP application. A parallel DRA5 algorithm that reaches a lower bound of O(nm) (where the number of processors is polynomial in the problem size) is given. A fine-grained, massively parallel hardware computer architecture has been designed for the DRA5 algorithm. For practical problems, many orders of magnitude of efficiency improvement can be reached on such a hardware architecture.

5.
Formulas are derived for the exponential of an arbitrary 2×2 matrix in terms of either its eigenvalues or entries. These results are then applied to the second-order mechanical vibration equation with weak or strong damping. Some formulas for the exponential of n×n matrices are given for matrices that satisfy an arbitrary quadratic polynomial. Besides the above-mentioned 2×2 matrices, these results encompass involutory, rank 1, and idempotent matrices. Consideration is then given to n×n matrices that satisfy a special cubic polynomial. These results are applied to the case of a 3×3 skew-symmetric matrix whose exponential represents the constant rotation of a rigid body about a fixed axis.
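As an illustration of the 3×3 skew-symmetric case, the sketch below uses the classical Rodrigues formula, exp(K) = I + (sin θ/θ) K + ((1 − cos θ)/θ²) K², which follows from the cubic identity K³ = −θ²K. Whether the paper states the result in exactly this form is an assumption; the function names and the Taylor-series cross-check are added here for illustration only.

```python
import math


def skew(w):
    """Skew-symmetric matrix K such that K v = w x v."""
    x, y, z = w
    return [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]


def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def exp_skew(w):
    """Closed-form exponential of skew(w) via the Rodrigues formula."""
    theta = math.sqrt(sum(c * c for c in w))
    K = skew(w)
    K2 = matmul(K, K)
    if theta < 1e-12:
        a, b = 1.0, 0.5  # limits of sin(t)/t and (1 - cos t)/t^2 as t -> 0
    else:
        a = math.sin(theta) / theta
        b = (1.0 - math.cos(theta)) / theta ** 2
    return [[(1.0 if i == j else 0.0) + a * K[i][j] + b * K2[i][j]
             for j in range(3)] for i in range(3)]


def exp_series(M, terms=30):
    """Truncated Taylor series of exp(M), used only as a cross-check."""
    result = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in matmul(term, M)]
        result = [[result[i][j] + term[i][j] for j in range(3)]
                  for i in range(3)]
    return result


# A quarter turn about the z axis maps the x axis onto the y axis.
R = exp_skew((0.0, 0.0, math.pi / 2))
print([[round(v, 6) for v in row] for row in R])
```

The closed form needs one square root and two trigonometric evaluations, whereas the series needs a matrix product per term.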

6.
An algorithm intended for software implementation on a programmable systolic/wavefront computer is presented for the computation of a complex-valued frequency-response matrix G. Typically, real-valued state-space model matrices are given and the calculation of G must be performed for a very large number of values of the scalar frequency parameter. The algorithm is an orthogonal version of an algorithm described previously by A.J. Laub (ibid., vol.26, no.4, p.407-8, 1981). The system matrix A is initially reduced to an upper Hessenberg form, which is preserved as the frequency subsequently varies. A systolic QR factorization of a certain complex-valued matrix is then implemented to effect the necessary linear system solution (inversion). The critical computational component is the back solve. This component's process dependency graph is embedded optimally in space and time through the use of a nonlinear space-time transformation. The computational period of the algorithm is O(n), where n is the order of the matrix A.

7.
An O(n²) time serial algorithm is developed for obtaining the medial axis transform (MAT) of an n×n image. An O(log n) time CREW PRAM algorithm and an O(log² n) time SIMD hypercube parallel algorithm for the MAT are also developed. Both of these use O(n²) processors. Two problems associated with the MAT, the area and perimeter reporting problems, are studied. An O(log n) time hypercube algorithm is developed for both of them, where n is the number of squares in the MAT, and the algorithms use O(n²) processors.

8.
Linear matrix equations in the ring of polynomials in n indeterminates (n-D) are studied. General- and minimum-degree solutions are discussed. Simple and constructive, necessary and sufficient solvability conditions are derived. An algorithm to solve the equations with general n-D polynomial matrices is presented. It is based on elementary reductions in a greater ring of polynomials in one indeterminate, having as coefficients polynomial fractions in the other n-1 indeterminates, which makes the use of Euclidean division possible.

9.
A distributed knot detection algorithm for general graphs is presented. The knot detection algorithm uses at most O(n log n + m) messages and O(m + n log n) bits of memory to detect all knots' nodes in the network (where n is the number of nodes and m is the number of links). This is compared to O(n²) messages needed in the best algorithm previously published. The knot detection algorithm makes use of efficient cycle detection and clustering techniques. Various applications for the knot detection algorithm are presented. In particular, its importance to deadlock detection in store-and-forward communication networks and in transaction systems is demonstrated.

10.
An example is provided which illustrates that the new bounds for the trace of the product of an arbitrary n×n real matrix A and an n×n nonnegative definite real symmetric matrix B derived in the above-titled paper (ibid., vol.37, no.2, pp.239-240, Feb. 1992) are not valid.

11.
An adaptive parallel algorithm for inducing a priority queue structure on an n-element array is presented. The algorithm is extended to provide optimal parallel construction algorithms for three other heap-like structures useful in implementing double-ended priority queues, namely min-max heaps, deaps, and min-max-pair heaps. It is shown that an n-element array can be made into a heap, a deap, a min-max heap, or a min-max-pair heap in O(log n + n/p) time using no more than n/log n processors, in the exclusive-read-exclusive-write parallel random-access machine model.

12.
Rotator graphs, a set of directed permutation graphs, are proposed as an alternative to star and pancake graphs. Rotator graphs are defined in a way similar to the recently proposed Faber-Moore graphs. They have smaller diameter, n-1 in a graph with n! vertices, than either the star or pancake graphs or the k-ary n-cubes. A simple optimal routing algorithm is presented for rotator graphs. The n-rotator graphs are defined as a subset of all rotator graphs. The distribution of distances of vertices in the n-rotator graphs is presented, and the average distance between vertices is found. The n-rotator graphs are shown to be optimally fault tolerant and maximally one-step fault diagnosable. The n-rotator graphs are shown to be Hamiltonian, and an algorithm for finding a Hamiltonian circuit in the graphs is given.
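The diameter claim can be checked by brute force for small n. The abstract does not spell out the edge rule, so the sketch below assumes the usual definition of the n-rotator graph: vertices are the permutations of n symbols, and each vertex has an arc to every permutation obtained by rotating its first i symbols left by one position, for i = 2, ..., n. The function names are illustrative.

```python
from collections import deque
from math import factorial


def rotator_neighbors(v):
    """Out-neighbors of v: rotate the first i symbols left by one, i = 2..n."""
    return [v[1:i] + v[:1] + v[i:] for i in range(2, len(v) + 1)]


def reach_and_eccentricity(n):
    """Breadth-first search from the identity permutation.

    Returns (number of vertices reached, largest distance found).
    Under the assumed definition the graph is vertex-symmetric, so this
    eccentricity equals the diameter.
    """
    start = tuple(range(1, n + 1))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for u in rotator_neighbors(v):
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return len(dist), max(dist.values())


for n in range(2, 7):
    reached, ecc = reach_and_eccentricity(n)
    # Columns: n, vertices reached, n!, eccentricity (expected n-1)
    print(n, reached, factorial(n), ecc)
```

The search is exponential in n and is only meant as a sanity check on small instances.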

13.
Memory and processing architecture for 3D voxel-based imagery   Total citations: 1, self-citations: 0, citations by others: 1
A versatile voxel-based architecture for 3-D volume visualization, called the Cube architecture, is introduced. A small-scale prototype of the architecture has been realized in hardware and has been operating in true real-time, faster than the alternative voxel systems. The Cube architecture is centered around a 3-D cubic frame buffer of voxels, and it entertains three processors that access the frame buffer to input sampled and synthetic data, to manipulate the 3-D images, and to project and render them. To cope with the huge quantity of voxels and still perform in real-time, two special features were incorporated within the architecture: a unique skewed memory organization, which permits the retrieval and storage of voxels in parallel, and a multiple-write bus, which speeds up the viewing process. These features allow Cube, for example, to project an image of n³ voxels in O(n² log n) time rather than the conventional O(n³) time.

14.
Most existing methods of mapping algorithms into processor arrays are restricted to the case where n-dimensional algorithms, or algorithms with n nested loops, are mapped into (n-1)-dimensional arrays. However, in practice, it is interesting to map n-dimensional algorithms into (k-1)-dimensional arrays where k<n. A computational conflict occurs if two or more computations of an algorithm are mapped into the same execution time. Based on the Hermite normal form of the mapping matrix, necessary and sufficient conditions are derived to identify mapping without computational conflicts. These conditions are used to find time mappings of n-dimensional algorithms into (k-1)-dimensional arrays, k<n, without computational conflicts. For some applications, the mapping is time-optimal.

15.
Considers the problem of determining whether each point in a polytope of n×n matrices is stable. The approach is to check stability of certain faces of the polytope. For n ≥ 3, the authors show that stability of each point in every (2n-4)-dimensional face guarantees stability of the entire polytope. Furthermore, they prove that, for any k ≤ n², there exists a k-dimensional polytope containing a strictly unstable point and such that all its subpolytopes of dimension min{k-1, 2n-5} are stable.

16.
In an n-dimensional hypercube Q_n with fault set F, |F| < 2n-2, assuming S and D are not isolated, it is shown that there exists a path of length at most their Hamming distance plus 4. An algorithm with complexity O(|F| log n) is given to find such a path. A bound of n+2 is obtained for the diameter of the faulty hypercube Q_n - F when |F| < 2n-2. This improves the previously known bound of n+6 obtained by A.-H. Esfahanian (1989). Worst-case scenarios are constructed to show that these bounds for shortest paths and diameter are tight. It is also shown that when |F| < 2n-2, the diameter bound is reduced to n+1 if every node has at least 2 nonfaulty neighbors and reduced to n if every node has at least 3 nonfaulty neighbors.

17.
A parallel sorting algorithm for sorting n elements evenly distributed over p = 2^d nodes of a d-dimensional hypercube is presented. The average running time of the algorithm is O((n log n)/p + p log² n). The algorithm maintains a perfect load balance in the nodes by determining the (kn/p)th elements (k = 1, ..., p-1) of the final sorted list in advance. These p-1 keys are used to partition the sorted sublists in each node to redistribute data to the nodes to be merged in parallel. The nodes finish the sort with an equal number of elements (n/p) regardless of the data distribution. A parallel selection algorithm for determining the balanced partition keys in O(p log² n) time is presented. The speed of the sorting algorithm is further enhanced by the distance-d communication capability of the iPSC/2 hypercube computer and a novel conflict-free routing algorithm. Experimental results on a 16-node hypercube computer show that the sorting algorithm is competitive with the previous algorithms and faster for skewed data distributions.

18.
Two arrays of numbers sorted in nondecreasing order are given: an array A of size n and an array B of size m, where n<m. It is required to determine, for every element of A, the smallest element of B (if one exists) that is larger than or equal to it. It is shown how to solve this problem on the EREW PRAM (exclusive-read exclusive-write parallel random-access machine) in O(log m log n / log log m) time using n processors. The solution is then extended to the case in which fewer than n processors are available. This yields an EREW PRAM algorithm for the problem whose cost is O(n log m), which is O(m) for n ≤ m/log m. It is shown how the solution obtained leads to an improved parallel merging algorithm.
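For reference, the problem statement itself has a short sequential solution: one binary search in B per element of A, for O(n log m) operations in total. The sketch below is this sequential baseline, not the EREW PRAM algorithm from the paper; the function name and the use of None for "no such element" are choices made here.

```python
from bisect import bisect_left


def smallest_not_less(A, B):
    """For each a in A, return the smallest element of sorted B that is >= a.

    Returns None for elements of A that exceed every element of B.
    """
    result = []
    for a in A:
        i = bisect_left(B, a)  # first index with B[i] >= a
        result.append(B[i] if i < len(B) else None)
    return result


print(smallest_not_less([1, 4, 9], [2, 4, 4, 7, 8]))  # [2, 4, None]
```

Because A is also sorted, the answers are nondecreasing, which is the property that links this problem to merging.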

19.
Let ξ be a random variable over a finite set with an arbitrary probability distribution. Improvements to a fast method of generating sample values for ξ in constant time are suggested. The proposed modification reduces the time required for initialization to O(n). For a simple genetic algorithm, this improvement changes an O(g n ln n) algorithm into an O(g n) algorithm (where g is the number of generations, and n is the population size).
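The abstract does not name the constant-time sampling method. The sketch below assumes it is the alias method, in which each of n table cells stores a threshold and an alias index, and shows one well-known way to fill the tables with two worklists in O(n) time. It is an illustration under that assumption rather than the paper's exact procedure; all names are hypothetical.

```python
import random


def build_alias_tables(weights):
    """Build alias-method tables in O(n) time from nonnegative weights."""
    n = len(weights)
    total = sum(weights)
    scaled = [w * n / total for w in weights]  # mean of scaled values is 1
    prob = [0.0] * n
    alias = list(range(n))
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s = small.pop()
        l = large.pop()
        prob[s] = scaled[s]  # cell s keeps its own mass ...
        alias[s] = l         # ... and borrows the remainder from l
        scaled[l] = scaled[l] + scaled[s] - 1.0
        (small if scaled[l] < 1.0 else large).append(l)
    for i in small + large:  # leftovers are full cells (up to rounding)
        prob[i] = 1.0
    return prob, alias


def sample(prob, alias, rng=random):
    """Draw one index in constant time: pick a cell, then a biased coin."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]


def implied_distribution(prob, alias):
    """Distribution that the tables encode, used to check the construction."""
    n = len(prob)
    out = [0.0] * n
    for i in range(n):
        out[i] += prob[i] / n
        out[alias[i]] += (1.0 - prob[i]) / n
    return out


prob, alias = build_alias_tables([1, 2, 3, 4])
print([round(p, 6) for p in implied_distribution(prob, alias)])
```

In a genetic algorithm with fitness-proportional selection, tables like these would be rebuilt once per generation and then sampled n times, which is consistent with the O(g n) total quoted above.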

20.
Structural controllability of time-invariant and time-varying systems when the input control sequences have a restricted length k is compared. The dimensions of the controllable space coincide in the following three special cases: the input sequences have length k=2; the input sequences have length k=n, where n is the size of the system (i.e., the ultimate controllability is the same in both cases); and for every length of input sequences provided that the system has a single input only. It is proved that there may appear a gap for every input length k such that 2 < k ≤ n/2. The case when n/2<k<n is left open.
