1.

A processor-time-minimal systolic array for cubical mesh algorithms

Cappello P. 《Parallel and Distributed Systems, IEEE Transactions on》1992,3(1):4-13

Using a directed acyclic graph (DAG) model of algorithms, the paper focuses on time-minimal multiprocessor schedules that use as few processors as possible. Such a processor-time-minimal scheduling of an algorithm's DAG is first illustrated using a triangular 2-D directed mesh (representing, for example, an algorithm for solving a triangular system of linear equations). Then, algorithms represented by an n×n×n directed mesh are investigated. This cubical directed mesh is fundamental; it represents the standard algorithm for computing a matrix product as well as many other algorithms. Completion of the cubical mesh requires 3n-2 steps. It is shown that the number of processing elements needed to achieve this time bound is at least ⌈3n²/4⌉. A systolic array for the cubical directed mesh is then presented. It completes the mesh using the minimum number of steps and exactly ⌈3n²/4⌉ processing elements: it is processor-time-minimal. The systolic array's topology is that of a hexagonally shaped, cylindrically connected, 2-D directed mesh.
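
The two figures quoted in this abstract (3n-2 steps, and a processor count of 3n²/4 rounded up) can be sanity-checked by brute force. The sketch below is not the paper's proof; it assumes that node (i, j, k) of the mesh depends on its three predecessors along each axis, so that its earliest step is i + j + k + 1 (0-indexed). Under that assumption every node lies on a longest path, so the widest level gives the number of nodes that must run simultaneously. The function names are illustrative.

```python
from collections import Counter
from itertools import product

def level_sizes(n):
    """Map each step t to the number of mesh nodes whose earliest step is t.

    Assumption: node (i, j, k), 0-indexed, depends on (i-1, j, k),
    (i, j-1, k) and (i, j, k-1), so its earliest step is i + j + k + 1.
    """
    return Counter(i + j + k + 1 for i, j, k in product(range(n), repeat=3))

def steps_and_width(n):
    """Return (number of steps, largest number of nodes sharing one step)."""
    sizes = level_sizes(n)
    return max(sizes), max(sizes.values())

def ceil_three_quarters_n_squared(n):
    """Integer form of ceil(3 n^2 / 4)."""
    return (3 * n * n + 3) // 4

for n in range(1, 7):
    # Columns: n, (steps, widest level), ceil(3 n^2 / 4)
    print(n, steps_and_width(n), ceil_three_quarters_n_squared(n))
```

For small n the widest level matches the rounded-up value of 3n²/4, which is consistent with the lower bound stated in the abstract.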

2.

A processor-time-minimal systolic array for transitive closure

Scheiman C.J. Cappello P.R. 《Parallel and Distributed Systems, IEEE Transactions on》1992,3(3):257-269

Using a directed acyclic graph (DAG) model of algorithms, the authors focus on processor-time-minimal multiprocessor schedules: time-minimal multiprocessor schedules that use as few processors as possible. The Kung, Lo, and Lewis (KLL) algorithm for computing the transitive closure of a relation over a set of n elements requires at least 5n-4 parallel steps. As originally reported, their systolic array comprises n² processing elements. It is shown that any time-minimal multiprocessor schedule of the KLL algorithm's DAG needs at least n²/3 processing elements. Then a processor-time-minimal systolic array realizing the KLL DAG is constructed. Its processing elements are organized as a cylindrically connected 2-D mesh when n ≡ 0 (mod 3). When n ≢ 0 (mod 3), the 2-D mesh is connected as a torus.
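
To pin down what is being computed, the following is a minimal sequential sketch of transitive closure using Warshall's algorithm. It is not the KLL systolic algorithm or its DAG, and it says nothing about the 5n-4 step bound; the function name and the boolean-matrix representation are assumptions made for illustration.

```python
def transitive_closure(adj):
    """Warshall's algorithm on a boolean adjacency matrix (list of lists).

    adj[i][j] is True when the relation contains the pair (i, j).
    Returns a new matrix; the input is left unchanged.
    """
    n = len(adj)
    reach = [row[:] for row in adj]
    for k in range(n):              # allow k as an intermediate element
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return reach

# A chain 0 -> 1 -> 2: the closure adds the pair (0, 2).
chain = [[False, True, False],
         [False, False, True],
         [False, False, False]]
closure = transitive_closure(chain)
```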

3.

Two-dimensional convolution on a pyramid computer

Chang J.H. Ibarra O.H. Pong T.-C. Sohn S.M. 《IEEE transactions on pattern analysis and machine intelligence》1988,10(4):590-593

An algorithm for convolving a k×k window of weighting coefficients with an n×n image matrix on a pyramid computer of O(n²) processors in time O(log n + k²), excluding the time to load the image matrix, is presented. If k=Ω(√(log n)), which is typical in practice, the algorithm has a processor-time product O(n²k²), which is optimal with respect to the usual sequential algorithm. A feature of the algorithm is that the mechanism for controlling the transmission and distribution of data in each processor is finite state, independent of the values of n and k. Thus, for convolving two {0, 1}-valued matrices using Boolean operations rather than the typical sum and product operations, the processors of the pyramid computer are finite-state.
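
The "usual sequential algorithm" that the O(n²k²) processor-time product is measured against can be sketched as a direct four-loop convolution. This is only a baseline for comparison, not the pyramid algorithm; the function name and the boundary handling (only positions where the window fits entirely inside the image) are assumptions.

```python
def convolve2d_valid(image, window):
    """Direct 2-D convolution of an n x n image with a k x k window.

    Only output positions where the window fits entirely inside the image
    are produced, giving an (n - k + 1) x (n - k + 1) result.
    """
    n, k = len(image), len(window)
    size = n - k + 1
    out = [[0] * size for _ in range(size)]
    for r in range(size):
        for c in range(size):
            acc = 0
            for a in range(k):
                for b in range(k):
                    # The window is flipped in both axes (true convolution).
                    acc += window[a][b] * image[r + k - 1 - a][c + k - 1 - b]
            out[r][c] = acc
    return out

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
box = convolve2d_valid(image, [[1, 1], [1, 1]])
```

Each of the roughly n² output values costs k² multiply-adds, which is where the O(n²k²) sequential cost comes from.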

4.

A novel discrete relaxation architecture (total citations: 1; self-citations: 0; citations by others: 1)

Gu J. Wang W. 《IEEE transactions on pattern analysis and machine intelligence》1992,14(8):857-865

The discrete relaxation algorithm (DRA) is a computational technique that enforces arc consistency (AC) in a constraint satisfaction problem (CSP). The original sequential AC-1 algorithm suffers from O(n³m³) time complexity, and even the optimal sequential AC-4 algorithm is O(n²m²) for an n-object and m-label DRA problem. Sample problem runs show that these algorithms are all too slow to meet the needs of any useful real-time CSP application. A parallel DRA5 algorithm that reaches a lower bound of O(nm) (where the number of processors is polynomial in the problem size) is given. A fine-grained, massively parallel hardware computer architecture has been designed for the DRA5 algorithm. For practical problems, many orders of magnitude of efficiency improvement can be reached on such a hardware architecture.

5.

Some explicit formulas for the matrix exponential

Bernstein D.S. So W. 《Automatic Control, IEEE Transactions on》1993,38(8):1228-1232

Formulas are derived for the exponential of an arbitrary 2×2 matrix in terms of either its eigenvalues or entries. These results are then applied to the second-order mechanical vibration equation with weak or strong damping. Some formulas for the exponential of n×n matrices are given for matrices that satisfy an arbitrary quadratic polynomial. Besides the above-mentioned 2×2 matrices, these results encompass involutory, rank 1, and idempotent matrices. Consideration is then given to n×n matrices that satisfy a special cubic polynomial. These results are applied to the case of a 3×3 skew symmetric matrix whose exponential represents the constant rotation of a rigid body about a fixed axis.
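
The last case mentioned in this abstract has a well-known closed form, Rodrigues' rotation formula. A 3×3 skew-symmetric matrix S built from a vector w with length t satisfies the cubic S³ = -t²S, which gives exp(S) = I + (sin t / t) S + ((1 - cos t) / t²) S². The sketch below implements that standard formula and cross-checks it against a truncated Taylor series; whether it coincides term for term with the paper's formula is an assumption, and all function names are illustrative.

```python
import math

def skew(w):
    """3x3 skew-symmetric matrix S such that S v equals the cross product w x v."""
    x, y, z = w
    return [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]

def matmul(a, b):
    """Product of two 3x3 matrices stored as lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def expm_skew(w):
    """exp(skew(w)) = I + (sin t / t) S + ((1 - cos t) / t^2) S^2, t = |w|."""
    t = math.sqrt(sum(c * c for c in w))
    s = skew(w)
    s2 = matmul(s, s)
    if t < 1e-8:
        a, b = 1.0, 0.5  # limits of the two coefficients as t -> 0
    else:
        a, b = math.sin(t) / t, (1.0 - math.cos(t)) / (t * t)
    return [[(1.0 if i == j else 0.0) + a * s[i][j] + b * s2[i][j]
             for j in range(3)] for i in range(3)]

def expm_series(m, terms=40):
    """Truncated Taylor series of exp(m), used only as a cross-check."""
    result = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    term = [row[:] for row in result]
    for n in range(1, terms):
        term = [[v / n for v in row] for row in matmul(term, m)]
        result = [[result[i][j] + term[i][j] for j in range(3)]
                  for i in range(3)]
    return result

# A quarter turn about the z-axis.
quarter_turn = expm_skew((0.0, 0.0, math.pi / 2))
```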

6.

Systolic computation of multivariable frequency response

Capello P.R. Laub A.J. 《Automatic Control, IEEE Transactions on》1988,33(6):550-558

An algorithm intended for software implementation on a programmable systolic/wavefront computer is presented for the computation of a complex-valued frequency-response matrix G. Typically, real-valued state-space model matrices are given and the calculation of G must be performed for a very large number of values of the scalar frequency parameter. The algorithm is an orthogonal version of an algorithm described previously by A.J. Laub (ibid., vol.26, no.4, p.407-8, 1981). The system matrix A is reduced initially to an upper Hessenberg form, which is preserved as the frequency subsequently varies. A systolic QR factorization of a certain complex-valued matrix is then implemented for effecting the necessary linear system solution (inversion). The critical computational component is the back solve. This computational component's process dependency graph is embedded optimally in space and time through the use of a nonlinear spacetime transformation. The computational period of the algorithm is O(n), where n is the order of the matrix A.

7.

Serial and parallel algorithms for the medial axis transform

Jenq J.-F. Sahni S. 《IEEE transactions on pattern analysis and machine intelligence》1992,14(12):1218-1224

An O(n²) time serial algorithm is developed for obtaining the medial axis transform (MAT) of an n×n image. An O(log n) time CREW PRAM algorithm and an O(log² n) time SIMD hypercube parallel algorithm for the MAT are also developed. Both of these use O(n²) processors. Two problems associated with the MAT, the area and perimeter reporting problems, are studied. An O(log n) time hypercube algorithm is developed for both of them, where n is the number of squares in the MAT, and the algorithms use O(n²) processors.

8.

n-D polynomial matrix equations

Sebek M. 《Automatic Control, IEEE Transactions on》1988,33(5):499-502

Linear matrix equations in the ring of polynomials in n indeterminates (n-D) are studied. General- and minimum-degree solutions are discussed. Simple and constructive, necessary and sufficient solvability conditions are derived. An algorithm to solve the equations with general n-D polynomial matrices is presented. It is based on elementary reductions in a greater ring of polynomials in one indeterminate, having as coefficients polynomial fractions in the other n-1 indeterminates, which makes the use of Euclidean division possible.

9.

An efficient distributed knot detection algorithm

Cidon I. 《IEEE transactions on software engineering》1989,15(5):644-649

A distributed knot detection algorithm for general graphs is presented. The knot detection algorithm uses at most O(n log n+m) messages and O(m+n log n) bits of memory to detect all knots' nodes in the network (where n is the number of nodes and m is the number of links). This is compared to O(n²) messages needed in the best algorithm previously published. The knot detection algorithm makes use of efficient cycle detection and clustering techniques. Various applications for the knot detection algorithm are presented. In particular, its importance to deadlock detection in store-and-forward communication networks and in transaction systems is demonstrated.

10.

A comment on `An inequality for the trace of matrix product' by J.K. Baksalary and S. Puntanen

Chengshan Xiao 《Automatic Control, IEEE Transactions on》1993,38(3):510-511

An example is provided which illustrates that the new bounds for the trace of the product of an arbitrary n×n real matrix A and an n×n nonnegative definite real symmetric matrix B derived in the above-titled paper (ibid., vol.37, no.2, pp.239-240, Feb. 1992) are not valid.

11.

Optimal parallel initialization algorithms for a class of priority queues

Olariu S. Wen Z. 《Parallel and Distributed Systems, IEEE Transactions on》1991,2(4):423-429

An adaptive parallel algorithm for inducing a priority queue structure on an n-element array is presented. The algorithm is extended to provide optimal parallel construction algorithms for three other heap-like structures useful in implementing double-ended priority queues, namely min-max heaps, deaps, and min-max-pair heaps. It is shown that an n-element array can be made into a heap, a deap, a min-max heap, or a min-max-pair heap in O(log n + n/p) time using p processors, with p no more than n/log n, in the exclusive-read-exclusive-write parallel random-access machine model.

12.

Rotator graphs: an efficient topology for point-to-point multiprocessor networks

Corbett P.F. 《Parallel and Distributed Systems, IEEE Transactions on》1992,3(5):622-626

Rotator graphs, a set of directed permutation graphs, are proposed as an alternative to star and pancake graphs. Rotator graphs are defined in a way similar to the recently proposed Faber-Moore graphs. They have a smaller diameter (n-1 in a graph with n! vertices) than either the star or pancake graphs or the k-ary n-cubes. A simple optimal routing algorithm is presented for rotator graphs. The n-rotator graphs are defined as a subset of all rotator graphs. The distribution of distances of vertices in the n-rotator graphs is presented, and the average distance between vertices is found. The n-rotator graphs are shown to be optimally fault tolerant and maximally one-step fault diagnosable. The n-rotator graphs are shown to be Hamiltonian, and an algorithm for finding a Hamiltonian circuit in the graphs is given.
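
The diameter claim can be checked by brute force for small n. The sketch below assumes the usual definition of the n-rotator graph: vertices are the permutations of n symbols, and each vertex has a directed edge to the permutations obtained by rotating its first k symbols left by one position, for k = 2, ..., n. That definition is stated here as an assumption, since the abstract does not spell it out, and the function names are illustrative. This is a breadth-first search, not the paper's routing algorithm.

```python
from collections import deque
from itertools import permutations

def neighbors(perm):
    """Out-neighbors: rotate the first k symbols left by one, k = 2..n."""
    n = len(perm)
    return [perm[1:k] + perm[:1] + perm[k:] for k in range(2, n + 1)]

def eccentricity(source):
    """Return (largest BFS distance from source, number of vertices reached)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in neighbors(u):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values()), len(dist)

def diameter(n):
    """Largest eccentricity over all n! vertices (brute force, small n only)."""
    return max(eccentricity(p)[0] for p in permutations(range(n)))

for n in range(2, 6):
    print(n, diameter(n))
```

Under this definition each step moves the leading symbol to any later position, so inserting the leading symbol into a growing sorted suffix reaches any target permutation in at most n-1 steps, which agrees with the diameter stated in the abstract.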

13.

Memory and processing architecture for 3D voxel-based imagery (total citations: 1; self-citations: 0; citations by others: 1)

Kaufman A. Bakalash R. 《Computer Graphics and Applications, IEEE》1988,8(6):10-23

A versatile voxel-based architecture for 3-D volume visualization, called the Cube architecture, is introduced. A small-scale prototype of the architecture has been realized in hardware and has been operating in true real time, faster than the alternative voxel systems. The Cube architecture is centered around a 3-D cubic frame buffer of voxels, and it has three processors that access the frame buffer to input sampled and synthetic data, to manipulate the 3-D images, and to project and render them. To cope with the huge quantity of voxels and still perform in real time, two special features were incorporated within the architecture: a unique skewed memory organization, which permits the retrieval and storage of voxels in parallel, and a multiple-write bus, which speeds up the viewing process. These features allow Cube, for example, to project an image of n³ voxels in O(n² log n) time rather than the conventional O(n³) time.

14.

On time mapping of uniform dependence algorithms into lower dimensional processor arrays

Shang W. Fortes J.A.B. 《Parallel and Distributed Systems, IEEE Transactions on》1992,3(3):350-363

Most existing methods of mapping algorithms into processor arrays are restricted to the case where n-dimensional algorithms, or algorithms with n nested loops, are mapped into (n-1)-dimensional arrays. However, in practice, it is interesting to map n-dimensional algorithms into (k-1)-dimensional arrays where k<n. A computational conflict occurs if two or more computations of an algorithm are mapped into the same processor at the same execution time. Based on the Hermite normal form of the mapping matrix, necessary and sufficient conditions are derived to identify mappings without computational conflicts. These conditions are used to find time mappings of n-dimensional algorithms into (k-1)-dimensional arrays, k<n, without computational conflicts. For some applications, the mapping is time-optimal.

15.

The minimal dimension of stable faces required to guarantee stability of a matrix polytope

Cobb J.D. Demarco C.L. 《Automatic Control, IEEE Transactions on》1989,34(9):990-992

The paper considers the problem of determining whether each point in a polytope of n×n matrices is stable. The approach is to check stability of certain faces of the polytope. For n⩾3, the authors show that stability of each point in every (2n-4)-dimensional face guarantees stability of the entire polytope. Furthermore, they prove that, for any k⩽n², there exists a k-dimensional polytope containing a strictly unstable point and such that all its subpolytopes of dimension min{k-1, 2n-5} are stable.

16.

Algorithms and bounds for shortest paths and diameter in faulty hypercubes

Tien S.-B. Raghavendra C.S. 《Parallel and Distributed Systems, IEEE Transactions on》1993,4(6):713-718

In an n-dimensional hypercube Qn with fault set F, |F|<2n-2, assuming the source S and destination D are not isolated, it is shown that there exists a path between them of length at most their Hamming distance plus 4. An algorithm with complexity O(|F| log n) is given to find such a path. A bound of n+2 is obtained for the diameter of the faulty hypercube Qn-F when |F|<2n-2. This improves the previously known bound of n+6 obtained by A.-H. Esfahanian (1989). Worst case scenarios are constructed to show that these bounds for shortest paths and diameter are tight. It is also shown that when |F|<2n-2, the diameter bound is reduced to n+1 if every node has at least 2 nonfaulty neighbors and reduced to n if every node has at least 3 nonfaulty neighbors.
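
The path-length bound can be explored on small instances with a plain breadth-first search. This is a brute-force check, not the paper's O(|F| log n) algorithm. It assumes that faults are faulty nodes and that hypercube nodes are labeled by the integers 0 to 2^n - 1, with neighbors differing in exactly one bit; the function names are illustrative.

```python
from collections import deque

def hamming(a, b):
    """Number of bit positions in which the labels a and b differ."""
    return bin(a ^ b).count("1")

def shortest_path_length(n, faults, src, dst):
    """BFS distance from src to dst in Q_n with the faulty nodes removed.

    Returns None when src or dst is faulty or dst is unreachable.
    """
    faults = set(faults)
    if src in faults or dst in faults:
        return None
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            return dist[u]
        for bit in range(n):
            v = u ^ (1 << bit)  # flip one coordinate
            if v not in faults and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return None

# Q_4 with three of the source's four neighbors faulty.
length = shortest_path_length(4, {0b0001, 0b0010, 0b0100}, 0b0000, 0b0111)
print(length, hamming(0b0000, 0b0111))
```

In this example the only way out of the source is through node 1000, so the shortest fault-free path has length 5 while the Hamming distance is 3, which stays within the "Hamming distance plus 4" bound quoted above.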

17.

Balanced parallel sort on hypercube multiprocessors

Abali B. Ozguner F. Bataineh A. 《Parallel and Distributed Systems, IEEE Transactions on》1993,4(5):572-581

A parallel sorting algorithm for sorting n elements evenly distributed over 2^d = p nodes of a d-dimensional hypercube is presented. The average running time of the algorithm is O((n log n)/p + p log² n). The algorithm maintains a perfect load balance in the nodes by determining the (kn/p)th elements (k = 1, ..., p-1) of the final sorted list in advance. These p-1 keys are used to partition the sorted sublists in each node to redistribute data to the nodes to be merged in parallel. The nodes finish the sort with an equal number of elements (n/p) regardless of the data distribution. A parallel selection algorithm for determining the balanced partition keys in O(p log² n) time is presented. The speed of the sorting algorithm is further enhanced by the distance-d communication capability of the iPSC/2 hypercube computer and a novel conflict-free routing algorithm. Experimental results on a 16-node hypercube computer show that the sorting algorithm is competitive with the previous algorithms and faster for skewed data distributions.

18.

Parallel binary search

Akl S.G. Meijer H. 《Parallel and Distributed Systems, IEEE Transactions on》1990,1(2):247-250

Two arrays of numbers sorted in nondecreasing order are given: an array A of size n and an array B of size m, where n<m. It is required to determine, for every element of A, the smallest element of B (if one exists) that is larger than or equal to it. It is shown how to solve this problem on the EREW PRAM (exclusive-read exclusive-write parallel random-access machine) in O(log m log n / log log m) time using n processors. The solution is then extended to the case in which fewer than n processors are available. This yields an EREW PRAM algorithm for the problem whose cost is O(n log m), which is O(m) for n⩽m/log m. It is shown how the solution obtained leads to an improved parallel merging algorithm.
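
The problem statement itself is easy to make concrete with a sequential baseline: one binary search in B per element of A, for O(n log m) total work, which matches the cost figure quoted for the processor-limited version. The sketch below uses Python's standard `bisect` module; it is not the EREW PRAM algorithm, and the function name is illustrative.

```python
from bisect import bisect_left

def successors(a, b):
    """For each x in sorted a, the smallest element of sorted b that is >= x.

    Returns None for elements of a that are larger than everything in b.
    """
    result = []
    for x in a:
        i = bisect_left(b, x)  # first index with b[i] >= x
        result.append(b[i] if i < len(b) else None)
    return result

matches = successors([1, 4, 9], [2, 4, 6, 8])
print(matches)
```

The difficulty addressed by the paper is doing these n searches concurrently without two processors reading the same cell of B at once, which this sequential version does not have to consider.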

19.

A linear algorithm for generating random numbers with a given distribution

Vose M.D. 《IEEE transactions on software engineering》1991,17(9):972-975

Let ξ be a random variable over a finite set with an arbitrary probability distribution. Improvements to a fast method of generating sample values for ξ in constant time are suggested. The proposed modification reduces the time required for initialization to O(n). For a simple genetic algorithm, this improvement changes an O(g n ln n) algorithm into an O(g n) algorithm (where g is the number of generations, and n is the population size).
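
The constant-time sampling method with linear-time initialization is commonly known as the alias method. The following is a minimal sketch of that technique using two worklists of under-full and over-full cells; it is written from the general description of the method rather than from the paper's pseudocode, so names, floating-point handling, and details are assumptions.

```python
import random

def build_alias(probs):
    """Build alias tables in O(n) for a distribution given as a list of floats."""
    n = len(probs)
    scaled = [p * n for p in probs]
    prob = [1.0] * n          # chance of keeping cell i's own outcome
    alias = list(range(n))    # outcome returned otherwise
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s], alias[s] = scaled[s], l
        scaled[l] -= 1.0 - scaled[s]  # l donates mass to fill cell s
        (small if scaled[l] < 1.0 else large).append(l)
    # Any leftovers keep prob 1.0; they differ from 1 only by rounding.
    return prob, alias

def sample(prob, alias, rng=random):
    """Draw one outcome in O(1): pick a cell uniformly, then flip a biased coin."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]

def implied_distribution(prob, alias):
    """Recover the distribution encoded by the tables (sanity check only)."""
    n = len(prob)
    out = [0.0] * n
    for i in range(n):
        out[i] += prob[i] / n
        out[alias[i]] += (1.0 - prob[i]) / n
    return out

prob, alias = build_alias([0.5, 0.25, 0.25])
draws = [sample(prob, alias) for _ in range(10)]
```

Each element enters and leaves a worklist a bounded number of times, which is what makes the table construction linear in n.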

20.

On the gap between the structural controllability of time-varying and time-invariant systems

Poljak S. 《Automatic Control, IEEE Transactions on》1992,37(12):1961-1965

Structural controllability of time-invariant and time-varying systems is compared when the input control sequences have a restricted length k. The dimensions of the controllable spaces coincide in the following three special cases: the input sequences have length k=2; the input sequences have length k=n, where n is the size of the system (i.e., the ultimate controllability is the same in both cases); and for every length of input sequences, provided that the system has a single input only. It is proved that a gap may appear for every input length k such that 2<k⩽n/2. The case when n/2<k<n is left open.