
1.

A processor-time-minimal systolic array for cubical mesh algorithms

Cappello P. 《Parallel and Distributed Systems, IEEE Transactions on》1992,3(1):4-13

Using a directed acyclic graph (DAG) model of algorithms, the paper focuses on time-minimal multiprocessor schedules that use as few processors as possible. Such a processor-time-minimal scheduling of an algorithm's DAG is first illustrated using a triangular 2-D directed mesh (representing, for example, an algorithm for solving a triangular system of linear equations). Then, algorithms represented by an n×n×n directed mesh are investigated. This cubical directed mesh is fundamental; it represents the standard algorithm for computing matrix product as well as many other algorithms. Completion of the cubical mesh requires 3n-2 steps. It is shown that the number of processing elements needed to achieve this time bound is at least ⌈3n²/4⌉. A systolic array for the cubical directed mesh is then presented. It completes the mesh using the minimum number of steps and exactly ⌈3n²/4⌉ processing elements; that is, it is processor-time-minimal. The systolic array's topology is that of a hexagonally shaped, cylindrically connected, 2-D directed mesh.
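
The processor lower bound can be checked numerically with a short sketch. It assumes that the mesh nodes are (i, j, k) with 0 <= i, j, k < n and that every arc increases one coordinate by 1. Every node then lies on a longest path of 3n-2 nodes. In a time-minimal schedule, node (i, j, k) must therefore run exactly at step i + j + k, and the largest such wavefront bounds the processor count from below. The function name `wavefront_sizes` is hypothetical.

```python
from collections import Counter


def wavefront_sizes(n):
    """Number of mesh nodes that must run at each step of a 3n - 2 step schedule.

    Assumption: nodes are (i, j, k) with 0 <= i, j, k < n, and every arc
    increases exactly one coordinate by 1. The longest path into (i, j, k)
    has i + j + k arcs and the longest path out of it has 3n - 3 - (i + j + k)
    arcs, so in a time-minimal schedule it runs at step i + j + k.
    """
    counts = Counter(i + j + k for i in range(n) for j in range(n) for k in range(n))
    return [counts[t] for t in range(3 * n - 2)]


for n in range(1, 7):
    sizes = wavefront_sizes(n)
    lower_bound = max(sizes)      # processors needed at the busiest step
    ceiling = -(-3 * n * n // 4)  # integer ceil(3n^2 / 4)
    print(n, len(sizes), lower_bound, ceiling)
```

The printed columns are n, the number of steps, the widest wavefront and ⌈3n²/4⌉. The last two should agree.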

2.

A period-processor-time-minimal schedule for cubical mesh algorithms

Scheiman C. Cappello P. 《Parallel and Distributed Systems, IEEE Transactions on》1994,5(3):274-280

Using a directed acyclic graph (dag) model of algorithms, we investigate precedence-constrained multiprocessor schedules for the n×n×n directed mesh. This cubical mesh is fundamental, representing the standard algorithm for square matrix product, as well as many other algorithms. Its completion requires at least 3n-2 multiprocessor steps. Time-minimal multiprocessor schedules that use as few processors as possible are called processor-time-minimal. For the cubical mesh, such a schedule requires at least ⌈3n²/4⌉ processors. Among such schedules, one with the minimum period (i.e., maximum throughput) is referred to as a period-processor-time-minimal schedule. The period of any processor-time-minimal schedule for the cubical mesh is at least 3n/2 steps. This lower bound is shown to be exact by constructing, for n a multiple of 6, a period-processor-time-minimal multiprocessor schedule that can be realized on a systolic array whose topology is a toroidally connected n/2×n/2×3 mesh.

3.

Constant time algorithms for the transitive closure and some related graph problems on processor arrays with reconfigurable bus systems

Wang B.-F. Chen G.-H. 《Parallel and Distributed Systems, IEEE Transactions on》1990,1(4):500-507

The transitive closure problem is solved in O(1) time by a new method that differs substantially from the conventional approach. On processor arrays with reconfigurable bus systems, two O(1)-time algorithms are proposed for computing the transitive closure of an undirected graph. One is designed on a three-dimensional n×n×n processor array with a reconfigurable bus system, and the other on a two-dimensional n²×n² processor array with a reconfigurable bus system, where n is the number of vertices in the graph. Using these O(1)-time transitive closure algorithms, many other graph problems are also solved in O(1) time. These problems include recognizing bipartite graphs and finding connected components, articulation points, biconnected components, bridges, and minimum spanning trees in undirected graphs.
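
For reference, the quantity being computed can be sketched sequentially. This is not the O(1) reconfigurable-bus algorithm. It is a swapped-in standard formulation that computes the closure by repeated Boolean matrix squaring (about log₂ n rounds) and then reads connected components off the closure. The function names are illustrative only.

```python
def transitive_closure(adj):
    """Reflexive-transitive closure of a Boolean adjacency matrix.

    Sequential sketch: each Boolean squaring doubles the path length covered,
    so about log2(n) rounds suffice.
    """
    n = len(adj)
    reach = [[bool(adj[i][j]) or i == j for j in range(n)] for i in range(n)]
    span = 1  # reach currently covers paths of length <= span
    while span < n:
        reach = [[any(reach[i][k] and reach[k][j] for k in range(n))
                  for j in range(n)] for i in range(n)]
        span *= 2
    return reach


def connected_components(adj):
    """For an undirected graph, label each vertex by the smallest vertex it reaches."""
    reach = transitive_closure(adj)
    n = len(adj)
    return [min(j for j in range(n) if reach[i][j]) for i in range(n)]


n = 5
adj = [[False] * n for _ in range(n)]
for u, v in [(0, 1), (1, 2), (3, 4)]:
    adj[u][v] = adj[v][u] = True
print(connected_components(adj))  # [0, 0, 0, 3, 3]
```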

4.

Serial and parallel algorithms for the medial axis transform

Jenq J.-F. Sahni S. 《IEEE transactions on pattern analysis and machine intelligence》1992,14(12):1218-1224

An O(n²) time serial algorithm is developed for obtaining the medial axis transform (MAT) of an n×n image. An O(log n) time CREW PRAM algorithm and an O(log² n) time SIMD hypercube parallel algorithm for the MAT are also developed; both use O(n²) processors. Two problems associated with the MAT, the area and perimeter reporting problems, are studied. An O(log n) time hypercube algorithm is developed for both of them, where n is the number of squares in the MAT; these algorithms use O(n²) processors.
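
A minimal serial sketch of a square-based MAT follows. It assumes that the transform is the set of maximal all-1 squares of a binary image; the paper's exact conventions may differ. It uses the standard O(n²) recurrence for the largest all-1 square ending at each pixel. A square is kept only if none of the three squares ending just below or to its right is one size larger, since such a square would contain it. The function name is hypothetical.

```python
def maximal_squares(image):
    """Return (row, col, size) for every maximal all-1 square of a binary image.

    Assumption: the MAT is the set of maximal all-1 squares; (row, col) is the
    bottom-right corner of each square.
    """
    rows, cols = len(image), len(image[0])
    side = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            if image[i][j]:
                if i == 0 or j == 0:
                    side[i][j] = 1
                else:  # largest square ending at (i, j)
                    side[i][j] = 1 + min(side[i - 1][j], side[i][j - 1], side[i - 1][j - 1])

    def s(i, j):
        return side[i][j] if i < rows and j < cols else 0

    result = []
    for i in range(rows):
        for j in range(cols):
            k = side[i][j]
            # the square is contained in a larger one exactly when a neighbour
            # below, right, or diagonal has side k + 1 (it can never exceed k + 1)
            if k and max(s(i + 1, j), s(i, j + 1), s(i + 1, j + 1)) <= k:
                result.append((i, j, k))
    return result


print(maximal_squares([[1, 1, 0],
                       [1, 1, 0],
                       [0, 0, 1]]))  # [(1, 1, 2), (2, 2, 1)]
```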

5.

Performance evaluation of circuit switched multistage interconnection networks using a hold strategy

Hsiao S.-H. Chen C.Y.R. 《Parallel and Distributed Systems, IEEE Transactions on》1992,3(5):632-640

The performance of processor-memory communication in multiprocessor systems that use circuit-switched interconnection networks with a hold strategy is evaluated. Message size and processor processing time are considered and shown to have a significant effect on overall system performance. The proposed closed queuing network model requires only n+2 states, in contrast to the (n²+3n+4)/2 states needed in previous studies, where n is the number of stages of the multistage interconnection network. Because a closed-form solution is obtained, the behavior of a complete memory-access cycle through the multistage interconnection network can be analyzed accurately, and various performance bounds can be derived.

6.

Minimal parameter solution of the orthogonal matrix differential equation

Bar-Itzhack I.Y. Markley F.L. 《Automatic Control, IEEE Transactions on》1990,35(3):314-317

The straightforward solution of the first-order differential equation satisfied by all nth-order orthogonal matrices requires n² integrations to obtain the matrix elements. There are, however, only n(n-1)/2 independent parameters that determine an orthogonal matrix. This work considers how to choose them, how to derive their differential equation, and how to express the orthogonal matrix in terms of these parameters. Several possibilities based on attitude determination in three dimensions are examined. It is concluded that not all 3-D methods have useful extensions to other dimensions, and that the 3-D Gibbs vector (or Cayley parameters) provides the most useful extension. An algorithm is developed using the resulting parameters, termed extended Rodrigues parameters, and numerical results are presented for its application to a fourth-order matrix.
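
The parameter count can be illustrated with the Cayley transform. The transform maps the n(n-1)/2 free entries of a skew-symmetric matrix S to an orthogonal matrix Q = (I + S)⁻¹(I − S). This is a minimal sketch under assumed conventions. The parameter ordering and the sign of S are our choices, the helper names are hypothetical, and the paper's differential-equation algorithm is not reproduced.

```python
def skew_from_params(params, n):
    """Fill an n x n skew-symmetric matrix from n(n-1)/2 parameters (row-major, upper triangle)."""
    if len(params) != n * (n - 1) // 2:
        raise ValueError("need n(n-1)/2 parameters")
    S = [[0.0] * n for _ in range(n)]
    it = iter(params)
    for i in range(n):
        for j in range(i + 1, n):
            S[i][j] = next(it)
            S[j][i] = -S[i][j]
    return S


def cayley(params, n):
    """Orthogonal Q = (I + S)^-1 (I - S), solved by Gauss-Jordan elimination.

    I + S is always invertible for real skew-symmetric S.
    """
    S = skew_from_params(params, n)
    # augmented system [I + S | I - S]
    aug = [[(i == j) + S[i][j] for j in range(n)] +
           [(i == j) - S[i][j] for j in range(n)] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))  # partial pivoting
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


Q = cayley([0.1, -0.2, 0.3, 0.05, 0.4, -0.1], 4)  # 6 = 4*3/2 parameters
QtQ = [[sum(Q[k][i] * Q[k][j] for k in range(4)) for j in range(4)] for i in range(4)]
print(max(abs(QtQ[i][j] - (i == j)) for i in range(4) for j in range(4)))
```

Orthogonality follows because I − S and I + S commute, so QᵀQ = (I + S)(I − S)⁻¹(I + S)⁻¹(I − S) = I.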

7.

Two-dimensional convolution on a pyramid computer

Chang J.H. Ibarra O.H. Pong T.-C. Sohn S.M. 《IEEE transactions on pattern analysis and machine intelligence》1988,10(4):590-593

An algorithm is presented for convolving a k×k window of weighting coefficients with an n×n image matrix on a pyramid computer of O(n²) processors in time O(log n + k²), excluding the time to load the image matrix. If k = Ω(√(log n)), which is typical in practice, the algorithm has a processor-time product of O(n²k²), which is optimal with respect to the usual sequential algorithm. A feature of the algorithm is that the mechanism controlling the transmission and distribution of data in each processor is finite-state, independent of the values of n and k. Thus, for convolving two {0, 1}-valued matrices using Boolean operations rather than the typical sum and product operations, the processors of the pyramid computer are finite-state.
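
For comparison, the following sketches the usual sequential baseline: direct 2-D convolution, which performs on the order of n²k² multiply-adds. It is not the pyramid algorithm. Only the "valid" region, where the window fits entirely inside the image, is computed; that restriction and the function name are our choices.

```python
def convolve2d_valid(image, window):
    """Direct 2-D convolution of an n x n image with a k x k window ('valid' region only).

    Cost: (n - k + 1)^2 * k^2 multiply-adds, i.e. O(n^2 k^2) for k much smaller than n.
    """
    n, k = len(image), len(window)
    out_size = n - k + 1
    out = [[0] * out_size for _ in range(out_size)]
    for i in range(out_size):
        for j in range(out_size):
            acc = 0
            for a in range(k):
                for b in range(k):
                    # the window is flipped, as true convolution requires
                    acc += window[k - 1 - a][k - 1 - b] * image[i + a][j + b]
            out[i][j] = acc
    return out


img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(convolve2d_valid(img, [[1, 1], [1, 1]]))  # [[12, 16], [24, 28]]
```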

8.

Comments on "Parallel algorithms for hierarchical clustering and cluster validity"

Murtagh F. 《IEEE transactions on pattern analysis and machine intelligence》1992,14(10):1056-1057

The above-titled paper (ibid., vol. 12, no. 11, pp. 1088-1092, Nov. 1990) describes parallel implementations of hierarchical clustering algorithms that achieve O(n²) computational time complexity and thereby improve on the baseline of sequential implementations. The latter are stated to be O(n³), with the exception of the single-link method. The commenter points out that state-of-the-art hierarchical clustering algorithms have O(n²) time complexity and should be cited in preference to the O(n³) algorithms described in many texts in the 1970s. Some further references on parallelizing hierarchical clustering algorithms are provided.
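
The sketch below illustrates the O(n²) class of algorithms the comment refers to. Single-link clustering can be read off a minimum spanning tree: the sorted MST edge weights are the heights at which single-link merges clusters. Prim's algorithm on a dense distance matrix builds that tree in O(n²) time. Choosing Prim's algorithm here is our own choice, not necessarily one of the references the comment cites, and the function name is hypothetical.

```python
def single_link_heights(dist):
    """Single-link merge heights from an n x n distance matrix in O(n^2) time.

    Prim's algorithm on the complete graph: the n - 1 MST edge weights,
    sorted, are the single-link merge heights.
    """
    n = len(dist)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [float("inf")] * n  # cheapest known edge from the tree to each vertex
    best[0] = 0.0
    heights = []
    for step in range(n):
        # closest vertex not yet in the tree: an O(n) scan, n times
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if step > 0:
            heights.append(best[u])
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
    return sorted(heights)


xs = [0, 1, 3, 7]
dist = [[abs(a - b) for b in xs] for a in xs]
print(single_link_heights(dist))  # [1, 2, 4]
```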

9.

Memory and processing architecture for 3D voxel-based imagery

Kaufman A. Bakalash R. 《Computer Graphics and Applications, IEEE》1988,8(6):10-23

A versatile voxel-based architecture for 3-D volume visualization, called the Cube architecture, is introduced. A small-scale prototype of the architecture has been realized in hardware and operates in true real time, faster than alternative voxel systems. The Cube architecture is centered on a 3-D cubic frame buffer of voxels and employs three processors that access the frame buffer to input sampled and synthetic data, to manipulate the 3-D images, and to project and render them. To cope with the huge number of voxels and still perform in real time, two special features are incorporated into the architecture: a unique skewed memory organization, which permits voxels to be retrieved and stored in parallel, and a multiple-write bus, which speeds up the viewing process. These features allow Cube, for example, to project an image of n³ voxels in O(n² log n) time rather than the conventional O(n³) time.
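
A skewed memory organization can be illustrated with a linear skewing function that places voxel (x, y, z) in memory module (x + y + z) mod n. This exact function is our assumption for illustration, not a statement of Cube's design. Under this skewing, any beam of n voxels parallel to a coordinate axis touches each of the n modules exactly once, so the whole beam can be fetched in one parallel access. The helper names are hypothetical.

```python
def module(x, y, z, n):
    """Memory module holding voxel (x, y, z). Assumption: linear skewing (x + y + z) mod n."""
    return (x + y + z) % n


def beam_is_conflict_free(n, axis, fixed):
    """True if the n voxels of an axis-parallel beam fall in n distinct modules."""
    a, b = fixed
    if axis == "x":
        voxels = [(t, a, b) for t in range(n)]
    elif axis == "y":
        voxels = [(a, t, b) for t in range(n)]
    else:
        voxels = [(a, b, t) for t in range(n)]
    return len({module(x, y, z, n) for x, y, z in voxels}) == n


n = 8
print(all(beam_is_conflict_free(n, axis, (a, b))
          for axis in "xyz" for a in range(n) for b in range(n)))  # True
```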

10.

An efficient heuristic for permutation packet routing on meshes with low buffer requirements

Makedon F. Symvonis A. 《Parallel and Distributed Systems, IEEE Transactions on》1993,4(3):270-276

Even though exact algorithms exist for permutation routing of n² messages on an n×n mesh of processors that require only constant-size queues, the constants are very large and the algorithms are very complicated to implement. A novel, simple heuristic for this problem is presented. It uses constant and very small queues (size 2). In all simulations run on randomly generated data, the number of routing steps required by the algorithm is almost equal to the maximum distance a packet has to travel. A pathological case is demonstrated in which routing takes more than the optimal number of steps, and an O(n²) upper bound on the number of required steps is proved. Furthermore, it is shown that the heuristic routes inversion, transposition, and rotations in optimal time; these are three special routing problems that appear very often in the design of parallel algorithms.

11.

An efficient distributed knot detection algorithm

Cidon I. 《Software Engineering, IEEE Transactions on》1989,15(5):644-649

A distributed knot detection algorithm for general graphs is presented. The algorithm uses at most O(n log n + m) messages and O(m + n log n) bits of memory to detect all nodes that belong to knots in the network, where n is the number of nodes and m is the number of links. This compares with the O(n²) messages needed by the best previously published algorithm. The knot detection algorithm makes use of efficient cycle detection and clustering techniques. Various applications of knot detection are presented. In particular, its importance for deadlock detection in store-and-forward communication networks and in transaction systems is demonstrated.
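
The property being detected can be pinned down with a centralized reference sketch. This is not the distributed algorithm, which is the paper's contribution. The sketch assumes a common definition: node v is in a knot when v can reach itself and every node reachable from v can reach v back. It costs O(n(n+m)) time and is meant only to make the definition concrete. The function names are hypothetical.

```python
from collections import deque


def reachable(graph, start):
    """Nodes reachable from start by a path of length >= 1 (breadth-first search)."""
    seen = set()
    queue = deque(graph.get(start, ()))
    while queue:
        v = queue.popleft()
        if v not in seen:
            seen.add(v)
            queue.extend(graph.get(v, ()))
    return seen


def knot_nodes(graph):
    """Assumed definition: v is in a knot iff v in R(v) and every u in R(v) reaches v."""
    nodes = set(graph) | {w for succ in graph.values() for w in succ}
    reach = {v: reachable(graph, v) for v in nodes}
    return {v for v in nodes
            if v in reach[v] and all(v in reach[u] for u in reach[v])}


# wait-for graph: 1 -> 2 -> 3 -> 1 is closed; 5 <-> 6 can escape to 7
g = {1: [2], 2: [3], 3: [1], 4: [1], 5: [6], 6: [5, 7], 7: []}
print(sorted(knot_nodes(g)))  # [1, 2, 3]
```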

12.

A novel discrete relaxation architecture

Gu J. Wang W. 《IEEE transactions on pattern analysis and machine intelligence》1992,14(8):857-865

The discrete relaxation algorithm (DRA) is a computational technique that enforces arc consistency (AC) in a constraint satisfaction problem (CSP). The original sequential AC-1 algorithm has O(n³m³) time complexity, and even the optimal sequential AC-4 algorithm is O(n²m²) for an n-object, m-label DRA problem. Sample problem runs show that these algorithms are too slow to meet the needs of useful real-time CSP applications. A parallel DRA5 algorithm that reaches a lower bound of O(nm), using a number of processors polynomial in the problem size, is given. A fine-grained, massively parallel hardware architecture has been designed for the DRA5 algorithm. For practical problems, efficiency improvements of many orders of magnitude can be achieved on such a hardware architecture.
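
The consistency property that discrete relaxation enforces can be illustrated with AC-3. AC-3 is a standard sequential arc-consistency algorithm. It is not AC-1, AC-4 or DRA5, and it is swapped in here only because it is short. Each directed arc (x, y) deletes the labels of x that have no supporting label in y. When a domain shrinks, the arcs pointing into it are rechecked. The data layout and function name are our assumptions.

```python
from collections import deque


def ac3(domains, constraints):
    """Enforce arc consistency in place (AC-3, a swapped-in sequential algorithm).

    domains: {var: set(labels)}; constraints: {(x, y): allowed(a, b)} for every
    directed arc, where a is a label of x and b a label of y.
    Returns False if some domain becomes empty.
    """
    queue = deque(constraints)
    while queue:
        x, y = queue.popleft()
        allowed = constraints[(x, y)]
        # labels of x with no support among the labels of y
        removed = {a for a in domains[x] if not any(allowed(a, b) for b in domains[y])}
        if removed:
            domains[x] -= removed
            if not domains[x]:
                return False
            # x lost labels, so arcs (z, x) with z != y must be rechecked
            queue.extend(arc for arc in constraints if arc[1] == x and arc[0] != y)
    return True


lt = lambda a, b: a < b
gt = lambda a, b: a > b
doms = {"A": {1, 2, 3}, "B": {1, 2, 3}, "C": {1, 2, 3}}
cons = {("A", "B"): lt, ("B", "A"): gt, ("B", "C"): lt, ("C", "B"): gt}
print(ac3(doms, cons), doms)  # True {'A': {1}, 'B': {2}, 'C': {3}}
```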

13.

Computing the width of a set

Houle M.E. Toussaint G.T. 《IEEE transactions on pattern analysis and machine intelligence》1988,10(5):761-765

For a set of points P in three-dimensional space, the width of P, W(P), is defined as the minimum distance between parallel planes of support of P. It is shown that W(P) can be computed in O(n log n + I) time and O(n) space, where I is the number of antipodal pairs of edges of the convex hull of P and n is the number of vertices; in the worst case, I = O(n²). For a convex polyhedron the time complexity becomes O(n + I). If P is a set of points in the plane, the complexity can be reduced to O(n log n). For simple polygons, linear time suffices.
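
For the planar case, the definition translates into a short computation. This minimal sketch relies on a classical fact: some minimum-width strip has one of its boundary lines flush with a convex hull edge. It loops over hull edges and takes the farthest hull vertex from each edge's line, which is a brute-force O(h²) step after the hull rather than the paper's method. The function names are ours.

```python
import math


def convex_hull(points):
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def width_2d(points):
    """Minimum distance between two parallel supporting lines of a planar point set."""
    hull = convex_hull(points)
    if len(hull) < 3:
        return 0.0  # all points collinear
    best = math.inf
    for i in range(len(hull)):
        (x1, y1), (x2, y2) = hull[i], hull[(i + 1) % len(hull)]
        length = math.hypot(x2 - x1, y2 - y1)
        # farthest hull vertex from the line through this edge
        far = max(abs((x2 - x1) * (y - y1) - (y2 - y1) * (x - x1)) for x, y in hull) / length
        best = min(best, far)
    return best


print(width_2d([(0, 0), (4, 0), (0, 3)]))  # 2.4
```

Replacing the inner `max` with rotating calipers would make the post-hull step linear.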

14.

An algorithm computing the general entry of the nth Kronecker power of a matrix

Choi C.H. 《Automatic Control, IEEE Transactions on》1993,38(5):828-830

The number of distinct entries among the m²ⁿ entries of the nth Kronecker power of an m×m matrix is derived. An algorithm to find the value of each entry of the Kronecker power is presented.
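
The index arithmetic behind such an algorithm can be sketched as follows, using 0-based indices. Because (A⊗B)[i·p+k][j·q+l] = A[i][j]·B[k][l] for a p×q matrix B, an entry of the nth Kronecker power of an m×m matrix A factors over the base-m digits of its row and column indices. This is our illustration, not necessarily the paper's algorithm, and the helper names are hypothetical.

```python
def kron(A, B):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def kron_power_entry(A, n, r, c):
    """Entry (r, c) of the n-th Kronecker power of the m x m matrix A.

    Pairs the n base-m digits of r and c and multiplies the matching entries
    of A, so no m^n x m^n matrix is ever formed.
    """
    m = len(A)
    value = 1
    for _ in range(n):
        r, r_digit = divmod(r, m)
        c, c_digit = divmod(c, m)
        value *= A[r_digit][c_digit]
    return value


A = [[1, 2], [3, 4]]
P = kron(kron(A, A), A)  # explicit third Kronecker power, 8 x 8
print(all(P[r][c] == kron_power_entry(A, 3, r, c) for r in range(8) for c in range(8)))  # True
```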

15.

Designing efficient parallel algorithms on mesh-connected computers with multiple broadcasting

Chen Y.-C. Chen W.-T. Chen G.-H. Sheu J.-P. 《Parallel and Distributed Systems, IEEE Transactions on》1990,1(2):241-246

Semigroup and prefix computations on two-dimensional mesh-connected computers with multiple broadcasting (2-MCCMBs) are studied. Previously, only square 2-MCCMBs with N processing elements were considered for semigroup computations of N data items, and O(N^(1/6)) time was required. It is found that square machines are not the best form for semigroup computations, and an O(N^(1/8))-time algorithm is derived on an N^(5/8)×N^(3/8) rectangular 2-MCCMB. This time complexity can be further reduced to O(N^(1/9)) if fewer processing elements are used. Parallel algorithms for prefix computations with the same time complexities are derived.

16.

A smoothly parameterized family of stabilizable, observable linear systems containing realizations of all transfer functions of McMillan degree not exceeding n

Pait F. Morse A.S. 《Automatic Control, IEEE Transactions on》1991,36(12):1475-1477

It is shown that there is a continuously parameterized family F of n-dimensional single-input single-output (SISO) stabilizable, detectable linear systems Σ(p) that contains at least one realization of each reduced, strictly proper transfer function of McMillan degree not exceeding n. The parameterization map p→Σ(p) is a polynomial function in 2n indeterminates from an open convex polyhedron in R²ⁿ to the linear space of all SISO n-dimensional linear systems.

17.

An efficient digital search algorithm by using a double-array structure

Aoe J.-I. 《Software Engineering, IEEE Transactions on》1989,15(9):1066-1077

An efficient digital search algorithm is presented that is based on an internal array structure called a double array, which combines the fast access of a matrix form with the compactness of a list form. Each arc of a digital search tree, called a DS-tree, can be computed from the double array in O(1) time; that is, the worst-case time complexity for retrieving a key of length k is O(k). The double array is modified to make its size compact while maintaining fast access, and algorithms for retrieval, insertion, and deletion are presented. If the size of the double array is n+cm, where n is the number of nodes of the DS-tree, m is the number of input symbols, and c is a constant particular to each double array, then it is proved theoretically that the worst-case times of deletion and insertion are proportional to cm and cm², respectively, and are independent of n. Experimental results of building the double array incrementally for various sets of keys show that c has an extremely small value, ranging from 0.17 to 1.13.
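
The O(1) arc computation can be made concrete with a static double-array trie built from a fixed set of keys. From state s on symbol c, the next state is t = base[s] + code(c), and the arc exists only if check[t] == s. This is a minimal sketch under several assumptions. Each node's base is found by a simple first-fit search, '$' is used as an end-of-key marker, and slot 0 is the root; these choices and all names are ours. The paper's insertion, deletion and compaction algorithms are not reproduced.

```python
from collections import deque


def build_double_array(keys):
    """Build (base, check, code) for a static key set. Assumes keys never contain '$'."""
    trie = {}  # plain dict trie first; '$' marks the end of a key
    for key in keys:
        node = trie
        for ch in key + "$":
            node = node.setdefault(ch, {})
    alphabet = sorted({ch for key in keys for ch in key} | {"$"})
    code = {ch: i + 1 for i, ch in enumerate(alphabet)}

    base, check = [0], [-1]  # slot 0 is the root; None marks a free slot
    queue = deque([(0, trie)])
    while queue:
        state, node = queue.popleft()
        if not node:
            continue  # end-of-key leaf: no outgoing arcs
        codes = [code[ch] for ch in node]
        b = 1
        while True:  # first-fit: smallest base whose target slots are all free
            while len(check) <= b + max(codes):
                base.append(0)
                check.append(None)
            if all(check[b + c] is None for c in codes):
                break
            b += 1
        base[state] = b
        for ch, child in node.items():
            check[b + code[ch]] = state
            queue.append((b + code[ch], child))
    return base, check, code


def contains(double_array, key):
    """O(len(key)) lookup: one base/check probe per symbol."""
    base, check, code = double_array
    state = 0
    for ch in key + "$":
        if ch not in code:
            return False
        t = base[state] + code[ch]
        if t >= len(check) or check[t] != state:
            return False
        state = t
    return True


da = build_double_array(["a", "ab", "abc", "b", "bca"])
print(contains(da, "abc"), contains(da, "bc"))  # True False
```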

18.

Latin squares for parallel array access

Kim K. Prasanna V.K. 《Parallel and Distributed Systems, IEEE Transactions on》1993,4(4):361-370

A parallel memory system for efficient parallel array access using perfect Latin squares as skewing functions is discussed. Simple methods for constructing perfect Latin squares are presented. The resulting skewing scheme provides conflict-free access to several important subsets of an array, and address generation can be performed in constant time with simple circuitry. The skewing scheme can provide constant-time access to rows, columns, diagonals, and N^(1/2)×N^(1/2) subarrays of an N×N array with maximum memory utilization. Self-routing Benes networks can be used to realize the permutations needed between the processing elements and the memory modules. Two skewing schemes that provide conflict-free access to three-dimensional arrays are also discussed. Combined with self-routing Benes networks, these schemes provide efficient access to frequently used subsets of three-dimensional arrays.

19.

Fast image labeling using local operators on mesh-connected computers

Alnuweiri H.M. Kumar V.K.P. 《IEEE transactions on pattern analysis and machine intelligence》1991,13(2):202-207

A new parallel algorithm is proposed for fast image labeling using local operators on image pixels. The algorithm can be implemented on an n×n mesh-connected computer such that, for any integer k in the range [1, log(2n)], it requires Θ(kn^(1/k)) bits of local memory per processor and takes Θ(kn) time. Bit-serial processors and communication links can be used without affecting the asymptotic time complexity of the algorithm. The time complexity has very small leading constant factors, which makes the algorithm superior to previous mesh-computer labeling algorithms for most practical image sizes (e.g., up to 4096×4096 images). Furthermore, the algorithm uses stacks that can be realized with very fast shift registers within each processing element.

20.

A theory for multiresolution signal decomposition: the wavelet representation

Mallat S.G. 《IEEE transactions on pattern analysis and machine intelligence》1989,11(7):674-693

Multiresolution representations are effective for analyzing the information content of images. The properties of the operator that approximates a signal at a given resolution are studied. It is shown that the difference of information between the approximations of a signal at the resolutions 2^(j+1) and 2^j (where j is an integer) can be extracted by decomposing this signal on a wavelet orthonormal basis of L²(Rⁿ), the vector space of measurable, square-integrable n-dimensional functions. In L²(R), a wavelet orthonormal basis is a family of functions built by dilating and translating a unique function ψ(x). This decomposition defines an orthogonal multiresolution representation called a wavelet representation. It is computed with a pyramidal algorithm based on convolutions with quadrature mirror filters. The wavelet representation lies between the spatial and Fourier domains. For images, the wavelet representation differentiates several spatial orientations. The application of this representation to data compression in image coding, texture discrimination, and fractal analysis is discussed.
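
The pyramidal algorithm can be sketched in its simplest instance using the Haar wavelet, the most elementary wavelet orthonormal basis. Its two-tap filter pair is h = (1, 1)/√2 (coarse approximation) and g = (1, −1)/√2 (detail). Each level splits the current approximation into a half-length coarser approximation and a detail signal, and the inverse rebuilds the original exactly. Mallat's paper uses longer filters; the Haar choice and the function names are our assumptions for illustration.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform (input length must be even)."""
    s = 1 / math.sqrt(2)
    approx = [s * (signal[i] + signal[i + 1]) for i in range(0, len(signal), 2)]
    detail = [s * (signal[i] - signal[i + 1]) for i in range(0, len(signal), 2)]
    return approx, detail


def haar_pyramid(signal, levels):
    """Repeatedly split the coarse approximation (length must be divisible by 2**levels)."""
    approx, details = list(signal), []
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)  # finest detail first
    return approx, details


def haar_inverse(approx, details):
    """Rebuild the signal from the coarsest approximation and all detail signals."""
    s = 1 / math.sqrt(2)
    for d in reversed(details):
        approx = [v for a, b in zip(approx, d) for v in (s * (a + b), s * (a - b))]
    return approx


x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
approx, details = haar_pyramid(x, 3)
rebuilt = haar_inverse(approx, details)
print(max(abs(a - b) for a, b in zip(rebuilt, x)))  # reconstruction error, near 0
```

Because the basis is orthonormal, the total energy of the approximation and detail coefficients equals the energy of the input signal.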