Similar Documents
20 similar documents retrieved (search time: 984 ms)
1.
A parallel sorting algorithm for sorting n elements evenly distributed over the p = 2^d nodes of a d-dimensional hypercube is presented. The average running time of the algorithm is O((n log n)/p + p log² n). The algorithm maintains a perfect load balance across the nodes by determining the (kn/p)-th elements (k = 1, …, p−1) of the final sorted list in advance. These p−1 keys are used to partition the sorted sublist in each node so that the data can be redistributed to the nodes and merged in parallel. The nodes finish the sort with an equal number of elements (n/p) regardless of the data distribution. A parallel selection algorithm for determining the balanced partition keys in O(p log² n) time is presented. The speed of the sorting algorithm is further enhanced by the distance-d communication capability of the iPSC/2 hypercube computer and a novel conflict-free routing algorithm. Experimental results on a 16-node hypercube computer show that the sorting algorithm is competitive with previous algorithms and faster for skewed data distributions.
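As a sketch of the redistribution step described above, the fragment below splits one node's locally sorted sublist at the p−1 pre-computed partition keys so that piece k can be sent to node k for parallel merging; the function name and example values are illustrative only, not part of the original algorithm's interface.

    from bisect import bisect_right

    def partition_by_splitters(sorted_sublist, splitters):
        """Split a locally sorted sublist at the p-1 global partition keys
        (the pre-selected (kn/p)-th elements), so that piece k can be
        shipped to node k and merged there. Illustrative sketch of the
        redistribution step only, not the full hypercube algorithm."""
        pieces, lo = [], 0
        for key in splitters:
            hi = bisect_right(sorted_sublist, key)
            pieces.append(sorted_sublist[lo:hi])
            lo = hi
        pieces.append(sorted_sublist[lo:])
        return pieces

    # Example: 3 splitters route the pieces to 4 destination nodes.
    print(partition_by_splitters([1, 3, 4, 7, 9, 12], [3, 6, 10]))
    # [[1, 3], [4], [7, 9], [12]]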

2.
Parallel algorithms for several important combinatorial problems, such as the all nearest smaller values problem, triangulating a monotone polygon, and line packing, are presented. These algorithms achieve linear speedups on the pipelined hypercube, and provably optimal speedups on the shuffle-exchange and the cube-connected-cycles, for any number p of processors satisfying 1 ⩽ p ⩽ n/((log³ n)(log log n)²), where n is the input size. The lower-bound results are established under no restriction on how the input is mapped into the local memories of the different processors.
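For reference, the all nearest smaller values problem has a well-known linear-time sequential solution based on a stack; the sketch below shows that sequential counterpart (it is not the parallel algorithm of the paper) as a baseline for the speedups quoted above.

    def all_nearest_smaller_values(a):
        """For each a[i], report the nearest element to its left that is
        strictly smaller (None if there is none). Classic O(n) stack-based
        sequential solution, shown only as a reference point."""
        result, stack = [], []
        for x in a:
            while stack and stack[-1] >= x:
                stack.pop()
            result.append(stack[-1] if stack else None)
            stack.append(x)
        return result

    print(all_nearest_smaller_values([3, 1, 4, 1, 5, 9, 2, 6]))
    # [None, None, 1, None, 1, 5, 1, 2]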

3.
An O(n²) time serial algorithm is developed for obtaining the medial axis transform (MAT) of an n×n image. An O(log n) time CREW PRAM algorithm and an O(log² n) time SIMD hypercube parallel algorithm for the MAT are also developed. Both of these use O(n²) processors. Two problems associated with the MAT, area reporting and perimeter reporting, are studied. An O(log n) time hypercube algorithm is developed for both of them, where n is the number of squares in the MAT, and the algorithms use O(n²) processors.

4.
It is shown that there is a continuously parameterized family F of n-dimensional single-input single-output (SISO) stabilizable and detectable linear systems Σ(p) which contains at least one realization of each reduced, strictly proper transfer function of McMillan degree not exceeding n. The parameterization map p → Σ(p) is a polynomial function in 2n indeterminates from an open convex polyhedron in R^(2n) to the linear space of all SISO n-dimensional linear systems.

5.
It is shown that, for a given p (1 < p ⩽ n), the n-cube network can tolerate up to p·2^(n−p) − 1 processor failures and remain connected, provided that at most p neighbors of any nonfaulty processor are allowed to fail. This generalizes the result for p = n−1 obtained by A.-H. Esfahanian (1989). It is also shown that the n-cube network with n ⩾ 5 remains connected provided that at most two neighbors of any processor are allowed to fail.
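A quick sanity check on the fault bound as reconstructed here (p·2^(n−p) − 1): the special case p = n − 1 should reduce to the 2n − 3 figure of Esfahanian's earlier result. The few lines below verify this numerically; the function name is illustrative and the exact reading of the formula is an assumption.

    def conditional_fault_tolerance(n, p):
        """Reconstructed bound on tolerable faults for an n-cube when at
        most p neighbors of any nonfaulty node may fail (assumed reading
        of the formula quoted above)."""
        return p * 2 ** (n - p) - 1

    # Special case p = n - 1 should recover the earlier 2n - 3 result.
    for n in range(5, 10):
        assert conditional_fault_tolerance(n, n - 1) == 2 * n - 3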

6.
An algorithm for convolving a k×k window of weighting coefficients with an n×n image matrix on a pyramid computer with O(n²) processors in time O(log n + k²), excluding the time to load the image matrix, is presented. If k = Ω(√(log n)), which is typical in practice, the algorithm has a processor-time product of O(n²k²), which is optimal with respect to the usual sequential algorithm. A feature of the algorithm is that the mechanism for controlling the transmission and distribution of data in each processor is finite state, independent of the values of n and k. Thus, for convolving two {0, 1}-valued matrices using Boolean operations rather than the usual sum and product operations, the processors of the pyramid computer are finite-state.
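The "usual sequential algorithm" referred to above is the direct O(n²k²) window sweep; a minimal sketch of that baseline follows (zero padding at the borders is an arbitrary choice made here for illustration).

    def convolve2d_naive(image, window):
        """Direct O(n^2 k^2) sequential convolution of an n x n image with
        a k x k window of weights; the reference point for the pyramid
        algorithm's processor-time product. Borders are zero-padded here."""
        n, k = len(image), len(window)
        out = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                acc = 0.0
                for a in range(k):
                    for b in range(k):
                        ii, jj = i + a - k // 2, j + b - k // 2
                        if 0 <= ii < n and 0 <= jj < n:
                            acc += image[ii][jj] * window[a][b]
                out[i][j] = acc
        return out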

7.
In an n-dimensional hypercube Qn with a fault set F, |F| < 2n − 2, and assuming the source S and destination D are not isolated, it is shown that there exists a path between S and D of length at most their Hamming distance plus 4. An algorithm with complexity O(|F| log n) is given to find such a path. A bound of n + 2 on the diameter of the faulty hypercube Qn − F when |F| < 2n − 2 is obtained; this improves the previously known bound of n + 6 obtained by A.-H. Esfahanian (1989). Worst-case scenarios are constructed to show that these bounds on shortest paths and diameter are tight. It is also shown that when |F| < 2n − 2, the diameter bound is reduced to n + 1 if every node has at least 2 nonfaulty neighbors, and to n if every node has at least 3 nonfaulty neighbors.
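The Hamming distance against which the path length is measured is simply the number of differing address bits between the two node labels; the sketch below computes it and the corresponding dimension-by-dimension route for the fault-free case (illustrative only; routing around faults is the subject of the paper's algorithm).

    def hamming_distance(s, d):
        """Number of bit positions in which node labels s and d differ."""
        return bin(s ^ d).count("1")

    def greedy_route(s, d):
        """Fault-free greedy route: correct the differing dimensions one at
        a time. With faults, the cited result guarantees a path at most 4
        longer than this Hamming distance when |F| < 2n - 2."""
        path, cur, dim = [s], s, 0
        diff = s ^ d
        while diff:
            if diff & 1:
                cur ^= 1 << dim
                path.append(cur)
            diff >>= 1
            dim += 1
        return path

    print(hamming_distance(0b0101, 0b0011))   # 2
    print(greedy_route(0b0101, 0b0011))       # [5, 7, 3]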

8.
A linear-time algorithm is developed to perform all odd (even) length circular shifts of data in an SIMD (single-instruction-stream, multiple-data-stream) hypercube. As an application, the algorithm is used to obtain an O(M² + log N) time, O(1) memory per processor algorithm to compute the two-dimensional convolution of an N×N image and an M×M template on an N²-processor SIMD hypercube. This improves the previous best complexity of O(M² log M + log N).

9.
Let φ(s,a) = φ_0(s) + a_1·φ_1(s) + a_2·φ_2(s) + … + a_k·φ_k(s) = φ_0(s) − q(s,a) be a family of real polynomials in s, with coefficients that depend linearly on parameters a_i which are confined to a k-dimensional hypercube Ω_a. Let φ_0(s) be stable of degree n and let the polynomials φ_i(s) (i ⩾ 1) be of degree less than n. A Nyquist argument shows that the family φ(s) is stable if and only if the complex number φ_0(jω) lies outside the set of complex points −q(jω, Ω_a) for every real ω. In a previous paper (Automat. Contr. Conf., Atlanta, GA, 1988) the authors showed that −q(jω, Ω_a), the so-called '−q locus', is a convex parpolygon with 2k sides. The regularity of this figure simplifies the stability test. In the present paper they again exploit this shape and show that to test for stability only a finite number of frequency checks need to be done; this number is polynomial in k, O(k³), and these critical frequencies correspond to the real nonnegative roots of certain polynomials.

10.
Computing the width of a set
For a set of points P in three-dimensional space, the width of P, W(P), is defined as the minimum distance between parallel planes of support of P. It is shown that W(P) can be computed in O(n log n + I) time and O(n) space, where I is the number of antipodal pairs of edges of the convex hull of P and n is the number of vertices; in the worst case, I = O(n²). For a convex polyhedron the time complexity becomes O(n + I). If P is a set of points in the plane, the complexity can be reduced to O(n log n). For simple polygons, linear time suffices.
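In the planar case the width is the minimum distance between parallel supporting lines, and for a convex polygon it is attained with one supporting line flush with some edge; the sketch below brute-forces that characterization in O(n²) purely as an illustration (the paper's algorithms are more efficient and handle the 3-D case).

    import math

    def width_2d(hull):
        """Width of a convex polygon given as a list of (x, y) vertices in
        counterclockwise order: the minimum over edges of the farthest
        vertex's distance to that edge's supporting line. Brute force."""
        n, best = len(hull), float("inf")
        for i in range(n):
            (x1, y1), (x2, y2) = hull[i], hull[(i + 1) % n]
            ex, ey = x2 - x1, y2 - y1
            norm = math.hypot(ex, ey)
            far = max(abs(ex * (y - y1) - ey * (x - x1)) / norm
                      for x, y in hull)
            best = min(best, far)
        return best

    print(width_2d([(0, 0), (4, 0), (4, 2), (0, 2)]))  # 2.0 for a 4 x 2 box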

11.
An adaptive parallel algorithm for inducing a priority queue structure on an n-element array is presented. The algorithm is extended to provide optimal parallel construction algorithms for three other heap-like structures useful in implementing double-ended priority queues, namely min-max heaps, deaps, and min-max-pair heaps. It is shown that an n-element array can be made into a heap, a deap, a min-max heap, or a min-max-pair heap in O(log n + n/p) time using no more than n/log n processors, in the exclusive-read exclusive-write parallel random-access machine (EREW PRAM) model.
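The sequential baseline being parallelized is Floyd's bottom-up heap construction, which builds a heap on an n-element array in O(n) time; a minimal sketch is given below as a reference point (it is not the parallel EREW PRAM algorithm itself).

    def build_min_heap(a):
        """Floyd's bottom-up heap construction: sift each internal node
        down, starting from the last one. Runs in O(n) sequentially."""
        n = len(a)
        for i in range(n // 2 - 1, -1, -1):
            j = i
            while True:
                small, left, right = j, 2 * j + 1, 2 * j + 2
                if left < n and a[left] < a[small]:
                    small = left
                if right < n and a[right] < a[small]:
                    small = right
                if small == j:
                    break
                a[j], a[small] = a[small], a[j]
                j = small
        return a

    print(build_min_heap([9, 4, 7, 1, 3, 8]))  # [1, 3, 7, 4, 9, 8]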

12.
The job scheduling problem in a partitionable mesh-connected system, in which jobs require square meshes and the system is a square mesh whose size is a power of two, is discussed. A heuristic algorithm of time complexity O(n(log n + log p)) is presented, where n is the number of jobs to be scheduled and p is the size of the system. The algorithm adopts the largest-job-first scheduling policy and uses a two-dimensional buddy system as the system partitioning scheme. It is shown that, in the worst case, the algorithm produces a schedule four times longer than an optimal schedule, and that, on average, schedules generated by the algorithm are twice as long as optimal schedules.

13.
A distributed knot detection algorithm for general graphs is presented. The algorithm uses at most O(n log n + m) messages and O(m + n log n) bits of memory to detect all knot nodes in the network (where n is the number of nodes and m is the number of links), compared with the O(n²) messages needed by the best previously published algorithm. The knot detection algorithm makes use of efficient cycle detection and clustering techniques. Various applications of knot detection are presented; in particular, its importance for deadlock detection in store-and-forward communication networks and in transaction systems is demonstrated.

14.
Algorithms are proposed for eigenvalue assignment (EVA) by constant as well as dynamic output feedback. The main algorithm is developed for single-input, multi-output systems, and the results are then extended to multi-input, multi-output systems. In computing the feedback, use is made of the fact that the closed-loop eigenvalues can almost always be assigned arbitrarily close to the desired locations in the complex plane, provided the system satisfies the condition m + p > n, where m, p, and n are, respectively, the numbers of inputs, outputs, and states of the system. The EVA problem is treated as a converse of the algebraic eigenvalue problem, and the proposed algorithms are based on the implicitly shifted QR algorithm for solving that problem. The performance of the algorithms is illustrated by several numerical examples.

15.
A unified analytical model for computing the task-based dependability (TBD) of hypercube architectures is presented. A hypercube is deemed operational as long as a task can be executed on the system. The technique can compute both reliability and availability for two types of task requirement: the I-connected model and the subcube model. The I-connected TBD assumes that a connected group of at least I working nodes is required for task execution; the subcube TBD requires at least an m-cube in an n-cube, m ⩽ n. The dependability is computed by multiplying the probability that x nodes (x ⩾ I or x ⩾ 2^m) are working in an n-cube at time t by the conditional probability that the hypercube can satisfy the given task requirement with x working nodes. Recursive models are proposed for the two types of task requirement to find this connection probability. The subcube requirement is extended to finding multiple subcubes, for analyzing multitask dependability. The analytical results are validated through extensive simulation.
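The overall structure of that computation can be sketched as a weighted sum over the number of working nodes; in the fragment below, exponential node lifetimes are assumed only for illustration, and cond_prob is a placeholder for the paper's recursive connection-probability models.

    import math

    def task_dependability(n_dim, t, lam, cond_prob):
        """Skeleton of the TBD computation: sum over x of P(exactly x of
        the N = 2^n_dim nodes are up at time t) times cond_prob(x), the
        conditional probability that the task requirement can be met with
        x working nodes. Node lifetimes are taken exponential with rate
        lam purely for illustration."""
        N = 2 ** n_dim
        r = math.exp(-lam * t)  # probability that a single node is still up
        return sum(math.comb(N, x) * r ** x * (1 - r) ** (N - x) * cond_prob(x)
                   for x in range(N + 1))

    # Example: a 3-cube where a task needs at least I = 4 connected nodes;
    # a crude placeholder treats any 4 or more working nodes as sufficient.
    print(task_dependability(3, 1.0, 0.05, lambda x: 1.0 if x >= 4 else 0.0))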

16.
A novel discrete relaxation architecture
The discrete relaxation algorithm (DRA) is a computational technique that enforces arc consistency (AC) in a constraint satisfaction problem (CSP). The original sequential AC-1 algorithm suffers from O(n³m³) time complexity, and even the optimal sequential AC-4 algorithm is O(n²m²) for an n-object, m-label DRA problem. Sample problem runs show that these algorithms are all too slow for useful, real-time CSP applications. A parallel DRA5 algorithm that reaches a lower bound of O(nm) (where the number of processors is polynomial in the problem size) is given. A fine-grained, massively parallel hardware computer architecture has been designed for the DRA5 algorithm. For practical problems, many orders of magnitude of efficiency improvement can be reached on such a hardware architecture.
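For orientation, sequential arc-consistency filtering of the kind these algorithms accelerate can be sketched as a revision loop over arcs (a standard AC-3-style formulation, shown purely as a reference point; it is not the paper's DRA5 algorithm).

    from collections import deque

    def arc_consistency(domains, constraints):
        """Standard AC-3-style filtering.
        domains: dict var -> set of labels.
        constraints: dict (x, y) -> predicate(vx, vy), one per directed arc."""
        queue = deque(constraints)
        while queue:
            x, y = queue.popleft()
            ok = constraints[(x, y)]
            removed = {vx for vx in domains[x]
                       if not any(ok(vx, vy) for vy in domains[y])}
            if removed:
                domains[x] -= removed
                # Re-examine arcs into x, since its domain has shrunk.
                queue.extend(arc for arc in constraints if arc[1] == x)
        return domains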

17.
An application-specific architecture for the parallel calculation of the decimation-in-time, radix-2 fast Hartley (FHT) and Fourier (FFT) transforms is presented. A real sequence with N = 2^n data items is considered as input. The system calculates the FHT and the FFT in n and n+1 stages, respectively. The modular and regular parallel architecture is based on a constant-geometry algorithm using butterflies of four data items and the perfect unshuffle permutation. With this permutation, the mapping of the algorithm into VLSI technology is simplified and the communications among processors are minimized. Organization of the processor memory around first-in, first-out (FIFO) queues facilitates a systolic data flow and permits the complex data movements and address sequences of the transforms to be implemented in a direct way, by means of simple multiplexing operations under hardwired control. The total calculation time is (N log₂ N)/(4Q) cycles for the FHT and N(1 + log₂ N)/(4Q) cycles for the FFT, where Q is the number of processors (Q = 2^q, Q ⩽ N/4).
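As a quick numeric illustration of the cycle-count formulas (the values of N and Q below are arbitrary example choices):

    import math

    def fht_cycles(N, Q):
        """(N log2 N) / (4Q) cycles for the FHT, per the formula above."""
        return N * int(math.log2(N)) // (4 * Q)

    def fft_cycles(N, Q):
        """N (1 + log2 N) / (4Q) cycles for the FFT."""
        return N * (1 + int(math.log2(N))) // (4 * Q)

    print(fht_cycles(1024, 16))  # 1024 * 10 / 64 = 160 cycles
    print(fft_cycles(1024, 16))  # 1024 * 11 / 64 = 176 cycles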

18.
The transitive closure problem is solved in O(1) time by a new method that is far different from the conventional solution method. On processor arrays with reconfigurable bus systems, two O(1) time algorithms are proposed for computing the transitive closure of an undirected graph. One is designed for a three-dimensional n×n×n processor array with a reconfigurable bus system, and the other for a two-dimensional n²×n² processor array with a reconfigurable bus system, where n is the number of vertices in the graph. Using the O(1) time transitive closure algorithms, many other graph problems are solved in O(1) time, including recognizing bipartite graphs and finding connected components, articulation points, biconnected components, bridges, and minimum spanning trees in undirected graphs.
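The conventional sequential method these results are contrasted with is the O(n³) Warshall-style closure; a minimal sketch for a graph given as a Boolean adjacency matrix (illustrative baseline only):

    def transitive_closure(adj):
        """Warshall-style closure over a Boolean n x n adjacency matrix:
        reach[i][j] becomes True iff j is reachable from i. O(n^3) time
        sequentially, versus O(1) on the reconfigurable-bus arrays above."""
        n = len(adj)
        reach = [row[:] for row in adj]
        for k in range(n):
            for i in range(n):
                if reach[i][k]:
                    for j in range(n):
                        if reach[k][j]:
                            reach[i][j] = True
        return reach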

19.
Hopfield associative memories with αn malfunctioning neurons are considered. Using some facts from the theory of exchangeable events, the asymptotic storage capacity of such a network is derived as a function of the parameter α under stability and attractivity requirements. It is shown that the asymptotic storage capacity is (1−α)²n/(4 log n) under the stability requirement and (1−α)²(1−2ρ)²n/(4 log n) under the attractivity requirement. Comparing these capacities with their maximum values, which correspond to the case of no malfunctioning neurons (α = 0), shows the robustness of the retrieval mechanism of Hopfield associative memories with respect to malfunctioning neurons. This result also supports the claim that neural networks are fault tolerant.
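A small numeric illustration of how the capacity expressions degrade with α (the use of the natural logarithm and the example values of n and α are assumptions made here for illustration only):

    import math

    def capacity_stability(n, alpha):
        """(1 - alpha)^2 n / (4 log n), the stability-requirement capacity."""
        return (1 - alpha) ** 2 * n / (4 * math.log(n))

    def capacity_attractivity(n, alpha, rho):
        """(1 - alpha)^2 (1 - 2 rho)^2 n / (4 log n), attractivity case."""
        return (1 - alpha) ** 2 * (1 - 2 * rho) ** 2 * n / (4 * math.log(n))

    for alpha in (0.0, 0.1, 0.2):
        print(alpha, round(capacity_stability(10_000, alpha), 1))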

20.
Exact analytical expressions are obtained for the likelihood and the likelihood gradient of stationary autoregressive moving-average (ARMA) models. Denote the sample size by N, the autoregressive order by p, and the moving-average order by q. The calculation of the likelihood requires (p + 2q + 1)N + o(N) multiply-add operations, and the calculation of the likelihood gradient requires (2p + 6q + 2)N + o(N) multiply-add operations. These expressions may be used to obtain an iterative, Newton-Raphson-type convergent algorithm, with a superlinear convergence rate, that computes the maximum-likelihood estimator in (2p + 6q + 2)N + o(N) multiply-add operations per iteration.
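To make the operation counts concrete, the leading-order terms can be evaluated directly (the ARMA orders and sample size below are arbitrary example values):

    def likelihood_ops(N, p, q):
        """Leading-order multiply-add count, (p + 2q + 1) N, for one
        likelihood evaluation; the o(N) terms are dropped."""
        return (p + 2 * q + 1) * N

    def gradient_ops(N, p, q):
        """Leading-order count, (2p + 6q + 2) N, for one gradient evaluation
        (also the per-iteration cost of the Newton-Raphson-type scheme)."""
        return (2 * p + 6 * q + 2) * N

    # Example: an ARMA(2, 1) model fitted to N = 10000 samples.
    print(likelihood_ops(10_000, 2, 1))  # 50000 multiply-adds
    print(gradient_ops(10_000, 2, 1))    # 120000 multiply-adds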
