首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 0 毫秒
1.
Processors arrays with reconfigurable bus systems (abbreviated to PARBS) have been received a lot of attention in the last decade, and many undirected graph algorithms with constant time complexity have been proposed on PARBS. However, for a directed graph, it will be proved that connecting PARBS in the way proposed for undirected graphs generates paths which do not exist in the directed graph. This result may lead to incorrect solution for directed graph problems. Therefore, in this paper, a model named D-PARBS (Directional PARBS) is proposed for eliminating the non-existent paths. This model can be used to correctly identifying redundant arcs on directed graphs in constant time. Furthermore, by modifying the D-PARBS architecture, constant time algorithms with O(n 3) processors are developed to solve topological sort, transitive closure, cyclic graph checking, and strongly connected component problems on directed graphs. Chi-Jung Kuo: He is currently an operation officer of the Information Processing Department in the Taipei Branch of Bangkok Bank. His research interests include parallel processing and graph theory. He received his B.S. and M.S. degree in Information Management from National Taiwan University of Science and Technology in 1993 and 1995, respectively. He was a teacher in the Chin Min Junior College from 1995 to 1996. Chiun-Chieh Hsu, Ph.D.: He is currently a full professor of Information Management Department at National Taiwan University of Science and Technology. His current research interests include fault-tolerant computing, parallel and distributed processing, networks, and graph theory. He received his B.E., M.E., and Ph.D. degrees all in Electrical Engineering from National Taiwan University in 1983, 1987, and 1990, respectively. He worked as a software and firmware design engineer of Research and Development in Acer Computer Company from 1983 to 1985. Wei-Chen Fang: He received the BS degree in MIS from Tamkang University and the MS degree in Information Management from National Taiwan University of Science and Technology. Currently, he is a Ph.d student and works as an assistant researcher with the Advanced Technology research group for Chunghwa Telecommunication Laboratory. His research interests include multi-processor architecture, fault-tolerant computing, and parallel and distributed processing.  相似文献   

2.
3.
The paper contains results of computational experiences with the following algorithms for finding the transitive closure of a digraph: (i) Warshall's algorithm [17], (ii) Purdom's algorithm [13], (iii) the modification of Yen's algorithm [14], and (iv) the new algorithms for finding the transitive closure [3, 4]. The tested digraphs were generated at random. The enclosed references contain all papers known to the authors concerning transitive closure algorithms.  相似文献   

4.
In this paper we analyze the algorithms expressed as a system of recurrence equations. The algorithms are called 2?1 output algorithms if two output values of one function (variable identification) are specified by the system of recurrence equations for each index point in the algorithm. The algorithm is in free form if the indexes of these two values are not dependent. Two standard classes are determined by this criteria: the nearest neighbour and the all pair form. For example the sorting algorithm can be expressed in the all pair form i.e., the linear insertion algorithm or in the nearest neighbour form i.e., the bubble sort algorithm. However these algorithms are different in their nature. A procedure to eliminate the computational broadcast for the all pair 2?1output algorithm has been proposed by the authors in [1]. The result obtained by implementing this procedure was a localized form of the algorithm and a system of uniform recurrence equations by eliminating the computational and data broadcast. So he data dependence method can be efficiently used for parallel implementations. The proposed procedure cannot be implemented directly on the nearest neighbour form algorithms. Here we show how the algorithm can be restructured into a form where the computational and data broadcast can be eliminated. These transformations result in localized algorithms. A few examples show how these algorithms can be implemented on processor arrays. For example, the Gentleman Kung triangular array [2] can be used for solving the QR decomposition algorithm for both forms of the algorithm. The implementations differ in the order of the data flow and the processor operation. We show that the implementation of the nearest neighbour algorithm is even better than the standard one.  相似文献   

5.
We develop constant-time algorithms to compute the Hough transform on a processor array with a reconfigurable bus system (abbreviated to PARBS). The PARBS is a comptuation model which consists of a processor array and a reconfigurable bus system. It is a very powerful computation model in that many problems can be solved efficiently. In this paper, we introduce the concept of iterative-PARBS which is similar to the FOR-loop construct in sequential programming languages. The iterative-PARBS is a building block in which the processing data can be routed through it several times. We can think it as a “hardware subroutine”. Based on this scheme, we are able to explore constant-time Hough transform algorithms on PARBS. The following new results are derived in this study:
  1. The sum ofn bits can be computed in O(1) times on a PARBS with O(n 1+?) processors for any fixed ?>0.
  2. The weights of each simple path of ann*n image can be computed in O(1) time on a 3-D PARBS with O(n 2+?) processors for any fixed ?>0.
  3. Thep angle Hough transform of ann*n image can be computed in O(1) time on a PARBS with O(p*n 2+?) processors for any fixed ?>0 withp copies of the image pretiled.
  4. Thep angle Hough transform of ann*n image can be computed in O(1) time on a PARBS with O(p*n 3) processors.
  相似文献   

6.
Practical parallel algorithms, based on classical sequential Union-Find algorithms for computing transitive closures of binary relations, are described and implemented for both shared memory and distributed memory parallel computers. By practical algorithms, we mean algorithms that are efficient for parallel systems with bounded numbers of processors as opposed to algorithms where the number of processors grows with the problem size. Transitive closures are useful for decomposing many applications problems into independent subproblems. The implementations were on an ENCORE Multimax shared memory machine and an NCUBE hypercube. Our implementations indicate that transitive closure computations are intrinsically difficult for distributed memory parallel machines because of the need for global information. By contrast, our results for shared memory machines exhibited excellent speedups.Supported in part by NSF Grant DCR-8619103, ONR contract N000-86-G-0202 and DOE Grant DE-FG02-85ER25001.Supported in part by RADC contract F30602-85-C-0303.Supported in part by RADC contract F30602-85-C-0303.  相似文献   

7.
Computer vision algorithms are natural candidates for high performance computing systems. Algorithms in computer vision are characterized by complex and repetitive operations on large amounts of data involving a variety of data interactions (e.g., point operations, neighborhood operations, global operations). In this paper, we describe the use of the custom computing approach to meet the computation and communication needs of computer vision algorithms. By customizing hardware architecture at the instruction level for every application, the optimal grain size needed for the problem at hand and the instruction granularity can be matched. A custom computing approach can also reuse the same hardware by reconfiguring at the software level for different levels of the computer vision application. We demonstrate the advantages of our approach using Splash 2-a Xilinx 4010-based custom computer  相似文献   

8.
We show that there is a randomizedoblivious algorithm for routing any (partial) permutation on ann ×n grid in 2n +O(logn) parallel communication steps. The queues will not grow larger than Θ(logn/log logn) with high probability. We then modify this to obtain a (nonoblivious) algorithm with the same running time such that the size of the queues is bounded by a constant with high probability. For permutations withlocality, where each packet has to travel a distance at mostL, a generalization of the algorithm routes in time proportional toL with high probability. Finally, we identify a class of meshlike networks that have optimal or near-optimal diameter. These meshes have the potential of being adapted to run existing sorting and routing algorithms with corresponding reduction in their running times.  相似文献   

9.
Effective fault tolerance techniques are essential for improving the reliability of multiprocessor systems. At the same time, fault tolerance must be achieved at high speed to meet the real-time constraints of embedded systems. While parallelism has often been exploited to increase performance, to the best of our knowledge, there has been no previously reported work on parallel reconfiguration of mesh-connected processor arrays with faults. This paper presents two parallel algorithms to accelerate reconfiguration of the processor arrays. The first algorithm reconfigures a host array in parallel in a multithreading manner. The threads in the parallel algorithm execute independently within a safe rerouting distance. The second algorithm is based on a divide-and-conquer approach to first generate the leftmost segments in parallel and then merge the segments in parallel. When compared to the conventional algorithm, simulation results from a large number of instances confirm that the proposed algorithms significantly accelerate the reconfiguration without loss of harvest.  相似文献   

10.
11.
We show that there is a randomizedoblivious algorithm for routing any (partial) permutation on ann ×n grid in 2n +O(logn) parallel communication steps. The queues will not grow larger than (logn/log logn) with high probability. We then modify this to obtain a (nonoblivious) algorithm with the same running time such that the size of the queues is bounded by a constant with high probability. For permutations withlocality, where each packet has to travel a distance at mostL, a generalization of the algorithm routes in time proportional toL with high probability. Finally, we identify a class of meshlike networks that have optimal or near-optimal diameter. These meshes have the potential of being adapted to run existing sorting and routing algorithms with corresponding reduction in their running times.Preliminary reports of portions of the results contained in this paper have appeared in theProceedings of the 1988 Aegean Workshop on Computing [5], and in theProceedings of the 1987 Conference on Foundations of Software Technology and Theoretical Computer Science [18]. The work of the first author was supported in part by NSF Grant NSF-DCR-85-03251 and ONR contract N00014-80-C-0647. The work of the second author was supported in part by NSF Grant NSF-DCR-86-00379.  相似文献   

12.
Most existing methods of mapping algorithms into processor arrays are restricted to the case where n-dimensional algorithms, or algorithms with n nested loops, are mapped into (n-1)-dimensional arrays. However, in practice, it is interesting to map n-dimensional algorithms into (k-1)-dimensional arrays where k<n. A computational conflict occurs if two or more computations of an algorithm are mapped into the same execution time. Based on the Hermite normal form of the mapping matrix, necessary and sufficient conditions are derived to identify mapping without computational conflicts. These conditions are used to find time mappings of n-dimensional algorithms into (k-1)-dimensional arrays, k<n , without computational conflicts. For some applications, the mapping is time-optimal  相似文献   

13.
Homogeneous processor arrays are emerging in tera-scale computation and effective fault tolerance techniques are essential to improving the reliability of such complex integrated circuits. We study the degradable processor arrays to achieve fault tolerance by employing reconfiguration. Three bypass schemes and three rerouting schemes are proposed to reconfigure three-dimensional processor arrays with defective processors to achieve target arrays without faults. A heuristic algorithm is proposed to construct a target array on the selected rows and columns. It is also proved that the proposed greedy plane rerouting algorithm (GPR) produces maximum target array. In addition, the problem of constructing the communication efficient array is considered in this paper. An algorithm is proposed to refine the communication among processors within the target array constructed by GPR. Experimental study shows that the proposed algorithm GPR produces target arrays with higher harvest and lower degradation on the host arrays with fault density no more than 5%. In addition, the communication performance is significantly optimized by reducing the number of long interconnects, and the average improvement is about 34% for all cases considered in this paper.  相似文献   

14.
We continue the study of priority or “greedy-like” algorithms as initiated in Borodin et al. (2003) [10] and as extended to graph theoretic problems in Davis and Impagliazzo (2009) [12]. Graph theoretic problems pose some modeling problems that did not exist in the original applications of Borodin et al. and Angelopoulos and Borodin (2002) [3]. Following the work of Davis and Impagliazzo, we further clarify these concepts. In the graph theoretic setting, there are several natural input formulations for a given problem and we show that priority algorithm bounds in general depend on the input formulation. We study a variety of graph problems in the context of arbitrary and restricted priority models corresponding to known “greedy algorithms”.  相似文献   

15.
Efficient parallel algorithms for graph problems   总被引:1,自引:0,他引:1  
We present an efficient technique for parallel manipulation of data structures that avoids memory access conflicts. That is, this technique works on the Exclusive Read/Exclusive Write (EREW) model of computation, which is the weakest shared memory, MIMD machine model. It is used in a new parallel radix sort algorithm that is optimal for keys whose values are over a small range. Using the radix sort and known results for parallel prefix on linked lists, we develop parallel algorithms that efficiently solve various computations on trees and unicycular graphs. Finally, we develop parallel algorithms for connected components, spanning trees, minimum spanning trees, and other graph problems. All of the graph algorithms achieve linear speedup for all but the sparsest graphs.Part of this work was done while the first author was at the University of Illinois, Urbana-Champaign, the second author was at Carnegie-Mellon University, and the third author was at the Hebrew University and the Courant Institute of Mathematical Sciences, New York University. A preliminary version of this work was presented at the 1986 International Conference on Parallel Processing.  相似文献   

16.
We present an efficient technique for parallel manipulation of data structures that avoids memory access conflicts. That is, this technique works on the Exclusive Read/Exclusive Write (EREW) model of computation, which is the weakest shared memory, MIMD machine model. It is used in a new parallel radix sort algorithm that is optimal for keys whose values are over a small range. Using the radix sort and known results for parallel prefix on linked lists, we develop parallel algorithms that efficiently solve various computations on trees and “unicycular graphs.” Finally, we develop parallel algorithms for connected components, spanning trees, minimum spanning trees, and other graph problems. All of the graph algorithms achieve linear speedup for all but the sparsest graphs.  相似文献   

17.
We present processor-time optimal parallel algorithms for several problems onn ×n digitized image arrays, on a mesh-connected array havingp processors and a memory of sizeO(n 2) words. The number of processorsp can vary over the range [1,n 3/2] while providing optimal speedup for these problems. The class of image problems considered here includes labeling the connected components of an image; computing the convex hull, the diameter, and a smallest enclosing box of each component; and computing all closest neighbors. Such problems arise in medium-level vision and require global operations on image pixels. To achieve optimal performance, several efficient data-movement and reduction techniques are developed for the proposed organization.This research was supported in part by the National Science Foundation under Grant IRI-8710836 and in part by DARPA under Contract F33615-87-C-1436 monitored by the Wright Patterson Airforce Base.  相似文献   

18.
We present processor-time optimal parallel algorithms for several problems onn ×n digitized image arrays, on a mesh-connected array havingp processors and a memory of sizeO(n 2) words. The number of processorsp can vary over the range [1,n 3/2] while providing optimal speedup for these problems. The class of image problems considered here includes labeling the connected components of an image; computing the convex hull, the diameter, and a smallest enclosing box of each component; and computing all closest neighbors. Such problems arise in medium-level vision and require global operations on image pixels. To achieve optimal performance, several efficient data-movement and reduction techniques are developed for the proposed organization.  相似文献   

19.
In this paper, we present algorithms for computing the Euclidean distance transform (EDT) of a binary image on the array with reconfigurable optical buses (AROB). First, we develop a parallel algorithm termed as Algorithm Expander which can be implemented in O(1) time on an AROB with N x Ndelta processors, where delta = 1/k, k is a constant and a positive integer. Algorithm Expander is designed to compute a higher dimensional EDT based on the computed lower dimensional EDT. It functions as a general EDT expander for us to expand EDT from a lower dimension to a higher dimension. We then develop parallel algorithms for the two-dimensional (2-D)_EDT of a binary image array of size N x N in O(1) time on an AROB with N x N x Ndelta processors and for the three-dimensional (3-D)_EDT of a binary image of size N x N x N in O(1) time on an AROB with N x N x N x Ndelta processors. To the best of our knowledge, all results derived above are the best O(1) time algorithms known. We then extend it to compute the nD_EDT of a binary image of size Nn in O(n) time on an AROB with Nn+delta processors. We also apply our parallel EDT algorithms to build Voronoi diagram and Voronoi polyhetra (polygons), to find all maximal empty spheres and the largest empty sphere, and to compute the medial axis transform. All of these applications can be solved in the same time complexity on an AROB with the same number of processors as needed for solving the EDT problems in the same dimensions.  相似文献   

20.
The reconfigurable mesh consists of an array of processors interconnected by a reconfigurable bus system. The bus system can be used to dynamically obtain various interconnection patterns among the processors. Recently, this model has attracted a lot of attention. The authors show O(1) time solutions to the following computational geometry problems on the reconfigurable mesh: all-pairs nearest neighbors, convex hull, triangulation, two-dimensional maxima, two-set dominance counting, and smallest enclosing box. All these solutions accept N planar points as input and employ an N×N reconfigurable mesh. The basic scheme employed in the implementations is to recursively find an O(1) time solution. The number of recursion levels and the size of the subproblems at each level of recursion are optimized such that the problem decomposition and the solution to the problem can be obtained in constant time. As a result, they have developed some efficient merge techniques to combine the solutions for subproblems on the reconfigurable mesh. These techniques exploit reconfigurability in nontrivial ways leading to constant time solutions using optimal size of the mesh  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号