期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

A processor-time-minimal systolic array for transitive closure

Scheiman C.J. Cappello P.R. 《Parallel and Distributed Systems, IEEE Transactions on》1992,3(3):257-269

Using a directed acyclic graph (DAG) model of algorithms, the authors focus on processor-time-minimal multiprocessor schedules: time-minimal multiprocessor schedules that use as few processors as possible. The Kung, Lo, and Lewis (KLL) algorithm for computing the transitive closure of a relation over a set of n elements requires at least 5n-4 parallel steps. As originally reported. their systolic array comprises n² processing elements. It is shown that any time-minimal multiprocessor schedule of the KLL algorithm's dag needs at least n²/3 processing elements. Then a processor-time-minimal systolic array realizing the KLL dag is constructed. Its processing elements are organized as a cylindrically connected 2-D mesh, when n=0 mod 3. When n≠0 mod 3, the 2-D mesh is connected as a torus 相似文献

2.

A unified systolic array for adaptive beamforming

Adam W. Bojanczyk Franklin T. Luk 《Journal of Parallel and Distributed Computing》1990,8(4)

We present a new algorithm and systolic array for adaptive beamforming. Our approach improves on McWhirter's pioneering work in two respects. First, our algorithm uses only orthogonal transformations and thus should have better numerical properties. Second, the algorithm can be implemented on one single p × p triangular array of programmable processors that offers a throughput of one residual element per cycle. 相似文献

3.

A content-addressable systolic array for sparse matrix computation

Omar Wing 《Journal of Parallel and Distributed Computing》1985,2(2):170-181

A systolic array is proposed which is specifically designed to solve a system of sparse linear equations. The array consists of a number of processing elements connected in a ring. Each processing element has its own content-addressable memory where the nonzero elements of the sparse matrix are stored. Matrix elements to which elementary operations are applied are extracted from the memory by content addressing. The system of equations is solved in a systolic fashion and the solution is obtained in NZ + 5n ? 2 steps, where NZ is the number of nonzero elements along and below the diagonal and n is the number of equations. 相似文献

4.

A bit-level systolic array for digital contour smoothing

Nikolai Petkov Fridrich Sloboda 《Parallel Computing》1989,12(3):301-313

Linear operators for digital contour smoothing are described. These operators are defined by circulant Toeplitz matrices and allow to smooth digital contours in the least-squares sense. They minimize the undersampling, digitizing and quantizing error and allow to calculate invariants, such as curvature, which are not possible to calculate without smoothing. A bit-level systolic array which is capable of realizing the proposed operator is described. This array is easy to implement in VLSI, because the array cells involved are very simple. Furthermore, the array is completely pipelined on the bit-level, so that it operates with a high clock frequency achieving very high throughputs. 相似文献

5.

A data-driven VLSI array for arbitrary algorithms

Koren I. Mendelson B. Peled I. Silberman G.M. 《Computer》1988,21(10):30-43

The design of specialized processing array architectures, capable of executing any given arbitrary algorithm, is proposed. An approach is adopted in which the algorithm is first represented in the form of a dataflow graph and then mapped onto the specialized processor array. The processors in this array execute the operations included in the corresponding nodes (or subsets of nodes) of the dataflow graph, while regular interconnections of these elements serve as edges of the graph. To speed up the execution, the proposed array allows the generation of computation fronts and their cancellation at a later time, depending on the arriving data operands; thus it is called a data-driven array. The structure of the basic cell and its programming are examined. Some design details are presented for two selected blocks, the instruction memory and the flag array. A scheme for mapping a dataflow graph (program) onto a hexagonally connected array is described and analyzed. Two distinct performance measures-mapping efficiency and array utilization-and some performance results are discussed 相似文献

6.

A systolic array for the longest common subsequence problem

Yves Robert Maurice Tchuente 《Information Processing Letters》1985,21(4):191-198

We introduce a linear systolic array for the Longest Common Subsequence (LCS, for short) problem. We first present an array of m identical cells which computes the length of an LCS of two strings of length m and n, respectively, in linear time (i.e., in time proportional to m + n). Then we show that, by extending any cell with the systolic stack introduced by Guibas and Liang (1982), a new array can be designed to recover an LCS in linear time. 相似文献

7.

A lattice model for cellular (systolic) algorithms

《Parallel Computing》1986,3(3):251-258

In this paper a lattice model (here ‘lattice’ means a net of points) is introduced for homogeneous cellular algorithms. On the basis of this model a transformation methodology is developed, which makes it possible to produce many different versions of a given cellular algorithm. These versions may have quite different structural properties, but they perform the same computation as the original algorithm. In this way a great variety of cellular algorithms can be offered to choose the best version in practice and, on the other hand, cellular algorithms can be classified according to their inherent structures. 相似文献

8.

Efficient algorithms for array redistribution

Thakur R. Choudhary A. Ramanujam J. 《Parallel and Distributed Systems, IEEE Transactions on》1996,7(6):587-594

Dynamic redistribution of arrays is required very often in programs on distributed presents efficient algorithms for redistribution between different cyclic(k) distributions, as defined in High Performance Fortran. We first propose special optimized algorithms for a cyclic(x) to cyclic(y) redistribution when x is a multiple of y, or y is a multiple of x. We then propose two algorithms, called the GCD method and the LCM method, for the general cyclic(x) to cyclic(y) redistribution when there is no particular relation between x and y. We have implemented these algorithms on the Intel Touchstone Delta, and find that they perform well for different array sizes and number of processors 相似文献

9.

A systolic processor array for the quadrant interlocking elimination method

《国际计算机数学杂志》2012,89(1):29-44

In this paper a systolic array is presented for the solution of linear equations occurring in scientific and engineering calculations. The new systolic array is based on the parallel Quadrant Interlocking Elimination method of Evans and Hadjidimos [1]. 相似文献

10.

Two-level pipelined systolic array for multidimensional convolution

HT Kung LM Ruane DWL Yen 《Image and vision computing》1983,1(1):30-36

This paper describes a systolic array for the computation of n-dimensional (nD) convolutions for any positive integer n. Systolic systems usually achieve high performance by allowing computations to be pipelined over a large array of processing elements. To achieve even higher performance, the systolic array described in this paper uses a second level of pipelining by allowing the processing elements themselves to be pipelined to an arbitrary degree. 相似文献

11.

Computational geometry algorithms for the systolic screen

F. Dehne A. -L. Hassenklover J. -R. Sack N. Santoro 《Algorithmica》1991,6(1-6):734-761

Adigitized plane Π of sizeM is a rectangular √M × √M array of integer lattice points called pixels. A √M × √M mesh-of-processors in which each processorP _ij represents pixel (i,j) is a natural architecture to store and manipulate images in Π; such a parallel architecture is called asystolic screen. In this paper we consider a variety of computational-geometry problems on images in a digitized plane, and present optimal algorithms for solving these problems on a systolic screen. In particular, we presentO(√M)-time algorithms for determining all contours of an image; constructing all rectilinear convex hulls of an image (peeling); solving the parallel and perspective visibility problem forn disjoint digitized images; and constructing the Voronoi diagram ofn planar objects represented by disjoint images, for a large class of object types (e.g., points, line segments, circles, ellipses, and polygons of constant size) and distance functions (e.g., allL _p metrics). These algorithms implyO(√M)-time solutions to a number of other geometric problems: e.g., rectangular visibility, separability, detection of pseudo-star-shapedness, and optical clustering. One of the proposed techniques also leads to a new parallel algorithm for determining all longest common subsequences of two words. 相似文献

12.

Computational geometry algorithms for the systolic screen

F. Dehne A. -L. Hassenklover J. -R. Sack N. Santoro 《Algorithmica》1991,6(1):734-761

Adigitized plane of sizeM is a rectangular M × M array of integer lattice points called pixels. A M × M mesh-of-processors in which each processorP _ij represents pixel (i,j) is a natural architecture to store and manipulate images in ; such a parallel architecture is called asystolic screen. In this paper we consider a variety of computational-geometry problems on images in a digitized plane, and present optimal algorithms for solving these problems on a systolic screen. In particular, we presentO(M)-time algorithms for determining all contours of an image; constructing all rectilinear convex hulls of an image (peeling); solving the parallel and perspective visibility problem forn disjoint digitized images; and constructing the Voronoi diagram ofn planar objects represented by disjoint images, for a large class of object types (e.g., points, line segments, circles, ellipses, and polygons of constant size) and distance functions (e.g., allL _p metrics). These algorithms implyO(M)-time solutions to a number of other geometric problems: e.g., rectangular visibility, separability, detection of pseudo-star-shapedness, and optical clustering. One of the proposed techniques also leads to a new parallel algorithm for determining all longest common subsequences of two words.Research supported by the Naural Sciences and Engineering Research Council of Canada. With the Editor-in-Chief's permission, this paper was sent to the referees in a form which kept them unaware of the fact that the Guest Editor is one of the co-authors. 相似文献

13.

Fast median filtering algorithms for mesh computers

Steven L. 《Pattern recognition》1995,28(12):1965-1972

Two fast algorithms for median filtering of images using parallel computers having 2-D mesh interconnections are given. Both algorithms assume that an n × n image is loaded onto the mesh with one processing element per pixel. One algorithm performs median filtering over d × d neighborhoods in O(d²) time and works with pixel values in an arbitrarily large range. This algorithm, while theoretically suboptimal, achieves a lower constant than a previously published asymptotically—optimal algorithm and is simpler to program. The second algorithm assumes that the range of pixel values is limited and relatively small, and it accomplishes median filtering in O(d) time. 相似文献

14.

A generic systolic array building block for neural networks withon-chip learning

Lehmann C. Viredaz M. Blayo F. 《Neural Networks, IEEE Transactions on》1993,4(3):400-407

Neural networks require VLSI implementations for on-board systems. Size and real-time considerations show that on-chip learning is necessary for a large range of applications. A flexible digital design is preferred here to more compact analog or optical realizations. As opposed to many current implementations, the two-dimensional systolic array system presented is an attempt to define a novel computer architecture inspired by neurobiology. It is composed of generic building blocks for basic operations rather than predefined neural models. A full custom VLSI design of a first prototype has demonstrated the efficacy of this design. A complete board dedicated to Hopfield's model has been designed using these building blocks. Beyond the very specific application presented, the underlying principles can be used for designing efficient hardware for most neural network models. 相似文献

15.

A period-processor-time-minimal schedule for cubical meshalgorithms

Scheiman C. Cappello P. 《Parallel and Distributed Systems, IEEE Transactions on》1994,5(3):274-280

Using a directed acyclic graph (dag) model of algorithms, we investigate precedence-constrained multiprocessor schedules for the n×n×n directed mesh. This cubical mesh is fundamental, representing the standard algorithm for square matrix product, as well as many other algorithms. Its completion requires at least 3^n-2 multiprocessor steps. Time-minimal multiprocessor schedules that use as few processors as possible are called processor-time-minimal. For the cubical mesh, such a schedule requires at least [3n²/4] processors. Among such schedules, one with the minimum period (i.e., maximum throughput) is referred to as a period-processor-time-minimal schedule. The period of any processor-time-minimal schedule for the cubical mesh is at least 3^n/2 steps. This lower bound is shown to be exact by constructing, for n a multiple of 6, a period-processor-time-minimal multiprocessor schedule that can be realized on a systolic array whose topology is a toroidally connected n/2×n/2×3 mesh 相似文献

16.

A survey on routing algorithms for wireless Ad-Hoc and mesh networks

Eiman Alotaibi Biswanath Mukherjee 《Computer Networks》2012,56(2):940-965

Wireless networking technology is evolving as an inexpensive alternative for building federated and community networks (relative to the traditional wired networking approach). Besides its cost-effectiveness, a wireless network brings operational efficiencies, namely mobility and untethered convenience to the end user. A wireless network can operate in both the “Ad-Hoc” mode, where users are self-managed, and the “Infrastructure” mode, where an authority manages the network with some Infrastructure such as fixed wireless routers, base stations, access points, etc. An Ad-Hoc network generally supports multi-hopping, where a data packet may travel over multiple hops to reach its destination. Among the Infrastructure-based networks, a Wireless Mesh Network (with a set of wireless routers located at strategic points to provide overall network connectivity) also provides the flexibility of multi-hopping. Therefore, how to route packets efficiently in wireless networks is a very important problem.A variety of wireless routing solutions have been proposed in the literature. This paper presents a survey of the routing algorithms proposed for wireless networks. Unlike routing in a wired network, wireless routing introduces new paradigms and challenges such as interference from other transmissions, varying channel characteristics, etc. In a wireless network, routing algorithms are classified into various categories such as Geographical, Geo-casting, Hierarchical, Multi-path, Power-aware, and Hybrid routing algorithms. Due to the large number of surveys that study different routing-algorithm categories, we select a limited but representative number of these surveys to be reviewed in our work. This survey offers a comprehensive review of these categories of routing algorithms.In the early stages of development of wireless networks, basic routing algorithms, such as Dynamic Source Routing (DSR) and Ad-Hoc On-demand Distance Vector (AODV) routing, were designed to control traffic on the network. However, it was found that applying these basic routing algorithms directly on wireless networks could lead to some issues such as large area of flooding, Greedy Forwarding empty set of neighbors, flat addressing, widely-distributed information, large power consumption, interference, and load-balancing problems. Therefore, a number of routing algorithms have been proposed as extensions to these basic routing algorithms to enhance their performance in wireless networks. Hence, we study the features of routing algorithms, which are compatible with the wireless environment and which can overcome these problems. 相似文献

17.

Mechanical certification of systolic algorithms

S. Purushothaman P. A. Subrahmanyam 《Journal of Automated Reasoning》1989,5(1):67-91

We present in this paper an approach for mechanically certifying the correctness of systolic algorithms and detail the correctness proof of a systolic algorithm for a dynamic programming (optimal parenthesization) problem. 相似文献

18.

Robust algorithms for packet routing in a mesh

P. Raghavan 《Theory of Computing Systems》1995,28(1):1-11

This paper considers the problem of permutation packet routing on a n×n mesh-connected array of processors. Each node in the array is assumed to be independently faulty with a probability bounded above by a valuep. This paper gives a routing algorithm which, ifp 0.29, will with very high probability route every packet that can be routed inO(n logn) steps with queue lengths that areO(log² n). Extensions to higher-dimensional meshes are given. 相似文献

19.

Efficient algorithms for parallel sorting on mesh multicomputers

V. Singh V. Kumar G. Agha C. Tomlinson 《International journal of parallel programming》1991,20(2):95-131

We present two new parallel algorithms QSP1 and QSP2 based on sequential quicksort for sorting data on a mesh multicomputer, and analyze their scalability using the isoefficiency metric. We show that QSP2 matches the lower bound on the isoefficiency function for mesh multicomputers, while QSP1 is fairly close to optimal. Langet al. ⁽¹⁾ and Schnorret al. ⁽²⁾ have developed parallel sorting algorithms for the mesh architecture that have either optimal (Schnorr) or close to optimal (Lang) run-time complexity for the one-element-perprocessor case. Both QSP1 and QSP2 have better scalability than the scaled-down variants of these algorithms (for the case in which there are more elements than processors). We also analyze a different variant of Lang's sort which is as scalable as QSP2. We briefly discuss another metric called resource consumption. According to this metric, both QSP1 and QSP2 are superior to variants of Lang's sort. 相似文献

20.

A programmable array processor architecture for flexible approximate string matching algorithms

Panagiotis D. Michailidis Konstantinos G. Margaritis 《Journal of Parallel and Distributed Computing》2007

Approximate string matching problem is a common and often repeated task in information retrieval and bioinformatics. This paper proposes a generic design of a programmable array processor architecture for a wide variety of approximate string matching algorithms to gain high performance at low cost. Further, we describe the architecture of the array and the architecture of the cell in detail in order to efficiently implement for both the preprocessing and searching phases of most string matching algorithms. Further, the architecture performs approximate string matching for complex patterns that contain don’t care, complement and classes symbols. We also simulate and evaluate the proposed architecture on a field programmable gate array (FPGA) device using the JHDL tool for synthesis and the Xilinx Foundation tools for mapping, placement, and routing. Finally, our programmable implementation achieves about 8–340 times faster execution than a desktop computer with a Pentium 4 3.5 GHz for all algorithms when the length of the pattern is 1024. 相似文献