期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Synthesizing systolic arrays with control signals from recurrence equations

Sanjay V. Rajopadhye 《Distributed Computing》1989,3(2):88-105

We present a technique for synthesizing systolic arrays which have non-uniform data flow governed by control signals. The starting point for the synthesis is anAffine Recurrence Equation—a generalization of the simple recurrences encountered in mathematics. A large class of programs, including most (single and multiple) nested-loop programs can be described by such recurrences. In this paper we extend our earlier work (Rajopadhye and Fujimoto 1986) in two principal directions. Firstly, we characterize a class of transformations calleddata pipelining and show that they yield recurrences that havelinear conditional expressions governing the computation. Secondly, we discuss the synthesis of systolic arrays that have non-uniform data flow governed by control signals. We show how to derive the control signals in such arrays by applying similar pipelining transformations to theselinear conditional expressions. The approach is illustrated by deriving the Guibas-Kung-Thompson architecture for computing the cost of optimal string parenthesization.Supported by a University of Utah Graduate Research Fellowship, and NSF grant No. MIP-8802454 相似文献

2.

Systolic opportunities for multidimensional data streams

Chai S.M. Wills D.S. 《Parallel and Distributed Systems, IEEE Transactions on》2002,13(4):388-398

Portable image processing applications require an efficient, scalable platform with localized computing regions. This paper presents a new class of area I/O systolic architecture to exploit the physical data locality of planar data streams by processing data where it falls. A synthesis technique using dependence graphs, data partitioning, and computation mapping is developed to handle planar data streams and to systematically design arrays with area I/O. Simulation results show that the use of area I/O provides a 16 times speedup over systems with perimeter I/O. Performance comparisons for a set of signal processing algorithms show that systolic arrays that consider planar data streams in the design process are up to three times faster than traditional arrays 相似文献

3.

基于线性脉动阵列的卷积神经网络计算优化与性能分析

下载免费PDF全文

刘勤让刘崇阳周俊王孝龙《网络与信息安全学报》2018,4(12):16-24

针对大部分FPGA端上的卷积神经网络（CNN,convolutional neural network）加速器设计未能有效利用稀疏性的问题,从带宽和能量消耗方面考虑,提出了基于线性脉动阵列的2种改进的CNN计算优化方案。首先,卷积转化为矩阵相乘形式以利用稀疏性;其次,为解决传统的并行矩阵乘法器存在较大I/O需求的问题,采用线性脉动阵列改进设计;最后,对比分析了传统的并行矩阵乘法器和2种改进的线性脉动阵列用于CNN加速的利弊。理论证明及分析表明,与并行矩阵乘法器相比,2种改进的线性脉动阵列都充分利用了稀疏性,具有能量消耗少、I/O带宽占用少的优势。相似文献

4.

On high-speed computing with a programmable linear array

Peizong Lee Zvi M. Kedem 《The Journal of supercomputing》1990,4(3):223-249

It has been observed by many researchers that systolic arrays are very suitable for certain high-speed computations. Using a formal methodology, we present a design for a single simple programmable linear systolic array capable of solving large numbers of problems drawn from a variety of applications. The methodology is applicable to problems solvable by sequential algorithms that can be specified as nested for-loops of arbitrary depth. The algorithms of this form that can be computed on the array presented in this paper include 25 algorithms dealing with signal and image processing, algebraic computations, matrix arithmetic, pattern matching, database operations, sorting, and transitive closure. Assuming bounded I/O, for 18 of those algorithms the time and storage complexities are optimal, and therefore no improvement can be expected by using dedicated special-purpose linear systolic arrays designed for individual algorithms. We also describe another design which, using a sufficient large local memory and allowing data to be preloaded and unloaded, has an optimal processor/time product.An earlier version of this paper was presented at Supercomputing '88.This work was partially supported by ONR under the contract N00014-85-K-0046 and by NSF under Grant Number CCR-8906949. 相似文献

5.

A systolic algorithm for solving dense linear systems

Chau-Jy Lin 《Computers & Mathematics with Applications》1996,32(12):77-91

For an arbitrary n × n matrix A and an n × 1 column vector b, we present a systolic algorithm to solve the dense linear equations Ax = b. An important consideration is that the pivot row can be changed during the execution of our systolic algorithm. The computational model consists of n linear systolic arrays. For 1 ≤ i ≤ n, the i^th linear array is responsible to eliminate the i^th unknown variable x_i of x. This algorithm requires 4n time steps to solve the linear system. The elapsed time unit within a time step is independent of the problem size n. Since the structure of a PE is simple and the same type PE executes the identical instructions, it is very suitable for VLSI implementation. The design process and correctness proof are considered in detail. Moreover, this algorithm can detect whether A is singular or not. 相似文献

6.

Sliding-mode control of discrete-time linear plants with input saturation: application to a twin-rotor system

Maria Letizia Corradini Andrea Cristofaro 《International journal of control》2013,86(8):1523-1535

This paper investigates the stabilisation problem for discrete-time linear controllable systems subject to actuator saturation. In the case of completely known plants, a control design technique based on a time-varying sliding surface is proposed ensuring stabilisation under the assumption of availability of all state measurements. Further, in the presence of matched disturbances with a known constant bound, a discrete-time sliding-mode controller is proposed ensuring plant practical stabilisation, and a conservative estimate of the attraction domain is given. Theoretical results have been validated by experimental data using a twin-rotor system. 相似文献

7.

Systolic architectures for polynomial and polynomial matrix manipulations

THOMAS D. FABER STANISLAW H. ŻAK 《International journal of systems science》2013,44(9):1739-1765

Manipulations of polynomials and polynomial matrices, basic to the polynomial equation approach to control system design, are dissected in a manner suitable for repeatable, rhythmic solution. These solutions are then implemented on systolic arrays. An introduction to a systematic design and analysis procedure for systolic arrays, along with an example, concludes the paper. 相似文献

8.

An interactive fuzzy CAD tool

Manzoul M.A. Moore J. 《Micro, IEEE》1996,16(2):68-74

相似文献

9.

Mapping nested loop algorithms into multidimensional systolicarrays

Lee P.-Z. Kedem Z.M. 《Parallel and Distributed Systems, IEEE Transactions on》1990,1(1):64-76

Consideration is given to transforming depth p-nested for loop algorithms into q-dimensional systolic VLSI arrays where 1⩽q⩽p-1. Previously, there existed complete characterizations of correct transformation only for the cases where q=p-1 or q=1. This gap is filled by giving formal necessary and sufficient conditions for correct transformation of a p-nested loop algorithm into a q-dimensional systolic array for any q, 1⩽q⩽p-1. Practical methods are presented. The techniques developed are applied to the automatic design of special purpose and programmable systolic arrays. The results also contribute toward automatic compilation onto more general purpose programmable arrays. Synthesis of linear and planar systolic array implementations for a three-dimensional cube-graph algorithm and a reindexed Warshall-Floyd path-finding algorithm are used to illustrate the method 相似文献

10.

具有状态时滞线性系统对时滞参数的自适应控制 总被引：12，自引：0，他引：12

姜偕富费树岷冯纯伯《控制理论与应用》2002,19(5):704-708

对于存在状态时滞的线性时滞系统, 研究了无记忆与带记忆的复合状态反馈控制器的设计问题, 当滞后常数不能精确已知时, 首次提出了对时滞参数的自适应控制方案, 通过解相应的线性矩阵不等式 (LMI)即能求得满足设计要求的对时滞参数自适应控制器. 最后给出了仿真实例以说明结论的有效性. 相似文献

11.

External-Memory Multimaps

Elaine Angelino Michael T. Goodrich Michael Mitzenmacher Justin Thaler 《Algorithmica》2013,67(1):23-48

Many data structures support dictionaries, also known as maps or associative arrays, which store and manage a set of key-value pairs. A multimap is generalization that allows multiple values to be associated with the same key. For example, the inverted file data structure that is used prevalently in the infrastructure supporting search engines is a type of multimap, where words are used as keys and document pointers are used as values. We study the multimap abstract data type and how it can be implemented efficiently online in external memory frameworks, with constant expected I/O performance. The key technique used to achieve our results is a combination of cuckoo hashing using buckets that hold multiple items with a multiqueue implementation to cope with varying numbers of values per key. Our external-memory results are for the standard two-level memory model. 相似文献

12.

Some results concerning linear iterative (systolic) arrays

Oscar H. Ibarra Michael A. Palis Sam M. Kim 《Journal of Parallel and Distributed Computing》1985,2(2):182-218

Characterizations of various types of linear iterative (systolic) arrays in terms of single processor sequential machines are given. Using these characterizations, new or improved results concerning the properties, power, and limitations of the different linear array models are proved. For example, a speed-up theorem is proved that is stronger than what has previously appeared in the literature and, moreover, works for arrays with one-way communication lines. Also investigated are the effects of augmenting the array with a supplemental control mechanism called global control. It is shown that in many cases arrays with global control can be simulated in real-time by arrays without global control. The result remains true even if one of the processors is augmented by a stack. Cases are also exhibited where the addition of global control makes the array strictly more powerful, even if the control lines are restricted to only a few processors of the array. 相似文献

13.

Wafer-scale integration and two-level pipelined implementations of systolic arrays

H.T. Kung Monica S. Lam 《Journal of Parallel and Distributed Computing》1984,1(1):32-63

Two important issues in systolic array designs are addressed: How is fault tolerance provided in systolic arrays to enhance the yield of wafer-scale integration implementations? And, how are efficient systolic arrays with two levels of pipelining designed? (The first level refers to the pipelined organization of the array at the cellular level, and the second refers to the pipelined functional units inside the cells.) The fault-tolerant scheme proposed replaces defective cells with clocked delays. This has the distinct characteristic that data can flow through the array with faulty cells at the original clock speed. It is shown that both the defective cells under this fault-tolerant scheme and the second-level pipeline stages can simply be modeled as additional delays in the data paths of “generic” systolic designs. The mathematical notion of a cut is introduced to solve the problem of how to allow for these extra delays while preserving the correctness of the original systolic array designs. The results obtained by applying these techniques are encouraging. When applied to systolic arrays without feedback cycles, the arrays can tolerate large numbers of failures (with the addition of very little hardware) while maintaining the original throughput. Furthermore, all of the pipeline stages in the cells can be kept fully utilized through the addition of a small number of delay registers. However, adding delays to systolic arrays with cycles typically induces a significant decrease in throughput. In response to this, a new class of systolic algorithms has been derived in which the data cycle around a ring of processing cells. The systolic ring architecture has the property that its performance degrades gracefully as cells fail. Use of the cut theory and ring architectures for arrays with feedback gives effective fault-tolerant and two-level pipelining schemes for most systolic arrays. As a side effect of developing the ring architecture approach, several new systolic algorithms have been derived. These algorithms generally require only one-third to one-half of the number of cells used in previous designs to achieve the same throughput. The new systolic algorithms include ones for LU-decomposition, QR-decomposition, and the solution of triangular linear systems. 相似文献

14.

Mapping of Neural Network Models onto Systolic Arrays

《Journal of Parallel and Distributed Computing》2000,60(6):677-689

This paper presents a mapping scheme for the proposed implementation of neural network models on systolic arrays. The mapping technique is illustrated on the multilayer perceptron with back-propagation learning. Dependency graphs have been given that represent the operations in the execution phases of the neural network model and later suitable algorithms are presented to realize the operations in a linear bidirectional systolic array. The speedup metric has been used to evaluate the performance of the proposed implementation. 相似文献

15.

Lyapunov-based distributed control of systems on lattices 总被引：1，自引：0，他引：1

Jovanovic M.R. Bamieh B. 《Automatic Control, IEEE Transactions on》2005,50(4):422-433

We investigate the properties of systems on lattices with spatially distributed sensors and actuators. These systems arise in a variety of applications such as the control of vehicular platoons, formation of unmanned aerial vehicles, arrays of microcantilevers, and constellations of satellites. We use a Lyapunov-based framework as a tool for stabilization/regulation/asymptotic tracking of both linear and nonlinear systems. We first present results for nominal design and then describe the design of adaptive controllers in the presence of parametric uncertainties. These uncertainties are assumed to be temporally constant, but are allowed to be spatially varying. We show that, in most cases, the distributed design yields controllers with architecture similar to that of the original plant. 相似文献

16.

A class of SIMD machines simulated by systolic arrays

Hiroshi Umeo 《Journal of Parallel and Distributed Computing》1985,2(4):391-403

In this paper we introduce a new subclass of single instruction steam/multiple data stream (SIMD) machines, referred to as a simple SIMD, then consider an implementation of a class of simple SIMD parallel algorithms onto systolic arrays, which have been considered as one candidate for VLSI-based cellular computers. The class of simple SIMD algorithms is so large that it includes many conventional SIMD algorithms, such as sorting, image processing, and graph algorithms. We develop several time-efficient algorithms for the simulations of simple SIMD machines, which have global data communications, by systolic arrays with only local data communications. The systolic simulation theorems enable us to use many conventional SIMD algorithms on the systolic arrays with little loss of time efficiency. 相似文献

17.

Space-Time Equations for Non-Unimodular Mappings

《国际计算机数学杂志》2012,89(5):555-572

The class of systems of uniform recurrence equations (UREs) is closed under uni-modular transformations. As a result, every systolic array described by a unimodular mapping can be specified by a system of space-time UREs, in which the time and space coordinates are made explicit. As non-unimodular mappings are frequently used in systolic designs, this paper presents a method that derives space-time equations for systolic arrays described by non-unimodular mappings. The space-time equations for non-unimodular mappings are known elsewhere as sparse UREs (SUREs) because the domains of their variables are sparse and their data dependences are uniform. Our method is compositional in that space-time SUREs can be further transformed by unimodular and non-unimodular mappings, allowing a straightforward implementation in systems like ALPHA. Specifying a systolic design by space-time equations has two advantages. First, the space-time equations exhibit all useful properties about the design, allowing the design to be formally verified. Second, depending on the application area and performance requirement, the space-time equations can be realised as custom VLSI systems, FPGAs, or programs to be run on a parallel computer. 相似文献

18.

算法到脉动阵列处理器的映射 总被引：1，自引：0，他引：1

霍红卫庄心谷《计算机研究与发展》1997,34(11):812-816

文中讨论了将含有循环的算法映射到脉动阵列的一般方法。这一映射方法是建立在下标集和数据相关向量的数学变换基础上，并给出了带有常数据相关的算法中存在有效变换的充要条件。最后给出了许所有点对之间最短路径问题的映射过程。相似文献

19.

Romberg integration using systolic arrays

D. J. Evans G. M. Megson 《Parallel Computing》1986,3(4):289-304

A systolic array is presented to improve numerical approximations to integrals using Richardson's extrapolation procedure in the form of Romberg integration. Two designs are presented, the first is an intuitive linear systolic array, the second, a systolic ring using approximately 1/3 of the cells of the first array. Both systolic arrays have a computation time of 3n cycles, which is a significant improvement on the O(n²) steps required to construct the extrapolation table sequentially. 相似文献

20.

Learning in linear systolic neural network engines: analysis andimplementation 总被引：1，自引：0，他引：1

Jones S.R. Sammut K.M. Hunter J. 《Neural Networks, IEEE Transactions on》1994,5(4):584-593

Linear systolic processor arrays are a widely proposed digital architecture for neural networks. This paper reports the analysis of a range of training algorithms implemented on a linear systolic ring, with a view to (a) identifying low-level instruction requirements, (b) assessing different hardware structures for PE implementation and (c) evaluating the impact of different array controller designs. Quantitative data is derived and used to determine cost-effective PE and controller hardware constructs. 相似文献