共查询到10条相似文献,搜索用时 140 毫秒
1.
We present a parallel solution to the unbounded knapsack problem on a linear systolic array. It achieves optimal speedup for this well-known, NP-hard problem on a model of computation that is weaker than the PRAM. Our array is correct by construction, as it is formally derived by transforming a recurrence equation specifying the algorithm. This recurrence has dynamic dependencies, a property that puts it beyond the scope of previous methods for automatic systolic synthesis. Our derivation thus serves as a case study. We generalize the technique and propose a systematic method for deriving systolic arrays by nonlinear transformations of recurrences. We give sufficient conditions that the transformations must satisfy, thus extending systolic synthesis methods. We address a number of pragmatic considerations: implementing the array on only a fixed number of PEs, simplifying the control to just two counters and a few latches, and loading the coefficients so that successive problems can be pipelined without any loss of throughput. Using a register level model of VLSI, we formulate a nonlinear optimization problem to minimize the expected running time of the array. The analytical solution of this problem allows us to choose the memory size of each PE in an optimal manner 相似文献
2.
Alessandro Armando Massimo Benerecetti Jacopo Mantovani 《Electronic Notes in Theoretical Computer Science》2006,144(3):79
In previous work we proposed Linear Programs as a fine grained model for imperative programs, and showed how the model checking procedure used in SLAM can be generalised to a model checking procedure for Linear Programs. In this paper we show that our model checking procedure for linear programs can be extended in such a way to support the analysis of linear programs featuring a symbol for undefined values and conditional expressions. This extension is particularly important as it paves the way to the construction of model checking procedures for wider classes of imperative programs such as, e.g., linear programs with arrays. We provide a detailed account of a symbolic model checking procedure for this extended class of linear programs, discuss its implementation in the eureka tool, and present experimental results that confirm its effectiveness in the analysis of linear programs with arrays. 相似文献
3.
In this paper, we consider the implementation of a product c=A
b, where A is N
1×N
3 band matrix with bandwidth ω and b is a vector of size N
3×1, on bidirectional and unidirectional linear systolic arrays (BLSA and ULSA, respectively). We distinguish the cases when
the matrix bandwidth ω is 1≤ω≤N
3 and N
3≤ω≤N
1+N
3−1. A modification of the systolic array synthesis procedure based on data dependencies and space-time transformations of
data dependency graph is proposed. The modification enables obtaining both BLSA and ULSA with an optimal number of processing
elements (PEs) regardless of the matrix bandwidth. The execution time of the synthesized arrays has been minimized. We derive
explicit formulas for the synthesis of these arrays. The performances of the designed arrays are discussed and compared to
the performances of the arrays obtained by the standard design procedure. 相似文献
4.
We develop a simple mapping technique to design linear systolic arrays. The basic idea of our technique is to map the computations of a certain class of two-dimensional systolic arrays onto one-dimensional arrays. Using this technique, systolic algorithms are derived for problems such as matrix multiplication and transitive closure on linearly connected arrays of PEs with constant I/O bandwidth. Compared to known designs in the literature, our technique leads to modular systolic arrays with constant hardware in each PE, few control lines, lexicographic data input/output, and improved delay time. The unidirectional flow of control and data in our design assures implementation of the linear array in the known fault models of Wafer Scale Integration. 相似文献
5.
《国际计算机数学杂志》2012,89(5):555-572
The class of systems of uniform recurrence equations (UREs) is closed under uni-modular transformations. As a result, every systolic array described by a unimodular mapping can be specified by a system of space-time UREs, in which the time and space coordinates are made explicit. As non-unimodular mappings are frequently used in systolic designs, this paper presents a method that derives space-time equations for systolic arrays described by non-unimodular mappings. The space-time equations for non-unimodular mappings are known elsewhere as sparse UREs (SUREs) because the domains of their variables are sparse and their data dependences are uniform. Our method is compositional in that space-time SUREs can be further transformed by unimodular and non-unimodular mappings, allowing a straightforward implementation in systems like ALPHA. Specifying a systolic design by space-time equations has two advantages. First, the space-time equations exhibit all useful properties about the design, allowing the design to be formally verified. Second, depending on the application area and performance requirement, the space-time equations can be realised as custom VLSI systems, FPGAs, or programs to be run on a parallel computer. 相似文献
6.
Rui C. Gonçalves Don Batory João L. Sobral Taylor L. Riché 《Software and Systems Modeling》2017,16(4):929-947
Dataflow programs are widely used. Each program is a directed graph where nodes are computations and edges indicate the flow of data. In prior work, we reverse-engineered legacy dataflow programs by deriving their optimized implementations from a simple specification graph using graph transformations called refinements and optimizations. In MDE speak, our derivations were PIM-to-PSM mappings. In this paper, we show how extensions complement refinements, optimizations, and PIM-to-PSM derivations to make the process of reverse engineering complex legacy dataflow programs tractable. We explain how optional functionality in transformations can be encoded, thereby enabling us to encode product lines of transformations as well as product lines of dataflow programs. We describe the implementation of extensions in the \(\mathtt{ReFlO}\) tool and present two non-trivial case studies as evidence of our work’s generality. 相似文献
7.
《国际计算机数学杂志》2012,89(3-4):231-248
The systolic concept in the parallel architecture design proposed by the H. T. Kung [1,2] obtains high throughput and speedups. The linear array for the matrix vector multiplication executes the algorithm in 2n ? 1 time steps using 2n ? 1 processors. Although the speedup obtained is very high, the efficiency is very poor (typical values of 25% efficiency for problem size greater than 10). H. T. Kung proposed an idea for a linear systolic array using two data streams flowing in opposite directions. However, the processors in the array perform operations in every second time moment. Attempts to improve this design have been made by many researchers. Nonlinear and folding transformations techniques [3,4,5] only decrease the number of processors used to half the size, but do not affect the time. We propose the use of a fast linear systolic computation procedure to obtain a solution that uses 3n/2 processors and executes the algorithm in 3n/2 time steps for the same cells, the same communication and the same regular data flow as the H. T. Kung linear array. Only the algorithm is restructured and more efficiently organized. Now the processors are utilized in every time step and no idle steps are required. 相似文献
8.
本文将FP代数,重写理论与脉动阵列(Systolic Arrays)的设计结合起来,研究了脉动阵列的形式化设计和自动综合的问题。文章中提出的FP/B并发计算型,不但可表示某一类FP/B递归方程的展开式解,而且可以用来等价地对算法进行重新描述,从而开发了计算的并行性和流水线性,获得一个规整高效的计算结构。文章形式地用FP/B定义了脉动式,并根据FP/B代数,建立了具有终止性和保持正确性的脉动阵列重写系统,它能将用户FP/B程序自动转换为等价的脉动式,再根据FP/B并发计算型及一些函数的几何语义可较为直接地获得一个脉动阵列的硬件描述。文末给出一个例子加以说明。 相似文献
9.
We introduce a class of transformations that modify the representation of dynamic data structures used in programs with the objective of compressing their sizes. Based upon a profiling study of data value characteristics, we have developed the common‐prefix and narrow‐data transformations that respectively compress a 32 bit address pointer and a 32 bit integer field into 15 bit entities. A pair of fields that have been compressed by the above compression transformations are packed together into a single 32 bit word. The above transformations are designed to apply to data structures that are partially compressible, that is, they compress portions of data structures to which transformations apply and provide a mechanism to handle the data that is not compressible. The accesses to compressed data are efficiently implemented by designing data compression extensions (DCX) to the processor's instruction set. We have observed average reductions in heap allocated storage of 25% and average reductions in execution time and power consumption of 30%. If DCX support is not provided the reductions in execution times fall from 30% to 18%. Copyright © 2006 John Wiley & Sons, Ltd. 相似文献