Similar literature
 Found 20 similar documents (search time: 31 ms)
1.
Parallel prefix circuits are parallel algorithms performing the prefix operation for the combinational circuit model. The size of a prefix circuit is the number of operation nodes in the circuit, and the depth is the maximum level of operation nodes. A circuit with n inputs is depth-size optimal if its depth plus size equals 2n−2. Smaller depth implies faster computation, while smaller size implies less power consumption, smaller VLSI area, and lower cost. A circuit should have a small fan-out and small depth for it to be of practical use. In this paper, we present a new approach to easing the design of parallel prefix circuits, and construct a depth-size optimal parallel prefix circuit, named WE4, with fan-out 4. For many values of n, WE4 has the smallest depth among all known depth-size optimal prefix circuits with bounded fan-out.
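The size and depth definitions above can be made concrete with a small model. The following is a minimal sketch, not the WE4 construction: it builds the plain serial prefix circuit and checks that its depth plus size equals 2n−2. The names `Node`, `serial_prefix_circuit`, `size`, `depth`, `evaluate`, and `is_depth_size_optimal` are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """One operation node combining two earlier signals (hypothetical model)."""
    left: int   # index of the left operand signal
    right: int  # index of the right operand signal


def serial_prefix_circuit(n):
    """Serial circuit: each node combines the previous prefix with the next input.

    Signals 0..n-1 are the inputs; signal n+j is the output of node j.
    """
    nodes = []
    prev = 0  # the first prefix is simply input 0
    for i in range(1, n):
        nodes.append(Node(left=prev, right=i))
        prev = n + len(nodes) - 1
    return nodes


def size(nodes):
    """Size = number of operation nodes."""
    return len(nodes)


def depth(n, nodes):
    """Depth = maximum level of any operation node (inputs sit at level 0)."""
    level = [0] * n
    for node in nodes:
        level.append(1 + max(level[node.left], level[node.right]))
    return max(level)


def evaluate(nodes, xs, op):
    """Evaluate the circuit; returns the inputs followed by every node output."""
    signals = list(xs)
    for node in nodes:
        signals.append(op(signals[node.left], signals[node.right]))
    return signals


def is_depth_size_optimal(n, nodes):
    """Check the condition depth + size == 2n - 2 from the abstract."""
    return depth(n, nodes) + size(nodes) == 2 * n - 2
```

The serial circuit meets the optimality condition only by having the largest possible depth, n−1, which is why constructions such as the one in the abstract aim for small depth while keeping the sum at 2n−2.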

2.
Given n values $x_1, x_2, \ldots, x_n$ and an associative binary operation $\circ$, the prefix problem is to compute $x_1 \circ x_2 \circ \cdots \circ x_i$ for $1 \le i \le n$. Prefix circuits are combinational circuits for solving the prefix problem. For any n-input prefix circuit D with depth d and size s, if $d + s = 2n - 2$, then D is depth-size optimal. In general, a prefix circuit with a small depth is faster than one with a large depth. For prefix circuits with the same depth, a prefix circuit with a smaller fan-out occupies less area and is faster in VLSI implementation. This paper is on constructing parallel prefix circuits that are depth-size optimal with small depth and small fan-out. We construct a depth-size optimal prefix circuit H4 with fan-out 4. It has the smallest depth among all known depth-size optimal prefix circuits with a constant fan-out; furthermore, when $n \ge 136$, its depth is less than, or equal to, those of all known depth-size optimal prefix circuits with unlimited fan-out. A size lower bound of prefix circuits is also derived. Some properties related to depth-size optimality and size optimality are introduced; they are used to prove that H4 is depth-size optimal.
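To illustrate the prefix problem itself, here is a small sketch that computes all prefixes in two ways: serially, and with the well-known recursive-doubling scheme (Hillis-Steele / Kogge-Stone style). This is not the H4 circuit and is not depth-size optimal; it only shows that an associative operation allows the work to be regrouped into about lg n rounds. The function names are assumptions made for this example.

```python
from itertools import accumulate


def prefixes_serial(xs, op):
    """Compute x_1, x_1 o x_2, ..., x_1 o ... o x_n one step at a time."""
    return list(accumulate(xs, op))


def prefixes_recursive_doubling(xs, op):
    """Compute the same prefixes in ceil(lg n) rounds.

    In each round, every position i >= step combines the partial result
    `step` places to its left with its own partial result. All positions
    in a round are independent, so a circuit could evaluate them in parallel.
    Operand order is preserved, so op need not be commutative.
    """
    result = list(xs)
    step = 1
    rounds = 0
    while step < len(result):
        result = [
            op(result[i - step], result[i]) if i >= step else result[i]
            for i in range(len(result))
        ]
        step *= 2
        rounds += 1
    return result, rounds
```

String concatenation is a convenient test operation because it is associative but not commutative, so an operand-order mistake would show up immediately.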

3.
A family of parallel algorithms solving the prefix problem on the combinational circuit model is presented. These prefix circuits are waist-size optimal with waist 1 (WSO-1). They are not only building blocks for constructing fast depth-size optimal prefix circuits, but also fast problem-size-independent prefix circuits in their own right. When the problem size is greater than the circuit width, the presented prefix circuits may be much faster than any other prefix circuits of the same width, especially when the problem size is greater than or equal to twice the circuit width. The new prefix circuits are compared analytically with other representative prefix circuits to show how fast they are. They have the minimum depth and are the fastest among all WSO-1 prefix circuits of the same width and fan-out. Thus, they are better building blocks than other WSO-1 circuits for constructing fast depth-size optimal prefix circuits with the same fan-out.

4.
A New Class of Depth-Size Optimal Parallel Prefix Circuits   Total citations: 1, self-citations: 1, citations by others: 0
Given n values $x_1, x_2, \ldots, x_n$ and an associative binary operation $\circ$, the prefix problem is to compute $x_1 \circ x_2 \circ \cdots \circ x_i$ for $1 \le i \le n$. Many combinational circuits for solving the prefix problem, called prefix circuits, have been designed. It has been proved that the size $s(D(n))$ and the depth $d(D(n))$ of an n-input prefix circuit $D(n)$ satisfy the inequality $d(D(n)) + s(D(n)) \ge 2n - 2$; thus, a prefix circuit is depth-size optimal if $d(D(n)) + s(D(n)) = 2n - 2$. In this paper, we construct a new depth-size optimal prefix circuit SL(n). In addition, we can build depth-size optimal prefix circuits whose depth can be any integer between $d(SL(n))$ and $n - 1$. SL(n) has the same maximum fan-out $\lg n + 1$ as Snir's SN(n), but the depth of SL(n) is smaller; thus, SL(n) is faster. Compared with another optimal prefix circuit LYD(n), $d(LYD(n)) + 2 \ge d(SL(n)) \ge d(LYD(n))$. However, LYD(n) may have a fan-out of at most $2 \lg n - 2$, and the fan-out of LYD(n) is greater than that of SL(n) for almost all $n \ge 12$. Because an operation node with greater fan-out occupies more chip area and is slower in VLSI implementation, in most cases, SL(n) needs less area and may be faster than LYD(n). Moreover, it is much easier to design SL(n) than LYD(n).

5.
In this paper, we present lower and upper bounds on the size of limited width, bounded and unbounded fan-out parallel prefix circuits. The lower bounds on the sizes of such circuits are a function of the depth, width, and number of inputs. The size requirement of an N input bounded fan-out parallel prefix circuit having limited width W and extra depth k (the difference between allowed and minimum possible depth) is shown to be $\Omega(N \log_2 W / 2^k + N)$ for $k \le \log_2 W$. This implies that insisting on minimum depth causes the circuit size to be nonlinear, while as little as $\log_2 \log_2 W$ of extra depth can possibly reduce the size to linear. Also, we show that there is a clear difference between the two cases of bounded and unbounded fan-out by proving the size of a limited width, unbounded fan-out parallel prefix circuit lies between a lower bound of $\Omega((2 + 2^{1-k}/3)N)$ and an upper bound of $O((2 + 2^{1-k})N)$. Uniform, systolic constructions of limited width parallel prefix circuits are provided here and shown to be asymptotically optimal. By associating the width of the circuit with the number of processors and the fan-out capabilities of the circuit with the interconnection structure of a multiprocessor, time- and processor-efficient algorithms may be developed.

6.
We introduce a family of graphs C(n,i,s,a) that generalizes the binary search tree. The graphs represent logic circuits with fan-in i and restricted fan-out s that arise from n progressive additions of random gates to a starting circuit of a isolated nodes. We show via martingales that a suitably normalized version of the number of terminal nodes in binary circuits converges in distribution to a normal random variate. Received: 23 July 2003, Published online: 29 October 2004

7.
Prefix computation is one of the fundamental problems that can be used in many applications such as fast adders. Most proposed parallel prefix circuits assume that the circuit is of the same width as the input size.

8.
The circuit value update problem is the problem of updating values in a representation of a combinational circuit when some of the inputs are changed. We assume for simplicity that each combinational element has bounded fan-in and fan-out and can be evaluated in constant time. This problem is easily solved on an ordinary serial computer in O(W+D) time, where W is the number of elements in the altered subcircuit and D is the subcircuit's embedded depth (its depth measured in the original circuit). In this paper we show how to solve the circuit value update problem efficiently on a P-processor parallel computer. We give a straightforward synchronous, parallel algorithm that runs in expected time. Our main contribution, however, is an optimistic, asynchronous, parallel algorithm that runs in expected time, where W and D are the size and embedded depth, respectively, of the "volatile" subcircuit, the subcircuit of elements that have inputs which either change or glitch as a result of the update. To our knowledge, our analysis provides the first analytical bounds on the running time of an optimistic, asynchronous, parallel algorithm. Received November 1, 1995, and in final form November 25, 1996.
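The serial baseline mentioned above can be sketched as a level-by-level propagation that touches only elements whose inputs actually changed. This is an illustrative sketch of that serial idea only, not the paper's synchronous or optimistic parallel algorithms; the data layout (`gates`, `fanout`, `level`, `values`) and the function name `update_circuit` are assumptions made for this example.

```python
from collections import defaultdict


def update_circuit(gates, fanout, level, values, changed_inputs):
    """Propagate input changes through a combinational circuit.

    gates:  gate name -> (function, list of input signal names)
    fanout: signal name -> list of gate names that read it
    level:  gate name -> depth of the gate in the original circuit (>= 1)
    values: signal name -> current value (updated in place)
    changed_inputs: primary input name -> new value
    Returns the number of gates that were re-evaluated.
    """
    buckets = defaultdict(set)  # level -> gates waiting to be re-evaluated
    top = 0                     # highest level scheduled so far

    def schedule(signal):
        nonlocal top
        for gate in fanout.get(signal, ()):
            buckets[level[gate]].add(gate)
            top = max(top, level[gate])

    for name, new_value in changed_inputs.items():
        if values[name] != new_value:
            values[name] = new_value
            schedule(name)

    evaluated = 0
    lvl = 1
    while lvl <= top:
        # Readers of a gate sit at strictly higher levels, so the set being
        # iterated is never modified while we walk it.
        for gate in buckets.pop(lvl, ()):
            func, inputs = gates[gate]
            new_value = func(*(values[i] for i in inputs))
            evaluated += 1
            if new_value != values[gate]:
                values[gate] = new_value
                schedule(gate)
        lvl += 1
    return evaluated
```

Processing gates in level order guarantees each one is evaluated at most once per update, after all of its inputs have settled; the work is proportional to the number of touched gates plus the number of levels scanned.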

9.
Finding the area and perimeter of the union/intersection of a set of iso-rectangles is a very important part of circuit extraction in VLSI design. We combine two techniques, the uniform grid and the vertex neighborhoods, to develop a new parallel algorithm for the area and perimeter problems which has an average linear time performance but is not worst-case optimal. The uniform grid technique has been used to generate the candidate vertices of the union or intersection of the rectangles. An efficient point-in-rectangles inclusion test filters the candidate set to obtain the relevant vertices of the union or intersection. Finally, the vertex neighborhood technique is used to compute the mass properties from these vertices. This algorithm has an average time complexity of O(((n + k)/p) + log p) where n is the number of input rectangle edges with k intersections on p processors assuming a PRAM model of computation. The analysis of the algorithm on a SIMD architecture is also presented. This algorithm requires very simple data structures which makes the implementation easy. We have implemented the algorithm on a Sun 4/280 workstation and a Connection Machine. The sequential implementation performs better than the optimal algorithm for large datasets. The parallel implementation on a Connection Machine CM-2 with 32K processors also shows good results.

10.
11.
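Lü Xiaolan, 《测控技术》 (Measurement & Control Technology), 2014, 33(2): 127-129
Considering the strengths and weaknesses of existing diminished-1 modulo $2^n+1$ adders, an efficient carry-select diminished-1 modulo $2^n+1$ adder is designed. In the carry computation of the modular adder, carry-select computation replaces the conventional carry computation, which markedly reduces the number of prefix operations needed to compute the carries. Analysis and experimental results show that, for relatively large n, the carry-select diminished-1 modulo $2^n+1$ adder effectively improves integration density while maintaining a high operating speed.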
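For readers unfamiliar with diminished-1 arithmetic, the following sketch shows what such an adder computes at the arithmetic level. It assumes the usual convention that a nonzero value A is stored as A − 1 in n bits and that a zero result is signalled separately; it uses the standard end-around inverted carry rule and does not model the carry-select structure described in the abstract. The function name `dim1_add` is hypothetical.

```python
def dim1_add(a_star, b_star, n):
    """Add two nonzero operands given in diminished-1 form (value minus 1).

    a_star, b_star: n-bit codes representing A = a_star + 1 and B = b_star + 1,
                    with A and B in 1..2**n.
    Returns (s_star, is_zero). s_star is the n-bit diminished-1 code of
    (A + B) mod (2**n + 1); is_zero is True when that result is 0, which has
    no n-bit diminished-1 code (s_star is then meaningless).
    """
    mask = (1 << n) - 1
    total = a_star + b_star
    carry_out = total >> n                       # 0 or 1
    is_zero = total == mask                      # A + B == 2**n + 1
    s_star = (total + (1 - carry_out)) & mask    # add the inverted carry back in
    return s_star, is_zero
```

For example, with n = 4, the operands 5 and 7 are stored as 4 and 6; their sum 10 produces no carry, so 1 is added and the code 11 represents the result 12.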

12.
Computer-Aided Design, 1986, 18(9): 489-496
As the integration size of VLSI chips increases, programmable logic arrays (PLAs) are indispensable for reducing chip design time. In many cases, however, PLAs are too large or too slow. A new approach to overcoming these disadvantages is presented here. It produces a circuit with a smaller area and faster response time than PLAs while maintaining the short design time. The approach works as follows: electronic circuits are automatically designed by an algorithm, DIMN, directly from given logic functions, and then a symbolic layout for them is automatically made by a program, SIMON.

13.
Simulating Boolean Circuits on a DNA Computer   Total citations: 6, self-citations: 0, citations by others: 6
M. Ogihara, A. Ray. Algorithmica, 1999, 25(2-3): 239-250
We demonstrate that DNA computers can simulate Boolean circuits with a small overhead. Boolean circuits embody the notion of massively parallel signal processing and are frequently encountered in many parallel algorithms. Many important problems such as sorting, integer arithmetic, and matrix multiplication are known to be computable by small size Boolean circuits much faster than by ordinary sequential digital computers. This paper shows that DNA chemistry allows one to simulate large semi-unbounded fan-in Boolean circuits with a logarithmic slowdown in computation time. Also, for the class $\mathrm{NC}^1$, the slowdown can be reduced to a constant. In this algorithm we have encoded the inputs, the Boolean AND gates, and the OR gates to DNA oligonucleotide sequences. We operate on the gates and the inputs by standard molecular techniques of sequence-specific annealing, ligation, separation by size, amplification, sequence-specific cleavage, and detection by size. Additional steps of amplification are not necessary for $\mathrm{NC}^1$ circuits. The feasibility of the DNA algorithm has been successfully tested on a small circuit by actual biochemical experiments. Received May 29, 1997; revised February 15, 1998.

14.
In this paper we study small depth circuits that contain threshold gates (with or without weights) and parity gates. All circuits we consider are of polynomial size. We prove several results which complete the work on characterizing possible inclusions between many classes defined by small depth circuits. These results are the following:
1.  A single threshold gate with weights cannot in general be replaced by a polynomial fan-in unweighted threshold gate of parity gates.
2.  On the other hand it can be replaced by a depth 2 unweighted threshold circuit of polynomial size. An extension of this construction is used to prove that whatever can be computed by a depth d polynomial size threshold circuit with weights can be computed by a depth d+1 polynomial size unweighted threshold circuit, where d is an arbitrary fixed integer.
3.  A polynomial fan-in threshold gate (with weights) of parity gates cannot in general be replaced by a depth 2 unweighted threshold circuit of polynomial size.

15.
A parallel algorithm to generate the dominance graph on a collection of nonoverlapping iso-oriented rectangles is presented. This graph arises from the constraint graph commonly used in compaction algorithms for VLSI circuits. The dominance graph expresses the notion of aboveness on a collection of nonoverlapping rectangles: it is the directed graph which contains an edge from a rectangle b to rectangle c iff c is immediately above b. The algorithm is based on the divide and conquer paradigm; in the EREW PRAM model, it has time complexity $O(\log^2 n)$, using $n/\log n$ processors. Its processor-time product is $O(n \log n)$, which is optimal.

16.
The design of a floating point matrix-vector multiplication processor array for VLSI, which has an optimal area-time complexity product, is presented. This processor array is capable of performing the function (where n = 1,…, N) and can be applied in many digital signal processing applications, by simply changing the matrix coefficients stored in that array. Each N-bit mantissa, M-bit exponent (N, M) processor element of the array comprises a mantissa multiplier/adder circuit and hardware to handle the floating point control. The multiplier/adder circuit is implemented by a new optimal algorithm, which is regular, recursive and fast. In addition, the algorithm offers a highly local and regular interconnection network, which is a fundamental requirement in VLSI circuit design methodology.

17.
The problem of partitioning appears in several areas ranging from VLSI and parallel programming to molecular biology. The interest in finding an optimal partition, especially in VLSI, has been a hot issue in recent years. In VLSI circuit partitioning, the problem of obtaining a minimum cut is of prime importance. With current trends, partitioning with multiple objectives, which include power, delay and area in addition to minimum cut, is in vogue. In this paper, we engineer three iterative heuristics for the optimization of VLSI netlist bi-partitioning. These heuristics are based on Genetic Algorithms (GAs), Tabu Search (TS) and Simulated Evolution (SimE). Fuzzy rules are incorporated in order to handle the multi-objective cost function. For SimE, fuzzy goodness functions are designed for delay and power, and proved efficient. A series of experiments are performed to evaluate the efficiency of the algorithms. ISCAS-85/89 benchmark circuits are used and experimental results are reported and analyzed to compare the performance of GA, TS and SimE. Further, we compared the results of the iterative heuristics with a modified FM algorithm, named PowerFM, which targets power optimization. PowerFM performs better in terms of power dissipation for smaller circuits. For larger circuits, SimE outperforms PowerFM in terms of all three objectives: delay, number of nets cut, and power dissipation.

18.
We consider the class of unbounded fan-in depth three Boolean circuits, for which the bottom fan-in is limited by k and the top gate is an OR. It is known that the smallest such circuit computing the parity function has gates (for $k = O(n^{1/2})$) for some , and this was the best lower bound known for explicit (P-time computable) functions. In this paper, for k = 2, we exhibit functions in uniform $\mathrm{NC}^1$ that require size depth 3 circuits. The main tool is a theorem that shows that any circuit on n variables that accepts a inputs and has size s must be constant on a projection (subset defined by equations of the form $x_i = 0$, $x_i = 1$, $x_i = x_j$ or x i = ) of dimension at least log(a/s)log n. Received: April 1, 1997.

19.
We study the power of constant-depth circuits containing negation gates, unbounded fan-in AND and OR gates, and a small number of MAJORITY gates. It is easy to show that a depth 2 circuit of size O(n) (where n is the number of inputs) containing O(n) MAJORITY gates can determine whether the sum of the input bits is divisible by k, for any fixed k > 1, whereas it is known that this requires exponential size circuits if we have no MAJORITY gates. Our main result is that a constant-depth circuit of size containing $n^{o(1)}$ MAJORITY gates cannot determine if the sum of the input bits is divisible by k; moreover, such a circuit must give the wrong answer on a constant fraction of the inputs. This result was previously known only for k = 2. We prove this by obtaining an approximate representation of the behavior of constant-depth circuits by multivariate complex polynomials.
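The "easy to show" upper bound can be illustrated with one standard two-level construction; whether the paper uses exactly this one is an assumption. It models each MAJORITY gate as a general threshold gate (a threshold can be obtained from MAJORITY by padding with constant inputs) and treats negations as free. The function names `threshold` and `divisible_by_k` are hypothetical.

```python
def threshold(bits, t):
    """Threshold gate: outputs 1 iff at least t of the input bits are 1."""
    return 1 if sum(bits) >= t else 0


def divisible_by_k(bits, k):
    """Decide whether sum(bits) is divisible by k using two levels of gates.

    Level 1: for every multiple m of k in 0..n, two threshold gates test
             "sum >= m" and "sum >= m + 1"; the second output is negated.
    Level 2: one threshold gate over all level-1 outputs.
    Each pair contributes 2 when sum == m and 1 otherwise, so the total
    exceeds the number of pairs exactly when the sum hits some multiple of k.
    """
    n = len(bits)
    multiples = range(0, n + 1, k)
    level1 = []
    for m in multiples:
        level1.append(threshold(bits, m))            # sum >= m
        level1.append(1 - threshold(bits, m + 1))    # NOT (sum >= m + 1)
    return threshold(level1, len(multiples) + 1)
```

The circuit uses roughly 2n/k + 3 threshold gates, which is O(n) for fixed k, matching the size stated in the abstract.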

20.
Abstract

This paper describes nPSA, a parallel algorithm for distributed asynchronous simulation of digital circuits with nominal delays in a massively parallel SIMD environment. Glitch detection and suppression are included, together with a discussion of other factors, such as reconvergent fan-out and feedback lines. A new set of metrics is also proposed for evaluation purposes. nPSA combines demand-driven and deadlock avoidance protocols in order to deliver high performance compared to typical synchronous parallel simulators. Although its performance greatly depends on the quality of circuit embedding on the host machine, nPSA is independent of the computer architecture and communication protocol used.
