首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
Adders are one of the basic fundamental critical arithmetic circuits in a system and their performances affect the overall performance of the system. Traditional n-bit adders provide precise results, whereas the lower bound of their critical path delay of n bit adder is (log n). To achieve a minimum critical path delay lower than (log n), many inaccurate adders have been proposed. These inaccurate adders decrease the overall critical path delay and improve the speed of computation by sacrificing the accuracy or predicting the computation results. In this work, a fast reconfigurable approximate ripple carry adder has been proposed using GDI (Gate Diffusion Logic) passing cell. Here, GDI cell acts as a reconfigurable cell to be either connected with the previous carry value or approximated value in an adder chain. This adder has greater advantage and it can be configured as an accurate or inaccurate adder by selecting working mode in GDI cell. The implementation results show that, in the approximate working mode, the proposed 64-bit adder provides up to 23%, 34% and 95% reductions in area, power and delay, respectively compared to those of the existing adder.  相似文献   

2.
In this paper we exhibit several new classes of hash functions with certain desirable properties, and introduce two novel applications for hashing which make use of these functions. One class contains a small number of functions, yet is almost universal2. If the functions hash n-bit long names into m-bit indices, then specifying a member of the class requires only O((m + log2log2(n)) · log2(n)) bits as compared to O(n) bits for earlier techniques. For long names, this is about a factor of m larger than the lower bound of m + log2n ? log2m bits. An application of this class is a provably secure authentication technique for sending messages over insecure lines. A second class of functions satisfies a much stronger property than universal2. We present the application of testing sets for equality.The authentication technique allows the receiver to be certain that a message is genuine. An “enemy”—even one with infinite computer resources—cannot forge or modify a message without detection. The set equality technique allows operations including “add member to set,” “delete member from set” and “test two sets for equality” to be performed in expected constant time and with less than a specified probability of error.  相似文献   

3.
Inner product computation is an important operation, invoked repeatedly in matrix multiplications. A high-speed inner product processor can be very useful (among many possible applications) in real-time signal processing. This paper presents the design of a fast inner product processor, with appreciably reduced latency and cost. The inner product processor is implemented with a tree of carry-propagate or carry-save adders; this structure is obtained with the incorporation of three innovations in the conventional multiply/add tree: (1) The leaf-multipliers are expanded into adder subtrees, thus achieving an O(log Nb) latency, where N denotes the number of elements in a vector and b the number of bits in each element. (2) The partial products, to be summed in producing an inner product, are reordered according to their “minimum alignments.” This reordering brings approximately a 20% saving in hardware—including adders and data paths. The reduction in adder widths also yields savings in carry propagation time for carry-propagate adders. (3) For trees implemented with carry-save adders, the partial product reordering also serves to truncate the carry propagation chain in the final propagation stage by 2 log b − 1 positions, thus significantly reducing the latency further. A form of the Baugh and Wooley algorithm is adopted to implement two's complement notation with changes only in peripheral hardware.  相似文献   

4.
Approximate graph coloring takes as input a graph and returns a legal coloring which is not necessarily optimal. We improve the performance guarantee, or worst-case ratio between the number of colors used and the minimum number of colors possible, toO(n(log logn)3/(logn)3), anO(logn/log logn) factor better than the previous best-known result.  相似文献   

5.
Parallel integer sorting using small operations   总被引:1,自引:0,他引:1  
We consider the problem of sortingn integers in the range [0,n c -1], wherec is a constant. It has been shown by Rajasekaran and Sen [14] that this problem can be solved optimally inO(logn) steps on an EREW PRAM withO(n) n -bit operations, for any constant >O. Though the number of operations is optimal, each operation is very large. In this paper, we show thatn integers in the range [0,n c -1] can be sorted inO(logn) time withO(nlogn)O(1)-bit operations andO(n) O(logn)-bit operations. The model used is a non-standard variant of an EREW PRAMtthat permits processors to have word-sizes ofO(1)-bits and (logn)-bits. Clearly, the speed of the proposed algorithm is optimal. Considering that the input to the problem consists ofO (n logn) bits, the proposed algorithm performs an optimal amount of work, measured at the bit level.This work was partially supported by The Northeast Parallel Architectures Center (NPAC) at Syracuse University, Syracuse, NY 13244 and The Rome Air Development Center, under contract F30602-88-D-0027.  相似文献   

6.
We present a randomized parallel algorithm that computes the greatest common divisor of two integers of n bits in length with probability 1−o(1) that takes O(nloglogn/logn) time using O(n6+?) processors for any ?>0 on the EREW PRAM parallel model of computation. The algorithm either gives a correct answer or reports failure.We believe this to be the first randomized sublinear time algorithm on the EREW PRAM for this problem.  相似文献   

7.
Summary Using modular arithmetic we obtain the following improved bounds on the time and space complexities for n × n Boolean matrix multiplication: O(n log 2 7 lognlogloglognloglogloglogn) bit operations and O(n 2loglog n) bits of storage on a logarithmic cost RAM having no multiply or divide instruction; O(n log 2 7(logn)2–1/2log 2 7(loglog n)1/2log 2 7–1) bit operations and O(n 2log n) bits of storage on a RAM which can use indirect addressing for table lookups. The first algorithm can be realized as a Boolean circuit with O(n log 2 7lognlogloglognloglogloglogn) gates. Whenever n×n arithmetic matrix multiplication can be performed in less than O(n log 2 7) arithmetic operations, our results have corresponding improvements.This work was supported in part by the Office of Naval Research under contract N00014-67-0204-0063, by the National Research Council of Canada under grant A4307, and by the National Science Foundation under grants MCS76-17321 and GJ-43332  相似文献   

8.
Suffix tree, suffix array, and directed acyclic word graph (DAWG) are data-structures for indexing a text. Although they enable efficient pattern matching, their data-structures require O(nlogn) bits, which make them impractical to index long text like human genome. Recently, the development of compressed data-structures allow us to simulate suffix tree and suffix array using O(n) bits. However, there is still no O(n)-bit data-structure for DAWG with full functionality. This work introduces an $n(H_{k}(\overline{S})+ 2 H_{0}^{*}(\mathcal {T}_{\overline{S}}))+o(n)$ -bit compressed data-structure for simulating DAWG (where $H_{k}(\overline{S})$ and $H_{0}^{*}(\mathcal{T}_{\overline{S}})$ are the empirical entropies of the reversed sequence and the reversed suffix tree topology, respectively.) Besides, we also propose an application of DAWG to improve the time complexity for the local alignment problem. In this application, the previously proposed solutions using BWT (a version of compressed suffix array) run in O(n 2 m) worst case time and O(n 0.628 m) average case time where n and m are the lengths of the database and the query, respectively. Using compressed DAWG proposed in this paper, the problem can be solved in O(nm) worst case time and the same average case time.  相似文献   

9.
A sweepline algorithm for Voronoi diagrams   总被引:4,自引:0,他引:4  
Steven Fortune 《Algorithmica》1987,2(1-4):153-174
We introduce a geometric transformation that allows Voronoi diagrams to be computed using a sweepline technique. The transformation is used to obtain simple algorithms for computing the Voronoi diagram of point sites, of line segment sites, and of weighted point sites. All algorithms haveO(n logn) worst-case running time and useO(n) space.  相似文献   

10.
11.
Signed digit (SD) number systems support digit-parallel carry-free addition, where the sum digits absorb the possible signed carries in {−1, 0, 1}. Radix-2h maximally redundant SD (MRSD) number systems are particularly attractive. The reason is that, with the minimal (h + 1) bits per SD, maximum range is achieved. There are speculative MRSD adders that trade increased area and power for higher speed, via simultaneous computation of three sum digits, while anticipating one of the possible signed carries. However, the nonspeculative approach that uses carry-save two’s complement encoding for intermediate sum digits, has proved to be more efficient.In this paper, we examine three previous nonspeculative MRSD adders and offer an improved design with significant savings in latency, area consumption and power dissipation. The enhanced performance is mainly due to the elimination of sign extension of the signed carries. The latter leads to a sum matrix of positively and negatively weighted bits that normally complicate the use of standard adder cells. However, with inverted encoding of negatively weighted bits, we manage to efficiently use such cells. The claimed performance measures are supported by 0.13 μm CMOS technology synthesis.  相似文献   

12.
We prove new lower bounds for nearest neighbor search in the Hamming cube. Our lower bounds are for randomized, two-sided error, algorithms in Yao's cell probe model. Our bounds are in the form of a tradeoff among the number of cells, the size of a cell, and the search time. For example, suppose we are searching among n points in the d dimensional cube, we use poly(n,d) cells, each containing poly(d, log n) bits. We get a lower bound of Ω(d/log n) on the search time, a significant improvement over the recent bound of Ω(log d) of Borodin et al. This should be contrasted with the upper bound of O(log log d) for approximate search (and O(1) for a decision version of the problem; our lower bounds hold in that case). By previous results, the bounds for the cube imply similar bounds for nearest neighbor search in high dimensional Euclidean space, and for other geometric problems.  相似文献   

13.
Sorting is a classic problem and one to which many others reduce easily. In the streaming model, however, we are allowed only one pass over the input and sublinear memory, so in general we cannot sort. In this paper we show that, to determine the sorted order of a multiset s of size n containing σ?2 distinct elements using one pass and o(nlogσ) bits of memory, it is generally necessary and sufficient that its entropy H=o(logσ). Specifically, if s={s1,…,sn} and si1,…,sin is the stable sort of s, then we can compute i1,…,in in one pass using O((H+1)n) time and O(Hn) bits of memory, with a simple combination of classic techniques. On the other hand, in the worst case it takes that much memory to compute any sorted ordering of s in one pass.  相似文献   

14.
We present semi-streaming algorithms for basic graph problems that have optimal per-edge processing times and therefore surpass all previous semi-streaming algorithms for these tasks. The semi-streaming model, which is appropriate when dealing with massive graphs, forbids random access to the input and restricts the memory to bits.Particularly, the formerly best per-edge processing times for finding the connected components and a bipartition are O(α(n)), for determining k-vertex and k-edge connectivity O(k2n) and O(n⋅logn) respectively for any constant k and for computing a minimum spanning forest O(logn). All these time bounds we reduce to O(1).Every presented algorithm determines a solution asymptotically as fast as the best corresponding algorithm up to date in the classical RAM model, which therefore cannot convert the advantage of unlimited memory and random access into superior computing times for these problems.  相似文献   

15.
We consider the following problem: Given an unsorted array of n elements, and a sequence of intervals in the array, compute the median in each of the subarrays defined by the intervals. We describe a simple algorithm which needs O(nlogk+klogn) time to answer k such median queries. This improves previous algorithms by a logarithmic factor and matches a comparison lower bound for k=O(n). The space complexity of our simple algorithm is O(nlogn) in the pointer machine model, and O(n) in the RAM model. In the latter model, a more involved O(n) space data structure can be constructed in O(nlogn) time where the time per query is reduced to O(logn/loglogn). We also give efficient dynamic variants of both data structures, achieving O(log2n) query time using O(nlogn) space in the comparison model and O((logn/loglogn)2) query time using O(nlogn/loglogn) space in the RAM model, and show that in the cell-probe model, any data structure which supports updates in O(logO(1)n) time must have Ω(logn/loglogn) query time.Our approach naturally generalizes to higher-dimensional range median problems, where element positions and query ranges are multidimensional—it reduces a range median query to a logarithmic number of range counting queries.  相似文献   

16.
The connectivity is an important criteria to measure the fault-tolerant performance of a graph. However, the connectivity based on the condition of the set of arbitrary faulty nodes is generally lower. In this paper, in order to heighten this measure, we introduce the restricted connectivity into bijective connection networks. First, we prove that the probability that all the neighbors of an arbitrary node becomes faulty in any n-dimensional bijective connection network Xn is very low when n becomes sufficient large. Then, we give a constructive proof that under the condition that each node of an n-dimensional bijective connection network Xn has at least one fault-free neighbor, its restricted connectivity is 2n − 2, about half of the connectivity of Xn. Finally, by our constructive proof, we give an O(n) algorithm to get a reliable path of length at most n + 3⌈log2F∣⌉ + 1 between any two fault-free nodes in an n-dimensional bijective connection network. In particular, since the family of BC networks contains hypercubes, crossed cubes, Möbius cubes, etc., our algorithm is appropriate for these cubes.  相似文献   

17.
Summary In this paper a new data structure is described for performing member and neighbor queries in O(logn) time that allows for O(1) worst-case update time once the position of the inserted or deleted element is known. In this way previous solutions that achieved only O(1) amortized time or O(log* n) worst-case time are improved. The method is based on a combinatorial result on the height of piles that are split after some fixed number of insertions. This combinatorial result is interesting in its own right and might have other applications as well.  相似文献   

18.
Approximate graph coloring takes as input a graph and returns a legal coloring which is not necessarily optimal. We improve the performance guarantee, or worst-case ratio between the number of colors used and the minimum number of colors possible, toO(n(log logn)3/(logn)3), anO(logn/log logn) factor better than the previous best-known result.The work of the first author was supported by Air Force Grant AFOSR-86-0078 and NSF PYI Grant 8657527-CCR. The work of the second author was supported by a National Science Foundation Graduate Fellowship.  相似文献   

19.
We design a 3-bit adder or a radix-8 full adder (FA) in quantum-dot cellular automata (QCA), where the 3-bit carry propagation path can be accommodated in one clock-zone. To achieve this, we introduce group majority signals similar to group propagate and generate signals in parallel prefix computations, use them to reformulate the carry expressions of a previous radix-4 FA, and as such we could extend it to higher radix FAs. Applying the aforementioned new interpretation of carry expressions (via group majority signals) on 3-bit adders, results in that only a single clock cycle is required for 12-bit (vs. the previous 8-bit) carry propagation, across four radix-8 FAs. Based on the proposed radix-8 QCA-FA, we realized 8-, 16-, 32-, 64, and 128-bit QCA adders via QCADesigner. Comparison of these adders with the previous radix-4 experiment, showed 9–41% speed up, and 57–76% area saving, for 16–128-bit adders, respectively. On the other hand, compared to the best previous radix-2 design, for the same bit widths, we experienced 57–172% speed up, but at the cost of 138–4% area increase, except for the 64 and 128-bit cases, where we also experienced 19% and 41% area saving, respectively.  相似文献   

20.
We present a parallel algorithm for solving thenext element search problemon a set of line segments, using a BSP-like model referred to as thecoarse grained multicomputer(CGM). The algorithm requiresO(1) communication rounds (h-relations withh=O(n/p)),O((n/p) log n) local computation, andO((n/p) log p) memory per processor, assumingn/pp. Our result implies solutions to the point location, trapezoidal decomposition, and polygon triangulation problems. A simplified version for axis-parallel segments requires onlyO(n/p) memory per processor, and we discuss an implementation of this version. As in a previous paper by Develliers and Fabri (Int. J. Comput. Geom. Appl.6(1996), 487–506), our algorithm is based on a distributed implementation of segment trees which are of sizeO(n log n). This paper improves onop. cit.in several ways: (1) It studies the more general next element search problem which also solves, e.g., planar point location. (2) The algorithms require onlyO((n/p) log n) local computation instead ofO(log p*(n/p) log n). (3) The algorithms require onlyO((n/p) log p) local memory instead ofO((n/p) log n).  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号