期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Decimal Square Root: Algorithm and Hardware Implementation

Adel Hosseiny Ghassem Jaberipur 《Circuits, Systems, and Signal Processing》2016,35(12):4195-4219

We propose a new digit recurrence decimal square root (DSR) design and provide its ASIC implementation. The interim square root digits are in \([ {-5,5} ]\). The proposed architecture generally follows that of a previous radix-10 divider. However, it provides novel solutions with regard to few DSR-specific challenges. For example, complex error analysis shows that only four (out of sixteen) digits of partial square root is sufficient to estimate partial remainders that are required for the more complicated square root digit selection. This design performs about 10 % faster and consumes 28 % less area than the previously reported ASIC digit recurrence decimal square rooter. 相似文献

2.

Fast radix-2 division with quotient-digit prediction

Milo&#; D. Ercegovac Tomas Lang 《The Journal of VLSI Signal Processing》1989,1(3):169-180

An implementation of a radix-2 division unit is presented that uses prediction of the quotient digit. This prediction allows the concurrent computation of the quotient digit and the partial remainder. To achieve a simple quotient-digit selection, resulting in a step time roughly half of that of SRT division (without prediction), a simple estimate of the partial remainder is used, which requires that the divisor be scaled close to unity. This prescaling is simple to implement and increases the execution time by two cycles. We estimate a speed-up of 1.5 with respect to SRT division with redundant remainders. 相似文献

3.

Radix-8 division with over-redundant digit set

Paolo Montuschi Luigi Ciminiera 《The Journal of VLSI Signal Processing》1994,7(3):259-270

We present a radix-8 divider that uses an over-redundant digit set for the quotient in order to obtain simple digit selection rules. We show that the proposed enlarged set of values for the quotient digit does not lead to increases both in the complexity and the delay of the adder required to update the remainder, with respect to similar solutions, since the values allowed for the quotient digit have been selected carefully. The digit selection process is subdivided into two concurrent steps, each one making reference to a secondary digit set and the resulting implementation can be cheaper and faster than other units which do not use over-redundant digit sets. A performance analysis estimates a speed improvement from 25% to 35% with respect to a radix-8 architecture by Fandrianto, and from 21% to 30% with respect to a radix-4 architecture with prescaling, presented by Ercegovac and Lang. As required from the IEEE 754 floating point standard, the proposed algorithm features the correct remainder of the division. 相似文献

4.

Decimal Division Algorithms: The Issue of Partial Remainders

Amir Kaivani Seok-Bum Ko 《Journal of Signal Processing Systems》2013,73(2):181-188

The efficiency of decimal digit-recurrence division algorithms is totally affected by the number representations of the quotient, the divisor and partial remainders participated in quotient digit selection (QDS). This paper establishes general rules and conditions for QDS with operands represented in the generalized signed-digit format. As a result of this generalization, a universal convergence condition is introduced which obviates the unnecessary conservatism of previous algorithms and hence paves the way for more correct and efficient decimal division hardware designs. It is also concluded that keeping the partial remainders in minimally redundant symmetric signed-digit representation (with digit-set [?5,6])and applying into QDS the divisor represented in minimally asymmetric non-redundant signed-digit format (with digit-set [?4,5]) lead to the smallest minimum precision required, of the divisor and the partial remainder, for QDS and thus faster and simpler division algorithm. Moreover, it is shown that even in case of using non-redundant partial remainders (for the sake of lower area cost); minimally asymmetric signed-digit representation brings about more efficiency. The suggested representations are applied to the fastest previous decimal digit-recurrence divider and 10 % speed-up is achieved while keeping the area cost approximately unaltered. 相似文献

5.

Fast Decimal Floating-Point Division

Nikmehr H. Phillips B. Lim C.-C. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2006,14(9):951-961

A new implementation for decimal floating-point (DFP) division is introduced. The algorithm is based on high-radix SRT division The SRT division algorithm is named after D. Sweeney, J. E. Robertson, and T. D. Tocher. with the recurrence in a new decimal signed-digit format. Quotient digits are selected using comparison multiples, where the magnitude of the quotient digit is calculated by comparing the truncated partial remainder with limited precision multiples of the divisor. The sign is determined concurrently by investigating the polarity of the truncated partial remainder. A timing evaluation using a logic synthesis shows a significant decrease in the division execution time in contrast with one of the fastest DFP dividers reported in the open literature 相似文献

6.

Radix-2 SRT division algorithm with simple quotient digit selection

Burgess N. 《Electronics letters》1991,27(21):1910-1911

A new and fast algorithm for SRT division that combines a modified version of the Svoboda algorithm with the radix-2 signed-digit number system is presented. The quotient bit selection is a simple function of the two most significant digits of the current partial remainder, and the two operands are each prescaled by a single subtraction.<> 相似文献

7.

High-speed complex-number multiplications based on redundant binary representation of partial products

Kyung-Wook Shin Heung-Woo Jeon 《International Journal of Electronics》2013,100(6):683-702

The complex-number multiplier is one of the key arithmetic components for the baseband signal processing of modern digital communication systems such as channel equalization, timing recovery, modulation and demodulation. This paper presents two algorithms suitable for a high-speed complex-number multiplier, which are based on redundant binary (RB) representation of partial products. The basic idea behind our approach is to convert a pair of binary partial products into a RB form so that the post-addition/subtraction which is inevitable in the conventional methods based on binary multiplication, is eliminated. With the proposed algorithms, the complex-number multiplication is reduced to two RB multiplications, one for the real part and the other for the imaginary part. The RB multiplication is defined by an addition of RB partial products, and is performed in parallel without carry propagation from the least-significant digit to the most-significant digit. This work results not only in simplified arithmetic operations, but also in highly parallel and simple architecture when compared with conventional methods using binary multiplications. To demonstrate the algorithms, two test chips have been implemented using a 0.8µm CMOS technology. 相似文献

8.

Minimizing the complexity of SRT tables

Oberman S.F. Flynn M.J. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》1998,6(1):141-149

This paper presents an analysis of the complexity of quotient digit selection tables in SRT division implementations. SRT dividers are widely used in VLSI systems to compute floating-point quotients. These dividers use a fixed number of partial remainder and divisor bits to consult a table to select the next quotient-digit in each iteration. This analysis derives the allowable divisor and partial remainder truncations for radix 2 through radix 32, and it quantifies the relationship between table parameters and the complexity of the tables. Several techniques are presented for further minimizing table complexity. By mapping the tables to a library of standard-cells, delay and area values were measured and are presented for table configurations through radix 32. Several conclusions are drawn based on this data which impacts optimized SRT divider designs 相似文献

9.

Fast VLSI algorithms for division and square root

S. E. McQuillan J. V. McCanny 《The Journal of VLSI Signal Processing》1994,8(2):151-168

Real time digital signal processing demands high performance implementations of division and square root. This can only be achieved by the design of fast and efficient arithmetic algorithms which address practical VLSI architectural design issues. In this paper, new algorithms for division and square root are described. The new schemes are based on pre-scaling the operands and modifying the classical SRT method such that the result digits and the remainders are computed concurrently and the computations in adjacent rows are overlapped. Consequently, their performance exceeds that of the SRT methods. The hardware cost for higher radices is considerably more than that of the SRT methods but for many applications, this is not prohibitive. A system of equations is presented which enables both an analysis of the method for any radix and the parameters of implementations to be easily determined. This is illustrated for the case of radix 2 and radix 4. In addition, a highly regular array architecture combining the division and square root method is described. 相似文献

10.

The continued fraction as an information source (Corresp.)

《IEEE transactions on information theory / Professional Technical Group on Information Theory》1984,30(4):671-674

Nearly every digit (positive-integer partial quotient) of the simple continued fraction representing a random number carries3.4325bits of information about the number. This is reduced by 0.0081 bit if the preceding digit is already known and by a further0.0007bit if the second preceding digit is also known, Any finite sequence of digits is equally likely to occur forward and backward. Hence, the digit (source output symbol) following any digit likewise gives0.0081bit of information about the latter, and all succeeding (or preceding) digits together give0.0088bit despite the unbounded range of the statistical dependence among digits. 相似文献

11.

An efficient maximum-redundancy radix-8 SRT division andsquare-root method

Hobson R.F. Fraser M.W. 《Solid-State Circuits, IEEE Journal of》1995,30(1):29-38

A new approach to integrating hardware multiplication, division, and square-root is presented. We use a fully integrated control path which simultaneously reduces part of the redundant partial-remainder and performs a truncated multiplication of the next quotient or square-root digit by the divisor or square-root value. A separate (parallel) full precision iterative multiplier is used for partial remainder production. Strategic details of a radix-8 implementation are discussed. It is shown that a maximally redundant digit set is a viable choice for high performance in this case 相似文献

12.

High-speed VLSI arithmetic processor architectures using hybrid number representation

H. R. Srinivas Keshab K. Parhi 《The Journal of VLSI Signal Processing》1992,4(2-3):177-198

This paper addresses design of high speed architectures for fixed-point, two's-complement, bit-parallel division, square-root, and multiplication operations. These architectures make use of hybrid number representations (i.e. the input and output numbers are represented using two's complement representation, and the internal numbers are represented using radix-2 redundant representation). We propose newshifted remainder conditioning, andsign multiplexing techniques in combination with novel circuit architecture approaches to obtain efficient divider and square-root architectures. Our divider exploits full dynamic range of operands and eliminates the need for on-line or off-line conversion of the result to binary (this is because our nonrestoring division and square-root operators output binary quotient). Furthermore, since the binary input set is a subset of the redundant digit set, no binary-to-redundant number conversion is necessary at the input of the divider and square-root operators. We also present a fast, new conversion scheme for converting radix-2 redundant numbers to two's complement binary numbers, and use this to design a bit-parallel multiplier. This multiplier architecture requires fewer pipelining latches than conventional two's complement multipliers, and reduces the latency of the multiplication operation from (2W–1) to aboutW (whereW is the word-length), when pipelined at the bit-level.This research was supported by the Office of Naval Research under contract number N00014-J-91-1008. 相似文献

13.

Decimal SRT Square Root: Algorithm and Architecture

Amir Kaivani Seok-Bum Ko 《Circuits, Systems, and Signal Processing》2013,32(5):2137-2150

Given the popularity of decimal arithmetic, hardware implementation of decimal operations has been a hot topic of research in recent decades. Besides the four basic operations, the square root can be implemented as an instruction directly in the hardware, which improves the performance of the decimal floating-point unit in the processors. Hardware implementation of decimal square rooters is usually done using either functional or digit-recurrence algorithms. Functional algorithms, entailing multiplication per iteration, seem inadequate to use for decimal square roots, given the high cost of decimal multipliers. On the other hand, digit-recurrence square root algorithms, particularly SRT (this method is named after its creators, Sweeney, Robertson, and Tocher) algorithms, are simple and well suited for decimal arithmetic. This paper, with the intention of reducing the latency of the decimal square root operation while maintaining a reasonable cost, proposes an SRT algorithm and the corresponding hardware architecture to compute the decimal square root. The proposed fixed-point square root design requires n+3 cycles to compute an n-digit root; the synthesis results show an area cost of about 31K NAND2 and a cycle time of 40 FO4. These results reveal the 14 % speed advantage of the proposed decimal square root architecture over the fastest previous work (which uses a functional algorithm) with about a quarter of the area. 相似文献

14.

Floating-point division and square root using a Taylor-series expansion algorithm

Taek-Jun Kwon Jeffrey Draper 《Microelectronics Journal》2009,40(11):1601-1605

Hardware support for floating-point (FP) arithmetic is a mandatory feature of modern microprocessor design. Although division and square root are relatively infrequent operations in traditional general-purpose applications, they are indispensable and becoming increasingly important in many modern applications. Therefore, overall performance can be greatly affected by the algorithms and the implementations used for designing FP-Div and FP-Sqrt units. In this paper, a single-precision fused floating-point multiply/divide/square root unit based on Taylor-series expansion algorithm is proposed. We extended an existing multiply/divide fused unit to incorporate the square root function with little area and latency overhead since Taylor's theorem enables us to compute approximations for many well-known functions with very similar forms. The implementation results of the proposed fused unit based on standard cell methodology in IBM 90 nm technology exhibits that the incorporation of square root function to an existing multiply/divide unit requires only a modest 18% area increase and the same low latency for divide and square root operation can be achieved (12 cycles). The proposed arithmetic unit exhibits a reasonably good area-performance balance. 相似文献

15.

Complex Square Root with Operand Prescaling

Milo&#; D. Ercegovac Jean-Michel Muller 《The Journal of VLSI Signal Processing》2007,49(1):19-30

We propose a radix-r digit-recurrence algorithm for complex square-root. The operand is prescaled to allow the selection of square-root digits by rounding of the residual. This leads to a simple hardware implementation of digit selection. Moreover, the use of digit recurrence approach allows correct rounding of the result if needed. The algorithm, compatible with the complex division presented in Ercegovac and Muller (“Complex Division with Prescaling of the Operands,” in Proc. Application-Specific Systems, Architectures, and Processors (ASAP’03), The Hague, The Netherlands, June 24–26, 2003), and its design are described. We also give rough estimates of its latency and cost with respect to implementation based on standard floating-point instructions as used in software routines for complex square root.

Jean-Michel MullerEmail:

相似文献

16.

Further Reducing the Redundancy of a Notation Over a Minimally Redundant Digit Set

Marc Daumas David W. Matula 《The Journal of VLSI Signal Processing》2003,33(1-2):7-18

Redundant notations are used implicitly or explicitly in many digital designs. They have been studied in details and a general framework is known to reduce the redundancy of a notation down to the minimally redundant digit set. We present here an operator to further reduce the redundancy of such a representation. It does not reduce the number of allowed digits since removing one digit to a minimally redundant digit set is a conversion to a non redundant digit set and this is an expensive operation. Our operator introduces some correlation between the digits to reduce the number of possible redundant notations for any represented number. This reduction is visible in small useful operators like the elimination of leading zeros. We also present a key application with a CMOS Booth recoded multiplier. Our multiplier is able to accept both a redundant or a non redundant input with very little modifications and almost no penalty in time or space compared to state-of-the-art non redundant multipliers. 相似文献

17.

Radix-4 Vectoring CORDIC Algorithm and Architectures

J. Villalba E.L. Zapata E. Antelo J.D. Bruguera 《The Journal of VLSI Signal Processing》1998,19(2):127-147

In this work we extend the radix-4 CORDIC algorithm to the vectoring mode (the radix-4 CORDIC algorithm was proposed recently by the authors for the rotation mode). The extension to the vectoring mode is not straightforward, since the digit selection function is more complex in the vectoring case than in the rotation case; as in the rotation mode, the scale factor is not constant. Although the radix-4 CORDIC algorithm in vectoring mode has a similar recurrence as the radix-4 division algorithm, there are specific issues concerning the vectoring algorithm that demand dedicated study. We present the digit selection for nonredundant and redundant arithmetic (following two different approaches: arithmetic comparisons and table look-up), the computation and compensation of the scale factor, and the implementation of the algorithm (with both types of digit selection) in a word-serial architecture. When compared with conventional radix-2 (redundant and non-redundant) architectures, the radix-4 algorithms present a significant speed up for angle calculation. For the computation of the magnitude the speed up is very slight, due to the nonconstant scale factor in the radix-4 algorithm. 相似文献

18.

An Improved Genetic Algorithm with Quasi-Gradient Crossover

Xiao-Ling Zhang Li Du Guang-Wei Zhang Qiang Miao Zhong-Lai Wang 《中国电子科技》2008,6(1):47-51

The convergence of genetic algorithm is mainly determined by its core operation crossover operation. When the objective function is a multiple hump function, traditional genetic algorithms are easily trapped into local optimum, which is called premature conver- gence. In this paper, we propose a new genetic algorithm with improved arithmetic crossover operation based on gradient method. This crossover operation can generate offspring along quasi-gradient direction which is the Steepest descent direction of the value of objective function. The selection operator is also simplified, every individual in the population is given an opportunity to get evolution to avoid complicated selection algorithm. The adaptive mutation operator and the elitist strategy are also applied in this algorithm. The case 4 indicates this algorithm can faster converge to the global optimum and is more stable than the conventional genetic algorithms. 相似文献

19.

Area Efficient Sequential Decimal Fixed-point Multiplier

Liu Han Amir Kaivani Seok-Bum Ko 《Journal of Signal Processing Systems》2014,75(1):39-46

In this paper, a new architecture is proposed to reduce the area cost and power consumption of the decimal fixed-point multiplier. In the proposed sequential architecture, the partial product generation and selection cycles are reduced to one. Moreover, the elaborately selected easy multiples reduce the hardware requirement of the partial products selector. Subsequently, two partial products are accumulated with the iteration result in a redundant decimal format by a multi-operand redundant adder. The lower-significant half digits of the final product are iteratively converted in every cycle. On the other hand, the higher-significant half digits are converted by a carry-propagation adder in two extra cycles. After all, the area of the whole architecture is reduced significantly by not only the simpler partial product generation and accumulation architecture, but also the less registers. The synthesized result shows that the proposed sequential multiplier has a lower area cost and reasonable computation latency. 相似文献

20.

Novel Pipelined Architecture for Efficient Evaluation of the Square Root Using a Modified Non-Restoring Algorithm

Imtiaz Sajid M. M. Ahmed Sotirios G. Ziavras 《Journal of Signal Processing Systems》2012,67(2):157-166

The square root is a basic arithmetic operation in image and signal processing. We present a novel pipelined architecture to implement N-bit fixed-point square root operation on an FPGA using a non-restoring pipelined algorithm that does not require floating-point hardware. Pipelining hazards in its hardware realization are avoided by modifying the classic non-restoring algorithm, thus resulting in a 13% improved latency. Furthermore, the proposed architecture is flexible allowing modification as per individual application needs. It is demonstrated that the proposed architecture is approximately four times faster than its popular counterparts and at the same time it consumes 50% less energy for envelope detection at 268 MHz sampling rate. 相似文献