
1.

High-speed VLSI arithmetic processor architectures using hybrid number representation

H. R. Srinivas Keshab K. Parhi 《The Journal of VLSI Signal Processing》1992,4(2-3):177-198

This paper addresses the design of high-speed architectures for fixed-point, two's-complement, bit-parallel division, square-root, and multiplication operations. These architectures make use of hybrid number representations (i.e. the input and output numbers are represented using two's complement representation, and the internal numbers are represented using radix-2 redundant representation). We propose new shifted remainder conditioning and sign multiplexing techniques in combination with novel circuit architecture approaches to obtain efficient divider and square-root architectures. Our divider exploits the full dynamic range of operands and eliminates the need for on-line or off-line conversion of the result to binary (this is because our nonrestoring division and square-root operators output binary quotient). Furthermore, since the binary input set is a subset of the redundant digit set, no binary-to-redundant number conversion is necessary at the input of the divider and square-root operators. We also present a fast, new conversion scheme for converting radix-2 redundant numbers to two's complement binary numbers, and use this to design a bit-parallel multiplier. This multiplier architecture requires fewer pipelining latches than conventional two's complement multipliers, and reduces the latency of the multiplication operation from (2W–1) to about W (where W is the word-length), when pipelined at the bit-level. This research was supported by the Office of Naval Research under contract number N00014-J-91-1008.

2.

On recoding in arithmetic algorithms

Miloš D. Ercegovac Tomás Lang 《The Journal of VLSI Signal Processing》1996,14(3):283-294

Recoding is the process of transforming between digit sets. It is used to reduce the cost and delay of the implementation of arithmetic algorithms, such as digit-recurrence and parallel algorithms for multiplication, division/square-root, and in compound operations. We present a simple and systematic basis for developing these recodings.
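As an illustration of what transforming between digit sets means, the following minimal sketch recodes a radix-4 number from the digit set {0, 1, 2, 3} into the signed digit set {-2, ..., 2}, the form commonly used to reduce the number of partial products in multipliers. It is a textbook-style example, not the systematic method of the paper; the function names `to_radix4`, `recode_radix4`, and `value` are hypothetical.

```python
def to_radix4(n, num_digits):
    """Split a non-negative integer into radix-4 digits, least significant first."""
    return [(n >> (2 * i)) & 3 for i in range(num_digits)]


def recode_radix4(digits):
    """Recode digits in {0,1,2,3} (LSB first) into digits in {-2,...,2}.

    Each position produces an interim digit w in {-2,-1,0,1} and a transfer
    t in {0,1} to the next position. The transfer depends only on the digit
    itself, so there is no carry chain. The result is one digit longer.
    """
    interim = []
    transfers = [0]                  # no transfer into the least significant digit
    for d in digits:
        if d >= 2:
            interim.append(d - 4)    # borrow 4 here, pay it back as +1 one position up
            transfers.append(1)
        else:
            interim.append(d)
            transfers.append(0)
    interim.append(0)                # extra top position receives the last transfer
    return [w + t for w, t in zip(interim, transfers)]


def value(digits, radix=4):
    """Evaluate a digit vector (LSB first) in the given radix."""
    return sum(d * radix ** i for i, d in enumerate(digits))
```

For example, `recode_radix4([3, 2, 1])` yields `[-1, -1, 2, 0]`, which still evaluates to 27.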

3.

Further Reducing the Redundancy of a Notation Over a Minimally Redundant Digit Set

Marc Daumas David W. Matula 《The Journal of VLSI Signal Processing》2003,33(1-2):7-18

Redundant notations are used implicitly or explicitly in many digital designs. They have been studied in detail and a general framework is known to reduce the redundancy of a notation down to the minimally redundant digit set. We present here an operator to further reduce the redundancy of such a representation. It does not reduce the number of allowed digits since removing one digit from a minimally redundant digit set is a conversion to a nonredundant digit set and this is an expensive operation. Our operator introduces some correlation between the digits to reduce the number of possible redundant notations for any represented number. This reduction is visible in small useful operators like the elimination of leading zeros. We also present a key application with a CMOS Booth recoded multiplier. Our multiplier is able to accept either a redundant or a nonredundant input with very little modification and almost no penalty in time or space compared to state-of-the-art nonredundant multipliers.

4.

High-speed complex-number multiplications based on redundant binary representation of partial products

Kyung-Wook Shin Heung-Woo Jeon 《International Journal of Electronics》2013,100(6):683-702

The complex-number multiplier is one of the key arithmetic components for the baseband signal processing of modern digital communication systems such as channel equalization, timing recovery, modulation and demodulation. This paper presents two algorithms suitable for a high-speed complex-number multiplier, which are based on redundant binary (RB) representation of partial products. The basic idea behind our approach is to convert a pair of binary partial products into an RB form so that the post-addition/subtraction, which is inevitable in the conventional methods based on binary multiplication, is eliminated. With the proposed algorithms, the complex-number multiplication is reduced to two RB multiplications, one for the real part and the other for the imaginary part. The RB multiplication is defined by an addition of RB partial products, and is performed in parallel without carry propagation from the least-significant digit to the most-significant digit. This work results not only in simplified arithmetic operations, but also in a highly parallel and simple architecture when compared with conventional methods using binary multiplications. To demonstrate the algorithms, two test chips have been implemented using a 0.8µm CMOS technology.

5.

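Design of a high-performance parallel fully redundant decimal multiplier

Zhang Liu, Cui Xiaoping, Dong Wenwen 《Acta Electronica Sinica》2018,46(6):1519-1523

The demand for high-precision computation in fields such as commercial computing and financial analysis places increasingly stringent requirements on hardware decimal arithmetic. In existing fully redundant decimal multipliers, the structural complexity of the fully redundant adder has become a bottleneck for performance improvement. This paper presents an optimized fully redundant adder based on the overloaded decimal digit set (ODDS) to reduce this complexity, and designs a new decimal compression tree module based on that adder. In the partial product generation module, signed radix-10 recoding and redundant binary coded decimal (BCD) encoding are used to generate the decimal partial products quickly. In the final product generation module, an optimized code conversion circuit quickly produces the BCD-8421 product. Experimental results show that the proposed parallel fully redundant decimal multiplier is relatively fast and occupies relatively little area.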

6.

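A new variable-radix fast multiplication hardware algorithm for public-key cryptosystems

Gai Weixin 《Acta Electronica Sinica》1995,23(11):77-80

This paper proposes a new variable-radix fast multiplication hardware algorithm. First, the algorithm uses a redundant representation of binary numbers, so that two large numbers (512 bits or more) can be added in O(1) time without waiting for carries. Second, it introduces the idea of variable-radix fast multiplication, which makes the algorithm 33% faster than radix-4 multiplication and 11% faster than radix-8 multiplication while being simpler to implement in hardware. The algorithm also overcomes the severe slowdown of radix-8 multiplication under bad and worst-case conditions. It is a new fast algorithm that can serve effectively as the core operation in hardware VLSI implementations of many public-key cryptosystems (such as RSA).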
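The constant-time addition mentioned in this abstract can be illustrated with a carry-save accumulator. The sketch below assumes a carry-save (sum, carry) pair as the redundant form, which may differ from the representation actually used in the paper. The function name `carry_save_add` is hypothetical, and Python integers stand in for 512-bit registers.

```python
def carry_save_add(s, c, x):
    """Add x to the redundant pair (s, c) so that new_s + new_c == s + c + x.

    Every bit position is handled by an independent full adder, so in
    hardware the delay does not grow with the word length.
    """
    new_s = s ^ c ^ x                               # bitwise sum without carries
    new_c = ((s & c) | (s & x) | (c & x)) << 1      # carries, moved one position up
    return new_s, new_c


# Accumulate several wide operands, resolving carries only once at the end.
operands = [(1 << 511) + 12345, (1 << 510) + 67890, (1 << 512) - 1]
s, c = 0, 0
for x in operands:
    s, c = carry_save_add(s, c, x)
total = s + c    # the single carry-propagate addition
```

Only the final `s + c` needs a conventional carry-propagating adder, so its cost is paid once rather than on every accumulation step.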

7.

Weighted two-valued digit-set encodings: unifying efficient hardware representation schemes for redundant number systems

Jaberipur G. Parhami B. Ghodsi M. 《IEEE transactions on circuits and systems. I, Regular papers》2005,52(7):1348-1357

We introduce the notion of two-valued digit (twit) as a binary variable that can assume one of two different integer values. Posibits, or simply bits, in {0,1} and negabits in {-1,0}, commonly used in two's-complement representations and (n,p) encoding of binary signed digits, are special cases of twits. A weighted bit-set (WBS) encoding, which generalizes the two's-complement encoding by allowing one or more posibits and/or negabits in each radix-2 position, has been shown to unify many efficient implementations of redundant number systems. A collection of equally weighted twits, including ones with noncontiguous values (e.g., {-1,1} or {0,2}), can lead to wider representation range without the added storage and interconnection costs associated with multivalued digit sets. We present weighted twit-set (WTS) encodings as a generalization of WBS encodings, examine key properties of this new class of encodings, and show that any redundant number system (e.g., generalized signed-digit and hybrid-redundant systems), including those that are based on noncontiguous and/or zero-excluded digit sets, is faithfully representable by WTS encoding. We highlight this broad coverage by a tree chart having WTS representations at its root and various useful redundant representations at its many internal nodes and leaves. We further examine how highly optimized conventional components such as standard full/half-adders and compressors may be used for arithmetic on WTS-encoded operands, thus allowing highly efficient and VLSI-friendly circuit implementations. For example, focusing on the WBS-like subclass of WTS encodings, we describe a twit-based implementation of a particular stored-transfer representation which offers area and speed advantages over other similar designs based on WBS and hybrid-redundant representations.
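The twit idea can be made concrete with a small data model. The sketch below is one possible reading of the abstract, not the paper's formalism. The `Twit` class, its field names, and the helper functions are hypothetical, and the negabit is assumed to store 1 for the value -1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Twit:
    position: int   # radix-2 position, giving the weight 2**position
    value0: int     # integer value represented when the stored bit is 0
    value1: int     # integer value represented when the stored bit is 1
    bit: int        # the stored bit

    def value(self) -> int:
        digit = self.value1 if self.bit else self.value0
        return digit * (1 << self.position)


def posibit(position, bit):
    return Twit(position, 0, 1, bit)      # values in {0, 1}


def negabit(position, bit):
    return Twit(position, 0, -1, bit)     # values in {-1, 0}; stored 1 means -1 (assumption)


def wts_value(twits):
    """Value of a weighted twit-set: the sum of all weighted twit values."""
    return sum(t.value() for t in twits)


def twos_complement_twits(bits):
    """Two's complement (LSB first) seen as posibits plus one negabit at the MSB."""
    *low, msb = bits
    return [posibit(i, b) for i, b in enumerate(low)] + [negabit(len(low), msb)]
```

In this model a binary signed digit in (n,p) encoding is a negabit and a posibit at the same position, and a twit with values {-1, 1} is `Twit(position, -1, 1, bit)`.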

8.

A fast VLSI adder architecture

Srinivas H.R. Parhi K.K. 《Solid-State Circuits, IEEE Journal of》1992,27(5):761-767

An architecture for performing fixed-point, high-speed, two's-complement, bit-parallel addition by using the carry-free property of redundant arithmetic and a fast parallel redundant-to-binary conversion scheme is presented. The internal numbers are represented in radix-2 redundant digit form, and the inputs and the output of the adder are represented in two's-complement binary form. The adder operands are added first in a radix-2 redundant adder to produce the result in radix-2 digit (-1, 0, 1) form. This result is converted to two's-complement binary form using the parallel conversion scheme. The high-speed conversion for long words is achieved through the use of a novel sign-select operation. The proposed adder, referred to as the sign-select conversion adder, is faster than all previous high-speed two's-complement binary adders for large word lengths. The implementation is highly regular with repeated modules and is very well suited for VLSI implementation.
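The two phases described in this abstract, carry-free addition into digits (-1, 0, 1) followed by conversion to two's complement, can be sketched as follows. This is a simplified illustration that assumes unsigned operands and converts with an ordinary subtraction of the negative digits from the positive digits. It does not reproduce the paper's sign-select conversion. All function names are hypothetical.

```python
def to_bits(n, width):
    """Bits of a non-negative integer, least significant first."""
    return [(n >> i) & 1 for i in range(width)]


def redundant_add(x_bits, y_bits):
    """Add two unsigned binary vectors (LSB first) without carry propagation.

    Each position sum p = x + y in {0,1,2} is split as p = 2*t + w with a
    transfer t in {0,1} and an interim digit w in {-1,0}. The output digit
    w + t_in lies in {-1,0,1} and depends on one neighbour only.
    """
    assert len(x_bits) == len(y_bits)
    interim, transfers = [], [0]
    for x, y in zip(x_bits, y_bits):
        p = x + y
        if p == 0:
            interim.append(0)
            transfers.append(0)
        elif p == 1:
            interim.append(-1)
            transfers.append(1)
        else:  # p == 2
            interim.append(0)
            transfers.append(1)
    interim.append(0)    # top digit receives the final transfer
    return [w + t for w, t in zip(interim, transfers)]


def to_twos_complement(digits, width):
    """Convert digits in {-1,0,1} (LSB first) to a width-bit two's complement pattern."""
    positive = sum(1 << i for i, d in enumerate(digits) if d == 1)
    negative = sum(1 << i for i, d in enumerate(digits) if d == -1)
    return (positive - negative) & ((1 << width) - 1)
```

For instance, `redundant_add(to_bits(3, 2), to_bits(1, 2))` gives `[0, 0, 1]`, which converts to 4. The subtraction in `to_twos_complement` is the step where carry propagation reappears, which is the part the abstract's parallel conversion scheme is designed to accelerate.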

9.

Memristor based N-bits redundant binary adder

《Microelectronics Journal》2015,46(3):207-213

This paper introduces a memristor based N-bits redundant binary adder architecture for canonic signed digit code (CSDC) as a step towards a memristor based multilevel ALU. New possible solutions for multi-level logic designs can be established by utilizing the memristor dynamics as a basis in the circuit realization. The proposed memristor-based redundant binary adder circuit tries to achieve the theoretical advantages of the redundant binary system, and to eliminate the carry (borrow) propagation using signed digit representation. The advantage of carry elimination in the addition process is that it makes the speed independent of the operand length, which speeds up all arithmetic operations. One memristor is sufficient for both the addition process and for storing the final result as a memory cell. The adder operation has been validated via different cases for 1-bit and 3-bits addition using the HP memristor model and PSPICE simulation results.

10.

Design, Implementation and Analysis of a New Redundant CORDIC Processor with Constant Scaling Factor and Regular Structure

Shen-Fu Hsiao Jen-Yin Chen 《The Journal of VLSI Signal Processing》1998,20(3):267-278

A new high-speed redundant CORDIC processor is designed and implemented based on the double rotation method, which turns out to be the two-dimensional (2D) Householder CORDIC, a special case of the generalized Householder CORDIC in the 2D Euclidean vector space. The new processor has the advantages of regular structure and high throughput rate. The pipelined structure with radix-2 signed-digit (SD) redundant arithmetic is adopted to reduce the carry-propagation delay of the adders while the digit-serial structure alleviates the burden of the hardware cost and I/O requirement. Compared to previously proposed designs, the new CORDIC processor preserves the constant scaling factor, an important merit of the original CORDIC, and thus does not require any complicated division or square-root operations for variable scaling factor calculation. Furthermore, the processor is well suited to VLSI implementation since it does not call for any irregularly inserted correcting iterations. Both angle calculation mode for computing trigonometric functions and vector rotation mode for plane rotations are supported. Practical VLSI chip implementation of the fixed-point redundant CORDIC processor using a 0.6 µm standard cell library is given, including detailed numerical error analysis.

11.

Fully redundant decimal addition and subtraction using stored-unibit encoding

Amir Kaivani 《Integration, the VLSI Journal》2010,43(1):34-41

Decimal computer arithmetic is experiencing a revived popularity, and there is a quest for high-performance decimal hardware units. Successful experiences on binary computer arithmetic may find grounds in decimal arithmetic. For example, the traditional fully redundant (i.e., the result and both of the operands are represented in a redundant format) and semi-redundant (i.e., the result and only one of the operands are redundant) binary addition schemes have influenced the design and implementation of similar decimal arithmetic units. However, special comparison and correction steps are required when decimal arithmetic algorithms are implemented on binary hardware. To circumvent these difficulties, alternative encodings of decimal digits and a variety of decimal arithmetic algorithms have been examined by many researchers over decades. In this paper we offer a new redundant decimal digit set [−8, 9] and a fully redundant addition/subtraction scheme. The proposed digit set, faithfully encoded as a mix of posibits, negabits, and unibits, is shown to obviate the need for any compare-to-9 operations and leads to minimal penalty subtraction using the addition circuitry. Moreover, conversion from the standard BCD encoding to the proposed stored-unibit encoding is possible with the latency of one logic level. However, the reverse conversion, like any other redundant to nonredundant conversion, involves carry propagation.

12.

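A 10-bit 50 MSPS CMOS pipelined A/D converter

Wu Cheng, Liu Wenping, Quan Haiyang, Luo Laihua 《Microelectronics》2004,34(6):682-684,688

This paper describes a high-speed, high-precision CMOS pipelined A/D converter with a 50 MHz operating frequency and 10-bit resolution. The design uses a double-sampling technique to raise the effective sampling rate; because redundant digital correction is applied, low-power dynamic comparators can be used. The unit structure of the converter is optimized, and the main circuits are analyzed.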

13.

Radix-4 Vectoring CORDIC Algorithm and Architectures

J. Villalba E.L. Zapata E. Antelo J.D. Bruguera 《The Journal of VLSI Signal Processing》1998,19(2):127-147

In this work we extend the radix-4 CORDIC algorithm to the vectoring mode (the radix-4 CORDIC algorithm was proposed recently by the authors for the rotation mode). The extension to the vectoring mode is not straightforward, since the digit selection function is more complex in the vectoring case than in the rotation case; as in the rotation mode, the scale factor is not constant. Although the radix-4 CORDIC algorithm in vectoring mode has a similar recurrence as the radix-4 division algorithm, there are specific issues concerning the vectoring algorithm that demand dedicated study. We present the digit selection for nonredundant and redundant arithmetic (following two different approaches: arithmetic comparisons and table look-up), the computation and compensation of the scale factor, and the implementation of the algorithm (with both types of digit selection) in a word-serial architecture. When compared with conventional radix-2 (redundant and non-redundant) architectures, the radix-4 algorithms present a significant speed up for angle calculation. For the computation of the magnitude the speed up is very slight, due to the nonconstant scale factor in the radix-4 algorithm.

14.

Complex Square Root with Operand Prescaling

Miloš D. Ercegovac Jean-Michel Muller 《The Journal of VLSI Signal Processing》2007,49(1):19-30

We propose a radix-r digit-recurrence algorithm for complex square-root. The operand is prescaled to allow the selection of square-root digits by rounding of the residual. This leads to a simple hardware implementation of digit selection. Moreover, the use of digit recurrence approach allows correct rounding of the result if needed. The algorithm, compatible with the complex division presented in Ercegovac and Muller (“Complex Division with Prescaling of the Operands,” in Proc. Application-Specific Systems, Architectures, and Processors (ASAP’03), The Hague, The Netherlands, June 24–26, 2003), and its design are described. We also give rough estimates of its latency and cost with respect to implementation based on standard floating-point instructions as used in software routines for complex square root.


15.

Conversion of redundant binary into two's complement representations

Herrfeld A. Hentschke S. 《Electronics letters》1995,31(14):1132-1133

A static CMOS circuit that converts a redundant binary representation into a two's complement representation is presented. The structure and time delay of the resulting logic are identical to a standard carry look-ahead logic for adders. The resulting layout is very regular, has no diffusion gaps and can be expanded to any desired look-ahead length. The circuit can be used for both multiplication and division.

16.

A power-driven multiplication instruction-set design method for ASIPs

Wu-An Kuo TingTing Hwang Wu A.C.-H. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2006,14(1):81-85

This paper presents a novel power-driven multiplication instruction-set design method for application-specific instruction-set processors (ASIPs). Based on a dual-and-configurable-multiplier structure, our proposed method devises a multiplication instruction set for low-power ASIPs. Our method exploits the execution sequences of multiplication instructions and effective bit widths of variables to reduce power consumed by redundant multiplication bits while minimizing the multiplication execution time. Experimental results on a set of DSP programs demonstrate that our proposed method achieves significant power reduction (up to 18.53%) and execution time improvement (up to 10.43%) with 18% area overhead.

17.

Constant-time addition with hybrid-redundant numbers: Theory and implementations

Ghassem Jaberipur Behrooz Parhami 《Integration, the VLSI Journal》2008,41(1):49-64

Hybrid-redundant number representation has provided a flexible framework for digit-parallel addition in a manner that facilitates area-time tradeoffs for VLSI implementations via arbitrary spacing of redundant digit positions within an otherwise nonredundant representation. We revisit the hybrid redundancy scheme, pointing out limitations such as representational asymmetry, lack of representational closure in certain adder implementations, and difficulties in subtraction and carry acceleration. Given the intuitiveness of the hybrid redundancy concept and its potential for describing practically useful redundant number systems, we are motivated to extend it within the framework of weighted bit-set encodings to circumvent the aforementioned problems. The extension is based mainly on allowing negatively weighted bits (negabits), as well as standard posibits, to appear in nonredundant positions. Our extended hybrid redundancy scheme provides for arbitrary spacing of redundant positions in symmetric digit sets, without any degradation in arithmetic efficiency, while at the same time allowing low-latency subtraction by means of the same circuitry that is used for addition. Finally, we describe how inverted encoding of negabits leads to the exclusive use of unmodified standard full/half-adder, counter, and compressor cells, with no extra inverters, and to the direct applicability of conventional carry acceleration techniques in constant-time addition.

18.

Novel Radix Finite Field Multiplier for GF(2^m)

Mekhallalati M.C. Ashur A.S. Ibrahim M.K. 《Journal of Signal Processing Systems》1997,15(3):233-245

In this paper, a new High-Radix Finite Field multiplication algorithm for GF(2^m) is proposed for the first time. The proposed multiplication algorithm can operate in a Digit-serial fashion, and hence can give a trade-off between the speed, the area, the input/output pin limitation, and the low power consumption by simply varying the digit size. A detailed example of a new Radix-16 GF(2^m) Digit-Serial multiplication architecture adopting the proposed algorithm illustrates a speed improvement of 75% when compared to a conventional Radix-2 bit-serial realization. This is made more significant when it is noted that the speed improvement of 75% was achieved at the expense of only a 2.3 times increase in the hardware requirements of the proposed architecture.

19.

Novel Radix Finite Field Multiplier for GF(2^m)

M.C. Mekhallalati A.S. Ashur M.K. Ibrahim 《The Journal of VLSI Signal Processing》1997,15(3):233-245

In this paper, a new High-Radix Finite Field multiplication algorithm for GF(2^m) is proposed for the first time. The proposed multiplication algorithm can operate in a Digit-serial fashion, and hence can give a trade-off between the speed, the area, the input/output pin limitation, and the low power consumption by simply varying the digit size. A detailed example of a new Radix-16 GF(2^m) Digit-Serial multiplication architecture adopting the proposed algorithm illustrates a speed improvement of 75% when compared to a conventional Radix-2 bit-serial realization. This is made more significant when it is noted that the speed improvement of 75% was achieved at the expense of only a 2.3 times increase in the hardware requirements of the proposed architecture.
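The digit-size trade-off described in this abstract can be sketched in software. The following is a generic most-significant-digit-first digit-serial multiplier in a polynomial basis, written as an assumption about the general technique rather than a reconstruction of the authors' architecture. The function names are hypothetical, and `poly` is assumed to include the x^m term of the irreducible polynomial.

```python
def clmul(a, b):
    """Carry-less (polynomial) multiplication over GF(2)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def poly_mod(v, poly, m):
    """Reduce polynomial v modulo the degree-m polynomial poly."""
    for i in range(v.bit_length() - 1, m - 1, -1):
        if (v >> i) & 1:
            v ^= poly << (i - m)
    return v


def gf_mul_digit_serial(a, b, poly, m, digit_bits):
    """Multiply a and b in GF(2^m), consuming digit_bits bits of b per step.

    digit_bits = 1 corresponds to a bit-serial (radix-2) multiplier and
    digit_bits = 4 to radix-16; the number of steps is ceil(m / digit_bits).
    """
    steps = -(-m // digit_bits)
    mask = (1 << digit_bits) - 1
    acc = 0
    for k in reversed(range(steps)):
        digit = (b >> (k * digit_bits)) & mask
        # Shift the accumulator by one digit, add the partial product, reduce.
        acc = poly_mod((acc << digit_bits) ^ clmul(a, digit), poly, m)
    return acc
```

With the AES field polynomial `0x11B` and `m = 8`, `gf_mul_digit_serial(0x57, 0x83, 0x11B, 8, 4)` finishes in two steps where the bit-serial setting `digit_bits = 1` needs eight. Each step does more work, which mirrors the speed-versus-hardware trade-off the abstract describes.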

20.

A Radix-4 New Svoboda-Tung Divider with Constant Timing Complexity for Prescaling

Jen-Shiun Chiang Min-Shiou Tsai 《The Journal of VLSI Signal Processing》2003,33(1-2):117-124

A new floating-point division architecture that complies with the IEEE 754-1985 standard is proposed in this paper. This architecture is based on the New Svoboda-Tung (NST) division algorithm and the radix-4 MROR (maximally redundant maximally recoded) signed digit number system. In NST division, the divisor and dividend must be prescaled. We summarize a general systematic method to accomplish the prescaling, and we also propose a hardware scheme such that the timing complexity is constant regardless of the bit length of the divisor. For the divider implementation, a new MROR signed digit adder with carry-free characteristic is proposed for addition and subtraction, and this adder can improve the cycle time significantly. A 32-b/32-b radix-4 divider is thus designed in Verilog HDL; the simulation results show that this architecture is implementable using currently available libraries. The hardware complexity and performance of this divider are competitive with conventional SRT dividers.