
1.

Low‐Power and Low‐Hardware Bit‐Parallel Polynomial Basis Systolic Multiplier over GF(2^m) for Irreducible Polynomials

Sudha Ellison Mathe Lakshmi Boppana 《ETRI Journal》2017,39(4):570-581

Multiplication in finite fields is used in many applications, especially in cryptography. It is a basic operation and the most computationally intensive of the finite field operations. Several systolic multipliers that offer low hardware complexity or high speed have been proposed in the literature. In this paper, a bit‐parallel polynomial basis systolic multiplier for generic irreducible polynomials is proposed based on a modified interleaved multiplication method. The hardware complexity and delay of the proposed multiplier are estimated, and a comparison with the corresponding multipliers available in the literature is presented. The proposed multiplier achieves a reduction in hardware complexity of up to 20% compared to the best of the corresponding multipliers for m = 163. The synthesis results of application‐specific integrated circuit and field‐programmable gate array implementations of the proposed multiplier are also presented. From the synthesis results, it is inferred that the proposed multiplier achieves low power consumption and low area complexity when compared to the best of the corresponding multipliers.
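As a point of reference for the abstract above, the following is a minimal software sketch of the textbook MSB-first interleaved multiplication in a polynomial basis. It is not the paper's modified method or its systolic architecture. The function name and the integer encoding (bit i holds the coefficient of x^i, and `poly` includes the x^m term) are assumptions made for illustration.

```python
def gf2m_mul_interleaved(a: int, b: int, poly: int, m: int) -> int:
    """Multiply a and b in GF(2^m), polynomial basis, MSB-first.

    Assumes a, b < 2**m and that `poly` is the irreducible polynomial
    of degree m with its x^m term included (e.g. 0x11B for m = 8).
    """
    result = 0
    for i in range(m - 1, -1, -1):
        # Multiply the running result by x ...
        result <<= 1
        # ... and reduce immediately if the degree reached m.
        if (result >> m) & 1:
            result ^= poly
        # Add (XOR) the multiplicand when bit i of b is set.
        if (b >> i) & 1:
            result ^= a
    return result


if __name__ == "__main__":
    # GF(2^8) with x^8 + x^4 + x^3 + x + 1, the worked example in FIPS-197.
    print(hex(gf2m_mul_interleaved(0x57, 0x83, 0x11B, 8)))
```

Interleaving the reduction with the accumulation keeps every intermediate value within m + 1 bits, which is the property that makes the method attractive for regular hardware arrays.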

2.

Systolic and Non-Systolic Scalable Modular Designs of Finite Field Multipliers for Reed–Solomon Codec

《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2009,17(6):747-757

In this paper, we present efficient algorithms for modular reduction to derive novel systolic and non-systolic architectures for polynomial basis finite field multipliers over $GF(2^{m})$ to be used in Reed–Solomon (RS) codec. Using the proposed algorithm for unit degree reduction and optimization of implementation of the logic functions in the processing elements (PEs), we have derived an efficient bit-parallel systolic design for finite field multiplier which involves nearly two-thirds of the area-complexity of the existing design having the same time-complexity. The proposed modular reduction algorithms are also used to derive efficient non-systolic serial/parallel designs of field multipliers over $GF(2^{8})$ with different digit-sizes, where the critical path and the hardware-complexity are further reduced by optimizing the implementation of modular reduction operations and finite field accumulations. The proposed bit-serial design involves nearly 55% of the minimum of area, and half the minimum of area-time complexity of the existing bit-serial designs. Similarly, the proposed digit-serial/parallel designs involve significantly less area, and less area-time complexities compared with the existing designs of the same digit-size. By parallel modular reduction through multiple degrees followed by appropriate logic-level sub-expression sharing, a hardware-efficient regular and modular form of a balanced-tree bit-parallel non-systolic multiplier is also derived. The proposed bit-parallel non-systolic pipelined design involves less than 65% of the area and nearly two-thirds of the area-time complexity of the existing bit-parallel design for an RS codec, while the non-pipelined form offers nearly 25% saving of area with less time-complexity.

3.

Low-Power Finite Impulse Response (FIR) Filter Design Using Two-Dimensional Logarithmic Number System (2DLNS) Representations

Mahzad Azarmehr Majid Ahmadi 《Circuits, Systems, and Signal Processing》2012,31(6):2075-2091

In most real-time DSP applications, high performance is a prime target. Here, performance may be interpreted as a combination of higher speed, lower power consumption, sufficient precision, and VLSI area efficiency. Experience has shown that efficient digital multiplication is a prerequisite for high-speed DSP applications. The MDLNS, which has similar properties to the classical LNS, is an alternative approach to conventional number systems for performing multiplication using small parallel adders. In addition, by applying a recursive multiplication scheme, larger word length multiplication can be performed using several small multipliers. The concept of recursive multiplication can be applied to 2DLNS structures, resulting in more efficient digital multipliers. In this work, the recursive 2DLNS-based multipliers have been applied to FIR filter design. These applications demonstrate the superiority of our architectures in terms of VLSI area and power consumption.

4.

An Efficient Look-up Table-based Approach for Multiplication over GF(2^m) Generated by Trinomials

Bimal K. Meher Pramod K. Meher 《Circuits, Systems, and Signal Processing》2013,32(6):2623-2638

In this paper, we present an efficient look-up table (LUT)-based approach to design multipliers for GF(2^m) generated by irreducible trinomials. A straightforward LUT-based multiplication requires a table of size (m×2^m) bits for the Galois field of degree m. The LUT size, therefore, becomes quite large for the fields of large degrees recommended by the National Institute of Standards and Technology (NIST). Keeping that in view, we have proposed a digit-serial LUT-based design, where operand bits are grouped into digits of fixed width, and multiplication is performed in serial/parallel manner. We restrict the digit size to 4 to store only 16 words in the LUT to have lower area-delay complexity. We have also proposed a digit-parallel LUT-based design for high-speed applications, using the same LUT as the digit-serial design, at the cost of some additional multiplexors and combinational logic for parallel modular reductions and additions. We have presented a simple circuit for the initialization of LUT content, which can be used to update the LUT in three cycles whenever required. The proposed digit-serial design involves less area-complexity and less time-complexity than those of the existing LUT-based designs. The proposed digit-parallel design offers nearly 28% improvement in area-delay product over the best of the existing LUT-based designs. NIST has recommended five binary finite fields for elliptic curve cryptography, out of which two are generated by the trinomials Q(x) = x^233 + x^74 + 1 and Q(x) = x^409 + x^87 + 1. In this paper, we have designed a reconfigurable multiplier that can be used for both these fields. The proposed reconfigurable multiplier is shown to have a negligible reconfiguration overhead and would be useful for cryptographic applications.
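The digit-serial LUT idea in the abstract above can be illustrated in software. The sketch below assumes that the 16-word table holds the products of one operand with every 4-bit digit, reduced modulo Q(x). The abstract does not spell out the table contents, so this layout, the function names, and the MSD-first scan order are illustrative assumptions rather than the authors' design.

```python
def reduce_mod(v: int, poly: int, m: int) -> int:
    """Reduce the binary polynomial v modulo poly (degree m, x^m term included)."""
    while v.bit_length() > m:
        # Cancel the current leading term with a shifted copy of poly.
        v ^= poly << (v.bit_length() - 1 - m)
    return v


def build_lut(a: int, poly: int, m: int, w: int = 4) -> list:
    """Table of d(x) * a(x) mod poly for every w-bit digit d (16 words for w = 4)."""
    lut = [0] * (1 << w)
    for d in range(1, 1 << w):
        acc = 0
        for j in range(w):
            if (d >> j) & 1:
                acc ^= a << j
        lut[d] = reduce_mod(acc, poly, m)
    return lut


def gf2m_mul_lut(a: int, b: int, poly: int, m: int, w: int = 4) -> int:
    """Digit-serial multiplication, most significant digit of b first."""
    lut = build_lut(a, poly, m, w)
    mask = (1 << w) - 1
    num_digits = -(-m // w)  # ceil(m / w)
    result = 0
    for k in range(num_digits - 1, -1, -1):
        # Shift the accumulator by one digit, reduce, then add the table word.
        result = reduce_mod(result << w, poly, m)
        result ^= lut[(b >> (k * w)) & mask]
    return result


if __name__ == "__main__":
    # One of the NIST trinomials quoted in the abstract: x^233 + x^74 + 1.
    Q233 = (1 << 233) | (1 << 74) | 1
    # x^232 * x = x^233, which reduces to x^74 + 1.
    print(gf2m_mul_lut(1 << 232, 2, Q233, 233) == ((1 << 74) | 1))
```

With a digit width of 4 the table size is independent of m, which is the contrast the abstract draws with the m×2^m-bit straightforward table.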

5.

A digit-serial multiplier for finite field GF(2^m)

Chang Hoon Kim Chun Pyo Hong Soonhak Kwon 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2005,13(4):476-483

In this paper, an efficient digit-serial systolic array is proposed for multiplication in finite field GF(2^m) using the standard basis representation. From the least significant bit first multiplication algorithm, we obtain a new dependence graph and design an efficient digit-serial systolic multiplier. If input data come in continuously, the proposed array can produce multiplication results at a rate of one every ⌈m/L⌉ clock cycles, where L is the selected digit size. Analysis shows that the computational delay time of the proposed architecture is significantly less than that of the previously proposed digit-serial systolic multiplier. Furthermore, since the new architecture has the features of regularity, modularity, and unidirectional data flow, it is well suited to VLSI implementation.
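To make the cycle count in the abstract above concrete, here is a rough software model of least-significant-bit-first multiplication that consumes L bits of one operand per modelled clock cycle. It only mirrors the arithmetic and the ⌈m/L⌉ iteration count. The systolic dependence graph is not modelled, and the function name and return shape are assumptions for illustration.

```python
def gf2m_mul_digit_serial(a: int, b: int, poly: int, m: int, L: int):
    """LSB-first GF(2^m) multiplication, L bits of b per modelled cycle.

    Returns (product, cycles). Assumes a, b < 2**m and that `poly`
    includes its x^m term.
    """
    cycles = -(-m // L)  # ceil(m / L)
    mask = (1 << L) - 1
    result = 0
    for c in range(cycles):
        digit = (b >> (c * L)) & mask
        for j in range(L):
            # Accumulate a * x^(c*L + j) when the matching bit of b is set.
            if (digit >> j) & 1:
                result ^= a
            # Advance a to a * x mod poly for the next bit position.
            a <<= 1
            if (a >> m) & 1:
                a ^= poly
    return result, cycles


if __name__ == "__main__":
    # m = 8, L = 4 gives 2 cycles; L = 3 gives 3 cycles.
    print(gf2m_mul_digit_serial(0x57, 0x83, 0x11B, 8, 4))
    print(gf2m_mul_digit_serial(0x57, 0x83, 0x11B, 8, 3))
```

A larger L lowers the cycle count at the cost of more logic per cycle, which is the trade-off that the choice of digit size controls.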

6.

An efficient and high-speed VLSI implementation of optimal normal basis multiplication over GF(2^m)

《Integration, the VLSI Journal》2016

Finite field multiplication is one of the most important operations in finite field arithmetic and the main building block determining overall speed and area in public key cryptosystems. In this work, efficient and high-speed VLSI implementations of the bit-serial, digit-serial and bit-parallel optimal normal basis multipliers with parallel-input serial-output (PISO) and parallel-input parallel-output (PIPO) structures are presented. Two general multipliers, namely Massey–Omura (MO) and Reyhani Masoleh–Hassan (RMH), are considered as case studies for implementation. These multipliers are constructed by using AND, XOR–AND and XOR tree components. In the MO multiplier, to obtain strong input signals and a better implementation, the row of AND gates is implemented by using inverter and NOR components. Also, the XOR–AND component in the RMH structure is implemented using a new low-cost structure. The XOR tree in both multipliers consists of a high number of logic stages and many inputs; therefore, to optimally decrease the delay and increase the drive ability of the circuit for different loads, the logical effort method is employed as an efficient method for sizing the transistors. The multipliers are first designed for different load capacitances using different structures and different numbers of stages. Then, using the logical effort method and a newly proposed 4-input XOR gate structure, the circuits are modified to achieve minimum delay. Using 0.18 μm CMOS technology, the bit-serial, digit-serial and bit-parallel structures with type-1 and type-2 optimal normal basis are implemented over the finite fields GF(2²²⁶) and GF(2²³³) respectively. The results show that the proposed structures have better delay and area characteristics compared to previous designs.

7.

A Low‐Complexity 128‐Point Mixed‐Radix FFT Processor for MB‐OFDM UWB Systems

Sang‐In Cho Kyu‐Min Kang 《ETRI Journal》2010,32(1):1-10

In this paper, we present a fast Fourier transform (FFT) processor with four parallel data paths for multiband orthogonal frequency‐division multiplexing ultra‐wideband systems. The proposed 128‐point FFT processor employs both a modified radix‐2⁴ algorithm and a radix‐2³ algorithm to significantly reduce the numbers of complex constant multipliers and complex Booth multipliers. It also employs substructure‐sharing multiplication units instead of constant multipliers to efficiently conduct multiplication operations with only addition and shift operations. The proposed FFT processor is implemented and tested using 0.18 µm CMOS technology with a supply voltage of 1.8 V. The hardware‐efficient 128‐point FFT processor with four data streams can support a data processing rate of up to 1 Gsample/s while consuming 112 mW. The implementation results show that the proposed 128‐point mixed‐radix FFT architecture significantly reduces the hardware cost and power consumption in comparison to existing 128‐point FFT architectures.

8.

Scalable Gaussian Normal Basis Multipliers over GF(2^m) Using Hankel Matrix-Vector Representation

Chiou-Yng Lee Che Wun Chiou 《Journal of Signal Processing Systems》2012,69(2):197-211

This work presents a novel scalable multiplication algorithm for a type-t Gaussian normal basis (GNB) of GF(2^m). Utilizing the basic characteristics of MSD-first and LSD-first schemes with d-bit digit size, the GNB multiplication can be decomposed into n(n + 1) Hankel matrix-vector multiplications, where n = (mt + 1)/d. The proposed scalable architectures for computing GNB multiplication comprise one d × d Hankel multiplier, four registers and one final reduction polynomial circuit. Using the relationship of the basis conversion from the GNB to the normal basis, we also present a modified scalable multiplier which requires only nk Hankel multiplications, where k = mt/2d if m is even or k = (mt − t + 2)/2d if m is odd. It is shown that, for a selected digit size d ≥ 8, the proposed scalable architectures have significantly lower time-area complexity than existing digit-serial multipliers. Moreover, the proposed architectures have the features of regularity, modularity, and local interconnection ability. Accordingly, they are well suited for VLSI implementation.

9.

Efficient Reconfigurable Implementation of Canonical and Normal Basis Multipliers Over Galois Fields GF(2^m) Generated by AOPs

J.L. Imaña J.M. Sánchez 《The Journal of VLSI Signal Processing》2006,42(3):285-296

Galois fields GF(2^m) are used in modern communication systems such as computer networks, satellite links, or compact disks, and they play an important role in a large number of technical applications. They use arithmetic operations in the Galois field, where multiplication is the most important and one of the most complex operations. Efficient multiplier architectures are therefore especially important. In this paper, a new method for multiplication in the canonical and normal basis over GF(2^m) generated by an AOP (all-one-polynomial), which we have named the transpositional method, is presented. This new approach is based on the grouping and sharing of subexpressions. The theoretical space and time complexities of the bit-parallel canonical and normal basis multipliers constructed using our approach are equal to the smallest ones found in the literature for similar methods, but the practical implementation over reconfigurable hardware using our method reduces the area requirements of the multipliers. José Luis Imaña is Assistant Professor of Computer Architecture in the Department of Computer Architecture, Complutense University of Madrid (Spain). He received the Ph.D. degree in Physics from the Complutense University in 2003. His current research interests are computer architectures, VLSI technologies, logic design and verification, finite field arithmetic and cryptography. Juan M. Sánchez-Pérez is Professor of Computer Architecture in the Department of Computer Science, University of Extremadura, Spain. He received a PhD degree in Physics from the University Complutense of Madrid in 1976. His research interests are modern computer architectures, VLSI technologies and logic design.

10.

High‐Performance Low‐Power FFT Cores

Wei Han Ahmet T. Erdogan Tughrul Arslan Mohd. Hasan 《ETRI Journal》2008,30(3):451-460

Recently, the power consumption of integrated circuits has been attracting increasing attention. Many techniques have been studied to improve the power efficiency of digital signal processing units such as fast Fourier transform (FFT) processors, which are popularly employed in both traditional research fields, such as satellite communications, and thriving consumer electronics, such as wireless communications. This paper presents solutions based on parallel architectures for high‐throughput and power‐efficient FFT cores. Different combinations of hybrid low‐power techniques are exploited to reduce power consumption, such as multiplierless units which replace the complex multipliers in FFTs, low‐power commutators based on an advanced interconnection, and parallel‐pipelined architectures. A number of FFT cores are implemented and evaluated for their power/area performance. The results show that up to 38% and 55% power savings can be achieved by the proposed pipelined FFTs and parallel‐pipelined FFTs respectively, compared to the conventional pipelined FFT processor architectures.

11.

Two systolic architectures for modular multiplication

Wei-Chang Tsai Shung C.B. Sheng-Jyh Wang 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2000,8(1):103-107

The authors present two systolic architectures to speed up the computation of modular multiplication in RSA cryptosystems. In the double-layer architecture, the main operation of Montgomery's algorithm is partitioned into two parallel operations after using the precomputation of the quotient bit. In the non-interlaced architecture, we eliminate the one-clock-cycle gap between iterations by pairing off the double-layer architecture. We compare our architectures with some previously proposed Montgomery-based systolic architectures, on the basis of both modular multiplication and modular exponentiation. The comparisons indicate that our architectures offer the highest speed, lower hardware complexity, and lower power consumption.
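For readers unfamiliar with the algorithm these architectures accelerate, the following is a minimal sketch of the standard radix-2 Montgomery multiplication loop, in which a quotient bit is derived in each iteration. It does not reproduce the double-layer or non-interlaced architectures or their quotient-bit precomputation. The function name and parameter names are assumptions for illustration.

```python
def montgomery_mul(a: int, b: int, n: int, k: int) -> int:
    """Return a * b * 2^(-k) mod n using the bit-serial Montgomery loop.

    Assumes n is odd and 0 <= a, b < n < 2**k.
    """
    s = 0
    for i in range(k):
        # Add the partial product selected by bit i of a.
        s += ((a >> i) & 1) * b
        # Quotient bit: chosen so that s + q * n is even.
        q = s & 1
        # The division by 2 is exact, so it is a plain right shift.
        s = (s + q * n) >> 1
    # s stays below 2n throughout, so one conditional subtraction suffices.
    return s - n if s >= n else s


if __name__ == "__main__":
    # With n = 13 and k = 4 the result is 7 * 5 * 16^(-1) mod 13.
    r = montgomery_mul(7, 5, 13, 4)
    print(r, (r * 16) % 13 == (7 * 5) % 13)
```

The quotient bit depends on the freshly updated sum, which is the serial dependency that the abstract's precomputation of the quotient bit is meant to relax.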

12.

A comparative study of energy/power consumption in parallel decimal multipliers

《Microelectronics Journal》2014,45(6):775-780

Decimal multiplication is a frequent operation with inherent complexity in implementation. Commercial and financial applications require working with decimal numbers, and it has been shown that converting decimal numbers to binary ones negatively affects the precision required for these applications. Existing research works on parallel decimal multipliers have mainly focused on latency and area as two major factors to be improved. However, energy/power consumption is another prominent issue in today's digital systems. While the energy consumption of parallel decimal multipliers has not been addressed in previous works, in this paper we present a comparative study of parallel decimal multipliers, considering energy/power consumption, leakage and dynamic power consumption, besides latency and area. This study can provide some guidelines for EDA tools and hardware designers to choose a proper multiplier based on given applications and design constraints. All designs were implemented using VHDL and synthesized in Design-Compiler toolbox with TSMC 45 nm technology file.

13.

Bit-level pipelined digit-serial multiplier

A. AGGOUN A. ASHUR M. K. IBRAHIM 《International Journal of Electronics》2013,100(6):1209-1219

A new cell architecture for high performance digit-serial computation is presented. The design of this cell is based on the feed forward of the carry digit, which allows a high level of pipelining to increase the throughput rate with minimum latency. This will give designers greater flexibility in finding the best trade-off between hardware cost and throughput rate. A twin-pipe architecture to double the throughput rate of digit-serial/parallel multipliers is also presented. The effects of the number of pipelining levels and the twin architecture on the throughput rate and hardware cost are presented. A two's complement digit-serial/parallel multiplier which can operate on both negative and positive numbers is also presented.

14.

Reconfigurable Rounding Based Approximate Multiplier for Energy Efficient Multimedia Applications

Garg Bharat Patel Sujit 《Wireless Personal Communications》2021,118(2):919-931

The approximate design has emerged as a revolutionary design paradigm to obtain energy efficient digital signal processing cores while exhibiting acceptable accuracy. In different signal processing architectures, multiplier is the prime arithmetic unit and significantly influences the performance of these cores. Therefore, four novel energy efficient rounding based approximate (RBA) multiplier architectures are proposed in this paper. These multipliers first approximate input operands to the nearest power of two values and then achieve multiplication using few adders and shifters. The proposed RBA multipliers significantly reduce implementation complexity and provide higher energy efficiency. Further, a novel reconfigurable rounding based approximate (RRBA) multiplier is proposed to achieve desired performance-quality tradeoff. Further, the performance of proposed RBA and RRBA multipliers is evaluated and analysed over the existing approximate multiplier architectures. The proposed 8-bit RBA0 requires 59.8% (54.7%) reduced area (delay) compared to the existing approximate multiplier. Finally, the efficacy of the proposed multipliers is demonstrated in the application by implementing Gaussian filters embedded with existing and proposed approximate multipliers. The Gaussian filter designed using RBA0 provides 32.5% reduced energy consumption over the filter with existing multiplier.
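The rounding step described in the abstract above can be sketched as follows. Each operand is rounded to its nearest power of two, and the product is then formed from shifts, two additions and one subtraction via A×B ≈ Ar×B + A×Br − Ar×Br. This particular combination is one common rounding-based formulation and is an assumption here; the paper's four RBA variants and the reconfigurable RRBA design may combine the rounded values differently. All names are hypothetical.

```python
def round_pow2(x: int) -> int:
    """Nearest power of two to a positive integer x (ties round up)."""
    k = x.bit_length() - 1            # floor(log2(x))
    low, high = 1 << k, 1 << (k + 1)
    return low if (x - low) < (high - x) else high


def approx_mul(a: int, b: int) -> int:
    """Approximate a * b for non-negative integers using only shifts and adds."""
    if a == 0 or b == 0:
        return 0
    ar, br = round_pow2(a), round_pow2(b)
    sa, sb = ar.bit_length() - 1, br.bit_length() - 1   # shift amounts
    # Ar*B + A*Br - Ar*Br, with every product realised as a shift.
    return (b << sa) + (a << sb) - (1 << (sa + sb))


if __name__ == "__main__":
    for a, b in [(8, 13), (10, 12), (100, 200)]:
        print(a, b, a * b, approx_mul(a, b))
```

In this formulation the error equals (A − Ar)(B − Br), so the result is exact whenever either operand is already a power of two.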


15.

An Efficient Distributed Arithmetic-Based Realization of the Decision Feedback Equalizer

M. Surya Prakash Rafi Ahamed Shaik Sagar Koorapati 《Circuits, Systems, and Signal Processing》2016,35(2):603-618

A distributed arithmetic (DA)-based decision feedback equalizer architecture for IEEE 802.11b PHY scenarios is presented. As the transmission data rate increases, the hardware complexity of the decision feedback equalizer increases due to the requirement for a large number of taps in the feed forward and feedback filters. DA, an efficient technique that uses memories for the computation of the inner product of two vectors, has been used since DA-based realization of filters can lead to great computational savings. For higher-order filters, the memory-size requirement in DA would be high, and so ROM decomposition has been employed. The speed is further increased by employing digit-serial input operation. Two architectures have been presented, namely the direct-memory architecture and the reduced-memory architecture, where the latter is derived from the former. A third architecture has also been presented where the offset-binary coding scheme is employed along with the ROM decomposition and digit-serial variants of DA. Synthesis results on Altera Cyclone III EP3C55F484C6 FPGA show that the proposed DA-based implementations are free of hardware multipliers and use fewer hardware resources compared to the multiply-and-accumulate-based implementation.
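The memory-based inner product that DA relies on can be shown with a small sketch. It assumes fixed integer coefficients and unsigned inputs processed one bit per step. ROM decomposition, offset-binary coding and digit-serial input from the abstract are not modelled, and the function names are hypothetical.

```python
def build_da_rom(coeffs):
    """ROM word at address `addr` = sum of the coefficients whose address bit is 1."""
    n = len(coeffs)
    return [
        sum(c for j, c in enumerate(coeffs) if (addr >> j) & 1)
        for addr in range(1 << n)
    ]


def da_inner_product(rom, xs, bits):
    """Compute sum(c_j * x_j) for unsigned `bits`-wide inputs without multipliers."""
    acc = 0
    for b in range(bits):
        # Bit b of every input forms the ROM address for this step.
        addr = 0
        for j, x in enumerate(xs):
            addr |= ((x >> b) & 1) << j
        # Weight the looked-up partial sum by 2^b and accumulate.
        acc += rom[addr] << b
    return acc


if __name__ == "__main__":
    coeffs = [3, -1, 4, 2]
    xs = [5, 7, 1, 6]
    rom = build_da_rom(coeffs)          # 2^4 = 16 words
    print(da_inner_product(rom, xs, bits=3))
    print(sum(c * x for c, x in zip(coeffs, xs)))
```

The ROM has 2^N words for N taps, which is why the abstract turns to ROM decomposition for higher-order filters.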

16.

An approach for fixed coefficient RNS-based FIR filter

Kotha Srinivasa Reddy Subhendu Kumar Sahoo 《International Journal of Electronics》2013,100(8):1358-1376


17.

Low-complexity bit-parallel systolic multipliers over GF(2^m)

Chiou-Yng Lee 《Integration, the VLSI Journal》2008,41(1):106-112

This paper presents new time-dependent and time-independent multiplication algorithms over finite fields GF(2^m) by employing an interleaved conventional multiplication and a folded technique. The proposed algorithms allow efficient realization of bit-parallel systolic multipliers. The results show that the proposed time-independent multiplier saves about 54% space complexity as compared to other related multipliers for polynomial and dual bases of GF(2^m). The proposed architectures include the features of regularity, modularity and local interconnection. Accordingly, they are well suited for VLSI implementation.

18.

Power Reduction Technique in Coefficient Multiplications Through Multiplier Characterization

Sangjin Hong Shu-Shin Chin Suhwan Kim Wei Hwang 《The Journal of VLSI Signal Processing》2004,38(2):101-113

This paper presents a multiplier power reduction technique for low-power DSP applications through utilization of coefficient optimization. The optimization is implementation dependent in that the multipliers are assumed to be designed in either ASIC or full-custom architectures for general purpose multiplication. The paper first describes a model characterizing the power consumption of the multiplier. Then the coefficient optimization is performed based on this model. This methodology is applicable to multiplications requiring a large set of coefficients and random data sets. We can accurately estimate the actual power dissipation of the multipliers using the characterization technique. The coefficient optimization based on the power model can save as much as 34.02%.

19.

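An Efficient Scalable Group-Parallel (Digit-Serial) Finite Field Multiplier and Its VLSI Implementation

顾震宇 曾晓洋 陈超 龚绿怡 章倩苓 《微电子学与计算机》2003,20(4):50-53,56

This paper proposes a scalable group-parallel finite field multiplier architecture based on the all-one-polynomial (AOP) basis. The algorithm is described in both least-significant-digit-first and most-significant-digit-first forms, named AOPBLSDM (AOP-Based LSD-first Digital-Serial Multiplier) and AOPBMSDM (AOP-Based MSD-first Digital-Serial Multiplier), respectively. The multiplier has a regular structure and is well suited to VLSI implementation. Because its area and speed can be scaled over a wide range, the best implementation can be found for different application scenarios. Both theoretical analysis and ASIC synthesis results show that the proposed architecture has certain advantages in area and speed.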

20.

High speed merged array multiplication

Farhad Fuad Islam Keikichi Tamaru 《The Journal of VLSI Signal Processing》1995,10(1):41-52

Multiplication-accumulation operations described by represent the fundamental computation involved in many digital signal processing algorithms. For high speed signal processing, one obvious approach to realize the above computation in VLSI is to employ m discrete multipliers working in parallel. However, a more area-efficient approach is offered by the merged multiplication technique [5]. The principal drawback of the conventional merged technique is its longer latency compared with the discrete approach. This work proposes a hardware algorithm for merged array multiplication which eliminates this drawback and achieves significant improvement in latency when compared with the conventional scheme for merged multiplication. The proposed algorithm utilizes multiple wave front computation as opposed to the traditional approach where computation in an array multiplier is carried out by a single wave front. The improvement in latency by the proposed approach is greater than 40% (for m > 2) when compared with a conventional approach to merged multiplication. The consequent cost in the form of additional VLSI area is found to be rather small. In this paper, we provide a thorough analytic discussion on the proposed algorithm and support it by experimental results.