
1.

Decimal SRT Square Root: Algorithm and Architecture

Amir Kaivani Seok-Bum Ko 《Circuits, Systems, and Signal Processing》2013,32(5):2137-2150

Given the popularity of decimal arithmetic, hardware implementation of decimal operations has been a hot topic of research in recent decades. Besides the four basic operations, the square root can be implemented directly in hardware as an instruction, which improves the performance of the decimal floating-point unit in processors. Hardware decimal square rooters are usually implemented using either functional or digit-recurrence algorithms. Functional algorithms, which require a multiplication per iteration, seem ill-suited to decimal square roots, given the high cost of decimal multipliers. Digit-recurrence square root algorithms, on the other hand, particularly SRT algorithms (named after their creators, Sweeney, Robertson, and Tocher), are simple and well suited to decimal arithmetic. With the aim of reducing the latency of the decimal square root operation while keeping the cost reasonable, this paper proposes an SRT algorithm and the corresponding hardware architecture for computing the decimal square root. The proposed fixed-point square root design requires n+3 cycles to compute an n-digit root; synthesis results show an area cost of about 31K NAND2 and a cycle time of 40 FO4. These results show a 14% speed advantage of the proposed decimal square root architecture over the fastest previous work (which uses a functional algorithm), at about a quarter of the area.
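
For readers unfamiliar with digit recurrence, a small sketch may help. The Python below is an illustration under stated assumptions, not the paper's algorithm: it is the plain restoring, non-redundant decimal digit-by-digit method, which picks each root digit in {0, ..., 9} by trial against the full residual. SRT variants instead use a redundant digit set so that each digit can be selected from a short estimate of the residual, which is what makes them attractive in hardware. The function name `decimal_digit_sqrt` is hypothetical.

```python
def decimal_digit_sqrt(n: int) -> int:
    """Integer square root of n, producing one decimal root digit per iteration.

    Restoring (non-redundant) digit recurrence, NOT the SRT method of the paper.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    # Split n into base-100 "digit pairs", most significant pair first.
    pairs = []
    while True:
        pairs.append(n % 100)
        n //= 100
        if n == 0:
            break
    pairs.reverse()

    root = 0       # partial root after j digits
    remainder = 0  # partial residual: (prefix of n) - root**2
    for pair in pairs:
        remainder = remainder * 100 + pair
        # Digit selection: largest d in 0..9 with (20*root + d) * d <= remainder,
        # since (10*root + d)**2 - (10*root)**2 == (20*root + d) * d.
        d = 9
        while (20 * root + d) * d > remainder:
            d -= 1
        remainder -= (20 * root + d) * d
        root = root * 10 + d
    return root
```

To obtain f fractional digits of the square root of an integer x, one could call `decimal_digit_sqrt(x * 10**(2*f))`. Each loop iteration retires one decimal digit of the root, loosely mirroring the digit-per-cycle behaviour of a digit-recurrence unit.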

2.

Design and Implementation of a Polynomial Basis Multiplier Architecture Over GF(2^m)

Huong Ho 《Journal of Signal Processing Systems》2014,75(3):203-208

In this paper, the design and circuit implementation of a polynomial basis multiplier architecture over Galois fields GF(2^m) is presented. The proposed architecture supports field multiplication of two m-term polynomials, where m is a positive integer. Circuit implementations based on this parameterized architecture, in which m is configurable, are suitable for applications in error control coding and cryptography. The proposed architecture offers low-latency polynomial basis multiplication in which the irreducible polynomial $P(x) = x^m + p_{k_t} x^{k_t} + \cdots + p_1 x + 1$, with $m \ge k_t + 4$, is dynamically reconfigurable. Results of the complexity analysis show that the proposed architecture requires fewer logic resources than existing sequential polynomial basis multipliers. In terms of timing performance, the proposed architecture has a latency of m/4, which is the lowest among the multipliers for GF(2^m) found in the literature.
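
As a reference point for what such a multiplier computes, here is a minimal behavioural sketch of polynomial basis multiplication in GF(2^m). Assumptions: field elements are Python ints whose bit i is the coefficient of x^i, and `poly` encodes the irreducible P(x) including its x^m term. The name `gf2m_mul` is hypothetical. This processes one bit of b per step (MSB first); it does not model the paper's m/4-cycle latency.

```python
def gf2m_mul(a: int, b: int, poly: int) -> int:
    """Multiply a and b in GF(2^m), polynomial basis, reducing modulo P(x) = poly."""
    m = poly.bit_length() - 1
    if a >> m or b >> m:
        raise ValueError("operands must have degree < m")
    result = 0
    # MSB-first (Horner's rule): result = result * x + b_i * a, reduced mod P(x).
    for i in range(m - 1, -1, -1):
        result <<= 1
        if result >> m:          # degree reached m: subtract (XOR) P(x)
            result ^= poly
        if (b >> i) & 1:
            result ^= a          # addition in GF(2) is XOR
    return result
```

Because `poly` is an ordinary argument, switching to a different irreducible polynomial needs no code change, which loosely corresponds to the run-time reconfigurability of P(x) described above. For example, with P(x) = x^4 + x + 1 (`0b10011`), multiplying x by x^3 reduces x^4 to x + 1.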

3.

Implementation of the Exponential Function in a Floating-Point Unit

Álvaro Vázquez Elisardo Antelo 《The Journal of VLSI Signal Processing》2003,33(1-2):125-145

In this work we present an implementation of the exponential function in double precision, in a unit that supports IEEE floating-point arithmetic. As in existing proposals, the implementation is based on the use of a floating-point multiplier and additional hardware. We decompose the computation into three subexponentials. The first and third subexponentials are computed in a conventional way (table look-up and polynomial approximation). The second subexponential is computed by transforming the slow radix-2 digit-recurrence algorithm into a fast computation using the multiplier and additional hardware. We present a design process that permits selecting the most convenient trade-off between hardware complexity and latency. We discuss the algorithm and the implementation, and perform a rough comparison with three other proposed designs. Our estimates indicate that the implementation proposed in this work offers a better trade-off between hardware complexity and latency than the compared designs.
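
The "slow radix-2 digit-recurrence algorithm" mentioned above can be sketched as the classic shift-and-add exponential: subtract table constants ln(1 + 2^-j) from the argument and multiply the result by (1 + 2^-j) each time a digit is 1. This is a minimal, hedged illustration of that textbook recurrence only, in its one-digit-per-step form; it does not reproduce the paper's three-way decomposition or its multiplier-based acceleration. The name `exp_shift_add` and the 40-step default are assumptions.

```python
import math


def exp_shift_add(x: float, steps: int = 40) -> float:
    """Approximate exp(x) for 0 <= x < ~1.56 by radix-2 digit recurrence."""
    # Convergence range: sum of ln(1 + 2^-j) over the digits used.
    limit = sum(math.log1p(2.0 ** -j) for j in range(steps))
    if not 0.0 <= x < limit:
        raise ValueError("x outside convergence range")
    y = 1.0
    for j in range(steps):
        c = math.log1p(2.0 ** -j)   # ln(1 + 2^-j): a table constant in hardware
        if x >= c:                  # digit d_j = 1
            x -= c
            y += y * 2.0 ** -j      # y *= (1 + 2^-j): a shift and an add
    # The residual x is now tiny; exp(x) ~ 1 + x is a first-order correction.
    return y * (1.0 + x)
```

Each step retires roughly one bit of the argument, which is why the plain recurrence is slow and why the paper's use of the multiplier to speed it up matters.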

4.

FPGA implementation of high-performance, resource-efficient Radix-16 CORDIC rotator based FFT algorithm

《Integration, the VLSI Journal》2020

The fast Fourier transform (FFT) is an algorithm widely used to compute the discrete Fourier transform (DFT) in real-time digital signal processing. High performance with few resources is highly desirable for any real-time application. This work presents an FPGA implementation of the radix-2 decimation-in-frequency (R2DIF) FFT algorithm based on a modified feed-forward double-path delay commutator (DDC) architecture. The need for a complex multiplier to multiply by the complex twiddle factors, and for a large memory to store those twiddle factors, are the main concerns in FFT implementation, and the proposed work aims to address both. A high-performance rotator based on a radix-16 COordinate Rotation DIgital Computer (CORDIC) algorithm is proposed to carry out the complex twiddle factor multiplication. Because CORDIC needs only the rotation angles to carry out the complex multiplication, the need for a large twiddle factor memory is reduced. To compute the total rotation for n-bit precision, the proposed radix-16 CORDIC algorithm takes n/4 iterations, compared with n iterations for the radix-2 CORDIC algorithm. The proposed R2DIF architecture is implemented on a Virtex-7 series FPGA, and a detailed comparison is presented between the proposed FFT implementation and other recently proposed FFT implementations. Experimental results suggest that the proposed implementation has lower latency and hardware utilization than recently proposed implementations.
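
To make the n-versus-n/4 iteration comparison concrete, below is a minimal sketch of the baseline radix-2 CORDIC in rotation mode, which needs about one iteration per bit of precision. It is not the paper's radix-16 rotator. Multiplying a sample a + jb by a twiddle factor e^{-jθ} is a rotation of the point (a, b) by -θ, and only the angle is needed, not a stored cos/sin pair. The function name, the 32-iteration default, and the floating-point arithmetic are assumptions for illustration.

```python
import math


def cordic_rotate(x: float, y: float, angle: float, iterations: int = 32):
    """Rotate (x, y) by `angle` radians using radix-2 CORDIC (rotation mode)."""
    # Elementary angles atan(2^-i); hardware keeps these in a small table.
    atans = [math.atan(2.0 ** -i) for i in range(iterations)]
    if abs(angle) > sum(atans):
        raise ValueError("angle outside CORDIC convergence range (~1.74 rad)")
    # Each micro-rotation also scales by sqrt(1 + 2^-2i); divide the total out.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    z = angle
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0          # rotation direction for this step
        # Shift-and-add micro-rotation (simultaneous update uses the old x and y).
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * atans[i]                    # remaining angle to rotate
    return x / gain, y / gain
```

Angles beyond about ±1.74 rad would need a preliminary quadrant rotation by ±π/2, which an FFT rotator typically handles by swapping and negating components.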

5.

Yield analysis for self-repairable MEMS devices

Xingguo Xiong Yu-Liang Wu Wen-Ben Jone 《Analog Integrated Circuits and Signal Processing》2008,56(1-2):71-81

In this paper, a yield analysis for a self-repairable MEMS (SRMEMS) accelerometer design is proposed. The accelerometer consists of (n + m) identical modules: n of them serve as the main device, while the remaining m modules act as redundancy. A yield model for MEMS redundancy repair is developed by statistical analysis. Based on this yield model, the yield increase after redundancy repair is analyzed for different values of m and n. ANSYS Monte Carlo simulation is used to estimate the yield of MEMS devices with and without built-in self-repair (BISR) in the presence of random point-stiction defects. The simulation results agree well with the theoretical predictions of the yield model. They also show that the SRMEMS design leads to an effective increase in yield compared with the non-BISR design, especially for a moderate initial yield.
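
A textbook binomial version of such a redundancy yield model can be sketched as follows. Assumptions (not taken from the paper): each module is independently defect-free with probability p, and the repaired device works whenever at least n of the n + m modules are good. The name `repaired_yield` is hypothetical; the paper's statistical model may include effects this ignores.

```python
from math import comb


def repaired_yield(p: float, n: int, m: int) -> float:
    """Probability that at least n of n + m independent modules are defect-free."""
    total = n + m
    return sum(comb(total, i) * p**i * (1 - p) ** (total - i)
               for i in range(n, total + 1))
```

With m = 0 this reduces to the unrepaired yield p**n. Adding spares helps most when p**n is moderate: if p**n is already close to 1 there is little to gain, and if p is very low even several spares cannot supply n good modules.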

6.

A counter-based pseudo-exhaustive pattern generator for BIST applications

I. Voyiatzis 《Microelectronics Journal》2004,35(11):927-935

Built-in self-test (BIST) has been accepted as an efficient alternative to external testing, since it provides both test generation and response verification on chip. Pseudo-exhaustive BIST generators provide 100% fault coverage for detectable combinational faults with far fewer test vectors than exhaustive testing. An (n,k) adjacent-bit pseudo-exhaustive test set (PETS) is a set of n-bit vectors in which all 2^k binary combinations appear in every group of k adjacent inputs. In this paper, a novel counter-based pseudo-exhaustive BIST generator, termed the pseudo-exhaustive counter (PEC), is presented. An n-stage PEC can generate (n,k) adjacent-bit PETS for any value of k, k < n. This kind of testing is termed generic pseudo-exhaustive testing; a generic pseudo-exhaustive generator can be used to pseudo-exhaustively test more than one module. The PEC scheme is then extended to recursively generate all (n,k) adjacent-bit pseudo-exhaustive test sets for k ≤ n. This kind of testing is termed progressive pseudo-exhaustive testing in the literature; a progressive pseudo-exhaustive generator can pseudo-exhaustively test more than one module in parallel. Comparisons of PEC with techniques proposed in the literature for generic and progressive pseudo-exhaustive testing reveal that PEC is more effective in terms of both hardware overhead and the time required to complete the test.
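
The (n,k) adjacent-bit PETS definition can be made concrete with a checker and one simple illustrative construction. This is a sketch under assumptions and is not the paper's PEC: it fixes a single k and sets bit i of test vector t to bit (i mod k) of a k-bit counter value t. Any k adjacent positions then read a fixed permutation of the counter's bits, so all 2^k patterns appear in only 2^k vectors instead of the 2^n of exhaustive testing. Both function names are hypothetical.

```python
def adjacent_pets(n, k):
    """Return 2**k n-bit vectors forming an (n, k) adjacent-bit pseudo-exhaustive test set.

    Illustrative construction only (not the paper's PEC): bit i of vector t is
    bit (i mod k) of the counter value t.
    """
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    return [[(t >> (i % k)) & 1 for i in range(n)] for t in range(2 ** k)]


def is_adjacent_pets(vectors, n, k):
    """Check that every window of k adjacent inputs sees all 2**k combinations."""
    for start in range(n - k + 1):
        seen = {tuple(v[start:start + k]) for v in vectors}
        if len(seen) != 2 ** k:
            return False
    return True
```

Unlike the generic and progressive generators described above, this construction serves only the one k it was built for.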

7.

Low-power compact composite field AES S-Box/Inv S-Box design in 65 nm CMOS using Novel XOR Gate

Nabihah Ahmad S.M. Rezaul Hasan 《Integration, the VLSI Journal》2013

The substitution box (S-Box) forms the core building block of any hardware implementation of the Advanced Encryption Standard (AES) algorithm, as it is a non-linear structure requiring multiplicative inversion. This paper presents a full-custom CMOS design of the S-Box/Inversion S-Box (Inv S-Box) with low-power GF(2⁸) Galois field inversion based on polynomial basis, using composite field arithmetic. The S-Box/Inv S-Box uses a novel low-power 2-input XOR gate with only six devices to achieve a compact module implemented in 65 nm IBM CMOS technology. As a result of this transistor-level optimization, the area of the core circuit is only about 288 μm². The hardware cost of the S-Box/Inv S-Box is about 158 logic gates, equivalent to 948 transistors, with a critical path propagation delay of 7.322 ns, enabling a throughput of 130 Mega-SubBytes per second. The design dissipates only around 0.09 μW from a 0.8 V supply and is suitable for applications such as RFID tags and smart cards, which require low power consumption and a small silicon die. The proposed implementation compares favorably with other existing S-Box designs.
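
For reference, the function such a circuit implements can be written as a short behavioural model: the AES S-Box is the multiplicative inverse in GF(2^8) (modulo x^8 + x^4 + x^3 + x + 1) followed by an affine transform, and the Inv S-Box undoes the affine step and then inverts. This sketch computes the inverse directly as a^254 in polynomial basis. It deliberately does not use the composite-field GF((2^4)^2) decomposition that the paper's hardware relies on. All function names are hypothetical.

```python
def gf256_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) with the AES polynomial x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:          # reduce when degree 8 appears
            a ^= 0x11B
        b >>= 1
    return result


def gf256_inv(a: int) -> int:
    """Multiplicative inverse as a^254 (a^(2^8 - 2)); maps 0 to 0, as AES requires."""
    result, base, e = 1, a, 254
    while e:                   # square-and-multiply exponentiation
        if e & 1:
            result = gf256_mul(result, base)
        base = gf256_mul(base, base)
        e >>= 1
    return result


def rotl8(v: int, s: int) -> int:
    """Rotate an 8-bit value left by s positions."""
    return ((v << s) | (v >> (8 - s))) & 0xFF


def sbox(a: int) -> int:
    """AES SubBytes: inversion followed by the affine transform (constant 0x63)."""
    b = gf256_inv(a)
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63


def inv_sbox(s: int) -> int:
    """AES InvSubBytes: inverse affine transform (constant 0x05), then inversion."""
    b = rotl8(s, 1) ^ rotl8(s, 3) ^ rotl8(s, 6) ^ 0x05
    return gf256_inv(b)
```

A model like this is handy as a golden reference when verifying a custom S-Box/Inv S-Box circuit over all 256 inputs.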

8.

An area-efficient and low-power 64-point pipeline Fast Fourier Transform for OFDM applications

《Integration, the VLSI Journal》2017

In orthogonal frequency division multiplexing (OFDM)-based wireless systems, the Fast Fourier Transform (FFT) is a critical block, as it occupies a large area and consumes considerable power. In this paper, we present area-efficient, low-power, 16-bit word-width, 64-point radix-2² and radix-2³ pipelined FFT architectures for an OFDM-based IEEE 802.11a wireless LAN baseband. The designs are derived from the radix-2^k algorithm and adopt a single-path delay feedback (SDF) architecture for hardware implementation. To eliminate the complex multipliers and the read-only memory (ROM) used for internal storage of the twiddle factor coefficients, the proposed 64-point FFT employs a canonical signed digit (CSD) complex constant multiplier built from adders, multiplexers, and shifters. The complex constant multiplier (CCM) is further modified with a common sub-expression sharing block that reduces the area of the design. The proposed radix-2² and radix-2³ pipelined FFT architectures are modeled and implemented in TSMC 180 nm CMOS technology with a 1.8 V supply voltage. The implementation results show that the proposed architectures significantly reduce hardware cost and power consumption compared with existing 64-point FFT architectures.
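
The idea behind a CSD constant multiplier can be sketched in a few lines. A fixed coefficient is rewritten in canonical signed-digit form (digits in {-1, 0, +1}, no two adjacent nonzeros), and multiplication by it becomes a handful of shifted additions and subtractions with no general multiplier. This is a hedged illustration: the function names are hypothetical, it handles one real coefficient rather than the paper's complex constant multiplier, and it omits common sub-expression sharing.

```python
def to_csd(n: int) -> list:
    """Canonical signed-digit (non-adjacent form) digits of n, least significant first."""
    digits = []
    while n != 0:
        if n & 1:
            d = 2 - (n & 3)     # +1 if n % 4 == 1, -1 if n % 4 == 3
            n -= d              # now n is divisible by 4, so the next digit is 0
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


def csd_multiply(x: int, digits: list) -> int:
    """Multiply x by the constant encoded in `digits` using only shifts and adds/subtracts."""
    acc = 0
    for shift, d in enumerate(digits):
        if d == 1:
            acc += x << shift
        elif d == -1:
            acc -= x << shift
    return acc
```

As a usage example, a twiddle factor quantized to fixed point (for instance, cos(π/4) scaled by an assumed 2^14 and rounded to 11585) would be converted once with `to_csd`, and the nonzero digits would map directly to the adder/shifter network in hardware.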

9.

Improved Scalar Multiplication on Elliptic Curves Defined over $\mathbb{F}_{2^{mn}}$

Dong Hoon Lee Seongtaek Chee Sang Cheol Hwang Jae‐Cheol Ryou 《ETRI Journal》2004,26(3):241-251

We propose two improved scalar multiplication methods for elliptic curves over $\mathbb{F}_{q^n}$, where $q = 2^m$, using Frobenius expansion. Scalar multiplication on elliptic curves defined over the subfield $\mathbb{F}_q$ can be sped up by Frobenius expansion. Previous methods are restricted to the case of small m; however, when m is small, it is hard to find curves with good cryptographic properties. Our methods are suitable for curves defined over medium-sized fields, that is, 10 ≤ m ≤ 20. These methods are variants of the conventional multiple-base binary (MBB) method combined with the window method. One of our methods is for a polynomial basis representation with software implementation, and the other is for a normal basis representation with hardware implementation. Our software experiment shows that the polynomial basis method is about 10% faster than the MBB method, which also uses Frobenius expansion, and about 20% faster than the Montgomery method, which is the fastest general method in polynomial basis implementation.

10.

Analysis and design of amplifiers and comparators in CMOS 0.35 μm technology

Fernando Paixão Cortes Eric Fabris 《Microelectronics Reliability》2004,44(4):657-664

Design techniques and CAD tools for digital systems are advancing rapidly at decreasing cost, whereas CMOS analog circuit design still depends mostly on the individual experience and background of the designer. The quality of an analog design therefore depends on several factors, such as a reliable design methodology, good modeling, and technology characterization. This work focuses mainly on the analysis of several analog circuits, including their functionality, using different design methodologies. First, two key design parameters (the slope factor n and the Early voltage V_A) and the g_m/I_D characteristics were derived from simulations. Then, the analysis and design of three different analog circuits are presented. Two design methodologies applied to an analog amplifier design are compared. The first is a conventional approach in which the transistors operate in saturation. The second is based on the g_m/I_D characteristic, which allows a unified synthesis methodology in all regions of operation of the transistor. The analog modules for comparison and continuous-time filtering, which are widely used today, are then analyzed and designed with the proposed parameters and methodology.
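
The g_m/I_D approach mentioned above rests on two textbook limits. They are sketched here under simplifying assumptions: the long-channel square law in strong inversion and the exponential law in weak inversion. The symbols K (transconductance parameter), V_T (threshold voltage), and U_T = kT/q (thermal voltage, about 26 mV at room temperature) are introduced here and are not defined in the abstract.

$$
I_D = \frac{K}{2}\,(V_{GS}-V_T)^2 \;\Rightarrow\; g_m = \frac{\partial I_D}{\partial V_{GS}} = K\,(V_{GS}-V_T) \;\Rightarrow\; \frac{g_m}{I_D} = \frac{2}{V_{GS}-V_T}
$$

$$
I_D \propto \exp\!\left(\frac{V_{GS}}{n\,U_T}\right) \;\Rightarrow\; \frac{g_m}{I_D} = \frac{1}{n\,U_T}
$$

The second limit shows why the slope factor n matters: it caps the achievable g_m/I_D in weak inversion. In strong inversion, by contrast, g_m/I_D falls as the overdrive V_GS − V_T grows. A methodology driven by the simulated g_m/I_D curve spans both regimes instead of assuming square-law saturation behaviour.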

11.

Fusion function placement for Active Networks paradigm in wireless sensor networks

Zongqing Lu Su-Lim Tan Jit Biswas 《Wireless Networks》2013,19(7):1525-1536

The Active Networks paradigm integrated with distributed data fusion has the potential to significantly reduce energy dissipation in wireless sensor networks, where energy conservation is the most challenging issue. This work aims to minimize energy cost when distributed data fusion is deployed under the Active Networks computing paradigm. First, we propose an optimal solution for mapping the task graph of a distributed data fusion application onto the network. The optimal solution uses an exhaustive search algorithm to find the placement with minimum power consumption. However, it has high computational complexity, O(m n^k), where n denotes the number of network nodes, m the number of fusion functions, and k the maximum number of children that a fusion function has in the task graph when those children are also fusion functions. Then, an approximate solution with low complexity, O(m log n + log² m), called P2lace, is proposed; it consists of two phases, task graph partition and task graph placement. Finally, an extensive evaluation compares the approximate solution with the optimal solution. The results show that the approximate solution scales with different task graph characteristics and network sizes and incurs only slightly higher transmission cost than the optimal solution. The algorithm without optimization is also shown to be applicable to networks in which the sink node does not have global information about the entire network.

12.

Design and Numerical Evaluation of Cascade-Type Thermoelectric Modules

Takeyuki Fujisaka Hongtao Sui Ryosuke O. Suzuki 《Journal of Electronic Materials》2013,42(7):1688-1696

Thermoelectric (TE) generation performance can be enhanced by stacking several TE modules (so-called cascade-type modules). This work presents a design method to optimize the cascade structure for maximum power output. A one-dimensional model was first analyzed to optimize the TE element dimensions by considering the heat balance, including conductive heat transfer, Peltier heat, and Joule heat, assuming constant temperatures at all TE junctions. The number of p–n pairs was then optimized to obtain maximum power. The power output increased by a factor of 1.24, from 12.7 W in a conventional model to 15.7 W in the optimized model. Second, a two-dimensional numerical calculation based on the finite-volume method was used to evaluate the temperature and electric potential distributions. Voltage–current characteristics were calculated, the maximum power output was evaluated, and the efficiencies of two candidate models were compared to select the optimal design. The one-dimensional analytical approach is effective for a rough design, whereas multidimensional numerical calculation is effective for evaluating the dimensions and performance of cascade-type TE modules in detail.
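
For orientation, the load-matching relation behind "maximum power output" can be sketched for a single stage. Assumptions: the junction temperatures are fixed, as in the first model above, and the Peltier and Joule heat terms that the paper's heat balance includes are ignored. N is the number of p–n pairs, S the Seebeck coefficient of one pair, ΔT the junction temperature difference, R the internal resistance, and R_L the load resistance; these symbols are introduced here.

$$
V_{oc} = N S\,\Delta T, \qquad P_L = \frac{V_{oc}^2\,R_L}{(R + R_L)^2}, \qquad \frac{dP_L}{dR_L} \propto \frac{R - R_L}{(R + R_L)^3} = 0 \;\Rightarrow\; R_L = R, \quad P_{\max} = \frac{(N S\,\Delta T)^2}{4R}
$$

Because R itself grows with N and with the element length-to-area ratio, the number of pairs and the element dimensions trade off against each other, which is why both are optimized. As a consistency check on the quoted figures, 15.7 W / 12.7 W ≈ 1.236, matching the stated factor of 1.24.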

13.

All digital skew tolerant synchronous interfacing methods for high-performance point-to-point communications in deep sub-micron SoCs

Syed Rafay Hasan Normand Bélanger Yvon Savaria M. Omair Ahmad 《Integration, the VLSI Journal》2011,44(1):22-38

High-performance clocking of intellectual property (IP) modules within a skew budget is becoming difficult in deep sub-micron technologies. In this work, we propose a novel, all-digital synchronous design method for point-to-point communications that uses two stages of interfacing registers and a locally delayed clock with phase adjustments. This design is free from synchronizers and clock-data mismatch problems. Moreover, the communicating modules run at frequencies that are virtually independent of the clock skew. We also provide a comprehensive case-wise mathematical analysis to facilitate design automation for synthesizing such designs as standard cells. Compared with conventional designs, skew tolerance improves by up to n times (where n is the number of registers used) when the skew orientation is known, and by n/2 times when it is unknown. The improvement in skew tolerance is validated by gate-level simulations in 0.18 μm TSMC CMOS technology. A prototype implementation of the proposed design on a Xilinx Virtex-II Pro FPGA validates the claim that such designs allow a fast module to communicate with a slow module without constraining their frequencies.

14.

An efficient reconfigurable multiplier architecture for Galois field GF(2^m)

P Kitsos G Theodoridis O Koufopavlou 《Microelectronics Journal》2003,34(10):975-980

This paper describes an efficient architecture for a reconfigurable bit-serial polynomial basis multiplier for the Galois field GF(2^m), where 1 < m ≤ M. The degree m of the irreducible polynomial can be changed, so the multiplier can be configured and programmed. The value of M determines the maximum field size that the multiplier can support. The advantages of the proposed architecture are (i) a high degree of flexibility, which allows easy configuration for different field sizes, and (ii) low hardware complexity, which results in a small area. By using the gated clock technique, a significant reduction in the total power consumption of the multiplier is achieved.

15.

Design of RNS Reverse Converters with Constant Shifting to Residue Datapath Channels

Piotr Patronik Stanisław J. Piestrak 《Journal of Signal Processing Systems》2018,90(3):323-339

This paper presents a new general approach to simplifying residue-to-binary (reverse) converters for a Residue Number System (RNS) composed of an arbitrary set of moduli. It is suggested that the basic equation of the reverse converter be formulated as two separate parts: one depending on the input variables of the converter and the other a single constant. The constant, instead of being added inside the reverse converter, can then be shifted out to the residue datapath channels, in most cases at no hardware cost or extra delay. Thus, the hardware cost of the converter is reduced, because its multi-operand adder has one operand fewer to handle. To illustrate the design issues of this new approach and to prove its efficiency, a new design method for residue-to-binary (reverse) converters for the 3-moduli set {2ⁿ−1, 2ⁿ, 2ⁿ+1} is considered. Two versions of the new converters for this moduli set, as well as several of their known counterparts, were synthesized for all dynamic ranges from 8 to 38 bits (i.e., for 3 ≤ n ≤ 13). The results suggest that, compared with the best state-of-the-art converters, at least one of the two versions of our converters is superior with respect to area and power consumption for all dynamic ranges considered, in some cases with a slight delay reduction. The area is reduced by about 5% to about 20%, and the largest savings are observed for power consumption, from over 10% up to 27%.
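
As background on what a reverse converter computes, here is a minimal sketch of generic residue-to-binary conversion via the Chinese Remainder Theorem for the 3-moduli set {2ⁿ−1, 2ⁿ, 2ⁿ+1}. The three moduli are pairwise coprime, so every value in [0, M) with M = (2ⁿ−1)·2ⁿ·(2ⁿ+1) has a unique residue triple. This is the textbook CRT baseline, not the paper's constant-shifting design, and all function names are hypothetical.

```python
def moduli_set(n: int) -> list:
    """The 3-moduli set {2^n - 1, 2^n, 2^n + 1} (pairwise coprime)."""
    return [2**n - 1, 2**n, 2**n + 1]


def to_rns(x: int, moduli: list) -> list:
    """Forward conversion: the residue of x in each channel."""
    return [x % m for m in moduli]


def crt_reverse(residues: list, moduli: list) -> int:
    """Residue-to-binary conversion by the Chinese Remainder Theorem."""
    big_m = 1
    for m in moduli:
        big_m *= m
    x = 0
    for r, m in zip(residues, moduli):
        mi = big_m // m
        # pow(mi, -1, m) is the modular inverse (Python 3.8+).
        x += r * mi * pow(mi, -1, m)
    return x % big_m
```

In hardware, the per-channel products and the final sum become the multi-operand adder mentioned above. The paper's idea is to fold a constant term out of that adder and into the residue channels.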

16.

Reliability calculation of redundant systems with non-identical units

E. Balagurusamy K.B. Misra 《Microelectronics Reliability》1976,15(2):135-138

The present paper develops mathematical models for evaluating the exact reliability and mean time to failure of k-out-of-m:G systems with different unit failure probabilities. The ith unit is assumed to be characterized by a general hazard rate $h_i(t) = \lambda_i t^b$. The models are based on the concept of tie sets; they are fairly simple and can be used for any values of m and k. The algorithm developed for these models is suitable for implementation as a computer program.
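
The quantity these models evaluate can be checked numerically with a short sketch. With hazard $h_i(t) = \lambda_i t^b$, unit i has reliability $R_i(t) = \exp(-\lambda_i t^{b+1}/(b+1))$. The probability that at least k of m independent, non-identical units work can then be computed exactly by a simple dynamic program over the number of working units. This is a different (but exact) technique from the paper's tie-set approach, and the function names are hypothetical.

```python
import math


def unit_reliability(lam: float, b: float, t: float) -> float:
    """R_i(t) = exp(-lam * t**(b+1) / (b+1)) for hazard h_i(t) = lam * t**b (b > -1)."""
    return math.exp(-lam * t ** (b + 1) / (b + 1))


def k_out_of_m_reliability(k: int, probs: list) -> float:
    """Probability that at least k of the independent units work (k-out-of-m:G)."""
    # dist[j] = probability that exactly j of the units processed so far work.
    dist = [1.0]
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for j, q in enumerate(dist):
            new[j] += q * (1 - p)   # this unit fails
            new[j + 1] += q * p     # this unit works
        dist = new
    return sum(dist[k:])
```

The mean time to failure is the integral of the system reliability over t ≥ 0. It could be approximated numerically by evaluating `k_out_of_m_reliability(k, [unit_reliability(l, b, t) for l in lams])` on a grid of t values.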

17.

An efficient tree architecture for modulo 2ⁿ+1 multiplication

Zhongde Wang G. A. Jullien W. C. Miller 《Journal of Signal Processing Systems》1996,14(3):241-248

Modulo 2ⁿ+1 multiplication plays an important role in the Fermat number transform and in residue number systems; the diminished-1 representation of numbers has been found most suitable for representing the elements of the rings. Existing algorithms for modulo (2ⁿ+1) multiplication use either recursive modulo (2ⁿ+1) addition or a regular binary multiplication integrated with the modulo reduction operation. Although the latter approach is most often adopted for large n, it requires conversions between the diminished-1 and binary representations. In this paper we propose a parallel, fine-grained architecture for modulo (2ⁿ+1) multiplication, based on a Wallace tree, which does not require any conversions; the use of a Wallace tree considerably improves the speed of the multiplier. This new architecture has an extremely modular structure, with associated VLSI implementation advantages. The critical path delay and hardware requirements of the new multiplier are similar to those of a corresponding n×n-bit binary multiplier.
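
The reason diminished-1 operands can be multiplied without converting back to binary is the identity (a′ + 1)(b′ + 1) − 1 = a′b′ + a′ + b′, where a′ = a − 1 and b′ = b − 1. Below is a behavioural reference model built on that identity, the kind one might use to check a hardware multiplier. It is not the Wallace-tree architecture itself. Assumptions: zero operands are flagged separately (as is usual for diminished-1 codes) and are excluded here, and the final `%` stands in for the hardware's end-around reduction based on 2ⁿ ≡ −1. The function names are hypothetical.

```python
def to_dim1(x: int, n: int) -> int:
    """Diminished-1 code of x in [1, 2**n]: the n-bit value x - 1."""
    if not 1 <= x <= 1 << n:
        raise ValueError("x must be in [1, 2**n]; zero needs a separate flag")
    return x - 1


def dim1_mul(a_d: int, b_d: int, n: int) -> int:
    """Multiply two diminished-1 operands modulo 2**n + 1; result is diminished-1.

    Uses (a + 1)(b + 1) - 1 = a*b + a + b, so no conversion to ordinary binary
    is needed. A result equal to 2**n signals a zero product, which can occur
    only when 2**n + 1 is composite.
    """
    mod = (1 << n) + 1
    return (a_d * b_d + a_d + b_d) % mod
```

For the Fermat modulus 17 (n = 4), every product of nonzero operands is nonzero, so the result always fits in n bits.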

18.

Swarm intelligence driven design space exploration of optimal k-cycle transient fault secured datapath during high level synthesis based on user power–delay budget

《Microelectronics Reliability》2015,55(6):990-1004

Fault security denotes the ability either to detect an error or to produce the correct output. It can be provided by either hardware redundancy or time redundancy, which makes it possible to optimize the overheads associated with fault security. However, generating an optimal fault-secured datapath structure (i.e., performing design space exploration (DSE) for it) based on a user power–delay budget during high-level synthesis (HLS), in the context of k-cycle transient faults, is considered an intractable problem. This is because, first, for a given type of candidate design solution produced during exploration, a feasible k-cycle fault-secured datapath that satisfies the conflicting user constraints/budget may not exist. Second, inserting an inappropriate cut (which results in an additional checkpoint) to optimize the delay overhead associated with fault security may, in most cases, not yield optimal solutions with respect to the user constraints/budget. Solutions to these problems have not been addressed in the literature so far. The paper therefore presents the following novelties: (a) an algorithm for the fault-secured DSE process; (b) handling of k-cycle transient faults during DSE; (c) schemes for selecting appropriate edges for inserting cuts, which choose locations in the scheduled control data flow graph (CDFG) that minimize the delay overhead associated with fault security; and (d) a swarm intelligence (particle swarm optimization) driven DSE process that adaptively computes the candidate design solutions for generating an optimal fault-secured datapath. When tested on standard benchmarks, the proposed approach yielded optimal results in most cases, as evident from the data obtained for generational distance (GD), spacing (S), spreading (Δ), and weighted metric (W_m). Further, comparison with recent approaches indicated a significant reduction in final cost (better quality) for the proposed approach.

19.

Testing the Local Interconnect Resources of SRAM-Based FPGA's

M. Renovell J.M. Portal J. Figueras Y. Zorian 《Journal of Electronic Testing》2000,16(5):513-520

This paper addresses the problem of testing the configurable modules used in the local interconnect of SRAM-based FPGAs. First, it is demonstrated that a Configurable Interface Multiplexer (CIM) with n address bits requires N = 2ⁿ test configurations under both a stuck-at and a functional fault model. Second, a logic cell with a set of k input CIMs with n address bits is analyzed, and it is proven that the set of CIMs can be tested in parallel, so that the number of required test configurations remains N = 2ⁿ. Third, it is shown that the complete circuit, i.e., an m × m array of sets of k CIMs with n address bits, can be tested with only N = 2ⁿ test configurations using XOR tree and shift register structures.

20.

Evaluation of Temperature-Dependent Effective Material Properties and Performance of a Thermoelectric Module

Heng-Chieh Chien En-Ting Chu Huey-Lin Hsieh Jing-Yi Huang Sheng-Tsai Wu Ming-Ji Dai Chun-Kai Liu Da-Jeng Yao 《Journal of Electronic Materials》2013,42(7):2362-2370

We devised a novel method to evaluate the temperature-dependent effective properties of a thermoelectric module (TEM): the Seebeck coefficient (S_m), internal electrical resistance (R_m), and thermal conductance (K_m). After calculation, the effective properties of the module are converted to the average material properties of a p–n thermoelectric pillar pair inside the module: the Seebeck coefficient (S_TE), electrical resistivity (ρ_TE), and thermal conductivity (k_TE). For a commercial thermoelectric module (Altec 1091) chosen to verify the method, the measured S_TE reaches a maximum at a bath temperature of 110 °C; ρ_TE increases linearly with bath temperature, and k_TE increases slightly with increasing bath temperature. The results show that the method has satisfactory measurement performance in terms of practicability and reliability; the data for tests near 23 °C agree with published values.
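
As a rough illustration of the module-to-material conversion, consider an idealized module. This is an assumption, not necessarily the paper's model. It has N p–n couples, i.e., 2N identical legs of length L and cross-sectional area A, connected electrically in series and thermally in parallel. Contact resistances and the ceramic plates are neglected, and S_TE is taken as the average magnitude of the p-leg and n-leg Seebeck coefficients. N, L, and A are symbols introduced here. Then

$$
S_m = 2N\,S_{TE}, \qquad R_m = \frac{2N\,\rho_{TE}\,L}{A}, \qquad K_m = \frac{2N\,k_{TE}\,A}{L}
$$

which inverts to

$$
S_{TE} = \frac{S_m}{2N}, \qquad \rho_{TE} = \frac{R_m\,A}{2N\,L}, \qquad k_{TE} = \frac{K_m\,L}{2N\,A}.
$$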