
1.

SFF—The Single-Stream FPGA-Optimized Feedforward FFT Hardware Architecture

Carl Ingemarsson Oscar Gustafsson 《Journal of Signal Processing Systems》2018,90(11):1583-1592

In this paper, a fast Fourier transform (FFT) hardware architecture optimized for field-programmable gate arrays (FPGAs) is proposed. We refer to this as the single-stream FPGA-optimized feedforward (SFF) architecture. By using a stage that trades adders for shift registers as compared with the single-path delay feedback (SDF) architecture, the efficient implementation of short shift registers in Xilinx FPGAs can be exploited. Moreover, this stage can be combined with ordinary or optimized SDF stages such that adders are only traded for shift registers when beneficial. The resulting structures are well-suited for FPGA implementation, especially when efficient implementation of short shift registers is available. This holds for at least contemporary Xilinx FPGAs. The results show that the proposed architectures improve on the current state of the art.

2.

Design of FIR digital filters using tapped cascaded FIR Subfilters

Shogo Nakamura Sanjit K. Mitra 《Circuits, Systems, and Signal Processing》1982,1(1):43-56

This paper considers the design of an FIR digital filter by interconnecting a number of identical FIR subfilters with the aid of a few additional multipliers and adders. The overall structure takes the form of tapped cascaded FIR subfilters. A composite method is proposed to determine the tapping coefficients, along with the coefficients of the subfilter, so as to approximate the overall frequency response characteristic. Several numerical examples illustrating the proposed method are included.

3.

Use Of The Remez Algorithm For Designing FRM Based FIR Filters

Tapio Saramäki Yong Ching Lim 《Circuits, Systems, and Signal Processing》2003,22(2):77-97

A very efficient technique for drastically reducing the number of multipliers and adders in narrow transition-band linear-phase finite impulse response (FIR) filters is to use the one-stage or multistage frequency-response masking (FRM) approach as originally introduced by Lim. In the original synthesis techniques developed by Lim and Lian, the subfilters in the overall approach were designed using time-consuming linear programming. In order to perform the overall synthesis faster, this paper shows how these subfilters can be designed with the aid of the Remez multiple exchange algorithm, the most powerful technique for designing arbitrary-magnitude linear-phase FIR filters in the minimax sense. In addition to speeding up the overall procedure, the use of the Remez algorithm enables one to generate a very fast MATLAB program for the overall synthesis so that, given the filter specifications and the number of stages, the program automatically provides the solution with the minimum number of multipliers and adders required in the overall implementation. This is possible because the MATLAB Remez routine is directly available and thus can be used for this purpose after appropriate modifications.

4.

Optimized Design of an FPGA-Based FIR Filter

吕威《电视技术》2014,38(5):71-73,112

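This paper presents a finite impulse response (FIR) filter structure implemented on an FPGA that uses few hardware resources yet runs fast. The proposed structure contains no multiplier modules; adders and shift registers replace them. Each multiplier coefficient is approximated as a sum of three power-of-two terms. A 7th-order FIR filter implemented on an FPGA shows that the method reduces the occupied area by 75% compared with a conventional filter containing multiplier modules.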
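The coefficient approximation described in this abstract can be illustrated with a minimal Python sketch. It is not the paper's method: the greedy term selection, the use of signed terms, the exponent range, and the names `approx_three_pow2` and `shift_add_multiply` are all assumptions made for illustration.

```python
import math

def approx_three_pow2(coeff, min_exp=-12, max_exp=0):
    """Greedily approximate coeff by up to three signed power-of-two terms.

    Returns a list of (sign, exponent) pairs; each term is sign * 2**exponent.
    Signed terms and the greedy choice are assumptions of this sketch.
    """
    terms = []
    residual = coeff
    for _ in range(3):
        if residual == 0:
            break
        # Choose the exponent nearest to log2 of the remaining error.
        exp = round(math.log2(abs(residual)))
        exp = max(min_exp, min(max_exp, exp))
        sign = 1 if residual > 0 else -1
        terms.append((sign, exp))
        residual -= sign * 2.0 ** exp
    return terms

def shift_add_multiply(x, terms, frac_bits=12):
    """Multiply integer x by the approximated coefficient with shifts and adds.

    The result is a fixed-point value scaled by 2**frac_bits.
    Requires every exponent to lie in [-frac_bits, 0].
    """
    acc = 0
    for sign, exp in terms:
        shift = frac_bits + exp  # non-negative because exp >= -frac_bits
        acc += sign * (x << shift)
    return acc

terms = approx_three_pow2(0.7)           # 0.5 + 0.25 - 0.0625 = 0.6875
product = shift_add_multiply(100, terms)
print(terms, product / 4096)             # 100 * 0.6875 = 68.75
```

Each term costs one shift and one add or subtract, so no multiplier is needed; the price is the approximation error (0.6875 instead of 0.7 here).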

5.

Research and Analysis of Accumulator-Based Built-In Self-Test Methods

龚绿怡 顾震宇 曾晓洋 章倩苓 《微电子学与计算机》2003,20(11):77-80

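In circuits suited to design for testability based on built-in self-test (BIST), the accumulator is a widely used basic unit, for example in the arithmetic and logic circuits of general-purpose processors and digital signal processing circuits. Taking a Booth multiplier as an example, this paper introduces several common forms of BIST output response analysis using accumulator circuits, and compares them in terms of fault coverage, hardware overhead, and delay.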

6.

An Efficient Algorithm for the Optimization of FIR Filters Synthesized Using the Multistage Frequency-Response Masking Approach

Juha Yli-Kaakinen Tapio Saramäki 《Circuits, Systems, and Signal Processing》2011,30(1):157-183

A very efficient technique to drastically reduce the number of multipliers and adders in narrow transition-band linear-phase finite-impulse response digital filters is to use the one-stage or multistage frequency-response masking (FRM) approach, which was originally introduced by Lim and further improved by Lim and Lian. In these original synthesis techniques, the subfilters in the overall implementation are separately designed. As shown earlier by the authors of this contribution together with Johansson, the arithmetic complexity in one-stage FRM filter designs can be considerably reduced by using the following two-step technique for simultaneously optimizing all the subfilters. First, a suboptimal solution is found by using a simple design scheme. Second, this solution is used as a start-up solution for further optimization, which is carried out with the aid of an efficient nonlinear optimization algorithm. This paper extends this approach to the synthesis of multistage FRM filters. An example taken from the literature illustrates that both the number of multipliers and the number of adders for the resulting optimized multistage FRM filters are approximately 70 percent compared with those of the filters synthesized using the original multistage FRM filter design schemes. Additional examples are included in order to show the benefits provided by the proposed synthesis scheme over other recently published design techniques, in terms of an improved performance of the resulting solution, a higher accuracy of the solution, and a shorter time required to arrive at the best solution.

7.

An Area Optimization Method for Digital Filter Design

Sang‐Hun Yoon Jong‐Wha Chong Chi‐Ho Lin 《ETRI Journal》2004,26(6):545-554

In this paper, we propose an efficient design method for area optimization in a digital filter. The conventional methods to reduce the number of adders in a filter have the problem of a long critical path delay caused by the deep logic depth of the filter due to adder sharing. Furthermore, they have the disadvantage of using the transposed direct form (TDF) filter, which needs more registers than the direct form (DF) filter. In this paper, we present a hybrid structure of a TDF and DF based on the flattened coefficients method so that it can reduce the number of flip‐flops and full‐adders without additional critical path delay. We also propose a resource sharing method and sharing‐pattern searching algorithm to reduce the number of adders without deepening the logic depth. Simulation results show that the proposed structure reduces the number of adders and registers by 22% and 26%, respectively, compared to the best one used in the past.

8.

Multiplication-Free Polynomial-Based FIR Filters with an Adjustable Fractional Delay

Juha Yli-Kaakinen Tapio Saramäki 《Circuits, Systems, and Signal Processing》2006,25(2):265-294

An efficient coefficient quantization scheme is described for minimizing the cost of implementing fixed parallel linear-phase finite impulse response (FIR) filters in the modified Farrow structure introduced by Vesma and Saramäki for generating FIR filters with an adjustable fractional delay. The implementation costs under consideration are the minimum number of adders and subtracters when implementing these parallel subfilters as a very large-scale integration (VLSI) circuit. Two implementation costs are under consideration to meet the given criteria. In the first case, all the coefficient values are implemented independently of each other as a few signed-powers-of-two terms, whereas in the second case, the common subexpressions within all the coefficient values included in the overall implementation are properly shared in order to reduce the overall implementation cost even further. The optimum finite-precision solution is found in four steps. First, the number of filters and their (common odd) order are determined such that the given criteria are sufficiently exceeded in order to allow some coefficient quantization errors. Second, those coefficient values of the subfilters having a negligible effect on the overall system performance are fixed to be zero valued. In addition, the experimentally observed attractive connections between the coefficient values of the subfilters, after setting some coefficient values equal to zero, are utilized to reduce both the implementation cost and the parameters to be optimized even more. Third, constrained nonlinear optimization is applied to determine for the remaining infinite-precision coefficients a parameter space that includes the feasible space where the given criteria are met. The fourth step involves finding in this space the desired finite-precision coefficient values for minimizing the given implementation costs to meet the stated overall criteria. Several examples are included illustrating the efficiency of the proposed synthesis scheme.

9.

Complexity Reduction For FRM-based FIR Filters Using The Prefilter-Equalizer Technique

Yong Lian 《Circuits, Systems, and Signal Processing》2003,22(2):137-155

A new method to reduce the number of arithmetic operations in a sharp FIR filter synthesized by the frequency-response masking (FRM) technique is presented. The success of the proposed method is based on a modified FRM approach where the subfilters in the FRM approach are implemented by using recently introduced prefilter-equalizer based filters. It is shown, by means of examples, that the proposed method yields considerable savings in the numbers of multipliers and adders compared to the original single-stage FRM approach.

10.

A programmable CORDIC chip for digital signal processing applications

Timmermann D. Hahn H. Hosticka B.J. Schmidt G. 《Solid-State Circuits, IEEE Journal of》1991,26(9):1317-1321

A chip implementing the coordinate rotation digital computer (CORDIC) algorithm is described. It contains a 10-MHz 16-b fixed-point CORDIC arithmetic unit, 2-kb RAM, a controller, and input/output (I/O) registers. A modified data-path architecture allows cross-wire free data flow. The chip design involved development of optimized carry-select adders and a modified programmable-logic-array (PLA) cell layout, which allows speed increase in single-layer metal technology. The authors designed, fabricated, and tested a general-purpose fully parallel programmable CORDIC chip in CMOS technology and developed optimal iteration sequences.
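For readers unfamiliar with CORDIC, the following Python sketch shows the textbook circular rotation mode in fixed point, using only shifts, adds, and a small arctangent table. It does not model the chip described above: the word length, iteration count, and the name `cordic_rotate` are assumptions chosen for illustration.

```python
import math

def cordic_rotate(angle, iterations=16, frac_bits=14):
    """Approximate (cos(angle), sin(angle)) for |angle| <= pi/2.

    Textbook rotation-mode CORDIC on fixed-point integers with
    frac_bits fractional bits (an assumed format, not the chip's).
    """
    scale = 1 << frac_bits
    # Table of atan(2**-i) in fixed point, one entry per iteration.
    atan_table = [round(math.atan(2.0 ** -i) * scale) for i in range(iterations)]
    # The micro-rotations stretch the vector by a constant gain;
    # starting from 1/gain removes that scaling from the result.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x = round(scale / gain)
    y = 0
    z = round(angle * scale)  # residual angle still to be rotated
    for i in range(iterations):
        if z >= 0:
            x, y, z = x - (y >> i), y + (x >> i), z - atan_table[i]
        else:
            x, y, z = x + (y >> i), y - (x >> i), z + atan_table[i]
    return x / scale, y / scale

c, s = cordic_rotate(0.5)
print(c, s, math.cos(0.5), math.sin(0.5))
```

Each iteration needs two shifters and three adders or subtracters, which is why adder design dominates the speed of a CORDIC data path.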

11.

Hardware Architecture of Polyphase Filter Banks Performing Embedded Resampling for Software-Defined Radio Front-Ends

Mehmood Awan Yannick Le Moullec Peter Koch Fred Harris 《中兴通讯技术（英文版）》2012,10(1):54-62,70

In this paper, we describe resource-efficient hardware architectures for software-defined radio (SDR) front-ends. These architectures are made efficient by using a polyphase channelizer that performs arbitrary sample rate changes, frequency selection, and bandwidth control. We discuss area, time, and power optimization for field programmable gate array (FPGA) based architectures in an M-path polyphase filter bank with modified N-path polyphase filter. Such systems allow resampling by arbitrary ratios while simultaneously performing baseband aliasing from center frequencies at Nyquist zones that are not multiples of the output sample rate. A non-maximally decimated polyphase filter bank, where the number of data loads is not equal to the number of M subfilters, processes M subfilters in a time period that is either less than or greater than the time period of M data loads. We present a load-process architecture (LPA) and a runtime architecture (RA) (based on serial polyphase structure) which have different scheduling. In LPA, N subfilters are loaded, and then M subfilters are processed at a clock rate that is a multiple of the input data rate. This is necessary to meet the output time constraint of the down-sampled data. In RA, the processing of M subfilters is efficiently scheduled within the N data-load time while simultaneously loading N subfilters. This requires reduced clock rates compared with LPA, and potentially less power is consumed. A polyphase filter bank that uses different resampling factors for maximally decimated, under-decimated, over-decimated, and combined up- and down-sampled scenarios is used as a case study, and an analysis of area, time, and power for their FPGA architectures is given. For resource-optimized SDR front-ends, RA is superior for reducing operating clock rates and dynamic power consumption. RA is also superior for reducing area resources, except when indices are pre-stored in LUTs.

12.

A VLSI architecture for lifting-based forward and inverse wavelet transform

Andra K. Chakrabarti C. Acharya T. 《Signal Processing, IEEE Transactions on》2002,50(4):966-977

We propose an architecture that performs the forward and inverse discrete wavelet transform (DWT) using a lifting-based scheme for the set of seven filters proposed in JPEG2000. The architecture consists of two row processors, two column processors, and two memory modules. Each processor contains two adders, one multiplier, and one shifter. The precision of the multipliers and adders has been determined using extensive simulation. Each memory module consists of four banks in order to support the high computational bandwidth. The architecture has been designed to generate an output every cycle for the JPEG2000 default filters. The schedules have been generated by hand and the corresponding timings listed. Finally, the architecture has been implemented in behavioral VHDL. The estimated area of the proposed architecture in 0.18-μm technology is 2.8 mm square, and the estimated frequency of operation is 200 MHz.
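The lifting idea can be made concrete with a small Python sketch of one level of the reversible 5/3 integer transform, one of the JPEG2000 default filters. This is a software illustration only, not the architecture from the paper; the even-length restriction, the mirrored boundary handling, and the names `lift_53_forward` and `lift_53_inverse` are assumptions of the sketch.

```python
def lift_53_forward(x):
    """One level of the reversible 5/3 lifting transform.

    x is an even-length list of integers. Returns (s, d): the
    low-pass (approximation) and high-pass (detail) coefficients.
    """
    assert len(x) >= 2 and len(x) % 2 == 0
    even, odd = x[0::2], x[1::2]
    half = len(even)
    # Predict: detail = odd sample minus the floor-average of its even neighbours.
    d = []
    for i in range(half):
        right = even[i + 1] if i + 1 < half else even[i]  # mirror at right edge
        d.append(odd[i] - ((even[i] + right) >> 1))
    # Update: smooth the even samples with the neighbouring details.
    s = []
    for i in range(half):
        left = d[i - 1] if i > 0 else d[i]  # mirror at left edge
        s.append(even[i] + ((left + d[i] + 2) >> 2))
    return s, d

def lift_53_inverse(s, d):
    """Undo lift_53_forward by running the lifting steps in reverse order."""
    half = len(s)
    even = []
    for i in range(half):
        left = d[i - 1] if i > 0 else d[i]
        even.append(s[i] - ((left + d[i] + 2) >> 2))
    x = []
    for i in range(half):
        right = even[i + 1] if i + 1 < half else even[i]
        x.append(even[i])
        x.append(d[i] + ((even[i] + right) >> 1))
    return x

signal = [1, 2, 3, 4, 5, 6, 7, 8]
s, d = lift_53_forward(signal)
print(s, d, lift_53_inverse(s, d) == signal)
```

Because the inverse subtracts exactly what the forward pass added, reconstruction is exact in integer arithmetic, and each step uses only adders and a shifter.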

13.

Implementation of microprogrammed control in FPGAs

Bomar B.W. 《Industrial Electronics, IEEE Transactions on》2002,49(2):415-422

The microprogrammed approach to implementing control state machines has been widely used since the early 1960s and has the advantages of structured programming and fixed timing characteristics. This paper presents a microprogrammed control unit that has been tailored to implementation in field-programmable gate arrays (FPGAs). The microsequencer has a novel architecture which takes advantage of the enhancements existing in coarse-grained FPGAs to implement efficiently four basic functions: registers, multiplexers, adders, and counters. The sequencer supports both nested subroutines and nested loops, and can operate in both pipelined and nonpipelined modes. The pipelined mode of operation uses delayed branching in which one additional microinstruction always executes following any instruction that changes program flow. It is found that in a typical medium-sized (50 K gates) FPGA, the sequencer can be clocked at over 60 MHz nonpipelined and over 100 MHz pipelined while using less than 5% of the available FPGA logic resources. This leaves the bulk of the FPGA resources available for implementing other digital circuitry that is to be controlled by the microsequencer. While not attractive for a small number of states, the microprogrammed approach has some significant advantages for complex controllers with a large number of states.

14.

An efficient architecture for accumulator-based test generation of SIC pairs

I. Voyiatzis C. Efstathiou 《Microelectronics Journal》2010,41(8):487-493

Research conducted over the years has shown that the application of single input change (SIC) pairs of test patterns for sequential, i.e. stuck-open and delay fault testing is extremely efficient. In this paper, a novel architecture for the generation of SIC pairs is presented. The implementation of the proposed architecture is based on Ling adders that are commonly utilized in current data paths due to their high-operating speed. Since the timing characteristics of the adder are not modified, the presented architecture provides a practical solution for the built-in testing of circuits that contain such adders.

15.

An Area-Efficient 4-Stream FIR Interpolation/Decimation for IEEE 802.11n WLAN

Zhen-dong Zhang Bin Wu Yong-xu Zhu Yu-mei Zhou 《Journal of Signal Processing Systems》2012,69(2):115-123

This paper presents an area-efficient 4-stream finite impulse response (FIR) interpolation/decimation for IEEE 802.11n wireless local area network (WLAN). The novelty of the presented design is threefold. First, a multi-path pipelined polyphase structure is proposed to deal with multiple data streams, so that four simultaneous data streams can be supported in the design with minimal hardware complexity. Second, a hybrid common subexpression elimination (HCSE) method that uses a signed binary representation of coefficients is applied to the implementation of subfilters. The multiplications in each subfilter are efficiently implemented using a few hardwired shifts, adders, and subtracters. Last, the interpolating mode and decimating mode of the design are configurable. This helps to improve system-level hardware utilization efficiency, since WLAN is a time division duplex system. Under 0.13 μm 1.2 V 1P6M CMOS technology, the cell area and power consumption of the presented interpolation/decimation are 0.22 mm² and 10.08 mW, respectively. The error vector magnitude (EVM) performance of an 802.11n baseband prototype which adopts the presented design is measured at −42.2 dB.

16.

Application of Digital Differential Matched Filter Technology in PN Code Acquisition Systems

李乐平 余理富 《现代电子技术》2003,26(17):73-75

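This paper focuses on the structure and principle of the digital differential matched filter (DDMF) and proposes a scheme that uses the DDMF for PN code acquisition. The results show that the scheme achieves fast acquisition and saves hardware resources compared with the conventional digital matched filter (CDMF).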

17.

High performance square rooting circuit using hybrid radix-2 adders

Corsonello P. Perri S. 《Electronics letters》1999,35(3):185-186

A new high performance bit parallel architecture for computing square roots is proposed. The architecture implements a non-restoring algorithm and is structured as a pipelined cellular array. To improve the performance, hybrid radix-2 adders are used. However, the conventional two's complement representation for both the radicand and square root is maintained.
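As background for the non-restoring algorithm mentioned above, here is a minimal Python sketch of the classic integer version, which consumes two radicand bits per iteration and never restores a negative partial remainder. It illustrates the arithmetic only and says nothing about the letter's cellular array or hybrid adders; the name `nonrestoring_isqrt` and the `bits` parameter are assumptions of the sketch.

```python
def nonrestoring_isqrt(d, bits=32):
    """Integer square root by the non-restoring method.

    d must satisfy 0 <= d < 2**bits, with bits even.
    Returns (q, r) such that q*q + r == d and 0 <= r <= 2*q.
    """
    assert bits % 2 == 0 and 0 <= d < (1 << bits)
    q = 0  # root bits found so far
    r = 0  # signed partial remainder (may go negative)
    for i in range(bits // 2 - 1, -1, -1):
        pair = (d >> (2 * i)) & 3  # next two radicand bits
        if r >= 0:
            r = (r << 2) + pair - ((q << 2) | 1)  # subtract trial value
        else:
            r = (r << 2) + pair + ((q << 2) | 3)  # add instead of restoring
        q = (q << 1) | (1 if r >= 0 else 0)
    if r < 0:
        r += (q << 1) | 1  # single final correction of the remainder
    return q, r

print(nonrestoring_isqrt(10, bits=4))   # (3, 1)
print(nonrestoring_isqrt(24, bits=6))   # (4, 8)
```

Every iteration performs exactly one addition or subtraction selected by the sign of the remainder, which is what makes the method attractive for a regular, pipelined adder array.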

18.

Intraframe Image Coding by Cascaded Hadamard Transforms

Fukinuki T. Miyata M. 《Communications, IEEE Transactions on》1973,21(3):175-180

Various image coding schemes have been studied for digital transmission of videophone signals. The Hadamard transform, which is now studied for the transmission of pictures such as those from satellites, has been considered too complicated for public use, though characteristics such as the ratio of bit-rate reduction are more desirable than those of differential pulse-code modulation (DPCM). We have found a very simple scheme of the transform, where digitized videophone signals are transformed to Hadamard components all digitally just by n digital adders and some shift registers for a 2ⁿth-order transform. For example, three adders are necessary for an eighth-order transform. It is extendable to a two-dimensional transform with ease. We have made an experimental model running in real time. Experiments and theoretical calculation have shown that 3 bits/sample are required for good picture quality in the case of the two-dimensional (4 × 2)th-order transform and 0.5 bits more for the one-dimensional eighth-order transform.
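The reason a Hadamard transform needs only adders can be seen in a short Python sketch of the standard fast Walsh-Hadamard transform. This is the generic butterfly algorithm, not the cascaded hardware scheme of the paper; the name `fwht` and the natural (Hadamard) output ordering are assumptions of the sketch.

```python
def fwht(values):
    """Fast Walsh-Hadamard transform in natural (Hadamard) order.

    Uses only additions and subtractions. The input length must be a
    power of two; the result is unnormalized.
    """
    data = list(values)
    n = len(data)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    h = 1
    while h < n:  # log2(n) butterfly stages
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = data[i], data[i + h]
                data[i], data[i + h] = a + b, a - b
        h *= 2
    return data

samples = [1, 0, 1, 0, 0, 1, 1, 0]
print(fwht(samples))  # [4, 2, 0, -2, 0, 2, 0, 2]
```

An order-8 transform passes through three butterfly stages, which is consistent with the abstract's count of three adders if each stage is served by one adder/subtracter fed from shift registers in a sample-serial pipeline (that mapping is an interpretation, not a detail taken from the paper).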

19.

High Performance Reconfigurable FIR Filter Architecture Using Optimized Multiplier

J. L. Mazher Iqbal S. Varadarajan 《Circuits, Systems, and Signal Processing》2013,32(2):663-682

In mobile communication systems and multimedia applications, the need for efficient reconfigurable digital finite impulse response (FIR) filters has been increasing tremendously because of their advantages of smaller area, lower cost, lower power, and higher speed of operation. This article presents a near-optimum, low-complexity, reconfigurable digital FIR filter architecture based on computation sharing multipliers (CSHM), the constant shift method (CSM), and a modified binary-based common sub-expression elimination (BCSE) method for filter coefficients of different word lengths. The CSHM identifies common computation steps and reuses them for different multiplications. The proposed reconfigurable FIR filter architecture reduces the adder cost and operates at high speed for low-complexity reconfigurable filtering applications such as channelization, channel equalization, matched filtering, pulse shaping, video convolution functions, signal preconditioning, and various other communication applications. The proposed architecture has been implemented and tested on a Virtex 2 xc2vp2-6fg256 field-programmable gate array (FPGA) with 8-bit, 12-bit, and 16-bit filter coefficients. The proposed reconfigurable FIR filter architecture, which uses a dynamically reconfigurable multiplier block, offers good area and speed improvement compared to existing reconfigurable FIR filter implementations.

20.

Folded Architecture for Digital Gammatone Filter Used in Speech Processor of Cochlear Implant

Rajalakshmi Karuppuswamy Kandaswamy Arumugam Swathi Priya M. 《ETRI Journal》2013,35(4):697-705

Emerging trends in the area of digital very large scale integration (VLSI) signal processing can lead to a reduction in the cost of the cochlear implant. Digital signal processing algorithms are repetitively used in speech processors for filtering and encoding operations. The critical paths in these algorithms limit the performance of the speech processors. These algorithms must be transformed to accommodate processors designed to be high speed and have less area and low power. This can be realized by basing the design of the auditory filter banks for the processors on digital VLSI signal processing concepts. By applying a folding algorithm to the second‐order digital gammatone filter (GTF), the number of multipliers is reduced from five to one and the number of adders is reduced from three to one, without changing the characteristics of the filter. Folded second‐order filter sections are cascaded with three similar structures to realize the eighth‐order digital GTF whose response is a close match to the human cochlea response. The silicon area is reduced from twenty to four multipliers and from twelve to four adders by using the folding architecture.