Similar literature
 Found 20 similar articles (search time: 31 ms)
1.
An efficient implementation of discrete cosine transform (DCT) computations is presented based on the so-called shifted discrete Fourier transform (SDFT), a generalization of the conventional discrete Fourier transform (DFT). Due to the simple form of the factorized matrices, the derived architecture can be easily constructed from the cascade of only two types of parameterized hardware modules: butterfly operators and rotators. The butterfly operator performs the conventional butterfly shuffling and addition/subtraction. The rotator, which performs plane rotations of two-dimensional (2-D) vectors, is designed using a carry-save-adder (CSA)-based unfolded pipelined CORDIC architecture in which the rotation angles can be approximated with different accuracies using a sequence of bipolar signs. The proposed one-dimensional and 2-D DCT implementations, composed of these two types of parameterized modules, can be used as flexible and reusable Silicon Intellectual Property (SIP) for the DCT computation unit to be embedded in system-on-a-chip (SoC) designs. The proposed implementations have many features and advantages, including SIP reusability, low complexity, high throughput, regularity, scalability (easy extension of transform length), and flexibility (approximated DCT with various accuracies).
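The rotator idea in the abstract above, a plane rotation built from shift-and-add micro-rotations steered by a sequence of bipolar signs, can be sketched in floating point as follows. This is only an illustration of the generic circular CORDIC in rotation mode; the function name `cordic_rotate` and the `stages` parameter are hypothetical, and nothing here models the CSA-based unfolded pipeline of the paper.

```python
import math

def cordic_rotate(x, y, angle, stages=16):
    """Rotate (x, y) counterclockwise by `angle` radians (|angle| below about 1.74).

    Each stage applies one micro-rotation by +/- atan(2**-i); the sign
    sequence (sigma) is the "bipolar" encoding of the rotation angle.
    Fewer stages give a coarser approximation of the angle.
    """
    z = angle                                  # residual angle still to rotate
    for i in range(stages):
        sigma = 1.0 if z >= 0 else -1.0        # bipolar sign for this stage
        shift = 2.0 ** -i                      # a right shift by i bits in hardware
        x, y = x - sigma * y * shift, y + sigma * x * shift
        z -= sigma * math.atan(shift)
    # Every micro-rotation stretches the vector by sqrt(1 + 2**(-2i));
    # divide the accumulated constant gain out at the end.
    gain = math.prod(math.sqrt(1.0 + 4.0 ** -i) for i in range(stages))
    return x / gain, y / gain

if __name__ == "__main__":
    print(cordic_rotate(1.0, 0.0, math.pi / 4))           # close to (0.7071, 0.7071)
    print(cordic_rotate(1.0, 0.0, math.pi / 4, stages=4)) # coarser approximation
```

Because the gain depends only on the number of stages, a hardware design can fold it into a constant scaling; the sketch simply divides it out.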

2.
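Multidimensional polynomial transform and discrete W transform algorithm for the multidimensional DFT   Total citations: 1, self-citations: 1, citations by others: 0
Zhong Guangjun (钟广军), Cheng Lizhi (成礼智), Chen Huowang (陈火旺). Acta Electronica Sinica (《电子学报》), 2001, 29(8): 1053-1056
This paper first introduces a sequence-reordering technique that converts the $m$-dimensional ($m \geq 2$) discrete Fourier transform ($m$-D DFT) into multiple sums of a series of one-dimensional generalized discrete Fourier transforms (GDFTs). The one-dimensional discrete W transform (DWT) and the multidimensional polynomial transform (MD-PT) are then used to compute these multiple sums with fewer redundant arithmetic operations, yielding an efficient multidimensional DFT algorithm. Compared with the commonly used row-column DFT algorithm, the algorithm needs only about $1/(2m)$ of the multiplications and about $(2m+1)/(4m)$ of the additions. For the 2-D DFT, the method reduces the multiplications and additions by about 50% and 40%, respectively, compared with the pure polynomial transform method. In addition, the algorithm has a simple computational structure and is easy to program; numerical experiments confirm its efficiency.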

3.
In this paper, we propose two new VLSI architectures for computing the $N$-point discrete Fourier transform (DFT) and its inverse (IDFT) based on a radix-2 fast algorithm, where $N$ is a power of two. The first part of this work presents a linear systolic array that requires $\log_2 N$ complex multipliers and is able to provide a throughput of one transform sample per clock cycle. Compared with other related systolic designs based on direct computation or a radix-2 fast algorithm, the proposed one has the same throughput performance but involves less hardware complexity. This design is suitable for high-speed real-time applications, but it would not be easily realized in a single chip when $N$ gets large. To balance the chip area and the processing speed, we further present a new reduced-complexity design for the DFT/IDFT computation. The alternative design is a memory-based architecture that consists of one complex multiplier, two complex adders, and some special memory units. The new design is capable of computing one transform sample every $\log_2 N + 1$ clock cycles on average. In comparison with the first design, the second design reaches a lower throughput with less hardware complexity. For $N = 512$, the chip area required for the memory-based design is about 5742 × 5222 μm², and the corresponding throughput can reach 4M transform samples per second in 0.6 μm CMOS technology. Such area-time performance makes this design very competitive for use in long-length DFT applications, such as asymmetric digital subscriber line (ADSL) and orthogonal frequency-division multiplexing (OFDM) systems.

4.
A reduced-complexity algorithm is presented for computation of the discrete Fourier transform, where an $N$-point transform is computed from eight nearly $(N/8)$-point circular-convolution-like operations. A systolic architecture is also derived for very large-scale integration circuit implementation of the proposed algorithm. The proposed architecture is fully pipelined and contains regular and simple locally connected processing elements. It is devoid of complex control structure and is scalable for higher transform lengths. It is observed that the proposed systolic structure involves either less or nearly the same hardware complexity compared with the corresponding existing systolic structures. In addition, it offers eight times more throughput and significantly lower latency compared with the others.

5.
A chip architecture designed to compute a 16-point discrete Fourier transform (DFT) using S. Winograd's algorithm (1978) every 457 ns is presented. The 99500-transistor 1.2-μm chip incorporates arithmetic, control, and input/output circuitry with testability and fault detection into a 144-pin package. A throughput of $2.3\times10^{12}$ gate-Hz/cm² and 79 million multiplications/s is attained with 70-MHz pipelined bit-serial logic. Combined with similar chips computing 15- and 17-point DFTs, 4080-point DFTs can be computed every 118 μs. Using the 16- and 17-point chips, 272×272-point complex data imagery can be transformed in 4.25 ms. A 24-bit block floating-point data representation combined with an adaptive scaling algorithm delivers a numerical precision of 106 dB (17.6 bits) after computing 4080-point DFTs.

6.
DHT algorithm based on encoding algebraic integers   Total citations: 1, self-citations: 0, citations by others: 1
Baghaie R., Dimitrov V. Electronics Letters, 1999, 35(16): 1303-1305
A novel algorithm for computing the discrete Hartley transform (DHT) is presented. The proposed algorithm is based on the algebraic integer encoding scheme. With the aid of this scheme, an error-free representation of the cas function becomes possible. Furthermore, for the implementation of the algorithm, a fully pipelined systolic architecture with O(N) throughput is proposed.

7.
This paper presents an area-efficient algorithm for the pipelined processing of the fast Fourier transform (FFT). The proposed algorithm decomposes a discrete Fourier transform (DFT) into two balanced sub-DFTs in order to minimize the total number of twiddle factors to be stored in tables. The radix in the proposed decomposition is adaptively changed according to the remaining transform length so that the transform lengths of the sub-DFTs resulting from the decomposition are as close as possible. An 8192-point pipelined FFT processor designed for digital video broadcasting-terrestrial (DVB-T) systems saves 33% of general multipliers and 23% of the total size of twiddle factor tables compared to a conventional pipelined FFT processor based on the radix-$2^2$ algorithm. In addition to the decomposition, several implementation techniques are proposed to reduce area, such as a simple index generator for twiddle factors and add/subtract units combined with the two's complement operation.

8.
A new high-performance systolic architecture for calculating the discrete Fourier transform (DFT) is described which is based on two levels of transform factorization. One level uses an index remapping that converts the direct transform into structured sets of arithmetically simple four-point transforms. Another level adds a row/column decomposition of the DFT. The architecture supports transform lengths that are not powers of two or based on products of coprime numbers. Compared to previous systolic implementations, the architecture is computationally more efficient and uses less hardware. It provides low latency as well as high throughput, and can do both one- and two-dimensional DFTs. An automated computer-aided design tool was used to find latency and throughput optimal designs that matched the target field programmable gate array structure and functionality.

9.
The fast Fourier transform (FFT) is a very important algorithm in digital signal processing. The locally pipelined (LPPL) architecture is an efficient structure for FFT processor design in real-time embedded systems. Two basic building blocks of the LPPL FFT processor, the pipelined butterfly and address generation, are discussed in this brief. Based on "deep" feedback to butterfly-2, a novel approach for pipelined architectures, the radix-2 single-path deep delay feedback architecture, is proposed. For length-$N$ discrete Fourier transform computation, the dominant hardware requirements are minimal: $\log_4 N - 1$ complex multipliers and $2\log_4 N$ adders. As an integral part of the LPPL FFT processor design, address generation and coefficient store-load structures are also presented.

10.
Recently, the power consumption of integrated circuits has been attracting increasing attention. Many techniques have been studied to improve the power efficiency of digital signal processing units such as fast Fourier transform (FFT) processors, which are popularly employed in both traditional research fields, such as satellite communications, and thriving consumer electronics, such as wireless communications. This paper presents solutions based on parallel architectures for high-throughput and power-efficient FFT cores. Different combinations of hybrid low-power techniques are exploited to reduce power consumption, such as multiplierless units which replace the complex multipliers in FFTs, low-power commutators based on an advanced interconnection, and parallel-pipelined architectures. A number of FFT cores are implemented and evaluated for their power/area performance. The results show that up to 38% and 55% power savings can be achieved by the proposed pipelined FFTs and parallel-pipelined FFTs, respectively, compared to the conventional pipelined FFT processor architectures.

11.
A new fast algorithm for the computation of the modulated lapped transform (MLT) is proposed, and its efficient implementation using pipelining techniques and a complex programmable logic device (CPLD) is presented. The new algorithm computes a length-M MLT via the length-M/2 fast Fourier transform (FFT). Computational overhead due to data shuffling in pre-processing and post-processing is offset in the hardware realisation. Hence the overall throughput of the MLT computation for real-time applications is significantly improved. The pipelined CPLD architecture and circuitry are described in detail. The computational complexity of the proposed algorithm is analysed, and the throughput improvement is verified by experimental results.

12.
In this article, we present the implementation of a high-throughput two-dimensional (2-D) 8 × 8 forward and inverse integer DCT transform for H.264. Using matrix decomposition and matrix operations, such as the Kronecker product and direct sum, the forward and inverse integer transforms can be represented using simple addition operations. The dual-clocked pipelined structure of the proposed implementation uses non-floating-point adders and does not require any transpose memory. Hardware synthesis shows that the maximum operating frequency of the proposed pipelined architecture is 1.31 GHz, which achieves a throughput of 21.05 Gpixels/s at a hardware cost of 42932 gates. High throughput and low hardware cost make the proposed design useful for real-time H.264/AVC high-definition processing.

13.
We present the design of parallel architectures for the computation of the Hough transform based on application-specific CORDIC processors. The design of the circular CORDIC in rotation mode is simplified by the a priori knowledge of the angles participating in the transform, and a high throughput is obtained through a pipelined design combined with the use of redundant arithmetic (carry-save adders in this paper). Saving area is essential to the design of a pipelined CORDIC and can be achieved through a reduction in the number of microrotations and/or the size of the coefficient ROM. To reduce the number of microrotations we incorporate radix 4, when it is possible, or mixed radix (radix 2 and radix 4) in the design of the processor, reducing the number of microrotations by half and by 25%, respectively, with respect to a purely radix-2 implementation. Furthermore, if we allocate two circular CORDIC rotators to one processor, then the size of the shared coefficient ROM is only 50% of the ROM of a design based on two separate rotators. Finally, we have also incorporated additional microrotations in order to reduce the scale factor to one. The result is a pipelined architecture which can be easily integrated in VLSI technology due to its regularity and modularity. This work was supported by the Ministry of Education and Science (CICYT) of Spain under project TIC-92-0942.

14.
This paper presents a new fast Discrete Fourier Transform (DFT) algorithm. By rewriting the DFT, a new algorithm is obtained that uses $2^{n-2}(3n-13)+4n-2$ real multiplications and $2^{n-2}(7n-29)+6n+2$ real additions for a real-data $N=2^n$ point DFT, comparable to the number of operations in the Split-Radix method, but with slightly fewer multiply and add operations in total. Because of the organization of multiplications as plane rotations in this DFT algorithm, it is possible to apply a pipelined CORDIC algorithm in a hardware implementation of a long-point DFT, e.g., at a 100 MHz input rate, a 1024-point transform can be realized with a 200 MHz clocking of a single CORDIC pipeline.
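As a quick worked check of the operation counts quoted in the abstract above, the two formulas can be evaluated for a few transform lengths. The sketch assumes the flattened expressions read as $2^{n-2}(3n-13)+4n-2$ multiplications and $2^{n-2}(7n-29)+6n+2$ additions with $N=2^n$; the function names are hypothetical and the numbers are plain arithmetic, not results taken from the paper.

```python
def real_multiplications(n):
    """Real multiplications for a real-data N = 2**n point DFT (assumed formula)."""
    return 2 ** (n - 2) * (3 * n - 13) + 4 * n - 2

def real_additions(n):
    """Real additions for a real-data N = 2**n point DFT (assumed formula)."""
    return 2 ** (n - 2) * (7 * n - 29) + 6 * n + 2

if __name__ == "__main__":
    for n in (8, 10, 12):
        print(f"N = {2 ** n:5d}: "
              f"{real_multiplications(n)} multiplications, "
              f"{real_additions(n)} additions")
```

For the 1024-point case mentioned in the abstract ($n = 10$), this reading gives 4390 real multiplications and 10558 real additions.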

15.
Many pipelined adaptive signal processing systems are subject to a tradeoff between throughput and signal processing performance incurred by the pipelined adaptation feedback loops. In the conventional synchronous design regime, this throughput/performance tradeoff is typically fixed, since the pipeline depth is usually determined in the design phase and remains unchanged at run time. Nevertheless, in many real-life scenarios, the overall system performance can potentially be improved if this tradeoff can be configured dynamically at run time. With this motivation, we propose to apply the self-timed pipeline, an alternative to the synchronous pipeline, to implement pipelined adaptive signal processing systems in which the pipeline depth can be dynamically changed to realize run-time configurable throughput/performance tradeoffs. Based on a well-known high-speed self-timed pipeline style, we developed architecture- and circuit-level design techniques to implement the self-timed pipelined adaptation feedback loop with configurable pipeline depth. We demonstrate the proposed design approach using a delayed least mean square (DLMS) adaptive equalizer for a magnetic recording read channel. The data transfer rate in a hard disk varies as the read head moves among tracks at different distances from the center of the disk platter. By adjusting the pipeline depth on the fly, the DLMS equalizer can dynamically track the best equalization performance allowed by the varying data transfer rates. Simulation results show a significant performance improvement compared with its synchronous counterpart.

16.
In this letter, a frequency-domain despreading method for the cyclic-prefix code-division multiple-access (CP-CDMA) system is introduced. Using the orthogonality transformation property of the discrete Fourier transform (DFT), we can despread the received CP-CDMA signals in the frequency domain. Moreover, we propose an efficient architecture for the proposed frequency-domain despreader. Comparison with the conventional time-domain despreading approach shows that the proposed architecture can save a large amount of computation. The proposed scheme is therefore suitable for efficient implementation of CP-CDMA receivers that adopt frequency-domain equalization and despreading.

17.
18.
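A design method is proposed for an efficient parallel VLSI architecture for the two-dimensional discrete 5/3 wavelet transform (DWT) based on the lifting algorithm. The method lets the row and column filters operate simultaneously and uses a pipelined design; at the same precision, it greatly reduces the amount of computation, increases the transform speed, and saves hardware resources. The method has been verified by Verilog HDL behavioral-level simulation and can be used as a standalone IP core in JPEG2000 image encoding and decoding chips. The architecture can be extended to the 9/7 wavelet lifting structure.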
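The lifting form of the 5/3 wavelet that this abstract builds on can be sketched in one dimension as a predict step followed by an update step, using only integer additions and shifts. The sketch below follows the well-known reversible 5/3 lifting equations with symmetric extension at the borders; the function names `fwd53` and `inv53` are hypothetical, the input is assumed to be an even-length list of integers, and the paper's parallel row/column VLSI structure is not modelled.

```python
def fwd53(x):
    """One level of the reversible 5/3 lifting transform on an even-length integer list.

    Returns (s, d): low-pass (approximation) and high-pass (detail) coefficients.
    """
    n = len(x) // 2
    even, odd = x[0::2], x[1::2]
    # Predict: detail = odd sample minus the floor-average of its even neighbours.
    # The right neighbour past the end is mirrored (symmetric extension).
    d = [odd[i] - ((even[i] + even[min(i + 1, n - 1)]) >> 1) for i in range(n)]
    # Update: smooth the even samples with the neighbouring details.
    # The left detail before the start is mirrored.
    s = [even[i] + ((d[max(i - 1, 0)] + d[i] + 2) >> 2) for i in range(n)]
    return s, d

def inv53(s, d):
    """Exact inverse of fwd53: undo the update step, then the predict step."""
    n = len(s)
    even = [s[i] - ((d[max(i - 1, 0)] + d[i] + 2) >> 2) for i in range(n)]
    odd = [d[i] + ((even[i] + even[min(i + 1, n - 1)]) >> 1) for i in range(n)]
    x = [0] * (2 * n)
    x[0::2], x[1::2] = even, odd
    return x

if __name__ == "__main__":
    samples = [3, 7, 1, 8, 2, 9, 4, 6]
    low, high = fwd53(samples)
    print(low, high)
    print(inv53(low, high) == samples)
```

A 2-D transform applies the same pair of steps along rows and then columns; running both directions concurrently is the part the abstract attributes to its hardware design.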

19.
Integer fast Fourier transform   Total citations: 5, self-citations: 0, citations by others: 5
A concept of integer fast Fourier transform (IntFFT) for approximating the discrete Fourier transform is introduced. Unlike the fixed-point fast Fourier transform (FxpFFT), the new transform has the properties that it is an integer-to-integer mapping, is power adaptable, and is reversible. The lifting scheme is used to approximate complex multiplications appearing in the FFT lattice structures, where the dynamic range of the lifting coefficients can be controlled by proper choices of lifting factorizations. Split-radix FFT is used to illustrate the approach for the case of a $2^N$-point FFT, in which case an upper bound of the minimal dynamic range of the internal nodes, which is required by the reversibility of the transform, is presented and confirmed by a simulation. The transform can be implemented by using only bit shifts and additions but no multiplication. A method for minimizing the number of additions required is presented. While preserving the reversibility, the IntFFT is shown experimentally to yield the same accuracy as the FxpFFT when their coefficients are quantized to a certain number of bits. Complexity of the IntFFT is shown to be much lower than that of the FxpFFT in terms of the numbers of additions and shifts. Finally, they are applied to noise reduction applications, where the IntFFT provides significant improvement over the FxpFFT at low power and maintains similar results at high power.
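The reversibility property described in this abstract rests on a standard fact: a plane rotation (equivalently, multiplication by a unit-magnitude complex twiddle factor) factors into three lifting steps, and rounding inside each step still leaves the mapping exactly invertible. The sketch below illustrates that fact for a single rotation; the function names and the `bits` parameter are hypothetical, `sin(theta)` is assumed to be nonzero, and the coefficient quantization shown is one simple choice rather than the scheme analysed in the paper.

```python
import math

def _lift(coef_q, v, bits):
    """Rounded product of integer v with the quantized coefficient coef_q / 2**bits."""
    return (coef_q * v + (1 << (bits - 1))) >> bits

def _coefficients(theta, bits):
    """Quantized lifting coefficients (cos - 1)/sin and sin for a rotation by theta."""
    c, s = math.cos(theta), math.sin(theta)
    return round((c - 1.0) / s * (1 << bits)), round(s * (1 << bits))

def int_rotate(x, y, theta, bits=8):
    """Integer-to-integer approximation of a counterclockwise rotation by theta."""
    a_q, s_q = _coefficients(theta, bits)
    x += _lift(a_q, y, bits)      # lifting step 1
    y += _lift(s_q, x, bits)      # lifting step 2
    x += _lift(a_q, y, bits)      # lifting step 3
    return x, y

def int_rotate_inverse(x, y, theta, bits=8):
    """Exact inverse: subtract the same rounded terms in reverse order."""
    a_q, s_q = _coefficients(theta, bits)
    x -= _lift(a_q, y, bits)
    y -= _lift(s_q, x, bits)
    x -= _lift(a_q, y, bits)
    return x, y

if __name__ == "__main__":
    rotated = int_rotate(100, -37, math.pi / 8)
    print(rotated)                                       # near the true rotation
    print(int_rotate_inverse(*rotated, math.pi / 8))     # recovers (100, -37)
```

Each lifting step changes only one of the two values by a function of the other, so the inverse can recompute and subtract the identical rounded term; that is what keeps the mapping reversible regardless of how coarsely the coefficients are quantized.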

20.
An expandable two-dimensional systolic array consisting of N homogeneous processing elements in a rectangular structure to compute the one-dimensional DFT is proposed. A DFT of size $N = M^2$ can be computed in 2M steps of pipelined operations, achieving the optimal Area–Time complexity of $AT^2 = O(N^2)$. The architecture is based on a new approach that exploits the symbiosis between the one-dimensional systolic arrays of Kung [6] and Chang [7]. After a two-dimensional formulation with the Common Factor Algorithm, recursive time and frequency extractions are applied to the column and row transforms, respectively. Twiddle factor multiplication is integrated gracefully into the row recursion. The rearrangement of the input data enables the recursive operations to be pipelined orthogonally in the dual-mode processing elements. The proposed array structure is modular and expandable. A DFT of size 2LN can be readily computed with 2L N-size arrays abutted together without reconfiguration. VHDL modules have been written and simulated successfully for the proposed architecture.
