Found 20 similar documents.
1.
2.
Che-Hong Chen Bin-Da Liu Jar-Ferr Yang 《IEEE transactions on circuits and systems. I, Regular papers》2004,51(10):2017-2030
In this paper, new recursive structures for computing radix-r two-dimensional (2-D) discrete cosine transform (DCT) and 2-D inverse DCT (IDCT) are proposed. The 2-D DCT/IDCT are first decomposed into cosine-cosine and sine-sine transforms. Based on the indexes of the transform bases, a regular pre-addition preprocessing step is established, and the recursive structures for 2-D DCT/IDCT, which can be realized as a second-order infinite-impulse response (IIR) filter, are derived without involving any transposition procedure. For computation of 2-D DCT/IDCT, the proposed structures require fewer recursive loops than one-dimensional DCT/IDCT recursive structures, which need data transposition to achieve the so-called row-column approach. With the advantages of fewer recursive loops and no transposition, the proposed recursive structures achieve more accurate results and lower power consumption than existing methods. Their regular and modular properties are suitable for very large-scale integration (VLSI) implementation. By using similar procedures, recursive structures for 2-D DST and 2-D IDST are also proposed.
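The idea of computing a DCT with a second-order recursive loop can be illustrated in one dimension. The following is a minimal sketch only, not the 2-D structure proposed in the paper: it assumes an unnormalized 1-D DCT-II and uses the standard Clenshaw (Goertzel-like) recurrence, and the function names are hypothetical.

```python
import math

def dct2_direct(x):
    """Reference: unnormalized DCT-II, X[k] = sum_n x[n] cos((2n+1) k pi / (2N))."""
    n_pts = len(x)
    return [sum(x[n] * math.cos((2 * n + 1) * k * math.pi / (2 * n_pts))
                for n in range(n_pts))
            for k in range(n_pts)]

def dct2_recursive(x):
    """Same transform, one second-order recursion per output coefficient."""
    n_pts = len(x)
    out = []
    for k in range(n_pts):
        theta = k * math.pi / n_pts
        coeff = 2.0 * math.cos(theta)      # the only multiplier inside the loop
        b1 = b2 = 0.0                      # the two state registers
        for sample in reversed(x):         # feed samples x[N-1] ... x[0]
            b0 = sample + coeff * b1 - b2  # b[n] = x[n] + 2cos(theta) b[n+1] - b[n+2]
            b2, b1 = b1, b0
        # After the loop b1 holds b[0] and b2 holds b[1].
        out.append(math.cos(theta / 2.0) * (b1 - b2))
    return out
```

Each output needs one loop multiplier and two state registers, which is why such recurrences map naturally onto IIR-filter hardware.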
3.
《Microelectronics Journal》2014,45(11):1480-1488
In this paper, we present a coordinate rotation digital computer (CORDIC) based fast algorithm for power-of-two point DCT, and develop its corresponding efficient VLSI implementation. The proposed algorithm has several distinct advantages, such as regular Cooley-Tukey FFT-like data flow, an identical post-scaling factor, and arithmetic-sequence rotation angles. By using a trigonometric formula, the number of CORDIC types is reduced dramatically. This leads to an efficient method for overcoming the lack of synchronization among CORDICs with different rotation angles. By fully reusing the uniform processing element (PE), only four carry-save-adder (CSA)-based PEs of two different types are required for the 8-point DCT. Compared with other known architectures, the proposed 8-point DCT architecture has higher modularity, lower hardware complexity, higher throughput and better synchronization.
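For readers unfamiliar with CORDIC, the basic rotation-mode iteration can be sketched as follows. This is a generic floating-point illustration under stated assumptions, not the paper's DCT algorithm or its PE design; the function name and the default iteration count are hypothetical, and the input angle is assumed to lie within the usual convergence range (roughly ±1.74 rad).

```python
import math

def cordic_rotate(x, y, angle, iterations=32):
    """Rotate (x, y) by `angle` radians using only shift-and-add style updates."""
    # Elementary angles atan(2^-i); in hardware these are a small lookup table.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    # Constant gain accumulated by the micro-rotations (the post-scaling factor).
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    z = angle  # residual angle still to be rotated
    for i in range(iterations):
        d = 1.0 if z >= 0.0 else -1.0
        shift = 2.0 ** -i                  # a right shift by i bits in fixed point
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * angles[i]
    return x / gain, y / gain              # apply the post-scaling once at the end
```

Calling `cordic_rotate(1.0, 0.0, a)` approximates `(cos(a), sin(a))`. The single division by `gain` at the end corresponds to the post-scaling factor mentioned in the abstract.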
4.
Chiper D.F. Swamy M.N.S. Ahmad M.O. Stouraitis T. 《IEEE transactions on circuits and systems. I, Regular papers》2005,52(6):1125-1137
In this paper, an efficient design approach for a unified very large-scale integration (VLSI) implementation of the discrete cosine transform/discrete sine transform/inverse discrete cosine transform/inverse discrete sine transform, based on an appropriate formulation of the four transforms into cyclic convolution structures, is presented. This formulation allows an efficient memory-based systolic array implementation of the unified architecture using dual-port ROMs and appropriate hardware sharing methods. The performance of the unified design is compared to that of some existing designs. It is found that the proposed design provides superior performance in terms of hardware complexity, speed, and I/O costs, in addition to features such as regularity, modularity, pipelining capability, and local connectivity, which make the unified structure well suited for VLSI implementation.
5.
DCT/IDCT processor for HDTV developed with DSP silicon compiler  Total citations: 1, self-citations: 0, citations by others: 1
Takashi Miyazaki Takao Nishitani Masato Edahiro Ikuko Ono Kaoru Mitsuhashi 《The Journal of VLSI Signal Processing》1993,5(2-3):151-158
This article presents a discrete cosine transform (DCT) processor for high definition television (HDTV) developed with an extended version of DSP Silicon Compiler. The extension is mainly concerned with module generation functions. A matrix-vector product module composed of multiply-accumulators (MACs) is newly added to the silicon compiler. The compiler accomplishes placement of leaf-cells and routing between the cells, referring to a prototype layout for the MAC. The prototype, which consists of a Booth multiplier and a carry-lookahead adder, is carefully designed to attain high operation speed. The processor developed by the silicon compiler carries out the 8×8 DCT and its inverse transform (IDCT). In order to evaluate the newly extended functions in the compiler, the architecture employed for the processor is based on the matrix-vector product method. By using DSP Silicon Compiler and 0.8 µm triple-metal CMOS technology, the DCT processor is easily implemented in an error-free environment and achieves a 50 MHz data rate, which meets Japanese HDTV baseline signal processing requirements. The chip is implemented on a 12.80×12.57 mm² area.
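The matrix-vector product method mentioned above can be sketched in software as repeated multiply-accumulate (MAC) operations against a fixed coefficient matrix. This is a minimal illustration under assumptions: it uses the orthonormal DCT-II matrix and floating-point arithmetic, the function names are hypothetical, and it says nothing about the actual chip datapath.

```python
import math

def dct_matrix(n=8):
    """Orthonormal DCT-II coefficient matrix C, with C[k][j] = a(k) cos((2j+1) k pi / 2n)."""
    return [[math.sqrt((1.0 if k == 0 else 2.0) / n)
             * math.cos((2 * j + 1) * k * math.pi / (2 * n))
             for j in range(n)]
            for k in range(n)]

def mac_matvec(c, x):
    """Matrix-vector product written as explicit multiply-accumulate steps."""
    out = []
    for row in c:
        acc = 0.0
        for coeff, sample in zip(row, x):
            acc += coeff * sample          # one MAC operation
        out.append(acc)
    return out

def dct_2d(block):
    """2-D DCT of a square block: 1-D transforms on rows, then on columns."""
    c = dct_matrix(len(block))
    rows = [mac_matvec(c, r) for r in block]
    cols = [mac_matvec(c, col) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]   # transpose back to [k][l] order
```

For an 8×8 block, each 1-D pass costs 64 MACs per row or column, which is the regular workload a MAC-array module is built to absorb.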
6.
In this paper, CORDIC (coordinate rotation digital computer)-based Cooley-Tukey fast Fourier transform (FFT)-like algorithms for power-of-two point discrete cosine transform/discrete sine transform/inverse discrete cosine transform/inverse discrete sine transform are proposed, and their corresponding unified architectures are developed by fully reusing the two unique basic processing elements. The proposed algorithms have several distinct advantages, such as FFT-like regular data flow, a unique post-scaling factor, and arithmetic-sequence rotation angles. The developed unified architectures can compute four different transforms by simply routing the data flow according to the specific transform, without feeding different transform coefficients or different transform kernels. The unfolding technique is used to overcome the difficulty of pipelining iterative CORDIC algorithms. Compared to existing unified architectures, the proposed architectures have superior performance in terms of hardware complexity, control complexity, throughput, scalability, modularity, and pipelinability.
7.
8.
Pai C.-Y. Lynch W.E. Al-Khalili A.J. 《Vision, Image and Signal Processing, IEE Proceedings -》2003,150(4):245-255
Traditional fast discrete cosine transform (DCT)/inverse DCT (IDCT) algorithms have focused on reducing arithmetic complexity and have fixed run-time complexities regardless of the input. Recently, data-dependent signal processing has been applied to the DCT/IDCT. These algorithms have variable run-time complexities. A two-dimensional 8×8 low-power DCT/IDCT design is implemented using VHDL by applying the data-dependent signal processing concept to the traditional fixed-complexity fast DCT/IDCT algorithm. To reduce power, the design is based on Loeffler's fast algorithm, which uses a low number of multiplications. On top of that, zero bypassing, data segmentation, input truncation and hardwired canonical signed-digit (CSD) multipliers are used to reduce the run-time computation, hence reducing the switching activities and the power. When synthesised using CMC 0.18 μm 1.6 V CMOSP technology, the proposed FDCT/IDCT design consumes 8.94/9.54 mW, respectively, with a clock frequency of 40 MHz and a processing rate of 320 Msample/s. This design features lower dynamic power consumption per sample, i.e. it is more power-efficient than other previously reported high-performance FDCT/IDCT designs.
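A hardwired CSD multiplier replaces multiplication by a constant with a few shifted additions and subtractions. The sketch below shows the standard CSD recoding of a non-negative integer constant and the resulting shift-and-add product; it is a generic illustration, the function names are hypothetical, and it does not reproduce the coefficients or word lengths used in the paper.

```python
def to_csd(n):
    """Return CSD digits of a non-negative integer, least significant first.

    Each digit is -1, 0 or +1, and no two adjacent digits are both non-zero.
    """
    if n < 0:
        raise ValueError("this sketch only handles non-negative constants")
    digits = []
    while n != 0:
        if n & 1:
            d = 2 - (n & 3)   # n mod 4 == 1 gives +1, n mod 4 == 3 gives -1
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits

def csd_multiply(x, digits):
    """Multiply integer x by the constant encoded in `digits` using shifts and adds."""
    acc = 0
    for i, d in enumerate(digits):
        if d:                  # zero digits cost nothing, as in hardwired logic
            acc += d * (x << i)
    return acc
```

For example, the constant 7 is recoded as 8 - 1, so one subtraction replaces the two additions that plain binary (4 + 2 + 1) would need.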
9.
Che-Hong Chen Bin-Da Liu Jar-Ferr Yang 《IEEE transactions on circuits and systems. I, Regular papers》2005,52(9):1819-1831
In this paper, efficient recursive structures for computing the arbitrary-length M-dimensional (M-D) discrete cosine transform (DCT) and its inverse (IDCT) are proposed. The M-D DCT and IDCT are first converted into condensed one-dimensional (1-D) DCT and discrete sine transform (DST) with a regular preprocessing procedure. The recursive filters for the condensed 1-D DCT/DST are then derived by using Chebyshev polynomials to compute the M-D DCT/IDCT without data transposition. The proposed structures require fewer recursive loops than traditional 1-D recursive structures, which are realized in M passes and (M-1) data transpositions by the so-called row-column approach. With the advantages of fewer recursive loops and no transposition memory, the proposed structures attain more accurate results and lower power consumption than traditional row-column structures. The proposed recursive M-D DCT/IDCT structures are suitable for very large-scale integration implementation due to their regular and modular features.
10.
An advanced, high-speed, and universal-coding-rate Viterbi decoder VLSI implementation is presented. Two novel circuit design schemes have been proposed: scarce state transition (SST) decoding, and direct high-coding-rate convolutional code generation with variable-rate decoding. SST makes it possible to omit the final decision circuit and to reduce the required path memory length without degrading error probability performance. Moreover, the power consumption of the SST Viterbi decoder is significantly reduced when implemented as a CMOS device. These features overcome the speed limits of high-speed and high-coding-gain Viterbi decoder VLSIs in the rate one-half mode imposed by the thermal limitation. The other Viterbi decoding scheme makes it possible to realize a simple and variable coding-rate forward-error-correction circuit by changing only the branch metric calculation ROM tables. By employing these schemes, high-speed (25-Mb/s) and universal-coding-rate Viterbi decoder VLSIs have been developed.
11.
Ching-Hsien Chang Chin-Liang Wang Yu-Tai Chang 《Signal Processing, IEEE Transactions on》2000,48(11):3206-3216
In this paper, we propose two new VLSI architectures for computing the N-point discrete Fourier transform (DFT) and its inverse (IDFT) based on a radix-2 fast algorithm, where N is a power of two. The first part of this work presents a linear systolic array that requires log2 N complex multipliers and is able to provide a throughput of one transform sample per clock cycle. Compared with other related systolic designs based on direct computation or a radix-2 fast algorithm, the proposed one has the same throughput performance but involves less hardware complexity. This design is suitable for high-speed real-time applications, but it would not be easily realized in a single chip when N gets large. To balance the chip area and the processing speed, we further present a new reduced-complexity design for the DFT/IDFT computation. The alternative design is a memory-based architecture that consists of one complex multiplier, two complex adders, and some special memory units. The new design is capable of computing one transform sample every log2 N+1 clock cycles on average. In comparison with the first design, the second design reaches a lower throughput with less hardware complexity. For N=512, the chip area required for the memory-based design is about 5742×5222 μm², and the corresponding throughput can attain a rate as high as 4M transform samples per second under 0.6 μm CMOS technology. Such area-time performance makes this design very competitive for use in long-length DFT applications, such as asymmetric digital subscriber line (ADSL) and orthogonal frequency-division multiplexing (OFDM) systems.
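The radix-2 fast algorithm underlying both architectures can be sketched as a recursive decimation-in-time FFT. This is the textbook software form, shown only to make the log2 N stage structure concrete; it is an assumption that the paper's hardware follows this particular ordering, and the function names are hypothetical.

```python
import cmath

def fft_radix2(x):
    """Radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n < 1 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft_radix2(x[0::2])             # N/2-point DFT of even-indexed samples
    odd = fft_radix2(x[1::2])              # N/2-point DFT of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]   # one complex multiply
        out[k] = even[k] + t               # butterfly: two complex additions
        out[k + n // 2] = even[k] - t
    return out

def dft_direct(x):
    """O(N^2) reference DFT used to check the fast version."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]
```

Each butterfly uses one complex multiplication and two complex additions, and the recursion depth is log2 N.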
12.
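Optimized VLSI implementation of a 2.5 Gb/s Reed-Solomon decoder  Total citations: 1, self-citations: 0, citations by others: 1
This paper studies the optimized VLSI implementation of a high-speed Reed-Solomon (255,239) decoder based on the modified Euclidean algorithm. Pipelining is used to reduce the number of finite-field multipliers in the key-equation solver module, and the multiplier structure is optimized. A common-term extraction algorithm based on global optimization is also proposed and used to optimize the syndrome computation module. The results show that, compared with a direct implementation, the area of the key-equation module is reduced by about 30%, and the area of each unit circuit used for syndrome computation is generally reduced by more than 20%. The Reed-Solomon decoder has been synthesized with Synopsys synthesis tools and implemented in a TSMC 0.25 μm CMOS process; its port processing rate reaches 2.5 Gb/s.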
13.
Xu Zhanqi Yi Kechu Liu Zengji 《电子科学学刊(英文版)》2006,23(4):528-531
Derived from a proposed universal mathematical expression, this paper investigates a novel algorithm for parallel Cyclic Redundancy Check (CRC) computation. It is an iterative algorithm that updates the check-bit sequence step by step and accommodates various parameter choices for CRC computation. The proposed algorithm is well suited to hardware implementation. Simulation and performance analysis suggest that it can speed up the computation considerably compared with conventional algorithms. The algorithm is implemented in hardware at rates as high as 21 Gbps, which suggests its usefulness for high-speed CRC computation in, for example, Asynchronous Transfer Mode (ATM) networks and 10G Ethernet.
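The contrast between serial and parallel CRC updating can be illustrated with a generic example. The sketch below is not the algorithm proposed in the paper: it assumes the common CRC-16/CCITT-FALSE parameters (polynomial 0x1021, initial value 0xFFFF, no reflection) and compares a bit-serial update with a table-driven update that consumes 8 bits per step; the function names are hypothetical.

```python
POLY = 0x1021   # assumed generator polynomial (CRC-16/CCITT-FALSE)
INIT = 0xFFFF   # assumed initial register value

def crc16_bit_serial(data, poly=POLY, init=INIT):
    """Reference: shift the register one bit per step."""
    crc = init
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ poly) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def make_table(poly=POLY):
    """Precompute the effect of 8 serial steps for every possible top byte."""
    table = []
    for b in range(256):
        r = b << 8
        for _ in range(8):
            r = ((r << 1) ^ poly) & 0xFFFF if r & 0x8000 else (r << 1) & 0xFFFF
        table.append(r)
    return table

def crc16_byte_parallel(data, table, init=INIT):
    """Update the check bits 8 input bits at a time using the precomputed table."""
    crc = init
    for byte in data:
        crc = ((crc << 8) & 0xFFFF) ^ table[(crc >> 8) ^ byte]
    return crc
```

Both functions produce the same check bits. The second advances the register by a whole byte per iteration, which is the software analogue of widening the datapath in hardware.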
14.
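A systolic-array-based 2-D DCT algorithm and its VLSI design  Total citations: 3, self-citations: 0, citations by others: 3
This paper presents a circuit design for the two-dimensional discrete cosine transform (2-D DCT) based on a systolic array algorithm. The circuit structure does not require a complex transposition memory; instead it uses a parallel-input, parallel-output structure, so that one N×N DCT takes only N cycles and the throughput is N times that of a conventional DCT. The circuit structure is modular, simple to route, and occupies little chip area, which makes it well suited to VLSI implementation.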
15.
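This paper designs encoder and decoder circuits for an RS(255,239)+BCH(2184,2040) concatenated code for optical communication systems. In a concatenated code system, the speed mismatch between the RS code and the BCH code is the main performance bottleneck; this paper uses a parallel BCH codec with a parallelism of 8 to match the speed of the RS code. A parallelization method for the BCH encoder is derived, and sub-term sharing is used to reduce the fan-out of the sub-terms, so that the maximum fan-out of each sub-term does not exceed 10. Parallel syndrome computation and parallel Chien search are used to raise the throughput of the BCH decoder, and the properties of the shortened code are fully exploited to cut the Chien search time by 46%. The concatenated codec has been implemented with a TSMC 0.18 μm CMOS standard-cell library. Post-layout simulation results show that the concatenated code operates correctly at a clock frequency of 312.5 MHz and achieves a data throughput of 2.5 Gb/s. A test and verification platform based on a Xilinx FPGA was built, and the test results show that the circuit functions correctly and operates normally.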
16.
17.
A new CMOS VLSI implementation of an asymmetric programmable sigmoid neural activation function, as well as of its derivative, is presented. It consists of two coupled PMOS and NMOS differential pairs with different programmable bias currents that set the upper and lower limits of the sigmoid. The circuit works in the weak inversion region, for low power consumption and exponential envelope, or in strong inversion to achieve higher speeds. The results obtained from the theoretical transfer function, and from the simulations of the circuit implemented in AMI's 0.35 μm technology, show a very good match.
18.
An algorithm for the computation of the transfer function matrix of generalized two-dimensional systems  Total citations: 1, self-citations: 0, citations by others: 1
A generalized Leverrier's algorithm is developed for the computation of the transfer function matrix of a singular two-dimensional discrete-time system. The algorithm is a recursion in terms of the original system matrices, and does not require the inversion of a polynomial matrix. This research was partially supported by NSF Grant ECS-8518164.
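For orientation, the classical algorithm that this work generalizes can be sketched for the regular one-dimensional case. The code below is the standard Faddeev-LeVerrier recursion for a constant square matrix A, which yields the characteristic polynomial and the adjugate of (sI - A) without inverting a polynomial matrix. It is an assumption-laden illustration only: it does not handle singular or two-dimensional systems, and the function name is hypothetical.

```python
def faddeev_leverrier(a):
    """Return (coeffs, adj_terms) for a square matrix `a` given as nested lists.

    coeffs    : [1, c_{n-1}, ..., c_0], so det(sI - A) = s^n + c_{n-1} s^{n-1} + ... + c_0
    adj_terms : [M_1, ..., M_n], so adj(sI - A) = M_1 s^{n-1} + ... + M_n
    The resolvent is (sI - A)^{-1} = adj(sI - A) / det(sI - A).
    """
    n = len(a)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    m = identity
    coeffs = [1.0]
    adj_terms = []
    for k in range(1, n + 1):
        adj_terms.append(m)
        # am = A * M_k, computed from the original system matrix only
        am = [[sum(a[i][p] * m[p][j] for p in range(n)) for j in range(n)]
              for i in range(n)]
        c = -sum(am[i][i] for i in range(n)) / k       # c_{n-k} = -trace(A M_k) / k
        coeffs.append(c)
        # M_{k+1} = A M_k + c_{n-k} I
        m = [[am[i][j] + c * identity[i][j] for j in range(n)] for i in range(n)]
    return coeffs, adj_terms
```

For a state-space model with matrices (A, B, C), the transfer function matrix then follows as C adj(sI - A) B divided by the characteristic polynomial.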
19.
Sarmiento R. de Armas V. Lopez J.F. Montiel-Nelson J.A. Nunez A. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》1998,6(1):18-30
In this paper, the architecture and the implementation of a complex fast Fourier transform (CFFT) processor using 0.6 μm gallium arsenide (GaAs) technology are presented. This processor computes a 1024-point FFT of 16-bit complex data in less than 8 μs, working at a frequency beyond 700 MHz, with a power consumption of 12.5 W. The architecture of the processor is based on the COordinate Rotation DIgital Computer (CORDIC) algorithm, which avoids the use of conventional multiplication-and-accumulation (MAC) units and instead evaluates the trigonometric functions using only add and shift operations. Improvements to the basic CORDIC architecture are introduced in order to reduce the area and power of the processor. This, together with the use of pipelining and carry-save adders, produces a very regular and fast processor. The CORDIC units were fabricated and tested in order to anticipate the final performance of the processor. This work also demonstrates the maturity of GaAs technology for implementing ultrahigh-performance signal processors.
20.
In the present work, the finite planar waveguide array problem is formulated as a matrix equation. It is shown that the entries of the matrix are independent of the scan angles, so the matrix needs to be computed and inverted only once in order to obtain information on the array scanning operation at any angular position. It is also shown that the use of certain symmetry relations greatly reduces the computation time of the matrix entries. Finally, a numerical example is given to demonstrate the effectiveness of the matrix formulation in treating the finite array problem.