Similar literature
1.
Traditional fast discrete cosine transform (DCT)/inverse DCT (IDCT) algorithms have focused on reducing arithmetic complexity and have fixed run-time complexities regardless of the input. Recently, data-dependent signal processing has been applied to the DCT/IDCT. These algorithms have variable run-time complexities. A two-dimensional 8×8 low-power DCT/IDCT design is implemented in VHDL by applying the data-dependent signal processing concept to the traditional fixed-complexity fast DCT/IDCT algorithm. To reduce power, the design is based on Loeffler's fast algorithm, which uses a low number of multiplications. On top of that, zero bypassing, data segmentation, input truncation and hardwired canonical signed-digit (CSD) multipliers are used to reduce the run-time computation, hence reducing the switching activity and the power. When synthesised using CMC 0.18 μm 1.6 V CMOSP technology, the proposed FDCT/IDCT design consumes 8.94/9.54 mW, respectively, with a clock frequency of 40 MHz and a processing rate of 320 Msample/s. This design features lower dynamic power consumption per sample, i.e. it is more power-efficient than other previously reported high-performance FDCT/IDCT designs.
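
The hardwired CSD multipliers and zero bypassing mentioned in this abstract can be illustrated in software. The following is a minimal sketch, not the design from the paper: the function names `to_csd` and `csd_multiply` are hypothetical, and the sketch assumes non-negative integer (fixed-point) constants. It recodes a constant into digits from {-1, 0, +1} with no two adjacent non-zero digits, then multiplies using only shifts, additions and subtractions.

```python
def to_csd(value: int) -> list[int]:
    """Return the canonical signed-digit form of a non-negative integer.

    Digits are listed least significant first; each is -1, 0 or +1 and
    no two adjacent digits are both non-zero.
    """
    if value < 0:
        raise ValueError("this sketch handles non-negative constants only")
    digits = []
    while value != 0:
        if value & 1:
            # value mod 4 == 1 gives digit +1, value mod 4 == 3 gives digit -1
            digit = 2 - (value & 3)
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    return digits


def csd_multiply(x: int, digits: list[int]) -> int:
    """Multiply x by the constant encoded in digits using shifts and adds only."""
    if x == 0:
        return 0  # zero bypass: skip all work for a zero input
    acc = 0
    for shift, digit in enumerate(digits):
        if digit == 1:
            acc += x << shift
        elif digit == -1:
            acc -= x << shift
    return acc
```

Because CSD minimises the number of non-zero digits, a constant such as 7 needs one subtraction (8 - 1) instead of two additions (4 + 2 + 1), which is the motivation for using it in hardwired multipliers.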

2.
This paper describes a block processing unit in a single-chip MPEG-2 MP@ML video encoder LSI. The block processing unit executes algorithms such as the discrete cosine transform (DCT), quantization, inverse quantization, and the inverse discrete cosine transform (IDCT). A double-block pipeline scheme has been introduced to execute DCT and IDCT operations on shared circuits. Using a time-multiplexed DCT/IDCT architecture, we achieve a processing performance of 2.0 clk/pel. This architecture has 21% fewer transistors and 30% less power dissipation than a conventional one. The block processing unit contains 240 kTr, which accounts for 7.7% of the chip total. By controlling the clock signal supply, power dissipation can be reduced to 43%, which is about 400 mW at 3.3 V using a 0.35 μm triple-layer metal CMOS cell-based technology at 54 MHz.

3.
A new algorithm to compute the DCT and its inverse (total citations: 2; self-citations: 0; citations by others: 2)
A novel algorithm to convert the discrete cosine transform (DCT) to skew-circular convolutions is presented. The motivation for developing such an algorithm is the fact that VLSI implementation of distributed arithmetic is very efficient for computing convolutions. It is also shown that the inverse DCT (IDCT) can be computed using the same building blocks that are used for computing the DCT. A DCT/IDCT processor can be designed to compute either the DCT or the IDCT depending on a 1-bit control signal.

4.
A new linear-array architecture for computation of both the discrete cosine transform (DCT) and the inverse DCT (IDCT) is derived from the heterogeneous dependence graphs representing the factorised coefficient matrices in the matrix formulation of the recursive algorithm. Using the Kronecker product representation of the order-recursive algorithm, it is observed that the kernel operations of the DCT and IDCT can be merged together by proper input/output data reordering. The processor, containing only O(log₂ N) stages, is fully pipelineable and easily scalable to compute longer DCT/IDCTs whose transform length N is a power of two. Owing to the systematic matrix formulation and the corresponding efficient architectural design, the new DCT/IDCT processor has the advantages of a high throughput rate and low hardware cost. Furthermore, the power consumption can be reduced significantly by turning off the arithmetic units whenever possible.

5.
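A low-power 2-D DCT/IDCT processor is designed. To reduce power, the design uses a row-column decomposition structure and Loeffler's fast DCT/IDCT algorithm, together with zero-input bypassing, clock gating and truncation, lowering system power while meeting the design requirements. The constant-coefficient multiplier is a key component of the processor. A new low-power constant-coefficient multiplier is designed on the basis of a parallel multiplier structure; it uses CSD encoding and the Wallace tree multiplication algorithm, combined with truncation and variable-correction optimizations, which considerably improves the overall performance of the 2-D DCT/IDCT processor. The design runs at a clock frequency of 100 MHz, which is sufficient for real-time MPEG-2 MP@HL decoding. Synthesized in an SMIC 0.18 μm process, the 2-D DCT/IDCT processor occupies an area of 341 212 μm² and consumes 14.971 mW. Analysis and comparison with 2-D DCT/IDCT processors of other architectures show that the design achieves lower power while meeting the requirements of real-time MPEG-2 MP@HL decoding.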

6.
In this paper, efficient recursive structures for computing the arbitrary-length M-dimensional (M-D) discrete cosine transform (DCT) and its inverse (IDCT) are proposed. The M-D DCT and IDCT are first converted into a condensed one-dimensional (1-D) DCT and discrete sine transform (DST) with a regular preprocessing procedure. The recursive filters for the condensed 1-D DCT/DST are then derived by using Chebyshev polynomials to compute the M-D DCT/IDCT without data transposition. The proposed structures require fewer recursive loops than traditional 1-D recursive structures, which are realized in M passes and (M-1) data transpositions by the so-called row-column approach. With the advantages of fewer recursive loops and no transposition memory, the proposed structures attain more accurate results and lower power consumption than traditional row-column structures. The proposed recursive M-D DCT/IDCT structures are suitable for very large-scale integration implementation owing to their regular and modular features.

7.
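Zhao Bin (赵滨), Huang Daqing (黄大庆). Electronic Design Engineering (《电子设计工程》), 2011, 19(24): 126-129
A new FPGA implementation structure for the 2-D DCT and IDCT is proposed. A fast row-column algorithm decomposes the 2-D transform into two 1-D transforms, each implemented as a parallel pipeline that processes 8 data values per clock cycle, which greatly increases the data throughput and computation speed of the circuit. Simulation with the ModelSim tool confirms the functional correctness of the design. One 8×8 block 2-D DCT takes only 16 clock cycles, which meets the real-time requirements of image and video processing.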
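
The row-column decomposition described in this abstract can be sketched in software. This is an illustrative sketch only, assuming the orthonormal DCT-II definition; it is not the FPGA pipeline from the paper, and the function names are hypothetical. It applies a 1-D DCT to every row, then to every column, and a direct 2-D evaluation is included for comparison.

```python
import math


def dct_1d(x):
    """Orthonormal DCT-II of a sequence, computed directly from the definition."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        s = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        out.append(scale * s)
    return out


def transpose(m):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def dct_2d_row_column(block):
    """2-D DCT as two passes of 1-D DCTs: rows first, then columns."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(c) for c in transpose(rows)]
    return transpose(cols)


def dct_2d_direct(block):
    """Reference 2-D DCT evaluated from the double-sum definition."""
    n = len(block)

    def c(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for i in range(n):
                for j in range(n):
                    s += (block[i][j]
                          * math.cos(math.pi * (2 * i + 1) * u / (2 * n))
                          * math.cos(math.pi * (2 * j + 1) * v / (2 * n)))
            out[u][v] = c(u) * c(v) * s
    return out
```

For an N×N block the row-column form needs 2N one-dimensional transforms instead of an N⁴-term double sum, which is why hardware designs commonly pair two 1-D units with a transposition stage.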

8.
This paper demonstrates the design of efficient asynchronous bundled-data pipelines for the matrix-vector multiplication core of discrete cosine transforms (DCTs). The architecture is optimized for both zero and small-valued data, typical in DCT applications, yielding both high average performance and low average power. The proposed bundled-data pipelines include novel data-dependent delay lines with integrated control circuitry to efficiently implement speculative completion sensing. The control circuits are based on a novel control-circuit template that simplifies the design of such nonlinear pipelines. Extensive post-layout back-end timing analysis was performed to gain confidence in the timing margins as well as to quantify performance and energy. Comparison with a synchronous counterpart suggests that our best asynchronous design yields 30% higher average throughput with negligible energy overhead.

9.
This paper reconsiders the discrete cosine transform (DCT) algorithm of Narasimha and Peterson (1978) in order to reduce the computational cost of evaluating the N-point inverse discrete cosine transform (IDCT) through an N-point FFT. A new relationship between the IDCT and the discrete Fourier transform (DFT) is established. It allows two simultaneous N-point IDCTs to be evaluated by computing a single FFT of the same dimension. This IDCT implementation technique halves the number of operations.
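
As background for this abstract, the well-known forward-direction idea of computing an N-point DCT through a single N-point DFT of a reordered input can be sketched as follows. This is only a sketch of that reordering for the unnormalised DCT-II with even N; it does not reproduce the paper's new relationship for two simultaneous IDCTs. The function names are hypothetical, and a naive O(N²) DFT stands in for an FFT to keep the example self-contained.

```python
import cmath
import math


def dft(v):
    """Naive DFT, used here as a stand-in for an FFT routine."""
    n = len(v)
    return [sum(v[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
            for k in range(n)]


def dct_via_dft(x):
    """Unnormalised DCT-II of an even-length sequence via one N-point DFT."""
    n = len(x)
    if n % 2:
        raise ValueError("this sketch assumes an even transform length")
    # Reorder: even-indexed samples first, odd-indexed samples reversed at the end.
    v = [0.0] * n
    for i in range(n // 2):
        v[i] = x[2 * i]
        v[n - 1 - i] = x[2 * i + 1]
    spectrum = dft(v)
    # A single complex rotation per output recovers the cosine sum.
    return [(cmath.exp(-1j * cmath.pi * k / (2 * n)) * spectrum[k]).real
            for k in range(n)]


def dct_direct(x):
    """Unnormalised DCT-II from the definition, for comparison."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
            for k in range(n)]
```

The reordering works because the reversed odd samples fold the cosine arguments (4n+3) onto (4n+1) modulo a full period, so the whole sum becomes the real part of one rotated DFT output.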

10.
In this paper, new recursive structures for computing the radix-r two-dimensional (2-D) discrete cosine transform (DCT) and 2-D inverse DCT (IDCT) are proposed. The 2-D DCT/IDCT are first decomposed into cosine-cosine and sine-sine transforms. Based on the indexes of the transform bases, a regular pre-addition preprocessing step is established, and the recursive structures for the 2-D DCT/IDCT, which can be realized as a second-order infinite-impulse response (IIR) filter, are derived without involving any transposition procedure. For computation of the 2-D DCT/IDCT, the proposed structures have fewer recursive loops than one-dimensional DCT/IDCT recursive structures, which require data transposition to implement the so-called row-column approach. With the advantages of fewer recursive loops and no transposition, the proposed recursive structures achieve more accurate results and lower power consumption than existing methods. Their regular and modular properties are suitable for very large-scale integration (VLSI) implementation. By using similar procedures, recursive structures for the 2-D DST and 2-D IDST are also proposed.

11.
This paper proposes the new concepts of the all phase biorthogonal transform (APBT) and dual biorthogonal basis vectors. In the light of all phase digital filtering theory, three kinds of all phase biorthogonal transforms based on the Walsh transform (WT), the discrete cosine transform (DCT) and the inverse discrete cosine transform (IDCT) are proposed. The matrices of the APBT based on the WT, DCT and IDCT are deduced; these can be used in image compression instead of the conventional DCT. Compared with the DCT-based JPEG (DCT-JPEG) image compression algorithm at the same bit rates, the PSNR and visual quality of the images reconstructed using these transforms are close to those of the DCT, and exceed those of DCT-JPEG especially at low bit rates. The advantage is that the quantization table is simplified and the transform coefficients can be quantized uniformly. Therefore, the computing time becomes shorter and the hardware implementation easier.

12.
A configurable architecture for performing image transform algorithms is presented that provides a better tradeoff between low complexity and algorithm flexibility than either software-programmable processors or dedicated ASICs. The configurable processor unit requires only 110 K transistors and can execute several image transform algorithms. By emulating the signal flow of the algorithms in hardware, rather than software, complexity is reduced by an order of magnitude compared with current software-programmable video signal processors, while providing more flexibility than single-function ASICs. The processor has been fabricated in 1.2-μm CMOS and has been successfully used to execute the discrete cosine transform/inverse discrete cosine transform (DCT/IDCT), subband coding, vector quantization, and two-dimensional filtering algorithms at pixel rates up to 25 MPixels/s.

13.
This paper proposes a high-performance, low-cost inverse discrete cosine transform (IDCT) processor for high-definition television (HDTV) applications using cyclic convolution and hardwired multipliers. By properly arranging the input sequence, we formulate the one-dimensional (1-D) IDCT as a cyclic convolution that is regular and suitable for VLSI implementation. The hardwired multipliers that implement multiplication by the IDCT coefficients are first scaled and then optimized using common sub-expression techniques. Based on these techniques, the data path in the proposed two-dimensional (2-D) IDCT design costs 7504 gates plus 1024 bits of memory with a throughput of 100 M pixels/s, according to a cost estimate based on the cell library of the COMPASS 0.6 μm SPDM CMOS technology. We have also verified that the precision of the proposed 2-D 8 × 8 IDCT meets the requirements of IEEE Std. 1180-1990. Owing to its good computing speed and low hardware cost, the proposed design is compact and suitable for HDTV applications. This design methodology can be applied to the forward DCT as well as to other transforms such as the discrete sine transform (DST), discrete Fourier transform (DFT), and discrete Hartley transform (DHT).

14.
The high efficiency video coding (HEVC) transform algorithm for residual coding uses 2-dimensional (2D) 4×4 transforms with higher precision than H.264's 4×4 transforms, resulting in increased hardware complexity. In this paper, we present a shared architecture that can compute the 4×4 forward discrete cosine transform (DCT) and inverse discrete cosine transform (IDCT) of HEVC using a new mapping scheme in the video processor array structure. The architecture is implemented with only adders and shifters for an area-efficient design. The proposed architecture is synthesized using ISE 14.7 and implemented on the BEE4 platform with the Virtex-6 FF1759 LX550T field programmable gate array (FPGA). The results show that the video processor array structure achieves a maximum operating frequency of 165.2 MHz. The architecture and its implementation are presented in this paper to demonstrate its programmability and high performance.
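
The "only adders and shifts" idea in this abstract can be illustrated on one 1-D pass of the HEVC 4×4 core transform, whose integer coefficients are 64, 83 and 36. The sketch below is an assumption-laden illustration, not the paper's mapping scheme: the function names are hypothetical, and the intermediate scaling and rounding shifts that the standard applies between passes are omitted. It uses an even/odd butterfly and replaces each multiplication with shifts and additions (83 = 64 + 16 + 2 + 1, 36 = 32 + 4).

```python
HEVC_DCT4 = [
    [64, 64, 64, 64],
    [83, 36, -36, -83],
    [64, -64, -64, 64],
    [36, -83, 83, -36],
]


def mul64(a: int) -> int:
    return a << 6


def mul83(a: int) -> int:
    # 83 = 64 + 16 + 2 + 1
    return (a << 6) + (a << 4) + (a << 1) + a


def mul36(a: int) -> int:
    # 36 = 32 + 4
    return (a << 5) + (a << 2)


def dct4_shift_add(x):
    """One unscaled 1-D pass of the 4-point core transform without multipliers."""
    e0, e1 = x[0] + x[3], x[1] + x[2]   # even part
    o0, o1 = x[0] - x[3], x[1] - x[2]   # odd part
    return [
        mul64(e0 + e1),
        mul83(o0) + mul36(o1),
        mul64(e0 - e1),
        mul36(o0) - mul83(o1),
    ]


def dct4_matrix(x):
    """Reference: plain matrix-vector product with the coefficient matrix."""
    return [sum(row[j] * x[j] for j in range(4)) for row in HEVC_DCT4]
```

The butterfly halves the number of constant multiplications per pass from sixteen to six, and each remaining constant costs at most three additions.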

15.
This paper describes a new technique for integrating asynchronous modules within a high-speed synchronous pipeline. Our design eliminates potential metastability problems by using a clock generated by a stoppable ring oscillator, which is capable of driving the large clock load found in present day microprocessors. Using the ATACS design tool, we designed highly optimized transistor-level circuits to control the ring oscillator and generate the clock and handshake signals with minimal overhead. Our interface architecture requires no redesign of the synchronous circuitry. Incorporating asynchronous modules in a high-speed pipeline improves performance by exploiting data-dependent delay variations. Since the speed of the synchronous circuitry tracks the speed of the ring oscillator under different processes, temperatures, and voltages, the entire chip operates at the speed dictated by the current operating conditions, rather than being governed by the worst case conditions. These two factors together can lead to a significant improvement in average-case performance. The interface design is simulated using the 0.6-μm HP CMOS14B process in HSPICE.

16.
A chip has been designed and tested to demonstrate the feasibility of an ultra-low-power, two-dimensional inverse discrete cosine transform (IDCT) computation unit in a standard 3.3-V process. A data-driven computation algorithm that exploits the relative occurrence of zero-valued DCT coefficients coupled with clock gating has been used to minimize switched capacitance. In addition, circuit and architectural techniques such as deep pipelining have been used to lower the voltage and reduce the energy dissipation per sample. A Verilog-based power tool has been developed and used for architectural exploration and power estimation. The chip has a measured power dissipation of 4.65 mW at 1.3 V and 14 MHz, which meets the sample rate requirements for MPEG-2 MP@ML. The power dissipation improves significantly at lower bit rates (coarser quantization), which makes this implementation ideal for emerging quality-on-demand protocols that trade off energy efficiency and video quality.
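
The data-driven idea in this abstract, doing work only for non-zero DCT coefficients, has a simple software analogue. The sketch below is an illustration under stated assumptions and not the chip's algorithm: it assumes an orthonormal 1-D IDCT (the inverse of the DCT-II), the function names are hypothetical, and the returned `work` counter is only a rough proxy for switching activity.

```python
import math


def idct_1d_data_driven(coeffs):
    """Inverse orthonormal DCT-II that skips zero-valued coefficients.

    Returns the reconstructed samples and the number of coefficients that
    actually triggered computation.
    """
    n = len(coeffs)
    out = [0.0] * n
    work = 0
    for k, c in enumerate(coeffs):
        if c == 0:
            continue  # zero coefficient: no accumulation, no activity
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        for i in range(n):
            out[i] += scale * c * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
        work += 1
    return out, work


def idct_1d_full(coeffs):
    """Reference inverse transform that processes every coefficient."""
    n = len(coeffs)
    out = []
    for i in range(n):
        s = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            s += scale * c * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
        out.append(s)
    return out
```

With coarser quantization more coefficients are zero, so the loop body runs less often, which mirrors the abstract's observation that power improves at lower bit rates.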

17.
DCT/IDCT processor for HDTV developed with DSP silicon compiler (total citations: 1; self-citations: 0; citations by others: 1)
This article presents a discrete cosine transform (DCT) processor for high definition television (HDTV) developed using an extended version of DSP Silicon Compiler. The extension is mainly concerned with module generation functions. A matrix-vector product module composed of multiply-accumulators (MACs) is newly added to the silicon compiler. The compiler places the leaf cells and routes between them, referring to a prototype layout for the MAC. The prototype, which consists of a Booth multiplier and a carry look-ahead adder, is carefully designed to attain high operation speed. The processor developed by the silicon compiler carries out the 8×8 DCT and its inverse transform (IDCT). In order to evaluate the newly extended functions in the compiler, the architecture employed for the processor is based on the matrix-vector product method. By using DSP Silicon Compiler and 0.8 µm triple-metal CMOS technology, the DCT processor is easily implemented in an error-free environment and achieves a 50 MHz data rate, which meets the requirements of Japanese HDTV baseline signal processing. The chip is implemented on a 12.80 × 12.57 mm² area.

18.
A fast algorithm for computing multidimensional DCT on certain small sizes (total citations: 2; self-citations: 0; citations by others: 2)
This paper presents a new algorithm for the fast computation of the multidimensional (m-D) discrete cosine transform (DCT) with size $N_1 \times N_2 \times \cdots \times N_m$, where $N_i$ is a power of 2 and $N_i \le 256$, by using the tensor product decomposition of the transform matrix. It is shown that the m-D DCT or inverse discrete cosine transform (IDCT) on these small sizes can be computed using only one-dimensional (1-D) DCTs, additions and shifts. If all the dimensional sizes are the same, the total number of multiplications required for the algorithm is only 1/m of that required for the conventional row-column method. We also introduce approaches for computing scaled DCTs in which the number of multiplications is considerably reduced.

19.
A real-time image processor capable of video compression using either the sequency-ordered Walsh-Hadamard transform, (WHT)_W, or the discrete cosine transform (DCT) is considered. The processing is done on an intraframe basis in 8 × 8 data blocks. The (WHT)_W coefficients are computed directly and then used to obtain the DCT coefficients. This is achieved via an 8 × 8 transformation matrix that is orthonormal and has a block-diagonal structure. As such, it results in substantial savings in the number of multiplications and additions required to obtain the DCT, relative to its direct computation. Some aspects of a hardware implementation of the processor are also included.

20.
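Ouyang Wanli (欧阳万里), Xiao Chuangbai (肖创柏), Liu Guang (刘广). Acta Electronica Sinica (《电子学报》), 2005, 33(11): 2074-2079
Using a matrix formulation, this paper compares several classical algorithms with existing algorithms suited to very long instruction word (VLIW) architectures, from the VLIW point of view. A fast IDCT algorithm that exploits the characteristics of the VLIW architecture is then proposed. Compared with existing algorithms, the new algorithm further reduces the number of instruction cycles required. In addition, by exploiting the register characteristics of the VLIW architecture, motion compensation (prediction) and the IDCT (DCT) in video encoding and decoding are combined, reducing the time required for motion compensation to about 50% of the original. This idea can be applied to MPEG-1/2/4, H.263 and H.264.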
