首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
设计了一种低功耗的2D DCT/IDCT处理器。为了降低功耗,设计基于行列分解的结构,采用了Loeffler的DCT/IDCT快速算法,并使用了零输入旁路、门控时钟、截断处理等技术,在满足设计需求的基础上降低了系统的功耗。常系数乘法器是该处理器的一个重要部件,文中基于并行乘法器结构设计了一种新型的低功耗常系数乘法器,它采用了CSD编码、Wallace Tree乘法算法,结合采用了截断处理、变数校正的优化技术,使得2D DCT/IDCT处理器整体性能有较大提高。设计的时钟频率为100 MHz,可以满足MPEG2 MP@HL实时解码的应用。采用SMIC0.18μm工艺进行综合,该2D DCT/IDCT处理器的面积为341 212μm2,功耗为14.971 mW。通过与其他结构的2DDCT/IDCT处理器设计分析与比较,在满足MPEG2 MP@HL实时解码应用的同时,实现了较低的功耗。  相似文献   

2.
本文在W.Li(1991)循环斜卷积算法和分布式算法的基础上,通过软件模拟和具体硬件设计,利用FPGA完成了可用于高清晰度电视核心解码器及其它信号与信息处理系统的88二维DCT/IDCT处理器的全部电路设计工作。它采用一根信号线控制计算DCT/IDCT,其输入、输出为12位,内部数据线及内部参数均为16位。  相似文献   

3.
A direct method for the computation of 2-D DCT/IDCT on a linear-array architecture is presented. The 2-D DCT/IDCT is first converted into its corresponding I-D DCT/IDCT problem through proper input/output index reordering. Then, a new coefficient matrix factorisation is derived, leading to a cascade of several basic computation blocks. Unlike other previously proposed high-speed 2-D N /spl times/ N DCT/IDCT processors that usually require intermediate transpose memory and have computation complexity O(N/sup 3/), the proposed hardware-efficient architecture with distributed memory structure has computation complexity O(N/sup 2/ log/sub 2/ N) and requires only log/sub 2/ N multipliers. The new pipelinable and scalable 2-D DCT/IDCT processor uses storage elements local to the processing elements and thus does not require any address generation hardware or global memory-to-array routing.  相似文献   

4.
We describe the design and implementation of an asynchronous discrete cosine transform/inverse discrete cosine transform (DCT/IDCT) processor core compliant with the CCITT recommendation H.261. First, a micropipelined implementation with level-sensitive latches is shown. This is improved by replacing the level-sensitive latches with dual-edge triggered flip-flops to save power and using completion-detection adders in the critical stage of the pipeline to exploit the data-dependent processing delay. Gate-level simulation of extracted layouts indicates that the performance of asynchronous implementations is comparable with that of a synchronous implementation based on an identical architecture. This is because part of the penalty introduced by handshaking circuitry in an asynchronous pipeline can be recovered by exploiting data-dependent processing delays with completion-detection circuitry. In pipelines with significant arithmetic processing such as the DCT/IDCT processor, this is easily accomplished. Our results are encouraging because asynchronous designs do not employ global clocking. In the near future when clock generation, clock distribution, and the power consumed in the clock circuitry become limiting factors in the design of large synchronous application specific integrated circuits (ASICs), asynchronous implementation methodology could be pursued as a real alternative  相似文献   

5.
A new linear-array architecture for computation of both the discrete cosine transform (DCT) and the inverse DCT (IDCT) is derived from the heterogeneous dependence graphs representing the factorised coefficient matrices in the matrix formulation of the recursive algorithm. Using the Kronecker product representation of the order-recursive algorithm, it is observed that the kernel operations of the DCT and IDCT can be merged together by proper input/output data reordering. The processor containing only O(log2N) stages is fully pipelineable and easily scaleable to compute longer DCT/IDCTs with transform length N to the power of two. Owing to the systematic matrix formulation and the corresponding efficient architectural design, the new DCT/IDCT processor has the advantages of high-throughput rate and low hardware cost. Furthermore, the power consumption can be reduced significantly by turning off the operation of the arithmetic units whenever possible  相似文献   

6.
An efficient image downconversion algorithm in the compressed domain for mixed field/frame-mode macroblocks is proposed. A 16 /spl times/ 16 field/frame-mode macroblock is converted into an 8 /spl times/ 8 reduced block in the discrete cosine transform (DCT) domain using a modified inverse DCT (IDCT) kernel. Experimental results show that the proposed algorithm provides the downconverted image quality similar to an existing method with significantly lower computational cost.  相似文献   

7.
一种动态精度匹配的面积优化2-D DCT/IDCT的实现   总被引:1,自引:0,他引:1  
提出了一种JPEG标准推荐的2—DDCT/IDCT的改进型Loeffler算法的ASIC实现。该设计采用硬件复用的方法,在正向和反向变换过程中使用同一运算电路,达到了面积优化的目的;并对输入数据进行系数预判,在特定输入情况下,有效提高了处理速度和降低功耗;还根据JPEG体系结构,在DCT变换和量化器之间建立劝态的精度匹配,保证了不同压缩比下的图像质量和功耗效率。该电路应用于140万像素数码相机的JPEG图像处理ASIC芯片中,已成功通过了FPGA验证和流片测试。  相似文献   

8.
A fast algorithm for computing multidimensional DCT on certain small sizes   总被引:2,自引:0,他引:2  
This paper presents a new algorithm for the fast computation of multidimensional (m-D) discrete cosine transform (DCT) with size N/sub 1//spl times/N/sub 2//spl times//spl middot//spl middot//spl middot//spl times/N/sub m/, where N/sub i/ is a power of 2 and N/sub i//spl les/256, by using the tensor product decomposition of the transform matrix. It is shown that the m-D DCT or inverse discrete cosine transform (IDCT) on these small sizes can be computed using only one-dimensional (1-D) DCTs and additions and shifts. If all the dimensional sizes are the same, the total number of multiplications required for the algorithm is only 1/m times of that required for the conventional row-column method. We also introduce approaches for computing scaled DCTs in which the number of multiplications is considerably reduced.  相似文献   

9.
This paper describes a block processing unit in a single-chip MPEG-2 MP@ML video encoder LSI. The block processing unit executes algorithms such as a discrete cosine transform (DCT), a quantization, an inverse quantization, and an inverse discrete cosine transform (IDCT). A double-block pipeline scheme has been introduced to execute DCT and IDCT operations on the shared circuits. Using a time-multiplexed DCT/IDCT architecture, we achieve processing performance of 2.0 clk/pel. This architecture has 21% fewer transistors and 30% less power dissipation than a conventional one. The number of transistors of the block processing unit is 240 kTr which measures 7.7% of the total of the chip. By controlling the clock signal supply, power dissipation can be reduced to 43% which is about 400 mW at 3.3 V using a 0.35 m triple-layer metal CMOS cell-base technology at 54 MHz.  相似文献   

10.
欧阳万里  肖创柏  刘广 《电子学报》2005,33(11):2074-2079
本文使用矩阵形式在超长指令字(VLIW)的观点下将几种经典算法与已有的适合于VLIW的算法进行了比较.然后利用VLIW结构的特性,提出了一种快速IDCT算法.与现有算法相比,新算法进一步减少了所需的指令周期.并利用VLIW结构的寄存器特性,将视频编解码过程中的运动补偿(预测)和IDCT(DCT)组合,使运动补偿所需时间降低为原来的约50%,这种思想能应用于MPEG1/2/4,H.263和H.264.  相似文献   

11.
In this paper, new three-dimensional (3-D) radix-(2/spl times/2/spl times/2)/(4/spl times/4/spl times/4) and radix-(2/spl times/2/spl times/2)/(8/spl times/8/spl times/8) decimation-in-frequency (DIF) fast Fourier transform (FFT) algorithms are developed and their implementation schemes discussed. The algorithms are developed by introducing the radix-2/4 and radix-2/8 approaches in the computation of the 3-D DFT using the Kronecker product and appropriate index mappings. The butterflies of the proposed algorithms are characterized by simple closed-form expressions facilitating easy software or hardware implementations of the algorithms. Comparisons between the proposed algorithms and the existing 3-D radix-(2/spl times/2/spl times/2) FFT algorithm are carried out showing that significant savings in terms of the number of arithmetic operations, data transfers, and twiddle factor evaluations or accesses to the lookup table can be achieved using the radix-(2/spl times/2/spl times/2)/(4/spl times/4/spl times/4) DIF FFT algorithm over the radix-(2/spl times/2/spl times/2) FFT algorithm. It is also established that further savings can be achieved by using the radix-(2/spl times/2/spl times/2)/(8/spl times/8/spl times/8) DIF FFT algorithm.  相似文献   

12.
In this paper, we present an efficient HW/SW codesign architecture for H.263 video encoder and its FPGA implementation. Each module of the encoder is investigated to find which approach between HW and SW is better to achieve real-time processing speed as well as flexibility. The hardware portions include the Discrete Cosine Transform (DCT), inverse DCT (IDCT), quantization (Q) and inverse quantization (IQ). Remaining parts were realized in software executed by the NIOS II softcore processor. This paper also introduces efficient design methods for HW and SW modules. In hardware, an efficient architecture for the 2-D DCT/IDCT is suggested to reduce the chip size. A NIOS II Custom instruction logic is used to implement Q/IQ. Software optimization technique is also explored by using the fast block-matching algorithm for motion estimation (ME). The whole design is described in VHDL language, verified in simulations and implemented in Stratix II EP2S60 FPGA. Finally, the encoder has been tested on the Altera NIOS II development board and can work up to 120 MHz. Implementation results show that when HW/SW codesign is used, a 15.8-16.5 times improvement in coding speed is obtained compared to the software based solution.  相似文献   

13.
Multiplication is frequently the speed-limiting function in digital signal processing systems. High-speed hardware multiplier ICs can therefore greatly enhance the throughput and bandwidth of many digital systems. In this paper, the design, fabrication, and performance of GaAs parallel multipliers are discussed. The largest of these circuits, an 8/spl times/8 bit multiplier, has 1008 gates, and is by far the most complex GaAs IC demonstrated today. This multiplier forms the 16 bit product of two 8 bit input numbers in 5.25 ns. This corresponds to an equivalent gate propagation delay of 150 ps/gate. The power dissipation ranges between 0.6-2 mW/gate.  相似文献   

14.
A new algorithm to compute the DCT and its inverse   总被引:2,自引:0,他引:2  
A novel algorithm to convert the discrete cosine transform (DCT) to skew-circular convolutions is presented. The motivation for developing such an algorithm is the fact that VLSI implementation of distributed arithmetic is very efficient for computing convolutions. It is also shown that the inverse DCT (IDCT) can be computed using the same building blocks which are used for computing the DCT. A DCT/IDCT processor can be designed to compute either the DCT or the IDCT depending on a 1-b control signal  相似文献   

15.
赵滨  黄大庆 《电子设计工程》2011,19(24):126-129
提出了一种新的二维DCT和IDCT的FPGA实现结构,采用行列快速算法将二维算法分解为两个一维算法实现,其中每个一维算法采用并行的流水线结构,每一个时钟处理8个数据,大大提高电路的数据吞吐率和运算速度。通过Modelsim仿真工具对该设计进行仿真,证明该算法的功能的正确性,进行一次8*8的分块二维DCT变换仅仅需要16个时钟,满足图像以及视频实时性的要求。  相似文献   

16.
An 8-bit high-speed A/D converter has been developed in a 1.5-/spl mu/m bulk CMOS double-polysilicon process technology. The design, process technology, and performance of the converter are described. In order to achieve high speed and low power, a fine-pattern process technology and a novel capacitor structure have been introduced and the transistor sizes of a chopper-type comparator have been optimized. High speed (30 MS/s) and low power consumption (60 mW) have been obtained. Computerized evaluations such as the histogram test and the fast Fourier transform test have been used to measure dynamic performance. The linearity error in dynamic operation is less than /spl plusmn/1 LSB. Signal-to-peak-noise ratio is 40 dB at a sampling rate of 14.32 MS/s and an input frequency of 1.42 MHz.  相似文献   

17.
一种基于DFT的DCT改进算法的研究   总被引:1,自引:1,他引:0  
焦计平  周又玲  吴素珍 《通信技术》2010,43(8):247-249,252
离散余弦变换(DCT)是一种广泛应用于信号处理、图像处理领域的重要工具,并已被多个国际标准所接受。将DCT应用到实际系统中的前提是具有能够快速实现的算法。给出了一种基于DFT的DCT/IDCT的实现,它避免了变换序列长度的限制。由于DFT可以由FFT实现,所以这种实现方式进而利用到FFT的优势。在满足输入序列长度满足一定条件的情况下,对所提出的算法做了进一步的优化,使得DCT的实现更加容易。  相似文献   

18.
伍汉华  李平 《电讯技术》2005,45(5):157-159
在MPEG视音频标准中,使用DCT(离散余旋变换)/IDCT(反离散余旋变换)来压缩数据。在数字视音频MP3解码电路中,IDCT是整个解码过程中运算量最大最耗时的一部分,因此IDCT的速度对整个MP3解码进程的速度起着极为关键的作用。在众多的解码过程实现方案中,用芯片实现是速度最快的一种方案。本方案是用RTL级的Verilog语言进行描述,用Synplify Pro综合成门级电路,然后用ModelSim仿真通过后,下载到X ilinx公司的V irtex的FPGA中。结果表明:电路工作正确可靠,速度上能满足MP3的实时播放要求。  相似文献   

19.
In this paper, the novel two-dimensional (2-D) fast algorithm for realization of 4 /spl times/ 4 forward integer transform in H.264 is proposed. Based on matrix operations with Kronecker product and direct sum, the efficient fast 2-D 4 /spl times/ 4 forward integer transform can be derived from the proposed one-dimensional fast 4 /spl times/ 4 forward integer transform through matrix decompositions. The proposed fast 2-D 4 /spl times/ 4 forward integer transform design doesn't need transpose memory for direct parallel pipelined architecture. The fast 2-D 4 /spl times/ 4 forward integer transform requires fewer latency delays than the state-of-the-art methods. With regular modularity, the proposed fast algorithm is suitable for VLSI implementation to achieve real-time H.264/advanced video coding (AVC) signal processing.  相似文献   

20.
A configurable architecture for performing image transform algorithms is presented that provides a better tradeoff between low complexity and algorithm flexibility than either software-programmable processors or dedicated ASIC's. The configurable processor unit requires only 110 K transistors and can execute several image transform algorithms. By emulating the signal flow of the algorithms in hardware, rather than software, complexity is reduced by an order of magnitude compared with current software programmable video signal processors, while providing more flexibility than single function ASIC's. The processor has been fabricated in 1.2-μm CMOS and has been successfully used to execute the discrete cosine transform/inverse discrete cosine transform (DCT/IDCT), subband coding, vector quantization, and two-dimensional filtering algorithms at pixel rates up to 25 MPixels/s  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号