Similar literature
Found 20 similar documents (search time: 31 ms)
1.
In this paper, new three-dimensional (3-D) radix-(2×2×2)/(4×4×4) and radix-(2×2×2)/(8×8×8) decimation-in-frequency (DIF) fast Fourier transform (FFT) algorithms are developed and their implementation schemes discussed. The algorithms are developed by introducing the radix-2/4 and radix-2/8 approaches in the computation of the 3-D DFT using the Kronecker product and appropriate index mappings. The butterflies of the proposed algorithms are characterized by simple closed-form expressions, facilitating easy software or hardware implementations of the algorithms. Comparisons between the proposed algorithms and the existing 3-D radix-(2×2×2) FFT algorithm show that significant savings in the number of arithmetic operations, data transfers, and twiddle factor evaluations or accesses to the lookup table can be achieved using the radix-(2×2×2)/(4×4×4) DIF FFT algorithm over the radix-(2×2×2) FFT algorithm. It is also established that further savings can be achieved by using the radix-(2×2×2)/(8×8×8) DIF FFT algorithm.

2.
In this article, we present the implementation of a high-throughput two-dimensional (2-D) 8 × 8 forward and inverse integer DCT transform for H.264. Using matrix decomposition and matrix operations such as the Kronecker product and direct sum, the forward and inverse integer transforms can be represented using simple addition operations. The dual-clocked pipelined structure of the proposed implementation uses non-floating-point adders and does not require any transpose memory. Hardware synthesis shows that the maximum operating frequency of the proposed pipelined architecture is 1.31 GHz, which achieves a throughput of 21.05 Gpixels/s at a hardware cost of 42932 gates. High throughput and low hardware cost make the proposed design useful for real-time H.264/AVC high-definition processing.

3.
In this paper, a new radix-2/8 fast Fourier transform (FFT) algorithm is proposed for computing the discrete Fourier transform of an arbitrary length N = q × 2^m, where q is an odd integer. It substantially reduces operations such as data transfer, address generation, and twiddle factor evaluation or access to the lookup table, which contribute significantly to the execution time of FFT algorithms. It is shown that the arithmetic complexity (multiplications + additions) of the proposed algorithm is, in most cases, the same as that of the existing split-radix FFT algorithm. The basic idea behind the proposed algorithm is the use of a mixture of radix-2 and radix-8 index maps. The algorithm is expressed in a simple matrix form, thereby facilitating an easy implementation of the algorithm and allowing for an extension to the multidimensional case. As for structural complexity, the important properties of the Cooley-Tukey approach, such as the use of the butterfly scheme and in-place computation, are preserved by the proposed algorithm.

4.
A fast algorithm for computing multidimensional DCT on certain small sizes   Cited by: 2 (self-citations: 0, citations by others: 2)
This paper presents a new algorithm for the fast computation of the multidimensional (m-D) discrete cosine transform (DCT) with size N_1 × N_2 × ··· × N_m, where N_i is a power of 2 and N_i ≤ 256, by using the tensor product decomposition of the transform matrix. It is shown that the m-D DCT or inverse discrete cosine transform (IDCT) on these small sizes can be computed using only one-dimensional (1-D) DCTs, additions, and shifts. If all the dimensional sizes are the same, the total number of multiplications required by the algorithm is only 1/m of that required by the conventional row-column method. We also introduce approaches for computing scaled DCTs in which the number of multiplications is considerably reduced.

5.
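H.264 is a new-generation video coding standard with excellent compression performance. That performance comes at the cost of a large increase in computational complexity, which makes practical deployment difficult. Using dedicated hardware is one way to address this problem. The integer transform in the H.264 standard is well suited to hardware implementation. The paper first introduces the integer transform operations in the H.264 standard and then proposes a fast parallel algorithm for the H.264 transforms based on matrix decomposition. The structure of the algorithm is analyzed, showing that it is a fast algorithm that conforms to the H.264 standard. The hardware implementation of the transform algorithm is also analyzed, showing that this hardware structure is suitable for real-time encoding and decoding.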

6.
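A novel high-efficiency reconfigurable multi-transform VLSI architecture for H.264 encoding and decoding   Cited by: 3 (self-citations: 3, citations by others: 0)
The H.264/AVC standard adopts a 4×4 integer transform. This paper proposes two new two-dimensional direct signal flow graphs, one for the forward and one for the inverse 4×4 transform. On this basis, a reconfigurable high-performance two-dimensional architecture supporting multiple transforms is designed. The architecture requires no transpose registers. The circuit was implemented in a 0.18 μm CMOS process. The results show that the architecture is more efficient than existing typical architectures. Compared with an ASIC built from three independent single-transform architectures, the reconfigurable architecture saves a large amount of chip area (61.1%) at the cost of a small drop in efficiency (14.4%). Operating at a clock frequency of 100 MHz, the circuit can process high-quality video sequences with a resolution of 4096×2048 at 60 frames per second in real time.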

7.
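Based on an analysis of the principle of the H.264 integer DCT (Discrete Cosine Transform), this paper introduces a new algorithm for the 4×4 forward integer transform. The algorithm makes heavy use of matrix operations. Compared with the traditional approach of converting one two-dimensional DCT into two one-dimensional DCTs, it omits the transpose module, reduces clock latency, and uses fewer resources, which makes it easier to meet the performance requirements of H.264-based video signal processing. A Verilog program was written according to the new algorithm and simulated in Quartus II 8.0, and results were obtained.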
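The matrix view this abstract relies on can be illustrated with a minimal sketch. It assumes the standard H.264 4×4 forward core transform matrix and omits the scaling/quantization stage; the function names are hypothetical, and this is not the algorithm or the Verilog design from the paper. The sketch computes the 2-D transform once as the row-column product C·X·Cᵀ and once as a single Kronecker-product matrix applied to the flattened block, which is one way to express a direct 2-D transform with no intermediate transpose.

```python
# Standard H.264 4x4 forward core transform matrix (scaling stage omitted).
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    """Plain integer matrix product of two list-of-lists matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    return [list(row) for row in zip(*m)]


def forward_row_column(x):
    """Row-column form: Y = CF * X * CF^T (needs an intermediate transpose in hardware)."""
    return matmul(matmul(CF, x), transpose(CF))


def kron(a, b):
    """Kronecker product: entry [i*p + j][k*q + l] equals a[i][k] * b[j][l]."""
    p, q = len(b), len(b[0])
    return [
        [a[r // p][c // q] * b[r % p][c % q] for c in range(len(a[0]) * q)]
        for r in range(len(a) * p)
    ]


def forward_direct(x):
    """Direct form: vec(Y) = (CF kron CF) * vec(X), with row-major flattening."""
    flat = [v for row in x for v in row]
    k = kron(CF, CF)
    out = [sum(k[r][c] * flat[c] for c in range(16)) for r in range(16)]
    return [out[4 * i:4 * i + 4] for i in range(4)]
```

Both functions return the same 4×4 coefficient block for any integer input; the direct form trades the transpose for a larger (16×16) but very sparse-in-value operator, which a real design would factor further rather than multiply out.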

8.
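陆晓凤, 刘锋, 佟冬, 王克义. 《电子学报》 (Acta Electronica Sinica), 2011, 39(5): 1072-1076
This paper targets all the transforms added in the decoding process of the H.264 Fidelity Range Extensions (FRExt, High Profile). Using two-dimensional matrix decomposition and matrix-operation-based extraction of common factors, it designs an efficient reconfigurable VLSI architecture built from general-purpose arithmetic units. The architecture not only saves area (the reconfigurable transform structure consumes only 4807 gates) but also offers high performance (using TSM...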

9.
Two-dimensional discrete cosine transforms are used in the core transformations in all profiles of the H.264/Advanced Video Coding (AVC) standard. In this paper, a resource-sharing implementation of high-throughput 4 × 4 and 8 × 8 forward and inverse integer transforms for high-definition H.264 is presented. It is shown that the 4 × 4 forward/inverse transform can be obtained from the 8 × 8 forward/inverse transform using selective data input and data arrangement at intermediate stages. The fast 8 × 8 forward and inverse transform is implemented using matrix decomposition and matrix operations such as the Kronecker product and direct sum. The proposed implementation does not require any transpose memory and has a dual-clocked pipeline structure. Compared with existing designs, the gate count is reduced by 27.7% in the proposed design. The maximum operating frequency of the proposed system is approx. 1.3 GHz, while the throughput is 7 G and 18.7 G pixels/s for 4 × 4 and 8 × 8 forward integer transforms, respectively. The proposed design can be used for real-time H.264/AVC high-definition processing owing to its high throughput and low hardware cost.

10.
Mode dependent directional transform (MDDT) can improve the coding efficiency of H.264/AVC, but it also brings high computational complexity. In this paper we present a new design for implementing a fast MDDT through integer lifting steps. We first approximate the optimal MDDT by a suitable transform matrix that can be implemented with butterfly-style operations. We then factorize the butterfly-style transform into a series of integer lifting steps to eliminate the need for multiplications. Experimental results show that the proposed fast MDDT can significantly reduce the computational complexity while introducing negligible loss in coding efficiency. Owing to the integer lifting steps, the proposed fast MDDT is reversible and can be implemented in hardware very easily.
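The reversibility claim follows from the structure of lifting steps in general, which a minimal sketch can show. The coefficients, the shift width, and the function names below are arbitrary illustrative assumptions, not values from the paper, and the three-step shear pattern is the generic lifting form of a 2-point butterfly rather than the MDDT factorization itself. Each step adds a rounded function of one sample to the other, so the inverse simply subtracts the same quantities in reverse order and recovers the input exactly.

```python
SHIFT = 3        # lifting coefficients are numerators over 2**SHIFT (assumed)
A, B = -3, 6     # arbitrary illustrative coefficients: -3/8 and 6/8


def _lift(coeff, v):
    """Rounded dyadic product (coeff * v) / 2**SHIFT using an add and a shift.

    With small constant coefficients the product itself reduces to a few
    shift-and-add operations in hardware.
    """
    return (coeff * v + (1 << (SHIFT - 1))) >> SHIFT


def forward(x, y):
    """Three integer lifting steps applied to the pair (x, y)."""
    x += _lift(A, y)
    y += _lift(B, x)
    x += _lift(A, y)
    return x, y


def inverse(x, y):
    """Undo the lifting steps in reverse order; exact for all integers."""
    x -= _lift(A, y)
    y -= _lift(B, x)
    x -= _lift(A, y)
    return x, y
```

Because each step changes only one of the two samples, the rounding inside `_lift` never prevents exact inversion, whatever coefficients are chosen.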

11.
In this paper, a VLSI architecture based on the radix-2^2 integer fast Fourier transform (IntFFT) is proposed to demonstrate its efficiency. The IntFFT algorithm guarantees the perfect reconstruction property of transformed samples. For a 64-point radix-2^2 FFT, the proposed architecture uses 2 sets of complex multipliers (six real multipliers) and has 6 pipeline stages. By exploiting the symmetry property of the lossless transform, the memory usage is reduced by 27.4%. The whole design is synthesized and simulated with a 0.18-μm TSMC 1P6M standard cell library, and its reported equivalent gate count is 17,963 gates. The whole chip size is 975 μm × 977 μm with a core size of 500 μm × 500 μm. The core power consumption is 83.56 mW. A Simulink-based orthogonal frequency demodulation multiplexing platform is utilized to compare the conventional fixed-point FFT and the proposed IntFFT from the viewpoint of system-level behavior in terms of signal-to-quantization-noise ratio (SQNR) and bit error rate (BER). The quantization loss analysis of these two types of FFT is also derived and compared. Based on the simulation results, the proposed lossless IntFFT architecture can achieve comparable SQNR and BER performance with reduced memory usage.

12.
In this paper, fast one-dimensional (1-D) algorithms and their hardware-sharing designs for the 1-D 2 × 2, 4 × 4, and 8 × 8 inverse transforms of H.264/AVC and the 1-D 8 × 8 inverse transform of AVS are proposed with low hardware cost, especially for multiple-standard decoding applications in China. The proposed 1-D hardware-sharing architecture is realized by adding offset computations and is implemented as a pipelined architecture. Thus, the hardware cost of the proposed sharing architecture is smaller than that of individual, separate designs. With its regular modularity, the proposed sharing architecture is suitable for H.264/AVC and AVS signal processing in VLSI implementations.

13.
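Digital video technology is increasingly widely used in communications and broadcasting, and the processing and transmission of video and multimedia information over networks has become a key technology in China's current informatization. The Moving Picture Experts Group and the Video Coding Experts Group have produced an improved standard, designated Part 10 of MPEG-4, namely H.264/AVC. The paper briefly describes the significance of research on H.264 and the principle of the DCT. To reduce the amount of computation, it analyzes how H.264 applies the integer transform to macroblocks, details the transform coding method of H.264, and explains the differences and connections between the integer transform and the traditional DCT. It also presents a fast algorithm for the H.264 integer transform, the butterfly algorithm, which differs from the traditional DCT.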
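The butterfly algorithm mentioned here can be sketched as follows. This is a minimal illustration of the widely used butterfly factorization of the 4-point H.264 forward core transform, assuming the standard matrix with rows (1, 1, 1, 1), (2, 1, −1, −2), (1, −1, −1, 1), (1, −2, 2, −1) and leaving out the scaling stage; the function names are hypothetical and nothing here is taken from the paper's own listing. Only additions, subtractions, and one-bit shifts are needed.

```python
def butterfly_4(x0, x1, x2, x3):
    """1-D 4-point forward core transform using adds, subtracts and shifts."""
    # Stage 1: sums and differences of mirrored sample pairs.
    s0, s1 = x0 + x3, x1 + x2
    d0, d1 = x0 - x3, x1 - x2
    # Stage 2: combine; the factor 2 is a left shift by one bit.
    return (s0 + s1, (d0 << 1) + d1, s0 - s1, d0 - (d1 << 1))


def forward_4x4(block):
    """2-D transform of a 4x4 block: butterfly on each row, then each column."""
    rows = [butterfly_4(*row) for row in block]
    cols = [butterfly_4(*col) for col in zip(*rows)]
    # cols[j] holds output column j, so transpose back to row-major order.
    return [list(r) for r in zip(*cols)]
```

For example, `butterfly_4(1, 2, 3, 4)` returns `(10, -7, 0, -1)`, which matches multiplying the input vector by the matrix above.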

14.
A folding rearrangeable nonblocking 4×4 optical matrix switch was designed and fabricated on a silicon-on-insulator wafer. To compress chip size, switch elements (SEs) were interconnected by total internal reflection (TIR) mirrors instead of conventional S-bends. To obtain smooth interfaces, potassium hydroxide anisotropic chemical etching of silicon was used to make the matrix switch for the first time. The device has a compact size of 20 × 1.6 mm² and a fast response of 7.5 μs. The power consumption of each 2×2 SE and the average excess loss per mirror were 145 mW and -1.1 dB, respectively. A low path dependence of ±0.7 dB in total excess loss was obtained because of the symmetry of propagation paths in this novel matrix switch.

15.
The latest international video-coding standard, H.264/AVC, achieves significantly better coding performance than prior video coding standards such as MPEG-2 and H.263, which are widely used in today's digital video applications. To provide interoperability between different coding standards, this paper proposes an efficient architecture for MPEG-2/H.263/H.264/AVC to H.264/AVC intra frame transcoding, using original information such as discrete cosine transform (DCT) coefficients and coded mode type. Low-frequency components of the DCT coefficients and a novel rate distortion cost function are used to select a set of candidate modes for the rate distortion optimization (RDO) decision. For H.263 and H.264/AVC, a mode refinement scheme is used to eliminate unlikely modes before the RDO mode decision, based on coded mode information. The experimental results, conducted on JM12.2 with fast C8MB mode decision, reveal that on average 58%, 59% and 60% of computation (re-encoding) time can be saved for MPEG-2, H.263, and H.264/AVC to H.264/AVC intra frame transcoding respectively, while preserving good coding performance when compared with complex cascaded pixel domain transcoding (CCPDT); or on average 88% (a speed-up factor of 8) when compared with CCPDT without considering fast C8MB. The proposed algorithm for H.264/AVC homogeneous transcoding is also compared to simple cascaded pixel domain transcoding (with original mode reuse). The results of this comparison indicate that the proposed algorithm significantly outperforms the mode reuse algorithm in coding performance, with only slightly higher computation.

16.
This brief presents an in-place computing design for the deblocking filter used in the H.264/AVC video coding standard. The proposed in-place computing flow reuses intermediate data as soon as the data is available. Thus, the intermediate data storage is reduced to only four 4 × 4 blocks instead of the whole 16 × 16 macroblock. The resulting design can achieve 100 MHz with a gate count of only 13.41K and supports real-time deblocking of 2K × 1K@30 Hz video when clocked at 73.73 MHz using 0.25-μm CMOS technology.

17.
Compared with other existing video coding standards, H.264/AVC can achieve a significant improvement in compression performance. A robust criterion named rate distortion optimization (RDO) is employed to select the optimal coding modes and motion vectors for each macroblock (MB), which achieves a high compression ratio but leads to a great increase in complexity and computational load. In this paper, a fast mode decision algorithm for H.264/AVC intra prediction based on the integer transform and an adaptive threshold is proposed. Before the intra prediction, integer transform operations on the original image are executed to find the directions of local textures. According to this direction, only a small part of the possible intra prediction modes are tested for RDO calculation in the first step. If the minimum mean absolute error (MMAE) of the reconstructed block corresponding to the best mode is smaller than an adaptive threshold that depends on the quantization parameter (QP), the RDO calculation is terminated. Otherwise, more possible modes need to be tested. The adaptive threshold aims to balance compression performance and computational load. Simulation results with various video sequences show that the proposed fast mode decision algorithm can accelerate encoding significantly with only negligible PSNR loss or bit rate increment. This work is supported in part by China National Natural Science Foundation (CNSF) under Project No.60572045, the Ministry of Education of China Ph.D. Program Foundation under Project No.20050698033, and by a Cooperation Project (2005.7– 2007.7) with Microsoft Research Asia.

18.
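The discrete Hartley transform (DHT), as an alternative to the real-valued discrete Fourier transform, is widely used in signal and image processing. Because existing fast algorithms for the three-dimensional DHT can only handle lengths that are integer powers of 2, this paper proposes a split-radix-2/4 fast algorithm that applies to three-dimensional DHTs of many more lengths. Compared with zero-padding the input and applying the best existing algorithm, the proposed algorithm effectively reduces computational complexity.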

19.
This paper proposes a method for progressive lossy-to-lossless compression of four-dimensional (4-D) medical images (sequences of volumetric images over time) by using a combination of a three-dimensional (3-D) integer wavelet transform (IWT) and 3-D motion compensation. A 3-D extension of the set-partitioning in hierarchical trees (SPIHT) algorithm is employed for coding the wavelet coefficients. To effectively exploit the redundancy between consecutive 3-D images, the concepts of key and residual frames from video coding are used. A fast 3-D cube matching algorithm is employed for motion estimation. The key and the residual volumes are then coded using the 3-D IWT and the modified 3-D SPIHT. The experimental results presented in this paper show that the proposed compression scheme achieves better lossy and lossless compression performance on 4-D medical images than JPEG-2000 and volumetric compression based on 3-D SPIHT.

20.
The authors present an efficient algorithm for the computation of the 4×4 discrete cosine transform (DCT). The algorithm is based on the decomposition of the 4×4 DCT into four 4-point 1-D DCTs. Thus, only 1-D transformations and some additions are required. It is shown that the proposed algorithm requires only 16 multiplications, which is half the number needed for the conventional row-column method. Since the 2^m × 2^m DCT can be computed using the 4×4 DCT recursively for any m, the proposed algorithm leads to a fast algorithm for the computation of the 2-D DCT.
