Found 20 similar documents (search time: 31 ms)
1.
Bouguezel S. Ahmad M.O. Swamy M.N.S. 《IEEE transactions on circuits and systems. I, Regular papers》2006,53(2):306-315
In this paper, new three-dimensional (3-D) radix-(2×2×2)/(4×4×4) and radix-(2×2×2)/(8×8×8) decimation-in-frequency (DIF) fast Fourier transform (FFT) algorithms are developed and their implementation schemes discussed. The algorithms are developed by introducing the radix-2/4 and radix-2/8 approaches in the computation of the 3-D DFT using the Kronecker product and appropriate index mappings. The butterflies of the proposed algorithms are characterized by simple closed-form expressions facilitating easy software or hardware implementations of the algorithms. Comparisons between the proposed algorithms and the existing 3-D radix-(2×2×2) FFT algorithm are carried out showing that significant savings in terms of the number of arithmetic operations, data transfers, and twiddle factor evaluations or accesses to the lookup table can be achieved using the radix-(2×2×2)/(4×4×4) DIF FFT algorithm over the radix-(2×2×2) FFT algorithm. It is also established that further savings can be achieved by using the radix-(2×2×2)/(8×8×8) DIF FFT algorithm.
2.
In this article, we present the implementation of a high-throughput two-dimensional (2-D) 8 × 8 forward and inverse integer DCT transform for H.264. Using matrix decomposition and matrix operations, such as the Kronecker product and direct sum, the forward and inverse integer transform can be represented using simple addition operations. The dual clocked pipelined structure of the proposed implementation uses non-floating point adders and does not require any transpose memory. Hardware synthesis shows that the maximum operating frequency of the proposed pipelined architecture is 1.31 GHz, which achieves a throughput rate of 21.05 Gpixels/s with a hardware cost of 42932 gates. High throughput and low hardware cost make the proposed design useful for real-time H.264/AVC high-definition processing.
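The role of the Kronecker product mentioned in this abstract can be illustrated with a small sketch. It is not the paper's 8 × 8 decomposition: the 2 × 2 matrix `T_EXAMPLE`, the block `X_EXAMPLE`, and all function names are assumptions chosen for illustration. It only checks the general identity that a separable 2-D transform T·X·Tᵀ equals (T ⊗ T) applied to the row-major flattening of X.

```python
def matmul(a, b):
    """Plain matrix product of two lists-of-lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def kron(a, b):
    """Kronecker product: entry (i*len(b)+j, k*len(b[0])+l) is a[i][k]*b[j][l]."""
    return [[a[i][k] * b[j][l]
             for k in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for j in range(len(b))]


def transform_2d(t, x):
    """Separable 2-D transform computed as T * X * T^T."""
    return matmul(matmul(t, x), transpose(t))


def transform_2d_kron(t, x):
    """Same transform computed as (T kron T) times the row-major vector of X."""
    n = len(t)
    flat = [v for row in x for v in row]
    big = kron(t, t)
    out = [sum(big[r][c] * flat[c] for c in range(n * n)) for r in range(n * n)]
    return [out[i * n:(i + 1) * n] for i in range(n)]


# Illustrative inputs only (hypothetical, not taken from the paper).
T_EXAMPLE = [[1, 1], [1, -1]]
X_EXAMPLE = [[1, 2], [3, 4]]
```

Both routes give the same result; the Kronecker form is what allows a 2-D transform to be factored into sparse stages without a separate transpose step.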
3.
Bouguezel S. Ahmad M.O. Swamy M.N.S. 《IEEE transactions on circuits and systems. I, Regular papers》2004,51(9):1723-1732
In this paper, a new radix-2/8 fast Fourier transform (FFT) algorithm is proposed for computing the discrete Fourier transform of an arbitrary length N = q × 2^m, where q is an odd integer. It reduces substantially the operations such as data transfer, address generation, and twiddle factor evaluation or access to the lookup table, which contribute significantly to the execution time of FFT algorithms. It is shown that the arithmetic complexity (multiplications + additions) of the proposed algorithm is, in most cases, the same as that of the existing split-radix FFT algorithm. The basic idea behind the proposed algorithm is the use of a mixture of radix-2 and radix-8 index maps. The algorithm is expressed in a simple matrix form, thereby facilitating an easy implementation of the algorithm, and allowing for an extension to the multidimensional case. For the structural complexity, the important properties of the Cooley-Tukey approach such as the use of the butterfly scheme and in-place computation are preserved by the proposed algorithm.
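For orientation, the butterfly scheme that the abstract says is preserved can be seen in a plain radix-2 decimation-in-frequency FFT. The sketch below is the textbook radix-2 algorithm for power-of-two lengths, not the radix-2/8 algorithm of the paper; the function names `fft_dif` and `naive_dft` are assumptions made for this illustration.

```python
import cmath


def naive_dft(x):
    """Direct O(N^2) DFT, used only as a reference."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * k * r / n) for k in range(n))
            for r in range(n)]


def fft_dif(x):
    """Recursive radix-2 decimation-in-frequency FFT (length must be a power of two)."""
    n = len(x)
    if n == 0 or (n & (n - 1)) != 0:
        raise ValueError("length must be a power of two")
    if n == 1:
        return list(x)
    half = n // 2
    # Butterfly: the sums feed the even-indexed outputs, the twiddled
    # differences feed the odd-indexed outputs.
    top = [x[k] + x[k + half] for k in range(half)]
    bottom = [(x[k] - x[k + half]) * cmath.exp(-2j * cmath.pi * k / n)
              for k in range(half)]
    out = [0j] * n
    out[0::2] = fft_dif(top)
    out[1::2] = fft_dif(bottom)
    return out
```

Each recursion level evaluates N/2 twiddle factors; mixed-radix schemes such as the one described above aim to reduce that kind of overhead.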
4.
This paper presents a new algorithm for the fast computation of the multidimensional (m-D) discrete cosine transform (DCT) of size N_1 × N_2 × ··· × N_m, where N_i is a power of 2 and N_i ≤ 256, by using the tensor product decomposition of the transform matrix. It is shown that the m-D DCT or inverse discrete cosine transform (IDCT) on these small sizes can be computed using only one-dimensional (1-D) DCTs and additions and shifts. If all the dimensional sizes are the same, the total number of multiplications required for the algorithm is only 1/m of that required for the conventional row-column method. We also introduce approaches for computing scaled DCTs in which the number of multiplications is considerably reduced.
5.
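H.264 is a new-generation video coding standard with excellent compression performance. The price of this performance is a large increase in computational complexity, which makes practical deployment difficult. Using dedicated hardware is one way to address this problem. The integer transform in the H.264 standard is well suited to hardware implementation. This paper first introduces the integer transform operations in the H.264 standard and then proposes a fast parallel algorithm, based on matrix factorization, for the transforms in H.264. Analysis of the algorithm's structure shows that it is a fast algorithm compliant with the H.264 standard. The hardware implementation of the transform algorithm is also analyzed, showing that this hardware structure is suitable for real-time encoding and decoding.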
6.
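A novel high-efficiency reconfigurable multi-transform VLSI architecture for H.264 encoding and decoding   Total citations: 3, self-citations: 3, other citations: 0
The H.264/AVC standard adopts a 4×4 integer transform. This paper proposes two new two-dimensional direct signal flow graphs, one for the 4×4 forward transform and one for the inverse transform. On this basis, a reconfigurable high-performance two-dimensional architecture supporting multiple transforms is designed. The architecture requires no transpose registers. The circuit was implemented in a 0.18-μm CMOS process. The results show that the architecture is more efficient than existing typical architectures. Compared with an ASIC built from three separate single-transform architectures, the reconfigurable architecture achieves a large saving in chip area (61.1%) at the cost of a small loss in efficiency (14.4%). Operating at a clock frequency of 100 MHz, the circuit can process high-quality video sequences with a resolution of 4096×2048 at 60 frames per second in real time.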
7.
8.
9.
Honey Durga Tiwari Meeturani Sharma Yong Beom Cho 《AEUE-International Journal of Electronics and Communications》2012,66(7):521-531
Two-dimensional discrete cosine transforms are used in the core transformations in all profiles of the H.264/Advanced Video Coding (AVC) standard. In this paper, a resource-sharing implementation of high-throughput 4 × 4 and 8 × 8 forward and inverse integer transforms for high-definition H.264 is presented. It is shown that the 4 × 4 forward/inverse transform can be obtained from the 8 × 8 forward/inverse transform using selective data input and data arrangement at intermediate stages. The fast 8 × 8 forward and inverse transform is implemented using matrix decomposition and matrix operations such as the Kronecker product and direct sum. The proposed implementation does not require any transpose memory and has a dual clocked pipeline structure. Compared with existing designs, the gate count is reduced by 27.7% in the proposed design. The maximum operating frequency of the proposed system is approximately 1.3 GHz, while the throughput is 7 Gpixels/s and 18.7 Gpixels/s for the 4 × 4 and 8 × 8 forward integer transforms, respectively. The proposed design can be used for real-time H.264/AVC high-definition processing owing to its high throughput and low hardware cost.
10.
Wenpeng Ding Ruiqin Xiong Yunhui Shi Dehui Kong Baocai Yin 《Journal of Visual Communication and Image Representation》2011,22(8):721-726
Mode dependent directional transform (MDDT) can improve the coding efficiency of H.264/AVC but it also brings high computational complexity. In this paper we present a new design for implementing a fast MDDT through integer lifting steps. We first approximate the optimal MDDT by a proper transform matrix that can be implemented with butterfly-style operations. We further factorize the butterfly-style transform into a series of integer lifting steps to eliminate the need for multiplications. Experimental results show that the proposed fast MDDT can significantly reduce the computational complexity while introducing negligible loss in coding efficiency. Owing to the merits of integer lifting steps, the proposed fast MDDT is reversible and can be implemented in hardware very easily.
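The reversibility claimed for integer lifting steps can be illustrated with the standard three-shear factorization of a plane rotation. This is a generic sketch, not the MDDT factorization from the paper: the function names, the round-half-up rule, and the use of floating-point coefficients (rather than multiplier-free shifts and adds) are all assumptions made for illustration.

```python
import math


def _round(v):
    """Deterministic round-half-up to the nearest integer."""
    return math.floor(v + 0.5)


def lifting_rotate(x, y, theta):
    """Integer approximation of a rotation by theta using three lifting steps.

    Uses R(theta) = shear(p) * shear_lower(s) * shear(p), with
    p = (cos(theta) - 1) / sin(theta) and s = sin(theta); theta must not
    be a multiple of pi.
    """
    s = math.sin(theta)
    p = (math.cos(theta) - 1.0) / s
    x = x + _round(p * y)
    y = y + _round(s * x)
    x = x + _round(p * y)
    return x, y


def lifting_rotate_inverse(x, y, theta):
    """Exact inverse: undo the three lifting steps in reverse order."""
    s = math.sin(theta)
    p = (math.cos(theta) - 1.0) / s
    x = x - _round(p * y)
    y = y - _round(s * x)
    x = x - _round(p * y)
    return x, y
```

Because each step only adds a rounded function of the other coordinate, subtracting the same quantity undoes it exactly, whatever rounding rule is used; this is why lifting-based integer transforms are losslessly invertible.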
11.
Wei-Hsin Chang Truong Nguyen 《IEEE transactions on circuits and systems. I, Regular papers》2006,53(6):1235-1243
In this paper, a VLSI architecture based on the radix-2^2 integer fast Fourier transform (IntFFT) is proposed to demonstrate its efficiency. The IntFFT algorithm guarantees the perfect reconstruction property of transformed samples. For a 64-point radix-2^2 FFT architecture, the proposed architecture uses 2 sets of complex multipliers (six real multipliers) and has 6 pipeline stages. By exploiting the symmetric property of the lossless transform, the memory usage is reduced by 27.4%. The whole design is synthesized and simulated with a 0.18-μm TSMC 1P6M standard cell library and its reported equivalent gate count usage is 17,963 gates. The whole chip size is 975 μm × 977 μm with a core size of 500 μm × 500 μm. The core power consumption is 83.56 mW. A Simulink-based orthogonal frequency demodulation multiplexing platform is utilized to compare the conventional fixed-point FFT and the proposed IntFFT from the viewpoint of system-level behavior in terms of signal-to-quantization-noise ratio (SQNR) and bit error rate (BER). The quantization loss analysis of these two types of FFT is also derived and compared. Based on the simulation results, the proposed lossless IntFFT architecture can achieve comparable SQNR and BER performance with reduced memory usage.
12.
《Circuits and Systems II: Express Briefs, IEEE Transactions on》2008,55(12):1249-1253
13.
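Digital video technology is increasingly widely used in communications and broadcasting, and the processing and transmission of video and multimedia information over networks has become a key technology in China's current informatization drive. The Moving Picture Experts Group and the Video Coding Experts Group produced an improved standard, designated Part 10 of the MPEG-4 standard, namely H.264/AVC. This paper briefly describes the research significance of H.264 and the principle of the DCT. To reduce the amount of computation, it analyzes how H.264 applies the integer transform to macroblocks, details the transform coding method of H.264, explains the differences and connections between the integer transform and the conventional DCT, and presents a fast algorithm for the H.264 integer transform, namely the butterfly algorithm, which differs from the conventional DCT.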
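The butterfly algorithm mentioned in this abstract can be sketched for the 4×4 forward core transform of H.264, whose matrix has rows (1, 1, 1, 1), (2, 1, −1, −2), (1, −1, −1, 1), (1, −2, 2, −1). The sketch below is a minimal illustration, not the paper's implementation: the function names are assumptions, and the post-scaling and quantization stages that follow the core transform in an encoder are omitted.

```python
# Forward 4x4 core transform matrix of H.264 (scaling/quantization omitted).
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]


def butterfly_1d(x0, x1, x2, x3):
    """1-D 4-point transform using only additions, subtractions and shifts."""
    s0, s1 = x0 + x3, x1 + x2      # sums
    d0, d1 = x0 - x3, x1 - x2      # differences
    return [s0 + s1, (d0 << 1) + d1, s0 - s1, d0 - (d1 << 1)]


def forward_4x4_butterfly(block):
    """Apply the butterfly to every row, then to every column: Y = CF * X * CF^T."""
    rows = [butterfly_1d(*row) for row in block]
    cols = [butterfly_1d(*col) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def forward_4x4_matrix(block):
    """Reference computation by explicit matrix multiplication."""
    tmp = [[sum(CF[i][k] * block[k][j] for k in range(4)) for j in range(4)]
           for i in range(4)]
    return [[sum(tmp[i][k] * CF[j][k] for k in range(4)) for j in range(4)]
            for i in range(4)]
```

Each 1-D butterfly needs 8 additions and 2 shifts and no multiplications, whereas a floating-point DCT requires real multiplications.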
14.
Integrated folding 4×4 optical matrix switch with total internal reflection mirrors on SOI by anisotropic chemical etching   Total citations: 1, self-citations: 0, other citations: 1
Jingwei Liu Jinzhong Yu Shaowu Chen Zhiyong Li 《Photonics Technology Letters, IEEE》2005,17(6):1187-1189
A folding rearrangeable nonblocking 4×4 optical matrix switch was designed and fabricated on silicon-on-insulator wafer. To compress chip size, switch elements (SEs) were interconnected by total internal reflection (TIR) mirrors instead of conventional S-bends. For obtaining smooth interfaces, potassium hydroxide anisotropic chemical etching of silicon was utilized to make the matrix switch for the first time. The device has a compact size of 20 × 1.6 mm^2 and a fast response of 7.5 μs. The power consumption of each 2×2 SE and the average excess loss per mirror were 145 mW and -1.1 dB, respectively. Low path dependence of ±0.7 dB in total excess loss was obtained because of the symmetry of propagation paths in this novel matrix switch.
15.
The latest international video-coding standard, H.264/AVC, achieves significantly better coding performance than prior video coding standards such as MPEG-2 and H.263, which have been widely used in today's digital video applications. To provide interoperability between different coding standards, this paper proposes an efficient architecture for MPEG-2/H.263/H.264/AVC to H.264/AVC intra frame transcoding, using the original information such as discrete cosine transform (DCT) coefficients and coded mode type. Low-frequency components of DCT coefficients and a novel rate distortion cost function are used to select a set of candidate modes for rate distortion optimization (RDO) decision. For H.263 and H.264/AVC, a mode refinement scheme is utilized to eliminate unlikely modes before RDO mode decision, based on coded mode information. The experimental results, conducted on JM12.2 with fast C8MB mode decision, reveal that on average 58%, 59% and 60% of computation (re-encoding) time can be saved for MPEG-2, H.263 and H.264/AVC to H.264/AVC intra frame transcoding, respectively, while preserving good coding performance when compared with complex cascaded pixel domain transcoding (CCPDT); or on average 88% (a speed-up factor of 8) when compared with CCPDT without considering fast C8MB. The proposed algorithm for H.264/AVC homogeneous transcoding is also compared to the simple cascaded pixel domain transcoding (with original mode reuse). The results of this comparison indicate that the proposed algorithm significantly outperforms the mode reuse algorithm in coding performance, with only slightly higher computation.
16.
Chao-Chung Cheng Tian-Sheuan Chang Kun-Bin Lee 《Circuits and Systems II: Express Briefs, IEEE Transactions on》2006,53(7):530-534
This brief presents an in-place computing design for the deblocking filter used in the H.264/AVC video coding standard. The proposed in-place computing flow reuses intermediate data as soon as the data is available. Thus, the intermediate data storage is reduced to only four 4 × 4 blocks instead of the whole 16 × 16 macroblock. The resulting design can achieve 100 MHz with only a 13.41K gate count and supports real-time deblocking operation for 2K × 1K@30 Hz video applications when clocked at 73.73 MHz using 0.25-μm CMOS technology.
17.
Compared with other existing video coding standards, H.264/AVC can achieve a significant improvement in compression performance. A robust criterion named rate distortion optimization (RDO) is employed to select the optimal coding modes and motion vectors for each macroblock (MB), which achieves a high compression ratio but leads to a great increase in complexity and computational load. In this paper, a fast mode decision algorithm for H.264/AVC intra prediction based on integer transform and adaptive threshold is proposed. Before the intra prediction, integer transform operations on the original image are executed to find the directions of local textures. According to this direction, only a small part of the possible intra prediction modes are tested for RDO calculation at the first step. If the minimum mean absolute error (MMAE) of the reconstructed block corresponding to the best mode is smaller than an adaptive threshold which depends on the quantization parameter (QP), the RDO calculation is terminated. Otherwise, more possible modes need to be tested. The adaptive threshold aims to balance the compression performance and the computational load. Simulation results with various video sequences show that the fast mode decision algorithm proposed in this paper can accelerate the encoding speed significantly with only negligible PSNR loss or bit rate increment.
This work is supported in part by China National Natural Science Foundation (CNSF) under Project No. 60572045, the Ministry of Education of China Ph.D. Program Foundation under Project No. 20050698033, and by a Cooperation Project (2005.7–2007.7) with Microsoft Research Asia.
18.
19.
Ashraf A Kassim Pingkun Yan Wei Siong Lee Kuntal Sengupta 《IEEE transactions on information technology in biomedicine》2005,9(1):132-138
This paper proposes a method for progressive lossy-to-lossless compression of four-dimensional (4-D) medical images (sequences of volumetric images over time) by using a combination of three-dimensional (3-D) integer wavelet transform (IWT) and 3-D motion compensation. A 3-D extension of the set-partitioning in hierarchical trees (SPIHT) algorithm is employed for coding the wavelet coefficients. To effectively exploit the redundancy between consecutive 3-D images, the concepts of key and residual frames from video coding are used. A fast 3-D cube matching algorithm is employed to do motion estimation. The key and the residual volumes are then coded using 3-D IWT and the modified 3-D SPIHT. The experimental results presented in this paper show that our proposed compression scheme achieves better lossy and lossless compression performance on 4-D medical images when compared with JPEG-2000 and volumetric compression based on 3-D SPIHT.
20.
The authors present an efficient algorithm for the computation of the 4×4 discrete cosine transform (DCT). The algorithm is based on the decomposition of the 4×4 DCT into four 4-point 1-D DCTs. Thus, only 1-D transformations and some additions are required. It is shown that the proposed algorithm requires only 16 multiplications, which is half the number needed for the conventional row-column method. Since the 2^m × 2^m DCT can be computed using the 4×4 DCT recursively for any m, the proposed algorithm leads to a fast algorithm for the computation of the 2-D DCT.