1.

A Multi-core Architecture Based Parallel Framework for H.264/AVC Deblocking Filters

Sung-Wen Wang Shu-Sian Yang Hong-Ming Chen Chia-Lin Yang Ja-Ling Wu 《Journal of Signal Processing Systems》2009,57(2):195-211

The deblocking filter is one of the most time-consuming modules in the H.264/AVC decoder, as many studies indicate, so accelerating it is critical for improving overall decoding performance. This paper proposes a novel parallel algorithm for the H.264/AVC deblocking filter to speed up the decoder. We exploit pixel-level data parallelism among filtering steps and observe that the results of each filtering step affect only a limited region of pixels. We call this “the limited propagation effect”. Based on this observation, the proposed algorithm can partition a frame into multiple independent rectangles with arbitrary granularity. The proposed parallel deblocking filter algorithm requires very little synchronization overhead and provides good scalability. Experimental results show that applying the proposed parallelization method to a SIMD-optimized sequential deblocking filter achieves up to 95.31% and 224.07% speedup on a two-core and a four-core processor, respectively. We have also observed a significant speedup for H.264/AVC decoding as a whole: 21% and 34% on a two-core and a four-core processor, respectively.
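The reported speedups can be turned into a rough parallel-efficiency figure. The sketch below is illustrative only and rests on an assumption not stated in the abstract: that "95.31% speedup" means the parallel filter runs 1 + 0.9531 = 1.9531 times as fast as the sequential one. The function name `parallel_efficiency` is hypothetical.

```python
def parallel_efficiency(speedup_percent: float, cores: int) -> float:
    """Convert a percentage speedup into parallel efficiency.

    Assumption (not confirmed by the abstract): an X% speedup means the
    parallel version is (1 + X/100) times as fast as the sequential one.
    """
    speedup_factor = 1.0 + speedup_percent / 100.0
    return speedup_factor / cores


# Figures quoted in the abstract for the deblocking filter alone.
for cores, percent in [(2, 95.31), (4, 224.07)]:
    efficiency = parallel_efficiency(percent, cores)
    print(f"{cores} cores: {1 + percent / 100:.4f}x, efficiency {efficiency:.1%}")
```

Under that reading, the two-core result corresponds to roughly 98% efficiency and the four-core result to roughly 81%, which is consistent with the abstract's claim of low synchronization overhead.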

Sung-Wen Wang received his Ph.D. degree in computer science from National Taiwan University, Taipei, Taiwan, in 2008. His general research interests are in the field of digital video coding, codec-processor architecture co-design and multimedia systems optimization, especially in video coding technology optimization.

Shu-Sian Yang received the B.S. and M.S. degrees in computer science and information engineering from National Taiwan University, Taiwan, in 2005 and 2007, respectively. His current research interests include video compression, image processing, and multimedia applications. He is currently working at PixArt Imaging Inc., HsinChu, Taiwan, as a senior engineer.

Hong-Ming Chen received the B.S. degree in computer science and information engineering from National Taiwan University, Taiwan, in 2007. He is currently pursuing the M.S. degree in the same department at National Taiwan University. His current research interests include video compression, image processing, digital content analysis, and multimedia applications.

Chia-Lin Yang received the B.S. degree from the National Taiwan Normal University, Taiwan, R.O.C., in 1989, the M.S. degree from the University of Texas at Austin in 1992, and the Ph.D. degree from the Department of Computer Science, Duke University, Durham, NC, in 2001. In 1993, she joined VLSI Technology Inc. (now Philips Semiconductors) as a Software Engineer. She is currently an Associate Professor in the Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan, R.O.C. Her research interests include energy-efficient microarchitectures, memory hierarchy design, and multimedia workload characterization. Dr. Yang is the recipient of a 2000-2001 Intel Foundation Graduate Fellowship Award and a 2005 IBM Faculty Award.

Ja-Ling Wu (SM ’98, Fellow ’08) received his Ph.D. degree in electrical engineering from Tatung Institute of Technology, Taipei, Taiwan, in 1986.
From 1986 to 1987, he was an Associate Professor in the Electrical Engineering Department, Tatung Institute of Technology. In 1987, he transferred to the Department of Computer Science and Information Engineering (CSIE), National Taiwan University (NTU), Taipei, where he is presently a Professor. From 1996 to 1998, he was assigned to be the first Head of the CSIE Department, National Chi Nan University, Puli, Taiwan. During his sabbatical leave (from 1998 to 1999), Prof. Wu was invited to be the Chief Technology Officer of Cyberlink Corp. In this one-year term, he was involved in the development of some well-known audio-video software, such as PowerDVD. Since Aug. 2004, Prof. Wu has headed the Graduate Institute of Networking and Multimedia, NTU. Prof. Wu has published more than 200 technical and conference papers. His research interests include digital signal processing, image and video compression, digital content analysis, multimedia systems, digital watermarking, and digital rights management systems. Prof. Wu was the recipient of the Outstanding Young Medal of the Republic of China in 1987 and received the Outstanding Research Award of the National Science Council, Republic of China, three times, in 1998, 2000 and 2004. In 2001, his paper “Hidden Digital Watermark in Images” (co-authored with Prof. Chiou-Ting Hsu), published in IEEE Transactions on Image Processing, was selected as one of the winners of the “Honoring Excellence in Taiwanese Research Award”, offered by ISI Thomson Scientific. Moreover, his paper “Tiling Slideshow” (co-authored with his students) won the Best Full Technical Paper Award at ACM Multimedia 2006. Professor Wu was selected as one of the lifetime Distinguished Professors of NTU in November 2006. Prof. Wu has been an IEEE Fellow since 1 January 2008, elected for his contributions to image and video analysis, coding, digital watermarking, and rights management.

2.

Fast motion estimation for H.264

Canhui Cai Huanqiang Zeng Sanjit K. Mitra 《Signal Processing: Image Communication》2009,24(8):630-636

Several specific features have been incorporated into motion estimation (ME) in the H.264 coding standard to improve its coding efficiency. However, they result in a very high computational load. In this paper, a fast ME algorithm is proposed to reduce the computational complexity. First, a mode discriminant method is used to free the encoder from checking the small block size modes in homogeneous regions. Second, a condensed hierarchical block matching method and a spatial neighbor searching scheme are employed to find the best full-pixel motion vector. Finally, a direction-based selection rule is utilized to reduce the search range in the sub-pixel ME process. Experimental results on commonly used QCIF and CIF format test sequences show that the proposed algorithm reduces ME process time by 88% on average, while incurring only a 0.033 dB loss in PSNR and a 0.50% increase in total bit rate compared with the exhaustive ME process, which is the default approach adopted in the JM reference software.

3.

VLSI Architecture Design of Fractional Motion Estimation for H.264/AVC

Yi-Hau Chen Tung-Chien Chen Shao-Yi Chien Yu-Wen Huang Liang-Gee Chen 《Journal of Signal Processing Systems》2008,53(3):335-347

The H.264/AVC Fractional Motion Estimation (FME) with rate-distortion constrained mode decision can improve the rate-distortion efficiency by 2–6 dB in peak signal-to-noise ratio. However, it comes with considerable computational complexity, so acceleration by dedicated hardware is a must for real-time applications. The main difficulty for FME hardware implementation is parallel processing under the constraint of the sequential flow and data dependency. We analyze seven inter-correlative loops extracted from the FME procedure and provide decomposing methodologies to obtain an efficient projection in hardware implementation. Two techniques, 4×4 block decomposition and efficient vertical scheduling, are proposed to reuse data among the variable block sizes and to improve the hardware utilization. In addition, advanced architectures are designed to efficiently integrate the 6-tap 2D finite impulse response filter, residue generation, and 4×4 Hadamard transform into a fully pipelined architecture. This design is finally implemented and integrated into an H.264/AVC single-chip encoder that supports real-time encoding of 720×480 30 fps video with four reference frames at an 81 MHz operating frequency with 405 K logic gates (41.9% of the encoder area).
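The residue generation and 4×4 Hadamard transform stages mentioned in the abstract can be illustrated in software. The following is a minimal sketch, not the paper's hardware design: it computes an unnormalized sum of absolute transformed differences (SATD) for one 4×4 block. The function names `matmul4` and `satd_4x4` are hypothetical, and any scaling factor that a real encoder applies to the sum is deliberately left out.

```python
# 4x4 Hadamard matrix (entries are +1 / -1).
H4 = [
    [1, 1, 1, 1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
    [1, -1, 1, -1],
]


def matmul4(a, b):
    """Multiply two 4x4 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def satd_4x4(original, predicted):
    """Unnormalized SATD of a 4x4 block: sum of |H * residue * H^T|."""
    # Residue generation: pixel-wise difference between the two blocks.
    residue = [[original[r][c] - predicted[r][c] for c in range(4)]
               for r in range(4)]
    h_transposed = [list(col) for col in zip(*H4)]
    transformed = matmul4(matmul4(H4, residue), h_transposed)
    return sum(abs(value) for row in transformed for value in row)


flat = [[10] * 4 for _ in range(4)]
brighter = [[11] * 4 for _ in range(4)]
print(satd_4x4(flat, flat), satd_4x4(brighter, flat))
```

A uniform residue of 1 concentrates all its energy in the DC coefficient, so the second call returns 16 while identical blocks return 0.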

4.

A Novel Cost-Effective and Programmable VLSI Architecture of CAVLC Decoder for H.264/AVC

Yanmei Qu Yun He Shunliang Mei 《Journal of Signal Processing Systems》2008,50(1):41-51

This paper proposes a novel cost-effective and programmable architecture of a CAVLC decoder for H.264/AVC, including decoders for Coeff_token, T1_sign, Level, Total_zeros and Run_before. To simplify the hardware architecture and provide programmability, we propose four new techniques: a new group-based VLD with efficient memory (NG–VLDEM) for the Coeff_token decoder, a novel combined architecture (NCA) for the Level decoder, a new group-based VLD with memory access once (GMAO) for the Total_zeros decoder, and a new VLD architecture based on multiplexers instead of searching memory (MISM) for the Run_before decoder. With these four techniques, the proposed CAVLC decoder can decode every syntax element within one clock cycle. Synthesis results show that the hardware cost is 3,310 gates in 0.18 μm CMOS technology under a clock constraint of 125 MHz. The proposed design is therefore suitable for real-time applications such as H.264/AVC HD1080i video decoding.

5.

Design of High‐Speed CAVLC Decoder Architecture for H.264/AVC

Myungseok Oh Wonjae Lee Yunho Jung Jaeseok Kim 《ETRI Journal》2008,30(1):167-169

In this paper, we propose a hardware architecture for a high‐speed context‐adaptive variable length coding (CAVLC) decoder in H.264. In the CAVLC decoder, the codeword length of the current decoding block is used to determine the next input bitstream (valid bits). Since the computation of valid bits increases the total processing time of CAVLC, we propose two techniques to reduce processing time: one reduces the number of decoding steps by introducing a lookup table, and the other reduces the cycles needed to calculate the valid bits. The proposed CAVLC decoder can decode 1920×1088 30 fps video in real time at a 30.8 MHz clock.
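The real-time claim can be put in perspective with a short calculation of how many clock cycles are available per macroblock. This is an illustrative sketch only; it assumes 16×16 macroblocks (as in H.264) and a hypothetical helper name `cycles_per_macroblock`, and it says nothing about how the decoder actually spends those cycles.

```python
def cycles_per_macroblock(width, height, fps, clock_hz, mb_size=16):
    """Average clock cycles available per macroblock at a given clock rate.

    Assumes the frame dimensions are multiples of mb_size, which holds
    for 1920x1088.
    """
    mbs_per_frame = (width // mb_size) * (height // mb_size)
    mbs_per_second = mbs_per_frame * fps
    return clock_hz / mbs_per_second


# Figures quoted in the abstract: 1920x1088, 30 fps, 30.8 MHz clock.
budget = cycles_per_macroblock(1920, 1088, 30, 30.8e6)
print(f"{budget:.1f} cycles per macroblock")
```

At 244,800 macroblocks per second, the quoted clock leaves an average of roughly 126 cycles per macroblock.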

6.

A Hardware Architecture of CABAC Encoding and Decoding with Dynamic Pipeline for H.264/AVC

Lingfeng Li Yang Song Shen Li Takeshi Ikenaga Satoshi Goto 《Journal of Signal Processing Systems》2008,50(1):81-95

This paper presents a compact hardware architecture of a Context-Based Adaptive Binary Arithmetic Coding (CABAC) codec for H.264/AVC. The similarities between the encoding and decoding algorithms are explored to achieve remarkable hardware reuse. System-level hardware/software partitioning is conducted to improve overall performance. Meanwhile, the characteristics of the CABAC algorithm are utilized to implement a dynamic pipeline scheme, which increases the processing throughput with very small hardware overhead. The proposed architecture is implemented in 0.18 μm technology. Results show that the core area of the proposed design is 0.496 mm² at a maximum clock frequency of 230 MHz. It is estimated that the proposed architecture can support CABAC encoding or decoding for HD1080i resolution at 30 frames/s.

7.

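Implementation and Optimization of an H.264/AVC Video Encoder on the DM642 Platform (total citations: 3, self-citations: 0, citations by others: 3)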

张彤宇 苏睿 刘宝兰 《微电子学与计算机》2005,22(12):165-168

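This paper introduces the principles of the H.264 video compression standard and the architecture of the DM642 digital signal processor, and implements an H.264 video encoder on that platform. Several major modules of the H.264 standard are analyzed theoretically, and the program is optimized to suit the characteristics of the digital signal processor, which effectively reduces the running time of the whole encoder. Experimental results show that the implemented video encoder achieves good results in both performance and efficiency.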

8.

An Efficient Pipelined Architecture for the H.264/AVC Intra 4x4 Prediction Algorithm

任怀鲁《电视技术》2012,36(7)

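Intra 4x4 prediction in an H.264 encoder has severe data dependencies, which make its hardware design difficult to pipeline. The result is a long critical path and low hardware utilization, making it a bottleneck in H.264 encoder design. To address this problem, this paper proposes a new architecture that neither reduces the number of prediction modes nor increases system resources. By using original pixels for mode decision and reconstructed pixels for intra prediction, it allows intra prediction and the reconstruction loop to be fully pipelined, achieving essentially 100% hardware utilization with no noticeable PSNR loss. The proposed hardware architecture completes the intra 4x4 prediction of one macroblock within 215 clock cycles. Synthesis with the SMIC 0.13 μm process library shows that the architecture can run at up to 250 MHz with an area of about 116K gates, and can support real-time encoding of 4096x2160@30fps video sequences.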
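The figures in this abstract can be cross-checked with simple arithmetic. The sketch below is an illustration, not part of the paper: it assumes 16×16 macroblocks and that every macroblock costs the full 215 cycles, and the helper name `required_clock_hz` is hypothetical.

```python
def required_clock_hz(width, height, fps, cycles_per_mb, mb_size=16):
    """Clock rate needed if every macroblock takes cycles_per_mb cycles."""
    # Ceiling division so partially covered macroblocks are counted too.
    mbs_wide = -(-width // mb_size)
    mbs_high = -(-height // mb_size)
    return mbs_wide * mbs_high * fps * cycles_per_mb


# Figures quoted in the abstract: 4096x2160 at 30 fps, 215 cycles per macroblock.
needed = required_clock_hz(4096, 2160, 30, 215)
print(f"required: {needed / 1e6:.1f} MHz, available: 250 MHz")
```

The required rate comes to about 222.9 MHz, which fits within the 250 MHz maximum frequency reported for the synthesized design.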

9.

Design of a Quarter-Pixel-Accuracy Motion Estimator Based on H.264/AVC

王学渊 龙惠民 《电视技术》2006,(3):17-21

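This paper presents a design method for an H.264/AVC variable-block-size motion estimator with quarter-pixel accuracy. The design consists of two main parts: full-pixel motion estimation and quarter-pixel motion estimation. The former is a two-dimensional systolic array of 256 processing elements that reuses previously computed results. It uses full search and can compare one candidate block per clock period, computing the sum of absolute differences and the motion vector and passing them to the quarter-pixel motion estimator. Experimental results show that the motion estimator can process 720×480, 30 fps video.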
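The full-search step described above, which evaluates the sum of absolute differences (SAD) for every candidate block and keeps the best motion vector, can be sketched in software as follows. This is a generic, sequential illustration rather than the paper's systolic-array design; the function names `sad` and `full_search` and the tiny test frames are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy)."""
    return sum(abs(cur[by + r][bx + c] - ref[by + dy + r][bx + dx + c])
               for r in range(n) for c in range(n))


def full_search(cur, ref, bx, by, n, search_range):
    """Test every integer displacement within +/- search_range and return
    (best_sad, (dx, dy)). Candidates outside the reference frame are skipped."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or x + n > width or y + n > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best


# Tiny 8x8 frames: one bright pixel sits 2 pixels to the right and 1 pixel
# lower in the reference frame than in the current frame.
cur = [[0] * 8 for _ in range(8)]
ref = [[0] * 8 for _ in range(8)]
cur[4][4] = 100
ref[5][6] = 100
print(full_search(cur, ref, bx=4, by=4, n=2, search_range=2))
```

In hardware, the inner SAD accumulation is what the array of processing elements parallelizes; the software loop here simply visits the same candidates one at a time.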

10.

An improved reversible data hiding-based approach for intra-frame error concealment in H.264/AVC

《Journal of Visual Communication and Image Representation》2014,25(2):410-422

As H.264/AVC video streams are highly compressed, they become sensitive to errors caused by unreliable transmission channels. To address this issue, an improved version of Chung et al.’s reversible data hiding-based approach for intra-frame error concealment is proposed for the H.264/AVC codec. By using the histogram shifting technique, the original work reversibly embeds the motion vector (MV) of a macroblock (MB) into another MB within the same intra-frame. If an MB is corrupted at the decoder side, the embedded MV can be extracted from the corresponding MB to recover the corrupted MB. However, Chung et al.’s work did not fully exploit the number of coefficients that need to be modified in order to reversibly hide data, and did not consider the many extra nonzero residual blocks produced by data hiding. These two issues could reduce the visual quality of the stego-video. This paper adopts MV data pre-processing, selection of the most suitable embedding region, and the minimum possible amount of histogram modification, which lead to a higher PSNR of the stego-video for a given payload. Experimental results further reveal that the proposed method offers stego-video with better visual quality than Chung et al.’s work.
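The histogram shifting idea that the abstract builds on can be shown with a toy example. The sketch below is a generic textbook-style variant and is not the scheme of Chung et al. or of this paper: it assumes integer coefficients, embeds one bit in each zero-valued coefficient, and shifts positive coefficients up by one to free the value 1. The function names `embed` and `extract` are hypothetical.

```python
def embed(coeffs, bits):
    """Reversibly hide `bits` (0/1 values) in a list of integer coefficients."""
    pending = list(bits)
    marked = []
    for c in coeffs:
        if c > 0:
            marked.append(c + 1)           # shift to empty the bin at value 1
        elif c == 0 and pending:
            marked.append(pending.pop(0))  # 0 stays 0 for bit 0, becomes 1 for bit 1
        else:
            marked.append(c)               # negatives and unused zeros are untouched
    if pending:
        raise ValueError("not enough zero coefficients for the payload")
    return marked


def extract(marked, n_bits):
    """Recover the hidden bits and the original coefficients."""
    bits, restored = [], []
    for c in marked:
        if c >= 2:
            restored.append(c - 1)         # undo the shift
        elif c in (0, 1) and len(bits) < n_bits:
            bits.append(c)
            restored.append(0)             # carrier coefficients were originally 0
        else:
            restored.append(c)
    return bits, restored


original = [0, 3, -2, 0, 1, 0, 0]
payload = [1, 0, 1]
stego = embed(original, payload)
print(stego, extract(stego, len(payload)))
```

Every shifted coefficient adds distortion without carrying payload, which is why reducing the amount of histogram modification, as the abstract describes, tends to raise the PSNR of the marked video.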

11.

On Hardware Implementations Of DCT and Quantization Blocks for H.264/AVC

Roman Kordasiewicz Shahram Shirani 《The Journal of VLSI Signal Processing》2007,47(3):189-199

H.264/AVC, also known as MPEG-4 Part 10 or JVT, is a video coding standard recently established by the Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG. The main goal of the paper is to give a broader understanding of the design considerations for the transform and quantization blocks of H.264/AVC by presenting area-optimized and speed-optimized implementations of these blocks. The area-optimized design can be used in low-performance applications such as mobile devices, while the speed-optimized designs can be used in high-definition encoders. Various designs with these blocks were synthesized with 0.18 μm TSMC technology and were also implemented on a Xilinx FPGA. The resulting gate counts ranged from 294 to 47,762 gates and the throughput ranged from 6 to 2,552 M pixels/s, depending on the block and the optimization. In addition, a system-on-a-programmable-chip implementation of the DCT and quantization blocks is presented, which uses the Xilinx Virtex-II Pro FPGA and its PowerPC. Using this system it is possible to process 0.8 M pixels/s.
