1.
Bart Pieters Charles-Frederik Hollemeersch Jan De Cock Peter Lambert Rik Van de Walle 《Signal Processing: Image Communication》2012,27(3):220-237
With the increasing number of processor cores available in modern computing architectures, task or data parallelism is required to maximally exploit the available hardware and achieve optimal processing speed. Current state-of-the-art data-parallel processing methods for decoding image and video bitstreams are limited in parallelism by dependencies introduced by the coding tools and the number of synchronization points introduced by these dependencies, only allowing task or coarse-grain data parallelism. In particular, entropy decoding and data prediction are bottleneck coding tools for parallel image and video decoding. We propose a new data-parallel processing scheme for block-based intra sample and coefficient prediction that allows fine-grain parallelism and is suitable for integration in current and future state-of-the-art image and video codecs. Our prediction scheme enables maximum concurrency, independent of slice or tile configuration, while minimizing synchronization points. This paper describes our data-parallel processing scheme for one- and two-dimensional prediction and investigates its application to block-based image and video codecs using JPEG XR and H.264/AVC Intra as a starting point. We show how our scheme enables faster decoding than the state-of-the-art wavefront method with speedup factors of up to 21.5 and 7.9 for JPEG XR and H.264/AVC Intra coding tools respectively. Using the H.264/AVC Intra coding tool, we discuss the requirements of the algorithm and the impact on decoded image quality when these requirements are not met. Finally, we discuss the impact on coding rate in order to allow for optimal parallel intra decoding.
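The abstract above uses the wavefront method as its baseline but does not spell out either scheme. As a rough illustration of the baseline only (not the authors' proposed scheme), the sketch below groups blocks into waves under the assumption that each block depends on its left, top-left, top, and top-right neighbours, as in H.264-style intra prediction. The function name and the wave index `x + 2 * y` are illustrative choices, not taken from the paper.

```python
from collections import defaultdict


def wavefront_schedule(blocks_wide, blocks_high):
    """Group block coordinates into waves that can be processed in order.

    Assumption: block (x, y) depends on (x-1, y), (x-1, y-1), (x, y-1)
    and (x+1, y-1). With wave index x + 2*y, every one of those
    neighbours falls into a strictly earlier wave, so all blocks inside
    one wave are mutually independent and can run concurrently.
    """
    waves = defaultdict(list)
    for y in range(blocks_high):
        for x in range(blocks_wide):
            waves[x + 2 * y].append((x, y))
    return [waves[w] for w in sorted(waves)]


if __name__ == "__main__":
    for index, wave in enumerate(wavefront_schedule(4, 3)):
        print(index, wave)
```

Under this assumption a picture of W by H blocks needs W + 2H - 2 sequential waves, and each wave boundary is a synchronization point, which is the kind of limit the abstract says its scheme is designed to reduce.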
2.
《Solid-State Circuits, IEEE Journal of》2009,44(11):2943-2956
3.
Tung-Chien Chen Hung-Chi Fang Chung-Jr Lian Chen-Han Tsai Yu-Wen Huang To-Wei Chen Ching-Yen Chen Yu-Han Chen Chuan-Yung Tsai Liang-Gee Chen 《Circuits and Devices Magazine, IEEE》2006,22(3):22-31
In this article, we suggest some techniques to design the H.264/AVC video coding system for HDTV applications. The design exploration is made according to software profiling. The design considerations of system scheduling and pipelining are discussed followed by the architecture optimization of the significant modules. The efficient H.264/AVC video coding system is achieved by combining these techniques.
4.
The microprocessor industry's trend towards many-core architectures has made it necessary to devise appropriately scalable applications. When implementing software-based video decoding, the main challenges are the optimized partitioning of decoder operations, efficient tracking of dependencies, and resource synchronization for multiple parallel units. The same applies to hardware implementations of video decoders, where monolithic approaches hinder scalability of the design and reusability of already implemented core components. In this paper, we propose an intermediate data stream format (Meta Format Stream) which is suited for architectural decomposition of video decoding by replacing the conventional monolithic decoder architecture design with a pipelined structure. The Meta Format is forward-oriented, self-contained, and multistandard capable, so that processing of Meta Streams is independent of the originating bit stream. Our approach does not require special coding settings and is applicable to accelerated decoding of any standards-compliant bit stream. An H.264/AVC multiprocessing proposal is presented as a case study for the potential of our concept. The case study combines coarse-grained frame-level parallel decoding of the bit stream with fine-grained macroblock-level parallelism in the image processing stage. The proposed H.264 decoder achieved speedup factors of up to 7.6 on an 8-core machine with 2-way SMT. We report actual decoding speeds of up to 150 frames per second at 2160p resolution.
5.
An application-specific processor for an H.264 decoder with a configurable embedded processor is designed in this research. The motion compensation, inverse integer transform, inverse quantization, and entropy decoding algorithms of the H.264 decoder software are optimized. We improved the performance of the processor with instruction-level hardware optimization, which is tailored to the configurable embedded processor architecture. The optimized instructions for video processing can be used in other video compression standards such as MPEG 1, 2, and 4. A significant performance improvement is achieved with high flexibility. Experimental results show that we could achieve 300% performance for the H.264 baseline profile level 2 decoder.
6.
An architecture of entropy decoder, inverse quantiser and predictor for multi-standard video decoding
Leibo Liu Yingjie Chen Shouyi Yin Hao Lei Guanghui He Shaojun Wei 《International Journal of Electronics》2013,100(7):877-893
A VLSI architecture for an entropy decoder, inverse quantiser and predictor is proposed in this article. This architecture is used for decoding video streams of three standards on a single chip, i.e. H.264/AVC, AVS (China National Audio Video coding Standard) and MPEG2. The proposed scheme is called MPMP (Macro-block-Parallel based Multilevel Pipeline), which is intended to improve the decoding performance to satisfy the real-time requirements while maintaining a reasonable area and power consumption. Several techniques, such as slice-level pipelining, MB (Macro-Block) level pipelining, and MB-level parallelism, are adopted. Input and output buffers for the inverse quantiser and predictor are shared by the decoding engines for H.264, AVS and MPEG2, thereby effectively reducing the implementation overhead. Simulation shows that the decoding process consumes 512, 435 and 438 clock cycles per MB in H.264, AVS and MPEG2, respectively. Owing to the proposed techniques, the video decoder can support H.264 HP (High Profile) 1920 × 1088@30fps (frames per second) streams, AVS JP (Jizhun Profile) 1920 × 1088@41fps streams and MPEG2 MP (Main Profile) 1920 × 1088@39fps streams when running at a 200 MHz working frequency.
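The cycle counts in this abstract can be turned into a back-of-the-envelope frame-rate ceiling. The sketch below assumes 16 × 16 macroblocks and assumes the quoted cycles per MB are the only per-frame cost; both are simplifications, and the function name is illustrative.

```python
def max_fps(clock_hz, cycles_per_mb, width, height, mb_size=16):
    """Upper bound on frames per second if every macroblock costs
    `cycles_per_mb` cycles and nothing else consumes time (an assumption)."""
    mbs_per_frame = (width // mb_size) * (height // mb_size)
    return clock_hz / (cycles_per_mb * mbs_per_frame)


if __name__ == "__main__":
    for name, cycles in [("H.264", 512), ("AVS", 435), ("MPEG2", 438)]:
        print(name, round(max_fps(200e6, cycles, 1920, 1088), 1))
```

A 1920 × 1088 frame holds 8160 macroblocks, so this ceiling comes out above the 30, 41 and 39 fps the abstract reports; the abstract does not say what accounts for the difference.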
7.
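H.264/AVC, the latest video coding standard jointly developed by ITU-T and ISO, offers high compression efficiency and strong network adaptability, and it has considerable engineering significance and market value for building efficient media communication platforms. This paper introduces the technical features, transport structure, and codec architecture of H.264, and presents a real-time H.264 software decoding system implemented on the Windows CE.NET operating system.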
8.
The scalable extension of the H.264 Advanced Video Coding (AVC) standard called Scalable Video Coding (SVC), or H.264/SVC, provides scalable video streams which are composed of a base layer and one or more enhancement layers. Enhancement layers may improve the temporal, the spatial or the signal-to-noise ratio resolutions of the content represented by the lower layers. One of the applications of this video coding standard is related to point-to-multipoint video distributions in both wired and wireless communication systems, where packet losses contribute to the degradation of the user's Quality of Experience. Designed for the transmission of data over Binary Erasure Channels (BEC), Raptor codes are a Forward Error Correction (FEC) mechanism that is gaining popularity for Internet Protocol Television (IPTV) applications due to their small decoding complexity and reduced overhead. This paper evaluates the quality enhancements introduced by the integration of several H.264/SVC layers with a Raptor coding protection scheme. Our goal is to improve the distribution of video over loss-prone networks in terms of rate-distortion performance by assessing several alternative packetization options and protection schemes.
9.
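By analyzing the structure and complexity of an H.264 software decoder, this paper identifies the key points and difficulties in optimizing the decoder and, taking the performance characteristics of the TMS320DM642 DSP into account, discusses in detail the optimization methods adopted for an H.264 decoder on the TMS320DM642 DSP platform. These methods mainly involve increasing the parallelism of the program code and improving memory-access efficiency, with a focus on optimizing key modules such as motion compensation and the IDCT. Experimental results show that the decoder can decode CIF-format video streams in real time.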
10.
《Signal Processing: Image Communication》2009,24(4):324-332
Context-based Binary Arithmetic Coding (CBAC) is a normative part of the newest X Profile of the Advanced Audio Video coding Standard (AVS). This paper presents an efficient VLSI architecture for CBAC decoding in AVS. Compared with CABAC in H.264/AVC, AVS adopts simpler binarization methods and context selection schemes. To avoid slow multiplications, the traditional arithmetic calculation is transformed to the logarithm domain. Although these features strike a better balance between compression gain and implementation cost, high-throughput implementation remains a major challenge. Because decoding the current bin depends on the previous bin, latency is long and overall system performance is limited. In this paper, we present a software-hardware co-design that exploits the bin distribution feature. A novel pipeline-based architecture is proposed where the arithmetic decoding engine works in parallel with the context maintainer. A finite state machine (FSM) is used to control the decoding procedure flexibly, and the context scheduling is organized carefully to minimize accesses to the context RAMs. In addition, the critical path is optimized for timing. The proposed implementation can work at 150 MHz and achieve real-time AVS CBAC decoding for 1080i HDTV video.
11.
《Journal of Visual Communication and Image Representation》2008,19(8):558-572
In order to achieve high computational performance and low power consumption, many modern microprocessors are equipped with special multimedia instructions and multi-core processing capabilities. The number of cores on a single chip doubles every three years. Therefore, besides complexity reduction by smart algorithms such as fast macroblock mode selection, an effective algorithm for parallelizing H.264/AVC is also crucial in implementing a real-time encoder on a multi-core system. This algorithm serves to uniformly distribute workloads for H.264/AVC encoding over several slower and simpler processor cores on a single chip. In this paper, we propose a new adaptive slice-size selection technique for efficient slice-level parallelism of H.264/AVC encoding on a multi-core processor using fast macroblock mode selection as a pre-processing step. For this we propose an estimation method for the computational complexity of each macroblock using pre macroblock mode selection. Simulation results, with a number of test video sequences, show that, without any noticeable degradation, the proposed fast macroblock mode selection reduces the total encoding time by about 57.30%. The proposed adaptive slice-level parallelism has good parallel performance compared to conventional fixed slice-size parallelism. The proposed method can be applied to many multi-core systems for real-time H.264 video encoding.
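The abstract does not give the slice-size selection rule itself. As a minimal sketch of the general idea only, the code below splits raster-ordered macroblocks into contiguous slices of roughly equal estimated cost using a simple greedy pass. The function name, the greedy rule, and the example costs are all hypothetical and should not be read as the authors' method.

```python
def balanced_slices(mb_costs, num_slices):
    """Split macroblocks (in raster order) into contiguous slices whose
    summed cost estimates are roughly equal.

    Greedy rule (an assumption for illustration): close the k-th slice as
    soon as the running cost reaches k/num_slices of the total. Returns a
    list of (start, end) index pairs with `end` exclusive; it may return
    fewer slices than requested when a few macroblocks dominate the cost.
    """
    total = sum(mb_costs)
    boundaries, running, k = [], 0.0, 1
    for i, cost in enumerate(mb_costs):
        running += cost
        if k < num_slices and running >= k * total / num_slices:
            boundaries.append(i + 1)
            k += 1
    edges = [0] + boundaries
    if edges[-1] != len(mb_costs):
        edges.append(len(mb_costs))
    return list(zip(edges, edges[1:]))


if __name__ == "__main__":
    costs = [3, 3, 1, 1, 1, 1, 1, 1]  # made-up per-macroblock estimates
    print(balanced_slices(costs, 2))
```

With these made-up costs, a fixed split into two slices of four macroblocks each would carry costs of 8 and 4, whereas the cost-aware split carries 6 and 6, so neither core waits on the other.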
12.
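Thanks to its high compression ratio and good image quality, the H.264 video compression standard has been widely accepted as a new standard. Because H.264 decoding is highly complex, software implementations struggle to meet real-time requirements, so hardware decoding is needed. This paper proposes a hardware architecture for decoding the variable-length Exp-Golomb codes of the H.264 video coding standard, presents a hardware design that consumes little decoding time and few system resources, and finally reports the simulation and back-end design results.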
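For readers unfamiliar with the code that this hardware design targets, the following is a minimal software sketch of Exp-Golomb decoding, not the paper's hardware architecture. It assumes the bitstream is given as a Python string of '0' and '1' characters, which is purely for readability; the function names are illustrative.

```python
def decode_ue(bits, pos=0):
    """Decode one unsigned Exp-Golomb codeword starting at `pos`.

    Codeword layout: n leading zeros, a '1', then n suffix bits.
    Value = 2**n - 1 + suffix. Returns (value, next_position).
    """
    leading_zeros = 0
    while bits[pos + leading_zeros] == "0":
        leading_zeros += 1
    start = pos + leading_zeros + 1  # skip the terminating '1'
    suffix = bits[start:start + leading_zeros]
    value = (1 << leading_zeros) - 1 + (int(suffix, 2) if suffix else 0)
    return value, start + leading_zeros


def decode_se(bits, pos=0):
    """Decode one signed Exp-Golomb codeword: odd code numbers map to
    positive values, even code numbers to zero or negative values."""
    k, pos = decode_ue(bits, pos)
    value = (k + 1) // 2 if k % 2 else -(k // 2)
    return value, pos


if __name__ == "__main__":
    stream = "0100111"  # two codewords back to back: '010' and '011'
    first, pos = decode_ue(stream)
    second, pos = decode_ue(stream, pos)
    print(first, second)
```

The length of each codeword is only known after the leading zeros have been counted, and that serial dependency is the part that dedicated hardware typically accelerates.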
13.
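H.264 is a new-generation video coding standard with excellent compression performance. The price of this performance is a large increase in computational complexity, which makes practical application difficult. Using dedicated hardware is one way to address this problem. The integer transform in the H.264 standard is well suited to hardware implementation. This paper first introduces the integer transform in the H.264 standard and then proposes a fast parallel algorithm for it based on matrix decomposition. An analysis of the algorithm's structure shows that it is a fast algorithm conforming to the H.264 standard. The hardware implementation of the transform is also analyzed, showing that this hardware structure is suitable for real-time encoding and decoding.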
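The abstract does not describe its matrix decomposition. As an illustration of why the transform suits hardware, the sketch below uses the widely known butterfly factorization of the H.264 4 × 4 forward core transform, which needs only additions, subtractions and doublings, and checks it against a direct matrix product. Scaling and quantization are omitted, and the function names are illustrative.

```python
# Forward core transform matrix of the H.264 4x4 integer transform.
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]


def butterfly_1d(x0, x1, x2, x3):
    """Compute CF applied to one 4-sample vector with adds and doublings."""
    s03, d03 = x0 + x3, x0 - x3
    s12, d12 = x1 + x2, x1 - x2
    return [s03 + s12, 2 * d03 + d12, s03 - s12, d03 - 2 * d12]


def transform_4x4(block):
    """Compute CF * X * CF^T by running the butterfly on rows, then columns."""
    rows = [butterfly_1d(*row) for row in block]
    cols = [butterfly_1d(*col) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def matmul(a, b):
    """Plain 4x4 integer matrix product, used only as a reference."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transform_4x4_reference(block):
    """Direct evaluation of CF * X * CF^T for comparison."""
    cf_t = [list(r) for r in zip(*CF)]
    return matmul(matmul(CF, block), cf_t)


if __name__ == "__main__":
    x = [[5, 11, 8, 10], [9, 8, 4, 12], [1, 10, 11, 4], [19, 6, 15, 7]]
    print(transform_4x4(x) == transform_4x4_reference(x))
```

The four row butterflies are independent of one another, as are the four column butterflies, which is what makes a parallel hardware datapath natural here.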
14.
Szu-Wei Lee C.-C. Jay Kuo 《Journal of Visual Communication and Image Representation》2011,22(6):557-562
In this work, we propose a novel entropy coding mode decision algorithm to balance the tradeoff between the rate-distortion (R-D) performance and the entropy decoding complexity for the H.264/AVC video coding standard. Context-based adaptive binary arithmetic coding (CABAC), context-based adaptive variable length coding (CAVLC), and universal variable length coding (UVLC) are three entropy coding tools adopted by H.264/AVC. CABAC can be used to encode the texture and the header data while CAVLC and UVLC are employed to encode the texture and the header data, respectively. Although CABAC can provide better R-D performance than CAVLC/UVLC, its decoding complexity is higher. Thus, by taking the entropy decoding complexity into account, CABAC may not be the best tool, which motivates us to examine the entropy coding mode decision problem in depth. It will be shown experimentally that the proposed mode decision algorithm can help the encoder generate the bit streams that can be decoded at much lower complexity with little R-D performance loss.
15.
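To meet the needs of target recognition over low-rate data links, a region-of-interest (ROI) coding strategy is introduced into infrared video coding. Building on the H.264 video encoding/decoding framework, an ROI-based encoding/decoding framework is constructed by adding ROI coding processing, and the ROI quantization model is optimized by adjusting the quantization parameters of ROI and non-ROI macroblocks during rate control. Experimental results show that the method saves limited bit rate, adds detail to ROI targets, improves the clarity of ROI objects, and improves overall subjective visual quality.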
16.
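Based on the characteristics of H.264/AVC, a hardware implementation suited to intra-prediction decoding is designed, and a frame/field adaptive mode is introduced, which helps improve decoding efficiency. Combined with other previously designed decoder modules, the structure achieves real-time decoding of standard-definition H.264 video on an FPGA.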
17.
18.
Design of an H.264 streaming media player based on DirectShow
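Based on H.264, currently the latest video compression codec standard, a network streaming media player system is designed using the DirectShow application framework. DirectShow is an excellent streaming media application architecture provided by Microsoft for the Windows platform, and the H.264 standard offers a high compression ratio and good network friendliness. A network streaming media player based on DirectShow and H.264 therefore not only has a sound system architecture but also offers better flexibility and extensibility, and it can easily be used in Video On Demand systems and ported to the embedded WinCE platform. A network source filter and an H.264 decoding filter are analyzed and designed on top of DirectShow, and the overall framework of the network streaming media player system is described.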
19.
20.
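The CABAC entropy coding used in the H.264 Main Profile improves the video compression ratio but greatly increases the computational complexity of encoding and decoding. Because of their low-cost, low-power requirements, embedded systems need a dedicated hardware accelerator for CABAC encoding and decoding. A high-performance H.264 CABAC hardware accelerator is designed that can be configured for encoding or decoding mode and performs CABAC encoding and decoding efficiently. Performance evaluation shows that, at a clock frequency of 220 MHz, the accelerator achieves an average encoding speed of 147 Mbps (1.5 cycles/bit) and a decoding speed of 220 Mbps (1 cycle/bit). Compared with a software implementation, the accelerator delivers a speedup of more than 50 times.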
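The throughput figures in this abstract follow directly from the clock frequency and the cycles spent per bit. The short check below reproduces that arithmetic; the function name is illustrative, and the only inputs are the numbers quoted in the abstract.

```python
def throughput_mbps(clock_mhz, cycles_per_bit):
    """Bits processed per second, in Mbps, for a given clock in MHz."""
    return clock_mhz / cycles_per_bit


if __name__ == "__main__":
    print(round(throughput_mbps(220, 1.5)))  # encoding, 1.5 cycles/bit
    print(round(throughput_mbps(220, 1.0)))  # decoding, 1 cycle/bit
```

At 220 MHz, 1.5 cycles per bit gives about 146.7 Mbps, which rounds to the 147 Mbps quoted for encoding, and 1 cycle per bit gives the 220 Mbps quoted for decoding.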