Similar Documents
20 similar documents retrieved (search time: 31 ms)
1.
With the increasing number of processor cores available in modern computing architectures, task or data parallelism is required to maximally exploit the available hardware and achieve optimal processing speed. Current state-of-the-art data-parallel processing methods for decoding image and video bitstreams are limited in parallelism by dependencies introduced by the coding tools and the number of synchronization points introduced by these dependencies, only allowing task or coarse-grain data parallelism. In particular, entropy decoding and data prediction are bottleneck coding tools for parallel image and video decoding. We propose a new data-parallel processing scheme for block-based intra sample and coefficient prediction that allows fine-grain parallelism and is suitable for integration in current and future state-of-the-art image and video codecs. Our prediction scheme enables maximum concurrency, independent of slice or tile configuration, while minimizing synchronization points. This paper describes our data-parallel processing scheme for one- and two-dimensional prediction and investigates its application to block-based image and video codecs using JPEG XR and H.264/AVC Intra as a starting point. We show how our scheme enables faster decoding than the state-of-the-art wavefront method, with speedup factors of up to 21.5 and 7.9 for JPEG XR and H.264/AVC Intra coding tools, respectively. Using the H.264/AVC Intra coding tool, we discuss the requirements of the algorithm and the impact on decoded image quality when these requirements are not met. Finally, we discuss the impact on coding rate in order to allow for optimal parallel intra decoding.
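As background for the wavefront baseline this abstract compares against, the following minimal Python sketch (our illustration, not the authors' scheme; the frame dimensions are assumed) groups intra macroblocks into wavefront "waves" whose members can be decoded concurrently, and reports the two quantities the proposed fine-grain scheme targets: the number of synchronization points and the peak concurrency.

```python
"""Illustrative sketch of the classic wavefront schedule for intra macroblock
decoding (the baseline, not the paper's scheme). An intra MB at (x, y) depends
on its left, top, top-left and top-right neighbours, so all MBs sharing the
wave index x + 2*y can be decoded concurrently. Frame dimensions are assumed."""

from collections import defaultdict

MB_COLS, MB_ROWS = 120, 68        # e.g. a 1920x1088 frame with 16x16 MBs

def wavefront_schedule(cols, rows):
    """Group macroblocks into waves; MBs within one wave are independent."""
    waves = defaultdict(list)
    for y in range(rows):
        for x in range(cols):
            waves[x + 2 * y].append((x, y))
    return [waves[w] for w in sorted(waves)]

schedule = wavefront_schedule(MB_COLS, MB_ROWS)
print("number of synchronization points (waves):", len(schedule))
print("peak concurrency (MBs per wave):", max(len(w) for w in schedule))
```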

2.
The H.264/AVC video coding standard can deliver high compression efficiency at a cost of increased complexity and power. The increasing popularity of video capture and playback on portable devices requires that the power of the video codec be kept to a minimum. This work implements several architecture optimizations such as increased parallelism, pipelining with FIFOs, multiple voltage/frequency domains, and custom voltage-scalable SRAMs that enable low-voltage operation to reduce the power of a high-definition decoder. Dynamic voltage and frequency scaling can efficiently adapt to the varying workloads by leveraging the low-voltage capabilities and domain partitioning of the decoder. An H.264/AVC Baseline Level 3.2 decoder ASIC was fabricated in 65-nm CMOS and verified. For high-definition 720p video decoding at 30 frames per second (fps), it operates down to 0.7 V with a measured power of 1.8 mW, which is significantly lower than previously published results. The highly scalable decoder is capable of operating down to 0.5 V for decoding QCIF at 15 fps with a measured power of 29 µW.
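To make the role of voltage/frequency scaling concrete, here is a minimal, hypothetical sketch of workload-driven operating-point selection; the (voltage, frequency) table and the cycles-per-frame figures are invented placeholders, not the fabricated chip's actual characteristics.

```python
"""Illustrative sketch of workload-driven DVFS selection, not the ASIC's
controller. The (voltage, frequency) operating points and the cycles-per-frame
workloads below are hypothetical placeholders."""

# hypothetical operating points: (voltage in V, max clock in MHz), low to high
OPERATING_POINTS = [(0.5, 10), (0.7, 25), (0.9, 60), (1.0, 100)]

def pick_operating_point(cycles_per_frame, fps):
    """Return the lowest-voltage point whose clock covers the workload."""
    required_mhz = cycles_per_frame * fps / 1e6
    for volt, mhz in OPERATING_POINTS:           # sorted by voltage
        if mhz >= required_mhz:
            return volt, mhz
    return OPERATING_POINTS[-1]                  # fall back to the fastest point

# e.g. a light QCIF-like workload vs. a heavier 720p-like workload
print(pick_operating_point(cycles_per_frame=300_000, fps=15))
print(pick_operating_point(cycles_per_frame=1_500_000, fps=30))
```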

3.
In this article, we suggest some techniques for designing an H.264/AVC video coding system for HDTV applications. The design exploration is guided by software profiling. The design considerations of system scheduling and pipelining are discussed, followed by architecture optimization of the significant modules. An efficient H.264/AVC video coding system is achieved by combining these techniques.

4.
The microprocessor industry trend towards many-core architectures introduced the necessity of devising appropriately scalable applications. While implementing software-based video decoding, the main challenges are the optimized partitioning of decoder operations, efficient tracking of dependencies, and resource synchronization for multiple parallel units. The same applies to hardware implementations of video decoders, where monolithic approaches anticipate scalability of the design and reusability of already implemented core components. In this paper, we propose an intermediate data stream format (Meta Format Stream) which is suited for architectural decomposition of video decoding by replacing the conventional monolithic decoder architecture with a pipelined structure. The Meta Format is forward-oriented, self-contained, and multi-standard capable, so that processing of Meta Streams is independent of the originating bit stream. Our approach does not require special coding settings and is applicable to accelerated decoding of any standards-compliant bit stream. An H.264/AVC multiprocessing proposal is presented as a case study of the potential of our concept. The case study combines coarse-grained frame-level parallel decoding of the bit stream with fine-grained macroblock-level parallelism in the image processing stage. The proposed H.264 decoder achieved speedup factors of up to 7.6 on an 8-core machine with 2-way SMT. We report actual decoding speeds of up to 150 frames per second at 2160p resolution.

5.
An application-specific processor for an H.264 decoder with a configurable embedded processor is designed in this research. The motion compensation, inverse integer transform, inverse quantization, and entropy decoding algorithms of the H.264 decoder software are optimized. We improved the performance of the processor with instruction-level hardware optimization tailored to the configurable embedded processor architecture. The optimized instructions for video processing can also be used in other video compression standards such as MPEG-1, -2, and -4. A significant performance improvement is achieved with high flexibility. Experimental results show that we could achieve 300% performance for the H.264 Baseline profile Level 2 decoder.

6.
A VLSI architecture for the entropy decoder, inverse quantiser, and predictor is proposed in this article. This architecture is used for decoding video streams of three standards on a single chip, i.e. H.264/AVC, AVS (China National Audio Video coding Standard), and MPEG-2. The proposed scheme is called MPMP (Macro-block-Parallel based Multilevel Pipeline) and is intended to improve decoding performance to satisfy real-time requirements while maintaining reasonable area and power consumption. Several techniques, such as slice-level pipelining, MB (Macro-Block) level pipelining, and MB-level parallelism, are adopted. Input and output buffers for the inverse quantiser and predictor are shared by the decoding engines for H.264, AVS, and MPEG-2, effectively reducing the implementation overhead. Simulation shows that the decoding process consumes 512, 435, and 438 clock cycles per MB in H.264, AVS, and MPEG-2, respectively. Owing to the proposed techniques, the video decoder can support H.264 HP (High Profile) 1920 × 1088@30fps (frames per second) streams, AVS JP (Jizhun Profile) 1920 × 1088@41fps streams, and MPEG-2 MP (Main Profile) 1920 × 1088@39fps streams at a 200 MHz working frequency.
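As a rough plausibility check on the cycle counts quoted above, the sketch below computes the theoretical frame-rate ceiling at a 200 MHz clock, ignoring pipeline stalls, parsing overhead, and memory bandwidth; the resulting upper bounds sit comfortably above the 30/41/39 fps figures the abstract reports.

```python
"""Back-of-the-envelope throughput check for the quoted cycles-per-MB figures.
It ignores pipeline stalls, bitstream parsing overhead and memory bandwidth,
so it only gives an upper bound on the sustainable frame rate."""

CLOCK_HZ = 200e6                              # 200 MHz working frequency
MBS_PER_FRAME = (1920 // 16) * (1088 // 16)   # 8160 macroblocks at 1920x1088

for codec, cycles_per_mb in [("H.264", 512), ("AVS", 435), ("MPEG-2", 438)]:
    fps_bound = CLOCK_HZ / (cycles_per_mb * MBS_PER_FRAME)
    print(f"{codec}: at most {fps_bound:.1f} fps at 1920x1088")
```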

7.
张凤 (Zhang Feng), 《数字通信》 (Digital Communication), 2009, 36(4): 50-52
H.264/AVC, the latest video coding standard jointly developed by ITU-T and ISO, combines high compression efficiency with strong network adaptability, which gives it significant engineering and market value for building efficient media communication platforms. This article introduces the technical features, transport structure, and encoder/decoder architecture of H.264, and presents a real-time H.264 software decoding system implemented on the Windows CE.NET operating system.

8.
The scalable extension of the H.264 Advanced Video Coding (AVC) standard, called Scalable Video Coding (SVC) or H.264/SVC, provides scalable video streams that are composed of a base layer and one or more enhancement layers. Enhancement layers may improve the temporal, spatial, or signal-to-noise-ratio resolution of the content represented by the lower layers. One of the applications of this video coding standard is point-to-multipoint video distribution in both wired and wireless communication systems, where packet losses contribute to the degradation of the user's Quality of Experience. Designed for the transmission of data over Binary Erasure Channels (BEC), Raptor codes are a Forward Error Correction (FEC) mechanism that is gaining popularity for Internet Protocol Television (IPTV) applications due to their small decoding complexity and reduced overhead. This paper evaluates the quality enhancements introduced by the integration of several H.264/SVC layers with a Raptor coding protection scheme. Our goal is to improve the distribution of video over loss-prone networks in terms of rate-distortion performance by assessing several alternative packetization options and protection schemes.
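The sketch below is an idealized illustration of why adding repair symbols helps on a lossy channel: it models fountain-style FEC over a binary erasure channel and assumes a block decodes once at least K plus a small overhead of encoded symbols arrive. It is not a Raptor implementation, and K, the repair budgets, and the loss rates are arbitrary assumptions.

```python
"""Illustrative Monte Carlo sketch of FEC protection over a binary erasure
channel under an idealized fountain-code assumption (decoding succeeds when at
least K + EPS encoded symbols arrive). Not an actual Raptor codec; K, the
repair budgets and loss rates are assumptions."""

import random

K, EPS = 100, 2          # source symbols per block, small decoding overhead

def block_recovered(repair, loss_rate, rng):
    received = sum(rng.random() > loss_rate for _ in range(K + repair))
    return received >= K + EPS

def failure_rate(repair, loss_rate, trials=5_000, seed=1):
    rng = random.Random(seed)
    failures = sum(not block_recovered(repair, loss_rate, rng)
                   for _ in range(trials))
    return failures / trials

for loss in (0.05, 0.10):
    for repair in (5, 10, 20):
        print(f"loss={loss:.2f} repair={repair:3d} "
              f"block failure rate ~ {failure_rate(repair, loss):.4f}")
```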

9.
By analyzing the structure and complexity of an H.264 software decoder, the key points and difficulties of the optimization process are identified. Taking into account the performance characteristics of the TMS320DM642 DSP, the optimization methods adopted for the H.264 decoder on the TMS320DM642 DSP platform are discussed in detail. These methods mainly involve increasing the parallelism of the program code and improving the efficiency of memory accesses, with emphasis on key modules such as motion compensation and the IDCT. Experimental results show that the decoder achieves real-time decoding of CIF-format video streams.

10.
Context-based Binary Arithmetic Coding (CBAC) is a normative part of the newest X Profile of the Advanced Audio Video coding Standard (AVS). This paper presents an efficient VLSI architecture for CBAC decoding in AVS. Compared with CABAC in H.264/AVC, simpler binarization methods and context selection schemes are adopted in AVS. To avoid slow multiplications, the traditional arithmetic calculation is transformed to the logarithm domain. Although these features achieve a better balance between compression gain and implementation cost, they still pose a major challenge for high-throughput implementation: the fact that decoding the current bin depends on the previous bin results in long latency and limits overall system performance. In this paper, we present a software–hardware co-design that exploits the bin distribution. A novel pipeline-based architecture is proposed in which the arithmetic decoding engine works in parallel with the context maintainer. A finite state machine (FSM) is used to control the decoding procedure flexibly, and the context scheduling is organized carefully to minimize the number of accesses to the context RAMs. In addition, the critical path is optimized for timing. The proposed implementation can work at 150 MHz and achieves real-time AVS CBAC decoding for 1080i HDTV video.

11.
In order to achieve high computational performance and low power consumption, many modern microprocessors are equipped with special multimedia instructions and multi-core processing capabilities. The number of cores on a single chip doubles every three years. Therefore, besides complexity reduction through smart algorithms such as fast macroblock mode selection, an effective algorithm for parallelizing H.264/AVC is also crucial for implementing a real-time encoder on a multi-core system. This algorithm serves to uniformly distribute the H.264/AVC encoding workload over several slower and simpler processor cores on a single chip. In this paper, we propose a new adaptive slice-size selection technique for efficient slice-level parallelism of H.264/AVC encoding on a multi-core processor, using fast macroblock mode selection as a pre-processing step. To this end, we propose a method for estimating the computational complexity of each macroblock from this macroblock mode pre-selection. Simulation results with a number of test video sequences show that, without any noticeable degradation, the proposed fast macroblock mode selection reduces the total encoding time by about 57.30%. The proposed adaptive slice-level parallelism achieves good parallel performance compared to conventional fixed slice-size parallelism. The proposed method can be applied to many multi-core systems for real-time H.264 video encoding.
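A minimal sketch of the general idea of complexity-balanced slice sizing is given below; it is our simplification, not the authors' algorithm. The per-macroblock complexity estimates, which the paper obtains from fast macroblock mode pre-selection, are replaced here by random placeholders.

```python
"""Illustrative sketch of complexity-balanced slice sizing (a simplification,
not the paper's algorithm). Per-macroblock complexity estimates are assumed to
come from a pre-analysis such as fast mode selection; random placeholders here."""

import random

def balanced_slices(mb_costs, num_slices):
    """Split MBs (in coding order) into contiguous slices of roughly equal cost."""
    target = sum(mb_costs) / num_slices
    slices, current, acc = [], [], 0.0
    for i, cost in enumerate(mb_costs):
        current.append(i)
        acc += cost
        if acc >= target and len(slices) < num_slices - 1:
            slices.append(current)
            current, acc = [], 0.0
    slices.append(current)
    return slices

random.seed(0)
costs = [random.uniform(0.5, 2.0) for _ in range(396)]     # e.g. CIF: 22x18 MBs
parts = balanced_slices(costs, num_slices=4)
print([round(sum(costs[i] for i in s), 1) for s in parts])  # per-slice workload
```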

12.
With its high compression ratio and good image quality, the H.264 video compression standard has been widely accepted as a new-generation standard. Because the decoding complexity of H.264 is high, software implementations have difficulty meeting real-time requirements, so hardware decoding is needed. This paper proposes a hardware architecture for decoding the variable-length Exp-Golomb codes of the H.264 video coding standard, presents a hardware design with low decoding time and low resource usage, and finally reports the simulation results and the back-end design results.
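For context on what the hardware has to do, here is a small software reference for decoding an unsigned Exp-Golomb (ue(v)) code as used for H.264 syntax elements; it illustrates the bit-level operations only and says nothing about the paper's hardware architecture.

```python
"""Minimal software reference for unsigned Exp-Golomb (ue(v)) decoding as used
for H.264 syntax elements; shown as context, not the paper's design."""

def decode_ue(bits, pos=0):
    """Decode one ue(v) code from a string of '0'/'1' bits starting at pos.
    Returns (value, next_position)."""
    leading_zeros = 0
    while bits[pos + leading_zeros] == '0':
        leading_zeros += 1
    pos += leading_zeros + 1                       # skip the zeros and the '1'
    suffix = bits[pos:pos + leading_zeros]
    value = (1 << leading_zeros) - 1 + (int(suffix, 2) if suffix else 0)
    return value, pos + leading_zeros

# '00111' encodes 6: two leading zeros, then '1', then suffix '11' -> 2^2-1+3
print(decode_ue("00111"))          # -> (6, 5)
print(decode_ue("1"))              # -> (0, 1)
```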

13.
H.264 is a new-generation video coding standard with excellent compression performance, but this performance comes at the cost of a large increase in computational complexity, which makes practical deployment difficult; using dedicated hardware is one way to address this. The integer transform in the H.264 standard is well suited to hardware implementation. This article first introduces the integer transform of the H.264 standard and then proposes a fast parallel algorithm for it based on matrix decomposition. An analysis of the algorithm's structure shows that it is a fast algorithm conforming to the H.264 standard. The hardware implementation of the transform algorithm is also analyzed, showing that this hardware structure is suitable for use in real-time encoding and decoding.
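For reference, the 4x4 forward integer core transform of H.264 can be written with the familiar add-and-shift butterfly factorization, as in the sketch below; this shows the transform the paper targets, not the specific matrix-decomposition parallel algorithm it proposes.

```python
"""Reference implementation of the 4x4 H.264 forward integer core transform,
written with the usual add/shift butterfly factorization that makes it
hardware friendly. This is the standard transform itself, shown for context;
it is not the paper's matrix-decomposition parallel algorithm."""

def transform_1d(v):
    """Apply the core matrix [[1,1,1,1],[2,1,-1,-2],[1,-1,-1,1],[1,-2,2,-1]]
    to a length-4 vector using only additions and shifts."""
    s0, s1 = v[0] + v[3], v[1] + v[2]
    s2, s3 = v[1] - v[2], v[0] - v[3]
    return [s0 + s1, (s3 << 1) + s2, s0 - s1, s3 - (s2 << 1)]

def forward_transform_4x4(block):
    """Y = C * X * C^T applied as a row pass followed by a column pass."""
    rows = [transform_1d(r) for r in block]                       # X * C^T
    cols = [transform_1d([rows[i][j] for i in range(4)]) for j in range(4)]
    return [[cols[j][i] for j in range(4)] for i in range(4)]     # C * (...)

residual = [[5, 11, 8, 10], [9, 8, 4, 12], [1, 10, 11, 4], [19, 6, 15, 7]]
for row in forward_transform_4x4(residual):
    print(row)
```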

14.
In this work, we propose a novel entropy coding mode decision algorithm to balance the tradeoff between rate-distortion (R-D) performance and entropy decoding complexity for the H.264/AVC video coding standard. Context-based adaptive binary arithmetic coding (CABAC), context-based adaptive variable length coding (CAVLC), and universal variable length coding (UVLC) are three entropy coding tools adopted by H.264/AVC. CABAC can be used to encode both the texture and the header data, while CAVLC and UVLC are employed to encode the texture and the header data, respectively. Although CABAC can provide better R-D performance than CAVLC/UVLC, its decoding complexity is higher. Thus, when the entropy decoding complexity is taken into account, CABAC may not be the best tool, which motivates us to examine the entropy coding mode decision problem in depth. It is shown experimentally that the proposed mode decision algorithm helps the encoder generate bit streams that can be decoded at much lower complexity with little R-D performance loss.
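One simple way to picture such a mode decision is as a Lagrangian trade-off between bits and decoding cost, as in the hypothetical sketch below; the per-mode rate and complexity numbers and the lambda values are invented for illustration and are not the paper's model.

```python
"""Illustrative sketch of a rate/decoding-complexity entropy mode decision:
pick the mode minimizing rate + lambda * decoding_cost. The per-mode figures
and lambda values are hypothetical placeholders."""

def choose_entropy_mode(candidates, lam):
    """candidates: {mode: (bits, decode_cost)}; returns the minimizing mode."""
    return min(candidates, key=lambda m: candidates[m][0] + lam * candidates[m][1])

frame_stats = {
    "CABAC":      (118_000, 9.0),   # fewer bits, higher decoding cost
    "CAVLC/UVLC": (124_000, 4.5),   # more bits, cheaper to decode
}
for lam in (0, 500, 5_000):
    print(f"lambda={lam}: choose {choose_entropy_mode(frame_stats, lam)}")
```

Sweeping lambda shows how a complexity-aware encoder drifts from CABAC towards CAVLC/UVLC as decoding cost is weighted more heavily.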

15.
陈青华 (Chen Qinghua), 《红外》 (Infrared), 2013, 34(8): 16-20
To meet the needs of target recognition over low-rate data links, a region-of-interest (ROI) coding strategy is introduced into infrared video coding. On top of the H.264 video encoding/decoding framework, ROI coding is added to build an ROI-based video codec framework, and during rate control the quantization parameters of ROI and non-ROI macroblocks are adjusted, optimizing the ROI quantization model. Experimental results show that the method saves the limited bit rate, preserves more detail in ROI targets, improves the clarity of ROI objects, and raises overall subjective visual quality.
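A minimal sketch of the basic ROI idea, assuming fixed QP offsets rather than the paper's optimized quantization model, is shown below: ROI macroblocks are quantized more finely than the background within the H.264 QP range.

```python
"""Illustrative sketch of ROI-aware quantization-parameter assignment (a
simplification, not the paper's rate-control model): ROI macroblocks get a
lower QP than the frame-level QP, background macroblocks a higher one. The
offsets are assumed placeholders."""

def assign_qp(frame_qp, roi_mask, roi_offset=-4, bg_offset=+3):
    """roi_mask[y][x] is True for ROI macroblocks; returns a per-MB QP map."""
    def clip(q):
        return max(0, min(51, q))               # H.264 QP range 0..51
    return [[clip(frame_qp + (roi_offset if roi else bg_offset))
             for roi in row] for row in roi_mask]

mask = [[x in (1, 2) and y == 1 for x in range(4)] for y in range(3)]
for row in assign_qp(frame_qp=30, roi_mask=mask):
    print(row)
```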

16.
Based on the characteristics of H.264/AVC, a hardware implementation suited to intra-prediction decoding is designed, and a frame/field adaptive mode is introduced, which helps improve decoding efficiency. Combined with the other decoder modules already designed, this structure achieves real-time decoding of standard-definition H.264 video on an FPGA.
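As context for the intra-prediction datapath, the sketch below gives a small software reference for three basic H.264 intra 4x4 prediction modes (vertical, horizontal, DC) given reconstructed neighbour samples; it is illustrative only and does not describe the FPGA design.

```python
"""Software reference for three basic H.264 intra 4x4 prediction modes
(vertical, horizontal, DC) given reconstructed neighbour samples; shown as
context, not as the hardware design described above."""

def intra4x4_predict(mode, top, left):
    """top, left: lists of 4 neighbouring samples; returns the 4x4 prediction."""
    if mode == "vertical":                     # copy the row above downwards
        return [top[:] for _ in range(4)]
    if mode == "horizontal":                   # copy the left column rightwards
        return [[left[y]] * 4 for y in range(4)]
    if mode == "dc":                           # mean of all 8 neighbours
        dc = (sum(top) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("unsupported mode in this sketch")

top, left = [90, 92, 95, 98], [88, 87, 85, 84]
for row in intra4x4_predict("dc", top, left):
    print(row)
```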

17.
This article introduces the basic concepts of video conferencing systems and the requirements they place on video coding and decoding technology. Building on MPEG-4 fine granularity scalability (FGS) coding, a fine granularity scalable video coding scheme based on H.264 is proposed. Simulation and experimental results show that H.264-based FGS achieves a higher signal-to-noise ratio and better visual quality, and can better satisfy the video-quality requirements of the different terminals in an IP-based H.323 video conferencing system.

18.
Design of an H.264 streaming media player based on DirectShow   Cited: 1 (self-citations: 0, other citations: 1)
Based on H.264, the latest video compression coding standard, and the DirectShow application framework, a network streaming media player system is designed. DirectShow is Microsoft's streaming media application framework for the Windows platform, and H.264 offers a high compression ratio and excellent network friendliness. A network streaming media player built on DirectShow and H.264 therefore has not only a sound system architecture but also good flexibility and extensibility, and can readily be used in video-on-demand (VOD) systems or ported to the embedded Windows CE platform. The network source filter and the H.264 decoding filter are analyzed and designed based on DirectShow, and the overall framework of the network streaming media player system is described.

19.
The rapid development of low-bit-rate video coding and wireless communication technologies has made it possible to offer mobile video surveillance services to users. This article implements a real-time video surveillance system for mobile environments. The system incorporates H.264 video coding and optimizes the H.264 core code to ensure smooth decoding on mobile terminals with limited processing power. Experiments show that relatively high-quality surveillance images can be obtained even over low-rate wireless networks.

20.
The CABAC entropy coding adopted in the H.264 Main profile improves the video compression ratio but greatly increases the computational complexity of encoding and decoding; because of their low-cost, low-power requirements, embedded systems need a dedicated hardware accelerator for CABAC encoding/decoding. A high-performance H.264 CABAC hardware accelerator is designed; it can be configured for either encoding or decoding mode and performs CABAC encoding/decoding operations efficiently. Performance evaluation shows that, at a 220 MHz clock frequency, the accelerator achieves an average encoding speed of 147 Mbps (1.5 cycles/bit) and a decoding speed of 220 Mbps (1 cycle/bit). Compared with a software implementation, the accelerator delivers more than a 50× performance improvement.
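The quoted throughput figures follow directly from the clock frequency and the cycles-per-bit numbers, as the short check below confirms.

```python
"""Quick consistency check of the throughput figures quoted above:
throughput = clock frequency / cycles-per-bit."""

CLOCK_HZ = 220e6
for mode, cycles_per_bit in [("encode", 1.5), ("decode", 1.0)]:
    print(f"{mode}: {CLOCK_HZ / cycles_per_bit / 1e6:.0f} Mbps")
```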
