首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 46 毫秒
1.
The hardware implementation of the intra prediction described in this paper allows the H.264/AVC encoder to achieve optimal compression efficiency in real-time conditions. The architecture has some features that distinguish it from other solutions described in literature. Firstly, the architecture supports all intra prediction modes defined in High Profile of the H.264/AVC standard for all chroma formats. Secondly, the architecture can generate predictions for several quantization parameters. Thirdly, the hardware cost is reduced as the same resources are used to compute prediction samples for all the modes. Fourthly, the high sample-generation rate enables the encoder to achieve high throughputs. Fifthly, 4?×?4 block reordering and interleaving with other modes minimize the impact of the long-delay reconstruction loop on the encoder throughput. The architecture is verified against the JM.12 reference model and within the real-time FPGA hardware encoder. The synthesis results show that the design can operate at 100 MHz and 200 MHz for FPGA Aria II and 0.13 μm TSMC technology, respectively. These frequencies allow the encoder to support 720p and 1080p video at 30 fps.  相似文献   

2.
针对H.264/AVC中的去块效应滤波器,该文提出了一种新的滤波处理顺序,能够显著减小片上数据缓存容量,并以此为基础设计了一种去块效应滤波器的VLSI硬件新结构。该结构利用数据复用机制减少对片外存储的访问量、节省处理时间,同时不使用片内SRAM,将对片内SRAM的访问降为0。仿真结果显示,该电路在工作频率为100MHz时对HDTV能较好地实现实时滤波;在0.18m工艺下,综合后的等效逻辑门数只有16.8k。  相似文献   

3.
We present a full HD (1080p) H.264/AVC High Profile hardware encoder based on fast motion estimation (ME). Most processing cycles are occupied with ME and use external memory access to fetch samples, which degrades the performance of the encoder. A novel approach to fast ME which uses shared multibank memory can solve these problems. The proposed pixel subsampling ME algorithm is suitable for fast motion vector searches for high‐quality resolution images. The proposed algorithm achieves an 87.5% reduction of computational complexity compared with the full search algorithm in the JM reference software, while sustaining the video quality without any conspicuous PSNR loss. The usage amount of shared multibank memory between the coarse ME and fine ME blocks is 93.6%, which saves external memory access cycles and speeds up ME. It is feasible to perform the algorithm at a 270 MHz clock speed for 30 frame/s real‐time full HD encoding. Its total gate count is 872k, and internal SRAM size is 41.8 kB.  相似文献   

4.
The in-loop deblocking filter is one of the complex parts in H.264/AVC. It has such a large amount of computation that almost all the pixels in all the frames are involved in the worst case. In this paper, a fast deblocking filter architecture is proposed, and it can effectively save the operating time. In the proposed architecture, two 1-D filters are introduced so that the vertical filtering and the horizontal filtering can be performed at the same time, Only 120 cycles are needed for a macroblock. Our architecture is also a memory efficient one, and only one 4×4 pixels register, one 4×4 transpose array and one 16×32 b two-port (SRAM) are used as buffers in the filtering process. The simulation and synthesis results show that, with almost the same or even smaller area than some 1-D filter based architectures before, the proposed one can save more than 40% processing time. The architecture is suitable for real-time applications and can easily achieve the requirement of processing real-time video in 1080HD (high definition format, 1 920×1 088@30 fps) at 100 MHz.  相似文献   

5.
In addition to coding efficiency, the scalable extension of H.264/AVC provides good functionality for video adaptation in heterogeneous environments. Fine grain scalability (FGS) is a technique to extract video bitstream at the finest quality level under the given bandwidth. In this paper, an architecture of FGS encoder with low external memory bandwidth and low hardware cost is proposed. Up to 99% of bandwidth reduction can be attained by the proposed scan bucket algorithm, early context modeling with context reduction, and first scan pre-encoding. The area-efficient hardware architecture is implemented by layer-wise hardware reuse. Besides, three design strategies for enhancement layer coder are explored so that the trade-off between external memory bandwidth and silicon area is allowed. The proposed hardware architecture can real-time encode HDTV 1920×1080 video with two FGS enhancement layers at 200 MHz working frequency, or HDTV 1280×720 video with three FGS enhancement layers at 130 MHz working frequency.  相似文献   

6.
在采用外部存储和内部缓存的两级存储方案的基础上,提出了一种基于纹理图像的MPEG-4ASP@L5运动补偿电路的硬件结构,并完成了VLSI设计。针对运动向量的预测算法,在满足实时译码的前提下对电路的内部缓存LM2进行了优化。对于重叠块运动补偿算法,提出了一种有效的双循环替换缓存结构。采用TSMC0.25μm1P5MCMOS工艺,完成了运动补偿电路的VLSI实现,芯片内核面积为1.31mm×1.31mm,最高工作频率150MHz。系统仿真结果表明该电路可在120MHz的频率下对符合ASProfile标准的ITU-R601格式的纹理视频流进行实时运动补偿。  相似文献   

7.
陆晓凤  刘锋  佟冬  王克义 《电子学报》2011,39(5):1072-1076
本文针对H.264 Fidelity Range Extensions(FRExt,High Profile)解码过程中扩展的所有变换,采用二维矩阵分解和基于矩阵运算提取公共因子的操作,利用通用运算单元来设计高效的可重构VLSI结构.该结构不但节省面积(可重构变换结构只消耗了4807门电路),并且具有高性能(采用TSM...  相似文献   

8.
The performance of the processor core depends on the configuration parameters and utilization of on-chip memory in multimedia applications such as image, video and audio processing. The design of the on-chip memory architecture is critical for power and area efficient design without compromising quality in data-intensive computing applications. This paper proposes a design of high speed, area, and energy efficient Static Segment On-Chip (SSOC) memory for error-tolerant applications. In this static segment method, n-bit data array is reduced by m-bit data array for significant value of input data to achieve balanced design metrics at the cost of accuracy. The proposed m-bit static segmentation algorithm is implemented and verified in Single Port Static Random Access Memory (SP SRAM) architecture for the approximate computing applications. From the overall simulation results, the proposed 4-bit SSOC SP SRAM design provides 49.02% area savings, 50.62% power reduction and 16.92% speed improvement at the cost of 0.64% Peak Signal to Noise Ratio (PSNR) and exhibits same visual quality in comparison with the existing 8-bit conventional on-chip SP SRAM design in the image processing applications.  相似文献   

9.
Two-dimensional (2-D) convolution is widely used in image and video processing. Although the operation is simple, 2-D convolution is however both computationally expensive and memory-intensive. Field-programmable-gate-array (FPGA)-based parallel processing architectures were proposed to accelerate calculations for 2-D convolution. And data buffers implemented with FPGA on-chip resources were used to avoid direct access to external memories. Full buffering and partial buffering (PB) schemes were adopted in previous works. The former would consume a large amount of FPGA resources, while the latter would cause a sharp increase in external memory bus bandwidth. In this brief, we present a multiwindow PB scheme for FPGA-based 2-D convolvers. Compared with the aforementioned methods, the new buffering strategy exhibits a good balance between on-chip resource utilization and external memory bus bandwidth, and therefore is suitable for low-cost FPGA implementation  相似文献   

10.
Multiview video coding (MVC) is the appendix H of H.264/AVC, and it requires a great amount of time to compress multiple viewpoints׳ video with complex prediction structures. To reduce the whole computational complexity of MVC, this paper proposes a fast macroblock (MB) encoding algorithm based on rate-distortion (RD) activity, and it includes the fast mode decision and the fast motion/disparity estimation. First, the RD activity type of the current MB is calculated by utilizing the Skip/Direct RD cost and the average RD costs of classified MB modes. Then, through utilizing the RD activity type and RD costs of the estimated modes, the selection of candidate modes, the early decision of Skip/Direct mode, and the reduction of Inter8×8 mode estimation are all presented in the fast mode decision. By using the RD activity type and the correlations of vectors, the selection of search center and the prediction of search range are introduced in the fast motion/disparity estimation. In addition, the proposed algorithm can be applied to temporal and inter-view views as well as anchor and non-anchor frames. An experiment with a wide range of video scenes, camera setups and quantization parameters was implemented, and the results confirmed that the proposed algorithm can reduce the encoding time significantly while maintaining a similar RD performance as the original MVC encoder. Compared to the state-of-the-art algorithms, the proposed algorithm also demonstrated better performances in the various test cases.  相似文献   

11.
Due to the growing demand of digital convergence, there is a need to have a video encoder/decoder (codec) that is capable of supporting multiple video standards on a single platform. High Efficiency Video Coding (HEVC), successor to H.264/MPEG-4 AVC, is a new standard under development that aims to substantially improve coding efficiency compared to AVC High Profile. This paper presents an efficient architecture based on a resource sharing strategy that can perform the quantization operation of the emerging HEVC encoder and six other video encoders: H.264/AVC, AVS, VC-1, MPEG-2, MPEG-4, and Motion JPEG (MJPEG). Since HEVC is still in the drafting stage, the proposed architecture is designed in such a way that any final changes can be accommodated into the design. The proposed quantizer architecture is completely division-free, as the division operation is replaced by shift and addition operations for all the codecs. The design is implemented on an FPGA and later synthesized in CMOS 0.18 μm technology. While working at 190 MHz, the design can decode a 1080p HD video at up to 61 frames per second. The multi-codec architecture is also suitable for low-cost VLSI implementation.  相似文献   

12.
An architecture for an embedded 8-port SRAM with 256 bit simultaneous horizontal and vertical data access for adjacent or alternate addresses is proposed. This architecture makes possible four kinds of address configurations which are effective in video applications by selecting multiple word lines and one of four bit lines for each column multiplexer. The proposed SRAM provides 25.6 Gbit/s of high bandwidth  相似文献   

13.
Recently, the level of realism in PC graphics applications has been approaching that of high-end graphics workstations, necessitating a more sophisticated texture data cache memory to overcome the finite bandwidth of the AGP or PCI bus. This paper proposes a multilevel parallel texture cache memory to reduce the required data bandwidth on the AGP or PCI bus and to accelerate the operations of parallel graphics pipelines in PC graphics cards. The proposed cache memory is fabricated by 0.16-μm DRAM-based SOC technology. It is composed of four components: an 8-MB DRAM L2 cache, 8-way parallel SRAM L1 caches, pipelined texture data filters, and a serial-to-parallel loader. For high-speed parallel L1 cache data replacement, the internal bus bandwidth has been maximized up to 75 GB/s with a newly proposed hidden double data transfer scheme. In addition, the cache memory has a reconfigurable architecture in its line size for optimal caching performance in various graphics applications from three-dimensional (3-D) games to high-quality 3-D movies  相似文献   

14.
Lowering the supply voltage is an effective way to significantly reduce the power consumption of a static random access memory (SRAM). However, the minimum supply voltage (Vminf) required to support a given operating frequency in an SRAM macro is often elusive from one chip to another due to process variations. Moreover, temperature could vary when an SRAM macro is in operation, and thus exacerbating the problem since temperature variation could affect the Vminf . In this paper, we propose an on-chip self-VDD-tuning scheme that automatically adjusts each manufactured SRAM macro to a minimal voltage near its Vminf. Our scheme can provide a user-specified speed margin (e.g., 10% of the target frequency), and thereby creating a guard band for assuring robust operations over a wide range of temperatures. Simulation results show that, with the proposed speed margining technique, a 64 Kb SRAM macro can tolerate temperature up to 125degC. Measurement results from a test chip in a 0.18-mum CMOS process also demonstrate that we can achieve 40% power savings for an 8 Kb SRAM macro operating at 150 MHz by means of this resilient self-VDD-tuning.  相似文献   

15.
FFT algorithms have memory access patterns that prevent many architectures from achieving high computational utilization, particularly when parallel processing is required to achieve the desired levels of performance. Starting with a highly efficient hybrid linear algebra/FFT core, we co-design the on-chip memory hierarchy, on-chip interconnect, and FFT algorithms for a multicore FFT processor. We show that it is possible to to achieve excellent parallel scaling while maintaining power and area efficiency comparable to that of the single-core solution. The result is an architecture that can effectively use up to 16 hybrid cores for transform sizes that can be contained in on-chip SRAM. When configured with 12MiB of on-chip SRAM, our technology evaluation shows that the proposed 16-core FFT accelerator should sustain 388 GFLOPS of nominal double-precision performance, with power and area efficiencies of 30 GFLOPS/W and 2.66 GFLOPS/mm2, respectively.  相似文献   

16.
H.264编码器中的帧内4x4预测部分具有严重的数据依赖性,它的硬件化设计很难采用流水线实现,从而导致关键路径很长,硬件利用率很低,成为H.264编码器设计中的一个瓶颈。针对这个问题, 在不减少预测模式和不增加系统资源的的前提下,本文提出了一种新的结构,它通过利用原始像素进行模式判决和利用重构像素进行帧内预测的方法,可以使帧内预测与重构循环完全流水线实现,基本上达到了100%的硬件利用率,而且没有明显的PSNR的损失。本文所提出的硬件结构可在215个时钟周期内完成一个宏块的帧内4x4预测。用SMIC 0.13um工艺库综合,结果显示该结构最高可运行在250M,面积约为116K门,可支持4096x2160@30fps视频序列的实时编码。  相似文献   

17.
基于DSP平台的H.264编码器的优化与实现   总被引:1,自引:0,他引:1  
王强  卓力  沈兰荪 《电子与信息学报》2007,29(12):2970-2973
针对H.264编码器的复杂度很高和基于DSP平台的H.264编码器的实时处理实现难度很大的问题,该文首先介绍了H.264标准基本档次的编码结构,对影响编码速度的原因进行了深入的分析。然后针对TMS320DM642芯片的特点,提出一些优化实现方案,最终在DSP平台上实时实现了H.264基本档次编码器。测试结果表明,经过优化,H.264编码器的处理速度有了很大的提高,对于CIF格式的视频序列能够满足视频编码的实时处理要求。  相似文献   

18.
This paper describes a novel memory hierarchy and line-pixel-lookahead (LPL) for an H.264/AVC video decoder. The memory system is the bottleneck of most video processors, particularly in the newly announced H.264/AVC. This is because it utilizes the neighboring pixels to create a reliable predictor, leading to a dependency on a long past history of data. This problem can be resolved by allocating memory space but inducing large silicon area and power consumption as well. We first review the existing solutions and propose a three-level memory hierarchy with line-pixel-lookahead to improve access efficiency. Three-level memory hierarchy includes registers, content/slice SRAM and external frame DRAM. We emphasize the need to consider the secondary hierarchy, content/slice SRAM, during the design of an H.264/AVC decoder. Specifically, we introduce a slice SRAM and line-pixel-lookahead to lower the memory capacity and external bandwidth. This SRAM stores neighboring pixels and prevents the data re-access from DRAM. Line-pixel-lookahead exploits multi-dimensional pixel locality so as to averagely improve prediction performance by 6.54% compared to conventional vertical prediction. Simulation results also reveal that the proposal makes a better trade-off between memory allocation and external bandwidth as well as power, leading to 50% of memory power reduction compared to the design without exploiting the secondary slice SRAM hierarchy.
Chen-Yi LeeEmail:
  相似文献   

19.
郑兆青  桑红石  黄卫锋  沈绪榜 《电子学报》2007,35(10):1921-1926
本文提出了一种用于H.264/AVC的D级数据重用整数运动估计VLSI结构.提出的结构是在一种固定块尺寸运动估计VLSI结构基础上,利用交叉网络实现变块尺寸的计算,使用多bank的存储器组织方式,使片上存储器的读写规则简单,易于处理不同搜索范围和不同尺寸的视频的运动估计.提出的运动估计结构用Verilog HDL描述,使用HJTC 0.18μm工艺,用Synopsys DC做了逻辑综合.相比现有结构,该结构由于增加片上存储器,因此数据重用率高,大大降低了存储带宽需求;另外数据吞吐率高,能够满足高性能视频编码需求.  相似文献   

20.
H.264视频编码标准中引入了1/4像素精度插值算法,大大提高了压缩效率,但同时使运算复杂度增加、存储带宽增大。针对以上问题,从运动估计的角度出发,采用一步插值法和数据复用技术,可使带宽减少26%,处理周期可减少45%;设计了相应的硬件结构:采用了5级流水线实现一步插值算法,通过输入缓冲单元实现了参考数据的复用;针对插值过程中产生的大量数据,采用乒乓操作结构,保证数据及时传递。该结构可以显著降低带宽,提高吞吐率,完全可以应用于实时编码器中。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号