High-efficiency video coding (HEVC) is a recent video coding standard, and motion estimation is its most important block. This work presents the different matching criteria for block-based motion estimation in HEVC. HEVC requires fast motion estimation algorithms to achieve good real-time performance, and a hardware implementation of motion estimation helps to reach high speed through parallel processing. An improved block matching technique that processes a reduced number of blocks is designed for HEVC. The proposed method has a shorter execution time because only blocks that contain motion are compared during prediction. The search time complexity depends on the number of blocks that contain motion. For frames with little motion, the search time can be reduced to 80–85% compared with the traditional full search algorithm. The sum of absolute differences (SAD), mean square error, and mean absolute difference are computed to find the best matching criterion for HEVC; SAD has lower computational complexity than the other matching criteria. The results suggest that the proposed motion estimation algorithm performs better than similar previous work.
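The three matching criteria named in this abstract can be sketched in a few lines. This is only an illustration under assumptions: blocks are plain lists of rows, and the `has_motion` helper with its `threshold` parameter is a hypothetical stand-in for the paper's "only blocks having motion" test, whose actual rule is not given here.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def mad(a, b):
    """Mean absolute difference: SAD divided by the number of pixels."""
    return sad(a, b) / (len(a) * len(a[0]))

def mse(a, b):
    """Mean square error between two equally sized blocks."""
    total = sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return total / (len(a) * len(a[0]))

def has_motion(cur_block, prev_block, threshold=0):
    """Hypothetical motion test: the co-located blocks differ by more than threshold."""
    return sad(cur_block, prev_block) > threshold
```

SAD needs only subtractions, absolute values, and additions, while MSE needs a multiplication per pixel and MAD adds a division, which is consistent with the abstract's remark that SAD is the cheapest of the three.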
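A new VLSI architecture for two-dimensional full-search motion estimation is proposed. Based on a 2-D systolic array, the architecture achieves full two-dimensional data reuse, reduces the amount of data accessed from external memory, and offers 100% hardware efficiency and high throughput. It can also be applied easily to full-search block-matching motion estimation with different block sizes and search ranges, which makes it general-purpose.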
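For reference, the full-search block matching that such architectures accelerate can be written as a software sketch. This is not the systolic design from the abstract. The function names, the SAD cost, and the rule of skipping candidates that fall outside the reference frame are assumptions made for illustration.

```python
def block_sad(cur, ref, bx, by, rx, ry, n):
    """SAD between the n x n block at (bx, by) in cur and the one at (rx, ry) in ref."""
    return sum(abs(cur[by + j][bx + i] - ref[ry + j][rx + i])
               for j in range(n) for i in range(n))

def full_search(cur, ref, bx, by, n, search_range):
    """Test every displacement in [-search_range, +search_range] in both directions.

    Returns (cost, dx, dy) for the best match, or None if no candidate fits in ref.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = bx + dx, by + dy
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue  # candidate block lies partly outside the reference frame
            cost = block_sad(cur, ref, bx, by, rx, ry, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best
```

Every candidate reads a reference window that overlaps heavily with its neighbours' windows. That overlap is the data reuse a 2-D array can exploit.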
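Building on a study of current fast block-matching algorithms for motion estimation, this paper describes the basic principles of motion estimation, identifies the key techniques for improving its efficiency, analyzes and compares the relevant algorithms, and proposes directions for future research on motion estimation algorithms.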
This paper presents a novel memory-based VLSI architecture for full-search block matching algorithms. We propose a semi-systolic array to meet the high computational demand, in which data communication is handled in two styles: (1) global connections for search data and (2) local connections for partial sums. Data flow is handled by a multiple-port memory bank so that all processor elements operate on target data items. Hardware efficiency can thus reach up to 100%. Both the semi-systolic array structure and the related memory management strategies for full-search block matching algorithms are discussed in detail in the paper.
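Based on the algorithmic characteristics of fractional-pixel motion estimation (FME) in H.264/AVC video coding, and targeting the differing requirements of video coding systems, four VLSI architectures for FME are proposed, and their hardware utilization and processing speed are compared and analyzed.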
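This paper proposes a new low-latency, high-throughput, programmable VLSI tree architecture that implements the FSA and TSSA motion estimation algorithms very efficiently. It uses one third fewer processing elements (PEs) than other tree architectures, and the PE delay is halved. A distinctive ME window buffer structure greatly reduces the I/O bandwidth and the number of I/O pins, and an interleaved pipelining technique allows hardware utilization to reach 100%. These features make the architecture well suited to VLSI implementation.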
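A highly parallel, multi-pipeline hardware architecture is proposed to implement the full-search block-matching motion estimation algorithm for the video part of MPEG-4. The architecture uses full-search block matching to find the best-matching motion vector of each pixel block in real time. It offers high computation speed, accurate motion vectors, low on-chip memory requirements, a low operating clock, and low power consumption, so it can meet the needs of mobile video services and high-definition video services. The architecture is implemented with Fujitsu's CE66 library.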
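In a VLSI implementation of block-matching motion estimation based on telescopic search, the conventional systolic array engine used to accelerate the search is structurally modified so that power consumption is reduced significantly. The approach uses a new early-exit technique for the block matching error computation and avoids unnecessary operations by masking operands in the array processing elements. A simple estimate based on algorithm simulation results shows that motion estimation with the new search engine can reduce power consumption to about 40% of the original while keeping the same processing speed and similar decoded video quality.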
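The early-exit idea has a simple software analogue: stop accumulating a candidate's matching error as soon as it can no longer beat the best error found so far. The sketch below is not the operand-masking hardware described in the abstract. It assumes SAD as the error measure and a row-by-row check, and the function name is hypothetical.

```python
def sad_with_early_exit(cur_block, ref_block, best_so_far):
    """Accumulate SAD row by row and stop once the candidate cannot win.

    Returns (partial_or_full_sad, rows_processed, completed).
    """
    total = 0
    rows_done = 0
    for cur_row, ref_row in zip(cur_block, ref_block):
        total += sum(abs(a - b) for a, b in zip(cur_row, ref_row))
        rows_done += 1
        if total >= best_so_far:
            return total, rows_done, False  # remaining rows are skipped
    return total, rows_done, True
```

In hardware the skipped rows correspond to operations that are never switched, which is where the power saving comes from. In software they are simply skipped work.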
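Prediction-based algorithms generally use block matching to remove redundancy between successive frames, and fast block-based motion estimation search algorithms usually work by reducing the number of search points. This paper introduces an inter-frame predictive coding method suited to narrowband, low-bit-rate moving pictures. Following the algorithms and coding scheme of the H.264 standard, a hardware architecture for inter prediction is proposed. Motion prediction operates on the luma component only and uses a fast search based on center prediction and halfway termination; the search window structure is given. Finally, the coding performance is analyzed.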
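A hardware architecture is proposed for the variable block-size motion estimation algorithm of the H.264 standard. It uses a one-dimensional array of processing elements (PEs) to search for matching blocks, and it computes the block matching error (SAD) for blocks of different sizes by reusing the SADs of smaller sub-blocks. Compared with a conventional architecture that processes one motion vector, this architecture can process up to 41 motion vectors within a given number of clock cycles, and it is small and fast.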
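The figure of 41 follows from the H.264 partition sizes of a 16×16 macroblock: 16 blocks of 4×4, 8 of 8×4, 8 of 4×8, 4 of 8×8, 2 of 16×8, 2 of 8×16, and 1 of 16×16. A minimal sketch of the SAD reuse is shown below. It assumes block names in width×height order and a row-major grid of 4×4 SADs, and the function names are hypothetical.

```python
def sads_4x4(cur, ref):
    """Return a 4x4 grid s[r][c] holding the SAD of each 4x4 sub-block of a 16x16 block."""
    return [[sum(abs(cur[4 * r + j][4 * c + i] - ref[4 * r + j][4 * c + i])
                 for j in range(4) for i in range(4))
             for c in range(4)] for r in range(4)]

def merge_sads(s):
    """Build all 41 partition SADs by adding 4x4 SADs, without touching pixels again."""
    out = {"4x4": [s[r][c] for r in range(4) for c in range(4)]}
    out["8x4"] = [s[r][c] + s[r][c + 1] for r in range(4) for c in (0, 2)]
    out["4x8"] = [s[r][c] + s[r + 1][c] for r in (0, 2) for c in range(4)]
    out["8x8"] = [s[r][c] + s[r][c + 1] + s[r + 1][c] + s[r + 1][c + 1]
                  for r in (0, 2) for c in (0, 2)]
    tl, tr, bl, br = out["8x8"]  # top-left, top-right, bottom-left, bottom-right
    out["16x8"] = [tl + tr, bl + br]
    out["8x16"] = [tl + bl, tr + br]
    out["16x16"] = [tl + tr + bl + br]
    return out
```

Each candidate position therefore costs one pass over the 256 pixels plus a handful of additions, rather than a separate pass for each of the seven partition shapes.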
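Motion estimation is the most computationally intensive and time-consuming module in HEVC. To accelerate encoding, a VLSI architecture for the hexagon search algorithm is designed for HEVC motion estimation. The architecture supports the variable-size block design of the HEVC standard, fully exploits the data reuse properties of the hexagonal pattern, and uses a pipelined organization in the PE array, which effectively reduces the number of on-chip buffer accesses. Synthesized in a SMIC 65 nm process, the circuit reaches a maximum operating frequency of 100 MHz with 101 k gates and meets the real-time encoding requirement for high-definition video (1920×1080, 60 frames/s).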
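A software sketch of hexagon-based search may help to show why the pattern reuses data. It assumes the commonly used large hexagon with six points at (±2, 0) and (±1, ±2), followed by a four-point small pattern for the final refinement. The abstract does not state the exact pattern used in the hardware, and `cost` is a hypothetical callable that returns the matching error of a displacement.

```python
LARGE_HEXAGON = [(2, 0), (-2, 0), (1, 2), (1, -2), (-1, 2), (-1, -2)]
SMALL_PATTERN = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def hexagon_search(cost, start=(0, 0), max_steps=32):
    """Move the large hexagon until its center is best, then refine once.

    cost(dx, dy) returns the matching error for displacement (dx, dy).
    Returns (dx, dy, best_cost).
    """
    cx, cy = start
    best = cost(cx, cy)
    for _ in range(max_steps):
        candidates = [(cost(cx + dx, cy + dy), cx + dx, cy + dy)
                      for dx, dy in LARGE_HEXAGON]
        c, x, y = min(candidates)
        if c >= best:
            break  # the center is at least as good as every hexagon point
        best, cx, cy = c, x, y
    fx, fy = cx, cy
    for dx, dy in SMALL_PATTERN:  # final refinement around the last center
        c = cost(cx + dx, cy + dy)
        if c < best:
            best, fx, fy = c, cx + dx, cy + dy
    return fx, fy, best
```

When the hexagon moves to one of its own vertices, three of the six new points coincide with points already evaluated (two old vertices and the old center). This sketch recomputes them for brevity, whereas a real implementation would cache or reuse them.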
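This paper presents an improved tree architecture for the later-stage tasks of a multiresolution motion estimation algorithm. Under the control of a simple RISC-type core, it performs all later-stage subtasks of the motion estimation process other than the coarse-resolution motion vector search. Multiple tasks, including motion vector refinement (search), are carried out by multifunction processing elements at the lowest-level leaf nodes of a binary tree and by an adder tree that can be split into subtrees. In addition, the register file design of the arithmetic unit allows image data to be reused in two dimensions. This completely avoids repeated reads of the same data from memory, which minimizes memory access bandwidth and helps to reduce memory power consumption.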
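Following the development of fast motion estimation techniques, this paper describes the principles of motion estimation and groups current approaches into four classes: fixed-pattern methods, predictive motion vector methods, hierarchical methods, and fast full-search methods. The four classes are discussed in depth and compared, and future trends in motion estimation algorithms are outlined.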
This paper addresses the development and hardware implementation of an efficient hierarchical motion estimation algorithm, HMEA, which uses multiresolution frames to reduce computational complexity. Excellent estimation performance is ensured by using an averaging filter to downsample the original image. At the smallest resolution, the two lowest-cost motion vector candidates are selected using a full-search block matching algorithm. At the middle level, these two candidate motion vectors serve as the center points for small-range local searches. Then, at the original resolution, the final motion vector is obtained by performing a local search around the single candidate from the middle level. HMEA exhibits regular data flow and is suitable for hardware implementation. An efficient VLSI architecture is also presented; it includes an averaging filter to downsample the image and two 2-D semisystolic processing element arrays that compute the sum of absolute differences in a pipeline. Simulation results indicate that HMEA is more area-efficient and faster than many full-search and multiresolution architectures while maintaining high video quality. The architecture, with 59K gates and 1393 bytes of RAM, is implemented for a search range of [−16.0, +15.5].
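The three-level flow described in this abstract can be sketched as follows. This is a simplified illustration rather than the HMEA design itself. It assumes a 2×2 averaging filter, integer-pel displacements only, a ±1 refinement radius at the two finer levels, and block coordinates and sizes that are multiples of 4. All function names are hypothetical.

```python
def downsample(img):
    """Halve the resolution with a 2x2 averaging filter."""
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4
             for x in range(len(img[0]) // 2)] for y in range(len(img) // 2)]

def sad(cur, ref, bx, by, dx, dy, n):
    """SAD for displacement (dx, dy), or None if the candidate leaves the frame."""
    rx, ry = bx + dx, by + dy
    if rx < 0 or ry < 0 or rx + n > len(ref[0]) or ry + n > len(ref):
        return None
    return sum(abs(cur[by + j][bx + i] - ref[ry + j][rx + i])
               for j in range(n) for i in range(n))

def search(cur, ref, bx, by, n, center, radius, keep):
    """Evaluate all displacements within radius of center and keep the best few."""
    results = []
    for dy in range(center[1] - radius, center[1] + radius + 1):
        for dx in range(center[0] - radius, center[0] + radius + 1):
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if cost is not None:
                results.append((cost, dx, dy))
    return sorted(results)[:keep]

def hierarchical_me(cur, ref, bx, by, n, coarse_radius=2):
    """Coarse full search (two candidates), middle refinement, then final refinement.

    Assumes at least one valid candidate exists at every level.
    """
    cur1, ref1 = downsample(cur), downsample(ref)
    cur2, ref2 = downsample(cur1), downsample(ref1)
    coarse = search(cur2, ref2, bx // 4, by // 4, n // 4, (0, 0), coarse_radius, 2)
    middle = []
    for _, dx, dy in coarse:  # local search around each scaled candidate
        middle += search(cur1, ref1, bx // 2, by // 2, n // 2, (2 * dx, 2 * dy), 1, 1)
    _, mdx, mdy = min(middle)
    cost, dx, dy = search(cur, ref, bx, by, n, (2 * mdx, 2 * mdy), 1, 1)[0]
    return dx, dy, cost
```

Keeping two candidates at the coarsest level guards against the downsampled image picking a wrong minimum, at the price of one extra small local search at the middle level.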
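This paper proposes an efficient VLSI architecture for the binary shape motion estimation algorithm of the MPEG-4 video coding standard. The architecture, called DDBME, consists mainly of a data dispatcher based on a one-dimensional systolic array and a 16×32-bit search area buffer. In DDBME, bit-parallel data processing is used to compute the sum of absolute differences (SAD) of the block matching algorithm.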
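For binary shape data each pixel is a single bit, so the absolute difference of two pixels equals their XOR, and the SAD of a row is the number of set bits in the XOR of two packed words. The sketch below illustrates that bit-parallel idea in software. The row packing (most significant bit is the leftmost pixel), the 16-pixel block width, and the function names are assumptions and are not taken from the DDBME design.

```python
def window16(search_row, offset):
    """Extract 16 pixels starting at bit offset (0..16) from a 32-bit search-area row."""
    return (search_row >> (32 - 16 - offset)) & 0xFFFF

def binary_sad(cur_rows, ref_rows):
    """SAD of two binary blocks whose rows are packed into integers."""
    return sum(bin(c ^ r).count("1") for c, r in zip(cur_rows, ref_rows))

def binary_sad_at(cur_rows, search_rows, offset):
    """SAD of a 16-pixel-wide block against the search area at a horizontal offset."""
    return binary_sad(cur_rows, [window16(row, offset) for row in search_rows])
```

One XOR and one bit count replace sixteen subtract-and-accumulate steps per row, which is what makes a bit-parallel datapath attractive for shape coding.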
Variable block size motion estimation is adopted in MPEG-4 AVC/H.264. This paper presents a new VLSI and FPGA architecture that uses a full-search block matching algorithm and online arithmetic. Several ways of refreshing data are described. Processing all sub-block formats requires no additional clock cycles. Only 54K gates are used, so the architecture can be implemented in devices with limited hardware resources. Moreover, power consumption is low. A qualitative analysis of other designs is reported, and early termination of the SAD calculation is analysed. Real-time video processing can be achieved for HDTV by using early termination or by increasing the parallelism.
The H.264/AVC Fractional Motion Estimation (FME) with rate-distortion constrained mode decision can improve the rate-distortion efficiency by 2–6 dB in peak signal-to-noise ratio. However, it comes with considerable computational complexity, so acceleration by dedicated hardware is a must for real-time applications. The main difficulty for FME hardware implementation is parallel processing under the constraint of the sequential flow and data dependency. We analyze seven inter-correlative loops extracted from the FME procedure and provide decomposing methodologies to obtain an efficient projection in hardware implementation. Two techniques, 4×4 block decomposition and efficient vertical scheduling, are proposed to reuse data among the variable block sizes and to improve the hardware utilization. In addition, advanced architectures are designed to efficiently integrate the 6-tap 2D finite impulse response filter, residue generation, and 4×4 Hadamard transform into a fully pipelined architecture. This design is finally implemented and integrated into an H.264/AVC single-chip encoder that supports real-time encoding of 720×480 30 fps video with four reference frames at an 81 MHz operating frequency with 405 K logic gates (41.9% of the encoder area).
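Two of the building blocks named in this abstract are small enough to sketch: the H.264 6-tap half-pixel filter with taps (1, −5, 20, 20, −5, 1) and the 4×4 Hadamard transform used to score a residual block. The code below is a software illustration, not the pipelined architecture. The function names are hypothetical, the filter is shown in one dimension only, and the transformed sum is returned without the normalization that encoders commonly apply.

```python
HADAMARD_4 = [[1, 1, 1, 1],
              [1, 1, -1, -1],
              [1, -1, -1, 1],
              [1, -1, 1, -1]]

def half_pel(p):
    """Half-pixel sample between p[2] and p[3] from six neighbouring integer pixels."""
    e, f, g, h, i, j = p
    value = (e - 5 * f + 20 * g + 20 * h - 5 * i + j + 16) >> 5
    return max(0, min(255, value))  # clip to the 8-bit pixel range

def matmul4(a, b):
    """Multiply two 4x4 matrices given as lists of rows."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]

def satd_4x4(residual):
    """Sum of absolute values of H * residual * H^T (H is symmetric, so H^T = H)."""
    transformed = matmul4(matmul4(HADAMARD_4, residual), HADAMARD_4)
    return sum(abs(v) for row in transformed for v in row)
```

Because every partition size can be tiled with 4×4 blocks, one 4×4 transform unit can serve all block sizes, which matches the 4×4 block decomposition mentioned in the abstract.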
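The Block Matching Algorithm (BMA) is currently a widely used motion estimation method, but its greatest weakness is that it is easily trapped in local optima, which is mainly a consequence of the search pattern. The Genetic Algorithm (GA), by contrast, is a broadly applicable search algorithm that seeks the global optimum. By combining the local search of block matching with the global search of the genetic algorithm, this paper proposes a block-matching motion estimation method based on an improved genetic algorithm. Experiments show that the mean absolute error (MAE) of the method is close to that of full search (FSS) and better than that of three-step search (TSS), while its computational cost is relatively low and close to that of TSS.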