1.

谷会涛陈书明孙书为《计算机研究与发展》2011,48(11):2015-2022

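To address the difficulty that existing motion estimation implementations have in meeting real-time performance and flexibility requirements at the same time, this paper proposes a motion estimation coprocessor that supports multiple standards. The coprocessor uses a 6-issue very long instruction word (VLIW) architecture and can execute a variety of motion estimation algorithms. It contains a processing element array with two-dimensional data reuse, a SAD adder tree, and a multi-mode coding cost comparator. The processing element array and the adder tree handle the enormous computational complexity of motion estimation, while the cost...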

2.

Parallel SSE optimization of the full search algorithm

陶志强徐萌徐荣飞《微计算机应用》2011,32(11)

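In video coding algorithms based on macroblock partitioning, motion estimation takes up most of the encoding time because of its heavy computational load. When encoding high-definition video in particular, motion estimation has become the main bottleneck for encoding performance. This paper modifies and optimizes the full search motion estimation algorithm for pixel-level parallelism, using SSE instructions to drive the CPU's SIMD units so that SAD is computed over multiple pixels of the current macroblock and the reference macroblock at once, yielding a parallel implementation of motion estimation. On the same hardware and with encoding quality preserved, it achieves more than a 2x improvement in encoding performance over the conventional CPU full search.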
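For orientation, the scalar baseline that such an SSE version accelerates can be sketched as below. This is a minimal illustration in plain Python, not the authors' code: the function names `sad` and `full_search`, the list-of-rows frame layout, and the boundary handling are all assumptions. The innermost loop over `i` is the part a SIMD implementation would evaluate for many pixels at once.

```python
def sad(cur, ref, bx, by, rx, ry, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the n x n block of `ref` at (rx, ry).
    Frames are lists of rows of integer pixel values (assumed layout)."""
    total = 0
    for j in range(n):
        cur_row = cur[by + j]
        ref_row = ref[ry + j]
        for i in range(n):  # a SIMD version processes several pixels per step
            total += abs(cur_row[bx + i] - ref_row[rx + i])
    return total


def full_search(cur, ref, bx, by, n, search_range):
    """Test every displacement within +/- search_range and return
    (best_dx, best_dy, best_sad). Candidates outside the frame are skipped."""
    height, width = len(ref), len(ref[0])
    # Start from the zero displacement so that ties favour "no motion".
    best = (0, 0, sad(cur, ref, bx, by, bx, by, n))
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = bx + dx, by + dy
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue
            cost = sad(cur, ref, bx, by, rx, ry, n)
            if cost < best[2]:
                best = (dx, dy, cost)
    return best
```

The cost of this loop nest grows with the square of the search range times the block area, which is why the abstract identifies it as the bottleneck.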

3.

A VLSI architecture supporting the PMVFAST motion estimation algorithm Total citations: 4, self-citations: 0, citations by others: 4

黎铁军沈承东李思昆《计算机研究与发展》2005,42(4):537-543

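Based on an analysis of the PMVFAST algorithm, this paper proposes a flexible, efficient, and low-power architecture that supports it. The core of the architecture is a motion estimation engine containing three kinds of variable delay units that support arbitrary delays within a specified range, which lets the engine support multiple search patterns; by reusing computation units it also implements a basic standalone SAD computation engine. In addition, the engine reduces power consumption effectively by shutting down unused units and sharing resources. Analysis shows that the architecture delivers about 15 times the performance of the classic low-power full-search architecture with a 16-PE array, while achieving video quality close to that of full search.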

4.

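A fast motion estimation algorithm based on start point prediction and SAD distribution Total citations: 7, self-citations: 0, citations by others: 7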

李炜乐立鸾李波《计算机学报》2001,24(10):1110-1114

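Block-based motion estimation is a key technique widely adopted in international video compression standards. This paper proposes a start point prediction method that combines the equality of neighboring blocks' motion vectors with a comparison of SAD values, reducing the cost of computing SAD during start point prediction. It also exploits the directionality of the SAD distribution to concentrate the search on regions with smaller SAD values, which speeds up the fast block matching search strategy. On this basis a new fast motion estimation algorithm is designed. It reduces or avoids unnecessary searches, so it greatly improves search efficiency while producing results very close to those of full search.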

5.

A 4×4 block motion estimation method based on Kalman filtering

钟声李众立胡莉张莹《微计算机信息》2006,22(27):302-304

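To estimate motion vectors more accurately and improve motion compensation performance, this paper proposes a 4×4 block motion estimation method based on Kalman filtering. The method builds a 4×4 block-based Kalman filter from the motion correlation between adjacent sub-blocks within a macroblock, and uses the filter's multi-sub-block prediction to self-correct the 16×16 macroblock-level motion vectors obtained from three-step search (TSA). This yields an estimate of the optimal macroblock-level motion vector and effectively improves motion estimation accuracy. Experiments show that the method gives a higher average PSNR than TSA and 16-KF, particularly when motion is intense or the local scene changes abruptly.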
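The abstract does not give the state model, so the following is only a generic scalar Kalman step of the kind that could refine one component of a measured motion vector. It assumes a random-walk state model; the function name `kalman_update` and the noise variances `q` and `r` are hypothetical and not taken from the paper.

```python
def kalman_update(x_est, p_est, z, q, r):
    """One predict/update cycle of a scalar Kalman filter.

    x_est, p_est: previous state estimate and its variance
    z:            new measurement (e.g. a motion vector component from a search)
    q, r:         process and measurement noise variances (assumed values)
    Returns the corrected estimate and its variance.
    """
    # Predict: a random-walk model keeps the state and grows the uncertainty.
    x_pred = x_est
    p_pred = p_est + q
    # Update: blend prediction and measurement according to the Kalman gain.
    gain = p_pred / (p_pred + r)
    x_new = x_pred + gain * (z - x_pred)
    p_new = (1.0 - gain) * p_pred
    return x_new, p_new


if __name__ == "__main__":
    # Prior estimate 0 with variance 1, measurement 4 with variance 1.
    print(kalman_update(0.0, 1.0, 4.0, 0.0, 1.0))
```

When the prediction and the measurement are equally uncertain, the corrected value lands halfway between them, which is the sense in which a noisy search result is "self-corrected" by the prediction.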

6.

A deep-pipelined 5×5 convolution method based on the two-dimensional Winograd algorithm

黄程程董霄霄李钊《计算机应用》2021,41(8):2258-2264

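To address the problems of the two-dimensional Winograd convolution algorithm, namely excessive memory bandwidth requirements, high computational complexity, long design exploration cycles, and inter-layer computation latency in cascaded convolutions, this paper proposes a double-buffered 5×5 convolution layer design method based on the 2D Winograd algorithm. First, a column buffer structure lays out the data so that the overlapping data between adjacent tiles is reused, lowering the memory bandwidth requirement. Next, repeated intermediate results in the additions of the Winograd algorithm are precisely identified and reused, which reduces the number of additions and thereby the energy consumption and design area of the accelerator system. Finally, a 6-stage pipeline structure is designed according to the computation flow of the Winograd algorithm, achieving efficient computation for 5×5 convolution. Experimental results show that, with essentially no effect on the prediction accuracy of the convolutional neural network (CNN), this 5×5 convolution method reduces the number of multiplications by 83% compared with conventional convolution, with a speedup of 5.82. Compared with building a 5×5 convolution from cascaded 3×3 2D Winograd convolutions, it reduces multiplications by 12%, memory bandwidth requirements by about 24.2%, and computation time by 20%.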
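The way Winograd convolution trades multiplications for additions is easiest to see in the textbook one-dimensional case F(2,3), which produces two outputs of a 3-tap filter with 4 multiplications instead of 6. The sketch below shows only that minimal case as an illustration; it is not the paper's 2D 5×5 design, and the function names are assumptions.

```python
def direct_f23(d, g):
    """Two outputs of a 3-tap filter g over 4 inputs d: 6 multiplications."""
    return [d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
            d[1] * g[0] + d[2] * g[1] + d[3] * g[2]]


def winograd_f23(d, g):
    """Same two outputs using the Winograd F(2,3) form: 4 multiplications."""
    # Filter transform; in an accelerator this is precomputed once per filter.
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]
    # Input transform (additions only) followed by 4 element-wise products.
    m1 = (d[0] - d[2]) * u0
    m2 = (d[1] + d[2]) * u1
    m3 = (d[2] - d[1]) * u2
    m4 = (d[1] - d[3]) * u3
    # Output transform (additions only).
    return [m1 + m2 + m3, m2 - m3 - m4]


if __name__ == "__main__":
    d, g = [1, 2, 3, 4], [2, 4, 6]
    print(direct_f23(d, g), winograd_f23(d, g))
```

The extra additions in the input and output transforms are where repeated intermediate sums appear, which is the kind of redundancy the abstract describes reusing.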

7.

A new motion estimation algorithm based on a block partition structure

李志敏郭科伟黄鸿黄凯梁《计算机工程与应用》2012,48(23):143-147

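Motion estimation is the key to removing temporal redundancy in video compression. Most existing algorithms are SAD matching algorithms based on a full search strategy; they compress well but are computationally complex and perform poorly in real time. This paper proposes a new fast motion estimation algorithm that splits a block into several sub-blocks, computes the sum of gray values and the sum of squared gray values for each sub-block, treats these together as one parameter, and combines them with three proposed matching criteria to find the optimal motion estimate between the current frame and the candidate frame. Experiments show that the algorithm clearly reduces computational complexity and considerably improves real-time performance, while its compression performance remains very close to that of the full-search SAD algorithm.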
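The per-sub-block statistics described above can be sketched as follows. The three matching criteria are not stated in the abstract, so only the feature computation is shown; the function name `subblock_features`, the square sub-block size `sub`, and the raster ordering of sub-blocks are assumptions made for illustration.

```python
def subblock_features(block, sub):
    """Split a square block (list of rows) into sub x sub sub-blocks and
    return, for each sub-block in raster order, the pair
    (sum of gray values, sum of squared gray values)."""
    n = len(block)
    features = []
    for y0 in range(0, n, sub):
        for x0 in range(0, n, sub):
            total = 0
            total_sq = 0
            for y in range(y0, y0 + sub):
                for x in range(x0, x0 + sub):
                    value = block[y][x]
                    total += value
                    total_sq += value * value
            features.append((total, total_sq))
    return features


if __name__ == "__main__":
    block = [[1, 1, 2, 2],
             [1, 1, 2, 2],
             [3, 3, 4, 4],
             [3, 3, 4, 4]]
    print(subblock_features(block, 2))
```

Comparing a handful of such pairs per candidate is far cheaper than a pixel-by-pixel SAD, which is presumably where the reported reduction in complexity comes from.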

8.

A fast motion estimation algorithm for MPEG-4 shape coding Total citations: 2, self-citations: 0, citations by others: 2

倪伟郭宝龙《计算机科学》2005,32(7):128-130

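Motion estimation is a key technique in MPEG-4 shape coding. This paper proposes a fast motion estimation algorithm suited to shape coding. The algorithm first scans the reference frame to obtain a binary boundary mask of the video object. During matching it replaces the original addition with a 1-bit XOR operation. It sets effective termination criteria so that the search stops immediately at stationary points, and it uses a successive elimination algorithm during the search to reduce the number of search points without affecting search accuracy. Experimental results show that, with this fast search algorithm, the computation in motion estimation is substantially lower than with the original MPEG-4 VM search algorithm, and the length of the coded bitstream is essentially the same as with the original algorithm.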
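For binary shape masks, the absolute difference of two pixels equals their XOR, so the matching cost reduces to counting mismatched bits. A minimal sketch of that idea follows; packing each mask row into one integer and the names `pack_rows` and `xor_mismatch` are assumptions for illustration, not details from the paper.

```python
def pack_rows(mask):
    """Pack each row of a 0/1 mask into one integer, leftmost pixel first."""
    rows = []
    for row in mask:
        word = 0
        for bit in row:
            word = (word << 1) | (bit & 1)
        rows.append(word)
    return rows


def xor_mismatch(rows_a, rows_b):
    """Number of differing pixels between two packed masks of equal size.
    For 0/1 data this equals the sum of absolute differences."""
    return sum(bin(a ^ b).count("1") for a, b in zip(rows_a, rows_b))


if __name__ == "__main__":
    cur = [[1, 0, 1, 1], [0, 0, 1, 0]]
    ref = [[1, 1, 1, 0], [0, 0, 1, 0]]
    print(xor_mismatch(pack_rows(cur), pack_rows(ref)))
```

One XOR over a packed row compares a whole row of pixels at once, which is why replacing per-pixel additions with XOR reduces the work per candidate.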

9.

FPGA-based hardware circuit design for full search motion estimation

童桢王祖强杨恒《电子技术应用》2014,(7):44-47

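This paper designs a hierarchical two-dimensional array hardware circuit for full search motion estimation. Compared with conventional 2D array full search motion estimation circuits, it improves the parallel structure of the processing elements (PEs) and the memory design, saving hardware resources and encoding time. The parallel pipeline is arranged according to the timing relationships among the modules, and one column of pixels is processed in parallel, achieving real-time motion estimation encoding.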

10.

A diagonal matching criterion for motion estimation and its hardware implementation

廖裕民《电脑与微电子技术》2013,(20):66-70

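This paper proposes a diagonal matching criterion based on the block matching principle of motion estimation. Compared with the SAD, MSE, and MAD criteria commonly used in motion estimation, it greatly reduces computation with only a slight loss in motion estimation quality. The matching criterion can be applied to a variety of search strategies, with good results in each case. A hardware architecture is designed for the algorithm, together with a double-slant raster scan order and an associated crossed register bank structure for data reuse, which together make full use of the overlapping data in motion estimation. The architecture was verified on an FPGA: at 240 MHz, four motion estimation circuits working in parallel fully meet the real-time requirement for 1280x720@30fps.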

11.

High performance architecture for real-time HDTV broadcasting

Yasser Ismail Wael El-Medany Hessa Al-Junaid Ahmed Abdelgawad 《Journal of Real-Time Image Processing》2016,11(4):633-644

A novel full search motion estimation co-processor architecture design is presented in this paper. The proposed architecture efficiently reuses search area data to minimize memory I/O while fully utilizing the hardware resources. A smart processing element (PE) and an efficient, simple internal memory are the main components of the proposed co-processor. An efficient algorithm is used for loading both the current block and the search area inside the PE array. The search area data flow horizontally while the current block data are stationary. As a result, the speed of the co-processor is improved in terms of throughput and operating frequency compared to state-of-the-art techniques. A smart local memory and PE design guarantees a simple and regular data flow. The local memory is implemented using only registers and a simple counter. This simplifies the design by avoiding complicated addressing when writing to or reading from the local memory. The proposed architecture is implemented using both FPGA and ASIC design flow tools. For a search range of 32 × 32 and a block size of 16 × 16, the architecture can perform motion estimation for 30 fps HDTV video at 350 MHz and easily outperforms many fast full search architectures.

12.

Design of a CNN accelerator based on a 3D scalable PE array

苏梓培杨鑫陈弟虎粟涛《计算机工程与科学》2021,43(3):389-397

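Convolutional neural networks have many parameters and a heavy computational load, so deploying them on mobile devices requires minimizing power consumption and chip area while still meeting the frame rate (speed) requirement. Taking into account compatibility with existing mobile networks, performance, and area, this paper designs a CNN accelerator based on a 3D scalable PE array. The accelerator supports 3×3 convolution, 3×3 depthwise separable convolution, 1×1 convolution, and fully connected layers, and its PE array can set the optimal parallelism in each of three dimensions according to the target network and hardware constraints to achieve better performance. With 512 PEs, the accelerator reaches 76.52 GOPS at 74.72% performance efficiency running yolo-v2, and 78.05 GOPS at 76.22% performance efficiency running mobile-net-v1. Finally, the accelerator is used to build a real-time object detection system: the yolo-lite network is deployed on a XILINX Zynq-7000 SoC ZC706 hardware development platform, where CNN computation reaches 53.65 fps.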
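The two throughput figures can be cross-checked with a little arithmetic. The sketch below assumes that "performance efficiency" means achieved throughput divided by peak throughput; that definition and the helper name `implied_peak_gops` are assumptions, since the abstract does not define the term.

```python
def implied_peak_gops(achieved_gops, efficiency):
    """Peak throughput implied by an achieved throughput and an efficiency,
    assuming efficiency = achieved / peak (an assumption, not from the paper)."""
    return achieved_gops / efficiency


if __name__ == "__main__":
    num_pes = 512
    for name, gops, eff in [("yolo-v2", 76.52, 0.7472),
                            ("mobile-net-v1", 78.05, 0.7622)]:
        peak = implied_peak_gops(gops, eff)
        # Peak per processing element, in GOPS.
        print(name, round(peak, 2), round(peak / num_pes, 3))
```

Under that assumption both networks imply roughly the same peak, which suggests the two efficiency figures are measured against a single hardware peak rather than per-network targets.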

13.

Block matching algorithm for motion estimation based on Artificial Bee Colony (ABC)

Erik Cuevas Daniel Zaldívar Marco Pérez-Cisneros Humberto Sossa Valentín Osuna 《Applied Soft Computing》2013,13(6):3047-3059

Block matching (BM) motion estimation plays a very important role in video coding. In a BM approach, image frames in a video sequence are divided into blocks. For each block in the current frame, the best matching block is identified inside a region of the previous frame, aiming to minimize the sum of absolute differences (SAD). Unfortunately, the SAD evaluation is computationally expensive and represents the most time-consuming operation in the BM process. Therefore, BM motion estimation can be approached as an optimization problem, where the goal is to find the best matching block within a search space. The simplest available BM method is the full search algorithm (FSA), which finds the most accurate motion vector through an exhaustive computation of SAD values for all elements of the search window. Recently, several fast BM algorithms have been proposed to reduce the number of SAD operations by calculating only a fixed subset of search locations at the price of poor accuracy. In this paper, a new algorithm based on Artificial Bee Colony (ABC) optimization is proposed to reduce the number of search locations in the BM process. In our algorithm, the computation of search locations is drastically reduced by considering a fitness calculation strategy which indicates when it is feasible to calculate or only estimate new search locations. Since the proposed algorithm does not consider any fixed search pattern or any other movement assumption, as most other BM approaches do, a high probability of finding the true minimum (accurate motion vector) is expected. Conducted simulations show that the proposed method achieves the best balance over other fast BM algorithms, in terms of both estimation accuracy and computational cost.

14.

Energy efficient processing of motion estimation for embedded multimedia systems

Jooheung Lee 《Multimedia Tools and Applications》2017,76(23):24749-24765

Visual sensor networks require low-power compression techniques for the large amounts of video data in each camera node because of their energy-constrained and bandwidth-limited environments. In this paper, an energy-efficient architecture for Variable Block Size Motion Estimation is proposed to fully utilize the dynamic partial reconfiguration capability of programmable hardware fabric in distributed embedded vision processing nodes. Partial reconfiguration of the FPGA is exploited to support run-time reconfiguration of the proposed modular hardware architecture for motion estimation. According to the required search range, hardware reconfiguration is performed adaptively to reduce hardware resources and power consumption. A reconfigurable ME, ranging from a simple 1-D to a complex 2-D Sum of Absolute Differences (SAD) array performing full search block matching, is selected in order to support different search window sizes. The implemented scalable SAD array can provide different resolutions and frame rates for real-time applications with multiple reconfigurable regions.

15.

A fast H.264 inter mode selection algorithm

包国兴谌德荣胡宏华曹旭平《微计算机信息》2010,(12)

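This paper proposes a fast inter mode selection algorithm. The algorithm first performs the motion vector search for the P16x16 mode, then analyzes the distribution of the sum of absolute differences (SAD) of the prediction residual over all 4x4 blocks and, based on that distribution, selects one of P16x8, P8x16, and P8x8 to search. If P8x8 is selected, then after the motion search the same procedure selects one of P8x4, P4x8, and P4x4 to search. Simulation results on six sequences with different motion characteristics show that, compared with the JM13.2 algorithm, the proposed algorithm reduces encoding time by 59.72% on average, lowers the peak signal-to-noise ratio (PSNR) by 0.07 dB on average, and changes the output bit rate by -1.22%.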

16.

Half-pixel motion estimation based on linear prediction

章伟明徐元欣王匡《中国图象图形学报》2007,12(1):27-31

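In video coding systems, half-pixel motion estimation clearly improves coding results but also adds considerable computation. To reduce computation and speed up half-pixel motion estimation, this paper proposes a new half-pixel search algorithm (half-pixel motion estimation based on linear prediction, BLPHME). Its key idea is to build a linear model by analyzing the correlation between integer-pixel and half-pixel search results, and to adjust the decision threshold dynamically so as to predict and skip blocks that would not benefit from half-pixel block matching. Experimental results show that the algorithm clearly reduces the computation of motion estimation while achieving image quality and bit rate very close to those of the conventional algorithm. It can also be combined with fast integer-pixel and half-pixel motion estimation algorithms to reduce computation further.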

17.

A pre-exclusion algorithm for four-motion-vector motion estimation in MPEG-4

朱洪波张松《计算机工程与应用》2005,41(30):52-54,195

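To improve the efficiency of motion compensation, MPEG-4 compression coding uses variable block size (8x8 and 16x16) motion estimation. This algorithm is highly complex and is the key factor limiting the overall efficiency of MPEG-4 encoding. This paper studies a pre-exclusion algorithm for four-motion-vector motion estimation (4MVPE) within the rate-distortion framework of MPEG-4. Given the 16x16 motion vector of a macroblock, simple calculations and checks rule out a considerable share of macroblocks from needing four 8x8 block motion estimations (four-motion-vector motion estimation), which simplifies the whole encoding process. Experiments show that 4MVPE greatly speeds up variable block size motion estimation with only a very small drop in peak signal-to-noise ratio (PSNR).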

18.

Fast motion estimation combined with rate control in H.264

黄晓平沈未名郭晓云喻占武《计算机工程》2008,34(12):192-193

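Based on the characteristics of the integer transform and quantization in H.264, it can be shown that coding quality is unaffected when the error between a macroblock's predicted SAD and its actual SAD is below a certain threshold. Combining this with the adaptive SAD prediction in the H.264 rate control algorithm, this paper proposes a fast motion estimation method that terminates the motion search early according to the predicted SAD. Experimental results show that the algorithm effectively speeds up motion estimation, by a factor of up to 2.2, with only slight changes in image quality.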
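The early termination idea can be sketched as a search loop that stops as soon as the best cost so far comes within a threshold of the predicted SAD. The stopping rule `best_cost <= predicted_sad + threshold`, the function name `search_with_early_exit`, and the candidate ordering are assumptions for illustration; the abstract does not give the exact test or how the threshold is derived.

```python
def search_with_early_exit(candidates, cost_fn, predicted_sad, threshold):
    """Evaluate candidate motion vectors in order and stop once the best
    SAD found is within `threshold` of `predicted_sad` (assumed rule).
    Returns (best_mv, best_cost, number_of_candidates_evaluated)."""
    best_mv, best_cost, evaluated = None, None, 0
    for mv in candidates:
        cost = cost_fn(mv)
        evaluated += 1
        if best_cost is None or cost < best_cost:
            best_mv, best_cost = mv, cost
        if best_cost <= predicted_sad + threshold:
            break  # good enough according to the prediction: skip the rest
    return best_mv, best_cost, evaluated


if __name__ == "__main__":
    # Hypothetical SAD values for four candidate motion vectors.
    sad_table = {(0, 0): 900, (1, 0): 400, (0, 1): 120, (1, 1): 100}
    order = [(0, 0), (1, 0), (0, 1), (1, 1)]
    print(search_with_early_exit(order, sad_table.get, 100, 30))
```

In this toy run the loop accepts a slightly worse match and skips the last candidate, which mirrors the reported trade of a small quality change for a faster search.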