1.
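This paper proposes a novel low-latency, high-throughput, programmable VLSI tree architecture that efficiently implements the FSA and TSSA motion estimation algorithms. The architecture uses one-third fewer processing elements (PEs) than other tree architectures, and the delay of each PE is halved. A distinctive ME window buffer structure greatly reduces the I/O bandwidth and the number of I/O pins, and an interleaved pipelining technique allows hardware utilization to reach 100%. These features make the architecture well suited to VLSI implementation.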
2.
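This paper presents an improved multiresolution telescopic search (MRTlcS) algorithm for block-matching motion estimation. It replaces the conventional telescopic search with a reverse telescopic search, which effectively reduces the on-chip memory capacity and bandwidth required for a VLSI implementation. Motion tracking and an adaptive search window are also used to reduce the computational complexity of motion estimation. The distinguishing feature of the new algorithm is its suitability for low-cost, low-power VLSI implementation. Simulation results show that the new algorithm requires on average only about 30% of the computation of the MRTlcS algorithm while still delivering similar decoded video quality. The paper also compares the hardware cost and power consumption of the new algorithm and the MRTlcS algorithm when implemented in VLSI.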
3.
4.
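This paper presents a new block-matching motion estimation algorithm that adaptively selects either regular or low bit-resolution images for block matching according to the complexity of the video content, and that uses a new telescopic search algorithm mixing the two bit resolutions. Simulation results show that the new algorithm has low computational complexity while maintaining good video quality. Based on this algorithm, we designed a new search engine with a systolic array architecture. The engine has a partitionable datapath, so that when low bit-resolution images are used for block matching, throughput can be raised through increased processing parallelism. The new motion estimator can operate at a lower clock frequency and supply voltage, and therefore has low power consumption.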
5.
6.
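Motion estimation is the most important stage in video compression. Based on an analysis of the hexagon search algorithm, an efficient dataflow structure is given, and from it a fully pipelined, parallel circuit for integer-pixel motion estimation is proposed. The circuit architecture effectively reduces accesses to external memory. The synthesized circuit operates stably at 100 MHz, and simulation shows that the VLSI architecture needs on average 140 cycles to complete one block search, which meets the requirements of real-time HDTV encoding.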
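The throughput claim in this abstract can be checked with simple arithmetic. The sketch below assumes that one "block search" covers a 16×16 macroblock and that HDTV means 1920×1080 at 30 fps; neither assumption is stated in the abstract, and the function names are illustrative only.

```python
import math

def block_searches_per_second(clock_hz: float, cycles_per_block: float) -> float:
    """Block searches the circuit can finish per second."""
    return clock_hz / cycles_per_block

def blocks_needed_per_second(width: int, height: int, fps: int, block: int = 16) -> int:
    """Blocks that must be searched per second for real-time encoding."""
    cols = math.ceil(width / block)   # partial blocks at the edge still count
    rows = math.ceil(height / block)
    return cols * rows * fps

# Figures from the abstract: 100 MHz clock, 140 cycles per block search.
capacity = block_searches_per_second(100e6, 140)
# Assumed frame format (not given in the abstract): 1920x1080 at 30 fps.
demand = blocks_needed_per_second(1920, 1080, 30)

print(f"capacity: {capacity:.0f} blocks/s, demand: {demand} blocks/s")
print(f"headroom: {capacity / demand:.2f}x")
```

Under these assumptions the circuit has spare capacity, which is consistent with the real-time claim; a different block size or frame rate would change the margin.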
7.
8.
9.
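The motion estimation (ME) computing array is an indispensable part of a video encoder and accounts for 50% to 80% of the encoder's computation. A variable-block-size motion estimation computing array capable of real-time processing of high-definition images is implemented. The array has a low input bandwidth requirement and high computational efficiency, and meets the requirements of real-time processing of HD video (1280×720 @ 30 fps).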
10.
11.
12.
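Using the Synopsys TCAD tools TSUPREM-Ⅳ and Medici as the basis, and working through an example of online manufacturability simulation and optimization for a PMOSFET with a 100 nm gate length, this paper describes the process-level and device-physics-level optimization and the process parameter extraction carried out in the DFM stage of VLSI design.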
13.
14.
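This paper proposes a new VLSI architecture for a low-power hierarchical motion estimator that supports the advanced prediction mode of low bit-rate video coders such as H.263 and MPEG-4. To reduce chip size and power consumption, the same basic search unit (BSU) is used in all search levels. In addition, through effective control of the dataflow, the motion vectors of the four 8×8 sub-blocks in each macroblock are obtained together with the macroblock motion vector in advanced prediction mode. Experimental results show that the architecture uses fewer gates, effectively reduces power consumption, and achieves coding performance similar to that of the full-search block-matching algorithm (FSBMA), so it can be widely used in the low-power video coders required for wireless video communication.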
15.
VLSI architecture design for MPEG-4 motion compensation (cited by: 1; self-citations: 0; citations by others: 1)
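Motion compensation in MPEG-4 decoding involves complex control and high data throughput, and is difficult to implement. To address this, a hardware implementation scheme for motion compensation suited to MPEG-4 is proposed, which resolves difficult issues such as timing allocation and input/output control. The scheme has been described in VHDL in the Xilinx ISE 6.1i integrated development environment, and simulated and verified with electronic design automation (EDA) tools. Simulation and synthesis results show that the logic of the designed motion compensation processor is fully correct and that it meets the real-time coding requirements of MPEG-4 Core Profile & Level 2, so it can be used in VLSI implementations of MPEG-4.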
16.
Motion estimation (ME) is the most critical component of a video coding standard. H.264/AVC adopts variable block size motion estimation (VBSME) to obtain excellent coding efficiency, but the high computational complexity makes design difficult. This paper presents an effective processor chip for integer motion estimation (IME) in H.264/AVC based on the full-search block-matching algorithm (FSBMA). It uses an architecture with a configurable 2D systolic array to obtain high data reuse of the search area. This systolic array supports a three-direction scan format in which only one row of pixels changes between two adjacent subblocks, thus reducing memory accesses and saving clock cycles. A computing array of 64 PEs calculates the SADs of the basic 4×4 subblocks, and a modified Lagrangian cost is used as the matching criterion to find the best 41 variable-size blocks by means of a tree pipeline parallel architecture. Finally, a mode decision module uses serial data flow to find the best mode by comparing the total minimum Lagrangian costs. The IME processor chip was designed in UMC 0.18 μm technology, resulting in a circuit with only 32.3 k gates and 6 RAMs (59 kbits of on-chip memory in total). Under typical working conditions (25 °C, 1.8 V), a clock frequency of 300 MHz is estimated, with a processing capacity for HDTV (1920×1088 @ 30 fps) and a search range of 32×32.
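The reuse of 4×4 SADs for all 41 variable-size blocks can be illustrated in software. The following is a minimal sketch, not the chip's architecture: it computes the sixteen 4×4 SADs of one 16×16 macroblock for a single candidate position and sums them into the larger partitions. The Lagrangian cost term is omitted, and all names (`sad_4x4_grid`, `variable_block_sads`, `PARTITIONS`) are hypothetical.

```python
def sad_4x4_grid(cur, ref):
    """Return a 4x4 grid of SADs, one per 4x4 sub-block of a 16x16 macroblock."""
    grid = [[0] * 4 for _ in range(4)]
    for y in range(16):
        for x in range(16):
            grid[y // 4][x // 4] += abs(cur[y][x] - ref[y][x])
    return grid

# Partition shapes as (height, width) in units of 4x4 sub-blocks:
# 4x4, 4x8, 8x4, 8x8, 8x16, 16x8 and 16x16 pixels.
PARTITIONS = [(1, 1), (1, 2), (2, 1), (2, 2), (2, 4), (4, 2), (4, 4)]

def variable_block_sads(grid):
    """Sum 4x4 SADs into the SAD of every partition of the macroblock.

    Keys are (height, width, top, left) in pixels.
    """
    sads = {}
    for h, w in PARTITIONS:
        for by in range(0, 4, h):
            for bx in range(0, 4, w):
                total = sum(grid[y][x]
                            for y in range(by, by + h)
                            for x in range(bx, bx + w))
                sads[(4 * h, 4 * w, 4 * by, 4 * bx)] = total
    return sads

# Toy data: current block holds 0..255, reference block is all zeros.
cur = [[16 * y + x for x in range(16)] for y in range(16)]
ref = [[0] * 16 for _ in range(16)]
sads = variable_block_sads(sad_4x4_grid(cur, ref))
print(len(sads), "block SADs from 16 sub-block SADs")
```

Because every larger block is a sum of 4×4 results, the pixel differences are computed only once per candidate position, which is the property a tree of adders exploits in hardware.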
17.
Peng Li 《International Journal of Electronics》2013,100(9):1240-1255
Variable block size motion estimation (VBSME) is becoming the new coding technique in H.264/AVC. This article presents a low-power VLSI implementation for VBSME, which employs a fast full-search block-matching algorithm to reduce power consumption while preserving the optimal motion vectors (MVs). The fast full-search algorithm compares the current minimum sum of absolute differences (SAD) with a conservative lower bound so that unnecessary SAD calculations can be eliminated. We first experimentally determine the specific conservative lower bound of the SAD and then implement the fast full-search algorithm in FPGA and 0.18 µm CMOS technology. To the best of our knowledge, this is the first time that a fast full-search block-matching algorithm has been explored to reduce power consumption in the context of VBSME and implemented in hardware. Experimental results show that the proposed design reduces power consumption by 45% compared with conventional VBSME designs that obtain the optimal MV with full-search algorithms.
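The idea of skipping SAD calculations with a lower bound can be sketched as follows. The article's own bound is not given in the abstract, so this example substitutes a different, well-known bound: the absolute difference of the block pixel sums, which can never exceed the SAD (triangle inequality). Function names are hypothetical.

```python
def block_sum(img, top, left, n):
    """Sum of the n x n block of img whose top-left corner is (top, left)."""
    return sum(img[top + y][left + x] for y in range(n) for x in range(n))

def sad(cur, ref, top, left):
    """SAD between block cur and the same-size block of ref at (top, left)."""
    n = len(cur)
    return sum(abs(cur[y][x] - ref[top + y][left + x])
               for y in range(n) for x in range(n))

def fast_full_search(cur, ref):
    """Full search that skips candidates whose lower bound cannot win.

    Returns ((min_sad, top, left), number_of_sads_computed).
    """
    n = len(cur)
    cur_sum = block_sum(cur, 0, 0, n)
    best = None
    computed = 0
    for top in range(len(ref) - n + 1):
        for left in range(len(ref[0]) - n + 1):
            # Stand-in bound (an assumption): |sum(cur) - sum(ref block)| <= SAD.
            bound = abs(cur_sum - block_sum(ref, top, left, n))
            if best is not None and bound >= best[0]:
                continue  # SAD >= bound >= current minimum, so skip it
            computed += 1
            s = sad(cur, ref, top, left)
            if best is None or s < best[0]:
                best = (s, top, left)
    return best, computed

# Toy search area; the current block is cut from position (2, 3).
ref = [[3 * y * y + x * x for x in range(8)] for y in range(8)]
cur = [row[3:7] for row in ref[2:6]]
best, computed = fast_full_search(cur, ref)
print(best, computed)
```

A candidate is skipped only when its bound already matches or exceeds the current minimum, so the result is identical to an exhaustive search. In this sketch the block sums are recomputed for every candidate for clarity; a practical design would maintain them incrementally.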
18.
19.
《Microelectronics Journal》2014,45(11):1480-1488
In this paper, we present a coordinate rotation digital computer (CORDIC) based fast algorithm for power-of-two point DCT, and develop its corresponding efficient VLSI implementation. The proposed algorithm has several distinct advantages, such as regular Cooley-Tukey FFT-like data flow, an identical post-scaling factor, and arithmetic-sequence rotation angles. By using a trigonometric formula, the number of CORDIC types is reduced dramatically. This leads to an efficient method for overcoming the lack of synchronization among CORDICs with different rotation angles. By fully reusing the uniform processing cell (PE), only four carry save adder (CSA) based PEs of two different types are required for the 8-point DCT. Compared with other known architectures, the proposed 8-point DCT architecture has higher modularity, lower hardware complexity, higher throughput and better synchronization.
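For readers unfamiliar with CORDIC, the basic rotation-mode iteration is sketched below. This is the textbook algorithm in floating point, not the paper's DCT scheme; the function name and iteration count are assumptions. The accumulated `gain` is the scale factor that has to be divided out after the iterations, which is where a post-scaling factor comes from.

```python
import math

def cordic_rotate(x, y, angle, iterations=16):
    """Rotate (x, y) counterclockwise by `angle` radians with CORDIC steps.

    Converges for |angle| up to about 1.74 rad. Each step multiplies only
    by 2**-i, which is a plain shift in fixed-point hardware.
    """
    z = angle      # remaining angle still to be rotated
    gain = 1.0     # accumulated magnitude growth of the micro-rotations
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0
        step = 2.0 ** -i
        x, y = x - d * y * step, y + d * x * step
        z -= d * math.atan(step)
        gain *= math.sqrt(1.0 + step * step)
    # Post-scaling: remove the constant gain (about 1.6468).
    return x / gain, y / gain

cx, cy = cordic_rotate(1.0, 0.0, math.pi / 4)
print(cx, cy)  # both close to cos(pi/4) = sin(pi/4)
```

Because the gain depends only on the number of iterations and not on the angle, it can be folded into a single constant multiplication at the output.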
20.
Mohammad Hossein Moaiyeri, Reza Faghih Mirzaee, Tooraj Nikoubin, Omid Kavehei 《International Journal of Electronics》2013,100(6):647-662
Novel direct designs for the 3-input exclusive-OR (XOR) function at transistor level are proposed in this article. These designs are appropriate for low-power and high-speed applications. The critical path of the presented designs consists of only two pass-transistors, which results in low propagation delay. The basic structure of these designs uses neither complementary inputs nor VDD and ground connections. The proposed designs have low dynamic and short-circuit power consumption, and their internal nodes dissipate negligible leakage power, which leads to low average power consumption. Some effective approaches are presented for improving the performance, voltage levels and driving capability of the basic structure and for lowering its transistor count. All of the proposed designs and several classical and state-of-the-art 3-input XOR circuits are simulated under realistic conditions using HSPICE with 90 nm CMOS technology at six supply voltages, ranging from 1.3 V down to 0.8 V. The simulation results demonstrate that the proposed circuits are superior to the other designs in terms of speed, power consumption and power-delay product (PDP).