
1.

宋锐贾媛吴成柯张建龙《电路与系统学报》2008,13(2):112-117

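To meet the flexibility that video coding demands of motion estimation algorithms, this paper presents a novel motion estimation coprocessor built on a microcontroller (MCU) architecture, to which single-instruction multiple-data (SIMD) and pipelining techniques are applied. The processor's modules are functionally independent and run under the control of motion estimation algorithm code. A dedicated interpolation tree and accumulator tree for motion estimation, together with dedicated motion estimation instructions, allow the matching computation for one 8×8 block to complete in only 11 cycles. Data-link control and a dedicated DMA controller (双引DMA in the original) remove the data transfer bottleneck. The design is clear, flexible, and easy to implement in hardware, and it can be paired with different host processors to perform video coding. Hardware simulation and verification show that the coprocessor works stably at 50 MHz.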
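As a rough software illustration of the 8×8 block matching mentioned in this abstract, the sketch below computes a sum of absolute differences (SAD) row by row. SAD is assumed here as the matching criterion, since the abstract does not name one; the function names and the synthetic frames are hypothetical and say nothing about the coprocessor's instruction set or its 11-cycle timing.

```python
# Illustrative sketch only: a software model of 8x8 block matching by SAD.
# All names and the test pattern are hypothetical, not taken from the paper.

def sad_8x8(cur, ref, cx, cy, rx, ry):
    """SAD between the 8x8 block of `cur` at (cx, cy) and of `ref` at (rx, ry).

    Frames are lists of rows of integer pixel values.
    """
    total = 0
    for dy in range(8):
        cur_row = cur[cy + dy]
        ref_row = ref[ry + dy]
        # One row of eight absolute differences; hardware could form these
        # in parallel (SIMD) and reduce them with an adder tree.
        total += sum(abs(cur_row[cx + dx] - ref_row[rx + dx]) for dx in range(8))
    return total


def make_frame(width, height, shift=0):
    """Deterministic synthetic frame; `shift` moves the pattern to the right."""
    return [[((x - shift) * 7 + y * 13) % 256 for x in range(width)]
            for y in range(height)]


ref_frame = make_frame(32, 32)
cur_frame = make_frame(32, 32, shift=2)   # content moved 2 pixels to the right

# The block at (8, 8) in the current frame came from (6, 8) in the reference.
match_cost = sad_8x8(cur_frame, ref_frame, 8, 8, 6, 8)
same_position_cost = sad_8x8(cur_frame, ref_frame, 8, 8, 8, 8)
```

In this synthetic case the displaced candidate gives a cost of zero while the co-located candidate does not, which is the comparison a search loop would repeat over many candidates.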

2.

Design of a ZSP500-Based Motion Estimation Coprocessor

李勇明曾孝平刘玲蒲秀娟《电讯技术》2005,45(2):72-75

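Motion estimation accounts for a considerable share of the total computation in video processing. On this basis, a motion estimation coprocessor was designed for the ZSP500 to assist with the computation and improve the ZSP500's video processing performance. Experimental results show that the coprocessor markedly reduces the ZSP500's computational load for motion estimation and shortens the corresponding time, greatly improving the real-time performance of ZSP500 video processing. In addition, because the coprocessor is embedded in the ZSP500 as one of its parts, the level of integration is high.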

3.

A New Algorithm-Programmable Motion Estimation Coprocessor

刘锋庄奕琪何威《电路与系统学报》2007,12(5):126-130,125

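Targeting the requirements of fast search algorithms for block-matching motion estimation, this paper designs an algorithm-programmable coprocessor for motion estimation and motion compensation. The coprocessor adopts a hardware/software co-processing structure. Combining a flexible instruction set with efficient parallel hardware execution units gives it the advantages of both a programmable processor and a tree-structured motion estimation VLSI architecture, so it can meet the demands for both high processing efficiency and flexibility. The coprocessor is not tied to any one fast search algorithm: by changing its internal program code it can implement a variety of fast motion estimation algorithms, including TSS, DS, HEXBS, MVFAST, and EPZS, and it is highly extensible. Compared with similar designs, it is efficient, flexible, and algorithm-configurable, and it also consumes substantially fewer hardware resources.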
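To make one of the fast search algorithms named in this abstract concrete, here is a minimal sketch of the textbook three-step search (TSS) in software. It is not the coprocessor's program code; the SAD cost function, the 8×8 block size, the initial step of 4 (a ±7 search range), and all names are assumptions chosen for illustration.

```python
# Illustrative sketch only: textbook three-step search (TSS) over a SAD cost.
# Names, block size and step schedule are assumptions, not from the paper.

def sad(cur, ref, bx, by, mvx, mvy, n=8):
    """SAD between the n x n block of `cur` at (bx, by) and the block of
    `ref` displaced by the candidate motion vector (mvx, mvy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + mvy + y][bx + mvx + x])
    return total


def three_step_search(cur, ref, bx, by, n=8, step=4):
    """Return (best_mv, best_cost) after steps of 4, 2 and 1 pixels."""
    height, width = len(ref), len(ref[0])
    best_mv = (0, 0)
    best_cost = sad(cur, ref, bx, by, 0, 0, n)
    while step >= 1:
        centre_x, centre_y = best_mv
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                mvx, mvy = centre_x + dx, centre_y + dy
                # Skip candidates whose block would leave the reference frame.
                if not (0 <= bx + mvx <= width - n and 0 <= by + mvy <= height - n):
                    continue
                cost = sad(cur, ref, bx, by, mvx, mvy, n)
                if cost < best_cost:
                    best_cost, best_mv = cost, (mvx, mvy)
        step //= 2   # halve the step: 4 -> 2 -> 1 -> 0 (stop)
    return best_mv, best_cost


def pattern(x, y):
    """Deterministic synthetic pixel value."""
    return (x * 7 + y * 13) % 256


ref_frame = [[pattern(x, y) for x in range(40)] for y in range(40)]
# Every pixel of the current frame is taken from (x + 4, y - 4) in the reference.
cur_frame = [[pattern(x + 4, y - 4) for x in range(40)] for y in range(40)]

found_mv, found_cost = three_step_search(cur_frame, ref_frame, 16, 16)
```

TSS evaluates at most 25 candidates instead of the 225 a full ±7 search would visit, at the price of possibly settling on a local minimum on less regular content than this synthetic frame.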

4.

Communication Mechanism of the LS MPP Coprocessor and Its VLSI Implementation

李莉钱刚沈绪榜《微电子学与计算机》2002,19(9):52-56

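The article first introduces the communication mechanism of the LS MPP coprocessor, namely the design of its communication network, communication unit, and communication scheduling. The network uses mesh interconnection supplemented by broadcast interconnection along rows or columns, which suits image matching algorithms well. The communication unit (router) converts bit-parallel interconnection into serial interconnection, reducing chip design complexity and serving the goal of miniaturization. An appropriate instruction scheduling strategy lets router instructions execute in parallel with other instructions, so that data exchange is carried out implicitly, which largely resolves the communication bottleneck between PEs. The article then focuses on the routing of clock control signals in layout design.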

5.

A Low-Power Fault-Tolerant Motion Estimation Hardware Architecture

王洪源陈慕羿《现代电子技术》2009,32(22):1-3

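A low-power motion estimation hardware architecture is proposed. Building on the parallel GEA architecture, it applies an error-resilience mechanism to the critical sum-of-absolute-differences (SAD) module to counter logic-level timing errors that may arise under process parameter variation and/or voltage overscaling (VOS). A subsampling circuit, ISR-SSAD, integrates VOS and algorithm-level noise-tolerant design into the SAD module, providing error detection and correction for that module at lower power than the original parallel GEA architecture. Calculations show that the power of the whole motion estimation module can be reduced by 16%.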

6.

Design of an Embedded Coprocessor  Total citations: 1; self-citations: 0; citations by others: 1

梁政沈绪榜《微电子学与计算机》2001,18(5):21-24

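The article describes the structure and control scheme of the embedded coprocessor LSC87, which is fully compatible with the instruction functions of the Intel 8087. Development followed a top-down, fully forward design flow. Microprogramming was chosen to control the LSC87 data path so as to support all 7 types of fixed-point and floating-point operands and both masked and unmasked handling of the 6 exception types. Some data path components additionally use hardwired control, which gives the LSC87 good controllability for complex operations and also allows iterative numerical computations to be implemented simply and quickly.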

7.

Architecture and Design of a Media Processor Based on a Digital Signal Processor

高健陈杰《微电子学与计算机》2007,24(4):1-4

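A media processor with a new architecture for processing multimedia data such as images, audio, and video is designed. It consists of a general-purpose digital signal processor and a multimedia coprocessor, and its instruction set comprises general digital signal processing instructions plus extended multimedia processing instructions. The multimedia coprocessor contains several functional modules dedicated to multimedia processing, which accelerate that processing. The media processor can decode JPEG-compressed images, MP3 audio streams, or MPEG-2 MP@ML compressed video streams in real time.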

8.

A Smart Card RSA Cryptographic Coprocessor Based on the Montgomery Algorithm

刘丽蓓邵丙铣《微电子学》2003,33(5):399-402

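The Montgomery algorithm is improved, and an RSA cryptographic coprocessor suited to smart card applications is presented, implemented in the form of a RISC microprocessor. The core of the device uses a parallel pipelined structure of two 32-bit multipliers; its functional units operate concurrently, and instruction execution is also pipelined. At a clock frequency of 10 MHz, encrypting a 1024-bit plaintext takes only 3 ms on average, and decryption takes 177 ms on average.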
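For readers unfamiliar with the underlying technique, the sketch below shows the textbook Montgomery reduction and a square-and-multiply modular exponentiation built on it. It is the standard algorithm, not the improved variant the abstract refers to (which is not described there), and it models none of the 32-bit multiplier pipeline; all function names are hypothetical, and the tiny RSA key is a common classroom example.

```python
# Illustrative sketch only: textbook Montgomery reduction and exponentiation.
# Not the paper's improved algorithm; all names are hypothetical.

def montgomery_setup(n, k):
    """For odd n < R = 2**k, return n_prime with n * n_prime = -1 (mod R)."""
    r = 1 << k
    return (-pow(n, -1, r)) % r   # modular inverse via pow needs Python 3.8+


def redc(t, n, k, n_prime):
    """Montgomery reduction: t * R^-1 mod n for 0 <= t < n * R, R = 2**k.

    Only shifts, masks and multiplications are used; no division by n.
    """
    mask = (1 << k) - 1
    m = ((t & mask) * n_prime) & mask
    u = (t + m * n) >> k          # exact: t + m*n is a multiple of R
    return u - n if u >= n else u


def mont_pow(base, exp, n):
    """Compute base**exp mod n (n odd, n > 1) with Montgomery multiplications."""
    k = n.bit_length()
    n_prime = montgomery_setup(n, k)
    r2 = ((1 << k) ** 2) % n
    x = redc((base % n) * r2, n, k, n_prime)   # base in Montgomery form
    acc = redc(r2, n, k, n_prime)              # 1 in Montgomery form (R mod n)
    for bit in bin(exp)[2:]:                   # left-to-right square and multiply
        acc = redc(acc * acc, n, k, n_prime)
        if bit == "1":
            acc = redc(acc * x, n, k, n_prime)
    return redc(acc, n, k, n_prime)            # leave Montgomery form


# Toy RSA key: n = 61 * 53, e = 17, d = 2753 (far too small for real use).
n_demo, e_demo, d_demo = 3233, 17, 2753
cipher = mont_pow(65, e_demo, n_demo)
plain = mont_pow(cipher, d_demo, n_demo)
```

The appeal for hardware is that each reduction replaces a division by the modulus with masking and shifting by a power of two.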

9.

Design and Implementation of the System Control Coprocessor in a 32-bit Embedded CPU

金钊《电子设计应用》2006,(10):97-98,100

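The system control coprocessor is a required unit in CPUs of the MIPS architecture. Its main function is to record the current CPU state in a set of privileged registers, handle exceptions and interrupts, and provide the environment that instructions need to execute normally. This paper discusses the design of the system control coprocessor in a CPU implementing the MIPS 4Kc instruction set, including the implementation of writes to privileged registers, the precise exception handling mechanism, and the full-custom back-end physical design.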

10.

Design of a Real-Time Ultra-High-Definition H.264/AVC Encoding System

邓刚《电视技术》2014,38(15)

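Exploiting the parallel computing capability of multi-core processors, a real-time H.264/AVC video encoding system for ultra-high-definition resolution (3840×2160) is designed and implemented. The system provides efficient memory management at the raw pixel input; the ultra-high-definition encoder uses frame-level, slice-level, and instruction-level parallelism; and the bitstream output uses a FIFO buffer to control the transmission rate of RTP packets. Experimental results show that the system can encode an ultra-high-definition video source in parallel in real time and transmit it in RTP encapsulation to an IP network, where users can receive and play it back with a video player.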

11.

A Real-Time Video Compression Encoder Based on a Dual-DSP Architecture

张旭东王德生关鸿杰纪秀泉《信号处理》1998,(1)

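This paper discusses the design and implementation of a real-time video compression encoder, proposes a system architecture consisting of a DSP plus a motion coprocessor, gives a method for achieving half-pixel-precision motion estimation with an integer-pixel-precision motion estimator, and lists the performance figures the system can achieve.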

12.

Scalable Parallel Memory Architectures for Video Coding  Total citations: 1; self-citations: 0; citations by others: 1

Jarno K. Tanskanen Jarkko T. Niittylahti 《The Journal of VLSI Signal Processing》2004,38(2):173-199

Current video compression standards, which process frames macroblock by macroblock, employ several processing functions to achieve the compression. These functions refer to the data memory address space in different ways. For example, motion estimation and motion compensation often require data accesses that are unaligned to word boundaries. On the other hand, the Discrete Cosine Transform (DCT) and its inverse (IDCT) for an 8 × 8 block can be performed first for rows and then for columns, so a transposition is needed between these two stages. Among other things, a parallel memory architecture can provide a solution for these tasks. In our other paper, we briefly surveyed parallel memory architectures and proposed parallel memory architecture designs for different data path widths for video coding applications. In this paper, we construct video coding function examples that use the proposed parallel data memory efficiently. Furthermore, the performance and implementation cost of the parallel memory architecture are estimated and compared to more conventional memory architectures. The examples are given for different data bus widths (16, 32, 64, and 128 bits). We show that the parallel memory can keep the data path fully utilized in many video coding function implementations. This ensures high-speed operation and full utilization of the processing resources.
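The row-then-column DCT with an intermediate transposition that this abstract mentions can be sketched in software as follows. This is a plain floating-point model using the orthonormal DCT-II, chosen here as an assumption; it does not model the paper's parallel memory, and the function names are hypothetical.

```python
# Illustrative sketch only: separable 8x8 DCT done as rows, transpose, rows.
# Orthonormal DCT-II is assumed; names are hypothetical, not from the paper.
import math

N = 8


def dct_1d(values):
    """Orthonormal N-point DCT-II of one row of N samples."""
    out = []
    for k in range(N):
        acc = sum(values[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * N))
                  for i in range(N))
        scale = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
        out.append(scale * acc)
    return out


def transpose(block):
    """Swap rows and columns of a square block."""
    return [list(column) for column in zip(*block)]


def dct_2d_row_column(block):
    """2-D DCT as two passes of the same 1-D row transform."""
    rows_done = [dct_1d(row) for row in block]            # stage 1: rows
    columns_as_rows = transpose(rows_done)                # transposition step
    columns_done = [dct_1d(row) for row in columns_as_rows]  # stage 2: columns
    return transpose(columns_done)                        # back to row order


flat_block = [[16] * N for _ in range(N)]
flat_coeffs = dct_2d_row_column(flat_block)   # only the DC term should be non-zero
```

Because both passes reuse the same row routine, the transposition is the only step that needs column-wise access, which is the access pattern a memory architecture would have to serve efficiently.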

13.

Design of a Motion Estimator with an Improved Odd-Even Array Structure

陈孙阳陈颖琪《信息技术》2010,34(8):1-5

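A motion estimator architecture with an improved odd-even array computing structure is proposed. The estimator exploits two-dimensional data reuse and can implement full search. State-machine control logic is designed for the estimator, under which its processing elements reach 100% utilization. The estimator performs high-speed parallel computation and can therefore be used in applications such as real-time post-processing of high-definition video.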

14.

A Level-D Data-Reuse Integer Motion Estimation VLSI Architecture for H.264/AVC

郑兆青桑红石黄卫锋沈绪榜《电子学报》2007,35(10):1921-1926

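This paper proposes a Level-D data-reuse integer motion estimation VLSI architecture for H.264/AVC. Built on a fixed-block-size motion estimation VLSI architecture, it uses a crossover network to compute variable block sizes and organizes memory in multiple banks, which keeps the on-chip memory read/write rules simple and makes it easy to handle motion estimation for different search ranges and video sizes. The architecture is described in Verilog HDL and was logic-synthesized with Synopsys DC using the HJTC 0.18 μm process. Compared with existing architectures, the added on-chip memory gives it a high data reuse rate, which greatly reduces memory bandwidth requirements; it also has high data throughput and can meet the needs of high-performance video coding.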

15.

The spring scheduling coprocessor: a scheduling accelerator

Burleson W. Ko J. Niehaus D. Ramamritham K. Stankovic J.A. Wallace G. Weems C. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》1999,7(1):38-47

The spring scheduling coprocessor is a novel very large scale integration (VLSI) accelerator for multiprocessor real-time systems. The coprocessor can be used for static as well as online scheduling. Many different policies and their combinations can be used (e.g., earliest deadline first, highest value first, or resource-oriented policies such as earliest available time first). In this paper, we describe a coprocessor architecture, a CMOS implementation, an implementation of the host/coprocessor interface and a study of the overall performance improvement. We show that the current VLSI chip speeds up the main portion of the scheduling operation by over three orders of magnitude. We also present an overall system improvement analysis by accounting for the operating system overheads and identify the next set of bottlenecks to improve. The scheduling coprocessor includes several novel VLSI features. It is implemented as a parallel architecture for scheduling that is parameterized for different numbers of tasks, numbers of resources, and internal wordlengths. The architecture was implemented using a single-phase clocking style in several novel ways. The 328 000 transistor custom 2-μm VLSI accelerator running with a 100-MHz clock, combined with careful hardware/software co-design, results in a considerable performance improvement, thus removing a major bottleneck in real-time systems.

16.

Hardware implementation and validation of the fast variable block size motion estimation architecture for H.264/AVC

A. Ben Atitallah S. Arous H. Loukil N. Masmoudi 《AEUE-International Journal of Electronics and Communications》2012,66(8):701-710

Block matching motion estimation is the heart of a video coding system. It leads to a high compression ratio, but it is time consuming and computationally intensive. Many fast search block matching motion estimation algorithms have been developed to minimize the number of search positions and speed up computation, but they do not take into account how they can be implemented effectively in hardware. In this paper, we propose an efficient hardware architecture for the fast line diamond parallel search (LDPS) algorithm with variable block size motion estimation (VBSME) for an H.264/AVC video coding system. The design is described in VHDL and synthesized to an Altera Stratix III FPGA and to TSMC 0.18 μm standard cells. The throughput of the hardware architecture reaches a processing rate of up to 78 million pixels per second at a clock frequency of 83.5 MHz, and the design uses only 28 kgates when mapped to standard cells. Finally, a system on a programmable chip (SoPC) implementation and validation of the proposed design as an IP core is presented using the embedded video system.

17.

A VLSI architecture for variable block size video motion estimation  Total citations: 1; self-citations: 0; citations by others: 1

Swee Yeow Yap McCanny J.V. 《Circuits and Systems II: Express Briefs, IEEE Transactions on》2004,51(7):384-389

With the advent of new video standards such as MPEG-4 part-10 and H.264/H.26L, demands for advanced video coding, particularly in the area of variable block size video motion estimation (VBSME), are increasing. In this paper, we propose a new one-dimensional (1-D) very large-scale integration architecture for full-search VBSME (FSVBSME). The VBS sum of absolute differences (SAD) computation is performed by re-using the results of smaller sub-block computations. These are distributed and combined by incorporating a shuffling mechanism within each processing element. Whereas a conventional 1-D architecture can process only one motion vector (MV), this new architecture can process up to 41 MV sub-blocks (within a macroblock) in the same number of clock cycles.
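The reuse idea in this abstract, building larger-block SADs from smaller ones, can be illustrated in software. The sketch below computes the sixteen 4×4 SADs of a 16×16 macroblock once and sums them into the seven H.264 partition sizes, which gives 16 + 8 + 8 + 4 + 2 + 2 + 1 = 41 values. It does not model the paper's 1-D array or its shuffling mechanism, and all names are hypothetical.

```python
# Illustrative sketch only: 41 variable-block-size SADs from sixteen 4x4 SADs.
# Names are hypothetical; the paper's hardware data flow is not modelled.

def sads_4x4(cur_mb, ref_mb):
    """Return a 4x4 grid holding the SAD of each 4x4 sub-block of a
    16x16 macroblock (both arguments are 16 rows of 16 pixel values)."""
    grid = [[0] * 4 for _ in range(4)]
    for y in range(16):
        for x in range(16):
            grid[y // 4][x // 4] += abs(cur_mb[y][x] - ref_mb[y][x])
    return grid


def combine(grid, w, h):
    """SADs of all partitions that are w x h in units of 4x4 blocks,
    listed in raster order, using only additions of the 4x4 results."""
    return [sum(grid[r + dr][c + dc] for dr in range(h) for dc in range(w))
            for r in range(0, 4, h) for c in range(0, 4, w)]


def all_partition_sads(cur_mb, ref_mb):
    """Map (width, height) in pixels to the list of SADs for that size."""
    grid = sads_4x4(cur_mb, ref_mb)
    sizes = [(4, 4), (8, 4), (4, 8), (8, 8), (16, 8), (8, 16), (16, 16)]
    return {(w, h): combine(grid, w // 4, h // 4) for (w, h) in sizes}


cur_demo = [[10] * 16 for _ in range(16)]
ref_demo = [[7] * 16 for _ in range(16)]      # every pixel differs by 3
demo_sads = all_partition_sads(cur_demo, ref_demo)
demo_count = sum(len(values) for values in demo_sads.values())
```

Each pixel difference is computed once, so the larger partitions cost only a handful of extra additions per candidate position.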

18.

A Low Power Architecture for HASM Motion Tracking

Wael Badawy Magdy Bayoumi 《The Journal of VLSI Signal Processing》2004,37(1):111-127

This paper proposes a low power VLSI architecture for motion tracking that can be used in online video applications such as MPEG and VRML. The proposed architecture uses a hierarchical adaptive structured mesh (HASM) concept that generates a content-based video representation. The developed architecture shows the significant reduction in power consumption that is inherent in the HASM concept. The proposed architecture consists of two units: a motion estimation unit and a motion compensation unit. The motion estimation (ME) architecture generates a progressive mesh code that represents a mesh topology and its motion vectors. ME reduces the power consumption because (1) it implements a successive splitting strategy to generate the mesh topology, which allows a pipelined implementation of the processing elements; (2) it approximates the motion vectors of the mesh nodes using the three step search algorithm; and (3) it uses parallel units that reduce the power consumption at a fixed throughput. The motion compensation (MC) architecture processes a reference frame, mesh nodes and motion vectors to predict a video frame, using affine transformation to warp the texture with different mesh patches. The MC reduces the power consumption because (1) it uses a multiplication-free algorithm for affine transformation and (2) it uses parallel threads in which each thread implements a pipelined chain of scalable affine units to compute the affine transformation of each patch. The architecture has been prototyped using a top-down low-power design methodology. The performance of the architecture has been analyzed in terms of video construction quality, power and delay.

19.

Hardware design focusing in the tradeoff cost versus quality for the H.264/AVC fractional motion estimation targeting high definition videos

Gustavo Sanchez Marcel Corrêa Diego Noble Marcelo Porto Sergio Bampi Luciano Agostini 《Analog Integrated Circuits and Signal Processing》2012,73(3):931-944

This article presents an architecture for the fractional motion estimation (FME) of the H.264/AVC video coding standard, focusing on a good tradeoff between hardware cost and video quality. Support for FME guarantees high quality in the motion estimation process. The applied algorithmic simplifications, together with the multiplierless implementation and a well balanced pipeline, allow a low cost and high throughput solution. The architecture was also designed to avoid redundant external memory accesses when computing the FME. The design was divided into two main modules: integer motion estimation (with the diamond search algorithm) and fractional refinement (half-pixel and quarter-pixel interpolation and search). The designed architecture was described in VHDL and synthesized to an Altera Stratix III FPGA. The architecture is able to reach 260 MHz when running in the target FPGA. In the worst-case scenario, this operating frequency allows a processing rate of 43 HD 1080p (1,920 × 1,080 pixels) frames per second, surpassing the requirements for real time processing. In comparison to related works, the developed architecture was able to achieve a good tradeoff among hardware costs, video quality and processing rate.

20.

A MIMD-based video signal processing architecture suitable for large area integration and a 16.6-cm² monolithic implementation

Herrmann K. Otterstedt J. Jeschke H. Kuboschek M. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》1998,6(2):284-291

The architecture and implementation of a programmable video signal processor intended as a building block of a multiple instruction multiple data (MIMD)-based bus-connected multiprocessor system is presented. This system can either be constructed from several single processor chips, or it can be integrated on a large area integrated circuit containing several processors. The processor allows an efficient implementation of different video coding standards like H.261, H.263, MPEG-1 and MPEG-2. It consists of a RISC processor supplemented by a coprocessor for computation intensive convolution-like tasks, which provides a peak performance of more than 1 giga-arithmetic operations per second (GOPS). A large area integrated circuit integrating 9 processor elements (PEs) on an area of 16.6 cm² has been designed. Due to yield considerations, redundancy concepts have been implemented that, even in the presence of production defects, result in working chips utilizing a lower number of PEs. Each PE has built-in self-test (BIST) capabilities, which allow for an independent test of itself under the control of its integrated fault-tolerant BIST controller. Defective PEs are switched off. Only the PEs passing the BIST are used for video processing tasks. Prototypes have been fabricated in a 0.8 μm complementary metal-oxide-semiconductor (CMOS) process structured by masks using wafer stepping with overlapping exposures. Employing redundancy, up to 6 PEs per chip were functional at 66 MHz, thus providing a peak arithmetic performance of up to 6 GOPS.