
1.

Scalable Architecture for SoC Video Encoders

Tero Kangas Timo D. Hämäläinen Kimmo Kuusilinna 《The Journal of VLSI Signal Processing》2006,44(1-2):79-95

Evolving video coding standards demand functional flexibility from implementations, not only at design time but also after fabrication. This paper presents a System-on-Chip design approach with a feasible combination of performance, scalability, programmability, area efficiency, and design-time effort for a video encoder. The encoder is based on a homogeneous master-slave processor architecture. Each slave encodes a part of the frame in the Single Program Multiple Data (SPMD) data-parallel model. Both shared and distributed memory architectures are presented. Design effort is reduced by identical program codes, automated assembly of software and hardware modules independent of the number and type of processors, and our flexible on-chip communication network called Heterogeneous IP Block Interconnection (HIBI). A case study implementation with two to ten simple ARM7 processors, a 32-bit HIBI bus, and non-optimized processor-independent software achieves 6 to 53 fps for QCIF. The whole encoder area ranges from 173 to 770 kgates, excluding the memories. The relation scales reasonably well to systems with more powerful processors and optimized code. The optimization of the communication network shows that with more than six slaves even a serial HIBI connection with 100 MHz speed is feasible. HIBI and the parallelization approach allow exploration and optimization of the communication at both the application and architecture layers.

Tero Kangas, MSc ’01, Tampere University of Technology (TUT). Since 1999 he has been working as a research scientist in the Institute of Digital and Computer Systems (DCS) at TUT. He is currently working towards his PhD degree, and his main research topics are system architectures and SoC design methodologies in multimedia applications. Kimmo Kuusilinna, PhD ’01, TUT. His main research interests include system-level design and verification, interconnection networks, and parallel memories. He is currently working as a senior research engineer at the Nokia Research Center. Timo D. Hämäläinen, MSc ’93, PhD ’97, TUT. He acted as a senior research scientist and project manager at TUT in 1997-2001. He was nominated to full professor at TUT/Institute of Digital and Computer Systems in 2001. He heads the DACI research group, which focuses on three main lines: wireless local area networking and wireless sensor networks, high-performance DSP/HW based video encoding, and interconnection networks with design flow tools for heterogeneous SoC platforms.
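The SPMD idea in the abstract above, where every slave runs the same program on its own part of the frame, can be illustrated with a small sketch. The abstract does not say how the frame is divided, so the contiguous, near-equal split of macroblocks below is an assumption, and the function name `partition_macroblocks` is hypothetical.

```python
def partition_macroblocks(num_macroblocks, num_slaves):
    """Split macroblock indices 0..num_macroblocks-1 into contiguous,
    near-equal half-open ranges (start, end), one per slave.

    Assumption: the paper's actual partitioning scheme is not given in
    the abstract; this is only one plausible SPMD work split.
    """
    base, extra = divmod(num_macroblocks, num_slaves)
    ranges = []
    start = 0
    for slave in range(num_slaves):
        # The first `extra` slaves take one additional macroblock.
        count = base + (1 if slave < extra else 0)
        ranges.append((start, start + count))
        start += count
    return ranges


# QCIF is 176 x 144 pixels, i.e. 11 x 9 = 99 macroblocks of 16 x 16 pixels.
QCIF_MACROBLOCKS = (176 // 16) * (144 // 16)

if __name__ == "__main__":
    for n in (2, 4, 10):
        print(n, partition_macroblocks(QCIF_MACROBLOCKS, n))
```

Because 99 macroblocks do not divide evenly among most slave counts, some slaves receive one macroblock more than others, which is one reason throughput need not scale perfectly linearly with the number of slaves.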

2.

Automated design of networks of transport-triggered architecture processors using dynamic dataflow programs

Hervé Yviquel Jani Boutellier Mickaël Raulet Emmanuel Casseau 《Signal Processing: Image Communication》2013,28(10):1295-1302


3.

Design and Synthesis of a Multiprocessor System-on-Chip Architecture for Real-Time Biomedical Signal Processing in Gamma Cameras

Kai Sun Meng Wang Zili Shao Hui Liu Hongxing Wei Tianmiao Wang 《Journal of Signal Processing Systems》2010,59(1):71-83

MPSoC (Multi-Processor System-on-Chip) architecture is increasingly used because it gives designers many more opportunities to meet specific performance and power goals. In this paper, we propose an MPSoC architecture for implementing real-time signal processing in a gamma camera. Based on a full analysis of the characteristics of the application, we design several algorithms to optimize the system in terms of processing speed, power consumption, and area cost. Two types of DSP core have been designed for the integral algorithm and the coordinate algorithm, the key parts of signal processing in a gamma camera. An interconnection synthesis algorithm is proposed to reduce the area cost of the Network-on-Chip. We implement our MPSoC architecture on FPGA, and synthesize the DSP cores and Network-on-Chip using Synopsys Design Compiler with a UMC 0.18 μm standard cell library. The results show that our technique can effectively accelerate the processing and satisfy the requirements of real-time signal processing for 256 × 256 image construction.

4.

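Design and Research of a Heterogeneous Multiprocessor System-on-Chip

邵利群 张文婷 《中国集成电路》2008,17(3):49-52

As the feature size of integrated circuit processes shrinks and circuit scale keeps growing, the number of devices integrated on a single chip is increasing exponentially. The traditional SoC architecture has run into bottlenecks in improving overall system performance, and multi-core system design is becoming one of the current research hotspots in integrated circuit design. Symmetric multiprocessor systems-on-chip can greatly increase system parallelism, but they do not deliver optimal performance in some complex application domains. This paper studies the technical advantages of heterogeneous multi-core systems over homogeneous ones by integrating several different processor cores on a single chip.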

5.

Enhanced pipelined architecture of H.264/AVC intra prediction

《Signal Processing: Image Communication》2016

This paper presents a high-performance encoder for H.264/AVC intra prediction. Due to the long data dependency loop of intra 4×4 prediction and complex algorithms, improving encoding speed is a major obstacle. To solve this problem, we first propose a pipelined method within and between macroblocks, with a new block processing order, to accelerate encoding. Thanks to the pipelined method, reconstructed pixels of up-right blocks become available for two blocks in a macroblock that cannot use them in JM. The diagonal down-left and vertical-left modes are therefore effective for these two blocks, which ultimately achieves a better bit rate. Second, a formula-sharing method across all 4×4 modes is proposed to reduce redundancy among the prediction formulas. Third, a streamlined reconstruction method is applied to improve reconstruction performance. A CAVLC encoder with three parallel units is proposed to improve entropy coding speed significantly. As a result, it takes 268 cycles to encode a macroblock. Experimental results indicate that, synthesized with a 0.18 µm CMOS cell library, the new architecture requires only about 238K gates and can encode 1080p HD video sequences at 30 frames per second (fps) at an operating frequency of 56 MHz.

6.

VLSI Architecture Design of Fractional Motion Estimation for H.264/AVC

Yi-Hau Chen Tung-Chien Chen Shao-Yi Chien Yu-Wen Huang Liang-Gee Chen 《Journal of Signal Processing Systems》2008,53(3):335-347

The H.264/AVC Fractional Motion Estimation (FME) with rate-distortion constrained mode decision can improve the rate-distortion efficiency by 2–6 dB in peak signal-to-noise ratio. However, it comes with considerable computational complexity, so acceleration by dedicated hardware is a must for real-time applications. The main difficulty for FME hardware implementation is parallel processing under the constraints of sequential flow and data dependency. We analyze seven inter-correlative loops extracted from the FME procedure and provide decomposition methodologies to obtain an efficient projection onto hardware. Two techniques, 4×4 block decomposition and efficient vertical scheduling, are proposed to reuse data among the variable block sizes and to improve hardware utilization. In addition, advanced architectures are designed to efficiently integrate the 6-tap 2D finite impulse response filter, residue generation, and 4×4 Hadamard transform into a fully pipelined architecture. This design is finally implemented and integrated into an H.264/AVC single-chip encoder that supports real-time encoding of 720×480 30 fps video with four reference frames at an operating frequency of 81 MHz with 405 K logic gates (41.9% of the encoder area).
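The residue generation and 4×4 Hadamard transform mentioned in the abstract above are commonly combined into a sum of absolute transformed differences (SATD) cost. The following is a minimal software sketch of that cost for one 4×4 block, not the paper's pipelined hardware; the function names are hypothetical, and the normalization that reference software typically applies (for example a division by 2) is deliberately left out.

```python
# 4x4 Hadamard matrix (symmetric, so H == H transposed).
H4 = [
    [1,  1,  1,  1],
    [1,  1, -1, -1],
    [1, -1, -1,  1],
    [1, -1,  1, -1],
]


def matmul4(a, b):
    """Multiply two 4x4 integer matrices given as lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def satd4x4(current, prediction):
    """Unnormalized SATD of one 4x4 block: sum(|H * (cur - pred) * H|)."""
    # Residue generation: pixel-wise difference between source and prediction.
    residue = [[current[i][j] - prediction[i][j] for j in range(4)]
               for i in range(4)]
    # 2-D Hadamard transform of the residue.
    transformed = matmul4(matmul4(H4, residue), H4)
    return sum(abs(v) for row in transformed for v in row)


if __name__ == "__main__":
    cur = [[3] * 4 for _ in range(4)]
    pred = [[2] * 4 for _ in range(4)]
    print(satd4x4(cur, pred))
```

A constant residue concentrates all its energy in the single DC coefficient, which is why a flat difference of 1 across the block gives a cost of 16 here.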


7.

Architecture Design of Fine Grain Quality Scalable Encoder with CABAC for H.264/AVC Scalable Extension

Tzu-Der Chuang Yu-Jen Chen Yi-Hau Chen Shao-Yi Chien Liang-Gee Chen 《Journal of Signal Processing Systems》2010,60(3):363-375

In addition to coding efficiency, the scalable extension of H.264/AVC provides good functionality for video adaptation in heterogeneous environments. Fine grain scalability (FGS) is a technique for extracting a video bitstream at the finest quality level under the given bandwidth. In this paper, an FGS encoder architecture with low external memory bandwidth and low hardware cost is proposed. Up to 99% bandwidth reduction can be attained by the proposed scan bucket algorithm, early context modeling with context reduction, and first scan pre-encoding. The area-efficient hardware architecture is implemented by layer-wise hardware reuse. In addition, three design strategies for the enhancement layer coder are explored to allow a trade-off between external memory bandwidth and silicon area. The proposed hardware architecture can encode HDTV 1920×1080 video with two FGS enhancement layers in real time at a working frequency of 200 MHz, or HDTV 1280×720 video with three FGS enhancement layers at 130 MHz.

8.

Deep chroma prediction of Wyner–Ziv frames in distributed video coding of wireless capsule endoscopy video

《Journal of Visual Communication and Image Representation》2022

Compression of captured video frames is crucial for saving power in wireless capsule endoscopy (WCE). A low-complexity encoder is desired to limit the power consumed in compressing the WCE video. The distributed video coding (DVC) technique is well suited to designing a low-complexity encoder. In this technique, frames captured in the RGB colour space are converted into the YCbCr colour space. In existing DVC techniques proposed for WCE video compression, both the Y (luma) and CbCr (chroma) components of the Wyner–Ziv (WZ) frames are processed and encoded. In WCE video, consecutive frames exhibit strong similarity in texture and colour properties. The proposed work uses these properties to present a method that processes and encodes only the luma component of a WZ frame. The chroma components of the WZ frame are predicted at the decoder by an encoder–decoder based deep chroma prediction model that matches the luma and texture information of the keyframe and the WZ frame. The proposed method reduces the computations required for encoding and transmitting the WZ chroma components. The results show that the proposed DVC with a deep chroma prediction model performs better than motion JPEG and existing DVC systems for WCE, at reduced encoder complexity.
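The RGB to YCbCr step mentioned in the abstract above separates luma (Y) from chroma (Cb, Cr), which is what makes it possible to encode only the luma of a WZ frame. The abstract does not state which conversion matrix is used, so the sketch below assumes the full-range BT.601 coefficients used by JFIF; the function name `rgb_to_ycbcr` is illustrative.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to 8-bit YCbCr.

    Assumption: full-range BT.601 (JFIF) coefficients; the paper's exact
    conversion is not given in the abstract.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    # Round to integers and clamp to the 8-bit range.
    return tuple(min(255, max(0, round(v))) for v in (y, cb, cr))


def luma_only(frame_rgb):
    """Keep only the Y plane of a frame given as rows of (r, g, b) tuples,
    mimicking an encoder that transmits luma and leaves chroma to the decoder."""
    return [[rgb_to_ycbcr(*px)[0] for px in row] for row in frame_rgb]


if __name__ == "__main__":
    print(rgb_to_ycbcr(255, 0, 0))
    print(luma_only([[(0, 0, 0), (255, 255, 255)]]))
```

Grey pixels map to Cb = Cr = 128, so all colour information sits in the two chroma planes that the proposed method avoids encoding.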

9.

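Design of an IP-Based MPEG-4 Video Encoder and Its Application in Emergency Communications

王琦 王永生 李文峰 《微电子学与计算机》2005,22(12):150-153

The MPEG-4 video compression standard and the principles of video capture and encoding are introduced, and a platform is built on the TMS320C6211 digital signal processor (DSP). The hardware and software of an MPEG-4 video encoder with IP-based transmission are designed for emergency multimedia communication in mine rescue, with emphasis on algorithm optimization methods. Results from practical application are given.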

10.

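Hardware Design of the Intra Prediction Module in an AVS Encoder

黄圣勋 王法翔 钟昌标 《电视技术》2013,37(17)

Based on the characteristics of the intra prediction algorithm in the AVS standard, a hardware design of intra prediction for a real-time high-definition AVS encoder is proposed. In this design, luma and chroma prediction share one prediction unit, and a six-way data-parallel pipelined structure raises the processing speed. In addition, building on an analysis of the algorithm of each AVS intra prediction mode, shift-register operations are used to further share resources among the computation units of the modes, which simplifies the reference data selection mechanism and reduces resource consumption. Experimental results show that the design fully meets the real-time encoding requirements of high-definition video (1920 × 1080, 30 frames/s).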

11.

Low-Power Parallel Video Compression Architecture for a Single-Chip Digital CMOS Camera

Jeff Y.F. Hsieh Teresa H.Y. Meng 《The Journal of VLSI Signal Processing》1999,21(3):195-207

A low-power, large-scale parallel video compression architecture for a single-chip digital CMOS camera is discussed in this paper. This architecture is designed for the highly computationally intensive image and video processing tasks necessary to support video compression. Two designs of this architecture, an MPEG2 encoder and a DV encoder, are presented. At an image resolution of 640 × 480 pixels (MPEG2) and 720 × 576 pixels (DV) and a frame rate of 25 to 30 frames per second, a computational throughput of up to 1.8 billion operations per second (BOPS) is required. This is supported in the proposed architecture using a 40 MHz clock and an array of 40 to 45 parallel processors implemented in a 0.2 μm CMOS technology with a 1.5 V supply voltage. Power consumption is significantly reduced through the single-chip integration of the CMOS photo sensors, the embedded DRAM technology, and the proposed pipelined parallel processors. The parallel processors consume approximately 45 mW of power, resulting in a power efficiency of 40 BOPS/W.

12.

A 64 mW High Picture Quality H.264/MPEG-4 Video Codec IP for HD Mobile Applications in 90 nm CMOS

《Solid-State Circuits, IEEE Journal of》2008,43(11):2354-2362

We have developed an H.264/MPEG-4 dual video codec IP for mobile applications such as digital still cameras (DSCs), digital video cameras (DVCs), and mobile phones. The codec is capable of encoding and decoding HD-sized moving pictures (1280 pixels by 720 lines at 30 fps) in real-time at an operating frequency of 144 MHz, and SD-sized pictures at 54 MHz. We have implemented our original architecture based on a macroblock-level pipeline method and encoding algorithms suitable for the architecture in the codec, which enable low power of 64 mW for HD encoding with high picture quality equivalent to that of the H.264 reference encoder “JM (Joint Model)”.

13.

Intra Prediction for the Hardware H.264/AVC High Profile Encoder

Mikołaj Roszkowski Grzegorz Pastuszak 《Journal of Signal Processing Systems》2014,76(1):11-17

The hardware implementation of the intra prediction described in this paper allows the H.264/AVC encoder to achieve optimal compression efficiency in real-time conditions. The architecture has some features that distinguish it from other solutions described in the literature. Firstly, the architecture supports all intra prediction modes defined in the High Profile of the H.264/AVC standard for all chroma formats. Secondly, the architecture can generate predictions for several quantization parameters. Thirdly, the hardware cost is reduced as the same resources are used to compute prediction samples for all the modes. Fourthly, the high sample-generation rate enables the encoder to achieve high throughputs. Fifthly, 4 × 4 block reordering and interleaving with other modes minimize the impact of the long-delay reconstruction loop on the encoder throughput. The architecture is verified against the JM.12 reference model and within the real-time FPGA hardware encoder. The synthesis results show that the design can operate at 100 MHz and 200 MHz for FPGA Arria II and 0.13 μm TSMC technology, respectively. These frequencies allow the encoder to support 720p and 1080p video at 30 fps.

14.

A VLSI architecture for a real-time code book generator and encoder of a vector quantizer

Tsang K. Wei B.W.Y. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》1994,2(3):360-364

Image compression applications use vector quantization (VQ) for its high compression ratio and image quality. Current VQ hardware employs static rather than dynamic code book generation, because the latter demands intensive computation and correspondingly expensive hardware, even though it offers better image quality. This paper describes a VLSI architecture for a real-time dynamic code book generator and encoder of 512×512 images at 30 frames/s. The four-chip 0.8 μm CMOS design implements a tree of Kohonen self-organizing maps, and consists of two VQ processors and two image buffer memory chips. The pipelined VQ processor contains a computational core for both code book generation and encoding, and is scalable to processing larger frames.

15.

Performance of H.26L Video Encoder on General-Purpose Processor (total citations: 2; self-citations: 0; citations by others: 2)

Ville Lappalainen Antti Hallapuro Timo D. Hämäläinen 《The Journal of VLSI Signal Processing》2003,34(3):239-249

Two optimized implementations of the emerging ITU-T H.26L video encoder are described. The first, medium-optimized version, is implemented in C and the latter, highly optimized version, utilizes both algorithmic and platform-specific optimizations. Comparisons to a correspondingly optimized H.263/H.263+ implementation are given with the spatial and temporal video quality fixed and the bit rate and complexity varied. On a 733 MHz general-purpose processor, an average encoding speed of 17 frames per second for QCIF sequences is achieved with a 29% reduction in bit rate compared to H.263+. The complexity of H.26L is about 3.4 times more than that of H.263+.

16.

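Design of a 2-D DCT IP Soft Core Based on the Loeffler Algorithm

郭宝增 牛力 刘志明 《微电子学与计算机》2011,28(2):136-139,144

A design method for a 2-D DCT IP soft core based on the Loeffler algorithm is proposed. Multiplications are replaced by shift and add operations. To reduce chip area, the multiplier coefficients are CSD-coded and the 1-D DCT unit is reused; to increase circuit speed, a pipelined structure is adopted and the transpose matrix is optimized. Based on this algorithm, an IP soft core described in Verilog HDL was designed. The soft core was compiled, synthesized, placed and routed, and post-simulated, which verified the correctness of the algorithm. Experimental results show that the maximum operating frequency reaches 139.43 MHz, which meets the real-time requirements of video image compression.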
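CSD (canonical signed digit) coding, which the abstract above uses to replace multipliers with shifts and adds, rewrites a constant with digits from {-1, 0, 1} so that no two adjacent digits are nonzero. This keeps the number of add or subtract terms small. The sketch below shows the general technique on integer constants; in a DCT the fractional coefficients would first be scaled to fixed point, and the scaling, word lengths, and function names here are assumptions rather than details from the paper.

```python
def to_csd(n):
    """Return the CSD digits of a non-negative integer, least significant
    first. Each digit is -1, 0 or 1 and no two adjacent digits are nonzero."""
    digits = []
    while n != 0:
        if n & 1:
            # n mod 4 == 1 -> digit +1; n mod 4 == 3 -> digit -1
            d = 2 - (n & 3)
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


def csd_multiply(x, digits):
    """Multiply x by a CSD-coded constant using only shifts, adds and
    subtracts, the way a multiplierless datapath would."""
    acc = 0
    for shift, d in enumerate(digits):
        if d == 1:
            acc += x << shift
        elif d == -1:
            acc -= x << shift
    return acc


if __name__ == "__main__":
    # 7 = 8 - 1: one subtraction instead of two additions (4 + 2 + 1).
    print(to_csd(7), csd_multiply(5, to_csd(7)))
```

For the constant 7, plain binary needs three nonzero terms (4 + 2 + 1) while CSD needs two (8 - 1), which is the kind of saving that reduces adder count and area.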

17.

Analysis and Design of Low-Cost Bit-Serial Architectures for Motion Estimation in H.264/AVC

Mohammad R. H. Fatemi Hasan Ates Rosli Salleh 《Journal of Signal Processing Systems》2013,71(2):111-121

The variable block-size motion estimation (VBSME) process accounts for a major part of the computation in an H.264 encoder and is usually accelerated by bit-parallel hardware architectures with a large I/O bit width to meet real-time constraints. However, such architectures increase the area overhead and pin count, and are therefore not suitable for area-constrained consumer electronics designs such as small portable multimedia devices. This paper addresses this problem by proposing two area-efficient least significant bit (LSB) first bit-serial architectures with small pin counts. Both designs exploit data reuse in different ways for sum of absolute differences (SAD) computation and for reading reference pixels, leading to a considerable reduction of memory bandwidth. The first architecture propagates the partial SAD and sum results and broadcasts the reference pixel rows, whereas the second design reuses the SADs of small blocks and has a reconfigurable reference buffer, leading to better memory bandwidth when using hardware parallelism. The proposed designs benefit from several optimization techniques, including an efficient serial absolute difference architecture, word length reduction by parallelism, bit truncation, mode filtering, and macroblock (MB) level subsampling, which significantly enhance their performance in terms of silicon area, throughput, latency, and power consumption. The first and second designs can support full-search VBSME of 720 × 480 video at 30 frames per second (fps), with two reference frames and a [−16, 15] search range, at a clock frequency of 414 MHz with 29.28 k and 31.5 k gates, respectively.
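The reuse of small-block SADs mentioned in the abstract above relies on the fact that the SAD of a larger block is the sum of the SADs of the 4 × 4 blocks it contains. The sketch below shows that principle in software for one 16 × 16 macroblock and one candidate position; it is not the paper's bit-serial datapath, only 8 × 8 and 16 × 16 sizes are derived for brevity, and the function names are hypothetical.

```python
def sad4x4(cur, ref, top, left):
    """SAD of the 4x4 sub-block whose top-left corner is (top, left)."""
    return sum(
        abs(cur[top + i][left + j] - ref[top + i][left + j])
        for i in range(4) for j in range(4)
    )


def sad_tree(cur, ref):
    """Compute SADs for one 16x16 macroblock against one reference block.

    Returns (s4, s8, s16): a 4x4 grid of 4x4-block SADs, a 2x2 grid of
    8x8-block SADs built by summing 4x4 SADs, and the 16x16 SAD built by
    summing the 8x8 SADs. Pixel differences are computed only once.
    """
    s4 = [[sad4x4(cur, ref, 4 * r, 4 * c) for c in range(4)] for r in range(4)]
    s8 = [[s4[2 * r][2 * c] + s4[2 * r][2 * c + 1]
           + s4[2 * r + 1][2 * c] + s4[2 * r + 1][2 * c + 1]
           for c in range(2)] for r in range(2)]
    s16 = sum(sum(row) for row in s8)
    return s4, s8, s16


if __name__ == "__main__":
    cur = [[10] * 16 for _ in range(16)]
    ref = [[7] * 16 for _ in range(16)]
    s4, s8, s16 = sad_tree(cur, ref)
    print(s4[0][0], s8[0][0], s16)
```

The other H.264 partition sizes (16 × 8, 8 × 16, 8 × 4, 4 × 8) can be formed from the same 4 × 4 values by different groupings, so the absolute differences never need to be recomputed per block size.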

18.

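Research and Design of a Video Compression Encoder (total citations: 3; self-citations: 1; citations by others: 2)

朱鹏 杜洪根 丁文锐 《无线电工程》2006,36(6):36-38

An MPEG-4 Simple Profile encoder based on the TMS320C64x family of high-performance general-purpose DSPs is designed and implemented. The hardware structure and workflow of the system are described in detail. To meet the real-time requirements of high-resolution video encoding, a motion estimation algorithm using prediction techniques and software optimization techniques for the C64x CPU are studied. Experimental results show that the encoder reaches an encoding rate above 25 frames/s for D1-resolution (720 × 576) video, with a relatively low bit rate and good image quality.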

19.

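An Improved High-Speed Lottery Bus Arbiter

吴睿振 杨银堂 张丽 周端 《电子与信息学报》2014,36(8):2016-2022

With advances in semiconductor technology, a System-on-Chip (SoC) integrates a growing number of IP (Intellectual Property) cores with different functions. The IP cores are connected through a bus, and simultaneous contention for the bus by multiple cores severely limits SoC performance. An efficient bus arbiter can resolve the conflicts and competition caused by multi-core bus contention and improve SoC performance. This paper proposes an improved high-speed lottery bus arbiter. A 4-phase dual-rail protocol replaces the clock in implementing the lottery drawing mechanism to prevent tickets from being discarded, and asynchronous pipelines operating in an interleaved parallel fashion raise the operating speed. Verified at board level on a Xilinx Virtex5 (65 nm CMOS process) under the NINP (NonIdling and NonPreemptive) model, the arbiter allocates bandwidth better than the classical lottery arbiter and the dynamic adaptive lottery arbiter, effectively avoids both overfeeding and starvation, runs more than 49.2% faster, offers some power advantage, and suits multi-core SoCs with speed requirements.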
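The basic principle of lottery bus arbitration, on which the arbiter above builds, is that each master holds a number of tickets and the bus is granted by a random draw among the tickets of the masters currently requesting it. The sketch below models only this classical principle in software; it does not model the paper's 4-phase dual-rail asynchronous implementation, and the function name, ticket values, and master names are hypothetical.

```python
import random


def lottery_grant(tickets, requests, rng=random):
    """Pick the bus winner among requesting masters.

    tickets  : dict mapping master name -> number of tickets it holds
    requests : iterable of masters currently requesting the bus
    rng      : object with a randrange(n) method (defaults to `random`)
    Returns the winning master, or None when nobody holds a valid ticket.
    """
    # Only requesting masters with at least one ticket take part in the draw.
    contenders = [(m, tickets[m]) for m in sorted(requests)
                  if tickets.get(m, 0) > 0]
    total = sum(count for _, count in contenders)
    if total == 0:
        return None
    draw = rng.randrange(total)
    # Walk the cumulative ticket ranges until the drawn ticket is found.
    for master, count in contenders:
        if draw < count:
            return master
        draw -= count


if __name__ == "__main__":
    rng = random.Random(0)
    tickets = {"cpu": 3, "dma": 1}
    wins = {"cpu": 0, "dma": 0}
    for _ in range(10000):
        wins[lottery_grant(tickets, {"cpu", "dma"}, rng)] += 1
    print(wins)
```

Because every requesting master with a ticket has a nonzero chance of winning each draw, no master is starved indefinitely, while the ticket ratio sets each master's expected share of bus bandwidth.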

20.

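Optimization of DMA Data Transfer Based on the TMS320C6201

冯宇红 李炜 徐晶 李波 《电视技术》2003,(1):14-16

Against the background of a practical video coding system, three DMA data transfer schemes based on the TMS320C6201 DSP chip are proposed, and the three schemes are compared through analysis and experiment. Experimental results show that the three DMA schemes not only achieve high data transfer efficiency but also apply generally to all kinds of video scenes.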