
1.

An efficient VLSI architecture for CBAC of AVS HDTV decoder

《Signal Processing: Image Communication》2009,24(4):324-332

Context-based Binary Arithmetic Coding (CBAC) is a normative part of the newest X Profile of the Advanced Audio Video coding Standard (AVS). This paper presents an efficient VLSI architecture for CBAC decoding in AVS. Compared with CBAC in H.264/AVC, AVS adopts simpler binarization methods and context selection schemes. To avoid slow multiplications, the traditional arithmetic calculation is transformed to the logarithm domain. Although these features strike a better balance between compression gain and implementation cost, they still pose a huge challenge for high-throughput implementation. Because decoding the current bin depends on the previous bin, latency is long and overall system performance is limited. In this paper, we present a software–hardware co-design that exploits the bin distribution feature. A novel pipeline-based architecture is proposed in which the arithmetic decoding engine works in parallel with the context maintainer. A finite state machine (FSM) controls the decoding procedure flexibly, and the context scheduling is carefully organized to minimize the number of accesses to the context RAMs. In addition, the critical path is optimized for timing. The proposed implementation can work at 150 MHz and achieves real-time AVS CBAC decoding for 1080i HDTV video.

2.

VLSI architecture for digital processing of speech signals

Patrice Le Scan Marc Soler Michel Cand 《Annals of Telecommunications》1993,48(7-8):404-412

Due to the evolution of increasingly high-performance DSP algorithms for bit rate reduction of speech signals in telecommunications, VLSI implementations of these applications are becoming more and more complex. The solutions currently used for these applications are general-purpose digital signal processors or DSP cores, which are never fully adapted to the application in terms of VLSI architecture, i.e. silicon area and power consumption. We propose an alternative to these solutions, based on a parameterizable Harvard architecture and a C compiler that produces optimized microcode suited to this architecture and to the application. Finally, we present two examples of audio applications implemented using this solution.

3.

An efficient VLSI architecture for 2-D wavelet image coding with novel image scan

Lafruit G. Catthoor F. Cornelis J.P.H. De Man H.J. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》1999,7(1):56-68

A folded very large scale integration (VLSI) architecture is presented for the implementation of the two-dimensional discrete wavelet transform, without constraints on the choice of the wavelet-filter bank. The proposed architecture is dedicated to flexible block-oriented image processing, such as adaptive vector quantization used in wavelet image coding. We show that reading the image along a two-dimensional (2-D) pseudo-fractal scan creates a very modular and regular data flow and, therefore, considerably reduces the folding complexity and memory requirements for VLSI implementation. This leads to significant area savings for on-chip storage (up to a factor of two) and reduces the power consumption. Furthermore, data scheduling and memory management remain very simple. The end result is an efficient VLSI implementation with a reduced area cost compared to conventional approaches that read the input data line by line.

4.

An efficient VLSI architecture design for logarithmic multiplication by using the improved operand decomposition

《Integration, the VLSI Journal》2017

Over the last few years, the Logarithmic Number System (LNS) has played a pivotal role in the fields of Digital Signal Processing (DSP) and image processing. Multiplication is a ubiquitous and demanding arithmetic operation in DSP applications, and researchers have found that LNS is a possible solution for performing it. In this paper, we propose a novel approach based on Improved Operand Decomposition (IOD) to make an efficient logarithmic multiplier and subsequent achievement through scale realization. Pipelining and an efficient correction circuit are used for error minimization at the cost of minimal hardware and delay. The reported and proposed multipliers are evaluated and compared in terms of Data Arrival Time (DAT), area, power, Area Delay Product (ADP), and Energy per Sample (EPS) in 90 nm CMOS technology using Synopsys Design Compiler. Simulation results show that the proposed IOD method for logarithmic multiplication without pipelining gives a maximum of 35.39% less ADP and 11.15% less EPS for a 32-bit architecture than the reported logarithmic multiplier architecture. The proposed IOD-based logarithmic multiplier with pipelining gives a maximum of 20.17% less ADP for an 8-bit architecture and 21.72% less for a 32-bit architecture than the reported iterative pipelined logarithmic multiplier architecture. Simulation results also show that the optimized logarithmic converter gives 7.32% less ADP, and the optimized antilogarithmic converter 41.59% less ADP, than the reported logarithmic and antilogarithmic converter structures, respectively. The optimized antilogarithmic converter architecture gives a maximum of 43.94% less EPS than the reported antilogarithmic converter structure.
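As background for the idea of multiplying in the logarithm domain, the following is a minimal sketch of the classic Mitchell approximation, not the IOD method of this paper. It approximates log2 of each operand as its leading-one position plus a linear fraction, adds the two, and converts back. The function name and the fixed-point width `FRAC_BITS` are illustrative assumptions.

```python
FRAC_BITS = 32  # assumed fixed-point precision for the fractional part


def mitchell_multiply(a: int, b: int) -> int:
    """Approximate a * b for non-negative integers with Mitchell's method."""
    if a == 0 or b == 0:
        return 0
    # Characteristic: position of the leading one, i.e. floor(log2(x)).
    ka, kb = a.bit_length() - 1, b.bit_length() - 1
    # Fraction: x / 2**k - 1, stored as an integer scaled by 2**FRAC_BITS.
    xa = ((a - (1 << ka)) << FRAC_BITS) >> ka
    xb = ((b - (1 << kb)) << FRAC_BITS) >> kb
    k = ka + kb
    frac_sum = xa + xb
    one = 1 << FRAC_BITS
    if frac_sum < one:
        # No carry out of the fraction: result is 2**k * (1 + xa + xb).
        mantissa = one + frac_sum
    else:
        # Carry into the characteristic: result is 2**(k + 1) * (xa + xb).
        k += 1
        mantissa = frac_sum
    # Antilogarithm: shift the mantissa back to an integer.
    return (mantissa << k) >> FRAC_BITS


if __name__ == "__main__":
    for a, b in [(8, 4), (3, 3), (100, 200)]:
        print(a, b, a * b, mitchell_multiply(a, b))
```

Only shifts and additions are used. The approximation never exceeds the exact product, and its worst-case relative error is 1/9 (about 11.1%), which is why practical designs add correction circuits.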

5.

An architecture for a VLSI FFT processor

Joseph Ja'Ja' Robert Michael Owens 《Integration, the VLSI Journal》1983,1(4):305-316

We propose a new VLSI architecture for an FFT processor. Our architecture uses few processing elements and can be laid out in a mesh-interconnected pattern. We show how to compute the discrete Fourier transform at n points with an optimal speed-up as long as the memory is large enough. The control is shown to be simple and easily implementable in VLSI.

6.

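An efficient design method for a 3-D DWT VLSI architecture

Gao Tao, Bai Lin 《Electronic Design Engineering》2012,20(14):120-122

Based on an in-depth study of the core algorithm of the three-dimensional discrete wavelet transform (3D DWT) and the characteristics of image-sequence coding, this paper designs and implements an efficient 3-D wavelet transform VLSI architecture suited to hardware implementation. The corresponding Verilog model was written, simulated, and logic-synthesized. Simulation results show that processing the row and column filters in parallel, combined with a pipelined design, speeds up computation and effectively reduces on-chip storage.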

7.

An hierarchical VLSI neural network architecture

Mason R. Robertson W. Pincock D. 《Solid-State Circuits, IEEE Journal of》1992,27(1):106-108

As neural network systems are scaled up in size, it will become extremely difficult, if not impossible, to maintain full connectivity. A digital architecture which exhibits hierarchical connectivity similar to that observed in many biological neural networks is described. At the lowest level, clusters of fully connected neurons correspond to subnetworks. These subnetworks are then sparsely connected to form the complete neural network system. The architecture exploits the inherent density and large bandwidth of on-chip RAM and can use either a large number of bit-serial processors or a reduced number of bit-parallel processors. A prototype chip which implements a complete subnetwork has been fabricated in 3-μm CMOS and is fully functional.

8.

A VLSI architecture for real-time image coding using a vector quantization based algorithm

Dezhgosha K. Jamali M.M. Kwatra S.C. 《Signal Processing, IEEE Transactions on》1992,40(1):181-189

Digital image coding using vector quantization (VQ) based techniques provides low bit rates and high-quality coded images, at the expense of intensive computational demands. The computational requirement of the encoding search process has hindered the application of VQ to real-time high-quality coding of color TV images. Reduction of the encoding search complexity through partitioning of a large codebook into the on-chip memories of a concurrent VLSI chip set is proposed. A real-time vector quantizer architecture for encoding color images is developed. The architecture maps the mean/quantized residual vector quantizer (MQRVQ) (an extension of mean/residual VQ) onto a VLSI/LSI chip set. The MQRVQ contributes to the feasibility of the VLSI architecture through the use of a simple multiplication-free distortion measure and a reduction of the required memory per code vector. Running at a clock rate of 25 MHz, the proposed hardware implementation of this architecture is capable of real-time processing of 480×768 pixels per frame at a refresh rate of 30 frames/s. The result is a real-time high-quality composite color image coder operating at a fixed rate of 1.12 b per pixel.
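To make the encoding search concrete, here is a minimal sketch of plain mean/residual VQ with a multiplication-free distortion measure. It is not the MQRVQ of this paper: the mean is not quantized, the sum of absolute differences is an assumed choice of distortion, and the function names and toy codebook are hypothetical.

```python
def encode_mean_residual(block, codebook):
    """Return (mean, index) for one block of integer pixel values."""
    mean = sum(block) // len(block)
    residual = [p - mean for p in block]

    def sad(codeword):
        # Sum of absolute differences: needs no multiplier.
        return sum(abs(r - c) for r, c in zip(residual, codeword))

    # Exhaustive search for the codeword closest to the residual.
    index = min(range(len(codebook)), key=lambda i: sad(codebook[i]))
    return mean, index


def decode_mean_residual(mean, index, codebook):
    """Rebuild the block from its mean and codeword index."""
    return [mean + c for c in codebook[index]]


if __name__ == "__main__":
    toy_codebook = [
        [0, 0, 0, 0],
        [-10, -10, 10, 10],
        [10, -10, 10, -10],
    ]
    mean, index = encode_mean_residual([90, 90, 110, 110], toy_codebook)
    print(mean, index, decode_mean_residual(mean, index, toy_codebook))
```

The exhaustive loop over the codebook is the costly part. Splitting the codebook across several memories, as the abstract describes, lets those distance computations run concurrently.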

9.

An efficient parallel architecture for ray-tracing

Alexandre S. Nery Nadia Nedjah Felipe M. G. França 《Analog Integrated Circuits and Signal Processing》2012,70(2):189-202

Real-time rendering of three-dimensional scenes in highly photorealistic detail, as in the ray tracing rendering algorithm, is a hard task. In general, the performance achieved by a sequential software-based implementation of ray tracing is far from satisfactory. However, parallel implementations of ray tracing have enabled reasonable real-time performance, as the algorithm is embarrassingly parallel. Thus, a custom parallel design in hardware is likely to achieve even higher performance. In this paper, we propose a parallel hardware architecture capable of handling the main desirable features of ray tracing, such as shadows and reflection effects, at low area cost and with promising rendering performance. The architecture, called GridRT, is based on the Uniform Grid acceleration structure and is intended to deliver massive parallelism through parallel ray-triangle intersection tests as well as parallel processing of many rays. A hardware implementation of the proposed architecture is presented, together with some performance results and resource requirements. The rendering time is reduced by 80% using a grid configuration of eight processing elements.

10.

Flipping structure: an efficient VLSI architecture for lifting-based discrete wavelet transform  Total citations: 6, self-citations: 0, citations by others: 6

Chao-Tsung Huang Po-Chih Tseng Liang-Gee Chen 《Signal Processing, IEEE Transactions on》2004,52(4):1080-1089

In this paper, an efficient very large scale integration (VLSI) architecture, called flipping structure, is proposed for the lifting-based discrete wavelet transform. It can provide a variety of hardware implementations to improve and possibly minimize the critical path as well as the memory requirement of the lifting-based discrete wavelet transform by flipping conventional lifting structures. The precision issues are also analyzed. By case studies of the JPEG2000 default lossy (9,7) filter, an integer (9,7) filter, and the (6,10) filter, the efficiency of the proposed flipping structure is demonstrated.

11.

An architecture for real-time multimedia communication systems

Nicolaou C. 《Selected Areas in Communications, IEEE Journal on》1990,8(3):391-400

A multimedia communication system includes both the communication protocols used to transport the real-time data and the distributed computing system (DCS) within which any applications using the protocols must execute. The architecture presented attempts to integrate these communication protocols with the DCS smoothly in order to ease the writing of multimedia applications. Two issues are identified as essential to the success of this integration: the synchronization of related real-time data streams, and the management of heterogeneous multimedia hardware. The synchronization problem is tackled by defining explicit synchronization properties at the presentation level and by providing control and synchronization operations within the DCS which operate in terms of these properties. The heterogeneity problems are addressed by separating the data transport semantics (the protocols themselves) from the control semantics (the protocol interfaces).

12.

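A line-based, real-time VLSI architecture for the 2-D lifting integer wavelet transform

Wang Keyan, Liu Kai, Guo Jie, Li Yunsong, Wu Chengke 《Journal of Circuits and Systems》2010,15(2)

This paper proposes a line-based, real-time VLSI architecture for the two-dimensional lifting integer wavelet transform. The architecture comprises a row transformer, a column transformer, an intermediate buffer, and an output control unit. The intermediate buffer temporarily stores the intermediate results of the row transform, and the output control unit outputs the wavelet coefficients of each level in descending order of priority. Because the hardware implementation adopts a line-based lifting structure, the horizontal and vertical transforms can be processed in parallel. Compared with existing architectures, the proposed one offers a high degree of parallelism and low storage requirements, and it can complete the multi-level wavelet transform of an entire image within the time it takes to scan the image line by line.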
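For readers unfamiliar with lifting-based integer wavelet transforms, the sketch below shows one level of a 1-D reversible transform using the LeGall 5/3 filter. The abstract does not name the filter, so the 5/3 choice, the even-length input, and the symmetric boundary handling are assumptions; a 2-D transform would apply the same steps along rows and then columns.

```python
def lift_53_forward(x):
    """One level of the integer 5/3 lifting transform (even-length input)."""
    s, d = x[0::2], x[1::2]          # even and odd samples
    half = len(s)
    # Predict: subtract the floor-average of the two neighbouring evens.
    d = [d[i] - ((s[i] + s[min(i + 1, half - 1)]) >> 1) for i in range(half)]
    # Update: add the rounded quarter-sum of the neighbouring details.
    s = [s[i] + ((d[max(i - 1, 0)] + d[i] + 2) >> 2) for i in range(half)]
    return s, d                      # low-pass and high-pass coefficients


def lift_53_inverse(s, d):
    """Undo lift_53_forward exactly by reversing the two lifting steps."""
    half = len(s)
    s = [s[i] - ((d[max(i - 1, 0)] + d[i] + 2) >> 2) for i in range(half)]
    d = [d[i] + ((s[i] + s[min(i + 1, half - 1)]) >> 1) for i in range(half)]
    x = [0] * (2 * half)
    x[0::2], x[1::2] = s, d
    return x


if __name__ == "__main__":
    signal = [3, 7, 1, 8, 2, 9, 4, 6]
    low, high = lift_53_forward(signal)
    print(low, high, lift_53_inverse(low, high) == signal)
```

Each step uses only additions and shifts on integers, and the inverse subtracts exactly what the forward step added, so reconstruction is lossless regardless of rounding.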

13.

An efficient architecture for fault-tolerant ATM switches

Padmanabhan K. 《Networking, IEEE/ACM Transactions on》1995,3(5):527-537

A cost-effective fault-tolerant architecture called FAUST is presented for ATM switches. The key idea behind the architecture is the incorporation of spare units and associated commutation logic into strategic partitions of the switching system. The definition of a replaceable unit is flexible and based on packaging considerations. The commutation logic can switch in a spare unit in place of a failed one at cell rate, and is distributed entirely in the existing switch control units, so the additional overhead is almost entirely in the spare modules provided. The technique is far superior to a duplex configuration in terms of reliability improvement vs. component redundancy, and can be applied to established architectures for ATM switches, including multistage sort and shared memory based architectures. Its scalability also makes it applicable to system sizes from a few tens of lines to a few thousand.

14.

A VLSI architecture for a real-time code book generator and encoder of a vector quantizer

Tsang K. Wei B.W.Y. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》1994,2(3):360-364

Image compression applications use vector quantization (VQ) for its high compression ratio and image quality. Current VQ hardware employs static instead of dynamic code book generation, because the latter demands intensive computation and correspondingly expensive hardware even though it offers better image quality. This paper describes a VLSI architecture for a real-time dynamic code book generator and encoder of 512×512 images at 30 frames/s. The four-chip 0.8 μm CMOS design implements a tree of Kohonen self-organizing maps, and consists of two VQ processors and two image buffer memory chips. The pipelined VQ processor contains a computational core for both code book generation and encoding, and is scalable to processing larger frames.

15.

An efficient code-timing estimator for DS-CDMA signals  Total citations: 5, self-citations: 0, citations by others: 5

Dunmin Zheng Jian Li Miller S.L. Strom E.G. 《Signal Processing, IEEE Transactions on》1997,45(1):82-89

We present an efficient algorithm for estimating the code timing of a known training sequence in an asynchronous direct-sequence code division multiple access (DS-CDMA) system. The algorithm is a large sample maximum likelihood (LSML) estimator that is derived by modeling the known training sequence as the desired signal and all other signals, including the interfering signals and thermal noise, as unknown colored Gaussian noise that is uncorrelated with the desired signal. The LSML estimator is shown to be robust against the near-far problem and is also compared with several other code timing estimators via numerical examples. It is found that the LSML approach can offer noticeable performance improvement, especially when the system is heavily loaded.

16.

A VLSI based MIMD architecture of a multiprocessor system for real-time video processing applications

Klaus Gaedke Hartwig Jeschke Peter Pirsch 《The Journal of VLSI Signal Processing》1993,5(2-3):159-169

A MIMD based multiprocessor architecture for real-time video processing applications consisting of identical bus connected processing elements has been developed. Each processing element contains a RISC processor for controlling and data-dependent tasks and a Low Level Coprocessor for fast processing of convolution-type video processing tasks. To achieve efficient parallel processing of video input signals, the architecture supports independent processing of overlapping image segments. Running at a clock rate of 40 MHz, a single processing element provides a peak performance of 640 Mega arithmetic operations per second (MOPS). For the real-time processing of basic video processing tasks like 3×3 FIR-filter, 8×8 2D-DCT and motion estimation, a single processing element provides a sufficient computational rate for video signals with Common Intermediate Format (CIF) at a frame rate up to 30 Hz. For hybrid source coding of CIF video signals at a frame rate of 30 Hz a multiprocessor system consisting of six processing elements is required. A linear speedup of the multiprocessor system compared to a single processing element is achieved. A VLSI implementation of a processing element in 0.8 µm CMOS technology is under development.

17.

An efficient network-switch scheduling for real-time applications

Caimu Tang Chronopoulos A.T. Yaprak E. 《Communications, IEEE Transactions on》2005,53(3):401-404

Bursts consist of a varying number of asynchronous transfer mode cells corresponding to a datagram. Here, we generalized weighted fair queueing to a burst-based algorithm with preemption. The new algorithm enhances the performance of the switch service for real-time applications, and it preserves the quality of service guarantees. We study this algorithm theoretically and via simulations.

18.

VLSI architecture for sparse matrix multiplication

Brown C.I. Yates R.B. 《Electronics letters》1996,32(10):891-893

A new VLSI architecture for sparse matrices reduces the I/O and the number of trivial multiplications. A two-pipeline 30 MHz processor has been fabricated. This device performs 60 million MACs per second and reduces the time complexity of matrix multiplication by several orders of magnitude for most applications.

19.

An efficient bit-serial FIR filter architecture

Yong Ching Lim Joseph B. Evans Bede Liu 《Circuits, Systems, and Signal Processing》1995,14(5):639-651

A new bit-serial architecture for implementation of high order FIR filters is introduced, as well as example FPGA and CMOS realizations. This structure exploits the simplicity of coefficients that consist of two power-of-two terms to yield efficient implementations. Quantization effects are discussed and a simple block scaling method for reducing rounding and truncation noise in high order filters is also presented. This research is supported by the Office of Naval Research under Grant N00014-89-J1327, NSF Grant ECS87-13598, by an AT&T Bell Laboratories Graduate Fellowship, and by University of Kansas General Research allocation 3775-20-0038. Portions of this work were presented at ICASSP-90 in Albuquerque, New Mexico.
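The benefit of coefficients made of two power-of-two terms can be illustrated in software: each tap reduces to two shifts and one add or subtract, with no multiplier. The sketch below is word-parallel rather than bit-serial and is not the architecture of this paper; the tap encoding as `(sign, exponent)` pairs with non-negative exponents is an assumption made for illustration.

```python
def tap_value(tap):
    """Integer value of a tap given as ((sign1, exp1), (sign2, exp2))."""
    return sum(sign * (1 << exp) for sign, exp in tap)


def shift_add_fir(samples, taps):
    """FIR filter whose taps are sums of two signed powers of two."""
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, tap in enumerate(taps):
            if n - k < 0:
                break  # samples before the start are treated as zero
            x = samples[n - k]
            for sign, exp in tap:
                shifted = x << exp  # multiply by 2**exp with a shift
                acc = acc + shifted if sign > 0 else acc - shifted
        out.append(acc)
    return out


if __name__ == "__main__":
    # Coefficients 5 = 4 + 1, 7 = 8 - 1, 5 = 4 + 1.
    taps = [((1, 2), (1, 0)), ((1, 3), (-1, 0)), ((1, 2), (1, 0))]
    print([tap_value(t) for t in taps])
    print(shift_add_fir([1, 2, 3, -4], taps))
```

Restricting each coefficient to two terms bounds the adder count per tap, at the price of coarser coefficient quantization, which is one reason quantization effects matter for such filters.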

20.

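Design and implementation of an efficient hardware architecture for the AVS interpolation algorithm  Total citations: 2, self-citations: 0, citations by others: 2

Hu Qian, Yu Lu 《Journal of Circuits and Systems》2008,13(3):148-152

This paper proposes an FPGA/ASIC-oriented hardware architecture for the inter-frame motion-compensation interpolation algorithm in an AVS decoding system. The structure of each functional unit in the interpolation process is described, and simulation results and hardware size are given. The results show that the proposed architecture supports real-time decoding of 720×576, 4:2:0, 30 FPS video at a minimum operating frequency of 54 MHz, making it an efficient parallel VLSI architecture suitable for integration.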