首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
Context-based Binary Arithmetic Coding (CBAC) is a normative part of the newest X Profile of Advanced Audio Video coding Standard (AVS). This paper presents an efficient VLSI architecture for CBAC decoding in AVS. Compared with CBAC in H.264/AVC, the simpler binarization methods and context selection schemes are adopted in AVS. In order to avoid the slow multiplications, the traditional arithmetic calculation is transformed to the logarithm domain. Although these features can obtain better balance between the compression gain and implementation cost, it still brings huge challenge for high-throughput implementation. The fact that current bin decoding depends on previous bin results in long latency and limits overall system performance. In this paper, we present a software–hardware co-design by using bin distribution feature. A novel pipeline-based architecture is proposed where the arithmetic decoding engine works in parallel with the context maintainer. A finite state machine (FSM) is used to control the decoding procedure flexibly and the context scheduling is organized carefully to minimize the access times of context RAMs. In addition, the critical path is optimized for the timing. The proposed implementation can work at 150 MHz and achieve the real-time AVS CBAC decoding for 1080i HDTV video.  相似文献   

2.
Due to the evolution of increasingly high performantdsp algorithms for bit rate reduction of speech signals in telecommunications,vlsi implementations of these applications are becoming more and more complex. The solutions currently being used for these applications are general purpose digital signal processors or dsp cores which are never fully adapted to the application in terms ofvlsi architecture, i.e. silicon area and power consumption. We propose an alternative to these solutions, based on a parametrable Harvard architecture, and a C compiler which gives an optimized microcode suited to this architecture and to the application. Finally we present two examples of audio applications implemented using this solution.  相似文献   

3.
A folded very large scale integration (VLSI) architecture is presented for the implementation of the two-dimensional discrete wavelet transform, without constraints on the choice of the wavelet-filter bank. The proposed architecture is dedicated to flexible block-oriented image processing, such as adaptive vector quantization used in wavelet image coding. We show that reading the image along a two-dimensional (2-D) pseudo-fractal scan creates a very modular and regular data flow and, therefore, considerably reduces the folding complexity and memory requirements for VLSI implementation. This leads to significant area savings for on-chip storage (up to a factor of two) and reduces the power consumption. Furthermore, data scheduling and memory management remain very simple. The end result is an efficient VLSI implementation with a reduced area cost compared to the conventional approaches, reading the input data line by line  相似文献   

4.
Over the last few years, the Logarithmic Number System (LNS) has played a pivotal and decisive role in the field of Digital Signal Processing (DSP) and Image processing. Multiplication is a ubiquitous thirsty area to perform arithmetic operations in DSP applications and researchers have found that LNS is the possible solution for multiplication to be performed for a DSP application. In this paper, we propose a novel approach based on the Improved Operand Decomposition (IOD) to make an efficient logarithmic multiplier and subsequent achievement through scale realization. The Pipeline technique and the efficient correction circuit are used for error minimization at the cost of minimal hardware and delay. Reported and proposed multiplier is evaluated and compared in terms of Data Arrival Time (DAT), area, power, Area Delay Product (ADP), and EPS (Energy per Sample) at 90 nm CMOS technology by using Synopsys Design Compiler. Simulation results show that the proposed IOD method for logarithmic multiplication without the pipelining gives maximum of 35.39% less ADP and 11.15% less EPS for 32-bit architecture than of the reported logarithmic multiplier architecture. The proposed IOD based logarithmic multiplier with the pipelining gives a maximum of 20.17% less ADP for 8-bit architecture and 21.72% for 32-bit architecture than of the reported iterative pipelined architecture of logarithmic multiplier. Simulation results show that the optimized logarithmic converter gives 7.32%, and optimized antilogarithmic converter gives 41.59% less ADP respectively than of the reported logarithmic and antilogarithmic converter structures. The optimized antilogarithmic converter architecture gives a maximum of 43.94% less EPS than of the reported antilogarithmic converter structure.  相似文献   

5.
We propose a new VLSI architecture for an FFT processor. Our architecture uses few processing elements and can be laid out in a mesh-interconnected pattern. We show how to compute the discrete Fourier transform at n points with an optimal speed-up as long as the memory is large enough. The control is shown to be simple and easily implementable in VLSI.  相似文献   

6.
高涛  白璘 《电子设计工程》2012,20(14):120-122
文中通过深入研究三维离散小波变换(3D DWT)核心算法并根据序列图像编码的特点,设计并实现了一种适合硬件实现的高效的三维小波变换VLSI结构。编写了相应verilog模型,并进行了仿真和逻辑综合。仿真结果表明行列滤波并行处理并采用流水线设计方法,加快了运算速度,有效降低了片内存储容量。  相似文献   

7.
Digital image coding using vector quantization (VQ) based techniques provides low-bit rates and high quality coded images, at the expense of intensive computational demands. The computational requirement due to the encoding search process, had hindered application of VQ to real-time high-quality coding of color TV images. Reduction of the encoding search complexity through partitioning of a large codebook into the on-chip memories of a concurrent VLSI chip set is proposed. A real-time vector quantizer architecture for encoding color images is developed. The architecture maps the mean/quantized residual vector quantizer (MQRVQ) (an extension of mean/residual VQ) onto a VLSI/LSI chip set. The MQRVQ contributes to the feasibility of the VLSI architecture through the use of a simple multiplication free distortion measure and reduction of the required memory per code vector. Running at a clock rate of 25 MHz the proposed hardware implementation of this architecture is capable of real-time processing of 480×768 pixels per frame with a refreshing rate of 30 frames/s. The result is a real-time high-quality composite color image coder operating at a fixed rate of 1.12 b per pixel  相似文献   

8.
As neural network systems are scaled up in size it will become extremely difficult, if not impossible, to maintain full connectivity. A digital architecture which exhibits hierarchical connectivity similar to that observed in many biological neural networks is described. At the lowest level, clusters of fully connected neurons correspond to subnetworks. These subnetworks are then sparsely connected to form the complete neural network system. The architecture exploits the inherent density and large bandwidth of on-chip RAM and can use either a large number of bit-serial processors or a reduced number of bit-parallel processors. A prototype chip which implements a complete subnetwork has been fabricated in 3-μm CMOS and is fully functional  相似文献   

9.
In this paper, an efficient very large scale integration (VLSI) architecture, called flipping structure, is proposed for the lifting-based discrete wavelet transform. It can provide a variety of hardware implementations to improve and possibly minimize the critical path as well as the memory requirement of the lifting-based discrete wavelet transform by flipping conventional lifting structures. The precision issues are also analyzed. By case studies of the JPEG2000 default lossy (9,7) filter, an integer (9,7) filter, and the (6,10) filter, the efficiency of the proposed flipping structure is demonstrated.  相似文献   

10.
Real time rendering of three-dimensional scenes in high photorealistic details is a hard task, such as in the ray tracing rendering algorithm. In general, the performance achieved by a sequential software-based implementation of ray tracing is far from satisfactory. However, parallel implementations of ray tracing have been enabling reasonable real time performance, as the algorithm is embarrassingly parallel. Thus, a custom parallel design in hardware is likely to achieve an even higher performance. In this paper, we propose a hardware parallel architecture capable of dealing with the main desirable features of ray tracing, such as shadows and reflection effects, imposing low area cost and a promising rendering performance. Such architecture, called GridRT, is based on the Uniform Grid acceleration structure and is intended to deliver massive parallelism through parallel ray-triangle intersection tests as well as parallel processing of many rays. A hardware implementation of the proposed architecture is presented, together with some performance results and resources requirements. The rendering is reduced by 80% using a grid configuration of eight processing elements.  相似文献   

11.
提出一种基于行的实时、二维提升整数小波变换的VLSI结构。该结构包括行变换器、列变换器、中间缓存器以及输出控制单元。利用中间缓存器暂存行变换的中间结果,由输出控制单元按优先级从高到低的顺序依次输出各级小波系数。由于在硬件实现中采用基于行的提升变换结构,从而水平和垂直方向上的变换能并行处理。与现有结构相比,该结构具有并行度高、存储量低的特点,并且能够在一幅图像逐行扫描的时间间隔内完成整幅图像的多级小波变换。  相似文献   

12.
A multimedia communication system includes both the communication protocols used to transport the real-time data and the distributed computing system (DCS) within which any applications using the protocols must execute. The architecture presented attempts to integrate these communications protocols with the DCS in a smooth fashion in order to ease the writing of multimedia applications. Two issues are identified as being essential to the success of this integration: the synchronization of related real-time data streams, and the management of heterogeneous multimedia hardware. The synchronization problem is tackled by defining explicit synchronization properties at the presentation level and by providing control and synchronization operations within the DCS which operate in terms of these properties. The heterogeneity problems are addressed by separating the data transport semantics (protocols themselves) from the control semantics (protocol interfaces)  相似文献   

13.
Image compression applications use vector quantization (VQ) for its high compression ratio and image quality. The current VQ hardware employs static instead of dynamic code book generation as the latter demands intensive computation and corresponding expensive hardware even though it offers better image quality. This paper describes a VLSI architecture for a real-time dynamic code book generator and encoder of 512×512 images at 30 frames/s. The four-chip 0.8 μm CMOS design implements a tree of Kohonen self-organizing maps, and consists of two VQ processors and two image buffer memory chips. The pipelined VQ processor contains a computational core for both code book generation and encoding, and is scalable to processing larger frames  相似文献   

14.
A MIMD based multiprocessor architecture for real-time video processing applications consisting of identical bus connected processing elements has been developed. Each processing element contains a RISC processor for controlling and data-dependent tasks and a Low Level Coprocessor for fast processing of convolution-type video processing tasks. To achieve efficient parallel processing of video input signals, the architecture supports independent processing of overlapping image segments. Running at a clock rate of 40 MHz, a single processing element provides a peak performance of 640 Mega arithmetic operations per second (MOPS). For the real-time processing of basic video processing tasks like 3×3 FIR-filter, 8×8 2D-DCT and motion estimation, a single processing element provides a sufficient computational rate for video signals with Common Intermediate Format (CIF) at a frame rate up to 30 Hz. For hybrid source coding of CIF video signals at a frame rate of 30 Hz a multiprocessor system consisting of six processing elements is required. A linear speedup of the multiprocessor system compared to a single processing element is achieved. A VLSI implementation of a processing element in 0.8 µm CMOS technology is under development.  相似文献   

15.
A cost-effective fault-tolerant architecture called FAUST is presented for ATM switches. The key idea behind the architecture is the incorporation of spare units and associated commutation logic into strategic partitions of the switching system. The definition of a replaceable unit is flexible, and based on packaging considerations. The commutation logic can switch in a spare unit in place of a failed one at cell rate, and is distributed entirely in the existing switch control units. So the additional overhead is almost entirely in the spare modules provided. The technique is far superior to a duplex configuration in terms of reliability improvement vs. component redundancy, and can be applied to established architectures for ATM switches, including multistage sort and shared memory based architectures. Its scalability also makes it applicable to system sizes from a few tens of lines to a few thousand  相似文献   

16.
An efficient code-timing estimator for DS-CDMA signals   总被引:5,自引:0,他引:5  
We present an efficient algorithm for estimating the code timing of a known training sequence in an asynchronous direct-sequence code division multiple access (DS-CDMA) system. The algorithm is a large sample maximum likelihood (LSML) estimator that is derived by modeling the known training sequence as the desired signal and all other signals including the interfering signals and thermal noise as unknown colored Gaussian noise that is uncorrelated with the desired signal. The LSML estimator is shown to be robust against the near-far problem and is also compared with several other code timing estimators via numerical examples. It is found that the LSML approach can offer noticeable performance improvement, especially when the loading of the system is heavy  相似文献   

17.
Bursts consist of a varying number of asynchronous transfer mode cells corresponding to a datagram. Here, we generalized weighted fair queueing to a burst-based algorithm with preemption. The new algorithm enhances the performance of the switch service for real-time applications, and it preserves the quality of service guarantees. We study this algorithm theoretically and via simulations.  相似文献   

18.
Brown  C.I. Yates  R.B. 《Electronics letters》1996,32(10):891-893
A new VLSI architecture for sparse matrices reduces the I/O and reduces the number of trivial multiplications. A two pipeline 30 MHz processor has been fabricated. This device performs 60 million MACs per second and reduces the time complexity for matrix multiplication by several orders of magnitude for most applications  相似文献   

19.
The wide-band digital receiving systems require digital downconversion(DDC) with high data rate and short tuning time in order to intercept the narrow-band signals within broad tuning bandwidth. But these requirements can not be met by the commercial DDC. In this paper an efficient implementation architecture is presented. It combines the flexibility of DFT tuning with the efficiency of the polyphase filter bank decomposition. By first decimating the data prior to filtering and mixing, this architecture gives a better solution to the mismatch between the lower hardware speed and high data rate. The computer simulations show the feasibility of this processing architecture.  相似文献   

20.
Noise-interference is one of the major concerns in low-power VLSI circuits. Due to power supply downscaling, these circuits have an extremely limited noise margin that is inadequate for dealing with intrinsic and extrinsic noise. The MRF-based design has been accepted as a highly effective method for designing noise-tolerant low-power circuits. However, the MRF-based circuits suffer from a complex structure and the methods which tried to simplify the structure always sacrificed the noise immunity for hardware simplicity. In this paper, we propose a novel MRF-based method for designing efficient and reliable low-power VLSI circuits. For the first time, an innovative reliability boosting mechanism based on maximum conditional correct probability is incorporated into an efficient MRF-based structure which leads to highly reliable circuits with considerably low cost, delay, and power consumption. The proposed method demonstrates the best performance among all of the previously reported methods. Moreover, the Monte Carlo simulations confirm that the proposed method can preserve its superior noise immunity even under serious process, voltage, and temperature variations.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号