首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
Digital image coding using vector quantization (VQ) based techniques provides low-bit rates and high quality coded images, at the expense of intensive computational demands. The computational requirement due to the encoding search process, had hindered application of VQ to real-time high-quality coding of color TV images. Reduction of the encoding search complexity through partitioning of a large codebook into the on-chip memories of a concurrent VLSI chip set is proposed. A real-time vector quantizer architecture for encoding color images is developed. The architecture maps the mean/quantized residual vector quantizer (MQRVQ) (an extension of mean/residual VQ) onto a VLSI/LSI chip set. The MQRVQ contributes to the feasibility of the VLSI architecture through the use of a simple multiplication free distortion measure and reduction of the required memory per code vector. Running at a clock rate of 25 MHz the proposed hardware implementation of this architecture is capable of real-time processing of 480×768 pixels per frame with a refreshing rate of 30 frames/s. The result is a real-time high-quality composite color image coder operating at a fixed rate of 1.12 b per pixel  相似文献   

2.
We present a new systolic architecture for implementing Finite State Vector Quantization in real-time for both speech and image data. This architecture is modular and has a very simple control flow. Only one processor is needed for speech compression. A linear array of processors is used for image compression; the number of processors needed is independent of the size of the image. We also present a simple architecture for converting line-scanned image data into the format required by this systolic architecture. Image data is processed at a rate of 1 pixel per clock cycle. An implementation at 31.5 MHz can quantize 1024×1024 pixel images at 30 frames/sec in real-time. We describe a VLSI implementation of these processors.  相似文献   

3.
This paper presents an integrated systolic array design for implementing full-search block matching, 2-D discrete wavelet transform, and full-search vector quantization on the same VLSI architecture. These functions are the prime components in video compression and take a great amount of computation. To meet the real-time application requirements, many systolic array architectures are proposed for individually performing one of those functions. However, these functions contain similar computational procedure. The matrix-vector product forms of the three functions are quite analogous. After extracting the common computation component, we design an integrated one-dimensional systolic array that can perform aforementioned three functions. The proposed architecture can efficiently perform three typical functions: (1) the full-search block matching with block of size 16 × 16 and the search are from –8 to 7; (2) the 2-D 2 level Harr transform with block of size 8 × 8; and (3) the full-search vector quantization with input vector of size 2 × 2. A utilization rate of 100% to 97% is achieved in the course of executing full-search block matching and full-search vector quantization. When it comes to perform 2-D discrete wavelet transform, the utilization rate is about 32%. The proposed integrated architecture has lowered hardware cost and reduced hardware structure. It befits the VLSI implementation for video/image compression applications.  相似文献   

4.
A feature-extraction and vector-generation VLSI has been developed for real-time image recognition. An arrayed-shift-register architecture has been employed in conjunction with a pipelined directional-edge-filtering circuitry. As a result, it has become possible to scan an image, pixel by pixel, with a 64 x 64-pixel recognition window and generate a 64-dimensional feature vector in every 64 clock cycles. In order to determine the threshold for edge-filtering operation adaptive to local luminance variation, a high-speed median circuit has been developed. A binary median search algorithm has been implemented using high-precision majority voting circuits working in the mixed-signal principle. A prototype chip was designed and fabricated in a 0.18-mum 5-metal CMOS technology. A high-speed feature vector generation in less than 9.7 ns/vector element has been experimentally demonstrated. It is possible to scan a VGA-size image at a rate of 6.1 frames/s, thus generating as many as 1.5 x 106 feature vectors per second for recognition. This is more than 103 times faster than software processing running on a 3-GHz general-purpose processor.  相似文献   

5.
Describes the architecture and design of a CMOS VLSI chip for data compression and decompression using tree-based codes. The chip, called MARVLE, implements a memory-based architecture for variable length encoding and decoding based on tree-based codes. The architecture is based on an efficient scheme of mapping the tree representing any binary code onto a memory device. A prototype 2-mm CMOS VLSI chip has been designed, verified, and fabricated by the MOSIS facility. The chip has a 512×12 static RAM with an access time of 4 ns and logic circuitry for compression as well as decompression. The chip occupies a silicon area of 6.8 mm×6.9 mm and consists of 49695 transistors. The prototype chip yields a compression rate of 95.2 Mb/s and a decompression rate of 60.6 Mb/s with a clock rate of 83.3 MHz. The VLSI hardware can be used to implement the JPEG baseline compression scheme  相似文献   

6.
A real-time, low-power video encoder design for pyramid vector quantization (PVQ) has been presented. The quantizer is estimated to dissipate only 2.1 mW for real-time video compression of images of 256 × 256 pixels at 30 frames per second in standard 0.8-micronCMOS technology with a 1.5 V supply. Applying this quantizer to subband decomposed images, the quantizer performs better than JPEG on average. We achieve this high level of power efficiency with image quality exceeding that of variable rate codes through algorithmic and architectural reformulation. The PVQ encoder is well-suited for wireless, portable communication applications.  相似文献   

7.
In this paper, we describe a fully pipelined single chip VLSI architecture for implementing the JPEG baseline image compression standard. The architecture exploits the principles of pipelining and parallelism to the maximum extent in order to obtain high speed and throughput. The architecture for discrete cosine transform and the entropy encoder are based on efficient algorithms designed for high speed VLSI implementation. The entire architecture can be implemented on a single VLSI chip to yield a clock rate of about 100 MHz which would allow an input rate of 30 frames per second for 1024×1024 color images  相似文献   

8.
This paper proposes an architecture of the wireless endoscopy system for the diagnoses of whole human digestive tract and real-time endoscopic image monitoring. The low-power digital IC design inside the wireless endoscopic capsule is discussed in detail. A very large scale integration (VLSI) architecture of three-stage clock management is applied, which can save 46% power inside the capsule compared with the design without such a low-power design. A stoppable ring crystal oscillator with minimal overhead is used in the sleep mode, which results in about 60-muW system power dissipation in sleep mode. A new image compression algorithm based on Bayer image format and its corresponding VLSI architecture are both proposed for low-power, high-data volume. Thus, 8 frames per second with 320*288 pixels can be transmitted with 2 Mb/s. The digital IC design also assures that the capsule has many flexible and useful functions for clinical application. The digital circuits were verified on field-programmable gate arrays and have been implemented in 0.18-mum CMOS process with 6.2 mW  相似文献   

9.
Current and future requirements for adaptive real-time image compression challenge even the capabilities of highly parallel realizations in terms of hardware performance. Previously proposed linear array structures for full-search vector quantization do not offer scalability and adaptivity in this context, because they require separate data/control pins for dynamically updating the codevectors and complicated interlock mechanisms to ensure that the regular data flow is not corrupted as a result of updates. We explore the design space for full-search vector quantizers and propose a novel linear processor array architecture in which global wiring is limited to clock and power supply distribution, thus allowing high-speed processing in spite of only limited communication with the host via the boundary processors. The resulting fully pipelined design is not only area-efficient for VLSI implementation but is also readily scalable and offers extremely high performance.  相似文献   

10.
In this paper, we propose two new VLSI architectures for computing the N-point discrete Fourier transform (DFT) and its inverse (IDFT) based on a radix-2 fast algorithm, where N is a power of two. The first part of this work presents a linear systolic array that requires log2 N complex multipliers and is able to provide a throughput of one transform sample per clock cycle. Compared with other related systolic designs based on direct computation or a radix-2 fast algorithm, the proposed one has the same throughput performance but involves less hardware complexity. This design is suitable for high-speed real-time applications, but it would not be easily realized in a single chip when N gets large. To balance the chip area and the processing speed, we further present a new reduced-complexity design for the DFT/IDFT computation. The alternative design is a memory-based architecture that consists of one complex multiplier, two complex adders, and some special memory units. The new design has the capability of computing one transform sample every log2 N+1 clock cycles on average. In comparison with the first design, the second design reaches a lower throughput with less hardware complexity. As N=512, the chip area required for the memory-based design is about 5742×5222 μm2, and the corresponding throughput can attain a rate as high as 4M transform samples per second under 0.6 μm CMOS technology. Such area-time performance makes this design very competitive for use in long-length DFT applications, such as asymmetric digital subscriber lines (ADSL) and orthogonal frequency-division multiplexing (OFDM) systems  相似文献   

11.
Image compression applications use vector quantization (VQ) for its high compression ratio and image quality. The current VQ hardware employs static instead of dynamic code book generation as the latter demands intensive computation and corresponding expensive hardware even though it offers better image quality. This paper describes a VLSI architecture for a real-time dynamic code book generator and encoder of 512×512 images at 30 frames/s. The four-chip 0.8 μm CMOS design implements a tree of Kohonen self-organizing maps, and consists of two VQ processors and two image buffer memory chips. The pipelined VQ processor contains a computational core for both code book generation and encoding, and is scalable to processing larger frames  相似文献   

12.
Finite-state vector quantization for waveform coding   总被引:3,自引:0,他引:3  
A finite-state vector quantizer is a finite-state machine used for data compression: Each successive source vector is encoded into a codeword using a minimum distortion rule, and into a code book, depending on the encoder state. The current state and the selected codeword then determine the next encoder state. A finite-state vector quantizer is capable of making better use of the memory in a source than is an ordinary memoryless vector quantizer of the same dimension or blocklength. Design techniques are introduced for finite-state vector quantizers that combine ad hoc algorithms with an algorithm for the design of memoryless vector quantizers. Finite-state vector quantizers are designed and simulated for Gauss-Markov sources and sampled speech data, and the resulting performance and storage requirements are compared with ordinary memoryless vector quantization.  相似文献   

13.
A folded very large scale integration (VLSI) architecture is presented for the implementation of the two-dimensional discrete wavelet transform, without constraints on the choice of the wavelet-filter bank. The proposed architecture is dedicated to flexible block-oriented image processing, such as adaptive vector quantization used in wavelet image coding. We show that reading the image along a two-dimensional (2-D) pseudo-fractal scan creates a very modular and regular data flow and, therefore, considerably reduces the folding complexity and memory requirements for VLSI implementation. This leads to significant area savings for on-chip storage (up to a factor of two) and reduces the power consumption. Furthermore, data scheduling and memory management remain very simple. The end result is an efficient VLSI implementation with a reduced area cost compared to the conventional approaches, reading the input data line by line  相似文献   

14.
Quantization is fundamental to analog-to-digital converter (ADC) and signal compression. In this paper, we propose an adaptive quantizer with piecewise companding and scaling for signals of Gaussian mixture model (GMM). Our adaptive quantizer operates under three modes, each of which corresponds to different types of GMM. Moreover, we propose a reconfigurable architecture to implement our adaptive quantizer in an ADC. We also use it to quantize images and design the tone mapping algorithm for high dynamic range (HDR) image compression. Our experimental results show that (1) the proposed quantizer is able to achieve performance close to the optimal quantizer (i.e., Lloyd–Max quantizer for GMM) in the sense of mean squared error (MSE), at much lower computational cost than it; (2) the proposed quantizer is able to achieve much better MSE performance than a uniform quantizer, at a cost similar to the uniform quantizer. The proposed adaptive quantizer holds great potential in the appilcations of the existing ADC and HDR image compression.  相似文献   

15.
邸志雄  史江义  郝跃  逄杰  刘凯  李云松 《电子学报》2012,40(11):2158-2164
传统的JPEG2000MQ编码器串行编码效率低下,同时现有的多上下文并行编码的MQ编码器占用资源过大.本文对MQ编码算法中的运算流程,索引值和概率估计值的求解函数,条件交换和重归一化算法等四个方面进行了优化,减弱了上下文之间的依赖性,简化了条件交换和重归一化算法的复杂度.依据该算法,本文提出了一种高速的MQ编码器VLSI结构,实验结果表明,本文提出的MQ编码器VLSI结构能够工作在532.91MHz,吞吐率为532.91 Msymbols/sec,相比Dyer提出的Brute force with modified结构,工作频率提高1倍,吞吐量提高近27%,且面积仅为其四分之一.  相似文献   

16.
A mixed mode digital/analog special purpose VLSI hardware implementation of an associative memory with neural architecture is presented. The memory concept is based on a matrix architecture with binary storage elements holding the connection weights. To enhance the processing speed analog circuit techniques are applied to implement the algorithm for the association. To keep the memory density as high as possible two design strategies are considered. First, the number of transistors per storage element is kept to a minimum. In this paper a circuit technique that uses a single 6-transistor cell for weight storage and analog signal processing is proposed. Second, the device precision has been chosen to a moderate level to save area as much as possible. Since device mismatch limits the performance of analog circuits, the impact of device precision on the circuit performance is explicitly discussed. It is shown that the device precision limits the number of rows activated in parallel. Since the input vector as well as the output vector are considered to be sparsely coded it is concluded, that even for large matrices the proposed circuit technique is appropriate and ultra large scale integration with a large number of connection weights is feasible.  相似文献   

17.
We present a parallel algorithm, architecture, and implementation for efficient Lempel-Ziv (LZ)-based data compression. The parallel algorithm exhibits a scalable, parameterized, and regular structure and is well suited for VLSI array implementation. Based on our parallel algorithm and systematic design methodologies, two semisystolic array architectures have been developed which are low power and area efficient. The first architecture trades off the compression speed for the area and has a low run-time overhead for multichannel compression. The second architecture achieves a high compression rate (one data symbol per clock) at the expense of the area due to a large clock load and global wiring. Compared to a recent state-of-the-art parallel architecture, our first array structure requires significantly less chip area (≃330 k versus ≃36 k transistors) and more than an order of magnitude less power (≈1.0 W versus ≈70 mW) while still providing the compression speed required for most data communication applications. Hence, data compression can be adopted in portable data communication as well as wireless local area networks. The second architecture has at least three times less area and power while providing the same constant compression rate. To demonstrate the correctness of our design, a prototype module for the first architecture has been implemented using 1.2 μ complementary metal-oxide-semiconductor (CMOS) technology. The compression module contains 32 simple and identical processors, has an average compression rate of 12.5 million bytes/s, and consumes 18.34 mW without the dictionary (≈70 mW with a 4.1k SRAM for the dictionary) while operating at a 100 MHz clock rate (simulated)  相似文献   

18.
An analog computing-based systolic architecture which employs multiple neuroprocessors for high-speed early vision processing is presented. For a two-dimensional image, parallel processing is performed in the row direction and pipelined processing is performed in the column direction. The mixed analog/digital design approach is suitable for implementation of electronic neural systems. Local data computation is executed by analog circuitry to achieve full parallelism and to minimize power dissipation. Inter-processor communication is carried out in the digital format to maintain strong signal strength across the chip boundary and to achieve direct scalability in neural network size. For demonstration purposes, a compact and efficient VLSI neural chip that includes multiple neuroprocessors for high-speed digital image restoration is designed. Measured results of the programmable synapse, and statistical distribution of measured synapse conductances are presented. Based on these results, system-level analyses at 8-bit resolution are conducted. A 8.0×6.0-mm 2 chip from a 1.2-µm CMOS technology can accommodate 5 neuroprocessors and the speed-up factor over the Sun-4/75 SPARC workstation is around 450. This chip achieves 18 Giga connections per second.This research was partially supported by DARPA under Contract MDA 972-90-C-0037 and by TRW Inc., Samsung Electronics Co., Ltd., and NKK Corp.  相似文献   

19.
A memory system which rapidly chooses the stored item most closely matching a given input is fundamental to a number of recognition tasks. A memory architecture which performs this function is discussed. In addition, a measure of the quality of the selected (best matching) memory is generated. The architecture is capable of significant data throughput rates and is amenable to implementation using conventional digital VLSI fabrication process. These characteristics are demonstrated by a prototype device fabricated using the MOSIS 3-μm CMOS design rules, which can compare more than two million 9-bit input works per second. Behavioral simulations demonstrate the applicability of the architecture to some basic recognition tasks  相似文献   

20.
A method of designing testable systolic architectures is proposed in this paper. Testing systolic arrays involves mapping of an algorithm into a specific VLSI systolic architecture, and then modifying the design to achieve concurrent testing. In our approach, redundant computations are introduced at the algorithmic level by deriving two versions of a given algorithm. The transformed dependency matrix (TDM) of the first version is a valid transformation matrix while the second version is obtained by rotating the first TDM by 180 degrees about any of the indices that represent the spatial component of the TDM. Concurrent error detection (CED) systolic array is constructed by merging the corresponding systolic array of the two versions of the algorithm. The merging method attempts to obtain the self testing systolic array at minimal cost in terms of area and speed. It is based on rescheduling input data, rearranging data flow, and increasing the utilization of the array cells. The resulting design can detect all single permanent and temporary faults and the majority of the multiple fault patterns with high probability. The design method is applied to an algorithm for matrix multiplication in order to demonstrate the generality and novelty of our approach to design testable VLSI systolic architectures.This work has been supported by a grant from the Natural Sciences and Engineering Research Council of Canada.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号