Similar literature
 20 similar documents found (search time: 93 ms)
1.
A high-speed Viterbi decoder VLSI with coding rate R=1/2 and constraint length K=7 for bit-error correction has been developed using 1.5-μm n-well CMOS technology. To reduce both hardware size and power dissipation, a recently developed scarce-state-transition (SST) Viterbi decoding scheme has been utilized. In addition, three-layer metallization and an advanced hierarchical macrocell design method (HMCM) have been adopted to improve packing density and reduce chip size. As a result, active chip area has been reduced by half, compared to the conventional standard cell design method (SCM) with two-layer metallization, and 42 K gates have been integrated on a chip with a die size of 9.52×10.0 mm². The VLSI decoder has achieved a maximum data throughput rate of 23 Mb/s with a net coding gain of 4.4 dB (at 10⁻⁴ bit-error rate). The chip dissipates only 825 mW at a data rate of 10 Mb/s.

2.
We present a parallel algorithm, architecture, and implementation for efficient Lempel-Ziv (LZ)-based data compression. The parallel algorithm exhibits a scalable, parameterized, and regular structure and is well suited for VLSI array implementation. Based on our parallel algorithm and systematic design methodologies, two semisystolic array architectures have been developed which are low power and area efficient. The first architecture trades off the compression speed for the area and has a low run-time overhead for multichannel compression. The second architecture achieves a high compression rate (one data symbol per clock) at the expense of the area due to a large clock load and global wiring. Compared to a recent state-of-the-art parallel architecture, our first array structure requires significantly less chip area (≃330 k versus ≃36 k transistors) and more than an order of magnitude less power (≈1.0 W versus ≈70 mW) while still providing the compression speed required for most data communication applications. Hence, data compression can be adopted in portable data communication as well as wireless local area networks. The second architecture has at least three times less area and power while providing the same constant compression rate. To demonstrate the correctness of our design, a prototype module for the first architecture has been implemented using 1.2-μm complementary metal-oxide-semiconductor (CMOS) technology. The compression module contains 32 simple and identical processors, has an average compression rate of 12.5 million bytes/s, and consumes 18.34 mW without the dictionary (≈70 mW with a 4.1k SRAM for the dictionary) while operating at a 100 MHz clock rate (simulated).

3.
This paper describes a 256 Mb DRAM chip architecture which provides up to ×32 wide organization. In order to minimize the die size, three new techniques were introduced: an exchangeable hierarchical data line structure, an irregular sense amp layout, and a split address bus with local redrive scheme in the both-ends DQ. A chip has been developed based on the architecture with 0.25 μm CMOS technology. The chip measures 13.25 mm×21.55 mm, which is the smallest 256 Mb DRAM ever reported. A row address strobe (RAS) access time of 26 ns was obtained under 2.8 V power supply and 85°C. In addition, a 100 MHz×32 page mode operation, namely 400 Mbyte/s data rate, in the standard extended data output (EDO) cycle has been successfully demonstrated.
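
The quoted 400 Mbyte/s figure follows from the clock rate and the bus width. The short Python sketch below reproduces that arithmetic; the function name is hypothetical, and it assumes one 32-bit transfer per 100 MHz clock cycle, which is how the abstract's "100 MHz×32" is read here.

```python
def page_mode_data_rate_mbytes(clock_mhz: float, bus_width_bits: int) -> float:
    """Peak data rate in Mbyte/s, assuming one bus-wide transfer per clock."""
    mbits_per_second = clock_mhz * bus_width_bits  # Mbit/s across the whole bus
    return mbits_per_second / 8                    # 8 bits per byte


# 100 MHz page-mode cycle on a x32 organization, as stated in the abstract.
rate = page_mode_data_rate_mbytes(100, 32)
print(rate)
```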

4.
A high-speed image compression VLSI processor based on the systolic architecture of difference-codebook binary tree-searched vector quantization has been developed to meet the increasing demands on large-volume data communication and storage requirements. Simulation results show that this design is applicable to many types of image data and capable of producing good reconstructed data quality at high compression ratios. Various design aspects of the binary tree-searched vector quantizer including the algorithm, architecture, and detailed functional design are thoroughly investigated for VLSI implementation. An 8-level difference-codebook binary tree-searched vector quantizer can be implemented on a custom VLSI chip that includes a systolic array of eight identical processors and a hierarchical memory of eight subcodebook memory banks. The total transistor count is about 300,000 and the die size is about 8.67×7.72 mm² in a 1.0 μm CMOS technology. The throughput rate of this high-speed VLSI compression system is approximately 25 Mpixels per second and its equivalent computation power is 600 million instructions per second.

5.
Low-density parity-check block codes (LDPC-BCs) are quickly becoming the forward error correcting code of choice for emerging communication standards. However, low-density parity-check convolutional codes (LDPC-CCs), the convolutional counterpart of LDPC-BCs, seem to be better suited in applications with streaming data or variable sized packets. A rate-1/2, (128,3,6) LDPC-CC ASIC has been implemented in 180-nm, 1.8-V CMOS technology. We present the VLSI architecture of a register-based LDPC-CC encoder and decoder that includes an on-chip, pseudo-random additive white Gaussian noise channel emulator. The decoder comprises a pipeline of ten identical processing units and attains up to 175 Mb/s of decoded throughput.

6.
We propose a novel integration of image compression and sensing in order to enhance the performance of an image sensor. By integrating a compression function onto the sensor focal plane, the image signal to be read out from the sensor is significantly reduced and the pixel rate of the sensor can consequently be increased. The potential applications of the proposed sensor are in high pixel-rate imaging, such as high frame-rate image sensing and high-resolution image sensing. The compression scheme we employ is conditional replenishment, which detects and encodes moving areas. In this paper, we introduce two architectures for on-sensor compression: one is the pixel parallel approach and the other is the column parallel approach. We prototyped a VLSI chip of the proposed sensor based on the pixel parallel architecture. We show the design and describe the results of the experiments obtained by the prototype chip.
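
Conditional replenishment, as summarized above, sends only the parts of a frame that have changed. The Python sketch below illustrates the general idea in software; it is not the sensor's circuit. The per-pixel granularity, the `threshold` parameter, and the function name are all assumptions made for illustration.

```python
def conditional_replenishment(reference, current, threshold):
    """Return (updates, new_reference) for one frame.

    updates lists (row, col, value) for every pixel whose value differs from
    the stored reference by more than `threshold` (an assumed parameter).
    Unchanged pixels are not transmitted and keep their reference value.
    """
    updates = []
    new_reference = [row[:] for row in reference]  # copy so the input is untouched
    for r, row in enumerate(current):
        for c, value in enumerate(row):
            if abs(value - new_reference[r][c]) > threshold:
                updates.append((r, c, value))   # "moving" pixel: encode it
                new_reference[r][c] = value     # and refresh the reference
    return updates, new_reference


previous_frame = [[10, 10], [10, 10]]
current_frame = [[10, 50], [12, 10]]
updates, reference = conditional_replenishment(previous_frame, current_frame, threshold=5)
print(updates, reference)
```

Only one of the four pixels exceeds the threshold in this toy frame, so only that pixel would be read out; the small change of 2 is treated as static.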

7.
This paper proposes an architecture of the wireless endoscopy system for the diagnosis of the whole human digestive tract and real-time endoscopic image monitoring. The low-power digital IC design inside the wireless endoscopic capsule is discussed in detail. A very large scale integration (VLSI) architecture of three-stage clock management is applied, which can save 46% power inside the capsule compared with the design without such a low-power design. A stoppable ring crystal oscillator with minimal overhead is used in the sleep mode, which results in about 60-μW system power dissipation in sleep mode. A new image compression algorithm based on Bayer image format and its corresponding VLSI architecture are both proposed for low-power, high-data volume. Thus, 8 frames per second with 320×288 pixels can be transmitted with 2 Mb/s. The digital IC design also assures that the capsule has many flexible and useful functions for clinical application. The digital circuits were verified on field-programmable gate arrays and have been implemented in 0.18-μm CMOS process with 6.2 mW.

8.
This paper investigates VLSI architectures for low-density parity-check (LDPC) decoders amenable to low-voltage and low-power operation. First, a highly-parallel decoder architecture with low routing overhead is described. Second, we propose an efficient method to detect early convergence of the iterative decoder and terminate the computations, thereby reducing dynamic power. We report on a bit-serial fully-parallel LDPC decoder fabricated in a 0.13-μm CMOS process and show how the above techniques affect the power consumption. With early termination, the prototype is capable of decoding with 10.4 pJ/bit/iteration, while performing within 3 dB of the Shannon limit at a BER of 10⁻⁵ and with 3.3 Gb/s total throughput. If operated from a 0.6 V supply, the energy consumption can be further reduced to 2.7 pJ/bit/iteration while maintaining a total throughput of 648 Mb/s, due to the highly-parallel architecture. To demonstrate the applicability of the proposed architecture for longer codes, we also report on a bit-serial fully-parallel decoder for the (2048, 1723) LDPC code in 10GBase-T standard synthesized with a 90-nm CMOS library.
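
The abstract does not say how early convergence is detected. As a rough illustration of early termination in general, the Python sketch below uses a common textbook criterion: stop iterating as soon as every parity check is satisfied by the current hard decisions. This criterion, the `iterate` callback, and all names are assumptions and may differ from the method the paper proposes.

```python
def parity_checks_satisfied(check_rows, hard_bits):
    """True if every check (a list of bit indices) has even parity."""
    return all(sum(hard_bits[i] for i in row) % 2 == 0 for row in check_rows)


def decode_with_early_termination(check_rows, initial_bits, iterate, max_iterations):
    """Run `iterate` (a hypothetical one-iteration decoder step) until the
    hard decisions form a valid codeword or the iteration budget runs out.

    Returns (bits, iterations_used).
    """
    bits = list(initial_bits)
    for iteration in range(max_iterations):
        if parity_checks_satisfied(check_rows, bits):
            return bits, iteration          # converged: skip remaining iterations
        bits = iterate(bits)
    return bits, max_iterations


# Toy code with two checks; the stand-in "decoder" simply outputs all zeros.
checks = [[0, 1], [1, 2]]
result = decode_with_early_termination(checks, [1, 0, 0], lambda b: [0, 0, 0], 10)
print(result)
```

Skipping the unused iterations is what saves dynamic power: in the toy run above the loop stops after one iteration instead of ten.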

9.
A parallel architecture for high-data-rate AGC/decision/clock-recovery circuit, recovering digital NRZ data in optical-fiber receivers, is described. Improvement over traditional architecture in throughput is achieved through the use of parallel signal paths. An experimental prototype, fabricated in a 1.2-μm double-poly double-metal n-well CMOS process, achieves a maximum bit rate of 480 Mb/s. The chip contains variable gain amplifiers, clock recovery, and demultiplexing circuits. It yields a BER of 10⁻¹¹ with an 18 mVp-p differential input signal. The power consumption is 900 mW from a single 5 V supply.

10.
MIMO has been proposed as an extension to 3G and Wireless LANs. As an implementation scheme of MIMO systems, V-BLAST is suitable for applications with very high data rates. The square root algorithm for V-BLAST detection is attractive for hardware implementations due to its low computational complexity and numerical stability. In this paper, the fixed-point implementation of the square root algorithm is analyzed, and a low complexity VLSI architecture is proposed. The proposed architecture is scalable for various configurations, and implemented for a 4×4 QPSK V-BLAST system in a 0.35-μm CMOS technology. The chip core covers 9 mm² and 190 K gates. The detection throughput of the chip depends on the received symbol packet length. When the packet length is larger than or equal to 100 bytes, it can achieve a maximal detection throughput of 128 to 160 Mb/s at a maximal clock frequency of 80 MHz. The core power consumption, measured at 2.7 V and room temperature, is about 608 mW for 160 Mb/s data rate at 80 MHz, and 81 mW for 20 Mb/s at 10 MHz. The proposed architecture is shown to meet the requirements for emerging MIMO applications, such as 3G HSDPA and IEEE 802.11n.

11.
Future video services in the loop plant may be based on digital transmission through optical fibers. A small and inexpensive codec is required for a variety of services based on digital television. We have demonstrated a compression algorithm for transmission of NTSC color television over a DS3 channel (44.736 Mbits/s). This predictive coding algorithm has been implemented using circuits built with conventional TTL logic. The resulting picture is visually unimpaired, but may not have network quality. The major portion of the compression or reconstruction circuit has also been implemented in one CMOS VLSI chip. Compression is accomplished using this chip with two small ROM chips. Reconstruction is done with the same VLSI chip and one ROM chip.

12.
A test chip for an integrated full CMOS LED driver has been realized with a modulation current of 60 mA at a maximum bit rate of 155 Mb/s. A CMOS receiver is evaluated to amplify PIN diode photocurrents less than 10 µA at the same bit rate of 155 Mb/s. Both circuits are integrated on one chip. The circuit has been developed in a 0.8-µm digital CMOS process.

13.
A 640-Mb/s 2048-bit programmable LDPC decoder chip   Total citations: 3, self-citations: 0, citations by others: 3
A 14.3-mm² code-programmable and code-rate tunable decoder chip for 2048-bit low-density parity-check (LDPC) codes is presented. The chip implements the turbo-decoding message-passing (TDMP) algorithm for architecture-aware (AA-)LDPC codes which has a faster convergence rate and hence a throughput advantage over the standard decoding algorithm. It employs a reduced complexity message computation mechanism free of lookup tables, and features a programmable network for message interleaving based on the code structure. The chip decodes any mix of 2048-bit rate-1/2 (3,6)-regular AA-LDPC codes in standard mode by programming the network, and attains a throughput of 640 Mb/s at 125 MHz for 10 TDMP-decoding iterations. In augmented mode, the code rate can be tuned up to 14/16 in steps of 1/16 by augmenting the code. The chip is fabricated in 0.18-μm six-metal-layer CMOS technology, operates at a peak clock frequency of 125 MHz at 1.8 V (nominal), and dissipates an average power of 787 mW.

14.
A K-best Schnorr-Euchner (KSE) decoding algorithm is proposed in this paper to approach near-maximum-likelihood (ML) performance for multiple-input-multiple-output (MIMO) detection. As a low complexity MIMO decoding algorithm, the KSE is shown to be suitable for very large scale integration (VLSI) implementations and capable of supporting soft outputs. A modified KSE (MKSE) decoding algorithm is further proposed to improve the performance of the soft-output KSE with minor modifications. Moreover, a VLSI architecture is proposed for both algorithms. There are several low complexity and low-power features incorporated in the proposed algorithms and the VLSI architecture. The proposed hard-output KSE decoder and the soft-output MKSE decoder are implemented for 4×4 16-quadrature amplitude modulation (QAM) MIMO detection in a 0.35-μm and a 0.13-μm CMOS technology, respectively. The implemented hard-output KSE chip core is 5.76 mm² with 91 K gates. The KSE decoding throughput is up to 53.3 Mb/s with a core power consumption of 626 mW at 100 MHz clock frequency and 2.8 V supply. The implemented soft-output MKSE chip can achieve a decoding throughput of more than 100 Mb/s with a 0.56 mm² core area and 97 K gates. The implementation results show that it is feasible to achieve near-ML performance and high detection throughput for a 4×4 16-QAM MIMO system using the proposed algorithms and the VLSI architecture with reasonable complexity.

15.
A VLSI architecture which exhibits both SIMD and systolic behaviour for computing the dynamic time-warping (DTW) algorithm is presented. Such an architecture is well-suited for VLSI implementation because of its regular structure and small number of inputs and outputs. Currently, based on a 1-2 µm CMOS technology, a SIMD-systolic data-path chip has been designed and fabricated for computing the DTW algorithm. It is functionally correct and packaged as a 68-pin PGA chip. With such a chip, a 20000-word real-time DTW-based speech recognition system is achievable.

16.
This paper presents a low-power bit-serial Viterbi decoder chip with the code rate r=1/3 and the constraint length K=9 (256 states) for next generation wireless communication applications. The architecture of the add-compare-select (ACS) module is based on the bit-serial arithmetic and implemented with the pass transistor logic circuit. A cluster-based ACS placement and state metric routing topology is described for the 256 bit-serial ACS units, which achieves very high area efficiency. In the trace-back operation, a power efficient trace-back scheme, allowing higher memory read access rate than memory write in a time-multiplexing method, is implemented to reduce the number of iterations required to generate a decoded output. In addition, a low-power application-specific memory suitable for the function of survivor path memory has also been developed. The chip's core, implemented using 0.5-μm CMOS technology, contains approximately 200 K transistors and occupies 2.46 mm by 4.17 mm area. This chip can achieve the decode rate of 20 Mb/s under 3.3 V and 2 Mb/s under 1.8 V. The measured power dissipation at 2 Mb/s under 1.8 V is only about 9.8 mW. The Viterbi decoder presented here can be applied to next generation wide-band code division multiple access (W-CDMA) systems.
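
For readers unfamiliar with the add-compare-select (ACS) step named above, the Python sketch below shows what a single ACS unit computes for one trellis state. It works on whole integers rather than bit-serially, so it illustrates the function only, not the chip's arithmetic; the names and the convention that smaller metrics are better are assumptions.

```python
def add_compare_select(path_metric_0, branch_metric_0, path_metric_1, branch_metric_1):
    """One ACS step for a state with two predecessor states.

    Add: extend each predecessor's path metric by its branch metric.
    Compare/select: keep the smaller candidate (smaller = more likely, by
    assumption) and return which predecessor won, for the survivor memory.
    """
    candidate_0 = path_metric_0 + branch_metric_0
    candidate_1 = path_metric_1 + branch_metric_1
    if candidate_0 <= candidate_1:
        return candidate_0, 0
    return candidate_1, 1


new_metric, decision = add_compare_select(3, 1, 2, 3)
print(new_metric, decision)
```

A K=9 decoder has 256 states, so 256 such steps run for every decoded bit, and the stored decision bits are what the trace-back operation later reads.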

17.
In the application of digital RF memory (DRFM) chips for radar jamming, an RF signal is sampled, stored in random access memory (RAM) and later recreated from the stored data. A CMOS (l_eff = 1 μm) DRFM chip is described that integrates static RAM, control circuitry, and two channels of shift registers on a single chip. The sample rate achieved was 0.5 GHz. VLSI density was made possible by the low-power dissipation of quiescent CMOS circuits. An 8K RAM prototype chip has been built and tested.

18.
A channel decoder chip compliant with the 3GPP mobile wireless standard is described. It supports both data and voice calls simultaneously in a unified turbo/Viterbi decoder architecture. For voice services, the decoder can process over 128 voice channels encoded with rate 1/2 or 1/3, constraint length 9 convolutional codes. For data services, the turbo decoder is capable of processing any mix of rate 1/3, constraint length 4 turbo encoded data streams with an aggregate data rate of up to 2.5 Mb/s with 10 iterations per block (or 4.1 Mb/s with six iterations). The turbo decoder uses the logMAP algorithm with a programmable logsum correction table. It features an interleaver address processor that computes the 3GPP interleaver addresses for all block sizes enabling it to quickly switch context to support different data services for several users. The decoder also contains the 3GPP first channel de-interleaving function and a post-decoder bit error rate estimation unit. The chip is fabricated in a 0.18-μm six-layer metal CMOS technology, has an active area of 9 mm², and has a peak clock frequency of 110.8 MHz at 1.8 V (nominal). The power consumption is 306 mW when turbo decoding a 2-Mb/s data stream with ten iterations per block and eight voice calls simultaneously.
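
The "logsum correction table" mentioned above refers to the standard log-MAP identity log(e^a + e^b) = max(a, b) + log(1 + e^-|a-b|), where the second term is small and is usually read from a lookup table. The Python sketch below shows the exact operation and a table-based version. The table size, step, and function names are illustrative assumptions, not the chip's programmed values.

```python
import math


def max_star(a, b):
    """Exact log-sum: log(exp(a) + exp(b)) written as max plus a correction."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def build_correction_table(entries=8, step=0.5):
    """Sample the correction term at multiples of `step` (assumed table shape)."""
    return [math.log1p(math.exp(-i * step)) for i in range(entries)]


def max_star_table(a, b, table, step=0.5):
    """Table-based log-sum: look up the correction, or use 0 beyond the table."""
    index = int(abs(a - b) / step)
    correction = table[index] if index < len(table) else 0.0
    return max(a, b) + correction


table = build_correction_table()
print(max_star(1.0, 2.0), max_star_table(1.0, 2.0, table))
```

Making the table programmable lets the same hardware trade accuracy against complexity; an all-zero table reduces the operation to plain max, the max-log-MAP approximation.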

19.
In this paper, a parallel dictionary based LZW algorithm called the PDLZW algorithm and its hardware architecture for compression and decompression processors are proposed. In this architecture, instead of using a unique fixed-word-width dictionary, a hierarchical variable-word-width dictionary set containing several dictionaries of small address space and increasing word widths is used for both compression and decompression algorithms. The results show that the new architecture not only can be easily implemented in VLSI technology because of its high regularity but also has a faster compression and decompression rate since it no longer needs to search the dictionary recursively as the conventional implementations do.

20.
In this paper, we describe a fully pipelined single chip VLSI architecture for implementing the JPEG baseline image compression standard. The architecture exploits the principles of pipelining and parallelism to the maximum extent in order to obtain high speed and throughput. The architecture for discrete cosine transform and the entropy encoder are based on efficient algorithms designed for high speed VLSI implementation. The entire architecture can be implemented on a single VLSI chip to yield a clock rate of about 100 MHz which would allow an input rate of 30 frames per second for 1024×1024 color images.
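
The link between the roughly 100 MHz clock and the 30 frames per second figure can be checked with a quick calculation. The Python sketch below assumes three color components per pixel with no chroma subsampling and one component sample consumed per clock cycle; neither assumption is stated in the abstract, and the function name is hypothetical.

```python
def samples_per_second(width, height, frames_per_second, components=3):
    """Component samples per second for an uncompressed color video input."""
    return width * height * frames_per_second * components


# 1024x1024 color images at 30 frames per second, as quoted in the abstract.
input_rate = samples_per_second(1024, 1024, 30)
print(input_rate)
```

Under these assumptions the input is about 94.4 million samples per second, which fits within a 100 MHz clock.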
