Similar Literature
 20 similar articles found (search time: 31 ms)
1.
Soft-output sphere decoding: algorithms and VLSI implementation   Total citations: 6, self-citations: 0, citations by others: 6
Multiple-input multiple-output (MIMO) detection algorithms providing soft information for a subsequent channel decoder pose significant implementation challenges due to their high computational complexity. In this paper, we show how sphere decoding can be used as an efficient tool to implement soft-output MIMO detection with flexible trade-offs between computational complexity and (error rate) performance. In particular, we provide VLSI implementation results which demonstrate that single tree-search, sorted QR-decomposition, channel matrix regularization, log-likelihood ratio clipping, and imposing runtime constraints are the key ingredients for realizing soft-output MIMO detectors with near max-log performance at a chip area that is only 58% higher than that of the best-known hard-output sphere decoder VLSI implementation.
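
As a rough illustration of the log-likelihood ratio clipping mentioned in the abstract above, the following sketch computes max-log LLRs from a list of candidate bit vectors and their distance metrics, then limits each value to the range [-llr_max, +llr_max]. The function name, the candidate format, and the sign convention (a positive value favors bit 1) are assumptions made for illustration; none of them is taken from the paper.

```python
import math


def clipped_max_log_llrs(candidates, num_bits, llr_max):
    """Max-log LLRs from (bits, metric) pairs, clipped to [-llr_max, +llr_max].

    Hypothetical helper: `candidates` is a list of (bit tuple, distance metric)
    pairs, for example the leaves visited by a tree search. A smaller metric
    means a more likely candidate.
    """
    llrs = []
    for i in range(num_bits):
        # Smallest metric among candidates whose bit i is 0, and whose bit i is 1.
        d0 = min((d for bits, d in candidates if bits[i] == 0), default=math.inf)
        d1 = min((d for bits, d in candidates if bits[i] == 1), default=math.inf)
        if math.isinf(d0) and math.isinf(d1):
            llr = 0.0  # no candidate at all: no information about this bit
        else:
            llr = d0 - d1  # assumed convention: positive value favors bit 1
        # Clipping bounds the LLR magnitude, including the case where one
        # hypothesis has no candidate and the raw difference is infinite.
        llrs.append(max(-llr_max, min(llr_max, llr)))
    return llrs


example = [((0, 0), 1.0), ((0, 1), 3.0), ((1, 1), 20.0)]
print(clipped_max_log_llrs(example, num_bits=2, llr_max=8.0))  # [-8.0, -2.0]
```

In this toy example the first bit has a raw LLR of -19, which clipping reduces to -8; the second bit is within range and passes through unchanged.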

2.
Soft-output decoding has evolved as a key technology for new error correction approaches with unprecedented performance as well as for improvement of well-established transmission techniques. In this paper, we present a high-speed VLSI implementation of the soft-output Viterbi algorithm, a low-complexity soft-output algorithm, for a 16-state convolutional code. The 43 mm² standard cell chip achieves a simulated throughput of 40 Mb/s, while tested samples achieved a throughput of 50 Mb/s. The chip is roughly twice as big as a 16-state Viterbi decoder without soft outputs. The design thus shows that transmission schemes using soft-output decoding can be considered practical even at very high throughput. Since such decoding systems are more complex to design than hard-output systems, special emphasis is placed on the employed design methodology.

3.
Presented in this paper is a pipelined 285-MHz maximum a posteriori probability (MAP) decoder IC. The 8.7-mm² IC is implemented in a 1.8-V 0.18-μm CMOS technology and consumes 330 mW at maximum frequency. The MAP decoder chip features a block-interleaved pipelined architecture, which enables the pipelining of the add-compare-select kernels. Measured results indicate that a turbo decoder based on the presented MAP decoder core can achieve: 1) a decoding throughput of 27.6 Mb/s with an energy-efficiency of 2.36 nJ/b/iter; 2) the highest clock frequency compared to existing 0.18-μm designs with the smallest area; and 3) comparable throughput with an area reduction of 3-4.3× with reference to a look-ahead based high-speed design (Radix-4 design), and a parallel architecture.

4.
Two eight-state 7-bit soft-output Viterbi decoders matched to an EPR4 channel and a rate-8/9 convolutional code are implemented in a 0.18-μm CMOS technology. The throughput of the decoders is increased through architectural transformation of the add-compare-select recursion, with a small area overhead. The survivor-path decoding logic of a conventional Viterbi decoder register exchange is adapted to detect the two most likely paths. The 4-mm² chip has been verified to decode at 500 Mb/s with 1.8-V supply. These decoders can be used as constituent decoders for Turbo codes in high-performance applications requiring information rates that are very close to the Shannon limit.

5.
A high-speed Viterbi decoder VLSI with coding rate R=1/2 and constraint length K=7 for bit-error correction has been developed using 1.5-μm n-well CMOS technology. To reduce both hardware size and power dissipation, a recently developed scarce-state-transition (SST) Viterbi decoding scheme has been utilized. In addition, three-layer metallization and an advanced hierarchical macrocell design method (HMCM) have been adopted to improve packing density and reduce chip size. As a result, active chip area has been reduced by half, compared to the conventional standard cell design method (SCM) with two-layer metallization, and 42 K gates have been integrated on a chip with a die size of 9.52 × 10.0 mm². The VLSI decoder has achieved a maximum data throughput rate of 23 Mb/s with a net coding gain of 4.4 dB (at 10⁻⁴ bit-error rate). The chip dissipates only 825 mW at a data rate of 10 Mb/s.

6.
Multiple-input multiple-output (MIMO) wireless systems increase spectral efficiency by transmitting independent signals on multiple transmit antennas in the same channel bandwidth. The key to using MIMO is in building a receiver that can decorrelate the spatial signatures on the receiver antenna array. Original MIMO detection schemes such as the vertical Bell Labs layered space-time (VBLAST) detector use a nulling and cancellation process for detection that is sub-optimal as compared to constrained maximum likelihood (ML) techniques. This paper presents a silicon complexity analysis of ML search techniques for MIMO as applied to the HSDPA extension of UMTS. For MIMO constellations of 4 × 4 QPSK or lower, it is possible to perform an exhaustive ML search in today's silicon technologies. When the search complexity exceeds technology limits for high complexity MIMO constellations, it is possible to apply spherical decoding techniques to achieve near-ML performance. The paper presents an architecture for a 4 × 4 16QAM MIMO spherical decoder with soft outputs that achieves 38.8 Mb/s over a 5-MHz channel using only approximately 10 mm² in a 0.18-μm CMOS process.
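
The feasibility claim for exhaustive search can be made concrete by counting candidates: with four transmit antennas, QPSK gives 4^4 = 256 candidate vectors, while 16QAM gives 16^4 = 65,536. A minimal brute-force sketch in plain Python follows. The function name, the list-of-rows channel representation, and the unnormalized QPSK points are assumptions for illustration; this is not the architecture described in the paper.

```python
from itertools import product

# Unnormalized QPSK points (an assumption; real systems scale for unit energy).
QPSK = (1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j)


def ml_detect(H, y, constellation=QPSK):
    """Exhaustive maximum-likelihood search: argmin over s of ||y - H s||^2.

    H is a list of rows (one per receive antenna), y a list of received samples.
    """
    n_tx = len(H[0])
    best, best_metric = None, float("inf")
    for s in product(constellation, repeat=n_tx):
        # Squared Euclidean distance between y and the noiseless hypothesis H s.
        metric = sum(
            abs(y_r - sum(h * x for h, x in zip(row, s))) ** 2
            for row, y_r in zip(H, y)
        )
        if metric < best_metric:
            best, best_metric = s, metric
    return best, best_metric


# Candidate counts for a 4-antenna transmitter.
print(len(QPSK) ** 4, 16 ** 4)  # 256 65536

# Noiseless 2 x 2 example with a made-up channel matrix.
H = [[1 + 0.5j, 0.2], [0.3j, 0.8 - 0.1j]]
sent = (1 - 1j, -1 + 1j)
y = [sum(h * x for h, x in zip(row, sent)) for row in H]
detected, _ = ml_detect(H, y)
print(detected == sent)  # True
```

The loop visits every candidate, so the work grows exponentially with the number of antennas and bits per symbol, which is the motivation for the sphere decoding alternative mentioned in the abstract.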

7.
This brief presents a new technique in implementing a very large-scale integration trellis code modulation (TCM) decoder. The technique aims to reduce hardware complexity and increase decoding throughput. The technique is introduced in the design of a Viterbi decoder. To simplify the decoding algorithm and calculation, branch cost distances are pre-calculated and stored in a distance look-up table (DLUT). The concept of DLUT significantly reduces hardware requirements as this table eliminates the need for calculation circuitry. In addition, an output LUT (OLUT) is constructed based on the trellis diagram of the code. This table generates the decoding output using information provided by the algorithm. The use of this OLUT reduces the amount of storage requirement. The technique was used to design a 16-state, radix-4 codec for two-dimensional and four-dimensional TCM. The decoder was implemented in hardware after functional simulation. The tested ASIC has a core area of 1.1 mm² in 0.18-μm CMOS. A decoding speed of 1 Gbps was achieved. Implementation results have shown that LUTs can be used to decrease hardware requirement and increase decoding speed.
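
The idea of pre-calculating branch distances can be sketched as follows. This is only a toy model under stated assumptions: the 3-bit quantizer levels, the 4-PAM signal points, and the names `build_distance_lut` and `branch_metric` are hypothetical, and the actual DLUT layout used in the brief is not given in the abstract.

```python
def build_distance_lut(levels, points):
    """Return dlut[q][k]: squared distance between quantized level q and point k."""
    return [[(lvl - p) ** 2 for p in points] for lvl in levels]


# Hypothetical 3-bit quantizer output levels and 4-PAM signal points.
LEVELS = [-3.5, -2.5, -1.5, -0.5, 0.5, 1.5, 2.5, 3.5]
POINTS = [-3.0, -1.0, 1.0, 3.0]

# All distances are computed once, before decoding starts.
DLUT = build_distance_lut(LEVELS, POINTS)


def branch_metric(q_index, point_index):
    """During decoding, a branch metric is a table read instead of arithmetic."""
    return DLUT[q_index][point_index]


print(branch_metric(5, 2))  # level 1.5 against point 1.0 -> 0.25
```

The table has one entry per (quantized input, signal point) pair, so its size is fixed by the quantizer resolution and the constellation size; in this toy case that is 8 × 4 = 32 entries.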

8.
VLSI implementation of MIMO detection using the sphere decoding algorithm   Total citations: 3, self-citations: 0, citations by others: 3
Multiple-input multiple-output (MIMO) techniques are a key enabling technology for high-rate wireless communications. This paper discusses two ASIC implementations of MIMO sphere decoders. The first ASIC attains maximum-likelihood performance with an average throughput of 73 Mb/s at a signal-to-noise ratio (SNR) of 20 dB; the second ASIC shows only a negligible bit-error-rate degradation and achieves a throughput of 170 Mb/s at the same SNR. The three key contributing factors to high throughput and low complexity are: depth-first tree traversal with radius reduction, implemented in a one-node-per-cycle architecture; the use of the ℓ^∞-norm instead of the ℓ^2-norm; and, finally, the efficient implementation of the enumeration approach recently proposed in . The resulting ASICs currently rank among the fastest reported MIMO detector implementations.

9.
Decoding the Golden Code: A VLSI Design   Total citations: 1, self-citations: 0, citations by others: 1
The recently proposed Golden code is an optimal space-time block code for 2 × 2 multiple-input multiple-output (MIMO) systems. The aim of this work is the design of a VLSI decoder for a MIMO system coded with the Golden code. The architecture is based on a rearrangement of the sphere decoding algorithm that achieves maximum-likelihood (ML) decoding performance. Compared to other approaches, the proposed solution exhibits an inherent flexibility in terms of QAM modulation size, which makes our architecture particularly suitable for adaptive modulation schemes. Relying on the flexibility of this approach, two different architectures are proposed: a parametric one able to achieve high decoding throughputs (> 165 Mb/s) while keeping low overall decoder complexity (45 KGates), and a flexible implementation able to dynamically adapt to the modulation scheme (4-, 16-, 64-QAM) while retaining the low complexity and high throughput features.

10.
Parallel decoding is required for low density parity check (LDPC) codes to achieve high decoding throughput, but it suffers from a large set of registers and complex interconnections due to randomly located 1's in the sparse parity check matrix. This paper proposes a new LDPC decoding architecture to reduce registers and alleviate complex interconnections. To reduce the number of messages to be exchanged among processing units (PUs), two data flows that can be loosely coupled are developed by allowing duplicated operations. In addition, intermediate values are grouped and stored in local storages, each of which is accessed by only one PU. In order to save area, local storages are implemented using memories instead of registers. A partially parallel architecture is proposed to promote the memory usage, and an efficient algorithm that schedules the processing order of the partially parallel architecture is also proposed to reduce the overall processing time by overlapping operations. To verify the proposed architecture, a 1024-bit rate-1/2 LDPC decoder is implemented using a 0.18-μm CMOS process. The decoder runs correctly at a frequency of 200 MHz, which enables almost 1 Gbps decoding throughput. The proposed decoder occupies an area of 10.08 mm², which is less than one fifth of the area of the previous architecture.

11.
This brief proposes a fast multispeed comma-free Reed-Solomon (CFRS) decoder for the frame synchronization and code-group identification in the cell search of the Third Generation Partnership Project wide-band code-division multiple access/frequency division duplexing (W-CDMA/FDD) system. A foldable systolic array is proposed to achieve fast decoding and provide flexible tradeoffs between power consumption, chip size, and decoding latency. Multispeed decoding, an idea that is useful for cell search in different application scenarios, can also be achieved with the same array architecture. The proposed CFRS decoder is implemented in a 3.3-V 0.35-μm CMOS technology with 2.2 × 2.2 mm² core area and power dissipation of 13.3 and 1.23 mW in high- and low-speed decoding modes, respectively.

12.
A 640-Mb/s 2048-bit programmable LDPC decoder chip   Total citations: 3, self-citations: 0, citations by others: 3
A 14.3-mm² code-programmable and code-rate tunable decoder chip for 2048-bit low-density parity-check (LDPC) codes is presented. The chip implements the turbo-decoding message-passing (TDMP) algorithm for architecture-aware (AA-)LDPC codes, which has a faster convergence rate and hence a throughput advantage over the standard decoding algorithm. It employs a reduced complexity message computation mechanism free of lookup tables, and features a programmable network for message interleaving based on the code structure. The chip decodes any mix of 2048-bit rate-1/2 (3,6)-regular AA-LDPC codes in standard mode by programming the network, and attains a throughput of 640 Mb/s at 125 MHz for 10 TDMP-decoding iterations. In augmented mode, the code rate can be tuned up to 14/16 in steps of 1/16 by augmenting the code. The chip is fabricated in 0.18-μm six-metal-layer CMOS technology, operates at a peak clock frequency of 125 MHz at 1.8 V (nominal), and dissipates an average power of 787 mW.
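
The figures quoted in this abstract can be related by simple arithmetic. The sketch below is a back-of-the-envelope check only; it assumes that the 640 Mb/s figure counts all 2048 bits of a block and that the 787 mW average power applies at that throughput, neither of which the abstract states explicitly.

```python
# Figures taken from the abstract.
throughput_bps = 640e6   # 640 Mb/s
clock_hz = 125e6         # 125 MHz
block_bits = 2048        # code length
iterations = 10          # TDMP iterations per block
avg_power_w = 0.787      # 787 mW

# Derived quantities (valid only under the assumptions named above).
bits_per_cycle = throughput_bps / clock_hz                # about 5.12
cycles_per_block = block_bits / bits_per_cycle            # about 400
cycles_per_iteration = cycles_per_block / iterations      # about 40
energy_per_bit_nj = avg_power_w / throughput_bps * 1e9    # about 1.23 nJ/bit

print(round(bits_per_cycle, 2), round(cycles_per_block),
      round(cycles_per_iteration), round(energy_per_bit_nj, 2))
```

Under these assumptions the chip would spend roughly 40 clock cycles per iteration on each 2048-bit block.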

13.
A new class of soft MIMO demodulation algorithms   Total citations: 8, self-citations: 0, citations by others: 8
We propose a new class of soft-input soft-output demodulation schemes for multiple-input multiple-output (MIMO) channels, based on the sequential Monte Carlo (SMC) framework under both stochastic and deterministic settings. The stochastic SMC sampler generates MIMO symbol samples based on importance sampling and resampling techniques, whereas the deterministic SMC approach recursively performs exploration and selection steps in a greedy manner. By exploiting the artificial sequential structure of the existing simple Bell-Labs layered space-time (BLAST) detection method based on nulling and cancellation, the proposed algorithms achieve an error probability performance that is orders of magnitude better than the traditional BLAST detection schemes while maintaining a low computational complexity. In fact, the new methods offer performance comparable with that of the sphere decoding algorithm without attendant increase in complexity. More importantly, being soft-input soft-output in nature, both the stochastic and deterministic SMC detectors can be employed as the first-stage demodulator in a turbo receiver in coded MIMO systems. Such a turbo receiver successively improves the receiver performance by iteratively exchanging the so-called extrinsic information between the soft outer channel decoder and the inner soft MIMO demodulator under both known channel state and unknown channel state scenarios. Computer simulation results are provided to demonstrate the performance of the proposed algorithms.

14.
In this paper, a VLSI architecture based on radix-2² integer fast Fourier transform (IntFFT) is proposed to demonstrate its efficiency. The IntFFT algorithm guarantees the perfect reconstruction property of transformed samples. For a 64-point radix-2² FFT architecture, the proposed architecture uses 2 sets of complex multipliers (six real multipliers) and has 6 pipeline stages. By exploiting the symmetric property of lossless transform, the memory usage is reduced by 27.4%. The whole design is synthesized and simulated with a 0.18-μm TSMC 1P6M standard cell library and its reported equivalent gate count usage is 17,963 gates. The whole chip size is 975 μm × 977 μm with a core size of 500 μm × 500 μm. The core power consumption is 83.56 mW. A Simulink-based orthogonal frequency demodulation multiplexing platform is utilized to compare the conventional fixed-point FFT and proposed IntFFT from the viewpoint of system-level behavior in terms of signal-to-quantization-noise ratio (SQNR) and bit error rate (BER). The quantization loss analysis of these two types of FFT is also derived and compared. Based on the simulation results, the proposed lossless IntFFT architecture can achieve comparable SQNR and BER performance with reduced memory usage.

15.
Multiple-input multiple-output communication systems demand fast sphere decoding with high performance. To speed up the computation, we propose a scheme with multiple fixed complexity sphere decoders to construct a parallel soft-output fixed complexity sphere decoder (PFSD). The proposed decoder is highly parallel and has performance comparable to the soft-output list fixed complexity sphere decoder (LFSD) and the K-best sphere decoder. In addition, we propose a parallel QR decomposition algorithm to lower the preprocessing overhead, and a low complexity LLR algorithm to allow parallel update of LLR values. We demonstrate that the PFSD algorithm can increase the throughput and reduce the bit error rate of a soft-output solution in a 4 × 4 16-QAM system, and has superior performance compared to other soft decoders with comparable throughput and computation complexity. The PFSD algorithm has been mapped onto a Xilinx XC4VLX160 FPGA. The resulting PFSD decoder can achieve up to 75 Mbps throughput for a 4 × 4 64-QAM configuration at 100 MHz with low control overhead.

16.
MIMO has been proposed as an extension to 3G and wireless LANs. As an implementation scheme of MIMO systems, V-BLAST is suitable for applications with very high data rates. The square root algorithm for V-BLAST detection is attractive for hardware implementations due to its low computational complexity and numerical stability. In this paper, the fixed-point implementation of the square root algorithm is analyzed, and a low complexity VLSI architecture is proposed. The proposed architecture is scalable for various configurations, and implemented for a 4 × 4 QPSK V-BLAST system in a 0.35 μm CMOS technology. The chip core covers 9 mm² and 190 K gates. The detection throughput of the chip depends on the received symbol packet length. When the packet length is larger than or equal to 100 bytes, it can achieve a maximal detection throughput of 128 to 160 Mb/s at a maximal clock frequency of 80 MHz. The core power consumption, measured at 2.7 V and room temperature, is about 608 mW for 160 Mb/s data rate at 80 MHz, and 81 mW for 20 Mb/s at 10 MHz. The proposed architecture is shown to meet the requirements for emerging MIMO applications, such as 3G HSDPA and IEEE 802.11n.

17.
Design and VLSI implementation for a WCDMA multipath searcher   Total citations: 2, self-citations: 0, citations by others: 2
The third generation (3G) of cellular communications standards is based on wideband CDMA. The wideband signal experiences frequency selective fading due to multipath propagation. To mitigate this effect, a RAKE receiver is typically used to coherently combine the signal energy received on different multipaths. An effective multipath searcher is, therefore, required to identify the delayed versions of the transmitted signal with low probability of false alarm and misdetection. This paper presents an efficient and novel WCDMA multipath searcher design and VLSI architecture that provides a good compromise between complexity, performance, and power consumption. Novel multipath searcher algorithms such as time domain interleaving and peak detection are also presented. The proposed searcher was implemented in 0.18 μm CMOS technology and requires only 150 k gates for a total area of 1.5 mm², consuming 6.6 mW at 100 MHz. The functionality and performance of the searcher were verified under realistic conditions using a channel emulator.

18.
An efficient soft-input soft-output iterative decoding algorithm for block turbo codes (BTCs) is proposed. The proposed algorithm utilizes Kaneko's (1994) decoding algorithm for soft-input hard-output decoding. These hard outputs are converted to soft decisions using reliability calculations. Three different schemes for reliability calculations incorporating different levels of approximation are suggested. The algorithm proposed here presents a major advantage over existing decoding algorithms for BTCs by providing ample flexibility in terms of performance-complexity tradeoff. This makes the algorithm well suited for wireless multimedia applications. The algorithm can be used for optimal as well as suboptimal decoding. The suboptimal versions of the algorithm can be developed by changing a single parameter (the number of error patterns to be generated). For any performance, the computational complexity of the proposed algorithm is less than the computational complexity of similar existing algorithms. Simulation results for the decoding algorithm for different two-dimensional BTCs over an additive white Gaussian noise channel are shown. A performance comparison of the proposed algorithm with similar existing algorithms is also presented.

19.
A high-speed low-complexity Reed-Solomon decoder for optical communications   Total citations: 2, self-citations: 0, citations by others: 2
This paper presents a high-speed low-complexity Reed-Solomon (RS) decoder architecture using a novel pipelined recursive modified Euclidean (PrME) algorithm block for very high-speed optical communications. The RS decoder features a low-complexity key equation solver using a PrME algorithm block. The recursive structure enables the novel low-complexity PrME algorithm block to be implemented. Pipelining and parallelizing allow the inputs to be received at very high fiber-optic rates, and outputs to be delivered at correspondingly high rates with minimum delay. This paper presents the key ideas applied to the design of an 80-Gb/s RS decoder architecture, especially those for achieving high throughput and reducing complexity. The 80-Gb/s 16-channel RS decoder has been designed and implemented using 0.13-μm CMOS technology with a supply voltage of 1.2 V. The proposed RS decoder has a core gate count of 393 K and operates at a clock rate of 625 MHz.

20.
The full-complexity soft-input/soft-output (SISO) detector based on the BCJR algorithm for coded partial-response channels has a computational complexity growing exponentially with channel memory length. In this letter, we propose a low-complexity soft-output channel detector based on the Chase decoding algorithm, which was previously applied to decode turbo product codes. At each iteration, the proposed detector forms a candidate list using all possible combinations of bit patterns in the weakest indices based on tentative hard estimates and a priori information fed back from the outer decoder. To demonstrate the performance/complexity tradeoff of the proposed detector, simulation results over rate-8/9 turbo-coded EPR4 and ME²PR4 channels are presented, respectively. It is shown that the proposed detector can significantly reduce the computational complexity with only a small performance loss compared to the BCJR algorithm.
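
The candidate-list step described above (all bit patterns over the weakest indices) can be sketched in a few lines. This is a generic Chase-style illustration, not the detector from the letter: the function name, the use of a per-bit reliability value (for example an LLR magnitude), and the example numbers are assumptions.

```python
from itertools import product


def chase_candidates(hard_bits, reliabilities, num_weak):
    """Enumerate all 2**num_weak flip patterns over the least reliable positions.

    hard_bits: tentative hard estimates (0/1).
    reliabilities: one non-negative confidence value per bit (assumed input).
    """
    # Indices of the num_weak least reliable bits.
    weakest = sorted(range(len(hard_bits)), key=lambda i: reliabilities[i])[:num_weak]
    candidates = []
    for pattern in product((0, 1), repeat=len(weakest)):
        cand = list(hard_bits)
        for idx, flip in zip(weakest, pattern):
            cand[idx] ^= flip  # flip the bit where the pattern has a 1
        candidates.append(tuple(cand))
    return candidates


hard = [1, 0, 1, 1]
rel = [0.9, 0.1, 0.5, 0.2]  # positions 1 and 3 are the two weakest
for cand in chase_candidates(hard, rel, num_weak=2):
    print(cand)
```

The list size is 2**num_weak regardless of the block length, which is where the complexity saving over a full trellis search comes from; each candidate would then be scored to produce soft outputs.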
