期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

A 285-MHz pipelined MAP decoder in 0.18-/spl mu/m CMOS

Seok-Jun Lee Shanbhag N.R. Singer A.C. 《Solid-State Circuits, IEEE Journal of》2005,40(8):1718-1725

Presented in this paper is a pipelined 285-MHz maximum a posteriori probability (MAP) decoder IC. The 8.7-mm/sup 2/ IC is implemented in a 1.8-V 0.18-/spl mu/m CMOS technology and consumes 330 mW at maximum frequency. The MAP decoder chip features a block-interleaved pipelined architecture, which enables the pipelining of the add-compare-select kernels. Measured results indicate that a turbo decoder based on the presented MAP decoder core can achieve: 1) a decoding throughput of 27.6 Mb/s with an energy-efficiency of 2.36 nJ/b/iter; 2) the highest clock frequency compared to existing 0.18-/spl mu/m designs with the smallest area; and 3) comparable throughput with an area reduction of 3-4.3/spl times/ with reference to a look-ahead based high-speed design (Radix-4 design), and a parallel architecture. 相似文献

2.

Implementation of a Radix-4, Parallel Turbo Decoder and Enabling the Multi-Standard Support

Rizwan Asghar Di Wu Ali Saeed Yulin Huang Dake Liu 《Journal of Signal Processing Systems》2012,66(1):25-41

This paper presents a unified, radix-4 implementation of turbo decoder, covering multiple standards such as DVB, WiMAX, 3GPP-LTE and HSPA Evolution. The radix-4, parallel interleaver is the bottleneck while using the same turbo-decoding architecture for multiple standards. This paper covers the issues associated with design of radix-4 parallel interleaver to reach to flexible turbo-decoder architecture. Radix-4, parallel interleaver algorithms and their mapping on to hardware architecture is presented for multi-mode operations. The overheads associated with hardware multiplexing are found to be least significant. Other than flexibility for the turbo decoder implementation, the low silicon cost and low power aspects are also addressed by optimizing the storage scheme for branch metrics and extrinsic information. The proposed unified architecture for radix-4 turbo decoding consumes 0.65 mm² area in total in 65 nm CMOS process. With 4 SISO blocks used in parallel and 6 iterations, it can achieve a throughput up to 173.3 Mbps while consuming 570 mW power in total. It provides a good trade-off between silicon cost, power consumption and throughput with silicon efficiency of 0.005 mm²/Mbps and energy efficiency of 0.55 nJ/b/iter. 相似文献

3.

Turbo Product Code Decoder Without Interleaving Resource: From Parallelism Exploration to High Efficiency Architecture

Camille Leroux Christophe Jego Patrick Adde Deepak Gupta Michel Jezequel 《Journal of Signal Processing Systems》2011,64(1):17-29

This article proposes to explore parallelism in Turbo-Product Code (TPC) decoding through a parallelism level classification and characterization. From this design space exploration, an innovative TPC decoder architecture without any interleaving resource is presented. This architecture includes a fully-parallel SISO decoder capable of processing n symbols in one clock period. Syntheses results show the better efficiency of such an architecture compared with existing solutions. Considering a six-iteration turbo decoder of a BCH(32,26)² product code, synthesized in 90 nm CMOS technology, 10 Gb/s can be achieved with an area of 600 Kgates. Moreover, a second architecture enhancing parallelism rate is described. The throughput is 50 Gb/s while an area estimation gives 2.2 Mgates. Finally, comparisons with existing TPC decoders and existing LDPC decoders are performed. They validate the potential of proposed TPC decoder for Gb/s optical fiber transmission systems. 相似文献

4.

A Compact 1.1-Gb/s Encoder and a Memory-Based 600-Mb/s Decoder for LDPC Convolutional Codes

《IEEE transactions on circuits and systems. I, Regular papers》2009,56(5):1017-1029

We present a rate-1/2 (128,3,6) LDPC convolutional code encoder and decoder that we implemented in a 90-nm CMOS process. The 1.1-Gb/s encoder is a compact, low-power implementation that includes one-hot encoding for phase generation and built-in termination. The decoder design uses a memory-based interface with a minimum number of memory banks to deliver an information throughput of 1 b per clock cycle. The decoder shares one controller among a pipeline of decoder processors. The decoder dissipates 0.61 nJ of energy per decoded information bit at an SNR of 2 dB and a decoded throughput of 600 Mb/s. On-chip test circuitry permits accurate power measurements to be made at selectable SNR settings. 相似文献

5.

A 0.35-/spl mu/m CMOS analog turbo decoder for the 40-bit rate 1/3 UMTS channel code

Vogrig D. Gerosa A. Neviani A. Amat A.Gi. Montorsi G. Benedetto S. 《Solid-State Circuits, IEEE Journal of》2005,40(3):753-762

This work presents the design and the test results of an analog decoder for the 40-bit block length, rate 1/3, Turbo Code defined in the UMTS standard. The prototype is fully integrated in a three-metal double-poly 0.35-/spl mu/m CMOS technology, and includes an I/O interface that maximizes the decoder throughput. After the successful implementation of proof-of-concept analog iterative decoders by different research groups in both bipolar and CMOS technologies, this is the first reported prototype of an analog decoder for a realistic error-correcting code. The decoder was successfully tested at the maximum data rate defined in the standard (2 Mb/s), with an overall power consumption of 10.3 mW at 3.3 V, going down to 7.6 mW with the decoder core operated at 2 V, and an extremely low energy per decoded bit and trellis state (0.85 nJ for the decoder core alone). 相似文献

6.

High performance, high throughput turbo/SOVA decoder design

Zhongfeng Wang Parhi K.K. 《Communications, IEEE Transactions on》2003,51(4):570-579

Two efficient approaches are proposed to improve the performance of soft-output Viterbi (1998) algorithm (SOVA)-based turbo decoders. In the first approach, an easily obtainable variable and a simple mapping function are used to compute a target scaling factor to normalize the extrinsic information output from turbo decoders. An extra coding gain of 0.5 dB can be obtained with additive white Gaussian noise channels. This approach does not introduce extra latency and the hardware overhead is negligible. In the second approach, an adaptive upper bound based on the channel reliability is set for computing the metric difference between competing paths. By combining the two approaches, we show that the new SOVA-based turbo decoders can approach maximum a posteriori probability (MAP)-based turbo decoders within 0.1 dB when the target bit-error rate (BER) is moderately low (e.g., BER<10/sup -4/ for 1/2 rate codes). Following this, practical implementation issues are discussed and finite precision simulation results are provided. An area-efficient parallel decoding architecture is presented in this paper as an effective approach to design high-throughput turbo/SOVA decoders. With the efficient parallel architecture, multiple times throughput of a conventional serial decoder can be obtained by increasing the overall hardware by a small percentage. To resolve the problem of multiple memory accesses per cycle for the efficient parallel architecture, a novel two-level hierarchical interleaver architecture is proposed. Simulation results show that the proposed interleaver architecture performs as well as random interleavers, while requiring much less storage of random patterns. 相似文献

7.

High-throughput LDPC decoders

Mansour M.M. Shanbhag N.R. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2003,11(6):976-996

A high-throughput memory-efficient decoder architecture for low-density parity-check (LDPC) codes is proposed based on a novel turbo decoding algorithm. The architecture benefits from various optimizations performed at three levels of abstraction in system design-namely LDPC code design, decoding algorithm, and decoder architecture. First, the interconnect complexity problem of current decoder implementations is mitigated by designing architecture-aware LDPC codes having embedded structural regularity features that result in a regular and scalable message-transport network with reduced control overhead. Second, the memory overhead problem in current day decoders is reduced by more than 75% by employing a new turbo decoding algorithm for LDPC codes that removes the multiple checkto-bit message update bottleneck of the current algorithm. A new merged-schedule merge-passing algorithm is also proposed that reduces the memory overhead of the current algorithm for low to moderate-throughput decoders. Moreover, a parallel soft-input-soft-output (SISO) message update mechanism is proposed that implements the recursions of the Balh-Cocke-Jelinek-Raviv (BCJR) algorithm in terms of simple "max-quartet" operations that do not require lookup-tables and incur negligible loss in performance compared to the ideal case. Finally, an efficient programmable architecture coupled with a scalable and dynamic transport network for storing and routing messages is proposed, and a full-decoder architecture is presented. Simulations demonstrate that the proposed architecture attains a throughput of 1.92 Gb/s for a frame length of 2304 bits, and achieves savings of 89.13% and 69.83% in power consumption and silicon area over state-of-the-art, with a reduction of 60.5% in interconnect length. 相似文献

8.

Design of a 20-mb/s 256-state Viterbi decoder 总被引：1，自引：0，他引：1

Xun Liu Papaefthymiou M.C. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2003,11(6):965-975

The design of high-throughput large-state Viterbi decoders relies on the use of multiple arithmetic units. The global communication channels among these parallel processors often consist of long interconnect wires, resulting in large area and high power consumption. In this paper, we propose a data transfer oriented design methodology to implement a low-power 256-state rate-1/3 Viterbi decoder. Our architectural level scheme uses operation partitioning, packing, and scheduling to analyze and optimize interconnect effects in early design stages. In comparison with other published Viterbi decoders, our approach reduces the global data transfers by up to 75% and decreases the amount of global buses by up to 48%, while enabling the use of deeply pipelined datapaths with no data forwarding. In the register-transfer level (RTL) implementation, we apply precomputation in conjunction with saturation arithmetic to further reduce power dissipation with provably no coding performance degradation. Designed using a 0.25 /spl mu/m standard cell library, our decoder achieves a throughput of 20 Mb/s in simulation and dissipates only 0.45 W. 相似文献

9.

A scalable LDPC decoder ASIC architecture with bit-serial message exchange

Tyler Robert Gary Vincent C. Bruce Sheryl Christian Keith Paul Siavash Sheikh Anthony Stephen Duncan Christian 《Integration, the VLSI Journal》2008,41(3):385-398

We present a scalable bit-serial architecture for ASIC realizations of low-density parity check (LDPC) decoders. Supporting the architecture's potential, we describe a decoder implementation for a (256,128) regular-(3,6) LDPC code that has a decoded information throughput of 250 Mbps, a core area of 6.96 mm² in 180-nm 6-metal CMOS, and an energy efficiency of 7.56 nJ per uncoded bit at low signal-to-noise ratios. The decoder is fully block-parallel, with all bits of each 256-bit codeword being processed by 256 variable nodes and 128 parity check nodes that together form an 8-stage iteration pipeline. Extrinsic messages are exchanged bit-serially between the variable and parity check nodes to significantly reduce the interleaver wiring. Parity check node processing is also bit-serial. The silicon implementation performs 32 iterations of the min-sum decoding algorithm on two staggered codewords in the same pipeline. The results of a supplementary layout study show that the reduced wiring congestion makes the decoder readily scaleable up to the longer kilobit-size LDPC codewords that appear in important emerging communication standards. 相似文献

10.

High-performance VLSI architectures for turbo decoders with QPP interleaver

Shivani Verma 《International Journal of Electronics》2013,100(4):599-618

This paper analyses different VLSI architectures for 3GPP LTE/LTE-advanced turbo decoders for trade-offs in terms of throughput and area requirement. Data flow graphs for standard SISO MAP (maximum a posteriori) turbo decoder, SW – SISO MAP turbo decoder, PW SISO MAP turbo decoder have been presented, thus analysing their performance. Two variants of quadratic permutation polynomial (QPP) interleaver have been proposed which tend to simplify the complexity of ‘mod’ operator implementation and provide best compromise between area, delay and power dissipation. Implementation of decoder using one variant of QPP interleaver has also been discussed. A novel approach for area optimisation has been proposed to reduce required number of interleavers for parallel window turbo decoder. Multi-port memory has also been used for parallel turbo decoder. To increase the throughput without any effective increase in area complexity, circuit-level pipelining and retiming have been used. Proposed architectures have been synthesised using Synopsys Design Compiler using 45-nm CMOS technology. 相似文献

11.

Highly-Parallel Decoding Architectures for Convolutional Turbo Codes

《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2006,14(10):1147-1151

Highly parallel decoders for convolutional turbo codes have been studied by proposing two parallel decoding architectures and a design approach of parallel interleavers. To solve the memory conflict problem of extrinsic information in a parallel decoder, a block-like approach in which data is written row-by-row and read diagonal-wise is proposed for designing collision-free parallel interleavers. Furthermore, a warm-up-free parallel sliding window architecture is proposed for long turbo codes to maximize the decoding speeds of parallel decoders. The proposed architecture increases decoding speed by 6%-34% at a cost of a storage increase of 1% for an eight-parallel decoder. For short turbo codes (e.g., length of 512 bits), a warm-up-free parallel window architecture is proposed to double the speed at the cost of a hardware increase of 12% 相似文献

12.

An artificial neural net Viterbi decoder

Xiao-An Wang Wicker S.B. 《Communications, IEEE Transactions on》1996,44(2):165-171

The Viterbi algorithm is a maximum likelihood means for decoding convolutional codes and has thus played an important role in applications ranging from satellite communications to cellular telephony. In the past, Viterbi decoders have usually been implemented using digital circuits. The speed of these digital decoders is directly related to the amount of parallelism in the design. As the constraint length of the code increases, parallelism becomes problematic due to the complexity of the decoder. In this paper an artificial neural network (ANN) Viterbi decoder is presented. The ANN decoder is significantly faster than comparable digital-only designs due to its fully parallel architecture. The fully parallel structure is obtained by implementing most of the Viterbi algorithm using analog neurons as opposed to digital circuits. Several modifications to the ANN decoder are considered, including an analog/digital hybrid design that results in an extremely fast and efficient decoder. The ANN decoder requires one-sixth the number of transistors required by the digital decoder. The connection weights of the ANN decoder are either +1 or -1, so weight considerations in the implementation are eliminated. This, together with the design's modularity and local connectivity, makes the ANN Viterbi decoder a natural fit for VLSI implementation. Simulation results are provided to show that the performance of the ANN decoder matches that of an ideal Viterbi decoder 相似文献

13.

CMOS analog MAP decoder for (8,4) Hamming code

Winstead C. Jie Dai Shuhuan Yu Myers C. Harrison R.R. Schlegel C. 《Solid-State Circuits, IEEE Journal of》2004,39(1):122-131

Design and test results for a fully integrated translinear tail-biting MAP error-control decoder are presented. Decoder designs have been reported for various applications which make use of analog computation, mostly for Viterbi-style decoders. MAP decoders are more complex, and are necessary components of powerful iterative decoding systems such as turbo codes. Analog circuits may require less area and power than digital implementations in high-speed iterative applications. Our (8, 4) Hamming decoder, implemented in an AMI 0.5-/spl mu/m process, is the first functioning CMOS analog MAP decoder. While designed to operate in subthreshold, the decoder also functions above threshold with a small performance penalty. The chip has been tested at bit rates up to 2 Mb/s, and simulations indicate a top speed of about 10 Mb/s in strong inversion. The decoder circuit size is 0.82 mm/sup 2/, and typical power consumption is 1 mW at 1 Mb/s. 相似文献

14.

Parallel interleaver design and VLSI architecture for low-latency MAP turbo decoders

Dobkin R. Peleg M. Ginosar R. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2005,13(4):427-438

Standard VLSI implementations of turbo decoding require substantial memory and incur a long latency, which cannot be tolerated in some applications. A parallel VLSI architecture for low-latency turbo decoding, comprising multiple single-input single-output (SISO) elements, operating jointly on one turbo-coded block, is presented and compared to sequential architectures. A parallel interleaver is essential to process multiple concurrent SISO outputs. A novel parallel interleaver and an algorithm for its design are presented, achieving the same error correction performance as the standard architecture. Latency is reduced up to 20 times and throughput for large blocks is increased up to six-fold relative to sequential decoders, using the same silicon area, and achieving a very high coding gain. The parallel architecture scales favorably: latency and throughput are improved with increased block size and chip area. 相似文献

15.

Area-efficient high-throughput MAP decoder architectures

Seok-Jun Lee Shanbhag N.R. Singer A.C. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2005,13(8):921-933

Iterative decoders such as turbo decoders have become integral components of modern broadband communication systems because of their ability to provide substantial coding gains. A key computational kernel in iterative decoders is the maximum a posteriori probability (MAP) decoder. The MAP decoder is recursive and complex, which makes high-speed implementations extremely difficult to realize. In this paper, we present block-interleaved pipelining (BIP) as a new high-throughput technique for MAP decoders. An area-efficient symbol-based BIP MAP decoder architecture is proposed by combining BIP with the well-known look-ahead computation. These architectures are compared with conventional parallel architectures in terms of speed-up, memory and logic complexity, and area. Compared to the parallel architecture, the BIP architecture provides the same speed-up with a reduction in logic complexity by a factor of M, where M is the level of parallelism. The symbol-based architecture provides a speed-up in the range from 1 to 2 with a logic complexity that grows exponentially with M and a state metric storage requirement that is reduced by a factor of M as compared to a parallel architecture. The symbol-based BIP architecture provides speed-up in the range M to 2M with an exponentially higher logic complexity and a reduced memory complexity compared to a parallel architecture. These high-throughput architectures are synthesized in a 2.5-V 0.25-/spl mu/m CMOS standard cell library and post-layout simulations are conducted. For turbo decoder applications, we find that the BIP architecture provides a throughput gain of 1.96 at the cost of 63% area overhead. For turbo equalizer applications, the symbol-based BIP architecture enables us to achieve a throughput gain of 1.79 with an area savings of 25%. 相似文献

16.

Efficient hardware implementation of a highly-parallel 3GPP LTE/LTE-advance turbo decoder

Yang Sun Joseph R. CavallaroAuthor vitae 《Integration, the VLSI Journal》2011,44(4):305-315

We present an efficient VLSI architecture for 3GPP LTE/LTE-Advance Turbo decoder by utilizing the algebraic-geometric properties of the quadratic permutation polynomial (QPP) interleaver. The high-throughput 3GPP LTE/LTE-Advance Turbo codes require a highly-parallel decoder architecture. Turbo interleaver is known to be the main obstacle to the decoder parallelism due to the collisions it introduces in accesses to memory. The QPP interleaver solves the memory contention issues when several MAP decoders are used in parallel to improve Turbo decoding throughput. In this paper, we propose a low-complexity QPP interleaving address generator and a multi-bank memory architecture to enable parallel Turbo decoding. Design trade-offs in terms of area and throughput efficiency are explored to find the optimal architecture. The proposed parallel Turbo decoder has been synthesized, placed and routed in a 65-nm CMOS technology with a core area of 8.3 mm² and a maximum clock frequency of 400 MHz. This parallel decoder, comprising 64 MAP decoder cores, can achieve a maximum decoding throughput of 1.28 Gbps at 6 iterations 相似文献

17.

A Hybrid Decoder for Block Turbo Codes

Al-Dweik A. Le Goff S. Sharif B. 《Communications, IEEE Transactions on》2009,57(5):1229-1232

We propose a novel iterative decoder for block turbo codes (BTCs). The proposed decoder combines soft-input/softoutput (SISO) and hard-input/hard-output (HIHO) constituent decoders in order to obtain better error performance and reduce the computational complexity compared to classical BTC decoders. We show that the new decoder, called ?hybrid decoder?, offers a better complexity/performance tradeoff than a classical BTC decoder. 相似文献

18.

基于Radix-4实现高速Viterbi译码器设计

马力陈泳恩《通信技术》2002,(3):13-14

使用一种新的Viterbi译码器设计方法来达到高速率、低功耗设计。在传统Viterbi译码器中,ACS(add-compare-select)单元是基于radix-2网格设计的,而这里将介绍一种新的ACS设计方法,即基于radix-4网格的ACS单元设计。每个这样的ACS单元将有4路输入,即在每个时钟周期能够处理两级传统的基于radix-2设计的两级网格。同时在这里的Viterbi译码器设计中采用了Top-To-Down设计思想,用Verilog语言来描述RTL电路层。并用QuartusII软件进行电路仿真和综合。用本算法在33.333MHz时钟下实观在Altera公司的APEX20KFPGA的64状态Viterbi译码器译码速率可达8Mbps以上,且仅占用很小的硬件资源。采用此方法设计的高速Viterbi解码器SoftIPCore可应用于需要高速,低功耗译码的多媒体移动通讯上。相似文献

19.

A Flexible LDPC/Turbo Decoder Architecture

Yang Sun Joseph R. Cavallaro 《Journal of Signal Processing Systems》2011,64(1):1-16

Low-density parity-check (LDPC) codes and convolutional Turbo codes are two of the most powerful error correcting codes that are widely used in modern communication systems. In a multi-mode baseband receiver, both LDPC and Turbo decoders may be required. However, the different decoding approaches for LDPC and Turbo codes usually lead to different hardware architectures. In this paper we propose a unified message passing algorithm for LDPC and Turbo codes and introduce a flexible soft-input soft-output (SISO) module to handle LDPC/Turbo decoding. We employ the trellis-based maximum a posteriori (MAP) algorithm as a bridge between LDPC and Turbo codes decoding. We view the LDPC code as a concatenation of n super-codes where each super-code has a simpler trellis structure so that the MAP algorithm can be easily applied to it. We propose a flexible functional unit (FFU) for MAP processing of LDPC and Turbo codes with a low hardware overhead (about 15% area and timing overhead). Based on the FFU, we propose an area-efficient flexible SISO decoder architecture to support LDPC/Turbo codes decoding. Multiple such SISO modules can be embedded into a parallel decoder for higher decoding throughput. As a case study, a flexible LDPC/Turbo decoder has been synthesized on a TSMC 90 nm CMOS technology with a core area of 3.2 mm². The decoder can support IEEE 802.16e LDPC codes, IEEE 802.11n LDPC codes, and 3GPP LTE Turbo codes. Running at 500 MHz clock frequency, the decoder can sustain up to 600 Mbps LDPC decoding or 450 Mbps Turbo decoding. 相似文献

20.

SIMD Processor-Based Turbo Decoder Supporting Multiple Third-Generation Wireless Standards

《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2007,15(7):801-810

相似文献