期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

High performance, high throughput turbo/SOVA decoder design

Zhongfeng Wang Parhi K.K. 《Communications, IEEE Transactions on》2003,51(4):570-579

Two efficient approaches are proposed to improve the performance of soft-output Viterbi (1998) algorithm (SOVA)-based turbo decoders. In the first approach, an easily obtainable variable and a simple mapping function are used to compute a target scaling factor to normalize the extrinsic information output from turbo decoders. An extra coding gain of 0.5 dB can be obtained with additive white Gaussian noise channels. This approach does not introduce extra latency and the hardware overhead is negligible. In the second approach, an adaptive upper bound based on the channel reliability is set for computing the metric difference between competing paths. By combining the two approaches, we show that the new SOVA-based turbo decoders can approach maximum a posteriori probability (MAP)-based turbo decoders within 0.1 dB when the target bit-error rate (BER) is moderately low (e.g., BER<10/sup -4/ for 1/2 rate codes). Following this, practical implementation issues are discussed and finite precision simulation results are provided. An area-efficient parallel decoding architecture is presented in this paper as an effective approach to design high-throughput turbo/SOVA decoders. With the efficient parallel architecture, multiple times throughput of a conventional serial decoder can be obtained by increasing the overall hardware by a small percentage. To resolve the problem of multiple memory accesses per cycle for the efficient parallel architecture, a novel two-level hierarchical interleaver architecture is proposed. Simulation results show that the proposed interleaver architecture performs as well as random interleavers, while requiring much less storage of random patterns. 相似文献

2.

Semi-Iterative Analog Turbo Decoding

Arzel M. Lahuec C. Seguin F. Gnaedig D. Jezequel M. 《IEEE transactions on circuits and systems. I, Regular papers》2007,54(6):1305-1316

Based on multiple-slice turbo codes, a novel semi-iterative analog turbo decoding algorithm and its corresponding decoder architecture are presented. This work paves the way for integrating flexible analog decoders dealing with frame lengths over thousands of bits. The algorithm benefits from a partially continuous exchange of extrinsic information to improve decoding speed and correction performance. The proposed algorithm and architecture are applied to design an analog decoder for double-binary codes. Taking full advantage of multiple slice codes, the on-chip area is shown to be reduced by ten when compared to a conventional fully parallelized analog slice turbo decoder. The reconfigurable analog core area for frames of 40 bits up to 2432 bits is 37 nm² in a 0.25-mum BiCMOS process. 相似文献

3.

Modified Collision-Free Interleavers for High Speed Turbo Decoding

Defeng Ren Jianhua Ge Jing Li 《Wireless Personal Communications》2013,68(3):939-948

Inter-window shuffle (IWS) interleavers are a class of collision-free (CF) interleavers that have been applied to parallel turbo decoding. In this paper, we present modified IWS (M-IWS) interleavers that can further increase turbo decoding throughput only at the expense of slight performance degradation. By deriving the number of M-IWS interleavers, we demonstrate that the number is much smaller than that of IWS interleavers, whereas they both have a very simple algebraic representation. Further, it is shown by analysis that under given conditions, storage requirements of M-IWS interleavers can be reduced to only 368 storage bits for variable interleaving lengths. In order to realize parallel outputs of the on-line interleaving addresses, a low-complexity architecture design of M-IWS interleavers for parallel turbo decoding is proposed, which also supports variable interleaving lengths. Therefore, the M-IWS interleavers are very suitable for the turbo decoder in next generation communication systems with the high data rate and low latency requirements. 相似文献

4.

High-performance VLSI architectures for turbo decoders with QPP interleaver

Shivani Verma 《International Journal of Electronics》2013,100(4):599-618

This paper analyses different VLSI architectures for 3GPP LTE/LTE-advanced turbo decoders for trade-offs in terms of throughput and area requirement. Data flow graphs for standard SISO MAP (maximum a posteriori) turbo decoder, SW – SISO MAP turbo decoder, PW SISO MAP turbo decoder have been presented, thus analysing their performance. Two variants of quadratic permutation polynomial (QPP) interleaver have been proposed which tend to simplify the complexity of ‘mod’ operator implementation and provide best compromise between area, delay and power dissipation. Implementation of decoder using one variant of QPP interleaver has also been discussed. A novel approach for area optimisation has been proposed to reduce required number of interleavers for parallel window turbo decoder. Multi-port memory has also been used for parallel turbo decoder. To increase the throughput without any effective increase in area complexity, circuit-level pipelining and retiming have been used. Proposed architectures have been synthesised using Synopsys Design Compiler using 45-nm CMOS technology. 相似文献

5.

A parallel MAP algorithm for low latency turbo decoding 总被引：1，自引：0，他引：1

Seokhyun Yoon Bar-Ness Y. 《Communications Letters, IEEE》2002,6(7):288-290

To reduce the computational decoding delay of turbo codes, we propose a parallel algorithm for maximum a posteriori (MAP) decoders. We divide a whole noisy codeword into sub-blocks and use multiple processors to perform sub-block MAP decoding in parallel. Unlike the previously proposed approach with sub-block overlapping, we utilize the forward and backward variables computed in the previous iteration to provide boundary distributions for each sub-block MAP decoder. Our scheme depicts asymptotically optimal performance in the sense that the BER is the same as that of the regular turbo decoder 相似文献

6.

High-throughput LDPC decoders

Mansour M.M. Shanbhag N.R. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2003,11(6):976-996

A high-throughput memory-efficient decoder architecture for low-density parity-check (LDPC) codes is proposed based on a novel turbo decoding algorithm. The architecture benefits from various optimizations performed at three levels of abstraction in system design-namely LDPC code design, decoding algorithm, and decoder architecture. First, the interconnect complexity problem of current decoder implementations is mitigated by designing architecture-aware LDPC codes having embedded structural regularity features that result in a regular and scalable message-transport network with reduced control overhead. Second, the memory overhead problem in current day decoders is reduced by more than 75% by employing a new turbo decoding algorithm for LDPC codes that removes the multiple checkto-bit message update bottleneck of the current algorithm. A new merged-schedule merge-passing algorithm is also proposed that reduces the memory overhead of the current algorithm for low to moderate-throughput decoders. Moreover, a parallel soft-input-soft-output (SISO) message update mechanism is proposed that implements the recursions of the Balh-Cocke-Jelinek-Raviv (BCJR) algorithm in terms of simple "max-quartet" operations that do not require lookup-tables and incur negligible loss in performance compared to the ideal case. Finally, an efficient programmable architecture coupled with a scalable and dynamic transport network for storing and routing messages is proposed, and a full-decoder architecture is presented. Simulations demonstrate that the proposed architecture attains a throughput of 1.92 Gb/s for a frame length of 2304 bits, and achieves savings of 89.13% and 69.83% in power consumption and silicon area over state-of-the-art, with a reduction of 60.5% in interconnect length. 相似文献

7.

Layered Approx-Regular LDPC: Code Construction and Encoder/Decoder Design

Haibin Zhang Jia Zhu Huifeng Shi Dawei Wang 《IEEE transactions on circuits and systems. I, Regular papers》2008,55(2):572-585

Layered approximately regular (LAR) low-density parity-check (LDPC) codes are proposed, with which one single pair of encoder and decoder support various code lengths and code rates. The parity check matrices of LAR-LDPC codes have a "layer-block-cell" structure with some additional constraints. An encoder architecture is then designed for LAR-LDPC codes, by making two improvements to the Richardson-Urbanke approach: the forward substitution operation is entirely removed and the dense-matrix-vector multiplication is handled using feedback shift-registers. A partially parallel decoder architecture is also designed for LAR-LDPC codes, where a layered modified min-sum decoding algorithm is used to trade off among complexity, speed, and performance. More importantly, the interconnection network, which is inevitable for partially parallel decoders, has much lower hardware complexity compared with that for general LDPC codes. Both the encoder and decoder architectures are highly flexible in code length and code rate. 相似文献

8.

Turbo Product Codes Based on Convolutional Codes

Orhan Gazi Ali zgür Y&#x;lmaz 《ETRI Journal》2006,28(4):453-460

In this article, we introduce a new class of product codes based on convolutional codes, called convolutional product codes. The structure of product codes enables parallel decoding, which can significantly increase decoder speed in practice. The use of convolutional codes in a product code setting makes it possible to use the vast knowledge base for convolutional codes as well as their flexibility in fast parallel decoders. Just as in turbo codes, interleaving turns out to be critical for the performance of convolutional product codes. The practical decoding advantages over serially‐concatenated convolutional codes are emphasized. 相似文献

9.

Memory-Reduced Maximum A Posteriori Probability Decoding for High-Throughput Parallel Turbo Decoders

Rahul Shrestha Roy Paily 《Circuits, Systems, and Signal Processing》2016,35(8):2832-2854

Wireless communication standards make use of parallel turbo decoder for higher data rate at the cost of large hardware resources. This paper presents a memory-reduced back-trace technique, which is based on a new method of estimating backward-recursion factors, for the maximum a posteriori probability (MAP) decoding. Mathematical reformulations of branch-metric equations are performed to reduce the memory requirement of branch metrics for each trellis stage. Subsequently, an architecture of MAP decoder and its scheduling based on the proposed back trace as well as branch-metric reformulation are presented in this work. Comparative analysis of bit-error-rate (BER) performances in additive white Gaussian noise channel environment for MAP as well as parallel turbo decoders are carried out. It has shown that a MAP decoder with a code rate of 1/2 and a parallel turbo decoder with a code rate of 1/3 have achieved coding gains of 1.28 dB at a BER of 10\(^{-5}\) and of 0.4 dB at a BER of 10\(^{-4}\), respectively. In order to meet high-data-rate benchmarks of recently deployed wireless communication standards, very large scale integration implementations of parallel turbo decoder with 8–64 MAP decoders have been reported. Thereby, savings of hardware resources by such parallel turbo decoders based on the suggested memory-reduced techniques are accounted in terms of complementary metal oxide semiconductor transistor count. It has shown that the parallel turbo decoder with 32 and 64 MAP decoders has shown hardware savings of 34 and 44 % respectively. 相似文献

10.

Parallel turbo coding interleavers: avoiding collisions in accessesto storage elements 总被引：1，自引：0，他引：1

Giulietti A. van der Perre L. Strum A. 《Electronics letters》2002,38(5):232-234

High-speed, low latency convolutional turbo codes require a parallel decoder architecture. To maximise the gain in speed, the interleaver also should have a parallel structure. Here, a class of optimum parallel interleavers regarding the access to storage elements is presented. They combine regularity (easy implementation) with no latency in data transfer between the decoder module and intrinsic/extrinsic values memories, and show excellent BER performance 相似文献

11.

Memory Conflict Analysis and Implementation of a Re-configurable Interleaver Architecture Supporting Unified Parallel Turbo Decoding

Rizwan Asghar Di Wu Johan Eilert Dake Liu 《Journal of Signal Processing Systems》2010,60(1):15-29

This paper presents a novel hardware interleaver architecture for unified parallel turbo decoding. The architecture is fully re-configurable among multiple standards like HSPA Evolution, DVB-SH, 3GPP-LTE and WiMAX. Turbo codes being widely used for error correction in today’s consumer electronics are prone to introduce higher latency due to bigger block sizes and multiple iterations. Many parallel turbo decoding architectures have recently been proposed to enhance the channel throughput but the interleaving algorithms used in different standards do not freely allow using them due to higher percentage of memory conflicts. The architecture presented in this paper provides a re-configurable platform for implementing the parallel interleavers for different standards by managing the conflicts involved in each. The memory conflicts are managed by applying different approaches like stream misalignment, memory division and use of small FIFO buffer. The proposed flexible architecture is low cost and consumes 0.085 mm² area in 65 nm CMOS process. It can implement up to 8 parallel interleavers and can operate at a frequency of 200 MHz, thus providing significant support to higher throughput systems based on parallel SISO processors. 相似文献

12.

Bit-Level Extrinsic Information Exchange Method for Double-Binary Turbo Codes

《Circuits and Systems II: Express Briefs, IEEE Transactions on》2009,56(1):81-85

Nonbinary turbo codes have many advantages over single-binary turbo codes, but their decoder implementations require much more memory, particularly for storing symbolic extrinsic information to be exchanged between two soft-input–soft-output (SISO) decoders. To reduce the memory size required for double-binary turbo decoding, this paper presents a new method to convert symbolic extrinsic information to bit-level information and vice versa. By exchanging bit-level extrinsic information, the number of extrinsic information values to be exchanged in double-binary turbo decoding is reduced to the same amount as that in single-binary turbo decoding. A double-binary turbo decoder is designed for the WiMAX standard to verify the proposed method, which reduces the total memory size by 20%. 相似文献

13.

Area-efficient high-throughput MAP decoder architectures

Seok-Jun Lee Shanbhag N.R. Singer A.C. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2005,13(8):921-933

Iterative decoders such as turbo decoders have become integral components of modern broadband communication systems because of their ability to provide substantial coding gains. A key computational kernel in iterative decoders is the maximum a posteriori probability (MAP) decoder. The MAP decoder is recursive and complex, which makes high-speed implementations extremely difficult to realize. In this paper, we present block-interleaved pipelining (BIP) as a new high-throughput technique for MAP decoders. An area-efficient symbol-based BIP MAP decoder architecture is proposed by combining BIP with the well-known look-ahead computation. These architectures are compared with conventional parallel architectures in terms of speed-up, memory and logic complexity, and area. Compared to the parallel architecture, the BIP architecture provides the same speed-up with a reduction in logic complexity by a factor of M, where M is the level of parallelism. The symbol-based architecture provides a speed-up in the range from 1 to 2 with a logic complexity that grows exponentially with M and a state metric storage requirement that is reduced by a factor of M as compared to a parallel architecture. The symbol-based BIP architecture provides speed-up in the range M to 2M with an exponentially higher logic complexity and a reduced memory complexity compared to a parallel architecture. These high-throughput architectures are synthesized in a 2.5-V 0.25-/spl mu/m CMOS standard cell library and post-layout simulations are conducted. For turbo decoder applications, we find that the BIP architecture provides a throughput gain of 1.96 at the cost of 63% area overhead. For turbo equalizer applications, the symbol-based BIP architecture enables us to achieve a throughput gain of 1.79 with an area savings of 25%. 相似文献

14.

FPGA implementation of low complexity LDPC iterative decoder

Shivani Verma Sanjay Sharma 《International Journal of Electronics》2016,103(7):1112-1126

Low-density parity-check (LDPC) codes, proposed by Gallager, emerged as a class of codes which can yield very good performance on the additive white Gaussian noise channel as well as on the binary symmetric channel. LDPC codes have gained lots of importance due to their capacity achieving property and excellent performance in the noisy channel. Belief propagation (BP) algorithm and its approximations, most notably min-sum, are popular iterative decoding algorithms used for LDPC and turbo codes. The trade-off between the hardware complexity and the decoding throughput is a critical factor in the implementation of the practical decoder. This article presents introduction to LDPC codes and its various decoding algorithms followed by realisation of LDPC decoder by using simplified message passing algorithm and partially parallel decoder architecture. Simplified message passing algorithm has been proposed for trade-off between low decoding complexity and decoder performance. It greatly reduces the routing and check node complexity of the decoder. Partially parallel decoder architecture possesses high speed and reduced complexity. The improved design of the decoder possesses a maximum symbol throughput of 92.95 Mbps and a maximum of 18 decoding iterations. The article presents implementation of 9216 bits, rate-1/2, (3, 6) LDPC decoder on Xilinx XC3D3400A device from Spartan-3A DSP family. 相似文献

15.

An FPGA Implementation of High‐Speed Flexible 27‐Mbps 8‐StateTurbo Decoder

Duk Gun Choi Min‐Hyuk Kim Jin Hee Jeong Ji Won Jung Jong‐Tae Bae Seok‐Soon Choi Young Yun 《ETRI Journal》2007,29(3):363-370

In this paper, we propose a flexible turbo decoding algorithm for a high order modulation scheme that uses a standard half‐rate turbo decoder designed for binary quadrature phase‐shift keying (B/QPSK) modulation. A transformation applied to the incoming I‐channel and Q‐channel symbols allows the use of an off‐the‐shelf B/QPSK turbo decoder without any modifications. Iterative codes such as turbo codes process the received symbols recursively to improve performance. As the number of iterations increases, the execution time and power consumption also increase. The proposed algorithm reduces the latency and power consumption by combination of the radix‐4, dual‐path processing, parallel decoding, and early‐stop algorithms. We implement the proposed scheme on a field‐programmable gate array and compare its decoding speed with that of a conventional decoder. The results show that the proposed flexible decoding algorithm is 6.4 times faster than the conventional scheme. 相似文献

16.

Variable-size interleaver design for parallel turbo decoder architectures 总被引：1，自引：0，他引：1

Dinoi L. Benedetto S. 《Communications, IEEE Transactions on》2005,53(11):1833-1840

In this paper, we propose two techniques to design good S-random interleavers, to be used in parallel and serially concatenated codes with interleavers. The interleavers designed according to these algorithms can be shortened, in order to support different block lengths in such a way that all the permutations obtained by pruning, when employed in a parallel turbo decoder, are collision-free. The first technique, suitable for short and medium interleavers, guarantees the same performance of nonparallel interleavers in terms of spreading properties, simulated frame-error probabilities, and obtainable minimum distance of the actual codes. The second algorithm, to be used for large block lengths, permits achieving high degrees of parallelism at the price of a slight degradation of the spread properties, and also to change the degree of parallelism on-the-fly. The operations of a parallel turbo decoder employing these interleavers are described, and an example of the advantages of the proposed techniques is provided in a realistic system framework. 相似文献

17.

Efficient Generalized Minimum-distance Decoders of Reed-Solomon Codes

Jiangli Zhu Xinmiao Zhang 《Journal of Signal Processing Systems》2012,66(3):245-257

Generalized minimum distance (GMD) decoding of Reed–Solomon (RS) codes can correct more errors than conventional hard-decision decoding by running error-and-erasure decoding multiple times for different erasure patterns. The latency of the GMD decoding can be reduced by the Kötter’s one-pass decoding scheme. This scheme first carries out an error-only hard-decision decoding. Then all pairs of error-erasure locators and evaluators are derived iteratively in one run based on the result of the error-only decoding. In this paper, a more efficient interpolation-based one-pass GMD decoding scheme is studied. Applying the re-encoding and coordinate transformation, the result of erasure-only decoding can be directly derived. Then the locator and evaluator pairs for other erasure patterns are generated iteratively by applying interpolation. A simplified polynomial selection scheme is proposed to pass only one pair of locator and evaluator to successive decoding steps and a low-complexity parallel Chien search architecture is developed to implement this selection scheme. With the proposed polynomial selection architecture, the interpolation can run at the full speed to greatly increase the throughput. After efficient architectures and effective optimizations are employed, a generalized hardware complexity analysis is provided for the proposed interpolation-based decoder. For a (255, 239) RS code, the high-speed interpolation-based one-pass GMD decoder can achieve 53% higher throughput than the Kötter’s decoder with slightly more hardware requirement. In terms of speed-over-area ratio, our design is 51% more efficient. In addition, compared to other soft-decision decoders, the high-speed interpolation-based GMD decoder can achieve better performance-complexity tradeoff. 相似文献

18.

A Flexible UMTS-WiMax Turbo Decoder Architecture

Martina M. Nicola M. Masera G. 《Circuits and Systems II: Express Briefs, IEEE Transactions on》2008,55(4):369-373

This work proposes a VLSI decoding architecture for concatenated convolutional codes. The novelty of this architecture is twofold: 1) the possibility to switch on-the-fly from the universal mobile telecommunication system turbo decoder to the WiMax duo-binary turbo decoder with a limited resources overhead compared to a single-mode WiMax architecture; and 2) the design of a parallel, collision free WiMax decoder architecture. Compared to two single-mode solutions, the proposed architecture achieves a complexity reduction of 17.1% and 27.3% in terms of logic and memory, respectively. The proposed, flexible architecture has been characterized in terms of performance and complexity on a 0.13-mum standard cell technology, and sustains a maximum throughput of more than 70 Mb/s. 相似文献

19.

Efficient hardware implementation of a highly-parallel 3GPP LTE/LTE-advance turbo decoder

Yang Sun Joseph R. CavallaroAuthor vitae 《Integration, the VLSI Journal》2011,44(4):305-315

We present an efficient VLSI architecture for 3GPP LTE/LTE-Advance Turbo decoder by utilizing the algebraic-geometric properties of the quadratic permutation polynomial (QPP) interleaver. The high-throughput 3GPP LTE/LTE-Advance Turbo codes require a highly-parallel decoder architecture. Turbo interleaver is known to be the main obstacle to the decoder parallelism due to the collisions it introduces in accesses to memory. The QPP interleaver solves the memory contention issues when several MAP decoders are used in parallel to improve Turbo decoding throughput. In this paper, we propose a low-complexity QPP interleaving address generator and a multi-bank memory architecture to enable parallel Turbo decoding. Design trade-offs in terms of area and throughput efficiency are explored to find the optimal architecture. The proposed parallel Turbo decoder has been synthesized, placed and routed in a 65-nm CMOS technology with a core area of 8.3 mm² and a maximum clock frequency of 400 MHz. This parallel decoder, comprising 64 MAP decoder cores, can achieve a maximum decoding throughput of 1.28 Gbps at 6 iterations 相似文献

20.

Turbo codes for image transmission-a joint channel and sourcedecoding approach

Peng Z. Huang Y.-F. Costello D.J. Jr. 《Selected Areas in Communications, IEEE Journal on》2000,18(6):868-879

This paper studies an application of turbo codes to compressed image/video transmission and presents an approach to improving error control performance through joint channel and source decoding (JCSD). The proposed approach to JCSD includes error-free source information feedback, error-detected source information feedback, and the use of channel soft values (CSV) for source signal postprocessing. These feedback schemes are based on a modification of the extrinsic information passed between the constituent maximum a posteriori probability (MAP) decoders in a turbo decoder. The modification is made according to the source information obtained from the source signal processor. The CSVs are considered as reliability information on the hard decisions and are further used for error recovery in the reconstructed signals. Applications of this joint decoding technique to different visual source coding schemes, such as spatial vector quantization, JPEG coding, and MPEG coding, are examined. Experimental results show that up to 0.6 dB of channel SNR reduction can be achieved by the joint decoder without increasing computational cost for various channel coding rates 相似文献