首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
We present an efficient VLSI architecture for 3GPP LTE/LTE-Advance Turbo decoder by utilizing the algebraic-geometric properties of the quadratic permutation polynomial (QPP) interleaver. The high-throughput 3GPP LTE/LTE-Advance Turbo codes require a highly-parallel decoder architecture. Turbo interleaver is known to be the main obstacle to the decoder parallelism due to the collisions it introduces in accesses to memory. The QPP interleaver solves the memory contention issues when several MAP decoders are used in parallel to improve Turbo decoding throughput. In this paper, we propose a low-complexity QPP interleaving address generator and a multi-bank memory architecture to enable parallel Turbo decoding. Design trade-offs in terms of area and throughput efficiency are explored to find the optimal architecture. The proposed parallel Turbo decoder has been synthesized, placed and routed in a 65-nm CMOS technology with a core area of 8.3 mm2 and a maximum clock frequency of 400 MHz. This parallel decoder, comprising 64 MAP decoder cores, can achieve a maximum decoding throughput of 1.28 Gbps at 6 iterations  相似文献   

2.
提出了基于高次多项式无冲突交织器的Turbo码并行解码的优化实现方法,解码器采用MAX-Log-MAP算法,完成了从Matlab算法设计验证到RTL设计、FPGA验证,并在LTE无线通信链路中验证.设计的Turbo并行高速解码器半次迭代的效率为6.9 bit/cycle,在最高迭代为5.5次、时钟频率为309MHz下,达到207Mb/s的吞吐率,满足高速无线通信系统的要求,交织和解交织采用存储器映射方法.该设计节约了计算电路和存储量.  相似文献   

3.
Iterative decoders such as turbo decoders have become integral components of modern broadband communication systems because of their ability to provide substantial coding gains. A key computational kernel in iterative decoders is the maximum a posteriori probability (MAP) decoder. The MAP decoder is recursive and complex, which makes high-speed implementations extremely difficult to realize. In this paper, we present block-interleaved pipelining (BIP) as a new high-throughput technique for MAP decoders. An area-efficient symbol-based BIP MAP decoder architecture is proposed by combining BIP with the well-known look-ahead computation. These architectures are compared with conventional parallel architectures in terms of speed-up, memory and logic complexity, and area. Compared to the parallel architecture, the BIP architecture provides the same speed-up with a reduction in logic complexity by a factor of M, where M is the level of parallelism. The symbol-based architecture provides a speed-up in the range from 1 to 2 with a logic complexity that grows exponentially with M and a state metric storage requirement that is reduced by a factor of M as compared to a parallel architecture. The symbol-based BIP architecture provides speed-up in the range M to 2M with an exponentially higher logic complexity and a reduced memory complexity compared to a parallel architecture. These high-throughput architectures are synthesized in a 2.5-V 0.25-/spl mu/m CMOS standard cell library and post-layout simulations are conducted. For turbo decoder applications, we find that the BIP architecture provides a throughput gain of 1.96 at the cost of 63% area overhead. For turbo equalizer applications, the symbol-based BIP architecture enables us to achieve a throughput gain of 1.79 with an area savings of 25%.  相似文献   

4.
Standard VLSI implementations of turbo decoding require substantial memory and incur a long latency, which cannot be tolerated in some applications. A parallel VLSI architecture for low-latency turbo decoding, comprising multiple single-input single-output (SISO) elements, operating jointly on one turbo-coded block, is presented and compared to sequential architectures. A parallel interleaver is essential to process multiple concurrent SISO outputs. A novel parallel interleaver and an algorithm for its design are presented, achieving the same error correction performance as the standard architecture. Latency is reduced up to 20 times and throughput for large blocks is increased up to six-fold relative to sequential decoders, using the same silicon area, and achieving a very high coding gain. The parallel architecture scales favorably: latency and throughput are improved with increased block size and chip area.  相似文献   

5.
This paper presents a unified, radix-4 implementation of turbo decoder, covering multiple standards such as DVB, WiMAX, 3GPP-LTE and HSPA Evolution. The radix-4, parallel interleaver is the bottleneck while using the same turbo-decoding architecture for multiple standards. This paper covers the issues associated with design of radix-4 parallel interleaver to reach to flexible turbo-decoder architecture. Radix-4, parallel interleaver algorithms and their mapping on to hardware architecture is presented for multi-mode operations. The overheads associated with hardware multiplexing are found to be least significant. Other than flexibility for the turbo decoder implementation, the low silicon cost and low power aspects are also addressed by optimizing the storage scheme for branch metrics and extrinsic information. The proposed unified architecture for radix-4 turbo decoding consumes 0.65 mm2 area in total in 65 nm CMOS process. With 4 SISO blocks used in parallel and 6 iterations, it can achieve a throughput up to 173.3 Mbps while consuming 570 mW power in total. It provides a good trade-off between silicon cost, power consumption and throughput with silicon efficiency of 0.005 mm2/Mbps and energy efficiency of 0.55 nJ/b/iter.  相似文献   

6.
7.
This paper presents a novel hardware interleaver architecture for unified parallel turbo decoding. The architecture is fully re-configurable among multiple standards like HSPA Evolution, DVB-SH, 3GPP-LTE and WiMAX. Turbo codes being widely used for error correction in today’s consumer electronics are prone to introduce higher latency due to bigger block sizes and multiple iterations. Many parallel turbo decoding architectures have recently been proposed to enhance the channel throughput but the interleaving algorithms used in different standards do not freely allow using them due to higher percentage of memory conflicts. The architecture presented in this paper provides a re-configurable platform for implementing the parallel interleavers for different standards by managing the conflicts involved in each. The memory conflicts are managed by applying different approaches like stream misalignment, memory division and use of small FIFO buffer. The proposed flexible architecture is low cost and consumes 0.085 mm2 area in 65 nm CMOS process. It can implement up to 8 parallel interleavers and can operate at a frequency of 200 MHz, thus providing significant support to higher throughput systems based on parallel SISO processors.  相似文献   

8.
Turbo Decoder Using Contention-Free Interleaver and Parallel Architecture   总被引:1,自引:0,他引:1  
This paper introduces a turbo decoder that utilizes multiple soft-in/soft-out (SISO) decoders to decode one codeword. In addition, each SISO decoder is modified to allow simultaneous execution over multiple successive trellis stages. The design issues related to the architecture with parallel high-radix SISO decoders are discussed. First, a contention-free interleaver for the hybrid parallelism is presented to overcome the complicated collision problem as well as reduce interconnection network complexity. Second, two techniques for the high-speed add-compare-select (ACS) circuits are given to lessen area overhead of the SISO decoder. Third, a modification of the processing schedule is made for higher operating efficiency. Two designs with parallel architecture have been implemented. The first design with 32 SISO decoders, each of which processes 2 symbols per cycle, has 160 Mb/s and 0.22 nJ/b/iter after measurement. The second design uses 16 SISO decoders to deal with 4 symbols per cycle and achieves 100% efficiency, leading to 1000 Mb/s and 0.15 nJ/b/iter in post-layout simulation.   相似文献   

9.
A new maximum a posteriori (MAP)-equivalent soft-input soft-output (SISO) algorithm is derived together with its simplified versions. The proposed SISO algorithms provide a good compromise between complexity and performance. Our simplest SISO algorithm has lower complexity than the log-MAP, the max-log-MAP, and the soft-output Viterbi (1998) algorithm SISO algorithms, and it is an equivalent max-log-MAP algorithm. When this algorithm is used, turbo codes with block length as short as 150 bits will outperform convolutional codes when compared on the basis of equal decoder complexity.  相似文献   

10.
This brief presents an energy-efficient soft-input soft-output (SISO) decoder based on border metric encoding, which is especially suitable for nonbinary circular turbo codes. In the proposed method, the size of the branch memory is reduced to half and the dummy calculation is removed at the cost of a small-sized memory that holds encoded border metrics. Due to the infrequent accesses to the border memory and its small size, the energy consumed for SISO decoding is reduced by 26.2%. Based on the proposed SISO decoder and the dedicated hardware interleaver, a double-binary tail-biting turbo decoder is designed for the WiMAX standard using a 0.18-mum CMOS process, which can support 24.26 Mbps at 200 MHz.  相似文献   

11.
The standard algorithm for computing the soft-inverse of a finite-state machine [i.e., the soft-in/soft-out (SISO) module] is the forward-backward algorithm. These forward and backward recursions can be computed in parallel, yielding an architecture with latency 𝒪(N), where N is the block size. We demonstrate that the standard SISO computation may be formulated using a combination of prefix and suffix operations. Based on well-known tree-structures for fast parallel prefix computations in the very large scale integration (VLSI) literature (e.g., tree adders), we propose a tree-structured SISO that has latency 𝒪(log2N). The decrease in latency comes primarily at a cost of area with, in some cases, only a marginal increase in computation. We discuss how this structure could be used to design a very high throughput turbo decoder or, more generally, an iterative detector. Various subwindowing and tiling schemes are also considered to further improve latency  相似文献   

12.
设计一种低开销双二元turbo译码器,提出了一种能够适应滑动窗算法的交织器结构,通过与传统方案中的交织器联合使用,大大降低了交织与解交织过程所需要的存储单元.同时将取模归一化(modulo normalization)技术运用到双二元turbo译码器加比选(ACS)模块的设计上,缩短了关键路径的延时,提高了时钟频率和吞吐量.采用FPGA对译码器进行了验证,提出的译码器和传统的译码器相比,存储资源节省12%,和使用存储器存储交织/解交织地址的译码器相比,存储资源节省97%.  相似文献   

13.
Minimum mean squared error equalization using a priori information   总被引:11,自引:0,他引:11  
A number of important advances have been made in the area of joint equalization and decoding of data transmitted over intersymbol interference (ISI) channels. Turbo equalization is an iterative approach to this problem, in which a maximum a posteriori probability (MAP) equalizer and a MAP decoder exchange soft information in the form of prior probabilities over the transmitted symbols. A number of reduced-complexity methods for turbo equalization have been introduced in which MAP equalization is replaced with suboptimal, low-complexity approaches. We explore a number of low-complexity soft-input/soft-output (SISO) equalization algorithms based on the minimum mean square error (MMSE) criterion. This includes the extension of existing approaches to general signal constellations and the derivation of a novel approach requiring less complexity than the MMSE-optimal solution. All approaches are qualitatively analyzed by observing the mean-square error averaged over a sequence of equalized data. We show that for the turbo equalization application, the MMSE-based SISO equalizers perform well compared with a MAP equalizer while providing a tremendous complexity reduction  相似文献   

14.
In this paper, both performance and complexity aspects of two-dimensional single parity check turbo product codes (I-SPC-TPC) are investigated. Based on the proposed I-SPC-TPC coding scheme, a parallel decoding structure is developed to increase the decoding throughput with minor performance degradation compared with the serial structure. For both decoding architectures, a new helical interleaver is constructed to further improve the coding gain. In terms of decoding algorithm, the extremely simple Sign-Min decoding is alternatively derived with only three additions needed to compute each bit's extrinsic information. For performance evaluation, (16, 14, 2)2 single parity check turbo product code with code rate 0.766 over AWGN channel using QPSK modulation is considered. The simulation results using Sign-Min decoding show that it can achieve bit-error-rate of 10?5 at signal-to-noise ratio of 3.8 dB with 8 iterations. Compared to the same rate and codeword length turbo product code composed of extended Hamming codes, the considered scheme can achieve similar performance with much less complexity. Important implementation issues such as the finite precision analysis, efficient sorting circuit design and interleaver memory management are also presented.  相似文献   

15.
Low-density parity-check (LDPC) codes and convolutional Turbo codes are two of the most powerful error correcting codes that are widely used in modern communication systems. In a multi-mode baseband receiver, both LDPC and Turbo decoders may be required. However, the different decoding approaches for LDPC and Turbo codes usually lead to different hardware architectures. In this paper we propose a unified message passing algorithm for LDPC and Turbo codes and introduce a flexible soft-input soft-output (SISO) module to handle LDPC/Turbo decoding. We employ the trellis-based maximum a posteriori (MAP) algorithm as a bridge between LDPC and Turbo codes decoding. We view the LDPC code as a concatenation of n super-codes where each super-code has a simpler trellis structure so that the MAP algorithm can be easily applied to it. We propose a flexible functional unit (FFU) for MAP processing of LDPC and Turbo codes with a low hardware overhead (about 15% area and timing overhead). Based on the FFU, we propose an area-efficient flexible SISO decoder architecture to support LDPC/Turbo codes decoding. Multiple such SISO modules can be embedded into a parallel decoder for higher decoding throughput. As a case study, a flexible LDPC/Turbo decoder has been synthesized on a TSMC 90 nm CMOS technology with a core area of 3.2 mm2. The decoder can support IEEE 802.16e LDPC codes, IEEE 802.11n LDPC codes, and 3GPP LTE Turbo codes. Running at 500 MHz clock frequency, the decoder can sustain up to 600 Mbps LDPC decoding or 450 Mbps Turbo decoding.  相似文献   

16.
Random linear network coding is an efficient technique for disseminating information in networks, but it is highly susceptible to errors. Kötter-Kschischang (KK) codes and Mahdavifar-Vardy (MV) codes are two important families of subspace codes that provide error control in noncoherent random linear network coding. List decoding has been used to decode MV codes beyond half distance. Existing hardware implementations of the rank metric decoder for KK codes suffer from limited throughput, long latency and high area complexity. The interpolation-based list decoding algorithm for MV codes still has high computational complexity, and its feasibility for hardware implementations has not been investigated. In this paper we propose efficient decoder architectures for both KK and MV codes and present their hardware implementations. Two serial architectures are proposed for KK and MV codes, respectively. An unfolded decoder architecture, which offers high throughput, is also proposed for KK codes. The synthesis results show that the proposed architectures for KK codes are much more efficient than rank metric decoder architectures, and demonstrate that the proposed decoder architecture for MV codes is affordable.  相似文献   

17.
The implementation and performance of a turbo/MAP decoder are described. A serial block MAP decoder operating in the logarithm domain is used to obtain a very-high-performance turbo decoder. Programmable gate arrays and EPROMs allow the decoder to be programmed for almost any code from four to 512 states, rate 1/3 to rate 1/7 (higher rates are achieved with puncturing) and interleaver block sizes to 65,536 bits. Seven decoding stages were implemented in parallel. For rate 1/3 and 1/7 16-state codes with an interleaver size of 65,536 bits and operating at up to 356 kbit/s the codec achieved an Eb/N0 of 0⋅32 and −0⋅30 dB respectively for a BER of 10−5. BERs down to 10−7 were also achieved for a small increase in Eb/N0. An efficient implementation of a continuous MAP decoder is also presented, along with a synchronization technique for turbo decoders. © 1998 John Wiley & Sons, Ltd.  相似文献   

18.
Ultra high-speed block turbo decoder architectures meet the demand for even higher data rates and open up new opportunities for the next generations of communication systems such as fiber optic transmissions. This paper presents the implementation, onto an FPGA device of an ultra high throughput block turbo code decoder. An innovative architecture of a block turbo decoder which enables the memory blocks between all half-iterations to be removed is presented. A complexity analysis of the elementary decoder leads to a low complexity decoder architecture for a negligible performance degradation. The resulting turbo decoder is implemented on a Xilinx Virtex II-Pro FPGA in a communication experimental setup which also includes an innovative parallel product encoder. The implemented block turbo decoder processes input data at 600 Mb/s. The component code is an extended Bose, Ray-Chaudhuri, Hocquenghem (eBCH(16,11)) code. Some solutions to reach even higher data rates are finally presented.  相似文献   

19.
Low-density parity-check (LDPC) codes, proposed by Gallager, emerged as a class of codes which can yield very good performance on the additive white Gaussian noise channel as well as on the binary symmetric channel. LDPC codes have gained lots of importance due to their capacity achieving property and excellent performance in the noisy channel. Belief propagation (BP) algorithm and its approximations, most notably min-sum, are popular iterative decoding algorithms used for LDPC and turbo codes. The trade-off between the hardware complexity and the decoding throughput is a critical factor in the implementation of the practical decoder. This article presents introduction to LDPC codes and its various decoding algorithms followed by realisation of LDPC decoder by using simplified message passing algorithm and partially parallel decoder architecture. Simplified message passing algorithm has been proposed for trade-off between low decoding complexity and decoder performance. It greatly reduces the routing and check node complexity of the decoder. Partially parallel decoder architecture possesses high speed and reduced complexity. The improved design of the decoder possesses a maximum symbol throughput of 92.95 Mbps and a maximum of 18 decoding iterations. The article presents implementation of 9216 bits, rate-1/2, (3, 6) LDPC decoder on Xilinx XC3D3400A device from Spartan-3A DSP family.  相似文献   

20.
Two efficient approaches are proposed to improve the performance of soft-output Viterbi (1998) algorithm (SOVA)-based turbo decoders. In the first approach, an easily obtainable variable and a simple mapping function are used to compute a target scaling factor to normalize the extrinsic information output from turbo decoders. An extra coding gain of 0.5 dB can be obtained with additive white Gaussian noise channels. This approach does not introduce extra latency and the hardware overhead is negligible. In the second approach, an adaptive upper bound based on the channel reliability is set for computing the metric difference between competing paths. By combining the two approaches, we show that the new SOVA-based turbo decoders can approach maximum a posteriori probability (MAP)-based turbo decoders within 0.1 dB when the target bit-error rate (BER) is moderately low (e.g., BER<10/sup -4/ for 1/2 rate codes). Following this, practical implementation issues are discussed and finite precision simulation results are provided. An area-efficient parallel decoding architecture is presented in this paper as an effective approach to design high-throughput turbo/SOVA decoders. With the efficient parallel architecture, multiple times throughput of a conventional serial decoder can be obtained by increasing the overall hardware by a small percentage. To resolve the problem of multiple memory accesses per cycle for the efficient parallel architecture, a novel two-level hierarchical interleaver architecture is proposed. Simulation results show that the proposed interleaver architecture performs as well as random interleavers, while requiring much less storage of random patterns.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号