Similar Literature
20 similar records found.
1.
Highly parallel decoders for convolutional turbo codes are studied by proposing two parallel decoding architectures and a design approach for parallel interleavers. To solve the memory-conflict problem of extrinsic information in a parallel decoder, a block-like approach in which data is written row-by-row and read diagonal-wise is proposed for designing collision-free parallel interleavers. Furthermore, a warm-up-free parallel sliding-window architecture is proposed for long turbo codes to maximize the decoding speed of parallel decoders. The proposed architecture increases decoding speed by 6%-34% at the cost of a 1% storage increase for an eight-parallel decoder. For short turbo codes (e.g., a length of 512 bits), a warm-up-free parallel window architecture is proposed that doubles the speed at the cost of a 12% hardware increase.
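As a rough illustration of the block-like idea described above (data written row-by-row, read diagonal-wise), the Python sketch below generates diagonal read addresses for a rows × cols buffer; the per-row bank mapping and the function name are our own assumptions for illustration, not the paper's exact construction.

```python
# Sketch: block interleaver written row-by-row, read diagonal-wise.
# Assumed layout for illustration only; not the paper's exact design.

def diagonal_read_order(rows, cols):
    """Return linear (row-major) addresses in diagonal read order.

    Data is stored row-by-row; the d-th read pass walks the diagonal
    starting at row 0, column d, stepping one row down and one column
    right (wrapping the column index) each time.
    """
    order = []
    for d in range(cols):               # one diagonal per starting column
        for r in range(rows):           # each row contributes one element
            c = (d + r) % cols          # wrap the column index
            order.append(r * cols + c)  # row-major linear address
    return order

if __name__ == "__main__":
    # With 4 parallel sub-decoders, each owning one row (one memory bank),
    # every diagonal of 4 consecutive reads touches the 4 rows once each,
    # so the banks are accessed without collision.
    print(diagonal_read_order(4, 6))
```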

2.
Wireless communication standards use parallel turbo decoders to reach higher data rates at the cost of large hardware resources. This paper presents a memory-reduced back-trace technique, based on a new method of estimating backward-recursion factors, for maximum a posteriori probability (MAP) decoding. Mathematical reformulations of the branch-metric equations are performed to reduce the memory required for branch metrics at each trellis stage. Subsequently, an architecture for the MAP decoder and its scheduling based on the proposed back-trace and branch-metric reformulation are presented. A comparative analysis of bit-error-rate (BER) performance in an additive white Gaussian noise channel is carried out for the MAP decoder as well as parallel turbo decoders. It is shown that a MAP decoder with a code rate of 1/2 and a parallel turbo decoder with a code rate of 1/3 achieve coding gains of 1.28 dB at a BER of 10^-5 and 0.4 dB at a BER of 10^-4, respectively. To meet the high-data-rate benchmarks of recently deployed wireless communication standards, very-large-scale-integration implementations of parallel turbo decoders with 8-64 MAP decoders have been reported. The hardware savings achieved by such parallel turbo decoders with the suggested memory-reduced techniques are reported in terms of complementary metal-oxide-semiconductor transistor count: the parallel turbo decoders with 32 and 64 MAP decoders show hardware savings of 34% and 44%, respectively.
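The branch-metric memory reduction can be appreciated from the structure of the standard max-log-MAP branch metrics: for a rate-1/2 recursive systematic constituent code every branch metric of a trellis stage is ±A ± B, so only two terms need storing per stage. The sketch below states this common reformulation in our own notation (assuming BPSK over AWGN); it is not claimed to be the paper's exact method.

```python
# Hedged sketch of a standard branch-metric reformulation for max-log-MAP
# decoding of a rate-1/2 RSC constituent code. Variable names are ours,
# not the paper's.

def branch_metric_terms(la, lc, ys, yp):
    """Return the two stored terms (A, B) for one trellis stage.

    la : a-priori LLR of the information bit
    lc : channel reliability (4*Es/N0 for BPSK over AWGN, assumed here)
    ys : received systematic sample
    yp : received parity sample
    """
    A = 0.5 * (la + lc * ys)   # systematic + a-priori contribution
    B = 0.5 * (lc * yp)        # parity contribution
    return A, B

def gamma(A, B, u, p):
    """Branch metric for hypothesised info bit u and parity bit p (each ±1).

    All four branch metrics of the stage are recovered as ±A ± B, so only
    (A, B) need to be kept in the branch-metric memory.
    """
    return u * A + p * B
```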

3.
Standard VLSI implementations of turbo decoding require substantial memory and incur a long latency, which cannot be tolerated in some applications. A parallel VLSI architecture for low-latency turbo decoding, comprising multiple soft-input soft-output (SISO) elements operating jointly on one turbo-coded block, is presented and compared to sequential architectures. A parallel interleaver is essential to process multiple concurrent SISO outputs. A novel parallel interleaver and an algorithm for its design are presented, achieving the same error-correction performance as the standard architecture. Latency is reduced by up to 20 times and throughput for large blocks is increased up to six-fold relative to sequential decoders, using the same silicon area and achieving a very high coding gain. The parallel architecture scales favorably: latency and throughput improve with increased block size and chip area.

4.
This paper analyses different VLSI architectures for 3GPP LTE/LTE-Advanced turbo decoders in terms of throughput and area trade-offs. Data-flow graphs for the standard SISO MAP (maximum a posteriori) turbo decoder, the SW-SISO MAP turbo decoder and the PW-SISO MAP turbo decoder are presented and their performance analysed. Two variants of the quadratic permutation polynomial (QPP) interleaver are proposed which simplify the implementation of the 'mod' operator and provide the best compromise between area, delay and power dissipation. The implementation of a decoder using one variant of the QPP interleaver is also discussed. A novel area-optimisation approach is proposed to reduce the number of interleavers required for the parallel-window turbo decoder. Multi-port memory is also used for the parallel turbo decoder. To increase throughput without an effective increase in area complexity, circuit-level pipelining and retiming are used. The proposed architectures have been synthesised with Synopsys Design Compiler in 45-nm CMOS technology.
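For context on the 'mod' simplification: a QPP interleaver maps index x to π(x) = (f1·x + f2·x²) mod K, and a widely used trick is to generate the addresses recursively so that the modulo degenerates into conditional subtractions and no multiplier is needed. The sketch below illustrates this general approach under parameters chosen for illustration; it is not claimed to be either of the paper's specific variants.

```python
# Hedged sketch: recursive QPP address generation without multiplication
# or a general 'mod' operator. pi(x) = (f1*x + f2*x^2) mod K is produced
# incrementally; intermediate values stay in [0, K), so the modulo
# reduces to a conditional subtraction.

def qpp_addresses(K, f1, f2):
    """Yield pi(0), pi(1), ..., pi(K-1) for the QPP defined by (f1, f2, K)."""
    pi = 0                    # pi(0) = 0
    g = (f1 + f2) % K         # g(x) = pi(x+1) - pi(x) mod K, evaluated at x = 0
    step = (2 * f2) % K       # g increases by 2*f2 at each step
    for _ in range(K):
        yield pi
        pi += g
        if pi >= K:           # conditional subtraction instead of mod
            pi -= K
        g += step
        if g >= K:
            g -= K

if __name__ == "__main__":
    # Parameters chosen for illustration (K=40, f1=3, f2=10).
    addrs = list(qpp_addresses(40, 3, 10))
    assert sorted(addrs) == list(range(40))   # a valid permutation
```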

5.
This paper presents a unified, radix-4 implementation of a turbo decoder covering multiple standards such as DVB, WiMAX, 3GPP-LTE and HSPA Evolution. The radix-4 parallel interleaver is the bottleneck when the same turbo-decoding architecture is used for multiple standards. This paper covers the issues associated with the design of a radix-4 parallel interleaver to arrive at a flexible turbo-decoder architecture. Radix-4 parallel interleaver algorithms and their mapping onto the hardware architecture are presented for multi-mode operation. The overheads associated with hardware multiplexing are found to be insignificant. Besides flexibility of the turbo decoder implementation, low silicon cost and low power are also addressed by optimizing the storage scheme for branch metrics and extrinsic information. The proposed unified architecture for radix-4 turbo decoding occupies a total of 0.65 mm2 in a 65-nm CMOS process. With 4 SISO blocks used in parallel and 6 iterations, it achieves a throughput of up to 173.3 Mbps while consuming 570 mW in total. It provides a good trade-off between silicon cost, power consumption and throughput, with a silicon efficiency of 0.005 mm2/Mbps and an energy efficiency of 0.55 nJ/b/iter.

6.
We present an efficient VLSI architecture for a 3GPP LTE/LTE-Advanced turbo decoder that exploits the algebraic-geometric properties of the quadratic permutation polynomial (QPP) interleaver. The high-throughput 3GPP LTE/LTE-Advanced turbo codes require a highly parallel decoder architecture. The turbo interleaver is known to be the main obstacle to decoder parallelism because of the collisions it introduces in memory accesses. The QPP interleaver solves the memory-contention issues when several MAP decoders are used in parallel to improve turbo decoding throughput. In this paper, we propose a low-complexity QPP interleaving address generator and a multi-bank memory architecture to enable parallel turbo decoding. Design trade-offs in terms of area and throughput efficiency are explored to find the optimal architecture. The proposed parallel turbo decoder has been synthesized, placed and routed in a 65-nm CMOS technology with a core area of 8.3 mm2 and a maximum clock frequency of 400 MHz. This parallel decoder, comprising 64 MAP decoder cores, achieves a maximum decoding throughput of 1.28 Gbps at 6 iterations.
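To illustrate why a QPP interleaver permits collision-free multi-bank access, the sketch below models P MAP cores, each owning a window of W = K/P consecutive indices, and checks that at every decoding step the P interleaved addresses fall into distinct banks. The bank = address // W mapping and the parameters are our own assumptions for illustration, not necessarily the paper's memory organisation.

```python
# Illustrative check of contention-free parallel access with a QPP
# interleaver. The bank mapping (bank = address // W) is assumed here
# for illustration only.

def qpp(x, K, f1, f2):
    return (f1 * x + f2 * x * x) % K

def is_contention_free(K, f1, f2, P):
    """True if, at every local step t, the P parallel accesses
    pi(p*W + t), p = 0..P-1, land in P distinct banks of size W = K/P."""
    W = K // P
    for t in range(W):
        banks = {qpp(p * W + t, K, f1, f2) // W for p in range(P)}
        if len(banks) != P:
            return False
    return True

if __name__ == "__main__":
    # Parameters chosen for illustration: K=40, f1=3, f2=10, 8 parallel cores.
    print(is_contention_free(K=40, f1=3, f2=10, P=8))
```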

7.
Turbo Decoder Using Contention-Free Interleaver and Parallel Architecture
This paper introduces a turbo decoder that utilizes multiple soft-in/soft-out (SISO) decoders to decode one codeword. In addition, each SISO decoder is modified to allow simultaneous execution over multiple successive trellis stages. The design issues related to the architecture with parallel high-radix SISO decoders are discussed. First, a contention-free interleaver for the hybrid parallelism is presented to overcome the complicated collision problem and reduce the interconnection-network complexity. Second, two techniques for the high-speed add-compare-select (ACS) circuits are given to reduce the area overhead of the SISO decoder. Third, the processing schedule is modified for higher operating efficiency. Two designs with the parallel architecture have been implemented. The first design, with 32 SISO decoders, each of which processes 2 symbols per cycle, achieves 160 Mb/s and 0.22 nJ/b/iter in measurements. The second design uses 16 SISO decoders to deal with 4 symbols per cycle and achieves 100% efficiency, leading to 1000 Mb/s and 0.15 nJ/b/iter in post-layout simulation.

8.
The soft-output Viterbi algorithm (SOVA) is a turbo decoding algorithm that is suitable for hardware implementation, but its performance is not as good as that of the maximum a posteriori probability (MAP) algorithm, so improving it is important. The lack of correlation between the minimum- and maximum-likelihood paths in SOVA is analyzed. The metric difference between the two likelihood paths is used as the iterative soft information, unlike in traditional SOVA. The performance of the proposed SOVA is demonstrated by simulations. For a 1024-bit frame size and 9 iterations, with signal-to-noise ratios from 1 dB to 4 dB, the experimental results show that the new SOVA algorithm obtains about 0.4 dB and 0.2 dB more coding gain than the traditional SOVA and Bi-SOVA algorithms, respectively, at a bit error rate (BER) of 1×10^-4, while its latency is only half that of bidirectional (Bi-SOVA) decoding.
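For reference, the traditional SOVA soft output that the modified algorithm departs from can be written in max-log form roughly as follows; this is a hedged restatement of the textbook update rule, not the paper's new metric-difference rule.

```latex
% Traditional SOVA soft output (the baseline the paper modifies), max-log form.
\[
  L(u_k)\;\approx\;\hat{u}_k \cdot
  \min_{\substack{j \in \mathcal{M}_k \\ u_k^{(j)} \neq \hat{u}_k}} \Delta_j ,
  \qquad
  \Delta_j = M_{\mathrm{ML}} - M_{\mathrm{comp},j} \;\ge\; 0 ,
\]
% where \hat{u}_k is the hard decision on the maximum-likelihood path,
% \mathcal{M}_k is the set of merge points within the update window whose
% competing path would flip bit k, and M denotes accumulated path metrics.
```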

9.
The main problem with the hardware implementation of turbo codes is the lack of parallelism in the MAP-based decoding algorithm. This paper proposes to overcome this problem by using a new family of turbo codes called Multiple Slice Turbo Codes. This family is based on two ideas: the encoding of each dimension with P independent tail-biting codes and a constrained interleaver structure that allows the parallel decoding of the P independent codewords in each dimension. The optimization of the interleaver is described. A high degree of parallelism is obtained with equivalent or better performance than the DVB-RCS turbo code. For very high throughput applications, the parallel architecture decreases both decoding latency and hardware complexity compared to the classical serial architecture, which requires memory duplication.

10.
Iterative decoders such as turbo decoders have become integral components of modern broadband communication systems because of their ability to provide substantial coding gains. A key computational kernel in iterative decoders is the maximum a posteriori probability (MAP) decoder. The MAP decoder is recursive and complex, which makes high-speed implementations extremely difficult to realize. In this paper, we present block-interleaved pipelining (BIP) as a new high-throughput technique for MAP decoders. An area-efficient symbol-based BIP MAP decoder architecture is proposed by combining BIP with the well-known look-ahead computation. These architectures are compared with conventional parallel architectures in terms of speed-up, memory and logic complexity, and area. Compared to the parallel architecture, the BIP architecture provides the same speed-up with a reduction in logic complexity by a factor of M, where M is the level of parallelism. The symbol-based architecture provides a speed-up in the range from 1 to 2 with a logic complexity that grows exponentially with M and a state metric storage requirement that is reduced by a factor of M as compared to a parallel architecture. The symbol-based BIP architecture provides speed-up in the range M to 2M with an exponentially higher logic complexity and a reduced memory complexity compared to a parallel architecture. These high-throughput architectures are synthesized in a 2.5-V 0.25-µm CMOS standard cell library and post-layout simulations are conducted. For turbo decoder applications, we find that the BIP architecture provides a throughput gain of 1.96 at the cost of 63% area overhead. For turbo equalizer applications, the symbol-based BIP architecture enables us to achieve a throughput gain of 1.79 with an area savings of 25%.
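The look-ahead computation mentioned above collapses two trellis steps into a single radix-4 recursion on pre-combined branch metrics. The sketch below shows this in max-log form for a generic small trellis; the data layout, trellis size and function names are illustrative assumptions, not the paper's architecture.

```python
# Minimal sketch of two-step look-ahead (radix-4) forward recursion in
# max-log form. The 4-state trellis and metric layout are assumptions
# for illustration only.

def lookahead_gamma(gamma1, gamma2, num_states=4):
    """Pre-combine two single-step branch metrics into one two-step metric.

    gamma1[s0][s1] and gamma2[s1][s2] hold single-step metrics (-inf where
    no branch exists). Returns g2[s0][s2] = max over s1 of gamma1 + gamma2.
    """
    NEG = float("-inf")
    g2 = [[NEG] * num_states for _ in range(num_states)]
    for s0 in range(num_states):
        for s1 in range(num_states):
            for s2 in range(num_states):
                g2[s0][s2] = max(g2[s0][s2], gamma1[s0][s1] + gamma2[s1][s2])
    return g2

def alpha_step2(alpha, g2, num_states=4):
    """One radix-4 update: alpha_{k+2}[s2] = max over s0 of alpha_k[s0] + g2[s0][s2]."""
    return [max(alpha[s0] + g2[s0][s2] for s0 in range(num_states))
            for s2 in range(num_states)]
```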

11.
Collision-free row-column S-random interleaver
Parallel decodable turbo codes (PDTCs) are suitable for concurrent decoding and hence have low latency. The memory-collision issue is an important problem encountered during parallel processing. In this article, we propose a collision-free interleaver for parallel processing operations. The performance of PDTCs is analyzed with the proposed random interleaver, which prevents the memory-collision problem. Distance spectra of PDTCs with the proposed interleaver are computed and compared to those obtained with the S-random interleaver.

12.
This paper presents a novel hardware interleaver architecture for unified parallel turbo decoding. The architecture is fully reconfigurable across multiple standards such as HSPA Evolution, DVB-SH, 3GPP-LTE and WiMAX. Turbo codes, widely used for error correction in today's consumer electronics, are prone to high latency due to large block sizes and multiple iterations. Many parallel turbo decoding architectures have recently been proposed to enhance channel throughput, but the interleaving algorithms used in different standards do not readily allow their use because of the high percentage of memory conflicts. The architecture presented in this paper provides a reconfigurable platform for implementing the parallel interleavers of the different standards by managing the conflicts involved in each. The memory conflicts are managed with different approaches such as stream misalignment, memory division and the use of small FIFO buffers. The proposed flexible architecture is low cost and occupies 0.085 mm2 in a 65-nm CMOS process. It can implement up to 8 parallel interleavers and operate at a frequency of 200 MHz, thus providing significant support to higher-throughput systems based on parallel SISO processors.

13.
The implementation and performance of a turbo/MAP decoder are described. A serial block MAP decoder operating in the logarithm domain is used to obtain a very-high-performance turbo decoder. Programmable gate arrays and EPROMs allow the decoder to be programmed for almost any code from four to 512 states, rate 1/3 to rate 1/7 (higher rates are achieved with puncturing) and interleaver block sizes up to 65,536 bits. Seven decoding stages were implemented in parallel. For rate 1/3 and 1/7 16-state codes with an interleaver size of 65,536 bits and operating at up to 356 kbit/s, the codec achieved an Eb/N0 of 0.32 and −0.30 dB respectively for a BER of 10^-5. BERs down to 10^-7 were also achieved for a small increase in Eb/N0. An efficient implementation of a continuous MAP decoder is also presented, along with a synchronization technique for turbo decoders. © 1998 John Wiley & Sons, Ltd.

14.
A Parallel Decoding Scheme for Turbo Codes and the Corresponding Parallel-Structure Interleaver
The high decoding latency introduced by the recursive computations of MAP-based decoding limits the application of turbo codes to high-rate data transmission. To address this problem, this paper presents a parallel decoding method that reduces the decoding latency. Implementing the parallel scheme requires a suitable interleaver so that the two decoders avoid data conflicts when reading and writing extrinsic information. After analyzing the existence of arbitrary collision-free interleaving patterns, the paper gives a method for designing S-random interleavers suited to the parallel processing scheme. Simulations verify the bit-error-rate performance of the parallel decoding scheme.
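The underlying S-random construction accepts a randomly drawn output index only if it differs by more than S from the outputs assigned to the previous S positions. The sketch below shows this plain construction; the additional constraint that makes it suitable for the parallel scheme (avoiding extrinsic-memory conflicts between the two decoders) is deliberately omitted, since its exact form is specific to the paper.

```python
import random

# Basic S-random interleaver construction (plain version; the paper adds a
# further collision-avoidance constraint for parallel decoding, omitted here).

def s_random_interleaver(N, S, max_restarts=100, seed=0):
    """Return a permutation pi of range(N) such that any two inputs within
    S positions of each other map to outputs more than S apart."""
    rng = random.Random(seed)
    for _ in range(max_restarts):
        remaining = list(range(N))
        rng.shuffle(remaining)
        pi = []
        ok = True
        for _pos in range(N):
            # Pick a candidate far enough from the last S accepted outputs.
            choice = next((c for c in remaining
                           if all(abs(c - p) > S for p in pi[-S:])), None)
            if choice is None:
                ok = False            # dead end: restart with a new shuffle
                break
            pi.append(choice)
            remaining.remove(choice)
        if ok:
            return pi
    raise RuntimeError("no S-random interleaver found; try a smaller S")

if __name__ == "__main__":
    # Rule of thumb: the construction tends to converge for S below sqrt(N/2).
    print(s_random_interleaver(N=256, S=8)[:16])
```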

15.
We propose a novel scheme for iterative multiuser detection and turbo decoding. The multiuser detector and single-user turbo decoders are coupled such that, after each turbo decoding iteration, the extrinsic information of the interfering users is passed to the multiuser detector, and, after each multiuser iteration, updated a posteriori probabilities are passed to the single-user turbo decoders as the soft input metrics. In synchronous systems, the proposed detector approaches the multiuser capacity limit to within 1 dB in the low signal-to-noise-ratio region.

16.
Chanho Lee, ETRI Journal, 2005, 27(5): 557-562
Low-density parity-check (LDPC) codes have recently emerged owing to their excellent performance. However, the parity-check (H) matrices of previous works are not well suited to hardware implementation of encoders or decoders. This paper proposes a hybrid parity-check matrix that is efficient for hardware implementation of both decoders and encoders. The hybrid H-matrices are constructed so that both the semi-random technique and the partly parallel structure can be applied to the design of encoders and decoders. Using the proposed methods, the implementation of encoders becomes practical while keeping the hardware complexity of the partly parallel decoder structures. An encoder and a decoder are designed in Verilog-HDL and synthesized with a 0.35-µm CMOS standard cell library.

17.
The National Aeronautics and Space Administration has developed a capacity-approaching modulation and coding scheme that comprises a serial concatenation of an inner accumulate pulse-position modulation (PPM) and an outer convolutional code [or serially concatenated PPM (SCPPM)] for deep-space optical communications. Decoding of this code uses the turbo principle. However, due to the nonbinary property of SCPPM, a straightforward application of classical turbo decoding is very inefficient. Here, we present various optimizations applicable to hardware implementation of the SCPPM decoder. More specifically, we feature a Super Gamma computation to efficiently handle parallel trellis edges, a pipeline-friendly "max-star top-2" circuit that reduces the max-only approximation penalty, a low-latency cyclic redundancy check circuit for window-based decoders, and a high-speed algorithmic polynomial interleaver that leads to memory savings. Using the featured optimizations, we implement a 6.72 megabit-per-second (Mbps) SCPPM decoder on a single field-programmable gate array (FPGA). Compared to the current data rate of 256 kilobits per second from Mars, the SCPPM coded scheme represents a throughput increase of more than twenty-six-fold. Extension to a 50-Mbps decoder on a board with multiple FPGAs follows naturally. We show through hardware simulations that the SCPPM coded system can operate within 1 dB of the Shannon capacity at nominal operating conditions.
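The max-star operation, and one plausible reading of the "top-2" idea (apply the Jacobian correction only to the two largest inputs of an n-ary log-sum), can be sketched compactly as follows; this is our interpretation for illustration, not the paper's exact circuit.

```python
import math

# Exact pairwise max-star (Jacobian logarithm) and a "top-2" n-ary
# approximation: the maximum plus a single correction term computed from
# the two largest inputs, which reduces the max-only approximation penalty.

def max_star(a, b):
    """log(exp(a) + exp(b)) computed as max plus a correction term."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))

def max_star_top2(values):
    """Approximate log-sum-exp of many terms using only the two largest."""
    a, b = sorted(values)[-2:]
    return max_star(a, b)

if __name__ == "__main__":
    vals = [0.3, 2.1, 1.9, -0.5]
    exact = math.log(sum(math.exp(v) for v in vals))
    print(exact, max_star_top2(vals), max(vals))  # exact vs top-2 vs max-only
```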

18.
A multistage recursive block interleaver (MIL) is proposed as the turbo-code internal interleaver. Unlike conventional block interleavers, the MIL repeats permutations of rows and columns in a recursive manner until the final interleaving length is reached. The bit error rate (BER) and frame error rate (FER) performance of turbo coding with MIL under frequency-selective Rayleigh fading is evaluated by computer simulation for direct-sequence code-division multiple-access mobile radio. The performance of rate-1/3 turbo codes with MIL is compared with pseudorandom and S-random interleavers, assuming a spreading chip rate of 4.096 Mcps and an information bit rate of 32 kbps. When the interleaving length is 3068 bits, turbo coding with MIL outperforms the pseudorandom interleaver by 0.4 dB at an average BER of 10^-6 on a fading channel with the ITU-R-defined Vehicular-B power-delay profile and a maximum Doppler frequency of fD = 80 Hz. The results also show that turbo coding with MIL provides superior performance to convolutional and Reed-Solomon concatenated coding; the gain over concatenated coding is as much as 0.6 dB.

19.
Design and Implementation of Random Interleavers
This paper first analyzes the important role of the random interleaver in turbo codes. It then discusses the principles for choosing the interleaver length, and computer simulations examine the effect of interleaving depth on turbo-code performance over AWGN and Rayleigh fading channels. On this basis, two hardware circuit implementations of the random interleaver are given, the second of which is well suited to ASIC implementation.

20.
This letter first investigates the distribution of the free distance d_free for multiple parallel concatenated schemes based on random interleavers. The distribution is obtained by computer search for information-weight IW=2 error events, which are the most likely events to produce d_free, at least for turbo codes. The dependence on interleaver length and code memory is also studied. The design of the S-interleaver for turbo codes is shown to depend on a combination of IW=2 error events (which depend on S) and IW=2+2 "crossed" error events (which are independent of S). The limiting value of S (for which the two effects are equal) is calculated for turbo codes, and a novel algorithm to increase this limit (and hence d_free) is presented. The S-random interleaver design is extended to schemes with two interleavers, for which the use of paired S-random interleavers is proposed.
