Similar Documents
20 similar documents retrieved.
1.
Low-density parity-check (LDPC) codes and convolutional Turbo codes are two of the most powerful error-correcting codes widely used in modern communication systems. In a multi-mode baseband receiver, both LDPC and Turbo decoders may be required. However, the different decoding approaches for LDPC and Turbo codes usually lead to different hardware architectures. In this paper we propose a unified message passing algorithm for LDPC and Turbo codes and introduce a flexible soft-input soft-output (SISO) module to handle LDPC/Turbo decoding. We employ the trellis-based maximum a posteriori (MAP) algorithm as a bridge between LDPC and Turbo decoding. We view the LDPC code as a concatenation of n super-codes, where each super-code has a simpler trellis structure so that the MAP algorithm can be easily applied to it. We propose a flexible functional unit (FFU) for MAP processing of LDPC and Turbo codes with a low hardware overhead (about 15% area and timing overhead). Based on the FFU, we propose an area-efficient flexible SISO decoder architecture to support LDPC/Turbo decoding. Multiple such SISO modules can be embedded into a parallel decoder for higher decoding throughput. As a case study, a flexible LDPC/Turbo decoder has been synthesized in TSMC 90 nm CMOS technology with a core area of 3.2 mm². The decoder supports IEEE 802.16e LDPC codes, IEEE 802.11n LDPC codes, and 3GPP LTE Turbo codes. Running at a 500 MHz clock frequency, the decoder can sustain up to 600 Mbps LDPC decoding or 450 Mbps Turbo decoding.
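To make the "super-code as a simple trellis" idea concrete, the sketch below runs a max-log-MAP (BCJR-style) forward-backward pass over the two-state trellis of a single parity check, where the state is the running parity. This is an editorial illustration of the concept only, not the paper's FFU datapath; the LLR convention L = log P(0)/P(1) and the max-log approximation are assumptions.

```python
import numpy as np

def map_single_parity_check(llrs):
    """Max-log-MAP over the 2-state trellis of one single-parity-check
    'super-code'; the trellis state is the running parity and the path is
    forced to end in even parity."""
    n = len(llrs)
    NEG = -1e9                                   # stand-in for log(0)
    # Branch metrics: gamma[k][b] for hypothesizing bit b at position k
    gamma = np.array([[0.5 * l, -0.5 * l] for l in llrs])

    alpha = np.full((n + 1, 2), NEG); alpha[0, 0] = 0.0   # forward metrics
    beta = np.full((n + 1, 2), NEG);  beta[n, 0] = 0.0    # backward metrics
    for k in range(n):                           # forward recursion
        for s in (0, 1):
            for b in (0, 1):
                nxt = s ^ b
                alpha[k + 1, nxt] = max(alpha[k + 1, nxt], alpha[k, s] + gamma[k, b])
    for k in range(n - 1, -1, -1):               # backward recursion
        for s in (0, 1):
            for b in (0, 1):
                beta[k, s] = max(beta[k, s], gamma[k, b] + beta[k + 1, s ^ b])

    # A posteriori LLR per bit: best bit-0 path metric minus best bit-1 path metric
    out = []
    for k in range(n):
        m0 = max(alpha[k, s] + gamma[k, 0] + beta[k + 1, s] for s in (0, 1))
        m1 = max(alpha[k, s] + gamma[k, 1] + beta[k + 1, s ^ 1] for s in (0, 1))
        out.append(m0 - m1)
    return np.array(out)
```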

2.
This paper presents a memory-efficient partially parallel decoder architecture suited for high-rate quasi-cyclic low-density parity-check (QC-LDPC) codes using the (modified) min-sum algorithm for decoding. In general, over 30% of memory can be saved compared with conventional partially parallel decoder architectures. Efficient techniques have been developed to reduce the computation delay of the node processing units and to minimize the hardware overhead of parallel processing. The proposed decoder architecture can linearly increase the decoding throughput with a small percentage of extra hardware. Consequently, it facilitates the application of LDPC codes in area/power-sensitive high-speed communication systems.
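For reference, the check-node computation at the core of (modified) min-sum decoding can be sketched as follows. This is a generic illustration, not the paper's node processing unit; the 0.75 normalization factor is an assumed value for the "modified" (normalized) variant.

```python
import numpy as np

def min_sum_check_update(msgs, scale=0.75):
    """Normalized min-sum check-node update for one check node.
    msgs: incoming variable-to-check LLRs (degree >= 2).
    Returns one outgoing check-to-variable LLR per edge."""
    msgs = np.asarray(msgs, dtype=float)
    signs = np.sign(msgs)
    signs[signs == 0] = 1.0
    mags = np.abs(msgs)

    # Each outgoing magnitude is the minimum over the *other* edges:
    # min1 everywhere, except min2 at the edge that holds min1 itself.
    order = np.argsort(mags)
    min1, min2 = mags[order[0]], mags[order[1]]
    out_mag = np.full_like(mags, min1)
    out_mag[order[0]] = min2

    total_sign = np.prod(signs)                  # product of all incoming signs
    return scale * total_sign * signs * out_mag  # total_sign*sign_i = product of the others
```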

3.
In this paper, we propose a flexible turbo decoding algorithm for a high-order modulation scheme that uses a standard half-rate turbo decoder designed for binary/quadrature phase-shift keying (B/QPSK) modulation. A transformation applied to the incoming I-channel and Q-channel symbols allows the use of an off-the-shelf B/QPSK turbo decoder without any modifications. Iterative codes such as turbo codes process the received symbols recursively to improve performance. As the number of iterations increases, the execution time and power consumption also increase. The proposed algorithm reduces the latency and power consumption by combining the radix-4, dual-path processing, parallel decoding, and early-stop algorithms. We implement the proposed scheme on a field-programmable gate array and compare its decoding speed with that of a conventional decoder. The results show that the proposed flexible decoding algorithm is 6.4 times faster than the conventional scheme.

4.
This paper presents a method for decoding short codes with high minimum distance (dmin), termed Cortex codes. These codes are systematic block codes of rate 1/2 and can have higher dmin than turbo codes. Despite this characteristic, these codes have been impossible to decode with good performance because, to reach high dmin, several encoding stages are connected through interleavers. This generates a large number of hidden variables and increases the complexity of the scheduling and initialization. However, the structure of the encoder is well suited to analog decoding. A proof-of-concept Cortex decoder for the (8, 4, 4) Hamming code is implemented in subthreshold 0.25-μm CMOS. It outperforms an equivalent LDPC-like decoder by 1 dB at a BER of 10⁻⁵, is 44 percent smaller, and consumes 28 percent less energy per decoded bit.

5.
We consider the decoding problem for low-density parity-check (LDPC) codes and apply nonlinear programming methods. This extends previous work using linear programming (LP) to decode linear block codes. First, a multistage LP decoder based on the branch-and-bound method is proposed. This decoder makes use of the maximum-likelihood-certificate property of the LP decoder to refine the results when an error is reported. Second, we transform the original LP decoding formulation into a box-constrained quadratic programming form. Efficient linear-time parallel and serial decoding algorithms are proposed and their convergence properties are investigated. Extensive simulation studies are performed to assess the performance of the proposed decoders. The proposed multistage LP decoder outperforms the conventional sum-product (SP) decoder considerably for LDPC codes with short to medium block length. The proposed box-constrained quadratic programming decoder has lower complexity than the SP decoder and yields much better performance for LDPC codes with regular structure.
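The abstract does not give the exact quadratic objective, so the following is only a generic sketch of projected gradient descent for a box-constrained quadratic program of the form it describes; the matrix Q, vector c, step size, and iteration count are placeholders, not the paper's formulation.

```python
import numpy as np

def box_qp_decode(Q, c, n_iters=200, step=0.05):
    """Projected gradient descent for
        minimize 0.5 * x^T Q x + c^T x   subject to   0 <= x <= 1,
    followed by thresholding to obtain hard bit decisions."""
    x = np.full(len(c), 0.5)                      # start at the center of the box
    for _ in range(n_iters):
        grad = Q @ x + c                          # gradient of the quadratic objective
        x = np.clip(x - step * grad, 0.0, 1.0)    # gradient step + projection onto the box
    return (x > 0.5).astype(int)
```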

6.
Ma Zhuo, Du Shuanyi. ETRI Journal, 2015, 37(4): 736–742
A serial concatenated decoding algorithm with a dynamic threshold is proposed for low-density parity-check codes with short and medium code lengths. The proposed approach uses a dynamic threshold to select a decoding result from belief propagation decoding and order statistic decoding, which improves the performance of the decoder at a negligible cost. Simulation results show that, in the high-SNR region, the proposed concatenated decoder performs better than a serial concatenated decoder without a threshold, with an Eb/N0 gain of more than 0.1 dB.
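The dynamic threshold rule itself is not spelled out in the abstract; purely as an illustration of a serial BP/OSD concatenation with threshold-based selection, a sketch with placeholder decoders and metric might look like this.

```python
def concatenated_decode(llr, bp_decode, osd_decode, metric, threshold):
    """Hypothetical sketch: run belief propagation first, fall back to order
    statistic decoding only when the BP result looks unreliable, and select
    between the two candidates. bp_decode, osd_decode, metric, and threshold
    are placeholders, not the paper's definitions."""
    bp_word = bp_decode(llr)                     # stage 1: belief propagation
    if metric(bp_word, llr) >= threshold:
        return bp_word                           # BP result accepted directly
    osd_word = osd_decode(llr)                   # stage 2: order statistic decoding
    # keep whichever candidate scores better against the received soft values
    return bp_word if metric(bp_word, llr) > metric(osd_word, llr) else osd_word
```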

7.
A low-complexity design architecture for implementing the successive cancellation (SC) decoding algorithm for polar codes is presented. Hardware designs of polar decoders commonly use SC decoding because of the algorithm's reduced complexity. The merged processing element (MPE) block is the main area-occupying component of the SC decoder, as it incorporates numerous sign and magnitude conversions. The two's complement method is typically used in the MPE block of an SC decoder. In this paper, a low-complexity MPE architecture with minimal two's complement conversion is proposed. A reformulation is also applied to the merged processing elements at the final stage of the SC decoder to generate two output bits at a time. The proposed merged processing element thereby reduces the hardware complexity of the SC decoder and also reduces latency by an average of 64%. An SC decoder with code length 1024 and code rate 1/2 was designed and synthesized using 45-nm CMOS technology. The implementation results of the proposed decoder show a significant improvement in the Technology Scaled Normalized Throughput (TSNT) value and an average 48% reduction in hardware complexity compared with the prevalent SC decoder architectures. Compared with the conventional SC decoder, the presented method shows a 23% reduction in area.

8.
We present an efficient VLSI architecture for a 3GPP LTE/LTE-Advanced Turbo decoder by exploiting the algebraic-geometric properties of the quadratic permutation polynomial (QPP) interleaver. High-throughput 3GPP LTE/LTE-Advanced Turbo codes require a highly parallel decoder architecture. The Turbo interleaver is known to be the main obstacle to decoder parallelism due to the collisions it introduces in accesses to memory. The QPP interleaver solves the memory contention issues when several MAP decoders are used in parallel to improve Turbo decoding throughput. In this paper, we propose a low-complexity QPP interleaving address generator and a multi-bank memory architecture to enable parallel Turbo decoding. Design trade-offs in terms of area and throughput efficiency are explored to find the optimal architecture. The proposed parallel Turbo decoder has been synthesized, placed, and routed in a 65-nm CMOS technology with a core area of 8.3 mm² and a maximum clock frequency of 400 MHz. This parallel decoder, comprising 64 MAP decoder cores, can achieve a maximum decoding throughput of 1.28 Gbps at 6 iterations.
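The QPP interleaver is defined by π(i) = (f1·i + f2·i²) mod K, and a low-complexity address generator can compute it recursively with additions only, as sketched below. The example coefficients are illustrative; the actual (f1, f2) pair for each block length K comes from the LTE specification table.

```python
def qpp_addresses(K, f1, f2):
    """Recursive generation of pi(i) = (f1*i + f2*i^2) mod K without
    multiplications: pi(i+1) = pi(i) + g(i), g(i+1) = g(i) + 2*f2 (all mod K)."""
    pi, g = 0, (f1 + f2) % K
    addresses = []
    for _ in range(K):
        addresses.append(pi)
        pi = (pi + g) % K
        g = (g + 2 * f2) % K
    return addresses

# Sanity check against the direct formula (coefficients here are illustrative only).
K, f1, f2 = 40, 3, 10
assert qpp_addresses(K, f1, f2) == [(f1 * i + f2 * i * i) % K for i in range(K)]
```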

9.
In this paper, we propose and present implementation results of a high-speed turbo decoding algorithm. The latency caused by (de)interleaving and iterative decoding in a conventional maximum a posteriori turbo decoder can be dramatically reduced with the proposed design. The latency reduction comes from the combination of the radix-4, center-to-top, parallel decoding, and early-stop algorithms. This reduced latency enables the use of the turbo decoder as a forward error correction scheme in real-time wireless communication services. The proposed scheme results in a slight degradation in bit error rate performance for large block sizes because the effective interleaver size in a radix-4 implementation is reduced to half, relative to the conventional method. To demonstrate the latency reduction, we implemented the proposed scheme on a field-programmable gate array and compared its decoding speed with that of a conventional decoder. The results show an improvement of at least fivefold for a single iteration of turbo decoding.

10.
In this paper, we propose a low-complexity decoder architecture for low-density parity-check (LDPC) codes using a variable quantization scheme as well as an efficient highly parallel decoding scheme. In the sum-product algorithm for decoding LDPC codes, finite-precision implementations involve an important tradeoff between decoding performance and hardware complexity, driven by two dominant area-consuming factors: the memory for storing updated messages and the look-up table (LUT) implementing the nonlinear function Ψ(x). The proposed variable quantization schemes offer a large reduction in the hardware complexity of both the LUT and the memory. In addition, an efficient highly parallel decoder architecture for quasi-cyclic (QC) LDPC codes can be implemented with reduced hardware complexity, using the partially block-overlapped decoding scheme, and with minimized power consumption, by reducing the total number of memory accesses for updated messages. For (3, 6) QC LDPC codes, the proposed schemes reduce the implementation area of the highly parallel decoder architecture by 33% for the memory and by approximately 28% for the check node unit and variable node unit computation units, without significant performance degradation. The number of memory accesses is also reduced by 20%.
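For context, Ψ(x) = -ln(tanh(x/2)) is the usual sum-product check-node nonlinearity, and it is its own inverse, which is why a single LUT can serve both directions. A plain floating-point sketch (without the paper's quantization scheme) is shown below.

```python
import math

def psi(x, clip=1e-10):
    """Psi(x) = -ln(tanh(x/2)), applied to message magnitudes; clipped to
    avoid log(0). In hardware this is the function realized by the LUT."""
    x = max(abs(x), clip)
    return -math.log(math.tanh(x / 2.0))

def check_node_magnitudes(mags):
    """Magnitude part of the sum-product check-node update:
    |L_out_i| = psi( sum_{j != i} psi(|L_in_j|) ); signs are handled separately."""
    psis = [psi(m) for m in mags]
    total = sum(psis)
    return [psi(total - p) for p in psis]
```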

11.
A low-complexity and high-performance SCEE (Syndrome Check Error Estimation) decoding method for convolutional codes and its concatenated SCEE/RS (Reed–Solomon) coding scheme are proposed. First, we describe the decoding steps of the proposed algorithm. Deterministic values for the decoding operation are then derived when a particular predecoder-reencoder combination is used. Computer simulation results show that the computational complexity of the proposed SCEE decoder is significantly reduced compared with that of a conventional Viterbi decoder, without degradation of the Pe performance. Simulation results for the BER performance of the concatenated SCEE/hard-decision Viterbi (HD-Viterbi) and SCEE/RS (Reed–Solomon) codes are also presented.

12.
High-throughput layered decoder implementation for quasi-cyclic LDPC codes
This paper presents a high-throughput decoder design for quasi-cyclic (QC) low-density parity-check (LDPC) codes. Two new techniques are proposed: a parallel layered decoding architecture (PLDA) and critical path splitting. PLDA enables parallel processing of all layers by establishing dedicated message passing paths among them, so the decoder avoids a large crossbar-based interconnect network. The critical path splitting technique is based on careful adjustment of the starting point of each layer to maximize the time intervals between adjacent layers, such that the critical path delay can be split into pipeline stages. Furthermore, min-sum and loosely coupled algorithms are employed for area efficiency. As a case study, a rate-1/2 2304-bit irregular LDPC decoder is implemented as an ASIC in a 90 nm CMOS process. The decoder achieves a maximum decoding throughput of 2.2 Gbps at 10 iterations. The operating frequency is 950 MHz after synthesis and the chip area is 2.9 mm².
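As background on the layered schedule that the architecture parallelizes, a plain software layered min-sum pass looks roughly like the sketch below; this is an editorial illustration of layered decoding in general, not the PLDA hardware, and the 0.75 scaling factor is an assumed normalization value.

```python
import numpy as np

def layered_min_sum(layers, channel_llr, n_iters=10, scale=0.75):
    """Layered (row-group by row-group) normalized min-sum decoding sketch.
    layers: list of layers, each a list of checks, each check a list of
    variable indices. Posterior LLRs are refreshed after every layer."""
    post = np.array(channel_llr, dtype=float)     # running a-posteriori LLRs
    c2v = {}                                      # stored check-to-variable messages
    for _ in range(n_iters):
        for li, layer in enumerate(layers):
            for ci, check in enumerate(layer):
                # variable-to-check inputs: strip this check's old contribution
                v2c = np.array([post[v] - c2v.get((li, ci, v), 0.0) for v in check])
                signs = np.sign(v2c); signs[signs == 0] = 1.0
                mags = np.abs(v2c)
                order = np.argsort(mags)
                min1, min2 = mags[order[0]], mags[order[1]]
                total_sign = np.prod(signs)
                for k, v in enumerate(check):
                    mag = min2 if k == order[0] else min1
                    msg = scale * total_sign * signs[k] * mag
                    post[v] = v2c[k] + msg        # fold the fresh message back in
                    c2v[(li, ci, v)] = msg
    return (post < 0).astype(int)                 # hard decision (negative LLR -> 1)
```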

13.
The microprocessor industry's trend toward many-core architectures has introduced the need to devise appropriately scalable applications. When implementing software-based video decoding, the main challenges are the optimized partitioning of decoder operations, efficient tracking of dependencies, and resource synchronization for multiple parallel units. The same applies to hardware implementations of video decoders, where monolithic approaches hinder the scalability of the design and the reusability of already implemented core components. In this paper, we propose an intermediate data stream format (Meta Format Stream) suited for architectural decomposition of video decoding, replacing the conventional monolithic decoder architecture with a pipelined structure. The Meta Format is forward-oriented, self-contained, and multistandard-capable, so that processing of Meta Streams is independent of the originating bit stream. Our approach does not require special coding settings and is applicable to accelerated decoding of any standards-compliant bit stream. An H.264/AVC multiprocessing proposal is presented as a case study for the potential of our concept. The case study combines coarse-grained frame-level parallel decoding of the bit stream with fine-grained macroblock-level parallelism in the image processing stage. The proposed H.264 decoder achieved speedup factors of up to 7.6 on an 8-core machine with 2-way SMT. We report actual decoding speeds of up to 150 frames per second at 2160p resolution.

14.
In this article, we introduce a new class of product codes based on convolutional codes, called convolutional product codes. The structure of product codes enables parallel decoding, which can significantly increase decoder speed in practice. The use of convolutional codes in a product code setting makes it possible to use the vast knowledge base for convolutional codes as well as their flexibility in fast parallel decoders. Just as in turbo codes, interleaving turns out to be critical for the performance of convolutional product codes. The practical decoding advantages over serially-concatenated convolutional codes are emphasized.

15.
Efficient hardware implementation of low-density parity-check (LDPC) codes is of great interest since LDPC codes are being considered for a wide range of applications. Recently, overlapped message passing (OMP) decoding has been proposed to improve the throughput and hardware utilization efficiency (HUE) of decoder architectures for LDPC codes. In this paper, we first study the scheduling of OMP decoding of LDPC codes and show that maximizing the throughput gain amounts to minimizing the intra- and inter-iteration waiting times. We then focus on the OMP decoding of quasi-cyclic (QC) LDPC codes. We propose a partly parallel OMP decoder architecture and implement it on an FPGA. For any QC LDPC code, our OMP decoder achieves the maximum throughput gain and HUE attainable through overlapping, and hence has higher throughput and HUE than previously proposed OMP decoders while maintaining the same hardware requirements. We also show that the maximum throughput gain and HUE achieved by our OMP decoder are ultimately determined by the given code. Thus, we propose a coset-based construction method that results in QC LDPC codes allowing our optimal OMP decoder to achieve higher throughput and HUE.

16.
A Raptor code is a concatenation of a fixed-rate precode and a Luby-Transform (LT) code that can be used as a rateless error-correcting code over communication channels. By definition, Raptor codes are characterized by irregularity features such as dynamic rate, check-degree variability, and joint coding, which make the design of hardware-efficient decoders a challenging task. In this paper, serial turbo decoding of architecture-aware Raptor codes is mapped into sequential row processing of a regular matrix by using a combination of code enhancements and architectural optimizations. The proposed mapping approach is based on three basic steps: (1) applying systematic permutations to the source matrix of the Raptor code, (2) confining LT random encoding to pseudo-random permutation of messages and periodic selection of row-splitting scenarios, and (3) developing a reconfigurable parallel check-node processor that attains a constant throughput while processing LT- and LDPC-nodes of varying degrees and count. The decoder scheduling is thus made simple and uniform across both LDPC and LT decoding. A serial decoder implementing the proposed approach was synthesized in 65 nm, 1.2 V CMOS technology. Hardware simulations show that the decoder, decoding a rate-0.4 code instance, achieves a throughput of 36 Mb/s at an SNR of 1.5 dB, dissipates an average power of 27 mW, and occupies an area of 0.55 mm².
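To recall the LT half of a Raptor code that the decoder has to cope with, one encoded symbol is formed by drawing a degree from a degree distribution and XOR-ing that many source blocks; a minimal sketch is given below. The paper's constrained pseudo-random permutations and row-splitting are not modeled here.

```python
import random

def lt_encode_symbol(source_blocks, degree_dist, rng=random):
    """Produce one LT-encoded symbol from integer-valued source blocks.
    degree_dist: list of (degree, probability) pairs summing to 1."""
    r, acc, d = rng.random(), 0.0, 1
    for degree, prob in degree_dist:              # sample a degree d
        acc += prob
        if r <= acc:
            d = degree
            break
    neighbors = rng.sample(range(len(source_blocks)), d)   # d distinct source blocks
    symbol = 0
    for i in neighbors:
        symbol ^= source_blocks[i]                # XOR of the chosen blocks
    return symbol, neighbors
```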

17.

Successive-cancellation list (SCL) decoding for polar codes has the disadvantage of high latency owing to its serial operations. To improve the latency, several algorithms requiring additional circuits have been proposed, but the area becomes larger. This paper proposes a fast multibit decision method with high area efficiency based on the SCL decoding algorithm. First, multiple bits can be determined at once, reducing clock cycles, using new node types defined by their patterns of information bits and frozen bits. In this paper, we propose new node types called the combined nodes and the other node. The combined nodes, which merge redundant operations of the fast simplified SC (fast-SSC) algorithm, increase area efficiency. The other node, which covers bit patterns not handled by the node types of the fast-SSC algorithm, performs an 8-bit multibit decision to reduce the number of decoding cycles. Latency is further reduced by applying a sphere decoding method to the other node. In addition, a sorter is proposed to reduce the critical path delay. As a large number of path metrics causes sorter delays, the proposed sorter achieves high throughput with a small area. The proposed (1024, 512) SCL decoder showed negligible performance degradation in Matlab simulations and was synthesized using 65 nm CMOS technology. The proposed decoder achieves about 1.3 Gbps with a small area. As a result, the area-throughput efficiency is at least 1.4 times higher than that of state-of-the-art designs operating above 1 Gbps.
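For orientation, the standard fast-SSC node decisions that the combined nodes build on can be summarized as in the sketch below; this illustrates the well-known rate-0, rate-1, repetition, and SPC shortcuts, not the paper's combined/other nodes.

```python
import numpy as np

def rate0_node(llrs):
    """All-frozen node: every decoded bit is zero."""
    return np.zeros(len(llrs), dtype=int)

def rate1_node(llrs):
    """All-information node: hard decision on each LLR."""
    return (np.asarray(llrs) < 0).astype(int)

def rep_node(llrs):
    """Repetition node (one information bit): decide from the LLR sum."""
    return np.full(len(llrs), int(np.sum(llrs) < 0), dtype=int)

def spc_node(llrs):
    """Single-parity-check node: hard decisions, then flip the least
    reliable bit if the overall parity is odd."""
    llrs = np.asarray(llrs, dtype=float)
    bits = (llrs < 0).astype(int)
    if bits.sum() % 2 == 1:
        bits[np.argmin(np.abs(llrs))] ^= 1
    return bits
```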


18.
This paper presents the design and implementation of a video decoder capable of real-time decoding of MPEG-2 coded streams for high-definition television (HDTV). The design employs a large number of FPGAs, together with parallel processing techniques and pipelined operation for high-speed processing, and investigates solutions to problems specific to parallel processing, such as motion-compensation boundary violations. The paper describes the overall structure of the decoder, the composition of its main circuits, and the implementation of the complete decoding process.

19.
Algebraic soft-decision decoding (ASD) of Reed–Solomon (RS) codes provides higher coding gain than conventional hard-decision decoding (HDD) but involves high computational complexity. Among the existing ASD methods, Low-Complexity Chase (LCC) decoding has the lowest implementation cost. LCC decoding is based on generating 2^η test vectors, where the η least reliable symbols are selected and either the hard decision or the second most reliable decision is employed for each of them. Previous algorithms for LCC decoders are based on interpolation and re-encoding techniques. On the other hand, HDD algorithms such as the Berlekamp–Massey (BM) algorithm or the Euclidean algorithm, despite their low computational complexity, are not considered suitable for LCC decoding. In this paper, we present a new approach to LCC decoding based on one of these HDD algorithms, the inversion-less Berlekamp–Massey (iBM) algorithm, in which test vectors are selected for correction during decoding when hard-decision decoding fails. The proposed architecture, when applied to an RS(255, 239) code with η = 3, saves 20.5% and 2% of area compared with an LCC decoder with factorization and a factorization-free decoder, respectively. In both cases, the latency is reduced by 34.5%, which corresponds to the same percentage increase in throughput, since the critical path is the same in all competing architectures. Thus, an improvement of at least 56% in the area-delay product is obtained compared with previous works. A complete RS(255, 239) LCC decoder with η = 3 has been coded in VHDL and synthesized both for a Virtex-5 FPGA device and with the SAED 90 nm standard cell library, yielding decoding rates of 710 Mbps and 4.2 Gbps and areas of 2527 slices and 0.36 mm², respectively.
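The 2^η Chase test vectors referred to above can be enumerated as in the sketch below: each of the η least reliable symbols takes either its hard decision or its second most reliable decision. This is an editorial illustration of the test-vector set only, not the iBM-based selection architecture.

```python
from itertools import product

def lcc_test_vectors(hard_decisions, second_best, reliabilities, eta=3):
    """Enumerate the 2**eta LCC test vectors: only the eta least reliable
    symbol positions alternate between the hard decision and the second
    most reliable decision."""
    n = len(hard_decisions)
    weak = sorted(range(n), key=lambda i: reliabilities[i])[:eta]  # least reliable positions
    vectors = []
    for flips in product([0, 1], repeat=eta):
        v = list(hard_decisions)
        for bit, idx in zip(flips, weak):
            if bit:
                v[idx] = second_best[idx]
        vectors.append(v)
    return vectors            # each candidate is then given to hard-decision decoding
```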

20.
The recently discovered polar codes are seen as a major breakthrough in coding theory; they provably achieve the theoretical capacity of discrete memoryless channels using the low-complexity successive cancellation (SC) decoding algorithm. Motivated by recent developments in polar coding theory, we propose a family of efficient hardware implementations for SC polar decoders. We show that such decoders can be implemented with O(N) processing elements and O(N) memory elements. Furthermore, we show that SC decoding can be implemented in the logarithmic domain, thereby eliminating costly multiplication and division operations and greatly reducing the complexity of each processing element. We also present a detailed architecture for an SC decoder and provide logic synthesis results confirming the linear complexity growth of the decoder as the code length increases.
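In the log domain, the two SC processing-element updates reduce to the familiar f and g functions; a minimal sketch is given below, using the common min-sum approximation for f, which is an assumption rather than necessarily this paper's exact formulation.

```python
def f_node(a, b):
    """Upper-branch LLR update of SC decoding: min-sum approximation of
    ln((1 + e^(a+b)) / (e^a + e^b))."""
    sign = 1.0 if (a >= 0) == (b >= 0) else -1.0
    return sign * min(abs(a), abs(b))

def g_node(a, b, u_hat):
    """Lower-branch LLR update, given the already-decoded partial-sum bit
    u_hat from the upper branch."""
    return b + (1 - 2 * u_hat) * a
```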
