Similar literature
1.
Turbo decoders inherently have large decoding latency and low throughput due to iterative decoding. To increase the throughput and reduce the latency, high-speed decoding schemes have to be employed. In this paper, following a discussion on basic parallel decoding architectures, the segmented sliding window approach and two other types of area-efficient parallel decoding schemes are proposed. A detailed comparison of storage requirement, number of computation units, and overall decoding latency is provided for various decoding schemes with different levels of parallelism. Hybrid parallel decoding schemes are proposed as an attractive solution for implementations with a very high level of parallelism. To reduce the storage bottleneck for each subdecoder, a modified version of the partial storage of state metrics approach is presented. In general, the new approach achieves a better tradeoff between storage and recomputation. The application of the pipeline-interleaving technique to parallel turbo decoding architectures is also presented. Simulation results demonstrate that the proposed area-efficient parallel decoding schemes do not cause performance degradation.

2.
《Microelectronics Reliability》2014,54(11):2645-2648
Interest in using advanced Error Correction Codes (ECCs) to protect memories and caches is growing. This is because, as process technology scales down, errors become more frequent and also tend to affect multiple bits. For SRAM memories and caches, latency is a limiting factor, and ECCs have to provide low decoding times that in most cases can only be achieved with a parallel decoder. One important issue with parallel decoders is that they typically require a large circuit area. One type of ECC that has been explored for memory protection is Difference Set (DS) codes. In this research note, an optimized parallel decoding scheme for DS codes is presented and evaluated. The results show that the circuit area and the decoding delay are reduced compared to a traditional implementation. In addition, the new scheme enables a reduction in the number of parity check bits, thus reducing the memory size.

3.
Parameter design methods, in general, do not take into account the common occurrence that some of the uncontrollable factors are observable for products and processes during operation and production, respectively. This paper introduces a methodology that facilitates on-line parameter design for products and processes utilizing the extra information available about observable uncontrollable factors. Implementation of the proposed methodology leads to a quality controller that operates in two distinct modes: identification mode and on-line parameter design mode. Identification mode involves establishing a model that relates quality response characteristics to significant controllable and uncontrollable variables. On-line parameter design mode involves optimization of the controllable variables with respect to desired levels of output quality parameters, taking into account the levels of the observable uncontrollable variables. A plasma etching semiconductor manufacturing process is used as a testbed for the proposed intelligent quality controllers. Results reveal that the proposed quality controllers can be used for on-line parameter design of manufacturing processes. Results also reveal that significant improvements in quality (measured in terms of average deviation of process outputs from target) over off-line parameter design approaches are to be expected in production processes with some level of control over uncontrollable variables. Even in the absence of any control over uncontrollable variables, the proposed controllers always perform better than traditional off-line robust parameter design techniques; however, the improvements may not be significant.

4.
Soft-decision-feedback MAP decoders are developed for joint source/channel decoding (JSCD) which uses the residual redundancy in two-dimensional sources. The source redundancy is described by a second-order Markov model which is made available to the receiver for row-by-row decoding, wherein the output for one row is used to aid the decoding of the next row. Performance can be improved by generalizing so as to increase the vertical depth of the decoder. This is called sheet decoding, and entails generalizing trellis decoding of one-dimensional data to trellis decoding of two-dimensional (2-D) data. The proposed soft-decision-feedback sheet decoder is based on the Bahl algorithm, and it is compared to a hard-decision-feedback sheet decoder which is based on the Viterbi algorithm. The method is applied to 3-bit DPCM picture transmission over a binary symmetric channel, and it is found that the soft-decision-feedback decoder with vertical depth V performs approximately as well as the hard-decision-feedback decoder with vertical depth V+1. Because the computational requirement of the decoders depends exponentially on the vertical depth, the soft-decision-feedback decoder offers a significant reduction in complexity. For standard monochrome Lena, at a channel bit error rate of 0.05, the V=1 and V=2 soft-decision-feedback decoder JSCD gains in RSNR are 5.0 and 6.3 dB, respectively.

5.
Reed-Solomon (RS) codes are among the most widely utilized error-correcting codes in digital communication and storage systems. Among the decoding algorithms for RS codes, the recently developed Koetter-Vardy (KV) soft-decision decoding algorithm can achieve substantial coding gain while having polynomial complexity. One of the major steps of the KV algorithm is the factorization. Each iteration of the factorization mainly consists of root computations over finite fields and polynomial updating. To speed up the factorization step, a fast factorization architecture has been proposed to circumvent the exhaustive-search-based root computation from the second iteration level onward by using a root-order prediction scheme. Based on this scheme, a partial parallel factorization architecture was proposed to combine the polynomial updating in adjacent iteration levels. However, in both of these architectures, the root computation in the first iteration level is still carried out by exhaustive search, which accounts for a significant part of the overall factorization latency. In this paper, a novel iterative prediction scheme is proposed for the root computation in the first iteration level. The proposed scheme can substantially reduce the latency of the factorization while incurring only negligible area overhead. Applying this scheme to a (255, 239) RS code, speedups of 36% and 46% can be achieved over the fast factorization and partial parallel factorization architectures, respectively.
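
The exhaustive-search root computation that the abstract above identifies as a latency bottleneck can be illustrated with a minimal sketch. It is not the paper's prediction scheme or architecture; it only shows the baseline of testing every field element. The field GF(2^8) matches a (255, 239) RS code, but the primitive polynomial 0x11D and the helper names `gf_mul`, `poly_eval`, `exhaustive_roots`, and `quadratic_from_roots` are assumptions made for illustration.

```python
# Assumption: GF(2^8) is built from the primitive polynomial
# x^8 + x^4 + x^3 + x^2 + 1 (0x11D), a common but not universal choice.
PRIM_POLY = 0x11D


def gf_mul(a, b):
    """Multiply two GF(2^8) elements: shift-and-add with modular reduction."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^m) is XOR
        a <<= 1
        if a & 0x100:            # degree reached 8: reduce by PRIM_POLY
            a ^= PRIM_POLY
        b >>= 1
    return result


def poly_eval(coeffs, x):
    """Evaluate a polynomial at x by Horner's rule (highest degree first)."""
    acc = 0
    for c in coeffs:
        acc = gf_mul(acc, x) ^ c
    return acc


def exhaustive_roots(coeffs):
    """Return all roots by testing each of the 256 field elements in turn."""
    return [x for x in range(256) if poly_eval(coeffs, x) == 0]


def quadratic_from_roots(r1, r2):
    """Build (x + r1)(x + r2) = x^2 + (r1 + r2) x + r1*r2 over GF(2^8)."""
    return [1, r1 ^ r2, gf_mul(r1, r2)]
```

Every call to `exhaustive_roots` costs one polynomial evaluation per field element, which is why schemes that predict roots instead of searching for them can shorten the factorization latency.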

6.
Jian Wang  Yubai Li  Huan Li 《ETRI Journal》2013,35(5):767-774
In this paper, a novel parallel Viterbi decoding scheme is proposed to decrease the decoding latency and power consumption for the software‐defined radio (SDR) system. It implements a divide‐and‐conquer approach by first dividing a block into a series of subblocks, then performing independent Viterbi decoding for each subsequence, and finally merging the surviving subpaths into the final path. Moreover, a network‐on‐chip‐based SDR platform is used to evaluate the performance of the proposed parallel Viterbi decoding scheme. The experimental results show that the scheme can speed up the Viterbi decoding process without increasing the BER, and that it performs better than the current state‐of‐the‐art methods.

7.
We present a framework for the analysis of the decoding delay in multiview video coding (MVC). We show that in real-time applications, an accurate estimation of the decoding delay is essential to achieve a minimum communication latency. As opposed to single-view codecs, the complexity of the multiview prediction structure and the parallel decoding of several views require a systematic analysis of this decoding delay, which we carry out using graph theory and a model of the decoder hardware architecture. Our framework assumes a decoder implementation on general-purpose multi-core processors with multi-threading capabilities. For this hardware model, we show that frame processing times depend on the computational load of the decoder, and we provide an iterative algorithm to jointly compute frame processing times and decoding delay. Finally, we show that decoding delay analysis can be applied to design decoders with the objective of minimizing the communication latency of the MVC system.

8.
We describe parallel concatenated codes for communication over finite-state binary Markov channels. We present encoder design techniques and decoder processing modifications that utilize the a priori statistics of the channel and show that the resulting codes allow reliable communication at rates which are above the capacity of a memoryless channel with the same stationary bit error probability as the Markov channel. These codes outperform systems based on the traditional approach of using a channel interleaver to create a channel which is assumed to be memoryless. In addition, we introduce a joint estimation/decoding method that allows the estimation of the parameters of the hidden Markov model when they are not known a priori.

9.
Inter-window shuffle (IWS) interleavers are a class of collision-free (CF) interleavers that have been applied to parallel turbo decoding. In this paper, we present modified IWS (M-IWS) interleavers that can further increase turbo decoding throughput at the expense of only slight performance degradation. By deriving the number of M-IWS interleavers, we demonstrate that this number is much smaller than that of IWS interleavers, whereas both have a very simple algebraic representation. Further, it is shown by analysis that under given conditions, the storage requirements of M-IWS interleavers can be reduced to only 368 storage bits for variable interleaving lengths. In order to realize parallel outputs of the on-line interleaving addresses, a low-complexity architecture design of M-IWS interleavers for parallel turbo decoding is proposed, which also supports variable interleaving lengths. Therefore, the M-IWS interleavers are very suitable for the turbo decoder in next-generation communication systems with high data rate and low latency requirements.
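
As a rough illustration of what "collision-free" means for parallel turbo decoding, the sketch below checks whether M subdecoders, each working through its own window of W positions, ever address the same memory bank in the same step. It is a generic check, not the IWS or M-IWS construction from the abstract, and it only tests accesses in interleaved order. The quadratic permutation polynomial (QPP) with K=40, f1=3, f2=10 and the 4-column block interleaver are assumed examples; all function names are hypothetical.

```python
def is_permutation(pi):
    """True if pi is a permutation of 0..len(pi)-1."""
    return sorted(pi) == list(range(len(pi)))


def is_collision_free(pi, window):
    """Check bank conflicts for parallel access with window size W.

    Subdecoder t reads address pi[j + t*W] at step j, and address a is
    assumed to live in memory bank a // W. The interleaver is collision-free
    for this W if, at every step j, all subdecoders hit distinct banks.
    """
    k = len(pi)
    if k % window != 0:
        raise ValueError("window size must divide the interleaver length")
    m = k // window                      # number of parallel subdecoders
    for j in range(window):
        banks = {pi[j + t * window] // window for t in range(m)}
        if len(banks) != m:              # two subdecoders share a bank
            return False
    return True


def qpp(k, f1, f2):
    """Quadratic permutation polynomial pi(x) = (f1*x + f2*x^2) mod k."""
    return [(f1 * x + f2 * x * x) % k for x in range(k)]


def block_interleaver(k, cols):
    """Hypothetical counterexample: write row-wise, read column-wise."""
    rows = k // cols
    return [(x % cols) * rows + x // cols for x in range(k)]
```

With these assumed parameters, `is_collision_free(qpp(40, 3, 10), 10)` holds, while the block interleaver `block_interleaver(40, 4)` is a valid permutation that collides for the same window size.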

10.
This paper considers a class of iterative message-passing decoders for low-density parity-check codes in which the decoder can choose its decoding rule from a set of decoding algorithms at each iteration. Each available decoding algorithm may have different per-iteration computation time and performance. With an appropriate choice of algorithm at each iteration, overall decoding latency can be reduced significantly, compared with standard decoding methods. Such a decoder is called a gear-shift decoder because it changes its decoding rule (shifts gears) in order to guarantee both convergence and maximum decoding speed (minimum decoding latency). Using extrinsic information transfer charts, the problem of finding the optimum (minimum decoding latency) gear-shift decoder is formulated as a computationally tractable dynamic program. The optimum gear-shift decoder is proved to have a decoding threshold equal to or better than the best decoding threshold among those of the available algorithms. In addition to speeding up software decoder implementations, gear-shift decoding can be applied to optimize a pipelined hardware decoder, minimizing hardware cost for a given decoder throughput.

11.
A common joint source-channel (JSC) decoder structure for predictively encoded sources involves first forming a JSC decoding estimate of the prediction residual and then feeding this estimate to a standard predictive decoding (synthesis) filter. In this paper, we demonstrate that in a JSC decoding context, use of this standard filter is suboptimal. In place of the standard filter, we choose the synthesis filter coefficients to give a least-squares (LS) estimate of the original source, based on given training data. For first-order differential pulse-code modulation, this yields as much as 0.65-dB gain in reconstructing first-order Gauss-Markov sources. More gains are achieved with modest additional complexity by increasing the filter order. While performance can also be enhanced by increasing the source's Markov model order and/or the decoder's lookup table memory, complexity grows exponentially in these parameters. For both predictive and nonpredictive coding, our LS approach offers a strategy for increasing the estimation accuracy of JSC decoders while retaining manageable complexity.

12.
Standard VLSI implementations of turbo decoding require substantial memory and incur a long latency, which cannot be tolerated in some applications. A parallel VLSI architecture for low-latency turbo decoding, comprising multiple single-input single-output (SISO) elements, operating jointly on one turbo-coded block, is presented and compared to sequential architectures. A parallel interleaver is essential to process multiple concurrent SISO outputs. A novel parallel interleaver and an algorithm for its design are presented, achieving the same error correction performance as the standard architecture. Latency is reduced up to 20 times and throughput for large blocks is increased up to six-fold relative to sequential decoders, using the same silicon area, and achieving a very high coding gain. The parallel architecture scales favorably: latency and throughput are improved with increased block size and chip area.

13.
Binary Polar Codes (BPCs) offer high efficiency and achieve capacity but suffer from large latency due to Successive-Cancellation List (SCL) decoding. Non-Binary Polar Codes (NBPCs) have been investigated to obtain performance gains and reduce latency through parallel architectures for multi-bit decoding. However, most existing works focus only on Reed-Solomon matrix-based NBPCs and probability-domain non-binary polar decoding, which lack a flexible structure and require a large amount of computation in the decoding process, while little attention has been paid to general non-binary kernel-based NBPCs and Log-Likelihood Ratio (LLR) based decoding methods. In this paper, we consider a scheme of NBPCs with a general structure over GF(2^m). Specifically, we carry out a detailed Monte-Carlo simulation to determine the construction of the proposed NBPCs. For non-binary polar decoding, an SCL decoding based on LLRs is proposed for NBPCs, which can be implemented with non-binary kernels of arbitrary size. Moreover, we propose a Perfect Polarization-Based SCL (PPB-SCL) algorithm based on LLRs to reduce decoding complexity by deriving a new update function of the path metric for NBPCs and eliminating the path splitting process at perfectly polarized (i.e., highly reliable) positions. Simulation results show that the bit error rate of the proposed NBPCs is significantly better than that of BPCs. In addition, the proposed PPB-SCL decoding obtains about a 40% complexity reduction relative to SCL decoding for NBPCs.

14.
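A parallel sub-state hidden Markov model is proposed as the acoustic model for noise-robust speech recognition. The model fuses clean-speech and background-noise information, and each of its states contains sub-states arranged in parallel. On this basis, two recognition decoding strategies for the parallel sub-state hidden Markov model are proposed: sub-state maximum-likelihood decoding and joint-transition sub-state maximum-likelihood decoding. Experimental results show that the acoustic model and its decoding strategies achieve good robust recognition performance under various types of noise.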

15.
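To design an efficient LDPC decoder, a new decoding algorithm, the overlapped partially parallel decoding algorithm, is introduced. It combines the regularity of the parity-check matrix H of quasi-cyclic LDPC codes, the fact that the multiplicatively corrected min-sum decoding algorithm does not need to estimate channel quality, and the low implementation complexity of partially parallel decoding. Compared with a BP decoding algorithm that uses a partially parallel structure, this algorithm not only reduces hardware implementation complexity and storage overhead but also increases decoder throughput.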

16.
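This paper proposes decomposing the Markov random field model of the residual redundancy left after image coding into Markov chains in four directions, and performing joint source-channel decoding by combining the simplified model with the soft output of low-density parity-check (LDPC) decoding. The correlations that exist simultaneously in multiple directions of the decomposed source are regarded as a special, naturally occurring form of channel coding, and the source symbols are decoded both serially and in parallel using the forward-backward algorithm, the sum-product algorithm, and the soft output of the channel decoder. Simulation experiments show that, compared with conventional joint decoding algorithms based on the Markov random field model, the proposed joint decoding algorithm reduces complexity while improving the peak signal-to-noise ratio of the reconstructed image.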

17.
The main problem with the hardware implementation of turbo codes is the lack of parallelism in the MAP-based decoding algorithm. This paper proposes to overcome this problem by using a new family of turbo codes called Multiple Slice Turbo Codes. This family is based on two ideas: the encoding of each dimension with P independent tail-biting codes and a constrained interleaver structure that allows the parallel decoding of the P independent codewords in each dimension. The optimization of the interleaver is described. A high degree of parallelism is obtained with equivalent or better performance than the DVB-RCS turbo code. For very high throughput applications, the parallel architecture decreases both decoding latency and hardware complexity compared to the classical serial architecture, which requires memory duplication.

18.
In this letter, we present an improved index-based a-posteriori probability (APP) decoding approach for the error-resilient transmission of packetized variable-length encoded Markov sources. The proposed algorithm is based on a novel two-dimensional (2D) state representation which leads to a three-dimensional trellis with unique state transitions. APP decoding on this trellis is realized by employing a 2D version of the BCJR algorithm where all available source statistics can be fully exploited in the source decoder. When channel codes are additionally used, the proposed approach leads to an increased error-correction performance compared to a one-dimensional state representation.

19.
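A parallel decoding scheme for turbo codes and the corresponding parallel-structure interleaver   (Cited by: 1; self-citations: 0; citations by others: 1)
The high decoding delay introduced by the recursive computations of MAP-based turbo decoding limits the use of turbo codes in high-rate data transmission. To solve this problem, this paper presents a parallel decoding method that reduces the decoding delay. Implementing the parallel processing scheme requires suitable interleaving to avoid data conflicts when the two decoders read and write extrinsic information. After analyzing whether arbitrary collision-free interleaving patterns can exist, the paper gives a method for designing S-random interleavers suited to the parallel processing scheme. Simulations verify the bit error performance of the parallel decoding scheme.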

20.
In this paper, a novel K-nested layered look-ahead method and its corresponding architecture, which combine K trellis steps into one trellis step (where K is the encoder constraint length), are proposed for implementing low-latency, high-throughput Viterbi decoders. The proposed method guarantees parallel paths between any two trellis states in the look-ahead trellises and distributes the add-compare-select (ACS) computations to all trellis layers. It leads to a regular and simple architecture for the Viterbi decoding algorithm. The look-ahead ACS computation latency of the proposed method increases logarithmically with respect to the look-ahead step (M) divided by the encoder constraint length (K), as opposed to linearly as in prior work. For a 4-state (i.e., K=3) convolutional code, the decoding latency of the Viterbi decoder using the proposed method is reduced by 84%, at the expense of about a 22% increase in hardware complexity, compared with the conventional M-step look-ahead method with M=48 (where M is also the level of parallelism). The main advantage of the proposed design is that it has the least latency among all known look-ahead Viterbi decoders for a given level of parallelism.
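
For readers unfamiliar with the add-compare-select (ACS) recursion mentioned above, the following is a minimal sketch of a conventional one-step-per-symbol, hard-decision Viterbi decoder for a 4-state (K=3) convolutional code. It is not the K-nested layered look-ahead method of the abstract. The generator polynomials (7, 5) in octal, the zero-tail termination, and all function names are assumptions chosen for illustration.

```python
K = 3                      # constraint length, giving 2^(K-1) = 4 states
G = (0b111, 0b101)         # assumed generators (7, 5) octal, rate 1/2
N_STATES = 1 << (K - 1)


def parity(x):
    """Return the XOR of all bits of x."""
    return bin(x).count("1") & 1


def branch(state, bit):
    """Return (next_state, output_symbol) for one trellis branch."""
    reg = (bit << (K - 1)) | state           # newest bit in the top position
    return reg >> 1, tuple(parity(reg & g) for g in G)


def encode(bits):
    """Encode bits, appending K-1 zeros so the trellis ends in state 0."""
    state, out = 0, []
    for b in list(bits) + [0] * (K - 1):
        state, symbol = branch(state, b)
        out.append(symbol)
    return out


def viterbi_decode(symbols):
    """Hard-decision Viterbi decoding with Hamming branch metrics."""
    inf = float("inf")
    metrics = [0] + [inf] * (N_STATES - 1)   # encoder starts in state 0
    history = []
    for rx in symbols:
        new_metrics = [inf] * N_STATES
        decisions = [None] * N_STATES
        for state in range(N_STATES):
            if metrics[state] == inf:
                continue                     # state not yet reachable
            for bit in (0, 1):
                nxt, out = branch(state, bit)
                # Add: path metric plus Hamming distance of this branch.
                cand = metrics[state] + sum(a != b for a, b in zip(out, rx))
                # Compare and select: keep the better path into nxt.
                if cand < new_metrics[nxt]:
                    new_metrics[nxt] = cand
                    decisions[nxt] = (state, bit)
        metrics = new_metrics
        history.append(decisions)
    # Trace back from state 0, where the zero tail forces the path to end.
    state, bits = 0, []
    for decisions in reversed(history):
        state, bit = decisions[state]
        bits.append(bit)
    bits.reverse()
    return bits[:len(bits) - (K - 1)]        # drop the tail bits
```

In this sketch each received symbol triggers one ACS update per state, and each update depends on the previous one; that serial dependency is what look-ahead methods restructure to cut latency.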
