
1.

Three-Parallel Reed-Solomon Decoder Using S-DCME for High-Speed Communications

Jae Do Lee Myung Hoon Sunwoo 《Journal of Signal Processing Systems》2012,66(1):15-24

This paper proposes a high-speed and area-efficient three-parallel Reed-Solomon (RS) decoder using the simplified degree computationless modified Euclid (S-DCME) algorithm for the key equation solver (KES) block. To achieve a high throughput rate, the inner signals, such as the syndrome, error locator and error value polynomials, are computed in parallel. In addition, the key equations are solved by using the S-DCME algorithm to reduce the hardware complexity. To handle the many problems caused by applying the S-DCME algorithm to the KES block, we modify the architectures of some of the blocks in the three-parallel RS decoder. The proposed RS architecture can reduce the hardware complexity by about 80% with respect to the KES block. In addition, the proposed RS architecture has an approximately 25% shorter latency than the conventional parallel RS architectures.

2.

VLSI Design of an Area-Optimized RS Codec

尧勇仕 顾晓峰 于宗光 《微电子学》 (Microelectronics) 2008,38(6)

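This paper presents the VLSI design of an area-optimized RS(204, 188) codec for Digital Video Broadcasting (DVB) systems. The design takes full account of the characteristics of DVB systems, adopts a three-stage pipeline structure with coordinated and optimized hardware and software, and uses an improved Berlekamp-Massey iterative algorithm. It effectively reduces the area of the RS codec and is suitable for high-definition digital TV chips.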
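As a small worked example of the code parameters named above, the sketch below computes the parity-symbol count, the symbol-error-correcting capability t = floor((n - k) / 2), and the code rate of an RS(n, k) code. The function name `rs_parameters` is hypothetical and is not taken from the paper; only the standard relation between n, k and t is used.

```python
def rs_parameters(n: int, k: int) -> dict:
    """Basic parameters of an RS(n, k) code (illustrative helper, not from the paper)."""
    if not 0 < k < n:
        raise ValueError("require 0 < k < n")
    parity = n - k                 # number of parity (check) symbols
    return {
        "parity_symbols": parity,
        "t": parity // 2,          # correctable symbol errors per codeword
        "rate": k / n,             # fraction of each codeword that is payload
    }


if __name__ == "__main__":
    # RS(204, 188) is the code named in the abstract above.
    print(rs_parameters(204, 188))
```

Under these relations RS(204, 188) carries 16 parity symbols and corrects up to 8 symbol errors per codeword.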

3.

New degree computationless modified Euclid algorithm and architecture for Reed-Solomon decoder (Cited by: 3; self-citations: 0; citations by others: 3)

Baek J.H. Sunwoo M.H. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2006,14(8):915-920

This paper proposes a new degree computationless modified Euclid (DCME) algorithm and its dedicated architecture for the Reed-Solomon (RS) decoder. This architecture has low hardware complexity compared with conventional modified Euclid (ME) architectures, since it can completely remove the degree computation and comparison circuits. The architecture employing a systolic array requires only the latency of 2t clock cycles to solve the key equation without initial latency. In addition, the DCME architecture using 3t+2 basic cells has regularity and scalability since it uses only one processing element. Hence, the proposed DCME architecture provides short-latency and low-cost RS decoding. The DCME architecture has been synthesized using the 0.25-μm Faraday CMOS standard cell library and operates at 200 MHz. The gate count of the DCME architecture is 21,760. Hence, the RS decoder using the proposed DCME architecture can reduce the total gate count by at least 23% and the total latency by at least 10% compared with conventional ME decoders.
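The two scaling figures quoted in this abstract (3t+2 basic cells and a key-equation latency of 2t clock cycles) can be evaluated for a concrete t. The sketch below is only that arithmetic; the helper names are hypothetical, and the choice t = 8 is an assumption made for illustration, since the abstract does not say which code was synthesized.

```python
def dcme_basic_cells(t: int) -> int:
    """Number of basic cells quoted in the abstract: 3t + 2."""
    return 3 * t + 2


def dcme_kes_latency_cycles(t: int) -> int:
    """Key-equation-solver latency quoted in the abstract: 2t clock cycles."""
    return 2 * t


if __name__ == "__main__":
    t = 8  # assumed for illustration only
    print(dcme_basic_cells(t), dcme_kes_latency_cycles(t))
```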

4.

Fast factorization architecture in soft-decision Reed-Solomon decoding

Xinmiao Zhang Parhi K.K. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2005,13(4):413-426

Reed-Solomon (RS) codes are among the most widely utilized block error-correcting codes in modern communication and computer systems. Compared to its hard-decision counterpart, soft-decision decoding offers considerably higher error-correcting capability. The recent development of soft-decision RS decoding algorithms makes their hardware implementations feasible. Among these algorithms, the Koetter-Vardy (KV) algorithm can achieve substantial coding gain for high-rate RS codes, while maintaining a polynomial complexity with respect to the code length. In the KV algorithm, the factorization step can consume a major part of the decoding latency. A novel architecture based on root-order prediction is proposed in this paper to speed up the factorization step. As a result, the time-consuming exhaustive-search-based root computation in each iteration level, except the first one, of the factorization step is circumvented with more than 99% probability. Using the proposed architecture, a speedup of 141% can be achieved over prior efforts for a (255, 239) RS code, while the area consumption is reduced to 31.4%.

5.

A VLSI Architecture for Efficient Implementation of Motion Estimation Algorithms

舒清明 徐葭生 《电子学报》 (Acta Electronica Sinica) 1995,23(5):12-16

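This paper proposes a new low-latency, high-throughput, programmable VLSI tree architecture that efficiently implements the FSA and TSSA motion estimation algorithms. The architecture needs one third fewer processing elements (PEs) than other tree architectures, and the delay of each PE is halved. A distinctive ME window buffer structure greatly reduces the I/O bandwidth and the number of I/O pins, and an interleaved pipelining technique allows hardware utilization to reach 100%. These features make the architecture well suited to VLSI implementation.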

6.

VLSI Structures for Viterbi Receivers: Part I--General Theory and Applications

Gulak P. Shwedyk E. 《Selected Areas in Communications, IEEE Journal on》1986,4(1):142-154

A taxonomy of VLSI grid model layouts is presented for the implementation of certain types of digital communication receivers based on the Viterbi algorithm. We deal principally with networks of many simple processors connected to perform the Viterbi algorithm in a highly parallel way. Two interconnection patterns of interest are the "shuffle-exchange" and the "cube-connected cycles." The results are generally applicable to the development of area-efficient VLSI circuits for decoding: convolutional codes, coded modulation with multilevel/phase signals, punctured convolutional codes, correlatively encoded MSK signals and for maximum likelihood sequence estimation of M-ary signals on intersymbol interference channels. In a companion paper, we elaborate on how the concepts presented here can be applied to the problem of building encoded MSK Viterbi receivers. Lower bounds are established on the product (chip area) × (baud rate)^-2 and on the energy consumption that any VLSI implementation of the Viterbi algorithm must obey, regardless of the architecture employed or the intended application.

7.

Efficient VLSI Architecture for Soft-Decision Decoding of Reed–Solomon Codes

《IEEE transactions on circuits and systems. I, Regular papers》2008,55(10):3050-3062

Reed–Solomon (RS) codes have very broad applications in digital communication and storage systems. The recently developed algebraic soft-decision decoding (ASD) algorithms of RS codes can achieve substantial coding gain with polynomial complexity. Among the ASD algorithms with practical multiplicity assignment schemes, the bit-level generalized minimum distance (BGMD) decoding algorithm can achieve similar or higher coding gain with lower complexity. ASD algorithms consist of two major steps: the interpolation and the factorization. In this paper, novel architectures for both steps are proposed for the BGMD decoder. The interpolation architecture is based on the newly proposed Lee-O'Sullivan (LO) algorithm. By exploiting the characteristics of the LO algorithm and the multiplicity assignment scheme in the BGMD decoder, the proposed interpolation architecture for a (255, 239) RS code can achieve 25% higher efficiency in terms of speed/area ratio than prior efforts. Root computation over finite fields and polynomial updating are the two main steps of the factorization. A low-latency and prediction-free scheme is introduced in this paper for the root computation in the BGMD decoder. In addition, novel coefficient storage schemes and parallel processing architectures are developed to reduce the latency of the polynomial updating. The proposed factorization architecture is 126% more efficient than the previous direct root computation factorization architecture.

8.

VLSI Design of Pipelined RS-PC Decoding for DVD Applications (Cited by: 2; self-citations: 0; citations by others: 2)

周云明 刘政林 于宝东 邹雪城 《电视技术》 (Video Engineering) 2003,(9):59-61

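For DVD data error correction, a fully pipelined RS-PC decoder is designed and implemented. A decomposed inversionless BM (Berlekamp-Massey) algorithm and systolic timing control realize three-stage pipelined processing in the RS decoder, while independent row and column buffers and error-correction decoders realize two-stage pipelined row and column correction. The RS-PC decoder achieves a very high processing speed: when row and column correction is performed without iteration, the data rate reaches one byte per clock cycle.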

9.

An Efficient VLSI Architecture for Nonbinary LDPC Decoders

Lin J. Sha J. Wang Z. Li L. 《Circuits and Systems II: Express Briefs, IEEE Transactions on》2010,57(1):51-55

Low-density parity-check (LDPC) codes constructed over the Galois field GF(q), which are also called nonbinary LDPC codes, are an extension of binary LDPC codes with significantly better performance. Although various kinds of low-complexity quasi-optimal iterative decoding algorithms have been proposed, the VLSI implementation of nonbinary LDPC decoders has rarely been discussed due to their hardware unfriendly properties. In this brief, an efficient selective computation algorithm, which totally avoids the sorting process, is proposed for Min–Max decoding. In addition, an efficient VLSI architecture for a nonbinary Min–Max decoder is presented. The synthesis results are given to demonstrate the efficiency of the proposed techniques.

10.

A Novel Low-Cost Multi-Mode Reed Solomon Decoder Design Based on Peterson-Gorenstein-Zierler Algorithm

Huai-Yi Hsu Sheng-Feng Wang An-Yeu Wu 《The Journal of VLSI Signal Processing》2003,34(3):251-259

Reed-Solomon (RS) codes play an important role in providing error protection and data integrity. Among various Reed-Solomon decoding algorithms, the Peterson-Gorenstein-Zierler (PGZ) algorithm in general has the least computational complexity for small t values. However, unlike the iterative approaches (e.g., Berlekamp-Massey and Euclidean algorithms), it will encounter divide-by-zero problems in solving multiple t values. In this paper, we propose a multi-mode hardware architecture for error numbers ranging from zero to three. We first propose a cost-down technique to reduce the hardware complexity of a t = 3 decoder. A Finite-field Inversion (FFI) elimination scheme is also proposed in our PGZ kernel. Next, we perform an algorithmic-level derivation to identify the configurable feature of our design. With those manipulations, we are able to perform multi-mode RS decoding in one unified VLSI architecture with a very simple control scheme. The very low cost and simple data-path make our design a good choice in small-footprint embedded VLSI systems such as Error Control Coding (ECC) in memory/storage systems.
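To make the direct-solution idea and the divide-by-zero hazard concrete, here is a minimal sketch of the single-error (t = 1) case only. It is not the paper's multi-mode architecture. The assumptions are: arithmetic over GF(2^4) with primitive polynomial x^4 + x + 1, syndromes evaluated at α and α^2 so that S1 = e·α^j and S2 = e·α^(2j) for an error of value e at position j, and hypothetical helper names throughout.

```python
PRIM_POLY = 0b10011          # x^4 + x + 1 (assumed field polynomial for GF(16))
ORDER = 15                   # multiplicative order of alpha in GF(16)

# Exponent and logarithm tables for alpha = x.
EXP = [0] * (2 * ORDER)
LOG = [0] * 16
_x = 1
for _i in range(ORDER):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0b10000:
        _x ^= PRIM_POLY
for _i in range(ORDER, 2 * ORDER):
    EXP[_i] = EXP[_i - ORDER]


def gf_mul(a: int, b: int) -> int:
    """Multiply two GF(16) elements via log/antilog tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def gf_div(a: int, b: int) -> int:
    """Divide in GF(16); division by zero is the hazard mentioned in the abstract."""
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(16)")
    if a == 0:
        return 0
    return EXP[(LOG[a] - LOG[b]) % ORDER]


def pgz_single_error(s1: int, s2: int):
    """Solve the t = 1 case directly: position j = log(S2/S1), value e = S1^2/S2.

    Returns None when both syndromes are zero (no error detected).
    """
    if s1 == 0 and s2 == 0:
        return None
    if s1 == 0 or s2 == 0:
        # A direct solver must detect this case instead of dividing by zero.
        raise ValueError("syndromes inconsistent with a single error")
    position = LOG[gf_div(s2, s1)]
    value = gf_div(gf_mul(s1, s1), s2)
    return position, value


if __name__ == "__main__":
    e, j = 0b0110, 9                       # hypothetical error value and position
    s1 = gf_mul(e, EXP[j])
    s2 = gf_mul(e, EXP[2 * j])
    print(pgz_single_error(s1, s2))
```

The explicit zero checks show why a closed-form solver needs per-case guards, whereas iterative solvers handle a varying number of errors within one procedure.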

11.

Concurrent Error Detection in Reed–Solomon Encoders and Decoders

Cardarilli G.C. Pontarelli S. Re M. Salsano A. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2007,15(7):842-846

Reed-Solomon (RS) codes are widely used to identify and correct errors in transmission and storage systems. When RS codes are used for highly reliable systems, the designer should also take into account the occurrence of faults in the encoder and decoder subsystems. In this paper, self-checking RS encoder and decoder architectures are presented. The RS encoder architecture exploits some properties of the arithmetic operations in GF(2^m). These properties are related to the parity of the binary representation of the elements of the Galois field. In the RS decoder, the implicit redundancy of the received codeword, under suitable assumptions explained in this paper, allows implementing concurrent error detection schemes useful for a wide range of different decoding algorithms with no intervention on the decoder architecture. Moreover, performances in terms of area and delay overhead for the proposed circuits are presented.
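One elementary property of this kind is that the parity of the binary representation is linear under addition in GF(2^m), because addition in a characteristic-2 field is a bitwise XOR. The sketch below checks that property exhaustively for m = 8. Whether this is exactly the property the paper exploits is an assumption; the abstract does not say, and the function names are hypothetical.

```python
def parity(x: int) -> int:
    """Parity (0 or 1) of the number of set bits in x."""
    return bin(x).count("1") & 1


def gf2m_add(a: int, b: int) -> int:
    """Addition in GF(2^m) with a polynomial-basis representation is bitwise XOR."""
    return a ^ b


def parity_is_additive(m: int) -> bool:
    """Check parity(a + b) == parity(a) XOR parity(b) for every pair in GF(2^m)."""
    size = 1 << m
    return all(
        parity(gf2m_add(a, b)) == parity(a) ^ parity(b)
        for a in range(size)
        for b in range(size)
    )


if __name__ == "__main__":
    print(parity_is_additive(8))
```

A checker built on such a property can predict the parity of a sum from the parities of its operands and compare it with the parity of the computed result.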

12.

Architecture Design of Fine Grain Quality Scalable Encoder with CABAC for H.264/AVC Scalable Extension

Tzu-Der Chuang Yu-Jen Chen Yi-Hau Chen Shao-Yi Chien Liang-Gee Chen 《Journal of Signal Processing Systems》2010,60(3):363-375

In addition to coding efficiency, the scalable extension of H.264/AVC provides good functionality for video adaptation in heterogeneous environments. Fine grain scalability (FGS) is a technique to extract video bitstream at the finest quality level under the given bandwidth. In this paper, an architecture of FGS encoder with low external memory bandwidth and low hardware cost is proposed. Up to 99% of bandwidth reduction can be attained by the proposed scan bucket algorithm, early context modeling with context reduction, and first scan pre-encoding. The area-efficient hardware architecture is implemented by layer-wise hardware reuse. In addition, three design strategies for the enhancement layer coder are explored so that external memory bandwidth can be traded off against silicon area. The proposed hardware architecture can encode HDTV 1920×1080 video in real time with two FGS enhancement layers at a 200 MHz working frequency, or HDTV 1280×720 video with three FGS enhancement layers at a 130 MHz working frequency.

13.

Decoding algorithm and architecture for BCH codes under the Lee Metric

Wu Y. Hadjicostis C.N. 《Communications, IEEE Transactions on》2008,56(12):2050-2059

The Lee metric measures the circular distance between two elements in a cyclic group and is particularly appropriate as a measure of distance for data transmission under phase-shift-keying modulation over a white noise channel. In this paper, using newly derived properties on Newton's identities, we initially investigate the Lee distance properties of a class of BCH codes and show that (for an appropriate range of parameters) their minimum Lee distance is at least twice their designed Hamming distance. We then make use of properties of these codes to devise an efficient algebraic decoding algorithm that successfully decodes within the above lower bound of the Lee error-correction capability. Finally, we propose an attractive design for the corresponding VLSI architecture that is only mildly more complex than popular decoder architectures under the Hamming metric; since the proposed architecture can also be used for decoding under the Hamming metric without extra hardware, one can use the proposed architecture to decode under both distance metrics (Lee and Hamming).
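The circular distance described in the first sentence can be written down directly. The sketch below computes the Lee distance between symbols of the cyclic group Z_q and between equal-length words, next to the Hamming distance for comparison. The function names are hypothetical, and nothing here reproduces the paper's decoding algorithm.

```python
from typing import Sequence


def lee_distance(a: int, b: int, q: int) -> int:
    """Circular distance between a and b in Z_q: min(|a - b| mod q, q - |a - b| mod q)."""
    d = (a - b) % q
    return min(d, q - d)


def lee_distance_words(u: Sequence[int], v: Sequence[int], q: int) -> int:
    """Lee distance between two words: the sum of symbol-wise Lee distances."""
    if len(u) != len(v):
        raise ValueError("words must have equal length")
    return sum(lee_distance(a, b, q) for a, b in zip(u, v))


def hamming_distance_words(u: Sequence[int], v: Sequence[int]) -> int:
    """Hamming distance: number of positions in which the words differ."""
    if len(u) != len(v):
        raise ValueError("words must have equal length")
    return sum(a != b for a, b in zip(u, v))


if __name__ == "__main__":
    u, v = [0, 1, 7], [4, 1, 1]
    print(lee_distance_words(u, v, 8), hamming_distance_words(u, v))
```

For 8-PSK-style symbols (q = 8), the symbols 1 and 7 are two steps apart on the circle, so their Lee distance is 2 although they count as a single Hamming error.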

14.

Further Exploring the Strength of Prediction in the Factorization of Soft-Decision Reed–Solomon Decoding

Xinmiao Zhang 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2007,15(7):811-820

Reed-Solomon (RS) codes are among the most widely utilized error-correcting codes in digital communication and storage systems. Among the decoding algorithms of RS codes, the recently developed Koetter-Vardy (KV) soft-decision decoding algorithm can achieve substantial coding gain while having polynomial complexity. One of the major steps of the KV algorithm is the factorization. Each iteration of the factorization mainly consists of root computations over finite fields and polynomial updating. To speed up the factorization step, a fast factorization architecture has been proposed to circumvent the exhaustive-search-based root computation from the second iteration level by using a root-order prediction scheme. Based on this scheme, a partial parallel factorization architecture was proposed to combine the polynomial updating in adjacent iteration levels. However, in both of these architectures, the root computation in the first iteration level is still carried out by exhaustive search, which accounts for a significant part of the overall factorization latency. In this paper, a novel iterative prediction scheme is proposed for the root computation in the first iteration level. The proposed scheme can substantially reduce the latency of the factorization while incurring only negligible area overhead. Applying this scheme to a (255, 239) RS code, speedups of 36% and 46% can be achieved over the fast factorization and partial parallel factorization architectures, respectively.

15.

Low Complexity Decoder Architecture for Low-Density Parity-Check Codes

Daesun Oh Keshab K. Parhi 《Journal of Signal Processing Systems》2009,56(2-3):217-228

In this paper, we propose a low complexity decoder architecture for low-density parity-check (LDPC) codes using a variable quantization scheme as well as an efficient highly-parallel decoding scheme. In the sum-product algorithm for decoding LDPC codes, the finite precision implementations have an important tradeoff between decoding performance and hardware complexity caused by two dominant area-consuming factors: one is the memory for updated messages storage and the other is the look-up table (LUT) for implementation of the nonlinear function Ψ(x). The proposed variable quantization schemes offer a large reduction in the hardware complexities for LUT and memory. Also, an efficient highly-parallel decoder architecture for quasi-cyclic (QC) LDPC codes can be implemented with reduced hardware complexity by using the partially block overlapped decoding scheme, and with minimized power consumption by reducing the total number of memory accesses for updated messages. For (3, 6) QC LDPC codes, our proposed schemes in implementing the highly-parallel decoder architecture reduce the implementation area by 33% for the memory and by approximately 28% for the check node unit and variable node unit computation units, without significant performance degradation. Also, the memory accesses are reduced by 20%.

16.

Area-efficient Reed-Solomon decoder design for optical communications

Bo Yuan Zhongfeng Wang Li Li Minglun Gao Jin Sha Chuan Zhang 《Circuits and Systems II: Express Briefs, IEEE Transactions on》2009,56(6):469-473

A high-speed low-complexity Reed-Solomon (RS) decoder architecture based on the recursive degree computationless modified Euclidean (rDCME) algorithm is presented in this brief. The proposed architecture has very low hardware complexity compared with the conventional modified Euclidean and degree computationless modified Euclidean (DCME) architectures, since it can reduce the degree computation circuitry and replace the conventional systolic architecture that uses many processing elements (PEs) with a recursive architecture using a single PE. A high-throughput data rate is also facilitated by employing a pipelining technique. The proposed rDCME architecture has been designed and implemented using SMIC 0.18-μm CMOS technology. Synthesized results show that the proposed RS (255, 239) decoder requires only about 18 K gates and can operate at 640 MHz to achieve a throughput of 5.1 Gb/s, which meets the requirement of modern high-speed optical communications.
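The quoted throughput is consistent with simple arithmetic if the decoder consumes one 8-bit symbol per clock cycle. That rate is an assumption made here for illustration (RS (255, 239) symbols are 8 bits wide, but the abstract does not state the symbols-per-cycle figure), and the helper name is hypothetical.

```python
def decoder_throughput_gbps(clock_mhz: float, symbol_bits: int,
                            symbols_per_cycle: int = 1) -> float:
    """Raw throughput in Gb/s = clock rate x bits per symbol x symbols per cycle."""
    bits_per_second = clock_mhz * 1e6 * symbol_bits * symbols_per_cycle
    return bits_per_second / 1e9


if __name__ == "__main__":
    # 640 MHz clock, 8-bit symbols, one symbol per cycle (assumed).
    print(decoder_throughput_gbps(640, 8))
```

Under that assumption the result is 5.12 Gb/s, which matches the 5.1 Gb/s figure in the abstract after rounding.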

17.

Efficient Error Control Decoder Architectures for Noncoherent Random Linear Network Coding

Jun Lin Hongmei Xie Zhiyuan Yan 《Journal of Signal Processing Systems》2014,76(2):195-209

Random linear network coding is an efficient technique for disseminating information in networks, but it is highly susceptible to errors. Kötter-Kschischang (KK) codes and Mahdavifar-Vardy (MV) codes are two important families of subspace codes that provide error control in noncoherent random linear network coding. List decoding has been used to decode MV codes beyond half distance. Existing hardware implementations of the rank metric decoder for KK codes suffer from limited throughput, long latency and high area complexity. The interpolation-based list decoding algorithm for MV codes still has high computational complexity, and its feasibility for hardware implementations has not been investigated. In this paper we propose efficient decoder architectures for both KK and MV codes and present their hardware implementations. Two serial architectures are proposed for KK and MV codes, respectively. An unfolded decoder architecture, which offers high throughput, is also proposed for KK codes. The synthesis results show that the proposed architectures for KK codes are much more efficient than rank metric decoder architectures, and demonstrate that the proposed decoder architecture for MV codes is affordable.

18.

High performance, high throughput turbo/SOVA decoder design

Zhongfeng Wang Parhi K.K. 《Communications, IEEE Transactions on》2003,51(4):570-579

Two efficient approaches are proposed to improve the performance of soft-output Viterbi algorithm (SOVA)-based turbo decoders. In the first approach, an easily obtainable variable and a simple mapping function are used to compute a target scaling factor to normalize the extrinsic information output from turbo decoders. An extra coding gain of 0.5 dB can be obtained with additive white Gaussian noise channels. This approach does not introduce extra latency and the hardware overhead is negligible. In the second approach, an adaptive upper bound based on the channel reliability is set for computing the metric difference between competing paths. By combining the two approaches, we show that the new SOVA-based turbo decoders can approach maximum a posteriori probability (MAP)-based turbo decoders within 0.1 dB when the target bit-error rate (BER) is moderately low (e.g., BER < 10^-4 for 1/2 rate codes). Following this, practical implementation issues are discussed and finite precision simulation results are provided. An area-efficient parallel decoding architecture is presented in this paper as an effective approach to design high-throughput turbo/SOVA decoders. With the efficient parallel architecture, multiple times the throughput of a conventional serial decoder can be obtained by increasing the overall hardware by a small percentage. To resolve the problem of multiple memory accesses per cycle for the efficient parallel architecture, a novel two-level hierarchical interleaver architecture is proposed. Simulation results show that the proposed interleaver architecture performs as well as random interleavers, while requiring much less storage of random patterns.

19.

A VLSI Architecture and Algorithm for Lucas–Kanade-Based Optical Flow Computation

《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2010,18(1):29-38

Optical flow computation in vision-based systems demands substantial computational power and storage area. Hence, to enable real-time processing at high resolution, the design of an application-specific system for optical flow becomes essential. In this paper, we propose an efficient VLSI architecture for the accurate computation of the Lucas–Kanade (L-K)-based optical flow. The L-K algorithm is first converted to a scaled fixed-point version, with optimal bit widths, for improving the feasibility of high-speed hardware implementation without much loss in accuracy. The algorithm is mapped onto an efficient VLSI architecture and the data flow exploits the principles of pipelining and parallelism. The optical flow estimation involves several tasks such as Gaussian smoothing, gradient computation, least square matrix calculation, and velocity estimation, which are processed in a pipelined fashion. The proposed architecture was simulated and verified by synthesizing onto a Xilinx Field Programmable Gate Array, where it utilizes less than 40% of system resources while operating at a frequency of 55 MHz. Experimental results on benchmark sequences indicate a 42% improvement in accuracy and a speedup of five times, compared to a recent hardware implementation of the L-K algorithm.

20.

A Flexible LDPC/Turbo Decoder Architecture

Yang Sun Joseph R. Cavallaro 《Journal of Signal Processing Systems》2011,64(1):1-16

Low-density parity-check (LDPC) codes and convolutional Turbo codes are two of the most powerful error correcting codes that are widely used in modern communication systems. In a multi-mode baseband receiver, both LDPC and Turbo decoders may be required. However, the different decoding approaches for LDPC and Turbo codes usually lead to different hardware architectures. In this paper we propose a unified message passing algorithm for LDPC and Turbo codes and introduce a flexible soft-input soft-output (SISO) module to handle LDPC/Turbo decoding. We employ the trellis-based maximum a posteriori (MAP) algorithm as a bridge between LDPC and Turbo codes decoding. We view the LDPC code as a concatenation of n super-codes where each super-code has a simpler trellis structure so that the MAP algorithm can be easily applied to it. We propose a flexible functional unit (FFU) for MAP processing of LDPC and Turbo codes with a low hardware overhead (about 15% area and timing overhead). Based on the FFU, we propose an area-efficient flexible SISO decoder architecture to support LDPC/Turbo codes decoding. Multiple such SISO modules can be embedded into a parallel decoder for higher decoding throughput. As a case study, a flexible LDPC/Turbo decoder has been synthesized on a TSMC 90 nm CMOS technology with a core area of 3.2 mm². The decoder can support IEEE 802.16e LDPC codes, IEEE 802.11n LDPC codes, and 3GPP LTE Turbo codes. Running at 500 MHz clock frequency, the decoder can sustain up to 600 Mbps LDPC decoding or 450 Mbps Turbo decoding.