期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Parallel high-throughput limited search trellis decoder VLSI design

Fei Sun Tong Zhang 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2005,13(9):1013-1022

Limited search trellis decoding algorithms have great potentials of realizing low power due to their largely reduced computational complexity compared with the widely used Viterbi algorithm. However, because of the lack of operational parallelism and regularity in their original formulations, the limited search decoding algorithms have been traditionally ruled out for applications demanding very high throughput. We believe that, through appropriate algorithm and hardware architecture co-design, certain limited search trellis decoding algorithms can become serious competitors to the Viterbi algorithm for high-throughout applications. Focusing on the well-known T-algorithm, this paper presents techniques at the algorithm and VLSI architecture levels to design fully parallel T-algorithm limited search trellis decoders. We first develop a modified T-algorithm, called SPEC-T, to improve the algorithmic parallelism. Then, based on the conventional state-parallel register exchange Viterbi decoder, we develop a parallel SPEC-T decoder architecture that can effectively transform the reduced computational complexity at the algorithm level to the reduced switching activities in the hardware. We demonstrate the effectiveness of the SPEC-T design solution in the context of convolutional code decoding. Compared with state-parallel register exchange Viterbi decoders, the SPEC-T convolutional code decoders can achieve almost the same throughput and decoding performance, while realizing up to 56% power savings. For the first time, this work provides an approach to exploit the low power potential of the T-algorithm in very high throughput applications. 相似文献

2.

A 100-Mb/s 2.8-V CMOS current-mode analog Viterbi decoder

Demosthenous A. Taylor J. 《Solid-State Circuits, IEEE Journal of》2002,37(7):904-910

This paper describes a 4-state rate-1/2 analog convolutional decoder fabricated in 0.8-μm CMOS technology. Although analog implementations have been described in the literature, this decoder is the first to be reported realizing the add-compare-select section entirely with current-mode analog circuits. It operates at data rates up to 115 Mb/s (channel rate 230 Mb/s) and consumes 39 mW at that rate from a single 2.8-V power supply. At a rate of 100 Mb/s, the power consumption per trellis state is about 1/3 that of a comparable digital system. In addition, at 50 Mb/s (the only rate at which comparative data were available), the power consumption per trellis state is similarly about 1/3 that of the best competing analog realization (i.e., excluding, for example, PR4 detectors which use a simplified form of the Viterbi algorithm). The chip contains 3.7 K transistors of which less than 1 K are used in the analog part of the decoder. The die has a core area of 1 mm², of which about 1/3 contains the analog section. The measured performance is close to that of an ideal Viterbi decoder with infinite quantization. In addition, a technique is described which extends the application of the circuits to decoders with a larger number of states. A typical example is a 64-state decoder for use in high-speed satellite communications 相似文献

3.

数字流水Viterbi译码器的VLSI设计

冯昭志黄载禄《通信学报》1995,16(3):50-55

本文介绍了高速数字流水Ｖｉｔｅｒｂｉ译码器的ＶＬＳＩ设计。在符号４值系统的基础上，给出Ｖｉｔｅｒｂｉ算法的新的功能分解公式，并介绍了用于译码器实现的两个重要的快速运算部件ＡＤＤ和ＭＡＸ的原理及其现场可编程（序）门阵列（ＦＰＧＡ）实现。文中详细讨论了译码器的ＶＬＳＩ结构、设计和性能分析。本文给出的Ｖｉｔｅｒｂｉ译码器可塑性强，并具有高度的并行性和很高的数据吞吐率。相似文献

4.

Implementation of scalable power and area efficient high-throughputViterbi decoders

Gemmeke T. Gansen M. Noll T.G. 《Solid-State Circuits, IEEE Journal of》2002,37(7):941-948

Today's data reconstruction in digital communication systems requires designs of highest throughput rate at low power. The Viterbi algorithm is a key element in such digital signal processing applications. The nonlinear and recursive nature of the Viterbi decoder makes its high-speed implementation challenging. Several promising approaches to achieve either high throughput or low power have been proposed in the past. A combination of these is developed in this paper. Additional new concepts allow building a signal-flow graph suitable for the design of high-speed Viterbi decoders with low power. Using a flexible datapath generator facilitates the essential quantitative optimization from architectural down to physical level to fully exploit the low-power and high-speed potential of a given technology. With parameterizable design entry, this datapath generator establishes the basis of a scalable platform-based design library. Altogether, this allows coverage of the range of today's industrial interest in high throughput rates, from 150 Msymbols/s up to 1.2 Gsymbols/s using conventional CMOS logic. The features of two exemplary Viterbi decoder implementations prove the benefit of this physically oriented design methodology in terms of speed and low power, when compared to other leading edge implementations 相似文献

5.

High-performance Viterbi decoder with circularly connected 2-D CNN unilateral cell array

Hyongsuk Kim Son H. Roska T. Chua L.O. 《IEEE transactions on circuits and systems. I, Regular papers》2005,52(10):2208-2218

A very-high-performance Viterbi decoder with a circularly connected two-dimensional analog cellular neural network (CNN) cell array is disclosed. In the proposed Viterbi decoder, the CNN cells with nonlinear unilateral connections are implemented with electronic circuits at nodes on a trellis diagram. The circuits are circularly connected, forming a cylindrical shape so that the cells of the last stage are connected to those of the first stage. Unilateral connections guide the information to flow circularly around the cylindrical surface. Such configuration enables the conceptually infinite length of the trellis diagram to be reduced to a circuit of limited size. The analog circuits does not require any analog-digital converters, which is the major cause of high power consumption and the quantization error. With the parallel analog processing structure, its decoding speed becomes very high. Also, the decoding mechanism using triggering wave of the CNN circuit does not require the path memory. Circuits for the proposed structure have been designed with HSPICE. Features of the proposed Viterbi decoder are compared with those of the conventional digital Viterbi decoder. 相似文献

6.

Low-latency architectures for high-throughput rate Viterbi decoders

Jun Jin Kong Parhi K.K. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2004,12(6):642-651

In this paper, a novel K-nested layered look-ahead method and its corresponding architecture, which combine K-trellis steps into one trellis step (where K is the encoder constraint length), are proposed for implementing low-latency high-throughput rate Viterbi decoders. The proposed method guarantees parallel paths between any two-trellis states in the look-ahead trellises and distributes the add-compare-select (ACS) computations to all trellis layers. It leads to regular and simple architecture for the Viterbi decoding algorithm. The look-ahead ACS computation latency of the proposed method increases logarithmically with respect to the look-ahead step (M) divided by the encoder constraint length (K) as opposed to linearly as in prior work. For a 4-state (i.e., K=3) convolutional code, the decoding latency of the Viterbi decoder using proposed method is reduced by 84%, at the expense of about 22% increase in hardware complexity, compared with conventional M-step look-ahead method with M=48 (where M is also the level of parallelism). The main advantage of our proposed design is that it has the least latency among all known look-ahead Viterbi decoders for a given level of parallelism. 相似文献

7.

一种高速Viterbi译码器的设计与实现 总被引：3，自引：0，他引：3

下载免费PDF全文

李刚黑勇乔树山仇玉林《电子器件》2007,30(5):1886-1889

Viterbi算法是卷积码的最优译码算法.设计并实现了一种高速(3,1,7)Viterbi译码器,该译码器由分支度量单元(BMU)、加比选单元(ACSU)、幸存路径存储单元(SMU)、控制单元(CU)组成.在StratixⅡ FPGA上实现、验证了该Viterbi译码器.验证结果表明,该译码器数据吞吐率达到231Mbit/s,在加性高斯白噪声(AWGN)信道下的误码率十分接近理论仿真值.与同类型Viterbi译码器比较,该译码器具有高速、硬件实现代价低的特点. 相似文献

8.

Design Methodology for a DVB Satellite Receiver ASIC

Martin Vaupel Uwe Lambrette Herbert Dawid Olaf Joeressen Stefan Bitterlich Heinrich Meyr Focko Frieling Karsten Müller Götz Kluge 《Design Automation for Embedded Systems》1998,3(4):255-290

This contribution describes design methodology and implementation of a single-chip timing and carrier synchronizer and channel decoder for digital video broadcasting over satellite (DVB-S). The device consists of an A /D converter with AGC, timing and carrier synchronizer with matched filter, Viterbi decoder including node synchronization, byte and frame synchronizer, convolutional de-interleaver, Reed Solomon decoder, and a descrambler.The system was designed in accordance with the DVB specifications. It is able to perform Viterbi decoding at data rates up to 56 Mbit /s and to sample the analog input values with up to 88 MHz. The chip allows automatic acquisition of the convolutional code rate and the position of the puncturing mask. The symbol synchronization is performed fully digitally by means of interpolation and controlled decimation. Hence, no external analog clock recovery circuit is needed.For algorithm design, system performance evaluation, co-verification of the building blocks, and functional hardware verification an advanced design methodology and the corresponding tool framework are presented which guarantee both short design time and highly reliable results. The chip has been fabricated in a 0.5 µm CMOS technology with three metal layers. A die photograph is included. 相似文献

9.

Turbo Decoder Using Contention-Free Interleaver and Parallel Architecture 总被引：1，自引：0，他引：1

Wong C.-C. Lai M.-W. Lin C.-C. Chang H.-C. Lee C.-Y. 《Solid-State Circuits, IEEE Journal of》2010,45(2):422-432

This paper introduces a turbo decoder that utilizes multiple soft-in/soft-out (SISO) decoders to decode one codeword. In addition, each SISO decoder is modified to allow simultaneous execution over multiple successive trellis stages. The design issues related to the architecture with parallel high-radix SISO decoders are discussed. First, a contention-free interleaver for the hybrid parallelism is presented to overcome the complicated collision problem as well as reduce interconnection network complexity. Second, two techniques for the high-speed add-compare-select (ACS) circuits are given to lessen area overhead of the SISO decoder. Third, a modification of the processing schedule is made for higher operating efficiency. Two designs with parallel architecture have been implemented. The first design with 32 SISO decoders, each of which processes 2 symbols per cycle, has 160 Mb/s and 0.22 nJ/b/iter after measurement. The second design uses 16 SISO decoders to deal with 4 symbols per cycle and achieves 100% efficiency, leading to 1000 Mb/s and 0.15 nJ/b/iter in post-layout simulation. 相似文献

10.

CMOS analog MAP decoder for (8,4) Hamming code

Winstead C. Jie Dai Shuhuan Yu Myers C. Harrison R.R. Schlegel C. 《Solid-State Circuits, IEEE Journal of》2004,39(1):122-131

Design and test results for a fully integrated translinear tail-biting MAP error-control decoder are presented. Decoder designs have been reported for various applications which make use of analog computation, mostly for Viterbi-style decoders. MAP decoders are more complex, and are necessary components of powerful iterative decoding systems such as turbo codes. Analog circuits may require less area and power than digital implementations in high-speed iterative applications. Our (8, 4) Hamming decoder, implemented in an AMI 0.5-/spl mu/m process, is the first functioning CMOS analog MAP decoder. While designed to operate in subthreshold, the decoder also functions above threshold with a small performance penalty. The chip has been tested at bit rates up to 2 Mb/s, and simulations indicate a top speed of about 10 Mb/s in strong inversion. The decoder circuit size is 0.82 mm/sup 2/, and typical power consumption is 1 mW at 1 Mb/s. 相似文献

11.

General approach to implementing analogue Viterbi decoders

Shakiba M.H. Johns D.A. Martin K.W. 《Electronics letters》1994,30(22):1823-1824

Commercial realisations of analogue Viterbi decoder are available for magnetic read channels. However, these implementations are restricted to a particular coding scheme. The authors present a general approach to the implementation of analogue Viterbi decoders having arbitrary coding schemes. The proposed method exploits the ability of simple analogue circuits to perform the required mathematical functions. Simulation results are presented for a 0.8 μm BiCMOS process 相似文献

12.

A 0.18-$muhbox m$CMOS Analog Min-Sum Iterative Decoder for a (32,8) Low-Density Parity-Check (LDPC) Code

Hemati S. Banihashemi A.H. Plett C. 《Solid-State Circuits, IEEE Journal of》2006,41(11):2531-2540

Current-mode circuits are presented for implementing analog min-sum (MS) iterative decoders. These decoders are used to efficiently decode the best known error correcting codes such as low-density parity-check (LDPC) codes and turbo codes. The proposed circuits are devised based on current mirrors, and thus, in any fabrication technology that accurate current mirrors can be designed, analog MS decoders can be implemented. The functionality of the proposed circuits is verified by implementing an analog MS decoder for a (32,8) LDPC code in a 0.18-mum CMOS technology. This decoder is the first reported analog MS decoder. For low signal to noise ratios where the circuit imperfections are dominated by the noise of the channel, the measured error correcting performance of this chip in steady-state condition surpasses that of the conventional floating-point discrete-time synchronous MS decoder. When data throughput is 6 Mb/s, loss in the coding gain compared to the conventional MS decoder at BER of 10^-3 is about 0.3 dB and power consumption is about 5 mW. This is the first time that an analog decoder has been successfully tested for an LDPC code, though a short one 相似文献

13.

一种维特比译码器的矩阵实现方案

彭万权伍小兵张承畅张丽《电路与系统学报》2012,17(3):115-120

本文针对(2,1,l)卷积码提出一种维特比矩阵译码算法,通过引入整形、合并和动态选择等辅助模块,实现了所有环节的矩阵处理,构建出具有单一结构的并行译码器。由于只需要更改一部分模块的内部参数便可获得不同卷积码译码器,因此非常有利于分析和设计。仿真实验表明,在运算量更少的情况下,矩阵译码器可以取得接近最优的译码性能。相似文献

14.

Design of a 20-mb/s 256-state Viterbi decoder 总被引：1，自引：0，他引：1

Xun Liu Papaefthymiou M.C. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2003,11(6):965-975

The design of high-throughput large-state Viterbi decoders relies on the use of multiple arithmetic units. The global communication channels among these parallel processors often consist of long interconnect wires, resulting in large area and high power consumption. In this paper, we propose a data transfer oriented design methodology to implement a low-power 256-state rate-1/3 Viterbi decoder. Our architectural level scheme uses operation partitioning, packing, and scheduling to analyze and optimize interconnect effects in early design stages. In comparison with other published Viterbi decoders, our approach reduces the global data transfers by up to 75% and decreases the amount of global buses by up to 48%, while enabling the use of deeply pipelined datapaths with no data forwarding. In the register-transfer level (RTL) implementation, we apply precomputation in conjunction with saturation arithmetic to further reduce power dissipation with provably no coding performance degradation. Designed using a 0.25 /spl mu/m standard cell library, our decoder achieves a throughput of 20 Mb/s in simulation and dissipates only 0.45 W. 相似文献

15.

Semi-Iterative Analog Turbo Decoding

Arzel M. Lahuec C. Seguin F. Gnaedig D. Jezequel M. 《IEEE transactions on circuits and systems. I, Regular papers》2007,54(6):1305-1316

Based on multiple-slice turbo codes, a novel semi-iterative analog turbo decoding algorithm and its corresponding decoder architecture are presented. This work paves the way for integrating flexible analog decoders dealing with frame lengths over thousands of bits. The algorithm benefits from a partially continuous exchange of extrinsic information to improve decoding speed and correction performance. The proposed algorithm and architecture are applied to design an analog decoder for double-binary codes. Taking full advantage of multiple slice codes, the on-chip area is shown to be reduced by ten when compared to a conventional fully parallelized analog slice turbo decoder. The reconfigurable analog core area for frames of 40 bits up to 2432 bits is 37 nm² in a 0.25-mum BiCMOS process. 相似文献

16.

Convergence Speed and Throughput of Analog Decoders

Hemati S. Banihashemi A. H. 《Communications, IEEE Transactions on》2007,55(5):833-836

This letter is concerned with the implementation of iterative decoding algorithms in analog integrated circuits. We study the convergence speed and the throughput of analog decoders for low-density parity-check codes, and show that they depend on the code, the decoding algorithm, the signal-to-noise ratio, and the average time constant of the analog circuit interconnections. However, they are not a function of the variance of the time constants. The analysis presented here can be used for selecting suitable codes and decoding algorithms for analog decoding. Furthermore, it can be used to estimate the throughput of an analog decoder, if the average time constant of the analog circuit is known 相似文献

17.

An improved pipelined MSB-first add-compare select unit structure for Viterbi decoders

Parhi K.K. 《IEEE transactions on circuits and systems. I, Regular papers》2004,51(3):504-511

Convolutional codes are widely used in many communication systems due to their excellent error-control performance. High-speed Viterbi decoders for convolutional codes are of great interest for high-data-rate applications. In this paper, an improved most-significant-bit (MSB) -first bit-level pipelined add-compare select (ACS) unit structure is proposed. The ACS unit is the main bottleneck on the decoding speed of a Viterbi decoder. By balancing the settling time of different paths in the ACS unit, the length of the critical path is reduced as close as possible to the iteration bound in the ACS unit. With the proposed retimed structure, it is possible to decrease the critical path of the ACS unit by 12% to 15% compared with the conventional MSB-first structures. This reduction in critical path can reduce the level of parallelism (and area) required for a very high-speed Viterbi decoder. 相似文献

18.

高速Viterbi译码器的FPGA实现

张健刘小林匡镜明王华《电讯技术》2006,46(3):37-41

提出了一种高速Viterbi译码器的FPGA实现方案。该译码器采用全并行结构的加比选模块和寄存器交换法以提高速度,并且利用大数判决准则和对译码器各个部分的优化设计,减少了硬件消耗。译码器的最高输出数据速率可以达到90Mbps。译码器的性能仿真和FDGA实现验证了该方案的可行性。相似文献

19.

An Efficient In-Place VLSI Architecture for Viterbi Algorithm

Yun-Nan Chang 《The Journal of VLSI Signal Processing》2003,33(3):317-324

This paper presents a novel design of Viterbi decoder based on in-place state metric update and hybrid survivor path management. By exploiting the in-place computation feature of the Viterbi algorithm, the proposed design methodology can result in high-speed and modular architectures suitable for those Viterbi applications with large constraint length. This feature is not only applied to the design of highly regular ACS units, but also exploited in the design of trace-back units for the first time. The proposed hybrid survivor path management based on the combination of register-exchange and trace-back schemes cannot only reduce the number of memory operations, but also the size of memory required. Compared with the general hybrid trace-back structure, the overhead of register-exchange circuit in our architecture is significantly less. Therefore, the proposed architecture can find promising applications in digital communication systems where high-speed large state Viterbi decoders are desirable. 相似文献

20.

Switching Activity Minimization in Iterative LDPC Decoders

Brendan Crowley Vincent Gaudet 《Journal of Signal Processing Systems》2012,68(1):63-73

LDPC codes can be designed to perform extremely close to the Shannon limit. Achieving such performance with high energy efficiency is now a main goal in the research community. This work combines knowledge of LDPC decoder message statistics, provided by density evolution, with knowledge of the physical implementation of decoders to predict switching activity in the decoder interconnect. In this work we provide results for the switching activity on the interconnect for fully parallel decoders. However, our model can be applied to partially parallel and serial implementations, and is not limited to interconnect. It is shown that switching activity can vary by as much as 300%, depending on several hardware design choices. Results of this work validate the usefulness of the presented model for providing designers with an understanding of how their decoder implementation choices affect power consumption for any size of LDPC code. This knowledge can be used for making design choices that minimize decoder power consumption very early in the hardware design process. 相似文献