1.

Implementation of scalable power and area efficient high-throughput Viterbi decoders

Gemmeke T. Gansen M. Noll T.G. 《Solid-State Circuits, IEEE Journal of》2002,37(7):941-948

Today's data reconstruction in digital communication systems requires designs with the highest throughput rate at low power. The Viterbi algorithm is a key element in such digital signal processing applications. The nonlinear and recursive nature of the Viterbi decoder makes its high-speed implementation challenging. Several promising approaches to achieve either high throughput or low power have been proposed in the past. A combination of these is developed in this paper. Additional new concepts allow building a signal-flow graph suitable for the design of high-speed Viterbi decoders with low power. Using a flexible datapath generator facilitates the essential quantitative optimization from the architectural down to the physical level to fully exploit the low-power and high-speed potential of a given technology. With parameterizable design entry, this datapath generator establishes the basis of a scalable platform-based design library. Altogether, this allows coverage of the range of today's industrial interest in high throughput rates, from 150 Msymbols/s up to 1.2 Gsymbols/s using conventional CMOS logic. The features of two exemplary Viterbi decoder implementations prove the benefit of this physically oriented design methodology in terms of speed and low power when compared to other leading-edge implementations.

2.

Parallel high-throughput limited search trellis decoder VLSI design

Fei Sun Tong Zhang 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2005,13(9):1013-1022

Limited search trellis decoding algorithms have great potential for low-power operation because of their much lower computational complexity compared with the widely used Viterbi algorithm. However, because of the lack of operational parallelism and regularity in their original formulations, the limited search decoding algorithms have traditionally been ruled out for applications demanding very high throughput. We believe that, through appropriate algorithm and hardware architecture co-design, certain limited search trellis decoding algorithms can become serious competitors to the Viterbi algorithm for high-throughput applications. Focusing on the well-known T-algorithm, this paper presents techniques at the algorithm and VLSI architecture levels to design fully parallel T-algorithm limited search trellis decoders. We first develop a modified T-algorithm, called SPEC-T, to improve the algorithmic parallelism. Then, based on the conventional state-parallel register exchange Viterbi decoder, we develop a parallel SPEC-T decoder architecture that can effectively transform the reduced computational complexity at the algorithm level into reduced switching activity in the hardware. We demonstrate the effectiveness of the SPEC-T design solution in the context of convolutional code decoding. Compared with state-parallel register exchange Viterbi decoders, the SPEC-T convolutional code decoders can achieve almost the same throughput and decoding performance, while realizing up to 56% power savings. For the first time, this work provides an approach to exploit the low power potential of the T-algorithm in very high throughput applications.
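As a rough illustration of the pruning idea behind the conventional T-algorithm (not the SPEC-T variant, whose details are not given in the abstract), the sketch below keeps only the states whose path metric lies within a threshold of the best one. The function name, the dictionary layout, and the threshold value are assumptions made for this example.

```python
def t_algorithm_prune(path_metrics, threshold):
    """Keep states whose metric is within `threshold` of the best (smallest) metric.

    path_metrics: dict mapping state -> accumulated path metric (hypothetical layout).
    """
    best = min(path_metrics.values())  # global search over all surviving states
    return {s: m for s, m in path_metrics.items() if m - best <= threshold}


metrics = {0: 3, 1: 9, 2: 4, 3: 12}
survivors = t_algorithm_prune(metrics, threshold=2)
print(survivors)
```

The `min` over all states is a global operation, which is one reason the original formulation is hard to map onto a fully parallel datapath.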

3.

Low-Power State-Parallel Relaxed Adaptive Viterbi Decoder

Sun F. Zhang T. 《IEEE transactions on circuits and systems. I, Regular papers》2007,54(5):1060-1068

Although it possesses reduced computational complexity and great power saving potential, conventional adaptive Viterbi algorithm implementations contain a global best survivor path metric search operation that prevents it from being directly implemented in a high-throughput state-parallel decoder. This limitation also incurs power and silicon area overhead. This paper presents a modified adaptive Viterbi algorithm, referred to as the relaxed adaptive Viterbi algorithm, that completely eliminates the global best survivor path metric search operation. A state-parallel decoder VLSI architecture has been developed to implement the relaxed adaptive Viterbi algorithm. Using convolutional code decoding as a test vehicle, we demonstrate that state-parallel relaxed adaptive Viterbi decoders, versus Viterbi counterparts, can achieve significant power savings and modest silicon area reduction, while maintaining almost the same decoding performance and very high throughput.

4.

High-Speed Viterbi Decoder Design Based on Radix-4

马力 陈泳恩 《通信技术》2002,(3):13-14

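A new Viterbi decoder design method is used to achieve a high-rate, low-power design. In a conventional Viterbi decoder, the ACS (add-compare-select) unit is designed around a radix-2 trellis; this paper introduces a new ACS design based on a radix-4 trellis. Each such ACS unit has four inputs, so it processes two stages of the conventional radix-2 trellis per clock cycle. The decoder design follows a top-down methodology, with the RTL described in Verilog and circuit simulation and synthesis performed in Quartus II. Implemented with this algorithm on an Altera APEX20K FPGA at a 33.333 MHz clock, the 64-state Viterbi decoder reaches a decoding rate above 8 Mbps while occupying only a small amount of hardware resources. The high-speed Viterbi decoder soft IP core designed with this method can be applied to multimedia mobile communications that require high-speed, low-power decoding.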
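The claim that one radix-4 ACS covers two radix-2 trellis stages can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses a hypothetical 4-state shift-register trellis (not the paper's 64-state decoder), arbitrary random branch metrics, and function names invented for this example.

```python
import random

N_STATES = 4  # hypothetical 4-state trellis, next = ((s << 1) | bit) % N_STATES


def preds(t):
    """The two radix-2 predecessors of state t in the assumed trellis."""
    return [t >> 1, (t >> 1) | (N_STATES >> 1)]


def acs_radix2(pm, bm):
    """One trellis stage: add, compare, select over 2 candidates per state."""
    return [min(pm[p] + bm[p][t] for p in preds(t)) for t in range(N_STATES)]


def acs_radix4(pm, bm1, bm2):
    """Two stages collapsed into one: 4 candidates per state, one select."""
    out = []
    for t in range(N_STATES):
        cands = [pm[g] + bm1[g][p] + bm2[p][t]
                 for p in preds(t) for g in preds(p)]
        out.append(min(cands))
    return out


rng = random.Random(0)
bm1 = [[rng.randint(0, 7) for _ in range(N_STATES)] for _ in range(N_STATES)]
bm2 = [[rng.randint(0, 7) for _ in range(N_STATES)] for _ in range(N_STATES)]
pm = [0, 5, 3, 7]

two_stage = acs_radix2(acs_radix2(pm, bm1), bm2)  # two radix-2 steps
one_stage = acs_radix4(pm, bm1, bm2)              # one radix-4 step
print(two_stage == one_stage)
```

The two results agree because taking a minimum distributes over addition; in hardware the sums `bm1 + bm2` would typically be precomputed as combined two-step branch metrics.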

5.

An artificial neural net Viterbi decoder

Xiao-An Wang Wicker S.B. 《Communications, IEEE Transactions on》1996,44(2):165-171

The Viterbi algorithm is a maximum likelihood means for decoding convolutional codes and has thus played an important role in applications ranging from satellite communications to cellular telephony. In the past, Viterbi decoders have usually been implemented using digital circuits. The speed of these digital decoders is directly related to the amount of parallelism in the design. As the constraint length of the code increases, parallelism becomes problematic due to the complexity of the decoder. In this paper an artificial neural network (ANN) Viterbi decoder is presented. The ANN decoder is significantly faster than comparable digital-only designs due to its fully parallel architecture. The fully parallel structure is obtained by implementing most of the Viterbi algorithm using analog neurons as opposed to digital circuits. Several modifications to the ANN decoder are considered, including an analog/digital hybrid design that results in an extremely fast and efficient decoder. The ANN decoder requires one-sixth the number of transistors required by the digital decoder. The connection weights of the ANN decoder are either +1 or -1, so weight considerations in the implementation are eliminated. This, together with the design's modularity and local connectivity, makes the ANN Viterbi decoder a natural fit for VLSI implementation. Simulation results are provided to show that the performance of the ANN decoder matches that of an ideal Viterbi decoder.

6.

A 100-Mb/s 2.8-V CMOS current-mode analog Viterbi decoder

Demosthenous A. Taylor J. 《Solid-State Circuits, IEEE Journal of》2002,37(7):904-910

This paper describes a 4-state rate-1/2 analog convolutional decoder fabricated in 0.8-μm CMOS technology. Although analog implementations have been described in the literature, this decoder is the first to be reported realizing the add-compare-select section entirely with current-mode analog circuits. It operates at data rates up to 115 Mb/s (channel rate 230 Mb/s) and consumes 39 mW at that rate from a single 2.8-V power supply. At a rate of 100 Mb/s, the power consumption per trellis state is about 1/3 that of a comparable digital system. In addition, at 50 Mb/s (the only rate at which comparative data were available), the power consumption per trellis state is similarly about 1/3 that of the best competing analog realization (i.e., excluding, for example, PR4 detectors which use a simplified form of the Viterbi algorithm). The chip contains 3.7 K transistors of which less than 1 K are used in the analog part of the decoder. The die has a core area of 1 mm², of which about 1/3 contains the analog section. The measured performance is close to that of an ideal Viterbi decoder with infinite quantization. In addition, a technique is described which extends the application of the circuits to decoders with a larger number of states. A typical example is a 64-state decoder for use in high-speed satellite communications.

7.

Optimal Datapath Widths Within Turbo and Viterbi Decoders for High Area- and Energy-Efficiency

Martin Broich Tobias G. Noll 《Journal of Signal Processing Systems》2017,87(3):299-325

Datapath widths in state-of-the-art Turbo and Viterbi decoder implementations depend on estimated upper bounds of the differences of processed metrics. Aiming at highest area and energy efficiency, this paper presents guidelines for designing Turbo and Viterbi decoder datapaths with minimal widths. This is based on maximum absolute values of branch, state and path metric differences within the Max-Log-MAP and Viterbi decoding algorithms, respectively, applying modulo normalization. The proposed methodology for determining the maximum absolute values covers punctured as well as n-input binary convolutional and Turbo codes, so it accommodates higher-radix add-compare-select operations. Maximum absolute values of metric differences and minimum datapath widths are presented for the 3GPP-LTE, DVB-RCS2 and IEEE-802.16 (WiMAX) compliant Turbo decoders and for the IEEE-802.11 (Wi-Fi), IEEE-802.16 (WiMAX) and 3GPP-LTE compliant Viterbi decoders. In addition, a new dynamic branch-metric saturation scheme is presented, which enables a further datapath width reduction within Turbo decoders. In total, a datapath width reduction of two bits (about 20%) is achieved applying radix-4 Max-Log-MAP arithmetic. An overall area-time-energy complexity reduction of 31% is achieved for a soft-input soft-output decoder and of 24% for the LTE Turbo decoder.
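The link between datapath width and the maximum metric difference can be seen in a small sketch of modulo normalization. This is the generic, well-known technique, not the paper's specific derivation: metrics are allowed to wrap around modulo 2^w, and comparison still works as long as the true difference stays below 2^(w-1) in magnitude. The function name and the 4-bit width are assumptions chosen for illustration.

```python
def mod_less_than(a, b, width):
    """Return True if metric a < metric b.

    a and b are stored modulo 2**width; the result is only valid when the
    true (unwrapped) difference satisfies |a - b| < 2**(width - 1).
    """
    diff = (a - b) & ((1 << width) - 1)   # wrapped two's-complement difference
    return (diff >> (width - 1)) == 1     # MSB set means the difference is negative


WIDTH = 4                                 # hypothetical 4-bit path metric register
a_true, b_true = 14, 17                   # true metrics; 17 overflows 4 bits
a_stored = a_true % (1 << WIDTH)          # 14
b_stored = b_true % (1 << WIDTH)          # 1
print(mod_less_than(a_stored, b_stored, WIDTH))
```

Because only the difference matters, the register width is set by the largest metric difference that can occur rather than by the unbounded growth of the metrics themselves.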

8.

FPGA design and implementation of a low-power systolic array-based adaptive Viterbi decoder

Man Guo Ahmad M.O. Swamy M.N.S. Chunyan Wang 《IEEE transactions on circuits and systems. I, Regular papers》2005,52(2):350-365

In this paper, by modifying the well-known Viterbi algorithm, an adaptive Viterbi algorithm that is based on strongly connected trellis decoding is proposed. Using this algorithm, the design and a field-programmable gate array implementation of a low-power adaptive Viterbi decoder with a constraint length of 9 and a code rate of 1/2 is presented. In this design, a novel systolic array-based architecture with time multiplexing and arithmetic pipelining for implementing the proposed algorithm is used. It is shown that the proposed algorithm can reduce the average number of ACS computations by up to 70% relative to the nonadaptive Viterbi algorithm, without degradation in the error performance. This lowers the switching activity of the logic cells, with a consequent reduction in the dynamic power. Further, it is shown that the total power consumption in the implementation of the proposed algorithm can be reduced by up to 43% compared to that in the implementation of the nonadaptive Viterbi algorithm, with a negligible increase in the hardware.

9.

Design and Implementation of a High-Speed Viterbi Decoder

李刚 黑勇 乔树山 仇玉林 《电子器件》2007,30(5):1886-1889

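The Viterbi algorithm is the optimal decoding algorithm for convolutional codes. A high-speed (3,1,7) Viterbi decoder is designed and implemented, consisting of a branch metric unit (BMU), an add-compare-select unit (ACSU), a survivor memory unit (SMU), and a control unit (CU). The decoder was implemented and verified on a Stratix II FPGA. Verification results show that its data throughput reaches 231 Mbit/s and that its bit error rate over an additive white Gaussian noise (AWGN) channel is very close to the theoretical simulated value. Compared with Viterbi decoders of the same type, this decoder offers high speed and low hardware implementation cost.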
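To make the BMU / ACSU / SMU decomposition concrete, here is a minimal software sketch of a hard-decision Viterbi decoder split along those lines. It is an assumption-laden illustration: it uses the small rate-1/2, constraint-length-3 code with generators (7, 5) octal rather than the paper's (3,1,7) code, Hamming-distance branch metrics, and trace-back survivor storage; all function names are invented for this example.

```python
G = (0b111, 0b101)      # assumed generators (7, 5) octal, constraint length 3
N_STATES = 4
INF = float("inf")


def parity(x):
    return bin(x).count("1") & 1


def step(state, bit):
    """Encoder transition: returns (next_state, output symbol pair)."""
    reg = (bit << 2) | state
    return reg >> 1, tuple(parity(reg & g) for g in G)


def encode(bits):
    state, out = 0, []
    for b in bits:
        state, sym = step(state, b)
        out.extend(sym)
    return out


def bmu(received, expected):
    """Branch metric unit: Hamming distance between two symbol pairs."""
    return sum(r != e for r, e in zip(received, expected))


def acsu(pm, received):
    """Add-compare-select unit: new metrics and one decision per state."""
    new_pm = [INF] * N_STATES
    decisions = [None] * N_STATES
    for s in range(N_STATES):
        if pm[s] == INF:
            continue                      # state not reachable yet
        for b in (0, 1):
            t, sym = step(s, b)
            m = pm[s] + bmu(received, sym)
            if m < new_pm[t]:
                new_pm[t], decisions[t] = m, (s, b)
    return new_pm, decisions


def smu_traceback(history, end_state):
    """Survivor memory unit: follow stored decisions backwards."""
    bits, state = [], end_state
    for decisions in reversed(history):
        state, b = decisions[state]
        bits.append(b)
    return bits[::-1]


def viterbi_decode(symbols):
    pm, history = [0, INF, INF, INF], []  # encoder starts in state 0
    for i in range(0, len(symbols), 2):
        pm, dec = acsu(pm, tuple(symbols[i:i + 2]))
        history.append(dec)
    end = min(range(N_STATES), key=pm.__getitem__)
    return smu_traceback(history, end)


message = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]  # last two zeros flush the encoder
noisy = encode(message)
noisy[3] ^= 1                              # inject a single channel bit error
decoded = viterbi_decode(noisy)
print(decoded == message)
```

A hardware decoder would evaluate all states of `acsu` in parallel each clock cycle; the sequential loop here only mirrors the data flow.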

10.

Design and Optimization of an HSDPA Turbo Decoder ASIC

Benkeser C. Burg A. Cupaiuolo T. Qiuting Huang 《Solid-State Circuits, IEEE Journal of》2009,44(1):98-106

The turbo decoder is the most challenging component in a digital HSDPA receiver in terms of computation requirement and power consumption, where large block size and recursive algorithm prevent pipelining or parallelism from being effectively deployed. This paper addresses the complexity and power consumption issues at algorithmic, arithmetic and gate levels of ASIC design, in order to bring power consumption and die area of turbo decoders to a level commensurate with wireless application. Realized in 0.13-μm CMOS technology, the turbo decoder ASIC measures 1.2 mm² excluding pads, and can achieve 10.8 Mb/s throughput while consuming only 32 mW.

11.

Low-latency architectures for high-throughput rate Viterbi decoders

Jun Jin Kong Parhi K.K. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2004,12(6):642-651

In this paper, a novel K-nested layered look-ahead method and its corresponding architecture, which combine K trellis steps into one trellis step (where K is the encoder constraint length), are proposed for implementing low-latency high-throughput rate Viterbi decoders. The proposed method guarantees parallel paths between any two trellis states in the look-ahead trellises and distributes the add-compare-select (ACS) computations to all trellis layers. It leads to a regular and simple architecture for the Viterbi decoding algorithm. The look-ahead ACS computation latency of the proposed method increases logarithmically with respect to the look-ahead step (M) divided by the encoder constraint length (K), as opposed to linearly as in prior work. For a 4-state (i.e., K=3) convolutional code, the decoding latency of the Viterbi decoder using the proposed method is reduced by 84%, at the expense of about a 22% increase in hardware complexity, compared with the conventional M-step look-ahead method with M=48 (where M is also the level of parallelism). The main advantage of our proposed design is that it has the least latency among all known look-ahead Viterbi decoders for a given level of parallelism.

12.

Design and Implementation of a High-Performance FPGA-Based Viterbi Decoder

沈南 王华 《中国有线电视》2006,(2):163-166

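This work optimizes the survivor-path management and storage module, one of the three main components of a Viterbi decoder. A new method (a modified register-exchange scheme) is adopted for survivor-path management, eliminating the trace-back read operations during decoding. Compared with decoders using the conventional trace-back method, this decoder has lower decoding latency, efficient memory management, and lower hardware complexity. The other parts of the decoder were also optimized in the overall design, and post-place-and-route simulation shows a maximum output data rate of 90 Mbps.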
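For readers unfamiliar with register exchange, the sketch below shows the conventional scheme in software form; the abstract does not describe the paper's modified variant, so nothing here should be read as that method. Each state keeps a register holding the decoded bits of its survivor path, and at every trellis step the register of the selected predecessor is copied and extended by the newly decided bit. The function name, the `(predecessor, bit)` decision layout, and the register depth are assumptions.

```python
def register_exchange_step(registers, decisions, depth):
    """Advance all survivor registers by one trellis step.

    registers: list of bit lists, one per state.
    decisions: decisions[t] = (selected_predecessor, decided_bit) for state t.
    depth:     register length; the oldest bit is dropped beyond this.
    """
    return [(registers[p] + [bit])[-depth:] for (p, bit) in decisions]


# Hypothetical 4-state example with registers holding the last 3 decided bits.
registers = [[0, 0], [0, 1], [1, 0], [1, 1]]
decisions = [(0, 0), (0, 1), (1, 0), (3, 1)]
registers = register_exchange_step(registers, decisions, depth=3)
print(registers)
```

Since every register already holds a complete survivor sequence, the decoded bits can be read directly from one register, with no backward pass over stored decisions.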

13.

An Efficient In-Place VLSI Architecture for Viterbi Algorithm

Yun-Nan Chang 《The Journal of VLSI Signal Processing》2003,33(3):317-324

This paper presents a novel design of Viterbi decoder based on in-place state metric update and hybrid survivor path management. By exploiting the in-place computation feature of the Viterbi algorithm, the proposed design methodology can result in high-speed and modular architectures suitable for those Viterbi applications with large constraint length. This feature is not only applied to the design of highly regular ACS units, but also exploited in the design of trace-back units for the first time. The proposed hybrid survivor path management, based on the combination of register-exchange and trace-back schemes, can reduce not only the number of memory operations but also the size of memory required. Compared with the general hybrid trace-back structure, the overhead of the register-exchange circuit in our architecture is significantly less. Therefore, the proposed architecture can find promising applications in digital communication systems where high-speed large state Viterbi decoders are desirable.

14.

A 500-Mb/s soft-output Viterbi decoder

Engling Yeo Augsburger S.A. Davis W.R. Nikolic B. 《Solid-State Circuits, IEEE Journal of》2003,38(7):1234-1241

Two eight-state 7-bit soft-output Viterbi decoders matched to an EPR4 channel and a rate-8/9 convolutional code are implemented in a 0.18-μm CMOS technology. The throughput of the decoders is increased through architectural transformation of the add-compare-select recursion, with a small area overhead. The survivor-path decoding logic of a conventional Viterbi decoder register exchange is adapted to detect the two most likely paths. The 4-mm² chip has been verified to decode at 500 Mb/s with 1.8-V supply. These decoders can be used as constituent decoders for Turbo codes in high-performance applications requiring information rates that are very close to the Shannon limit.

15.

A Universal VLSI Architecture for Reed–Solomon Error-and-Erasure Decoders

Hsie-Chia Chang Chien-Ching Lin Fu-Ke Chang Chen-Yi Lee 《IEEE transactions on circuits and systems. I, Regular papers》2009,56(9):1960-1967

This paper presents a universal architecture for a Reed-Solomon (RS) error-and-erasure decoder. In comparison with other reconfigurable RS decoders, our universal approach based on the Montgomery multiplication algorithm can support not only arbitrary block lengths but also various finite-field degrees with different irreducible polynomials. Moreover, the decoder design also features constant multipliers in the universal syndrome calculator and Chien search block, as well as an on-the-fly inversion table for calculating error or errata values. Implemented in 0.18-μm 1P6M technology, the proposed universal RS decoder, which corrects up to 16 errors, was measured to reach a maximum data rate of 1.28 Gb/s at 160 MHz. The total gate count is around 46.4 K with 1.21 mm² silicon area, and the average core power consumption is 68.1 mW.

16.

A reconfigurable, power-efficient adaptive Viterbi decoder

Tessier R. Swaminathan S. Ramaswamy R. Goeckel D. Burleson W. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2005,13(4):484-488

Error-correcting convolutional codes provide a proven mechanism to limit the effects of noise in digital data transmission. Although hardware implementations of decoding algorithms, such as the Viterbi algorithm, have shown good noise tolerance for error-correcting codes, these implementations require an exponential increase in very large scale integration area and power consumption to achieve increased decoding accuracy. To achieve reduced decoder power consumption, we have examined and implemented decoders based on the reduced-complexity adaptive Viterbi algorithm (AVA). Run-time dynamic reconfiguration is performed in response to varying communication channel-noise conditions to match minimized power consumption to required error-correction capabilities. Experimental calculations indicate that the use of dynamic reconfiguration leads to a 69% reduction in decoder power consumption over a nonreconfigurable field-programmable gate array implementation with no loss of decode accuracy.

17.

Switching Activity Minimization in Iterative LDPC Decoders

Brendan Crowley Vincent Gaudet 《Journal of Signal Processing Systems》2012,68(1):63-73

LDPC codes can be designed to perform extremely close to the Shannon limit. Achieving such performance with high energy efficiency is now a main goal in the research community. This work combines knowledge of LDPC decoder message statistics, provided by density evolution, with knowledge of the physical implementation of decoders to predict switching activity in the decoder interconnect. In this work we provide results for the switching activity on the interconnect for fully parallel decoders. However, our model can be applied to partially parallel and serial implementations, and is not limited to interconnect. It is shown that switching activity can vary by as much as 300%, depending on several hardware design choices. Results of this work validate the usefulness of the presented model for providing designers with an understanding of how their decoder implementation choices affect power consumption for any size of LDPC code. This knowledge can be used for making design choices that minimize decoder power consumption very early in the hardware design process.

18.

Design of Spherical Lattice Space–Time Codes

《IEEE transactions on information theory / Professional Technical Group on Information Theory》2008,54(11):4847-4865

In this paper, we propose a systematic procedure for designing spherical lattice (space–time) codes. By employing stochastic optimization techniques we design lattice codes which are well matched to the fading statistics as well as to the decoder used at the receiver. The decoders we consider here include the optimal maximum-likelihood (ML) decoder, which has the highest decoding complexity, the suboptimal lattice decoders, and the suboptimal lattice-reduction-aided (LRA) decoders, which have the lowest decoding complexity. For each decoder, our design methodology can be tailored to obtain low error-rate lattice codes for arbitrary fading statistics and signal-to-noise ratios (SNRs) of interest. Further, we obtain fundamental lower bounds on the error probabilities yielded by lattice and LRA decoders and characterize their asymptotic behavior.

19.

A Study of Convolutional Code Decoders Based on Recurrent Neural Networks

林国华 殷奎喜 《现代电子技术》2007,30(1):61-62

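This paper describes in detail the principle and implementation of a rate-1/n convolutional code decoder based on a recurrent neural network (RNN). Simulation results show that the RNN decoder performs very close to the Viterbi decoder and performs very well for some special convolutional codes. The decoder's performance is further improved using simulated annealing.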

20.

High-Performance Viterbi Decoding on CELL/B.E.

Lai Junjie Tang Jun Peng Yingning Chen Jianwen 《中国通信》2009,6(2):150-156

Viterbi decoding is widely used in many radio systems. Because of its high computational complexity, it is usually implemented with ASIC chips, FPGA chips, or optimized hardware accelerators. With the rapid development of multicore technology, multicore platforms have become a reasonable choice for software radio (SR) systems. The Cell Broadband Engine processor is a state-of-the-art multicore processor designed by Sony, Toshiba, and IBM. In this paper, we present a 64-state soft-input Viterbi decoder for a WiMAX SR baseband system based on the Cell processor. With one Synergistic Processor Element (SPE) of a Cell processor running at 3.2 GHz, our Viterbi decoder can achieve a throughput of up to 30 Mb/s when decoding the tail-biting convolutional code. The performance demonstrates that the proposed Viterbi decoding implementation is very efficient. Moreover, the Viterbi decoder can be easily integrated into the SR system and can provide a highly integrated SR solution. The optimization methodology in this module design can be extended to other modules on the Cell platform.