Found 20 similar documents (search time: 15 ms)
1.
A K-best Schnorr-Euchner (KSE) decoding algorithm is proposed in this paper to approach near-maximum-likelihood (ML) performance for multiple-input-multiple-output (MIMO) detection. As a low-complexity MIMO decoding algorithm, the KSE is shown to be suitable for very large scale integration (VLSI) implementations and capable of supporting soft outputs. A modified KSE (MKSE) decoding algorithm is further proposed to improve the performance of the soft-output KSE with minor modifications. Moreover, a VLSI architecture is proposed for both algorithms. Several low-complexity and low-power features are incorporated in the proposed algorithms and the VLSI architecture. The proposed hard-output KSE decoder and the soft-output MKSE decoder are implemented for 4×4 16-quadrature amplitude modulation (QAM) MIMO detection in a 0.35-μm and a 0.13-μm CMOS technology, respectively. The implemented hard-output KSE chip core is 5.76 mm² with 91 K gates. The KSE decoding throughput is up to 53.3 Mb/s with a core power consumption of 626 mW at 100 MHz clock frequency and 2.8 V supply. The implemented soft-output MKSE chip can achieve a decoding throughput of more than 100 Mb/s with a 0.56 mm² core area and 97 K gates. The implementation results show that it is feasible to achieve near-ML performance and high detection throughput for a 4×4 16-QAM MIMO system using the proposed algorithms and the VLSI architecture with reasonable complexity.
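The breadth-first idea behind K-best detection can be sketched in a few lines. This is a minimal illustration only, not the paper's KSE or MKSE algorithm: it omits Schnorr-Euchner enumeration and soft outputs, and it assumes a real-valued model y = R s + noise with an upper-triangular R (for example after a QR decomposition of the channel matrix). The names `k_best_detect` and `ml_detect` and the example values are hypothetical.

```python
from itertools import product


def k_best_detect(R, y, alphabet, K):
    """Plain K-best search for argmin_s ||y - R s||^2 with R upper triangular.

    Assumption: real-valued model; no Schnorr-Euchner ordering, no soft output.
    """
    n = len(y)
    # Each survivor is (partial distance, symbols chosen for layers i..n-1).
    survivors = [(0.0, ())]
    for i in range(n - 1, -1, -1):  # detect the last layer first
        candidates = []
        for dist, tail in survivors:
            for s in alphabet:
                symbols = (s,) + tail
                # Row i of R only involves layers i..n-1.
                est = sum(R[i][i + j] * symbols[j] for j in range(len(symbols)))
                candidates.append((dist + (y[i] - est) ** 2, symbols))
        # Keep only the K partial paths with the smallest accumulated distance.
        candidates.sort(key=lambda c: c[0])
        survivors = candidates[:K]
    return survivors[0]


def ml_detect(R, y, alphabet):
    """Exhaustive ML reference: tries every symbol vector."""
    n = len(y)
    best = None
    for s in product(alphabet, repeat=n):
        d = sum(
            (y[i] - sum(R[i][j] * s[j] for j in range(i, n))) ** 2
            for i in range(n)
        )
        if best is None or d < best[0]:
            best = (d, s)
    return best


# Hypothetical 3-layer example with a 4-level real alphabet.
R = [[2.0, 0.5, -0.3], [0.0, 1.5, 0.4], [0.0, 0.0, 1.0]]
y = [-0.3, -3.2, 3.1]  # R @ (1, -3, 3) plus a small offset on each entry
alphabet = (-3, -1, 1, 3)

print(k_best_detect(R, y, alphabet, K=2)[1])  # (1, -3, 3)
```

With K equal to the full number of candidate vectors the search degenerates into exhaustive ML detection; smaller K trades detection performance for a fixed, hardware-friendly amount of work per layer.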
2.
Chi-Fang Li, Wern-Ho Sheen, Chong-Ren Wang, Yuan-Sun Chu 《Solid-State Circuits, IEEE Journal of》2003,38(4):677-682
This brief proposes a fast multispeed comma-free Reed-Solomon (CFRS) decoder for the frame synchronization and code-group identification in the cell search of the Third Generation Partnership Project wide-band code-division multiple access/frequency division duplexing (W-CDMA/FDD) system. A foldable systolic array is proposed to achieve fast decoding and provide flexible tradeoffs between power consumption, chip size, and decoding latency. Multispeed decoding, an idea that is useful for cell search in different application scenarios, can also be achieved with the same array architecture. The proposed CFRS decoder is implemented in a 3.3-V 0.35-μm CMOS technology with a 2.2 × 2.2 mm² core area and power dissipation of 13.3 and 1.23 mW in high- and low-speed decoding modes, respectively.
3.
Decoding the Golden Code: A VLSI Design (total citations: 1; self-citations: 0; citations by others: 1)
《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2009,17(1):156-160
4.
Wojciech Sułek 《Circuits, Systems, and Signal Processing》2016,35(11):4060-4080
Most of the recently proposed hardware realizations for non-binary low-density parity-check decoders are ASIC oriented, as they employ multiplierless computation units. In this article, we present a different decoder design approach that is specifically intended for an FPGA implementation. We reformulate the mixed-domain FFT-BP decoding algorithm and develop a decoder architecture that does not exclude the multiplication units. This allows mapping a part of the algorithm to the multiplier cores embedded in an FPGA, thus making use of all the types of FPGA resources. As a result, the throughput limit achievable in a single FPGA by the proposed decoder is significantly increased. We also consider other important optimizations of the decoder implementation, mainly an efficient realization of the permutation units and an approximate evaluation of the nonlinear functions of messages. Another motivation is to make the decoder easily scalable for FPGA devices of different sizes. To achieve this goal, a configurable semi-parallel decoder architecture is applied, operating on a structured subclass of codes.
5.
Heunchul Lee, Jungho Cho, Jong-kyu Kim, Inkyu Lee 《Communications, IEEE Transactions on》2009,57(1):17-21
In this letter, we present a new maximum likelihood (ML) decoding algorithm for space time block codes (STBCs) that employ multidimensional constellations. We start with a lattice representation for STBCs which transforms complex channel models into real matrix equations. Based on the lattice representation, we propose a new decoding algorithm for quasi-orthogonal STBCs (QO-STBCs) which allows simple ML decoding with performance identical to the conventional ML decoder. Multidimensional rotated constellations are constructed for the QO-STBCs to achieve full diversity. As a consequence, for quasi-orthogonal designs with an arbitrary number of transmit antennas N (N ≥ 4), the proposed decoding scheme achieves full rate and full diversity while reducing the decoding complexity from O(M_c^{N/2}) to O(M_c^{N/4}) in an M_c-QAM constellation.
6.
Bouguezel S., Ahmad M.O., Swamy M.N.S. 《IEEE transactions on circuits and systems. I, Regular papers》2006,53(2):306-315
In this paper, new three-dimensional (3-D) radix-(2×2×2)/(4×4×4) and radix-(2×2×2)/(8×8×8) decimation-in-frequency (DIF) fast Fourier transform (FFT) algorithms are developed and their implementation schemes discussed. The algorithms are developed by introducing the radix-2/4 and radix-2/8 approaches in the computation of the 3-D DFT using the Kronecker product and appropriate index mappings. The butterflies of the proposed algorithms are characterized by simple closed-form expressions facilitating easy software or hardware implementations of the algorithms. Comparisons between the proposed algorithms and the existing 3-D radix-(2×2×2) FFT algorithm are carried out showing that significant savings in terms of the number of arithmetic operations, data transfers, and twiddle factor evaluations or accesses to the lookup table can be achieved using the radix-(2×2×2)/(4×4×4) DIF FFT algorithm over the radix-(2×2×2) FFT algorithm. It is also established that further savings can be achieved by using the radix-(2×2×2)/(8×8×8) DIF FFT algorithm.
7.
Presented in this paper is a pipelined 285-MHz maximum a posteriori probability (MAP) decoder IC. The 8.7-mm² IC is implemented in a 1.8-V 0.18-μm CMOS technology and consumes 330 mW at maximum frequency. The MAP decoder chip features a block-interleaved pipelined architecture, which enables the pipelining of the add-compare-select kernels. Measured results indicate that a turbo decoder based on the presented MAP decoder core can achieve: 1) a decoding throughput of 27.6 Mb/s with an energy-efficiency of 2.36 nJ/b/iter; 2) the highest clock frequency compared to existing 0.18-μm designs with the smallest area; and 3) comparable throughput with an area reduction of 3-4.3× with reference to a look-ahead based high-speed design (Radix-4 design), and a parallel architecture.
8.
Benkeser C., Burg A., Cupaiuolo T., Qiuting Huang 《Solid-State Circuits, IEEE Journal of》2009,44(1):98-106
The turbo decoder is the most challenging component in a digital HSDPA receiver in terms of computation requirement and power consumption, where large block size and a recursive algorithm prevent pipelining or parallelism from being effectively deployed. This paper addresses the complexity and power consumption issues at the algorithmic, arithmetic and gate levels of ASIC design, in order to bring the power consumption and die area of turbo decoders to a level commensurate with wireless applications. Realized in 0.13 μm CMOS technology, the turbo decoder ASIC measures 1.2 mm² excluding pads, and can achieve 10.8 Mb/s throughput while consuming only 32 mW.
9.
Chia-Hsiang Yang, Markovic D. 《IEEE transactions on circuits and systems. I, Regular papers》2009,56(10):2301-2314
This paper presents the architecture and circuit design of a sphere decoder for agile multi-input multi-output (MIMO) communication systems. Algorithm and architecture co-design is used to reduce hardware complexity, which enables the proposed sphere decoder to support larger antenna-array sizes and higher order modulations. The proposed architecture is also capable of processing multiple frequency subcarriers for orthogonal frequency-division multiplexing (OFDM) based systems. A 20× area reduction is achieved, even without interleaving of subcarriers, compared to the direct-mapped architecture. The sphere decoder supports multiple configurations: antenna arrays from 2×2 to 16×16, constellation sizes from binary phase-shift keying (BPSK) to 64-QAM (quadrature-amplitude modulation), and 16-128 subcarriers. The peak estimated data rate exceeds 1.5 Gbit/s of ideal throughput in a 16-MHz bandwidth. The core area is estimated at 0.31 mm² in a standard 90-nm CMOS technology. The estimated power consumption is 33 mW in the 16×16 64-QAM mode at 256 MHz from a 1-V supply voltage.
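The quoted peak rate can be checked with a back-of-the-envelope calculation. The sketch below assumes an idealized model that the abstract does not spell out: one symbol vector per second per hertz of bandwidth, one spatial stream per transmit antenna, and no coding, pilot, or guard-interval overhead. The function name `ideal_throughput_bps` is hypothetical.

```python
import math


def ideal_throughput_bps(n_streams, constellation_size, bandwidth_hz):
    """Ideal uncoded rate: streams x bits per symbol x symbol rate.

    Assumption: the symbol rate equals the bandwidth and there is no overhead.
    """
    bits_per_symbol = int(math.log2(constellation_size))
    return n_streams * bits_per_symbol * bandwidth_hz


# 16x16 antennas, 64-QAM (6 bits per symbol), 16 MHz bandwidth.
peak = ideal_throughput_bps(16, 64, 16_000_000)
print(peak / 1e9)  # 1.536 (Gbit/s)
```

Under these assumptions the 16×16 64-QAM mode yields 1.536 Gbit/s, consistent with the "exceeds 1.5 Gbit/s" figure, while the smallest configuration (2×2 BPSK) would yield 32 Mbit/s in the same bandwidth.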
10.
The Block Decoder (BD), which is an indispensable component of the JPEG 2000 image compression standard, has the highest computational complexity and determines the speed of the overall decoder system. This paper proposes a high-throughput pass-parallel BD architecture, which can decode more than one bit per clock cycle. In the BD, the dependency between the context generation and arithmetic decoding units introduces stalls and reduces the throughput of the decoding process. The proposed selective byte input and synchronous sample skipping techniques are used to prevent stalling in the decoding process. The proposed architecture achieves 86% more throughput with a 50% increase in hardware cost compared with the best available serial BD architecture. In comparison with the best available pass-parallel architecture, throughput improves almost 8.2 times with a 61% increase in hardware cost. Incorporation of the speed-up techniques in the design is the main reason for the higher hardware cost. The Figure of Merit of the proposed design, which is the ratio of throughput to hardware cost, is higher than that of the available BD architectures for a typical code block (CB) size of 32 × 32. The ASIC implementation of the proposed design consumes 66 mW at maximum operating frequency.
11.
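To overcome the excessive complexity of log-likelihood ratio (LLR) generation and sorting in the extended min-sum (EMS) decoding algorithm for non-binary LDPC codes, this paper proposes a fast and simple LLR generation algorithm for coded modulation systems that use BPSK modulation. The algorithm uses a low-complexity iterative computation to generate and sort the LLRs quickly, is suited to pipelined hardware implementation, and can speed up decoding and raise decoder throughput. Simulation results show that the proposed algorithm has essentially no impact on decoding performance while greatly reducing the complexity of LLR computation, making it a candidate algorithm for the front end of high-speed non-binary LDPC decoders.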
12.
This paper presents the design of space–time block codes (STBCs) over maximum rank distance (MRD) codes, energy‐efficient STBCs, STBCs using interleaved‐MRD codes, the use of Gaussian integers for STBC modulation, and Gabidulin's decoding algorithm for decoding STBCs. The design fundamentals of STBCs using MRD codes are first put forward for different numbers of transmit antennas. Extension finite fields (Galois fields) are used to design these linear block codes. Afterward, a comparative study of MRD‐based STBCs with corresponding orthogonal and quasi‐orthogonal codes is also included in the paper. The simulation results show that rank codes, for any number of transmit antennas, exhibit diversity gain at full rate, contrary to orthogonal codes, which give diversity gain at full rate only for the case of two transmit antennas. Secondly, an energy‐efficient MRD‐STBC is proposed, which outperforms orthogonal STBC at least for a 2 × 1 antenna system. Thirdly, interleaved‐MRD codes are used to construct higher‐order transmit antenna systems. Using interleaved‐MRD codes further reduces the complexity (compared with normal MRD codes) of the decoding algorithm. Fourthly, Gaussian integers are used to map MRD‐based STBCs to complex constellations. Furthermore, it is described how an efficient and computationally less complex Gabidulin's decoding algorithm can be exploited for decoding complex MRD‐STBCs. The decoding results have been compared against hard‐decision maximum likelihood decoding. Under this decoding scheme, MRD‐STBCs have been shown to be a potential candidate for higher transmit antenna systems, as the decoding complexity of Gabidulin's algorithm is far lower and its performance for decoding MRD‐STBCs is somewhat reasonable. Copyright © 2015 John Wiley & Sons, Ltd.
13.
G. Caire, J. Ventura-Traveset, M. Hollreiser, E. Biglieri 《The Journal of VLSI Signal Processing》1995,10(2):153-168
We describe the systolic-array implementation of a block-oriented algorithm known as staged decoding, applicable to a class of signal-space codes and lattices obtained through “generalized concatenation”. By exploiting the trellis representation of block codes and the algebraic formulation of the Viterbi algorithm, we derive a very efficient symbol-level pipelined architecture of the staged processor. In order to show the range of applicability of our architecture, we consider the implementation of a staged decoder for the 8-PSK block-coded modulation (BCM) scheme with block length 8 and rate 1 bit/dimension. We obtain a decoding rate of more than 700 Mbit/s with an associated hardware complexity of less than 30 Kgates with 1 μm CMOS technology. A preliminary, shorter version of this paper was presented at ICC'93, Geneva, May 1993.
14.
Dinh A., Xiao Hu 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2005,13(6):745-750
This brief presents a new technique for implementing a very large-scale integration trellis code modulation (TCM) decoder. The technique aims to reduce hardware complexity and increase decoding throughput. The technique is introduced in the design of a Viterbi decoder. To simplify the decoding algorithm and calculation, branch cost distances are pre-calculated and stored in a distance look-up table (DLUT). The DLUT significantly reduces hardware requirements, as this table eliminates the need for calculation circuitry. In addition, an output LUT (OLUT) is constructed based on the trellis diagram of the code. This table generates the decoding output using information provided by the algorithm. The use of this OLUT reduces the storage requirement. The technique was used to design a 16-state, radix-4 codec for two-dimensional and four-dimensional TCM. The decoder was implemented in hardware after functional simulation. The tested ASIC has a core area of 1.1 mm² in 0.18-μm CMOS. A decoding speed of 1 Gbps was achieved. Implementation results have shown that LUTs can be used to decrease hardware requirements and increase decoding speed.
15.
Chien-Ching Lin, Shih Y.-H., Hsie-Chia Chang, Chen-Yi Lee 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2006,14(4):426-430
This paper presents a channel decoder that performs both turbo and Viterbi decoding, which are pervasive in many wireless communication systems, especially those that require very low signal-to-noise ratios. The trellis decoding algorithm merges them with less redundancy. However, the implementation is still challenging due to the power consumption in wearable devices. This research investigates an optimized memory scheme and rescheduled data flow to reduce power consumption and chip area. The memory access is reduced by buffering the input symbols, and the area is reduced by reducing the embedded interleaver memory. A test chip is fabricated in a 1.8 V 0.18-μm standard CMOS technology and verified to provide 4.25-Mb/s turbo decoding and 5.26-Mb/s Viterbi decoding. The measured power dissipation is 83 mW while decoding a 3.1 Mb/s turbo encoded data stream with six iterations for each block, and 25.1 mW for Viterbi decoding at a 1-Mb/s data rate.
16.
17.
Bickerstaff M.A., Garrett D., Prokop T., Thomas C., Widdup B., Gongyu Zhou, Davis L.M., Woodward G., Nicol C., Ran-Hong Yan 《Solid-State Circuits, IEEE Journal of》2002,37(11):1555-1564
A channel decoder chip compliant with the 3GPP mobile wireless standard is described. It supports both data and voice calls simultaneously in a unified turbo/Viterbi decoder architecture. For voice services, the decoder can process over 128 voice channels encoded with rate 1/2 or 1/3, constraint length 9 convolutional codes. For data services, the turbo decoder is capable of processing any mix of rate 1/3, constraint length 4 turbo encoded data streams with an aggregate data rate of up to 2.5 Mb/s with 10 iterations per block (or 4.1 Mb/s with six iterations). The turbo decoder uses the logMAP algorithm with a programmable logsum correction table. It features an interleaver address processor that computes the 3GPP interleaver addresses for all block sizes enabling it to quickly switch context to support different data services for several users. The decoder also contains the 3GPP first channel de-interleaving function and a post-decoder bit error rate estimation unit. The chip is fabricated in a 0.18-μm six-layer metal CMOS technology, has an active area of 9 mm², and has a peak clock frequency of 110.8 MHz at 1.8 V (nominal). The power consumption is 306 mW when turbo decoding a 2-Mb/s data stream with ten iterations per block and eight voice calls simultaneously.
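The "logsum correction table" refers to the standard log-MAP building block max*(a, b) = max(a, b) + ln(1 + e^(-|a - b|)), where the second term is usually read from a small table rather than computed. The sketch below illustrates that idea only; the table size, step width, and the names `max_star_exact`, `max_star_lut`, `STEP`, and `TABLE` are hypothetical and are not taken from the chip described above.

```python
import math


def max_star_exact(a, b):
    """Exact Jacobian logarithm: ln(e^a + e^b)."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


# Hypothetical 8-entry correction table, one entry per 0.5-wide bin of |a - b|,
# sampled at the bin midpoints. A programmable table would be loaded at run time.
STEP = 0.5
TABLE = [math.log1p(math.exp(-(k + 0.5) * STEP)) for k in range(8)]


def max_star_lut(a, b, table=TABLE, step=STEP):
    """Table-based approximation: max plus a looked-up correction term."""
    idx = int(abs(a - b) / step)
    # Beyond the table range the correction is treated as negligible.
    correction = table[idx] if idx < len(table) else 0.0
    return max(a, b) + correction


print(max_star_exact(1.0, 0.3), max_star_lut(1.0, 0.3))
```

Dropping the correction term altogether gives the max-log-MAP approximation; the table recovers most of the difference at the cost of a small memory.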
18.
New degree computationless modified euclid algorithm and architecture for Reed-Solomon decoder (total citations: 3; self-citations: 0; citations by others: 3)
Baek J.H., Sunwoo M.H. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2006,14(8):915-920
This paper proposes a new degree computationless modified Euclid (DCME) algorithm and its dedicated architecture for Reed-Solomon (RS) decoders. This architecture has low hardware complexity compared with conventional modified Euclid (ME) architectures, since it can completely remove the degree computation and comparison circuits. The architecture, which employs a systolic array, requires a latency of only 2t clock cycles to solve the key equation, without initial latency. In addition, the DCME architecture using 3t+2 basic cells has regularity and scalability since it uses only one type of processing element. Hence, the proposed DCME architecture provides short-latency and low-cost RS decoding. The DCME architecture has been synthesized using the 0.25-μm Faraday CMOS standard cell library and operates at 200 MHz. The gate count of the DCME architecture is 21 760. Hence, the RS decoder using the proposed DCME architecture can reduce the total gate count by at least 23% and the total latency to at least 10% compared with conventional ME decoders.
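To put the 2t-cycle latency and 3t+2 cell count into concrete numbers, the sketch below evaluates them for one example code. It assumes that t is the usual symbol-error-correcting capability, t = (n - k) / 2, and uses RS(255, 239) purely as a hypothetical example; the abstract does not name a specific code. The function name `dcme_figures` is likewise hypothetical.

```python
def dcme_figures(n, k, clock_hz=200e6):
    """Key-equation latency and cell count from the 2t and 3t+2 expressions.

    Assumption: t = (n - k) // 2, the number of correctable symbol errors.
    """
    t = (n - k) // 2
    latency_cycles = 2 * t
    return {
        "t": t,
        "latency_cycles": latency_cycles,
        "basic_cells": 3 * t + 2,
        "latency_ns": latency_cycles / clock_hz * 1e9,
    }


# Hypothetical example code: RS(255, 239), so t = 8.
print(dcme_figures(255, 239))
```

For this example the key equation would be solved in 16 clock cycles, about 80 ns at the quoted 200 MHz, using 26 basic cells.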
19.
Chi-Fang Li, Yuan-Sun Chu, Wern-Ho Sheen, Fu-Chin Tian, Ho J.-S. 《Solid-State Circuits, IEEE Journal of》2004,39(5):852-857
This paper presents a low-power ASIC design for cell search in the wideband code-division multiple-access (W-CDMA) system. A low-complexity algorithm that is able to work satisfactorily under the effect of large frequency and clock errors is designed first. Then, a set of low-power measures are employed in the design of hardware architecture and circuits. Finally, through power analysis, critical blocks are identified and redesigned so as to further reduce the power consumption. The final design shows that the power is reduced by 51% from the original design of 133.6 mW to 65.49 mW, and its core area is also reduced by 31.9% from 3.4×3.4 mm² to 2.8×2.8 mm². The design is implemented and verified in a 3.3-V 0.35-μm CMOS technology with clock rate 15.36 MHz.
20.
In this paper, an architecture for real-time digital HDTV video decoding is presented. Our architecture is based on a dual decoding datapath controlled in a fixed schedule with an efficient write-back scheme for anchor pictures. The decoding datapath is synchronized at the block (8 × 8 pixels) level. Unlike other decoding approaches such as the slice bar decoding method and the cross-divide method, our scheme reduces the memory access contention problem to achieve real-time HDTV decoding without a high cost in overall decoder buffers, architecture, and bus. In comparison to data-flow approaches, our method eliminates the complexity associated with tagged data operations. Our anchor picture storage is organized to minimize page breaks during memory accesses. Simulation shows that with a relatively low-rate 81 MHz clock, our decoder can decode MPEG-2 MP@HL HDTV in real time, based on an ATSC video format of 1,920 × 1,080 pixels/frame at 30 frames/s, at a bit rate of 18 to 20 Mbps.
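A rough cycle budget shows why an 81 MHz clock is described as relatively low for this format. The calculation below is an illustration under stated simplifications, not a figure from the paper: it treats the picture as exactly 1,920 × 1,080 (ignoring any padding of the coded picture height) and assumes 4:2:0 chroma sampling, which adds half as many chroma blocks as there are luma blocks.

```python
WIDTH, HEIGHT, FPS = 1920, 1080, 30
CLOCK_HZ = 81_000_000

# Luma pixels that must be produced every second.
pixels_per_second = WIDTH * HEIGHT * FPS
cycles_per_pixel = CLOCK_HZ / pixels_per_second

# 8x8 blocks per frame: luma blocks plus 4:2:0 chroma (assumed), i.e. 1.5x luma.
luma_blocks = (WIDTH // 8) * (HEIGHT // 8)
blocks_per_frame = luma_blocks * 3 // 2
cycles_per_block = CLOCK_HZ / (blocks_per_frame * FPS)

print(pixels_per_second)           # 62208000
print(round(cycles_per_pixel, 2))  # 1.3
print(blocks_per_frame)            # 48600
print(round(cycles_per_block, 1))  # 55.6
```

With only about 1.3 clock cycles per luma pixel, or roughly 56 cycles per 8 × 8 block, there is little slack for memory stalls, which is consistent with the emphasis on a dual datapath and on reducing memory access contention.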