Similar Documents
20 similar documents found.
1.
A very-large-scale integration (VLSI) architecture for Reed-Solomon (RS) decoding is presented that is scalable with respect to the throughput rate. This architecture enables given system specifications to be matched efficiently, independent of any particular technology. The scalability is achieved by applying a systematic time-sharing technique. Based on this technique, new regular, multiplexed architectures have been derived for solving the key equation and performing finite-field divisions. In addition to this flexibility, the approach leads to a small silicon area in comparison with several previously published decoder implementations. The efficiency of the proposed architecture results from a fine-granular pipeline scheme throughout each of the RS decoder components and a small overhead for the control circuitry. Implementation examples show that, owing to the pipeline strategy, data rates up to 1.28 Gbit/s are reached in a 0.5 μm CMOS technology.
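As an illustrative aside, not taken from the cited paper: the finite-field divisions mentioned above can be mimicked in software, which may help readers unfamiliar with RS arithmetic. The sketch below divides in GF(2^m) by computing the inverse as a^(2^m-2); the field size m = 8 and the reduction polynomial 0x11D are common choices assumed here for illustration only.

```python
# Minimal sketch of GF(2^m) arithmetic; illustrates the finite-field division
# an RS decoder needs, not the paper's hardware divider.

M = 8
POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1, a common (assumed) reduction polynomial

def gf_mul(a: int, b: int) -> int:
    """Carry-less multiply of a and b, reduced modulo POLY."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M):
            a ^= POLY
    return result

def gf_inv(a: int) -> int:
    """Inverse via a^(2^M - 2), since a^(2^M - 1) = 1 for any nonzero a."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(2^m)")
    result, exp = 1, (1 << M) - 2
    while exp:                      # square-and-multiply exponentiation
        if exp & 1:
            result = gf_mul(result, a)
        a = gf_mul(a, a)
        exp >>= 1
    return result

def gf_div(a: int, b: int) -> int:
    return gf_mul(a, gf_inv(b))

assert gf_mul(gf_div(0x53, 0xCA), 0xCA) == 0x53  # division round-trips
```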

2.
A technique is proposed for implementing the realization algorithm directly in hardware from the input-output data of a system. The proposed technique results in low-cost, special-purpose devices that can quickly solve realization problems. Systolic arrays are developed for VLSI implementation of the different steps of the realization algorithm.

3.
Standard VLSI implementations of turbo decoding require substantial memory and incur a long latency, which cannot be tolerated in some applications. A parallel VLSI architecture for low-latency turbo decoding, comprising multiple single-input single-output (SISO) elements operating jointly on one turbo-coded block, is presented and compared to sequential architectures. A parallel interleaver is essential to process multiple concurrent SISO outputs. A novel parallel interleaver and an algorithm for its design are presented, achieving the same error-correction performance as the standard architecture. Latency is reduced up to 20 times and throughput for large blocks is increased up to six-fold relative to sequential decoders, while using the same silicon area and achieving a very high coding gain. The parallel architecture scales favorably: latency and throughput improve with increased block size and chip area.

4.
This paper introduces two arithmetic decoders that decode the residue number system into its binary equivalent. The first one handles the moduli set (2^n, 2^n-1, 2^n+1, 2^n-2^((n+1)/2)+1, 2^n+2^((n+1)/2)+1), while the other handles the moduli set (2^(n+1), 2^n-1, 2^n+1, 2^n-2^((n+1)/2)+1, 2^n+2^((n+1)/2)+1), where n is odd. Compact forms for the multiplicative inverse of each modulus are introduced, which facilitates the implementation of these arithmetic decoders. The proposed hardware realizations for these decoders are based on six carry-save adders and one carry-propagate adder. The hardware and time requirements of these decoders are much better than those of similar decoders found in the literature. A sub-micron silicon implementation of the decoder has been performed and reported.
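As an illustrative aside, and not the carry-save-adder decoders proposed in the paper: residue-to-binary conversion can in general be carried out with the Chinese remainder theorem, in which one multiplicative inverse per modulus appears, playing the role of the compact forms mentioned above. The sketch below assumes the first moduli set and uses n = 3 purely as an example.

```python
# Generic CRT reconstruction sketch, not the paper's hardware decoder.
from math import prod

def moduli_set(n):
    """First moduli set of the abstract for odd n (assumed interpretation)."""
    assert n % 2 == 1
    k = 2 ** ((n + 1) // 2)
    return [2**n, 2**n - 1, 2**n + 1, 2**n - k + 1, 2**n + k + 1]

def rns_to_binary(residues, moduli):
    """Chinese-remainder reconstruction: X = sum(r_i * M_i * inv(M_i, m_i)) mod M."""
    M = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)   # pow(.., -1, m) is the multiplicative inverse mod m
    return x % M

mods = moduli_set(3)                   # [8, 7, 9, 5, 13]
X = 12345
residues = [X % m for m in mods]
assert rns_to_binary(residues, mods) == X
```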

5.
Very-large-scale integration (VLSI) design methodology and implementation complexities of high-speed, low-power soft-input soft-output (SISO) a posteriori probability (APP) decoders are considered. These decoders are used in iterative algorithms based on turbo codes and related concatenated codes and have shown a significant advantage in error-correction capability compared to conventional maximum-likelihood decoders. This advantage, however, comes at the expense of increased computational complexity, decoding delay, and substantial memory overhead, all of which hinge primarily on the well-known recursion bottleneck of the SISO-APP algorithm. This paper provides a rigorous analysis of the requirements for computational hardware and memory at the architectural level, based on a tile-graph approach that models the resource-time scheduling of the recursions of the algorithm. The problem of constructing the decoder architecture and optimizing it for high speed and low power is formulated in terms of the individual recursion patterns which together form a tile graph according to a tiling scheme. Using the tile-graph approach, optimized architectures are derived for the various forms of the sliding-window and parallel-window algorithms known in the literature. A proposed tiling scheme of the recursion patterns, called hybrid tiling, is shown to be particularly effective in reducing the memory overhead of high-speed SISO-APP architectures. Simulations demonstrate that the proposed approach achieves savings in area and power in the range of 4.2%-53.1% over the state of the art.

6.
This paper is devoted to the VLSI implementation of a staged decoder for block-coded modulation (BCM). We first review a general parallel and pipelined implementation of the decoder and identify the parameters to be considered for optimization. A particular BCM scheme, based on the 8-PSK signal set, is chosen for a case study. Several ideas leading to a code-optimized design are described, and the hardware implementation is shown. Next, we evaluate the performance of our design. In particular, it is shown that, by exploiting regularity, a simple structure achieving a throughput rate of 10 Mbps can be implemented with 23 K transistors in a 2 μm standard-cell CMOS technology. Further optimization and simple stacking of ten processors on a single chip in a block-processing structure allow a throughput rate of 100 Mbps to be achieved with about 150 K transistors (38 K gates).

7.
The use of "turbo codes" has been proposed for several applications, including the development of wireless systems, where highly reliable transmission is required at very low signal-to-noise ratios (SNR). The problem of extracting the best coding gains from these kind of codes has been deeply investigated in the last years. Also the hardware implementation of turbo codes is a very challenging topic, mainly due to the iterative nature of the decoding process, which demands an operating frequency much higher than the data rate; in the case of wireless applications, the design constraints became even more strict due to the low-cost and low-power requirements. This paper first presents a new architecture for the decoder core with improved area and power dissipation properties; then partitioning techniques are proposed to reduce the power consumption of the decoder memories. It is proven that most of the power is dissipated by the large RAM units required by the decoder, so the described technique is very efficient: an average power saving of 70% with an area overhead of 23% has been obtained on a set of analyzed architectures.  相似文献   

8.
Turbo codes have received tremendous attention and have begun to see practical application due to their excellent error-correcting capability. Investigation of efficient iterative decoder realizations is of particular interest because the underlying soft-input soft-output decoding algorithms usually lead to highly complicated implementations. This paper describes the architectural design and analysis of sliding-window (SW) Log-MAP decoders in terms of a set of predetermined parameters. The derived mathematical representations can be applied to construct a variety of VLSI architectures for different applications. Based on our development, a SW Log-MAP decoder complying with the specification of third-generation mobile radio systems is realized to demonstrate the performance tradeoffs among latency, average decoding rate, area/computation complexity, and memory power consumption. This paper thus provides useful and general information on the practical implementation of SW Log-MAP decoders.
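As an illustrative aside, not the parameterized architecture of the paper: sliding-window decoding splits a coded block into windows and attaches a short warm-up (training) segment to each so the backward recursion can start from reliable state metrics. The sketch below only computes the index ranges such a scheduler would hand to the datapath; the block, window, and warm-up lengths are illustrative choices (5114 is the largest UMTS turbo interleaver size).

```python
def sliding_windows(block_len, window, warmup):
    """Return (start, end, warm_end) triples: the window proper is [start, end)
    and the backward recursion trains on [end, warm_end) before entering it.
    Purely an illustrative scheduler, not the paper's parameterization."""
    spans = []
    for start in range(0, block_len, window):
        end = min(start + window, block_len)
        warm_end = min(end + warmup, block_len)
        spans.append((start, end, warm_end))
    return spans

# Example: a 5114-bit block cut into 128-bit windows with a 32-step warm-up;
# window and warm-up lengths here are assumptions for illustration only.
for start, end, warm_end in sliding_windows(5114, 128, 32)[:3]:
    print(f"window [{start}, {end}), backward warm-up over [{end}, {warm_end})")
```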

9.
VLSI implementation of a high-speed, parameter-selectable dedicated elliptic curve cryptography chip   Cited by: 10 (self: 0, others: 10)
This paper studies the VLSI implementation of elliptic curve cryptosystems. Several new schemes for accelerating point multiplication on elliptic curves are given, covering the scheduling of the point-multiplication and group-operation layers as well as high-speed arithmetic over the underlying finite field. A new VLSI architecture is proposed for a high-speed dedicated elliptic curve cryptography chip whose field and curve parameters are selectable. Based on a 0.6 μm cell library, the chip area is about 36 mm². Post-synthesis simulation results show that the designed chip can efficiently complete the full digital-signature and identity-authentication flow; at 20 MHz the average time per signature is 62.67 ms, outperforming other chips of this kind reported to date.
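As an illustrative aside, not the datapath of the chip described above: elliptic-curve point multiplication is commonly computed by the double-and-add method, scanning the scalar bit by bit. The sketch below works over a toy prime-field curve; the curve parameters (p, a, b) and base point G are made-up illustrative values, not anything used by the chip.

```python
# Toy double-and-add point multiplication over a small prime-field curve
# y^2 = x^3 + a*x + b (mod p); parameters are illustrative only.

p, a, b = 97, 2, 3
O = None  # point at infinity

def point_add(P, Q):
    if P is O:
        return Q
    if Q is O:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return O
    if P == Q:
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    y3 = (lam * (x1 - x3) - y1) % p
    return (x3, y3)

def point_mul(k, P):
    """Left-to-right double-and-add: the scalar's bits schedule doublings and additions."""
    R = O
    for bit in bin(k)[2:]:
        R = point_add(R, R)          # always double
        if bit == '1':
            R = point_add(R, P)      # add when the bit is set
    return R

G = (3, 6)  # 6^2 = 36 and 3^3 + 2*3 + 3 = 36 (mod 97), so G lies on the curve
print(point_mul(20, G))
```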

10.
Memory requirements and critical path are essential concerns for the 2-D discrete wavelet transform (DWT). In this paper, we address this problem and develop a memory-efficient, high-speed architecture for multi-level two-dimensional DWT. First, a dual-data-scanning technique is adopted in the 2-D 9/7 DWT processing unit to perform the lifting operations, which doubles the throughput per cycle. Second, the proposed row transform unit and column transform unit take advantage of input-sample availability and provision computing resources accordingly to optimize processing speed; the number of processors is further optimized to significantly reduce hardware cost. Third, to address the high memory cost of the intermediate results from each level and the growth of computation time as the resolution level increases, multiple of the proposed 2-D DWT units are combined into a parallel multi-level architecture, which can perform up to six levels of 2-D DWT in a resolution-level-parallel way on an arbitrary image size at competitive hardware cost. Experimental results demonstrate that the proposed scheme achieves improved hardware performance with significantly reduced on-chip memory and computation time, outperforming state-of-the-art schemes and making it desirable for memory-constrained real-time application systems.
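As an illustrative aside, not the proposed dual-scanning hardware: the 9/7 DWT mentioned above is usually computed by lifting, a chain of predict/update passes plus a final scaling. The sketch below performs one 1-D pass with the standard CDF 9/7 lifting constants (the JPEG 2000 irreversible filter); the boundary handling and sign/scaling conventions follow one common variant and are assumptions of this sketch.

```python
# One 1-D pass of the CDF 9/7 lifting scheme (the kernel of a 9/7 2-D DWT).

ALPHA = -1.586134342059924
BETA  = -0.052980118572961
GAMMA =  0.882911075530934
DELTA =  0.443506852043971
K     =  1.230174104914001

def dwt97_1d(x):
    """Split x (even length assumed) into scaled approximation and detail lanes."""
    s = list(x[0::2])          # even samples -> approximation lane
    d = list(x[1::2])          # odd samples  -> detail lane
    n = len(d)

    def predict(coeff):        # d[i] += coeff * (s[i] + s[i+1]), mirrored at the end
        for i in range(n):
            right = s[i + 1] if i + 1 < len(s) else s[-1]
            d[i] += coeff * (s[i] + right)

    def update(coeff):         # s[i] += coeff * (d[i-1] + d[i]), mirrored at the start
        for i in range(len(s)):
            left = d[i - 1] if i > 0 else d[0]
            s[i] += coeff * (left + d[i])

    predict(ALPHA)             # predict 1
    update(BETA)               # update 1
    predict(GAMMA)             # predict 2
    update(DELTA)              # update 2
    return [v / K for v in s], [v * K for v in d]

approx, detail = dwt97_1d([float(v) for v in range(16)])
print(len(approx), len(detail))   # 8 low-pass and 8 high-pass coefficients
```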

11.
New VLSI architectures for fast convolutional threshold decoders that process soft-quantized channel symbols are presented. The new architectures feature pipelining and parallelism and make it possible to fabricate decoders for data rates up to hundreds of Mbit/s. With these architectures, the data rate is shown to be independent of the memory of the code, implying that fast AAPP (approximate a posteriori probability) decoders can be built for long, powerful codes. Furthermore, the architectures are convenient to use with both low and high coding rates. Using a typical example, it is shown that a soft-decision threshold decoder can provide a substantial coding gain while being less costly to implement than a hard-decision threshold decoder.

12.
Through an in-depth study of Booth-type multipliers, high-speed adder structures, and CSD-coded filter structures, a new high-speed CSD-coded filter architecture has been developed. Using this architecture, a high-speed inverse-sinc filter for a quadrature amplitude modulator was implemented in an Alcatel 0.35 μm CMOS process. The chip contains about 7500 gates and measures 1.00 mm × 0.42 mm.
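As an illustrative aside, not the filter architecture developed above: canonical signed-digit (CSD) coding rewrites a coefficient with digits {-1, 0, +1} such that no two nonzero digits are adjacent, so a constant multiplication reduces to a few shifts and add/subtract operations. A minimal sketch for non-negative coefficients:

```python
def csd(n: int) -> list:
    """Canonical signed-digit representation, least-significant digit first.
    Digits are in {-1, 0, +1} and no two nonzero digits are adjacent.
    Assumes a non-negative integer coefficient."""
    digits = []
    while n != 0:
        if n % 2:
            d = 2 - (n % 4)      # +1 if n = 1 (mod 4), -1 if n = 3 (mod 4)
            n -= d
        else:
            d = 0
        digits.append(d)
        n //= 2
    return digits

def csd_multiply(x: int, coeff: int) -> int:
    """A constant multiply becomes one shift-and-add/subtract per nonzero digit."""
    acc = 0
    for shift, d in enumerate(csd(coeff)):
        if d == 1:
            acc += x << shift
        elif d == -1:
            acc -= x << shift
    return acc

assert csd(23) == [-1, 0, 0, -1, 0, 1]      # 23 = 32 - 8 - 1
assert csd_multiply(7, 23) == 7 * 23
```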

13.
Turbo decoders inherently have large decoding latency and low throughput due to iterative decoding. To increase the throughput and reduce the latency, high-speed decoding schemes have to be employed. In this paper, following a discussion of basic parallel decoding architectures, the segmented sliding-window approach and two other types of area-efficient parallel decoding schemes are proposed. A detailed comparison of storage requirements, number of computation units, and overall decoding latency is provided for the various decoding schemes at different levels of parallelism. Hybrid parallel decoding schemes are proposed as an attractive solution for implementations with very high levels of parallelism. To reduce the storage bottleneck for each subdecoder, a modified version of the partial storage of state metrics approach is presented; the new approach achieves a better tradeoff between storage and recomputation in general. The application of the pipeline-interleaving technique to parallel turbo decoding architectures is also presented. Simulation results demonstrate that the proposed area-efficient parallel decoding schemes do not cause performance degradation.

14.
This paper analyses different VLSI architectures for 3GPP LTE/LTE-Advanced turbo decoders in terms of the trade-off between throughput and area requirements. Data-flow graphs for the standard SISO MAP (maximum a posteriori) turbo decoder, the SW-SISO MAP turbo decoder, and the PW-SISO MAP turbo decoder are presented and their performance analysed. Two variants of the quadratic permutation polynomial (QPP) interleaver are proposed, which simplify the implementation of the 'mod' operator and provide the best compromise between area, delay, and power dissipation. Implementation of a decoder using one variant of the QPP interleaver is also discussed. A novel approach for area optimisation is proposed to reduce the number of interleavers required for the parallel-window turbo decoder. Multi-port memory has also been used for the parallel turbo decoder. To increase throughput without any effective increase in area complexity, circuit-level pipelining and retiming have been used. The proposed architectures have been synthesised with Synopsys Design Compiler in a 45-nm CMOS technology.
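As an illustrative aside, the two interleaver variants of the paper are not reproduced here; the baseline LTE QPP interleaver maps index i to (f1*i + f2*i^2) mod K. The sketch below computes it both directly and with the well-known recursive update that needs only additions and conditional subtractions, the kind of 'mod'-operator simplification the abstract alludes to. The pair (f1, f2) = (3, 10) is the standardized LTE value for K = 40.

```python
K, F1, F2 = 40, 3, 10   # LTE QPP parameters for block length K = 40

def qpp_direct(i):
    return (F1 * i + F2 * i * i) % K

def qpp_table():
    """Recursive form: pi(i+1) = pi(i) + g(i) (mod K), g(i+1) = g(i) + 2*F2 (mod K).
    Only additions and conditional subtractions are needed -- no multiplier
    and no general 'mod' operator in hardware."""
    pi, g = 0, (F1 + F2) % K
    table = []
    for _ in range(K):
        table.append(pi)
        pi = (pi + g) % K          # in hardware: add, then subtract K if the sum >= K
        g = (g + 2 * F2) % K
    return table

table = qpp_table()
assert table == [qpp_direct(i) for i in range(K)]
assert sorted(table) == list(range(K))   # it is a permutation of 0..K-1
```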

15.
An all-digital architecture is presented for implementing the front-end signal-processing functions in a quadrature modulator and demodulator for high bit-rate digital radio applications. A pair of CMOS chips has been designed and submitted for fabrication in a 1.25-μm process and is expected to accommodate symbol rates up to 35 MBd. The modulator chip accepts a pair of 8-bit in-phase and quadrature data streams and generates a bandlimited IF output with an excess bandwidth factor of 35%. The demodulator chip accepts a digitized IF input signal and generates a pair of filtered in-phase and quadrature baseband signals. The modulator and demodulator chips each incorporate 40-tap multiplierless FIR (finite-impulse response) square-root Nyquist matched filters, and the cascade of the two chips achieves a peak intersymbol-interference distortion of -54 dB. The modulator chip can generate any arbitrary signal constellation within a rectangular grid of 256×256 points. Thus, the all-digital implementation results in a generic chip set suitable for a wide variety of high bit-rate digital modem designs using formats such as M-ary PSK and QAM.

16.
New electrostatic discharge (ESD) protection circuits for MOS/VLSI provide typical 2.7-ns delays and protection against voltage spikes up to 2200 V (the limit of the test circuit) in some cases. These circuits contain some traditional elements plus new features, including a gate-drain-connected thin-oxide device, to achieve the very low protected-node voltage (~2 V) required for advanced thin gate-oxide technologies. In addition, for CMOS applications, these all-NMOS (or PMOS) circuits would offer a high degree of latchup immunity. For both positive and negative spikes, single-pulse and repeated-pulse test data were obtained for six different test conditions. Electrical and physical analyses show the dominant failure modes. Because techniques used to improve protection tend to degrade speed, a figure of merit is proposed to assist a fair comparison between different ESD protection circuit designs.

17.
The architecture and implementation of a high-speed host interface   Cited by: 1 (self: 0, others: 1)
In the design of a high-speed network, the host network interface is a critical component in achieving high end-to-end throughput. Some of the architectural issues involved in host interfacing are discussed. These include the appropriate partitioning of functionality between host and interface and the choice of mechanism for data movement into, out of, and within the host. The general issues are considered in a specific example: the realization of a highly flexible host interface for a 622-Mb/s asynchronous transfer mode (ATM) network. The architecture of such an interface is described, and experimental results obtained from its prototype implementation are presented. The prototype will allow experimentation with a variety of scheduling and segmentation/reassembly algorithms, and with new transport protocols, while also delivering high bandwidths to the host.

18.
A family of multiprocessor architectures implementing the Viterbi algorithm is presented. The family of architectures is shown to be capable of achieving an increase in throughput that is directly proportional to the number of processors when the number of processors is smaller than the constraint length v of the code. The hardware utilization and the depth of the pipelining available inside each processor are also shown. An architecture with v-1 processors is found to be particularly advantageous, since it results in the maximum speedup and the simplest interconnection structure.
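As an illustrative aside, not one of the multiprocessor mappings studied in the paper: the kernel every Viterbi processor repeats is the add-compare-select (ACS) update of one path metric per trellis state. The sketch below decodes hard-decision symbols of the rate-1/2, constraint-length-3 convolutional code with generators (7, 5) octal, chosen here only as a small example.

```python
# Minimal hard-decision Viterbi decoder for the rate-1/2, K=3 code with
# generators (7, 5) octal; a software mirror of the ACS recursion.

G = (0b111, 0b101)            # generator polynomials
N_STATES = 4                  # 2^(K-1) states, state = (u[k-1], u[k-2])
INF = 10**9

def encode_bit(state, u):
    reg = (u << 2) | state                       # bits (u, u[k-1], u[k-2])
    out = tuple(bin(reg & g).count("1") % 2 for g in G)
    return ((u << 1) | (state >> 1)), out        # next state, two code symbols

def viterbi_decode(symbols):
    metrics = [0] + [INF] * (N_STATES - 1)       # start in the all-zero state
    paths = [[] for _ in range(N_STATES)]
    for r in symbols:
        new_metrics = [INF] * N_STATES
        new_paths = [None] * N_STATES
        for s in range(N_STATES):
            if metrics[s] >= INF:
                continue                          # state not yet reachable
            for u in (0, 1):
                ns, out = encode_bit(s, u)
                # ACS: add branch metric (Hamming distance), compare, select survivor
                m = metrics[s] + (out[0] ^ r[0]) + (out[1] ^ r[1])
                if m < new_metrics[ns]:
                    new_metrics[ns] = m
                    new_paths[ns] = paths[s] + [u]
        metrics, paths = new_metrics, new_paths
    return paths[metrics.index(min(metrics))]

# Round-trip check on a short, noiseless message (no tail bits, illustrative only).
msg = [1, 0, 1, 1, 0, 0, 1]
state, coded = 0, []
for u in msg:
    state, out = encode_bit(state, u)
    coded.append(out)
assert viterbi_decode(coded) == msg
```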

19.
This paper describes a scalable architecture for real-time speech recognizers based on word hidden Markov models (HMMs), which provide high recognition accuracy for word recognition tasks. However, the size of their recognition vocabulary is limited because their extremely high computational cost causes long processing times. To achieve high-speed operation, we developed a VLSI system with a scalable architecture. The architecture effectively exploits parallel computation over the word-HMM structure; it can reduce processing time and/or extend the word vocabulary. To explore the practicality of our architecture, we designed and evaluated a complete recognizer system, including speech-analysis and noise-robustness parts, on a 0.18-μm CMOS standard-cell library and a field-programmable gate array. In the CMOS standard-cell implementation, the total processing time is 56.9 μs/word at an operating frequency of 80 MHz in a single system. The recognizer gives a real-time response using an 800-word vocabulary.

20.
A general method is proposed for decoding any cyclic binary code at extremely high speed using only modulo-2 adders and threshold elements, and the decoders may be designed for maximum-likelihood decoding. The number of decoding cycles is a fraction of the number of digits in the code word.
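As an illustrative aside, not the threshold-decoding method of the paper itself: the modulo-2 adders it relies on are plain XORs, and the classic place they appear in a cyclic-code decoder is the syndrome computation, i.e., polynomial division of the received word by the generator polynomial. The sketch below uses the (7,4) cyclic Hamming code with g(x) = x^3 + x + 1 as an example.

```python
# Syndrome of a cyclic code by polynomial division with XOR only -- the
# software analogue of an LFSR built from modulo-2 adders.

def syndrome(received, generator):
    """Remainder of received(x) divided by generator(x) over GF(2).
    Bit lists are most-significant coefficient first."""
    r = received[:]                       # work on a copy
    g = generator
    for i in range(len(r) - len(g) + 1):
        if r[i]:                          # leading coefficient is 1: XOR g(x) in
            for j, gj in enumerate(g):
                r[i + j] ^= gj
    return r[-(len(g) - 1):]              # the remainder is the syndrome

g = [1, 0, 1, 1]                          # g(x) = x^3 + x + 1 for the (7,4) cyclic Hamming code
codeword = [0, 0, 1, 0, 1, 1, 0]          # x*g(x) = x^4 + x^2 + x, hence a valid codeword
assert syndrome(codeword, g) == [0, 0, 0]

corrupted = codeword[:]
corrupted[0] ^= 1                         # single bit error at the x^6 position
assert syndrome(corrupted, g) != [0, 0, 0]   # nonzero syndrome flags the error
```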
