Similar literature
1.
This paper presents novel very large scale integration (VLSI) architectures in support of an efficient implementation of Leighton's well-known Columnsort. The designs take advantage of reconfigurable bus architectures enhanced with simple shift switches. Our first main contribution is to show that Columnsort can be partitioned into two components: a hardware scheme involving the task of sorting arrays of small size and a hardware or software scheme that involves simple data movement tasks. Our second main contribution is to demonstrate that the dynamically reconfigurable mesh architecture can be exploited to obtain a small and efficient hardware sorter. The resulting architectures feature high regularity of circuitry, simplicity of control structure, and adaptability. Both theoretical analyses and simulation tests have shown that the proposed VLSI architectures for sorting are superior to existing designs in the context of sorting small and moderate size arrays.
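The partition described in this abstract can be illustrated in software. The sketch below follows the eight steps of Leighton's Columnsort on an r-by-s matrix stored column-major in a flat list; it is not the paper's hardware design. The names `columnsort` and `sort_columns` are illustrative, and the size conditions (r even, s dividing r, r at least 2(s-1)^2) are the usual textbook assumptions. Only `sort_columns` compares keys; every other step is a fixed data movement.

```python
def columnsort(values, r, s):
    """Sort r*s values laid out column-major in an r-by-s matrix."""
    n = r * s
    if len(values) != n or r % 2 or r % s or r < 2 * (s - 1) ** 2:
        raise ValueError("need len(values) == r*s, r even, s | r, r >= 2(s-1)^2")

    def sort_columns(flat, ncols):
        # The only comparison step: sort each length-r column on its own.
        out = []
        for j in range(ncols):
            out.extend(sorted(flat[j * r:(j + 1) * r]))
        return out

    flat = sort_columns(list(values), s)                   # step 1: sort columns
    # step 2: pick up column-major, put down row-major ("transpose")
    flat = [x for j in range(s) for x in flat[j::s]]
    flat = sort_columns(flat, s)                           # step 3: sort columns
    # step 4: inverse of step 2 (pick up row-major, put down column-major)
    flat = [flat[(k % s) * r + k // s] for k in range(n)]
    flat = sort_columns(flat, s)                           # step 5: sort columns
    half = r // 2
    # step 6: shift by r/2, padding with sentinels to form s + 1 columns
    padded = [float("-inf")] * half + flat + [float("inf")] * (r - half)
    padded = sort_columns(padded, s + 1)                   # step 7: sort columns
    return padded[half:half + n]                           # step 8: unshift


data = [(7 * i) % 36 for i in range(36)]   # a fixed permutation of 0..35
print(columnsort(data, 12, 3) == sorted(data))
```

In a hardware setting, `sort_columns` would correspond to the small-array sorter and the index permutations to the data-movement scheme.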

2.
Ray Liu  K.J. 《Electronics letters》1990,26(23):1962-1963
The Haar transform is very useful in many signal and image processing applications where real-time implementation is essential. Three VLSI computing architectures are proposed for fast implementation of the Haar transform. Comparisons on the advantages and disadvantages of the proposed architectures are also presented.
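For readers unfamiliar with the transform itself, here is a minimal software sketch of a multi-level 1-D Haar transform using unnormalised averages and differences. The function names and the normalisation are assumptions made for illustration; the abstract does not say which variant the three architectures compute.

```python
def haar_forward(x):
    """Multi-level Haar transform of a sequence whose length is a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = [float(v) for v in x]
    length = n
    while length > 1:
        half = length // 2
        # Pairwise averages (coarse signal) and differences (detail).
        avg = [(out[2 * i] + out[2 * i + 1]) / 2 for i in range(half)]
        diff = [(out[2 * i] - out[2 * i + 1]) / 2 for i in range(half)]
        out[:length] = avg + diff
        length = half
    return out


def haar_inverse(coeffs):
    """Undo haar_forward, rebuilding from the coarsest level outward."""
    out = list(coeffs)
    length = 2
    while length <= len(out):
        half = length // 2
        rebuilt = []
        for i in range(half):
            a, d = out[i], out[half + i]
            rebuilt += [a + d, a - d]
        out[:length] = rebuilt
        length *= 2
    return out


signal = [4, 6, 10, 12, 8, 6, 5, 5]
print(haar_forward(signal))
```

Each level needs only additions, subtractions, and a halving (a shift in fixed point), which is part of why the transform is attractive for real-time hardware.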

3.
New VLSI architectures for fast convolutional threshold decoders that process soft-quantized channel symbols are presented. The new architectures feature pipelining and parallelism and make it possible to fabricate decoders for data rates up to hundreds of Mbits per second. With these architectures, the data rate is shown to be independent of the memory of the code, implying that fast AAPP (approximate a posteriori probability) decoders can be built for long powerful codes. Furthermore, the architectures are convenient to use with low and high coding rates. Using a typical example it is shown that a soft-decision threshold decoder can provide a substantial coding gain while being less costly to implement than the hard-decision threshold decoder.

4.
A typical digital image communication system based on a number of ASIC architectural designs is currently under development. After partitioning the complete system into several stages, each selected algorithm can be implemented on a single ASIC based on an efficient architectural style. The required throughput for high-performance data compression and channel coding has been obtained due to the optimization of the critical path in the architectural design. Input/output (I/O) operations, which create the bottleneck for many image-processing algorithms, are handled by a dedicated I/O interface unit. Construction of each dedicated data path in the architecture is based on a limited parameterizable functional building block (FBB) library. The dedicated data paths have been constructed by partitioning the initial signal flow graph (SFG) into compatible graphs and by matching the graphs in each partition onto a collection of time-multiplexed FBBs. Hierarchically partitioned controllers were used to meet the high-throughput requirements. The ASIC architectures proposed are oriented to broadband integrated services digital networks (B-ISDN) as well as high-performance digital image compression systems.

5.
Modern real-time applications such as orthogonal frequency division multiplexing demand high-performance fast Fourier transform (FFT) designs with low area and few clock cycles. This paper proposes efficient FFT VLSI architectures using folded/parallel implementation. In the proposed folded FFT architecture, the number of cycles required to complete the operation is less than in single-path delay feedback (SDF)/multi-path delay commutator (MDC) architectures. In the proposed parallel FFT architecture, an N-point FFT is implemented using one N/2-point FFT without much extra hardware. Both proposed architectures are implemented for radix-2, radix-$2^2$, and radix-4 using a 45 nm technology library. The proposed parallel architecture achieves area reductions of 56.7% and 40.6% compared with the existing parallel-architecture-based 16-point radix-2 and radix-$2^2$ DIF FFTs, respectively. The proposed folded architecture achieves worst-path delay reductions of 65.5%, 51.1%, and 35.8% compared with the existing SDF-based 16-point radix-2, radix-$2^2$, and radix-4 DIF FFTs, respectively.

6.
This paper presents a generalized mixed-radix decimation-in-time (DIT) fast algorithm for computing the modified discrete cosine transform (MDCT) of the composite lengths $N = 2 \times q^m$, $m \ge 2$, where q is an odd positive integer. The proposed algorithm not only has the merits of parallelism and numerical stability, but also requires fewer multiplications than type-IV discrete cosine transform (DCT-IV) and type-II discrete cosine transform (DCT-II) based MDCT algorithms, owing to the optimized efficient length-(N/q) modules. The computation of the MDCT for composite lengths $N = q^m \times 2^n$, $m \ge 2$, $n \ge 2$, can then be realized by combining the proposed algorithm with the fast radix-2 MDCT algorithm developed for $N = 2^n$. The combined algorithm can be used for the computation of the length-12/36 MDCT used in MPEG-1/-2 layer III audio coding as well as the recently established wideband speech and audio coding standards such as G.729.1, where a length-640 MDCT is used. The inverse MDCT (IMDCT) can be realized by transposing the signal flow graph of the MDCT.

7.
The discrete wavelet transform (DWT) provides a new method for signal/image analysis where high frequency components are studied with finer time resolution and low frequency components with coarser time resolution. It decomposes a signal or an image into localized contributions for multiscale analysis. In this paper, we present a parallel pipelined VLSI array architecture for 2D dyadic separable DWT. The 2D data array is partitioned into non-overlapping groups of rows. All rows in a partition are processed in parallel, and consecutive partitions are pipelined. Moreover, multiple wavelet levels are computed in the same pipeline, and multiple DWT problems can also be pipelined. The whole computation requires a single scan of the image data array. Thus, it is suitable for on-line real-time applications. For an N×N image, an m-level DWT can be computed in […] time units on a processor costing no more than […], where q is the partition size, p is the length of the corresponding 1D DWT filters, C_m and C_a are the costs of a parallel multiplier and a parallel adder respectively, and a time unit is the time for a multiplication and an addition. For q=N m, the computing time reduces to […]. When a large number of DWT problems are pipelined, the computing time is about […] per problem.

8.
Malvar  H. 《Electronics letters》1986,22(7):352-353
A relationship between the discrete cosine transform (DCT) and the discrete Hartley transform (DHT) is derived. It leads to a new fast and numerically stable algorithm for the DCT.

9.
This paper presents several techniques for the very large-scale integration (VLSI) implementation of the maximum a posteriori (MAP) algorithm. In general, knowledge about the implementation of the Viterbi (1967) algorithm can be applied to the MAP algorithm. Bounds are derived for the dynamic range of the state metrics which enable the designer to optimize the word length. The computational kernel of the algorithm is the add-MAX* operation, which is the add-compare-select operation of the Viterbi algorithm with an added offset. We show that the critical path of the algorithm can be reduced if the add-MAX* operation is reordered into an offset-add-compare-select operation by adjusting the location of registers. A general scheduling for the MAP algorithm is presented which gives the tradeoffs between computational complexity, latency, and memory size. Some of these architectures eliminate the need for RAM blocks with unusual form factors or can replace the RAM with registers. These architectures are suited to VLSI implementation of turbo decoders.
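The add-MAX* kernel mentioned in this abstract can be written out numerically. The sketch below uses the standard Jacobian-logarithm identity max*(a, b) = ln(e^a + e^b) = max(a, b) + ln(1 + e^(-|a-b|)); the function names and the two-branch form of `add_max_star` are illustrative assumptions, and the register reordering the paper proposes is a hardware timing matter that this floating-point model does not capture.

```python
import math


def max_star(a, b):
    """Compare-select (max) plus a correction offset that depends on |a - b|."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def add_compare_select(metric0, branch0, metric1, branch1):
    """Viterbi kernel: add branch metrics, keep the larger path metric."""
    return max(metric0 + branch0, metric1 + branch1)


def add_max_star(metric0, branch0, metric1, branch1):
    """MAP kernel: the same adds, but max is replaced by max*."""
    return max_star(metric0 + branch0, metric1 + branch1)


acs = add_compare_select(1.0, 0.5, 0.2, 1.0)
ams = add_max_star(1.0, 0.5, 0.2, 1.0)
print(acs, ams, ams - acs)  # the difference is the added offset
```

Because the offset is bounded by ln 2 and depends only on |a - b|, hardware designs commonly approximate it with a small table or constant; that choice is outside what the abstract states.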

10.
Typographical errors were printed in the print and online versions of the original article. The correct versions of Equation 25 (page 168), Figure 6(c) (page 169), and Table 2 (page 170) are printed below, respectively. The online version of the original article can be found at

11.
12.
The real time implementation of an efficient signal compression technique, Vector Quantization (VQ), is of great importance to many digital signal coding applications. In this paper, we describe a new family of bit level systolic VLSI architectures which offer an attractive solution to this problem. These architectures are based on a bit serial, word parallel approach and high performance and efficiency can be achieved for VQ applications of a wide range of bandwidths. Compared with their bit parallel counterparts, these bit serial circuits provide better alternatives for VQ implementations in terms of performance and cost.

13.
Very large scale integration (VLSI) design methodology and implementation complexities of high-speed, low-power soft-input soft-output (SISO) a posteriori probability (APP) decoders are considered. These decoders are used in iterative algorithms based on turbo codes and related concatenated codes and have shown significant advantage in error correction capability compared to conventional maximum likelihood decoders. This advantage, however, comes at the expense of increased computational complexity, decoding delay, and substantial memory overhead, all of which hinge primarily on the well-known recursion bottleneck of the SISO-APP algorithm. This paper provides a rigorous analysis of the requirements for computational hardware and memory at the architectural level based on a tile-graph approach that models the resource-time scheduling of the recursions of the algorithm. The problem of constructing the decoder architecture and optimizing it for high speed and low power is formulated in terms of the individual recursion patterns which together form a tile graph according to a tiling scheme. Using the tile-graph approach, optimized architectures are derived for the various forms of the sliding-window and parallel-window algorithms known in the literature. A proposed tiling scheme of the recursion patterns, called hybrid tiling, is shown to be particularly effective in reducing memory overhead of high-speed SISO-APP architectures. Simulations demonstrate that the proposed approach achieves savings in area and power in the range of 4.2%-53.1% over state of the art.

14.
Great interest has been attracted in recent years by a new error-correcting code technique, known as “turbo coding”, which has been proven to offer performance closer to the Shannon limit than traditional concatenated codes. In this paper, several very large scale integration (VLSI) architectures suitable for turbo decoder implementation are proposed and compared in terms of complexity and performance; the impact on the VLSI complexity of system parameters such as the state number, number of iterations, and code rate is evaluated for the different solutions. The results of this architectural study have then been exploited for the design of a specific decoder, implementing a serial concatenation scheme with 2/3 and 3/4 codes; the designed circuit occupies 35 mm², supports a 2 Mb/s data rate, and, for a bit error probability of $10^{-6}$, yields a coding gain larger than 7 dB with ten iterations.

15.
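By converting the computation of the full-search vector quantization (FSVQ) algorithm into inner-product operations and applying the Baugh-Wooley algorithm, this paper presents a new and efficient two's-complement VLSI architecture for FSVQ. Owing to its regularity and modularity, the architecture can be applied efficiently in VLSI implementations of speech, image, and video coding.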
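The conversion to inner products can be checked with a few lines of arithmetic. Since ||x - c||^2 = ||x||^2 - 2(x·c - ||c||^2/2) and ||x||^2 is the same for every codeword, the nearest codeword is the one that maximises x·c - ||c||^2/2. The sketch below is one plausible reading of that idea; the function names and the tiny codebook are made up for illustration, and the Baugh-Wooley two's-complement multiplier itself is not modelled.

```python
def nearest_by_distance(x, codebook):
    """Reference full search: minimise the squared Euclidean distance."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(range(len(codebook)), key=lambda i: dist2(codebook[i]))


def nearest_by_inner_product(x, codebook):
    """Same search, expressed as an inner product plus a per-codeword bias."""
    # The bias -||c||^2 / 2 depends only on the codebook, so it can be stored.
    bias = [-sum(v * v for v in c) / 2 for c in codebook]

    def score(i):
        return sum(a * b for a, b in zip(x, codebook[i])) + bias[i]
    return max(range(len(codebook)), key=score)


codebook = [(0, 0), (4, 0), (0, 4), (4, 4)]   # hypothetical 2-D codebook
vector = (3, 1)
print(nearest_by_distance(vector, codebook), nearest_by_inner_product(vector, codebook))
```

The per-input work then reduces to multiply-accumulate operations, which is the form that maps naturally onto a regular multiplier array.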

16.
The paper describes the design and parallel computation of a regularised fast Hartley transform (FHT), to be used for computation of the discrete Fourier transform (DFT) of real-valued data. For the processing of such data, the FHT has attractions over the fast Fourier transform (FFT) in terms of reduced arithmetic operation counts and reduced memory requirement, whilst its bilateral property means it may be straightforwardly applied to both forward and inverse DFTs. A drawback, however, of conventional FHT algorithms lies in the loss of regularity arising from the need for two sizes of 'butterfly' for efficient fixed-radix implementations. A generic double butterfly is therefore developed for the radix-4 FHT which overcomes the problem in an elegant fashion. The result is a recursive single-butterfly solution, referred to as the regularised FHT, which lends itself naturally to parallelisation and to mapping onto a regular computational structure for implementation with algorithmically specialised hardware.
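How a Hartley transform yields the DFT of real data, and what the bilateral property means, can be shown with a direct (slow) implementation. The sketch below uses the O(N^2) definition H[k] = Σ x[t]·(cos θ + sin θ) with θ = 2πkt/N, not the regularised radix-4 FHT of the paper; all function names are illustrative.

```python
import cmath
import math


def dht(x):
    """Direct discrete Hartley transform of a real sequence."""
    n = len(x)
    return [sum(x[t] * (math.cos(2 * math.pi * k * t / n) +
                        math.sin(2 * math.pi * k * t / n)) for t in range(n))
            for k in range(n)]


def dft_from_dht(h):
    """Real part is the even part of H, imaginary part is minus the odd part."""
    n = len(h)
    return [complex((h[k] + h[-k % n]) / 2, (h[-k % n] - h[k]) / 2)
            for k in range(n)]


def dft(x):
    """Reference DFT for comparison."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


x = [1.0, 2.0, 0.5, -1.0, 3.0, 0.0]
h = dht(x)
print(max(abs(a - b) for a, b in zip(dft_from_dht(h), dft(x))))
# Bilateral property: applying the DHT twice returns N times the input.
print([round(v / len(x), 9) for v in dht(h)])
```

The transform stays entirely in real arithmetic until the final combination step, which is where the operation-count and memory savings for real-valued data come from.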

17.
We present an iterative nonparametric approach to spectral estimation that is particularly suitable for estimation of line spectra. This approach minimizes a cost function derived from Bayes' theorem. The method is suitable for line spectra since a “long tailed” distribution is used to model the prior distribution of spectral amplitudes. Since the data themselves are used as constraints, phase information can also be recovered and used to extend the data outside the original window. The objective function is formulated in terms of hyperparameters that control the degree of fit and spectral resolution. Noise rejection can also be achieved by truncating the number of iterations. Spectral resolution and extrapolation length are controlled by a single parameter. When this parameter is large compared with the spectral powers, the algorithm leads to zero extrapolation of the data, and the estimated Fourier transform yields the periodogram. When the data are sampled at a constant rate, the algorithm uses one Levinson recursion per iteration. For irregular sampling, the algorithm uses one Cholesky decomposition per iteration. The performance of the algorithm is illustrated with three different problems that arise in geophysical data: (1) harmonic retrieval from a time series contaminated with noise; (2) linear event detection from a finite aperture array of receivers; (3) interpolation/extrapolation of gapped data. The performance of the algorithm as a spectral estimator is tested with the Kay and Marple (1981) data set.

18.
The Fourier transform over finite fields is mainly required in the encoding and decoding of Reed-Solomon and BCH codes. An algorithm for computing the Fourier transform over any finite field $GF(p^m)$ is introduced. It requires only $O(n (\log n)^2 / 4)$ additions and the same number of multiplications for an n-point transform and allows in some fields a further reduction of the number of multiplications to $O(n \log n)$. Because of its highly regular structure, this algorithm can be easily implemented in VLSI technology.
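As a point of reference for what is being computed, the sketch below evaluates the finite-field Fourier transform directly from its definition, X[k] = Σ x[t]·ω^(kt), in the prime field GF(17) with ω = 2, an element of order 8. This is the naive O(n^2) method restricted to the case m = 1, not the fast algorithm the abstract describes; the field, the root, and the function names are assumptions chosen to keep the example small.

```python
def gf_dft(x, omega, p):
    """n-point Fourier transform over GF(p); omega must have order n = len(x)."""
    n = len(x)
    return [sum(x[t] * pow(omega, k * t, p) for t in range(n)) % p
            for k in range(n)]


def gf_idft(spectrum, omega, p):
    """Inverse transform: use omega^-1 and scale by n^-1 (both via Fermat)."""
    n = len(spectrum)
    inv_n = pow(n, p - 2, p)
    inv_omega = pow(omega, p - 2, p)
    return [inv_n * v % p for v in gf_dft(spectrum, inv_omega, p)]


P, OMEGA = 17, 2          # 2^8 = 256 = 1 (mod 17) and 2^4 = 16 != 1
x = [1, 2, 3, 4, 5, 6, 7, 8]
spectrum = gf_dft(x, OMEGA, P)
print(spectrum)
print(gf_idft(spectrum, OMEGA, P) == x)
```

All arithmetic is exact modular arithmetic, so there is no rounding error; a fast algorithm reduces the operation count but must produce the same spectrum.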

19.
Novel architectures for the 1-D and 2-D discrete wavelet transform (DWT) using lifting schemes are presented in this paper. An embedded decimation technique is exploited to optimize the architecture for 1-D DWT, which is designed to receive an input and generate an output with the low- and high-frequency components of the original data being available alternately. Based on this 1-D DWT architecture, an efficient line-based architecture for 2-D DWT is further proposed by employing parallel and pipeline techniques; it is mainly composed of two horizontal filter modules and one vertical filter module, working in parallel and pipeline fashion with 100% hardware utilization. This 2-D architecture, called the fast architecture (FA), can perform J levels of decomposition for an N × N image in approximately $2N^2(1 - 4^{-J})/3$ internal clock cycles. Moreover, another efficient generic line-based 2-D architecture is proposed by exploiting the parallelism among four subband transforms in lifting-based 2-D DWT, which can perform J levels of decomposition for an N × N image in approximately $N^2(1 - 4^{-J})/3$ internal clock cycles; hence, it is called the high-speed architecture. The throughput rate of the latter is twice that of the former 2-D architecture, at only a small additional hardware cost. Compared with the works reported in previous literature, the proposed architectures for 2-D DWT are efficient alternatives in the tradeoff among hardware cost, throughput rate, output latency, and control complexity.
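A lifting step and the two cycle-count expressions can be sketched as follows. The abstract does not say which wavelet filters the architectures implement, so the example uses the simplest integer lifting pair (a Haar-style predict and update) purely as an assumption; `lift_forward`, `lift_inverse`, `fa_cycles`, and `hs_cycles` are illustrative names.

```python
def lift_forward(x):
    """One level of integer lifting on an even-length sequence."""
    even, odd = x[0::2], x[1::2]
    high = [o - e for e, o in zip(even, odd)]          # predict: detail
    low = [e + (d >> 1) for e, d in zip(even, high)]   # update: coarse average
    return low, high


def lift_inverse(low, high):
    """Undo the update, then the predict, then re-interleave."""
    even = [s - (d >> 1) for s, d in zip(low, high)]
    odd = [d + e for d, e in zip(high, even)]
    x = [0] * (2 * len(low))
    x[0::2], x[1::2] = even, odd
    return x


def fa_cycles(n, levels):
    """Approximate cycles quoted for the fast architecture: 2N^2(1 - 4^-J)/3."""
    return 2 * n * n * (1 - 4.0 ** -levels) / 3


def hs_cycles(n, levels):
    """Approximate cycles quoted for the high-speed architecture: N^2(1 - 4^-J)/3."""
    return n * n * (1 - 4.0 ** -levels) / 3


samples = [5, 7, 3, 4, 10, 2, 8, 8]
low, high = lift_forward(samples)
print(low, high, lift_inverse(low, high) == samples)
print(fa_cycles(512, 3), hs_cycles(512, 3))
```

Because the inverse simply replays the same steps with the signs flipped, lifting reconstructs the input exactly even with integer rounding, and the low and high outputs are produced from alternating input samples, consistent with the alternating output described in the abstract.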

20.
High-speed VLSI architectures for the AES algorithm   Total citations: 1, self-citations: 0, citations by others: 1
This paper presents novel high-speed architectures for the hardware implementation of the Advanced Encryption Standard (AES) algorithm. Unlike previous works which rely on look-up tables to implement the SubBytes and InvSubBytes transformations of the AES algorithm, the proposed design employs combinational logic only. As a direct consequence, the unbreakable delay incurred by look-up tables in the conventional approaches is eliminated, and the advantage of subpipelining can be further explored. Furthermore, composite field arithmetic is employed to reduce the area requirements, and different implementations for the inversion in subfield $GF(2^4)$ are compared. In addition, an efficient key expansion architecture suitable for the subpipelined round units is also presented. Using the proposed architecture, a fully subpipelined encryptor with 7 substages in each round unit can achieve a throughput of 21.56 Gbps on a Xilinx XCV1000 e-8 bg560 device in non-feedback modes, which is faster and is 79% more efficient in terms of equivalent throughput/slice than the fastest previous FPGA implementation known to date.
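SubBytes can be computed rather than looked up because it is defined as a multiplicative inverse in GF(2^8) followed by a fixed affine map. The sketch below does this arithmetic directly in GF(2^8) with the AES polynomial x^8 + x^4 + x^3 + x + 1; it does not reproduce the paper's composite-field decomposition over GF(2^4), and the function names are illustrative.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B      # reduce as soon as the degree reaches 8
        b >>= 1
    return result


def gf_inv(a):
    """Inverse as a^254 by square-and-multiply; 0 maps to 0 as AES requires."""
    result, base, exponent = 1, a, 254
    while exponent:
        if exponent & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        exponent >>= 1
    return result


def rotl8(x, k):
    """Rotate a byte left by k bits."""
    return ((x << k) | (x >> (8 - k))) & 0xFF


def sub_byte(a):
    """AES S-box: field inversion followed by the affine transformation."""
    b = gf_inv(a)
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63


print(hex(sub_byte(0x00)), hex(sub_byte(0x53)))
```

Every operation here is an XOR, shift, or AND, which is why the transformation can be built from combinational logic and cut into pipeline substages; a composite-field design replaces `gf_inv` with smaller operations in a subfield.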
