
1.

An Efficient In-Place VLSI Architecture for Viterbi Algorithm

Yun-Nan Chang 《The Journal of VLSI Signal Processing》2003,33(3):317-324

This paper presents a novel design of Viterbi decoder based on in-place state metric update and hybrid survivor path management. By exploiting the in-place computation feature of the Viterbi algorithm, the proposed design methodology can result in high-speed and modular architectures suitable for Viterbi applications with large constraint lengths. This feature is applied not only to the design of highly regular ACS units but also, for the first time, to the design of trace-back units. The proposed hybrid survivor path management, based on a combination of register-exchange and trace-back schemes, can reduce not only the number of memory operations but also the size of the memory required. Compared with the general hybrid trace-back structure, the overhead of the register-exchange circuit in our architecture is significantly lower. Therefore, the proposed architecture can find promising applications in digital communication systems where high-speed, large-state Viterbi decoders are desirable.
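The add-compare-select (ACS) recursion and trace-back that the abstract refers to can be illustrated with a small software model. This is a minimal sketch, not the paper's in-place or hybrid architecture: it assumes a hypothetical rate-1/2, constraint-length-3 code with generators 7 and 5 (octal), hard-decision Hamming branch metrics, and a plain trace-back over stored survivor decisions.

```python
# Assumed example code (not from the paper): rate 1/2, K = 3, generators 7 and 5 (octal).
K = 3
GENS = (0b111, 0b101)
N_STATES = 1 << (K - 1)


def encode(bits):
    """Convolutional encoder; the state holds the K-1 most recent input bits."""
    state, out = 0, []
    for b in bits:
        reg = (b << (K - 1)) | state              # newest bit enters at the MSB
        out.extend(bin(reg & g).count("1") & 1 for g in GENS)
        state = reg >> 1
    return out


def viterbi_decode(received):
    """Hard-decision Viterbi decoding with ACS updates and a final trace-back."""
    INF = float("inf")
    metrics = [0] + [INF] * (N_STATES - 1)        # encoder starts in state 0
    decisions = []                                 # survivor predecessor per step and state
    for t in range(0, len(received), 2):
        r = received[t:t + 2]
        new_metrics = [INF] * N_STATES
        prev_of = [0] * N_STATES
        for s in range(N_STATES):
            if metrics[s] == INF:
                continue
            for b in (0, 1):
                reg = (b << (K - 1)) | s
                nxt = reg >> 1
                expected = [bin(reg & g).count("1") & 1 for g in GENS]
                branch = sum(e != x for e, x in zip(expected, r))   # Hamming distance
                cand = metrics[s] + branch                           # add
                if cand < new_metrics[nxt]:                          # compare and select
                    new_metrics[nxt] = cand
                    prev_of[nxt] = s
        metrics = new_metrics
        decisions.append(prev_of)
    # Trace-back: start from the best final state and follow the stored predecessors.
    state = min(range(N_STATES), key=lambda s: metrics[s])
    decoded = []
    for prev_of in reversed(decisions):
        decoded.append(state >> (K - 2))          # input bit = MSB of the state it led to
        state = prev_of[state]
    return decoded[::-1]
```

Here the whole decision history is kept in memory; hardware trace-back units instead process it in fixed-length windows, which is where register-exchange/trace-back hybrids such as the one above save memory operations.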

2.

A Fast Computational Algorithm for the Discrete Cosine Transform (total citations: 2; self-citations: 0; citations by others: 2)

Wen-Hsiung Chen, Smith C., Fralick S. 《Communications, IEEE Transactions on》1977,25(9):1004-1009

A Fast Discrete Cosine Transform algorithm has been developed which provides a factor of six improvement in computational complexity when compared to conventional Discrete Cosine Transform algorithms using the Fast Fourier Transform. The algorithm is derived in the form of matrices and illustrated by a signal-flow graph, which may be readily translated to hardware or software implementations.
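Fast DCT algorithms save multiplications by exploiting the symmetry of the cosine kernel. The sketch below is not the Chen, Smith, and Fralick matrix factorization; it is a different recursive even/odd decomposition, used here only to illustrate the general idea: the even-indexed outputs are a half-length DCT of folded sums, and the odd-indexed outputs come from a half-length DCT of scaled differences. It assumes the unnormalized DCT-II and a power-of-two length.

```python
import math


def dct_direct(x):
    """Reference O(N^2) DCT-II: X[k] = sum_n x[n] * cos(pi * (2n + 1) * k / (2N))."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N)) for n in range(N))
            for k in range(N)]


def dct_fast(x):
    """Recursive even/odd split of the DCT-II; len(x) must be a power of two."""
    N = len(x)
    if N == 1:
        return [x[0]]
    half = N // 2
    # Even outputs X[2m]: a half-length DCT of the folded sums x[n] + x[N-1-n].
    u = [x[n] + x[N - 1 - n] for n in range(half)]
    # Odd outputs X[2m+1]: differences scaled by 1 / (2 cos(pi (2n+1) / (2N))).
    v = [(x[n] - x[N - 1 - n]) / (2 * math.cos(math.pi * (2 * n + 1) / (2 * N)))
         for n in range(half)]
    U = dct_fast(u)
    V = dct_fast(v) + [0.0]              # the term V[half] vanishes by construction
    X = [0.0] * N
    for m in range(half):
        X[2 * m] = U[m]
        X[2 * m + 1] = V[m] + V[m + 1]   # recombination via 2cos(a)cos(b) identity
    return X
```

Each level replaces one length-N transform by two length-N/2 transforms plus N/2 scalings and a few additions, which is the source of the O(N log N) operation count.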

3.

An Efficient Folded Architecture for Lifting-Based Discrete Wavelet Transform

Guangming Shi, Weifeng Liu, Li Zhang, Fu Li 《Circuits and Systems II: Express Briefs, IEEE Transactions on》2009,56(4):290-294

In this brief, an efficient folded architecture (EFA) for the lifting-based discrete wavelet transform (DWT) is presented. The proposed EFA is based on a novel form of the lifting scheme given in this brief. Owing to this form, the conventional serial operations of the lifting data flow can be optimized into parallel ones by employing parallel and pipeline techniques. The corresponding optimized architecture (OA) has a short critical path latency and is repeatable. Exploiting this repeatability, the EFA is derived from the OA by applying the folding technique. In the proposed EFA, hardware utilization reaches 100%, and the number of required registers is reduced. Additionally, shift-add operations are adopted to optimize the multiplications, which makes the proposed architecture more suitable for hardware implementation. Performance comparisons and field-programmable gate array (FPGA) implementation results indicate that the proposed EFA performs better in critical path latency, hardware cost, and control complexity.
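To make the lifting data flow concrete, the sketch below performs one level of a lifting DWT as a predict step followed by an update step. The filter is an assumption: the brief does not fix one here, so the reversible LeGall 5/3 wavelet is used, because its lifting coefficients (1/2 and 1/4) reduce to right shifts, in the spirit of the shift-add optimization mentioned above. Borders use symmetric extension.

```python
def lift_53_forward(x):
    """One level of the reversible LeGall 5/3 lifting DWT for an even-length integer list.

    Returns (approximation s, detail d). Division by 2 and 4 is done with right shifts.
    """
    assert len(x) % 2 == 0
    even, odd = x[0::2], x[1::2]
    h = len(odd)
    # Predict: detail = odd sample minus the floored mean of its two even neighbours.
    # Symmetric extension at the right border mirrors x[N] to x[N-2].
    d = [odd[n] - ((even[n] + even[min(n + 1, h - 1)]) >> 1) for n in range(h)]
    # Update: smooth the even samples with neighbouring details (d[-1] mirrored to d[0]).
    s = [even[n] + ((d[max(n - 1, 0)] + d[n] + 2) >> 2) for n in range(h)]
    return s, d


def lift_53_inverse(s, d):
    """Undo the lifting steps in reverse order; reconstruction is exact for integers."""
    h = len(d)
    even = [s[n] - ((d[max(n - 1, 0)] + d[n] + 2) >> 2) for n in range(h)]
    odd = [d[n] + ((even[n] + even[min(n + 1, h - 1)]) >> 1) for n in range(h)]
    x = []
    for e, o in zip(even, odd):
        x += [e, o]
    return x
```

Because each lifting step only adds a function of the other half of the samples, it can be undone by subtracting the same quantity, so integer rounding does not break perfect reconstruction.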

4.

Fast Discrete Cosine Transform via Computation of Moments (total citations: 2; self-citations: 0; citations by others: 2)

J.G. Liu, H.F. Li, F.H.Y. Chan, F.K. Lam 《The Journal of VLSI Signal Processing》1998,19(3):257-268

The discrete cosine transform (DCT) is widely used in signal processing. This paper presents a novel approach to computing the DCT. The DCT is expressed in terms of discrete moments via triangle function transforms followed by a Taylor series expansion. On this basis, a fast systolic array for computing moments is adapted to compute the DCT with only a few multiplications and without any cosine evaluations. The systolic array offers pipelinability, regularity, modularity, local connectivity, and scalability, which make it very suitable for VLSI implementation. We provide an estimate of the realizability of our array in a 0.5 μm CMOS technology and comparisons with other methods. The execution time of the systolic array is only O(N log₂ N / log₂ log₂ N) for computing a 1-D N-point DCT when N is sufficiently large. The approach is also applicable to multidimensional DCTs and inverse DCTs.
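The core link between the DCT and moments can be shown numerically. The sketch below is a simplified illustration, not the paper's algorithm: it skips the triangle function transform and directly expands the cosine kernel in a Taylor series, so each DCT coefficient becomes a weighted sum of even-order discrete moments. The number of series terms (`terms`) is an assumed parameter, and the plain Taylor form loses accuracy through cancellation as N grows, which is one reason a more careful formulation is needed in practice.

```python
import math


def dct_direct(x):
    """Reference unnormalized DCT-II."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N)) for n in range(N))
            for k in range(N)]


def dct_via_moments(x, terms=60):
    """DCT-II as a weighted sum of even-order discrete moments (no cosine evaluations).

    cos(t) = sum_j (-1)^j t^(2j) / (2j)!, with t = (pi*k/N) * (n + 1/2), gives
    X[k] = sum_j (-1)^j (pi*k/N)^(2j) / (2j)! * M[2j],
    where M[p] = sum_n x[n] * (n + 1/2)^p is the p-th discrete moment.
    """
    N = len(x)
    moments = [sum(xn * (n + 0.5) ** (2 * j) for n, xn in enumerate(x))
               for j in range(terms)]
    out = []
    for k in range(N):
        a = math.pi * k / N
        out.append(sum((-1) ** j * a ** (2 * j) / math.factorial(2 * j) * moments[j]
                       for j in range(terms)))
    return out
```

The moments are computed once and shared by every output k, which is what allows a single moment-generating array to serve the whole transform.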

5.

An Efficient VLSI Architecture for Computation of Discrete Fractional Fourier Transform

Kailash Chandra Ray, M. V. N. V. Prasad, Anindya Sundar Dhar 《Journal of Signal Processing Systems》2018,90(11):1569-1580

For decades, the fractional Fourier transform (FrFT) has attracted researchers from various domains, such as signal and image processing. These applications require the FrFT to be computed with low complexity. In this paper, the FrFT is simplified to reduce its complexity, and an efficient CORDIC-based architecture for computing the discrete fractional Fourier transform (DFrFT) is proposed; it reduces the computational complexity and hardware requirements and provides the flexibility to change the user-defined fractional angle to compute the DFrFT on the fly. The architectural design and working method of the proposed architecture, along with its constituent blocks, are discussed, and its hardware complexity and throughput are also presented. Finally, a DFrFT architecture of order sixteen is implemented in Verilog HDL and synthesized targeting the FPGA device "XLV5LX110T". Hardware simulation is performed for functional verification and compared with MATLAB simulation results. The physical implementation results show that the design can operate at a maximum frequency of 154 MHz with a latency of 63 clock cycles.

6.

A New Fast Algorithm for the Two-Dimensional Discrete Cosine Transform

Wang Xincheng, Li Shuliang, Lu Jie, Zhu Weile (王新成, 李叔梁, 卢颉, 朱维乐) 《Acta Electronica Sinica (电子学报)》1995,23(9):118-121

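A new fast algorithm for the two-dimensional discrete cosine transform is presented. For an N×N DCT (N = 2^m), it requires only N one-dimensional DCTs plus a number of additions. Compared with the conventional row-column method, the number of multiplications is halved and is also lower than that of other fast algorithms, while the number of additions is essentially the same.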

7.

An Efficient VLSI Architecture for the Computation of 1-D Discrete Wavelet Transform

A.B. Premkumar, A.S. Madhukumar 《The Journal of VLSI Signal Processing》2002,31(3):231-241

This paper presents a new architecture for the VLSI implementation of the one-dimensional discrete wavelet transform (DWT). For orthogonal wavelets, the architecture uses a single filter to generate both the DWT coefficients and the scaling function, as opposed to the conventional two-filter approach. For multilevel decomposition, the fold-back architecture principle, which interleaves the decimated scaling function back into the filter for subsequent levels, is applied. The limited use of memory in the design enables an efficient VLSI implementation of the DWT computation.
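The two ideas in this abstract, one coefficient set serving both bands and the approximation being fed back for the next level, can be sketched in software. This is a minimal model under stated assumptions, not the paper's hardware: it uses the Daubechies-4 filter as an example orthogonal wavelet, derives the high-pass filter from the low-pass one by the alternating-flip relation g[n] = (-1)^n h[L-1-n], and uses periodic extension at the borders.

```python
import math

SQ3 = math.sqrt(3.0)
# Daubechies-4 low-pass coefficients: an assumed example filter, not fixed by the paper.
H = [(1 + SQ3) / (4 * math.sqrt(2)), (3 + SQ3) / (4 * math.sqrt(2)),
     (3 - SQ3) / (4 * math.sqrt(2)), (1 - SQ3) / (4 * math.sqrt(2))]


def highpass_from_lowpass(h):
    """For orthogonal wavelets g[n] = (-1)^n * h[L-1-n], so one coefficient set serves both bands."""
    L = len(h)
    return [(-1) ** n * h[L - 1 - n] for n in range(L)]


def analysis_level(x, h, g):
    """Filter and decimate by 2, with periodic extension at the borders."""
    N = len(x)
    approx, detail = [], []
    for k in range(N // 2):
        window = [x[(2 * k + n) % N] for n in range(len(h))]
        approx.append(sum(c * s for c, s in zip(h, window)))
        detail.append(sum(c * s for c, s in zip(g, window)))
    return approx, detail


def dwt_fold_back(x, levels, h=H):
    """Multilevel DWT: the decimated approximation is fed back into the same filter routine."""
    g = highpass_from_lowpass(h)
    details = []
    approx = list(x)
    for _ in range(levels):
        approx, d = analysis_level(approx, h, g)
        details.append(d)
    return approx, details
```

Since each level halves the data, later levels need only a fraction of the filter's time, which is the slack a fold-back schedule fills by interleaving them into the same hardware.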

8.

Reconsideration of "A Fast Computational Algorithm for the Discrete Cosine Transform"

Zhongde Wang 《Communications, IEEE Transactions on》1983,31(1):121-123

Some corrections are made to the original paper "A fast computational algorithm for the discrete cosine transform" [1], which contains some errors in indices and multiplication factors.

9.

Efficient VLSI Architecture for Lifting-Based Discrete Wavelet Packet Transform

Wang C., Gan W. S. 《Circuits and Systems II: Express Briefs, IEEE Transactions on》2007,54(5):422-426

This brief presents a novel very large-scale integration (VLSI) architecture for the discrete wavelet packet transform (DWPT). By exploiting the in-place nature of the DWPT algorithm, this architecture has an efficient pipeline structure that implements high-throughput processing without any on-chip memory or first-in first-out (FIFO) access. A folded architecture for lifting-based wavelet filters is proposed to compute the wavelet butterflies in different groups simultaneously at each decomposition level. According to the comparison results, the proposed VLSI architecture is more efficient than previously proposed architectures in terms of memory access, hardware regularity and simplicity, and throughput. Compared with the direct-mapped tree-structured architecture, the folded architecture not only achieves a significant reduction in hardware cost but also maintains both hardware utilization and high-throughput processing.
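What distinguishes the DWPT from the ordinary DWT is that both the low-pass and the high-pass outputs are split again at every level, giving 2^levels subbands. The sketch below shows that tree using lifting steps; it assumes the simple Haar lifting pair (predict d = odd - even, update s = even + d/2) purely for illustration and does not model the brief's folded pipeline.

```python
def haar_lift_forward(x):
    """One Haar lifting stage: predict d = odd - even, then update s = even + d / 2."""
    even, odd = x[0::2], x[1::2]
    d = [o - e for e, o in zip(even, odd)]
    s = [e + dd / 2 for e, dd in zip(even, d)]
    return s, d


def haar_lift_inverse(s, d):
    """Reverse the update, then the predict, and re-interleave the samples."""
    even = [ss - dd / 2 for ss, dd in zip(s, d)]
    odd = [dd + e for e, dd in zip(even, d)]
    out = []
    for e, o in zip(even, odd):
        out += [e, o]
    return out


def dwpt(x, levels):
    """Full wavelet packet tree: unlike the DWT, BOTH bands are split again at each level."""
    bands = [list(x)]
    for _ in range(levels):
        next_bands = []
        for b in bands:
            s, d = haar_lift_forward(b)
            next_bands += [s, d]
        bands = next_bands
    return bands


def idwpt(bands):
    """Merge sibling bands pairwise until a single signal remains."""
    while len(bands) > 1:
        bands = [haar_lift_inverse(bands[i], bands[i + 1]) for i in range(0, len(bands), 2)]
    return bands[0]
```

At every level the total number of samples stays equal to len(x), so the computation can overwrite its input, which matches the in-place property that the abstract exploits.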

10.

Hardware Efficient Fast Computation of the Discrete Fourier Transform

Chao Cheng, Keshab K. Parhi 《The Journal of VLSI Signal Processing》2006,42(2):159-171

In this paper, a new systolic array for prime-length (N) DFTs is first proposed and then combined with the Winograd Fourier transform algorithm (WFTA) to control the growth in hardware cost when the transform length is large. The proposed DFT design is both fast and hardware-efficient. Compared with a recently reported DFT design with a computational complexity of O(log N), the proposed design reduces the average number of required multiplications by 30 to 60% and the average computation time by more than a factor of two, for transform lengths from 16 to 2048.

Chao Cheng received his MSEE degree from Huazhong University of Science and Technology, Wuhan, China, in 2001. After three years of industrial experience as a digital communication engineer at VIA Technologies, he is now pursuing his Ph.D. degree at the University of Minnesota, Twin Cities, MN. His present research interest is VLSI digital signal processing algorithms and their implementation.

Keshab K. Parhi received his B.Tech., MSEE, and Ph.D. degrees from the Indian Institute of Technology, Kharagpur, the University of Pennsylvania, Philadelphia, and the University of California at Berkeley, in 1982, 1984, and 1988, respectively. He has been with the University of Minnesota, Minneapolis, since 1988, where he is currently Distinguished McKnight University Professor in the Department of Electrical and Computer Engineering. His research addresses VLSI architecture design and implementation of physical-layer aspects of broadband communications systems. He is currently working on error control coders and cryptography architectures, high-speed transceivers, and ultra-wideband systems. He has published over 400 papers, has authored the textbook VLSI Digital Signal Processing Systems (Wiley, 1999), and coedited the reference book Digital Signal Processing for Multimedia Systems (Marcel Dekker, 1999). Dr. Parhi is the recipient of numerous awards, including the 2004 F.E. Terman Award from the American Society for Engineering Education, the 2003 IEEE Kiyo Tomiyasu Technical Field Award, the 2001 IEEE W.R.G. Baker Prize Paper Award, and a Golden Jubilee Award from the IEEE Circuits and Systems Society in 1999. He has served on the editorial boards of the IEEE Transactions on CAS, CAS-II, VLSI Systems, Signal Processing, Signal Processing Letters, and Signal Processing Magazine; he currently serves as Editor-in-Chief of the IEEE Transactions on Circuits and Systems I (2004-2005 term) and on the editorial board of the Journal of VLSI Signal Processing. He has served as technical program co-chair of the 1995 IEEE VLSI Signal Processing Workshop and the 1996 ASAP conference, and as general chair of the 2002 IEEE Workshop on Signal Processing Systems. He was a Distinguished Lecturer for the IEEE Circuits and Systems Society during 1996-1998. He is a Fellow of the IEEE (1996).

An erratum to this article is available at .
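For prime transform lengths, one classical route (used in Winograd-style small-length modules) is Rader's algorithm, which re-indexes the inputs and outputs by powers of a primitive root so that the nonzero-frequency outputs become a length-(N-1) cyclic convolution. The sketch below illustrates only that re-indexing; it is not the paper's systolic array, and the cyclic convolution is evaluated directly for clarity rather than with a fast convolution.

```python
import cmath


def dft_direct(x):
    """Reference O(N^2) DFT."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]


def primitive_root(p):
    """Smallest generator of the multiplicative group modulo an odd prime p (brute force)."""
    for g in range(2, p):
        if len({pow(g, e, p) for e in range(1, p)}) == p - 1:
            return g
    raise ValueError("p must be an odd prime")


def rader_dft(x):
    """Prime-length DFT via Rader's re-indexing into a length-(N-1) cyclic convolution."""
    N = len(x)
    g = primitive_root(N)
    g_inv = pow(g, N - 2, N)                        # inverse of g by Fermat's little theorem
    M = N - 1
    a = [x[pow(g, q, N)] for q in range(M)]         # inputs permuted by powers of g
    b = [cmath.exp(-2j * cmath.pi * pow(g_inv, m, N) / N) for m in range(M)]
    X = [0j] * N
    X[0] = sum(x)
    for p in range(M):
        conv = sum(a[q] * b[(p - q) % M] for q in range(M))   # cyclic convolution
        X[pow(g_inv, p, N)] = x[0] + conv
    return X
```

The fixed sequence `b` depends only on N, so in hardware it becomes a set of constant coefficients, and the convolution maps naturally onto a regular array.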

11.

An Approach to the Implementation of a Discrete Cosine Transform

Bertocci G., Schoenherr B., Messerschmitt D. 《Communications, IEEE Transactions on》1982,30(4):635-641

An approach to implementing a discrete cosine transform (DCT) for speech coding is described. The approach is oriented toward single-channel speech encoding. In addition, a detailed computer simulation of an adaptive transform coder is described. The purpose of the computer simulation is to determine the internal precision required at various points in the implementation to avoid subjective degradation. Specific recommendations are made on the required internal precision in the implementation of the discrete cosine transform. A breadboard implementation of the DCT using SSI and MSI TTL logic, based on the results of the computer simulation, is reported.
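A precision study of this kind can be set up by simulating the transform with fixed-point word lengths and comparing it against a floating-point reference. The sketch below is hypothetical and measures only numerical error, not subjective speech quality: it assumes inputs in [-1, 1], rounds both the cosine coefficients and each product to `frac_bits` fractional bits, and accumulates exactly. Under those assumptions each of the N terms contributes at most 2^-frac_bits of error.

```python
import math


def dct_float(x):
    """Floating-point reference DCT-II (unnormalized)."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N)) for n in range(N))
            for k in range(N)]


def quantize(v, frac_bits):
    """Round to the nearest multiple of 2**-frac_bits (a fixed-point value with frac_bits fraction bits)."""
    step = 2.0 ** -frac_bits
    return round(v / step) * step


def dct_fixed(x, frac_bits):
    """DCT-II with quantized coefficients and quantized products; the accumulator is exact."""
    N = len(x)
    out = []
    for k in range(N):
        acc = 0.0
        for n in range(N):
            c = quantize(math.cos(math.pi * (2 * n + 1) * k / (2 * N)), frac_bits)
            acc += quantize(x[n] * c, frac_bits)
        out.append(acc)
    return out


def max_error(x, frac_bits):
    """Worst-case absolute output error of the fixed-point model for input x."""
    return max(abs(a - b) for a, b in zip(dct_fixed(x, frac_bits), dct_float(x)))
```

Sweeping `frac_bits` over representative speech frames, and separately quantizing intermediate nodes of a real fast-DCT flow graph, would show where extra bits are actually needed.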

12.

Fast 2-D DCT Algorithm and Its FPGA Implementation

Chen Puyue, Zhao Xinbi, Chen Bin (陈普跃, 赵新璧, 陈斌) 《Electronics Quality (电子质量)》2008,(2):5-7,22

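This paper proposes an FPGA implementation structure for a fast two-dimensional DCT algorithm. A row-column fast algorithm decomposes the 2-D DCT into two 1-D DCTs. The 1-D DCT is based on the Loeffler DCT algorithm and uses a parallel pipelined structure to increase the circuit's data throughput and computation speed. By simplifying the coefficient matrix and using equivalent butterfly structures, the number of multipliers is reduced; each 1-D DCT core consumes 16 multipliers. The transpose RAM uses eight dual-port RAMs, so eight data values can be read and written in one clock cycle. Experimental results verify the correctness of the 2-D DCT core design. The circuit consumes few resources, has simple routing and low power consumption, and is suitable for real-time image processing.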
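The row-column decomposition described above can be checked in a few lines: 1-D DCTs over the rows, a transpose (the job of the transpose RAM), then 1-D DCTs over the columns. This sketch uses a direct O(N²) 1-D DCT in place of the Loeffler algorithm and the unnormalized DCT-II; both are simplifying assumptions, not the paper's design.

```python
import math


def dct_1d(x):
    """Unnormalized 1-D DCT-II (direct form, standing in for a fast 1-D core)."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N)) for n in range(N))
            for k in range(N)]


def dct_2d_row_column(block):
    """2-D DCT-II as 1-D DCTs over rows, a transpose, then 1-D DCTs over columns."""
    rows = [dct_1d(r) for r in block]              # first 1-D DCT stage
    transposed = [list(c) for c in zip(*rows)]     # role of the transpose RAM
    cols = [dct_1d(c) for c in transposed]         # second 1-D DCT stage
    return [list(r) for r in zip(*cols)]           # transpose back so out[u][v] is row-major


def dct_2d_direct(block):
    """Reference: out[u][v] = sum_m sum_n block[m][n] * c(m, u) * c(n, v)."""
    N = len(block)

    def c(a, b):
        return math.cos(math.pi * (2 * a + 1) * b / (2 * N))

    return [[sum(block[m][n] * c(m, u) * c(n, v) for m in range(N) for n in range(N))
             for v in range(N)] for u in range(N)]
```

The row-column form needs 2N one-dimensional transforms instead of one N²-point double sum per output, and both stages can reuse the same 1-D core.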

13.

A Fast-Iteration CORDIC (Coordinate Rotation) Architecture

Liang Zheng, Shen Xubang (梁政, 沈绪榜) 《Microelectronics & Computer (微电子学与计算机)》2002,19(8):11-12,60

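The simplest way to implement transcendental functions in hardware is the coordinate rotation digital computer (CORDIC) algorithm, which is based on shift-and-add operations. Its structure is simple and regular, and a single fixed structure can compute a variety of transcendental functions. This article describes how the algorithm works and its specific applications, and introduces redundant-number arithmetic to reduce the delay of each iteration. It also discusses the scale-factor compensation required by the redundant arithmetic structure and proposes a mixed-radix structure that reduces the number of iterations.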
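The basic rotation-mode CORDIC iteration and its constant scale factor can be sketched as follows. This models only the conventional (non-redundant, radix-2) algorithm that the article starts from; the redundant-number and mixed-radix improvements are not shown. Multiplications by 2^-i stand in for arithmetic right shifts, and the iteration count is an assumed parameter.

```python
import math


def cordic_rotate(x, y, angle, iterations=32):
    """Rotation-mode CORDIC: rotate (x, y) by `angle` radians using shift-add style steps.

    Converges for |angle| up to about 1.743 rad (the sum of the elementary angles).
    """
    # Elementary angles atan(2^-i) and the constant gain K = prod 1 / sqrt(1 + 2^-2i).
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    gain = 1.0
    for i in range(iterations):
        gain /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    z = angle
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0            # direction from the sign of the residual angle
        # In hardware, multiplying by 2^-i is an arithmetic right shift by i bits.
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return x * gain, y * gain                  # scale-factor compensation
```

For example, `cordic_rotate(1.0, 0.0, theta)` approximates `(cos(theta), sin(theta))`. Because every direction `d` is ±1, the gain is the same for every input angle, which is why it can be precomputed; redundant arithmetic makes the sign decision harder, which is why it complicates scale-factor compensation.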

14.

Fast Algorithm and Efficient Architecture for Integer and Fractional Motion Estimation

Obianuju Ndili, Tokunbo Ogunfunmi 《Journal of Signal Processing Systems》2014,75(1):55-64

Motion estimation in H.264/AVC is done in two parts: integer motion estimation and fractional motion estimation. Hardware reuse across both parts is inefficient because of the differences between them. In this paper, we address the hardware reuse problem by proposing a fast motion estimation algorithm as well as a pipelined, FPGA-based field-programmable system-on-chip (FPSoC) for integer and fractional motion estimation. Our results show that the rate-distortion loss of our algorithm is insignificant compared with full search in H.264/AVC. Its average Y-PSNR loss is 0.065 dB, its average bit-rate increase is 5%, and its power consumption is 76 mW. Our FPSoC is hardware-efficient, even outperforming some state-of-the-art ASIC implementations. It supports video up to high-definition 1280×720p at 24 Hz. Thus, our proposed algorithm and architecture are suitable for delivering high-quality video on low-power devices and for low-bit-rate applications, which typically use the H.264/AVC baseline profile at levels 1–3.1.
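The full-search baseline that the algorithm is compared against can be sketched as an exhaustive integer-pel block search using the sum of absolute differences (SAD). This is only that reference baseline, not the paper's fast algorithm or its fractional-pel refinement; frames are assumed to be lists of rows of luma samples, and the block size and search range are illustrative parameters.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between an n x n block of `cur` and a displaced block of `ref`."""
    return sum(abs(cur[by + i][bx + j] - ref[by + dy + i][bx + dx + j])
               for i in range(n) for j in range(n))


def full_search(cur, ref, bx, by, n, search_range):
    """Integer-pel full search over [-R, R]^2; returns (best_sad, dx, dy).

    Displacements that would leave the reference frame are skipped.
    """
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            if not (0 <= by + dy and by + dy + n <= h and 0 <= bx + dx and bx + dx + n <= w):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best
```

Full search evaluates (2R+1)² candidates per block, which is the cost that fast algorithms try to cut while keeping the rate-distortion loss small.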

15.

Interframe Cosine Transform Image Coding

Roese J., Pratt W., Robinson G. 《Communications, IEEE Transactions on》1977,25(11):1329-1339

Two-dimensional transform coding and hybrid transform/DPCM coding techniques have been investigated extensively for image coding. This paper presents a theoretical and experimental extension of these techniques to the coding of sequences of correlated image frames. Two coding methods are analyzed: three-dimensional cosine transform coding, and two-dimensional cosine transform coding within an image frame combined with DPCM coding between frames. Theoretical performance estimates are developed for the coding of Markovian image sources. Simulation results are presented for transmission over error-free and binary symmetric channels.
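The second method, an intraframe 2-D cosine transform combined with interframe DPCM, can be sketched as follows. This is a simplified model under assumptions not taken from the paper: each transform coefficient is predicted by the same coefficient of the previous reconstructed frame, the residual goes through a uniform quantizer with step `step`, and channel errors are not modeled. Because the loop is closed (the encoder predicts from reconstructed values, as the decoder must), each coefficient's reconstruction error stays within half a quantizer step.

```python
import math


def dct2(block):
    """Unnormalized separable 2-D DCT-II of an N x N block."""
    N = len(block)
    c = [[math.cos(math.pi * (2 * n + 1) * k / (2 * N)) for n in range(N)] for k in range(N)]
    return [[sum(c[u][m] * c[v][n] * block[m][n] for m in range(N) for n in range(N))
             for v in range(N)] for u in range(N)]


def hybrid_encode(frames, step):
    """Intraframe 2-D DCT + interframe DPCM on each coefficient (closed loop, uniform quantizer).

    Returns the quantizer indices and the decoder-side reconstructed coefficients.
    """
    N = len(frames[0])
    prediction = [[0.0] * N for _ in range(N)]       # previous reconstructed coefficients
    indices, recon_all = [], []
    for frame in frames:
        coeffs = dct2(frame)
        q = [[round((coeffs[u][v] - prediction[u][v]) / step) for v in range(N)]
             for u in range(N)]
        recon = [[prediction[u][v] + q[u][v] * step for v in range(N)] for u in range(N)]
        indices.append(q)
        recon_all.append(recon)
        prediction = recon                            # decoder can form the same prediction
    return indices, recon_all
```

For strongly correlated frames most residual indices are small, which is where the bit-rate saving over independent intraframe coding comes from.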

16.

Discrete Cosine Transform for Driving Liquid Crystal Displays

《Display Technology, Journal of》2009,5(7):243-249

Despite rapid advances in the science and technology of liquid crystal displays (LCDs), eliminating motion-related artifacts and preserving color purity in moving images have remained elusive, because the gray-scale-to-gray-scale response time (the time taken to switch a pixel from one gray level to another) depends on the initial and final gray shades. A technique for driving matrix LCDs is reported in which gray-scale-to-gray-scale response times depend less on the initial and final gray shades than with other addressing techniques. We also found that the response times are about the same as those of a pixel driven with simple square waveforms; therefore, the effect of the duty cycle due to matrix addressing is minimal with the distributed waveforms of this technique.

17.

E-TCAM: An Efficient SRAM-Based Architecture for TCAM

Zahid Ullah, Manish Kumar Jaiswal, Ray C. C. Cheung 《Circuits, Systems, and Signal Processing》2014,33(10):3123-3144

Ternary content-addressable memories (TCAMs) perform high-speed search operations in deterministic time. However, compared with static random-access memories (SRAMs), TCAMs suffer from limitations such as low storage density, relatively slow access time, low scalability, complex circuitry, and higher cost. A fundamental question is whether SRAM can be combined with additional logic to achieve TCAM functionality. This paper proposes an efficient memory architecture, called E-TCAM, which emulates TCAM functionality with SRAM. E-TCAM logically divides the classical TCAM table along columns and rows into hybrid TCAM subtables and then maps them to their corresponding memory blocks. During a search operation, the memory blocks are accessed by their corresponding subwords of the input word, and a match address is produced. An example 512 × 36 E-TCAM design has been successfully implemented on Xilinx Virtex-5, Virtex-6, and Virtex-7 field-programmable gate arrays (FPGAs). FPGA implementation results show that, compared with the best available SRAM-based TCAM designs, E-TCAM achieves reductions of 33.33% in block RAMs, 71.07% in slice registers, 77.16% in lookup tables, and 53.54% in energy/bit/search, and offers a 63.03% improvement in speed.
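The general principle behind SRAM-based TCAM emulation is that the key is cut into subwords, each subword addresses its own SRAM, and each SRAM word stores a bit-vector of the rules whose corresponding ternary slice matches that subword value; ANDing the vectors yields the matching rules. The sketch below illustrates only this column-wise principle. The row-wise partitioning into hybrid subtables that E-TCAM adds is not modeled, and the class and method names are hypothetical.

```python
def ternary_match(rule, key):
    """rule is a string over {'0', '1', 'x'}; key is a binary string of the same width."""
    return all(r == 'x' or r == k for r, k in zip(rule, key))


class SramTcam:
    """Hypothetical TCAM emulation: one SRAM table per key subword, each word a rule bit-vector."""

    def __init__(self, rules, sub_width):
        self.rules = rules
        self.sub_width = sub_width
        width = len(rules[0])
        assert width % sub_width == 0
        self.tables = []
        for start in range(0, width, sub_width):
            table = [0] * (1 << sub_width)            # one SRAM block with 2^s words
            for addr in range(1 << sub_width):
                pattern = format(addr, f"0{sub_width}b")
                for idx, rule in enumerate(rules):
                    if ternary_match(rule[start:start + sub_width], pattern):
                        table[addr] |= 1 << idx
            self.tables.append(table)

    def search(self, key):
        """Return the lowest matching rule index (highest priority), or None."""
        match = (1 << len(self.rules)) - 1
        for i, table in enumerate(self.tables):
            sub = key[i * self.sub_width:(i + 1) * self.sub_width]
            match &= table[int(sub, 2)]               # one SRAM read per subword
        if match == 0:
            return None
        return (match & -match).bit_length() - 1      # index of the lowest set bit
```

All subword reads are independent, so in hardware they happen in parallel and a search costs one memory access plus an AND and a priority encoder.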

18.

An Efficient VLSI Architecture for Nonbinary LDPC Decoders

Lin J., Sha J., Wang Z., Li L. 《Circuits and Systems II: Express Briefs, IEEE Transactions on》2010,57(1):51-55

Low-density parity-check (LDPC) codes constructed over the Galois field GF(q), also called nonbinary LDPC codes, are an extension of binary LDPC codes with significantly better performance. Although various low-complexity, quasi-optimal iterative decoding algorithms have been proposed, the VLSI implementation of nonbinary LDPC decoders has rarely been discussed because of their hardware-unfriendly properties. In this brief, an efficient selective computation algorithm that completely avoids the sorting process is proposed for Min–Max decoding. In addition, an efficient VLSI architecture for a nonbinary Min–Max decoder is presented. Synthesis results are given to demonstrate the efficiency of the proposed techniques.
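For reference, the Min–Max check-node update that such decoders implement can be written as a brute-force computation. This sketch is the plain O(q²) baseline, not the brief's selective computation algorithm. It assumes GF(4) symbols represented as 0..3 (so field addition is bitwise XOR), all parity-check coefficients equal to 1, and messages that are costs in which smaller means more likely.

```python
from functools import reduce

Q = 4  # GF(4) = GF(2^2); adding two field symbols is a bitwise XOR of their labels


def minmax_combine(m1, m2):
    """Min-Max 'convolution' of two messages: out[a] = min over b xor c = a of max(m1[b], m2[c])."""
    out = [float("inf")] * Q
    for b in range(Q):
        for c in range(Q):
            a = b ^ c
            out[a] = min(out[a], max(m1[b], m2[c]))
    return out


def check_node_minmax(incoming):
    """Extrinsic check-to-variable messages for a check node whose edge coefficients are all 1."""
    outputs = []
    for i in range(len(incoming)):
        others = incoming[:i] + incoming[i + 1:]
        outputs.append(reduce(minmax_combine, others))   # combine all other edges pairwise
    return outputs
```

Pairwise folding is valid because max distributes over min; the q² pairs per combination are what selective-computation schemes avoid enumerating in full.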

19.

An Overview of Fast Fourier Transform Algorithms (total citations: 9; self-citations: 0; citations by others: 9)

Ji Hu, Xia Shengping, Yu Wenxian (季虎, 夏胜平, 郁文贤) 《Modern Electronics Technique (现代电子技术)》2001,(8):11-14

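The fast Fourier transform (FFT) is one of the most fundamental operations in digital signal processing and has been widely applied in fields such as communications, medical electronics, radar, and radio astronomy. This paper surveys the main FFT algorithms and analyzes and compares their characteristics and computational workload, with the aim of giving a clear picture of FFT algorithms.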
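As a concrete reference point for such comparisons, the sketch below implements the textbook recursive radix-2 decimation-in-time FFT next to the direct DFT. It assumes a power-of-two length and is written for clarity rather than speed. The radix-2 structure needs (N/2)·log₂N butterflies, each with one complex twiddle multiplication, versus N² complex multiplications for the direct DFT.

```python
import cmath


def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    N = len(x)
    if N == 1:
        return [complex(x[0])]
    even = fft_radix2(x[0::2])                 # DFT of the even-indexed samples
    odd = fft_radix2(x[1::2])                  # DFT of the odd-indexed samples
    out = [0j] * N
    for k in range(N // 2):
        t = cmath.exp(-2j * cmath.pi * k / N) * odd[k]    # twiddle factor W_N^k
        out[k] = even[k] + t                              # butterfly: upper output
        out[k + N // 2] = even[k] - t                     # butterfly: lower output
    return out


def dft_direct(x):
    """Reference O(N^2) DFT."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]
```

For N = 1024 this is 5120 butterflies against about a million multiplications for the direct form.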

20.

On the Computation of the Discrete Cosine Transform (total citations: 1; self-citations: 0; citations by others: 1)

Narasimha M., Peterson A. 《Communications, IEEE Transactions on》1978,26(6):934-936

An N-point discrete Fourier transform (DFT) algorithm can be used to evaluate a discrete cosine transform by a simple rearrangement of the input data. This method is about twice as fast as the conventional method, which uses a 2N-point DFT.
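The rearrangement can be sketched as follows: even-indexed samples are placed in order, odd-indexed samples in reverse order, one N-point DFT is taken, and each output is rotated by exp(-iπk/(2N)) before taking the real part. This is a minimal sketch assuming an even length and the unnormalized DCT-II. The N-point DFT is evaluated directly to keep the example self-contained; in practice that step would be an FFT.

```python
import cmath
import math


def dft(v):
    """Direct N-point DFT (stand-in for an FFT)."""
    N = len(v)
    return [sum(v[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]


def dct_via_dft(x):
    """Unnormalized DCT-II from ONE N-point DFT: reorder, transform, rotate, take the real part."""
    N = len(x)
    assert N % 2 == 0
    v = [0.0] * N
    for n in range(N // 2):
        v[n] = x[2 * n]                  # even-indexed samples, in order
        v[N - 1 - n] = x[2 * n + 1]      # odd-indexed samples, reversed
    V = dft(v)
    return [(cmath.exp(-1j * math.pi * k / (2 * N)) * V[k]).real for k in range(N)]
```

The reordering makes each DCT basis term coincide with the real part of a rotated DFT term, so no zero padding or mirrored 2N-point sequence is needed.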