Found 20 similar documents (search time: 15 ms)
1.
Yun-Nan Chang 《The Journal of VLSI Signal Processing》2003,33(3):317-324
This paper presents a novel design of a Viterbi decoder based on in-place state metric update and hybrid survivor path management. By exploiting the in-place computation feature of the Viterbi algorithm, the proposed design methodology yields high-speed, modular architectures suitable for Viterbi applications with large constraint lengths. This feature is applied not only to the design of highly regular ACS units but also, for the first time, to the design of trace-back units. The proposed hybrid survivor path management, which combines register-exchange and trace-back schemes, reduces both the number of memory operations and the size of memory required. Compared with the general hybrid trace-back structure, the overhead of the register-exchange circuit in our architecture is significantly lower. The proposed architecture is therefore promising for digital communication systems where high-speed, large-state Viterbi decoders are desirable.
2.
A Fast Computational Algorithm for the Discrete Cosine Transform (total citations: 2; self-citations: 0; citations by others: 2)
A Fast Discrete Cosine Transform algorithm has been developed which provides a factor of six improvement in computational complexity when compared to conventional Discrete Cosine Transform algorithms using the Fast Fourier Transform. The algorithm is derived in the form of matrices and illustrated by a signal-flow graph, which may be readily translated to hardware or software implementations.
3.
Guangming Shi Weifeng Liu Li Zhang Fu Li 《Circuits and Systems II: Express Briefs, IEEE Transactions on》2009,56(4):290-294
In this brief, an efficient folded architecture (EFA) for the lifting-based discrete wavelet transform (DWT) is presented. The proposed EFA is based on a novel form of the lifting scheme that is given in this brief. With this form, the conventional serial operations of the lifting data flow can be optimized into parallel ones by employing parallel and pipeline techniques. The corresponding optimized architecture (OA) has a short critical path latency and is repeatable. Exploiting this repeatability, the EFA is derived from the OA by employing the folding technique. For the proposed EFA, hardware utilization reaches 100%, and the number of required registers is reduced. Additionally, shift-add operations are adopted to optimize the multiplications, making the proposed architecture more suitable for hardware implementation. Performance comparisons and field-programmable gate array (FPGA) implementation results indicate that the proposed EFA achieves better performance in critical path latency, hardware cost, and control complexity.
4.
Fast Discrete Cosine Transform via Computation of Moments (total citations: 2; self-citations: 0; citations by others: 2)
Discrete cosine transform (DCT) is widely used in signal processing. This paper presents a novel approach to perform DCT. DCT is expressed in terms of discrete moments via triangle function transforms and later Taylor series expansion. From this, a fast systolic array for computing moments is converted to compute DCT with only a few multiplications and without any cosine evaluations. The systolic array has the advantages of pipelinability, regularity, modularity, local connectivity and scalability, making it well suited to VLSI implementation. We provide an estimate of the realizability of our array in a 0.5 μm CMOS technology and comparisons with other methods. The execution time of the systolic array is only \(O(N \log_2 N / \log_2 \log_2 N)\) in computing a 1D N-point DCT if N is sufficiently large. The approach is also applicable to multidimensional DCTs and inverse DCTs.
5.
Kailash Chandra Ray M. V. N. V. Prasad Anindya Sundar Dhar 《Journal of Signal Processing Systems》2018,90(11):1569-1580
For decades, the fractional Fourier transform (FrFT) has attracted researchers from various domains such as signal and image processing. These applications demand low computational complexity from the FrFT. In this paper, the FrFT is simplified to reduce its complexity, and an efficient CORDIC-based architecture for computing the discrete fractional Fourier transform (DFrFT) is proposed. The architecture reduces the computational complexity and hardware requirements and provides the flexibility to change the user-defined fractional angles to compute the DFrFT on the fly. The architectural design and working method of the proposed architecture, along with its constituent blocks, are discussed. The hardware complexity and throughput of the proposed architecture are illustrated as well. Finally, the architecture of a DFrFT of order sixteen is implemented in Verilog HDL and synthesized targeting an FPGA device, "XLV5LX110T". Hardware simulation is performed for functional verification and compared with MATLAB simulation results. The physical implementation result shows that the design can be operated at a maximum frequency of 154 MHz with a latency of 63 clock cycles.
6.
7.
This paper presents a new architecture for VLSI implementation of the one-dimensional Discrete Wavelet Transform (DWT). The architecture uses a single filter to generate both the DWT coefficients and the scaling function for orthogonal wavelets, as opposed to the conventional two-filter approach. For multilevel decomposition, the fold-back architecture principle, which interleaves the decimated scaling function back into the filter for subsequent levels, is applied. Limited use of memory in the design enables efficient implementation of the DWT computation in VLSI.
8.
Zhongde Wang 《Communications, IEEE Transactions on》1983,31(1):121-123
Some corrections are made to the original paper "A fast computational algorithm for the discrete cosine transform" [1], which contains some errors in indices and multiplication factors.
9.
This brief presents a novel very large-scale integration (VLSI) architecture for the discrete wavelet packet transform (DWPT). By exploiting the in-place nature of the DWPT algorithm, this architecture has an efficient pipeline structure that implements high-throughput processing without any on-chip memory or first-in first-out access. A folded architecture for lifting-based wavelet filters is proposed to compute the wavelet butterflies in different groups simultaneously at each decomposition level. According to the comparison results, the proposed VLSI architecture is more efficient than previously proposed architectures in terms of memory access, hardware regularity and simplicity, and throughput. Compared with the direct-mapped tree-structured architecture, the folded architecture not only achieves a significant reduction in hardware cost but also maintains both hardware utilization and high-throughput processing.
10.
In this paper, a new systolic array for prime N-length DFT is first proposed, and then combined with the Winograd Fourier Transform algorithm (WFTA) to control the increase in hardware cost when the transform length is large. The proposed new DFT design is both fast and hardware efficient. Compared with the recently reported DFT design with computational complexity of O(log N), the proposed design reduces the average number of required multiplications by 30 to 60% and reduces the average computation time by more than a factor of 2 when the transform length changes from 16 to 2048.
Chao Cheng received his MSEE degree from Huazhong University of Science and Technology, Wuhan, China, in 2001. With three years of industrial
experience as a digital communication engineer at VIA Technologies, he is now pursuing his Ph.D. degree at the University
of Minnesota, Twin Cities, MN.
His present research interest is in VLSI digital signal processing algorithms and their implementation.
Keshab K. Parhi received his B.Tech., MSEE, and Ph.D. degrees from the Indian Institute of Technology, Kharagpur, the University of Pennsylvania,
Philadelphia, and the University of California at Berkeley, in 1982, 1984, and 1988, respectively. He has been with the University
of Minnesota, Minneapolis, since 1988, where he is currently Distinguished McKnight University Professor in the Department
of Electrical and Computer Engineering.
His research addresses VLSI architecture design and implementation of physical layer aspects of broadband communications systems.
He is currently working on error control coders and cryptography architectures, high-speed transceivers, and ultra wideband
systems.
He has published over 400 papers, has authored the textbook VLSI Digital Signal Processing Systems (Wiley, 1999) and coedited
the reference book Digital Signal Processing for Multimedia Systems (Marcel Dekker, 1999).
Dr. Parhi is the recipient of numerous awards including the 2004 F.E. Terman award by the American Society of Engineering
Education, the 2003 IEEE Kiyo Tomiyasu Technical Field Award, the 2001 IEEE W.R.G. Baker prize paper award, and a Golden Jubilee
award from the IEEE Circuits and Systems Society in 1999.
He has served on the editorial boards of the IEEE TRANSACTIONS ON CAS, CAS-II, VLSI Systems, Signal Processing, Signal Processing
Letters, and Signal Processing Magazine, and currently serves as the Editor-in-Chief of the IEEE Trans. on Circuits and Systems I
(2004-2005 term), and serves on the Editorial Board of the Journal of VLSI Signal Processing.
He has served as technical program cochair of the 1995 IEEE VLSI Signal Processing workshop and the 1996 ASAP conference,
and as the general chair of the 2002 IEEE Workshop on Signal Processing Systems. He was a distinguished lecturer for the IEEE
Circuits and Systems society during 1996-1998. He is a Fellow of IEEE (1996).
An erratum to this article is available at .
11.
An approach to the implementation of a discrete cosine transform (DCT) for speech coding is described. The approach is oriented toward single speech channel encoding. In addition, a detailed computer simulation of an adaptive transform coder is described. The purpose of the computer simulation is to determine the internal precision required at various points in the implementation to avoid subjective degradation. Specific recommendations are made on the required internal precision in the implementation of the discrete cosine transform. A breadboard implementation of the DCT using SSI and MSI TTL logic, based on the results of the computer simulation, is reported.
12.
13.
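The simplest way to implement transcendental functions in hardware is the shift-and-add-based coordinate rotation digital computer (CORDIC) algorithm. Its structure is simple and regular, and a fixed structure can compute many different transcendental functions. This paper describes how the algorithm operates and how it is applied, and introduces redundant number arithmetic to reduce the delay of a single iteration. It also discusses the scale-factor compensation required by the redundant arithmetic structure and proposes a hybrid-radix structure that reduces the number of iterations.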
14.
Motion estimation in H.264/AVC is done in two parts: integer motion estimation and fractional motion estimation. Hardware reuse for both parts is inefficient due to the differences between them. In this paper we address the hardware reuse problem by proposing a fast motion estimation algorithm as well as a pipelined FPGA-based field programmable system-on-chip (FPSoC) for integer and fractional motion estimation. Our results show that the rate-distortion loss of our algorithm is insignificant when compared to full search in H.264/AVC. Its average Y-PSNR loss is 0.065 dB, its average percentage bit rate increase is 5%, and its power consumption is 76 mW. Our FPSoC is hardware-efficient, even outperforming some state-of-the-art ASIC implementations. It can support up to high definition 1280 × 720p video at 24 Hz. Thus, our proposed algorithm and architecture are suitable for delivery of high quality video on low power devices and low bit rate applications which typically use H.264/AVC baseline profile@levels 1–3.1.
15.
Two-dimensional transform coding and hybrid transform/DPCM coding techniques have been investigated extensively for image coding. This paper presents a theoretical and experimental extension of these techniques to the coding of sequences of correlated image frames. Two coding methods are analyzed: three-dimensional cosine transform coding, and two-dimensional cosine transform coding within an image frame combined with DPCM coding between frames. Theoretical performance estimates are developed for the coding of Markovian image sources. Simulation results are presented for transmission over error-free and binary symmetric channels.
16.
《Display Technology, Journal of》2009,5(7):243-249
17.
Zahid Ullah Manish Kumar Jaiswal Ray C. C. Cheung 《Circuits, Systems, and Signal Processing》2014,33(10):3123-3144
Ternary content addressable memories (TCAMs) perform high-speed search operations in deterministic time. However, when compared with static random access memories (SRAMs), TCAMs suffer from certain limitations such as low storage density, relatively slow access time, low scalability, complex circuitry, and higher cost. One fundamental question is whether SRAM can be combined with additional logic to achieve TCAM functionality. This paper proposes an efficient memory architecture, called E-TCAM, which emulates TCAM functionality with SRAM. E-TCAM logically divides the classical TCAM table along columns and rows into hybrid TCAM subtables and then maps them to their corresponding memory blocks. During a search operation, the memory blocks are accessed by their corresponding subwords of the input word and a match address is produced. An example \(512\times 36\) E-TCAM design has been successfully implemented on Xilinx Virtex-5, Virtex-6, and Virtex-7 field-programmable gate arrays (FPGAs). FPGA implementation results show that E-TCAM obtains a 33.33% reduction in block RAMs, 71.07% in slice registers, 77.16% in lookup tables, and 53.54% in energy/bit/search, and offers a 63.03% improvement in speed, compared with the best available SRAM-based TCAM designs.
18.
Lin J. Sha J. Wang Z. Li L. 《Circuits and Systems II: Express Briefs, IEEE Transactions on》2010,57(1):51-55
19.
20.
On the Computation of the Discrete Cosine Transform (total citations: 1; self-citations: 0; citations by others: 1)
An N-point discrete Fourier transform (DFT) algorithm can be used to evaluate a discrete cosine transform by a simple rearrangement of the input data. This method is about two times faster than the conventional method, which uses a 2N-point DFT.
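
The rearrangement idea in the last abstract can be illustrated with a minimal sketch. It is not taken from the paper itself: it assumes the unnormalized DCT-II definition C[k] = Σ x[n] cos(π(2n+1)k / 2N), the common reordering that places the even-indexed samples first and the odd-indexed samples in reverse order, and a naive O(N²) DFT standing in for a fast one. The function names `dft`, `dct_direct`, and `dct_via_n_point_dft` are hypothetical.

```python
import cmath
import math


def dft(v):
    """Naive N-point DFT, used here only as a stand-in for an FFT routine."""
    n = len(v)
    return [
        sum(v[m] * cmath.exp(-2j * math.pi * m * k / n) for m in range(n))
        for k in range(n)
    ]


def dct_direct(x):
    """Unnormalized DCT-II evaluated straight from its definition (reference)."""
    n = len(x)
    return [
        sum(x[m] * math.cos(math.pi * (2 * m + 1) * k / (2 * n)) for m in range(n))
        for k in range(n)
    ]


def dct_via_n_point_dft(x):
    """Unnormalized DCT-II computed from a single N-point DFT.

    Assumed reordering: even-indexed samples in order, followed by the
    odd-indexed samples in reverse order.
    """
    n = len(x)
    v = list(x[0::2]) + list(x[1::2])[::-1]
    spectrum = dft(v)
    # A per-bin phase rotation followed by taking the real part yields the DCT.
    return [
        (cmath.exp(-1j * math.pi * k / (2 * n)) * spectrum[k]).real
        for k in range(n)
    ]


samples = [1.0, 2.0, 3.0, 4.0, 5.0]
fast = dct_via_n_point_dft(samples)
reference = dct_direct(samples)
max_error = max(abs(a - b) for a, b in zip(fast, reference))
```

In this sketch `max_error` should be at the level of floating-point rounding. Any speed advantage over a 2N-point approach would only appear once `dft` is replaced by an actual FFT.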