Similar documents
 20 similar documents found (search time: 31 ms)
1.
This paper presents an efficient approach for computing the N-point (N = 2^n) scaled discrete cosine transform (DCT) with the coordinate rotation digital computer (CORDIC) algorithm. The proposed algorithm is based on an indirect approach for computing the DCT, so that the vector rotations are completely separated from the other operations and placed at the end of the DCT unit. As a result, unlike other CORDIC-based DCT architectures, the proposed scaled DCT architecture does not require scale factor compensation. The number of CORDIC iterations is minimized through the optimal angle recoding method based on the three-value CORDIC algorithm. Although this three-value CORDIC algorithm results in different scale factors for different angles, this does not incur any extra hardware in the proposed scaled DCT architecture.

2.
In this work, we propose a novel Coordinate Rotation DIgital Computer (CORDIC) rotator algorithm that converges faster by performing radix-2, radix-4 and radix-16 CORDIC iterations while keeping the scale factor implicitly constant. The mixed radix speeds up convergence and so reduces the computational latency of the CORDIC algorithm. The main concern with higher-radix CORDIC algorithms is the compensation of a variable scale factor. To solve this problem, the Taylor series approximation of sine and cosine is proposed for the higher-radix CORDIC algorithm to achieve scaling-free rotation of the two-dimensional vector. The scaling-free rotation removes the read-only memory (ROM) otherwise needed to store the scale factors of a higher-radix CORDIC algorithm. Further, the proposed CORDIC algorithm is designed in rotation mode and optimized by removing the Z datapath for digital signal processing (DSP) applications in which the angle of rotation is known in advance. Finally, the multipath delay commutator (MDC) fast Fourier transform (FFT) algorithm is implemented on an FPGA with a rotator based on the proposed CORDIC algorithm. The proposed design is compared with existing designs. Compared with the radix-16 CORDIC rotator based FFT implementation, the implementation proposed in this article is found to use 17% fewer resources.
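
The idea of dropping the Z datapath when the rotation angle is known in advance can be illustrated with a minimal sketch. It uses the conventional radix-2 CORDIC recurrences in floating point, not the mixed-radix scaling-free scheme of the abstract, and the function names `precompute_directions` and `rotate_fixed` are hypothetical. Multiplication by `2.0 ** -i` stands in for a hardware right shift.

```python
import math

def precompute_directions(angle, n_iter=16):
    """Offline step: derive the +1/-1 micro-rotation directions for a fixed angle.

    Assumes |angle| is within the radix-2 convergence range (about 1.74 rad).
    """
    z = angle
    dirs = []
    for i in range(n_iter):
        d = 1 if z >= 0 else -1
        dirs.append(d)
        z -= d * math.atan(2.0 ** -i)
    return dirs

def rotate_fixed(x, y, dirs):
    """Run-time step: only the x/y recurrences remain; no angle accumulator."""
    for i, d in enumerate(dirs):
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
    # In radix-2 CORDIC every iteration is performed, so the gain is a constant.
    gain = math.prod(math.sqrt(1.0 + 4.0 ** -i) for i in range(len(dirs)))
    return x / gain, y / gain

# Example: rotate (1, 0) by pi/8, as a twiddle-factor style rotation.
dirs = precompute_directions(math.pi / 8)
xr, yr = rotate_fixed(1.0, 0.0, dirs)
```

With 16 iterations the residual angle is at most atan(2^-15), so `xr` and `yr` agree with cos(π/8) and sin(π/8) to roughly four decimal places.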

3.
An efficient implementation of discrete cosine transform (DCT) computations is presented based on the so-called shifted discrete Fourier transform (SDFT), a generalization of the conventional discrete Fourier transform (DFT). Due to the simple form of the factorized matrices, the derived architecture can be easily constructed from a cascade of only two types of parameterized hardware modules: butterfly operators and rotators. The butterfly operator performs the conventional butterfly shuffling and addition/subtraction. The rotator, which performs plane rotations of two-dimensional (2-D) vectors, is designed using a carry-save-adder (CSA)-based unfolded pipelined CORDIC architecture in which the rotation angles can be approximated with different accuracies using a sequence of bipolar signs. The proposed one-dimensional and 2-D DCT implementations, composed of these two types of parameterized modules, can be used as flexible and reusable Silicon Intellectual Property (SIP) for a DCT computation unit embedded in a system-on-a-chip (SoC) design. The proposed implementations have many features and advantages, including SIP reusability, low complexity, high throughput, regularity, scalability (easy extension of transform length), and flexibility (approximated DCT with various accuracies).

4.
This article presents a low-hardware-complexity design for exponent calculations based on CORDIC. The proposed CORDIC algorithm is designed to overcome the major drawbacks of the conventional CORDIC in the hyperbolic mode of operation (scale-factor compensation, a low range of convergence, and optimal selection of micro-rotations). The micro-rotations are identified using leading-one bit detection with unidirectional rotations to eliminate redundant iterations and improve throughput. The efficiency and performance of the processor are independent of the probability of rotation angles being known prior to implementation. The eight-stage pipelined architecture requires an 8 × N ROM in the pre-processing unit for storing the initial coordinate values; it no longer requires the ROM for storing the elementary angles. It provides an area-time efficient design for VLSI implementation for calculating exponents in activation functions and Gaussian Potential Functions (GPF) in neural networks. The proposed CORDIC processor requires 32.68% fewer adders and 72.23% fewer registers than the conventional design. When implemented on a Virtex 2P (2vp50ff1148-6) device, the proposed design dissipates 55.58% less power and has a 45.09% lower total gate count and 16.91% less delay than the Xilinx CORDIC Core. The detailed algorithm design is presented along with the FPGA implementation and the area and time complexities.
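
For context, the conventional hyperbolic-mode CORDIC that the abstract sets out to improve can be sketched as follows. This is the textbook baseline, not the proposed leading-one-detection design: it needs a scale-factor constant, repeats the shift indices 4, 13, ... to converge, and only converges for arguments of magnitude up to about 1.118. The names `hyperbolic_schedule` and `cordic_exp` are hypothetical, and floating point stands in for fixed-point hardware.

```python
import math

def hyperbolic_schedule(n_shifts):
    """Shift indices 1..n_shifts, with indices 4, 13, 40, ... performed twice."""
    idx, rep = [], 4
    for i in range(1, n_shifts + 1):
        idx.append(i)
        if i == rep:
            idx.append(i)        # repeated iteration needed for convergence
            rep = 3 * rep + 1
    return idx

def cordic_exp(t, n_shifts=16):
    """Approximate exp(t) as cosh(t) + sinh(t) for |t| up to about 1.118."""
    sched = hyperbolic_schedule(n_shifts)
    # Hyperbolic gain; pre-dividing x by it is the scale-factor compensation.
    gain = math.prod(math.sqrt(1.0 - 4.0 ** -i) for i in sched)
    x, y, z = 1.0 / gain, 0.0, t
    for i in sched:
        d = 1 if z >= 0 else -1
        x, y = x + d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * math.atanh(2.0 ** -i)
    return x + y

approx = cordic_exp(0.5)
```

The stored `atanh` values correspond to the elementary-angle ROM of the conventional design, which the abstract states is no longer required in the proposed architecture.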

5.
This paper presents a novel modified Coordinate Rotation Digital Computer (CORDIC) architecture that computes values of sine and cosine in a single cycle. The proposed method uses the angle-recoding technique to design a modified CORDIC algorithm. Multiple iterations are merged in the modified algorithm by using memory storage for the initial iterations and employing inverse recoding to generate constant multiplication factors for the remaining iterations. The scale factor of the algorithm remains constant, as these factors are independent of the intermediate directions of rotation. In addition, the architecture is mapped onto a single CORDIC computation element that requires only a single cycle to compute the result. The multiplications are implemented using dedicated hardware multipliers in Field Programmable Gate Arrays and customised fixed-point multiplication techniques for Application Specific Integrated Circuits. Implementation results show that the proposed IS-CORDIC architecture is 7.9 times more efficient than basic CORDIC and has a lower area-delay product than current state-of-the-art implementations.

6.
《电子学报:英文版》 (Acta Electronica Sinica, English Edition), 2016, (6): 1063-1070
The fast Fourier transform (FFT) accelerator and the Coordinate Rotation Digital Computer (CORDIC) algorithm play important roles in signal processing. We propose a configurable floating-point FFT accelerator based on CORDIC rotation, in which twiddle direction prediction is used to reduce hardware cost and twiddle angles are generated in real time to save memory. To complete the CORDIC rotation efficiently, a novel approach is presented that combines segmented-parallel iteration with compressed iteration based on CSA, and redundant CORDIC is used to reduce the latency of each iteration. To demonstrate the efficiency of our FFT accelerator, four FFT accelerators are prototyped on an FPGA chip to perform a batch FFT. Experimental results show that our structure, which is composed of four butterfly units and handles FFT sizes ranging from 64 to 8192 points, occupies 33230 (3%) REGs and 143006 (30%) LUTs. The clock frequency can reach 122 MHz. The double-precision FFT uses only about 2.5 times the resources of the single-precision FFT, whereas the theoretical value is 4. Moreover, only 13331 cycles are required to compute an 8192-point double-precision FFT with four butterfly units in parallel.

7.
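This paper presents a synchronization algorithm for OFDM systems that simultaneously performs frame start position estimation and carrier frequency offset estimation/compensation, together with its FPGA implementation. The estimation algorithm, which requires no additional synchronization sequence, is improved by means of quantization, which raises the accuracy of the carrier frequency offset estimate, and the frequency offset is corrected in the digital domain. In the hardware implementation, the ability of the CORDIC algorithm to compute trigonometric functions efficiently is exploited to convert the arctangent and angle rotation operations in frequency offset estimation and correction into additions and shifts, balancing computational accuracy, speed, and hardware resource utilization.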
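
How an arctangent reduces to additions and shifts can be sketched with vectoring-mode CORDIC. This is a generic illustration under stated assumptions, not the paper's FPGA design: the input must lie in the right half-plane (x > 0), floating-point multiplication by `2.0 ** -i` stands in for a right shift, and the name `cordic_vectoring` is hypothetical.

```python
import math

def cordic_vectoring(x, y, n_iter=16):
    """Return (magnitude, angle) of the vector (x, y), assuming x > 0.

    Each step rotates the vector toward the x axis by atan(2^-i) and
    accumulates the applied angle; in hardware the atan values are constants.
    """
    z = 0.0
    for i in range(n_iter):
        d = -1 if y >= 0 else 1          # choose the direction that shrinks |y|
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * math.atan(2.0 ** -i)
    gain = math.prod(math.sqrt(1.0 + 4.0 ** -i) for i in range(n_iter))
    return x / gain, z

# Example: phase of a complex correlation value 3 + 4j.
magnitude, phase = cordic_vectoring(3.0, 4.0)
```

In a frequency offset estimator, a phase obtained this way would then drive a rotation-mode CORDIC that applies the correction to the received samples.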

8.
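The fixed-angle-rotation CORDIC (Coordinate Rotation Digital Computer) algorithm is widely used in high-speed digital signal processing, image processing, robotics, and other fields. To address the drawbacks of fixed-angle-rotation CORDIC during phase rotation, namely its data throughput, heavy use of hardware resources, and high resource consumption, a hybrid CORDIC algorithm is proposed that splits the angle rotation into two parts: unidirectional angle rotation and a single angle-estimation rotation. Based on underdamping theory, the fixed-angle rotation is implemented with a unidirectional-rotation CORDIC algorithm, which reduces the number of pipeline stages and the sign decisions per iteration. The estimated rotation angle is then expressed in binary, the constant factor is corrected, and the result is processed according to the angle mapping relationship to complete high-speed, high-precision coordinate rotation. Finally, simulation experiments were carried out on a hardware platform. The results show that, for a given error bound, the hybrid algorithm further reduces the number of iterations, consumes fewer resources, and improves data throughput.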

9.
The quantization effects of the CORDIC algorithm   Total citations: 7, self-citations: 0, citations by others: 7
A detailed analysis of the quantization error encountered in the CORDIC (coordinate rotation digital computer) algorithm is presented. Two types of quantization error are examined: an approximation error due to the quantized representation of rotation angles, and a rounding error due to the finite precision representation in both fixed-point and floating-point arithmetic. Tight error bounds for these two types of error are derived. The rounding error due to a scaling (normalization) operation in the CORDIC algorithm is also discussed. An expression for the overall quantization error is derived, and several simulation examples are presented.
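
The angle approximation error can be illustrated numerically. The sketch below checks the standard convergence bound for radix-2 CORDIC, namely that after n iterations the residual angle is at most atan(2^-(n-1)); it does not reproduce the tighter bounds derived in the paper, and it ignores rounding error by working in double precision. The function names are hypothetical.

```python
import math

def residual_angle(theta, n_iter):
    """Angle left over after n_iter micro-rotations toward the target theta."""
    z = theta
    for i in range(n_iter):
        d = 1 if z >= 0 else -1
        z -= d * math.atan(2.0 ** -i)
    return z

def worst_residual(n_iter, samples=1000):
    """Largest |residual| over a uniform sweep of targets in [-pi/2, pi/2]."""
    return max(
        abs(residual_angle(-math.pi / 2 + math.pi * k / samples, n_iter))
        for k in range(samples + 1)
    )

# Compare the observed worst case with the bound atan(2^-(n-1)).
for n in (8, 12, 16):
    print(n, worst_residual(n), math.atan(2.0 ** -(n - 1)))
```

Each additional iteration roughly halves the bound, which is why about one bit of angular accuracy is gained per radix-2 iteration.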

10.
The fast Fourier transform (FFT) is an algorithm widely used to compute the discrete Fourier transform (DFT) in real-time digital signal processing. High performance with fewer resources is highly desirable for any real-time application. This work presents an implementation of the radix-2 decimation-in-frequency (R2DIF) FFT algorithm based on the modified feed-forward double-path delay commutator (DDC) architecture on an FPGA device. The need for a complex multiplier to multiply by the complex twiddle factors, and for a large memory to store the twiddle factors, are the main concerns in FFT implementation. The proposed work aims to address these issues. A high-performance radix-16 COordinate Rotational DIgital Computer (CORDIC) algorithm based rotator is proposed to carry out the complex twiddle factor multiplication. Further, CORDIC needs only rotation angles to carry out complex multiplication, which reduces the need for a large memory to store the twiddle factors. To compute the total rotation for n-bit precision, the proposed radix-16 CORDIC algorithm takes n/4 iterations, compared with n iterations for the radix-2 CORDIC algorithm. The proposed R2DIF architecture is implemented on a Virtex-7 series FPGA. Further, a detailed comparison is presented between the proposed FFT implementation and other recently proposed FFT implementations. Experimental results suggest that the proposed implementation has lower latency and hardware utilization than recently proposed implementations.

11.
This investigation proposes a novel radix-4^2 algorithm with the low computational complexity of a radix-16 algorithm but the lower hardware requirement of a radix-4 algorithm. The proposed pipeline radix-4^2 single delay feedback path (R4^2SDF) architecture adopts a multiplierless radix-4 butterfly structure, based on the specific linear mapping of the common factor algorithm (CFA), to support both 256-point fast Fourier transform/inverse fast Fourier transform (FFT/IFFT) and 8×8 2D discrete cosine transform (DCT) modes with a highly efficient feedback shift register architecture. The segment shift register (SSR) and overturn shift register (OSR) structures are adopted to minimize the register cost of the input re-ordering and post-computation operations in the 8×8 2D DCT mode, respectively. Moreover, the retrenched constant multiplier and eight-folded complex multiplier structures are adopted to decrease the multiplier cost and the coefficient ROM size using the complex conjugate symmetry rule and subexpression elimination. To further decrease the chip cost, a finite wordlength analysis is provided, indicating that the proposed architecture requires only a 13-bit internal wordlength to achieve 40-dB signal-to-noise ratio (SNR) performance in 256-point FFT/IFFT modes and high digital video (DV) compression quality in 8×8 2D DCT mode. The comprehensive comparison results indicate that the proposed cost-effective reconfigurable design has the smallest hardware requirement and largest hardware utilization among the tested architectures for FFT/IFFT computation, and thus has the highest cost efficiency. The derivation and chip implementation results show that the proposed pipeline 256-point FFT/IFFT/2D DCT triple-mode chip consumes 22.37 mW at 100 MHz at a 1.2-V supply voltage in a TSMC 0.13-μm CMOS process, which is very appropriate for the RSoCs IP of next-generation handheld devices.

12.
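This paper presents the design of a high-speed, low-power two-dimensional DCT coprocessor that supports the MPEG2 compression standard and targets the ARM9 core. The coprocessor uses row-column decomposition with a parallel-optimized two-dimensional DCT data structure, which markedly increases the processing speed for 8×8 data blocks. The one-dimensional DCT is implemented with an improved CORDIC algorithm in which shifts replace multiplications and the shift operations are optimized. Simulation results show that, while meeting MPEG2 accuracy and the ARM9 data transfer frequency, the hardware implementation of this one-dimensional DCT algorithm is 30% faster than that of reference [2] and occupies 50% less area. The coprocessor can be widely used in the codec modules of mobile multimedia devices.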

13.
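In the encoding and decoding flow of video signals, the discrete cosine transform (DCT) is a crucial step that determines the quality and efficiency of video compression. For the 8×8 two-dimensional DCT, this paper proposes a hardware circuit structure based on a Coarse-Grained Reconfigurable Array (CGRA). Exploiting the reconfigurability of the CGRA, it supports the 8×8 two-dimensional DCT of multiple video compression coding standards on a single platform. Experimental results show that the structure processes 8 pixels in parallel per clock cycle, with a throughput of up to 1.157×10^9 pixels/s. Compared with existing structures, design efficiency and power efficiency improve by up to 4.33 times and 12.3 times respectively, and the structure can decode 4096×2048 video sequences in 4:2:0 format at up to 30 frames/s.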

14.
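Chen Lei, Wang Tao, Tian Xiaoyan, Zhang Suoliang. 《电讯技术》 (Telecommunication Engineering), 2021, 61(11): 1404-1410
To address the synchronization failures that occur because the compensation speed of traditional feedback loops cannot keep up with the rate of error variation in current hybrid frequency-hopping/spread-spectrum communication systems, a tracking synchronization method based on the Coordinate Rotation Digital Computer (CORDIC) algorithm is proposed. The method maps the system's carrier synchronization error and timing synchronization error onto the phase, compensates the error by two-dimensional rotation, and uses the properties of Binary Phase Shift Keying (BPSK) modulation to determine the optimal rotation bit and the compensation error, thereby synchronizing the system. Experiments show that the method achieves stable synchronization of a hybrid frequency-hopping/spread-spectrum system with a hopping rate of 20000 hop/s and a hopping bandwidth of 327.52 MHz, and its compensation performance is clearly better than that of traditional compensation techniques.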

15.
In this work we extend the radix-4 CORDIC algorithm to the vectoring mode (the radix-4 CORDIC algorithm was proposed recently by the authors for the rotation mode). The extension to the vectoring mode is not straightforward, since the digit selection function is more complex in the vectoring case than in the rotation case; as in the rotation mode, the scale factor is not constant. Although the radix-4 CORDIC algorithm in vectoring mode has a recurrence similar to that of the radix-4 division algorithm, there are specific issues concerning the vectoring algorithm that demand dedicated study. We present the digit selection for nonredundant and redundant arithmetic (following two different approaches: arithmetic comparisons and table look-up), the computation and compensation of the scale factor, and the implementation of the algorithm (with both types of digit selection) in a word-serial architecture. Compared with conventional radix-2 (redundant and nonredundant) architectures, the radix-4 algorithms offer a significant speed-up for angle calculation. For the computation of the magnitude the speed-up is very slight, due to the nonconstant scale factor in the radix-4 algorithm.

16.
This paper focuses on developing an area-efficient hyperbolic Coordinate Rotation Digital Computer (CORDIC) algorithm with improved performance. The algorithm eliminates the need for scale factor calculation within the Range of Convergence (ROC). At the same time, the range of convergence offered is larger than the conventional CORDIC ROC in the hyperbolic rotation mode. As the only algorithm of its kind in hyperbolic rotation with the sign sequence μ = 1 always, one complete operation requires just 5 iterations. The pipelined implementation therefore has 5 stages, which provides a 50% increase in throughput compared with conventional CORDIC. As for area, a 16-bit processor can be realized using 56% fewer full adders than Flat-CORDIC requires. The x and y datapaths are based on the series expansion of hyperbolic functions. The complete algorithm design is detailed along with the pipelined architecture implementation.

17.
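This paper improves the iteration structure of the CORDIC algorithm for computing the arcsine and arccosine functions and, on that basis, implements a multi-mode CORDIC algorithm. The first two iteration stages are avoided by resetting the initial rotation vector, and the error of the original algorithm is corrected by modifying the decision condition for the vector rotation direction. With very few additional resources, the sine/cosine and arcsine/arccosine operations are unified in the same iteration structure and implemented. Implementation results show that the improved algorithm yields higher accuracy for arcsine and arccosine results and effectively reduces hardware resource usage in applications that require both kinds of function.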

18.
The authors present an efficient algorithm for the computation of the 4×4 discrete cosine transform (DCT). The algorithm is based on the decomposition of the 4×4 DCT into four 4-point 1-D DCTs. Thus, only 1-D transformations and some additions are required. It is shown that the proposed algorithm requires only 16 multiplications, which is half the number needed for the conventional row-column method. Since the 2^m×2^m DCT can be computed using the 4×4 DCT recursively for any m, the proposed algorithm leads to a fast algorithm for the computation of the 2-D DCT.

19.
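To address the latency of QR (orthogonal-triangular) decomposition of channel matrices in multicarrier systems, this paper proposes a Distributed Systolic Array Processing (DSAP) algorithm for QR decomposition in MIMO-OFDM systems. The algorithm comprises two mechanisms. The first is interleaving preprocessing: the row vectors of the channel matrices of different subcarriers are grouped and interleaved, and the elements of each channel matrix column are read out and fed into the systolic array according to an increasing-delay rule. The second is distributed systolic array computation: pipelined CORDIC computation and subcarrier-synchronous processing in the boundary and internal cells of the systolic array perform the QR decomposition in a distributed manner, so that the QR decompositions of different subcarriers are spread across different CORDIC stages in the boundary and internal cells. Compared with the Serial Systolic Array Processing (SSAP) algorithm, the DSAP algorithm makes full use of the clock cycles; its decomposition latency is about 8% of that of SSAP, which effectively reduces data processing latency with almost no increase in complexity.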

20.
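Scaling-free double-step rotation CORDIC algorithm   Total citations: 7, self-citations: 0, citations by others: 7
Xu Cheng, Qin Yunchuan, Li Kenli, Qi Fangfang. 《电子学报》 (Acta Electronica Sinica), 2014, 42(7): 1441-1445
The CORDIC algorithm is often used in integrated circuit design to implement efficient vector rotation. Current research on the algorithm focuses on reducing its number of iterations, extending its range of convergence, and lowering the cost of scale factor compensation. This paper proposes a scaling-free double-step rotation CORDIC algorithm that uses a double-step rotation strategy to reduce the number of iterations of scaling-free CORDIC and extends the convergence range to the entire circle. Experimental results show that the algorithm maintains high computational accuracy while reducing both the number of iterations and the area cost.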
