Similar literature
 Found 19 similar articles; search took 109 ms
1.
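VLSI architecture design of a parallel-array wavelet filter for JPEG2000   Total citations: 2, self-citations: 0, citations by others: 2
兰旭光, 郑南宁, 梅魁志, 刘跃虎. 《电子学报》 (Acta Electronica Sinica), 2004, 32(11): 1806-1809
A parallel-array VLSI architecture design method is proposed for implementing, with the lifting scheme, the two-dimensional discrete wavelet transform (DWT) of a JPEG2000 coding system. The resulting architecture consists of two row processors, one column processor, and a small number of row buffers. The row and column processors are built from parallel arrays of processing elements, so row and column filtering run simultaneously, and optimized shift-add operations replace multiplications. The whole architecture is pipelined. At the same precision, it greatly reduces the amount of computation, raises hardware utilization to nearly 100%, speeds up the transform, and reduces circuit size. For an N×N image, the processing time is O(N²/2) clock cycles. The 2-D DWT filter architecture has been verified on an FPGA and can serve as a standalone IP core in a JPEG2000 image codec chip under development.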
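The idea of a lifting step built only from additions and shifts can be sketched as follows. This is a minimal illustration that assumes the reversible 5/3 filter defined in JPEG2000; the abstract does not say which filter the authors implemented, and the function names `lifting_53_forward` and `lifting_53_inverse` are hypothetical.

```python
def lifting_53_forward(x):
    """One level of the reversible 5/3 lifting transform on an even-length list of ints."""
    assert len(x) % 2 == 0 and len(x) >= 2
    even, odd = x[0::2], x[1::2]
    half = len(even)
    # Predict: detail = odd sample minus the floor-average of its two even neighbours.
    # The missing right neighbour of the last odd sample is mirrored (symmetric extension).
    d = [odd[i] - ((even[i] + even[min(i + 1, half - 1)]) >> 1) for i in range(half)]
    # Update: smooth = even sample plus a rounded quarter of the two adjacent details.
    # The missing left detail of the first even sample is mirrored as well.
    s = [even[i] + ((d[max(i - 1, 0)] + d[i] + 2) >> 2) for i in range(half)]
    return s, d


def lifting_53_inverse(s, d):
    """Undo lifting_53_forward by running the two steps in reverse order."""
    half = len(s)
    even = [s[i] - ((d[max(i - 1, 0)] + d[i] + 2) >> 2) for i in range(half)]
    odd = [d[i] + ((even[i] + even[min(i + 1, half - 1)]) >> 1) for i in range(half)]
    x = [0] * (2 * half)
    x[0::2], x[1::2] = even, odd
    return x
```

Each step uses only additions and right shifts, which is the property that lets hardware drop the multipliers. A 2-D transform would apply the same 1-D routine along rows and then along columns.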

2.
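Design and implementation of a pipelined FFT/IFFT processor   Total citations: 1, self-citations: 0, citations by others: 1
To meet the demands of real-time high-speed signal processing, an efficient FFT processor was designed and implemented. Based on an analysis of FFT algorithm complexity and of hardware implementation structures, the processor adopts a decimation-in-frequency radix-4 algorithm, a multi-stage pipeline, and fixed-point arithmetic. It can be configured as required to perform a 4^p-point FFT or IFFT. The processor can run FFTs continuously on multiple input sequences, which removes the effect of data input and output on latency. On average, each N-point FFT takes only N clock cycles. The design is modular, written in Verilog HDL, and implemented on an Altera Cyclone II device.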

3.
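谢靖, 陈侃松, 王德志, 蒋碧波. 《微电子学》 (Microelectronics), 2015, 45(6): 743-746, 750
A new fast automatic frequency calibration technique is proposed for band search and frequency locking in wideband frequency synthesizers. The calibration module adjusts the oscillation frequency of the voltage-controlled oscillator (VCO) by directly controlling the on/off state of the VCO's switched-capacitor array, so that the output frequency locks quickly. The self-calibration is implemented entirely in digital circuitry and completes in only 5 clock cycles. The clock is taken directly from the externally supplied reference clock, so the algorithm is simple and needs few clock cycles. The circuit was designed and verified in an SMIC 0.18 μm CMOS process, and its calibration time is markedly shorter than that of earlier calibration techniques.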

4.
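Design of a 32-bit floating-point array multiplier and comparison of algorithms   Total citations: 8, self-citations: 0, citations by others: 8
Several algorithms for two's-complement multiplication are discussed. Comparison shows that the modified Booth algorithm is the most suitable: it completes the multiplication with one uniform procedure regardless of the signs of the multiplier and multiplicand, and it needs no correction of the product, which greatly increases multiplier speed. Using the modified Booth algorithm, a high-performance 32-bit floating-point array multiplier was designed that completes one 24-bit integer multiplication or one 32-bit floating-point multiplication in a single clock cycle. The multiplier is suitable for VLSI implementation and has been used in the design of a DSP chip.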
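The sign-independent behaviour of the modified Booth algorithm can be illustrated in software. The sketch below assumes the common radix-4 form of the recoding and a hypothetical function name `booth_radix4_multiply`; it models the arithmetic only, not the array structure of the multiplier described in the abstract.

```python
def booth_radix4_multiply(a, b, bits=8):
    """Multiply two signed `bits`-wide two's-complement integers via radix-4 Booth recoding."""
    assert bits % 2 == 0
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    assert lo <= a <= hi and lo <= b <= hi
    # Two's-complement bit pattern of the multiplier with an implicit 0 below bit 0.
    pattern = (b & ((1 << bits) - 1)) << 1
    # Each overlapping 3-bit group (b[2i+1], b[2i], b[2i-1]) maps to a digit in {-2..2}.
    digit_of = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
                0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}
    product = 0
    for i in range(bits // 2):
        digit = digit_of[(pattern >> (2 * i)) & 0b111]
        # In hardware, digit * a is only a negation and/or a 1-bit shift of a;
        # the weight 4**i is a further left shift by 2*i.
        product += (digit * a) << (2 * i)
    return product
```

The same loop handles positive and negative operands, and the sum of partial products is already the signed result, so no final correction step is needed. The number of partial products is half the multiplier width.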

5.
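A communication structure suited to an ARM parallel array machine is proposed, in which 4 routers handle communication among 16 processor cores, effectively saving area. The router uses packet-switched network-on-chip communication. Internally it combines buffering, the classic XY routing algorithm, and a dedicated arbitration policy, with data multicast added. The processors are low-power, high-performance ARM cores. Together, these mechanisms effectively reduce data propagation delay and power consumption. Experimental results show that a router designed this way reaches a maximum clock frequency of 406.009 MHz, which meets the communication-rate requirements of the ARM array machine.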

6.
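When image median filtering is implemented on a field-programmable gate array (FPGA), raising the FPGA clock frequency conflicts with optimizing hardware resources. To address this, a timing-optimized median filtering algorithm is proposed. The algorithm first replaces three-input comparator modules with cascaded two-input comparator modules, spreading the data processing over multiple clock cycles; this shortens the hardware timing delay and raises the clock frequency attainable on the FPGA. It then uses extreme-value comparator modules to optimize the parallel computation flow, saving hardware resources and shortening the run time. Simulation results show that, for a 3×3 filter, the algorithm outputs the first median after 8 clock cycles and one median per clock cycle thereafter, with a theoretical maximum stable clock frequency of 231.2 MHz.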
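A 3×3 median built only from two-input comparators can be sketched as below. This uses the well-known row-sort decomposition (largest of the minima, median of the medians, smallest of the maxima); it is an assumption that the paper's data flow resembles it, and the names `cmp_exchange`, `sort3`, and `median3x3` are hypothetical.

```python
def cmp_exchange(a, b):
    """Two-input comparator: return (smaller, larger)."""
    return (a, b) if a <= b else (b, a)


def sort3(a, b, c):
    """Sort three values with a cascade of three two-input comparators."""
    a, b = cmp_exchange(a, b)
    b, c = cmp_exchange(b, c)
    a, b = cmp_exchange(a, b)
    return a, b, c


def median3x3(window):
    """Median of a 3x3 window given as three rows of three pixel values."""
    rows = [sort3(*row) for row in window]      # stage 1: sort each row
    lows, mids, highs = zip(*rows)              # regroup by rank within the row
    max_of_lows = sort3(*lows)[2]               # stage 2: one extreme or median per group
    mid_of_mids = sort3(*mids)[1]
    min_of_highs = sort3(*highs)[0]
    return sort3(max_of_lows, mid_of_mids, min_of_highs)[1]   # stage 3: final median
```

In a pipelined design each comparator cascade would sit between registers, so the first result appears after a fixed latency and one result follows per clock; the stage boundaries here are illustrative only.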

7.
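High-speed divider design and ASIC implementation   Total citations: 3, self-citations: 0, citations by others: 3
To speed up division, a new high-speed divider algorithm based on radix-16 is proposed and implemented using an application-specific integrated circuit (ASIC) design flow. Compared with the divider used in MIPS processors, the maximum circuit delay is reduced by 27%, the number of clock cycles required is reduced by 68%, and speed performance improves by about 77%. Other performance figures of the circuit are also given. The circuit is suited to applications that demand very high arithmetic speed.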

8.
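A heterogeneous asynchronous multi-core simulator based on SimpleScalar and SystemC is proposed, in which cores running at different frequencies communicate and share data through a shared memory region. Experimental results show that the simulator correctly models the operation of a heterogeneous multi-core processor at clock-cycle level and accurately evaluates its performance. The simulator should find good application in hardware/software co-design of heterogeneous multi-core systems.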

9.
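It is first proved that the BCH code adopted in the DTMB standard is a cyclic Hamming code with an error-correcting capability of 1. On this basis, a decoding algorithm suited to this BCH code is proposed, together with serial and parallel FPGA implementations. Because the code is a shortened code, the error-detection circuit is modified to shorten the decoding latency by 34%. Implementation results show that the decoder decodes correctly and uses very few FPGA resources. The serial decoder has a total latency of 762 clock cycles and a maximum clock frequency of 357 MHz. The parallel decoder has a total latency of only 77 clock cycles and a maximum clock frequency of 276 MHz.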

10.
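A new attitude estimation method is proposed for spinning projectiles. First, a special sun-vector measurement device is designed according to the properties of a spinning projectile; it measures the sun vector and replaces the accelerometer and magnetometer commonly used in MEMS units for correcting gyroscope bias. The device comprises a transparent housing, a photocell array, a real-time clock, and a processor: sunlight passes through the housing onto the photocell array, which converts the solar energy into an electromotive-force signal, and the processor combines this with the real-time clock to obtain the sun vector at that instant. Next, the Kalman filter state and observation equations are established from the quaternion differential equation to estimate the projectile's attitude. Finally, simulation results show that the method yields high-accuracy attitude angles for the spinning projectile.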

11.
A speech recognition processor CMOS LSI was developed as the processing element (PE) of a ring array processor previously proposed by the authors as an architecture to carry out highly parallel recognition processing with array size flexibility. There are three key features for the LSI: (1) a highly parallel I/O structure of triple buffer with cyclical-mode transition control methods to solve the serious problem of inter-PE data transfer overhead versus the array processing; (2) a control structure with two direct memory access (DMA) controllers to realize inter-PE data I/O processing and intra-PE processing in parallel; and (3) a pipelined recognition processing at a high execution rate realized by a pipelined structure and a balanced clock distribution design technique. These effective designs for the PE LSI allow high-speed recognition processing without any inter-PE data transfer overhead in the ring array processor. Combining the PE-LSI architecture with the proposed array architecture for highly parallel dynamic time warping (DTW) processing, a real-time continuous speech recognition system based on continuous dynamic programming matching using the SPLIT method for a 1000-word vocabulary can be constructed using a ring array processor consisting of 30 PEs.

12.
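Based on FPGA hardware and the idea of trading space for time, a parallel full-comparison sorting algorithm is proposed. The algorithm compares all data items against one another in parallel and computes each item's position in the sorted order, thereby sorting the data. It sorts a sequence of numbers within 4 clock cycles; experiments show good real-time performance and broad applicability.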
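Sorting by full comparison can be sketched sequentially as a rank sort: each item's output position is the number of items that must precede it. The tie-break on index and the function name `full_comparison_sort` are assumptions made for this illustration; the abstract does not describe how equal values are handled.

```python
def full_comparison_sort(data):
    """Place each item at the position given by how many items must precede it."""
    n = len(data)
    out = [None] * n
    for i in range(n):
        # In hardware, the n*n comparisons would all be evaluated concurrently;
        # here they are simply looped over.
        rank = sum(
            1
            for j in range(n)
            # Equal values are ordered by original index so that ranks stay unique.
            if data[j] < data[i] or (data[j] == data[i] and j < i)
        )
        out[rank] = data[i]
    return out
```

The comparison count grows as n², which is the space cost paid for a running time that does not depend on the data.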

13.
We present a parallel algorithm, architecture, and implementation for efficient Lempel-Ziv (LZ)-based data compression. The parallel algorithm exhibits a scalable, parameterized, and regular structure and is well suited for VLSI array implementation. Based on our parallel algorithm and systematic design methodologies, two semisystolic array architectures have been developed which are low power and area efficient. The first architecture trades off the compression speed for the area and has a low run-time overhead for multichannel compression. The second architecture achieves a high compression rate (one data symbol per clock) at the expense of the area due to a large clock load and global wiring. Compared to a recent state-of-the-art parallel architecture, our first array structure requires significantly less chip area (≃330 k versus ≃36 k transistors) and more than an order of magnitude less power (≈1.0 W versus ≈70 mW) while still providing the compression speed required for most data communication applications. Hence, data compression can be adopted in portable data communication as well as wireless local area networks. The second architecture has at least three times less area and power while providing the same constant compression rate. To demonstrate the correctness of our design, a prototype module for the first architecture has been implemented using 1.2 μm complementary metal-oxide-semiconductor (CMOS) technology. The compression module contains 32 simple and identical processors, has an average compression rate of 12.5 million bytes/s, and consumes 18.34 mW without the dictionary (≈70 mW with a 4.1k SRAM for the dictionary) while operating at a 100 MHz clock rate (simulated).

14.
We revise Montgomery's algorithm such that modular multiplication can be executed two times faster. Each iteration in our algorithm requires only one addition, while that in Montgomery's requires two additions. We then propose a cellular array to implement modular exponentiation for the Rivest-Shamir-Adleman cryptosystem. It has approximately 2n cells, where n is the word length. The cell contains one full-adder and some controlling logic. The time to calculate a modular exponentiation is about 2n² clock cycles. The proposed architecture has a data rate of 100 kb/s for 512-b words and a 100 MHz clock.
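For reference, the baseline that the abstract compares against, bit-serial Montgomery multiplication with two additions per iteration, can be sketched as follows. This is the standard textbook form, not the authors' revised one-addition algorithm, and the function name `montgomery_multiply` is hypothetical.

```python
def montgomery_multiply(a, b, m, n):
    """Return a * b * 2**(-n) mod m for an odd modulus m < 2**n and 0 <= a, b < m."""
    assert m % 2 == 1 and m < (1 << n) and 0 <= a < m and 0 <= b < m
    s = 0
    for i in range(n):
        s += ((a >> i) & 1) * b   # first addition: add b when bit i of a is set
        s += (s & 1) * m          # second addition: add m when s is odd, making s even
        s >>= 1                   # exact halving, since s is now even
    # s stays below 2*m throughout, so one conditional subtraction completes the reduction.
    return s - m if s >= m else s
```

The result carries an extra factor of 2^(-n), so operands are normally converted into the Montgomery domain before a chain of multiplications such as a modular exponentiation, and converted back at the end.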

15.
This paper presents an ultra-high-speed sorter based on a simplified parallel sorting algorithm. It uses a binary neural network, consisting of binary neurons and AND-OR synaptic connections, to solve sorting problems in exactly two clock cycles. Our simplified algorithm is based on the super parallel sorting algorithm proposed by Takefuji and Lee. However, our algorithm does not need any adders, while Takefuji's algorithm needs n×(n–1) analog adders, each with multiple input ports. As an example of the simplified parallel sorter, a hardware design and its implementation are introduced, which performs a sorting operation in two clock cycles. Results of both a logic circuit simulation and an algorithm simulation confirm the correctness of our hardware implementation, even at practical problem sizes.

16.
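Based on the binary multi-word Montgomery modular multiplication algorithm, a regular systolic-array hardware architecture with flexibly configurable parameters is proposed, and it is used to implement Montgomery modular multiplication at different bit widths on an FPGA. Without adding extra circuitry or clock cycles, the architecture confines the critical path of the systolic array to the adder inside each processing unit. Hardware implementation results show that the architecture achieves a higher circuit frequency, a smaller circuit area, and a shorter computation time.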

17.
Reed-Solomon codes are powerful error-correcting codes that can be found in many digital communications standards. Recently, there has been an interest in soft-decision decoding of Reed-Solomon codes, incorporating reliability information from the channel into the decoding process. The Koetter-Vardy algorithm is a soft-decision decoding algorithm for Reed-Solomon codes which can provide several dB of gain over traditional hard-decision decoders. The algorithm consists of a soft-decision front end to the interpolation-based Guruswami-Sudan list decoder. The main computational task in the algorithm is a weighted interpolation of a bivariate polynomial. We propose a parallel architecture for the hardware implementation of bivariate interpolation for soft-decision decoding. The key feature is the embedding of both a binary tree and a linear array into a 2-D array processor, enabling fast polynomial evaluation operations. A field-programmable gate array interpolation processor was implemented and demonstrated at a clock frequency of 23 MHz, corresponding to decoding rates of 10-15 Mb/s.

18.
This paper presents the performance of a decision feedback adaptive array based on the interarray correlation-neglecting (ICN) algorithm, which is suitable for high-speed mobile-communication systems. Although the ICN algorithm is regarded as a means of reducing complexity relative to the recursive least-square (RLS) algorithm, we show theoretically that its initial acquisition performance is close to that of the RLS algorithm. The requirements for stable convergence are derived analytically, and the performance is confirmed by Monte Carlo simulation. In addition, since the decision feedback adaptive array based on the ICN algorithm can be easily implemented with parallel processing, it can reduce the run-time cycles needed per symbol and can therefore be applied to high-speed radio communication systems. Its low operational complexity also makes it easy to apply to high-speed mobile-communication systems, where low complexity is a basic requirement.

19.
In this paper, simple 1-D and 2-D systolic arrays for realizing the discrete cosine transform (DCT) based on the discrete Fourier transform (DFT) of an input sequence are presented. The proposed arrays are obtained by a simple modified DFT (MDFT) and an inverse DFT (IDFT) version of the Goertzel algorithm combined with Kung's approach. The 1-D array requires N cells and one multiplier, and takes N clock cycles to produce a complete N-point DCT. The 2-D array takes N clock cycles, faster than the 1-D array, but the area complexity is larger. A continuous flow of input data is allowed and no idle time is required between the input sequences.
