Similar literature
20 similar documents found.
1.
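Exploiting the characteristics of digital images, this paper proposes an FPGA-based floating-point arithmetic method. Because pixel coordinates and gray values in a digital image are all positive integers, the method implements floating-point operations using fixed-point multiplication, addition, subtraction, and shift operations, all of which are relatively easy to realize on an FPGA. This overcomes serious shortcomings of conventional floating-point arithmetic: complex structure, long latency, and difficulty guaranteeing real-time results. The algorithm has been applied successfully in a real-time image derotation system built around the XC2S200-5PQ208 as its core processor, and was simulated with ModelSim SE. Experimental results show that the algorithm is simple in principle, fast, and adjustable in precision, making it suitable for real-time image processing.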
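The abstract gives no implementation details, so the following is only a minimal sketch of the general idea: rotating integer pixel coordinates with integer multiplies, adds, and shifts instead of floating-point hardware. The Q-format width `FRAC_BITS` and the function names are assumptions made for illustration, not taken from the paper.

```python
import math

FRAC_BITS = 14  # assumed fraction width; more bits give more precision


def to_fixed(value):
    """Quantize a real coefficient to a signed fixed-point integer."""
    return int(round(value * (1 << FRAC_BITS)))


def rotate_pixel(x, y, cos_q, sin_q):
    """Rotate integer pixel coordinates using only multiply, add, and shift.

    cos_q and sin_q are fixed-point coefficients produced by to_fixed().
    Adding half an LSB before the right shift rounds to the nearest integer.
    """
    half = 1 << (FRAC_BITS - 1)
    x_rot = (x * cos_q - y * sin_q + half) >> FRAC_BITS
    y_rot = (x * sin_q + y * cos_q + half) >> FRAC_BITS
    return x_rot, y_rot


def rotate_pixel_by_angle(x, y, angle_rad):
    """Convenience wrapper: quantize the coefficients once, then rotate."""
    return rotate_pixel(x, y, to_fixed(math.cos(angle_rad)), to_fixed(math.sin(angle_rad)))
```

In hardware the coefficients would typically be precomputed once per frame, so the per-pixel work is four multiplies, two adds, and two shifts. Changing `FRAC_BITS` is one way to read the "adjustable precision" claim.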

2.
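A floating-point digital signal processor (DSP) hard-core architecture is proposed that remains compatible with fixed-point arithmetic while providing good support for floating-point arithmetic. At present, the major field-programmable gate array (FPGA) vendors all implement floating-point functions as soft cores: the floating-point algorithm is mapped onto the chip and realized with logic resources and DSP blocks. In contrast, the proposed hard core completes floating-point operations using DSP blocks alone, without occupying other FPGA logic resources. The design takes load and delay effects into account and inserts multiple pipeline stages, significantly improving floating-point computational efficiency. The proposed floating-point DSP hard core was designed and implemented in the Semiconductor Manufacturing International Corporation (SMIC) 28 nm process. Simulation results show that the hard core achieves 0.4 Gflops for single floating-point addition and multiplication.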

3.
This paper provides a personal perspective on developments in the implementation of two systolic fast Fourier transform processors over the last 25 years and identifies some of the lessons learned. This has been a period of tremendous advancements in integrated circuit technology that is demonstrated by the resulting processors. The first processor is the Modular Transform Processor that was developed at TRW in the 1982–1984 time frame using VLSI technology. It is a set of six large circuit boards that computes 4,096-point fast Fourier transforms using 22-bit floating-point arithmetic at sustained data rates of 40 MSPS. The second processor is a single ASIC chip systolic FFT processor developed by the Mayo Foundation in the 2001–2002 time frame that computes 4,096-point FFTs using 16-bit fixed-point arithmetic at sustained data rates of 200 MSPS. Some thoughts on the future directions of systolic FFT processor development are offered. Future systems will compute large transforms (e.g., 16 K-point to 1 M-point) at high data rates (e.g., 500 MSPS to 1 GSPS), will employ more precise arithmetic (e.g., 32-bit single precision IEEE Standard floating-point arithmetic), will consume very low power (e.g., on the order of one watt) and will be realized on a single chip.

4.
The authors describe the design of a custom integrated circuit for the arithmetic operation of division. The chip uses self-timing to avoid the need for high-speed clocks and directly concatenates precharged function blocks without latches. Internal stages form a ring that cycles without any external signaling. The self-timed control introduces no serial overhead, making the total chip latency equal just the combinational logic delays of the data elements. The ring's data path uses embedded completion encoding and generates the mantissa of a 54-b (floating-point IEEE double-precision) result. Fabricated in 1.2-μm CMOS, the ring occupies 7 mm² and generates a quotient and done indication in 45 to 160 ns, depending on the particular data operands.

5.
A high-performance data execution unit suitable for computation-intensive digital signal processing systems is described. This unit uses the hybrid number system approach to speed up the basic arithmetic operations while remaining compatible with a standard IEEE 32-b floating-point format. However, all the arithmetic operations are performed in the 32 b logarithmic number system (LNS) domain. This chip is designed using a 3.4 V 0.8 μm CMOS technology with double-layer metallization. Conversion algorithms, chip architecture, design methodology, and major circuit components are discussed. A macrocell design methodology is adopted in order to achieve high-performance custom design circuits with the convenience of an automatic layout system. Computer simulations indicate that all the 32 b floating-point arithmetic operations (multiplication, division, squaring, and square root) can be executed in 10 ns. Extension of this unit into a 64 b double-precision floating-point system and multiply-accumulation applications are also presented.
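To illustrate why the four listed operations are cheap in a logarithmic number system, here is a minimal sketch for positive values only. A number is stored as a fixed-point base-2 logarithm, so multiplication and division become integer addition and subtraction, and squaring and square root become one-bit shifts. The fraction width `FRAC_BITS`, the function names, and the omission of sign and zero handling are assumptions for illustration and do not describe the chip's actual format or conversion algorithms.

```python
import math

FRAC_BITS = 23  # assumed number of fractional bits in the stored logarithm
SCALE = 1 << FRAC_BITS


def to_lns(x):
    """Convert a positive real number to its fixed-point log2 representation."""
    if x <= 0:
        raise ValueError("this sketch handles positive values only")
    return round(math.log2(x) * SCALE)


def from_lns(e):
    """Convert a fixed-point log2 representation back to a real number."""
    return 2.0 ** (e / SCALE)


def lns_mul(a, b):
    return a + b   # log(x*y) = log x + log y


def lns_div(a, b):
    return a - b   # log(x/y) = log x - log y


def lns_square(a):
    return a << 1  # log(x^2) = 2 log x


def lns_sqrt(a):
    return a >> 1  # log(sqrt x) = (log x) / 2
```

Addition and subtraction are the expensive operations in an LNS, which is one reason a hybrid design that converts to and from a floating-point format can be attractive.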

6.
A 440,000-transistor second-generation RISC (reduced instruction set computer) floating-point chip is described. The pipeline latency is only two cycles, and a double-precision result is produced every cycle. System throughput and accuracy are increased by using a floating-point multiply-add-fused unit, which carries out a double-precision accumulate as a two-cycle pipelined execution with only one rounding error. While the cycle time (40 ns) is competitive with other CMOS RISC systems, the floating-point performance stretches to the range of bipolar RISC systems (7.4-13 MFLOPS LINPACK). Leading zero anticipation makes the two-cycle pipeline possible by nearly eliminating the additional postnormalization time, and it allows for reduced overall system latency. Partial decode shifters allow complete time sharing for the multiply and data alignment. Improved design techniques for logarithmic addition and higher order counters for multiplication complete this second-generation RISC floating-point unit design.
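The accuracy benefit of a fused multiply-add, "only one rounding error", can be shown numerically. The sketch below models the fused result by computing a×b+c exactly with rational arithmetic and rounding once to double precision, then compares it with the ordinary two-rounding sequence. It is a behavioural model written for illustration, with assumed function names and operands, and says nothing about the chip's datapath.

```python
from fractions import Fraction


def fused_multiply_add(a, b, c):
    """Compute a*b + c exactly, then round once to double precision."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)  # single, correctly rounded conversion


def separate_multiply_add(a, b, c):
    """Ordinary evaluation: the product is rounded, then the sum is rounded."""
    return a * b + c


# Operands chosen so the exact product, 1 - 2**-60, is not representable.
a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

fused = fused_multiply_add(a, b, c)        # keeps the small residual
separate = separate_multiply_add(a, b, c)  # residual lost when a*b rounds to 1.0
```

With these operands the separately rounded product becomes exactly 1.0, so the sum collapses to zero, while the single-rounding result retains the residual of -2^-60.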

7.
The design of the WE32106 Math Accelerator Unit, which provides the WE32100 microprocessor with IEEE standard (Draft 10) floating-point capabilities, is described. The chip implements a host of floating-point operations in single, double, and double-extended precision, as well as the complete set of IEEE standard requirements for fault and exception handling. The chip provides a high-speed co-processor interface to the WE32100 microprocessor, as well as a general-purpose memory-mapped peripheral-mode interface to other microprocessors. The chip is implemented in 1.5 μm twin-tub CMOS III technology.

8.
This paper describes a digital neural network chip for high-speed neural network servers. The chip employs single-instruction multiple-data stream (SIMD) architecture consisting of 12 floating-point processing units, a control unit, and a nonlinear function unit. At a 50 MHz clock frequency, the chip achieves a peak speed performance of 1.2 GFLOPS using 24-bit floating-point representation. Two schemes of expanding the network size enable neural tasks requiring over 1 million synapses to be executed. The average speed performances of typical neural network models are also discussed.

9.
A 200-MHz double-data-rate synchronous-DRAM (DDR-SDRAM) was developed. The chip contains a delay-locked loop (DLL) which performs over a wide range of operating conditions. Post-mold-tuning allows precise replica programming. A 200-MHz intra-chip data bus is suitable for DDR operation.

10.
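Liu Shaolong, Li Lunsheng, Cao Lin. 《电子测试》 (Electronic Test), 2020, (8): 26-27, 51
Using the efficient floating-point capability of TI's TMS320F28335 chip together with its rich set of on-chip peripherals, this paper designs and implements a highly reliable intelligent power control unit. The control unit periodically runs self-test and maintenance on each on-chip peripheral, controls multiple load channels, monitors voltage and current in real time, and indicates, handles, and reports faults, while a human-machine interface displays updated status information. Verification shows that the control unit operates stably and has good engineering application value.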

11.
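To meet the requirements of a missile simulation training device, this paper designs an integrated video acquisition and processing system based on the TMS320C6713. Video images are captured by a camera, the floating-point DSP chip TMS320C6713 serves as the core processor, and the high-speed FPGA chip XC5VS95T implements the logic control. The processed video is sent over a Gigabit Ethernet port to the training device and displayed. The system is easy to operate, stable, and low in power consumption.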

12.
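Lei Yuanwu, Dou Yong, Ni Shice, Zhou Jie. 《电子学报》 (Acta Electronica Sinica), 2012, 40(9): 1715-1722
Elementary functions in scientific applications are numerous, complex to implement, and infrequently used. To address this, the paper proposes a custom VLIW quadruple-precision floating-point elementary-function coprocessor (QPC-Processor). The architecture uses explicit parallelism to exploit the parallelism in elementary-function algorithms, and computes multiple elementary functions on the same hardware platform through different combinations of meta-operations. The paper also proposes an algorithm that maps elementary-function meta-operation sequences onto custom VLIW instructions, which guides the design of the elementary functions. Finally, the design is verified on an FPGA platform. Experimental results show that a single QPC-Processor achieves a speedup of more than 6× over a software implementation. In addition, the QPC-Processor implements multiple types of algorithms on the same hardware platform, compensating for the weaknesses of any single algorithm and achieving high hardware resource utilization.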

13.
With a huge increase in demand for various kinds of compute-intensive applications in electronic systems, researchers have focused on coarse-grained reconfigurable architectures because of their advantages: high performance and flexibility. This paper presents FloRA, a coarse-grained reconfigurable architecture with floating-point support. A two-dimensional array of integer processing elements in FloRA is configured at run-time to perform floating-point operations as well as integer operations. Fabricated using a 130 nm process, the total area overhead due to additional hardware for floating-point operations is about 7.4% compared to the previous architecture, which does not support floating-point operations. The fabricated chip runs at a 125 MHz clock frequency with a 1.2 V power supply. Experiments show an 11.6× speedup on average compared to ARM9 with a vector-floating-point unit for integer-only benchmark programs as well as programs containing floating-point operations. Compared with other similar approaches including XPP and Butter, the proposed architecture shows much higher performance for integer applications, while maintaining about half the performance of Butter for floating-point applications.

14.
A processor chip set with IBM/370 architecture is implemented on five CMOS VLSI chips containing 2.8 million transistors with an effective channel length of 0.5 μm. The chip set consists of the instruction and the fixed-point processor, two cache chips with 16 KB of data and instructions, and the floating-point processor. The chips are implemented in a 1.0-μm technology with three layers of metal. An automatic design system based on the sea-of-gates technique and the standard cell approach was used. The worst-case operating frequency of the chip set is 35 MHz (typically 50 MHz). Four chips of the processor are packaged on a ceramic multichip module. Level-sensitive scan design, built-in self-test, and parity check guarantee high test coverage and reliability.

15.
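Ding Xiaobo. 《电子科技》 (Electronic Science and Technology), 2015, 28(4): 142-145
This paper describes the operating principle and design method of a multi-channel acquisition system built from the high-performance floating-point DSP chip TMS320C32, the CPLD chip XC95288, and the A/D sampling chip AD976. A special voltage is applied to the first channel, the values in the sampling buffer are read in the CCS development environment, and a full-wave Fourier transform is applied to the sampled data in Matlab. The system has been widely used in relay protection, and practice shows that it handles multi-channel analog acquisition well and ensures the safety and reliability of the sampled data.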

16.
A chip architecture designed to compute a 16-point discrete Fourier transform (DFT) using S. Winograd's algorithm (1978) every 457 ns is presented. The 99,500-transistor 1.2-μm chip incorporates arithmetic, control, and input/output circuitry with testability and fault detection into a 144-pin package. A throughput of 2.3×10¹² gate-Hz/cm² and 79 million multiplications/s is attained with 70-MHz pipelined bit-serial logic. Combined with similar chips computing 15- and 17-point DFTs, 4080-point DFTs can be computed every 118 μs. Using the 16- and 17-point chips, 272×272-point complex data imagery can be transformed in 4.25 ms. A 24-bit block floating-point data representation combined with an adaptive scaling algorithm delivers a numerical precision of 106 dB (17.6 bits) after computing 4080-point DFTs.
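Several of the figures quoted in this abstract are related by simple arithmetic, and the short check below makes those relations explicit. The function names are chosen for illustration. The pairwise-coprime check is included only because a prime-factor style combination of short DFTs would require it, which is an assumption about how the chips are combined and is not stated in the abstract.

```python
import math

DB_PER_BIT = 20 * math.log10(2)  # about 6.02 dB of dynamic range per bit


def db_to_bits(db):
    """Convert a precision figure in dB to equivalent bits."""
    return db / DB_PER_BIT


def pairwise_coprime(sizes):
    """True if no two transform sizes share a common factor."""
    return all(
        math.gcd(a, b) == 1
        for i, a in enumerate(sizes)
        for b in sizes[i + 1:]
    )


def sample_rate(points, seconds):
    """Complex samples processed per second for one transform of given size."""
    return points / seconds


short_sizes = (16, 15, 17)
long_size = math.prod(short_sizes)       # 16 * 15 * 17 = 4080 points
image_side = 16 * 17                     # 272 points per image dimension

rate_16 = sample_rate(16, 457e-9)        # 16-point DFT every 457 ns
rate_4080 = sample_rate(4080, 118e-6)    # 4080-point DFT every 118 microseconds
```

The two sample rates come out close to each other, at roughly 35 million complex samples per second, which suggests the long transform runs at about the throughput of the individual short-DFT chips.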

17.
18.
A double/single-precision floating-point processor using a titanium disilicide 3.5-μm NMOS process achieves double-precision add/subtract, multiply, and divide in 2, 8, and 16 μs respectively. The chip has about 35K devices and is about 400 mil on the side. The chip uses a single 5-V supply with TTL-compatible levels on all signals except for the clocks, which require 4.5 V for a logic high. Four input clocks are used to generate eight 50-ns intervals. A -2.5 V substrate bias generator is designed on the chip but uses a pin for an external capacitor. The processor, which is to be used in a desktop implementation of a minicomputer, executes the floating-point instruction set for the micro-Eclipse computer.

19.
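This paper proposes a new mixed-radix reconfigurable FFT processor composed of a new reconfigurable butterfly unit supporting radix-2/3 FFT and a multi-path parallel, conflict-free memory, achieving multi-path data parallelism and continuous operation throughout the FFT. The design reaches a maximum frequency of 1.06 GHz in the TSMC 28 nm process, and a hardware test system for the mixed-radix FFT processor was built on a Xilinx XC7V2000T FPGA. FPGA hardware test results show that the design supports radix-2, radix-3, and mixed radix-2/3 FFT modes, and that its execution speed reaches the theoretical cycle count for the given number of butterfly multipliers. For single-precision floating-point data, the processor delivers a result precision of 10⁻⁵.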

20.
A 200 MHz quadrature direct digital frequency synthesizer/complex mixer (QDDFSM) chip is presented. The chip synthesizes 12 b sine and cosine waveforms with a spectral purity of -84.3 dBc. The frequency resolution is 0.047 Hz with a corresponding switching speed of 5 ns and a tuning latency of 14 clock cycles. The chip is also capable of frequency, phase, and quadrature amplitude modulation. These modulation capabilities operate up to the maximum clocking frequency. The chip provides the capability of parallel operation of multiple chips with throughputs up to 800 MHz. The 0.8 μm triple level metal N-well CMOS chip has a complexity of 52,000 transistors with a core area of 2.6×6.1 mm². Power dissipation is 2 W at 200 MHz and 5 V.
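The quoted frequency resolution is consistent with the usual direct digital synthesis relation, resolution = f_clk / 2^N, for a 32-bit phase accumulator at 200 MHz. The accumulator width is inferred from the numbers and is not stated in the abstract, so treat `ACC_BITS` below as an assumption. The sketch also shows a bare phase accumulator producing quadrature samples, and it computes sine and cosine directly where a real chip would use a lookup table or a similar structure.

```python
import math

F_CLK = 200e6   # clock frequency from the abstract, in Hz
ACC_BITS = 32   # assumed accumulator width (inferred, not stated)
AMP_BITS = 12   # output word length from the abstract


def frequency_resolution(f_clk=F_CLK, acc_bits=ACC_BITS):
    """Smallest frequency step: one LSB of the tuning word."""
    return f_clk / (1 << acc_bits)


def tuning_word(f_out, f_clk=F_CLK, acc_bits=ACC_BITS):
    """Frequency control word that produces f_out (rounded to the grid)."""
    return round(f_out * (1 << acc_bits) / f_clk)


def quadrature_samples(fcw, count, acc_bits=ACC_BITS, amp_bits=AMP_BITS):
    """Generate (cosine, sine) integer samples from a phase accumulator."""
    full_scale = (1 << (amp_bits - 1)) - 1  # 2047 for signed 12-bit output
    mask = (1 << acc_bits) - 1
    phase = 0
    samples = []
    for _ in range(count):
        angle = 2 * math.pi * phase / (1 << acc_bits)
        samples.append((round(full_scale * math.cos(angle)),
                        round(full_scale * math.sin(angle))))
        phase = (phase + fcw) & mask  # wraps modulo 2**acc_bits
    return samples
```

For example, `tuning_word(50e6)` selects a quarter of the clock rate, so the accumulator steps through exactly four phases per output cycle.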
