1.

The TMS390C602A floating-point coprocessor for Sparc systems

Darley M. Kronlage B. Bural D. Churchill B. Pulling D. Wang P. Iwamoto R. Yang L. 《Micro, IEEE》1990,10(3):36-47

A recent Sparc (scalable processor architecture) processor consists of a two-chip configuration: the TMS390C601 integer unit (IU) and the TMS390C602A floating-point unit (FPU). The article describes the second device, an innovative coprocessor that lets the processor execute single- or double-precision floating-point instructions concurrently with IU operations. Dedicated floating-point hardware in the FPU increases system performance. Running at clock periods as short as 20 ns (a 50-MHz clock rate), the chip should deliver 5.5 million double-precision floating-point operations per second on the Linpack benchmark. The FPU provides single- and double-precision arithmetic functions: addition, subtraction, multiplication, division, square root, compare, and convert. To minimize latency, the FPU uses a highly parallel architecture with separate math units optimized for addition and for multiplication. Traps stop program execution and jump to a software routine, either to handle data-dependent errors or to execute instructions not implemented in hardware. Benchmark results are presented.

2.

The TMS320C30 floating-point digital signal processor

Papamichalis P. Simar R. Jr. 《Micro, IEEE》1988,8(6):13-29

The 320C30 is a fast processor with a large memory space and floating-point arithmetic capabilities. The authors describe the 320C30 architecture in detail, covering both the device's internal organization and its external interfaces. They also explain the pipeline structure, address software-related issues and constructs, and examine the development tools and support. Finally, they present application examples. Major features of the 320C30 include: a 60-ns cycle time, yielding over 16 million instructions per second (MIPS) and over 33 million floating-point operations per second (Mflops); 32-bit data buses and 24-bit address buses for a 16M-word overall memory space; dual-access 4K × 32-bit on-chip ROM and 2K × 32-bit on-chip RAM; a 64 × 32-bit program cache; a 32-bit integer/40-bit floating-point multiplier and ALU; eight extended-precision registers, eight auxiliary registers, and 23 control and status registers; mostly single-cycle instructions; integer, floating-point, and logical operations; two- and three-operand instructions; an on-chip DMA controller; and fabrication in 1-μm CMOS technology in a 180-pin package. These features facilitate FIR (finite impulse response) and IIR (infinite impulse response) filtering, telecommunications and speech applications, and graphics and image processing applications.

3.

Design and analysis of high-speed 8-bit ALU using 18 nm FinFET technology

Shylashree N. Venkatesh B. Saurab T. M. Srinivasan Tarun Nath Vijay 《Microsystem Technologies》2019,25(6):2349-2359

All modern computational devices contain an ALU. As software grows more complex and shifts steadily toward parallelism, it benefits from high-speed processors with hardware support for time-consuming operations such as multiplication. Small, compact devices such as IoT devices need to run software such as security software and to take computation load off the cloud. This paper proposes a high-speed 8-bit ALU in 18 nm FinFET technology. The ALU consists of fast compute units, a Kogge-Stone adder and a Dadda multiplier, along with basic logic gates; each compute unit is optimized for speed while keeping area in check. The Dadda multiplier uses an 8 × 8 architecture rather than the conventional 4 × 4 approach, making this a true 8-bit ALU. Simulation and analysis were performed with Cadence Virtuoso in the Analog Design Environment. The proposed design has a transistor count of 5298, a power consumption of 219 µW, and a maximum delay of 166.8 ps. The design is also expected to take at most one clock cycle for any computation.
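The Kogge-Stone adder named in this abstract computes all carries with a logarithmic-depth parallel-prefix network. Below is a minimal Python sketch of that carry network at the bit level, assuming unsigned 8-bit operands and no carry-in. The function name `kogge_stone_add` is hypothetical, and the sketch models the algorithm only, not the authors' FinFET transistor-level circuit:

```python
def kogge_stone_add(a: int, b: int, width: int = 8) -> tuple[int, int]:
    """Hypothetical helper: add two unsigned `width`-bit integers using a
    Kogge-Stone parallel-prefix carry network (no carry-in).

    Returns (sum modulo 2**width, carry_out)."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    g = a & b  # bit i generates a carry
    p = a ^ b  # bit i propagates an incoming carry
    G, P = g, p
    # log2(width) prefix levels: position i merges its group with the group
    # ending at position i - dist, using (G, P) o (G', P') = (G | P & G', P & P').
    dist = 1
    while dist < width:
        G = G | (P & (G << dist))
        P = P & (P << dist)
        dist <<= 1
    # After the prefix levels, bit i of G is the carry out of bit position i.
    carries_in = (G << 1) & mask
    total = (p ^ carries_in) & mask
    carry_out = (G >> (width - 1)) & 1
    return total, carry_out


print(kogge_stone_add(200, 100))  # prints (44, 1), since 200 + 100 = 256 + 44
```

Each of the `log2(width)` loop iterations corresponds to one prefix level of the hardware network. That logarithmic depth is why the adder suits a speed-optimized ALU, at the cost of more wiring than a ripple-carry adder.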


4.

Design verification of the WE 32106 math accelerator unit

Maurer P.M. 《Design & Test of Computers, IEEE》1988,5(3):11-21


5.

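Design and Implementation of the ALU and Shift Units of X-DSP

Peng Yuanxi, Zou Jiajun 《Journal of Computer Applications》2010,30(7):1978-1982

The X-DSP is a low-power, high-performance DSP developed independently by the authors' team. Its CPU architecture was studied in depth, and the ALU and shifter units were designed and implemented on the basis of a detailed analysis of the instructions that use them. Logic synthesis of the ALU and shifter units with the Design Compiler synthesis tool, using SMIC's 0.13 µm CMOS process library, gives a total circuit power of 4.2821 mW, a circuit area of 71042.9804 µm², and an operating frequency of up to 250 MHz.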

6.

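Design of a 32-bit Floating-Point Multiplier Based on an Improved 4-2 Compressor Structure

Shao Lei, Li Kun, Zhang Shudan, Yu Zongguang, Xu Rui 《Microcomputer Information》2007,23(3X):224-225,199

This paper presents the design of a 32-bit floating-point multiplier for high-performance DSPs. A tree of 4-2 compressors with modified Booth encoding increases speed and reduces power consumption. The multiplier has a regular structure suited to VLSI implementation and completes one 24-bit integer multiplication or one 32-bit floating-point multiplication in a single cycle. The whole design is described structurally in Verilog HDL and synthesized with a 0.25 µm cell library; one multiplication takes 24.30 ns.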
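The abstract pairs modified (radix-4) Booth encoding with a 4-2 compressor tree. The sketch below covers only the Booth-recoding half, assuming an unsigned 24-bit integer multiply (the 24-bit integer case the abstract mentions). The partial products are summed with Python's `+` rather than a compressor tree, and the function names are hypothetical. Radix-4 recoding turns the 24 multiplier bits into 13 digits in {-2, -1, 0, 1, 2}, so roughly half as many partial products need to be reduced:

```python
# Radix-4 Booth digit for each bit triplet (y[i+1], y[i], y[i-1]):
# digit = -2*y[i+1] + y[i] + y[i-1]
BOOTH_DIGIT = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
               0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}


def booth_radix4_digits(y: int, width: int = 24) -> list[int]:
    """Hypothetical helper: recode an unsigned `width`-bit multiplier into
    radix-4 Booth digits, least significant digit first."""
    y &= (1 << width) - 1
    padded = y << 1  # append the implicit y[-1] = 0 below bit 0
    # One digit per 2 bits; the range covers an extra zero sign bit so the
    # recoded value stays non-negative.
    return [BOOTH_DIGIT[(padded >> i) & 0b111] for i in range(0, width + 2, 2)]


def booth_radix4_multiply(x: int, y: int, width: int = 24) -> int:
    """Hypothetical model: multiply unsigned `width`-bit x and y via Booth digits."""
    x &= (1 << width) - 1
    digits = booth_radix4_digits(y, width)
    # Each digit selects one partial product from {0, +-x, +-2x}, weighted by 4**k.
    partial_products = [(d * x) << (2 * k) for k, d in enumerate(digits)]
    return sum(partial_products)


print(booth_radix4_digits(7, width=4))  # prints [-1, 2, 0], i.e. -1 + 2*4 = 7
```

In hardware, each selected partial product is a shifted or negated copy of `x`, which is cheap to form. The list `partial_products` is what a 4-2 compressor tree would then reduce.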

7.

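Design of a 32-bit Floating-Point Multiplier Based on an Improved 4-2 Compressor Structure

Shao Lei, Li Kun, Zhang Shudan, Yu Zongguang, Xu Rui 《Microcomputer Information》2007,23(9)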


8.

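Using Floating-Point Data in LonWorks Networks

Tong Jing, Wu Ke, Wang Huaixing 《Microcomputer Development》2005,15(2):18-20,24

Neuron C is a programming language designed specifically for the Neuron chip. It extends ANSI C and is a powerful tool for developing LonWorks applications. Neuron C does not directly support ANSI C floating-point arithmetic and comparison operations, but it provides a floating-point function library that allows the use of floating-point numbers conforming to the IEEE 754 standard. This paper describes in detail how the floating-point data type is defined in Neuron C, how floating-point constants are generated, and how the floating-point library is used. An example LonWorks network demonstrates the use of floating-point data.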
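The abstract says Neuron C's floating-point library uses IEEE 754 numbers, and it discusses how floating-point constants are generated. As a host-side illustration, Python's standard `struct` module can produce the byte pattern of a constant and split it into sign, biased exponent, and fraction fields. This assumes the 32-bit single-precision format and big-endian byte order; both are assumptions made here, not details taken from the paper. The helper names are hypothetical and are not part of the Neuron C library:

```python
import struct


def float32_bytes(value: float) -> bytes:
    """Big-endian IEEE 754 binary32 encoding of `value` (rounded to nearest)."""
    return struct.pack(">f", value)


def float32_fields(value: float) -> tuple[int, int, int]:
    """Split a binary32 value into (sign, biased exponent, fraction) fields."""
    (bits,) = struct.unpack(">I", float32_bytes(value))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF  # bias 127
    fraction = bits & 0x7FFFFF       # 23 stored bits; the leading 1 is implicit
    return sign, exponent, fraction


# Example: byte initializer text for a constant, e.g. for pasting into C source.
print(", ".join(f"0x{byte:02X}" for byte in float32_bytes(-2.5)))  # 0xC0, 0x20, 0x00, 0x00
```

Computing the bytes on the host avoids needing floating-point parsing on the target. Check the actual layout expected by the Neuron C library's float type against its documentation before relying on this byte order.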

9.

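Design and Implementation of a Configurable Floating-Point Vector Multiplication Unit on FPGA

Huang Zhaowei, Wang Lianming 《Application Research of Computers》2020,37(9):2762-2765,2771

To address the low throughput and low resource utilization of FPGA floating-point units designed to the IEEE 754 standard, this paper proposes a parallel floating-point vector multiplication unit whose arithmetic precision and number of arithmetic units are configurable. Configurable exponent and mantissa widths improve system resource utilization, and combining pipelining with a parallel structure raises data throughput. On an EP4CE115 FPGA test platform configured with 10 FP14 arithmetic units, the system uses about 4.2% of the logic resources and reaches a peak throughput of 4.5 GFLOPS. The results show that the proposed unit effectively improves FPGA resource utilization and throughput, is highly portable and general, and is suitable for accelerating vector multiplication on FPGAs.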
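Configurable exponent and mantissa widths, as this abstract describes (entry 14 below does the same for its FPGA-oriented format), change which values are representable. The following is a minimal numerical sketch of that effect. It assumes an IEEE 754-style layout (biased exponent, implicit leading 1, subnormals, top exponent code reserved for infinity/NaN) and round-to-nearest-even. The abstract does not give the FP14 field split, so the 5-bit exponent / 8-bit mantissa example (1 + 5 + 8 = 14 bits) is an assumption, and `quantize_custom_float` is a hypothetical name:

```python
import math


def quantize_custom_float(value: float, exp_bits: int, man_bits: int) -> float:
    """Round `value` to the nearest number representable in a hypothetical
    IEEE-754-style format with `exp_bits` exponent bits and `man_bits` stored
    mantissa bits, using round-to-nearest-even. Overflow returns +-inf."""
    if value == 0.0 or math.isnan(value) or math.isinf(value):
        return value
    bias = (1 << (exp_bits - 1)) - 1
    emin = 1 - bias  # smallest normal exponent
    emax = bias      # largest normal exponent (all-ones code is inf/NaN)
    _, e = math.frexp(abs(value))  # abs(value) = m * 2**e with 0.5 <= m < 1
    exponent = max(e - 1, emin)    # clamp to emin so tiny values become subnormal
    ulp = 2.0 ** (exponent - man_bits)  # spacing of representable values here
    rounded = round(abs(value) / ulp) * ulp  # round() on a float ties to even
    max_finite = (2.0 - 2.0 ** -man_bits) * 2.0 ** emax
    if rounded > max_finite:
        return math.copysign(math.inf, value)
    return math.copysign(rounded, value)


# Assumed FP14 split: 1 sign bit, 5 exponent bits, 8 mantissa bits.
print(quantize_custom_float(math.pi, exp_bits=5, man_bits=8))  # prints 3.140625
```

With `exp_bits=5, man_bits=10` the same function reproduces IEEE half precision (for example, 65504 is the largest finite value), which is a convenient sanity check when experimenting with narrower custom formats.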

10.

A class of systolic multiplier units for VLSI technology

A. R. Hurson B. Shirazi 《International Journal of Parallel Programming》1985,14(5):261-275

A high-speed multiplier unit is an indispensable part of many applications, such as real-time speech processing, image processing and enhancement, pattern classification, and the fast Fourier transform. Because of its importance, the design of fast multiplier units has been under investigation since the 1950s. In addition, recent developments in technology have motivated the design and implementation of fast multiplier units for VLSI technology. Some of these algorithms are still hard to implement because of their irregular structure and high ratio of pins per chip. This paper presents a fast systolic multiplier unit suitable for VLSI technology. The system is a collection of two basic components replicated in a two-dimensional space. Such a recursive structure offers simplicity in design and implementation. Moreover, compared with alternative models, the proposed pipelined architecture reduces the number of required ports (pins) per chip by a factor of two.

11.

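Design of a Reconfigurable Multiplier Unit for a VLIW Processor

Yang Yan, Zhang Kai 《Microprocessors》2007,28(3):21-23

In the design of a VLIW multimedia chip, a new multiplier model with a split (bifurcated) Wallace tree structure is proposed to overcome the shortcomings of conventional multipliers and adders. Following a reusable, modular design approach, the multiplier is extended by reusing an array of 1-bit full adders, so that a single multiplier unit can perform several 32-, 16-, or 8-bit multiplications simultaneously, while both the speed and the area of the unit are optimized. Simulation shows that the new multiplier structure effectively reduces the execution cycles of algorithms common in signal and multimedia processing, such as FFT and filtering, improves actual execution speed, and further strengthens the VLIW processor's multimedia and signal-processing capabilities.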

12.

The microprocessor today

Slater M. 《Micro, IEEE》1996,16(6):32-44

Outlines technology and business issues in today's microprocessor industry. From their humble beginnings 25 years ago, microprocessors have proliferated into an astounding range of chips, powering devices from telephones to supercomputers. Today, microprocessors for personal computers get widespread attention, and they have enabled Intel to become the world's largest semiconductor maker. In addition, embedded microprocessors are at the heart of a diverse range of devices that have become staples for consumers worldwide. Microprocessors have become specialized in many ways. Those for desktop computers fall into classes based on their instruction set architectures: either x86, the primary surviving complex instruction set computing (CISC) architecture, or one of the five major reduced instruction set computing (RISC) architectures: PA-RISC, Mips, Sparc, Alpha, and PowerPC. Such chips typically integrate few functions besides cache memory and bus interfaces, but usually include a floating-point unit and a memory management unit. Embedded microprocessors, on the other hand, typically lack floating-point and memory management units but often integrate various peripheral functions with the processor to reduce system cost.

13.

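Design and Implementation of a 64-bit Floating-Point Multiply-Add Unit

Jin Zhanpeng, Bai Yongqiang, Shen Xubang 《Computer Engineering and Applications》2006,42(18):95-98

Multiply-add is a basic operation in many scientific and engineering applications, and floating-point multiply-add units are widely used, particularly in graphics accelerators and DSPs. Targeting the PowerPC 603e microprocessor system and using SMIC's 0.25 µm 1P5M CMOS process, this paper follows a forward-design, full-custom circuit and layout methodology to design and implement an IEEE 754-compliant 64-bit floating-point multiply-add unit that combines a modified Booth algorithm, a Wallace tree built from balanced 4-2 compressors, and a carry-lookahead adder.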
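The abstract names a Wallace tree built from 4-2 compressors, followed by a carry-lookahead adder. Below is a minimal Python sketch of that reduction. It assumes the common construction of a 4-2 compressor from two chained full adders; the paper's "balanced" compressor may arrange its gates differently. The function names are hypothetical, and Python's `+` stands in for the final carry-lookahead adder. At word level, the compressor's column-to-column carry chain becomes a shifted carry vector:

```python
def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """One-bit full adder: returns (sum, carry) with a + b + c == sum + 2 * carry."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def compressor_4_2(x1: int, x2: int, x3: int, x4: int, cin: int) -> tuple[int, int, int]:
    """One-bit 4-2 compressor built from two full adders.
    Satisfies x1 + x2 + x3 + x4 + cin == s + 2 * (carry + cout); cout does not
    depend on cin, so the cin/cout links between columns do not ripple."""
    s1, cout = full_adder(x1, x2, x3)
    s, carry = full_adder(s1, x4, cin)
    return s, carry, cout


def carry_save_add(a: int, b: int, c: int) -> tuple[int, int]:
    """Word-level row of full adders: a + b + c == sum_word + carry_word."""
    return a ^ b ^ c, ((a & b) | (a & c) | (b & c)) << 1


def wallace_4_2_sum(rows: list[int]) -> int:
    """Reduce non-negative partial-product rows 4 -> 2 per group until two remain, then add."""
    rows = list(rows)
    while len(rows) > 2:
        next_level = []
        while len(rows) >= 4:
            # Two carry-save rows = one row of 4-2 compressors (each cout feeds the next cin).
            s1, c1 = carry_save_add(rows[0], rows[1], rows[2])
            next_level.extend(carry_save_add(s1, c1, rows[3]))
            rows = rows[4:]
        if len(rows) == 3:
            next_level.extend(carry_save_add(*rows))
        else:
            next_level.extend(rows)  # 0-2 leftover rows pass straight through
        rows = next_level
    return sum(rows)  # stands in for the final carry-lookahead adder


# Partial products of an 8 x 8 unsigned multiply, reduced by the tree.
a, b = 0xB7, 0x5D
print(wallace_4_2_sum([(a << i) if (b >> i) & 1 else 0 for i in range(8)]) == a * b)  # True
```

Because each 4-2 stage halves the number of rows without propagating carries across the word, the tree's depth grows only logarithmically with the number of partial products. The single carry-propagate addition is deferred to the end.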

14.

Customizing floating-point units for FPGAs: Area-performance-standard trade-offs

Pedro Echeverría Marisa López-Vallejo 《Microprocessors and Microsystems》2011,35(6):535-546

The high integration density of current nanometer technologies allows complex floating-point applications to be implemented in a single FPGA. This work addresses the intrinsic complexity of floating-point operators for configurable devices, making design decisions that provide the most suitable trade-offs between performance and standard compliance. A set of floating-point libraries is presented, comprising an adder/subtracter, multiplier, divider, square root, exponential, logarithm, and power function. Each library was designed with the special characteristics of current FPGAs in mind; to this end, the (software-oriented) IEEE floating-point standard was adapted to a custom FPGA-oriented format. Extensive experimental results validate the design decisions and demonstrate the benefit of reducing format complexity.

15.

Designing the TFP microprocessor

Hsu P.Y.-T. 《Micro, IEEE》1994,14(2):23-33

Designed to efficiently support large, real-world, floating-point-intensive applications, the TFP (short for Tremendous Floating-Point) microprocessor is a superscalar implementation of the Mips Technologies architecture. This floating-point, computation-oriented processor uses a superscalar machine organization that dispatches up to four instructions each clock cycle to two floating-point execution units, two memory load/store units, and two integer execution units. Its split-level cache structure reduces cache misses by directing integer data references to a 16-Kbyte on-chip cache, while channeling floating-point data references off chip to a 4-Mbyte cache.

16.

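Design of a Memristive Multiplier Based on Complementary Resistive Switches

Li Zhigang, Chen Hui, Liu Peng, Wu Jigang 《Computer Engineering》2023,49(1):201-209

Existing memristive arithmetic logic mostly uses a single memristor as the storage cell. In memristive crossbar arrays this makes the logic susceptible to sneak (leakage) currents, and logic synthesis for the circuits is complex, both of which increase the delay and area overhead of the serialized addition operations in current multiplier designs. Complementary resistive switches (CRS) combine the computing speed of reconfigurable logic circuits with the ability to suppress sneak currents in memristive crossbar arrays, making them a key device for memristive arithmetic logic. This paper proposes a memristive multiplier with weak carry dependence. To improve the logic performance of memristors, two optimized adder schemes based on the CRS circuit structure are designed, simplifying the operation steps. On this basis, the conventional multiplication scheme is improved and the carry data are decomposed, reducing the dependence between carries during computation and enabling parallelized addition. Mapping the multiplier onto a hybrid CMOS/crossbar architecture greatly improves multiplication performance. The feasibility of the multiplier is verified in a SPICE simulation environment. Simulation results show that, compared with existing multipliers, the proposed multiplier reduces delay overhead from O(n²) to linear and reduces area overhead by about 70%.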

17.

A pipelined interface for high floating-point performance with precise exceptions

Iacobovici S. 《Micro, IEEE》1988,8(3):77-87

Two options considered for a pipelined interface between a central processing unit (CPU) and a floating-point coprocessor (FPU) are presented, along with the CPU recovery mechanisms that provide precise floating-point exceptions for each option. The first option supports parallel execution of both floating-point and integer instructions, while the second pipelines only the execution of floating-point instructions. National Semiconductor's 32532/32580 processor cluster uses the second option because it offers high performance with significantly lower complexity. The 32532 microprocessor features a pipelined slave protocol that hides the CPU-FPU communication overhead for most floating-point instructions by pipelining their execution. A simple recovery mechanism implemented within the CPU keeps floating-point exceptions precise. As a result, the 32532 supports very high floating-point performance without sacrificing software compatibility with previous Series 32000 CPU-FPU clusters.

18.

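Design and Implementation of a High-Performance Quadruple-Precision Floating-Point Multiply-Add Unit

He Jun, Huang Yongqin, Zhu Ying 《Computer Engineering》2014,(2):294-299

High-precision, high-performance floating-point units are an important part of high-performance microprocessor design. Building on a study of conventional double-precision floating-point multiply-add algorithms and on the characteristics of the quadruple-precision floating-point format, this paper designs and implements a high-performance quadruple-precision floating-point multiply-add unit (QPFMA). The unit supports multiple floating-point operations, has a latency of 7 cycles, and is fully pipelined. A dual-path adder improves the algorithm structure, and the leading-zero anticipation and normalization-shift logic are optimized to reduce latency and hardware overhead. A parameterized design-verification method enables efficient functional verification. Logic synthesis results show that, in a 65 nm process, the QPFMA reaches 1.2 GHz; compared with existing QPFMA designs, latency is reduced by 3 cycles and frequency is increased by about 11.63%.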
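The key property of a fused multiply-add, whether double-precision as in entry 13 or quadruple-precision as here, is that `a*b + c` is rounded once rather than twice. Below is a minimal Python sketch of that behaviour in double precision (Python has no native binary128 type), assuming finite inputs. `fused_multiply_add` is a hypothetical name: it computes the exact result with `fractions.Fraction` and rounds it once. This is a software model of the semantics, not of the paper's 7-cycle pipelined datapath:

```python
from fractions import Fraction


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """a * b + c evaluated exactly, then rounded once to the nearest double.
    Finite inputs only (Fraction cannot represent inf or NaN)."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)  # float() of a Fraction rounds to the nearest double


def unfused_multiply_add(a: float, b: float, c: float) -> float:
    """Separate multiply and add: the product is rounded before the addition."""
    return a * b + c


a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0
# Exact a*b = 1 - 2**-60, which rounds to 1.0 as a double, so the unfused result
# cancels to 0.0, while the fused result keeps the tiny remainder -2**-60.
print(unfused_multiply_add(a, b, c), fused_multiply_add(a, b, c))
```

The same single-rounding argument applies at quadruple precision (113-bit significand). The hardware then has to keep the full-width product internally, which is part of why leading-zero anticipation and normalization shifting matter for latency.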

19.

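Design and Implementation of an FPGA-Based Array Multiplier

Zhu Shiyu, Xia Ruhua, Gan Ke, Liu Chunlei, Chen Xiaochuan 《Automation & Instrumentation》2011,(4):60-61,67

The paper first analyzes multipliers, then implements an array multiplier on a field-programmable gate array (FPGA) and explains the design principle.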
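The abstract does not describe the array's internal organization. Below is a minimal bit-level Python model of the textbook unsigned array multiplier, assuming the simplest row-by-row form: one AND gate per bit pair forms each partial-product row, and a row of full adders (here a ripple-carry adder) accumulates it. Carry-save variants such as the Braun array are common in practice. The function names are hypothetical:

```python
def ripple_carry_add(x: int, y: int, width: int) -> int:
    """Add two `width`-bit values with a chain of full adders (final carry dropped)."""
    result, carry = 0, 0
    for k in range(width):
        xk, yk = (x >> k) & 1, (y >> k) & 1
        result |= (xk ^ yk ^ carry) << k          # sum bit uses the incoming carry
        carry = (xk & yk) | (xk & carry) | (yk & carry)
    return result


def array_multiply(a: int, b: int, width: int = 8) -> int:
    """Unsigned `width` x `width` array multiplier model with a 2*width-bit product."""
    a &= (1 << width) - 1
    b &= (1 << width) - 1
    product = 0
    for i in range(width):
        b_i = (b >> i) & 1
        # Row i of AND gates: bit j of the row is a_j AND b_i, weighted by 2**(i + j).
        row = sum((((a >> j) & 1) & b_i) << (i + j) for j in range(width))
        # Accumulate the row; 2*width bits always hold the running sum.
        product = ripple_carry_add(product, row, 2 * width)
    return product


print(array_multiply(255, 255))  # prints 65025
```

On an FPGA, each loop iteration corresponds to one physical row of adders. The regular, repeated structure is what makes the array multiplier easy to lay out, at the cost of a delay that grows linearly with the operand width.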

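20.

Design of an Application Management Platform for "One Card, Multiple Uses"

Wu Junjun, Luo Biao 《Computer Engineering》2005,31(23):193-195,202

A "one card, multiple uses" operating system is an inevitable trend in smart card development, and securely downloading, updating, and deleting applications dynamically is a key problem in such systems. Based on an analysis of the "one card, multiple uses" system architecture, this paper designs new mechanisms for downloading, updating, and deleting applications. It describes in detail the contents required in the application download unit, the application download certificate, and the application deletion certificate, and finally analyzes the security of application download, deletion, and update.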