Similar documents
20 similar documents found.
1.
This paper describes a single-cycle 64-bit integer execution ALU fabricated in 90-nm dual-Vt CMOS technology, operating at 4 GHz in the 64-bit mode with a 32-bit mode frequency of 7 GHz (measured at 1.3 V, 25 °C). The lower- and upper-order 32-bit domains operate on separate off-chip supply voltages, enabling conditional turn-on/off of the 64-bit ALU mode operation and efficient power-performance optimization. High-speed single-rail dynamic circuit techniques and a sparse-tree semi-dynamic adder architecture enable a dense layout occupying 280 × 260 μm² while simultaneously achieving: (i) low carry-merge fan-outs and inter-stage wiring complexity; (ii) low active leakage and dynamic power consumption; (iii) high DC noise robustness with maximum low-Vt usage; (iv) single-rail dynamic-compatible ALU write-back bus; (v) simple 2Φ 50% duty-cycle timing plan with seamless time-borrowing across phases; (vi) scalable 64-bit ALU performance up to 7 GHz measured at 2.1 V, 25 °C; and (vii) scalable 32-bit ALU performance up to 9 GHz measured at 1.68 V, 25 °C.

2.
A 135K-transistor, uniformly pipelined 50-MHz CMOS 64-bit floating-point arithmetic processor chip is described. The execution unit is capable of sustaining pipelined performance of one 32-bit or 64-bit result every 20 ns for all operations except double-precision multiply (40 ns) and divide. The chip employs an exponent difference prediction scheme and a unified leading-one and sticky-bit computation logic for the addition and subtraction operations. A hardware multiplier using a radix-8 modified Booth algorithm and a divider using a radix-2 SRT algorithm are employed.
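The abstract names radix-8 modified Booth recoding without giving details. As a rough illustration only, the textbook form of that recoding can be sketched as follows: the multiplier is scanned in overlapping 4-bit windows, each yielding a digit in the range -4 to 4, so a product needs about one third as many partial products as bits. The function names and the `bits` parameter are hypothetical, and this models the arithmetic, not the chip's circuit.

```python
def booth_radix8_digits(multiplier, bits):
    """Recode a signed `bits`-wide multiplier into radix-8 Booth digits (-4..4).

    Textbook recoding, least significant digit first; names are illustrative.
    """
    groups = -(-bits // 3)                 # ceil(bits / 3) digits
    width = 3 * groups
    # Two's-complement bit pattern, sign-extended to a multiple of 3 bits.
    pattern = multiplier & ((1 << width) - 1)
    digits = []
    prev = 0                               # implicit bit to the right of bit 0
    for i in range(groups):
        b0 = (pattern >> (3 * i)) & 1
        b1 = (pattern >> (3 * i + 1)) & 1
        b2 = (pattern >> (3 * i + 2)) & 1
        digits.append(-4 * b2 + 2 * b1 + b0 + prev)
        prev = b2                          # windows overlap by one bit
    return digits


def booth_multiply(multiplicand, multiplier, bits=64):
    """Sum one partial product per radix-8 digit."""
    digits = booth_radix8_digits(multiplier, bits)
    return sum(d * multiplicand * 8 ** i for i, d in enumerate(digits))
```

In hardware, the digits ±3 require a precomputed 3× multiple of the multiplicand, which is the usual cost of choosing radix 8 over radix 4.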

3.
This third-generation 1.1-GHz 64-bit UltraSPARC microprocessor provides 1-MB on-chip level-2 cache, 4-Gb/s off-chip memory bandwidth, and a new 200-MHz JBus interface that supports one to four processors. The 87.5-million-transistor chip is implemented in a seven-layer-metal copper 0.13-μm CMOS process and dissipates 53 W at 1.3 V and 1.1 GHz.

4.
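P.A. Semi (Santa Clara, California, USA) is preparing to release a high-performance 64-bit processor that reportedly cuts power consumption to one-tenth that of current products at the same level of performance. The PA6T-1682M achieves its low power through a new architectural design, process improvements, and advanced clock-management techniques; the chip contains as many as 15,000 gated clocks.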

5.
This paper describes the main features and functions of the Pentium® 4 processor microarchitecture. We present the front-end of the machine, including its new form of instruction cache called the trace cache, and describe the out-of-order execution engine, including a low latency double-pumped arithmetic logic unit (ALU) that runs at 4 GHz. We also discuss the memory subsystem, including the low-latency Level 1 data cache that is accessed in two clock cycles. We then describe some of the key features that contribute to the Pentium® 4 processor's floating-point and multimedia performance. We provide some key performance numbers for this processor, comparing it to the Pentium® III processor.

6.
In this paper the circuit and the design of an experimental 16-bit processor are described. The circuit is used in controller applications between mass storage devices and the CPUs of mainframes. The chip is fabricated in 2.5-μm NMOS technology. This component (45 000 transistors, 35 mm², 40 pins) handles data generated by a CAD tool for real-time control systems (PIASTRE).

7.
A two-rank GaAs sample-and-hold (S/H) chip and four 250-MHz silicon digitizers form a 1-GHz 6-b analog-to-digital converter (ADC) system. The two-rank S/H architecture avoids dynamic errors inherent to interleaved ADCs; accuracy exceeds 5.2 effective bits, up to 1-GHz input frequency. Special attention is paid to avoiding GaAs slow transient errors.

8.
A 4-MB L2 data cache was implemented for a 64-bit 1.6-GHz SPARC® RISC microprocessor. Static sense amplifiers were used in the SRAM arrays and for global data repeaters, resulting in robust and flexible timing operation. Elimination of the global clock grid over the SRAM array saves power, enabled by combining the clock information with array select signals. Redundancy was implemented flexibly, with shift circuits outside the main data array for area efficiency. The chip integrates 315 million transistors and uses an 8-metal-layer 90-nm CMOS process.

9.
A single-chip 80-bit floating point VLSI processor capable of performing 5.6 million floating point operations per second has been realized using 1.2-μm n-well CMOS technology. The processor handles 80-bit double-extended floating point data conforming to IEEE standard 754. The chip has 128 microinstructions which are stored in an on-chip ROM. By programming microinstruction sequences in an external control storage, not only basic arithmetic operations but also special arithmetic functions can be performed. A composite design method supported by a hierarchical design automation system was used to quickly lay out 50K gates including a 64 × 64-bit multiplier and 15 kb of memory on a chip with a die size of 10 × 10 mm². Only 11 man-months were required for the effort.

10.
A general-purpose programmable digital signal processor (DSP) has been implemented in 1.5-μm (L_eff) NMOS technology using full-custom circuit design for high performance. The DSP has a 32-bit instruction set, 32-bit data path, and full-hardware 32-bit floating-point arithmetic. The architecture is described section by section, and an overview of the instruction set is presented. The extensive design verification process applied to the DSP is also described.

11.
To design a 32-bit logarithmic number system (LNS) processor, this paper presents two novel techniques: Digit-Partition (DP), for the log2(1.x) function, and Iterative Difference by Linear Approximation (IDLA), for the 2^0.x function. The basic concept behind DP is that the variable x can be divided into two parts in its bit representation. The ROM or PLA table can thus be reduced to a reasonable size, which makes a high-precision design feasible. The basic idea of IDLA is that the function 2^0.x can be approximated through iterative linear approximations. With this method, only an adder, a shifter, and a small PLA are required, unlike previous designs, which require a ROM and a multiplier. The experimental results show that the proposed design is more attractive than previous LNS processor designs. This work was supported by the National Science Council under Grant NSC 84-2215-E002-020.
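To show why an LNS processor needs exactly the two functions named above, here is a minimal sketch of the conversions around a logarithmic-domain multiply. It does not implement DP or IDLA: library floating-point calls stand in for both functions, the function names are hypothetical, and only positive values are handled.

```python
import math


def to_log(value):
    """Convert a positive number to its base-2 logarithm.

    Splits value = 2**e * 1.x, so log2(value) = e + log2(1.x).
    math.log2 stands in for the table-based log2(1.x) unit.
    """
    if value <= 0:
        raise ValueError("this sketch handles positive values only")
    mantissa, exponent = math.frexp(value)   # value = mantissa * 2**exponent, 0.5 <= mantissa < 1
    return (exponent - 1) + math.log2(mantissa * 2)   # rescale mantissa to the 1.x form


def from_log(log_value):
    """Convert a base-2 logarithm back: 2**(i + 0.x) = 2**i * 2**0.x."""
    integer = math.floor(log_value)
    fraction = log_value - integer           # the 0.x part, in [0, 1)
    return math.ldexp(2.0 ** fraction, integer)   # 2.0 ** fraction stands in for the 2^0.x unit


def lns_multiply(a, b):
    """Multiplication becomes an addition in the logarithmic domain."""
    return from_log(to_log(a) + to_log(b))
```

For example, `lns_multiply(3.0, 5.0)` returns a value very close to 15.0; the small difference is the rounding error of the two conversions, which is what a hardware design has to bound.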

12.
A Josephson 4-b processor with a 4-bit slice microprocessor, a 4-b multiplier, a 12-b accumulator, an 8-kb ROM, and a sequencer is described. The chip was fabricated with 1.5-μm all-niobium technology, and contains 24000 Nb/AlOx/Nb Josephson junctions. The processor was designed using a bit slice structure and a simple ripple-carry method, and it has a data sequence based on a three-stage pipeline. Experiments confirmed that the processor functions operated correctly. The critical path measurements for each stage show that the ROM has a 100-ps access time, the microprocessor can be clocked at 1.1 GHz, and the multiplier has a 200-ps multiplication time. The power dissipation of the chip was 6.1 mW.

13.
The authors describe design and experimental results of a Josephson data processor, designed to demonstrate the possibility of a Josephson computer system with a gigahertz clock. It is a stored-program-type full processor including both a data path and a control path, and is constructed from 2066 three-junction interferometer devices on a 5 × 5-mm² die. An eight-instruction set to enable the basic operations of digital signal processing is implemented. The design rule is 2.5 μm. The junctions were fabricated using an Nb-AlOx-Nb process. A new latchup-free DC flip-flop is used in the registers. A DC output buffer eliminates crosstalk from the AC power to the output signals. A stacked AC supply reduces the required AC current amplitude by one quarter. The power dissipation is 25 mW and minimum gate delay is 9 ps. Operation could be confirmed up to a 1.02-GHz clock frequency.

14.
A 32-bit integer execution core containing a Han-Carlson arithmetic-logic unit (ALU), an 8-entry × 2 ALU instruction scheduler loop and a 32-entry × 32-bit register file is described. In a 130 nm six-metal, dual-V_T CMOS technology, the 2.3 mm² prototype contains 160 K transistors. Measurements demonstrate capability for 5-GHz single-cycle integer execution at 25 °C. The single-ended, leakage-tolerant dynamic scheme used in the ALU and scheduler enables up to 9-wide ORs with 23% critical path speed improvement and 40% active leakage power reduction when compared to a conventional Kogge-Stone implementation. On-chip body-bias circuits provide additional performance improvement or leakage tolerance. Stack node preconditioning improves ALU performance by 10%. At 5 GHz, ALU power is 95 mW at 0.95 V and the register file consumes 172 mW at 1.37 V. The ALU performance is scalable to 6.5 GHz at 1.1 V and to 10 GHz at 1.7 V, 25 °C.
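For readers unfamiliar with the Kogge-Stone baseline mentioned above, the following is a bit-level software model of that parallel-prefix carry computation. It is a sketch of the general technique, not of the paper's circuits; the function name and `bits` parameter are hypothetical. Han-Carlson adders are generally described as running this kind of prefix network on only half of the bit positions and adding one final stage, which is not modelled here.

```python
def kogge_stone_add(a, b, bits=32):
    """Add two unsigned `bits`-wide integers with a Kogge-Stone prefix network.

    Each loop iteration models one prefix stage acting on all bit positions in
    parallel; the result wraps modulo 2**bits.
    """
    mask = (1 << bits) - 1
    generate = a & b & mask            # bit i produces a carry by itself
    propagate = (a ^ b) & mask         # bit i passes an incoming carry along
    half_sum = propagate               # per-bit sum before carries are applied
    distance = 1
    while distance < bits:
        # Merge each position's group with the group `distance` bits below it.
        generate = (generate | (propagate & (generate << distance))) & mask
        propagate = (propagate & (propagate << distance)) & mask
        distance *= 2
    carries = (generate << 1) & mask   # carry into bit i is the carry out of bit i-1
    return half_sum ^ carries
```

The loop runs log2(bits) times, which is the source of the network's short critical path; the price is the large number of merge cells and wires at every stage.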

15.
A shared n-well layout technique is developed for the design of dual-supply-voltage logic blocks. It is demonstrated on a design of a 64-bit arithmetic logic unit (ALU) module in domino logic. The second supply voltage is used to lower the power of noncritical paths in the sparse, radix-4 64-bit carry-lookahead adder and in the loopback bus. A 3 mm² test chip in 0.18-μm 1.8-V five-metal with local interconnect CMOS technology that contains six ALUs and test circuitry operates at 1.16 GHz at the nominal supply. For a target delay increase of 2.8%, energy savings are 25.3% using dual supplies, while for an 8.3% increase in delay, 33.3% can be saved.

16.
The architecture, circuit design, and test results for a GaAs 8-b slice processor IC are presented. The device is a high-speed cascadable element intended for use in MIL-STD-1750A computers, reduced-instruction-set computer (RISC) systems, signal processors, and numerous other applications where high speed and radiation hardness are required. The bus-oriented architecture features a 31-word × 8-b two-port register file, a fast eight-function ALU, an 8-b address port, an 8-b bidirectional data port, and associated shifting, decoding, and multiplexing functions. Ancillary logic commonly mechanized in external hardware has been included on-chip. The 9400-transistor LSI device demonstrated peak performance above 150 million operations per second (MOPS) at 9.2 W; a lower power version executes 100 MOPS at 4.2 W.

17.
The authors present a 3-V dual-modulus (÷64/65, ÷128/129) prescaler that operates up to 1.0 GHz with a 3-mW (VCC at 2.58 V) power consumption. Under the normal supply voltage of 3 V, the maximum operating frequency and power dissipation are 1.18 GHz and 5 mW, respectively. This has been achieved by accurate circuit simulation and by the use of a 0.2-μm bipolar technology.

18.
A 1.3-GHz fifth-generation SPARC64 microprocessor
A fifth-generation SPARC64 processor is fabricated in 130-nm partially depleted silicon-on-insulator CMOS with eight layers of Cu metallization. At V_dd = 1.2 V and T_a = 25 °C, it runs at 1.3 GHz and dissipates 34.7 W. The chip contains 191 M transistors with 19 M logic circuits in an area of 18.14 mm × 15.99 mm and is covered with 5858 bumps, of which 269 are for I/O signals. It is mounted in a 1360-pin land-grid-array package. The 16-byte-wide system bus operates with a 260-MHz clock in single-data-rate or double-data-rate modes. This processor implements an error-detection mechanism for execution units and data path logic circuits in addition to on-chip arrays to detect data corruption. Intermittent errors detected in execution units and data paths are recovered via instruction retry. A soft barrier clocking scheme allows amortization of the clock skew and jitter over multiple cycles and helps to achieve high clock frequency. Tunability of the clock timing makes timing closure easier. A relatively small amount of custom circuit design and the use of mostly static circuits contribute to a short development time.

19.
A high-yield FET gate fabrication technology is described. The main advantage of this processing approach is that it permits fabrication of devices with gate lengths of less than 0.5 μm using standard optical photolithography without recourse to deep UV or electron-beam lithography. The process is simple and easy to implement in a manufacturing environment. Exceptionally good gate-length control, typically 10% for a 0.4-μm-long gate, is demonstrated. Yield of a 300-μm-wide FET, designed for use in a gain block and in a switch, is found to be 89% on average. Data on wafer-to-wafer and on-wafer variations in device DC and RF parameters and equivalent circuit values are presented. Typical standard deviations are in the 5-10% range. This process technology has been used to fabricate a 17.5-GHz, 3-b phase-shift receive monolithic microwave integrated circuit (MMIC) of moderately high complexity. Statistics of RF data on 704 such devices, fabricated over a period of two years, are presented. It is shown that such MMICs can be fabricated with yields sufficient for prototype active phased-array antenna applications.

20.
Datapaths for media signal processing are typically built using programmable computational elements such as adders and multipliers, which can be run-time reconfigured to operate on simple integers with 8, 16, or 32 bits of precision. In this brief, a new high-speed energy-efficient reconfigurable adder for media signal processing is presented. The proposed circuit is based on carry-propagation schemes and can be partitioned to perform one 64-, two 32-, four 16-, and eight 8-bit additions. When the Austria Mikro System (AMS) 0.35 μm 2-poly 3-metal 3.3 V CMOS (CSD) process is used to produce layout, a worst propagation delay of about 4.9 ns and an average energy dissipation of about 181 μW/MHz are obtained.
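The partitioning described above (one 64-bit, two 32-bit, four 16-bit, or eight 8-bit additions on the same datapath) can be illustrated in software. The sketch below is a functional model only, using a common masking trick to stop carries at lane boundaries; it says nothing about the paper's carry-propagation circuit, and the function name and `lane_bits` parameter are hypothetical.

```python
MASK64 = (1 << 64) - 1


def partitioned_add(a, b, lane_bits):
    """Add two 64-bit words as independent lanes of `lane_bits` bits each.

    Each lane wraps modulo 2**lane_bits; no carry crosses a lane boundary.
    """
    if lane_bits not in (8, 16, 32, 64):
        raise ValueError("lane_bits must be 8, 16, 32, or 64")
    # Mask selecting the most significant bit of every lane.
    high = 0
    for position in range(lane_bits - 1, 64, lane_bits):
        high |= 1 << position
    low = MASK64 & ~high
    # Add everything except each lane's top bit; a carry out of the lower
    # bits lands in the (cleared) top-bit position and goes no further.
    partial = (a & low) + (b & low)
    # Fold the operands' top bits back in with XOR, which discards the
    # carry that would otherwise spill into the next lane.
    return (partial ^ ((a ^ b) & high)) & MASK64
```

For instance, `partitioned_add(0x01FF, 0x0101, 8)` gives `0x0200` because the low byte wraps to zero without disturbing its neighbour, whereas the same operands with `lane_bits=16` give `0x0300`.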
