1.
This paper describes a 32-KB two-read, one-write ported L0 cache for 4.5-GHz operation in a 1.2-V, 130-nm dual-VTH CMOS technology. The local bitline uses a leakage-tolerant self-reverse-bias (SRB) scheme with nMOS source-follower pullup access transistors while preserving robust full-swing operation. A gate-source underdrive of -220 mV on the bitline read-select transistors is established without external bias voltages or gate-oxide overstress. Device-level measurements in the 130-nm technology show a 72× reduction in bitline active leakage, enabling low-VTH usage, 40% bitline-keeper downsizing, and 16 bitcells per bitline. An 11% faster read delay and 2× higher DC noise robustness are achieved compared with a high-performance dual-VTH bitline scheme. Sustained performance and robustness benefits of the SRB technique over a conventional dynamic bitline, with scaling to 100- and 70-nm technologies, are also presented.
2.
Microprocessor and other IC performance continues to improve at historic rates, with no visible end in sight for the next 10 years. However, we are starting to encounter a power wall. This is true for high-performance components as well as for low-power chips with the very limited energy budget offered by batteries. We need to find ways to manage power and energy consumption on all fronts (technology, design, and architecture) without compromising performance. Otherwise, we may face the end of Moore's law for the semiconductor industry in the near future, triggered not by any difficulty in scaling process technology but by the formidable barriers posed by packaging and cooling, the inefficacy of power delivery, and the energy constraints dictated by battery technology, which is advancing at a very lukewarm pace.
3.
Two proposed techniques let microprocessors operate at low voltages despite high memory-cell failure rates. They identify and disable defective portions of the cache at two granularities: individual words or pairs of bits. Both techniques use the entire cache during high-voltage operation while sacrificing cache capacity during low-voltage operation to reduce the minimum voltage below 500 mV.
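The word-granularity version of this idea can be sketched abstractly. This is a toy model with hypothetical names, not the paper's circuit: it assumes a defect map (e.g., from built-in self-test at low voltage) and simply counts out the failing words when the low-voltage mode is active.

```python
# Toy model of the word-disable idea: a defect map marks words that fail
# at low voltage; at nominal voltage the full cache is usable, while at
# low voltage the defective words are disabled, trading capacity for a
# lower minimum operating voltage. All names are hypothetical.
class WordDisableCache:
    def __init__(self, lines, words_per_line, defect_map):
        self.lines = lines
        self.words_per_line = words_per_line
        self.defect_map = set(defect_map)    # {(line, word), ...} failing at low V

    def usable_words(self, low_voltage):
        total = self.lines * self.words_per_line
        if not low_voltage:
            return total                     # full capacity at nominal voltage
        return total - len(self.defect_map)  # defective words disabled

cache = WordDisableCache(lines=4, words_per_line=8, defect_map=[(0, 3), (2, 5)])
print(cache.usable_words(low_voltage=False))  # 32
print(cache.usable_words(low_voltage=True))   # 30
```

The actual technique additionally remaps the surviving words so pairs of physical lines merge into one logical line; that bookkeeping is omitted here.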
4.
A low-voltage dynamic Manchester adder design is presented, with the critical delay path operating at a higher voltage level generated on-chip by a bootstrapping circuit. The goal of this design is to keep the delay of the worst-case path comparable to that of a design with a higher supply voltage, while operating the rest of the circuit at a lower supply voltage and thus consuming less overall power. A SPICE simulation is performed to verify the design.
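The Manchester carry chain underlying such an adder can be modelled at the bit level. This is only a behavioural sketch: the paper's contribution is the circuit-level bootstrapped critical path, not this logic, and the function name is invented here.

```python
def manchester_add(a_bits, b_bits, cin=0):
    """Bit-level model of a Manchester carry chain, LSB first: each stage
    either generates a carry (g), propagates the incoming carry (p), or
    kills it, and the sum bit is p XOR carry-in."""
    s, c = [], cin
    for ai, bi in zip(a_bits, b_bits):
        g, p = ai & bi, ai ^ bi
        s.append(p ^ c)        # sum bit for this stage
        c = g | (p & c)        # carry handed to the next stage
    return s, c

# 6 + 7 on 4 bits (LSB first): 0110 + 0111 = 1101 (13)
s, cout = manchester_add([0, 1, 1, 0], [1, 1, 1, 0])
print(s, cout)  # [1, 0, 1, 1] 0
```

The g/p/kill decomposition is exactly what the dynamic Manchester stages implement with pass transistors, which is why the carry path dominates the worst-case delay.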
5.
Coming challenges in microarchitecture and architecture
In the past several decades, the world of computers, and especially that of microprocessors, has witnessed phenomenal advances. Computers have exhibited ever-increasing performance and decreasing costs, making them more affordable and, in turn, accelerating additional software and hardware development that fueled this process even more. The technology that enabled this exponential growth is a combination of advances in process technology, microarchitecture, architecture, and design and development tools. While the pace of this progress has been quite impressive over the last two decades, it has become harder and harder to keep up. New process technologies require more expensive megafabs, and new performance levels require larger dies, higher power consumption, and enormous design and validation effort. Furthermore, as CMOS technology continues to advance, microprocessor design is exposed to a new set of challenges. In the near future, microarchitecture will have to consider and explicitly manage the limits of semiconductor technology, such as wire delays, power dissipation, and soft errors. In this paper we describe the role of microarchitecture in the computer world, present the challenges ahead of us, and highlight areas where microarchitecture can help address these challenges.
6.
The design of a two's-complement, most-significant-bit-first, add-and-shift serial multiplier is presented. In this multiplier, one of the multiplicands is held in full length, whereas the second is supplied in a bit-serial fashion with the most significant bit (MSB) first.
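The MSB-first add-and-shift recurrence can be sketched behaviourally as follows. This is a minimal model with an invented function name, not the paper's datapath; the one subtlety it captures is that the sign bit of a two's-complement multiplier carries negative weight.

```python
def msb_first_multiply(a, b_bits):
    """Multiply a (a Python int standing in for the full-width multiplicand
    register) by a two's-complement multiplier supplied bit-serially, MSB
    first. Each cycle the accumulator is doubled (the shift) and the next
    partial product is added -- the add-and-shift recurrence, MSB first."""
    acc = 0
    for i, bit in enumerate(b_bits):
        weight = -bit if i == 0 else bit   # MSB has weight -2^(n-1)
        acc = (acc << 1) + a * weight
    return acc

print(msb_first_multiply(5, [1, 0, 1]))  # 101 two's complement = -3, so -15
print(msb_first_multiply(5, [0, 1, 1]))  # 011 = 3, so 15
```

In hardware the doubling is a left shift of the accumulator rather than a full-width add, which is what makes the serial structure cheap.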
7.
Control-flow machines are sequential in nature, executing instructions in order under the control of a program counter, whereas data-flow machines execute instructions only as their input operands become available, an approach aimed at the parallelism inherent in programs. At the architecture level, data-flow machines execute instructions asynchronously. At the implementation level, by contrast, the synchronous design framework of computer systems, which employs a globally clocked timing discipline, has reached its limits owing to problems of clock distribution. Renewed interest has therefore been expressed in computer systems based on an asynchronous (or self-timed) approach, free of the discipline imposed by a global clock. Accordingly, the design of a static MIMD data-flow processor using micropipelines is presented. The implemented processor, the micro data-flow processor, differs from previously reported processors in that it is wholly asynchronous at both the architectural and implementation levels.
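The data-flow firing rule at the heart of such a machine can be sketched in a few lines. The class below is hypothetical; the actual processor implements this rule asynchronously in hardware with micropipelines, with no software scheduler.

```python
class DataflowNode:
    """An instruction fires only when all of its input operands have
    arrived -- the data-flow firing rule. There is no program counter:
    execution order is driven entirely by operand availability."""
    def __init__(self, op, n_inputs, on_fire):
        self.op = op
        self.operands = [None] * n_inputs
        self.on_fire = on_fire                  # where the result token goes

    def receive(self, slot, value):
        self.operands[slot] = value
        if all(v is not None for v in self.operands):
            self.on_fire(self.op(*self.operands))
            self.operands = [None] * len(self.operands)  # ready to fire again

results = []
add = DataflowNode(lambda x, y: x + y, 2, results.append)
add.receive(1, 4)      # only one operand present: the node does not fire
add.receive(0, 3)      # both operands present: fires and emits 7
print(results)         # [7]
```

Note that operand arrival order is irrelevant, which is exactly the property that makes the architecture naturally asynchronous.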
8.
This paper examines an alternative implementation of micropipeline logic/data-processing structures. To satisfy the timing requirements of a micropipeline, a delay element currently needs to be introduced in each of its stages. The alternative approach presented here eliminates this by using a differential CMOS logic family, enable/disable CMOS differential logic (ECDL), instead of conventional static CMOS, which also eases the synthesis of micropipeline stages. The effectiveness of the technique in eliminating the delay requirement is demonstrated with an adder implemented in ECDL.
9.
Active Cache Emulator
This paper presents the active cache emulator (ACE), a novel field-programmable gate-array (FPGA)-based emulator that models an L3 cache actively and in real time. ACE leverages interactions with its host system to model the target system. Unlike most existing FPGA-based cache emulators, which only collect memory traces from their host system, ACE provides feedback to its host by injecting delays that time-dilate the host system so that it experiences the hit/miss latencies of the emulated cache. Such active emulation expands the context of performance evaluation by allowing measurement of system-level performance metrics (e.g., CPI, operations per second, frame rate) in addition to the typical cache-specific metrics (e.g., miss ratio) provided by existing emulators. ACE is designed to interface with the front-side bus (FSB) of a typical Pentium-based PC system and uses the FSB snoop-stall mechanism to inject delays. At present, ACE is implemented on a Xilinx XC2V6000 FPGA running at 66 MHz, the same speed as its host's FSB. Verification of ACE includes using the Cache Calibrator and RightMark Memory Analyzer software to confirm proper detection of the emulated cache by the host system, and comparing ACE results with SimpleScalar software simulations. Finally, ACE is used to study L3 caches for compute-intensive, throughput-oriented, and real-time gaming benchmarks (SPEC CPU2000, SPECjbb2000, Quake 3). The study shows that analyzing only cache-specific metrics, as done by existing L3 cache studies with FPGA emulators, is insufficient; active emulation mitigates this issue by providing a broader performance view, allowing researchers to draw better conclusions.
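The core of active emulation, computing a stall to inject per memory access, can be sketched in software. This is a toy set-associative LRU model with assumed parameters (64-byte lines, invented latencies); ACE itself performs this lookup in FPGA hardware and applies the delay via FSB snoop stalls rather than returning a number.

```python
class EmulatedL3:
    """Toy L3 model: each access returns the stall (in cycles) the emulator
    would inject into the host so that it 'feels' the emulated cache."""
    LINE = 64                                   # assumed 64-byte cache lines

    def __init__(self, n_sets, assoc, hit_cycles, miss_cycles):
        self.sets = [[] for _ in range(n_sets)] # each set: tags in LRU order
        self.assoc = assoc
        self.hit_cycles, self.miss_cycles = hit_cycles, miss_cycles

    def access(self, addr):
        line = addr // self.LINE
        ways = self.sets[line % len(self.sets)]
        tag = line // len(self.sets)
        if tag in ways:                         # hit: refresh LRU position
            ways.remove(tag)
            ways.append(tag)
            return self.hit_cycles
        ways.append(tag)                        # miss: fill the line
        if len(ways) > self.assoc:
            ways.pop(0)                         # evict the LRU way
        return self.miss_cycles

l3 = EmulatedL3(n_sets=256, assoc=8, hit_cycles=30, miss_cycles=200)
print(l3.access(0x1000))   # cold miss -> 200
print(l3.access(0x1000))   # hit -> 30
```

Because the host actually stalls by the returned amount, system-level metrics such as frame rate respond to the emulated cache, which is the point of the active approach.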
10.
Recent advances in VLSI technology have facilitated high levels of integration and the implementation of faster circuits on a chip. Most of the improvements in the performance of digital systems have been brought about by such faster technologies. However, these technological improvements have brought with them a host of other constraints. In fast deep-submicron technologies, wire delays constitute a significant portion of the overall delay of the system, so some of the advantages of faster technologies are lost. The high level of integration necessitates clock-distribution schemes that minimize skew across the die; these incur area penalties and adversely affect the level of integration possible at the chip level. Hence, changes in the basic architecture of a system's computing elements that, when implemented in silicon, introduce reduced interconnect delays and simpler clock-distribution networks will yield more effective performance improvements. The work presented here examines the implementation of the most basic element in any datapath: an adder. The adder, a carry elimination adder (CEA), uses self-timing at both the algorithmic and implementation levels and offers a minimal-hardware, high-speed addition mechanism. It exploits the nature of the input operands dynamically, so its average-case convergence time approaches that of the ubiquitous carry-lookahead adder (CLA) with the hardware complexity of a carry-ripple adder (CRA). The use of self-timing eliminates the global clock and hence clock skew.
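The quantity a self-timed adder's completion time tracks, the longest carry-propagate chain, can be computed directly. This sketch (function name invented) illustrates why the average case is fast: for random n-bit operands the expected longest chain grows only logarithmically in n, so a CEA-style adder usually finishes long before the worst-case ripple time.

```python
def longest_carry_chain(a, b, n):
    """Length of the longest run of bit positions a carry ripples through
    when adding two n-bit operands; a self-timed ripple-style adder is
    done as soon as every such chain has died out."""
    length = best = carry = 0
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        if carry and (ai ^ bi):                 # carry propagates onward
            length += 1
        else:
            length = 1 if (ai & bi) else 0      # chain (re)starts on a generate
        carry = (ai & bi) | ((ai ^ bi) & carry)
        best = max(best, length)
    return best

print(longest_carry_chain(0b1111, 0b0001, 4))  # carry ripples all 4 bits -> 4
print(longest_carry_chain(0b0101, 0b1010, 4))  # no generate, no chain -> 0
```

A completion-detection circuit in the adder signals "done" when all chains are settled, replacing the worst-case clock period with this data-dependent latency.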