1.

Quantitative analysis and optimization techniques for on-chip cache leakage power

Nam Sung Kim Blaauw D. Mudge T. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2005,13(10):1147-1156

On-chip L1 and L2 caches represent a sizeable fraction of the total power consumption of microprocessors. In nanometer-scale technology, subthreshold leakage power is becoming one of the dominant components of the total power consumption of those caches. In this study, we present optimization techniques to reduce the subthreshold leakage power of on-chip caches, assuming that multiple threshold voltages (V_T's) are available. First, we show a cache leakage optimization technique that examines the tradeoff between access time and subthreshold leakage power by assigning distinct V_T's to each of the four main cache components: address bus drivers, data bus drivers, decoders, and static random access memory (SRAM) cell arrays with sense amplifiers. Second, we show optimization techniques to reduce the leakage power of L1 and L2 on-chip caches without affecting the average memory access time. The key results are: 1) two additional high V_T's are enough to minimize leakage in a single cache (three V_T's if we include a nominal low V_T for microprocessor core logic); 2) if L1 size is fixed, increasing L2 size can result in much lower leakage without reducing average memory access time; 3) if L2 size is fixed, reducing L1 size may result in lower leakage without loss of the average memory access time for the SPEC2K benchmarks; and 4) smaller L1 and larger L2 caches than are typical in today's processors result in significant leakage and dynamic power reduction without affecting the average memory access time.
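
The trade-off behind results 2) to 4) can be illustrated with the standard two-level average memory access time (AMAT) formula. This is a minimal sketch, not the paper's model: the function name and every hit time, miss rate, and latency below are hypothetical values chosen only for illustration.

```python
def amat(l1_hit, l1_miss_rate, l2_hit, l2_miss_rate, mem_latency):
    """Average memory access time (in cycles) for a two-level cache hierarchy."""
    # An L1 miss pays the L2 hit time; an L2 miss additionally pays main memory.
    l1_miss_penalty = l2_hit + l2_miss_rate * mem_latency
    return l1_hit + l1_miss_rate * l1_miss_penalty


# Hypothetical baseline configuration (assumed numbers, not from the paper).
baseline = amat(l1_hit=2, l1_miss_rate=0.05,
                l2_hit=10, l2_miss_rate=0.20, mem_latency=200)

# Hypothetical alternative: a smaller L1 (faster hit, more misses) paired
# with a larger L2 (slower hit, fewer misses).
alternative = amat(l1_hit=1, l1_miss_rate=0.08,
                   l2_hit=12, l2_miss_rate=0.10, mem_latency=200)

print(f"baseline AMAT:    {baseline:.2f} cycles")
print(f"alternative AMAT: {alternative:.2f} cycles")
```

With these assumed numbers the smaller-L1, larger-L2 configuration does not lengthen the average access time, which is the kind of configuration the abstract reports as also reducing leakage.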

2.

Zero-aware asymmetric SRAM cell for reducing cache power in writing zero

Yen-Jen Chang Feipei Lai Chia-Lin Yang 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2004,12(8):827-836

Most microprocessors employ on-chip caches to bridge the performance gap between the processor and the main memory. However, cache accesses usually contribute significantly to the total power consumption of the chip. Based on the observation that an overwhelming majority of the values written to the cache are "0", in this paper we propose a zero-aware SRAM cell with an asymmetric inverter pair, called the ZA cell, to minimize the cache power consumption in writing "0". The ZA cell uses a circuit-level technique, which is software independent and orthogonal to other low-power techniques at the architecture level. Compared to the conventional SRAM cell, the experimental results based on the SPEC2000 and MediaBench traces show that, without compromising either performance or stability, the ZA cell can reduce the average cache write power consumption by over 60% for both the baseline instruction and data caches. In particular, the ZA cell is attractive in data caches, which exhibit a high write-zero rate.

3.

Characterization and modeling of run-time techniques for leakage power reduction

Yuh-Fang Tsai Duarte D.E. Vijaykrishnan N. Irwin M.J. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2004,12(11):1221-1233

While some leakage power reduction techniques require modification of the process technology, others are based on circuit-level optimizations and are applied at run-time. We focus our study on the latter and compare three techniques: input vector control, body bias control, and power supply gating. We determine their limits and benefits in terms of the potential leakage reduction, performance penalty, and area and power overhead. The leakage power savings trends considering technology scaling are also presented. Due to the differences in the properties of datapath logic and memory structures, different implementations are recommended. Finally, the use of the "minimum idle time" parameter, as a metric for evaluating different leakage control mechanisms, is shown.

4.

Circuit techniques for 1.5-V power supply flash memory

Otsuka N. Horowitz M.A. 《Solid-State Circuits, IEEE Journal of》1997,32(8):1217-1230

We describe circuit techniques for a Flash memory which operates with a V_DD of 1.5 V. For the interface between the peripheral circuits and the memory core circuits, two types of level shifter circuits are proposed which convert a V_DD level signal into the high voltage signals needed for high performance. In order to improve the read performance at a low V_DD, a new self-bias bitline voltage sensing scheme is described. This circuit greatly reduces the delay's dependence on bitline capacitance and achieves a 19 ns reduction of the sense delay at low voltages. Multilevel storage sensing with this circuit is also discussed.

5.

Study of novel techniques for reducing self-heating effects in SOI power LDMOS (Cited by: 1; self-citations: 0; other citations: 1)

J. Roig D. Flores S. Hidalgo M. Vellvehi J. Rebollo J. Millán 《Solid-state electronics》2002,46(12):2123-2133

Self-heating effects in silicon-on-insulator (SOI) power devices have become a serious problem when the active silicon layer thickness is reduced and buried oxide thickness is increased. Hence, if the temperature of the active region rises, the device electrical characteristics can be seriously modified in steady state and transient modes. In order to alleviate the self-heating, two novel techniques are proposed which lead to a better heat flow from the active silicon layer to the silicon substrate through the buried oxide layer in SOI power devices. No significant changes in device electrical characteristics are expected with the inclusion of the novel techniques. The electro-thermal performance of lateral power devices including the proposed techniques is also presented.

6.

Low leakage techniques for FPGAs (Cited by: 1; self-citations: 0; other citations: 1)

Lodi A. Ciccarelli L. Guerrieri R. 《Solid-State Circuits, IEEE Journal of》2006,41(7):1662-1672

Reconfigurable architectures are well suited for wireless applications since they provide high performance computation together with the capability to adapt to changing communication protocols. Moving to 90-nm technology and below, FPGAs could suffer from leakage energy consumption due to the large number of inactive transistors. This paper presents an extensive study on the application of different low-leakage techniques to the design of FPGAs. The approaches are compared and mixed to find an implementation of switch blocks and look-up tables which reduces leakage without affecting delay and area. The circuits we propose achieve an 86% stand-by energy saving and 46% active leakage reduction with respect to standard implementations. The FPGA delay is not affected, while area is increased by only 3%.

7.

Circuit techniques for reducing the effects of op-amp imperfections: autozeroing, correlated double sampling, and chopper stabilization

Enz C.C. Temes G.C. 《Proceedings of the IEEE. Institute of Electrical and Electronics Engineers》1996,84(11):1584-1614

In linear IC's fabricated in a low-voltage CMOS technology, the reduction of the dynamic range due to the dc offset and low frequency noise of the amplifiers becomes increasingly significant. Also, the achievable amplifier gain is often quite low in such a technology, since cascoding may not be a practical circuit option due to the resulting reduction of the output signal swing. In this paper, some old and some new circuit techniques are described for the compensation of the amplifier's most important nonideal effects, including the noise (mainly thermal and 1/f noise), the input-referred dc offset voltage, and the finite gain resulting in a nonideal virtual ground at the input.

8.

An online-configurable, low-power cache design for an embedded processor

刘坤杰 孟建熠 严晓浪 葛海通 《电路与系统学报》2009,14(5)

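This paper proposes an online-configurable structure for embedded on-chip caches based on a "set concatenation" technique. An online-configurable cache can adjust parameters such as set associativity to suit different applications, effectively reducing the cache's dynamic power while keeping application performance essentially unchanged. Horizontal set concatenation, used together with the Gated-Vdd technique, reduces not only dynamic power but also the static leakage power that is increasingly prominent in ultra-deep-submicron processes. The structure was applied to the 32-bit embedded processor CK510, and tests on a set of applications from the PowerStone benchmark suite show that the online-configurable cache structure with set concatenation can significantly reduce processor power.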

9.

Full-chip subthreshold leakage power prediction and reduction techniques for sub-0.18-μm CMOS

Narendra S. De V. Borkar S. Antoniadis D.A. Chandrakasan A.P. 《Solid-State Circuits, IEEE Journal of》2004,39(3):501-510

The driving force for the semiconductor industry growth has been the elegant scaling nature of CMOS technology. In future CMOS technology generations, supply and threshold voltages will have to continually scale to sustain performance increase, control switching power dissipation, and maintain reliability. These continual scaling requirements on supply and threshold voltages pose several technology and circuit design challenges. With threshold voltage scaling, subthreshold leakage power is expected to become a significant portion of the total power in future CMOS systems. Therefore, it becomes crucial to predict and reduce subthreshold leakage power of such systems. In the first part of this paper, we present a subthreshold leakage power prediction model that takes into account within-die threshold voltage variation. Statistical measurements of 32-bit microprocessors in 0.18-μm CMOS confirm that the mean error of the model is 4%. In the second part of this paper, we present the use of stacked devices to reduce system subthreshold leakage power without reducing system performance. A model to predict the scaling nature of this stack effect and verification of the model through statistical device measurements in 0.18-μm and 0.13-μm are presented. Measurements also demonstrate reduction in threshold voltage variation for stacked devices compared to nonstack devices. Comparison of the stack effect to the use of high threshold voltage or longer channel length devices for subthreshold leakage reduction is also discussed.

10.

Circuit techniques for large CSEA SRAMs

Wingard D.E. Stark D.C. Horowitz M.A. 《Solid-State Circuits, IEEE Journal of》1992,27(6):908-919

The CMOS-storage emitter-access (CSEA) memory cell offers faster access than the MOS cells used in conventional BiCMOS SRAMs, but using it in large memory arrays poses several problems. Novel BiCMOS circuit approaches to address the problems of decoding power, electronic noise, level translation, and write disturbance are described. Results on a 64-kb CSEA SRAM using these techniques are reported. The device, fabricated in a 0.8-μm BiCMOS technology, achieves read access and write pulse times of less than 4 ns while dissipating 1.7 W at a case temperature of 70°C.

11.

Circuit techniques for a VLSI memory

《Solid-State Circuits, IEEE Journal of》1983,18(5):463-470

This paper describes circuit techniques necessary for dynamic RAMs with high packing density to implement submicron device technology. An on-chip error checking and correcting technique using bidirectional parity checking is proposed to reduce the soft error rate. In a sense-refresh amplifier, capacitor-coupled presensing is introduced to compensate for threshold imbalance. An on-chip supply voltage conversion is described as a solution for a hot carrier-injection problem. A 256K CMOS dynamic RAM has been designed and fabricated as a test vehicle for these techniques.
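
The abstract does not spell out how the bidirectional parity check works. As a rough illustration only, assuming it resembles classic horizontal-vertical (two-dimensional) parity, a single flipped bit can be located at the intersection of the failing row parity and the failing column parity. The function names and the example data are hypothetical.

```python
def parity_bits(block):
    """Return (row_parities, column_parities) for a 2-D list of 0/1 bits."""
    rows = [sum(row) % 2 for row in block]
    cols = [sum(col) % 2 for col in zip(*block)]
    return rows, cols


def correct_single_error(block, stored_rows, stored_cols):
    """Fix at most one flipped bit in place; return its (row, col) or None."""
    rows, cols = parity_bits(block)
    bad_rows = [i for i, (a, b) in enumerate(zip(rows, stored_rows)) if a != b]
    bad_cols = [j for j, (a, b) in enumerate(zip(cols, stored_cols)) if a != b]
    if not bad_rows and not bad_cols:
        return None  # parities agree: no error detected
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        i, j = bad_rows[0], bad_cols[0]
        block[i][j] ^= 1  # flip the bit at the intersection back
        return (i, j)
    raise ValueError("error pattern is not a single-bit data error")


# Hypothetical 3x4 data block and its stored parity bits.
data = [[1, 0, 1, 1],
        [0, 1, 1, 0],
        [1, 1, 0, 0]]
stored_rows, stored_cols = parity_bits(data)

# Simulate a soft error that flips one stored bit.
corrupted = [row[:] for row in data]
corrupted[1][2] ^= 1
fixed_at = correct_single_error(corrupted, stored_rows, stored_cols)
```

This sketch corrects only single-bit errors in the data bits; errors in the parity bits themselves or multi-bit errors raise an exception rather than being repaired.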

12.

Micro-operation cache: a power aware frontend for variable instruction length ISA

Solomon B. Mendelson A. Ronen R. Orenstien D. Almog Y. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2003,11(5):801-811

Modern computer architectures that support variable length instruction set architectures (ISA), such as Intel's IA-32, distinguish between the architectural level of presentation and the micro-architectural representations of the instructions. At the micro-architectural level, instructions are represented by fixed-length micro-operations termed uops, and complex instructions are broken into a sequence of uops. The fetch and decode operations in such architectures are extremely complicated and power hungry, especially if they aim to handle several variable length instructions per cycle. This paper suggests caching uop sequences from decoded instructions in a special structure, termed uop cache (UC), and using this fixed-length decoded format when possible. Doing so enables reduction in the processor's power and energy consumption while not compromising performance. We will show that a moderately-sized UC can eliminate about 75% of instruction decodes across a broad range of benchmarks and over 90% in multimedia applications and high-power tests. For existing Intel P6 family processors, the eliminated work may save about 10% of the full-chip power consumption. While the new proposed technique can be used to save power without degrading performance, we can also use it to improve processor performance when power is constrained.

13.

Circuit techniques for CMOS low-power high-performance multipliers

Abu-Khater I.S. Bellaouar A. Elmasry M.I. 《Solid-State Circuits, IEEE Journal of》1996,31(10):1535-1546

In this paper we present circuit techniques for CMOS low-power high-performance multiplier design. Novel full adder circuits were simulated and fabricated using 0.8-μm CMOS (in BiCMOS) technology. The complementary pass-transistor logic-transmission gate (CPL-TG) full adder implementation provided an energy savings of 50% compared to the conventional CMOS full adder. CPL implementation of the Booth encoder provided 30% power savings at 15% speed improvement compared to the static CMOS implementation. Although the circuits were optimized for a (16×16)-b multiplier using the Booth algorithm, a (6×6)-b implementation was used as a test vehicle in order to reduce simulation time. For the (6×6)-b case, implementation based on CPL-TG resulted in 18% power savings and 30% speed improvement over conventional CMOS.

14.

Methods for reducing the peak-to-average power ratio of OFDM (Cited by: 3; self-citations: 0; other citations: 3)

雷俊 吴乐南 《电声技术》2003,(8):59-61,65

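Orthogonal frequency-division multiplexing (OFDM) is a high-speed information transmission technique that resists multipath, but its biggest drawback is its inherently high peak-to-average power ratio. This paper surveys existing methods for reducing the peak-to-average power ratio in OFDM systems and the problems they face.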
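
To make the peak-to-average power ratio (PAPR) concrete, the following minimal sketch computes it for one OFDM symbol using a naive inverse DFT from the standard library. The function names and the 8-subcarrier example are assumptions for illustration and are not taken from the surveyed article.

```python
import cmath
import math


def idft(symbols):
    """Naive inverse DFT: map frequency-domain subcarrier symbols to time samples."""
    n = len(symbols)
    return [
        sum(s * cmath.exp(2j * math.pi * k * t / n) for k, s in enumerate(symbols)) / n
        for t in range(n)
    ]


def papr_db(samples):
    """Peak-to-average power ratio of a block of time samples, in dB."""
    powers = [abs(x) ** 2 for x in samples]
    return 10 * math.log10(max(powers) / (sum(powers) / len(powers)))


# Hypothetical worst case: all 8 subcarriers carry the same symbol, so they
# add in phase at one time instant and the peak power is N times the mean.
aligned = papr_db(idft([1] * 8))

# A single active subcarrier gives a constant-envelope signal.
single = papr_db(idft([1] + [0] * 7))
```

The aligned case shows why PAPR grows with the number of subcarriers, which is the problem the surveyed reduction methods address.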

15.

Evaluating power consumption of parameterized cache and bus architectures in system-on-a-chip designs

Givargis T.D. Vahid F. Henkel J. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2001,9(4):500-508

Architectures with parameterizable cache and bus can support large tradeoffs between performance and power. We provide simulation data showing the large tradeoffs offered by such an architecture for several applications and demonstrating that the cache and bus should be configured simultaneously to find the optimal solutions. Furthermore, we describe analytical techniques for speeding up the cache/bus power and performance evaluation by several orders of magnitude over simulation, while maintaining sufficient accuracy with respect to simulation-based approaches.

16.

Optimization of large band-gap barriers for reducing leakage in bipolar cascade lasers

Dross F. van Dijk F. Vinter B. 《Quantum Electronics, IEEE Journal of》2004,40(8):1003-1007

In order to study the characteristics of bipolar cascade lasers, we have developed a fully consistent transport model compatible with Esaki tunnel junctions (TJs). First, we compare the calculated electrical characteristics of TJs made of different InGaAsP lattice-matched to InP materials with different doping concentrations. Then, a complex (p-n)-(n++p++)-(p-n)-(n++p++)-(p-n) structure is implemented. The Esaki junctions are cladded by doped InP current confining layers, the width of which is optimized to prevent electron leakage. We find that a 25-nm-wide InP barrier confines more than 98% of the electron current for a total injection current of 10 kA·cm⁻² at room temperature. The predicted differential quantum efficiency is then 230%.

17.

Scheduling techniques for reducing processor energy use in MacOS (Cited by: 1; self-citations: 0; other citations: 1)

Lorch Jacob R. Smith Alan Jay 《Wireless Networks》1997,3(5):311-324

The CPU is one of the major power consumers in a portable computer, and considerable power can be saved by turning off the CPU when it is not doing useful work. In Apple's MacOS, however, idle time is often converted to busy waiting, and generally it is very hard to tell when no useful computation is occurring. In this paper, we suggest several heuristic techniques for identifying this condition, and for temporarily putting the CPU in a low‐power state. These techniques include turning off the processor when all processes are blocked, turning off the processor when processes appear to be busy waiting, and extending real time process sleep periods. We use trace‐driven simulation, using processor run interval traces, to evaluate the potential energy savings and performance impact. We find that these techniques save considerable amounts of processor energy (as much as 66%), while having very little performance impact (less than 2% increase in run time). Implementing the proposed strategies should increase battery lifetime by approximately 20% relative to Apple's current CPU power management strategy, since the CPU and associated logic are responsible for about 32% of power use; similar techniques should be applicable to operating systems with similar behavior. This revised version was published online in June 2006 with corrections to the Cover Date.

18.

Research on reducing microcontroller power consumption with an integrated loop-code cache

费振东 毛志刚 《信息技术》2008,32(12)

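Power consumption is very important for microcontrollers (single-chip computers) aimed at low-cost, low-power applications. Studies show that the power consumed by CPU accesses to program memory for instruction fetch makes up a significant part of a microcontroller's overall power, and that most of the execution time of microcontroller applications is spent executing fixed loop code. This paper studies the technique of integrating a loop-code cache and executing loop code from it to reduce memory access power.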

19.

Using dynamic cache management techniques to reduce energy in general purpose processors

Bellas N.E. Hajj I.N. Polychronopoulos C.D. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2000,8(6):693-708

The memory hierarchy of high-performance and embedded processors has been shown to be one of the major energy consumers. For example, the Level-1 (L1) instruction cache (I-Cache) of the StrongARM processor accounts for 27% of the power dissipation of the whole chip, whereas the instruction fetch unit (IFU) and the I-Cache of Intel's Pentium Pro processor are the single most important power consuming modules with 14% of the total power dissipation [2]. Extrapolating current trends, this portion is likely to increase in the near future, since the devices devoted to the caches occupy an increasingly larger percentage of the total area of the chip. In this paper, we propose a technique that uses an additional mini cache, the L0-Cache, located between the I-Cache and the CPU core. This mechanism can provide the instruction stream to the data path and, when managed properly, it can effectively eliminate the need for high utilization of the more expensive I-Cache. We propose, implement, and evaluate five techniques for dynamic analysis of the program instruction access behavior, which is then used to proactively guide the access of the L0-Cache. The basic idea is that only the most frequently executed portions of the code should be stored in the L0-Cache since this is where the program spends most of its time. We present experimental results to evaluate the effectiveness of our scheme in terms of performance and energy dissipation for a series of SPEC95 benchmarks. We also discuss the performance and energy tradeoffs that are involved in these dynamic schemes. Results for these benchmarks indicate that more than 60% of the dissipated energy in the I-Cache subsystem can be saved.

20.

Bus-switch coding for reducing power dissipation in off-chip buses

Olivieri M. Pappalardo F. Visalli G. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2004,12(12):1374-1377

We present a novel coding scheme for reducing bus power dissipation. The presented approach is well suited to driving off-chip buses, where the line capacitance is a dominant factor. A distinctive feature of the technique is the dynamic reordering of bus line positions, in order to minimize the toggling activity on physical bus wires. The effectiveness of the approach is demonstrated through cycle-accurate simulation of industrial benchmarks in conjunction with post-layout evaluation of speed, power and area overhead.
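
The abstract does not describe the coding itself, so the sketch below does not implement it. It only illustrates the quantity such schemes try to minimize: the number of bit transitions (toggles) on the bus wires between consecutive words. The function name and the example trace are hypothetical.

```python
def count_toggles(words):
    """Total number of bit transitions between consecutive bus words."""
    # XOR marks the wires that change; counting its 1 bits counts the toggles.
    return sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))


# Hypothetical 4-bit bus trace.
trace = [0b0000, 0b1111, 0b0000, 0b1010]
toggles = count_toggles(trace)
```

Because dynamic power on a bus line scales with how often that line switches, a lower toggle count for the same data stream corresponds to lower bus power under this simplified view.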