首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 46 毫秒
1.
Thermal-Aware Methodology for Repeater Insertion in Low-Power VLSI Circuits   总被引:1,自引:0,他引:1  
In this paper, the impact of thermal effects on low-power repeater insertion methodology is studied. An analytical methodology for thermal-aware repeater insertion that includes the electrothermal coupling between power, delay, and temperature is presented, and simulation results with global interconnect repeaters are discussed for 90- and 65-nm technology. Simulation results show that the proposed thermal-aware methodology can save 17.5% more power consumed by the repeaters compared to a thermal-unaware methodology for a given allowed delay penalty. In addition, the proposed methodology also results in a lower chip temperature, and thus, extra leakage power savings from other logic blocks.  相似文献   

2.
Circuit and microarchitectural techniques for reducing cache leakage power   总被引:2,自引:0,他引:2  
On-chip caches represent a sizable fraction of the total power consumption of microprocessors. As feature sizes shrink, the dominant component of this power consumption will be leakage. However, during a fixed period of time, the activity in a data cache is only centered on a small subset of the lines. This behavior can be exploited to cut the leakage power of large data caches by putting the cold cache lines into a state preserving, low-power drowsey mode. In this paper, we investigate policies and circuit techniques for implementing drowsy data caches. We show that with simple microarchitectural techniques, about 80%-90% of the data cache lines can be maintained in a drowsy state without affecting performance by more than 0.6%, even though moving lines into and out of a drowsy state incurs a slight performance loss. According to our projections, in a 70-nm complementary metal-oxide-semiconductor process, drowsy data caches will be able to reduce the total leakage energy consumed in the caches by 60%-75%. In addition, we extend the drowsy cache concept to reduce leakage power of instruction caches without significant impact on execution time. Our results show that data and instruction caches require different control strategies for efficient execution. In order to enable drowsy instruction caches, we propose a technique called cache subbank prediction, which is used to selectively wake up only the necessary parts of the instruction cache, while allowing most of the cache to stay in a low-leakage drowsy mode. This prediction technique reduces the negative performance impact by 78% compared with the no-prediction policy. Our technique works well even with small predictor sizes and enables a 75% reduction of leakage energy in a 32-kB instruction cache.  相似文献   

3.
LRU-Assist:一种高效的Cache漏流功耗控制算法   总被引:5,自引:4,他引:1       下载免费PDF全文
随着集成电路制造工艺进入超深亚微米阶段,漏电流功耗在微处理器总功耗中所占的比例越来越大,在开发新的低漏流工艺和电路技术之外,如何在体系结构级控制和优化漏流功耗成为业界研究的热点.Cache在微处理器中面积最大,是进行漏流控制的首要部件.LRU是组相联Cache最常用的替换算法,而研究发现,访存操作命中LRU后半区的概率很低.LRU-Assist算法以Drowsy Cache、Cache Decay等控制策略为基础,在保证处理器性能不受影响的前提下,利用既有的LRU信息把Cache的关闭率平均提高了15%,大大降低了漏电流功耗.  相似文献   

4.
In this paper, we propose a novel integrated circuit and architectural level technique to reduce leakage power consumption in high-performance cache memories using single V/sub t/ (transistor threshold voltage) process. We utilize the concept of gated-ground (nMOS transistor inserted between ground line and SRAM cell) to achieve a reduction in leakage energy without significantly affecting performance. Experimental results on gated-ground caches show that data is retained (DRG-Cache) even if the memory is put in the standby mode of operation. Data is restored when the gated-ground transistor is turned on. Turning off the gated-ground transistor in turn gives a large reduction in leakage power. This technique requires no extra circuitry; the row decoder itself can be used to control the gated-ground transistor. The technique is applicable to data and instruction caches as well as different levels of cache hierarchy, such as the L1, L2, or L3 caches. We fabricated a test chip in TSMC 0.25-/spl mu/m technology to show the data retention capability and the cell stability of the DRG-Cache. Our simulation results on 100-nm and 70-nm processes (Berkeley Predictive Technology Model) show 16.5% and 27% reduction in consumed energy in L1 cache and 50% and 47% reduction in L2 cache, respectively, with less than 5% impact on execution time and within 4% increase in area overhead.  相似文献   

5.
On-chip L1 and L2 caches represent a sizeable fraction of the total power consumption of microprocessors. In nanometer-scale technology, the subthreshold leakage power is becoming one of the dominant total power consumption components of those caches. In this study, we present optimization techniques to reduce the subthreshold leakage power of on-chip caches assuming that there are multiple threshold voltages, V/sub T/'s, available. First, we show a cache leakage optimization technique that examines the tradeoff between access time and subthreshold leakage power by assigning distinct V/sub T/'s to each of the four main cache components-address bus drivers, data bus drivers, decoders, and static random access memory (SRAM) cell arrays with sense amplifiers. Second, we show optimization techniques to reduce the leakage power of L1 and L2 on-chip caches without affecting the average memory access time. The key results are: 1) two additional high V/sub T/'s are enough to minimize leakage in a single cache-3 V/sub T/'s if we include a nominal low V/sub T/ for microprocessor core logic; 2) if L1 size is fixed, increasing L2 size can result in much lower leakage without reducing average memory access time; 3) if L2 size is fixed, reducing L1 size may result in lower leakage without loss of the average memory access time for the SPEC2K benchmarks; and 4) smaller L1 and larger L2 caches than are typical in today's processors result in significant leakage and dynamic power reduction without affecting the average memory access time.  相似文献   

6.
《Microelectronics Journal》2015,46(11):1020-1032
This paper describes a new memristor crossbar architecture that is proposed for use in a high density cache design. This design has less than 10% of the write energy consumption than a simple memristor crossbar. Also, it has up to 3 times the bit density of an STT-MRAM system and up to 11 times the bit density of an SRAM architecture. The proposed architecture is analyzed using a detailed SPICE analysis that accounts for the resistance of the wires in the memristor structure. Additionally, the memristor model used in this work has been matched to specific device characterization data to provide accurate results in terms of energy, area, and timing. The proposed memory system was analyzed by modeling two different devices that vary in resistance range and switching time. This system does not require that the memristor devices have inherent diode effects which limit alternate current paths. Therefore this system is capable of utilizing a much broader class of devices.An architectural analysis has also been completed that shows how the memory system may perform as a cache memory. A hybrid cache structure was used to alleviate the long write latencies of memristor devices. This approach consisted of the tag array being made of SRAM cells while the data array was made of the memristor circuit proposed. This hybrid scheme allows multiple reads and writes to concurrently access different sub-arrays within a cache. The performance of these novel memristor based caches was compared to SRAM and STT-MRAM based caches through detailed simulations. The results show that the memristor caches are denser and allow better performance along with lower system power when compared to the STT-MRAM and SRAM caches.  相似文献   

7.
With advances in process technology, soft errors are becoming an increasingly critical design concern. Owing to their large area, high density, and low operating voltages, caches are worst hit by soft errors. Based on the observation that in multimedia applications, not all data require the same amount of protection from soft errors, we propose a partially protected cache (PPC) architecture, in which there are two caches, one protected and the other unprotected at the same level of memory hierarchy. We demonstrate that as compared to the existing unprotected cache architectures, PPC architectures can provide 47 times reduction in failure rate, at only 1% runtime and 3% power overheads. In addition, the failure rate reduction obtained by PPCs is very sensitive to the PPC cache configuration. Therefore, this observation provides an opportunity for further improvement of the solution by correctly parameterizing the PPC configurations. Consequently, we develop design space exploration (DSE) strategies to discover the best PPC configuration. Our DSE technique can reduce the exploration time by more than six times as compared to an exhaustive approach.   相似文献   

8.
Power consumption is an increasingly pressing problem in modern processor design. Since the on-chip caches usually consume a significant amount of power, it is one of the most attractive targets for power reduction. This paper presents a two-level filter scheme, which consists of the L1 and L2 filters, to reduce the power consumption of the on-chip cache. The main idea of the proposed scheme is motivated by the substantial unnecessary activities in conventional cache architecture. We use a single block buffer as the L1 filter to eliminate the unnecessary cache accesses. In the L2 filter, we then propose a new sentry-tag architecture to further filter out the unnecessary way activities in case of the L1 filter miss. We use SimpleScalar to simulate the SPEC2000 benchmarks and perform the HSPICE simulations to evaluate the proposed architecture. Experimental results show that the two-level filter scheme can effectively reduce the cache power consumption by eliminating most unnecessary cache activities, while the compromise of system performance is negligible. Compared to a conventional instruction cache (32 kB, two-way) implemented with only the L1 filter, the use of a two-level filter can result in roughly 30% reduction in total cache power consumption. Similarly, compared to a conventional data cache (32 kB, four-way) implemented with only the L1 filter, the total cache power reduction is approximately 46%.  相似文献   

9.
Deep-submicron CMOS designs maintain high transistor switching speeds by scaling down the supply voltage and proportionately reducing the transistor threshold voltage. Lowering the threshold voltage increases leakage energy dissipation due to subthreshold leakage current even when the transistor is not switching. Estimates suggest a five-fold increase in leakage energy in every future generation. In modern microarchitectures, much of the leakage energy is dissipated in large on-chip cache memory structures with high transistor densities. While cache utilization varies both within and across applications, modern cache designs are fixed in size resulting in transistor leakage inefficiencies. This paper explores an integrated architectural and circuit-level approach to reducing leakage energy in instruction caches (i-caches). At the architecture level, we propose the Dynamically ResIzable i-cache (DRI i cache), a novel i-cache design that dynamically resizes and adapts to an application's required size. At the circuit-level, we use gated-Vdd, a novel mechanism that effectively turns off the supply voltage to, and eliminates leakage in, the SRAM cells in a DRI i-cache's unused sections. Architectural and circuit-level simulation results indicate that a DRI i-cache successfully and robustly exploits the cache size variability both within and across applications. Compared to a conventional i-cache using an aggressively-scaled threshold voltage a 64 K DRI i-cache reduces on average both the leakage energy-delay product and cache size by 62%, with less than 4% impact on execution time. Our results also indicate that a wide NMOS dual-Vt gated-Vdd transistor with a charge pump offers the best gating implementation and virtually eliminates leakage energy with minimal increase in an SRAM cell read time area as compared to an i-cache with an aggressively-scaled threshold voltage  相似文献   

10.
Spin-torque transfer RAM (STT-RAM) is a promising candidate to replace SRAM for larger Last level cache (LLC). However, it has long write latency and high write energy which diminish the benefit of adopting STT-RAM caches. A common observation for LLC is that a large number of cache blocks have never been referenced again before they are evicted. The write operations for these blocks, which we call dead writes, can be eliminated without incurring subsequent cache misses. To address this issue, a quantitative scheme called Feedback learn-ing based dead write termination (FLDWT) is proposed to improve energy efficiency and performance of STT-RAM based LLC. FLDWT dynamically learns the block access behavior by using data reuse distance and data access fre-quency, and then classifies the blocks into dead blocks and live blocks. FLDWT terminates dead write block requests and improves the estimation accuracy via feedback infor-mation. Compared with STT-RAM baseline in the last-level caches, experimental results show that our scheme achieves energy reduction by 44.6% and performance im-provement by 12% on average with negligible overhead.  相似文献   

11.
This paper describes a 2.3 Billion transistors, 8-core, 16-thread, 64-bit Xeon? EX processor with a 24 MB shared L3 cache implemented in a 45 nm nine-metal process. Multiple clock and voltage domains are used to reduce power consumption. Long channel devices and cache sleep mode are used to minimize leakage. Core and cache recovery improve manufacturing yields and enable multiple product flavors from the same silicon die. The disabled blocks are both clock and power gated to minimize their power consumption. Idle power is reduced by shutting off the unterminated I/O links and shedding phases in the voltage regulator to improve the power conversion efficiency.  相似文献   

12.
Static energy reduction techniques for microprocessor caches   总被引:1,自引:0,他引:1  
Microprocessor performance has been improved by increasing the capacity of on-chip caches. However, the performance gain comes at the price of static energy consumption due to subthreshold leakage current in cache memory arrays. This paper compares three techniques for reducing static energy consumption in on-chip level-1 and level-2 caches. One technique employs low-leakage transistors in the memory cell. Another technique, power supply switching, can be used to turn off memory cells and discard their contents. A third alternative is dynamic threshold modulation, which places memory cells in a standby state that preserves cell contents. In our experiments, we explore the energy and performance tradeoffs of these techniques. We also investigate the sensitivity of microprocessor performance and energy consumption to additional cache latency caused by leakage-reduction techniques.  相似文献   

13.
NTS 高缓为直接映象高缓扩充了一个小的全相联高缓,其中保存被预测为具有非时间局部性的数据块.与牺牲高缓的区别在于NTS 高缓在两个高缓之间没有直接的数据通路,因此结构设计简单,功耗低.本文提出了一个NTS 高缓的改进方案,称为选择性冲突预测高缓(SCP 高缓)设计方案.SCP 高缓利用冲突预测算法监测高缓中数据块局部性,并有选择性地将数据块填充到直接映象高缓或全相联高缓中.仿真结果显示,容量相当的SCP 高缓性能优于NTS 高缓.  相似文献   

14.
In this paper, we study microarchitecture-level leakage energy reduction by power gating. We consider the virtual power/ground rails clamp (VRC) and multithreshold CMOS (MTCMOS) techniques and apply VRC to memory-based units for data retention and MTCMOS to the other units. We propose a systematic methodology for leakage reduction at the microarchitecture level, in which profiling of idle period distribution and ideal power gating analysis are used to select a target component for realistic power gating. We show that the ideal leakage energy reduction can be up to 30% of the total energy for the modern high-performance very long instruction word processors we study and that the secondary level (L2) cache contributes most to the reduction. We further improve the existing adaptive cache decay method for leakage reduction by using VRC for data retention and name it VRC decay . Applied to L2 cache, the VRC decay, on average, increases performance by 5.6% and reduces system energy by 24.1%, compared to the adaptive cache decay without data retention.  相似文献   

15.
Coarse-grained reconfigurable architectures (CGRAs) require many processing elements (PEs) and a configuration memory unit (configuration cache) for reconfiguration of its PE array. Although this structure is meant for high performance and flexibility, it consumes significant power. Specially, power consumption by configuration cache is explicit overhead compared to other types of intellectual property (IP) cores. Reducing power is very crucial for CGRA to be more competitive and reliable processing core in embedded systems. In this paper, we propose a reusable context pipelining (RCP) architecture to reduce power-overhead caused by reconfiguration. It shows that the power reduction can be achieved by using the characteristics of loop pipelining, which is a multiple instruction stream, multiple data stream (MIMD)-style execution model. RCP efficiently reduces power consumption in configuration cache without performance degradation. Experimental results show that the proposed approach saves much power even with reduced configuration cache size. Power reduction ratio in the configuration cache and the entire architecture are up to 86.33% and 37.19%, respectively, compared to the base architecture.  相似文献   

16.
Multithreshold CMOS (MTCMOS) circuits reduce standby leakage power with low delay overhead. Most MTCMOS designs cut off the power to large blocks of logic using large sleep transistors. Locally distributing sleep devices has remained less popular even though it has several advantages described in this paper. However, locally placed sleep devices are only feasible if sneak leakage currents are prevented. This paper makes two contributions to leakage reduction. First, we examine the causes of sneak leakage paths and propose a design methodology that enables local insertion of sleep devices for sequential and combinational circuits. A set of design rules allows designers to prevent most sneak leakage paths. A fabricated 0.13-/spl mu/m, dual V/sub T/ test chip employs our methodology to implement a low-power FPGA architecture with gate-level sleep FETs and over 8/spl times/ measured standby current reduction. Second, we describe the implementation and benefits of local sleep regions in our design and examine the interfacing issues for this technique. Local sleep regions reduce leakage in unused circuit components at a local level while the surrounding circuits remain active. Measured results show that local sleep regions reduce leakage in active configurable logic blocks (CLBs) on our chip by up to 2.2/spl times/ (measured) based on configuration.  相似文献   

17.
Most microprocessors employ the on-chip caches to bridge the performance gap between the processor and the main memory. However, the cache accesses usually contribute significantly to the total power consumption of the chip. Based on the observation that an overwhelming majority of the values written to the cache are "0", in this paper we propose a zero-aware SRAM cell with an asymmetric inverter pair, called ZA cell, to minimize the cache power consumption in writing "0". The ZA cell uses a circuit-level technique, which is software independent and orthogonal to other low-power techniques at architecture-level. Compared to the conventional SRAM cell, the experimental results based on the SPEC2000 and MediaBench traces show that without compromise of both performance and stability, the ZA cell can reduce the average cache write power consumption over 60% for both the baseline instruction and data caches. In particular, the ZA cell is attractive in the data caches, which reveal the high write-zero rate.  相似文献   

18.
本文提出了一种直接映象式高速缓存块冲突预测方法,即借助于高缓块的最近替换行为动态预测冲突发生.基于该方法,我们设计了一种高缓结构-冲突预测高缓,主体为一个直接映象式高缓和一个较小的全相联高缓,利用冲突预测表实行高缓块的动态分配.应用于片上数据高缓的SPEC95 仿真结果表明,与16kB直接映象式高缓相比,(8+1)kB冲突预测高缓命中率平均可提高12.2%,与类似结构的高缓(如NTS高缓、PCS高缓等)相比降低了硬件开销,简化了控制机构,易于实现,并且在命中率和总线交通量等性能方面都有所提高.  相似文献   

19.
Soft errors issues in low-power caches   总被引:1,自引:0,他引:1  
As technology scales, reducing leakage power and improving reliability of data stored in memory cells is both important and challenging. While lower threshold voltages increase leakage, lower supply voltages and smaller nodal capacitances reduce energy consumption but increase soft errors rates. In this work, we present a comprehensive study of soft error rates on low-power cache design. First, we study the effect of circuit level techniques, used to reduce the leakage energy consumption, on soft error rates. Our results using custom designs show that many of these approaches may increase the soft error rates as compared to a standard 6T SRAM. We also validate the effects of voltage scaling on soft error rate by performing accelerated tests on off-the-shelf SRAM-based chips using a neutron beam source. Next, we study the impact of cache decay and drowsy cache, which are two commonly used architectural-level leakage reduction approaches, on the cache reliability. Our results indicate that the leakage optimization techniques change the reliability of cache memory. More importantly, we demonstrate that there is a tradeoff between optimizing for leakage power and improving the immunity to soft error. We also study the impact of error correcting codes on soft error rates. Based on this study, we propose an adaptive error correcting scheme to reduce the leakage energy consumption and improve reliability.  相似文献   

20.
In this paper, we make the case for building high-performance asymmetric-cell caches (ACCs) that employ recently-proposed asymmetric SRAMs to reduce leakage proportionally to the number of resident zero bits. Because ACCs target memory value content (independent of cell activity and access patterns), they complement prior proposals for reducing cache leakage that target memory access characteristics. Through detailed simulation and leakage estimation using a commercial 0.13-$mu$m CMOS process model, we show that: 1) on average 75% of resident data cache bits and 64% of resident instruction cache bits are zero; 2) while prior research carefully evaluated the fraction of accessed zero bytes, we show that a high fraction of accessed zero bytes is neither a necessary nor a sufficient condition for a high fraction of resident zero bits; 3) the zero-bit program behavior persists even when we restrict our attention to live data, thereby complementing prior leakage-saving techniques that target inactive cells; and 4) ACCs can reduce leakage on the average by 4.3$times$compared to a conventional data cache without any performance loss, and by 9$times$at the cost of a 5% increase in overall cache access latency.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号