1.

Zhou Hongwei, Zhang Chengyi, Zhang Minxuan 《Journal of Computer Research and Development》2008, 45(2): 367-374

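As process feature sizes shrink, leakage power is becoming one of the main constraints on microprocessor design. Sleep Cache and Drowsy Cache are two important techniques for reducing cache leakage power. The statistics-based cache leakage power estimation method (SB_CLPE) estimates leakage power for either a Sleep Cache or a Drowsy Cache, and a cache architecture designed around it can estimate cache leakage power in real time while a program runs. By collecting statistics on the access intervals of all cache blocks, SB_CLPE estimates the leakage power the cache would consume under different decay intervals and thereby finds the optimal decay interval, the one that minimizes cache leakage power. Experiments show that, for a Sleep Cache, the SB_CLPE estimates deviate from the results of the HotLeakage leakage power simulator by only 3.16% on average, and the optimal decay intervals obtained by the two agree well. A cache architecture using SB_CLPE can therefore estimate the optimal decay interval in real time during program execution and adjust the decay interval dynamically to achieve the best power reduction.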
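The idea of choosing a decay interval from access-interval statistics can be illustrated with a minimal sketch. This is not the SB_CLPE implementation from the paper: the energy model, the parameter values (`p_active`, `p_sleep`, `e_wake`), and the function names are assumptions made purely for illustration. An idle interval no longer than the decay interval is charged at full leakage; a longer one is charged at full leakage until the line decays, at reduced leakage afterwards, plus a fixed wake-up cost.

```python
from collections import Counter


def estimate_leakage(interval_hist, decay, p_active=1.0, p_sleep=0.05, e_wake=500.0):
    """Estimate leakage energy for one candidate decay interval.

    interval_hist maps an access-interval length (cycles) to how often it occurred.
    All power and energy values are hypothetical units, not figures from the paper.
    """
    total = 0.0
    for length, count in interval_hist.items():
        if length <= decay:
            # The line is re-accessed before it decays: full leakage throughout.
            total += count * p_active * length
        else:
            # Full leakage until decay, low leakage afterwards, then one wake-up cost.
            total += count * (p_active * decay + p_sleep * (length - decay) + e_wake)
    return total


def best_decay_interval(interval_hist, candidates, **model):
    """Return the candidate decay interval with the lowest estimated leakage."""
    return min(candidates, key=lambda d: estimate_leakage(interval_hist, d, **model))


# Hypothetical histogram: many short intervals, a few very long ones.
hist = Counter({10: 1000, 100: 200, 10_000: 50})
candidates = [2 ** k for k in range(3, 15)]  # 8, 16, ..., 16384 cycles
best = best_decay_interval(hist, candidates)
print(best)
```

With this made-up histogram, the chosen interval is the smallest candidate that still covers the short and medium intervals, since putting frequently reused lines to sleep costs more in wake-ups than it saves.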

2.

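Research on a Power Calculation Method for Reconfigurable Caches in Multi-core Processors

《Computer Science》2014, (Z1)

Multi-core dynamically reconfigurable caches are an important way to address the cache power problem. Existing cache power simulators do not support research on the power of multi-core dynamically reconfigurable caches well. By studying the power model of such caches, the authors derive an approach for calculating reconfigurable cache power and use CACTI to build a power model for each component structure of the cache, so that the power of the reconfigurable cache can be measured fairly accurately. A dynamically reconfigurable cache is built in the Simics simulator and test programs are run on it; compared with a conventional architecture, cache power is reduced by 10.4%. The experiments also show that the power reduction comes not from the dynamically reconfigurable cache alone but from the system as a whole. Low-power design should therefore weigh the power and performance of the entire system rather than the cache structure in isolation, since a one-sided focus on the cache can raise overall power.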

3.

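Research on a Low-Power Control Strategy for Data Caches Based on Subthreshold Leakage Current

Zhao Shifan, Fan Xiaoya, Li Yufa 《Computer Measurement & Control》2010, 18(3)

As process technology scales and processor frequencies rise, cache power has become a major source of processor power, and the share of the data cache's subthreshold leakage power in total power is also growing. The paper proposes a control strategy that lowers the power of the whole data cache by reducing the subthreshold leakage power of cache lines that are not being accessed. The strategy periodically supplies a low voltage to all cache lines, which reduces the subthreshold leakage current of the SRAM cells. When a line is accessed, it is supplied with the normal voltage until the next periodic switch to low voltage. Simulation results show that the strategy significantly reduces the subthreshold leakage power of the data cache at a small cost in hardware and access latency.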
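The periodic policy described in this abstract can be sketched as a small trace-driven model. The function name, the trace format, and the assumption that every line starts each period at low voltage (including at cycle 0) are illustrative choices, not details taken from the paper.

```python
def simulate_periodic_drowsy(accesses, num_lines, period, total_cycles):
    """Model a cache whose lines all drop to low voltage at every period boundary.

    accesses: iterable of (cycle, line_index) pairs with 0 <= cycle < total_cycles.
    A line wakes on its first access in a period and stays at normal voltage
    until the next boundary (hypothetical model for illustration).
    """
    first_touch = {}  # (period index, line) -> cycle of the first access in that period
    for cycle, line in accesses:
        key = (cycle // period, line)
        first_touch[key] = min(first_touch.get(key, cycle), cycle)

    awake_cycles = 0
    for (window, _line), cycle in first_touch.items():
        window_end = min((window + 1) * period, total_cycles)
        awake_cycles += window_end - cycle  # awake from first access to period end

    line_cycles = num_lines * total_cycles
    return {
        "wakeups": len(first_touch),  # each first access pays one wake-up delay
        "drowsy_fraction": 1 - awake_cycles / line_cycles,
    }


# Hypothetical trace: 4 lines, low-voltage reset every 10 cycles, 30 cycles total.
trace = [(5, 0), (7, 0), (12, 1), (25, 0)]
result = simulate_periodic_drowsy(trace, num_lines=4, period=10, total_cycles=30)
print(result)
```

The two reported numbers correspond to the trade-off the abstract mentions: `drowsy_fraction` stands in for the leakage saving, and `wakeups` for the added access latency.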

4.

Research on Static Power Optimization Techniques for On-Chip L2 Caches

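Zhang Chengyi, Zhang Minxuan 《Computer Engineering & Science》2007, 29(3): 77-79

As integrated circuit manufacturing enters the ultra-deep-submicron era, static power accounts for an ever larger share of total microprocessor power, especially in the on-chip L2 cache. Beyond developing new low-leakage process and circuit techniques, controlling and optimizing static power at the architecture level has become an active research topic in the field. This paper proposes an ADSR algorithm that substantially reduces the static power of the L2 cache without affecting processor performance.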

5.

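Research on Leakage Power Optimization Techniques for Set-Associative Caches

Zhang Chengyi, Zhang Minxuan, Xing Zuocheng 《Journal of Chinese Computer Systems》2007, 28(2): 372-375

As integrated circuit manufacturing enters the ultra-deep-submicron era, leakage power accounts for an ever larger share of total microprocessor power. Beyond developing new low-leakage process and circuit techniques, controlling and optimizing leakage power at the architecture level has become an active research topic in the field. The cache occupies the largest area in a microprocessor and is the primary target for leakage control and optimization. This paper proposes an LRU-assist algorithm that reuses the existing LRU information; without affecting processor performance, it achieves an average cache turn-off ratio of up to 53%, greatly reducing leakage power.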

6.

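Hybrid Instruction Cache Design Based on SRAM and STT-RAM

Huangfu Xiaoyan, Fan Xiaoya, Huang Xiaoping 《Computer Engineering and Applications》2015, 51(12): 43-48

As process feature sizes shrink, the leakage power of conventional SRAM-based on-chip caches grows exponentially, which hinders increases in on-chip cache capacity. Building on the victim cache principle, the authors design a hybrid instruction cache based on SRAM and STT-RAM that exploits the fast writes of SRAM and the non-volatility, high density, and very low leakage power of STT-RAM. Experiments show that, compared with a conventional SRAM-based instruction cache, the hybrid instruction cache increases instruction cache capacity without increasing its area and significantly improves the instruction cache hit rate.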

7.

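Architecture-Level Power Estimation and Analysis of Fully Associative Caches

Wang Yongwen, Zhang Minxuan 《Computer Engineering and Applications》2003, 39(26): 21-23, 27

The cache is one of the most energy-consuming components in modern microprocessors. The paper studies the organization of fully associative caches, presents an architecture-level power estimation model for them, validates the model, and quantitatively analyzes the power characteristics of the fully associative cache organization.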

8.

Architecture-Level Power Model and Analysis of a Reconfigurable Processor

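Xiao Wei, Zang Binyu, Zhu Chuanqi 《Computer Engineering and Applications》2007, 43(26): 34-37

A power model is built and implemented according to the architecture of a reconfigurable processor. The model abstracts the circuit-level characteristics of the processor, estimates static peak power from architecture-level attributes and process parameters, collects dynamic power statistics from a performance simulator, and implements gating under three conditional clocking schemes. Compared with a superscalar general-purpose microprocessor, the reconfigurable processor achieves an average speedup of 3.59 in performance, while its power grows by an average factor of only 1.48. The experiments also show that the simple CC1 gating technique effectively reduces the power and hardware complexity of the reconfigurable system. The model lays a foundation for low-power design of reconfigurable processors and for research on compiler-level low-power optimization.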

9.

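Design of a Sliding Cache Mechanism for Embedded Systems

He Qingsong, Deng Chao, Qiu Zhi 《Microcontrollers & Embedded Systems》2015, 15(3)

To improve cache utilization in embedded systems, and because different types of applications place different real-time demands on instruction and data cache capacity, the paper proposes a sliding cache organization. It balances instruction and data cache demands and dynamically adjusts the capacity and configuration of the L1 cache. The sliding cache structure reduces not only the dynamic power and static leakage power of the L1 cache but also the dynamic power of the whole processor. Simulation results show that the scheme effectively reduces cache power while improving overall cache performance.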

10.

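A Power Estimation Model for Multi-core Processors

Liu Xin, Shen Li, Su Bo, Wang Zhiying 《Journal of Software》2015, 26(7): 1840-1852

Accurate power estimation provides effective guidance for operating system scheduling and for software/hardware energy-efficiency optimization. Previous studies have shown that power can be estimated by monitoring relevant hardware events inside the processor, such as the number of committed instructions and the number of cache accesses. However, the accuracy of existing power models of this kind is unsatisfactory, with errors typically above 5%. By analyzing the hardware events the processor exposes and selecting from them a set of events closely correlated with a program's run-time power, the authors use stepwise multiple linear regression to build an application-independent real-time power estimation model, which can be ported directly to platforms that support SMT. The model is validated on the PARSEC and SPLASH2 benchmark suites, with estimation errors of 3.01% and 1.99% respectively. To address the long modeling time, an optimization based on two-step clustering is proposed. The estimation model can serve as a reference for building intelligent power-aware systems with dynamic power balancing and smoothed peak power.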
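The core of an event-based power model can be illustrated with a small regression sketch. It fits ordinary least squares on a fixed, pre-chosen set of events; the stepwise event selection and the two-step clustering described in the abstract are not reproduced. The event names, sample values, and function names are assumptions for illustration only.

```python
def fit_linear_power_model(event_rates, power):
    """Fit power ~ w0 + sum(w_i * event_i) by ordinary least squares.

    event_rates: list of samples, each a list of hardware-event rates.
    power: measured power for each sample.
    Solves the normal equations with Gauss-Jordan elimination (pure Python).
    """
    rows = [[1.0] + list(sample) for sample in event_rates]  # prepend intercept term
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * y for r, y in zip(rows, power))]
        for i in range(n)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(a[k][col]))  # partial pivoting
        a[col], a[pivot] = a[pivot], a[col]
        for k in range(n):
            if k != col:
                factor = a[k][col] / a[col][col]
                a[k] = [x - factor * p for x, p in zip(a[k], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


def estimate_power(weights, events):
    """Apply the fitted model to one sample of event rates."""
    return weights[0] + sum(w * e for w, e in zip(weights[1:], events))


# Synthetic samples: (instructions per cycle, cache accesses per kilo-cycle).
samples = [[1.0, 4.0], [2.0, 1.0], [0.5, 8.0], [3.0, 2.0]]
measured = [14.0, 14.5, 15.0, 17.0]  # generated from 10 + 2*ipc + 0.5*cache
weights = fit_linear_power_model(samples, measured)
print(weights, estimate_power(weights, [2.0, 2.0]))
```

Because the synthetic measurements are generated from an exact linear rule, the fit recovers that rule; real counter data would leave a residual error, which is what figures such as the reported 3.01% and 1.99% quantify.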

11.

Performance linked dynamic cache tuning: A static energy reduction approach in tiled CMPs

《Microprocessors and Microsystems》2017

Advances in semiconductor technology increase the power density of recent Chip Multi-Processors (CMPs), which significantly increases the leakage energy consumption of on-chip Last Level Caches (LLCs). Performance-linked dynamic tuning of LLC size is a promising option for reducing cache leakage. This paper reduces static power consumption by dynamically shutting down or turning on cache banks based on system performance and cache bank usage statistics. Shutting down a cache bank remaps its future requests to another active bank, called the target bank. The proposed method is evaluated under three implementation policies: (1) the system can decide to shut down or turn on some cache banks periodically throughout process execution; (2) the system allows banks to be shut down initially, and once bank restarting begins, no further shutdown is permitted; (3) the cache is resized as in the first policy, but with some predefined time slices during which it cannot be resized. For a 4MB 4-way set-associative L2 cache, experimental analysis shows a 66% reduction in static energy with a 29% gain in Energy Delay Product (EDP) for the first policy; for the second policy, static power is reduced by 59% with 27% savings in EDP. Finally, the last policy saves 65% in static power and 30% in EDP with minimal performance penalty.

12.

A leakage-aware L2 cache management technique for producer-consumer sharing in low-power chip multiprocessors

Hyunhee Kim, Jihong Kim 《Journal of Parallel and Distributed Computing》2011, 71(12): 1545-1557

This paper proposes a novel leakage management technique for applications with producer-consumer sharing patterns. Although previous research has proposed leakage management techniques by turning off inactive cache blocks, these techniques can be further improved by exploiting the various run-time characteristics of target applications in CMPs. By exploiting particular access sequences observed in producer-consumer sharing patterns and the spatial locality of shared buffers, our technique enables a more aggressive turn-off of L2 cache blocks of these buffers. Experimental results using a CMP simulator show that our proposed technique reduces the energy consumption of on-chip L2 caches, a shared bus, and off-chip memory by up to 31.3% over the existing cache leakage power management techniques with no significant performance loss.

13.

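A Fast Leakage Current Simulator for Deep-Submicron CMOS Circuits (cited by: 2; self-citations: 0; citations by others: 2)

Xu Yongjun, Chen Zhiguo, Luo Zuying, Li Xiaowei 《Journal of Computer Research and Development》2004, 41(5): 880-885

As process technology advances, power consumption has become a key problem in the design of large-scale integrated circuits. Lowering the supply voltage is a very effective way to reduce a circuit's dynamic power, but to preserve system performance the threshold voltage of the devices must be lowered accordingly, which in turn makes static power grow exponentially. In deep-submicron processes, leakage power has become comparable to dynamic power, so fast leakage power simulators are as urgently needed as low-power and low-leakage techniques. Accurate simulators such as HSPICE can estimate leakage power precisely but are suitable only for small circuits. The paper first confirms that the stack effect is present in both CMOS transistors and basic logic gates, then proposes the leakage model of a fast simulator, and finally shows through experiments on the ISCAS85 and ISCAS89 benchmark circuits that, within acceptable accuracy (error no more than 3%), the simulator achieves a speedup of hundreds of times and also avoids the memory explosion problem of accurate simulators.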

14.

FILESPPA: Fast Instruction Level Embedded System Power and Performance Analyzer

Nikolaos Kroupis, Dimitrios Soudris 《Microprocessors and Microsystems》2011, 35(3): 329-342

In low-power embedded system design, it is important to analyze and optimize both the hardware and the software components of the system. Evaluating the power consumption of an embedded system by running instruction-level power models inside a simulator is a very slow procedure. Moreover, a huge number of simulations is needed to explore power consumption across the instruction memory hierarchy and find the best cache parameters for each level of the hierarchy. In this paper we present a methodology that aims to estimate system power consumption in a short time, without simulation. The proposed methodology is based on fast instruction analysis using instruction-level power models together with cache memory and memory power models. Based on the proposed methodology, a software tool named FILESPPA was developed to automate the methodology's steps for MIPS processor architectures. The experimental results show the efficiency of the proposed methodology and tool in terms of estimation accuracy and the reduction in system power estimation time relative to the simulation technique.

15.

Leakage current estimation of CMOS circuit with stack effect

Yong-Jun Xu, Zu-Ying Luo, Xiao-Wei Li, Li-Jian Li, Xian-Long Hong 《Journal of Computer Science and Technology》2004, 19(5): 0-0

The leakage current of CMOS circuits increases dramatically as technology scales down and has become a critical issue for high-performance systems. Subthreshold, gate, and reverse-biased junction band-to-band tunneling (BTBT) leakages are considered the three main determinants of total leakage current. Up to now, how to accurately estimate the leakage current of large-scale circuits within an acceptable time remains unsolved, even though accurate leakage models have been widely discussed. In this paper, the authors first examine the stack effect in CMOS technology and propose a new, simple gate-level leakage current model. Then, a table-lookup based total leakage current simulator is built according to the model. To validate the simulator, accurate leakage current is simulated at circuit level using the popular simulator HSPICE for comparison. Some further studies, such as maximum leakage current estimation, minimum leakage current generation, and a high-level average leakage current macromodel, are introduced in detail. Experiments on the ISCAS85 and ISCAS89 benchmarks demonstrate that the two proposed leakage current estimation methods are very accurate and efficient.

16.

Analysis and Implementation of a Cycle-Accurate Power Model for SPM

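Hu Zhigang, Zhao Qingfu, Jiang Xiangtao 《Computer Engineering and Applications》2010, 46(2): 63-65

Power consumption is one of the bottlenecks limiting the development of embedded devices. In embedded systems, SPM (Scratch-Pad Memory) is used in place of a cache to reduce the overall power of the embedded processor. The paper proposes a cycle-accurate power model for SPM. The model extends the SimpleScalar simulator to simulate the accesses a program makes to the SPM during execution and so obtain the circuit input states, then computes SPM power using the cycle-accurate SPM power model integrated into the simulator. The model overcomes the poor scalability of circuit-level models: by configuring the relevant parameters in SimpleScalar, it can simulate the power of SPMs of different sizes and structures. Experiments show that the model simulates SPM power accurately (error no more than 10%). It offers useful guidance for low-power SPM design and optimization.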

17.

An on-chip instruction cache design with one-bit tag for low-power embedded systems

Ji Gu, Hui Guo, Patrick Li 《Microprocessors and Microsystems》2011, 35(4): 382-391

The on-chip instruction cache is a potentially power-hungry component in embedded systems due to its large chip area and high access frequency. Aiming at reducing the power consumption of the on-chip cache, we propose a Reduced One-Bit Tag Instruction Cache (ROBTIC), where the cache size is judiciously reduced and the cache tag field contains only the least significant bit of the full tag. We develop a cache operational control scheme for ROBTIC so that, with the one-bit cache tag, program locality can still be efficiently exploited. For applications where most of the memory accesses are localized, our cache can achieve performance similar to a traditional full-tag cache; however, the power consumption of the cache can be significantly reduced due to the much smaller cache size, narrower tag array (just one bit), and smaller tag comparison circuit. Experiments on a set of benchmarks implemented in a CMOS 180 nm process technology demonstrate that our proposed design can reduce dynamic power consumption by up to 27.3% and area by 30.9% relative to the traditional cache when the cache size is fixed at 32 instructions, which outperforms the existing partial-tag based cache design. With cache size customization, a further 47.8% power saving can be achieved. Our experimental results also show that, when implemented in deep sub-micron technologies where leakage power is not negligible, our design is still efficient: a consistent power saving trend (about 22%) has been observed for technologies from 130 nm down to 65 nm.