Similar Literature
20 similar documents retrieved (search time: 140 ms).
1.
To improve Cache utilization in embedded systems, and noting that different types of applications place different real-time capacity demands on the instruction and data Caches, a sliding Cache organization scheme is proposed. It balances instruction and data Cache demands and dynamically adjusts the capacity and configuration of the level-one Cache. The sliding Cache structure not only reduces the dynamic and static leakage power of the level-one Cache, but also lowers the dynamic power of the whole processor. Simulation results show that the scheme effectively reduces Cache power while improving overall Cache performance.
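The abstract does not spell out the adjustment policy, so the following is only a minimal Python sketch of one plausible way a "sliding" controller could rebalance L1 capacity between the instruction and data sides from observed miss rates; the way counts, minimum sizes, and interval-based trigger are assumptions for illustration, not details from the paper.

```python
# Minimal sketch of a capacity-balancing controller in the spirit of a
# "sliding" L1 cache: ways are periodically reassigned between the
# instruction and data sides according to observed miss rates.
# All parameters below are illustrative assumptions.

MIN_WAYS = 2            # never shrink either side below this

def rebalance(i_ways, d_ways, i_misses, i_accesses, d_misses, d_accesses):
    """Return new (i_ways, d_ways) after one monitoring interval."""
    i_rate = i_misses / max(i_accesses, 1)
    d_rate = d_misses / max(d_accesses, 1)
    if i_rate > d_rate and d_ways > MIN_WAYS:
        i_ways, d_ways = i_ways + 1, d_ways - 1   # slide one way to the I-side
    elif d_rate > i_rate and i_ways > MIN_WAYS:
        i_ways, d_ways = i_ways - 1, d_ways + 1   # slide one way to the D-side
    return i_ways, d_ways

# Example: a data-heavy interval pulls capacity toward the data cache.
print(rebalance(4, 4, i_misses=10, i_accesses=1000, d_misses=80, d_accesses=1000))
```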

2.
A Low-Power, High-Performance Sliding Cache Scheme (total citations: 2; self-citations: 0; by others: 2)
Cache memory accounts for the major part of total chip power consumption. Noting that different types of applications place different real-time capacity demands on the instruction and data Caches, a sliding Cache organization scheme is proposed. It balances instruction and data Cache demands, dynamically adjusts the capacity and configuration of the level-one Cache, and eliminates the power consumed by idle portions of the Cache. SPEC95 simulation results show that the sliding Cache structure not only reduces the dynamic and static leakage power of the level-one Cache, but also lowers the dynamic power of the whole processor and improves performance. Compared with two conventional Cache structures and the DRI structure, the sliding Cache reduces average level-one Cache dynamic power by 21.3%, 19.52%, and 20.62%, respectively; reduces average processor dynamic power by 8.84%, 8.23%, and 10.31%; and improves the average energy-delay product by 12.25%, 7.02%, and 13.39%.

3.
As process feature sizes shrink, leakage power has gradually become one of the main constraints on microprocessor design. Sleep Cache and Drowsy Cache are two important techniques for reducing Cache leakage power. The statistics-based Cache leakage power estimation method (SB_CLPE) estimates the leakage power of a Sleep Cache or Drowsy Cache, and a Cache architecture designed around it can estimate Cache leakage power in real time during program execution. By collecting statistics on the access intervals of all Cache blocks, SB_CLPE can estimate the Cache leakage power under different decay intervals and thus determine the optimal decay interval that minimizes leakage power. Experiments show that, for a Sleep Cache, the leakage estimates of SB_CLPE deviate from the results of the HotLeakage leakage power simulator by only 3.16% on average, and the optimal decay intervals obtained also agree well. A Cache architecture using SB_CLPE can estimate the optimal decay interval in real time during program execution and dynamically adjust the decay interval to achieve the best power reduction.
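As a rough illustration of how per-block access-interval statistics can drive decay-interval selection, the sketch below evaluates candidate decay intervals against a list of inter-access gaps. The two-level power model and all numeric values are assumptions; the paper's SB_CLPE model is more detailed and calibrated.

```python
# Minimal sketch of estimating cache leakage energy for different decay
# intervals from per-block access-gap statistics, in the spirit of a
# statistics-based estimator such as SB_CLPE. Power numbers and the simple
# "awake until T cycles after the last access" model are assumptions.

P_ACTIVE = 1.0      # relative leakage power of an awake line
P_SLEEP  = 0.05     # relative leakage power of a decayed (sleep/drowsy) line

def leakage_energy(access_gaps, decay_interval):
    """access_gaps: cycles between consecutive accesses to one block."""
    energy = 0.0
    for gap in access_gaps:
        awake = min(gap, decay_interval)       # full leakage until decay fires
        asleep = gap - awake                   # reduced leakage afterwards
        energy += awake * P_ACTIVE + asleep * P_SLEEP
    return energy

def best_decay_interval(access_gaps, candidates):
    return min(candidates, key=lambda t: leakage_energy(access_gaps, t))

gaps = [30, 5000, 12, 20000, 45, 800]          # example inter-access gaps (cycles)
print(best_decay_interval(gaps, candidates=[64, 512, 4096, 32768]))
```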

4.
The use of Caches in embedded processors greatly improves performance, but the Cache, and especially the instruction Cache, accounts for a large fraction of processor power. Disabling unnecessary tag SRAM and data SRAM accesses can therefore reduce power substantially. A pipelined instruction Cache access mechanism is proposed that disables unnecessary data SRAM accesses; in addition, by recording information about instruction Cache lines and predicting the next line, a sliding window of Cache lines is formed that disables unnecessary tag SRAM accesses. The proposed method incurs no performance loss; power analysis under the SMIC 90 nm process shows a 50% reduction in instruction-access power.
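A hedged sketch of the tag-skipping idea follows: if consecutive fetches stay within the instruction cache line verified by the last tag lookup, the tag SRAM need not be read again. The paper additionally predicts the next line to widen the window; the sketch below omits that and uses an assumed 32-byte line size.

```python
# Minimal sketch of skipping tag-array lookups when successive fetches stay
# inside the most recently verified instruction cache line. The single-entry
# "window" and line size are illustrative assumptions.

LINE_BYTES = 32

class LineWindow:
    def __init__(self):
        self.current_line = None          # line verified by the last tag lookup

    def fetch(self, addr):
        line = addr // LINE_BYTES
        if line == self.current_line:
            return "skip tag lookup"      # sequential fetch within the verified line
        self.current_line = line          # new line: pay for a tag + data lookup
        return "full tag + data lookup"

w = LineWindow()
for a in (0x1000, 0x1004, 0x1008, 0x1020, 0x2000):
    print(hex(a), "->", w.fetch(a))
```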

5.
With shrinking process feature sizes and rising processor frequencies, Cache power has become a major component of processor power, and the share of sub-threshold leakage power in the data Cache keeps growing. A control policy is proposed that lowers overall data Cache power by reducing the sub-threshold leakage of Cache lines that are not being accessed. The policy periodically places all Cache lines under a low supply voltage, reducing the sub-threshold leakage of the SRAM cells; when a line is accessed, normal voltage is restored until the next periodic switch to the low voltage. Simulation results show that this policy significantly reduces the sub-threshold leakage power of the data Cache at little cost in hardware and access latency.
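To make the policy concrete, here is a small simulation sketch of the periodic low-voltage scheme described above: a global sweep puts every line into the low-voltage state each period, and an access restores normal voltage at the cost of a short wake-up penalty. Period length, penalty, and relative power values are illustrative assumptions, not values from the paper.

```python
# Minimal sketch simulating a periodic drowsy policy: every PERIOD cycles
# all lines drop to a low (drowsy) voltage; an access to a drowsy line pays
# a wake-up penalty and restores normal voltage until the next sweep.

PERIOD = 2000                   # cycles between global drowsy sweeps
WAKE_PENALTY = 1                # extra cycles to restore full voltage on access
P_NORMAL, P_DROWSY = 1.0, 0.1   # relative leakage power per line

def simulate(accesses, n_lines, total_cycles):
    """accesses: list of (cycle, line_index); returns (leakage_energy, extra_cycles)."""
    drowsy = [False] * n_lines
    energy, extra = 0.0, 0
    acc = iter(sorted(accesses))
    nxt = next(acc, None)
    for t in range(total_cycles):
        if t % PERIOD == 0:
            drowsy = [True] * n_lines          # periodic sweep to low voltage
        while nxt and nxt[0] == t:
            _, line = nxt
            if drowsy[line]:
                extra += WAKE_PENALTY          # wake the line before using it
                drowsy[line] = False
            nxt = next(acc, None)
        energy += sum(P_DROWSY if d else P_NORMAL for d in drowsy)
    return energy, extra

print(simulate([(10, 0), (2500, 0), (2600, 1)], n_lines=4, total_cycles=6000))
```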

6.
Properly organizing a multi-level cache hierarchy is an effective way to reduce memory access latency. This paper presents a Cache unit architecture for a 32-bit superscalar microprocessor, discusses the key techniques in the design of the level-one and level-two Caches, and describes the implementation of the Cache coherence protocol, meeting the design requirements of the "龙腾" R2 microprocessor chip. The chip is implemented in a 0.18 μm CMOS process within a die area of 4.1 mm × 4.1 mm; the processor core runs above 233 MHz and dissipates less than 1.5 W.

7.
A Low-Power Data Cache Scheme Based on Very Narrow Values (total citations: 2; self-citations: 0; by others: 2)
Reducing power consumption has become one of the most important design issues today. Modern microprocessors use on-chip Caches to bridge the large speed gap between main memory and the CPU, but the Cache has also become a major source of processor power, making low-power Cache design increasingly important. Very narrow values (VNV), which can be stored in only a few bits, account for a large fraction of both Cache storage and Cache accesses. Based on this observation, a low-power Cache structure based on very narrow values (VNVC) is proposed. In VNVC, the data array is split into a low-order sub-array and a high-order sub-array; under the control of a flag bit, the high-order cells used to hold very narrow values are turned off to save both dynamic and static power. VNVC achieves low power purely by modifying the data array, requires no extra auxiliary hardware, and does not affect the performance of the original Cache, so it suits various Cache organizations. Simulation with 12 SPEC2000 benchmarks shows that a very-narrow-value width of 4 bits achieves the highest savings: on average 29.85% of dynamic power and 29.94% of static power.
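The following sketch illustrates the very-narrow-value test itself: a 32-bit word qualifies if it sign-extends from its low 4 bits, in which case the high-order sub-array cells holding it could be gated off. The 4-bit width follows the abstract; the helper function and example values are otherwise hypothetical.

```python
# Minimal sketch of the very-narrow-value (VNV) test: a word whose value is
# fully captured by its low-order bits (sign-extended) can be kept with the
# high-order sub-array powered off.

WORD_BITS, VNV_BITS = 32, 4

def is_vnv(value):
    """True if the 32-bit two's-complement value sign-extends from VNV_BITS bits."""
    value &= (1 << WORD_BITS) - 1
    low = value & ((1 << VNV_BITS) - 1)
    # Sign-extend the low field back to WORD_BITS and compare with the original.
    if low & (1 << (VNV_BITS - 1)):
        low |= ((1 << WORD_BITS) - 1) ^ ((1 << VNV_BITS) - 1)
    return low == value

words = [0, 1, -3 & 0xFFFFFFFF, 7, 8, 123456]
flags = [is_vnv(w) for w in words]
print(flags)                       # [True, True, True, True, False, False]
# Fraction of words whose high-order storage could be gated off:
print(sum(flags) / len(flags))
```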

8.
Dynamically reconfigurable Caches for multi-core processors are an important way to address the Cache power problem, but existing Cache power simulators do not support research on their power consumption well. By studying the power model of a multi-core dynamically reconfigurable Cache, a method for computing its power was derived: CACTI is used to build separate power models for each constituent structure so that the power of the reconfigurable Cache can be estimated fairly accurately. A dynamically reconfigurable Cache was built in the Simics simulator and benchmarks were run; compared with the conventional architecture, the reconfigurable Cache reduces power by 10.4%. The experiments also show that the power reduction does not come from the dynamically reconfigurable Cache alone but from the system as a whole, so low-power design must consider the power and performance of the overall system; focusing narrowly on the Cache structure can instead increase total power.

9.
As process feature sizes shrink, the leakage power of conventional SRAM-based on-chip Caches grows exponentially, which hinders further increases in on-chip Cache capacity. Based on the victim-Cache principle, a hybrid instruction Cache combining SRAM and STT-RAM is designed, exploiting the fast write speed of SRAM and the non-volatility, high density, and extremely low leakage power of STT-RAM. Experiments show that, compared with a conventional SRAM-based instruction Cache, the hybrid instruction Cache increases capacity without increasing area and significantly improves the instruction Cache hit rate.

10.
This paper proposes a statistics-based model for estimating Cache leakage power. By collecting statistics on Cache access intervals, the model estimates the Cache leakage power under different decay intervals. A Cache leakage power simulator designed from this model gives estimates that deviate from the HotLeakage leakage power simulator by less than 3.46% on average. The model can be applied to Sleep Cache and Drowsy Cache to estimate the leakage power ratio under different decay intervals and to select the optimal decay interval, minimizing Cache leakage power.

11.
In the last decade, computer engineers have faced changes in the way microprocessors are designed. New microprocessors must not only be faster than the previous generation but also remain feasible in terms of energy consumption and thermal dissipation. Recently, a new challenge has appeared for computer engineers: static power consumption. As process technology advances toward deep submicron, the static power component becomes a serious problem, especially for large on-chip array structures such as caches or prediction tables, and it must be taken into consideration. Leakage power can be reduced in two different ways: we can switch off the structure, reducing its leakage to zero but losing its contents (non-state-preserving techniques), or we can lower its voltage (state-preserving techniques), obtaining smaller savings but being able to restore the state of the structure in a reasonable time.

12.
A Dynamically Reconfigurable Cache Design for an Embedded Processor (total citations: 1; self-citations: 0; by others: 1)
Most processor chips include on-chip Caches, typically organized as a fixed-size level-one Cache (L1) and level-two Cache (L2). This article describes a dynamically reconfigurable Cache implemented in an embedded processor design. The idea of a dynamically reconfigurable Cache was first proposed by researchers at the University of Rochester in a paper on the memory hierarchy, originally targeting high-performance superscalar general-purpose processors. In this embedded processor design, the authors adapt that idea: with a small amount of additional hardware and compiler support, the sizes of the L1 and L2 Caches can be configured dynamically for a specific application while their combined size remains unchanged. Dynamic configuration of the Caches not only improves the Cache hit rate but also reduces processor power consumption.
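As a toy illustration of per-application reconfiguration under a fixed total cache budget, the sketch below picks an L1/L2 split that minimizes a simple average-memory-access-time estimate from profiled miss rates. The candidate splits, miss rates, and latencies are invented for the example and are not taken from the article.

```python
# Minimal sketch of choosing an L1/L2 split from a fixed on-chip SRAM budget
# for a given application. All numbers are illustrative assumptions.

L1_HIT, L2_HIT, MEM = 1, 10, 100          # access latencies in cycles

def amat(l1_miss, l2_miss):
    """Average memory access time for given local miss rates."""
    return L1_HIT + l1_miss * (L2_HIT + l2_miss * MEM)

# (l1_kb, l2_kb, profiled L1 miss rate, profiled L2 local miss rate)
candidates = [
    (8,  248, 0.10, 0.20),
    (16, 240, 0.06, 0.22),
    (32, 224, 0.04, 0.25),
]

best = min(candidates, key=lambda c: amat(c[2], c[3]))
print("chosen split: L1=%dKB, L2=%dKB, AMAT=%.2f cycles"
      % (best[0], best[1], amat(best[2], best[3])))
```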

13.
Advancement in semiconductor technology increases power density in recent Chip Multi-Processors (CMPs), which significantly increases the leakage energy consumption of on-chip Last Level Caches (LLCs). Performance-linked dynamic tuning of LLC size is a promising option for reducing cache leakage. This paper reduces static power consumption by dynamically shutting down or turning on cache banks based on system performance and cache bank usage statistics. Shutting down a cache bank remaps its future requests to another active bank, called the target bank. The proposed method is evaluated under three implementation policies: (1) the system may decide to shut down or turn on cache banks periodically throughout process execution; (2) the system may shut down banks only initially, and once bank restarting begins, no further shutdowns are permitted; (3) the cache is resized as in the first policy, but within predefined time slices during which it cannot be resized. For a 4 MB 4-way set-associative L2 cache, experimental analysis shows a 66% reduction in static energy with a 29% gain in Energy Delay Product (EDP) for the first strategy; for the second policy, static power is reduced by 59% with 27% savings in EDP; the last policy saves 65% in static power and 30% in EDP with minimal performance penalty.
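A minimal sketch of the bank shutdown/remap mechanism (closest to the first policy) is given below: at the end of each epoch, lightly used banks are power-gated and their requests are forwarded to an active target bank, and everything is restarted if the miss rate degrades too far. Thresholds, epoch handling, and the remapping rule are assumptions, not the paper's tuned parameters.

```python
# Minimal sketch of periodic bank shutdown with request remapping to a
# target bank, driven by per-bank usage and a miss-rate guard.

ACCESS_THRESHOLD = 100      # accesses per epoch below which a bank may sleep
MISS_THRESHOLD = 0.05       # restart banks if the global miss rate exceeds this

class BankedLLC:
    def __init__(self, n_banks):
        self.n_banks = n_banks
        self.active = [True] * n_banks
        self.remap = list(range(n_banks))     # home bank -> bank actually serving it

    def route(self, addr):
        home = addr % self.n_banks            # simple address-interleaved homing
        return self.remap[home]               # shut-down banks forward to a target

    def end_of_epoch(self, per_bank_accesses, miss_rate):
        if miss_rate > MISS_THRESHOLD:
            # Performance suffering: restart all banks and clear remappings.
            self.active = [True] * self.n_banks
            self.remap = list(range(self.n_banks))
            return
        targets = [b for b in range(self.n_banks) if self.active[b]]
        for b in range(self.n_banks):
            if self.active[b] and per_bank_accesses[b] < ACCESS_THRESHOLD \
                    and len(targets) > 1:
                self.active[b] = False                        # power-gate the bank
                targets.remove(b)
                self.remap[b] = targets[0]                    # its requests go here

llc = BankedLLC(4)
llc.end_of_epoch(per_bank_accesses=[500, 20, 600, 30], miss_rate=0.02)
print(llc.active, llc.remap)   # banks 1 and 3 sleep; their requests go to bank 0
```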

14.
As more and more processor cores are integrated on a single chip, conventional electrical interconnection networks can no longer meet the performance demands placed on the interconnect, and a new interconnection approach is needed; optical interconnection network technology emerged in response. Electrically interconnected on-chip networks currently face bottlenecks in power, performance, bandwidth, and latency, whereas optical interconnects, applied to on-chip networks as a new interconnection approach, offer unmatched advantages such as low loss, high throughput, and low latency. This paper mainly discusses the ...

15.
A charge-based on-chip synapse adaptation Kohonen neural network circuit is proposed. The properties of the approach are low power dissipation and high density due to the charge transfer mechanism and the novel compact device configurations. The prototype chip, which contains 12×10 synapses with a density of 190 synapses/mm², was fabricated in a 2-μm standard CMOS technology. The experimental results from the prototype chip demonstrated successful unsupervised learning and classification as theoretically predicted.

16.
王冶  张盛兵  王党辉 《计算机工程》2012,38(1):268-269,272
To reduce the energy consumption of the on-chip Cache in a microprocessor, an instruction Cache based on a pre-buffering mechanism is designed. Guided by the prediction of a pre-buffer control unit, the instructions the processor needs hit in the buffer as often as possible, avoiding the power consumed by accessing the instruction Cache. Simulation results for 7 benchmarks show that the pre-buffering mechanism saves 23.23% of processor power and improves program execution performance by 7.53% on average.

17.
Power dissipation has been an important design issue for a wide range of computer systems in the past decades. Dynamic power consumption due to signal switching activities and static power consumption due to leakage current are the two major sources of power consumption in a CMOS circuit. As CMOS technology advances into the deep sub-micron domain, static power dissipation is comparable to, or even exceeds, dynamic power dissipation. This article explores how to apply dynamic power management to reduce static power in hard real-time systems. We propose online algorithms that adaptively control the power mode of a system, procrastinating the processing of arrived events as late as possible. To cope with multiple event streams with different characteristics, we provide solutions for both preemptive earliest-deadline-first and fixed-priority scheduling policies. By adopting a worst-case interval-based abstraction, our approach can not only handle arbitrary event arrivals, e.g., with burstiness, but also guarantee hard real-time requirements with respect to both timing and backlog constraints. We also present extensive simulation results to demonstrate the effectiveness of our approaches.

18.
A set of innovative optimization techniques is introduced in a 32-bit floating-point RISC microprocessor for high-performance operation, including a modified redundant Booth-3 algorithm for fast multiplication and division, a dynamic SRAM mode-control scheme for low power dissipation, an embedded bus preselector that improves the performance of the bus interface, and a large-capacity on-chip memory that decreases traffic with external memory. The processor has been verified: each instruction and random combinations of instructions run correctly, and the chip achieves 0.98 mA/MHz at a 3.3 V supply. Chip testing shows that the optimization techniques improve the speed and quality of the processor, with a 38% boost in frequency and a 39% reduction in power dissipation.

19.
A Low-Power Instruction Cache Scheme Based on a Record Buffer (total citations: 1; self-citations: 1; by others: 1)
Most modern microprocessors use on-chip Caches to bridge the large speed gap between main memory and the CPU, but the Cache has also become a major source of processor power, most of which comes from the instruction Cache. A buffer can filter out most instruction Cache accesses and thereby reduce power, but a considerable number of unnecessary accesses to the data arrays remain. A low-power instruction Cache structure based on a record buffer, RBC, is therefore proposed. With a record buffer and a modified data array organization, RBC filters out most unnecessary array accesses and effectively reduces Cache power. Simulation of 10 SPEC2000 benchmarks shows that, compared with a conventional buffer-based Cache, the scheme saves 24.33% of instruction Cache power at the cost of only 6.01% processor performance and 3.75% area.
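To illustrate the filtering idea, the sketch below models a small record buffer that remembers which way recently fetched lines hit in, so a repeated access to a recorded line enables only that way of the data array. The buffer size, way count, and FIFO replacement are assumptions rather than the RBC design itself.

```python
# Minimal sketch of a record-buffer filter: a small table remembers which
# way a recently fetched instruction line hit in, so a repeated access to
# that line powers up only one data-array way instead of all of them.

N_WAYS = 4
RECORD_ENTRIES = 8

class RecordBuffer:
    def __init__(self):
        self.records = {}                 # line address -> way that holds it

    def access(self, line, hit_way):
        if line in self.records:
            return 1                      # only the recorded way is enabled
        if len(self.records) >= RECORD_ENTRIES:
            self.records.pop(next(iter(self.records)))   # evict oldest (FIFO-ish)
        self.records[line] = hit_way
        return N_WAYS                     # normal access: all ways enabled

rb = RecordBuffer()
enabled = [rb.access(line, hit_way=line % N_WAYS)
           for line in (10, 11, 10, 10, 12, 11)]
print(enabled)        # [4, 4, 1, 1, 4, 1] -> fewer data-array ways powered
```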

20.
As the number of cores integrated onto a single chip increases, constraints on power dissipation and network latency become increasingly stringent. The on-chip network provides an efficient and scalable interconnection paradigm for chip multiprocessors (CMPs), in which one-to-many (multicast) communication is common. Without efficient multicast support, traditional unicast on-chip networks are inefficient at handling such multicast communication. In this paper, we propose Dual Partitioning Multicasting (DPM) to reduce packet latency and balance network resource utilization. Specifically, the DPM scheme adaptively makes routing decisions based on the network load-balance level as well as the link-sharing patterns characterized by the distribution of the multicast destinations. Extensive experimental results for synthetic traffic as well as real applications show that, compared with the recently proposed RPM scheme, DPM significantly reduces the average packet latency and mitigates network power consumption. More importantly, DPM is highly scalable for future on-chip networks with heavy traffic loads and a variety of traffic patterns.
