Similar Documents
 18 similar documents found (search time: 171 ms)
1.
Instruction caches account for a significant share of power in modern embedded processors. This paper proposes a low-power strategy for set-associative instruction caches based on way access traces: an enhanced instruction cache and branch target buffer build and maintain the runtime way access traces of the instruction cache, reducing hit detection and accesses to irrelevant ways. It further proposes maintenance strategies for way-access-trace information based on cross-line-access predecessor pointers, branch predecessor states, branch predecessor pointers, and branch target indices, which lower the frequency of trace rebuilding and thus make more effective use of the traces already established. Experimental results show that, with the optimized way-access-trace strategy, tag memory accesses and data memory accesses of the instruction cache drop to 3.60% and 27.70%, respectively, of those of a conventional instruction cache.
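As a rough illustration of the way-memoization idea this abstract describes — not the paper's actual BTB-coupled design — the C sketch below remembers the hitting way of each set and skips the parallel tag comparison when the remembered way still matches. All structure names and sizes are invented:

```c
#include <stdint.h>
#include <stdbool.h>

#define WAYS      4
#define SETS      64
#define LINE_BITS 5              /* 32-byte cache lines (assumed) */

typedef struct { uint32_t tag; bool valid; } Line;

static Line   cache[SETS][WAYS];
static int8_t trace_way[SETS];   /* remembered hitting way, -1 = unknown */

void trace_init(void) {
    for (int s = 0; s < SETS; s++) trace_way[s] = -1;
}

/* Returns the way holding 'addr' (or -1 on a miss) and counts tag reads:
 * a traced access verifies one tag, an untraced one reads all WAYS tags. */
int icache_fetch(uint32_t addr, int *tag_reads) {
    uint32_t set = (addr >> LINE_BITS) % SETS;
    uint32_t tag = (addr >> LINE_BITS) / SETS;

    int w = trace_way[set];
    if (w >= 0 && cache[set][w].valid && cache[set][w].tag == tag) {
        *tag_reads += 1;         /* hardware trusting the trace could skip even this */
        return w;
    }
    for (w = 0; w < WAYS; w++) { /* conventional lookup over every way */
        *tag_reads += 1;
        if (cache[set][w].valid && cache[set][w].tag == tag) {
            trace_way[set] = (int8_t)w;   /* rebuild the trace entry */
            return w;
        }
    }
    return -1;                   /* miss: line fill not modeled here */
}
```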

2.
To address the computational complexity and poor applicability of existing cache techniques, an instruction cache optimization technique based on statistical analysis is proposed. GNU coverage-analysis and profiling tools are used to analyze the code statically, reducing the computational complexity of the optimization process. On the software side, an optimized cache-block coloring algorithm, static locking of address segments, selective non-caching of code segments, and related techniques improve instruction cache fetch efficiency. A cache-locking selection ranking formula is given for judging whether a code segment should be locked or left uncached, effectively increasing instruction cache utilization. Experimental results show that the optimization reduces program execution time by 8% on average and raises the cache hit rate by 23% on average.
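The abstract names a cache-locking selection ranking formula but does not give it. The following C sketch uses an invented executions-per-byte score over gcov/gprof-style profile data to decide lock / normal / do-not-cache, purely to illustrate the ranking step; every name, number, and threshold is hypothetical:

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    const char *name;
    unsigned    exec_count;   /* profiled executions (e.g. from gcov) */
    unsigned    size_bytes;   /* code segment size */
} Segment;

/* Illustrative score: cache hits saved per locked byte. */
static double lock_score(const Segment *s) {
    return (double)s->exec_count / (double)s->size_bytes;
}

static int by_score_desc(const void *a, const void *b) {
    double sa = lock_score(a), sb = lock_score(b);
    return (sa < sb) - (sa > sb);
}

int main(void) {
    Segment segs[] = {
        { "isr_handler", 90000,  512 },
        { "init_once",       1, 4096 },
        { "main_loop",   50000, 2048 },
    };
    size_t n = sizeof segs / sizeof segs[0];
    qsort(segs, n, sizeof segs[0], by_score_desc);

    unsigned budget = 4096;   /* hypothetical lockable capacity */
    for (size_t i = 0; i < n; i++) {
        if (segs[i].exec_count <= 1)
            printf("%-12s -> do not cache\n", segs[i].name);          /* runs once */
        else if (segs[i].size_bytes <= budget) {
            printf("%-12s -> lock (score %.2f)\n",
                   segs[i].name, lock_score(&segs[i]));
            budget -= segs[i].size_bytes;
        } else
            printf("%-12s -> normal caching\n", segs[i].name);
    }
    return 0;
}
```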

3.
In modern microprocessors, reading and comparing instruction cache tags consumes a large fraction of instruction cache energy. A speculation-based low-energy instruction cache, the asymmetric instruction cache, is proposed. Exploiting the low proportion of branch instructions, the structure handles branch and sequential instructions differently and uses simplified tag-management bits that do not correspond one-to-one with the data. It adopts two novel techniques, hit speculation and variable-length instruction fetch: hit speculation addresses the excess of tag comparisons on instruction cache hits, while variable-length fetch raises the hit rate of sequential instruction blocks. Experimental results on selected SPEC2006 benchmarks show that the asymmetric instruction cache reduces fetch energy by 40%–60% relative to a conventional L1 instruction cache and by 9% relative to the tag-less instruction cache THIC; in terms of fetch ED2P, it improves on the conventional L1 instruction cache by about 50% and on THIC by about 17%.

4.
As process technology advances, multi-core processors integrate ever more cores and on-chip cache, so non-uniform cache architecture (NUCA) is used to cope with the growing wire delay in the cache systems of chip multiprocessors. An efficient cache block migration policy is crucial to the whole cache system. The block migration policy in current dynamic NUCA (D-NUCA) ignores the history access information of cache blocks, causing blocks to ping-pong between banks and increasing block access latency. A reuse-aware block migration (RABM) policy is therefore proposed, which uses a block's migration history to predict future migrations, improving D-NUCA performance and reducing the power consumption of the whole cache system. Full-system simulation results on the PARSEC benchmarks show that, compared with D-NUCA, RABM-based D-NUCA improves instructions per cycle by 9.6% on average and reduces on-chip cache power consumption by 14%.
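A minimal sketch of what reuse-gated migration could look like, assuming a 2-bit saturating reuse counter per block (the abstract does not specify RABM's actual state): a block moves one bank toward the requester only after repeated hits, damping the ping-pong behavior described above.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint8_t reuse;   /* 0..3 saturating reuse counter (assumed width) */
    uint8_t bank;    /* bank currently holding the block */
} BlockState;

#define MIGRATE_THRESHOLD 2   /* hypothetical gating threshold */

/* Called on every hit; returns true when the block should move one
 * bank toward 'req_bank' (the gradual D-NUCA migration step). */
bool on_hit(BlockState *b, uint8_t req_bank) {
    if (b->bank == req_bank) {
        if (b->reuse < 3) b->reuse++;
        return false;                    /* already in the local bank */
    }
    if (b->reuse >= MIGRATE_THRESHOLD) {
        b->bank += (req_bank > b->bank) ? 1 : -1;
        b->reuse = 0;                    /* restart history after a move */
        return true;
    }
    b->reuse++;                          /* not yet hot enough to migrate */
    return false;
}
```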

5.
Power is one of the key issues in today's processor design. With the spread of multi-core processors, on-chip caches occupy an increasing share of chip area and power. A low-power cache architecture with an invalid-way access-filtering mechanism is proposed to reduce the CPU's dynamic power: Pre-Invalid Way Checking (PIWC) eliminates accesses to invalid cache ways, and Pre-Mismatch Way Detecting (PMWD) eliminates accesses to ways whose low-order tag bits do not match. Tests on real programs show that 65.2%–88.9% of accesses to invalid cache ways can be eliminated, cutting about 60.9%–85.6% of the dynamic energy consumed by cache accesses. Meanwhile, compared with tag-data sequential access, our method achieves an additional 5.1%–13.8% energy saving for most programs.
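The two filters are straightforward to express in software form. The sketch below — field widths and names are assumptions — returns the set of ways that still require a full tag comparison after PIWC and PMWD screening:

```c
#include <stdint.h>
#include <stdbool.h>

#define WAYS         8
#define PARTIAL_BITS 4
#define PARTIAL_MASK ((1u << PARTIAL_BITS) - 1)

typedef struct {
    bool     valid;
    uint32_t tag;
    uint8_t  low_tag;   /* pre-stored copy of tag & PARTIAL_MASK */
} Way;

/* Returns a bitmask of the ways that survive both filters; only those
 * ways consume full tag/data array energy in the actual access. */
uint32_t filter_ways(const Way set[WAYS], uint32_t tag) {
    uint32_t survivors = 0;
    uint8_t  low = tag & PARTIAL_MASK;
    for (int w = 0; w < WAYS; w++) {
        if (!set[w].valid)         continue;   /* PIWC: invalid way */
        if (set[w].low_tag != low) continue;   /* PMWD: partial tag mismatch */
        survivors |= 1u << w;
    }
    return survivors;
}
```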

6.
Loop-based instruction cache access prediction method   (Total citations: 1; self-citations: 0; citations by others: 1)
To reduce cache access power, a loop-oriented instruction cache way prediction method based on history access paths is proposed. The method uses loops as the precondition for enabling way prediction and trains the predictor with the instruction cache's history access paths. When a loop body is re-entered, the corresponding access-path predictor is selected and only the predicted way of the instruction cache is accessed, lowering access power. A multi-path way prediction method is further proposed to achieve higher prediction accuracy. Experimental results on the Powerstone benchmarks show that the method achieves 99% prediction accuracy. Compared with a conventional instruction cache, a cache using this method reduces access power by 65% on average while increasing the average instruction cache access cycles by only about 0.2%.
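A hedged sketch of the loop-gated training/replay cycle, assuming a small table indexed by loop-start PC; the trigger conditions and table geometry are invented, not taken from the paper:

```c
#include <stdint.h>

#define PATH_LEN   32
#define LOOP_SLOTS 8

typedef struct {
    uint32_t loop_pc;             /* target of the backward branch opening the loop */
    uint8_t  way_path[PATH_LEN];  /* ways touched during one full iteration */
    uint8_t  len, cursor, trained;
} LoopPredictor;

static LoopPredictor table[LOOP_SLOTS];

/* While untrained, record the way each access hit in. */
void record_way(uint32_t loop_pc, uint8_t way) {
    LoopPredictor *p = &table[loop_pc % LOOP_SLOTS];
    if (p->loop_pc != loop_pc) {              /* new loop: restart training */
        p->loop_pc = loop_pc;
        p->len = p->cursor = p->trained = 0;
    }
    if (!p->trained && p->len < PATH_LEN) p->way_path[p->len++] = way;
}

/* Called when the loop's backward branch is taken again: one full
 * iteration has been recorded, so replay can begin. */
void loop_back_edge(uint32_t loop_pc) {
    LoopPredictor *p = &table[loop_pc % LOOP_SLOTS];
    if (p->loop_pc == loop_pc && p->len > 0) { p->trained = 1; p->cursor = 0; }
}

/* Returns the predicted way for the next access, or -1 to fall back to
 * a normal all-way probe (and further training). */
int predict_way(uint32_t loop_pc) {
    LoopPredictor *p = &table[loop_pc % LOOP_SLOTS];
    if (p->loop_pc != loop_pc || !p->trained) return -1;
    int way = p->way_path[p->cursor];
    p->cursor = (uint8_t)((p->cursor + 1) % p->len);
    return way;
}
```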

7.
Embedded Flash (eFlash) is increasingly used in SoCs, and the conflict between Flash's slow read speed and the processor's high-frequency instruction fetch is becoming ever more acute. To address this, a cache mechanism is introduced into the Flash controller, and the cache is optimized in several ways: set-associative mapping, an optimized Least Recently Used (LRU) replacement algorithm, and a pipelined prefill structure. Compared with a Flash controller without a cache mechanism, the cache-equipped Flash controller reduces processor instruction fetch time by 38%.
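As an illustration of the set-associative lookup with LRU replacement the abstract mentions (the pipelined prefill stage is omitted, and the cache geometry is assumed), a plain age-counter LRU might look like:

```c
#include <stdint.h>
#include <stdbool.h>

#define WAYS 4
#define SETS 16

typedef struct { uint32_t tag; uint8_t age; bool valid; } Entry;

static Entry cache[SETS][WAYS];

/* Returns the way that now holds 'line_addr'; on a miss the oldest
 * (largest-age) way is replaced, where the line would be refilled
 * from Flash. */
int lookup(uint32_t line_addr) {
    uint32_t set = line_addr % SETS, tag = line_addr / SETS;

    for (int w = 0; w < WAYS; w++) {
        if (cache[set][w].valid && cache[set][w].tag == tag) {
            cache[set][w].age = 0;                     /* mark MRU */
            for (int v = 0; v < WAYS; v++)             /* age the others */
                if (v != w && cache[set][v].age < 255) cache[set][v].age++;
            return w;
        }
    }

    /* Miss: prefer an invalid way, otherwise evict the oldest one. */
    int victim = 0;
    for (int w = 0; w < WAYS; w++) {
        if (!cache[set][w].valid) { victim = w; break; }
        if (cache[set][w].age > cache[set][victim].age) victim = w;
    }
    cache[set][victim] = (Entry){ .tag = tag, .age = 0, .valid = true };
    for (int v = 0; v < WAYS; v++)
        if (v != victim && cache[set][v].age < 255) cache[set][v].age++;
    return victim;   /* Flash refill of the line data not modeled */
}
```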

8.
李明江, 段星辉, 陈仁. 《计算机工程与设计》 2021, 42(9): 2696-2700 (plus insert page 1)
To address the poor random read performance of solid-state storage devices without large caches, an algorithm that prefetches and compresses the mapping table of hot user data is proposed to speed up access to hot data. The algorithm stores sequentially written and randomly written data separately, compresses mappings of contiguous physical addresses, and prefetches the hot-data mapping table in the background. These methods raise the mapping hit rate for hot data and reduce the number of flash accesses, thereby improving system read performance. Experimental results show that the method cuts random read latency nearly in half, doubling read speed. The algorithm can markedly improve the user experience of portable storage devices and consumer SSDs that lack large caches.
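The "compress contiguous physical address mappings" step amounts to run-length encoding of the page map. A self-contained C sketch with hypothetical structure names; the main() shows a toy map collapsing from six entries to three extents:

```c
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint32_t lpn_start;   /* first logical page of the run */
    uint32_t ppn_start;   /* first physical page of the run */
    uint32_t length;      /* pages mapped contiguously */
} Extent;

/* Compress a flat logical-to-physical page map into extents;
 * sequentially written data collapses into few extents, so more of
 * the hot mapping table fits in the device's small RAM. */
size_t compress(const uint32_t *ppn_of, size_t pages, Extent *out) {
    size_t n = 0;
    for (size_t lpn = 0; lpn < pages; lpn++) {
        if (n && out[n-1].lpn_start + out[n-1].length == lpn &&
                 out[n-1].ppn_start + out[n-1].length == ppn_of[lpn]) {
            out[n-1].length++;                /* extend the current run */
        } else {
            out[n++] = (Extent){ (uint32_t)lpn, ppn_of[lpn], 1 };
        }
    }
    return n;
}

int main(void) {
    uint32_t map[] = { 100, 101, 102, 500, 501, 77 };   /* toy page map */
    Extent ext[6];
    size_t n = compress(map, 6, ext);
    for (size_t i = 0; i < n; i++)
        printf("lpn %u..%u -> ppn %u..%u\n",
               ext[i].lpn_start, ext[i].lpn_start + ext[i].length - 1,
               ext[i].ppn_start, ext[i].ppn_start + ext[i].length - 1);
    return 0;
}
```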

9.
Improving data throughput and reducing energy consumption are important for data centers. Flash offers high storage density and low power. Using Flash as a cache for disk storage to build a two-level cache storage system is one effective way to improve data throughput and reduce energy consumption. This paper first introduces Flash and Flash-based file systems, then elaborates three application modes of Flash storage and their structural characteristics, then describes scheduling policies for the two-level cache structure, and finally concludes with a summary and outlook.

10.
Advances in semiconductor technology allow more processing cores to be integrated on chip; for storage units, which consume considerable area and power, effectively reducing both is an important direction in chip-multiprocessor research. Software instruction caching is an effective way to lower the complexity and power of instruction storage. This paper compares hardware cache structures with software instruction cache structures in depth, analyzes two typical software instruction cache designs in detail, and summarizes their characteristics and the key problems to be solved, providing a reference for the instruction storage design of chip multiprocessors.

11.
The power consumed by memory systems accounts for 45% of the total power consumed by an embedded system, and the power consumed during a memory access is 10 times higher than during a cache access. Thus, increasing the cache hit rate can effectively reduce the power consumption of the memory system and improve system performance. In this study, we increased the cache hit rate and reduced the cache-access power consumption by developing a new cache architecture known as a single linked cache (SLC) that stores frequently executed instructions. By adding a new link field, SLC combines the low power consumption and low access delay of a direct-mapped cache with a cache hit rate close to that of a two-way set-associative cache. In addition, we developed another design known as multiple linked caches (MLC) to further reduce the power consumption of each cache access and avoid unnecessary cache accesses when the requested data is absent from the cache. In MLC, the linked cache is split into several small linked caches that store frequently executed instructions to reduce the power consumption of each access. To avoid unnecessary cache accesses when a requested instruction is not in the linked caches, the addresses of the frequently executed blocks are recorded in the branch target buffer (BTB). By consulting the BTB, a processor can access the memory directly to obtain the requested instruction if the instruction is not in the cache. In the simulation results, our method performed better than selective compression, a traditional cache, and a filter cache in terms of cache hit rate, power consumption, and execution time.
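A condensed sketch of the link-field idea: the home slot is probed first, and only a tag mismatch triggers one linked probe. Slot counts and names are assumptions, and the BTB bypass used by MLC is only noted in a comment:

```c
#include <stdint.h>
#include <stdbool.h>

#define SLOTS   64
#define NO_LINK 0xFF

typedef struct {
    uint32_t tag;
    uint8_t  link;    /* index of an alternate slot, or NO_LINK */
    bool     valid;
} SlcEntry;

static SlcEntry slc[SLOTS];

/* Probe the home slot first (direct-mapped fast path); only if its tag
 * mismatches, follow the link for a second, final probe. Returns the
 * hitting slot or -1 on a miss. */
int slc_lookup(uint32_t addr) {
    uint32_t idx = addr % SLOTS, tag = addr / SLOTS;
    if (slc[idx].valid && slc[idx].tag == tag)
        return (int)idx;                  /* one access: direct-mapped energy */
    uint8_t alt = slc[idx].link;
    if (alt != NO_LINK && slc[alt].valid && slc[alt].tag == tag)
        return (int)alt;                  /* linked second chance */
    return -1;  /* miss: an MLC-style BTB check could have skipped this probe */
}
```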

12.
The synchronous data-triggered architecture (SDTA) refines traditional instruction-level parallelism down to micro-operation-level parallelism and offers high data-processing capability, but its special instruction format and characteristics pose challenges for instruction cache access. Instruction prefetching can effectively reduce the instruction cache miss rate, strengthen the processor's fetch capability, and improve performance. This paper analyzes the characteristics of the SDTA instruction set and proposes a hybrid hardware/software instruction prefetch mechanism suited to them, combining a hardware prefetch engine with software hints. The method effectively improves the instruction cache hit rate, is simple to implement, has a low useless-prefetch rate, and does not increase code size.

13.
The design of a high performance fetch architecture can be challenging due to poor interconnect scaling and energy concerns. Way prediction has been presented as one means of scaling the fetch engine to shorter cycle times, while providing energy efficient instruction cache accesses. However, way prediction requires additional complexity to handle mispredictions. In this paper, we examine a high-bandwidth fetch architecture augmented with an instruction cache way predictor. We compare the performance and energy efficiency of this architecture to both a serial access cache and a parallel access cache. Our results show that a serial fetch architecture achieves approximately the same energy reduction and performance as way prediction architectures, without the added structures and recovery complexity needed for way prediction.

14.
Conventional remote data access middlewares usually provide client applications with either a pre-staging scheme or an on-demand access scheme to fetch data. The pre-staging scheme uses parallel downloads to fetch a complete input file from multiple data sources, even when only a tiny file fragment is required. Such a transfer scheme consumes unnecessary data transmission time and storage space. In contrast, the on-demand scheme downloads only the required data blocks from a single data source and does not fully utilize the downstream bandwidth of the computing nodes. This paper presents a middleware called 'Spigot' that enables legacy (grid-unaware) applications to transparently access remote data through native I/O function calls. Spigot uses the on-demand concept to avoid unnecessary data transfer and adopts a co-allocation download algorithm to improve data transfer performance. Moreover, it uses a pre-fetching strategy to reduce data waiting time by overlapping data acquisition and data processing. It also provides the client application with its own user-level cache, which is advantageous since a larger cache space is available in comparison with the kernel-level cache, and it is easy to maintain data consistency between Spigot nodes. The experimental results indicate that Spigot reduces data waiting time more effectively than both the pre-staging and the on-demand access schemes. Copyright © 2010 John Wiley & Sons, Ltd.
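The co-allocation idea — splitting one requested range across replicas in proportion to their observed bandwidth — can be sketched as follows; the API and the proportional split rule are assumptions, not Spigot's actual algorithm:

```c
#include <stdint.h>
#include <stdio.h>

typedef struct { const char *url; double mbps; } Source;

/* Split 'total' bytes among sources in proportion to their observed
 * bandwidth; each source then serves its own sub-range in parallel,
 * filling the client's downstream link. */
void co_allocate(const Source *src, int n, uint64_t total) {
    double sum = 0;
    for (int i = 0; i < n; i++) sum += src[i].mbps;
    uint64_t offset = 0;
    for (int i = 0; i < n; i++) {
        uint64_t share = (i == n - 1) ? total - offset   /* last one takes the rest */
                       : (uint64_t)(total * (src[i].mbps / sum));
        printf("%s: bytes %llu-%llu\n", src[i].url,
               (unsigned long long)offset,
               (unsigned long long)(offset + share - 1));
        offset += share;
    }
}

int main(void) {
    Source s[] = { { "replica-a", 40.0 }, { "replica-b", 10.0 } };
    co_allocate(s, 2, 1000);   /* replica-a gets ~800 bytes, replica-b the rest */
    return 0;
}
```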

15.
Energy consumption and power dissipation are important concerns in the design of embedded systems, and they will become even more crucial with finer process geometries, higher frequencies, deeper pipelines and wider issue designs. In particular, the instruction cache consumes more energy than any other processor module, especially with the commonly used highly associative CAM-based implementations. Two energy-efficient approaches for highly associative CAM-based instruction cache designs are presented: a segmented wordline and a predictor-based instruction fetch mechanism. The latter is based on the fact that not all instructions in a given I-cache fetch are used, due to taken branches. The proposed Fetch Mask Predictor unit determines which instructions in a cache access will actually be used, to avoid fetching any of the other instructions. Both approaches are evaluated for an embedded 4-wide issue processor in 100 nm technology. Experimental results show average I-cache energy savings of 48% and overall processor energy savings of 19%.
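A minimal sketch of the fetch-mask computation: given the slot where fetching enters the group and the slot of a predicted-taken branch, only the slots in between are worth reading. The fetch width is an assumed parameter, not taken from the paper:

```c
#include <stdint.h>

#define FETCH_WIDTH 4   /* instructions per I-cache fetch (assumed) */

/* Builds a bitmask of the slots worth fetching. 'start_slot' covers
 * entering the group mid-way (a branch target that is not group-aligned);
 * 'predicted_taken_pos' is the slot of a predicted-taken branch, or -1. */
uint8_t fetch_mask(int start_slot, int predicted_taken_pos) {
    uint8_t mask = 0;
    int last = (predicted_taken_pos >= 0 && predicted_taken_pos < FETCH_WIDTH)
                   ? predicted_taken_pos      /* stop after the taken branch */
                   : FETCH_WIDTH - 1;         /* no taken branch predicted */
    for (int s = start_slot; s <= last; s++)
        mask |= (uint8_t)(1u << s);
    return mask;   /* unset slots are never read from the CAM/data arrays */
}
```

For example, fetch_mask(0, 1) returns 0x3: only the first two slots of the group are read, and the two slots after the taken branch are skipped.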

16.
The instruction fetch unit (IFU) usually dissipates a considerable portion of total chip power. In traditional IFU architectures, as soon as the fetch address is generated, it needs to be sent to the instruction cache and TLB arrays for instruction fetch. Since limited work can be done by the power-saving logic after fetch-address generation and before the instruction fetch, previous power-saving approaches usually suffer from unnecessary restrictions imposed by traditional IFU architectures. In this paper, we present CASA, a new power-aware IFU architecture, which effectively removes these restrictions on the power-saving approaches and provides sufficient time and information for the power-saving logic of both the instruction cache and the TLB. By analyzing, recording, and utilizing key information about the dynamic instruction flow early in the front-end pipeline, CASA creates the opportunity to maximize power efficiency and minimize performance overhead. Compared to the baseline configuration, the leakage and dynamic power of the instruction cache are reduced by 89.7% and 64.1%, respectively, and the dynamic power of the instruction TLB is reduced by 90.2%. Meanwhile, the performance degradation in the worst case is only 0.63%. Compared to previous state-of-the-art power-saving approaches, the CASA-based approach saves IFU power more effectively, incurs less performance overhead, and achieves better scalability. It is promising that CASA can stimulate further work on architectural solutions for power-efficient IFU designs.

17.
NAND-Flash-based solid-state drives, with their low latency, low power, and high reliability, have begun to be used in enterprise servers and high-performance computing. To address SSDs' relatively poor write performance and limited lifetime, an adaptive address mapping algorithm based on page mapping in the flash translation layer, WAPFTL, is proposed. During address translation, the algorithm predicts the read/write characteristics of the workload and adaptively adjusts the caching policy for address mapping information. Experimental results show that WAPFTL efficiently exploits the temporal and spatial locality of workloads, improves the mapping hit rate, and reduces the extra writes caused by address mapping; it also effectively reduces garbage collection and improves overall SSD performance.
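In the spirit of WAPFTL's workload adaptation — the actual policy and thresholds are not given in the abstract, so everything below is an assumption — this sketch classifies recent accesses in a sliding window and widens mapping-entry prefetch when the stream looks sequential:

```c
#include <stdint.h>

typedef struct {
    uint32_t last_lpn;          /* previous logical page accessed */
    uint32_t seq_hits, total;   /* sliding-window counters */
    uint32_t prefetch_depth;    /* mapping entries fetched per map miss */
} Adapt;

/* Observe one logical page access and, at each window boundary,
 * reclassify the workload and retune the prefetch depth. */
void observe(Adapt *a, uint32_t lpn) {
    if (lpn == a->last_lpn + 1) a->seq_hits++;
    a->last_lpn = lpn;
    if (++a->total == 256) {                 /* window boundary (assumed size) */
        if (a->seq_hits > 192)      a->prefetch_depth = 32;  /* sequential */
        else if (a->seq_hits > 64)  a->prefetch_depth = 8;   /* mixed */
        else                        a->prefetch_depth = 1;   /* random */
        a->seq_hits = a->total = 0;
    }
}
```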

18.
In simultaneous multithreading processors, raising the throughput of the fetch unit intensifies cache contention among threads, and this contention in turn constrains fetch-unit throughput. Targeting the new characteristics of current VLIW architectures, this paper proposes a method that raises the throughput of the fetch unit and the processor simultaneously. By invalidating invalid addresses in the fetch pipeline as early as possible, the method reduces the program cache conflicts caused by invalid fetches and improves overall processor performance. Experimental results show that the method improves processor and fetch-unit throughput by 12%–23%, while the L1 program cache miss rate increases only slightly or even decreases. In addition, it eliminates 10%–25% of L1 program cache read accesses, thereby lowering processor power consumption.

