首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到19条相似文献,搜索用时 265 毫秒
1.
嵌入式处理器中降低Cache缺失代价设计方法研究   总被引:2,自引:0,他引:2  
以龙芯1号处理器为研究对象,探讨了嵌入式处理器中降低Cache缺失代价的设计方法.通过分析处理器的结构特征,本文实现了在关键字优先基础上一次缺失下命中的非阻塞数据Cache,可以将处理器平均性能提高3.9%,同时利用局部性原理,在关键字优先非阻塞数据Cache的基础上,本文提出了一种类非阻塞的指令Cache设计方法,可以降低指令Cache的缺失代价,以较小的实现代价进一步将处理器平均性能提高7.7%.通过本文的工作,可以同时降低指令Cache和数据Cache的缺失代价,处理器的平均性能提高了11.6%.  相似文献   

2.
为了提高嵌入式系统中Cache的使用效率,针对不同类型的应用程序对指令和数据Cache的容量实时需求不同,提出一种滑动Cache组织方案.均衡考虑指令和数据Cache需求,动态地调整一级Cache的容量和配置.采用滑动Cache结构,不但降低了一级Cache的动态和静态泄漏功耗,而且还降低了整个处理器的动态功耗.模拟仿真结果表明,该方案在有效降低Cache功耗的同时能够提高Cache的综合性能.  相似文献   

3.
针对嵌入式处理器中日益明显的指令Cache漏功耗,提出了一种基于当前指令状态标志位的分支预测和返回目标寄存器映射的昏睡子块唤醒方法;该方法根据处理器执行过程中指令状态位提前判断分支指令的目标子块,同时设计了一种返回地址目标寄存器映射的结构,提前判断函数返回指令的目标子块。在消除唤醒延迟带来的性能损失基础上,提高了处理器的性能;通过实验对比,该方法可以减小36%的指令Cache静态功耗,同时处理器性能平均有13%的提高。  相似文献   

4.
嵌入式处理器的Cache结构研究   总被引:5,自引:0,他引:5  
针对嵌入式处理嚣结构的特点,探讨虚拟Cache的结构、性能及实施方法等进行,讨论了Cache的锁定来改进Cache的循环淘汰置换算法的可行性,并对基于ARM架构的嵌入式处理器的Cache结构特点作了介绍。  相似文献   

5.
嵌入式处理器中Cache的应用极大地提高了处理器的性能,同时Cache,尤其是指令Cache功耗占据了处理器很大一部分功耗,关闭不必要的tag SRAM和data SRAM的访问,可以极大地降低功耗。提出了一种流水化的指令Cache访问机制,关闭不必要的data SRAM的访问;并且通过记录指令Cache行的信息和预测下一行的Cache形成一个Cache行滑动窗口,关闭不必要的tag SRAM访问。所提出的方法没有性能损失,在SMIC 90nm工艺下进行功耗分析,其指令访问的功耗降低50%。  相似文献   

6.
基于记录缓冲的低功耗指令Cache方案   总被引:1,自引:1,他引:1  
现代微处理器大多采用片上Cache来缓解主存储器与中央处理器(CPU)之间速度的巨大差异,但Cache也成为处理器功耗的主要来源,尤其是其中大部分功耗来自于指令Cache.采用缓冲器可以过滤掉大部分的指令Cache访问,从而降低功耗,但仍存在相当程度不必要的存储体访问,据此提出了一种基于记录缓冲的低功耗指令Cache结构RBC.通过记录缓冲器和对存储体的改造,RBC能够过滤大部分不必要的存储体访问,有效地降低了Cache的功耗.对10个SPEC2000标准测试程序的仿真结果表明,与传统基于缓冲器的Cache结构相比,在仅牺牲6.01%处理器性能和3.75%面积的基础上,该方案可以节省24.33%的指令Cache功耗.  相似文献   

7.
一种低功耗高性能的滑动Cache方案   总被引:2,自引:0,他引:2  
Cache存储器的功耗占整个芯片功耗的主要部分.针对不同类型的应用程序对指令和数据Cache的容量实时需求不同,一种滑动Cache组织方案被提出.它均衡考虑指令和数据Cache需求,动态地调整一级Cache的容量和配置,消除了Cache中闲置部分产生的功耗.SPEC95仿真结果表明,采用滑动Cache结构不但降低了一级Cache的动态和静态泄漏功耗,而且还降低了整个处理器的动态功耗,提高了性能.滑动Cache比两种传统Cache结构和DRI结构的一级Cache平均动态功耗分别降低21.3%,19.52%和20.62%.采用滑动Cache结构与采用两种传统Cache结构和DRI结构相比,处理器平均动态功耗分别降低了8.84%,8.23%和10.31%,平均能量延迟乘积提高了12.25%,7.02%和13.39%.  相似文献   

8.
混合Cache的低功耗设计方案   总被引:1,自引:0,他引:1       下载免费PDF全文
在嵌入式处理器中,Cache的功耗所占的比重越来越大。为降低嵌入式系统中混合Cache的功耗,引入一种基于程序段的重构算法——PPBRA,并提出一种新的基于分类访问的可重构混合Cache结构,该方案能够根据不同程序段对Cache容量的需求,动态地分配混合Cache的指令路数和数据路数,还能够对混合Cache进行分类访问,过滤对不必要路的访问,从而实现降低混合Cache的功耗的目的。Mibench仿真结果表明,该方案在有效降低Cache功耗的同时,还能提高Cache的综合性能。  相似文献   

9.
为了提高密码嵌入式处理器的运行效率,给出了一种哈佛结构的高速缓存(Cache)设计,包括指令Cache(iCache)和数据Cache(dCache)。采用双端口RAM和较低的硬件开销设计了标签存储器和指令/数据存储器,并描述了iCache和dCache控制流程。实现时配置iCache容量为4KB、dCache容量为8KB,并完成了向密码嵌入式处理器的集成。FPGA验证结果表明其满足处理器的应用要求;性能分析结果表明,采用Cache比处理器直接访问主存在速度上至少提高5.26倍。  相似文献   

10.
一种嵌入式处理器的动态可重构Cache设计   总被引:1,自引:0,他引:1  
一般的处理器芯片都有片上高速缓存Cache,它一般是由固定大小的一级Cache(L1)和二级Cache(L2)构成,文章介绍了一种在嵌入式处理器设计中实现的动态可重构Cache。动态可重构Cache的思想最早是罗彻斯特大学(UniversityofRochester)的学者在他们的一篇关于存储层次的论文1中提出的,当时主要是针对高性能的超标量通用处理器。在此嵌入式处理器设计过程中,笔者创造性地继承了这一思想。通过增加少量硬件以及编译器的配合,在嵌入式处理器中L1Cache和L2Cache总体大小不变的情况下,L1Cache和L2Cache的大小可以根据具体的应用程序动态配置。通过对高速缓存的动态配置,不仅可以有效地提高Cache的命中率,还能够有效降低处理器的功耗。  相似文献   

11.
Processors in embedded systems mostly employ cache architectures in order to alleviate the access latency gap between processors and memory systems. Caches in embedded systems usually occupy a major fraction of the implemented chip area. The power dissipation of cache system thus constitutes a significant fraction of the power dissipated by the entire processor in embedded systems. In this paper, we propose the compressed tag architecture to reduce the power dissipation of the tag store in cache systems. We introduce a new tag-matching mechanism by using a locality buffer and a tag compression technique. The main power reduction feature of our proposal is the use of small tag space matching instead of full tag matching, with modest additional hardware costs. The simulation results show that the proposed model provides a power and energy-delay product reduction of up to 27.8% and 26.5%, respectively, while still providing a comparable level of system performance to regular cache systems.  相似文献   

12.
This paper presents a low-power tag organization for physically tagged caches in embedded processors with virtual memory support. An exceedingly small subset of tag bits is identified for each application hot-spot so that only these tag bits are used for cache access with no performance sacrifice as they provide complete address resolution. The minimal subset of physical tag bits is dynamically updated following the changes in the physical address space of the application. Operating system support is introduced in order to maintain the reduced tags during program execution. Efficient algorithms are incorporated within the memory allocator and the dynamic linker in order to achieve dynamic update of the reduced tags. The only hardware support needed within the I/D-caches is the support for disabling bitlines of the tag arrays. An extensive set of experimental results demonstrates the efficacy of the proposed approach.  相似文献   

13.
Cache misses in small, limited-associativityprimary caches very often replace live cache blocks, giventhe dominance of capacity and conflict misses. Towardsmotivating novel cache organizations, we study thecomparative characteristics of the virtual memoryaddress pairs involved in typical primary-cachecontention (block replacements) for the SPEC2000integer benchmarks. We focus on the cache tag bits, andresults show that (i) often just a few tag bits differbetween contending addresses, and (ii) accesses to certainsegments or page groups of the virtual address space (i.e.certain tag-bit groups) contend frequently. Cacheconsciousvirtual address space allocation can furtherreduce the number of conflicting tag bits. We mentiontwo directions for exploiting such page-level contentionpatterns to improve cache cost and performance.  相似文献   

14.
In present multi-core devices, the individual processors do not need to operate at the highest possible frequencies. Instead there is a need to reduce the power, complexity and area of individual processor components like caches. In this paper we propose a low area, high performance cache replacement policy for embedded processors called Hierarchical Non-Most-Recently-Used (H-NMRU). The H-NMRU is a parameterizable policy where we can trade-off performance with area. We extended the Dinero cache simulator with the H-NMRU policy and performed architectural exploration with a set of cellular and multimedia benchmarks. On a 16 way cache, a two level H-NMRU policy where the first and second levels have 8 and 2 branches, respectively, performs as good as the Pseudo-LRU policy with storage area saving of 27%. Compared to true LRU, H-NMRU on a 16 way cache saves huge amount of area (82%) with marginal increase of cache misses (3%). Similar result was also noticed on other cache like structures like branch target buffers. Therefore, the two level H-NMRU cache replacement policy (with associativity/2 and 2 branches on the two levels) is a very attractive option for caches on embedded processors with associativities greater than 4. We present a case-study where it can be used on the L2 cache with substantial gain in performance and area for single and dual core platforms.  相似文献   

15.
左琦  付宇卓  程秀兰  黄洋 《计算机工程》2006,32(1):237-239,275
为了提高性能,通用处理器中所广泛采用的cache技术被引入到了嵌入式处理器中。该文采用基于仿真的方法分析了嵌入式应用环境下几个主要的cache结构参数对cache性能的影响。在分析过程中,还考虑了不同主存实现方式带来的影响。  相似文献   

16.
17.
On-chip instruction cache is a potential power hungry component in embedded systems due to its large chip area and high access-frequency. Aiming at reducing power consumption of the on-chip cache, we propose a Reduced One-Bit Tag Instruction Cache (ROBTIC), where the cache size is judiciously reduced and the cache tag field only contains the least significant bit of the full-tag. We develop a cache operational control scheme for ROBTIC so that with the one-bit cache tag, the program locality can still be efficiently exploited. For applications where most of the memory accesses are localized, our cache can achieve similar performance as a traditional full-tag cache; however, the power consumption of the cache can be significantly reduced due to the much smaller cache size, narrower tag array (just one bit), and tinier tag comparison circuit being used. Experiments on a set of benchmarks implemented in CMOS 180 nm process technology demonstrate that our proposed design can reduce up to 27.3% dynamic power consumption and 30.9% area of the traditional cache when the cache size is fixed at 32 instructions, which outperforms the existing partial-tag based cache design. With the cache size customization, a further 47.8% power saving can be achieved. Our experimental results also show that when implemented in the deep sub-micron technologies where the leakage power is not ignorable, our design is still efficient - a coherent power saving trend (about 22%) has been observed for technologies from 130 nm down to 65 nm.  相似文献   

18.
A new cache architecture based on temporal and spatial locality   总被引:5,自引:0,他引:5  
A data cache system is designed as low power/high performance cache structure for embedded processors. Direct-mapped cache is a favorite choice for short cycle time, but suffers from high miss rate. Hence the proposed dual data cache is an approach to improve the miss ratio of direct-mapped cache without affecting this access time. The proposed cache system can exploit temporal and spatial locality effectively by maximizing the effective cache memory space for any given cache size. The proposed cache system consists of two caches, i.e., a direct-mapped cache with small block size and a fully associative spatial buffer with large block size. Temporal locality is utilized by caching candidate small blocks selectively into the direct-mapped cache. Also spatial locality can be utilized aggressively by fetching multiple neighboring small blocks whenever a cache miss occurs. According to the results of comparison and analysis, similar performance can be achieved by using four times smaller cache size comparing with the conventional direct-mapped cache.And it is shown that power consumption of the proposed cache can be reduced by around 4% comparing with the victim cache configuration.  相似文献   

19.
随着嵌入式处理器技术的不断发展以及人们对嵌入式设备性能的要求越来越高,嵌入式处理器由单核时代进入多核时代。然而,传统嵌入式系统软件开发方法还是基于单核模式,并没有利用嵌入式多核处理器多核并行化的特点,没有充分发挥嵌入式多核处理器的性能。虽然在PC平台上,多核并行化方法相对更成熟,但嵌入式多核处理器在处理器数目、Cache以及总线等方面有很大不同,嵌入式平台多核并行化并不能借助PC平台的实践方法,因此基于嵌入式平台研究多核并行化的方法是很有意义的。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号