Similar literature
1.
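The just-in-time (JIT) compiler of the Java virtual machine compiles one method at a time: it translates a bytecode method into executable code, which is written to memory through the data cache. When that code is executed again, the processor must fetch instructions from the memory region containing it; if that region is already mapped in the data cache, the data can be read directly from the data cache, which greatly improves read performance. However, the large volume of generated executable code is frequently replaced in the cache. Once generated code has been evicted, the processor must access the slower main memory when the code runs again, and this becomes a performance bottleneck for the compiler. The authors design and implement a hardware cache-locking mechanism and propose a JIT compilation method based on hardware/software co-design. With this method, the number of cache misses during execution of generated code fell by 6.9%, and SPECjvm2008 programs gained up to 17.9% in performance, with an average gain of 4.2%.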

2.
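Aiming to optimize the replacement policy of compressed caches, this paper proposes MLRU-C, an optimized replacement policy for compressed caches based on a modified LRU. MLRU-C uses the extra tag resources in a compressed cache to form a shadow-tag mechanism that detects and corrects wrong replacement decisions made by LRU, thereby improving the performance of the replacement policy. Experimental results show that, compared with the traditional LRU policy, MLRU-C reduces the L2 compressed-cache miss rate by 12.3% on average.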

3.
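A cache architecture that trades off latency and capacity in chip multiprocessors   (Total citations: 1; self-citations: 0; citations by others: 1)
The design of the L2 cache in chip multiprocessors (CMPs) faces a conflict between latency and capacity, which cannot both be satisfied: a private organization has lower hit latency but reduces the effective cache capacity, while a shared organization increases the effective capacity but has longer hit latency. This paper proposes a cache architecture for CMPs that trades off latency and capacity (TCLC). It is a hybrid of the private and shared designs. Its core idea is to identify the sharing type of each cache block dynamically and to optimize each type differently: private blocks are migrated, shared read-only blocks are replicated, and shared read-write blocks are placed centrally. The goal is access latency close to that of the private organization and effective capacity close to that of the shared organization, which mitigates the effect of wire delay and reduces the average memory access latency. Full-system simulation shows that TCLC improves performance by 13.7% on average over the private organization and by 12% on average over the shared organization.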
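
The per-block decision this abstract describes can be illustrated with a small sketch. The abstract does not say how TCLC identifies sharing types in hardware, so the history-based `classify` function below, its `(core_id, is_write)` input format, and all names are assumptions made for illustration only; just the mapping from sharing type to policy comes from the abstract.

```python
from enum import Enum


class Sharing(Enum):
    PRIVATE = "private"
    SHARED_READ_ONLY = "shared read-only"
    SHARED_READ_WRITE = "shared read-write"


# Optimization named in the abstract for each sharing type.
POLICY = {
    Sharing.PRIVATE: "migrate",
    Sharing.SHARED_READ_ONLY: "replicate",
    Sharing.SHARED_READ_WRITE: "place centrally",
}


def classify(accesses):
    """Classify one cache block from its access history.

    accesses: list of (core_id, is_write) tuples (hypothetical input format).
    """
    cores = {core for core, _ in accesses}
    if len(cores) <= 1:
        return Sharing.PRIVATE          # touched by at most one core
    if any(is_write for _, is_write in accesses):
        return Sharing.SHARED_READ_WRITE
    return Sharing.SHARED_READ_ONLY


def policy_for(accesses):
    """Return the policy the abstract assigns to this block's sharing type."""
    return POLICY[classify(accesses)]
```

For example, a block read by cores 0 and 1 and never written would be classified as shared read-only and therefore replicated.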

4.
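In a Web cache cluster, frequent bursts of Web requests lead to insufficient resource provisioning and a marked drop in system performance. To handle such bursts effectively, the authors build an elastic Web cache cluster that uses both local and cloud resources. To improve performance and reduce cost in this cluster, they propose an adaptive load model that adjusts dynamically and applies to heterogeneous Web cache clusters. Taking the network latency of cloud nodes into account, they revise the model to obtain a load model for cloud nodes. Based on these load models, they construct an adaptive load-balancing strategy for the elastic Web cache cluster. Compared with other load-balancing strategies, the adaptive strategy achieves efficient load balancing in the elastic Web cache cluster.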

5.
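Building on an analysis of existing architecture-level low-power cache designs, this paper proposes a low-power design strategy for mixed instruction/data caches. A flag field is added to the conventional mixed cache structure to distinguish instructions from data within a cache set, which limits the number of ways the processor accesses on each lookup and thereby lowers power consumption. The principle and hardware implementation of the method are described in detail, and the method is applied to the independently developed 龙腾 C2 (Longteng C2) microprocessor. Experimental results show that the method does not degrade cache performance, costs only 1.45% in area, and reduces total power consumption by 23.1%.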

7.
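Shared-memory operating systems use carefully designed locks to protect shared data; any access to such data must first acquire the corresponding lock. Contention arises when several kernel control flows (system calls, kernel threads, interrupt handlers, and so on) try to acquire the same lock at the same time, and the more flows are involved, the fiercer the contention. As the number of processing units in a system grows, so does the number of these flows, and lock contention then degrades overall system performance and can even become the bottleneck. In addition, the operating system and applications run alternately on the same processor core; because hardware cache capacity is limited, operating-system code and data frequently evict application code and data. When the application is scheduled again, it must fetch that code and data from a slower cache level or even from memory, which lowers performance. Tests on a 16-core AMD node quantify and confirm these problems, and a heterogeneous operating-system model is proposed to address them. In this model, applications and the operating system run on different processor cores; experiments show that this effectively reduces lock contention and cache pollution.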

8.
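This paper describes a language derived from C, with the added keywords 'cache' and 'vector'. The compiler is based on pcc and can compile itself. A post-processing optimizer exists. An example of the generated code is discussed.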

9.
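The code cache is an important component of a dynamic optimization system. It allows translated code to be reused, and a software-managed code cache stores the optimized and translated code. The code cache holds superblocks of varying sizes, and superblocks may contain link pointers to other superblocks, which makes replacement expensive. This paper proposes managing the code cache in groups, a strategy that effectively balances the complexity of cache management against the cache miss rate.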

10.
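张柏铖, 袁道华, 胡经, 吴颖. 《计算机工程》 (Computer Engineering), 2005, 31(22): 133-135
This paper proposes a cache management and reallocation framework and its implementation based on mobile agents. The framework adopts a hybrid cache allocation scheme and, on top of it, applies a path prediction algorithm and mobile agent technology to reallocate the cache, in order to support mobile Web browsing and improve the efficiency of wireless WWW browsing.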

11.
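This paper proposes LRC (Level-Region-Chunk), a code cache management strategy for binary translation. It combines the advantages of the full-flush strategy, the FIFO strategy, and multi-level caches, and it takes into account the temporal and spatial locality of programs, their execution characteristics, and replacement overhead. It performs well and manages the code cache efficiently.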

12.
In VLIW architectures, statically specifying parallel operations with no-operations (NOPs) is another cause of increased code size. Several alternatives in instruction encoding and the memory subsystem have been proposed to minimize the impact of NOPs on code size. One is the compressed cache, which uses a packed encoding scheme; the other is the decompressed cache, which uses an unpacked encoding scheme. The compressed cache achieves high memory utilization but increases the pipeline branch penalty because it requires very complex fetch hardware. By contrast, the decompressed cache reduces fetch overhead because the unpacked encoding scheme allows an instruction to be issued to the pipeline without any recovery process. Its shortcoming is that memory utilization deteriorates, because memory is allocated irrespective of the number of useful operations. This research proposes a new instruction encoding scheme, called the semi-packed encoding scheme, and the section cache, which enables effective storage and retrieval of semi-packed instructions. Through the partially fixed length of an instruction, this reduces both the hardware complexity of instruction fetch and the memory space wasted on NOPs. The experimental results show that memory utilization in the section cache is 3.4 times higher than in the decompressed cache. A memory subsystem using the section cache can provide about a 15% performance improvement with a moderate chip area.

13.
In parallel processor systems, the performance of individual processors is a key factor in overall performance. Processor performance is strongly affected by the behavior of cache memory, in that high hit rates are essential for high performance. Hit rates are lowered when collisions on placing lines in the cache force a cache line to be replaced before it has been used to best effect. Spatial cache collisions occur if data structures and data access patterns are misaligned. We describe a mathematical scheme to improve alignment and enhance performance in applications that have moderate-to-large numbers of arrays, where various dimensionalities are involved in localized computation and array access patterns are sequential. These properties are common in many computational modeling applications. Furthermore, the scheme provides a single solution when an application is targeted to run on various numbers of processors in power-of-two sizes. The applicability of the proposed scheme is demonstrated on testbed code for an air quality modeling problem.

14.
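A semantic cache consistency maintenance technique for mobile environments   (Total citations: 1; self-citations: 0; citations by others: 1)
Building on an in-depth study of cache invalidation broadcast techniques and semantic caching, this paper proposes a new semantic cache consistency maintenance technique for mobile environments: the asynchronous stateful technique based on semantic caching (BSCAS). BSCAS supports the various disconnection modes of mobile clients, reduces wireless communication overhead, and gives mobile clients greater autonomy.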

15.
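郭晨, 郑烇, 丁尧, 王嵩. 《计算机系统应用》 (Computer Systems & Applications), 2017, 26(12): 165-169
Caching is one of the key technologies of Named Data Networking (NDN). NDN's traditional LCE caching strategy produces substantial redundancy. The improved RCOne strategy places content at random and uses no content or node information, so it improves network caching performance only to a limited extent. The Betw strategy considers only node betweenness, which causes frequent cache replacement at high-betweenness nodes; when node cache capacity is far smaller than the total amount of content, caching performance drops. To address these problems, this paper proposes HotBetw (Hot content placed on node with high Betweenness), a new caching strategy that combines content popularity with node betweenness and makes full use of content and node information to choose the best location for cached content. Simulation experiments show that, compared with typical NDN caching strategies, HotBetw performs well in raising the cache hit ratio and reducing the average hop count.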

16.
Multicomputer cache simulation results derived from address traces collected from an Intel iPSC/2 hypercube multicomputer are presented. The primary emphasis is on examining how increasing the number of processor nodes executing a parallel application affects the overall multicomputer cache performance. The effects on multicomputer direct-mapped cache performance of application-specific data partitioning, data access patterns, communication distribution, and communication frequency are illustrated. The effects of system accesses on total cache performance are explored, as well as the reasons for application-specific differences in cache behavior for system and user accesses. Comparing user code results with full user and system code analysis reveals the significant effect of system accesses, and this effect increases with multicomputer size. The time distribution of an application's message-passing operations is found to affect cache performance more strongly than the total amount of time spent in message-passing code.

17.
To deliver robustness and high quality of service, modern computing architectures running real-time applications should provide high system performance and high timing predictability. Cache memory is used to improve performance by bridging the speed gap between the main memory and the CPU. However, the cache introduces timing unpredictability, creating serious challenges for real-time applications. Herein, we introduce a miss table (MT) based cache locking scheme at the level-2 (L2) cache to further improve the timing predictability and system performance/power ratio. The MT holds information on the block addresses related to the application being processed that cause the most cache misses if not locked. Information in the MT is used for efficient selection of the blocks to be locked and the victim blocks to be replaced. This MT-based approach improves timing predictability by locking important blocks with the highest number of misses inside the cache for the entire execution time. In addition, this technique decreases the average delay per task and total power consumption by reducing cache misses and avoiding unnecessary data transfers. This MT-based solution is effective for both uniprocessors and multicores. We evaluate the proposed MT-based cache locking scheme by simulating an 8-core processor with 2 levels of caches using MPEG4 decoding, H.264/AVC decoding, FFT, and MI workloads. Experimental results show that, in addition to improving the predictability, a reduction of 21% in mean delay per task and a reduction of 18% in total power consumption are achieved for MPEG4 (and H.264/AVC) by using the MT and locking 25% of the L2. The MT results in about 5% delay and power reductions on these video applications, possibly more on applications with worse cache behavior. For the FFT and MI (and other) applications whose code fits inside the level-1 instruction (I1) cache, the mean delay per task increases only by 3% and total power consumption increases by 2% due to the addition of the MT.
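
The block-selection step described in this abstract (lock the blocks with the most misses, up to a fraction of the L2) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name, the trace format, and the idea of building the miss table from an offline profiling trace are all hypothetical.

```python
from collections import Counter


def select_blocks_to_lock(miss_trace, cache_blocks, lock_fraction=0.25):
    """Pick the block addresses with the most recorded misses.

    miss_trace:    iterable of block addresses that missed
                   (hypothetical profiling input).
    cache_blocks:  total number of blocks in the cache.
    lock_fraction: share of the cache that may be locked.
    """
    budget = int(cache_blocks * lock_fraction)   # number of lockable blocks
    miss_table = Counter(miss_trace)             # block address -> miss count
    return [addr for addr, _ in miss_table.most_common(budget)]
```

With an 8-block cache and a 25% lock fraction, the two addresses that miss most often in the trace would be chosen for locking.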

18.
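A query result cache can store either the set of document identifiers in a query's results or the actual result pages returned, in order to speed up responses to user queries; the two forms are called the identifier cache and the page cache, respectively. For a fixed amount of memory, the identifier cache achieves a higher hit ratio, whereas the page cache achieves faster responses. Based on the temporal and spatial locality of user query accesses, this paper proposes a novel hierarchical result-caching mechanism. First, the mechanism divides a fixed-size result cache into two layers: a page cache and an identifier cache. A submitted query is answered from the first-layer page cache if possible; on a miss, the second-layer identifier cache is tried. Experiments show that, compared with traditional mechanisms that rely on a single form of cache, this hierarchical mechanism yields a considerable improvement in average query response time: for example, relative to a pure page cache, the improvement is 9% on average and 11% in the best case. Second, on top of the identifier cache, the mechanism adds a heuristic prefetching strategy that exploits the spatial locality of user queries. Experiments show that incorporating this prefetching strategy further improves retrieval-system performance, ultimately yielding a result-caching mechanism that is effective and covers both temporal and spatial locality.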
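
The two-layer lookup order in this abstract (page cache first, identifier cache second) can be sketched as follows. This is a minimal illustration, not the paper's system: LRU replacement, the admission rule (every answered query enters both layers), and the `search` and `render` callbacks are assumptions, and the prefetching strategy is not modeled.

```python
from collections import OrderedDict


class LRU:
    """Small LRU map; replacement policy is an assumption of this sketch."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)          # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict least recently used


class TwoLevelResultCache:
    def __init__(self, page_slots, id_slots, search, render):
        self.pages = LRU(page_slots)     # layer 1: rendered result pages
        self.doc_ids = LRU(id_slots)     # layer 2: document identifier lists
        self.search = search             # hypothetical: query -> list of doc ids
        self.render = render             # hypothetical: doc ids -> result page

    def lookup(self, query):
        page = self.pages.get(query)
        if page is not None:
            return page, "page-hit"
        ids = self.doc_ids.get(query)
        source = "id-hit"
        if ids is None:                  # missed both layers: run the query
            ids = self.search(query)
            source = "miss"
            self.doc_ids.put(query, ids)
        page = self.render(ids)          # an id hit still pays for rendering
        self.pages.put(query, page)
        return page, source
```

An identifier hit skips the index search but still has to build the page, which matches the abstract's point that the identifier cache trades response speed for a higher hit ratio per unit of memory.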

19.
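Research on high-speed e-mail monitoring and auditing   (Total citations: 1; self-citations: 0; citations by others: 1)
To meet the need for e-mail monitoring in enterprise-class high-speed networks, this paper proposes an e-mail monitoring and auditing scheme based on memory mapping and an improved libnids framework. The scheme first reduces I/O overhead by improving the core of the libnids library and using user-level buffering and memory-mapped files, so that raw e-mail data is captured and stored efficiently. It then analyzes the e-mail protocols in depth, simplifies and encapsulates the captured data into MIME format, and reconstructs the messages using multithreading. Finally, it audits the reconstructed e-mail content with a multi-pattern matching algorithm based on Wu-Manber and generates comprehensive audit reports. Test results show that the system gives enterprise management departments an efficient e-mail supervision tool.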

20.
Improved cache performance is crucial to improved code performance. By visualizing cache behavior as the program is simulated, this memory analysis tool dynamically represents cache phenomena. The results can guide developers in making better software and hardware optimizations. The authors have developed a cache visualization tool that both dynamically visualizes cache content and provides related statistics.
