1.

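Min Qinghao, Zhang Weihua 《Computer Systems & Applications (计算机系统应用)》2015,24(1):1-8

With the rapid development of multicore technology, concurrent processing and bulk data operations have become mainstream, and cache efficiency faces serious challenges as program behavior grows more complex and data volumes grow ever larger. How to use caches more efficiently in complex multicore environments, improving cache response time and data throughput, has long been an important topic and research hotspot in computer architecture. This survey analyzes cache usage scenarios in multicore environments and organizes them from three perspectives: cache efficiency, cache content, and shared use. It identifies targeted problems in cache usage, such as latency, capacity, and sharing, and reviews the cache optimization techniques that address these specific problems and settings. It also discusses some new techniques and new angles for cache optimization, and closes with an outlook on the future of multicore cache optimization.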

2.

Cache profiling and the SPEC benchmarks: a case study

Lebeck A.R. Wood D.A. 《Computer》1994,27(10):15-26

A vital tool-box component, the CProf cache profiling system lets programmers identify hot spots by providing cache performance information at the source-line and data-structure level. Our purpose is to introduce a broad audience to cache performance profiling and tuning techniques. Although used sporadically in the supercomputer and multiprocessor communities, these techniques also have broad applicability to programs running on fast uniprocessor workstations. We show that cache profiling, using our CProf cache profiling system, improves program performance by focusing a programmer's attention on problematic code sections and providing insight into appropriate program transformations.

3.

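A Context-Information-Based Cache Management Algorithm for Pervasive Computing

Chen Mubing, Zhao Jizhong, Xi Min, Qi Yong, Ma Zhaofeng 《Journal of Chinese Computer Systems (小型微型计算机系统)》2007,28(10):1793-1798

A common difficulty in pervasive computing is disconnected operation, yet mobile devices still need to operate on data while disconnected. Supporting disconnected operation requires caching data on the mobile client. The purpose of data collection is to store, before disconnection, the data the user is likely to access in the future in the local cache, so the result of the collection process has a major impact on the performance of disconnected operation. Existing data collection algorithms for disconnected operation all improve cache hits to some degree. To raise the cache hit rate further, this paper performs data collection based on context information. It builds associations among data items as they are accessed and automatically selects the data set to collect on the basis of these associations. Finally, it performs cache replacement according to cache residence time and access count. Simulation results show that the algorithm effectively improves the cache hit rate during disconnected operation on handheld mobile devices with small storage capacity, and thus better supports disconnected operation on mobile devices.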

4.

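Research on a Modified Cache-Sensitive B+ Tree (total citations: 1; self-citations: 0; citations by others: 1)

Wang Chen, Chen Gang, Dong Jinxiang 《Computer Measurement & Control (计算机测量与控制)》2006,14(11):1531-1534,1550

In main-memory databases, the number of processor cache misses has a significant impact on system performance. Cache-sensitive indexes reduce the number of cache misses incurred during lookups and so improve performance. The traditional design sets the node size equal to the cache block size, on the assumption that this reduces cache misses, but this design ignores the effect of TLB misses on system performance. We propose a cache-sensitive index, the modified cache-sensitive B+ tree (MCSB+ tree for short), which accounts for the performance impact of both cache misses and TLB misses and delivers better operation performance than traditional cache-sensitive indexes.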

5.

Reducing division latency with reciprocal caches

Stuart F. Oberman Michael J. Flynn 《Reliable Computing》1996,2(2):147-153

Floating-point division is generally regarded as a high latency operation in typical floating-point applications. Many techniques exist for increasing division performance, often at the cost of increasing either chip area, cycle time, or both. This paper presents two methods for reducing the latency of division. Using applications from the SPECfp92 and NAS benchmark suites, these methods are evaluated to determine their effects on overall system performance. The notion of recurring computation is presented, and it is shown how recurring division can be exploited using an additional, dedicated division cache. For multiplication-based division algorithms, reciprocal caches can be utilized to store recurring reciprocals. Results show that reciprocal caches can achieve nearly a two-times speedup in division performance for reasonable cache sizes.
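The reciprocal-cache idea can be illustrated with a small software model. This is a sketch under stated assumptions, not the authors' hardware design: the class name `ReciprocalCache`, the direct-mapped organization, and the 64-entry default are all hypothetical choices made for illustration. The model stores 1/divisor for recently seen divisors, so a repeated division becomes a table lookup followed by one multiplication.

```python
class ReciprocalCache:
    """Direct-mapped table of recently used reciprocals (illustrative model only)."""

    def __init__(self, num_entries=64):
        self.num_entries = num_entries
        self.tags = [None] * num_entries    # divisor currently held by each entry
        self.values = [0.0] * num_entries   # cached 1/divisor for that entry
        self.hits = 0
        self.misses = 0

    def reciprocal(self, divisor):
        # Map the divisor to one entry; a different divisor mapping to the
        # same entry simply overwrites it (direct-mapped behavior).
        index = hash(divisor) % self.num_entries
        if self.tags[index] == divisor:
            self.hits += 1
            return self.values[index]
        self.misses += 1
        value = 1.0 / divisor               # the slow path: a real division
        self.tags[index] = divisor
        self.values[index] = value
        return value

    def divide(self, dividend, divisor):
        # Multiplication-based division: dividend * (1/divisor).
        # The result may differ from dividend / divisor in the last bit.
        return dividend * self.reciprocal(divisor)


cache = ReciprocalCache(num_entries=8)
for dividend in (10.0, 6.0, 2.0):
    cache.divide(dividend, 4.0)             # the divisor 4.0 recurs
print(cache.hits, cache.misses)
```

In this toy run the first division by 4.0 misses and the next two hit, which is the recurring-divisor pattern the abstract describes.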

6.

Web Caching Replacement Algorithm Based on Web Usage Data

Sorn Jarukasemratana Tsuyoshi Murata 《New Generation Computing》2013,31(4):311-329

Web caching is one of the fundamental techniques for reducing bandwidth usage and download time while browsing the World Wide Web. In this research, we provide an improvement in web caching by combining the result of web usage mining with traditional web caching techniques. Web cache replacement policy is used to select which object should be removed from the cache when the cache is full and which new object should be put into the cache. There are several attributes used for selecting the object to be removed, such as the size of the object, the number of times the object was used, and the time when the object was added into the cache. However, the flaw in these previous approaches is that each object is treated separately without considering the relation among those objects. We have developed a system that can record users’ browsing behavior at the resource level. By using information gathered from this system, we can improve web cache replacement policy so that the number of cache hits will increase, resulting in a faster web browsing experience and less data bandwidth, especially in environments with limited cache storage such as smart phones.

7.

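A Survey of Research on Applying Deep Learning to Multicore Cache Prefetching

Zhang Jianxun 《Application Research of Computers (计算机应用研究)》2024,41(2)

Applying artificial intelligence to computer architecture is a promising research direction, and applying deep learning to data prefetching on multicore architectures in particular has become a research hotspot in China and abroad. This survey studies the task of deep-learning-based cache prefetching and formally defines a deep learning cache prefetching model. After introducing common multicore cache architectures and prefetching techniques, it comprehensively analyzes the design ideas of representative existing deep-learning-based cache prefetchers. Deep learning in multicore cache prefetching mainly uses deep neural networks, recurrent neural networks, long short-term memory networks, and attention mechanisms. A comparative analysis of existing deep-learning-based data prefetching models shows that the approach still has limitations in computational cost, model optimization, and practicality. Adaptive prefetching models and the practicality of neural network prefetching models leave considerable room for future research.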

8.

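A Statistics-Based Method for Estimating Cache Leakage Power

Zhou Hongwei, Zhang Chengyi, Zhang Minxuan 《Journal of Computer Research and Development (计算机研究与发展)》2008,45(2):367-374

As process feature sizes shrink, leakage power has become one of the main constraints on microprocessor design. Sleep Cache and Drowsy Cache are two important techniques for reducing cache leakage power. The statistics-based cache leakage power estimation method (SB-CLPE) estimates cache leakage power for Sleep Cache or Drowsy Cache, and a cache architecture designed around it can estimate cache leakage power in real time during program execution. By collecting statistics on the access intervals of all cache blocks, SB-CLPE can estimate the cache's leakage power under different decay intervals and thereby find the optimal decay interval that minimizes cache leakage power. Experiments show that SB-CLPE's leakage power estimates for Sleep Cache deviate by only 3.16% on average from the results obtained by simulation with the HotLeakage leakage power simulator, and the optimal decay intervals obtained also agree well. A cache architecture using SB-CLPE can estimate the optimal decay interval in real time during program execution and dynamically adjust the decay interval to achieve the best power reduction.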

9.

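CSA-Tree: An Improved High-Dimensional Main-Memory Index Tree

Liang Junjie, Feng Yucai 《Chinese Journal of Computers (计算机学报)》2007,30(3):415-423

Continued advances in main-memory technology have made main-memory multimedia databases feasible. Studies show that the performance of main-memory multimedia database systems is strongly affected by processor cache misses, and cache-conscious main-memory indexes are an effective way to improve data retrieval efficiency. To address the SA-Tree's unsuitability for main-memory access, this paper proposes a variant, the CSA-Tree. The CSA-Tree uses PCA dimensionality reduction to represent the nodes at each level of the tree with a different number of dimensions, which both improves cache space utilization and reduces CPU load, thereby improving index query efficiency. Extensive experiments show that the CSA-Tree delivers good high-dimensional data retrieval performance in a main-memory environment.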

10.

Reducing Data Cache Susceptibility to Soft Errors (total citations: 1; self-citations: 0; citations by others: 1)

Vilas Sridharan Hossein Asadi Mehdi B. Tahoori David Kaeli 《Dependable and Secure Computing, IEEE Transactions on》2006,3(4):353-364

Data caches are a fundamental component of most modern microprocessors. They provide for efficient read/write access to data memory. Errors occurring in the data cache can corrupt data values or state, and can easily propagate throughout the memory hierarchy. One of the main threats to data cache reliability is soft (transient, nonreproducible) errors. These errors can occur more often than hard (permanent) errors, and most often arise from single event upsets (SEUs) caused by strikes from energetic particles such as neutrons and alpha particles. Many protection techniques exist for data caches; the most common are ECC (error correcting codes) and parity. These protection techniques detect all single bit errors and, in the case of ECC, correct them. To make proper design decisions about which protection technique to use, accurate design-time modeling of cache reliability is crucial. In addition, as caches increase in storage capacity, another important goal is to reduce the failure rate of a cache, to limit disruption to normal system operation. In this paper, we present our modeling approach for assessing the impact of soft errors using architectural simulators. We also describe a new technique for reducing the vulnerability of data caches: refetching. By selectively refetching cache lines from the ECC-protected L2 cache, we can significantly reduce the vulnerability of the L1 data cache. We discuss and present results for two different algorithms that perform selective refetch. Experimental results show that we can obtain an 85 percent decrease in vulnerability when running the SPEC2K benchmark suite while only experiencing a slight decrease in performance. Our results demonstrate that selective refetch can cost-effectively decrease the error rate of an L1 data cache.

11.

Balanced instruction cache: reducing conflict misses of direct-mapped caches through balanced subarray accesses (total citations: 1; self-citations: 0; citations by others: 1)

Chuanjun Zhang 《Computer Architecture Letters》2006,5(1):2-5

It is observed that the limited memory space of direct-mapped caches is not used in a balanced way and therefore incurs extra conflict misses. We propose a novel cache organization, the balanced cache, which balances accesses to cache sets at the granularity of cache subarrays. The key technique of the balanced cache is a programmable subarray decoder through which the mapping of memory reference addresses to cache subarrays can be optimized, so that conflict misses of direct-mapped caches can be resolved. The experimental results show that the miss rate of the balanced cache is lower than that of same-sized two-way set-associative caches on average and can be as low as that of same-sized four-way set-associative caches for particular applications. Compared with previous techniques, the balanced cache requires only one cycle to access all cache hits and has the same access time as direct-mapped caches.

12.

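A Metadata Cache Implementation Method Based on an Object Storage System (total citations: 1; self-citations: 0; citations by others: 1)

Zhou Gongye, Wu Weijie, Chen Jincai 《Computer Science (计算机科学)》2007,34(10):146-148

In object storage systems, metadata access speed is one of the key factors affecting file system performance. This paper proposes a method for caching metadata on the client, uses a metadata operation protocol to guarantee cache consistency, and uses a hash-based LFU-DA algorithm to improve cache lookup efficiency. Experiments show that the method reduces the system's average service response time and improves its I/O performance.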
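The abstract names a hash-based LFU-DA algorithm but gives no implementation details, so the following is only a minimal sketch of the general LFU with Dynamic Aging policy, not the paper's method. The class `LfuDaCache` and its fields are hypothetical. Each entry carries a priority equal to its access frequency plus the cache age, the cache age is raised to the priority of each evicted entry, and a Python `dict` stands in for the hash table used for lookup.

```python
class LfuDaCache:
    """Minimal LFU with Dynamic Aging: priority = frequency + cache age."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.age = 0          # cache age: priority of the last evicted entry
        self.entries = {}     # key -> [value, frequency, priority]

    def _touch(self, entry):
        # On every access, bump the frequency and recompute the priority
        # relative to the current cache age.
        entry[1] += 1
        entry[2] = entry[1] + self.age

    def get(self, key):
        entry = self.entries.get(key)       # hash lookup
        if entry is None:
            return None
        self._touch(entry)
        return entry[0]

    def put(self, key, value):
        entry = self.entries.get(key)
        if entry is not None:
            entry[0] = value
            self._touch(entry)
            return
        if len(self.entries) >= self.capacity:
            # Evict the entry with the lowest priority. This linear scan keeps
            # the sketch short; a heap would avoid the O(n) cost.
            victim = min(self.entries, key=lambda k: self.entries[k][2])
            self.age = self.entries[victim][2]
            del self.entries[victim]
        # A new entry starts with frequency 1 on top of the current age, so
        # old entries that were once popular do not stay in the cache forever.
        self.entries[key] = [value, 1, 1 + self.age]


metadata = LfuDaCache(capacity=2)
metadata.put("inode:1", {"size": 4096})
metadata.put("inode:2", {"size": 512})
metadata.get("inode:1")                     # inode:1 is now the hotter entry
metadata.put("inode:3", {"size": 64})       # evicts inode:2, the lowest priority
print(sorted(metadata.entries))
```

The aging term is what distinguishes this policy from plain LFU: without it, entries with high historical counts could never be displaced by newly popular ones.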

13.

Techniques for fast instruction cache performance evaluation

David B. Whalley 《Software》1993,23(1):95-118

Cache performance has become a very crucial factor in the overall system performance of machines. Effective analysis of a cache design requires the evaluation of the performance of the cache for typical programs that are to be executed on the machine. Recent attempts to reduce the time required for such evaluations either result in a loss of accuracy or require an initial pass by a filter to reduce the length of the trace. This paper evaluates techniques that attempt to overcome these problems for instruction cache performance evaluation. For each technique variations with and without periodic context switches are examined. Information calculated during the compilation is used to reduce the number of references in the trace. Thus, in effect references are stripped before the initial trace is generated. These techniques are shown to significantly reduce the time required for evaluating instruction caches with no loss of accuracy.

14.

A comparative analysis of performance improvement schemes for cache memories

Krishna Kavi Izuchukwu Nwachukwu Ademola Fawibe 《Computers & Electrical Engineering》2012,38(2):243-257

There have been numerous techniques proposed in the literature that aim to improve the performance of cache memories by reducing cache conflicts. These techniques were proposed over the past decade and each proposal independently claimed to reduce conflict misses. However, because the published results used different benchmarks and different experimental setups, it is not easy to compare them. In this paper we report a side-by-side comparison of these techniques. We also evaluate the suitability of some of these techniques for caches with higher set associativities. In addition to evaluating techniques for their impact on cache misses and average memory access times, we also evaluate the techniques for their ability to reduce the non-uniformity of cache accesses. The conclusion of our work is that each application may benefit from a different technique and no single scheme works universally well for all applications. We also observe that, for the majority of applications, XORing (XOR) and Odd-multiplier indexing schemes perform reasonably well. Among programmable associativity techniques, B-cache performs better than column-associative and adaptive caches, but column-associative caches require very minimal extensions to hardware. Uniformity of cache accesses is improved most by the B-cache technique, while the column-associative cache also improves the uniformity of cache accesses. Based on the observation that different techniques benefit different applications, we explored the use of multiple, programmable addressing mechanisms, each addressing scheme designed for a specific application. We include some preliminary data using multiple addressing schemes.
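The two indexing schemes singled out above can be sketched as follows. The abstract does not define them, so this uses the formulations commonly given in the literature, stated here as assumptions: XOR indexing combines the conventional index bits with the low tag bits, and odd-multiplier indexing computes (p × tag + index) mod number-of-sets for an odd constant p. The function names, the multiplier 9, and the cache geometry are illustrative choices.

```python
def split_address(addr, block_bits, index_bits):
    """Return (tag, index) of a byte address under conventional indexing."""
    block_number = addr >> block_bits
    index = block_number & ((1 << index_bits) - 1)
    tag = block_number >> index_bits
    return tag, index


def modulo_index(addr, block_bits, index_bits):
    """Conventional indexing: the low bits of the block number select the set."""
    _, index = split_address(addr, block_bits, index_bits)
    return index


def xor_index(addr, block_bits, index_bits):
    """XOR indexing: index bits XOR the low index_bits bits of the tag."""
    tag, index = split_address(addr, block_bits, index_bits)
    return index ^ (tag & ((1 << index_bits) - 1))


def odd_multiplier_index(addr, block_bits, index_bits, multiplier=9):
    """Odd-multiplier indexing: (multiplier * tag + index) mod number of sets."""
    if multiplier % 2 == 0:
        raise ValueError("multiplier must be odd")
    tag, index = split_address(addr, block_bits, index_bits)
    return (multiplier * tag + index) % (1 << index_bits)


# Assumed geometry: 32-byte blocks, 16 sets. Addresses 512 bytes apart share
# the same conventional index, a classic source of conflict misses.
BLOCK_BITS, INDEX_BITS = 5, 4
addresses = [i * 512 for i in range(8)]

for scheme in (modulo_index, xor_index, odd_multiplier_index):
    used = {scheme(a, BLOCK_BITS, INDEX_BITS) for a in addresses}
    print(scheme.__name__, sorted(used))
```

With these parameters the eight addresses collide in a single set under conventional indexing, while both alternative schemes spread them over eight distinct sets. Other access patterns can favor one scheme over the other, which is consistent with the abstract's conclusion that no single scheme works well for every application.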

15.

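A Dynamic Bypass Strategy for Non-Volatile Caches Based on Reuse Information

Jiao Tong, Chen Lingling, An Xin, Li Jianhua 《Computer Engineering (计算机工程)》2021,47(4):158-165

Non-volatile memory offers low energy consumption, good scalability, and high storage density, and can replace traditional static random-access memory as on-chip cache, but its write operations have high energy and latency, so write performance must be optimized before large-scale adoption. This paper proposes a dynamic bypass strategy based on cache block reuse information to optimize the cache performance of non-volatile memory. It analyzes the reuse characteristics of benchmark programs accessing the last-level cache (LLC) and, based on cache block reuse information, dynamically predicts whether the corresponding write should bypass the non-volatile cache. A prediction table drives bypassing for fills on LLC misses, while dynamic path selection handles write-backs from upper-level caches: a monitoring module selects a suitable upper-level cache for each bypassed cache block and fills blocks with high reuse counts into it to reduce the number of LLC writes. Experimental results show that, compared with a cache design without bypassing, the strategy reduces the running time of all SPLASH-2 programs on a 4-core processor by 6.6% on average and reduces cache energy by 22.5% on average, effectively improving overall cache performance.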

16.

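A Cache Architecture Trading Off Latency and Capacity in Chip Multiprocessors (total citations: 1; self-citations: 0; citations by others: 1)

Xiao Junhua, Feng Zijun, Zhang Longbing 《Journal of Computer Research and Development (计算机研究与发展)》2009,46(1)

The design of the level-2 cache in chip multiprocessors (CMPs) faces a conflict: latency and capacity requirements cannot both be met. A private organization has low hit latency but reduces the effective cache capacity, while a shared organization increases the effective capacity but has longer hit latency. This paper proposes a cache architecture for CMPs, the latency-capacity tradeoff cache architecture (TCLC). TCLC is a hybrid of the private and shared designs. Its core idea is to dynamically identify the sharing type of each cache block and optimize each type separately: private blocks are migrated, shared read-only blocks are replicated, and shared read-write blocks are placed centrally. The goal is access latency close to that of a private organization and effective capacity close to that of a shared organization, which mitigates the impact of wire delay and reduces average memory access latency. Full-system simulation results show that TCLC improves performance by 13.7% on average over the private organization and by 12% on average over the shared organization.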

17.

Evolutionary Techniques for Web Caching

Athena Vakali 《Distributed and Parallel Databases》2002,11(1):93-116

Web caching has been proposed as an effective solution to the problems of network traffic and congestion, Web objects access and Web load balancing. This paper presents a model for optimizing Web cache content by applying either a genetic algorithm or an evolutionary programming scheme for Web cache content replacement. Three policies are proposed for each of the genetic algorithm and the evolutionary programming techniques, in relation to objects staleness factors and retrieval rates. A simulation model is developed and long term trace-driven simulation is used to experiment on the proposed techniques. The results indicate that all evolutionary techniques are beneficial to the cache replacement, compared to the conventional replacement applied in most Web cache servers. Under an appropriate objective function the genetic algorithm has been proven to be the best of all approaches with respect to cache hit and byte hit ratios.

18.

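A Study of ABR-Service-Based Flow Control Algorithms for ATM Networks

Dai Lixian 《Computer & Digital Engineering (计算机与数字工程)》2005,33(10):70-73

This paper discusses the congestion control mechanism for ABR traffic in ATM networks and a method for improving it: in the Enhanced Proportional Rate Control Algorithm (EPRCA), network congestion is controlled by monitoring the buffer queue length, which greatly reduces the likelihood of congestion, prevents congestion collapse, and improves buffer utilization.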

19.

Cache Compression and Interface Compression in Chip Multiprocessors

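Xiao Junhua, Feng Zijun, Zhang Longbing 《Computer Engineering (计算机工程)》2008,34(4):247-249

This paper proposes a simple compression method based on frequent values and frequent patterns, and presents a chip multiprocessor architecture that combines cache compression with interface compression. Full-system simulation results show that cache compression and interface compression increase the effective cache capacity and the effective pin bandwidth of a chip multiprocessor, thereby improving system performance. Experiments show that cache compression alone improves performance by 10% on average, interface compression alone by 5.5% on average, and the two together by 12% on average.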

20.

Joint task assignment and cache partitioning with cache locking for WCET minimization on MPSoC

Tiantian Liu Yingchao Zhao Minming Li Chun Jason Xue 《Journal of Parallel and Distributed Computing》2011,71(11):1473-1483

Cache locking is often used to guarantee a tighter prediction of Worst-Case Execution Time (WCET), which is one of the most important performance metrics for embedded systems. However, in Multi-Processor Systems-on-Chip (MPSoC) running multiple tasks, the Level 2 (L2) cache is often shared among different tasks and cores, which makes cache behavior even less predictable. Task assignment inherently affects cache behavior, while cache behavior also affects the efficiency of task assignment. Task assignment and cache behavior have dramatic influences on the overall WCET of MPSoC. This paper proposes joint task assignment and cache partitioning techniques to minimize the overall WCET for MPSoC systems. Cache locking is applied to each task to guarantee a precise WCET. We prove that the joint problem is NP-hard and propose several efficient algorithms. Experimental results show that the proposed algorithms can consistently reduce the overall WCET compared to previous techniques.