Similar Documents
20 similar documents found; search time: 54 ms
1.
Cache optimization for inter-loop data reuse in multiprocessor systems (cited by: 2; self-citations: 0; citations by others: 2)
The use of a cache alleviates the large speed gap between the CPU and main memory; at the same time, the cache hit rate has become an important factor in how well a multiprocessor system performs. There has been active exploration of how to strengthen data locality and raise the cache hit rate so that multiprocessor systems perform better. Past work, however, has concentrated on strengthening data locality within parallel loops and on reducing or even eliminating the cache thrashing caused by true and false sharing of cache lines inside parallel loops; the exploitation of inter-loop data reuse in multiprocessor systems has rarely been discussed. This paper analyzes and discusses how to discover and exploit such inter-loop data reuse and proposes several practical, easy-to-implement methods. These methods…
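The abstract breaks off before the methods are listed, but one standard way to exploit data reuse between adjacent loops is loop fusion. The sketch below illustrates the general technique; it is not code from the paper:

```python
# Two adjacent loops traverse the same array a; by the time the second
# loop re-reads a[i], the cache line holding it may have been evicted.
def separate_loops(a, b, c):
    for i in range(len(a)):
        b[i] = a[i] * 2.0
    for i in range(len(a)):
        c[i] = a[i] + b[i]

# Fused loop: a[i] and b[i] are reused while still cache-resident.
def fused_loop(a, b, c):
    for i in range(len(a)):
        b[i] = a[i] * 2.0
        c[i] = a[i] + b[i]

n = 1_000_000
a, b, c = [1.0] * n, [0.0] * n, [0.0] * n
fused_loop(a, b, c)
```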

2.
On-board disk cache is an effective approach to improve disk performance by reducing the number of physical accesses to the magnetic media. Disk drive manufacturers are increasing the on-board disk cache size to match the capacity growth of the backend magnetic media. Some disk drives nowadays have a cache of 32 MB. Modern computer systems use large amounts of memory to improve performance; any data brought into host memory will be re-accessed there, not in the on-board disk cache. This feature has a significant impact on the behavior of the disk cache, because computer systems are complex systems whose components are correlated with each other. Therefore, a specific component cannot be isolated from the overall system when we analyze its performance behavior. This paper employs four block-level real traces to explore the performance behavior of the on-board disk cache by considering the impacts of the cache hierarchy contained in computer systems. The analysis gives three major implications: (1) The I/O stream at block level contains negligible temporal locality, so a read/write cache can achieve only marginal benefits. (2) A static write cache does not achieve performance gains, since the write stream does not interfere much with the read stream; it is therefore better to leave the on-board disk cache shared by both the write and read streams. (3) The read cache dominates the contribution to the hit ratio besides prefetch. Thus, it is better to focus on improving the read performance rather than the write performance of the disk cache.

3.
The speed gap between processor and main memory is the major performance bottleneck of modern computer systems. As a result, today's microprocessors suffer from frequent cache misses and lose many CPU cycles due to pipeline stalling. Although traditional data prefetching methods considerably reduce the number of cache misses, most of them rely strongly on the predictability of future accesses and often fail when memory accesses do not contain much locality. To solve the long latency problem of current memory systems, this paper presents the design and evaluation of our high-performance decoupled architecture, the HiDISC (Hierarchical Decoupled Instruction Stream Computer). The motivation for the design originated from the traditional decoupled architecture concept and its limitations. The HiDISC approach implements an additional prefetching processor on top of a traditional access/execute architecture. Our design aims at providing low memory access latency by separating and decoupling otherwise sequential pieces of code into three streams and executing each stream on three dedicated processors. The three streams act in concert to mask the long access latencies by providing the necessary data to the upper level on time. This is achieved by separating the access-related instructions from the main computation and running them early enough on the two dedicated processors. Detailed hardware design and performance evaluation are performed with the development of an architectural simulator and compiling tools. Our performance results show that the proposed HiDISC model reduces cache misses by 19.7% and improves the overall IPC (Instructions Per Cycle) by 15.8%. With a slower memory model assuming 200 CPU cycles as memory access latency, HiDISC improves performance by 17.2%.

4.
Proxy caches are essential to improve the performance of the World Wide Web and to reduce user-perceived latency. Appropriate cache management strategies are crucial to achieve these goals. In our previous work, we introduced Web object-based caching policies. A Web object consists of the main HTML page and all of its constituent embedded files. Our studies have shown that these policies improve proxy cache performance substantially. In this paper, we propose a new Web object-based policy to manage the storage system of a proxy cache. We propose two techniques to improve the storage system performance. The first technique prefetches the related files belonging to a Web object from the disk to main memory. This prefetching improves performance, as most of the files can be provided from main memory rather than from the proxy disk. The second technique stores the Web object members in contiguous disk blocks in order to reduce the disk access time. We used trace-driven simulations to study the performance improvements one can obtain with these two techniques. Our results show that the first technique by itself provides up to a 50% reduction in hit latency, which is the delay involved in providing a hit document by the proxy. An additional 5% improvement can be obtained by incorporating the second technique.
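As a rough illustration of the first technique, the sketch below prefetches the members of a Web object into a memory cache when its main HTML page is read. The `embedded_files` map and `read_from_disk` helper are hypothetical stand-ins, not the paper's implementation:

```python
# Sketch of Web-object prefetching: when a main HTML page is read from
# disk, its embedded files (images, CSS, ...) are prefetched into main
# memory so later requests are served from RAM, not from the proxy disk.

memory_cache = {}                      # url -> content, held in RAM

embedded_files = {                     # hypothetical Web-object map
    "/index.html": ["/logo.gif", "/style.css", "/banner.jpg"],
}

def read_from_disk(url):               # placeholder for a disk read
    return f"<contents of {url}>"

def fetch(url):
    if url in memory_cache:            # memory hit: no disk access
        return memory_cache[url]
    content = read_from_disk(url)
    memory_cache[url] = content
    # Prefetch the rest of the Web object from disk.
    for member in embedded_files.get(url, []):
        if member not in memory_cache:
            memory_cache[member] = read_from_disk(member)
    return content

fetch("/index.html")                   # loads page + embedded files
assert "/style.css" in memory_cache    # later requests hit memory
```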

5.
Solid-state drives (SSDs) offer excellent read/write performance but are costly, so in practice SSDs are often combined with conventional hard disk drives (HDDs) to build hybrid storage systems with a better price/performance ratio. In a hybrid storage system, the key to fully exploiting SSD performance is making as many I/O requests as possible hit the SSD. In multi-task shared-storage environments, clustered and random I/O access patterns coexist, and most requests in an I/O workload are typically concentrated in limited regions. Targeting these characteristics, this paper proposes a cache replacement algorithm based on hot-zone tracking (HZT). The HZT algorithm fully considers the spatial and temporal locality of the I/O workload, uses the workload's access history to track the current hot zones, and gives data blocks in hot zones a higher priority to remain in the SSD, effectively improving the SSD cache hit rate in hybrid storage. Tests show that in a typical multi-task shared-storage environment, HZT improves the SSD cache hit rate by 12% over a system using LRU (Least Recently Used); with an appropriate prefetching policy, the improvement over LRU reaches 23%.
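The abstract does not give the HZT bookkeeping in detail; the following is one plausible reading, assuming fixed-size zones, a per-zone access counter, and cold-zone-first eviction. Zone size and hot-zone count are made-up values:

```python
from collections import Counter, OrderedDict

ZONE_SIZE = 1024          # blocks per zone (illustrative value)
HOT_ZONES = 4             # number of zones currently treated as hot

class HZTCache:
    """Loose sketch of hot-zone-tracking replacement (not the paper's code)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()          # block -> None, LRU order
        self.zone_heat = Counter()           # zone -> recent access count

    def _zone(self, block):
        return block // ZONE_SIZE

    def _hot_zones(self):
        return {z for z, _ in self.zone_heat.most_common(HOT_ZONES)}

    def access(self, block):
        self.zone_heat[self._zone(block)] += 1
        if block in self.blocks:
            self.blocks.move_to_end(block)   # refresh LRU position
            return True                      # SSD hit
        if len(self.blocks) >= self.capacity:
            self._evict()
        self.blocks[block] = None
        return False                         # miss: served from the HDD

    def _evict(self):
        hot = self._hot_zones()
        # Prefer evicting the least recent block from a *cold* zone.
        for b in list(self.blocks):
            if self._zone(b) not in hot:
                del self.blocks[b]
                return
        self.blocks.popitem(last=False)      # all blocks hot: plain LRU

cache = HZTCache(capacity=8)
for blk in [1, 2, 3, 1, 2, 1, 5000, 6000, 1, 2]:
    cache.access(blk)
```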

6.
Mobile computers can be equipped with wireless communication devices that enable users to access data services from any location. In wireless communication, the server-to-client (downlink) communication bandwidth is much higher than the client-to-server (uplink) communication bandwidth. This asymmetry makes the dissemination of data to client machines a desirable approach. However, dissemination of data by broadcasting may induce high access latency when the number of broadcast data items is large. We propose two methods aiming to reduce client access latency of broadcast data. Our methods are based on analyzing the broadcast history (i.e., the chronological sequence of items that have been requested by clients) using data mining techniques. With the first method, the data items in the broadcast disk are organized in such a way that items requested in sequence are placed close to each other. The second method focuses on improving the cache hit ratio in order to decrease the access latency. It enables clients to prefetch data from the broadcast disk based on rules extracted from previous data request patterns. The proposed methods are evaluated on a Web log to estimate their effectiveness. It is shown through performance experiments that the proposed rule-based methods are effective in improving system performance in terms of the average latency as well as the cache hit ratio of mobile clients.
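A minimal sketch of the second, rule-based method: mine "A is followed by B" rules from the broadcast request history and prefetch the consequent when the antecedent is requested. The history, confidence threshold, and one-step rule shape are illustrative assumptions, not the paper's mining procedure:

```python
from collections import Counter
from itertools import pairwise  # Python 3.10+

history = [3, 7, 3, 7, 9, 3, 7, 2, 3, 7]     # chronological item requests

pair_counts = Counter(pairwise(history))      # (A, B) adjacency counts
item_counts = Counter(history)

MIN_CONFIDENCE = 0.6                          # illustrative threshold
rules = {                                     # antecedent -> consequent
    a: b
    for (a, b), n in pair_counts.items()
    if n / item_counts[a] >= MIN_CONFIDENCE
}

def on_request(item, client_cache):
    client_cache.add(item)
    if item in rules:                         # rule fires: prefetch from
        client_cache.add(rules[item])         # the broadcast disk

cache = set()
on_request(3, cache)                          # rule 3 -> 7 prefetches 7
assert 7 in cache
```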

7.
As the speed gap between processors and memory keeps widening, memory-access instructions, especially those that frequently miss in the cache, have become a major performance bottleneck. Since the compiler cannot know how many cycles a memory instruction takes at runtime, it usually assumes either the cache-hit or the cache-miss latency, which is inaccurate. We introduce cache profiling to collect runtime cache hit/miss information for memory instructions and use this information to compute their latencies. On out-of-order machines, hardware instruction scheduling handles instructions within the issue window well, while the compiler has the advantage when scheduling over a much longer range. Once a cache miss occurs, the reorder buffer easily fills up, stalling the pipeline. Scheduling miss-prone instructions so that they execute in parallel hides their long miss latencies and improves program performance. We therefore target load instructions: we adjust the assumed latency of frequently missing loads and modify the scheduling policy to raise memory-level parallelism. Experiments show that our scheduling improves bzip2 by up to 4.8% and art by 4%, with an overall average improvement of 1.5%.

8.
A new cache architecture based on temporal and spatial locality (cited by: 5; self-citations: 0; citations by others: 5)
A data cache system is designed as a low-power/high-performance cache structure for embedded processors. A direct-mapped cache is a favorite choice for short cycle times but suffers from a high miss rate. The proposed dual data cache is an approach to improve the miss ratio of a direct-mapped cache without affecting its access time. The proposed cache system can exploit temporal and spatial locality effectively by maximizing the effective cache memory space for any given cache size. It consists of two caches: a direct-mapped cache with a small block size and a fully associative spatial buffer with a large block size. Temporal locality is exploited by selectively caching candidate small blocks in the direct-mapped cache, while spatial locality is exploited aggressively by fetching multiple neighboring small blocks whenever a cache miss occurs. According to the results of comparison and analysis, similar performance can be achieved with a cache four times smaller than a conventional direct-mapped cache, and the power consumption of the proposed cache can be reduced by around 4% compared with the victim cache configuration.
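A toy simulation of the described lookup path, under the assumption that a small block is promoted into the direct-mapped cache once it is re-referenced in the spatial buffer; the promotion rule is a guess, and block and cache sizes are illustrative:

```python
from collections import OrderedDict

SMALL = 8      # small block size in bytes (illustrative)
LARGE = 32     # large block size = 4 small blocks (illustrative)

class DualCache:
    """Sketch of a direct-mapped cache + fully associative spatial buffer."""
    def __init__(self, dm_lines=64, sb_lines=8):
        self.dm = [None] * dm_lines               # direct-mapped: tag per line
        self.dm_lines = dm_lines
        self.sb_lines = sb_lines
        self.sb = OrderedDict()                   # large-block tag -> hit count

    def access(self, addr):
        small_tag = addr // SMALL
        large_tag = addr // LARGE
        if self.dm[small_tag % self.dm_lines] == small_tag:
            return "dm-hit"
        if large_tag in self.sb:                  # spatial-buffer hit
            self.sb.move_to_end(large_tag)
            self.sb[large_tag] += 1
            if self.sb[large_tag] > 1:            # re-referenced: temporal
                self.dm[small_tag % self.dm_lines] = small_tag
            return "sb-hit"
        if len(self.sb) >= self.sb_lines:
            self.sb.popitem(last=False)           # evict LRU large block
        self.sb[large_tag] = 1                    # fetch neighbors as one block
        return "miss"

c = DualCache()
print([c.access(a) for a in (0, 8, 16, 0, 0)])
# -> ['miss', 'sb-hit', 'sb-hit', 'sb-hit', 'dm-hit']
```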

9.
Making good use of main memory as a cache and optimizing request handling for disk devices are effective ways to alleviate the system I/O bottleneck. This paper proposes a driver-initiated write-back method for flushing dirty data from the in-memory cache to disk. The basic idea is to let the disk device driver, rather than the file system, initiate the write-back of dirty data in the file system cache, so that the disk can complete write requests at an appropriate time (when the device is idle) or at an appropriate position (reducing seek and rotational delays), lessening the impact of cache flush operations on the running application. Simulation experiments show that this method improves disk write efficiency, system data reliability, and overall system I/O performance.
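A minimal sketch of the idea, assuming the driver keeps its own dirty map, checks an idle indicator, and flushes in ascending block order to shorten seeks. All names here are hypothetical, not the paper's interface:

```python
dirty = {}                          # block number -> data awaiting write-back

def mark_dirty(block, data):
    dirty[block] = data             # file system only marks; no forced flush

def device_idle():                  # placeholder for a real idle check
    return True

def driver_tick(write_block):
    """Driver flushes at a time, and in an order, of its own choosing."""
    if not device_idle():
        return
    for block in sorted(dirty):     # ascending block order cuts seek time
        write_block(block, dirty.pop(block))

out = []
mark_dirty(9, b"b")
mark_dirty(2, b"a")
driver_tick(lambda blk, data: out.append(blk))
print(out)                          # [2, 9]
```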

10.
With the arrival of the big-data era, the growing volume of backup data poses new challenges for storage capacity. Data deduplication is becoming popular in backup storage systems, but the large number of data accesses places a heavy burden on the disk. To address the disk bottleneck of chunk-index lookups in deduplication, this paper proposes a method that combines file similarity with data-stream locality to improve disk I/O performance. The method exploits the strengths of both: similarity optimizes index lookup and can detect duplicates that identical-chunk detection cannot identify, while stream locality preserves the ordering of the data stream, raising the cache hit rate and reducing the number of disk accesses. Storing the chunk index in a Bloom filter saves considerable lookup time and space. The paper analyzes in depth the key parameters involved, such as chunk size and segment size, and their effect on the false-positive rate. The experimental evaluation and performance analysis provide an important data basis for further system performance optimization.
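A minimal sketch of the Bloom-filter chunk index: a negative answer from the filter proves the chunk is new and skips the on-disk index lookup entirely, so only possible duplicates pay for a disk access. Filter sizes and the dict standing in for the on-disk index are illustrative:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter over chunk fingerprints (illustrative sizes)."""
    def __init__(self, bits=1 << 20, hashes=4):
        self.bits = bits
        self.hashes = hashes
        self.array = bytearray(bits // 8)

    def _positions(self, fp):
        for i in range(self.hashes):
            h = hashlib.sha1(fp + i.to_bytes(1, "big")).digest()
            yield int.from_bytes(h[:8], "big") % self.bits

    def add(self, fp):
        for p in self._positions(fp):
            self.array[p // 8] |= 1 << (p % 8)

    def might_contain(self, fp):
        return all(self.array[p // 8] & (1 << (p % 8))
                   for p in self._positions(fp))

bloom = BloomFilter()

def dedup_write(chunk, on_disk_index):
    fp = hashlib.sha1(chunk).digest()        # chunk fingerprint
    if bloom.might_contain(fp) and fp in on_disk_index:
        return "duplicate"                   # possible hit: confirm on disk
    bloom.add(fp)                            # definite miss: no disk lookup
    on_disk_index[fp] = chunk
    return "stored"

index = {}                                   # dict standing in for disk index
print(dedup_write(b"hello", index))          # stored
print(dedup_write(b"hello", index))          # duplicate
```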

11.
Recently, a hybrid disk drive that integrates a small amount of flash memory within a mechanical drive has received significant attention. The hybrid drive extends the storage hierarchy by using flash memory to cache data from the mechanical disk. Unfortunately, current caching architectures fail to fully exploit the potential of the hybrid drive. Furthermore, current disk input/output (I/O) schedulers are optimized for rotational mechanical disk drives and thus must be re-targeted for the hybrid disk drive. In this paper, we propose a new data caching scheme, called Profit Caching, for hybrid drives. Profit Caching is a self-optimizing caching algorithm. It considers and seamlessly integrates all possible data characteristics that impact the performance of hybrid drives, including read count, write count, sequentiality, randomness, and recency, to determine the caching policy. Moreover, we propose a hybrid disk-aware Completely Fair Queuing (HA-CFQ) scheduler to avoid unnecessary I/O anticipations of the CFQ scheduler. We have implemented Profit Caching and the HA-CFQ scheduler in the Linux kernel. Coupled with a trace-driven simulator, we have also conducted detailed experiments under a variety of workloads. Experimental results show that Profit Caching provides significantly improved performance compared with the previous schemes. In particular, the throughput of Profit Caching outperforms the previous Random Access First and FlashCache caching schemes by factors of up to 1.8 and 7.6, respectively. In addition, the HA-CFQ scheduler reduces the total execution time of the CFQ scheduler by up to 1.74%. Finally, the experimental results show that the runtime overhead of Profit Caching is extremely insignificant and can be ignored. Copyright © 2014 John Wiley & Sons, Ltd.
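The abstract names the inputs to Profit Caching but not the formula, so the sketch below uses a hypothetical weighted score over those inputs merely to show the shape of a profit-based eviction decision:

```python
import time

# Hypothetical weights; the paper only names the inputs (read/write
# counts, sequentiality, randomness, recency), not how they combine.
W_READ, W_WRITE, W_RANDOM, W_RECENT = 2.0, 1.0, 3.0, 4.0

def profit(stats, now):
    recency = 1.0 / (1.0 + now - stats["last_access"])
    randomness = 1.0 - stats["sequential_ratio"]   # random I/O gains most
    return (W_READ * stats["reads"]
            + W_WRITE * stats["writes"]
            + W_RANDOM * randomness * (stats["reads"] + stats["writes"])
            + W_RECENT * recency)

def pick_victims(block_stats, now, n_keep):
    """Keep the n_keep highest-profit blocks in flash; evict the rest."""
    ranked = sorted(block_stats,
                    key=lambda b: profit(block_stats[b], now),
                    reverse=True)
    return ranked[n_keep:]

now = time.time()
blocks = {
    7: {"reads": 40, "writes": 2, "sequential_ratio": 0.1,
        "last_access": now - 1},        # hot, random: high profit
    8: {"reads": 5, "writes": 1, "sequential_ratio": 0.9,
        "last_access": now - 500},      # cold, sequential: low profit
}
print(pick_victims(blocks, now, n_keep=1))   # -> [8]
```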

12.
Caches are essential to bridge the gap between the high latency main memory and the fast processor pipeline. Standard processor architectures implement two first-level caches to avoid a structural hazard in the pipeline: an instruction cache and a data cache. For tight worst-case execution times it is important to classify memory accesses as either cache hit or cache miss. The addresses of instruction fetches are known statically and static cache hit/miss classification is possible for the instruction cache. The access to data that is cached in the data cache is harder to predict statically. Several different data areas, such as stack, global data, and heap allocated data, share the same cache. Some addresses are known statically, other addresses are only known at runtime. With a standard cache organization all those different data areas must be considered by worst-case execution time analysis. In this paper we propose to split the data cache for the different data areas. Data cache analysis can be performed individually for the different areas. Access to an unknown address in the heap does not destroy the abstract cache state for other data areas. Furthermore, we propose to use a small, highly associative cache for the heap area. We designed and implemented a static analysis for this cache, and integrated it into a worst-case execution time analysis tool.
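A schematic of the split-cache routing, assuming illustrative address ranges for the stack, global, and heap areas; no eviction or timing is modeled, the point being only that each area keeps its own cache state:

```python
# Route each access to a per-area cache so an unknown heap address
# cannot disturb the abstract cache state of the stack or global areas.
STACK_BASE, GLOBAL_LIMIT = 0xF000_0000, 0x1000_0000   # made-up ranges

def area(addr):
    if addr >= STACK_BASE:
        return "stack"
    if addr < GLOBAL_LIMIT:
        return "global"
    return "heap"

caches = {"stack": set(), "global": set(), "heap": set()}

def access(addr, line_size=32):
    line = addr // line_size
    c = caches[area(addr)]
    hit = line in c
    c.add(line)                     # eviction not modeled in this sketch
    return hit

access(0xF000_1000)                 # touches only the stack cache
access(0x2000_0000)                 # heap access cannot evict stack lines
print({a: len(c) for a, c in caches.items()})
```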

13.
The power consumed by memory systems accounts for 45% of the total power consumed by an embedded system, and the power consumed during a memory access is 10 times higher than during a cache access. Thus, increasing the cache hit rate can effectively reduce the power consumption of the memory system and improve system performance. In this study, we increased the cache hit rate and reduced the cache-access power consumption by developing a new cache architecture known as a single linked cache (SLC) that stores frequently executed instructions. The SLC has the low power consumption and low access delay of a direct-mapped cache and, by adding a new link field, a cache hit rate close to that of a two-way set-associative cache. In addition, we developed another design known as multiple linked caches (MLC) to further reduce the power consumption of each cache access and to avoid unnecessary cache accesses when the requested data is absent from the cache. In MLC, the linked cache is split into several small linked caches that store frequently executed instructions, reducing the power consumed by each access. To avoid unnecessary cache accesses when a requested instruction is not in the linked caches, the addresses of the frequently executed blocks are recorded in the branch target buffer (BTB). By consulting the BTB, a processor can access memory to obtain the requested instruction directly if the instruction is not in the cache. In the simulation results, our method performed better than selective compression, a traditional cache, and a filter cache in terms of cache hit rate, power consumption, and execution time.
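The exact semantics of the link field are not spelled out in the abstract; one plausible reading, sketched below, is that each direct-mapped line carries a link to an alternate line that is checked on a conflict, approximating two-way behavior. The victim choice here is deliberately simplistic:

```python
LINES = 8

tags  = [None] * LINES    # tag stored in each direct-mapped line
links = [None] * LINES    # link field: index of an alternate line

def lookup(addr):
    idx, tag = addr % LINES, addr // LINES
    if tags[idx] == tag:
        return "hit"
    alt = links[idx]
    if alt is not None and tags[alt] == tag:
        return "linked-hit"               # found via the link field
    # Miss: place the block and remember where, via the link field.
    if tags[idx] is None:
        tags[idx] = tag
    else:
        victim = (idx + 1) % LINES        # simplistic alternate line
        tags[victim] = tag
        links[idx] = victim
    return "miss"

print(lookup(0))    # miss  (line 0 filled)
print(lookup(8))    # miss  (conflict: placed in line 1, linked from 0)
print(lookup(0))    # hit
print(lookup(8))    # linked-hit
```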

14.
An improved adaptive cache management algorithm for second-level caches (cited by: 1; self-citations: 0; citations by others: 1)
In environments such as cluster systems and database servers, local memory resources are limited, so some memory-intensive applications interact with the disk excessively, which severely hurts their performance. With high-speed network support, the memory of other nodes, or a dedicated memory server, can serve as a second-level cache for the system, reducing disk accesses and improving application performance. For this second-level cache setting, this paper builds on the LIRS algorithm, remedies its shortcomings, and proposes an adaptive cache management algorithm, LIRS-A. LIRS-A adapts to the application's access characteristics, avoiding LIRS's poor fit for certain access patterns with temporal locality. On TPC-H, LIRS-A outperforms LIRS by up to 7.2%; on typical group-by queries over a network-flow analysis database, LIRS-A improves the hit rate over LIRS by up to 31.2%.

15.
Cache acceleration exploits the high random-access performance of solid-state disks (SSDs) to improve the random read/write performance of mechanical hard disks. Traditional cache acceleration techniques struggle to serve the hot-data access patterns of the big-data era, such as high concurrency and intermittent bursts of frequent access. To improve overall cache performance, this paper proposes a cache policy based on a virtual storage layer (CVSL) that combines caching with tiered storage: through hotness statistics and logical data migration, it implements cache control based on logical data tiering. Experimental results show that, compared with traditional cache policies, the CVSL policy improves random read/write performance by 9%-10% with no noticeable fluctuation and achieves a good cache hit rate, meeting its design goals.

16.
A prefetching policy that incorporates the state of the memory-access miss queue (cited by: 1; self-citations: 0; citations by others: 1)
As the gap between memory access speed and processor speed grows ever more pronounced, memory performance has become the bottleneck of overall computer system performance. Based on an analysis of instruction-cache and data-cache miss behavior, this paper proposes a prefetching policy that incorporates the state of the memory-access miss queue. The policy preserves the order of instruction and data accesses, which helps extract prefetch streams, and it separates instruction-stream prefetching from data-stream prefetching to avoid mutual replacement. When choosing the moment to issue a prefetch, it considers not only whether the bus is currently idle but also the state of the miss queue, reducing interference with the processor's normal memory requests. A stream-filtering mechanism improves prefetch accuracy and lowers the memory bandwidth demanded by prefetching. Results show that with this policy, average memory latency drops by 30% and the IPC of SPEC CPU2000 programs improves by 8.3% on average.
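A minimal sketch of the issue gate the abstract describes: a prefetch goes out only when the bus is idle and the miss queue keeps enough free slots for demand misses. Queue depth and headroom are made-up values:

```python
MISS_QUEUE_DEPTH = 16
FREE_SLOTS_NEEDED = 4      # hypothetical headroom kept for demand misses

def may_issue_prefetch(bus_idle: bool, miss_queue_len: int) -> bool:
    free = MISS_QUEUE_DEPTH - miss_queue_len
    return bus_idle and free >= FREE_SLOTS_NEEDED

assert may_issue_prefetch(bus_idle=True, miss_queue_len=3)
assert not may_issue_prefetch(bus_idle=True, miss_queue_len=14)
assert not may_issue_prefetch(bus_idle=False, miss_queue_len=0)
```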

17.
As the gap between memory access speed and processor speed becomes increasingly significant, memory performance has become the bottleneck for improving processor performance. Based on an analysis of program memory-access behavior, this paper proposes an adaptive stack cache scheme with fast address calculation. The scheme separates stack accesses from data-cache accesses and fully exploits the characteristics of stack data accesses, increasing instruction-level parallelism, reducing data-cache pollution, and lowering the data-cache miss rate; a fast address-calculation strategy reduces the hit time of stack accesses. The stack cache can adaptively turn itself off on stack overflow to avoid the performance impact of stack switching. A process identifier is added to the stack-cache tag, so data need not be written back to the lower level of the storage hierarchy on a process switch, making the scheme suitable for multi-process environments. Results on SPEC CPU2000 show that with this scheme, 25.8% of memory instructions can execute in parallel, the data-cache miss rate drops by 9.4% on average, and IPC improves by 6.9% on average.

18.
This paper is concerned with the seek control of Optical Disk Drives (ODDs). We propose a direct Seek Control Scheme (SCS) that provides fast data access capability and robust stability for high-performance optical disk drives. Although an optical disk drive has the significant advantage of random accessibility, the increased rotational speed of the disk and the limitations of the mechanical structure prevent the conventional SCS from achieving stable and satisfactory seek performance. The conventional seek control technique utilizes only the coarse actuator without any maneuvering of the fine actuator. In this paper, we analyze the problems that arise when the conventional SCS is applied to a high-speed rotational ODD and propose a new SCS that employs both the coarse and fine actuators. With the assistance of the fine actuator, the seek control system is designed so that its performance is guaranteed under various disturbances and mechanical limitations. Simulations and experiments show the improvements of the proposed direct SCS implemented on a practical DVD-ROM drive system.

19.
Flash solid-state drives (SSDs) provide much faster access to data compared with traditional hard disk drives (HDDs). The current price and performance of SSD suggest it can be adopted as a data buffer between main memory and HDD, and buffer management policy in such hybrid systems has attracted more and more interest from the research community recently. In this paper, we propose a novel approach to manage the buffer in flash-based hybrid storage systems, named hotness aware hit (HAT). HAT exploits a page reference queue to record the access history as well as the status of accessed pages, i.e., hot, warm, and cold. Additionally, the page reference queue is further split into hot and warm regions which correspond to the memory and flash in general. The HAT approach updates the page status and deals with the page migration in the memory hierarchy according to the current page status and hit position in the page reference queue. Compared with the existing hybrid storage approaches, the proposed HAT can manage the memory and flash cache layers more effectively. Our empirical evaluation on benchmark traces demonstrates the superiority of the proposed strategy against the state-of-the-art competitors.
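A loose sketch of HAT-style bookkeeping, assuming a hot region backed by memory, a warm region backed by flash, one-level promotion on a hit, and LRU demotion; the paper's actual rules are richer than this:

```python
from collections import OrderedDict

HOT_CAP, WARM_CAP = 4, 8    # illustrative region sizes

hot = OrderedDict()         # page -> None, resident in memory
warm = OrderedDict()        # page -> None, resident in flash

def _demote_from_hot():
    page, _ = hot.popitem(last=False)       # oldest hot page -> warm
    warm[page] = None                       # capacity handling kept minimal

def access(page):
    if page in hot:                         # memory hit: stays hot
        hot.move_to_end(page)
        return "hot-hit"
    if page in warm:                        # flash hit: promote to memory
        del warm[page]
        if len(hot) >= HOT_CAP:
            _demote_from_hot()
        hot[page] = None
        return "warm-hit"
    if len(warm) >= WARM_CAP:               # cold miss: admit to flash
        warm.popitem(last=False)
    warm[page] = None
    return "cold-miss"

for p in [1, 2, 1, 1, 3]:
    print(p, access(p))
# 1 cold-miss / 2 cold-miss / 1 warm-hit / 1 hot-hit / 3 cold-miss
```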

20.
Today independent publishers are offering digital libraries with fulltext archives. In an attempt to provide a single user interface to a large set of archives, the studied Article-Database-Service offers a consolidated interface to a geographically distributed set of archives. While this approach offers a tremendous functional advantage to a user, the fulltext download delays caused by the network and by queuing in servers make the user-perceived interactive performance poor. This paper studies how effective caching of articles can be achieved at the client level as well as at intermediate points, as manifested by gateways that implement the interfaces to the many fulltext archives. A central research question in this approach is: what is the nature of locality in the user access stream to such a digital library? Based on access logs that drive the simulations, it is shown that client-side caching can result in a 20% hit rate. Even at the gateway level temporal locality is observable, but published replacement algorithms are unable to exploit it. Additionally, spatial locality can be exploited by loading into the cache all articles in an issue, volume, or journal when a single article is accessed, but our experiments showed that this improvement introduced considerable overhead. Finally, it is shown that the reason for this cache behavior is the long time between re-accesses, which makes caching quite infeasible.
