Similar Literature — 20 results
1.
Second-level buffer cache management
Buffer caches are commonly used in servers to reduce the number of slow disk accesses or network messages. These buffer caches form a multilevel buffer cache hierarchy. In such a hierarchy, second-level buffer caches see different access patterns from first-level buffer caches because accesses to the second level are actually misses from the first level. Therefore, commonly used cache management algorithms, such as least recently used (LRU) replacement, that work well for single-level buffer caches may not work well for second-level caches. We investigate multiple approaches to effectively manage second-level buffer caches. In particular, we report our research results in 1) second-level buffer cache access pattern characterization, 2) a new local algorithm called multi-queue (MQ) that performs better than nine tested alternative algorithms for second-level buffer caches, 3) a set of global algorithms that manage a multilevel buffer cache hierarchy globally and significantly improve second-level buffer cache hit ratios over the corresponding local algorithms, and 4) implementation and evaluation of these algorithms in a real storage system connected to commercial database servers (Microsoft SQL Server and Oracle) running industrial-strength online transaction processing benchmarks.
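A minimal Python sketch of the multi-queue (MQ) idea as the abstract describes it: blocks are spread across several LRU queues by reference frequency, blocks whose lifetime expires are demoted toward eviction, and a history buffer remembers frequencies of evicted blocks. The queue count, lifetime, and history size here are illustrative assumptions, not the authors' parameters.

```python
import math
from collections import OrderedDict

class MQCache:
    def __init__(self, capacity, num_queues=8, life_time=100):
        self.capacity = capacity
        self.m = num_queues
        self.life = life_time
        self.queues = [OrderedDict() for _ in range(num_queues)]  # blk -> (freq, expire)
        self.qout = OrderedDict()          # history of evicted blocks: blk -> freq
        self.qout_cap = capacity * 4       # assumed history size
        self.time = 0
        self.size = 0

    def _queue_of(self, freq):
        return min(int(math.log2(freq)), self.m - 1)

    def _adjust(self):
        # demote LRU blocks whose lifetime has expired to the next lower queue
        for k in range(1, self.m):
            q = self.queues[k]
            while q:
                blk, (freq, expire) = next(iter(q.items()))
                if expire >= self.time:
                    break
                del q[blk]
                self.queues[k - 1][blk] = (freq, self.time + self.life)

    def _evict(self):
        for q in self.queues:              # victim = LRU block of lowest non-empty queue
            if q:
                blk, (freq, _) = q.popitem(last=False)
                self.qout[blk] = freq      # remember its frequency in the history
                if len(self.qout) > self.qout_cap:
                    self.qout.popitem(last=False)
                self.size -= 1
                return

    def access(self, blk):
        self.time += 1
        hit = False
        for q in self.queues:
            if blk in q:
                freq, _ = q.pop(blk)
                freq += 1
                hit = True
                break
        if not hit:
            freq = self.qout.pop(blk, 0) + 1   # restore any remembered frequency
            if self.size >= self.capacity:
                self._evict()
            self.size += 1
        self.queues[self._queue_of(freq)][blk] = (freq, self.time + self.life)
        self._adjust()
        return hit
```

Keeping frequency history across evictions is what lets MQ recognize the "miss stream" reuse distances that a second-level cache sees; a plain LRU has no such memory.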

2.
Using a cache vulnerability factor to measure the susceptibility of cache memories to transient errors at the architecture level can help designers make appropriate cost and reliability trade-offs in early design cycles. Two early write-back strategies can also improve the reliability of write-back data caches without compromising performance.
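The abstract does not spell out the two early write-back strategies, but the underlying idea can be illustrated: writing dirty lines back before eviction shrinks the window in which a transient error can corrupt the only copy of the data. A toy sketch under that assumption, with an invented aging threshold:

```python
class EarlyWritebackCache:
    def __init__(self, backing, age_threshold=1000):
        self.backing = backing          # dict standing in for the next level
        self.lines = {}                 # addr -> [value, dirty, dirty_since]
        self.threshold = age_threshold
        self.now = 0

    def tick(self):
        self.now += 1
        for addr, line in self.lines.items():
            value, dirty, since = line
            if dirty and self.now - since >= self.threshold:
                self.backing[addr] = value       # early write-back
                line[1], line[2] = False, None   # line is clean (and safe) again

    def write(self, addr, value):
        self.lines[addr] = [value, True, self.now]

    def read(self, addr):
        if addr not in self.lines:
            self.lines[addr] = [self.backing.get(addr), False, None]
        return self.lines[addr][0]
```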

3.
Recent results in the Rio project at the University of Michigan show that it is possible to create an area of main memory that is as safe as disk from operating system crashes. This paper explores how to integrate the reliable memory provided by the Rio file cache into a database system. Prior studies have analyzed the performance benefits of reliable memory; we focus instead on how different designs affect reliability. We propose three designs for integrating reliable memory into databases: non-persistent database buffer cache, persistent database buffer cache, and persistent database buffer cache with protection. Non-persistent buffer caches use an I/O interface to reliable memory and require the fewest modifications to existing databases. However, they waste memory capacity and bandwidth due to double buffering. Persistent buffer caches use a memory interface to reliable memory by mapping it into the database address space. This places reliable memory under complete database control and eliminates double buffering, but it may expose the buffer cache to database errors. Our third design reduces this exposure by write-protecting the buffer pages. Extensive fault tests show that mapping reliable memory into the database address space does not significantly hurt reliability. This is because wild stores rarely touch dirty, committed pages written by previous transactions. As a result, we believe that databases should use a memory interface to reliable memory.

4.
In this paper we describe the design of an effective caching mechanism for resource-limited, definite-clause theorem-proving systems. Previous work in adapting caches for theorem proving relies on the use of unlimited-size caches. We show how unlimited-size caches are unsuitable in application contexts where resource-limited theorem provers are used to solve multiple problems from a single problem distribution. We introduce bounded-overhead caches, that is, those caches that contain at most a fixed number of entries and entail a fixed amount of overhead per lookup, and we examine cache design issues for bounded-overhead caches. Finally, we present an empirical evaluation of bounded-overhead cache performance, relying on a specially designed experimental methodology that separates hardware-dependent, implementation-dependent, and domain-dependent effects.
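A bounded-overhead cache, in the sense defined above, can be sketched as a fixed-capacity table with constant work per lookup; the FIFO replacement rule here is one illustrative choice among the design issues the paper examines.

```python
from collections import OrderedDict

class BoundedOverheadCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.table = OrderedDict()       # goal (key) -> cached proof result

    def lookup(self, key):
        return self.table.get(key)       # single hash probe: fixed overhead

    def store(self, key, result):
        if key in self.table:
            return
        if len(self.table) >= self.capacity:
            self.table.popitem(last=False)   # FIFO: evict the oldest entry
        self.table[key] = result
```

Because both the entry count and the per-lookup work are capped, the cache cannot consume the resource budget that the prover itself needs, which is the failure mode of unlimited-size caches in this setting.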

5.
Coherence protocols consume a significant fraction of power in determining which coherence action to perform. Specifically, on CMPs with a shared cache and a directory-based coherence protocol implemented as a duplicate of the local caches' tags, we have observed that a large fraction of directory lookups miss, because the block looked up is not allocated in any local cache. To reduce the number of directory lookups, and therefore the power consumption, we propose adding a filter before the directory access. We introduce two filter implementations. In the first, filtering information is kept explicitly in the shared cache for every block. In the second, filtering information is decoupled from the shared cache organization, so the filter size does not depend on the shared cache size. We evaluate our filters in a CMP with 8 in-order processors with 4 threads each and a memory hierarchy with write-through local caches and a shared cache. We show that, for the SPLASH2 benchmarks, the proposed filters reduce the number of directory lookups performed by 60% while power consumption is reduced by roughly 28%. For SPECweb2005, the number of directory lookups performed is reduced by 68% (44%), while directory power consumption is reduced by 19% (9%) using the first (second) filter implementation.
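A sketch of the second (cache-decoupled) filter variant under one plausible realization: a small counting Bloom filter tracks which blocks are allocated in any local cache and is consulted before the duplicate-tag directory lookup. The counter array size and hash functions are assumptions, not the paper's design.

```python
class DirectoryFilter:
    def __init__(self, num_counters=4096):
        self.counters = [0] * num_counters
        self.n = num_counters

    def _hashes(self, block_addr):
        return (block_addr % self.n, (block_addr * 2654435761) % self.n)

    def on_local_allocate(self, block_addr):
        for h in self._hashes(block_addr):
            self.counters[h] += 1

    def on_local_evict(self, block_addr):
        for h in self._hashes(block_addr):
            self.counters[h] -= 1

    def may_be_cached(self, block_addr):
        # False means the block is definitely in no local cache, so the
        # directory lookup (and its power cost) can be skipped entirely.
        return all(self.counters[h] > 0 for h in self._hashes(block_addr))
```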

6.
Write-invalidate protocols suffer from memory-access penalties due to coherence misses. While write-update or hybrid update/invalidate protocols can reduce coherence misses, the update traffic can increase memory-system contention. We show in this paper that update-based cache protocols can perform significantly better than write-invalidate protocols by incorporating a write cache in each processing node. Because it is legal to delay the propagation of modifications of a block until the next synchronization under relaxed memory consistency models, a write cache can significantly reduce traffic by exploiting locality in write accesses. By concentrating on a cache-coherent NUMA architecture, we study the implementation aspects of augmenting a write-invalidate, a write-update and two hybrid update/invalidate protocols with write caches. Through detailed architectural simulations using five benchmark programs, we find that write caches, with only a few blocks each, help write-invalidate protocols to cut the false-sharing miss rate and hybrid update/invalidate protocols to keep other copies, including the memory copy, clean at an acceptable write traffic level. Overall, the memory-access penalty associated with coherence misses is drastically reduced.
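A sketch of the write-cache mechanism the paragraph relies on: under relaxed consistency, stores may legally be held back until the next synchronization, so they are coalesced per block and propagated in one coherence action at release. The `propagate` callback stands in for the protocol machinery and is an assumption.

```python
class WriteCache:
    def __init__(self, num_blocks, propagate):
        self.num_blocks = num_blocks     # only a few blocks per node suffice
        self.pending = {}                # block -> merged dirty words
        self.propagate = propagate       # callback into the coherence protocol

    def store(self, block, offset, value):
        if block not in self.pending and len(self.pending) >= self.num_blocks:
            self._flush_one()            # capacity: push one block out early
        self.pending.setdefault(block, {})[offset] = value   # coalesce writes

    def _flush_one(self):
        block, words = self.pending.popitem()
        self.propagate(block, words)     # one coherence action for many stores

    def release(self):
        # synchronization point: all delayed modifications become visible
        while self.pending:
            self._flush_one()
```

Coalescing is what cuts the update traffic: repeated stores to the same block between synchronizations cost a single bus/network transaction instead of one per store.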

7.
We study the use of non-volatile memory for caching in distributed file systems. This provides an advantage over traditional distributed file systems in that the load is reduced at the server without making the data vulnerable to failures. We propose the use of a small non-volatile cache for writes, at the client and the file server, together with a larger volatile read cache to keep the cost of the caches reasonable. We use a synthetic workload developed from analysis of file I/O traces from commercial production systems and use a detailed simulation of the distributed environment. The service times for the resources of the system were derived from measurements performed on a typical workstation. We show that non-volatile write caches at the clients and the file server reduce the write response time and the load on the file server dramatically, thus improving the scalability of the system. We examine the comparative benefits of two alternative writeback policies for the non-volatile write cache and show that the proposed threshold-based writeback policy is more effective than a periodic writeback policy under heavy load. We also investigate the effect of varying the write cache size and show that introducing a small non-volatile cache at the client, in conjunction with a moderate-sized non-volatile server write cache, improves the write response time by a factor of four at all load levels.
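A sketch of the threshold-based write-back policy found more effective under heavy load: writes complete into non-volatile memory immediately, and draining to the server starts only when the dirty fraction crosses a high watermark (a periodic policy would instead drain on a timer regardless of occupancy). The watermark values and flush callback are illustrative assumptions.

```python
class NVWriteCache:
    def __init__(self, capacity, flush_to_server, hi=0.8, lo=0.5):
        self.capacity = capacity
        self.flush = flush_to_server     # sends one block to the file server
        self.hi, self.lo = hi, lo        # start/stop drain watermarks
        self.dirty = {}                  # block -> data, held in NVRAM

    def write(self, block, data):
        self.dirty[block] = data         # completes at NVRAM speed, crash-safe
        if len(self.dirty) >= self.hi * self.capacity:
            self._drain()                # threshold policy: flush under pressure

    def _drain(self):
        while len(self.dirty) > self.lo * self.capacity:
            block, data = self.dirty.popitem()
            self.flush(block, data)
```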

8.
In virtualized environments, striking a balance between write-through and write-back policies, so that the system achieves both good reliability and high disk I/O performance, is a pressing problem. This paper proposes a XEN-based disk write mechanism that combines the advantages of write-through and write-back. By modifying the front-end and back-end virtual block device drivers, a block-level disk cache is created for each virtual machine inside the virtual machine monitor. Applications inside a virtual machine write through directly to this cache in the VMM. The results show that the mechanism provides virtual machines with efficient disk reads and writes while preserving the integrity and reliability of user data when a virtual machine crashes unexpectedly.

9.
We describe a data deduplication system for backup storage of PC disk images, named in-RAM metadata utilizing deduplication (IR-MUD). In-RAM hash granularity adaptation and miniLZO-based data compression are first proposed to reduce the in-RAM metadata size and thereby reduce the space overheads required by the in-RAM metadata caches. Second, an in-RAM metadata write cache, as opposed to the traditional metadata read cache, is proposed for further reducing metadata-related disk I/O operations and improving deduplication throughput. During deduplication, the metadata write cache is managed under the LRU caching policy; for each manifest that hits in the metadata write cache, an expensive manifest-reloading operation from disk is avoided. After deduplication, all the manifests in the metadata write cache are cleared and stored on disk. Our experimental results using a 1.5 TB real-world disk image dataset show that 1) IR-MUD achieved about 95% size reduction for the deduplication metadata, with a small time overhead introduced, 2) when the metadata write cache was not utilized, with the same RAM space for the metadata read cache, IR-MUD achieved a 400% higher RAM hit ratio and a 50% higher deduplication throughput than the classic Sparse Indexing deduplication system, which uses no metadata utilization approaches, and 3) when the metadata write cache was utilized and enough RAM space was available, IR-MUD achieved a 500% higher RAM hit ratio compared with Sparse Indexing and a 70% higher deduplication throughput compared with IR-MUD with only a single metadata read cache. The in-RAM metadata harnessing and metadata write caching approaches of IR-MUD can be applied in most parallel deduplication systems to improve metadata caching efficiency.
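A sketch of the metadata write cache described above: manifests live in RAM under LRU during deduplication, a hit avoids the expensive manifest reload from disk, and the whole cache is cleared to disk when deduplication finishes. The disk I/O callbacks are stand-ins for IR-MUD's storage layer.

```python
from collections import OrderedDict

class ManifestWriteCache:
    def __init__(self, capacity, load_manifest, store_manifest):
        self.capacity = capacity
        self.load = load_manifest        # expensive manifest read from disk
        self.store = store_manifest      # manifest write to disk
        self.cache = OrderedDict()       # manifest_id -> manifest, LRU order

    def get(self, manifest_id):
        if manifest_id in self.cache:                 # hit: no disk reload
            self.cache.move_to_end(manifest_id)
            return self.cache[manifest_id]
        manifest = self.load(manifest_id)             # miss: reload from disk
        if len(self.cache) >= self.capacity:
            old_id, old = self.cache.popitem(last=False)
            self.store(old_id, old)                   # evicted manifests persist
        self.cache[manifest_id] = manifest
        return manifest

    def finish(self):
        # after deduplication: clear the cache, storing everything on disk
        while self.cache:
            manifest_id, manifest = self.cache.popitem(last=False)
            self.store(manifest_id, manifest)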

10.
In this paper, a comprehensive study is first conducted to investigate the effects of cache coherence protocols and cache replacement policies on the characteristics of NUCA caches in current many-core processors. The main focus of this study is to analyze the effects of coherence protocols and replacement policies on the vulnerability of caches. The analysis indicates two facts: (i) differences in handling write operations play an important role in favoring one cache coherence protocol over another; (ii) near-optimal solutions to the replacement problem, aimed at enhancing performance, can also help reduce the cache vulnerability factor. Based on the results of the first step, two schemes are introduced to enhance the reliability of caches by modifying the structures of cache coherence protocols and cache replacement policies. The first scheme manages the sharing of dirty data items among different same-level caches. The second gives old dirty blocks priority and more opportunity for replacement than clean blocks. The proposed schemes reveal about 18% improvement in MTTF, with negligible performance, bandwidth, and energy consumption overheads compared to previous cache structures.

11.
Bus-based multiprocessors constitute a cost-effective class of shared-memory multiprocessors. Private caches are the key to an efficient utilization of the shared bus, and most such systems use a write-invalidate cache-coherence protocol to keep the caches coherent. Two important factors that limit the performance of the system are cache misses that lead to long-latency reads and bus congestion because of read misses and coherence traffic. While hybrid write-invalidate/write-update snooping protocols lead to fewer read misses than write-invalidate protocols, previous studies have shown them to be incapable of providing consistent performance improvements because of heavily increased coherence traffic. In this paper, we analyze how the deficiencies of hybrid snooping protocols can be dramatically reduced by using write caches and read snarfing (also called read-broadcast) under release consistency. Our performance evaluation is based on program-driven simulation and a set of five scientific applications with different sharing behaviors including migratory sharing as well as producer–consumer sharing. We show that one of the evaluated hybrid protocols, extended with write caches as well as read snarfing, manages to reduce the number of coherence misses by between 83 and 93% as compared to a write-invalidate protocol for all five applications in this study. In addition, the number of bus transactions is reduced substantially. However, we also show that read snarfing and hybrid snooping protocols might lead to higher cache occupancy because of increased sharing. Because of the small implementation cost of the hybrid protocol and the two extensions, we believe the combination to be an effective approach to boosting the performance of bus-based multiprocessors.

12.
The limited lifespan is the Achilles' heel of solid state drives (SSDs) based on NAND flash. NAND flash has two drawbacks that degrade an SSD's lifespan: one is the out-of-place update, and the other is the sequential-write constraint within a block. SSDs usually employ a write buffer to extend their lifetime; however, existing write buffer schemes pay attention only to the first drawback and neglect the second. We propose a hetero-buffer architecture that covers both aspects simultaneously. The hetero-buffer consists of two components: dynamic random access memory (DRAM) and the reorder area. The DRAM endeavors to reduce write traffic as much as possible by pursuing a higher hit ratio (overcoming the first drawback). The reorder area focuses on reordering the write sequence (overcoming the second drawback). Our hetero-buffer outperforms traditional write buffers for two reasons. First, the DRAM can adopt an existing superior cache replacement policy and thus achieves a higher hit ratio. Second, the hetero-buffer reorders the write sequence, which traditional write buffers have not exploited. Beyond these optimizations, the hetero-buffer also considers the working environment of the write buffer, which traditional write buffers likewise neglect, and is further improved in this way. Performance is evaluated via trace-driven simulations. Experimental results show that SSDs employing the hetero-buffer achieve a longer lifespan on most workloads.
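A sketch of the two-stage hetero-buffer: the DRAM stage runs an ordinary replacement policy (LRU here, though any superior policy could be dropped in) to cut write traffic, and evicted pages pass through a reorder area that sorts them by (block, page) so each flash block is written sequentially. Capacities and the flash-write callback are assumptions.

```python
from collections import OrderedDict

class HeteroBuffer:
    def __init__(self, dram_pages, reorder_pages, flash_write):
        self.dram_cap = dram_pages
        self.reorder_cap = reorder_pages
        self.flash_write = flash_write   # callback: (block, page, data) -> None
        self.dram = OrderedDict()        # (block, page) -> data, LRU order
        self.reorder = {}                # staged evictions awaiting ordering

    def write(self, block, page, data):
        key = (block, page)
        self.dram[key] = data            # absorb and coalesce the write
        self.dram.move_to_end(key)
        if len(self.dram) > self.dram_cap:
            victim, vdata = self.dram.popitem(last=False)   # LRU eviction
            self._stage(victim, vdata)

    def _stage(self, key, data):
        self.reorder[key] = data
        if len(self.reorder) >= self.reorder_cap:
            # flush in (block, page) order so pages within each flash
            # block are written sequentially, as NAND requires
            for blk, pg in sorted(self.reorder):
                self.flash_write(blk, pg, self.reorder[(blk, pg)])
            self.reorder.clear()
```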

13.
Anna Hać 《Acta Informatica》1993,30(2):131-146
This paper proposes performance and reliability improvement by using new algorithms for asynchronous operations in disk buffer cache memory. These algorithms allow the processes to write files into the buffer cache, and they consider the number of active processes in the system and the length of the queue to the disk buffer cache. Writing the contents of the buffer cache to the disk depends on the system load and the write activity. Performance and reliability measures, including the elapsed time of writing a file into the buffer cache, the waiting time to start writing a file, and the mean number of blocks written to the disk between system failures, are used to show the performance and reliability improvement achieved by the algorithms. Sensitivity analysis is used to influence the algorithms' design. Examples of real systems are used to show numerical results of the performance and reliability improvement in different systems with various disk cache parameters and file sizes.

14.
We study the on-line caching problem in a restricted cache where each memory item can be placed in only a restricted subset of cache locations. Examples of restricted caches in practice include victim caches, assist caches, and skew caches. To the best of our knowledge, all previous on-line caching studies have considered on-line caching in identical or fully-associative caches where every memory item can be placed in any cache location. In this paper, we focus on companion caches, a simple restricted cache that includes victim caches and assist caches as special cases. Our results show that restricted caches are significantly more complex than identical caches. For example, we show that the commonly studied Least Recently Used algorithm is not competitive unless cache reorganization is allowed, while the performance of the First In First Out algorithm is competitive but not optimal. We also present two near-optimal algorithms for this problem as well as lower-bound arguments.
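A sketch of a companion cache in the restricted-cache model above: each item has exactly one legal slot in a direct-mapped main cache, plus a small fully-associative companion area (as in a victim cache) that may hold displaced items. The FIFO companion policy is an illustrative choice, not one of the paper's algorithms.

```python
from collections import OrderedDict

class CompanionCache:
    def __init__(self, main_sets, companion_size):
        self.main_sets = main_sets
        self.main = {}                    # set index -> item
        self.companion = OrderedDict()    # displaced items, FIFO order
        self.companion_size = companion_size

    def access(self, item):
        idx = hash(item) % self.main_sets       # the one allowed main location
        if self.main.get(idx) == item:
            return True                         # hit in the main cache
        if item in self.companion:
            return True                         # hit in the companion area
        victim = self.main.get(idx)             # miss: install in the main slot
        self.main[idx] = item
        if victim is not None:
            self.companion[victim] = None       # displaced item -> companion
            if len(self.companion) > self.companion_size:
                self.companion.popitem(last=False)
        return False
```

The restriction is visible in `access`: unlike a fully-associative cache, an item can never be placed in another main slot, only in its own slot or the companion, which is what makes the competitive analysis harder.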

15.
Reducing Data Cache Susceptibility to Soft Errors
Data caches are a fundamental component of most modern microprocessors. They provide efficient read/write access to data memory. Errors occurring in the data cache can corrupt data values or state, and can easily propagate throughout the memory hierarchy. One of the main threats to data cache reliability is soft (transient, nonreproducible) errors. These errors can occur more often than hard (permanent) errors, and most often arise from single-event upsets (SEUs) caused by strikes from energetic particles such as neutrons and alpha particles. Many protection techniques exist for data caches; the most common are ECC (error-correcting codes) and parity. These protection techniques detect all single-bit errors and, in the case of ECC, correct them. To make proper design decisions about which protection technique to use, accurate design-time modeling of cache reliability is crucial. In addition, as caches increase in storage capacity, another important goal is to reduce the failure rate of a cache, to limit disruption to normal system operation. In this paper, we present our modeling approach for assessing the impact of soft errors using architectural simulators. We also describe a new technique for reducing the vulnerability of data caches: refetching. By selectively refetching cache lines from the ECC-protected L2 cache, we can significantly reduce the vulnerability of the L1 data cache. We discuss and present results for two different algorithms that perform selective refetch. Experimental results show that we can obtain an 85 percent decrease in vulnerability when running the SPEC2K benchmark suite while experiencing only a slight decrease in performance. Our results demonstrate that selective refetch can cost-effectively decrease the error rate of an L1 data cache.
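A sketch of the selective-refetch idea, assuming a write-through L1 so every line is clean and safely refetchable: lines resident longer than an age threshold are re-read from the ECC-protected L2, overwriting any single-event upset they may have absorbed. The threshold and interface are assumptions, not either of the paper's two algorithms.

```python
class RefetchingL1:
    def __init__(self, l2_read, age_threshold=10000):
        self.l2_read = l2_read           # read from the ECC-protected L2
        self.threshold = age_threshold
        self.lines = {}                  # addr -> [data, last_refetch_time]
        self.now = 0

    def read(self, addr):
        self.now += 1
        if addr not in self.lines:
            self.lines[addr] = [self.l2_read(addr), self.now]
        data, fetched = self.lines[addr]
        if self.now - fetched >= self.threshold:
            # selective refetch: replace the aged (possibly corrupted) copy
            self.lines[addr] = [self.l2_read(addr), self.now]
        return self.lines[addr][0]
```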

16.
In recent years, processor technology has evolved towards multicore processors, which include multiple processing units (cores) in a single package. Those cores, having their own private caches, often share a higher-level cache memory dedicated to each processor die. This multi-level cache hierarchy in multicore processors raises the importance of the cache utilization problem. Assigning parallel-running software components with common data to processor cores that do not share a common cache increases the number of cache misses. In this paper we present a novel approach that uses model-based information to guide the OS scheduler in assigning appropriate core affinities to software objects at run-time. We build graph models of the software and of the processors' cache hierarchies, and devise a graph-matcher algorithm that maps between these two graphs. Using this mapping we obtain candidate core sets with which each software object can be affiliated at run-time. These affiliations are determined based on the idea that software components that have the potential to share common data at run-time should run on cores that share a common cache. We also develop an object dispatcher algorithm that keeps track of object affiliations at run-time and dispatches objects using the information from the compile-time graph matcher. We apply our approach to design pattern implementations and two different application programs running on servers using CFS scheduling. Our results show that cache-aware dispatching based on information obtained from the software model decreases the number of cache misses significantly and improves CFS's scheduling performance.
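A simplified sketch of the mapping step: objects that may share data (edges of the software graph) are grouped, and each group is affiliated with a set of cores that share a cache. Union-find grouping and round-robin assignment are assumptions standing in for the paper's graph matcher.

```python
from itertools import cycle

def core_affinities(shares_edges, objects, core_groups):
    """shares_edges: pairs of objects with common data;
    core_groups: lists of core ids that share a common cache."""
    parent = {o: o for o in objects}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x
    for a, b in shares_edges:               # union objects that share data
        parent[find(a)] = find(b)
    groups = {}
    for o in objects:
        groups.setdefault(find(o), []).append(o)
    # spread object groups over cache-sharing core groups round-robin
    affinity, cores = {}, cycle(core_groups)
    for members in groups.values():
        cs = next(cores)
        for o in members:
            affinity[o] = cs                # o may run on any core in cs
    return affinity

# "a" and "b" share data, so both land on cores {0,1}; "c" gets {2,3}
print(core_affinities([("a", "b")], ["a", "b", "c"], [[0, 1], [2, 3]]))
```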

17.
Ravi Iyer 《World Wide Web》2004,7(3):259-280
As Internet usage continues to expand rapidly, careful attention needs to be paid to the design of Internet servers for achieving high performance and end-user satisfaction. Currently, the memory system continues to remain a significant performance bottleneck for Internet servers employing multi-GHz processors. In this paper, our aim is two-fold: (1) to characterize the cache/memory performance of web server workloads and (2) to propose and evaluate cache design alternatives for future web servers. We chose SPECweb99 as the representative web server workload, and our entire characterization and evaluation methodology is based on our CASPER simulation framework. We begin by exploring the processor cache design space for single and dual-processor servers. Based on our observations, we then evaluate other cache hierarchy alternatives such as chipset caches, coherence filters and decompressed page stores. We show the sensitivity of these components to basic organization parameters such as cache size, line size and degree of associativity. We also present the performance implications of routing memory requests initiated by I/O devices through these caches. Based on detailed simulation data and its implications on system level performance, this paper shows that chipset caches have significant potential for improving future web server performance.

18.
In this paper, we evaluate compressibility of L1 data caches and the L2 cache in general-purpose graphics processing units (GPGPUs). Our proposed scheme is geared toward improving performance and power of GPGPUs through cache compression. GPGPUs are throughput-oriented devices which execute thousands of threads simultaneously. To handle the working set of this massive number of threads, modern GPGPUs exploit several levels of caches. The GPGPU design trend shows that the size of caches continues to grow to support even more thread-level parallelism. We propose using cache compression to increase effective cache capacity, improve performance, and reduce power consumption in GPGPUs. Our work is motivated by the observation that the values within a cache block are similar, i.e., the arithmetic difference of two successive values within a cache block is small. To reduce data redundancy in L1 data caches and the L2 cache, we use the low-cost and implementation-efficient base-delta-immediate (BDI) algorithm. BDI replaces a cache block with a base and an array of deltas where the combined size of the base and deltas is less than the original cache block. We also study the locality of fields in integer and floating-point numbers. We found that the entropy of fields varies across different data types. Based on entropy, we offer different BDI compression schemes for integer and floating-point numbers. We augment a simple, yet effective, predictor that determines the type of values dynamically in hardware and without the help of a compiler or a programmer. Evaluation results show that on average, cache compression improves performance by 8% and saves energy of caches by 9%.
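Base-delta-immediate compression itself is easy to illustrate: a block compresses when every word is within a narrow delta of a base value. One BDI configuration (8 × 8-byte words, 1-byte deltas) shown as a sketch; the geometry is one of several BDI configurations, not necessarily the one this paper selects per data type.

```python
def bdi_compress(words, delta_bytes=1):
    """words: list of 8 ints standing in for one 64-byte cache block."""
    base = words[0]
    limit = 1 << (8 * delta_bytes - 1)
    deltas = [w - base for w in words]
    if all(-limit <= d < limit for d in deltas):
        return base, deltas              # 8 + 8*1 = 16 bytes instead of 64
    return None                          # incompressible with this config

def bdi_decompress(base, deltas):
    return [base + d for d in deltas]

# e.g. an array of nearby pointers compresses losslessly:
block = [0x1000, 0x1008, 0x1010, 0x1018, 0x1020, 0x1028, 0x1030, 0x1038]
assert bdi_decompress(*bdi_compress(block)) == block
```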

19.
This paper proposes a novel leakage management technique for applications with producer-consumer sharing patterns. Although previous research has proposed leakage management techniques by turning off inactive cache blocks, these techniques can be further improved by exploiting the various run-time characteristics of target applications in CMPs. By exploiting particular access sequences observed in producer-consumer sharing patterns and the spatial locality of shared buffers, our technique enables a more aggressive turn-off of L2 cache blocks of these buffers. Experimental results using a CMP simulator show that our proposed technique reduces the energy consumption of on-chip L2 caches, a shared bus, and off-chip memory by up to 31.3% over the existing cache leakage power management techniques with no significant performance loss.

20.
王江涛  赖文豫  孟小峰 《软件学报》2014,25(11):2575-2586
Flash-based solid state drives (SSDs) are now widely used in mobile devices, PCs, and servers. Compared with magnetic disks, SSDs offer fast data access, shock resistance, and low power consumption, but they also suffer from read/write asymmetry and high cost, so SSDs will not fully replace disks in the short term. Combining SSDs and disks into a hybrid system can exploit the strengths of each device and improve overall system performance. Based on the characteristic differences between MLC and SLC SSDs, this paper proposes a flash-aware multilevel cache management policy called FAMC. FAMC uses SSDs as an extended cache between main memory and disk and, drawing on the data access characteristics of database systems and file management, selectively caches pages evicted from memory to the appropriate type of SSD. FAMC also considers the impact of write request patterns and workload types on system performance, and it designs and implements SSD-friendly data management strategies. In addition, FAMC introduces a buffer management algorithm suited to SSDs that accounts for the different replacement costs of pages. FAMC was evaluated on a multilevel cache storage system; experimental results show that it can substantially reduce system response time and disk I/O.
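A sketch of FAMC-style selective placement under illustrative assumptions: pages evicted from RAM are admitted into the SSD layer between memory and disk, with write-intensive pages routed to the SLC SSD (fast, endurant writes) and read-mostly pages to the MLC SSD (cheap capacity). The classification heuristic and tier interfaces are invented for illustration, not FAMC's actual policy.

```python
class FlashAwareMidTier:
    def __init__(self, slc, mlc, write_hot_ratio=0.5):
        self.slc, self.mlc = slc, mlc    # dict-like stand-ins for the two SSDs
        self.ratio = write_hot_ratio     # assumed classification threshold
        self.stats = {}                  # page -> [reads, writes] seen in RAM

    def note_access(self, page, is_write):
        r = self.stats.setdefault(page, [0, 0])
        r[1 if is_write else 0] += 1

    def on_ram_evict(self, page, data):
        reads, writes = self.stats.get(page, (0, 0))
        if writes > self.ratio * (reads + writes):
            self.slc[page] = data        # write-intensive -> SLC SSD
        else:
            self.mlc[page] = data        # read-mostly -> MLC SSD

    def read(self, page, disk_read):
        if page in self.slc:
            return self.slc[page]
        if page in self.mlc:
            return self.mlc[page]
        return disk_read(page)           # miss in both SSD tiers -> disk
```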
