Similar Literature
20 similar documents found.
1.
Data prefetching is a well-known technique for hiding memory latency in the last-level cache (LLC). Among the many prefetching methods proposed in recent years, the Global History Buffer (GHB) has proven efficient in terms of cost and speedup. In this paper, we show that using a fixed value both for detecting patterns and for the prefetch degree causes GHB to (1) be conservative when there are more opportunities to create new addresses and (2) generate wrong addresses in the presence of constant strides. To resolve these problems, we separate the pattern length from the prefetch degree. The result is an aggressive prefetcher that can generate more addresses for a given pattern length. Furthermore, with a variable pattern-length mechanism, constant strides are grouped so that more accurate patterns are detected. As this prefetcher is relatively aggressive, we further propose an efficient throttling procedure that reduces the negative effects of wrong prefetching using a new measure of cache pollution. This adaptive method is suitable for CMP processors where the prefetcher resides in the shared LLC. Simulation results with a mixed suite of integer and floating-point benchmarks from SPEC CPU2006 show that on a single-core processor the aggressive and adaptive methods outperform existing prefetchers by 48% and 28%, respectively, while increasing memory traffic by 20% and 14%, respectively. Further, on an 8-core CMP with a mix of multiprogrammed workloads, the adaptive method outperforms state-of-the-art throttling methods by 8% in speedup while reducing memory traffic by 3%.
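As an illustration of decoupling the pattern length from the prefetch degree, the sketch below implements a GHB-style delta-correlation lookup in which pattern_len deltas are matched against the miss history while up to degree addresses are generated; the class name, parameters, and table organization are illustrative assumptions rather than the authors' implementation.

```python
# Hypothetical sketch of delta-correlation prefetching with the pattern length
# decoupled from the prefetch degree; names and defaults are assumptions.
from collections import deque

class DeltaCorrelationPrefetcher:
    def __init__(self, history_size=64, pattern_len=2, degree=4):
        self.history = deque(maxlen=history_size)   # global history of miss addresses
        self.pattern_len = pattern_len               # deltas that must match a past pattern
        self.degree = degree                         # prefetch addresses generated per match

    def access(self, addr):
        self.history.append(addr)
        hist = list(self.history)
        deltas = [b - a for a, b in zip(hist, hist[1:])]
        if len(deltas) <= self.pattern_len:
            return []
        key = tuple(deltas[-self.pattern_len:])      # the most recent delta pattern
        # Look for an earlier occurrence of the same pattern and replay what followed it.
        for i in range(len(deltas) - self.pattern_len):
            if tuple(deltas[i:i + self.pattern_len]) == key:
                prefetches, cur = [], addr
                for d in deltas[i + self.pattern_len:i + self.pattern_len + self.degree]:
                    cur += d
                    prefetches.append(cur)
                return prefetches
        return []

# Example: a constant stride of 8 is recognised and extrapolated.
pf = DeltaCorrelationPrefetcher(pattern_len=2, degree=3)
for a in range(0, 80, 8):
    prefetches = pf.access(a)
print(prefetches)   # [80, 88, 96]
```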

2.
Web prefetching is an attractive solution for reducing both the network resources consumed by Web services and the access latencies perceived by Web users. Unlike Web caching, which exploits temporal locality, Web prefetching utilizes the spatial locality of Web objects: it fetches objects that are likely to be accessed in the near future and stores them in advance. In this context, a careful combination of the two techniques can significantly improve the performance of the Web infrastructure. Given that several caching policies have been proposed in the past, the challenge is to extend them using data mining techniques. In this paper, we present a clustering-based prefetching scheme in which a graph-based clustering algorithm identifies clusters of “correlated” Web pages based on users’ access patterns. The scheme can be integrated easily into a Web proxy server, improving its performance. Through a simulation environment using a real data set, we show that the proposed integrated framework is robust and effective in improving the performance of the Web caching environment.
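A minimal sketch of the clustering idea follows: pages that co-occur in user sessions are joined by weighted edges, connected components above a weight threshold become clusters, and a request triggers prefetching of the other cluster members. The session format, threshold, and component-based clustering are assumptions standing in for the paper's graph-based algorithm.

```python
# Hypothetical sketch of clustering "correlated" Web pages from access logs.
from collections import defaultdict
from itertools import combinations

def build_clusters(sessions, min_weight=2):
    # Weight an edge by how many sessions request both pages.
    weight = defaultdict(int)
    for session in sessions:
        for a, b in combinations(sorted(set(session)), 2):
            weight[(a, b)] += 1
    # Keep edges above the threshold and take connected components as clusters.
    adj = defaultdict(set)
    for (a, b), w in weight.items():
        if w >= min_weight:
            adj[a].add(b)
            adj[b].add(a)
    clusters, seen = [], set()
    for page in adj:
        if page in seen:
            continue
        stack, component = [page], set()
        while stack:
            p = stack.pop()
            if p in component:
                continue
            component.add(p)
            stack.extend(adj[p] - component)
        seen |= component
        clusters.append(component)
    return clusters

def prefetch_candidates(clusters, requested_page):
    # Prefetch the other members of the requested page's cluster.
    for c in clusters:
        if requested_page in c:
            return c - {requested_page}
    return set()

sessions = [["/a", "/b", "/c"], ["/a", "/b"], ["/a", "/b", "/d"], ["/c", "/d"]]
clusters = build_clusters(sessions)
print(prefetch_candidates(clusters, "/a"))   # {'/b'}
```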

3.
Main-memory cache performance continues to play an important role in determining the overall performance of object-oriented, object-relational and XML databases. An effective way to improve main-memory cache performance is to prefetch or pre-load pages in advance of their use, in anticipation of main-memory cache misses. In this paper we describe a framework for creating prefetching algorithms with the novel features of path and cache consciousness. Path consciousness refers to the use of short sequences of object references at key points in the reference trace to identify paths of navigation. Cache consciousness refers to the use of historical page-access knowledge to guess which pages are likely to be main-memory cache resident most of the time, and then to assume these pages do not exist in the context of prefetching. We have conducted a number of experiments comparing our approach against four highly competitive prefetching algorithms. The results show that our approach outperforms existing prefetching techniques in some situations while performing worse in others. We provide guidelines as to when our algorithm should be used and when others may be more desirable.

4.
Second-level buffer cache management
Buffer caches are commonly used in servers to reduce the number of slow disk accesses or network messages. These buffer caches form a multilevel buffer cache hierarchy. In such a hierarchy, second-level buffer caches exhibit different access patterns from first-level buffer caches, because accesses to a second-level cache are actually misses from a first-level cache. Therefore, commonly used cache management algorithms, such as least-recently-used (LRU) replacement, that work well for single-level buffer caches may not work well for second-level caches. We investigate multiple approaches to effectively manage second-level buffer caches. In particular, we report our research results in 1) second-level buffer cache access pattern characterization, 2) a new local algorithm called multi-queue (MQ) that performs better than nine tested alternative algorithms for second-level buffer caches, 3) a set of global algorithms that manage a multilevel buffer cache hierarchy globally and significantly improve second-level buffer cache hit ratios over the corresponding local algorithms, and 4) implementation and evaluation of these algorithms in a real storage system connected to commercial database servers (Microsoft SQL Server and Oracle) running industrial-strength online transaction processing benchmarks.
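A simplified sketch of a multi-queue (MQ) style policy is shown below: blocks migrate to higher LRU queues as their access counts grow, and victims come from the lowest non-empty queue. The queue count, capacity, and the omission of MQ's lifetime-based demotion mechanism are simplifications, not the paper's full algorithm.

```python
# Hypothetical simplified multi-queue (MQ) style replacement policy.
from collections import OrderedDict

class MultiQueueCache:
    def __init__(self, capacity=8, num_queues=4):
        self.capacity = capacity
        self.queues = [OrderedDict() for _ in range(num_queues)]  # LRU order per queue
        self.freq = {}

    def _queue_index(self, freq):
        # Queue i roughly holds blocks accessed about 2**i times.
        return min(freq.bit_length() - 1, len(self.queues) - 1)

    def _size(self):
        return sum(len(q) for q in self.queues)

    def access(self, block):
        hit = block in self.freq
        if hit:
            self.queues[self._queue_index(self.freq[block])].pop(block)
        elif self._size() >= self.capacity:
            # Evict the LRU block of the lowest non-empty queue.
            for q in self.queues:
                if q:
                    victim, _ = q.popitem(last=False)
                    del self.freq[victim]
                    break
        self.freq[block] = self.freq.get(block, 0) + 1
        self.queues[self._queue_index(self.freq[block])][block] = True
        return hit

cache = MultiQueueCache(capacity=3)
for b in ["a", "b", "a", "c", "a", "d"]:
    cache.access(b)
print("a" in cache.freq)   # True: the frequently accessed block survives eviction
```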

5.
Cache performance is strongly influenced by the type of locality embodied in programs. In particular, multimedia programs handling images and video are characterized by a bidimensional spatial locality that is not adequately exploited by standard caches. In this paper we propose novel cache prefetching techniques for image data, called neighbor prefetching, able to better exploit bidimensional spatial locality. A performance comparison is provided against other established prefetching techniques on a multimedia workload (MPEG-2 and MPEG-4 decoding, image processing, and visual object segmentation), including a detailed evaluation of both the miss rate and the memory access time. Results show that neighbor prefetching achieves a significant reduction in the time due to delayed memory cycles (more than 97% on MPEG-4, compared with 75% for the second-best technique). This reduction leads to a substantial speedup in overall memory access time (up to 140% for MPEG-4). Performance has been measured with the PRIMA trace-driven simulator, specifically devised to support cache prefetching.
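To make the idea concrete, the sketch below computes, for a row-major image, the cache lines holding the two-dimensional neighbours of an accessed pixel as prefetch candidates; the image geometry, pixel size, and line size are illustrative assumptions.

```python
# Hypothetical sketch of "neighbor prefetching" for row-major image data: on an
# access to pixel (r, c), the lines holding the pixels above and below are
# prefetched in addition to the next line in the same row.
def neighbor_prefetch_addresses(base, row, col, width, pixel_size=1, line_size=64):
    """Return candidate prefetch addresses for a bidimensional access pattern."""
    def line_of(r, c):
        addr = base + (r * width + c) * pixel_size
        return addr - (addr % line_size)          # align to the cache line

    current = line_of(row, col)
    candidates = {
        line_of(row, col) + line_size,            # next line in the same row
        line_of(row - 1, col),                    # pixel directly above
        line_of(row + 1, col),                    # pixel directly below
    }
    return sorted(candidates - {current})

# Example: a 1024-pixel-wide greyscale image starting at address 0.
print(neighbor_prefetch_addresses(base=0, row=10, col=100, width=1024))
# [9280, 10368, 11328]: the lines holding (9,100), the line after (10,100), and (11,100)
```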

6.
The cache, being the fastest medium in the memory hierarchy, plays a vital role in fully exploiting available resources, hiding the latencies of I/O operations, softening the impact of those latencies, and hence improving system response time. Despite plenty of effort, however, caches alone cannot cope with ever larger storage requirements without prefetching. Cache prefetching speculatively fetches data to hide these delays, but effective prefetching requires a strong prediction mechanism that loads relevant data with a high degree of accuracy. To improve the predictive performance of cache prefetching, we apply a hybrid of two AI approaches: case-based reasoning (CBR) and artificial neural networks (ANN). CBR maintains past experience, and ANNs are used in the adaptation phase of CBR instead of a static rule base. The novelty of the technique in this domain lies both in the hybrid of the two approaches and in the use of a suffix tree to populate the CBR case base; suffix trees provide rich data patterns for populating the case base and greatly enhance overall performance. A number of evaluations from different aspects and with varying parameters are presented (along with some findings), where the efficacy of our technique is affirmed by improved predictive accuracy and reduced associated costs.

7.
Anna Haé, Acta Informatica, 1993, 30(2): 131-146
This paper proposes performance and reliability improvements obtained by using new algorithms for asynchronous operations in the disk buffer cache. These algorithms allow processes to write files into the buffer cache while taking into account the number of active processes in the system and the length of the queue to the disk buffer cache. Writing the contents of the buffer cache to disk depends on the system load and the write activity. Performance and reliability measures, including the elapsed time of writing a file into the buffer cache, the waiting time before a file write starts, and the mean number of blocks written to disk between system failures, are used to show the improvement achieved by the algorithms. Sensitivity analysis is used to guide the algorithms' design. Examples of real systems are used to show numerical results of the performance and reliability improvements in different systems with various disk cache parameters and file sizes.

8.
In this paper, we study the problem of maximizing the throughput of a continuous-media system given fixed amounts of buffer space and disk bandwidth, both predetermined at design time. Our approach is to maximize the utilization of the disk and the buffers, and we propose doing so in two ways. First, we analyze a scheme that allows multiple streams to share buffers. Our analysis and preliminary simulation results indicate that buffer sharing can lead to as much as a 50% reduction in total buffer requirement. Second, we develop three prefetching strategies: SP, IP1 and IP2. As demonstrated by SP, straightforward prefetching is not effective at all. In contrast, IP1 and IP2, which prefetch more intelligently than SP, can be valuable in maximizing the effective use of buffers and disk. Our preliminary simulation results show that IP1 and IP2 can lead to a 40% improvement in throughput.

9.
The effectiveness of buffer cache replacement is critical to the performance of I/O systems. In this paper, we propose a degree of inter-reference gap (DIG) based block replacement scheme. This scheme keeps the simplicity of the least-recently-used (LRU) scheme and does not depend on the detection of access regularities. The proposed scheme is based on the low inter-reference recency set (LIRS) scheme, currently known to be very effective, but it employs several items of history information, whereas the LIRS scheme uses only one; the overhead of the proposed scheme is almost negligible. To evaluate its performance, comprehensive trace-driven simulation is used on general access patterns. Our simulation results show that the cache hit ratio (CHR) of the proposed scheme improves by as much as 65.3% (26.6% on average) over LRU for the same workloads, and by up to 6% over LIRS on the multi3 trace.
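A hedged sketch of a gap-based policy in this spirit is given below: each block keeps a short history of access times, and the victim is the resident block with the largest average inter-reference gap. The history depth and tie-breaking rule are assumptions, since the abstract does not spell out the exact policy.

```python
# Hypothetical sketch of inter-reference-gap based replacement.
from collections import deque

class GapBasedCache:
    def __init__(self, capacity=4, history_depth=3):
        self.capacity = capacity
        self.history_depth = history_depth
        self.times = {}          # block -> deque of recent access times
        self.clock = 0

    def _gap(self, block):
        h = self.times[block]
        if len(h) < 2:
            return float("inf")                   # seen only once: weakest locality
        return (h[-1] - h[0]) / (len(h) - 1)      # average inter-reference gap

    def access(self, block):
        self.clock += 1
        hit = block in self.times
        if not hit and len(self.times) >= self.capacity:
            # Evict the largest-gap block, breaking ties by recency.
            victim = max(self.times, key=lambda b: (self._gap(b), -self.times[b][-1]))
            del self.times[victim]
        self.times.setdefault(block, deque(maxlen=self.history_depth)).append(self.clock)
        return hit

cache = GapBasedCache(capacity=2)
hits = [cache.access(b) for b in ["a", "b", "a", "a", "c", "a"]]
print(hits)   # the tightly re-referenced block "a" stays resident
```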

10.
An SSD generally has a small memory, called the cache buffer, to increase its performance, and frequently accessed data are maintained in this cache buffer. The cached data must periodically be written back to NAND Flash memory to prevent data loss due to sudden power-off, and all dirty data items must be flushed immediately to non-volatile storage (i.e., NAND Flash memory) when a flush command is received; the flush command is supported in Serial ATA (SATA) and Serial Attached SCSI (SAS). A flush command can therefore have a significant impact on SSD performance.

In this paper, we investigate the impact of the flush command on SSD performance and conduct in-depth experiments with a variety of workloads, using a modified FlashSim simulator. Our performance measurements using PC and server workloads lead to several interesting conclusions. First, a cache buffer without a flush command improves SSD performance as the cache buffer size increases, since more requested data can be handled in the cache buffer. Second, our experiments reveal that a flush command can have a negative impact on SSD performance: the average response time per request with a flush command becomes worse, relative to not supporting the flush command, as the cache buffer size increases. Finally, we propose a backend flushing scheme to nullify the negative performance impact of the flush command. The backend flushing scheme first writes the requested data into the cache buffer and sends the acknowledgment of request completion to the host system; it then writes the data in the cache buffer back to NAND Flash memory. The proposed scheme can thus improve SSD performance, since it reduces the number of dirty data items in the cache buffer that remain to be written back to NAND Flash memory.

All these results suggest that a flush command can have a negative impact on SSD performance and that the proposed backend flushing scheme can improve SSD performance while still supporting the flush command.
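The backend flushing idea can be sketched as follows: writes are acknowledged as soon as they reach the cache buffer, a background worker drains dirty items to flash, and a flush command then only has to wait for the small remaining backlog. The thread structure and latencies are illustrative assumptions.

```python
# Hypothetical sketch of backend flushing: acknowledge early, write back in the background.
import queue
import threading
import time

flash = {}                        # stands in for NAND Flash pages
dirty = queue.Queue()             # dirty cache-buffer entries awaiting write-back

def handle_write(lpn, data):
    dirty.put((lpn, data))        # data only reaches the cache buffer here
    return "ACK"                  # acknowledge the host before the flash write

def backend_flusher(stop_event):
    while not (stop_event.is_set() and dirty.empty()):
        try:
            lpn, data = dirty.get(timeout=0.01)
        except queue.Empty:
            continue
        time.sleep(0.001)         # models NAND program latency
        flash[lpn] = data         # write back in the background
        dirty.task_done()

def handle_flush_command():
    dirty.join()                  # a flush now only waits for the small backlog

stop = threading.Event()
worker = threading.Thread(target=backend_flusher, args=(stop,), daemon=True)
worker.start()
for i in range(4):
    handle_write(i, f"page-{i}")  # all four writes are acknowledged immediately
handle_flush_command()
stop.set()
print(sorted(flash))              # [0, 1, 2, 3]
```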

11.
Prefetching is a simple and general method for single-chain parallelisation of the Metropolis-Hastings algorithm, based on the idea of evaluating the posterior in parallel and ahead of time. Improved Metropolis-Hastings prefetching algorithms are presented and evaluated. It is shown how to use available information to make better predictions of the future states of the chain and to increase the efficiency of prefetching considerably. The optimal acceptance rate for the prefetching random-walk Metropolis-Hastings algorithm is obtained for a special case, and it is shown to decrease in the number of processors employed. The performance of the algorithms is illustrated using a well-known macroeconomic model. Bayesian estimation of DSGE models, linearly or nonlinearly approximated, is identified as a potential area of application for prefetching methods. The generality of the proposed method, however, suggests that it could be applied in other contexts as well.
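A minimal sketch of one prefetching round for the random-walk Metropolis-Hastings algorithm is given below: the 2^depth - 1 proposals that the next depth iterations could possibly need are evaluated ahead of time (a thread pool stands in for the parallel machine), and the chain is then advanced using only the prefetched values. The toy posterior and proposal scale are assumptions.

```python
# Hypothetical sketch of prefetching for random-walk Metropolis-Hastings.
import math
import random
from concurrent.futures import ThreadPoolExecutor

def log_post(x):
    return -0.5 * x * x                 # toy log-posterior: a standard normal

def prefetch_step(x0, depth=2, scale=1.0, rng=random):
    # Binary tree of reachable states: node 1 holds x0; the children of node i
    # are 2i (accept node i's proposal) and 2i+1 (reject it and stay put).
    states, proposals = {1: x0}, {}
    for node in range(1, 2 ** depth):
        proposals[node] = states[node] + rng.gauss(0.0, scale)
        states[2 * node] = proposals[node]
        states[2 * node + 1] = states[node]
    # Evaluate every proposal ahead of time -- the "parallel" part of prefetching.
    with ThreadPoolExecutor() as pool:
        lp = dict(zip(proposals, pool.map(log_post, proposals.values())))
    # Advance the chain sequentially, consuming only the prefetched evaluations.
    node, cur_lp, draws = 1, log_post(x0), []
    for _ in range(depth):
        accept = math.log(rng.random()) < lp[node] - cur_lp
        if accept:
            cur_lp = lp[node]
        node = 2 * node if accept else 2 * node + 1
        draws.append(states[node])
    return draws

random.seed(0)
print(prefetch_step(5.0, depth=2))      # two chain updates from one prefetch round
```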

12.
On coupling multiple systems with a global buffer
We conduct a performance study of coupling multiple systems with a global buffer and present several results obtained from a multiple-system simulator. This simulator has been run against three workloads, and the coupled-system behavior with these three different inputs is studied. Several statistics, including those on local and global buffer hits, page writes to the global buffer, cross-invalidations, and castouts, are reported, and their relationship to the degree of data skew is explored. Moreover, in addition to the update-caching approach, a design alternative for the use of a global buffer, namely read-caching, is explored. In read-caching, not only updated pages but also pages read by each node are kept in the global buffer, thereby facilitating other nodes' access to the same pages at the cost of higher global buffer usage. Also investigated is the case of no-caching, i.e., not using a global buffer. Several simulation results are presented and analyzed.

13.
The I/O performance of applications in multiple-disk systems can be improved by overlapping disk accesses. This requires appropriate prefetching and buffer management algorithms that ensure the most useful blocks are accessed and retained in the buffer. In this paper, we answer several fundamental questions on prefetching and buffer management for distributed-buffer parallel I/O systems. First, we derive and prove the optimality of an algorithm, P-min, that minimizes the number of parallel I/Os. Second, we analyze P-con, an algorithm that always matches its replacement decisions with those of the well-known demand-paged MIN algorithm; we show that P-con can become fully sequential in the worst case. Third, we investigate the behavior of on-line algorithms for multiple-disk prefetching and buffer management. We define and analyze P-lru, a parallel version of the traditional LRU buffer management algorithm, and, unexpectedly, we find that the competitive ratio of P-lru is independent of the number of disks. Finally, we present the practical performance of these algorithms on randomly generated reference strings. These results confirm the conclusions derived from the analysis of worst-case inputs.

14.
A new dynamic buffer allocation strategy based on the notion of marginal gains is presented for the buffer cache used by the operating system to store frequently accessed disk blocks in main memory, and the performance of the proposed strategy is compared with that of previous allocation strategies. In the proposed strategy, marginal gain values are predicted by exploiting functions that approximate the expected number of buffer hits per unit time. Experimental results from both trace-driven simulation and an actual implementation in the FreeBSD operating system show that the proposed strategy accurately predicts marginal gain values for various workloads, resulting in significantly improved buffer hit ratios.
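To illustrate the greedy use of marginal gains, the sketch below hands buffers out one at a time to whichever workload currently offers the largest estimated increase in hits per unit time; the concave hit curves are stand-ins for the paper's approximating functions, not its actual estimator.

```python
# Hypothetical sketch of marginal-gain driven buffer allocation.
import heapq

def allocate_buffers(hit_curves, total_buffers):
    """hit_curves: dict name -> f(n) = expected hits/unit time with n buffers."""
    alloc = {name: 0 for name in hit_curves}
    # Max-heap of (negated marginal gain of the next buffer, workload name).
    heap = [(-(f(1) - f(0)), name) for name, f in hit_curves.items()]
    heapq.heapify(heap)
    for _ in range(total_buffers):
        gain, name = heapq.heappop(heap)
        alloc[name] += 1
        f, n = hit_curves[name], alloc[name]
        heapq.heappush(heap, (-(f(n + 1) - f(n)), name))
    return alloc

curves = {
    "oltp":  lambda n: 100 * (1 - 0.8 ** n),   # steep early gains
    "batch": lambda n: 40 * (1 - 0.9 ** n),    # flatter curve
}
print(allocate_buffers(curves, 10))            # {'oltp': 8, 'batch': 2}
```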

15.
Due to explosive increases of data from both the cyber and physical worlds, the demand for database support in embedded systems is increasing. Databases for embedded systems, or embedded databases, are expected to provide timely in situ data services under various resource constraints, such as limited energy. However, traditional buffer cache management schemes, whose primary goal is to minimize the number of I/O operations, are problematic because they do not consider the constraints of modern embedded devices, such as limited energy and distinctive underlying storage. In particular, due to the asymmetric read/write characteristics of the flash-memory-based storage of modern embedded devices, minimizing buffer cache misses coincides with neither minimum power consumption nor minimum I/O deadline misses. In this paper we propose a novel power- and time-aware buffer cache management scheme for embedded databases. A novel multi-dimensional feedback control architecture is proposed, and the characteristics of the underlying storage of modern embedded devices are exploited to simultaneously support the desired I/O power consumption and I/O deadline miss ratio. We show through extensive simulation that our approach satisfies both the power and the timing requirements of I/O operations under a variety of workloads while consuming significantly less buffer space than baseline approaches.
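A loose sketch of a feedback loop in this spirit appears below: each sampling period, the buffer quota is nudged by the measured errors in I/O power and deadline miss ratio. The set points, gains, and linear controller form are assumptions, not the paper's controller design.

```python
# Hypothetical sketch of a power- and time-aware feedback adjustment of the buffer quota.
def control_step(buffer_pages, measured_power_mw, measured_miss_ratio,
                 power_target_mw=50.0, miss_target=0.05,
                 k_power=2.0, k_miss=400.0,
                 min_pages=64, max_pages=4096):
    power_error = measured_power_mw - power_target_mw     # >0: consuming too much power
    miss_error = measured_miss_ratio - miss_target        # >0: missing too many deadlines
    # Growing the buffer reduces flash writes (power) and I/O latency (misses),
    # so both positive errors push the quota up.
    adjustment = k_power * power_error + k_miss * miss_error
    return int(max(min_pages, min(max_pages, buffer_pages + adjustment)))

pages = 256
for power, miss in [(80.0, 0.12), (60.0, 0.07), (52.0, 0.05)]:   # simulated samples
    pages = control_step(pages, power, miss)
    print(pages)   # the quota settles as both errors shrink
```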

16.
In this paper, a method combining the loop pipelining technique with data prefetching, called Partition Scheduling with Prefetching (PSP), is proposed. In PSP, the iteration space is first divided into regular partitions. Then a two-part schedule, consisting of an ALU part and a memory part, is produced and balanced to achieve high throughput. These two parts are executed simultaneously, so the remote memory latencies are overlapped. We study the optimal partition shape and size so that a well-balanced overall schedule can be obtained. Experiments on DSP benchmarks show that the proposed methodology consistently produces optimal or near-optimal solutions.

17.
18.
Caches are essential to bridge the gap between the high-latency main memory and the fast processor pipeline. Standard processor architectures implement two first-level caches to avoid a structural hazard in the pipeline: an instruction cache and a data cache. For tight worst-case execution times it is important to classify memory accesses as either cache hits or cache misses. The addresses of instruction fetches are known statically, so static cache hit/miss classification is possible for the instruction cache. Accesses to data cached in the data cache are harder to predict statically. Several different data areas, such as the stack, global data, and heap-allocated data, share the same cache; some addresses are known statically, others only at runtime. With a standard cache organization, all of these data areas must be considered by worst-case execution time analysis. In this paper we propose to split the data cache across the different data areas, so that data cache analysis can be performed individually for each area. An access to an unknown address in the heap then does not destroy the abstract cache state for other data areas. Furthermore, we propose to use a small, highly associative cache for the heap area. We designed and implemented a static analysis for this cache and integrated it into a worst-case execution time analysis tool.

19.
This paper proposes using a user-level memory thread (ULMT) for correlation prefetching. In this approach, a user thread runs on a general-purpose processor in main memory, either in the memory controller chip or in a DRAM chip. The thread performs correlation prefetching in software, sending the prefetched data into the L2 cache of the main processor. This approach requires minimal hardware beyond the memory processor: the correlation table is a software data structure that resides in main memory, while the main processor only needs a few modifications to its L2 cache so that it can accept incoming prefetches. In addition, the approach has wide applicability, as it can effectively prefetch even for irregular applications. Finally, it is very flexible, as the prefetching algorithm can be customized by the user on a per-application basis. Our simulation results show that, through a new design of the correlation table and prefetching algorithm, our scheme delivers good results: nine mostly irregular applications show an average speedup of 1.32. Furthermore, our scheme works well in combination with a conventional processor-side sequential prefetcher, in which case the average speedup increases to 1.46. Finally, by exploiting the customizability of the prefetching algorithm, we increase the average speedup to 1.53.
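The correlation-prefetching core can be sketched as a software table mapping each miss address to its recent successors; on a new miss, the learned successors are returned as prefetch candidates. The table geometry and single successor level shown here are illustrative assumptions, not the paper's table design.

```python
# Hypothetical sketch of software correlation prefetching.
from collections import OrderedDict, deque

class CorrelationPrefetcher:
    def __init__(self, table_entries=1024, successors_per_entry=4):
        self.table = OrderedDict()                  # miss addr -> recent successors
        self.table_entries = table_entries
        self.successors_per_entry = successors_per_entry
        self.last_miss = None

    def on_miss(self, addr):
        # Learn: record addr as a successor of the previous miss.
        if self.last_miss is not None:
            succ = self.table.setdefault(self.last_miss,
                                         deque(maxlen=self.successors_per_entry))
            if addr in succ:
                succ.remove(addr)
            succ.appendleft(addr)                   # most recent successor first
            if len(self.table) > self.table_entries:
                self.table.popitem(last=False)      # drop the oldest table entry
        self.last_miss = addr
        # Predict: return the known successors of this miss as prefetch candidates.
        return list(self.table.get(addr, []))

pf = CorrelationPrefetcher()
trace = [0x100, 0x340, 0x208, 0x100, 0x340, 0x208, 0x100]
for miss in trace:
    prefetches = pf.on_miss(miss)
print([hex(a) for a in prefetches])   # ['0x340']: the learned successor of 0x100
```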

20.
This paper presents a PageRank-based prefetching technique for accesses to Web page clusters. The approach uses the link structure of a requested page to determine the “most important” linked pages and to identify the page(s) to be prefetched. The underlying premise is that, in the case of cluster accesses, the next pages requested by users of the Web server are typically based on the current and previous pages requested; furthermore, if the requested pages have many links to some “important” page, that page has a higher probability of being requested next. An experimental evaluation of the prefetching mechanism is presented using real server logs. The results show that the PageRank-based scheme does better than random prefetching for clustered accesses, with hit rates of 90% in some cases.
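A small sketch of the idea: PageRank is computed over the site's link graph, and among the pages linked from the current and previous requests the highest-ranked ones are prefetched. The damping factor, iteration count, and tiny link graph are illustrative assumptions.

```python
# Hypothetical sketch of PageRank-driven prefetch selection.
def pagerank(links, damping=0.85, iterations=20):
    pages = set(links) | {p for targets in links.values() for p in targets}
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
        rank = new_rank
    return rank

def pages_to_prefetch(links, recent_requests, count=1):
    rank = pagerank(links)
    # Candidates are pages linked from the current and previous requests.
    candidates = {t for p in recent_requests for t in links.get(p, [])}
    candidates -= set(recent_requests)
    return sorted(candidates, key=lambda p: rank[p], reverse=True)[:count]

links = {
    "/index": ["/news", "/about"],
    "/news": ["/story1", "/about"],
    "/story1": ["/about"],
}
print(pages_to_prefetch(links, recent_requests=["/index", "/news"]))  # ['/about']
```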
