Similar Documents
20 similar documents found (search time: 0 ms)
1.
Hill  M.D. 《Computer》1988,21(12):25-40
Direct-mapped caches are defined, and it is shown that trends toward larger cache sizes and faster hit times favor their use. The arguments are restricted initially to single-level caches in uniprocessors, then extended to two-level cache hierarchies. How and when these arguments apply to caches in multiprocessors is also discussed.
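The fast-hit argument can be made concrete: a direct-mapped cache needs only a single tag comparison per access, because each address maps to exactly one line. A minimal sketch (the cache geometry below is illustrative, not taken from the article) of how such a cache splits an address:

```python
BLOCK_SIZE = 64    # bytes per cache line (illustrative)
NUM_LINES = 1024   # number of lines, i.e. a 64 KiB cache (illustrative)

def split_address(addr: int):
    """Split a byte address into (tag, index, offset) for a direct-mapped cache."""
    offset_bits = BLOCK_SIZE.bit_length() - 1   # log2(64)   = 6
    index_bits = NUM_LINES.bit_length() - 1     # log2(1024) = 10
    offset = addr & (BLOCK_SIZE - 1)            # byte within the line
    index = (addr >> offset_bits) & (NUM_LINES - 1)  # which line to probe
    tag = addr >> (offset_bits + index_bits)    # stored and compared on access
    return tag, index, offset
```

Because the index selects exactly one candidate line, the hit path is one array read plus one tag compare, which is the latency advantage the article argues for.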

2.
《Computer Networks》2002,38(6):779-794
This paper describes the design and use of a synthetic web proxy workload generator called ProWGen to investigate the sensitivity of web proxy cache replacement policies to five selected web workload characteristics. Three representative cache replacement policies are considered in the simulation study: a recency-based policy called least-recently-used, a frequency-based policy called least-frequently-used-with-aging, and a size-based policy called greedy-dual-size. Trace-driven simulations with synthetic workloads from ProWGen show the relative sensitivity of these cache replacement policies to three web workload characteristics: the slope of the Zipf-like document popularity distribution, the degree of temporal locality in the document referencing behaviour, and the correlation (if any) between document size and document popularity. The three replacement policies are relatively insensitive to the percentage of one-timers in the workload and to the Pareto tail index of the heavy-tailed document size distribution. Performance differences between the three cache replacement policies are also highlighted.
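As an illustration of the first workload knob, the Zipf-like popularity slope, here is a small sketch of generating a synthetic request trace (the parameter name `alpha` and this generator are assumptions for illustration, not ProWGen's actual implementation):

```python
import random

def zipf_weights(n_docs: int, alpha: float):
    """Popularity of the i-th most popular document ~ 1 / i^alpha, normalized."""
    weights = [1.0 / (i ** alpha) for i in range(1, n_docs + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def synthetic_trace(n_docs: int, alpha: float, n_requests: int, seed: int = 0):
    """Draw a request trace of document ids from the Zipf-like distribution."""
    rng = random.Random(seed)
    probs = zipf_weights(n_docs, alpha)
    return rng.choices(range(n_docs), weights=probs, k=n_requests)
```

A steeper slope (larger `alpha`) concentrates requests on a few hot documents, which generally favors recency- and frequency-based policies over size-based ones.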

3.
To meet the ever-increasing computing requirements of the embedded market, multiprocessor chips have been proposed as a promising solution. In this work we investigate energy consumption in these embedded MPSoC systems. One efficient way to reduce energy consumption is to reconfigure the cache memories. This approach has been applied to architectures with one cache level and one processor, but has not yet been investigated for multiprocessor architectures with two cache levels. The main contribution of this paper is to explore a two-level cache (L1/L2) multiprocessor architecture by estimating its energy consumption. Using a simulation platform, we first build a multiprocessor architecture, and then propose a new algorithm that tunes the two-level cache memory hierarchy (L1 and L2). The cache-tuning approach is based on three parameters: cache size, line size, and associativity. To find the best cache configuration, the application is divided into several execution intervals, and for each interval we generate the best cache configuration. Finally, the approach is validated using a set of open-source benchmarks (SPEC 2006, Splash-2, MediaBench), and we discuss the performance in terms of speedup and energy reduction.
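The per-interval tuning step can be pictured as a search over the three-parameter configuration space. The sketch below scores every (size, line size, associativity) triple with a caller-supplied energy model and keeps the cheapest; the candidate values and the model interface are illustrative stand-ins, not the paper's algorithm:

```python
from itertools import product

SIZES = [2048, 4096, 8192]      # candidate cache sizes in bytes (illustrative)
LINE_SIZES = [16, 32, 64]       # candidate line sizes in bytes (illustrative)
ASSOCS = [1, 2, 4]              # candidate associativities (illustrative)

def best_config(energy_model):
    """Return the (size, line_size, assoc) triple with the lowest modeled energy
    for one execution interval."""
    return min(product(SIZES, LINE_SIZES, ASSOCS),
               key=lambda cfg: energy_model(*cfg))
```

In practice the energy model would come from simulating the interval's memory trace against each configuration; repeating this per interval yields a tuning schedule for the whole application.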

4.
Hardware and software cache optimizations are active fields of research that have yielded powerful but occasionally complex designs and algorithms. The purpose of this paper is to investigate the performance achievable by combining simple software and hardware optimizations. Because current caches provide little flexibility for exploiting temporal and spatial locality, two hardware modifications are proposed to support these two kinds of locality. Spatial locality is exploited by using large virtual cache lines, which do not exhibit the performance flaws of large physical cache lines. Temporal locality is exploited by minimizing cache pollution with a bypass mechanism that still allows spatial locality to be exploited. Subsequently, it is shown that simple software information on the spatial/temporal locality of array references, as provided by current data locality optimization algorithms, can be used to increase cache performance significantly. The performance and design tradeoffs of the proposed mechanisms are discussed. Software-assisted caches are also shown to provide a very convenient basis for further enhancement of data locality optimizations.

5.
eDRAM cells have been considered a promising alternative to conventional SRAM cells and have already been adopted in commercial processors. However, eDRAM cells need to be refreshed periodically, resulting in non-negligible energy and performance overhead. Moreover, under process variations, the retention time of eDRAM cells exhibits a non-uniform distribution. This phenomenon affects both manufacturing yield and the eDRAM refresh burden. In this paper, we first analyze eDRAM module (cache) yield and retention-time failure patterns under process variations. Based on our analysis, we show that most failing cache lines have only one faulty cell, and we propose a cost-efficient technique to save those one-cell failing cache lines. Our technique maintains a one-cell failing line (OFL) buffer which manages the status of the one-cell failing cache lines. By effectively curing one-cell failing lines, our technique improves manufacturing yield by up to 46.1% under identical refresh intervals. In addition, our technique can be used to loosen refresh intervals while maintaining comparable yield. By using the loosened refresh intervals, our technique reduces energy per instruction and improves performance by up to 19.9% and 1.3%, respectively.

6.
Current computer architectures employ caching to improve the performance of a wide variety of applications. One of the main characteristics of such cache schemes is the use of block fetching whenever an uncached data element is accessed. To maximize the benefit of the block fetching mechanism, we present novel cache-aware and cache-oblivious layouts of surface and volume meshes that improve the performance of interactive visualization and geometric processing algorithms. Based on a general I/O model, we derive new cache-aware and cache-oblivious metrics that have high correlations with the number of cache misses when accessing a mesh. In addition to guiding the layout process, our metrics can be used to quantify the quality of a layout, e.g. for comparing different layouts of the same mesh and for determining whether a given layout is amenable to significant improvement. We show that layouts of unstructured meshes optimized for our metrics result in improvements over conventional layouts in the performance of visualization applications such as isosurface extraction and view-dependent rendering. Moreover, we improve upon recent cache-oblivious mesh layouts in terms of performance, applicability, and accuracy.

7.
Efficient constraint-handling techniques are of great significance when evolutionary algorithms (EAs) are applied to constrained optimization problems (COPs). Generally, when using EAs to deal with COPs, equality constraints are much harder to satisfy than inequality constraints. In this study, we propose a strategy named the equality constraint and variable reduction strategy (ECVRS) to reduce the equality constraints as well as the variables of COPs. Since equality constraints are always expressed as equations, ECVRS makes use of the variable relationships implied in such equations. The essence of ECVRS is that some variables of a COP are represented by, and computed from, other variables, thereby shrinking the search space and improving the efficiency of EAs. Meanwhile, ECVRS eliminates the equality constraints that provide the variable relationships, thus improving the feasibility of the obtained solutions. ECVRS is tested on many benchmark problems. Computational results and comparative studies verify the effectiveness of the proposed ECVRS.

8.
For an ISP (Internet service provider) that has deployed P2P caches in more than one AS (autonomous system), cooperative caching, in which the caches cooperate with each other, can save more of the cost of carrying P2P traffic than independent caching. However, existing cooperative caching algorithms use only object popularity to decide which objects should be cached; the cost on intra-ISP links, which has a great impact on the benefits of cooperative caching, is not considered. In this paper, we first model the cooperative caching problem as an NP-complete problem, based on our analysis of the cost of serving requests that considers both object popularity and the cost on intra-ISP links. We then propose a novel cooperative caching algorithm named cLGV (Cooperative, Lowest Global Value). The cLGV algorithm uses a new concept, global value, to estimate the benefit of caching or replacing an object in the cooperative caching system; the global value of each object is evaluated according to not only its popularity in each AS but also the cost on intra-ISP links among ASs. Results of simulations driven by both synthetic and real traces indicate that our cLGV algorithm can reduce the cost of carrying P2P traffic by at least 23% more than existing cooperative caching algorithms.

9.
《Computer Networks》2002,38(6):795-808
Web content caches are often placed between end users and origin servers as a means to reduce server load, network usage, and, ultimately, user-perceived latency. Cached objects typically have associated expiration times, after which they are considered stale and must be validated with a remote server (the origin or another cache) before they can be sent to a client. A considerable fraction of cache “hits” involve stale copies that turn out to be current. These validations of current objects have small message sizes but nonetheless often induce latency comparable to full-fledged cache misses. Thus, the effectiveness of caches as a latency-reducing mechanism depends highly not only on content availability but also on its freshness. We propose policies for caches to proactively validate selected objects as they become stale, thus allowing more client requests to be processed locally. Our policies operate within the existing protocols and exploit natural properties of request patterns such as frequency and recency. We evaluated and compared different policies using trace-based simulations.

10.
11.
Several recently developed model order reduction methods for fast simulation of large-scale dynamical systems with two or more parameters are reviewed. In addition, an alternative approach for linear parameterized-system model reduction and a more efficient method for nonlinear parameterized-system model reduction are proposed in this paper. Comparisons between the different methods, from theoretical elegance to complexity of implementation, are given. With these methods, a large-dimensional system with parameters can be reduced to a smaller-dimensional parameterized system that approximates the original large system to a certain degree for all the parameters.

12.
A processor consumes far less energy running tasks requiring a low supply voltage than it does executing high-performance tasks. Effective voltage-scheduling techniques take advantage of this situation by using software to dynamically vary supply voltages, thereby minimizing energy consumption and accommodating timing constraints.
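The underlying physics is that dynamic switching energy grows roughly with the square of the supply voltage, so slower, lower-voltage execution can cost far less energy per cycle. A back-of-envelope sketch (the constants and function are illustrative, not from the article):

```python
def dynamic_energy(cycles: int, voltage: float, cap_eff: float = 1e-9) -> float:
    """Approximate dynamic switching energy: E ~ C_eff * V^2 per cycle.
    cap_eff is an illustrative effective switched capacitance in farads."""
    return cap_eff * voltage ** 2 * cycles
```

Halving the supply voltage cuts the dynamic energy for the same work by roughly a factor of four, which is the headroom voltage schedulers exploit when deadlines permit slower execution.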

13.
In distributed query processing systems, load balancing plays an important role in maximizing system throughput. When queries can leverage cached intermediate results, improving the cache hit ratio becomes as important as load balancing in query scheduling, especially when dealing with computationally expensive queries. The scheduling policies must be designed to take into consideration the dynamic contents of the distributed caching infrastructure. In this paper, we propose and discuss several distributed query scheduling policies that directly consider the available cache contents by employing distributed multidimensional indexing structures and an exponential moving average approach to predicting cache contents. These approaches are shown to produce better query plans and faster query response times than traditional scheduling policies that do not predict dynamic contents in distributed caches. We experimentally demonstrate the utility of the scheduling policies using MQO, which is a distributed, Grid-enabled, multiple query processing middleware system we developed to optimize query processing for data analysis and visualization applications.
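One way to picture the exponential-moving-average prediction of cache contents (the class and method names here are mine for illustration, not MQO's API): the scheduler keeps a smoothed per-object presence score for each node, updated from recent hit/miss observations:

```python
class EMAPredictor:
    """Smoothed estimate that an object is still cached at a node."""

    def __init__(self, alpha: float = 0.3):
        self.alpha = alpha   # weight given to the newest observation
        self.score = {}      # object id -> smoothed presence estimate in [0, 1]

    def observe(self, obj, hit: bool):
        """Fold one hit/miss observation into the running estimate."""
        prev = self.score.get(obj, 0.0)
        sample = 1.0 if hit else 0.0
        self.score[obj] = self.alpha * sample + (1 - self.alpha) * prev

    def predict(self, obj) -> float:
        """Current estimate; unseen objects default to 0 (assumed absent)."""
        return self.score.get(obj, 0.0)
```

A scheduler would then route a query to the node whose predicted cache contents best cover the query's inputs, trading that affinity off against load.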

14.
Coordinated placement and replacement for large-scale distributed caches (cited by 5: 0 self-citations, 5 by others)
In a large-scale information system such as a digital library or the Web, a set of distributed caches can improve their effectiveness by coordinating their data placement decisions. Using simulation, we examine three practical cooperative placement algorithms, including one that is provably close to optimal, and we compare these algorithms to the optimal placement algorithm and several cooperative and noncooperative replacement algorithms. We draw five conclusions from these experiments: 1) cooperative placement can significantly improve performance compared to local replacement algorithms, particularly when the size of individual caches is limited compared to the universe of objects; 2) although the amortizing placement algorithm is only guaranteed to be within 14 times the optimal, in practice it seems to provide an excellent approximation of the optimal; 3) in a cooperative caching scenario, the recent greedy-dual local replacement algorithm performs much better than the other local replacement algorithms; 4) our hierarchical-greedy-dual replacement algorithm yields further improvements over the greedy-dual algorithm especially when there are idle caches in the system; and 5) a key challenge to coordinated placement algorithms is generating good predictions of access patterns based on past accesses.
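For reference, the greedy-dual idea these experiments build on can be sketched as follows (after Young's GreedyDual; this minimal uniform-cost version is illustrative, not the exact variant evaluated in the paper):

```python
import heapq

class GreedyDualCache:
    """Greedy-dual replacement: each object carries a credit H = L + cost;
    evicting the minimum-credit object raises the inflation value L, so
    long-unreferenced objects age out relative to fresh ones."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.L = 0.0    # inflation value, monotonically non-decreasing
        self.H = {}     # object -> current credit
        self.heap = []  # (credit, obj) min-heap with lazy deletion

    def access(self, obj, cost: float = 1.0) -> bool:
        """Touch obj; returns True on hit. Evicts on a miss when full."""
        hit = obj in self.H
        if not hit and len(self.H) >= self.capacity:
            while True:  # pop until a live (credit, obj) entry is found
                h, victim = heapq.heappop(self.heap)
                if victim in self.H and self.H[victim] == h:
                    self.L = h          # inflate L to the evicted credit
                    del self.H[victim]
                    break
        self.H[obj] = self.L + cost     # refresh credit on hit or insert
        heapq.heappush(self.heap, (self.H[obj], obj))
        return hit
```

With non-uniform costs (or cost divided by size, as in greedy-dual-size) the same loop favors keeping expensive-to-refetch objects.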

15.
Loopback: exploiting collaborative caches for large-scale streaming (cited by 1: 0 self-citations, 1 by others)
In this paper, we propose a Loopback approach in a two-level streaming architecture to exploit collaborative client/proxy buffers for improving the quality and efficiency of large-scale streaming applications. At the upper level we use a content delivery network (CDN) to deliver video from a central server to proxy servers. At the lower level a proxy server delivers video with the help of collaborative client caches. In particular, a proxy server and its clients in a local domain cache different portions of a video and form delivery loops. In each loop, a single video stream originates at the proxy, passes through a number of clients, and finally is passed back to the proxy. As a result, with limited bandwidth and storage space contributed by collaborative clients, we are able to significantly reduce the required network bandwidth, I/O bandwidth, and cache space of a proxy. Furthermore, we develop a local repair scheme to address the client failure issue for enhancing service quality and eliminating most required repairing load at the central server. For popular videos, our local repair scheme is able to handle most of single-client failures without service disruption and retransmissions from the central server. Our analysis and simulations have shown the effectiveness of the proposed scheme.

16.
A distributed parallel alarm management strategy based on massive historical alarms and a distributed clustering algorithm is proposed to reduce the number of alarms presented to operators in modern chemical plants. Due to the large and growing scale of the historical alarms that form the basis of the analysis, it is difficult for traditional alarm management strategies to store and analyze all alarms efficiently. In this paper, by designing the row key and storage structure in a distributed, extensible NoSQL database, the strategy spreads alarm data across a group of commodity machines, which ensures the capacity and scalability of the whole system. Meanwhile, the Distributed Parallel Query Model (DPQM), proposed as a unified query model, provides efficient queries and better integration with the distributed platform. Based on the characteristics of alarms and the time-delay correlation of alarm occurrences, alarm similarity criteria are proposed to effectively identify repetitive and homologous alarms. To group massive alarm data, a new distributed clustering algorithm is designed to run concurrently in MapReduce frameworks. Test results using alarm data from real chemical plants show that the strategy outperforms a traditional MySQL-based method in system performance, and provides excellent redundant-alarm suppression in both normal and alarm-flooding situations.

17.
Research on energy conservation strategies in wireless sensor networks (cited by 2: 2 self-citations, 2 by others)
姜华, 袁晓兵, 童琦, 刘海涛. 《计算机工程与设计》 2006, 27(21): 3951-3955, 3994
A wireless sensor network is an ad hoc, self-organizing network system composed of sensor nodes equipped with wireless transceivers. Because the nodes are powered by batteries with limited lifetimes, energy conservation is a key issue for every protocol layer of the network. This paper surveys and evaluates energy conservation strategies applicable to wireless sensor networks at three levels: node, link, and network. At the network layer, a routing protocol based on a channel-access clustering algorithm is proposed, and the main steps of its implementation are outlined. Simulation results obtained with OPNET are presented.

18.
It is shown that the impulse energy approximation technique for model reduction can be improved by scaling in the frequency domain. The method remains simple to use and retains the stability property of impulse energy approximation. Criteria are given for choosing an appropriate scaling parameter. An example illustrates the technique.

19.
Nonuniform cache access designs solve the on-chip wire delay problem for future large integrated caches. By embedding a network in the cache, NUCA designs let data migrate within the cache, clustering the working set nearest the processor. The authors propose several designs that treat the cache as a network of banks and facilitate nonuniform accesses to different physical regions. NUCA architectures offer low-latency access, increased scalability, and greater performance stability than conventional uniform access cache architectures.

20.
In traditional cache structures, entries in the data array and the tag array are tightly coupled; that is, entries in the two arrays are mapped one-to-one. In this paper, we decouple this traditional one-to-one mapping between the data and tag arrays. The key idea is that a block's tag may be stored in different tag arrays, so that the tag and data arrays are accessed by different indices. The freedom gained by decoupling the tag association brings several advantages. We use a formal inference to verify whether a cache structure gives correct decoupled addressing. We summarize three generalized decoupled models that can also be applied to other approaches previously proposed in the literature. We evaluate our schemes and compare them with other approaches by trace-driven simulation. The simulation results show that the decoupled mechanisms can significantly reduce tag area with only a slight increase in average access time per instruction.


Copyright © Beijing Qinyun Technology Development Co., Ltd. 京ICP备09084417号