Similar Documents (20 results)
1.
Energy consumption is one of the most significant aspects of large-scale storage systems, where multilevel caches are widely used. In a typical hierarchical storage structure, upper-level storage serves as a cache for the lower level, forming a distributed multilevel cache system. Over the past two decades, several classic LRU-based multilevel cache policies have been proposed to improve the overall I/O performance of storage systems. However, few power-aware multilevel cache policies focus on the storage devices at the bottom level, which consume more than 27% of the energy of the whole system [1]. To address this problem, we propose a novel power-aware multilevel cache (PAM) policy that reduces the energy consumption of storage devices while preserving high performance and I/O bandwidth. In our PAM policy, an appropriate number of cold dirty blocks in the upper-level cache are identified and flushed directly to the storage devices, which with high probability extends the periods that disks can remain in standby mode. To demonstrate the effectiveness of the proposed policy, we conduct several simulations with real-world traces. Compared to existing popular cache schemes such as PALRU, PB-LRU, and Demote, PAM reduces power consumption by up to 15% under different I/O workloads and improves energy efficiency by up to 50.5%.
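The abstract does not give PAM's exact selection rule, so the following is a minimal Python sketch of one plausible reading: dirty blocks in the upper-level cache that have gone untouched past an assumed age threshold are flushed in batches, so pending dirty data never forces a standby disk to spin up later. The class and parameter names (PAMCache, cold_age_s, flush_batch) are hypothetical.

```python
import time

class CacheBlock:
    def __init__(self, block_id):
        self.block_id = block_id
        self.dirty = False
        self.last_access = time.monotonic()

class PAMCache:
    """Upper-level cache that flushes cold dirty blocks early so the
    lower-level disks can stay in standby mode longer."""

    def __init__(self, cold_age_s=30.0, flush_batch=64):
        self.blocks = {}              # block_id -> CacheBlock
        self.cold_age_s = cold_age_s  # assumed "cold" age threshold
        self.flush_batch = flush_batch

    def access(self, block_id, write=False):
        blk = self.blocks.setdefault(block_id, CacheBlock(block_id))
        blk.last_access = time.monotonic()
        blk.dirty = blk.dirty or write

    def flush_cold_dirty(self, write_to_disk):
        """Flush up to flush_batch dirty blocks untouched for
        cold_age_s seconds, coldest first, in one burst."""
        now = time.monotonic()
        cold = [b for b in self.blocks.values()
                if b.dirty and now - b.last_access >= self.cold_age_s]
        cold.sort(key=lambda b: b.last_access)
        for blk in cold[:self.flush_batch]:
            write_to_disk(blk.block_id)
            blk.dirty = False
```

Batching the flushes is the point: one burst of writes lets the disk service them together and return to standby, instead of being woken repeatedly by stragglers.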

2.
Improving data throughput and reducing energy consumption are important goals for data centers. Flash memory offers high storage density and low power consumption, so using flash as a cache for disk storage to build a two-level cache hierarchy is one effective way to achieve both. This paper first introduces flash memory and flash-based file systems, then describes three application modes of flash storage and their structural characteristics, then surveys scheduling policies designed for the two-level cache structure, and finally concludes with a summary and outlook.

3.
With the popularity of column-store databases, modern multi-core CPUs, and general-purpose computing on graphics processing units (GPGPUs), there will be radical changes in how processing is done in the online analytical processing (OLAP) and data warehousing fields. Cube computation is a core and time-consuming problem that has been researched extensively; however, most algorithms have been proposed without considering the prevalent multi-core architectures and column storage. This paper presents a new parallel cube algorithm that takes advantage of multi-core architectures. We first propose a cache-conscious bottom-up computation (BUC) algorithm called CC-BUC that adopts an integrated bottom-up and breadth-first partitioning order. Each dimension is stored and processed separately. In processing each dimension, breadth-first data scanning and result output reduce memory I/O and enhance cache locality; cache misses are confined to a single dimension's scope, and translation lookaside buffer (TLB) misses are reduced. Based on CC-BUC, we give a multi-core architecture-based cube algorithm called MC-Cubing, in which multiple partitions are processed simultaneously and multiple threads execute in parallel inside each partition, yielding high parallelism. The layout and associated algorithms take advantage of single instruction, multiple data (SIMD) instructions and thread-level parallelism (TLP). We implement and demonstrate the effectiveness of MC-Cubing on two multi-core architectures: multi-core CPUs and GPUs. Experimental results show that MC-Cubing runs nearly six times faster than BUC on real datasets.
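For readers unfamiliar with BUC, the minimal Python sketch below shows the classic bottom-up cube recursion that CC-BUC and MC-Cubing build on: the aggregate for each group-by cell is emitted, then the partition is subdivided on each remaining dimension. This is a plain serial baseline, not the paper's cache-conscious or parallel variant.

```python
from collections import defaultdict

def buc(rows, dims, prefix=(), out=None, min_sup=1):
    """Classic bottom-up cube computation (BUC) with a COUNT
    aggregate: emit the aggregate for the current cell, then
    partition on each remaining dimension and recurse."""
    if out is None:
        out = {}
    out[prefix] = len(rows)                       # aggregate for this cell
    for i, dim in enumerate(dims):
        parts = defaultdict(list)
        for row in rows:
            parts[row[dim]].append(row)           # partition on this dimension
        for value, part in parts.items():
            if len(part) >= min_sup:              # BUC's a-priori pruning
                buc(part, dims[i + 1:], prefix + ((dim, value),), out, min_sup)
    return out

rows = [{"city": "NY", "year": 2010}, {"city": "NY", "year": 2011},
        {"city": "LA", "year": 2010}]
print(buc(rows, ["city", "year"]))   # counts for every group-by cell
```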

4.
To provide robustness and high quality of service, modern computing architectures running real-time applications should offer high system performance and high timing predictability. Cache memory improves performance by bridging the speed gap between main memory and the CPU, but it introduces timing unpredictability, creating serious challenges for real-time applications. Herein, we introduce a miss table (MT) based cache locking scheme at the level-2 (L2) cache to improve both timing predictability and the system performance/power ratio. The MT holds the block addresses of the application being processed that would cause the most cache misses if not locked. Information in the MT is used for efficient selection of the blocks to be locked and of victim blocks to be replaced. This approach improves timing predictability by locking the blocks with the highest miss counts inside the cache for the entire execution time. In addition, the technique decreases the average delay per task and total power consumption by reducing cache misses and avoiding unnecessary data transfers. The MT-based solution is effective for both uniprocessors and multicores. We evaluate the proposed scheme by simulating an 8-core processor with two levels of cache using MPEG4 decoding, H.264/AVC decoding, FFT, and MI workloads. Experimental results show that, in addition to improving predictability, a 21% reduction in mean delay per task and an 18% reduction in total power consumption are achieved for MPEG4 (and H.264/AVC) by using the MT and locking 25% of the L2. The MT by itself results in about 5% delay and power reductions on these video applications, possibly more on applications with worse cache behavior. For the FFT and MI (and other) applications whose code fits inside the level-1 instruction (I1) cache, the mean delay per task increases only by 3% and total power consumption by 2% due to the addition of the MT.
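A minimal sketch of the miss-table idea, assuming an offline profiling pass yields a trace of missed block addresses; the function names and the 128-entry table size are assumptions, while the 25% locking budget matches the paper's experiments.

```python
from collections import Counter

def build_miss_table(miss_trace, table_size=128):
    """Rank block addresses by how many misses they caused during
    profiling; the top entries are candidates for locking."""
    counts = Counter(miss_trace)        # block address -> miss count
    return [addr for addr, _ in counts.most_common(table_size)]

def select_locked_blocks(miss_table, l2_num_blocks, lock_fraction=0.25):
    """Lock at most lock_fraction of the L2 (25% in the paper's
    evaluation), choosing the blocks with the highest miss counts."""
    budget = int(l2_num_blocks * lock_fraction)
    return set(miss_table[:budget])

# Example: profile, then lock the hottest-missing blocks in a 512-block L2.
trace = [0x40, 0x80, 0x40, 0xC0, 0x40, 0x80]
locked = select_locked_blocks(build_miss_table(trace), l2_num_blocks=512)
```

Blocks in `locked` stay resident for the whole run, which is what makes worst-case timing analyzable; the remaining MT entries guide victim selection among the unlocked ways.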

5.
Energy consumption has become an important constraint in embedded system design. An embedded system is a typical software-driven execution system: hardware circuit activity directly determines system power consumption, while software operations such as instruction execution and data access indirectly cause power consumption through the activity of the underlying microprocessor, bus, cache, memory, and I/O interfaces. Against the background of today's "low-carbon economy", the power consumption of embedded systems has drawn wide attention, and the high-level power estimation and optimization performed early in software design has the most significant impact on the energy consumption of the whole system. This paper builds an algorithm-level energy estimation model, solves for energy consumption in case studies using neural network and genetic algorithms, and analyzes energy consumption during the solution process.

6.
The performance loss resulting from different cache misses is variable in modern systems for two reasons: 1) memory access latency is not uniform, and 2) the latency tolerance of processor cores ...

7.
High-performance computing systems require a reliable and efficient parallel file system. The Lustre cluster file system is a typical object-storage-based cluster file system, well suited to aggregate I/O over large data volumes: large-file I/O achieves very high bandwidth, but small-file I/O performs poorly. Targeting the two aspects of Lustre's design that are unfavorable to small-file I/O, this paper proposes a Filter Cache method: a cache for small-file I/O data is added to Lustre's OST component so that small-file I/O on the OST side proceeds asynchronously, reducing the user-perceived completion time of small-file operations and improving small-file I/O performance.
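A minimal Python sketch of the Filter Cache idea under stated assumptions: writes below an assumed small-I/O cutoff are buffered on the OST side and acknowledged immediately, then written back asynchronously. The 64 KB threshold and all names are hypothetical, and a production OST cache would also need crash-consistency handling.

```python
import queue
import threading

SMALL_IO_BYTES = 64 * 1024    # assumed cutoff for a "small" I/O

class FilterCache:
    """Buffers small writes on the OST and completes them in the
    background, shortening the latency the client perceives."""

    def __init__(self, backend_write):
        self.q = queue.Queue()
        self.backend_write = backend_write
        threading.Thread(target=self._drain, daemon=True).start()

    def write(self, obj_id, offset, data):
        if len(data) <= SMALL_IO_BYTES:
            self.q.put((obj_id, offset, data))  # ack now, flush later
            return len(data)
        # Large I/O keeps the normal synchronous, high-bandwidth path.
        return self.backend_write(obj_id, offset, data)

    def _drain(self):
        while True:
            obj_id, offset, data = self.q.get()
            self.backend_write(obj_id, offset, data)
            self.q.task_done()
```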

8.
In embedded systems, caches are precious for keeping memory bandwidth demands low and for allowing the use of slow, narrow off-chip devices. Conversely, the power and die-size resources consumed by the cache force embedded system designers to use small and simple cache memories. Such caches can perform poorly because of their inflexible placement policy: a large fraction of the misses can originate from the mismatch between the cache's behavior and the locality features of the memory accesses (conflict misses). In this paper we analyze the conflict-miss phenomenon and define a cache utilization measure. We then propose an object-level Cache Aware allocation Technique (CAT) that transforms the application to fit the cache structure, minimizing the number of conflict misses and maximizing cache exploitation. The solution transforms the program layout using the standard functionality of a linker. The CAT approach allowed the considered applications to deliver the same performance on caches two, and sometimes four, times smaller. Moreover, CAT-improved programs on direct-mapped caches outperformed the original versions on set-associative caches. These results highlight that our approach can help embedded system designers meet system requirements with smaller and simpler cache memories.
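The paper performs its transformation at link time; the toy Python sketch below only illustrates the underlying objective, placing hot objects so their direct-mapped cache sets overlap as little as possible. The cache geometry, the greedy search, and every name here are assumptions, not the authors' algorithm.

```python
CACHE_BYTES = 16 * 1024   # assumed direct-mapped cache size
LINE = 32                 # assumed line size

def cache_sets(base, size):
    """Set indices occupied by an object at [base, base + size)."""
    return {(a % CACHE_BYTES) // LINE for a in range(base, base + size, LINE)}

def cat_layout(objects):
    """Greedy sketch: place objects (name, size, hotness) hottest
    first, sliding each forward by up to one cache size to the
    position that conflicts least with already-placed objects."""
    layout, used, addr = {}, set(), 0
    for name, size, _ in sorted(objects, key=lambda o: -o[2]):
        best, best_overlap = addr, None
        for cand in range(addr, addr + CACHE_BYTES, LINE):
            overlap = len(cache_sets(cand, size) & used)
            if best_overlap is None or overlap < best_overlap:
                best, best_overlap = cand, overlap
                if overlap == 0:
                    break
        layout[name] = best
        used |= cache_sets(best, size)
        addr = best + size
    return layout

objs = [("fir_coeffs", 4096, 9), ("lut", 8192, 7), ("scratch", 4096, 2)]
print(cat_layout(objs))   # object -> assumed load address
```

A real linker-based version would express the chosen addresses as section placement directives rather than raw offsets.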

9.
The instruction cache accounts for a significant share of power in modern embedded processors. This paper proposes a low-power scheme for set-associative instruction caches based on way-access traces: a modified instruction cache and branch target buffer build and maintain, at run time, a trace of which ways the instruction cache accesses, reducing hit-detection (tag-comparison) operations and accesses to irrelevant ways. It further proposes maintenance policies for the way-access trace information, based on cross-line predecessor pointers, branch predecessor states, branch predecessor pointers, and branch target indices, to lower the frequency of trace reconstruction and thus exploit the established trace information more effectively. Experimental results show that, with the optimized way-access trace scheme, tag-memory accesses and data-memory accesses of the instruction cache drop to 3.60% and 27.70%, respectively, of those of a conventional instruction cache.

10.
Deduplication technology has been increasingly used to reduce storage costs. Though it has been successfully applied to backup and archival systems, existing techniques can hardly be deployed in primary storage systems due to the latency cost of detecting duplicated data: every unit has to be checked against a substantially large fingerprint index before it is written. In this paper we introduce Leach, a self-learning in-memory fingerprint cache that reduces the write cost in inline primary-storage deduplication. Leach is motivated by a characteristic of real-world I/O workloads: the access patterns of duplicated data are highly skewed. Leach adopts a splay tree to organize the on-disk fingerprint index, automatically learns the access patterns, and maintains hot working sets in cache memory, with the goal of servicing the majority of duplicate-detection lookups. Leveraging the working-set property, Leach provides optimizations that reduce the cost of splay operations on the fingerprint index and of cache updates. In comprehensive experiments on several real-world datasets, Leach outperforms the conventional LRU (least recently used) cache policy by reducing the number of cache misses, and significantly improves write performance without much impact on cache hits.
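A minimal sketch of the in-memory fingerprint cache in front of the on-disk index. Leach's actual structure is a splay tree that keeps the hot working set near the root; this sketch approximates that self-adjusting behavior with a simple move-to-end LRU dictionary, and all names are hypothetical.

```python
from collections import OrderedDict
import hashlib

class FingerprintCache:
    """In-memory cache in front of an on-disk fingerprint index,
    servicing duplicate-detection lookups for hot fingerprints."""

    def __init__(self, capacity, disk_lookup):
        self.capacity = capacity
        self.disk_lookup = disk_lookup    # fingerprint -> block addr, or None
        self.hot = OrderedDict()          # fingerprint -> block addr

    def lookup(self, chunk: bytes):
        """Return the address of an identical stored chunk, or None
        if the chunk is new and must actually be written."""
        fp = hashlib.sha1(chunk).hexdigest()
        if fp in self.hot:
            self.hot.move_to_end(fp)      # keep the working set hot
            return self.hot[fp]
        addr = self.disk_lookup(fp)       # slow path: on-disk index
        if addr is not None:
            if len(self.hot) >= self.capacity:
                self.hot.popitem(last=False)   # evict the coldest entry
            self.hot[fp] = addr
        return addr
```

Real Leach also tunes when to splay the on-disk index and when to update the cache; the sketch omits those optimizations.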

11.
Power consumption is an important issue for cluster supercomputers as it directly affects running cost and cooling requirements. This paper investigates the memory energy efficiency of high-end data servers used for supercomputers. Emerging memory technologies allow memory devices to dynamically adjust their power states and enable free rides by overlapping multiple DMA transfers from different I/O buses to the same memory device. To achieve maximum energy saving, the memory management on data servers needs to judiciously utilize these energy-aware devices. As we explore different management schemes under five real-world parallel I/O workloads, we find that the memory energy behavior is determined by a complex interaction among four important factors: (1) cache hit rates that may directly translate performance gain into energy saving, (2) cache populating schemes that perform buffer allocation and affect access locality at the chip level, (3) request clustering that aims to temporally align memory transfers from different buses into the same memory chips, and (4) access patterns in workloads that affect the first three factors.

12.
尹洋  刘振军  许鲁 《软件学报》2009,20(10):2752-2765
As computing scales up, networked storage systems are applied ever more widely and face ever higher I/O performance requirements. Under heavy storage-system load, it becomes practical to place slower media as a data cache on the I/O path between clients and the networked storage system. This paper designs and implements D-Cache, a block-level cache prototype for storage systems built on disk media. The disk cache is managed with a two-level structure, and a corresponding block-level two-level cache management algorithm is proposed. The algorithm effectively resolves the management difficulties caused by the slow response of disk media, and uses a bitmap to eliminate the copy-on-write overhead on a disk-cache write miss. Tests of the prototype show that, under heavy storage-server load, the cache system effectively improves overall system performance.
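A minimal sketch, assuming the bitmap tracks validity at sector granularity inside each cache block (the abstract only says a bitmap is used): on a write miss, the written sectors are simply marked valid, so the block need not first be read from the backing store, which is exactly the copy-on-write overhead being eliminated. The class name and 64-sector block size are assumptions.

```python
class BlockBitmap:
    """Per-cache-block validity bitmap, one bit per sector."""

    SECTORS_PER_BLOCK = 64

    def __init__(self):
        self.valid = 0    # bitmap packed into an int

    def mark_written(self, first_sector, count):
        """Write miss: record only the sectors actually written,
        with no read-modify-write of the whole block."""
        self.valid |= ((1 << count) - 1) << first_sector

    def read_hits_cache(self, first_sector, count):
        """True if every requested sector is valid in the cache
        block; otherwise the missing sectors must come from the
        backing store."""
        mask = ((1 << count) - 1) << first_sector
        return (self.valid & mask) == mask
```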

13.
刘朝斌, 吴非. 《计算机工程》, 2006, 32(12): 45-46, 49
Based on an analysis of the characteristics of device-driver design under embedded Linux, this paper proposes a new RAID algorithm to improve storage I/O performance and implements the corresponding device driver and cache management techniques. Testing and performance analysis of a RAID5 storage subsystem show that the approach is effective and feasible.

14.
In present multi-core devices, the individual processors do not need to operate at the highest possible frequencies; instead, there is a need to reduce the power, complexity, and area of individual processor components such as caches. In this paper we propose a low-area, high-performance cache replacement policy for embedded processors called Hierarchical Non-Most-Recently-Used (H-NMRU). H-NMRU is a parameterizable policy that lets us trade off performance against area. We extended the Dinero cache simulator with the H-NMRU policy and performed architectural exploration with a set of cellular and multimedia benchmarks. On a 16-way cache, a two-level H-NMRU policy whose first and second levels have 8 and 2 branches, respectively, performs as well as the Pseudo-LRU policy with a storage-area saving of 27%. Compared to true LRU, H-NMRU on a 16-way cache saves a huge amount of area (82%) with a marginal increase in cache misses (3%). Similar results were observed on other cache-like structures such as branch target buffers. Therefore, the two-level H-NMRU cache replacement policy (with associativity/2 and 2 branches on the two levels) is a very attractive option for caches on embedded processors with associativities greater than 4. We present a case study where it is used on the L2 cache with substantial gains in performance and area for single- and dual-core platforms.
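A minimal Python sketch of the two-level H-NMRU structure described above for a 16-way set (8 first-level branches, 2 second-level branches): each level remembers its most-recently-used branch, and victims are drawn only from non-MRU branches. Random tie-breaking is an assumption; the paper does not specify it.

```python
import random

class HNMRU:
    """Two-level hierarchical NMRU for one 16-way cache set,
    organized as 8 groups of 2 ways."""

    def __init__(self, groups=8, ways_per_group=2):
        self.groups = groups
        self.ways_per_group = ways_per_group
        self.mru_group = 0
        self.mru_way = [0] * groups     # MRU way inside each group

    def touch(self, way):
        """Record an access to a way (on a hit or a fill)."""
        g, w = divmod(way, self.ways_per_group)
        self.mru_group = g
        self.mru_way[g] = w

    def victim(self):
        """Evict from a non-MRU group, and from a non-MRU way
        inside that group."""
        g = random.choice([i for i in range(self.groups)
                           if i != self.mru_group])
        w = random.choice([i for i in range(self.ways_per_group)
                           if i != self.mru_way[g]])
        return g * self.ways_per_group + w
```

Each set then stores only 3 bits for the MRU group plus one bit per group, far less than the state true LRU needs to fully order 16 ways, which is where the reported area saving comes from.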

15.
Memory devices can be used as storage systems to provide lower latency than can be achieved by disk and flash storage. However, traditional buffered input/output (I/O) and direct I/O are not optimized for memory-based storage: traditional buffered I/O incurs a redundant memory copy through the disk cache, and traditional direct I/O does not support byte addressing. Memory-mapped direct I/O optimizes file operations for byte-addressable persistent memory that appears to the CPU as main memory, but its interface is not always compatible with existing applications, and it cannot be used for peripheral memory devices (e.g., networked memory devices and hardware RAM drives) that are not interfaced with the memory bus. This paper presents a new Linux I/O layer, byte direct I/O (BDIO), that can process byte-addressable direct I/O using the standard application programming interface. It requires no modification of existing application programs and can be used not only for main memory but also for peripheral memory devices that are not addressable by a memory management unit. The proposed BDIO layer allows file systems and device drivers to easily support BDIO. The new I/O layer achieved 18% to 102% performance improvements in evaluation experiments with online transaction processing, file server, and desktop virtualization storage workloads.

16.
As high-performance computers are increasingly applied to large-scale data processing, the storage system is becoming the main bottleneck limiting processing efficiency. After analyzing several key factors affecting the I/O performance of data-intensive computing, this paper proposes building a cooperative non-volatile cache out of the compute nodes' local storage, using a distributed storage architecture to accelerate a centralized one. The method coordinates the distributed local storage resources at the application layer and uses non-volatile media to form a large cache space that holds the intermediate results of large-scale data analysis, thereby achieving a high cache hit rate. It avoids I/O contention through means such as concurrency-constraint control, and fully exploits the particular performance strengths of local storage to guarantee the caching acceleration effect, effectively improving the I/O efficiency of large-scale data processing. Test results across multiple platforms and I/O patterns confirm the method's effectiveness: aggregate I/O bandwidth scales well, and the overall performance of typical data-intensive applications improves by up to 6x.

17.
Heterogeneous storage architectures combine the strengths of different storage devices in a synergistically useful fashion, and are increasingly being used in mobile storage systems. In this paper, we propose ARC-H, an adaptive cache replacement algorithm for heterogeneous storage systems consisting of a hard disk and a NAND flash memory. ARC-H employs a dynamically adaptive management policy based on ghost buffers and takes account of recency, I/O cost per device, and workload patterns in making cache replacement decisions. Realistic trace-driven simulations show that ARC-H reduces service time by up to 88% compared with existing caching algorithms with a 20 Mb cache. ARC-H also reduces energy consumption by up to 81%.
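The abstract states that ARC-H adapts using ghost buffers and per-device I/O cost but gives no formulas, so the sketch below shows one plausible ARC-style adaptation rule under those assumptions; every name and the cost weighting are hypothetical, not the authors' definition.

```python
class ARCHBalancer:
    """Adaptive split of one cache between disk-backed and
    flash-backed pages. A hit in a ghost list (metadata of recently
    evicted pages) means the cache guessed wrong, so the target share
    for that device grows, weighted by its miss cost: re-reading from
    disk costs far more than re-reading from flash."""

    def __init__(self, cache_pages, disk_cost=10.0, flash_cost=1.0):
        self.cache_pages = cache_pages
        self.disk_cost = disk_cost
        self.flash_cost = flash_cost
        self.target_disk = cache_pages / 2   # target pages for disk data

    def on_ghost_hit(self, device):
        if device == "disk":    # should have kept more disk pages
            self.target_disk = min(self.cache_pages,
                                   self.target_disk + self.disk_cost / self.flash_cost)
        else:                   # should have kept more flash pages
            self.target_disk = max(0.0,
                                   self.target_disk - self.flash_cost / self.disk_cost)

    def evict_side(self, resident_disk_pages):
        """On a miss with a full cache, evict from whichever side
        exceeds its target."""
        return "disk" if resident_disk_pages > self.target_disk else "flash"
```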

18.
In this work, we develop an energy-aware disk scheduling algorithm for soft real-time I/O. Energy consumption is one of the major factors barring the adoption of hard disks in mobile environments, and the heat dissipation of large-scale storage systems also calls for an energy-aware scheduling technique to further increase storage density. The basic idea in this work is to properly determine the I/O burst size so that the device can stay in standby mode between consecutive I/O bursts while still satisfying the soft real-time requirement. We develop an elaborate model that incorporates the energy consumption characteristics and the overhead of mode transitions in determining the appropriate I/O burst size and the respective disk operating schedule. The efficacy of an energy-aware disk scheduling algorithm relies greatly not only on the scheduling algorithm itself but also on various operating-system and device-firmware concerns, which must be properly addressed within the disk scheduling framework. Our algorithm successfully addresses a number of these outstanding issues. First, we examine the effect of OS- and hard-disk-firmware-level prefetch policies and incorporate it in our disk scheduling framework. Second, our framework can allocate a certain fraction of disk bandwidth to handle sporadically arriving non-real-time I/Os. Third, we examine the relationship between the lock granularity of buffer management and energy consumption. We developed prototype software with the energy-aware scheduling algorithm; in our experiments, it reduces energy consumption to one fourth. However, the algorithm increases the buffer requirement significantly, e.g., from 4 to 140 KByte. We carefully argue that this buffer overhead is still justifiable given the cost of DRAM chips and the importance of energy management in modern mobile devices. The result of our work not only provides an energy-efficient scheduling algorithm but also provides an important guideline for capacity planning of future energy-efficient mobile devices. This paper is funded by KOSEF through Statistical Research Paper for Complex System at Seoul National University.
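The burst-size idea can be made concrete with a standard break-even calculation; the sketch below is an illustration under assumed parameters, not the paper's elaborate model (which also covers prefetching, non-real-time traffic, and lock granularity).

```python
def min_burst_bytes(rate_bps, p_idle_w, p_standby_w,
                    e_transition_j, t_transition_s):
    """Smallest burst so the idle gap between bursts is long enough
    for standby to pay off. Standby beats idling during a gap T when
        e_transition_j + p_standby_w * (T - t_transition_s) <= p_idle_w * T,
    i.e. T >= (e_transition_j - p_standby_w * t_transition_s)
              / (p_idle_w - p_standby_w).
    A burst of B bytes sustains a consumer reading at rate_bps for
    B / rate_bps seconds, which is the gap until the next burst."""
    t_breakeven = ((e_transition_j - p_standby_w * t_transition_s)
                   / (p_idle_w - p_standby_w))
    return rate_bps * max(t_breakeven, 0.0)

# Assumed example: 2 MB/s stream, 4 W idle, 0.5 W standby,
# 6 J and 2 s per spin-down/spin-up cycle.
print(min_burst_bytes(2e6, 4.0, 0.5, 6.0, 2.0))   # ~2.9e6 bytes
```

Larger bursts buy longer standby periods at the price of a proportionally larger buffer, which is the 4-to-140 KByte trade-off the abstract reports.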

19.
Recent results in the Rio project at the University of Michigan show that it is possible to create an area of main memory that is as safe as disk from operating system crashes. This paper explores how to integrate the reliable memory provided by the Rio file cache into a database system. Prior studies have analyzed the performance benefits of reliable memory; we focus instead on how different designs affect reliability. We propose three designs for integrating reliable memory into databases: non-persistent database buffer cache, persistent database buffer cache, and persistent database buffer cache with protection. Non-persistent buffer caches use an I/O interface to reliable memory and require the fewest modifications to existing databases. However, they waste memory capacity and bandwidth due to double buffering. Persistent buffer caches use a memory interface to reliable memory by mapping it into the database address space. This places reliable memory under complete database control and eliminates double buffering, but it may expose the buffer cache to database errors. Our third design reduces this exposure by write protecting the buffer pages. Extensive fault tests show that mapping reliable memory into the database address space does not significantly hurt reliability. This is because wild stores rarely touch dirty, committed pages written by previous transactions. As a result, we believe that databases should use a memory interface to reliable memory.

20.
As embedded applications evolve and system functionality grows more complex, using embedded databases on embedded devices is increasingly the trend. SQLite is widely used in embedded applications for its advantages in performance and functionality, but the relatively weak CPUs and limited memory of embedded devices constrain its performance. To address this problem, and following the principle of caching, this paper simplifies the lexical and syntactic analysis performed during SQL statement execution in order to reduce run-time overhead. Experiments show that the method effectively improves SQL execution efficiency and raises SQLite's overall performance while preserving usability and reliability.
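Python's sqlite3 module exposes the same caching idea through its cached_statements connection argument: compiled statements are kept in a per-connection cache keyed by SQL text, so re-executing identical SQL skips the lexing and parsing steps the paper targets. This illustrates the principle; it is not the paper's implementation.

```python
import sqlite3

# cached_statements sets how many compiled statements the connection
# keeps; identical SQL text is then parsed only once and reused.
conn = sqlite3.connect(":memory:", cached_statements=64)
conn.execute("CREATE TABLE kv (k INTEGER PRIMARY KEY, v TEXT)")

for i in range(1000):
    # Same SQL text every iteration: hits the statement cache, so the
    # lexer/parser run only on the first execution.
    conn.execute("INSERT INTO kv VALUES (?, ?)", (i, "x"))
conn.commit()
print(conn.execute("SELECT COUNT(*) FROM kv").fetchone())  # (1000,)
```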
