Similar Documents
 20 similar documents found (search time: 97 ms)
1.
With enormous and ever-increasing user demand, I/O performance is one of the primary considerations in building a data center. Several new technologies in data centers, such as tiered storage, have prompted the widespread use of multilevel cache techniques. In these storage systems, the upper level storage typically serves as a cache for the lower level, which forms a distributed multilevel cache system. Although many excellent multilevel cache algorithms have been proposed to improve I/O performance, they can still be enhanced by exploiting the history information carried by hints. To address this challenge, in this paper we propose a novel hint frequency based approach (HFA) to improve the overall multilevel cache performance of storage systems. The main idea of HFA is to use hint frequencies (the total number of demotions/promotions performed in response to demote/promote hints) to efficiently explore the valuable history information of data blocks across multiple levels. HFA can be applied with several popular multilevel cache algorithms, such as Demote, Promote and Hint-K. Simulation results show that, compared with the original multilevel cache algorithms such as Demote, Promote and Hint-K, HFA improves I/O performance by up to 20% under different I/O workloads.
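A minimal sketch of the hint-frequency idea in Python (class and method names are illustrative, not the paper's): the lower-level cache counts how many times each block arrives via a demote hint and keeps frequently demoted blocks longer.

    from collections import OrderedDict

    class HintFrequencyCache:
        """Lower-level cache that weights eviction by demote-hint frequency."""
        def __init__(self, capacity):
            self.capacity = capacity
            self.blocks = OrderedDict()   # block id -> None, in LRU -> MRU order
            self.hint_freq = {}           # block id -> demote hints seen so far

        def on_demote(self, block):
            # The upper level demoted `block`: record the hint, then admit it.
            self.hint_freq[block] = self.hint_freq.get(block, 0) + 1
            if block in self.blocks:
                self.blocks.move_to_end(block)
                return
            if len(self.blocks) >= self.capacity:
                # Evict the least-recently-used block among those with the
                # lowest hint frequency; history says it is the least valuable.
                victim = min(self.blocks, key=lambda b: self.hint_freq.get(b, 0))
                del self.blocks[victim]
            self.blocks[block] = None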

2.
Heterogeneous storage architectures combine the strengths of different storage devices in a synergistically useful fashion, and are increasingly being used in mobile storage systems. In this paper, we propose ARC-H, an adaptive cache replacement algorithm for heterogeneous storage systems consisting of a hard disk and a NAND flash memory. ARC-H employs a dynamically adaptive management policy based on ghost buffers, and takes account of recency, per-device I/O cost, and workload patterns in making cache replacement decisions. Realistic trace-driven simulations show that ARC-H reduces service time by up to 88% compared with existing caching algorithms with a 20 MB cache. ARC-H also reduces energy consumption by up to 81%.
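The adaptation step can be sketched as follows (a simplification under assumed relative costs, not the full ARC-H algorithm): a hit in a device's ghost list signals that its real partition was too small, so the target share shifts toward that device, weighted by its miss cost.

    from collections import OrderedDict

    COST = {"disk": 8.0, "flash": 1.0}   # assumed relative per-miss re-read cost

    class ArcHLike:
        def __init__(self, capacity):
            self.capacity = capacity
            self.target = {"disk": capacity / 2.0, "flash": capacity / 2.0}
            self.real = {d: OrderedDict() for d in COST}    # cached blocks
            self.ghost = {d: OrderedDict() for d in COST}   # recently evicted ids

        def on_ghost_hit(self, dev):
            # A ghost hit means `dev`'s real partition was too small; grow its
            # target share, faster when its misses are more expensive.
            other = "flash" if dev == "disk" else "disk"
            delta = COST[dev] / COST[other]
            self.target[dev] = min(self.capacity, self.target[dev] + delta)
            self.target[other] = self.capacity - self.target[dev]

        def victim_device(self):
            # Shrink whichever real list exceeds its target; on a tie, evict
            # from the device that is cheaper to re-read.
            for dev in COST:
                if len(self.real[dev]) > self.target[dev]:
                    return dev
            return min(COST, key=COST.get)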

3.
Due to the explosive increase of data from both the cyber and physical worlds, the demand for database support in embedded systems is growing. Databases for embedded systems, or embedded databases, are expected to provide timely in situ data services under various resource constraints, such as limited energy. However, traditional buffer cache management schemes, whose primary goal is to minimize the number of I/O operations, are problematic because they do not consider the constraints of modern embedded devices, such as limited energy and distinctive underlying storage. In particular, due to the asymmetric read/write characteristics of the flash memory-based storage of modern embedded devices, minimizing buffer cache misses coincides with neither minimum power consumption nor minimum I/O deadline misses. In this paper we propose a novel power- and time-aware buffer cache management scheme for embedded databases. A novel multi-dimensional feedback control architecture is proposed, and the characteristics of the underlying storage of modern embedded devices are exploited to simultaneously support the desired I/O power consumption and I/O deadline miss ratio. We show through extensive simulation that our approach satisfies both the power and timing requirements of I/O operations under a variety of workloads while consuming significantly less buffer space than baseline approaches.
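A deliberately simplified, single-knob sketch of such a feedback loop (gains and set points are invented for illustration, not taken from the paper): two proportional-integral controllers, one per goal, each propose a buffer-size adjustment, and the larger demand wins.

    class PIController:
        def __init__(self, kp, ki, set_point):
            self.kp, self.ki, self.set_point = kp, ki, set_point
            self.integral = 0.0

        def update(self, measured):
            error = measured - self.set_point   # positive: goal is being missed
            self.integral += error
            return self.kp * error + self.ki * self.integral

    power_ctl = PIController(kp=50.0, ki=5.0, set_point=0.8)    # I/O watts budget
    dline_ctl = PIController(kp=80.0, ki=8.0, set_point=0.05)   # deadline miss ratio

    def next_buffer_pages(cur_pages, io_power, miss_ratio):
        # A larger buffer cache absorbs more I/O, cutting both device power
        # and deadline misses; grow by the larger of the two demands.
        grow = max(power_ctl.update(io_power), dline_ctl.update(miss_ratio))
        return max(1, int(cur_pages + grow))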

4.
Client caching is an important technology for optimizing distributed and centralized storage systems. As a representative client cache system, CacheFiles is limited by transition faults; furthermore, it supports only a simple LRU policy with a tightly-coupled design. To overcome these limitations, we propose to employ the Stable Set Model (SSM) to improve CacheFiles, and design an enhanced CacheFiles, SAC. SSM assumes that data accesses can be decomposed into accesses on stable sets, whose elements are repeatedly accessed together or not accessed at all. Using SSM methods improves cache management and reduces the effect of transition faults. We also adopt loosely-coupled methods to design the prefetch and replacement policies. We implement our scheme on Linux 2.6.32 and measure its execution time with various file I/O benchmarks. Experiments show that SAC can significantly improve I/O performance and reduce execution time by up to 84%, compared with the existing CacheFiles.
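The stable-set assumption can be sketched as follows (a hypothetical interface, not SAC's code): once an access burst is learned as a set, a hit on any member makes the rest of the set prefetch candidates.

    class StableSets:
        def __init__(self):
            self.set_of = {}     # block -> stable set id
            self.members = {}    # stable set id -> blocks in the set
            self.next_id = 0

        def learn(self, access_burst):
            # Record one observed burst of co-accessed blocks as a stable set.
            sid = self.next_id
            self.next_id += 1
            self.members[sid] = list(access_burst)
            for b in access_burst:
                self.set_of[b] = sid

        def prefetch_candidates(self, block):
            # A hit on any member suggests the rest of its set is coming soon.
            sid = self.set_of.get(block)
            if sid is None:
                return []
            return [b for b in self.members[sid] if b != block]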

5.
尹洋  刘振军  许鲁 《软件学报》2009,20(10):2752-2765
As computing continues to grow in scale, networked storage systems are used in ever wider application domains, and the demands on their I/O performance keep rising. Under high storage-system load, it becomes practical to place low-speed media on the I/O path between clients and the networked storage system as a data cache. This paper designs and implements D-Cache, a prototype block-level cache for storage systems built on disk media. The disk cache is managed with a two-level structure, and a corresponding two-level, block-level cache management algorithm is proposed. The algorithm effectively solves the management difficulties caused by the slow response of disk media, and eliminates the copy-on-write overhead of a disk-cache write miss through the use of a bitmap. Tests on the prototype system show that, under high storage-server load, the cache system effectively improves overall system performance.
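The bitmap technique mentioned above can be illustrated as follows (a sketch, not the D-Cache implementation): with per-sector valid bits, a write miss fills only the written sectors, and the backing store is consulted lazily on reads instead of copying the whole block in first.

    SECTORS_PER_BLOCK = 8

    class CacheBlock:
        def __init__(self):
            self.valid = [False] * SECTORS_PER_BLOCK   # per-sector valid bits
            self.data = [None] * SECTORS_PER_BLOCK

        def write(self, sector, payload):
            # A write miss lands here directly: no read-modify-write of the
            # whole block, hence no copy-on-write overhead.
            self.data[sector] = payload
            self.valid[sector] = True

        def read(self, sector, fetch_from_backing):
            if not self.valid[sector]:     # only now touch the backing store
                self.data[sector] = fetch_from_backing(sector)
                self.valid[sector] = True
            return self.data[sector]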

6.
In this work, we develop an energy-aware disk scheduling algorithm for soft real-time I/O. Energy consumption is one of the major factors barring the adoption of hard disks in mobile environments, and the heat dissipation of large-scale storage systems also calls for an energy-aware scheduling technique to further increase storage density. The basic idea in this work is to properly determine the I/O burst size so that the device can be in standby mode between consecutive I/O bursts while still satisfying the soft real-time requirement. We develop an elaborate model that incorporates the energy consumption characteristics and the overhead of mode transitions in determining the appropriate I/O burst size and the respective disk operating schedule. The efficacy of an energy-aware disk scheduling algorithm relies not only on the scheduling algorithm itself but also on various operating system and device firmware related concerns, which need to be properly addressed within the disk scheduling framework. Our energy-aware disk scheduling algorithm successfully addresses a number of outstanding issues. First, we examine the effect of OS- and hard disk firmware-level prefetch policies and incorporate them in our disk scheduling framework. Second, our framework can allocate a certain fraction of disk bandwidth to handle sporadically arriving non-real-time I/Os. Third, we examine the relationship between the lock granularity of the buffer management and energy consumption. We develop prototype software with the energy-aware scheduling algorithm. In our experiments, the proposed algorithm reduces energy consumption to one fourth. However, it increases the buffer requirement significantly, e.g., from 4 to 140 KByte. We carefully argue that the buffer overhead is still justifiable given the cost of DRAM chips and the importance of energy management in modern mobile devices. The result of our work not only provides an energy-efficient scheduling algorithm but also provides an important guideline for capacity planning of future energy-efficient mobile devices. This work was funded by KOSEF through the Statistical Research Center for Complex Systems at Seoul National University.
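The burst-size reasoning reduces to simple arithmetic (the parameter values below are illustrative, not from the paper): the idle gap between bursts must exceed the disk's break-even time, and the buffer must hold enough data to feed the consumer during that gap.

    def min_burst_bytes(rate, t_transition, e_transition, p_active, p_standby,
                        disk_bandwidth):
        # Standby pays off only when the idle gap exceeds the break-even time,
        # i.e. when the power saved covers the spin-down/spin-up energy.
        assert rate < disk_bandwidth, "the disk must outrun the consumer"
        t_breakeven = e_transition / (p_active - p_standby)
        gap = max(t_breakeven, t_transition)
        # The buffer must feed the application at `rate` for the whole gap.
        return rate * gap

    # e.g. a 2 Mbit/s stream; 2 s transition costing 5 J; 2 W active, 0.2 W standby
    print(min_burst_bytes(2e6 / 8, 2.0, 5.0, 2.0, 0.2, 30e6))   # ~694 KB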

7.
Improving data throughput and reducing energy consumption are important for data centers. Flash memory offers high storage density and low power consumption, so using Flash as a cache in front of disk storage to build a two-level cache architecture is one effective way to achieve both goals. This paper first introduces Flash and Flash-based file systems, then describes three application modes of Flash storage and their structural characteristics in detail, then surveys scheduling policies for the two-level cache architecture, and finally concludes with a summary and outlook.

8.
Dynamic power management (DPM) and dynamic voltage scaling (DVS) are crucial techniques for reducing the energy consumption of embedded real-time systems. Many previous studies have focused on the energy consumption of the processor or of I/O devices alone. In this paper, we focus on energy management that integrates DVS and DPM techniques for periodic embedded real-time applications under the rate monotonic (RM) policy, and present a system-level fixed-priority energy-efficient scheduling (SLFPEES) algorithm. The SLFPEES algorithm consists of I/O device scheduling and job scheduling. I/O device scheduling is based on dynamic power management with the rate monotonic policy (DPM-RM), which puts devices into the sleep state when the idle interval is larger than the device's break-even time. Job scheduling is based on the RM policy and uses the stack resource protocol (SRP) to guarantee exclusive access to shared resources. For energy efficiency, the SLFPEES algorithm schedules tasks using a lower speed and a higher speed. The experimental results show that the SLFPEES algorithm yields significant energy savings with respect to existing techniques.
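The DPM-RM sleep rule can be sketched in a few lines (an illustrative reading of the rule, with invented helper names): under RM the device's idle interval is bounded below by the earliest next release of any periodic task that uses it, and the device sleeps only when that bound exceeds its break-even time.

    def break_even(e_transition, p_active, p_sleep):
        # Idle time needed before the sleep/wake transition energy is amortized.
        return e_transition / (p_active - p_sleep)

    def idle_until_next_release(now, periods):
        # Under RM, a device used only by periodic tasks stays idle at least
        # until the earliest next release among those tasks' periods.
        return min(((now // p) + 1) * p for p in periods) - now

    def should_sleep(now, periods, e_transition, p_active, p_sleep):
        return (idle_until_next_release(now, periods)
                > break_even(e_transition, p_active, p_sleep))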

9.
Memory devices can be used as storage systems to provide lower latency than disk or flash storage can achieve. However, neither traditional buffered input/output (I/O) nor direct I/O is optimized for memory-based storage: traditional buffered I/O includes a redundant memory copy through the disk cache, while traditional direct I/O does not support byte addressing. Memory-mapped direct I/O optimizes file operations for byte-addressable persistent memory that appears to the CPU as main memory. However, its interface is not always compatible with existing applications, and it cannot be used for peripheral memory devices (e.g., networked memory devices and hardware RAM drives) that are not attached to the memory bus. This paper presents a new Linux I/O layer, byte direct I/O (BDIO), that can process byte-addressable direct I/O using the standard application programming interface. It requires no modification of existing application programs and can be used not only for memory but also for peripheral memory devices that are not addressable by a memory management unit. The proposed BDIO layer allows file systems and device drivers to easily support BDIO. The new I/O layer achieved 18% to 102% performance improvements in evaluation experiments with online transaction processing, file server, and desktop virtualization storage workloads.

10.
Deduplication technology has been increasingly used to reduce storage costs. Though it has been successfully applied to backup and archival systems, existing techniques can hardly be deployed in primary storage systems due to the latency cost of detecting duplicated data: every unit has to be checked against a substantially large fingerprint index before it is written. In this paper we introduce Leach, a self-learning in-memory fingerprint cache for inline primary storage that reduces the write cost in deduplication systems. Leach is motivated by a characteristic of real-world I/O workloads: high data skew exists in the access patterns of duplicated data. Leach adopts a splay tree to organize the on-disk fingerprint index, automatically learns the access patterns, and maintains hot working sets in cache memory, with the goal of servicing the majority of duplicated data detection. Leveraging the working set property, Leach provides optimizations to reduce the cost of splay operations on the fingerprint index and of cache updates. In comprehensive experiments on several real-world datasets, Leach outperforms the conventional LRU (least recently used) cache policy by reducing the number of cache misses, and significantly improves write performance without greatly affecting the cache hit ratio.
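A sketch of the lookup path (illustrative; Leach's actual structures differ in detail): a small in-memory cache of hot fingerprints fronts the splay-tree-organized on-disk index, and the self-adjusting property of splaying keeps the hot working set cheap to reach.

    import hashlib

    class FingerprintCache:
        def __init__(self, capacity, disk_index):
            self.capacity = capacity
            self.hot = {}                  # fingerprint -> chunk location
            self.disk_index = disk_index   # assumed splay-tree-backed mapping

        def is_duplicate(self, chunk_bytes):
            fp = hashlib.sha1(chunk_bytes).digest()
            if fp in self.hot:
                return True                # hot hit: no index I/O at all
            loc = self.disk_index.get(fp)  # the splay lookup promotes fp
            if loc is None:
                return False
            if len(self.hot) >= self.capacity:
                self.hot.pop(next(iter(self.hot)))   # crude demotion, for brevity
            self.hot[fp] = loc
            return True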

11.
Power consumption is an important issue for cluster supercomputers as it directly affects running cost and cooling requirements. This paper investigates the memory energy efficiency of high-end data servers used for supercomputers. Emerging memory technologies allow memory devices to dynamically adjust their power states and enable free rides by overlapping multiple DMA transfers from different I/O buses to the same memory device. To achieve maximum energy saving, the memory management on data servers needs to judiciously utilize these energy-aware devices. As we explore different management schemes under five real-world parallel I/O workloads, we find that the memory energy behavior is determined by a complex interaction among four important factors: (1) cache hit rates that may directly translate performance gain into energy saving, (2) cache populating schemes that perform buffer allocation and affect access locality at the chip level, (3) request clustering that aims to temporally align memory transfers from different buses into the same memory chips, and (4) access patterns in workloads that affect the first three factors.
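The request-clustering factor can be illustrated in a few lines (the chip-size assumption is invented): batching queued transfers by target chip and issuing the largest batch first keeps one chip active while lengthening the idle windows of the others.

    from collections import defaultdict

    def cluster_by_chip(pending_addrs, chip_shift=30):
        # Batch queued transfers by target memory chip (1 GiB per chip assumed)
        # and issue the largest batch first: one chip stays busy while the
        # others' idle windows grow long enough for a low-power state to pay off.
        batches = defaultdict(list)
        for addr in pending_addrs:
            batches[addr >> chip_shift].append(addr)
        return sorted(batches.values(), key=len, reverse=True)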

12.
Proxy servers have been used to cache web objects to alleviate the load on web servers and to reduce network congestion on the Internet. In this paper, a central video server is connected to a proxy server via a wide area network (WAN), and the proxy server reaches many clients via local area networks (LANs). We assume a video can be either entirely or partially cached in the proxy to reduce WAN bandwidth consumption. Since storage space and sustained disk I/O bandwidth are limited resources in the proxy, how to efficiently utilize them to maximize the WAN bandwidth reduction is an important issue. We design a progressive video caching policy in which each video can be cached at one of several levels, corresponding to different cached data sizes and required WAN bandwidths. For each video, the proxy server decides whether to cache a smaller amount of data at a lower level or to gradually accumulate more data to reach a higher level. The proposed progressive caching policy allows the proxy to adjust the caching amount for each video based on its resource condition and the user access pattern. We investigate scenarios in which the access pattern is either known a priori or unknown, and evaluate the effectiveness of the caching policy.
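A greedy sketch of the level-selection idea (the data layout and weighting below are assumptions, not the paper's formulation): repeatedly take the level upgrade with the best WAN saving per unit of storage, until a proxy resource is exhausted.

    def plan_levels(videos, storage_budget, disk_bw_budget):
        # Each entry in `videos`: {"name", "popularity", "levels": [steps]},
        # where each step is {"storage", "disk_bw", "wan_saving"} for moving
        # up one caching level (field names are invented for this sketch).
        level = {v["name"]: 0 for v in videos}
        used_storage = used_bw = 0.0
        while True:
            best, best_gain = None, 0.0
            for v in videos:
                k = level[v["name"]]
                if k >= len(v["levels"]):
                    continue            # already cached at the highest level
                step = v["levels"][k]
                if (used_storage + step["storage"] > storage_budget or
                        used_bw + step["disk_bw"] > disk_bw_budget):
                    continue
                gain = v["popularity"] * step["wan_saving"] / max(step["storage"], 1e-9)
                if gain > best_gain:
                    best, best_gain = v, gain
            if best is None:
                return level            # no affordable upgrade is left
            step = best["levels"][level[best["name"]]]
            used_storage += step["storage"]
            used_bw += step["disk_bw"]
            level[best["name"]] += 1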

13.
Contemporary operating systems for single-ISA (instruction set architecture) multi-core systems attempt to distribute tasks equally among all the CPUs. This approach works relatively well when there is no difference in CPU capability. However, there are cases in which CPU capability differs: static capability asymmetry results from the advent of new asymmetric hardware, and dynamic capability asymmetry comes from operating system (OS) noise caused by networking or I/O handling. These asymmetries can make it hard for the OS scheduler to distribute tasks evenly, resulting in less efficient load balancing. In this paper, we propose a user-level load balancer for parallel applications, called the 'capability balancer', which recognizes differences in CPU capability and makes subtasks share the entire CPU capability fairly. The balancer can coexist with the existing kernel-level load balancer without degrading the kernel balancer's behavior, and it distributes CPU capability to tasks fairly with very little overhead. For real workloads like the NAS Parallel Benchmark (NPB), we have achieved speedups of up to 9.8% and 8.5% under dynamic and static asymmetry, respectively, and speedups of 13.3% for dynamic asymmetry and 24.1% for static asymmetry in a competitive environment. We also compared the impacts of our task selection policies, FIFO (first in, first out) and cache: the cache policy led to a 5.3% speedup in overall execution time and a 4.7% decrease in the overall cache miss count relative to the default FIFO policy.
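The core of capability balancing is proportional partitioning, sketched below (an illustrative sketch, not the paper's balancer): given measured per-CPU capabilities, loop iterations are split so that every worker finishes at roughly the same time.

    def partition(n_items, capabilities):
        # Split n_items into contiguous [lo, hi) ranges proportional to each
        # CPU's measured capability; the rounding remainder goes to one worker.
        total = sum(capabilities)
        shares = [int(n_items * c / total) for c in capabilities]
        shares[-1] += n_items - sum(shares)
        bounds, start = [], 0
        for s in shares:
            bounds.append((start, start + s))
            start += s
        return bounds

    # e.g. one fast core and three cores slowed by OS noise
    print(partition(1000, [2.0, 1.0, 1.0, 0.7]))   # uneven [lo, hi) ranges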

14.
Deduplication is commonly used in both enterprise storage systems and cloud storage. To overcome the performance challenge posed by the selective restore operations of deduplication systems, a solid-state-drive-based (i.e., SSD-based) read cache can be deployed to speed up restores by dynamically buffering popular restore contents. Unfortunately, the frequent data updates induced by classical cache schemes (e.g., LRU and LFU) significantly shorten SSD lifetime while slowing down I/O processes in the SSD. To address this problem, we propose a new solution, LOP-Cache, which greatly improves both the write endurance of the SSD and I/O performance by enlarging the proportion of long-term popular (LOP) data among the data written into the SSD-based cache. LOP-Cache keeps LOP data in the SSD cache for a long period to reduce the number of cache replacements. Moreover, it prevents unpopular or unnecessary data in deduplication containers from being written into the SSD cache. We implemented LOP-Cache in a prototype deduplication system to evaluate its performance. Our experimental results show that LOP-Cache shortens the latency of selective restore by an average of 37.3% at the cost of a small SSD-based cache whose capacity is only 5.56% of the deduplicated data. Importantly, LOP-Cache improves SSD lifetime by a factor of 9.77. The evidence shows that LOP-Cache offers a cost-efficient SSD-based read cache solution for improving the performance of selective restore in deduplication systems.
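The admission idea can be sketched as follows (the threshold and window handling are simplifications): only containers whose long-run access counts mark them as long-term popular are written into the SSD cache, so cold restore traffic never consumes erase cycles.

    class LOPFilter:
        def __init__(self, threshold):
            self.threshold = threshold   # accesses needed to count as popular
            self.counts = {}             # container id -> long-window count

        def record(self, container_id):
            self.counts[container_id] = self.counts.get(container_id, 0) + 1

        def admit(self, container_id):
            # Write into the SSD cache only if the container is long-term popular.
            return self.counts.get(container_id, 0) >= self.threshold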

15.
Non-volatile memories offer low energy consumption, good scalability, and high storage density, and can replace traditional SRAM as on-chip caches; however, their write operations incur high energy and latency, so write performance must be optimized before large-scale adoption. This paper proposes a dynamic bypass policy based on cache-block reuse information to optimize the cache performance of non-volatile memories. It analyzes the reuse characteristics of benchmark accesses to the last-level cache (LLC), dynamically predicts from a block's reuse information whether the corresponding write should bypass the non-volatile cache, and uses a prediction table to perform the bypass when filling on an LLC miss. Write-backs from the upper-level cache use dynamic path selection: a monitoring module chooses a suitable upper-level cache for bypassed blocks and fills blocks with high reuse counts into it, reducing the number of LLC writes. Experimental results show that, compared with a cache design without bypassing, the policy reduces the average running time of the SPLASH-2 programs on a 4-core processor by 6.6% and lowers cache energy consumption by 22.5% on average, effectively improving overall cache performance.
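A sketch of such a prediction table (the sizes and indexing are assumptions): 2-bit saturating counters indexed by a hash of the block address learn whether fills at that address tend to be reused; low-reuse fills bypass the non-volatile LLC.

    TABLE_SIZE = 4096

    class BypassPredictor:
        def __init__(self):
            self.table = [1] * TABLE_SIZE          # start weakly "reused"

        def _idx(self, block_addr):
            return (block_addr >> 6) % TABLE_SIZE  # 64-byte blocks assumed

        def should_bypass(self, block_addr):
            # Bypass only when the counter has saturated at "not reused".
            return self.table[self._idx(block_addr)] == 0

        def train(self, block_addr, was_reused):
            i = self._idx(block_addr)
            if was_reused:
                self.table[i] = min(3, self.table[i] + 1)
            else:
                self.table[i] = max(0, self.table[i] - 1)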

16.
Cloud computing is currently being explored by the scientific community to assess its suitability for High Performance Computing (HPC) environments. In this novel paradigm, compute and storage resources, as well as applications, can be dynamically provisioned on a pay-per-use basis. This paper presents a thorough evaluation of the I/O storage subsystem using the Amazon EC2 Cluster Compute platform and the recent High I/O instance type, to determine its suitability for I/O-intensive applications. The evaluation has been carried out at different layers using representative benchmarks in order to evaluate the low-level cloud storage devices available in Amazon EC2, ephemeral disks and Elastic Block Store (EBS) volumes, both on local and distributed file systems. In addition, several I/O interfaces (POSIX, MPI-IO and HDF5) commonly used by scientific workloads have also been assessed. Furthermore, the scalability of a representative parallel I/O code has been analyzed at the application level, taking into account both performance and cost metrics. The analysis of the experimental results has shown that the available cloud storage devices can have different performance characteristics and usage constraints. Our comprehensive evaluation can help scientists to significantly increase (by up to several times) the performance of I/O-intensive applications in the Amazon EC2 cloud. An example of an optimal configuration that can maximize I/O performance in this cloud is the use of a RAID 0 array of 2 ephemeral disks, TCP with a 9,000-byte MTU, NFS async, and MPI-IO on the High I/O instance type, which provides ephemeral disks backed by Solid State Drive (SSD) technology.

17.
The file system, and the components of the computer system associated with it (disks, drums, channels, mass storage, tapes and tape drives, controllers, I/O drivers, etc.) comprise a very substantial fraction of most computer systems; substantial in several aspects, including amount of operating system code, expense for components, physical size and effect on performance. In a companion paper, we surveyed the traditional methods for optimizing the I/O system. We then examined disk and I/O system architecture in IBM type systems, and indicated shortcomings and future directions. In this paper we go one step further and summarize research by the author on two topics: cache disks and file migration. Cache disks are disks which have an associated cache which buffers recently used tracks of data. The case for cache disks is presented, and some of the issues are discussed. Parameter values for some aspects of the cache design are suggested. The second part of this paper summarizes the author's work on file migration, by which files are migrated between disk and mass storage as needed in order to effectively maintain on-line a much larger amount of information than the disks can hold. Some of the algorithms investigated are discussed, and the basic results are presented.

18.
The file system, and the components of the computer system associated with it (disks, drums, channels, mass storage, tapes and tape drives, controllers, I/O drivers, etc.) comprise a very substantial fraction of most computer systems; substantial in several aspects, including amount of operating system code, expense for components, physical size and effect on performance. In a companion paper, we surveyed the traditional methods for optimizing the I/O system. We then examined disk and I/O system architecture in IBM type systems, and indicated shortcomings and future directions. In this paper we go one step further and summarize research by the author on two topics: cache disks and file migration. Cache disks are disks which have an associated cache which buffers recently used tracks of data. The case for cache disks is presented, and some of the issues are discussed. Parameter values for some aspects of the cache design are suggested. The second part of this paper summarizes the author's work on file migration, by which files are migrated between disk and mass storage as needed in order to effectively maintain on-line a much larger amount of information than the disks can hold. Some of the algorithms investigated are discussed, and the basic results are presented.

19.
High processing speed is required to support computation-intensive applications. Cache memory is used to improve processing speed by reducing the speed gap between the fast processing core and slow main memory. However, the problem of adopting caches in computing systems is twofold: caches are power hungry (which challenges energy constraints), and caches introduce execution time unpredictability (which challenges support for real-time multimedia applications). Recently published articles suggest that cache locking improves predictability. However, increased cache activity due to aggressive cache locking makes the system consume more energy and become less efficient. In this paper, we investigate the impact of cache parameters and cache locking on power consumption and performance for real-time multimedia applications running on low-power devices. We consider Intel Pentium-like single-processor and Xeon-like multicore architectures, both with a two-level cache memory hierarchy, using three popular multimedia applications: MPEG-4 (the global video coding standard), H.264/AVC (the network-friendly video coding standard), and the recently introduced H.265/HEVC (for improved video quality and data compression ratio). Experimental results show that a cache locking mechanism added to an optimized cache memory structure is very promising for increasing the performance/power ratio of low-power systems running multimedia applications. According to the simulation results, performance can be improved by reducing the cache miss rate by up to 36%, and total power consumption can be reduced by up to 33%. It is also observed that H.265/HEVC has a significant performance advantage over H.264/AVC (and MPEG-4) for smaller caches.
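A profile-driven locking pass can be sketched in a few lines (a simplification; the paper's selection is more involved): lock the lines with the highest profiled miss counts, up to a fixed fraction of the cache.

    def choose_locked_lines(miss_profile, cache_lines, lock_fraction=0.25):
        # miss_profile: block address -> profiled miss count. Lock the hottest
        # lines up to a fixed fraction of the cache, trading a little
        # associativity for predictable hits on them.
        budget = int(cache_lines * lock_fraction)
        hottest = sorted(miss_profile.items(), key=lambda kv: kv[1], reverse=True)
        return [addr for addr, _ in hottest[:budget]]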

20.
The cache memory consumes a large proportion of the energy used by a processor. Within the on-chip cache, the translation lookaside buffer (TLB) accounts for 20–50% of the energy consumption. To reduce the energy consumed by TLB accesses, a virtual cache can be accessed directly with the virtual addresses issued by the processor; however, a virtual cache may suffer from the synonym problem. In this paper, we propose low-cost synonym detection hardware and a synonym data coherence mechanism, which reduce the energy consumption incurred by TLB lookups and maintain synonym data consistency in the virtual cache. The proposed synonym detection hardware efficiently reduces the number of blocks that must be looked up in the virtual cache, saving energy. In addition, the proposed synonym data coherence mechanism reduces the number of invalidated blocks in the virtual cache to prevent the destruction of cache locality. Simulation results show that our proposed energy-aware virtual cache consumes 51%, 27%, and 20% less energy than a traditional physical cache, a traditional virtual cache, and a synonym lookaside buffer (SLB), respectively. In addition, our design shows almost the same static energy consumption as the SLB, and reduces static energy consumption by about 20% compared with the traditional physical cache and virtual cache.
