Similar Articles
20 similar articles found (search time: 640 ms).
1.
We present the software library STXXL, an implementation of the C++ standard template library (STL) for processing huge data sets that fit only on hard disks. It supports parallel disks and the overlapping of disk I/O with computation, and it is the first I/O-efficient algorithm library to support the pipelining technique, which can save more than half of the I/Os. STXXL has been applied in both academic and industrial environments to a range of problems including text processing, graph algorithms, computational geometry, Gaussian elimination, visualization and analysis of microscopic images, and differential cryptographic analysis. The performance of STXXL and its applications is evaluated on synthetic and real-world inputs. We present the design of the library, explain how its performance features are supported, and demonstrate how the library integrates with the STL. Copyright © 2007 John Wiley & Sons, Ltd.
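The core external-memory ideas in this abstract (sorted runs spilled to disk, then a streamed merge so one pass's output pipelines into the next) can be sketched in plain Python. This is an illustrative toy, not STXXL's actual C++ API; `chunk_size` simply stands in for available RAM.

```python
import heapq
import itertools
import tempfile

def external_sort(items, chunk_size):
    """External merge sort sketch: sort fixed-size runs in memory,
    spill each run to a temporary file, then stream a k-way merge
    over the runs."""
    run_files = []
    it = iter(items)
    while True:
        run = sorted(itertools.islice(it, chunk_size))
        if not run:
            break
        f = tempfile.TemporaryFile(mode="w+")
        f.writelines(f"{x}\n" for x in run)
        f.seek(0)
        run_files.append(f)

    def read_run(f):
        for line in f:
            yield int(line)

    # heapq.merge yields the merged stream lazily: a downstream stage
    # can consume it directly instead of waiting for a materialized
    # intermediate file -- the idea behind pipelining, which the real
    # library applies to parallel disks.
    return heapq.merge(*(read_run(f) for f in run_files))
```

A consumer iterates the returned generator directly, which is the pipelining point: the merged stream feeds the next processing stage without an intermediate write-back pass.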

2.
A novel technique for tolerating the failure of two disks   Total citations: 3 (self-citations: 0, external: 3)
Building mass storage systems is currently one of the hottest and fastest-growing areas of computing, and online storage is the main component of a storage system. RAID (redundant arrays of independent disks) is of great significance for improving storage efficiency, ensuring high data reliability, and preventing data corruption and service interruption. However, the RAID levels in practical use today (RAID 1, RAID 0+1, RAID 4, and RAID 5) can tolerate the failure of only a single disk, and production systems have already seen many incidents in which double-disk failures caused prolonged service outages. After reviewing the common RAID levels, this paper introduces a novel diagonal parity method that, combined with horizontal (row) parity, can tolerate the failure of two disks. Rigorous mathematical analysis shows that the method can greatly improve disk…
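The diagonal-plus-horizontal parity idea in this abstract can be illustrated with an RDP-style layout (an assumption; the paper's exact construction is not given here): for a prime p, disks 0..p-2 hold data, disk p-1 holds row parity, and disk p holds diagonal parity, where cell (i, j) lies on diagonal (i + j) mod p and diagonal p-1 is deliberately left unstored. Recovering two lost data disks then alternates diagonal-parity and row-parity steps.

```python
from functools import reduce

def encode(data_cols, p):
    """data_cols: p-1 columns of p-1 ints each (toy stand-ins for
    sectors). Returns p+1 columns: data, row parity, diagonal parity."""
    rows = p - 1
    disks = [col[:] for col in data_cols]
    # horizontal (row) parity disk
    disks.append([reduce(lambda x, y: x ^ y,
                         (disks[j][i] for j in range(rows)))
                  for i in range(rows)])
    # diagonal parity over the data and row-parity disks; the
    # diagonal p-1 is intentionally not stored
    diag = [0] * rows
    for j in range(p):
        for i in range(rows):
            d = (i + j) % p
            if d < rows:
                diag[d] ^= disks[j][i]
    disks.append(diag)
    return disks

def _diag_recover(disks, d, x, p):
    """Rebuild lost disk x's block on stored diagonal d; return its row."""
    i = (d - x) % p
    v = disks[p][d]
    for j in range(p):
        if j != x:
            ii = (d - j) % p
            if ii < p - 1:          # disk j touches diagonal d
                v ^= disks[j][ii]
    disks[x][i] = v
    return i

def _row_recover(disks, i, x, p):
    disks[x][i] = reduce(lambda a, b: a ^ b,
                         (disks[j][i] for j in range(p) if j != x))

def recover_two(disks, a, b, p):
    """Recover two lost *data* disks a < b in place (loss of a parity
    disk is omitted from this sketch). Each chain starts from the
    diagonal that misses one lost disk, alternates a diagonal step with
    a row step, and stops at the unstored diagonal p-1."""
    disks[a] = [0] * (p - 1)
    disks[b] = [0] * (p - 1)
    for start, first, second in (((a - 1) % p, b, a), ((b - 1) % p, a, b)):
        d = start
        while d != p - 1:
            i = _diag_recover(disks, d, first, p)
            _row_recover(disks, i, second, p)
            d = (i + second) % p
```

The two chains recover disjoint sets of rows, so together they restore both lost columns with nothing but XOR.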

3.
Issues and challenges in the performance analysis of real disk arrays   Total citations: 2 (self-citations: 0, external: 2)
The performance modeling and analysis of disk arrays is challenging due to the presence of multiple disks, large array caches, and sophisticated array controllers. Moreover, storage manufacturers may not reveal the internal algorithms implemented in their devices, so real disk arrays are effectively black boxes. We use standard performance techniques to develop an integrated performance model that incorporates some of the complexities of real disk arrays. We show how measurement data and baseline performance models can be used to extract information about the various features implemented in a disk array. In the process, we identify areas for future research in the performance analysis of real disk arrays.

4.
Energy efficiency of data analysis systems has become a very important issue in recent times because of the increasing costs of data center operations. Although distributed streaming workloads are increasingly present in modern data centers, energy-efficient scheduling of such applications remains a significant challenge. In this paper, we conduct an energy consumption analysis of data stream processing systems in order to identify their energy consumption patterns. We follow a stream-system benchmarking approach: specifically, we implement the Linear Road benchmark on six stream processing environments (S4, Storm, ActiveMQ, Esper, Kafka, and Spark Streaming) and characterize these systems' performance in a real-world data center. We study the energy consumption characteristics of each system with a varying number of roads as well as with different types of component layouts. We also use a microbenchmark to capture raw energy consumption characteristics. We observed that the S4, Esper, and Spark Streaming environments had the highest average energy efficiency compared with the other systems. Using a neural-network-based technique with the power/performance information gathered from our experiments, we developed a model of the power consumption behavior of a streaming environment. We observed that energy-efficient execution of streaming applications cannot be attributed specifically to system CPU usage. We also observed that communication between compute nodes with moderate tuple sizes, and scheduling plans with balanced system overhead, produce better power consumption behavior in data stream processing systems. Copyright © 2016 John Wiley & Sons, Ltd.

5.
A survey of energy-saving techniques for disk storage systems   Total citations: 1 (self-citations: 0, external: 1)
Disks are currently a key component of storage systems and account for most of a storage system's total energy consumption, so the high energy consumption of disk-based storage has attracted growing attention from researchers. This paper surveys the progress and state of energy-consumption research at every level of disk storage systems, from individual disks up to complete systems; analyzes and discusses representative energy-saving methods in terms of their principles, implementation mechanisms, and evaluation approaches; and compares and summarizes the environments to which each technique is suited. Considering the complexity of workload characteristics and application environments in mass storage systems, future research directions for energy-saving techniques in disk storage are identified.

6.
High performance computing (HPC) systems allow researchers and businesses to harness the large amounts of computing power needed to solve complex problems. In such systems a job scheduler prioritizes the execution of users' jobs in a manner that satisfies performance objectives for various groups of users while making efficient use of available resources. Typically, system administrators are responsible for manually configuring or tuning the job scheduler so that the performance objectives of user groups as well as system-level performance objectives are met. Modern job schedulers used in production systems are quite complex. Through detailed trace-driven simulations, we show that manually tuning the configuration of production schedulers in an environment characterized by multiple performance objectives is very challenging and may not be feasible. To alleviate this problem, this paper describes a toolset that can help a system administrator automatically configure a scheduler so that the performance objectives for various classes of users, as well as other system-level performance objectives, are satisfied. A unique aspect of this work that differentiates it from existing work on scheduler tuning is that it has been implemented to work with a widely used production scheduler. Furthermore, in contrast to existing work, it considers the challenging real-world problem of delivering different levels of performance to different classes of users. System administrators can exploit the toolset to react quickly to changes in performance objectives and workload conditions. Case studies using synthetic and real HPC workloads demonstrate the effectiveness of the technique. Copyright © 2011 John Wiley & Sons, Ltd.

7.
A number of recent technological trends have made data-intensive applications such as continuous-media (audio and video) servers a reality. These servers store and retrieve large volumes of data using magnetic disks. Servers consisting of multiple nodes and large arrays of heterogeneous disk drives have become a fact of life for several reasons. First, magnetic disks fail, and failed disks are almost always replaced with newer disk models because the technological trend for these devices is an annual increase in both performance and storage capacity. Second, storage requirements are ever increasing, forcing servers to be scaled up progressively. In this study, we present a framework to enable parity-based data protection for heterogeneous storage systems and to compute their mean lifetime. We describe the tradeoffs associated with three alternative techniques: independent subservers, dependent subservers, and disk merging. The disk-merging approach provides a solution for systems that require highly available secondary storage in environments that also demand maximum flexibility.

8.
RAID-VCR: a RAID architecture that tolerates three disk failures   Total citations: 1 (self-citations: 0, external: 1)
This paper proposes a new RAID architecture, RAID-VCR. The scheme needs only three additional disks to store parity information, yet it can tolerate the failure of any three member disks. Compared with existing RAID architectures, RAID-VCR greatly improves fault tolerance while having very little impact on disk space utilization and system throughput. Both encoding and decoding in RAID-VCR are based on simple XOR operations, and user data is stored in the clear, so read operations remain efficient. Simulation results show that RAID-VCR achieves good encoding and decoding performance and has promising application prospects.
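The abstract does not reproduce the VCR construction itself, but it states that encoding and decoding reduce to XOR. A single-parity sketch illustrates that primitive (recovering one lost column; the actual three-failure code is considerably more involved):

```python
from functools import reduce

def xor_parity(disks):
    """One parity block per stripe: the XOR of the data blocks."""
    return [reduce(lambda a, b: a ^ b, stripe) for stripe in zip(*disks)]

def rebuild(surviving, parity):
    """Any one lost column is the XOR of the survivors and the parity."""
    return [reduce(lambda a, b: a ^ b, cells) ^ p
            for cells, p in zip(zip(*surviving), parity)]
```

Because XOR is its own inverse, the same operation serves both encoding and reconstruction, which is what keeps reads and rebuilds cheap in schemes built on it.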

9.
Mass storage systems (MSSs) play a key role in data-intensive parallel computing. Most contemporary MSSs are implemented as redundant arrays of independent/inexpensive disks (RAID) in which commodity disks are tied together with proprietary controller hardware. The performance of such systems can be difficult to predict because most internal details of the controller behavior are not public. We present a systematic method for empirically evaluating MSS performance by obtaining measurements on a series of RAID configurations of increasing size and complexity. We apply this methodology to a large MSS at the Ohio Supercomputer Center that has 16 input/output processors, each connected to four 8 + 1 RAID5 units, and provides 128 TB of storage (of which 116.8 TB are usable when formatted). Our methodology permits storage-system designers to evaluate empirically the performance of their systems with considerable confidence. Although we have carried out our experiments in the context of a specific system, our methodology is applicable to all large MSSs. The measurements obtained using our methods permit application programmers to be aware of the limits to the performance of their codes. Copyright © 2006 John Wiley & Sons, Ltd.

10.
Simulation of real-world traffic scenarios is widely needed in virtual environments. Unlike many previous works that simulate vehicles or pedestrians separately, our approach aims to capture the realistic process of vehicle-pedestrian interaction for mixed traffic simulation. We model a decision-making process for their interaction based on a gap-acceptance criterion and then design a novel environmental feedback mechanism for both vehicles' and pedestrians' behavior-control models to drive their motions. We demonstrate that our proposed method can soundly model vehicle-pedestrian interaction behaviors in a realistic and efficient manner and can conveniently be plugged into various traffic simulation systems. Copyright © 2015 John Wiley & Sons, Ltd.

11.
Disk idle behavior has a significant impact on the energy efficiency of disk storage systems; for example, accurately predicting or extending the idle length experienced by disks creates more opportunities to save energy. This paper employs trace-driven simulation to evaluate the impact of different disk schedulers and queue-length thresholds on disk idle behavior. Experimental results yield three findings: (1) position-based schedulers and long queue-length thresholds can significantly reduce the maximal and average queue lengths; (2) position-based schedulers and long queue-length thresholds generate more idle periods shorter than 1 s, but they do not affect the long idle periods present in modern server workloads; (3) disk idle periods exhibit both self-similarity and weak long-range dependence, and disk schedulers and queue-length thresholds do affect the Hurst parameter and the correlation behavior of the workloads. The analysis provides useful insights for designing and implementing energy-efficient policies for disk-drive-based storage systems.
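The idle-period statistics this study analyzes can be extracted from a request trace with a toy FIFO single-disk model (an illustrative assumption, not the paper's simulator):

```python
def idle_periods(arrivals, service_times):
    """Given per-request arrival times and service durations (seconds),
    replay a FIFO single-disk timeline and return the idle-gap lengths."""
    gaps, busy_until = [], 0.0
    for t, s in zip(arrivals, service_times):
        if t > busy_until:              # disk sat idle before this request
            gaps.append(t - busy_until)
            busy_until = t
        busy_until += s
    return gaps

def short_idle_fraction(gaps, threshold=1.0):
    """Fraction of idle periods shorter than the threshold (1 s here,
    matching the paper's cutoff between short and long idle periods)."""
    return sum(1 for g in gaps if g < threshold) / len(gaps) if gaps else 0.0
```

Running this over traces produced under different schedulers and queue-length thresholds is one way to reproduce the kind of comparison the abstract describes.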

12.
Server storage systems use numerous disks to achieve high performance, thereby consuming a significant amount of power. Intradisk parallelism can significantly reduce such systems' power consumption by letting disk drives exploit parallelism in the I/O request stream. By doing so, it is possible to match, and even surpass, a storage array's performance on these workloads using a single, high-capacity disk drive.

13.
In modern energy-saving replication storage systems, a primary group of disks is kept powered up to serve incoming requests while other disks are spun down to save energy during slack periods. However, because new writes cannot be immediately synchronized to all disks, system reliability is degraded. In this paper, we develop a high-reliability, energy-efficient replication storage system named RERAID, based on RAID10. RERAID employs part of the free space in the primary disk group and uses erasure coding to construct a code cache at the front end to absorb new writes. Because the code cache supports recovery from the failure of two or more disks, RERAID guarantees reliability comparable to that of a RAID10 storage system. In addition, we develop an algorithm called erasure coding write (ECW) that buffers many small random writes into a few large writes, which are then written to the code cache as large sequential writes, in parallel, to improve write performance. Experimental results show that RERAID significantly improves write performance and saves more energy than existing solutions.
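The small-write buffering idea behind ECW (absorb many small random writes, emit a few large ones) can be sketched as follows. The erasure-coded code cache itself is not modeled here, and `flush_threshold` is a hypothetical knob standing in for the buffer capacity:

```python
class WriteBuffer:
    """Toy write-coalescing buffer: small random writes accumulate in
    memory and are flushed as one large, address-ordered write."""

    def __init__(self, flush_threshold):
        self.flush_threshold = flush_threshold
        self.pending = []      # (block_address, payload) pairs
        self.flushes = []      # each entry models one large write

    def write(self, addr, payload):
        self.pending.append((addr, payload))
        if len(self.pending) >= self.flush_threshold:
            self.flush()

    def flush(self):
        if self.pending:
            # emit one large write, laid out in address order
            self.flushes.append(sorted(self.pending))
            self.pending = []
```

The win is that each flush costs roughly one positioning operation instead of one per small write, which is the effect the abstract attributes to ECW.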

14.
Multimedia systems store and retrieve large amounts of data, which requires extremely high disk bandwidth, and their performance depends critically on the efficiency of disk storage. However, existing magnetic disks are designed for the small retrievals typical of traditional workloads, with speed improvements focused mainly on reducing seek time and rotational latency. When the same mechanisms are applied to multimedia systems, disk I/O overheads can cause dramatic deterioration in system performance. In this paper, we present a mathematical model to evaluate the performance of constant-density recording disks and use it to analyze quantitatively the performance of multimedia request streams. We show that high disk throughput can be achieved by suitably adjusting the relevant parameters. In addition to demonstrating quantitatively that constant-density recording disks perform significantly better than traditional disks for multimedia data storage, we present a novel disk-partitioning scheme that places data according to bandwidth.
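A first-order model of constant-density (zoned) recording makes the abstract's premise concrete: bits per track grow linearly with track radius, so a zone's sustained transfer rate does too. The function below is an illustrative model of this sketch, not the paper's actual model:

```python
import math

def zone_transfer_rate(radius_mm, rpm, bits_per_mm):
    """Sustained transfer rate of a track at a given radius under
    constant-density recording: track capacity scales with the track
    circumference, and the platter spins at a fixed angular rate.
    (First-order model; real drives quantize radii into zones.)"""
    track_bits = 2 * math.pi * radius_mm * bits_per_mm
    revs_per_sec = rpm / 60.0
    return track_bits * revs_per_sec    # bits per second
```

This is why placing bandwidth-hungry streams on outer tracks, as in the paper's bandwidth-aware partitioning scheme, pays off: the outer zones simply deliver proportionally more bytes per revolution.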

15.
We propose an efficient writeback scheme that guarantees throughput in high-performance storage systems. The proposed scheme, called de-fragmented writeback (DFW), reduces the positioning time of storage devices under write workloads and thus enables fast writeback. We consider both types of storage media in designing the DFW scheme: traditional rotating disks and emerging solid-state disks. First, sorting and hole-filling methods are used for rotating disks to achieve higher throughput: the scheme converts fragmented data blocks into sequential ones, reducing the number of write requests and unnecessary disk-head movements. Second, a flash-block-aware, clustering-based writeback scheme is used for solid-state disks, reflecting the characteristics of flash memory. Experimental results show that our schemes deliver high system throughput while preserving data reliability.
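The sorting half of DFW for rotating disks can be sketched as address-sorting plus run-merging (illustrative only; DFW's actual policy, including hole filling, is richer):

```python
def defragment_writeback(dirty_blocks):
    """Order dirty block addresses and merge runs of consecutive
    addresses into (start, end) extents, so each extent can be
    serviced as one sequential write with a single head positioning."""
    extents = []
    for addr in sorted(set(dirty_blocks)):
        if extents and addr == extents[-1][1] + 1:
            extents[-1] = (extents[-1][0], addr)   # extend the current run
        else:
            extents.append((addr, addr))           # start a new run
    return extents
```

Fewer extents means fewer write requests and fewer head movements, which is precisely the throughput effect the abstract claims for the rotating-disk case.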

16.
One of the major challenges in cloud computing and data centers is energy conservation and emission reduction. Accurate prediction algorithms are essential for building energy-efficient storage systems in cloud computing. In this paper, we first propose a Three-State Disk Model (3SDM), which accurately describes the service-quality and energy-consumption states of a storage system. Based on this model, we develop a method that achieves energy conservation without losing quality by skewing the workload among the disks to transition the disks of a storage system between states. The efficiency of this method depends heavily on how accurately we can predict which blocks will and will not be accessed in the near future. We develop a priori-information and sliding-window-based prediction (PISWP) algorithm that takes advantage of prior information about human behavior and selects a suitable sliding-window size. The PISWP method targets streaming-media applications, but we also evaluate it on two other applications: webpage news and new tool releases. DiskSim, an established storage-system simulator, is applied in our experiments to verify the effect of our method on various user traces. The results show that this prediction method brings a high degree of energy saving for storage systems in cloud computing environments.
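The sliding-window half of the prediction can be sketched as follows. PISWP's actual use of prior information about human behavior is not specified in the abstract, so this frequency-based predictor is purely an assumption used to illustrate the window mechanism:

```python
from collections import Counter, deque

class SlidingWindowPredictor:
    """Toy sliding-window access predictor: blocks seen most often in
    the recent window are predicted hot (to stay on active disks);
    everything else is a candidate for spun-down disks."""

    def __init__(self, window):
        self.window = deque(maxlen=window)   # old accesses fall off

    def record(self, block):
        self.window.append(block)

    def predict_hot(self, k):
        return [b for b, _ in Counter(self.window).most_common(k)]
```

The window size is the knob the abstract highlights: too small and predictions are noisy, too large and they lag behind shifts in user behavior.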

17.
Process checkpointing is a procedure that periodically saves process state to stable storage. Most checkpointing facilities use hard disks for archiving. However, disk seek time is limited by the speed of the read-write heads, so checkpointing a process to a local disk consumes extensive disk bandwidth. In this paper, we propose an approach that exploits the memory of idle workstations as faster storage for checkpointing. In our scheme, autonomous machines that submit jobs to the computation server offer their physical memory to the server for job checkpointing. Eight applications are used to measure remote-memory performance under four checkpointing policies. Experimental results show that remote memory reduces the overhead of sequential checkpointing by at least 34.5 per cent and of incremental checkpointing by 32.1 per cent. Additionally, checkpointing a running process to remote memory requires only 60 per cent of the local-disk checkpoint latency. Copyright © 1999 John Wiley & Sons, Ltd.

18.
With the rapid growth of cloud storage and the arrival of the big-data era, more and more storage systems adopt erasure coding to ensure data reliability. In an erasure-coded storage system, once a disk fails, the system must reconstruct all the lost data from the redundant information stored on the surviving disks. Because the vast majority of disk failures in current storage systems are single-disk failures, how to reconstruct lost data quickly in the single-failure case has become a research focus. This paper first introduces the background and significance of single-disk-failure reconstruction optimization in erasure-coded storage systems, gives the basic concepts and definitions of erasure codes, and analyzes the principles of single-failure reconstruction optimization. It then surveys the construction algorithms of the mainstream single-failure reconstruction methods, together with their advantages, disadvantages, and applicable scenarios, and categorizes new erasure-code designs aimed at improving single-failure reconstruction efficiency. Finally, directions for further research on disk-failure reconstruction in erasure-coded storage systems are identified.
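One concrete instance of the reconstruction optimization this survey covers is mixing row-parity and diagonal-parity equations so that their reads overlap. The sketch below brute-forces the cheapest mix for a small RDP-style layout (an assumed layout, used only to illustrate why hybrid recovery reads fewer symbols than row-parity-only recovery):

```python
from itertools import product

def min_reconstruction_reads(p, lost):
    """For an RDP-style array (p prime; data disks 0..p-2, row-parity
    disk p-1, diagonal-parity disk p; cell (i, j) lies on diagonal
    (i + j) mod p, diagonal p-1 unstored), brute-force the cheapest way
    to rebuild one lost data disk: each lost block may be recovered via
    its row equation or, when stored, its diagonal equation, and symbols
    shared between the chosen equations are read only once."""
    rows = p - 1
    best = None
    for choice in product((0, 1), repeat=rows):   # 0 = row eq, 1 = diag eq
        need, ok = set(), True
        for i, use_diag in enumerate(choice):
            if use_diag:
                d = (i + lost) % p
                if d == p - 1:
                    ok = False        # this block's diagonal is unstored
                    break
                need.add(("diagP", d))
                for j in range(p):
                    ii = (d - j) % p
                    if j != lost and ii < rows:
                        need.add((ii, j))
            else:
                for j in range(p):
                    if j != lost:
                        need.add((i, j))
        if ok:
            best = len(need) if best is None else min(best, len(need))
    return best
```

Row-only recovery reads (p - 1)² symbols; the brute force finds a hybrid that reads strictly fewer, which is the effect the reconstruction-optimization literature exploits.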

19.
A genetic-algorithm-based method for disk load balancing in RAID arrays   Total citations: 3 (self-citations: 0, external: 3)
This paper analyzes the mapping and load relationship between logical disks and physical disks in a RAID array and proposes a genetic-algorithm-based method for balancing load across the physical disks in order to improve array throughput. Simulation experiments show that the method is effective.
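A minimal GA for the logical-to-physical assignment can be sketched as below. The paper's chromosome encoding and operators are not given, so everything here (fitness = load of the busiest physical disk, tournament selection, one-point crossover, point mutation, elitism) is an assumption:

```python
import random

def ga_balance(loads, n_disks, generations=200, pop_size=30, seed=1):
    """Toy GA: a chromosome assigns each logical disk (with a known
    load) to a physical disk; lower max per-disk load is fitter."""
    rng = random.Random(seed)
    n = len(loads)

    def fitness(chrom):
        per_disk = [0.0] * n_disks
        for load, disk in zip(loads, chrom):
            per_disk[disk] += load
        return max(per_disk)           # busiest disk's load

    pop = [[rng.randrange(n_disks) for _ in range(n)]
           for _ in range(pop_size)]
    best = min(pop, key=fitness)
    for _ in range(generations):
        nxt = [best[:]]                                # elitism
        while len(nxt) < pop_size:
            p1 = min(rng.sample(pop, 3), key=fitness)  # tournament
            p2 = min(rng.sample(pop, 3), key=fitness)
            cut = rng.randrange(1, n)                  # one-point crossover
            child = p1[:cut] + p2[cut:]
            if rng.random() < 0.2:                     # point mutation
                child[rng.randrange(n)] = rng.randrange(n_disks)
            nxt.append(child)
        pop = nxt
        best = min(pop, key=fitness)   # elite kept, so never worsens
    return best, fitness(best)
```

Because the elite individual is carried over each generation, the best fitness is monotonically non-increasing, a common safeguard when a GA is used for this kind of assignment problem.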

20.
To meet the exponentially growing demand for big-data storage, modern distributed storage systems must provide large storage capacity together with fast storage services. Mainstream distributed storage systems therefore apply erasure-coding techniques to reduce data-center disk costs, guarantee data reliability, and satisfy the fast-storage needs of applications and clients. In practice, data often differ in importance and in availability requirements, and the failure rates and reliability of different disks…


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号