Similar Documents
20 similar documents found.
1.
The file system, and the components of the computer system associated with it (disks, drums, channels, mass storage, tapes and tape drives, controllers, I/O drivers, etc.) comprise a very substantial fraction of most computer systems; substantial in several aspects, including amount of operating system code, expense for components, physical size and effect on performance. In a companion paper, we surveyed the traditional methods for optimizing the I/O system. We then examined disk and I/O system architecture in IBM type systems, and indicated shortcomings and future directions. In this paper we go one step further and summarize research by the author on two topics: cache disks and file migration. Cache disks are disks which have an associated cache which buffers recently used tracks of data. The case for cache disks is presented, and some of the issues are discussed. Parameter values for some aspects of the cache design are suggested. The second part of this paper summarizes the author's work on file migration, by which files are migrated between disk and mass storage as needed in order to effectively maintain on-line a much larger amount of information than the disks can hold. Some of the algorithms investigated are discussed, and the basic results are presented.

2.
Three information retrieval storage structures are considered to determine their suitability for a World Wide Web search engine, The Wolverhampton Web Library — The Next Generation: an inverted file, a signature file and a Pat tree. A number of implementations are considered for each structure. For the index of an inverted file, a sorted array, B-tree, B+-tree, trie and hash table are considered. For the signature file, vertical and horizontal partitioning schemes are considered, and for the Pat tree, a tree and an array implementation are considered. A theoretical comparison of the structures is made against seven criteria: response time, support for results ranking, search techniques, file maintenance, efficient use of disk space (including the use of compression), scalability and extensibility. The comparison reveals that an inverted file is the most suitable structure, unlike the signature file and Pat tree, which encounter problems with very large corpora.
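As a generic illustration of the structure the comparison favours (a minimal sketch, not the actual WWLib implementation), the following Python builds an inverted file as a term-to-postings map and answers conjunctive queries by intersecting postings lists:

```python
# Minimal inverted file: each term maps to a postings list of document
# IDs; AND queries are answered by intersecting the lists.
from collections import defaultdict

class InvertedIndex:
    def __init__(self):
        self.postings = defaultdict(list)   # term -> doc IDs in insertion order

    def add_document(self, doc_id, text):
        for term in set(text.lower().split()):
            self.postings[term].append(doc_id)

    def search(self, query):
        # Intersect the postings lists of all query terms, shortest first.
        lists = sorted((self.postings.get(t, []) for t in query.lower().split()),
                       key=len)
        if not lists:
            return []
        result = set(lists[0])
        for plist in lists[1:]:
            result &= set(plist)
        return sorted(result)

idx = InvertedIndex()
idx.add_document(1, "parallel file system design")
idx.add_document(2, "inverted file index structures")
print(idx.search("file system"))   # -> [1]
```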

3.
This paper presents the design and performance of SPIFFI, a scalable high-performance parallel file system intended for use by extremely I/O intensive applications including “Grand Challenge” scientific applications and multimedia systems. This paper contains experimental results from a SPIFFI prototype on a 64 node/64 disk Intel Paragon. The results show that SPIFFI provides high performance and linear scaleup on real hardware. The paper also explains how shared file pointers (i.e., file pointers that are shared by multiple processes) can simplify the design of a parallel application. By sequentializing I/O accesses and by providing dynamic I/O load balancing, a shared file pointer may even improve an application's performance. This paper also presents the predictions of a SPIFFI simulator that we validated using the prototype. The simulator results show that SPIFFI continues to provide high performance even when it is scaled to configurations with as many as 128 disks or 256 compute nodes.

4.
The file system, and the components of the computer system associated with it (disks, drums, channels, mass storage, tapes and tape drives, controllers, I/O drivers, etc.) comprise a very substantial fraction of most computer systems; substantial in several aspects, including amount of operating system code, expense for components, physical size and effect on performance. In a companion paper, we surveyed the traditional methods for optimizing the I/O system. We then examined disk and I/O system architecture in IBM type systems, and indicated shortcomings and future directions. In this paper we go one step further and summarize research by the author on two topics: cache disks and file migration. Cache disks are disks which have an associated cache which buffers recently used tracks of data. The case for cache disks is presented, and some of the issues are discussed. Parameter values for some aspects of the cache design are suggested. The second part of this paper summarizes the author's work on file migration, by which files are migrated between disk and mass storage as needed in order to effectively maintain on-line a much larger amount of information than the disks can hold. Some of the algorithms investigated are discussed, and the basic results are presented.

5.
The file system, and the components of the computer system associated with it (disks, drums, channels, mass storage, tapes and tape drives, controllers, I/O drivers, etc.) comprise a very substantial fraction of most computer systems; substantial in several aspects, including amount of operating system code, expense for components, physical size and effect on performance. In this paper we survey the state of the art in file and I/O system design and optimization as it applies to large data processing installations. In a companion paper, some research results applicable to both current and future system designs are summarized. Among the topics we discuss is the optimization of current file systems, where some material is provided regarding block size choice, data set placement, disk arm scheduling, rotational scheduling, compaction, fragmentation, I/O multipathing and file data structures. A set of references to the literature, especially to analytic I/O system models, is presented. The general tuning of file and I/O systems is also considered. Current and forthcoming disk architectures are the second topic. The count key data architecture of current disks (e.g. IBM 3350, 3380) and the fixed block architecture of new products (IBM 3310, 3370) are compared. The use of semiconductor drum replacements is considered and some commercially available systems are briefly described.

6.
High-performance servers and high-speed networks will form the backbone of the infrastructure required for distributed multimedia information systems. A server for an interactive distributed multimedia system may require thousands of gigabytes of storage space and a high I/O bandwidth. In order to maximize system utilization, and thus minimize cost, it is essential that the load be balanced among each of the server's components, viz. the disks, the interconnection network and the scheduler. Many algorithms for maximizing retrieval capacity from the storage system have been proposed in the literature. This paper presents techniques for improving server capacity by assigning media requests to the nodes of a server so as to balance the load on the interconnection network and the scheduling nodes. Five policies for request assignment are developed: round-robin (RR), minimum link allocation (MLA), minimum contention allocation (MCA), weighted minimum link allocation (WMLA) and weighted minimum contention allocation (WMCA). The performance of these policies on a server model developed by the authors (1995) is presented. We also consider the issue of file replication, and develop two schemes for storing the replicas: the parent group-based round-robin placement (PGBRRP) scheme and the group-wide round-robin placement (GWRRP) scheme. The performance of the request assignment policies in the presence of file replication is presented.
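The abstract does not define the policies precisely; as a rough sketch under assumed definitions, round-robin rotates through the nodes regardless of load, while an MCA-style policy picks the node with the fewest outstanding requests:

```python
# Toy request assignment: round-robin (RR) vs. a load-aware policy in
# the spirit of MCA (the paper's policies also model interconnect
# links, which this sketch omits).
import itertools

NODES = 4
loads = [0] * NODES                    # outstanding requests per node
rr = itertools.cycle(range(NODES))

def assign_rr():
    return next(rr)                    # ignore load, rotate through nodes

def assign_least_loaded():
    return min(range(NODES), key=lambda n: loads[n])

for _ in range(8):
    loads[assign_least_loaded()] += 1
print(loads)                           # evenly balanced: [2, 2, 2, 2]
```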

7.
I/O performance of a RAID-10 style parallel file system
Without any additional cost, all the disks on the nodes of a cluster can be connected together through CEFT-PVFS, a RAID-10 style parallel file system, to provide multi-GB/s parallel I/O performance. I/O response time is one of the most important measures of quality of service for a client. When multiple clients submit data-intensive jobs at the same time, the response time experienced by the user is an indicator of the power of the cluster. In this paper, a queuing model is used to analyze in detail the average response time when multiple clients access CEFT-PVFS. The results reveal that response time is a function of several operational parameters: I/O response time decreases as the I/O buffer hit rate for read requests, the write buffer size for write requests, and the number of server nodes in the parallel file system increase, while the higher the I/O request arrival rate, the longer the I/O response time. On the other hand, the collective power of a large cluster supported by CEFT-PVFS is shown to be able to sustain a steady and stable I/O response time over a relatively large range of request arrival rates.
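An M/M/1-style approximation (an illustrative stand-in with invented parameter values, not the paper's actual queuing model) reproduces the reported trends: response time falls as the hit rate or the number of server nodes rises, and grows with the arrival rate:

```python
# Approximate mean I/O response time for n_servers nodes, where a cache
# hit costs t_hit seconds and a miss costs t_miss (values assumed).
def response_time(arrival_rate, n_servers, hit_rate,
                  t_hit=0.0005, t_miss=0.010):
    service = hit_rate * t_hit + (1.0 - hit_rate) * t_miss  # mean service time
    rho = (arrival_rate / n_servers) * service              # per-node utilization
    if rho >= 1.0:
        return float("inf")                                 # node saturated
    return service / (1.0 - rho)                            # M/M/1 response time

for lam in (100, 500, 1000):                                # requests/s
    print(lam, round(response_time(lam, n_servers=8, hit_rate=0.7) * 1e3, 2), "ms")
```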

8.
One of the key components of a multiuser multimedia-on-demand system is the data server. Digitalization of traditionally analog data such as video and audio, and the feasibility of obtaining network bandwidths above the gigabit-per-second range, are two important advances that have made possible the realization, in the near future, of interactive distributed multimedia systems. Secondary-to-main memory I/O technology has not kept pace with advances in networking, main memory, and CPU processing power. Consequently, the performance of the server has a direct bearing on the overall performance of such a system. In this paper, we present a high-performance solution to the I/O retrieval problem in a distributed multimedia system. We develop a model for the architecture of a server for such a system. Parallelism of data retrieval is achieved by striping the data across multiple disks. We present the algorithms for server operation when servicing a constant number of streams, as well as the admission control policy for accepting requests for new streams. The performance of any server ultimately depends on the data access patterns. Two modifications of the basic retrieval algorithm are presented to exploit data access patterns in order to improve system throughput and response time. Finally, we present preliminary performance results of these algorithms on the IBM SP1 and Intel Paragon parallel computers.
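As a hedged sketch of one common admission control scheme (bandwidth-based; the paper's actual policy and numbers are not reproduced here), a new stream is admitted only if the aggregate consumption rate of all active streams stays within the striped disks' combined bandwidth:

```python
# Admit a new stream only while real-time guarantees can still be met;
# disk count, per-disk bandwidth and stream rates are all assumed.
DISKS = 16
DISK_BW = 5.0e6                     # sustained bytes/s per disk (assumed)
CAPACITY = DISKS * DISK_BW

active = []                         # consumption rates of admitted streams

def admit(rate):
    if sum(active) + rate <= CAPACITY:
        active.append(rate)         # reserve bandwidth for the new stream
        return True
    return False                    # reject rather than degrade service

print(admit(1.5e6))                 # True while capacity remains
```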

9.
The Design of a Parallel File System
孙凝晖 (Sun Ninghui), 《计算机学报》 (Chinese Journal of Computers), 1994, 17(12): 938-945
In the design of massively parallel processing (MPP) supercomputers, improving I/O performance is as important as improving computation and communication capability. A parallel file system (PFS) distributes the file system and the disk blocks of files across multiple disks on multiple I/O nodes, converts file reads and writes issued on compute nodes into multiple direct I/O requests for physical blocks, and uses read-ahead, preallocation, disk buffering and asynchronous I/O to increase I/O concurrency. Under specific file usage patterns, which are also the dominant I/O patterns of MPP applications, it achieves very high I/O efficiency.
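A minimal sketch of the striping arithmetic underlying such a design (round-robin placement and parameter values are assumed for illustration, not taken from the paper):

```python
# Map a file byte offset to the I/O node, local block index and offset
# within the block under round-robin striping.
BLOCK = 64 * 1024          # stripe unit in bytes (assumed)
IO_NODES = 8               # I/O nodes/disks holding the file (assumed)

def locate(offset):
    blk = offset // BLOCK              # logical block number
    node = blk % IO_NODES              # round-robin over I/O nodes
    local_blk = blk // IO_NODES        # block index on that node
    return node, local_blk, offset % BLOCK

print(locate(1_000_000))   # -> (7, 1, 16960)
```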

10.
One problem in facilitating data-intensive computing is how to effectively manage the massive amounts of data stored in a parallel I/O system. The file assignment method plays a significant role in this data management. However, in the context of a parallel I/O system, most existing file assignment approaches share two limitations. First, most existing methods are designed for non-partitioned files, while files in a parallel I/O system are generally partitioned to provide aggregated bandwidth. Second, the file allocation metric of most existing methods, e.g. service time, is difficult to determine in practice, and such metrics reflect only the static properties of a file. In this paper, a new metric, file access density, is proposed to capture the dynamic property of file access, i.e. the disk contention property. Based on this definition, the paper introduces a new static file assignment algorithm named MinCPP and its dynamic version DMinCPP, both of which aim at minimizing disk contention. Furthermore, MinCPP and DMinCPP take file partitioning into consideration by trying to allocate the partitions belonging to the same file onto different disks. By assuming that file request arrivals follow a Poisson process, we prove the effectiveness of the proposed schemes both analytically and experimentally. MinCPP can be applied to reorganize the files stored in a large-scale parallel I/O system, and DMinCPP can be integrated into file systems that dynamically allocate files in batches.
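A greedy sketch in the spirit of MinCPP (a reconstruction from the abstract, not the paper's exact algorithm): place partitions in decreasing order of access density on the least-loaded disk, preferring disks that do not already hold a partition of the same file:

```python
# partitions: list of (file_id, access_density); returns a placement map.
def assign(partitions, n_disks):
    disks = [{"load": 0.0, "files": set()} for _ in range(n_disks)]
    placement = {}
    for i, (fid, density) in enumerate(sorted(partitions, key=lambda p: -p[1])):
        # Prefer disks without a partition of this file (partition property).
        candidates = ([d for d in range(n_disks) if fid not in disks[d]["files"]]
                      or list(range(n_disks)))
        d = min(candidates, key=lambda k: disks[k]["load"])   # least contention
        disks[d]["load"] += density
        disks[d]["files"].add(fid)
        placement[(fid, i)] = d
    return placement

print(assign([("A", 0.9), ("A", 0.9), ("B", 0.4)], n_disks=2))
```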

11.
In petascale systems with a million CPU cores, scalable and consistent I/O performance is becoming increasingly difficult to sustain, mainly because of I/O variability. The variability is caused by concurrently running processes/jobs competing for I/O, or by a RAID rebuild when a disk drive fails. We present a mechanism that stripes across a selected subset of I/O nodes with the lightest workload at runtime to achieve the highest I/O bandwidth available in the system. In this paper, we propose a probing mechanism to enable application-level dynamic file striping to mitigate I/O variability. We implement the proposed mechanism in a high-level I/O library that enables memory-to-file data layout transformation and allows transparent file partitioning using subfiling. Subfiling is a technique that partitions data into a set of smaller files and manages access to them, so that the data can still be treated as a single, normal file by users. We demonstrate that our bandwidth probing mechanism can successfully identify temporally slower I/O nodes without noticeable runtime overhead. Experimental results on NERSC’s systems also show that our approach isolates I/O variability effectively on shared systems and improves overall collective I/O performance with less variation.
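A toy sketch of the node-selection step (the interface is assumed; the library's actual probing and subfiling calls are not shown): probe per-node load and stripe the file across only the k lightest I/O nodes:

```python
# node_loads: observed probe latency or queue depth per I/O node.
def pick_stripe_nodes(node_loads, k):
    ranked = sorted(node_loads, key=node_loads.get)   # lightest first
    return ranked[:k]                                 # stripe targets

loads = {0: 0.9, 1: 0.2, 2: 0.5, 3: 0.1, 4: 0.7}
print(pick_stripe_nodes(loads, k=3))                  # -> [3, 1, 2]
```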

12.
A tiered storage system achieves high performance by dynamically migrating data between devices of different performance. Existing tiered storage systems fail to make full use of workload information, so data migration can severely degrade application performance. This paper proposes AutoMig, an automatic data migration method for tiered storage systems whose goal is to improve the I/O performance of foreground applications. AutoMig combines parameters such as file access history, file size and device utilization to rank files dynamically, and uses an LRU queue to maintain the state of files on the fast storage device; it mines correlated files for automatic prefetching, and applies different rate-control policies to different migration operations: for demotions, the migration rate is adjusted dynamically as the workload changes, while promotions are handled with a best-effort policy. Deployment in a tiered storage system shows that, compared with existing methods, AutoMig effectively reduces foreground I/O response time.

13.
This paper studies workfile disk management for concurrent mergesorts in a multiprocessor database system. Specifically, we examine the impacts of workfile disk allocation and data striping on the average mergesort response time. Concurrent mergesorts in a multiprocessor system can create severe I/O interference, in which a large number of sequential write requests are continuously issued to the same workfile disk and block other read requests for a long period of time. We examine through detailed simulations a logical partitioning approach to workfile disk management and evaluate the effectiveness of data striping. The results show that (1) without data striping, the best performance is achieved by using the entire set of workfile disks as a single partition if there are abundant workfile disks (or the system workload is light); (2) however, if workfile disks are limited (or the system workload is heavy), the workfile disks should be partitioned into multiple groups, and the optimal partition size is workload dependent; (3) data striping is beneficial only if the striping unit size is properly chosen; and (4) with a proper striping size, the best performance is generally achieved by using the entire set of disks as a single logical partition.

14.
Gfarm Grid file system is a global distributed file system for sharing data and supporting distributed data-intensive computing. It federates local file systems on compute nodes to maximize distributed file I/O bandwidth, and allows multiple file replicas to be stored in any location to avoid read access concentration on hot files. Data-location-aware process scheduling improves the file I/O performance of distributed data-intensive computing. This paper discusses the design and implementation of the Gfarm Grid file system and reports on its performance.

15.
Adaptive File Striping Techniques for Parallel File Systems
This paper studies how adaptive file striping strategies in a parallel file system affect file access performance, and develops a dynamic file striping analysis model that uses automatic access-pattern classification and real-time file system performance data to select a fuzzy-logic rule base for the striping strategy, thereby optimizing file access performance. The results show that when the file system load is low, files should be distributed across as many disks as possible to minimize I/O response time; conversely, when the load is high, files should be distributed across fewer disks to maximize the overall throughput of the file system. Experiments relating request size, request width and request arrival rate to system performance confirm the correctness of the adaptive rule base.
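A crisp stand-in for the fuzzy rule base described above (thresholds invented for illustration): stripe wide under light load to minimize response time, narrow under heavy load to maximize overall throughput:

```python
# Choose how many disks to stripe a file across from the current load.
def stripe_width(system_load, n_disks):
    if system_load < 0.3:            # light load: minimize response time
        return n_disks               # spread the file over all disks
    if system_load < 0.7:            # moderate load: compromise
        return max(1, n_disks // 2)
    return max(1, n_disks // 4)      # heavy load: maximize throughput

print(stripe_width(0.2, 16), stripe_width(0.8, 16))   # -> 16 4
```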

16.
Existing SSD technology exploits the properties of NAND flash, pairing it with a controller that runs FTL algorithms to improve system performance. In this black-box structure, however, semantic information about data is hard to transfer and interpret through conventional interfaces, so SSD firmware cannot fully exploit the performance potential of the SSD by using semantic information. Moreover, the host cannot obtain the SSD's physical characteristics and statistical information, so they cannot be used by the file system or by I/O scheduling algorithms designed for disks. In addition, in SSD-based storage systems persistent data are stored in NAND flash but manipulated in DRAM, a decoupling that is inefficient: the data closest to the processors are easily lost because DRAM is volatile, leading to serious data reliability problems. Finally, restrictive read/program granularity and out-of-place updates limit performance, and flash handles small operations poorly. To address these problems, we propose a user-visible solid-state storage system with software-defined fusion methods for PCM and NAND flash. PCM is used to improve data reliability and reduce the write amplification of NAND flash, since it offers in-place updates, byte addressability, non-volatility and better endurance. We manage the storage device as a user-visible structure rather than a black box: we expose the number of channels, erase counts and the data distribution of PCM/NAND flash to the host, and design the FTL algorithm closer to the file system to obtain more semantic information about data accesses. PCM can be software-defined either as same-level storage or as a buffer for NAND flash, reducing the write amplification (WA) of NAND flash and improving data reliability. Key software components (such as the FTL, I/O scheduling and buffer management) are also reconfigurable and can easily be operated in combination with the physical characteristics. To achieve these design goals, we implement a Host Fusion Storage Layer (HFSL) and redesign the lengthy I/O path: applications or the file system can access PCM/flash directly via interfaces provided by HFSL, bypassing the traditional I/O subsystem. We also provide system management software so that the storage system can easily be software-defined by upper-level systems. We implement the software-defined fusion storage system on an actual hardware prototype, and extensive experimental results demonstrate the efficiency of the proposed schemes.

17.
A server-side file system must provide not only very large capacity but also high I/O performance for a large number of concurrent accesses. This paper presents a technique that integrates multiple physical file systems into one logical file system in software, effectively aggregating the bandwidth and capacity of the disk devices underlying each file system and combining the strengths of different file systems in metadata and data processing performance. Performance test results show that the logical file system technique is an effective way to build high-performance file systems that support highly concurrent access.

18.
We consider the complexity of the general information retrieval system design problem and of the design problem for multiattribute file systems based upon multiple key hashing (MKH). We first show that the problem of designing an optimal multiattribute file system is NP-hard. The performance formula for multiattribute file systems based upon the MKH method is derived. We also show that the design problem for a multiattribute file system based upon the MKH method is related to the prime number problem, and that the problem of designing optimal multiattribute files based upon the MKH method can be reduced to finding minimal N-tuples, which was discussed by Chang, Lee and Du. We further present a very efficient method for designing good multiple key hashing functions in the case where the number of buckets is a power of a prime number, and propose a heuristic algorithm to design good multiple key hashing functions in general.
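A small sketch of the MKH idea (factor choices are illustrative, not the paper's construction): each attribute hashes into its own factor, the bucket address is their mixed-radix combination, so the bucket count is the product of the factors, and a partial-match query must examine one bucket per combination of the unspecified attributes' digits:

```python
FACTORS = (5, 3, 2)          # m1, m2, m3: 5*3*2 = 30 buckets in total

def bucket(record):
    # Mixed-radix address built from one hash digit per attribute.
    addr = 0
    for value, m in zip(record, FACTORS):
        addr = addr * m + hash(value) % m
    return addr

def partial_match_cost(specified):
    # specified[i] is True if attribute i is given in the query.
    cost = 1
    for given, m in zip(specified, FACTORS):
        if not given:
            cost *= m        # every digit of a free attribute must be tried
    return cost

print(bucket(("smith", "cs", 1980)))             # some address in [0, 30)
print(partial_match_cost((True, False, True)))   # -> 3 buckets to examine
```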

19.
To improve the overall performance of a search engine's retrieval service, this paper proposes a cache optimization method based on inverted-file indexes. The study first analyzes the architecture and data loading of the inverted-file cache, then discusses the influence of the workload data on the inverted-file cache and on cache replacement algorithms, and finally investigates inverted-file cache optimization through simulation experiments. The results show that the proposed cache optimization method significantly reduces the number of disk I/O accesses and improves the utilization of disk bandwidth.
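A minimal LRU cache for postings lists (a generic illustration; the paper's replacement algorithm and cache sizes are not specified here):

```python
from collections import OrderedDict

class PostingsCache:
    def __init__(self, capacity, load_from_disk):
        self.capacity = capacity
        self.load = load_from_disk          # fallback: one disk I/O per miss
        self.cache = OrderedDict()          # term -> postings list

    def get(self, term):
        if term in self.cache:
            self.cache.move_to_end(term)    # hit: mark most recently used
            return self.cache[term]
        postings = self.load(term)          # miss: fetch from disk
        self.cache[term] = postings
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict least recently used
        return postings

cache = PostingsCache(2, load_from_disk=lambda t: [t])   # stub loader
cache.get("disk"); cache.get("cache"); cache.get("disk")
print(list(cache.cache))   # -> ['cache', 'disk']
```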

20.
In recent years, as the cloud computing market has grown, cloud storage systems, the infrastructure underlying cloud computing platforms, have become increasingly important. Tens of thousands of Internet applications already run in cloud environments, and many others are about to move from traditional environments to cloud platforms. Different applications have different storage requirements: the size of individual files, the number of files, the I/O access pattern, the read/write ratio and so on all place different demands on the underlying storage system. This suggests that in a cloud environment a single file system may not satisfy the storage needs of all applications, so this paper explores deploying several different distributed file systems within a single cloud platform to optimize the overall performance of the storage system. Optimizing such a hybrid file system first requires analyzing the performance characteristics of the individual file systems. This paper quantitatively analyzes several distributed file systems commonly used in cloud environments: ceph, moosefs, glusterfs and hdfs. The experimental results show that, even for the same read and write operations on the same file, performance differs significantly across distributed file systems: when a file is smaller than 256 MB, the average write performance of moosefs is 22.3% higher than that of the other file systems; when a file is larger than 256 KB, the average read performance of glusterfs is 21.0% higher than that of the others. These results provide a basis for designing and implementing a hybrid file system built on these distributed file systems.
