Similar Documents
A total of 20 similar documents were found (search time: 921 ms).
1.
Gaston is a peer-to-peer large-scale file system designed to provide a fault-tolerant and highly available file service for a virtually unlimited number of users. Data management in Gaston disseminates and stores replicas of files on multiple machines to achieve the requested level of data availability, and uses a dynamic tree-topology structure to connect the members of a replication schema. We present generic algorithms for creating and maintaining the replication schema according to file users' requirements and the autonomous constraints set on individual nodes. We also describe the data object structure as well as mechanisms for secure and efficient update propagation among replicas with data consistency control. Finally, we introduce a scalable and efficient technique that improves the fault tolerance of the tree-topology structure connecting the replicas.
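To make the tree-topology idea concrete, here is a minimal, illustrative Python sketch (not Gaston's actual code; class and field names are assumptions) of a replication-schema node that propagates versioned updates down the tree:

```python
class ReplicaNode:
    """One member of a tree-shaped replication schema (illustrative)."""
    def __init__(self, site_id):
        self.site_id = site_id
        self.children = []          # downstream replicas in the tree
        self.version = 0
        self.data = None

    def add_child(self, node):
        self.children.append(node)

    def apply_update(self, data, version):
        # Consistency control: accept only strictly newer versions.
        if version <= self.version:
            return
        self.data, self.version = data, version
        for child in self.children:  # propagate along the tree topology
            child.apply_update(data, version)
```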

2.
Replica Placement Strategies in Data Grid
Replication is a technique used in Data Grid environments to reduce access latency and network bandwidth utilization. Replication also increases data availability, thereby enhancing system reliability. This research addresses the problem of replication in the Data Grid environment by investigating a set of highly decentralized dynamic replica placement algorithms. The placement algorithms are based on heuristics that consider both network latency and user requests to select the best candidate sites for replicas. Because of the dynamic nature of the Grid, the sites that currently hold replicas may not be the best sites from which to fetch them in subsequent periods. A replica maintenance algorithm is therefore proposed to relocate replicas to different sites when the performance metric degrades significantly. The study of our replica placement algorithms is carried out using a model of the EU Data Grid Testbed 1 sites [Bell et al., Comput. Appl., 17(4), 2003] and their associated network geometry. We validate our replica placement algorithms with total file transfer times, the number of local file accesses, and the number of remote file accesses.
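A hedged sketch of the placement heuristic's general shape, assuming a simple weighted score over latency and request counts (the weights and field names are illustrative, not the paper's exact formulation):

```python
def best_placement_site(candidates, w_latency=0.5, w_requests=0.5):
    def score(site):
        # Lower latency and more user requests both raise the score.
        return w_requests * site["requests"] - w_latency * site["latency_ms"]
    return max(candidates, key=score)

sites = [{"name": "CERN", "latency_ms": 20, "requests": 120},
         {"name": "RAL",  "latency_ms": 80, "requests": 300}]
print(best_placement_site(sites)["name"])  # -> RAL (score 110 vs 50)
```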

3.
Data Grid integrates geographically distributed resources for solving data-intensive scientific applications. Effective scheduling in the Grid can reduce the amount of data transferred among nodes by submitting a job to a node where most of the requested data files are available. Scheduling is a traditional problem in parallel and distributed systems; however, the special issues and goals of the Grid make traditional approaches ineffective in this environment, so methods specialized for this kind of parallel and distributed system are needed. Another solution is a data replication strategy that creates multiple copies of files and stores them in convenient locations to shorten file access times. To exploit both concepts, in this paper we develop a job scheduling policy, called the hierarchical job scheduling strategy (HJSS), and a dynamic data replication strategy, called the advanced dynamic hierarchical replication strategy (ADHRS), to improve data access efficiency in a hierarchical Data Grid. HJSS uses hierarchical scheduling to reduce the search time for an appropriate computing node. It considers network characteristics, the number of jobs waiting in the queue, file locations, and the disk read speed of the storage drives at data sources. Moreover, because storage capacity is limited, a good replica replacement algorithm is needed. We present a novel replacement strategy that deletes files in two steps when free space is insufficient for a new replica: first, it deletes the files with the minimum transfer time; second, if space is still insufficient, it considers the last time each replica was requested, the number of accesses, the replica size, and the file transfer time. The simulation results show that our proposed algorithm outperforms other algorithms in terms of job execution time, number of intercommunications, number of replications, hit ratio, computing resource usage, and storage usage.
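The two-step replacement rule might look roughly like the following sketch; the eviction threshold and the step-2 composite score are assumptions, not the paper's exact formulas:

```python
from operator import itemgetter

def make_room(replicas, free, needed, cheap_transfer=1.0):
    # Step 1: evict files that are cheap to re-transfer later
    # (the cutoff value is an arbitrary illustrative threshold).
    replicas.sort(key=itemgetter("transfer_time"))
    while replicas and free < needed and replicas[0]["transfer_time"] <= cheap_transfer:
        free += replicas.pop(0)["size"]
    # Step 2: if space is still short, evict by an assumed "keep value"
    # built from recency, access count, transfer time and size.
    def keep_value(r):
        return r["last_request"] * r["access_count"] * r["transfer_time"] / r["size"]
    replicas.sort(key=keep_value)
    while replicas and free < needed:
        free += replicas.pop(0)["size"]
    return free
```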

4.
Data Grid is a geographically distributed environment that deals with large-scale data-intensive applications. Effective scheduling in the Grid can reduce the amount of data transferred among nodes by submitting a job to a node where most of the requested data files are available. Data replication is another key optimization technique for reducing access latency and managing large data sets by storing data wisely. In this paper two algorithms are proposed: first, a novel job scheduling algorithm called the Combined Scheduling Strategy (CSS) that uses hierarchical scheduling to reduce the search time for an appropriate computing node; it considers the number of jobs waiting in the queue, the location of the data required by the job, and the computing capacity of the sites. Second, a dynamic data replication strategy called the Modified Dynamic Hierarchical Replication Algorithm (MDHRA) that improves file access time. This strategy is an enhanced version of the Dynamic Hierarchical Replication (DHR) strategy. Data replication must be used wisely because the storage capacity of each Grid site is limited, so it is important to design an effective replica replacement strategy. MDHRA replaces replicas based on the last time the replica was requested, the number of accesses, and the replica size. It selects the best replica location from among the many replicas based on response time, which is determined from the data transfer time, the storage access latency, the replica requests waiting in the storage queue, and the distance between nodes. The simulation results demonstrate that the proposed replication and scheduling strategies give better performance than the other algorithms.
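A minimal sketch of MDHRA-style best-replica selection, assuming the response-time estimate is a simple sum of the factors listed above (field names are illustrative):

```python
def select_replica(sites):
    def response_time(s):
        return (s["transfer_time"] + s["storage_latency"]
                + s["queued_requests"] * s["per_request_cost"]
                + s["distance_cost"])
    # Pick the replica-holding site with the lowest estimated response time.
    return min(sites, key=response_time)
```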

5.
In recent years, grid technology has grown so quickly that it is used in many scientific experiments and research centers. A large number of storage elements and computational resources are combined into a grid that gives shared access to extra computing power. In particular, the Data Grid deals with data-intensive applications and provides intensive resources across widely distributed communities. Data replication is an efficient way of distributing replicas across a Data Grid, making it possible to access the same data at different locations. Replication reduces data access time and improves system performance. In this paper, we propose a new dynamic data replication algorithm named PDDRA that improves on traditional algorithms. Our proposed algorithm is based on the assumption that members of a VO (Virtual Organization) have similar interests in files. Based on this assumption and on file access history, PDDRA predicts the future needs of grid sites and pre-fetches a sequence of files to the requesting site, so that the next time the site needs a file it is available locally. This considerably reduces access latency, response time, and bandwidth consumption. PDDRA consists of three phases: storing file access patterns; requesting a file and performing replication and pre-fetching; and replacement. The algorithm was tested using OptorSim, a grid simulator developed by the European DataGrid project. The simulation results show that our proposed algorithm outperforms other algorithms in terms of job execution time, effective network usage, total number of replications, hit ratio, and percentage of storage filled.
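An illustrative sketch of the pre-fetching idea, assuming a first-order successor model over the access history (the paper's pattern store may be richer):

```python
from collections import defaultdict, Counter

successors = defaultdict(Counter)   # phase 1: stored access patterns
history = []

def record_access(file_id):
    # Count which file tends to follow which in the access stream.
    if history:
        successors[history[-1]][file_id] += 1
    history.append(file_id)

def prefetch_candidates(file_id, k=2):
    # Phases 2-3: on a request, pre-fetch the k most likely next files.
    return [f for f, _ in successors[file_id].most_common(k)]
```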

6.
Data Grid is a geographically distributed environment that deals with large-scale data-intensive applications. Effective scheduling in the Grid can reduce the amount of data transferred among nodes by submitting a job to a node where most of the requested data files are available. Data replication is another key optimization technique for reducing access latency and managing large data sets by storing data wisely. In this paper, two algorithms are proposed: first, a novel job scheduling algorithm called the Combined Scheduling Strategy (CSS) that considers the number of jobs waiting in the queue, the location of the data required by the job, and computational capability; second, a dynamic data replication strategy called the Dynamic Hierarchical Replication Algorithm (DHRA) that improves file access time. DHRA stores each replica at an appropriate site, i.e., the site in the requesting region that has the highest number of accesses for that particular replica. It can also minimize access latency by selecting the best replica when multiple sites hold replicas of a dataset. The simulation results demonstrate that the proposed replication and scheduling strategies give better performance than the other algorithms.
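A one-function sketch of the stated placement rule, with assumed data structures: pick, within the requesting region, the site that has accessed the file most often:

```python
def placement_site(region_sites, file_id):
    # region_sites: sites in the requesting region, each carrying a
    # per-file access counter (an assumed bookkeeping structure).
    return max(region_sites, key=lambda s: s["access_counts"].get(file_id, 0))
```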

7.
Many current international scientific projects are based on large-scale applications that are both computationally complex and require the management of large amounts of distributed data. Grid computing is fast emerging as the solution to the problems posed by these applications. To evaluate the impact of resource optimisation algorithms, simulation of the Grid environment can be used to obtain important performance results before any algorithms are deployed on the Grid. In this paper, we study the effects of various job scheduling and data replication strategies and compare them in a variety of Grid scenarios using several performance metrics. We use a Grid simulator and base our simulations on a world-wide Grid testbed for data-intensive high-energy physics experiments. Our results show that scheduling algorithms which take into account both the file access cost of jobs and the workload of computing resources are the most effective at optimising computing and storage resources as well as improving job throughput. The results also show that, in most cases, the economy-based replication strategies which we have developed improve Grid performance under changing network loads.
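A hedged sketch of the kind of scheduler the results favour, assuming estimated cost = data-transfer time for missing files plus a queue-based workload term (terms and field names are illustrative):

```python
def schedule(job, sites, bandwidth):
    def cost(site):
        # File access cost: time to fetch files not stored locally.
        missing = [f for f in job["files"] if f not in site["stored_files"]]
        transfer = sum(job["file_sizes"][f] for f in missing) / bandwidth
        # Workload: rough wait time implied by the site's queue.
        workload = site["queue_length"] * site["mean_job_time"]
        return transfer + workload
    return min(sites, key=cost)
```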

8.
The Cloud computing environment is attracting growing interest as a new trend in data management. Data replication has been widely applied to improve data access in distributed systems such as the Grid and the Cloud. However, because each site has finite storage capacity, copies that would be useful for future jobs can be wastefully deleted and replaced with less valuable ones. It is therefore important to have a replication strategy that can dynamically store replicas while satisfying quality-of-service (QoS) requirements and storage capacity constraints. In this paper, we present a dynamic replication algorithm named the hierarchical data replication strategy (HDRS). HDRS consists of replica creation, which can adaptively increase replicas based on an exponential growth or decay rate; replica placement, according to access load and a labeling technique; and replica replacement, based on the future value of a file. We evaluate different dynamic data replication methods using CloudSim simulation. Experiments demonstrate that HDRS can reduce response time and bandwidth usage compared with other algorithms: it identifies popular files and replicates them to the best sites, avoiding useless replications and decreasing access latency by balancing the load across sites.
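A minimal sketch of the exponential growth/decay rule for replica counts, with an assumed rate constant (the paper's calibration may differ):

```python
import math

def target_replicas(current, popularity_rising, rate=0.5, max_replicas=16):
    # Grow the replica count exponentially while popularity rises,
    # decay it otherwise; clamp to a sane range.
    factor = math.exp(rate if popularity_rising else -rate)
    return max(1, min(max_replicas, round(current * factor)))
```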

9.
The Data Grid provides massive aggregated computing resources and distributed storage space for data-intensive applications. Given the limited resources available in the Grid and the production of large volumes of data, efficient use of Grid resources is an important challenge. Data replication is a key optimization technique for reducing access latency and managing large data sets by storing data wisely. Effective scheduling in the Grid can reduce the amount of data transferred among nodes by submitting a job to a node where most of the requested data files are available. In this paper two strategies are proposed: first, a novel job scheduling strategy called the Weighted Scheduling Strategy (WSS) that uses hierarchical scheduling to reduce the search time for an appropriate computing node; it considers the number of jobs waiting in a queue, the location of the data required by the job, and the computing capacity of the sites. Second, a dynamic data replication strategy called Enhanced Dynamic Hierarchical Replication (EDHR) that improves file access time. This strategy is an enhanced version of the Dynamic Hierarchical Replication strategy. It uses an economic model, based on the future value of a data file, to decide which files to delete when there is not enough space for a new replica. Because the best replica placement plays an important role in obtaining the maximum benefit from replication and in reducing storage cost and mean job execution time, placement is also considered in this paper. The proposed strategies are implemented in OptorSim, the European Data Grid simulator. Experimental results show that the proposed strategies achieve better performance by minimizing data access time and avoiding unnecessary replication.
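An illustrative sketch of an economic deletion rule in the spirit of EDHR, using a recency-weighted access count as the assumed future-value predictor (not the paper's model):

```python
import time

def future_value(replica, now=None, half_life=3600.0):
    # Predicted future worth: popularity decays as the replica goes unused.
    now = time.time() if now is None else now
    age = now - replica["last_access"]
    return replica["access_count"] * 0.5 ** (age / half_life)

def evict_least_valuable(replicas):
    victim = min(replicas, key=future_value)
    replicas.remove(victim)
    return victim
```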

10.
File downloads make up a large percentage of Internet traffic as clients use distributed environments for their Cloud, Grid, and Internet applications. In particular, the Cloud has become a popular data storage provider, and users (individuals and corporations) rely heavily on it to keep their data. Furthermore, most cloud data providers replicate their storage infrastructures and servers at various sites to meet the overall high demands of their clients and increase availability. However, most of them do not use that replication to enhance download performance per client. To exploit this redundancy and enhance download speed, we introduce a fast and efficient concurrent technique for downloading large files from replicated Cloud data servers, as well as from traditional FTP servers. The technique, DDFTP, utilizes the availability of replicated files on distributed servers to reduce file download times through concurrent downloads of file blocks from opposite directions in the files. DDFTP does not require coordination between the servers and relies on the in-order delivery and reliability features of TCP to provide fast file downloads. In addition, DDFTP offers efficient load balancing among multiple heterogeneous data servers with minimal overhead. As a result, we can maximize network utilization while maintaining efficient load balancing in dynamic environments where resources, current loads, and operational properties vary. We implemented and evaluated DDFTP and experimentally demonstrated considerable performance gains for file downloads compared with other concurrent/parallel download models.
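A minimal sketch of DDFTP's core trick: two servers stream the same file from opposite ends until the fronts meet. Sequential alternation stands in here for the real concurrent transfers, and the server objects are assumptions:

```python
def ddftp_download(blocks, server_a, server_b):
    lo, hi = 0, len(blocks) - 1
    result = [None] * len(blocks)
    while lo <= hi:
        result[lo] = server_a.fetch(blocks[lo]); lo += 1      # forward front
        if lo <= hi:
            result[hi] = server_b.fetch(blocks[hi]); hi -= 1  # backward front
    # No server coordination needed: a faster server simply claims
    # more blocks before the two fronts meet.
    return result
```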

11.
Parallel file systems are serving more and more applications from various fields. Different applications have different I/O workload characteristics and thus diverse requirements for accessing storage resources. However, parallel file systems often adopt a "one-size-fits-all" solution, which fails to meet specific application needs and hinders full exploitation of the potential performance. This paper presents a framework that enables dynamic, fine-grained file I/O path selection at runtime. The framework adopts a file-handle-rich scheme that lets the file system choose the appropriate optimizations to serve each I/O request. Consistency control algorithms are proposed to ensure data consistency while optimizations are changed at runtime. One case study on our prototype shows that choosing proper optimizations can improve I/O performance for small files and large files by up to 40% and 64.4%, respectively. Another case study shows that data prefetch performance for real-world application traces can be improved by up to 193% by selecting the correct prefetch patterns. Simulations in a large-scale environment also show that our method is scalable, and that both the memory consumption and the consistency control overhead are negligible.
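A hedged sketch of per-handle I/O path selection, assuming a lock serializes path switches so no request sees a half-switched handle (names are illustrative, not the framework's API):

```python
import threading

class FileHandle:
    """Per-handle I/O path tag; switching is serialized by a lock."""
    def __init__(self, path, io_path="default"):
        self.path = path
        self.io_path = io_path          # e.g. "small_file", "striped"
        self._lock = threading.Lock()

    def switch(self, new_path):
        with self._lock:                # simple consistency control
            self.io_path = new_path

    def read(self, offset, length, io_paths):
        with self._lock:
            # Dispatch to the optimization currently chosen for this handle.
            return io_paths[self.io_path](self.path, offset, length)
```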

12.
Dynamic replication is very important for improving the performance of Data Grids. This paper surveys current dynamic replication strategies and observes that the best-performing ones are all single-site placement algorithms, which are severely limited in widely distributed, high-latency grids. Three multi-site dynamic replication strategies are proposed and converted into classical mathematical problems for solution. After presenting an application of the multi-site strategies to distance-education resource management, simulation experiments were conducted on the European Data Grid Testbed 1 topology. The results show that, compared with the best current placement strategies, the proposed multi-site strategies significantly reduce network load and network latency.

13.
We describe the major features of the completely decentralized adaptive file system MELODY, which was designed to realize an integrated system design for a distributed real-time system working in a hazardous and unpredictable environment. MELODY's adaptivity mechanisms are based on novel services rendered by the distributed operating system DRAGON SLAYER. The file system, in order both to meet real-time constraints and to provide high availability, allows file copies to be replicated, relocated, or deleted. Such copies may also be public or private. At every site a Local Task Scheduler tries to schedule the arriving critical tasks, based on the availability of resources at that site, so that deadline failures are minimized. Depending on the deadline-failure history, status changes as well as file replication, deletion, or relocation are analyzed and managed by the cooperating Local File Assigners. To analyze MELODY's real-time performance, we report on simulation experiments in which its ability to minimize deadline failures of time-critical tasks was compared with other file system models: an ideal best-case model, a baseline model with no file replication, a file system allowing only replication of private copies, and a model that allows replication and relocation of public copies only. While the best case is unrealistic for a distributed implementation, the other models embody only part of MELODY's mechanisms yet have the benefit of considerably smaller communication overhead. We report on distributed simulation results which unambiguously show MELODY's superior performance, in addition to its built-in sensitivity to changes in the environment. A DRAGON SLAYER/MELODY prototype has been completed in our labs to serve as a distributed real-time testbed in our future work with MELODY. This work was partially supported by IBM Endicott (Research Agreement No. 6073-86), by the State of Michigan (IMR-87-146751), and by General Dynamics Land Systems (#DEY-605089).

14.
Data Grid has evolved to be the solution for data-intensive applications such as High Energy Physics (HEP), astrophysics, and computational genomics. These applications usually have large volumes of input data to analyze, and these input data are widely replicated across the Data Grid to improve performance. The scheduling performance of traditional computing jobs can be studied using queuing theory; with the addition of data transfer, however, job scheduling performance becomes too complex to model analytically. In this research, we study the impact of data transfer on job scheduling performance in the Data Grid environment. We propose a parallel downloading system that supports replicating data fragments and downloading replicated fragments in parallel, to improve job scheduling performance. The performance of the parallel downloading system is compared with a non-parallel downloading system using three scheduling heuristics: Shortest Turnaround Time (STT), Least Relative Load (LRL), and Data Present (DP). Our simulation results show that the proposed parallel download approach greatly improves Data Grid performance for all three scheduling algorithms in terms of the geometric mean of job turnaround time. The advantage of the parallel downloading system is most evident when the Data Grid has relatively low network bandwidth and relatively high computing power.
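A rough sketch of how an STT-style choice might combine with parallel fragment downloads: the transfer term is bounded by the slowest parallel fragment stream (all fields are illustrative assumptions):

```python
def shortest_turnaround_site(sites, fragment_times):
    # Fragments download in parallel, so transfer time is the max, not the sum.
    transfer = max(fragment_times)
    return min(sites,
               key=lambda s: s["queue_wait"] + s["compute_time"] + transfer)
```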

15.
Much of the past research on file migration and file replication has examined these two resource management strategies in isolation or in an environment where they do not work together. We establish through simulation that the two strategies can be used simultaneously to provide significant performance benefits over a system without file migration or replication. File replication can be viewed as a natural extension of file migration, and we therefore derive a dynamic file replication policy from an established file migration heuristic: a file is migrated (or replicated) whenever a reduction in the total mean response time of the file requests currently in the affected storage sites can be achieved. Through our performance model, we use simulation to establish the conditions under which our file migration/replication policies are beneficial in a distributed file system.
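The stated heuristic reduces to a simple comparison; the sketch below assumes a caller-supplied response-time estimator standing in for the paper's queueing model:

```python
def should_replicate(pending_requests, estimate_rt, src, dst):
    # estimate_rt(request, sites) -> expected response time when the
    # file is available at the given set of sites (assumed callable).
    current = sum(estimate_rt(r, {src}) for r in pending_requests)
    with_copy = sum(estimate_rt(r, {src, dst}) for r in pending_requests)
    return with_copy < current  # replicate only if total response time drops
```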

16.
Data replication is becoming a popular technology in many fields such as cloud storage, Data Grids, and P2P systems. By replicating files to other servers/nodes, we can reduce network traffic and file access time and increase data availability in order to withstand natural and man-made disasters. However, more replicas do not always yield better system performance. Replicas do decrease read access time and provide better fault tolerance, but if we consider write access, maintaining a large number of replicas results in a huge update overhead. Hence, a trade-off between read access time and write-update cost is needed. File popularity is an important factor in making decisions about data replication; to avoid fluctuations in data access, historical file popularity can be used to select truly popular files. In this research, a dynamic data replication strategy is proposed based on two ideas. The first employs historical access records, which are useful for choosing which file to replicate. The second is a proactive deletion method, applied to control the number of replicas so as to reach an optimal balance between read access time and write-update overhead. A unified cost model is used to measure and compare the performance of our data replication algorithm and other existing algorithms. The results indicate that our new algorithm performs much better than those algorithms.
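A hedged sketch of the read/write trade-off behind a unified cost model, assuming diminishing read benefit and roughly linear write-update cost (assumed shapes, not the paper's calibrated model):

```python
def net_benefit(n, read_rate, write_rate, read_saving, update_cost):
    # Read benefit saturates as replicas multiply; every extra replica
    # adds a full share of write-update cost.
    return read_rate * read_saving * (1 - 0.5 ** n) - write_rate * update_cost * n

def best_replica_count(read_rate, write_rate, read_saving, update_cost, max_n=10):
    # Proactive deletion follows: shrink toward this count when it drops.
    return max(range(1, max_n + 1),
               key=lambda n: net_benefit(n, read_rate, write_rate,
                                         read_saving, update_cost))
```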

17.
Efficient data scheduling is becoming an important issue in distributed real-time applications that produce huge data sets. The Grid environment on which these applications may run seeks to harness geographically distributed resources for the applications. Scheduling components should account for the applications' real-time measures and reduce the communication overhead caused by the enormous data sizes involved, especially in dissemination applications. In this study, we consider a data staging scheme to provide dissemination of large-scale data sets for distributed real-time applications. We propose a new path-selection-based algorithm for optimizing a criterion that reflects the overall satisfiability of the system. The algorithm adopts a blocking-time analysis method combined with a simple heuristic to explore the most likely regions of the search space, and two such heuristics are provided. Simulation results show that the proposed algorithm, together with either heuristic, outperforms other algorithms in the literature. We also show by simulation that a new optimization criterion proposed in this study succeeds in improving the performance of the individual applications.

18.
Operating system support for a video-on-demand file service
This paper describes the design and implementation of a continuous-media file server intended for use in emerging video-on-demand applications. The main focus and contribution of the paper is the scheduling and admission-control algorithms for accessing the server's processor and storage resources. The scheduling algorithms support multiple classes of tasks with diverse performance requirements and allow guaranteed real-time requests to coexist with sporadic and unsolicited requests. The scheduler maintains performance guarantees for real-time streams in the presence of unpredictably varying non-real-time traffic, while ensuring system stability even during overloads. A prototype video file server was implemented on an Intel 486 platform. Performance results show that a large number of streams can be supported while maintaining efficient utilization of system resources.
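A minimal sketch of utilization-based admission control in this spirit: admit a new guaranteed stream only if total demanded bandwidth stays under a reserved fraction of disk bandwidth, leaving headroom for sporadic traffic (the 0.8 reservation level is an illustrative assumption):

```python
def admit(active_streams, new_stream, disk_bandwidth, reserved=0.8):
    # Reject the stream rather than jeopardize existing guarantees.
    demand = sum(s["bit_rate"] for s in active_streams) + new_stream["bit_rate"]
    return demand <= reserved * disk_bandwidth
```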

19.
John J. Wallace, Software, 1983, 13(4): 385-387
DMERT (Duplex, Multi-Environment, Real Time) is a real-time operating system that supports reliable telecommunications applications. Although DMERT file systems are based on UNIX™ file systems, which have known robustness drawbacks, DMERT file systems do not suffer from these drawbacks. The DMERT file manager uses synchronous writes and file system audits to manage crash-resistant file systems, and the performance penalty for this crash resistance is minimal. This note describes DMERT's crash-resistance policy and shows how UNIX and UNIX-like file systems can be made crash resistant without sacrificing performance.

20.
Application and Optimization of I/O Locks in the Lustre File System
A distributed file system needs a mechanism to control concurrent access from its many clients and maintain the consistency of file data. Locking is the most popular mechanism for implementing concurrency control. This paper studies the distributed I/O range-lock model of the Lustre file system and optimizes its various applications. It introduces the basic concepts of Lustre's distributed locks and analyzes the lock-based algorithms for client write-back caching and for dynamically obtaining the file size under multiple writers. It then proposes an adaptive I/O locking policy, an interval-tree-based range-lock conflict detection optimization, and a client lock eviction policy to enhance the performance and scalability of the Lustre lock service.
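An illustrative sketch of range-lock conflict detection: two extent locks conflict when their byte ranges overlap and at least one is a write lock. A real implementation would index held extents in an interval tree so each check costs O(log n) rather than the O(n) scan shown here:

```python
def conflicts(a, b):
    # Byte-range overlap test for extents [start, end].
    overlap = a["start"] <= b["end"] and b["start"] <= a["end"]
    # Read locks are compatible; any write in an overlap is a conflict.
    return overlap and ("write" in (a["mode"], b["mode"]))

def find_conflicts(held_locks, request):
    return [l for l in held_locks if conflicts(l, request)]
```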
