首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 29 毫秒
1.
Data Grid is a geographically distributed environment that deals with large-scale data-intensive applications. Effective scheduling in Grid can reduce the amount of data transferred among nodes by submitting a job to a node, where most of the requested data files are available. Data replication is another key optimization technique for reducing access latency and managing large data by storing data in a wisely manner. In this paper, two algorithms are proposed: first, a novel job scheduling algorithm called Combined Scheduling Strategy (CSS) that considers the number of jobs waiting in queue, the location of required data for the job, and computational capability; second, a dynamic data replication strategy called Dynamic Hierarchical Replication Algorithm (DHRA) that improves file access time. DHRA stores each replica in an appropriate site, i.e., appropriate site in the requested region that has the highest number of access for that particular replica. Also, it can minimize access latency by selecting the best replica when various sites hold replicas of datasets. The simulation results demonstrate the proposed replication and scheduling strategies give better performance compared to the other algorithms.  相似文献   

2.
The Data Grid provides massive aggregated computing resources and distributed storage space to deal with data-intensive applications. Due to the limitation of available resources in the grid as well as production of large volumes of data, efficient use of the Grid resources becomes an important challenge. Data replication is a key optimization technique for reducing access latency and managing large data by storing data in a wise manner. Effective scheduling in the Grid can reduce the amount of data transferred among nodes by submitting a job to a node where most of the requested data files are available. In this paper two strategies are proposed, first a novel job scheduling strategy called Weighted Scheduling Strategy (WSS) that uses hierarchical scheduling to reduce the search time for an appropriate computing node. It considers the number of jobs waiting in a queue, the location of the required data for the job and the computing capacity of the sites Second, a dynamic data replication strategy, called Enhanced Dynamic Hierarchical Replication (EDHR) that improves file access time. This strategy is an enhanced version of the Dynamic Hierarchical Replication strategy. It uses an economic model for file deletion when there is not enough space for the replica. The economic model is based on the future value of a data file. Best replica placement plays an important role for obtaining maximum benefit from replication as well as reducing storage cost and mean job execution time. So, it is considered in this paper. The proposed strategies are implemented by OptorSim, the European Data Grid simulator. Experiment results show that the proposed strategies achieve better performance by minimizing the data access time and avoiding unnecessary replication.  相似文献   

3.
Data Grid provides scalable infrastructure for storage resource and data files management, which supports several large scale applications. Due to limitation of available resources in grid, efficient use of the grid resources becomes an important challenge. Replication is a technique used in data grid to improve fault tolerance and to reduce the bandwidth consumption. This paper proposes a Dynamic Hierarchical Replication (DHR) algorithm that places replicas in appropriate sites i.e. best site that has the highest number of access for that particular replica. It also minimizes access latency by selecting the best replica when various sites hold replicas. The proposed replica selection strategy selects the best replica location for the users' running jobs by considering the replica requests that waiting in the storage and data transfer time. The simulated results with OptorSim, i.e. European Data Grid simulator show that DHR strategy gives better performance compared to the other algorithms and prevents unnecessary creation of replica which leads to efficient storage usage.  相似文献   

4.
Data Grid integrates graphically distributed resources for solving data intensive scientific applications. Effective scheduling in Grid can reduce the amount of data transferred among nodes by submitting a job to a node, where most of the requested data files are available. Scheduling is a traditional problem in parallel and distributed system. However, due to special issues and goals of Grid, traditional approach is not effective in this environment any more. Therefore, it is necessary to propose methods specialized for this kind of parallel and distributed system. Another solution is to use a data replication strategy to create multiple copies of files and store them in convenient locations to shorten file access times. To utilize the above two concepts, in this paper we develop a job scheduling policy, called hierarchical job scheduling strategy (HJSS), and a dynamic data replication strategy, called advanced dynamic hierarchical replication strategy (ADHRS), to improve the data access efficiencies in a hierarchical Data Grid. HJSS uses hierarchical scheduling to reduce the search time for an appropriate computing node. It considers network characteristics, number of jobs waiting in queue, file locations, and disk read speed of storage drive at data sources. Moreover, due to the limited storage capacity, a good replica replacement algorithm is needed. We present a novel replacement strategy which deletes files in two steps when free space is not enough for the new replica: first, it deletes those files with minimum time for transferring. Second, if space is still insufficient then it considers the last time the replica was requested, number of access, size of replica and file transfer time. The simulation results show that our proposed algorithm has better performance in comparison with other algorithms in terms of job execution time, number of intercommunications, number of replications, hit ratio, computing resource usage and storage usage.  相似文献   

5.
基于层次化调度策略和动态数据复制的网格调度方法   总被引:2,自引:0,他引:2  
针对在网格中如何有效地进行任务调度和数据复制, 以便减少任务执行时间等问题, 提出了任务调度算法(ISS)和优化动态数据复制算法(ODHRA), 并构建一个方案将两种算法进行了有效结合。该方案采用ISS算法综合考虑任务等待队列的数量、任务需求数据的位置和站点的计算容量, 采用网络结构分级调度的方式, 配以适当的权重系数计算综合任务成本, 搜索出最佳计算节点区域; 采用ODHRA算法分析数据传输时间、存储访问延迟、等待在存储队列中的副本请求和节点间的距离, 在众多的副本中选取出最佳副本位置, 再结合副本放置和副本管理, 从而降低了文件访问时间。仿真结果表明, 提出的方案在平均任务执行时间方面, 与其他算法相比表现出了更好的性能。  相似文献   

6.
Cloud computing is becoming a very popular word in industry and is receiving a large amount of attention from the research community. Replica management is one of the most important issues in the cloud, which can offer fast data access time, high data availability and reliability. By keeping all replicas active, the replicas may enhance system task successful execution rate if the replicas and requests are reasonably distributed. However, appropriate replica placement in a large-scale, dynamically scalable and totally virtualized data centers is much more complicated. To provide cost-effective availability, minimize the response time of applications and make load balancing for cloud storage, a new replica placement is proposed. The replica placement is based on five important parameters: mean service time, failure probability, load variance, latency and storage usage. However, replication should be used wisely because the storage size of each site is limited. Thus, the site must keep only the important replicas.We also present a new replica replacement strategy based on the availability of the file, the last time the replica was requested, number of access, and size of replica. We evaluate our algorithm using the CloudSim simulator and find that it offers better performance in comparison with other algorithms in terms of mean response time, effective network usage, load balancing, replication frequency, and storage usage.  相似文献   

7.
Cloud computing environment is getting more interesting as a new trend of data management. Data replication has been widely applied to improve data access in distributed systems such as Grid and Cloud. However, due to the finite storage capacity of each site, copies that are useful for future jobs can be wastefully deleted and replaced with less valuable ones. Therefore, it is considerable to have appropriate replication strategy that can dynamically store the replicas while satisfying quality of service (QoS) requirements and storage capacity constraints. In this paper, we present a dynamic replication algorithm, named hierarchical data replication strategy (HDRS). HDRS consists of the replica creation that can adaptively increase replicas based on exponential growth or decay rate, the replica placement according to the access load and labeling technique, and finally the replica replacement based on the value of file in the future. We evaluate different dynamic data replication methods using CloudSim simulation. Experiments demonstrate that HDRS can reduce response time and bandwidth usage compared with other algorithms. It means that the HDRS can determine a popular file and replicates it to the best site. This method avoids useless replications and decreases access latency by balancing the load of sites.  相似文献   

8.
Data replication is the creation and maintenance of multiple copies of the same data. Replication is used in Data Grid to enhance data availability and fault tolerance. One of the main objectives of replication strategies is reducing response time and bandwidth consumption. In this paper, a dynamic replication strategy that is based on Fast Spread but superior to it in terms of total response time and total bandwidth consumption is proposed. This is achieved by storing only the important replicas on the storage of the node. The main idea of this strategy is using a threshold to determine if the requested replica needs to be copied to the node. The simulation results show that the proposed strategy achieved better performance compared with Fast Spread with Least Recently Used (LRU), and Fast Spread with Least Frequently Used (LFU).  相似文献   

9.
Replica Placement Strategies in Data Grid   总被引:1,自引:0,他引:1  
Replication is a technique used in Data Grid environments that helps to reduce access latency and network bandwidth utilization. Replication also increases data availability thereby enhancing system reliability. The research addresses the problem of replication in Data Grid environment by investigating a set of highly decentralized dynamic replica placement algorithms. Replica placement algorithms are based on heuristics that consider both network latency and user requests to select the best candidate sites to place replicas. Due to dynamic nature of Grid, the candidate site holds replicas currently may not be the best sites to fetch replicas in subsequent periods. Therefore, a replica maintenance algorithm is proposed to relocate replicas to different sites if the performance metric degrades significantly. The study of our replica placement algorithms is carried out using a model of the EU Data Grid Testbed 1 [Bell et al. Comput. Appl., 17(4), 2003] sites and their associated network geometry. We validate our replica placement algorithms with total file transfer times, the number of local file accesses, and the number of remote file accesses.  相似文献   

10.
Data grids support access to widely distributed storage for large numbers of users accessing potentially many large files. Efficient access is hindered by the high latency of the Internet. To improve access time, replication at nearby sites may be used. Replication also provides high availability, decreased bandwidth use, enhanced fault tolerance, and improved scalability. Resource availability, network latency, and user requests in a grid environment may vary with time. Any replica placement strategy must be able to adapt to such dynamic behavior. In this paper, we describe a new dynamic replica placement algorithm, Popularity Based Replica Placement (PBRP), for hierarchical data grids which is guided by file “popularity”. Our goal is to place replicas close to clients to reduce data access time while still using network and storage resources efficiently. The effectiveness of PBRP depends on the selection of a threshold value related to file popularity. We also present Adaptive-PBRP (APBRP) that determines this threshold dynamically based on data request arrival rates. We evaluate both algorithms using simulation. Results for a range of data access patterns show that our algorithms can shorten job execution time significantly and reduce bandwidth consumption compared to other dynamic replication methods.  相似文献   

11.
Data replication and consistency refer to the same data being stored in distributed sites, and kept consistent when one or more copies are modified. A good file maintenance and consistency strategy can reduce file access times and access latencies, and increase download speeds, thus reducing overall computing times. In this paper, we propose dynamic services for replicating and maintaining data in grid environments, and directing replicas to appropriate locations for use. To address a problem with the Bandwidth Hierarchy-based Replication (BHR) algorithm, a strategy for maintaining replicas dynamically, we propose the Dynamic Maintenance Service (DMS). We also propose a One-way Replica Consistency Service (ORCS) for data grid environments, a positive approach to resolving consistency maintenance issues we hope will strike a balance between improving data access performance and replica consistency. Experimental results show that our services are more efficient than other strategies.  相似文献   

12.
In recent years, grid technology has had such a fast growth that it has been used in many scientific experiments and research centers. A large number of storage elements and computational resources are combined to generate a grid which gives us shared access to extra computing power. In particular, data grid deals with data intensive applications and provides intensive resources across widely distributed communities. Data replication is an efficient way for distributing replicas among the data grids, making it possible to access similar data in different locations of the data grid. Replication reduces data access time and improves the performance of the system. In this paper, we propose a new dynamic data replication algorithm named PDDRA that optimizes the traditional algorithms. Our proposed algorithm is based on an assumption: members in a VO (Virtual Organization) have similar interests in files. Based on this assumption and also file access history, PDDRA predicts future needs of grid sites and pre-fetches a sequence of files to the requester grid site, so the next time that this site needs a file, it will be locally available. This will considerably reduce access latency, response time and bandwidth consumption. PDDRA consists of three phases: storing file access patterns, requesting a file and performing replication and pre-fetching and replacement. The algorithm was tested using a grid simulator, OptorSim developed by European Data Grid projects. The simulation results show that our proposed algorithm has better performance in comparison with other algorithms in terms of job execution time, effective network usage, total number of replications, hit ratio and percentage of storage filled.  相似文献   

13.
An optimal replication strategy for data grid systems   总被引:1,自引:0,他引:1  
Data access latency is an important metric of system performance in data grid. By means of efficient replication strategy, the amount of data transferred in a wide area network will decrease, and the average access latency of data will decrease ultimately. The motivation of our research is to solve the optimized replica distribution problem in a data grid; that is, the system should utilize many replicas for every data with storage constraints to minimize the average access latency of data. This paper proposes a model of replication strategy in federated data grid and gives the optimized solution. The analysis results and simulation results show that the optimized replication strategy proposed in this paper is superior to LRU caching strategy, uniform replication strategy, proportional replication strategy and square root replication strategy in terms of wide area network bandwidth requirement and in the average access latency of data.  相似文献   

14.
数据副本管理是云计算系统管理的重要组成部分,在云计算系统的海量数据处理过程中,针对目前已知的数据存放与资源调度算法存在考虑副本动态性和可靠性的不足,提出了一种动态的副本放置机制。该机制基于区域结构,考虑数据处理时其副本的数量和放置位置,以及副本的产生对于内存和带宽等系统资源的开销:首先根据云存储中的副本信息,对被访问频率高且访问平均响应时间长的数据信息进行复制,并给出副本数量的计算方法;考虑缩小副本分布的节点选择范围,提出动态的副本放置算法——DRA,将一定范围内的节点根据提出的域的划分,进行放置筛选,以存放数据副本。实验结果表明,提出的动态放置机制不仅减少了低访问率副本对系统存储空间的浪费;同时也减少了高访问率副本所需跨节点的传输延迟,有效提高了云存储系统中的数据文件的访问效率、负载的均衡水平,以及云存储系统的可靠性和可用性。  相似文献   

15.
云存储技术已经成为当前互联网中共享存储和数据服务的基础技术,云存储系统普遍利用数据复制来提高数据可用性,增强系统容错能力和改善系统性能。提出了一种云存储系统中基于分簇的数据复制策略,该策略包括产生数据复制的时机判断、复制副本数量的决定以及如何放置复制所产生的数据副本。在放置数据副本时,设计了一种基于分簇的负载均衡副本放置方法。相关的仿真实验表明,提出的基于分簇的负载均衡副本放置方法是可行的,并且具有良好的性能。  相似文献   

16.
基于蚂蚁算法的数据网格副本选择策略   总被引:3,自引:0,他引:3  
在分布着大量数据和计算能力的数据网格环境中,采用数据副本是提高网格应用可用性的重要方法。如何对数据网格中大量的数据副本进行优化选择是影响数据网格性能的重要因素。因此提出一种基于蚂蚁算法的数据网格副本选择策略,并在网格仿真器OptorSim中对该算法进行实现及性能分析。仿真实验结果表明该算法可以减少数据访问延迟及带宽消耗,并有效做到网格中存储节点间的负载平衡。  相似文献   

17.
Data replication techniques are used in data grid to reduce makespan, storage consumption, access latency and network bandwidth. Data replication enhances data availability and thereby increases the system reliability. There are two steps involved in data replication, namely, replica placement and replica selection. Replica placement involves identifying the best possible node to duplicate data based on network latency and user request. Replica selection involves selecting the best replica location to access the data for job execution in the data grid. Various replica placement and selection algorithms are available in the literature. These algorithms measure and analyze different parameters such as bandwidth consumption, access cost, scalability, execution time, storage consumption and makespan. In this paper, various replica placement and selection strategies along with their merits and demerits are discussed. This paper also analyses the performance of various strategies with respect to the parameters mentioned above. In particular, this paper focuses on the dynamic replica placement and selection strategies in the data grid environment.  相似文献   

18.
基于遗传算法的副本管理策略   总被引:2,自引:0,他引:2  
在数据网格环境中,为了提高数据的可靠性和降低用户对数据访问的延迟,广泛采用了数据复制技术;由于副本技术的引入,就会面临一个很直接的问题,即一个应用程序如何根据副本性能和访问特性,从一组副本中选择一个最佳副本,就是副本选择问题.针对数据复制技术中的关键技术:副本创建和副本选择,在价格机制模型和并行数据传输的基础上,采用了遗传算法实现副本的优化管理.最后使用网格模拟器OptorSim对算法进行测试分析,结果显示基于遗传算法的策略有更好的性能.  相似文献   

19.
Replica selection strategies in data grid   总被引:1,自引:0,他引:1  
Replication in Data Grids reduces access latency and bandwidth consumption. When different sites hold replicas of datasets, there is a significant benefit realized by selecting the best replica. By selecting the best replica, the access latency can be minimized. In this research, we propose two different replica selection techniques. To select the best replica from information gathered locally, a simple technique called the k-Nearest Neighbor (KNN) rule is exploited. The KNN rule selects the best replica for a file by considering previous file transfer logs indicating the history of the file and those nearby. We also propose a predictive technique to estimate the transfer time between sites. The predicted transfer time can be used as an estimate of transfer bandwidth of different sites that hold replica currently, and help in selecting the best replica among different sites. Simulation results demonstrate that the k-nearest algorithm shows a significant performance improvement over the traditional replica catalog based model. Besides, the neural network predictive technique estimates the transfer time among sites more accurately than the multi-regression model.  相似文献   

20.
Data replication is becoming a popular technology in many fields such as cloud storage, Data grids and P2P systems. By replicating files to other servers/nodes, we can reduce network traffic and file access time and increase data availability to react natural and man-made disasters. However, it does not mean that more replicas can always have a better system performance. Replicas indeed decrease read access time and provide better fault-tolerance, but if we consider write access, maintaining a large number of replications will result in a huge update overhead. Hence, a trade-off between read access time and write updating cost is needed. File popularity is an important factor in making decisions about data replication. To avoid data access fluctuations, historical file popularity can be used for selecting really popular files. In this research, a dynamic data replication strategy is proposed based on two ideas. The first one employs historical access records which are useful for picking up a file to replicate. The second one is a proactive deletion method, which is applied to control the replica number to reach an optimal balance between the read access time and the write update overhead. A unified cost model is used as a means to measure and compare the performance of our data replication algorithm and other existing algorithms. The results indicate that our new algorithm performs much better than those algorithms.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号