Combination of data replication and scheduling algorithm for improving data availability in Data Grids

Authors:	Najme Mansouri, Gholam Hosein Dastghaibyfard, Ehsan Mansouri

Affiliation:	1. Department of Computer Science and Engineering, Birjand University of Technology, Postal Code 97175-569, Birjand, Iran; 2. Department of Computer Science and Engineering, College of Electrical and Computer Engineering, Shiraz University, MollaSadra Avenue, Shiraz, Iran; 3. Department of Computer Science and Engineering, College of Electrical and Computer Engineering, Birjand University, Avini Avenue, Birjand, Iran

Abstract:	A Data Grid is a geographically distributed environment that serves large-scale, data-intensive applications. Effective scheduling in a Grid can reduce the amount of data transferred among nodes by submitting each job to a node where most of the requested data files are already available. Data replication is another key optimization technique: it reduces access latency and helps manage large data sets by placing copies of the data wisely. This paper proposes two algorithms. The first is a novel job scheduling algorithm called Combined Scheduling Strategy (CSS), which uses hierarchical scheduling to reduce the time spent searching for an appropriate computing node. It considers the number of jobs waiting in the queue, the location of the data required by the job, and the computing capacity of the sites. The second is a dynamic data replication strategy called the Modified Dynamic Hierarchical Replication Algorithm (MDHRA), which improves file access time. This strategy is an enhanced version of the Dynamic Hierarchical Replication (DHR) strategy. Replication must be used wisely because the storage capacity of each Grid site is limited, so an effective replica replacement strategy is important. MDHRA replaces replicas based on the last time the replica was requested, the number of accesses, and the size of the replica. It selects the best replica location from among the many replicas based on response time, which is estimated from the data transfer time, the storage access latency, the replica requests waiting in the storage queue, and the distance between nodes. Simulation results show that the proposed replication and scheduling strategies perform better than the other algorithms considered.
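
The three decisions described in the abstract (choosing a site for a job, choosing which replica to read, and choosing which replica to evict) can be sketched roughly as follows. This is only an illustration and not the CSS or MDHRA algorithms themselves: the abstract names the factors but gives no formulas, so every class, function name, constant, and scoring rule below is a hypothetical assumption chosen to show how those factors might be combined.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    queued_jobs: int        # jobs waiting in the site's queue
    capacity: float         # relative computing capacity (assumed unit)
    local_files: frozenset  # names of files already stored at the site


@dataclass
class Source:
    site: str
    bandwidth_mb_s: float     # transfer rate to the requester, MB/s
    storage_latency_s: float  # storage access latency, seconds
    queued_requests: int      # replica requests waiting in the storage queue
    hops: int                 # stand-in for the distance between nodes


@dataclass
class Replica:
    name: str
    size_mb: float
    last_request: float  # timestamp of the most recent request
    access_count: int


def pick_site(sites, required_files):
    """Assumed rule: fewest missing files first, then lowest load per capacity."""
    def cost(site):
        missing = len(set(required_files) - site.local_files)
        return (missing, (site.queued_jobs + 1) / site.capacity)
    return min(sites, key=cost)


def response_time(src, size_mb, service_s=0.5, hop_s=0.01):
    """Assumed additive estimate; service_s and hop_s are made-up constants."""
    transfer = size_mb / src.bandwidth_mb_s
    waiting = src.queued_requests * service_s
    return transfer + src.storage_latency_s + waiting + src.hops * hop_s


def pick_source(sources, size_mb):
    """Read from the replica holder with the lowest estimated response time."""
    return min(sources, key=lambda s: response_time(s, size_mb))


def eviction_order(replicas, now):
    """Assumed rule: evict first the replicas that are stale, rarely used, and large."""
    def keep_value(r):
        age = now - r.last_request
        return r.access_count / ((1.0 + age) * r.size_mb)
    return sorted(replicas, key=keep_value)
```

For example, a site that already holds every required file would be preferred by `pick_site` even with a longer queue, because this sketch ranks data locality ahead of load; a real scheduler could weight the two factors differently.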

Keywords:
This article is indexed in ScienceDirect and other databases.
