1.

File replication, maintenance, and consistency management services in data grids

Chao-Tung Yang Chun-Pin Fu Ching-Hsien Hsu 《The Journal of supercomputing》2010,53(3):411-439

Data replication and consistency refer to the same data being stored at distributed sites and kept consistent when one or more copies are modified. A good file maintenance and consistency strategy can reduce file access times and access latencies, and increase download speeds, thus reducing overall computing times. In this paper, we propose dynamic services for replicating and maintaining data in grid environments, and directing replicas to appropriate locations for use. To address a problem with the Bandwidth Hierarchy-based Replication (BHR) algorithm, a strategy for maintaining replicas dynamically, we propose the Dynamic Maintenance Service (DMS). We also propose a One-way Replica Consistency Service (ORCS) for data grid environments, a positive approach to resolving consistency maintenance issues that we hope will strike a balance between improving data access performance and replica consistency. Experimental results show that our services are more efficient than other strategies.

2.

Adaptive data replication strategy in cloud computing for performance improvement

Najme MANSOURI 《Frontiers of Computer Science》2016,10(5):925-935

Cloud computing is becoming a very popular term in industry and is receiving a large amount of attention from the research community. Replica management is one of the most important issues in the cloud, as it can offer fast data access, high data availability and reliability. When all replicas are kept active, they may raise the rate of successful task execution if the replicas and requests are reasonably distributed. However, appropriate replica placement in large-scale, dynamically scalable and totally virtualized data centers is much more complicated. To provide cost-effective availability, minimize the response time of applications and balance load across cloud storage, a new replica placement strategy is proposed. The replica placement is based on five important parameters: mean service time, failure probability, load variance, latency and storage usage. However, replication should be used wisely because the storage size of each site is limited. Thus, the site must keep only the important replicas. We also present a new replica replacement strategy based on the availability of the file, the last time the replica was requested, the number of accesses, and the size of the replica. We evaluate our algorithm using the CloudSim simulator and find that it offers better performance in comparison with other algorithms in terms of mean response time, effective network usage, load balancing, replication frequency, and storage usage.

3.

A layout strategy for marine monitoring data replicas based on multi-attribute optimization

黄冬梅杜艳玲贺琪随宏运李瑶《计算机科学》2018,45(6):72-75, 104

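Data integrity and reliability are key to ensuring that data can be accessed efficiently. In cloud storage environments in particular, the data replica strategy is central to system performance and to guaranteeing data availability. From the perspective of data replica layout, this paper proposes a Data Replica Layout Strategy based on Multiple Attribute Optimization (MAO-DRLS). Based on the access heat of the data and the key attributes of the storage nodes, the strategy sets a dynamic number of replicas for each data item and selects suitable nodes on which to place them. Experiments show that MAO-DRLS effectively improves the utilization of data replicas and shortens system response time.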

4.

Analyzing, modeling and evaluating dynamic adaptive fault tolerance strategies in cloud computing environments

Dawei Sun Guiran Chang Changsheng Miao Xingwei Wang 《The Journal of supercomputing》2013,66(1):193-228

Failures are normal rather than exceptional in cloud computing environments. Fault tolerance plays a key role in ensuring cloud serviceability, and achieving high fault tolerance is one of the major obstacles to opening up a new era of high-serviceability cloud computing. Fault-tolerant service is an essential part of Service Level Objectives (SLOs) in clouds. To achieve a high level of cloud serviceability and to meet demanding cloud SLOs, a foolproof fault tolerance strategy is needed. In this paper, the definitions of fault, error, and failure in a cloud are given, and the principles for high fault tolerance objectives are systematically analyzed by referring to the fault tolerance theories suitable for large-scale distributed computing environments. Based on the principles and semantics of cloud fault tolerance, a dynamic adaptive fault tolerance strategy, DAFT, is put forward. It includes: (i) analyzing the mathematical relationship between different failure rates and two different fault tolerance strategies, namely the checkpointing fault tolerance strategy and the data replication fault tolerance strategy; (ii) building a dynamic adaptive checkpointing fault tolerance model and a dynamic adaptive replication fault tolerance model, and combining the two models to maximize serviceability and meet the SLOs; and (iii) evaluating the dynamic adaptive fault tolerance strategy under various conditions in large-scale cloud data centers, considering different system-centric parameters such as fault tolerance degree, fault tolerance overhead, and response time. Theoretical as well as experimental results demonstrate that DAFT has high potential, as it provides efficient fault tolerance enhancements, significant improvement in cloud serviceability, and good SLO satisfaction. It efficiently and effectively achieves a trade-off among fault tolerance objectives in cloud computing environments.

5.

Service replication taxonomy in distributed environments

Marwa F. Mohamed 《Service Oriented Computing and Applications》2016,10(3):317-336

Nowadays, most modern distributed environments, including service-oriented architecture (SOA), cloud computing, and mobile computing, support replication technologies in order to improve operational characteristics of the services provided. Unfortunately, replication requires additional computational resources and a longer design and deployment process to implement a service adequately for a specific situation and to enable service providers to maintain high levels of service with a moderate number of replicas. This paper provides a comprehensive review of replication challenges, types, techniques, and algorithms in distributed environments such as SOA, cloud, and mobile. Moreover, the role of replication in enhancing several QoS attributes, including performance, availability, security, scalability, and reliability, is examined. The author believes that the proposed research will help researchers to apply and develop service replication in distributed systems more easily.

6.

A cluster-based data replication strategy for cloud storage systems

付雄贡晓杰王汝传《计算机工程与科学》2014,36(12):2296-2304

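Cloud storage has become a foundational technology for shared storage and data services on today's Internet. Cloud storage systems commonly use data replication to improve data availability, strengthen fault tolerance, and improve system performance. This paper proposes a cluster-based data replication strategy for cloud storage systems. The strategy covers deciding when to trigger replication, determining the number of replicas, and placing the resulting replicas. For replica placement, a cluster-based, load-balancing replica placement method is designed. Simulation experiments show that the proposed cluster-based load-balancing placement method is feasible and performs well.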

7.

A dynamic data replication strategy using access-weights in data grids (total citations: 2; self-citations: 0; citations by others: 2)

Ruay-Shiung Chang Hui-Ping Chang 《The Journal of supercomputing》2008,45(3):277-295

Data grids deal with a huge amount of data regularly. It is a fundamental challenge to ensure efficient access to such widely distributed data sets. Creating replicas at suitable sites through a data replication strategy can increase system performance. It shortens the data access time and reduces bandwidth consumption. In this paper, a dynamic data replication mechanism called Latest Access Largest Weight (LALW) is proposed. LALW selects a popular file for replication and calculates a suitable number of copies and grid sites for replication. By associating a different weight with each historical data access record, the importance of each record is differentiated. A more recent data access record has a larger weight, indicating that the record is more pertinent to the current data access situation. A Grid simulator, OptorSim, is used to evaluate the performance of this dynamic replication strategy. The simulation results show that LALW successfully increases the effective network usage. This means that the LALW replication strategy can identify a popular file and replicate it to a suitable site without increasing the network burden too much.
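
The weighting idea described in this abstract can be illustrated with a short sketch. It is not the authors' LALW algorithm: the halving decay factor, the per-interval history format, and the function names `weighted_popularity` and `most_popular` are all assumptions made for illustration. The only property taken from the abstract is that more recent access records carry larger weights.

```python
from collections import defaultdict


def weighted_popularity(intervals, decay=0.5):
    """Score files from per-interval access counts, oldest interval first.

    The newest interval gets weight 1; each older interval is multiplied
    by `decay` once more (an assumed weighting, not taken from the paper).
    """
    scores = defaultdict(float)
    newest = len(intervals) - 1
    for idx, counts in enumerate(intervals):
        weight = decay ** (newest - idx)
        for file_id, accesses in counts.items():
            scores[file_id] += weight * accesses
    return dict(scores)


def most_popular(intervals, decay=0.5):
    """Return the file with the highest weighted score."""
    scores = weighted_popularity(intervals, decay)
    return max(scores, key=scores.get)


# Hypothetical history: file "a" was popular long ago, "b" is popular now.
history = [{"a": 10, "b": 1}, {"a": 2, "b": 4}, {"a": 1, "b": 6}]
print(weighted_popularity(history))
print(most_popular(history))
```

With raw counts, file "a" (13 accesses) would outrank "b" (11 accesses); with recency weighting, "b" is selected because its accesses are concentrated in the latest intervals.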


8.

Research on a dynamic replica placement mechanism in cloud storage

王岩汪晋宽《计算机工程与科学》2017,39(9):1581-1587

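Data replica management is an important part of cloud computing system management. Existing data placement and resource scheduling algorithms for massive data processing in cloud computing systems give insufficient consideration to replica dynamics and reliability; to address this, a dynamic replica placement mechanism is proposed. The mechanism is based on a region structure and considers the number and placement of replicas during data processing, as well as the overhead that replica creation imposes on system resources such as memory and bandwidth. First, using the replica information in cloud storage, it replicates data that is accessed frequently and has a long average access response time, and gives a method for calculating the number of replicas. Then, to narrow the range of candidate nodes over which replicas are distributed, it proposes a dynamic replica placement algorithm, DRA, which filters the nodes within a given range according to the proposed domain partitioning and selects those that will store replicas. Experimental results show that the proposed dynamic placement mechanism reduces the storage space wasted on rarely accessed replicas and also reduces the cross-node transfer latency of frequently accessed replicas, effectively improving the access efficiency of data files, the level of load balancing, and the reliability and availability of the cloud storage system.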

9.

Hierarchical data replication strategy to improve performance in cloud computing

Najme MANSOURI Mohammad Masoud JAVIDI Behnam Mohammad Hasani ZADE 《Frontiers of Computer Science》2021,15(2):152501

Cloud computing environments are attracting growing interest as a new trend in data management. Data replication has been widely applied to improve data access in distributed systems such as Grid and Cloud. However, due to the finite storage capacity of each site, copies that are useful for future jobs can be wastefully deleted and replaced with less valuable ones. Therefore, it is important to have an appropriate replication strategy that can dynamically store the replicas while satisfying quality of service (QoS) requirements and storage capacity constraints. In this paper, we present a dynamic replication algorithm, named hierarchical data replication strategy (HDRS). HDRS consists of replica creation, which adaptively increases replicas based on an exponential growth or decay rate; replica placement according to the access load and a labeling technique; and replica replacement based on the future value of the file. We evaluate different dynamic data replication methods using CloudSim simulation. Experiments demonstrate that HDRS can reduce response time and bandwidth usage compared with other algorithms. This means that HDRS can identify a popular file and replicate it to the best site. The method avoids useless replications and decreases access latency by balancing the load of sites.

10.

Research and implementation of erasure coding in cloud file systems

程振东栾钟治孟由李亮淑和荣杨婷婷钱德沛管刚陈伟《计算机科学与探索》2013,(4)

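With high performance, high scalability, high availability, and ease of management, cloud file systems have become the foundation and core of cloud storage and big data. Cloud file systems generally use full replication to improve fault tolerance, the efficiency of data resource use, and system performance. However, the storage overhead of full replication grows linearly with the number of replicas, and storing replicas incurs additional write bandwidth and data management overhead. Erasure codes ensure high data reliability and availability through well-designed redundant encoding without adding excessive storage space. This paper studies the application of erasure coding in cloud file systems and explores the design of erasure coding for cloud file systems from six aspects (erasure code type, encoding object, encoding timing, data modification, data access method, and data access performance) in order to strengthen the storage model of cloud file systems. On this basis, a prototype erasure coding system is designed and implemented, and experiments show that erasure coding effectively guarantees data availability in cloud file systems while saving storage space.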
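
The storage argument in this abstract can be made concrete with a small arithmetic sketch. It assumes a maximum distance separable code such as Reed-Solomon, with k data fragments and m parity fragments; the abstract does not say which code the authors used, and the function names below are hypothetical.

```python
def replication_profile(copies):
    """Storage overhead and tolerated losses for full replication.

    `copies` full copies cost `copies` times the raw data size and
    survive the loss of all but one copy.
    """
    return {"overhead": float(copies), "tolerated_losses": copies - 1}


def erasure_profile(k, m):
    """The same metrics for an assumed (k, m) MDS erasure code.

    Data is split into k fragments plus m parity fragments; any k of the
    k + m fragments are enough to rebuild the data.
    """
    return {"overhead": (k + m) / k, "tolerated_losses": m}


print(replication_profile(3))  # three full copies
print(erasure_profile(6, 3))   # 6 data + 3 parity fragments
```

Under these assumptions, three-way replication stores 3 times the raw data and survives two losses, while a (6, 3) code stores 1.5 times the raw data and survives three lost fragments, at the cost of encoding and reconstruction work.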

11.

Research on a data replication optimization model based on fuzzy prediction

王理想刘波林伟伟《微机发展》2013,(12):82-85,91

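Cloud data processing systems widely use multi-replica data replication to prevent data loss. If the number or placement of replicas is inappropriate, data availability may fall below what users expect, or storage space may be wasted (for example, when there are too many replicas). To address this problem, a data replication optimization model based on fuzzy prediction is proposed, consisting of a fuzzy prediction module and a replication optimization module. The fuzzy prediction module takes node information (CPU, node bandwidth, memory, and disk information) as input and predicts node availability. The replication optimization module takes node availability and the user's expected data availability as input and calculates the number and placement of replicas that satisfy the user's expectation. The proposed model dynamically optimizes data replication according to the availability of data nodes in the cloud data storage system and achieves a high storage cost-performance ratio. In simulation experiments, the storage space required by the fuzzy-prediction-based strategy was 42.62% and 42.84% of that required by the Hadoop strategy, respectively, while average file availability reached 88.69% and 90.54%, showing that the proposed model saves storage space while maintaining file availability.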
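
The replication optimization step described here (node availability and a target availability in, replica count and placement out) can be sketched as follows. This is an illustration under stated assumptions, not the authors' model: it assumes node failures are independent, so a file with replicas on nodes with availabilities a_i is available with probability 1 - prod(1 - a_i), and it uses a simple greedy choice of the most available nodes. The fuzzy prediction module is not modelled; availabilities are taken as given inputs. The function name `replicas_needed` is hypothetical.

```python
def replicas_needed(node_availabilities, target):
    """Pick nodes, most available first, until the target is met.

    node_availabilities: dict mapping node name to availability in [0, 1].
    target: required file availability in [0, 1].
    Returns the list of chosen nodes, or None if the target is unreachable.
    """
    chosen = []
    all_down = 1.0  # probability that every chosen node is unavailable
    ranked = sorted(node_availabilities.items(),
                    key=lambda item: item[1], reverse=True)
    for node, availability in ranked:
        chosen.append(node)
        all_down *= 1.0 - availability
        if 1.0 - all_down >= target:
            return chosen
    return None


# Hypothetical predicted availabilities for three storage nodes.
nodes = {"n1": 0.9, "n2": 0.8, "n3": 0.7}
print(replicas_needed(nodes, 0.97))
print(replicas_needed(nodes, 0.999))
```

In this example two replicas (on n1 and n2) already give about 0.98 availability, so a third copy would only waste space for a 0.97 target, while a 0.999 target cannot be met with these three nodes.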

12.

A novel dynamic network data replication scheme based on historical access record and proactive deletion

Zhe Wang Tao Li Naixue Xiong Yi Pan 《The Journal of supercomputing》2012,62(1):227-250

Data replication is becoming a popular technology in many fields such as cloud storage, Data Grids and P2P systems. By replicating files to other servers/nodes, we can reduce network traffic and file access time and increase data availability to cope with natural and man-made disasters. However, this does not mean that more replicas always yield better system performance. Replicas do decrease read access time and provide better fault tolerance, but if we consider write access, maintaining a large number of replicas results in a huge update overhead. Hence, a trade-off between read access time and write update cost is needed. File popularity is an important factor in making decisions about data replication. To avoid data access fluctuations, historical file popularity can be used for selecting genuinely popular files. In this research, a dynamic data replication strategy is proposed based on two ideas. The first employs historical access records, which are useful for picking a file to replicate. The second is a proactive deletion method, which is applied to control the number of replicas to reach an optimal balance between the read access time and the write update overhead. A unified cost model is used to measure and compare the performance of our data replication algorithm and other existing algorithms. The results indicate that our new algorithm performs much better than those algorithms.
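
The read/write trade-off in this abstract can be illustrated with a toy cost model. The formula below is an assumption chosen for illustration and is not the unified cost model of the paper: read cost is taken to fall in proportion to the number of replicas, and write cost to grow in proportion to it, because every replica must be updated. The function names and parameters are hypothetical.

```python
def total_cost(replicas, reads, writes, read_cost=1.0, update_cost=1.0):
    """Assumed cost: reads get cheaper with more replicas, writes dearer."""
    return reads * read_cost / replicas + writes * update_cost * replicas


def best_replica_count(reads, writes, max_replicas,
                       read_cost=1.0, update_cost=1.0):
    """Return the replica count in 1..max_replicas with the lowest cost."""
    return min(
        range(1, max_replicas + 1),
        key=lambda n: total_cost(n, reads, writes, read_cost, update_cost),
    )


# Read-heavy file: many replicas pay off.
print(best_replica_count(reads=1000, writes=10, max_replicas=20))
# Write-heavy file: extra replicas only add update overhead.
print(best_replica_count(reads=10, writes=1000, max_replicas=20))
```

Under this toy model a read-heavy file is best served by several replicas, while a write-heavy file should keep a single copy, which is the kind of balance a proactive deletion step would aim to restore.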

13.

Scalable replica selection based on node service capability for improving data access performance in edge computing environment

Li Chunlin Tang Jianhang Luo Youlong 《The Journal of supercomputing》2019,75(11):7209-7243

Replica strategies in traditional cloud computing often result in excessive resource consumption and long response times. In the edge cloud environment, if replica nodes cannot be managed efficiently, problems such as slow user access and low system fault tolerance arise. Therefore, this paper proposes a replica creation and selection strategy based on the edge cloud architecture, comprising a dynamic replica creation algorithm based on access heat (DRC-AH) and a replica selection algorithm based on node service capability (DRS-NSC). DRC-AH uses the data block as the replication granularity and a Grey Markov chain to dynamically adjust the number of replicas. After the replicas are created, when the client receives a user's request, DRS-NSC selects the best replica node to respond to the user. The experiments show that the proposed algorithms have significant advantages in prediction accuracy, request response time, resource utilization, etc., and improve the performance of the system to a certain extent.


14.

Usability of a cloud-based collaborative learning framework to improve learners’ experience

《Computers in human behavior》2015

Computer-Supported Collaborative Learning (CSCL) is concerned with how Information and Communication Technology (ICT) might facilitate learning in groups, which can be co-located or distributed over a network of computers such as the Internet. CSCL supports effective learning by means of communication of ideas and information among learners, collaborative access to essential documents, and feedback from instructors and peers on learning activities. As cloud technologies become increasingly popular and collaborative learning evolves, new directions for the development of collaborative learning tools deployed on the cloud are proposed. Development of such learning tools requires access to substantial data stored in the cloud. Ensuring efficient access to such data is hindered by the high latencies of the wide-area networks underlying the cloud infrastructures. To improve learners’ experience by accelerating data access, important files can be replicated so a group of learners can access data from nearby locations. Since a cloud environment is highly dynamic, resource availability, network latency, and learner requests may change. In this paper, we present the advantages of collaborative learning and focus on the importance of data replication in the design of such a dynamic cloud-based system that a collaborative learning portal uses. To this end, we introduce a highly distributed replication technique that determines optimal data locations to improve access performance by minimizing replication overhead (access and update). The problem is formulated using dynamic programming. Experimental results demonstrate the usefulness of the proposed collaborative learning system used by institutions in geographically distributed locations.

15.

Network and data location aware approach for simultaneous job scheduling and data replication in large-scale data grid environments

Najme MANSOURI 《Frontiers of Computer Science》2014,8(3):391-408

Data Grid integrates geographically distributed resources for solving data-intensive scientific applications. Effective scheduling in the Grid can reduce the amount of data transferred among nodes by submitting a job to a node where most of the requested data files are available. Scheduling is a traditional problem in parallel and distributed systems. However, due to the special issues and goals of the Grid, traditional approaches are no longer effective in this environment. Therefore, it is necessary to propose methods specialized for this kind of parallel and distributed system. Another solution is to use a data replication strategy to create multiple copies of files and store them in convenient locations to shorten file access times. To utilize the above two concepts, in this paper we develop a job scheduling policy, called hierarchical job scheduling strategy (HJSS), and a dynamic data replication strategy, called advanced dynamic hierarchical replication strategy (ADHRS), to improve data access efficiency in a hierarchical Data Grid. HJSS uses hierarchical scheduling to reduce the search time for an appropriate computing node. It considers network characteristics, the number of jobs waiting in the queue, file locations, and the disk read speed of the storage drive at data sources. Moreover, due to the limited storage capacity, a good replica replacement algorithm is needed. We present a novel replacement strategy which deletes files in two steps when free space is not enough for the new replica: first, it deletes those files with the minimum transfer time. Second, if space is still insufficient, it considers the last time the replica was requested, the number of accesses, the size of the replica and the file transfer time. The simulation results show that our proposed algorithm has better performance in comparison with other algorithms in terms of job execution time, number of intercommunications, number of replications, hit ratio, computing resource usage and storage usage.

16.

An improved vertical fragmentation, allocation and replication for enhancing e‐learning in distributed database environment

P. Sathishkumar M. Gunasekaran 《Computational Intelligence》2021,37(1):253-272

E‐learning is an indispensable technique for educating large numbers of people and students in a short period of time with optimized use of the required resources. It is employed as a crucial teaching approach by almost all kinds of educational institutions around the world. Since e‐learning involves a significant amount of resource utilization and cost, it requires a methodology that makes the current e‐learning system more efficient. The mere publication of educational content on websites is not enough. Without applying suitable strategic models and concepts and establishing appropriate communication channels between the contributors to an e‐learning system, the educational goals cannot be achieved as desired. Distributed databases contribute greatly to cloud-based e‐learning. Data replication is a crucial decision for companies, because database distribution can be achieved effectively through database replication, which generates identical copies of information called replicas. In this article, we analyze the advantages of synergetic learning and concentrate on the significance of data replication in cloud-based learning systems. We propose a mechanism for data replication that enhances performance, in terms of optimized access to and update of data, by determining the exact location of data through dynamic programming. The efficiency of the proposed mechanism is illustrated by experimental results.

17.

A threshold-based dynamic data replication strategy

Mohammad Bsoul Ahmad Al-Khasawneh Yousef Kilani Ibrahim Obeidat 《The Journal of supercomputing》2012,60(3):301-310

Data replication is the creation and maintenance of multiple copies of the same data. Replication is used in Data Grid to enhance data availability and fault tolerance. One of the main objectives of replication strategies is reducing response time and bandwidth consumption. In this paper, a dynamic replication strategy that is based on Fast Spread but superior to it in terms of total response time and total bandwidth consumption is proposed. This is achieved by storing only the important replicas on the storage of the node. The main idea of this strategy is using a threshold to determine if the requested replica needs to be copied to the node. The simulation results show that the proposed strategy achieved better performance compared with Fast Spread with Least Recently Used (LRU), and Fast Spread with Least Frequently Used (LFU).
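
The threshold idea in this abstract can be sketched as a small node model. The abstract does not say what quantity the threshold is compared against, so this sketch assumes it is the number of requests a node has seen for a file; the `Node` class and its `on_request` method are hypothetical names, and eviction under limited storage is left out.

```python
class Node:
    """A grid node that copies a file only once it has proved popular."""

    def __init__(self, threshold):
        self.threshold = threshold  # assumed: minimum request count
        self.request_counts = {}    # file id -> requests seen so far
        self.stored = set()         # file ids replicated on this node

    def on_request(self, file_id):
        """Handle a request; return True if it was served locally."""
        if file_id in self.stored:
            return True
        count = self.request_counts.get(file_id, 0) + 1
        self.request_counts[file_id] = count
        if count >= self.threshold:
            # The file has crossed the threshold: keep a local replica
            # so that later requests avoid a remote fetch.
            self.stored.add(file_id)
        return False


node = Node(threshold=3)
results = [node.on_request("f") for _ in range(4)]
print(results)
```

The first three requests are served remotely; the third one crosses the threshold and triggers replication, so the fourth is served locally. A file requested only once or twice never consumes local storage.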

18.

Replica Placement Strategies in Data Grid (total citations: 1; self-citations: 0; citations by others: 1)

Rashedur M. Rahman Ken Barker Reda Alhajj 《Journal of Grid Computing》2008,6(1):103-123

Replication is a technique used in Data Grid environments that helps to reduce access latency and network bandwidth utilization. Replication also increases data availability, thereby enhancing system reliability. The research addresses the problem of replication in Data Grid environments by investigating a set of highly decentralized dynamic replica placement algorithms. Replica placement algorithms are based on heuristics that consider both network latency and user requests to select the best candidate sites to place replicas. Due to the dynamic nature of the Grid, the candidate sites that currently hold replicas may not be the best sites from which to fetch replicas in subsequent periods. Therefore, a replica maintenance algorithm is proposed to relocate replicas to different sites if the performance metric degrades significantly. The study of our replica placement algorithms is carried out using a model of the EU Data Grid Testbed 1 [Bell et al. Comput. Appl., 17(4), 2003] sites and their associated network geometry. We validate our replica placement algorithms with total file transfer times, the number of local file accesses, and the number of remote file accesses.

19.

A model-based strategy for quantifying the impact of availability on the energy flow of data centers

Valentim Thiago Callou Gustavo 《The Journal of supercomputing》2021,77(3):2566-2589

The demand for higher computing power is increasing and, as a result, so is the demand for services hosted in cloud computing environments. It is known, for example, that in 2018 more than 4 billion people accessed these services daily through the Internet, corresponding to more than half of the world’s population. To support such services, these clouds are hosted in large data centers. These systems are responsible for growing electricity consumption, since the rising number of accesses increases the demand for communication capacity, processing and high availability. Since electricity is not always obtained from renewable resources, the relentless pursuit of cloud services can have a significant environmental impact. In this context, this paper proposes an integrated and dynamic strategy that demonstrates the impact of the availability of data center architecture equipment on energy consumption. For this, we used colored Petri net (CPN) modeling to quantify the cost, environmental impact and availability of the electricity infrastructure of the data centers under analysis. The proposed models are supported by the developed tool, so data center designers do not need to know CPN to compute the metrics of interest. A case study was conducted to show the applicability of the proposed strategy. Significant results were obtained, showing an increase in system availability of 100%, with equivalent operating cost and environmental impact.


20.

Combination of data replication and scheduling algorithm for improving data availability in Data Grids

Najme Mansouri Gholam Hosein Dastghaibyfard Ehsan Mansouri 《Journal of Network and Computer Applications》2013,36(2):711-722

Data Grid is a geographically distributed environment that deals with large-scale data-intensive applications. Effective scheduling in the Grid can reduce the amount of data transferred among nodes by submitting a job to a node where most of the requested data files are available. Data replication is another key optimization technique for reducing access latency and managing large data by storing data wisely. In this paper two algorithms are proposed. The first is a novel job scheduling algorithm called Combined Scheduling Strategy (CSS) that uses hierarchical scheduling to reduce the search time for an appropriate computing node. It considers the number of jobs waiting in the queue, the location of the data required by the job and the computing capacity of sites. The second is a dynamic data replication strategy, called the Modified Dynamic Hierarchical Replication Algorithm (MDHRA), that improves file access time. This strategy is an enhanced version of the Dynamic Hierarchical Replication (DHR) strategy. Data replication should be used wisely because the storage capacity of each Grid site is limited. Thus, it is important to design an effective strategy for replica replacement. MDHRA replaces replicas based on the last time the replica was requested, the number of accesses, and the size of the replica. It selects the best replica location from among the many replicas based on response time, which can be determined by considering the data transfer time, the storage access latency, the replica requests waiting in the storage queue and the distance between nodes. The simulation results demonstrate that the proposed replication and scheduling strategies give better performance compared with the other algorithms.
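
The replacement criteria named in this abstract (last request time, number of accesses, replica size) can be combined in many ways; the sketch below shows one possibility. The scoring formula, the `Replica` fields and the function names are assumptions for illustration and are not the MDHRA definition: replicas that are rarely used, long idle, and large receive a low retention score and are evicted first.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    last_request: float  # time of the most recent request
    accesses: int        # number of accesses so far
    size_mb: float


def retention_score(replica, now):
    """Assumed score: higher means more worth keeping."""
    idle = max(now - replica.last_request, 1e-9)  # avoid division by zero
    return replica.accesses / (idle * replica.size_mb)


def pick_victims(replicas, needed_mb, now):
    """Evict lowest-scoring replicas until `needed_mb` is freed."""
    victims, freed = [], 0.0
    for replica in sorted(replicas, key=lambda r: retention_score(r, now)):
        if freed >= needed_mb:
            break
        victims.append(replica.name)
        freed += replica.size_mb
    return victims


# Hypothetical contents of one storage site at time 100.
site = [
    Replica("A", last_request=90, accesses=50, size_mb=100),
    Replica("B", last_request=10, accesses=2, size_mb=500),
    Replica("C", last_request=80, accesses=5, size_mb=200),
]
print(pick_victims(site, needed_mb=400, now=100))
print(pick_victims(site, needed_mb=600, now=100))
```

Here replica B (large, idle, rarely used) is evicted first, followed by C when more space is needed, while the frequently and recently used replica A is kept.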