Similar Documents
20 similar documents found (search time: 31 ms).
1.
In recent years, grid technology has grown so quickly that it is now used in many scientific experiments and research centers. A large number of storage elements and computational resources are combined into a grid that provides shared access to extra computing power. In particular, a data grid serves data-intensive applications and provides intensive resources across widely distributed communities. Data replication is an efficient way to distribute replicas across a data grid, making the same data accessible at different locations; replication reduces data access time and improves system performance. In this paper, we propose a new dynamic data replication algorithm, named PDDRA, that optimizes traditional algorithms. It is based on the assumption that members of a VO (Virtual Organization) have similar interests in files. Based on this assumption and on file access history, PDDRA predicts the future needs of grid sites and pre-fetches a sequence of files to the requesting grid site, so that the next time the site needs one of these files it is locally available. This considerably reduces access latency, response time, and bandwidth consumption. PDDRA consists of three phases: storing file access patterns; requesting a file and performing replication and pre-fetching; and replacement. The algorithm was tested in OptorSim, a grid simulator developed by the European DataGrid project. Simulation results show that the proposed algorithm outperforms other algorithms in terms of job execution time, effective network usage, total number of replications, hit ratio, and percentage of storage filled.
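To make the pre-fetching idea concrete, here is a minimal sketch of one way an access-history predictor could work. It assumes a simple successor-frequency model; the class name `AccessPredictor` and the model itself are illustrative assumptions, not the paper's actual PDDRA algorithm.

```python
from collections import defaultdict, Counter

class AccessPredictor:
    """Toy successor-frequency sketch of the PDDRA idea: record which
    files tend to be requested after one another within a VO, then
    pre-fetch the most likely successors of the file just requested."""

    def __init__(self, depth=3):
        self.successors = defaultdict(Counter)  # file -> Counter of next files
        self.last_file = None
        self.depth = depth                      # how many files to pre-fetch

    def record_access(self, file_id):
        # Phase 1: store the access pattern (which file followed which).
        if self.last_file is not None:
            self.successors[self.last_file][file_id] += 1
        self.last_file = file_id

    def prefetch_candidates(self, file_id):
        # Phase 2: predict the site's next needs from historical patterns.
        ranked = self.successors[file_id].most_common(self.depth)
        return [f for f, _count in ranked]

predictor = AccessPredictor()
for f in ["a.dat", "b.dat", "c.dat", "a.dat", "b.dat"]:
    predictor.record_access(f)
print(predictor.prefetch_candidates("a.dat"))  # -> ['b.dat']
```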

2.
Replica Placement Strategies in Data Grid
Replication is a technique used in Data Grid environments to reduce access latency and network bandwidth utilization. It also increases data availability, thereby enhancing system reliability. This research addresses the replication problem in Data Grid environments by investigating a set of highly decentralized dynamic replica placement algorithms. The algorithms are based on heuristics that consider both network latency and user requests to select the best candidate sites for placing replicas. Because of the dynamic nature of the Grid, the sites that currently hold replicas may not be the best sites from which to fetch them in subsequent periods. A replica maintenance algorithm is therefore proposed to relocate replicas to different sites when the performance metric degrades significantly. The study of our replica placement algorithms is carried out using a model of the EU Data Grid Testbed 1 sites [Bell et al. Comput. Appl., 17(4), 2003] and their associated network geometry. We validate our replica placement algorithms with total file transfer times, the number of local file accesses, and the number of remote file accesses.

3.
Gaston is a peer-to-peer large-scale file system designed to provide a fault-tolerant and highly available file service for a virtually unlimited number of users. Data management in Gaston disseminates and stores replicas of files on multiple machines to achieve the requested level of data availability, and uses a dynamic tree-topology structure to connect the members of a replication schema. We present generic algorithms for replication-schema creation and maintenance according to users' file requirements and the autonomous constraints set on individual nodes. We also show the specific data-object structure, as well as mechanisms for secure and efficient update propagation among replicas with data consistency control. Finally, we introduce a scalable and efficient technique that improves the fault tolerance of the tree-topology structure connecting the replicas.

4.
A Caching Strategy for Distributed File Storage Systems in P2P Environments
Caching is widely used to improve performance in distributed file storage systems. Targeting the characteristics of distributed file storage in P2P environments, this paper proposes a flexible caching strategy that balances user access efficiency against replica consistency. Unlike existing P2P storage systems, it uses a threshold to divide files into hot and non-hot files and caches only the hot ones; setting different thresholds according to cache-space utilization and file type makes the strategy both flexible and effective. The strategy is analyzed theoretically, and its feasibility is then verified through trace-driven simulation.
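A minimal sketch of the hot-file idea follows, assuming access counts as the popularity signal and a utilization-driven threshold adjustment; the class `ThresholdCache` and the specific adjustment rule are our illustrative assumptions, not the paper's exact design.

```python
from collections import Counter

class ThresholdCache:
    """Sketch: only files whose access count exceeds a threshold are
    cached; the threshold rises when the cache is nearly full and
    falls when it is under-used."""

    def __init__(self, capacity, threshold=3):
        self.capacity = capacity
        self.threshold = threshold
        self.hits = Counter()
        self.cache = {}

    def access(self, file_id, load_file):
        self.hits[file_id] += 1
        if file_id in self.cache:
            return self.cache[file_id]
        data = load_file(file_id)
        # Cache only "hot" files: those accessed more often than the threshold.
        if self.hits[file_id] > self.threshold and len(self.cache) < self.capacity:
            self.cache[file_id] = data
        return data

    def adjust_threshold(self):
        # Tune the threshold to cache utilization, per the paper's idea.
        usage = len(self.cache) / self.capacity
        if usage > 0.9:
            self.threshold += 1
        elif usage < 0.5 and self.threshold > 1:
            self.threshold -= 1
```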

5.
IRM: Integrated File Replication and Consistency Maintenance in P2P Systems
In peer-to-peer file-sharing systems, file replication and consistency maintenance are widely used techniques for achieving high system performance. Despite significant interdependencies between them, these two issues are typically addressed separately. Most file replication methods rigidly specify replica nodes, leading to low replica utilization, unnecessary replicas, and hence extra consistency-maintenance overhead. Most consistency-maintenance methods propagate update messages by message spreading or over a structure that ignores replication dynamism, leading to inefficient updates and a high likelihood of serving outdated files. This paper presents an Integrated file Replication and consistency Maintenance mechanism (IRM) that integrates the two techniques in a systematic and harmonized manner, achieving high efficiency in both at significantly lower cost. Instead of passively accepting replicas and updates, each node determines file replication and update polling by dynamically adapting to time-varying file query and update rates, which avoids unnecessary replications and updates. Simulation results demonstrate the effectiveness of IRM in comparison with other approaches: it dramatically reduces overhead and yields significant improvements in the efficiency of both file replication and consistency maintenance.
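The following is a rough sketch of what rate-adaptive update polling could look like; the interval rule (inverse of the smaller of the two rates) and the bounds are our assumptions for illustration, not IRM's published formula.

```python
class AdaptivePoller:
    """Sketch of IRM-style adaptive update polling: a replica node polls
    for updates only as often as its own query rate and the file's
    observed update rate justify."""

    def __init__(self, min_interval=10.0, max_interval=3600.0):
        self.min_interval = min_interval    # seconds; assumed bounds
        self.max_interval = max_interval

    def poll_interval(self, query_rate, update_rate):
        # Rarely queried or rarely updated files are polled less often,
        # avoiding wasted update messages for cold or stable files.
        if query_rate <= 0 or update_rate <= 0:
            return self.max_interval
        interval = 1.0 / min(query_rate, update_rate)
        return max(self.min_interval, min(self.max_interval, interval))

poller = AdaptivePoller()
print(poller.poll_interval(query_rate=0.5, update_rate=0.05))  # -> 20.0 s
```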

6.
The introduction of software-defined networking (SDN) has created an opportunity for file access services to obtain a view of the underlying network and further optimize large data transfers. This opportunity remains unexplored even as the amount of data to be transferred grows, and transfers become more frequent as a result of interdisciplinary collaborations and the nature of research infrastructures. To address the need for larger and more frequent data transfers, we propose an approach that enables file access services to use SDN. We extend the file access services developed in our earlier work by including network resources in the provisioning of large data transfers. A novel SDN-aware file transfer mechanism is prototyped to improve the performance and reliability of large data transfers on research infrastructure equipped with programmable network switches. Our results show that I/O- and data-intensive scientific workflows benefit from SDN-aware file access services.

7.
This paper focuses on data-intensive workflows and addresses the problem of scheduling workflow ensembles under cost and deadline constraints in Infrastructure-as-a-Service (IaaS) clouds. Previous research in this area ignores file transfers between workflow tasks, which, as we show, often have a large impact on workflow ensemble execution. In this paper we propose and implement a simulation model for handling file transfers between tasks that dynamically calculates bandwidth and supports a configurable number of replicas, allowing us to simulate various levels of congestion. The resulting model can represent a wide range of cloud storage systems: from in-memory caches (such as memcached), to distributed file systems (such as NFS servers), to cloud storage (such as Amazon S3 or Google Cloud Storage). We observe that file transfers can have a significant impact on ensemble execution; for some applications up to 90% of the execution time is spent on file transfers. We then propose and evaluate a novel scheduling algorithm that minimizes the number of transfers by exploiting data caching and file locality, and find that it outperforms other scheduling algorithms for data-intensive applications. Additionally, we modify the original scheduling algorithms to operate effectively in environments where file transfers take non-zero time.

8.
Data replication is becoming a popular technique in many fields such as cloud storage, data grids, and P2P systems. By replicating files to other servers or nodes, we can reduce network traffic and file access time and increase data availability in the face of natural and man-made disasters. However, more replicas do not always mean better system performance: replicas do decrease read access time and provide better fault tolerance, but when write access is considered, maintaining a large number of replicas incurs a heavy update overhead. A trade-off between read access time and write update cost is therefore needed. File popularity is an important factor in replication decisions, and historical file popularity can be used to select genuinely popular files while avoiding fluctuations in data access. This research proposes a dynamic data replication strategy based on two ideas: first, historical access records are used to choose which file to replicate; second, a proactive deletion method controls the replica count so as to reach an optimal balance between read access time and write update overhead. A unified cost model is used to measure and compare the performance of our data replication algorithm against existing algorithms. The results indicate that the new algorithm performs much better than those algorithms.
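The read/write trade-off can be illustrated with a small worked cost model. This is only a sketch under assumed cost shapes (read cost falling as 1/n, update cost growing linearly with n); the paper's unified cost model may differ.

```python
def total_cost(n_replicas, read_rate, write_rate, read_cost, update_cost):
    """Toy unified-cost sketch: reads get cheaper with more replicas,
    writes get costlier because every replica must be updated."""
    read_term = read_rate * read_cost / n_replicas    # assumed 1/n read benefit
    write_term = write_rate * update_cost * n_replicas
    return read_term + write_term

# Pick the replica count that balances the two terms.
best_n = min(range(1, 33),
             key=lambda n: total_cost(n, read_rate=100.0, write_rate=5.0,
                                      read_cost=8.0, update_cost=1.0))
print(best_n)  # -> 13, near sqrt(read_rate*read_cost / (write_rate*update_cost))
```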

9.
Providing a real-time cloud service requires retrieving large amounts of data simultaneously, so improving file access performance becomes a great challenge. This paper first establishes the preconditions for tackling this problem, considering the requirements of applications, hardware, software, and network environments in the cloud. It then proposes a novel distributed layered cache system named HDCache, built on top of the Hadoop Distributed File System (HDFS). Applications integrate the HDCache client library to access the multiple cache services. The cache services are built from three access layers: an in-memory cache, a snapshot on the local disk, and the network disk provided by HDFS. Files loaded from HDFS are cached in shared memory that the client library can access directly. To improve robustness and spread the workload, the cache services are organized in a peer-to-peer style using a distributed hash table, and every cached file has three replicas scattered across different cache service nodes. Experimental results show that HDCache can store files covering a wide range of sizes and achieves millisecond-level access performance under highly concurrent workloads; the hit ratio measured on a real-world cloud service exceeds 95%.
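A minimal sketch of the three-layer lookup described above follows. The store interfaces (plain dicts and an assumed `hdfs_client.read`) are illustrative stand-ins, not HDCache's real API.

```python
def layered_get(key, memory_cache, local_snapshot, hdfs_client):
    """Sketch of a three-layer read path: shared memory first, then the
    local-disk snapshot, then HDFS as the network disk of last resort."""
    if key in memory_cache:                  # layer 1: in-memory cache
        return memory_cache[key]
    data = local_snapshot.get(key)           # layer 2: local disk snapshot
    if data is None:
        data = hdfs_client.read(key)         # layer 3: HDFS network disk
        local_snapshot[key] = data
    memory_cache[key] = data                 # promote into the fastest layer
    return data
```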

10.
In this paper, we propose a role-based access control (RBAC) method for grid database services in Open Grid Services Architecture - Data Access and Integration (OGSA-DAI). OGSA-DAI is an efficient grid-enabled middleware implementation of interfaces and services for accessing and controlling data sources and sinks. In OGSA-DAI, however, access control imposes substantial administration overhead on resource providers in virtual organizations (VOs), because each provider must manage a role-map file containing authorization information for individual grid users. To solve this problem, we used the Community Authorization Service (CAS) provided by the Globus Toolkit to support RBAC within the OGSA-DAI framework. CAS grants users membership in VO roles; the resource providers then need to maintain only the mapping from VO roles to local database roles in their role-map files, so the number of entries in each role-map file is reduced dramatically. Furthermore, resource providers still control the granting of access privileges to the local roles. Our access control method thus improves manageability for large numbers of users and reduces the providers' day-to-day administration tasks, while they retain ultimate authority over their resources. Performance analysis shows that our method adds very little overhead to the existing security infrastructure of OGSA-DAI.
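The role-map reduction can be illustrated as follows. The map contents and the `cas_assertion` dict are hypothetical examples, not the real OGSA-DAI file format or CAS message structure.

```python
# Before (per-user role-map): one entry per grid user -> O(users) entries.
user_role_map = {
    "/DC=org/DC=grid/CN=alice": "db_reader",
    "/DC=org/DC=grid/CN=bob":   "db_reader",
    # ... thousands more, one per individual user
}

# After (CAS-based RBAC): CAS asserts the user's VO role, and the provider
# maps only VO roles to local database roles -> O(roles) entries.
vo_role_map = {
    "vo-analyst": "db_reader",
    "vo-curator": "db_writer",
}

def authorize(cas_assertion):
    """Sketch: resolve the local DB role from the VO role asserted by CAS."""
    return vo_role_map.get(cas_assertion["vo_role"])

print(authorize({"user": "alice", "vo_role": "vo-analyst"}))  # -> db_reader
```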

11.
Co-allocation architectures enable parallel transfer of a file from multiple replicas stored on different servers. Several co-allocation strategies have been developed to exploit the differing transfer rates of the various client-server links and to cope with dynamic rate fluctuations by dividing files into multiple blocks of equal size. This paper presents a dynamic file transfer scheme, called the dynamic adjustment strategy (DAS), for co-allocation architectures that transfer a file concurrently from multiple replicas stored on multiple servers within a data grid. The scheme removes the performance obstacle posed by the idle waiting time of faster servers in co-allocation-based transfers and therefore reduces file transfer time. A tool with a user-friendly interface for managing replicas and downloads in a data grid environment is also described. Experimental results show that DAS achieves high file transfer speeds and reduces the cost of reassembling data blocks.
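One simple way to avoid idle waiting by fast servers is a greedy earliest-finish assignment of blocks, sketched below; this is our illustrative reading of the idea, not necessarily the exact DAS algorithm.

```python
import heapq

def co_allocate(n_blocks, server_rates):
    """Greedy sketch: instead of a fixed split, hand each next block to
    the server expected to finish it first, so fast servers never sit
    idle waiting for slow ones. Rates are in blocks per second."""
    free_at = [(0.0, s) for s in server_rates]   # (time server becomes free, server)
    heapq.heapify(free_at)
    assignment = []
    for block in range(n_blocks):
        t, server = heapq.heappop(free_at)
        done = t + 1.0 / server_rates[server]    # time to transfer one block
        assignment.append((block, server))
        heapq.heappush(free_at, (done, server))
    return assignment

# The fast server ends up carrying most of the blocks:
print(co_allocate(6, {"fast": 4.0, "slow": 1.0}))
```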

12.
Replication is a key technique in grids for improving the efficiency of data access and processing. Addressing the limitations of current replica management and the pressing problem of consistency maintenance, this paper builds on the replica technology provided by Globus and, from the perspective of replica creation and update, applies log-management ideas to propose a solution that combines replica creation with consistency maintenance.

13.
File replication is a widely used technique for high performance in peer-to-peer content delivery networks. A file replication technique should be efficient and at the same time facilitate efficient file consistency maintenance. However, most traditional methods ignore nodes' available capacity and physical location when replicating files, leading to high overhead for both file replication and consistency maintenance. This paper presents Plover, a proactive low-overhead file replication scheme. By placing file replicas among physically close nodes according to their available capacities, Plover not only achieves high efficiency in file replication but also supports low-cost and timely consistency maintenance. It also includes an efficient file-query redirection algorithm for load balancing between replica nodes. Theoretical analysis and simulation results demonstrate the effectiveness of Plover in comparison with other file replication schemes: it dramatically reduces the overhead of both file replication and consistency maintenance, and it yields a significant reduction in the number of overloaded nodes.

14.
Data grids support access to widely distributed storage for large numbers of users accessing potentially many large files. Efficient access is hindered by the high latency of the Internet; to improve access time, replicas may be placed at nearby sites. Replication also provides high availability, decreased bandwidth use, enhanced fault tolerance, and improved scalability. Resource availability, network latency, and user requests in a grid environment may vary over time, so any replica placement strategy must adapt to such dynamic behavior. In this paper, we describe a new dynamic replica placement algorithm for hierarchical data grids, Popularity Based Replica Placement (PBRP), which is guided by file "popularity". Our goal is to place replicas close to clients to reduce data access time while still using network and storage resources efficiently. The effectiveness of PBRP depends on the choice of a threshold value for file popularity; we therefore also present Adaptive-PBRP (APBRP), which determines this threshold dynamically from data request arrival rates. We evaluate both algorithms using simulation. Results for a range of data access patterns show that our algorithms can shorten job execution time significantly and reduce bandwidth consumption compared to other dynamic replication methods.
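The threshold mechanics can be sketched as follows. The linear scaling of the threshold with the request arrival rate is our assumption for illustration; APBRP derives its own rule.

```python
def popular_files(access_counts, threshold):
    """PBRP-style selection sketch: files whose access count clears the
    popularity threshold become candidates for replication near clients."""
    return [f for f, c in access_counts.items() if c >= threshold]

def adaptive_threshold(request_arrival_rate, base_threshold=10, base_rate=100.0):
    """APBRP-style sketch: scale the threshold with the arrival rate so
    that request bursts do not mark every file as popular."""
    return max(1, round(base_threshold * request_arrival_rate / base_rate))

counts = {"a.dat": 40, "b.dat": 7, "c.dat": 15}
print(popular_files(counts, adaptive_threshold(request_arrival_rate=150.0)))
# threshold = 15 -> ['a.dat', 'c.dat']
```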

15.
Cloud computing environments are attracting growing interest as a new trend in data management, and data replication is widely applied to improve data access in distributed systems such as grids and clouds. However, because each site has finite storage capacity, copies that would be useful for future jobs can be wastefully deleted and replaced with less valuable ones. An appropriate replication strategy must therefore store replicas dynamically while satisfying quality-of-service (QoS) requirements and storage-capacity constraints. In this paper, we present a dynamic replication algorithm named hierarchical data replication strategy (HDRS). HDRS consists of replica creation, which adaptively increases replicas based on an exponential growth or decay rate; replica placement, based on access load and a labeling technique; and replica replacement, based on the future value of each file. We evaluate different dynamic data replication methods using CloudSim simulation. Experiments demonstrate that HDRS reduces response time and bandwidth usage compared with other algorithms: it identifies popular files and replicates them to the best sites, avoiding useless replication and decreasing access latency by balancing the load across sites.

16.
To manage file replication effectively, this paper introduces the supermarket model into the resource grid and proposes a supermarket-model-based file replication management strategy. Combining the supermarket model with cascaded replication to create file copies gives the replication process both push and pull forces, improving overall system efficiency.

17.
Diana Berbecaru, Antonio Lioy. Software, 2015, 45(11): 1457-1477
Since December 2009, European Union Trusted Service Status Lists (TSLs) have been specified and adopted across EU countries to enable the verification of digital signatures with legal value. This paper deals with the exploitation of TSLs in real digital services other than electronic signatures, namely for certificate validation. In particular, we used such lists in the service provided by the pan-European STORK (Secure idenTity acrOss boRders linKed) identity management infrastructure to validate X.509 public-key certificates. In addition, we propose an XML data structure, the Trust Service Association (TrSA) file, to be used in conjunction with a TSL to hold trust relationships between the different services in a TSL. The TrSA file together with the TSLs may be used directly by service providers or users to validate certificates. For generating TSLs, we also propose a tool for their automatic generation, named TSLGenerator.

18.
To simplify the file system implementation and support streaming access to very large datasets, HDFS sacrifices random file access, yet many real-world applications require it. Based on an in-depth analysis of HDFS read/write internals, this paper proposes a random data access method for HDFS. The design idea is to add a local data access interface to the Datanode: user programs can read the block files stored on a Datanode and write data directly into the Datanode's block storage directory. The first replica of a file is produced directly by the user program, and the remaining replicas are generated by data copying after the first replica is written. In addition, permission management is added for data blocks: the file replicas on a Datanode are owned by the user, and if the permissions of a file in the namespace change, the permissions of its corresponding data blocks change accordingly. Tests show that read performance improves by about 10% and write performance by more than 20%; under high concurrency, write performance improves by up to 2.5x.

19.
Using a central file server is good for interactive access to files because of the coherency implied by a centralized design, and within local area networks this is in fact a common case. However, distributed environments in use today may exhibit round-trip times on the order of 50 or 100 ms, which is a problem for interactive access to a central file server because of the resulting access times. Although aggressive caching and loosely synchronized replicas may be used for distributed file access, there are cases where the stronger coherency provided by a central server is still desirable. In this paper, we present ZX, a distributed file system and protocol designed with latency in mind. It can use caching, but it does not require caching or batching to address latency issues. ZX relies on a novel channel-based file system interface; it includes find requests and leverages streaming requests to work well under high-latency conditions. Unlike other protocols designed for distributed access to a central server, ZX tolerates round-trip times on the order of 50 or 100 ms when accessing a central file server for interactive usage such as compiling shared sources, running binaries, editing documents, and similar workloads. It can be used on UNIX through a FUSE adaptor, while native ZX speakers run faster.

20.
An optimal replication strategy for data grid systems
Data access latency is an important metric of system performance in a data grid. With an efficient replication strategy, the amount of data transferred over the wide-area network decreases, and with it the average data access latency. The motivation of our research is to solve the optimized replica-distribution problem in a data grid: the system should place multiple replicas of each data item, under storage constraints, so as to minimize the average access latency. This paper proposes a model of replication strategy in a federated data grid and gives the optimized solution. Analysis and simulation results show that the proposed optimized replication strategy is superior to the LRU caching strategy and to the uniform, proportional, and square-root replication strategies in terms of both wide-area network bandwidth requirements and average data access latency.
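For reference, the baseline strategies compared above can be sketched as replica-budget allocation rules. This only reproduces the proportional and square-root baselines under assumed popularity values; the paper's optimal strategy is derived analytically and is not shown here.

```python
import math

def allocate(popularity, total_replicas, rule):
    """Give each file a share of the replica budget proportional to its
    popularity p (proportional rule) or to sqrt(p) (square-root rule)."""
    weight = {f: (p if rule == "proportional" else math.sqrt(p))
              for f, p in popularity.items()}
    total = sum(weight.values())
    return {f: max(1, round(total_replicas * w / total))
            for f, w in weight.items()}

pop = {"hot.dat": 0.64, "warm.dat": 0.32, "cold.dat": 0.04}
print(allocate(pop, 20, "proportional"))  # hot.dat gets ~13 of 20 replicas
print(allocate(pop, 20, "sqrt"))          # square-root flattens the skew
```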
