Similar Literature
20 similar documents found.
1.
Replica management is an important part of cloud computing system management. For the massive data processing carried out in cloud computing systems, existing data placement and resource scheduling algorithms give insufficient consideration to replica dynamics and reliability, so a dynamic replica placement mechanism is proposed. Based on a region structure, the mechanism considers the number and placement of replicas during data processing, as well as the memory, bandwidth, and other system resources consumed by replica creation. First, using the replica information in cloud storage, data that are accessed frequently and have a long average access response time are replicated, and a method for computing the number of replicas is given. Then, to narrow the range of candidate nodes for replica distribution, a dynamic replica placement algorithm (DRA) is proposed: nodes within a certain range are partitioned into the proposed domains and filtered to select where data replicas are stored. Experimental results show that the proposed dynamic placement mechanism not only reduces the storage space wasted on replicas with low access rates, but also reduces the cross-node transmission delay incurred by replicas with high access rates, effectively improving data access efficiency, load balance, and the reliability and availability of the cloud storage system.
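The abstract mentions a method for computing the number of replicas from access frequency and average response time but does not give the formula. The following is a minimal Python sketch of one plausible heuristic; the threshold and scaling constants are assumptions, not values from the paper.

```python
# Hypothetical replica-count heuristic: files that are both popular and slow
# to access get more replicas, capped at a maximum.
def replica_count(access_freq, avg_response_ms,
                  freq_threshold=100, latency_threshold_ms=50,
                  base_replicas=1, max_replicas=5):
    """access_freq: accesses per observation window; avg_response_ms: mean access latency."""
    if access_freq <= freq_threshold or avg_response_ms <= latency_threshold_ms:
        return base_replicas                      # cold or already fast: keep the default
    # Scale extra replicas by how far the file exceeds both thresholds (assumed form).
    pressure = (access_freq / freq_threshold) * (avg_response_ms / latency_threshold_ms)
    extra = int(round(pressure ** 0.5))           # dampen growth with a square root
    return min(base_replicas + extra, max_replicas)

print(replica_count(access_freq=800, avg_response_ms=120))  # -> 5
```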

2.
This work analyzes how factors such as popularity distribution, session length, and degree of load balance affect system performance, and proposes a multi-objective replication and storage method for streaming-media cluster systems. The replica generation problem is cast as a seat-apportionment problem and an optimal replica generation algorithm is given; exploiting the fact that file popularity follows a Zipf-like distribution, disk space is heuristically allocated to the replicas of each media file so as to balance replica load. On this basis, an evolution-based, globally optimal replica placement algorithm is designed. Simulation results show that the method achieves a low rejection rate and a high degree of load balance when storage space is limited.
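Formulating replica generation as a seat-apportionment problem can be illustrated with a largest-remainder allocation over Zipf-like popularities. The sketch below shows that generic idea only, not the paper's optimal algorithm; the skew parameter and replica budget are assumptions.

```python
# Allocate a fixed budget of replica "seats" to files whose popularity follows
# a Zipf-like distribution, using largest-remainder apportionment.
def zipf_popularity(num_files, skew=0.8):
    weights = [1.0 / (rank ** skew) for rank in range(1, num_files + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def allocate_replicas(popularity, total_replicas):
    quotas = [p * total_replicas for p in popularity]
    seats = [int(q) for q in quotas]                      # integer part first
    remainders = [q - s for q, s in zip(quotas, seats)]
    leftover = total_replicas - sum(seats)
    # Give the remaining seats to the files with the largest fractional remainders.
    for idx in sorted(range(len(quotas)), key=lambda i: remainders[i], reverse=True)[:leftover]:
        seats[idx] += 1
    return seats

pop = zipf_popularity(num_files=5)
print(allocate_replicas(pop, total_replicas=12))  # most popular files get the most replicas
```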

3.
The Data Grid provides massive aggregated computing resources and distributed storage space to deal with data-intensive applications. Due to the limitation of available resources in the grid as well as the production of large volumes of data, efficient use of Grid resources becomes an important challenge. Data replication is a key optimization technique for reducing access latency and managing large data by storing data in a wise manner. Effective scheduling in the Grid can reduce the amount of data transferred among nodes by submitting a job to a node where most of the requested data files are available. In this paper two strategies are proposed. The first is a novel job scheduling strategy called Weighted Scheduling Strategy (WSS), which uses hierarchical scheduling to reduce the search time for an appropriate computing node; it considers the number of jobs waiting in a queue, the location of the data required by the job, and the computing capacity of the sites. The second is a dynamic data replication strategy called Enhanced Dynamic Hierarchical Replication (EDHR), which improves file access time. This strategy is an enhanced version of the Dynamic Hierarchical Replication strategy. It uses an economic model for file deletion when there is not enough space for the replica; the economic model is based on the future value of a data file. Best replica placement plays an important role in obtaining maximum benefit from replication as well as in reducing storage cost and mean job execution time, and is therefore also considered in this paper. The proposed strategies are implemented in OptorSim, the European Data Grid simulator. Experimental results show that the proposed strategies achieve better performance by minimizing data access time and avoiding unnecessary replication.
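As a rough illustration of how WSS's three factors (queue length, data locality, computing capacity) might be combined into a single site score, here is a hedged sketch; the weights and normalization are assumptions, not the paper's formula.

```python
# Score candidate sites for a job: prefer sites with short queues, most of the
# requested files already local, and high computing capacity (assumed weights).
def site_score(queued_jobs, local_files, requested_files, capacity_mips,
               w_queue=0.4, w_data=0.4, w_cpu=0.2):
    locality = len(set(requested_files) & set(local_files)) / max(len(requested_files), 1)
    queue_penalty = 1.0 / (1 + queued_jobs)          # fewer waiting jobs -> closer to 1
    capacity = capacity_mips / 10_000                # crude normalization (assumption)
    return w_queue * queue_penalty + w_data * locality + w_cpu * capacity

sites = {
    "site_a": site_score(2, ["f1", "f2"], ["f1", "f2", "f3"], capacity_mips=8000),
    "site_b": site_score(0, ["f3"], ["f1", "f2", "f3"], capacity_mips=5000),
}
print(max(sites, key=sites.get))  # the site the scheduler would pick
```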

4.
Contemporary applications continuously modify large volumes of multidimensional data that must be accessed efficiently and, more importantly, must be updated in a timely manner. Single-server storage approaches are insufficient when managing such volumes of data, while the high frequency of data modification renders classical indexing methods inefficient. To address these two problems we introduce a distributed storage manager for multidimensional data based on a Cluster-of-Workstations. The manager addresses the above challenges through a set of mechanisms that, through selective on-line data reorganization, collectively maintain a balanced load across a cluster of workstations. With the help of both a highly efficient, fast self-tuning mechanism based on a new data structure called the stat-index, and a query aggregation and clustering algorithm, our storage manager attains short query response times even in the presence of massive modifications and highly skewed access patterns. Furthermore, we provide a data migration cost model used to determine the best data redistribution strategy. Through extensive experimentation with our prototype, we establish that our storage manager can sustain significant update rates with minimal overhead.

5.
With the exponential growth of network data storage, the uniform distribution and efficient retrieval of data in distributed storage systems such as the Redis cluster has received increasing attention in recent years. In view of the scalability, usability, and other problems of existing solutions, we propose a distributed dynamic cuckoo filter system based on the Redis cluster. On one hand, we introduce an efficient hash indexing structure, the dynamic cuckoo filter (DCF), which stores only the fingerprint information of data and whose capacity scales automatically to meet dynamically changing storage demands. On the other hand, we use an improved consistent hashing algorithm to construct the Redis cluster and use the cluster's communication mechanism to achieve data sharing and efficient utilisation of the filters across machines. The proposed scheme balances time and space efficiency, greatly improves retrieval performance over massive data, and improves the reliability and availability of the Redis cluster.
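The paper builds the Redis cluster with an improved consistent hashing algorithm; the improvement itself is not described in the abstract, so the sketch below only shows the standard consistent-hashing-with-virtual-nodes idea it presumably starts from. Node names are illustrative.

```python
import bisect
import hashlib

# Standard consistent hashing ring with virtual nodes: a key maps to the first
# node clockwise from its hash, so adding or removing a node moves few keys.
class HashRing:
    def __init__(self, nodes, vnodes=100):
        self.ring = []                      # sorted list of (hash, node)
        for node in nodes:
            for i in range(vnodes):
                self.ring.append((self._hash(f"{node}#{i}"), node))
        self.ring.sort()

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def locate(self, key):
        idx = bisect.bisect(self.ring, (self._hash(key), "")) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["redis-node-1", "redis-node-2", "redis-node-3"])
print(ring.locate("user:42"))   # the node responsible for this key
```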

6.
Cloud storage has become a foundational technology for shared storage and data services on today's Internet, and cloud storage systems commonly use data replication to improve data availability, strengthen fault tolerance, and improve system performance. A cluster-based data replication strategy for cloud storage systems is proposed, covering when to trigger replication, how many replicas to create, and where to place the resulting replicas. For replica placement, a cluster-based load-balancing placement method is designed. Simulation experiments show that the proposed cluster-based load-balancing replica placement method is feasible and performs well.

7.
With the development of computer science and the arrival of the big-data era, application systems now face massive data volumes and high user access rates, putting increasing load on the relational databases (RDBMS) underlying enterprise applications and leaving their performance requirements unmet. The HBase database on the Hadoop platform can effectively address these problems of relational databases. Taking MySQL (a relational database) and HBase (a distributed database on the Hadoop platform) as the research basis, and in response to the massive growth of enterprise application data, a method for migrating data from the relational database (MySQL) to the distributed database (HBase) is proposed. By studying HBase's storage principles, table schema conversion rules from MySQL to HBase are proposed so that the migrated data can be queried efficiently. Finally, the method is compared with the similar migration tool Sqoop, demonstrating its convenience for data migration and the efficiency of join queries on the migrated database.
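The paper's concrete schema-conversion rules are not reproduced in the abstract. The function below is only a hypothetical illustration of a common pattern: folding a relational primary key (plus a salt) into an HBase row key and the remaining columns into one column family. The table and column names are invented for the example.

```python
# Hypothetical mapping of one MySQL row to an HBase-style (row_key, cells) pair.
# Row key = salt + primary key, so point lookups stay cheap while writes spread
# across regions; non-key columns go into a single column family.
def mysql_row_to_hbase(row, pk_column, column_family="cf", salt_buckets=16):
    pk = str(row[pk_column])
    salt = sum(pk.encode()) % salt_buckets            # simple hash salt (assumption)
    row_key = f"{salt:02d}_{pk}"
    cells = {f"{column_family}:{col}": str(val)
             for col, val in row.items() if col != pk_column}
    return row_key, cells

order = {"order_id": 10086, "user_id": 42, "amount": "199.00", "status": "PAID"}
print(mysql_row_to_hbase(order, pk_column="order_id"))
```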

8.
Large-scale distributed data storage is a key supporting technology of the cloud computing and big-data era. In distributed storage systems, where to place data replicas is a fundamental problem, yet existing practical algorithms either ignore application-specific access characteristics and sacrifice efficiency, or are tailored to a single application and do not generalize. By building a unified description model of replica storage strategies and extracting the key access-characteristic parameters of applications, the output and input of an automatic replica-storage-strategy generation algorithm are defined; machine learning is then used to learn the general relationship between access-characteristic parameters and the parameters of the optimal replica storage strategy, forming the core algorithm of the automatic generation mechanism. This improves storage access performance and reduces costs such as energy consumption, while effectively reducing the amount of manual intervention needed to design replica storage strategies.
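As a hedged sketch of the "learn the mapping from access features to replica-strategy parameters" idea, the snippet below fits an off-the-shelf regressor on synthetic data. The feature set, target parameters, and model choice are all assumptions for illustration, not the paper's method.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic training data: each row is an application's access profile
# (read_ratio, mean_file_size_mb, access_skew, qps); the "label" is the
# replica-strategy parameters chosen offline (replica_count, hot_threshold).
rng = np.random.default_rng(0)
X = rng.uniform([0.0, 1.0, 0.5, 10], [1.0, 1024.0, 2.0, 10000], size=(200, 4))
y = np.column_stack([
    np.clip(np.round(1 + 4 * X[:, 0] * X[:, 2] / 2.0), 1, 5),   # more replicas for read-heavy, skewed apps
    X[:, 3] * 0.01,                                             # hot threshold grows with qps
])

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
new_app = [[0.9, 64.0, 1.5, 5000]]       # read-heavy, skewed, medium traffic
print(model.predict(new_app))            # predicted (replica_count, hot_threshold)
```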

9.
With the spread of big-data applications, efficient and scalable data-stream operations play an increasingly important role in real-time analytics. Distributed parallel processing architectures are an effective solution for high-volume, low-latency stream processing tasks. However, in key-based parallel processing, skewed data distributions and the real-time, dynamic, and unpredictable nature of data streams cause persistent, dynamic load imbalance across the parallel processing nodes, which reduces timeliness and wastes hardware resources. Existing work balances load in two ways: (1) migration at key granularity to balance the parallel processing nodes, and (2) splitting at tuple granularity with random dispatching. The former adjusts the system to within a given imbalance tolerance and resembles the NP-hard one-dimensional bin-packing problem; the latter splits keys and therefore incurs extra cost, such as memory and network communication, to preserve the correctness of key-based operations. This paper combines the two approaches, splitting keys on demand and merging them whenever possible: through a lightweight balancing algorithm and a splitting method that preserves the semantics of key-based operations, the system achieves the balance of the latter approach while reducing the extra cost of fine-grained balancing.
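One standard way to split only hot keys while keeping cold keys on a single worker (so key-based state stays cheap to maintain) is to give hot keys a second candidate worker and route each tuple to the less-loaded one. The sketch below illustrates that general idea under assumed thresholds; it is not this paper's exact algorithm.

```python
import hashlib

# Route tuples by key: cold keys stay on one worker (state kept in one place);
# keys detected as hot are split across two candidate workers, picking the
# less-loaded one per tuple. Thresholds are assumptions.
NUM_WORKERS = 4
HOT_THRESHOLD = 1000          # tuples per window before a key is "split"

key_counts = {}               # observed per-key traffic in the current window
worker_load = [0] * NUM_WORKERS

def route(key):
    key_counts[key] = key_counts.get(key, 0) + 1
    primary = int(hashlib.md5(key.encode()).hexdigest(), 16) % NUM_WORKERS
    if key_counts[key] <= HOT_THRESHOLD:
        target = primary                              # cold key: single owner
    else:
        secondary = (primary + 1) % NUM_WORKERS       # second candidate (a second hash in practice)
        target = min((primary, secondary), key=lambda w: worker_load[w])
    worker_load[target] += 1
    return target

for _ in range(3000):
    route("hot_key")                                  # one skewed key dominating the stream
print(worker_load)                                    # the hot key's load is shared by two workers
```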

10.
The replica management strategy is crucial to the availability, reliability, and overall performance of a distributed storage system. To address the shortcomings of file-level dynamic replica adjustment strategies, this paper proposes a dynamic replica adjustment strategy based on hot data blocks. Following the principle of temporal locality and observed data-access patterns, different weights are assigned to historical access periods and the current period to predict each data block's access frequency in the next period; blocks are then judged hot or not based on the predicted access frequency. Combining the fact that data access in HDFS roughly follows the 80/20 rule with the hot-block judgments, the adjustment thresholds for data blocks are determined. Finally, the hot-block-based dynamic replica adjustment strategy is designed in detail in three steps. Experimental results show that the proposed strategy clearly improves both data access efficiency and cluster storage utilization.
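The weighting of historical periods versus the current period is only described qualitatively in the abstract. A common concrete choice is an exponentially decayed weighted average, shown below as a hedged sketch; the decay factor and hotness threshold are assumptions.

```python
# Predict a block's access frequency for the next period from its access history,
# weighting recent periods more heavily (exponential decay is an assumed choice).
def predict_next_frequency(history, current, decay=0.5):
    """history: oldest-to-newest per-period access counts; current: count in the ongoing period."""
    weights, values = [], []
    for age, count in enumerate(reversed(history), start=1):
        weights.append(decay ** age)          # older periods get smaller weights
        values.append(count)
    weights.append(1.0)                       # the current period gets the largest weight
    values.append(current)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

predicted = predict_next_frequency(history=[30, 45, 60], current=90)
hot = predicted > 50                          # assumed hotness threshold
print(round(predicted, 1), hot)               # 72.0 True
```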

11.
To address the low storage and access efficiency of massive small files in the air-cargo logistics domain, a NoSQL-based distributed multi-level storage method for massive small files is proposed. Taking full account of data timeliness, locality, operational concurrency, and inter-file correlation, files are first merged according to their correlation and then stored in a distributed multi-level hierarchy, with the in-memory Redis database as a cache and HDFS for persistent storage, using a prefetching mechanism along the way. Experimental results show that the method effectively improves small-file access efficiency and disk utilization, significantly reduces network bandwidth usage and the memory consumption of the cluster NameNode, and is well suited to storing massive small files in the aviation domain.
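To make the multi-level read path concrete, here is a self-contained sketch in which plain dictionaries stand in for the Redis cache and the HDFS-backed merged-file store; the correlation-based prefetch is modeled by caching the other files in the same merged block. All names are illustrative assumptions.

```python
# Stand-ins for the real stores: a Redis cache and merged small files persisted in HDFS.
redis_cache = {}                                           # file name -> bytes
hdfs_merged_store = {                                      # merged block -> its member files
    "cargo_block_001": {"waybill_1.xml": b"<waybill 1/>", "waybill_2.xml": b"<waybill 2/>"},
}
file_to_block = {name: block for block, files in hdfs_merged_store.items() for name in files}

def read_small_file(name):
    if name in redis_cache:                                # level 1: in-memory cache hit
        return redis_cache[name]
    block = file_to_block[name]                            # level 2: locate the merged block
    members = hdfs_merged_store[block]
    # Prefetch: correlated files were merged into the same block, so cache them all.
    redis_cache.update(members)
    return members[name]

print(read_small_file("waybill_1.xml"))    # miss: fetched from the merged block, siblings prefetched
print("waybill_2.xml" in redis_cache)      # True: served from cache on the next access
```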

12.
A power system is a composite system made up of multiple subsystems. As a management and control platform that must process massive data with high real-time performance and high reliability, it needs to perform complex, fine-grained, wide-ranging access control over the subsystems under its jurisdiction, which calls for a well-designed access control model. To achieve secure, reliable, and efficient access control for power systems, an access control scheme combining the traditional power system with a cloud storage platform is proposed; storing and retrieving data through the cloud storage platform provides large data capacity, load balancing, and security and reliability. A trust factor is added to build the access control model: users are assigned different permissions according to their computed trust values and matched to the resources they may operate on, achieving fine-grained identification of the objects each user can operate on.

13.
In distributed systems, data may be correlated due to client access patterns, and this correlation affects data placement, yet existing research has focused on independent data objects. In this paper, we address both the scalability and the stability of data placement solutions in Internet environments. We first show that replica allocation decisions can be made locally for each replica site in a tree network, given data access knowledge of its neighbors. We then develop a new replication cost model for correlated data objects in Internet environments. Based on the cost model and the algorithms in previous research, we develop a distributed optimal replica allocation algorithm (DOPR) for correlated data in Internet environments. A distributed heuristic algorithm (DHPR) is then developed to make replica placement decisions efficiently; it obtains sub-optimal solutions for the correlated data model and yields significant performance gains. Experimental studies show that the distributed heuristic allocation algorithm significantly outperforms general frequency-based replication schemes (in which the replication decision for each data object is made based on the number of accesses to that data object).

14.
Cloud computing is becoming very popular in industry and is receiving considerable attention from the research community. Replica management is one of the most important issues in the cloud: it can offer fast data access, high data availability, and reliability. Keeping all replicas active can raise the rate of successfully executed tasks, provided replicas and requests are reasonably distributed. However, appropriate replica placement in large-scale, dynamically scalable, fully virtualized data centers is much more complicated. To provide cost-effective availability, minimize the response time of applications, and balance load for cloud storage, a new replica placement strategy is proposed. The placement is based on five parameters: mean service time, failure probability, load variance, latency, and storage usage. Replication should nonetheless be used wisely because the storage capacity of each site is limited, so each site should keep only the important replicas. We also present a new replica replacement strategy based on the availability of the file, the last time the replica was requested, the number of accesses, and the size of the replica. We evaluate our algorithm using the CloudSim simulator and find that it offers better performance than other algorithms in terms of mean response time, effective network usage, load balancing, replication frequency, and storage usage.
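The abstract lists the four replacement criteria but not how they are combined. The sketch below scores victim replicas with an assumed weighted formula (lower score = better eviction candidate) purely as an illustration; the weights and normalization constants are not the paper's values.

```python
import time

# Score each replica when space is needed; evict the lowest-scoring one.
def replica_keep_score(other_replicas, last_request_ts, access_count, size_mb,
                       now=None, w_avail=0.4, w_recency=0.3, w_freq=0.2, w_size=0.1):
    now = now or time.time()
    scarcity = 1.0 / (1.0 + other_replicas)                    # few copies elsewhere -> worth keeping
    recency = 1.0 / (1.0 + (now - last_request_ts) / 3600.0)   # decays with hours since last request
    frequency = min(access_count / 1000.0, 1.0)
    smallness = 1.0 / (1.0 + size_mb / 1024.0)                 # small replicas are cheap to keep
    return w_avail * scarcity + w_recency * recency + w_freq * frequency + w_size * smallness

now = time.time()
replicas = {
    "fileA": replica_keep_score(1, now - 300,   800, 200,  now),
    "fileB": replica_keep_score(4, now - 86400, 12,  4096, now),
}
print(min(replicas, key=replicas.get))   # the replica to evict first (here fileB)
```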

15.
A Load-Balancing-Based Hierarchical Replica Location Method
A load-balancing-based hierarchical replica location method, RepliLoc, is proposed. Considering two kinds of load, replica-information storage and replica-location computation, the replica location problem is divided into a community layer and a community-federation layer. Application-layer broadcast and the prefix-matching routing mechanism of the P2P resource location system Tapestry are used to locate replicas locally and globally, respectively. Hashing and prefix matching distribute replica information evenly, balancing the storage load, while communities localize replica-location computation, balancing the computation load.

16.
The Ceph distributed storage system is becoming a widely used open-source storage solution for cloud environments. With an effective data management strategy, heterogeneous storage can provide large capacity and high performance at low cost, but using heterogeneous storage devices in Ceph does not exploit their performance effectively: because the multiple replicas of a data object can reside on different storage media, different replica combinations differ in both performance and cost. A heterogeneous-storage-aware data placement method for Ceph is proposed that defines several replica combinations and places data on different combinations according to data heat and read/write ratio, improving system performance while effectively controlling capacity cost.
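The specific replica combinations in the paper are not listed in the abstract. The decision function below only illustrates the general idea of steering hot, read-heavy objects toward SSD-heavy combinations and cold objects toward HDD-only ones; the combination names and thresholds are invented.

```python
# Pick a replica combination (which media hold the 3 copies) from an object's
# heat and read/write ratio. Combination names and thresholds are assumptions.
COMBINATIONS = {
    "ssd_ssd_hdd": "hot, read-heavy data: two SSD copies serve reads, one HDD copy for capacity",
    "ssd_hdd_hdd": "warm data: one fast copy, two cheap copies",
    "hdd_hdd_hdd": "cold data: capacity-optimized, lowest cost",
}

def choose_combination(heat, read_ratio, hot_heat=0.7, warm_heat=0.3, read_heavy=0.6):
    """heat in [0, 1] (normalized access frequency); read_ratio = reads / (reads + writes)."""
    if heat >= hot_heat and read_ratio >= read_heavy:
        return "ssd_ssd_hdd"
    if heat >= warm_heat:
        return "ssd_hdd_hdd"
    return "hdd_hdd_hdd"

print(choose_combination(heat=0.9, read_ratio=0.8))   # -> ssd_ssd_hdd
print(choose_combination(heat=0.1, read_ratio=0.5))   # -> hdd_hdd_hdd
```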

17.
宋宝燕, 张洪梅, 王妍, 李琼. 《计算机应用》2012, 32(9): 2496-2499
To handle the massive, real-time, and dynamic monitoring data of large-scale smart grids, a data-centric storage method for large-scale smart grids is proposed: a hierarchical, extensible storage mechanism for massive dynamic data. First, an extendible-hashing coding scheme is used to add storage nodes dynamically, avoiding the loss of data from sudden or frequent events and improving system availability; then a multi-threshold scheme spreads data across multiple storage nodes, avoiding storage hotspots and achieving load balance. Experimental results show that the hierarchical extensible storage mechanism meets the storage demands of massive data to the greatest extent, achieves good load balance, minimizes total energy consumption, and effectively prolongs the network lifetime.

18.
For the problem of elastic cloud resource scheduling, and drawing on the characteristics of Ceph data storage, an elastic cloud resource scheduling strategy based on Docker containers is proposed. First, it is noted that Docker data volumes cannot move across hosts, which makes online application migration difficult, so the data storage method of the Ceph cluster is improved. A resource scheduling optimization model based on the composite load of nodes is then built. Finally, combining the characteristics of the Ceph cluster and Docker containers, Docker Swarm is used to implement an application-container deployment algorithm and an online application migration algorithm that consider both data storage and cluster load. Experimental results show that, compared with other scheduling strategies, this strategy partitions cluster resources at a finer granularity and achieves elastic scheduling of cloud platform resources, making reasonable use of cloud resources and reducing data-center operating costs while maintaining application performance.

19.
To keep the access load evenly distributed, distributed storage systems often rely on access-heat information when placing files. However, a file's access heat is unknown at the moment it is written and keeps changing afterwards, so heat-based placement algorithms must repeatedly relocate files, incurring high migration cost. This paper proposes a new distributed file placement algorithm based on fine-grained balancing. Exploiting the correlation between a file's access heat and its age, the algorithm achieves good access-load balance by keeping the amount of data stored on each node similar, at fine granularity, along the creation-time dimension. It relies only on a file's creation time, which is known when the file is written and never changes. Experimental results show that, compared with the random placement algorithm of HDFS, the proposed algorithm balances access load better and improves access performance.
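A minimal sketch of the creation-time-balanced idea described above: new files are bucketed by creation time, and each file goes to the node currently storing the least data for that bucket, so every node ends up with a similar age mix (and thus a similar expected heat mix). Bucket width and node names are assumptions.

```python
import collections
import time

NODES = ["node1", "node2", "node3"]
BUCKET_SECONDS = 3600                                   # one creation-time bucket per hour (assumed)
bytes_per_node_bucket = collections.defaultdict(int)    # (node, bucket) -> bytes stored

def place_file(size_bytes, created_at=None):
    created_at = created_at if created_at is not None else time.time()
    bucket = int(created_at // BUCKET_SECONDS)
    # Choose the node holding the least data for this creation-time bucket.
    target = min(NODES, key=lambda n: bytes_per_node_bucket[(n, bucket)])
    bytes_per_node_bucket[(target, bucket)] += size_bytes
    return target

for size in [100, 300, 200, 400, 250]:
    print(place_file(size_bytes=size))                  # files of the same hour rotate across nodes
```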

20.
The cloud computing environment is attracting growing interest as a new trend in data management. Data replication has been widely applied to improve data access in distributed systems such as the Grid and the cloud. However, due to the finite storage capacity of each site, copies that are useful for future jobs can be wastefully deleted and replaced with less valuable ones. It is therefore important to have an appropriate replication strategy that can dynamically store replicas while satisfying quality-of-service (QoS) requirements and storage capacity constraints. In this paper, we present a dynamic replication algorithm, the hierarchical data replication strategy (HDRS). HDRS consists of replica creation, which adaptively increases replicas based on an exponential growth or decay rate; replica placement, based on access load and a labeling technique; and replica replacement, based on the future value of a file. We evaluate different dynamic data replication methods using CloudSim simulation. Experiments demonstrate that HDRS reduces response time and bandwidth usage compared with other algorithms: it identifies popular files and replicates them to the best sites, avoiding useless replication and decreasing access latency by balancing the load across sites.
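HDRS's replica-creation rule is described only as following an "exponential growth or decay rate". The snippet below shows one plausible reading of that rule, where a file's replica count grows or shrinks geometrically with demand; the rate constants and caps are assumptions.

```python
# Adapt a file's replica count each period: grow it geometrically while demand
# rises, decay it geometrically while demand falls (rates are assumptions).
def next_replica_count(current_replicas, requests_now, requests_before,
                       growth=2.0, decay=0.5, min_replicas=1, max_replicas=8):
    if requests_now > requests_before:
        target = current_replicas * growth        # exponential growth under rising demand
    else:
        target = current_replicas * decay         # exponential decay under falling demand
    return max(min_replicas, min(max_replicas, int(round(target))))

replicas = 1
for now, before in [(50, 10), (200, 50), (400, 200), (100, 400), (20, 100)]:
    replicas = next_replica_count(replicas, now, before)
    print(replicas)     # 2, 4, 8, 4, 2 -- ramps up with popularity, then decays
```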
