Similar Literature
 20 similar documents found.
1.
李晓恺  代翔  李文杰  崔喆 《计算机应用》2012,32(8):2150-2158
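To give the Hadoop Distributed File System (HDFS) higher storage efficiency and better load balancing, this paper proposes Noah, an improvement on the multi-replica storage technique of HDFS. Noah introduces encoding and decoding modules that encode and split each HDFS block into a larger number of data fragments (sections). These sections are scattered randomly across the cluster and replace the original multi-replica disaster-tolerance strategy. When nodes in the cluster fail, the original data is recovered by collecting any roughly 70% of the sections associated with the failed block. A dynamic replication strategy is also applied, depending on the running state of the distributed cluster and on differing requirements for the number of replicas. Cluster experiments show that Noah improves on HDFS in disaster-recovery efficiency, load balancing, storage cost, and security.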

2.
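Research on Data Replica Distribution Schemes for WWW Cluster Servers   Total citations: 7, self-citations: 0, citations by others: 7
To improve the throughput, response speed, and scalability of WWW servers, many well-known international sites have replaced their single-host servers with WWW cluster servers. WWW cluster servers that use different replica distribution schemes differ in data reliability. This paper examines several data replica distribution schemes and demonstrates which distribution scheme is optimal.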

3.
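Replication and proxying are currently the two most widely used mainstream techniques in VoD services, and both are effective ways of keeping VoD playback real-time and smooth for end users. Previous research, however, has been limited to the performance gains of a single strategy. This paper proposes a composite strategy for VoD clusters that combines file replication with proxying: each server in the cluster acts as a file replica server and also functions as a content proxy server. A corresponding mathematical model is established. Simulation experiments show that, compared with either strategy alone, the composite strategy reduces on-demand latency by more than 30%, meets the load-balancing needs of the system, substantially reduces the delay caused by redirection, and raises the admission probability of the system.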

4.
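The HDFS file system guards against data corruption and loss by keeping multiple replica backups. As the stored content grows, however, and the data volume becomes very large, the extra storage consumed by this disaster-tolerance scheme is several times the size of the actual content, which is unfavorable for the long-term accumulation of system resources. This article proposes encoding and decoding files with erasure codes in place of the replica-based disaster-tolerance strategy of HDFS, which greatly improves storage utilization and reduces storage overhead while preserving data safety.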
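The storage trade-off between replication and erasure coding can be illustrated with a minimal Python sketch. The abstract does not say which erasure code is used, so this example assumes the simplest possible one, a single XOR parity shard over k data shards, which tolerates the loss of any one shard. The function names (`encode`, `recover`, `storage_overhead`) are hypothetical and are not taken from HDFS or from the paper.

```python
from functools import reduce
from typing import Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(block: bytes, k: int) -> list[bytes]:
    """Split a block into k zero-padded data shards plus one XOR parity shard."""
    shard_len = -(-len(block) // k)  # ceiling division
    padded = block.ljust(shard_len * k, b"\0")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    shards.append(reduce(xor_bytes, shards))  # parity = XOR of all data shards
    return shards


def recover(shards: list[Optional[bytes]]) -> list[bytes]:
    """Rebuild at most one missing shard (marked None) from the survivors."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("a single parity shard tolerates only one lost shard")
    result = list(shards)
    if missing:
        survivors = [s for s in shards if s is not None]
        result[missing[0]] = reduce(xor_bytes, survivors)
    return result


def storage_overhead(data_shards: int, parity_shards: int) -> float:
    """Bytes stored per byte of user data for a (data, parity) erasure code."""
    return (data_shards + parity_shards) / data_shards
```

Under these assumptions, four data shards plus one parity shard store 1.25 bytes per byte of user data, whereas keeping three full replicas stores 3 bytes per byte. Production systems typically use codes with several parity shards (for example Reed-Solomon) so that more than one simultaneous loss can be repaired.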

5.
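Traditional disaster-recovery systems store back-end data directly on disks at the disaster-recovery center. This centralized approach brings a series of problems: stored data is easily damaged, storage capacity cannot be expanded online, and storage performance degrades steadily as disk capacity grows. To address these problems, a disaster-recovery system that supports cluster storage was designed and implemented. Deploying the GlusterFS distributed file system in the back end of the disaster-recovery system provides distributed, replicated data storage, and it greatly improves the safety, scalability, and performance of data storage, overcoming the problems above.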

6.
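In a VOD server cluster, the replica placement and scheduling strategy is one of the key factors affecting quality of service. Taking into account variables such as video popularity, session length, and video bit rate, this paper proposes a replica placement and scheduling algorithm based on the Erlang-B formula. Simulation experiments show that the algorithm lowers the request rejection rate and also distributes the rejection rate more evenly across videos, allowing the server cluster to deliver better quality of service.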
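For readers unfamiliar with the Erlang-B formula mentioned above, the following Python sketch computes the blocking (rejection) probability with the standard numerically stable recursion B(E, 0) = 1, B(E, m) = E·B(E, m−1) / (m + E·B(E, m−1)). The `min_replicas` helper is a hypothetical illustration of how such a formula might drive replica counts. It assumes that the offered load equals request rate times mean session length, and that each replica can serve a fixed number of concurrent streams. It is not the algorithm from the paper.

```python
def erlang_b(offered_load: float, servers: int) -> float:
    """Blocking probability for `servers` channels under `offered_load` Erlangs."""
    blocking = 1.0  # B(E, 0) = 1: with no servers every request is rejected
    for m in range(1, servers + 1):
        blocking = offered_load * blocking / (m + offered_load * blocking)
    return blocking


def min_replicas(request_rate: float, session_length: float,
                 streams_per_replica: int, target_blocking: float) -> int:
    """Smallest replica count whose rejection rate meets the target.

    Hypothetical model: offered load = request_rate * session_length, and
    each replica contributes `streams_per_replica` concurrent streams.
    """
    if not 0.0 < target_blocking < 1.0:
        raise ValueError("target_blocking must lie strictly between 0 and 1")
    if streams_per_replica < 1:
        raise ValueError("streams_per_replica must be at least 1")
    load = request_rate * session_length
    replicas = 1
    while erlang_b(load, replicas * streams_per_replica) > target_blocking:
        replicas += 1
    return replicas
```

A more popular video (higher request rate) or a longer session raises the offered load, so under this model it needs more replicas to reach the same rejection rate. A higher bit rate would lower `streams_per_replica` for a fixed server bandwidth.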

7.
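Traditional remote file backup systems store backup data on a single server node, which limits storage space, degrades read/write performance under multiple users, and leaves the backup data with only a single copy. To address these problems, a design for a remote file backup system based on HDFS is proposed. User backup data is distributed across multiple data storage servers, and metadata is kept on a separate control server. This storage architecture effectively removes the space limit of a single storage server, improves read performance under concurrent multi-user access, provides a multi-replica file storage strategy, and strengthens the safety of stored backup files.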

8.
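Replica management is an effective means of improving grid performance and reducing client latency. For the replica deployment problem, a deployment strategy based on the simulated annealing algorithm is proposed, and the optimization model and algorithm are given. Simulation results from OptorSim show that the strategy reduces the response time of jobs' file requests and improves overall system performance. The strategy has already been applied successfully to replica deployment in the massive data center of the Daqing Oilfield.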

9.
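Availability is an important metric for server clusters. Building on a Markov-process model of server state transitions, this paper derives a formal method for computing the availability of a server cluster from the availability of a single server. It establishes quantitative relationships among single-server availability, service-unit availability, and the availability of the whole cluster, and from these it obtains a data availability formula for clusters with multiple distributed replicas. The results support accurate estimation of cluster availability under given service-node availability parameters, and they have both theoretical value and practical significance for helping users design clusters from an availability perspective.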
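The abstract does not reproduce the paper's formulas, so the following Python sketch shows only the textbook relationships that such a derivation usually starts from, under two stated assumptions: each server follows a two-state (up/down) Markov model with constant failure and repair rates, and servers fail independently. The function names are hypothetical.

```python
def steady_state_availability(failure_rate: float, repair_rate: float) -> float:
    """Long-run fraction of time a two-state (up/down) Markov server is up.

    With failure rate lambda and repair rate mu, availability is mu / (lambda + mu),
    which equals MTTF / (MTTF + MTTR).
    """
    return repair_rate / (failure_rate + repair_rate)


def replicated_data_availability(server_availability: float, replicas: int) -> float:
    """Probability that at least one of `replicas` independent servers is up.

    Assumes replicas sit on distinct servers that fail independently.
    """
    return 1.0 - (1.0 - server_availability) ** replicas
```

For example, a server with availability 0.9 gives data availability of about 0.99 with two replicas and about 0.999 with three, so each extra replica adds roughly one "nine" under the independence assumption. Correlated failures, such as a shared rack or power supply, would make the real figure lower.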

10.
《信息与电脑》2021,(1):191-193
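As medical technology advances ever faster, medical practice depends increasingly on information technology. Hospital data centers have evolved from continually adding physical servers and switches in the early days to building virtual server clusters and forming private cloud computing centers. The rapidly growing number of business systems and servers keeps challenging the construction of data protection and disaster-recovery centers. Traditional data protection methods can no longer meet disaster-recovery needs, and more and more hospitals are building disaster-recovery mechanisms modeled on the "two sites, three centers" approach. Adopting the standard two-sites-three-centers model, however, requires a hospital data center to make extensive low-level modifications and network adjustments. Against this background, the First People's Hospital of Chenzhou took the standard two-sites-three-centers model as a prototype and, by sorting out the coupling relationships among its systems, built a disaster-recovery architecture suited to hospital data centers.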

11.
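The Ceph distributed storage system is becoming a widely used open-source storage solution for cloud environments. With an effective data management strategy, heterogeneous storage can provide large capacity and high performance while keeping costs low. Ceph as it stands, however, cannot exploit the performance of heterogeneous storage devices effectively. Because the multiple replicas of a piece of data can be placed on different storage media, different replica combinations differ in both performance and cost. This paper proposes a data placement method for heterogeneous storage in Ceph: it defines several replica combinations and places data on them according to data hotness and read/write ratio, which improves system performance while effectively controlling capacity cost.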

12.
周婧  王意洁  李思昆 《软件学报》2007,18(6):1456-1467
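To address the resource management problems caused by large numbers of data replicas, a cluster-based multi-replica management method using finite encoding is proposed. In this method, replicas are graded into levels and grouped into clusters according to the process by which a single replica is copied to produce new ones. The partitioned replicas are encoded and organized by a coding rule defined as "replica level + replica order", and the same rule is used to manage the dynamic changes in clusters caused by dynamic replica adjustment (addition or removal). The method establishes a management model among large numbers of replicas that is centralized locally and peer-to-peer across the wide area. Combined with the defined "minimum update propagation time", it can reduce the consistency maintenance overhead of large numbers of replicas. The relationship between the coding rule and the replica scale is discussed, together with the handling of replica failure and recovery. Performance tests show that the method organizes large-scale data replicas effectively, scales well, is insensitive to a moderate number of node failures, and suits applications with frequent updates.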

13.
Geographically replicating popular objects in the Internet speeds up content distribution at the cost of keeping the replicas consistent and up-to-date. The overall effectiveness of replication can be measured by the total communication cost consisting of client accesses and consistency management, both of which depend on the locations of the replicas. This paper investigates the problem of placing replicas under the widely used TTL-based consistency scheme. A polynomial-time algorithm is proposed to compute the optimal placement of a given number of replicas in a network. The new replica placement scheme is compared, using real Internet topologies and Web traces, against two existing approaches that either do not consider consistency management or assume an invalidation-based consistency scheme. The factors affecting their performance are identified and discussed.

14.
The expiration-based scheme is widely used to manage the consistency of cached and replicated contents such as Web objects. In this approach, each replica is associated with an expiration time beyond which the replica has to be validated. While the expiration-based scheme has been investigated in the context of a single replica, not much work has been done on its behavior with respect to multiple replicas. To allow for efficient consistency management, it is desirable to organize the replicas into a distribution tree where a lower-level replica seeks validation with a higher-level replica when its lifetime expires. This paper investigates the construction of a distribution tree for a given set of replicas with the objective of minimizing the total communication cost of consistency management. This is formulated as an optimization problem and is proven to be NP-complete. The optimal distribution tree is identified in some special cases and several heuristic algorithms are proposed for the general problem. The performance of the heuristic algorithms is experimentally evaluated against two classical graph-theoretic algorithms of tree construction: the shortest-paths tree and the minimum spanning tree.

15.
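As peer-to-peer network applications continue to deepen, reducing latency, relieving concentrated bandwidth load, and improving quality of service have become a research focus. This paper proposes the CORPC cache management scheme. The scheme uses the popularity of streaming media segments to define the optimal system cache capacity that the replicas of each segment may occupy. It takes into account the existing replica capacity of each segment, the hotness of the segment, and the storage capacity of system nodes, and it uses a heuristic greedy algorithm to implement cache admission and cache replacement. The scheme balances quality of service across media segments of differing hotness. Tests in a simulated environment show that system quality of service improves as node cache space increases.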

16.
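Data replica management is an important part of cloud computing system management. In the massive data processing performed by cloud computing systems, known data placement and resource scheduling algorithms give insufficient consideration to the dynamics and reliability of replicas. To address this, a dynamic replica placement mechanism is proposed. The mechanism is based on a regional structure and considers the number and placement of replicas during data processing, as well as the cost that replica creation imposes on system resources such as memory and bandwidth. First, using the replica information in cloud storage, it replicates data that is accessed frequently and has a long average response time, and it gives a method for calculating the number of replicas. Then, to narrow the range of candidate nodes for replica distribution, it proposes a dynamic replica placement algorithm, DRA, which screens the nodes within a given range according to the proposed domain partitioning in order to place the data replicas. Experimental results show that the proposed dynamic placement mechanism reduces the storage space wasted by rarely accessed replicas and reduces the cross-node transfer latency of frequently accessed replicas. It effectively improves the access efficiency of data files and the level of load balancing, as well as the reliability and availability of the cloud storage system.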

17.
A Data Grid provides scalable infrastructure for managing storage resources and data files, and it supports several large-scale applications. Because the resources available in a grid are limited, using them efficiently is an important challenge. Replication is a technique used in data grids to improve fault tolerance and to reduce bandwidth consumption. This paper proposes a Dynamic Hierarchical Replication (DHR) algorithm that places replicas at appropriate sites, i.e. the site with the highest number of accesses for that particular replica. It also minimizes access latency by selecting the best replica when several sites hold replicas. The proposed replica selection strategy selects the best replica location for the users' running jobs by considering the replica requests waiting in the storage queue and the data transfer time. Simulation results with OptorSim, the European Data Grid simulator, show that the DHR strategy performs better than the other algorithms and prevents unnecessary replica creation, which leads to efficient storage usage.

18.
左林  刘绍华  魏峻  冯玉琳  范国闯 《软件学报》2008,19(5):1212-1223
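A domain-based adaptive replica selection model, DARSM, is proposed. The model divides component replicas into a strong-consistency domain and a weak-consistency domain, and state is synchronized between the domains through a consistency window mechanism. Based on the DARSM model, a partition-weighted adaptive replica selection algorithm, PWARS, is presented. The algorithm uses dynamic performance measurements to select a set of component replicas that satisfies both timing constraints and consistency constraints. To adapt to dynamic changes in the consistency constraints of requests, a consistency window adaptive reconfiguration algorithm, CWAR, is also proposed. By introducing a probability model of consistency constraints, the algorithm reconfigures the consistency window dynamically and thereby achieves adaptive control of replica consistency. Prototype experiments and performance evaluation on an OnceAS application server cluster show that the method markedly improves the performance of replica selection.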

19.
Analysis of Replica Placement under Expiration-Based Consistency Management   Total citations: 1, self-citations: 0, citations by others: 1
Expiration-based consistency management is widely used to keep replicated contents up-to-date in the Internet. The effectiveness of replication can be characterized by the communication costs of client accesses and consistency management. Both costs depend on the locations of the replicas. This paper investigates the problem of placing replicas in a network where replica consistency is managed by the expiration-based scheme. Our objective is to minimize the total cost of client accesses and consistency management. By analyzing the communication cost of recursive validations for cascaded replicas, we prove that in the optimal placement scheme, the nodes not assigned replicas induce a connected subgraph that includes the origin server. Our results are generic in that they apply to any request arrival pattern. Based on the analysis, an O(D)-time algorithm is proposed to compute the optimal placement of the replicas, where D is the sum of the number of descendants over all nodes in the routing tree.
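To make the quantity D in the O(D) bound concrete, the Python sketch below computes the sum of the number of descendants over all nodes of a rooted tree. It assumes that "descendants" means proper descendants (a node is not counted as its own descendant), which the abstract does not state, and it represents the routing tree as a hypothetical `children` dictionary. It computes only D and is not the placement algorithm itself.

```python
def total_descendants(children: dict[str, list[str]], root: str) -> int:
    """Sum over all nodes of the number of proper descendants of each node.

    A node at depth d is a descendant of exactly d ancestors, so this sum
    equals the sum of the depths of all nodes in the tree.
    """
    total = 0
    stack = [(root, 0)]  # iterative DFS over (node, depth) pairs
    while stack:
        node, depth = stack.pop()
        total += depth
        for child in children.get(node, []):
            stack.append((child, depth + 1))
    return total
```

Under this reading, D for a tree with n nodes ranges from n − 1 (a star, where every non-root node hangs directly off the origin server) to n(n − 1)/2 (a chain), so an O(D) algorithm is between linear and quadratic in the number of nodes, depending on the shape of the routing tree.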

20.
Cloud computing has become a very popular term in industry and is receiving a large amount of attention from the research community. Replica management is one of the most important issues in the cloud, as it can offer fast data access, high data availability, and reliability. By keeping all replicas active, the system may increase the rate of successful task execution if the replicas and requests are reasonably distributed. However, appropriate replica placement in large-scale, dynamically scalable, and fully virtualized data centers is much more complicated. To provide cost-effective availability, minimize the response time of applications, and balance load for cloud storage, a new replica placement strategy is proposed. The replica placement is based on five important parameters: mean service time, failure probability, load variance, latency, and storage usage. However, replication should be used wisely because the storage size of each site is limited. Thus, a site must keep only the important replicas. We also present a new replica replacement strategy based on the availability of the file, the last time the replica was requested, the number of accesses, and the size of the replica. We evaluate our algorithm using the CloudSim simulator and find that it offers better performance than other algorithms in terms of mean response time, effective network usage, load balancing, replication frequency, and storage usage.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号