2.
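Existing failure detection services cannot effectively meet the requirements of distributed systems. To address this, a dynamic failure detection service suited to distributed systems is designed. Based on the characteristics of distributed systems, a distributed system model is defined and a dynamic failure detection service architecture is proposed on top of it. Combining a heartbeat strategy with grey prediction, a dynamic heartbeat mechanism is designed, together with a prediction model and a dynamic prediction strategy, and a failure detection algorithm for distributed systems based on this mechanism is proposed. Finally, simulation experiments verify the correctness and effectiveness of the algorithm.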
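The abstract does not give the prediction model's equations. As a rough illustration, assuming the common GM(1,1) grey model, the sketch below forecasts the next heartbeat inter-arrival time from a short window of recent intervals and adds a fixed safety margin to obtain a timeout. The window size, the margin, and the names `gm11_next` and `heartbeat_timeout` are hypothetical, not the authors' parameters.

```python
import math


def gm11_next(x0):
    """Forecast the next value of a positive series with a GM(1,1) grey model."""
    n = len(x0)
    if n < 3:
        raise ValueError("GM(1,1) needs at least 3 samples")
    # 1-AGO: accumulated series x1(k) = x0(1) + ... + x0(k)
    x1, total = [], 0.0
    for v in x0:
        total += v
        x1.append(total)
    # Background values z1(k) = (x1(k) + x1(k-1)) / 2 for k = 2..n
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]
    y = x0[1:]
    m = n - 1
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(zi * yi for zi, yi in zip(z, y))
    # Least squares for x0(k) = -a * z1(k) + b via the 2x2 normal equations
    det = m * szz - sz * sz
    a = (sz * sy - m * szy) / det
    b = (szz * sy - sz * szy) / det
    if abs(a) < 1e-12:
        # Degenerate case: x1 grows linearly, so the next increment is b
        return b
    # Time response x1_hat(k+1) = (x0(1) - b/a) * exp(-a*k) + b/a, then difference it
    c = x0[0] - b / a
    return c * (math.exp(-a * n) - math.exp(-a * (n - 1)))


def heartbeat_timeout(intervals, window=5, margin=0.2):
    """Hypothetical rule: predicted next inter-arrival time plus a fixed safety margin."""
    return gm11_next(list(intervals[-window:])) + margin


# Usage: recent heartbeat inter-arrival times in seconds (made-up sample values)
print(heartbeat_timeout([1.0, 1.02, 0.98, 1.05, 1.01]))
```

Because the model is refit on every window, the timeout follows gradual drifts in heartbeat delay instead of staying fixed.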
3.
Computer Applications and Software, 2016, (6)
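This paper studies dynamic failure detection for grids in big-data environments. Big data comes from a very wide range of sources and its data types are highly complex. Because of this breadth of data, the high heterogeneity of resources, and their geographic distribution, grid failures have become a major problem for system applications, and current grid failure detection methods cannot meet the needs of dynamic failure detection. Using an algorithm based on grey prediction theory and the principle of dynamic heartbeats, a dynamic failure detection architecture is designed, a prediction model is given, and a dynamic grid failure detection method is proposed. Experimental results confirm that the method is effective and accurate: the proposed dynamic failure detection algorithm outperforms a static one, addressing the problem of dynamic grid failure detection in big-data environments.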
4.
Computer Engineering and Applications, 2017, (24)
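Heartbeat detection is one of the key techniques for node failure detection in distributed systems, and how reasonably the heartbeat frequency is set affects the accuracy and completeness of failure detection. In big-data environments, failures in distributed systems are influenced by the network, the nodes, and the jobs. To set the heartbeat frequency more reasonably under these combined influences, a multi-factor comprehensive evaluation model for heartbeat detection is proposed. The model simultaneously considers how network load, node CPU state, and node job size affect the heartbeat detection process. On this basis, an adaptive heartbeat detection algorithm based on the multi-factor evaluation model is proposed. The algorithm adaptively changes the heartbeat frequency according to the network environment, node CPU utilization, and job size, combining these factors into an optimal heartbeat frequency setting. Finally, experiments verify the influence of the multiple factors on adaptive adjustment of the heartbeat frequency.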
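The abstract does not give the evaluation model's formula. The sketch below assumes the simplest form such a model could take: a weighted average of normalized network load, CPU utilization, and job size, mapped linearly onto a heartbeat interval range. The weights, the bounds `t_min`/`t_max`, the direction of the mapping (a busier network or node gets less frequent heartbeats), and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    net_load: float  # normalized network load in [0, 1] (assumed metric)
    cpu_util: float  # CPU utilization in [0, 1]
    job_size: float  # job size relative to the largest expected job, in [0, 1]


def composite_score(state, weights=(0.5, 0.3, 0.2)):
    """Weighted average of the three factors, clamped to [0, 1]."""
    w_net, w_cpu, w_job = weights
    weighted = w_net * state.net_load + w_cpu * state.cpu_util + w_job * state.job_size
    score = weighted / (w_net + w_cpu + w_job)
    return min(max(score, 0.0), 1.0)


def heartbeat_interval(state, t_min=0.5, t_max=5.0, weights=(0.5, 0.3, 0.2)):
    """Map the composite score linearly onto [t_min, t_max] seconds.

    Assumption: higher load means a longer interval (lower frequency), so that
    heartbeats add less traffic; the paper may weight or map factors differently.
    """
    return t_min + composite_score(state, weights) * (t_max - t_min)


print(heartbeat_interval(NodeState(net_load=0.2, cpu_util=0.6, job_size=0.4)))
```

Keeping the score in [0, 1] makes the interval bounds explicit, so a misreported metric cannot push the heartbeat period outside a safe range.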
8.
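Traditional multi-round scheduling algorithms for divisible jobs cannot adapt to dynamic grid environments. To address this, a reliable divisible-job scheduling mechanism is proposed based on the Uniform Multi-Round (UMR) algorithm. The system dynamically monitors changes in grid resources; when resources change, it promptly reschedules the remaining work using performance prediction and evaluation. Experiments show that, compared with traditional multi-round scheduling algorithms, the mechanism reduces job completion time in dynamic grid environments, makes effective use of grid resources, and improves the reliability of job scheduling.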
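The abstract does not reproduce UMR's chunk-size formulas, which depend on per-worker latency and bandwidth. The sketch below is only a simplified stand-in: round sizes grow geometrically (mirroring the multi-round idea of starting with small chunks), each round is split in proportion to predicted worker speeds, and when monitoring reports a change, the remaining load is re-planned with the new speeds. The `growth` factor, the round count, and the function name are assumptions.

```python
def plan_rounds(load, speeds, rounds=3, growth=2.0):
    """Split a divisible load into `rounds` rounds whose sizes grow geometrically,
    and divide each round among workers in proportion to their predicted speed."""
    if rounds < 1 or not speeds:
        raise ValueError("need at least one round and one worker")
    if growth == 1.0:
        sizes = [load / rounds] * rounds
    else:
        # Chosen so that first * (1 + g + ... + g^(rounds-1)) == load
        first = load * (growth - 1) / (growth ** rounds - 1)
        sizes = [first * growth ** j for j in range(rounds)]
    total_speed = sum(speeds.values())
    return [{w: size * s / total_speed for w, s in speeds.items()} for size in sizes]


# Initial plan for 70 load units on two equally fast workers (made-up numbers)
plan = plan_rounds(70.0, {"a": 1.0, "b": 1.0})
# After round 1 finishes, monitoring predicts "a" is now 3x faster than "b"
done = sum(plan[0].values())
replan = plan_rounds(70.0 - done, {"a": 3.0, "b": 1.0}, rounds=2)
print(plan)
print(replan)
```

Only the unfinished load is re-planned, so work already dispatched is never redone when speeds change.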
9.
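Trust is an important factor in grid resource scheduling and one of the key technologies affecting the effectiveness and performance of grid computing. This paper introduces a trust mechanism into grid resource scheduling, proposes a trust model for grid environments and a trust-based resource scheduling model, and improves the scheduling strategy of the traditional Min-Min algorithm, yielding the trust-based Trust-Min-Min algorithm. Simulation results show that the algorithm not only shortens the total execution time of tasks but also balances load effectively, making it an effective resource scheduling method for grid environments.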
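The abstract does not state how trust enters the cost function. The sketch below shows standard Min-Min (repeatedly commit the task whose earliest completion time is smallest) with one hypothetical trust penalty: a machine's expected run time is divided by its trust value in (0, 1], so low-trust machines look slower. The penalty form and the names `etc` and `trust` are assumptions, not the paper's Trust-Min-Min.

```python
def trust_min_min(etc, trust):
    """Min-Min scheduling with a hypothetical trust penalty.

    etc[t][m]: expected time to compute task t on machine m.
    trust[m]:  trust level of machine m in (0, 1]; lower trust inflates the cost.
    Returns (task -> machine mapping, makespan in penalized time units).
    """
    ready = [0.0] * len(trust)
    unassigned = set(range(len(etc)))
    schedule = {}
    while unassigned:
        best = None  # (completion time, task, machine)
        for t in sorted(unassigned):
            for m, tr in enumerate(trust):
                # Assumed penalty: divide the expected run time by the machine's trust
                ct = ready[m] + etc[t][m] / tr
                if best is None or ct < best[0]:
                    best = (ct, t, m)
        # The global minimum over (task, machine) pairs is the task whose own
        # minimum completion time is smallest, i.e. the Min-Min choice
        ct, t, m = best
        schedule[t] = m
        ready[m] = ct
        unassigned.remove(t)
    return schedule, max(ready)


etc = [[2.0, 4.0], [3.0, 1.0], [5.0, 5.0]]
print(trust_min_min(etc, [1.0, 1.0]))
print(trust_min_min(etc, [1.0, 0.25]))
```

With equal trust, task 2 goes to machine 1; lowering machine 1's trust moves task 2 onto machine 0, trading a longer makespan for placing more work on the trusted machine.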
11.
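Motion planning is an important research topic in autonomous driving systems and is attracting increasing attention. However, most current algorithms consider only deterministic, structured environments and ignore the potential uncertainty in dynamic traffic. Focusing on uncertain environments, this paper groups motion planning algorithms into two classes, partially observable Markov decision processes (POMDP) and probabilistic occupancy grid maps (POGM), and introduces each in terms of theoretical foundations, solution algorithms, and practical applications. Based on the current belief state, a POMDP computes the policy that maximizes the expected discounted future reward. A POGM uses probabilities to represent the occupancy of each grid cell, quantifying how likely the traffic flow is to change, and thus represents uncertainty well. Finally, the paper summarizes the main challenges facing motion planning in uncertain environments and possible future research directions.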
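To make the two representations concrete, the sketch below shows (1) the discrete Bayes belief update that a POMDP planner applies to its current belief state after an action and an observation, and (2) a probabilistic occupancy grid stored as per-cell log-odds, the textbook binary Bayes filter. These are standard building blocks, not a specific algorithm from the survey; policy computation is not shown, and the two-state pedestrian example and its probabilities are made up for illustration.

```python
import math


def belief_update(belief, transition, observation, obs):
    """Discrete POMDP belief update for one fixed action.

    transition[s][s2]: P(s2 | s, action); observation[s2][o]: P(o | s2, action).
    Returns b'(s2) proportional to P(obs | s2) * sum_s P(s2 | s) * b(s).
    """
    n = len(belief)
    predicted = [sum(transition[s][s2] * belief[s] for s in range(n)) for s2 in range(n)]
    unnorm = [observation[s2][obs] * predicted[s2] for s2 in range(n)]
    total = sum(unnorm)
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return [u / total for u in unnorm]


def logit(p):
    return math.log(p / (1.0 - p))


class OccupancyGrid:
    """Occupancy grid storing per-cell log-odds (binary Bayes filter)."""

    def __init__(self, rows, cols, prior=0.5):
        self.l0 = logit(prior)
        self.cells = [[self.l0] * cols for _ in range(rows)]

    def update(self, r, c, p_occ):
        # p_occ: inverse sensor model, P(occupied | this measurement) for cell (r, c)
        self.cells[r][c] += logit(p_occ) - self.l0

    def probability(self, r, c):
        return 1.0 - 1.0 / (1.0 + math.exp(self.cells[r][c]))


# Hypothetical states: 0 = pedestrian waits, 1 = pedestrian crosses;
# observation 0 = "pedestrian is standing still"
b = belief_update([0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]], [[0.8, 0.2], [0.3, 0.7]], obs=0)
grid = OccupancyGrid(2, 3)
for _ in range(2):
    grid.update(0, 1, 0.7)  # two readings suggesting cell (0, 1) is occupied
print(b, grid.probability(0, 1), grid.probability(1, 2))
```

Log-odds make repeated updates a simple addition, and cells that are never observed stay at the prior.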
13.
Besides the dynamic nature of grids, which means that resources may enter and leave the grid at any time, in many cases outside the applications' control, grid resources are also heterogeneous. Many grid applications will run in environments where interaction faults between disparate grid nodes are more likely. Because resources may also be used outside organizational boundaries, it becomes increasingly difficult to guarantee that a resource being used is not malicious. Given these diverse faults and failure conditions, developing, deploying, and executing long-running applications over the grid remains a challenge, so fault tolerance is essential for grid computing. This paper presents an extensive survey of fault-tolerant techniques, including replication strategies, checkpointing mechanisms, scheduling policies, failure detection mechanisms, and malleability and migration support for divide-and-conquer applications. These techniques are applied according to the needs of the computational grid and the type of environment, resources, virtual organizations, and job profile it is expected to handle. Each has its own merits and drawbacks, which form the subject matter of this survey.
14.
Mohammad Shorfuzzaman, Peter Graham, Rasit Eskicioglu. The Journal of Supercomputing, 2010, 51(3): 374-392
Data grids support access to widely distributed storage for large numbers of users accessing potentially many large files. Efficient access is hindered by the high latency of the Internet. To improve access time, replication at nearby sites may be used. Replication also provides high availability, decreased bandwidth use, enhanced fault tolerance, and improved scalability. Resource availability, network latency, and user requests in a grid environment may vary with time. Any replica placement strategy must be able to adapt to such dynamic behavior. In this paper, we describe a new dynamic replica placement algorithm, Popularity Based Replica Placement (PBRP), for hierarchical data grids which is guided by file “popularity”. Our goal is to place replicas close to clients to reduce data access time while still using network and storage resources efficiently. The effectiveness of PBRP depends on the selection of a threshold value related to file popularity. We also present Adaptive-PBRP (APBRP) that determines this threshold dynamically based on data request arrival rates. We evaluate both algorithms using simulation. Results for a range of data access patterns show that our algorithms can shorten job execution time significantly and reduce bandwidth consumption compared to other dynamic replication methods.
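The abstract does not spell out PBRP's placement rule. The sketch below is a simplified interpretation, not the authors' algorithm: request counts from client leaves are aggregated up a hierarchical grid, and a replica is placed at each lowest node whose aggregated count reaches the popularity threshold, which keeps replicas close to the clients that drive demand. For an APBRP-like variant, the threshold is derived from the observed request volume as `beta` times the mean requests per client; that rule, `beta`, the example tree, and all function names are hypothetical.

```python
def subtree_counts(tree, leaf_counts, root):
    """Aggregate per-client request counts for one file up the hierarchy."""
    agg = {}

    def visit(node):
        total = leaf_counts.get(node, 0)
        for child in tree.get(node, []):
            total += visit(child)
        agg[node] = total
        return total

    visit(root)
    return agg


def place_replicas(tree, leaf_counts, root, threshold):
    """Return the lowest nodes whose aggregated popularity reaches the threshold."""
    agg = subtree_counts(tree, leaf_counts, root)
    sites = []

    def visit(node):
        if agg[node] < threshold:
            return
        hot = [c for c in tree.get(node, []) if agg[c] >= threshold]
        if not hot:
            sites.append(node)  # no child is popular enough on its own
        for c in hot:
            visit(c)

    visit(root)
    return sites


def adaptive_threshold(leaf_counts, beta=2.0):
    """Hypothetical APBRP-style rule: beta times the mean request count per client."""
    return beta * sum(leaf_counts.values()) / len(leaf_counts)


tree = {"root": ["r1", "r2"], "r1": ["c1", "c2"], "r2": ["c3", "c4"]}
counts = {"c1": 10, "c2": 8, "c3": 1, "c4": 1}
print(place_replicas(tree, counts, "root", threshold=15))
print(place_replicas(tree, counts, "root", adaptive_threshold(counts)))
```

A fixed threshold of 15 places the replica at the regional node `r1`, while the adaptive threshold (10 here) pushes it down to the busiest client `c1`; this shows why the choice of threshold governs PBRP's effectiveness.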
15.
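For distributed computing systems that support highly reliable intelligent applications, this paper first proposes a set of QoS metrics for failure detection services and then presents an adaptive failure detection method. The method uses a highly dynamic computation that requires no statistical behavior model to estimate the heartbeat timeout dynamically, and it negotiates changes to the heartbeat sending period to adapt to changing states of computing nodes and the network, improving the QoS of the failure detection service. Simulation experiments show that the method adapts to changing conditions in the distributed computing system and strikes a good balance between the timeliness and the correctness of detection.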
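The abstract does not publish its estimator. As a swapped-in illustration of the same idea (a timeout that adapts online from each new heartbeat without keeping a sample history), the sketch below applies the smoothed-mean and mean-deviation update that TCP uses for its retransmission timeout (RFC 6298) to heartbeat inter-arrival times. The constants `alpha`, `beta`, and `k` are the RFC's defaults, borrowed here as assumptions; the negotiation of the heartbeat sending period is not modeled.

```python
class AdaptiveTimeout:
    """Heartbeat timeout from a smoothed inter-arrival mean and mean deviation."""

    def __init__(self, alpha=0.125, beta=0.25, k=4.0):
        self.alpha, self.beta, self.k = alpha, beta, k
        self.mean = None  # smoothed inter-arrival time
        self.dev = None   # smoothed mean deviation

    def observe(self, interval):
        """Fold in one observed inter-arrival time and return the new timeout."""
        if self.mean is None:
            # First sample, initialized as in RFC 6298
            self.mean = interval
            self.dev = interval / 2
        else:
            # Deviation is updated with the previous mean, then the mean itself
            self.dev = (1 - self.beta) * self.dev + self.beta * abs(self.mean - interval)
            self.mean = (1 - self.alpha) * self.mean + self.alpha * interval
        return self.timeout()

    def timeout(self):
        return self.mean + self.k * self.dev


detector = AdaptiveTimeout()
for gap in [1.0, 1.1, 0.9, 1.0, 2.5]:  # made-up inter-arrival times in seconds
    print(round(detector.observe(gap), 3))
```

The deviation term widens the timeout quickly when heartbeats become irregular, and it shrinks toward the mean interval when they are steady. This is the kind of trade-off between detection timeliness and correctness that the abstract describes.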