Found 20 similar documents (search time: 46 ms)
1.
2.
3.
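Research on Maintenance Strategies for Fault-Tolerant Distributed Systems  Total citations: 1, self-citations: 0, citations by others: 1
I. Introduction: Many practical systems incur increased maintenance costs in service because maintainability was not adequately considered. Conversely, excessive maintenance not only fails to improve a system's reliability and availability but actually degrades its performance. Therefore, the sys…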
4.
5.
6.
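Checkpointing Algorithms in Distributed Systems  Total citations: 12, self-citations: 0, citations by others: 12
Checkpoints save and restore the running state of a program, and have important applications in process migration, fault tolerance, rollback debugging, and related areas. This paper gives a detailed classification and review of checkpointing algorithms in distributed systems. Checkpointing algorithms divide into single-process and distributed-program algorithms; distributed-program checkpointing algorithms divide further into asynchronous and consistent checkpointing algorithms. The paper also systematically introduces typical methods for improving checkpointing performance. These improved algorithms mainly use two strategies to reduce overhead and latency: first, reducing the amount of information that must be stored in the checkpoint file, as in incremental algorithms; second, increasing the parallelism between the checkpoint operation and the execution of the target program, as in main-memory algorithms. Finally, the paper discusses the limitations of current checkpointing algorithms and directions for further work.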
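The first strategy named in the abstract above, storing less information per checkpoint, can be illustrated with a minimal sketch of incremental checkpointing. This is not the algorithm of any surveyed paper: the page size, the hash-based change detection, and the names `IncrementalCheckpointer`, `checkpoint`, and `restore` are all assumptions chosen for illustration.

```python
from __future__ import annotations

import hashlib

PAGE_SIZE = 4  # hypothetical page size in bytes, kept tiny for illustration


def split_pages(state: bytes, page_size: int = PAGE_SIZE) -> list[bytes]:
    """Split a state image into fixed-size pages (the last page may be shorter)."""
    return [state[i:i + page_size] for i in range(0, len(state), page_size)]


class IncrementalCheckpointer:
    """Stores, for each checkpoint, only the pages that changed since the last one."""

    def __init__(self) -> None:
        # Hash of every page as of the most recent checkpoint.
        self.hashes: list[str] = []
        # One entry per checkpoint: (number of pages, {page index: page bytes}).
        self.deltas: list[tuple[int, dict[int, bytes]]] = []

    def checkpoint(self, state: bytes) -> int:
        """Record a checkpoint and return how many pages had to be written."""
        pages = split_pages(state)
        new_hashes = [hashlib.sha256(p).hexdigest() for p in pages]
        changed = {
            idx: page
            for idx, page in enumerate(pages)
            if idx >= len(self.hashes) or self.hashes[idx] != new_hashes[idx]
        }
        self.hashes = new_hashes
        self.deltas.append((len(pages), changed))
        return len(changed)

    def restore(self) -> bytes:
        """Rebuild the latest state by replaying all deltas in order."""
        pages: dict[int, bytes] = {}
        n_pages = 0
        for n_pages, changed in self.deltas:
            pages.update(changed)
        return b"".join(pages[i] for i in range(n_pages))


cp = IncrementalCheckpointer()
first = cp.checkpoint(b"aaaabbbbcccc")   # base image: every page is new
second = cp.checkpoint(b"aaaaXXXXcccc")  # only the middle page differs
```

In this sketch the second checkpoint writes a single page instead of three, at the price of replaying every delta on restore. A real system would typically bound that replay cost by taking a full checkpoint periodically.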
7.
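File System State Recovery During Program Rollback in Checkpointing Algorithms for Distributed Systems  Total citations: 3, self-citations: 0, citations by others: 3
Checkpointing, also known as "rollback recovery", is an important means of software fault tolerance. It is mainly used to save and restore the running state of a program, and it plays a very important role in distributed and parallel computing systems. From the perspective of reducing checkpoint overhead, this paper analyzes and further studies the problem of recovering file system state when a program is rolled back under checkpointing algorithms for distributed systems.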
8.
9.
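Real-Time Fault-Tolerant Scheduling Algorithms Based on Heterogeneous Distributed Systems  Total citations: 26, self-citations: 1, citations by others: 26
The real-time fault-tolerant scheduling algorithms studied in the current literature all assume homogeneous distributed systems, in which every processor is identical. This paper first establishes a real-time fault-tolerant scheduling model for heterogeneous distributed systems, in which the processors all differ from one another. Based on this model, the paper introduces the concept of reliability cost and proposes two static real-time fault-tolerant scheduling algorithms (RTFTNO and RTFTRC) for scheduling periodic real-time fault-tolerant tasks. When scheduling tasks, RTFTRC tries to minimize the system's reliability cost, whereas RTFTNO does not take reliability cost into account. The paper discusses the performance of both algorithms in detail. Simulation experiments compare the two algorithms in terms of reliability cost, deadline-miss ratio, and schedulability, and study the relationship between task computation time and reliability cost as well as the relationship between the schedule-length threshold and the minimum number of processors. The experimental results show that RTFTRC outperforms RTFTNO.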
10.
11.
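A real-time scheduling model for heterogeneous distributed systems is established, and the tasks and the different processor resources in such systems are described formally. Combined with the primary/backup copy technique, a round-robin fault-tolerant scheduling algorithm for real-time tasks in heterogeneous distributed systems is presented. Case analysis shows that the algorithm effectively improves resource utilization and overall computing performance in a heterogeneous processor environment.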
12.
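陈华林 (Chen Hualin), 《计算机与数字工程》 (Computer & Digital Engineering), 2009, 37(4): 147-150
Since it came into use in the 1990s, SIP has thoroughly changed the way people communicate with one another through converged services. The Session Initiation Protocol provides a framework for delivering voice, video, data, and wireless services seamlessly and transparently over a network. Research on the reliability of SIP applications, however, is still at an early stage. This article describes an implementation of a distributed fault-tolerant SIP protocol stack, intended to ease the design and construction of reliable SIP networks and thereby give users of SIP services a better service experience.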
13.
Communication-Induced Checkpointing (CIC) protocols are classified into two categories in the literature: Index-based and Model-based. In this paper, we discuss two data structures being used in these two kinds of CIC protocols, and their different roles in helping the checkpointing algorithms to enforce Z-cycle Free (ZCF) property. Then, we present our Fully Informed aNd Efficient (FINE) communication-induced checkpointing algorithm, which not only has less checkpointing overhead than the well-known Fully Informed (FI) CIC protocol proposed by Helary et al. but also has less message overhead. Performance evaluation indicates that our protocol performs better than many of the other existing CIC protocols.
14.
Tibor Gyires, Applied Intelligence, 1991, 1(2): 145-155
Distributed Problem Solving Networks (DPSN) provide a means for interconnecting intelligent problem solver nodes that can solve only a part of a problem depending on their ability in the problem domain. The decomposition of a problem into subproblems, and the selection of nodes to solve them can be regarded as the generation of an AND/OR tree, and the solution of the problem as a search for a solution tree. Introducing measurements for the cost of a solution tree, we present an algorithm to find one having minimal cost under certain conditions. A Flexible Manufacturing System consisting of a network of flexible workcells is used as an example.
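The idea of treating problem decomposition as an AND/OR tree and searching for a cheapest solution tree can be sketched as follows. The abstract does not give the paper's cost measure or its conditions, so this sketch assumes a simple additive cost: an AND node pays for all of its children, an OR node pays for its cheapest child. The `Node` class, the node names, and the cost values are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    kind: str = "LEAF"   # one of "AND", "OR", "LEAF"
    cost: float = 0.0    # hypothetical local cost of using this node
    children: list[Node] = field(default_factory=list)


def min_cost_solution(node: Node) -> tuple[float, list[str]]:
    """Return (total cost, node names) of a cheapest solution tree under additive cost."""
    if node.kind == "LEAF" or not node.children:
        return node.cost, [node.name]
    results = [min_cost_solution(child) for child in node.children]
    if node.kind == "AND":
        # Every subproblem must be solved, so all child costs add up.
        total = node.cost + sum(cost for cost, _ in results)
        names = [node.name] + [n for _, sub in results for n in sub]
        return total, names
    if node.kind == "OR":
        # Any one alternative suffices, so keep only the cheapest child.
        best_cost, best_names = min(results, key=lambda r: r[0])
        return node.cost + best_cost, [node.name] + best_names
    raise ValueError(f"unknown node kind: {node.kind!r}")


# Hypothetical workcell example: machining can be done by either of two cells.
root = Node("assemble", "AND", 0, [
    Node("machine", "OR", 0, [Node("cell_A", cost=5), Node("cell_B", cost=3)]),
    Node("inspect", cost=2),
])
cost, plan = min_cost_solution(root)
```

Here the cheapest plan routes machining to `cell_B`, for a total cost of 5. The bottom-up recursion is valid only because cost is assumed additive and subtrees are assumed independent; shared subproblems or non-additive costs would need a different search.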
15.
The STAR fault manager for distributed operating environments: design, implementation and performance
This paper presents the design, implementation and performance evaluation of a software fault manager for distributed applications. Dubbed Star, it uses the natural redundancy existing in networks of workstations to offer a high level of fault tolerance. Fault management is transparent to the supported parallel applications. To improve the response time of fault-tolerant applications, Star implements non-blocking and incremental checkpointing to perform an efficient backup of process state. Moreover, Star is application independent and highly configurable. Star currently runs on top of SunOS and is easily portable to UNIX™-like operating systems. The current implementation is based on independent checkpointing and message logging. Measurements show the efficiency and the limits of this implementation. The challenge is to show that a software approach to fault tolerance can efficiently be implemented in a standard networked environment. © 1998 John Wiley & Sons, Ltd.
16.
The concepts of semistability and exponential semistability are well-developed for finite-dimensional systems with nonisolated equilibrium points, where asymptotic or exponential stability is not possible. Definitions of semistability and exponential semistability have recently been formulated for networks with time-delays. This paper further extends the semistability theory to continuous and discrete spatially distributed systems. This requires the definition of the notions of exact and approximate semicontrollability and semiobservability, and discrete approximate semicontrollability and semiobservability. Also introduced is the property of weak semistability. Necessary and sufficient conditions are given for exponential semistability and weak semistability, and sufficient conditions are given for semistability.
17.
18.
Hard real-time systems often require results to be produced on time despite the occurrence of processor failures. This paper considers a distributed system where tasks are periodic and each task occurs in multiple copies which are periodically synchronized in order to handle failures. The problem of preemptively scheduling a set of such tasks is discussed, where every occurrence of a task has to be completely executed before the next occurrence of the same task. First, a static scheduling algorithm is proposed which uses periodic checkpoints to tolerate processor failures. Then, the performance of the algorithm is substantially improved by employing a mixed strategy which constructs a schedule where high-frequency tasks are duplicated and low-frequency tasks are periodically checkpointed. The performance of the proposed solution is evaluated in terms of the minimum achievable processor utilization due to the useful computation of the tasks. Moreover, analytical and simulation studies are used to reveal interesting trade-offs associated with the scheduling algorithm. In particular, if high-frequency tasks make up less than 70 percent of the total number of tasks, then the mixed strategy yields a higher processor utilization than the task duplication scheme.
19.
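The traditional architecture for computer application systems is the Client/Server model. This many-to-one style of information sharing makes server systems exceptionally complex and hard to maintain. With the sharp growth in demand for network services, the traditional information-sharing model can no longer meet the needs of new business, so distributed application system technology (the B/S architecture) is gradually becoming the preferred structure for network service systems.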
20.
A network partition can break a distributed computing system into groups of isolated nodes. When this occurs, a mutual exclusion mechanism may be required to ensure that isolated groups do not concurrently perform conflicting operations. We study and formalize these mechanisms in three basic scenarios: where there is a single conflicting type of action; where there are two conflicting types, but operations of the same type do not conflict; and where there are two conflicting types, but operations of one type do not conflict among themselves. For each scenario, we present applications that require mutual exclusion (e.g., name servers, termination protocols, concurrency control). In each case, we also present mutual exclusion mechanisms that are more general and that may provide higher reliability than the voting mechanisms that have been proposed as solutions to this problem.
Daniel Barbara is a graduate student in the Computer Science Department at Princeton University and expects to receive his Ph.D. degree by July 1985. He obtained his BS in Electrical Engineering at the Universidad Metropolitana, Caracas, Venezuela in 1975. His research interests are Distributed Systems, Databases and Computer Networks. He is a member of IEEE and ACM.
Hector Garcia-Molina is associate professor in the Department of Computer Science at Princeton University, Princeton, New Jersey. His research interests include distributed computing systems and database systems. He received a BS in electrical engineering from the Instituto Tecnologico de Monterrey, Mexico, in 1974. From Stanford University, Stanford, California, he received an MS in electrical engineering in 1975 and a PhD in computer science in 1979. Garcia-Molina is a member of the ACM and IEEE.
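For context on the network-partition abstract above, the voting mechanisms it takes as its baseline can be sketched as a strict-majority test. This illustrates only simple weighted voting, not the more general mechanisms the paper proposes; the function name `has_quorum`, the node names, and the vote weights are assumptions.

```python
def has_quorum(group: set[str], votes: dict[str, int]) -> bool:
    """A group may act only if it holds a strict majority of all votes."""
    total = sum(votes.values())
    held = sum(votes[node] for node in group)
    return 2 * held > total


# Hypothetical five-node system with one vote per node.
votes = {"a": 1, "b": 1, "c": 1, "d": 1, "e": 1}

# A partition splits the nodes into two isolated groups.
left, right = {"a", "b", "c"}, {"d", "e"}
left_ok = has_quorum(left, votes)    # 3 of 5 votes
right_ok = has_quorum(right, votes)  # 2 of 5 votes
```

Two disjoint groups cannot both hold more than half of the votes, so at most one side of any partition may perform the conflicting operation. With an even vote total, a split down the middle leaves neither side able to proceed.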