Similar literature
 19 similar documents found
1.
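This paper proposes a highly reliable distributed computer system based on fail-silent node fault behavior, and presents two effective ways to reduce node fault detection latency in such a system: exploiting the limited fault detection capability of each single machine, and adding a comparison of task characteristic states. Node fault diagnosis techniques are used to isolate faulty nodes from correct ones, and node fault recovery and system reconfiguration are then used to improve the reliability of the whole system.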

2.
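Distributed fault diagnosis of link faults   Total citations: 2, self-citations: 0, citations by others: 2
1. Introduction. In a distributed system, fault diagnosis should cover both faulty node machines and faulty links. Diagnosis of faulty links plays an important role in system-level fault diagnosis of distributed systems: it is not only a goal of such diagnosis but also a prerequisite for it, because a link fault in a system makes the diagnosis of faulty node machines difficult. Not all existing diagnosis algorithms take link faults into account. In a distributed system, any two node machines can communicate and can therefore test each other, so the test graph of a real distributed system may have an arbitrary topology. Nevertheless, link faults are generally considered only when the actual topology of the system is distributed or multi-bus. In a single-bus system a link fault is fatal and makes fault diagnosis of the whole system impossible, so link faults are generally not considered there.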

3.
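This paper proposes a system-level probabilistic distributed fault diagnosis algorithm that can diagnose node machine faults and link faults in local area networks and similar systems. The algorithm removes the requirement, found in existing results, that the number of faulty nodes in the network not exceed t (t…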

4.
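This paper presents a distributed synchronization algorithm for decentralized resource management in distributed computer systems. The algorithm defines a global variable, the virtual timestamp, on which all nodes in the mutual exclusion region operate jointly. It also introduces a timeout mechanism to handle failed nodes, which gives the algorithm fault tolerance and the distributed system robustness. The algorithm was simulated and debugged on a PDP-11/03 computer and on the THUDS experimental distributed computer system (two machines). The results show that the algorithm has good control performance.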

5.
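System-level fault diagnosis is an important means of ensuring the security and stability of computer systems and networks. This paper proposes a practical broadcast-based distributed diagnosis algorithm and implements it in a local area network. Experimental results show that the algorithm can be used effectively for distributed fault diagnosis in local area networks.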

6.
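A sensor network node is a computer system built on an embedded device, with strict requirements on functionality, reliability, cost, size, and power consumption. Every task, device, and network connection in the system needs sufficient storage space, so choosing a suitable memory management algorithm that uses storage efficiently is an important part of improving system performance. Taking SNNEOS, an independently developed sensor node operating system, as background, this paper focuses on the allocation and reclamation algorithms for memory management in a sensor network node operating system, and provides an effective solution for memory management on sensor nodes.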

7.
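Application of Internet-based application-layer multicast to streaming media   Total citations: 3, self-citations: 0, citations by others: 3
Based on the characteristics of streaming media, this paper describes a proposed architecture for applying application-layer multicast on the Internet to streaming media, together with a distributed algorithm for building an application-layer multicast tree with maximum average receiving rate. In the proposed architecture, a central node is added to avoid complex queries in the system, to detect and repair node failures promptly, and to compensate for scalability through efficient exchange of control messages with members. In the distributed optimization algorithm, the tree is built according to the members' "capability" and the maximum average receiving rate, which makes the multicast tree more stable and robust and improves its adaptivity. The estimation of the algorithm's state variables is also described.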

8.
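In parallel computer systems, broadcast is one of the most important communication patterns. Based on the concept of connectivity of k-Mesh subnets (subcubes), this paper proposes a distributed, local-information-based fault-tolerant broadcast routing algorithm for three-dimensional mesh networks. Using the state information of neighboring nodes, the algorithm dynamically builds a broadcast tree whose nodes are individual k-Mesh subnets; this tree can tolerate a large number of node failures. Simulation results show that the number of broadcast time steps is close to optimal. The algorithm requires each node to know only the state of its neighbors, not the state of the whole network. In other words, it relies on local information only, which makes it of considerable practical value.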

9.
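Peng Lingling. 《电脑学习》, 2009, (3): 134-135
To keep a distributed network operating efficiently, a dynamic adaptive load balancing algorithm is selected from among several commonly used load balancing algorithms. It distinguishes lightly loaded nodes from heavily loaded nodes according to each node's load.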

10.
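Load balancing in distributed and parallel systems is an important factor affecting system performance. This paper proposes a prediction-based dynamic load balancing algorithm. Using local load information, the algorithm predicts when a node will become idle and issues a task request before the node reaches the idle state, so that all nodes in the system stay busy, which improves resource utilization and system performance.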
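The request-before-idle idea can be illustrated with a minimal sketch. The abstract does not say how the idle time is predicted, so the estimate below (remaining work divided by a constant service rate) and all names (`predicted_time_to_idle`, `should_request_tasks`, `request_latency`) are assumptions made purely for illustration.

```python
def predicted_time_to_idle(pending_work, service_rate):
    """Estimate the time until the local queue drains.

    pending_work: remaining work of each queued task (hypothetical units).
    service_rate: work units processed per time unit (assumed constant).
    """
    if service_rate <= 0:
        raise ValueError("service_rate must be positive")
    return sum(pending_work) / service_rate


def should_request_tasks(pending_work, service_rate, request_latency):
    """Ask for more work once the predicted idle time is within the
    time a task request is assumed to take to be served."""
    return predicted_time_to_idle(pending_work, service_rate) <= request_latency
```

A node would call `should_request_tasks` whenever its local load changes; issuing the request early means new tasks can arrive at about the moment the queue would otherwise run dry.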

11.
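Under the PMC fault model, existing adaptive sequential diagnosis algorithms (ASD algorithms) cannot make full use of all test results. To reduce the number of tests and improve diagnosis efficiency, a new adaptive sequential diagnosis algorithm (the NASD algorithm) is proposed. The concept of a relatively faulty unit is introduced, and theorems for identifying faulty and fault-free units are stated and proved. On this basis, the following diagnosis strategy is given: (1) confirm faulty units while searching for fault-free units; (2) units already confirmed as faulty take part in no further tests; (3) system diagnosis ends when a fault-free unit is found or when the number of faulty units approaches one half. Examples show that the NASD algorithm outperforms other ASD algorithms.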
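For readers unfamiliar with the PMC model, the sketch below simulates its test semantics: a fault-free tester reports the true state of the tested unit, while a faulty tester's report is arbitrary. It is not the NASD algorithm itself. It only shows why finding one fault-free unit is enough to classify every other unit. The function names and the `faulty` ground-truth set (used only to simulate outcomes) are illustrative assumptions.

```python
import random


def pmc_test(tester, tested, faulty, rng=random):
    """Simulate one PMC test; returns 0 (pass) or 1 (fail).

    A fault-free tester always reports the tested unit correctly.
    A faulty tester may report anything, modelled here as a coin flip.
    """
    if tester in faulty:
        return rng.choice([0, 1])
    return 1 if tested in faulty else 0


def diagnose_with_trusted_unit(trusted, units, faulty, rng=random):
    """Given one unit already known to be fault-free, test every other
    unit with it and return the set of units it reports as faulty."""
    return {
        u for u in units
        if u != trusted and pmc_test(trusted, u, faulty, rng) == 1
    }
```

With units `0..4`, true fault set `{1, 3}`, and unit `0` known to be fault-free, `diagnose_with_trusted_unit(0, range(5), {1, 3})` recovers exactly the faulty units.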

12.
A distributed system is self-stabilizing if it can be started in any possible global state. Once started, the system regains its consistency by itself, without any kind of outside intervention. The self-stabilization property makes the system tolerant to faults in which processors exhibit faulty behavior for a while and then recover spontaneously in an arbitrary state. When the intermediate period between one recovery and the next faulty period is long enough, the system stabilizes. A distributed system is uniform if all processors with the same number of neighbors are identical. A distributed system is dynamic if it can tolerate addition or deletion of processors and links without reinitialization. In this work, we study uniform dynamic self-stabilizing protocols for leader election under read/write atomicity. Our protocols use randomization to break symmetry. The leader election protocol stabilizes in O(ΔD log n) time when the number of processors is unknown and in O(ΔD) time otherwise. Here Δ denotes the maximal degree of a node, D denotes the diameter of the graph, and n denotes the number of processors in the graph. We introduce self-stabilizing protocols for synchronization that are used as building blocks by the leader-election algorithm. We conclude this work by presenting a simple, uniform, self-stabilizing ranking protocol.

13.
We give efficient algorithms for distributed computation on oriented, anonymous, asynchronous hypercubes with possible faulty components (i.e. processors and links) and deterministic processors. Initially, the processors know only the size of the network and that they are inter-connected in a hypercube topology. Faults may occur only before the start of the computation (and that despite this the hypercube remains a connected network). However, the processors do not know where these faults are located. As a measure of complexity we use the total number of bits transmitted during the execution of the algorithm and we concentrate on giving algorithms that will minimize this number of bits. The main result of this paper is an algorithm for computing Boolean functions on anonymous hypercubes with bit cost , where is the number of faulty components (i.e. links plus processors), is the number of links which are either faulty, or non-faulty but adjacent to faulty processors, and is the diameter of the hypercube with faulty components. Received: October 1992 / Accepted: April 2001

14.
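A new system-level fault diagnosis algorithm for hypercube parallel computers is proposed. Compared with existing diagnosis algorithms, it can correctly locate all faulty processors even when the system contains a relatively large number of them, at the cost of misdiagnosing at most three fault-free processors. In addition, the time complexity of the algorithm is comparable to that of the best existing algorithm.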

15.
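To improve the repair efficiency of failed nodes in distributed storage systems, a new construction algorithm for fractional repetition (FR) codes is proposed. The algorithm builds the code from a factorization of a complete graph, and the resulting code is called the CGFBFR (complete graph factorization based FR) code. The algorithm first factorizes the complete graph and determines the number of factors obtained. It then selects a number of factors according to the required repetition degree of the stored data blocks, takes all vertices of the selected factors as the data blocks to be stored in the distributed storage system, and labels the edges of the selected factors, with each labeled edge stored as a distributed data node. Finally, a coding matrix is generated from the vertices and edges of the selected factors, and the data blocks are stored in the distributed storage system according to this matrix. Simulation results show that, compared with Reed-Solomon (RS) codes, simple regenerating codes (SRC), and the recent cyclic variable fractional repetition (VFR) codes used in distributed storage systems, the proposed construction repairs failed nodes quickly and effectively reduces repair bandwidth overhead, repair locality, and repair complexity. The construction is also simple and allows a flexible choice of parameters, which makes it widely applicable to distributed storage systems.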
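One possible reading of the construction can be sketched as follows. The sketch assumes that the factors are 1-factors (perfect matchings) of the complete graph K_n with n even, obtained with the standard round-robin method, that the first ρ factors are selected, and that each selected edge becomes a storage node holding the two data blocks at its endpoints. The abstract does not fix these details, so the function names and the choice of factorization are assumptions rather than the authors' exact procedure.

```python
def one_factorization(n):
    """Round-robin 1-factorization of K_n (n even): returns n-1 perfect
    matchings that together cover every edge exactly once."""
    if n < 2 or n % 2:
        raise ValueError("n must be an even integer >= 2")
    m = n - 1  # vertices 0..m-1 sit on a circle, vertex m is the hub
    factors = []
    for r in range(m):
        matching = [(r, m)]
        for k in range(1, n // 2):
            matching.append(((r + k) % m, (r - k) % m))
        factors.append(matching)
    return factors


def build_fr_code(n, rho):
    """Select the first rho factors; each edge is one storage node that
    stores the two data blocks (vertices) it joins."""
    factors = one_factorization(n)
    if not 1 <= rho <= len(factors):
        raise ValueError("rho must be between 1 and n-1")
    return [set(edge) for matching in factors[:rho] for edge in matching]
```

Under these assumptions, `build_fr_code(6, 3)` yields 9 storage nodes of two blocks each, and every one of the 6 data blocks is stored exactly 3 times, because each vertex appears once in every perfect matching.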

16.

In this article, we consider the problem of self-diagnosis of multiprocessor and multicomputer systems under the generalized comparison model. In this approach, a system consists of a collection of n independent heterogeneous processors (or units) interconnected via point-to-point communication links, and it is assumed that at most t of these processors are permanently faulty. For the purpose of diagnosis, system tasks are assigned to pairs of processors and the results are compared. The agreements and disagreements among units are the basis for identifying faulty processors. Such a system is said to be t-diagnosable if, given any complete collection of comparison results, the set of faulty processors can be unambiguously identified. We present an efficient fault identification method based on genetic algorithms. Analysis and simulations are provided, first, to evaluate the genetic parameters of the diagnosis algorithm and, second, to show the efficiency of the genetic approach. The new strategy is shown to correctly identify the set of faulty processors, making it an attractive and viable addition or alternative to present fault diagnosis techniques.
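To make the notion of identifying faults from comparison results concrete, here is a small sketch under a simplified comparison rule, which is an assumption and not the article's generalized comparison model: two fault-free units always agree (outcome 0), a fault-free and a faulty unit always disagree (outcome 1), and two faulty units may produce either outcome. The article searches with a genetic algorithm; the exhaustive search below is a stand-in that is only feasible for very small systems. All names are hypothetical.

```python
from itertools import combinations


def is_consistent(fault_set, syndrome):
    """Check a candidate fault set against the comparison outcomes.

    syndrome maps a pair (u, v) to 0 (agree) or 1 (disagree).
    """
    for (u, v), outcome in syndrome.items():
        u_bad, v_bad = u in fault_set, v in fault_set
        if u_bad and v_bad:
            continue  # two faulty units: any outcome is possible
        expected = 1 if (u_bad or v_bad) else 0
        if outcome != expected:
            return False
    return True


def consistent_fault_sets(units, syndrome, t):
    """Enumerate every fault set of size at most t that explains the syndrome."""
    return [
        set(candidate)
        for size in range(t + 1)
        for candidate in combinations(units, size)
        if is_consistent(set(candidate), syndrome)
    ]
```

For five units compared around a ring with unit 2 faulty, the syndrome `{(0, 1): 0, (1, 2): 1, (2, 3): 1, (3, 4): 0, (4, 0): 0}` and `t = 1` leave `{2}` as the only consistent fault set, which is what unambiguous identification means in this setting.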

17.
We present a multi-heuristic evolutionary task allocation algorithm to dynamically map tasks to processors in a heterogeneous distributed system. It utilizes a genetic algorithm, combined with eight common heuristics, in an effort to minimize the total execution time. It operates on batches of unmapped tasks and can preemptively remap tasks to processors. The algorithm has been implemented on a Java distributed system and evaluated with a set of six problems from the areas of bioinformatics, biomedical engineering, computer science and cryptography. Experiments using up to 150 heterogeneous processors show that the algorithm achieves better efficiency than other state-of-the-art heuristic algorithms.

18.
Fault tolerance is an important design criterion for reliable and robust video-on-demand systems. Conventional fault-tolerant designs use either a primary backup or an active replication method to provide system fault tolerance. However, these approaches suffer from low utilization of the backup or replication system. In this paper we propose two playback-recovery schemes for distributed video-on-demand systems called the forward playback-recovery scheme and the backward playback-recovery scheme. Unlike conventional fault-tolerant designs, our schemes use existing playback resources to recover faulty playbacks without allocating new resources, significantly reducing recovery overhead. To use the schemes effectively, we developed a distributed algorithm for determining the order and gap information between the playbacks on the distributed video-on-demand servers so that overhead for recovering from a server failure can be minimized. This algorithm achieves N – 1 fault-tolerant resiliency for N-server video-on-demand systems. In addition, three server-recovery policies are also presented to guide surviving servers in applying the proper scheme to recover faulty playbacks, thus reducing overall recovery costs. Simulation results show that the proposed recovery schemes are effective and useful in designing fault-tolerant multiple-server video-on-demand systems.

19.
We present a comparison-based algorithm for identifying faulty and fault-free elements in a wafer-scale linear array of processors (or other logic elements). Only nearest neighbor communication is assumed to be possible between the processors in the array. Because the algorithm is simple and requires no storage of test vectors or test outcomes, it is ideally suited for implementation on the wafer to provide the capability for built-in production (or post production) testing. We show that, surprisingly, this method achieves high accuracy of diagnosis over a wide range of yields even though the diagnosis may be based on a high proportion of results produced by faulty processors. We also propose an improvement to the above algorithm which uses a processor diagnosed as fault-free by the basic algorithm as the starting point in improving the accuracy with which faulty processors are identified. Quantitative and qualitative reasoning validate the efficiency of these schemes.
