Similar literature
 Found 20 similar articles (search time: 15 ms)
1.
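Efficient fault-tolerance techniques are essential for improving the reliability of multiprocessor systems. The torus is an important interconnection network for multiprocessor arrays, yet fault-tolerant reconfiguration of torus-connected processor arrays has not been studied so far. Exploiting the particular connectivity of the torus, this paper reduces the reconfiguration problem to finding a maximum independent set in a conflict graph. Each vertex of the conflict graph represents a replacement plan for a faulty processor, and each edge indicates that two replacement plans cannot coexist. For three different distributions of spare processors, algorithms are designed to generate the conflict graph, to solve for the maximum independent set, and to construct the logical processor array from the independent set, with satisfactory results. Experiments show that when the array is small or the fault rate is low, the one-row-one-column and cross-shaped spare distributions reconfigure better; as the array size or fault rate grows, the reconfiguration success rate of all three distributions falls, but it can be improved by adding spares and adjusting their distribution. The experiments also show that torus-connected processor arrays tolerate faults clearly better than mesh-connected processor arrays.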
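The reduction described in this abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the conflict rule used here (two plans conflict when they repair the same faulty processor or claim the same spare) is a simplifying assumption, the names `build_conflict_graph`, `maximum_independent_set` and `reconfigurable` are hypothetical, and the exhaustive search is only practical for very small graphs.

```python
from itertools import combinations

def build_conflict_graph(plans):
    """plans maps a plan id to (faulty_processor, spare_processor).

    Assumed conflict rule: two plans cannot coexist if they repair the
    same faulty processor or use the same spare.
    """
    edges = set()
    for a, b in combinations(sorted(plans), 2):
        fault_a, spare_a = plans[a]
        fault_b, spare_b = plans[b]
        if fault_a == fault_b or spare_a == spare_b:
            edges.add((a, b))
    return edges

def maximum_independent_set(nodes, edges):
    """Exhaustive search, largest subsets first (exponential time)."""
    nodes = sorted(nodes)
    for size in range(len(nodes), 0, -1):
        for subset in combinations(nodes, size):
            chosen = set(subset)
            if not any(u in chosen and v in chosen for u, v in edges):
                return list(subset)
    return []

def reconfigurable(plans, faults):
    """The array is repairable if the chosen plans cover every fault."""
    edges = build_conflict_graph(plans)
    chosen = maximum_independent_set(plans.keys(), edges)
    repaired = {plans[p][0] for p in chosen}
    return repaired == set(faults), chosen

plans = {"p1": ("f1", "s1"), "p2": ("f1", "s2"), "p3": ("f2", "s1")}
ok, chosen = reconfigurable(plans, ["f1", "f2"])
print(ok, chosen)
```

In this toy instance, plan `p1` conflicts with both other plans, so the largest conflict-free selection is `p2` and `p3`, which together repair both faults.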

2.
Mobile agents create a new paradigm for data exchange and resource sharing in rapidly growing and continually changing computer networks. In a distributed system, failures can occur in any software or hardware component. A mobile agent can get lost when its hosting server crashes during execution, or it can get dropped in a congested network. Therefore, survivability and fault tolerance are vital issues for deploying mobile-agent systems. This fault tolerance approach deploys three kinds of cooperating agents to detect server and agent failures and recover services in mobile-agent systems. An actual agent is a common mobile agent that performs specific computations for its owner. Witness agents monitor the actual agent and detect whether it is lost. A probe recovers the failed actual agent and the witness agents. A peer-to-peer message-passing mechanism stands between each actual agent and its witness agents to perform failure detection and recovery through time-bounded information exchange; a log records the actual agent's actions. When failures occur, the system performs rollback recovery to abort uncommitted actions. Moreover, the method uses checkpointed data to recover the lost actual agent.

3.
陈国林  章立生 《计算机应用》2005,25(8):1916-1918,1922
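A complete embedded system is built inside an FPGA from IP soft cores, implementing a triple modular redundancy (TMR) fault-tolerance scheme in which three MicroBlaze soft-core CPUs vote on the result. The μC/OS-II operating system and the application are also modified to add fault-tolerance features such as error detection and correction (EDAC) and function stack protection. Experiments show that the system reduces the impact of single event upsets (SEUs) on the device itself and on the memory modules.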
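The voting step of a TMR scheme like the one in this abstract can be sketched in a few lines of Python. This is a generic software illustration, not the FPGA voter from the paper; the function names `tmr_vote` and `bitwise_vote` are hypothetical.

```python
from collections import Counter

def tmr_vote(a, b, c):
    """Return the value produced by at least two of the three replicas."""
    value, count = Counter([a, b, c]).most_common(1)[0]
    if count >= 2:
        return value
    raise ValueError("no majority: all three replicas disagree")

def bitwise_vote(a, b, c):
    """Per-bit majority of three integers, as a hardware voter computes it."""
    return (a & b) | (a & c) | (b & c)

# One replica suffers a single bit flip; the voter masks it.
print(tmr_vote(5, 5, 7))
print(bin(bitwise_vote(0b1010, 0b1011, 0b1010)))
```

The bitwise form masks a different upset bit in each replica at the same time, whereas the word-level form needs two replicas to agree on the whole value.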

4.
The development of an operating system that is a central component of a fault-tolerant multiprocessor is described. The operating system, while relatively simple and small, supports multitasking and multiprocessing, as well as both self-diagnostics and cross-diagnostics for fault detection. In the event of a fault, the system permits rapid reconfiguration in a manner that retains processing for the highest-priority tasks. Since the hardware needed to provide fault tolerance is available when there are no faults, the operating system can utilize this excess capacity to accomplish lower-priority tasks during normal operation. This approach yields graceful degradation in response to faults in the system components.

5.
An examination of the structure of fault-tolerant systems incorporating backward error recovery indicates a partitioning into two broad classes. Two canonical models, each representing a particular class of systems, have been constructed. The first model incorporates objects and actions as the entities for program construction whereas the second model employs communicating processes and conversations. Applications in areas such as office information and banking systems are typically described and built in terms of the first model whereas applications in the area of process control are usually described and built in terms of the second model. The paper claims that the two models are duals of each other and presents arguments and examples to substantiate this claim. It will be shown that the techniques that have been developed within the context of one model turn out to have interesting and hitherto unexplored duals in the other model.

6.
牛跃华  赵文彦 《计算机应用》2014,34(9):2497-2500
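Current SpaceWire bus applications mostly connect several node devices to a single router, forming a small star network, whereas SpaceWire network systems for complex spacecraft with many nodes have received little study. To meet the high reliability requirements of spacecraft, a fault-tolerant design with a bus-type network topology is proposed. The network's operating mode, multi-level redundancy and fault-tolerance mechanisms, system reliability and network transmission performance are analyzed and derived; the results show that the proposed topology meets the requirements of spaceborne applications. Finally, design guidelines for packet length, link rate and node layout in SpaceWire network systems are derived from the analysis.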

7.
We present an integrated computer-aided design environment, the PrT (predicate/transition) net system, in order to systematically introduce fault-tolerant properties into the design of complicated digital systems. This is accomplished by exploiting a formal specification of the system requirements in which the amount of necessary redundancy can be determined. The system is based on an integration of PrT nets with regular expressions. PrT nets are used to describe and analyze a high-level system, and regular expressions are used to describe and analyze the more detailed system structures. Both models provide us with well-defined levels of fault diagnosis needed in the digital system design. An S-invariant technique can be used to check the constancy of PrT nets, and a finite state automaton can be used to check the acceptability of regular expressions. Furthermore, the regular expression can also enable a system designer to determine redundancy in order to perform error correction. In consequence, our approach is superior to the current techniques for requirements analysis. Finally, the main results are presented in the form of four propositions and supported by some experiments.

8.
The e-health domain aims to assist and manage citizens' health. It concerns many actors such as patients, doctors, hospitals and administration. Current and forthcoming generations of applications will be web based and will integrate more and more mobile devices. In such an application domain, called m-health, dependability is a key notion. In addition, more and more functionalities of such systems will be implemented as services to provide qualities like adaptability and maintainability. This paper presents, through a case study, how we can analyse and design an application that controls insulin injection and that is embedded in a mobile device belonging to an e-health Web Information System (WIS). In order to ensure the dependability of the control systems, we show how to use Coordinated Atomic Actions (CAA). CAAs provide well-defined concepts for fault tolerance, error detection and error recovery in a distributed context where competitive and cooperative concurrencies are considered. The combination of CAA and SOA at design level is proposed, with the purpose of being able to design systems that are partially implemented using service-oriented technologies. We updated and used our implementation framework, called CAA-DRIP, which originally was not tailored for service-oriented mobile applications. Thus, in this paper, we also propose an adaptation of CAA-DRIP for mobile devices.

9.
This paper presents an online diagnosable fault-tolerant system: the N-unit t-fault tolerable system. The number of units N in the system can be either odd or even. The relationship between N and t (the number of faulty units which can be tolerated) is presented. The approach to an optimum N-unit t-fault tolerable system is also given. As an example, a 4-unit 2-fault tolerable system is discussed. The reliability and mean time to failure of the 4-unit 2-fault tolerable system are shown to be higher than those of 5MR (5-modular redundancy) and TMR (triple modular redundancy) systems. The hardware of a 4-unit t-fault tolerable system is simpler than that of 5MR. The complexity of the switching circuit for an N-unit t-fault tolerable system increases only linearly with respect to the number of modules. Our scheme is also simpler than the hybrid redundancy system. Some theorems for the online diagnosis of N-unit t-fault systems are given and proved.
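The reliability comparison in this abstract can be checked numerically with the standard k-out-of-n formula. The sketch below is not taken from the paper: it assumes independent units with identical reliability `r` and perfect fault diagnosis and switching, so that the 4-unit 2-fault tolerable system works whenever at least 2 of its 4 units work. The function name `k_of_n_reliability` is hypothetical.

```python
from math import comb

def k_of_n_reliability(n, k, r):
    """Probability that at least k of n independent units work,
    where each unit works with probability r."""
    return sum(comb(n, i) * r**i * (1 - r)**(n - i) for i in range(k, n + 1))

r = 0.9
tmr = k_of_n_reliability(3, 2, r)        # majority of 3
five_mr = k_of_n_reliability(5, 3, r)    # majority of 5
four_unit = k_of_n_reliability(4, 2, r)  # tolerates 2 faults out of 4 (assumed model)
print(round(tmr, 5), round(five_mr, 5), round(four_unit, 5))
```

Under these assumptions the 4-unit system comes out ahead of both voting schemes at `r = 0.9`, which matches the ordering the abstract reports; imperfect diagnosis or switching would lower the 4-unit figure.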

10.
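The robust fault-tolerant H∞ control problem with D-stability for uncertain systems is studied. Based on a continuous actuator fault model, linear matrix inequalities (LMIs) are used to give a sufficient condition for the existence of a robust fault-tolerant output-feedback controller that makes the system D-stable, and the design of the dynamic output-feedback controller is reduced to solving a family of LMIs. A simulation example shows that, whether or not actuator faults occur, the resulting dynamic output-feedback controller not only keeps the closed-loop system D-stable but also meets the prescribed H∞ disturbance-attenuation level, which validates the proposed controller design method.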

11.
《Computers in Industry》1986,7(4):351-359
The experimental microcomputer system FIPS represents a part of a distributed process automation system. It was developed to investigate the possibilities of implementing fault tolerance with respect to the specific requirements of industrial process automation (fairly independent and flexibly extensible local subsystems). The present paper describes the synchronization tools of the corresponding real-time operating system REX, which support parallel co-operating jobs on different subsystems that compare or vote on process output signals. Furthermore, it shows a method to inform the application programmer about a job restart, allowing him to react according to the requirements of the technical process and of his control algorithm.

12.
Fault-tolerant control combines fault detection and isolation techniques with supervisory control, to achieve the autonomous accommodation of faults before they develop into failures. While fault detection and isolation (FDI) methods have matured during the past decade, the extension to fault-tolerant control is a fairly new area. This paper presents a ship propulsion system as a benchmark that should be useful as a platform for the development of new ideas and a comparison of methods. The benchmark has two main elements. One is the development of efficient FDI algorithms, and the other is the analysis and implementation of autonomous fault accommodation. A benchmark kit can be obtained from the authors.

13.
万玮  杨志义 《计算机工程与设计》2005,26(10):2811-2813,2816
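To improve the reliability of distributed computing cluster systems and strengthen their fault tolerance, so that the system keeps running stably when local errors occur, a fault-tolerant system model is built. The model uses a two-level fault-tolerance mechanism: node-level fault tolerance and task-level fault tolerance. It lays a foundation for further research on fault tolerance in distributed computing cluster systems.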

14.
15.
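For the fault-tolerant control allocation problem of the propulsion system of an underwater vehicle (UUV), this paper proposes a hybrid algorithm based on singular value decomposition (SVD) and fixed-point allocation. Compared with traditional methods, it avoids computing the pseudo-inverse matrix, which reduces the computational load, and it satisfies the thruster saturation constraints. Simulation experiments on the propulsion system model of an underwater experimental platform verify the correctness and effectiveness of the algorithm.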

16.
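To meet the reliability requirements of electric-power DC monitoring systems, a real-time dual-machine embedded fault-tolerant system is designed. The fault-tolerant design is based on μC/OS-II, a real-time multitasking operating system with a preemptive kernel; the kernel scheduler is modified, and the schedulability of the fault-tolerant tasks is discussed and verified. Reliability test results show that the dual-machine fault-tolerant system meets practical requirements.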

17.
This paper addresses the design of an event-driven observer-based fault-tolerant controller for a state-dependent system with external disturbance and fault. An event-driven criterion is proposed to determine when the controller is updated, based on the state of the Luenberger-type state-dependent observer. As a result, communication resources can be saved significantly while the desired H∞ performance is preserved. The observer error closed-loop system is rewritten as a time-varying delay system. By employing a state-dependent integral function as a Lyapunov function candidate, the error system is proved to be asymptotically stable. The observer gain, the controller gain and the event parameters in the event condition can be co-designed and obtained as the solution to a set of linear matrix inequalities (LMIs). Finally, a numerical example and the tunnel diode circuit model show that the proposed method is effective, and the simulation results indicate that the event-triggered scheme yields a larger release period than the time-triggered scheme.

18.
This paper describes the architecture, fundamental principle and implementation of a distributed fault-tolerant system, DFTSNA. Its objective is to combine extreme reliability with high availability in a shipboard environment. Multi-level fault tolerance is considered and several special-purpose hardware subsystems (F-T clusters) are developed. The physical and functional distribution of the system is emphasized to meet the stringent shipboard requirements. A number of algorithms are produced to support fault-tolerant operation.

19.
Fault-tolerant scheduling is an important issue for computational grid systems, as grids typically consist of strongly varying and geographically distributed resources. The main scheduling strategy of most fault-tolerant scheduling systems depends on the response time and fault index when selecting a resource to execute a certain job. In this paper, a scheduling system is presented that depends on a new factor, called the scheduling indicator, in selecting resources. This factor comprises the response time and the failure rate of grid resources. Whenever a grid scheduler has jobs to schedule on grid resources, it uses the scheduling indicator to generate the scheduling decisions. The main scheduling strategy of the system is to select resources that have the lowest tendency to fail. Extensive simulation experiments are conducted to quantify the performance of the proposed system. Experiments have shown that the proposed system can considerably improve grid performance in terms of throughput, unavailability, turnaround time, and fail tendency.
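The abstract does not give the formula for its scheduling indicator, so the following Python sketch is purely illustrative. It assumes a hypothetical indicator that is a weighted sum of the normalized response time and the failure rate, with the resource of lowest indicator selected; the names `scheduling_indicator`, `pick_resource` and the `weight` parameter are inventions for this example, not the paper's definitions.

```python
def scheduling_indicator(response_time, failure_rate, max_response_time, weight=0.5):
    """Hypothetical indicator: lower is better.

    weight balances normalized response time against failure rate
    (failure_rate is a probability in [0, 1]).
    """
    normalized_rt = response_time / max_response_time
    return weight * normalized_rt + (1 - weight) * failure_rate

def pick_resource(resources, weight=0.5):
    """Select the resource with the smallest indicator value."""
    max_rt = max(r["response_time"] for r in resources)
    return min(
        resources,
        key=lambda r: scheduling_indicator(
            r["response_time"], r["failure_rate"], max_rt, weight
        ),
    )

resources = [
    {"name": "A", "response_time": 10, "failure_rate": 0.30},
    {"name": "B", "response_time": 20, "failure_rate": 0.05},
]
print(pick_resource(resources, weight=0.5)["name"])
print(pick_resource(resources, weight=0.2)["name"])
```

With equal weighting the faster but less reliable resource wins; shifting the weight toward the failure rate selects the slower, more reliable one, which is closer to the "lowest tendency to fail" strategy the abstract describes.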

20.
The grid provides an integrated computer platform composed of differentiated and distributed systems. These resources are dynamic and heterogeneous. In this paper, a novel fault-tolerant grid-scheduling model is presented based on Stochastic Petri Nets (SPN) to capture the heterogeneity and dynamism of the grid system. Also, a new grid-scheduling strategy, the dependable strategy for the shortest expected accomplishing time (DSEAT), is put forward, in which the dependability factor is introduced in the task-dispatching strategy. Finally, the performance of the scheduling strategy based on the fault-tolerant grid-scheduling model is analyzed with a software package named SPNP. The numerical results show that dynamic resources increase the response time for all classes of tasks to differing degrees. Compared with the shortest expected accomplishing time (SEAT) strategy, the DSEAT strategy can reduce the negative effects of dynamic and autonomic resources to some extent so as to guarantee a high quality of service (QoS).
