Similar Literature
19 similar documents found.
1.
2.
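Hardware redundancy in fault-tolerance techniques incurs high design and production costs. To address this, an improved fault-tolerance optimization method for real-time embedded systems is proposed. Based on checkpointing, it jointly analyzes system fault behavior, the timing constraints of hard real-time tasks, and the utility-function values of soft real-time tasks. Building on the designed fault-tolerance model, the method computes the system failure probability to keep it within a maximum allowed value, uses hard-task deadlines to determine schedulability, and applies an improved tabu search algorithm to obtain the best utility-function value for soft tasks. The algorithm uses two simple neighborhood structures, and its tabu criterion is defined on neighborhood moves, which markedly improves optimization efficiency. Experimental results show that the method supports integrated analyses such as fault analysis and quickly obtains the maximum utility-function value.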

3.
4.
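Zhang Tanghua (张堂华). Fujian Computer (《福建电脑》), 2010, 26(7): 169-169
VxWorks, with its good reliability and excellent real-time performance, is widely used in communications, military, aviation, aerospace, and other cutting-edge fields with extremely demanding real-time requirements.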

5.
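After reviewing the traditional processing modes of real-time, high-reliability application systems, this paper describes the basic workflow, the fault-determination method, and the double-buffer switching technique used in a real-time multitasking duplex system built with cluster technology. Practice shows that the system offers fast switchover, high reliability, and simple implementation.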

6.
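The TPCQ Modeling Method for Real-Time Multitasking Systems   Total citations: 1, self-citations: 0, citations by others: 1
This paper proposes TPCQ, a modeling method for real-time multitasking systems based on timed Petri nets and cube queue networks. It is particularly suited to describing complex real-time multitasking systems that involve synchronization, communication, and cube-queue scheduling, and it can also describe general real-time multitasking systems. The paper discusses the TPCQ model and gives an example of building a TPCQ model of a real-time multitasking system.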

7.
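In many microcontroller applications where system resources are very tight, the overhead of using a real-time operating system for task scheduling to build a real-time multitasking system is often unacceptable. Easing the resource shortage by upgrading the hardware raises cost and reduces the product's competitiveness. This paper describes how to use Protothread to implement a real-time multitasking system with very low system overhead.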
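
As a rough illustration of the idea in this abstract (cooperative multitasking without a full RTOS), the sketch below uses Python generators as a stand-in for protothread-style tasks. It is not the Protothread C library and not the paper's implementation; the task names `blinker` and `sampler` and the scheduler `run` are hypothetical. Each task runs until its next `yield`, so no per-task stack or preemptive kernel is needed.

```python
from collections import deque


def blinker(log, period):
    """Hypothetical task: records a 'blink' every `period` scheduler ticks."""
    tick = 0
    while True:
        if tick % period == 0:
            log.append(("blink", tick))
        tick += 1
        yield  # hand control back to the scheduler (a cooperative wait point)


def sampler(log, count):
    """Hypothetical task: records `count` samples, one per tick, then ends."""
    for i in range(count):
        log.append(("sample", i))
        yield


def run(tasks, ticks):
    """Round-robin scheduler: on each tick, every live task runs to its next yield."""
    ready = deque(tasks)
    for _ in range(ticks):
        for _ in range(len(ready)):
            task = ready.popleft()
            try:
                next(task)
                ready.append(task)  # task is still alive, so requeue it
            except StopIteration:
                pass  # task finished, so drop it from the ready queue


log = []
run([blinker(log, 2), sampler(log, 3)], ticks=4)
```

Because tasks only switch at explicit `yield` points, a long-running step in one task delays all others; that trade-off is what keeps the scheduling overhead small.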

8.
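Introduces the BIOS kernel of the TMS320C2812, the implementation of real-time multitasking, the interface circuit for an extended external clock, and the internal structure of the AT49BV162. Drawing on the features of the TMS320C2812, it describes how a DSP reads and writes Flash data in an embedded system.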

9.
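A DSP-Based Real-Time Multitasking Embedded System   Total citations: 1, self-citations: 3, citations by others: 1
Introduces the BIOS kernel of the TMS320C2812, the implementation of real-time multitasking, the interface circuit for an extended external clock, and the internal structure of the AT49BV162. Drawing on the features of the TMS320C2812, it describes how a DSP reads and writes Flash data in an embedded system.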

10.
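This paper presents a strategy and method for real-time multitask scheduling in microcontroller application systems. Using a real-time system built on a 16-bit microcontroller (8098), it details the implementation of the scheduling strategy, interrupt management, and the real-time clock.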

11.
The main issues when supporting fault tolerance based on checkpointing and rollback recovery for high-performance applications are the scalability of the introduced support, the possibility of analyzing the induced overhead and, more generally, the optimization of the trade-off between failure-free and recovery performance. In this paper we describe our contribution to fault tolerance for high-level structured parallelism models. We take a different viewpoint from existing contributions by introducing a methodology to derive properties that support fault tolerance. We show how to apply this methodology to a general data-parallel model, deriving useful properties that lead to a class of checkpointing protocols. Thanks to this methodology, this class of protocols is not affected by the issues described above. We exemplify two checkpointing protocols and the related rollback recovery techniques. For each protocol we also derive cost models that statically describe the failure-free performance, which can be used for performance tuning or to target a Quality of Service parameter. To assess the innovation of the results, we analytically and experimentally compare the introduced protocols with two protocols from the literature. Results show that while the protocols introduced in this paper permit the definition of cost models and scale well, the literature protocols do not always have these properties. Copyright © 2010 John Wiley & Sons, Ltd.

12.
Building large-scale parallel computer systems for time-critical applications is a challenging task, since the designers of such systems need to consider a number of related factors such as proper support for fault tolerance, efficient task allocation and reallocation strategies, and scalability. In this paper we propose a massively parallel fault-tolerant architecture using hundreds or thousands of processors for critical applications with timing constraints. The proposed architecture is based on an interconnection network called the bisectional network. A bisectional network is isomorphic to a hypercube in that a binary hypercube network can be easily extended as a bisectional network by adding additional links. These additional links give the network rich topological properties such as node symmetry, small diameter, small internode distance, and partitionability. The important property of partitioning is exploited to propose a redundant task allocation and a task redistribution strategy under real-time constraints. The system is partitioned into symmetric regions (spheres) such that each sphere has a central control point. The central points, called fault control points (FCPs), are distributed throughout the entire system in an optimal fashion; they provide two-level task redundancy and efficiently redistribute the loads of failed nodes. FCPs are assigned to the processing nodes such that each node is assigned two types of FCPs for storing two redundant copies of every task present at the node. Similarly, the number of nodes assigned to each FCP is the same. For a failure-repair system environment, the performance of the proposed system has been evaluated and compared with a hypercube-based system. Simulation results indicate that the proposed system can yield improved performance in the presence of a high number of node failures.

13.
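Fault tolerance is unavoidable for large-scale parallel programs that run for long periods, and the addition of heterogeneous computing components to supercomputers makes the problem more complex. This paper examines application fault tolerance in heterogeneous parallel systems composed of CPUs and GPUs, and refactors the large-scale computational cosmology software WIGEON using the Charm++ parallel programming model and the CUDA parallel computing architecture. For fail-stop hardware faults in heterogeneous parallel systems, an application-level fault-tolerance mechanism based on in-memory checkpoints is designed and implemented. After computation resumes, it supports adaptive load adjustment for a changed CPU/GPU resource configuration. Experiments and analysis on the Mole8.5 high-performance computer verify the efficiency and feasibility of the heterogeneous fault-tolerance scheme; fault recovery takes only 1 to 4 s. In addition, distributed redundant data is used to improve Charm++'s existing in-memory checkpoint storage scheme: compared with the original Double-in-Memory mechanism, performance is unaffected and the extra memory usage is reduced by up to 50%.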

14.
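Yang Na (杨娜), Liu Jing (刘靖). Journal of Software (《软件学报》), 2019, 30(4): 1191-1202
Providing efficient and continuously available fault-tolerance services is essential to the reliable operation of cloud application systems. Adopting a fault-tolerance-as-a-service model, this paper proposes an optimized method for dynamically provisioning cloud fault-tolerance services. It describes the fault-tolerance requirements of cloud applications in terms of the reliability and response time of their components. Based on commonly used fault-tolerance techniques such as replication, checkpointing, and NVP (N-version programming), and taking full account of the overhead of dynamically switching fault-tolerance services, it gives optimal solution methods for available fault-tolerance-as-a-service provisioning schemes in two scenarios: when the underlying cloud resources supporting the fault-tolerance services are sufficient and when they are not. Experimental results show that the proposed method reduces both the fault-tolerance service fees paid by cloud application systems and the overhead of the underlying cloud resources, and improves the ability of a fault-tolerance service provider to deliver efficient and reliable fault tolerance as a service to multiple cloud applications.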

15.
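Failures of distributed simulation systems caused by node crashes or insufficient simulation resources reduce the reliability of the simulation system. To ensure fault-tolerance effectiveness while reducing fault-tolerance overhead, a virtualization-based fault-tolerance method for simulation systems is proposed; depending on where a fault occurs, it dynamically applies different fault-tolerance strategies to different fault types. The paper analyzes how to optimize the checkpointing strategy and gives the optimal checkpoint interval. Exploiting the advantages of virtualization, it addresses node selection, replica count, and replica placement for the replication strategy. It also introduces a fault-tolerance strategy based on virtual machine migration as a complement to checkpointing and replication in order to reduce overhead. A comparison of simulation experiment data for the dynamic and conventional fault-tolerance strategies shows that the dynamic strategy maintains fault-tolerance performance while keeping overhead low.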
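
The abstract mentions an optimal checkpoint interval but does not state the formula the authors use. As a hedged illustration only, the sketch below uses Young's classic first-order approximation, which may differ from the paper's result: with checkpoint cost C and mean time between failures M, the overhead per unit of useful work is roughly C/T + T/(2M), which is minimized at T = sqrt(2·C·M). The function names are hypothetical.

```python
import math


def young_interval(checkpoint_cost, mtbf):
    """Young's first-order approximation of the optimal checkpoint interval.

    checkpoint_cost: time to write one checkpoint (seconds).
    mtbf: mean time between failures (seconds).
    Assumption: checkpoint_cost is much smaller than mtbf.
    """
    if checkpoint_cost <= 0 or mtbf <= 0:
        raise ValueError("checkpoint_cost and mtbf must be positive")
    return math.sqrt(2.0 * checkpoint_cost * mtbf)


def overhead_fraction(interval, checkpoint_cost, mtbf):
    """First-order overhead model: checkpointing cost plus expected rework."""
    return checkpoint_cost / interval + interval / (2.0 * mtbf)


# Example: 2 s per checkpoint, one failure per hour on average.
best = young_interval(2.0, 3600.0)
```

Under this model, checkpointing more often than `best` wastes time writing checkpoints, while checkpointing less often loses more work per failure.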

16.
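Software Fault-Tolerance Design in RTEMS Embedded Systems   Total citations: 1, self-citations: 0, citations by others: 1
To improve the reliability of embedded systems in harsh environments, hardware fault-tolerance schemes such as dual-machine cold standby can be complemented by software fault-tolerance support at the real-time operating system level, which both reduces hardware resource overhead and markedly improves the system's fault-tolerance and error-correction capability without affecting its efficiency. To address the lack of software fault-tolerance support in the RTEMS real-time operating system, this paper designs a two-level software fault-tolerance scheme at the operating-system level, improving the reliability of embedded systems.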

17.
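This paper proposes and describes LF-CRSE (low latency and fault tolerance CRSE), a computing resource sharing environment that supports low-latency communication and fault tolerance. LF-CRSE introduces the notion of node functional roles and consists of client function nodes, task servers, worker service providers, and worker nodes, forming a scalable distributed network architecture. Strategies such as task caching, task prefetching, and computation on the task server side keep communication latency overhead low. At the application level, task partitioning in the branch-and-bound pattern lets LF-CRSE support a flexible programming model covering both master-slave and divide-and-conquer modes. Worker-side heartbeat messages and subtask-oriented fault tolerance ensure the correctness of LF-CRSE. A distributed traveling salesman problem with data dependencies was chosen for testing; the experimental results show that LF-CRSE's speedup increases steadily as workers are added, and that it also performs well in low-latency communication and fault tolerance.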

18.
Fault tolerance is especially important for computer systems that require a high degree of confidence. Computer Integrated Manufacturing (CIM) is an area where computer systems must not be disturbed by uncontrolled failures. This article deals with two problems that are related to fault tolerance and network partitions in automated manufacturing systems. The first problem relates to the distribution of information in partitioned data networks in CIM systems. We indicate how to overcome this problem by using the material network as a redundant data network. The second problem relates to fault detection and diagnosis in manufacturing systems. The problem is whether the indication of a fault means that a production unit itself has actually broken down, or that the indication is instead due to disturbances in the transmission of material; that is, the production unit continues to operate properly despite indications to the contrary. We describe how the material network can be used for detection and diagnosis.

19.
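Fault tolerance is an important measure of the reliability of multiprocessor interconnection networks. The g-restricted edge connectivity and g-restricted connectivity guarantee that the remaining components are disconnected from one another and that every node in each component has at least g neighbors, which allows the fault tolerance and reliability of multiprocessor and multichannel systems to be measured more precisely. The balanced hypercube is a variant of the hypercube whose favorable topological properties better meet the needs of multiprocessor systems and various new types of networks. This paper presents the {1,2}-restricted edge connectivity and {1,2}-restricted connectivity of the n-dimensional balanced hypercube, which enriches the evaluation framework for the fault tolerance and reliability of networks with balanced hypercube topology and lays a good foundation for fault-diagnosis algorithms on balanced hypercubes.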
