Similar literature
Found 20 similar articles; search took 31 ms
1.
Kim Jae Kwon, Kim Byung Kook. Real-Time Systems, 2004, 26(2): 199-222
To improve the reliability of real-time control systems, various fault-tolerance methods have been designed and implemented. We propose a highly reliable control system that combines modular and temporal redundancy, called dual-modular temporal redundancy (DMTR). Assuming that transient faults occur and recover with exponential probability distributions, we analyze the probabilistic schedulability of DMTR for multiple tasks with harmonic periods (DMTR-HP). After formulating a discrete-time reliability model for DMTR-HP, we develop an efficient recursive algorithm that rapidly computes the probabilistic schedulability of the overall system. Taking the checkpointing overhead of a DMTR-HP control system into account, we use the DMTR-HP reliability model to obtain the number of subslots that maximizes reliability. In addition, we compare the reliabilities of DMTR-HP, DMTR using GCDP scheduling (DMTR-GCDP), and conventional dual-modular redundancy (DMR).

2.
Primary/Backup has been well studied as an effective fault-tolerance technique. In this paper, with the objectives of tolerating a single permanent fault and maintaining system reliability with respect to transient faults, we study dynamic-priority energy-efficient fault-tolerant scheduling algorithms for periodic real-time tasks running on multiprocessor systems. The algorithms exploit the primary/backup technique while accounting for the negative effects of the widely deployed Dynamic Voltage and Frequency Scaling (DVFS) on transient faults. Specifically, by separating primary and backup tasks onto dedicated processors, we first devise two schemes based on the idea of Standby-Sparing (SS). In Paired-SS, processors are organized in groups of two (i.e., pairs), and the existing SS scheme is applied within each pair after tasks are partitioned among the pairs. In Generalized-SS, processors are divided into two groups (of potentially different sizes), denoted the primary and secondary processor groups. The main (backup) tasks are scheduled on the primary (secondary) processor group under partitioned-EDF (partitioned-EDL) with DVFS (DPM) to save energy. Moreover, we propose schemes that allocate primary and backup tasks in a mixed manner to better utilize system slack on all processors for further energy savings. On each processor, the Preference-Oriented Earliest Deadline (POED) scheduler runs primary tasks at scaled frequencies as soon as possible (ASAP) and backup tasks at the maximum frequency as late as possible (ALAP) to save energy. Our empirical evaluations show that, for systems with a given number of processors, there normally exists a Generalized-SS configuration with different numbers of processors in the primary and backup groups that saves more energy than the Paired-SS scheme. Moreover, the POED-based schemes normally perform more stably and achieve better energy savings.

3.
Energy-efficient task allocation and scheduling schemes with deterministic fault-tolerance capabilities are proposed for symmetric multiprocessor systems executing tasks with hard real-time constraints. The proposed heuristic is proven to achieve energy savings by optimally balancing the application workload among the processors in a system. Based on the observation that fault-free operation is expected to remain dominant in the near future and that the probability of worst-case faults is low, an optimistic fault-tolerant heuristic is then proposed. It achieves maximum energy savings in the absence of faults while degrading gradually to meet application timing requirements in the worst case of faults. Simulation results show that, compared with state-of-the-art allocation and scheduling schemes, the proposed heuristic achieves average energy savings of up to 70%. The optimistic approach is also shown to be more resilient to variations in application utilization and to fault occurrences beyond system specifications.

4.
Distributed fault-tolerance can mask the effect of a limited number of permanent faults, while self-stabilization provides forward recovery after an arbitrary number of transient faults hit the system. FTSS (Fault-Tolerant Self-Stabilizing) protocols combine the best of both worlds, since they simultaneously tolerate transient and (permanent) crash faults. To date, deterministic FTSS solutions either consider static (i.e., fixed-point) tasks or assume synchronous scheduling of the system components. In this paper, we present the first study of deterministic FTSS solutions for dynamic tasks in asynchronous systems, considering the unison problem as a benchmark. Unison can be seen as a local clock synchronization problem: neighbors must maintain digital clocks at most one time unit away from each other and increment their own clock value infinitely often. We present several impossibility results for this difficult problem and propose an FTSS solution (when the problem is solvable) for the state model that exhibits optimal fault-containment.

5.
This paper presents an optimal checkpoint strategy for fault tolerance in real-time systems where transient faults occur according to a Poisson distribution. In our environment, multiple real-time tasks with different deadlines and harmonic periods are scheduled by the rate-monotonic algorithm, and checkpoints are inserted at a constant interval in each task. When a fault is detected, the system rolls back to the latest checkpoint and re-executes tasks. The maximum number of re-executable checkpoints and an equation to check schedulability are derived, and the optimal number of checkpoints is selected to maximize the probability of completing all the tasks within their deadlines.
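
The trade-off described in this abstract can be illustrated with a minimal sketch. It is not the paper's model: it assumes a single task (no rate-monotonic interference), a fixed overhead per checkpoint, faults that each force one full segment to be re-executed, and Poisson fault arrivals over the whole deadline window. All function and parameter names (`max_tolerable_faults`, `overhead`, `rate`, and so on) are hypothetical.

```python
import math


def max_tolerable_faults(c, d, n, overhead):
    """Largest number of segment re-executions that still fits before deadline d.

    c: fault-free execution time, n: number of checkpoints (equal segments),
    overhead: assumed cost of saving one checkpoint.
    Returns -1 if even the fault-free run misses the deadline.
    """
    segment = c / n + overhead          # worst-case cost of redoing one segment
    slack = d - (c + n * overhead)      # time left after the fault-free run
    if slack < 0:
        return -1
    return int(slack // segment)


def completion_probability(c, d, n, overhead, rate):
    """P(number of Poisson faults in [0, d] <= tolerable count), simplified model."""
    k = max_tolerable_faults(c, d, n, overhead)
    if k < 0:
        return 0.0
    mean = rate * d
    return sum(math.exp(-mean) * mean ** i / math.factorial(i) for i in range(k + 1))


def best_checkpoint_count(c, d, overhead, rate, max_n=50):
    """Smallest n in 1..max_n that maximizes the completion probability."""
    return max(range(1, max_n + 1),
               key=lambda n: completion_probability(c, d, n, overhead, rate))
```

With too few checkpoints each rollback is expensive; with too many, the checkpoint overhead consumes the slack. The search over `n` makes that balance explicit for this simplified setting.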

6.
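Zhu Xiaomin, Zhu Jianghan, Ma Manhao. Journal of Software (软件学报), 2011, 22(7): 1440-1456
Fault-tolerant scheduling is an important topic in scheduling research and an effective means of improving system reliability. Many fault-tolerant scheduling algorithms for real-time tasks on cluster systems exist, but none of them considers the QoS requirements of tasks. This paper proposes FTQ (fault-tolerant QoS-based scheduling), a fault-tolerant scheduling algorithm for real-time tasks with QoS requirements on heterogeneous clusters. The algorithm uses the primary/backup (PB) technique and jointly considers task timing constraints, task QoS requirements, system reliability, and resource utilization. It adaptively adjusts the QoS level of tasks and the execution mode of backup copies according to the system load, which improves the flexibility, reliability, schedulability, and resource utilization of the system. System reliability is analyzed quantitatively and incorporated into the scheduling algorithm, which improves reliability. During scheduling, the start time of each primary copy is moved as early as possible and the start time of each backup copy is postponed as late as possible, so that the backup copy runs in passive mode or overlaps the primary copy as little as possible, which improves resource utilization. In addition, backup overlapping is employed, and the latest start time of a backup copy and its constraints are analyzed, which raises the scheduling success rate. Extensive simulation experiments compare FTQ with the NOFTQ and DYFARS algorithms. The results show that FTQ outperforms the other methods and achieves better scheduling quality.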

7.
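Simultaneous multithreading (SMT) processors allow multiple threads to execute at the same time, which improves processor performance and also supports fault tolerance through redundant thread execution. A redundant multithreading architecture duplicates a thread into two copies that execute independently and compares their results, thereby achieving error detection or fault tolerance. Redundant multithreading architectures mainly rely on the ICOUNT scheduling policy to resolve resource sharing among threads. However, this policy can cause "starvation" and reduce processor throughput. This paper proposes a ...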

8.
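The grid environment is heterogeneous, dynamic, and unreliable. To use resources reasonably and economically, this paper proposes a QoS-based fault-tolerant task scheduling algorithm that takes the time budget, the cost budget, and the weight ratio between time and cost as QoS parameters. Computation and communication are overlapped to hide network latency. A stochastic Petri net models task scheduling in the grid environment, and the reachability graph of the stochastic Petri net is defined to analyze the performance of the scheduling model. Analysis and simulation indicate that the algorithm satisfies the user's time and cost constraints, is fault tolerant, and achieves short task completion times and low overall cost.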

9.
Aggressive technology scaling has dramatically increased the power density and degraded the reliability of embedded real-time systems. The goal of this paper is to develop effective scheduling methods that minimize energy consumption and, at the same time, tolerate up to \(K\) transient faults when executing a hard real-time system scheduled according to the EDF policy. Three scheduling algorithms are presented. The first is an extension of a well-known fault-oblivious low-power scheduling algorithm. The second aims to minimize energy consumption in the fault-free case while reserving adequate resources for recovery when faults strike. The third improves upon the first two by sharing the reserved resources and thus achieves better energy efficiency. Simulation results show that the proposed algorithms consistently outperform other related approaches in energy savings.

10.
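Design and performance analysis of hybrid real-time fault-tolerant scheduling algorithms   Cited by: 15 (self-citations: 2; citations by others: 15)
The real-time fault-tolerant scheduling algorithms studied in earlier literature can only schedule a single kind of task, namely tasks with fault-tolerance requirements. This paper builds a hybrid real-time fault-tolerant scheduling model and proposes a static real-time fault-tolerant scheduling algorithm that schedules real-time tasks with fault-tolerance requirements together with real-time tasks without them. It also presents an algorithm for finding the minimum number of processors, which is used to analyze the performance of the static algorithm by simulation. To improve the scheduling performance of the static algorithm, a dynamic scheduling algorithm is proposed. Finally, simulation experiments analyze the performance of the static and dynamic algorithms. The experiments show that scheduling performance depends on system parameters such as the number of real-time tasks, task computation time, task period, and the number of processors.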

11.
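Mao Nan, Huang Lan, Wang Zhongyi, Liu Zhicun. Computer Engineering and Design (计算机工程与设计), 2007, 28(14): 3433-3435, 3439
This paper briefly reviews the development of fault-tolerance techniques and analyzes how systems tolerate faults under different fault models. Transient and intermittent faults can be tolerated with software redundancy, and when software fault tolerance is used in a real-time embedded system, task schedulability must be considered; permanent faults are handled with hardware redundancy. On this basis, the paper describes a model of a real-time dual-machine embedded fault-tolerant system and studies the key problems in building such a system, including dual-machine synchronization, fault detection, and arbitration and switchover, together with the corresponding solutions.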

12.
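Zhao Mingwang. Acta Automatica Sinica (自动化学报), 1998, 24(4): 512-517
For the problem of sensor-fault-tolerant control in state-feedback closed-loop systems, a necessary and sufficient condition for the stability of the closed-loop system is first derived from stable polynomial factorization. On this basis, a design method for state-feedback laws with sensor fault tolerance is proposed, using the numerical solution of consistent systems of nonlinear equations. A second design method for fault-tolerant state-feedback laws, oriented toward closed-loop pole placement, is proposed based on numerical optimization. Computer simulation examples demonstrate the effectiveness of the methods.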

13.
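A fast and effective static task scheduling algorithm for homogeneous computing environments   Cited by: 9 (self-citations: 1; citations by others: 9)
Scheduling tasks quickly and effectively is a key problem in multiprocessor computing environments. The most popular model for describing task dependencies in current scheduling algorithms is the DAG. Earlier literature proposed a new, more practical and more general model, TTIG, and its corresponding MATE algorithm (for homogeneous computing environments). This paper extends the TTIG model and proposes a new algorithm for homogeneous systems with two heuristics (GBHA1 and GBHA2). GBHA eliminates cycles in the graph as far as possible by forming groups, so it obtains global information about the task graph and schedules better. In simulation experiments, the algorithm is compared under different test conditions with MATE and with other effective DAG-based scheduling algorithms for homogeneous environments. The results show that GBHA clearly outperforms MATE. Compared with the DAG-based scheduling algorithms, each has its own strengths in performance, but GBHA has a significant advantage in time complexity.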

14.
All existing fault-tolerant job scheduling algorithms for computational grids were proposed under the assumption that all sites apply the same fault-tolerance strategy. They ignore that each grid site may have its own fault-tolerance strategy, because each site is an autonomous domain. In fact, it is very common for multiple fault-tolerance strategies to be in use at the same time in a large-scale computational grid. Different fault-tolerance strategies may have different hardware and software requirements. For instance, if a grid site employs job checkpointing, each computational node must periodically transmit the transient state of the job execution to the server; if a job fails, it migrates to another computational node and resumes from the last stored checkpoint. Therefore, in this paper we propose a genetic algorithm for job scheduling that addresses the problem of heterogeneous fault-tolerance mechanisms in a computational grid. We assume that the system supports four kinds of fault-tolerance mechanisms: job retry, job migration without checkpointing, job migration with checkpointing, and job replication. Because each fault-tolerance mechanism has different requirements for gene encoding, we also propose a new chromosome encoding approach that integrates the four kinds of mechanisms in one chromosome. The risk nature of the grid environment is also taken into account in the algorithm: the risk relationship between jobs and nodes is defined by the security demand and the trust level. Simulation results show that our algorithm achieves a shorter makespan and is more effective at reducing the job failure rate than the Min-Min and sufferage algorithms.

15.
We consider real-time systems in highly safety-critical contexts where tasks have to meet strict deadlines. Tasks are periodic, may have offsets, may share critical resources, and may be subject to precedence constraints. Off-line scheduling should be of great help for such systems, but the methods proposed in the literature cannot deal with them. Our aim is to extend and improve the well-known cyclicity result of Leung and Merrill to every scheduling algorithm and to systems of interacting tasks with offsets. One of the main benefits of our result is that it enables the use of off-line scheduling methods for these critical real-time systems.

16.
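To address the insufficient fault-tolerance support in application development frameworks for computational fluid dynamics (CFD), a new method for optimizing the fault-tolerance period is proposed. The method computes an ideal optimal fault-tolerance period from a probabilistic model of system failures and, taking into account the characteristics of field data output in CFD applications, determines the actual checkpoint times online. Experiments on three typical applications show that, on systems with different mean times between failures, the method always achieves the lowest fault-tolerance overhead compared with checkpointing at a fixed number of time steps. Users can conveniently set the fault-tolerance period through the framework interface and effectively reduce the overhead caused by fault tolerance.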
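
The idea of deriving an ideal period from a failure model and then aligning it with field data output can be sketched as follows. The abstract does not state which failure model the authors use, so this sketch substitutes Young's well-known first-order approximation, interval = sqrt(2 × checkpoint cost × MTBF). The alignment rule (round to the nearest multiple of the output interval) and all names here are assumptions for illustration only.

```python
import math


def young_interval(checkpoint_cost, mtbf):
    """Young's first-order approximation of the optimal checkpoint interval.

    Both arguments and the result are in the same time unit (e.g. seconds).
    This stands in for the paper's unspecified probabilistic model.
    """
    return math.sqrt(2.0 * checkpoint_cost * mtbf)


def steps_between_checkpoints(checkpoint_cost, mtbf, step_time, output_every):
    """Snap the ideal interval to a multiple of the field-output interval.

    step_time: assumed wall-clock time of one solver time step.
    output_every: number of time steps between field data outputs.
    Returns the number of time steps between checkpoints (at least one
    output interval), so checkpoints coincide with scheduled outputs.
    """
    ideal_steps = young_interval(checkpoint_cost, mtbf) / step_time
    multiples = max(1, round(ideal_steps / output_every))
    return multiples * output_every
```

For example, with a 50 s checkpoint cost and a 10,000 s MTBF the ideal interval is 1,000 s; at 2 s per step and output every 150 steps, the sketch checkpoints every 450 steps.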

17.
A task migration method is proposed for energy savings in multiprocessor real-time systems. The method is based on the portioned scheduling technique, which classifies each task as a fixed task or a migratable task. The basic task migration problem with specific parameters is formulated as a linear programming problem that minimizes average power. The method is then extended to the more general case with a complete migration algorithm. Moreover, a scheduling algorithm is proposed for migratable tasks. Simulation results on two processor models demonstrate significant energy savings over existing methods.

18.
Real-time systems (RTS) are those whose correctness depends on satisfying the required functional properties as well as the required temporal properties. Due to the criticality of such systems, recovery from faults is an essential part of an RTS. In many systems, such as those supporting space applications, single event upsets (SEUs) are the prevalent type of fault; SEUs are transient faults and affect a single task at a time. We present a scheme to guarantee that the execution of real-time tasks can tolerate SEUs and intermittent faults, assuming any queue-based scheduling technique. Three algorithms are presented to solve the problem of adding fault tolerance to a queue of real-time tasks by reserving sufficient slack in a schedule so that recovery can be carried out before the task deadline without compromising guarantees given to other tasks. The first algorithm is an optimal dynamic programming solution, the second is a linear-time heuristic for scheduling dynamic tasks, and the third comprises extensions to address queues with gaps between tasks (gaps are caused by precedence, resource, or timing constraints). We show through simulations that the heuristics closely approximate the optimal algorithm. Finally, we describe the implementation of the modified admission control algorithm, non-preemptive scheduler, and recovery mechanism in the FT-RT-Mach operating system.

19.
Tasks in hard real-time systems are required to meet preset deadlines, even in the presence of transient faults, and hence the analysis of worst-case finish time (WCFT) must consider the extra time incurred by re-executing tasks that were faulty. Existing solutions can only estimate WCFT and usually result in significant under- or over-estimation. In this work, we conclude that a necessary and sufficient condition for a task set to experience its WCFT is that its critical task incurs all expected transient faults. A method is presented to identify the critical task and the WCFT in O(|V| + |E|) time, where |V| and |E| are the number of tasks and the number of dependencies between tasks, respectively. This method can be applied to test the feasibility of directed acyclic graph (DAG) based task sets scheduled in a wide variety of fault-prone multiprocessor systems, where the processors may be homogeneous or heterogeneous, DVS-capable or DVS-incapable, etc. The common practices, which require the same time complexity as the proposed critical-task method, can underestimate the worst case by up to 25% or overestimate it by 13%. Based on the proposed critical-task method, a simulated-annealing scheduling algorithm is developed to find an energy-efficient fault-tolerant schedule for a given DAG task set. Experimental results show that the proposed critical-task method outperforms a common practice by up to 40% in terms of energy saving.
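
A linear-time critical-task search of the kind this abstract describes can be sketched under strong simplifying assumptions. This is not the paper's method: it assumes the schedule is limited only by dependencies (no processor contention), that each fault forces one full re-execution of the affected task, and that all `k` faults hit the same task. The names `wcet`, `edges`, and `k` are hypothetical.

```python
from collections import deque


def topological_order(tasks, edges):
    """Kahn's algorithm; returns (order, successor lists). O(|V| + |E|)."""
    succ = {t: [] for t in tasks}
    indeg = {t: 0 for t in tasks}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    queue = deque(t for t in tasks if indeg[t] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if len(order) != len(tasks):
        raise ValueError("dependency graph has a cycle")
    return order, succ


def worst_case_finish_time(wcet, edges, k):
    """Return (critical task, finish time) when all k faults hit one task.

    wcet maps each task to its execution time; edges are (before, after) pairs.
    """
    order, succ = topological_order(wcet, edges)
    head = dict(wcet)                    # longest path ending at t, inclusive
    for u in order:
        for v in succ[u]:
            head[v] = max(head[v], head[u] + wcet[v])
    tail = dict(wcet)                    # longest path starting at t, inclusive
    for u in reversed(order):
        for v in succ[u]:
            tail[u] = max(tail[u], wcet[u] + tail[v])
    # Longest path through t, plus k extra executions of t itself.
    through = {t: head[t] + tail[t] - wcet[t] + k * wcet[t] for t in wcet}
    critical = max(through, key=through.get)
    return critical, through[critical]
```

Two passes over the topological order give the longest path through every task, so the task whose re-execution hurts most is found without enumerating paths. Note that this task need not lie on the fault-free critical path.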

20.
A scientific workflow usually consists of a mix of fine-grained and coarse-grained computational tasks with varied runtime requirements. It has been observed that fine-grained tasks incur more scheduling overhead than execution time when executed on widely distributed platforms. In such situations, task clustering is extensively used as a runtime optimization method: multiple short-duration tasks are combined into a cluster that is scheduled on a single resource. This helps minimize the scheduling overhead of the fine-grained tasks. However, task grouping curtails the degree of parallelism and hence needs to be done optimally. Though a number of task clustering techniques have been developed to reduce the impact of system overheads, they fail to identify the appropriate number of clusters at each level of the workflow in order to achieve the maximum possible parallelism. This work proposes a level-based autonomic Workflow-and-Platform Aware (WPA) task clustering technique that takes into consideration both the workflow structure and the size of the underlying resource set. It aims to achieve the maximum possible parallelism among the tasks at each level of a workflow while minimizing system overheads and resource wastage. A comparative study with current state-of-the-art task clustering approaches on four well-known scientific workflows shows that the proposed method significantly reduces the overall workflow execution time and, at the same time, consolidates the load onto the minimum possible number of resources.
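
Level-based clustering that accounts for the resource count can be sketched as below. This is a generic horizontal clustering sketch, not the WPA technique itself: it assumes a task's level is its depth in the dependency graph and simply caps the number of clusters per level at the number of available resources. The function names and the round-robin grouping are assumptions.

```python
from collections import defaultdict


def task_levels(tasks, edges):
    """Level of a task = 1 + the deepest level among its predecessors."""
    preds = defaultdict(list)
    for u, v in edges:
        preds[v].append(u)
    level = {}

    def depth(t):
        if t not in level:
            level[t] = 1 + max((depth(p) for p in preds[t]), default=0)
        return level[t]

    for t in tasks:
        depth(t)
    return level


def cluster_by_level(tasks, edges, resources):
    """Group the tasks of each level into at most `resources` clusters."""
    level = task_levels(tasks, edges)
    by_level = defaultdict(list)
    for t in tasks:
        by_level[level[t]].append(t)
    plan = {}
    for lvl in sorted(by_level):
        members = by_level[lvl]
        count = min(len(members), resources)
        # Round-robin split keeps cluster sizes within one task of each other.
        plan[lvl] = [members[i::count] for i in range(count)]
    return plan
```

Capping the cluster count at the resource count keeps every resource busy at a level without creating more scheduling units than can run in parallel.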
