1.

Zhang Zhichao, Sang Nan, Xiong Guangze 《Computer Engineering and Design》2004,25(4):508-511,602

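Time redundancy is an important means of fault tolerance and is widely used in safety-critical real-time systems. Traditional fault-tolerant scheduling algorithms reserve a large amount of idle time for re-executing failed tasks, but re-execution lowers the system's resource utilization. This paper proposes CP-PRA, a checkpoint-based fault-tolerant scheduling algorithm that effectively improves resource utilization by reducing the time needed for error recovery. The schedulability condition of the algorithm is given, and its correctness is proved.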
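The abstract does not give CP-PRA's timing formulas. As a minimal sketch, assuming a task with worst-case execution time `c` split into equal segments with a checkpoint of cost `overhead` between consecutive segments, and assuming each fault rolls back and re-executes only the segment it hit (state-restore cost ignored), the worst case compares with full re-execution as follows. All function names are hypothetical.

```python
def reexec_worst_case(c: float, faults: int = 1) -> float:
    """Worst-case time when every fault forces a full re-execution of the task."""
    return c * (1 + faults)


def checkpoint_worst_case(c: float, segments: int, overhead: float, faults: int = 1) -> float:
    """Worst-case time with `segments` equal segments and a checkpoint between
    consecutive segments; each fault re-executes at most one segment.
    Assumption: restoring a checkpoint costs nothing."""
    if segments < 1:
        raise ValueError("segments must be at least 1")
    checkpoints = segments - 1
    return c + checkpoints * overhead + faults * (c / segments)


def best_segments(c: float, overhead: float, faults: int = 1, max_segments: int = 100) -> int:
    """Smallest segment count that minimizes the checkpointed worst case (brute force)."""
    return min(range(1, max_segments + 1),
               key=lambda n: checkpoint_worst_case(c, n, overhead, faults))


if __name__ == "__main__":
    c, overhead = 10.0, 0.5
    n = best_segments(c, overhead)
    print(n, checkpoint_worst_case(c, n, overhead), reexec_worst_case(c))
```

With `c = 10` and `overhead = 0.5`, the search picks four segments, giving a worst case of 14 time units against 20 for full re-execution; under these assumptions, that smaller recovery reservation is what frees capacity for other tasks.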

2.

A nonpreemptive real-time scheduler with recovery from transient faults and its implementation

Mosse D. Melhem R. Sunondo Ghosh 《IEEE Transactions on Software Engineering》2003,29(8):752-767

Real-time systems (RTS) are those whose correctness depends on satisfying the required functional as well as the required temporal properties. Due to the criticality of such systems, recovery from faults is an essential part of an RTS. In many systems, such as those supporting space applications, single event upsets (SEUs) are the prevalent type of faults; SEUs are transient faults and affect a single task at a time. We present a scheme to guarantee that the execution of real-time tasks can tolerate SEUs and intermittent faults assuming any queue-based scheduling technique. Three algorithms are presented to solve the problem of adding fault tolerance to a queue of real-time tasks by reserving sufficient slack in a schedule so that recovery can be carried out before the task deadline without compromising guarantees given to other tasks. The first algorithm is a dynamic programming optimal solution, the second is a linear-time heuristic for scheduling dynamic tasks, and the third algorithm comprises extensions to address queues with gaps between tasks (gaps are caused by precedence, resource, or timing constraints). We show through simulations that the heuristics closely approximate the optimal algorithm. Finally, we describe the implementation of the modified admission control algorithm, non-preemptive scheduler, and recovery mechanism in the FT-RT-Mach operating system.
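The idea of reserving slack in a task queue so that recovery finishes before each deadline can be illustrated with a simple admission test. This is a sketch under stated assumptions, not any of the paper's three algorithms: tasks run back-to-back and non-preemptively from time `start`, at most one transient fault (one SEU) occurs, it is detected at the end of the affected task, and recovery is an immediate full re-execution of that task. Under these assumptions the worst case for a given task is a fault in the largest task at or before it in the queue.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    wcet: float      # worst-case execution time
    deadline: float  # absolute deadline


def admits_single_fault(queue: list[Task], start: float = 0.0) -> bool:
    """True if every task in the non-preemptive FIFO queue meets its deadline
    even when one transient fault forces one task to re-execute once."""
    finish = start   # fault-free completion time of the current task
    largest = 0.0    # largest WCET so far: the worst task for the single fault to hit
    for task in queue:
        finish += task.wcet
        largest = max(largest, task.wcet)
        if finish + largest > task.deadline:
            return False
    return True
```

Admitting a new task then amounts to inserting it into the queue and re-running the check, which is linear in the queue length because only one running maximum is kept.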

3.

A Framework for Software Fault Tolerance in Real-Time Systems

《IEEE Transactions on Software Engineering》1983,(3):355-364

Real-time systems often have very high reliability requirements and are therefore prime candidates for the inclusion of fault tolerance techniques. In order to provide tolerance to software faults, some form of state restoration is usually advocated as a means of recovery. State restoration can be expensive and the cost is exacerbated for systems which utilize concurrent processes. The concurrency present in most real-time systems and the further difficulties introduced by timing constraints suggest that providing tolerance for software faults may be inordinately expensive or complex. We believe that this need not be the case, and propose a straightforward pragmatic approach to software fault tolerance which is believed to be applicable to many real-time systems. The approach takes advantage of the structure of real-time systems to simplify error recovery, and a classification scheme for errors is introduced. Responses to each type of error are proposed which allow service to be maintained.

4.

Fault-tolerance through scheduling of aperiodic tasks in hard real-time multiprocessor systems

Ghosh S. Melhem R. Mosse D. 《IEEE Transactions on Parallel and Distributed Systems》1997,8(3):272-284

Real-time systems are being increasingly used in several applications which are time-critical in nature. Fault tolerance is an important requirement of such systems, due to the catastrophic consequences of not tolerating faults. We study a scheme that provides fault tolerance through scheduling in real-time multiprocessor systems. We schedule multiple copies of dynamic, aperiodic, nonpreemptive tasks in the system, and use two techniques that we call deallocation and overloading to achieve a high acceptance ratio (percentage of arriving tasks scheduled by the system). The paper compares the performance of our scheme with that of other fault-tolerant scheduling schemes, and determines how much each of deallocation and overloading affects the acceptance ratio of tasks. The paper also provides a technique that can help real-time system designers determine the number of processors required to provide fault tolerance in dynamic systems. Lastly, a formal model is developed for the analysis of systems with uniform tasks.
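The overloading and deallocation techniques can be sketched with a single backup time slot. This is a simplified model with hypothetical names, assuming at most one processor fails at a time and each primary's backup sits in a slot on a different processor: two backups may then share (overload) the same slot when their primaries run on different processors, since at most one of them can ever need to run, and a backup's claim on the slot is released (deallocated) once its primary completes correctly.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class BackupSlot:
    processor: int   # processor that would run the backup(s)
    start: float
    end: float
    owners: dict[str, int] = field(default_factory=dict)  # task name -> its primary's processor

    def can_overload(self, primary_processor: int) -> bool:
        # A backup must not share a processor with its own primary, and two
        # backups may overlap only if their primaries are on different processors.
        return (primary_processor != self.processor
                and primary_processor not in self.owners.values())

    def reserve(self, task: str, primary_processor: int) -> bool:
        """Place (or overload) a backup in this slot if the single-fault rule allows it."""
        if not self.can_overload(primary_processor):
            return False
        self.owners[task] = primary_processor
        return True

    def deallocate(self, task: str) -> None:
        # The primary finished without error, so its backup no longer needs the slot.
        self.owners.pop(task, None)
```

Freeing slots early through `deallocate` is what lets later arrivals reuse backup capacity, which is how both techniques raise the acceptance ratio in this simplified model.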

5.

DFTS: A dynamic fault-tolerant scheduling for real-time tasks in multicore processors

Mohammad H. Mottaghi Hamid R. Zarandi 《Microprocessors and Microsystems》2014

This paper presents a dynamic scheduling method for real-time tasks in multicore processors to tolerate single and multiple transient faults. The scheduling is performed based on three important issues: (1) currently released tasks, (2) currently available processor cores, and (3) consideration of the number of faults and their occurrences. Using task utilization along with a defined criticality threshold in the proposed scheduling method, current ready tasks are divided into critical and noncritical ones. Based on whether a task is critical or noncritical, an appropriate fault-tolerance policy is exploited. Moreover, scheduling decisions are made to fulfill two key goals: (1) increasing scheduling feasibility and (2) decreasing the total task execution time. Several simulation experiments are carried out to compare the proposed method with two well-known methods, called checkpointing with rollback recovery and hardware replication. Experimental results reveal that in the presence of multiple transient faults, the feasibility rate of the proposed method is considerably higher than that of the other well-known fault-tolerance methods. Moreover, the average timing overhead of this method is lower than that of the traditional methods.
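The critical/noncritical split can be sketched as a utilization threshold that selects a policy per task. The abstract does not say which policy each class receives; purely as an assumption, this sketch gives high-utilization (critical) tasks replication, since rolling back a long task is costly, and gives the rest checkpointing with rollback recovery. The class, function, and policy strings are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    wcet: float
    period: float

    @property
    def utilization(self) -> float:
        return self.wcet / self.period


def assign_policies(tasks, threshold: float) -> dict:
    """Map each task name to a fault-tolerance policy by comparing its
    utilization with the criticality threshold (assumed mapping)."""
    return {
        t.name: "replication" if t.utilization >= threshold else "checkpoint_rollback"
        for t in tasks
    }
```

A dynamic scheduler would re-run this classification as tasks are released and cores become free, rather than fixing it offline.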

6.

Error Recovery in a Real-Time Multiprocessor System

Li Weihua Yuan Youguang 《Journal of Computer Science and Technology》1992,7(1):83-87

In this paper, a new scheme for recovering from errors due to transient faults in a real-time multiprocessor system is presented. The scheme, called dynamic redundancy at the task level, is implemented in a real-time multitasking environment. Utilizing the facilities in the operating system, the scheme creates backup tasks for the primary tasks as redundancy. The paper introduces an algorithm to generate a fault-tolerant schedule for the tasks so that they recover from errors as retry or checkpointing does. A reliability model is proposed to evaluate the effectiveness of the scheme.
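The reliability model itself is not given in the abstract. A minimal sketch, assuming each execution of a task (primary or backup) fails independently with the same probability `p` because of a transient fault, gives a per-task reliability of 1 - p^k for k copies and a system reliability equal to the product over tasks:

```python
def task_reliability(p: float, copies: int = 2) -> float:
    """Probability that at least one of `copies` independent executions succeeds."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return 1.0 - p ** copies


def system_reliability(fault_probs, copies: int = 2) -> float:
    """Every task must succeed; tasks are assumed to fail independently."""
    r = 1.0
    for p in fault_probs:
        r *= task_reliability(p, copies)
    return r
```

With p = 0.1, one backup per primary raises per-task reliability from 0.9 to 0.99; correlated faults, which this independence assumption ignores, would reduce that gain.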

7.

Robust finite mixture regression for heterogeneous targets

Jian Liang Kun Chen Ming Lin Changshui Zhang Fei Wang 《Data mining and knowledge discovery》2018,32(6):1509-1560

Finite Mixture Regression (FMR) refers to the mixture modeling scheme which learns multiple regression models from the training data set. Each of them is in charge of a subset. FMR is an effective scheme for handling sample heterogeneity, where a single regression model is not enough for capturing the complexities of the conditional distribution of the observed samples given the features. In this paper, we propose an FMR model that (1) finds sample clusters and jointly models multiple incomplete mixed-type targets simultaneously, (2) achieves shared feature selection among tasks and cluster components, and (3) detects anomaly tasks or clustered structure among tasks, and accommodates outlier samples. We provide non-asymptotic oracle performance bounds for our model under a high-dimensional learning framework. The proposed model is evaluated on both synthetic and real-world data sets. The results show that our model can achieve state-of-the-art performance.
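To make "each of them is in charge of a subset" concrete, the sketch below computes the E-step of a plain Gaussian mixture of linear regressions with one feature: the posterior probability (responsibility) that each component generated a sample. It is not the paper's robust model (no feature selection, multiple targets, or outlier handling), and the component parameters are assumed to be already fitted.

```python
import math


def gaussian_pdf(x: float, mean: float, sigma: float) -> float:
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def responsibilities(x: float, y: float, components) -> list:
    """components: iterable of (weight, intercept, slope, sigma) tuples.
    Returns P(component k | x, y) for each component k."""
    scores = [w * gaussian_pdf(y, b0 + b1 * x, s) for (w, b0, b1, s) in components]
    total = sum(scores)
    return [s / total for s in scores]
```

In a full EM loop, these responsibilities would weight a least-squares refit of each component in the M-step, so each regression gradually specializes on the samples it explains best.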

8.

A Hybrid Fault-Tolerant Priority Assignment Search Algorithm

Li Jun, Cao Wanhua, Yang Fumin, Tu Gang, Lu Yansheng, Luo Wei 《Journal of Computer Research and Development》2007,44(11):1912-1919

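In real-time systems, a task that fails to produce a correct result in time can cause catastrophic consequences, so fault tolerance is crucial to the effectiveness and reliability of such systems. Based on schedulability analysis using worst-case response time computation, a hybrid fault-tolerant priority assignment search algorithm is proposed. By allowing alternate tasks to run at either higher or lower priority levels, the algorithm effectively improves the system's fault tolerance capability. Experimental tests show that, compared with the known algorithms of the same kind, it is more effective at improving fault tolerance capability.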
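Worst-case response time analysis is commonly extended to faults with a recurrence of the form R_i = C_i + sum over j in hp(i) of ceil(R_i / T_j) * C_j + ceil(R_i / T_F) * max over k in hep(i) of F_k, where hp(i) are the higher-priority tasks, hep(i) adds task i itself, F_k is the recovery (alternate task) cost, and T_F is an assumed minimum interval between faults. This well-known fixed-priority recurrence is assumed here for illustration. It places every recovery at the priority of the failed task, whereas the paper's search algorithm lets alternate tasks run at other priority levels. Integer parameters are assumed so the fixed-point test is exact.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    wcet: int      # C_i
    period: int    # T_i
    deadline: int  # D_i
    recovery: int  # F_i: cost of the alternate task that recovers a failed job


def ft_response_time(tasks, i: int, fault_interval: int):
    """Fault-tolerant response time of tasks[i]; `tasks` is sorted from highest
    to lowest priority. Returns None if the bound exceeds the deadline."""
    task = tasks[i]
    worst_recovery = max(t.recovery for t in tasks[: i + 1])  # max over hep(i)
    r = task.wcet
    while True:
        interference = sum(math.ceil(r / t.period) * t.wcet for t in tasks[:i])
        fault_cost = math.ceil(r / fault_interval) * worst_recovery
        nxt = task.wcet + interference + fault_cost
        if nxt > task.deadline:
            return None
        if nxt == r:
            return r
        r = nxt
```

For two tasks (C, T, D, F) = (1, 5, 5, 1) and (2, 10, 10, 2) with faults at least 20 units apart, the recurrence converges to 2 and 5; a priority-assignment search would evaluate this test for each candidate ordering.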

9.

A distributed recovery block approach to fault-tolerant execution of application tasks in hypercubes

Kim K.H. Kavianpour A. 《IEEE Transactions on Parallel and Distributed Systems》1993,4(1):104-111

An approach to fault-tolerant execution of real-time application tasks in hypercubes is proposed. The approach is based on the distributed recovery block (DRB) scheme and does not require special hardware mechanisms in support of fault tolerance. Each task is assigned to a pair of processors forming a DRB computing station for execution in a dual-redundant and self-checking mode. Assignment of all tasks in an application in such a form is called the full DRB mapping. The DRB scheme was developed as an approach to uniform treatment of hardware and software faults with the effect of fast forward recovery. However, if the system developer is concerned with hardware fault possibilities only, then forming DRB stations becomes a mechanical process not burdening the application software designer in any way. A procedure for converting an efficient nonredundant task-to-processor mapping into an efficient full DRB mapping is presented.
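At the core of the DRB scheme is a recovery block (a primary try block, an alternate try block, and an acceptance test) whose two try blocks run at the same time on the two processors of a station, so that if the primary's result is rejected, the alternate's result is already available. The sketch below imitates this with two threads standing in for the primary and shadow nodes; it omits role switching, watchdog timing, and inter-node messaging, and all names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


def drb_execute(x: Any,
                primary: Callable[[Any], Any],
                alternate: Callable[[Any], Any],
                acceptance_test: Callable[[Any, Any], bool]) -> Any:
    """Run both try blocks concurrently and return the primary's result if it
    passes the acceptance test, otherwise the alternate's."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        # The two futures play the roles of the primary node and the shadow node.
        futures = (pool.submit(primary, x), pool.submit(alternate, x))
        for fut in futures:
            try:
                result = fut.result()
            except Exception:
                continue  # a crashed try block is treated as a failed acceptance test
            if acceptance_test(x, result):
                return result
    raise RuntimeError("both try blocks failed the acceptance test")
```

Because the alternate has already run in parallel, a rejected primary costs no extra execution time, which is the forward-recovery effect the abstract refers to.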

10.

Real-Time Model-Based Fault Detection and Isolation for UGVs

A. Monteriù P. Asthana K. P. Valavanis S. Longhi 《Journal of Intelligent and Robotic Systems》2009,56(4):425-439

The paper presents a model-based sensor fault detection and isolation system applied in real time to unmanned ground vehicles. Structural analysis is applied to the nonlinear model of the vehicle to build the residual generation module, followed by an ad hoc residual evaluation module for detecting single and multiple sensor faults. The overall proposed diagnosis scheme has been tested in real time on a real mobile robot in an outdoor environment and for different tasks. The obtained experimental results are satisfactory in terms of diagnosis performance and real-time implementation.
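The two-stage structure (residual generation from a model, then residual evaluation) can be sketched as follows. Every specific here is a hypothetical assumption: a differential-drive kinematic model supplies one residual (gyro yaw rate minus the yaw rate implied by the wheel speeds), each residual is compared against a fixed threshold, and sensors are isolated by matching the pattern of fired residuals against a fault-signature table. The paper's actual residuals come from structural analysis of its nonlinear vehicle model and are not reproduced here.

```python
def yaw_rate_residual(v_left: float, v_right: float, gyro_rate: float, track_width: float) -> float:
    """Gyro yaw rate minus the differential-drive kinematic yaw rate."""
    return gyro_rate - (v_right - v_left) / track_width


def evaluate(residuals, thresholds) -> tuple:
    """Residual evaluation: flag each residual whose magnitude exceeds its threshold."""
    if len(residuals) != len(thresholds):
        raise ValueError("one threshold per residual")
    return tuple(abs(r) > th for r, th in zip(residuals, thresholds))


# Hypothetical fault-signature table: which residuals react to each faulty sensor.
SIGNATURES = {
    "encoder_left": (True, True, False),
    "encoder_right": (True, False, True),
    "gyro": (False, True, True),
}


def isolate(fired: tuple, signatures=SIGNATURES) -> list:
    """Return the sensors whose signature matches the fired pattern (single-fault case)."""
    if not any(fired):
        return []
    return [sensor for sensor, sig in signatures.items() if sig == fired]
```

Multiple simultaneous faults, which the paper also detects, would need signatures for fault combinations or a more discriminating residual set.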

11.

Design of fault-tolerant large-scale VOD servers: With emphasis on high-performance and low-cost

Golubchik L. Muntz R.R. Cheng-Fu Chou Berson S. 《IEEE Transactions on Parallel and Distributed Systems》2001,12(4):363-386

Recent technological advances in digital signal processing, data compression techniques, and high-speed communication networks have made Video-on-Demand (VOD) servers feasible. A challenging task in such systems is servicing multiple clients simultaneously while satisfying real-time requirements of continuous delivery of objects at specified rates. To accomplish these tasks and realize economies of scale associated with servicing a large user population, a VOD server requires a large disk subsystem. Although a single disk is fairly reliable, a large disk farm can have an unacceptably high probability of disk failure. Furthermore, due to real-time constraints, the reliability requirements of VOD systems are even more stringent than those of traditional information systems. Traditional RAID solutions are inadequate due to poor resource usage. Thus, in this paper, we present alternative schemes which provide a high degree of reliability at low disk storage, bandwidth, and memory costs for on-demand multimedia servers. Moreover, we discuss some of the main issues and trade-offs associated with providing fault tolerance in multidisk VOD systems. We would like to impress upon the reader that one of the main points of this paper is the exposition of trade-offs and issues associated with designing fault-tolerant VOD servers. It is not the case that one fault tolerance scheme is absolutely better than another, but rather that one must understand the trade-offs as well as one's system constraints and then choose a fault tolerance scheme accordingly.
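The RAID-style baseline the abstract compares against relies on parity reconstruction: for a parity group of equal-sized blocks on different disks, the XOR of all blocks is stored as a parity block, and any single lost block equals the XOR of the surviving blocks and the parity. A minimal sketch of that baseline (the paper's alternative schemes are not reproduced):

```python
from functools import reduce


def xor_blocks(blocks) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    blocks = list(blocks)
    if not blocks or len({len(b) for b in blocks}) != 1:
        raise ValueError("need at least one block, all of equal length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_blocks) -> bytes:
    return xor_blocks(data_blocks)


def reconstruct(surviving_blocks, parity_block: bytes) -> bytes:
    """Recover the single missing data block of a parity group."""
    return xor_blocks([*surviving_blocks, parity_block])
```

Rebuilding a block this way requires reading every surviving block of its group, one source of the disk bandwidth and memory cost that the abstract weighs against reliability.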

12.

Global scheduling based reliability-aware power management for multiprocessor real-time systems

Xuan Qi Dakai Zhu Hakan Aydin 《Real-Time Systems》2011,47(2):109-142

Reliability-aware power management (RAPM) has been a recent research focus due to the negative effects of the popular power management technique dynamic voltage and frequency scaling (DVFS) on system reliability. As a result, several RAPM schemes have been studied for uniprocessor real-time systems. In this paper, for a set of frame-based independent real-time tasks running on multiprocessor systems, we study global scheduling based RAPM (G-RAPM) schemes. Depending on how recovery blocks are scheduled and utilized, both individual-recovery and shared-recovery based G-RAPM schemes are investigated. An important dimension of the G-RAPM problem is how to select the appropriate subset of tasks for energy and reliability management (i.e., scale down their executions while ensuring that they can be recovered from transient faults). We show that making such decision optimally (i.e., the static G-RAPM problem) is NP-hard. Then, for the individual-recovery based approach, we study two efficient heuristics, which rely on local and global task selections, respectively. For the shared-recovery based approach, a linear search based scheme is proposed. The schemes are shown to guarantee the timing constraints. Moreover, to reclaim the dynamic slack generated at runtime from early completion of tasks and unused recoveries, we also propose online G-RAPM schemes which exploit the slack-sharing idea studied in previous work. The proposed schemes are evaluated through extensive simulations. The results show the effectiveness of the proposed schemes in yielding energy savings while simultaneously preserving system reliability and timing constraints. For the static version of the problem, the shared-recovery based scheme is shown to provide better energy savings compared to the individual-recovery based scheme, by virtue of its ability to leave more slack for DVFS. Moreover, by reclaiming the dynamic slack generated at runtime, online G-RAPM schemes are shown to yield better energy savings.
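RAPM analyses commonly assume that the transient-fault rate grows exponentially as frequency is scaled down, for example lambda(f) = lambda0 * 10^(d(1 - f)/(1 - f_min)) with f normalized to the maximum frequency. The sketch below uses that model, with illustrative (assumed) values for lambda0, d, and f_min, to show why pairing a scaled-down task with a full-speed recovery never lowers reliability below that of the unscaled task: the task fails only if both the scaled execution and the recovery fail.

```python
import math


def fault_rate(f: float, lambda0: float = 1e-6, d: float = 2.0, f_min: float = 0.4) -> float:
    """Assumed exponential fault-rate model; f is normalized so f_max == 1."""
    return lambda0 * 10 ** (d * (1.0 - f) / (1.0 - f_min))


def reliability(wcet: float, f: float, **model) -> float:
    """Probability of no fault while running a task of WCET `wcet` (at f_max) at speed f."""
    return math.exp(-fault_rate(f, **model) * wcet / f)


def rapm_failure_probability(wcet: float, f: float, **model) -> float:
    """Scaled primary plus a full-speed recovery: failure requires both to fail."""
    return (1 - reliability(wcet, f, **model)) * (1 - reliability(wcet, 1.0, **model))
```

Since the first factor is at most 1, the product never exceeds the failure probability `1 - reliability(wcet, 1.0)` of the original unscaled task; the recovery's reserved time is the price paid for this guarantee.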

13.

Online check and recovery techniques for dependable embedded processors

Pflanz M. Vierhaus H.T. 《IEEE Micro》2001,21(5):24-40

Efficient online check and fast recovery techniques for embedded systems aim to detect single or multiple errors within the same clock cycle in which they occur. It is argued that such techniques can enable fast error correction; detection of illegal states; micro-rollback for transient and permanent faults; and prioritized, controlled recovery.
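Micro-rollback can be modelled in software as a bounded history of recent register-state snapshots: each cycle saves the previous state, and when an online check flags an error the state is restored from a few cycles back. This is a behavioural model with hypothetical names, not the paper's hardware design.

```python
from collections import deque


class MicroRollback:
    """Keeps the register state of the last `depth` cycles for rollback."""

    def __init__(self, registers: dict, depth: int = 4):
        self.registers = dict(registers)
        self.history = deque(maxlen=depth)  # oldest snapshots drop out automatically

    def cycle(self, writes: dict) -> None:
        """Commit one cycle's register writes, saving the prior state first."""
        self.history.append(dict(self.registers))
        self.registers.update(writes)

    def rollback(self, cycles: int = 1) -> None:
        """Undo the last `cycles` cycles after an online check detects an error."""
        if not 1 <= cycles <= len(self.history):
            raise ValueError("cannot roll back that far")
        for _ in range(cycles):
            self.registers = self.history.pop()
```

The `depth` bound reflects the premise that errors are detected within a cycle or a few cycles of occurring, so only a short window of history must be kept.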

14.

Multitask Scheduling in Real-Time Systems

Liu Huai, Hu Jifeng 《Computer Engineering》2002,28(3):43-44,150

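This paper discusses multitask scheduling in real-time systems and improves the rate monotonic scheduling algorithm so that it can be applied to real-time systems containing aperiodic tasks and can tolerate transient system overload to some degree. Finally, the conditions under which the system's tasks are schedulable are given.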
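The paper's own schedulability condition is not given in the abstract. For reference, the classic sufficient test for rate monotonic scheduling of n independent periodic tasks with deadlines equal to periods (Liu and Layland) accepts a task set whose total utilization does not exceed n(2^(1/n) - 1):

```python
def rm_utilization_bound(n: int) -> float:
    """Liu and Layland bound for n periodic tasks under rate monotonic priorities."""
    if n < 1:
        raise ValueError("need at least one task")
    return n * (2 ** (1 / n) - 1)


def rm_schedulable_ll(tasks) -> bool:
    """tasks: iterable of (wcet, period). Sufficient, not necessary, test."""
    tasks = list(tasks)
    utilization = sum(c / t for c, t in tasks)
    return utilization <= rm_utilization_bound(len(tasks))
```

A set that fails this test may still be schedulable; exact response-time analysis decides such cases.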

15.

A New Method for Scheduling Mixed Periodic and Aperiodic Task Sets Based on RMS

Xie Shuanqin, Niu Yun, Lin Wen 《Application Research of Computers》2006,23(8):76-79

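A new method is proposed that uses the rate monotonic scheduling (RMS) algorithm to determine the priorities of the entire task set in a computer real-time system. By exploiting statistical regularities, the method overcomes the limitation of ordinary RMS, which can effectively schedule only the periodic tasks in a system and not the aperiodic ones; it thereby extends the applicability of RMS, simplifies the handling of aperiodic tasks, and reduces system overhead. The method was used to test and verify the schedulability of the entire task set in an advanced aircraft integrated electrical control and management system, and a verification example of the actual task-set schedule is given.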
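The abstract does not specify which statistical regularities are used. Purely as a hypothetical illustration of folding aperiodic tasks into RMS, the sketch below estimates a conservative pseudo-period for each aperiodic task from observed inter-arrival times (the smaller of the minimum observed gap and the mean minus k standard deviations, when that is positive) and then orders all tasks by period, shortest first, as rate monotonic assignment does.

```python
import statistics


def pseudo_period(interarrivals, k: float = 2.0) -> float:
    """Conservative period estimate for an aperiodic task (hypothetical rule)."""
    gaps = list(interarrivals)
    shortest = min(gaps)
    if len(gaps) < 2:
        return shortest
    candidate = statistics.mean(gaps) - k * statistics.stdev(gaps)
    return min(shortest, candidate) if candidate > 0 else shortest


def rm_priority_order(periodic: dict, aperiodic: dict, k: float = 2.0) -> list:
    """periodic: name -> period; aperiodic: name -> observed inter-arrival times.
    Returns task names from highest to lowest rate monotonic priority."""
    periods = dict(periodic)
    for name, gaps in aperiodic.items():
        periods[name] = pseudo_period(gaps, k)
    return sorted(periods, key=lambda name: (periods[name], name))
```

Arrivals faster than the estimated pseudo-period would violate the assumptions behind any RMS guarantee, so a real system would still need a policing or overload mechanism for such bursts.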

16.

Fault Tolerant Faddeeva Algorithm

《Journal of Parallel and Distributed Computing》1998,53(1):78-89

We present an algorithm-based fault-tolerant scheme suitable for array implementations of the Faddeeva algorithm. Our technique corrects errors due to multiple transient, intermittent, or permanent faults provided these are restricted to a single column of the array. We show how to find the location of the faulty column and to determine the correct Schur complement from the erroneous one. The fault recovery algorithm is of quadratic complexity in the number of rows of the input matrix while the hardware overhead is approximately four times the number of rows.
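The Faddeeva-specific encoding is not given in the abstract, so the sketch below shows only the generic algorithm-based fault tolerance (ABFT) checksum idea on a matrix product C = A·B: the column sums of C must equal (e^T A)·B, where e^T A is the row of column sums of A, so any column of C whose sum disagrees is flagged as faulty. Integer matrices are assumed so the comparison is exact.

```python
def matmul(a, b):
    """Plain triple-loop matrix product of nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def column_sums(m):
    return [sum(col) for col in zip(*m)]


def faulty_columns(a, b, c):
    """Indices of columns of c (claimed to equal a·b) whose sums fail the checksum."""
    expected = matmul([column_sums(a)], b)[0]  # (e^T A) B
    return [j for j, (e, s) in enumerate(zip(expected, column_sums(c))) if e != s]
```

A single checksum row can locate a faulty column but not which element in it is wrong, and errors that cancel within a column escape detection; correcting the result, as the paper does for the Schur complement, needs additional encoding.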

17.

Measurement and analysis of workload effects on fault latency in real-time systems

Woodbury M.H. Shin K.G. 《IEEE Transactions on Software Engineering》1990,16(2):212-216

The authors demonstrate the need to address fault latency in highly reliable real-time control computer systems. It is noted that the effectiveness of all known recovery mechanisms is greatly reduced in the presence of multiple latent faults. The presence of multiple latent faults increases the possibility of multiple errors, which could result in coverage failure. The authors present experimental evidence indicating that the duration of fault latency is dependent on workload. A synthetic work generator is used to vary the workload, and a hardware fault injector is applied to inject transient faults of varying durations. This method makes it possible to derive the distribution of fault latency duration. Experimental results obtained from the fault-tolerant multiprocessor at the NASA Airlab are presented and discussed.

18.

Dynamic FTSS in asynchronous systems: The case of unison

Swan Dubois Maria Potop-Butucaru Sébastien Tixeuil 《Theoretical computer science》2011,412(29):3418-3439

Distributed fault-tolerance can mask the effect of a limited number of permanent faults, while self-stabilization provides forward recovery after an arbitrary number of transient faults hit the system. FTSS (Fault-Tolerant Self-Stabilizing) protocols combine the best of both worlds since they tolerate simultaneously transient and (permanent) crash faults. To date, deterministic FTSS solutions either consider static (i.e. fixed point) tasks, or assume synchronous scheduling of the system components. In this paper, we present the first study of deterministic FTSS solutions for dynamic tasks in asynchronous systems, considering the unison problem as a benchmark. Unison can be seen as a local clock synchronization problem as neighbors must maintain digital clocks at most one time unit away from each other, and increment their own clock value infinitely often. We present several impossibility results for this difficult problem and propose an FTSS solution (when the problem is solvable) for the state model that exhibits optimal fault-containment.
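For reference, the unison specification can be illustrated with the basic (non-fault-tolerant) asynchronous rule: a node may increment its clock when its clock is not ahead of any neighbor's. The simulation below uses a randomized central daemon and unbounded clocks; it is neither self-stabilizing nor crash-tolerant, and it does not reproduce the paper's FTSS protocol. Function names are hypothetical.

```python
import random


def enabled(clocks: dict, graph: dict, v) -> bool:
    """A node may tick when its clock is not ahead of any neighbor's clock."""
    return all(clocks[v] <= clocks[u] for u in graph[v])


def is_unison(clocks: dict, graph: dict) -> bool:
    """Safety: neighboring clocks differ by at most one."""
    return all(abs(clocks[v] - clocks[u]) <= 1 for v in graph for u in graph[v])


def run(graph: dict, steps: int, seed: int = 0) -> dict:
    """Central daemon: at each step one randomly chosen enabled node ticks."""
    rng = random.Random(seed)
    clocks = {v: 0 for v in graph}
    for _ in range(steps):
        # Never empty: a node holding the minimum clock value is always enabled.
        candidates = [v for v in graph if enabled(clocks, graph, v)]
        clocks[rng.choice(candidates)] += 1
    return clocks
```

Starting from equal clocks, a tick by a node that is not ahead of its neighbors keeps every neighboring difference within one, so safety holds throughout; liveness follows because some minimum-clock node is always enabled.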

19.

Research on Key Technologies for Real-Time Embedded Fault-Tolerant Systems

Mao Nan, Huang Lan, Wang Zhongyi, Liu Zhicun 《Computer Engineering and Design》2007,28(14):3433-3435,3439

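This paper briefly reviews the development of fault-tolerance techniques and analyzes how systems tolerate faults under different fault models. Transient and intermittent faults can be tolerated with software redundancy, and when software fault tolerance is used in a real-time embedded system, task schedulability must be taken into account; permanent faults are handled with hardware redundancy. On this basis, a model of a real-time dual-computer embedded fault-tolerant system is described, and the key problems in building such a system, namely dual-computer synchronization, fault detection, and arbitration and switchover, are studied together with corresponding solutions.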
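Fault detection and switchover in a dual-computer system are often built on heartbeats. As a hypothetical sketch (the paper's detection and arbitration mechanisms are not described in the abstract), the standby node below promotes itself to active when no heartbeat has arrived within a timeout.

```python
from dataclasses import dataclass


@dataclass
class StandbyNode:
    """Standby side of a hypothetical dual-computer pair."""
    timeout: float              # seconds without a heartbeat before takeover
    role: str = "standby"
    last_heartbeat: float = 0.0

    def on_heartbeat(self, now: float) -> None:
        """Record a heartbeat received from the active node."""
        self.last_heartbeat = now

    def poll(self, now: float) -> str:
        """Fault detection plus switchover: take over once the active node is silent too long."""
        if self.role == "standby" and now - self.last_heartbeat > self.timeout:
            self.role = "active"
        return self.role
```

A timeout alone cannot distinguish a failed peer from a broken heartbeat link, which is why a real design also needs arbitration so that both nodes never act as active at once.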

20.

Preference-oriented fixed-priority scheduling for periodic real-time tasks

《Journal of Systems Architecture》2016

Traditionally, real-time scheduling algorithms prioritize tasks solely based on their timing parameters and cannot effectively handle tasks that have different execution preferences. In this paper, for a set of periodic real-time tasks running on a single processor, where some tasks are preferably executed as soon as possible (ASAP) and others as late as possible (ALAP), we investigate Preference-Oriented Fixed-Priority (POFP) scheduling techniques. First, based on Audsley’s Optimal Priority Assignment (OPA), we study a Preference Priority Assignment (PPA) scheme that attempts to assign ALAP (ASAP) tasks lower (higher) priorities, whenever possible. Then, by considering the non-work-conserving strategy, we exploit the promotion times of ALAP tasks and devise an online dual-queue based POFP scheduling algorithm. Basically, with the objective of fulfilling the execution preferences of all tasks, the POFP scheduler retains ALAP tasks in the delay queue until their promotion times while putting ASAP tasks into the ready queue right after their arrivals. In addition, to further expedite (delay) the executions of ASAP (ALAP) tasks using system slack, runtime techniques based on dummy and wrapper tasks are investigated. The proposed schemes are evaluated through extensive simulations. The results show that, compared to the classical fixed-priority Rate Monotonic Scheduling (RMS) algorithm, the proposed priority assignment scheme and POFP scheduler can achieve significant improvement in terms of fulfilling the execution preferences of both ASAP and ALAP tasks, which can be further enhanced at runtime with the wrapper-task based slack management technique.
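Audsley's OPA fills priority levels from the lowest up, placing at each level any remaining task that is schedulable there with all other remaining tasks above it. A preference-aware variant in the spirit of PPA simply tries ALAP tasks first at each level. The sketch below is a simplification under stated assumptions (implicit deadlines, integer parameters, exact response-time analysis as the per-level test); the paper's PPA may differ in detail.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    wcet: int
    period: int       # implicit deadline: deadline == period
    preference: str   # "ASAP" or "ALAP"


def schedulable_at_lowest(task: Task, higher) -> bool:
    """Exact response-time test for `task` below every task in `higher`."""
    r = task.wcet
    while r <= task.period:
        nxt = task.wcet + sum(math.ceil(r / h.period) * h.wcet for h in higher)
        if nxt == r:
            return True
        r = nxt
    return False


def preference_priority_assignment(tasks):
    """OPA that tries ALAP tasks first at each lowest free level.
    Returns tasks ordered highest to lowest priority, or None if infeasible."""
    unassigned = list(tasks)
    lowest_first = []
    while unassigned:
        # Stable sort: ALAP tasks are tried first, otherwise input order is kept.
        candidates = sorted(unassigned, key=lambda t: t.preference != "ALAP")
        for cand in candidates:
            others = [t for t in unassigned if t is not cand]
            if schedulable_at_lowest(cand, others):
                lowest_first.append(cand)
                unassigned.remove(cand)
                break
        else:
            return None
    return lowest_first[::-1]
```

Because OPA may pick any task that is schedulable at the current level, biasing the choice toward ALAP tasks keeps the search optimal for feasibility; among ASAP tasks this sketch just keeps input order.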