1.

Design and Verification of Distributed Recovery Blocks with CSP

W.L. Yeung S.A. Schneider 《Formal Methods in System Design》2003,22(3):225-248

A case study on the application of Communicating Sequential Processes (CSP) to the design and verification of fault-tolerant real-time systems is presented. The distributed recovery block (DRB) scheme is a design technique for the uniform treatment of hardware and software faults in real-time systems. Through a simple fault-tolerant real-time system design using the DRB scheme, the case study illustrates a paradigm for specifying fault-tolerant software and demonstrates how the different behavioural aspects of a fault-tolerant real-time system design can be separately and systematically specified, formulated, and verified using an integrated set of formal techniques based on CSP.

2.

PLR: A Software Approach to Transient Fault Tolerance for Multicore Architectures

Shye Alex Blomstedt Joseph Moseley Tipp Reddi Vijay Janapa Connors Daniel A. 《Dependable and Secure Computing, IEEE Transactions on》2009,6(2):135-148

Transient faults are emerging as a critical concern in the reliability of general-purpose microprocessors. As architectural trends point toward multicore designs, there is substantial interest in adapting such parallel hardware resources for transient fault tolerance. This paper presents process-level redundancy (PLR), a software technique for transient fault tolerance, which leverages multiple cores for low overhead. PLR creates a set of redundant processes per application process and systematically compares the processes to guarantee correct execution. Redundancy at the process level allows the operating system to freely schedule the processes across all available hardware resources. PLR uses a software-centric approach to transient fault tolerance, which shifts the focus from ensuring correct hardware execution to ensuring correct software execution. As a result, many benign faults that do not propagate to affect program correctness can be safely ignored. A real prototype is presented that is designed to be transparent to the application and can run on general-purpose single-threaded programs without modifications to the program, operating system, or underlying hardware. The system is evaluated for fault coverage and performance on a four-way SMP machine and provides improved performance over existing software transient fault tolerance techniques with a 16.9 percent overhead for fault detection on a set of optimized SPEC2000 binaries.
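The idea of comparing redundant executions can be illustrated with a minimal sketch. It is not the PLR prototype: the replicas here are ordinary in-process callables rather than operating-system processes, and the names `run_redundant`, `checksum` and `faulty_checksum` are hypothetical helpers introduced only for this example.

```python
from collections import Counter
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def run_redundant(replicas: Sequence[Callable[[], T]]) -> T:
    """Run every replica, compare the outputs, and return the majority value."""
    outputs = [replica() for replica in replicas]
    value, votes = Counter(outputs).most_common(1)[0]
    # Require a strict majority; otherwise the fault is detected but not masked.
    if votes <= len(outputs) // 2:
        raise RuntimeError("no majority among replica outputs")
    return value


def checksum(data: bytes) -> int:
    """Stand-in for the application's computation."""
    return sum(data) % 65521


def faulty_checksum(data: bytes) -> int:
    """Simulate a transient fault by flipping one bit of the result."""
    return checksum(data) ^ 0x4


payload = b"process-level redundancy"
result = run_redundant([
    lambda: checksum(payload),
    lambda: faulty_checksum(payload),  # the corrupted replica is outvoted
    lambda: checksum(payload),
])
```

With three replicas a single corrupted output is outvoted; with only two, a mismatch can be detected but not corrected, which is why `run_redundant` raises in that case.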

3.

A system architecture for fault tolerance in concurrent software

Ancona M. Dodero G. Gianuzzi V. Clematis A. Fernandez E.B. 《Computer》1990,23(10):23-32

A system architecture called the recovery metaprogram (RMP) is proposed. It separates the application from the recovery software, giving programmers a single environment that lets them use the most appropriate fault-tolerance scheme. To simplify the presentation of the RMP approach, it is assumed that the fault model is limited to faults originating in the application software, and that the hardware and kernel layers can mask their own faults from the RMP. Also, relationships between backward and forward error recovery are not considered. Some RMP examples are given, and a particular RMP implementation is described.

4.

A cost effective fault-tolerant scheme for RAIDs (total citations: 1; self-citations: 0; citations by others: 1)

Fang Liang Lu Xicheng 《Journal of Computer Science and Technology》2003,18(2):0-0

The rapid progress in mass storage technology has made it possible for designers to implement large data storage systems for a variety of applications. One of the efficient ways to build large storage systems is to use RAIDs only when one error occurs. But in large RAID systems, the fault probability increases as the number of disks increases, and the use of disks with large storage capacity prolongs the recovery time, so the probability of a second disk's fault increases. Therefore, it is necessary to develop methods to recover data when two or more errors have occurred. In this paper, a fault-tolerant scheme is proposed based on an extended Reed-Solomon code; a recovery procedure is designed to correct up to two errors, implemented by software and hardware together; and the scheme is verified by computer simulation. In this scheme, only two redundant disks are used to recover from up to two disks' faults. The encoding and decoding methods, and the implementation based on software and hardware, are described. The application of the scheme in software RAIDs that are built in cluster computers is also described. Compared with existing methods such as EVENODD and DH, the proposed scheme has distinct improvement in implementation and redundancy.

5.

Combining mapping and partitioning exploration for NoC-based embedded systems

Sébastien Le Beux Guy Bois Gabriela Nicolescu Youcef Bouchebaba Michel Langevin Pierre Paulin 《Journal of Systems Architecture》2010,56(7):223-232

Networks on Chip (NoC) have emerged as the key paradigm for designing a scalable communication infrastructure for future Systems on Chip (SoC). An important issue in NoC design is how to map an application on this architecture and how to determine the hardware/software partition that satisfies the performance, cost and flexibility requirements. In this paper, we propose an approach that concurrently optimizes the mapping and the partitioning of streaming applications. The proposed approach exploits multiobjective evolutionary algorithms that are fed by execution performance scores corresponding to the evaluated mappings and partitioning ability to pipeline execution of the streaming application. As a result, the most promising solutions are highlighted for mapping multimedia applications onto a SoC architecture interconnecting 16 nodes through 2D-Mesh and Ring NoC.

6.

A task-parallel programming model with fault-tolerance support

Wang Yizhuo Chen Xu Ji Weixing Su Yan Wang Xiaojun Shi Feng 《Journal of Software》2016,27(7):1789-1804

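Task-parallel programming models have become the mainstream of parallel programming; they improve the system performance of parallel computers by exploiting task parallelism. This paper proposes a task-parallel programming model that supports fault tolerance, integrating fault-tolerance techniques into the task-parallel programming model to improve system reliability while preserving performance. The model uses the task as the basic unit of scheduling, execution, error detection, and recovery, and implements fault-tolerance support at the application level. A Buffer-Commit computation model supports the detection of and recovery from transient errors; application-level diskless checkpointing supports recovery from permanent errors of the node-failure type; and a fault-tolerant work-stealing task scheduling strategy provides dynamic load balancing. Experimental results show that the model provides fault tolerance for hardware errors at low performance overhead.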

7.

A static mapping heuristics to map parallel applications to heterogeneous computing systems

Ranieri Baraglia Renato Ferrini Pierluigi Ritrovato 《Concurrency and Computation》2005,17(13):1579-1605

In order to minimize the execution time of a parallel application running on a heterogeneously distributed computing system, an appropriate mapping scheme is needed to allocate the application tasks to the processors. The general problem of mapping tasks to machines is a well‐known NP‐hard problem and several heuristics have been proposed to approximate its optimal solution. In this paper we propose a static graph‐based mapping algorithm, called Heterogeneous Multi‐phase Mapping (HMM), which permits suboptimal mapping of a parallel application onto a heterogeneous computing distributed system by using a local search technique together with a tabu search meta‐heuristic. HMM allocates parallel tasks by exploiting the information embedded in the parallelism forms used to implement an application, and considering an affinity parameter, that identifies which machine in the heterogeneous computing system is most suitable to execute a task. We compare HMM with some leading techniques and with an exhaustive mapping algorithm. We also give an example of mapping of two real applications using HMM. Experimental results show that HMM performs well demonstrating the applicability of our approach. Copyright © 2005 John Wiley & Sons, Ltd.

8.

Hardware and software for intelligent robotic systems

Kimon P. Valavanis Peter H. Yuan 《Journal of Intelligent and Robotic Systems》1989,1(4):343-373

A hardware and software methodology for the design of the three interactive levels of intelligent robotic systems is proposed. The organization level is modeled as an expert system, the coordination level as a loosely coupled parallel processing system and the execution level as a series of specific hardware components which execute specific tasks. Microprocessor-based configurations and discrete logic design techniques are utilized for the overall system hardware configuration. The proposed methodology does not violate the system hierarchical structure. A case study demonstrates the feasibility of the approach.

9.

Real-time scheduling of cooperative tasks in sensor networks

Hu Kan Liu Yunsheng 《Computer Science》2007,34(10):65-69

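In real-time monitoring applications of sensor networks, large numbers of sensors are scattered across the monitored region to sense information about its environment or the monitored objects, and a group of sensors with limited capabilities often cooperate to complete one large real-time sensing task. Cooperation is an important characteristic of sensor networks, and it requires resource sharing among real-time tasks. Conventional real-time systems typically rely on resource isolation to guarantee task timeliness and therefore cannot handle well the cooperative processing of collected stream data in a sensor network environment. Based on a server-based scheduling framework, this paper uses the temporal attributes of the real-time environment and combines data time with program time, thereby linking application semantics to the programs running in the system. It proposes a real-time scheduling scheme based on temporal dependency relations and presents an event-driven concurrent data flow graph model built on this scheme, together with its implementation mechanism. Analysis shows that the model effectively addresses cooperation in the processing of collected stream data in the monitored region of a sensor network, reduces data loss, and improves the real-time responsiveness of the system.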

10.

Error detection and diagnosis for fault tolerance in distributed systems

Kassem Saleh Khaled Al-Saqabi 《Information and Software Technology》1998,39(14-15)

Early error detection and an understanding of the nature and conditions of an error occurrence can be useful for effective and efficient recovery in distributed systems. Various distributed system extensions were introduced for the implementation of fault tolerance in distributed software systems. These extensions rely mainly on the exchange of contextual information appended to every transmitted application-specific message. Ideally, this information should be used for checkpointing, error detection, diagnosis and recovery should a transient failure occur later during the distributed program execution. In this paper, we present a generalized extension suitable for fault-tolerant distributed systems such as communication software systems and its detection capabilities are shown. Our extension is based on the execution of a message validity test prior to the transmission of messages and the piggybacking of contextual information to facilitate the detection and diagnosis of transient faults in the distributed system.

11.

Fault-Tolerant Rate-Monotonic Scheduling (total citations: 11; self-citations: 0; citations by others: 11)

Ghosh Sunondo Melhem Rami Mossé Daniel Sarma Joydeep Sen 《Real-Time Systems》1998,15(2):149-181

Due to the critical nature of the tasks in hard real-time systems, it is essential that faults be tolerated. In this paper, we present a scheme which can be used to tolerate faults during the execution of preemptive real-time tasks. We describe a recovery scheme which can be used to re-execute tasks in the event of single and multiple transient faults and discuss conditions that must be met by any such recovery scheme. We then extend the original Rate Monotonic Scheduling (RMS) scheme and the exact characterization of RMS to provide tolerance for single and multiple transient faults. We derive schedulability bounds for sets of real-time tasks given the desired level of fault tolerance for each task or subset of tasks. Finally, we analyze and compare those bounds with existing bounds for non-fault-tolerant and other variations of RMS.
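For orientation, the classic non-fault-tolerant utilization bound for RMS, n(2^(1/n) - 1), can be checked in a few lines. The sketch below does not reproduce the fault-tolerant bounds derived in the paper; the `executions` factor is only a pessimistic illustration that assumes every job may run that many times, and the task set is made up for the example.

```python
from typing import Sequence, Tuple

Task = Tuple[float, float]  # (worst-case execution time C, period T)


def liu_layland_bound(n: int) -> float:
    """Classic sufficient utilization bound for n tasks under RMS."""
    return n * (2 ** (1.0 / n) - 1)


def utilization(tasks: Sequence[Task], executions: int = 1) -> float:
    """Total utilization, assuming each job runs `executions` times (assumption)."""
    return sum(executions * c / t for c, t in tasks)


def rm_schedulable(tasks: Sequence[Task], executions: int = 1) -> bool:
    """Sufficient (not necessary) test: utilization within the classic bound."""
    return utilization(tasks, executions) <= liu_layland_bound(len(tasks))


# Hypothetical task set: (C, T) pairs in the same time unit.
tasks = [(1.0, 10.0), (2.0, 20.0), (3.0, 40.0)]

fault_free_ok = rm_schedulable(tasks)                 # utilization 0.275
one_reexecution_ok = rm_schedulable(tasks, 2)         # utilization 0.55
two_reexecutions_ok = rm_schedulable(tasks, 3)        # utilization 0.825
```

For three tasks the bound is roughly 0.78, so this task set passes the test with one re-execution budgeted per job but not with two. Failing a sufficient test does not prove the set unschedulable.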

12.

Hybrid Approach to Faster Functional Verification with Full Visibility

Chin-Lung Chuang Wei-Hsiang Cheng Liu C.-N.J. Dong-Jung Lu 《Design & Test of Computers, IEEE》2007,24(2):154-162

For functional verification, software simulation provides full controllability and observability, whereas hardware emulation offers speed. This article describes a new platform that leverages the advantages of both. This platform implements an efficient scheme to record the internal behavior of an FPGA emulator and replay the relevant segment of a simulation in a software environment for debugging. Experimental results show an order-of-magnitude savings in debugging time compared to a software-only simulation approach.

13.

Automatic task mapping and heterogeneity-aware fault tolerance: The benefits for runtime optimization and application development

《Journal of Systems Architecture》2015,61(10):628-638

The best mapping of a task to one or more processing units in a heterogeneous system depends on multiple variables. Several approaches based on runtime systems have been proposed that determine the best mapping under given circumstances automatically. Some of them also consider dynamic events like varying problem sizes or resource competition that may change the best mapping during application runtime but only a few even consider that task execution may fail. While aging or overheating are well-known causes for sudden faults, the ongoing miniaturization and the growing complexity of heterogeneous computing are expected to create further threats for successful application execution. However, if properly incorporated, heterogeneous systems also offer the opportunity to recover from different types of faults in hardware as well as in software. In this work, we propose a combination of both topics, dynamic performance-oriented task mapping and dependability, to leverage this opportunity. As we will show, this combination not only enables tolerating faults in hardware and software with minor assistance of the developer, it also provides benefits for application development itself and for application performance in case of faults due to a new metric and automatic data management.

14.

A semi-static approach to mapping dynamic iterative tasks onto heterogeneous computing systems

《Journal of Parallel and Distributed Computing》2006,66(1):77-98

Minimization of the execution time of an iterative application in a heterogeneous parallel computing environment requires an appropriate mapping scheme for matching and scheduling the subtasks of a given application onto the processors. Often, some of the characteristics of the application subtasks are unknown a priori or change from iteration to iteration during execution-time based on the inputs being processed. In such a scenario, it may not be feasible to use the same off-line-derived mapping for each iteration of the application. One possibility is to employ a semi-static methodology that starts with an initial mapping but dynamically performs remapping between application iterations by observing the effects of the changing characteristics of the application's input data, called dynamic parameters, on the application's execution time. A contribution in this paper is to implement and evaluate a semi-static methodology involving the on-line use of off-line-derived mappings. The off-line phase is based on a genetic algorithm (GA) to generate high-quality mappings for a range of values for the dynamic parameters. A dynamic parameter space partitioning and sampling scheme is proposed that partitions the parameter space into a number of hyper-rectangles, within which the “best” mapping for each hyper-rectangle is stored in a mapping table. During the on-line phase, the actual dynamic parameters are observed and the off-line-derived mapping table is referenced to choose the most suitable mapping. Experimental results indicate that the semi-static approach outperforms a dynamic on-line approach and performs reasonably close to an infeasible on-line GA approach. Furthermore, the semi-static approach considerably outperforms the method of using the same mapping for all iterations.
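The on-line phase described above amounts to a table lookup keyed by hyper-rectangles of the dynamic parameter space. The following is a minimal sketch under stated assumptions: the table contents, subtask and processor names, and the `Region` and `lookup` helpers are all hypothetical, and the off-line GA that would populate the table is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence

Mapping = Dict[str, str]  # subtask name -> processor name (hypothetical names)


@dataclass
class Region:
    """One hyper-rectangle of the dynamic parameter space and its stored mapping."""
    lower: Sequence[float]  # inclusive lower corner
    upper: Sequence[float]  # exclusive upper corner
    mapping: Mapping

    def contains(self, params: Sequence[float]) -> bool:
        return all(lo <= p < hi
                   for lo, p, hi in zip(self.lower, params, self.upper))


def lookup(table: List[Region], params: Sequence[float]) -> Optional[Mapping]:
    """Return the stored mapping for the region containing params, if any."""
    for region in table:
        if region.contains(params):
            return region.mapping
    return None


# Off-line-derived table (contents invented for illustration).
table = [
    Region((0, 0), (50, 10), {"filter": "cpu0", "fft": "cpu1"}),
    Region((50, 0), (100, 10), {"filter": "cpu1", "fft": "dsp0"}),
]

# On-line phase: observe the parameters each iteration and remap only on change.
current = table[0].mapping
for observed in [(12, 3), (70, 8), (65, 2)]:
    chosen = lookup(table, observed)
    if chosen is not None and chosen != current:
        current = chosen  # remapping happens between iterations
```

A linear scan is enough for a small table; a real implementation with many regions would more plausibly index them by grid coordinates.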

15.

Software fault tolerance in real-time systems

《Information Sciences》1987,42(3):255-282

The paper proposes a technique for providing software fault tolerance in real-time applications demanding fast response and a high degree of reliability. It is shown that a reasonably flexible interprocess communication can be supported with only a small increase in complexity and overhead. The two most prominent features of the proposed scheme are (1) it attempts to exploit fault-avoidance techniques as much as possible to reduce the overhead of fault tolerance and (2) it controls the propagation of errors so as to enable efficient recovery. Formal proofs of the system operation are developed. Besides showing that the scheme works as expected, the arguments serve to highlight the assumptions needed for provably correct operation. Some issues relating to hardware fault tolerance are also considered.

16.

An AES implementation technique resistant to high-order power analysis attacks for smart cards

《Computer Engineering & Science》2009,31(12):16-19

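The operations common to the various transformations of the AES algorithm are extracted, and corresponding atomic operations based on random masking are defined and implemented in hardware. Each AES transformation is converted into a sequence of randomly masked atomic operations, and all intermediate results are masked with different random values during the cipher computation. On this basis, AES is implemented with a combination of software and hardware: the masked atomic operations are implemented in hardware, while control of the computation flow is implemented in software; in addition, randomization of the computation flow is used to further improve the security of the implementation. Security analysis shows that this implementation technique resists both first-order and high-order power analysis attacks. Experimental results show that the proposed AES implementation has low hardware overhead and high security, making it suitable for smart cards.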
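The basic idea of random masking can be shown on the one AES step that is linear over XOR, AddRoundKey. This is only a first-order Boolean-masking sketch in software, not the paper's hardware atomic operations or its flow randomization. The non-linear SubBytes step needs a dedicated masked construction that is not shown, and the helper `xor_bytes` and the sample state are assumptions made for the example.

```python
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


state = bytes(range(16))              # sample 16-byte AES state (illustrative)
round_key = secrets.token_bytes(16)   # sample round key (illustrative)

# Mask the state with a fresh random value before any processing.
mask = secrets.token_bytes(16)
masked_state = xor_bytes(state, mask)

# AddRoundKey is linear over XOR, so it can run directly on the masked value:
# (state ^ mask) ^ key == (state ^ key) ^ mask
masked_state = xor_bytes(masked_state, round_key)

# Switch to a different random mask for the next intermediate result.
# The new mask is applied before the old one is removed, so the unmasked
# value never appears as an intermediate.
new_mask = secrets.token_bytes(16)
masked_state = xor_bytes(xor_bytes(masked_state, new_mask), mask)
mask = new_mask

# Unmasking happens only at the very end of the computation.
result = xor_bytes(masked_state, mask)
```

The order of the two XORs in the re-masking step matters: removing the old mask first would briefly expose the unmasked intermediate.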

17.

Supporting model-driven development using a process-centered software engineering environment

Rita Suzana Pitangueira Maciel Ramon Araújo Gomes Ana Patrícia Magalhães Bruno C. Silva João Pedro B. Queiroz 《Automated Software Engineering》2013,20(3):427-461

The adoption of Model-Driven Development (MDD) is increasing and it is widely recognized as an important approach for building software systems. In addition to traditional development process models, an MDD process requires the selection of metamodels and mapping rules for the generation of the transformation chain which produces models and application code. In this context, software process tasks should be performed in a specific sequence, with the correct input artifacts to produce the output ones. However, existing support tools and transformation engines for MDD do not have a process-centered focus that addresses different kinds of software process activities, such as application modeling and testing to guide the developers. Furthermore, they do not enable process modeling nor the (semi) automated execution of activities during process enactment. The MoDErNE (Model Driven Process-Centered Software Engineering Environment) uses process-centered software engineering environment concepts to improve MDD process specification and enactment by using a metamodeling foundation. In MoDErNE, a software process model may be enacted several times in different software projects. This paper details the MoDErNE environment, its approach and architecture and also the case studies through which the tool was evaluated.

18.

Fault tolerant permutation mapping in multistage interconnection network

《Journal of Systems Architecture》2000,46(3):297-300

An efficient scheme for fault tolerant mapping of permutations is designed. The proposed algorithm uses extra passes through the network, instead of additional hardware.

19.

A CNN data quantization method based on a reconfigurable array

Zhu Jiayang Jiang Lin Li Yuancheng Song Jia Liu Shuai 《Application Research of Computers》2024,41(4):1070-1076

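The large number of convolution operations in convolutional neural network (CNN) models greatly increases network size, which prevents deployment on embedded hardware platforms, and the mismatch between data of different granularities and the underlying hardware structure lowers computational efficiency. To address these problems, this work builds on a reconfigurable array processor developed by the project team and targets arithmetic units that support multiple bit widths. Through software-hardware co-design and reconfigurable computing, it uses Kullback-Leibler (KL) divergence to define custom quantization thresholds and stochastic rounding for truncation, searches for the best radix point position for fixed-length parameters, and designs instructions that support parallel operation at multiple computation granularities together with their convolution mapping scheme, thereby realizing dynamic data quantization at three different bit widths. Experimental results show that quantizing the weights and the feature maps to 8 bits each compresses the model to about 50% of its original size with a 2% loss in accuracy. When test images are quantized to the three bit widths for hardware testing, the speedups reach 1.012, 1.273, and 1.556, respectively; execution time is reduced by up to 35.7% and the number of memory accesses by up to 56.2%, with a relative error of less than 1%. This indicates that the method achieves efficient neural network computation at the three quantization bit widths, and thus both hardware acceleration and model compression.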
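Stochastic rounding to a fixed-point format, one of the ingredients named in the abstract, can be sketched as follows. This is an illustration under assumptions, not the paper's method: the KL-divergence threshold selection is omitted, and `frac_bits` (the radix point position), `total_bits`, and the function names are hypothetical parameters chosen for the example.

```python
import math
import random


def quantize_stochastic(x: float, frac_bits: int, rng: random.Random,
                        total_bits: int = 8) -> int:
    """Quantize x to a signed fixed-point integer with stochastic rounding."""
    scaled = x * (1 << frac_bits)
    floor = math.floor(scaled)
    # Round up with probability equal to the fractional remainder, so the
    # expected value of the quantized number equals the scaled input.
    q = floor + (1 if rng.random() < scaled - floor else 0)
    # Saturate to the representable range of a signed total_bits integer.
    lo, hi = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, q))


def dequantize(q: int, frac_bits: int) -> float:
    """Map the fixed-point integer back to a real value."""
    return q / (1 << frac_bits)


rng = random.Random(0)  # seeded only to make the demonstration repeatable
samples = [quantize_stochastic(0.3, frac_bits=5, rng=rng) for _ in range(10000)]
mean_value = dequantize(sum(samples), 5) / len(samples)  # close to 0.3 on average
```

Each individual sample is off by less than one least-significant bit, while the average over many samples approaches the original value, which is the property that distinguishes stochastic rounding from plain truncation.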

20.

Implementation of the conversation scheme in message-based distributed computer systems

Yang S.-M. Kim K.H. 《Parallel and Distributed Systems, IEEE Transactions on》1992,3(5):555-572

Several different approaches for implementing conversations in message-based distributed computer systems (DCSs) are discussed. Two different exit control strategies (synchronous and asynchronous) and three different approaches to execution of the conversation acceptance test (centralized, decentralized, and semicentralized) are examined and compared in terms of system performance and implementation cost. An efficient approach to run-time management of recovery information based on an extension of the recovery cache scheme is also discussed. The two major types of conversation structures, name-linked recovery block and abstract data type conversations, are examined to analyze which execution approaches are the most efficient for each conversation structure. As a case study, an unmanned vehicle system is used to illustrate how the approaches can be used in a realistic real-time application.