1.

Reconfiguring one-time programmable FPGAs

《Micro, IEEE》1999,19(6):53-63

Field-programmable gate arrays can suffer from a variety of faults, ranging from wire anomalies and defects to inoperative programmable connections. The solution to these faults depends on whether we are dealing with a reprogrammable FPGA or a one-time programmable (OTP) FPGA. To correct faults, developers can reconfigure FPGAs such as those made by Xilinx and Altera by reprogramming. These devices can be programmed many times, for different designs and applications. Correcting faults in OTP FPGAs, such as those made by Actel, is more difficult. For one thing, OTP FPGAs are based on antifuses. With an antifuse, the FPGA's configuration information has an initial (default) value that can be changed, but once changed cannot be restored. Therefore, the procedures to bypass faulty cells or faulty routing in an OTP FPGA must meet more stringent requirements than for reprogrammable FPGAs. The “Reconfiguration Approaches” sidebar describes two methods other researchers have tried. This article describes our approach to reconfiguring OTP FPGAs. We explain how we determine if reconfiguration is feasible, the algorithms we used, and the results of our experiments on a generic OTP FPGA model and a generic detail router.

2.

Formal verification of fault tolerance in safety-critical reconfigurable modules

Jerker Hammarberg Simin Nadjm-Tehrani 《International Journal on Software Tools for Technology Transfer (STTT)》2005,7(3):268-279

Demands for higher flexibility in aerospace applications have led to increasing deployment of reconfigurable modules. In several cases the industry is looking into Field Programmable Gate Arrays (FPGAs) as a means of efficient adaptation of existing components. This paper addresses the safety analysis issues for reconfigurable modules with an emphasis on FPGAs. FPGAs act as digital hardware, but in the context of safety analysis they should be treated as software, i.e. with added demands on formal analysis. The contributions of this paper are twofold. First, we illustrate a development process using a language with formal semantics (Esterel) for design, formal verification of high-level design, and automatic code generation down to synthesizable VHDL. We argue that this process reduces the likelihood of systematic (permanent) faults in the design, and still produces VHDL code that may be of acceptable quality (size of FPGA, delay). Second, in a general approach that is equally applicable to other formal design languages, we illustrate how the effect of transient fault modes and faults in external modules can be formally studied. We modularly extend the component design model with fault models that represent specific or random faults (e.g. radiation leading to bit flips in the component under design), and transient or permanent faults in the rest of the environment. Some faults corrupt inputs to the component and others jeopardise the effect of output signals that control the environment. This process supports a formal version of Failure Modes and Effects Analysis (FMEA). The set-up is then used to formally determine which (single or multiple) fault modes cause violation of the top-level safety-related property, much in the spirit of fault-tree analysis (FTA). All of this is done without building the fault tree and using a common model for design and for safety analyses. An aerospace hydraulic monitoring system is used to illustrate the analysis of fault tolerance.

3.

Efficient dynamic priority based soft error mitigation techniques for configuration memory of FPGA hardware

《Microprocessors and Microsystems》2017

Radiation-induced single-bit upsets (SBUs) and multi-bit upsets (MBUs) are more prominent in Field Programmable Gate Arrays (FPGAs) due to the presence of a large number of latches in the configuration memory (CM) of FPGAs. At the same time, SBUs and MBUs in the CM can permanently or temporarily affect the hardware circuit implemented on the FPGA. Hence, error mitigation and recovery techniques are necessary to protect the FPGA hardware from permanent faults arising from such SBUs and MBUs. Existing techniques used to mitigate the effect of soft errors in FPGAs have high overhead, and their implementations are also quite complex. In this paper, we propose efficient single-bit as well as multi-bit error correcting methods to correct errors in the CM of FPGAs using simple parity equations and erasure codes. These codes are easy to implement, and the needed decoding circuits are also simple. Use of Dynamic Partial Reconfiguration (DPR) along with a download manager based on a simple hardware scheduling algorithm helps to perform the error correction in the CM without suspending the operations of the other hardware blocks. We propose a first-of-its-kind methodology for transient fault correction using efficient error correcting codes with hardware scheduling for FPGAs. To validate the design, we have tested the proposed methodology on a Kintex FPGA. We have also measured parameters such as fault recovery time, power consumption, resource overhead, and error correction efficiency to estimate the performance of our proposed methods.
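As a rough illustration of how simple parity equations can locate and correct a single-bit upset, the following Python sketch uses two-dimensional (row and column) parity over a small bit matrix. This is a generic textbook scheme, not the code construction from the paper. The names `parity_bits` and `correct_single_bit` and the tiny example frame are hypothetical.

```python
from typing import List, Tuple

Bits = List[List[int]]


def parity_bits(block: Bits) -> Tuple[List[int], List[int]]:
    """Return (row parities, column parities) of a 2-D bit matrix."""
    rows = [sum(r) % 2 for r in block]
    cols = [sum(c) % 2 for c in zip(*block)]
    return rows, cols


def correct_single_bit(block: Bits, stored_rows: List[int],
                       stored_cols: List[int]) -> Bits:
    """Return a copy of block with a single upset bit flipped back.

    A single-bit upset changes exactly one row parity and one column
    parity; their intersection is the faulty bit. Any other mismatch
    pattern is left untouched (this sketch does not handle MBUs).
    """
    rows, cols = parity_bits(block)
    bad_rows = [i for i, (a, b) in enumerate(zip(rows, stored_rows)) if a != b]
    bad_cols = [j for j, (a, b) in enumerate(zip(cols, stored_cols)) if a != b]
    fixed = [list(r) for r in block]
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        fixed[bad_rows[0]][bad_cols[0]] ^= 1
    return fixed


# Hypothetical 3x4 "configuration frame" and its stored parities.
frame = [[1, 0, 1, 1], [0, 1, 1, 0], [1, 1, 0, 0]]
golden_rows, golden_cols = parity_bits(frame)

# Inject a single-bit upset at row 1, column 2, then repair it.
upset = [list(r) for r in frame]
upset[1][2] ^= 1
repaired = correct_single_bit(upset, golden_rows, golden_cols)
```

Two-dimensional parity only corrects one flipped bit per block. Handling multi-bit upsets needs a stronger code, which is the role the abstract assigns to erasure codes.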

4.

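A Solution to the Fault Masking Phenomenon in FPGA Testing

Li Xuelian, Zhao Jianxin 《电脑开发与应用》 (Computer Development & Applications) 2007,20(10):39-41

Taking the theoretical architecture of Xilinx FPGAs as the structural model, this paper identifies the fault masking phenomenon in FPGA testing and gives its causes and the conditions for recognizing it. On this basis, it proposes corresponding solutions and recommendations that avoid or reduce fault masking and improve the fault coverage of FPGA testing.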

5.

Toward Increasing FPGA Lifetime (total citations: 1; self-citations: 0; citations by others: 1)

Srinivasan S. Krishnan R. Mangalagiri P. Yuan Xie Narayanan V. Irwin M.J. Sarpatwari K. 《Dependable and Secure Computing, IEEE Transactions on》2008,5(2):115-127

Field-Programmable Gate Arrays (FPGAs) have been aggressively moving to lower gate length technologies. Such a scaling of technology has an adverse impact on the reliability of the underlying circuits in such architectures. Various physical phenomena have been recently explored and demonstrated to impact the reliability of circuits in the form of both transient error susceptibility and permanent failures. In this work, we analyze the impact of two different types of hard errors, namely, Time-Dependent Dielectric Breakdown (TDDB) and Electromigration (EM), on FPGAs. We also study the performance degradation of FPGAs over time caused by Hot-Carrier Effects (HCE) and Negative Bias Temperature Instability (NBTI). Each study is performed on the components of FPGAs most affected by the respective phenomena, from both the performance and reliability perspectives. Different solutions are demonstrated to counter each failure and degradation phenomenon to increase the operating lifetime of the FPGAs.

6.

Dynamic scheduling of task graphs in multi-FPGA systems using critical path

Ramezani Reza 《The Journal of supercomputing》2021,77(1):597-618

SRAM-based FPGAs feature high performance and flexibility. Thus, they have found many applications in modern high-performance computing (HPC) systems. These systems suffer from limited computing resources when running HPC applications. Therefore, multi-FPGA systems have emerged to alleviate such resource limitations. In this regard, efficient scheduling strategies are required to dynamically steer the execution of applications (represented as task graphs) on a set of connected FPGAs. In this paper, a heuristic-based dynamic critical path-aware scheduling technique named CPA is presented to schedule task graphs on multi-FPGA systems. The proposed technique, by considering the computation and communication capabilities of FPGAs, dynamically assigns priority to tasks in different steps in order to achieve better makespans. The proposed technique has been evaluated by conducting several experiments on real-world task graphs and on three different shapes of random task graphs with different numbers of tasks, and its efficiency has been compared with that of three task graph scheduling approaches. The obtained results demonstrate that the proposed CPA technique outperforms well-known heuristic scheduling strategies and improves their makespan by 13.47% on average. In addition, the experiments show that the proposed technique generates the schedules in the order of milliseconds and that the average of its yielded makespans is 12.05% longer than that of an optimum schedule.
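To make the idea of critical-path-aware priorities concrete, the sketch below computes, for each task in a small task graph, the length of the longest path from that task to an exit task, counting both execution and communication costs. This is the classic static "upward rank" used by list schedulers, shown only as an illustration; it is not the dynamic CPA algorithm from the paper. The function name, the graph, and all costs are hypothetical.

```python
from functools import lru_cache
from typing import Dict, List, Tuple


def critical_path_lengths(
    cost: Dict[str, float],
    succ: Dict[str, List[Tuple[str, float]]],
) -> Dict[str, float]:
    """Map each task to the longest path length from it to an exit task.

    cost: task -> execution time
    succ: task -> list of (successor, communication cost) edges
    The graph is assumed to be acyclic (a task graph / DAG).
    """

    @lru_cache(maxsize=None)
    def rank(task: str) -> float:
        # Own execution time plus the most expensive way to finish.
        tail = max((comm + rank(nxt) for nxt, comm in succ.get(task, [])),
                   default=0)
        return cost[task] + tail

    return {task: rank(task) for task in cost}


# Hypothetical diamond-shaped task graph: A -> {B, C} -> D.
cost = {"A": 2, "B": 3, "C": 1, "D": 2}
succ = {"A": [("B", 1), ("C", 4)], "B": [("D", 1)], "C": [("D", 1)]}

ranks = critical_path_lengths(cost, succ)
# Tasks on longer remaining paths get higher scheduling priority.
priority_order = sorted(ranks, key=ranks.get, reverse=True)
```

A dynamic scheduler would recompute such priorities as tasks finish and as the load on each FPGA changes. The static version above shows only the underlying longest-path computation.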


7.

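A Fault-Tolerant Microarchitecture That Inserts Redundant Instructions After Branch Instructions

Zhang Shijian, Hu Weiwu 《计算机学报》 (Chinese Journal of Computers) 2007,30(10):1674-1680

With the widespread use of deep-submicron processes, transient faults have become the main cause of chip failure. This paper proposes a fault-tolerant microarchitecture that inserts redundant instructions after branch instructions, exploiting the processing bandwidth wasted by branch mispredictions to reduce the performance loss caused by redundant execution. Experimental results show that the performance loss of this technique ranges from 6% to 31%, with an average of 21%, which is clearly lower than that of the MBI technique and comparable to that of the DIE technique. The technique can detect transient faults occurring at every pipeline stage and can restore processor state. It has a short fault-detection latency and a small hardware overhead, which makes it well suited to improving the fault tolerance of embedded microprocessors with simple prediction mechanisms.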

8.

Communication adaptive self-stabilizing group membership service

Dolev S. Schiller E. 《Parallel and Distributed Systems, IEEE Transactions on》2003,14(7):709-720

This paper presents the first (randomized) algorithm for implementing self-stabilizing group communication services in an asynchronous system. Our algorithm converges rapidly to legal behavior and is communication adaptive, namely, the communication volume is high when the system recovers from the occurrence of faults and is low once a legal state is reached. Communication adaptability is achieved by a new technique that combines transient fault detectors.

9.

Performance evaluation and optimal design for FPGA-based digit-serial DSP functions

Hanho Lee, Gerald E. Sobelman 《Computers & Electrical Engineering》2003,29(2):357-377

As field programmable gate array (FPGA) technology has steadily improved, FPGAs have become viable alternatives to other technology implementations for high-speed classes of digital signal processing (DSP) applications. Digit-serial DSP architectures have been an effective implementation method for FPGAs. In this work, a method of implementing digit-serial DSP architectures on FPGAs is presented, and their performance is evaluated with the objective of finding and developing the most efficient digit-serial DSP architectures on FPGAs. This paper discusses area costs and operational delays of the various digit-serial DSP functions and presents the area/delay models on Xilinx XC4000-series FPGAs. These area/delay models can make predictions of performance and hardware resource utilization before a lengthy layout and synthesis process is undertaken. The results show that the area/delay models proposed here are valid and that the digit-serial DSP designs are promising candidates for efficient FPGA implementations.

10.

PLR: A Software Approach to Transient Fault Tolerance for Multicore Architectures

Shye Alex Blomstedt Joseph Moseley Tipp Reddi Vijay Janapa Connors Daniel A. 《Dependable and Secure Computing, IEEE Transactions on》2009,6(2):135-148

Transient faults are emerging as a critical concern in the reliability of general-purpose microprocessors. As architectural trends point toward multicore designs, there is substantial interest in adapting such parallel hardware resources for transient fault tolerance. This paper presents process-level redundancy (PLR), a software technique for transient fault tolerance, which leverages multiple cores for low overhead. PLR creates a set of redundant processes per application process and systematically compares the processes to guarantee correct execution. Redundancy at the process level allows the operating system to freely schedule the processes across all available hardware resources. PLR uses a software-centric approach to transient fault tolerance, which shifts the focus from ensuring correct hardware execution to ensuring correct software execution. As a result, many benign faults that do not propagate to affect program correctness can be safely ignored. A real prototype is presented that is designed to be transparent to the application and can run on general-purpose single-threaded programs without modifications to the program, operating system, or underlying hardware. The system is evaluated for fault coverage and performance on a four-way SMP machine and provides improved performance over existing software transient fault tolerance techniques with a 16.9 percent overhead for fault detection on a set of optimized SPEC2000 binaries.

11.

Transient fault tolerance in digital systems (total citations: 1; self-citations: 0; citations by others: 1)

Sosnowski J. 《Micro, IEEE》1994,14(1):24-35

It is hard to shield systems effectively from transient faults (fault avoidance techniques), so other means must be employed to assure appropriate levels of transient fault tolerance (insensitivity to transient faults). These means are based on fault-masking and fault-recovery ideas. Having analyzed this problem, the author identifies critical design points and outlines some practical solutions that refer to efficient on-line detectors (detecting errors during system operation) and error handling procedures. This framework provides a basis for understanding transient fault problems in digital systems. It can be helpful in selecting optimum techniques to mask or eliminate transient fault effects in developed systems.

12.

Extended overlay architectures for heterogeneous FPGA cluster management

《Journal of Systems Architecture》2017

This paper proposes a novel approach for the hardware virtualization of FPGA resources, based on overlay architectures. Overlays are reconfigurable architectures synthesized on top of commercial off-the-shelf (COTS) FPGAs. They have been shown to improve portability, speed up reconfiguration, and promote resource abstraction and hence durability. This work demonstrates how slightly extending the architecture overlaid on top of COTS FPGAs can bring novel features for the sake of improved management of hardware tasks, and can ensure binary compatibility among heterogeneous FPGAs. This comes along with a deployment platform and a software stack offering an operating system service. As a result, the platform is capable of node-to-node live migration of a hardware application while operating a cluster of heterogeneous FPGAs. In addition, the proposed software stack ensures backward compatibility when a new overlay architecture is introduced. This paper also introduces accurate cost models for the early estimation of the reconfiguration time overhead. This approach, which was demonstrated at the DASIP international conference, is evaluated in this paper on both the Xilinx Artix-7 and Altera Cyclone V C9 FPGAs.

13.

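A Survey of Transient Fault Recovery and Safety Control in Networked Control Systems

Zhou Chunjie, Huang Xiongfeng, Qin Yuanqing, Yang Mingyue 《控制与决策》 (Control and Decision) 2011,26(10):1441-1446

Given that transient faults are the main form of fault leading to control system accidents, and that transient fault recovery is an important means of ensuring system safety, this paper first introduces current methods for transient fault recovery based on active redundancy and on system model analysis. It then surveys applied research on these techniques for transient fault recovery and safety control at the communication network, network node, and system levels of networked control systems. Finally, it discusses the outlook for the development of transient fault recovery and safety control methods for networked control systems.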

14.

Enhancing Security of FPGA-Based Embedded Systems with Combinational Logic Binding

Ji-Liang Zhang, Wei-Zheng Wang, Xing-Wei Wang, Zhi-Hua Xia 《计算机科学技术学报》 (Journal of Computer Science and Technology) 2017,32(2):329-339

With the increasing use of field-programmable gate arrays (FPGAs) in embedded systems and many embedded applications, the failure to protect FPGA-based embedded systems from cloning attacks has brought serious losses to system developers. This paper proposes a novel combinational logic binding technique specifically to protect FPGA-based embedded systems from cloning attacks and provides a pay-per-device licensing model for the FPGA market. Security analysis shows that the proposed binding scheme is robust against various types of malicious attacks. Experimental evaluations demonstrate the low overhead of the proposed technique.

15.

Dependable design technique for system-on-chip

Pavel Hana 《Journal of Systems Architecture》2008,54(3-4):452-464

A technique for highly reliable digital design for two FPGAs under processor control is presented. Two FPGAs are used in a duplex configuration system design, but better dependability parameters are obtained by the combination of totally self-checking blocks based on a parity predictor. Each FPGA can be reconfigured when an SEU fault is detected. This reconfiguration is controlled by a control unit implemented in a processor. Combinational circuit benchmarks have been considered in all our experiments and computations. All our experimental results are obtained from a Xilinx FPGA implementation using EDA tools. The dependability model and dependability calculations are presented to document the improved reliability parameters.

16.

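Transient Fault Recovery in Node Communication for Industrial Wireless Sensor Networks

Hu Hao, Huang Xiongfeng, Yang Mingyue, Zhou Chunjie 《软件》 (Software) 2011,32(9):12-15

Wireless sensor networks are widely used in industry. Transient faults such as electromagnetic interference, sudden power loss, and unexpected software errors can cause system failure and compromise system safety, so their control and recovery are receiving growing attention in safety-related fields. Considering that transient faults may occur at multiple levels of the network, this paper proposes a three-layer fault handling mechanism that controls and recovers from faults using hardware logic adjustment at the chip level, retransmission and frequency hopping at the node level, and redundant routing at the system level. Experimental results show that, with the three-layer fault handling mechanism, the network's packet loss rate and end-to-end delay are effectively reduced, and the reliability and safety of node communication are effectively improved.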

17.

Generating synthetic benchmark circuits for accelerated life testing of field programmable gate arrays using genetic algorithm and particle swarm optimization

《Applied Soft Computing》2015

Accelerated life testing (ALT) of a field programmable gate array (FPGA) requires it to be configured with a circuit that satisfies multiple criteria. Hand-crafting such a circuit is a herculean task, as many components of the criteria are orthogonal to each other, demanding a complex multivariate optimization. This paper presents an evolutionary algorithm aided by a particle swarm optimization methodology to generate synthetic benchmark circuits (SBCs) that can be used for ALT of FPGAs. The proposed algorithm was used to generate an SBC for ALT of a commercial FPGA. The generated SBC, when compared with a hand-crafted one, proved more suitable for ALT, measured in terms of meeting the multiple criteria. The SBC generated by the proposed technique utilizes 8.37% more resources, operates at a maximum frequency that is 40% higher, and has 7.75% higher switching activity than the hand-crafted one reported in the literature. The hand-crafted circuit is very specific to the particular device of that family of FPGAs, whereas the proposed algorithm is device-independent. In addition, it took several man-months to hand-craft the SBC, whereas the proposed algorithm took less than half a day.

18.

Sorting networks on FPGAs

Rene Mueller Jens Teubner Gustavo Alonso 《The VLDB Journal The International Journal on Very Large Data Bases》2012,21(1):1-23

Computer architectures are quickly changing toward heterogeneous many-core systems. Such a trend opens up interesting opportunities but also raises immense challenges since the efficient use of heterogeneous many-core systems is not a trivial problem. Software-configurable microprocessors and FPGAs add further diversity but also increase complexity. In this paper, we explore the use of sorting networks on field-programmable gate arrays (FPGAs). FPGAs are very versatile in terms of how they can be used and can also be added as additional processing units in standard CPU sockets. Our results indicate that efficient usage of FPGAs involves non-trivial aspects such as having the right computation model (a sorting network in this case); a careful implementation that balances all the design constraints in an FPGA; and the proper integration strategy to link the FPGA to the rest of the system. Once these issues are properly addressed, our experiments show that FPGAs exhibit performance figures competitive with those of modern general-purpose CPUs while offering significant advantages in terms of power consumption and parallel stream evaluation.
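For readers unfamiliar with the computation model named above, a sorting network is a fixed sequence of compare-exchange operations whose positions do not depend on the data, which is what makes it map naturally onto hardware. The Python sketch below models the well-known five-comparator network for four inputs in software. It illustrates the concept only and is not the FPGA design evaluated in the paper; the names `NETWORK_4` and `sort4` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Five-comparator sorting network for 4 inputs. Comparators in the same
# stage touch disjoint wires, so hardware could evaluate them in parallel.
NETWORK_4: List[Tuple[int, int]] = [
    (0, 1), (2, 3),  # stage 1
    (0, 2), (1, 3),  # stage 2
    (1, 2),          # stage 3
]


def sort4(values: Sequence[int]) -> List[int]:
    """Sort exactly four values with a fixed compare-exchange sequence."""
    if len(values) != 4:
        raise ValueError("this network is wired for exactly 4 inputs")
    wires = list(values)
    for low, high in NETWORK_4:
        # Compare-exchange: route the smaller value to the lower wire.
        if wires[low] > wires[high]:
            wires[low], wires[high] = wires[high], wires[low]
    return wires


result = sort4([3, 1, 4, 2])
```

Because the comparator sequence is data-independent, each comparator can become a small fixed circuit and each stage a pipeline step, with no branching on the data.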

19.

Phase clocks for transient fault repair

Herman T. 《Parallel and Distributed Systems, IEEE Transactions on》2000,11(10):1048-1057

Phase clocks are synchronization tools that implement a form of logical time in distributed systems. For systems tolerating transient faults by self-repair of damaged data, phase clocks can enable reasoning about the progress of distributed repair procedures. This paper presents a phase clock algorithm suited to the model of transient memory faults in asynchronous systems with read/write registers. The algorithm is self-stabilizing and guarantees accuracy of phase clocks within O(k) time following an initial state that is.

20.

Fault-Tolerant Rate-Monotonic Scheduling (total citations: 11; self-citations: 0; citations by others: 11)

Ghosh Sunondo Melhem Rami Mossé Daniel Sarma Joydeep Sen 《Real-Time Systems》1998,15(2):149-181

Due to the critical nature of the tasks in hard real-time systems, it is essential that faults be tolerated. In this paper, we present a scheme which can be used to tolerate faults during the execution of preemptive real-time tasks. We describe a recovery scheme which can be used to re-execute tasks in the event of single and multiple transient faults and discuss conditions that must be met by any such recovery scheme. We then extend the original Rate Monotonic Scheduling (RMS) scheme and the exact characterization of RMS to provide tolerance for single and multiple transient faults. We derive schedulability bounds for sets of real-time tasks given the desired level of fault tolerance for each task or subset of tasks. Finally, we analyze and compare those bounds with existing bounds for non-fault-tolerant and other variations of RMS.
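As background for the bounds mentioned above, the Python sketch below implements the classic non-fault-tolerant RMS utilization test of Liu and Layland: a set of n periodic tasks is schedulable under RMS if its total utilization does not exceed n(2^(1/n) - 1). This is the baseline bound only. The fault-tolerant bounds derived in the paper are not reproduced here, and the function names and task sets are hypothetical.

```python
from typing import List, Tuple

Task = Tuple[float, float]  # (worst-case execution time C, period T)


def liu_layland_bound(n: int) -> float:
    """Classic RMS utilization bound n * (2**(1/n) - 1) for n tasks."""
    if n < 1:
        raise ValueError("need at least one task")
    return n * (2 ** (1 / n) - 1)


def total_utilization(tasks: List[Task]) -> float:
    """Sum of C/T over all tasks."""
    return sum(c / t for c, t in tasks)


def passes_utilization_test(tasks: List[Task]) -> bool:
    """Sufficient (not necessary) schedulability test for plain RMS.

    True guarantees schedulability without faults; False is inconclusive.
    """
    return total_utilization(tasks) <= liu_layland_bound(len(tasks))


# Hypothetical task sets: utilization 0.65 and 0.90 respectively.
light = [(1, 4), (1, 5), (2, 10)]
heavy = [(2, 4), (2, 5)]
light_ok = passes_utilization_test(light)
heavy_ok = passes_utilization_test(heavy)
```

Re-executing a task after a transient fault consumes extra processor time, so any fault-tolerant variant has to leave slack beyond what this baseline test requires.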