Similar Articles
20 similar articles found.
1.
Faults in Networks-on-Chip (NoCs) are inevitable as feature sizes continue to shrink and the number of processing elements keeps increasing. A fault is revocable if it is transient. Transient faults may occur inside the router, in the core, or in the communication wires; examples include buffer overflow in routers, clock skew, and crosstalk. Transient faults can be revoked by retransmitting the faulty packets using oblivious or adaptive routing algorithms. Irrevocable faults render a segment nonfunctional and mainly arise during the fabrication process. NoC reliability increases with efficient routing algorithms that can tolerate the maximum number of faults without deadlocking the network. Whereas transient faults are temporary and can easily be revoked by retransmitting the packet, permanent faults require efficient routing that bypasses the nonfunctional segments. Our focus is therefore on the analysis of adaptive minimal-path fault-tolerant routing for handling permanent faults. A comparative analysis of the partially adaptive fault-tolerant routing algorithms West-First, North-Last, Negative-First, and Odd-Even and the Minimal path Fault Tolerant routing (MinFT) algorithm under node and link failures is performed on the 2D mesh topology using the NoC Interconnect RoutinG and Application Modeling simulator (NIRGAM). The results suggest that MinFT sustains data transmission under worst-case conditions better than the other adaptive routing algorithms.
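West-First, one of the partially adaptive turn-model algorithms compared in this study, can be summarized by its routing function. The sketch below is a textbook-style illustration, not the NIRGAM implementation or MinFT; it assumes (x, y) mesh coordinates with x growing eastward and y growing northward, and the fault-aware helper `pick_port` is hypothetical.

```python
def west_first_directions(cur, dst):
    """Allowed minimal output ports under West-First routing on a 2D mesh.

    Assumes coordinates (x, y) with x growing eastward and y growing northward.
    West-First forbids turns into the west direction, so all westward hops
    are taken first (deterministically); otherwise the router may choose
    adaptively among the productive east/north/south ports.
    """
    (cx, cy), (dx, dy) = cur, dst
    if (cx, cy) == (dx, dy):
        return {"LOCAL"}  # packet has arrived; eject to the local core
    if dx < cx:
        return {"WEST"}  # destination lies west: no adaptivity allowed yet
    allowed = set()
    if dx > cx:
        allowed.add("EAST")
    if dy > cy:
        allowed.add("NORTH")
    elif dy < cy:
        allowed.add("SOUTH")
    return allowed


def pick_port(cur, dst, faulty_ports=frozenset()):
    """Hypothetical fault-aware selection: drop ports whose links are known to be
    faulty and take the first survivor; None means no minimal port is usable."""
    healthy = sorted(west_first_directions(cur, dst) - set(faulty_ports))
    return healthy[0] if healthy else None
```

In this sketch, once the westward hops are finished a failed east link can still be avoided by going north or south, but a failed west link leaves no minimal alternative, which illustrates how partial adaptivity limits fault tolerance.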

2.
As one of the main trends in communication technology for 3D integrated circuits, 3D networks-on-chip (NoCs) have drawn considerable attention from academia. Links are the main components of NoCs. For permanent link faults, fault-tolerant routing has been regarded as an effective mechanism for ensuring the performance of 2D NoCs. In this paper, we propose a low-overhead fault-tolerant routing scheme called LOFT for 3D mesh NoCs that does not require any virtual channels (VCs). LOFT is deadlock-free because it adopts a logic-based routing named LBDRe2, guided by a turn model called Complete-OE. The experimental results show that LOFT achieves better performance, improved reliability, and lower overhead compared with state-of-the-art reliable routing schemes.

3.
This paper presents a quantitative reliability analysis of a system designed to tolerate both hardware and software faults. The system achieves integrated fault tolerance by implementing N-version programming (NVP) on redundant hardware. The analysis considers unrelated software faults, related software faults, transient hardware faults, permanent hardware faults, and imperfect coverage. The overall model is a Markov model in which the states of the Markov chain represent the long-term evolution of the system structure. For each operational configuration, a fault-tree model captures the effects of software faults and transient hardware faults on the task computation. The software fault model is parameterized using experimental data from a recent implementation of an NVP system built with the current design paradigm. The hardware model is parameterized using typical failure rates for hardware faults and coverage parameters. The authors' results show that it is important to consider both hardware and software faults in the reliability analysis of an NVP system, since these estimates vary with time. Moreover, the error detection and recovery function is extremely important to fault-tolerant software: a reduction of several orders of magnitude in system unreliability can be observed if this function is provided promptly.
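To see why related software faults matter in NVP, a much simpler model than the paper's Markov/fault-tree analysis already helps. The sketch below assumes that versions fail independently, adds one hypothetical parameter `p_related` for a fault that defeats all versions at once, and ignores hardware faults and coverage entirely.

```python
from math import comb


def nvp_reliability(p_version, n=3, p_related=0.0):
    """Probability that an n-version system with majority voting is correct.

    Simplifying assumptions (not the paper's model): versions fail
    independently with probability 1 - p_version, and a single "related"
    fault that defeats all versions at once occurs with probability
    p_related. Hardware faults and coverage are ignored.
    """
    k_min = n // 2 + 1  # smallest majority of n votes
    independent = sum(
        comb(n, k) * p_version**k * (1 - p_version) ** (n - k)
        for k in range(k_min, n + 1)
    )
    return (1.0 - p_related) * independent
```

For n = 3 this reduces to 3p^2 - 2p^3; with p = 0.99 it gives about 0.9997, so a related-fault probability of only 10^-3 would already dominate the remaining unreliability, which illustrates why related faults are modeled separately.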

4.
This paper presents a novel asynchronous design approach for multiple-input multiple-output (MIMO) satellite communication (SatCom) systems. One of the main challenges for MIMO SatCom systems is that they are prone to transient faults, which are typically attributable to radiation hazards. Hence, instead of using conventional synchronous circuits, we build our design from asynchronous circuits, since they inherently have a high tolerance to transient faults. Additionally, we adopt an accelerated dual paths (ADP) design in our system. By carefully arranging the data flow between the two paths, the ADP design approach can further accelerate the asynchronous system and increase its reliability by circumventing transient-fault-induced delay, as well as by tolerating latch-ups and other permanent faults. The numerical results show that this design approach is promising. For example, the proposed design can decrease the delay overhead of the entire system from 43.5% to 19.8% at a fault rate of 400/clock cycle.

5.
Low-power fault tolerance design techniques trade off reliability to reduce the area cost and power overhead of integrated circuits by protecting only a subset of their workload or their most vulnerable parts. However, in the presence of faults, not all workloads are equally susceptible to errors. In this paper, we present a low-power fault tolerance design technique that selects and protects the most susceptible workload. We propose ranking workload susceptibility as the likelihood that an error bypasses the logic masking of the circuit and propagates to its outputs. The susceptible workload is protected by a partial Triple Modular Redundancy (TMR) scheme. We evaluate the proposed technique on timing-independent and timing-dependent errors induced by permanent and transient faults. In comparison with an unranked selective fault tolerance approach, we demonstrate a) similar error coverage with a 39.7% average reduction in area overhead, or b) an 86.9% average error coverage improvement for similar area overhead. For the same area overhead, we observe error coverage improvements of 53.1% and 53.5% against permanent stuck-at and transition faults, respectively, and average error coverage improvements of 151.8% and 89.0% against timing-dependent and timing-independent transient faults, respectively. Compared to TMR, the proposed technique achieves an area and power overhead reduction of 145.8% to 182.0%.
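The ranking idea (how likely an error is to escape logic masking for a given input) can be illustrated on a toy circuit. This is a minimal sketch under loud assumptions: the circuit `out = (a AND b) OR c`, its two fault sites, and the budget-based selection are all hypothetical and not the paper's benchmarks or algorithm. Susceptibility is measured by flipping each fault site and counting how often the output changes.

```python
from itertools import product

FAULT_SITES = ("n1", "out")  # hypothetical internal node and primary output


def toy_circuit(a, b, c, flip=None):
    """Hypothetical circuit out = (a AND b) OR c, with an optional bit flip at one site."""
    n1 = a & b
    if flip == "n1":
        n1 ^= 1
    out = n1 | c  # c = 1 logically masks any error on n1
    if flip == "out":
        out ^= 1
    return out


def susceptibility(vector):
    """Fraction of fault sites whose flip is not logically masked for this input."""
    golden = toy_circuit(*vector)
    hits = sum(toy_circuit(*vector, flip=site) != golden for site in FAULT_SITES)
    return hits / len(FAULT_SITES)


def select_for_tmr(workload, budget):
    """Rank the workload by susceptibility and protect only the top `budget` items."""
    ranked = sorted(workload, key=susceptibility, reverse=True)
    return ranked[:budget]


workload = list(product((0, 1), repeat=3))
protected = select_for_tmr(workload, budget=4)
```

With a budget of four vectors, the sketch protects exactly the inputs with c = 0, the ones where an error on n1 is not masked; an unranked selection would spend part of the budget on inputs that already mask half the faults.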

6.
Best Effort (BE) and Guaranteed Throughput (GT) services are the two broad categories of communication services provided in NoCs. Few existing NoC architectures provide both. GT services, which are based on circuit switching or on connection-oriented packet switching, are usually preferred for real-time traffic, while packet-switching services are provided by the BE architecture. In this paper, biologically inspired fault-tolerant techniques are implemented on these two services. Biologically inspired techniques offer novel ways of making NoCs fault tolerant; faults in NoCs arise partly from advanced nanoscale manufacturing processes and the complex communication requirements of the processing elements (PEs). The proposed NoC fault-tolerant methods (synaptogenesis and sprouting) are adapted from the biological brain's robust fault-tolerant mechanisms. These techniques are implemented on both BE and GT services. In the experimental results, the BE architecture utilized bandwidth more efficiently than the GT services, while the GT services achieved better throughput utilization. The accepted traffic (flit/cycle/node) of the BE architecture is 6.31% better than that of the GT architecture, while the accepted traffic of the bio-inspired techniques is 72.12% better than that of traditional fault-tolerant techniques.

7.
Extensive system testing is mandatory nowadays to achieve high product quality. Telecommunication systems are particularly sensitive to this requirement: to remain competitive, manufacturers need to combine reduced costs, shorter life cycles, advanced technologies, and high quality. Moreover, strict reliability constraints usually impose very low fault latencies and a high degree of fault detection for both permanent and transient faults. This article analyzes major problems related to testing complex telecommunication systems, with particular emphasis on their memory modules, which are often critical from a reliability point of view. In particular, advanced BIST-based solutions are analyzed, and two significant industrial case studies are presented.
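Memory BIST engines are commonly built around March algorithms. The sketch below runs the well-known March C- sequence, {⇕(w0); ⇑(r0,w1); ⇑(r1,w0); ⇓(r0,w1); ⇓(r1,w0); ⇕(r0)}, against a software memory model with injected stuck-at faults. It is a generic illustration, not either of the article's industrial solutions; the `FaultyMemory` model and its fault map are assumptions, and the "any order" elements are simply run in ascending order.

```python
class FaultyMemory:
    """Bit-oriented memory model with optional stuck-at cells (hypothetical)."""

    def __init__(self, size, stuck_at=None):
        self.cells = [0] * size
        self.stuck_at = dict(stuck_at or {})  # address -> forced bit value

    def write(self, addr, value):
        self.cells[addr] = self.stuck_at.get(addr, value)

    def read(self, addr):
        return self.stuck_at.get(addr, self.cells[addr])


def march_c_minus(mem, size):
    """Run March C- and return the sorted addresses where a read mismatched."""
    up = range(size)
    down = range(size - 1, -1, -1)
    elements = [
        (up, [("w", 0)]),
        (up, [("r", 0), ("w", 1)]),
        (up, [("r", 1), ("w", 0)]),
        (down, [("r", 0), ("w", 1)]),
        (down, [("r", 1), ("w", 0)]),
        (up, [("r", 0)]),
    ]
    failing = set()
    for order, ops in elements:
        for addr in order:
            for op, val in ops:
                if op == "w":
                    mem.write(addr, val)
                elif mem.read(addr) != val:
                    failing.add(addr)  # expected value not read back
    return sorted(failing)
```

For example, `march_c_minus(FaultyMemory(8, {3: 1, 5: 0}), 8)` flags addresses 3 and 5: the stuck-at-1 cell fails the first r0, and the stuck-at-0 cell fails the first r1.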

8.
Due to shrinking feature sizes and significant reductions in noise margins, nanoscale circuits have become more susceptible to manufacturing defects, radiation interference, and noise-related transient faults. Many of these faults are not permanent, but their occurrence can still cause circuits to malfunction, either because of the complexity of digital circuits or because of their interaction with software. Fault-tolerant schemes such as triple-modular redundancy (TMR) are increasingly being implemented in digital systems. One drawback of this scheme is that the reliability of the voter circuit is assumed to be very high, which may not be true. Most digital circuits are implemented as integrated circuits, so all circuit elements are fabricated with the same technology and therefore usually have the same reliability. In this paper, we present a novel fault-tolerant voter circuit that can itself tolerate a fault and produce an error-free output, thereby improving the overall system's reliability.
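For reference, the conventional TMR arrangement that this work starts from can be sketched as a bitwise 2-of-3 majority function plus the usual reliability expression. This is the standard simplex voter, not the paper's novel fault-tolerant voter; the second function assumes independent failures and multiplies in the voter's own reliability to show why an imperfect voter caps the system.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority of three redundant module outputs."""
    return (a & b) | (b & c) | (a & c)


def tmr_reliability(r_module: float, r_voter: float = 1.0) -> float:
    """TMR system reliability with a single simplex voter.

    Assumes independent failures: at least two of three modules must work
    (3r^2 - 2r^3), and the voter itself is a series element.
    """
    return r_voter * (3 * r_module**2 - 2 * r_module**3)
```

With module reliability 0.9, an ideal voter gives 0.972, whereas a voter of reliability 0.95 drags the system down to about 0.923, below what a more robust voter could achieve.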

9.
Benefiting from the development of semiconductor technology, many-core systems-on-chip (SoCs) have been widely used in electronic devices. Networks-on-chip (NoCs) can relieve the heavy pressure of on-chip communication thanks to their high bandwidth, low latency, and good flexibility. Since the deep sub-micron era, reliability has become a critical constraint for integrated circuits. To ensure correct data transmission, fault-tolerant NoCs have been widely researched over the last decades, and many valuable designs have been proposed. This work introduces and summarizes the state-of-the-art technologies for fault diagnosis and fault recovery in fault-tolerant NoCs. Moreover, it discusses prospects for future research.

10.
The Secure Hash Algorithm is the most popular hash function currently used in many security protocols such as SSL and IPSec. As with other cryptographic algorithms, the hardware implementation of hash functions is of great importance for high-speed applications. Because of the iterative structure of hash functions, a single error in their hardware implementation can result in a large number of errors in the final hash value. In this paper, we propose a novel time-redundancy-based fault diagnostic scheme for the implementation of SHA-1 and SHA-512 round computations. This scheme can detect permanent as well as transient faults, whereas the traditional time redundancy technique can only detect transient errors. The proposed design does not impose significant timing overhead on the original implementation of the SHA-1 and SHA-512 round computations. We have implemented the proposed design for SHA-1 and SHA-512 on a Xilinx xc2p7 FPGA. Compared with the original schemes, the proposed fault-detecting SHA-1 and SHA-512 round computations incur throughput reductions of 3% and 10%, respectively, with area overheads of 58% and 30%, respectively. Fault simulation of the implementation shows that almost 100% fault coverage can be achieved for transient and permanent faults using the proposed scheme.
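The abstract does not say how its scheme lets time redundancy catch permanent faults. One classic way, named here plainly as a substitute illustration and not the paper's method, is recomputing with shifted operands (RESO): the second pass runs on shifted operands so that a permanently faulty bit line corrupts a different result bit. The sketch applies it to 32-bit modular addition, the core operation of SHA-1 rounds, and models a hypothetical 33-bit adder whose output bit `stuck_bit` is stuck at 1.

```python
MASK32 = 0xFFFFFFFF


def faulty_adder(x, y, stuck_bit=None, width=33):
    """Hypothetical 33-bit adder whose output bit `stuck_bit` is stuck at 1."""
    s = (x + y) & ((1 << width) - 1)
    if stuck_bit is not None:
        s |= 1 << stuck_bit  # permanent fault: same bit forced on every use
    return s


def add_plain_retry(a, b, stuck_bit=None):
    """Traditional time redundancy: compute twice on identical operands and compare."""
    r1 = faulty_adder(a, b, stuck_bit) & MASK32
    r2 = faulty_adder(a, b, stuck_bit) & MASK32
    return r1, r1 != r2  # (result, error_detected)


def add_reso(a, b, stuck_bit=None):
    """RESO: the second pass adds a<<1 and b<<1, then shifts the result back."""
    r1 = faulty_adder(a, b, stuck_bit) & MASK32
    r2 = (faulty_adder(a << 1, b << 1, stuck_bit) >> 1) & MASK32
    return r1, r1 != r2
```

A stuck bit corrupts bit k in the first pass but bit k-1 of the shifted-back result in the second, so the two passes disagree whenever either is wrong; plain recomputation repeats the same wrong answer and never notices.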

11.
In this work, different VHDL-based fault injection techniques (simulator commands, saboteurs, and mutants) have been compared and applied to the validation of a fault-tolerant system. Some extensions and implementation designs of these techniques are introduced. To complement these injection techniques, a wide set of fault models (including several uncommon ones) has been implemented. We have injected both transient and permanent faults into the system model, using two different workloads, with the help of a fault injection tool that we developed. We have studied the pathology of the propagated errors, measured their latencies, and calculated both detection and recovery coverages. The results show that coverages for transient faults can be obtained quite accurately with any of the three techniques. This enables the use of models at different abstraction levels for the same system. We have also observed significant differences in implementation and simulation cost among the studied injection techniques.
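The difference between injecting a transient and a permanent fault, and how error latency is measured against a golden run, can be sketched outside VHDL. This is a minimal Python analogue under stated assumptions: the 8-bit accumulator model, the `(kind, cycle, bit)` fault tuple, and the latency definition are hypothetical and do not reproduce the authors' tool.

```python
def run(workload, fault=None):
    """Simulate an 8-bit accumulator; `fault` is (kind, cycle, bit) or None.

    kind == "transient": flip `bit` once at `cycle`.
    kind == "permanent": force `bit` to 1 from `cycle` onward (stuck-at-1).
    """
    acc, trace = 0, []
    for cycle, x in enumerate(workload):
        acc = (acc + x) & 0xFF
        if fault is not None:
            kind, at, bit = fault
            if kind == "transient" and cycle == at:
                acc ^= 1 << bit
            elif kind == "permanent" and cycle >= at:
                acc |= 1 << bit
        trace.append(acc)
    return trace


def error_latency(workload, fault):
    """Cycles from injection until the output first differs from the golden
    (fault-free) run, or None if the fault stayed masked."""
    golden, faulty = run(workload), run(workload, fault)
    at = fault[1]
    for cycle in range(at, len(workload)):
        if golden[cycle] != faulty[cycle]:
            return cycle - at
    return None
```

With a workload of ten 1s, a stuck-at-1 on bit 0 from cycle 0 is initially masked (the golden value is already odd) and shows up one cycle later, while a transient flip of bit 7 is visible immediately.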

12.
Three-dimensional (3D) integrated systems are becoming a reality as Thru-Silicon-Via (TSV) technologies mature. 3D integration promises significant performance and energy-efficiency improvements by reducing signal travel distances and integrating more capabilities on a single chip. High integration costs, thermal management, and poor reliability and yield are the major challenges of TSV-based 3D chips. High structural and parametric fault rates due to manufacturing defects make it difficult to achieve high interconnect yield using only spare-based repair solutions. In this paper, we address the TSV yield issue by implementing the inter-die links of 3D chips as Configurable fault-tolerant Serial Links (CSLs). When there are not enough functional TSVs available, faults are tolerated by serializing the data. CSLs help reduce chip costs by improving TSV yield with very few or no spares at all. For 3D Networks-on-Chip (3D NoCs), we show that the CSL yield improvement comes with moderate area overheads (~12–26%) and small performance penalties (less than 5% average latency overhead).
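The serialization idea behind CSLs can be illustrated as follows: a W-bit link word sent over k working TSVs needs ceil(W/k) beats. The abstract does not describe the actual CSL protocol, so this is a minimal sketch assuming the set of functional TSV indices is already known (for example, from a post-bond test); `serialize` and `deserialize` are hypothetical helpers.

```python
def serialize(word_bits, functional_tsvs):
    """Split a link word into beats; each beat maps a working TSV index to one bit."""
    k = len(functional_tsvs)
    if k == 0:
        raise ValueError("no functional TSVs left")
    beats = []
    for start in range(0, len(word_bits), k):
        chunk = word_bits[start:start + k]
        beats.append(dict(zip(functional_tsvs, chunk)))  # TSV index -> bit
    return beats


def deserialize(beats, functional_tsvs, width):
    """Reassemble the word by reading the working TSVs in the same fixed order."""
    bits = []
    for beat in beats:
        bits.extend(beat[t] for t in functional_tsvs if t in beat)
    return bits[:width]
```

With an 8-bit link whose TSVs 2 and 5 are faulty, the remaining six TSVs carry each word in two beats instead of one; this extra serialization latency is the cost paid in exchange for needing fewer spares.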

13.
Radiation-induced faults in digital systems have attracted major attention in recent years due to growing reliability concerns for future technologies. In future technologies, multiple transient faults (MTF) originating from a single radiation hit are expected to occur more frequently. Further, due to continued aggressive scaling of device geometry, a particle with moderate linear energy transfer (LET) is expected to affect more than one module or device when it strikes. Additionally, the continual increase in operating speed as technology evolves has raised the likelihood of multi-cycle transient (MCT) faults in digital circuits. This calls for novel solutions that tackle multi-cycle transient and multiple transient fault resiliency concurrently at a higher design abstraction level, such as the behavioral level. This paper proposes a novel approach for generating designs resilient to simultaneous multi-cycle transient and multiple transient faults during high-level synthesis (HLS) of application-specific datapath processors, using the framework of dual modular redundancy. Results on benchmarks indicate that the proposed approach generates low-cost MCT–MTF resilient designs during HLS within acceptable runtime.

14.
The problem of system recovery from transient faults is addressed using retry techniques. A probabilistic model for the activity of faulty periods and a fault analysis to derive the optimum retry period are presented. Distribution functions are derived for the case of a false alarm, where a transient fault is flagged as permanent, and for the case of a miss, where too many faults coexist and exceed the checker's capability to detect them. These derivations are compared with the results of a simulation program implementing the model. Other factors influencing the value of the retry period are discussed.
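The false-alarm side of the retry-period trade-off can be sketched under a simple assumption that is not taken from the paper: transient fault durations are exponentially distributed with a given mean, and a fault still active when the retry happens makes the retry fail, so the fault is flagged as permanent. The closed form is then checked against a Monte Carlo run, echoing the paper's comparison of derivations with simulation.

```python
import math
import random


def false_alarm_prob(retry_period, mean_duration):
    """P(transient fault still active at retry), assuming exponential durations."""
    return math.exp(-retry_period / mean_duration)


def min_retry_period(target, mean_duration):
    """Smallest retry period whose false-alarm probability is at most `target`."""
    return -mean_duration * math.log(target)


def simulate_false_alarm(retry_period, mean_duration, trials=100_000, seed=1):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    hits = sum(
        rng.expovariate(1.0 / mean_duration) > retry_period for _ in range(trials)
    )
    return hits / trials
```

Longer retry periods lower the false-alarm rate but delay recovery (and give more faults a chance to accumulate), which is the tension an optimum retry period has to balance.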

15.
Real-time computers are often used in embedded, life-critical applications where high reliability is important. A common approach to making such systems dependable is to vote on redundant processors executing multiple copies of the same task. The most popular redundant structure is triple modular redundancy (TMR). The processors that make up such systems are subject not only to independently occurring permanent and transient faults, but also to correlated transient faults, such as electromagnetic interference (EMI) caused by the operating environment. This paper proposes two new scheduling strategies for TMR computer-controllers. Both strategies can tolerate correlated faults as well as independent faults. These strategies, TMR-R (TMR with rotated task group) and TMR-Q (TMR with quintuple computation), are developed using task grouping and assignment. To evaluate their reliability, a discrete-time Markov model for control systems is devised. Reliability equations for TMR-R and TMR-Q are derived from the state transitions across sampling intervals in the Markov model. The reliability of these TMR strategies is demonstrated through numerical comparison with conventional TMR. The proposed strategies are expected to be useful for control systems operating in harsh environments, such as controllers of airplanes or nuclear power plants.
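A discrete-time Markov model of a conventional TMR controller, the baseline the paper compares against, can be sketched per sampling interval. This is not the TMR-R or TMR-Q model; it assumes each processor fails independently with probability `p` per interval, that a correlated (EMI-like) hit with probability `q` defeats all processors at once, and that the system fails once two processors are down.

```python
def tmr_reliability_dtmc(p, q, steps):
    """Reliability after `steps` sampling intervals of a conventional TMR controller.

    States: 0 = three good, 1 = two good, 2 = failed (absorbing).
    Assumes independent per-interval failure probability p per processor and
    a correlated hit with probability q that defeats every processor at once.
    """
    s = 1 - q  # probability of no correlated hit in one interval
    P = [
        [s * (1 - p) ** 3, s * 3 * p * (1 - p) ** 2, 0.0],
        [0.0, s * (1 - p) ** 2, 0.0],
        [0.0, 0.0, 1.0],
    ]
    P[0][2] = 1 - P[0][0] - P[0][1]  # remaining mass goes to the failed state
    P[1][2] = 1 - P[1][1]
    dist = [1.0, 0.0, 0.0]  # start with all three processors healthy
    for _ in range(steps):
        dist = [sum(dist[i] * P[i][j] for i in range(3)) for j in range(3)]
    return dist[0] + dist[1]
```

Even a small `q` lowers the reliability of plain TMR every interval regardless of `p`, which is the kind of correlated-fault weakness the proposed scheduling strategies aim to address.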

16.
This paper presents an approach for increasing the lifetime of systems implemented on SRAM-based FPGAs by introducing fault tolerance properties that enable the system to autonomously manage the occurrence of both transient and permanent faults. On the basis of the foreseen mission time and application environment, the designer is supported in implementing a system able to reconfigure itself, either by reloading the correct configuration in case of transient faults or by relocating part of the functionality in the presence of permanent faults. The result is a system implementation offering good performance and correct functionality even when faults occur. The proposed approach is evaluated in a case study to highlight the overall characteristics of the final implementation.
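The two recovery actions described above (reload on a transient fault, relocate on a permanent one) can be expressed as a small decision policy. This is a hypothetical sketch: the function name, the region labels, and the rule "an error that survives a configuration reload is permanent" are assumptions for illustration, not the paper's design flow.

```python
def handle_error(module, placement, spares, persists_after_reload):
    """Return the recovery action for a detected error in `module` (hypothetical policy).

    placement: dict mapping module name -> FPGA region currently hosting it.
    spares: list of free regions; one is consumed on relocation.
    persists_after_reload: True if the error survived reloading the configuration.
    """
    if not persists_after_reload:
        # The reload fixed it, so the upset is treated as transient.
        return ("reload", placement[module])
    if not spares:
        # Permanent fault and nowhere left to move the function.
        return ("unrecoverable", placement[module])
    new_region = spares.pop(0)  # relocate the function away from the damaged region
    placement[module] = new_region
    return ("relocate", new_region)
```

In a real flow the spare regions, mission time, and environment would shape how many spares are reserved up front; the sketch only shows the order of escalation.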

17.
We present a novel Partial Virtual channel Sharing (PVS) NoC architecture that reduces the impact of faults on performance and also tolerates faults within the routing logic. Without PVS, the failure of a component impairs the fault-free components connected to it, which leads to considerable performance degradation. Improving resource utilization is key to enhancing or sustaining performance with minimal overhead when faults or overload occur. In the proposed architecture, autonomic virtual-channel buffer sharing is implemented with a novel algorithm that determines how buffers are shared among a set of ports. The runtime allocation of the buffers depends on the incoming load and on fault occurrence. In addition, we propose an efficient technique for keeping a processing element (PE) connected to the network even if its router is faulty. Our techniques can be used with any NoC topology and for both 2D and 3D NoCs. The synthesis results for an integrated video conference application demonstrate a 22% reduction in average packet latency compared to a state-of-the-art virtual channel (VC) based NoC architecture. Extensive quantitative simulations have been carried out with synthetic benchmarks. The simulation results reveal that the PVS architecture significantly improves performance in the presence of faults compared to other VC-based NoC architectures.

18.
Wireless sensor networks are susceptible to failures of nodes and links due to various physical or computational causes. Physical causes include very high temperatures, heavy load on a node, and heavy rain. Computational causes include third-party intrusion attacks, communication conflicts, and congestion. Automated fault diagnosis is a well-studied problem in the research community. In this paper, we present an automated fault diagnosis model that can diagnose multiple types of faults in both the hard and soft fault categories. Our model implements a feed-forward neural network trained with a hybrid metaheuristic algorithm that combines exploration and exploitation of the search space. The methodology consists of several phases: a clustering phase, a fault detection and classification phase, and a decision and diagnosis phase. It can diagnose composite faults, such as hard permanent, soft permanent, intermittent, and transient faults, for sensor nodes as well as for links. The implementation can also classify different types of faulty behavior for both sensor nodes and links in the network. We present the theoretical results and the computational complexity of the implemented model. The performance of the model is evaluated through simulations and experiments conducted on indoor and outdoor testbeds.

19.
The increasing vulnerability of transistors and interconnects due to scaling continually challenges the reliability of future microprocessors. Lifetime reliability is gaining attention over performance as a design factor, even for lower-end commodity applications. In this work, we present a low-power hybrid fault-tolerant architecture that improves the reliability of pipelined microprocessors by protecting their combinational logic. By combining different types of redundancy, the architecture can handle a broad spectrum of faults with little impact on performance. Moreover, it addresses the problems of error propagation in nonlinear pipelines and error detection in pipeline stages with memory interfaces. Our case-study implementation of a fault-tolerant MIPS microprocessor highlights four main advantages of the proposed solution over TMR architectures: (i) 11.6% power savings, (ii) improved transient error detection capability, (iii) improved lifetime reliability, and (iv) more effective handling of fault accumulation effects. We also present a gate-level fault-injection framework that models physical defects and transient faults with high fidelity.

20.
Applying system-level fault-tolerant techniques such as active redundancy is a promising way to enhance system reliability in safety-related applications. Embedded system design using active redundancy is a challenging task that involves solving two major problems: finding the optimal redundancy configuration, and mapping and scheduling the application (including the redundant components) onto the platform under timing and reliability constraints. This paper presents a framework for the automatic synthesis of fault-tolerant designs on multiprocessor platforms. The core of the framework consists of (1) a reliability analysis that computes system-level reliability in the presence of spatial and temporal redundancy, and (2) an optimization approach for reliability-aware design space exploration. The proposed approach considers both transient and permanent faults and is among the first to support system design with imperfect fault detectors. The framework takes an application model, a platform model, and a set of application requirements as input, and generates the recommended design parameters, including task-to-processor binding, the task schedule, and the selection and placement of redundancy. The effectiveness of our approach is illustrated with several case studies.
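How imperfect fault detectors weaken spatial and temporal redundancy can be shown with two textbook-style expressions. This is a simplified sketch, not the framework's reliability analysis: it assumes independent failures, a per-run success probability `r`, and a detector that catches a failure with probability `coverage`; both helper names are hypothetical.

```python
def duplex_reliability(r, coverage):
    """Spatial redundancy: two active replicas with an imperfect fault detector.

    The task succeeds if both replicas work, or exactly one fails and the
    detector catches it so the healthy replica's output is used.
    """
    return r * r + 2 * r * (1 - r) * coverage


def reexecution_reliability(r, coverage, attempts):
    """Temporal redundancy: re-run the task after each detected failure,
    up to `attempts` executions in total."""
    return sum(r * ((1 - r) * coverage) ** i for i in range(attempts))
```

With perfect detection, two replicas or two executions of a task with r = 0.9 both reach 0.99; with zero coverage the duplex falls to 0.81 (worse than a single copy) and re-execution never helps, which is why detector quality has to be part of the design space exploration.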

