期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Reliability Analysis of Fault-Tolerant Bus-Based Interconnection Networks

Fathollah Bistouni Mohsen Jahanshahi 《Journal of Electronic Testing》2016,32(5):541-568

Multi-processor systems need interconnection networks (INs) in order to make the connection among the processors, memory modules, and nodes. Bus interconnection network is the simplest and least expensive one among all the INs. Therefore, bus network is easily understood and preferred by manufactures for implementation. However, a bus network is inherently a non-fault tolerant and blocking network. To cope with these problems, a solution is to use several buses in parallel on a network. Based on this idea, various schemes can be designed for a bus network: (1) Multiple-bus with full bus-memory connection, (2) Multiple-bus with single bus-memory connection, (3) Multiple-bus with partial bus-memory connection, and (4) Multiple-bus with class-based memory connection. On the other hand, a metric for the efficiency of fault-tolerant systems is its reliability. Although, there is no detailed analysis of the reliability of bus-based networks, this paper presents accurate and complete reliability analysis of bus-based networks to achieve these aims: (1) Determining the most efficient design of bus-based networks in terms of reliability, cost-effectiveness, and blocking issues, (2) Providing new methods for evaluating the performance of bus-based networks. 相似文献

2.

Decomposition in Reliability Analysis of Fault-Tolerant Systems

Trivedi Kishor S. Geist Robert M. 《Reliability, IEEE Transactions on》1983,(5):463-468

Two important problems which arise in modeling fault-tolerant systems with ultra-high reliability requirements are discussed. 1) Any analytic model of such a system has a large number of states, making the solution computationally intractable. This leads to the need for decomposition techniques. 2) The common assumption of exponential holding times in the states is intolerable while modeling such systems. Approaches to solving this problem are reviewed. A major notion described in the attempt to deal with reliability models with a large number of states is that of behavioral decomposition followed by aggregation. Models of the fault-handling processes are either semi-Markov or simulative in nature, thus removing the usual restrictions of exponential holding times within the coverage model. The aggregate fault-occurrence model is a non-homogeneous Markov chain, thus allowing the times to failure to possess Weibull-like distributions. There are several potential sources of error in this approach to reliability modeling. The decomposition/aggregation process involves the error in estimating the transition parameters. The numerical integration involves discretization and round-off errors. Analysis of these errors and questions of sensitivity of the output (R(t)) to the inputs (failure rates and recovery model parameters) and to the initial system state acquire extreme importance when dealing with ultra-high reliability requirements. 相似文献

3.

Reliability & Safety Analysis of a Fault-Tolerant Controller

Johnson Barry W. Aylor James H. 《Reliability, IEEE Transactions on》1986,35(4):355-362

This paper analyzes a fault-tolerant, microprocessor-based controller for an electric wheelchair. Two candidate architectures are considered, including reconfigurable duplication and stand-by sparing. The difference in the reliability and safety of the two candidates is determined through the use of Markov models. Safety is paramount in the wheelchair application because of the need to protect the physically disabled wheelchair user;reliability by itself is insufficient for selecting an appropriate architecture in this application. The results show that reconfigurable duplication is safer than standby sparing even though standby sparing is more reliable. Because of the better safety, reconfigurable duplication is the preferred approach for the wheelchair application. Safety is extremely important in the selection of a fault-tolerant architecture for the electric wheelchair control system. Standby sparing provides a conceptually simple approach that achieves a higher reliability than reconfigurable duplication. However, reconfigurable duplication has a higher safety for a given fault coverage. Because of the need for safety in the electric wheelchair control system, reconfigurable duplication is the selected approach. 相似文献

4.

RPR容错功能的可靠性分析

王磊谢军《光通信技术》2004,28(1):32-33

采用"网络连通率"作为度量网络可靠性的参数,通过对系统模型的数学解析,建立了具有源路由功能和回绕功能的弹性分组环的可靠性数学模型。通过数值分析,得出了环处理器故障率对系统可靠性的影响。相似文献

5.

双环双容错功能局域网的可靠性分析

张兴周李绪友刘昭和《通信学报》1999,20(2):2

采用“平均有效终端数”作为度量网络可靠性的参数,通过对系统模型的数学解析,建立了具有旁路和环路回接双容错功能的双环局域网(LAN)的可靠性数学模型。通过数值例,考察了网络结构因素对系统可靠性的影响,得到了最佳实际终端数存在的结论。相似文献

6.

Super-High Reliability Fault-Tolerant System NURECS-3000

Asami Kazuo Yanai Katsuya Ito Tetsuo 《Industrial Electronics, IEEE Transactions on》1986,(2):148-151

相似文献

7.

Reliability of Single-Error Correction Protected Memories

《Reliability, IEEE Transactions on》2009,58(1):193-201

Reliability is a critical factor for systems operating in radiation environments. Among the different components in a system, memories are one of the parts most sensitive to soft errors due to their relatively large area. Due to their large cost, traditional techniques like Triple Modular Redundancy are not used to protect memories. A typical approach is to apply Error Correction Codes to correct single errors, and detect double errors. This type of codes, for example those based on Hamming, provides an initial level of protection. Detected single errors are usually corrected using scrubbing, by which the memory positions are periodically re-written after a fixed (deterministic scrubbing), or variable period (probabilistic scrubbing). These traditional models usually offer good results when calculating the reliability of memories (e.g. through the Mean Time To Failure). However, there are some particularities that are not modeled through these approaches, to the best of our knowledge. One of these particularities is how double errors are handled. In a traditional approach, two errors in the same word produce always a system failure (only single errors can be corrected). However, if the two (or more) errors affect the same bit, either the second one reinforces the first one (keeping just a single error), or corrects it. In both scenarios, the resulting situation does not trigger a system failure, which has a direct impact on the reliability of the memory. In this paper, traditional reliability models are refined to handle the mentioned scenarios, which produces a more precise analysis in the calculation of mean time to failure for memory systems. 相似文献

8.

An Information Theoretical Framework for Analysis and Design of Nanoscale Fault-Tolerant Memories Based on Low-Density Parity-Check Codes

Vasic B. Chilappagari S.K. 《IEEE transactions on circuits and systems. I, Regular papers》2007,54(11):2438-2446

In this paper, we develop a theoretical framework for the analysis and design of fault-tolerant memory architectures. Our approach is a modification of the method developed by Taylor and refined by Kuznetsov. Taylor and Kuznetsov (TK) showed that memory systems have nonzero computational (storage) capacity, i.e., the redundancy necessary to ensure reliability grows asymptotically linearly with the memory size. The restoration phase in the TK method is based on low-density parity-check codes which can be decoded using low complexity decoders. The equivalence of the restoration phase in the TK method and faulty Gallager B algorithm enabled us to establish a theoretical framework for solving problems in reliable storage on unreliable media using the large body of knowledge in codes on graphs and iterative decoding gained in the past decade. 相似文献

9.

Reliability Modeling and Analysis of Computer Networks

Raghavendra C. S. Makam S. V. 《Reliability, IEEE Transactions on》1986,35(2):156-160

The reliability analysis of computer communication networks is generally based on Boolean algebra and probability theory. This paper discusses various reliability problems of computer networks including terminal-pair connectivity, tree connectivity, and multi-terminal connectivity. This paper also studies the dynamic computer network reliability by deriving time-dependent expressions for reliability measures assuming Markov behavior for failures and repairs. This allows computation of task and mission related measures such as mean time to first failure and mean time between failures. A detailed analysis of the bridge network is presented. 相似文献

10.

Analysis and Fault Modeling of Actual Resistive Defects in ATMEL eFlash Memories

P.-D. Mauroux A. Virazel A. Bosio L. Dilillo P. Girard S. Pravossoudovitch B. Godard G. Festes L. Vachez 《Journal of Electronic Testing》2012,28(2):215-228

The embedded Flash (eFlash) technology can be subject to defects creating functional faults. In this paper, we first generalize the electrical model of the ATMEL TSTAC™ eFlash memory technology proposed in [10]. The model is composed of two layers: a functional layer representing the Floating Gate (FG) and a programming layer able to determine the channel voltage level controlling the Fowler-Nordheim tunneling effect. The proposed model is validated by means of simulations and comparisons with ATMEL silicon data. Then, we present a complete analysis of actual resistive defects (open and short) that may affect the ATMEL TSTAC™ eFlash array by considering the proposed model on a hypothetical 4 × 4 array. This analysis highlights the interest of the proposed model to provide a realistic set of fault models that has to be tested, thus enhancing existing solutions for TSTAC™ eFlash testing. 相似文献

11.

Performance/Reliability Measures for Fault-Tolerant Computing Systems

Osaki Shunji 《Reliability, IEEE Transactions on》1984,(4):268-271

Some fault-tolerant computing systems are discussed and existing reliability measures are explained. Some performance/reliability measures are introduced. Several systems are compared by using numerical examples with the new measures. 相似文献

12.

Fault-Tolerant ICs: The Reliability of TMR Yield-Enhanced ICs

Haifley Tim Bhatt Atul 《Reliability, IEEE Transactions on》1987,(2):224-226

The use of triple modular redundancy (TMR) for reliability enhancement is well known. This paper presents a simple method' for predicting the reliability of integrated circuits (ICs) which use TMR for yield enhancement. A simple yield-model is included as it is necessary to factor in the effect of consumption of redundancy paths due to wafer fabrication defects. TMR implementation is briefly discussed as well. 相似文献

13.

Reliability Modeling and Analysis of General Modular Redundant Systems

Mathur Francis P. de Sousa Paulo T. 《Reliability, IEEE Transactions on》1975,(5):296-299

Hardware redundancy has been used in the design of fault-tolerant digital systems. A synthesis of protective hardware redundancy techniques is proposed and a generalized reliability model suitable for many fault-tolerant configurations is developed. This model, to be called General Modular Redundancy (GMR), yields as particular cases several known models of redundant structures. 相似文献

14.

容错计算机系统的可靠性建模和分布式仿真

刘逢清《南京邮电学院学报(自然科学版)》2008,(5):84-88

对容错计算机系统的可靠性进行评估是一项重要而困难的工作。为容错计算机系统的各个组成部件建立了部件级故障模型,并以此为基础建立了整个容错计算机系统的可靠性测评模型。基于所建立模型,我们提出了分布式仿真的思想,它把系统的各个部件仿真为网络上的一个站点,并增设一台服务器来对整个系统的可靠性性能进行统计和分析。仿真结果验证了建立模型的正确性和方法的有效性。相似文献

15.

基于自动建模方法的互连可靠性分析

朱凡赵丽霞《固体电子学研究与进展》2014,(6)

金属互连线中的电迁移现象是导致互连线失效的主要原因,影响电迁移(EM)的因素有温度、电流、材料特性和互连线的几何尺寸等。采用有限元分析方法,探究导致电迁移的这些因素的工作机制,以及这些机制之间的相互影响,并且采用原子通量散度(AFD)来衡量电迁移的大小。使用先进的自动建模并仿真的方法,得到互连线的材料属性、几何尺寸、电流密度,以及外界环境温度与AFD的关系,通过AFD的变化规律分析互连线的可靠性。结果表明,温度升高、电流增大、尺寸减小,都会降低互连线的可靠性。相似文献

16.

软件可靠性模型中样点可信度

陈文《现代电子技术》2009,32(20):103-106

将未确知有理数理论应用于数据建模是常见的建模方法,为了使其更有效地应用于软件可靠性的建模,结合软件失效数据的特点,从样点及样点的可信度两方面,研究未确知理论应用于软件可靠性模型的方法,提出对样点补偿的观点,并给出用补偿后样点计算可信度的计算方法.为未确知理论应用于软件可靠性的建模提供了新思路和新方法,并结合实例说明这一方法的有效性. 相似文献

17.

Reliability Modeling and Analysis of Communication Networks with Dependent Failures

Lam Y. Li V. 《Communications, IEEE Transactions on》1986,34(1):82-84

This paper presents a new model to study the reliability of communication networks in which link failures are statistically dependent. The approach tries to identify and model explicitly the events that cause communication link failures. No conditional probabilities are needed, and so two major difficulties inherent to them, namely, an exponential number of conditional probabilities to deal with and a consistency requirement to satisfy, are avoided. For reliability computations, some existing algorithms for finding network reliability can be used with minor modifications and no significant increase in computational complexity. 相似文献

18.

Comparison of Memory Chip Organizations vs Reliability in Virtual Memories

Matick Richard E. 《Reliability, IEEE Transactions on》1983,(1):48-58

Random access memory organizations typically are chosen for maximum reliability, based on the operation of the memory box itself without concern for the remainder of the computing system. This had led to widespread use of the 1-bit-per-chip, or related organization which uses error correcting codes to minimize the effects of failures occurring in some basic unit such as a word or double word (32 to 64 bits). Such memory boxes are used quite commonly in paged virtual memory systems where the unit for protection is really a page (4K bytes), or in a cache where the unit for protection is a block (32 to 128 bytes), not a double word. With typical high density memory chips and typical ranges of failure rates, the 1-bit-per-chip organization can often maximize page failures in a virtual memory system. For typical cases, a paged virtual memory using a page-per-chip organization can substantially improve reliability, and is potentially far superior to other organizations. This paper first describes the fundamental considerations of organization for memory systems and demonstrates the underlying problems with a simplified case. Then the reliability in terms of lost pages per megabyte due to hard failures over any time period is analyzed for a paged virtual memory organized in both ways. Normalized curves give the lost pages per Mbyte as a function of failure rate and accumulated time. Assuming reasonable failure rates can be achieved, the page-per-chip organization can be 10 to 20 times more reliable than a 1-bit-per-chip scheme. 相似文献

19.

Fault Leveling Techniques for Yield and Reliability Enhancement of NAND Flash Memories

Shyue-Kung Lu Shang-Xiu Zhong Masaki Hashizume 《Journal of Electronic Testing》2018,34(5):559-570

Novel fault leveling techniques based on address remapping (AR) are proposed in this paper. We can change the logical-to-physical address mapping of the page buffer such that faulty cells within a flash page can be evenly distributed into different codewords. Therefore, the adopted ECC scheme can correct them effectively. Based on the production test or on-line BIST results, the fault bitmap can be used for executing the heuristic fault leveling analysis (FLA) algorithm and evaluating control words used to steer fault leveling. A new page buffer architecture suitable for address remapping is also proposed. According to experimental results, repair rate, yield, and reliability can be improved significantly with negligible hardware overhead. 相似文献

20.

Reliability Modeling Using SHARPE 总被引：1，自引：0，他引：1

Sahner Robin A. Trivedi Kishor S. 《Reliability, IEEE Transactions on》1987,(2):186-193

Combinatorial models such as fault trees and reliability block diagrams are efficient for model specification and often efficient in their evaluation. But it is difficult, if not impossible, to allow for dependencies (such as repair dependency and near-coincident-fault type dependency), transient and intermittent faults, standby systems with warm spares, and so on. Markov models can capture such important system behavior, but the size of a Markov model can grow exponentially with the number of components in this system. This paper presents an approach for avoiding the large state space problem. The approach uses a hierarchical modeling technique for analyzing complex reliability models. It allows the flexibility of Markov models where necessary and retains the efficiency of combinatorial solution where possible. Based on this approach a computer program called SHARPE (Symbolic Hierarchical Automated Reliability and Performance Evaluator) has been written. The hierarchical modeling technique provides a very flexible mechanism for using decomposition and aggregation to model large systems; it allows for both combinatorial and Markov or semi-Markov submodels, and can analyze each model to produce a distribution function. The choice of the number of levels of models and the model types at each level is left up to the modeler. Component distribution functions can be any exponential polynomial whose range is between zero and one. Examples show how combinations of models can be used to evaluate the reliability and availability of large systems using SHARPE. 相似文献