Similar Documents
A total of 20 similar documents were retrieved.
1.
This paper presents an integrated approach to the design of an ultrareliable memory system using a variety of coding and modularization techniques on each of the memory subsystem elements. The overall objective is to provide a properly operating memory system in spite of any single indigenous fault (regardless of the number of failures which might ensue). In other words, the system has the capability to automatically: (1) detect single faults and multiple failures; (2) mask failures to prevent malfunctions without interrupting service; (3) isolate the fault to a replaceable module; and (4) reconfigure the faulty unit out of the system. The storage medium and retrieval circuits are checked and corrected by coding techniques. Some redundancy is used on the subunits, but the total redundancy is less than 20% of the system cost, and diagnostic software is eliminated.
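A minimal sketch of the kind of coding that can detect and mask a single bit fault in a stored word is shown below; it uses a textbook (7,4) Hamming code rather than the paper's actual coding and modularization scheme.

```python
# Hypothetical illustration: a (7,4) Hamming code that masks any single bit
# error in a stored nibble, in the spirit of the coded storage described above.

def hamming74_encode(d):                 # d: list of 4 data bits
    p1 = d[0] ^ d[1] ^ d[3]              # parity over codeword positions 1,3,5,7
    p2 = d[0] ^ d[2] ^ d[3]              # parity over positions 2,3,6,7
    p3 = d[1] ^ d[2] ^ d[3]              # parity over positions 4,5,6,7
    return [p1, p2, d[0], p3, d[1], d[2], d[3]]   # codeword positions 1..7

def hamming74_correct(c):                # c: retrieved 7-bit codeword
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]       # recompute each parity check
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3      # 0 = no error, else 1-based error position
    if syndrome:
        c = c[:]                          # mask the fault by flipping the bad bit
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]], syndrome

word = [1, 0, 1, 1]
stored = hamming74_encode(word)
stored[4] ^= 1                            # inject a single stuck bit in the array
recovered, pos = hamming74_correct(stored)
assert recovered == word                  # the single fault is detected and masked
```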

2.
This paper develops a reliability model for a paged memory system wherein the pages of memory are physically distributed among several arrays of memory chips. Any of the available pages can be used to satisfy the required memory capacity. This paper also develops a reliability model for a page or block of memory words embedded in an array. The model assumes that memory chips have failure modes that are catastrophic to a row, to a column, to the whole physical array, or to individual bits. Spare columns or data lines are used to enhance reliability. SECDED (Single Error Correction, Double Error Detection) provides the hard-fault detection mechanism and complete fault coverage for soft faults such as 1-bit upsets. A highly reliable memory system design is described that implements a paging scheme, uses a SECDED code for hard fault detection and isolation, and uses three levels of sparing to recover from failures. The significance of this paper is that it considers failure modes associated with interfacing a memory chip into an array of memory chips. These failure modes have an impact beyond the boundaries of an individual chip; they affect the entire physical array and must be considered in the reliability model. When this is done, the reliability model permits trading off page size and array size against reliability.
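The following sketch illustrates one ingredient of such a model: the survival probability of an array with spare columns, assuming independent column failures and perfect spare switching. The parameters are invented, and the paper's row/chip-kill and SECDED terms are not included.

```python
# Illustrative sketch (not the paper's full model): an array with n data columns
# and s spare columns survives as long as no more than s columns have failed,
# assuming independent column failures with probability p and perfect sparing.
from math import comb

def column_sparing_reliability(n, s, p):
    total = n + s
    return sum(comb(total, k) * p**k * (1 - p)**(total - k) for k in range(s + 1))

p_col = 0.01                    # assumed per-column failure probability
for spares in range(4):
    r = column_sparing_reliability(64, spares, p_col)
    print(f"64 data columns, {spares} spare(s): R = {r:.6f}")
```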

3.
Error-correcting coding is often employed to improve system operation and reliability. By means of suitable reliability models and simple analysis, the effect of error-correcting coding of memory words on the overall reliability of the system is discussed. Introducing error-correcting facilities generally has three significant effects on the system: 1) increased hardware, which is itself subject to failures and hence tends to lower reliability; 2) the system's ability to function in the presence of a certain class of failures; and 3) quicker detection of errors, which also means an improved repair rate. To illustrate the extent to which these three factors govern the reliability improvement due to coding, three types of systems are considered. These systems use the same basic processor and memory units but differ in their structure and complexity. Beyond these three factors, the reliability improvement due to coding is also governed by the system structure and the relative sizes of the processor and memory hardware.
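A small worked example of the trade-off follows, with an assumed failure rate and word sizes, counting only bit-cell failures and ignoring the decoder hardware and repair-rate effects discussed above.

```python
# Worked illustration (assumed numbers, not from the paper): reliability of one
# memory word with and without a single-error-correcting (SEC) code.
from math import comb, exp

lam = 1e-7        # assumed per-bit failure rate (failures/hour)
t = 10_000.0      # mission time in hours
r_bit = exp(-lam * t)            # reliability of a single bit cell

# Uncoded 32-bit word: fails if any bit fails.
r_plain = r_bit ** 32

# (39,32) SEC-coded word: 7 extra check bits (more hardware that can also fail),
# but the word still reads correctly if at most one of the 39 bits has failed.
n = 39
r_coded = r_bit**n + comb(n, 1) * (1 - r_bit) * r_bit**(n - 1)

print(f"uncoded word: {r_plain:.6f}   SEC word: {r_coded:.6f}")
```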

4.
Real-time furnace modeling and diagnostics
Precise control of process temperature has become increasingly important in today's semiconductor industry. Multizone batch furnaces are widely used in current manufacturing lines, and high reliability of furnace systems is a crucial factor in achieving high product yield. However, uncertainty caused by sensor noise and failure may degrade reliability. In this work, the authors develop a methodology based on thermal modeling and sensor fusion techniques to detect temperature sensor failures, power supply failures, and system faults in multizone furnace systems. The typical types of failures are defined, and the impact of single failures and different combinations of failures on system behavior is studied. The furnace system is modeled based on both physical considerations and experimental data extraction, and the fault detection methodology is tested in simulations. Principal component analysis is used to choose data types for different fault detection purposes, and sensor fusion is used to enhance reliability. Simulation results show that all the different types of failures can be detected when data are rich enough. Experimental results show that all single failures and some of the failure combinations can be estimated when only steady-state and cooling-down data are utilized.
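A hedged sketch of the principal-component / residual idea on synthetic data follows; the sensor count, data, and threshold are made up and are not the authors' furnace model.

```python
# Hedged sketch of PCA-based sensor fault detection on synthetic data
# (the furnace model, zone counts, and thresholds here are invented).
import numpy as np

rng = np.random.default_rng(0)
normal = rng.normal(size=(500, 5))                        # 500 samples from 5 sensors
normal[:, 1] = 0.8 * normal[:, 0] + 0.2 * normal[:, 1]    # induce a correlation

mean = normal.mean(axis=0)
X = normal - mean
_, _, vt = np.linalg.svd(X, full_matrices=False)
P = vt[:2].T                                              # retain 2 principal components

def spe(sample):
    """Squared prediction error: distance of a sample from the PCA subspace."""
    x = sample - mean
    return float(np.sum((x - P @ (P.T @ x)) ** 2))

threshold = np.percentile([spe(s) for s in normal], 99)   # empirical 99% limit

faulty = normal[0].copy()
faulty[1] += 5.0                                          # inject an offset sensor fault
print(spe(normal[0]) < threshold, spe(faulty) > threshold)   # expect: True True
```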

5.
Memory Fault Modeling Trends: A Case Study
In recent years, embedded memories have been the fastest growing segment of the system-on-chip; they therefore have a major impact on the overall defect-per-million (DPM) level. Further, shrinking technologies and processes introduce new defects that cause previously unknown faults; such faults have to be understood and modeled in order to design appropriate test techniques that can reduce the DPM level. This paper discusses a new memory fault class, namely dynamic faults, based on industrial test results; it defines the concept of dynamic faults using the fault primitive concept. It further shows the importance of dynamic faults for new memory technologies and introduces a systematic way of modeling them. It concludes that current and future SRAM products need to consider testability for dynamic faults or leave substantial DPM on the table, and it sets a direction for further research.
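The sketch below illustrates the fault-primitive notation <S/F/R> and the static-versus-dynamic distinction by the length of the sensitizing sequence; the two example primitives are standard textbook cases, not the paper's industrial data.

```python
# Hedged sketch of the fault-primitive notation <S/F/R>: S is the sensitizing
# operation sequence, F the faulty cell value, R the read result ('-' if none).
from dataclasses import dataclass

@dataclass
class FaultPrimitive:
    sensitizing: tuple        # S, e.g. ('0w1',) or ('1r1', '1r1')
    faulty_value: str         # F: value left in the victim cell
    read_result: str          # R: value returned by a sensitizing read, '-' if none

    def is_dynamic(self) -> bool:
        # Static faults need at most one operation to be sensitized;
        # dynamic faults need a sequence of two or more operations.
        return len(self.sensitizing) > 1

tf  = FaultPrimitive(('0w1',), '0', '-')          # classic transition fault <0w1/0/->
dyn = FaultPrimitive(('1r1', '1r1'), '0', '0')    # cell flips after two back-to-back reads
print(tf.is_dynamic(), dyn.is_dynamic())          # False True
```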

6.
Checkers are used in digital circuits to detect both intermittent and stuck-at faults. The most common error detectors are parity checkers. Such circuits are themselves subject to failures. The use of parity trees is outlined, and techniques for testing them are surveyed. The effect of the checker's structure on its testability is discussed. Several fault models are considered: single stuck-at, multiple stuck-at, and bridging faults. The effectiveness of single stuck-at fault test sets in detecting multiple stuck-at and bridging faults is described. Upper bounds for the double fault coverage of the minimal single fault test are given for different tree structures. The testabilities of some selected checkers are examined to illustrate the concepts developed. A built-in self-test is proposed.
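As an illustration of testing a parity tree, the sketch below fault-simulates a 4-input XOR tree against a 4-vector test set chosen so that every gate sees all four input combinations; the circuit and vectors are illustrative, not taken from the survey.

```python
# Hedged sketch: single stuck-at fault simulation of a small XOR parity tree.
def parity_tree(bits, fault=None):
    """4-input XOR tree. `fault` = (net, stuck_value); nets 0-3 are inputs,
    4-5 the first-level gate outputs, 6 the tree output."""
    def v(net, value):                        # apply the stuck-at fault, if any
        return fault[1] if fault and fault[0] == net else value
    n = [v(i, b) for i, b in enumerate(bits)]
    n.append(v(4, n[0] ^ n[1]))
    n.append(v(5, n[2] ^ n[3]))
    n.append(v(6, n[4] ^ n[5]))
    return n[6]

faults = [(net, sv) for net in range(7) for sv in (0, 1)]
tests = [(0, 0, 0, 0), (0, 1, 1, 1), (1, 0, 0, 1), (1, 1, 1, 0)]   # 4 vectors:
# every XOR gate receives all four input combinations across these tests.

detected = {f for f in faults for t in tests if parity_tree(t, f) != parity_tree(t)}
print(f"{len(detected)}/{len(faults)} single stuck-at faults detected")   # 14/14
```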

7.
Random access memory organizations typically are chosen for maximum reliability, based on the operation of the memory box itself without concern for the remainder of the computing system. This has led to widespread use of the 1-bit-per-chip or related organizations, which use error-correcting codes to minimize the effects of failures occurring in some basic unit such as a word or double word (32 to 64 bits). Such memory boxes are commonly used in paged virtual memory systems, where the unit for protection is really a page (4K bytes), or in a cache, where the unit for protection is a block (32 to 128 bytes), not a double word. With typical high-density memory chips and typical ranges of failure rates, the 1-bit-per-chip organization can often maximize page failures in a virtual memory system. For typical cases, a paged virtual memory using a page-per-chip organization can substantially improve reliability and is potentially far superior to other organizations. This paper first describes the fundamental considerations of organization for memory systems and demonstrates the underlying problems with a simplified case. The reliability, in terms of lost pages per megabyte due to hard failures over any time period, is then analyzed for a paged virtual memory organized in both ways. Normalized curves give the lost pages per Mbyte as a function of failure rate and accumulated time. Assuming reasonable failure rates can be achieved, the page-per-chip organization can be 10 to 20 times more reliable than a 1-bit-per-chip scheme.
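A back-of-the-envelope sketch of why the organizations differ follows, counting only the pages touched by a single whole-chip failure under assumed chip, page, and word sizes; it is not the paper's normalized-curve model.

```python
# Back-of-the-envelope sketch (assumed sizes, not the paper's model): pages
# touched by one whole-chip failure under the two organizations.
CHIP_BITS  = 256 * 1024          # assumed 256-Kbit memory chip
PAGE_BITS  = 4 * 1024 * 8        # 4-Kbyte page
WORD_WIDTH = 64                  # data bits fetched per access

# Page-per-chip: every bit of a page lives on one chip, so a chip kill can
# only affect the pages stored on that chip.
pages_exposed_page_per_chip = CHIP_BITS // PAGE_BITS

# 1-bit-per-chip: a rank of WORD_WIDTH chips holds the pages, and each chip
# contributes one bit to every word, so a chip kill touches every page in
# the rank (the ECC word, not the page, is the protected unit).
pages_exposed_bit_per_chip = (CHIP_BITS * WORD_WIDTH) // PAGE_BITS

print(pages_exposed_page_per_chip, pages_exposed_bit_per_chip)   # 8 vs 512 pages
```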

8.
Compared with 2D memories, 3D memories offer larger capacity, higher bandwidth, and lower latency and power consumption, but they suffer from low yield. To address this problem, an efficient built-in self-repair scheme for 3D memories is proposed. Each row or column of the memory array is divided into several row blocks or column blocks, and faulty cells are mapped between row blocks or column blocks on different layers, so that faults in the same row or column across different layers are logically mapped into a single layer. A single redundant row or column can therefore repair more faults, which greatly improves redundancy utilization and the repair rate. Experimental results show that, compared with other repair schemes, the proposed scheme achieves a higher repair rate, requires fewer redundant resources to reach the same repair rate, and adds almost negligible area overhead.
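A toy sketch of the cross-layer mapping idea follows; the fault list and the assumption that co-row faults land in different row blocks are invented for illustration.

```python
# Hedged sketch of the idea (parameters and fault list are made up): faults at
# the same row index on different layers are logically remapped into one layer,
# so a single redundant row can cover all of them.
faults = [(0, 5, 12), (1, 5, 40), (2, 5, 70),   # (layer, row, column)
          (0, 9, 3),  (1, 9, 55)]

# Without cross-layer mapping: each layer repairs its own faulty rows.
per_layer_rows = len({(layer, row) for layer, row, _ in faults})

# With cross-layer row-block mapping: faults sharing a row index are gathered
# into one layer, so one spare row per distinct row index suffices (assuming,
# as here, that the faulty cells fall into different row blocks).
mapped_rows = len({row for _, row, _ in faults})

print(f"spare rows needed: {per_layer_rows} without mapping, {mapped_rows} with mapping")
# -> 5 without mapping, 2 with mapping
```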

9.
《Microelectronics Journal》2015,46(7):598-616
Classical manufacturing test verifies that a circuit is fault-free at fabrication time, but it cannot detect faults that occur after deployment or during operation. As integration complexity rises, the frequency of such failures increases, and on-line testing (OLT) is becoming an essential part of design for testability. The majority of work on OLT considers the single stuck-at fault model. In modern integration technology, however, the single stuck-at fault model captures only a small fraction of real defects, and as a remedy, advanced fault models such as bridging faults, transition faults, and delay faults are now being considered. In this paper we concentrate on bridging faults for OLT. The reported works on OLT using the bridging fault model have considered non-feedback faults only. The basic idea is that, because feedback bridging faults may cause oscillations, detecting them on-line using logic testing is difficult. However, not all feedback bridging faults create oscillations, and even when some do, there are test patterns for which the fault effect is manifested logically. This paper shows that the number of such cases is not insignificant and that discarding them impacts OLT in terms of fault coverage and detection latency. The present work develops an OLT scheme for bridging faults, including those feedback bridging faults that can be detected using logic test patterns. The proposed scheme is based on Binary Decision Diagrams, which enables it to handle fairly large circuits. Results on ISCAS 89 benchmarks illustrate that considering feedback bridging faults along with non-feedback ones improves fault coverage, while the increase in area overhead is marginal compared to schemes involving only non-feedback faults.

10.
11.
Online multiple-model-based fault diagnosis and accommodation
While most research attention has been focused on fault detection and diagnosis, much less research effort has been devoted to failure accommodation. Due to the inherent complexity of nonlinear systems, most model-based analytical-redundancy fault diagnosis and accommodation (FDA) studies deal with linear systems subject to simple additive or multiplicative faults. This assumption has limited their effectiveness and usefulness in practical applications. In this paper, the online fault accommodation (FA) control problem under multiple catastrophic or incipient failures is investigated. The main interest is in dealing with unanticipated component failures in the most general formulation. Through discrete-time Lyapunov stability theory, sufficient conditions to guarantee online system stability and to meet performance criteria under failures are derived, and a systematic procedure for proper FA under unanticipated failures is developed. The approach combines the control technique derived from discrete-time Lyapunov theory with a modern intelligent technique capable of self-optimization and online adaptation for real-time failure estimation. In addition, a complete FDA architecture is proposed by incorporating the intelligent fault-tolerant control strategy with a cost-effective fault detection scheme and a multiple-model-based failure diagnosis process to efficiently handle false alarms and the accommodation of both anticipated and unanticipated failures in online situations. The simulation results, including a three-tank benchmark problem, substantiate the feasibility of the proposed FDA framework and show promising potential for spin-off applications in industrial and aerospace engineering.

12.
This paper presents methods for recovering from channel failures, link failures, and node failures in wavelength-division multiplexed (WDM) point-to-point links and ring networks with limited wavelength conversion/switching capabilities at the nodes. Different recovery schemes are presented to handle each type of failure. Each scheme is evaluated based on the network hardware configuration required to support it and the performance and management overheads associated with fault recovery. Although similar recovery techniques have been used in conventional networks such as SONET, the constraints due to limited wavelength conversion require new and more complex solutions.

13.
The reliability of large-scale integrated (LSI) memory circuits is reviewed. The major physical mechanisms for failures in memory LSIs and measures to counter these failures are surveyed. Fault-tolerant techniques developed to overcome hard and soft failures, divided into the spare row/column line substitution (SLS) technique and the on-chip error-correcting code (ECC) technique, are described. Design approaches for realizing high performance and high reliability are also discussed.

14.
Bitonic sorters have recently been proposed for constructing, along with banyan networks, the switching fabric of future broadband networks. Unfortunately, a single fault in a bitonic sorter may have disastrous consequences for the switching system; therefore, a bitonic sorter must be proved to be free of faults before it can be used. We study the topological properties of bitonic sorters and present an efficient fault diagnosis procedure to detect, locate, and identify the fault type of single faults. Our diagnosis procedure can detect most single faults in two tests; faults which cannot be detected in two tests can always be detected in four tests. Several binary search techniques are developed to locate a faulty sorting element (i.e., a 2×2 sorter).
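The sketch below builds an 8-input bitonic sorting network from 2×2 compare-exchange elements, injects a straight-through fault into one element, and measures how many such faults two arbitrary test vectors expose; it illustrates detection only and is not the paper's two-test diagnosis procedure.

```python
# Hedged sketch: fault simulation of straight-through element faults in an
# 8-input bitonic sorting network (test vectors chosen arbitrarily).
def bitonic_sort(data, faulty_element=None):
    """Sorts 8 values with a bitonic network of 2x2 compare-exchange elements.
    `faulty_element` is the index (in application order) of an element that
    passes its inputs straight through instead of sorting them."""
    a, n, element = list(data), len(data), 0
    k = 2
    while k <= n:                                   # merge stage sizes 2, 4, 8
        j = k // 2
        while j > 0:                                # compare-exchange distances
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if element != faulty_element and (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
                    element += 1
            j //= 2
        k *= 2
    return a

tests = [[3, 7, 4, 8, 6, 2, 1, 5], [8, 7, 6, 5, 4, 3, 2, 1]]   # two test vectors
good = [bitonic_sort(t) for t in tests]                        # fault-free outputs
detected = sum(any(bitonic_sort(t, f) != g for t, g in zip(tests, good))
               for f in range(24))                             # 24 elements for n = 8
print(f"{detected}/24 straight-through element faults detected by the 2 tests")
```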

15.
With increasing inter-die and intra-die parameter variations in sub-100-nm process technologies, new failure mechanisms are emerging in CMOS circuits. These failures reduce the reliability of circuits, especially of area-constrained SRAM cells. In this paper, we analyze the emerging failure mechanisms in SRAM caches due to transistor Vt variations, which result from process variations, and we propose solutions to detect those failures efficiently. In particular, SRAM failure mechanisms under transistor Vt variations are mapped to logic fault models, and March test sequences are optimized to address the emerging failure mechanisms with minimal overhead on test time. Moreover, we propose a design-for-test circuit to complement the March test sequence for at-speed testing of SRAMs. The proposed technique, referred to as double sensing, can be used to test the stability of SRAM cells during read operations. Using the proposed March test sequence along with the double sensing technique, a test time reduction of 29% is achieved compared to existing test techniques with the same fault coverage. We also demonstrate that double sensing can be used during normal SRAM operation for online detection and correction of any number of random read faults.

16.
Research on a March test algorithm based on memory fault primitives
Studying efficient system fault-test algorithms and establishing effective test methods for embedded memories is of great significance for improving chip yield and reducing chip production cost. Starting from testing of basic memory fault primitives and building on the March LR algorithm, a new algorithm, March LSC, is proposed. The algorithm can test realistic connectivity faults and raises the coverage of single-cell faults and coupling faults in current memories to 100%. A memory built-in self-test (MBIST) circuit was implemented using the March LSC algorithm. Simulation experiments show that March LSC detects embedded-memory faults well and meets the technical requirements. The results provide a valuable reference for applications.

17.
Today, embedded memories occupy an ever-larger share of SoC die area, which has become a prominent feature of SoC development. Because of their high cell density, embedded memories are more prone to silicon defects than other on-chip components and have become an important factor affecting chip yield. This paper improves an embedded-memory built-in self-test based on the MARCH-C algorithm so that embedded-memory faults can be detected and located; it can accurately determine the fault address and fault type, making embedded-memory fault repair faster and more accurate while achieving high fault coverage and short test time.
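A behavioral sketch of a March-style BIST pass that detects a fault and logs its address and failing operation is given below; it uses the textbook March C- element sequence for illustration, not the improved MARCH-C flow of the paper.

```python
# Behavioral sketch of a March-style BIST pass that detects and locates a fault.
# The element sequence below is the textbook March C-.

N = 16                                   # tiny memory for illustration
STUCK_AT_0 = {7}                         # injected fault: cell 7 stuck at 0

mem = [0] * N
def write(addr, val): mem[addr] = 0 if addr in STUCK_AT_0 else val
def read(addr): return mem[addr]

# March C-: {up(w0); up(r0,w1); up(r1,w0); down(r0,w1); down(r1,w0); down(r0)}
ELEMENTS = [("up", ["w0"]), ("up", ["r0", "w1"]), ("up", ["r1", "w0"]),
            ("down", ["r0", "w1"]), ("down", ["r1", "w0"]), ("down", ["r0"])]

failures = []
for order, ops in ELEMENTS:
    addrs = range(N) if order == "up" else range(N - 1, -1, -1)
    for a in addrs:
        for op in ops:
            kind, expected = op[0], int(op[1])
            if kind == "w":
                write(a, expected)
            elif read(a) != expected:          # mismatch: log address and operation
                failures.append((a, order, op))

print(failures)        # [(7, 'up', 'r1'), (7, 'down', 'r1')]
```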

18.
With the rapid shrinking of technology and growing integration capacity, the probability of failures in Networks-on-Chip (NoCs) increases, and thus fault tolerance is essential. Moreover, the unpredictable locations of these failures may affect the regularity of the underlying topology, so a regular 2D mesh is likely to become irregular. For such failure-prone networks, a viable routing framework should comprise a topology-agnostic routing algorithm along with a cost-effective, scalable routing mechanism able to handle failures, irrespective of any particular failure pattern. Existing routing techniques designed to route irregular topologies efficiently lack flexibility (logic-based), scalability (table-based), or a relaxed switch design (uLBDR-based); designing an efficient routing implementation technique for irregular topologies remains a pressing research problem. To address this, we present a fault-resilient routing mechanism for the irregular 2D meshes that result from failures. To handle irregularities, it avoids routing tables and employs a few fixed configuration bits per switch, resulting in a scalable approach. Experiments demonstrate that the proposed approach is guaranteed to tolerate all locations of single and double link failures and most multiple failures. Unlike uLBDR, it is not restricted to any particular switching technique and does not replicate any extra messages. Along with fault tolerance, the proposed mechanism achieves better network performance in fault-free cases and degrades gracefully under failures. Compared to uLBDR, our method has 14% lower area requirements and 16% lower overall power consumption.
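The toy sketch below only illustrates the storage model of a few fixed configuration bits per switch (one link-up bit per output port) consulted by a local port-selection function; it is not the paper's routing mechanism, which additionally has to guarantee reachability and deadlock freedom.

```python
# Toy sketch of table-less routing with a few configuration bits per switch:
# each switch stores, per output port, only whether the attached link is up.
# This is NOT the paper's algorithm; it only illustrates the storage model.

MESH = 4                                             # 4x4 mesh
link_up = {((x, y), d): True                         # one connectivity bit per port
           for x in range(MESH) for y in range(MESH)
           for d in ("N", "E", "S", "W")}
link_up[((1, 1), "E")] = False                       # assume this link has failed
link_up[((2, 1), "W")] = False                       # (both endpoints marked)

def next_port(cur, dst):
    """XY-first port selection that falls back to the other productive direction
    when the preferred link's connectivity bit is cleared."""
    (cx, cy), (dx, dy) = cur, dst
    prefs = []
    if dx > cx: prefs.append("E")
    if dx < cx: prefs.append("W")
    if dy > cy: prefs.append("N")
    if dy < cy: prefs.append("S")
    for p in prefs:
        if link_up[(cur, p)]:
            return p
    return None                                      # no productive port at this switch

print(next_port((1, 1), (3, 2)))   # 'N': routes around the failed East link
```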

19.
Unlike cases where only a single failure occurs, fault detection and isolation of multiple sensor and actuator failures for engines are difficult to achieve because of the interactive effects of the failed components. If faults all appear either in sensors only or in actuators only, many existing residual generators which provide decoupled residual signals can be employed directly to obtain proper fault detection and isolation. However, when both sensor and actuator failures occur at the same time, their mutual effects on the residuals make fault isolation particularly difficult, and further decision logic is required. In this paper, the authors propose a hexadecimal decision table that relates all possible failure patterns to a residual code. The residual code is obtained through simple threshold testing of the residuals, which are the output of a general scheme of residual generators. The proposed diagnostic system incorporating the hexadecimal decision table has been successfully applied to automotive engine sensors and actuators in both simulation and experimental analyses. Enhancement of the diagnostic performance by implementing an additional sensor is also described.
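A hedged sketch of the residual-code and decision-table idea follows; the residual signals, thresholds, and table entries are invented for illustration.

```python
# Hedged sketch: threshold four residuals into a 4-bit code, read it as one
# hexadecimal digit, and look it up in a decision table. All values invented.

THRESHOLDS = [0.5, 0.5, 0.8, 0.3]            # one threshold per residual signal

# Hypothetical decision table keyed by the hexadecimal residual code.
DECISION_TABLE = {
    0x0: "no failure",
    0x1: "MAP sensor failure",
    0x2: "throttle actuator failure",
    0x3: "MAP sensor + throttle actuator failure",
    # ... remaining codes would map to the other failure patterns
}

def residual_code(residuals):
    """Threshold each residual and pack the results into one hex digit."""
    bits = [abs(r) > t for r, t in zip(residuals, THRESHOLDS)]
    return sum(b << i for i, b in enumerate(bits))

code = residual_code([0.9, 0.1, 0.2, 0.05])
print(hex(code), DECISION_TABLE.get(code, "unlisted pattern"))   # 0x1 MAP sensor failure
```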

20.
A new diagnostic test technique for operating-margin problems in LSI memory has been developed that makes it possible to identify the failed circuit block exactly, even if multiple failed blocks exist. The method consists of two techniques based on a newly developed time-domain method (TDM): one divides the multiple failure caused by several failed circuit blocks into single failures; the other identifies the single-failure mode and then locates the failed circuit block. Moreover, a detailed diagnostic technique combining bit mapping with the new diagnostic test technique has also been developed, which enables the failure mode to be distinguished more precisely. It is shown that 50 percent of the distinguished failure modes have 1-block resolution and 33 percent have 2-block resolution. For practical purposes, the failed circuit block can be identified with 1-block resolution in almost all cases by investigating the failure modes in connection with the physical layout.
