Similar Articles
20 similar articles found
1.
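As chip density keeps increasing and reliability requirements keep rising, the fault-tolerant design of high-performance processors is attracting growing attention. This paper analyzes and compares error-correction techniques proposed for high-performance processors in recent years and groups them into four classes: clock-cycle-level error recovery, instruction-level error recovery, thread-level error recovery, and reconfiguration. The work examined includes research proposals, prototypes, and products. The study shows that, besides continuing to use traditional fault-tolerance techniques, high-performance processors built around chip multiprocessing and/or simultaneous multithreading mostly base their fault-tolerance mechanisms and scheduling schemes on hardware resources that are inherently replicated to improve performance.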

2.
Wide attention has recently been given to the problem of fault-tolerance in neural networks; while most authors dealt with aspects related to specific VLSI implementations, attention was also given to the intrinsic ability of neural models to survive faults. The present paper tackles this second theme, considering in particular multilayered feedforward nets. One of the main goals is to identify the real influence of faults on the neural computation in order to show that neural paradigms cannot be considered intrinsically fault tolerant (i.e., able to survive faults, even the most common and simple ones). A high abstraction level (corresponding to the neural graphs) is taken as the basis of the study, and a corresponding error model is introduced. The effects of the errors induced by faults are derived analytically to verify the probability of intrinsic masking in the final neural outputs. Then, conditions allowing complete compensation of fault-induced errors through weight adjustment are evaluated to test the masking abilities of the network. The designer of a neural architecture should perform such a mathematical analysis to check the actual fault-tolerance features of his or her system. Unfortunately, this involves a very high computational overhead. As a cost-effective alternative, a behavioral simulation is proposed for quantitatively evaluating the effect of errors on the neural computation. Repeated learning (i.e., a new application of the learning procedure to the faulty network) is then tried as a way to induce error masking. Experimental results show that even single errors affect the computation significantly and that weight redistribution cannot induce complete masking after a fault has occurred; that is, the network cannot be considered intrinsically fault tolerant per se, and learning alone cannot be relied on to achieve complete masking.
Criteria for mapping physical faults onto the abstract errors are finally examined, to show how the proposed analysis can be used to evaluate the actual robustness of a neural network implementation and to identify the critical areas where architectural redundancy should be introduced to achieve fault tolerance.
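A behavioral simulation of the kind proposed above can be sketched in a few lines. The 2-2-1 network, its hand-picked weights, and the stuck-at-zero weight fault are illustrative assumptions, not the paper's error model; the sketch only compares faulty and fault-free outputs over a set of inputs to quantify the error a single fault induces.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(weights, x):
    """Evaluate a 2-2-1 feedforward net; weights = (W_hidden, b_hidden, w_out, b_out)."""
    w_h, b_h, w_o, b_o = weights
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w_h, b_h)]
    return sigmoid(sum(w * h for w, h in zip(w_o, hidden)) + b_o)

def inject_stuck_at_zero(weights, neuron, synapse):
    """Return a copy of the weights with one hidden-layer synapse forced to 0."""
    w_h, b_h, w_o, b_o = weights
    faulty_w_h = [list(row) for row in w_h]
    faulty_w_h[neuron][synapse] = 0.0
    return (faulty_w_h, list(b_h), list(w_o), b_o)

def max_output_error(good, faulty, inputs):
    """Largest absolute output deviation caused by the fault over the given inputs."""
    return max(abs(forward(good, x) - forward(faulty, x)) for x in inputs)

# Hypothetical hand-picked weights (an XOR-like net), for illustration only.
WEIGHTS = ([[6.0, 6.0], [-4.0, -4.0]], [-3.0, 6.0], [8.0, 8.0], -12.0)
INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]

# Inject each single-synapse fault in turn and report its worst-case effect.
for n in range(2):
    for s in range(2):
        err = max_output_error(WEIGHTS, inject_stuck_at_zero(WEIGHTS, n, s), INPUTS)
        print(f"stuck-at-0 on hidden[{n}] input {s}: max |error| = {err:.3f}")
```

Retraining the faulty copy and re-running `max_output_error` would mirror the repeated-learning experiment described in the abstract.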

3.
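Research on Fault-Tolerance Techniques for Computer Systems   Times cited: 2 (self-citations: 1; citations by others: 1)
Addressing the different characteristics of software and hardware reliability problems in computer systems, this paper discusses the current state of fault-tolerance technology and analyzes the fault-tolerance methods used in computer systems, including traditional redundant design, error rollback-recovery mechanisms, and the generalized fault-tolerant design methods that are currently widely studied. It examines the shortcomings of existing methods and the key open problems with respect to response latency, fault-tolerance cost, precise quantification, heterogeneous synchronization, and reliability modeling, and concludes with a summary of how these methods can be further refined and better applied.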

4.
We focus on the automated addition of masking fault-tolerance to existing fault-intolerant distributed programs. Intuitively, a program is masking fault-tolerant if it satisfies its safety and liveness specifications both in the absence and in the presence of faults. Masking fault-tolerance is highly desirable in distributed programs, as the structure of such programs is fairly complex and they are often subject to various types of faults. However, the problem of synthesizing masking fault-tolerant distributed programs from their fault-intolerant versions is NP-complete in the size of the program's state space, which casts doubt on the practicality of synthesis. In this paper, we show that in spite of the high worst-case complexity, synthesizing moderate-sized masking fault-tolerant distributed programs is feasible in practice. In particular, we present and implement a BDD-based synthesis heuristic for automatically adding masking fault-tolerance to existing fault-intolerant distributed programs. Our experiments validate the efficiency and effectiveness of our algorithm in the sense that synthesis is possible in a reasonable amount of time and memory. We also identify several bottlenecks in the synthesis of distributed programs that depend on the structure of the program at hand. We conclude that, unlike verification, in program synthesis the most challenging barrier is not the state-explosion problem by itself, but the time complexity of the decision procedures.

5.
Masking fault-tolerance guarantees that programs continually satisfy their specification in the presence of faults. By contrast, nonmasking fault-tolerance does not guarantee as much: it merely guarantees that when faults stop occurring, program executions converge to states from which programs continually (re)satisfy their specification. In this paper we present a component-based method for the design of masking fault-tolerant programs. In this method, components are added to a fault-intolerant program in a stepwise manner: first to transform the fault-intolerant program into a nonmasking fault-tolerant one, and then to enhance the fault-tolerance from nonmasking to masking. We illustrate the method by designing programs for agreement in the presence of Byzantine faults, data transfer in the presence of message loss, triple modular redundancy in the presence of input corruption, and mutual exclusion in the presence of process fail-stops. These examples also demonstrate that the method accommodates a variety of fault classes. It provides alternative designs for programs usually designed with extant design methods, and it offers the potential for improved masking fault-tolerant programs.
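The triple-modular-redundancy example is the classic case of masking: a majority voter hides a fault in any single replica, so the output still meets the specification. A minimal sketch of that effect follows; the input-corruption model (flipping the low 8 input bits of one replica) is a hypothetical choice, and the sketch does not reproduce the paper's stepwise, component-based design method.

```python
from collections import Counter
from typing import Callable, Hashable, Optional, Sequence

def majority(values: Sequence[Hashable]) -> Optional[Hashable]:
    """Voter: return the value produced by a strict majority of replicas, else None."""
    value, count = Counter(values).most_common(1)[0]
    return value if 2 * count > len(values) else None

def tmr(module: Callable[[int], int], x: int, corrupt: Optional[int] = None) -> Optional[Hashable]:
    """Run three replicas of `module`; replica `corrupt` (if given) sees a corrupted input."""
    outputs = []
    for replica in range(3):
        # Hypothetical input-corruption fault: flip the low 8 bits of x.
        xi = x ^ 0xFF if replica == corrupt else x
        outputs.append(module(xi))
    return majority(outputs)

print(tmr(lambda v: v * v, 5, corrupt=1))  # the single corrupted replica is outvoted
```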

6.
A major hurdle in building a quantum computer is overcoming noise, since quantum superpositions are fragile. Developed over the last couple of years, schemes for achieving fault tolerance based on error detection, rather than error correction, appear to tolerate as much as 3–6% noise per gate, an order of magnitude higher than previous procedures. However, proof techniques could not show that these promising fault-tolerance schemes tolerated any noise at all; the distribution of errors in the quantum state has correlations that conceivably could grow out of control. With an analysis based on decomposing complicated probability distributions into mixtures of simpler ones, we rigorously prove the existence of constant tolerable noise rates ("noise thresholds") for error-detection-based schemes. Numerical calculations indicate that the actual noise threshold this method yields is lower-bounded by 0.1% noise per gate.

7.
Fault-tolerant systems have found wide application in military, industrial, and commercial areas. Most of these systems are built with multiple-modular redundancy or error-control coding techniques, and they need fault-tolerance-specific components (such as voters, switches, encoders, or decoders) to implement error detection or error correction. However, the problem of error detection, location, or correction for these fault-tolerance-specific components themselves has not yet been solved properly, and the dependability of the whole fault-tolerant system is greatly affected as a result. This paper presents a theory of robust fault-masking digital circuits for characterizing fault-tolerant systems capable of concurrent error location, together with a new scheme for dual-modular redundant systems with a partially robust fault-masking property. A basic robust fault-masking circuit is composed of a basic functional circuit and an error-locating corrector; such a circuit can not only correct errors concurrently but also locate them concurrently. Following this circuit model, in a partially robust fault-masking dual-modular redundant system, two redundant modules based on alternating-complementary logic form the basic functional circuit, and a dedicated error-correction circuit called the alternating-complementary corrector serves as the error-locating corrector. The performance of the scheme (hardware complexity and time delay) is analyzed.
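The alternating-logic principle behind such modules can be illustrated in isolation. A self-dual function satisfies f(~x) = ~f(x), so evaluating it on x and then on ~x must give complementary outputs; a pair that fails to alternate signals a fault. The sketch below assumes 3-input majority as the self-dual function and a stuck-at-1 output as the fault; it shows only this detection principle, not the paper's dual-modular arrangement or its alternating-complementary corrector.

```python
from typing import Callable, Tuple

Bits = Tuple[int, ...]

def complement(x: Bits) -> Bits:
    return tuple(1 - b for b in x)

def maj3(x: Bits) -> int:
    """3-input majority, a classic self-dual function: maj3(~x) == ~maj3(x)."""
    return 1 if sum(x) >= 2 else 0

def is_self_dual(f: Callable[[Bits], int], n: int) -> bool:
    """True if f(~x) is the complement of f(x) for every n-bit input x."""
    for k in range(2 ** n):
        x = tuple((k >> i) & 1 for i in range(n))
        if f(complement(x)) != 1 - f(x):
            return False
    return True

def alternating_check(f: Callable[[Bits], int], x: Bits) -> Tuple[int, bool]:
    """Evaluate f on x, then on ~x; a fault-free self-dual module must alternate."""
    first = f(x)
    second = f(complement(x))
    return first, first != second  # (result, True if outputs were complementary)

def stuck_at_one_output(x: Bits) -> int:
    """Hypothetical faulty module whose output line is stuck at 1."""
    return 1
```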

8.
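Accuracy and reliability are key performance indicators for vehicle integrated navigation systems. In a dead-reckoning/inertial/GPS (DR/INS/GPS) vehicle navigation system, the no-reset federated filtering algorithm based on output correction offers the highest fault tolerance, but its navigation error diverges over long periods. The no-reset algorithm is improved by introducing local feedback, and the construction of the master filter's weight matrix for the improved algorithm is studied. Because the two local filters remain mutually independent, fault tolerance stays optimal, while local feedback improves the accuracy of the subfilters. Results from three vehicle navigation tests show that the improved algorithm achieves better long-duration navigation accuracy than the no-reset federated filtering algorithm based on output correction.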
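The master-filter step of a federated filter fuses independent local estimates by weighting each with its information (inverse covariance): P_g = (Σ P_i⁻¹)⁻¹ and x_g = P_g Σ P_i⁻¹ x_i. A minimal scalar sketch of that standard fusion rule follows; the scalar simplification is an assumption for brevity, and the paper's weight matrices and local-feedback scheme are not modeled.

```python
from typing import List, Tuple

def fuse(estimates: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Master-filter fusion of independent local estimates (x_i, P_i), scalar case.

    P_g = (sum_i 1/P_i)^-1 and x_g = P_g * sum_i x_i / P_i, so each local
    filter is weighted by its information (inverse variance).
    """
    information = sum(1.0 / p for _, p in estimates)
    p_g = 1.0 / information
    x_g = p_g * sum(x / p for x, p in estimates)
    return x_g, p_g

# Two independent local filters with equal variance.
print(fuse([(10.0, 4.0), (12.0, 4.0)]))  # fused mean 11.0, variance 2.0
```

The fused variance is smaller than either local variance. Excluding a local estimate that a fault detector has flagged is one way to isolate a faulty subsystem under this kind of fusion, which is where the independence of the local filters matters.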

9.
A convolutional code can be used to detect or correct infinite sequences of errors or to correct infinite sequences of erasures. First, erasure correction is shown to be related to error detection, as well as error detection to error correction. Next, the active burst distance is exploited, and various bounds on erasure correction, error detection, and error correction are obtained for convolutional codes. These bounds are illustrated by examples.
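To make the object of these bounds concrete, here is a minimal rate-1/2 feedforward convolutional encoder. The generators (7, 5) in octal with memory 2 are a standard textbook choice assumed for illustration, not a code taken from the paper; the sketch shows the encoder structure only, not the bounds themselves.

```python
from typing import List, Tuple

def conv_encode(bits: List[int], generators: Tuple[int, ...] = (0b111, 0b101),
                memory: int = 2) -> List[int]:
    """Rate-1/len(generators) feedforward convolutional encoder.

    The register holds the current bit (high position) and the `memory`
    previous bits; each generator selects register taps whose XOR forms
    one output bit per input bit. The encoder starts in the all-zero state.
    """
    state = 0  # previous `memory` input bits, most recent in the high position
    out: List[int] = []
    for b in bits:
        register = (b << memory) | state
        for g in generators:
            out.append(bin(register & g).count("1") % 2)  # parity of tapped bits
        state = register >> 1  # shift: current bit becomes the most recent past bit
    return out

print(conv_encode([1, 0, 1, 1]))
```

Because the encoder is linear, the XOR of two codewords is the codeword of the XORed inputs, which is what lets distance arguments such as those above be stated in terms of code sequences alone.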

10.
We focus on the problem of synthesizing failsafe fault-tolerance, where fault-tolerance is added to an existing (fault-intolerant) program. A failsafe fault-tolerant program satisfies its specification (including safety and liveness) in the absence of faults; in the presence of faults, it satisfies its safety specification. We present the somewhat unexpected result that, in general, the problem of synthesizing failsafe fault-tolerant distributed programs from their fault-intolerant versions is NP-complete in the state space of the program. We also identify a class of specifications, monotonic specifications, and a class of programs, monotonic programs, for which the synthesis of failsafe fault-tolerance can be done in polynomial time (in the program state space). As an illustration, we show that the monotonicity restrictions are met for commonly encountered problems such as Byzantine agreement, distributed consensus, and atomic commitment. Furthermore, we evaluate the role of these restrictions in the complexity of synthesizing failsafe fault-tolerance. Specifically, we prove that if only one of these conditions is satisfied, the synthesis of failsafe fault-tolerance is still NP-complete. Finally, we demonstrate the application of the monotonicity property to enhancing the fault-tolerance of (distributed) nonmasking fault-tolerant programs to masking.

11.
All existing fault-tolerant job scheduling algorithms for computational grids were proposed under the assumption that all sites apply the same fault-tolerance strategy. They all ignore the fact that each grid site may have its own fault-tolerance strategy, because each site is itself an autonomous domain. In fact, it is very common for multiple fault-tolerance strategies to be in use at the same time in a large-scale computational grid. Different fault-tolerance strategies may have different hardware and software requirements. For instance, if a grid site employs job checkpointing, each computation node must be able to periodically transmit the transient state of the job execution to the server; if a job fails, it migrates to another computation node and resumes from the last stored checkpoint. Therefore, in this paper we propose a genetic algorithm for job scheduling that addresses the heterogeneity of fault-tolerance mechanisms in a computational grid. We assume that the system supports four kinds of fault-tolerance mechanisms: job retry, job migration without checkpointing, job migration with checkpointing, and job replication. Because each mechanism has different requirements for gene encoding, we also propose a new chromosome encoding approach that integrates the four mechanisms in one chromosome. The risky nature of the grid environment is also taken into account: the risk relationship between jobs and nodes is defined by the security demand and the trust level. Simulation results show that our algorithm achieves a shorter makespan and is more effective at reducing the job failure rate than the Min–Min and Sufferage algorithms.
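One way to picture a chromosome that integrates several fault-tolerance mechanisms is to let each gene carry both a node and a mechanism, so crossover and mutation move the pair together. The sketch below is a hypothetical encoding of that idea, not the paper's: the overhead factors, the rule that a replica also occupies the next node, and the makespan-only fitness (the paper's risk, security-demand, and trust-level terms are omitted) are all assumptions.

```python
import random
from typing import List, Tuple

# Mechanism codes for the four strategies named in the abstract.
RETRY, MIGRATE, MIGRATE_CKPT, REPLICATE = range(4)
# Hypothetical runtime overhead factors, for illustration only.
OVERHEAD = {RETRY: 1.0, MIGRATE: 1.05, MIGRATE_CKPT: 1.15, REPLICATE: 1.0}

Gene = Tuple[int, int]      # (node index, fault-tolerance mechanism)
Chromosome = List[Gene]     # gene i encodes the placement of job i

def random_chromosome(n_jobs: int, n_nodes: int, rng: random.Random) -> Chromosome:
    return [(rng.randrange(n_nodes), rng.randrange(4)) for _ in range(n_jobs)]

def makespan(ch: Chromosome, job_len: List[float], node_speed: List[float]) -> float:
    """Finish time of the busiest node; a replica also loads the next node (assumption)."""
    load = [0.0] * len(node_speed)
    for job, (node, mech) in enumerate(ch):
        load[node] += job_len[job] * OVERHEAD[mech] / node_speed[node]
        if mech == REPLICATE:
            backup = (node + 1) % len(node_speed)
            load[backup] += job_len[job] / node_speed[backup]
    return max(load)

def crossover(a: Chromosome, b: Chromosome, rng: random.Random) -> Chromosome:
    """One-point crossover: whole genes (node + mechanism) move together."""
    cut = rng.randrange(1, len(a)) if len(a) > 1 else 0
    return a[:cut] + b[cut:]

def mutate(ch: Chromosome, n_nodes: int, rate: float, rng: random.Random) -> Chromosome:
    """Replace each gene with a random one with probability `rate`."""
    return [(rng.randrange(n_nodes), rng.randrange(4)) if rng.random() < rate else g
            for g in ch]

def evolve(job_len: List[float], node_speed: List[float], pop_size: int = 20,
           generations: int = 50, seed: int = 0) -> Chromosome:
    """Elitist GA minimizing makespan only (risk and failure terms omitted)."""
    rng = random.Random(seed)

    def fitness(c: Chromosome) -> float:
        return makespan(c, job_len, node_speed)

    pop = [random_chromosome(len(job_len), len(node_speed), rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        parents = pop[: max(2, pop_size // 2)]  # keep the better half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), len(node_speed), 0.1, rng))
        pop = parents + children
    return min(pop, key=fitness)
```

Adding a failure-probability penalty per (node, mechanism) pair to `fitness` would be the natural place to bring the risk model back in.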

12.
Replica determinism in distributed real-time systems: A brief survey   Times cited: 1 (self-citations: 0; citations by others: 1)
Replication of entities is a convenient technique for achieving fault-tolerance. The problem of replica determinism is to ensure that replicated entities show consistent behavior in the absence of failures. Possible sources of replica non-determinism, as well as basic requirements and strategies for enforcing replica determinism, are presented. The problem of enforcing replica determinism under real-time constraints is surveyed in the context of the communication problem for distributed systems. Furthermore, the close interdependence between replica determinism, on the one hand, and synchronization strategies, failure handling, and redundancy preservation, on the other, is reviewed. The impact of synchronous or asynchronous approaches on replication strategies is also discussed.

13.
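To address the poor correction accuracy of existing error-correction methods for bridge inspection instruments, an automated error-correction method based on laser line scanning is designed. The laser scanner's parameters are set and the scanner is used to capture and process error data; the fuzzy C-means algorithm is applied to obtain the range over which the error values vary, and it is combined with a Markov chain to build an error-correction model for bridge inspection instruments, which is then used to correct the inspection data. An Excel VBA program automates the correction, and the data-item format of the database is specified to preserve data precision. Comparative experiments were carried out with a defined procedure and selected test points. Compared with the existing method, this method achieves higher correction accuracy with smaller fluctuation in accuracy; it therefore outperforms the existing method and can effectively reduce errors in bridge inspection.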
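The fuzzy C-means step can be sketched on scalar error values: memberships are u_j = 1 / Σ_k (d_j / d_k)^(2/(m-1)), centers are membership-weighted means, and each cluster's variation range is read off the values that belong to it most strongly. This is a generic FCM sketch under assumptions (fuzzifier m = 2, evenly spread initial centers, at least c distinct values); the laser-scanning capture and the Markov-chain correction model are not reproduced.

```python
from typing import List, Tuple

def memberships(x: float, centers: List[float], m: float) -> List[float]:
    """Fuzzy membership of x in each cluster: u_j = 1 / sum_k (d_j / d_k)^(2/(m-1))."""
    d = [abs(x - cj) for cj in centers]
    if 0.0 in d:  # x sits exactly on a center: give it full membership there
        k = d.index(0.0)
        return [1.0 if j == k else 0.0 for j in range(len(d))]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** p for dk in d) for dj in d]

def fcm_1d(xs: List[float], c: int = 2, m: float = 2.0,
           iters: int = 100, tol: float = 1e-9) -> Tuple[List[float], List[List[float]]]:
    """Fuzzy C-means on scalar error values; assumes xs has at least c distinct values."""
    lo, hi = min(xs), max(xs)
    centers = [lo + (hi - lo) * (j + 0.5) / c for j in range(c)]  # evenly spread start
    for _ in range(iters):
        u = [memberships(x, centers, m) for x in xs]
        new_centers = []
        for j in range(c):
            w = [row[j] ** m for row in u]
            new_centers.append(sum(wi * xi for wi, xi in zip(w, xs)) / sum(w))
        shift = max(abs(a - b) for a, b in zip(new_centers, centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, [memberships(x, centers, m) for x in xs]

def error_ranges(xs: List[float], centers: List[float],
                 u: List[List[float]]) -> List[Tuple[float, float]]:
    """Range (min, max) of the values whose highest membership is in each cluster."""
    ranges = []
    for j, cj in enumerate(centers):
        members = [x for x, row in zip(xs, u) if max(range(len(row)), key=row.__getitem__) == j]
        ranges.append((min(members), max(members)) if members else (cj, cj))
    return ranges
```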

14.
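In cloud storage, data fault tolerance is critical because it directly determines the availability of the whole system. Most distributed storage systems currently ensure data availability through multiple replicas, but replication multiplies the storage space required. To reduce storage usage while keeping data available, some distributed storage systems have begun to adopt erasure codes. Based on an analysis of the MooseFS distributed file system, this paper proposes an erasure-code implementation for MooseFS. Storage-efficiency tests show that the method stores frequently accessed "hot data" as multiple replicas and rarely accessed "cold data" with erasure coding, greatly reducing the space consumed by replication while preserving reliability.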
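The trade-off between replicas and erasure coding can be illustrated with the simplest erasure code, a single XOR parity over k data chunks, which survives the loss of any one chunk at a storage cost of (k+1)/k instead of the replica count. This is only a sketch: the chunk layout, k = 4, three replicas for hot data, and the hot/cold switch in `storage_factor` are hypothetical, and a production system would typically use a stronger code (e.g. Reed–Solomon) that tolerates multiple losses.

```python
from functools import reduce
from typing import List, Optional

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal zero-padded chunks and append one XOR parity chunk."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    chunks = [padded[i * size:(i + 1) * size] for i in range(k)]
    return chunks + [reduce(xor_bytes, chunks)]

def recover(chunks: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one lost chunk (None): it equals the XOR of all survivors."""
    missing = [i for i, c in enumerate(chunks) if c is None]
    if len(missing) > 1:
        raise ValueError("a single-parity code tolerates only one lost chunk")
    survivors = [c for c in chunks if c is not None]
    if not missing:
        return survivors
    rebuilt = reduce(xor_bytes, survivors)
    return [c if c is not None else rebuilt for c in chunks]

def storage_factor(policy: str, k: int = 4, replicas: int = 3) -> float:
    """Raw bytes stored per byte of user data: replicas for hot data, (k+1)/k for cold."""
    return float(replicas) if policy == "hot" else (k + 1) / k
```

With these assumed parameters, cold data costs 1.25 bytes of raw storage per byte instead of 3.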

15.
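Bufferless Fault-Tolerant Ring NoC   Times cited: 1 (self-citations: 0; citations by others: 1)
A hierarchical dual-group double-ring NoC topology is proposed. Its links are divided into two groups of rings: one group serves as the primary ring and the other as a backup ring used for fault tolerance in the NoC. Each ring group contains a control ring and a data ring; the control ring exchanges routing, link-error, and error-control information between nodes in the form of packets, while the data ring carries data using circuit switching. For this topology, a three-stage pipeline structure is proposed for the switching nodes that requires no buffers, minimizing the communication latency between IP cores. The rings combine time-division multiplexing with priorities to achieve fair routing and space-division multiplexing of bandwidth. Simulation results show that the structure effectively avoids congestion, deadlock, and starvation and makes full use of bandwidth, consistent with the theoretical analysis.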

16.
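Text proofreading has important applications in news publishing, book and periodical publishing, speech input, and Chinese character recognition, and it is an important research direction in natural language processing. This paper systematically reviews automatic proofreading techniques for Chinese text. It classifies errors in Chinese text into spelling errors, grammatical errors, and semantic errors, reviews the proofreading methods for each of these three classes, summarizes the datasets and evaluation methods for automatic Chinese text proofreading, and finally discusses future directions for the technology.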

17.
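Time synchronization is increasingly widely used in distributed systems. This paper studies time-synchronization techniques and fault-tolerance strategies for time synchronization in distributed systems, gives a method for estimating the synchronization error, and applies a sliding-window algorithm to the fault-tolerant time-synchronization strategy.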
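The abstract does not specify its sliding-window algorithm, so the following is only a plausible sketch combining two common ideas: each peer's clock offset is smoothed over a sliding window of recent samples (here by the median), and the per-peer estimates are combined with a fault-tolerant average that discards the f largest and f smallest values. The window size, the median filter, and the trimming rule are assumptions. Note that tolerating f arbitrary (Byzantine) faulty clocks classically requires n ≥ 3f + 1 clocks; the trim below only needs more than 2f values to leave something to average.

```python
from collections import deque
from statistics import median
from typing import Deque, Dict, List

class OffsetWindow:
    """Keeps the last `size` clock-offset samples for one peer (sliding window)."""

    def __init__(self, size: int) -> None:
        self.samples: Deque[float] = deque(maxlen=size)  # oldest sample drops out

    def add(self, offset: float) -> None:
        self.samples.append(offset)

    def estimate(self) -> float:
        return median(self.samples)  # median damps isolated outlier samples

def fault_tolerant_average(offsets: List[float], f: int) -> float:
    """Drop the f largest and f smallest offsets, then average the rest."""
    if len(offsets) <= 2 * f:
        raise ValueError("need more than 2f clock readings")
    trimmed = sorted(offsets)[f:len(offsets) - f]
    return sum(trimmed) / len(trimmed)

def correction(windows: Dict[str, OffsetWindow], f: int) -> float:
    """Local clock correction computed from the windowed per-peer estimates."""
    return fault_tolerant_average([w.estimate() for w in windows.values()], f)
```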

18.
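This paper describes the ideas and algorithms used, in researching and implementing a Chinese text error-checking system, to acquire linguistic knowledge and build knowledge bases for finding text errors and generating correction suggestions. To address data sparseness, it discusses storage and access techniques for the error-checking knowledge base, and, for different sources of error, it focuses on how to construct a similar-code dictionary, a character-driven bidirectional dictionary, and a skeleton-key dictionary. A Chinese text error-checking system built on these knowledge bases substantially improves the recall and precision of error detection as well as the usefulness of its correction suggestions.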

19.
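To address the low error-correction rate and long correction time of traditional methods for detecting defects on the inner walls of pipes, an automatic error-correction method for defect detection on the inner walls of micro pipes is proposed. The method acquires a panoramic image of the pipe's inner wall, unwraps and preprocesses it, and extracts the geometric features of the defect regions. It then queries an error-correction table and uses bilinear interpolation to solve for the feature values, yielding the detection error for inner-wall defects. Next, using the principle of projective invariance and the related geometric features, it computes the slope of the error line, uses this slope to solve a system of linear equations for the correction parameters, and applies those parameters to correct the error. Experiments show that the method effectively raises the error-correction rate and shortens correction time.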
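Looking up a value between the grid points of a correction table with bilinear interpolation can be sketched as follows. The table layout (values at grid coordinates `xs[i]`, `ys[j]`) is a hypothetical stand-in for the paper's error-correction table, whose axes and contents are not given; the function needs at least two grid points per axis and extrapolates linearly from the edge cell for points outside the grid.

```python
from bisect import bisect_right
from typing import List, Sequence

def bilinear(xs: Sequence[float], ys: Sequence[float],
             table: List[List[float]], x: float, y: float) -> float:
    """Interpolate table[i][j] (the value at xs[i], ys[j]) at the point (x, y)."""
    # Locate the grid cell containing (x, y); clamp so edge points use the last cell.
    i = min(max(bisect_right(xs, x) - 1, 0), len(xs) - 2)
    j = min(max(bisect_right(ys, y) - 1, 0), len(ys) - 2)
    tx = (x - xs[i]) / (xs[i + 1] - xs[i])  # fractional position inside the cell
    ty = (y - ys[j]) / (ys[j + 1] - ys[j])
    q00, q10 = table[i][j], table[i + 1][j]
    q01, q11 = table[i][j + 1], table[i + 1][j + 1]
    return (q00 * (1 - tx) * (1 - ty) + q10 * tx * (1 - ty)
            + q01 * (1 - tx) * ty + q11 * tx * ty)

# Hypothetical 2x2 table whose values equal 2x + y at the grid corners.
print(bilinear([0.0, 1.0], [0.0, 1.0], [[0.0, 1.0], [2.0, 3.0]], 0.5, 0.5))
```

Because the example table is sampled from a linear function, the interpolated value at any interior point matches 2x + y.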

20.
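Research and Prospects of Processor Fault-Tolerance Techniques   Times cited: 3 (self-citations: 1; citations by others: 3)
With advances in fabrication processes and shrinking silicon feature sizes, computer systems face unprecedented exposure to transient faults. Dependable computing has become a focus of desktop and embedded system design and application, with dependable processor design at its core. This paper first proposes a novel and fairly comprehensive classification of processors from the perspective of fault-tolerance techniques. Building on it, and following the development trends of processor fault tolerance, it introduces and analyzes the fault-tolerance mechanisms and techniques of currently popular processor architectures and microarchitectures, along with representative recent research results at different levels. Finally, it offers views and suggestions on new trends and future directions in processor fault-tolerance research.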


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号