Similar literature
 Found 20 similar documents; search took 187 ms
1.
Summary and Conclusions - A novel methodology is proposed for designing fault-tolerant real-time multi-processor systems-on-a-chip to achieve optimal productivity. The methodology employs heterogeneous built-in self-repair (BISR), based on graceful degradation and yield enhancement techniques, as an embedded optimization engine. The technique exploits the flexibility provided in the task-level scheduling and algorithm selection steps. A hardware fault model is developed for modern super-scalar processors and multi-processors, which enables an efficient treatment of the synthesis and compilation goals. For the first time, heterogeneous BISR is used at the task level. The key idea is to adapt scheduling and algorithm selection to the available nonfaulty resources. If there is a fault in memory, algorithms that use less memory are selected, and the scheduler exploits the other abundant resource, viz., the processors, more vigorously to compensate for the loss of part of the memory. Similarly, a fault in a processor is backed up by memory. The synthesis approach minimizes the degradation in performance for single or multiple faults using simulated annealing-based algorithm selection, scheduling, and assignment algorithms. On a large set of examples, this adaptive algorithm selection and scheduling technique achieved a significant improvement in throughput compared to conventional nonadaptive schemes. The experimental results also indicate that a significant improvement in productivity can be achieved by using the extra throughput gained from the technique.

2.
3.
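This paper analyzes the causes of memory errors and proposes effective ways to improve memory reliability. In the context of a reliability growth program for aerospace computers, it presents a fault-tolerance scheme based on an error detection and correction (EDAC) chip, along with simulation results for an implementation on a CPLD device. Finally, the reliability of the fault-tolerant memory is analyzed.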
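
The abstract does not say which code the error detection and correction chip uses. As a rough illustration of the general idea only, the sketch below uses a textbook Hamming(7,4) single-error-correcting code as a small stand-in; the function names `hamming74_encode` and `hamming74_decode` are hypothetical, and the word widths of a real memory protection chip would be larger.

```python
def hamming74_encode(data):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit Hamming codeword.

    Codeword layout (positions 1..7): p1 p2 d1 p4 d2 d3 d4.
    This is an illustrative stand-in, not the code used in the paper.
    """
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct at most one flipped bit; return (data bits, syndrome).

    A syndrome of 0 means no error was seen; otherwise it is the
    1-based position of the bit that was flipped back.
    """
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4
    if syndrome:
        c[syndrome - 1] ^= 1  # repair the single-bit error
    return [c[2], c[4], c[5], c[6]], syndrome
```

On every read, a memory protected this way would run the decode step and write the corrected word back, so that single-bit upsets do not accumulate.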

4.
Processor cores embedded in systems-on-a-chip (SoCs) are often deployed in critical computations, and when affected by faults they may produce dramatic effects. When hardware hardening is not cost-effective, software implemented hardware fault tolerance (SIHFT) can be a solution to increase SoCs’ dependability, but it increases the time for running the hardened application, as well as the memory occupation. In this paper we propose a method that eliminates the memory overhead, by exploiting a new approach to instruction hardening and control flow checking. The proposed method hardens an application online during its execution, without the need for introducing any change in its source code, and is non-intrusive, since it does not require any modification in the main processor’s architecture. The method has been tested with two widely used architectures: a microcontroller and a RISC processor, and proven to be suitable for hardening SoCs against transient faults and also for detecting permanent faults.

5.
The problem of computing a system's Failure Frequency is reduced to the problem of computing its Availability. This is performed by a transformation operator. However, the existing transformation operator is not practical because its transformation time increases exponentially with system size. To overcome this difficulty, this paper proposes a new method of transforming the Availability expression of a system into the corresponding Failure Frequency expression. The method is based on a matrix approach using a 2 × 2 matrix consisting of 0, Availability, and Failure Frequency arranged in an appropriate manner. The transformation also enables algorithms for computing the Availability of a system to be transformed into algorithms for computing its Failure Frequency, by replacing parameters and operations with matrix parameters and operations. The computation time after transformation is linear with respect to that of the original Availability algorithm. This implies that the problems of computing other well-known reliability measures, including Availability, Unavailability, MTBF, MTTR, MCT, Failure Rate, and Failure Frequency, are reduced to the Availability computation problem alone.
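
The abstract does not give the layout of the 2 × 2 matrix. The sketch below assumes one plausible arrangement, [[A, w], [0, A]], where A is a component's availability and w its failure frequency, and assumes statistically independent components. Under that assumption, replacing scalar multiplication with matrix multiplication (and 1 - A with I - M) in the usual series and parallel availability formulas yields the system failure frequency in the upper-right entry. All function names here are hypothetical.

```python
def component(availability, frequency):
    """Assumed layout: [[A, w], [0, A]] as a tuple of rows."""
    return ((availability, frequency), (0.0, availability))


def matmul(x, y):
    """Plain 2x2 matrix product."""
    return tuple(
        tuple(sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def complement(x):
    """I - X, the matrix counterpart of unavailability 1 - A."""
    return ((1.0 - x[0][0], -x[0][1]), (-x[1][0], 1.0 - x[1][1]))


def series(*parts):
    """Series system: A = A1 * A2 * ... with matrices in place of scalars."""
    result = ((1.0, 0.0), (0.0, 1.0))
    for part in parts:
        result = matmul(result, part)
    return result


def parallel(*parts):
    """Parallel system: A = 1 - (1 - A1)(1 - A2)... with matrices."""
    return complement(series(*(complement(p) for p in parts)))


c1 = component(0.9, 0.02)
c2 = component(0.8, 0.05)
series_availability, series_frequency = series(c1, c2)[0]
parallel_availability, parallel_frequency = parallel(c1, c2)[0]
```

For the series pair, the upper-right entry works out to A1·w2 + w1·A2, and for the parallel pair to U1·w2 + w1·U2 with U = 1 - A, which matches the standard frequency formulas for independent components.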

6.
Besides universality and very low latency, Youssef's randomized self-routing algorithms [25] have high tolerance for multiple faults and, more strikingly, have the potential for fault tolerance without diagnosis. In this paper we study the performance of Youssef's routing algorithms for faulty Clos networks in the presence of multiple faults in multiple columns, with and without fault detection. We show that with fault detection and diagnosis, randomized routing algorithms provide scalable, very efficient, and fault-tolerant routing mechanisms. Without fault detection and diagnosis, randomized routing provides good fault tolerance for faulty switches in either the first or the second column. The delays become large for faults in the third column or for faults in more than one column. In conclusion, randomized routing enables the system to run without periodic fault detection/diagnosis, and if and when the performance degrades beyond a certain threshold, diagnosis can be performed to improve the routing performance. This revised version was published online in June 2006 with corrections to the Cover Date.

7.
We give a new decomposition algorithm to route a rearrangeable three-stage Clos network in O(nr^2) time, which is faster than all existing decomposition algorithms. By performing a row-wise matrix decomposition, this algorithm routes all possible permutations, thus overcoming the limitation on realizable permutations exhibited by many other routing algorithms. This algorithm is extended to the fault tolerant Clos network which has extra switches in each stage, where it provides fault tolerance under faulty conditions and reduces routing time under submaximal fault conditions.

8.
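Live virtual machine migration provides strong support for resource scheduling in virtualized systems. Post-Copy, one of the two core live-migration algorithms, has long been a focus of research in China and abroad because of its stable total migration time and short downtime. This work studies in depth the fault-tolerance mechanisms of virtual machines, the relationship between memory page transfer and page faults during migration, and the source code of the QEMU-KVM platform, and proposes a fault-tolerance method based on transaction synchronization to improve the stability of the Post-Copy migration algorithm. Experimental results show that the optimized Post-Copy migration algorithm ensures rapid recovery from source-side VM faults, destination-side VM faults, and network faults during migration, and resolves the stability problem at low cost. The proposed method effectively improves the stability of the Post-Copy migration algorithm and offers a reference for future optimization research.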

9.
In this paper two dynamic configuration schemes are discussed for megabit BiCMOS static random access memories (SRAMs). Dynamic reconfiguration schemes allow failure detection at the chip level and automatic reconfiguration to fault-free memory cells within the chip. The first scheme is a standby system approach in which the I/O lines of the memory can be dynamically switched to spare bit slices in the SRAM. This scheme is implemented through a switching network at the memory interface. Every memory access is controlled by a fault status table (FST), which records the fault conditions of each memory block. The FST is implemented outside the memory system. A second dynamic reconfiguration scheme for BiCMOS SRAMs is based on a graceful degradation approach. Basic design considerations and a performance evaluation of megabit BiCMOS SRAMs using dynamic reconfiguration schemes are presented. The basic properties of the proposed schemes and the details of a prototype VLSI chip implementation are discussed. Compared to conventional methods, improvements of about 35% in access time, 25% in chip area, and 10% in chip yield are achieved. A comparison of the reliability improvement of 1 Mb BiCMOS SRAMs using dynamic configuration schemes is presented. These two dynamic reconfiguration schemes offer considerable reliability improvement over conventional methods. The major advantage is that the size of reconfiguration of the system can be considerably reduced.
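
The first scheme can be pictured as a lookup that sits in front of every access. The sketch below is a simplified software model under assumptions of my own: the class name `FaultStatusTable`, the per-block spare pool, and the numbering of spare slices after the data slices are all hypothetical, and the real design is a hardware switching network rather than a dictionary.

```python
class FaultStatusTable:
    """Toy model of an FST that steers I/O lines to spare bit slices.

    Assumption: each memory block has `spare_slices` spares, numbered
    directly after the `data_slices` regular bit slices.
    """

    def __init__(self, data_slices, spare_slices):
        self.data_slices = data_slices
        self.spare_slices = spare_slices
        self.remap = {}  # block -> {faulty slice: spare slice}

    def mark_faulty(self, block, bit_slice):
        """Record a fault and assign the next free spare in that block."""
        entry = self.remap.setdefault(block, {})
        if bit_slice in entry:
            return entry[bit_slice]
        if len(entry) >= self.spare_slices:
            raise RuntimeError("no spare bit slice left in this block")
        entry[bit_slice] = self.data_slices + len(entry)
        return entry[bit_slice]

    def route(self, block, bit_slice):
        """Return the physical slice an I/O line should connect to."""
        return self.remap.get(block, {}).get(bit_slice, bit_slice)


fst = FaultStatusTable(data_slices=8, spare_slices=2)
fst.mark_faulty(block=0, bit_slice=3)
physical = fst.route(block=0, bit_slice=3)  # now served by spare slice 8
```

Accesses to blocks with no recorded fault pass through unchanged, which is why only the faulty block pays for the reconfiguration.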

10.
Recently, energy efficiency measurement has been emphasized in China for energy conservation, and there is strong demand for plug-and-play measurement without long outages. Unlike previous approaches, we present a sparse fault tolerant (SFT) method to calculate electrical energy efficiency under the IEC PC-118 cloud architecture, accommodating the low-speed data acquisition network. It is implemented with only one modification of the incoming line, to monitor the bus voltage. An attenuation distributed approximating approach is developed for fundamental, harmonic, and interharmonic frequency and phasor estimation during energy measurement. The performance is verified at different SNRs, and the results show that the proposed approach is highly resilient to noise and works well in sparse sampling environments. To guarantee the security of the power system, trivial signal disturbance is included as additional noise to verify the performance of SFT, and the performance is also compared with several typical algorithms, such as DFT, WIDFT, S-LMS, and IDFT. The experimental results show that SFT can be used in noisy environments and that accuracy can be improved through the selection criteria of residues rather than by raising the sampling rate. It can be applied to any form of signal and can be used online without blackout or power line interruption.

11.
Energy conservation of the sensor nodes is the most important issue in the design of wireless sensor networks (WSNs) and has been studied extensively. In many applications, the nodes closer to the sink are overburdened with a heavy traffic load, because the data from the entire region are forwarded through them to reach the sink. As a result, their energy is exhausted quickly and the network becomes partitioned. This is commonly known as the hot spot problem. Moreover, sensor nodes are prone to failure due to several factors, such as environmental hazards, battery exhaustion, and hardware damage. Failure of cluster heads (CHs) in a two-tier WSN is even more perilous. Therefore, apart from energy efficiency, any clustering or routing algorithm has to cope with fault tolerance of CHs. In this paper, we address the hot spot problem and propose grid based clustering and routing algorithms, together called GFTCRA (grid based fault tolerant clustering and routing algorithms), which handle the failure of the CHs. The algorithms follow a distributed approach. We also present distributed run-time management for all member sensor nodes of a cluster in case their CH fails. The routing algorithm is also shown to tolerate the sudden failure of CHs. The algorithms are tested through simulation with various WSN scenarios, and the simulation results show that the proposed method performs better than two other grid based algorithms in terms of network lifetime, energy consumption, and number of dead sensor nodes.

12.
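Today's space computers must combine high-speed processing under hard real-time constraints with high reliability in autonomous operation. For long-life satellites, the reliability requirement is more than any single-machine architecture, in any mode, can meet, so a variety of redundancy schemes have been incorporated into on-board computer design. The goal of this paper is to identify and select an architecture that maximizes fault tolerance under limited resources while still delivering the required performance. The paper describes a modular fault-tolerant architecture that uses a simple redundant internal bus to tightly couple redundant modules with different functions. System-level performance can thus be extended easily, and functions can be flexibly centralized or distributed, so that the design meets both the computing and control demands of space and the performance requirements of fault tolerance.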

13.
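1. Introduction. The two-dimensional array is a widely used structure that is easy to implement in VLSI. Because it lends itself to parallel computation, signal processing operations such as matrix multiplication, the Fourier transform, and convolution can all use a two-dimensional array structure. Its weakness is that once a single cell in the array fails, the normal operation of the whole array is affected. To improve the reliability of two-dimensional arrays, it is necessary to introduce fault tolerance into the structure. There are currently two classes of methods for the fault-tolerant design of two-dimensional arrays: structural redundancy, which provides redundant hardware, and time redundancy, which achieves fault tolerance by doubling the processing time.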

14.
This paper presents an integrated approach to the design of an ultrareliable memory system using a variety of coding and modularization techniques on each of the memory subsystem elements. The overall objective is to provide a properly operating memory system in spite of any single indigenous fault (regardless of the number of failures which might ensue). In other words, the system has the capability automatically to: (1) detect single faults and multiple failures, (2) mask failures to prevent malfunctions, without interrupting service, (3) isolate the fault to a replaceable module, and (4) reconfigure the faulty unit out of the system. The storage medium and retrieval circuits are checked and corrected by coding techniques. Some redundancy is used on the subunits, but the total redundancy is less than 20% of the system cost, and diagnostic software is eliminated.

15.
The maintainability, reliability, and availability of a computer system are closely bound together in ensuring continuing service of a system. The ability of a system to tolerate failures or faults while operating is a principal requirement of a fault tolerant system. A fault tolerant system's design must incorporate considerations for maintenance and reliability in order to meet its ultimate requirement: available operation. These factors are considered in the design philosophy presented in this paper, identified as FAULTPROOF. FAULTPROOF design incorporates redundancy, reliability, maintainability, and adaptability to augment normally accepted fault tolerant design. The design approach described uses a hierarchical interconnection mechanism, Intelligent Networked Partitioning, to isolate faulted components.

16.
This paper presents a new methodology for RAM testing based on the PS(n, k) fault model (the k out of n pattern sensitive fault model). According to this model, the contents of any memory cell that belongs to an n-bit memory block, or the ability to change those contents, is influenced by the contents of any k - 1 cells from this block. The proposed methodology is a transparent BIST technique, which can be efficiently combined with on-line error detection. This approach preserves the initial contents of the memory after the test and provides high fault coverage for traditional fault and error models, as well as for pattern sensitive faults. This paper includes an investigation of testing approaches based on transparent pseudoexhaustive testing and its approximations by deterministic and pseudorandom circular tests. The proposed methodology can be used for periodic and manufacturing testing and requires lower hardware and time overheads than the standard approaches. This work was supported by the NSF under Grant MIP9208487 and NATO under Grant 910411.
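
To make "transparent" concrete, the sketch below shows a much simpler test than the PS(n, k) method of the paper: each word is read, overwritten with its complement, checked, and then restored, so a fault-free memory ends the test with its original contents. The `Ram` class with an injectable stuck-at bit and the function `transparent_test` are hypothetical illustration code; this simple pass targets stuck-at faults only, not pattern sensitive faults.

```python
class Ram:
    """Toy word-addressable RAM with optional stuck-at faults."""

    def __init__(self, words, width=8, stuck_at=None):
        self.mask = (1 << width) - 1
        self.cells = [w & self.mask for w in words]
        self.stuck_at = stuck_at or {}  # address -> (bit index, stuck value)

    def read(self, addr):
        value = self.cells[addr]
        if addr in self.stuck_at:
            bit, stuck = self.stuck_at[addr]
            value = value | (1 << bit) if stuck else value & ~(1 << bit)
        return value & self.mask

    def write(self, addr, value):
        self.cells[addr] = value & self.mask


def transparent_test(ram):
    """Return addresses that fail a read / complement / restore pass."""
    faulty = []
    for addr in range(len(ram.cells)):
        original = ram.read(addr)
        inverted = ~original & ram.mask
        ram.write(addr, inverted)
        ok = ram.read(addr) == inverted   # every bit must be able to flip
        ram.write(addr, original)         # restore the initial contents
        ok = ok and ram.read(addr) == original
        if not ok:
            faulty.append(addr)
    return faulty


healthy = Ram([0x12, 0x34, 0x56, 0x78])
healthy_result = transparent_test(healthy)
broken = Ram([0x12, 0x34, 0x56, 0x78], stuck_at={2: (0, 1)})
broken_result = transparent_test(broken)
```

Because the test data are derived from whatever the memory already holds, a test of this kind can be run periodically in the field without saving and reloading the memory image.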

17.
This paper proposes a code placement problem, its ILP formulation, and a heuristic algorithm for reducing the total energy consumption of embedded processor systems including a CPU core, on-chip and off-chip memories. Our approach exploits a non-cacheable memory region for an effective use of a cache memory and as a result, reduces the number of off-chip accesses. Our algorithm simultaneously finds a code layout for a cacheable region, a scratchpad region, and the other non-cacheable region of the address space so as to minimize the total energy consumption of the processor system. Experiments using a commercial embedded processor and an off-chip SDRAM demonstrate that our algorithm reduces the energy consumption of the processor system by 23% without any performance degradation compared to the best result achieved by the conventional approach.

18.
19.
Design and implementation of a distributed evolutionary computing software   Total citations: 3, self-citations: 0, citations by others: 3
Although the evolutionary algorithm is a powerful optimization tool, its computation cost in terms of time and hardware resources increases as the size or complexity of the problem increases. One promising approach to overcoming this limitation is to exploit the inherent parallelism of evolutionary algorithms by creating the infrastructure necessary to support distributed evolutionary computing using existing Internet and hardware resources. This paper presents a Java-based distributed evolutionary computing software (Paladin-DEC), which enhances the concurrent processing and performance of evolutionary algorithms by allowing inter-communication of subpopulations among various computers over the Internet. Such a distributed system enables individuals to migrate periodically among multiple subpopulations according to some patterns to induce diversity of elite individuals, in a way that simulates how species evolve in a natural environment. The Paladin-DEC software is capable of keeping data integrity throughout the computation and incorporates robustness, security, fault tolerance, and work balancing. The effectiveness and advantages of Paladin-DEC are illustrated in two case studies: drug scheduling in cancer chemotherapy and searching probe sets of the yeast genome.
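
The migration step between subpopulations can be sketched in a few lines. The abstract only says individuals migrate "according to some patterns", so the ring topology, the best-replaces-worst policy, and the function name `migrate_ring` below are assumptions made for illustration; the sketch is in Python rather than the Java used by Paladin-DEC, and it leaves out networking entirely.

```python
def migrate_ring(islands, fitness, n_migrants=1):
    """One synchronous migration step over a ring of subpopulations.

    Each island sends copies of its best `n_migrants` individuals to the
    next island in the ring, where they replace the worst individuals.
    Ring order and replacement policy are illustrative assumptions.
    """
    # Choose all emigrants first so that the step is synchronous.
    emigrants = [
        sorted(pop, key=fitness, reverse=True)[:n_migrants] for pop in islands
    ]
    migrated = []
    for i, pop in enumerate(islands):
        incoming = emigrants[(i - 1) % len(islands)]
        keep = len(pop) - n_migrants
        survivors = sorted(pop, key=fitness, reverse=True)[:keep]
        migrated.append(survivors + incoming)
    return migrated


# Individuals are plain numbers and fitness is the value itself.
islands = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
after = migrate_ring(islands, fitness=lambda x: x)
```

In a distributed setting each island would run on a different machine and exchange emigrants over the network between generations; island sizes stay constant because every arrival displaces one resident.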

20.
Reconfigurability of processor arrays is important for two reasons: (1) to efficiently execute different algorithms and (2) to isolate faulty processors. An array processor that is reconfigurable by the user any number of times to yield a different topology or to isolate faults is envisaged in this paper. The system has a host or controller that broadcasts a command to the interconnect to configure itself in a particular fashion. The interconnect uses static-RAM programming technology and can be programmed to different configurations by sending a different set of bits to the configuration random access memory (RAM) in the interconnect. We present three designs reconfigurable into array, ring, mesh, or Illiac mesh topologies. The first design provides no redundancy or fault tolerance. The second design is capable of graceful degradation by bypassing faulty elements. The third design is capable of graceful degradation by rerouting. The details of the interconnect and the configuration RAM contents for typical configurations are illustrated. It is seen that a reconfigurable interconnect results in a highly reconfigurable or polymorphic computer.
