Similar Literature
1.
This paper proposes an error detection architecture for fast discrete cosine transform (FCT) circuits. To achieve 100% fault coverage, the FCT is implemented as a butterfly structure based on B.G. Lee's algorithm for the type-III discrete cosine transform.

2.
This paper presents the implementation of the coarse-grained reconfigurable architecture (CGRA) DART with on-line error detection intended to increase fault tolerance. Most parts of the data paths and of the local memory of DART are protected using a residue code modulo 3, whereas only the logic unit is protected using duplication with comparison. These low-cost hardware techniques make it possible to tolerate temporary faults (including so-called soft errors caused by radiation), provided that some technique based on re-execution of the last operation is used. Synthesis results obtained for a 90 nm CMOS technology confirm significant hardware and power-consumption savings of the proposed approach over commonly used duplication with comparison. Introducing one extra pipeline stage in the self-checking version of the basic arithmetic blocks significantly reduces the delay overhead compared to our previous design.
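A minimal sketch of the residue-mod-3 checking idea described for DART's data paths; the function names and the adder example are illustrative, not the paper's design. Each operand carries its residue modulo 3, and the result's residue must match the residue predicted from the operands, otherwise a transient fault is flagged:

```python
def residue3(x: int) -> int:
    """Residue of x modulo 3 (in hardware, a small tree of mod-3 adders)."""
    return x % 3

def checked_add(a: int, b: int) -> int:
    result = a + b                                  # main data path
    predicted = (residue3(a) + residue3(b)) % 3     # cheap check path
    if residue3(result) != predicted:
        # A DART-like design would re-execute the last operation here.
        raise RuntimeError("residue check failed: transient fault suspected")
    return result

print(checked_add(1234, 5678))
```

Because a single bit flip changes a value by ±2^k, and 2^k mod 3 is never 0, any single-bit error in the result is caught, at far lower cost than full duplication.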

3.
Face detection is a widely studied topic in computer vision, and recent advances in algorithms, low-cost processing, and CMOS imagers make it practical for embedded consumer applications. As with graphics, the best cost-performance ratio is achieved with dedicated hardware. In this paper, we design an embedded face detection system for handheld digital cameras or camera phones. The challenges of face detection in embedded environments include an efficient pipeline design, bandwidth constraints set by low-cost memory, the need to exploit parallelism, and the efficient use of available hardware resources. In addition, consumer applications require reliability, which calls for a hard real-time approach to guarantee that processing deadlines are met. Specifically, the main contributions of the paper include: (1) incorporation of a Genetic Algorithm into the AdaBoost training to optimize the detection performance for a given number of Haar features; (2) a complexity control scheme to meet hard real-time deadlines; (3) a hardware pipeline design for Haar-like feature calculation and a system design exploiting several levels of parallelism. The proposed architecture is verified by synthesis to Altera's low-cost Cyclone II FPGA. Simulation results show the system can achieve about a 75-80% detection rate for group portraits.

4.
Reducing Soft Errors through Operand Width Aware Policies
Soft errors are an important challenge in contemporary microprocessors. Particle hits on the components of a processor are expected to create an increasing number of transient errors with each new microprocessor generation. In this paper, we propose simple mechanisms that effectively reduce a processor's vulnerability to soft errors. Our designs are motivated by the fact that many of the values produced and consumed in processors are narrow, and their upper-order bits are meaningless. Soft errors caused by a particle strike to these higher-order bits can be avoided simply by identifying these narrow values. Alternatively, soft errors on narrow values can be detected or corrected by replicating the vulnerable portion of the value inside the storage space provided for the upper-order bits of these operands. As a faster but less fault-tolerant alternative to ECC and parity, we offer a variety of schemes that make use of narrow values and analyze their efficiency in reducing the soft-error vulnerability of different data-holding components of a processor. On average, techniques that exploit the narrowness of values can provide 49 percent error detection, 45 percent error correction, or 27 percent error avoidance coverage for single-bit upsets in the first-level data cache across all Spec2K benchmarks. In other structures, such as the immediate field of the issue queue, an average error detection rate of 64 percent is achieved.
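The replication scheme the abstract describes can be sketched in a few lines; the 16-bit width, the encoding layout, and the function names below are assumptions for illustration, not the paper's exact design. Duplicating a narrow value into its unused upper bits gives detection; with room for three copies, bitwise majority voting gives correction:

```python
MASK16 = 0xFFFF

def is_narrow(value: int) -> bool:
    """True if a 32-bit operand only uses its low 16 bits."""
    return value & ~MASK16 == 0

def encode_detect(value: int) -> int:
    """Duplicate a narrow value into the upper half for error detection."""
    assert is_narrow(value)
    return (value << 16) | value

def read_detect(word: int) -> int:
    low, high = word & MASK16, (word >> 16) & MASK16
    if low != high:
        raise RuntimeError("single bit upset detected in narrow operand")
    return low

def vote3(a: int, b: int, c: int) -> int:
    """Bitwise majority over three replicas: corrects any single bit flip."""
    return (a & b) | (a & c) | (b & c)

word = encode_detect(0x1234)
word ^= 1 << 20              # inject a bit flip into the replica
try:
    read_detect(word)
except RuntimeError as e:
    print(e)
assert vote3(0x12, 0x12, 0x92) == 0x12   # single corrupted copy outvoted
```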

5.
General-purpose computing on graphics processing units (GPGPU) is becoming increasingly popular due to its high computational throughput for data-parallel applications. Modern GPU architectures have limited capability for error detection and fault tolerance, since they were originally designed for graphics processing. However, rigorous execution correctness is required for general-purpose applications, which makes reliability a growing concern in GPGPU architecture design. With CMOS processing technologies continuously scaling down to the nanoscale, the on-chip soft error rate (SER) is predicted to increase exponentially. GPGPUs with hundreds of cores integrated into a single chip are prone to a high SER. This paper takes a first step toward modeling and characterizing GPGPU reliability in light of soft errors. We develop GPGPU-SODA (GPGPU SOftware Dependability Analysis), a framework to estimate the soft-error vulnerability of the GPGPU microarchitecture. Using GPGPU-SODA, we observe that several microarchitectural structures in GPGPUs exhibit high soft-error susceptibility, and that structure vulnerability is sensitive to workload characteristics (e.g., branch divergence and memory access patterns). We further investigate the impact of several architectural optimizations on GPU soft-error robustness. For example, we find that increasing the number of threads supported by the GPU significantly affects GPGPU soft-error robustness, whereas changing the warp scheduling policy has little impact on structure vulnerability. The observations made in this study provide designers with useful guidance for building resilient GPGPUs: a comprehensive resiliency solution for a GPGPU should consider the entire design instead of focusing solely on a particular structure.

6.
Software-implemented soft-error tolerance requires no hardware overhead and is therefore considered an efficient approach; implementing it dynamically covers more types of programs and hence more soft errors. This paper analyzes the logical effects of hardware soft errors on a program's code and data during execution, and establishes a model for evaluating the reliability of program execution under hardware soft errors. This work provides a theoretical basis for dynamic software-based soft-error tolerance algorithms and also offers a method for evaluating program reliability. We classify hardware components according to how architecture-level hardware affects instruction execution, and analyze the logical effects of different hardware components on program code and data. Based on this model of how soft errors affect program code and data, a reliability evaluation model for program execution under soft errors is established. Finally, both the program impact model and the reliability evaluation model are validated experimentally, and the results confirm the analysis and evaluation presented in this paper.

7.
Algorithm-based fault tolerance techniques have been proposed to obtain reliable results at very low hardware overhead. Even though 100% fault coverage can theoretically be obtained using these techniques, the system performance, i.e., fault coverage and throughput, can be drastically reduced by practical problems such as round-off errors. A novel algorithm-based fault tolerance scheme is proposed for fast Fourier transform (FFT) networks. It is shown that the proposed scheme theoretically achieves 100% fault coverage. An accurate measure of the fault coverage for FFT networks is provided by taking round-off error into account. The proposed scheme is shown to provide concurrent error detection capability to FFT networks with low hardware overhead, high throughput, and high fault coverage.
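The checksum flavor of algorithm-based fault tolerance for FFTs can be illustrated with a standard linear invariant of the DFT (for a length-N forward transform, the sum of all outputs equals N times the first input). The tolerance threshold below is an illustrative stand-in for the paper's round-off-aware coverage analysis, not its exact measure:

```python
import numpy as np

def checked_fft(x: np.ndarray, rel_tol: float = 1e-9) -> np.ndarray:
    X = np.fft.fft(x)
    # Invariant of the unnormalized DFT: sum_k X[k] == N * x[0].
    discrepancy = abs(X.sum() - len(x) * x[0])
    scale = max(np.abs(X).sum(), 1.0)        # normalize for signal energy
    if discrepancy > rel_tol * scale:
        # Round-off stays far below the threshold; a real fault does not.
        raise RuntimeError("FFT checksum violated: fault suspected")
    return X

x = np.random.randn(1024)
X = checked_fft(x)      # passes: only floating-point round-off is present
```

The threshold is the crux: set too tight, round-off is misreported as a fault; set too loose, small corruptions escape, which is exactly the coverage trade-off the abstract quantifies.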

8.
As process technology advances, microprocessors face an increasingly severe threat from soft errors. This paper proposes two soft-error-tolerant execution models for chip multiprocessors: a dual-core redundant execution model (DCR) and a triple-core redundant execution model (TCR). DCR runs two copies of the same thread on two redundant cores separated by a fixed time interval, and store instructions commit only after their results have been compared. Each core is augmented with a hardware-implemented context save and restore mechanism to enable recovery from soft errors. The chosen checkpoint locations help hide the latency of context saving, and a dedicated mechanism guarantees load-value consistency between recovery execution and the original execution. TCR masks soft errors by running the same thread on three different cores; after detecting a soft error, TCR can be dynamically reconfigured to isolate the core corrupted by the error. Experimental results show that, compared with the conventional soft-error recovery execution model CRTR, DCR and TCR reduce the inter-core communication bandwidth requirement by 57.5% and 54.2%, respectively. When a soft error is detected, DCR's recovery incurs a 5.2% performance overhead, while TCR's reconfiguration incurs only 1.3%. Fault injection experiments show that DCR recovers from 99.69% of soft errors, while TCR completely masks single event upset (SEU) faults.
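A speculative software model of DCR's compare-before-commit rule: each step is executed by two copies, their buffered stores are compared, and only matching stores commit; a mismatch restores the last saved context and re-executes. The `step_fn`, `state`, and dict-based store buffer are illustrative stand-ins for the paper's hardware mechanisms:

```python
import copy

def dcr_execute(step_fn, state: dict, n_steps: int) -> dict:
    checkpoint = copy.deepcopy(state)              # hardware context save
    done = 0
    while done < n_steps:
        stores_a = step_fn(copy.deepcopy(state))   # leading core's stores
        stores_b = step_fn(copy.deepcopy(state))   # trailing core's stores
        if stores_a != stores_b:                   # soft error in one copy
            state = copy.deepcopy(checkpoint)      # context restore
            continue                               # re-execute the step
        state.update(stores_a)                     # compared: safe to commit
        checkpoint = copy.deepcopy(state)          # next recovery point
        done += 1
    return state
```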

9.
The growing complexity of new features in multicore processors puts significant pressure on functional verification. Although a large amount of time and effort is spent on verification, functional design bugs escape into products and can cause catastrophic effects; hence, online design bug detection is needed to catch functional bugs in the field. In this work, we propose a novel approach that leverages Performance Monitoring Counters (PMCs) and machine learning to detect and locate pipeline bugs in a processor. We establish the correlation between PMC events and pipeline bugs in order to extract features for building and training machine learning models. We design and implement a synthetic bug injection framework to obtain datasets for our simulation. To evaluate the proposal, the Multi2Sim simulator is used to simulate an x86 architecture model, and an x86 fault model is developed to synthetically inject bugs into x86 pipeline stages. PMC event values are collected by executing the SPEC CPU2006 and MiBench benchmarks under both bug and no-bug scenarios in the x86 simulator. The training data obtained through simulation is used to build a Bug Detection Model (BDM) that detects a pipeline bug and a Bug Location Model (BLM) that locates the pipeline unit where the bug occurred. Simulation results show that the BDM and BLM achieve accuracies of 97.3% and 91.6% using a decision tree and a random forest, respectively. Compared against other state-of-the-art approaches, our solution locates the pipeline unit where the bug occurred with high accuracy and without additional hardware.
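A minimal sketch of the detection flow, assuming scikit-learn as a stand-in toolchain: PMC event counts from bug-free and bug-injected runs become feature vectors for a classifier. The event names and the synthetic Gaussian data below are illustrative; the paper trains on counts gathered from Multi2Sim runs of SPEC CPU2006 and MiBench:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Columns (hypothetical events): branch_mispredicts, ipc, l1d_misses, rob_stalls
clean = rng.normal([100, 1.5, 40, 10], [10, 0.1, 5, 2], size=(500, 4))
buggy = rng.normal([160, 0.9, 70, 30], [15, 0.2, 8, 5], size=(500, 4))
X = np.vstack([clean, buggy])
y = np.array([0] * 500 + [1] * 500)      # 0 = no bug, 1 = pipeline bug

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
bdm = DecisionTreeClassifier(max_depth=5).fit(X_tr, y_tr)  # Bug Detection Model
print(f"BDM accuracy: {bdm.score(X_te, y_te):.3f}")
```

A Bug Location Model would follow the same pattern with per-pipeline-unit labels and, per the abstract, a random forest classifier.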

10.
Devices built around modern microprocessors have drawn increased attention to transient soft errors. Previous strategies for instruction-level temporal redundancy in superscalar out-of-order processors incur up to 45% performance degradation in certain applications compared to normal execution, because the redundant workload slows down normal execution. Solutions have been proposed to avoid some redundant execution by reusing the results of previously executed instructions, but limitations on instruction-level parallelism and pipeline throughput remain. In this paper, we propose a novel technique to close the performance gap between instruction-level temporal redundancy and normal execution. We present a set of microarchitectural extensions that implement reliability prediction, integrate it with the issue logic of a dual-instruction-stream superscalar core, and conduct extensive evaluations to demonstrate how it solves the performance problem. Experiments show that on average it regains nearly 71.13% of the overall IPC loss caused by redundant execution, and that it remains performance- and power-efficient even under a high transient error rate.

11.
Triple Modular Redundancy (TMR) is a widely used fault-tolerance methodology for highly reliable electronic systems mapped onto SRAM-based FPGAs. However, state-of-the-art TMR techniques are unable to deal effectively with cross-domain errors and with the increased scrubbing time caused by the growing size of configuration memory. To address these problems, this work proposes a TMR architecture that exploits the fracturable nature of Look-Up Tables to simultaneously map majority voting and error detection at the granularity of TMR domains. An associated CAD flow is developed for partial reconfiguration of TMR domains, incorporating changes to the technology mapping, placement, and bitstream generation phases. Our results demonstrate a significant reduction in repair times along with better resilience to cross-domain errors, with zero hardware overhead compared to existing TMR methodologies.
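A minimal software sketch of the majority voting at the heart of any TMR scheme, not the paper's LUT mapping: three copies of a signal are combined bit by bit so that a fault corrupting any single domain is outvoted, and a disagreement flag can trigger repair or scrubbing:

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority: maj(a, b, c) = ab | ac | bc."""
    return (a & b) | (a & c) | (b & c)

def tmr_error_flags(a: int, b: int, c: int) -> int:
    """Nonzero iff the domains disagree somewhere (cross-domain check)."""
    return (a ^ b) | (a ^ c)

assert tmr_vote(0b1010, 0b1010, 0b0011) == 0b1010  # faulty domain masked
assert tmr_error_flags(0b1010, 0b1010, 0b1010) == 0
```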

12.
Increased feature scaling to achieve high performance in miniaturized circuits has raised reliability concerns, as smaller circuits age faster. This means that more computational errors due to defects are expected in modern nanoscale circuits. Logic implication checking is a concurrent error detection technique that can detect some of these errors at reduced hardware cost. However, implication-based error detection suffers from low error coverage in FPGA-implemented circuits, making it useless for practical purposes. In this paper, we identify the reasons for the degraded performance of implication checking in FPGAs and propose multi-wire implications to achieve better error detection probabilities (Pdetection). Adding multi-wire implications boosts the number of candidate implications and contributes more valuable ones, increasing the average Pdetection by almost 1.7 times, to around 65.7%, with only a 25% increase in average area overhead for the given test circuits. Moreover, we show that the efficiency of implications in detecting errors not only varies from one circuit to another but also depends largely on the specific implementation of the circuit under test, as supported by analytical studies and by comparisons between experimental results from hardware fault injection on the implemented circuits and fault simulations on the corresponding circuit netlists.

13.
Cryptographic primitives are extensively used in today's applications to provide the desired security. Malicious or accidental faults in hardware implementations of cryptographic primitives, specifically in this paper the Advanced Encryption Standard (AES), can produce erroneous outputs from the encryption/decryption process and reduce the reliability of the cryptographic hardware. A suitable fault-tolerant scheme for AES, one that recovers it from failures or attacks and brings it back to an operational state, is crucial for reliability and consequently for security. In this paper, two novel online fault-tolerant schemes are proposed for AES. In the proposed fault-tolerant architecture, the round path is modified and divided into two pipeline stages. The proposed schemes are based on a combination of hardware and time redundancy: a new hardware redundancy is proposed for the AES round function, and a time redundancy for the hardware of the AES key expansion unit. The presented schemes are valid for all versions of AES and are independent of how its S-box is implemented. Both ASIC and FPGA implementations of the original and the proposed fault-tolerant AES, along with full TMR (Triple Modular Redundancy) and full TTR (Triple Time Redundancy) structures, are reported as traditional fault-tolerant baselines. It is shown that the first proposed architecture, named TMRrp&TTRke32, outperforms these approaches and a previous report in the literature in terms of area overhead and therefore power consumption. The other approach, named TMRrp&TTRke64, achieves a better trade-off between area overhead and throughput overhead than the other approaches.

14.
Soft errors are becoming a prominent problem for massively parallel scientific applications. Dual modular redundancy (DMR) can provide approximately 100% error coverage, but it suffers from excessive overhead. Stencil kernels are among the most important routines in the context of structured grids. In this paper, we propose Grid Sampling DMR (GS-DMR), a low-overhead soft error detection scheme for stencil-based computation. Instead of comparing the whole set of results as in traditional DMR, GS-DMR compares only a subset of the results obtained by sampling the grid data, based on the error propagation pattern on the grid. We also design a fault-tolerance (FT) framework combining GS-DMR with checkpoint technology, and provide a theoretical analysis and an algorithm for the optimal FT parameters. Experimental results on the Tianhe-2 supercomputer demonstrate that GS-DMR achieves good fault tolerance for stencil-based computation, with the benefit growing for massively parallel applications: the total FT overhead is reduced by up to 51%.
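A minimal sketch of the GS-DMR idea under assumed parameters (the 4-neighbor stencil and sampling stride are illustrative, not the paper's settings): after a redundant stencil step, only sampled grid points are compared. Because a stencil spreads an error to its neighbors step by step, a sufficiently dense sample still intercepts propagating errors at much lower comparison cost:

```python
import numpy as np

def stencil_step(grid: np.ndarray) -> np.ndarray:
    """4-neighbor averaging stencil on the interior of a 2-D grid."""
    out = grid.copy()
    out[1:-1, 1:-1] = 0.25 * (grid[:-2, 1:-1] + grid[2:, 1:-1] +
                              grid[1:-1, :-2] + grid[1:-1, 2:])
    return out

def gs_dmr_step(grid: np.ndarray, stride: int = 4) -> np.ndarray:
    a = stencil_step(grid)            # primary execution
    b = stencil_step(grid)            # redundant execution
    # Compare only every `stride`-th point in each dimension.
    if not np.array_equal(a[::stride, ::stride], b[::stride, ::stride]):
        raise RuntimeError("sampled comparison mismatch: soft error detected")
    return a

grid = np.random.rand(64, 64)
grid = gs_dmr_step(grid)
```

An error missed by one sampled comparison keeps spreading, so a later step's sample (or the checkpoint framework) catches it, which is the propagation-pattern argument the abstract relies on.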

15.
As MOS device sizes continue to shrink, ever lower charges, such as those carried by single ionizing particles of naturally occurring radiation, are sufficient to upset the functioning of complex modern microprocessors. To handle these inevitable errors, designs should include fault-tolerant features so that processors can continue to perform correctly despite the occurrence of errors. The main goal of this work is to develop architectural mechanisms that protect processors against such radiation-induced transient faults. From a program execution perspective, many faults manifest themselves as control flow errors that cause the processor to violate the correct sequencing of instructions. We first present a basic compile-time signature assignment algorithm and describe a novel approach that improves the fault detection coverage of the basic algorithm. Moreover, to allow the processor to efficiently check the run-time sequence and detect control flow errors, we introduce an on-chip assigned-signature checker capable of executing three additional instructions (SIC, SIJ, SIJC). Second, since the very concept of simultaneous multithreading (SMT) provides the necessary redundancy, proposals have been made to run two copies of the same thread on SMT platforms in order to detect and correct soft errors. Upon detection of an error, this allows the processor state to be rolled back to a known safe point and the instructions retried, effecting a completely error-free execution. This paper focuses on two crucial implementation issues introduced by this scheme: (1) the design trade-off between fault detection coverage and design cost; (2) the possible occurrence of deadlock situations.
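A sketch of assigned-signature control-flow checking in general, assuming an illustrative block graph: each basic block gets a compile-time signature plus the set of signatures it may legally transfer to, and a runtime checker (in the paper, the on-chip unit driven by the SIC/SIJ/SIJC instructions) flags any transition outside that set:

```python
CFG = {                      # block signature -> legal successor signatures
    0xA1: {0xB2, 0xC3},
    0xB2: {0xD4},
    0xC3: {0xD4},
    0xD4: set(),
}

class SignatureChecker:
    def __init__(self, entry: int):
        self.current = entry

    def transfer(self, next_sig: int) -> None:
        if next_sig not in CFG[self.current]:
            raise RuntimeError(
                f"control flow error: {self.current:#x} -> {next_sig:#x}")
        self.current = next_sig

chk = SignatureChecker(0xA1)
chk.transfer(0xB2)           # legal edge
chk.transfer(0xD4)           # legal edge
# chk.transfer(0xC3)         # would raise: violates the assigned CFG
```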

16.
Few distributed software-implemented fault tolerance (SIFT) environments have been experimentally evaluated using substantial applications to show that they protect both themselves and the applications from errors. We present an experimental evaluation of a SIFT environment used to oversee spaceborne applications as part of the Remote Exploration and Experimentation (REE) program at the Jet Propulsion Laboratory. The SIFT environment is built around a set of self-checking ARMOR processes running on different machines that provide error detection and recovery services to themselves and to the REE applications. An evaluation methodology is presented in which over 28,000 errors were injected into both the SIFT processes and two representative REE applications. The experiments were split into three groups of error injections, with each group successively stressing the SIFT error detection and recovery more than the previous one. The results show that the SIFT environment added negligible overhead to the applications' execution time during failure-free runs. Correlated failures affecting both a SIFT process and an application process are possible, but the division of detection and recovery responsibilities in the SIFT environment allows it to recover from these multiple-failure scenarios. Only 28 cases were observed in which either the application failed to start or the SIFT environment failed to recognize that the application had completed. Further investigation showed that assertions within the SIFT processes, coupled with object-based incremental checkpointing, were effective in preventing system failures by protecting dynamic data within the SIFT processes.

17.
Error Detection and Fault Tolerance in ECSM Using Input Randomization
For some applications, elliptic curve cryptography (ECC) is an attractive choice because it achieves the same level of security with a much smaller key size than schemes based on integer factorization or the discrete logarithm. For security reasons, especially to provide resistance against fault-based attacks, it is very important to verify the correctness of computations in ECC applications. In this paper, error-detecting and fault-tolerant elliptic curve cryptosystems are considered. Error detection may be a sufficient countermeasure for many security applications; fault tolerance, however, enables a system to perform its normal operation in spite of faults. For the purpose of detecting errors due to faults, a number of schemes and hardware structures are presented based on recomputation or parallel computation. It is shown that these structures can detect errors with very high probability during the computation of the elliptic curve scalar multiplication (ECSM). Additionally, we show that using parallel computation along with either PV or recomputation makes fault-tolerant structures for the ECSM possible. If certain conditions are met, these schemes are more efficient than alternatives such as the well-known triple modular redundancy. Prototypes of the proposed structures for error detection and fault tolerance have been implemented, and experimental results are presented.
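One way to realize recomputation with a randomized input, sketched here as an assumption rather than the paper's exact scheme: split the scalar k into random parts k1 + k2 and check that k1*P + k2*P equals k*P, so a fault in the multiplier is unlikely to corrupt both computations consistently. The tiny textbook curve y^2 = x^3 + 2x + 2 over F_17 is used only for illustration and is far too small for real security:

```python
import random

P_MOD, A = 17, 2             # field modulus and curve coefficient a
O = None                     # point at infinity (group identity)

def ec_add(P, Q):
    if P is O: return Q
    if Q is O: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return O
    if P == Q:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P_MOD) % P_MOD
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    return (x3, (lam * (x1 - x3) - y1) % P_MOD)

def ecsm(k, P):
    """Double-and-add scalar multiplication k*P."""
    R = O
    while k:
        if k & 1:
            R = ec_add(R, P)
        P = ec_add(P, P)
        k >>= 1
    return R

def checked_ecsm(k, P):
    k1 = random.randrange(1, k)          # randomized split of the scalar
    ref = ec_add(ecsm(k1, P), ecsm(k - k1, P))
    result = ecsm(k, P)
    if result != ref:
        raise RuntimeError("ECSM results disagree: fault suspected")
    return result

print(checked_ecsm(13, (5, 1)))          # 13 * (5, 1) on the toy curve
```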

18.
Traditional techniques that mainframes use to increase reliability, such as special hardware or custom software, are incompatible with commodity server requirements. The Total Reliability Using Scalable Servers (TRUSS) architecture, developed at Carnegie Mellon, aims to bring reliability to commodity servers. TRUSS features a distributed shared-memory (DSM) multiprocessor that incorporates computation and memory-storage redundancy to detect and recover from any single point of transient or permanent failure. Because its underlying DSM architecture presents the familiar shared-memory programming model, TRUSS requires no changes to existing applications and only minor modifications to the operating system to support error recovery.

19.
In this paper, we present techniques for providing on-demand structural redundancy for Coarse-Grained Reconfigurable Arrays (CGRAs), together with a calculus for determining the reliability gains of applying these replication techniques, from the perspective of safety-critical parallel loop applications. To protect massively parallel loop computations against errors such as soft errors, well-known replication schemes like Dual Modular Redundancy (DMR) and Triple Modular Redundancy (TMR) must be applied to each individual Processor Element (PE), based on the application's reliability requirements and Soft Error Rates (SERs). Moreover, different voting options and signal replication schemes are investigated. It is shown that hardware voting can be accomplished at negligible hardware cost, i.e., less than two percent area overhead per PE, for a class of reconfigurable processor arrays called Tightly Coupled Processor Arrays (TCPAs). As a major contribution of this paper, a formal analysis is elaborated of the reliability achievable by each combination of replication and voting scheme for parallel loop execution on CGRAs, as a function of a given SER and the application's timing characteristics (schedule). Using this analysis, error detection latencies can be computed, and sound runtime decisions about which replication scheme to choose to guarantee a maximal probability of failure on demand can be derived. Finally, fault-simulation results are provided and compared with the formal reliability analysis.

20.
Targeting the characteristics of languages with large character sets, this paper proposes a parallel hardware model for approximate flow classification based on network content. Thanks to its parallel and pipelined design, the model maintains good performance even with large rule sets and is suitable for high-speed networks. The parallel model has the following features: (1) by employing different rule combiners, it supports approximate matching under insertion, deletion, substitution, and transposition errors; (2) the degree of approximate matching can be flexibly controlled through configuration parameters; (3) it can be applied directly to network content flow classification for large-character-set languages; (4) a probabilistic model built for the Chinese-language environment analyzes the parallel hardware model's packet matching probability, showing that the model performs well in typical settings.
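The matching semantics the rule combiners implement can be illustrated in software with a restricted Damerau-Levenshtein (optimal string alignment) distance, which counts exactly the four error types listed above; the threshold parameter plays the role of the model's configurable matching degree. This sequential version is purely illustrative, whereas the paper realizes the matching as parallel, pipelined hardware:

```python
def dl_distance(s: str, t: str) -> int:
    """Edit distance counting insertion, deletion, substitution, transposition."""
    m, n = len(s), len(t)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1): d[i][0] = i
    for j in range(n + 1): d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i-1] == t[j-1] else 1
            d[i][j] = min(d[i-1][j] + 1,        # deletion
                          d[i][j-1] + 1,        # insertion
                          d[i-1][j-1] + cost)   # substitution / match
            if i > 1 and j > 1 and s[i-1] == t[j-2] and s[i-2] == t[j-1]:
                d[i][j] = min(d[i][j], d[i-2][j-2] + 1)   # transposition
    return d[m][n]

def approx_match(content: str, rule: str, max_errors: int) -> bool:
    return dl_distance(content, rule) <= max_errors

print(approx_match("网络内容", "网路内容", 1))   # one substitution: True
```

Python strings are Unicode, so the same logic applies unchanged to large-character-set text such as Chinese.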
