
1.

Stochastic testing of processing cores in a many-core architecture

《Integration, the VLSI Journal》2016

A promising solution to reliability challenges in nano-scale fabrication technologies is self-test and reconfiguration. In this direction, we propose an autonomous test mechanism for online detection of permanent faults in many-core processors. Several hardware test components are incorporated in the many-core architecture. Some of these components distribute software-based self-test routines among the processing cores and make each test routine accessible for a limited amount of time. A processing core that has an idle slot executes the test routine; otherwise it skips the routine without loss of test continuity. Several components of the proposed test architecture monitor the behavior of the processing cores during execution of test routines, detect faulty cores, and make their omission from the system possible. We propose the use of an extended form of the Petri net modeling method to model and analyze the proposed test mechanism and to tune our test architecture so that it preserves test quality while managing the overall test time. Our experimental results show that the test time and hardware overhead of the proposed test mechanism are low and its performance overhead is zero. Furthermore, the proposed test architecture can efficiently scale to a many-core system with a large number of processing cores.

2.

A fault-tolerant protocol for election in chordal-ring networks with fail-stop processor failures

Yi Pan Singh G. 《Reliability, IEEE Transactions on》1997,46(1):11-17

In a distributed computer system, a group of processors is connected by communication links into a network. Each processor (node) of the network has an identity (a unique integer value) that is not related to its position in the network (its address). A processor's identity is known only to the processor. In the problem of leader election, exactly one processor among a network of processors has to be distinguished as the leader. Previously, many efficient election protocols have been proposed for networks with a sense of direction. In particular, the sequential search is used for election in a reliable complete network, and a multi-token search method is used in a faulty complete network. However, election protocols on a faulty ChRgN (chordal ring network) have not been investigated by other researchers. This paper addresses this issue by: studying the problem of leader election in an asynchronous ChRgN with a sense of direction and with the presence of undetectable fail-stop processor failures; proposing a new election protocol which (a) combines the concept of sequential search and multi-token search techniques, and (b) uses an efficient routing algorithm to reduce the total number of messages used; presenting a protocol for a ChRgN of n processors with l chords/processor and at most f fail-stop faulty processors, with message complexity O(n+(n/l)log(n)+k·f), where k is the number of processors starting the election process spontaneously and at most f

3.

Design and performance evaluation of an adaptive FPGA for network applications

Christoforos Kachris Stephan Wong 《Microelectronics Journal》2009,40(7):1103-1110

This paper presents the design, implementation and performance evaluation of a coarse-grain dynamically reconfigurable FPGA platform for multi-service edge and access network devices. The platform consists of two MicroBlaze RISC processors and a number of hardware co-processors used for the processing of packet payloads (Data Encryption Standard (DES) and Lempel-Ziv Compression). The co-processors can be connected either directly to the processors or using a shared bus. The functionality of the co-processors is dynamically reconfigured to meet the requirements of the network workload. The system has been implemented on the Xilinx Virtex II Pro platform and the network traces from real passive measurements have been used for performance evaluation. The use of dynamically reconfigurable co-processors for network applications shows that the performance speedup versus a static version varies from 12% to 35% in the best case and from 10% to 15% on average, depending on the network traffic fluctuation.

4.

Performance analysis of fault-tolerant systems in parallel execution of conversations

Kim K.H. Heu S. Yang S.M. 《Reliability, IEEE Transactions on》1989,38(1):96-102

The execution overhead inherent in the conversation scheme, which is a scheme for realizing fault-tolerant cooperating processes free of the domino effect, is analyzed. Multiprocessor multicomputer systems capable of parallel execution of conversation components are considered and a queuing network model of such systems is adopted. Based on the queuing model, various performance indicators, including system throughput, the average number of processors idling inside a conversation due to the synchronization required, and the average time spent in the conversation, have been evaluated numerically for several application environments. The numeric results are discussed and several essential performance characteristics of the conversation scheme are derived. For example, when the number of participant processes is not large, say less than six, the system performance is highly affected by the synchronization required on the processes in a conversation, and not so much by the probability of acceptance-test failure.

5.

Design of self-diagnostic boards by signature analysis

Karpovsky M.G. Nagvajara P. 《Industrial Electronics, IEEE Transactions on》1989,36(2):241-245

The authors present a single-faulty-chip diagnostic technique which requires only two reference signatures for any number of chips on the original board. With this technique, it is possible to reduce substantially the hardware overhead compared to the diagnostic technique based on separate testing of each chip on the board. The technique can also be used for identification of faulty printed boards in a system or for identification of faulty processors in a multiprocessor system.

6.

Design of a self-reconfiguring interconnection network for fault-tolerant VLSI processor arrays

Pateras S. Rajski J. 《Reliability, IEEE Transactions on》1989,38(1):40-50

An interconnection network capable of spontaneously reconfiguring a VLSI processor array on detection of faulty processors is presented. Although the reconfiguration process is global, the network control circuitry is localized around each processor and is therefore completely modular. The structure of the control circuitry is fixed and thus independent of the array size or the number of spare processors. The network performance in yield enhancement is analyzed through Monte Carlo simulation. The network effectiveness in using surviving processors is close to that of an ideal network (one capable of tolerating as many faulty processors per row as there are spare processors per row). Strategies involved in testing the fault-tolerant array are also presented. Test circuitry is placed around each of the processors to enable testing of all the processors in parallel. The same circuitry is used to test the interconnection network efficiently. The additional silicon area requirements due to the network and the test circuitries are examined through the design of a prototype fault-tolerant array.

7.

Partitioning Processor Arrays under Resource Constraints

Jürgen Teich Lothar Thiele Lee Z. Zhang 《The Journal of VLSI Signal Processing》1997,17(1):5-20

A single integer linear programming model for optimally scheduling partitioned regular algorithms is presented. The methodology presented here differs from existing methods in the following capabilities: 1) Not only constraints on the number of available processors and communication capabilities are taken into account, but also local memories and constraints on the size of available memories. 2) Different types of processors can be handled. 3) The size of the optimization model (number of integer variables) is independent of the size of the tiles to be executed. Hence, 4) the number of integer variables in the optimization model is greatly reduced such that problems of relevant size can be solved in practical execution time.

8.

Guest Editors' Introduction

Parhi Keshab K. Nishitani Takao Yamauchi Hironori 《Journal of Signal Processing Systems》1997,16(1):5-7


9.

Distributed diagnosis for homogeneous systems

Yoon-Hwa Choi 《Journal of Electronic Testing》1990,1(3):201-211

This article presents a distributed fault-diagnosis algorithm for identifying faulty and fault-free units (processors, PEs, cells) in homogeneous systems. It is based on local comparison among units in a system and dissemination of the test results. Each unit performs comparison with its neighbors by using its own comparator. Unlike other approaches, the algorithm does not assume that diagnostic circuits are fault free. The algorithm is simple enough to be realized with small circuit overhead. The results are especially useful in locating faulty units in processor arrays implemented on a single chip or wafer. Computer simulation has shown that even for low unit yields, extremely high performance (fault coverage) can be obtained by adjusting algorithm parameters.

10.

Mitigating Memory Wall Effects in High-Clock-Rate and Multicore CMOS 3-D Processor Memory Stacks

《Proceedings of the IEEE. Institute of Electrical and Electronics Engineers》2009,97(1):108-122

Three-dimensional (3-D) chip stacking technology provides a new approach to address the so-called memory wall problem. Memory processor chip stacking reduces this memory wall problem, permitting faster clock rates (with suitable processor logic) or permitting multicore access to shared memory using a large number of vertical vias between tiers in the stack, for ultrawide bit path transfer of data and address information to and from various levels of cache. Although a limited amount of parallel access is possible using conventional two-dimensional (2-D) chip memory-processor approaches, 3-D memory-processor stacking greatly extends this to much larger capacity memories. We evaluate high-clock-rate processors as well as shared memory processors with a large number of cores. Various architectural design options to reduce the impact of the memory wall on the processor performance are explored and validated through simulations. Certain architectural features can be implemented in a 3-D chip, such as an ultrawide, ultrashort vertical bus with low parasitic resistance and the elimination of conventional electrostatic discharge and packaging parasitics required in multiple package 2-D solutions. The objective is to reduce the clocks per instruction figure of merit for high clock speeds in order to deliver significant performance levels. High-clock-rate processors can be designed with SiGe heterostructure bipolar transistors to obtain processors operating on the order of 16 or 32 GHz.

11.

Eventually Byzantine Agreement on CDS-based mobile ad hoc network

Mao-Lun Chiang 《Ad hoc Networks》2012,10(3):388-400

Reliability is an important research topic in the study of distributed systems. Under many circumstances, the healthy processors in a distributed system need to reach a common agreement before performing some special tasks, even if faults exist. In order to achieve fault tolerance in distributed systems, one must deal with the Byzantine Agreement (BA) problem. Most BA protocols require all the healthy processors to reach agreement in the same round; this kind of agreement is called an Immediate Byzantine Agreement (IBA). Another kind of agreement, Eventual Byzantine Agreement (EBA), allows its participants to reach a common agreement in different rounds when f_act < f_p (f_act is the number of actual arbitrary faulty processors; f_p is the number of tolerable arbitrary faulty processors). However, the traditional EBA problem is solved in well-defined networks, while Mobile Ad hoc NETworks (MANETs) are increasing in popularity. Therefore, the EBA problem is revisited under a dual failure mode (processors and transmission media) in the MANET. The proposed protocol, Early Dual Agreement Protocol (EDAP), can achieve agreement while tolerating the maximum number of faulty processors and transmission media in a MANET by using the minimum number of message exchanges. Furthermore, our protocol can manage and organize the network efficiently even if the processors move around the network.

12.

Synthesis of custom interleaved memory systems

Song Chen Postula A. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2000,8(1):74-83

This paper presents a novel approach to the synthesis of interleaved memory systems that is especially suited for application-specific processors. Our synthesis system generates the optimized interleaved memories for a specific algorithm and finds the best mapping of arrays in that algorithm onto the memory system to achieve high performance. The design space is four-dimensional (4-D) and comprises the number of memory banks, the type of memory components, the storage scheme, and the range of clock period in the system. Optimal designs are found among the Pareto points (a set of nondominated points in the design space) computed for our memory model under the performance and cost criteria set by the designer. The memory model includes all the components of an interleaved memory system and covers a lookup table-based address generation with data alignment. The synthesis is based on a general periodic storage scheme, which enables efficient handling of irregular and overlapped access patterns. The synthesis process is an exhaustive search of the heavily pruned design space, and the pruning is based on mathematically proven properties of periodic storage schemes. This paper presents the theorems, the synthesis algorithm, and the methods of effective word and bank address generation. Examples are given to illustrate the effectiveness of our method.
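
The notion of Pareto points used in the abstract above can be illustrated with a small sketch. This is not the paper's synthesis algorithm; the two criteria (cost and execution time, both minimized) and the candidate values are hypothetical and chosen only to show how nondominated designs are selected.

```python
from typing import List, Tuple

# Hypothetical design record: (cost, execution_time); smaller is better for both.
Design = Tuple[float, float]


def dominates(a: Design, b: Design) -> bool:
    """True if a is no worse than b in every criterion and strictly better in one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_points(designs: List[Design]) -> List[Design]:
    """Keep only the designs that no other design dominates."""
    return [d for d in designs if not any(dominates(o, d) for o in designs)]


# Made-up candidates, e.g. (number of memory banks as a cost proxy, run time).
candidates = [(4, 10.0), (8, 6.0), (8, 7.5), (16, 5.5), (16, 6.0)]
front = pareto_points(candidates)
print(front)
```

A designer would then pick one point from `front` according to the cost and performance limits of the application.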

13.

Sequential diagnosis of processor array systems

Jun Zhao Meyer F.J. Park N. Lombardi F. 《Reliability, IEEE Transactions on》2004,53(4):487-498

We examine the diagnosis of processor array systems formed as two-dimensional arrays, with boundaries, and either four or eight neighbors for each interior processor. We employ a parallel test schedule. Neighboring processors test each other, and report the results. Our diagnostic objective is to find a fault-free processor or set of processors. The system may then be sequentially diagnosed by repairing those processors tested faulty according to the identified fault-free set, or a job may be run on the identified fault-free processors. We establish an upper bound on the maximum number of faults which can be sustained without invalidating the test results under worst case conditions. We give test schedules and diagnostic algorithms which meet the upper bound as far as the highest order term. We compare these near optimal diagnostic algorithms to alternative algorithms, both new and already in the literature, and against an upper bound ideal case algorithm, which is not necessarily practically realizable. For eight-way array systems with N processors, an ideal algorithm has diagnosability $3N^{2/3} - 2N^{1/2}$ plus lower-order terms. No algorithm exists which can exceed this. We give an algorithm which starts with tests on diagonally connected processors, and which achieves approximately this diagnosability. So the given algorithm is optimal to within the two most significant terms of the maximum diagnosability. Similarly, for four-way array systems with N processors, no algorithm can have diagnosability exceeding $3N^{2/3}/2^{1/3} - 2N^{1/2}$ plus lower-order terms. And we give an algorithm which begins with tests arranged in a zigzag pattern, one consisting of pairing nodes for tests in two different directions in two consecutive test stages; this algorithm achieves diagnosability $(3/2)(5/2)^{1/3}N^{2/3} - (5/4)N^{1/2}$ plus lower-order terms, which is about 0.85 of the upper bound due to an ideal algorithm.
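
The "about 0.85" figure in the abstract above can be checked numerically from the leading terms. The sketch below only evaluates the closed-form expressions quoted in the abstract and drops the unspecified lower-order terms; the function names are illustrative and not taken from the paper.

```python
def eight_way_upper_bound(n: float) -> float:
    """Ideal-case diagnosability for eight-way arrays (lower-order terms dropped)."""
    return 3 * n ** (2 / 3) - 2 * n ** 0.5


def four_way_upper_bound(n: float) -> float:
    """Ideal-case diagnosability for four-way arrays (lower-order terms dropped)."""
    return 3 * n ** (2 / 3) / 2 ** (1 / 3) - 2 * n ** 0.5


def four_way_zigzag(n: float) -> float:
    """Diagnosability quoted for the zigzag algorithm (lower-order terms dropped)."""
    return 1.5 * 2.5 ** (1 / 3) * n ** (2 / 3) - 1.25 * n ** 0.5


# Ratio of the N^(2/3) coefficients; algebraically this equals 5^(1/3) / 2.
leading_ratio = (1.5 * 2.5 ** (1 / 3)) / (3 / 2 ** (1 / 3))
print(round(leading_ratio, 3))

# For finite N the ratio also depends on the N^(1/2) terms.
for n in (10**4, 10**6, 10**8):
    print(n, round(four_way_zigzag(n) / four_way_upper_bound(n), 3))
```

The coefficient ratio simplifies to $5^{1/3}/2 \approx 0.855$, consistent with the abstract's statement.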

14.

Fault buffers

Tayyeb Mahmood Soontae Kim 《Design Automation for Embedded Systems》2013,17(2):411-438

Voltage scaling can be applied to cache memories to reduce their energy consumption. However, a reduced supply voltage to the cache memories increases the number of defective SRAM cells due to process variations, which decreases their yield and nullifies the benefits of voltage scaling. To mitigate this problem, we propose a fault buffer-based scheme for L1 caches. Faults are identified and isolated at the granularity of individual words in the L1 caches. Actively used faulty cache words are dynamically allocated in the fault buffers. The fault buffers are organized as multiple banks for low cost implementation and can be dynamically reconfigured to reflect varying performance demands of programs. This dynamic scheme is shown to be more energy- and area-efficient than the previously proposed static schemes, with comparable performance.

15.

Self-repairing processor modules

Kilmer W.L. 《Reliability, IEEE Transactions on》1995,44(2):327-332

A processor is any self-contained computer of at least personal-computer capability. The paper explores how much the processor mean time-to-failure can be improved by replacing it with an N-processor module, where each processor in the module consists of a copy of the original processor augmented with a communication protocol unit. The copy of the original processor is faulty with probability p_c, and the protocol unit is faulty with probability p. The asynchronous N-processor module uses a Byzantine agreement (F-ID-P) algorithm to identify which of its processors disagreed with a module consensus. The identified processors are presumed faulty, and the module replaces them with duplicates from a set of standbys. The F-ID-P algorithm is a modification of Bracha's, which guarantees that in a module of 3t+1 processors, up to t faults can be identified by at least t+1 non-faulty processors. The module fails if faults in more than t of its processors prevent it from: 1) obtaining a correct consensus, or 2) executing the algorithm. The F-ID-P algorithm departs from Bracha's by using a random instead of an adversary scheduler of message delays. Simulation showed that the F-ID-P algorithm almost always correctly identified all of a module's faulty processors if more than half of them were nonfaulty. Thus the F-ID-P algorithm was about 3/2 more fault tolerant than guaranteed. Also, compared to a single processor's mean number of decisions to failure, the F-ID-P module was 841 times better when N=37, down to 5.1 times better when N=10.
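
The "about 3/2" factor in the abstract above follows from comparing the guaranteed bound (a module of 3t+1 processors tolerates t faults) with the simulated behavior (identification succeeds while more than half of the processors are nonfaulty). The sketch below is only this arithmetic under that reading of the abstract; the helper names are hypothetical.

```python
def guaranteed_faults(n: int) -> int:
    """Largest t such that 3t + 1 <= n (the guaranteed Byzantine bound)."""
    return (n - 1) // 3


def majority_faults(n: int) -> int:
    """Largest fault count that still leaves more than half of n nonfaulty."""
    return (n - 1) // 2


# Module sizes mentioned in the abstract.
for n in (10, 37):
    t = guaranteed_faults(n)
    m = majority_faults(n)
    print(f"N={n}: guaranteed t={t}, majority bound={m}, ratio={m / t:.2f}")
```

For N=37 the two bounds are 12 and 18 faults, a ratio of exactly 3/2; for small modules such as N=10 integer rounding makes the ratio smaller (3 versus 4 faults).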

16.

Optimal design of fault-tolerant distributed systems based on a recursive algorithm

Pham H. Upadhyaya S.J. 《Reliability, IEEE Transactions on》1991,40(3):375-379

The authors address the issue of optimal design (in terms of the number of processors) of a distributed system which is based on a recursive algorithm for fault tolerance (RAFT). The reliability and performance of the system using RAFT are determined as a function of the reliability of individual processors and the number of fault modes in a processor. Also discussed is how to determine the design policies when the objective is to minimize the average system failure. Several numerical examples illustrate the results.

17.

Scalability analysis in gracefully-degradable large systems

Najjar W.A. Gaudiot J.-L. 《Reliability, IEEE Transactions on》1991,40(2):189-197

The scalability of large degradable homogeneous multiprocessors is analyzed. The objective is to assess the limitations, imposed by reliability considerations, on the number of processors. The analysis of the mean-time-to-failure and the mission-time shows that, for a given value of the coverage factor, there exists a value of the number of processors at which these measures are maximal. As the system size is increased beyond this value, the reliability of the system becomes a rapidly decreasing function of the number of processors. For computations with linear speed-up, the amount of reliable computational work is constant for large system-sizes. When the speed-up is not linear, this amount is a decreasing function of the number of processors. Therefore, for large system-sizes and same technology, increasing the number of processors results in a decrease of the average amount of reliable computational work the system can deliver. Graceful degradation in large fault-tolerant systems is not scalable.

18.

Two families of optimal identifying codes in binary Hamming spaces

Ranto S.M. Honkala I.S. Laihonen T.K. 《IEEE transactions on information theory / Professional Technical Group on Information Theory》2002,48(5):1200-1203

A motivation for identifying codes comes from quality control in multiprocessor systems: with the aid of these codes, we are able to find faulty processors in such a system. We give a construction of two infinite families of optimal codes, which identify up to two malfunctioning processors in Hamming spaces.
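
The abstract above does not define identifying codes, so the sketch below relies on the usual definition from the literature, stated here as an assumption: a code C in the binary Hamming space of length n identifies up to l faulty vertices (radius 1) if the sets of codewords within distance 1 of some member of X are nonempty and pairwise distinct over all nonempty sets X of at most l vertices. The brute-force checker is illustrative only and is not the paper's construction.

```python
from itertools import combinations, product
from typing import FrozenSet, Iterable, Set, Tuple

Word = Tuple[int, ...]


def hamming(a: Word, b: Word) -> int:
    """Number of positions in which two binary words differ."""
    return sum(x != y for x, y in zip(a, b))


def identifying_set(code: Set[Word], faulty: Iterable[Word]) -> FrozenSet[Word]:
    """Codewords within distance 1 of at least one faulty vertex."""
    faulty = list(faulty)
    return frozenset(c for c in code if any(hamming(c, x) <= 1 for x in faulty))


def is_identifying(code: Set[Word], n: int, max_faulty: int = 1) -> bool:
    """Brute-force check of the assumed definition; exponential in n."""
    space = list(product((0, 1), repeat=n))
    seen = set()
    for size in range(1, max_faulty + 1):
        for faulty in combinations(space, size):
            ident = identifying_set(code, faulty)
            if not ident or ident in seen:
                return False  # empty or ambiguous syndrome
            seen.add(ident)
    return True


full_space = set(product((0, 1), repeat=3))
print(is_identifying(full_space, 3, max_faulty=1))
print(is_identifying(full_space, 3, max_faulty=2))
print(is_identifying({(0, 0, 0)}, 3, max_faulty=1))
```

In the multiprocessor reading, each codeword is a processor that reports whether any processor in its neighborhood is faulty; the code is identifying when the pattern of reports determines the faulty set uniquely. In length 3, even the full space cannot separate the fault sets {000, 111} and {001, 110}, since both are seen by every vertex.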

19.

Fault-tolerant VLSI sorters

P. J. Varman I. V. Ramakrishnan D. S. Fussell 《Circuits, Systems, and Signal Processing》1987,6(2):153-174

Parallel sorting algorithms have been proposed for VLSI implementation. Random defects in the silicon wafer and fabrication errors render processors in the wafer faulty, and may cause these algorithms to fail despite a significant number of nonfaulty processors. This paper presents two fault-tolerant pipelined sorting algorithms that would work on a wafer comprised of faulty and nonfaulty processors. Both algorithms use O(n) processors and require O(n) time to sort n elements. P. J. Varman's research was supported by an IBM Faculty Development Award, I. V. Ramakrishnan's by the ONR Grant N00014-84-K-0530 and NSF Grant ECS-84-04399, and D. S. Fussell's by NSF Grant MCS-8104017.

20.

Programmable active memories: reconfigurable systems come of age

Vuillemin J.E. Bertin P. Roncin D. Shand M. Touati H.H. Boucard P. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》1996,4(1):56-69
