Similar literature
 20 similar documents found (search time: 41 ms)
1.
The design of a fault-tolerant rectangular array of processing elements (PEs) is presented in which the reconfiguration is done by means of on-chip distributed logic, without the help of any external host. Spare PEs are included in every column of the array, and faulty PEs are bypassed within a column to facilitate reconfiguration in the presence of faults. Scan paths are used to enhance the testability of the array. PEs are tested locally using near-neighbor comparisons without the need for an external host. Because the interconnections between logical neighbors are short, the speed penalty for reconfiguration is very small. Any amount of redundancy can be incorporated in the array without changing the topology of the scheme or the design of the reconfiguration switches. The scheme is well suited for very large-area, high-density chips and wafer-scale integration. In order to demonstrate the capabilities of the scheme and evaluate its performance, an experimental chip consisting of a 6×4 array was designed, fabricated, and tested. Details of the design and the implementation of the chip are presented. The scheme is also analyzed for yield and area utilization for a range of array sizes and PE survival probabilities.
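The column-wise bypass idea in this abstract can be illustrated with a minimal sketch. It is not the authors' on-chip logic: the function name `reconfigure_columns`, the boolean fault map, and the rule "take the first healthy PEs from the top of each column" are all assumptions made for illustration.

```python
from typing import List, Optional


def reconfigure_columns(faulty: List[List[bool]],
                        logical_rows: int) -> Optional[List[List[int]]]:
    """Map logical rows onto healthy physical PEs, column by column.

    faulty[r][c] is True when the physical PE in row r, column c is faulty.
    The physical array has more rows than logical_rows; the extra rows act
    as the per-column spares (hypothetical layout).

    Returns mapping[c][i] = physical row used for logical row i in column c,
    or None when some column has too few healthy PEs.
    """
    if not faulty:
        return []
    n_phys_rows = len(faulty)
    n_cols = len(faulty[0])
    mapping = []
    for c in range(n_cols):
        # Faulty PEs are bypassed: only healthy rows remain, in order.
        healthy = [r for r in range(n_phys_rows) if not faulty[r][c]]
        if len(healthy) < logical_rows:
            return None  # this column cannot be repaired
        mapping.append(healthy[:logical_rows])
    return mapping
```

Because each column is handled independently, adding more spare rows only lengthens the fault map; the mapping rule itself does not change, which mirrors the abstract's remark that redundancy can be increased without changing the scheme.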

2.
In this paper we present new algorithms for reconfiguring arrays of identical Processing Elements (PEs) in the presence of faults. In particular, we consider a well-studied reconfiguration model which consists of a rectangular array of PEs with spare columns of PEs on one side. In the presence of faulty PEs, reconfiguration is achieved by constructing a logical array using only the healthy non-spare and spare PEs. Note that one can always successfully reconfigure the array as long as the number of faulty PEs is no more than the number of spare PEs. The general objective, however, is to derive a logical array such that the geometric distances between logical neighbors (i.e., PEs that are connected in the reconfigured array) are kept small. This criterion is motivated by the fact that shorter interconnects reduce the communication delays among the PEs, and also lead to less routing hardware. The problem of determining a reconfiguration that minimizes the length of the longest interconnect is hard, and several researchers have presented sub-optimal algorithms that seem to have satisfactory performance. In this paper we develop a new efficient algorithm that can reconfigure any array with arbitrary patterns of faulty PEs. Furthermore, we show that our algorithm performs better than most of the other algorithms developed for similar models. This work was supported in part by the SDIO/IST U.S. Army Research Office through Contract DAAL03-90-G-0108.
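The quality criterion named in this abstract, the length of the longest interconnect between logical neighbors, can be evaluated for any candidate reconfiguration with a short sketch. The function name `longest_interconnect`, the mapping format, and the use of Manhattan distance as the "geometric distance" are assumptions, not details taken from the paper.

```python
from typing import List, Tuple


def longest_interconnect(mapping: List[List[Tuple[int, int]]]) -> int:
    """Return the longest link between logical neighbors.

    mapping[i][j] is the (row, col) physical position of the healthy PE
    that plays the role of logical PE (i, j). Distance is measured as
    Manhattan distance (an assumption for this sketch).
    """
    rows = len(mapping)
    cols = len(mapping[0]) if rows else 0
    longest = 0
    for i in range(rows):
        for j in range(cols):
            r, c = mapping[i][j]
            # Check the logical neighbor below and the one to the right;
            # this visits every logical link exactly once.
            for ni, nj in ((i + 1, j), (i, j + 1)):
                if ni < rows and nj < cols:
                    nr, nc = mapping[ni][nj]
                    longest = max(longest, abs(r - nr) + abs(c - nc))
    return longest
```

A reconfiguration algorithm of the kind described would search for the mapping that keeps this value small; the sketch only scores a mapping and does not search.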

3.
A self-pruning binary tree (SPBT) interconnection network architecture that tolerates faults in a wafer scale integration (WSI) environment is proposed. The goal of the SPBT network is to provide a reliable and quickly reconfigurable interconnection network architecture for linear WSI arrays. The proposed architecture uses a bottom-up approach to reconfigure a linear pipelined array on a potentially defective WSI array using a binary tree interconnection scheme. The binary tree is generated by successive formation of hierarchical modules. For N processing elements (PEs) on the wafer, reconfiguration time is O(log N). The propagation delay is bounded by Θ(log N) and is independent of the number of faulty PEs. Faults in the switching network as well as faulty processing elements are tolerated.
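The bottom-up formation of hierarchical modules can be pictured with a small software model. This is only a sketch of the general idea, not the SPBT switch design: the function name `build_linear_array` and the rule that each module simply concatenates the healthy PEs of its two children are assumptions. The number of merge levels grows as the logarithm of the number of PEs, which is the source of the O(log N) figure when every level is merged in parallel in hardware.

```python
from typing import List, Tuple


def build_linear_array(healthy: List[bool]) -> Tuple[List[int], int]:
    """Merge PEs pairwise, level by level, into one linear chain.

    healthy[i] is True when PE i works. Returns the ordered list of
    healthy PE indices forming the linear array, and the number of
    merge levels used (about log2 of the number of PEs).
    """
    # Level 0: each PE is a module; a faulty PE contributes nothing,
    # which models pruning it from the tree.
    modules = [[i] if ok else [] for i, ok in enumerate(healthy)]
    levels = 0
    while len(modules) > 1:
        merged = []
        for k in range(0, len(modules), 2):
            left = modules[k]
            right = modules[k + 1] if k + 1 < len(modules) else []
            merged.append(left + right)  # chain the two child modules
        modules = merged
        levels += 1
    chain = modules[0] if modules else []
    return chain, levels
```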

4.
The author proposes a self-routing fault-tolerant switching network for asynchronous transfer mode (ATM) switching systems. The network has many subswitches to enhance the fault tolerance of the conventional multistage interconnection network, which has only a unique path. The subswitches provide large numbers of alternative paths between switching stages and allow the network to tolerate multiple faults. The routing algorithm is quite simple. The paths can also be used to route cells when internal cell contentions occur in switching elements. A reliability analysis gives a quantitative measure of the improvement in fault tolerance compared with previously presented fault-tolerant networks. A performance analysis and simulation results show that the proposed network has a high level of maximum throughput. In addition, that level of throughput is maintained with reasonable cell delay even as the number of faulty components in the network increases.

5.
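A Neural-Network-Based Reconfiguration Algorithm for Single-Track Redundant VLSI/WSI Arrays   Total citations: 1 (self-citations: 0, citations by others: 1)
高琳  张军英  许进 《电子学报》 (Acta Electronica Sinica) 2001,29(12):1685-1688
This paper proposes a Hopfield-network-based reconfiguration algorithm for single-track redundant VLSI/WSI arrays. Based on the distribution of defective cells in the array, a corresponding contradiction graph model is constructed, and the array reconfiguration problem is transformed into finding an independent set of the contradiction graph whose number of vertices is exactly the number of defective cells, which effectively solves the array reconfiguration problem. Experimental results show that, compared with traditional heuristic methods, the neural-network method built on the graph-theoretic model proposed in this paper is a simple, fast, and efficient algorithm.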
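The graph formulation in this abstract, finding an independent set whose size equals the number of defective cells, can be checked on tiny instances with an exhaustive search. This sketch deliberately swaps the paper's Hopfield network for brute force, and it does not build the contradiction graph from an array; the function name `find_independent_set`, the vertex numbering, and the edge list are hypothetical inputs.

```python
from itertools import combinations
from typing import List, Optional, Tuple


def find_independent_set(n_vertices: int,
                         edges: List[Tuple[int, int]],
                         k: int) -> Optional[Tuple[int, ...]]:
    """Return an independent set of exactly k vertices, or None.

    Vertices are 0..n_vertices-1; each edge marks two choices that
    conflict with each other. Exhaustive search, so only suitable
    for small graphs.
    """
    adjacent = set()
    for u, v in edges:
        adjacent.add((u, v))
        adjacent.add((v, u))
    for subset in combinations(range(n_vertices), k):
        # Independent means no two chosen vertices share an edge.
        if all((u, v) not in adjacent for u, v in combinations(subset, 2)):
            return subset
    return None
```

With k set to the number of defective cells, a returned set corresponds to a conflict-free repair in the abstract's model; `None` means no such repair exists for that graph.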

6.
7.
The paper presents the problem of fault tolerance in VLSI array structures: its aim is to discuss architectures capable of surviving a number of random faults while keeping costs (in terms of added silicon area and of increased processing time) as low as possible. Two different approaches are presented, both based on the introduction of simple patterns of faults and on global reconfiguration techniques (rather than one-to-one substitution of faulty elements by spare ones). Various solutions are compared, and relative performances are discussed in order to determine criteria for selecting the one most suitable to particular applications.

8.
This article presents a distributed fault-diagnosis algorithm for identifying faulty and fault-free units (processors, PEs, cells) in homogeneous systems. It is based on local comparison among units in a system and dissemination of the test results. Each unit performs comparison with its neighbors by using its own comparator. Unlike other approaches, the algorithm does not assume that diagnostic circuits are fault free. The algorithm is simple enough to be realized with small circuit overhead. The results are especially useful in locating faulty units in processor arrays implemented on a single chip or wafer. Computer simulation has shown that even for low unit yields, extremely high performance (fault coverage) can be obtained by adjusting algorithm parameters.
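The local-comparison step described here can be sketched as a simple voting rule on a grid. This is a strong simplification and not the article's algorithm: it assumes fault-free comparators and omits the dissemination phase. The function name `diagnose` and the `threshold` parameter (how many agreeing neighbors a unit needs to be labeled fault-free) are assumptions standing in for the "algorithm parameters" the abstract mentions.

```python
from typing import List


def diagnose(outputs: List[List[int]], threshold: int = 2) -> List[List[bool]]:
    """Label each unit fault-free (True) or suspect (False).

    outputs[r][c] is the test result produced by the unit at (r, c).
    A unit is labeled fault-free when its result matches the results
    of at least `threshold` of its four nearest neighbors.
    """
    rows = len(outputs)
    cols = len(outputs[0]) if rows else 0
    good = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            agree = 0
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                inside = 0 <= nr < rows and 0 <= nc < cols
                if inside and outputs[nr][nc] == outputs[r][c]:
                    agree += 1
            good[r][c] = agree >= threshold
    return good
```

Raising `threshold` makes the rule stricter; with the default of 2, corner units can still be labeled fault-free because they have exactly two neighbors.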

9.
In this paper, we present enhanced fault tolerance in large-scale optical switches through innovations in architecture and control logic design. A large-scale switch is constructed from a network of 2×2 optical switch elements (SEs). Classic switch network architectures, such as the Benes, are not designed with fault tolerance in mind. There are three major contributions in this paper: (1) we developed an analytical method, referred to as the probability accumulation method, to calculate the average connection blocking probability in a faulty switch network; (2) we provided a failure-aware routing algorithm to effectively route connections around defective SEs in a dilated Benes switch; and (3) we improved the connectivity pattern of the Benes network to further reduce the blocking probability, especially when the SE failure rate is low.

10.
The mesh-connected processor array is an extensively investigated architecture in parallel processing. Many studies have used reconfiguration algorithms to provide fault tolerance for faulty mesh-connected processor arrays. However, the subarrays generated in previous studies still have large interconnection lengths, which increase capacitance, power dissipation, and dynamic communication cost. First, a mathematical model is established for the array reconfiguration. Then, the proposed method treats the interconnections between PEs as a function of different integer variables, which can be solved by using effective integer programming techniques. Finally, an effective solver is called to find the optimal solution. Simulation results show that the proposed method can reduce the interconnection length of the array in the row and column directions simultaneously, thereby generating a subarray with the shortest interconnection length. On a 32 × 32 host array with a fault density of 30%, the total interconnection length of the subarray can be reduced by 8.36% compared with the state of the art, and the average interconnection length can be reduced by 39.30%, which is closer to the lower bound.

11.
A large-scale asynchronous transfer mode (ATM) switch fabric that can be constructed with currently feasible technology is proposed. Based on analysis of the technology, it is found that module interconnection becomes the bottleneck for a large fast packet switch. Fault tolerance for the switch is achieved by dynamic reconfiguration of the module interconnection network. The design improves system reliability with relatively low hardware overhead. An abstract model of the replacement problem for the design is presented, and the problem is transformed into a well-known assignment problem. The maximum fault tolerance is found, and a fast replacement algorithm is given. The reconfiguration capability can also be used to ameliorate imbalanced traffic flows. The authors formulate this traffic flow assignment problem for the switch fabric and show that the problem is NP-hard. A simple heuristic algorithm is proposed, and an example is given.
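Casting module replacement as an assignment problem can be illustrated on a toy instance. The sketch below is not the paper's fast replacement algorithm: it enumerates every assignment, which is only practical for a handful of modules. The function name `assign_spares`, the cost matrix, and the use of `None` for "this spare cannot reach that faulty module" are assumptions.

```python
from itertools import permutations
from typing import List, Optional, Tuple


def assign_spares(cost: List[List[Optional[int]]]
                  ) -> Tuple[Optional[Tuple[int, ...]], Optional[int]]:
    """Pick a distinct spare for every faulty module at minimum total cost.

    cost[i][j] is the cost of replacing faulty module i with spare j,
    or None when that replacement is not possible. Returns
    (assignment, total) where assignment[i] is the spare chosen for
    module i, or (None, None) when no complete assignment exists.
    """
    n_faulty = len(cost)
    n_spares = len(cost[0]) if cost else 0
    best, best_total = None, None
    # Every ordered choice of n_faulty distinct spares is one assignment.
    for perm in permutations(range(n_spares), n_faulty):
        if any(cost[i][j] is None for i, j in enumerate(perm)):
            continue  # uses a replacement that is not allowed
        total = sum(cost[i][j] for i, j in enumerate(perm))
        if best_total is None or total < best_total:
            best, best_total = perm, total
    return best, best_total
```

For realistic sizes a polynomial-time method such as the Hungarian algorithm would replace the enumeration; the brute-force form is used here only to keep the sketch self-contained.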

12.
A systolic array of dedicated processing elements (PEs) is presented as the heart of a multi-model neural-network accelerator. The instruction set of the PEs makes it possible to implement several widely-used neural models, including multi-layer Perceptrons with the back-propagation learning rule and Kohonen feature maps. Each PE holds an element of the synaptic weight matrix. An instantaneous swapping mechanism for the weight matrix makes the efficient implementation of neural networks larger than the physical PE array possible. A systolically-flowing instruction accompanies each input vector propagating in the array. This avoids the need to empty and refill the array when the operating mode of the array is changed. Fixed-point arithmetic is used in the PE. The problem of optimally scaling real variables in fixed-point format is addressed. Both the GENES IV chip, containing a matrix of 2×2 PEs, and an auxiliary arithmetic circuit have been manufactured and successfully tested. The MANTRA I machine has been built around these chips. Peak performances of the full system are between 200 and 400 MCPS in the evaluation phase and between 100 and 200 MCUPS during the learning phase (depending on the algorithm being implemented).

13.
In this paper, the authors propose a method based on modified particle swarm optimization (PSO) for beam reconfiguration of a linear array of mutually coupled parallel half-wavelength dipole antennas with a real excitation voltage amplitude distribution. Two different beam pairs are generated, one pencil/pencil beam pair and another pencil/flat-top beam pair in the horizontal plane. One beam is changed to another through switching while sharing a common amplitude distribution. Two examples are presented, one without a ground plane and another in the presence of a ground plane. Each dipole is connected to its feed network through a switch, so that it can be turned on or off, depending on the switch position. Beam reconfiguration is achieved by suitably turning the array elements on or off using the same voltage excitation distribution. Modified PSO is used to compute the excitation voltages as well as the switching configuration for each pattern having a prefixed side lobe level. The current in the driven and parasitic elements is determined via the induced EMF method, considering the current distribution on each dipole to be sinusoidal. The proposed method efficiently synthesizes dual beams, switching the power pattern from pencil to pencil and from pencil to flat-top with the same or different side lobe levels using common excitation voltages. It calculates the maximum variation of the active impedance of driven elements and the power losses when the radiation patterns switch from one beam to another. The paper calculates the array directivity as the distance between the antenna array and the ground plane varies. Three other state-of-the-art metaheuristics (differential evolution, the gravitational search algorithm, and the artificial bee colony algorithm) are also employed for a comparative evaluation.

14.
Soumen  Amiya  S.   《Integration, the VLSI Journal》2007,40(4):525-535
Achieving fault-tolerance through incorporation of redundancy and reconfiguration is quite common. The distribution of faults can have several impacts on the effectiveness of any reconfiguration scheme; in fact, patterns of faults occurring at strategic locations may render an entire VLSI system unusable regardless of its component redundancy and its reconfiguration capabilities. Such fault patterns are called catastrophic fault patterns (CFPs). In this paper, we characterize catastrophic fault patterns in mesh networks when the links are bidirectional or unidirectional. We determine the minimum number of faults required for a fault pattern to be catastrophic. We consider the problem of testing whether a fault pattern is catastrophic. When a fault pattern is not catastrophic we study the problem of finding optimal reconfiguration strategies, where optimality is with respect to either the number of processing elements in the reconfigured network (the reconfiguration is optimal if such a number is maximized) or the number of bypass links to activate in order to reconfigure the array (the reconfiguration is optimal if such a number is minimized). The problem of finding a reconfiguration strategy that is optimal with respect to the size of the reconfigured network is NP-complete, when the links are bidirectional, while it can be solved in polynomial time, when the links are unidirectional. Considering optimality with respect to the number of bypass links to activate, we provide algorithms which efficiently find an optimal reconfiguration.

15.
Fault tolerance is the ability of a system to perform its function reliably in the presence of faulty hardware or software components. For a system to have this property, many separate issues are involved: fault confinement, fault detection, fault masking, retry, diagnosis, reconfiguration, recovery, restart, repair, and reintegration. These issues are discussed and applied to two well-known fault-tolerant distributed systems.

16.
A new application-independent approach for evaluating the fault tolerance of field-programmable gate-array (FPGA) interconnect structures is presented. Signal routing in the presence of faulty resources at switch block and FPGA levels is analyzed; this problem is directly related to the fault tolerance of FPGA interconnects for testing and reconfiguration at manufacturing and run-time applications. Two criteria are proposed and used as figures of merit for evaluating different FPGA interconnect architectures. The proposed approach is based on the number of available paths between pairs of end points and the probability of establishing a one-to-one mapping between all input and output end points. A probabilistic approach is also presented to evaluate the fault-tolerant routing of the entire FPGA by connecting switch blocks in chains, as required for testing and to account for the input–output (I/O) pin restrictions of an FPGA chip. All possible interconnect faults for programmable switches and wiring channels are considered in the fault model. The proposed method is applicable to arbitrary switch block structures. Experimental results on commercial as well as academically designed FPGAs are presented and analyzed.

17.
To accommodate a variety of algorithms with different performance in specific applications and to improve power efficiency, reconfigurable architecture has become an effective methodology in academia and industry. However, existing architectures suffer from a performance bottleneck due to slow updating of contexts and inadequate flexibility. This paper presents an H-tree based reconfiguration mechanism (HRM) with a Huffman-coding-like and mask addressing method in a homogeneous processing element (PE) array, which supports both programmable and data-driven modes. The proposed HRM can transfer reconfiguration instructions/contexts to a particular PE or associated PEs simultaneously in one clock cycle in unicast, multicast, and broadcast mode, and shut down the unnecessary PE/PEs according to the current configuration. To verify the correctness and efficiency, we implement it through RTL synthesis and an FPGA prototype. Compared to prior works, the experimental results show that the HRM improves the working frequency by an average of 23.4%, increases the updating speed by 2×, and reduces the area by 36.9%; HRM can also power off the unnecessary PEs, which reduces dynamic power dissipation by 51% in certain application configurations. Furthermore, in the data-driven mode, the system frequency can reach 214 MHz, which is 1.68× higher than in the programmable mode.

18.

Aggressively scaled CMOS technology is increasingly threatening the dependability of network-on-chip (NoC) architectures. In a mesh-based NoC, a faulty router or broken link may isolate a fully functional processing element (PE). Also, a set of faulty routers may form isolated regions, which can degrade the design. In this paper, we propose a router-level redundancy (RLR) fault-tolerant scheme that differs from the traditional microarchitecture-level redundancy (MLR) approach to relieve the problem of isolated PEs and isolated regions. By simply adding one spare router within each router set in a mesh, RLR can be created and connection paths between adjacent routers can be diversified. To exploit this extra resource, two reconfiguration algorithms are demonstrated to detour around observed faulty routers/links. The proposed RLR fault-tolerant scheme can tolerate at most one faulty router within a router set. After the reconfiguration, the original mesh topology is maintained. As a result, the proposed architecture does not need any support from the network layer routing algorithms. The scheme has been evaluated based on three fault-tolerance metrics: reliability, mean time to failure (MTTF), and yield. The experimental results show that the performance of RLR increases as the size of the NoC grows; however, the relative connection cost decreases at the same time. This characteristic makes our architecture suitable for large-scale NoC designs.


19.
Coarse-grained reconfigurable architectures (CGRAs) require many processing elements (PEs) and a configuration memory unit (configuration cache) for reconfiguration of their PE array. Although this structure is meant for high performance and flexibility, it consumes significant power. In particular, power consumption by the configuration cache is an explicit overhead compared to other types of intellectual property (IP) cores. Reducing power is crucial for the CGRA to be a more competitive and reliable processing core in embedded systems. In this paper, we propose a reusable context pipelining (RCP) architecture to reduce the power overhead caused by reconfiguration. It shows that the power reduction can be achieved by using the characteristics of loop pipelining, which is a multiple instruction stream, multiple data stream (MIMD)-style execution model. RCP efficiently reduces power consumption in the configuration cache without performance degradation. Experimental results show that the proposed approach saves much power even with reduced configuration cache size. Power reduction ratios in the configuration cache and the entire architecture are up to 86.33% and 37.19%, respectively, compared to the base architecture.

20.
A systematic efficient fault diagnosis method for reconfigurable VLSI/WSI array architectures is presented. The basic idea is to utilize the output data path independence among a subset of processing elements (PEs) based on the topology of the array under test. The divide and conquer technique is applied to reduce the complexity of test application and enhance the controllability and observability of a processor array. The array under test is divided into nonoverlapping diagnosis blocks. Those PEs in the same diagnosis block can be diagnosed concurrently. The problem of finding diagnosis blocks is shown to be equivalent to a generalized Eight Queens problem. Three types of PEs and one type of switch, which are designed to be easily testable and reconfigurable, are used to show how to apply this approach. The main contribution of this paper is an efficient switch and link testing procedure, and a novel PE fault diagnosis approach which can speed up the testing by at least O(V^{1/2}) for the processor arrays considered in this paper, where V is the number of PEs. The significance of our approach is the ability to detect as well as to locate multiple PE, switch, and link faults with little or no hardware overhead.
