20 similar documents found; search took 15 ms
1.
A hardware accelerator for self-organizing feature maps is presented. We have developed a massively parallel architecture that, on the one hand, allows a resource-efficient implementation of small or medium-sized maps for embedded applications, requiring only small areas of silicon. On the other hand, large maps can be simulated with systems that consist of several integrated circuits that work in parallel. Apart from the learning and recall of self-organizing feature maps, the hardware accelerates data pre- and postprocessing. For the verification of our architectural concepts in a real-world environment, we have implemented an ASIC that is integrated into our heterogeneous multiprocessor system for neural applications. The performance of our system is analyzed for various simulation parameters. Additionally, the performance that can be achieved with future microelectronic technologies is estimated.
2.
Ishfaq Ahmad 《The Journal of supercomputing》1995,9(1-2):135-162
Building large-scale parallel computer systems for time-critical applications is a challenging task since the designers of such systems need to consider a number of related factors such as proper support for fault tolerance, efficient task allocation and reallocation strategies, and scalability. In this paper we propose a massively parallel fault-tolerant architecture using hundreds or thousands of processors for critical applications with timing constraints. The proposed architecture is based on an interconnection network called the bisectional network. A bisectional network is isomorphic to a hypercube in that a binary hypercube network can easily be extended into a bisectional network by adding additional links. These additional links give the network rich topological properties such as node symmetry, small diameter, small internode distance, and partitionability. The important property of partitioning is exploited to propose a redundant task allocation and a task redistribution strategy under real-time constraints. The system is partitioned into symmetric regions (spheres) such that each sphere has a central control point. The central points, called fault control points (FCPs), are distributed throughout the entire system in an optimal fashion and provide two-level task redundancy and efficiently redistribute the loads of failed nodes. FCPs are assigned to the processing nodes such that each node is assigned two types of FCPs for storing two redundant copies of every task present at the node. Similarly, the number of nodes assigned to each FCP is the same. For a failure-repair system environment the performance of the proposed system has been evaluated and compared with a hypercube-based system. Simulation results indicate that the proposed system can yield improved performance in the presence of a high number of node failures.
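The hypercube-to-bisectional extension can be made concrete. The sketch below assumes one common construction (each node gains an extra link to its bitwise complement, which bisects the address space); the abstract itself does not spell out the link rule, so treat this as an illustrative assumption rather than the paper's definition.

```python
def bisectional_neighbors(node, dim):
    """Neighbors of `node` in an assumed bisectional network of dimension
    `dim`: the usual hypercube neighbors (one bit flipped) plus one extra
    'bisection' link to the bitwise complement of the node address."""
    mask = (1 << dim) - 1
    hypercube_links = [node ^ (1 << b) for b in range(dim)]
    return hypercube_links + [node ^ mask]
```

Under this construction every node's degree grows from `dim` to `dim + 1`, which is where the extra connectivity for fault tolerance comes from.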
3.
4.
A general architecture for fault-tolerant control is proposed. The architecture is based on the (primary) YJBK parameterization of all stabilizing compensators and uses the dual YJBK parameterization to quantify the performance of the fault-tolerant system. The suggested approach can be applied to additive faults, parametric faults, and system structural changes. The modelling for each of these fault classes is described. The method allows for the design of passive as well as active fault handling. Also, the related design method can be fitted either to guarantee stability or to achieve graceful degradation in the sense of guaranteed degraded performance. A number of fault diagnosis problems, fault-tolerant control problems, and feedback control with fault rejection problems are formulated and considered, mainly from a fault modelling point of view. The method is illustrated on a servo example including an additive fault and a parametric fault.
5.
Leila Notash 《Journal of Field Robotics》2000,17(3):149-157
Parallel manipulators with redundant joint displacement sensing can be exploited to develop fault tolerant implementations. This is possible since fundamental problems of the associated kinematics can still be solved after the elimination of faulty sensor readings. The ability to detect faulty sensor readings is a requirement of any fault tolerant implementation scheme. A sensor fault detection method is presented for redundantly sensed parallel manipulators. A broad class of three-branch manipulators is considered, where each branch consists of three main-arm joints and supports a common payload through respective passive spherical joints. The detection method is based on the comparison of forward displacement solutions for different cases of joint sensor readings. The existence of common solutions, based on the branches and sensors considered, is used to effectively identify the existence of a failed sensor. Once a faulty sensor is identified, continued (fault tolerant) operation is possible using a forward displacement solution based on the readings of the accurate sensors. The detection method is implemented in a computer simulation of a calibrated three-branch parallel manipulator. © 2000 John Wiley & Sons, Inc.
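The consistency-checking idea behind the detection method can be sketched in a heavily simplified form. The paper compares full forward-displacement solutions computed from different sensor subsets; the hypothetical helper below reduces each subset's solution to a single scalar estimate and flags the subset that disagrees with the consensus, which is the same voting principle.

```python
from statistics import median

def detect_faulty_sensor(estimates, tol=1e-3):
    """Flag estimates that disagree with the consensus (median) by more
    than `tol`. Each estimate stands in for a forward-displacement
    solution computed from one subset of joint sensors; a faulty sensor
    makes its subset's solution inconsistent with the others.
    Simplified sketch: real solutions are full poses, not scalars."""
    consensus = median(estimates)
    return [i for i, e in enumerate(estimates) if abs(e - consensus) > tol]
```

With redundant sensing, the majority of subsets exclude the failed sensor, so their solutions agree and the outlier identifies it.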
6.
A 3-D optical architecture currently under investigation is described. This model, a single-instruction, multiple-data (SIMD) system, exploits spatial parallelism and processes 2-D binary images as fundamental computational entities using symbolic substitution logic. This system effectively implements highly structured data-parallel algorithms, such as signal and image processing, partial differential equations, multidimensional numerical transforms, and numerical supercomputing. The model includes a hierarchical mapping technique that helps design the algorithms and map them onto the proposed optical architecture. The symbolic substitution logic and the mapping of data-parallel algorithms are discussed. The theoretical performance of the optical system was estimated and compared with that of electronic SIMD array processors. Preliminary results show that the system provides greater computational throughput and efficiency than its electronic counterparts.
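Symbolic substitution has two phases: recognize every occurrence of a pattern in the binary image (optically, all at once), then write a replacement pattern at each recognized position. A minimal sequential sketch of one substitution step, with hypothetical list-of-lists images:

```python
def substitute(image, pattern, replacement):
    """One symbolic-substitution step on a 2-D binary image.
    Phase 1 locates every occurrence of `pattern` (conceptually in
    parallel across the whole image); phase 2 writes `replacement`
    at each located position. `replacement` has the same shape as
    `pattern`."""
    h, w = len(image), len(image[0])
    ph, pw = len(pattern), len(pattern[0])
    # Phase 1: recognition of all pattern occurrences.
    hits = [(i, j)
            for i in range(h - ph + 1) for j in range(w - pw + 1)
            if all(image[i + a][j + b] == pattern[a][b]
                   for a in range(ph) for b in range(pw))]
    # Phase 2: substitution at every recognized position.
    out = [row[:] for row in image]
    for i, j in hits:
        for a in range(ph):
            for b in range(pw):
                out[i + a][j + b] = replacement[a][b]
    return out
```

Because recognition completes before any substitution is applied, the step behaves like a single parallel rewrite rather than a left-to-right scan.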
7.
Lewis Stiller 《The Journal of supercomputing》1991,5(2-3):99-117
Efficient space and time exploitation of symmetry in domains on highly parallel, distributed-memory architectures is, in certain cases, equivalent to routing along a labeled group action graph, with computation associated with each group element label, where the group of symmetries acts on the processors. The algebraic structure of the group can sometimes be analyzed to determine, a priori, space- and time-efficient routing schedules on the hardware network (which, in practice, is often another group action graph). The algorithms we develop were implemented on a 64K-processor CM-2 and used to solve certain natural classes of chess endgames, part of whose search space is invariant under a noncommutative crystallographic group. This program runs 400 times faster than any previous implementation, and discovered many interesting new results in the area; some of these results are not solvable in practice with current serial techniques because the time and space requirements are too large. It seems interesting that it was possible, albeit with difficulty, to implement efficiently certain irregular chess rules on the CM-2, which is optimized for regular data sets. An earlier version of this paper was presented at Supercomputing '90. Partially supported by NSF/DARPA Grant CCR-8908092.
8.
Hancu M.V.A., Iwasaki K., Sato Y., Sugie M. 《IEEE Transactions on Parallel and Distributed Systems》1994,5(11):1169-1184
Presents new principles for online monitoring in the context of multiprocessors (especially massively parallel processors) and then focuses on the effect of the aliasing probability on the error detection process. In the proposed test architecture, concurrent testing (or online monitoring) at the system level is accomplished by enforcing the run-time testing of the data and control dependences of the algorithm currently being executed on the parallel computer. To help in this process, each message contains both source and destination addresses. At each message source, the sequence of destination addresses of the outgoing messages is compressed on a block basis. At the same time, at each destination, the sequence of source addresses of all incoming messages is compressed, also on a block basis. Concurrent compression of the instructions executed by the processing elements (PEs) is also possible. As a result of this procedure, an image of the data dependences and of the control flow of the currently running algorithm is created. This image is compared, at the end of each computational block, with a reference image created at compilation time. The main results of this work are in proposing new principles for the online system-level testing of multiprocessor systems, based on signaturing and monitoring the data dependences together with the control dependences, and in providing an analytical model and analysis for the address compression process used for monitoring the data routing process.
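The compress-and-compare scheme can be sketched with a simple order-sensitive block signature. The abstract does not specify the compression function (and any lossy signature admits aliasing, the effect the paper analyzes), so the polynomial hash below is only a stand-in for it.

```python
def signature(addresses, mod=2**16, mult=31):
    """Compress a block's sequence of message addresses into one word.
    Order-sensitive, so swapped messages change the signature; because
    it is lossy, distinct sequences can alias to the same value
    (the probability the paper models). The constants are arbitrary
    stand-ins, not the paper's scheme."""
    sig = 0
    for a in addresses:
        sig = (sig * mult + a) % mod
    return sig

def check_block(runtime_addresses, reference_sig):
    """At the end of a computational block, compare the signature of the
    observed address sequence with the reference computed at compile
    time; a mismatch signals a data-dependence or routing error."""
    return signature(runtime_addresses) == reference_sig
```

Each source signs its outgoing destination addresses and each destination signs its incoming source addresses, so both ends of every dependence are checked against the compiled reference image.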
9.
An efficient implementation of parallel eigenvalue computation for massively parallel processing
This paper describes an efficient implementation and evaluation of a parallel eigensolver for computing all eigenvalues of dense symmetric matrices. Our eigensolver uses a Householder tridiagonalization method, which has higher parallelism and performance than conventional methods when the problem size is relatively small, e.g., on the order of 10,000. This is very important for practical applications, where many diagonalizations of such matrices are frequently required. The routine was evaluated on 1024 processors of the HITACHI SR2201, giving speedup ratios of about 2–5 times compared to the ScaLAPACK library on the same 1024 processors.
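Householder tridiagonalization, the first phase of such an eigensolver, reduces a symmetric matrix to tridiagonal form by a sequence of two-sided reflections. A small pure-Python sketch for dense matrices (the paper's distributed, high-performance version is far more involved):

```python
import math

def matmul(X, Y):
    """Naive square-matrix product for lists of lists."""
    n = len(X)
    return [[sum(X[i][t] * Y[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]

def householder_tridiagonalize(A):
    """Reduce a symmetric matrix A (list of lists) to tridiagonal form
    via Householder reflections; eigenvalues are preserved because each
    step is an orthogonal similarity transform."""
    n = len(A)
    A = [row[:] for row in A]
    for k in range(n - 2):
        # Reflect column k below the subdiagonal onto (alpha, 0, ..., 0).
        x = [A[i][k] for i in range(k + 1, n)]
        norm = math.sqrt(sum(v * v for v in x))
        if norm == 0.0:
            continue
        alpha = -norm if x[0] >= 0 else norm
        v = x[:]
        v[0] -= alpha
        vnorm2 = sum(t * t for t in v)
        # Build P = diag(I, H) with H = I - 2 v v^T / (v^T v).
        P = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
        for i in range(n - k - 1):
            for j in range(n - k - 1):
                P[k + 1 + i][k + 1 + j] = (
                    (1.0 if i == j else 0.0) - 2.0 * v[i] * v[j] / vnorm2)
        # Two-sided update keeps the matrix symmetric.
        A = matmul(matmul(P, A), P)
    return A
```

After this reduction, the eigenvalues of the tridiagonal matrix (equal to those of the original) can be found by standard methods such as bisection or QL iteration.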
10.
Supercube: An optimally fault tolerant network architecture
Arunabha Sen 《Acta Informatica》1989,26(8):741-748
A new class of interconnection network topology is proposed for parallel and distributed processing. The attractive features of this class include: (a) the network can be constructed for any number of computing nodes; (b) the network is incrementally expandable, i.e., a new node can easily be added to the existing network; (c) it has good fault-tolerance characteristics (measured by the connectivity of the network graph); and (d) it has small delay characteristics (measured by the diameter of the network graph). The node connectivity of the network is equal to the minimum node degree. In this sense the network is optimally fault-tolerant.
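The optimality criterion, node connectivity equal to minimum node degree, can be checked by brute force on small networks. The sketch below verifies it on a 3-dimensional hypercube (used here only as a familiar stand-in; it does not reconstruct the supercube itself): removing any 2 nodes leaves the graph connected, while some set of 3 nodes disconnects it, so connectivity equals the degree, 3.

```python
from itertools import combinations

def hypercube_edges(dim):
    """Edge set of the binary hypercube Q_dim (both orientations)."""
    return {(u, u ^ (1 << b)) for u in range(1 << dim) for b in range(dim)}

def connected(nodes, edges):
    """Depth-first search connectivity test on the induced subgraph."""
    nodes = set(nodes)
    adj = {u: set() for u in nodes}
    for u, v in edges:
        if u in nodes and v in nodes:
            adj[u].add(v)
    seen, stack = set(), [next(iter(nodes))]
    while stack:
        u = stack.pop()
        if u not in seen:
            seen.add(u)
            stack.extend(adj[u])
    return seen == nodes

def node_connectivity_at_least(dim, k):
    """True iff removing any k-1 nodes leaves Q_dim connected."""
    nodes, edges = list(range(1 << dim)), hypercube_edges(dim)
    return all(connected(set(nodes) - set(cut), edges)
               for cut in combinations(nodes, k - 1))
```

For the supercube, the same check would be run on its edge set for an arbitrary node count N, with the claim that the survivable cut size always matches the minimum degree.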
11.
To support parallel processing of data-intensive applications, the interconnection network of a parallel/distributed machine must provide high end-to-end communication bandwidth and handle the bursty and concentrated communication patterns generated by dynamic load balancing and data collection operations. A large-scale interconnection network architecture called a virtual bus is proposed. The virtual bus can scale to terabits-per-second end-to-end communication bandwidth with low queuing delay for nonuniform traffic. A terabit virtual bus architecture can be efficiently implemented for less than 5% of the total cost of an eight-thousand-node system. In addition, the virtual bus has an open-system parallel interface that is flexible enough to support up to gigabytes-per-second data transfer rates, different grades of service, and broadcast operation. Such flexibility makes the virtual bus a plausible open-system communication backbone for a broad range of applications.
12.
Bounded-degree networks like de Bruijn graphs or wrapped butterfly networks are very important from a VLSI implementation point of view, as well as for applications where the computing nodes in the interconnection network can have only a fixed number of I/O ports. One basic drawback of these networks is that they cannot provide a desired level of fault tolerance because of the bounded degree of the nodes. On the other hand, networks like the hypercube (where the degree of a node grows with the size of the network) can provide the desired fault tolerance, but the design of a node becomes problematic for large networks. In an attempt to combine the best of both worlds, the authors in [IEEE Transactions on Parallel and Distributed Systems 4(9) (1993) 962] proposed hyper-deBruijn (HD) networks that have many additional features of logarithmic diameter, partitionability, embedding, etc. However, HD networks are not regular, are not optimally fault tolerant, and their optimal routing is relatively complex. Our purpose in the present paper is to extend the concepts used in the above-mentioned reference to propose a new family of scalable network graphs that retain all the good features of HD networks and at the same time are regular and maximally fault tolerant; the optimal point-to-point routing algorithm is significantly simpler than that of the HD networks. We have developed some new interesting results on wrapped butterfly networks in the process.
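The bounded-degree property of de Bruijn graphs is easy to see from their shift-register structure: a node's neighbors are obtained by shifting a new bit in at either end, so every node has at most 2 out-neighbors and 2 in-neighbors regardless of network size.

```python
def debruijn_neighbors(node, k):
    """Out- and in-neighbors of `node` in the binary de Bruijn graph
    B(2, k) on 2**k nodes: shift left (out) or right (in) and insert a
    new bit. Degree is bounded by 4 however large k grows, which is the
    fixed-I/O-port property discussed above."""
    mask = (1 << k) - 1
    out_neighbors = [((node << 1) | b) & mask for b in (0, 1)]
    in_neighbors = [(node >> 1) | (b << (k - 1)) for b in (0, 1)]
    return out_neighbors, in_neighbors
```

Routing is equally simple: to reach destination d from source s, shift in the bits of d one at a time, giving paths of length at most k (logarithmic diameter).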
13.
Javad Akbari Torkestani 《Computers & Security》2009,28(1-2):40-46
Disk arrays, or RAIDs, have become the solution to increase the capacity, bandwidth, and reliability of most storage systems. In spite of its high redundancy level, disk mirroring is a popular RAID paradigm, because replicating data also doubles the bandwidth available for processing read requests, improves the reliability, and achieves fault tolerance. In this paper, we present a new RAID architecture called RAID-RMS in which a special hybrid mechanism is used to map the data blocks to the cluster. The main idea behind the proposed algorithm is to combine the data-block striping and disk mirroring techniques with a data-block rotation. The resulting architecture improves the parallelism, reliability, and efficiency of the RAID array. We show that the proposed architecture is able to serve many more disk requests compared to other mirroring-based architectures. We also argue that a more balanced disk load is attained by the given architecture, especially when there are disk failures.
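The striping-plus-rotated-mirroring idea can be illustrated with a hypothetical placement rule; the actual RAID-RMS mapping is not given in the abstract, so the rotation formula below is an assumption chosen only to show the effect.

```python
def place_block(block, ndisks):
    """Hypothetical block placement in the spirit of RAID-RMS: stripe
    blocks round-robin across disks (primary copy) and place the mirror
    copy at a per-stripe rotated offset. Rotating the mirror offset
    spreads a failed disk's read load over all survivors instead of a
    single mirror partner. Illustrative only, not the paper's mapping."""
    primary = block % ndisks
    stripe = block // ndisks
    # Offset is in 1..ndisks-1, so the mirror never lands on the primary.
    mirror = (primary + 1 + stripe % (ndisks - 1)) % ndisks
    return primary, mirror
```

With a fixed mirror partner, losing one disk doubles the load on exactly one survivor; with rotation, successive stripes mirror to different disks, so the extra load is shared.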
14.
《Journal of Systems Architecture》2013,59(7):482-491
Network-on-Chip (NoC) is widely used as a communication scheme in modern many-core systems. To guarantee the reliability of communication, effective fault tolerant techniques are critical for an NoC. In this paper, a novel fault tolerant architecture employing redundant routers is proposed to maintain the functionality of a network in the presence of failures. This architecture consists of a mesh of 2 × 2 router blocks with a spare router placed in the center of each block. This spare router provides a viable alternative when a router fails in a block. The proposed fault-tolerant architecture is therefore referred to as a quad-spare mesh. The quad-spare mesh can be dynamically reconfigured by changing control signals without altering the underlying topology. This dynamic reconfiguration and its corresponding routing algorithm are demonstrated in detail. Since the topology after reconfiguration is consistent with the original error-free 2D mesh, the proposed design is transparent to operating systems and application software. Experimental results show that the proposed design achieves significant improvements in reliability compared with those reported in the literature. Compared with the error-free system, with a single router failure the throughput decreases by only 5.19% and the latency increases by 2.40%, at the cost of about 45.9% hardware redundancy.
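The reconfiguration principle (one spare per 2 × 2 block takes over a failed router's logical position, leaving the logical 2D mesh intact) can be sketched as a lookup; the control-signal mechanics and routing details in the paper are abstracted away here.

```python
def route_lookup(failed, x, y):
    """Map logical mesh position (x, y) to the physical router serving
    it in a quad-spare mesh: each 2x2 block owns one central spare, and
    if the router at (x, y) has failed, the block's spare stands in.
    `failed` is a set of failed (x, y) positions. Simplified sketch of
    the reconfiguration described above."""
    block = (x // 2, y // 2)
    if (x, y) in failed:
        return ('spare', block)
    return ('router', (x, y))
```

Because the lookup preserves every logical position, software above the NoC sees the same error-free 2D mesh before and after reconfiguration.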
15.
《Parallel Computing》2002,28(7-8):967-993
This paper describes a software architecture that allows image processing researchers to develop parallel applications in a transparent manner. The architecture's main component is an extensive library of data-parallel low-level image operations capable of running on homogeneous distributed-memory MIMD-style multicomputers. Since the library has an application programming interface identical to that of an existing sequential library, all parallelism is completely hidden from the user. The first part of the paper discusses implementation aspects of the parallel library, and shows how sequential as well as parallel operations are implemented on the basis of so-called parallelizable patterns. A library built in this manner is easily maintainable, as extensive code redundancy is avoided. The second part of the paper describes the application of performance models to ensure efficiency of execution on all target platforms. Experiments show that for a realistic application the performance predictions are highly accurate. These results indicate that the core of the architecture forms a powerful basis for automatic parallelization and optimization of a wide range of imaging software.
16.
The basics of task-flow architecture and the simulated wafer-scale implementation of flowing tasks (SWIFT), a register-transfer simulator that investigates the behavior of task-flow programs, are discussed. SWIFT simulates a ring of cells with two pipeline stages between successive cells. Each cell contains an arithmetic logic unit (ALU), a receive queue for holding incoming transmission packets, and a memory for storing memory packets (MPs). The chain wafer-scale integration (WSI) architecture, which allows linear arrays to be configured from the working cells on a partially good wafer, is applied to task-flow-machine implementations. Results from a limited Monte Carlo simulation run to predict yields for a 164-cell wafer configured using the chain WSI technique are presented. Results of a simulated sparse matrix-vector multiplication application of the task-flow architecture are also presented.
17.
To improve the robustness of distributed applications, developers usually have to write corresponding fault-tolerance code. Existing CORBA component models achieve binary-level code reuse through assembly by defining the port characteristics of components, enabling users to develop and deploy distributed applications rapidly. On this basis, how to build fault-tolerant applications quickly and flexibly under a component model has become a topic of considerable interest. By designing a fault-tolerant architecture for the component model, this paper provides a mechanism for the rapid and flexible development of fault-tolerant applications, and proposes fault-tolerance strategies and algorithms for handling two types of failures.
18.
R. Taghavi 《Engineering with Computers》1996,12(3-4):178-185
HEXAR, a new software product developed at Cray Research, Inc., automatically generates good-quality meshes directly from surface data produced by computer-aided design (CAD) packages. The HEXAR automatic mesh generator is based on a proprietary, parallel algorithm that relies on pattern recognition, local mesh refinement and coarsening, and variational mesh smoothing techniques to create all-hexahedral volume meshes. HEXAR generates grids two to three orders of magnitude faster than current manual approaches. Although approximate by design, the resulting meshes have qualities acceptable to many commercial structural and CFD (computational fluid dynamics) software packages. HEXAR turns mesh generation into an automatic process for most commercial engineering applications.
19.
Al Davis 《LISP and Symbolic Computation》1992,5(1-2):7-47
The Mayfly is a scalable general-purpose parallel processing system being designed at HP Laboratories, in collaboration with colleagues at the University of Utah. The system is intended to efficiently support parallel variants of modern programming languages such as Lisp, Prolog, and object-oriented programming models. These languages impose a common requirement on the hardware platform to support dynamic system needs such as runtime type checking and dynamic storage management. The main programming language for the Mayfly is a concurrent dialect of Scheme. The system is based on a distributed-memory model, and communication between processing elements is supported by message passing. The initial prototype of the Mayfly will consist of 19 identical processing elements interconnected in a hexagonal mesh structure. In order to achieve the goal of scalable performance, each processing element is a parallel processor as well, which permits the application code, runtime operating system, and communication to all run in parallel. A 7-processing-element subset of the prototype is presently operational. This paper describes the hardware architecture after a brief background synopsis of the software system structure.
20.
Based on the characteristics of NCL circuit data encoding, a parallel-data-processing NCL circuit structure is proposed. By processing two dual-rail-encoded data streams in parallel, the next NULL (invalid) datum is computed in advance, shortening the time for which NULL data must be held. The structure is applied to the design of a 4×4 multiplier. Using a 0.18 μm CMOS process, the multiplier was synthesized, placed and routed, and simulated in both non-pipelined and two-stage pipelined modes; compared with a conventional NCL 4×4 multiplier, the NULL-data hold time is shortened by 32.9% and 33.2%, respectively.
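The dual-rail encoding underlying NCL circuits, and the NULL/DATA handshake whose NULL phase the proposed structure shortens, can be sketched as follows (the completion check is the standard NCL convention; the specific helper names are illustrative).

```python
NULL = (0, 0)  # the spacer wavefront between successive data wavefronts

def dual_rail(bit):
    """NCL dual-rail encoding of one bit: DATA1 = (rail1=1, rail0=0),
    DATA0 = (0, 1), NULL = (0, 0). The (1, 1) state is illegal."""
    return (1, 0) if bit else (0, 1)

def is_complete(wavefront):
    """Completion detection: a stage may accept a wavefront only when
    every signal carries DATA; it returns to NULL only when every
    signal is NULL. The time spent holding the all-NULL spacer is the
    'NULL-data hold time' the abstract reports reducing."""
    return all(signal != NULL for signal in wavefront)
```

Because every data wavefront must be separated by a full NULL wavefront, shortening the NULL phase directly raises throughput, which is the point of precomputing the next NULL in parallel.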