首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 46 毫秒
1.
A fault-tolerant and heuristic routing algorithm for faulty hypercube systems is described.To improve the efficiency,the algorithm adopts a heuristic backtracking strategy and each node has an array to record its all neighbors‘ faulty link information to avoid unnecessary searching for the known faulty links.Furthermore,the faulty link information is dynamically accumulated and the technique of heuristically searching for optimal link is used.The algorithm routes messages through the minimum feasible path between the sender and receiver if at least one such path exists,and takes the optimal path with higher probability when faulty links exist in the faulty hypercube.  相似文献   

2.
We consider the following basic communication problems in a hypercube network of processors: the problem of a single processor sending a different packet to each of the other processors, the problem of simultaneous broadcast of the same packet from every processor to all other processors, and the problem of simultaneous exchange of different packets between every pair of processors. The algorithms proposed for these problems are optimal in terms of execution time and communication resource requirements; that is, they require the minimum possible number of time steps and packet transmissions. In contrast, algorithms in the literature are optimal only within an additive or multiplicative factor.  相似文献   

3.
A fully unsafe hypercube according to the global safety can be split into a unique set of maximal safe subcubes. Multicasting in a maximal safe subcube can be completed reliably based on information related to the maximal safe subcube. A time-optimal multicasting exists if (1) the multicast source is locally safe in the minimum subcube that contains the source and destinations (called a multicast subcube), or (2) the spanning subcube between each destination and the source is safe. We show a sufficient condition for the existence of a multicasting is: the multicast subcube is safe or the spanning subcube between the source and each destination is either safe or is contained in a safe subcube. Methods are presented to set up a partial multicast tree when the above sufficient conditions fail. It is shown that effectiveness of the algorithm can be improved drastically using the partial multicast tree setup technique. Extensive simulation results are also presented.  相似文献   

4.
We have developed a one-dimensional parallel global adaptive quadrature algorithm for a machine with hypercube architecture, and studied the heuristics present in the algorithm. A mathematical model for how long time it takes to process a balanced tree of sub-intervals under certain (almost implementable) conditions is developed. The results from numerical experiments are given. The speedups achieved depend on the problem and are ranging from 0.83 to 30.5 on a 32-processor hypercube.  相似文献   

5.
Software-based rerouting for fault-tolerant pipelined communication   总被引:1,自引:0,他引:1  
This paper presents a software-based approach to fault-tolerant routing in networks using wormhole or virtual cut-through switching. When a message encounters a faulty output link, it is removed from the network by the local router and delivered to the messaging layer of the local node's operating system. The message passing software can reroute this message, possibly along nonminimal paths. Alternatively, the message may be addressed to an intermediate node, which will forward the message to the destination. A message may encounter multiple faults and pass through multiple intermediate nodes. The proposed techniques are applicable to both obliviously and adaptively routed networks. The techniques are specifically targeted toward commercial multiprocessors where the mean time to repair (MTTR) is much smaller than the mean time between router failures (MTBF), i.e., it is sufficient to tolerate a maximum of three failures. This paper presents requirements for buffer management, deadlock freedom, and livelock freedom. Simulation results are presented to evaluate the degradation in latency and throughput as a function of the number and distribution of faults. There are several advantages of such an approach. Router designs are minimally impacted, and thus remain compact and fast. Only messages that encounter faulty components are affected, while the machine is ensured of continued operation until the faulty components can be replaced. The technique leverages existing network technology, and the concepts are portable across evolving switch and router designs. Therefore, we feel that the technique is a good candidate for incorporation into the next generation of multiprocessor networks  相似文献   

6.
This paper presents a new approach for parallel heuristic algorithms based on adaptive parallelism. Adaptive parallelism was used to dynamically adjust the parallelism degree of the application with respect to the system load. This approach demonstrates that high-performance computing using a hundred of heterogeneous workstations combined with massively parallel machines is feasible to solve large optimization problems with respect to the personal character of workstations. The fault-tolerant algorithm allows a minimal loss of computation in case of failures. The proposed algorithm exploits the properties of this class of applications in order to reduce the complexity of the algorithm in terms of the checkpoint files size and the control messages exchanged. The parallel heuristic algorithm combines different search strategies: simulated annealing and tabu search. Encouraging results have been obtained in solving the quadratic assignment problem. We have improved the best known solutions for some large real-world problems.  相似文献   

7.
Building large-scale parallel computer systems for time-critical applications is a challenging task since the designers of such systems need to consider a number of related factors such as proper support for fault tolerance, efficient task allocation and reallocation strategies, and scalability. In this paper we propose a massively parallel fault-tolerant architecture using hundreds or thousands of processors for critical applications with timing constraints. The proposed architecture is based on an interconnection network called thebisectional network. A bisectional network is isomorphic to a hypercube in that a binary hypercube network can be easily extended as a bisectional network by adding additional links. These additional links add to the network some rich topological properties such as node symmetry, small diameter, small internode distance, and partitionability. The important property of partitioning is exploited to propose a redundant task allocation and a task redistribution strategy under realtime constraints. The system is partitioned into symmetric regions (spheres) such that each sphere has a central control point. The central points, calledfault control points (FCPs), are distributed throughout the entire system in an optimal fashion and provide two-level task redundancy and efficiently redistribute the loads of failed nodes. FCPs are assigned to the processing nodes such that each node is assigned two types of FCPs for storing two redundant copies of every task present at the node. Similarly, the number of nodes assigned to each FCP is the same. For a failure-repair system environment the performance of the proposed system has been evaluated and compared with a hypercube-based system. Simulation results indicate that the proposed system can yield improved performance in the presence of a high number of node failures.  相似文献   

8.
《Parallel Computing》1988,6(2):225-233
Fast Fourier Transforms are a widely-used and powerful tool for the analysis and solution of many problems. They have been used in such diverse areas as medicine, acoustics, image processing, system design and many other fields. By transforming the data the problem may be simpler, more tractable or more efficiently solved and for many applications (e.g. speech processing) the data may be much more easily understood in the transform domain. Therefore fast algorithms for implementing transforms are vital for any powerful computer.This paper describes the implementation of a Fast Fourier Transform on a 64-node INTEL hypercube and shows how the hypercube architecture may be efficiently used. Usually the FFT is only a part of the solution process and the data on the hypercube has to be arranged in a certain manner for the efficient solution of the whole problem. A common way is for the data to be distributed according to a Gray code, so that neighbouring points in the domain are in neighbouring processors. We present a number of results on Gray codes which characterise a certain family of Gray codes, and show that Fast Fourier Transforms on data distributed among processors according to a Gray code can also be efficiently implemented on the hypercube.  相似文献   

9.
Dou  Wanfeng  Li  Yanan 《The Journal of supercomputing》2018,74(6):2776-2800
The Journal of Supercomputing - Viewshed analysis has widely been used in various spatial analysis applications. But the expense of viewshed computation remains high both in time and space...  相似文献   

10.
传统的自适应片上网络(NoC)容错路由算法采用一步一比较的方式来确定最优端口, 未能有效降低传输延迟。根据数据包在2D Mesh NoC前若干连续的跳数内最优端口固定的特点, 提出了一种基于报文检测的快速(FPIB)自适应容错路由算法。算法采用跳步比较的方式来减少数据包的路由时间, 并使用模糊优先级策略来进行容错路由计算。实验结果表明, 与uLBDR容错路由算法相比, 该算法能有效地降低平均延迟, 且实现算法的硬件开销更低。  相似文献   

11.
In this paper, the problem of classifying handwritten data with respect to gender is addressed. A classification method based on Gaussian Mixture Models is applied to distinguish between male and female handwriting. Two sets of features using on-line and off-line information have been used for the classification. Furthermore, we combined both feature sets and investigated several combination strategies. In our experiments, the on-line features produced a higher classification rate than the off-line features. However, the best results were obtained with the combination. The final gender detection rate on the test set is 67.57%, which is significantly higher than the performance of the on-line and off-line system with about 64.25 and 55.39%, respectively. The combined system also shows an improved performance over human-based classification. To the best of the authors’ knowledge, the system presented in this paper is the first completely automatic gender detection system which works on on-line data. Furthermore, the combination of on-line and off-line features for gender detection is investigated for the first time in the literature.  相似文献   

12.
This paper presents an algorithm for fast sorting of large lists using modern GPUs. The method achieves high speed by efficiently utilizing the parallelism of the GPU throughout the whole algorithm. Initially, GPU-based bucketsort or quicksort splits the list into enough sublists then to be sorted in parallel using merge-sort. The algorithm is of complexity nlognnlogn, and for lists of 8 M elements and using a single Geforce 8800 GTS-512, it is 2.5 times as fast as the bitonic sort algorithms, with standard complexity of n(logn)2n(logn)2, which for a long time was considered to be the fastest for GPU sorting. It is 6 times faster than single CPU quicksort, and 10% faster than the recent GPU-based radix sort. Finally, the algorithm is further parallelized to utilize two graphics cards, resulting in yet another 1.8 times speedup.  相似文献   

13.
We present in this paper an efficient algorithm for solving the integral Knapsack problem on hypercube. The main idea is to represent the computations of the dynamic programming formulation as a precedence graph (which has the structure of an irregular mesh). Then, we propose a time optimal scheduling algorithm for computing the irregular meshes on hypercube.  相似文献   

14.
可重构电子系统芯片级在线自主容错方法研究   总被引:2,自引:2,他引:2  
可重构电子系统芯片固定型故障的传统容错设计往往采用集中式控制方法,存在测试时间长、硬件资源利用率低、对外部控制器依赖性高等问题。因此,设计了一种具有分布式自主容错能力的可重构细胞阵列,通过将细胞内部查找表输出与参考值进行比较的方式进行循环检测,并利用冗余存储单元对故障查找表进行修复。以四位并行乘法器为例进行仿真验证,实验结果表明,新型可重构阵列的自主容错设计方法,比现有设计的硬件开销小,修复时间短,容错能力强,且设计复杂度不受阵列规模影响。  相似文献   

15.
Fast image retrieval using color-spatial information   总被引:1,自引:0,他引:1  
In this paper, we present an image retrieval system that employs both the color and spatial information of images to facilitate the retrieval process. The basic unit used in our technique is a single-colored cluster, which bounds a homogeneous region of that color in an image. Two clusters from two images are similar if they are of the same color and overlap in the image space. The number of clusters that can be extracted from an image can be very large, and it affects the accuracy of retrieval. We study the effect of the number of clusters on retrieval effectiveness to determine an appropriate value for “optimal' performance. To facilitate efficient retrieval, we also propose a multi-tier indexing mechanism called the Sequenced Multi-Attribute Tree (SMAT). We implemented a two-tier SMAT, where the first layer is used to prune away clusters that are of different colors, while the second layer discriminates clusters of different spatial locality. We conducted an experimental study on an image database consisting of 12,000 images. Our results show the effectiveness of the proposed color-spatial approach, and the efficiency of the proposed indexing mechanism. Received August 1, 1997 / Accepted December 9, 1997  相似文献   

16.
A parallel architecture for an on-line implementation of the recursive least squares (RLS) identification algorithm on a field programmable gate array (FPGA) is presented. The main shortcoming of this algorithm for on-line applications is its computational complexity. The matrix computation to update error covariance consumes most of the time. To improve the processing speed of the RLS architecture, a multi-stage matrix multiplication (MMM) algorithm was developed. In addition, a trace technique was used to reduce the computational burden on the proposed architecture. High throughput was achieved by employing a pipelined design. The scope of the architecture was explored by estimating the parameters of a servo position control system. No vendor dependent modules were used in this design. The RLS algorithm was mapped to a Xilinx FPGA Virtex-5 device. The entire architecture operates at a maximum frequency of 339.156 MHz. Compared to earlier work, the hardware utilization was substantially reduced. An application-specific integrated circuit (ASIC) design was implemented in 180 nm technology with the Cadence RTL compiler.  相似文献   

17.
提出了一种基于光场的偏振图像快速提取及实时处理方法。以多核DSP TMS320C6678为核心处理器,实现了从光场图像的采集到偏振图像的提取以及处理等一系列连续过程。对系统硬件以及软件设计进行了详细介绍。系统采用DSP+FPGA的方式,其中FPGA模块实现cameralink相机接口以及图像采集,DSP模块实现图像的快速处理算法。光场图像采集、偏振图像提取及偏振信息反演及融合算法和软件的优化是保证系统高效工作的关键部分,并且进行了重点讨论,提出了相应的解决方案。实验结果表明,系统实现了多核DSP并行运算处理,比单核DSP运算速度提高4倍左右。  相似文献   

18.
The European Space Agency’s Gaia mission will create the largest and most precise three dimensional chart of our galaxy (the Milky Way), by providing unprecedented position, parallax, proper motion, and radial velocity measurements for about one billion stars. The resulting catalog will be made available to the scientific community and will be analyzed in many different ways, including the production of a variety of statistics. The latter will often entail the generation of multidimensional histograms and hypercubes as part of the precomputed statistics for each data release, or for scientific analysis involving either the final data products or the raw data coming from the satellite instruments.  相似文献   

19.
冗余并联机器人具有冗余容错能力,能在局部故障的情况下继续工作.而设备的机械间隙无法完全避免,并且较难建立精准的机械间隙模型,因此基于运动学冗余的常规容错方法较难实现精准容错控制.基于生理止血调控机制,考虑冗余并联机器人控制特性和机械间隙因素,提出一种新颖的精准容错控制器.与止血机制类似,该控制器不但能够进行子通道误差优化,而且能进行全局故障辨识和容错补偿.利用2-DOF冗余并联机器人进行了真实实验.结果表明,提出的精准容错控制器的控制精度和容错能力均比传统控制器有较大提高.  相似文献   

20.
分布式并行数据库系统中节点信息的动态管理和维护   总被引:1,自引:0,他引:1  
陈建英  左朝树  王莉 《计算机应用》2005,25(9):2000-2003
针对分布式并行数据库系统DPSQL中节点状态变化的随机性和不可预测性,提出节点管理和维护的动态解决方案。该方案通过在DPSQL运行过程中对活动节点信息表正确性和一致性的维护实现,保证了各服务器节点间通信的正确性和节点内部各子系统间的默契配合,从而保证了DPSQL的高效服务。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号