Similar literature
 Found 20 similar documents (search time: 0 ms)
1.
Programming a hypercube multicomputer   Cited by: 1 (self-citations: 0; citations by others: 1)
Ranka, S.; Won, Y.; Sahni, S. Software, IEEE, 1988, 5(5): 69-77
The issues encountered by programmers are explored. A brief overview of parallel architectures is followed by an example problem of image-template matching. The programming considerations for this problem are discussed.

2.
Squared error clustering algorithms for single-instruction, multiple-data (SIMD) hypercubes are presented. The algorithms are shown to be asymptotically faster than previously known algorithms and to require less memory per processing element (PE). For a clustering problem with N patterns, M features per pattern, and K clusters, the algorithms complete in O(K + log NM) steps on NM-processor hypercubes. This is optimal up to a constant factor. These results are extended to the case in which NMK processors are available. Experimental results from a multiple-instruction, multiple-data (MIMD) medium-grain hypercube are also presented.

3.
The reliability of processors is an important issue in designing a massively parallel processing system for which fault-tolerant computing is crucial. To achieve high system reliability and availability, a faulty processor (node), once found, should be replaced by a fault-free processor. Within a multiprocessor system, the technique of identifying faulty nodes by constructing tests on the nodes and interpreting the test outcomes is known as system-level diagnosis. The topological structure of a multicomputer system can be modeled by a graph whose vertices and edges correspond to the nodes and links of the system, respectively. This work presents a system-level diagnosis algorithm for a generalized hypercube, which is an attractive variant of a hypercube. The proposed algorithm is based on the PMC model and can isolate all faulty nodes to within a set that contains at most one fault-free node. If the total number of nodes to be diagnosed in a generalized hypercube is N, the proposed algorithm runs in O(N log N) time. It improves on Yang's algorithm, proposed in 2004, in that it can diagnose not only a hypercube but also a generalized hypercube.

4.
This paper describes a system-level diagnosis algorithm for hypercube multicomputer systems. The algorithm is based on the PMC model and can isolate all faulty processors to within a set that contains at most one fault-free processor. If we denote by N the total number of processors in a hypercube system to be diagnosed, then, based on the judiciously designed data structures, the algorithm can run in O(N log_2 N) time, whereas the best-known diagnosis algorithm, the YML algorithm, runs in O(N^2.5) time. Consequently, the new algorithm is remarkably superior to the YML algorithm in terms of time cost.

5.
The implementation of Lee's maze routing algorithm on an MIMD hypercube multiprocessor computer can follow several plausible mappings and synchronization strategies. These are evaluated experimentally on an NCUBE/7 hypercube computer with 64 processors. Different grid partitioning and mapping strategies result in a different balance between computation and communication time. The total routing time is significantly impacted by the synchronization and termination detection scheme used. Further, by rearranging the computation, it is possible to overlap much of the interprocessor communication with the computation and realize a significant reduction in the overall run time. By choosing the right partitioning and synchronization scheme and by overlapping computation and communication, a good speedup is obtained on large routing grids.
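
For orientation, here is a minimal sequential sketch of Lee's maze routing (wave expansion followed by backtrace) in Python. It is not the hypercube mapping evaluated in the abstract: the grid encoding (truthy means blocked), the 4-connected neighbourhood, and the name `lee_route` are assumptions made for illustration.

```python
from collections import deque


def lee_route(grid, start, goal):
    """Return a shortest 4-connected path from start to goal, or None.

    grid[r][c] is truthy for a blocked cell (a hypothetical encoding).
    """
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0}
    queue = deque([start])

    # Wave expansion: label each reachable free cell with its distance.
    while queue:
        cell = queue.popleft()
        if cell == goal:
            break
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and not grid[nr][nc] and (nr, nc) not in dist:
                dist[(nr, nc)] = dist[cell] + 1
                queue.append((nr, nc))

    if goal not in dist:
        return None

    # Backtrace: step from the goal to any neighbour whose label is one less.
    path = [goal]
    while path[-1] != start:
        r, c = path[-1]
        for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if dist.get(nb) == dist[path[-1]] - 1:
                path.append(nb)
                break
    path.reverse()
    return path
```

In a parallel version, the grid would be partitioned across processors and the wavefront exchanged at partition boundaries, which is where the synchronization and termination-detection choices discussed in the abstract come into play.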

6.
Multicomputer cache simulation results derived from address traces collected from an Intel iPSC/2 hypercube multicomputer are presented. The primary emphasis is on examining how increasing the number of processor nodes executing a parallel application affects the overall multicomputer cache performance. The effects on multicomputer direct-mapped cache performance of application-specific data partitioning, data access patterns, communication distribution, and communication frequency are illustrated. The effects of system accesses on total cache performance are explored, as well as the reasons for application-specific differences in cache behavior for system and user accesses. Comparing user code results with full user and system code analysis reveals the significant effect of system accesses, and this effect increases with multicomputer size. The time distribution of an application's message-passing operations is found to affect cache performance more strongly than the total amount of time spent in message-passing code.

7.
Optimizing large join queries that consist of many joins has been recognized as NP-hard. Most of the previous work focuses on a uniprocessor environment. In a multiprocessor, the location of each join adds another dimension to the complexity of the problem. In this paper, we examine the feasibility of exploiting the inherent parallelism in optimizing large join queries on a hypercube multiprocessor. This includes using the multiprocessor not only to answer the large join query but also to optimize it. We propose an algorithm to estimate the cost of a parallel large join plan. Three heuristics are provided for generating an initial solution, which is further optimized by an iterative local-improvement method. The entire process of parallel query optimization and execution is simulated on an Intel iPSC/2 hypercube machine. Our experimental results show that the performance of each heuristic depends on the characteristics of the query.

8.
Discretisation, as one of the basic data preparation techniques, has played an important role in data mining. This article introduces a new hypercube division-based (HDD) algorithm for supervised discretisation. The algorithm considers the distribution of both class and continuous attributes and the underlying correlation structure in the data set. It tries to find a minimal set of cut points, which divides the continuous attribute space into a finite number of hypercubes, and the objects within each hypercube belong to the same decision class. Finally, tests are performed on seven mix-mode data sets, and the C5.0 algorithm is used to generate classification rules from the discretised data. Compared with the other three well-known discretisation algorithms, the HDD algorithm can generate a better discretisation scheme, which improves the accuracy of classification and reduces the number of classification rules.

9.
The performance evaluation, workload characterization, and trace-driven simulation of a hypercube multicomputer running realistic workloads are presented. Eleven representative parallel applications were selected as benchmarks. Software monitoring techniques were then used to collect execution traces. Based on the measurement results, both the computation and communication behavior of these parallel programs were investigated. The various time interval distributions were modeled by statistical functions, which were verified by a nonlinear regression technique using the empirical data. The temporal and spatial localities of message destinations were also studied. A model for the temporal locality of message length was introduced and used to analyze the communication traces. A trace-driven simulation environment, which uses the communication patterns of the parallel programs as inputs, was developed to study the behavior of the communication hardware under real workloads. Simulation results on DMA and link utilizations are reported.

10.
We present a new parallel algorithm for computing N-point Lagrange interpolation on an n-dimensional hypercube with a total of p = 2^n nodes. Initially, we consider the case N = p. The algorithm is then extended to the case in which only a fixed number p of processors is available, with p < N. We assume that N is exactly divisible by p. By dividing the hypercube into subcubes of dimension two, we compute the products and sums appearing in Lagrange's formula in a novel way such that wasteful repetitions of forming products are avoided. The speedup and efficiency of our algorithm are calculated both theoretically and by simulating it over a network of PCs.
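
As a reference point for the products and sums the abstract refers to, Lagrange's formula can be evaluated sequentially as sketched below in Python. This is the textbook formula, not the paper's subcube scheme; the function name `lagrange_interpolate` and the use of exact `Fraction` arithmetic are assumptions made for illustration.

```python
from fractions import Fraction


def lagrange_interpolate(xs, ys, x):
    """Evaluate at x the polynomial through the points (xs[i], ys[i]).

    Implements L(x) = sum_i ys[i] * prod_{j != i} (x - xs[j]) / (xs[i] - xs[j]).
    The xs must be pairwise distinct.
    """
    x = Fraction(x)
    total = Fraction(0)
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = Fraction(yi)
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - Fraction(xj)) / (Fraction(xi) - Fraction(xj))
        total += term
    return total
```

Each of the N terms recomputes a product of N - 1 factors, so this direct evaluation costs O(N^2) operations; sharing partial products between terms is the kind of repetition a parallel scheme would try to avoid.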

11.
The M-Machine is an experimental multicomputer being developed to test architectural concepts motivated by the constraints of modern semiconductor technology and the demands of programming systems. The M-Machine computing nodes are connected with a 3-D mesh network; each node is a multithreaded processor incorporating 9 function units, on-chip cache, and local memory. The multiple function units are used to exploit both instruction-level and thread-level parallelism. A user-accessible message-passing system yields fast communication and synchronization between nodes. Rapid access to remote memory is provided transparently to the user with a combination of hardware and software mechanisms. This paper presents the architecture of the M-Machine and describes how its mechanisms attempt to maximize both single-thread performance and overall system throughput. The architecture is complete and the MAP chip, which will serve as the M-Machine processing node, is currently being implemented.

12.
Given a set of nodes in a distributed system, a coterie is a collection of subsets of the set of nodes such that any two subsets have a nonempty intersection and are not properly contained in one another. A subset of nodes in a coterie is called a quorum. An algorithm, called the join algorithm, which takes nonempty coteries as input and returns a new, larger coterie called a composite coterie, is introduced. It is proved that a composite coterie is nondominated if and only if the input coteries are nondominated. Using the algorithm, dominated or nondominated coteries may be easily constructed for a large number of nodes. An efficient method for determining whether a given set of nodes contains a quorum of a composite coterie is presented. As an example, tree coteries are generalized using the join algorithm, and it is proved that tree coteries are nondominated. It is shown that the join algorithm may be used to generate read and write quorums for use by a replica control protocol.
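
The two defining properties of a coterie stated above (pairwise intersection and no proper containment) can be checked directly. The Python sketch below covers only that definition, not the join algorithm itself; the function name `is_coterie` and the representation of quorums as iterables of node identifiers are assumptions made for illustration.

```python
from itertools import combinations


def is_coterie(quorums):
    """Check the coterie properties for a collection of node subsets.

    Every quorum must be nonempty, every pair must intersect, and no
    quorum may be a proper subset of another.
    """
    sets = set(frozenset(q) for q in quorums)  # ignore repeated quorums
    if any(not q for q in sets):
        return False
    for a, b in combinations(sets, 2):
        if not (a & b):
            return False  # intersection property violated
        if a < b or b < a:
            return False  # minimality property violated
    return True
```

For example, the three two-node majorities of a three-node system, `[{1, 2}, {2, 3}, {1, 3}]`, pass the check, while `[{1}, {2}]` fails the intersection property and `[{1}, {1, 2}]` fails minimality.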

13.
In this paper, we first describe a model for mapping the backpropagation artificial neural net learning algorithm onto a massively parallel computer architecture with a 2D-grid communications network. We then show how this model can be sped up by hypercube inter-processor connections that provide logarithmic time segmented parallel prefix operations. This approach can serve as a general model for implementing algorithms for layered neural nets on any massively parallel computers that have 2D-grid or hypercube communication networks.

We have implemented this model on the Connection Machine CM-2, a general-purpose, massively parallel computer with a hypercube topology. Initial tests show that this implementation offers about 180 million interconnections per second (IPS) for feed-forward computation and 40 million weight updates per second (WUPS) for learning. We use our model to evaluate this implementation: which machine-specific features have helped improve the performance and where further improvements can be made.


14.
In this paper, we consider the problem of nonpreemptively scheduling independent jobs so as to minimize overall finish time on an m-dimensional hypercube system. This problem is NP-hard. We propose a polynomial time approximation algorithm and prove that the absolute performance ratio of the algorithm does not exceed 1.875. This is the first algorithm achieving an absolute performance ratio less than two by a constant.

15.
16.
Several problems modeled by dynamic programming have been solved using a coarse-grain multicomputer parallel model (CGM for short). These problems use either polyadic dynamic programming or monadic non-serial dynamic programming. In this paper, we address the general case: we propose a parallel algorithm in the CGM model with p processors for the Optimal String Parenthesizing Problem or Minimum Cost Parenthesizing Problem, which is a typical polyadic non-serial dynamic programming problem. The algorithm we obtain requires ⌈(2p)^(1/2)⌉ communication rounds and, at most, O(n^3/p) time-steps on p processors. This new CGM algorithm performs better than the previously most efficient solution, which uses p communication rounds.
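
To make the dynamic-programming structure concrete, the Python sketch below solves a minimum cost parenthesizing instance sequentially, using matrix-chain multiplication as the cost model. That cost model, the function name `min_parenthesization_cost`, and the `dims` encoding are assumptions made for illustration; this is the standard sequential recurrence, not the CGM algorithm of the paper.

```python
def min_parenthesization_cost(dims):
    """Minimum scalar-multiplication cost of a matrix chain.

    Matrix i has shape dims[i] x dims[i + 1], so len(dims) - 1 matrices.
    """
    n = len(dims) - 1
    if n <= 1:
        return 0  # zero or one matrix: nothing to multiply
    # cost[i][j] is the optimal cost of multiplying matrices i..j.
    cost = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):  # solve shorter chains first
        for i in range(n - length + 1):
            j = i + length - 1
            cost[i][j] = min(
                cost[i][k] + cost[k + 1][j] + dims[i] * dims[k + 1] * dims[j + 1]
                for k in range(i, j)  # try every split point
            )
    return cost[0][n - 1]
```

The table has O(n^2) entries and each takes O(n) work, giving O(n^3) sequential time, which is consistent with the O(n^3/p) time-steps on p processors quoted in the abstract. Each entry depends on entries in the same row and the same column of the table, which is what makes the problem polyadic and non-serial.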

17.
The paper deals with the implementation of global time in multicomputer systems. After a formalization of the synchronization problem, techniques to estimate the synchronization delay and to compensate for the drift error are proposed. Then SYNC_WAVE, a clock synchronization algorithm in which the values of a reference clock are diffused in a wave-like manner, is described. SYNC_WAVE has no provision for fault tolerance and is specially designed to introduce low CPU and communication overhead, in order to support performance analysis applications efficiently. An implementation of the devised algorithm in a transputer-based system is presented, showing the accuracy results obtained. Finally, SYNC_WAVE is compared to other synchronization algorithms and several of its possible applications are suggested.

18.
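The spatial join is one of the most important and most time-consuming operations in spatial databases. Building on the BFRJ algorithm, this work studies OBFRJ, a spatial join algorithm that optimizes the ordering of the intermediate join index. The algorithm traverses two R-trees synchronously in breadth-first order and sorts the generated intermediate join index using a space-filling curve, which reduces the number of page faults incurred when joining at the next level. Experimental results show that the algorithm has fewer disk accesses and a lower CPU cost than both the DFRJ and BFRJ algorithms.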

19.
A novel algorithm for relational equijoin is presented. The algorithm is a modification of merge join, but promises superior performance for medium-size inputs. In many cases it even compares favorably with both merge join and hybrid hash join, which is shown using analytic cost functions.

20.
This paper presents a parallel distributive join algorithm for cube-connected multiprocessors. The performance analysis shows that the proposed algorithm has an almost linear speedup over the sequential distributive join algorithm as the number of processors increases, and its performance is comparable to that of the parallel hybrid-hash join algorithm. A big advantage of the proposed algorithm over hash-based join algorithms is that it does not have the bucket overflow problem caused by nonuniform hashing of the smaller operand relation. Moreover, the proposed algorithm can easily support the nonequijoin operation, which is very hard to implement by using hash-based join algorithms.
