Found 20 similar documents; search took 15 ms.
1.
This paper presents several algorithms for solving problems on massively parallel SIMD hypercube and shuffle-exchange computers. The algorithms solve a wide variety of problems, but they are related because they all use a common strategy. Specifically, all of the algorithms use a divide-and-conquer approach to solve a problem with N inputs on a parallel computer with P processors. The structural properties of each problem are exploited to ensure that fewer than N data items are communicated during the division and combination steps of the divide-and-conquer algorithm. This reduction in the amount of data that must be communicated is central to the efficiency of the algorithms. The paper addresses four problems, namely the multiple-prefix, data-dependent parallel-prefix, image-component-labeling, and closest-pair problems. The algorithms presented for the data-dependent parallel-prefix and closest-pair problems are the fastest known when N P, and the algorithms for the multiple-prefix and image-component-labeling problems are the fastest known when N is sufficiently large with respect to P. This work was supported in part by an NSF Graduate Fellowship.
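The shared divide-and-conquer pattern is easiest to see on the parallel-prefix problem. Below is a minimal sequential sketch of that recursion (illustrative only; the paper's algorithms run it across hypercube or shuffle-exchange processors):

```python
def prefix_sums(xs, op=lambda a, b: a + b):
    """Divide-and-conquer (inclusive) prefix computation.

    Each half is scanned independently (in parallel on a real machine);
    the combine step communicates a single value -- the total of the
    left half -- rather than all N items.
    """
    n = len(xs)
    if n == 1:
        return xs[:]
    mid = n // 2
    left = prefix_sums(xs[:mid], op)    # done by one half of the processors
    right = prefix_sums(xs[mid:], op)   # done by the other half
    carry = left[-1]                    # only one item crosses the divide
    return left + [op(carry, r) for r in right]

print(prefix_sums([3, 1, 4, 1, 5]))  # [3, 4, 8, 9, 14]
```

The property the paper exploits is visible in the combine step: fewer than N data items cross the division boundary.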
2.
Doruk Bozdağ, Assefaw H. Gebremedhin, Fredrik Manne, Erik G. Boman, Umit V. Catalyurek 《Journal of Parallel and Distributed Computing》2008
We present a scalable framework for parallelizing greedy graph coloring algorithms on distributed-memory computers. The framework unifies several existing algorithms and blends a variety of techniques for creating or facilitating concurrency, including exploiting features of the initial data distribution, speculative coloring with randomization, and a BSP-style organization of computation and communication. We experimentally study the performance of several specialized algorithms designed using the framework and implemented with MPI. The experiments are conducted on two different platforms, and the test cases include large synthetic graphs as well as real graphs drawn from various application areas. Computational results show that, by setting the parameters of the framework in accordance with the size and structure of the graph being colored, one can obtain implementations that yield good speedup while using about the same number of colors as a sequential greedy algorithm. Our implementation is freely available as part of the Zoltan parallel data management and load-balancing library.
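The speculative-coloring ingredient can be sketched as a round-based simulation. This is a sequential illustration under assumed inputs (`adj` as adjacency sets, `parts` as a vertex partition), not the authors' MPI framework:

```python
import random

def speculative_coloring(adj, parts, seed=0):
    """Round-based speculative greedy coloring (sequential simulation;
    the real framework runs one MPI process per partition).

    Each partition greedily colors its vertices against a snapshot of
    the other partitions' colors; conflicts on cross-partition edges are
    detected afterwards, and the lower-priority endpoint of each
    conflicting edge is recolored in the next round."""
    rng = random.Random(seed)
    priority = {v: rng.random() for v in adj}       # randomization step
    owner = {v: p for p, vs in enumerate(parts) for v in vs}
    color, uncolored = {}, set(adj)
    while uncolored:
        snapshot = dict(color)                      # view at round start
        new = {}
        for vs in parts:                            # one BSP-style superstep
            local = {}
            for v in sorted(vs & uncolored):        # greedy within a partition
                used = {local[u] for u in adj[v] if u in local}
                used |= {snapshot[u] for u in adj[v] if u in snapshot}
                local[v] = min(c for c in range(len(adj) + 1) if c not in used)
            new.update(local)
        color.update(new)
        conflicts = set()                           # cross-partition check
        for v in new:
            for u in adj[v]:
                if owner[u] != owner[v] and color.get(u) == color[v]:
                    conflicts.add(min(u, v, key=lambda w: priority[w]))
        for v in conflicts:
            del color[v]
        uncolored = conflicts
    return color
```

Progress is guaranteed because the highest-priority endpoint of every conflict keeps its color, so each round permanently colors at least one vertex.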
3.
In this paper we discuss the implementation of an ADI method for solving the diffusion equation on three parallel/vector computers, chosen so as to encompass a variety of architectures: the MPP, an SIMD machine with 16K bit-serial processors; the Flex/32, an MIMD machine with 20 processors; and the Cray/2, an MIMD machine with four vector processors. The Gaussian elimination algorithm is used to solve the resulting sets of tridiagonal systems on the Flex/32 and Cray/2, while the cyclic elimination algorithm is used on the MPP. The implementation of the method is discussed in relation to these architectures, and performance measurements on each machine are given. Simple performance models are used to describe the performance; these models highlight the bottlenecks and limiting factors of this algorithm on these architectures. Finally, conclusions are presented.
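For reference, a Peaceman–Rachford-style ADI step for the 2D diffusion equation can be sketched serially; the tridiagonal sweeps below use Gaussian elimination (the Thomas algorithm), as on the Flex/32 and Cray/2. Grid size, boundary conditions, and the parameter `r` are illustrative assumptions, not the paper's setup:

```python
import numpy as np

def thomas(a, b, c, d):
    """Gaussian elimination for a tridiagonal system
    a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i]."""
    n = len(d)
    cp, dp = np.zeros(n), np.zeros(n)
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = np.zeros(n)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

def adi_step(u, r):
    """One ADI step for u_t = D*(u_xx + u_yy), with r = D*dt/(2*dx*dx):
    half-step implicit in x / explicit in y, then the reverse.
    Zero Dirichlet boundaries; only interior values are updated."""
    n = u.shape[0]
    m = n - 2
    a = np.full(m, -r); b = np.full(m, 1 + 2 * r); c = np.full(m, -r)
    half = u.copy()
    for j in range(1, n - 1):              # implicit sweeps along x
        rhs = u[1:-1, j] + r * (u[1:-1, j - 1] - 2 * u[1:-1, j] + u[1:-1, j + 1])
        half[1:-1, j] = thomas(a, b, c, rhs)
    new = half.copy()
    for i in range(1, n - 1):              # implicit sweeps along y
        rhs = half[i, 1:-1] + r * (half[i - 1, 1:-1] - 2 * half[i, 1:-1]
                                   + half[i + 1, 1:-1])
        new[i, 1:-1] = thomas(a, b, c, rhs)
    return new
```

Each sweep is an independent tridiagonal solve, which is what makes ADI attractive to parallelize across rows or columns of the grid.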
4.
In this note, some comments are given on the paper: D.J. Evans and Shao-wen Mai, Two parallel algorithms for the convex hull problem in a two dimensional space, Parallel Computing 2 (1985) 313–326.
5.
Two parallel algorithms for determining the convex hull of a set of points in two-dimensional space are presented. Both are suitable for MIMD parallel systems. The first is based on a divide-and-conquer strategy: some simplest convex hulls are generated first, and the final convex hull of all points is then obtained by repeatedly merging pairs of sub-hulls. The second algorithm works by picking up the points that are necessarily in the convex hull and discarding the points that are definitely not in it. Experimental results on a MIMD parallel system with 4 processors are analysed and presented.
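The discarding test of the second strategy can be illustrated as follows: points strictly inside the quadrilateral spanned by the extreme points in x and y can never be hull vertices, so each processor could safely drop them from its share. This is a sketch of the idea, not the paper's implementation:

```python
import math

def cross(o, a, b):
    """Z-component of (a-o) x (b-o); positive if o->a->b turns left."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def filter_interior(points):
    """Keep only points that might still lie on the convex hull."""
    ex = [min(points), max(points),                 # extremes in x
          min(points, key=lambda p: (p[1], p[0])),  # extremes in y
          max(points, key=lambda p: (p[1], p[0]))]
    cx = sum(p[0] for p in ex) / len(ex)
    cy = sum(p[1] for p in ex) / len(ex)
    # order the (up to 4 distinct) extremes counter-clockwise
    quad = sorted(set(ex), key=lambda p: math.atan2(p[1] - cy, p[0] - cx))
    def strictly_inside(p):
        return all(cross(quad[i], quad[(i + 1) % len(quad)], p) > 0
                   for i in range(len(quad)))
    return [p for p in points if not strictly_inside(p)]
```

Points on the quadrilateral's boundary are kept, since the strict test only discards points that are certainly interior.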
6.
Francisco Heron de Carvalho Junior Cenez Araújo de Rezende 《Journal of Parallel and Distributed Computing》2013
Component-oriented programming has been applied to address the requirements of large-scale applications from computational sciences and engineering that present high performance computing (HPC) requirements. However, parallelism continues to be a challenging requirement in the design of CBHPC (Component-Based High Performance Computing) platforms. This paper presents strong evidence for the efficacy and efficiency of HPE (Hash Programming Environment), a CBHPC platform that provides full support for parallel programming, in the development, deployment and execution of numerical simulation code on cluster computing platforms.
7.
Performance modeling for large industrial or scientific codes is of value for program tuning or for selection of new machines when benchmarking is not yet possible. We discuss an empirical method of estimating runtime for certain large parallel programs where computational work is estimated by regression functions based on measurements and time cost of communication is modeled by program analysis and benchmarks for communication primitives. The method is demonstrated with the local weather model (LM) of the German Weather Service (DWD) on SP-2, T3E, and SX-4. The method is an economic way of developing performance models because only a moderate number of measurements is required. The resulting model is sufficiently accurate even for very large test cases.
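The computational half of such a model amounts to a regression over measured runs. A toy version with hypothetical timing data and an assumed model form T(n, p) ≈ t0 + w·n/p + s·log2(p) (the actual LM model and measurements are far more elaborate):

```python
import numpy as np

# Hypothetical measurements: (grid points n, processors p, seconds).
runs = [(1e5, 4, 12.2), (2e5, 4, 23.4), (1e5, 8, 6.7),
        (4e5, 8, 23.6), (2e5, 16, 6.9), (4e5, 16, 12.6)]

# Design matrix for T(n, p) ~ t0 + w*(n/p) + s*log2(p):
# startup cost + per-point computational work + communication term.
A = np.array([[1.0, n / p, np.log2(p)] for n, p, _ in runs])
t = np.array([T for *_, T in runs])
(t0, w, s), *_ = np.linalg.lstsq(A, t, rcond=None)

def predict(n, p):
    """Extrapolate runtime to configurations that were never measured."""
    return t0 + w * n / p + s * np.log2(p)
```

Only a moderate number of measured runs is needed to pin down the few regression coefficients, which is what makes the approach economical.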
8.
As is well known, the strategy of divide-and-conquer is widely used in problem solving, and partitioning is likewise a fundamental strategy in the design of parallel algorithms. The problem of enumerating the spanning trees of a graph arises in several contexts, such as computer-aided design and computer networks. A parallel algorithm for solving the problem is presented in this paper. It is based on the principle of inclusion and exclusion of sets, rather than directly on a partitioning of the graph itself. The results of preliminary experiments on a MIMD system appear promising.
9.
The paper deals with the workspace and dynamic performance evaluation of the PRR–PRR parallel manipulator in spray-painting equipment. The functional workspace of planar fully parallel robots is often limited because of interference among their mechanical components. The proposed 3-DOF planar parallel manipulator, with two kinematic chains connecting the moving platform to the base, can reduce interference while still maintaining 3 DOFs. Based on the kinematics, four working modes are analyzed and singularity is studied. The workspace is investigated and the inverse dynamics is formulated using the virtual work principle. The dynamic performance evaluation indices are designed on the basis of the maximum and minimum magnitudes of the acceleration vector of the moving platform produced by a unit actuated force. The index can not only evaluate the accelerating performance of a manipulator but also reflect the isotropy of that performance. The workspace and dynamic performances of the four working modes are compared, and the optimal working mode for the painting of a large object with a conical surface is determined.
10.
11.
C. De Stefano, A. Della Cioppa, A. Marcelli 《Pattern recognition》2002,35(5):1025-1037
In this paper we propose a method for evaluating the performance of an evolutionary learning system aimed at producing the optimal set of prototypes to be used by a handwriting recognition system. The trade-off between generalization and specialization embedded into any learning process is managed by iteratively estimating both consistency and completeness of the prototypes, and by using such an estimate for tuning the learning parameters in order to achieve the best performance with the smallest set of prototypes. Such estimation is based on a characterization of the behavior of the learning system, and is accomplished by means of three performance indices. Both the characterization and the indices do not depend on either the system implementation or the application, and therefore allow for a truly black-box approach to the performance evaluation of any evolutionary learning system.
12.
Abdel-Elah 《Performance Evaluation》2005,60(1-4):223-236
In this paper we investigate the performance of distributed heuristic search methods based on a well-known heuristic search algorithm, iterative deepening A* (IDA*). The contribution of this paper includes proposing and assessing a distributed algorithm for IDA*. The assessment is based on space, time and solution quality, quantified in terms of several performance parameters such as generated search space and real execution time. The experiments are conducted on a cluster computer system consisting of 16 hosts built around a general-purpose network. The objective of this research is to investigate the feasibility of cluster computing as an alternative for hosting applications requiring intensive graph search. The results reveal that cluster computing improves on the performance of IDA* at a reasonable cost.
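Serial IDA* itself is compact enough to sketch. The version below searches an explicit weighted graph with an assumed admissible heuristic table `h`; the paper's contribution lies in distributing these bounded depth-first passes over cluster hosts:

```python
import math

def ida_star(graph, h, start, goal):
    """Iterative deepening A*: depth-first searches bounded by f = g + h;
    each pass raises the bound to the smallest f value that exceeded it,
    so memory use stays linear in the search depth."""
    def dfs(node, g, bound, path):
        f = g + h[node]
        if f > bound:
            return f, None                   # report the overshoot value
        if node == goal:
            return g, list(path)             # g is the path cost
        smallest = math.inf
        for nxt, w in graph[node]:
            if nxt in path:                  # no cycles on the current path
                continue
            path.append(nxt)
            t, found = dfs(nxt, g + w, bound, path)
            path.pop()
            if found is not None:
                return t, found
            smallest = min(smallest, t)
        return smallest, None

    bound = h[start]
    while True:
        t, found = dfs(start, 0, bound, [start])
        if found is not None:
            return found, t                  # (path, cost)
        if t == math.inf:
            return None, math.inf            # goal unreachable
        bound = t
```

Because each pass is an independent depth-first search under a cost bound, the generated search space is the natural unit to partition among distributed workers.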
13.
This paper describes some preliminary results about a workload characterization study at a national supercomputer center. This study is part of a larger project to develop an analytic methodology for the construction of a representative benchmark set. Our study reports on 292,254 user jobs during the period from June 1992 to July 1993 and represents about one-half of the wall clock time available on the four-processor CRAY Y-MP at the National Center for Supercomputing Applications during this time. The performance statistics we gathered in some sense define the workload because of the large fraction of the available time that was recorded. These statistics will allow us to reverse engineer a set of benchmarks that analytically represents the workload at this site.
14.
Code scalability, crucial on any parallel system, determines how well parallel code avoids becoming a bottleneck as its host computer is made larger. The scalability of computer code can be estimated by statistically designed experiments that empirically approximate a multivariate Taylor expansion of the code's execution response function. Each suspected code bottleneck corresponds to a first-order term in the expansion, the coefficient for that term indicating how sensitive execution is to changes in the suspect location. However, it is the expansion coefficients for second-order interactions between code segments and the number of processors that are fundamental to discovering which program elements impede parallel speedup. A new, unified view of these second-order coefficients yields an informal relative scalability test of high utility in code development. Discussion proceeds through actual examples, including a straightforward illustration of the test applied to SLALOM, a complex, multiphase benchmark. A quick graphical shortcut makes the scalability test readily accessible.
15.
16.
Li He, Furong Li 《International Journal of Parallel, Emergent and Distributed Systems》2016,31(1):34-46
In high performance computers, a popular interconnection network, the folded hypercube (FHC), possesses smaller diameter, larger connectivity, and better reliability and fault-tolerance capability than its hypercube counterpart. This paper addresses the fault identification of FHC multiprocessor interconnection systems under the MM* model. The pessimistic one-step diagnosability of FHC networks is first determined. On this basis, a pessimistic one-step diagnosis algorithm tailored for FHC multiprocessor systems is proposed. The presented algorithm can isolate all faulty nodes to within a set containing at most one fault-free node, and runs in linear time.
17.
A. R. Hurson 《International journal of parallel programming》1984,13(6):491-508
Recent developments in VLSI technology have made it possible to design and implement smaller, cheaper, faster, and more reliable computer components. In addition, these advances have been a major driving force behind the design and development of algorithms suitable for VLSI implementation. The finite state automaton, its concept, computational power, and applications have been under investigation since the early 1960s. Most of these studies have addressed the deterministic model of the finite state automaton, with no degree of parallelism in the operations. This paper introduces a hardware architecture and VLSI design of a parallel finite state model, and addresses the time and space complexities of the proposed organization. Moreover, the paper presents a performance evaluation of the proposed architecture as a special-purpose hardware recognizer capable of performing pattern matching and text retrieval operations. The proposed model simulates a parallel finite state automaton using a number of identical and independent units called CELLs, which have associative capability. Each cell represents a state and the transitions out of that state.
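In software, the cell-per-state organization corresponds to bit-parallel NFA simulation, where each bit of a machine word plays the role of one state cell and all states advance in a single word operation per input character. A Shift-And matcher illustrates the concept (an illustration only, not the paper's VLSI design):

```python
def shift_and_search(text, pattern):
    """Bit-parallel NFA simulation (Shift-And). Bit i of `state` is 1
    iff the state 'the last i+1 text chars match the first i+1 pattern
    chars' is active; every state steps in one word operation."""
    m = len(pattern)
    mask = {}
    for i, ch in enumerate(pattern):        # per-character state masks
        mask[ch] = mask.get(ch, 0) | (1 << i)
    state, hits = 0, []
    for pos, ch in enumerate(text):
        state = ((state << 1) | 1) & mask.get(ch, 0)
        if state & (1 << (m - 1)):          # accepting "cell" is active
            hits.append(pos - m + 1)
    return hits

print(shift_and_search("abracadabra", "abra"))  # [0, 7]
```

The shift, or, and mask are exactly the kind of uniform, independent per-cell updates the paper realizes in hardware.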
18.
Giuseppe Romanazzi, Peter K. Jimack, Christopher E. Goodyer 《Advances in Engineering Software》2011,42(5):247-258
We propose a model for describing and predicting the parallel performance of a broad class of parallel numerical software on distributed memory architectures. The purpose of this model is to allow reliable predictions to be made for the performance of the software on large numbers of processors of a given parallel system, by only benchmarking the code on small numbers of processors. Having described the methods used, and emphasized the simplicity of their implementation, the approach is tested on a range of engineering software applications that are built upon the use of multigrid algorithms. Despite their simplicity, the models are demonstrated to provide both accurate and robust predictions across a range of different parallel architectures, partitioning strategies and multigrid codes. In particular, the effectiveness of the predictive methodology is shown for a practical engineering software implementation of an elastohydrodynamic lubrication solver.
19.
A mesh-vertex finite volume scheme for solving the Euler equations on triangular unstructured meshes is implemented on a MIMD (multiple instruction/multiple data stream) parallel computer. Three partitioning strategies for distributing the work load onto the processors are discussed, and issues pertaining to communication costs are also addressed. We find that the spectral bisection strategy yields the best performance. The performance of this unstructured computation on the Intel iPSC/860 compares very favorably with that on a one-processor CRAY Y-MP/1 and with an earlier implementation on the Connection Machine. The authors are employees of Computer Sciences Corporation. This work was funded under contract NAS 2-12961.
20.
We present a fast vector algorithm which solves tridiagonal linear equations by an optimum synthesis of the inherently recursive Gaussian elimination and the parallel, though more complex, cyclic reduction. The idea is to perform an incomplete cyclic reduction to bring the dimension of the tridiagonal system efficiently below a characteristic size n*, and then to solve the remaining system by Gaussian elimination. Extensive numerical experiments on the CYBER 205 and CRAY X-MP computers reveal a maximum vector speedup of 13 and show that n* reflects the architecture of the vector computer. The performance is further enhanced when a few right-hand sides are treated simultaneously.
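The hybrid can be sketched as follows; `n_star` stands in for the machine-dependent crossover size n* (a serial illustration of the numerics, not the CYBER 205/CRAY X-MP vector code):

```python
def thomas(a, b, c, d):
    """Gaussian elimination for a tridiagonal system (the recursive part)."""
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

def hybrid_solve(a, b, c, d, n_star=4):
    """Incomplete cyclic reduction down to size <= n_star, then Gaussian
    elimination. Each reduction level eliminates the odd-indexed
    unknowns at once (the vectorizable part); the small remainder is
    solved directly."""
    n = len(d)
    if n <= n_star:
        return thomas(a, b, c, d)
    A, B, C, D = [], [], [], []
    for i in range(0, n, 2):                 # even-indexed equations survive
        al = -a[i] / b[i - 1] if i > 0 else 0.0
        be = -c[i] / b[i + 1] if i < n - 1 else 0.0
        A.append(al * a[i - 1] if i > 0 else 0.0)
        B.append(b[i] + (al * c[i - 1] if i > 0 else 0.0)
                      + (be * a[i + 1] if i < n - 1 else 0.0))
        C.append(be * c[i + 1] if i < n - 1 else 0.0)
        D.append(d[i] + (al * d[i - 1] if i > 0 else 0.0)
                      + (be * d[i + 1] if i < n - 1 else 0.0))
    xe = hybrid_solve(A, B, C, D, n_star)
    x = [0.0] * n
    x[0::2] = xe
    for i in range(1, n, 2):                 # back-substitute odd unknowns
        x[i] = (d[i] - a[i] * x[i - 1]
                - (c[i] * x[i + 1] if i < n - 1 else 0.0)) / b[i]
    return x
```

On a vector machine the loop over even equations at each level is the fully vectorizable part; once the system is shorter than n*, vector startup costs outweigh the gain and scalar Gaussian elimination finishes the job.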