The paper deals with the parallelization of Delaunay triangulation, a widely used space partitioning technique. Two parallel implementations of a three-dimensional incremental construction algorithm are presented. The first is based on decomposition of the spatial domain, while the second relies on the master-slaves approach. Both parallelization strategies are evaluated, with the emphasis on practical issues rather than theoretical complexity. We report on the exploitation of two different parallel environments: a tightly coupled distributed-memory MIMD architecture and a network of workstations co-operating under the Linda environment. A third, hybrid solution is then proposed, specifically designed to exploit a higher degree of parallelism. It combines the other two solutions by grouping the processing nodes of the multicomputer into clusters and exploiting parallelism at two different levels.
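
To make the domain-decomposition idea concrete, here is a minimal sketch in plain floating-point Python, assuming a simple slab split along the x axis (one possible decomposition, not necessarily the paper's) and showing the empty-circumsphere test at the heart of 3D incremental Delaunay construction. The function names are hypothetical; robust geometric predicates and the merging of tetrahedra that straddle slab boundaries are not shown.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def slab_partition(points: Sequence[Point], k: int) -> List[List[Point]]:
    """Split the points into k slabs of nearly equal size along the x axis."""
    ordered = sorted(points, key=lambda p: p[0])
    n = len(ordered)
    return [ordered[i * n // k:(i + 1) * n // k] for i in range(k)]


def _det3(m: List[List[float]]) -> float:
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def circumcenter(a: Point, b: Point, c: Point, d: Point) -> Point:
    """Centre of the sphere through four non-coplanar points.

    Solves 2 (q - a) . x = |q|^2 - |a|^2 for q in (b, c, d) by Cramer's rule.
    """
    rows = [[2.0 * (q[i] - a[i]) for i in range(3)] for q in (b, c, d)]
    rhs = [sum(q[i] ** 2 - a[i] ** 2 for i in range(3)) for q in (b, c, d)]
    det = _det3(rows)
    if det == 0.0:
        raise ValueError("degenerate (coplanar) tetrahedron")
    centre = []
    for col in range(3):
        m = [row[:] for row in rows]
        for r in range(3):
            m[r][col] = rhs[r]
        centre.append(_det3(m) / det)
    return (centre[0], centre[1], centre[2])


def in_circumsphere(tet: Sequence[Point], e: Point) -> bool:
    """Delaunay test: True if e lies strictly inside the circumsphere of tet."""
    cx = circumcenter(*tet)
    r2 = sum((tet[0][i] - cx[i]) ** 2 for i in range(3))
    return sum((e[i] - cx[i]) ** 2 for i in range(3)) < r2
```

Each processor would run the incremental construction on its own slab, so the circumsphere test is evaluated locally; only tetrahedra near slab boundaries need cross-processor work.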

This paper describes parallel implementations of the Gröbner bases algorithm and the characteristic sets method in a grid environment. The two algorithms, their parallelization, and their grid-enabled implementations are presented. The performance of the implementations has been evaluated, and the experiments demonstrate considerable speedups.

We describe the parallelization of a first-order logic theorem prover based on the hyper-linking proof procedure (HLPP). Four parallel schemes (process level, clause level, literal level, and flow level) are developed for two types of sequential HLPP implementation: list based and network based. The motivation for each parallel scheme is presented, and the architecture and implementation details of each scheme are described. Parallel-processing issues such as serialization and synchronization, load balancing, and access conflicts are examined. Speedups over the sequential implementations are obtained, and timing results for benchmark problems are provided.

The development of intelligent transportation systems (ITS), and the resulting need to solve a variety of dynamic traffic network models and management problems, requires faster-than-real-time computation of shortest paths in dynamic networks. Recently, a sequential algorithm was developed to compute shortest paths in discrete-time dynamic networks from all nodes and all departure times to one destination node. This algorithm, known as algorithm DOT, has an optimal worst-case running-time complexity, which implies that no algorithm with a better worst-case computational complexity can be discovered. Consequently, to obtain faster algorithms for all-to-one shortest path problems in dynamic networks, one must explore avenues other than the design of sequential algorithms alone. Using commercially available high-performance computing platforms to develop parallel implementations of sequential algorithms is one such avenue. This paper reports on the design, implementation, and computational testing of parallel dynamic shortest path algorithms. We develop two shared-memory and two message-passing dynamic shortest path algorithm implementations, derived from algorithm DOT using two parallelization strategies: decomposition by destination and decomposition by transportation network topology. The algorithms are coded for two types of parallel computing environment: a message-passing environment based on the Parallel Virtual Machine (PVM) library and a multi-threading environment based on the SUN Microsystems Multi-Threads (MT) library. We also develop a time-based parallel version of algorithm DOT for the case of minimum-time paths in FIFO networks, and a theoretical parallelization of algorithm DOT on an ‘ideal’ theoretical parallel machine.
The performance of the implementations is analyzed and evaluated using large transportation networks and two types of parallel computing platform: a distributed network of Unix workstations and a SUN shared-memory machine with eight processors. Satisfactory speedups over the running times of the sequential algorithms are achieved, in particular on shared-memory machines. Numerical results indicate that shared-memory computers are the most appropriate type of parallel computing platform for computing dynamic shortest paths in real-time ITS applications.
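
A sketch of the all-to-one computation and of decomposition by destination follows. It is a reconstruction of the general idea under stated assumptions, not the authors' code: travel times are integers of at least 1, they stay constant from the last time step of the horizon onward (so labels there come from a static Bellman-Ford pass), waiting at nodes is not allowed, and departure times are processed in decreasing order so every label a node needs is already final. `dot_all_to_one` and `all_destinations` are hypothetical names. Threads illustrate the independence of destinations; in CPython a real speedup for pure-Python work would need processes.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Arcs = Dict[Tuple[int, int], Callable[[int], int]]


def dot_all_to_one(n: int, arcs: Arcs, dest: int, horizon: int) -> List[List[float]]:
    """labels[t][i]: minimum travel time from node i, departing at t, to dest."""
    INF = math.inf
    last = horizon - 1
    # Assumption: from t = last onward, travel times are static (frozen at d(last)).
    static = [INF] * n
    static[dest] = 0
    for _ in range(n - 1):                      # Bellman-Ford on the static costs
        for (i, j), d in arcs.items():
            if static[j] + d(last) < static[i]:
                static[i] = static[j] + d(last)
    labels = [[INF] * n for _ in range(horizon)]
    labels[last] = static
    for t in range(last - 1, -1, -1):          # departure times in decreasing order
        labels[t][dest] = 0
        for (i, j), d in arcs.items():
            dt = d(t)                           # >= 1, so time t + dt is already final
            cand = dt + labels[min(last, t + dt)][j]
            if cand < labels[t][i]:
                labels[t][i] = cand
    return labels


def all_destinations(n: int, arcs: Arcs, horizon: int, workers: int = 4):
    """Decomposition by destination: each all-to-one problem is independent."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda q: dot_all_to_one(n, arcs, q, horizon), range(n)))
```

For example, with arcs `{(0, 2): lambda t: 5, (0, 1): lambda t: 1, (1, 2): lambda t: 1 if t >= 2 else 10}` and a horizon of 5, node 0 departing at time 0 takes the direct arc (cost 5), whereas departing at time 1 it goes through node 1 (cost 2).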

This paper compares two multi-threaded parallel implementations of the lattice Boltzmann method for non-uniform grids on different hardware platforms: a multi-core CPU implementation and an implementation on general-purpose graphics processing units (GPGPU). Both codes employ second-order accurate compact interpolation at the interfaces that couple grids of different resolutions. Because the compact interpolation technique is both simple and accurate, it adds almost no computational overhead, in terms of node updates per second, compared with the lattice Boltzmann method on uniform grids. To the best of our knowledge, this paper presents the first study of multi-core parallelization of the lattice Boltzmann method with inhomogeneous grid spacing and nested time stepping for both CPUs and GPUs.
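
The per-node work that "node updates per second" counts is a collide-and-stream sweep. The sketch below assumes the simplest possible setting, a uniform periodic 1D grid with a D1Q3 BGK model for pure diffusion, so it illustrates only the basic update and none of the paper's grid refinement or compact interpolation. Because the collision at each node is independent, a multi-core or GPU version would assign blocks of nodes to threads.

```python
from typing import List

W = (2 / 3, 1 / 6, 1 / 6)   # D1Q3 weights for velocities 0, +1, -1
C = (0, 1, -1)


def lbm_step(f: List[List[float]], tau: float) -> List[List[float]]:
    """One BGK collide-and-stream sweep of a D1Q3 diffusion model, periodic ends."""
    n = len(f[0])
    post = [[0.0] * n for _ in range(3)]
    for x in range(n):                       # collision: purely local per node
        rho = f[0][x] + f[1][x] + f[2][x]
        for q in range(3):
            feq = W[q] * rho                  # diffusion equilibrium (no velocity term)
            post[q][x] = f[q][x] - (f[q][x] - feq) / tau
    out = [[0.0] * n for _ in range(3)]
    for q in range(3):                        # streaming: shift along each velocity
        for x in range(n):
            out[q][(x + C[q]) % n] = post[q][x]
    return out


def density(f: List[List[float]]) -> List[float]:
    return [f[0][x] + f[1][x] + f[2][x] for x in range(len(f[0]))]
```

Since the weights sum to 1, collision conserves mass at every node and streaming only permutes values, so the total density stays constant while a density spike spreads out.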

When formulated as a system of linear inequalities, the image restoration problem yields huge, unstructured, sparse matrices even for small images. To solve the image restoration problem, we use surrogate constraint methods, which work efficiently on large problems. Among the variants of the surrogate constraint method, we consider a basic method that performs a single block projection in each step and a coarse-grain parallel version that performs simultaneous block projections. Using several state-of-the-art partitioning strategies and different communication models, we develop competing parallel implementations of the two methods. The implementations are evaluated on their per-iteration performance and their overall performance. Experimental results on a PC cluster show that the proposed parallelization schemes are quite beneficial.
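
A small sketch of the two variants for a feasibility problem A x <= b, under assumptions: violated rows in a block are combined into one surrogate row with weights proportional to their violation (one common choice), the relaxation factor is `lam` = 1.5, and the parallel version simply averages the displacements of all blocks computed from the same iterate. The function names are hypothetical, dense lists stand in for the sparse matrices, and no partitioning strategy or communication model is modelled.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def block_step(A: Matrix, b: Sequence[float], rows: Sequence[int],
               x: List[float], lam: float) -> List[float]:
    """Displacement that projects x onto one block's surrogate constraint."""
    viol = [(i, dot(A[i], x) - b[i]) for i in rows]
    viol = [(i, v) for i, v in viol if v > 0.0]
    if not viol:
        return [0.0] * len(x)
    total = sum(v for _, v in viol)
    s = [0.0] * len(x)        # surrogate row: violation-weighted sum of rows
    beta = 0.0                # matching surrogate right-hand side
    for i, v in viol:
        w = v / total
        s = [sj + w * aij for sj, aij in zip(s, A[i])]
        beta += w * b[i]
    norm2 = dot(s, s)
    if norm2 == 0.0:
        return [0.0] * len(x)
    step = lam * (dot(s, x) - beta) / norm2   # relaxed projection onto s.x <= beta
    return [-step * sj for sj in s]


def max_violation(A: Matrix, b: Sequence[float], x: Sequence[float]) -> float:
    return max(dot(row, x) - bi for row, bi in zip(A, b))


def surrogate_solve(A, b, blocks, x0, lam=1.5, parallel=False, max_iter=1000, tol=1e-9):
    """Basic method: one block per step. Parallel: all blocks at once, averaged."""
    x = list(x0)
    with ThreadPoolExecutor() as pool:
        for it in range(max_iter):
            if max_violation(A, b, x) <= tol:
                break
            if parallel:
                moves = list(pool.map(lambda rows: block_step(A, b, rows, x, lam), blocks))
                x = [xj + sum(m[j] for m in moves) / len(blocks) for j, xj in enumerate(x)]
            else:
                d = block_step(A, b, blocks[it % len(blocks)], x, lam)
                x = [xj + dj for xj, dj in zip(x, d)]
    return x
```

The averaging keeps the parallel update a convex combination of individually relaxed projections, which is why all blocks can be processed concurrently from the same iterate.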

This paper presents the design and application of asynchronous models of parallel evolutionary algorithms. An overview of existing parallel evolutionary algorithm (PEA) models and available implementations is given. We present new PEA models in the form of asynchronous algorithms and implicit parallelization, together with experimental data on their efficiency. The paper also discusses the definition of speedup in PEAs and proposes an appropriate speedup measurement procedure. The described parallel EAs are tested on problems with varying degrees of computational complexity. The results show that the asynchronous and implicit models are efficient compared with existing parallel algorithms.
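
One way to picture an asynchronous PEA is an island model in which islands never wait for each other: each island sends its best individual to a ring neighbour's queue and drains whatever immigrants happen to have arrived. The sketch below assumes a toy OneMax problem, a steady-state elitist replacement rule, and ring migration; it is an illustration of asynchrony, not one of the paper's models, and every name in it is hypothetical.

```python
import queue
import random
import threading
from typing import List, Optional, Tuple


def onemax(bits: List[int]) -> int:
    return sum(bits)


def island(idx, inbox, outbox, results, seed, n_bits=20, pop_size=16,
           steps=2000, migrate_every=50):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    start_best = max(map(onemax, pop))

    def replace_worst(cand):
        worst = min(range(pop_size), key=lambda k: onemax(pop[k]))
        if onemax(cand) > onemax(pop[worst]):
            pop[worst] = cand

    for step in range(steps):
        p1 = max(rng.sample(pop, 2), key=onemax)     # binary tournaments
        p2 = max(rng.sample(pop, 2), key=onemax)
        cut = rng.randrange(1, n_bits)
        child = [1 - bit if rng.random() < 1.0 / n_bits else bit
                 for bit in p1[:cut] + p2[cut:]]
        replace_worst(child)                         # steady state, elitist
        if step % migrate_every == 0:
            outbox.put(list(max(pop, key=onemax)))   # send a copy; never blocks
            while True:                              # take whatever has arrived
                try:
                    replace_worst(inbox.get_nowait())
                except queue.Empty:
                    break
    results[idx] = (start_best, max(map(onemax, pop)))


def run_islands(n_islands: int = 4):
    boxes = [queue.Queue() for _ in range(n_islands)]
    results: List[Optional[Tuple[int, int]]] = [None] * n_islands
    threads = [threading.Thread(target=island,
                                args=(k, boxes[k], boxes[(k + 1) % n_islands], results, k))
               for k in range(n_islands)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because no island blocks on migration, a slow island cannot stall the others; the price is that results depend on thread timing, which is one reason speedup in PEAs needs a careful definition.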

Particle swarm optimization (PSO) is an evolutionary, heuristics-based method for continuous function optimization. PSO is stochastic yet very robust. However, real-world optimization problems require high computational effort to converge to a good solution. Parallel PSO implementations generally provide good performance, but this depends heavily on the parallelization strategy used and on the number and characteristics of the processors exploited. In this paper, we propose a cooperative strategy that subdivides an optimization problem into many simpler sub-problems, each focusing on a distinct subset of the problem dimensions. The optimization work for all the selected sub-problems is done in parallel. We map the work onto four different high-performance parallel multiprocessors based on multi- and many-core architectures. The performance of the strategy is evaluated on four well-known high-dimensional benchmark functions of differing complexity, and the speedups obtained are compared with a serial PSO implementation.
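
The cooperative idea, one sub-swarm per subset of dimensions with a shared "context" vector holding the best full solution, can be sketched as follows. This is a sequential version in the spirit of cooperative PSO variants, assuming the commonly used inertia and acceleration values 0.72 and 1.49 and the sphere function as a benchmark; the paper's parallel mapping onto multi- and many-core hardware is not reproduced, and the names are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple


def sphere(x: Sequence[float]) -> float:
    return sum(v * v for v in x)


def cooperative_pso(f: Callable[[List[float]], float], dim: int,
                    groups: Sequence[Sequence[int]], iters: int = 200,
                    swarm: int = 10, lo: float = -5.0, hi: float = 5.0,
                    seed: int = 0) -> Tuple[List[float], float]:
    rng = random.Random(seed)
    context = [rng.uniform(lo, hi) for _ in range(dim)]   # best full solution so far
    best = f(context)
    w, c1, c2 = 0.72, 1.49, 1.49                           # commonly used coefficients
    swarms = []
    for g in groups:
        pos = [[rng.uniform(lo, hi) for _ in g] for _ in range(swarm)]
        swarms.append({"g": list(g), "pos": pos,
                       "vel": [[0.0] * len(g) for _ in range(swarm)],
                       "pbest": [p[:] for p in pos], "pval": [float("inf")] * swarm,
                       "gbest": [context[d] for d in g]})
    for _ in range(iters):
        # Sub-swarms take turns here; a parallel version would run them at once.
        for s in swarms:
            g = s["g"]
            for k in range(swarm):
                trial = context[:]                 # plug this particle into the context
                for d, v in zip(g, s["pos"][k]):
                    trial[d] = v
                val = f(trial)
                if val < s["pval"][k]:
                    s["pval"][k], s["pbest"][k] = val, s["pos"][k][:]
                if val < best:
                    best, context, s["gbest"] = val, trial, s["pos"][k][:]
            for k in range(swarm):
                for j in range(len(g)):
                    r1, r2 = rng.random(), rng.random()
                    v = (w * s["vel"][k][j]
                         + c1 * r1 * (s["pbest"][k][j] - s["pos"][k][j])
                         + c2 * r2 * (s["gbest"][j] - s["pos"][k][j]))
                    s["vel"][k][j] = v
                    s["pos"][k][j] = min(hi, max(lo, s["pos"][k][j] + v))
    return context, best
```

For instance, `cooperative_pso(sphere, 8, [[0, 1, 2, 3], [4, 5, 6, 7]])` optimizes an 8-dimensional sphere with two 4-dimensional sub-swarms.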

New parallel computational techniques based on POSIX (Portable Operating System Interface) threads are introduced for the parallelization of Generic Approximate Sparse Inverse multigrid methods on multicore systems. Parallelization of the Generic Approximate Sparse Inverse Matrix (GenAspI) algorithm is achieved with a new computational approach, called “strip”, which exploits the data independence of the rows assigned to each available processor. Additionally, new POSIX-threads-based techniques are proposed for the parallelization of a modified multigrid V-cycle method on multicore systems. The modified V-cycle uses a parallel GenAspI (PGenAspI) preconditioned Bi-Conjugate Gradient STABilized (BiCGSTAB) method as the coarse solver to improve the parallel performance of the multigrid method. For parallelization, a replica of the multigrid function is executed on each processor with a different index band, with synchronization points placed so as to reduce thread-creation overhead and maximize parallel performance. Theoretical estimates of speedup and efficiency are also presented. Finally, numerical results are presented for the performance of the PGenAspI algorithm and the PGenAspI–MGV method in solving classical two-dimensional boundary value problems on multicore computer systems. Implementation issues of the proposed method with POSIX threads on multicore systems are also discussed.
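
The row-independence that the "strip" approach relies on can be illustrated with a sparse matrix-vector product, the kernel a BiCGSTAB solver calls repeatedly. The sketch assumes CSR storage and contiguous row bands, one per thread; it uses Python threads purely to show the decomposition (the paper uses POSIX threads) and does not compute the GenAspI approximate inverse itself. `strips`, `laplacian_1d_csr`, and `csr_matvec_strips` are hypothetical helpers.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def strips(n_rows: int, n_threads: int) -> List[Tuple[int, int]]:
    """Contiguous row bands [start, end), one per thread."""
    return [(k * n_rows // n_threads, (k + 1) * n_rows // n_threads)
            for k in range(n_threads)]


def laplacian_1d_csr(n: int):
    """Tridiagonal (-1, 2, -1) test matrix in CSR form."""
    indptr, indices, data = [0], [], []
    for i in range(n):
        for j, v in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
            if 0 <= j < n:
                indices.append(j)
                data.append(v)
        indptr.append(len(indices))
    return indptr, indices, data


def csr_matvec_strips(indptr, indices, data, x, n_threads: int = 4) -> List[float]:
    n = len(indptr) - 1
    y = [0.0] * n

    def work(band: Tuple[int, int]) -> None:
        start, end = band
        for i in range(start, end):          # only the rows of this strip
            acc = 0.0
            for k in range(indptr[i], indptr[i + 1]):
                acc += data[k] * x[indices[k]]
            y[i] = acc                       # disjoint rows, so no locking

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(work, strips(n, n_threads)))
    return y
```

Each thread writes only the entries of its own strip, so the result matches the serial product regardless of the number of threads.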

In this paper we discuss the design and implementation of an intelligent program parallelization system called InParS. The system is based on intelligent parallelization models proposed by many researchers in the area of parallelizing compilers. The experiment presented is one of the few attempts to investigate the viability of artificial intelligence techniques in automatic program parallelization. The early version of InParS was aimed at transforming Fortran-like DO loops into vector code suited to vector processors. The new version of InParS targets distributed-memory parallel computers. Some preliminary results are also presented, which indicate how incorporating artificial intelligence techniques can contribute to the success of automatic program parallelization.

Monte Carlo (MC) methods for numerical integration seem embarrassingly parallel at first sight. When adaptive schemes are applied to enhance convergence, however, the seemingly most natural approach of replicating the whole job on each processor can potentially ruin the adaptive behaviour. Using the popular VEGAS algorithm as an example, an economical method of semi-micro parallelization with variable grain size is presented and contrasted with another, straightforward macro-parallelization approach. A portable implementation of this semi-micro parallelization is used in the xloops project and is publicly available.
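
The difference between the two approaches lies in where the reduction happens. In the sketch below, each iteration's sample points are split into chunks evaluated by workers, and the per-bin accumulators are summed before the grid is adapted, so the adaptation sees the full statistics instead of one processor's share. It assumes a 1D integral on [0, 1] and a simplified, undamped grid refinement that gives every bin an equal share of the estimated integral of |f|; real VEGAS differs in detail, and all names are hypothetical.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def evaluate_chunk(f, edges, pairs):
    """One worker's share of an iteration; returns partial accumulators only."""
    k_bins = len(edges) - 1
    s1 = s2 = 0.0
    importance = [0.0] * k_bins
    for ub, ux in pairs:
        k = min(int(ub * k_bins), k_bins - 1)     # choose a bin uniformly ...
        width = edges[k + 1] - edges[k]
        x = edges[k] + ux * width                  # ... then a point inside it
        w = f(x) * width * k_bins                  # f(x) / p(x), p(x) = 1 / (K * width)
        s1 += w
        s2 += w * w
        importance[k] += abs(w) / k_bins           # sample of |f| * width for bin k
    return s1, s2, importance


def refine(edges, importance):
    """Move the edges so that every new bin holds an equal share of the importance."""
    k_bins = len(edges) - 1
    cum = [0.0]
    for m in importance:
        cum.append(cum[-1] + m)
    total = cum[-1]
    if total == 0.0:
        return edges
    new, k = [edges[0]], 0
    for j in range(1, k_bins):
        level = j * total / k_bins
        while cum[k + 1] < level:
            k += 1
        frac = min(1.0, (level - cum[k]) / importance[k])   # importance[k] > 0 here
        new.append(edges[k] + frac * (edges[k + 1] - edges[k]))
    new.append(edges[-1])
    return new


def vegas_1d(f, n_iter=5, n_per_iter=20000, k_bins=20, workers=4, seed=1):
    rng = random.Random(seed)
    edges = [i / k_bins for i in range(k_bins + 1)]   # uniform starting grid on [0, 1]
    estimates = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(n_iter):
            pairs = [(rng.random(), rng.random()) for _ in range(n_per_iter)]
            size = math.ceil(n_per_iter / workers)
            chunks = [pairs[i:i + size] for i in range(0, n_per_iter, size)]
            parts = list(pool.map(lambda c: evaluate_chunk(f, edges, c), chunks))
            # Reduce the partial sums first, so the grid update sees the full statistics.
            s1 = sum(p[0] for p in parts)
            s2 = sum(p[1] for p in parts)
            importance = [sum(p[2][k] for p in parts) for k in range(k_bins)]
            mean = s1 / n_per_iter
            var = max(s2 / n_per_iter - mean * mean, 0.0) / (n_per_iter - 1)
            estimates.append((mean, math.sqrt(var)))
            edges = refine(edges, importance)
    return estimates
```

Because the random numbers are drawn by the coordinating thread, the estimates do not depend on the number of workers beyond floating-point summation order; the chunk size plays the role of the variable grain size.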

We describe a new suite of computational benchmarks that models applications featuring multiple levels of parallelism. Such parallelism is often available in realistic flow computations on systems of meshes, but had not previously been captured in benchmarks. The new suite, named NPB (NAS parallel benchmarks) multi-zone, is derived from the NPB suite, and involves solving the application benchmarks LU, BT and SP on collections of loosely coupled discretization meshes. The solutions on the meshes are updated independently, but after each time step they exchange boundary value information. This strategy provides relatively easily exploitable coarse-grain parallelism between meshes. Three reference implementations are available: one serial, one hybrid using the message passing interface (MPI) and OpenMP, and another hybrid using a shared memory multi-level programming model (SMP+OpenMP). We examine the effectiveness of hybrid parallelization paradigms in these implementations on four different parallel computers. We also use an empirical formula to investigate the performance characteristics of the hybrid parallel codes.
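
The multi-zone pattern (independent zone updates, then a boundary exchange after each time step) can be sketched on a much simpler problem than LU, BT, or SP. The example assumes an explicit 1D heat equation split into zones with one ghost cell on each side; the function names are hypothetical. With one ghost layer refreshed every step, the multi-zone result matches the single-domain result exactly.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def step_zone(u: List[float], r: float) -> List[float]:
    """Explicit heat-equation update of one zone; u[0] and u[-1] are ghost cells."""
    inner = [u[i] + r * (u[i - 1] - 2.0 * u[i] + u[i + 1]) for i in range(1, len(u) - 1)]
    return [u[0]] + inner + [u[-1]]


def exchange(zones: List[List[float]]) -> None:
    """Copy each neighbour's boundary value into the ghost cells (in place)."""
    for left, right in zip(zones, zones[1:]):
        left[-1] = right[1]
        right[0] = left[-2]


def run_multizone(u0: List[float], n_zones: int, r: float, steps: int) -> List[float]:
    """u0[0] and u0[-1] are fixed physical boundary values."""
    interior = u0[1:-1]
    size = len(interior) // n_zones
    cuts = [k * size for k in range(n_zones)] + [len(interior)]
    zones = [[0.0] + interior[cuts[k]:cuts[k + 1]] + [0.0] for k in range(n_zones)]
    zones[0][0], zones[-1][-1] = u0[0], u0[-1]
    exchange(zones)
    with ThreadPoolExecutor() as pool:
        for _ in range(steps):
            zones = list(pool.map(lambda z: step_zone(z, r), zones))  # independent zones
            exchange(zones)                                           # then swap boundaries
    return [u0[0]] + [v for z in zones for v in z[1:-1]] + [u0[-1]]
```

The zone updates form the coarse-grain level of parallelism; in the hybrid benchmark codes, a finer level (for example OpenMP loops) would work inside each zone.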

The kinetic Monte Carlo (kMC) method is used in many scientific fields in applications involving rare-event transitions. Due to its discrete stochastic nature, efforts to parallelize kMC approaches often produce unbalanced time evolutions requiring complex implementations to ensure correct statistics. In the context of parallel kMC, the sequential update technique has shown promise by generating high quality distributions with high relative efficiencies for short-range systems. In this work, we provide an extension of the sequential update method in a parallel context that rigorously obeys detailed balance, which guarantees exact equilibrium statistics for all parallelization settings. Our approach also preserves nonequilibrium dynamics with minimal error for many parallelization settings, and can be used to achieve highly precise sampling.
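
For reference, the serial building block that parallel kMC schemes start from is the rejection-free (BKL, or n-fold way) step: choose an event with probability proportional to its rate and advance the clock by an exponentially distributed waiting time. The sketch below shows only this serial step with a fixed rate list; the paper's parallel sequential-update extension, with its detailed-balance guarantees, is not reproduced, and the function names are hypothetical.

```python
import bisect
import math
import random
from typing import List, Sequence, Tuple


def kmc_step(rates: Sequence[float], u1: float, u2: float) -> Tuple[int, float]:
    """Pick event k with probability rates[k] / total; u1 in [0, 1), u2 in (0, 1]."""
    cumulative: List[float] = []
    total = 0.0
    for r in rates:
        total += r
        cumulative.append(total)
    k = bisect.bisect_right(cumulative, u1 * total)   # zero-rate events are skipped
    k = min(k, len(rates) - 1)                       # guard against rounding at the top
    dt = -math.log(u2) / total                       # exponential waiting time
    return k, dt


def simulate(rates: Sequence[float], t_end: float, seed: int = 0) -> List[int]:
    """Count how often each event fires before the clock passes t_end."""
    rng = random.Random(seed)
    t, counts = 0.0, [0] * len(rates)
    while True:
        k, dt = kmc_step(rates, rng.random(), 1.0 - rng.random())
        if t + dt > t_end:
            return counts
        t += dt
        counts[k] += 1
```

With rates 1, 2 and 3, the third event should account for about half of all events, and the mean time between events is 1/6.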

A popular approach to providing nonexperts in parallel computing with an easy-to-use programming model is to design a software library consisting of a set of preparallelized routines and to hide the intricacies of parallelization behind the library's API. However, for regular domain problems (such as simple matrix manipulations or low-level image processing applications, in which all elements in a regular subset of a dense data field are accessed in turn), the speedup obtained with many such library-based parallelization tools is often suboptimal. This is because inter-operation optimization (i.e., time optimization of communication steps across library calls) is generally not incorporated in the library implementations. We present a simple, efficient, finite state machine-based approach for communication minimization of library-based data parallel regular domain problems. In this approach, referred to as lazy parallelization, a sequential program is parallelized automatically at runtime by inserting communication primitives and memory management operations whenever necessary. Apart from being simple and cheap, lazy parallelization is guaranteed to generate legal, correct, and efficient parallel programs at all times. The effectiveness of the approach is demonstrated by analyzing the performance characteristics of two typical regular domain problems from the field of low-level image processing. Experimental results show significant performance improvements over nonoptimized parallel applications. Moreover, the communication behavior obtained is found to be optimal with respect to the abstraction level of message-passing programs.
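
The core of a state-machine approach to communication minimization can be shown with a two-state toy: each data structure remembers how it is currently distributed, and a communication step is emitted only when an operation needs a different distribution. The class, its two states, and the message log below are hypothetical simplifications; in particular, this sketch ignores neighbourhood operations that would need border exchange.

```python
from enum import Enum
from typing import Callable, List


class Dist(Enum):
    LOCAL = "whole image on the root only"
    SCATTERED = "blocks spread over the processors"


class LazyImage:
    """Tracks where an image lives; communication is issued only on a state change."""

    def __init__(self, pixels: List[int]):
        self.pixels = list(pixels)
        self.state = Dist.LOCAL
        self.messages: List[str] = []    # log of communication steps actually needed

    def _require(self, wanted: Dist) -> None:
        if self.state is not wanted:
            self.messages.append("scatter" if wanted is Dist.SCATTERED else "gather")
            self.state = wanted

    def pointwise(self, fn: Callable[[int], int]) -> "LazyImage":
        """Data-parallel operation: needs the scattered distribution."""
        self._require(Dist.SCATTERED)
        self.pixels = [fn(p) for p in self.pixels]   # stands in for per-block work
        return self

    def result(self) -> List[int]:
        """Sequential access (e.g. display or file output): needs data on the root."""
        self._require(Dist.LOCAL)
        return self.pixels
```

Two consecutive point operations followed by one read produce a single scatter and a single gather, whereas a library that scatters and gathers around every call would issue four communication steps.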

The Message Passing Interface (MPI) has become the standard for achieving effective results with the message-passing paradigm of parallelization. Codes written using MPI are extremely portable and are applicable to both clusters and massively parallel computing platforms. Since MPI uses the single program, multiple data (SPMD) approach to parallelism, good performance requires careful tuning of the serial code as well as careful data and control flow analysis to limit communication. We discuss the optimization strategies used to increase the performance of an MPI-based unstructured finite element simulation code written in Fortran 90, and their degree of success. We present performance results for implementations on several modern massively parallel computing platforms, including the SGI Origin 3800, IBM Nighthawk 2 SMP, and Cray T3E-1200.

Many significant engineering and scientific problems involve optimizing some criterion over a combinatorial configuration space. The two methods most often used to solve these problems effectively, simulated annealing (SA) and genetic algorithms (GA), do not easily lend themselves to massively parallel implementations. Simulated annealing is a naturally serial algorithm, while GA involves a selection process that requires global coordination. This paper introduces a new hybrid algorithm that inherits those aspects of GA that lend themselves to parallelization and avoids the serial bottlenecks of GA approaches by incorporating elements of SA, giving a completely parallel, easily scalable hybrid GA/SA method. This new method, called Genetic Simulated Annealing, does not require parallelization of any problem-specific portions of a serial implementation: existing serial implementations can be incorporated as is. Results of a study on two difficult combinatorial optimization problems, a 100-city traveling salesperson problem and a 24-word, 12-bit error-correcting code design problem, performed on a 16K-PE MasPar MP-1, indicate advantages over previous parallel GA and SA approaches. A key result is that the performance of the algorithm scales up linearly with the number of processing elements, a feature not demonstrated by any previous parallel GA or SA approach, which enables the new algorithm to exploit massively parallel architectures with maximum effectiveness. Additionally, the algorithm does not require careful choice of control parameters, a significant advantage over SA and GA.
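
The following sketch shows how a GA/SA hybrid can avoid global selection: each processing element holds one individual, recombines it with a ring neighbour, and keeps the child according to the Metropolis criterion at the current temperature. It runs the "simultaneous" PE updates sequentially and assumes a toy bit-string objective (count of zero bits), one-point crossover, single-bit mutation, and geometric cooling; none of these choices is taken from the paper, and the function names are hypothetical.

```python
import math
import random
from typing import List, Tuple


def cost(bits: List[int]) -> int:
    """Toy objective (an assumption): number of zero bits, to be minimised."""
    return bits.count(0)


def genetic_simulated_annealing(n_pe: int = 32, n_bits: int = 24, t0: float = 2.0,
                                alpha: float = 0.97, steps: int = 300,
                                seed: int = 0) -> Tuple[int, List[int]]:
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n_pe)]
    start_cost = min(map(cost, pop))
    best = min(pop, key=cost)[:]
    t = t0
    for _ in range(steps):
        new_pop = []
        for i, me in enumerate(pop):        # conceptually one PE per individual, all at once
            mate = pop[(i + 1) % n_pe]      # only a ring neighbour is consulted
            cut = rng.randrange(1, n_bits)
            child = me[:cut] + mate[cut:]
            child[rng.randrange(n_bits)] ^= 1   # mutation: flip one bit
            delta = cost(child) - cost(me)
            # Metropolis acceptance replaces the GA's global selection step.
            accept = delta <= 0 or rng.random() < math.exp(-delta / t)
            new_pop.append(child if accept else me)
        pop = new_pop
        t *= alpha                          # geometric cooling
        cand = min(pop, key=cost)
        if cost(cand) < cost(best):
            best = cand[:]
    return start_cost, best
```

Every decision uses only local information (an individual, its neighbour, and the shared temperature), which is what lets this kind of scheme scale with the number of processing elements.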

A functional programming language supporting implicit parallelization of programs is described. The language is based on four composition operations, three of which can perform parallel processing. Functional programs are represented schematically so that a dynamic parallelization algorithm can be applied to them. The implemented algorithms make it possible to distribute the load between processors dynamically and to control the grain of parallelism. Experimental results on the efficiency of the implemented system for typical example problems are presented.
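
To give a feel for implicit parallelism through composition operators, here is a tiny combinator sketch. The three operators (`seq`, `fork`, `pmap`) and their semantics are hypothetical stand-ins and are not the language's actual four operations; the `grain` parameter mimics grain-size control by grouping elements into one task. Nesting blocking combinators inside one bounded pool could deadlock, so a real runtime would need something like work stealing.

```python
from concurrent.futures import ThreadPoolExecutor

_pool = ThreadPoolExecutor(max_workers=4)


def seq(f, g):
    """Sequential composition: apply f, then g."""
    return lambda x: g(f(x))


def fork(*fs):
    """Apply several functions to the same argument; the branches may run in parallel."""
    def run(x):
        futures = [_pool.submit(f, x) for f in fs]
        return tuple(fut.result() for fut in futures)
    return run


def pmap(f, grain=1):
    """Apply f to every element; 'grain' consecutive elements form one parallel task."""
    def run(xs):
        chunks = [xs[i:i + grain] for i in range(0, len(xs), grain)]
        parts = _pool.map(lambda c: [f(v) for v in c], chunks)
        return [v for part in parts for v in part]
    return run
```

For example, `seq(pmap(lambda v: v * v, grain=2), fork(sum, max))` squares a list in parallel chunks of two and then computes the sum and maximum concurrently; applied to `[1, 2, 3, 4, 5]` it yields `(55, 25)`.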
