期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

CONSAT: A parallel constraint satisfaction system

Kinson Ho Hans W. Guesgen Paul N. Hilfinger 《LISP and Symbolic Computation》1994,7(2-3):195-210

In this paper we describe the parallelization of a medium-size symbolic fixed-point computation, CONSAT. CONSAT is a constraint satisfaction system that computes globally consistent solutions. The parallel version of CONSAT is implemented using abstractions from a parallel programming toolbox we developed. The toolbox is intended for novice parallel programmers, and programs based on abstractions from this toolbox may be executed on both uniprocessors and shared-memory multiprocessors without modifications. We explain how parallelism is introduced, and how concurrent accesses to shared data structures are handled. We will also describe the performance of CONSAT on sample inputs. 相似文献

2.

Interactive parallel programming using the ParaScope Editor

Kennedy K. McKinley K.S. Tseng C.-W. 《Parallel and Distributed Systems, IEEE Transactions on》1991,2(3):329-341

The ParaScope Editor, an intelligent interactive editor for parallel Fortran programs, which is the centerpiece of the ParaScope project, an integrated collection of tools to help scientific programmers implement correct and efficient parallel programs, is discussed. ParaScope Editor reveals to users potential hazards of a proposed parallelization in a program. It provides a variety of powerful interactive program transformations that have been shown useful in converting programs to parallel form. ParaScope Editor supports general user editing through a hybrid text and structure editing facility that incrementally analyzes the modified program for potential hazards. It is shown that ParaScope Editor supports an exploratory programming style in which users get immediate feedback on their various strategies for parallelization 相似文献

3.

Parallel Bayesian inference of range and reflectance from LaDAR profiles

Jing Ye Andrew M. Wallace Abdallah Al Zain John Thompson 《Journal of Parallel and Distributed Computing》2013

Bayesian analysis using reversible jump Markov chain Monte Carlo (RJMCMC) algorithms improves the measurement accuracy, resolution and sensitivity of full waveform laser detection and ranging (LaDAR), but at a significant computational cost. Parallel processing has the potential to significantly reduce the processing time, but although there have been several strategies for Markov chain Monte Carlo (MCMC) parallelization, adaptation of these strategies to RJMCMC may degrade parallel performance. 相似文献

4.

MCNP-4C多粒子输运蒙特卡罗程序的MPI并行化 总被引：1，自引：0，他引：1

邓力张文勇《数值计算与计算机应用》2006,27(1):52-59

三维连续截面多粒子输运蒙特卡罗程序MCNP-4C-经过MPI并行改造,实现了MPI 并行化．采用分段随机数发生器,并行取得了与串行完全一致的结果,500个处理器的计算速度较串行提高了460倍,并行效率达到92％,可计算包括临界在内的多粒子输运问题．相似文献

5.

Language-based vectorization and parallelization using intrinsics,OpenMP, TBB and Cilk Plus

Przemysław Stpiczyński 《The Journal of supercomputing》2018,74(4):1461-1472

The aim of this paper is to evaluate OpenMP, TBB and Cilk Plus as basic language-based tools for simple and efficient parallelization of recursively defined computational problems and other problems that need both task and data parallelization techniques. We show how to use these models of parallel programming to transform a source code of Adaptive Simpson’s Integration to programs that can utilize multiple cores of modern processors. Using the example of Belman–Ford algorithm for solving single-source shortest path problems, we advise how to improve performance of data parallel algorithms by tuning data structures for better utilization of vector extensions of modern processors. Manual vectorization techniques based on Cilk array notation and intrinsics are presented. We also show how to simplify such optimization using Intel SIMD Data Layout Template containers. 相似文献

6.

Computing programs containing band linear recurrences on vectorsupercomputers

Haigeng Wang Nicolau A. Keung S. Kai-Yeung Siu 《Parallel and Distributed Systems, IEEE Transactions on》1996,7(8):769-782

Many large-scale scientific and engineering computations, e.g., some of the Grand Challenge problems, spend a major portion of execution time in their core loops computing band linear recurrences (BLRs). Conventional compiler parallelization techniques cannot generate scalable parallel code for this type of computation because they respect loop-carried dependences (LCDs) in programs, and there is a limited amount of parallelism in a BLR with respect to LCDs. For many applications, using library routines to replace the core BLR requires the separation of BLR from its dependent computation, which usually incurs significant overhead. In this paper, we present a new scalable algorithm called the Regular Schedule, for parallel evaluation of BLRs. We describe our implementation of the Regular Schedule and discuss how to obtain maximum memory throughput in implementing the schedule on vector supercomputers. We also illustrate our approach, based on our Regular Schedule, to parallelizing programs containing BLR and other kinds of code. Significant improvements in CPU performance for a range of programs containing BLR implemented using the Regular Schedule in C over the same programs implemented using highly optimized coded-in-assembly BLAS routines [11] are demonstrated on Convex C240. Our approach can be used both at the user level in parallel programming code containing BLRs, and in compiler parallelization of such programs combined with recurrence recognition techniques for vector supercomputers 相似文献

7.

Aspect‐oriented component assembly—a case study in parallel software design

C. Dangelmayr W. Blochinger 《Software》2009,39(9):807-832

In this paper we deal with building parallel programs based on sequential application code and generic components providing specific functionality for parallelization, like load balancing or fault tolerance. We describe an architectural approach employing aspect‐oriented programming to assemble arbitrary object‐oriented components. Several non‐trivial crosscutting concerns arising from parallelization are addressed in the light of different applications, which are representative of the most common types of parallelism. In particular, we demonstrate how aspect‐oriented techniques allow us to leave all existing code untouched. We evaluate and compare our approach with its counterparts in conventional object‐oriented programming. Copyright © 2008 John Wiley & Sons, Ltd. 相似文献

8.

A Parallel Implementation of the Simplex Function Minimization Routine

Donghoon Lee Matthew Wiswall 《Computational Economics》2007,30(2):171-187

This paper generalizes the widely used Nelder and Mead (Comput J 7:308–313, 1965) simplex algorithm to parallel processors. Unlike most previous parallelization methods, which are based on parallelizing the tasks required to compute a specific objective function given a vector of parameters, our parallel simplex algorithm uses parallelization at the parameter level. Our parallel simplex algorithm assigns to each processor a separate vector of parameters corresponding to a point on a simplex. The processors then conduct the simplex search steps for an improved point, communicate the results, and a new simplex is formed. The advantage of this method is that our algorithm is generic and can be applied, without re-writing computer code, to any optimization problem which the non-parallel Nelder–Mead is applicable. The method is also easily scalable to any degree of parallelization up to the number of parameters. In a series of Monte Carlo experiments, we show that this parallel simplex method yields computational savings in some experiments up to three times the number of processors. 相似文献

9.

The BonaFide C Analyzer: automatic loop-level characterization and coverage measurement

Sergio Aldea Diego R. Llanos Arturo Gonzalez-Escribano 《The Journal of supercomputing》2014,68(3):1378-1401

The advent of multicore technologies has increased the interest in parallelization techniques for existing sequential applications. These techniques include the need of detecting loops that are good candidates for parallelization, and classifying all variables of these loops according to their use, a task surprisingly hard to be carried out manually. In this paper, we introduce the BonaFide C Analyzer, an XML-based framework that combines static analysis of source code with profiling information to generate complete reports regarding all loops in a C application, including loop coverage, loop suitability for parallelization, a classification of all variables inside loops based on their accesses, and other hurdles that restrict the parallelization. This information allows to analyze how particular language constructs are used in real-world applications, and helps the programmer to parallelize the code. To show the features of the framework, we present the results of an in-depth loop characterization of C applications that are part of the SPEC CPU2006 benchmark suite. Our study shows that 47.72 % of loops present in the applications analyzed are potentially parallelizable with existent parallel programming models such as OpenMP, while an additional 37.7 % of loops could be run in parallel with the help of runtime speculative parallelization techniques. 相似文献

10.

Support for Parallel and Concurrent Programming in C++

N. I. V’yukova V. A. Galatenko S. V. Samborskii 《Programming and Computer Software》2018,44(1):35-42

C++ was originally designed as a sequential programming language. For development of multithreaded applications, libraries, such as Pthreads, Windows threads, and Boost, are traditionally used. The C++11 standard introduced some basic concepts and means for developing parallel and concurrent programs, but the direct use of these low-level means requires high programming skills and significant efforts. The absence of high-level models of parallelism in C++ is somewhat compensated for by various parallel libraries and directive parallelization tools (such as OpenMP), as well as by language extensions supported by some compilers (Intel CilkPlus). Nevertheless, we still require more advanced means to express parallelism in programs at the level of language standard and language library. In this survey, we consider the means for parallel and concurrent programming that are included into the C++17 standard, as well as some capabilities that are to be expected in the future standards. 相似文献

11.

Parallelization of adaptive MC integrators

Richard Kreckel 《Computer Physics Communications》1997,106(3):258-266

Monte Carlo (MC) methods for numerical integration seem to be embarrassingly parallel on first sight. When adaptive schemes are applied in order to enhance convergence however, the seemingly most natural way of replicating the whole job on each processor can potentially ruin the adaptive behaviour. Using the popular VEGAS-Algorithm as an example an economic method of semi-micro parallelization with variable grain-size is presented and contrasted with another straightforward approach of macro-parallelization. A portable implementation of this semi-micro parallelization is used in the xloops-project and is made publicly available. 相似文献

12.

Portable parallelizing Fortran compiler

A. Averbuch R. Dekel E. Gabber 《Concurrency and Computation》1996,8(2):91-123

The Portable Parallelizing Fortran Compiler (PPFC) is an additional component for the portable programming environment developed in Tel-Aviv University for scientific code. This environment supports portable and efficient programming of diverse MIMD multiprocessors, both distributed- and shared-memory. Till now this environment has consisted of two tools: the Virtual Machine for MultiProcessors (VMMP) and the Portable Parallelizing Pascal compiler (P³C). We have added the PPFC which is an automatic parallelizer compiler for the Fortran language. The compiler is fully automatic (does not require additional declarations to assist parallelization), which is characterized by loops operating on regular data structures, and produces efficient and portable code for a variety of multiprocessors from the same serial code. The parallel implementation uses the VMMP, which is a software package that provides a coherent set of services for explicitly parallel application programs running on diverse MIMD multiprocessors. VMMP is intended to simplify parallel program writing and to promote portable and efficient programming. The PPFC parallelized 12 out of the 24 Livermore Loops. It was also applied to parallelize all the 14 Fortran application programs that where parallelized by the P³C and achieved the same speed-ups and efficiencies. In most examples the PPFC achieved high speed-ups and efficiencies on all target multiprocessors. The PPFC emphasizes efficiency and code portability. Although PPFC employs a relatively simple data flow analysis, it produces efficient code for various widely used application programs. 相似文献

13.

蒙特卡罗程序MCNP-Ⅱ与MCNP-5并行效率比较

邓力张文勇徐涵李刚《计算机工程与科学》2009,31(Z1)

通过MCNP-5程序MPI并行功能开发,与早前作者对MCNP-4C串行程序进行MPI并行化的程序MCNP-Ⅱ进行比较,两个程序均能运行在YH大型并行计算机上,且不同规模、不同处理器的计算结果基本一致。比较显示,MC-NP-Ⅱ在计算效率和并行可扩展性方面均优于MCNP-5。相似文献

14.

OpenGR: A directive-based grid programming environment

Motonori Hirano Mitsuhisa Sato Yoshio Tanaka 《Parallel Computing》2005,31(10-12):1140

A new grid programming environment for remote procedure call (RPC) based master–worker type task parallelization is presented. The environment is realized through the use of a set of compiler directives, called OpenGR, and is implemented in the present study based on the Omni OpenMP compiler system and Ninf-G grid-enabled RPC system as a parallel execution mechanism. Using OpenGR directives, existing sequential applications can be readily adapted to the grid environment as master–worker parallel programs using the RPC architecture. The combination of OpenGR and OpenMP directives also allows for the hybrid parallelization of sequential programs, supporting both synchronous and asynchronous parallelism. 相似文献

15.

A rigorous sequential update strategy for parallel kinetic Monte Carlo simulation

Jerome P. Nilmeier Jaime Marian 《Computer Physics Communications》2014

The kinetic Monte Carlo (kMC) method is used in many scientific fields in applications involving rare-event transitions. Due to its discrete stochastic nature, efforts to parallelize kMC approaches often produce unbalanced time evolutions requiring complex implementations to ensure correct statistics. In the context of parallel kMC, the sequential update technique has shown promise by generating high quality distributions with high relative efficiencies for short-range systems. In this work, we provide an extension of the sequential update method in a parallel context that rigorously obeys detailed balance, which guarantees exact equilibrium statistics for all parallelization settings. Our approach also preserves nonequilibrium dynamics with minimal error for many parallelization settings, and can be used to achieve highly precise sampling. 相似文献

16.

Exploiting Distributed-Memory and Shared-Memory Parallelism on Clusters of SMPs with Data Parallel Programs

Benkner Siegfried Sipkova Viera 《International journal of parallel programming》2003,31(1):3-19

Clusters of SMPs are hybrid-parallel architectures that combine the main concepts of distributed-memory and shared-memory parallel machines. Although SMP clusters are widely used in the high performance computing community, there exists no single programming paradigm that allows exploiting the hierarchical structure of these machines. Most parallel applications deployed on SMP clusters are based on MPI, the standard API for distributed-memory parallel programming, and thus may miss a number of optimization opportunities offered by the shared memory available within SMP nodes. In this paper we present extensions to the data parallel programming language HPF and associated compilation techniques for optimizing HPF programs on clusters of SMPs. The proposed extensions enable programmers to control key aspects of distributed-memory and shared-memory parallelization at a high-level of abstraction. Based on these language extensions, a compiler can adopt a hybrid parallelization strategy which closely reflects the hierarchical structure of SMP clusters by automatically exploiting shared-memory parallelism based on OpenMP within cluster nodes and distributed-memory parallelism utilizing MPI across nodes. We describe the implementation of these features in the VFC compiler and present experimental results which show the effectiveness of these techniques. 相似文献

17.

Software for supercomputers of the future

Ken Kennedy 《The Journal of supercomputing》1992,5(4):251-262

This paper discusses the software challenges that must be met if we are to effectively use the massively parallel supercomputers of the future. Focusing on Fortran, the discussion includes an assessment of the prospects for fully automatic parallelization of conventional programs and presents the principal features needed in software environments for parallel programming. 相似文献

18.

Parallel computation and FASTA: confronting the problem of parallel database search for a fast sequence comparison algorithm 总被引：4，自引：0，他引：4

P L Miller P M Nadkarni N M Carriero 《Computer applications in the biosciences》1991,7(1):71-78

We have parallelized the FASTA algorithm for biological sequence comparison using Linda, a machine-independent parallel programming language. The resulting parallel program runs on a variety of different parallel machines. A straight-forward parallelization strategy works well if the amount of computation to be done is relatively large. When the amount of computation is reduced, however, disk I/O becomes a bottleneck which may prevent additional speed-up as the number of processors is increased. The paper describes the parallelization of FASTA, and uses FASTA to illustrate the I/O bottleneck problem that may arise when performing parallel database search with a fast sequence comparison algorithm. The paper also describes several program design strategies that can help with this problem. The paper discusses how this bottleneck is an example of a general problem that may occur when parallelizing, or otherwise speeding up, a time-consuming computation. 相似文献

19.

Implementation of functional parallel typified language (FPTL) on multicore computers

V. P. Kutepov P. N. Shamal’ 《Journal of Computer and Systems Sciences International》2014,53(3):345-358

A functional programming language supporting implicit parallelization of programs is described. The language is based on four operations of composition, of which three can perform parallel processing. Functional programs are represented schematically to use a dynamic parallelization algorithm. The implemented algorithms make it possible to dynamically distribute the load between processors and control the grain of parallelism. Experimental results for the efficiency of the implemented system obtained on examples of typical problems are presented. 相似文献

20.

High-Level Parallel Ant Colony Optimization with Algorithmic Skeletons

de Melo Menezes Breno A. Herrmann Nina Kuchen Herbert Buarque de Lima Neto Fernando 《International journal of parallel programming》2021,49(6):776-801

Parallel implementations of swarm intelligence algorithms such as the ant colony optimization (ACO) have been widely used to shorten the execution time when solving complex optimization problems. When aiming for a GPU environment, developing efficient parallel versions of such algorithms using CUDA can be a difficult and error-prone task even for experienced programmers. To overcome this issue, the parallel programming model of Algorithmic Skeletons simplifies parallel programs by abstracting from low-level features. This is realized by defining common programming patterns (e.g. map, fold and zip) that later on will be converted to efficient parallel code. In this paper, we show how algorithmic skeletons formulated in the domain specific language Musket can cope with the development of a parallel implementation of ACO and how that compares to a low-level implementation. Our experimental results show that Musket suits the development of ACO. Besides making it easier for the programmer to deal with the parallelization aspects, Musket generates high performance code with similar execution times when compared to low-level implementations.

相似文献