期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

FracMAP: A user-interactive package for performing simulation and orientation-specific morphology analysis of fractal-like solid nano-agglomerates

Rajan K. Chakrabarty Mark A. Garro Shammah Chancellor Hans Moosmüller 《Computer Physics Communications》2009,180(8):1376-1381

Computer simulation techniques have found extensive use in establishing empirical relationships between three-dimensional (3d) and two-dimensional (2d) projected properties of particles produced by the process of growth through the agglomeration of smaller particles (monomers). In this paper, we describe a package, FracMAP, that has been written to simulate 3d quasi-fractal agglomerates and create their 2d pixelated projection images by restricting them to stable orientations as commonly encountered for quasi-fractal agglomerates collected on filter media for electron microscopy. Resulting 2d images are analyzed for their projected morphological properties.

Program summary

Program title: FracMAPCatalogue identifier: AEDD_v1_0Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEDD_v1_0.htmlProgram obtainable from: CPC Program Library, Queen's University, Belfast, N. IrelandLicensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.htmlNo. of lines in distributed program, including test data, etc.: 4722No. of bytes in distributed program, including test data, etc.: 27 229Distribution format: tar.gzProgramming language: C++Computer: PCOperating system: Windows, LinuxRAM: 2.0 MegabytesClassification: 7.7Nature of problem: Solving for a suitable fractal agglomerate construction under constraints of typical morphological parameters.Solution method: Monte Carlo approximation.Restrictions: Problem complexity is not representative of run-time, since Monte Carlo iterations are of a constant complexity.Additional comments: The distribution file contains two versions of the FracMAP code, one for Windows and one for Linux.Running time: 1 hour for a fractal agglomerate of size 25 on a single processor. 相似文献

2.

Extending the use of canonical and microcanonical Monte Carlo algorithms to spin models

Carlos E. Fiore Mário J. de Oliveira 《Computer Physics Communications》2009,180(9):1434-1441

相似文献

3.

Computer simulation of two continuous spin models using Wang-Landau-Transition-Matrix Monte Carlo algorithm

Shyamal Bhar 《Computer Physics Communications》2009,180(5):699-707

Monte Carlo simulation using a combination of Wang-Landau (WL) and Transition Matrix (TM) Monte Carlo algorithms to simulate two lattice spin models with continuous energy is described. One of the models, the one-dimensional Lebwohl-Lasher model has an exact solution and we have used this to test the performance of the mixed algorithm (WLTM). The other system we have worked on is the two-dimensional XY-model. The purpose of the present work is to test the performance of the WLTM algorithm in continuous models and to suggest methods for obtaining best results in such systems using this algorithm. 相似文献

4.

Fast SIMDized Kalman filter based track fit

S. Gorbunov U. Kebschull V. Lindenstruth 《Computer Physics Communications》2008,178(5):374-383

Modern high energy physics experiments have to process terabytes of input data produced in particle collisions. The core of many data reconstruction algorithms in high energy physics is the Kalman filter. Therefore, the speed of Kalman filter based algorithms is of crucial importance in on-line data processing. This is especially true for the combinatorial track finding stage where the Kalman filter based track fit is used very intensively. Therefore, developing fast reconstruction algorithms, which use maximum available power of processors, is important, in particular for the initial selection of events which carry signals of interesting physics.One of such powerful feature supported by almost all up-to-date PC processors is a SIMD instruction set, which allows packing several data items in one register and to operate on all of them, thus achieving more operations per clock cycle. The novel Cell processor extends the parallelization further by combining a general-purpose PowerPC processor core with eight streamlined coprocessing elements which greatly accelerate vector processing applications.In the investigation described here, after a significant memory optimization and a comprehensive numerical analysis, the Kalman filter based track fitting algorithm of the CBM experiment has been vectorized using inline operator overloading. Thus the algorithm continues to be flexible with respect to any CPU family used for data reconstruction.Because of all these changes the SIMDized Kalman filter based track fitting algorithm takes 1 μs per track that is 10000 times faster than the initial version. Porting the algorithm to a Cell Blade computer gives another factor of 10 of the speedup.Finally, we compare performance of the tracking algorithm running on three different CPU architectures: Intel Xeon, AMD Opteron and Cell Broadband Engine. 相似文献

5.

HASPRNG: Hardware Accelerated Scalable Parallel Random Number Generators

JunKyu Lee Yu Bi Gregory D. Peterson Robert J. Hinde Robert J. Harrison 《Computer Physics Communications》2009,180(12):2574-2581

The Scalable Parallel Random Number Generators library (SPRNG) supports fast and scalable random number generation with good statistical properties for parallel computational science applications. In order to accelerate SPRNG in high performance reconfigurable computing systems, we present the Hardware Accelerated SPRNG library (HASPRNG). Ported to the Xilinx University Program (XUP) and Cray XD1 reconfigurable computing platforms, HASPRNG includes the reconfigurable logic for Field Programmable Gate Arrays (FPGAs) along with a programming interface which performs integer random number generation that produces identical results with SPRNG. This paper describes the reconfigurable logic of HASPRNG exploiting the mathematical properties and data parallelism residing in the SPRNG algorithms to produce high performance and also describes how to use the programming interface to minimize the communication overhead between FPGAs and microprocessors. The programming interface allows a user to be able to use HASPRNG the same way as SPRNG 2.0 on platforms such as the Cray XD1. We also describe how to install HASPRNG and use it. For HASPRNG usage we discuss a FPGA π-estimator for a High Performance Reconfigurable Computer (HPRC) sample application and compare to a software π-estimator. HASPRNG shows 1.7x speedup over SPRNG on the Cray XD1 and is able to obtain substantial speedup for a HPRC application.

Program summary

Program title: HASPRNGCatalogue identifier: AEER_v1_0Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEER_v1_0.htmlProgram obtainable from: CPC Program Library, Queen's University, Belfast, N. IrelandLicensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.htmlNo. of lines in distributed program, including test data, etc.: 594 928No. of bytes in distributed program, including test data, etc.: 6 509 724Distribution format: tar.gzProgramming language: VHDL (XUP and Cray XD1), C++ (XUP), C (Cray XD1)Computer: PowerPC 405 (XUP) / AMD 2.2 GHz Opteron processor (Cray XD1)Operating system: LinuxFile size: 15 MB (XUP) / 22 MB (Cray XD1)Classification: 4.13Nature of problem: Many computational science applications are able to consume large numbers of random numbers. For example, Monte Carlo simulations such as π-estimation are able to consume limitless random numbers forthe computation as long as hardware resources for the computing are supported. Moreover, parallel computational science applications require independent streams of random numbers to attain statistically significant results. The SPRNG library provides this capability, but at a significant computational cost. The library presented here accelerates the generators of independent streams of random numbers.Solution method: Multiple copies of random number generators in FPGAs allow a computational science application to consume large numbers of random numbers from independent, parallel streams. HASPRNG is a random number generators library to allow a computational science application to employ the multiple copies of random number generators to boost performance. Users can interface HASPRNG with software code executing on microprocessors and/or with hardware applications executing on FPGAs. 相似文献

6.

Library of sophisticated functions for analysis of nuclear spectra

Miroslav Morhá? Vladislav Matoušek 《Computer Physics Communications》2009,180(10):1913-1940

In the paper we present compact library for analysis of nuclear spectra. The library consists of sophisticated functions for background elimination, smoothing, peak searching, deconvolution, and peak fitting. The functions can process one- and two-dimensional spectra. The software described in the paper comprises a number of conventional as well as newly developed methods needed to analyze experimental data.

Program summary

Program title: SpecAnalysLib 1.1Catalogue identifier: AEDZ_v1_0Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEDZ_v1_0.htmlProgram obtainable from: CPC Program Library, Queen's University, Belfast, N. IrelandLicensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.htmlNo. of lines in distributed program, including test data, etc.: 42 154No. of bytes in distributed program, including test data, etc.: 2 379 437Distribution format: tar.gzProgramming language: C++Computer: Pentium 3 PC 2.4 GHz or higher, Borland C++ Builder v. 6. A precompiled Windows version is included in the distribution packageOperating system: Windows 32 bit versionsRAM: 10 MBWord size: 32 bitsClassification: 17.6Nature of problem: The demand for advanced highly effective experimental data analysis functions is enormous. The library package represents one approach to give the physicists the possibility to use the advanced routines simply by calling them from their own programs. SpecAnalysLib is a collection of functions for analysis of one- and two-parameter γ-ray spectra, but they can be used for other types of data as well. The library consists of sophisticated functions for background elimination, smoothing, peak searching, deconvolution, and peak fitting.Solution method: The algorithms of background estimation are based on Sensitive Non-linear Iterative Peak (SNIP) clipping algorithm. The smoothing algorithms are based on the convolution of the original data with several types of filters and algorithms based on discrete Markov chains. The peak searching algorithms use the smoothed second differences and they can search for peaks of general form. The deconvolution (decomposition - unfolding) functions use the Gold iterative algorithm, its improved high resolution version and Richardson-Lucy algorithm. In the algorithms of peak fitting we have implemented two approaches. The first one is based on the algorithm without matrix inversion - AWMI algorithm. It allows it to fit large blocks of data and large number of parameters. The other one is based on the calculation of the system of linear equations using Stiefel-Hestens method. It converges faster than the AWMI, however it is not suitable for fitting large number of parameters.Restrictions: Dimensionality of the analyzed data is limited to two.Unusual features: Dynamically loadable library (DLL) of processing functions users can call from their own programs.Running time: Most processing routines execute interactively or in a few seconds. Computationally intensive routines (deconvolution, fitting) execute longer, depending on the number of iterations specified and volume of the processed data. 相似文献

7.

ESPResSo—an extensible simulation package for research on soft matter systems

H.J. Limbach A. Arnold B.A. Mann 《Computer Physics Communications》2006,174(9):704-727

相似文献

8.

Scalable and portable visualization of large atomistic datasets

Ashish Sharma Rajiv K. Kalia Aiichiro Nakano Priya Vashishta 《Computer Physics Communications》2004,163(1):53-64

A scalable and portable code named Atomsviewer has been developed to interactively visualize a large atomistic dataset consisting of up to a billion atoms. The code uses a hierarchical view frustum-culling algorithm based on the octree data structure to efficiently remove atoms outside of the user's field-of-view. Probabilistic and depth-based occlusion-culling algorithms then select atoms, which have a high probability of being visible. Finally a multiresolution algorithm is used to render the selected subset of visible atoms at varying levels of detail. Atomsviewer is written in C++ and OpenGL, and it has been tested on a number of architectures including Windows, Macintosh, and SGI. Atomsviewer has been used to visualize tens of millions of atoms on a standard desktop computer and, in its parallel version, up to a billion atoms.

Program summary

Title of program: AtomsviewerCatalogue identifier: ADUMProgram summary URL:http://cpc.cs.qub.ac.uk/summaries/ADUMProgram obtainable from: CPC Program Library, Queen's University of Belfast, N. IrelandComputer for which the program is designed and others on which it has been tested: 2.4 GHz Pentium 4/Xeon processor, professional graphics card; Apple G4 (867 MHz)/G5, professional graphics cardOperating systems under which the program has been tested: Windows 2000/XP, Mac OS 10.2/10.3, SGI IRIX 6.5Programming languages used: C++, C and OpenGLMemory required to execute with typical data: 1 gigabyte of RAMHigh speed storage required: 60 gigabytesNo. of lines in the distributed program including test data, etc.: 550 241No. of bytes in the distributed program including test data, etc.: 6 258 245Number of bits in a word: ArbitraryNumber of processors used: 1Has the code been vectorized or parallelized: NoDistribution format: tar gzip fileNature of physical problem: Scientific visualization of atomic systemsMethod of solution: Rendering of atoms using computer graphic techniques, culling algorithms for data minimization, and levels-of-detail for minimal renderingRestrictions on the complexity of the problem: NoneTypical running time: The program is interactive in its executionUnusual features of the program: NoneReferences: The conceptual foundation and subsequent implementation of the algorithms are found in [A. Sharma, A. Nakano, R.K. Kalia, P. Vashishta, S. Kodiyalam, P. Miller, W. Zhao, X.L. Liu, T.J. Campbell, A. Haas, Presence—Teleoperators and Virtual Environments 12 (1) (2003)]. 相似文献

9.

ERI sorting for emerging processor architectures

Tirath Ramdas Gregory K. Egan 《Computer Physics Communications》2009,180(8):1221-1229

Electron Repulsion Integrals (ERIs) are a common bottleneck in ab initio computational chemistry. It is known that sorted/reordered execution of ERIs results in efficient SIMD/vector processing. This paper shows that reconfigurable computing and heterogeneous processor architectures can also benefit from a deliberate ordering of ERI tasks. However, realizing these benefits as net speedup requires a very rapid sorting mechanism. This paper presents two such mechanisms. Included in this study are analytical, simulation-based, and experimental benchmarking approaches to consider five use cases for ERI sorting, i.e. SIMD processing, reconfigurable computing, limited address spaces, instruction cache exploitation, and data cache exploitation. Specific consideration is given to existing cache-based processors, FPGAs, and the Cell Broadband Engine processor. It is proposed that the analyses conducted in this work should be built upon to aid the development of software autotuners which will produce efficient ab initio computational chemistry codes for a variety of computer architectures. 相似文献

10.

In situ ray tracing and computational steering for interactive blood flow simulation

Marco D. Mazzeo 《Computer Physics Communications》2010,181(2):355-3999

Recent algorithm and hardware developments have significantly improved our capability to interactively visualise time-varying flow fields. However, when visualising very large dynamically varying datasets interactively there are still limitations in the scalability and efficiency of these methods. Here we present a rendering pipeline which employs an efficient in situ ray tracing technique to visualise flow fields as they are simulated. The ray casting approach is particularly well suited for the visualisation of large and sparse time-varying datasets, where it is capable of rendering fluid flow fields at high image resolutions and at interactive frame rates on a single multi-core processor using OpenMP. The parallel implementation of our in situ visualisation method relies on MPI, requires no specialised hardware support, and employs the same underlying spatial decomposition as the fluid simulator. The visualisation pipeline allows the user to operate on a commodity computer and explore the simulation output interactively. Our simulation environment incorporates numerous features that can be utilised in a wide variety of research contexts. 相似文献

11.

Efficient parallel implementation of Bose Hubbard model: Exact numerical ground states and dynamics of gaseous Bose-Einstein condensates

Mary Ann E. Leung William P. Reinhardt 《Computer Physics Communications》2007,177(4):348-356

We present a parallel implementation of the Bose Hubbard model, using imaginary time propagation to find the lowest quantum eigenstate and real time propagation for simulation of quantum dynamics. Scaling issues, performance of sparse matrix-vector multiplication, and a parallel algorithm for determining nonzero matrix elements are described. Implementation of imaginary time propagation yields an O(N) linear convergence on a single processor and slightly better than ideal performance on up to 160 processors for a particular problem size. The determination of the nonzero matrix elements is intractable using sequential non-optimized techniques for large problem sizes. Thus, we discuss a parallel algorithm that takes advantage of the intrinsic structural characteristics of the Fock-space matrix representation of the Bose Hubbard Hamiltonian and utilizes a parallel implementation of a Fock state look up table to make this task solvable within reasonable timeframes. Our parallel algorithm demonstrates near ideal scaling on thousand of processors. We include results for a matrix 22.6 million square, with 202 million nonzero elements, utilizing 2048 processors. 相似文献

12.

Quantum Monte Carlo on graphical processing units

Amos G. Anderson William A. Goddard III 《Computer Physics Communications》2007,177(3):298-306

Quantum Monte Carlo (QMC) is among the most accurate methods for solving the time independent Schrödinger equation. Unfortunately, the method is very expensive and requires a vast array of computing resources in order to obtain results of a reasonable convergence level. On the other hand, the method is not only easily parallelizable across CPU clusters, but as we report here, it also has a high degree of data parallelism. This facilitates the use of recent technological advances in Graphical Processing Units (GPUs), a powerful type of processor well known to computer gamers. In this paper we report on an end-to-end QMC application with core elements of the algorithm running on a GPU. With individual kernels achieving as much as 30× speed up, the overall application performs at up to 6× faster relative to an optimized CPU implementation, yet requires only a modest increase in hardware cost. This demonstrates the speedup improvements possible for QMC in running on advanced hardware, thus exploring a path toward providing QMC level accuracy as a more standard tool. The major current challenge in running codes of this type on the GPU arises from the lack of fully compliant IEEE floating point implementations. To achieve better accuracy we propose the use of the Kahan summation formula in matrix multiplications. While this drops overall performance, we demonstrate that the proposed new algorithm can match CPU single precision. 相似文献

13.

Parallelization issues of a code for physically-based simulation of fabrics

Sergio Romero Emilio L. Zapata 《Computer Physics Communications》2004,162(3):188-202

The simulation of fabrics, clothes, and flexible materials is an essential topic in computer animation of realistic virtual humans and dynamic sceneries. New emerging technologies, as interactive digital TV and multimedia products, make necessary the development of powerful tools to perform real-time simulations. Parallelism is one of such tools. When analyzing computationally fabric simulations we found these codes belonging to the complex class of irregular applications. Frequently this kind of codes includes reduction operations in their core, so that an important fraction of the computational time is spent on such operations. In fabric simulators these operations appear when evaluating forces, giving rise to the equation system to be solved. For this reason, this paper discusses only this phase of the simulation. This paper analyzes and evaluates different irregular reduction parallelization techniques on ccNUMA shared memory machines, applied to a real, physically-based, fabric simulator we have developed. Several issues are taken into account in order to achieve high code performance, as exploitation of data access locality and parallelism, as well as careful use of memory resources (memory overhead). In this paper we use the concept of data affinity to develop various efficient algorithms for reduction parallelization exploiting data locality. 相似文献

14.

Reweighting histogram technique for the quantum Heisenberg ferromagnet

Adauto J.F. de Souza M.L. Lyra 《Computer Physics Communications》2002,146(1):16-23

In this work, we extend the reweighting histogram technique (RHT) to make it suitable for the study of quantum spin systems. Combining the Handscomb quantum Monte Carlo method with RHT, we compute some equilibrium critical parameters for the quantum S=1/2 Heisenberg ferromagnet on simple cubic lattices of sizes up to L=48. We found k_BT_c/J=1.67764(2), U∗=0.4504(1), γ=1.397(20), β=0.360(6) and ν=0.705(10). These are in perfect agreement with the best field theory and high temperature series results. Also, we found that the auto-correlation time near criticality scales roughly with the system's volume. Potential applications of the method to the study of general quantum spin systems are discussed. 相似文献

15.

PANMIN: sequential and parallel global optimization procedures with a variety of options for the local search strategy

F.V Theos D.G Papageorgiou 《Computer Physics Communications》2004,159(1):63-69

We present two sequential and one parallel global optimization codes, that belong to the stochastic class, and an interface routine that enables the use of the Merlin/MCL environment as a non-interactive local optimizer. This interface proved extremely important, since it provides flexibility, effectiveness and robustness to the local search task that is in turn employed by the global procedures. We demonstrate the use of the parallel code to a molecular conformation problem.

Program summary

Title of program: PANMINCatalogue identifier: ADSUProgram summary URL:http://cpc.cs.qub.ac.uk/summaries/ADSUProgram obtainable from: CPC Program Library, Queen's University of Belfast, N. IrelandComputer for which the program is designed and others on which it has been tested: PANMIN is designed for UNIX machines. The parallel code runs on either shared memory architectures or on a distributed system. The code has been tested on a SUN Microsystems ENTERPRISE 450 with four CPUs, and on a 48-node cluster under Linux, with both the GNU g77 and the Portland group compilers. The parallel implementation is based on MPI and has been tested with LAM MPI and MPICHInstallation: University of Ioannina, GreeceProgramming language used: Fortran-77Memory required to execute with typical data: Approximately O(n²) words, where n is the number of variablesNo. of bits in a word: 64No. of processors used: 1 or manyHas the code been vectorised or parallelized?: Parallelized using MPINo. of bytes in distributed program, including test data, etc.: 147163No. of lines in distributed program, including the test data, etc.: 14366Distribution format: gzipped tar fileNature of physical problem: A multitude of problems in science and engineering are often reduced to minimizing a function of many variables. There are instances that a local optimum does not correspond to the desired physical solution and hence the search for a better solution is required. Local optimization techniques can be trapped in any local minimum. Global Optimization is then the appropriate tool. For example, solving a non-linear system of equations via optimization, one may encounter many local minima that do not correspond to solutions, i.e. they are far from zeroMethod of solution: PANMIN is a suite of programs for Global Optimization that take advantage of the Merlin/MCL optimization environment [1,2]. We offer implementations of two algorithms that belong to the stochastic class and use local searches either as intermediate steps or as solution refinementRestrictions on the complexity of the problem: The only restriction is set by the available memory of the hardware configuration. The software can handle bound constrained problems. The Merlin Optimization environment must be installed. Availability of an MPI installation is necessary for executing the parallel codeTypical running time: Depending on the objective functionReferences: [1] D.G. Papageorgiou, I.N. Demetropoulos, I.E. Lagaris, Merlin-3.0. A multidimensional optimization environment, Comput. Phys. Commun. 109 (1998) 227-249. [2] D.G. Papageorgiou, I.N. Demetropoulos, I.E. Lagaris, The Merlin Control Language for strategic optimization, Comput. Phys. Commun. 109 (1998) 250-275. 相似文献

16.

The application hosting environment: Lightweight middleware for grid-based computational science

P.V. Coveney R.S. Saksena M. McKeown 《Computer Physics Communications》2007,176(6):406-418

Grid computing is distributed computing performed transparently across multiple administrative domains. Grid middleware, which is meant to enable access to grid resources, is currently widely seen as being too heavyweight and, in consequence, unwieldy for general scientific use. Its heavyweight nature, especially on the client-side, has severely restricted the uptake of grid technology by computational scientists. In this paper, we describe the Application Hosting Environment (AHE) which we have developed to address some of these problems. The AHE is a lightweight, easily deployable environment designed to allow the scientist to quickly and easily run legacy applications on distributed grid resources. It provides a higher level abstraction of a grid than is offered by existing grid middleware schemes such as the Globus Toolkit. As a result, the computational scientist does not need to know the details of any particular underlying grid middleware and is isolated from any changes to it on the distributed resources. The functionality provided by the AHE is ‘application-centric’: applications are exposed as web services with a well-defined standards-compliant interface. This allows the computational scientist to start and manage application instances on a grid in a transparent manner, thus greatly simplifying the user experience. We describe how a range of computational science codes have been hosted within the AHE and how the design of the AHE allows us to implement complex workflows for deployment on grid infrastructure. 相似文献

17.

A simulator for ensemble quantum computing

Jeffrey A. GeorgeMichael E. Colvin V.V. Krishnan 《Computer Physics Communications》2002,144(3):277-283

相似文献

18.

A fine grained parallel smooth particle mesh Ewald algorithm for biophysical simulation studies: Application to the 6-D torus QCDOC supercomputer

Bin Fang Yuefan Deng 《Computer Physics Communications》2007,177(4):362-377

In order to model complex heterogeneous biophysical macrostructures with non-trivial charge distributions such as globular proteins in water, it is important to evaluate the long range forces present in these systems accurately and efficiently. The Smooth Particle Mesh Ewald summation technique (SPME) is commonly used to determine the long range part of electrostatic energy in large scale molecular simulations. While the SPME technique does not give rise to a performance bottleneck on a single processor, current implementations of SPME on massively parallel, supercomputers become problematic at large processor numbers, limiting the time and length scales that can be reached. Here, a synergistic investigation involving method improvement, parallel programming and novel architectures is employed to address this difficulty. A relatively simple modification of the SPME technique is described which gives rise to both improved accuracy and efficiency on both massively parallel and scalar computing platforms. Our fine grained parallel implementation of the modified SPME method for the novel QCDOC supercomputer with its 6D-torus architecture is then given. Numerical tests of algorithm performance on up to 1024 processors of the QCDOC machine at BNL are presented for two systems of interest, a β-hairpin solvated in explicit water, a system which consists of 1142 water molecules and a 20 residue protein for a total of 3579 atoms, and the HIV-1 protease solvated in explicit water, a system which consists of 9331 water molecules and a 198 residue protein for a total of 29508 atoms. 相似文献

19.

Efficient data parallel algorithms for multidimensional array operations based on the EKMR scheme for distributed memory multicomputers

Chun-Yuan Lin Yeh-Ching Chung Jen-Shiuh Liu 《Parallel and Distributed Systems, IEEE Transactions on》2003,14(7):625-639

Array operations are useful in a large number of important scientific codes, such as molecular dynamics, finite element methods, climate modeling, atmosphere and ocean sciences, etc. In our previous work, we have proposed a scheme of extended Karnaugh map representation (EKMR) for multidimensional array representation. We have shown that sequential multidimensional array operation algorithms based on the EKMR scheme have better performance than those based on the traditional matrix representation (TMR) scheme. Since parallel multidimensional array operations have been an extensively investigated problem, we present efficient data parallel algorithms for multidimensional array operations based on the EKMR scheme for distributed memory multicomputers. In a data parallel programming paradigm, in general, we distribute array elements to processors based on various distribution schemes, do local computation in each processor, and collect computation results from each processor. Based on the row, column, and 2D mesh distribution schemes, we design data parallel algorithms for matrix-matrix addition and matrix-matrix multiplication array operations in both TMR and EKMR schemes for multidimensional arrays. We also design data parallel algorithms for six Fortran 90 array intrinsic functions: All, Maxval, Merge, Pack, Sum, and Cshift. We compare the time of the data distribution, the local computation, and the result collection phases of these array operations based on the TMR and the EKMR schemes. The experimental results show that algorithms based on the EKMR scheme outperform those based on the TMR scheme for all test cases. 相似文献

20.

Parallel Monte Carlo simulations by asynchronous domain decomposition

Alfred Uhlherr 《Computer Physics Communications》2003,155(1):31

A simple general method for performing Metropolis Monte Carlo condensed matter simulations on parallel processors is examined. The method is based on the cyclic generation of temporary discrete domains within the system, which are separated by distances greater than the inter-particle interaction range. Particle configurations within each domain are then sampled independently by an assigned processor, whilst particles outside these domains are held fixed. Results for a simulated Lennard-Jones fluid confirm that the method rigorously satisfies the detailed balance condition, and that the efficiency of configurational sampling scales almost linearly with the number of processors. Furthermore, the number of iterations performed on a given processor can be essentially arbitrary, with very low levels of inter-process communication. Provided the CPU time per step is not state-dependent, the method can then be used to perform large calculations as unsupervised background tasks on heterogeneous networks. 相似文献