Similar Literature
20 similar documents found.
1.
We evaluate the basic performance of the Intel iPSC/860 computer, which can have up to 128 Intel i860-based nodes connected together with a hypercube network topology. After giving a brief overview of the system, the properties and bottlenecks of the hardware architecture and software environment are discussed. Basic memory, scalar and vector performance of a single node is evaluated, and the communication performance and the overlap of computation and communication are analysed.
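The communication figures referred to above are typically obtained with a ping-pong microbenchmark. A minimal sketch follows, using mpi4py as a modern stand-in for the Intel NX blocking send/receive primitives used on the iPSC/860; the message sizes and repetition count are illustrative assumptions.

```python
# Ping-pong microbenchmark sketch; run with: mpirun -n 2 python pingpong.py
from mpi4py import MPI
import numpy as np
import time

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

for nbytes in (8, 1024, 65536, 1048576):           # assumed message sizes
    buf = np.zeros(nbytes, dtype=np.uint8)
    reps = 100
    comm.Barrier()
    t0 = time.perf_counter()
    for _ in range(reps):
        if rank == 0:
            comm.Send(buf, dest=1)                  # blocking send
            comm.Recv(buf, source=1)                # blocking receive
        elif rank == 1:
            comm.Recv(buf, source=0)
            comm.Send(buf, dest=0)
    dt = (time.perf_counter() - t0) / (2 * reps)    # one-way time per message
    if rank == 0:
        print(f"{nbytes:8d} bytes: {dt * 1e6:9.1f} us  {nbytes / dt / 1e6:9.1f} MB/s")
```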

2.
A macro package for expressing message passing functions within parallel FORTRAN programs is presented. It makes the user program fully portable among all parallel computers where the macros are implemented. The implementation on the Intel iPSC/2 hypercube is discussed in more detail. New message passing primitives have been added to the iPSC/2 operating system, offering the user a broader functionality at no efficiency loss. The full macro set, using these primitives, works with the same performance as the original Intel primitives.

3.
In this paper the performance of the Intel iPSC/2 hypercube multiprocessor is analyzed. Computation and communication performance for a number of benchmarks are presented. We derive some fundamental performance parameters of the machine. Further, we investigate the difference between several communication schemes. Using the results of our measurements we can highlight some features and peculiarities in the iPSC/2 hardware and software. Where possible we make a comparison with the iPSC/1 and Ncube hypercubes.

4.
This paper presents the results of parallelizing a three-dimensional Navier-Stokes solver on a 32K-processor Thinking Machines CM-2, a 128-node Intel iPSC/860, and an 8-processor CRAY Y-MP. The main objective of this work is to study the performance of the flow solver, INS3D-LU code, on two distributed-memory machines, a massively parallel SIMD machine (CM-2) and a moderately parallel MIMD machine (iPSC/860), and compare it with its performance on a shared-memory MIMD machine with a small number of processors (Y-MP). The code is based on a Lower-Upper Symmetric-Gauss-Seidel implicit scheme for the pseudocompressibility formulation of the three-dimensional incompressible Navier-Stokes equations. The code was rewritten in CMFORTRAN with shift operations and run on the CM-2 using the slicewise model. The code was also rewritten with distributed data and Intel message-passing calls and run on the iPSC/860. The timing results for two grid sizes are presented and analyzed using both 32-bit and 64-bit arithmetic. Also, the impact of communication and load balancing on the performance of the code is outlined. The results show that reasonable performance can be achieved on these parallel machines. However, the CRAY Y-MP outperforms the CM-2 and iPSC/860 for this particular algorithm. The author is an employee of Computer Sciences Corporation. This work was funded through NASA Contract NAS 2-12961.

5.
In this article, we study the effects of network topology and load balancing on the performance of a new parallel algorithm for solving triangular systems of linear equations on distributed-memory message-passing multiprocessors. The proposed algorithm employs novel runtime data mapping and workload redistribution methods on a communication network which is configured as a toroidal mesh. A fully parameterized theoretical model is used to predict communication behaviors of the proposed algorithm relevant to load balancing, and the analytical performance results correctly determine the optimal dimensions of the toroidal mesh, which vary with the problem size, the number of available processors, and the hardware parameters of the machine. Further enhancement to the proposed algorithm is then achieved through redistributing the arithmetic workload at runtime. Our FORTRAN implementation of the proposed algorithm as well as its enhanced version has been tested on an Intel iPSC/2 hypercube, and the same code is also suitable for executing the algorithm on the iPSC/860 hypercube and the Intel Paragon mesh multiprocessor. The actual timing results support our theoretical findings, and they both confirm the very significant impact a network topology chosen at runtime can have on the computational load distribution, the communication behaviors and the overall performance of parallel algorithms.

6.
Ordering clones from a genomic library into physical maps of whole chromosomes presents a pivotal computational problem in genetics. Previous research has shown the physical mapping problem to be isomorphic to the NP-complete Optimal Linear Arrangement (OLA) problem for which no polynomial-time algorithm for determining the optimal solution is known. Serial implementations of stochastic global optimization techniques such as simulated annealing yielded very good results but proved computationally intensive. The design, analysis and implementation of coarse-grained parallel MIMD algorithms for simulated annealing on the Intel iPSC/860 hypercube is presented. Data decomposition and control decomposition strategies based on Markov chain decomposition, perturbation methods and problem-specific annealing heuristics are proposed and applied to the physical mapping problem. A suite of parallel algorithms is implemented on an 8-node Intel iPSC/860 hypercube, exploiting the nearest-neighbor communication pattern on the Boolean hypercube topology. Convergence, speedup and scalability characteristics of the various parallel algorithms are analyzed and discussed. Results indicate a deterioration of performance when a single Markov chain of solution states is distributed across multiple processing elements in the Intel iPSC/860 hypercube.
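The core of such an approach is a simulated-annealing loop over vertex orderings. The sketch below minimizes a generic linear-arrangement cost with pairwise swap moves and geometric cooling; the move set, cooling schedule, and toy instance are illustrative assumptions, not the problem-specific heuristics or Markov chain decompositions of the paper.

```python
import math
import random

def arrangement_cost(order, edges):
    """Linear-arrangement cost: sum over edges of the distance between endpoint positions."""
    pos = {v: i for i, v in enumerate(order)}
    return sum(abs(pos[u] - pos[v]) for u, v in edges)

def anneal(vertices, edges, t0=10.0, alpha=0.999, steps=20000, seed=0):
    rng = random.Random(seed)
    order = list(vertices)
    rng.shuffle(order)
    cost = arrangement_cost(order, edges)
    best, best_cost, t = order[:], cost, t0
    for _ in range(steps):
        i, j = rng.randrange(len(order)), rng.randrange(len(order))
        order[i], order[j] = order[j], order[i]            # propose a swap move
        new_cost = arrangement_cost(order, edges)
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / t):
            cost = new_cost                                # accept the move
            if cost < best_cost:
                best, best_cost = order[:], cost
        else:
            order[i], order[j] = order[j], order[i]        # reject: undo the swap
        t *= alpha                                         # geometric cooling
    return best, best_cost

# toy instance: a 10-vertex path graph, whose optimal arrangement cost is 9
edges = [(k, k + 1) for k in range(9)]
print(anneal(range(10), edges))
```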

7.
Researchers at Oak Ridge National Laboratory have developed an application code for calculating the electronic properties and energetics of disordered materials. The same source code has been compiled and run on workstations, Crays, and the Intel iPSC/860. This electronic structures code is capable of running at over 2 gigaflops on both an 8-processor CRAY Y-MP and a 128-processor Intel iPSC/860. Using this new KKR-CPA code, we executed density-of-states computations of a perovskite superconductor at a rate of 2527 megaflops on the Intel iPSC/860. This corresponds to a price/performance rate of 842 megaflops per $1 million based on the list price of this computer. Similar but smaller computations done on a network of ten IBM RS/6000 workstations executed at a price/performance rate of 1.3 gigaflops per $1 million. This research was supported by the Applied Mathematical Sciences Research Program, Office of Energy Research, and the Division of Materials Sciences, U.S. Department of Energy, under contract DE-AC05-84OR21400 with Martin Marietta Energy Systems, Inc.

8.
A platform for biological sequence comparison on parallel computers
We have written two programs for searching biological sequence databases that run on Intel hypercube computers. PSCANLIB compares a single sequence against a sequence library, and PCOMPLIB compares all the entries in one sequence library against a second library. The programs provide a general framework for similarity searching; they include functions for reading in query sequences, search parameters and library entries, and reporting the results of a search. We have isolated the code for the specific function that calculates the similarity score between the query and library sequence; alternative searching algorithms can be implemented by editing two files. We have implemented the rapid FASTA sequence comparison algorithm and the more rigorous Smith-Waterman algorithm within this framework. The PSCANLIB program on a 16-node iPSC/2 80386-based hypercube can compare a 229 amino acid protein sequence with a 3.4 million residue sequence library in approximately 16 s with the FASTA algorithm. Using the Smith-Waterman algorithm, the same search takes 35 min. The PCOMPLIB program can compare a 0.8 million amino acid protein sequence library with itself in 5.3 min with FASTA on a third-generation 32-node Intel iPSC/860 hypercube.
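For reference, the core of the Smith-Waterman computation is a simple dynamic-programming recurrence. A minimal scorer is sketched below; the match/mismatch scores and linear gap penalty are illustrative, whereas the programs described use full substitution matrices and more refined gap models.

```python
def smith_waterman_score(a, b, match=5, mismatch=-4, gap=-8):
    """Return the best local-alignment score between sequences a and b."""
    prev = [0] * (len(b) + 1)       # previous row of the DP matrix
    best = 0
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
            best = max(best, cur[j])
        prev = cur
    return best

print(smith_waterman_score("HEAGAWGHEE", "PAWHEAE"))
```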

9.
Sequence alignment with PVM on a cluster of workstations
Sequence alignment is an important tool in molecular biology research. With the rapid growth of DNA data, efficient sequence alignment algorithms have become essential for studying newly discovered sequences. A distributed sequence alignment method based on the Smith-Waterman algorithm has been implemented with the PVM system on a cluster of workstations, and the same approach has also been implemented successfully on the Intel iPSC/860 high-performance parallel computer. This distributed Smith-Waterman algorithm serves as the search tool for Internet GRAIL and GENQUEST. This paper describes the implementation of the algorithm and its performance.

10.
The MCHF (Multiconfiguration Hartree-Fock) atomic structure package consists of a series of programs that predict a range of atomic properties and communicate information through files. Several of these have now been modified for the distributed-memory environment. On the Intel iPSC/860 the restricted amount of memory and the lack of virtual memory required a redesign of the data organization with large arrays residing on disk. The data structures also had to be modified. To a large extent, data could be distributed among the nodes, but crucial to the performance of the MCHF program was the global information that is needed for an even distribution of the workload. This paper outlines the computational problems that must be solved in an atomic structure calculation and describes the strategies used to distribute both the data and the workload on a distributed-memory system. Performance data are provided for some benchmark calculations on the Intel iPSC/860.
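The idea of keeping large arrays on disk and streaming blocks through memory can be illustrated with a memory-mapped array. This is a generic sketch, not the MCHF data organization; the array and block sizes are arbitrary.

```python
import numpy as np

n, block = 4000, 500
a = np.memmap("big_matrix.dat", dtype=np.float64, mode="w+", shape=(n, n))

# Fill the disk-resident matrix one block of rows at a time.
for r0 in range(0, n, block):
    rows = a[r0:r0 + block]              # only this slice needs to be resident
    rows[:] = np.random.rand(*rows.shape)
    a.flush()                            # push the block back to disk

# Accumulate column sums, again streaming the matrix block by block.
col_sums = np.zeros(n)
for r0 in range(0, n, block):
    col_sums += a[r0:r0 + block].sum(axis=0)
print(col_sums[:5])
```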

11.
A shift-register sequence random number generator is programmed in Intel 8088/8086 and 8087 assembly language. Its performance is good in terms of speed and period length, and it consistently produces reliable results in extensive statistical tests. This generator may make microcomputers equipped with the 8088/8086 and 8087 suitable workstations for developing Monte Carlo simulation programs for lattice field theories.
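A shift-register (generalized feedback shift-register) generator is easy to sketch in a high-level language. The lags (250, 103) below follow the classic Kirkpatrick-Stoll R250 generator and are an assumption, since the abstract does not state the exact recurrence used.

```python
import random

class GFSR:
    """Shift-register sequence generator: x_n = x_{n-p} XOR x_{n-q}."""
    def __init__(self, p=250, q=103, seed=12345):
        rng = random.Random(seed)
        self.p, self.q = p, q
        self.state = [rng.getrandbits(32) for _ in range(p)]  # seed table from another RNG
        self.i = 0

    def next32(self):
        j = (self.i + self.p - self.q) % self.p   # position of x_{n-q} in the circular buffer
        x = self.state[self.i] ^ self.state[j]    # x_{n-p} XOR x_{n-q}
        self.state[self.i] = x
        self.i = (self.i + 1) % self.p
        return x

g = GFSR()
print([g.next32() for _ in range(5)])
```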

12.
We discuss implementations of block algorithms for recursive least squares (RLS for short) problems on ring distributed-memory multiprocessors. We consider the sliding rectangular window case which involves triangularization followed by updating and downdating of the data matrix. We compare several schemes for computing the current least-squares solution, including a direct back-substitution scheme and a scheme where the previous solution vector is updated to the current solution vector by adding the so-called Kalman gain vector. The techniques are implemented on a linear array of transputers and on the Intel iPSC/2 hypercube, and evaluated with respect to their execution time and numerical accuracy.
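The Kalman-gain update mentioned above follows the standard recursive least-squares recursion. A minimal exponentially weighted sketch is given below; the paper's sliding rectangular window additionally downdates old data, which is not shown here.

```python
import numpy as np

def rls(X, y, lam=0.99, delta=100.0):
    """Exponentially weighted RLS: returns the final weight vector."""
    n = X.shape[1]
    w = np.zeros(n)
    P = delta * np.eye(n)                 # estimate of the inverse correlation matrix
    for x, d in zip(X, y):
        k = P @ x / (lam + x @ P @ x)     # Kalman gain vector
        e = d - w @ x                     # a-priori error
        w = w + k * e                     # update the solution with the gain vector
        P = (P - np.outer(k, x @ P)) / lam
    return w

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.01 * rng.standard_normal(500)
print(rls(X, y))                          # should approach w_true
```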

13.
The Monte Carlo (MC) method is an important particle transport simulation method in nuclear reactor design and analysis. MC methods can model complex geometries and deliver highly accurate results, but they require large amounts of computing time to simulate hundreds of millions of particles. Improving the performance of Monte Carlo codes has therefore become a key challenge for large-scale Monte Carlo numerical simulation. Based on the Reactor Monte Carlo analysis code RMC, a series of optimizations was carried out, including TCMalloc-based dynamic memory allocation optimization, OpenMP thread-scheduling strategy optimization, vector memory-alignment optimization, and HDF5-based parallel I/O optimization. For a test case with 2 million particles, these optimizations improve overall performance by more than 26.45%.

14.
Monte Carlo tree search (MCTS) is a commonly used reinforcement learning algorithm whose learning efficiency is limited by the exponential growth of the game's state space. This work optimizes MCTS with a parallel approach and proposes a parallel Monte Carlo tree search algorithm based on propagating win-rate estimates. The improved parallel game-search framework consists of one master process and multiple worker processes: the workers explore the search space, while the master makes decisions based on the win-rate estimates returned by the workers. Experiments on the multi-agent game platform Pommerman show that, compared with conventional MCTS, the parallel algorithm effectively improves resource utilization, win rate, and decision efficiency.
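The master/worker structure can be illustrated with root-parallel rollouts: worker processes estimate the win rate of each candidate move by random play, and the master decides from the returned estimates. The toy take-away game and the flat (treeless) rollout evaluation below are simplifications for illustration, not the algorithm of the paper.

```python
import random
from multiprocessing import Pool

def rollout_winrate(args):
    """Worker: estimate the win rate of opening with `move` stones via random play."""
    move, stones, n_games, seed = args
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_games):
        s, player = stones - move, 1          # we are player 0 and have just moved
        while s > 0:
            s -= rng.randint(1, min(3, s))    # current player takes 1-3 stones
            player ^= 1
        wins += (player == 1)                 # the player who took the last stone wins
    return move, wins / n_games

if __name__ == "__main__":
    stones = 21
    tasks = [(m, stones, 5000, 100 + m) for m in (1, 2, 3)]
    with Pool(processes=3) as pool:           # workers explore in parallel
        estimates = dict(pool.map(rollout_winrate, tasks))
    best = max(estimates, key=estimates.get)  # master decides from the win-rate estimates
    print(estimates, "-> choose", best)
```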

15.
A three-dimensional electromagnetic particle-in-cell code with Monte Carlo collision (PIC-MCC) is developed for MIMD parallel supercomputers. This code uses a standard relativistic leapfrog scheme incorporating Monte Carlo calculations to push plasma particles and to include collisional effects on particle orbits. A local finite-difference time-domain method is used to update the self-consistent electromagnetic fields. The code is implemented using the General Concurrent PIC (GCPIC) algorithm, which uses domain decomposition to divide the computation among the processors. Particles must be exchanged between processors as they move among subdomains. Message passing is implemented using the Express Cubix library and the PVM. We evaluate the performance of this code using a 512-processor Intel Touchstone Delta, a 512-processor Intel Paragon, and a 256-processor CRAY T3D. It is shown that a high parallel efficiency exceeding 95% has been achieved on all three machines for large problems. We have run PIC-MCC simulations using several hundred million particles with several million collisions per time step. For these large-scale simulations the particle push time achieved is in the range of 90–115 ns/particle/time step, and the collision calculation time in the range of a few hundred nanoseconds per collision.
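The particle push at the heart of a PIC code is a leapfrog time-staggering of positions and velocities. The sketch below shows only the electrostatic, non-relativistic form; the code described uses the relativistic version with magnetic-field rotation and interpolation of grid fields to particle positions.

```python
import numpy as np

def leapfrog_push(x, v_half, E_at, q_over_m, dt, steps):
    """Advance positions x and time-staggered velocities v_half by `steps` leapfrog steps."""
    for _ in range(steps):
        v_half = v_half + q_over_m * E_at(x) * dt   # v^{n+1/2} = v^{n-1/2} + (q/m) E(x^n) dt
        x = x + v_half * dt                         # x^{n+1}   = x^n + v^{n+1/2} dt
    return x, v_half

# toy test: a uniform field E = 1 accelerates the particles ballistically
x0 = np.zeros(5)
v0 = np.linspace(0.0, 1.0, 5)
x, v = leapfrog_push(x0, v0, E_at=lambda x: np.ones_like(x), q_over_m=1.0, dt=0.01, steps=100)
print(x, v)
```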

16.
A mesh-vertex finite volume scheme for solving the Euler equations on triangular unstructured meshes is implemented on a MIMD (multiple instruction/multiple data stream) parallel computer. Three partitioning strategies for distributing the work load onto the processors are discussed. Issues pertaining to the communication costs are also addressed. We find that the spectral bisection strategy yields the best performance. The performance of this unstructured computation on the Intel iPSC/860 compares very favorably with that on a one-processor CRAY Y-MP/1 and an earlier implementation on the Connection Machine. The authors are employees of Computer Sciences Corporation. This work was funded under contract NAS 2-12961.
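Spectral bisection partitions a mesh by the Fiedler vector (the eigenvector of the second-smallest eigenvalue) of the graph Laplacian. A dense-matrix sketch on a small grid graph follows; production partitioners use sparse eigensolvers and apply the bisection recursively.

```python
import numpy as np

def spectral_bisection(n_vertices, edges):
    """Split vertices into two parts using the Fiedler vector of the graph Laplacian."""
    L = np.zeros((n_vertices, n_vertices))
    for u, v in edges:
        L[u, u] += 1; L[v, v] += 1
        L[u, v] -= 1; L[v, u] -= 1
    vals, vecs = np.linalg.eigh(L)
    fiedler = vecs[:, 1]                  # eigenvector of the 2nd-smallest eigenvalue
    cut = np.median(fiedler)
    return fiedler <= cut                 # boolean mask: True = first part

# toy mesh: a 4x4 grid graph, expected to split into two connected halves
n = 4
edges = [(i * n + j, i * n + j + 1) for i in range(n) for j in range(n - 1)] + \
        [(i * n + j, (i + 1) * n + j) for i in range(n - 1) for j in range(n)]
print(spectral_bisection(n * n, edges).astype(int).reshape(n, n))
```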

17.
The Gamma database machine project
The design of the Gamma database machine and the techniques employed in its implementation are described. Gamma is a relational database machine currently operating on an Intel iPSC/2 hypercube with 32 processors and 32 disk drives. Gamma employs three key technical ideas which enable the architecture to be scaled to hundreds of processors. First, all relations are horizontally partitioned across multiple disk drives, enabling relations to be scanned in parallel. Second, parallel algorithms based on hashing are used to implement the complex relational operators, such as join and aggregate functions. Third, dataflow scheduling techniques are used to coordinate multioperator queries. By using these techniques, it is possible to control the execution of very complex queries with minimal coordination. The design of the Gamma software is described and a thorough performance evaluation of the iPSC/2 hypercube version of Gamma is presented.
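The hash-based join strategy can be sketched in-process: both relations are partitioned by a hash of the join key, and each partition pair is joined independently. In Gamma the partitions live on different processors and disks; the single-process sketch below only illustrates the partitioning and build/probe phases.

```python
from collections import defaultdict

def hash_partition(rows, key, n_parts):
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        parts[hash(row[key]) % n_parts].append(row)
    return parts

def hash_join(r, s, key, n_parts=4):
    """Equi-join relations r and s on `key`: one independent hash join per partition."""
    out = []
    for rp, sp in zip(hash_partition(r, key, n_parts), hash_partition(s, key, n_parts)):
        table = defaultdict(list)             # build phase: hash table on r's partition
        for row in rp:
            table[row[key]].append(row)
        for row in sp:                        # probe phase with s's partition
            for match in table[row[key]]:
                out.append({**match, **row})
    return out

emp = [{"dept": d, "name": f"e{i}"} for i, d in enumerate([1, 2, 1, 3])]
dept = [{"dept": 1, "dname": "sales"}, {"dept": 2, "dname": "hr"}]
print(hash_join(emp, dept, "dept"))
```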

18.
In this article we discuss the detailed implementation of a parallel pseudospectral code for integration of the Navier-Stokes equations on an Intel iPSC/860 Hypercube. Issues related to the basic efficient parallelization of the algorithm on a hypercube are discussed, as well as optimization issues specific to the iPSC/860 system. With the combination of optimizations presented, the code runs on a 32-node iPSC/860 system at a speed exceeding that of the fastest implementation on a Cray YMP by nearly 25%.
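The pseudospectral idea, evaluating derivatives in Fourier space, can be shown in one dimension. The sketch below differentiates a periodic function with the FFT; the solver described applies three-dimensional transforms distributed across the hypercube nodes.

```python
import numpy as np

def spectral_derivative(u, L=2 * np.pi):
    """Differentiate a periodic sample u(x) on [0, L) via the FFT."""
    n = len(u)
    k = 2j * np.pi * np.fft.fftfreq(n, d=L / n)   # derivative operator i*k in Fourier space
    return np.real(np.fft.ifft(k * np.fft.fft(u)))

x = np.linspace(0, 2 * np.pi, 64, endpoint=False)
u = np.sin(3 * x)
print(np.max(np.abs(spectral_derivative(u) - 3 * np.cos(3 * x))))   # error near machine precision
```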

19.
Financial Monte Carlo simulations are computationally intensive applications that must meet tight deadlines in terms of job completion times. The completion time might have a huge impact on the financial profits made from decisions derived from the simulation results. Naturally, there is a huge interest in being able to simulate as fast as possible. While single simulations can be done on one machine, decisions often depend on portfolios of simulations. Distributing the workload among resources is crucial to achieve low latency. In this article we present a combination of a middleware with a high‐performance implementation of an Asian options evaluation code on the Cell Broadband Engine (CBE). We handle workload distribution with our PHASTGrid middleware and provide users with a web service interface to the whole infrastructure. The CBE is particularly suitable for Monte Carlo simulations. We implemented a well‐known algorithm on both the CBE and the Intel x86 multicore architectures. Both codes are integrated in our middleware, allowing a direct comparison of the performance and scalability. In addition to the Monte Carlo simulation, we also use different applications and compare our middleware with Globus. Copyright © 2009 John Wiley & Sons, Ltd.
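An arithmetic-average Asian call can be priced by Monte Carlo over geometric Brownian motion paths, as sketched below. All parameters are illustrative, and the production code discussed adds variance reduction and hardware-specific tuning not shown here.

```python
import numpy as np

def asian_call_mc(s0=100.0, k=100.0, r=0.05, sigma=0.2, t=1.0,
                  n_steps=252, n_paths=100_000, seed=0):
    """Price an arithmetic-average Asian call by Monte Carlo under GBM."""
    rng = np.random.default_rng(seed)
    dt = t / n_steps
    z = rng.standard_normal((n_paths, n_steps))
    log_paths = np.cumsum((r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z, axis=1)
    paths = s0 * np.exp(log_paths)                 # simulated price paths
    avg = paths.mean(axis=1)                       # arithmetic average over each path
    payoff = np.maximum(avg - k, 0.0)
    price = np.exp(-r * t) * payoff.mean()         # discounted expected payoff
    stderr = np.exp(-r * t) * payoff.std(ddof=1) / np.sqrt(n_paths)
    return price, stderr

print(asian_call_mc())
```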

20.
Grimshaw, A.S. Computer, 1993, 26(5): 39-51
Mentat, an object-oriented parallel processing system designed to directly address the difficulty of developing architecture-independent parallel programs, is discussed. The Mentat system consists of two components: the Mentat programming language and the Mentat runtime system. The Mentat programming language, which is based on C++, is described. Performance results from implementing the Mentat runtime system on a network of Sun 3 and 4 workstations, the Silicon Graphics Iris, the Intel iPSC/2, and the Intel iPSC/860 are presented.
