1.

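Design and Implementation of the BJ-1 Parallel Computer

Han Chengde, Xue Yibo 《Chinese Journal of Computers》1995,(12)

Parallel processing is a key technology in today's computing and a defining architectural feature of the new generation of computers. We studied parallel processing from two angles: basic principles and implementation techniques. This paper presents the design principles, design and implementation, performance specifications, and performance test results of the BJ-1 parallel computer.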

2.

Hidden Surface Elimination on Parallel Processors

Julian C. Highfield Helmut E. Bez 《Computer Graphics Forum》1992,11(5):293-307

3.

Uniform random number generators for parallel computers

István Deák 《Parallel Computing》1990,15(1-3):155-164

Almost all simulational computations require uniformly distributed random numbers. Generators of uniform random numbers are considered and assessed with respect to their possible use on parallel computers. Two recent, commercially available computers are given special attention: the Connection Machine and the T Series. Feedback shift register type generators with a large Mersenne prime are recommended for implementation on these computers.
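As a rough illustration of the feedback shift register idea mentioned in this abstract, the sketch below generates bits from a two-tap recurrence over GF(2). The polynomial x^7 + x + 1 and the names `fsr_bits` and `smallest_period` are assumptions chosen for illustration, not taken from the paper, which recommends a much larger Mersenne prime. The small case shows why Mersenne primes are attractive: since 2^7 - 1 = 127 is prime, every irreducible polynomial of degree 7 gives the full period.

```python
from collections import deque
from itertools import islice


def fsr_bits(seed_bits, p=7, q=1):
    """Yield bits of the recurrence a[n+p] = a[n] XOR a[n+q].

    The characteristic polynomial is x^p + x^q + 1 over GF(2).
    The defaults (p=7, q=1) are an illustrative assumption.
    """
    seed_bits = list(seed_bits)
    if len(seed_bits) != p or not any(seed_bits):
        raise ValueError("seed must contain exactly p bits, not all zero")
    state = deque(seed_bits, maxlen=p)  # holds a[n] .. a[n+p-1]
    while True:
        new_bit = state[0] ^ state[q]
        yield state[0]
        state.append(new_bit)  # maxlen drops a[n] from the left


def smallest_period(bits):
    """Smallest d such that bits[i] == bits[i + d] for every valid i."""
    n = len(bits)
    for d in range(1, n):
        if all(bits[i] == bits[i + d] for i in range(n - d)):
            return d
    return n


sample = list(islice(fsr_bits([1, 0, 0, 0, 0, 0, 0]), 3 * 127))
print(smallest_period(sample))  # full period 2**7 - 1
```

On a parallel machine, one plausible use is to give each processor its own seed or its own offset into the sequence; how to do that safely is the kind of question the paper assesses.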

4.

A parallel implementation of the Wang-Landau algorithm

Lixin Zhan 《Computer Physics Communications》2008,179(5):339-344

The Wang-Landau algorithm is a flat-histogram Monte Carlo method that performs random walks in the configuration space of a system to obtain a close estimation of the density of states iteratively. It has been applied successfully to many research fields. In this paper, we propose a parallel implementation of the Wang-Landau algorithm on computers of shared memory architectures by utilizing the OpenMP API for distributed computing. This implementation is applied to Ising model systems with promising speedups. We also examine the effects on the running speed when using different strategies in accessing the shared memory space during the updating procedure. The allowance of data race is recommended in consideration of the simulation efficiency. Such treatment does not affect the accuracy of the final density of states obtained.

5.

Structural dynamic analysis on a parallel computer: The finite element machine

《Computers & Structures》1987,26(4):551-559

The development of general-purpose finite element computer software systems has provided the capability to analyze a wide range of linear and non-linear structural problems. However, these software systems are severely limited for non-linear response calculations because of the available speed on current sequential computers. Recent and projected advances in parallel multiple instruction multiple data (MIMD) computers provide an opportunity for significant gains in computing speed and for broadening the range of structural problems which may be solved. The key to these gains is the effective selection and implementation of algorithms which exploit parallel computing. This paper documents experiences solving transient response calculations on an experimental MIMD computer, termed the Finite Element Machine. The paper describes the algorithm used, its implementation for parallel computations, and results for representative one- and two-dimensional dynamic response test problems. The results show computation speedups of up to 7.83 for eight processors, and indicate that significant speedups of solution time are possible for non-linear dynamic response calculations through the use of many processors and appropriate parallel integration algorithms. The results are extremely encouraging and suggest that significant speedups in structural computations can be achieved through advances in parallel computers.
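To put the quoted speedup of 7.83 on eight processors in context, the short sketch below computes the parallel efficiency and the experimentally determined serial fraction (the Karp-Flatt metric). Neither quantity is reported in the abstract; the function names are hypothetical and the calculation is only a reading aid.

```python
def parallel_efficiency(speedup, processors):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / processors


def karp_flatt_serial_fraction(speedup, processors):
    """Experimentally determined serial fraction e = (1/S - 1/p) / (1 - 1/p)."""
    return (1.0 / speedup - 1.0 / processors) / (1.0 - 1.0 / processors)


speedup, processors = 7.83, 8  # figures quoted in the abstract
print(f"efficiency      = {parallel_efficiency(speedup, processors):.3f}")
print(f"serial fraction = {karp_flatt_serial_fraction(speedup, processors):.4f}")
```

An efficiency close to 0.98 corresponds to a serial fraction of roughly 0.3%, which is consistent with the abstract's description of the results as encouraging.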

6.

Parallel Software Abstractions for Structured Adaptive Mesh Methods

《Journal of Parallel and Distributed Computing》2001,61(6):713-736

We describe and analyze parallelization techniques for the implementation of portable structured adaptive mesh applications on distributed memory parallel computers. Such methods are difficult to implement on parallel computers because they employ elaborate dynamic data structures to selectively capture localized irregular phenomena. Our infrastructure supports a set of layered abstractions that encapsulate low-level details of resource management, such as grid generation, interprocessor communication, and load balancing. Our layered design also provides the flexibility necessary to accommodate new applications and to fine-tune performance. This flexibility has enabled us to show that the uniformity restrictions imposed by a data parallel Fortran implementation (e.g., HPF) would significantly impact performance of structured adaptive mesh methods. We present computational results from eigenvalue computation arising in materials design.

7.

Massively parallel quantum computer simulator

K. De Raedt H. De Raedt B. Trieu 《Computer Physics Communications》2007,176(2):121-136

We describe portable software to simulate universal quantum computers on massively parallel computers. We illustrate the use of the simulation software by running various quantum algorithms on different computer architectures, such as an IBM BlueGene/L, an IBM Regatta p690+, a Hitachi SR11000/J1, a Cray X1E, an SGI Altix 3700 and clusters of PCs running Windows XP. We study the performance of the software by simulating quantum computers containing up to 36 qubits, using up to 4096 processors and up to 1 TB of memory. Our results demonstrate that the simulator exhibits nearly ideal scaling as a function of the number of processors and suggest that the simulation software described in this paper may also serve as a benchmark for testing high-end parallel computers.
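The pairing of 36 qubits with about 1 TB of memory follows from the size of a full state vector. The sketch below does the arithmetic under one stated assumption: each complex amplitude is stored as two 8-byte floating-point numbers (16 bytes). The abstract does not state the storage format, and `state_vector_bytes` is a hypothetical helper.

```python
BYTES_PER_AMPLITUDE = 16  # assumption: one complex amplitude = two 8-byte floats


def state_vector_bytes(num_qubits, bytes_per_amplitude=BYTES_PER_AMPLITUDE):
    """Memory needed to hold all 2**num_qubits amplitudes of a pure state."""
    return (2 ** num_qubits) * bytes_per_amplitude


for n in (30, 33, 36):
    tib = state_vector_bytes(n) / 2 ** 40
    print(f"{n} qubits: {tib:g} TiB")
```

Each additional qubit doubles the requirement, which is why the memory of the machine, more than its processor count, bounds the number of qubits that can be simulated this way.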

8.

The parallel ‘Deutschland-Modell’ - A message-passing version for distributed memory computers

Ulrich Schättler Elisabeth Krenzien 《Parallel Computing》1997,23(14):13-2226

The parallel ‘Deutschland-Modell’ and its implementation on distributed memory parallel computers using the message-passing library PARMACS 6.0 is described. Performance results on a Cray T3D are given and the problem of dynamical load imbalances is addressed.

9.

Optimal implementation of morphological operations on neighborhood-connected parallel computers

X. Y. Jiang H. Bunke 《Annals of Mathematics and Artificial Intelligence》1995,13(3-4):301-315

To efficiently perform morphological operations on neighborhood-processing-based parallel image computers, we need to decompose structuring elements larger than the neighborhood that can be directly handled into neighborhood subsets. In the special case that the structuring element is a convex polygon, there are known decomposition algorithms in the literature. In this paper, we give an algorithm for the optimal decomposition of arbitrarily shaped structuring elements, enabling an optimal implementation of morphological operations on neighborhood-connected parallel computers in the general case.
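The reason decomposition helps can be seen in a small example: dilating by a large structuring element gives the same result as dilating in stages by smaller elements whose Minkowski sum is the large one. The sketch below checks this for a 5x5 square split into two 3x3 squares. It illustrates only the underlying identity, not the paper's optimal decomposition algorithm for arbitrary shapes; the set-of-offsets representation and the names `dilate` and `square` are assumptions.

```python
def dilate(image, element):
    """Binary dilation as a Minkowski sum of two sets of (row, col) points."""
    return {(r + dr, c + dc) for (r, c) in image for (dr, dc) in element}


def square(half_width):
    """Square structuring element of side 2 * half_width + 1, centred at the origin."""
    span = range(-half_width, half_width + 1)
    return {(dr, dc) for dr in span for dc in span}


image = {(0, 0), (2, 5), (7, 1)}  # foreground pixels of a tiny test image

direct = dilate(image, square(2))                     # one pass, 5x5 element
staged = dilate(dilate(image, square(1)), square(1))  # two passes, 3x3 element

print(direct == staged)
```

On a machine whose processors can reach only their 3x3 neighbourhood in one step, the staged form is the one that maps directly onto the hardware.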

10.

Parallel processing in finite element structural analysis

Ahmed K. Noor 《Engineering with Computers》1988,3(4):225-241

A brief review is made of the fundamental concepts and basic issues of parallel processing. Discussion focuses on mechanisms for parallel processing, construction and implementation of parallel numerical algorithms, performance evaluation of parallel processing machines and numerical algorithms, and parallelism in finite element computations. A novel partitioning strategy is outlined for maximizing the degree of parallelism on computers with a small number of powerful processors.

11.

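Analysis of the Parallel Scalability of Multigrid Solvers in HYPRE

Xu Xiaowen, Mo Zeyao, Cao Xiaolin 《Journal of Software》2009,20(Z1):8-14

We test and analyze the scalability of SMG and BoomerAMG, the multigrid solvers in the high-performance preconditioner library HYPRE, on thousands of processors of a domestically built (Chinese) massively parallel machine, and draw several conclusions that are instructive for research on linear solver algorithms and for the development of parallel implementation techniques. These conclusions offer some guidance for the application and development of linear solvers in numerical simulations of real, complex physical systems.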

12.

Implementation of the Lanczos method for structural vibration analysis on a parallel computer

《Computers & Structures》1987,25(3):395-403

The eigenvalue problem associated with structural vibration analysis is a major, computationally-intensive activity in large-scale finite element calculations. Advances in parallel computers together with appropriate solution methods have the potential for providing high-speed computational power to aid eigenvalue solutions for these large problems. The key to exploiting this potential is the development of appropriate methods tailored for such parallel computers. This paper reports on experiences from a study involving the implementation of the Lanczos method on a parallel computer. The results of this study show that introducing shifts, assigning each processor a different region in the eigenvalue spectrum, and implementing the Lanczos calculation steps in parallel is an effective strategy for speeding up calculations. This approach provides good parallel performance and easy balance of processor workload. Two example vibration problems were solved to assess the behavior of the Lanczos implementation. The test-problem results include examples of the Lanczos phenomenon where lack of orthogonality in the vectors can result in spurious eigenvalues. Tests were incorporated in the parallel calculations which detected these spurious eigenvalues. The parallel eigenvalue algorithm demonstrates that significant speedups in calculation time can be realized over traditional sequential methods.

13.

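Performance Study of a Parallel Accelerator Board Based on SHARC DSP Chips

Gao Shu, Sun Yuanlong, Gao Jie 《Computer Engineering》2003,29(1):23-25

This paper analyzes the composition, structural characteristics, and operating principle of a parallel accelerator board based on SHARC DSP chips. Its performance is studied using two examples: the well-known fractal problem of the Mandelbrot set, and a parallel multigrid algorithm for a nonlinear transient heat conduction equation. Both parallel algorithms were run on several computer platforms fitted with the board, and the results indicate that the board is broadly applicable, stable, powerful, easy to use, and fast, with good application prospects.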

14.

A multiprocessor implementation of Joyce

Per Brinch Hansen 《Software》1989,19(6):579-592

Joyce is a programming language for parallel computers based on CSP and Pascal. A Joyce program defines concurrent agents which communicate through unbuffered channels. This paper describes a multiprocessor implementation of Joyce.

15.

Asynchronous migration for parallel genetic programming on a computer cluster with multi-core processors

Shingo Kurose Kunihito Yamamori Masaru Aikawa Ikuo Yoshihara 《Artificial Life and Robotics》2012,16(4):533-536

An island model is a typical implementation of genetic programming on parallel computers with distributed memory. The island model has a migration facility that sends/receives some individuals in an island to/from another island to maintain diversity. The island model requires synchronization to migrate same-generation individuals between islands, and this synchronization causes an increase in computation time. This article proposes a new parallel genetic programming implementation based on the island model with asynchronous migration. Most recent computers are equipped with one or more multi-core processors, and are suitable for multi-threading. Therefore we employ a communication thread for migration between islands. The communication thread on a processor communicates with the communication thread on another processor to migrate individuals at appropriate intervals. Since the migration and other genetic operations can be independently processed on each core, and since we allow the exchange of individuals of different generations, no synchronization is needed in our implementation. In addition, a fitness calculation is also executed in parallel by the remaining cores. Experimental results show that the proposed method can reduce the computation time to about 17% of that of serial GP by using 40 threads.

16.

An Efficient Parallel Algorithm to Solve Block–Toeplitz Systems

P. Alonso J. M. Badía A. M. Vidal 《The Journal of Supercomputing》2005,32(3):251-278

In this paper, we present an efficient parallel algorithm to solve Toeplitz–block and block–Toeplitz systems in distributed memory multicomputers. This algorithm parallelizes the Generalized Schur Algorithm to obtain the semi-normal equations. Our parallel implementation reduces the communication cost and optimizes the memory access. The experimental analysis on a cluster of personal computers shows the scalability of the implementation. The algorithm is portable because it is based on standard tools and libraries, such as ScaLAPACK and MPI.

17.

Parallel filtering and smoothing algorithms

McReynolds S. 《Automatic Control, IEEE Transactions on》1974,19(5):556-561

This paper develops algorithms for filtering and smoothing for parallel computers. Numerical results are presented and implementation details are discussed. In the example it is illustrated that parallel methods have better convergence properties than nonparallel methods for nonlinear problems.

18.

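Implementation of Parallel Reservoir Simulation Software and Its Application on Domestic High-Performance Computers (total citations: 5; self-citations: 0; citations by others: 5)

Cao Jianwen, Pan Feng, Yao Jifeng, Sun Jiachang, Zhao Guozhong 《Journal of Computer Research and Development》2002,39(8):973-980

This paper describes the application of fine-grained reservoir numerical simulation at the scale of one million grid points on domestic (Chinese) high-performance parallel computers and PC cluster systems. For several sets of real million-grid-point data from Chinese oil fields, it gives the results of runs in a variety of domestic parallel machine environments, with analysis and evaluation. On this basis, it discusses the key techniques encountered in implementing parallel reservoir simulation software efficiently, and explores the bottlenecks commonly met when parallelizing large software, together with proposed improvements.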

19.

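Design and Implementation of a Speech Generation Tool Based on Parallel Processing

Tang Tang, Lu Bing 《Journal of Computer Research and Development》1994,31(8):46-50

This paper describes the design and implementation of a speech generation tool based on parallel processing. The tool can be used to support multimedia technology, various kinds of software with audio output, and the operation of speech libraries. It runs on IBM PC series microcomputers and compatibles.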

20.

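A Parallel BP Neural Network Based on PVM

Wang Jinghui, Qiao Weimin, Yuan Honghui 《Computer Engineering》2005,31(12):178-180

By analyzing the learning and execution algorithms of a BP neural network on a single computer, this paper proposes using PVM to build a parallel neural network across multiple computers. The implementation can be applied flexibly to analyzing and processing data where high reliability and large scale are required; the parallel design and implementation of this BP network can also be widely applied to parallel-computer implementations of other neural network models.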