1.
The advent of cluster systems built from high-end microcomputers or RISC workstations connected by a high-speed local area network has brought high-performance computing out of the research domain and into general use. This paper describes how to build a Beowulf parallel computing system from ordinary PCs on a distributed-memory architecture under the Linux operating system. The parallel efficiency of the Beowulf system was measured with purpose-written parallel programs; the results show that the system achieves high parallel efficiency and a high parallel speedup.
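Parallel-efficiency testing of the kind described reduces to comparing one-worker and p-worker wall-clock times on the same job. A minimal sketch, using Python's `multiprocessing` in place of MPI (the sum-of-squares workload and chunking scheme are illustrative, not from the paper):

```python
import time
from multiprocessing import Pool

def partial_sum(bounds):
    # Stand-in compute kernel: sum of squares over a half-open range
    lo, hi = bounds
    return sum(i * i for i in range(lo, hi))

def timed_sum(n, workers):
    # Split [0, n) into `workers` contiguous chunks and time the whole job
    chunks = [(k * n // workers, (k + 1) * n // workers) for k in range(workers)]
    t0 = time.perf_counter()
    if workers == 1:
        total = partial_sum((0, n))
    else:
        with Pool(workers) as pool:
            total = sum(pool.map(partial_sum, chunks))
    return total, time.perf_counter() - t0

n = 500_000
total1, t1 = timed_sum(n, 1)
total4, t4 = timed_sum(n, 4)
speedup = t1 / t4         # S(p) = T(1) / T(p)
efficiency = speedup / 4  # E(p) = S(p) / p
print(f"speedup={speedup:.2f} efficiency={efficiency:.2f}")
```

On a small problem like this, process start-up overhead can eat the gains; the speedup and efficiency figures only become meaningful when the per-chunk work dominates communication, which is exactly the regime the Beowulf tests target.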
2.
Lei Ruilin, Computer Programming Skills &amp; Maintenance, 2009, (2):5-5
Computer clusters are widely used for compute-intensive tasks thanks to their high performance-to-cost ratio and good scalability. In topic-oriented text retrieval research, the volume of text to be processed is enormous; using an IBM BladeCenter JS21 cluster, parallel data processing was implemented in the SPMD (Single Program Multiple Data) style. This paper reports experience with mixed-language programming, coding techniques, and debugging.
3.
4.
5.
Analysis of the Application Mechanism of the MPICH Parallel Programming Environment
This paper explains the basics of parallel programming with MPICH (Message Passing Interface and Chameleon) on a PC cluster, using a parallel program that computes π as the running example to introduce the core MPICH routines and how to call them. The key challenge in parallel programming is managing communication between processes: MPICH coordinates inter-process communication with the eager and rendezvous protocols, and also provides both blocking and non-blocking functions that let processes make full use of system resources, giving the programmer considerably more flexibility.
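In the classic MPICH example, π is computed by midpoint-rule integration of 4/(1+x²) on [0, 1], with each rank summing a strided share of the intervals and a reduction combining the partial sums. A serial sketch that mimics that per-rank decomposition (`local_pi` is an illustrative helper, not an MPICH routine):

```python
import math

def local_pi(rank, nprocs, n):
    # Each "rank" sums a strided share of the n midpoint-rule intervals
    h = 1.0 / n
    s = 0.0
    for i in range(rank, n, nprocs):
        x = h * (i + 0.5)
        s += 4.0 / (1.0 + x * x)  # integrand 4/(1+x^2); its integral on [0,1] is pi
    return h * s

n, nprocs = 100_000, 4
# The sum over ranks plays the role of the final reduction step
pi_est = sum(local_pi(r, nprocs, n) for r in range(nprocs))
print(f"pi ~= {pi_est:.12f}")
```

In the real MPI program, each process would compute only its own `local_pi` and the partial sums would be combined with a collective reduction; the strided decomposition keeps the load balanced without any communication during the sum itself.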
6.
7.
8.
On a local-area-network cluster system built with MPICH, a large-scale parallel computing platform was assembled from the parallel molecular dynamics package Protomol and the 3-D molecular visualization tool VMD, and several representative simulations of complex molecular dynamics were run on it. The results show that parallel computing makes sustained, effective use of existing computer resources while greatly improving computational efficiency; on the existing cluster a speedup of more than 3x was obtained, offering a practical route to deeper studies of complex molecular dynamics.
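The reported 3x-plus speedup can be put in context with Amdahl's law, which bounds speedup by the non-parallelizable fraction of the work. The 8% serial fraction and 4-node count below are illustrative assumptions, not values from the paper:

```python
def amdahl_speedup(serial_frac, p):
    # Amdahl's law: S(p) = 1 / (f + (1 - f) / p), f = non-parallel fraction
    return 1.0 / (serial_frac + (1.0 - serial_frac) / p)

# With 4 workers, a serial fraction of only 8% already caps speedup near 3.2x
print(f"{amdahl_speedup(0.08, 4):.2f}")
```

The point of the formula is that a cluster's achievable speedup is limited less by node count than by whatever fraction of the MD time step (force reduction, I/O, trajectory output) cannot be parallelized.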
9.
Building a PC Cluster System for Parallel Computing
In injection-molding simulation research, large-scale scientific computing is routinely applied: simulating Newtonian and non-Newtonian viscous flow of the material, simulating the cooling stage after injection, and tracking time-varying pressure throughout the mold. As grid-based computation and data processing grow more complex, many computations exceed what an ordinary PC can deliver and call for a supercomputing environment. The pursuit of ever-higher accuracy and ever more complex models keeps enlarging the scale of computation, which traditional serial processing can no longer satisfy; the low cost and high efficiency of modern high-performance computing therefore make parallel computing the solution of choice. This paper focuses on how to build a PC cluster system for parallel computing, illustrates the MPI implementation with an example, and evaluates the performance of the PC cluster system.
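At the job-launch level, "building" such a cluster amounts to listing the nodes and pointing MPICH's process manager at them. A minimal sketch, assuming MPICH's Hydra launcher; the host names and the `cpi` binary are placeholders, not values from the paper:

```shell
# Hypothetical hostfile listing the cluster nodes (names are placeholders)
cat > machinefile <<'EOF'
node01
node02
node03
node04
EOF

# Launch 8 MPI processes spread across the listed nodes
mpiexec -f machinefile -n 8 ./cpi
```

Beyond this, a working cluster needs passwordless SSH between nodes and a shared filesystem (or an identical copy of the binary on every node), which is where most of the setup effort actually goes.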
10.
This paper describes how a host computer communicates with Mitsubishi FX-series programmable logic controllers, and presents a Windows host-side communication program written with Visual Basic's communication control.
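The FX programming-port protocol is commonly described as ASCII frames delimited by STX/ETX, followed by a two-character hex checksum computed as the byte sum of everything after STX up to and including ETX, modulo 256. The framing below is a sketch under that assumption only; `fx_frame` is an illustrative helper and the payload shown is a placeholder, not a real FX command:

```python
STX, ETX = b"\x02", b"\x03"

def fx_frame(payload: bytes) -> bytes:
    # Assumed framing: STX, payload, ETX, then the byte sum of
    # payload + ETX modulo 256, rendered as two ASCII hex characters.
    csum = (sum(payload) + ETX[0]) % 0x100
    return STX + payload + ETX + f"{csum:02X}".encode("ascii")

frame = fx_frame(b"0")  # placeholder payload, not an actual FX command code
print(frame.hex(" ").upper())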
11.
International Journal of Computer Mathematics, 2012, 89(8):991-999
Large-scale parallelized distributed computing has been implemented in the message passing interface (MPI) environment to numerically solve eight reaction-diffusion equations representing the anatomy and treatment of breast cancer. The numerical algorithm is perturbed functional iterations (PFI), which is completely matrix-free. Fully distributed computations with multiple processors have been implemented on a large scale in the serial PFI code in the MPI environment. The technique of implementation is general and can be applied to any serial code. This has been validated by comparing the computed results from the serial code with those from the MPI version of the parallel code.
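The abstract does not spell out the PFI iteration itself. As a generic illustration of what "matrix-free" means for a diffusion-type problem, here is a Jacobi-style functional iteration for -u'' = f with zero boundary values that never assembles or stores a matrix; `jacobi_solve` is an illustrative helper, not the authors' algorithm:

```python
import math

def jacobi_solve(f, n, iters):
    # Matrix-free Jacobi sweeps for -u'' = f, u(0) = u(1) = 0, on n interior
    # points: no matrix is stored; each update reads only neighboring values.
    h = 1.0 / (n + 1)
    u = [0.0] * (n + 2)  # includes the two boundary points
    for _ in range(iters):
        u = ([0.0]
             + [0.5 * (u[i - 1] + u[i + 1] + h * h * f[i - 1])
                for i in range(1, n + 1)]
             + [0.0])
    return u

# Manufactured solution: f = pi^2 sin(pi x) gives u = sin(pi x)
n = 15
h = 1.0 / (n + 1)
f = [math.pi ** 2 * math.sin(math.pi * (i + 1) * h) for i in range(n)]
u = jacobi_solve(f, n, 3000)
err = max(abs(u[i] - math.sin(math.pi * i * h)) for i in range(n + 2))
print(f"max error vs sin(pi x): {err:.4f}")
```

Because each sweep only needs neighbor values, a distributed version communicates just the subdomain boundary points per iteration, which is what makes matrix-free iterations attractive for MPI parallelization.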
12.
13.
As in many other fields, time-offset mechanisms are heavily used in parallel computing. In practice, parallel computation never achieves truly zero-lag "parallel" execution on all processors: data dependencies between tasks force some processors into intermittent waiting until their inputs arrive. This paper walks through a typical parallel algorithm to explain in detail how the time-offset mechanism operates, giving an intuitive picture of what parallel execution really involves, as a reference for understanding parallel behavior and designing parallel programs.
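The waiting described above can be made concrete with a toy pipeline schedule: stage s may process item j only after the upstream stage has produced that item and after finishing its own previous item. `schedule` is an illustrative helper with unit-time stages assumed:

```python
def schedule(stages, items):
    # finish[s][j]: tick at which stage s completes item j. A stage waits on
    # both the upstream stage's output (data dependency) and its own
    # previous item (occupancy), then works for one tick.
    finish = [[0] * items for _ in range(stages)]
    for s in range(stages):
        for j in range(items):
            upstream = finish[s - 1][j] if s else 0
            previous = finish[s][j - 1] if j else 0
            finish[s][j] = max(upstream, previous) + 1
    return finish[-1][-1]

# With the one-tick offset between stages, 4 stages over 10 items finish in
# stages + items - 1 ticks, versus stages * items ticks done serially.
print(schedule(4, 10), "vs", 4 * 10)
```

The `max(upstream, previous)` term is exactly the "intermittent waiting" in the abstract: a processor idles whenever its data dependency finishes later than its own previous task.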
14.
Athanasios I. Margaris, International Journal of Parallel Programming, 2009, 37(2):195-222
The objective of this paper is a review of the log file formats that enable performance visualization of parallel applications based on the message passing interface (MPI) standard. These file formats were designed by the LANS (Laboratory for Advanced Numerical Software) group at Argonne National Laboratory and are distributed, together with the corresponding viewers, as part of the MPE (Multi-Processing Environment) library of the MPICH implementation of MPI. The formats studied in this paper are the ALOG, CLOG, SLOG1, and SLOG2 file formats; they are examined in chronological order and the main features of their structures are presented.
15.
As supercomputers scale to 1000 PFlop/s over the next decade, investigating the performance of parallel applications at scale on future architectures, and the performance impact of different architecture choices, is crucial for high-performance computing (HPC) hardware/software co-design. This paper summarizes recent efforts in designing and implementing a novel HPC hardware/software co-design toolkit. The presented Extreme-scale Simulator (xSim) permits running an HPC application in a controlled environment with millions of concurrent execution threads while observing its performance in a simulated extreme-scale HPC system using architectural models and virtual timing. This paper demonstrates the capabilities and usefulness of the xSim performance investigation toolkit, such as its scalability to 2^27 simulated Message Passing Interface (MPI) ranks on 960 real processor cores, the capability to evaluate the performance of different MPI collective communication algorithms, and the ability to evaluate the performance of a basic Monte Carlo application with different architectural parameters.
16.
ScaLAPACK is a parallel computing software package for distributed-memory MIMD parallel machines. It provides a range of linear-algebra solvers and has the advantages of high efficiency, portability, scalability, and reliability.
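ScaLAPACK distributes dense kernels such as its `pdgesv` solver over a 2-D process grid. As a self-contained illustration of the serial computation being parallelized, here is a plain-Python Gaussian elimination with partial pivoting; `gauss_solve` is an illustrative helper, not a ScaLAPACK API:

```python
def gauss_solve(A, b):
    # Dense solve of A x = b via elimination with partial pivoting -- the
    # kind of kernel ScaLAPACK spreads over a block-cyclic process grid.
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))  # partial pivot row
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            m = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= m * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

x = gauss_solve([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0])
print(x)
```

In the distributed setting, the pivot search and each elimination step require communication along process-grid rows and columns, which is why ScaLAPACK's block-cyclic data layout matters so much for its efficiency.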
17.
18.
The implementation and performance of the multidimensional Fast Fourier Transform (FFT) on a distributed memory Beowulf cluster is examined. We focus on the three-dimensional (3D) real transform, an essential computational component of Galerkin and pseudo-spectral codes. The approach studied is a 1D domain decomposition algorithm that relies on a communication-intensive transpose operation involving P processors. Communication is based upon the standard portable message passing interface (MPI). We show that 1/P scaling for execution time at fixed problem size N^3 (i.e., linear speedup) can be obtained provided that (1) the transpose algorithm is optimized for simultaneous block communication by all processors; and (2) communication is arranged for non-overlapping pairwise communication between processors, thus eliminating blocking when standard fast ethernet interconnects are employed. This method provides the basis for implementation of scalable and efficient spectral method computations of hydrodynamic and magneto-hydrodynamic turbulence on Beowulf clusters assembled from standard commodity components. An example is presented using a 3D passive scalar code.
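The transform/transpose/transform pattern behind the 1D decomposition can be sketched in miniature for a 2-D transform. A naive O(n²) DFT stands in for an FFT for brevity; in the distributed version each processor owns a block of rows and the `transpose` step is the MPI all-to-all exchange:

```python
import cmath

def dft(v):
    # Naive O(n^2) discrete Fourier transform, standing in for an FFT
    n = len(v)
    return [sum(v[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]

def transpose(a):
    # In the 1-D domain decomposition this is the all-to-all exchange step
    return [list(col) for col in zip(*a)]

def fft2d(a):
    a = [dft(row) for row in a]  # each "processor" transforms its local rows
    a = transpose(a)             # redistribute the data between processors
    a = [dft(row) for row in a]  # transform along the other dimension
    return transpose(a)          # transpose back to the original layout

grid = [[float(i * 4 + j) for j in range(4)] for i in range(4)]
F = fft2d(grid)
print(f"F[0][0] = {F[0][0].real:.1f}")  # DC term = sum of all inputs
```

Because every element crosses the network during the transpose, the paper's two conditions (simultaneous block communication, non-overlapping pairwise exchanges) are precisely about keeping that all-to-all step from serializing.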
19.
Mohammad R. Hajihashemi, Magda El-Shenawee, Journal of Parallel and Distributed Computing, 2010
A parallelized version of the level-set algorithm based on the MPI technique is presented. TM-polarized plane waves are used to illuminate two-dimensional perfect electric conducting targets. A variety of performance measures such as efficiency, load balance, weak scaling, and communication/computation times are discussed. For electromagnetic inverse scattering problems, retrieving the target's arbitrary shape and location in real time is considered a main goal, even as a trade-off with algorithm efficiency. For the three cases considered here, a maximum speedup of 53x-84x is achieved when using 256 processors. However, the overall efficiency of the parallelized level-set algorithm is 21%-33% when using 256 processors and 26%-52% when using 128 processors. The effects of the bottlenecks of the level-set algorithm on the algorithm efficiency are discussed.
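The quoted efficiency range is simply the speedup divided by the processor count. Checking the abstract's own numbers with an illustrative helper:

```python
def parallel_efficiency(speedup, nprocs):
    # E = S / p, expressed as a percentage
    return 100.0 * speedup / nprocs

# The reported 53x and 84x speedups on 256 processors correspond to the
# stated roughly 21% and 33% efficiencies
print(round(parallel_efficiency(53, 256)), round(parallel_efficiency(84, 256)))
```

The gap between near-ideal speedup (efficiency 100%) and these figures is what the paper attributes to the level-set algorithm's bottlenecks.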