1.
The advent of cluster systems built from high-end microcomputers or RISC workstations connected by a high-speed local area network has brought high-performance computing out of the research domain and into general use. This paper describes how to build a Beowulf parallel computing system from ordinary PCs on a distributed-memory architecture under the Linux operating system. The parallel efficiency of the Beowulf system was measured with purpose-written parallel programs; the results show that the system achieves high parallel efficiency and a high parallel speedup.
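Parallel-efficiency testing of the kind described reduces to comparing one-worker and p-worker wall-clock times on the same job. A minimal sketch, using Python's `multiprocessing` in place of MPI (the sum-of-squares workload and chunking scheme are illustrative, not from the paper):

```python
import time
from multiprocessing import Pool

def partial_sum(bounds):
    # Stand-in compute kernel: sum of squares over a half-open range
    lo, hi = bounds
    return sum(i * i for i in range(lo, hi))

def timed_sum(n, workers):
    # Split [0, n) into `workers` contiguous chunks and time the whole job
    chunks = [(k * n // workers, (k + 1) * n // workers) for k in range(workers)]
    t0 = time.perf_counter()
    if workers == 1:
        total = partial_sum((0, n))
    else:
        with Pool(workers) as pool:
            total = sum(pool.map(partial_sum, chunks))
    return total, time.perf_counter() - t0

n = 500_000
total1, t1 = timed_sum(n, 1)
total4, t4 = timed_sum(n, 4)
speedup = t1 / t4         # S(p) = T(1) / T(p)
efficiency = speedup / 4  # E(p) = S(p) / p
print(f"speedup={speedup:.2f} efficiency={efficiency:.2f}")
```

On a small problem like this, process start-up overhead can eat the gains; the speedup and efficiency figures only become meaningful when the per-chunk work dominates communication, which is exactly the regime the Beowulf tests target.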
2.
Lei Ruilin, Computer Programming Skills &amp; Maintenance, 2009, (2):5-5
Computer clusters are widely used for compute-intensive tasks thanks to their high performance-to-cost ratio and good scalability. In topic-oriented text retrieval research, the volume of text to be processed is enormous; using an IBM BladeCenter JS21 cluster, parallel data processing was implemented in the SPMD (Single Program Multiple Data) style. This paper reports experience with mixed-language programming, coding techniques, and debugging.
3.
4.
5.
Analysis of the Application Mechanism of the MPICH Parallel Programming Environment
This paper explains the basics of parallel programming with MPICH (Message Passing Interface and Chameleon) on a PC cluster, using a parallel program that computes π as the running example to introduce the core MPICH routines and how to call them. The key challenge in parallel programming is managing communication between processes: MPICH coordinates inter-process communication with the eager and rendezvous protocols, and also provides both blocking and non-blocking functions that let processes make full use of system resources, giving the programmer considerably more flexibility.
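In the classic MPICH example, π is computed by midpoint-rule integration of 4/(1+x²) on [0, 1], with each rank summing a strided share of the intervals and a reduction combining the partial sums. A serial sketch that mimics that per-rank decomposition (`local_pi` is an illustrative helper, not an MPICH routine):

```python
import math

def local_pi(rank, nprocs, n):
    # Each "rank" sums a strided share of the n midpoint-rule intervals
    h = 1.0 / n
    s = 0.0
    for i in range(rank, n, nprocs):
        x = h * (i + 0.5)
        s += 4.0 / (1.0 + x * x)  # integrand 4/(1+x^2); its integral on [0,1] is pi
    return h * s

n, nprocs = 100_000, 4
# The sum over ranks plays the role of the final reduction step
pi_est = sum(local_pi(r, nprocs, n) for r in range(nprocs))
print(f"pi ~= {pi_est:.12f}")
```

In the real MPI program, each process would compute only its own `local_pi` and the partial sums would be combined with a collective reduction; the strided decomposition keeps the load balanced without any communication during the sum itself.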
6.
7.
8.
On a local-area-network cluster system built with MPICH, a large-scale parallel computing platform was assembled from the parallel molecular dynamics package Protomol and the 3-D molecular visualization tool VMD, and several representative simulations of complex molecular dynamics were run on it. The results show that parallel computing makes sustained, effective use of existing computer resources while greatly improving computational efficiency; on the existing cluster a speedup of more than 3x was obtained, offering a practical route to deeper studies of complex molecular dynamics.
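The reported 3x-plus speedup can be put in context with Amdahl's law, which bounds speedup by the non-parallelizable fraction of the work. The 8% serial fraction and 4-node count below are illustrative assumptions, not values from the paper:

```python
def amdahl_speedup(serial_frac, p):
    # Amdahl's law: S(p) = 1 / (f + (1 - f) / p), f = non-parallel fraction
    return 1.0 / (serial_frac + (1.0 - serial_frac) / p)

# With 4 workers, a serial fraction of only 8% already caps speedup near 3.2x
print(f"{amdahl_speedup(0.08, 4):.2f}")
```

The point of the formula is that a cluster's achievable speedup is limited less by node count than by whatever fraction of the MD time step (force reduction, I/O, trajectory output) cannot be parallelized.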
9.
Building a PC Cluster System for Parallel Computing
In injection-molding simulation research, large-scale scientific computing is routinely applied: simulating Newtonian and non-Newtonian viscous flow of the material, simulating the cooling stage after injection, and tracking time-varying pressure throughout the mold. As grid-based computation and data processing grow more complex, many computations exceed what an ordinary PC can deliver and call for a supercomputing environment. The pursuit of ever-higher accuracy and ever more complex models keeps enlarging the scale of computation, which traditional serial processing can no longer satisfy; the low cost and high efficiency of modern high-performance computing therefore make parallel computing the solution of choice. This paper focuses on how to build a PC cluster system for parallel computing, illustrates the MPI implementation with an example, and evaluates the performance of the PC cluster system.
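At the job-launch level, "building" such a cluster amounts to listing the nodes and pointing MPICH's process manager at them. A minimal sketch, assuming MPICH's Hydra launcher; the host names and the `cpi` binary are placeholders, not values from the paper:

```shell
# Hypothetical hostfile listing the cluster nodes (names are placeholders)
cat > machinefile <<'EOF'
node01
node02
node03
node04
EOF

# Launch 8 MPI processes spread across the listed nodes
mpiexec -f machinefile -n 8 ./cpi
```

Beyond this, a working cluster needs passwordless SSH between nodes and a shared filesystem (or an identical copy of the binary on every node), which is where most of the setup effort actually goes.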
10.
This paper describes how a host computer communicates with Mitsubishi FX-series programmable logic controllers, and presents a Windows host-side communication program written with Visual Basic's communication control.
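The FX programming-port protocol is commonly described as ASCII frames delimited by STX/ETX, followed by a two-character hex checksum computed as the byte sum of everything after STX up to and including ETX, modulo 256. The framing below is a sketch under that assumption only; `fx_frame` is an illustrative helper and the payload shown is a placeholder, not a real FX command:

```python
STX, ETX = b"\x02", b"\x03"

def fx_frame(payload: bytes) -> bytes:
    # Assumed framing: STX, payload, ETX, then the byte sum of
    # payload + ETX modulo 256, rendered as two ASCII hex characters.
    csum = (sum(payload) + ETX[0]) % 0x100
    return STX + payload + ETX + f"{csum:02X}".encode("ascii")

frame = fx_frame(b"0")  # placeholder payload, not an actual FX command code
print(frame.hex(" ").upper())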
11.
International Journal of Computer Mathematics, 2012, 89(8):991-999
Large-scale parallelized distributed computing has been implemented in the message passing interface (MPI) environment to numerically solve eight reaction-diffusion equations representing the anatomy and treatment of breast cancer. The numerical algorithm is perturbed functional iterations (PFI), which is completely matrix-free. Fully distributed computations with multiple processors have been implemented on a large scale in the serial PFI code in the MPI environment. The technique of implementation is general and can be applied to any serial code. This has been validated by comparing the computed results from the serial code with those from the MPI version of the parallel code.
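The abstract does not spell out the PFI iteration itself. As a generic illustration of what "matrix-free" means for a diffusion-type problem, here is a Jacobi-style functional iteration for -u'' = f with zero boundary values that never assembles or stores a matrix; `jacobi_solve` is an illustrative helper, not the authors' algorithm:

```python
import math

def jacobi_solve(f, n, iters):
    # Matrix-free Jacobi sweeps for -u'' = f, u(0) = u(1) = 0, on n interior
    # points: no matrix is stored; each update reads only neighboring values.
    h = 1.0 / (n + 1)
    u = [0.0] * (n + 2)  # includes the two boundary points
    for _ in range(iters):
        u = ([0.0]
             + [0.5 * (u[i - 1] + u[i + 1] + h * h * f[i - 1])
                for i in range(1, n + 1)]
             + [0.0])
    return u

# Manufactured solution: f = pi^2 sin(pi x) gives u = sin(pi x)
n = 15
h = 1.0 / (n + 1)
f = [math.pi ** 2 * math.sin(math.pi * (i + 1) * h) for i in range(n)]
u = jacobi_solve(f, n, 3000)
err = max(abs(u[i] - math.sin(math.pi * i * h)) for i in range(n + 2))
print(f"max error vs sin(pi x): {err:.4f}")
```

Because each sweep only needs neighbor values, a distributed version communicates just the subdomain boundary points per iteration, which is what makes matrix-free iterations attractive for MPI parallelization.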
12.
13.
As in many other fields, time-offset mechanisms are heavily used in parallel computing. In practice, parallel computation never achieves truly zero-lag "parallel" execution on all processors: data dependencies between tasks force some processors into intermittent waiting until their inputs arrive. This paper walks through a typical parallel algorithm to explain in detail how the time-offset mechanism operates, giving an intuitive picture of what parallel execution really involves, as a reference for understanding parallel behavior and designing parallel programs.
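The waiting described above can be made concrete with a toy pipeline schedule: stage s may process item j only after the upstream stage has produced that item and after finishing its own previous item. `schedule` is an illustrative helper with unit-time stages assumed:

```python
def schedule(stages, items):
    # finish[s][j]: tick at which stage s completes item j. A stage waits on
    # both the upstream stage's output (data dependency) and its own
    # previous item (occupancy), then works for one tick.
    finish = [[0] * items for _ in range(stages)]
    for s in range(stages):
        for j in range(items):
            upstream = finish[s - 1][j] if s else 0
            previous = finish[s][j - 1] if j else 0
            finish[s][j] = max(upstream, previous) + 1
    return finish[-1][-1]

# With the one-tick offset between stages, 4 stages over 10 items finish in
# stages + items - 1 ticks, versus stages * items ticks done serially.
print(schedule(4, 10), "vs", 4 * 10)
```

The `max(upstream, previous)` term is exactly the "intermittent waiting" in the abstract: a processor idles whenever its data dependency finishes later than its own previous task.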
14.
Athanasios I. Margaris, International Journal of Parallel Programming, 2009, 37(2):195-222
The objective of this paper is a review of the log file formats that enable performance visualization of parallel applications based on the message passing interface (MPI) standard. These file formats were designed by the LANS (Laboratory for Advanced Numerical Software) group at Argonne National Laboratory and are distributed, together with the corresponding viewers, as part of the MPE (Multi-Processing Environment) library of the MPICH implementation of MPI. The formats studied in this paper are the ALOG, CLOG, SLOG1, and SLOG2 file formats; they are examined in chronological order and the main features of their structures are presented.
15.
As supercomputers scale to 1000 PFlop/s over the next decade, investigating the performance of parallel applications at scale on future architectures, and the performance impact of different architecture choices, is crucial for high-performance computing (HPC) hardware/software co-design. This paper summarizes recent efforts in designing and implementing a novel HPC hardware/software co-design toolkit. The presented Extreme-scale Simulator (xSim) permits running an HPC application in a controlled environment with millions of concurrent execution threads while observing its performance in a simulated extreme-scale HPC system using architectural models and virtual timing. This paper demonstrates the capabilities and usefulness of the xSim performance investigation toolkit, such as its scalability to 2^27 simulated Message Passing Interface (MPI) ranks on 960 real processor cores, the capability to evaluate the performance of different MPI collective communication algorithms, and the ability to evaluate the performance of a basic Monte Carlo application with different architectural parameters.
16.
ScaLAPACK is a parallel computing software package for distributed-memory MIMD parallel machines. It provides a range of linear-algebra solvers and has the advantages of high efficiency, portability, scalability, and reliability.
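ScaLAPACK distributes dense kernels such as its `pdgesv` solver over a 2-D process grid. As a self-contained illustration of the serial computation being parallelized, here is a plain-Python Gaussian elimination with partial pivoting; `gauss_solve` is an illustrative helper, not a ScaLAPACK API:

```python
def gauss_solve(A, b):
    # Dense solve of A x = b via elimination with partial pivoting -- the
    # kind of kernel ScaLAPACK spreads over a block-cyclic process grid.
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))  # partial pivot row
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            m = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= m * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

x = gauss_solve([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0])
print(x)
```

In the distributed setting, the pivot search and each elimination step require communication along process-grid rows and columns, which is why ScaLAPACK's block-cyclic data layout matters so much for its efficiency.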
17.
18.
The implementation and performance of the multidimensional Fast Fourier Transform (FFT) on a distributed memory Beowulf cluster is examined. We focus on the three-dimensional (3D) real transform, an essential computational component of Galerkin and pseudo-spectral codes. The approach studied is a 1D domain decomposition algorithm that relies on a communication-intensive transpose operation involving P processors. Communication is based upon the standard portable message passing interface (MPI). We show that 1/P scaling for execution time at fixed problem size N^3 (i.e., linear speedup) can be obtained provided that (1) the transpose algorithm is optimized for simultaneous block communication by all processors; and (2) communication is arranged for non-overlapping pairwise communication between processors, thus eliminating blocking when standard fast ethernet interconnects are employed. This method provides the basis for implementation of scalable and efficient spectral method computations of hydrodynamic and magneto-hydrodynamic turbulence on Beowulf clusters assembled from standard commodity components. An example is presented using a 3D passive scalar code.
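The transform/transpose/transform pattern behind the 1D decomposition can be sketched in miniature for a 2-D transform. A naive O(n²) DFT stands in for an FFT for brevity; in the distributed version each processor owns a block of rows and the `transpose` step is the MPI all-to-all exchange:

```python
import cmath

def dft(v):
    # Naive O(n^2) discrete Fourier transform, standing in for an FFT
    n = len(v)
    return [sum(v[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]

def transpose(a):
    # In the 1-D domain decomposition this is the all-to-all exchange step
    return [list(col) for col in zip(*a)]

def fft2d(a):
    a = [dft(row) for row in a]  # each "processor" transforms its local rows
    a = transpose(a)             # redistribute the data between processors
    a = [dft(row) for row in a]  # transform along the other dimension
    return transpose(a)          # transpose back to the original layout

grid = [[float(i * 4 + j) for j in range(4)] for i in range(4)]
F = fft2d(grid)
print(f"F[0][0] = {F[0][0].real:.1f}")  # DC term = sum of all inputs
```

Because every element crosses the network during the transpose, the paper's two conditions (simultaneous block communication, non-overlapping pairwise exchanges) are precisely about keeping that all-to-all step from serializing.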
19.
Mohammad R. Hajihashemi, Magda El-Shenawee, Journal of Parallel and Distributed Computing, 2010
A parallelized version of the level-set algorithm based on the MPI technique is presented. TM-polarized plane waves are used to illuminate two-dimensional perfect electric conducting targets. A variety of performance measures such as efficiency, load balance, weak scaling, and communication/computation times are discussed. For electromagnetic inverse scattering problems, retrieving the target's arbitrary shape and location in real time is considered a main goal, even as a trade-off with algorithm efficiency. For the three cases considered here, a maximum speedup of 53x-84x is achieved when using 256 processors. However, the overall efficiency of the parallelized level-set algorithm is 21%-33% when using 256 processors and 26%-52% when using 128 processors. The effects of the bottlenecks of the level-set algorithm on the algorithm efficiency are discussed.
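The quoted efficiency range is simply the speedup divided by the processor count. Checking the abstract's own numbers with an illustrative helper:

```python
def parallel_efficiency(speedup, nprocs):
    # E = S / p, expressed as a percentage
    return 100.0 * speedup / nprocs

# The reported 53x and 84x speedups on 256 processors correspond to the
# stated roughly 21% and 33% efficiencies
print(round(parallel_efficiency(53, 256)), round(parallel_efficiency(84, 256)))
```

The gap between near-ideal speedup (efficiency 100%) and these figures is what the paper attributes to the level-set algorithm's bottlenecks.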