
1.

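Jia Mingfei (贾明飞), Dong Weiqing (董渭清), Huang Yongxiang (黄泳翔), Hou Zonghao (侯宗浩) 《计算机工程与应用》 (Computer Engineering and Applications) 2003,39(14):126-129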

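Targeting the large body of existing unstructured MPI programs, this paper proposes a method for converting point-to-point communication primitives in MPI programs into collective communication primitives. The basic idea is to analyze the internal structure of unstructured message-passing parallel code, set up a system of Diophantine inequalities, use the Omega library to compute the set of communication patterns of each point-to-point code segment, and then apply data-exchange analysis to determine the corresponding collective primitive and substitute it.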
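A rough Python sketch of the final matching step described in this abstract: once the set of sender/receiver pairs of a code segment is known, it can be compared against a few common patterns. The function name `classify_pattern`, the pattern labels, and the brute-force enumeration for a fixed process count are assumptions made for illustration. The paper derives the pattern set symbolically with the Omega library, and choosing between, say, MPI_Bcast and MPI_Scatter would still depend on the data-exchange analysis.

```python
def classify_pattern(pairs, nprocs):
    """Map a set of (sender, receiver) rank pairs to a candidate collective.

    Hypothetical helper: it only inspects who talks to whom, not what data
    is exchanged, so each label names a family of collectives.
    """
    pairs = set(pairs)
    senders = {s for s, _ in pairs}
    receivers = {r for _, r in pairs}
    everyone = set(range(nprocs))

    # One rank sends to every other rank.
    if len(senders) == 1:
        root = next(iter(senders))
        if receivers == everyone - {root}:
            return "one-to-all (MPI_Bcast or MPI_Scatter)"

    # Every other rank sends to one rank.
    if len(receivers) == 1:
        root = next(iter(receivers))
        if senders == everyone - {root}:
            return "all-to-one (MPI_Gather or MPI_Reduce)"

    # Every rank sends to every other rank.
    if pairs == {(s, r) for s in everyone for r in everyone if s != r}:
        return "all-to-all (MPI_Allgather or MPI_Alltoall)"

    return "no collective match"


# Pairs produced by a loop in which rank 0 sends to ranks 1..3.
fan_out = {(0, r) for r in range(1, 4)}
print(classify_pattern(fan_out, 4))
```

A segment whose pairs match no pattern is left as point-to-point code in this sketch.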

2.

Simulation Study of a Parallel Conversion Method for Point-to-Point Communication Primitives

Yang Hao (杨浩), Wang Yuenan (王越男) 《计算机仿真》 (Computer Simulation) 2020,37(4):173-177

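Traditional parallel conversion methods for point-to-point communication primitives cannot analyze internal data structures in a centralized way, which leads to poor overall conversion results. To address this, a parallel conversion method for point-to-point communication primitives in MPI programs is proposed. The method analyzes the data structure of the current primitive code and performs the corresponding structuring operations; based on the uplink data exchanged during parallel decoding, it obtains the data-node collision probability through theoretical analysis; it introduces the high-density MDSCAN clustering algorithm to classify symbols into data clusters; and it uses the communication-pattern computation of the Omega database to convert the primitives, turning communication primitives into a primitive data set. Experimental results show that the method clearly improves the compression-resistance ratio and data fit of the primitive data set, yields better data significance, and achieves a better conversion result.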

3.

Numerical Application of the MPI Network Communication Model (total citations: 3; self-citations: 0; citations by others: 3)

Cao Ji (曹骥), Yuan Yong (袁勇) 《计算机工程》 (Computer Engineering) 2003,29(16):13-15

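Discusses the parallel communication performance model of the MPI parallel support environment, measures several performance metrics for point-to-point and group communication, and derives statistical models of these metrics to serve as a basis for evaluating the feasibility and scalability of parallel computing for engineering problems.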
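The abstract does not state which statistical model was fitted. As one plausible illustration, the Python sketch below fits the common linear latency/bandwidth model T(n) = alpha + beta * n to point-to-point timings by ordinary least squares. The model choice, the function name `fit_latency_bandwidth`, and the timing data (synthetic, not taken from the paper) are all assumptions.

```python
def fit_latency_bandwidth(sizes, times):
    """Least-squares fit of T(n) = alpha + beta * n.

    alpha approximates the start-up latency in seconds,
    beta the transfer time per byte (the inverse of bandwidth).
    """
    count = len(sizes)
    mean_size = sum(sizes) / count
    mean_time = sum(times) / count
    sxx = sum((n - mean_size) ** 2 for n in sizes)
    sxy = sum((n - mean_size) * (t - mean_time) for n, t in zip(sizes, times))
    beta = sxy / sxx
    alpha = mean_time - beta * mean_size
    return alpha, beta


# Synthetic measurements: 50 microseconds latency, 10 ns per byte.
sizes = [1024, 4096, 16384, 65536]
times = [50e-6 + 1e-8 * n for n in sizes]
alpha, beta = fit_latency_bandwidth(sizes, times)
print(f"latency ~ {alpha:.2e} s, per-byte cost ~ {beta:.2e} s")
```

With real measurements the residuals of such a fit indicate whether a single linear model is adequate or whether separate models per message-size range are needed.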

4.

Source Code Comment Generation Based on Neural Networks and Information Retrieval

Shen Xin (沈鑫), Zhou Yu (周宇) 《计算机系统应用》 (Computer Systems & Applications) 2023,32(7):1-10

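Source code comment generation aims to produce accurate natural-language comments for source code, helping developers understand and maintain it. Traditional approaches use information retrieval to generate source code summaries, either selecting words from the original source code or adapting the summaries of similar code snippets. More recent work adopts machine translation, using encoder-decoder neural network models to generate summaries of code snippets. Existing comment generation methods have two main problems. First, neural-network-based methods favor high-frequency words in code snippets and tend to handle low-frequency words poorly. Second, programming languages are highly structured, so source code cannot simply be treated as sequential text without losing contextual structural information. To address the low-frequency word problem, this paper proposes a retrieval-based neural machine translation method that uses similar code snippets retrieved from the training set to augment the neural network model. To learn the structured semantic information of code snippets, it proposes a structure-guided Transformer, which encodes code structure information through the attention mechanism. Experiments show that the model has a significant advantage over current state-of-the-art deep learning models for code comment generation in handling low-frequency words and structured semantics.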

5.

Research on Profiling Techniques for MPI Collective Communication

Cui Qi (崔奇), Gu Jianhua (谷建华) 《微机发展》 (Microcomputer Development) 2013,(10)

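Effectively mapping the MPI (Message Passing Interface) process topology onto the processor topology helps improve the communication performance of MPI programs. Most current MPI process mapping considers only point-to-point communication and rarely collective communication, because obtaining the process topology of collective communication is difficult. Most current profiling tools consider only the interface semantics of a function when profiling collective communication and ignore its implementation semantics, so they cannot correctly capture the detailed communication among the processes taking part in a collective. This paper proposes a set of profiling algorithms that accurately compute the communication volume between every pair of processes participating in a collective and present the process topology as a communication matrix. Experiments confirm the correctness of the profiling algorithms and show that the process topology obtained this way improves the results of mapping processes to processor cores.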
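To illustrate the gap between interface semantics and implementation semantics, the Python sketch below fills a communication matrix for a broadcast carried out with a binomial tree. The binomial-tree schedule, the function name `binomial_bcast_matrix`, and the message size are assumptions for illustration. The abstract does not say which collective algorithms the profiled MPI library uses, and real libraries switch algorithms with message size and process count.

```python
def binomial_bcast_matrix(nprocs, root, nbytes):
    """Return matrix[src][dst] = bytes sent from src to dst during one
    broadcast of `nbytes` bytes over a binomial tree rooted at `root`."""
    matrix = [[0] * nprocs for _ in range(nprocs)]
    mask = 1
    while mask < nprocs:
        # Relative ranks 0..mask-1 already hold the data in this round;
        # each forwards it to the relative rank `mask` positions away.
        for rel in range(mask):
            peer = rel + mask
            if peer < nprocs:
                src = (rel + root) % nprocs
                dst = (peer + root) % nprocs
                matrix[src][dst] += nbytes
        mask <<= 1
    return matrix


for row in binomial_bcast_matrix(4, 0, 100):
    print(row)
```

With four processes and root 0, rank 3 receives its copy from rank 1 rather than from the root. A profiler that looks only at the call's interface would record that traffic as root-to-3 and so misplace it.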

6.

A Communication Optimization Algorithm Based on Reordering Transformation and Loop Distribution

Chen Dazhi (陈达智), Zhao Rongcai (赵荣彩), Han Lin (韩林), Ding Rui (丁锐), Zhao Jie (赵捷) 《计算机科学》 (Computer Science) 2012,39(9):296-301

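Existing communication optimization algorithms cannot make an MPI auto-parallelizing compiler generate message-passing programs with satisfactory speedup. To address this, a communication optimization algorithm based on reordering transformation and loop distribution is proposed. Using the given interprocedural side-effect sets and reordering rules based on moving mpi_wait/mpi_irecv, the algorithm applies reordering transformation and loop distribution in order, widening the window in which communication overlaps computation in point-to-point non-blocking communication as far as is safe, so that the compiler generates message-passing code with more computation overlapping communication. Experimental results show that the algorithm hides more of the point-to-point non-blocking communication overhead and clearly improves the speedup of message-passing programs.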
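The idea of moving mpi_irecv earlier and mpi_wait later can be sketched on a straight-line statement list in Python. This is a simplified illustration, not the paper's algorithm: the `Stmt` record, the read/write sets standing in for side-effect sets, and the function `widen_overlap` are hypothetical, and loop distribution is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stmt:
    name: str
    reads: frozenset = frozenset()
    writes: frozenset = frozenset()


def conflicts(a, b):
    """True if a and b touch common data and at least one of them writes it."""
    return bool(a.writes & (b.reads | b.writes) or b.writes & a.reads)


def widen_overlap(stmts, irecv, wait):
    """Hoist `irecv` as early and sink `wait` as late as dependences allow."""
    order = list(stmts)
    i = order.index(irecv)
    while i > 0 and not conflicts(order[i - 1], irecv):
        order[i - 1], order[i] = order[i], order[i - 1]
        i -= 1
    j = order.index(wait)
    while j < len(order) - 1 and not conflicts(wait, order[j + 1]):
        order[j], order[j + 1] = order[j + 1], order[j]
        j += 1
    return [s.name for s in order]


s1 = Stmt("s1", writes=frozenset({"a"}))
irecv = Stmt("irecv", writes=frozenset({"buf", "req"}))
wait = Stmt("wait", reads=frozenset({"req"}), writes=frozenset({"buf"}))
s2 = Stmt("s2", reads=frozenset({"a"}), writes=frozenset({"b"}))
s3 = Stmt("s3", reads=frozenset({"buf", "b"}), writes=frozenset({"c"}))

print(widen_overlap([s1, irecv, wait, s2, s3], irecv, wait))
# -> ['irecv', 's1', 's2', 'wait', 's3']
```

In the reordered list both `s1` and `s2` execute between the receive being posted and its completion being awaited, so their computation can overlap the transfer. `s3` reads the buffer and therefore stays after `wait`.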

7.

An Improved Method for Optimizing the Performance of MPI Programs

Ke Peng (柯鹏), Nie Xin (聂鑫) 《现代计算机》 (Modern Computer) 2011,(18):3-6

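On distributed-memory systems, MPI has proven to be an ideal parallel programming model. MPI is a message-passing parallel programming model in which interprocess communication is carried out by calling library functions, so the efficiency of the communication code in an MPI parallel program directly affects the program's performance. The MPI parallel implementation of DNS is improved twice, first by replacing point-to-point communication functions with collective communication functions, and then by using derived data types and creating new communicators. Experiments are used to present a general approach and method for optimizing MPI parallel programs.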

8.

Research on MPI Non-blocking Broadcast Algorithms and Their Performance

Yan Xinkai (严忻恺), Hao Ziyu (郝子宇), Wu Dong (吴东), Xie Xianghui (谢向辉) 《计算机工程与科学》 (Computer Engineering & Science) 2013,35(9):20

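MPI version 3.0 added non-blocking collective communication. Non-blocking collectives combine the properties of non-blocking and collective communication: compared with blocking collectives they have lower synchronization overhead and allow more overlap of computation and communication, which improves performance. Taking broadcast as an example, the paper describes in detail the different algorithmic implementations of broadcast, compares the low-level control and management methods of non-blocking and blocking broadcast, presents an experimental analysis, and proposes improvements to the implementation.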

9.

Visual Implementation of a Parallel Program Development Platform (total citations: 3; self-citations: 0; citations by others: 3)

Zhang Xinyi (张信一), Li Daiping (李代平), Luo Weigang (罗伟刚) 《计算机应用研究》 (Application Research of Computers) 2004,21(11):266-269

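Implementing a visual platform for parallel programs benefits the development of network parallel computing. Based on the WPVM 3.4 platform, a visual platform for network parallel computing was designed and implemented; its main modules include a task descriptor, a communication code generator, and a code inserter. The paper focuses on the visual implementation of the platform and explains how the user's front-end settings are converted by rule into PVM primitive code that is inserted automatically in the back end, freeing parallel program developers from the tedium of complex parallel communication and the low-level operation of the parallel system.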

10.

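Coarse-Grained Parallel Decomposition of Serial Programs and the Formation of Parallel-Executable Packages (total citations: 1; self-citations: 0; citations by others: 1)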

Luo Xin (罗昕), Yu Yuefen (于月芬), Luo Jingmin (罗静敏) 《小型微型计算机系统》 (Mini-Micro Computer Systems) 1996,(8):35-40

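This paper presents techniques and corresponding algorithms for optimizing and merging the task graph [7] produced by the partitioning phase, trading off parallelism against communication overhead so that the decomposed parallel components execute as efficiently as possible. It also gives a method for forming parallel-executable packages from the merged task graph and automatically inserting communication primitives into them.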

11.

Parallel program analysis and restructuring by detection of point-to-point interaction patterns and their transformation into collective communication constructs (total citations: 1; self-citations: 0; citations by others: 1)

Beniamino Di Martino Antonino Mazzeo Nicola Mazzocca Umberto Villano 《Science of Computer Programming》2001,40(2-3):235-263

This paper deals with a technique that can support the re-engineering of parallel programs based on point-to-point communication primitives by detecting typical process interaction patterns in the code. Pattern detection is performed by the static analysis of the parallel program and by solving Diophantine sets of inequalities. The objective is to determine process interactions and to classify them into a set of commonly occurring interaction patterns.

Information on the patterns contained in the program, besides being useful for code comprehension and documentation, makes it possible to obtain more structured and, possibly, efficient versions of the same programs through the use of collective communication constructs. These are primitives for collective data movement or computation often available in current message-passing programming environments.

After the presentation of the basic program analysis technique, several examples involving the detection of common communication patterns are shown. Then the structure of PPAR, a prototype tool that allows the analysis of parallel programs written in Fortran 77 with calls to PVM or MPI unstructured communication primitives is outlined, and conclusions are drawn.

12.

Scalability analysis of parallel Particle-In-Cell codes on computational grids

WeiFeng Tao DongSheng Cai Nishikawa Ken-ichi 《Computer Physics Communications》2008,179(12):855-864

We have performed benchmarks of two three-dimensional parallel Particle-In-Cell (PIC) codes that are similar but have quite different communication patterns on different computational Grids. An electrostatic code with only electrons based on the three-dimensional skeleton PIC code employs the FFT Poisson solver that uses collective communication patterns. The other is the TRISTAN (TRI-dimensional STANford) code parallelized with MPI, an electromagnetic full particle code, which uses a field solver that only requires point-to-point neighbor communication patterns. We present the mpptest benchmarks on cluster-based computational Grids, where both the basic point-to-point communication patterns and the basic collective communication patterns used in these PIC codes are tested. The results of these benchmarks clearly allow us to quantify and understand the scalability of both communication patterns on the Grids. The present results show that the parallelized TRISTAN code (without all-to-all collective communication) is more scalable than the parallelized skeleton PIC code (with all-to-all collective communication) in cluster-based computational Grid systems where communication performance is poor.

13.

Towards Scalable Java HPC with Hybrid and Native Communication Devices in MPJ Express

Ansar Javed Bibrak Qamar Mohsan Jameel Aamir Shafi Bryan Carpenter 《International journal of parallel programming》2016,44(6):1142-1172

MPJ Express is a messaging system that allows application developers to parallelize their compute-intensive sequential Java codes on High Performance Computing clusters and multicore processors. In this paper, we extend MPJ Express software to provide two new communication devices. The first device, called hybrid, enables MPJ Express to exploit hybrid parallelism on clusters of multicore processors by sitting on top of existing shared memory and network communication devices. The second device, called native, uses JNI wrappers in interfacing MPJ Express to native MPI implementations like MPICH and Open MPI. We evaluate performance of these devices on a range of interconnects including 1G/10G Ethernet, 10G Myrinet and 40G InfiniBand. In addition, we analyze and evaluate the cost of MPJ Express buffering layer and compare it with the performance numbers of other Java MPI libraries. Our performance evaluation reveals that the native device allows MPJ Express to achieve comparable performance to native MPI libraries (for latency and bandwidth of point-to-point and collective communications), which is a significant gain in performance compared to existing communication devices. The hybrid communication device, without any modifications at application level, also helps parallel applications achieve better speedups and scalability by exploiting multicore architecture. Our performance evaluation quantifies the cost incurred by buffering and its impact on overall performance of software. We witnessed comparative performance as both new devices improve application performance and achieve up to 90% of the theoretical bandwidth available without application rewriting effort, including NAS Parallel Benchmarks, point-to-point and collective communication.

14.

Design and Implementation of an MPI Communication Mechanism Based on the Linux Data Link Layer

Wang Wei (王巍), Li Wang (李旺) 《电子技术应用》 (Application of Electronic Technique) 2012,38(2):127-130

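Based on the characteristics of MPI cluster communication, and by analyzing the communication structure of current networks and MPI's point-to-point communication mode, a cluster communication mechanism based on the data link layer is proposed. It reduces protocol overhead and the number of memory copies, thereby improving communication performance between cluster nodes, and its feasibility is verified experimentally.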

15.

A Framework for Generating Distributed-Memory Parallel Programs for Block Recursive Algorithms

S. K. S. Gupta C.-H. Huang P. Sadayappan R. W. Johnson 《Journal of Parallel and Distributed Computing》1996,34(2):137

A framework for synthesizing communication-efficient distributed-memory parallel programs for block recursive algorithms such as the fast Fourier transform (FFT) and Strassen's matrix multiplication is presented. This framework is based on an algebraic representation of the algorithms, which involves the tensor (Kronecker) product and other matrix operations. This representation is useful in analyzing the communication implications of computation partitioning and data distributions. The programs are synthesized under two different target program models. These two models are based on different ways of managing the distribution of data for optimizing communication. The first model uses point-to-point interprocessor communication primitives, whereas the second model uses data redistribution primitives involving collective all-to-many communication. These two program models are shown to be suitable for different ranges of problem size. The methodology is illustrated by synthesizing communication-efficient programs for the FFT. This framework has been incorporated into the EXTENT system for automatic generation of parallel/vector programs for block recursive algorithms.

16.

An adaptive extension library for improving collective communication operations

O. Hartmann M. Kühnemann T. Rauber G. Rünger 《Concurrency and Computation》2008,20(10):1173-1194

In this paper, we present an adaptive extension library that combines the advantage of using a portable MPI library with the ability to optimize the performance of specific collective communication operations. The extension library is built on top of MPI and can be used with any MPI library. Using the extension library, performance improvements can be achieved by an orthogonal organization of the processors in 2D or 3D meshes and by decomposing the collective communication operations into several consecutive phases of MPI communication. Additional point‐to‐point‐based algorithms are also provided. The extension library works in two steps, an a priori configuration phase detecting possible improvements for implementing collective communication for the MPI library used and an execution phase selecting a better implementation during execution time. This allows an adaptation of the performance of MPI programs to a specific execution platform and communication situation. The experimental evaluation shows that significant performance improvements can be obtained for different MPI libraries by using the library extension for collective MPI communication operations in isolation as well as in the context of application programs. Copyright © 2007 John Wiley & Sons, Ltd.
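As a small illustration of decomposing a collective into consecutive phases on a 2D mesh, the Python sketch below splits a broadcast into a row phase followed by column phases. The row-major rank layout and the function name `mesh_bcast_phases` are assumptions; the abstract does not describe the library's actual interface or how it chooses mesh shapes.

```python
def mesh_bcast_phases(rows, cols, root):
    """Plan a two-phase broadcast on a rows x cols mesh of ranks.

    Ranks are assumed to be laid out row by row (rank = row * cols + col).
    Returns (phase1, phase2), where each broadcast is given as
    (root_rank, participating_ranks).
    """
    root_row = root // cols
    # Phase 1: the root broadcasts along its own row.
    row_members = [root_row * cols + c for c in range(cols)]
    phase1 = (root, row_members)
    # Phase 2: every rank in that row broadcasts down its column.
    phase2 = [
        (row_members[c], [r * cols + c for r in range(rows)])
        for c in range(cols)
    ]
    return phase1, phase2


phase1, phase2 = mesh_bcast_phases(rows=2, cols=3, root=4)
print(phase1)   # (4, [3, 4, 5])
print(phase2)   # [(3, [0, 3]), (4, [1, 4]), (5, [2, 5])]
```

In an MPI program the row and column groups would typically be built as separate communicators, for example with MPI_Comm_split, and each phase would be an ordinary broadcast on the smaller communicator.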

17.

POM: A Process Mapping Optimization Tool for MPI Programs

Lu Xingjing (卢兴敬), Shang Lei (商磊), Chen Li (陈莉) 《计算机工程与科学》 (Computer Engineering & Science) 2009,31(Z1)

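Modern supercomputers have ever more compute nodes, each containing multiple processor cores. Because interconnect bandwidth differs, communication between nodes and communication within a node form two levels with different performance, the latter outperforming the former. However, the default process mapping of MPI programs ignores this difference and cannot exploit the better intra-node bandwidth, which severely limits supercomputer performance. To address this, the paper designs and implements POM, an automatic process mapping optimization tool for MPI programs that exploits the hierarchical communication difference. It provides an efficient, low-overhead method for obtaining the communication information of an MPI program and improves the program's communication efficiency, and hence application performance, by optimizing how communication is distributed across the communication levels. The paper solves three problems: abstracting the communication hierarchy of the hardware platform, obtaining MPI program communication information at low overhead, and computing the mapping scheme. First, according to differences in communication capability, the supercomputer structure is abstracted into two levels: different compute nodes connected by a high-speed interconnect, and multiple processor cores on the same node. Second, a simple method is proposed for converting collective communication into point-to-point communication. Finally, an undirected edge-weighted graph is used to represent the interprocess communication relationships of the MPI program, turning the process mapping problem into a graph partitioning problem. Experiments on the Dawning 5000A (曙光5000A) and Dawning 4000A (曙光4000A) show that POM significantly improves the performance of MPI programs.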
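The mapping step can be sketched in Python as grouping ranks so that heavily communicating pairs share a node. The greedy edge-merging heuristic below is a stand-in chosen for brevity, not POM's partitioning method, which the abstract does not detail. The names `greedy_map` and `inter_node_traffic` and the traffic values are hypothetical.

```python
def greedy_map(comm, cores_per_node):
    """Group ranks so that heavy communicators share a node.

    comm[i][j] is the symmetric traffic between ranks i and j.
    Returns a list giving a group id per rank; each group fits on one node.
    Groups left small by this heuristic would still need to be packed
    onto nodes in a real tool.
    """
    n = len(comm)
    group = list(range(n))
    size = {g: 1 for g in group}
    edges = sorted(
        ((comm[i][j], i, j) for i in range(n) for j in range(i + 1, n)),
        reverse=True,
    )
    for weight, i, j in edges:
        gi, gj = group[i], group[j]
        if weight > 0 and gi != gj and size[gi] + size[gj] <= cores_per_node:
            # Merge the two groups: relabel every member of gj as gi.
            for k in range(n):
                if group[k] == gj:
                    group[k] = gi
            size[gi] += size.pop(gj)
    return group


def inter_node_traffic(comm, group):
    """Total traffic between ranks placed in different groups."""
    n = len(comm)
    return sum(
        comm[i][j]
        for i in range(n)
        for j in range(i + 1, n)
        if group[i] != group[j]
    )


# Ranks 0-2 and 1-3 exchange most of the data; two cores per node.
comm = [
    [0, 1, 10, 0],
    [1, 0, 0, 10],
    [10, 0, 0, 1],
    [0, 10, 1, 0],
]
mapping = greedy_map(comm, cores_per_node=2)
print(mapping, inter_node_traffic(comm, mapping))
print(inter_node_traffic(comm, [0, 0, 1, 1]))  # consecutive-rank placement
```

On this toy matrix the grouping keeps both heavy pairs inside a node, whereas placing consecutive ranks together sends both heavy pairs across the interconnect.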