Similar literature
 18 similar documents found; search took 140 ms
1.
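Extrinsic is the HPF mechanism for calling procedures written in other languages, and it makes multi-paradigm parallel computing possible. The paper first describes how the p-HPF parallel compiler supports Extrinsic procedure calls, then presents several Extrinsic-based parallel application templates for distributed-memory network environments: a parallel algorithm library template, a cooperative application template, an MPSD processing template, an asynchronous I/O template, and a pipeline template. It also analyzes their run-time efficiency and gives the p-HPF implementation methods.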

2.
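Research and implementation of an HPF compilation system*   Total citations: 9, self-citations: 1, citations by others: 8
HPF (High Performance Fortran) is a typical data parallel language, and implementing an HPF compilation system is a difficult problem in parallel computing research. The paper describes the research and implementation of an HPF compilation system. After briefly introducing the system's main components, it focuses on several key techniques in the implementation, lists some HPF source programs and the corresponding code generated by the compiler, and finally gives performance test results for the compiler along with a discussion of related issues.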

3.
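Drawing on the parallel operation mechanism of the YH-F2 system, this paper analyzes scalar parallel computation methods for arithmetic expressions, points out the shortcomings of traditional single-tape automaton compilation algorithms in recognizing global parallelism, and proposes a compilation method based on multi-tape automata that performs local association for the global parallel computation of expressions.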

4.
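Parallel computing paradigms supported by the p-HPF compiler in a cluster environment   Total citations: 2, self-citations: 0, citations by others: 2
p-HPF is a parallel compilation system that conforms to the HPF (High Performance Fortran) specification. Multi-paradigm parallel computing with HPF at its core is the basis for developing large parallel application systems. The paper first discusses parallel execution paradigms in a cluster environment, including the farm parallel paradigm, pipeline parallelism, stream-loop parallelism, data-based parallelism, and combined data parallelism, and analyzes their performance abstractly. It then shows how to implement several typical parallel paradigms using p-HPF's extrinsic procedure mechanism, its task parallel mechanism, and typical parallel statements such as FORALL and INDEPENDENT DO. Example programs are given, run, and their results analyzed.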

5.
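Hu Changjun, Zhang Suqin, Tian Jinlan. 《计算机学报》 (Chinese Journal of Computers), 2003, 26(12): 1671-1677
Multi-paradigm parallelism is an essential characteristic of large-scale parallel application systems. Describing such systems formally and building performance estimation models are important for improving the development efficiency and run-time efficiency of multi-paradigm parallel applications. This paper proposes a description method based on modules and their composition relations, together with a cost model for system execution. The model not only describes the multi-paradigm characteristics of a parallel application system but also incorporates the costs incurred when modules of different parallel paradigms are composed. The costs considered include switching the parallel execution mode, converting the data distribution, and switching the programming paradigm, which makes the model more accurate. Application examples of the description and cost estimation are given, illustrating the importance of formal description and cost estimation for choosing a parallelization strategy, as well as the accuracy of the model.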

6.
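Chen Jiang, Zhao Yonghua, Chi Xuebin. 《计算机工程》 (Computer Engineering), 2005, 31(22): 58-60, 94
COUPL+ is a parallel library based on the message-passing model. It encapsulates in its functions the data partitioning, the calls to message-passing functions, and other work that a parallel program must handle. COUPL+ simplifies writing mesh-based applications on distributed-memory parallel machines. The paper briefly introduces the basic principles of COUPL+ and compares its features with those of MPI, OpenMP, and HPF. It also uses COUPL+ to implement two tasks common in parallel computing, the conjugate gradient method and structured-mesh computation, and compares the performance against MPI and HPF.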
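For readers unfamiliar with the conjugate gradient method named in this abstract, the following is a minimal sequential sketch in Python. It does not use COUPL+, MPI, or HPF, and the names `conjugate_gradient` and `dense_matvec` are illustrative assumptions rather than anything from the paper. On a distributed-memory machine, the matrix-vector product and the dot products are the steps that would need communication.

```python
def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite A, given only x -> A x."""
    x = [0.0] * len(b)
    r = list(b)                      # residual b - A x for the start vector x = 0
    p = list(r)                      # first search direction
    rs_old = sum(v * v for v in r)   # squared residual norm
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:      # converged
            break
        ap = matvec(p)
        alpha = rs_old / sum(pi * api for pi, api in zip(p, ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = sum(v * v for v in r)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


def dense_matvec(a):
    """Hypothetical helper: wrap a dense matrix (list of rows) as a matvec function."""
    return lambda v: [sum(aij * vj for aij, vj in zip(row, v)) for row in a]
```

Passing the operator as a function keeps the solver independent of how the matrix is stored or distributed, which is the part a library such as the one described would hide.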

7.
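Parallel component technology is one of the main topics and an important research area of parallel software engineering, and it matters for the productivity of parallel computing software. The state of research covers three aspects: parallel component models, parallel component architecture specifications, and parallel component frameworks. The definition and implementation of parallel components, interfaces, and frameworks are the main criteria for distinguishing different parallel component architecture specifications. How to keep parallel components reusable while improving the performance of the application systems built from them is the common concern driving these technologies. Research results in the field include the single-component multiple-data and multiple-component multiple-data programming models, multi-language interoperability, parallel remote method invocation, solutions to the MxN problem, and interoperability between components of different models. Future directions are research on multi-language interoperability, performance prediction, adaptive components, interoperability of component models, and parallel component architectures on multicore hardware.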

8.
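Data communication optimization algorithm of the LS SIMD C compiler   Total citations: 1, self-citations: 1, citations by others: 0
1 Introduction. An ideal automatic program parallelization system currently faces many hard problems, so the more popular approach to parallel computing is to write parallel programs in a parallel language and have the compiler translate them into the corresponding node programs for execution. By the granularity of parallel execution, parallel languages are divided into task-based parallel languages (mainly for general-purpose computing) and data parallel languages (mainly for scientific numerical computing); HPF is a typical data parallel language. For a data parallel language, the parallelism of program execution has already been specified by the programmer according to the data dependences in the program. How to determine the data distribution and optimize data communication is therefore an important issue affecting the execution efficiency of parallel programs. Data distribution can be roughly divided into two stages: first, the data dependences in the source program are analyzed to obtain the distribution of data over abstract processors; then the distribution over abstract processors is mapped onto physical processors. The data distribution is usually determined in one of the following ways: one is for the programmer to give the abstract data distribution, and the compiler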

9.
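Research and implementation of a parallel genetic algorithm skeleton   Total citations: 1, self-citations: 0, citations by others: 1
By comparing the four parallel models of parallel genetic algorithms with the skeleton-based programming model, a parallel genetic algorithm skeleton was designed and implemented to simplify the development of parallel genetic algorithm applications. Thanks to its transparent parallel mechanism, the user only needs to write a sequential program for the individual fitness function and then call the skeleton to complete a parallel genetic algorithm program. The open structure of the skeleton can absorb the many mature improved algorithms from genetic algorithm research, and support for multiple encodings gives the user more freedom of choice. The skeleton achieves the actual parallelism by calling existing structural skeletons, so it is independent of the parallel computing platform and highly reusable and flexible.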
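The skeleton idea in this abstract (the user writes only a sequential fitness function, and the skeleton hides the parallel evaluation) can be illustrated with a small Python sketch. This is not the paper's implementation: `run_ga`, its parameters, the bit-string encoding, and the thread pool standing in for the underlying structural skeleton are all assumptions made for illustration. A real deployment would likely use processes or message passing for CPU-bound fitness functions.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def run_ga(fitness, genome_len, pop_size=20, generations=30, workers=4, seed=0):
    """Maximize `fitness` over bit strings; needs genome_len >= 2 and pop_size >= 4."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(genome_len)] for _ in range(pop_size)]
    best = None
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(generations):
            # The only parallel step: evaluate every individual with the user's function.
            scores = list(pool.map(fitness, pop))
            ranked = sorted(zip(scores, pop), key=lambda sp: sp[0], reverse=True)
            if best is None or ranked[0][0] > best[0]:
                best = (ranked[0][0], list(ranked[0][1]))
            parents = [ind for _, ind in ranked[: pop_size // 2]]  # truncation selection
            children = []
            while len(children) < pop_size:
                a, b = rng.sample(parents, 2)
                cut = rng.randrange(1, genome_len)   # one-point crossover
                child = a[:cut] + b[cut:]
                if rng.random() < 0.1:               # occasional single-bit mutation
                    child[rng.randrange(genome_len)] ^= 1
                children.append(child)
            pop = children
    return best  # (best score seen, corresponding individual)
```

A call such as `run_ga(sum, 8)` treats the number of 1 bits as the fitness; the caller never touches the pool, which is the separation of concerns the abstract describes.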

10.
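As computer hardware develops, multicore parallel computing appears more and more often in software and applications. Current multicore programming models use thread-level parallelism, and existing multithreaded parallel programming models fall into three kinds: thread libraries, directive-based models, and task-based models. The paper proposes a communication-based method, similar to the MPI parallel programming model, for parallel programming on the Win32 platform, and on this basis implements the MTI parallel programming model. Results from several typical tests of parallel programming with MTI are given; they show that MTI is effective and easy to use.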

11.
Clusters of SMPs are hybrid-parallel architectures that combine the main concepts of distributed-memory and shared-memory parallel machines. Although SMP clusters are widely used in the high performance computing community, there exists no single programming paradigm that allows exploiting the hierarchical structure of these machines. Most parallel applications deployed on SMP clusters are based on MPI, the standard API for distributed-memory parallel programming, and thus may miss a number of optimization opportunities offered by the shared memory available within SMP nodes. In this paper we present extensions to the data parallel programming language HPF and associated compilation techniques for optimizing HPF programs on clusters of SMPs. The proposed extensions enable programmers to control key aspects of distributed-memory and shared-memory parallelization at a high level of abstraction. Based on these language extensions, a compiler can adopt a hybrid parallelization strategy which closely reflects the hierarchical structure of SMP clusters by automatically exploiting shared-memory parallelism based on OpenMP within cluster nodes and distributed-memory parallelism utilizing MPI across nodes. We describe the implementation of these features in the VFC compiler and present experimental results which show the effectiveness of these techniques.

12.
13.
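To exploit the inherent parallelism of reconfigurable processors, the compiler must analyze a program's parallelism to determine the best execution mode for the reconfigurable processor hardware. To this end, a parallel optimization algorithm for reconfigurable processors is proposed. It maps the parallel computation parts of a directed acyclic graph onto the reconfigurable processor and realizes three levels of parallelism for a task: instruction-level, loop-level, and thread-level. Test results show that with the algorithm the reconfigurable processor's performance on tasks improves by about 1.2 times compared with not using the parallel optimization algorithm.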
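One common way to expose the parallel parts of a directed acyclic graph, as a first step before any mapping onto hardware, is to group its nodes into dependency levels. The Python sketch below shows that generic technique (Kahn-style layering); it is not the algorithm proposed in the paper, and the function name `parallel_levels` is an assumption.

```python
from collections import defaultdict


def parallel_levels(nodes, edges):
    """Group DAG nodes into levels; nodes within one level do not depend on each other."""
    indegree = {n: 0 for n in nodes}
    successors = defaultdict(list)
    for u, v in edges:               # edge (u, v) means v depends on u
        successors[u].append(v)
        indegree[v] += 1
    level = [n for n in nodes if indegree[n] == 0]
    levels = []
    while level:
        levels.append(level)
        next_level = []
        for u in level:
            for v in successors[u]:
                indegree[v] -= 1
                if indegree[v] == 0:  # all of v's predecessors are now scheduled
                    next_level.append(v)
        level = next_level
    if sum(len(lv) for lv in levels) != len(indegree):
        raise ValueError("graph contains a cycle")
    return levels
```

Each returned level is a set of operations that could be issued together, while the levels themselves must run in order.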

14.
Unstructured mesh generation exposes highly irregular computation patterns, which imposes a challenge in implementing triangulation algorithms on parallel machines. This paper reports on an efficient parallel implementation of near Delaunay triangulation with High Performance Fortran (HPF). Our algorithm exploits embarrassing parallelism by performing sub‐block triangulation and boundary merge independently at the same time. The sub‐block triangulation is a divide & conquer Delaunay algorithm known for its sequential efficiency, and the boundary triangulation is an incremental construction algorithm with low overhead. Compared with prior work, our parallelization method is both simple and efficient. In the paper, we also describe a solution to the collinear points problem that usually arises in large data sets. Our experiences with the HPF implementation show that with careful control of the data distribution, we are able to parallelize the program using HPF's standard directives and extrinsic procedures. Experimental results on several parallel platforms, including an IBM SP2 and a DEC Alpha farm, show that a parallel efficiency of 42–86% can be achieved for an eight‐node distributed memory system. We also compare efficiency of the HPF implementation with that of a similarly hand‐coded MPI implementation. Copyright © 2004 John Wiley & Sons, Ltd.

15.
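Implementation and optimization strategies of out-of-core computation in the p-HPF parallel compilation system   Total citations: 4, self-citations: 0, citations by others: 4
The paper describes the support for out-of-core computation in the p-HPF compilation system and the optimization strategies adopted. By extending the programming model and constructing a parallel I/O model, the p-HPF compilation system can now handle out-of-core arrays effectively.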

16.
This paper presents a new compiler optimization algorithm that parallelizes applications for symmetric, shared-memory multiprocessors. The algorithm considers data locality, parallelism, and the granularity of parallelism. It uses dependence analysis and a simple cache model to drive its optimizations. It also optimizes across procedures by using interprocedural analysis and transformations. We validate the algorithm by hand-applying it to sequential versions of parallel Fortran programs operating over dense matrices. The programs initially were hand-coded to target a variety of parallel machines using loop parallelism. We ignore the user's parallel loop directives, and use known and implemented dependence and interprocedural analysis to find parallelism. We then apply our new optimization algorithm to the resulting program. We compare the original parallel program to the hand-optimized program, and show that our algorithm improves three programs, matches four programs, and degrades one program in our test suite on a shared-memory, bus-based parallel machine with local caches. This experiment suggests existing dependence and interprocedural array analysis can automatically detect user parallelism, and demonstrates that user parallelized codes often benefit from our compiler optimizations, providing evidence that we need both parallel algorithms and compiler optimizations to effectively utilize parallel machines.

17.
This paper describes the results of the research for implementing applications for concurrent execution on the CYBERPLUS multiparallel computer system. Three layers of parallelism are built into this system: instruction, computation and functional levels. The paper contains a short summary of the hardware, and the different layers of the software for multiprocessor systems. They are based on the previously developed CPFTN (Cyberplus Fortran Compiler) compiler which produces optimized object code for a single CYBERPLUS processor, exploiting through a series of horizontal and vertical optimizations the internal parallelism present in this processor. Unlike other compilers, the input to CPFTN consists of an entire kernel containing all subroutines and functions loaded into a single processor. An interprocessor communications subsystem provides the low-level component of the multiprocessor system and contains the data transfer and synchronization capabilities. The high-level component consists of extensions to FORTRAN (CPMFTN, Cyberplus Multiprocessor Fortran) for characterizing the data shared between processors and implements concepts specially adapted to the private memory architecture of the CYBERPLUS as well as to the specific needs of the user community targeted by this product. The resulting scheme is a large grain size, demand driven, data flow type. It is assumed that partition between processors can best be done on the basis of the high-level knowledge of the problem possessed by the user. The corresponding kernels may be created independently of each other, using the new language features introduced to characterize data shared between processors. Scheduling, synchronization, as well as data transfer operations are automatically distributed by the compiler across the subroutines and functions constituting the grain size of the partition.
The first version of CPMFTN under development is described, followed by several simple application paradigms and discussion of the merits of possible future extensions.

18.
In order to exploit the efficient computing power of many integrated cores on a heterogeneous cluster, a multi-level and multi-granularity collaborative parallel computing method is proposed for finite element structural mechanical analysis. Computing tasks are divided into three levels: inter-node parallelism, inter-device parallelism and inter-core parallelism. Through mapping decomposable computing jobs to different hardware layers of a heterogeneous MIC system, the proposed method not only effectively resolves the load balancing problem between CPU and MIC devices, but also significantly reduces the communication overheads of the system. Different engineering simulation case experiments for large scale parallel computing were conducted on the “Tianhe 2” supercomputer. Up to 39000 CPU+MIC cores were employed and the finite element size of the analysis was more than 100 million units. Test results show that the proposed method can achieve good speedup and parallel computing efficiency in large scale parallel computing of finite element structural analysis. The optimized adaptation of finite element structural analysis and heterogeneous MIC computing platform is realized, which can provide reference for parallel porting and performance optimization of similar applications.
