1.

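吴少刚, 章隆兵, 蔡飞, 胡伟武 《计算机研究与发展》 2004, 41(7): 1298-1305

As a shared-memory parallel programming standard, OpenMP has become one of the mainstream parallel programming models thanks to its ease of use and its support for incremental parallelization. The OpenMP standard was designed for UMA shared-memory architectures, so its loop scheduling mechanisms consider only load balance and not data distribution. In a cluster OpenMP system, however, data locality is the key factor affecting performance. Because the static scheduling strategy in the OpenMP standard is ill-suited to cluster computing, the paper proposes the LBS scheduling algorithm, which fully embodies the owner-computes principle, and implements it on the cluster OpenMP system OpenMP/JIAJIA through an extended directive. Test results show that the LBS algorithm is effective for cluster OpenMP systems.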
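The contrast this abstract draws between locality-blind static scheduling and the owner-computes principle can be sketched in a few lines. This is only an illustration under stated assumptions, not the LBS algorithm itself (the abstract does not describe its details): the cyclic data distribution and the helper names `static_schedule`, `owner_computes_schedule`, and `remote_accesses` are hypothetical.

```python
# Illustrative sketch only: NOT the LBS algorithm from the paper.
# Assumption: element i of a shared array lives on node owner(i).

def static_schedule(n, nodes):
    """OpenMP-style static schedule: contiguous chunks of iterations,
    chosen without looking at where the data lives."""
    chunk = (n + nodes - 1) // nodes  # ceil(n / nodes) iterations per node
    return {p: list(range(p * chunk, min((p + 1) * chunk, n)))
            for p in range(nodes)}

def owner_computes_schedule(n, nodes, owner):
    """Assign iteration i to the node that owns the element it touches."""
    plan = {p: [] for p in range(nodes)}
    for i in range(n):
        plan[owner(i)].append(i)
    return plan

def remote_accesses(plan, owner):
    """Count iterations that run on a node other than the data's owner."""
    return sum(1 for p, iters in plan.items() for i in iters if owner(i) != p)

if __name__ == "__main__":
    nodes, n = 4, 8
    cyclic = lambda i: i % nodes  # hypothetical cyclic data distribution
    print(remote_accesses(static_schedule(n, nodes), cyclic))
    print(remote_accesses(owner_computes_schedule(n, nodes, cyclic), cyclic))
```

With a data distribution that does not match the contiguous chunks, the static plan runs most iterations away from their data, while the owner-computes plan runs none remotely; on a software DSM each remote access would translate into page traffic between nodes.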

2.

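Design and Implementation of a Cluster OpenMP System  Total citations: 5, self-citations: 0, citations by others: 5

吴少刚, 章隆兵, 蔡飞, 顾丽红, 唐志敏 《计算机学报》 2004, 27(7): 904-912

With its ease of use and support for incremental parallelization, OpenMP has become the programming standard for shared-memory architectures. Cluster systems are now the mainstream platform for high-performance computing, so studying cluster OpenMP systems is valuable for advancing the development and adoption of parallel applications. The authors use the software DSM system JIAJIA as the OpenMP runtime system and combine it with a front-end compiler, OMP2JIA, to build the OpenMP/JIAJIA computing environment on a cluster. To improve performance, they extend the OpenMP directives to suit the characteristics of clusters and optimize the back-end runtime library. Using 11 OpenMP applications, they compare this environment with a hardware cc-NUMA system that supports OpenMP (SGI 2100). The results show an average 7-node speedup of 4.62 for the cluster OpenMP system against 4.55 for the SGI 2100, so the two perform comparably.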

3.

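A Comparison of Shared Memory and Message Passing on PC Clusters  Total citations: 7, self-citations: 0, citations by others: 7

章隆兵, 吴少刚, 蔡飞, 胡伟武 《软件学报》 2004, 15(6): 842-849

Shared memory and message passing are the two mainstream parallel programming models today. Message passing is generally considered less programmer-friendly than shared memory. OpenMP is the de facto industry standard for shared-memory programming. Cluster OpenMP systems provide an OpenMP programming environment on clusters and are both easy to program and scalable, but their performance has long been a focus of attention. Taking the cluster OpenMP system OpenMP/JIAJIA and a typical message-passing ...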

4.

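OpenMP Directive Extensions for Irregular Applications  Total citations: 1, self-citations: 0, citations by others: 1

顾丽红, 吴少刚, 章隆兵, 蔡飞 《小型微型计算机系统》 2005, 26(1): 124-128

The core of many irregular applications is sparse matrix computation. In such computations a reference to an element of one array depends on the values of elements in two other arrays, so memory accesses are irregular. Targeting this characteristic, the paper proposes a new OpenMP directive clause, indirect, and implements it on the cluster OpenMP system OpenMP/JIAJIA. Tests with a real OpenMP application, Equake, show that the directive extension is effective: performance improves by 18% for the function code that uses the clause directly and by 15% for the application as a whole.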

5.

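An OpenMP Extension Method for Embedded Multi-core Platforms

王庆, 季振洲, 刘涛 《计算机科学与探索》 2011, 5(1): 81-86

Developing an effective programming method for multi-core platforms has become an important goal of parallel software research. The paper implements efficient execution of OpenMP parallel programs on an embedded multi-core platform. Because embedded systems have limited memory resources, it proposes extending OpenMP with a custom directive, tiling, to improve the running efficiency of parallel programs on embedded multi-core platforms. The extended OpenMP parallel programs support loop tiling and can therefore make full use of the ...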

6.

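JIACKPT: A Recoverable Software DSM System

章隆兵, 张福新, 胡伟武, 唐志敏 《软件学报》 2005, 16(2): 165-173

Software DSM (distributed shared memory) systems build a shared-memory programming environment on clusters, combining the programmability of shared memory with the scalability of clusters, and have attracted extensive research. Because a software DSM system is a distributed system with a high risk of failure, fault tolerance is needed to make it practical. Using user-level checkpointing, the authors design and implement JIACKPT (JIAjia with ChecKPoinTing), a recoverable and highly portable software DSM system built on JIAJIA, a software DSM system that supports the scope consistency memory model. Thanks to a strong global consistent state suited to software DSM systems and a number of optimizations, JIACKPT is easy to implement and performs well. Application tests on an 8-node PC cluster show that even with one checkpoint per minute, the checkpointing overhead is below 10% for most applications. JIACKPT is also highly portable. Together these results indicate that JIACKPT has become a fairly practical system.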

7.

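Extension and Implementation of an OpenMP Task Scheduling Mechanism for DSWP Parallelism

刘晓娴, 赵荣彩, 丁锐 《计算机科学》 2013, 40(9): 38-43

Multi-core processors can improve the performance of multithreaded programs, but the many existing single-threaded programs cannot benefit from them, and programmers are accustomed to writing single-threaded code. Automatic parallelization is an important means of porting single-threaded programs to multi-core systems, but traditional techniques perform poorly when loops contain indeterminate data dependences or complex control flow. For loops on which traditional automatic parallelization fails, Ottoni et al. proposed the Decoupled Software Pipelining (DSWP) algorithm to achieve fine-grained, instruction-level parallelism. It requires deep knowledge of the processor architecture and hardware support for inter-core communication queues and special instructions, which limits its parallel performance and breadth of application. DSWP parallelism implemented on the OpenMP application programming interface does not depend on hardware support for inter-core communication queues or special instructions and is not tied to a platform, but the existing OpenMP task scheduling mechanism cannot meet the task scheduling needs of DSWP. The paper extends the existing OpenMP task scheduling mechanism with an attribute that binds tasks to threads, guaranteeing correct execution of OpenMP-based DSWP parallel programs. Support for the task-binding clause is added to libgomp, GCC's OpenMP runtime library, and the extended GCC serves as the base compiler for OpenMP DSWP programs, providing support for automatic parallelization. Tests on the NPB3.3.1 benchmark suite show that loops on which traditional automatic parallelization fails reach an average speedup of more than 1.23 on a dual-core processor after OpenMP DSWP automatic parallelization. Parallel programs generated by the Open64 compiler with the OpenMP DSWP algorithm added achieve average speedups 22% and 26% higher, respectively, than programs produced by the Intel compiler and the Open64 compiler using only traditional automatic parallelization.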

8.

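Compiler Optimization of Parallel Programs for Shared-Memory Architectures

张平, 李清宝, 赵荣彩 《计算机工程与应用》 2006, 42(1): 13-16

Automatic parallelization for shared-memory architectures usually exploits loop-level parallelism under a fork-join execution model, which leaves room for better parallel performance. The paper combines the strengths of the fork-join and SPMD execution models: during parallelizing compilation it merges and extends parallel regions to obtain a hybrid fork-join/SPMD execution model, and within SPMD parallel regions it optimizes barrier synchronization using a cross-processor dependence graph. Analysis and verification show that these optimizations reduce the number of parallel regions and barrier synchronizations and effectively improve the performance of the generated parallel programs.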

9.

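A Progressive Teaching Method for MPI Parallel Programming

罗秋明, 李晶 《计算机时代》 2006, (9): 68-69

In teaching MPI parallel programming, the first task is to manage the shift from sequential to parallel thinking; the key is for students to grasp that, in SPMD mode, the same piece of code can give rise to different computational behavior. Starting from multi-process/multi-thread programming on a single machine, then learning the basic MPI concepts once the notion of SPMD is understood, and only then moving on to other advanced features makes the learning process considerably smoother.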

10.

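Research and Implementation of OpenMP-Based Parallel AVS Encoding Algorithms  Total citations: 1, self-citations: 1, citations by others: 0

胡文安, 于鸿洋 《计算机工程与设计》 2010, 31(10)

To speed up encoding for AVS, the new-generation audio and video coding standard, the authors use OpenMP on a multi-core processor platform to study and implement GOP-level, slice-level, frame-level, and task-queue-based frame-level parallel AVS encoding algorithms. Tests on CIF-format video sequences reach a speedup of up to 3.82x on a quad-core processor platform. In addition, the task-queue-based frame-level parallel algorithm overcomes the low speedup of the plain frame-level parallel algorithm while keeping image quality unchanged. The experimental results show that OpenMP is a simple and effective parallel programming tool, and that every OpenMP-based parallel AVS encoding algorithm encodes significantly faster than the original serial algorithm.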

11.

A Comparison of Co-Array Fortran and OpenMP Fortran for SPMD Programming

Alan J. Wallcraft 《The Journal of supercomputing》2002,22(3):231-250

Co-Array Fortran, formerly called F--, is a small set of extensions to Fortran 90/95 for Single-Program-Multiple-Data (SPMD) parallel processing. OpenMP Fortran is a set of compiler directives that provide a high level interface to threads in Fortran, with both thread-local and thread-shared memory. OpenMP is primarily designed for loop-level directive-based parallelization, but it can also be used for SPMD programs by spawning multiple threads as soon as the program starts and having each thread then execute the same code independently for the duration of the run. The similarities and differences between these two SPMD programming models are described. Co-Array Fortran can be implemented using either threads or processes, and is therefore applicable to a wider range of machine types than OpenMP Fortran. It has also been designed from the ground up to support the SPMD programming style. To simplify the implementation of Co-Array Fortran, a formal Subset is introduced that allows the mapping of co-arrays onto standard Fortran arrays of higher rank. An OpenMP Fortran compiler can be extended to support Subset Co-Array Fortran with relatively little effort.
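The SPMD pattern this abstract describes (start all threads at once, have each run the same code, and let each select its own share of the work by rank) can be sketched with plain threads. The example uses Python's standard `threading` module rather than OpenMP Fortran or Co-Array Fortran, so it only mirrors the structure; `spmd_sum` and `worker` are hypothetical names chosen for illustration.

```python
import threading

def spmd_sum(data, num_threads):
    """Sum `data` SPMD-style: every thread runs the same worker code and
    uses its rank to pick its own block of the data."""
    partial = [0] * num_threads      # thread-shared: one slot per rank
    result = [None]                  # thread-shared: final answer
    barrier = threading.Barrier(num_threads)

    def worker(rank):
        n = len(data)
        lo = rank * n // num_threads         # thread-local block bounds
        hi = (rank + 1) * n // num_threads
        partial[rank] = sum(data[lo:hi])     # same code, different data
        barrier.wait()                       # all partial sums are ready
        if rank == 0:                        # one rank combines the results
            result[0] = sum(partial)

    threads = [threading.Thread(target=worker, args=(r,))
               for r in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result[0]

if __name__ == "__main__":
    print(spmd_sum(list(range(100)), 4))
```

Unlike loop-level parallelization, nothing here forks and joins around an individual loop: the threads live for the whole computation and synchronize only at the barrier.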

12.

Performance comparison of MPI and OpenMP on shared memory multiprocessors

Géraud Krawezik, Franck Cappello 《Concurrency and Computation》2006,18(1):29-61

When using a shared memory multiprocessor, the programmer faces the issue of selecting the portable programming model which will provide the best performance. Even if they restrict their choice to the standard programming environments (MPI and OpenMP), they have to select a programming approach among MPI and the variety of OpenMP programming styles. To help the programmer in their decision, we compare MPI with three OpenMP programming styles (loop level, loop level with large parallel sections, SPMD) using a subset of the NAS benchmark (CG, MG, FT, LU), two dataset sizes (A and B), and two shared memory multiprocessors (IBM SP3 NightHawk II, SGI Origin 3800). We have developed the first SPMD OpenMP version of the NAS benchmark and gathered other OpenMP versions from independent sources (PBN, SDSC and RWCP). Experimental results demonstrate that OpenMP provides competitive performance compared with MPI for a large set of experimental conditions. Not surprisingly, the two best OpenMP versions are those requiring the strongest programming effort. MPI still provides the best performance under some conditions. We present breakdowns of the execution times and measurements of hardware performance counters to explain the performance differences. Copyright © 2005 John Wiley & Sons, Ltd.

13.

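Hybrid Parallel Computing for the AREM Model in Multi-core Environments  Total citations: 1, self-citations: 1, citations by others: 0

赵军, 吴建平, 宋君强, 辜旭赞 《计算机工程与应用》 2011, 47(21): 61-63

Multi-core processors have become the mainstream way to build high-performance computer systems. Exploiting the architecture of multi-core high-performance systems, which combines shared-memory and distributed-memory structures, the authors study and implement hybrid MPI/OpenMP parallel computing for the AREM model. Performance tests show that hybrid MPI/OpenMP parallel computing scales the parallel application to larger processor counts and shortens computation time. It achieves good parallel results incrementally and at low parallelization cost, without major changes to the structure of the original program.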

14.

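Hybrid MPI and OpenMP Programming Based on Docker

赵博颖, 肖鹏, 张力 《计算机与现代化》 2018, (5): 60

Because building a cluster parallel system is currently complex and time-consuming, the paper proposes building one on Docker. It introduces the core concepts and basic architecture of Docker, a lightweight virtualization technology, and uses Docker to set up a cluster parallel development environment on Linux. It briefly explains the idea of parallel computing, describes the basic concepts and characteristics of MPI and OpenMP, builds a hybrid MPI and OpenMP programming model for parallel matrix multiplication, compares the performance of the hybrid model with the pure MPI and pure OpenMP models, and analyzes the causes of the differences. Using the hybrid model, it then compares the parallel efficiency of systems built on Docker with that of systems built on conventional physical machines.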

15.

OpenMP‐oriented applications for distributed shared memory architectures

Ami Marowka, Zhenying Liu, Barbara Chapman 《Concurrency and Computation》2004,16(4):371-384

The rapid rise of OpenMP as the preferred parallel programming paradigm for small‐to‐medium scale parallelism could slow unless OpenMP can show capabilities for becoming the model‐of‐choice for large scale high‐performance parallel computing in the coming decade. The main stumbling block for the adaptation of OpenMP to distributed shared memory (DSM) machines, which are based on architectures like cc‐NUMA, stems from the lack of capabilities for data placement among processors and threads for achieving data locality. The absence of such a mechanism causes remote memory accesses and inefficient cache memory use, both of which lead to poor performance. This paper presents a simple software programming approach called copy‐inside–copy‐back (CC) that exploits the data privatization mechanism of OpenMP for data placement and replacement. This technique enables one to distribute data manually without taking away control and flexibility from the programmer and is thus an alternative to the automatic and implicit approaches. Moreover, the CC approach improves on the OpenMP‐SPMD style of programming that makes the development process of an OpenMP application more structured and simpler. The CC technique was tested and analyzed using the NAS Parallel Benchmarks on SGI Origin 2000 multiprocessor machines. This study shows that OpenMP improves performance of coarse‐grained parallelism, although a fast copy mechanism is essential. Copyright © 2004 John Wiley & Sons, Ltd.

16.

PPModel: a modeling tool for source code maintenance and optimization of parallel programs

Ferosh Jacob, Jeff Gray, Jeffrey C. Carver, Marjan Mernik, Purushotham Bangalore 《The Journal of supercomputing》2012,62(3):1560-1582

As the computation power in desktops advances, parallel programming has emerged as one of the essential skills needed by next generation software engineers. However, programs written in popular parallel programming paradigms have a substantial amount of sequential code mixed with the parallel code. Several such versions supporting different platforms are necessary to find the optimum version of the program for the available resources and problem size. As revealed by our study on benchmark programs, sequential code is often duplicated in these versions. This can affect code comprehensibility and re-usability of the software. In this paper, we discuss a framework named PPModel, which is designed and implemented to free programmers from these scenarios. Using PPModel, a programmer can separate parallel blocks in a program, map these blocks to various platforms, and re-execute the entire program. We provide a graphical modeling tool (PPModel) intended for Eclipse users and a Domain-Specific Language (tPPModel) for non-Eclipse users to facilitate the separation, the mapping, and the re-execution. This is illustrated with a case study from a benchmark program, which involves re-targeting a parallel block to CUDA and another parallel block to OpenMP. The modified program gave almost 5× performance gain compared to the sequential counterpart, and 1.5× gain compared to the existing OpenMP version.

17.

Integrating Parallelizing Compilation Technologies for SMP Clusters

Xiao-Bing Feng, Li Chen, Yi-Ran Wang, Xiao-Mi An, Lin Ma, Chun-Lei Sang, Zhao-Qing Zhang 《计算机科学技术学报》2005,20(1):0-0

In this paper, a source to source parallelizing compiler system, AutoPar, is presented. The system transforms FORTRAN programs to multi-level hybrid MPI/OpenMP parallel programs. Integrated parallel optimizing technologies are utilized extensively to derive an effective program decomposition in the whole program scope. Other features such as synchronization optimization and communication optimization improve the performance scalability of the generated parallel programs, from both intra-node and inter-node. The system makes great effort to boost automation of parallelization. Profiling feedback is used in performance estimation which is the basis of automatic program decomposition. Performance results for eight benchmarks in NPB1.0 from NAS on an SMP cluster are given, and the speedup is desirable. It is noticeable that in the experiment, at most one data distribution directive and a reduction directive are inserted by the user in BT/SP/LU. The compiler is based on ORC, Open Research Compiler. ORC is a powerful compiler infrastructure, with such features as robustness, flexibility and efficiency. Strong analysis capability and well-defined infrastructure of ORC make the system implementation quite fast.