Found 20 similar documents (search time: 218 ms)
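刘有耀 (Liu Youyao), 杨鹏程 (Yang Pengcheng). 《计算机应用》 (Journal of Computer Applications), 2016, 36(9): 2422-2426
To address the problem that large amounts of legacy code cannot be reused, a new compilation tool is designed that converts serial C code into hybrid parallel code based on MPI+OpenMP, reducing the development cost of parallel programming. First, by optimizing JavaCC, a lexical and syntax analyzer capable of parsing C is implemented; it analyzes the source code and generates an abstract syntax tree. Second, control-dependence and data-dependence analysis is performed on the source code according to the syntax tree, producing a partition of parallelizable statement blocks. Third, the target code is obtained using the proposed parallel code generation method. Finally, a simulation and verification environment for the target code is built on Visual Studio 2010. Experimental results show that the tool achieves automatic parallelization of serial code reasonably well; the speedup of the generated code differs from that of hand-written code by 8.2% to 18.4%.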

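An effective automatic parallelization system helps users make full use of the hardware resources of parallel computers and improves the efficiency of parallel program design. OpenMP, as a programming standard for shared-memory architectures, offers good performance and portability. This paper presents the design and implementation of OAGT, a SUIF-based tool that automatically generates OpenMP parallel programs, and focuses on several of the main technical issues involved: loop analysis, pipeline parallelism, reduction operations, and synchronization optimization.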

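Although reduction recognition and parallelization is no longer a new technique, the reduction recognition capabilities of existing parallelizing compilers still fall short of the needs of practical applications. Based on an analysis and study of reduction recognition and parallelization, recognition of bitwise reduction operations is implemented on the SUIF infrastructure by modifying its intermediate representation language.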
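
As an illustration of what a bitwise reduction looks like once it has been recognized, the following is a minimal sketch in C using the standard OpenMP `reduction` clause. It is not taken from the paper; the function name `or_all` and its parameters are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* OR together all flag words (hypothetical example function).
 * The loop carries a dependence on `mask`, but bitwise OR is associative
 * and commutative, so the update is a reduction: each thread can
 * accumulate a private copy, and the copies are combined at the end. */
uint32_t or_all(const uint32_t *flags, size_t n)
{
    uint32_t mask = 0;

    /* If OpenMP is not enabled, the pragma is ignored and the loop
     * simply runs serially with the same result. */
    #pragma omp parallel for reduction(|:mask)
    for (long i = 0; i < (long)n; i++)
        mask |= flags[i];

    return mask;
}
```

The same pattern applies to the `&` and `^` operators, which OpenMP also accepts as reduction operators in C.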

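Mainstream general-purpose processors all provide multi-core parallelism as well as SIMD parallelism within each core. Although the GCC compiler implements automatic vectorization for SIMD parallelism, its automatic vectorization of OpenMP parallel programs remains far from satisfactory. For multithreaded OpenMP programs, building on GCC's OpenMP implementation, the data alignment attribute directive is extended so that the compiler can determine more accurately whether data is aligned during automatic vectorization, thereby improving GCC's automatic vectorization.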
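
The paper's extended directive is not shown in the abstract. As a loosely related sketch, the standard OpenMP 4.0 `aligned` clause gives the vectorizer the same kind of alignment guarantee; the function name `saxpy_aligned` and the 32-byte alignment are assumptions chosen for illustration.

```c
#include <stddef.h>

/* y[i] += alpha * x[i] (hypothetical example function).
 * The caller promises that x and y are 32-byte aligned; the `aligned`
 * clause passes that promise to the compiler so it does not have to
 * prove alignment itself or emit unaligned loads and stores. */
void saxpy_aligned(float *restrict y, const float *restrict x,
                   float alpha, long n)
{
    /* Ignored (loop stays scalar or auto-vectorized) if OpenMP is off. */
    #pragma omp simd aligned(x, y : 32)
    for (long i = 0; i < n; i++)
        y[i] += alpha * x[i];
}
```

Passing pointers that are not actually 32-byte aligned breaks the promise and leads to undefined behavior when the clause is honored.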

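Multi-core processors can improve the performance of multithreaded programs, but the many existing single-threaded programs cannot benefit from them, and programmers are accustomed to writing single-threaded code. Automatic parallelization is an important means of porting single-threaded programs to multi-core processors, but traditional automatic parallelization performs poorly when a loop contains data dependences that cannot be determined or complex control flow. For loops on which traditional automatic parallelization fails, Ottoni et al. proposed the Decoupled Software Pipelining (DSWP) algorithm to achieve fine-grained, instruction-level parallelism; however, it requires in-depth knowledge of the processor architecture and hardware support for inter-core communication queues and special instructions, which limits its parallel performance and breadth of application. DSWP parallelism implemented on the OpenMP application programming interface does not depend on hardware support for inter-core communication queues or special instructions and is not tied to a particular platform, but the existing OpenMP task scheduling mechanism cannot meet the task scheduling needs of DSWP parallelism. The existing OpenMP task scheduling mechanism is extended with an attribute that binds tasks to threads, which guarantees correct execution of OpenMP-based DSWP parallel programs. A task-binding attribute clause is added to libgomp, GCC's OpenMP runtime library, and the extended GCC serves as the base compiler for OpenMP DSWP programs, providing support for automatic parallelization. Tests on the NPB3.3.1 benchmark suite show that loops on which traditional automatic parallelization fails reach an average speedup of more than 1.23 on a dual-core processor after OpenMP DSWP automatic parallelization. Compared with programs produced by the Intel compiler and the Open64 compiler using only traditional automatic parallelization, parallel programs generated by an Open64 compiler with the OpenMP DSWP algorithm added achieve average speedups that are 22% and 26% higher, respectively.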

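OpenMP specifies a set of compiler directives, environment variables, and runtime library routines, and offers simplicity, portability, and support for incremental parallelization. However, the frequent thread management overhead caused by its fork-join model is one of the bottlenecks limiting the performance of OpenMP programs. This paper discusses how to restructure parallel regions by merging and expanding them and, on that basis, uses the global interprocedural analysis provided by Open64's IPA optimization component to merge parallel blocks across procedure boundaries. Experiments show that the method effectively improves the run-time performance of OpenMP programs.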
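
The effect of merging parallel regions can be sketched by hand in C. This is only an illustration of the transformation within a single procedure, not the paper's Open64 implementation; both function names are hypothetical.

```c
#include <stddef.h>

/* Before: two back-to-back parallel regions, so the thread team is
 * forked and joined twice. */
void scale_then_shift_two_regions(double *a, double *b, long n)
{
    #pragma omp parallel for
    for (long i = 0; i < n; i++)
        a[i] *= 2.0;

    #pragma omp parallel for
    for (long i = 0; i < n; i++)
        b[i] = a[i] + 1.0;
}

/* After: one merged parallel region containing two work-sharing loops,
 * so the team is forked and joined only once. */
void scale_then_shift_merged(double *a, double *b, long n)
{
    #pragma omp parallel
    {
        #pragma omp for
        for (long i = 0; i < n; i++)
            a[i] *= 2.0;
        /* The implicit barrier at the end of the first loop ensures that
         * a[] is fully updated before any thread reads it below. */

        #pragma omp for
        for (long i = 0; i < n; i++)
            b[i] = a[i] + 1.0;
    }
}
```

Both versions compute the same result; the merged form trades one fork-join pair for a barrier, which is kept here because no dependence analysis has shown it to be removable.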

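To improve the accuracy of automatic grading of programming questions, and to address the inability of traditional grading methods to measure, in terms of syntactic structure and semantics, the similarity between an incorrect student program and the correct answer, an automatic grading method based on abstract syntax tree matching is proposed. With JavaCC at its core, the method first generates an error list and an abstract syntax tree intermediate representation through lexical, syntax, and semantic analysis, then computes a score by matching syntax tree slices, and finally combines that score with the error list to produce the grading result. The paper describes the design of each module in detail, focuses on the details of abstract syntax tree generation and matching, and designs and implements an automatic grading system for C++ programming questions that combines traditional methods with semantic analysis. Experiments on the results of real examinations verify the practicality and effectiveness of the system.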

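Compiler Optimization of OpenMP Parallel Programs (cited 3 times: 0 self-citations, 3 by others)
The OpenMP standard is widely used in parallel programming for its good portability and ease of use. This paper discusses compiler optimization algorithms for OpenMP parallel programs: during compilation, parallel regions are restructured by merging and expanding them, and barrier synchronization optimization based on a cross-processor dependence graph is implemented within parallel regions. Analysis and verification show that these optimization strategies reduce the number of parallel regions and barrier synchronizations and effectively improve the parallel performance of OpenMP programs.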

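刘金硕 (Liu Jinshuo), 黄朔 (Huang Shuo), 邓娟 (Deng Juan). 《计算机工程》 (Computer Engineering), 2022, 48(12): 16-23
Using high-resolution images as input to an image processing algorithm slows the algorithm down. Parallelizing the algorithm can improve execution efficiency, but manually converting a serial program into a parallel one is tedious, existing automatic parallel translation tools perform inconsistently, and the translated programs use only a single parallel mode. For the patch-based multi-view stereo (PMVS) algorithm, an automatic two-level parallel translation method from C to CUDA is proposed. ANTLR is used to parse the source C code automatically; parallelizable loop structures are identified by analyzing data dependences and privatizing arrays in loops; and the algorithm is translated into code with a two-level parallel structure of CPU multithreading and GPU execution. During execution, the input images are processed on the CPU and the GPU separately, reducing the total execution time. Experimental results show that the speedup of the method rises as the input image resolution increases, reaching a maximum of about 32, a clear improvement over the PPCG and OpenACC automatic parallel translation methods.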

This paper presents the design and implementation of a parallelization framework and OpenMP runtime support in Intel® C++ & Fortran compilers for exploiting nested parallelism in applications using OpenMP pragmas or directives. We conduct the performance evaluation of two multimedia applications parallelized with OpenMP pragmas and compiled with the Intel C++ compiler on Hyper-Threading Technology (HT) enabled multiprocessor systems. The performance results show that the multithreaded code generated by the Intel compiler achieved a speedup of up to 4.69 on 4 processors with HT enabled for five different input video sequences for the H.264 encoder workload, and a 1.28 speedup on an HT-enabled single-CPU system and a 1.99 speedup on an HT-enabled dual-CPU system for the audio–visual speech recognition workload. The performance gain due to exploiting nested parallelism for leveraging Hyper-Threading Technology is up to 70% for two multimedia workloads under different multiprocessor system configurations. These results demonstrate that hyper-threading benefits can be achieved by exploiting nested parallelism through Intel compiler and runtime system support for OpenMP programs.

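This paper introduces the data dependence analysis technique that SUIF uses as the basis for parallelization. To address its shortcoming of not saving the analysis results, the dependence library and the pass skweel are modified, using the pass and annotation mechanisms provided by SUIF, so that the dependence analysis results are extracted and written to the SUIF intermediate file as annotations.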

This paper describes compiler techniques that can translate standard OpenMP applications into code for distributed computer systems. OpenMP has emerged as an important model and language extension for shared-memory parallel programming. However, despite OpenMP's success on these platforms, it is not currently being used on distributed systems. The long-term goal of our project is to quantify the degree to which such a use is possible and develop supporting compiler techniques. Our present compiler techniques translate OpenMP programs into a form suitable for execution on a Software DSM system. We have implemented a compiler that performs this basic translation, and we have studied a number of hand optimizations that improve the baseline performance. Our approach complements related efforts that have proposed language extensions for efficient execution of OpenMP programs on distributed systems. Our results show that, while kernel benchmarks can show high efficiency of OpenMP programs on distributed systems, full applications need careful consideration of shared data access patterns. A naive translation (similar to OpenMP compilers for SMPs) leads to acceptable performance in very few applications only. However, additional optimizations, including access privatization, selective touch, and dynamic scheduling, result in a 31% average improvement on our benchmarks.

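The first problem faced by a parallelizing compiler for distributed-memory systems is how to decompose the data of an application. Because accessing data on a non-local node is expensive, data decomposition must be considered carefully. Although definitions of data decomposition have been proposed, the literature does not give corresponding algorithms. This paper presents an algorithm for generating data decomposition code under a proven and powerful mathematical model, and implements it in the Paraguin compiler within the SUIF (Stanford University Intermediate Format) system.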

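Focusing on the technical characteristics of building a C compiler for embedded systems, this paper uses a design example to describe the main tasks of the front end and back end in a C compiler implementation. It explains the design principles and construction of the directed acyclic graph (DAG), the intermediate description language that bridges the front end and back end, describes how to establish a mapping between the DAG and the target machine, proposes methods and principles for formulating reduction rules in that mapping, and offers some empirical conclusions of practical guidance.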

This paper presents the results of an experiment to measure empirically the remaining opportunities for exploiting loop-level parallelism that are missed by the Stanford SUIF compiler, a state-of-the-art automatic parallelization system targeting shared-memory multiprocessor architectures. For the purposes of this experiment, we have developed a run-time parallelization test called the Extended Lazy Privatizing Doall (ELPD) test, which is able to simultaneously test multiple loops in a loop nest. The ELPD test identifies a specific type of parallelism where each iteration of the loop being tested accesses independent data, possibly by making some of the data private to each processor. For 29 programs in three benchmark suites, the ELPD test was executed at run time for each candidate loop left unparallelized by the SUIF compiler to identify which of these loops could safely execute in parallel for the given program input. The results of this experiment point to two main requirements for improving the effectiveness of parallelizing compiler technology: incorporating control flow tests into analysis and extracting low-cost run-time parallelization tests from analysis results.
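
The idea of checking at run time whether loop iterations access independent data can be sketched with a shadow array. The code below is a deliberately simplified inspector for one loop shape and is not the ELPD test itself: it does not handle privatization or loop nests, and the function name and parameters are hypothetical.

```c
#include <stdlib.h>

/* Decide at run time whether the iterations of
 *     for (i = 0; i < n; i++) a[wr[i]] = a[rd[i]] + 1;
 * touch independent data. m is the number of elements in a, and every
 * index in wr[] and rd[] must lie in [0, m).
 * Returns 1 if the loop may run in parallel for this input, 0 if not
 * (or if the shadow array cannot be allocated). */
int iterations_independent(const int *wr, const int *rd, int n, int m)
{
    if (m <= 0)
        return n == 0;

    /* Shadow array: writer[k] is the iteration that writes a[k], or -1. */
    int *writer = malloc((size_t)m * sizeof *writer);
    if (writer == NULL)
        return 0;
    for (int k = 0; k < m; k++)
        writer[k] = -1;

    int ok = 1;

    /* Pass 1: two iterations writing the same element is a conflict
     * (a privatizing test could sometimes tolerate this; this one does not). */
    for (int i = 0; i < n && ok; i++) {
        if (writer[wr[i]] != -1)
            ok = 0;
        else
            writer[wr[i]] = i;
    }

    /* Pass 2: reading an element written by a different iteration is a
     * cross-iteration dependence. */
    for (int i = 0; i < n && ok; i++) {
        int w = writer[rd[i]];
        if (w != -1 && w != i)
            ok = 0;
    }

    free(writer);
    return ok;
}
```

A loop such as `a[i] = a[i] + 1` passes the check, while a recurrence such as `a[i] = a[i-1] + 1` fails it in the second pass.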

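Reactive systems described in the synchronous language Lustre are typically used in fields such as aerospace and national defense, which place high demands on system correctness and safety. A correctness problem at run time may well crash the system, with very serious consequences. Every lexical or syntax error in the system should be taken seriously and corrected promptly, so compiling Lustre correctly is very important. Traditional Lustre compilers are written in OCaml, which not everyone can easily understand and use, and which costs developers considerable time and effort. To address these problems, a new Lustre compiler is proposed. Its front end is written mainly in C++, and the structure of the generated abstract syntax tree is redefined, simplifying the compilation process. The front end is applied to a classic Lustre model, and analysis of the results verifies its feasibility.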

This paper addresses how to automatically generate code for multimedia extension architectures in the presence of conditionals. We evaluate the costs and benefits of exploiting branches on the aggregate condition codes associated with the fields of a superword (an aggregate object larger than a machine word) such as the branch-on-any instruction of the AltiVec. Branch-on-superword-condition-codes (BOSCC) instructions allow fast detection of aggregate conditions, an optimization opportunity often found in multimedia applications. This paper presents compiler analyses and techniques for generating efficient parallel code using BOSCC instructions. We evaluate our approach, which has been implemented in the SUIF compiler, through a set of experiments with multimedia benchmarks, and compare it with the default approach previously implemented in our compiler. Our experimental results show that using BOSCC instructions can result in better performance for applications where the aggregate condition codes of a superword often evaluate to the same value.
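
The control-flow shape that a branch on aggregate condition codes enables can be emulated in scalar C. This sketch only models the idea (test all lanes of a superword at once and skip the guarded code when no lane needs it); it uses no real SIMD instructions, and `LANES` and the function name are hypothetical.

```c
#define LANES 4  /* hypothetical superword width, in elements */

/* Replace negative elements of x with their absolute value.
 * Returns how many full superwords skipped the guarded code because no
 * lane satisfied the condition. Inputs must not contain INT_MIN. */
int abs_with_boscc(int *x, int n)
{
    int skipped = 0;
    int base;

    for (base = 0; base + LANES <= n; base += LANES) {
        /* Aggregate condition: is x[i] < 0 true in ANY lane? */
        int any = 0;
        for (int l = 0; l < LANES; l++)
            any |= (x[base + l] < 0);

        /* Models "branch on none": bypass the guarded code entirely. */
        if (!any) {
            skipped++;
            continue;
        }

        /* Otherwise execute the guarded code for all lanes, selecting
         * per lane between the new and the old value. */
        for (int l = 0; l < LANES; l++) {
            int v = x[base + l];
            x[base + l] = (v < 0) ? -v : v;
        }
    }

    /* Scalar tail for the leftover elements. */
    for (int i = base; i < n; i++)
        if (x[i] < 0)
            x[i] = -x[i];

    return skipped;
}
```

The benefit depends on how often all lanes agree: when most superwords contain no negative element, most of the guarded work is skipped.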

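The MOD problem asks which information may be modified inside a called procedure when a procedure call is made. For the C language, this paper proposes a MOD analysis algorithm based on the results of flow-sensitive, context-sensitive pointer analysis. By computing the l-values of expressions in the points-to graph, the algorithm obtains all memory locations that may be modified, and from them computes all expressions that may be modified in the called procedure. The algorithm was implemented on the SUIF2 platform, and the expected experimental results were obtained.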

In light of GPUs’ powerful floating-point operation capacity, heterogeneous parallel systems incorporating general purpose CPUs and GPUs have become a highlight in the research field of high performance computing (HPC). However, due to the complexity of programming on GPUs, porting a large number of existing scientific computing applications to the heterogeneous parallel systems remains a big challenge. The OpenMP programming interface is widely adopted on multi-core CPUs in the field of scientific computing. To effectively inherit existing OpenMP applications and reduce the transplant cost, we extend OpenMP with a group of compiler directives, which explicitly divide tasks among the CPU and the GPU, and map time-consuming computing fragments to run on the GPU, thus dramatically simplifying the transplantation. We have designed and implemented MPtoStream, a compiler of the extended OpenMP for AMD’s stream processing GPUs. Our experimental results show that programming with the extended directives deviates from programming with OpenMP by less than 11% modification and achieves significant speedup ranging from 3.1 to 17.3 on a heterogeneous system, incorporating an Intel Xeon E5405 CPU and an AMD FireStream 9250 GPU, over the execution on the Xeon CPU alone.
