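Design and Implementation of the Task Scheduling Protocol in JAPS   Total citations: 1; self-citations: 0; citations by others: 1
1 Background and System Overview. In a parallel compiling system, the main goal is to analyze the dependences in a program. The results of dependence analysis are used to partition the source program into parts that can execute in parallel, which then run in parallel on a given runtime support platform. The basic requirement for parallel execution is that the program's parallel semantics must be consistent with its serial semantics. In parallel computing systems based on a NOW (network of workstations), communication overhead is relatively high, so parallelism generally has to be exploited at the task level. Relative…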
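
The abstract breaks off here, and the JAPS scheduling protocol itself is not described. As a hedged illustration of the task-level partitioning it mentions, the Python sketch below groups the tasks of a dependence graph into levels: tasks in the same level do not depend on one another and may run concurrently, while running the levels in order respects every dependence, so the outcome stays consistent with serial execution. The function name `parallel_levels` and the input format (a dict from each task to the set of tasks it waits for, all of which must themselves be listed in `tasks`) are assumptions, not JAPS interfaces.

```python
from collections import defaultdict


def parallel_levels(tasks, deps):
    """Group tasks into levels; tasks within one level have no mutual dependences.

    tasks: iterable of task ids (strings here, so levels can be sorted)
    deps:  dict mapping a task to the set of tasks it must wait for
    """
    preds = {t: set(deps.get(t, ())) for t in tasks}
    succs = defaultdict(set)
    for t, ps in preds.items():
        for p in ps:
            succs[p].add(t)
    remaining = {t: len(ps) for t, ps in preds.items()}  # unmet predecessors
    level = sorted(t for t, n in remaining.items() if n == 0)
    levels = []
    while level:
        levels.append(level)
        nxt = []
        for t in level:
            for s in succs[t]:
                remaining[s] -= 1
                if remaining[s] == 0:  # all predecessors are in earlier levels
                    nxt.append(s)
        level = sorted(nxt)
    if sum(len(lv) for lv in levels) != len(preds):
        raise ValueError("dependence cycle: no serial-equivalent schedule")
    return levels
```

For example, with `C` waiting for `A` and `B`, `D` waiting for `C`, and `E` waiting for `A`, the levels are `[["A", "B"], ["C", "E"], ["D"]]`.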

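On the Sunway high-performance multi-core server, the OpenMP programs that the automatic parallelizing compiler generates to identify and declare parallelism are not sufficiently optimized: they follow a simple fork-join model and contain many nested parallel loops, so they run inefficiently. To improve the runtime efficiency of these OpenMP programs, a parallel region reconstruction optimization is proposed. Parallel region reconstruction merges parallel regions and extends the scope of parallel regions in nested loops, which reduces the number of parallel regions in an OpenMP program and the control overhead of repeatedly creating and joining thread teams. It thereby converts a simple fork-join OpenMP program into a more efficient single-program multiple-data (SPMD) OpenMP program. Experimental results on the new-generation Sunway high-performance multi-core server SW1621 show that parallel region reconstruction improves runtime efficiency by 10.77% on the NPB3.3-OMP benchmark suite and by 7.94% on the SPEC OMP2012 benchmark suite, effectively improving the execution efficiency of the OpenMP programs produced by the automatic parallelizing compiler.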
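
The fork-join versus SPMD distinction can be sketched as follows. This is a Python analogy under stated assumptions, not the Sunway compiler's transformation: each group of `threading.Thread` objects stands in for an OpenMP thread team. In C, the fork-join form corresponds roughly to a separate `#pragma omp parallel for` per loop, and the reconstructed form to one `#pragma omp parallel` region containing several `#pragma omp for` loops, each ending in an implicit barrier. The helper names and the team size of 4 are hypothetical, and because of CPython's global interpreter lock the sketch shows only the structure (teams created: one per loop versus one in total), not a speedup.

```python
import threading

NUM_THREADS = 4  # assumed team size


def blocks(n, p):
    """Split range(n) into p contiguous blocks (a static schedule)."""
    size, extra = divmod(n, p)
    out, start = [], 0
    for r in range(p):
        end = start + size + (1 if r < extra else 0)
        out.append(range(start, end))
        start = end
    return out


def run_team(body, args_list):
    """Fork one thread per argument, then join them all."""
    team = [threading.Thread(target=body, args=(a,)) for a in args_list]
    for t in team:
        t.start()
    for t in team:
        t.join()


def fork_join(data, stages):
    """Naive form: a new team is forked and joined for every loop."""
    teams = 0
    for stage in stages:
        def body(block, stage=stage):
            for i in block:
                data[i] = stage(data[i])
        run_team(body, blocks(len(data), NUM_THREADS))
        teams += 1
    return teams


def merged_region(data, stages):
    """SPMD form: one team runs every loop, with a barrier between loops."""
    barrier = threading.Barrier(NUM_THREADS)

    def body(block):
        for stage in stages:
            for i in block:
                data[i] = stage(data[i])
            barrier.wait()  # plays the role of the implicit barrier after `omp for`

    run_team(body, blocks(len(data), NUM_THREADS))
    return 1
```

Both versions compute the same result; the merged version simply pays the team creation cost once instead of once per loop.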

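Parallelism analysis is a key analysis technique in parallelizing compilers and a hot research topic in the field. Its purpose is to analyze the dependences in a serial program, extract the parts that can run in parallel, and on that basis transform and partition the serial program. This paper discusses the design framework and implementation of the parallelism analysis module in JAPS, a Java-based automatic parallelizing compiler system.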
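
One classic test that a parallelism analysis module can apply is the GCD test, sketched below in Python for a single loop index; JAPS may use different or additional tests, and the function name is hypothetical. For a write to `X[a*i + b]` and a read of `X[c*j + d]`, a dependence needs an integer solution of `a*i - c*j = d - b`, which exists only if `gcd(a, c)` divides `d - b`.

```python
from math import gcd


def gcd_test_may_depend(a, b, c, d):
    """GCD dependence test for accesses X[a*i + b] and X[c*j + d].

    Returns False when the two accesses can never touch the same element.
    True only means "may depend": the test ignores loop bounds, so it is
    conservative and a later, more precise test may still prove independence.
    """
    g = gcd(a, c)
    if g == 0:  # both subscripts are constants
        return b == d
    return (d - b) % g == 0
```

For instance, `X[2*i]` and `X[2*i + 1]` never overlap (even versus odd indices), so the test reports no dependence, while `X[2*i]` and `X[4*i + 2]` may overlap.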

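Using the Intel Parallel Amplifier performance tool, this work addresses the performance of the fuzzy C-means clustering algorithm on multi-core platforms: it locates the hotspots and concurrency in the serial program and proposes a parallelization design. Based on Intel's parallel library TBB (Threading Building Blocks) and the OpenMP runtime library, the serial program is parallelized for multi-core platforms through loop parallelization and parallel task allocation.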
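
The membership update of fuzzy C-means is a typical hotspot that loop parallelization targets, because each point's memberships depend only on that point and the current, read-only cluster centers. The Python sketch below is an assumption-laden illustration, not the authors' TBB/OpenMP code: it uses the standard update u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)) with fuzzifier `m`, and a `ThreadPoolExecutor` stands in for a parallel loop over points. In CPython the global interpreter lock prevents a real speedup for this pure-Python loop; the point is the per-point independence that a TBB `parallel_for` or an OpenMP `parallel for` would exploit. The function names are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def memberships_for_point(x, centers, m=2.0):
    """Fuzzy C-means membership of one point in each cluster (fuzzifier m > 1)."""
    d = [math.dist(x, c) for c in centers]
    if 0.0 in d:  # the point coincides with a center: crisp membership
        j = d.index(0.0)
        return [1.0 if k == j else 0.0 for k in range(len(centers))]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((d[j] / d[k]) ** p for k in range(len(centers)))
            for j in range(len(centers))]


def memberships_serial(points, centers, m=2.0):
    return [memberships_for_point(x, centers, m) for x in points]


def memberships_parallel(points, centers, m=2.0, workers=4):
    # Rows are independent, so they may be computed in any order;
    # pool.map still returns them in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda x: memberships_for_point(x, centers, m), points))
```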

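This paper describes the design and implementation of the data conversion module and the dynamic processing module in the visual interface of JAPS (Java Automatic Parallelizing System). The data conversion module builds the data structure of the task dependence graph from the compiler's output and displays it intuitively as a hierarchy, while the dynamic processing module acquires and processes dynamic information in real time while the program runs.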

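Traditional MPI automatic parallelizing compilers generate message-passing programs for distributed-memory systems from the perspective of data redistribution, but the extra overhead of the many redistribution communications leads to low speedup. To solve this problem, a message-passing code generation algorithm is proposed for the back end of an Open64-based MPI automatic parallelizing compiler. Centered on a uniform data distribution, and given a set of parallelized loops and a set of communicated arrays, the algorithm modifies the syntax tree of the serial code in WHIRL representation to generate more precise message-passing code. Experimental results show that the algorithm substantially reduces the communication overhead of message-passing programs and markedly improves their speedup.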
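
To make "uniform data distribution" concrete, the Python sketch below assumes a block distribution and the owner-computes rule; the paper's actual algorithm works on WHIRL trees inside Open64 and is not reproduced here. Each rank executes only the iterations it owns, and for a loop such as `a[i] = f(b[i-1], b[i+1])` it must receive the non-local `b` elements listed by `halo_needs`; this is the kind of information from which send/receive calls would be generated. All function names are hypothetical, and no MPI library is called.

```python
def block_bounds(n, nprocs, rank):
    """Half-open [lo, hi) range of a length-n array owned by `rank` (block distribution)."""
    base, extra = divmod(n, nprocs)
    lo = rank * base + min(rank, extra)
    hi = lo + base + (1 if rank < extra else 0)
    return lo, hi


def owner(i, n, nprocs):
    """Rank that owns global index i under the same block distribution."""
    base, extra = divmod(n, nprocs)
    cut = extra * (base + 1)  # indices before `cut` live in the larger blocks
    if i < cut:
        return i // (base + 1)
    return extra + (i - cut) // base


def halo_needs(n, nprocs, rank, offsets):
    """For a loop a[i] = f(b[i + off] for off in offsets) over owned i,
    map each non-local b index to the rank that must send it."""
    lo, hi = block_bounds(n, nprocs, rank)
    needs = {}
    for i in range(lo, hi):
        for off in offsets:
            j = i + off
            if 0 <= j < n and not (lo <= j < hi):
                needs.setdefault(owner(j, n, nprocs), set()).add(j)
    return needs
```

With 10 elements on 3 ranks, rank 1 owns indices 4 to 6 and, for offsets -1 and +1, needs `b[3]` from rank 0 and `b[7]` from rank 2.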

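This paper introduces Agassiz, an extensible automatic parallelizing compiler system, and studies its architecture and key features. The system converts serial programs into parallel programs and provides a good platform for research on compiler optimization; thanks to its object-oriented design and implementation, it can effectively integrate a variety of parallel optimization techniques. Experimental results show that the system is highly extensible.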

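Swarm intelligence systems accomplish swarm-level application tasks through information exchange among neighboring individuals, and offer good robustness and flexibility. At the same time, most developers find it difficult to describe distributed, parallel interaction mechanisms among individuals. Some high-level languages let users program parallel swarm intelligence computing tasks with a serial mindset, from a global system perspective, without considering low-level interaction details such as communication protocols and data distribution. However, the large semantic gap between user-facing, globally declarative swarm intelligence applications and the parallel execution logic of the individuals makes compilation complex, which in turn lowers application development efficiency. This paper proposes a compilation system and supporting tools that translate high-level swarm intelligence applications into safe, efficient distributed implementations. Through parallel information identification, computation partitioning, and interaction information generation, the compiler translates globally oriented, serially programmed swarm intelligence applications into parallel target code that each individual executes independently, so users need not understand the complex interaction mechanisms between individuals. A standardized intermediate representation is designed that converts complex swarm intelligence computing tasks into a sequence of standardized semantic modules composed of swarm intelligence operators and input/output variables; it represents source program information in a platform-independent form and hides the heterogeneity of target hardware platforms. The compiler was deployed and tested on a case-study swarm intelligence platform. The results show that it effectively compiles swarm intelligence applications into target code executable on the platform and improves development efficiency, and that the generated code outperforms existing compilers on a series of benchmarks.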

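Parallel computing is the simultaneous use of multiple computing resources to solve a computational problem, which saves considerable computing time and greatly improves efficiency. Many serial programs in various fields are already mature, so finding a way to transform existing serial programs into parallel ones is a promising way to speed them up. To parallelize serial programs, improve their efficiency, and make full use of the many mature serial programs, this paper starts from graph theory and builds and discusses three mathematical models for parallelizing serial algorithms: a weighted directed graph model, a set partition model, and a labeled AVL tree model. Using these models and graph-theoretic methods, the paper discusses the feasibility of parallelizing serial algorithms and proposes an algorithmic approach to parallelizing serial programs.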
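
One standard quantity that a weighted directed graph model yields is the critical path: the heaviest dependence chain bounds the parallel running time from below, so total work divided by critical-path length bounds the achievable speedup (assuming unlimited processors and no communication cost). The Python sketch below computes it; it illustrates the model type only and is not the paper's algorithm. The function name and input format are assumptions, every predecessor must appear in `weights`, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter


def critical_path(weights, deps):
    """Longest weighted path through a task DAG.

    weights: dict task -> execution time
    deps:    dict task -> iterable of predecessor tasks
    Returns (length, path).
    """
    finish, best_pred = {}, {}
    graph = {t: deps.get(t, ()) for t in weights}
    for t in TopologicalSorter(graph).static_order():
        preds = list(deps.get(t, ()))
        start = max((finish[p] for p in preds), default=0)
        best_pred[t] = max(preds, key=lambda p: finish[p]) if preds else None
        finish[t] = start + weights[t]
    node = max(finish, key=finish.get)  # task that finishes last
    path = []
    while node is not None:
        path.append(node)
        node = best_pred[node]
    return finish[path[0]], path[::-1]
```

For weights `A=2, B=3, C=4, D=1` with `C` after `A` and `B`, and `D` after `C`, the critical path is `B, C, D` of length 8, so the total work of 10 allows a speedup of at most 10 / 8 = 1.25.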

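AceMesh is a dataflow-based task-parallel programming language. Starting from a serial program, the programmer adds directives for parallel regions and parallel loops together with data-access information for task regions, and the AceMesh compiler automatically converts the program into one that executes in parallel as an asynchronous task graph. This paper analyzes common parallelization errors made when rewriting programs for AceMesh, introduces the structure of the error-checking tool AceMeshCheck, and describes an efficient method for collecting and storing memory-access traces as well as a three-dimensional compression algorithm for deriving logical shapes. Experiments show that AceMeshCheck not only detects typical errors in directive-annotated programs but also incurs low overhead.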
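
How declared data-access information turns into a task graph can be sketched in Python: in program order, a later task depends on an earlier one whenever their accesses conflict (read-after-write, write-after-read, or write-after-write). This is the generic dataflow rule, not AceMesh's implementation, and the function name and input format are hypothetical. It also shows why incorrect annotations are dangerous: if a task omits an array it actually reads, the corresponding RAW edge disappears and the two tasks may race, which is the kind of error a checker such as AceMeshCheck has to find.

```python
def build_task_graph(tasks):
    """Derive dependence edges from declared data accesses, in program order.

    tasks: list of (name, reads, writes), with reads/writes as sets of data names.
    Returns a set of (earlier, later, kind) with kind in {"RAW", "WAR", "WAW"}.
    No transitive reduction is performed.
    """
    edges = set()
    for j, (tj, rj, wj) in enumerate(tasks):
        for ti, ri, wi in tasks[:j]:
            if wi & rj:
                edges.add((ti, tj, "RAW"))  # later task reads what earlier wrote
            if ri & wj:
                edges.add((ti, tj, "WAR"))  # later task overwrites what earlier read
            if wi & wj:
                edges.add((ti, tj, "WAW"))  # both write the same data
    return edges
```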

For parallel computer systems, a new formulation of the problem of constructing parallel asynchronous abstract programs of a desired length was proposed. The conditions of the planning problem were represented as a system of Boolean equations (constraints) whose solutions define the feasible plans for activating the program modules specified in the planner's knowledge base. Constraints on the number of processors and on the time delays arising during execution of the program modules were taken into account.

Kasi Anantha, Fred Long. Software, 1990, 20(6): 537-554
There are two principal methods used to exploit the parallelism available on a parallel machine: the program to be executed can be optimized by hand, or the program can be automatically converted to parallel machine code by a compiler. The first method usually derives parallelism at the procedure level; a parallel program is written in a high-level language and typically has various modules executing in parallel. By contrast, the compiler methodically transforms the program into parallel code using various transformations, such as code movement. The automatic conversion of a program to parallel code is called compaction or parallelization. This paper describes the evolution of a new compaction program and presents a new algorithm for determining legal code movements. A simulator of the target architecture was used to estimate the execution times of a sample suite of programs before and after compaction. The results verify that substantial advantages arise from applying this compaction technique.

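In graphical programming based on module composition, many modules do not depend on one another and can therefore execute in parallel. The modules and the lines connecting them form a directed acyclic graph. The translator traverses this graph with a topological sort algorithm and generates one thread for each module and one semaphore for each input line, which synchronizes the execution order of dependent modules. The result is a multithreaded program that can execute in parallel, so parallelism is exploited automatically and the generated program runs more efficiently.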
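
The translation scheme described above (topological traversal, one thread per module, one semaphore per input line) can be sketched in Python as follows. This is an illustrative runtime model, not the paper's translator: module behavior is modeled as callables that receive their upstream results, and the names `run_dataflow`, `modules`, and `lines` are hypothetical. It assumes at most one line per (source, destination) pair and that every line endpoint is a module; a cycle is rejected by `graphlib` before any thread starts, since it would otherwise deadlock on the semaphores (Python 3.9 or later).

```python
import threading
from graphlib import TopologicalSorter


def run_dataflow(modules, lines):
    """Run a module graph with one thread per module and one semaphore per line.

    modules: dict mapping a module name to a callable that takes a dict of
             {upstream module name: upstream result} and returns a result
    lines:   list of (src, dst) pairs, each a connection between two modules
    """
    sems = {line: threading.Semaphore(0) for line in lines}
    results = {}
    preds = {m: [src for src, dst in lines if dst == m] for m in modules}
    # Threads are created in topological order; raises graphlib.CycleError on a cycle.
    order = list(TopologicalSorter(preds).static_order())

    def body(name):
        ins = [line for line in lines if line[1] == name]
        outs = [line for line in lines if line[0] == name]
        for line in ins:
            sems[line].acquire()  # block until the upstream module has finished
        results[name] = modules[name]({src: results[src] for src, _ in ins})
        for line in outs:
            sems[line].release()  # let the downstream module proceed

    threads = [threading.Thread(target=body, args=(m,)) for m in order]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Modules `a` and `b` have no input lines and may run concurrently; `sum` waits on both of its semaphores, and `sq` waits on the line from `sum`.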

We describe a system that allows programmers to take advantage of both control and data parallelism through multiple intercommunicating data-parallel modules. This programming environment extends C-style stream I/O to include intermodule communication channels. The programmer writes each module as a separate data-parallel program, then develops a channel linker specification describing how to connect the modules together. A channel linker we have developed loads the separate modules onto the parallel machine and binds the communication channels together as specified. We present performance data demonstrating that a mixed control- and data-parallel solution can yield better performance than a strictly data-parallel solution. The system described currently runs on the Intel iWarp multicomputer.

We present a new data-driven paradigm for solving mapping problems on parallel computers. This paradigm targets mapping data modules, instead of task modules, onto multiple processing cores. By dependency analysis of data modules, we devise a data movement matrix that reduces the need to manipulate task program modules at the expense of handling data modules. To visualize and quantify this complex maneuver, we adopt the parallel activities trace graphs introduced earlier. To demonstrate the procedure and algorithmic value of our paradigm, we test it on the Strassen matrix multiplication and Cholesky matrix inversion algorithms. Mapping tasks has been studied more widely, while mapping data is a new approach that appears to be more efficient for the data-intensive applications that are becoming prevalent on today's parallel computers with millions of cores.

A novel distributed control ideology and technology is described for the management of advanced crisis relief missions. The approach is based on the installation of a universal "social" module in massively wearable electronic devices, such as laptops and mobile phones, which can collectively interpret a spatial scenario language, exchange high-level program code (waves) and data, and control other modules in parallel. This can dynamically integrate any scattered post-disaster human and technical resources into an operable distributed system capable of solving autonomously complex survivability, relief, and reconstruction problems. This work was presented in part at the 11th International Symposium on Artificial Life and Robotics, Oita, Japan, January 23–25, 2006.

Matching an application to an architecture in structure and size is a way of achieving higher computation speed. This paper presents a combination of a compiler and a reconfigurable long instruction word (RLIW) architecture as an approach to the matching problem. Configurations suitable for the execution of different parts of a program are determined by a compiler, and code is generated both for reconfiguring the hardware and for performing the computation. The RLIW machine, consisting of multiple processing and global data memory modules, effectively utilizes the fine-grained parallelism detected in programs by a compiler. The long word instructions control the operation of processing and memory modules in the system. To reduce the data transfer between processing modules and data memory modules, we provide reconfigurable interconnections among the processing modules which permit direct communication. The compiler uses new techniques, including region scheduling, generation of code for reconfiguring the system, and memory allocation techniques, to achieve improved performance. Algorithms for packing operations into long word instructions and techniques for effectively assigning memory modules to the operands required by an instruction are developed. Experimental results indicate that speedups of 60–300% can be obtained for both scientific and nonscientific programs. The reconfigurable architecture is responsible for much of the speedup. The results also indicate that the major problem of the memory bottleneck faced in designing parallel systems is successfully attacked. This paper represents work done while the author was at the University of Pittsburgh.
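
Packing operations into long word instructions can be illustrated with a simple greedy list scheduler in Python. This is a baseline under stated assumptions (unit latency, a fixed number of slots per word, operations considered in original program order), not the paper's region scheduling algorithm, and it ignores reconfiguration and memory module assignment. The function name `pack_liw` and its input format are hypothetical.

```python
def pack_liw(ops, deps, width):
    """Greedily pack operations into long instruction words.

    ops:   list of operation names in original program order
    deps:  dict op -> set of ops whose results it needs
    width: number of operation slots per long word
    Assumes unit latency: an op may issue one word after all its producers.
    """
    done = set()
    words = []
    pending = list(ops)
    while pending:
        # Ready ops: every producer issued in an earlier word; fill up to `width` slots.
        word = [op for op in pending if deps.get(op, set()) <= done][:width]
        if not word:
            raise ValueError("cyclic or unsatisfiable dependences")
        words.append(word)
        done.update(word)
        pending = [op for op in pending if op not in word]
    return words
```

With operations `a` to `e`, where `c` needs `a` and `b`, `d` needs `a`, and `e` needs `c` and `d`, a width of 2 gives three words, `[a, b]`, `[c, d]`, `[e]`, instead of five sequential ones.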

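Application of Parallel Test Technology in Automatic Test Systems   Total citations: 22; self-citations: 6; citations by others: 16
Parallel testing offers strong advantages in reducing test time and lowering test cost, and is becoming a research hotspot. The paper first analyzes the basic concepts of parallel testing in detail, introduces four architectures, in two categories, that can currently be used to implement parallel testing, and compares their respective advantages and disadvantages. Then, taking a multithreaded parallel test program as an example, it describes three models of parallel test programs: synchronous, asynchronous, and single-threaded. Finally, it focuses on several important issues that deserve attention when implementing multithreaded parallel testing.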
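
A minimal Python sketch of the multithreaded model, assuming one worker thread per unit under test (UUT) and a single instrument shared by all threads, is shown below. Serializing access to the shared instrument with a lock is given as an example of an issue such implementations must handle; the abstract does not list which issues the paper discusses. The instrument class, its fake readings, and the pass/fail limits are all hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SharedInstrument:
    """Hypothetical instrument shared by all test threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self.log = []

    def measure(self, uut, channel):
        # Only one thread may drive the instrument at a time.
        with self._lock:
            self.log.append((uut, channel))
            return uut * 10 + channel  # fake reading for illustration


def check_uut(uut, instrument, channels, limits):
    """Measure every channel of one UUT and report pass/fail against limits."""
    lo, hi = limits
    readings = [instrument.measure(uut, ch) for ch in channels]
    return uut, all(lo <= r <= hi for r in readings)


def run_parallel_tests(uuts, channels, limits, max_threads=4):
    """Test every UUT in its own worker thread; returns (results, instrument log)."""
    instrument = SharedInstrument()
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        results = dict(pool.map(
            lambda u: check_uut(u, instrument, channels, limits), uuts))
    return results, instrument.log
```

The order of entries in the instrument log varies between runs, but every (UUT, channel) measurement appears exactly once and each result is attributed to the right UUT.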

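A distributed-control parallel scheme is proposed to implement a control system for operating multiple inverter power supplies in parallel, and the voltage and current characteristics during parallel operation are analyzed. Test results show that the modules share current well, the control strategy is feasible, and satisfactory parallel-operation control is achieved.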

