首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 109 毫秒
1.
大量遗留的串行代码需要进行并行化改造,而并行程序复杂性及并行计算平台多样性导致改造成本较高.为此,设计了一种基于标记语言的三层并行编程框架,完成了从串行程序层到并行中间代码层、并行中间代码层到目标并行编程语言程序层的二个转换阶段.采用对串行代码进行语言标记的方法来实现并行中间代码层,该代码层实际是共享存储、分布式存储并行平台编程语言的一种抽象.该框架还实现了一种性能标记方法,可用于并行参数自动寻优.用于雷达数据处理的实验结果表明,实现了对应并行代码的生成,且并行加速比与人工实现的并行代码相当.  相似文献   

2.
GPS L5信号是GPS现代化中一个新的民用信号。基于此,从Galileo信号及其捕获算法的简单分析引入GPS L5信号并行码相位搜索的研究,并对GPS L5信号的捕获进行仿真分析。通过研究表明,在GPS L5并行码信号捕获中,双信道并行码相位搜索算法的捕获能力最强但计算量最大。pilot channel并行码相位搜索算法的计算量与data channel并行码相位搜索算法的计算量相同,但捕获能力比双信道并行码相位搜索算法强。  相似文献   

3.
The extended full-potential (FPX) helicopter rotor computational fluid dynamics (CFD) code of Fortran in its reduced two-dimensional version is successfully converted into a parallel version for multiprocessing. The FPX code with an internal grid generator solves the compressible full-potential equation using an approximately factored finite-difference scheme with added numerous physical modeling enhancements, including viscous boundary layers, shock-induced entropy corrections and wake-vortex embedding. The parallel version of the code uses open multi-processing (OpenMP) directives as parallel programming tool in shared-memory (SM) environment. The OpenMP code is portable and scalable, which can run on various computer platforms including UNIX platforms and Windows NT platforms. The performance study of the parallel code on SGI Origin 2000 UNIX platform is made. The results show that reasonable speedups through parallelization are obtained and that OpenMP is easy to use and an efficient parallel programming tool for the present problem.  相似文献   

4.
This paper presents parallel computational strategies to implement explicit nonlinear finite element analysis code onto distributed memory parallel computers for solving large-scale problems in structural dynamics. Implementation details on both homogeneous and heterogeneous parallel processing environments are considered in detail in this paper. Implementation of an explicit nonlinear finite element dynamic analysis code on homogeneous systems is discussed first and this is later moved onto heterogeneous systems. Domain decomposition with explicit message passing is preferred for parallel implementation. The message passing implementation in the parallel algorithm is based on MPI (Message Passing Interface) libraries. Implementation aspects of overlapped, non-overlapped domain decomposition techniques, Dynamic Task Allocation (DTA) and clustering techniques for DTA and their relative merits are presented. The interprocessor communications are optimised by overlapping with computations to improve the performance of the domain decomposition based explicit dynamic analysis finite element code.The issues related to implementation of finite element code for nonlinear dynamic analysis on heterogeneous parallel computing environment are later presented. A new dynamic load-balancing algorithm is developed for this purpose and it is integrated with the domain decomposition based parallel explicit finite element code to test our algorithms on a coarse grain heterogeneous cluster of workstations. Numerical experiments have been carried out on PARAM-10000, an Indian parallel computer and also on cluster of Unix workstations.  相似文献   

5.
Parallelism has become one of the most extended paradigms used to improve performance. However, it forces software developers to adapt applications and coding mechanisms to exploit the available computing devices. Legacy source code needs to be re-written to take advantage of multi- core and many-core computing devices. Writing parallel applications in a traditional way is hard, expensive, and time consuming. Furthermore, there is often more than one possible transformation or optimization that can be applied to a single piece of legacy code. Therefore many parallel versions of the same original sequential code need to be considered. In this paper, we describe an automatic parallel source code generation workflow (REWORK) for parallel heterogeneous platforms. REWORK automatically identifies promising kernels on legacy C++ source code and generates multiple specific versions of kernels for improving C++ applications, selecting the most adequate version based on both static source code and target platform characteristics.  相似文献   

6.
The arrival of multicore systems, along with the speed‐up potential available in graphics processing units, has given us unprecedented low‐cost computing power. These systems address some of the known architecture problems but at the expense of considerably increased programming complexity. Heterogeneity, at both the architectural and programming levels, poses a great challenge to programmers. Many proposals have been put forth to facilitate the job of programmers. Leaving aside proposals based on the development of new programming languages because of the effort this represents for the user (effort to learn and reuse code), the remaining proposals are based on transforming sequential code into parallel code, or on transforming parallel code designed for one architecture into parallel code designed for another. A different approach relies on the use of skeletons. The programmer has available set of parallel standards that comprise the basis for developing parallel code while programming sequential code. In this context, we propose a methodology for developing an automatic source‐to‐source transformation in a specific domain. This methodology is instantiated in a framework aimed at solving dynamic programming problems. Using this framework, the final user (a physician, mathematician, biologist, etc.) can express her problem using an equation in Latex, and the system will automatically generate the optimal parallel code for homogeneous or heterogeneous architectures. This approach allows for great portability toward these new emerging architectures and for great productivity, as evidenced by the computational results.Copyright © 2012 John Wiley & Sons, Ltd.  相似文献   

7.
Kasi Anantha  Fred Long 《Software》1990,20(6):537-554
There are two principal methods used to exploit the parallelism available on a parallel machine: the program to be executed can be optimized by hand, or the program can be automatically converted to parallel machine code by a compiler. The first method usually derives parallelism at the procedure level; a parallel program is written in a high-level language and typically has various modules executing in parallel. By contrast, the compiler methodically transforms the program into parallel code using various transformations, such as code movement. The automatic conversion of a program to parallel code is called compaction or parallelization. This paper describes the evolution of a new compaction program and presents a new algorithm for determining legal code movements. A simulator of the target architecture was used to estimate the execution times of a sample suite of programs before and after compaction. The results verify that substantial advantages arise from applying this compaction technique.  相似文献   

8.
刘有耀  杨鹏程 《计算机应用》2016,36(9):2422-2426
针对当前大量遗产代码无法重复利用的问题,设计一种新的编译工具将C的串行代码转换为基于MPI+OpenMP的混合并行编程代码,降低了并行编程的开发成本。首先,通过对JavaCC的优化,实现一种可以解析C语言的词法和语法分析器,进行源代码分析并生成抽象语法树;其次,根据语法树对源代码进行控制依赖性和数据依赖性分析,产生可并行化的语句块分区;再次,按照提出的并行代码生成方法得到目标代码;最后,基于Visual Studio 2010构建目标代码仿真验证环境。实验结果表明,该工具可以较为理想地实现串行代码自动并行化,与手工编写的代码在加速比上的误差为8.2%~18.4%。  相似文献   

9.
在生物自发光成像领域,将基于蒙特卡罗方法的光子前向传输仿真进行并行化,提高了仿真的速度。首先介绍了所采用的一系列并行机制和串行加速算法,然后分别对并行仿真结果进行正确性验证和性能验证,并与软件MOSE、triMC3D的结果进行了对比,最后对该并行平台进行了总结和展望。  相似文献   

10.

In this paper, we present several important details in the process of legacy code parallelization, mostly related to the problem of maintaining numerical output of a legacy code while obtaining a balanced workload for parallel processing. Since we maintained the non-uniform mesh imposed by the original finite element code, we have to develop a specially designed data distribution among processors so that data restrictions are met in the finite element method. In particular, we introduce a data distribution method that is initially used in shared memory parallel processing and obtain better performance than the previous parallel program version. Besides, this method can be extended to other parallel platforms such as distributed memory parallel computers. We present results including several problems related to performance profiling on different (development and production) parallel platforms. The use of new and old parallel computing architectures leads to different behavior of the same code, which in all cases provides better performance in multiprocessor hardware.

  相似文献   

11.
刘勇  陆鑫达 《计算机工程》2005,31(4):71-73,84
提出的用户指导的并行化策略,提供给程序员一个图形化的交互界面,首先由程序员选择并行算法,然后通过配置向导对所选择的并行算法进行定制,生成配置文件,然后参照配置文件自动生成并行程序的框架代码,并将程序员提供的元任务的串行代码嵌入到框架代码中,最后生成并行程序,这样程序员就可以较方便高效地编写并行程序了。  相似文献   

12.
程海英  张武 《计算机工程与设计》2004,25(11):1961-1963,2011
根据解反应扩散方程的自适应样条小波-交替方向(SW-ADI)方法,使用MPI、OpenMP两种并行编程模式,对串行程序进行了直接并行化,并在上海大学的高性能计算机自强2000上分别用MPI和OpenMP实现了对方程的求解。对运算结果进行了分析并给出了与串行程序相比较的并行加速比。  相似文献   

13.
并行程序开发平台的可视化实现*   总被引:3,自引:0,他引:3  
并行程序可视化平台的实现有利于网络并行计算的发展,基于WPVM 3.4平台,构建并实现了一个网络并行可视化平台,它由任务描述器、通信代码生成器、代码插入器等主要模块组成。主要讲述了该平台的可视化实现部分,阐述了如何将用户前台的设置按照规则转变为后台的PVM原语代码自动插入,帮助并行程序开发人员从复杂的并行通信的烦琐性和并行系统的底层运作中解放出来。  相似文献   

14.
We describe an attempt to improve the performance of a sequential Fortran code on a shared memory multiprocessor. The only parallelism employed is that provided by a set of parallel BLAS. The code is modified to make maximum use of the BLAS and, in particular, to use an appropriate LAPACK routine. Comparisons are made with automatically generated parallel code.  相似文献   

15.
在某个共享存储式对称多处理(SMP)并行计算机上实现了应用程序(二维弹塑性流体动力学程序)的并行化。该并行计算机系统仅支持对FORTRAN DO循环结构的并行化。文章结合并行机的高性能特征,组织了该程序主体模块的并行化计算,同时给出解决Cache问题的一个实用并行优化技术。数据结果表明:有比较好的加速比。  相似文献   

16.
We survey genetic improvement (GI) of general purpose computing on graphics cards. We summarise several experiments which demonstrate four themes. Experiments with the gzip program show that genetic programming can automatically port sequential C code to parallel code. Experiments with the StereoCamera program show that GI can upgrade legacy parallel code for new hardware and software. Experiments with NiftyReg and BarraCUDA show that GI can make substantial improvements to current parallel CUDA applications. Finally, experiments with the pknotsRG program show that with semi-automated approaches, enormous speed ups can sometimes be had by growing and grafting new code with genetic programming in combination with human input.  相似文献   

17.
We report on practical experience using the Oxford BSP Library to parallelize a large electromagnetic code, the British Aerospace finite-difference time-domain code EMMA T:FD3D. The Oxford BS Library is one of the first realizations of the Bulk Synchronous Parallel computational model to be targeted at numerically intensive scientific (typically Fortran) computing. The BAe EMMA code is one of the first large-scale applications to be parallelized using this library, and it is an important demonstration of the cost effectiveness of the BSP approach. We illustrate how BSP cost-modelling techniques can be used to predict and optimize performance for single-source programs across different parallel platforms. We provide predicted and observed performance figures for an industrial-strength, single-source parallel code for a variety of real parallel architectures: shared memory multiprocessors, workstation clusters and massively parallel platforms.  相似文献   

18.
多路同步PCM遥测数据采集编码系统的设计和实现   总被引:2,自引:1,他引:1  
孙发鱼  纪立红  曾蕾 《测控技术》2001,20(2):22-23,26
多路同步数据采集编码系统可以较好地实现A/D采集的同步性和实时性,本文介绍了适用于PCM体制遥测的同步数据采集编码系统的实现方法,合理地解决了模拟量和数字量的同步采集问题。该系统可广泛应用于无线电遥测、远程数据采集等诸多数据测试系统中。  相似文献   

19.
并行HDL模拟是加速大型复杂的VLSI系统模拟验证的有效方法,支持并行模拟的HDL编译技术是其中的关键技术,文章提出了一种支持并行模拟的Verilog编译技术,编译器将Verilog描述转换成C++代码,最后与并行模拟核心库编译链接生成可执行并行程序。文章将编译器构成,代码生成方法和并行模拟核心库,该技术已经在并行Verilog模拟器ParaVer上实现。  相似文献   

20.
傅立国  姚远  丁锐 《计算机应用》2014,34(4):1014-1018
不规则计算在大规模并行应用中广泛存在。在面向分布存储结构的自动并行化过程中,较难在编译时为不规则循环生成并行代码。并行代码中的通信代码对程序运行结果的正确性以及加速效果有着严重的影响。通过分析程序的数组重分布图,使用部分冗余的通信方式来维持不规则数组访问的生产者消费者关系,可以在编译时为一类常见的不规则循环自动生成有效的通信代码。该方法使用计算分解和数组引用的访问表达式求解不规则数组在各处理器的本地定义集作为通信的数据集,分析针对此类不规则循环划分的通信策略,继而生成相应的通信代码。实验测试的结果取得了预期的加速效果,验证了方法的有效性。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号