1.

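唐佩佳 徐云 钟旭阳 《计算机系统应用》2020,29(10):82-88

A large body of legacy serial code needs to be parallelized, but the complexity of parallel programs and the diversity of parallel computing platforms make the conversion costly. To address this, a three-layer parallel programming framework based on a markup language is designed. It performs two conversion stages: from the serial program layer to a parallel intermediate code layer, and from the parallel intermediate code layer to a program layer in the target parallel programming language. The parallel intermediate code layer is produced by annotating the serial code with language markers, and is in effect an abstraction over the programming languages of shared-memory and distributed-memory parallel platforms. The framework also implements a performance-marking method that can be used to tune parallel parameters automatically. Experiments on radar data processing show that the framework generates the corresponding parallel code and that the parallel speedup is comparable to that of hand-written parallel code.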
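The annotation step described in this abstract can be pictured with a toy translator. The sketch below is not the authors' framework: the `//@parallel_for` marker is a hypothetical syntax invented here for illustration, and the only back end shown rewrites it into the standard OpenMP `#pragma omp parallel for` directive for the shared-memory case.

```python
# Toy marker-to-directive translator.
# Assumption: the marker syntax "//@parallel_for" is hypothetical and is
# not taken from the paper; only the OpenMP directive is standard.
MARKER = "//@parallel_for"


def to_openmp(serial_source: str) -> str:
    """Replace each marker line with an OpenMP worksharing directive,
    keeping the marker's indentation and leaving other lines untouched."""
    out = []
    for line in serial_source.splitlines():
        if line.strip() == MARKER:
            indent = line[: len(line) - len(line.lstrip())]
            out.append(indent + "#pragma omp parallel for")
        else:
            out.append(line)
    return "\n".join(out)


# A serial C function annotated with the hypothetical marker.
annotated = """void scale(double *x, int n) {
    //@parallel_for
    for (int i = 0; i < n; i++)
        x[i] *= 2.0;
}"""

print(to_openmp(annotated))
```

A second back end for distributed memory would need far more than a line substitution (data distribution and message passing), which is presumably why the paper introduces an intermediate layer rather than translating markers directly.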

2.

Simulation Analysis of Parallel Code Phase Acquisition Algorithms for GPS L5

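徐贵州 张雷 胡以华 《计算机工程》2011,37(4):290-292

The GPS L5 signal is a new civil signal introduced by GPS modernization. Starting from a brief analysis of the Galileo signal and its acquisition algorithms, this paper studies parallel code phase search for the GPS L5 signal and analyzes GPS L5 acquisition by simulation. The study shows that, in GPS L5 parallel code acquisition, the dual-channel parallel code phase search algorithm has the strongest acquisition capability but the largest computational load. The pilot-channel parallel code phase search algorithm has the same computational load as the data-channel parallel code phase search algorithm, but its acquisition capability is stronger than that of the dual-channel parallel code phase search algorithm.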
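Parallel code phase search evaluates every code phase at once by computing a circular correlation in the frequency domain. The sketch below is a noise-free, single-channel illustration under several assumptions of my own: a Barker-13 sequence stands in for the much longer L5 ranging code, a naive DFT replaces the FFT a receiver would use, and the Doppler search is omitted. It is not the simulation from the paper.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (a real receiver would use an FFT)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def code_phase_search(received, local_code):
    """Correlate against all circular code phases in one pass.

    Returns the phase with the largest correlation magnitude and the
    list of magnitudes for every phase.
    """
    spectrum_rx = dft(received)
    spectrum_code = dft(local_code)
    # Multiplying by the conjugate spectrum gives circular cross-correlation.
    product = [r * c.conjugate() for r, c in zip(spectrum_rx, spectrum_code)]
    corr = dft(product, inverse=True)
    mags = [abs(v) for v in corr]
    best = max(range(len(mags)), key=mags.__getitem__)
    return best, mags


# Assumption: Barker-13 is only a short stand-in for a ranging code.
code = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]
delay = 5  # hypothetical code delay in chips
received = [code[(n - delay) % len(code)] for n in range(len(code))]

phase, mags = code_phase_search(received, code)
print("estimated code phase:", phase)
```

A dual-channel variant would run this correlation on both the data and pilot components and combine the two magnitude lists, which is consistent with the abstract's remark that the dual-channel search costs the most computation.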

3.

A parallel CFD rotor code using OpenMP

《Advances in Engineering Software》2001,32(8):665-671

The extended full-potential (FPX) helicopter rotor computational fluid dynamics (CFD) code, written in Fortran, is successfully converted in its reduced two-dimensional version into a parallel version for multiprocessing. The FPX code, which has an internal grid generator, solves the compressible full-potential equation using an approximately factored finite-difference scheme with numerous added physical modeling enhancements, including viscous boundary layers, shock-induced entropy corrections and wake-vortex embedding. The parallel version of the code uses open multi-processing (OpenMP) directives as the parallel programming tool in a shared-memory (SM) environment. The OpenMP code is portable and scalable, and can run on various computer platforms, including UNIX and Windows NT platforms. A performance study of the parallel code on the SGI Origin 2000 UNIX platform is presented. The results show that reasonable speedups are obtained through parallelization and that OpenMP is an easy-to-use and efficient parallel programming tool for the present problem.

4.

Explicit nonlinear dynamic finite element analysis on homogeneous/heterogeneous parallel computing environment

《Advances in Engineering Software》2006,37(11):701-720

This paper presents parallel computational strategies for implementing explicit nonlinear finite element analysis code on distributed-memory parallel computers to solve large-scale problems in structural dynamics. Implementation on both homogeneous and heterogeneous parallel processing environments is considered in detail. Implementation of an explicit nonlinear finite element dynamic analysis code on homogeneous systems is discussed first and is later extended to heterogeneous systems. Domain decomposition with explicit message passing is preferred for parallel implementation. The message passing implementation in the parallel algorithm is based on MPI (Message Passing Interface) libraries. Implementation aspects of overlapped and non-overlapped domain decomposition techniques, Dynamic Task Allocation (DTA) and clustering techniques for DTA, and their relative merits are presented. The interprocessor communications are optimised by overlapping them with computations to improve the performance of the domain-decomposition-based explicit dynamic analysis finite element code. The issues related to implementing finite element code for nonlinear dynamic analysis on a heterogeneous parallel computing environment are then presented. A new dynamic load-balancing algorithm is developed for this purpose and is integrated with the domain-decomposition-based parallel explicit finite element code to test the algorithms on a coarse-grain heterogeneous cluster of workstations. Numerical experiments have been carried out on PARAM-10000, an Indian parallel computer, and on a cluster of Unix workstations.

5.

Automatic CPU/GPU Generation of Multi-versioned OpenCL Kernels for C++ Scientific Applications

Rafael Sotomayor Luis Miguel Sanchez Javier Garcia Blas Javier Fernandez J. Daniel Garcia 《International journal of parallel programming》2017,45(2):262-282

Parallelism has become one of the most widespread paradigms for improving performance. However, it forces software developers to adapt applications and coding mechanisms to exploit the available computing devices. Legacy source code needs to be rewritten to take advantage of multi-core and many-core computing devices. Writing parallel applications in a traditional way is hard, expensive, and time-consuming. Furthermore, there is often more than one possible transformation or optimization that can be applied to a single piece of legacy code, so many parallel versions of the same original sequential code need to be considered. In this paper, we describe an automatic parallel source code generation workflow (REWORK) for parallel heterogeneous platforms. REWORK automatically identifies promising kernels in legacy C++ source code and generates multiple specific versions of kernels for improving C++ applications, selecting the most adequate version based on both static source code and target platform characteristics.

6.

High‐level specifications for automatically generating parallel code

Alejandro Acosta Francisco Almeida Ignacio Peláez 《Concurrency and Computation》2013,25(7):989-1012

The arrival of multicore systems, along with the speed-up potential available in graphics processing units, has given us unprecedented low-cost computing power. These systems address some of the known architecture problems, but at the expense of considerably increased programming complexity. Heterogeneity, at both the architectural and programming levels, poses a great challenge to programmers. Many proposals have been put forth to facilitate the job of programmers. Leaving aside proposals based on the development of new programming languages, because of the effort this represents for the user (effort to learn and to reuse code), the remaining proposals are based on transforming sequential code into parallel code, or on transforming parallel code designed for one architecture into parallel code designed for another. A different approach relies on the use of skeletons: the programmer has available a set of parallel standards that form the basis for developing parallel code while programming sequential code. In this context, we propose a methodology for developing an automatic source-to-source transformation in a specific domain. This methodology is instantiated in a framework aimed at solving dynamic programming problems. Using this framework, the final user (a physician, mathematician, biologist, etc.) can express her problem using an equation in LaTeX, and the system will automatically generate the optimal parallel code for homogeneous or heterogeneous architectures. This approach allows for great portability toward these new emerging architectures and for great productivity, as evidenced by the computational results. Copyright © 2012 John Wiley & Sons, Ltd.

7.

Code compaction for parallel architectures

Kasi Anantha Fred Long 《Software》1990,20(6):537-554

There are two principal methods used to exploit the parallelism available on a parallel machine: the program to be executed can be optimized by hand, or the program can be automatically converted to parallel machine code by a compiler. The first method usually derives parallelism at the procedure level; a parallel program is written in a high-level language and typically has various modules executing in parallel. By contrast, the compiler methodically transforms the program into parallel code using various transformations, such as code movement. The automatic conversion of a program to parallel code is called compaction or parallelization. This paper describes the evolution of a new compaction program and presents a new algorithm for determining legal code movements. A simulator of the target architecture was used to estimate the execution times of a sample suite of programs before and after compaction. The results verify that substantial advantages arise from applying this compaction technique.

8.

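Design and Implementation of Automatic Parallelization of C Code Based on JavaCC

刘有耀 杨鹏程 《计算机应用》2016,36(9):2422-2426

To address the problem that a large amount of legacy code cannot be reused, a new compilation tool is designed that converts serial C code into hybrid parallel code based on MPI+OpenMP, reducing the development cost of parallel programming. First, by optimizing JavaCC, a lexical and syntax analyzer capable of parsing C is implemented; it analyzes the source code and generates an abstract syntax tree. Second, control-dependence and data-dependence analysis is performed on the source code using the syntax tree, yielding partitions of parallelizable statement blocks. Third, the target code is produced with the proposed parallel code generation method. Finally, a simulation and verification environment for the target code is built on Visual Studio 2010. Experimental results show that the tool parallelizes serial code automatically with reasonably good results; its speedup differs from that of hand-written code by 8.2% to 18.4%.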

9.

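Parallel Implementation of Monte Carlo Simulation of Photon Transport in Bioluminescence Imaging

杨薇 杨鑫 代晓倩 王珊 骆劼 徐敏 《计算机科学与探索》2009,3(2):198-209

In the field of bioluminescence imaging, the simulation of forward photon transport based on the Monte Carlo method is parallelized, which increases simulation speed. The paper first introduces the parallel mechanisms and serial acceleration algorithms adopted, then verifies both the correctness and the performance of the parallel simulation results and compares them with results from the MOSE and triMC3D software, and finally summarizes the parallel platform and discusses future work.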

10.

Legacy code and parallel computing: updating and parallelizing a numerical model

Tinetti Fernando G. Perez Maximiliano J. Fraidenraich Ariel Altenberg Adolfo E. 《The Journal of supercomputing》2020,76(7):5636-5654

In this paper, we present several important details in the process of legacy code parallelization, mostly related to the problem of maintaining the numerical output of a legacy code while obtaining a balanced workload for parallel processing. Since we maintained the non-uniform mesh imposed by the original finite element code, we had to develop a specially designed data distribution among processors so that data restrictions are met in the finite element method. In particular, we introduce a data distribution method that is initially used in shared-memory parallel processing and obtains better performance than the previous parallel program version. Moreover, this method can be extended to other parallel platforms such as distributed-memory parallel computers. We present results including several problems related to performance profiling on different (development and production) parallel platforms. The use of new and old parallel computing architectures leads to different behavior of the same code, which in all cases provides better performance on multiprocessor hardware.

11.

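Research and Implementation of a User-Guided Parallelization Strategy

刘勇 陆鑫达 《计算机工程》2005,31(4):71-73,84

The proposed user-guided parallelization strategy gives the programmer a graphical interactive interface. The programmer first selects a parallel algorithm and then customizes it through a configuration wizard, which generates a configuration file. Skeleton code for the parallel program is then generated automatically from the configuration file, the serial code of the meta-tasks supplied by the programmer is embedded into the skeleton, and the parallel program is finally produced. In this way programmers can write parallel programs more conveniently and efficiently.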

12.

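Parallel Implementation of the Adaptive SW-ADI Method for Solving Reaction-Diffusion Equations

程海英 张武 《计算机工程与设计》2004,25(11):1961-1963,2011

Based on the adaptive spline wavelet alternating direction (SW-ADI) method for solving reaction-diffusion equations, the serial program is parallelized directly under two parallel programming models, MPI and OpenMP, and the equations are solved with MPI and with OpenMP on the 自强2000 (Ziqiang 2000) high-performance computer at Shanghai University. The results are analyzed, and the parallel speedup relative to the serial program is reported.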

13.

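Visual Implementation of a Parallel Program Development Platform (total citations: 3; self-citations: 0; citations by others: 3)

张信一 李代平 罗伟刚 《计算机应用研究》2004,21(11):266-269

A visual platform for parallel programs benefits the development of network parallel computing. A visual network parallel platform is built and implemented on WPVM 3.4; its main modules are a task descriptor, a communication code generator, and a code inserter. The paper focuses on the visual implementation of the platform and explains how the settings made in the user-facing front end are converted, according to rules, into PVM primitive code that is inserted automatically in the back end, freeing parallel program developers from the tedium of complex parallel communication and from the low-level operation of the parallel system.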

14.

Porting a dusty deck Fortran program to a shared-memory multiprocessor

Andy Beavis Chris Phillips 《Concurrency and Computation》1992,4(8):575-587

We describe an attempt to improve the performance of a sequential Fortran code on a shared memory multiprocessor. The only parallelism employed is that provided by a set of parallel BLAS. The code is modified to make maximum use of the BLAS and, in particular, to use an appropriate LAPACK routine. Comparisons are made with automatically generated parallel code.

15.

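Some Practical Parallel Optimization Techniques in Parallel Applications

左风丽 郭勤 张宝琳 谭荣乐 《计算机工程与应用》2001,37(5):83-85

An application (a two-dimensional elastic-plastic hydrodynamics program) is parallelized on a shared-memory symmetric multiprocessing (SMP) parallel computer. The system supports parallelization of FORTRAN DO loop structures only. Taking the high-performance features of the machine into account, the paper organizes the parallel computation of the program's main modules and presents a practical parallel optimization technique for solving cache problems. The numerical results show a relatively good speedup.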

16.

Genetic improvement of GPU software

William B. Langdon Brian Yee Hong Lam Marc Modat Justyna Petke Mark Harman 《Genetic Programming and Evolvable Machines》2017,18(1):5-44

We survey genetic improvement (GI) of general purpose computing on graphics cards. We summarise several experiments which demonstrate four themes. Experiments with the gzip program show that genetic programming can automatically port sequential C code to parallel code. Experiments with the StereoCamera program show that GI can upgrade legacy parallel code for new hardware and software. Experiments with NiftyReg and BarraCUDA show that GI can make substantial improvements to current parallel CUDA applications. Finally, experiments with the pknotsRG program show that with semi-automated approaches, enormous speed ups can sometimes be had by growing and grafting new code with genetic programming in combination with human input.

17.

Portability, predictability and performance for parallel computing: BSP in practice

Joy Reed Kevin Parrott Tim Lanfear 《Concurrency and Computation》1996,8(10):799-812

We report on practical experience using the Oxford BSP Library to parallelize a large electromagnetic code, the British Aerospace finite-difference time-domain code EMMA T:FD3D. The Oxford BSP Library is one of the first realizations of the Bulk Synchronous Parallel computational model to be targeted at numerically intensive scientific (typically Fortran) computing. The BAe EMMA code is one of the first large-scale applications to be parallelized using this library, and it is an important demonstration of the cost effectiveness of the BSP approach. We illustrate how BSP cost-modelling techniques can be used to predict and optimize performance for single-source programs across different parallel platforms. We provide predicted and observed performance figures for an industrial-strength, single-source parallel code for a variety of real parallel architectures: shared memory multiprocessors, workstation clusters and massively parallel platforms.
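The cost modelling mentioned in this abstract rests on the standard BSP cost formula, in which a superstep costs w + h·g + l: w is the largest local computation on any processor, h the largest number of words any processor sends or receives, g the machine's per-word communication cost, and l its barrier synchronization cost. The sketch below applies that formula; the workload and machine parameters are invented for illustration and are not figures from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Superstep:
    work: List[float]      # local computation per processor
    sent: List[int]        # words sent by each processor
    received: List[int]    # words received by each processor


@dataclass
class Machine:
    g: float  # cost per word communicated
    l: float  # cost of one barrier synchronization


def superstep_cost(step: Superstep, machine: Machine) -> float:
    """Standard BSP cost of one superstep: w + h*g + l."""
    w = max(step.work)
    h = max(max(s, r) for s, r in zip(step.sent, step.received))
    return w + h * machine.g + machine.l


def program_cost(steps: List[Superstep], machine: Machine) -> float:
    """A BSP program's predicted cost is the sum over its supersteps."""
    return sum(superstep_cost(s, machine) for s in steps)


# Assumption: made-up two-processor workload and machine parameters.
steps = [
    Superstep(work=[100, 120], sent=[10, 4], received=[4, 10]),
    Superstep(work=[50, 50], sent=[0, 0], received=[0, 0]),
]
cluster = Machine(g=2.0, l=30.0)

print(program_cost(steps, cluster))
```

Because g and l are the only machine-dependent terms, the same program description can be re-evaluated with another platform's parameters, which is what makes predictions across platforms possible for a single-source code.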

18.

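Design and Implementation of a Multi-Channel Synchronous PCM Telemetry Data Acquisition and Encoding System (total citations: 2; self-citations: 1; citations by others: 1)

孙发鱼 纪立红 曾蕾 《测控技术》2001,20(2):22-23,26

A multi-channel synchronous data acquisition and encoding system can achieve good synchronization and real-time performance in A/D acquisition. This paper describes how to implement a synchronous data acquisition and encoding system suited to PCM telemetry, and provides a sound solution to the synchronous acquisition of analog and digital quantities. The system can be widely used in radio telemetry, remote data acquisition, and many other data measurement systems.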

19.

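Research and Implementation of a Verilog Compilation Technique Supporting Parallel Simulation

李暾 李思昆 郭阳 刘功杰 《计算机工程与应用》2002,38(16):184-187

Parallel HDL simulation is an effective way to accelerate the simulation-based verification of large, complex VLSI systems, and HDL compilation that supports parallel simulation is one of its key techniques. This paper proposes a Verilog compilation technique supporting parallel simulation: the compiler translates a Verilog description into C++ code, which is then compiled and linked with a parallel simulation kernel library to produce an executable parallel program. The paper describes the structure of the compiler, the code generation method, and the parallel simulation kernel library. The technique has been implemented in the parallel Verilog simulator ParaVer.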

20.

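Communication Code Generation for Irregular Loops in Automatic Parallelization

傅立国 姚远 丁锐 《计算机应用》2014,34(4):1014-1018

Irregular computations are common in large-scale parallel applications. In automatic parallelization targeting distributed-memory architectures, it is difficult to generate parallel code for irregular loops at compile time. The communication code within the parallel code strongly affects both the correctness of the program's results and the speedup achieved. By analyzing the program's array redistribution graph and using partially redundant communication to maintain the producer-consumer relationships of irregular array accesses, effective communication code can be generated automatically at compile time for a common class of irregular loops. The method uses the computation decomposition and the access expressions of the array references to derive the local definition set of the irregular array on each processor as the data set to communicate, analyzes the communication strategy for this class of irregular loop partitionings, and then generates the corresponding communication code. Experimental tests achieved the expected speedup, confirming the effectiveness of the method.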
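The notion of a per-processor local definition set can be made concrete with a small simulation. The sketch below assumes a loop of the form `for i: A[idx[i]] = ...` whose iterations are split into contiguous blocks, and it evaluates the sets at run time on concrete index values; the paper derives them at compile time from access expressions, so this is only an illustration of the concept, with all names chosen here.

```python
def block_ranges(n_iters, n_procs):
    """Split iterations 0..n_iters-1 into contiguous blocks, one per processor."""
    base, extra = divmod(n_iters, n_procs)
    ranges, start = [], 0
    for p in range(n_procs):
        size = base + (1 if p < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


def local_definition_sets(idx, n_procs):
    """For the loop `for i: A[idx[i]] = ...` with block-distributed iterations,
    return the set of elements of A that each processor writes."""
    return [{idx[i] for i in block} for block in block_ranges(len(idx), n_procs)]


# Assumption: a hypothetical indirection array and two processors.
idx = [3, 0, 7, 3, 5, 1, 6, 2]
defs = local_definition_sets(idx, 2)

for p, written in enumerate(defs):
    # Each processor would send exactly these elements to the processors
    # that read them after the loop.
    print("processor", p, "defines A at", sorted(written))
```

Sending a whole definition set to every other processor, whether or not each element is read there, is one simple form of redundant communication; it trades extra traffic for not having to know the consumers precisely.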