期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

朱正东刘袁魏洪昌颜康王寅峰董小社《计算机工程与应用》2015,51(21):41-47

针对GPU上应用开发移植困难的问题,提出了一种串行计算源程序到并行计算源程序的映射方法。该方法从串行源程序中获得可并行化循环的层次信息,建立循环体结构与GPU线程的对应关系,生成GPU端核心函数代码;根据变量引用读写属性生成CPU端控制代码。基于该方法实现了一个编译原型系统,完成了C语言源程序到CUDA源程序的自动生成。对原型系统在功能和性能方面的测试结果表明,该系统生成的CUDA源程序与C语言源程序在功能上一致,其性能有显著提高,在一定程度上解决了计算密集型应用向CPU-GPU异构多核系统移植困难的问题。相似文献

2.

面向语言的计算机体系结构

徐研人《计算机工程》1982,(3)

引言目前,大多数计算机用户均采用高级语言来编制程序,用户源程序经编译后得到用机器指令所组成的程序称为目标程序,通常机器执行的是目标程序,与汇编语言书写的程序相比,机器编译出来的目标程序不仅程序长,而且机器执行时所化费的时间长,虽然采用了各种语言优化等手段,但是要接近汇编程序还是比较困难的。其原因是由于编译程序的编制通常是在硬件结构已经确定好以后进行的,因而受到较大的限制,因此较好的办法是硬件结构的设计就要考虑如何加速目标程序的执行。本文介绍一台以科学计算为主。适合 FORTRAN等高级语言的目标程序执行的大型计算机的体系结构设计。在文章的前一部分选取了在通用计算机上进行计算的各种典型题目的程序,对它们的核心部分——循环程序进行数据和程序相似文献

3.

C语言中文件包含命令的使用

孙益欣《计算机》1997,(27)

在C语言的源程序中,除去完成程序功能所需要的说明性语句和执行性语句之外,还可以使用另一类语句,这类语句的作用不是实现程序的功能,而是发布给编译系统的信息,它们告诉编译系统,在以源程序进行编译之前应该做些什么,所以称这类语句为编译预处理语句。C语言的编译预处理语句以#开头,共有三种,它们分别是:文件包含、宏定义和条件编译。C语言的编译预处理功能为程序调试和程序移植等提供了便利,正确地使用编译预处理功能可以有效地提高程序开发效率。但初学者对C语言中的编译预处理相似文献

4.

二维器件模拟软件PISCES的程序转换

黄雷向采兰余志平《计算机应用与软件》2003,20(9):37-39

为了研究半导体器件特性，对器件特性模拟软件PISCES需要不断进行改进。由于PISCES的源程序是用Fortran语言写的，而且前程序设计普遍使用C语言，因此我们将其源程序由Fortran语言转化成了C语言。整个程序编译联接成功，全面功能运行正确。本文结合实际的转化过程，阐述了由Fortran语言到C语言的源程序转化方法。相似文献

5.

源程序正确性的检查方法

金曙白《计算机工程》1983,(2)

§1 引言众所周知,一个具有实际应用价值的编译程序,除具备把源程序翻译为目标程序的基本功能外,还应能对用户编制的源程序进行检查并指出错误,以保证被编译的源程序的正确性,本文把曾实际运用过的一些关于检查的原则及处理方法作一系统的说明。文中以ALGOL60语言作为源语言,所论及的方法也可用于其它程序设计语言。相似文献

6.

面向异构多核架构的自适应编译框架

《计算机学报》2014,(7)

针对应用在移植到异构多核高性能计算机系统中所面临的可移植性差以及性能优化难度大的问题,文中提出一种面向异构多核架构的自适应编译框架.通过源到源编译解决传统并行编程模型应用向异构多核架构的映射问题;同时利用动态剖分信息,自适应地调整插桩并配置优化策略,形成迭代式的自动优化过程.文中自适应编译框架将软硬件映射机制与优化策略结合,有效地解决了同构并行应用向异构多核架构的移植问题并提高了应用的整体性能.实验结果表明,文中基于Cell架构实现的原型系统,很好地解决了异构多核架构下应用移植性等问题,同时应用性能有所提高. 相似文献

7.

基于扩展逻辑变换系统_μTS证明循环优化正确性

王昌晶《计算机研究与发展》2012,49(9):1863-1873

循环优化对于提高Cache性能、发掘程序的并行性以及减少执行循环的开销都有着重要的作用,证明带循环优化功能的现代编译器的正确性已成为可信编译的一个挑战性的问题.形式化证明一个羽翼丰满的优化编译器本质上是不可行的,可以使用替代的方法,即不是证明优化编译器本身,而是形式化证明每一次循环变换前后编译对象的正确性.提出一种新颖的基于扩展逻辑变换系统μTS来证明循环优化正确性的方法.系统μTS在逻辑变换系统TS的基础上扩展了若干条派生规则,经谓词抽象将源程序与目标程序转换为形式化Radl语言后,使用μTS的派生规则能证明常见循环变换的正确性,如循环融合、循环分配、循环交换、循环反转、循环分裂、循环脱皮、循环调整、循环展开、循环铺盖、循环判断外提、循环不变代码外提等.循环优化可以看作一系列循环变换的组合,从而系统μTS能证明循环优化的正确性.为了支持自动化证明循环优化的正确性并出示证据,进一步提出了一个辅助证明算法.最后通过一个典型实例对这一方法进行了详细的阐述,实际效果表明了该方法的有效性.该方法对设计高可信优化编译器具有重要的指导意义. 相似文献

8.

PIC单片机C语言程序设计(9)

丁锦滔《电子制作．电脑维护与应用》2010,(6):54-56

<正>(接上期)3.pic07.C源程序的编译在《PIC单片机C语言程序(8)》一文中,我们已在MPLAB IDE7.40集成开发环境中编辑了pic07.C源程序(0~99秒脉冲发生器)。对于PIC单片机来说,所有的C语言源程序,都要在进行编译并生成目标码.hex文件后,方能烧写到PIC单片机中运行,即每编辑一个C程序都要进行一次编译,所以对C程序(pic0.7c)的编译操作是十分重要的。相似文献

9.

基于图元装接模式由程序流程图自动生成源代码

《软件工程师》2016,(11):4-10

针对程序流程图到代码自动翻译过程中,通常需要经过代码优化及二次编程,翻译效率不高,提出基于图元装接模式的流程图到C语言源程序转换的解决方案,实现流程图到源程序的双向转换。由图元同步产生代码元,通过装接自动生成程序代码。所生成的程序代码无需修改,可直接编译运行。通过词法分析,识别并产生代码元和对应图元,可逆向生成流程图。其有效性在原型系统中得到了验证。相似文献

10.

基于OpenMP/Fortran的源到源转换事务存储编程环境

黄春贾建斌彭林《计算机科学》2011,38(4):299-302

首次在Fortran语言中引入事务存储,对OpenMP Fortran API进行了扩展,以源到源转换的方式实现了FortranTM编译器原型。针对软件事务存储实现的特点,扩展了EXCLUDED和SCHEDULE指导命令子句,以便为程序员提供性能调整优化APIo测试结果表明FortranTM API编程便利,具有良好的性能。相似文献

11.

一种基于关键路径分析的CPU-GPU异构系统综合能耗优化方法 总被引：1，自引：0，他引：1

林一松杨学军唐滔王桂彬徐新海《计算机学报》2012,35(1):123-133

GPU强大的计算性能使得CPU-GPU异构体系结构成为高性能计算领域热点研究方向.虽然GPU的性能/功耗比较高,但在构建大规模计算系统时,功耗问题仍然是限制系统运行的关键因素之一.现在已有的针对GPU的功耗优化研究主要关注如何降低GPU本身的功耗,而没有将CPU和GPU作为一个整体进行综合考虑.文中深入分析了CUDA程序在CPU-GPU异构系统上的运行特点,归纳其中的任务依赖关系,给出了使用AOV网表示程序执行过程的方法,并在此基础上分析程序运行的关键路径,找出程序中可以进行能耗优化的部分,并求解相应的频率调节幅度,在保持程序性能不变的前提下最小化程序的整体能量消耗. 相似文献

12.

GCC中基于值剖视的代码特例化

黄春孔凡金《计算机与数字工程》2012,40(2):121-123,147

基于剖视的优化技术根据程序先前运行时收集的剖视信息来指导编译优化。文章给出了一种在GCC中基于值剖视的代码特例化实现方法。NPB和SPEC CPU2000基准程序测试结果表明,该代码特例化方法能够有效提高程序性能,同时引入的开销较小。相似文献

13.

A CUDA implementation of the Continuous Space Language Model

Elizabeth A. Thompson Timothy R. Anderson 《The Journal of supercomputing》2014,68(1):65-86

The training phase of the Continuous Space Language Model (CSLM) was implemented in the NVIDIA hardware/software architecture Compute Unified Device Architecture (CUDA). A detailed explanation of the CSLM algorithm is provided. Implementation was accomplished using a combination of CUBLAS library routines, NVIDIA NPP functions, and CUDA kernel calls on three different CUDA enabled devices of varying compute capability and a time savings over the traditional CPU approach demonstrated. The efficiency of the CUDA version of the open source implementation is analyzed and compared to that using the Intel Math Kernel Libraries (MKL) on a variety of CUDA enabled and multi-core CPU platforms. It is demonstrated that substantial performance benefit can be obtained using CUDA, even with nonoptimal code. Techniques for optimizing performance are then provided. Furthermore, an analysis is performed to determine the conditions in which the performance of CUDA exceeds that of the multi-core MKL realization. 相似文献

14.

Swan: A tool for porting CUDA programs to OpenCL 总被引：1，自引：0，他引：1

M.J. Harvey G. De Fabritiis 《Computer Physics Communications》2011,(4):1093-1099

The use of modern, high-performance graphical processing units (GPUs) for acceleration of scientific computation has been widely reported. The majority of this work has used the CUDA programming model supported exclusively by GPUs manufactured by NVIDIA. An industry standardisation effort has recently produced the OpenCL specification for GPU programming. This offers the benefits of hardware-independence and reduced dependence on proprietary tool-chains. Here we describe a source-to-source translation tool, “Swan” for facilitating the conversion of an existing CUDA code to use the OpenCL model, as a means to aid programmers experienced with CUDA in evaluating OpenCL and alternative hardware. While the performance of equivalent OpenCL and CUDA code on fixed hardware should be comparable, we find that a real-world CUDA application ported to OpenCL exhibits an overall 50% increase in runtime, a reduction in performance attributable to the immaturity of contemporary compilers. The ported application is shown to have platform independence, running on both NVIDIA and AMD GPUs without modification. We conclude that OpenCL is a viable platform for developing portable GPU applications but that the more mature CUDA tools continue to provide best performance.

Program summary

Program title: SwanCatalogue identifier: AEIH_v1_0Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEIH_v1_0.htmlProgram obtainable from: CPC Program Library, Queen's University, Belfast, N. IrelandLicensing provisions: GNU Public License version 2No. of lines in distributed program, including test data, etc.: 17 736No. of bytes in distributed program, including test data, etc.: 131 177Distribution format: tar.gzProgramming language: CComputer: PCOperating system: LinuxRAM: 256 MbytesClassification: 6.5External routines: NVIDIA CUDA, OpenCLNature of problem: Graphical Processing Units (GPUs) from NVIDIA are preferentially programed with the proprietary CUDA programming toolkit. An alternative programming model promoted as an industry standard, OpenCL, provides similar capabilities to CUDA and is also supported on non-NVIDIA hardware (including multicore ×86 CPUs, AMD GPUs and IBM Cell processors). The adaptation of a program from CUDA to OpenCL is relatively straightforward but laborious. The Swan tool facilitates this conversion.Solution method:Swan performs a translation of CUDA kernel source code into an OpenCL equivalent. It also generates the C source code for entry point functions, simplifying kernel invocation from the host program. A concise host-side API abstracts the CUDA and OpenCL APIs. A program adapted to use Swan has no dependency on the CUDA compiler for the host-side program. The converted program may be built for either CUDA or OpenCL, with the selection made at compile time.Restrictions: No support for CUDA C++ featuresRunning time: Nominal 相似文献

15.

可移动代码的一种优化技术

王明文孙永强《计算机科学》2000,27(1):18-20

相似文献

16.

Experiments with Program unification on the Cray Y-MP

Ling-Yu Chuang Vernon Rego Aditya Mathur 《Concurrency and Computation》1994,6(1):33-53

Program unification is a technique for source-to-source transformation of code for enhanced execution performance on vector and SIMD architectures. This work focuses on simple examples of program unification to explain the methodology and demonstrate its promise as a practical technique for improved performance. Using simple examples to explain how unification is done, we outline two experiments in the simulation domain that benefit from unification, namely Monte Carlo and discrete-event simulation. Empirical tests of unified code on a Cray Y-MP multiprocessor show that unification improves execution performance by a factor of roughly 8 for given application. The technique is general in that it can be applied to computation-intensive programs in various data-parallel application domains. 相似文献

17.

An Infrastructure for Tackling Input-Sensitivity of GPU Program Optimizations

Xipeng Shen Yixun Liu Eddy Z. Zhang Poornima Bhamidipati 《International journal of parallel programming》2013,41(6):855-869

Graphic processing units (GPU) have become increasingly adopted for the enhancement of computing throughput. However, the development of a high-quality GPU application is challenging, due to the large optimization space and complex unpredictable effects of optimizations on GPU program performance. Many recent efforts have been employing empirical search-based auto-tuners to tackle the problem, but few of them have concentrated on the influence of program inputs on the optimizations. In this paper, based on a set of CUDA and OpenCL kernels, we report some evidences on the importance for auto-tuners to adapt to program input changes, and present a framework, G-ADAPT+, to address the influence by constructing cross-input predictive models for automatically predicting the (near-)optimal configurations for an arbitrary input to a GPU program. G-ADAPT+ is based on source-to-source compilers, specifically, Cetus and ROSE. It supports the optimizations of both CUDA and OpenCL programs. 相似文献

18.

SafeGPU: Contract- and library-based GPGPU for object-oriented languages

《Computer Languages, Systems and Structures》2017

Using GPUs as general-purpose processors has revolutionized parallel computing by providing, for a large and growing set of algorithms, massive data-parallelization on desktop machines. An obstacle to their widespread adoption, however, is the difficulty of programming them and the low-level control of the hardware required to achieve good performance. This paper proposes a programming approach, SafeGPU, that aims to make GPU data-parallel operations accessible through high-level libraries for object-oriented languages, while maintaining the performance benefits of lower-level code. The approach provides data-parallel operations for collections that can be chained and combined to express compound computations, with data synchronization and device management all handled automatically. It also integrates the design-by-contract methodology, which increases confidence in functional program correctness by embedding executable specifications into the program text. We present a prototype of SafeGPU for Eiffel, and show that it leads to modular and concise code that is accessible for GPGPU non-experts, while still providing performance comparable with that of hand-written CUDA code. We also describe our first steps towards porting it to C#, highlighting some challenges, solutions, and insights for implementing the approach in different managed languages. Finally, we show that runtime contract-checking becomes feasible in SafeGPU, as the contracts can be executed on the GPU. 相似文献

19.

CU++: an object oriented framework for computational fluid dynamics applications using graphics processing units

Dominic D. J. Chandar Jayanarayanan Sitaraman Dimitri Mavriplis 《The Journal of supercomputing》2014,67(1):47-68

The application of graphics processing units (GPU) to solve partial differential equations is gaining popularity with the advent of improved computer hardware. Various lower level interfaces exist that allow the user to access GPU specific functions. One such interface is NVIDIA’s Compute Unified Device Architecture (CUDA) library. However, porting existing codes to run on the GPU requires the user to write kernels that execute on multiple cores, in the form of Single Instruction Multiple Data (SIMD). In the present work, a higher level framework, termed CU++, has been developed that uses object oriented programming techniques available in C++ such as polymorphism, operator overloading, and template meta programming. Using this approach, CUDA kernels can be generated automatically during compile time. Briefly, CU++ allows a code developer with just C/C++ knowledge to write computer programs that will execute on the GPU without any knowledge of specific programming techniques in CUDA. This approach is tremendously beneficial for Computational Fluid Dynamics (CFD) code development because it mitigates the necessity of creating hundreds of GPU kernels for various purposes. In its current form, CU++ provides a framework for parallel array arithmetic, simplified data structures to interface with the GPU, and smart array indexing. An implementation of heterogeneous parallelism, i.e., utilizing multiple GPUs to simultaneously process a partitioned grid system with communication at the interfaces using Message Passing Interface (MPI) has been developed and tested. 相似文献