Similar literature
 20 similar documents found (search time: 46 ms)
1.
赵博  赵荣彩  徐金龙  高伟 《计算机科学》2015,42(1):50-53,58
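To fully exploit the computing power of high-performance computers, ease the burden on programmers of designing and writing parallel programs, and expand the set of available software, the authors design and implement a method that uses an interactive interface to dig deeper for vectorizable statements in a program and to optimize the vectorized statements in the generated code, improving the execution efficiency of that code. The method matters for fully exploiting the computing power of high-performance computers, improving system usability, and broadening the range of applications, and it also provides effective auxiliary means and tool support. The progressive intelligent backtracking framework for tuning vectorized code analyzes and transforms the serial program submitted by the user: it applies serial program analysis, data dependence analysis, and vectorization analysis, transforms and optimizes the program according to the results, and automatically generates the final vectorized code. By analyzing the potential parallelism in a serial program and automatically transforming it into an equivalent vectorized form, the method greatly simplifies the programmer's work.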

2.
Data dependence concepts are reviewed, concentrating on and extending previous work on direction vectors. A bit vector representation of direction vectors is discussed. Various program restructuring transformations, such as loop circulation (a form of loop interchanging), reversal, skewing, sectioning (strip mining), combing, and rotation, are discussed in terms of their effects on the execution of the program, the required dependence tests for legality, and the effects of each transformation on the dependence graph. The bit vector representation of direction vectors is used to develop simple and efficient bit vector operations for the dependence tests and to generate the modified direction vector for each transformation. Finally, a simple method to interchange complex convex loop limits is given, which is useful when several loop restructuring operations are being applied in sequence. This work was supported by NSF Grant CCR-8906909 and DARPA Grant MDA972-88-J-1004.
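
The bit vector idea in the abstract above can be illustrated with a minimal sketch. The encoding (one bit each for `<`, `=`, `>` per loop level) and the helper names `may_be_negative`, `interchange`, `reverse_level`, and `is_legal` are assumptions made for illustration, not the paper's definitions. The legality rule used is the standard one: a transformation is rejected if any transformed direction vector could have `>` as its first non-`=` entry.

```python
LT, EQ, GT = 0b001, 0b010, 0b100   # one bit per direction: '<', '=', '>'

def may_be_negative(vec):
    """True if some direction vector encoded by vec has '>' as its first non-'=' entry."""
    for mask in vec:
        if mask & GT:
            return True            # all earlier levels allow '=', so '>' can lead
        if not mask & EQ:
            return False           # this level is strictly '<': dependence carried here
    return False

def interchange(vec, i, j):
    """Direction vector after interchanging loop levels i and j."""
    out = list(vec)
    out[i], out[j] = out[j], out[i]
    return tuple(out)

def reverse_level(vec, k):
    """Direction vector after reversing loop level k: '<' and '>' trade places."""
    m = vec[k]
    flipped = (m & EQ) | (GT if m & LT else 0) | (LT if m & GT else 0)
    return vec[:k] + (flipped,) + vec[k + 1:]

def is_legal(dep_vectors, transform):
    """A transformation is legal if no transformed vector can become negative."""
    return not any(may_be_negative(transform(v)) for v in dep_vectors)

# A (<, >) dependence blocks interchange but allows reversing the inner loop.
print(is_legal([(LT, GT)], lambda v: interchange(v, 0, 1)))
print(is_legal([(LT, GT)], lambda v: reverse_level(v, 1)))
```

Each test touches only a few bits per loop level, which is what makes this representation cheap to apply when several transformations are chained.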

3.
龚雪容  生拥宏  沈亚楠 《计算机应用》2006,26(10):2473-2475
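The paper focuses on automatically generating the data-collection code needed when a serial program is parallelized. It proposes using equivalence classes to obtain the last-write relation of the data, and builds a system of inequality constraints covering the computation partition, the loop iterations, and the last-write relation. Fourier-Motzkin elimination (FME) is then applied to this system, which yields the automatically generated data-collection code.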
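
As a rough illustration of the elimination step named in the abstract above, here is a minimal sketch of one round of Fourier-Motzkin elimination over the rationals. The representation of an inequality as `(coeffs, bound)` and the function name `eliminate` are assumptions for this example. It is not the paper's code generator, and it ignores the extra care needed for exact integer solutions.

```python
from fractions import Fraction

def eliminate(ineqs, k):
    """Eliminate variable k from a system of inequalities.

    Each inequality is (coeffs, bound), meaning sum(coeffs[i] * x[i]) <= bound.
    """
    pos, neg, rest = [], [], []
    for coeffs, bound in ineqs:
        if coeffs[k] > 0:
            pos.append((coeffs, bound))     # upper bound on x[k]
        elif coeffs[k] < 0:
            neg.append((coeffs, bound))     # lower bound on x[k]
        else:
            rest.append((coeffs, bound))    # does not mention x[k]

    result = list(rest)
    # Pair every upper bound on x[k] with every lower bound on x[k].
    for pc, pb in pos:
        for nc, nb in neg:
            sp = Fraction(1, pc[k])         # scales the coefficient of x[k] to +1
            sn = Fraction(1, -nc[k])        # scales the coefficient of x[k] to -1
            coeffs = tuple(sp * p + sn * n for p, n in zip(pc, nc))
            result.append((coeffs, sp * pb + sn * nb))
    return result

# Triangular nest: 0 <= i <= 9 and i <= j <= 9, with variables ordered (i, j).
system = [((-1, 0), 0), ((1, 0), 9), ((1, -1), 0), ((0, 1), 9)]
projected = eliminate(system, 1)            # project j away, leaving bounds on i
```

Projecting out the inner variable leaves only constraints on `i`, which is how loop bounds for a generated nest are usually read off one level at a time.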

4.
The ability to represent, manipulate, and optimize data placement and movement between processors in a distributed address space machine is crucial in allowing compilers to generate efficient code. Data placement is embodied in the concept of data ownership. Data movement can include not just the transfer of data values but the transfer of ownership as well. However, most existing compilers for distributed address space machines either represent these notions in a language- or machine-dependent manner, or represent data or ownership transfer implicitly. In this paper we describe XDP, a set of intermediate language extensions for representing and manipulating data and ownership transfers explicitly in a compiler. XDP is supported by a set of per-processor structures that can be used to implement ownership testing and manipulation at run time. XDP provides a uniform framework for translating and optimizing sequential, data parallel, and message-passing programs to a distributed address space machine. We describe analysis and optimization techniques for this explicit representation. Finally, we compare the intermediate languages of some current distributed address space compilers with XDP.

5.
沈亚楠  姚远  张平  赵荣彩  罗向阳 《计算机工程》2006,32(11):114-115,132
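Data decomposition is crucial for a parallelizing compiler to achieve high performance on message-passing parallel machines. From the data decomposition (the mapping of data to processors) derived automatically by the compiler, C loop nests that send and receive messages can be generated, distributing the data among the processors. The paper uses a proven, powerful mathematical model to generate the data decomposition code, and also gives a formal algorithm and its implementation. Preliminary experimental results show that the algorithm significantly improves performance.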

6.
7.
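GCC is an open-source optimizing compiler favored by many researchers, but it can perform dependence analysis only on perfectly nested loops. To better exploit the coarse-grained parallelism of nested loops, the authors study the data dependence analysis process of GCC 5.1 in depth and propose a dependence test that can handle branch nested loops. The method first identifies branch nested loops, then analyzes the relation between the array subscripts and the outer-loop index variable of the branch nested loop, and finally computes the distance vector of the outer-loop index variable and checks it to decide whether the loop carries a dependence. Experimental results show that the method analyzes the dependences of branch nested loops correctly and effectively.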
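
The final step described above, checking a distance vector to decide whether the outer loop carries a dependence, can be sketched for the simplest case. This sketch assumes both references are to the same array with uniform subscripts of the form `index + constant`. The function names and the `outer_trip_count` parameter are hypothetical, and none of this reflects GCC's internal data structures.

```python
def distance_vector(write_offsets, read_offsets):
    """Distance between a write A[i+w0][j+w1] and a read A[i+r0][j+r1].

    The element written at iteration I is read at iteration I + (w - r).
    """
    return tuple(w - r for w, r in zip(write_offsets, read_offsets))

def outer_loop_carries_dependence(dist, outer_trip_count):
    """The outer loop carries the dependence if its distance is nonzero
    and small enough for both iterations to fall inside the loop."""
    d = dist[0]
    return d != 0 and abs(d) < outer_trip_count

# A[i][j] = A[i-1][j+1] + 1: distance (1, -1), carried by the outer loop.
d1 = distance_vector((0, 0), (-1, 1))
# A[i][j] = A[i][j-1] + 1: distance (0, 1), outer iterations are independent.
d2 = distance_vector((0, 0), (0, -1))
```

When the outer distance is zero for every dependence, the outer iterations can run in parallel, which is the coarse-grained parallelism the abstract is after.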

8.
9.
A fundamentally new approach to building a code with summation of on-data bits based on the selection and separate check of subsets of bits of the data vector is presented. The properties of the proposed code are analyzed in comparison with the classic and modified Berger codes. The advantages and disadvantages of new codes with summation of on-data bits are noted. The basic properties of the proposed codes with summation that should be taken into account in solving problems of technical diagnostics are established. The results of experimental applications of the developed codes to the organization of concurrent error detection systems of combinational benchmarks from LGSynth'89 are given.
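
For orientation, the classic baseline that the abstract above compares against can be sketched as follows. The check vector here is the binary count of 1s in the data vector (Berger codes are also commonly defined with the count of 0s; the ones-counting form is an assumption made for this example). The function names are hypothetical, and the paper's subset-based construction is not reproduced.

```python
def check_bits(data):
    """Check vector: the number of 1s in data, as a big-endian bit list."""
    k = len(data).bit_length()          # ceil(log2(m + 1)) check bits for m data bits
    weight = sum(data)
    return [(weight >> i) & 1 for i in reversed(range(k))]

def is_codeword(data, check):
    """True if the check vector matches the weight of the data vector."""
    return check_bits(data) == list(check)

data = [1, 0, 1, 1]
check = check_bits(data)                # weight 3 is encoded in 3 check bits

# A single 1 -> 0 error changes the weight and is detected.
detected = not is_codeword([1, 0, 0, 1], check)
# A 1 -> 0 error paired with a 0 -> 1 error keeps the weight and slips through.
missed = is_codeword([0, 1, 1, 1], check)
```

The undetected compensating error in the last line is the kind of weakness that motivates modified sum codes.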

10.
Many scientific codes can achieve significant performance improvement when executed on a computer equipped with a vector processor. Vector constructs in source code should be recognized by a vectorizing compiler or preprocessor. This paper discusses, from a general point of view, how a vectorizing compiler/preprocessor can be evaluated. The areas discussed include data dependence analysis, IF loop analysis, nested loops, loop interchanging, loop collapsing, indirect addressing, use of temporary storage, and order of arithmetic. The ideas presented are based on vectorization of over a million lines of production codes and an extensive test suite developed to evaluate preprocessors under varying degrees of code complexity. Areas for future research are also discussed.

11.
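A fully blind watermarking algorithm for vector geographic data based on coordinate mapping   Total citations: 3, self-citations: 0, citations by others: 3
Based on an analysis of the characteristics of fully blind watermarking, the paper proposes generating the watermark information by converting text directly into binary bits according to its character encoding. Drawing on the spatial positioning property of vector geographic data, it studies the coordinate mapping mechanism and the principles for constructing the mapping function used in watermark embedding and extraction. Using the proposed watermark generation and coordinate mapping methods, it establishes a fully blind watermarking algorithm for vector geospatial data. Experimental results show that the algorithm resists common attacks on vector data, such as data compression, point insertion, point deletion, editing, cropping, and translation, and is highly robust.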
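
The watermark generation step described above, turning text into bits through its character encoding, can be sketched as below. The abstract does not say which encoding is used, so UTF-8 is an assumption here, as are the function names. The coordinate mapping used for embedding is not shown because the abstract gives no details of it.

```python
def text_to_bits(text, encoding="utf-8"):
    """Encode text to bytes, then list the bits of each byte, most significant first."""
    return [(byte >> i) & 1 for byte in text.encode(encoding) for i in range(7, -1, -1)]

def bits_to_text(bits, encoding="utf-8"):
    """Inverse of text_to_bits: regroup bits into bytes and decode them."""
    data = bytes(
        int("".join(map(str, bits[i:i + 8])), 2) for i in range(0, len(bits), 8)
    )
    return data.decode(encoding)

watermark = text_to_bits("GIS")
recovered = bits_to_text(watermark)
```

Because the bit string is derived directly from the text, extraction only needs the bits and the encoding, with no reference to the original data or the original watermark.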

12.
Reduction operations frequently appear in algorithms. Due to their mathematical invariance properties (assuming that round-off errors can be tolerated), it is reasonable to ignore ordering constraints on the computation of reductions in order to take advantage of the computing power of parallel machines. One obvious and widely used compilation approach for reductions is syntactic pattern recognition. Either the source language includes explicit reduction operators, or certain specific loops are recognized as equivalent to known reductions. Once such patterns are recognized, hand-optimized code for the reductions is incorporated in the target program. The advantage of this approach is simplicity. However, it imposes restrictions on the reduction loops: no data dependence other than that caused by the reduction operation itself is allowed in the reduction loops. In this paper, we present a parallelizing technique, interleaving transformation, for distributed-memory parallel machines. This optimization exploits parallelism embodied in reduction loops through a combination of data dependence analysis and region analysis. Data dependence analysis identifies the loop structures and the conditions that can trigger this optimization. Region analysis divides the iteration domain into a sequential region and an order-insensitive region. Parallelism is achieved by distributing the iterations in the order-insensitive region among multiple processors. We use a triangular solver as an example to illustrate the optimization. Experimental results on various distributed-memory parallel machines, including the Connection Machine CM-5, the nCUBE, the IBM SP-2, and a network of Sun workstations, are reported.
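
The premise of the abstract above, that a reduction may be reordered because its operator is associative and commutative, can be shown with a small single-process simulation. This is only a sketch of order-insensitive reduction with a cyclic distribution of iterations. It is not the paper's interleaving transformation, and `parallel_reduce` and `num_workers` are hypothetical names.

```python
from functools import reduce
import operator

def parallel_reduce(values, num_workers, op=operator.add, identity=0):
    """Simulate a distributed reduction: each worker reduces a cyclic slice
    of the iterations, then the partial results are combined."""
    partials = [
        reduce(op, values[w::num_workers], identity) for w in range(num_workers)
    ]
    return reduce(op, partials, identity)

values = list(range(1, 101))
sequential = reduce(operator.add, values, 0)
distributed = parallel_reduce(values, num_workers=4)
```

For integers the two results are identical. For floating-point data they can differ by round-off, which is the tolerance the abstract assumes.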

13.
Data dependence testing is the basic step in detecting loop-level parallelism in numerical programs. The problem is undecidable in the general case. Therefore, work has been concentrated on a simplified problem, affine memory disambiguation. In this simpler domain, array references and loop bounds are assumed to be linear integer functions of loop variables. Dataflow information is ignored. For this domain, we have shown that in practice the problem can be solved accurately and efficiently.(1) This paper studies empirically the effectiveness of this domain restriction, that is, how many real references are affine and flow insensitive. We use Larus's llpp system(2) to find all the data dependences dynamically. We compare these to the results given by our affine memory disambiguation system. This system is exact for all the cases we see in practice. We show that while the affine approximation is reasonable, memory disambiguation is not a sufficient approximation for data dependence analysis. We propose extensions to improve the analysis. This research was supported in part by a fellowship from AT&T Bell Laboratories and by DARPA contract N00014-87-K-0828.
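
To make "affine memory disambiguation" concrete, here is a minimal sketch of the classic GCD test for two one-dimensional affine references. It is a textbook necessary condition, not the exact system described in the abstract. It ignores loop bounds, so a result of `True` only means that a dependence cannot be ruled out. The function name is hypothetical.

```python
from math import gcd

def gcd_test(a, b, c, d):
    """Can A[a*i + b] and A[c*j + d] refer to the same element for some integers i, j?

    a*i + b == c*j + d has an integer solution iff gcd(a, c) divides d - b.
    """
    g = gcd(a, c)
    if g == 0:                  # both subscripts are constants
        return b == d
    return (d - b) % g == 0

# A[2*i] and A[2*j + 1] touch even and odd elements: independent.
print(gcd_test(2, 0, 2, 1))
# A[2*i] and A[4*j + 2] may overlap (for example i = 1, j = 0).
print(gcd_test(2, 0, 4, 2))
```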

14.
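A dependence analysis method for concurrent programs   Total citations: 12, self-citations: 0, citations by others: 12
Dependence analysis of concurrent programs is an important means of analyzing, understanding, debugging, testing, and maintaining them. Because the execution of concurrent programs is nondeterministic, many difficulties remain unsolved. Targeting the Ada tasking mechanism, the paper first proposes a concise and effective representation of concurrent programs, the concurrent program flow graph. It then discusses the synchronization dependence caused by synchronization between tasks and the inter-task data dependence caused by access to shared variables, and builds the concurrent program dependence graph. On this basis it gives an effective dependence analysis algorithm for concurrent programs that obtains fairly precise dependences and deals well with the non-transitivity of dependences in concurrent programs.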

15.
刘有耀  杨鹏程 《计算机应用》2016,36(9):2422-2426
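To address the problem that a large body of legacy code cannot be reused, a new compilation tool is designed that converts serial C code into hybrid parallel code based on MPI+OpenMP, reducing the development cost of parallel programming. First, by optimizing JavaCC, a lexical and syntax analyzer that can parse C is implemented; it analyzes the source code and generates an abstract syntax tree. Second, control dependence and data dependence analysis is performed on the source code according to the syntax tree, producing partitions of parallelizable statement blocks. Third, the target code is obtained with the proposed parallel code generation method. Finally, a simulation and verification environment for the target code is built on Visual Studio 2010. Experimental results show that the tool parallelizes serial code automatically with fairly good results; its speedup differs from that of hand-written code by 8.2% to 18.4%.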

16.
17.
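A practical data dependence analysis method   Total citations: 2, self-citations: 1, citations by others: 1
Data dependence analysis is the basic step in detecting loop-level parallelism in programs. Based on a classification of array subscript pairs, this paper proposes a practical and effective data dependence analysis scheme. Existing dependence testing algorithms all assume loop normalization. Because that assumption has certain drawbacks, we drop it and allow the loop increment to be an arbitrary integer expression. To this end, the paper modifies the relevant definitions of dependence as needed and re-derives some important results. To handle loop increments that are variables or expressions, it gives weak forms of the GCD and Banerjee tests. The scheme has been implemented in PORT.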
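
To show why the loop increment matters for a dependence test, the sketch below folds a known constant step into the GCD test instead of normalizing the loop first. It covers only the easy case of a constant integer step with both references in the same loop. The weak-form tests for variable or expression increments mentioned in the abstract are not reproduced, and the function name and parameters are hypothetical.

```python
from math import gcd

def gcd_test_with_step(a, b, c, d, lower, step):
    """Can A[a*i + b] and A[c*j + d] alias when i and j take values lower + step*k?

    Substituting i = lower + step*k1 and j = lower + step*k2 gives
    a*step*k1 - c*step*k2 == (d - b) + (c - a)*lower,
    which has an integer solution iff gcd(a*step, c*step) divides the right side.
    """
    g = gcd(a * step, c * step)
    rhs = (d - b) + (c - a) * lower
    if g == 0:
        return rhs == 0
    return rhs % g == 0

# for i = 0, 2, 4, ...: A[i] = A[i + 1]. With step 2 the references never meet.
with_step = gcd_test_with_step(1, 0, 1, 1, lower=0, step=2)
# The same subscripts under a unit step cannot be separated by this test.
unit_step = gcd_test_with_step(1, 0, 1, 1, lower=0, step=1)
```

A test that assumed a unit step would report a possible dependence for the first loop, so taking the increment into account directly can recover parallelism.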

18.
何倩  孟祥武  陈俊亮  沈筱彦 《软件学报》2011,22(10):2263-2278
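Data races are a hard problem in developing and testing concurrent programs. The authors find that data races can lead to repeated computation, and that this repetition degrades system performance. Starting from examples, the paper defines the concurrent computation redundancy problem (CCRP), gives the related performance metrics and detection methods, and designs a general mechanism for controlling concurrent redundancy. In general, concurrent programs can be analyzed for CCRP on the basis of the producer-consumer model. Taking a producer-consumer system with a data source as an example, the paper analyzes CCRP in detail and gives two redundancy control algorithms, single-condition and cross-condition. The algorithms have different scopes of application, and both can serve as fixed patterns for solving CCRP. Related properties are proved and simulated with Petri nets. Experiments on concurrent programs demonstrate the necessity and effectiveness of concurrent redundancy control and compare the two algorithms. The study is a useful reference for data race detection and concurrent program design.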

19.
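Simulation is an important method in computer architecture design. Loop statements make up the structure of SCMD programs, so a small amount of source code produces a large volume of traces and very long running times. Starting from this characteristic of source programs, this paper builds Rasbora, a trace simplification and simulation acceleration method based on loop reduction. Rasbora adds instructions to the program source code to record trace content selectively during loop execution, which simplifies the trace effectively. During simulation it recognizes the similarity among loop bodies and uses the simulation of a small number of loop bodies to approximate the execution of all iterations. Tests show that Rasbora effectively reduces trace volume and shortens simulation time while meeting a certain accuracy requirement.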

20.
刘金硕  黄朔  邓娟 《计算机工程》2022,48(12):16-23
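Using high-resolution images as input slows down image processing algorithms. Parallelizing an algorithm improves its efficiency, but manually converting a serial program into a parallel one is tedious, existing automatic parallel translation tools perform inconsistently, and the programs they produce use a single parallel mode. For the patch-based 3D multi-view stereo (PMVS) algorithm, an automatic two-level parallel translation method from C to CUDA is proposed. ANTLR is used to parse the C source code automatically; parallelizable loop structures are identified by analyzing data dependences and privatizing loop arrays; and the algorithm is translated into code with a two-level parallel structure that combines CPU multithreading and the GPU. During execution, the input images are processed on the CPU and the GPU separately, reducing the total execution time. Experimental results show that the speedup of the method grows with the input image resolution, peaking at about 32, a clear improvement over the PPCG and OpenACC automatic parallel translation methods.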
