Found 20 similar documents (search time: 46 ms)
1.
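To exploit the computing power of high-performance computers, ease the burden on programmers of designing and writing parallel programs, and enlarge the set of usable software, a method was designed and implemented that uses an interactive interface to uncover vectorizable statements in a program, optimize the vectorized statements in the generated code, and improve the execution efficiency of that code. The method is significant for fully exploiting high-performance computers, improving system usability, and broadening the range of applications, and it also provides effective auxiliary means and tool support. The progressive intelligent-backtracking vectorized code tuning architecture analyzes and transforms the serial program submitted by the user: it applies serial program analysis, data dependence analysis, and vectorization analysis, transforms and optimizes the program according to the analysis results, and automatically generates the final vectorized code. By analyzing the latent parallelism in the serial program and automatically transforming it into an equivalent vectorized form, the method greatly simplifies the programmer's work.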
2.
Michael Wolfe 《The Journal of supercomputing》1991,4(4):321-344
Data dependence concepts are reviewed, concentrating on and extending previous work on direction vectors. A bit vector representation of direction vectors is discussed. Various program restructuring transformations, such as loop circulation (a form of loop interchanging), reversal, skewing, sectioning (strip mining), combing, and rotation, are discussed in terms of their effects on the execution of the program, the required dependence tests for legality, and the effects of each transformation on the dependence graph. The bit vector representation of direction vectors is used to develop simple and efficient bit vector operations for the dependence tests and to generate the modified direction vector for each transformation. Finally, a simple method to interchange complex convex loop limits is given, which is useful when several loop restructuring operations are being applied in sequence. This work was supported by NSF Grant CCR-8906909 and DARPA Grant MDA972-88-J-1004.
3.
4.
Larry Carter Jeanne Ferrante Vasanth Bala 《International journal of parallel programming》1994,22(5):485-518
The ability to represent, manipulate, and optimize data placement and movement between processors in a distributed address
space machine is crucial in allowing compilers to generate efficient code. Data placement is embodied in the concept of data ownership. Data movement can include not just the transfer of data values but the transfer of ownership as well. However, most existing
compilers for distributed address space machines either represent these notions in a language- or machine-dependent manner,
or represent data or ownership transfer implicitly. In this paper we describe XDP, a set of intermediate language extensions
for representing and manipulating data and ownership transfers explicitly in a compiler. XDP is supported by a set of per-processor
structures that can be used to implement ownership testing and manipulation at run-time. XDP provides a uniform framework
for translating and optimizing sequential, data parallel, and message-passing programs to a distributed address space machine.
We describe analysis and optimization techniques for this explicit representation. Finally, we compare the intermediate languages
of some current distributed address space compilers with XDP.
5.
6.
7.
8.
9.
D. V. Efanov V. V. Sapozhnikov Vl. V. Sapozhnikov 《Automatic Control and Computer Sciences》2018,52(1):1-12
A fundamentally new approach to building a code with summation of on-data bits based on the selection and separate check of subsets of bits of the data vector is presented. The properties of the proposed code are analyzed in comparison with the classic and modified Berger codes. The advantages and disadvantages of new codes with summation of on-data bits are noted. The basic properties of the proposed codes with summation that should be taken into account in solving problems of technical diagnostics are established. The results of experimental applications of the developed codes to the organization of concurrent error detection systems of combinational benchmarks from LGSynth'89 are given.
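For readers unfamiliar with the baseline named in the abstract above, the classic Berger code can be sketched as follows. This shows only the classic scheme (the check word is the binary count of zero bits in the data word); it is not the new code the paper proposes. The function names are hypothetical.

```python
def berger_check(data):
    """Check bits of the classic Berger code: the number of 0s in the data
    word, written in binary with k = ceil(log2(m + 1)) bits."""
    m = len(data)
    k = m.bit_length()  # equals ceil(log2(m + 1)) for m >= 1
    zeros = data.count(0)
    return [(zeros >> i) & 1 for i in reversed(range(k))]

def encode(data):
    """Code word = data bits followed by check bits."""
    return list(data) + berger_check(data)

def is_valid(word, m):
    """Recompute the check bits from the first m bits and compare."""
    return word[m:] == berger_check(word[:m])

data = [1, 0, 1, 1, 0, 0, 1, 0]
word = encode(data)             # four zeros, so the check bits are 0100
corrupted = word.copy()
corrupted[1] = 1                # a single 0 -> 1 error in the data part
print(is_valid(word, len(data)), is_valid(corrupted, len(data)))  # True False
```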
10.
Many scientific codes can achieve significant performance improvement when executed on a computer equipped with a vector processor. Vector constructs in source code should be recognized by a vectorizing compiler or preprocessor. This paper discusses, from a general point of view, how a vectorizing compiler/preprocessor can be evaluated. The areas discussed include data dependence analysis, IF loop analysis, nested loops, loop interchanging, loop collapsing, indirect addressing, use of temporary storage, and order of arithmetic. The ideas presented are based on vectorization of over a million lines of production codes and an extensive test suite developed to evaluate preprocessors under varying degrees of code complexity. Areas for future research are also discussed.
11.
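Based on an analysis of the characteristics of fully blind watermarking, a watermark generation scheme is proposed that converts text directly into binary bits according to its character encoding. Drawing on the spatial positioning property of vector geographic data, the coordinate mapping mechanism used in watermark embedding and extraction and the principles for constructing the mapping function are studied. Using the proposed watermark generation and coordinate mapping methods, a fully blind watermarking algorithm for vector geospatial data is established. Experimental results show that the proposed algorithm resists attacks common for vector data, including data compression, vertex insertion, vertex deletion, editing, cropping, and translation, and is highly robust.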
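The watermark generation step in the abstract above (text converted to bits through its character encoding) can be sketched as follows. The choice of UTF-8, the most-significant-bit-first order, and the function names are assumptions; the abstract does not specify them, and the coordinate mapping step is not shown.

```python
def text_to_bits(text, encoding="utf-8"):
    """Turn text into a flat list of bits, most significant bit first."""
    return [(byte >> (7 - i)) & 1
            for byte in text.encode(encoding)
            for i in range(8)]

def bits_to_text(bits, encoding="utf-8"):
    """Inverse of text_to_bits; trailing bits that do not fill a byte are dropped."""
    usable = len(bits) - len(bits) % 8
    data = bytes(
        sum(bit << (7 - i) for i, bit in enumerate(bits[p:p + 8]))
        for p in range(0, usable, 8)
    )
    # errors="replace" keeps decoding from raising if extracted bits are damaged
    return data.decode(encoding, errors="replace")

bits = text_to_bits("A")
print(bits)                # [0, 1, 0, 0, 0, 0, 0, 1]
print(bits_to_text(bits))  # A
```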
12.
An Interleaving Transformation for Parallelizing Reductions for Distributed-Memory Parallel Machines
Reduction operations frequently appear in algorithms. Due to their mathematical invariance properties (assuming that round-off errors can be tolerated), it is reasonable to ignore ordering constraints on the computation of reductions in order to take advantage of the computing power of parallel machines. One obvious and widely-used compilation approach for reductions is syntactic pattern recognition. Either the source language includes explicit reduction operators, or certain specific loops are recognized as equivalent to known reductions. Once such patterns are recognized, hand-optimized code for the reductions is incorporated in the target program. The advantage of this approach is simplicity. However, it imposes restrictions on the reduction loops: no data dependence other than that caused by the reduction operation itself is allowed in the reduction loops. In this paper, we present a parallelizing technique, interleaving transformation, for distributed-memory parallel machines. This optimization exploits parallelism embodied in reduction loops through combination of data dependence analysis and region analysis. Data dependence analysis identifies the loop structures and the conditions that can trigger this optimization. Region analysis divides the iteration domain into a sequential region and an order-insensitive region. Parallelism is achieved by distributing the iterations in the order-insensitive region among multiple processors. We use a triangular solver as an example to illustrate the optimization. Experimental results on various distributed-memory parallel machines, including the Connection Machine CM-5, the nCUBE, the IBM SP-2, and a network of Sun workstations, are reported.
13.
Dror E. Maydan John L. Hennessy Monica S. Lam 《International journal of parallel programming》1995,23(1):63-81
Data dependence testing is the basic step in detecting loop level parallelism in numerical programs. The problem is undecidable
in the general case. Therefore, work has been concentrated on a simplified problem, affine memory disambiguation. In this
simpler domain, array references and loop bounds are assumed to be linear integer functions of loop variables. Dataflow information
is ignored. For this domain, we have shown that in practice the problem can be solved accurately and efficiently (1). This paper studies empirically the effectiveness of this domain restriction: how many real references are affine and flow
insensitive. We use Larus's llpp system (2) to find all the data dependences dynamically. We compare these to the results given by our affine memory disambiguation system.
This system is exact for all the cases we see in practice. We show that while the affine approximation is reasonable, memory
disambiguation is not a sufficient approximation for data dependence analysis. We propose extensions to improve the analysis.
This research was supported in part by a fellowship from AT&T Bell Laboratories and by DARPA contract N00014-87-K-0828.
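To make the affine memory disambiguation question in the abstract above concrete, here is a small brute-force sketch. It asks whether two references `A[a1*i + b1]` and `A[a2*j + b2]` can touch the same element for iterations within constant bounds. This is an illustrative oracle only, not the authors' disambiguation system; the function name, the single shared bound `[lo, hi]`, and the one-dimensional subscripts are assumptions.

```python
def affine_conflicts(a1, b1, a2, b2, lo, hi):
    """Return all iteration pairs (i, j), lo <= i, j <= hi, for which
    a1*i + b1 == a2*j + b2, i.e. both references hit the same element."""
    pairs = []
    for i in range(lo, hi + 1):
        target = a1 * i + b1 - b2      # need a2 * j == target
        if a2 == 0:
            if target == 0:            # second subscript is constant and matches
                pairs.extend((i, j) for j in range(lo, hi + 1))
            continue
        j, rem = divmod(target, a2)
        if rem == 0 and lo <= j <= hi:
            pairs.append((i, j))
    return pairs

# A[2*i] against A[2*j + 1]: even and odd elements never overlap.
print(affine_conflicts(2, 0, 2, 1, 0, 10))   # []
# A[i + 1] against A[j]: iteration i conflicts with iteration j = i + 1.
print(affine_conflicts(1, 1, 1, 0, 0, 3))    # [(0, 1), (1, 2), (2, 3)]
```

As the abstract notes, such a test ignores dataflow: it reports that two references may address the same memory, not that a value actually flows between them.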
14.
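A Dependence Analysis Method for Concurrent Programs   Cited by: 12 (self-citations: 0, citations by others: 12)
Dependence analysis of concurrent programs is an important means of analyzing, understanding, debugging, testing, and maintaining them. Because the execution of concurrent programs is nondeterministic, many difficulties remain unsolved. Targeting the Ada tasking mechanism, this paper first proposes a concise and effective representation of concurrent programs, the concurrent program flow graph. It then discusses the synchronization dependence caused by inter-task synchronization and the inter-task data dependence caused by access to shared variables, and builds a concurrent program dependence graph. On this basis, an effective dependence analysis algorithm for concurrent programs is given, which yields fairly precise dependences and handles well the non-transitivity of dependence relations in concurrent programs.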
15.
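To address the problem that a large body of legacy code cannot be reused, a new compilation tool is designed that converts serial C code into hybrid parallel code based on MPI+OpenMP, lowering the development cost of parallel programming. First, by optimizing JavaCC, a lexer and parser capable of parsing C are implemented; they analyze the source code and generate an abstract syntax tree. Second, control dependence and data dependence analysis is performed on the source code according to the syntax tree, producing partitions of parallelizable statement blocks. Third, the target code is obtained with the proposed parallel code generation method. Finally, a simulation and verification environment for the target code is built on Visual Studio 2010. Experimental results show that the tool achieves automatic parallelization of serial code reasonably well, with a speedup deviation of 8.2% to 18.4% relative to hand-written code.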
16.
17.
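A Practical Data Dependence Analysis Method   Cited by: 2 (self-citations: 1, citations by others: 1)
Data dependence analysis is the basic step in detecting loop-level parallelism in programs. Based on a classification of array subscript pairs, this paper proposes a practical and effective data dependence analysis scheme. Existing dependence testing algorithms all assume loop normalization. Because this assumption has certain drawbacks, we drop it and allow the loop increment to be an arbitrary integer expression. To this end, the definitions related to dependence are suitably modified and some important results are re-derived. To handle loop increments that are variables or expressions, weak forms of the GCD and Banerjee tests are given. The scheme has been implemented in PORT.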
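As background for the GCD test mentioned in the abstract above, the following sketch shows the classic GCD test with constant loop strides folded into the coefficients. It is not the weak-form test from the paper, which handles increments that are variables or expressions; the function name and parameter layout are assumptions. For references `A[a1*i + b1]` and `A[a2*j + b2]` with `i = lo1 + s1*k` and `j = lo2 + s2*l`, an integer solution can exist only if `gcd(a1*s1, a2*s2)` divides the constant difference.

```python
from math import gcd

def gcd_test(a1, b1, a2, b2, lo1=0, s1=1, lo2=0, s2=1):
    """Return False when the two references provably never overlap,
    True when a dependence cannot be ruled out (loop bounds are ignored)."""
    c1, c2 = a1 * s1, a2 * s2                  # coefficients of k and l
    rhs = (a2 * lo2 + b2) - (a1 * lo1 + b1)    # c1*k - c2*l must equal rhs
    g = gcd(c1, c2)                            # math.gcd is non-negative
    if g == 0:                                 # both subscripts are constant
        return rhs == 0
    return rhs % g == 0

# A[2*i] against A[2*j + 1]: gcd 2 does not divide 1, so no dependence.
print(gcd_test(2, 0, 2, 1))                       # False
# A[i] against A[j] with unit strides: dependence cannot be ruled out.
print(gcd_test(1, 0, 1, 0))                       # True
# Same subscripts, but i = 0, 2, 4, ... and j = 1, 3, 5, ...: independent.
print(gcd_test(1, 0, 1, 0, lo1=0, s1=2, lo2=1, s2=2))  # False
```

The last call shows why the stride matters: a test that assumes normalized unit-stride loops would report a possible dependence for the same subscripts.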
18.
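Data races are a difficult problem in developing and testing concurrent programs. It is found that data races can cause redundant computation, and this redundancy degrades system performance. Starting from examples, the concurrent computation redundancy problem (CCRP) is defined, related performance metrics and detection methods are given, and a general concurrent redundancy control mechanism is designed. Concurrent programs can generally be analyzed for CCRP on the basis of the producer-consumer model. Taking a producer-consumer with a data source as an example, CCRP is analyzed in detail and two redundancy control algorithms, single-condition and cross-condition, are given. The algorithms have different ranges of applicability, and both can serve as fixed patterns for solving CCRP; their relevant properties are proved and simulated using Petri nets. Experiments on concurrent programs demonstrate the necessity and effectiveness of concurrent redundancy control and compare the two algorithms. The study is a useful reference for data race detection and concurrent program design.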
19.
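Simulation is an important method in computer architecture design. Loop statements shape the program structure of SCMD, so that a small amount of source code produces a large volume of trace and very long running times. Starting from this characteristic of source programs, this paper builds Rasbora, a trace simplification and simulation acceleration method based on loop reduction. Rasbora adds instructions to the program source code to selectively record trace content during loop execution, thereby effectively simplifying the trace; during simulation, it identifies the similarity exhibited by loop bodies and approximates the execution of all iterations by simulating a small number of them. Tests show that Rasbora effectively reduces trace volume and shortens simulation time while meeting a certain accuracy requirement.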
20.
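Using high-resolution images as input slows image processing algorithms. Parallelizing an algorithm improves its execution efficiency, but manually converting a serial program into a parallel one is tedious, existing automatic parallel translation tools perform inconsistently, and the translated programs use a single parallel mode. For the patch-based multi-view stereo (PMVS) algorithm, an automatic two-level parallel translation method from C to CUDA is proposed. ANTLR is used to parse the source C code automatically; parallelizable loop structures are identified by analyzing data dependences and privatizing loop arrays; and the algorithm is translated into code with a two-level parallel structure of CPU multithreading and GPU. During execution, the input images are processed on the CPU and the GPU separately, which reduces the total execution time. Experimental results show that the speedup of the method increases with the input image resolution, reaching a maximum of about 32, a clear improvement over the PPCG and OpenACC automatic parallel translation methods.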