
1.

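丁强, 臧斌宇, 朱传琪. 《计算机工程与应用》 (Computer Engineering and Applications), 2005, 41(27): 62-65, 183

Data partitioning is a key technique for parallelizing compilers on distributed-memory systems. It operates on arrays and the nested loops that reference them, and its fundamental goals are to improve data locality and to expose computational parallelism. Traditional data partitioning schemes do not suit arrays of pointers that point to arrays. This paper proposes a partitioning scheme for such pointer arrays, referred to as data partitioning of array vectors. It analyzes the characteristics of their data references and, by selecting representative elements, derives a partitioning strategy, filling a gap in existing data partitioning research.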

2.

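A Dynamic Computation and Data Decomposition Method Based on Linear Transformations

韩林, 赵荣彩, 庞建民. 《计算机工程》 (Computer Engineering), 2008, 34(15): 4-6

During parallelism detection in a parallelizing optimizing compiler, many serial codes admit no globally consistent decomposition, so data redistribution is unavoidable and an effective method for dynamically decomposing computation and data is needed. This paper studies computation and data decomposition algorithms for a single nested loop, along with the representation of decomposition results, and proposes a dynamic decomposition algorithm that finds linearly consistent data distributions across multiple nested loops. Combined with structural analysis and control-flow information of the program, the algorithm is applied to the parallel decomposition of general serial code and yields both the computation partition and the data distribution.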

3.

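An Algorithm Implementation for HPF Computation Partitioning

仲跻冬, 李晓明. 《计算机工程与科学》 (Computer Engineering & Science), 1997, 19(2): 55-58

HPF (High Performance Fortran) is a parallel language based on data partitioning directives. Determining the computation partition of a program from its data partition is the first basic problem an HPF compiler must solve. This paper introduces the concepts of data partitioning and computation partitioning in HPF and, using a three-level nested loop as an example, presents an intuitive algorithm for deriving the computation partition.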
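
The abstract above does not spell out the algorithm, so the following is only a minimal sketch of the general idea of deriving a computation partition from a data partition. It assumes the conventional owner-computes rule and a BLOCK distribution of the rows of the result array; the loop (a matrix-multiply nest) and the names `block_owner` and `computation_partition` are hypothetical and not taken from the paper.

```python
def block_owner(index, n, procs):
    """Processor that owns element `index` when n elements are BLOCK-distributed."""
    block = -(-n // procs)  # ceiling division gives the block size
    return index // block


def computation_partition(n, procs):
    """Assign each iteration (i, j, k) of the three-level loop
    C[i][j] += A[i][k] * B[k][j] to the processor that owns row i of C
    (owner-computes rule; assumed here, not confirmed by the abstract)."""
    work = {p: [] for p in range(procs)}
    for i in range(n):
        for j in range(n):
            for k in range(n):
                work[block_owner(i, n, procs)].append((i, j, k))
    return work


if __name__ == "__main__":
    work = computation_partition(4, 2)
    for proc, iterations in work.items():
        rows = sorted({i for i, _, _ in iterations})
        print(f"processor {proc}: {len(iterations)} iterations, rows {rows}")
```

Under these assumptions each processor executes exactly the iterations that write to data it owns, so the data partition alone determines the computation partition.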

4.

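A Parallelism Detection Technique Based on Nested Loop Classification

赵捷, 赵荣彩, 丁锐, 黄品丰. 《软件学报》 (Journal of Software), 2012, 23(10): 2695-2704

Traditional distributed-memory parallelizing compilation systems are mostly built on top of shared-memory ones. The parallelism detection techniques of shared-memory systems suit OpenMP code generation and process all nested loops with the same detection method; applied to distributed-memory systems, they inevitably fail to exploit program parallelism efficiently. A distributed-memory parallelizing compilation system should instead classify nested loops by their structural characteristics and use detection techniques suited to MPI code generation. To this end, based on the structure of nested loops and the characteristics of MPI parallel programs, this paper proposes a new nested loop classification method, together with a corresponding parallelism detection technique for each class. Experimental results show that, compared with a distributed-memory system using traditional detection techniques, a compiler that classifies nested loops with the proposed method and applies the corresponding techniques identifies parallel loops in benchmark programs more efficiently, and the speedup of the automatically generated MPI code improves by more than 20%.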

5.

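An Effective Data Distribution Method for Distributed-Memory Multiprocessors (cited by: 1 total, 0 self-citations, 1 by others)

夏军, 杨学军. 《计算机工程与科学》 (Computer Engineering & Science), 2005, 27(10): 73-76

Addressing the data distribution problem on distributed-memory multiprocessors, and assuming the program has already undergone parallelism analysis, this paper proposes an effective data distribution method based on data transformation techniques. The method can effectively distribute arrays of arbitrary dimension with general affine subscripts across multiple nested loops, and it also handles the alignment of offset constants, thereby keeping the data communication volume as small as possible. Experimental results demonstrate the effectiveness of the method.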

6.

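A Distribution Reduction Algorithm for Incremental Target Information Systems

《计算机应用与软件》 (Computer Applications and Software), 2015, (8)

Knowledge reduction is one of the important research topics in rough set theory. Static distribution reduction algorithms are ill-suited to rapidly growing information systems because their time cost is too high. By studying how newly added objects affect the partitions of an information system, and using partitions as a bridge, this paper discusses the theoretical relationship between objects and distribution reducts. Building on an existing distribution reduction algorithm, it proposes a distribution reduction algorithm for incremental target information systems that combines existing knowledge with newly added data to obtain distribution reducts quickly. Tests on data sets from the UCI repository verify the effectiveness and feasibility of the incremental algorithm.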

7.

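An Automatic Parallelization Framework for Complex Nested Loops Based on LLVM Pass

马春燕, 吕炳旭, 叶许姣, 张雨. 《软件学报》 (Journal of Software), 2023, 34(7): 3022-3042

With the widespread adoption of multi-core processors, automatic parallelization of serial code in embedded legacy systems has become a research hotspot. Automatically parallelizing complex nested loops that have imperfect nesting and non-affine dependences remains technically challenging. This paper proposes CNLPF, an automatic parallelization framework for complex nested loops based on LLVM Pass. First, it proposes a representation model for complex nested loops, the loop structure tree, and automatically converts the regular regions of nested loops into this representation. Next, it performs data dependence analysis on the loop structure tree to construct intra-loop and inter-loop dependences. Finally, it generates parallel loop programs based on the OpenMP shared-memory programming model. For six program cases from the SPEC2006 data set, containing nearly 500 complex nested loops, the proportion of complex nested loops was measured and the parallel speedup was tested. The results show that the proposed framework can handle complex nested loops that LLVM Polly cannot optimize, strengthening LLVM's parallelizing optimization capability, and that combining the method with Polly improves the speedup by 9%-43% over Polly alone.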

8.

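Research and Implementation of a Distributed Data Management Platform for Seismic Data (cited by: 1 total, 0 self-citations, 1 by others)

钟敏, 王海霞, 石进, 仝兆岐. 《计算机应用与软件》 (Computer Applications and Software), 2013, (2): 248-252

Using data grid technology for parallel processing of massive seismic data requires solving the problem of distributed data management across dynamic, autonomous, and heterogeneous high-performance computing systems. Based on the current state of resources in the petroleum field, this paper divides the field's high-performance resources into different virtual communities, establishes a distributed data management architecture, and describes in detail key techniques such as the metadata model and the replica location algorithm. A Web portal-based seismic data management platform has been implemented and deployed; the system runs stably and offers good operability and scalability.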

9.

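Design and Implementation of Information Integration for Distributed Heterogeneous Databases

宫彦婷. 《福建电脑》 (Fujian Computer), 2009, 25(5): 155-156

This paper studies integration methods for heterogeneous distributed databases, EAI integration patterns, and data integration middleware technology. It proposes an EAI-extended integration pattern for distributed heterogeneous data and describes in detail the design and implementation of database integration middleware under this pattern. Application results show that the middleware achieves a high degree of information integration, is highly extensible, offers good stability and real-time performance, and merits wider adoption.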

10.

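Optimizing Out-of-Core Programs with Compiler Techniques

李淼, 张建, 张红艳, 许桂艳, 徐大庆, 胡泽林, 袁媛. 《计算机应用》 (Journal of Computer Applications), 2007, 27(5): 1241-1244

This paper describes a transformation technique for out-of-core programs that combines two compiler optimizations, loop transformation and data transformation, to enhance program locality and improve data access efficiency. The method can optimize not only a single nested loop but also multiple nested loops simultaneously. Experimental results show that it significantly improves the performance of out-of-core computation.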

11.

A Partitioning-Independent Paradigm for Nested Data Parallelism

Dean Engelhardt, Andrew Wendelborn. 《International Journal of Parallel Programming》, 1996, 24(4): 291-317

A generalization of the data parallel model has been proposed by Blelloch which permits the nesting of data parallel operators to specify parallel computation across nested and irregular data structures. In this paper we consider the costs of supporting the general model of nested data parallelism, analyzing the requirements such a model places upon an underlying model of execution. We propose a new multi-node execution model which meets the needs of the paradigm and is additionally generic in the partitioning of data aggregates within the system. The basis for our execution model is an abstract machine based upon elementary notions of nodal multi-threading. We demonstrate the utility of our proposal by providing a full definition for a simple nestable one-dimensional data parallel operator. We discuss the applicability of our design to existing multi-processor machines, illustrating performance statistics gathered from a prototype system we have constructed on the Thinking Machines CM-5.

12.

Statement-Level Communication-Free Partitioning Techniques for Parallelizing Compilers (cited by: 3 total, 3 self-citations, 0 by others)

Shih Kuei-Ping, Sheu Jang-Ping, Huang Chua-Huang. 《The Journal of Supercomputing》, 2000, 15(3): 243-269

This paper addresses the problem of communication-free partitioning of iteration spaces and data spaces along hyperplanes. To find more communication-free hyperplane partitions, we treat statements within a loop body as separate schedulable units. Instead of using information about data dependence distance or direction vectors, our technique explicitly formulates array references as transformations from statement-iteration spaces to data spaces. Based on these transformations, necessary and sufficient conditions for the feasibility of communication-free partitioning along hyperplanes are proposed. The approach can be applied to all programs with an imperfectly nested loop or sequences of imperfectly nested loops whose array references are affine functions of outer loop indices or loop-invariant variables. The proposed approach is more practical than existing methods in finding data and computation distribution patterns that let the processors of a multicomputer execute fully in parallel without any interprocessor communication.

13.

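A Partitioning Algorithm Based on Representative Elements

张为华, 王鹏, 臧斌宇, 朱传琪. 《计算机学报》 (Chinese Journal of Computers), 2008, 31(3): 400-410

Partitioning is an optimization that assigns the different computations and data of a program to different processors of a parallel system, so as to make full use of the system's computing resources and speed up the program. The quality of the partitioning critically affects a program's execution efficiency on a parallel system, so partitioning has long been a hot topic in parallel computing research. However, certain characteristics of applications pose major obstacles to solving the partitioning problem effectively, such as imperfectly nested loops, overlap among multiple references to a non-read-only array within one statement, and references to the same array with different strides in different statements. Existing partitioning algorithms cannot automatically partition programs with these characteristics. Although intuitive partitioning strategies exist when such programs are optimized by hand, these strategies cannot be applied in a compiler to guide automatic partitioning. Based on the characteristics of such programs, this paper proposes a partitioning algorithm based on representative elements. The algorithm uses the array references that actually affect the partitioning as representative elements to construct the constraints of the various partitions, and thereby partitions the program. In addition, by searching for the maximally consistent data partitioning direction, it effectively reduces the data reorganization communication incurred during partitioning. The algorithm has been implemented in AFT2004 and achieves good results on applications.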

14.

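A Multi-Dimensional Parallelism Detection Method for Nested Loops on Many-Core Processors

李颖颖, 庞建民, 李雁冰, 翟胜伟. 《计算机应用研究》 (Application Research of Computers), 2018, 35(11)

Existing parallelism detection methods have shortcomings when applied to many-core processors: when the loop dimension selected for parallelization has few iterations, severe load imbalance may result. To address this, the paper proposes a multi-dimensional parallelism detection method for many-core processors. When existing methods cannot achieve good load balance, it selects multiple dimensions of the nested loop for parallelization and merges the iteration spaces of these parallel dimensions before partitioning tasks, reducing the impact of load imbalance on parallel efficiency. The method has been implemented in the automatic parallelization system developed by the authors' research group, and in practice it improves the parallel execution efficiency of some applications on many-core processors.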
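
The effect of merging iteration spaces before task partitioning can be illustrated with a small sketch. It assumes a two-level loop nest in which both dimensions are parallel and every iteration costs the same; the helper names and the loop bounds (6 by 100 iterations on 64 workers) are hypothetical and are not taken from the paper.

```python
def chunk_sizes(total, workers):
    """Split `total` iterations as evenly as possible among `workers`."""
    base, extra = divmod(total, workers)
    return [base + (1 if w < extra else 0) for w in range(workers)]


def outer_only_load(n_outer, n_inner, workers):
    """Inner iterations per worker when only the outer dimension is parallelized."""
    return [chunk * n_inner for chunk in chunk_sizes(n_outer, workers)]


def collapsed_load(n_outer, n_inner, workers):
    """Iterations per worker when both dimensions are merged into one space."""
    return chunk_sizes(n_outer * n_inner, workers)


def collapsed_to_indices(t, n_inner):
    """Map a linear index of the merged space back to the (i, j) loop indices."""
    return divmod(t, n_inner)


if __name__ == "__main__":
    outer = outer_only_load(6, 100, 64)
    merged = collapsed_load(6, 100, 64)
    print("outer only: max load", max(outer), "idle workers", outer.count(0))
    print("merged:     max load", max(merged), "idle workers", merged.count(0))
```

With only 6 outer iterations, 58 of the 64 workers stay idle and the busiest worker runs 100 iterations; after merging, every worker receives 9 or 10 iterations.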

15.

EIGENVECTORS-BASED PARALLELISATION OF NESTED LOOPS WITH AFFINE DEPENDENCES

《International Journal of Parallel, Emergent and Distributed Systems》, 2012, 27(3): 227-248

This paper presents a method for parallelising nested loops with affine dependences. The data dependences of a program are represented exactly using a dependence matrix rather than an imprecise dependence abstraction. By a careful analysis of the eigenvectors and eigenvalues of the dependence matrix, we detect the parallelism inherent in the program, partition the iteration space of the program into sequential and parallel regions, and generate parallel code to execute these regions. For a class of programs considered in the paper, the proposed method can expose more coarse-grain and fine-grain parallelism than a hyperplane-based loop transformation.

16.

Communication-free data allocation techniques for parallelizing compilers on multicomputers

Tzung-Shi Chen, Jang-Ping Sheu. 《Parallel and Distributed Systems, IEEE Transactions on》, 1994, 5(9): 924-938

In distributed memory multicomputers, local memory accesses are much faster than those involving interprocessor communication. For the sake of reducing or even eliminating the interprocessor communication, the array elements in programs must be carefully distributed to local memory of processors for parallel execution. We devote our efforts to the techniques of allocating array elements of nested loops onto multicomputers in a communication-free fashion for parallelizing compilers. We first analyze the pattern of references among all arrays referenced by a nested loop, and then partition the iteration space into blocks without interblock communication. The arrays can be partitioned under the communication-free criteria with nonduplicate or duplicate data. Finally, a heuristic method for mapping the partitioned array elements and iterations onto the fixed-size multicomputers under the consideration of load balancing is proposed. Based on these methods, the nested loops can execute without any communication overhead on the distributed memory multicomputers. Moreover, the performance of the strategies with nonduplicate and duplicate data for matrix multiplication is studied.
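
As a rough illustration of what "partitioning the iteration space into blocks without interblock communication" means, the sketch below takes a hypothetical loop `A[i][j] = A[i-1][j-1] + 1` (not an example from the paper) and checks two candidate partitions by brute force. The function names are illustrative, and the check is an enumeration, not the analysis the authors describe.

```python
from collections import defaultdict


def partition_by_diagonal(n):
    """Group iterations (i, j), 1 <= i, j < n, by the hyperplane i - j = c."""
    blocks = defaultdict(list)
    for i in range(1, n):
        for j in range(1, n):
            blocks[i - j].append((i, j))
    return dict(blocks)


def touched_elements(iterations):
    """Array elements written (A[i][j]) or read (A[i-1][j-1]) by the iterations."""
    elems = set()
    for i, j in iterations:
        elems.add((i, j))
        elems.add((i - 1, j - 1))
    return elems


def is_communication_free(blocks):
    """True if no array element is touched by two different blocks."""
    owner = {}
    for block_id, iterations in blocks.items():
        for elem in touched_elements(iterations):
            if owner.setdefault(elem, block_id) != block_id:
                return False
    return True


if __name__ == "__main__":
    diagonals = partition_by_diagonal(5)
    rows = {i: [(i, j) for j in range(1, 5)] for i in range(1, 5)}
    print("diagonal blocks communication-free:", is_communication_free(diagonals))
    print("row blocks communication-free:", is_communication_free(rows))
```

For this loop every read and write of one iteration lies on the same diagonal, so diagonal blocks need no data from each other, whereas a row-wise partition makes each block read elements written by the previous row's block.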

17.

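Parallel Optimization of Nested Loops Based on a Multi-Core Array Architecture

杨子煜, 严明, 赵鹏. 《计算机工程与科学》 (Computer Engineering & Science), 2009, 31(Z1)

Multi-core processors are widely used in high-performance computing, yet effectively converting traditional serial programs into parallel code and reducing the time spent in nested loops remains a challenge in the field. This paper first analyzes the dependence characteristics of nested loops using the polyhedral model and performs tiling, from which coarse-grained parallel code is generated automatically. Targeting the structural characteristics of multi-core array processors, it uses a genetic algorithm to generate communication-optimized sequences of tile tasks and, on this basis, builds an effective task scheduling model. Finally, the approach is applied to LU decomposition; the results show that, compared with traditional scheduling algorithms, it performs better at increasing data locality and achieving load balance.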
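
For readers unfamiliar with tiling, here is a minimal sketch of the basic operation only: splitting a rectangular two-dimensional iteration space into square tiles that can serve as coarse-grained tasks. It deliberately omits the dependence analysis, the polyhedral model, and the genetic-algorithm scheduling described in the abstract; the function name and sizes are hypothetical.

```python
def tile_iteration_space(n_rows, n_cols, tile):
    """Split an n_rows x n_cols iteration space into tiles of at most tile x tile points."""
    tiles = []
    for row0 in range(0, n_rows, tile):
        for col0 in range(0, n_cols, tile):
            tiles.append([
                (i, j)
                for i in range(row0, min(row0 + tile, n_rows))
                for j in range(col0, min(col0 + tile, n_cols))
            ])
    return tiles


if __name__ == "__main__":
    for index, points in enumerate(tile_iteration_space(5, 4, 2)):
        print(f"tile {index}: {len(points)} iterations, first point {points[0]}")
```

Each tile groups neighbouring iterations, which is what makes it a natural unit for locality and for coarse-grained scheduling; whether a given tiling is legal depends on the loop's dependences, which this sketch does not check.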

18.

Optimizing FORTRAN Programs for Hierarchical Memory Parallel Processing Systems

Jin Guohua, Chen Fujie. 《计算机科学技术学报》 (Journal of Computer Science and Technology), 1993, 8(3): 19-30

Parallel loops account for the greatest amount of parallelism in numerical programs. Executing nested loops in parallel with low run-time overhead is thus very important for achieving high performance in parallel processing systems. However, in parallel processing systems with caches or local memories in memory hierarchies, the “thrashing problem” may arise whenever data move back and forth between the caches or local memories in different processors. Previous techniques can only deal with the rather simple cases with one linear function in the perfectly nested loop. In this paper, we present a parallel program optimizing technique called hybrid loop interchange (HLI) for the cases with multiple linear functions and loop-carried data dependences in the nested loop. With HLI we can easily eliminate or reduce the thrashing phenomena without reducing the program parallelism.