
1.

An exact data dependence testing method for quadratic expressions

Jia-Hwa Wu Chih-Ping Chu 《Information Sciences》2007,177(23):5316-5328

2.

Data Dependence Analysis Techniques for Increased Accuracy and Extracted Parallelism (Cited by: 1; self-citations: 0; citations by others: 1)

Konstantinos Kyriakopoulos Kleanthis Psarris 《International journal of parallel programming》2004,32(4):317-359

3.

Improving the parallelism of iterative methods by aggressive loop fusion

Jingling Xue Minyi Guo Daming Wei 《The Journal of supercomputing》2008,43(2):147-164

Traditionally, loop nests are fused only when the data dependences in the loop nests are not violated. This paper presents a new loop fusion algorithm that is capable of fusing loop nests in the presence of fusion-preventing anti-dependences. All the violated anti-dependences are removed by automatic array copying. As a case study, this aggressive loop fusion strategy is applied to a Jacobi solver. The performance of iterative methods is typically limited by the speed of the memory system. Fusing the two loop nests in the Jacobi solver into one reduces data cache misses, and consequently, improves the performance results of both sequential and parallel versions of the Jacobi program, as validated by our experimental results on an HP AlphaServer SC45 supercomputer.
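
The fusion this abstract describes can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it assumes a 1-D Jacobi sweep, and the names `jacobi_step_unfused` and `jacobi_step_fused` are hypothetical. Fusing the two loops naively would let iteration i+1 read the already updated a[i], which violates an anti-dependence; here a single saved value (`prev`) stands in for the array copying mentioned in the abstract.

```python
def jacobi_step_unfused(a):
    """One 1-D Jacobi sweep written as two separate loops."""
    n = len(a)
    b = a[:]
    # Loop 1: compute new values from the old array.
    for i in range(1, n - 1):
        b[i] = (a[i - 1] + a[i + 1]) / 2
    # Loop 2: copy the new values back.
    for i in range(1, n - 1):
        a[i] = b[i]
    return a


def jacobi_step_fused(a):
    """The same sweep as a single fused loop (illustrative only)."""
    n = len(a)
    if n < 3:
        return a  # no interior points to update
    prev = a[0]  # copy of the OLD a[i - 1]
    for i in range(1, n - 1):
        old = a[i]                    # save before overwriting
        a[i] = (prev + a[i + 1]) / 2  # a[i + 1] has not been written yet
        prev = old                    # the next iteration needs the old a[i]
    return a
```

Both functions perform the same arithmetic in the same order, so they return identical results; the fused version traverses the array once instead of twice.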

4.

The extension of the I test

Weng-Long Chang Chih-Ping Chu 《Parallel Computing》1998,24(14):2101-2127

The I test is an efficient and precise data dependence method to ascertain whether integer solutions exist for one-dimensional arrays with constant bounds. For one-dimensional arrays with variable limits, the I test assumes that there may exist integer solutions. In this paper, we extend the I test. The extended I test can be applied towards determining whether integer solutions exist for one-dimensional arrays with variable limits, improving the applicable range of the I test. Experiments with benchmarks drawn from EISPACK, LINPACK, Parallel loops, etc. showed that among 1189 pairs of one-dimensional arrays tested, 183 had their data dependence analysis amended by the extended I test. That is, the extended I test increases the success rate of the I test by approximately 15.4%. Compared with the Power test and the Omega test, the extended I test is more accurate than the Power test and as accurate as the Omega test for these 1189 pairs of arrays, and is much more efficient than both of these well-known tests.
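
To make concrete what "whether integer solutions exist" means, the following Python sketch applies the classical GCD test and Banerjee bounds test to a linear dependence equation a1*x1 + ... + an*xn = c with constant bounds. These are simpler, well-known tests swapped in for illustration; the sketch is not the I test or its extension, and the function names are hypothetical.

```python
from functools import reduce
from math import gcd


def gcd_test(coeffs, c):
    """An integer solution can exist only if gcd(coeffs) divides c."""
    g = reduce(gcd, (abs(a) for a in coeffs), 0)
    if g == 0:
        return c == 0
    return c % g == 0


def banerjee_test(coeffs, bounds, c):
    """A solution can exist only if c lies between the minimum and
    maximum of the left-hand side over the given constant bounds."""
    lo = sum(a * (l if a >= 0 else u) for a, (l, u) in zip(coeffs, bounds))
    hi = sum(a * (u if a >= 0 else l) for a, (l, u) in zip(coeffs, bounds))
    return lo <= c <= hi


def may_depend(coeffs, bounds, c):
    """False means independence is proven; True means 'maybe dependent'."""
    return gcd_test(coeffs, c) and banerjee_test(coeffs, bounds, c)
```

For example, the references A[2i] and A[2j+1] give the equation 2i - 2j = 1, which `may_depend([2, -2], [(1, 10), (1, 10)], 1)` rejects because 2 does not divide 1. Both tests are conservative: a `True` result does not guarantee that a dependence exists.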

5.

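Research on Data Decomposition in Distributed Systems

沈亚楠 姚远 张平 赵荣彩 罗向阳 《计算机工程》(Computer Engineering) 2006,32(11):114-115,132

Data decomposition is critical for parallelizing compilers to achieve high performance on message-passing parallel machines. From the data decomposition (the mapping of data to processors) derived automatically by the compiler, C-language loop nests that send and receive messages can be generated, thereby distributing data among the processors. The paper uses a proven, powerful mathematical model to generate data decomposition code, and also gives a formal algorithm and its implementation. Preliminary experimental results show that the algorithm can significantly improve performance.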

6.

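Research on Loop Optimization for SIMD Extensions

侯永生 赵荣彩 黄磊 韩林 《计算机科学》(Computer Science) 2014,41(5):27-32

High-performance microprocessors commonly adopt SIMD vector extensions as compute acceleration units. Based on an in-depth study of the data dependence constraints of SIMD extensions, this paper proposes an extended Tarjan algorithm based on the reverse dependence graph, which improves the recognition rate of SIMD parallelism. Combined with traditional vectorization methods, it implements a loop optimization technique for SIMD extensions that eliminates the unnecessary data-reorganization overhead that non-vectorizable statements impose on vectorizable statements. Tests on real programs show that the method outperforms the ICC compiler in dependence-based SIMD parallelism detection, and that after loop optimization the generated SIMD code runs 12% faster on average.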

7.

Efficient compiler and run-time support for parallel irregular reductions

Hwansoo Han Chau-Wen Tseng 《Parallel Computing》2000,26(13-14)

Many scientific applications are comprised of irregular reductions on large data sets. In shared-memory parallel programs, these irregular reductions are typically computed in parallel using replicated buffers, then combined using synchronization. We develop LocalWrite, a new technique which partitions irregular reductions so that each processor computes values only for locally assigned data, eliminating the need for buffers or synchronized writes. Computation is replicated if its results are needed on multiple processors. We experimentally evaluate its performance for three irregular codes on a software DSM running on a distributed-memory multiprocessor and two shared-memory multiprocessors while varying connectivity, locality, and adaptivity. Results show LocalWrite improves performance significantly compared to using replicated buffers, and can match or exceed explicit message-passing gather/scatter for applications with low locality or high adaptivity.
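
The partitioning idea in this abstract can be sketched in Python as follows. This is a sequential simulation under stated assumptions, not the authors' implementation: the reduction is assumed to run over a list of edges, ownership is assumed to be a simple block distribution, and all function names are hypothetical.

```python
def sequential_reduction(n, edges, weight):
    """Reference irregular reduction: each edge updates both endpoints."""
    f = [0] * n
    for u, v in edges:
        w = weight(u, v)
        f[u] += w
        f[v] -= w
    return f


def local_write_reduction(n, edges, weight, num_procs):
    """Each simulated processor writes only the elements it owns.

    An edge whose endpoints have different owners is processed by both
    owners, so its weight is computed twice (replicated computation),
    but no element is ever written by two processors.
    """
    block = max(1, -(-n // num_procs))  # ceiling division

    def owner(x):
        return x // block

    f = [0] * n
    for p in range(num_procs):  # in a real program these would run in parallel
        for u, v in edges:
            if owner(u) != p and owner(v) != p:
                continue  # this edge touches no local data
            w = weight(u, v)
            if owner(u) == p:
                f[u] += w
            if owner(v) == p:
                f[v] -= w
    return f
```

Because each element has exactly one writer, the per-processor loops need neither replicated buffers nor synchronized writes; the cost is the repeated weight computation for edges that cross partitions.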

8.

Predecessor/successor approach for high-performance run-time wavefront scheduling

Tsung-Chuan Huang Po-Hsueh Hsu 《Information Sciences》2006,176(7):845-860

Most scientific applications rely on parallel multiprocessor computing to enhance performance. However, the irregular loops within these applications obstruct the parallelism analysis at compile-time. Rauchwerger et al. presented a run-time method to extract the hidden parallelism in a program using dependence chains. The overhead of this approach degrades its performance because of its large storage requirement and the volume of array references it must process. In this study, a new predecessor/successor approach is developed in which high-level predecessor/successor information is recorded and processed efficiently. A predecessor/successor table is constructed first in the inspector phase so that only the successor iterations in the current wavefront need to be examined, instead of the entire loop iterations during wavefront scheduling. Usually, the performance of the dependence chain approach degrades dramatically for a hot-spot access pattern, but our scheme works very efficiently in this case. The experimental results using synthetic code and real programs are presented to demonstrate the superiority of the proposed approach.
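
The inspector and wavefront scheduling described in this abstract can be sketched in Python. The sketch assumes each iteration reads one array element and writes one array element, and it uses plain level-by-level scheduling over a successor table; the data layout and function names are hypothetical and are not taken from the paper.

```python
def build_tables(reads, writes):
    """Inspector: record successors and predecessor counts per iteration."""
    n = len(writes)
    succ = [set() for _ in range(n)]
    pred_count = [0] * n
    last_write = {}  # element -> last iteration that wrote it
    last_reads = {}  # element -> iterations that read it since that write

    def add_edge(a, b):
        if a != b and b not in succ[a]:
            succ[a].add(b)
            pred_count[b] += 1

    for i in range(n):
        r, w = reads[i], writes[i]
        if r in last_write:
            add_edge(last_write[r], i)  # flow dependence
        if w in last_write:
            add_edge(last_write[w], i)  # output dependence
        for j in last_reads.get(w, []):
            add_edge(j, i)              # anti-dependence
        last_reads.setdefault(r, []).append(i)
        last_write[w] = i
        last_reads[w] = []
    return succ, pred_count


def wavefronts(succ, pred_count):
    """Executor schedule: only successors of the current wavefront are examined."""
    remaining = pred_count[:]
    current = [i for i, c in enumerate(remaining) if c == 0]
    fronts = []
    while current:
        fronts.append(sorted(current))
        nxt = []
        for i in current:
            for j in succ[i]:
                remaining[j] -= 1
                if remaining[j] == 0:
                    nxt.append(j)
        current = nxt
    return fronts
```

With `reads = [0, 1, 2, 1]` and `writes = [1, 2, 3, 4]`, iterations 1 and 3 both depend on iteration 0 and iteration 2 depends on iteration 1, so the schedule is `[[0], [1, 3], [2]]`. Iterations within one wavefront are mutually independent and could run in parallel.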

9.

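An Improved SIMD Vectorization Method for Control Flow

高伟 李颖颖 孙回回 李雁冰 赵荣彩 《软件学报》(Journal of Software) 2017,28(8):2046-2063

SIMD extensions are acceleration units that have been integrated into general-purpose processors in recent years, intended to exploit the data-level parallelism of multimedia and scientific computing programs. Control dependence hinders the exploitation of data-level parallelism. Current control-flow vectorization methods, whether loop-based or SLP-based, all require if-conversion and do not consider the vector parallelism inherent in the loop, so the generated vector code is inefficient. In addition, an imprecise cost model guiding control-flow vectorization also leads to inefficient vector code. To address this, an improved control-flow SIMD vectorization method is proposed. First, a loop distribution algorithm for loops with control dependences is proposed, which separates the vectorizable and non-vectorizable parts of a loop while taking data locality into account during distribution. Second, a method for directly vectorizing control flow is proposed, which considers vector reuse across basic blocks. Finally, a precise cost model is used to guide the generation of superword select instructions and superword conditional branch instructions. Experimental results show that, compared with existing control-flow vectorization methods, the vector code generated by the proposed method performs 24% better.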

10.

Interprocedural alias analysis: Implementation and empirical results

Herbert G. Mayer Michael Wolfe 《Software》1993,23(11):1201-1233

11.

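Automatic Generation of Data Decomposition Code in Distributed Systems

沈亚楠 姚远 龚雪容 张平 赵荣彩 《微计算机信息》(Microcomputer Information) 2005,(20)

The first problem faced by a parallelizing compiler developed for distributed-memory systems is how to decompose the data of an application. Because accessing data on non-local nodes is expensive, data decomposition must be considered carefully. Although definitions of data decomposition have been proposed, the literature has not given corresponding algorithms. This paper presents an algorithm for generating data decomposition code under a proven, powerful mathematical model, implemented in the Paraguin compiler on the SUIF (Stanford University Intermediate Format) system.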

12.

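A Loop Distribution Algorithm Based on Data Dependence Analysis in Automatic Vectorization

黄磊 姚远 侯永生 杨明 《计算机科学》(Computer Science) 2011,38(9):288-293

Loop distribution is an effective method for developing vectorized programs. However, because of data dependences in programs, it is very difficult for current auto-vectorizing compilers to implement complete loop distribution, so they generally adopt simple loop distribution methods. Based on data dependence analysis, this paper analyzes the vectorizability of program statements from the viewpoint of whether dependence cycles exist, proposes a loop distribution algorithm based on statement vectorization recognition, and implements it in automatic vectorization. With this method, the vectorizability of statements or dependence cycles can be fully analyzed, and loop distribution is then applied to place vectorizable and non-vectorizable statements in different loops. The method can handle loops that current auto-vectorizing compilers cannot vectorize, and achieves good results for some loops with dependences between statements.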
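
The idea of separating statements by whether they lie on a dependence cycle can be sketched with the textbook approach: find the strongly connected components of the statement dependence graph and emit one loop per component. The Python sketch below uses Tarjan's algorithm for this; it is an assumed illustration of the general technique, not the algorithm from the paper, and the graph encoding and function names are hypothetical.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm. graph maps a statement to the statements that
    depend on it. Components are returned in reverse topological order."""
    index, low = {}, {}
    stack, on_stack = [], set()
    sccs = []
    counter = [0]

    def visit(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, []):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == v:
                    break
            sccs.append(comp)

    for v in graph:
        if v not in index:
            visit(v)
    return sccs


def distribute(graph):
    """Return (statements, vectorizable) pairs, one per distributed loop,
    ordered so that every dependence source precedes its sink."""
    loops = []
    for comp in reversed(strongly_connected_components(graph)):
        cyclic = len(comp) > 1 or comp[0] in graph.get(comp[0], [])
        loops.append((sorted(comp), not cyclic))
    return loops
```

For a graph in which S1 feeds S2 and S2 and S3 depend on each other, `distribute` places S1 in its own vectorizable loop ahead of a non-vectorizable loop containing S2 and S3. The recursive `visit` is adequate for loop bodies of ordinary size but would need an explicit stack for very large graphs.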

13.

Non-strict independence-based program parallelization using sharing and freeness information

Daniel Cabeza Gras Manuel V. Hermenegildo 《Theoretical computer science》2009

14.

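A Framework for Interprocedural Data Flow Analysis

郁卫江 朱根江 谢立 《软件学报》(Journal of Software) 1997,8(9):653-662

This paper proposes a framework for interprocedural data flow analysis. It uses the hierarchical task graph (HTG) to represent and exploit the functional parallelism of programs. The framework defines a procedure table, ProcTable, and a procedure call graph in binary-tree form, BCG (binary call graph), to minimize the time and space cost of the algorithm.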

15.

Data locality and parallelism optimization using a constraint-based approach

Ozcan Ozturk 《Journal of Parallel and Distributed Computing》2011,71(2):280-287

Embedded applications are becoming increasingly complex and processing ever-increasing datasets. In the context of data-intensive embedded applications, there have been two complementary approaches to enhancing application behavior, namely, data locality optimizations and improving loop-level parallelism. Data locality needs to be enhanced to maximize the number of data accesses satisfied from the higher levels of the memory hierarchy. On the other hand, compiler-based code parallelization schemes require a fresh look for chip multiprocessors as interprocessor communication is much cheaper than off-chip memory accesses. Therefore, a compiler needs to minimize the number of off-chip memory accesses. This can be achieved by considering multiple loop nests simultaneously. Although compilers address these two problems, there is an inherent difficulty in optimizing both data locality and parallelism simultaneously. Therefore, an integrated approach that combines these two can generate much better results than each individual approach. Based on these observations, this paper proposes a constraint network (CN)-based formulation for data locality optimization and code parallelization. The paper also presents experimental evidence, demonstrating the success of the proposed approach, and compares our results with those obtained through previously proposed approaches. The experiments from our implementation indicate that the proposed approach is very effective in enhancing data locality and parallelization.

16.

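Parallelization Techniques and Tools

金国华 陈福接 《计算机研究与发展》(Journal of Computer Research and Development) 1996,33(7):481-492

Because program parallelization tools effectively solve code portability across many parallel machine architectures and greatly reduce the difficulty users face in using parallel machines, they have become a hot research topic in parallel processing. As parallel machine systems become more widely used, these tools are expected to continue to develop and improve. This paper focuses on the research history and current state of key parallelization techniques and tool systems, and offers some views on future trends in this research area.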

17.

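A Region Dependence Analysis Method Based on Non-Regular Domains (Cited by: 1; self-citations: 0; citations by others: 1)

朱根江 谢立 《计算机学报》(Chinese Journal of Computers) 1994,17(3):168-175

In automatic parallelizing compilation, the identification of parallelism is concentrated mainly at the loop and statement level, yet many programs can in fact improve performance by exploiting subroutine-level "task" parallelism. This paper proposes a region dependence analysis method based on non-regular domains to exploit this kind of parallelism. It can precisely characterize the data access regions of a program, overcoming the tendency of existing region analysis techniques to be conservative and thereby increasing the degree of parallelism. The dependence testing algorithm is simple and effective.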

18.

Analyzing reference patterns in automatic data distribution tools

Eduard Ayguadé Jesús Labarta Jordi Garcia Mercè Gironès Mateo Valero 《International journal of parallel programming》1995,23(6):515-535

19.

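A Communication Optimization Algorithm Based on Reordering Transformations and Loop Distribution

陈达智 赵荣彩 韩林 丁锐 赵捷 《计算机科学》(Computer Science) 2012,39(9):296-301

To address the problem that existing communication optimization algorithms cannot make MPI auto-parallelizing compilers generate message-passing programs with ideal speedup, a communication optimization algorithm based on reordering transformations and loop distribution is proposed. Using the given interprocedural side-effect sets and reordering transformation rules based on moving mpi_wait/mpi_irecv, the algorithm applies reordering transformations and loop distribution in order, safely enlarging as far as possible the window in which communication overlaps computation in point-to-point non-blocking communication, so that the MPI auto-parallelizing compiler generates message-passing code with more computation overlapping communication. Experimental results show that the algorithm can hide more point-to-point non-blocking communication overhead and noticeably improves the speedup of message-passing programs.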

20.

Communication-Free Alignment for Array References with Linear Subscripts in Three Loop Index Variables or Quadratic Subscripts

Chang Weng-Long Chu Chih-Ping Wu Jia-Hwa 《The Journal of supercomputing》2001,20(1):67-83
