首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
分布存储系统中优化通信的冗余计算分割   总被引:1,自引:0,他引:1  
针对并行循环套序列,提出一种冗余计算分割的通信优化方法,根据数据流分析,文中给出用以确定每个循环套的冗余计算量的一般方法,并在此基础上提出冗余计算分割的实现和判定,针对规则依赖的程序,该文还提出了一个高效的冗余计算分割的实现方法,该技术已经在一个并行编译器中实现,试验结果表明,它比传统的通信优化技术有明显的优越性。  相似文献   

2.
A solution to the problem of partitioning data for distributed memory machines is discussed. The solution uses a matrix notation to describe array accesses in fully parallel loops, which allows the derivation of sufficient conditions for communication-free partitioning (decomposition) of arrays. A series of examples that illustrate the effectiveness of the technique for linear references, the use of loop transformations in deriving the necessary data decompositions, and a formulation that aids in deriving heuristics for minimizing a communication when communication-free partitions are not feasible are presented  相似文献   

3.
We investigate the lattice-based array partitioning based on the theory of the Smith Normal Form and we present two elegant techniques for partitioning arrays in parallel DoAll loops for message-passing parallel machines: (1) DoAll loops with constant dependencies for communication-free partitioning: a general solution of all possible communication-free partitioning is derived where the dependencies among array references are described in constant distance vectors. (2) DoAll loops with non-constant dependencies for block-communication partitioning: the dependencies among array references are described in non-constant distance vectors. We derive the partitioning equations which allocate all remote data to a unique processor such that only one block-communication can obtain all the remote data for the computation. By using the Smith Normal Form decomposition, we are also able to verify our partitioning results.  相似文献   

4.
This paper addresses the problem of communication-free partition of iteration spaces and data spaces along hyperplanes. To finding more possible communication-free hyperplane partitions, we treat statements within a loop body as separate schedulable units. Instead of using the information about data dependence distance or direction vectors, our technique explicitly formulates array references as transformations from statement-iteration spaces to data spaces. Based on these transformations, the necessary and sufficient conditions for communication-free partition along hyperplanes to be feasible have been proposed. This approach can be applied to all programs with an imperfectly nested loop or sequences of imperfectly nested loops, whose array references are affine functions of outer loop indices or loop invariant variables. The proposed approach is more practical than existing methods in finding the data and computation distribution patterns that can cause the processor to execute fully-parallel on multicomputers without any interprocessor communication.  相似文献   

5.
Palamidessi has shown that the π-calculus with mixed choice is powerful enough to solve the leader election problem on a symmetric ring of processes. We show that this is also possible in the calculus of Mobile Ambients (MA), without using communication or restriction. Following Palamidessi's methods, we deduce that there is no encoding satisfying certain conditions from MA into CCS. We also show that the calculus of Boxed Ambients is more expressive than its communication-free fragment.  相似文献   

6.
This paper addresses the problem of partitioning the iterations of nested loops, and data arrays accessed by the loops. Hyperplane partitions of disjoint subsets of data arrays and loop iterations that result in the elimination of communication are sought. A characterization of necessary and sufficient conditions for communication-free hyperplane partitioning is provided.  相似文献   

7.
In distributed memory multicomputers, local memory accesses are much faster than those involving interprocessor communication. For the sake of reducing or even eliminating the interprocessor communication, the array elements in programs must be carefully distributed to local memory of processors for parallel execution. We devote our efforts to the techniques of allocating array elements of nested loops onto multicomputers in a communication-free fashion for parallelizing compilers. We first analyze the pattern of references among all arrays referenced by a nested loop, and then partition the iteration space into blocks without interblock communication. The arrays can be partitioned under the communication-free criteria with nonduplicate or duplicate data. Finally, a heuristic method for mapping the partitioned array elements and iterations onto the fixed-size multicomputers under the consideration of load balancing is proposed. Based on these methods, the nested loops can execute without any communication overhead on the distributed memory multicomputers. Moreover, the performance of the strategies with nonduplicate and duplicate data for matrix multiplication is studied  相似文献   

8.
We investigate the simulation preorder between finite-state systems and a simple subclass of BPP-nets (communication-free nets). We show EXPSPACE lower bounds for the simulation problems, in both directions, as well as for the simulation equivalence. Our results improve PSPACE and co-NP lower bounds for the simulation between finite-state systems and BPP-nets, given by Ku?era and Mayr in [A. Ku?era, R. Mayr, Simulation preorder over simple process algebras, Information and Computation 173 (2) (2002) 184-198].  相似文献   

9.
10.
11.
Before a conventional application is converted into a distributed one (typically a costly process), it is prudent to estimate the improvement in run time that will be achieved. Previous research has tended to ignore communications delays in order to facilitate analysis. However, such models lead to optimistic predictions and may be grossly inaccurate for problems involving fine-grained parallelism. In this paper, we consider distributed computation on a token ring local area network. We obtain exact analytical results for the mean speedup, both for smallnand for asymptotically largen. For largen, we show that under very general conditions speedup tends to a limiting value with increasing numbers of processors; i.e., there is a “communications speedup limit” that cannot be exceeded regardless of the number of processors. Because the token ring represents a limiting case for the effects of communications delays, results obtained thus provide an upper bound for speedup on Ethernets and other bus-type networks. Analytical results were verified by simulation.  相似文献   

12.
In this paper we explore compiler techniques for achieving efficient communications on circuit switching interconnection networks. We propose a compilation framework for identifying communication patterns and compiling these patterns as network configuration directives. This has the potential of providing significant performance benefits when connections can be established in the network prior to the actual communications. The framework includes a flexible and powerful communication pattern representation scheme that captures the property of communication patterns and allows manipulation of these patterns. In this way, communication phases can be identified within the application. Additionally, we extend the classification of static and dynamic communications to include persistent communications. Persistent communications are a subclass of dynamic communications that remain unchanged for large segments of the application execution. An experimental compiler has been developed to implement the framework. This compiler is capable of detecting both static and persistent communications within an application. We show that for the NAS Parallel Benchmarks, 100% of the point-to-point communications can be classified as either static or persistent and 100% of the collectives are either static or persistent with the exception of IS. Simulation-based performance analysis demonstrates the benefit of using our compiler techniques for achieving efficient communications in multiprocessor systems.  相似文献   

13.
Determination of data dependences is a task typically performed with high-level language source code in today's optimizing and parallelizing compilers. Very little work has been done in the field of data dependence analysis on assembly language code, but this area will be of growing importance, e.g., for increasing instruction-level parallelism. A central element of a data dependence analysis in this case is a method for memory reference disambiguation which decides whether two memory operations may access (or definitely access) the same memory location. In this paper we describe a new approach for the determination of data dependences in assembly code. Our method is based on a sophisticated algorithm for symbolic value propagation, and it can derive value-based dependences between memory operations instead of just address-based dependences. We have integrated our method into the Salto system for assembly language optimization. Experimental results show that our approach greatly improves the precision of the dependence analysis in many cases.  相似文献   

14.
The problem of identifying the conditions under which semantic or behavioural dependences arise between different program statements has interesting applications in various areas such as program understanding, software maintenance, software audits and software testing. We present an extension to the program dependence graph (PDG), called the dependence condition graph (DCG), that enables identifying the conditions for dependence between program points. We show that these conditions are not only correct with respect to the program's semantics, but also more precise than identified by other known techniques. We also present evidence that the DCG is a practical representation and can be built for large programs, and sketch many different applications of the DCG.  相似文献   

15.
C程序单元级依赖性分析   总被引:1,自引:1,他引:1  
程序依赖性分析是软件分析的一个基本内容,目前的相关工作大多集中在语句级的分析方面。人们同样需要单元级的依赖信息来考察单元间的信息流向及整个程序的构架。本文针对C程序中函数间的调用依赖、参数传递依赖、全局数据依赖以及文件间的包含依赖和外部变量定义依赖进行了分析,并提出单元依赖图表达这些关系。基于此图,本文采用基于信息论的方法度量了单元间的耦合性。单元依赖图中保留的函数调用间的互斥关系提高了度量的准确性。相关的分析思想和技术适用于分析使用其它高级程序设计语言编写的软件。  相似文献   

16.
A method for generating sequences of quasirandom numbers allows conventional serial Monte Carlo algorithms to be parallelized using a leapfrog scheme. Specifically, a Sobol' sequence can be broken up into interleaved subsets; with each processing node calculating a unique subset of the full sequence, all of the computational advantages of quasirandom Monte Carlo methods over pseudorandom algorithms or grid-based techniques are retained. Tests with several parallel supercomputers demonstrate that as many as 106integration points (up to 6 dimensions) can be generated per second per node in the optimal case where the number of nodes is a power of 2. The speed of communication-free parallel Sobol' sequence generators and the rapid convergence properties of quasirandom Monte Carlo schemes indicate that the method described here may be gainfully applied to a wide range of problems.  相似文献   

17.
We consider an automated agent that needs to coordinate with a human partner when communication between them is not possible or is undesirable (tacit coordination games). Specifically, we examine situations where an agent and human attempt to coordinate their choices among several alternatives with equivalent utilities. We use machine learning algorithms to help the agent predict human choices in these tacit coordination domains. Experiments have shown that humans are often able to coordinate with one another in communication-free games, by using focal points, “prominent” solutions to coordination problems. We integrate focal point rules into the machine learning process, by transforming raw domain data into a new hypothesis space. We present extensive empirical results from three different tacit coordination domains. The Focal Point Learning approach results in classifiers with a 40–80% higher correct classification rate, and shorter training time, than when using regular classifiers, and a 35% higher correct classification rate than classical focal point techniques without learning. In addition, the integration of focal points into learning algorithms results in agents that are more robust to changes in the environment. We also present several results describing various biases that might arise in Focal Point based coordination.  相似文献   

18.
通过给基于孤岛模型的并行粒子群算法引入K-means来进行子种群的划分。这不仅可以使一个子种群中的粒子位置相对集中,学习相对容易,而且可以提高搜索效率,使有限的时间用在最有效的搜索上。针对并行算法的特点,对其进行改进,在满足一定条件时才进行通信,这样可以避免无效通信,减少通信所花的时间。仿真结果证实,该算法具有较高的收敛速度和收敛精度。  相似文献   

19.
Inter‐iteration dependences in loops can hinder loop‐level parallelism. For some loops, existing thread‐level speculation techniques fail to expose their inherent loop‐level parallelism, because some inter‐iteration dependences are too costly to synchronize, predict, pre‐compute and isolate. This paper presents a compiler technique called loop recreation to change the nature of some dependences (by turning some inter‐iteration dependences into intra‐iteration ones and vice versa) in a loop so that the inter‐iteration dependences in the transformed loop are less costly to enforce at runtime than those in the original loop. We present an algorithm for finding an optimal loop recreation transformation with respect to a simple misspeculation cost model and demonstrate the performance advantages of loop recreation over two recent techniques for multicore systems running nine representative irregular applications. Copyright © 2009 John Wiley & Sons, Ltd.  相似文献   

20.
Presents the hierarchical task graph (HTG) as an intermediate parallel program representation which encapsulates minimal data and control dependences, and which can be used for the extraction and exploitation of functional, or task-level parallelism. The hierarchical nature of the HTG facilitates efficient task-granularity control during code generation, and thus applicability to a variety of parallel architectures. The construction of the HTG at a given hierarchy level, the derivation of the execution conditions of tasks which maximizes task-level parallelism, and the optimization of these conditions which results in reducing synchronization overhead imposed by data and control dependences are emphasized. Algorithms for the formation of tasks and their execution conditions based on data and control dependence constraints are presented. The issue of optimization of such conditions is discussed, and optimization algorithms are proposed. The HTG is used as the intermediate representation of parallel Fortran and C programs for generating parallel source as well as parallel machine code  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号