Similar literature
 Found 20 similar documents; search time 171 ms
1.
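To address the redundant communication generated to maintain data consistency during parallel compilation for distributed-memory computer systems, an optimized communication-solving algorithm is proposed. Based on dependence analysis and interprocedural data-flow analysis, the algorithm traverses the Define-Use graph to obtain more precise communication data and eliminates the redundant communication produced at procedure calls. Experimental results show that using the algorithm's output as the basis for back-end generation of MPI communication code effectively reduces communication volume, with a speedup close to that of hand-written MPI parallel programs.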

2.
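To improve the efficiency of parallel computation for contact problems, the parallelism of the internal-force and contact computations is analyzed, a method for constructing a multi-constraint contact graph based on edge-weight constraints is proposed, and the load balance and communication performance of the multi-constraint graph partitioning algorithm and the dual domain partitioning algorithm are compared and analyzed. Numerical experiments show that, for typical two-dimensional models, the multi-constraint graph partitioning algorithm balances load slightly less well than the dual domain partitioning algorithm, but it still keeps load imbalance within a good range, simplifies the communication process of the parallel computation, reduces total communication volume, and lowers the proportion of dynamic communication.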

3.
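To improve the performance of message-passing multiprocessors, optimizing interprocessor communication is critical for parallelizing compilers. This paper introduces communication optimization techniques built on exact array data-flow analysis. The optimization reduces the number of communications and lowers communication cost. Finally, an example shows that, under a given computation partition, deriving communication with exact data-flow analysis reduces communication volume more effectively than deriving it from the computation partition alone.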

4.
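Dataflow programming, as a programming paradigm, has been widely applied in many fields. However, differences among multicore architectures make dataflow programs hard to port across platforms. X10, a new parallel programming language, provides a unified parallel computing environment for different multicore architectures. How to exploit X10's features to improve the efficiency of dataflow programs has become a major difficulty in current research. This paper designs and implements a compiler optimization system targeting X10, which establishes three optimization algorithms: code generation optimization for the X10 language reduces the amount of generated X10 code; task partitioning optimization for synchronous dataflow graphs avoids deadlock while maintaining load balance and reduces communication overhead; and communication optimization for the underlying hardware resources distinguishes and optimizes inter-machine communication, inter-thread communication within a machine, and intra-thread communication, reducing communication overhead. Experimental results show that all three compiler optimization algorithms yield substantial performance improvements.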

5.
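Because the performance of a train communication network is directly affected by its topology, a bipartite-graph assignment algorithm based on inter-device traffic weights is proposed to solve the problem of assigning devices to switches in the network topology. First, a switched Ethernet model of the train is built from the actual communication among devices in the train communication network, yielding the traffic weights between devices. Then the bipartite-graph assignment algorithm uses these traffic weights to assign devices to switches, constructing a new switched train network topology. OPNET modeling and simulation are used to analyze the network performance of this structure. The results show that the optimized topology improves considerably on the unoptimized one in network delay, link utilization, and throughput, and can serve as a theoretical reference for research on train communication network topology optimization.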

6.
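A density-grid-based clustering algorithm suited to distributed data stream environments is proposed. Local sites rapidly update data stream information so that the grid space reflects changes in the current data stream. The central site receives and merges the local grid structures, then performs density-grid clustering and noise-grid optimization on the global grid structure to form the global clustering result. Experimental results show that the algorithm reduces network communication volume and improves global clustering accuracy.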
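The merge step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the function names `local_grid` and `merge_and_cluster`, the fixed `cell_size`, the `min_density` threshold, and the 4-neighbour connectivity rule are all assumptions made for illustration, and stream decay and noise-grid optimization are omitted. It only shows why sending per-cell counts instead of raw points keeps communication small.

```python
from collections import Counter

def local_grid(points, cell_size):
    """Summarise one site's 2-D points as per-cell counts (hypothetical helper).

    Only this small summary, not the raw points, would be sent to the centre.
    """
    return Counter((int(x // cell_size), int(y // cell_size)) for x, y in points)

def merge_and_cluster(local_grids, min_density):
    """Sum the local grids, keep dense cells, and group adjacent dense cells."""
    total = Counter()
    for grid in local_grids:
        total.update(grid)  # adds counts cell by cell
    dense = {cell for cell, n in total.items() if n >= min_density}

    clusters, seen = [], set()
    for start in sorted(dense):
        if start in seen:
            continue
        seen.add(start)
        stack, cluster = [start], set()
        while stack:  # depth-first search over 4-neighbour dense cells
            cx, cy = stack.pop()
            cluster.add((cx, cy))
            for nb in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                if nb in dense and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        clusters.append(cluster)
    return clusters

# Two sites, each summarising its own points before sending.
site_a = local_grid([(0.1, 0.1), (0.2, 0.3), (1.1, 0.2)], cell_size=1.0)
site_b = local_grid([(0.4, 0.4), (1.5, 0.5), (5.2, 5.1)], cell_size=1.0)
clusters = merge_and_cluster([site_a, site_b], min_density=2)
```

In this toy run the isolated cell `(5, 5)` falls below the density threshold and is dropped, while the two adjacent dense cells form a single cluster.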

7.
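In uncertain environments, the complexity of generating a cooperation policy for a multi-agent system (MAS) determines whether the cooperative task can be accomplished. To reduce the complexity of generating MAS cooperation policies with Markov decision models and to reduce cooperative communication, the method by which factored MDP models generate policy trees is improved. Exploiting the conditional independence and context-specific independence among agent states in a Bayesian network, the policy tree generated by the SPI algorithm is decomposed and optimized so that agents in independent states can run independently in a distributed manner and communicate only when they need to negotiate with other agents. Communication is end-to-end, and each agent knows not only the content and timing of the negotiation but also the goal of the cooperation. Experiments show that, with this cooperation policy, the MAS effectively reduces communication volume while completing the cooperative task and obtaining the target reward.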

8.
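Data-flow-based program analysis algorithms can effectively identify a program's data-processing flow, but for network programs that communicate using data encryption, data-flow analysis fails because it cannot accurately identify and extract the decrypted data. To address the extraction of decrypted data, an algorithm based on memory dependence degree is proposed. The work studies how to extract plaintext data from encrypted communication from the perspective of the memory dependence degree of decrypted data, and presents EncMemCheck, a prototype tool implementing the algorithm. Experiments compare and analyze the algorithm's strengths and weaknesses, and practical tests on the encrypted communication software UnrealIrcd verify its accuracy and practicality.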

9.
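Graph partitioning has been applied successfully in many fields, but when it is applied to parallel computing, communication volume is measured by the edge cut, whose main drawback is that it does not accurately represent communication volume. Moreover, the graph partitioning model does not consider how communication latency and the distribution of communication overhead affect parallel performance. An improved graph partitioning model is proposed that integrates several factors affecting parallel performance (communication latency, maximum local communication overhead, and overall communication overhead) into a single cost function. The model not only overcomes some shortcomings of the edge-cut metric in graph partitioning models but also, by adjusting weighting parameters, can handle different optimization objectives and emphasize the influence of different factors on parallel performance.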
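A small Python sketch can make the edge-cut objection concrete. It is an illustration only, not the cost function proposed in the paper: it assumes directed dependence edges `(u, v)` meaning "v needs u's value", and it assumes a vertex's value is sent at most once to each remote part. The names `edge_cut` and `communication_volume` are hypothetical.

```python
def edge_cut(edges, part):
    """Count dependence edges whose endpoints lie in different parts."""
    return sum(1 for u, v in edges if part[u] != part[v])

def communication_volume(edges, part):
    """Count distinct (producer vertex, remote part) pairs.

    Assumption: a value needed by several vertices in the same remote
    part is transferred once, so duplicate cut edges add no traffic.
    """
    sends = {(u, part[v]) for u, v in edges if part[u] != part[v]}
    return len(sends)

# Vertex 0 lives in part 0; its three consumers all live in part 1.
edges = [(0, 1), (0, 2), (0, 3)]
part = {0: 0, 1: 1, 2: 1, 3: 1}

cut = edge_cut(edges, part)                  # three edges are cut
volume = communication_volume(edges, part)   # but only one value is sent
```

Under these assumptions the edge cut reports three units of communication where a single transfer suffices, which is the kind of mismatch the abstract refers to.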

10.
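To reduce the complexity of generating MAS cooperation policies with Markov decision models and to reduce cooperative communication, a new optimization method for wireless sensor networks is proposed that exploits the conditional independence and context-specific independence among agent states. The method decomposes and optimizes the policy tree generated by the SPI algorithm so that agents in independent states can run independently in a distributed manner and communicate only when they need to negotiate with other agents. During cooperation, a Q-decomposition mechanism allocates shared resources, reducing resource-usage conflicts and obtaining greater reward. The method is validated using the STATLOGO software; experimental results show that it generates relatively little communication while the MAS completes the cooperative task and obtains the target reward.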

11.
A composite service is usually specified by means of a process model that captures control-flow and data-flow relations between activities that are bound to underlying component services. In mainstream service orchestration platforms, this process model is executed by a centralized orchestrator through which all interactions are channeled. This architecture is not optimal in terms of communication overhead and has the usual problems of a single point of failure. In previous work, we proposed a method for executing composite services in a decentralized manner. However, this and similar methods for decentralized composite service execution do not optimize the communication overhead between the services participating in the composition. This paper studies the problem of optimizing the selection of services assigned to activities in a decentralized composite service, both in terms of communication overhead and overall quality of service, and taking into account collocation and separation constraints that may exist between activities in the composite service. This optimization problem is formulated as a quadratic assignment problem. The paper puts forward a greedy algorithm to compute an initial solution as well as a tabu search heuristic to identify improved solutions. An experimental evaluation shows that the tabu search heuristic achieves significant improvements over the initial greedy solution. It is also shown that the greedy algorithm combined with the tabu search heuristic scales up to models of realistic size.

12.
A generalized mapping strategy that uses a combination of graph theory, mathematical programming, and heuristics is proposed. The authors use the knowledge from the given algorithm and the architecture to guide the mapping. The approach begins with a graphical representation of the parallel algorithm (problem graph) and the parallel computer (host graph). Using these representations, the authors generate a new graphical representation (extended host graph) on which the problem graph is mapped. An accurate characterization of the communication overhead is used in the objective functions to evaluate the optimality of the mapping. An efficient mapping scheme is developed which uses two levels of optimization procedures. The objective functions include minimizing the communication overhead and minimizing the total execution time, which includes both computation and communication times. The mapping scheme is tested by simulation and further confirmed by mapping a real-world application onto actual distributed environments.

13.
Minimizing communication and synchronization costs is crucial to the realization of the performance potential of parallel computers. This paper presents a general technique which uses a global data-flow framework to optimize communication and synchronization in the context of the one-way communication model. In contrast to the conventional send/receive message-passing communication model, one-way communication is a new paradigm that decouples message transmission and synchronization. In parallel machines with appropriate low-level support, this may open up new opportunities not only to further optimize communication, but also to reduce the synchronization overhead. We present optimization techniques using our framework for eliminating redundant data communication and synchronization operations. Our approach works with the most general data alignments and distributions in languages like High Performance Fortran (HPF) and uses a combination of the traditional data-flow analysis and polyhedral algebra. Empirical results for several scientific benchmarks on a Cray T3E multiprocessor machine demonstrate that our approach is successful in reducing the number of data (communication) and synchronization messages, thereby reducing the overall execution times.

14.
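Multicore processors are widely used in high-performance computing, yet effectively converting traditional serial programs into parallel code and reducing the time consumed by nested loops remain challenging problems in the field. This paper first analyzes the dependence characteristics of nested loops based on the polyhedral model and performs tiling, from which coarse-grained parallel code is generated automatically. Targeting the structural characteristics of multicore array processors, a genetic algorithm generates a communication-optimized sequence of tile tasks, on which an effective task scheduling model is built. Finally, the approach is applied to LU decomposition; the results show that, compared with traditional scheduling algorithms, it performs better at increasing data locality and achieving load balance.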

15.
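邓维, 李兆鹏. 《计算机科学》 (Computer Science), 2017, 44(2): 209-215
Symbolic execution, with its good precision control and code coverage, is widely used in static program analysis and in the automatic generation of high-coverage test cases. When analyzing a program, symbolic execution examines the program's data-flow and control-flow information by simulating real program execution, checks all states the program may reach, and produces the analysis result. High precision and high coverage require program states to be described concretely and completely, which leads to the state explosion problem common in symbolic execution. The paper first proposes an algorithm that merges concrete memory states across different execution paths, then applies a moderate abstraction to the memory model to widen the applicability of the state-merging algorithm, and finally discusses the practical effect of state merging and proposes an optimized solution for it. The proposed algorithm is implemented on the symbolic execution engine ShapeChecker and achieves good experimental results.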

16.
This paper presents a new method that can be applied by a parallelizing compiler to find, without user intervention, the iteration and data decompositions that minimize communication and load imbalance overheads in parallel programs targeted at NUMA architectures. One of the key ingredients in our approach is the representation of locality as a locality-communication graph (LCG) and the formulation of the compiler technique as a mixed integer nonlinear programming (MINLP) optimization problem on this graph. The objective function and constraints of the optimization problem model communication costs and load imbalance. The solution to this optimization problem is a decomposition that minimizes the parallel execution overhead. This paper summarizes the process of how the compiler extracts the locality information from a nonannotated code and focuses on how this compiler can derive the optimization problem, solve it, and generate the parallel code with the automatically selected iteration and data distributions. In addition, we include a discussion about our model and the solutions (the decompositions) that it provides. The approach presented in the paper is evaluated using several benchmarks. The experimental results demonstrate that the MINLP formulation does not increase compilation time significantly and that our framework generates very efficient iteration/data distributions for a variety of NUMA machines.

17.
Optimizing large join queries using a graph-based approach
Although many query tree optimization strategies have been proposed in the literature, there is still no formal and complete representation of all possible permutations of query operations (i.e., execution plans) in a uniform manner. A graph-theoretic approach presented in the paper provides a sound mathematical basis for representing a query and searching for an execution plan. In this graph model, a node represents an operation and a directed edge between two nodes indicates the order of executing these two operations in an execution plan. Each node is associated with a weight and so is each edge. The weight is an expression containing the parameters required for optimization, such as relation size, tuple size, and join selectivity factors. All possible execution plans are representable in this graph and each spanning tree of the graph becomes an execution plan. It is a general model which can be used in the optimizer of a DBMS for internal query representation. On the basis of this model, we devise an algorithm that finds a near-optimal execution plan using only polynomial time. The algorithm is compared with a few other popular optimization methods. Experiments show that the proposed algorithm is superior to the others under most circumstances.

18.
Partitioning and mapping of nested loops for linear array multicomputers
In distributed-memory multicomputers, minimizing interprocessor communication is the key to the efficient execution of parallel programs. In order to reduce the amount of communication overhead, parallel programs on multicomputers must be carefully scheduled by parallelizing compilers. This paper proposes some compilation techniques for partitioning and mapping nested loops with constant data dependences onto linear array multicomputers. First, a systematic partition strategy is proposed to project an n-dimensional computational structure, representing an n-nested loop, onto a line to form a one-dimensional projected structure with low communication overhead. Then, a mapping algorithm is proposed for mapping the partitioned loops onto linear arrays in a way that balances the workload and minimizes the communication cost among processors. Finally, parallel execution codes can be automatically generated for such linear array multicomputers.

19.
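江南, 汪吕蒙, 张晓瞳, 何炎祥. 《软件学报》 (Journal of Software), 2022, 33(6): 2115-2126
Iteratively computing the solution of data-flow equations is a common method of data-flow analysis. Computing dominators, and thereby identifying natural loops, is an important part of the optimization analyses of many modern compilers. Mechanically verifying an efficient algorithm for computing dominators is usually an indispensable part of obtaining a practical "verified compiler". To formally prove an efficient iterative algorithm for computing strict dominators (CHK), the work first builds a semilattice structure whose carrier is a set of reverse-sorted lists, where the list elements are the reverse-postorder numbers of nodes in the control-flow graph, and proves that it is a semilattice whose partial order satisfies the ascending chain condition. Using this semilattice, a worklist-based Kildall iterative algorithm is then implemented to compute strict dominators. Next, a definitional specification of dominators in the control-flow graph and related property theorems are given, and the important properties satisfied by the iterative algorithm are constructed and proved. Using these theorems, the soundness and completeness of the iterative algorithm are proved with respect to the definitional specification. The paper closes with a summary and a discussion of future work. The entire formal development uses the proof assistant Isabelle/HOL.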
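For readers unfamiliar with worklist-based dominator computation, the following Python sketch shows the general idea. It is a plain set-based variant, assumed here for illustration; it is neither the CHK algorithm over reverse-sorted lists nor the Isabelle/HOL development described in the abstract. The function name `dominators`, the `succ` adjacency-map format, and the example graph are hypothetical.

```python
def dominators(succ, entry):
    """Worklist iteration for dominator sets on a control-flow graph.

    succ maps each node to a list of its successors. Returns a dict
    mapping every node to the set of nodes that dominate it.
    """
    nodes = set(succ)
    for targets in succ.values():
        nodes |= set(targets)
    pred = {n: set() for n in nodes}
    for u, targets in succ.items():
        for v in targets:
            pred[v].add(u)

    # Start from the top element (all nodes) and shrink monotonically.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    worklist = [n for n in nodes if n != entry]
    while worklist:
        n = worklist.pop()
        new = set(nodes)
        for p in pred[n]:
            new &= dom[p]  # meet: intersect over all predecessors
        new.add(n)         # every node dominates itself
        if new != dom[n]:
            dom[n] = new
            # Successors may need to shrink too, so revisit them.
            worklist.extend(s for s in succ.get(n, ()) if s != entry)
    return dom

# A diamond with a back edge d -> a, which makes a natural loop headed by a.
cfg = {"entry": ["a"], "a": ["b", "c"], "b": ["d"], "c": ["d"],
       "d": ["a", "exit"], "exit": []}
dom = dominators(cfg, "entry")
strict_dom_exit = dom["exit"] - {"exit"}
```

Strict dominators are obtained by removing the node itself from its dominator set, as the last line does for `exit`. The back edge `d -> a` is recognizable as a loop edge because its target `a` dominates its source `d`.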

20.
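Dynamic spectrum access allows cognitive users to access unlicensed spectrum and can effectively improve the utilization of spectrum resources. Time overhead and fairness are the main criteria for evaluating spectrum allocation algorithms. Starting from the graph-coloring model, this paper builds an evaluation system and optimization objectives for coloring algorithms. To address fairness among users and the time overhead of allocation, a coloring algorithm based on weighted maximum independent sets is proposed, building on maximal independent sets. It achieves near-optimal user fairness, and its time overhead equals the number of channels and is independent of the number of cognitive users. Simulation analysis verifies the correctness of the algorithm.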
