首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到19条相似文献,搜索用时 140 毫秒
1.
给出了一种将单线程程序自动变为多线程程序的一种方法.该方法基于依赖性分析,将依赖性分析的结果用有向无环图表示出来,然后将有向无环图分解成多个存在依赖关系的集合,同一集合内的元素却不存在依赖关系,它们之间是可以并行执行的,集合间是有执行先后顺序的.然后将各个集合看作各个并行域,并行域内部的程序并行执行,处理并行域的时候,可以用各种模型,如:Thread and Lock,OPENMP等,最后生成相应的并行程序.  相似文献   

2.
汤志忠  张赤红  王剑 《软件学报》1995,6(Z1):148-156
分解式软件流水DESP是我们最近提出来的一种对无分支循环进行有效调度的新方法,它通过把循环调度分解为两个子问题,把无分支调度问题转化为无环路图的调度,从而运用图论中一些经典的复杂度为多项式的方法来解决.在本文中,我们把DESP方法扩展成可以优化带条件分支的循环,称为全局分解式软件流水方法——GDESP.研究结果表明,GDESP方法具有时间效益高和实用性好等优点,是一种有效实用的全局循环调度方法.  相似文献   

3.
汤志忠  王剑 《软件学报》1995,6(1):148-156
分解式软件流水DESP是我们最近提出来的一种对无分支循环进行有效调度的新方法,它通过把循环地分解为两个子问题,把无分支调度问题转化为无环路图的调度,从而运用图论中一些经典的复杂度为多项式的方法来解决。在本中,我们把DESP方法扩展成可以优化带条件分支的循环,称为全局分解式软件流水方法-GDESP。  相似文献   

4.
本文旨在介绍建立在良基集上的多重集合和多重序集。引入这些概念后,可较为简单和直观地取得循环程序的终结函数,这不仅便于循环程序的终结性证明,而且可将结构归纳法建于其上,方便使用间歇断言法验证循环程序的完全正确性。  相似文献   

5.
对于一类基于运行距离最短的车队调度问题,构建了问题的数学规划模型。由于模型难以直接求解,构造网络图对车队问题进行表述。通过求解车队调度网路图的最小生成树,去除最小生成树中车辆和车辆之间连接线,从而将问题分解为一个个单车辆调度问题。对于单车辆调度问题的处理,设计了最小权奇点边添加法。该方法通过构造奇点边集合,使单车辆调度网络图成为所有顶点均为偶点的多重图;进而寻找欧拉环,并删除欧拉环中的重复中间点,最终得到问题的求解方案。最后设计了实例,分别采用图解算法和禁忌搜索算法进行求解。对比发现图解算法在求解车辆调度问题方面具有一定的优越性。  相似文献   

6.
芦奉良  刘羽  张军 《计算机工程》2011,37(11):77-79,82
针对共享存储多处理机系统中各处理机负载不均衡的问题,提出一种新的任务调度算法--多重波前法.在任务图划分的基础上,采用分层调度方式对原波前法进行改进,通过对任务序列进行多重遍历和重组以降低各处理器的分配误差,利用循环调度算法提高任务调度结果的精度,并给出该算法的并行实现.实验结果证明,该算法具有较低的任务分配误差和较高...  相似文献   

7.
指令调度通过调整指令之间的顺序来提高指令级并行度(ILP)。然而基本块通常很小,因而潜在的ILP也很小。随着芯片设计技术的发展,现代的处理机所包含的资源却越来越丰富。指令调度只有跨越基本块的边界(即全局指令调度)才能够充分发挥处理机潜在的和程序中固有的ILP。全局指令调度可划分为有环和无环两种。该文介绍了无环全局指令调度的几种影响力较大的算法。同时还简单介绍了有关全局指令调度的新的热点。  相似文献   

8.
任务DAG图是刻画程序中各任务间依赖关系的一种手段,DAG图上除了标有任务间的依赖关系,还记录了各任务的计算量和任务之间的通信量,这些信息共同构成了任务调度的依据,国内外有许多基于任务DAG图的调度算法研究,但通过分析串行程序的相关性来构造任务DAG图的研究却不多见.分析了串行程序中存在的数据相关性和控制相关性,就程序中的顺序,分支,循环三种基本结构进行分别讨论,提出了一种串行程序任务DAG图的构造算法.  相似文献   

9.
操作系统内核作为软件系统的基础组件, 其安全可靠是构造高可信软件系统的重要环节, 但是, 在实际的验证工作中, 操作系统内核中全局性质的不变式定义, 复杂数据结构程序的形式化描述和验证仍存在很多困难. 本文针对操作系统内核中满足的全局性质, 在代码层以函数为单位, 用全局不变式进行定义, 并在不同的函数中进行形式化验证, 从而证明各个函数符合操作系统内核的全局性质; 针对操作系统内核中经常使用的复杂数据结构程序, 本文通过扩展形状图理论, 提出一种使用嵌套形状图逻辑的方法来形式化描述复杂数据结构程序, 并对该方法进行了正确性证明, 最终成功验证操作系统内核中关于任务创建与调度, 消息队列创建与操作相关的代码.  相似文献   

10.
李建勋  郭建华  李维乾  曹茂生 《计算机科学》2015,42(3):233-236, 251
对于网格系统中计算力调度等问题,结合有向无环作业图DATG和无向节点图UNG,采用并行集APS建立了一种基于二分图的网格调度算法BGS,并在惩罚策略、负载均衡、复活机制的引导下,使系统的调度动态地逐步趋向优化.实验结果表明:该算法能够更加适应网格资源的变化,降低作业负载,提高作业的并行化程度,并能根据系统负载合理地利用节点资源.  相似文献   

11.
介绍了一种为即时编译器和时空受限系统设计的轻量级线性复杂指令调度算法。该算法进行指令调度时,不基于传统的DAG图或表达式树,而是基于一种独创的数据结构扩展关联矩阵,其时间复杂性在最坏情况下也能与全部指令长度构成严格的线性关系,仅占用不到1 KB的内存空间。该算法已被Intel为Xscale设计的高性能J2ME虚拟机XORP采用为即时编辑器中的缺省指令调度算法。  相似文献   

12.
Many partitioned scientific programs can be modeled as iterative executions of computational tasks and represented by iterative task graphs (ITGs). An ITG may or may not have dependence cycles. In this paper, we consider the symbolic scheduling of ITGs on distributed memory architectures with nonzero communication overhead and propose heuristic algorithms for scheduling both cyclic and acyclic ITGs without searching an entire iteration space. Our approach incorporates techniques of software pipelining, graph unfolding, directed acyclic graph (DAG) scheduling, and load balancing. We analyze the asymptotic optimality of the algorithms to show that the derived schedules are competitive to optimal solutions. We also study the sensitivity of scheduling performance on inaccurate weights. Finally, we present experimental results to demonstrate the effectiveness of the optimization techniques  相似文献   

13.
Modulo scheduling is an efficient technique for exploiting instruction level parallelism in a variety of loops, resulting in high performance code but increased register requirements. We present an approach that schedules the loop operations for minimum register requirements, given a modulo reservation table. Our method determines optimal register requirements for machines with finite resources and for general dependence graphs. Measurements on a benchmark suite of 1327 loops from the Perfect Club, SPEC-89, and the Livermore Fortran Kernels show that the register requirements decrease by 24.8% on average when applying the optimal stage scheduler to the MRT-schedules of a register-insensitive modulo scheduler.  相似文献   

14.
To handle scheduling of tasks on heterogeneous systems, an algorithm is proposed to reduce execution time while allowing for maximum parallelization. The algorithm is based on multi-objective scheduling cuckoo optimization algorithm (MOSCOA). In this algorithm, each cuckoo represents a scheduling solution in which the ordering of tasks and processors allocated to them are considered. In addition, the operators of cuckoo optimization algorithm means laying and immigration are defined so that it is usable for scheduling scenario of the directed acyclic graph of the problem. This algorithm adapts cuckoo optimization algorithm operators to create proper scheduling in each stage. This ensures avoiding local optima while allowing for global search within the problem space for accelerating the finding of a global optimum and delivering a relatively optimized scheduling with the least number of repetitions. Moving toward global optima is done through a target immigration operator in this algorithm and schedules in each repetition are pushed toward optimized schedules to secure global optima. The results of MOSCOA implementation on a large number of random graphs and real-world application graphs with a wide range characteristics show MOSCOA superiority over the previous task scheduling algorithms.  相似文献   

15.
周静  曾国荪 《计算机工程》2007,33(20):15-17
并行编译的两大工作是程序代码划分和调度。对于调度问题,目前已有大量的解决方案,但是针对代码划分提取并行性的研究工作却非常少。该文提出了通过合并结点来划分DAG图的新的划分算法。实例分析证明,该算法是一种有效的、低复杂度的自适应代码划分解决方案,并且适用于异构计算的任务图划分。  相似文献   

16.
The lower and upper bounds on the minimum time needed to process a given directed acyclic task graph for a given number of processors are derived. It is proved that the proposed lower bound on time is not only sharper than the previously known values but also easier to calculate. The upper bound on time, which is useful in determining the worst case behavior of a given task graph, is presented. The lower and upper bounds on the minimum number of processors required to process a given task graph in the minimum possible time are also derived. It is seen with a number of randomly generated dense task graphs that the lower and upper bounds we derive are equal, thus giving the optimal time for scheduling directed acyclic task graphs on a given set of processors  相似文献   

17.
Dependence analysis algorithms have been proposed to identify parallelism in programs with tree-like data structures. However, they cannot analyze the dependence of statements if recursive data structures of programs are cyclic. This paper presents a technique to identify parallelism in programs with cyclic graphs. The technique consists of three steps: (1) traversal patterns that loops or recursive procedures traverse graphs are identified, and the statements that construct the links of traversal patterns will be located by definition–use chains of recursive data structures; (2) traversal-pattern-sensitive shape analysis is performed to estimate possible shapes of traversal patterns; (3) dependence analysis is performed to identify parallelism using the result of shape analysis. This approach can identify parallelism in programs with cyclic data structures due to the facts that many programs follow acyclic structures (i.e. traversal patterns) to access all nodes on the cyclic data structures. Once the traversal patterns are isolated from the overall data structures, dependence analysis can be applied to identify parallelism.  相似文献   

18.
Composite objects often involve recursive relationships, so-called bills-of-materials, which are cumbersome to handle in relational database systems. The relationships constitute a directed graph, where the successors of a node represent its components, recursively. Instead of the whole transitive closure (all ancestor-descendant pairs), the task is to retrieve the descendants of any given node. A simple relational solution is suggested, which packs information of the ancestor path of each node into a fixed-length code, called the signature. The code is nonunique, and its purpose is to define a relatively small superset of the descendants, as well as to establish a basis for clustering. It supports effective retrieval of the descendants, in terms of both disk accesses and DBMS calls. The method performs best for tree-structured graphs, where the processing time typically decreases by a factor of more than 10, compared to a simple loop of joins. Also, general directed graphs, both acyclic and cyclic, can be processed more effectively. The method is implemented on top of a relational system, but advantages can be gained on other platforms, too  相似文献   

19.
Global software pipelining is a complex but efficient compilation technique to exploit instruction-level parallelism for loops with branches.This paper presents a novel global software pipelining technique,called Trace Software Pipelining,targeted to the instruction-level parallel processors such as Very Long Instruction Word (VLIW) and superscalar machines.Trace software pipelining applies a global code scheduling technique to compact the original loop body.The resulting loop is called a trace software pipelined (TSP) code.The trace softwrae pipelined code can be directly executed with special architectural support or can be transformed into a globally software pipelined loop for the current VLIW and superscalar processors.Thus,exploiting parallelism across all iterations of a loop can be completed through compacting the original loop body with any global code scheduling technique.This makes our new technique very promising in practical compilers.Finally,we also present the preliminary experimental results to support our new approach.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号