共查询到19条相似文献,搜索用时 140 毫秒
1.
给出了一种将单线程程序自动变为多线程程序的一种方法.该方法基于依赖性分析,将依赖性分析的结果用有向无环图表示出来,然后将有向无环图分解成多个存在依赖关系的集合,同一集合内的元素却不存在依赖关系,它们之间是可以并行执行的,集合间是有执行先后顺序的.然后将各个集合看作各个并行域,并行域内部的程序并行执行,处理并行域的时候,可以用各种模型,如:Thread and Lock,OPENMP等,最后生成相应的并行程序. 相似文献
2.
3.
分解式软件流水DESP是我们最近提出来的一种对无分支循环进行有效调度的新方法,它通过把循环地分解为两个子问题,把无分支调度问题转化为无环路图的调度,从而运用图论中一些经典的复杂度为多项式的方法来解决。在本中,我们把DESP方法扩展成可以优化带条件分支的循环,称为全局分解式软件流水方法-GDESP。 相似文献
4.
唐保兴 《计算机应用与软件》1985,(2)
本文旨在介绍建立在良基集上的多重集合和多重序集。引入这些概念后,可较为简单和直观地取得循环程序的终结函数,这不仅便于循环程序的终结性证明,而且可将结构归纳法建于其上,方便使用间歇断言法验证循环程序的完全正确性。 相似文献
5.
对于一类基于运行距离最短的车队调度问题,构建了问题的数学规划模型。由于模型难以直接求解,构造网络图对车队问题进行表述。通过求解车队调度网路图的最小生成树,去除最小生成树中车辆和车辆之间连接线,从而将问题分解为一个个单车辆调度问题。对于单车辆调度问题的处理,设计了最小权奇点边添加法。该方法通过构造奇点边集合,使单车辆调度网络图成为所有顶点均为偶点的多重图;进而寻找欧拉环,并删除欧拉环中的重复中间点,最终得到问题的求解方案。最后设计了实例,分别采用图解算法和禁忌搜索算法进行求解。对比发现图解算法在求解车辆调度问题方面具有一定的优越性。 相似文献
6.
7.
指令调度通过调整指令之间的顺序来提高指令级并行度(ILP)。然而基本块通常很小,因而潜在的ILP也很小。随着芯片设计技术的发展,现代的处理机所包含的资源却越来越丰富。指令调度只有跨越基本块的边界(即全局指令调度)才能够充分发挥处理机潜在的和程序中固有的ILP。全局指令调度可划分为有环和无环两种。该文介绍了无环全局指令调度的几种影响力较大的算法。同时还简单介绍了有关全局指令调度的新的热点。 相似文献
8.
9.
操作系统内核作为软件系统的基础组件, 其安全可靠是构造高可信软件系统的重要环节, 但是, 在实际的验证工作中, 操作系统内核中全局性质的不变式定义, 复杂数据结构程序的形式化描述和验证仍存在很多困难. 本文针对操作系统内核中满足的全局性质, 在代码层以函数为单位, 用全局不变式进行定义, 并在不同的函数中进行形式化验证, 从而证明各个函数符合操作系统内核的全局性质; 针对操作系统内核中经常使用的复杂数据结构程序, 本文通过扩展形状图理论, 提出一种使用嵌套形状图逻辑的方法来形式化描述复杂数据结构程序, 并对该方法进行了正确性证明, 最终成功验证操作系统内核中关于任务创建与调度, 消息队列创建与操作相关的代码. 相似文献
10.
11.
12.
Tao Yang Cong Fu 《Parallel and Distributed Systems, IEEE Transactions on》1997,8(6):608-622
Many partitioned scientific programs can be modeled as iterative executions of computational tasks and represented by iterative task graphs (ITGs). An ITG may or may not have dependence cycles. In this paper, we consider the symbolic scheduling of ITGs on distributed memory architectures with nonzero communication overhead and propose heuristic algorithms for scheduling both cyclic and acyclic ITGs without searching an entire iteration space. Our approach incorporates techniques of software pipelining, graph unfolding, directed acyclic graph (DAG) scheduling, and load balancing. We analyze the asymptotic optimality of the algorithms to show that the derived schedules are competitive to optimal solutions. We also study the sensitivity of scheduling performance on inaccurate weights. Finally, we present experimental results to demonstrate the effectiveness of the optimization techniques 相似文献
13.
Alexandre E. Eichenberger Edward S. Davidson Santosh G. Abraham 《International journal of parallel programming》1996,24(2):103-132
Modulo scheduling is an efficient technique for exploiting instruction level parallelism in a variety of loops, resulting in high performance code but increased register requirements. We present an approach that schedules the loop operations for minimum register requirements, given a modulo reservation table. Our method determines optimal register requirements for machines with finite resources and for general dependence graphs. Measurements on a benchmark suite of 1327 loops from the Perfect Club, SPEC-89, and the Livermore Fortran Kernels show that the register requirements decrease by 24.8% on average when applying the optimal stage scheduler to the MRT-schedules of a register-insensitive modulo scheduler. 相似文献
14.
To handle scheduling of tasks on heterogeneous systems, an algorithm is proposed to reduce execution time while allowing for maximum parallelization. The algorithm is based on multi-objective scheduling cuckoo optimization algorithm (MOSCOA). In this algorithm, each cuckoo represents a scheduling solution in which the ordering of tasks and processors allocated to them are considered. In addition, the operators of cuckoo optimization algorithm means laying and immigration are defined so that it is usable for scheduling scenario of the directed acyclic graph of the problem. This algorithm adapts cuckoo optimization algorithm operators to create proper scheduling in each stage. This ensures avoiding local optima while allowing for global search within the problem space for accelerating the finding of a global optimum and delivering a relatively optimized scheduling with the least number of repetitions. Moving toward global optima is done through a target immigration operator in this algorithm and schedules in each repetition are pushed toward optimized schedules to secure global optima. The results of MOSCOA implementation on a large number of random graphs and real-world application graphs with a wide range characteristics show MOSCOA superiority over the previous task scheduling algorithms. 相似文献
15.
16.
Kumar Jain K. Rajaraman V. 《Parallel and Distributed Systems, IEEE Transactions on》1994,5(8):879-886
The lower and upper bounds on the minimum time needed to process a given directed acyclic task graph for a given number of processors are derived. It is proved that the proposed lower bound on time is not only sharper than the previously known values but also easier to calculate. The upper bound on time, which is useful in determining the worst case behavior of a given task graph, is presented. The lower and upper bounds on the minimum number of processors required to process a given task graph in the minimum possible time are also derived. It is seen with a number of randomly generated dense task graphs that the lower and upper bounds we derive are equal, thus giving the optimal time for scheduling directed acyclic task graphs on a given set of processors 相似文献
17.
《Journal of Parallel and Distributed Computing》2003,63(3):337-355
Dependence analysis algorithms have been proposed to identify parallelism in programs with tree-like data structures. However, they cannot analyze the dependence of statements if recursive data structures of programs are cyclic. This paper presents a technique to identify parallelism in programs with cyclic graphs. The technique consists of three steps: (1) traversal patterns that loops or recursive procedures traverse graphs are identified, and the statements that construct the links of traversal patterns will be located by definition–use chains of recursive data structures; (2) traversal-pattern-sensitive shape analysis is performed to estimate possible shapes of traversal patterns; (3) dependence analysis is performed to identify parallelism using the result of shape analysis. This approach can identify parallelism in programs with cyclic data structures due to the facts that many programs follow acyclic structures (i.e. traversal patterns) to access all nodes on the cyclic data structures. Once the traversal patterns are isolated from the overall data structures, dependence analysis can be applied to identify parallelism. 相似文献
18.
Composite objects often involve recursive relationships, so-called bills-of-materials, which are cumbersome to handle in relational database systems. The relationships constitute a directed graph, where the successors of a node represent its components, recursively. Instead of the whole transitive closure (all ancestor-descendant pairs), the task is to retrieve the descendants of any given node. A simple relational solution is suggested, which packs information of the ancestor path of each node into a fixed-length code, called the signature. The code is nonunique, and its purpose is to define a relatively small superset of the descendants, as well as to establish a basis for clustering. It supports effective retrieval of the descendants, in terms of both disk accesses and DBMS calls. The method performs best for tree-structured graphs, where the processing time typically decreases by a factor of more than 10, compared to a simple loop of joins. Also, general directed graphs, both acyclic and cyclic, can be processed more effectively. The method is implemented on top of a relational system, but advantages can be gained on other platforms, too 相似文献
19.
Global software pipelining is a complex but efficient compilation technique to exploit instruction-level parallelism for loops with branches.This paper presents a novel global software pipelining technique,called Trace Software Pipelining,targeted to the instruction-level parallel processors such as Very Long Instruction Word (VLIW) and superscalar machines.Trace software pipelining applies a global code scheduling technique to compact the original loop body.The resulting loop is called a trace software pipelined (TSP) code.The trace softwrae pipelined code can be directly executed with special architectural support or can be transformed into a globally software pipelined loop for the current VLIW and superscalar processors.Thus,exploiting parallelism across all iterations of a loop can be completed through compacting the original loop body with any global code scheduling technique.This makes our new technique very promising in practical compilers.Finally,we also present the preliminary experimental results to support our new approach. 相似文献