Similar literature
Found 20 similar documents (search time: 453 ms)
1.
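Eliminating Redundant Loop-Carried Flow Dependences on VLIW Architectures   Total citations: 2 (self: 1, others: 1)
容红波 (Rong Hongbo), 汤志忠 (Tang Zhizhong). 《软件学报》 (Journal of Software), 2000, 11(1): 126-132
Data dependence is the fundamental basis of parallel processing. This paper points out that the lock-step property peculiar to VLIW (very long instruction word) architectures gives their data dependence analysis distinctive characteristics. Flow dependences with the same iteration distance form a linearly ordered set, and characteristic flow dependences across multiple iteration distances also stand in containment relations. Based on these observations, a method for eliminating redundant loop-carried flow dependences on VLIW is proposed. The method is complete: it removes all redundant loop-carried flow dependences and thereby lightens the burden on loop scheduling. The paper gives necessary and sufficient conditions for redundancy under a single iteration distance and under multiple iteration distances, together with a linear-complexity algorithm for eliminating the redundancy. The method is general and can serve as a basis for software pipelining and multiple-instruction-stream scheduling on VLIW.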

2.
In this paper we address the problem of partitioning nested loops with non-uniform (irregular) dependence vectors. Parallelizing and partitioning nested loops requires efficient inter-iteration dependence analysis. Although many methods exist for nested loop partitioning, most of them perform poorly when parallelizing nested loops with irregular dependences. Unlike nested loops with uniform dependences, these loops have a complicated dependence pattern that forms a non-uniform dependence vector set. We apply the results of classical convex theory and the principles of linear programming to iteration spaces and show the correspondence between minimum dependence distance computation and iteration space tiling. Cross-iteration dependences are analyzed by forming an Integer Dependence Convex Hull (IDCH). Every integer point in this IDCH corresponds to a dependence vector in the iteration space of the nested loops. A simple way to compute minimum dependence distances from the dependence distance vectors of the extreme points of the IDCH is presented. Using these minimum dependence distances, the iteration space can be tiled. Iterations within a tile can be executed in parallel, and the different tiles can then be executed with proper synchronization. We demonstrate that our technique gives much better speedup and extracts more parallelism than the existing techniques.

3.
It is extremely difficult to parallelize DOACROSS loops with nonuniform loop-carried dependences. In this paper, we present a static scheduling scheme with an accompanying synchronization strategy that can execute such DOACROSS loops effectively and efficiently. Our approach uses a parallelization technique called Dependence Uniformization, which finds a small set of uniform dependence vectors to cover all possible nonuniform dependences in a DOACROSS loop. It differs from previous schemes in that we demonstrate a better way to select the uniform dependence vectors. When used with the Static Strip Scheduling scheme, the proposed uniform dependence vector set allows us to enforce dependences with more locality, which considerably reduces the need for explicit synchronization while retaining most of the parallelism. This paper describes the uniform dependence vector selection strategy and the static strip scheduling scheme. Performance analysis and examples are also presented.

4.
An efficient algorithm to remove redundant dependences in simple loops with constant dependences is presented. Dependences constrain the parallel execution of programs and are typically enforced by synchronization instructions. The synchronization instructions represent a significant part of the overhead in the parallel execution of a program. Some program dependences are redundant because they are covered by other dependences. It is shown that, unlike in single loops, in nested loops a particular dependence may be redundant at some iterations but not at others, so the redundancy of a dependence may not be uniform over the entire iteration space. A sufficient condition for the uniformity of redundancy in a doubly nested loop is developed.
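The notion of one dependence being "covered" by others can be illustrated with a minimal sketch. It assumes a single loop, non-negative constant distances, and a simplified criterion: a dependence `(src, dst, distance)` is treated as covered when the other dependences contain a path from `src` to `dst` whose distances sum to exactly the same value. The function name `is_covered` and the triple representation are hypothetical, and this is not claimed to be the algorithm of the paper.

```python
from collections import deque

def is_covered(dep, others):
    """Check whether `dep` is implied by a chain of dependences in `others`.

    dep and each element of others is a triple (src, dst, distance) with
    distance >= 0. Simplified sufficient check: look for a path from src
    to dst in `others` whose distances add up to exactly dep's distance.
    """
    src, dst, dist = dep
    start = (src, 0)
    seen = {start}
    queue = deque([start])
    while queue:
        node, total = queue.popleft()
        for u, v, d in others:
            if u != node:
                continue
            state = (v, total + d)
            # Prune paths that overshoot the distance or revisit a state.
            if state[1] > dist or state in seen:
                continue
            if state == (dst, dist):
                return True
            seen.add(state)
            queue.append(state)
    return False

chain = [("S1", "S2", 1), ("S2", "S3", 1)]
print(is_covered(("S1", "S3", 2), chain))  # S1 -> S2 -> S3 spans two iterations
print(is_covered(("S1", "S3", 1), chain))  # no chain with total distance 1
```

In this example the first query is covered because enforcing the two chained dependences already orders S1 in iteration i before S3 in iteration i+2, so a separate synchronization for that pair would be unnecessary under the stated assumptions.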

5.
VLSI technology has had tremendous success in revolutionizing computer design with processor arrays. Local communication and interconnection is a constraint that dictates the design of processor arrays. The shared bus and global access to memory are now no longer used, since they lower the speed. Consequently, parallel algorithms must be designed according to these constraints.

One of the problems that must be resolved for the above mentioned constraints is data broadcast elimination. Algorithms must be transformed into a form that uses data propagation instead of data broadcast.

Here systems of affine recurrence equations are analyzed, and data broadcast is defined in the context of the definition of data dependence and affine recurrence equations. A method for data broadcast elimination is introduced in [1]; it expands the system of affine recurrence equations into new recurrence equations that define data propagation and eliminates the data dependences where data broadcast occurs.

Parallel algorithms are usually given as a set of similar tasks repetitively performed on different data. The iteration form of presenting the algorithms is most common. Several techniques are introduced to transform the algorithm to a single assignment form of recurrence equations.

Some improvements to these techniques are presented to make the application of the data broadcast elimination method easier and more straightforward. The presented techniques are classified as the transformation of iterative algorithms to a recurrence form, the transformation of the recurrence form to a single assignment form, and fulfilling the index forms of the algorithms.

A system of affine recurrence equations with the data broadcast property is always obtained by applying these procedures. The method of data broadcast elimination successfully transforms this system of affine recurrence equations into a system of uniform recurrence equations which can be used for parallel implementation on VLSI processor arrays.

6.
Precise value-based data dependence analysis for scalars is useful for advanced compiler optimizations. The new method presented here for flow and output dependence uses Factored Use and Def chains (FUD chains), our interpretation and extension of Static Single Assignment. It is precise with respect to conditional control flow and dependence vectors. Our method detects dependences which are independent with respect to arbitrary loop nesting, as well as loop-carried dependences. A loop-carried dependence is further classified as being carried from the previous iteration, with distance 1, or from any previous iteration, with direction <. This precision cannot be achieved by traditional analysis, such as dominator information or reaching definitions. To compute anti- and input dependence, we use Factored Redef-Use chains, which are related to FUD chains. We are not aware of any prior work which explicitly deals with scalar data dependence utilizing a sparse graph representation. A preliminary version of this paper appeared in the Seventh Annual Workshop on Languages and Compilers for Parallel Computing, August 1994. Supported in part by NSF Grant CCR-9113885 and a grant from Intel Corporation and the Oregon Advanced Computing Institute.

7.
This paper presents a method for parallelising nested loops with affine dependences. The data dependences of a program are represented exactly using a dependence matrix rather than an imprecise dependence abstraction. By a careful analysis of the eigenvectors and eigenvalues of the dependence matrix, we detect the parallelism inherent in the program, partition the iteration space of the program into sequential and parallel regions, and generate parallel code to execute these regions. For a class of programs considered in the paper, the proposed method can expose more coarse-grain and fine-grain parallelism than a hyperplane-based loop transformation.

8.
This paper presents Chain Grouping, a new low-complexity method for partitioning the loop iteration space into groups with low intercommunication requirements, for mapping onto mesh-connected architectures. First, the iterations are scheduled in time according to the hyperplane method, taking into consideration the minimum time displacement. Then, the iteration space is divided into discrete groups of related iterations, which are assigned to different processors, while preserving the optimal completion time. Chain Grouping is based on clustering together neighboring uniform chains of iterations, formed by a particular dependence vector. This vector is proven to be the best among all candidates for reducing the total communication requirements. Inside every group, the optimal hyperplane scheduling is preserved and references to intragroup iterations are considerably increased. The partitioned groups are afterward assigned to meshes of processors. The resulting space mapping maximizes processor utilization and cuts down overall communication delays while preserving the optimal hyperplane time schedule.

9.
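Building on the Banerjee-GCD method and the Banerjee-Bound method, and taking full account of both the mutual influence between their test results and the requirements that program parallelization places on dependence testing, this work proposes a combined array dependence testing method that applies the Banerjee-GCD and Banerjee-Bound methods to different dependence vectors within a unified framework. The method improves the precision of the test and the usefulness of its results while maintaining execution-time efficiency, and it can handle some cases of nonlinear subscript expressions.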
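For orientation, the classic GCD test that underlies this family of methods can be sketched as follows. This is a minimal illustration for a single pair of one-dimensional linear subscripts, `X[a*i + b]` and `X[c*j + d]`; the function name `gcd_test` and its parameters are hypothetical, loop bounds are ignored, and the combined Banerjee-GCD/Banerjee-Bound framework of the paper is not reproduced.

```python
from math import gcd

def gcd_test(a: int, b: int, c: int, d: int) -> bool:
    """Return True if a dependence between X[a*i + b] and X[c*j + d]
    cannot be ruled out by the GCD test.

    The equation a*i + b == c*j + d has an integer solution (i, j)
    exactly when gcd(a, c) divides (d - b). Loop bounds are ignored,
    so True means "possibly dependent", not "definitely dependent".
    """
    g = gcd(a, c)
    if g == 0:
        # Both subscripts are constants: they collide only if equal.
        return b == d
    return (d - b) % g == 0

print(gcd_test(2, 0, 2, 1))  # X[2*i] vs X[2*j + 1]: even vs odd indices
print(gcd_test(2, 0, 4, 2))  # X[2*i] vs X[4*j + 2]: integer solutions exist
```

A result of `False` proves independence, whereas `True` only means the test is inconclusive; a bounds-based test is what then narrows the answer using the loop limits.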

10.
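Assembly constraints are expressed uniformly as combinations of basic geometric constraints. To improve solving efficiency, the analytical solution of position constraints is studied for the case in which orientation constraints and position constraints can be decoupled. Basic position constraints are mapped to translation spaces expressed as parametric equations, and the constraints are satisfied through incremental analytical intersection of these translation spaces. When orientation and position constraints cannot be decoupled, the basic constraints are combined and solved as a whole by a numerical method. The method preserves the independence of the basic constraint representations and is suitable for both under-constrained and fully constrained systems.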

11.
Many abstractions of program dependences have already been proposed, such as the Dependence Distance, the Dependence Direction Vector, the Dependence Level or the Dependence Cone. These different abstractions have different precisions. The minimal abstraction associated with a transformation is the abstraction that contains the minimal amount of information necessary to decide when such a transformation is legal. Minimal abstractions for loop reordering and unimodular transformations are presented. As an example, the dependence cone, which approximates dependences by a convex cone of the dependence distance vectors, is the minimal abstraction for unimodular transformations. It also contains enough information for legally applying all loop reordering transformations and finding the same set of valid mono- and multi-dimensional linear schedules as the dependence distance set.

12.
We consider in this paper a set of generic tasks constrained by a set of uniform precedence constraints corresponding to a natural generalization of the basic cyclic scheduling problem. The two parameters of any uniform constraint (namely the value and the height) between two tasks may be negative, which allows one to tackle a larger class of practical applications.

13.
This paper presents a constructive method for generating a uniform cubic B-spline curve interpolating a set of data points simultaneously controlled by normal and curvature constraints. By comparison, currently published methods have addressed one or two of those constraints (point, normal or cross-curvature interpolation), but not all three constraints simultaneously with C2 continuity. Combining these constraints provides better control of the generated curve, in particular for feature curves on free-form surfaces. Our approach is local and provides exact interpolation of these constraints.

14.
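Mapping XML Schema to Relational Schema Based on Axis Nodes   Total citations: 2 (self: 0, others: 2)
任廷艳 (Ren Tingyan), 余建桥 (Yu Jianqiao). 《计算机应用》 (Journal of Computer Applications), 2009, 29(8): 2303-2305
DTD does not support the definition of complex element types. On the basis of a formal definition of Schema, this paper defines complex elements and functional dependencies over XML and proposes a mapping algorithm based on axis nodes. The algorithm generates relational tables from the axis nodes and the XML functional dependencies. It preserves the content and structural information of the XML document, preserves functional dependencies, and reduces storage redundancy. The relational schema produced by the mapping is proven to satisfy 3NF.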

15.
This paper presents a new algorithm for three-dimensional coverage path planning for autonomous structural inspection operations using aerial robots. The proposed approach is capable of computing short inspection paths via an alternating two-step optimization algorithm according to which at every iteration it attempts to find a new and improved set of viewpoints that together provide full coverage with decreased path cost. The algorithm supports the integration of multiple sensors with different fields of view, the limitations of which are respected. Both fixed-wing as well as rotorcraft aerial robot configurations are supported and their motion constraints are respected at all optimization steps, while the algorithm operates on both mesh- and occupancy map-based representations of the environment. To thoroughly evaluate this new path planning strategy, a set of large-scale simulation scenarios was considered, followed by multiple real-life experimental test-cases using both vehicle configurations.

16.
We present a new hybrid method for solving constrained numerical and engineering optimization problems in this paper. The proposed hybrid method takes advantage of the ability of differential evolution (DE) to find the global optimum in problems with complex design spaces while directly enforcing feasibility of constraints using a modified augmented Lagrangian multiplier method. The basic steps of the proposed method comprise an outer iteration, in which the Lagrangian multipliers and various penalty parameters are updated using a first-order update scheme, and an inner iteration, in which a nonlinear optimization of the modified augmented Lagrangian function with simple bound constraints is implemented by a modified differential evolution algorithm. Experimental results based on several well-known constrained numerical and engineering optimization problems demonstrate that the proposed method shows better performance in comparison to the state-of-the-art algorithms.

17.
18.
This paper applies unimodular transformations and tiling to improve data locality of a loop nest. Due to data dependences and reuse information, not all dimensions of the iteration space can or will be tiled. By using cones to represent data dependences and vector spaces to quantify data reuse in the program, a reuse-driven transformational approach is presented, which aims at maximizing the amount of data reuse carried in the tiled dimensions of the iteration space while keeping the number of tiled dimensions to a minimum (to reduce loop control overhead). In the special case of one single fully permutable loop nest, an algorithm is presented that tiles the program optimally so that all data reuse is carried in the tiled dimensions. In the general case of multiple fully permutable loop nests, data dependences can prevent all data reuse from being carried in the tiled dimensions. An algorithm is presented that aims at localizing data reuse in the tiled dimensions so that the reuse space localized has the largest dimensionality possible.

19.
Multi-robot area patrol under frequency constraints   Total citations: 1 (self: 0, others: 1)
Patrolling involves generating patrol paths for mobile robots such that every point on the paths is repeatedly covered. This paper focuses on patrolling in closed areas, where every point in the area is to be visited repeatedly by one or more robots. Previous work has often examined paths that allow for repeated coverage, but ignored the frequency with which points in the area are visited. In contrast, we first present formal frequency-based optimization criteria used for evaluation of patrol algorithms. Then, we present a patrol algorithm that guarantees maximal uniform frequency, i.e., each point in the target area is covered at the same optimal frequency. This solution is based on finding a circular path that visits all points in the area, while taking into account terrain directionality and velocity constraints. Robots are positioned uniformly along this path in minimal time, using a second algorithm. Moreover, the solution is guaranteed to be robust in the sense that uniform frequency of the patrol is achieved as long as at least one robot works properly. We then present a set of algorithms for handling events along the patrol path. The algorithms differ in the way they handle an event, as a function of the time constraints for handling it. However, all the algorithms handle events while maintaining the patrol path and minimizing the disturbance to the system.

20.
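While studying the Julia sets of simple power functions such as quadratic functions, the authors found that the traditional escape-time algorithm is computationally expensive and yields the filled Julia set rather than the attractor of the Julia set. This paper introduces the basic inverse-function iteration algorithm for Julia sets and an inverse-function iteration algorithm based on IFS (iterated function systems). Building on the basic inverse iteration algorithm, it proposes changing the order of iteration to reduce the memory required during iteration. Simulations comparing the inverse iteration algorithm with the traditional escape-time algorithm show that inverse iteration greatly reduces computation time and produces the attractor of the Julia set.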
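The basic idea of inverse iteration can be illustrated with a short sketch for the quadratic map f(z) = z² + c, whose two preimages of a point z are ±√(z − c). The version below is the common random-branch variant; the function name `julia_inverse_iteration`, the starting point, and the `burn_in` parameter are assumptions made for illustration, and the reordered, memory-saving variant described in the paper is not reproduced.

```python
import cmath
import random

def julia_inverse_iteration(c, n_points=2000, burn_in=50, seed=0):
    """Approximate the Julia set of f(z) = z*z + c by backward iteration.

    Each step replaces z by one of its two preimages, chosen at random.
    Backward orbits are attracted to the Julia set, so after a short
    burn-in the visited points lie (numerically) on the set itself.
    """
    rng = random.Random(seed)
    z = complex(1.0, 0.0)  # arbitrary starting point (an assumption)
    points = []
    for step in range(burn_in + n_points):
        # The preimages of z under f are +sqrt(z - c) and -sqrt(z - c).
        w = cmath.sqrt(z - c)
        z = w if rng.random() < 0.5 else -w
        if step >= burn_in:
            points.append(z)
    return points

# For c = 0 the Julia set is the unit circle, which gives a quick sanity check.
pts = julia_inverse_iteration(0j, n_points=5)
print([round(abs(p), 6) for p in pts])
```

Only the current point has to be kept in memory, and every generated point lies near the Julia set itself, which is consistent with the contrast drawn above against the escape-time algorithm and its filled-set output.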
