首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
2.
This paper describes Cronus, a platform for parallelizing general nested loops. General nested loops contain complex loop bodies (assignments, conditionals, repetitions) and exhibit uniform loop-carried dependencies. The novelty of Cronus is twofold: (1) it determines the optimal scheduling hyperplane using the QuickHull algorithm, which is more efficient than previously used methods, and (2) it implements a simple and efficient dynamic rule (successive dynamic scheduling) for the runtime scheduling of the loop iterations along the optimal hyperplane. This scheduling policy enhances data locality and improves the makespan. Cronus provides an efficient runtime library, specifically designed for communication minimization, that performs better than more generic systems, such as Berkeley UPC. Its performance was evaluated through extensive testing. Three representative case studies are examined: the Floyd-Steinberg dithering algorithm, the Transitive Closure algorithm, and the FSBM motion estimation algorithm. The experimental results corroborate the efficiency of the parallel code. The tests show speedup ranging from 1.18 (out of the ideal 4) to 12.29 (out of the ideal 16) on distributed-systems and 3.60 (out of 4) to 15.79 (out of 16) on shared-memory systems. Cronus outperforms UPC by 5-95% depending on the test case.  相似文献   

3.
Software pipelining increases the loop execution throughput by overlapping the execution of successive iterations in a pipelined fashion. For loops with control flows, however, software pipelining is not straightforward because we need to consider the overlap of more than one execution path. Modulo scheduling simply transforms them into straightline loops through if-conversion which, in effect, achieves a fixed, worst-case initiation interval (/spl par/) among all paths. On the other hand, all-path pipelining (APP) and enhanced pipeline scheduling (EPS) can achieve a variable /spl par/ depending on the path that is taken at execution time. Unfortunately, APP concentrates only on the overlap within the same path, entirely losing the overlap between different paths, whereas EPS attempts to overlap all paths together, failing to produce a tight schedule for each individual path, especially when resource constraints are tight. In this paper, we propose a new approach to EPS called split-path EPS (SP-EPS), which first splits each individual path via tail duplication and then performs EPS in a way to guarantee a tight schedule for each path, while producing a competitive cross-path schedule. We also extend SP-EPS to outer loops such that frequent paths that bypass the inner loop are split and then scheduled by SP-EPS. Our experimental results on nontrivial integer benchmarks show that SP-EPS can achieve as much as a geometric mean of 10 percent speedup over EPS when innermost loops are scheduled by SP-EPS, while it can achieve a geometric mean of 11.9 percent speedup when outer loops are also scheduled by SP-EPS.  相似文献   

4.
This paper presents an approach to the design and development of knowledge-based systems in general and their application in the field of maintenance management in particular. Our approach is based on the idea that different kinds of knowledge in a given domain, namely declarative, procedural and heuristic are supported by corresponding methods and software tools. A prototype knowledge-based system, called EXPERT-MM, for the maintenance activities in the Siam Gipsum Industry (Bangkok, Thailand) has been worked out as a case study and is described in the paper. EXPERT-MM supports three main functions: maintenance policy suggestions, machine diagnosis and maintenance scheduling. The maintenance policy deals with the three types of preventive maintenance. For each component of the equipment it analyses the historical failure data and recommends an appropriate policy with optimal preventive maintenance intervals. This is based on the experts' knowledge stored in a knowledge base. A rotary screw type air compressor is selected for a diagnosis. The knowledge representation scheme is rule-based and the inference strategy mechanism is backward chaining. The knowledge-acquisition process has been organized and realised using a decision tree diagram. The knowledge base contains 154 rules for the diagnosis and 54 rules for the maintenance model selection. the maintenance scheduling module is procedure based. EXPERT-MM development is based on the software tools dBase III Plus, TURBO PASCAL version 6.0 and expert system shell EXSYS, all integrated into a single software system with a user-friendly interface.  相似文献   

5.
The paper presents a knowledge-based analysis approach that generates first order predicate logic annotations of loops. A classification of loops according to their complexity levels is presented. Based on this taxonomy, variations on the basic analysis approach that best fit each of the different classes are described. In general, mechanical annotation of loops is performed by first decomposing them using data flow analysis. This decomposition encapsulates closely related statements in events, that can be analyzed individually. Specifications of the resulting loop events are then obtained by utilizing patterns, called plans, stored in a knowledge base. Finally, a consistent and rigorous functional abstraction of the whole loop is synthesized from the specifications of its individual events. To test the analysis techniques and to assess their effectiveness, a case study was performed on an existing program of reasonable size. Results concerning the analyzed loops and the plans designed for them are given  相似文献   

6.
面向SLP 的多重循环向量化   总被引:1,自引:0,他引:1  
魏帅  赵荣彩  姚远 《软件学报》2012,23(7):1717-1728
如今,越来越多的处理器集成了SIMD(single instruction multiple data)扩展,现有的编译器大多也实现了自动向量化的功能,但是一般都只针对最内层循环进行向量化,对于多重循环缺少一种通用、易行的向量化方法.为此,提出了一种面向SLP(superword level parallelism)的多重循环向量化方法,从外至内依次对各个循环层次进行分析,收集各层循环对应的一些影响向量化效果的属性值,主要包括能否对该循环进行直接循环展开和压紧、有多少数组引用相对于该循环索引连续以及该循环所包含的区域等,然后根据这些属性值决定在哪些循环层次进行直接循环展开和压紧,最后通过SLP对循环中的语句进行向量化.实验结果表明,该算法相对于内层循环向量化和简单的外层循环向量化平均加速比提升了2.13和1.41,对于一些常用的核心循环可以得到高达5.3的加速比.  相似文献   

7.
Software pipelining is an efficient method of loop optimization that allows for parallelism of operations related to different loop iterations. Currently, most commercial compilers use loop pipelining methods based on modulo scheduling algorithms. This paper reviews these algorithms and considers algorithmic solutions designed for overcoming the negative effects of software pipelining of loops (a significant growth in code size and increase in the register pressure) as well as methods making it possible to use some hardware features of a target architecture. The paper considers global-scheduling mechanisms allowing one to pipeline loops containing a few basic blocks and loops in which the number of iterations is not known before the execution.  相似文献   

8.
Parallel loops account for the greatest amount of parallelism in numerical programs.Executing nested loops in parallel wit low run-time overhead is thus very important for achieving high performance in paralleo processing systems.However,in parallel processing systems with caches of local memories in memory hierarchies,“thrashing problemmay” may arise when data move back and forth frequently between the caches or local memories in different processors.The techniques associated with parallel compiler to solve the problem are not completely developed.In this paper,we present two restructuring techniques called loopg staggering,loop staggering and compacting,with which we can not only eliminate the cache or local memory thrashing phemomena significantly,but also exploit the potential parallelism existing in outer serial loop.Loop staggering benefits the dynamic loop scheduling strategies,whereas loop staggering and compacting is good for static loop scheduling strategies,Our method especially benefits parallel programs,in which a parallel loop is enclosed by a serial loop and array elements are repeatedly used in the different iterations of the parallel loop.  相似文献   

9.
An architectural configuration of a knowledge-based system for production rescheduling reported in this paper uncovers a number of points of interest to practitioners as well as researchers. The study shows that knowledge-based methods applied to production rescheduling are a valuable approach for manufacturers to manage production disturbances and deliver customer orders on time. Very often, developing an effective scheduling system whilst solving some problems requires an appropriate combination of a rigorous analysis of the production system state and the rules of thumb used by the human scheduler. In the actual performance of this hybrid system, an expert simulation system was used to produce new schedules that fit the real production environment.  相似文献   

10.
阳雪林  于勐  陈道蓄  谢立 《软件学报》2002,13(8):1718-1722
针对分布式环境下可抽取观察循环的不规则串行程序循环的动态依赖关系分析问题,提出了一个基于观察/执行模型的动态分析算法.其贡献是:(1) 算法可并行执行于分布式系统;(2) 直接分析具有拷入和最后赋值操作的循环;(3) 给出了循环的并行化方法;(4) 并不要求循环是完全可并行的,对某些部分可并行循环,也支持其并行执行.理论分析和实验表明,在处理器数量适当的情况下,循环可以并行时,可以获得很好的加速比;不能并行时,对串行执行增加的开销也是小的.从而为分布式环境下开发更多的循环并行性提供了一种新的手段.  相似文献   

11.
软件流水是编译后端优化中针对循环的调度技术,在软件流水优化过程中,依赖环是影响软件流水优化的重要因素。针对循环体中依赖环导致软件流水失败的问题,通过对循环中的依赖环进行分析处理,基于传统的模调度框架,提出了改进的软件流水优化算法,对于造成依赖环的寄存器引入多个分量,实现了对含有归约变量循环的流水。通过典型的算法测试,实验结果表明,该框架能够使得更多类型的循环流水成功,对于循环核心性能提升至少58%。  相似文献   

12.
Specification of software pipelining using petri nets   总被引:1,自引:0,他引:1  
This paper presents a flexible model for software pipelining using the petri nets. Our technique, called the Petri Net Pacemaker (PNP), can create near optimal pipelines with less algorithmic effort than other techniques. The pacemaker is a novel idea which exploits the cyclic behavior of petri nets to model the problem of scheduling operations of a loop body for software pipelining. A way of improving the performance of loops containing predicates is given. The PNP technique also shows how nested loops can be pipelined. A comparison with some of the other techniques is presented. THis work was partially supported by the National Science Foundation under grants CDA-9100788 and CDA-9200371.  相似文献   

13.
This paper describes several loop transformation techniques for extracting parallelism from nested loop structures. Nested loops can then be scheduled to run in parallel so that execution time is minimized. One technique is called selective cycle shrinking, and the other is called true dependence cycle shrinking. It is shown how selective shrinking is related to linear scheduling of nested loops and how true dependence shrinking is related to conflict-free mappings of higher dimensional algorithms into lower dimensional processor arrays. Methods are proposed in this paper to find the selective and true dependence shrinkings with minimum total execution time by applying the techniques of finding optimal linear schedules and optimal and conflict-free mappings proposed by W. Shang and A.B. Fortes  相似文献   

14.
针对含有大量循环的串行程序存在的问题,提出一种基于线程级前瞻技术的循环选择方案。该方案对循环进行最优选择后建立一个可并行运行的循环集。对于该集合中的循环,选择并行效率高的代码段作并行处理,以加快串行程序运行速度。实验表明,相对于一般的简单内部循环或外部循环并行方法,该方案使9种基准代码的加速比平均上升23.8%,从而提高串行程序并行运行的效率。  相似文献   

15.
吴悦  雷超付  杨洪斌 《计算机工程》2010,36(9):35-37,40
针对含有大量循环的串行程序存在的问题,提出一种基于线程级前瞻技术的循环选择方案。该方案对循环进行最优选择后建立一个可并行运行的循环集。对于该集合中的循环,选择并行效率高的代码段作并行处理,以加快串行程序运行速度。实验表明,相对于一般的简单内部循环或外部循环并行方法,该方案使9种基准代码的加速比平均上升23.8%,从而提高串行程序并行运行的效率。  相似文献   

16.
宋广辉  郭绍忠  赵捷  陶小涵  李飞  许瑾晨 《软件学报》2023,34(12):5704-5723
混合精度在深度学习和精度调整与优化方面取得了许多进展,广泛研究表明,面向Stencil计算的混合精度优化也是一个很有挑战性的方向.同时,多面体模型在自动并行化领域取得的一系列研究成果表明,该模型为循环嵌套提供很好的数学抽象,可以在其基础上进行一系列的循环变换.基于多面体编译技术设计并实现了一个面向Stencil计算的自动混合精度优化器,通过在中间表示层进行迭代空间划分、数据流分析和调度树转换,首次实现了源到源的面向Stencil计算的混合精度优化代码自动生成.实验表明,经过自动混合精度优化之后的代码,在减少精度冗余的基础上能够充分发挥其并行潜力,提升程序性能.以高精度计算为基准,在x86平台上最大加速比是1.76,几何平均加速比是1.15;在新一代国产申威平台上最大加速比是1.64,几何平均加速比是1.20.  相似文献   

17.
For different levels of user performance, different types of information are processed and users will make different types of errors. Based on the error's immediate cause and the information being processed, usability problems can be classified into three categories. They are usability problems associated with skill-based, rule-based and knowledge-based levels of performance. In this paper, a user interface for a Web-based software program was evaluated with two usability evaluation methods, user testing and heuristic evaluation. The experiment discovered that the heuristic evaluation with human factor experts is more effective than user testing in identifying usability problems associated with skill-based and rule-based levels of performance. User testing is more effective than heuristic evaluation in finding usability problems associated with the knowledge-based level of performance. The practical application of this research is also discussed in the paper.  相似文献   

18.
《Knowledge》1999,12(3):81-93
This study aims to model a high school time-tabling task using the knowledge-based approach. The body of knowledge consists of a structural data set, rules sets and heuristics. A scheduling model is articulated to allocate teaching assignments to the time slot system by applying appropriate heuristics and rule sets. A scheduling engine is devised to allow the defining of assignments in any desired order using a heuristic function for enhancing the performance of the system; and to allow a search for the best slot on multiple feasible slots. The rule priorities may facilitate different time-tabling approaches.  相似文献   

19.
Global software pipelining is a complex but efficient compilation technique to exploit instruction-level parallelism for loops with branches.This paper presents a novel global software pipelining technique,called Trace Software Pipelining,targeted to the instruction-level parallel processors such as Very Long Instruction Word (VLIW) and superscalar machines.Trace software pipelining applies a global code scheduling technique to compact the original loop body.The resulting loop is called a trace software pipelined (TSP) code.The trace softwrae pipelined code can be directly executed with special architectural support or can be transformed into a globally software pipelined loop for the current VLIW and superscalar processors.Thus,exploiting parallelism across all iterations of a loop can be completed through compacting the original loop body with any global code scheduling technique.This makes our new technique very promising in practical compilers.Finally,we also present the preliminary experimental results to support our new approach.  相似文献   

20.
Inter‐iteration dependences in loops can hinder loop‐level parallelism. For some loops, existing thread‐level speculation techniques fail to expose their inherent loop‐level parallelism, because some inter‐iteration dependences are too costly to synchronize, predict, pre‐compute and isolate. This paper presents a compiler technique called loop recreation to change the nature of some dependences (by turning some inter‐iteration dependences into intra‐iteration ones and vice versa) in a loop so that the inter‐iteration dependences in the transformed loop are less costly to enforce at runtime than those in the original loop. We present an algorithm for finding an optimal loop recreation transformation with respect to a simple misspeculation cost model and demonstrate the performance advantages of loop recreation over two recent techniques for multicore systems running nine representative irregular applications. Copyright © 2009 John Wiley & Sons, Ltd.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号