Similar literature
Found 20 similar documents; search time: 250 ms
1.
Previous research on runahead execution took it for granted as a prefetch-only technique. Even though the results of instructions independent of an L2 miss are correctly computed during runahead mode, previous approaches discarded those results instead of trying to utilize them in normal mode execution. This paper evaluates the effect of reusing the results of preexecuted instructions on performance. We find that, even with an ideal scheme, it is not worthwhile to reuse the results of preexecuted instructions. Our analysis provides insights into why result reuse does not provide significant performance improvement in runahead processors and concludes that runahead execution should be employed as a prefetching mechanism rather than a full-blown prefetching/result-reuse mechanism.

2.
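Runahead execution can significantly improve a computer system's memory-level parallelism without major changes to the processor architecture. However, a runahead processor executes many more instructions than a conventional processor, in the worst case more than three times the number of normally executed instructions, which greatly increases processor power consumption. Our analysis finds that runahead execution executes a large number of useless instructions during the pre-execution phase, and on that basis we propose a method for reducing useless instructions to improve the efficiency of runahead processors. Experiments show that, with little impact on performance, the method removes up to 50% of the useless instructions that a runahead processor executes during the pre-execution phase.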

3.
An instruction window that can tolerate latencies to DRAM memory is prohibitively complex and power hungry. To avoid having to build such large windows, runahead execution uses otherwise-idle clock cycles to achieve an average 22 percent performance improvement for processors with instruction windows of contemporary sizes. This technique incurs only a small hardware cost and does not significantly increase the processor's complexity.

4.
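Improving the execution efficiency of scientific workflows in cloud environments and reducing their execution cost have received wide attention. The local QoS constraints that users expect often conflict with the overall execution efficiency of the workflow. To address this, and building on earlier work, a scientific workflow scheduling strategy is proposed that allows local time constraints to be violated. Applying backward-priority task merging to the already clustered workflow task set makes sensible use of the idle time slots between tasks, which optimizes the workflow's execution time. In addition, to make full use of task slack time and improve overall execution efficiency, the scheduling of some tasks is allowed to violate the local latest-finish-time constraint. Experimental results show that the strategy brings forward the earliest completion time of scientific workflows, raises processor utilization, and ultimately lowers the workflow execution cost.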

5.
龚沛  耿楚瑶  郭俊霞  赵瑞莲 《计算机科学》2016,43(2):199-203, 229
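During software debugging, locating faulty code quickly and precisely is a common concern of software developers. Mutation-based fault localization estimates the probability that a statement is faulty by analyzing the behavioral similarity between the program under test and its mutants. The method localizes faults with high precision, but because the test suite must be executed against a large number of mutants, its mutation execution cost is high. A dynamic mutation execution strategy is therefore proposed: it collects test execution information and dynamically adjusts the execution order of mutants and test cases to reduce this cost. Experimental results on 127 faulty versions from 6 program packages show that the strategy reduces mutation execution cost by 23% to 78% while preserving fault localization precision, significantly improving the efficiency of mutation-based fault localization.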

6.
The problem of making a feasible schedule in a preemptive multiprocessing system with identical processors and several types of additional resources is considered in the case when the task execution intervals are given and the durations of task execution linearly depend on the amount of the additional resource allocated to them. Algorithms based on reducing this problem to a network flow problem and a system of linear constraints are developed.

7.
Conditional branches incur a severe performance penalty in wide-issue, deeply pipelined processors. Speculative execution(1, 2) and predicated execution(3–9) are two mechanisms that have been proposed for reducing this penalty. Speculative execution can completely eliminate the penalty associated with a particular branch, but requires accurate branch prediction to be effective. Predicated execution does not require accurate branch prediction to eliminate the branch penalty, but is not applicable to all branches and can increase the latencies within the program. This paper examines the performance benefit of using both mechanisms to reduce the branch execution penalty. Predicated execution is used to handle the hard-to-predict branches and speculative execution is used to handle the remaining branches. The hard-to-predict branches within the program are determined by profiling. We show that this approach can significantly reduce the branch execution penalty suffered by wide-issue processors.

8.
The EGEE Grid offers the necessary infrastructure and resources for reducing the running time of particle tracking Monte-Carlo applications like GATE. However, efforts are required to achieve reliable and efficient execution and to provide execution frameworks to end-users. This paper presents results obtained from porting the GATE software to the EGEE Grid, our ultimate goal being to provide reliable, user-friendly and fast execution of GATE to radiation therapy researchers. To address these requirements, we propose a new parallelization scheme based on dynamic partitioning and its implementation in two different frameworks using pilot jobs and workflows. Results show that pilot jobs bring strong improvement w.r.t. regular gLite submission, that the proposed dynamic partitioning algorithm further reduces execution time by a factor of two, and that the genericity and user-friendliness offered by the workflow implementation do not introduce significant overhead.

9.
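SQL performance tuning is a systems engineering task; knowing SQL syntax well does not guarantee efficient SQL statements. Starting from an analysis of how SQL statements are executed in the ORACLE database, this paper identifies reducing the number of times SQL statements are parsed and tuning the CBO as the focus of SQL performance optimization, and uses examples to verify the important role of bind variables and cached cursors in improving SQL performance. The CBO turns an input SQL statement into an actual execution plan, and the quality of that plan depends mainly on the initialization parameters and the quality of the statistics. The discussion of CBO tuning therefore concentrates on how to set the initialization parameters that affect CBO behavior and on five principles that must be followed when gathering statistics.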

10.
On applying hash filters to improving the execution of multi-join queries   Total citations: 1, self-citations: 0, citations by others: 1
In this paper, we explore an approach of interleaving a bushy execution tree with hash filters to improve the execution of multi-join queries. Similar to semi-joins in distributed query processing, hash filters can be applied to eliminate non-matching tuples from joining relations before the execution of a join, thus reducing the join cost. Note that hash filters built in different execution stages of a bushy tree can have different costs and effects. The effect of hash filters is evaluated first. Then, an efficient scheme to determine an effective sequence of hash filters for a bushy execution tree is developed, where hash filters are built and applied based on the join sequence specified in the bushy tree so that not only is the reduction effect optimized but also the cost associated is minimized. Various schemes using hash filters are implemented and evaluated via simulation. It is experimentally shown that the application of hash filters is in general a very powerful means to improve the execution of multi-join queries, and the improvement becomes more prominent as the number of relations in a query increases. Edited by G. Gardarin. Received October 1994 / Accepted December 1995
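
The filtering step this abstract describes can be illustrated with a minimal Python sketch. It is not the paper's scheme: the bit-vector size, the helper names (`build_hash_filter`, `apply_hash_filter`, `hash_join`), and the sample relations are all assumptions made for illustration, and only a single two-way join is shown rather than a bushy tree.

```python
def build_hash_filter(rows, key, num_bits=1024):
    """Return a bit vector with one bit set per hashed join-key value."""
    bits = [False] * num_bits
    for row in rows:
        bits[hash(row[key]) % num_bits] = True
    return bits


def apply_hash_filter(rows, key, bits):
    """Keep rows whose join key hashes to a set bit (false positives are possible)."""
    return [row for row in rows if bits[hash(row[key]) % len(bits)]]


def hash_join(left, right, key):
    """Plain in-memory hash join on a single key."""
    index = {}
    for row in left:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for r in right for l in index.get(r[key], [])]


# Hypothetical sample relations.
orders = [{"cust": 1, "item": "a"}, {"cust": 2, "item": "b"}, {"cust": 9, "item": "c"}]
customers = [{"cust": 1, "name": "Ann"}, {"cust": 2, "name": "Bob"}]

# Build the filter on one relation, then shrink the other before joining.
bits = build_hash_filter(customers, "cust")
filtered_orders = apply_hash_filter(orders, "cust", bits)
joined = hash_join(customers, filtered_orders, "cust")
```

Because a hash filter can only produce false positives, never false negatives, the join over the filtered relation returns the same rows as the join over the full one; the saving comes from the smaller join input.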

11.
Context: Message-passing parallel programs are commonly used parallel programs. Various scheduling sequences contained in these programs, however, increase the difficulty of testing them. Therefore, reducing scheduling sequences by using appropriate approaches can greatly improve the efficiency of testing these programs. Objective: This paper focuses on the problem of reducing scheduling sequences of message-passing parallel programs, and presents a novel approach to reducing scheduling sequences. Method: In this approach, scheduling sequences that affect the target statement are first determined based on the relation between a scheduling sequence and the execution of the target statement. Then, these scheduling sequences are divided into a number of equivalence classes according to the execution of the target statement. Finally, for each scheduling sequence in the same equivalence class, the values of the two proposed indexes are calculated, and the scheduling sequence with the minimal comprehensive value is selected as the one after reduction. Results: To evaluate the performance of the proposed approach, it is applied to test 12 typical message-passing parallel programs. The experimental results show that the proposed approach reduces scheduling sequences by 63% on average. Compared with the method without reduction and the method that selects scheduling sequences randomly, the proposed approach shortens the execution time of a program for covering the target statement by 67% and 52% on average, respectively. Conclusion: The proposed approach can greatly reduce scheduling sequences and shorten the execution time of a program for covering the target statement, hence improving the efficiency of testing the program.

12.
The problem of finding an optimal dynamic assignment of a modular program for a two-processor system is analyzed. Stone's formulation of the static assignment problem is extended to include the cost of dynamically reassigning a module from one processor to the other and the cost of module residence without execution. By relocating modules during the course of program execution, changes in the locality of the program can be taken into account. It is shown that network flow algorithms may be used to find a dynamic assignment that minimizes the sum of module execution costs, module residence costs, intermodule communication costs, and module reassignment costs. Techniques for reducing the size of the problem are described for the case where the costs of residence are negligible.

13.
The problem of speed-optimal solution was formulated for the pipeline systems with variable job execution order which depends on the number of the system stage, the order of job execution, and the finite time of its variation. The mathematical system model was constructed. The solution based on reducing the problem to a finite number of similar problems with constant job execution orders was proposed. Reduction algorithms and examples of designing the optimal variable job order were presented. Translated from Avtomatika i Telemekhanika, No. 3, 2005, pp. 74–90. Original Russian Text Copyright © 2005 by Levin. This paper was recommended for publication by V.M. Vishnevskii, a member of the Editorial Board.

14.
Measuring execution time is one of the most used performance evaluation techniques in computer science research. Inaccurate measurements cannot be used for a fair performance comparison between programs. Despite the prevalence of its use, the intrinsic variability in the time measurement makes it hard to obtain repeatable and accurate timing results of a program running on an operating system. We propose a novel execution time measurement protocol (termed EMP) for measuring the execution time of a compute-bound program on Linux, while minimizing that measurement's variability. During the development of EMP, we identified several factors that disturb execution time measurement. We introduce successive refinements to the protocol by addressing each of these factors, in concert reducing variability by more than an order of magnitude. We also introduce a new visualization technique, which we term the ‘dual-execution scatter plot’, that highlights infrequent, long-running daemons, differentiating them from frequent and/or short-running daemons. Our empirical results show that the proposed protocol successfully achieves three major aspects (precision, accuracy, and scalability) in execution time measurement that can work for open-source and proprietary software. Copyright © 2017 John Wiley & Sons, Ltd.

15.
The author presents strategies for static loop decomposition and scheduling as well as computer-assisted run-time scheduling that take into account, in addition to the cost of performing operations, the overhead costs associated with a decomposition and schedule. An algorithm for static decomposition of multidimensional loops based on the operation execution costs, communication costs, and synchronization costs is discussed. Synchronization instructions are introduced to ensure correct program execution following program decomposition. An algorithm for determining the explicit synchronization instruction that should be introduced to ensure correct execution of a program with arbitrarily nested loops is presented. Techniques for reducing run-time scheduling and communication and synchronization costs due to self-scheduling of multidimensional loops are also presented. Experiments performed on the Encore multiprocessor system demonstrate that the techniques developed can reduce overhead costs.

16.
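In the traditional debugging process, fault localization usually serves as a preliminary step to program repair. Recently, a new debugging framework, unified debugging, has been proposed. Unlike the one-way connection between fault localization and program repair in traditional debugging, unified debugging is the first to establish a two-way connection between localization and repair, thereby improving both areas at once. As the first unified debugging technique, ProFL uses the large amount of patch execution information produced during program repair to improve, in the reverse direction, the already...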

17.
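Low-power multithreaded compilation optimization techniques   Total citations: 12, self-citations: 1, citations by others: 12
This paper proposes a theoretical model and method for effectively reducing power consumption in multithreaded architectures by lowering execution frequency. It first studies a computational model for identifying threads that can run at reduced frequency, together with the calculation of the frequency-reduction factor. It then presents a low-power compilation optimization algorithm and implementation strategy that is based on analysis of application behavior during compilation and combined with thread partitioning. The model and method apply to multiprocessor-style multithreaded architectures whose execution frequency can be adjusted dynamically; they can exploit TLP (thread-level parallelism) while effectively reducing power consumption.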

18.
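Association rule mining aims to discover useful information in data. This paper first introduces the basic idea of the DHP algorithm, which uses hash-based pruning to reduce the amount of data. It then walks through the execution of DHP with an example and analyzes the algorithm's performance. DHP generates frequent itemsets efficiently and removes the performance bottleneck in generating frequent 2-itemsets, while also reducing the size of the transaction database and the number of database scans.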
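
The hash-pruning idea for frequent 2-itemsets can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the paper's implementation: the bucket count, the hash function, the function name `dhp_frequent_pairs`, and the sample transactions are hypothetical, and the transaction-trimming part of DHP is omitted.

```python
from collections import Counter
from itertools import combinations


def dhp_frequent_pairs(transactions, min_support, num_buckets=7):
    """Find frequent 2-itemsets, pruning candidates with a hash table of pair counts."""
    item_counts = Counter()
    bucket_counts = [0] * num_buckets

    def bucket(pair):
        # Hypothetical deterministic hash over the characters of both items.
        return sum(ord(ch) for item in pair for ch in item) % num_buckets

    # Pass 1: count single items and hash every 2-subset into a bucket.
    for t in transactions:
        items = sorted(set(t))
        item_counts.update(items)
        for pair in combinations(items, 2):
            bucket_counts[bucket(pair)] += 1

    frequent_items = sorted(i for i, c in item_counts.items() if c >= min_support)

    # Candidate generation: a pair survives only if its bucket is heavy enough.
    candidate_set = {
        pair for pair in combinations(frequent_items, 2)
        if bucket_counts[bucket(pair)] >= min_support
    }

    # Pass 2: count only the surviving candidates.
    pair_counts = Counter()
    for t in transactions:
        for pair in combinations(sorted(set(t)), 2):
            if pair in candidate_set:
                pair_counts[pair] += 1

    return {p: c for p, c in pair_counts.items() if c >= min_support}


# Hypothetical sample data.
transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "d"]]
frequent_pairs = dhp_frequent_pairs(transactions, min_support=2)
```

A bucket's count is always at least the count of any pair that hashes into it, so the pruning never discards a truly frequent pair; collisions only let some infrequent candidates through to the second pass, where they are counted and rejected.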

19.
易会战  罗兆成 《软件学报》2013,24(8):1761-1774
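Many organizations now use high-performance computers to run operational numerical computations periodically. The main cost of maintaining these operational systems is the large amount of electricity consumed every day, so reducing energy consumption can greatly lower maintenance cost. The core of a high-performance operational system is the microprocessor, and current microprocessors generally support dynamic voltage scaling. This technique reduces microprocessor energy consumption by lowering voltage and frequency, but usually degrades system performance. This paper proposes an energy optimization technique for high-performance operational applications. It uses the multiple frequency levels supported by the system to build an energy optimization model under a performance constraint and so optimize the energy consumption of the application. Depending on how program information is obtained, two energy optimization models are proposed, SEOM and CEOM: program information for SEOM can be obtained directly by testing, while program information for CEOM is obtained through compiler instrumentation. The energy savings were validated on a representative platform, with up to 12% of energy saved.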
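
To make the idea of choosing frequency levels under a performance constraint concrete, here is a toy Python sketch. It does not reproduce SEOM or CEOM, whose details the abstract does not give. The cubic power model, the split of each phase into a frequency-dependent compute part and a frequency-independent memory part, the function name `best_frequency_plan`, and all numbers are assumptions for illustration only.

```python
from itertools import product


def best_frequency_plan(phases, freqs_ghz, max_slowdown):
    """Exhaustively pick one frequency per phase to minimize energy under a time bound."""

    def phase_time(phase, f):
        # Compute time scales with frequency; memory time is assumed not to.
        return phase["gcycles"] / f + phase["mem_s"]

    def phase_energy(phase, f):
        # Assumed toy model: power grows with the cube of frequency.
        return (f ** 3) * phase_time(phase, f)

    f_max = max(freqs_ghz)
    baseline = sum(phase_time(p, f_max) for p in phases)
    best = None
    for plan in product(freqs_ghz, repeat=len(phases)):
        total_time = sum(phase_time(p, f) for p, f in zip(phases, plan))
        if total_time > baseline * (1 + max_slowdown):
            continue  # violates the performance constraint
        energy = sum(phase_energy(p, f) for p, f in zip(phases, plan))
        if best is None or energy < best[0]:
            best = (energy, plan, total_time)
    return best


# Hypothetical program: one compute-bound phase, one memory-bound phase.
phases = [{"gcycles": 2.0, "mem_s": 0.1}, {"gcycles": 0.2, "mem_s": 2.0}]
energy, plan, total_time = best_frequency_plan(phases, [1.0, 2.0], max_slowdown=0.1)
```

In this toy setting the search keeps the compute-bound phase at the high frequency and lowers the memory-bound phase, since slowing the latter costs little time. Exhaustive search is only practical for a handful of phases; a real model would need a proper optimization method.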

20.
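Evolutionary genetic algorithm for multi-objective optimal cloud workflow scheduling   Total citations: 1, self-citations: 0, citations by others: 1
To optimize the makespan and execution cost of scientific workflows in cloud environments simultaneously, a multi-objective optimization evolutionary genetic scheduling algorithm, MOEGA, is proposed. Built on evolutionary genetics, the algorithm defines an encoding for the task-to-virtual-machine mapping and the virtual-machine-to-host deployment, and designs a fitness function that supports multi-objective optimization. To maintain population diversity, crossover and mutation operations are introduced into the scheduling scheme, and a heuristic method is used to initialize the population. Simulation experiments on four real-world scientific workflows compare its performance with algorithms of the same type. The results show that MOEGA not only satisfies workflow deadline constraints but also outperforms the other algorithms in the combined reduction of makespan and execution cost.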
