
1.

李静梅关海洋《计算机科学》2012,39(9):307-311

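To address the shortcomings of branch predictors in traditional processors, namely muddled branch prediction information, aliasing conflicts among branch instructions, and a high capacity-conflict rate, this paper proposes TBHBP, a branch predictor for simultaneous multithreading processors. The predictor indexes the pattern matching table (PHT) with a combined history formed from thread history information and address-indexed local history information, gives each thread its own thread history register and branch history register, and adds a branch result output table to speed up the execution of branch prediction. The results show that TBHBP effectively resolves the problems of stale branch information, branch instruction aliasing, and capacity conflicts. Compared with the Gshare branch predictor, instruction throughput increases by 12.5%, while the branch misprediction rate and the wrong-path fetch rate decrease by 0.5% and 2.1%, respectively.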
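The indexing idea in this abstract, a table index built from per-thread history combined with address-indexed local history, can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the class name, the table sizes, the XOR hash, and the 2-bit saturating counters are not taken from the paper, and the branch result output table is omitted entirely.

```python
class ThreadHistoryPredictor:
    """Toy predictor (hypothetical, not the paper's TBHBP design).

    The shared counter table is indexed by a hash of the branch address,
    the issuing thread's history register, and a per-thread local history
    selected by the low bits of the branch address.
    """

    def __init__(self, num_threads=2, index_bits=10, local_bits=6):
        self.mask = (1 << index_bits) - 1
        self.local_mask = (1 << local_bits) - 1
        # 2-bit saturating counters, initialised to "weakly taken" (2).
        self.pht = [2] * (1 << index_bits)
        # One history register per thread, so threads do not pollute each other.
        self.thread_hist = [0] * num_threads
        # Per-thread local history tables, indexed by low address bits.
        self.local_hist = [[0] * (1 << local_bits) for _ in range(num_threads)]

    def _index(self, tid, pc):
        local = self.local_hist[tid][pc & self.local_mask]
        # The XOR combination is an assumption; the abstract does not specify it.
        return (pc ^ self.thread_hist[tid] ^ (local << 2)) & self.mask

    def predict(self, tid, pc):
        """Return True if the branch at pc in thread tid is predicted taken."""
        return self.pht[self._index(tid, pc)] >= 2

    def update(self, tid, pc, taken):
        """Train the counter, then shift the outcome into both histories."""
        i = self._index(tid, pc)
        if taken:
            self.pht[i] = min(3, self.pht[i] + 1)
        else:
            self.pht[i] = max(0, self.pht[i] - 1)
        bit = 1 if taken else 0
        self.thread_hist[tid] = ((self.thread_hist[tid] << 1) | bit) & self.mask
        slot = pc & self.local_mask
        self.local_hist[tid][slot] = (
            (self.local_hist[tid][slot] << 1) | bit
        ) & self.local_mask
```

In this sketch only the counter table is shared. Because each thread shifts outcomes into its own registers, training a branch in one thread leaves the other thread's history untouched, which is the property the abstract attributes to per-thread history registers.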

2.

Research on Compiler Optimization Techniques Based on Predicated Code

田祖伟孙光《计算机科学》2010,37(5):130-133

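The large number of branch instructions in programs severely limits the ability of architectures and compilers to exploit parallelism. A major challenge in effectively exploiting instruction-level parallelism is overcoming the restrictions imposed by branch instructions. Predicated execution can effectively remove branches by converting branch instructions into predicated code, which widens the scope of instruction scheduling and eliminates the performance loss caused by branch misprediction. The paper describes compiler optimization techniques based on predicated code, including instruction scheduling, software pipelining, register allocation, and instruction merging, and designs and implements an instruction scheduling algorithm based on predicated code. Experiments show that compiler optimization of predicated code effectively increases instruction-level parallelism, shortens code execution time, and improves program performance.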

3.

Compiler Optimization of Address and Data Predicated Execution on BWDSP10x

樊永朝郑启龙耿锐王向前王昊《计算机系统应用》2016,25(12):92-99

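Traditional predicate optimization techniques were implemented on von Neumann architecture computers and optimize only the data flow, without considering the separation of instructions and data in a Harvard architecture. BWDSP10x is a Harvard architecture with separate instructions and data; it supports very long instruction words and provides support for both data predicated execution and address predicated execution. This paper proposes a region-based method that supports optimization of both predicate modes: before the two kinds of comparison are performed, the types of the comparison's two operands are checked in order to apply the corresponding predicate optimization, so that address comparisons do not have to be transferred into general-purpose registers. Experimental results show that the optimization significantly saves CPU time and bandwidth, greatly reduces the number of branch instructions, and improves program performance by 28.4%.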

4.

Implementation and Optimization of Predicated Execution in a Dataflow-like Architecture

王莉安虹王耀彬任永青从明路璐《小型微型计算机系统》2010,31(12)

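Predicated execution is an important software technique for overcoming control dependences in programs. Exploiting the characteristics of dataflow-like architectures, this paper designs a low-overhead, efficient way to implement predicated execution in such an architecture: it occupies only 1 bit of the instruction encoding, and predicate values are passed directly between instructions over the on-chip network, so no predicate registers are needed. The main overhead of this implementation is the software output tree introduced by the dataflow-like instruction set, and the paper further proposes an optimization based on edge profiling. Experiments show that this optimization reduces the software output tree overhead by 17.3% and improves program performance by 15.5%.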

5.

A Fault-Tolerant Microarchitecture That Inserts Redundant Instructions After Branch Instructions

张仕健胡伟武《计算机学报》2007,30(10):1674-1680

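With the widespread use of deep-submicron processes, transient faults have become the main cause of chip failure. This paper proposes a fault-tolerant microarchitecture that inserts redundant instructions after branch instructions, using the processing bandwidth wasted by branch mispredictions to reduce the performance loss caused by redundant execution. Experimental results show that the performance loss of the technique ranges from 6% to 31%, averaging 21%, which is clearly lower than that of the MBI technique and comparable to that of the DIE technique. The technique can detect transient faults occurring in every pipeline stage and restore the processor state; its fault detection latency is short and its hardware overhead is small, making it well suited to improving the fault tolerance of embedded microprocessors with simple prediction mechanisms.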

6.

A Study of Dynamic Branch Predictor Design Schemes for Simultaneous Multithreading Processors

任建安虹路放梁博《计算机科学》2006,33(3):239-243

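A simultaneous multithreading (SMT) processor can issue instructions from multiple threads in each cycle, greatly increasing the instruction throughput of a superscalar microprocessor, but the simultaneous execution of multiple threads also causes many conflicts over shared hardware resources. In particular, having multiple threads share the branch prediction hardware has a considerable effect on branch prediction accuracy. Studying how branch handling schemes in SMT processors affect overall processor performance is very important for guiding SMT processor design. Using an SMT processor simulator, this paper experimentally evaluates several well-known branch prediction schemes on an SMT structure in which each thread runs an independent application. It analyzes the effect of the branch prediction schemes on branch prediction accuracy and overall processor performance in the single-threaded and multithreaded cases, and concludes that, in such an SMT structure, giving each thread its own predictor is a good choice; because each independent predictor can use a small, simple structure, this does not add much hardware overhead.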

7.

A Branch Prediction Method Based on Adaptive History Length

赵朝君陈晨陈志坚孟建熠《计算机辅助设计与图形学学报》2015,(4)

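By studying the relationship between prediction efficiency and branch history length in dynamic branch predictors, and observing that different branch instructions in a program have different optimal history lengths, this paper proposes a branch prediction method that searches for the best history length for each branch instruction. The method monitors the prediction accuracy of branch instructions in real time and dynamically adjusts the predictor's history length without changing the hardware resources of the branch prediction table, so as to adapt to the program's dynamic run-time behavior. Experimental results show that, with the same hardware resources, the method reduces the misprediction rate by 15.8% relative to the Gshare predictor and by 10.3% relative to the Bi-mode predictor.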

8.

Path Obfuscation: A Binary Obfuscation Technique That Effectively Resists Symbolic Execution

贾春福王志刘昕刘昕海《计算机研究与发展》2011,48(11)

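Symbolic execution collects and formally represents a program's path-branch information, and then uses path reachability reasoning to derive the dependences between software behavior and the external execution environment, such as user input and network input. These dependences have been widely applied in vulnerability discovery, code reuse, protocol analysis, and other areas. The same reverse-analysis technique can also be used by attackers for software cracking, tampering, and piracy, which poses a new threat to the protection of software intellectual property. This paper proposes a new software protection method based on path obfuscation to resist reverse analysis based on symbolic execution: it replaces conditional jump instructions with conditional exception code to hide the program's path-branch information, uses opaque predicates to introduce fake path branches that compensate for differences in the program's statistical properties, and analyzes the potency, resilience, and cost of the path obfuscation technique. Experimental results show that path obfuscation can protect all kinds of path-branch conditions, effectively reduce the leakage of path-branch information, and resist reverse analysis based on symbolic execution.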

9.

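A Study of Predicate Analysis Techniques in Instruction-Level Parallelism (total citations: 2; self-citations: 0; citations by others: 2)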

芦运照张兆庆连瑞琦《计算机学报》2003,26(10):1337-1342

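Predicate support is a new feature of the IA-64 architecture. It provides more opportunities for exploiting instruction-level parallelism, while making compiler design more difficult. Predicates are the basis of conditional execution and a new route to greater instruction-level parallelism. This paper presents a predicate analysis technique based on the predicate partition graph, designed and implemented for the first time in ORC (the IA-64 Open Research Compiler), and its application to instruction scheduling. The predicate analysis technique is used to build a predicate relation database, which the instruction scheduler queries to increase instruction-level parallelism. The paper focuses on constructing the predicate partition graph, the core of the predicate relation database, on top of which the computation and querying of predicate relations are implemented. Practical results show that the predicate analysis technique has a significant optimization effect.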

10.

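An Effective Fetch Control Mechanism for Simultaneous Multithreading Processors (total citations: 1; self-citations: 0; citations by others: 1)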

何立强刘志勇《计算机学报》2006,29(4):535-543

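A simultaneous multithreading processor fetches and executes instructions from multiple running threads in each clock cycle, greatly improving processor performance. The prediction accuracy of the branch predictor and the efficiency of the fetch policy are key to the performance of such processors. By combining a value-based branch predictor with a fetch policy based on thread progress speed, this paper proposes a new fetch control mechanism. The structure has low hardware overhead and low implementation complexity. Experimental results show that the mechanism effectively improves processor performance: its speedup over the traditional fetch control mechanism is 28%, which is also higher than that of current fetch control mechanisms based on stream buffers and on branch classifiers.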

11.

Dynamic Implicit Predication Based on Simplified Traces

唐遇星邓鹍窦勇周兴铭《计算机学报》2007,30(11):1972-1981

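Branch instructions and branch mispredictions limit a processor's potential for exploiting instruction-level parallelism (ILP). Converting control dependences in a program into data dependences through if-conversion or predicated execution can substantially reduce the cost of branch prediction. Whereas earlier related work focused mainly on compilers explicitly generating static predicated instructions for wide-issue processors, this paper proposes Dynamic Implicit Predication (DIP), a mechanism based on a simplified trace structure. Without help from the compiler or other binary tools, DIP identifies at run time the instruction fragments that can be predicated, performs the instruction conversion and optimization, and uses the optimized instruction trace in later executions. Simulation with SPEC2000 shows that DIP effectively avoids branch mispredictions and increases parallelism: the IPC of individual programs improves by 10.3% on average, and the average speedup across the benchmark programs reaches 7.59%.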

12.

The Partial Reverse If-Conversion Framework for Balancing Control Flow and Predication

David I. August Wen-Mei W. Hwu Scott A. Mahlke 《International journal of parallel programming》1999,27(5):381-423

Predicated execution is a promising architectural feature for exploiting instruction-level parallelism in the presence of control flow. Compiling for predicated execution involves converting program control flow into conditional, or predicated, instructions. This process is known as if-conversion. In order to apply if-conversion effectively, one must address two major issues: what should be if-converted and when the if-conversion should be performed. A compiler's use of predication as a representation is most effective when large amounts of code are if-converted and when if-conversion is performed early in the compilation procedure. On the other hand, efficient execution of code generated for a processor with predicated execution requires a delicate balance between control flow and predication. The appropriate balance is tightly coupled with scheduling decisions and detailed processor characteristics. This paper presents a compilation framework based on partial reverse if-conversion that allows the compiler to maximize the benefits of predication as a compiler representation while delaying the final balancing of control flow and predication to schedule time.
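As a rough illustration of if-conversion as defined in this abstract, the Python sketch below writes the same computation twice: once with an explicit branch, and once in a predicated style where both arms are evaluated and a predicate selects the result. The function names are hypothetical, and Python has no predicated instructions, so the second form only mimics the shape of if-converted code; it says nothing about the paper's partial reverse if-conversion framework.

```python
def abs_diff_branchy(a, b):
    """Control-flow form: only one arm executes, chosen by a branch."""
    if a > b:
        r = a - b
    else:
        r = b - a
    return r


def abs_diff_predicated(a, b):
    """If-converted form: no branch, both arms execute, a predicate selects."""
    p = a > b          # predicate definition (stands in for a compare-to-predicate)
    t = a - b          # would be guarded by p on a predicated machine
    f = b - a          # would be guarded by (not p)
    # Select the arm whose guard is true; bool behaves as 0 or 1 in arithmetic.
    return t * p + f * (not p)
```

The predicated form trades the control dependence for a data dependence on `p` and always pays for both arms, which is why the abstract stresses balancing control flow against predication.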

13.

Evaluating the Effects of Predicated Execution on Branch Prediction

Gary Tyson Matthew Farrens 《International journal of parallel programming》1996,24(2):159-186

As microprocessor designs move towards deeper pipelines and support for multiple instruction issue, steps must be taken to alleviate the negative impact of branch operations on processor performance. One approach is to use branch prediction hardware and perform speculative execution of the instructions following an unresolved branch. Another technique is to eliminate certain branch instructions altogether by translating the instructions following a forward branch into predicate form. Both these techniques are employed in many current processor designs. This paper investigates the relationship between branch prediction techniques and branch predication. In particular, we are interested in how using predication to remove a certain class of poorly predicted branches affects the prediction accuracy of the remaining branches. A variety of existing predication models for eliminating branch operations are presented, and the effect that eliminating branches has on branch prediction schemes ranging from simple prediction mechanisms to the newer more sophisticated branch predictors is studied. We also examine the impact of predication on basic block size, and how the two techniques used together affect overall processor performance.

14.

Path Analysis and Renaming for Predicated Instruction Scheduling

Lori Carter Beth Simon Brad Calder Larry Carter Jeanne Ferrante 《International journal of parallel programming》2000,28(6):563-588

Increases in instruction level parallelism are needed to exploit the potential parallelism available in future wide issue architectures. Predicated execution is an architectural mechanism that increases instruction level parallelism by removing branches and allowing simultaneous execution of multiple paths of control, only committing instructions from the correct path. In order for the compiler to expose and use such parallelism, traditional compiler data-flow and path analysis needs to be extended to predicated code. In this paper, we motivate the need for renaming and for predicates that reflect path information. We present Predicated Static Single Assignment (PSSA) which uses renaming and introduces Full-Path Predicates to remove false dependences and enable aggressive predicated optimization and instruction scheduling. We demonstrate the usefulness of PSSA for Predicated Speculation and Control Height Reduction. These two predicated code optimizations used during instruction scheduling reduce the dependence length of the critical paths through a predicated region. Our results show that using PSSA to enable speculation and control height reduction reduces execution time from 12 to 68%.

15.

Branch Control Techniques for Exploiting Instruction Parallelism (total citations: 1; self-citations: 1; citations by others: 0)

王新辉王建新《计算机工程与应用》1999,35(12):25-27,35

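Increasing instruction-level parallelism is one of the goals of modern computers, and control branches pose a challenge to exploiting it. To exploit instruction-level parallelism, modern computers use two branch control techniques: speculative execution and predicated execution. This paper systematically analyzes the implementation of these two techniques and illustrates them with the implementation of the Merced chip.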

16.

Explore prediction for instruction level redundant execution in fault tolerant microprocessors

《Journal of Systems Architecture》2016

Many devices with modern microprocessors have drawn increased attention to transient soft errors. Previous strategies for instruction level temporal redundancy in super-scalar out-of-order processors have up to 45% performance degradation in certain applications compared to normal execution. The reason is that the redundant workload slows down the normal execution. Solutions are proposed to avoid certain redundant execution by reusing the result of the previously executed instructions, but there are still limitations on the instruction level parallelism and the pipeline throughput. In this paper, we propose a novel technique to recover the performance gap between instruction level temporal redundancy and normal execution. We present a set of micro-architectural extensions to implement the reliability prediction and integrate it with the issue logic of a dual instruction stream superscalar core, and conduct extensive evaluations to demonstrate how it can solve the performance problem. Experiments show that on average it can gain back nearly 71.13% of the overall IPC loss caused by redundant execution. Generally, it exhibits much performance and power efficiency within a high transient error rate.

17.

Using Predicated Execution to Improve the Performance of a Dynamically Scheduled Machine with Speculative Execution

Po-Yung Chang Eric Hao Yale N. Patt Pohua P. Chang 《International journal of parallel programming》1996,24(3):209-234

Conditional branches incur a severe performance penalty in wide-issue, deeply pipelined processors. Speculative execution (1, 2) and predicated execution (3–9) are two mechanisms that have been proposed for reducing this penalty. Speculative execution can completely eliminate the penalty associated with a particular branch, but requires accurate branch prediction to be effective. Predicated execution does not require accurate branch prediction to eliminate the branch penalty, but is not applicable to all branches and can increase the latencies within the program. This paper examines the performance benefit of using both mechanisms to reduce the branch execution penalty. Predicated execution is used to handle the hard-to-predict branches and speculative execution is used to handle the remaining branches. The hard-to-predict branches within the program are determined by profiling. We show that this approach can significantly reduce the branch execution penalty suffered by wide-issue processors.
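The profiling step mentioned in this abstract, finding the hard-to-predict branches that become candidates for predication, might look roughly like the Python sketch below. The function name, the trace format, the per-branch 2-bit counter used as the reference predictor, and the 20% threshold are all assumptions chosen for illustration; the abstract does not say which predictor or cutoff the authors profiled with.

```python
from collections import defaultdict


def hard_to_predict(trace, threshold=0.2):
    """Return the branch addresses whose profiled misprediction rate
    exceeds `threshold`.

    `trace` is an iterable of (pc, taken) pairs with `taken` a bool.
    Each branch is modelled with its own 2-bit saturating counter
    (an assumed stand-in for whatever predictor the target machine has).
    """
    counter = defaultdict(lambda: 2)   # start at "weakly taken"
    seen = defaultdict(int)
    missed = defaultdict(int)
    for pc, taken in trace:
        predicted = counter[pc] >= 2
        seen[pc] += 1
        if predicted != taken:
            missed[pc] += 1
        # Train the counter toward the actual outcome.
        if taken:
            counter[pc] = min(3, counter[pc] + 1)
        else:
            counter[pc] = max(0, counter[pc] - 1)
    return {pc for pc in seen if missed[pc] / seen[pc] > threshold}
```

Branches in the returned set would be the ones handed to predicated execution, while the rest stay as ordinary branches handled by speculative execution.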

18.

Implicit Control Flow Recovery in Software Pipelining

汪淼赵荣彩蔡国明丁志芳《计算机科学》2008,35(10):272-274

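Through if-conversion, loops containing conditional branches have their explicit control flow converted into implicit control flow, which gives instruction scheduling further opportunities. However, if-conversion often deeply restructures the program's code, making the program harder to understand and code reconstruction more complex. This paper proposes a technique for recovering implicit control flow in software-pipelined loops; it reconstructs the conditional branches in such loops and improves the quality of the target code generated in software reverse engineering.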

19.

Wish Branches: Enabling Adaptive and Aggressive Predicated Execution

Hyesoon Kim Mutlu O. Patt Y.N. Stark J. 《Micro, IEEE》2006,26(1):48-58

The goal of wish branches is to use predicated execution for hard-to-predict dynamic branches, and branch prediction for easy-to-predict dynamic branches, thereby obtaining the best of both worlds. Wish loops, one class of wish branches, use predication to reduce the misprediction penalty for hard-to-predict backward (loop) branches.