1.
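Computer Applications and Software (《计算机应用与软件》), 2014, (8)
On a hybrid-architecture digital signal processor that supports both superscalar and very long instruction word (VLIW) execution, branch prediction is added to the superscalar structure. To limit hardware design complexity while maintaining the prediction hit rate, the scheme uses a gshare predictor. The Dhrystone and Coremark benchmarks, compiled with the Open64 compiler, are run on the completed hardware. Experimental results show that adding branch prediction improves processor performance by 30% to 35%.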
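For readers unfamiliar with gshare, the following is a minimal software sketch of the general technique, not the hardware design described in the abstract. The table size, the 2-bit counters, and the names `GsharePredictor` and `misprediction_rate` are assumptions chosen for illustration.

```python
class GsharePredictor:
    """Toy gshare: (PC XOR global history) indexes a table of 2-bit counters."""

    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        self.history = 0                          # global branch history register
        self.counters = [1] * (1 << index_bits)   # start at "weakly not-taken"

    def _index(self, pc):
        return (pc ^ self.history) & self.mask

    def predict(self, pc):
        # Counter values 2 and 3 mean "predict taken".
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        # Shift the actual outcome into the global history.
        self.history = ((self.history << 1) | int(taken)) & self.mask


def misprediction_rate(predictor, trace):
    """trace is a list of (pc, taken) pairs; returns the fraction mispredicted."""
    misses = 0
    for pc, taken in trace:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses / len(trace)
```

After a short warm-up while the history register fills, a branch that is always taken is predicted correctly on every later visit.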
2.
3.
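Computer Applications and Software (《计算机应用与软件》), 2017, (2)
Describes a branch prediction structure designed for a hybrid-architecture digital signal processor that supports both superscalar and VLIW structures. To limit hardware complexity while maximizing prediction accuracy, a hybrid prediction structure combining a bimodal predictor with a PAp predictor is designed to exploit the strengths of both. Standard DSPstone programs are run on the completed processor. Experimental results show that adding the branch prediction structure improves processor performance by 23% on average, and that the hybrid structure has a clear accuracy advantage over a single-predictor structure.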
4.
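An Accurate Microprocessor Model with Branch Prediction (cited 3 times: 0 self-citations, 3 by others)
In today's deeply pipelined, wide-issue microprocessors, accurate branch prediction is an indispensable technique for achieving high performance. Branch mispredictions waste many clock cycles and prevent out-of-order execution from delivering its potential. The effective performance of a wide-issue microprocessor also depends on the instruction window size and the instruction fetch width. A new, more accurate microprocessor model is proposed that accounts for branch prediction and the cycle penalty of branch mispredictions. Based on the statistical rule that instruction execution bandwidth is the square root of the number of instructions available in the instruction window, a more accurate algorithm is given that describes the relationship among fetch bandwidth, branch prediction accuracy, misprediction cycle penalty, instruction window size, and IPC. The trade-offs among these parameters and their effect on program IPC are also discussed. From this, the fetch bandwidth threshold, which depends on several microprocessor parameters, and the choice of several key microprocessor parameters can be determined.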
5.
6.
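Branch prediction can eliminate the cycles lost after branch instructions and prevent pipeline stalls. High branch prediction accuracy underpins the performance of high-performance microprocessors. This paper analyzes in detail the multi-level branch prediction mechanism of the Itanium processor and examines how the predictor at each level is implemented.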
7.
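A simplified instruction prediction and branch unit is proposed. The instruction prediction unit consists of two levels of fully associative buffers with different access latencies. Used in a microprocessor based on simultaneous multithreading, the improved simplified prediction unit achieves relatively high prediction accuracy. A two-thread branch unit is also designed, in which an extended carry-lookahead adder quickly computes target addresses and a comparator decides whether an instruction jumps, improving hardware resource utilization and computational efficiency. Test results on examples show that the simplified prediction and branch unit performed well.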
8.
9.
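Sim-Godson: A Godson (龙芯) CPU Simulator Based on SimpleScalar (cited 6 times: 1 self-citation, 6 by others)
The design of modern high-performance general-purpose processors is increasingly complex, and simulators play a growing role in processor design. Godson-2 (龙芯2号) is a high-performance general-purpose processor developed by the Institute of Computing Technology, Chinese Academy of Sciences. ICT-Godson, the earliest simulator developed for Godson-2, is a signal-level simulator that models every detail of the processor; it is highly accurate but limited in speed and flexibility. This paper designs and implements Sim-Godson, a Godson-2 simulator based on the SimpleScalar tool set. Sim-Godson is fast and flexible while remaining highly accurate. On a 3.0 GHz Pentium 4 machine, Sim-Godson runs at about 500K instructions/s. For most test programs, the IPC (instructions per cycle) measured on Sim-Godson differs from that of ICT-Godson by less than 5%. Sim-Godson has played an important role in the performance analysis of Godson-2.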
10.
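The branch predictor is an important microarchitectural component of modern processors; it effectively mitigates control hazards in the pipeline and improves processor performance. However, even though branch predictor designs have become increasingly advanced and vendors do not disclose design details, researchers continue to expose security problems in predictor-based branch prediction mechanisms. By exploiting branch prediction, an attacker can build side channels or covert channels and thereby bypass software and hardware security boundary checks. In the well-known Spectre attack, the branch predictor is also used to create a transient execution window, which breaks the mistaken security assumption that mispredicted and executed instructions are completely transparent to software programmers. Since Spectre was disclosed, the security of branch prediction has received growing attention, and related attack variants and defenses have become a topic of shared interest in academia and industry. Starting from branch predictor design, this paper summarizes how branch predictors work, drawing on designs that have been published or reverse-engineered by researchers. It then classifies existing branch prediction attacks by features such as how the predictor is populated, how it is indexed, and how the prediction is exploited, and summarizes their attack models, including attack scenarios and attack chains. Next, considering the typical microarchitectures of mainstream commercial processors from Intel, AMD, and ARM, the paper uses the attack model to analyze in depth the relationships, novelty, and feasibility of the individual branch prediction attacks, and proposes a theoretical method for evaluating the feasibility of branch-prediction-based transient execution attacks...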
11.
Evaluation and Choice of Various Branch Predictors for Low-Power Embedded Processor (cited 2 times: 0 self-citations, 2 by others)
Power is an important design constraint in embedded computing systems. To meet the power constraint, microarchitecture and hardware designed to achieve high performance need to be revisited from both performance and power angles. This paper studies one such component: the branch predictor. As is well known, branch prediction is critical to exploiting instruction-level parallelism effectively, but it may incur additional power consumption due to the hardware resources dedicated to branch prediction and the extra power consumed on mispredicted branches. This paper explores the design space of branch prediction mechanisms and tries to find the most beneficial one for realizing a low-power embedded processor. The sample processor studied is a Godson-like processor, which is a dual-issue, out-of-order processor with a deep pipeline, supporting the MIPS instruction set.
12.
13.
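By studying the relationship between prediction efficiency and branch history length in dynamic branch predictors, and based on the observation that each branch instruction in a program has its own optimal history length, a branch prediction method is proposed that searches for the best history length for each branch instruction. The method monitors each branch's prediction accuracy at run time and dynamically adjusts the predictor's history length, without changing the hardware resources of the branch prediction table, to adapt to the program's dynamic behavior. Experimental results show that, with the same hardware resources, the proposed method reduces the misprediction rate by 15.8% relative to the Gshare predictor and by 10.3% relative to the Bi-mode predictor.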
14.
Artur Klauser, Srilatha Manne, Dirk Grunwald. International Journal of Parallel Programming, 2001, 29(1): 81-110
This paper describes a family of branch predictors that use confidence estimation to improve the performance of an underlying branch predictor. This method, referred to as Selective Branch Inversion (SBI), uses a confidence estimator to determine when the branch direction prediction is likely to be incorrect; branch decisions for these low-confidence branches are inverted. SBI with an underlying Gshare branch predictor outperforms other equal sized predictors such as the best history length Gshare predictor, as well as equally complex McFarling and Bi-Mode predictors. Our analysis shows that SBI achieves its performance through conflict detection and correction, rather than through conflict avoidance as some of the previously proposed predictors such as Bi-Mode and Agree. We also show that SBI is applicable to other underlying predictors, such as the McFarling Combined predictor. Finally we show that Dynamic Inversion Monitoring (DIM) can be used as a safeguard to turn off SBI in cases where it degrades the overall performance.
15.
In a modern processor, branch prediction is crucial to effectively exploiting instruction-level parallelism for high-performance execution. However, recently exposed vulnerabilities reveal the urgency of improving the security of branch predictors. The root cause of these vulnerabilities is that the update strategy of the saturating counter is deterministic. Previous studies of the saturating counter, a fundamental building block of modern branch predictors, have focused on performance and hardware cost and ignored its security. This leaves attackers the opportunity to perform side-channel attacks on the branch predictor. This paper focuses on the saturating counter and explores a secure, lightweight design to mitigate branch predictor side-channel attacks. Instead of applying an isolation mechanism to branch predictor resources, we propose a novel probabilistic saturating counter design to confuse the attacker's perception of the victim's behaviour. It replaces the conventional deterministic state transition function with a probabilistic one. When a branch is committed, the conventional saturating counter is always updated according to whether the prediction was correct. In the probabilistic saturating counter, by contrast, the branch predictor decides whether to perform the update based on the update probability. The probabilistic saturating counter dramatically reduces the attacker's ability to spy on the saturating counter's state. Our analyses using a cycle-accurate simulator suggest that the proposed mechanism incurs 2.4% performance overhead and hardware cost while providing strong protection.
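The difference between a deterministic and a probabilistic update can be illustrated with a small sketch. It is not the authors' design: the fixed `update_probability` parameter, the 2-bit width, and the class name are assumptions made here for illustration only.

```python
import random


class ProbabilisticSaturatingCounter:
    """2-bit saturating counter whose updates are applied only with some probability."""

    def __init__(self, update_probability, rng=None):
        # Assumed to be a fixed value; the abstract does not say how it is chosen.
        self.update_probability = update_probability
        self.rng = rng or random.Random()
        self.value = 1  # states 0..3; 0-1 predict not-taken, 2-3 predict taken

    def predict(self):
        return self.value >= 2

    def update(self, taken):
        """Apply the usual +1/-1 transition with probability update_probability.

        Returns True if the counter was updated, False if the update was dropped.
        """
        if self.rng.random() >= self.update_probability:
            return False  # dropped: the state no longer mirrors every outcome
        if taken:
            self.value = min(3, self.value + 1)
        else:
            self.value = max(0, self.value - 1)
        return True
```

With `update_probability=1.0` the sketch behaves like a conventional saturating counter; lower values make the counter state a less reliable record of the victim's branch outcomes, at some cost in prediction accuracy.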
16.
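A conventional branch target buffer (BTB) is accessed in every fetch cycle. Because branch instructions make up only about 20% of all instructions in a program, roughly 80% of BTB accesses are wasted. Exploiting the fact that the distance between branch instructions in a program's control flow is fixed, a BTB skip-access algorithm with minimal performance impact is proposed. The BTB stores the distance from each branch instruction to the next branch instruction on the execution path. After a BTB hit, BTB accesses between the current branch and the next branch are disabled according to that distance, which improves access efficiency and reduces dynamic power. When implemented in an embedded processor, the algorithm controls skip access only for predicted-taken branches, reducing hardware overhead. Simulation and synthesis on a hardware model show that, for a 128-entry BTB, the algorithm reduces dynamic power by 72% with a performance loss of only 0.013%.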
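The skip-access idea can be sketched as a simple access counter over an instruction trace. This is a simplified illustration, not the paper's algorithm: it fetches one instruction per cycle, uses an unbounded dictionary in place of a 128-entry BTB, and skips after every BTB hit rather than only after predicted-taken branches. The function name and trace format are hypothetical.

```python
def count_btb_accesses(trace, skip_enabled=True):
    """Count BTB lookups over a trace of (pc, is_branch) pairs.

    The model BTB maps a branch PC to the distance (in instructions) to the
    next branch seen on the executed path.
    """
    btb = {}            # branch pc -> learned distance to the next branch
    accesses = 0
    skip = 0            # remaining fetches for which the BTB lookup is disabled
    last_branch = None  # (pc, position) of the most recent branch
    for pos, (pc, is_branch) in enumerate(trace):
        if skip > 0:
            skip -= 1                  # lookup disabled for this fetch
        else:
            accesses += 1              # normal BTB lookup
            if skip_enabled and is_branch and pc in btb:
                # Hit: skip the non-branch instructions before the next branch.
                skip = btb[pc] - 1
        if is_branch:
            if last_branch is not None:
                prev_pc, prev_pos = last_branch
                btb[prev_pc] = pos - prev_pos   # learn the branch distance
            last_branch = (pc, pos)
    return accesses


# A loop body of four non-branch instructions followed by one branch, run 20 times.
loop_trace = [(pc, pc == 16) for _ in range(20) for pc in (0, 4, 8, 12, 16)]
baseline = count_btb_accesses(loop_trace, skip_enabled=False)
with_skip = count_btb_accesses(loop_trace, skip_enabled=True)
```

Once the distance for the loop branch has been learned, only the branch itself triggers a lookup in each iteration, so `with_skip` is well below `baseline`.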
17.
Microarchitects should consider power consumption, together with accuracy, when designing a branch predictor, especially in embedded processors. This paper proposes a power-aware branch predictor, which is based on the gshare predictor, by accessing the BTB (Branch Target Buffer) selectively. To enable the selective access to the BTB, the PHT (Pattern History Table) in the proposed branch predictor is accessed one cycle earlier than the traditional PHT if the program is executed sequentially without branch instructions. As a side effect, two predictions from the PHT are obtained through one access to the PHT, resulting in more power savings. In the proposed branch predictor, if the previous instruction was not a branch and the prediction from the PHT is untaken, the BTB is not accessed to reduce power consumption. If the previous instruction was a branch, the BTB is always accessed, regardless of the prediction from the PHT, to prevent the additional delay/accuracy decrease. The proposed branch predictor reduces the power consumption with little hardware overhead, not incurring additional delay and never harming prediction accuracy. The simulation results show that the proposed branch predictor reduces the power consumption by 29-47%.
18.
19.
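In theory, increasingly complex branch prediction algorithms and larger storage structures keep improving branch prediction accuracy, but the prediction latency caused by today's complex algorithms and large data structures can no longer meet the pipeline's single-cycle requirement. To resolve this conflict between prediction accuracy and latency, an ahead branch prediction architecture (ABPA) is proposed. ABPA gives the fetch unit at the pipeline front end a simple branch prediction table for fast prediction, while the complex prediction algorithms and larger storage structures are moved to the pipeline back end, preserving prediction accuracy. For multi-target indirect branch instructions, which have long been hard to predict accurately, an indirect branch prediction algorithm based on branch history and target path (the BHTP algorithm) is proposed. The ahead branch prediction algorithm combines an improved high-accuracy branch prediction algorithm with the BHTP algorithm. A branch prediction engine embedding the ahead branch prediction algorithm performs branch speculation and target prediction at the pipeline back end and updates the branch prediction table at the front end. Experimental results show that a branch prediction system using ABPA and the BHTP algorithm reaches an average accuracy of 94.27%. The design not only achieves fast, high-accuracy branch prediction but also provides a basis for further research on branch prediction.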
20.
Pierre Michaud, André Seznec, Stéphan Jourdan. International Journal of Parallel Programming, 2001, 29(1): 35-58
The performance of superscalar processors depends on many parameters with correlated effects. This paper explores the relations between some of these parameters, and more particularly, the requirement in instruction fetch bandwidth. We introduce new enhancements to increase the bandwidth of conventional instruction fetch engines. However, experiments show that the performance does not increase proportionally to the fetch. Once the measured IPC is half the instruction fetch bandwidth, increasing the fetch bandwidth brings very little improvement. In order to better understand this behavior, we develop a model from the empirical observation that the available instruction parallelism grows as the square root of the instruction window size. From the model, we derive that the fetch bandwidth requirement grows as the square root of the distance between mispredicted branches. We also verify experimentally that, to double the IPC, one should both double the fetch bandwidth and decrease the number of mispredicted branches fourfold.
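The scaling argument in this abstract can be written out as a short worked relation. The symbols $W$, $D$, and $B_{\text{req}}$ are notation introduced here for illustration and are not taken from the paper; proportionality constants are left unspecified.

```latex
\begin{align*}
\text{available parallelism} &\propto \sqrt{W}
  && \text{($W$: instruction window size, empirical observation)}\\
B_{\text{req}} &\propto \sqrt{D}
  && \text{($D$: distance between mispredicted branches)}\\
\frac{B_{\text{req}}'}{B_{\text{req}}} = 2
  &\iff \frac{D'}{D} = 4
\end{align*}
```

Read this way, doubling the useful fetch bandwidth requires the distance between mispredicted branches to quadruple, which matches the abstract's statement that doubling the IPC calls for both twice the fetch bandwidth and a fourfold reduction in mispredicted branches.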