首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到18条相似文献,搜索用时 62 毫秒
1.
嵌入式处理器动态分支预测机制研究与设计   总被引:2,自引:1,他引:1  
黄伟  王玉艳  章建雄 《计算机工程》2008,34(21):163-165
针对嵌入式处理器的特定应用环境,通过对传统神经网络算法的改进,结合定制的分支目标缓冲,提出一种复合式动态分支预测机制。该机制基于全局索引方式,对BTB结构进行定制设计,实现对循环逻辑中最后一条分支指令的精确预测。实验结果表明,该动态分支预测机制能降低硬件复杂度,提高预测精度。  相似文献   

2.
传统的分支目标缓冲器(BTB)每个取指周期都要进行访问,由于程序中的分支指令只占总指令数的20%左右,使得大约80%的BTB访问都是无效的.为此,利用程序控制流中分支指令间距固定的特性,提出一种对性能影响极小的BTB跳跃访问算法.在BTB中存储分支指令到运行路径中下一条分支指令的距离,BTB命中后,根据相应的分支距离来关闭当前分支指令与下一条分支指令之间的BTB访问,以有效地提高访问效率并降低动态功耗.该算法在嵌入式处理器中实现时只控制预测跳转分支指令的BTB跳跃访问,减少了硬件资源的开销.在硬件模型上进行模拟和综合后的结果表明,在128分支项的BTB中,采用文中算法可以降低72%的动态功耗,而性能损失仅为0.013%.  相似文献   

3.
针对传统处理器分支预测器存在分支预测信息混乱、分支指令别名冲突和容量冲突率高的缺点,提出基于同时多线程处理器的分支预测器TBHBP。该分支预测器采取线程历史信息与基于地址索引的局部历史信息相结合的综合历史信息作为模式匹配表PHT的索引,并采取线程独立拥有线程历史寄存器和分支历史寄存器的方式,通过新增分支结果输出表来提高指令的分支预测执行速度。研究结果表明,TBHBP分支预测器有效解决了分支信息过时、分支指令别名和容量冲突的问题。与Gshare分支预测器相比,其指令吞吐率提升了12.5%,分支误预测率和误预测路径取指率分别下降了0.5%和2.1%。  相似文献   

4.
分支目标缓存(BTB)是高端嵌入式CPU的主要耗能部件之一。针对BTB访问中引入的冗余功耗问题,提出了一种循环体访问过滤机制消除循环体指令流中顺序指令对BTB的无效访问。进一步提出了一种分支跟踪方法补偿循环过滤机制对循环体中非循环类分支指令的错误过滤造成的性能损失,节省了循环体指令流中顺序指令访问BTB的大量冗余功耗。基于Powerstone基准程序的仿真实验表明,在128表项BTB配置下,二级循环过滤器和4表项分支踪迹表可以减少约71.9%的BTB功耗,而平均每条指令周期数(CPI)退化仅为0.66%。  相似文献   

5.
在理论上,越来越复杂的分支预测算法和更大的存储结构会使分支预测精度不断提高,但当前复杂算法和庞大数据结构所引发的分支预测时延已无法满足流水线单周期运行要求.针对分支预测精度和时延的矛盾,设计提出提前分支预测结构(ahead branch prediction architecture,ABPA).ABPA为流水线前端取指部件提供简单的分支预测表,以实现快速分支预测;复杂的预测算法和较大的存储结构均被移至流水线后端实现,从而保证了分支预测精度.对于一直难以准确预测的多目标间接分支指令,设计提出基于分支历史和目标路径的间接分支预测算法(indirect branch prediction algorithm based on branch history and target path,BHTP algorithm).提前分支预测算法采用改进的高精度分支预测算法和BHTP算法的混合.嵌入提前分支预测算法的分支预测引擎实现流水线后端的分支推测和目标预测,以及流水线前端的分支预测表更新.实验结果表明:采用ABPA结构和BHTP算法的分支预测系统平均精度达到94.27%.设计不仅实现了快速、高精度分支预测,更为分支预测的深入研究提供了条件.  相似文献   

6.
本文提出了一种结合PAs和Gshare两者优点,同时尽可能消除两者缺点的两级自适应分支预测算法及其实现——Pshare分支预测方法。通过使用单独的BHSR来跟踪不同分支指令的历史轨迹,克服了GAs和Gshare机制中统一历史模式带来的缺点;同时,通过使用分支指令部分地址和历史模式位的杂凑运算来索引共享的一维PHT小表,减少了PAs机制中二维PHT大表的面积开销和索引延时开销。Pshare两级自适应分支预测器用接近Gshare的实现代价,达到了近似PAs的预测精度。该分支预测技术适用于强调单核性能的超标量超流水结构的高性能处理器中,有助于减少垂直浪费,提高单核心的性能和效率。  相似文献   

7.
熊振亚  林正浩  任浩琪 《计算机科学》2017,44(3):195-201, 214
现代计算机体系结构受两个方面的困扰:性能和能耗。为降低嵌入式处理器日益增长的功耗,提出基于跳转轨迹的分支目标缓冲结构(TG-BTB)。与传统分支目标缓冲每次提取指令时需要查询分支目标缓冲不同,TG-BTB只在执行轨迹预测为跳转时才查询分支目标缓冲。该结构通过在程序执行过程中动态分析跳转轨迹行为,可以实现只在轨迹跳转时查询分支目标缓冲,从而降低功耗。在动态分析过程中首先提取记录两条跳转分支指令之间的指令间隔,然后将提取的指令间隔存储在TG-BTB中,最后根据存储在TG-BTB中的指令间隔决定是否需要查询BTB。基于基准测试向量进行模型验证和性能测试,实验结果表明TG-BTB降低了81%的BTB查询能耗。  相似文献   

8.
提出了一种精简指令预测与分支部件.指令预测部件由访问延迟不同的两级全相联缓冲组成,在基于同时多线程技术的微处理器条件下使用改进的精简预测部件后,取得了较高的预测准确性.实现了基于超前扩展进位加法器快速计算目标地址与比较器确定指令是否跳转的两线程分支部件的设计,提高了硬件资源的利用率与运算的效率.实例测试结果表明,精简预测与分支部件在测试的过程中达到了较好的效果.  相似文献   

9.
优化散列排序算法研究   总被引:1,自引:0,他引:1  
本文介绍一种优化了的散列排序新算法。适当增加存储空间,子序列元素相对一次到位,有效地减少了地址冲突,大大地提高了排序效率  相似文献   

10.
任建  安虹  路放  梁博 《计算机科学》2006,33(3):239-243
同时多线程处理器(SMT)每个周期能够从多个线程中发射指令执行,从而大大地提高了超标量微处理器的指令吞吐量,但多个线程的同时执行也带来了许多硬件资源的共享冲突问题.其中,多个线程共享分支预测硬件的方案会对分支预测精度产生较大的影响.研究SMT处理器中分支处理方案对于处理器整体性能的影响,对于指导SMT处理器的设计是十分重要的.本文利用SMT处理器模拟器,针对各线程运行独立应用的SMT结构实验评估了几种著名的分支预测方案;给出了在单线程和多线程情况下,分支预测方案对分支预测精度和处理器整体性能的影响的分析;总结出在这样的SMT结构中,各线程拥有独立的预测器是一种较好的选择,并且由于各独立预测器可以采用小而简单的结构,所以不会带来太多的硬件开销.  相似文献   

11.
This paper describes a general code‐improving transformation that can coalesce conditional branches into an indirect jump from a table. Applying this transformation allows an optimizer to exploit indirect jumps for many other coalescing opportunities besides the translation of multiway branch statements. First, dataflow analysis is performed to detect a set of coalescent conditional branches, which are often separated by blocks of intervening instructions. Secondly, several techniques are applied to reduce the cost of performing an indirect jump operation, often requiring the execution of only two instructions on a SPARC. Finally, the control flow is restructured using code duplication to replace the set of branches with an indirect jump. Thus, the transformation essentially provides early resolution of conditional branches that may originally have been some distance from the point where the indirect jump is inserted. The transformation can be frequently applied with often significant reductions in the number of instructions executed, total cache work, and execution time. In addition, we show that with branch target buffer support, indirect jumps improve branch prediction since they cause fewer mispredictions than the set of branches they replaced. Copyright © 1999 John Wiley & Sons, Ltd.  相似文献   

12.
In theory, branch predictors with more complicated algorithms and larger data structures provide more accurate predictions. Unfortunately, overly large structures and excessively complicated algorithms cannot be implemented because of their long access delay. To date, many strategies have been proposed to balance delay with accuracy, but none has completely solved the issue. The architecture for ahead branch prediction (A2BP) separates traditional predictors into two parts. First is a small table located at the front-end of the pipeline, which makes the prediction brief enough even for some aggressive processors. Second, operations on complicated algorithms and large data structures for accurate predictions are all moved to the back-end of the pipeline. An effective mechanism is introduced for ahead branch prediction in the back-end and small table update in the front. To substantially improve prediction accuracy, an indirect branch prediction algorithm based on branch history and target path (BHTP) is implemented in A2BP. Experiments with the standard performance evaluation corporation (SPEC) benchmarks on gem5/SimpleScalar simulators demonstrate that A2BP improves average performance by 2.92% compared with a commonly used branch target buffer-based predictor. In addition, indirect branch misses with the BHTP algorithm are reduced by an average of 28.98% compared with the traditional algorithm.  相似文献   

13.
Nowadays energy-efficiency becomes the first design metric in chip development. To pursue higher energy efficiency, the processor architects should reduce or eliminate those unnecessary energy dissipations. Indirect-branch pre- diction has become a performance bottleneck, especially for the applications written in object-oriented languages. Previous hardware-based indirect-branch predictors are generally inefficient, for they either require significant hardware storage or predict indirect-branch targets slowly. In this paper, we propose an energy-efficient indirect-branch prediction technique called TAP (target address pointer) prediction. Its key idea includes two parts: utilizing specific hardware pointers to accelerate the indirect branch prediction flow and reusing the existing processor components to reduce additional hardware costs and power consumption. When fetching an indirect branch, TAP prediction first gets the specific pointers called target address pointers from the conditional branch predictor, and then uses such pointers to generate virtual addresses which index the indirect-branch targets. This technique spends similar time compared to the dedicated storage techniques without requiring additional large amounts of storage. Our evaluation shows that TAP prediction with some representative state-of-the-art branch predictors can improve performance significantly over the baseline processor. Compared with those hardware-based indirect-branch predictors, the TAP-Perceptron scheme achieves performance improvement equivalent to that provided by an 8 K-entry TTC predictor, and also outperforms the VPC predictor.  相似文献   

14.
Predicting indirect-branch targets has become a performance bottleneck for many applications.Previous highperformance indirect-branch predictors usually require significant hardware storage or additional compiler support,which increases the complexity of the processor front-end or the compilers.This paper proposes a complexity-effective indirectbranch prediction mechanism,called the Set-Way Index Pointing (SWIP) prediction.It stores multiple indirect-branch targets in different branch target buffer (BTB) entries,whose set indices and way locations are treated as set-way index pointers.These pointers are stored in the existing branch-direction predictor.SWIP prediction reuses the branch direction predictor to provide such pointers,and then accesses the pointed BTB entries for the predicted indirect-branch target.Our evaluation shows that SWIP prediction could achieve attractive performance improvement without requiring large dedicated storage or additional compiler support.It improves the indirect-branch prediction accuracy by 36.5% compared to that of a commonly-used BTB,resulting in average performance improvement of 18.56%.Its energy consumption is also reduced by 14.34% over that of the baseline.  相似文献   

15.
32位RISC微处理器中分支预测器的硬件实现*   总被引:1,自引:0,他引:1  
提出了一种基于Bi-mode和分支路径历史的动态分支预测器,并在西北工业大学自主设计的“龙腾R2”微处理器中得以FPGA硬件实现,提出的分支预测器对条件分支可以进行准确地预测,具有延迟小、功耗低的特点。  相似文献   

16.
处理器性能的发挥常常受到转移指令的限制,所以转移预测的成功与否对于处理器的性能影响至关重要。反馈式编译优化是一种基于程序当前和以前运行时的趋势来改变程序以后执行动作的技术,能够提供给编译器一些有用的优化信息。本文针对ALPHA中的结构特点,利用反馈式编译优化技术,提高了ALPHA中的转移预测命中率,实验结果表明,加速比效果较为明显。  相似文献   

17.
Rocket是基于RISC-V指令集架构的开源处理器,具有分支预测功能,其实现了GShare分支预测机制,在分析Rocket处理器分支预测处理过程、分支预测实现原理的基础上,利用模拟器进行了性能测试,并依据测试结果,对Rocket处理器分支预测参数配置给出建议.  相似文献   

18.
通过研究处理器动态分支预测器中预测效率与分支历史长度的关系,针对程序中各分支指令存在不同最优历史长度的规律,提出一种搜索各分支指令最佳历史长度的分支预测方法.该方法通过实时监测分支指令的预测准确率,在分支预测表硬件资源不变的情况下动态调整预测器的历史长度,以适应程序的动态运行特征.实验结果表明,在相同硬件资源下,文中方法相对于Gshare预测器错误率降低15.8%,相对于Bi-mode预测器预测错误率降低10.3%.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号