1.
The branch target buffer (BTB) is one of the main power-consuming components of high-end embedded CPUs. To address the redundant power introduced by BTB accesses, a loop-body access filtering mechanism is proposed that eliminates useless BTB accesses by sequential instructions in loop-body instruction streams. A branch-trace method is further proposed to compensate for the performance loss caused by the loop filter mistakenly filtering non-loop branch instructions inside loop bodies, so that the large amount of redundant power spent on BTB accesses by sequential loop-body instructions is saved. Simulation experiments on the Powerstone benchmarks show that, with a 128-entry BTB, a two-level loop filter and a 4-entry branch trace table reduce BTB power by about 71.9%, while average cycles per instruction (CPI) degrades by only 0.66%.
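A minimal C sketch of the filtering decision described above, assuming a single recorded loop range and a 4-entry branch trace table; the structure names and the learning step are illustrative, not the paper's implementation.

```c
/* Sketch of a loop-body BTB access filter: once a taken backward branch
 * marks a loop, sequential instructions inside the recorded loop body skip
 * the BTB lookup.  A tiny branch-trace table re-enables lookups for
 * non-loop branches inside the loop that the filter would otherwise hide.
 * All names and the learning policy are illustrative assumptions. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define TRACE_ENTRIES 4                       /* 4-entry branch trace table */

static uint32_t loop_start, loop_end;         /* PC range of the detected loop body */
static bool     loop_active;
static uint32_t trace_tbl[TRACE_ENTRIES];     /* non-loop branches seen inside the loop */
static int      trace_count;

static bool in_trace_table(uint32_t pc) {
    for (int i = 0; i < trace_count; i++)
        if (trace_tbl[i] == pc) return true;
    return false;
}

/* Called on a taken backward branch: remember the loop body range. */
static void learn_loop(uint32_t branch_pc, uint32_t target_pc) {
    loop_start  = target_pc;
    loop_end    = branch_pc;
    loop_active = true;
}

/* Decide whether the fetch stage must access the BTB for this PC. */
static bool need_btb_access(uint32_t pc) {
    if (!loop_active || pc < loop_start || pc > loop_end)
        return true;                          /* outside any known loop body  */
    if (pc == loop_end || in_trace_table(pc))
        return true;                          /* loop branch or traced branch */
    return false;                             /* sequential instruction: skip */
}

int main(void) {
    learn_loop(0x1040, 0x1000);               /* loop body spans 0x1000..0x1040 */
    trace_tbl[trace_count++] = 0x1020;        /* a call inside the loop still needs the BTB */
    printf("access BTB at 0x1010? %d\n", need_btb_access(0x1010)); /* 0: filtered    */
    printf("access BTB at 0x1020? %d\n", need_btb_access(0x1020)); /* 1: traced call */
    printf("access BTB at 0x1040? %d\n", need_btb_access(0x1040)); /* 1: loop branch */
    return 0;
}
```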
2.
This paper first introduces the function and design of the data buffer of the LSMPP coprocessor developed during the Ninth Five-Year Plan period, and proposes a low-power improvement from the perspective of reducing the activity factor: for an N×N array, power consumption can be reduced to 1/N of the original scheme. It then proposes an improvement that reduces the number of I/O pins; the reduction depends on the interconnection scheme, with a single serial interconnection saving 4N pins and a two-way parallel interconnection saving 8N pins. Finally, a new data buffer design is proposed in which only one PE's data buffer transfers data at any given moment, reducing power consumption to 1/(N×N) of the original scheme and making it independent of the array size.
3.
Because embedded GIS systems are widely used on highly mobile devices, power consumption has become an important technical metric. This paper systematically describes low-power design methods for embedded GIS systems. Power consumption is divided into hardware power and software power, with software-level low-power design being a new field. The paper reduces system power from the software side by switching processor states on demand, optimizing the compiler, loading GIS data on demand in layers, optimizing key algorithms, and compressing raster data.
4.
5.
6.
7.
8.
Low power has become an important metric for evaluating electronic systems. Targeting the characteristics of embedded GIS systems, the processor's operating mode is switched fully dynamically; concrete measures such as data scheduling, coordinate data processing, map symbol rendering, and optimization of key algorithms reduce processor running time and realize a low-power software design for embedded GIS systems.
9.
10.
11.
Power is an important design constraint in embedded computing systems. To meet the power constraint, microarchitecture and hardware designed to achieve high performance need to be revisited, from both performance and power angles. This paper studies one of them: the branch predictor. As is well known, branch prediction is critical to exploiting instruction-level parallelism effectively, but it may incur additional power consumption due to the hardware resources dedicated to branch prediction and the extra power consumed on mispredicted branches. This paper explores the design space of branch prediction mechanisms and tries to find the most beneficial one for realizing a low-power embedded processor. The sample processor studied is a Godson-like processor, which is a dual-issue, out-of-order processor with a deep pipeline, supporting the MIPS instruction set.
12.
Microarchitects should consider power consumption, together with accuracy, when designing a branch predictor, especially in embedded processors. This paper proposes a power-aware branch predictor, which is based on the gshare predictor, by accessing the BTB (Branch Target Buffer) selectively. To enable the selective access to the BTB, the PHT (Pattern History Table) in the proposed branch predictor is accessed one cycle earlier than the traditional PHT if the program is executed sequentially without branch instructions. As a side effect, two predictions from the PHT are obtained through one access to the PHT, resulting in more power savings. In the proposed branch predictor, if the previous instruction was not a branch and the prediction from the PHT is untaken, the BTB is not accessed to reduce power consumption. If the previous instruction was a branch, the BTB is always accessed, regardless of the prediction from the PHT, to prevent the additional delay/accuracy decrease. The proposed branch predictor reduces the power consumption with little hardware overhead, not incurring additional delay and never harming prediction accuracy. The simulation results show that the proposed branch predictor reduces the power consumption by 29-47%.
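A minimal C sketch of the selective-access rule stated in this abstract, assuming a gshare-style PHT of 2-bit counters; table sizes and names are illustrative.

```c
/* Sketch of the selective BTB access decision: on a sequential fetch the
 * early PHT prediction gates the BTB lookup; after a branch the BTB is
 * always looked up to avoid extra delay or accuracy loss. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PHT_BITS 12
#define PHT_SIZE (1u << PHT_BITS)

static uint8_t  pht[PHT_SIZE];     /* 2-bit saturating counters */
static uint32_t ghr;               /* global history register   */

static bool pht_predict_taken(uint32_t pc) {
    uint32_t idx = (pc ^ ghr) & (PHT_SIZE - 1);   /* gshare index */
    return pht[idx] >= 2;
}

/* Returns true if the BTB must be looked up for the current fetch. */
static bool access_btb(uint32_t pc, bool prev_was_branch) {
    if (prev_was_branch)
        return true;               /* always look up: prevent delay/accuracy loss */
    /* Sequential fetch: the PHT was read one cycle early, so its
     * prediction is already available; skip the BTB on "not taken". */
    return pht_predict_taken(pc);
}

int main(void) {
    pht[(0x2000u ^ ghr) & (PHT_SIZE - 1)] = 0;    /* strongly not taken */
    printf("BTB access (sequential, not taken): %d\n", access_btb(0x2000, false)); /* 0 */
    printf("BTB access (after a branch)       : %d\n", access_btb(0x2000, true));  /* 1 */
    return 0;
}
```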
13.
This paper describes a general code-improving transformation that can coalesce conditional branches into an indirect jump from a table. Applying this transformation allows an optimizer to exploit indirect jumps for many other coalescing opportunities besides the translation of multiway branch statements. First, dataflow analysis is performed to detect a set of coalescent conditional branches, which are often separated by blocks of intervening instructions. Secondly, several techniques are applied to reduce the cost of performing an indirect jump operation, often requiring the execution of only two instructions on a SPARC. Finally, the control flow is restructured using code duplication to replace the set of branches with an indirect jump. Thus, the transformation essentially provides early resolution of conditional branches that may originally have been some distance from the point where the indirect jump is inserted. The transformation can be frequently applied with often significant reductions in the number of instructions executed, total cache work, and execution time. In addition, we show that with branch target buffer support, indirect jumps improve branch prediction since they cause fewer mispredictions than the set of branches they replaced. Copyright © 1999 John Wiley & Sons, Ltd.
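A small C illustration of the transformation's effect, assuming a dense set of comparisons on one variable; a `switch` stands in for the optimizer-generated indirect jump from a table, since most compilers lower dense cases to exactly that. It is not the paper's optimizer output.

```c
/* Before/after view of coalescing a chain of conditional branches on the
 * same value into a single table-driven indirect jump. */
#include <stdio.h>

/* Before: three conditional branches, each predicted separately. */
static int classify_branches(int tag) {
    if (tag == 0) return 10;
    if (tag == 1) return 20;
    if (tag == 2) return 30;
    return -1;
}

/* After: one bounds check plus one indirect jump through a table. */
static int classify_table(int tag) {
    switch (tag) {                 /* dense cases -> jump table on most targets */
    case 0: return 10;
    case 1: return 20;
    case 2: return 30;
    default: return -1;
    }
}

int main(void) {
    for (int t = 0; t < 4; t++)
        printf("%d -> before %d, after %d\n",
               t, classify_branches(t), classify_table(t));
    return 0;
}
```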
14.
A trace buffer records the history of program execution and is widely used in in-circuit emulators (ICE) and logic analyzers. This article mainly describes the design ideas behind trace buffers in embedded systems, introduces trace buffer design approaches used in processors of different architectures in the embedded domain, and briefly evaluates the performance differences they exhibit.
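A minimal C sketch of the core data structure, a circular trace buffer of recent program-counter values from which a debugger can reconstruct the execution path; the depth and field names are assumptions for illustration.

```c
/* Circular trace buffer: keep only the newest TRACE_DEPTH program-counter
 * values, overwriting the oldest, the way an on-chip trace buffer does. */
#include <stdint.h>
#include <stdio.h>

#define TRACE_DEPTH 8                 /* power of two for cheap wrap-around */

static uint32_t trace_buf[TRACE_DEPTH];
static unsigned trace_head;           /* total entries written so far */

static void trace_record(uint32_t pc) {
    trace_buf[trace_head % TRACE_DEPTH] = pc;
    trace_head++;
}

/* Dump the trace oldest-first, the way a debugger would display it. */
static void trace_dump(void) {
    unsigned n = trace_head < TRACE_DEPTH ? trace_head : TRACE_DEPTH;
    unsigned first = trace_head - n;
    for (unsigned i = 0; i < n; i++)
        printf("trace[%u] = 0x%08x\n", i,
               (unsigned)trace_buf[(first + i) % TRACE_DEPTH]);
}

int main(void) {
    for (uint32_t pc = 0x4000; pc < 0x4000 + 12 * 4; pc += 4)
        trace_record(pc);             /* record more entries than the depth */
    trace_dump();                     /* only the newest TRACE_DEPTH survive */
    return 0;
}
```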
15.
Predicated execution allows tiled processors to make full use of their many execution units and exploit instruction-level parallelism. However, the resulting hyperblocks also increase the branch misprediction penalty, so improving branch predictor performance is critical. This paper proposes a profile-guided predication technique that uses profiling information to estimate the execution cycles before and after predication and thereby decides whether a branch should be predicated. The technique raises branch predictor hit rates by 0.68%-3.50% and improves system performance by 1.67%-8.33%. In addition, representing predicated instructions with select instructions eliminates the multiple-definition problem for registers in the rename stage.
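A rough C sketch of the profile-guided decision described above, assuming simple per-path cycle counts and a profiled misprediction rate; the cost model and field names are illustrative assumptions, not the paper's estimator.

```c
/* Estimate the expected cycles of the branchy form (misprediction penalty
 * weighted by the profiled misprediction rate) versus the predicated form
 * (both paths issued, no misprediction), then pick the cheaper one. */
#include <stdbool.h>
#include <stdio.h>

struct branch_profile {
    double mispredict_rate;   /* from a profiling run          */
    double taken_rate;        /* fraction of executions taken  */
    int    cycles_taken;      /* cycles of the taken path      */
    int    cycles_not_taken;  /* cycles of the fall-through    */
    int    mispredict_penalty;
};

static bool should_predicate(const struct branch_profile *p) {
    double branchy =
        p->taken_rate * p->cycles_taken +
        (1.0 - p->taken_rate) * p->cycles_not_taken +
        p->mispredict_rate * p->mispredict_penalty;
    /* Predicated hyperblock issues both paths; no misprediction cost. */
    double predicated = p->cycles_taken + p->cycles_not_taken;
    return predicated < branchy;
}

int main(void) {
    struct branch_profile hard = {0.30, 0.5, 6, 5, 20};  /* hard to predict */
    struct branch_profile easy = {0.02, 0.9, 6, 5, 20};  /* well predicted  */
    printf("hard-to-predict branch: predicate? %d\n", should_predicate(&hard));
    printf("well-predicted branch : predicate? %d\n", should_predicate(&easy));
    return 0;
}
```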
16.
In theory, ever more complex branch prediction algorithms and larger storage structures keep improving prediction accuracy, but the prediction latency caused by today's complex algorithms and large data structures can no longer meet the pipeline's single-cycle requirement. To resolve the conflict between prediction accuracy and latency, an ahead branch prediction architecture (ABPA) is proposed. ABPA provides the fetch unit at the pipeline front-end with a simple branch prediction table for fast prediction, while the complex prediction algorithms and larger storage structures are moved to the pipeline back-end, preserving prediction accuracy. For multi-target indirect branches, which have long been hard to predict accurately, an indirect branch prediction algorithm based on branch history and target path (BHTP algorithm) is proposed. The ahead prediction algorithm is a hybrid of an improved high-accuracy branch prediction algorithm and the BHTP algorithm. A branch prediction engine embedding the ahead prediction algorithm performs branch speculation and target prediction at the pipeline back-end and updates the branch prediction table at the front-end. Experimental results show that a branch prediction system using the ABPA architecture and the BHTP algorithm reaches an average accuracy of 94.27%. The design not only achieves fast, high-accuracy branch prediction but also provides a basis for further research on branch prediction.
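A rough C sketch of the BHTP indexing idea, assuming the indirect-branch target table is indexed by a simple XOR of the branch PC, recent branch history, and a folded target path; the hash, widths, and update policy are assumptions, not the paper's exact algorithm.

```c
/* Indirect-branch target prediction keyed by (PC, branch history, target
 * path).  Prediction reads the table; resolution trains the entry and
 * then shifts the outcome and target into the history registers. */
#include <stdint.h>
#include <stdio.h>

#define TGT_BITS 10
#define TGT_SIZE (1u << TGT_BITS)

static uint32_t target_table[TGT_SIZE];
static uint32_t branch_history;   /* taken/not-taken bits of recent branches */
static uint32_t target_path;      /* folded bits of recent indirect targets  */

static uint32_t bhtp_index(uint32_t pc) {
    return (pc ^ branch_history ^ target_path) & (TGT_SIZE - 1);
}

static uint32_t predict_indirect(uint32_t pc) {
    return target_table[bhtp_index(pc)];
}

static void train_target(uint32_t pc, uint32_t actual_target) {
    target_table[bhtp_index(pc)] = actual_target;   /* same-context entry */
}

static void advance_history(uint32_t actual_target, int taken_bit) {
    branch_history = (branch_history << 1) | (taken_bit & 1);
    target_path    = (target_path << 4) ^ (actual_target >> 2);  /* fold target */
}

int main(void) {
    uint32_t br = 0x8004;
    printf("cold prediction   : 0x%04x\n", (unsigned)predict_indirect(br));
    train_target(br, 0x9000);                 /* resolve: actual target known */
    printf("same-context guess: 0x%04x\n", (unsigned)predict_indirect(br));
    advance_history(0x9000, 1);               /* then shift in the outcome    */
    return 0;
}
```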
17.
Branch misprediction is a major factor limiting further performance gains in high-performance processors. Modern processors use a branch target buffer (BTB) to predict the target addresses of branch instructions, and BTB prediction accuracy is limited by its hit rate. Because branch instructions are not distributed uniformly in programs, conventional BTB indexing cannot fully utilize BTB resources, causing unnecessary conflict misses and degrading target address prediction accuracy; optimizing the access mapping with hash-based indexing is one effective remedy. A large body of work has studied cache access schemes, but dedicated studies of hash indexing algorithms for the BTB are scarce. To eliminate holes in the distribution of branch instructions and to scatter the inherent mapping between branch instructions and BTB entries, an XOR hash algorithm and an optimized bit-select indexing algorithm are designed for BTB indexing; a probabilistic method is used to estimate an upper bound on the expected maximum number of branches mapped to a single BTB set, and the effects of the two hash indexing algorithms are evaluated by simulation. Experimental results show that hash-based mapping effectively avoids prediction failures caused by BTB conflict misses, and the XOR hash algorithm scatters branches better.
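A short C sketch contrasting the two indexing schemes, assuming a 64-set BTB and 4-byte instructions; the exact bit slices are illustrative.

```c
/* Plain bit-select (low PC bits) versus an XOR hash that folds higher PC
 * bits into the index to scatter clustered branches across BTB sets. */
#include <stdint.h>
#include <stdio.h>

#define BTB_SETS_LOG2 6
#define BTB_SETS      (1u << BTB_SETS_LOG2)

/* bit-select: drop the 2-bit instruction offset, keep the next 6 bits */
static uint32_t index_bit_select(uint32_t pc) {
    return (pc >> 2) & (BTB_SETS - 1);
}

/* XOR hash: fold a higher slice of the PC into the same 6 bits */
static uint32_t index_xor_hash(uint32_t pc) {
    return ((pc >> 2) ^ (pc >> (2 + BTB_SETS_LOG2))) & (BTB_SETS - 1);
}

int main(void) {
    /* two branches exactly 64 instructions apart collide under bit-select
       but land in different sets under the XOR hash */
    uint32_t a = 0x1000, b = 0x1000 + (BTB_SETS << 2);
    printf("bit-select: %u vs %u\n",
           (unsigned)index_bit_select(a), (unsigned)index_bit_select(b));
    printf("xor-hash  : %u vs %u\n",
           (unsigned)index_xor_hash(a), (unsigned)index_xor_hash(b));
    return 0;
}
```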
18.
An Effective Fetch Control Mechanism for Simultaneous Multithreading Processors
Simultaneous multithreading (SMT) processors greatly improve performance by fetching and executing instructions from multiple running threads every clock cycle. Branch predictor accuracy and fetch policy efficiency are the key factors affecting SMT performance. By combining a value-based branch predictor with a fetch policy based on thread progress speed, a new fetch control mechanism is proposed. The structure has low hardware overhead and low implementation complexity. Experimental results show that the mechanism effectively improves processor performance, achieving a 28% speedup over the traditional fetch control mechanism; this speedup also exceeds that of current fetch control mechanisms based on stream buffers and on branch classifiers.
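A minimal C sketch of a fetch policy driven by thread progress, assuming an illustrative score of in-flight instructions per recently retired instruction; this is not the paper's exact metric.

```c
/* Each cycle the fetch stage picks the thread making the most progress per
 * buffered instruction, so a stalled thread cannot clog the front-end. */
#include <stdio.h>

#define NTHREADS 4

struct thread_state {
    unsigned in_flight;     /* instructions fetched but not yet retired   */
    unsigned retired_last;  /* instructions retired in the last interval  */
};

/* Lower score = more progress per buffered instruction = fetch first. */
static double fetch_score(const struct thread_state *t) {
    return (double)t->in_flight / (double)(t->retired_last + 1);
}

static int pick_fetch_thread(const struct thread_state ts[NTHREADS]) {
    int best = 0;
    for (int i = 1; i < NTHREADS; i++)
        if (fetch_score(&ts[i]) < fetch_score(&ts[best]))
            best = i;
    return best;
}

int main(void) {
    struct thread_state ts[NTHREADS] = {
        {24, 2},   /* many buffered, little progress: likely stalled */
        {10, 8},   /* healthy thread                                  */
        {30, 1},   /* clogging the front-end                          */
        {12, 6},
    };
    printf("fetch from thread %d this cycle\n", pick_fetch_thread(ts));
    return 0;
}
```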
19.
In theory, branch predictors with more complicated algorithms and larger data structures provide more accurate predictions. Unfortunately, overly large structures and excessively complicated algorithms cannot be implemented because of their long access delay. To date, many strategies have been proposed to balance delay with accuracy, but none has completely solved the issue. The architecture for ahead branch prediction (A2BP) separates traditional predictors into two parts. First is a small table located at the front-end of the pipeline, which makes the prediction brief enough even for some aggressive processors. Second, operations on complicated algorithms and large data structures for accurate predictions are all moved to the back-end of the pipeline. An effective mechanism is introduced for ahead branch prediction in the back-end and small table update in the front. To substantially improve prediction accuracy, an indirect branch prediction algorithm based on branch history and target path (BHTP) is implemented in A2BP. Experiments with the Standard Performance Evaluation Corporation (SPEC) benchmarks on gem5/SimpleScalar simulators demonstrate that A2BP improves average performance by 2.92% compared with a commonly used branch target buffer-based predictor. In addition, indirect branch misses with the BHTP algorithm are reduced by an average of 28.98% compared with the traditional algorithm.
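A minimal C sketch of the front-end/back-end split described here, assuming a tiny direct-mapped front-end table that a back-end engine fills ahead of fetch; the table layout and the install call are assumptions made for illustration.

```c
/* The fetch stage only consults a small, single-cycle table; the slower
 * back-end predictor runs ahead of fetch and pushes its results into it. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define FRONT_ENTRIES 16              /* tiny: readable in one cycle */

struct front_entry {
    uint32_t pc;
    uint32_t target;
    bool     taken;
    bool     valid;
};

static struct front_entry front_tbl[FRONT_ENTRIES];

/* Front-end: one indexed read per fetch, no complex hashing. */
static bool front_lookup(uint32_t pc, uint32_t *target) {
    struct front_entry *e = &front_tbl[(pc >> 2) % FRONT_ENTRIES];
    if (e->valid && e->pc == pc && e->taken) {
        *target = e->target;
        return true;
    }
    return false;                     /* predict fall-through */
}

/* Back-end: called by the heavyweight predictor, ahead of fetch. */
static void back_end_install(uint32_t pc, uint32_t target, bool taken) {
    struct front_entry *e = &front_tbl[(pc >> 2) % FRONT_ENTRIES];
    e->pc = pc; e->target = target; e->taken = taken; e->valid = true;
}

int main(void) {
    uint32_t tgt;
    back_end_install(0x6010, 0x6100, true);   /* back-end predicted this branch */
    if (front_lookup(0x6010, &tgt))
        printf("front-end redirects fetch to 0x%04x\n", (unsigned)tgt);
    return 0;
}
```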
20.
Nowadays energy-efficiency becomes the first design metric in chip development. To pursue higher energy efficiency, the processor architects should reduce or eliminate those unnecessary energy dissipations. Indirect-branch prediction has become a performance bottleneck, especially for the applications written in object-oriented languages. Previous hardware-based indirect-branch predictors are generally inefficient, for they either require significant hardware storage or predict indirect-branch targets slowly. In this paper, we propose an energy-efficient indirect-branch prediction technique called TAP (target address pointer) prediction. Its key idea includes two parts: utilizing specific hardware pointers to accelerate the indirect branch prediction flow and reusing the existing processor components to reduce additional hardware costs and power consumption. When fetching an indirect branch, TAP prediction first gets the specific pointers called target address pointers from the conditional branch predictor, and then uses such pointers to generate virtual addresses which index the indirect-branch targets. This technique spends similar time compared to the dedicated storage techniques without requiring additional large amounts of storage. Our evaluation shows that TAP prediction with some representative state-of-the-art branch predictors can improve performance significantly over the baseline processor. Compared with those hardware-based indirect-branch predictors, the TAP-Perceptron scheme achieves performance improvement equivalent to that provided by an 8 K-entry TTC predictor, and also outperforms the VPC predictor.
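A very simplified C sketch of the TAP flow, assuming the pointer bits live beside a conditional-predictor entry and select one of a few stored targets; the expansion into a virtual address is abstracted into a table index, and all sizes and names are assumptions.

```c
/* The conditional-predictor entry for an indirect branch is reused to hold
 * a small "target address pointer", which selects one of the branch's
 * possible targets at prediction time and is retrained on resolution. */
#include <stdint.h>
#include <stdio.h>

#define COND_ENTRIES 256
#define TARGETS_PER_BRANCH 4

static uint8_t  tap_field[COND_ENTRIES];  /* pointer bits stored alongside the
                                             conditional predictor's counters */
static uint32_t target_store[COND_ENTRIES][TARGETS_PER_BRANCH];

static uint32_t predict_indirect_target(uint32_t pc) {
    uint32_t entry = pc % COND_ENTRIES;
    uint8_t  ptr   = tap_field[entry] % TARGETS_PER_BRANCH;  /* target address pointer */
    return target_store[entry][ptr];          /* indexed like a virtual address */
}

static void update_on_resolve(uint32_t pc, uint32_t actual_target) {
    uint32_t entry = pc % COND_ENTRIES;
    for (uint8_t i = 0; i < TARGETS_PER_BRANCH; i++) {
        if (target_store[entry][i] == actual_target || target_store[entry][i] == 0) {
            target_store[entry][i] = actual_target;
            tap_field[entry] = i;              /* steer the pointer to this slot */
            return;
        }
    }
    tap_field[entry] = 0;                      /* all slots busy: fall back */
    target_store[entry][0] = actual_target;
}

int main(void) {
    uint32_t call_site = 0xA024;
    update_on_resolve(call_site, 0xB000);      /* virtual call resolved once */
    printf("next prediction : 0x%04x\n", (unsigned)predict_indirect_target(call_site));
    update_on_resolve(call_site, 0xC000);      /* different receiver type    */
    printf("after retraining: 0x%04x\n", (unsigned)predict_indirect_target(call_site));
    return 0;
}
```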