
1.

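高金加, 孟建熠, 陈志坚. 《计算机应用研究》, 2012, 29(3): 998-1001

The branch target buffer (BTB) is one of the main power consumers in high-end embedded CPUs. To address the redundant power incurred by BTB accesses, the paper proposes a loop-body access filtering mechanism that eliminates useless BTB accesses by sequential instructions in loop instruction streams. It further proposes a branch tracking method that compensates for the performance loss caused when the loop filter wrongly filters non-loop branch instructions inside a loop body, saving much of the redundant power that sequential instructions in loop instruction streams spend on BTB accesses. Simulations on the Powerstone benchmarks show that, with a 128-entry BTB, a two-level loop filter and a 4-entry branch trace table reduce BTB power by about 71.9%, while average cycles per instruction (CPI) degrades by only 0.66%.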

2.

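Low-Power Design of a Data Buffer

李莉, 沈绪榜, 钱刚, 许琪, 王忠. 《计算机研究与发展》, 2004, 41(4): 761-766

The paper first describes the function and design of the data buffer in the LSMPP coprocessor developed during the Ninth Five-Year Plan period, and proposes a low-power improvement based on reducing the activity factor: for an N×N array, power drops to 1/N of that of the Ninth Five-Year Plan design. It then proposes an improvement that reduces the number of pins; the reduction depends on the interconnect scheme, with a one-way serial interconnect saving 4N pins and a two-way parallel interconnect saving 8N. Finally, it proposes a new data buffer design in which only one PE's data buffer transfers data at any moment, so power falls to 1/(N×N) of the Ninth Five-Year Plan design and no longer depends on the array size.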
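The scaling claims in this abstract can be checked with a little arithmetic. The following is a minimal sketch, not the authors' model: power is normalized so that the original design equals 1, and the helper names `relative_power` and `pins_saved` are hypothetical.

```python
from fractions import Fraction


def relative_power(n: int) -> dict:
    """Power of each scheme relative to the original design (= 1) for an N x N array."""
    if n < 1:
        raise ValueError("array dimension must be positive")
    return {
        "original": Fraction(1),
        "reduced_activity": Fraction(1, n),          # first improvement: 1/N
        "single_active_buffer": Fraction(1, n * n),  # final scheme: 1/(N x N)
    }


def pins_saved(n: int) -> dict:
    """Pins saved by each interconnect scheme, as stated in the abstract."""
    return {"one_way_serial": 4 * n, "two_way_parallel": 8 * n}


if __name__ == "__main__":
    for n in (4, 8, 16):
        print(n, relative_power(n), pins_saved(n))
```

For an 8×8 array, for instance, the first improvement leaves 1/8 of the original power and the final scheme 1/64.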

3.

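Low-Power Design of Embedded GIS Systems

秦晓倩. 《微计算机信息》, 2008, 24(34)

Because embedded GIS systems are widely used in highly mobile devices, power consumption has become an important technical metric. The paper systematically describes methods for low-power design of embedded GIS systems. It divides power consumption into hardware power and software power; low-power software design is a new field. From the software side, it reduces system power by switching processor states on demand, optimizing the compiler, loading GIS data on demand in layers, optimizing key algorithms, and compressing raster data.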

4.

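Low-Power, High-Slew-Rate CMOS Analog Buffer

李鉴, 黄义定, 石振岩. 《微计算机信息》, 2009, 25(2)

The low-power CMOS analog buffer is based on an input stage with two complementary class-AB differential pairs and an output with a simple additional circuit that allows rail-to-rail operation. The proposed circuit combines low static power with high drive capability, making it suitable for large capacitive loads. Simulation results are given.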

5.

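Research and Design of a Dynamic Branch Prediction Mechanism for Embedded Processors (Cited by: 2; self-citations: 1; citations by others: 1)

黄伟, 王玉艳, 章建雄. 《计算机工程》, 2008, 34(21): 163-165

For the specific application environment of embedded processors, the paper proposes a hybrid dynamic branch prediction mechanism that improves on a traditional neural network algorithm and combines it with a customized branch target buffer. The mechanism uses global indexing and a customized BTB structure to predict precisely the last branch instruction in loop logic. Experimental results show that the mechanism reduces hardware complexity and improves prediction accuracy.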

6.

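Low-Power Design Based on Embedded Systems

陆希玉, 唐昆, 崔慧娟. 《微计算机信息》, 2005, (10)

The paper studies low-power design for embedded systems. It reduces the power of the system's microprocessor by dynamically changing the system operating frequency, examines the algorithm's effect on system performance, and presents experimental results confirming that the algorithm performs well.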
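The abstract does not state its power model. As a rough illustration of why lowering the operating frequency lowers processor power, the sketch below assumes the textbook CMOS switching-power approximation P = a·C·V²·f; the function names and all parameter values are hypothetical and not taken from the paper.

```python
def dynamic_power(c_eff: float, vdd: float, freq_hz: float, activity: float = 1.0) -> float:
    """Textbook CMOS switching power estimate: P = a * C * V^2 * f (watts)."""
    return activity * c_eff * vdd ** 2 * freq_hz


def task_energy(c_eff: float, vdd: float, freq_hz: float, cycles: int,
                activity: float = 1.0) -> float:
    """Energy (joules) to run a fixed number of cycles at the given frequency."""
    runtime_s = cycles / freq_hz
    return dynamic_power(c_eff, vdd, freq_hz, activity) * runtime_s


if __name__ == "__main__":
    # Assumed example values: 1 nF effective capacitance, 1.2 V supply.
    for f in (200e6, 100e6, 50e6):
        print(f"{f / 1e6:.0f} MHz: {dynamic_power(1e-9, 1.2, f):.3f} W, "
              f"{task_energy(1e-9, 1.2, f, 1_000_000):.6f} J per 1M cycles")
```

Under this assumed model, halving the frequency halves the power, but the energy of a fixed-cycle task stays the same unless the supply voltage is lowered as well.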

7.

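Low-Power Design Based on Embedded Systems (Cited by: 1; self-citations: 0; citations by others: 1)

陆希玉, 唐昆, 崔慧娟. 《微计算机信息》, 2005, 30(20): 4-5

The paper studies low-power design for embedded systems. It reduces the power of the system's microprocessor by dynamically changing the system operating frequency, examines the algorithm's effect on system performance, and presents experimental results confirming that the algorithm performs well.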

8.

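Low-Power Design of Embedded GIS System Software

胡泽明, 王志刚, 岳春生. 《单片机与嵌入式系统应用》, 2006, (3): 15-17

Low power has become an important metric for electronic systems. Given the characteristics of embedded GIS systems, the design switches processor operating modes fully dynamically and, through concrete measures such as data scheduling, coordinate data processing, map symbol rendering, and optimization of key algorithms, reduces processor run time to achieve a low-power embedded GIS software design.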

9.

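Research on Dynamic Branch Predictors Based on SimpleScalar

张筱, 史战果, 吴迪. 《微型电脑应用》, 2011, 27(11): 19-21, 68, 69

Branch prediction accuracy is an important factor in the performance of contemporary processors and has been a research focus in academia and industry over the past decade. To provide a performance reference for designing dynamic branch predictors for processors in different application settings, the paper simulates and analyzes several dynamic branch predictors widely used in processor architecture design, using SPEC CPU2000 on the SimpleScalar simulator. The results use prediction accuracy and instructions per clock cycle as metrics and, together with hardware cost, are used to analyze which targets and settings each type of branch predictor suits.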

10.

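Design and Implementation of a Hybrid Branch Prediction Circuit

马鹏, 方晓旻, 王春军, 许团辉. 《计算机工程》, 2011, 37(13): 243-245

To address the shortcoming that existing prediction algorithms predict accurately only for certain classes of programs, the paper designs a hybrid branch prediction circuit. The circuit combines multiple branch prediction algorithms, predicts accurately across a variety of programs, and is used in an independently designed embedded microprocessor. Performance simulations show that the circuit achieves highly accurate prediction across programs and meets the timing requirements of the processor design.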

11.

Cited by: 2; self-citations: 0; citations by others: 2

范东睿, 杨洪波, 高光荣, 赵荣彩. 《计算机科学技术学报》, 2003, 18(6): 0-0

Power is an important design constraint in embedded computing systems. To meet the power constraint, microarchitecture and hardware designed to achieve high performance need to be revisited from both performance and power angles. This paper studies one of them: the branch predictor. As is well known, branch prediction is critical to exploiting instruction-level parallelism effectively, but may incur additional power consumption due to the hardware resources dedicated to branch prediction and the extra power consumed on mispredicted branches. This paper explores the design space of branch prediction mechanisms and tries to find the most beneficial one for realizing a low-power embedded processor. The sample processor studied is a Godson-like processor, which is a dual-issue, out-of-order processor with a deep pipeline, supporting the MIPS instruction set.

12.

Cheol Hong Kim, Sung Woo Chung, and Chu Shik Jhon. 《计算机科学技术学报》, 2005, 20(5): 607-614

Microarchitects should consider power consumption, together with accuracy, when designing a branch predictor, especially in embedded processors. This paper proposes a power-aware branch predictor, which is based on the gshare predictor, by accessing the BTB (Branch Target Buffer) selectively. To enable the selective access to the BTB, the PHT (Pattern History Table) in the proposed branch predictor is accessed one cycle earlier than the traditional PHT if the program is executed sequentially without branch instructions. As a side effect, two predictions from the PHT are obtained through one access to the PHT, resulting in more power savings. In the proposed branch predictor, if the previous instruction was not a branch and the prediction from the PHT is untaken, the BTB is not accessed to reduce power consumption. If the previous instruction was a branch, the BTB is always accessed, regardless of the prediction from the PHT, to prevent the additional delay/accuracy decrease. The proposed branch predictor reduces the power consumption with little hardware overhead, not incurring additional delay and never harming prediction accuracy. The simulation results show that the proposed branch predictor reduces the power consumption by 29-47%.
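The BTB access rule stated in this abstract is simple enough to sketch. The code below is an illustrative model only: `Gshare` is a textbook gshare predictor (a table of 2-bit counters indexed by PC XOR global history) with assumed sizes, and `should_access_btb` and `count_btb_accesses` are hypothetical names. The paper's one-cycle-early PHT access and its two-predictions-per-access side effect are not modeled.

```python
class Gshare:
    """Textbook gshare: PHT of 2-bit counters indexed by PC XOR global history."""

    def __init__(self, index_bits: int = 10) -> None:
        self.mask = (1 << index_bits) - 1
        self.history = 0
        self.pht = [1] * (1 << index_bits)  # every counter starts weakly not-taken

    def _index(self, pc: int) -> int:
        # Assumes 4-byte instructions, so the two low PC bits are dropped.
        return ((pc >> 2) ^ self.history) & self.mask

    def predict(self, pc: int) -> bool:
        return self.pht[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        self.pht[i] = min(3, self.pht[i] + 1) if taken else max(0, self.pht[i] - 1)
        self.history = ((self.history << 1) | int(taken)) & self.mask


def should_access_btb(prev_was_branch: bool, pht_predicts_taken: bool) -> bool:
    """Rule from the abstract: skip the BTB only when the previous instruction
    was not a branch and the PHT predicts untaken."""
    if prev_was_branch:
        return True  # always access, regardless of the PHT prediction
    return pht_predicts_taken


def count_btb_accesses(trace) -> int:
    """trace: iterable of (prev_was_branch, pht_predicts_taken) pairs."""
    return sum(should_access_btb(p, t) for p, t in trace)
```

In a mostly sequential instruction stream, most fetches fall into the "previous instruction was not a branch, predicted untaken" case, which is where the skipped BTB accesses come from.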

13.

Gang-Ryung Uh, David B. Whalley. 《Software》, 1999, 29(12): 1061-1101

This paper describes a general code-improving transformation that can coalesce conditional branches into an indirect jump from a table. Applying this transformation allows an optimizer to exploit indirect jumps for many other coalescing opportunities besides the translation of multiway branch statements. First, dataflow analysis is performed to detect a set of coalescent conditional branches, which are often separated by blocks of intervening instructions. Secondly, several techniques are applied to reduce the cost of performing an indirect jump operation, often requiring the execution of only two instructions on a SPARC. Finally, the control flow is restructured using code duplication to replace the set of branches with an indirect jump. Thus, the transformation essentially provides early resolution of conditional branches that may originally have been some distance from the point where the indirect jump is inserted. The transformation can be frequently applied with often significant reductions in the number of instructions executed, total cache work, and execution time. In addition, we show that with branch target buffer support, indirect jumps improve branch prediction since they cause fewer mispredictions than the set of branches they replaced. Copyright © 1999 John Wiley & Sons, Ltd.

14.

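Design Considerations for Trace Buffers in Embedded Processors

崔亮, 陈章龙. 《小型微型计算机系统》, 2005, 26(3): 443-447

A trace buffer records the history of program execution and is widely used in in-circuit emulators (ICE) and logic analyzers. The article describes the design ideas behind trace buffers in embedded systems, introduces trace buffer design methods for processors of different architectures in the embedded field, and briefly evaluates the performance differences among them.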

15.

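Optimization of Predicated Execution on Tiled Processors

邓春华, 安虹, 路璐, 王耀彬. 《小型微型计算机系统》, 2012, 33(2): 399-403

Predicated execution lets a tiled processor make full use of its many execution units and exploit instruction-level parallelism. However, the resulting hyperblocks also raise the branch misprediction penalty, so improving branch predictor performance is critical. The paper proposes a predicated execution technique driven by profiling information: it uses profiling information to estimate execution cycles before and after predication and decides accordingly whether to predicate a branch. The technique raises the branch predictor hit rate by 0.68% to 3.50% and system performance by 1.67% to 8.33%. In addition, representing predicated instructions with select instructions eliminates the multiple-definition problem for registers in the rename stage.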

16.

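Research on Ahead Branch Prediction Architecture and Algorithm

靳文兵, 石峰, 左琦, 张杨. 《计算机研究与发展》, 2013, 50(10): 2228-2238

In theory, increasingly complex branch prediction algorithms and larger storage structures keep improving prediction accuracy, but the prediction latency caused by today's complex algorithms and large data structures can no longer meet the pipeline's single-cycle requirement. To resolve the conflict between prediction accuracy and latency, the paper proposes the ahead branch prediction architecture (ABPA). ABPA gives the fetch unit at the pipeline front end a simple branch prediction table for fast prediction; complex prediction algorithms and large storage structures are moved to the pipeline back end, preserving prediction accuracy. For multi-target indirect branches, which have long been hard to predict accurately, it proposes an indirect branch prediction algorithm based on branch history and target path (the BHTP algorithm). The ahead branch prediction algorithm combines an improved high-accuracy branch prediction algorithm with the BHTP algorithm. A branch prediction engine embedding this algorithm performs branch speculation and target prediction at the pipeline back end and updates the branch prediction table at the front end. Experiments show that a branch prediction system using ABPA and the BHTP algorithm reaches an average accuracy of 94.27%. The design not only achieves fast, highly accurate branch prediction but also provides a basis for further research on branch prediction.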

17.

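Research and Design of Hash Indexing Algorithms for the BTB

王国澎, 胡向东, 尹飞, 朱英. 《计算机研究与发展》, 2014, 51(9): 2003-2011

Branch misprediction is a major factor limiting further performance gains in high-performance processors. Modern processors use a branch target buffer (BTB) to predict branch target addresses, and BTB prediction accuracy is bounded by its hit rate. Because branch instructions are not evenly distributed in programs, traditional BTB indexing cannot make full use of BTB resources, causing unnecessary conflict misses that hurt target address prediction accuracy; optimizing the access mapping with hashed indexing is one effective remedy. Much of the literature studies cache access schemes, but hash indexing algorithms specifically for the BTB have received little attention. To eliminate gaps in the distribution of branch instructions and to scatter the inherent mapping between branch instructions and BTB entries, the paper designs an XOR hash algorithm and an optimized bit-select indexing algorithm for BTB indexing, uses a probabilistic method to estimate an upper bound on the expected maximum number of mappings to a single BTB set, and evaluates both algorithms by simulation. The results show that hashed mapping avoids much of the prediction failure caused by BTB conflict misses, and that the XOR hash scatters better.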
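The abstract does not give the exact hash functions, so the following is only a sketch of the general idea, assuming a common XOR-folding scheme and plain bit-select indexing. The function names, the 7-bit index, and the 4-byte instruction alignment are all assumptions made for illustration; the paper's optimized bit-select variant is not reproduced.

```python
def bit_select_index(pc: int, index_bits: int, offset_bits: int = 2) -> int:
    """Conventional indexing: take the low-order PC bits above the alignment offset."""
    return (pc >> offset_bits) & ((1 << index_bits) - 1)


def xor_index(pc: int, index_bits: int, offset_bits: int = 2) -> int:
    """XOR folding (assumed scheme): XOR the low index field with the next-higher field."""
    mask = (1 << index_bits) - 1
    low = (pc >> offset_bits) & mask
    high = (pc >> (offset_bits + index_bits)) & mask
    return low ^ high


if __name__ == "__main__":
    index_bits = 7  # 128 sets, an assumed configuration
    # Branches spaced exactly one "index period" (512 bytes) apart.
    pcs = [0x1000 + k * (1 << (index_bits + 2)) for k in range(8)]
    print("bit-select:", sorted({bit_select_index(pc, index_bits) for pc in pcs}))
    print("xor:       ", sorted({xor_index(pc, index_bits) for pc in pcs}))
```

With these eight addresses, bit-select maps every branch to the same set while the XOR fold spreads them over eight different sets, which is the kind of conflict-miss reduction the abstract describes.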

18.

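An Effective Fetch Control Mechanism for Simultaneous Multithreading Processors (Cited by: 1; self-citations: 0; citations by others: 1)

何立强, 刘志勇. 《计算机学报》, 2006, 29(4): 535-543

A simultaneous multithreading processor greatly improves performance by fetching and executing instructions from multiple running threads every clock cycle. The accuracy of the branch predictor and the efficiency of the fetch policy are key to its performance. The paper proposes a new fetch control mechanism that combines a value-based branch predictor with a fetch policy based on thread progress rate. The structure has low hardware overhead and low implementation complexity. Experiments show that the mechanism effectively improves processor performance: its speedup over a traditional fetch control mechanism is 28%, which also exceeds that of current fetch control mechanisms based on stream buffers and on branch classifiers.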

19.

Wenbing JIN, Feng SHI, Qiugui SONG, Yang ZHANG. 《Frontiers of Computer Science》, 2013, 7(6): 914-923

In theory, branch predictors with more complicated algorithms and larger data structures provide more accurate predictions. Unfortunately, overly large structures and excessively complicated algorithms cannot be implemented because of their long access delay. To date, many strategies have been proposed to balance delay with accuracy, but none has completely solved the issue. The architecture for ahead branch prediction (A²BP) separates traditional predictors into two parts. First is a small table located at the front-end of the pipeline, which makes the prediction brief enough even for some aggressive processors. Second, operations on complicated algorithms and large data structures for accurate predictions are all moved to the back-end of the pipeline. An effective mechanism is introduced for ahead branch prediction in the back-end and small table update in the front. To substantially improve prediction accuracy, an indirect branch prediction algorithm based on branch history and target path (BHTP) is implemented in A²BP. Experiments with the standard performance evaluation corporation (SPEC) benchmarks on gem5/SimpleScalar simulators demonstrate that A²BP improves average performance by 2.92% compared with a commonly used branch target buffer-based predictor. In addition, indirect branch misses with the BHTP algorithm are reduced by an average of 28.98% compared with the traditional algorithm.

20.

A General Low-Cost Indirect Branch Prediction Using Target Address Pointers

谢子超, 佟冬, 黄明凯. 《计算机科学技术学报》, 2014, 29(6): 929-946

Energy efficiency has become the first design metric in chip development. To pursue higher energy efficiency, processor architects should reduce or eliminate unnecessary energy dissipation. Indirect-branch prediction has become a performance bottleneck, especially for applications written in object-oriented languages. Previous hardware-based indirect-branch predictors are generally inefficient, for they either require significant hardware storage or predict indirect-branch targets slowly. In this paper, we propose an energy-efficient indirect-branch prediction technique called TAP (target address pointer) prediction. Its key idea includes two parts: utilizing specific hardware pointers to accelerate the indirect-branch prediction flow, and reusing existing processor components to reduce additional hardware costs and power consumption. When fetching an indirect branch, TAP prediction first gets specific pointers, called target address pointers, from the conditional branch predictor, and then uses these pointers to generate virtual addresses which index the indirect-branch targets. This technique takes a similar amount of time to the dedicated-storage techniques without requiring large amounts of additional storage. Our evaluation shows that TAP prediction with some representative state-of-the-art branch predictors can improve performance significantly over the baseline processor. Compared with hardware-based indirect-branch predictors, the TAP-Perceptron scheme achieves a performance improvement equivalent to that provided by an 8 K-entry TTC predictor, and also outperforms the VPC predictor.