1.

Yu Mingyan, Zhang Xiangjian, Yang Bing 《Journal of Computer-Aided Design & Computer Graphics》 2010, 22(4)

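A conventional branch target buffer (BTB) is accessed in every fetch cycle. Branch instructions make up only about 20% of all instructions in a program, so roughly 80% of BTB accesses are wasted. To address this, the paper proposes a BTB skip-access algorithm with minimal performance impact. It exploits the fact that the distance between successive branch instructions along the program's control flow is fixed. For each branch, the BTB stores the distance to the next branch instruction on the execution path. After a BTB hit, the stored distance is used to switch off BTB accesses between the current branch and the next one, which improves access efficiency and reduces dynamic power. In the embedded-processor implementation, skip access is applied only to branches predicted taken, which reduces hardware overhead. Simulation and synthesis of the hardware model show that, for a 128-entry BTB, the algorithm reduces dynamic power by 72% with a performance loss of only 0.013%.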
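The skip-access idea can be illustrated as a trace-driven count of BTB lookups. This is a minimal Python sketch, not the authors' hardware. It makes three assumptions: one instruction is fetched per cycle, the trace follows the correctly predicted path, and a hypothetical `distance` field holds the number of instructions from a predicted-taken branch to the next branch on its path.

```python
from dataclasses import dataclass


@dataclass
class BTBEntry:
    target: int
    distance: int  # hypothetical field: instructions from this branch to the next branch on its path


def simulate_skip_access(trace, btb):
    """Return (BTB lookups performed, instructions fetched) for a trace of PCs.

    Assumes one fetch per cycle and that `trace` follows the predicted path,
    so misprediction recovery is not modeled.
    """
    lookups = 0
    skip = 0  # remaining fetches for which the BTB is gated off
    for pc in trace:
        if skip > 0:
            skip -= 1  # known non-branch instruction: no BTB access this cycle
            continue
        lookups += 1
        entry = btb.get(pc)
        if entry is not None:
            # the next (distance - 1) fetches cannot be branches
            skip = entry.distance - 1
    return lookups, len(trace)


# A small loop: 0x100 jumps to 0x200, and 0x20c jumps back to 0x100.
btb = {0x100: BTBEntry(target=0x200, distance=4), 0x20C: BTBEntry(target=0x100, distance=1)}
trace = [0x100, 0x200, 0x204, 0x208, 0x20C] * 2
print(simulate_skip_access(trace, btb))  # (4, 10)
```

Here only 4 of the 10 fetches touch the BTB. A real design must also re-enable lookups immediately after a misprediction, which this sketch leaves out.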

2.

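Research and Design of a Dynamic Branch Prediction Mechanism for Embedded Processors   Total citations: 2, self-citations: 1, other citations: 1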

Huang Wei, Wang Yuyan, Zhang Jianxiong 《Computer Engineering》 2008, 34(21): 163-165

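For the application-specific environment of embedded processors, the paper proposes a hybrid dynamic branch prediction mechanism. It improves on the traditional neural-network algorithm and combines it with a customized branch target buffer. The mechanism uses global indexing and a custom-designed BTB structure to predict the last branch instruction in loop logic precisely. Experimental results show that the mechanism reduces hardware complexity and improves prediction accuracy.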
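The abstract does not give the improved neural algorithm. The sketch below therefore shows only the classic perceptron branch predictor (Jiménez and Lin) that such schemes typically build on. The table size, history length, PC-modulo indexing, and unbounded weights are assumptions, and the customized BTB is not modeled.

```python
class PerceptronPredictor:
    """Classic perceptron branch predictor (sizes here are illustrative assumptions)."""

    def __init__(self, n_entries=64, history_len=8):
        self.n_entries = n_entries
        self.theta = int(1.93 * history_len + 14)  # training threshold from Jiménez and Lin
        self.weights = [[0] * (history_len + 1) for _ in range(n_entries)]
        self.history = [1] * history_len  # global history: +1 taken, -1 not taken

    def _output(self, pc):
        w = self.weights[pc % self.n_entries]  # one perceptron per PC-indexed row
        return w[0] + sum(wi * xi for wi, xi in zip(w[1:], self.history))

    def predict(self, pc):
        return self._output(pc) >= 0

    def update(self, pc, taken):
        y = self._output(pc)
        t = 1 if taken else -1
        # train on a misprediction or when the output is not confident enough
        if (y >= 0) != taken or abs(y) <= self.theta:
            w = self.weights[pc % self.n_entries]
            w[0] += t
            for i, xi in enumerate(self.history):
                w[i + 1] += t * xi
        self.history = [t] + self.history[:-1]  # shift in the newest outcome


# Usage: an alternating taken/not-taken branch is linearly separable on history.
p = PerceptronPredictor()
hits = 0
for i in range(200):
    taken = i % 2 == 0
    if i >= 180 and p.predict(0x40) == taken:
        hits += 1
    p.update(0x40, taken)
print(hits)  # 20
```

Hardware versions saturate the weights to a few bits. They are left unbounded here to keep the sketch short.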

3.

A Low-Power Branch Target Buffer Based on Loop-Body Access Filtering

Gao Jinjia, Meng Jianyi, Chen Zhijian 《Application Research of Computers》 2012, 29(3): 998-1001

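The branch target buffer (BTB) is one of the main energy consumers in high-end embedded CPUs. To cut the redundant power spent on BTB accesses, the paper proposes a loop-body access filtering mechanism. It eliminates useless BTB accesses by sequential (non-branch) instructions in the loop-body instruction stream. The filter can wrongly suppress lookups for non-loop branches inside the loop body, so a branch tracking method is added to compensate for the resulting performance loss. Together, these save most of the power that sequential loop-body instructions would otherwise spend on BTB accesses. Simulations on the Powerstone benchmarks with a 128-entry BTB show that a two-level loop filter plus a 4-entry branch trace table reduces BTB power by about 71.9%. The average cycles per instruction (CPI) degrades by only 0.66%.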
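A minimal sketch of loop-body filtering follows. It is single-level, not the two-level filter from the abstract, and rests on three assumptions:

- A taken backward branch marks a loop body.
- The first full iteration after detection is a learning pass that records which PCs inside the body are branches.
- The `branch_pcs` set stands in for "the BTB would hit here".

After the learning pass, only the recorded branches are looked up.

```python
def btb_accesses_with_loop_filter(trace, branch_pcs, table_size=4):
    """Count BTB lookups when sequential fetches inside a learned loop body are filtered.

    trace: fetched PCs in program order; branch_pcs: PCs of branch instructions
    (stands in for "the BTB would hit here"); table_size: capacity of the
    branch trace table that remembers branches found inside the loop body.
    """
    accesses = 0
    loop = None        # (start, end) of the loop body currently being filtered
    learning = False   # first iteration after detection: look up everything
    tracked = set()    # branch trace table: branch PCs seen inside the loop body
    for i, pc in enumerate(trace):
        if loop is not None and not loop[0] <= pc <= loop[1]:
            loop, learning = None, False  # fetch left the loop body
        if loop is None or learning or pc in tracked:
            accesses += 1
            if learning and pc in branch_pcs:
                tracked.add(pc)
        nxt = trace[i + 1] if i + 1 < len(trace) else None
        if pc in branch_pcs and nxt is not None and nxt <= pc:  # taken backward branch
            if loop == (nxt, pc) and learning:
                learning = False  # one full iteration learned: start filtering
                if len(tracked) > table_size:
                    loop = None   # table overflow: fall back to unfiltered lookups
            elif loop != (nxt, pc):
                loop, learning, tracked = (nxt, pc), True, {pc}
    return accesses


prologue = [0x00, 0x04, 0x08, 0x0C]
body = [0x10, 0x14, 0x18, 0x1C]  # 0x14: forward branch (not taken), 0x1C: loop-closing branch
trace = prologue + body * 3 + [0x20]
print(btb_accesses_with_loop_filter(trace, {0x14, 0x1C}), len(trace))  # 15 17
```

The savings grow with the iteration count, because only the learning pass pays for full lookups. The forward branch at 0x14 shows why the branch trace table is needed: without it, that branch would be filtered out too.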

4.

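A Graph-Matching Method for Resolving Multi-Target Branch Statements in Binary Translation   Total citations: 1, self-citations: 0, other citations: 1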

Chen Long, Wu Chenggang, Xie Haibin, Cui Huimin, Zhang Zhaoqing 《Journal of Computer Research and Development》 2008, 45(10)

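Binary translation has become an important means of porting software. A binary translation system's performance depends on discovering a program's code effectively and translating it efficiently. Indirect jumps in binary code make their targets hard to determine statically, which limits code discovery and the quality of the final translation. In typical applications, indirect jump instructions often implement multi-target branch semantics (for example, switch statements), with the branch targets stored in a jump table. The paper proposes a method for resolving multi-target branch statements and their jump tables. It uncovers the targets of indirect jumps so that they can be translated effectively, improving the performance of the binary translation system. The method uses semantic graphs to characterize and express the expected semantics. A semantic graph:

- extracts the semantics of the instruction sequence under examination;
- identifies the instruction stream that matches the expected semantics;
- handles the instructions that compilers generate under different optimization options;
- filters out interference from unrelated instructions.

On part of the SPEC CINT2000 benchmarks, translated-code coverage improves by 9.85% to 22.13%, with corresponding performance gains of 8.30% to 17.71%. The algorithm's time complexity is only O(1).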
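The paper's semantic-graph matching is not reproduced here. As a much simpler stand-in, the sketch below matches the most common jump-table idiom, `cmp idx, N; ja default; jmp [table + idx*scale]`, over a toy instruction list. It skips unrelated instructions and then reads the N+1 table entries from a memory image. The tuple encoding (first operand is the destination) and the dictionary memory model are assumptions made for illustration.

```python
def resolve_jump_table(insns, memory):
    """Recover targets of a bounds-checked indirect jump through a jump table.

    `insns` is a toy basic block of tuples ending in
    ("jmp_mem", table_base, index_reg, scale); by convention here the first
    operand of every other instruction is its destination. `memory` maps
    addresses to table words. Returns the list of targets, or None.
    """
    if not insns or insns[-1][0] != "jmp_mem":
        return None
    _, base, idx, scale = insns[-1]
    saw_bound_branch = False
    for insn in reversed(insns[:-1]):
        op = insn[0]
        if op == "ja":
            saw_bound_branch = True  # unsigned "index > bound" goes to the default case
        elif op == "cmp" and insn[1] == idx and saw_bound_branch:
            bound = insn[2]
            return [memory[base + i * scale] for i in range(bound + 1)]
        elif insn[1:2] == (idx,):
            return None  # index register redefined after the check: give up
    return None


insns = [
    ("mov", "eax", "edi"),
    ("cmp", "eax", 3),
    ("ja", "default"),
    ("mov", "ecx", 0),                 # unrelated instruction, skipped
    ("jmp_mem", 0x8000, "eax", 4),
]
memory = {0x8000: 0x400, 0x8004: 0x410, 0x8008: 0x420, 0x800C: 0x430}
print([hex(t) for t in resolve_jump_table(insns, memory)])
```

Unlike the semantic-graph approach, this linear scan breaks as soon as a compiler reorders or rewrites the idiom, which is the gap the paper targets.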

5.

Research and Design of Hashing Algorithms for BTB Indexing

Wang Guopeng, Hu Xiangdong, Yin Fei, Zhu Ying 《Journal of Computer Research and Development》 2014, 51(9)

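Branch misprediction is one of the main obstacles to further performance gains in high-performance processors. Modern processors use a branch target buffer (BTB) to predict the target addresses of branch instructions, and the BTB's prediction accuracy is bounded by its hit rate. Branch instructions are not evenly distributed in programs, so conventional BTB indexing cannot make full use of BTB resources. The result is unnecessary conflict misses that degrade target-address prediction accuracy. Hashed indexing, which reshapes the access mapping, is one effective remedy. A large body of literature studies cache access schemes, but dedicated work on hashed BTB indexing is lacking. The paper designs two indexing algorithms for the BTB, an XOR hashing algorithm and an optimized bit-select algorithm. Both aim to remove gaps in the distribution of branch instructions and break up the fixed mapping between branches and BTB entries. A probabilistic method gives an upper bound on the expected maximum number of branches mapped to a single BTB set, and simulation evaluates both algorithms. Experimental results show that hashed mapping largely avoids prediction failures caused by BTB conflict misses, and that XOR hashing spreads branches more evenly.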
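To show why hashing helps, the sketch below compares plain bit-select indexing (low PC bits above the instruction offset) with a simple XOR fold of the next-higher bits. It measures the largest number of branches that land in one BTB set. The specific fold, 4-byte instructions, and the strided branch layout are illustrative assumptions, not the paper's exact functions.

```python
from collections import Counter


def bit_select_index(pc, index_bits, offset_bits=2):
    """Conventional indexing: take the low index bits above the instruction offset."""
    return (pc >> offset_bits) & ((1 << index_bits) - 1)


def xor_index(pc, index_bits, offset_bits=2):
    """XOR hashing: fold the next-higher group of bits onto the index bits."""
    word = pc >> offset_bits
    mask = (1 << index_bits) - 1
    return (word ^ (word >> index_bits)) & mask


def max_set_load(branch_pcs, index_fn, index_bits):
    """Largest number of branches mapped to a single BTB set."""
    counts = Counter(index_fn(pc, index_bits) for pc in branch_pcs)
    return max(counts.values())


# 16 branches spaced exactly one "BTB period" (16 sets * 4 bytes) apart.
pcs = [i * 64 for i in range(16)]
print(max_set_load(pcs, bit_select_index, 4), max_set_load(pcs, xor_index, 4))  # 16 1
```

With bit-select, every branch in this pathological layout collides in set 0. The XOR fold spreads them over all 16 sets.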

6.

Design of a Streamlined Instruction Prediction and Branch Unit

Liu Quansheng, Yang Hongbin, Wu Yue 《Computer Engineering and Design》 2008, 29(7): 1603-1605

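The paper proposes a streamlined instruction prediction and branch unit. The instruction prediction unit consists of two levels of fully associative buffers with different access latencies. In a simultaneous multithreading (SMT) microprocessor, the improved streamlined prediction unit achieves high prediction accuracy. The paper also designs a two-thread branch unit that computes target addresses quickly with a carry-lookahead adder and uses a comparator to decide whether a branch is taken. This improves hardware resource utilization and computational efficiency. Test results show that the streamlined prediction and branch unit performed well.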
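The fast target-address computation rests on carry-lookahead addition. The sketch below evaluates the standard lookahead carry equations in Python to add a PC and a sign-extended offset.

- Hardware evaluates all carries in parallel and splits a 32-bit adder into small groups (for example, 4-bit blocks) to keep fan-in manageable. The flat 32-bit form here is for readability only.
- The paper's exact adder organization is not given, so this is an assumption-level illustration.

```python
def cla_add(a, b, width=32, cin=0):
    """Add two `width`-bit values using carry-lookahead equations.

    Each carry is written directly in generate/propagate form,
    c[i+1] = g[i] | p[i]g[i-1] | ... | p[i]...p[0]c[0],
    which hardware evaluates in parallel; Python evaluates it bit by bit.
    Returns (sum mod 2**width, carry out).
    """
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]    # generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]  # propagate
    carries = [cin]
    for i in range(width):
        c, prod = 0, 1
        for j in range(i, -1, -1):
            c |= prod & g[j]  # term p[i]...p[j+1] g[j]
            prod &= p[j]
        carries.append(c | (prod & cin))
    total = sum((p[i] ^ carries[i]) << i for i in range(width))
    return total, carries[width]


# Branch target = PC + sign-extended offset (two's complement in 32 bits).
pc, offset = 0x1000, -8
target, _ = cla_add(pc, offset & 0xFFFFFFFF)
print(hex(target))  # 0xff8
```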

7.

An Energy-Efficient Structurally Asymmetric Instruction Cache

Liu Xiao, Gao Hongguang, Chen Fangyuan, Ding Yajun 《Computer Engineering & Science》 2017, 39(3): 443-450

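In modern microprocessors, reading and comparing instruction-cache tags accounts for a large share of instruction-cache energy. The paper proposes a low-energy, inference-based design: the asymmetric instruction cache. Because jump instructions are relatively rare, the structure handles jump and sequential instructions differently and uses simplified tag-management bits that do not map one-to-one onto the data. It introduces two new techniques:

- Hit inference avoids excessive tag comparisons on instruction-cache hits.
- Variable-length instruction fetch raises the hit rate of sequential instruction blocks.

On selected SPEC2006 benchmarks, the asymmetric instruction cache reduces fetch energy by 40% to 60% compared with a conventional L1 instruction cache, and by 9% compared with the tagless-hit instruction cache TH-IC. On fetch ED2P (energy-delay-squared product), it improves on the conventional L1 instruction cache by about 50% and on TH-IC by about 17%.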
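ED2P weights delay quadratically, so a design that saves energy but runs slower loses part of its advantage. The short sketch below makes that arithmetic concrete. The energy and delay values are hypothetical normalized numbers, not results from the paper.

```python
def ed2p(energy, delay):
    """Energy-delay-squared product: E * D^2 (lower is better)."""
    return energy * delay ** 2


def improvement(baseline, candidate):
    """Fractional reduction of a lower-is-better metric."""
    return 1 - candidate / baseline


# Hypothetical normalized numbers, not from the paper:
base = ed2p(energy=1.0, delay=1.0)
same_speed = ed2p(energy=0.5, delay=1.0)  # fetch energy halved, no slowdown
slower = ed2p(energy=0.5, delay=1.1)      # fetch energy halved, 10% slower
print(round(improvement(base, same_speed), 3), round(improvement(base, slower), 3))  # 0.5 0.395
```

In this example, a 10% slowdown shrinks a 50% ED2P gain to about 39.5%.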

8.

A Low-Power Instruction Cache Based on a Pre-Buffering Mechanism

Wang Ye, Zhang Shengbing, Wang Danghui 《Computer Engineering》 2012, 38(1): 268-269, 272

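To reduce the energy of on-chip caches in microprocessors, the paper designs an instruction cache based on a pre-buffering mechanism. A pre-buffer control unit predicts which instructions the processor will need, so that they hit in the buffer as often as possible and the power of an instruction-cache access is avoided. Simulations of 7 benchmark programs show that pre-buffering saves 23.23% of processor power and improves program performance by 7.53% on average.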
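The abstract does not describe the pre-buffer control unit's predictor, so the sketch below assumes the simplest one, next-line sequential prefetch. A small LRU buffer holds whole cache lines, and only fills (demand or prefetch) count as instruction-cache reads. The line size, buffer depth, and 4-byte instructions are assumptions.

```python
from collections import OrderedDict


def icache_reads_with_prebuffer(trace, line_bytes=32, buffer_lines=2):
    """Count I-cache line reads when fetches are served from a small pre-buffer.

    Assumed predictor: always prefetch the next sequential line.
    """
    buf = OrderedDict()  # line number -> True, kept in LRU order
    reads = 0

    def fill(line):
        nonlocal reads
        reads += 1  # one I-cache access brings a whole line into the buffer
        buf[line] = True
        if len(buf) > buffer_lines:
            buf.popitem(last=False)  # evict the least recently used line

    for pc in trace:
        line = pc // line_bytes
        if line in buf:
            buf.move_to_end(line)  # buffer hit: no I-cache access
        else:
            fill(line)  # demand miss
        if line + 1 not in buf:
            fill(line + 1)  # sequential prefetch
    return reads


trace = list(range(0, 256, 4))  # 64 straight-line 4-byte instructions
print(icache_reads_with_prebuffer(trace), len(trace))  # 9 64
```

Straight-line code needs one read per line plus one wasted prefetch. Tight loops that span more lines than the buffer holds would thrash, which is why the real design relies on a smarter predictor.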

9.

Multi-Branch Statement Recovery under Complex Patterns

Zhang Longjie, Xie Xiaofang, Yuan Shengzhi, Li Hongzhou 《Computer Engineering》 2009, 35(21): 67-70

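The paper studies the implementation patterns that compilers produce for multi-branch (switch) structures and formally describes typical patterns under complex conditions. To recognize multi-branch structures, it analyzes the formats of the instructions that access the index table and the jump table and proposes a dual-feature instruction matching algorithm. Program slicing is used to derive canonical expressions for index-table and jump-table accesses. This removes the influence of differing compiler types and versions on multi-branch statement recovery and makes the algorithm more general.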
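Once both tables and the case-value base have been located (the part the dual-feature matching and slicing address), rebuilding the switch is mechanical. The sketch below assumes a two-level layout: a byte index table maps each case value, minus the lowest case, to a slot in the jump table of target addresses. It also assumes the default case is identified by a known slot. It covers only this final step, not the paper's recognition algorithm.

```python
from collections import defaultdict


def recover_cases(low, index_table, jump_table, default_slot):
    """Rebuild target -> [case values] for a two-level (index table + jump table) switch.

    low: smallest case value; index_table[v - low] is the jump-table slot for value v;
    default_slot: the slot whose target is the default case (assumed known).
    """
    cases = defaultdict(list)
    for offset, slot in enumerate(index_table):
        if slot == default_slot:
            continue  # value falls through to the default label
        cases[jump_table[slot]].append(low + offset)
    return dict(cases)


# case 10, 14 -> 0x500; case 12, 13 -> 0x510; 11 -> default (slot 2 -> 0x520)
print(recover_cases(10, [0, 2, 1, 1, 0], [0x500, 0x510, 0x520], default_slot=2))
```

Grouping by target reconstructs shared case labels (`case 10: case 14:`), which is what a decompiler needs to print a readable switch.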

10.

Multi-Branch Statement Recovery in IA-32 Decompilation

Zhang Longjie, Xie Xiaofang, Yuan Shengzhi, Li Hongzhou 《Application Research of Computers》 2009, 26(6): 2359-2361

对IA32反编译后多分支结构的各种实现模式进行了系统的研究分析,并对复杂条件下典型的多分支结构实现模式进行了形式化的描述。在多分支结构的识别过程中,通过对索引表和跳转表调用指令的格式分析,提出了双特征指令匹配算法。通过程序切片建立了索引表和跳转表调用的表达式标准型,消除了多分支语句恢复过程中编译器类型和版本差异带来的影响,提高了算法通用性,对于进行程序反解及软件逆向工程具有重要的参考价值。相似文献

11.

Linked instruction caches for enhancing power efficiency of embedded systems

Chang-Jung Ku, Ching-Wen Chen, An Hsia, Chun-Lin Chen 《Microprocessors and Microsystems》 2014

Memory systems account for 45% of the total power consumed by an embedded system, and a memory access consumes 10 times as much power as a cache access. Increasing the cache hit rate can therefore effectively reduce memory-system power and improve system performance. In this study, we increased the cache hit rate and reduced cache-access power by developing a new cache architecture, the single linked cache (SLC), which stores frequently executed instructions. Thanks to an added link field, SLC combines the low power consumption and low access delay of a direct-mapped cache with a hit rate similar to that of a two-way set-associative cache. We also developed a second design, multiple linked caches (MLC), which further reduces the power of each cache access and avoids unnecessary accesses when the requested data is not in the cache. In MLC, the linked cache is split into several small linked caches that store frequently executed instructions, reducing the power consumed by each access. To avoid unnecessary cache accesses when a requested instruction is not in the linked caches, the addresses of frequently executed blocks are recorded in the branch target buffer (BTB). By consulting the BTB, the processor can fetch an instruction directly from memory when it is not in the cache. In simulation, our method outperformed selective compression, a traditional cache, and a filter cache in cache hit rate, power consumption, and execution time.

12.

Variable Length Instruction Compression on Transport Triggered Architectures

Timo Viitanen, Janne Helkala, Heikki Kultala, Pekka Jääskeläinen, Jarmo Takala, Tommi Zetterman, Heikki Berg 《International Journal of Parallel Programming》 2018, 46(6): 1283-1303

The memories used for embedded microprocessor devices consume a large portion of the system’s power. The power dissipation of the instruction memory can be reduced by using code compression methods, which may require the use of variable length instruction formats in the processor. The power-efficient design of variable length instruction fetch and decode is challenging for static multiple-issue processors, which aim for low power consumption on embedded platforms. The memory-side power savings using compression are easily lost on inefficient fetch unit design. We propose an implementation for instruction template-based compression and two instruction fetch alternatives for variable length instruction encoding on transport triggered architecture, a static multiple-issue exposed data path architecture. With applications from the CHStone benchmark suite, the compression approach reaches an average compression ratio of 44% at best. We show that the variable length fetch designs reduce the number of memory accesses and often allow the use of a smaller memory component. The proposed compression scheme reduced the energy consumption of synthesized benchmark processors by 15% and area by 33% on average.
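Template-based compression can be sketched as follows. Each wide instruction is a bundle of move slots, and a template ID records which slots are actually encoded. Slots outside the chosen template are implicitly NOPs, so sparse bundles shrink to a short, variable-length encoding. The NOP code, the template set, and the bit widths below are hypothetical, not the paper's encoding.

```python
NOP = 0  # hypothetical encoding of an empty (no-op) move slot


def compress(bundle, templates):
    """Pick the smallest template covering every non-NOP slot; return (template id, payload).

    templates[0] is assumed to be the full template, so every bundle is encodable.
    """
    used = [slot != NOP for slot in bundle]
    best = min(
        (tid for tid, tmpl in enumerate(templates)
         if all(keep or not u for keep, u in zip(tmpl, used))),
        key=lambda tid: sum(templates[tid]),
    )
    payload = [slot for slot, keep in zip(bundle, templates[best]) if keep]
    return best, payload


def decompress(tid, payload, templates):
    """Re-expand a compressed bundle, filling slots absent from the template with NOP."""
    it = iter(payload)
    return [next(it) if keep else NOP for keep in templates[tid]]


def encoded_bits(tid, templates, slot_bits, id_bits):
    """Length of one compressed instruction: template ID plus the encoded slots."""
    return id_bits + slot_bits * sum(templates[tid])


templates = [(True,) * 4, (True, True, False, False), (False, False, True, False), (False,) * 4]
tid, payload = compress([5, 0, 0, 0], templates)
print(tid, payload, encoded_bits(tid, templates, slot_bits=16, id_bits=2))  # 1 [5, 0] 34
```

A fully populated bundle would cost 66 bits under these assumed widths. Variable-length fetch is what lets the processor exploit the shorter encodings without wasting memory bandwidth.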

13.

Area efficient remote code execution platform with on-demand instruction manager for cloud-connected code executable IoT devices

《Simulation Modelling Practice and Theory》2017

An energy- and area-efficient cloud-connected software execution architecture for IoT sensor processors is proposed. A remotely installed sensor device, such as an environmental activity monitor, is commonly implemented with a conventional embedded processor that provides only fixed services, using statically compiled embedded software in on-chip flash memory. Instead of using on-chip flash memory for the instruction code area, we adopt a virtually mapped internal memory concept to realize cloud-connected software execution. In this scheme, the remote storage area reached via the IoT platform is indirectly mapped onto the physical address space of the instruction memory using a dynamic address translation technique. The proposed cloud-connected architecture enables on-demand execution of instructions that are fetched at runtime from the cloud-side remote storage area, instead of through a directly connected on-chip instruction bus. This storage-less approach can reduce the high access current and large chip area overhead by eliminating the on-chip code flash memory. To reduce the access current needed to retrieve requested instructions, a small RAM scratch pad retains the hot-spot instruction code and is pre-filled with pre-estimated instruction sectors. The experimental results show that the proposed technique reduces the energy consumption and packet delay of an IoT device executing remote embedded software, and reduces chip area by realizing a storage-less sensor architecture.
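The address-translation-plus-scratchpad idea can be sketched as a sector cache in front of a remote fetch.

- An instruction address is split into a sector number and an offset.
- Sectors resident in a small RAM scratch pad are served locally.
- A miss fetches the whole sector through a caller-supplied function that stands in for the IoT link.
- Hot sectors can be pre-filled.

The sector size, LRU replacement, and the `fetch_sector` callback are assumptions made for illustration.

```python
from collections import OrderedDict


class SectorScratchpad:
    """Small RAM scratch pad caching code sectors that live in remote storage."""

    def __init__(self, fetch_sector, sector_bytes=256, capacity=4, prefill=()):
        self.fetch_sector = fetch_sector  # hypothetical callback: sector number -> bytes
        self.sector_bytes = sector_bytes
        self.capacity = capacity
        self.sectors = OrderedDict()      # sector number -> contents, in LRU order
        self.remote_fetches = 0
        for s in prefill:                 # pre-estimated hot sectors
            self._load(s)

    def _load(self, sector):
        self.remote_fetches += 1
        self.sectors[sector] = self.fetch_sector(sector)
        if len(self.sectors) > self.capacity:
            self.sectors.popitem(last=False)  # evict least recently used sector

    def read_byte(self, addr):
        """Translate an instruction address to (sector, offset) and return the byte."""
        sector, offset = divmod(addr, self.sector_bytes)
        if sector in self.sectors:
            self.sectors.move_to_end(sector)
        else:
            self._load(sector)  # miss: fetch the sector from remote storage
        return self.sectors[sector][offset]


remote = lambda s: bytes([s]) * 256  # fake remote image: every byte of sector s equals s
sp = SectorScratchpad(remote, prefill=[0])
print(sp.read_byte(0x10), sp.read_byte(0x150), sp.read_byte(0x20), sp.remote_fetches)  # 0 1 0 2
```

Pre-filling sector 0 turns the first read into a local hit. Only the jump into sector 1 goes over the network.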

14.

Research on Power Optimization of Embedded Software Based on Instruction Clustering and Instruction Scheduling

Chen Jia, Dong Yuan, Yang Yang, Dai Guilan, Wang Shengyuan 《Journal of Chinese Computer Systems》 2006, 27(1): 175-179

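Using an instruction-level energy estimation model, the paper proposes and validates a power optimization scheme based on instruction clustering and instruction scheduling. The scheme uses depth-first search to find a locally optimal solution, selecting an instruction sequence with lower energy consumption. Measuring the switching energy for every pair of adjacent instructions takes too much effort. To balance that effort against accuracy, instructions with similar energy consumption are grouped into the same class. An analysis of experimental results from the SimpleScalar/Wattch simulator shows that instruction scheduling alone has limited effect on instruction-level power optimization. More effective optimization requires estimating and optimizing power at higher levels.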
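A minimal sketch of the scheduling step follows. Each instruction belongs to an energy cluster with a base cost, and an extra switching cost is paid when consecutive instructions come from different clusters. Depth-first search over dependence-respecting orders keeps the cheapest one. Unlike the paper's local-optimum search, this version is exhaustive with branch-and-bound pruning, so it is exact but only practical for small basic blocks. All energy numbers are hypothetical.

```python
def min_energy_schedule(insns, deps, cls, base, switch):
    """Return (energy, order) of the cheapest dependence-respecting instruction order.

    deps: name -> set of names that must come earlier; cls: name -> energy cluster;
    base: cluster -> base energy; switch: (cluster_a, cluster_b) -> switching energy
    (pairs not listed cost 0).
    """
    best = [float("inf"), None]

    def dfs(order, placed, energy):
        if energy >= best[0]:
            return  # prune: already no better than the best complete schedule
        if len(order) == len(insns):
            best[0], best[1] = energy, list(order)
            return
        for ins in insns:
            if ins in placed or not deps.get(ins, set()) <= placed:
                continue  # already scheduled, or a predecessor is still pending
            cost = base[cls[ins]]
            if order:
                cost += switch.get((cls[order[-1]], cls[ins]), 0)
            order.append(ins)
            placed.add(ins)
            dfs(order, placed, energy + cost)
            order.pop()
            placed.remove(ins)

    dfs([], set(), 0)
    return best[0], best[1]


insns = ["a1", "m1", "a2", "m2"]
cls = {"a1": "alu", "a2": "alu", "m1": "mul", "m2": "mul"}
base = {"alu": 1, "mul": 3}
switch = {("alu", "mul"): 2, ("mul", "alu"): 2}
print(min_energy_schedule(insns, {}, cls, base, switch))  # (10, ['a1', 'a2', 'm1', 'm2'])
```

Clustering shrinks the `switch` table from one entry per instruction pair to one entry per cluster pair. That is the measurement saving the abstract describes.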

15.

Research on Low-Power Video Decoding Based on Dynamic Voltage Scaling

Zhong Weijun, Liu Mingye, Peng Gang, Yan Guangwei 《Computer Engineering and Applications》 2005, 41(23): 115-117

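The paper discusses using dynamic voltage scaling (DVS) to reduce the power of video decoding in embedded systems and proposes a DVS-based low-power decoding technique. The method predicts each frame's decoding time with a moving average. Based on that prediction, it dynamically adjusts the microprocessor's operating voltage during decoding to reduce energy consumption. Experimental results show that the DVS-based video decoder consumes 10% to 30% less energy than a conventional decoder.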
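The prediction-then-scale loop can be sketched as follows. Each frame's decode workload (in cycles) is predicted as the moving average of the last few frames. The processor then runs at the lowest available frequency that still finishes the predicted workload within the frame deadline; lowering frequency is what permits a lower voltage. The window size, frequency levels, and full-speed start are assumptions, and frames whose workload exceeds the prediction (missed deadlines) are not handled.

```python
from collections import deque


def choose_frequencies(decode_cycles, deadline_s, levels_hz, window=4):
    """Pick a frequency per frame from a moving-average prediction of its cycle count."""
    history = deque(maxlen=window)  # cycle counts of the most recent frames
    chosen = []
    for actual in decode_cycles:
        if history:
            predicted = sum(history) / len(history)
            needed = predicted / deadline_s  # Hz required to meet the deadline
            freq = next((f for f in sorted(levels_hz) if f >= needed), max(levels_hz))
        else:
            freq = max(levels_hz)  # no history yet: run at full speed
        chosen.append(freq)
        history.append(actual)  # the real cycle count becomes known after decoding
    return chosen


levels = [500_000_000, 1_000_000_000, 2_000_000_000]
cycles = [30e6, 30e6, 30e6, 60e6, 60e6]  # a scene change doubles the workload
print(choose_frequencies(cycles, deadline_s=1 / 30, levels_hz=levels))
```

The moving average reacts one frame late to the scene change, which is the usual trade-off of this predictor.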

16.

Source-Level Software Energy Modeling and Analysis for Embedded Systems

Ye Shan, Guo Rongzuo, Huang Jun 《Application Research of Computers》 2017, 34(10)

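Energy consumption determines how long embedded devices can operate, so this paper considers software energy from the instruction level up to the source-program level. It first analyzes the characteristics of source-level statements and, based on the instruction energy of each statement, proposes a source-level energy model. Guided by the model, it optimizes the energy of different statement categories in the source code of five classic algorithms. Finally, it compares the energy of the five algorithms before and after optimization. Experiments show that the optimized source programs consume 9.46% to 50.29% less energy, achieving the goal of reducing embedded software energy consumption.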
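A source-level model of this kind can be read as a weighted sum, $E = \sum_k n_k e_k$. Here $n_k$ is the executed count of statements in category $k$, and $e_k$ is that category's energy, derived from the instructions it compiles to. The sketch below applies the sum before and after an optimization. The categories, counts, and per-statement energies are hypothetical and are not the paper's measured values.

```python
def program_energy(statement_counts, energy_per_statement):
    """E = sum over statement categories k of n_k * e_k."""
    return sum(n * energy_per_statement[k] for k, n in statement_counts.items())


# Hypothetical per-statement energies (nJ) and executed statement counts.
e = {"assign": 2.0, "branch": 3.5, "call": 12.0}
before = {"assign": 1000, "branch": 300, "call": 50}
after = {"assign": 800, "branch": 200, "call": 10}  # e.g. after inlining and loop cleanup

e_before, e_after = program_energy(before, e), program_energy(after, e)
print(e_before, e_after, f"{1 - e_after / e_before:.1%}")  # 3650.0 2420.0 33.7%
```

The model also shows where to optimize first. Here the call category has the highest per-statement cost, so removing calls pays off most per statement.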

17.

A Scalable Video Coding Transmission Strategy for Wireless Networks

Tian Bo, Cai Shuting, Yang Yimin 《Application Research of Computers》 2016, 33(8)

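Existing scalable video coding (SVC) transmission strategies for wireless networks do not fully account for distortion and energy consumption. The paper therefore proposes an SVC transmission strategy that minimizes distortion and node energy consumption. The strategy analyzes SVC coding distortion and the packet-loss distortion introduced during transmission, and from these computes the total video distortion at the receiver. It analyzes node energy consumption in the wireless network by computing the power of the SVC transmission system. Jointly considering energy consumption, transmission time, and quality requirements, it then formulates SVC transmission as an optimization problem. Solving this problem yields the optimal SVC coding parameters, achieving reliable SVC transmission with good video quality. Simulation results show that, compared with typical existing SVC transmission strategies, this strategy reduces average distortion during SVC transmission and achieves better video quality at the same energy consumption.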

18.

Optimizing CAM-based instruction cache designs for low-power embedded systems

Juan L. Alexander V. 《Journal of Systems Architecture》2008,54(12):1155-1163

Energy consumption and power dissipation are important concerns in the design of embedded systems and they will become even more crucial with finer process geometry, higher frequencies, deeper pipelines and wider issue designs. In particular, the instruction cache consumes more energy than any other processor module, especially with commonly used highly associative CAM-based implementations. Two energy-efficient approaches for highly associative CAM-based instruction cache designs are presented by means of using a segmented wordline and a predictor-based instruction fetch mechanism. The latter is based on the fact that not all instructions in a given I-cache fetch are used due to taken branches. The proposed Fetch Mask Predictor unit determines which instructions in a cache access will actually be used to avoid fetching any of the other instructions. Both proposed approaches are evaluated for an embedded 4-wide issue processor in 100 nm technology. Experimental results show average I-cache energy savings of 48% and overall processor energy savings of 19%.
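The idea behind fetch masking can be sketched as follows. In a 4-wide aligned fetch group, only the slots from the entry point up to the first taken branch are actually used. A predictor can remember that mask per fetch address so that unused words are never read. The last-value table, 4-byte instructions, and the all-ones default below are assumptions. The original unit's organization is not described in the abstract, and recovery from an under-predicted mask is not modeled.

```python
def fetch_mask(fetch_pc, taken_branch_slot, width=4, insn_bytes=4):
    """Bit i set = slot i of the aligned fetch group is actually used.

    Slots before the entry point (jumped over) and after a taken branch are unused.
    taken_branch_slot is None when the group contains no taken branch.
    """
    start = (fetch_pc // insn_bytes) % width  # entry slot within the aligned group
    end = width - 1 if taken_branch_slot is None else taken_branch_slot
    mask = 0
    for i in range(start, end + 1):
        mask |= 1 << i
    return mask


class FetchMaskPredictor:
    """Hypothetical last-value predictor: remembers the mask last seen per fetch PC."""

    def __init__(self, width=4):
        self.full = (1 << width) - 1
        self.table = {}

    def predict(self, fetch_pc):
        return self.table.get(fetch_pc, self.full)  # unknown PC: fetch every slot

    def update(self, fetch_pc, actual_mask):
        self.table[fetch_pc] = actual_mask


fmp = FetchMaskPredictor()
fmp.update(0x104, fetch_mask(0x104, taken_branch_slot=2))
print(bin(fmp.predict(0x104)), bin(fmp.predict(0x200)))  # 0b110 0b1111
```

A mask that is too small would force a refetch, so the all-ones default errs on the side of correctness.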