1.

黄明凯刘先华谭明星谢子超程旭《计算机研究与发展》2015,52(1)

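Interpreters are widely used in managed runtime environments such as Java virtual machines and JavaScript engines. They typically dispatch bytecodes with indirect branch instructions. On modern multi-issue, deeply pipelined microprocessors, indirect branch mispredictions severely limit interpreter performance. This paper proposes a bytecode-pointer-guided indirect branch prediction technique for interpreters; its core idea is to use the interpreter-specific bytecode pointer value to distinguish different indirect branch scenarios. The technique is a hardware/software co-design: dedicated guide instructions are inserted into the interpreter to mark the bytecode pointer, and the predictor uses the bytecode pointer value at run time to predict the branch target address. Experimental results show that, compared with a commonly used branch target buffer (BTB) predictor, the technique improves Java interpreter performance by up to 34.7% and JavaScript interpreter performance by up to 8.3%; compared with the dedicated hardware indirect branch predictor TTC (tagged target cache), it still improves Java interpreter performance by up to 21.9%.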
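The intuition behind keying the predictor on the bytecode pointer can be illustrated with a toy model. The sketch below is not the paper's hardware: it assumes a switch-style interpreter with a single shared dispatch jump and a simple last-target table, and the names `run_dispatch`, `DISPATCH_PC`, and the `handler_` targets are hypothetical.

```python
def run_dispatch(bytecode, iterations, use_bytecode_pointer):
    """Count mispredicted dispatch jumps for a toy switch-style interpreter.

    Assumptions (not from the paper): one shared indirect jump at a fixed PC,
    and a predictor that simply remembers the last target seen for each key.
    """
    DISPATCH_PC = 0x400  # hypothetical address of the single dispatch jump
    table = {}           # predictor storage: key -> last observed target
    mispredictions = 0
    for _ in range(iterations):
        for bp, opcode in enumerate(bytecode):  # bp plays the bytecode pointer
            key = bp if use_bytecode_pointer else DISPATCH_PC
            target = "handler_" + opcode        # actual jump target
            if table.get(key) != target:
                mispredictions += 1
            table[key] = target                 # train on the real outcome
    return mispredictions


program = ["load", "add", "store", "jump"]  # a four-bytecode loop body
pc_only = run_dispatch(program, 100, use_bytecode_pointer=False)
bp_guided = run_dispatch(program, 100, use_bytecode_pointer=True)
print(pc_only, bp_guided)
```

In this model the PC-keyed table mispredicts every dispatch, because consecutive bytecodes always jump to different handlers, while the bytecode-pointer-keyed table only misses during the first pass over the loop.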

2.

Research on a Branch Prediction Scheme under Frequent Context Switching

陈智勇《微计算机信息》2008,24(2):301-302

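In superpipelined and superscalar processors, branch misprediction is one of the main causes of performance loss. Many current predictors built on branch history lose accuracy under frequent context switches. The scheme studied in this paper combines the ideas of the LGshare and Skew branch predictors and, on that basis, replaces the single shared global branch history register (BHR) of the two-level scheme with one user BHR and one kernel BHR. The scheme not only handles both intra-branch and inter-branch correlation well, but also preserves and restores each process's branch prediction information under frequent context switching, improving prediction accuracy.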

3.

A Memory Dependence Prediction Method Based on Instruction Distance

路冬冬何军杨剑新王飙《计算机应用》2013,33(7):1903-1907

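Memory dependence prediction plays an important role in reducing memory dependence violations and improving microprocessor performance. To address the high hardware overhead and poor implementability of traditional dependence predictors, this paper analyzes the locality of memory dependences and proposes a prediction method based on instruction distance. The method exploits the locality in instruction distance between instructions that cause memory dependence violations: it predicts the instruction distance of conflicting instructions and uses it to control when some memory access instructions issue, greatly reducing the number of violations. Experimental results show that, with about 1 KB of hardware, the instruction-distance-based predictor raises instructions per cycle (IPC) by 1.70% on average and by up to 5.11%, a substantial performance gain for a small hardware cost.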

4.

Research on the TBHBP Branch Predictor Based on Simultaneous Multithreading

李静梅关海洋《计算机科学》2012,39(9):307-311

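To address the shortcomings of traditional branch predictors, namely confused branch prediction information, branch aliasing conflicts, and a high capacity conflict rate, this paper proposes TBHBP, a branch predictor for simultaneous multithreading processors. The predictor indexes the pattern history table (PHT) with combined history information that merges thread history with address-indexed local history, gives each thread its own thread history register and branch history register, and adds a branch result output table to speed up branch prediction. Results show that TBHBP effectively resolves stale branch information, branch aliasing, and capacity conflicts. Compared with the Gshare branch predictor, instruction throughput improves by 12.5%, and the branch misprediction rate and the wrong-path fetch rate fall by 0.5% and 2.1%, respectively.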

5.

A Hybrid Branch Prediction Design for a Superscalar-VLIW Hybrid Architecture Processor

《计算机应用与软件》2017,(2)

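This paper describes a branch prediction structure designed for a digital signal processor with a hybrid superscalar and very long instruction word (VLIW) architecture. To limit hardware complexity while maximizing prediction accuracy, a hybrid structure combining a bimodal predictor and a PAp predictor is designed to exploit the strengths of both. Standard DSPstone programs are run on the completed processor. Experimental results show that adding the branch prediction structure improves processor performance by 23% on average, and that the hybrid structure is clearly more accurate than a single-predictor structure.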

6.

An Adaptive Memory Dependence Predictor

班冬松颜世云李礼杨剑新路冬冬《计算机科学》2013,40(4):38-40

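Out-of-order execution of memory access instructions causes memory dependence violations. Memory dependence prediction can reduce these violations and improve processor performance. Existing academic work generally suffers from high hardware overhead and implementation complexity; the memory dependence prediction used in commercial processors is simple to implement but either is not adaptive or hinders the exploitation of instruction-level parallelism. This paper designs SMDP, a simple and efficient memory dependence predictor that is adaptive, easy to implement, and makes full use of instruction-level parallelism. Experiments show that SMDP effectively improves processor performance: with a small instruction window close to that of real processors, it improves performance over a blind prediction mechanism by 0.7991% on average and by up to 4.9225%.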

7.

A Boosting-Based Cost-Sensitive Software Defect Prediction Method

杨杰燕雪峰张德平《计算机科学》2017,44(8):176-180, 206

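Boosting resampling is a common way to expand small-sample data sets. First, to address the curse of dimensionality that arises during sampling, a random attribute-subset selection method is proposed for dimensionality reduction. Then, because software defect prediction penalizes false negatives and false positives differently, a cost-sensitive algorithm is added to the attribute selection process. Using multiple basic k-NN predictors as weak learners and minimum cost as the attribute-deletion criterion, the method obtains a set of predictors with the k value and attribute subset for the current sample set; a cost-sensitive weight-update mechanism assigns weights to the different data instances during sampling; and all predictor sets together form an adaptive ensemble k-NN strong learner from which the software defect prediction model is built. Experiments on NASA data sets show that, with small samples, the false-negative rate of the Boosting-based cost-sensitive method drops substantially while the false-positive rate rises somewhat, and overall performance is better than that of the original Boosting ensemble prediction method.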

8.

A Correlated Software Prediction Algorithm for Indirect Branches in Dynamic Translation Systems

贾宁杨春佟冬王克义《计算机研究与发展》2014,(3)

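A dynamic translation system must perform an address translation every time it executes an indirect branch instruction, and this is one of the main sources of the system's performance overhead. Translation systems without special hardware support often use software prediction to reduce address translation overhead, but its low prediction accuracy limits the overall performance gain. The low-overhead correlated software prediction (LOCSP) algorithm uses code replicas to distinguish the different branch scenarios of the instruction being predicted: it separates the multiple dynamic execution paths that reach the instruction into several non-overlapping code cache replicas and gives each replica an independent prediction chain. This achieves correlated prediction without increasing the dynamic instruction count and significantly improves software prediction accuracy. In addition, based on dynamic profiling results, LOCSP applies correlated software prediction only to a subset of hot, hard-to-predict indirect branches, further reducing prediction overhead. Experiments show that, compared with software prediction, LOCSP raises average prediction accuracy from 58.9% to 82.2% and reduces the translation system's overall performance overhead by 19.3% on average and by up to 41.9%, while the static code size grows by only 2.4% on average.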

9.

Design of a BP Network Digital Image Predictor Based on MATLAB

张义兵戴瑜兴《计算机仿真》2007,24(3):223-226,306

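For the predictor design problem in digital image predictive coding systems, a three-layer BP network is used as the predictor, with neighborhood pixels as input; the hidden and output layers use the hyperbolic tangent sigmoid and linear activation functions, respectively. The principle of the network training algorithm is given, and a method for designing the predictor with MATLAB is studied. Experiments lead to the conclusion that, for a given mean squared error and amount of training data, the number of hidden-layer neurons does not affect prediction accuracy. Several MATLAB-based network training methods are introduced. The resulting predictor is structurally simple and easy to implement; prediction on two images with distinct features gives satisfactory results. The experiments show that the design method is feasible and effective.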

10.

A Branch Prediction Method Based on Adaptive History Length

赵朝君陈晨陈志坚孟建熠《计算机辅助设计与图形学学报》2015,(4)

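By studying the relationship between prediction efficiency and branch history length in dynamic branch predictors, and observing that different branch instructions in a program have different optimal history lengths, this paper proposes a branch prediction method that searches for the best history length of each branch instruction. The method monitors the prediction accuracy of branch instructions in real time and dynamically adjusts the predictor's history length, without changing the hardware resources of the branch prediction table, to adapt to the program's dynamic behavior. Experimental results show that, with the same hardware resources, the method reduces the misprediction rate by 15.8% relative to the Gshare predictor and by 10.3% relative to the Bi-mode predictor.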

11.

SWIP Prediction: Complexity-Effective Indirect-Branch Prediction Using Pointers

谢子超佟冬黄明凯史秦青程旭《计算机科学技术学报》2012,27(4):754-768

Predicting indirect-branch targets has become a performance bottleneck for many applications. Previous high-performance indirect-branch predictors usually require significant hardware storage or additional compiler support, which increases the complexity of the processor front-end or the compilers. This paper proposes a complexity-effective indirect-branch prediction mechanism, called the Set-Way Index Pointing (SWIP) prediction. It stores multiple indirect-branch targets in different branch target buffer (BTB) entries, whose set indices and way locations are treated as set-way index pointers. These pointers are stored in the existing branch-direction predictor. SWIP prediction reuses the branch direction predictor to provide such pointers, and then accesses the pointed BTB entries for the predicted indirect-branch target. Our evaluation shows that SWIP prediction could achieve attractive performance improvement without requiring large dedicated storage or additional compiler support. It improves the indirect-branch prediction accuracy by 36.5% compared to that of a commonly-used BTB, resulting in average performance improvement of 18.56%. Its energy consumption is also reduced by 14.34% over that of the baseline.

12.

A General Low-Cost Indirect Branch Prediction Using Target Address Pointers

谢子超佟冬黄明凯《计算机科学技术学报》2014,29(6):929-946

Energy efficiency has become the primary design metric in chip development. To pursue higher energy efficiency, processor architects should reduce or eliminate unnecessary energy dissipation. Indirect-branch prediction has become a performance bottleneck, especially for applications written in object-oriented languages. Previous hardware-based indirect-branch predictors are generally inefficient, for they either require significant hardware storage or predict indirect-branch targets slowly. In this paper, we propose an energy-efficient indirect-branch prediction technique called TAP (target address pointer) prediction. Its key idea includes two parts: utilizing specific hardware pointers to accelerate the indirect-branch prediction flow, and reusing existing processor components to reduce additional hardware cost and power consumption. When fetching an indirect branch, TAP prediction first gets specific pointers, called target address pointers, from the conditional branch predictor, and then uses these pointers to generate virtual addresses that index the indirect-branch targets. This technique takes a similar amount of time to dedicated-storage techniques without requiring large amounts of additional storage. Our evaluation shows that TAP prediction with some representative state-of-the-art branch predictors can improve performance significantly over the baseline processor. Compared with hardware-based indirect-branch predictors, the TAP-Perceptron scheme achieves a performance improvement equivalent to that provided by an 8K-entry TTC predictor, and also outperforms the VPC predictor.

13.

Improving branch prediction accuracy by reducing pattern history table interference (total citations: 1; self-citations: 0; citations by others: 1)

Po-Yung Chang Marius Evers Yale N. Patt 《International journal of parallel programming》1997,25(5):339-362

A deeply pipelined superscalar processor needs an accurate branch predictor in order to approach its performance potential. The 2-level branch predictors have been shown to achieve high prediction accuracy, yet they still suffer a significant number of mispredictions. It has been shown that a number of these mispredictions are due to interference in the pattern history tables. This paper details a method for reducing the amount of pattern history table interference by dynamically identifying some easily predictable branches and inhibiting the pattern history table update for these branches. We show that inhibiting the update in this manner reduces the amount of destructive interference in the global history variation of the 2-level branch predictor, resulting in significantly improved branch prediction accuracy for the SPEC 95 benchmarks. For example, for a 2 KB gshare predictor, we eliminate 38% of the mispredictions for the gcc benchmark.
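For readers unfamiliar with the baseline mentioned in this abstract, a gshare predictor can be sketched roughly as follows. This shows only the commonly described baseline (global history XORed with the branch address to index a table of 2-bit saturating counters), not the paper's update-inhibiting filter; the class name, table size, and initial counter value are assumptions chosen for illustration.

```python
class Gshare:
    """Minimal gshare sketch: PHT of 2-bit counters indexed by pc XOR history."""

    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        self.history = 0                      # global branch history register
        self.pht = [1] * (1 << index_bits)    # counters start weakly not-taken

    def _index(self, pc):
        return (pc ^ self.history) & self.mask

    def predict(self, pc):
        return self.pht[self._index(pc)] >= 2  # 2 or 3 means predict taken

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.pht[i] = min(3, self.pht[i] + 1)
        else:
            self.pht[i] = max(0, self.pht[i] - 1)
        # Shift the outcome into the global history, keeping index_bits bits.
        self.history = ((self.history << 1) | int(taken)) & self.mask


def mispredictions(predictor, pc, outcomes):
    """Run one branch through the predictor and count wrong predictions."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


alternating = [i % 2 == 0 for i in range(200)]  # taken, not-taken, taken, ...
print(mispredictions(Gshare(), 0x40, alternating))
```

Because several branches share one table, two branches whose `pc ^ history` values collide train the same counter in opposite directions; that collision is the interference the abstract refers to.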

14.

Probabilistic counter updates for predictor hysteresis and bias

Riley N. Zilles C. 《Computer Architecture Letters》2006,5(1):18-21

Hardware predictor designers have incorporated hysteresis and/or bias to achieve desired behavior by increasing the number of bits per counter. Some resulting proposed predictor designs are currently impractical because their counter tables are too large. We describe a method for dramatically reducing the amount of storage required for a predictor's counter table with minimal impact on prediction accuracy. Probabilistic updates to counter state are implemented using a hardware pseudo-random number generator to increment or decrement counters a fraction of the time, meaning fewer counter bits are required. We demonstrate the effectiveness of probabilistic updates in the context of Fields et al.'s critical path predictor, which employs a biased 6-bit counter. Averaged across the SPEC CINT2000 benchmarks, our 2-bit and 3-bit probabilistic counters closely approximate a 6-bit deterministic one (achieving speedups of 7.75% and 7.91% compared to 7.94%) when used for criticality-based scheduling in a clustered machine. Performance degrades gracefully, enabling even a 1-bit probabilistic counter to outperform the best 3-bit deterministic counter we found.
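The idea of applying counter updates only a fraction of the time can be sketched in software as below. This is a minimal illustration under assumed parameters: the update probabilities, the class name, and the use of Python's `random.Random` in place of a hardware pseudo-random generator are not taken from the paper.

```python
import random


class ProbabilisticCounter:
    """Saturating counter whose updates take effect only some of the time."""

    def __init__(self, bits, inc_prob, dec_prob, rng):
        self.max_value = (1 << bits) - 1
        self.value = 0
        self.inc_prob = inc_prob  # assumed probability that an increment applies
        self.dec_prob = dec_prob  # assumed probability that a decrement applies
        self.rng = rng            # stands in for the hardware pseudo-random source

    def increment(self):
        if self.rng.random() < self.inc_prob and self.value < self.max_value:
            self.value += 1

    def decrement(self):
        if self.rng.random() < self.dec_prob and self.value > 0:
            self.value -= 1


# A 2-bit counter that moves on roughly one update in sixteen needs many more
# events to saturate than a deterministic 2-bit counter would.
counter = ProbabilisticCounter(bits=2, inc_prob=1 / 16, dec_prob=1 / 16,
                               rng=random.Random(0))
for _ in range(1000):
    counter.increment()
print(counter.value)
```

Lowering the update probability stretches the number of events a small counter can absorb before saturating, which is how fewer bits can stand in for a wider deterministic counter.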

15.

Design and Implementation of an Underground Noise Meter

赵恩彪刘卫东邓楠《工矿自动化》2011,37(12):71-73

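To address the hazards of noise in underground coal mines, a design for an underground noise meter based on an STC microcontroller is proposed, and the meter's hardware and software design is described. Test results show that the meter measures sound level accurately, with a measurement error below ±1.25%, meeting the requirements for a Class 2 sound level meter.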

16.

A model to facilitate costing, pricing and budgeting of computer services

《Information & Management》1988,14(5):235-241

This paper addresses the problem of establishing a financial policy for managing computer systems, and proposes a model incorporating the three facets of determining the cost rates of hardware resources, evaluating the impact of pricing policy, and budget planning. This approach is based on statistical analysis of the correlation among the consumption of various hardware resources.

17.

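An Effective Fetch Control Mechanism for Simultaneous Multithreading Processors (total citations: 1; self-citations: 0; citations by others: 1)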

何立强刘志勇《计算机学报》2006,29(4):535-543

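Simultaneous multithreading processors greatly improve performance by fetching and executing instructions from multiple running threads in every clock cycle. The accuracy of the branch predictor and the efficiency of the fetch policy are key to the performance of such processors. This paper proposes a new fetch control mechanism that combines a value-based branch predictor with a fetch policy based on thread progress speed. The structure has low hardware overhead and low implementation complexity. Experimental results show that the mechanism effectively improves processor performance: its speedup over a traditional fetch control mechanism is 28%, which is also higher than that of current fetch control mechanisms based on stream buffers and on branch classifiers.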

18.

Distributed topology discovery in self-assembled nano network-on-chip

《Computers & Electrical Engineering》2014,40(8):292-306

In this paper, we present DiSR, a distributed approach to topology discovery and defect mapping in a self-assembled nano network-on-chip. The main aim is to achieve the already-proven properties of segment-based deadlock freedom requiring neither a topology graph as input, nor a centralized algorithm to configure network paths. After introducing the conceptual elements and the execution model of DiSR, we show how the open-source Nanoxim platform has been used to evaluate the proposed approach in the process of discovering irregular network topology while establishing network segments. Comparison against a tree-based approach shows how DiSR still preserves some important properties (coverage, defect tolerance, scalability) while avoiding resource hungry solutions such as virtual channels and hardware redundancy. Finally, we propose a gate-level hardware implementation of the required control logic and storage for DiSR, demonstrating a relatively acceptable impact ranging from 10 to about 20% of the budget of transistors available for each node.

19.

Efficient parallel architecture for multi-level forward discrete wavelet transform processors

Syed Mahfuzul Aziz Duc Minh Pham 《Computers & Electrical Engineering》2012,38(5):1325-1335

A resource efficient and high-performance architecture for a two-dimensional multi-level discrete wavelet transform processor is presented in this paper. The JPEG2000 standard integer lossless 5-3 filter has been implemented. It achieves optimal hardware utilisation with minimal combinational logic block slices and high frequency of operation. To reduce the hardware complexity and to achieve high performance the proposed architecture implements lifting scheme with a single multiplier-free processing element to perform both predict and update operations. Symmetric extension is used at image boundaries without requiring any extra clock cycle. The generic architecture is very flexible and can perform up to five levels of forward transform on any arbitrary image size. Synthesis of the 5-level architecture on Xilinx Virtex 5 FPGA shows that the processor can achieve a maximum frequency of operation of 221.44 MHz. The reduced hardware complexity and high frequency of operation render the design suitable for incorporation in image processing applications requiring fast operations. The 5-level design has been successfully implemented on a Xilinx Spartan 3E FPGA, utilising only 1104 slices for a 512-by-512 pixel test image, the lowest hardware requirements for a 5-level discrete wavelet transform processor reported to date.
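The predict and update steps of the integer 5-3 lifting scheme mentioned in this abstract can be sketched in software as follows. This is a one-dimensional, single-level illustration using the commonly stated lifting equations with symmetric extension at the boundaries; it says nothing about the paper's hardware architecture, and the function names are hypothetical. Only additions and shifts are used, which is why a multiplier-free element suffices.

```python
def forward_53(x):
    """One level of the reversible integer 5/3 lifting transform (even length)."""
    n = len(x)
    assert n >= 2 and n % 2 == 0
    half = n // 2
    # Predict: detail = odd sample minus the floor-average of its even neighbours.
    # Symmetric extension on the right: x[n] mirrors to x[n - 2].
    d = []
    for i in range(half):
        left = x[2 * i]
        right = x[2 * i + 2] if 2 * i + 2 < n else x[n - 2]
        d.append(x[2 * i + 1] - ((left + right) >> 1))
    # Update: smooth = even sample plus a rounded quarter of neighbouring details.
    # Symmetric extension on the left: d[-1] mirrors to d[0].
    s = []
    for i in range(half):
        prev = d[i - 1] if i > 0 else d[0]
        s.append(x[2 * i] + ((prev + d[i] + 2) >> 2))
    return s, d


def inverse_53(s, d):
    """Undo forward_53 by reversing the update step, then the predict step."""
    half = len(s)
    n = 2 * half
    x = [0] * n
    for i in range(half):
        prev = d[i - 1] if i > 0 else d[0]
        x[2 * i] = s[i] - ((prev + d[i] + 2) >> 2)
    for i in range(half):
        left = x[2 * i]
        right = x[2 * i + 2] if 2 * i + 2 < n else x[n - 2]
        x[2 * i + 1] = d[i] + ((left + right) >> 1)
    return x


signal = [3, 7, 1, 8, 2, 9, 4, 6]
approx, detail = forward_53(signal)
restored = inverse_53(approx, detail)
print(restored == signal)
```

A multi-level transform would apply `forward_53` again to the `approx` output, and a two-dimensional transform would apply it along rows and then columns.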