Similar Documents
Found 18 similar documents (search time: 453 ms)
1.
To improve the performance and energy efficiency of in-order processors, this paper proposes a pre-execution mechanism based on value prediction and instruction reuse (PVPIR). Unlike conventional pre-execution approaches, PVPIR predicts the values of missing load instructions during pre-execution and uses the predicted values to execute the subsequent instructions that depend on those loads, so that long-latency cache misses among them trigger memory accesses early and improve performance. After exiting pre-execution, PVPIR reuses valid pre-execution results to avoid re-executing instructions that have already completed correctly, reducing the energy overhead of pre-execution. PVPIR implements a value predictor that combines stride prediction with AVD (Address-Value Delta) prediction and records only the loads that have suffered long-latency cache misses, achieving good value-prediction quality at a small hardware cost. Experimental results show that, compared with Runahead-AVD and iEA, PVPIR improves performance by 7.5% and 9.2%, reduces energy consumption by 11.3% and 4.9%, and thereby improves energy efficiency by 17.5% and 12.9%, respectively.
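A minimal Python sketch (not the paper's hardware design) of the two value-prediction schemes PVPIR combines: stride prediction (next value = last value + stride) and AVD prediction (value = load address + a learned address-value delta). The table layout and the fall-back policy between the two predictions are illustrative assumptions.

```python
class StrideAVDPredictor:
    """Toy combined stride/AVD value predictor, indexed by load PC."""

    def __init__(self):
        self.table = {}  # pc -> (last_value, stride, avd)

    def train(self, pc, load_addr, value):
        # Update stride (value delta) and AVD (value minus load address).
        last, _, _ = self.table.get(pc, (value, 0, value - load_addr))
        self.table[pc] = (value, value - last, value - load_addr)

    def predict(self, pc, load_addr):
        if pc not in self.table:
            return None
        last, stride, avd = self.table[pc]
        # Assumption: prefer the stride prediction when a nonzero stride
        # exists; otherwise fall back to AVD, which suits pointer-like
        # loads whose value tracks the address (e.g. list traversal).
        return last + stride if stride != 0 else load_addr + avd
```

Training only on loads that missed in the cache, as PVPIR does, would keep this table small.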

2.
党向磊  王箫音  佟冬  陆俊林  程旭  王克义 《电子学报》2012,40(11):2145-2151
To improve the memory-access performance of in-order processors, this paper proposes a pre-execution directed data prefetching method (PEDP). PEDP uses a stride prefetcher for regular access patterns and, after an L2 cache miss, pre-executes subsequent instructions to prefetch irregular access patterns precisely, combining the strengths of both to raise prefetch coverage. PEDP also uses the actual memory-access information captured early during pre-execution to direct the stride prefetcher: under this direction, the stride prefetcher can issue prefetch requests earlier for the stride-pattern addresses that pre-execution would generate, improving prefetch timeliness. To further optimize this direction process, PEDP employs an update filter that removes harmful updates to the stride prefetcher, improving prefetch accuracy. Experimental results show that PEDP improves the performance of the baseline processor by 33.0% on average, and by 16.2% and 7.3% over stride prefetching and pre-execution used alone, respectively.
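The stride-prefetcher component that PEDP builds on can be sketched as follows. The confidence rule (issue prefetches only after the same stride has been confirmed) and the prefetch degree are illustrative assumptions, not PEDP's actual parameters.

```python
class StridePrefetcher:
    """Per-PC stride detector that prefetches only once the stride repeats."""

    def __init__(self, degree=2):
        self.degree = degree   # how many blocks ahead to prefetch
        self.table = {}        # pc -> (last_addr, stride, confident)

    def access(self, pc, addr):
        """Record a demand access; return the prefetch addresses to issue."""
        prefetches = []
        if pc in self.table:
            last, stride, confident = self.table[pc]
            new_stride = addr - last
            if confident and new_stride == stride:
                # Stable stride: prefetch the next `degree` addresses.
                prefetches = [addr + stride * i
                              for i in range(1, self.degree + 1)]
            self.table[pc] = (addr, new_stride, new_stride == stride)
        else:
            self.table[pc] = (addr, 0, False)
        return prefetches
```

PEDP's contribution is feeding this trainer with addresses captured during pre-execution, so the prefetches above can be launched earlier than demand accesses would allow.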

3.
This paper proposes a prefetching scheme for a VLIW processor together with an optimization strategy for loop instructions. It focuses on the methods for prefetching ordinary instructions and for handling loop instructions, and on how the prefetcher switches between the ordinary and loop prefetch modes. With this design, the power consumed by instruction fetch is reduced effectively: experiments show reductions ranging from 40% to 90% across different applications, improving the performance of this multi-cluster VLIW DSP processor.
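One way to picture the mode switch described above: enter loop mode when a backward branch is observed (so the loop body can be served from a small buffer instead of being refetched) and return to sequential mode once execution falls past the loop. The heuristic below is a toy assumption for illustration, not the paper's actual switching logic.

```python
def prefetch_modes(pcs):
    """Label each executed PC 'seq' or 'loop' for prefetch-mode selection."""
    modes, loop_end, prev = [], None, None
    for pc in pcs:
        if prev is not None and pc < prev:
            loop_end = prev          # backward branch taken: loop detected
        if loop_end is not None and pc > loop_end:
            loop_end = None          # fell through past the loop body
        modes.append("loop" if loop_end is not None else "seq")
        prev = pc
    return modes
```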

4.
Recent cache-prefetching research has focused on pattern-recognition-based prediction techniques, such as Lookahead, to infer the addresses of memory requests. Such algorithms struggle to learn dependent cache misses in memory behavior and cannot precisely control when prefetch requests are issued and written back. To address these problems, this paper proposes a cache-prefetching algorithm based on branch prediction and hybrid pattern learning (Instruction Flow Based Hybrid Prediction, IFBHP). It uses branch prediction to identify the memory instructions in the program's future instruction stream, computes the address of each such instruction by learning multiple address-correlation patterns, and writes the addresses into a memory-address queue. A threshold estimates when the future instruction flow will enter the processor's main pipeline, precisely controlling when the corresponding prefetch requests are issued and written back. Experiments show that, compared with the STeMS (Spatio-Temporal Memory Streaming), ISB++ (Irregular Stream Buffer++), SANGAM, and IPCP (Instruction Pointer Classifier based spatial Prefetching) algorithms, it reduces L1 data-cache read misses by 31.58%, 28.85%, 17.85%, and 11… on average, respectively.

5.
杜贵然  窦勇  徐明  周兴铭 《电子学报》2002,30(2):156-159
Branch instructions are abundant in programs, and analysis shows that the offsets of most executed branches can be obtained directly from the instruction encoding. To support trace pre-construction, we propose a branch target extraction mechanism, BTP: it scans prefetched instruction blocks and extracts branch instructions and their targets. Guided by BTP's scan results, the trace pre-construction mechanism builds the program's execution trace in advance. Simulation with the SPECint95 benchmarks shows that BTP identifies target addresses effectively, the L1 instruction-cache miss rate drops markedly, and program performance improves accordingly.
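The BTP scan can be illustrated on a toy PC-relative ISA: walk a prefetched block, pick out direct branches, and compute each target from the encoded offset without executing anything. The opcodes, 4-byte word size, and target formula (PC + 4 + offset) are invented here purely for illustration.

```python
BRANCH_OPS = {"beq", "bne", "jmp"}  # hypothetical direct-branch opcodes

def extract_branch_targets(block_base, instructions):
    """Scan a prefetched block of (opcode, offset) words; return
    (branch_pc, target_pc) pairs for every direct branch found."""
    targets = []
    for i, (op, offset) in enumerate(instructions):
        pc = block_base + 4 * i
        if op in BRANCH_OPS:
            # PC-relative displacement: the target is known statically,
            # so the trace constructor can follow it ahead of execution.
            targets.append((pc, pc + 4 + offset))
    return targets
```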

6.
Hybrid instruction prefetching based on control flow (cited by 2)
沈立  王志英  鲁建壮  戴葵 《电子学报》2003,31(8):1141-1144
The instruction-fetch capability of a microprocessor strongly affects its performance. Instruction prefetching effectively lowers the instruction-cache miss rate, raising fetch bandwidth and hence overall performance. This paper proposes a hybrid instruction-prefetching mechanism based on program control flow, which combines sequential and non-sequential prefetching to bring instructions into the instruction cache ahead of time. Simulation results show that the method effectively improves the instruction-cache hit rate, is simple to implement, and has a low rate of useless prefetches.

7.
Design and optimization of the instruction FIFO in an SDRAM controller for embedded processors (cited by 2)
This paper presents the design of an SDRAM prefetch FIFO that fully exploits the pipelined nature of SDRAM to improve the performance of a cacheless embedded processor. The depth of the prefetch logic is evaluated through both static instruction analysis and software simulation to obtain an optimal design. Tests based on the Dhrystone benchmark show that the proposed instruction FIFO improves processor performance by about 50%.

8.
Design of a line-buffer instruction cache with a prefetching policy (cited by 1)
A line buffer is an effective low-power scheme, but it greatly degrades processor performance. This work designs and implements a line-buffer cache with a prefetching policy: a dedicated buffer line prefetches instructions stored in the L1 cache, reducing the pipeline stalls caused by capacity misses in the line-buffer structure and restoring performance. Verified on the VHDL model of Leon2, the line-buffer structure with prefetching improves performance by 12.4% on average over the original structure.

9.
To hide the configuration-loading latency of dynamically reconfigurable processors, a scalable neural-network-based prefetching mechanism for reconfiguration instructions is proposed. A new perceptron model is built by augmenting the perceptron with the history of recent instructions alongside its weights; the weights and history information are trained together to learn the invocation pattern of reconfiguration instructions. While the processor runs, upcoming reconfiguration instructions are predicted and their configuration data prefetched in advance, hiding the reconfiguration cost. A scalable implementation framework is further proposed: the neural network's learned state is attached to reconfiguration instructions as associated information and stored in memory in a distributed fashion, then loaded back when a reconfiguration instruction is prefetched. Experimental results show a prediction accuracy of 91% for reconfiguration instructions and an average overall performance improvement of 40%.
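A minimal perceptron predictor in the spirit described, where each (position, instruction-ID) pair in the history contributes a signed weight trained online, might look like the sketch below. The feature encoding, history length, and threshold are assumptions, not the paper's model.

```python
class PerceptronPredictor:
    """Toy perceptron over instruction history: predicts whether a
    reconfiguration instruction will be invoked next."""

    def __init__(self, threshold=2):
        self.threshold = threshold
        self.weights = {}  # (position, instr_id) -> signed weight

    def _score(self, history):
        return sum(self.weights.get((i, h), 0) for i, h in enumerate(history))

    def predict(self, history):
        return self._score(history) >= self.threshold

    def train(self, history, taken):
        # Perceptron update: nudge each contributing weight toward the
        # observed outcome (invoked / not invoked).
        delta = 1 if taken else -1
        for i, h in enumerate(history):
            key = (i, h)
            self.weights[key] = self.weights.get(key, 0) + delta
```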

10.
In multicore processors, hardware prefetching is one of the main techniques for mitigating the memory wall and optimizing the cache hierarchy. Most existing prefetchers, however, optimize only memory-intensive programs and ignore the interference that prefetching inflicts on non-memory-intensive programs. To address this, this paper proposes a classification-based, prefetch-aware cache-partitioning mechanism that combines adaptive prefetch control with cache partitioning to dynamically adjust prefetch aggressiveness and allocate the shared cache sensibly. The mechanism was evaluated in simulation with ChampSim; results show that it effectively raises the throughput of non-memory-intensive programs, reduces inter-core interference, and improves system performance and fairness.
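The adaptive prefetch-control half of such a mechanism can be sketched as accuracy-driven throttling: measure the fraction of issued prefetches that turned out useful, and lower or raise the prefetch degree accordingly. The thresholds and degree bounds below are illustrative, not the paper's values.

```python
class AdaptivePrefetchControl:
    """Throttle prefetch aggressiveness by measured accuracy."""

    def __init__(self, lo=0.4, hi=0.7):
        self.lo, self.hi = lo, hi   # accuracy thresholds (assumed)
        self.degree = 2             # current prefetch degree

    def update(self, useful, issued):
        """Called each interval with prefetch stats; returns new degree."""
        acc = useful / issued if issued else 0.0
        if acc < self.lo and self.degree > 0:
            self.degree -= 1        # inaccurate: back off
        elif acc > self.hi and self.degree < 4:
            self.degree += 1        # accurate: get more aggressive
        return self.degree
```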

11.
Big-data analytics applications often employ traversal algorithms over large sparse graphs, whose main characteristic is irregular, data-intensive memory access. Taking the frequently used betweenness centrality algorithm, which exhibits large sparse-graph traversal, as an example, this paper proposes a helper-thread-based multi-parameter prefetch control model and a parameter-tuning method to improve the performance of such irregular data-intensive programs. Applying the method on the commodity multicore platforms Q6600 and i7, the betweenness centrality algorithm achieves average speedups of 1.20 and 1.11, respectively, across input scales. The results show that helper-thread prefetching can effectively improve the performance of this class of irregular applications.

12.
By analyzing existing streaming-media cache-management algorithms and the characteristics of user access behavior, a new cache-prefetching policy based on a selective Markov model is proposed. The policy models user seek (drag) behavior through sequence merging, applies the state-pruning optimization FP_Vlike to obtain the selective Markov model FPMM_Vlike, and combines it with the LRU-2 replacement algorithm to build a cache-prefetching mechanism for streaming-media proxy servers, FPVlike_LRU-2. Simulation results show that, in access-latency reduction, FPVlike_LRU-2 outperforms FP_LRU-2, SP_LRU-2, and LRU-2 by 10%, 12%, and 17%, respectively, and in the best case the reduction exceeds 60%.
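A plain first-order Markov prefetcher conveys the core idea behind the model: learn transition counts between accessed media segments and prefetch the most probable successor of the segment just served. The paper's model is selective (states are pruned by FP_Vlike) and handles seek behavior via sequence merging; this sketch omits both.

```python
from collections import defaultdict

class MarkovPrefetcher:
    """First-order Markov model over accessed segments (unpruned sketch)."""

    def __init__(self):
        self.transitions = defaultdict(lambda: defaultdict(int))
        self.last = None

    def access(self, segment):
        # Count the observed transition last -> segment.
        if self.last is not None:
            self.transitions[self.last][segment] += 1
        self.last = segment

    def prefetch_candidate(self, segment):
        """Most probable successor of `segment`, or None if unseen."""
        succ = self.transitions.get(segment)
        if not succ:
            return None
        return max(succ, key=succ.get)
```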

13.
We present a high-performance cache structure with a hardware prefetching mechanism that enhances the exploitation of spatial and temporal locality. Temporal locality is exploited by selectively moving small blocks into the direct-mapped cache after monitoring their activity in the spatial buffer. Spatial locality is enhanced by intelligently prefetching a neighboring block when a spatial-buffer hit occurs. We show that the prefetch operation is highly accurate: over 90% of all prefetches generated are for blocks that are subsequently accessed. Our results show that the system enables the cache size to be reduced by a factor of four to eight relative to a conventional direct-mapped cache while maintaining similar performance.
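The neighboring-block prefetch on a spatial-buffer hit can be sketched as below. The single-set structure, fixed block size, and purely sequential (next-block) neighbor choice are simplifying assumptions, not the paper's design.

```python
BLOCK = 64  # assumed block size in bytes

class SpatialBuffer:
    """Toy spatial buffer: a hit triggers a prefetch of the next block."""

    def __init__(self):
        self.blocks = set()  # base addresses of resident blocks

    def access(self, addr):
        """Return the address prefetched on a hit, else None (miss fill)."""
        blk = addr // BLOCK * BLOCK
        if blk in self.blocks:
            neighbor = blk + BLOCK
            if neighbor not in self.blocks:
                self.blocks.add(neighbor)   # prefetch the neighboring block
                return neighbor
            return None
        self.blocks.add(blk)                # miss: fetch on demand only
        return None
```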

14.
In this paper, we present an energy-aware informed prefetching technique called Eco-Storage that uses application-disclosed access patterns to group the informed-prefetching process in a hybrid storage system (e.g., hard disk drives and solid-state disks). Since SSDs are more energy efficient than HDDs, aggressively prefetching data from the HDD level lets the HDD spend as much time as possible in standby to save power. In the Eco-Storage system, the application can still serve its on-demand I/O read requests from the hybrid storage system while data blocks are prefetched in groups from HDD to SSD; we show that these two steps can proceed in parallel to decrease the system's power consumption. Our Eco-Storage technique differs from existing energy-aware prefetching schemes in two ways. First, Eco-Storage is implemented in a hybrid storage system whose SSD level is more energy efficient. Second, it groups the informed-prefetching process and quickly prefetches data from the HDD to the SSD to make HDD standby periods more frequent, so that the application finds most of its on-demand I/O read requests at the SSD level. Finally, we develop a simulator to evaluate Eco-Storage's performance. Our results show that Eco-Storage reduces power consumption by at least 75% compared with the worst case of the non-Eco-Storage configuration on a real-world I/O trace.

15.
Even though user-generated video sharing sites are tremendously popular, the experience of users watching videos is often unsatisfactory; delays due to buffering before and during playback are quite common. In this paper, we present a prefetching approach for user-generated video sharing sites like YouTube. We motivate the need for prefetching with a PlanetLab-based measurement demonstrating that video playback on YouTube is often unsatisfactory, and introduce a series of prefetching schemes: (1) the conventional caching scheme, which caches all the videos that users have watched; (2) the search-result-based prefetching scheme, which prefetches videos appearing in the results of users' search queries; and (3) the recommendation-aware prefetching scheme, which prefetches videos appearing in the recommendation lists of the videos users watch. Evaluating and comparing the proposed schemes on user browsing patterns collected from network measurement, we find that recommendation-aware prefetching achieves an overall hit ratio of up to 81%, while the caching scheme reaches only 40%, demonstrating strong potential for improving playback quality at the client. In addition, we explore the trade-offs and feasibility of implementing recommendation-aware prefetching.
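The recommendation-aware scheme can be simulated in a few lines: after each watched video, prefetch the top-k entries of its recommendation list, and count a hit when the next watched video was among them. The recommendation lists here are toy input, not YouTube data, and k is an assumed parameter.

```python
def simulate(watch_sequence, recommendations, k=3):
    """Hit ratio of top-k recommendation-aware prefetching over a
    sequence of watched videos."""
    prefetched, hits = set(), 0
    for i, video in enumerate(watch_sequence):
        if i > 0 and video in prefetched:
            hits += 1  # the next video had already been prefetched
        # Prefetch the top-k recommendations of the video just watched.
        prefetched = set(recommendations.get(video, [])[:k])
    return hits / max(len(watch_sequence) - 1, 1)
```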

16.
Web prefetching and caching both help alleviate access latency, but each has its pros and cons. This work combines prefetching with semantic caching: the access frequency of user queries is monitored in real time, and a polynomial-regression algorithm predicts each user's access probability for the next period. The prediction model built on polynomial-regression prefetching supports dynamic online prediction, avoiding the prefetch uncertainty caused by interest drift while reducing the amount of history that must be stored, providing a sound solution to the Web access-latency problem.
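The regression step can be illustrated with the degree-1 case (the paper uses polynomial regression more generally): fit the recent per-period access counts of a query by least squares and extrapolate one period ahead; queries with the highest predicted counts become prefetch candidates. Window size and degree are assumptions here.

```python
def predict_next(counts):
    """Least-squares linear fit of per-period access counts,
    extrapolated one period ahead."""
    n = len(counts)
    mean_x = (n - 1) / 2
    mean_y = sum(counts) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(range(n), counts))
    slope = sxy / sxx
    # Predicted count for period n (the next, unobserved period).
    return mean_y + slope * (n - mean_x)
```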

17.
The real-time streaming of bursty continuous media, such as variable-bit-rate encoded video, to buffered clients over networks can be made more efficient by collaboratively prefetching parts of the ongoing streams into the client buffers. Existing collaborative prefetching schemes were developed for discrete-time models, where scheduling decisions for all ongoing streams are typically made one frame period at a time. This leads to inefficiencies, as the network bandwidth goes unused at the end of a frame period once no video frame "fits" into the remaining transmission capacity in the schedule. To overcome this inefficiency, we conduct in this paper an extensive study of collaborative prefetching in a continuous-time model, in which video frames are transmitted continuously across frame periods while still being required to meet their discrete playout deadlines. We specify a generic framework for continuous-time collaborative prefetching and a wide array of priority functions for making scheduling decisions within the framework. We conduct an algorithm-theoretic study of the resulting continuous-time prefetching algorithms and evaluate their fairness and starvation-probability performance through simulations, finding that the continuous-time prefetching algorithms perform favorably on both measures.

18.
蒋亚军  杨震伦 《电信科学》2011,27(5):104-109
The program-prefetching method of a VOD proxy server determines the overall efficiency of a campus-network VOD system. This paper proposes a program-prefetching model that uses a BP neural network to build a classifier for VOD programs and then, based on the classification results, prefetches programs into the proxy server group by group. A genetic algorithm is introduced to refine the trained classification model and overcome the local-minimum problem. Simulations show that the model achieves a high hit rate and effectively improves proxy-server utilization.


Copyright©北京勤云科技发展有限公司  京ICP备09084417号