首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 125 毫秒
1.
“存储墙”问题已经成为处理器性能提升的主要障碍,而处理器内核猜测执行预测路径上访存指令时预载入的存储器数据所导致Cache污染会严重影响处理器性能.本文提出一种针对猜测执行过程中预载入数据的Cache污染控制方法CSDA.首先,利用置信度评估技术从所有预测路径中分离出错误概率较大的路径.然后,根据低置信度污染型访存指令识别历史表将低置信度预测路径上的访存指令划分为预取型和污染型,为污染型的访存指令建立低优先级Load/Store队列,并采用污染数据Cache存储污染数据.仿真结果表明,在双核模式下,CSDA策略相对于baseline结构来说,L1 D-Cache缺失率降低幅度从9%-23%,平均降低了17%;L2 Cache缺失率的下降范围从1.02%-14.39%,平均为5.67%;IPC的提升幅度从0.19% -5.59%,平均为2.21%.  相似文献   

2.
制造工艺的快速进步给集成电路设计提供了广阔的空间,而发展较慢的设计能力导致难以对片上资源高效利用。目前,高性能处理器片上Cache普遍占到芯片总面积的一半以上,而如何高效、智能地利用片上Cache空间,构建高性能存储系统是处理器微体系结构研究的重要内容。分析了Cache数据污染和猜测执行对处理器性能的影响,并在此基础上提出一种基于数据Tag有效位分裂的无污染Cache访问控制技术-Pease,将原先D-Cache Tag中的一位数据有效位扩展为读数据有效位(RVB)和写数据有效位(WVB)两位,根据RVB和WVB值的不同组合对数据读写访问进行控制。不但充分保留了猜测执行的数据预取性,使污染数据透明化,写入数据时无需对污染数据进行替换操作,消除了污染数据对Cache效率的影响。Pease技术相对于baseline结构来说,IPC的提升幅度为1.05%~8.40%,平均提升4.04%;L1 D-Cache缺失率降低幅度为19.05%~48.16%,平均降低29.66%。  相似文献   

3.
非一致Cache体系结构(Non-Uniform Cache Architecture,NUCA)几乎已经成为未来片上大容量Cache的设计趋势.非一致Cache中,数据提升技术通过将经常访问的数据放置在距离处理器较近的Cache bank中减少处理器对该数据访问的等待时间,对NUCA的性能有着重要影响.然而,目前已有的数据提升技术使用固定的提升策略,投有考虑所要提升到目标bank的实际状态,容易将目标bank中更有用的数据"挤"得远离处理器,从而产生Cache污染问题,严重制约了提升技术的性能发挥.针对这一问题,文中提出智能多跳提升技术.智能多跳提升技术能够感知候选目标bank的状态,为被提升的数据动态地选择合适的目标bank,从而提高了提升效率,减少了Cache污染.同时,智能多跳提升技术的设计巧妙地利用了处理器访问的反向路径,只是简单地扩充了处理器访问报文的格式,并没有增加对Cache bank的额外访问.最后使用全系统模拟器对来自NAS Parallel Benchmark和Livermore Benchmark的15个基准测试程序进行了详细测试,智能多跳提升技术单位提升操作节省的时钟周期数是已有提升技术的1.50倍,最多达到2.61倍;系统的IPC性能平均提高了6.24%,最高达到19.03%.  相似文献   

4.
分支预测允许处理器并行执行分支之后的指令,由于其高准确率具有性能和功耗方面的双重好处,是一项重要的处理器优化技术.根据分而治之的策略,返回地址栈(return-address stack,RAS)将过程返回类分支单独分出并予以预测.其中,RAS利用过程调用和返回的后入先出规则,可通过猜测执行中调用栈的模拟准确预测返回地址.但是,由于实际处理器猜测执行带来的错误路径污染,该结构需要通过恢复机制来保障所存储数据的准确性.尤其在对面积资源敏感的嵌入式领域,设计者需要在准确率和恢复机制的开销间进行细致的权衡.针对RAS存储中的冗余,通过溢出检测结合传统栈、持久化栈和后备预测3种预测方式,提出一种基于持久化栈的返回地址预测器——混合返回地址栈(hybrid return-address stack,HRAS),避免错误路径污染和对返回地址的冗余存储,从而有效降低返回误预测率.与此同时,设计解耦传统栈和持久化栈,进一步降低其面积需求.根据SPEC CPU 2000基准测试以及设计编译器的评估结果,HRAS可利用仅1.1×104μm2的设计面积将过程返回的...  相似文献   

5.
多核数字信号处理器(DSP)的性能常常受限于共享存储的长延迟Cache一致性访问.数据前向(forwarding)技术是隐藏长延迟访问的一种有效手段.根据多核DSP应用的两类重要特征,提出了一种面向共享存储多核DSP结构的数据流分簇前向技术DSCF(data stream clustered forwarding).DSCF方法的主要特点是:兼容基本的共享存储Cache一致性协议;不污染目标Cache;数据的传输速度能够与消费速度相匹配;系统结构的可扩展性好.典型测试程序的模拟评测表明,采用DSCF方法能够将Cache一致性失效率平均降低44%,将系统总体性能提升30%~70%.  相似文献   

6.
处理器性能的提升依赖于对存储系统性能的挖掘.随着片上集成内核数量的不断增大和特征尺寸的持续缩小,延迟、存储可扩展的Cache一致性协议已经成为提升访存效率的关键性因素.文中提出一种基于节点预测的直接Cache一致性协议-NPP协议,研究一致性交互延迟隐藏和目录存储开销减少技术.针对读、写缺失中存在的间接性问题和现有解决方案破坏已有数据局部性、无法获得最近数据副本等问题,分别提出节点挂起技术和直接写缺失处理技术,有效隐藏了目录访问延迟.为了实现准确的节点预测,作者还提出基于"签名"回收的历史信息更新算法,避免了冗余更新和不完整更新.使用SPLASH-2测试程序集,在基于2D MESH NoC互联的64核CMP下,相对于全映射目录协议,NPP协议的平均执行时间降幅为21.78%~31.11%;平均读缺失延迟降低14.22%~18.9%;平均写缺失延迟降低17.89%~21.13%.而获得上述性能提升的代价是网络流量平均增加6.62%~7.28%.  相似文献   

7.
针对嵌入式处理器中日益明显的指令Cache漏功耗,提出了一种基于当前指令状态标志位的分支预测和返回目标寄存器映射的昏睡子块唤醒方法;该方法根据处理器执行过程中指令状态位提前判断分支指令的目标子块,同时设计了一种返回地址目标寄存器映射的结构,提前判断函数返回指令的目标子块。在消除唤醒延迟带来的性能损失基础上,提高了处理器的性能;通过实验对比,该方法可以减小36%的指令Cache静态功耗,同时处理器性能平均有13%的提高。  相似文献   

8.
王冶  张盛兵  王党辉 《计算机工程》2012,38(1):268-269,272
为降低微处理器中片上Cache的能耗,设计一种基于预缓冲机制的指令Cache。通过预缓冲控制部件的预测,使处理器需要的指令尽可能在缓冲区命中,从而避免访问指令Cache所造成的功耗。对7个测试程序的仿真结果表明,预缓冲机制能节省23.23%的处理器功耗,程序执行性能平均提升7.53%。  相似文献   

9.
DOOC:一种能够有效消除抖动的软硬件合作管理Cache   总被引:3,自引:0,他引:3  
作为弥补处理器和主存之间速度巨大差异的桥梁,Cache已经成为现代处理器中不可或缺的一部分.经研究发现.传统Cache单独使用硬件进行管理,使用固定的Cache策略和一致性协议难以适应程序中数据访存模式的多样性,容易造成Cache抖动,以致影响性能,提出了一种新的软硬件合作管理Cache--面向数据对象Cache(data-obiect oriented cache,DOOC).DOOC动态地为程序中的数据对象分配Cache段,并且动态变化段容量、段内相联度、块大小和一致性协议,从而适应数据访存模式的多样性,还介绍了DOOC软件管理的编译方法以及面向数据对象的预取机制.分别使用CACTI和基于LEON3处理器的实验平台对DOOC的硬件开销进行评估.验证了DOOC的硬件可实现性,还使用软件模拟的方式分别测试了DOOC在单核和多核处理器平台上的性能.在单核处理器上对15个基准测试程序的评测结果表明.与传统Cache相比,DOOC失效率平均降低44.98%(最大降低93.02%),平均加速比为1.20(最大为2.36).同时.通过在4核处理器平台上运行NPB的OpenMP版本测试程序,失效率平均降低49.69%(最大降低73.99%).  相似文献   

10.
方娟  郭媚  杜文娟 《计算机科学》2013,40(8):34-37,42
针对片上多核处理器下的二级共享Cache的能耗问题提出了基于Cache划分的路预测Cache结构WPP-L2,该结构首先对共享Cache进行公平性划分,然后采用路预测的方法降低了预测命中和失效时各自的能耗开销。实验表明,在基本保持多核处理器性能的同时,8核处理器系统下WPP-L2Cache比基于路预测的L2Cache的能耗延迟乘积EDP(Energy Delay Product)平均下降24.7%,比传统的L2Cache的EDP平均下降66.1%,极大地降低了L2Cache功耗。  相似文献   

11.
Abstract This paper describes an approach to the design of interactive multimedia materials being developed in a European Community project. The developmental process is seen as a dialogue between technologists and teachers. This dialogue is often problematic because of the differences in training, experience and culture between them. Conditions needed for fruitful dialogue are described and the generic model for learning design used in the project is explained.  相似文献   

12.
European Community policy and the market   总被引:1,自引:0,他引:1  
Abstract This paper starts with some reflections on the policy considerations and priorities which are shaping European Commission (EC) research programmes. Then it attempts to position the current projects which seek to capitalise on information and communications technologies for learning in relation to these priorities and the apparent realities of the marketplace. It concludes that while there are grounds to be optimistic about the contribution EC programmes can make to the efficiency and standard of education and training, they are still too technology driven.  相似文献   

13.
融合集成方法已经广泛应用在模式识别领域,然而一些基分类器实时性能稳定性较差,导致多分类器融合性能差,针对上述问题本文提出了一种新的基于多分类器的子融合集成分类器系统。该方法考虑在度量层融合层次之上通过对各类基多分类器进行动态选择,票数最多的类别作为融合系统中对特征向量识别的类别,构成一种新的自适应子融合集成分类器方法。实验表明,该方法比传统的分类器以及分类融合方法识别准确率明显更高,具有更好的鲁棒性。  相似文献   

14.
Development of software intensive systems (systems) in practice involves a series of self-contained phases for the lifecycle of a system. Semantic and temporal gaps, which occur among phases and among developer disciplines within and across phases, hinder the ongoing development of a system because of the interdependencies among phases and among disciplines. Such gaps are magnified among systems that are developed at different times by different development teams, which may limit reuse of artifacts of systems development and interoperability among the systems. This article discusses such gaps and a systems development process for avoiding them.  相似文献   

15.
This paper presents control charts models and the necessary simulation software for the location of economic values of the control parameters. The simulation program is written in FORTRAN, requires only 10K of main storage, and can run on most mini and micro computers. Two models are presented - one describes the process when it is operating at full capacity and the other when the process is operating under capacity. The models allow the product quality to deteriorate to a further level before an existing out-of-control state is detected, and they can also be used in situations where no prior knowledge exists of the out-of-control causes and the resulting proportion defectives.  相似文献   

16.
Going through a few examples of robot artists who are recognized worldwide, we try to analyze the deepest meaning of what is called “robot art” and the related art field definition. We also try to highlight its well-marked borders, such as kinetic sculptures, kinetic art, cyber art, and cyberpunk. A brief excursion into the importance of the context, the message, and its semiotics is also provided, case by case, together with a few hints on the history of this discipline in the light of an artistic perspective. Therefore, the aim of this article is to try to summarize the main characteristics that might classify robot art as a unique and innovative discipline, and to track down some of the principles by which a robotic artifact can or cannot be considered an art piece in terms of social, cultural, and strictly artistic interest. This work was presented in part at the 13th International Symposium on Artificial Life and Robotics, Oita, Japan, January 31–February 2, 2008  相似文献   

17.
Although there are many arguments that logic is an appropriate tool for artificial intelligence, there has been a perceived problem with the monotonicity of classical logic. This paper elaborates on the idea that reasoning should be viewed as theory formation where logic tells us the consequences of our assumptions. The two activities of predicting what is expected to be true and explaining observations are considered in a simple theory formation framework. Properties of each activity are discussed, along with a number of proposals as to what should be predicted or accepted as reasonable explanations. An architecture is proposed to combine explanation and prediction into one coherent framework. Algorithms used to implement the system as well as examples from a running implementation are given.  相似文献   

18.
This paper provides the author's personal views and perspectives on software process improvement. Starting with his first work on technology assessment in IBM over 20 years ago, Watts Humphrey describes the process improvement work he has been directly involved in. This includes the development of the early process assessment methods, the original design of the CMM, and the introduction of the Personal Software Process (PSP)SM and Team Software Process (TSP){SM}. In addition to describing the original motivation for this work, the author also reviews many of the problems he and his associates encountered and why they solved them the way they did. He also comments on the outstanding issues and likely directions for future work. Finally, this work has built on the experiences and contributions of many people. Mr. Humphrey only describes work that he was personally involved in and he names many of the key contributors. However, so many people have been involved in this work that a full list of the important participants would be impractical.  相似文献   

19.
蒙古语言是中国蒙古族使用的通用语言,由于蒙古文区别于其他文字的书写方式和其自身变形机制等特点,在很多通用的文字处理引擎中都不被支持。在嵌入式产品开发与应用领域中Linux加QTE已经成为流行方式。该文给出了一种在QTE环境上实现基于标准Unicode的蒙古文点阵显示和变形算法, 并自定义了支持蒙古文的QTE组件,扩展了QTE功能,为在Linux加QTE方式的嵌入式体系结构中处理蒙古文提供了一种解决方法。  相似文献   

20.
基于复小波噪声方差显著修正的SAR图像去噪   总被引:4,自引:1,他引:3  
提出了一种基于复小波域统计建模与噪声方差估计显著性修正相结合的合成孔径雷达(Synthetic Aperture Radar,SAR)图像斑点噪声滤波方法。该方法首先通过对数变换将乘性噪声模型转化为加性噪声模型,然后对变换后的图像进行双树复小波变换(Dualtree Complex Wavelet Transform,DCWT),并对复数小波系数的统计分布进行建模。在此先验分布的基础上,通过运用贝叶斯估计方法从含噪系数中恢复原始系数,达到滤除噪声的目的。实验结果表明该方法在去除噪声的同时保留了图像的细节信息,取得了很好的降噪效果。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号