Enforcing cache coherence and reducing or hiding memory latency are important and challenging problems in the design of large-scale distributed shared-memory (DSM) multiprocessors. We propose an integrated approach to these problems through a compiler-directed cache coherence scheme called Cache Coherence with Data Prefetching (CCDP). The CCDP scheme enforces cache coherence by prefetching the potentially stale references in a parallel program. It also prefetches the non-stale references to hide their memory latencies. To optimize the performance of the CCDP scheme, some prefetch hardware support is provided to handle these two forms of data prefetching efficiently. We also developed the compiler techniques used by the CCDP scheme for stale reference detection, prefetch target analysis, and prefetch scheduling. We evaluated the performance of the CCDP scheme via execution-driven simulations of several numerical applications from the SPEC CFP95 and Perfect benchmark suites. The simulation results show that the CCDP scheme provides significant performance improvements for the applications studied, comparable to those obtained with a full-map hardware cache coherence scheme.

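To address network latency in data sharing among smart terminals, this paper proposes a two-stage data prefetching and caching method that combines active and passive prefetching to reduce network latency and improve the user experience. The method prefetches data during network idle time to shorten user waiting time; uses a two-stage prefetching strategy to reduce bandwidth consumption; combines active and passive prefetching algorithms to improve prefetch accuracy and efficiency; and updates the client cache with a weight update function to reduce the storage consumed on the smart terminal. Experiments show that the method reduces user waiting time by 58.2%, achieves a prefetch hit rate of 92%, and incurs a bandwidth overhead of less than 5%.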

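As the number of cores in chip multiprocessors grows, many problems in current coherence protocols make shared-memory systems complex and inefficient. Some protocols, such as MESI (modified, exclusive, shared, invalid), are extremely complex, with numerous transient states and races. These protocols also cause extra invalidation traffic, as well as large directory storage overhead for recording sharing information (directory protocols) or network overhead for broadcast messages (snooping protocols). For data-race-free programs, this paper implements a simple and efficient coherence protocol, VISU (valid/invalid states based on self-updating), which is built on self-updating operations and has only two stable states (valid and invalid). The two-state VISU protocol eliminates the directory and indirect transactions. First, building on the data-race-free (DRF) model of parallel programming, it guarantees correctness by self-updating shared data at synchronization points. Second, using a technique that dynamically classifies data as private or shared, it applies write-back to private data and write-through to shared data. For private data, a simple write-back policy avoids unnecessary on-chip communication. In the L1 cache, write-through for shared data ensures that the LLC (last-level cache) always holds the latest data, which eliminates almost all coherence states. The resulting VISU protocol has low overhead, needs no directory, has no indirect transfers or numerous coherence states, and is easier to verify, while achieving performance nearly equal to or better than that of a MESI directory protocol.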
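The two-state idea can be illustrated with a minimal Python sketch. It is a software toy under stated assumptions, not the authors' hardware design: the class name `L1Cache`, its methods, and the pre-computed `shared` set (standing in for the dynamic private/shared classification) are all hypothetical. A line is either present (valid) or absent (invalid); shared writes go straight through to the LLC, private writes are written back on eviction, and `sync()` models self-updating at a synchronization point.

```python
from dataclasses import dataclass, field


@dataclass
class L1Cache:
    """Toy per-core L1: a line is either present (valid) or absent (invalid)."""
    llc: dict                                   # shared last-level cache: address -> value
    shared: set                                 # addresses classified as shared (assumed given)
    lines: dict = field(default_factory=dict)   # valid lines: address -> value
    dirty: set = field(default_factory=set)     # private lines awaiting write-back

    def read(self, addr):
        if addr not in self.lines:              # invalid: fetch from the LLC
            self.lines[addr] = self.llc.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        if addr in self.shared:
            self.llc[addr] = value              # write-through keeps the LLC current
        else:
            self.dirty.add(addr)                # write-back: defer until eviction

    def sync(self):
        """At a synchronization point, refresh every shared line from the LLC."""
        for addr in self.lines:
            if addr in self.shared:
                self.lines[addr] = self.llc.get(addr, 0)

    def evict(self, addr):
        if addr in self.dirty:                  # private data reaches the LLC only now
            self.llc[addr] = self.lines[addr]
            self.dirty.discard(addr)
        self.lines.pop(addr, None)
```

In this sketch a reader may see an old value of a shared line until it next calls `sync()`, which is acceptable only because the program is assumed to be data-race-free.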

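How to implement cache coherence efficiently in a shared-memory system is a key and difficult problem in architecture design. Existing directory-based protocols are hard to implement, complex to verify, and costly in storage. For on-chip many-core processors, this paper proposes a hardware-supported, synchronization-based cache coherence protocol. The scheme uses no directory; instead, it represents coherence information with Bloom filters and maintains cache coherence at the synchronization points of a parallel program. Compared with existing directory-based cache coherence protocols, the scheme reduces implementation and verification complexity. Evaluation with the SPLASH-2 benchmark suite shows that the synchronization-based protocol achieves performance comparable to that of directory-based protocols.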
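A rough Python sketch of how a Bloom filter might stand in for directory state follows. The abstract does not say what the filter records or how it is sized, so everything here is an assumption: the filter is taken to summarize addresses written since the last synchronization point, and `BloomFilter`, `invalidate_at_sync`, the 256-bit size, and the SHA-256 hashing are illustrative choices, not the paper's hardware.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit vector with several hash positions per key."""

    def __init__(self, num_bits=256, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, key):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for p in self._positions(key):
            self.bits |= 1 << p

    def might_contain(self, key):
        # True for every added key; may also be True for others (false positives).
        return all((self.bits >> p) & 1 for p in self._positions(key))


def invalidate_at_sync(cached_lines, written_filter):
    """Keep only the cached lines the filter reports as definitely not written."""
    return {addr: value for addr, value in cached_lines.items()
            if not written_filter.might_contain(addr)}
```

A false positive only costs an unnecessary invalidation, whereas a written line is never missed, which is why a lossy summary can be safe for this purpose.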

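Data Prefetching for Real-Time Transaction Processing in Mobile Environments   Cited 5 times (0 self-citations, 5 by others)
With the rapid development of mobile communication technology, a new application requirement has emerged: processing real-time transactions in mobile environments. Limited mobile bandwidth causes long data access delays, and network disconnections can sometimes leave a transaction without the data it needs; data prefetching addresses this problem well. Existing data prefetching schemes for mobile environments do not consider data popularity or the timing characteristics of transactions. This paper analyzes the factors that affect data prefetching for real-time transactions. It first considers factors such as data volatility and activity to obtain a set of high-value prefetch candidates. It then considers factors such as the priority of the transactions that access the prefetched data and data popularity to construct a selection function, which picks from the candidate set the data objects most valuable for meeting real-time transaction deadlines. Experiments show that this prefetching strategy lowers the deadline miss ratio of mobile real-time transactions and better supports mobile real-time transaction processing.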

The increasing gap in performance between processors and main memory has made effective instruction prefetching techniques more important than ever. A major deficiency of existing prefetching methods is that most of them require an extra port to the I-cache. A recent study by Rivers et al. [19] shows that this factor alone explains why most modern microprocessors do not use such hardware-based I-cache prefetch schemes. The contribution of this paper is two-fold. First, we present a method that does not require an extra port to the I-cache. Second, the performance improvement of our method is greater than that of the best competing method, BHGP [23], even disregarding the improvement from not having an extra port. The three key features of our method that prevent the above deficiencies are as follows. First, late prefetching is prevented by correlating misses to dynamically preceding instructions. For example, if the I-cache miss latency is 12 cycles, then the instruction that was fetched 12 cycles prior to the miss is used as the prefetch trigger. Second, the miss history table is kept to a reasonable size by grouping contiguous cache misses together and associating them with one preceding instruction and, therefore, one table entry. Third, the extra I-cache port is avoided through efficient prefetch filtering methods. Experiments show that for our benchmarks, chosen for their poor I-cache performance, an average improvement of 9.2% in runtime is achieved versus the BHGP method [23], while the hardware cost is also reduced. The improvement would be greater if the runtime impact of avoiding an extra port were considered. When compared to the original machine without prefetching, our method improves performance by about 35% for our benchmarks.
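The first two features (triggers chosen one miss latency ahead, and contiguous misses grouped under one entry) can be sketched offline in Python. This is an illustrative reconstruction, not the paper's hardware table: the function name `build_miss_table`, the trace format, block-granularity addresses, and the rule "latest fetch at least `miss_latency` cycles before the miss" are all assumptions.

```python
def build_miss_table(fetch_trace, miss_latency=12):
    """Build {trigger_block: [blocks to prefetch]} from a fetch trace.

    fetch_trace is a list of (cycle, block_number, missed) tuples in fetch order.
    """
    history = []                     # earlier fetches as (cycle, block)
    table = {}
    last_miss = None                 # block number of the previous miss
    last_trigger = None              # trigger assigned to the previous miss
    for cycle, block, missed in fetch_trace:
        if missed:
            if last_miss is not None and block == last_miss + 1:
                # Contiguous miss: share the previous miss's table entry.
                trigger = last_trigger
            else:
                # Latest block fetched at least miss_latency cycles earlier.
                trigger = None
                for past_cycle, past_block in history:
                    if past_cycle <= cycle - miss_latency:
                        trigger = past_block
                    else:
                        break
            if trigger is not None:
                targets = table.setdefault(trigger, [])
                if block not in targets:
                    targets.append(block)
            last_miss, last_trigger = block, trigger
        history.append((cycle, block))
    return table
```

For a trace in which blocks 100, 101, and 102 are fetched at cycles 0, 4, and 8 and blocks 200 and 201 miss at cycles 16 and 17, the sketch attaches both misses to block 101, the last block fetched at least 12 cycles before the first miss.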

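For hard real-time multicore systems with a limited energy supply, worst-case energy consumption (WCEC) is a critical issue. With advances in chip technology, sequential instruction prefetching can reduce cache WCEC. To improve the worst-case energy efficiency of instruction prefetching, this paper proposes a cache WCEC optimization method for hard real-time multicore systems that combines instruction prefetching with shared cache partitioning. The method uses an integer linear programming (ILP) formulation to assign each core an L2 cache partition factor and to adjust the instruction prefetch degree of each hard real-time subtask, minimizing cache WCEC while guaranteeing that the system meets its deadlines. A case study on the DEBIE system shows that the optimization is effective: with all deadlines met, the optimized cache WCEC is on average 22.5% lower than the cache WCEC without instruction prefetch optimization.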

Data prefetching is an effective data access latency hiding technique to mask the CPU stall caused by cache misses and to bridge the performance gap between processor and memory. With hardware and/or software support, data prefetching brings data closer to a processor before it is actually needed. Many prefetching techniques have been developed for single-core processors. Recent developments in processor technology have brought multicore processors into mainstream. While some of the single-core prefetching t...

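Different cache prefetching strategies suit different access patterns. This paper surveys the state of research on cache prefetching in storage systems, builds a triple-based model of access patterns from an analysis of those patterns, and tests on a disk array an adaptive cache prefetching strategy suited to complex environments. The results show that the adaptive strategy achieves the best disk array performance across different environments.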

Emerging multiprocessor architectures such as chip multiprocessors, embedded architectures, and massively parallel architectures demand faster, more efficient, and more scalable cache coherence schemes. In devising more cost-efficient schemes, formal insights into a system model are useful. In this paper, we build formalisms for execution in cache-based distributed shared-memory (DSM) multiprocessors obeying the Release Consistency model, and derive conditions for cache coherence. A cost-efficient cache coherence scheme without directories is designed. Our approach relies on processor-directed coherence actions, which are early in nature. The scheme exploits sharing information provided by a programmer-centric framework. Per-processor coherence buffers (CBs) are employed to impose coherence on live shared variables between consecutive release points in the execution. Simulation of a system based on an 8-entry, 4-way associative CB achieves a speedup of 1.07–4.31 over a full-map 3-hop directory scheme for six of the SPLASH-2 benchmarks.

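To address the memory wall problem in modern computer systems, this paper proposes a data prefetching method suited to linked data structures: the pure traversal push method. Using multithreading on a chip multiprocessor (CMP) platform with a shared cache, a push thread is split off while the main program runs; it prefetches the data the main thread will need into the shared cache ahead of time, hiding the main thread's memory latency. Experimental results show that, on a CMP architecture, the method yields some performance improvement for memory-bound programs dominated by linked structures.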

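With static analysis alone, a compiler can hardly prefetch data correctly for memory accesses with nonlinear patterns. Profiling, however, can capture a program's run-time memory access patterns, and this information can be used to insert data prefetch instructions precisely. Building on stride profiling, this paper proposes a new profile type, stride iterative, which reflects the actual run-time behavior of memory access instructions more accurately, and combines it with alias analysis results to adjust prefetches to the same cache line, obtaining better prefetch performance than ordinary data prefetching. On Itanium 2, the 12 integer benchmarks of CPU2000 improve by 8.54% on average, and mcf improves by 77.87%.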
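Plain stride profiling, the starting point the abstract builds on, can be sketched in a few lines of Python. The sketch does not implement the paper's "stride iterative" profile type, whose definition is not given here; the function names, the 0.6 share threshold, and the prefetch distance of 4 are assumptions chosen for illustration.

```python
from collections import Counter


def dominant_stride(addresses, min_share=0.6):
    """Return the most frequent address delta if it is non-zero and dominates.

    addresses is the run-time address trace of one load instruction.
    """
    strides = Counter(b - a for a, b in zip(addresses, addresses[1:]))
    if not strides:
        return None                  # fewer than two accesses: nothing to profile
    stride, count = strides.most_common(1)[0]
    total = sum(strides.values())
    if stride != 0 and count / total >= min_share:
        return stride
    return None


def prefetch_address(current_address, stride, distance=4):
    """Address to prefetch `distance` iterations ahead of the current access."""
    return current_address + stride * distance
```

A load whose trace is 1000, 1064, 1128, 1192, 5000, 5064 has a dominant stride of 64 (four of five deltas), so a compiler using this profile could prefetch 256 bytes ahead of each access; a trace with no dominant delta yields `None` and no prefetch is inserted.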

Model Checking Data Consistency for Cache Coherence Protocols   Cited 1 time (0 self-citations, 1 by others)
A method for automatic verification of cache coherence protocols is presented, in which cache coherence protocols are modeled as concurrent value-passing processes, and control and data consistency requirements are described as formulas in first-order μ-calculus. A model checker is employed to check whether the protocol under investigation satisfies the required properties. Using this method, a data consistency error has been revealed in a well-known cache coherence protocol. The error has been corrected, and the revised protocol has been shown to be free from data consistency errors for any data domain size, by appealing to the data independence technique.

A multiprocessor environment may encounter many problems, such as deadlock, load balancing, and cache coherence. The last of these is considered the most dangerous: if coherence is not properly designed, the system appears to work normally but generates inaccurate results. This occurs when obsolete versions of a memory block are used, and users may not be aware of the problem. Two main approaches are known to maintain data consistency: snoopy and directory-based protocols. Each approach has its advantages and limitations. This paper proposes a new technique that draws on both approaches. The network architecture is slightly modified by adding an index table to each processor. The proposed protocol is expected to reduce the access time, decrease the number of accesses to main memory, maintain data consistency, and ensure that the most recent value of a shared variable is used.

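HA-DMDB: A Cache Consistency Management Strategy Based on Data Model Trend Analysis   Cited 1 time (0 self-citations, 1 by others)
With the rapid development of embedded microprocessors, storage devices, and wireless communication technology, small wireless sensors with wireless communication capability have appeared. Connecting large numbers of such sensors wirelessly into a sensor network enables large-scale distributed applications. This paper proposes an architecture and model for an integrated sensor network application system, together with a cache consistency management strategy based on data model trend analysis.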

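DSM multiprocessor systems based on the CC-NUMA architecture are one way to build large-scale, high-performance parallel computers. Because directory-based cache coherence protocols scale better than snooping protocols, such systems mostly adopt them. As systems keep growing, however, directory protocols face scalability problems of their own. After analyzing the factors that affect directory protocol scalability, this paper discusses the storage overhead of several typical directory organizations and finally proposes a two-level directory organization based on a directory cache.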
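The storage-overhead concern can be made concrete with a short calculation. The abstract does not name the organizations it compares, so the two used here, a full-map directory (one presence bit per node per memory block) and a limited-pointer directory (a few node pointers per block), are assumed textbook examples; state bits are ignored and the function names are hypothetical.

```python
def full_map_overhead(num_nodes, block_bytes=64):
    """Directory bits per data bit with one presence bit per node per block."""
    return num_nodes / (block_bytes * 8)


def limited_pointer_overhead(num_nodes, num_pointers=4, block_bytes=64):
    """Directory bits per data bit with num_pointers node pointers per block."""
    pointer_bits = (num_nodes - 1).bit_length()   # bits needed to name one node
    return num_pointers * pointer_bits / (block_bytes * 8)


# With 64-byte blocks, the full-map cost grows linearly with the node count:
# 64 nodes -> 0.125 (12.5%), 1024 nodes -> 2.0 (200% of the data itself).
small_full_map = full_map_overhead(64)
large_full_map = full_map_overhead(1024)
# Four 10-bit pointers per block at 1024 nodes -> 40 / 512 = 0.078125.
large_limited = limited_pointer_overhead(1024)
```

The linear growth of the full-map figure is the kind of scalability pressure that motivates cheaper organizations such as the two-level, directory-cache scheme the abstract proposes.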

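This paper discusses the role of the cache in high-performance computer systems and the process of accessing it, describes the cache data coherence problem and its solutions, and introduces and analyzes the PCI protocol's support for caches.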

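褚瑞 (Chu Rui), 卢锡城 (Lu Xicheng), 肖侬 (Xiao Nong). 《软件学报》 (Journal of Software), 2006, 17(11): 2234-2244
A RAM (random access memory) grid is a new kind of grid system for sharing memory resources across a wide-area network. Its main goal is to improve the performance of memory-intensive or I/O-intensive applications when physical memory is insufficient. How well a RAM grid works depends on network communication overhead, and its performance can be improved further if that overhead is reduced or hidden. Based on an analysis of the RAM grid, this paper designs a prefetching mechanism built on "pushing" data. Drawing on sequential pattern mining from the data mining field, it proposes a corresponding prefetching algorithm. The algorithm is evaluated and validated through simulation based on real running states.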
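To give a feel for pattern-based prefetching, here is a deliberately simplified Python sketch. It is not the paper's algorithm and not a full sequential pattern miner: it only counts which page most often follows each page in an access history (a first-order stand-in), and the names `mine_successors` and `min_support` are hypothetical.

```python
from collections import Counter, defaultdict


def mine_successors(history, min_support=2):
    """Map each page to its most frequent successor, if seen often enough."""
    counts = defaultdict(Counter)
    for current, following in zip(history, history[1:]):
        counts[current][following] += 1
    rules = {}
    for page, successors in counts.items():
        following, support = successors.most_common(1)[0]
        if support >= min_support:
            rules[page] = following
    return rules


def pages_to_push(rules, accessed_page):
    """Pages a remote node could push ahead of the next request."""
    return [rules[accessed_page]] if accessed_page in rules else []
```

For the history 1, 2, 3, 1, 2, 4, 1, 2, 3, the sketch learns that page 2 tends to follow page 1 and page 3 tends to follow page 2, so an access to page 1 would lead to page 2 being pushed in advance.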

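周琰 (Zhou Yan). 《计算机系统应用》 (Computer Systems & Applications), 2013, 22(10): 124-128
The Godson-T cache coherence protocol is the coherence protocol used in the Godson-T many-core processor. In this protocol, the cache coherence protocol and the memory consistency model are tightly coupled. Analysis shows that the coherence the protocol provides is not strong coherence and does not meet the traditional requirement that caches be transparent. We chose the Murphi model checker as our modeling language and verification tool. Because of these characteristics, the processor cores, caches, and memory had to be modeled as a whole, and we successfully verified the relevant properties of the protocol.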

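This paper proposes a new method for verifying the correctness of parameterized protocols by searching for invariants of cache coherence protocols. The difficulty in verifying a cache coherence protocol is that the protocol must be proved correct for parameterized systems of any size. We compute auxiliary invariants by finding the correspondence between invariants and protocol rules, which helps in deriving the verification of cache coherence protocols. We designed and implemented an invariant-finding tool, applied it to the German protocol to compute its auxiliary invariants, and successfully verified the protocol's safety properties.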
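What it means for an invariant to hold over a protocol's rules can be shown on a toy example. The Python sketch below is not the paper's tool and not the German protocol: it defines a made-up three-state protocol (`'I'`, `'S'`, `'E'` per cache) and checks a candidate invariant by breadth-first exploration of one fixed-size instance, which is exactly the limitation that parameterized methods are meant to overcome. All names are hypothetical.

```python
from collections import deque


def successors(state):
    """Yield next states of a toy protocol; state is a tuple of 'I', 'S', 'E'."""
    n = len(state)
    for i in range(n):
        if state[i] == 'I':
            # Cache i takes a shared copy; any exclusive holder is downgraded.
            yield tuple('S' if (j == i or s == 'E') else s
                        for j, s in enumerate(state))
        if state[i] != 'E':
            # Cache i takes the exclusive copy; every other cache is invalidated.
            yield tuple('E' if j == i else 'I' for j in range(n))
        if state[i] != 'I':
            # Cache i evicts its copy.
            yield tuple('I' if j == i else s for j, s in enumerate(state))


def mutual_exclusion(state):
    """At most one exclusive copy, and never alongside shared copies."""
    return state.count('E') <= 1 and not ('E' in state and 'S' in state)


def check(num_caches, invariant):
    """Return a reachable state violating the invariant, or None if none exists."""
    initial = ('I',) * num_caches
    seen, queue = {initial}, deque([initial])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            return state
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return None
```

Calling `check(3, mutual_exclusion)` finds no violation for three caches, but says nothing about four or more; establishing the property for every size is where auxiliary invariants of the kind described above come in.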
