首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 125 毫秒
1.
以操作系统为中心的存储一致性模型--线程一致性模型   总被引:3,自引:0,他引:3  
分布共享存储系统为保证程序的正确执行,必须通过存储一致性模型对共享存储访问顺序加以限制,而现有模型在可扩展性和操作系统级实现方面存在不足。结合多线程的特点,提出了一种以操作系统为中心的线程一致性模型,通过并行程序执行过程中线程状态的变化来观察和限制存储访问事件的正确顺序,有利于系统的可扩展性、一致性维护信息获取的方便性和完备性以及操作系统本身的设计和实现。分别从模型的定义、正确性证明、实现方案和性能分析等几个方面展开了论述。  相似文献   

2.
支持多核并行程序确定性重放的高效访存冲突记录方法   总被引:2,自引:0,他引:2  
多核系统中并行程序执行过程的不确定性给程序调试带来了很大的困难.准确记录初始执行中冲突访存的次序是并行程序确定性重放的基础.提出了通过建立精确happens-before关系记录访存冲突的方法.此方法利用简洁高效的地址冲突检测机制确定冲突访存操作在执行中所处happens-before序关系的位置,可以抑制部分记录信息的产生,从而有效减少记录信息.与其他方式方法相比,可以进一步压缩17%的记录条数.采用逻辑向量时钟描述冲突访存操作间的happens-before关系,与采用标量时钟相比,可以避免happens-before关系的误识,降低重放执行时并行度的损失.  相似文献   

3.
戴华东  杨学军 《计算机学报》2002,25(12):1387-1396
存储一致性模型对共享存储系统的正确性,性能以及程序的复杂性都有重要的影响,该文立足于分布共享存储系统,提出了一种新的存储一致性模型框架-S^3C框架,该框架通过同步点的概念来描述不同模型正确的存储访问事件顺序;通过一致性维护点的概念,对同一模型的不同实现方式也能够进行区别和比较,结合S^3C框架,该文提出一种以操作系统为中心的线程一致性模型,并针对以顺序一致性模为代表的存储一致性模型的正确实现进行了论述。  相似文献   

4.
利用数据预取机制降低块执行模型的访存延迟   总被引:1,自引:0,他引:1  
块执行模型通过将串行程序划分成一系列可并行执行的指令块来挖掘应用中潜在的指令级并行性.访存延迟是阻碍块执行模型提高指令级并行性的主要因素之一,而数据预取技术在传统执行模型中可有效降低访存延迟,对块执行模型也同样具有较强的适应性.本文分析了在块执行模型中引入数据预取机制的可行性,并从cache命中率、访存指令的延迟等方面验证了数据预取在块执行模型中的作用,仿真结果表明数据预取可有效降低块执行模型中的访存延迟.  相似文献   

5.
结合访存失效队列状态的预取策略   总被引:1,自引:0,他引:1  
随着存储系统的访问速度与处理器的运算速度的差距越来越显著,访存性能已成为提高计算机系统性能的瓶颈.通过对指令Cache和数据Cache失效行为的分析,提出一种预取策略--结合访存失效队列状态的预取策略.该预取策略保持了指令和数据访问的次序,有利于预取流的提取.并将指令流和数据流的预取相分离,避免相互替换.在预取发起时机的选择上,不但考虑当前总线是否空闲,而且结合访存失效队列的状态,减小对处理器正常访存请求的影响.通过流过滤机制提高预取准确性,降低预取对访存带宽的需求.结果表明,采用结合访存失效队列状态的预取策略,处理器的平均访存延时减少30%,SPEC CPU2000程序的IPC值平均提高8.3%.  相似文献   

6.
提出了一种新型多素数嵌入式存储系统,能够显著改善系统跨步访问的性能。提高跨步访存的带宽,对于改善系统的整体性能有着重要的意义。但是,在嵌入式系统中,受片外结构的尺寸限制,直接应用经典的素数存储系统理论无法显著改善跨步访存性能。为此,该新型系统以素数存储系统理论为基础,引入主存访问调度策略并结合嵌入式系统的实际结构特征,构造了一种两层结构的多素数存储系统,可以用较少数量的存储模块实现,而且从逻辑地址到物理地址的映像计算简单,能够以相对较小的硬件代价实现对嵌入式存储系统跨步访问的有效支持。理论分析和实验结果均证实了该系统的正确性和有效性。  相似文献   

7.
尹孟嘉  许先斌  熊曾刚  张涛 《计算机科学》2015,42(12):13-17, 22
性能评价和优化是设计高效率并行程序必不可少的重要工作,存储系统的性能高低直接影响到处理器的整体性能。利用GPGPU-Sim对GPU的存储层次结构进行了模拟,找出了SM数量与存储控制器数量之间最佳配置关系。矩阵乘法是科学计算领域中的基本组成部分,是一种具有计算和访存密集特点的典型应用,其性能是GPU高性能计算的一个重要指标。性能模型作为并行系统性能评价的新的技术解决方案,具有许多其它性能评价方法无法比拟的优势。建立了一个性能模型,模型通过对指令流水线、共享存储器访存、全局存储器访存进行定量分析,找到了程序运行瓶颈,提高了执行速度。实验证明,该模型具有实用性,并有效地实现了矩阵乘法的优化。  相似文献   

8.
熊劲  李国杰 《计算机学报》1994,17(12):922-929
共享存储多处理机系统中,存储子系统的性能是影响整个系统性能的关键之-。我们通过基于访存地址流的模拟,从缺失率,平均访存时间和总线占用三方面,对共享存储多处理机系统中的两种两组缓存方案做了性能比较,并将它们同没有第二级缓存的情形做了性能比较。  相似文献   

9.
基于程序访存模式的低功耗存储技术   总被引:1,自引:0,他引:1  
与不断提升的计算能力相适应,移动手持设备上的存储系统结构越来越复杂,容量越来越大.这种趋势导致存储系统,主要是片上缓存和主存,在系统总能耗的占比中不断攀升.在当前手持设备多由电池驱动并且电池容量十分有限的情况下,存储系统的低功耗设计就显得十分重要.虽然现有的存储器件提供了一定的硬件节能支持,但是只有与应用程序的访存行为的规律相结合,才能充分发挥硬件的节能潜力.对现有的各种低功耗存储技术进行了梳理和总结,给出程序的访存模式的概念,归纳出访存模式在3个方面的内涵,并进一步详细介绍了程序的访存模式在片上缓存和主存低功耗技术中的应用.最后,展望未来结合访存模式进行低功耗存储系统研发的可能方向.  相似文献   

10.
跨平台系统级虚拟机软件模拟访存操作效率低,严重影响了虚拟机的性能.为提高跨平台虚拟机访存效率,提出了一种使用宿主系统TLB硬件、加速跨平台系统级虚拟机访存地址转换的软硬件协同优化方法.该方法相对于软件访存模拟方法,有效利用了宿主系统的硬件资源,提高了跨平台系统级虚拟机执行访存操作效率.实验结果表明该方法将虚拟机系统的整体性能提高了近15%.提出的方法已实际应用在龙芯系统级跨平台虚拟机中.  相似文献   

11.
1IntroductionSequelltialconsistencyl1listhepopularacceptedcriterionofcorrectexecutioninshared-memorysystems.Itdefinesacorrectexecutionastheonewhoseresultisthesameasiftheoperationsofallprocessorswereexecutedinsomesequentialorder,andtheoperationsofeachindividualprocessoraPpearinthissequenceintheorderspecifiedbytheprogram.Troicalimplemelltationofsequelltialconsistencyrequireseachaccesstobedelayeduntilthe..previousaccessinthesameprocesscompletesl2].Thisisdetrimentaltosystemperformancebecauseitdis…  相似文献   

12.
Traditional implementation of sequential consistency in shared-memory systems requires memory accesses to be globally performed in program order.Based on an event ordering model for correct executions in shared-memory systems,this paper proposes and proves that out-of-order execution does not influence the correctness of an execution providing certain condition is met.Simulation results show that out-of-order execution proposed in this paper is an effective way to improve the performance of a sequentially consistent shared-memory system.  相似文献   

13.
顺序一致共享存储系统中的乱序执行技术——模拟实现   总被引:1,自引:0,他引:1  
在文献(4)中,我们从理论上提出并证明了顺序一致共享存储系统的一种乱序执行方案本文讨论该乱序执行方案的实现策略并建立了一个地址流驱动(Trace-driven)的模拟模型来评估乱序执行对性能的影响,模拟结果表明,乱序执行能有效地提高顺序一致共享存储系统的性能。  相似文献   

14.
乱序执行是现代微处理器设计中普遍采用的提高流水线性能的方法,但乱序执行并乱序退出的全乱序结构在超标量处理器中应用并不普遍,这种全乱序的结构对基于参考模型的处理器正确性验证提出了巨大的挑战。主要介绍了从处理器的程序行为是否正确的最终标准——程序员可见的结构变量按程序行为进行顺序变化的角度对全乱序结构的处理器验证提出了一种全新的解决方法。  相似文献   

15.
To scale up to high-end configurations, shared-memory multiprocessors are evolving towards Non Uniform Memory Access (NUMA) architectures. In this paper, we address the central problem of load balancing during parallel query execution in NUMA multiprocessors. We first show that an execution model for NUMA should not use data partitioning (as shared-nothing systems do) but should strive to exploit efficient shared-memory strategies like Synchronous Pipelining (SP). However, SP has problems in NUMA, especially with skewed data. Thus, we propose a new execution strategy which solves these problems. The basic idea is to allow partial materialization of intermediate results and to make them progressivly public, i.e., able to be processed by any processor, as needed to avoid processor idle times. Hence, we call this strategy Progressive Sharing (PS). We conducted a performance comparison using an implementation of SP and PS on a 72-processor KSR1 computer, with many queries and large relations. With no skew, SP and PS have both linear speed-up. However, the impact of skew is very severe on SP performance while it is insignificant on PS. Finally, we show that, in NUMA, PS can also be beneficial in executing several pipeline chains concurrently.  相似文献   

16.
We have verified the FM9801, a microprocessor design whose features include speculative execution, out-of-order issue and completion of instructions using Tomasulo's algorithm, and precise exceptions and interrupts. As a correctness criterion, we used a commutative diagram that compares the result of the pipelined execution from a flushed state to another flushed state with that of the sequential execution. Like many pipelined microprocessors, the FM9801 may not operate correctly if the executed program modifies itself. We discuss the condition under which the processor is guaranteed to operate correctly. In order to show that the correctness criterion is satisfied, we introduce an intermediate abstraction that records the history of executed instructions. Using this abstraction, we define a number of invariant properties that must hold during the operation of the FM9801. We verify these invariant properties, and then derive the proof of the commutative diagram from them. The proof has been mechanically checked by the ACL2 theorem prover.  相似文献   

17.
Even though there have been strong research activities about distributed virtual shared-memory (DVSM) systems, their architectures have been not widely used in current high-performance computing markets. The reason is that the previously introduced DVSM systems use conventional interconnection technologies like Ethernet, which incurs high execution overhead due to process interruption at data communication for memory consistency. In this paper, we present the DVSM architecture based on the next generation of an interconnection technique, the InfiniBand Architecture (IBA). Because the IBA supports shared-memory programming semantics by means of remote direct-memory access (RDMA) and atomic operations in hardware, we can minimize the communication overhead for memory consistency on the DVSM system. For characterizing multithreaded applications on our IBA-based DVSM system, we examined two different shared-memory programming models, i.e. SPMD and OpenMP benchmarks. We show that our DVSM to use full features of the IBA can improve the performance significantly over the IPoIB-based DVSM system in all benchmarks, and also comparable to the bus-based shared-memory multiprocessor system in some benchmarks.  相似文献   

18.
《Parallel Computing》1997,23(13):1963-1986
Memory conflict is a major phenomenon which may cause dramatic loss of performance in vector pipeline multiprocessors. Various techniques have been proposed and implemented to avoid such conflicts. They rely mostly on well-tuned vector element allocation in memory banks (either using programming tools or hard-wired features). We tackle this problem in another way. Instead of trying to avoid memory contention, we aim to enhance the performance of the memory system by scheduling vector element accesses in order to increase memory accesses. This scheduling depends on the memory bank activity when an access is issued, leading to out-of-order access to vector elements. An out-of-order pipeline execution is associated with this out-of-order memory access in order to maintain the processor functional unit chaining. In this paper we study some factors influencing this execution model: vector length, number of processors and number of banks. An analysis of this model using the Markov chain technique and simulation results are also presented. They show the importance of this model in comparison with the classical one encountered in pipelined vector supercomputers.  相似文献   

19.
We present an approach for formally verifying that a high-level microprocessor model behaves as defined by an instruction-set architecture. The technique is based on a specialization of self consistency called incremental flushing and reduces the need and effort required to create manually-generated implementation abstractions. Additionally, incremental flushing reduces the computational complexity of the proof obligations generated when reasoning about out-of-order execution. This is accomplished by comparing the functional behavior of the implementation abstraction over two sets of inputs: one that represents normal operation and one that is simpler, but functionally equivalent. The approach is illustrated on a simple out-of-order microprocessor core.  相似文献   

20.
A Framework of Memory Consistency Models   总被引:2,自引:1,他引:2       下载免费PDF全文
  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号