首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到18条相似文献,搜索用时 109 毫秒
1.
顺序一致共享存储系统中的乱序执行技术——基本理论   总被引:1,自引:1,他引:1  
本文首先研究了共享存储系统中的访存事件及其发生次序,从访存事件次序的角度建立了顺序一致性共享存储系统行正确性模型,然后在执行正确性模型的基础上,提出并证明了一种乱序列执行的方案,根据这个方案,只要满足一定条件,取数操作就可以越过它前面的访存操作执行而不影响系统的正确性。  相似文献   

2.
为了提高性能,Java内存模型允许编译器在优化过程中改变代码的执行顺序,同时该技术也会造成共享数据的更新顺序与本来的执行顺序不同。在多线程Java并发程序中,这些代码乱序执行会引起很多难以发现的错误。现有的Java程序模型检测技术并没有考虑这些顺序改变的问题。因此,本文提出了一种建立包含多线程交互及线程内代码乱序执行的完整模型,并利用模型检测工具进行穷举检测的算法。该算法可以发现原有技术无法发现的新问题,更好地检测高可靠性要求的Java并发程序。  相似文献   

3.
戴华东  杨学军 《计算机学报》2002,25(12):1387-1396
存储一致性模型对共享存储系统的正确性,性能以及程序的复杂性都有重要的影响,该文立足于分布共享存储系统,提出了一种新的存储一致性模型框架-S^3C框架,该框架通过同步点的概念来描述不同模型正确的存储访问事件顺序;通过一致性维护点的概念,对同一模型的不同实现方式也能够进行区别和比较,结合S^3C框架,该文提出一种以操作系统为中心的线程一致性模型,并针对以顺序一致性模为代表的存储一致性模型的正确实现进行了论述。  相似文献   

4.
在虚拟分布式共享存储系统(SVM)中,为了保证程序在分布式环境下正确运行,维护存储一致性是关键问题之一,本文提出了一种全新的一致性模型-线程一致性模型(TC),并阐述了基于TC模型的虚拟分布式共享存储系统MTK的实现,线程一致性模型从操作系统内核的角度出发,把程序执行过程中的同步点与线程状态结合起来考虑存储一致 性问题,有利于数据局部性的开发,另外,多线程体系结构的一个显著优势就是能把计算和通信重叠起来,从而有效地隐藏通信延迟,在内核级实现线程一致性模型时,线程 及同步原语(锁、栅栏)都维护一个写记录,同时定义了同构唤醒线程簇。  相似文献   

5.
以操作系统为中心的存储一致性模型--线程一致性模型   总被引:3,自引:0,他引:3  
分布共享存储系统为保证程序的正确执行,必须通过存储一致性模型对共享存储访问顺序加以限制,而现有模型在可扩展性和操作系统级实现方面存在不足。结合多线程的特点,提出了一种以操作系统为中心的线程一致性模型,通过并行程序执行过程中线程状态的变化来观察和限制存储访问事件的正确顺序,有利于系统的可扩展性、一致性维护信息获取的方便性和完备性以及操作系统本身的设计和实现。分别从模型的定义、正确性证明、实现方案和性能分析等几个方面展开了论述。  相似文献   

6.
针对事务存储技术研究中的模拟实验问题,实现了一种专门用于硬件事务存储系统的模拟环境,该模拟环境采用执行驱动模拟方式,支持全系统模拟,利用系统结构模拟器Simics和多核扩展包GEMS实现多核处理器相关部件的功能和性能模拟,在此基础上扩展实现硬件事务存储系统各部件的建模和模拟,以模块化的方法支持多种事务存储系统的模拟实验和性能评价.论文在分析事务存储和系统结构模拟技术的基础上,讨论了事务存储系统模拟环境的设计思路和方案,给出了该模拟环境的组成结构,并通过一种目标事务存储系统结构和一组测试程序对模拟环境进行了实验测试.  相似文献   

7.
由于存储器间距日益扩大 ,存储系统对计算机系统整体性能的影响越来越严重 ,存储系统模拟器的研究与开发也日益重要 .传统的模拟器更多地将注意力集中于对 Cache的模拟 ,而对存储系统整体的模拟不够 .为了模拟并分析存储系统各部分的性能与其对存储系统整体性能的影响 ,本文设计并实现存储系统模拟器 Si Mem Sy(SImulator ofMEMory SYstem) .实验表明 ,Si Mem Sy可以准确、高效地对存储系统进行模拟并得到可信的结果  相似文献   

8.
通过信能不高是影响软件分布式共享存储系统性能的主要因素之一,用户级通信技术能够充分发挥高速网络的硬件性能,减少数据拷贝次数,降低软件件开发销,明显改善了带宽和延迟,为软件分布式共享存储系统性能的提高开避了新的途径,设计并实现了一个面向软件分布式存储系统的用户级通信库,它不仅改善了系统的通禽性能,同时也使得系统的并行计算性能得到改善,从而十分显著地提高了软件分布式共享存储系统的整体性能。  相似文献   

9.
乱序执行是现代微处理器设计中普遍采用的提高流水线性能的方法,但乱序执行并乱序退出的全乱序结构在超标量处理器中应用并不普遍,这种全乱序的结构对基于参考模型的处理器正确性验证提出了巨大的挑战。主要介绍了从处理器的程序行为是否正确的最终标准——程序员可见的结构变量按程序行为进行顺序变化的角度对全乱序结构的处理器验证提出了一种全新的解决方法。  相似文献   

10.
随着数据与系统规模的不断扩大,网络传输成为了键值存储系统的性能瓶颈。同时,远程直接内存访问(RDMA)技术能够支持高带宽和低时延的数据传输,为键值存储系统设计提供了新的思路。结合高性能网络中的RDMA技术,设计并实现了高性能、低CPU负载的键值存储系统Chequer;结合RDMA原语的特性,重新设计了键值存储系统的基本操作工作流程;并设计了基于线性探测的共享hash表,解决客户端缓存失效的问题以及提高hash命中率来减少客户端的读取轮数,进一步提高了系统的性能。在小规模集群上实现了Chequer系统,并通过实验验证了其性能。  相似文献   

11.
1IntroductionSequelltialconsistencyl1listhepopularacceptedcriterionofcorrectexecutioninshared-memorysystems.Itdefinesacorrectexecutionastheonewhoseresultisthesameasiftheoperationsofallprocessorswereexecutedinsomesequentialorder,andtheoperationsofeachindividualprocessoraPpearinthissequenceintheorderspecifiedbytheprogram.Troicalimplemelltationofsequelltialconsistencyrequireseachaccesstobedelayeduntilthe..previousaccessinthesameprocesscompletesl2].Thisisdetrimentaltosystemperformancebecauseitdis…  相似文献   

12.
Traditional implementation of sequential consistency in shared-memory systems requires memory accesses to be globally performed in program order.Based on an event ordering model for correct executions in shared-memory systems,this paper proposes and proves that out-of-order execution does not influence the correctness of an execution providing certain condition is met.Simulation results show that out-of-order execution proposed in this paper is an effective way to improve the performance of a sequentially consistent shared-memory system.  相似文献   

13.
现代处理器的优化技术,包括乱序执行和推测机制等,对性能至关重要.近期以Meltdown和Spectre为代表的侧信道攻击表明,由于异常延迟处理和推测错误而执行的指令结果虽然在架构级别上未显示,仍可能在处理器微架构状态中留下痕迹.通过隐蔽信道可将微架构状态的变化传输到架构层,进而恢复出秘密数据,这种攻击方式称为瞬态执行攻击.该攻击有别于传统的缓存侧信道攻击,影响面更广,缓解难度更大.本文深入分析了瞬态执行攻击的机理和实现方式,对目前的研究现状与防御方法进行了总结.首先,介绍了处理器微架构采用的优化技术,并分析了其导致瞬态执行攻击的功能特征;然后,基于触发瞬态执行的原语对瞬态执行攻击进行系统化分析,揭示攻击面上的明显差异.最后,有侧重点地针对攻击模型中的关键步骤和关键组件总结了已有防御方法,并展望了未来研究方向.  相似文献   

14.
Microarchitecture of the Godson-2 Processor   总被引:23,自引:3,他引:23       下载免费PDF全文
The Godson project is the first attempt to design high performance general-purpose microprocessors in China. This paper introduces the microarchitecture of the Godson-2 processor which is a 64-bit, 4-issue, out-of-order execution RISC processor that implements the 64-bit MlPS-like instruction set. The adoption of the aggressive out-of-order execution techniques (such as register mapping, branch prediction, and dynamic scheduling) and cache techniques (such as non-blocking cache, load speculation, dynamic memory disambiguation) helps the Godson-2 processor to achieve high performance even at not so high frequency. The Godson-2 processor has been physically implemented on a 6-metal 0.18μm CMOS technology based on the automatic placing and routing flow with the help of some crafted library cells and macros. The area of the chip is 6,700 micrometers by 6,200 micrometers and the clock cycle at typical corner is 2.3ns.  相似文献   

15.
We present an approach for formally verifying that a high-level microprocessor model behaves as defined by an instruction-set architecture. The technique is based on a specialization of self consistency called incremental flushing and reduces the need and effort required to create manually-generated implementation abstractions. Additionally, incremental flushing reduces the computational complexity of the proof obligations generated when reasoning about out-of-order execution. This is accomplished by comparing the functional behavior of the implementation abstraction over two sets of inputs: one that represents normal operation and one that is simpler, but functionally equivalent. The approach is illustrated on a simple out-of-order microprocessor core.  相似文献   

16.
This paper introduces the microarchitecture and physical implementation of the Godson-2E processor, which is a four-issue superscalar RISC processor that supports the 64-bit MIPS instruction set. The adoption of the aggressive out-of-order execution and memory hierarchy techniques help Godson-2E to achieve high performance. The Godson-2E processor has been physically designed in a 7-metal 90nm CMOS process using the cell-based methodology with some bitsliced manual placement and a number of crafted cells and macros. The processor can be run at 1GHz and achieves a SPEC CPU2000 rate higher than 500.  相似文献   

17.
We have verified the FM9801, a microprocessor design whose features include speculative execution, out-of-order issue and completion of instructions using Tomasulo's algorithm, and precise exceptions and interrupts. As a correctness criterion, we used a commutative diagram that compares the result of the pipelined execution from a flushed state to another flushed state with that of the sequential execution. Like many pipelined microprocessors, the FM9801 may not operate correctly if the executed program modifies itself. We discuss the condition under which the processor is guaranteed to operate correctly. In order to show that the correctness criterion is satisfied, we introduce an intermediate abstraction that records the history of executed instructions. Using this abstraction, we define a number of invariant properties that must hold during the operation of the FM9801. We verify these invariant properties, and then derive the proof of the commutative diagram from them. The proof has been mechanically checked by the ACL2 theorem prover.  相似文献   

18.
Current superscalar architectures strongly depend on an instruction issue queue to achieve multiple instruction issue and out-of-order execution. However, the issue queue requires a centralized structure and mainly causes globally broadcasting operations to wakeup and select the instructions. Therefore, a large issue queue ultimately results in a low clock rate along with a high circuit complexity. In other words, the increasing demands for a larger issue queue correspondingly impose a significant burden on achieving a higher clock speed.This paper discusses our Speculative Pre-Execution Assisted by compileR (SPEAR), a low-complexity issue queue design. SPEAR is designed to manage the small window superscalar architecture more efficiently without increasing the window size. To this end, we have first recognized that the long memory latency is one of the factors that demand a large window, and we aim at achieving early execution of the miss-causing load instructions using another hierarchy of an issue queue. We pre-execute those miss-causing instructions speculatively as an additional prefetching thread. Simulation results show that the SPEAR design achieves performance comparable to or even better than what would be obtained in superscalar architectures with a large issue queue. However, SPEAR is designed with smaller issue queues which consequently can be implemented with low hardware complexity and high clock speed.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号