1.

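Out-of-Order Execution in Sequentially Consistent Shared-Memory Systems: Basic Theory (total citations: 1; self-citations: 1; citations by others: 1)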

胡伟武夏培肃《计算机学报》1997,20(6):481-490

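This paper first studies memory access events in shared-memory systems and the order in which they occur, and establishes an execution-correctness model for sequentially consistent shared-memory systems in terms of the ordering of memory access events. Building on this model, it then proposes and proves an out-of-order execution scheme under which, provided certain conditions are met, a load can execute ahead of the memory accesses that precede it without affecting the correctness of the system.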

2.

Model Checking of Concurrent Programs Based on the Java Memory Model

周志远张大方缪力《计算机工程与科学》2010,32(3):111-114

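To improve performance, the Java memory model allows the compiler to change the execution order of code during optimization; as a side effect, updates to shared data may occur in an order different from the original execution order. In multithreaded Java programs, such out-of-order execution can cause many errors that are hard to detect. Existing model checking techniques for Java programs do not take these reorderings into account. This paper therefore proposes an algorithm that builds a complete model covering both the interaction among threads and the out-of-order execution of code within a thread, and then checks the model exhaustively with a model checking tool. The algorithm can find problems that existing techniques cannot, and is better suited to checking concurrent Java programs with high reliability requirements.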
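
As a rough illustration of the idea in this abstract (exhaustive checking over both thread interleavings and reorderings within a thread), the sketch below enumerates every outcome of a small two-thread test. It is not the paper's algorithm and it does not model the Java memory model itself: the test program, the names `T0`, `T1`, `interleavings`, `run` and `outcomes`, and the rule "any permutation of a thread's operations is allowed" are all assumptions made for the example.

```python
from itertools import permutations

# Hypothetical two-thread test; not taken from the paper.
# Each operation is (thread, kind, variable). A load records the value it reads
# in a per-thread register; a store always writes 1.
T0 = [("t0", "store", "x"), ("t0", "load", "y")]
T1 = [("t1", "store", "y"), ("t1", "load", "x")]


def interleavings(a, b):
    """Yield every merge of lists a and b that preserves each list's own order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(schedule):
    """Execute one schedule against a single shared memory; return (r0, r1)."""
    memory = {"x": 0, "y": 0}
    regs = {}
    for thread, kind, var in schedule:
        if kind == "store":
            memory[var] = 1
        else:
            regs[thread] = memory[var]
    return regs["t0"], regs["t1"]


def outcomes(allow_reordering):
    """Collect every reachable (r0, r1) pair by exhaustive enumeration."""
    # Assumption: with reordering enabled, a thread's two operations may swap,
    # which is harmless within the thread because they touch different variables.
    orders0 = list(permutations(T0)) if allow_reordering else [tuple(T0)]
    orders1 = list(permutations(T1)) if allow_reordering else [tuple(T1)]
    results = set()
    for o0 in orders0:
        for o1 in orders1:
            for schedule in interleavings(list(o0), list(o1)):
                results.add(run(schedule))
    return results


in_order = outcomes(allow_reordering=False)
reordered = outcomes(allow_reordering=True)
print(sorted(in_order))
print(sorted(reordered))
```

With program order preserved, the result `(0, 0)` never appears; once the operations within a thread may be swapped, it does. A checker that only explores interleavings of in-order threads would therefore miss it.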

3.

A New Framework for Memory Consistency Models in DSM Systems: The S3C Framework

戴华东杨学军《计算机学报》2002,25(12):1387-1396

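Memory consistency models have an important impact on the correctness and performance of shared-memory systems and on the complexity of programs. Focusing on distributed shared-memory systems, this paper proposes a new framework for memory consistency models, the S^3C framework. The framework uses the concept of synchronization points to describe the correct ordering of memory access events under different models, and uses the concept of consistency maintenance points to distinguish and compare different implementations of the same model. Based on the S^3C framework, the paper proposes an operating-system-centric thread consistency model and discusses the correct implementation of memory consistency models, with the sequential consistency model as the representative case.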

4.

The Thread Consistency Model and Its Implementation

周伟波戴华东杨学军《计算机工程与科学》2003,25(1):71-75

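In a virtual distributed shared-memory (SVM) system, maintaining memory consistency is one of the key problems in ensuring that programs run correctly in a distributed environment. This paper proposes a new consistency model, the thread consistency (TC) model, and describes the implementation of MTK, a virtual distributed shared-memory system based on the TC model. Taking the perspective of the operating system kernel, the thread consistency model treats memory consistency by combining the synchronization points reached during program execution with thread states, which helps exploit data locality. In addition, a notable advantage of multithreaded architectures is that computation and communication can be overlapped, which effectively hides communication latency. In the kernel-level implementation of the thread consistency model, each thread and each synchronization primitive (lock, barrier) maintains a write record, and homogeneous wake-up thread clusters are defined.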

5.

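An Operating-System-Centric Memory Consistency Model: The Thread Consistency Model (total citations: 3; self-citations: 0; citations by others: 3)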

戴华东杨学军《计算机研究与发展》2003,40(2):351-359

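To ensure correct program execution, a distributed shared-memory system must constrain the order of shared-memory accesses through a memory consistency model, yet existing models fall short in scalability and in operating-system-level implementation. Drawing on the characteristics of multithreading, the paper proposes an operating-system-centric thread consistency model that observes and constrains the correct order of memory access events through the changes in thread state during parallel program execution. This benefits system scalability, the convenience and completeness of obtaining consistency-maintenance information, and the design and implementation of the operating system itself. The model is discussed in terms of its definition, correctness proof, implementation scheme, and performance analysis.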

6.

Research and Implementation of a Simulation Environment for Hardware Transactional Memory Systems

刘轶吴名瑜张翠王永会《小型微型计算机系统》2012,33(2):409-413

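To address the problem of simulation experiments in transactional memory research, a simulation environment dedicated to hardware transactional memory systems is implemented. The environment uses execution-driven simulation and supports full-system simulation. It uses the architecture simulator Simics and the multicore extension package GEMS to simulate the function and performance of multicore processor components, and on that basis adds modeling and simulation of the components of a hardware transactional memory system, supporting simulation experiments and performance evaluation of multiple transactional memory systems in a modular way. After analyzing transactional memory and architecture simulation techniques, the paper discusses the design approach and scheme of the simulation environment, presents its structure, and tests the environment experimentally with a target transactional memory architecture and a set of test programs.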

7.

Design and Implementation of the Memory System Simulator SiMemSy

黄震春李三立马群生《小型微型计算机系统》2002,23(1):9-13

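As the memory gap continues to widen, the memory system has an increasingly serious impact on overall computer system performance, and the research and development of memory system simulators is becoming more important. Traditional simulators focus mostly on simulating the cache and pay too little attention to the memory system as a whole. To simulate and analyze the performance of each part of the memory system and its effect on overall memory system performance, this paper designs and implements the memory system simulator SiMemSy (SImulator of MEMory SYstem). Experiments show that SiMemSy can simulate memory systems accurately and efficiently and produce credible results.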

8.

Application and Analysis of User-Level Communication in Software Distributed Shared-Memory Systems

毛永捷施巍松祝明发《计算机研究与发展》2001,38(4):451-457

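Poor communication performance is one of the main factors limiting the performance of software distributed shared-memory systems. User-level communication can fully exploit the hardware performance of high-speed networks, reduce the number of data copies, and lower software overhead, which markedly improves bandwidth and latency and opens a new way to improve the performance of software distributed shared-memory systems. A user-level communication library for software distributed shared-memory systems is designed and implemented. It improves not only the communication performance of the system but also its parallel computing performance, and thus significantly improves the overall performance of the software distributed shared-memory system.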

9.

A Verification Method for Out-of-Order Processors Based on Data Dependence

宁永波李谦李强张琦滨《数字社区&智能家居》2011,(4)

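Out-of-order execution is widely used in modern microprocessor design to improve pipeline performance, but a fully out-of-order structure, with both out-of-order execution and out-of-order retirement, is not common in superscalar processors. Such a structure poses a great challenge to reference-model-based verification of processor correctness. Starting from the ultimate criterion of whether a processor's program behavior is correct, namely that programmer-visible architectural variables change in order according to program behavior, this paper presents a new verification approach for fully out-of-order processors.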

10.

A High-Performance Key-Value Storage System Based on Remote Direct Memory Access

王成叶保留梅峰卢文达《计算机应用》2020,40(2):316-320

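As data volumes and system scale keep growing, network transmission has become the performance bottleneck of key-value storage systems. Meanwhile, remote direct memory access (RDMA) supports high-bandwidth, low-latency data transfer and offers a new approach to designing key-value storage systems. Using RDMA in high-performance networks, Chequer, a key-value storage system with high performance and low CPU load, is designed and implemented. The basic operation workflows of the key-value store are redesigned around the characteristics of RDMA primitives, and a shared hash table based on linear probing is designed to solve the problem of client cache invalidation and to raise the hash hit rate, which reduces the number of client read rounds and further improves system performance. Chequer is implemented on a small-scale cluster, and its performance is verified experimentally.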
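
For readers unfamiliar with the term, the following is a minimal, generic sketch of a hash table with linear probing. It is not Chequer's data structure and involves no RDMA; the class name `LinearProbingTable`, the fixed capacity, and the probe counter are assumptions chosen for illustration, and deletion is left out.

```python
class LinearProbingTable:
    """Fixed-capacity open-addressing hash table (illustrative only)."""

    def __init__(self, capacity=8):
        self.capacity = capacity
        self.slots = [None] * capacity  # each slot is None or a (key, value) pair

    def _home(self, key):
        # Slot where probing for this key starts.
        return hash(key) % self.capacity

    def put(self, key, value):
        """Insert or update key, scanning forward from its home slot."""
        for step in range(self.capacity):
            i = (self._home(key) + step) % self.capacity
            if self.slots[i] is None or self.slots[i][0] == key:
                self.slots[i] = (key, value)
                return
        raise RuntimeError("table is full")

    def get(self, key):
        """Return (value, probes); value is None when the key is absent."""
        for step in range(self.capacity):
            i = (self._home(key) + step) % self.capacity
            if self.slots[i] is None:
                return None, step + 1  # an empty slot ends the search
            if self.slots[i][0] == key:
                return self.slots[i][1], step + 1
        return None, self.capacity


# Small integer keys hash to themselves in CPython, so 1 and 9 share home slot 1.
table = LinearProbingTable(capacity=8)
table.put(1, "a")
table.put(9, "b")
print(table.get(1))
print(table.get(9))
```

Colliding keys end up in neighbouring slots, so a lookup only ever inspects one contiguous run of the table; the probe count returned by `get` makes that cost visible.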

11.

Out-of-Order Execution in Sequentially Consistent Shared-Memory Systems: Theory and Experiments

胡伟武 夏培肃 (E-mail: hww@water.chpc.ict.ac.cn) 《计算机科学技术学报》1998,(2)

1 Introduction. Sequential consistency [1] is the popular accepted criterion of correct execution in shared-memory systems. It defines a correct execution as the one whose result is the same as if the operations of all processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by the program. Typical implementation of sequential consistency requires each access to be delayed until the previous access in the same process completes [2]. This is detrimental to system performance because it dis…

12.

Out-of-Order Execution in Sequentially Consistent Shared-Memory Systems: Theory and Experiments

Hu Weiwu Xia Peisu 《计算机科学技术学报》1998,13(2):125-140

Traditional implementations of sequential consistency in shared-memory systems require memory accesses to be globally performed in program order. Based on an event ordering model for correct executions in shared-memory systems, this paper proposes and proves that out-of-order execution does not affect the correctness of an execution provided certain conditions are met. Simulation results show that the out-of-order execution proposed in this paper is an effective way to improve the performance of a sequentially consistent shared-memory system.

13.

Microarchitectural Transient Execution Attacks and Defenses

吴晓慧贺也平马恒太周启明林少锋《软件学报》2020,31(2):544-563

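Optimization techniques in modern processors, including out-of-order execution and speculation, are critical to performance. Recent side-channel attacks such as Meltdown and Spectre show that the results of instructions executed because of delayed exception handling or misspeculation, although not visible at the architectural level, may still leave traces in the processor's microarchitectural state. Changes in microarchitectural state can be transmitted to the architectural level through a covert channel, allowing secret data to be recovered; such attacks are called transient execution attacks. They differ from traditional cache side-channel attacks in that their impact is broader and they are harder to mitigate. This paper analyzes in depth the mechanisms and implementations of transient execution attacks and summarizes the current state of research and defenses. First, it introduces the optimization techniques used in processor microarchitectures and analyzes the functional features that lead to transient execution attacks. Then, it systematically analyzes transient execution attacks according to the primitives that trigger transient execution, revealing clear differences in attack surface. Finally, it summarizes existing defenses with a focus on the key steps and key components of the attack model, and discusses future research directions.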

14.

Microarchitecture of the Godson-2 Processor (total citations: 23; self-citations: 3; citations by others: 23)

Wei-Wu Hu Fu-Xin Zhang Zu-Song Li 《计算机科学技术学报》2005,20(2):0-0

The Godson project is the first attempt to design high-performance general-purpose microprocessors in China. This paper introduces the microarchitecture of the Godson-2 processor, which is a 64-bit, 4-issue, out-of-order execution RISC processor that implements a 64-bit MIPS-like instruction set. The adoption of aggressive out-of-order execution techniques (such as register mapping, branch prediction, and dynamic scheduling) and cache techniques (such as non-blocking cache, load speculation, and dynamic memory disambiguation) helps the Godson-2 processor achieve high performance even at a modest clock frequency. The Godson-2 processor has been physically implemented in a 6-metal 0.18μm CMOS technology using an automatic place-and-route flow with the help of some crafted library cells and macros. The area of the chip is 6,700 micrometers by 6,200 micrometers, and the clock cycle at the typical corner is 2.3ns.

15.

Formal Verification of Out-of-Order Execution with Incremental Flushing

Robert B. Jones Jens U. Skakkebæk David L. Dill 《Formal Methods in System Design》2002,20(2):139-158

We present an approach for formally verifying that a high-level microprocessor model behaves as defined by an instruction-set architecture. The technique is based on a specialization of self consistency called incremental flushing and reduces the need and effort required to create manually-generated implementation abstractions. Additionally, incremental flushing reduces the computational complexity of the proof obligations generated when reasoning about out-of-order execution. This is accomplished by comparing the functional behavior of the implementation abstraction over two sets of inputs: one that represents normal operation and one that is simpler, but functionally equivalent. The approach is illustrated on a simple out-of-order microprocessor core.

16.

Implementing a 1GHz Four-Issue Out-of-Order Execution Microprocessor in a Standard Cell ASIC Methodology (total citations: 3; self-citations: 0; citations by others: 3)

Wei-Wu Hu Ji-Ye Zhao Shi-Qiang Zhong Xu Yang Elio Guidetti and Chris Wu 《计算机科学技术学报》2007,22(1):1-0

This paper introduces the microarchitecture and physical implementation of the Godson-2E processor, which is a four-issue superscalar RISC processor that supports the 64-bit MIPS instruction set. The adoption of aggressive out-of-order execution and memory hierarchy techniques helps Godson-2E achieve high performance. The Godson-2E processor has been physically designed in a 7-metal 90nm CMOS process using a cell-based methodology with some bit-sliced manual placement and a number of crafted cells and macros. The processor can run at 1GHz and achieves a SPEC CPU2000 rate higher than 500.

17.

Verification of FM9801: An Out-of-Order Microprocessor Model with Speculative Execution, Exceptions, and Program-Modifying Capability

Jun Sawada Warren A. Hunt Jr. 《Formal Methods in System Design》2002,20(2):187-222

We have verified the FM9801, a microprocessor design whose features include speculative execution, out-of-order issue and completion of instructions using Tomasulo's algorithm, and precise exceptions and interrupts. As a correctness criterion, we used a commutative diagram that compares the result of the pipelined execution from a flushed state to another flushed state with that of the sequential execution. Like many pipelined microprocessors, the FM9801 may not operate correctly if the executed program modifies itself. We discuss the condition under which the processor is guaranteed to operate correctly. In order to show that the correctness criterion is satisfied, we introduce an intermediate abstraction that records the history of executed instructions. Using this abstraction, we define a number of invariant properties that must hold during the operation of the FM9801. We verify these invariant properties, and then derive the proof of the commutative diagram from them. The proof has been mechanically checked by the ACL2 theorem prover.

18.

A low-complexity microprocessor design with speculative pre-execution

Won W. Jean-Luc 《Journal of Systems Architecture》2008,54(12):1101-1112

Current superscalar architectures depend strongly on an instruction issue queue to achieve multiple instruction issue and out-of-order execution. However, the issue queue requires a centralized structure and relies on global broadcast operations to wake up and select instructions. A large issue queue therefore results in a low clock rate and high circuit complexity. In other words, the increasing demand for a larger issue queue imposes a significant burden on achieving a higher clock speed. This paper discusses Speculative Pre-Execution Assisted by compileR (SPEAR), a low-complexity issue queue design. SPEAR is designed to manage a small-window superscalar architecture more efficiently without increasing the window size. To this end, we first recognize that long memory latency is one of the factors that demand a large window, and we aim to execute the miss-causing load instructions early using another hierarchy of issue queue. We pre-execute those miss-causing instructions speculatively as an additional prefetching thread. Simulation results show that the SPEAR design achieves performance comparable to or even better than that of superscalar architectures with a large issue queue. Because SPEAR uses smaller issue queues, it can be implemented with low hardware complexity and high clock speed.