期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Out-of-Order Execution in Sequentially Consistent Shared-Memory Systems:Theory and Experiments

胡伟武!E-mail:hww water.chpc.ict.ac.cn 夏培肃!E-mail:hww water.chpc.ict.ac.cn 《计算机科学技术学报》1998,(2)

1IntroductionSequelltialconsistencyl1listhepopularacceptedcriterionofcorrectexecutioninshared-memorysystems.Itdefinesacorrectexecutionastheonewhoseresultisthesameasiftheoperationsofallprocessorswereexecutedinsomesequentialorder,andtheoperationsofeachindividualprocessoraPpearinthissequenceintheorderspecifiedbytheprogram.Troicalimplemelltationofsequelltialconsistencyrequireseachaccesstobedelayeduntilthe..previousaccessinthesameprocesscompletesl2].Thisisdetrimentaltosystemperformancebecauseitdis… 相似文献

2.

顺序一致共享存储系统中的乱序执行技术——基本理论 总被引：1，自引：1，他引：1

胡伟武夏培肃《计算机学报》1997,20(6):481-490

本文首先研究了共享存储系统中的访存事件及其发生次序，从访存事件次序的角度建立了顺序一致性共享存储系统行正确性模型，然后在执行正确性模型的基础上，提出并证明了一种乱序列执行的方案，根据这个方案，只要满足一定条件，取数操作就可以越过它前面的访存操作执行而不影响系统的正确性。相似文献

3.

以操作系统为中心的存储一致性模型--线程一致性模型 总被引：3，自引：0，他引：3

戴华东杨学军《计算机研究与发展》2003,40(2):351-359

分布共享存储系统为保证程序的正确执行，必须通过存储一致性模型对共享存储访问顺序加以限制，而现有模型在可扩展性和操作系统级实现方面存在不足。结合多线程的特点，提出了一种以操作系统为中心的线程一致性模型，通过并行程序执行过程中线程状态的变化来观察和限制存储访问事件的正确顺序，有利于系统的可扩展性、一致性维护信息获取的方便性和完备性以及操作系统本身的设计和实现。分别从模型的定义、正确性证明、实现方案和性能分析等几个方面展开了论述。相似文献

4.

Out-of-Order Execution in Sequentially Consistent Shared-Memory Systems:Theory and Experiments

下载免费PDF全文

Hu Weiwu Xia Peisu 《计算机科学技术学报》1998,13(2):125-140

Traditional implementation of sequential consistency in shared-memory systems requires memory accesses to be globally performed in program order.Based on an event ordering model for correct executions in shared-memory systems,this paper proposes and proves that out-of-order execution does not influence the correctness of an execution providing certain condition is met.Simulation results show that out-of-order execution proposed in this paper is an effective way to improve the performance of a sequentially consistent shared-memory system. 相似文献

5.

Chip Multithreaded Consistency Model

下载免费PDF全文

Zu-Song Li Dan-Dan Huan Wei-Wu Hu and Zhi-Min Tang 《计算机科学技术学报》2008,23(2):298-ver

Multithreaded technique is the developing trend of high performance processor. Memory consistency model is essential to the correctness, performance and complexity of multithreaded processor. The chip multithreaded consistency model adapting to multithreaded processor is proposed in this paper. The restriction imposed on memory event ordering by chip multithreaded consistency is presented and formalized. With the idea of critical cycle built by Wei-Wu Hu, we prove that the proposed chip multithreaded consistency model satisfies the criterion of correct execution of sequential consistency model. Chip multithreaded consistency model provides a way of achieving high performance compared with sequential consistency model and easures the compatibility of software that the execution result in multithreaded processor is the same as the execution result in uniprocessor. The implementation strategy of chip multithreaded consistency model in Godson-2 SMT processor is also proposed. Godson-2 SMT processor supports chip multithreaded consistency model correctly by exception scheme based on the sequential memory access queue of each thread. 相似文献

6.

DSM系统中存储一致性模型的一种新框架——S3C框架

戴华东杨学军《计算机学报》2002,25(12):1387-1396

存储一致性模型对共享存储系统的正确性，性能以及程序的复杂性都有重要的影响，该文立足于分布共享存储系统，提出了一种新的存储一致性模型框架－S^3C框架，该框架通过同步点的概念来描述不同模型正确的存储访问事件顺序；通过一致性维护点的概念，对同一模型的不同实现方式也能够进行区别和比较，结合S^3C框架，该文提出一种以操作系统为中心的线程一致性模型，并针对以顺序一致性模为代表的存储一致性模型的正确实现进行了论述。相似文献

7.

Request Reordering to Enhance the Performance of Strict Consistency Models

《Computer Architecture Letters》2002,1(1):11-11

Advances in ILP techniques enable strict consistencymodels to relax memory order through speculative execution ofmemory operations. However, ordering constraints still hinderthe performance because speculatively executed operationscannot be committed out of program order for the possibility ofmis-speculation. In this paper, we propose a new technique whichallows memory operations to be non-speculatively committed outof order without violating consistency constraints. 相似文献

8.

片上多核处理器存储一致性验证 总被引：2，自引：0，他引：2

王朋宇陈云霁沈海华陈天石张珩《软件学报》2010,21(4):863-874

存储一致性验证是片上多核处理器功能验证的重要部分.由于验证并行程序的执行结果是否符合存储一致性模型理论上是NP难问题,现有的验证方法中只能采用一些时间复杂度大于O(n3)的不完全方法.发现在支持写原子性的多处理器系统中,两条执行时间不重叠的操作之间存在确定的时间序.通过引入时间序的概念,设计并实现了一种线性时间复杂度的存储一致性验证工具LCHECK.LCHECK利用时间序将验证局部化,使得在表示程序执行结果的有向图中,序关系边的推导和正确性检测都被限定在有限范围内.与现有其他方法相比,LCHECK时间复杂度低,对程序长度和访存地址数没有限制,因此验证效率更高.作为国产片上多核处理器龙芯3号的重要验证工具, LCHECK发现了一些存储系统的设计错误. 相似文献

9.

存储模型仿真器的设计与实现 总被引：2，自引：1，他引：1

吴俊敏杨超陈国良张淼辉门珂《计算机研究与发展》2005,42(3):394-403

存储一致性问题和高速缓存一致性问题是共享存储并行计算机中两个最关键的问题,通过仿真器对它们进行了量化研究,设计并实现了一个存储模型仿真器MMS．基于MMS仿真了不同并行机结构模型下多种存储一致性模型的行为;针对不同类型的计算问题比较了不同的存储一致性模型,并对实验结果进行了分析;实现了几个不同的高速缓存一致性协议,并比较了它们的性能．相似文献

10.

Verification of STM on relaxed memory models

Rachid Guerraoui Thomas A. Henzinger Vasu Singh 《Formal Methods in System Design》2011,39(3):297-331

相似文献

11.

Efficient Sequential Consistency Using Conditional Fences

Changhui Lin Vijay Nagarajan Rajiv Gupta 《International journal of parallel programming》2012,40(1):84-117

Among the various memory consistency models, the sequential consistency (SC) model is the most intuitive and enables programmers to reason about their parallel programs the best. Nevertheless, processor designers often choose to support relaxed memory consistency models because the weaker ordering constraints imposed by such models allow for more instructions to be reordered and enable higher performance. Programs running on machines supporting weaker consistency models can be transformed into ones in which SC is enforced. The compiler does this by computing a minimal set of memory access pairs whose ordering automatically guarantees SC. To ensure that these memory access pairs are not reordered, memory fences are inserted. Unfortunately, insertion of such memory fences can significantly slowdown the program. We observe that the ordering of the minimal set of memory accesses that the compiler strives to enforce, is typically already enforced in the normal course of program execution. A study we conducted on programs with compiler inserted memory fences shows that only 8% of the executed instances of the memory fences are really necessary to ensure SC. Motivated by this study we propose the conditional fence mechanism, known as C-Fence that utilizes compiler information to decide dynamically if there is a need to stall at each fence, only stalling when necessary. Our experiments with SPLASH-2 benchmarks show that, with C-Fences and a centralized active table, programs can be transformed to enforce SC incurring only 12% slowdown, as opposed to 43% slowdown using normal fence instructions. Our approach requires very little hardware support (<350 bytes of on-chip-storage) and it avoids the use of speculation and its associated costs. Furthermore, to ameliorate the contention in the centralized active table arising from the increasing number of processors, we also design a distributed active table, which further improves the performance of C-Fence for a larger number of processors. 相似文献

12.

Partition consistency

Steven Cheng Lisa Higham Jalal Kawash 《Distributed Computing》2014,27(5):363-389

This paper provides a case study of specifying an abstract memory consistency model, providing possible implementations for the model, and proving the correctness of implementations. Specifically, we introduce a class of memory consistency models called partition consistency. Existing abstract consistency models such as sequential consistency, piplined-RAM, Goodman’s processor consistency, and coherence are all members of the partition consistency class. A concrete message-passing network model is also specified. Implementations of partition consistency on this network model are then presented and proved correct. A middle level of abstraction is utilized to facilitate the proofs. All three levels of abstraction are specified using the same framework. The paper aims to illustrate a general methodology and techniques for specifying memory consistency models and proving the correctness of their implementations. 相似文献

13.

A unified formalization of four shared-memory models

Adve S.V. Hill M.D. 《Parallel and Distributed Systems, IEEE Transactions on》1993,4(6):613-624

The authors present a data-race-free-1, shared-memory model that unifies four earlier models: weak ordering, release consistency (with sequentially consistent special operations), the VAX memory model, and data-race-free-0. Data-race-free-1 unifies the models of weak ordering, release consistency, the VAX, and data-race-free-0 by formalizing the intuition that if programs synchronize explicitly and correctly, then sequential consistency can be guaranteed with high performance in a manner that retains the advantages of each of the four models. Data-race-free-1 expresses the programmer's interface more explicitly and formally than weak ordering and the VAX, and allows an implementation not allowed by weak ordering, release consistency, or data-race-free-0. The implementation proposal for data-race-free-1 differs from earlier implementations by permitting the execution of all synchronization operations of a processor even while previous data operations of the processor are in progress. To ensure sequential consistency, two sychronizing processors exchange information to delay later operations of the second processor that conflict with an incomplete data operation of the first processor 相似文献

14.

Programming for different memory consistency models

Kourosh Gharachorloo Sarita V. Adve Anoop Gupta John L. Hennessy Mark D. Hill 《Journal of Parallel and Distributed Computing》1992,15(4)

The memory consistency model, or memory model, supported by a shared-memory multiprocessor directly affects its performance. The most commonly assumed memory model is sequential consistency (SC). While SC provides a simple model for the programmer, it imposes rigid constraints on the ordering of memory accesses and restricts the use of common hardware and compiler optimizations. To remedy the shortcomings of SC, several relaxed memory models have been proposed in the literature. These include processor consistency (PC), weak ordering (WO), release consistency (RCsc/RCpc), total store ordering (TSO), and partial store ordering (PSO). While the relaxed models provide the potential for higher performance, they present a more complex model for programmers when compared to SC. Our previous research has addressed this tradeoff by taking a programmer-centric approach. We have proposed memory models (DRF0, DRF1, PL) that allow the programmer to reason with SC, but require certain information about the memory accesses. This information is used by the system to relax the ordering among memory accesses while still maintaining SC for the programmer. Our previous models formalized the information that allowed optimizations associated with WO and RCsc to be used. This paper extends the above approach by defining a new model, PLpc, that allows optimizations of the TSO, PSO, PC, and RCpc models as well. Thus, PLpc provides a unified programming model that maintains the ease of reasoning with SC while providing for efficiency and portability across a wide range of proposed system designs. 相似文献

15.

Correctness criteria for multilevel secure transactions 总被引：2，自引：0，他引：2

Smith K.P. Blaustein B.T. Jajodia S. Notargiacomo L. 《Knowledge and Data Engineering, IEEE Transactions on》1996,8(1):32-45

The benefits of distributed systems and shared database resources are widely recognized, but they often cannot be exploited by users who must protect their data by using label-based access controls. In particular, users of label-based data need to read and write data at different security levels within a single database transaction, which is not currently possible without violating multilevel security constraints. The paper presents a formal model of multilevel transactions which provide this capability. We define four ACIS (atomicity, consistency, isolation, and security) correctness properties of multilevel transactions. While atomicity, consistency and isolation are mutually achievable in standard single-site and distributed transactions, we show that the security requirements of multilevel transactions conflict with some of these goals. This forces trade-offs to be made among the ACIS correctness properties, and we define appropriate partial correctness properties. Due to such trade-offs, an important problem is to design multilevel transaction execution protocols which achieve the greatest possible degree of correctness. These protocols must provide a variety of approaches to making trade-offs according to the differing priorities of various users. We present three transaction execution protocols which achieve a high degree of correctness. These protocols exemplify the correctness trade-offs proven in the paper, and offer realistic implementation options 相似文献

16.

基于有限状态进程的事件约束定义 总被引：4，自引：1，他引：4

顾庆陈道蓄谢立韩杰孙钟秀《软件学报》2002,13(11):2162-2168

测试分布式程序需要定义事件约束来检测程序执行产生的事件序列.事件约束需要根据程序的规约来推导.FSP是一类描述并发程序形式化规约的进程代数记法.它将并发进程描述为动作序列,其中动作可对应到规约级事件.E-CSPE约束在给定状态谓词下定义前后运行事件间的顺序关系.根据FSP的操作符和并发控制机制可推导E-CSPE约束.推导出来的E-CSPE约束考虑到并发程序的安全和进展属性,可据以判断程序运行的正确性和测试的充分性. 相似文献

17.

面向RISC-V内存一致性测试的自动化分析方法

下载免费PDF全文

杨德亨徐学政王涛黄安文李琼《智能安全》2023,2(3):58-67

开源架构RISC-V定义了其内存一致性模型RVWMO,作为多核RISC-V系统软硬件设计开发的重要规范。在多核芯片的验证阶段,需要对芯片的内存一致性进行严格全面的测试。测试通常针对某一访存顺序模式,选取典型的并行程序片段进行大规模测试（又称Litmus测试）,通过程序运行的最终状态推测芯片内存一致性模型。通常,更为宽松的内存一致性会导致更多的程序状态。分析Litmus测试结果对于验证芯片的RVWMO兼容性、探索多核系统的内存一致性优化的可能性具有重要意义。以RVWMO规范下允许的程序状态为基准,芯片实测得到更多的程序状态表明其存在兼容性问题,得到更少的程序状态表明其仍具有优化空间。面对规模庞大、行为复杂的Litmus测试,如何对其测试结果进行自动化分析是亟待解决的问题。本文对Litmus测试的原理和输出结果进行了深入分析,提出一种面向RISC-V内存一致性测试的自动化分析方法,采用形式化方法对Litmus测试进行基于RVWMO规范的模拟运行,并通过与芯片的实测结果进行对比分析给出测试结论。本方法基于Hifive Unmatched开发板开展测试。实验表明,本文提出的方法可快速、有效地对RISC-V内存一致性测试进行自动化分析。相似文献

18.

Algebraic computational models of OR-parallel execution of Prolog

R. Barbuti N. De Francesco A. Santone 《Acta Informatica》1997,34(6):449-489

A specification of the OR-parallel execution of Prolog programs, using CHOCS (calculus of higher order communicating systems) [24], is presented in the paper. A translation is defined from Prolog programs and goals to CHOCS processes: the execution of the CHOCS process corresponding to a goal mimics the OR-parallel execution of the original Prolog goal. In the translation, clauses and predicate definitions of a Prolog program correspond to processes. To model OR-parallelism, the processes , corresponding to clauses (having the same head predicate ) start their execution concurrently, but, in order to respect the depth-first search rule, each is guarded by the termination of the executions of processes 's, . The computational model is proved correct with respect to the semantics of Prolog, as given in [4, 5]. Our model, because of its algebraic specification, can be easily used to prove properties of the parallel execution of Prolog programs. Moreover, the model exploits the maximum degree of parallelism, by giving the Prolog solutions in parallel, without any order among them. However, this model, being close to the Prolog semantics definition, contains sources of inefficiency which make it unpractical as a guide for the implementation. To overcome these problems, a new computational model is defined. This model is obtained by modifications of the basic one and thus its correctness can be easily proved. Finally, we show how to obtain models of different real implementations of OR-parallel Prolog by slight modification of the new model. The relations among all these models, in terms of parallelism degree, are studied by using the concepts of bisimulation and simulation, developed for concurrent calculi. Received: 5 May 1995 / 28 May 1996 相似文献

19.

Memory model sensitive bytecode verification

Thuan Quang Huynh Abhik Roychoudhury 《Formal Methods in System Design》2007,31(3):281-305

Modern concurrent programming languages like C# and Java have a programming language level memory model, which captures the set of all allowed behaviors of programs on any implementation platform—uni- or multi-processor. Such a memory model is typically weaker than Sequential Consistency and allows reordering of operations within a program thread. Therefore, programs verified correct by assuming Sequential Consistency (that is, each thread proceeds in program order) may not behave correctly on certain platforms! One solution to this problem is to develop program checkers which are memory model sensitive. In this paper, we develop a bytecode level invariant checker for the programming language C#. Our checker identifies program states which are reached only because the C# memory model is more relaxed than Sequential Consistency. It employs partial order reduction strategies to speed up the search. These strategies are different from standard partial order reduction methods since our search also considers execution traces containing bytecode re-orderings. Furthermore, our checker identifies (a) operation re-orderings which cause undesirable states to be reached, and (b) simple program modifications—by inserting memory barrier operations—which prevent such undesirable re-orderings. A preliminary version of this paper appeared as (Huynh and Roychoudhury, in: Formal Methods Symposium, 2006). The conference paper is available from . This work was done when the first author was a Research Assistant at National University of Singapore. 相似文献

20.

An empirical time analysis of evolutionary algorithms as C programs

Sergio Nesmachnow Francisco Luna Enrique Alba 《Software》2015,45(1):111-142

This article presents an empirical study devoted to characterize the computational efficiency behavior of an evolutionary algorithm (usually called canonical) as a C program. The study analyzes the effects of several implementation decisions on the execution time of the resulting evolutionary algorithm. The implementation decisions studied include: memory utilization (using dynamic vs. static variables and local vs. global variables), methods for ordering the population, code substitution mechanisms, and the routines for generating pseudorandom numbers within the evolutionary algorithm. The results obtained in the experimental analysis allow us to conclude that significant improvements in efficiency can be gained by applying simple guidelines to best program an evolutionary algorithm in C. Copyright © 2013 John Wiley & Sons, Ltd. 相似文献