首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 0 毫秒
1.
We extend the notion of Store Atomicity [Arvind and Jan-Willem Maessen. Memory model = instruction reordering + store atomicity. In ISCA '06: Proceedings of the 33rd annual International Symposium on Computer Architecture, 2006] to a system with atomic transactional memory. This gives a fine-grained graph-based framework for defining and reasoning about transactional memory consistency. The memory model is defined in terms of thread-local Instruction Reordering axioms and Store Atomicity, which describes inter-thread communication via memory. A memory model with Store Atomicity is serializable: there is a unique global interleaving of all operations which respects the reordering rules and serializes all the operations in a transaction together. We extend Store Atomicity to capture this ordering requirement by requiring dependencies which cross a transaction boundary to point in to the initiating instruction or out from the committing instruction. We sketch a weaker definition of transactional serialization which accounts for the ability to interleave transactional operations which touch disjoint memory. We give a procedure for enumerating the behaviors of a transactional program—noting that a safe enumeration procedure permits only one transaction to read from memory at a time. We show that more realistic models of transactional execution require speculative execution. We define the conditions under which speculation must be rolled back, and give criteria to identify which instructions must be rolled back in these cases.  相似文献   

2.
硬件事务内存(hardware transactional memory, HTM)和可字节寻址的非易失性内存(nonvolatile memory, NVM)已经可以在新的计算机设备中使用.使用HTM确保一致性和隔离性,使用NVM确保持久性,组合使用两者可以实现满足原子性、一致性、隔离性和持久性(atomicity, consistency, isolation and durability, ACID)特性的事务.ACID事务在数据库中非常有价值,但由于数据库事务通常较大,其面临的挑战是HTM固有的容量限制和争用水平.首先提出了一种通过HTM进行ACID事务处理的软硬件解决方案——持久化HTM(persistent HTM, PHTM).使用2种方法来消除PHTM的局限性:1)持久化混合事务内存(persistent hybrid TM, PHyTM),允许PHTM事务与支持任意大小的纯软件事务(software transactional memory, STM)并发执行;2)分离事务执行(split transaction execution, STE)算法,该算法为关系数据库事务量身定制,解决了大多数事务超过PHTM的容量限制的问题.简而言之,讨论了利用NVM将HTM扩展到ACID数据库事务的问题.  相似文献   

3.
Unbounded Transactional Memory   总被引:1,自引:0,他引:1  
Transactional memory should be virtualized to support transactions of arbitrary footprint and duration. The unbounded transactional memory (UTM) architecture achieves high performance in the common case of small transactions, without sacrificing correctness in large transactions.  相似文献   

4.
多核处理器性能的发挥依靠程序的并行,共享存储并行编程模型为大多数多核处理器所采用,而有效同步多个线程对共享变量的访问是其关键、也是难题.借鉴数据库中事务的思想,人们提出事务存储(transactional memory),旨在提供一种编程简单,对程序正确性推理容易的同步手段.简介了事务存储的起源,诠释了事务存储系统的概念.论述了事务存储的编程接口和执行模型.讨论了事务存储系统所涉及的主要内容,对各种方法和策略进行了比较.对事务存储中有待解决的问题进行了探讨.最后介绍了几个开源的事务存储研究平台.  相似文献   

5.
In this paper we study transactional memory (TM) as a new tool for threading codes in this new era of multi- and many-core computers. In particular, we investigate the features and study the applicability of transactional memory as an efficient and easy-to-use alternative for handling memory conflicts in unstructured mesh simulations that use shared memory. The software tool used for our preliminary analysis of this novel construct is IBM’s freely available Software Transactional Memory (STM) system. For our studies, we developed the BUSTM benchmark which is a test code with state-of-the-art unstructured-mesh bookkeeping. The numerical algorithms are simplified yet still exhibit most of the salient features of modern unstructured mesh methods. We apply STM to two frequently used algorithm types used in multi-physics codes with realistic 3-D meshes. Our computational experiments indicate a good fit between these application scenarios and the TM features.  相似文献   

6.
近年来,研究者们针对持久性内存良好的性能,设计了轻量级的持久性事务内存系统,它通过日志机制保证了事务的原子性和一致性.然而,相比于传统内存,持久性内存的存储单元往往具有更高的写延迟,并且存在有限的耐久性.发现现有的持久性事务内存系统存在日志机制带来过多的写操作问题:一方面,现有系统没有区分出事务中不同类型的写操作,即无论是对内存中已有数据的更新操作还是向事务中新分配区域添加数据的写操作,现有系统都采用相同的日志机制保证它们的一致性;另一方面,现有系统将更新操作的地址和数据等字段完整地持久化到日志中,即使其中大部分数据都可以通过压缩算法减少写入量.这2方面导致了冗余的日志操作,带来了额外的写延迟和写磨损.为了解决上述问题,设计并实现了一种基于微日志的持久性事务内存系统TLPTM,主要提出2个优化技术:1)分配操作感知的日志优化策略(allocation-aware log optimization, AALO),AALO有效地避免了向事务中新分配区域添加数据的写操作产生的日志开销;2)基于压缩算法的日志优化策略(compression-based log optimization, CBLO),CBLO将日志数据压缩后再写入到日志中,减少了日志操作的写开销.测试结果表明:相比于Mnemosyne,提出的日志优化策略AALO将事务性能提高了15%~24%,基于提出的2种优化技术实现的TLPTM将日志的写入总量降低了70%~81%.  相似文献   

7.
新兴的非易失性内存(non-volatile memory, NVM)具有字节寻址、持久性、大容量和低功耗等优点,然而,在NVM上进行并发编程往往比较困难,用户既要保证数据的崩溃一致性又要保证并发的正确性.为了降低用户开发难度,研究人员提出持久性事务内存方案,但是现有持久性事务内存普遍存在扩展性较差问题.测试发现限制扩展性的关键因素在于全局逻辑时钟和冗余NVM写操作.针对这2个方面,提出了线程逻辑时钟方法,通过允许每个线程拥有一个独立时钟,消除全局逻辑时钟中心化问题;提出了缓存行感知的双版本方法,为数据维护2个版本,通过循环更新这2个版本来保证数据的崩溃一致性,从而消除冗余的NVM写操作.基于所提出的这2个方法,实现了一个基于时间戳的高扩展的持久性软件事务内存(scalable durable transactional memory, SDTM),对比测试显示,在YCSB负载下,与DudeTM和PMDK相比,SDTM的性能最多分别提高了2.8倍和29倍.  相似文献   

8.
随着多核处理器的发展,硬件平台已经提供了充裕的并行能力,这对软件并行编程提出了更高的要求.传统的基于锁机制的并行编程模型存在着诸多难题.借鉴数据库中事务的思想,人们提出事务存储,旨在提供一种可编程性良好的同步手段.硬件事务存储快速有效的优势使之成为研究的热点.阐述了事务存储的基本概念、执行模型和编程接口.介绍了硬件事务存储系统的三大核心内容,对比了两种典型的硬件事务存储系统.分析讨论了目前硬件事务存储系统研究的热点和难点问题.最后介绍了硬件事务存储研究的平台和测试程序.  相似文献   

9.
何裕南  安虹  郭锐  梁博 《计算机科学》2007,34(1):248-254
CPU设计正在由仅开发指令级并行性的单线程单核结构转向利用线程级并行性的多线程多核结构,但至今还没有一个可移植性好并被广泛使用的开源多核处理器模拟器,限制了在这样的结构上开展高质量的研究工作。我们开发了一个多核处理器体系结构模拟器OpenCMP,用于支持当前和未来对多线程多核处理器体系结构关键技术的研究。该模拟器适当地抽象了多核处理器结构,为主流的多核处理器结构研究提供一个可扩展、灵活的模拟工具框架,包括支持对乱序、顺序的处理器核和同时多线程处理器核的模拟,以便对更大的多核设计空间进行比较性研究。本文以支持事务存储模型的多核处理器结构模拟器为例,详细描述了如何通过抽象多核结构和事务存储模型的最基本特性和组成部分,扩展单核处理器模拟器SimpleScalar,设计与实现一个多核处理器模拟器。初步研究表明,与现有的多核处理器模拟器相比,该模拟器能够较好地支持对事务存储模型和基于事务存储模型的多核处理器体系结构的研究.  相似文献   

10.
陈嘉  安虹  刘圆  王莉 《计算机仿真》2007,24(6):81-85
多核结构上采用由用户显式制导的并行程序设计模型,使用锁和同步变量来实现同步.事务存储模型能够解决由锁机制带来的一系列问题,提高程序的并发性.介绍了在文中提出的一种基于事务存储模型的多核结构(Transactional-Memory based Chip Multiple-Superscaler,TMCMS)上的并行编程模型,以及针对循环程序的执行模型;以FFT程序为例具体介绍了循环结构的并行化方法和编译转换过程.在初步的实验中,将处理单元从1增加到16个时,在所设计的编程模型的支持下,IPC(Instruction Per Cycle)有接近线性的增长,说明该并行编程模型能够充分发掘程序中潜在的细粒度线程级并行性,同时保持并行程序设计的简单性.  相似文献   

11.
This paper presents Atomic RMI, a distributed transactional memory framework that supports the control flow model of execution. Atomic RMI extends Java RMI with distributed transactions that can run on many Java virtual machines located on different network nodes. Our system employs SVA, a fully-pessimistic concurrency control algorithm that provides exclusive access to shared objects and supports rollback and fault tolerance. SVA is capable of achieving a relatively high level of parallelism by interweaving transactions that access the same objects and by making transactions that do not share objects independent of one another. It also allows any operations within transactions, including irrevocable ones, like system calls, and provides an unobtrusive API. Our evaluation shows that in most cases Atomic RMI performs better than fine grained mutual-exclusion and read/write locking mechanisms. Atomic RMI also performs better than an optimistic transactional memory in environments with high contention and a high ratio of write operations, while being competitive otherwise.  相似文献   

12.
The Transactional Memory (TM) paradigm aims at simplifying the development of concurrent applications by means of the familiar abstraction of atomic transaction. After a decade of intense research, hardware implementations of TM have recently entered the domain of mainstream computing thanks to Intel’s decision to integrate TM support, codenamed RTM (Reduced Transactional Memory), in their last generation of processors.In this work we shed light on a relevant issue with great impact on the performance of Intel’s RTM: the correct tuning of the logic that regulates how to cope with failed hardware transactions. We show that the optimal tuning of this policy is strongly workload dependent, and that the relative difference in performance among the various possible configurations can be remarkable (up to 10 × slow-downs).We address this issue by introducing a simple and effective approach that aims to identify the optimal RTM configuration at run-time via lightweight reinforcement learning techniques. The proposed technique requires no off-line sampling of the application, and can be applied to optimize both the cases in which a single global lock or a software TM implementation is used as fall-back synchronization mechanism.We propose and evaluate different designs for the proposed self-tuning mechanisms, which we integrated with GCC in order to achieve full transparency for the programmers. Our experimental study, based on standard TM benchmarks, demonstrates average gains of 60% over any static approach while remaining within 5% from the performance of manually identified optimal configurations.  相似文献   

13.
Transactional Memory: An Overview   总被引:3,自引:0,他引:3  
Writing applications that benefit from the massive computational power of future multicore chip multiprocessors will not be an easy task for mainstream programmers accustomed to sequential algorithms rather than parallel ones. This article presents a survey of transactional memory, a mechanism that promises to enable scalable performance while freeing programmers from some of the burden of modifying their parallel code.  相似文献   

14.
Memory affinity has become a key element to achieve scalable performance on multi-core platforms. Mechanisms such as thread scheduling, page allocation and cache prefetching are commonly employed to enhance memory affinity which keeps data close to the cores that access it. In particular, software transactional memory (STM) applications exhibit irregular memory access behavior that makes harder to determine which and when data will be needed by each core. Additionally, existing STM runtime systems are decoupled from issues such as thread and memory management. In this paper, we thus propose a skeleton-driven mechanism to improve memory affinity on STM applications that fit the worklist pattern employing a two-level approach. First, it addresses memory affinity in the DRAM level by automatic selecting page allocation policies. Then it employs data prefetching helper threads to improve affinity in the cache level. It relies on a skeleton framework to exploit the application pattern in order to provide automatic memory page allocation and cache prefetching. Our experimental results on the STAMP benchmark suite show that our proposed mechanism can achieve performance improvements of up to 46 %, with an average of 11 %, over a baseline version on two NUMA multi-core machines.  相似文献   

15.
现有的事务内存研究主要面向多核处理器和SMP机器,缺少对CC-NUMA系统的研究.而CC-NUMA是高端服务器的重要体系结构,随着用户对并行处理能力需求的不断上升,高端服务器将占据越来越重要的地位.文中概要阐述事务内存研究的基本情况,通过详尽的实验数据,深入分析了CC-NUMA结构的本地、远程访存差异特性对事务内存性能的影响,提出了一种面向CC-NUMA体系结构的冲突规避方法PBC.PBC在事务启动之前,对冲突可能性进行预测,并根据预测结果对事务进行调度,以降低事务的失败率.实验表明,文中提出的PBC方法可以显著提高CC-NUMA机器上运行事务内存的整体性能.  相似文献   

16.
基于依赖图的硬件事务存储技术研究   总被引:1,自引:0,他引:1  
事务存储技术能够简化并行程序中对共享资源的访问控制,是当前的研究热点之一.目前,多数基于硬件的事务存储系统采用基于冲突检测与处理的并发控制协议,当检测到两事务发生冲突时就中止二者之一.但是对事务间"冲突"更深入的分析表明,某些"冲突"并不一定会导致事务的回退,这种冲突称为"弱冲突".基于依赖图的硬件事务存储技术能够避免弱冲突引发的多余事务回退.模拟实验表明,基于依赖图的事务存储系统与基于冲突处理的事务存储系统相比具有明显的性能优势.  相似文献   

17.
Subtleties of Transactional Memory Atomicity Semantics   总被引:1,自引:0,他引:1  
Transactional memory has great potential for simplifying multithreaded programming by allowing programmers to specify regions of the program that must appear to execute atomically. Transactional memory implementations then optimistically execute these transactions concurrently to obtain high performance. This work shows that the same atomic guarantees that give transactions their power also have unexpected and potentially serious negative effects on programs that were written assuming narrower scopes of atomicity. We make four contributions: (1) we show that a direct translation of lock-based critical sections into transactions can introduce deadlock into otherwise correct programs, (2) we introduce the terms strong atomicity and weak atomicity to describe the interaction of transactional and non-transactional code, (3) we show that code that is correct under weak atomicity can deadlock under strong atomicity, and (4) we demonstrate that sequentially composing transactional code can also introduce deadlocks. These observations invalidate the intuition that transactions are strictly safer than lock-based critical sections, that strong atomicity is strictly safer than weak atomicity, and that transactions are always composable  相似文献   

18.
This paper extensively evaluates the performance of View-Oriented Transactional Memory (VOTM) based on two implementations that adopt different Transactional Memory (TM) algorithms. The Restricted Admission Control (RAC) mechanism in VOTM plays a key role in the performance gains of VOTM. In this paper, we use six applications to evaluate the performance advantage of VOTM. Experimental results show that partitioning shared data into separate views can improve performance when one of the views has high contention while others may have low contention, because the contention of each view is independently controlled by RAC. For memory-intensive applications, even when the contention on application data is not high enough to justify admission control by RAC, partitioning shared data into different views can improve the performance of TM systems due to the reduced contention on the metadata of the TM systems.  相似文献   

19.
Many researchers have developed applications using transactional memory (TM) with the purpose of benchmarking different implementations, and studying whether or not TM is easy to use. However, comparatively little has been done to provide general-purpose tools for profiling and optimizing programs which use transactions. In this paper we introduce a series of profiling and optimization techniques for TM applications. The profiling techniques are of three types: (i) techniques to identify multiple potential conflicts from a single program run, (ii) techniques to identify the data structures involved in conflicts by using a symbolic path through the heap, rather than a machine address, and (iii) visualization techniques to summarize how threads spend their time and which of their transactions conflict most frequently. Altogether they provide in-depth and comprehensive information about the wasted work caused by aborting transactions. To reduce the contention between transactions we suggest several TM specific optimizations which leverage nested transactions, transaction checkpoints, early release and etc. To examine the effectiveness of the profiling and optimization techniques, we provide a series of illustrations from the STAMP TM benchmark suite and from the synthetic WormBench workload. First we analyze the performance of TM applications using our profiling techniques and then we apply various optimizations to improve the performance of the Bayes, Labyrinth and Intruder applications. We discuss the design and implementation of the profiling techniques in the Bartok-STM system. We process data offline or during garbage collection, where possible, in order to minimize the probe effect introduced by profiling.  相似文献   

20.
Transactional Memory: The Hardware-Software Interface   总被引:1,自引:0,他引:1  
As multicore chips become ubiquitous, the need to provide architectural support for practical parallel programming is reaching critical. Conventional lock-based concurrency control techniques are difficult to use, requiring the programmer to navigate through the minefield of coarse-versus fine-grained locks, deadlock, livelock, lock convoying, and priority inversion. This explicit management of concurrency is beyond the reach of the average programmer, threatening to waste the additional parallelism available with multicore architectures. This comprehensive architecture supports nested transactions, transactional handlers, and two-phase commit. The result is a seamless integration of transactional memory with modern programming languages and runtime environments  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号