1.

Selective dynamic serialization for reducing energy consumption in hardware transactional memory systems

Epifanio Gaona J. Rubén Titos-Gil Juan Fernández Manuel E. Acacio 《The Journal of supercomputing》2014,68(2):914-934

In the search for new paradigms to simplify multithreaded programming, Transactional Memory (TM) is currently being advocated as a promising alternative to deadlock-prone lock-based synchronization. In this way, future many-core CMP architectures may need to provide hardware support for TM. On the other hand, power dissipation constitutes a first-class consideration in multicore processor designs. In this work, we propose Selective Dynamic Serialization (SDS) as a new technique to improve energy consumption without degrading performance in applications with conflicting transactions by avoiding wasted work due to aborted transactions. Our proposal, which is implemented on top of a hardware transactional memory (HTM) system with an eager conflict management policy, detects and serializes conflicting transactions dynamically (at run-time). In its simplest form, in case of conflict, one transaction is allowed to continue whilst the rest are completely stalled. Once the executing transaction has finished, it wakes up several of the stalled transactions. More elaborate implementations of SDS try to delay this behavior until serialization of transactions is profitable, achieving the best trade-off between performance, energy savings and network traffic. SDS implementations differ from each other in the condition that triggers the serialization mode. We have evaluated several SDS schemes using GEMS, a full-system simulator implementing the LogTM-SE Eager–Eager HTM system, and several benchmarks from the STAMP suite. Results for a 16-core CMP show that SDS obtains reductions of 6% on average in energy consumption (more than 20% in high contention scenarios) in a wide range of benchmarks without affecting, on average, execution time. At the same time, network traffic level is also reduced by 22%.
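
The simplest form of serialization described in this abstract (one conflicting transaction runs, the rest stall and are woken on commit) can be illustrated with a toy model. The sketch below is not the SDS hardware mechanism: the function names, the cycle-count model, and the abort-and-retry baseline are all assumptions made for illustration. It only shows why stalling can finish at the same time as aborting while discarding no work.

```python
from collections import deque


def run_serialized(lengths):
    """Toy serialization: one conflicting transaction runs, the rest stall.

    `lengths` holds hypothetical transaction durations in cycles.
    Returns (finish_time, wasted_cycles).
    """
    stalled = deque(lengths)  # FIFO of stalled transactions
    clock = 0
    while stalled:
        # The transaction that just committed wakes the next stalled one.
        clock += stalled.popleft()
    return clock, 0  # nothing is ever aborted, so no work is discarded


def run_abort_retry(lengths):
    """Toy baseline: all pending transactions run together; the first one
    commits, the others abort at that moment and restart from scratch."""
    pending = list(lengths)
    clock = 0
    wasted = 0
    while pending:
        winner = pending.pop(0)
        clock += winner
        # Each loser discards the cycles it executed while the winner ran.
        wasted += sum(min(length, winner) for length in pending)
    return clock, wasted


if __name__ == "__main__":
    print(run_serialized([10, 20, 30]))
    print(run_abort_retry([10, 20, 30]))
```

In this model both policies finish at the same time, but only the baseline spends cycles on work that is later thrown away, which is the kind of waste the abstract attributes to aborted transactions.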

2.

Analysing software prefetching opportunities in hardware transactional memory

Shimchenko Marina Titos-Gil Rubén Fernández-Pascual Ricardo Acacio Manuel E. Kaxiras Stefanos Ros Alberto Jimborean Alexandra 《The Journal of supercomputing》2022,78(1):919-944

Hardware transactional memory emerged to make parallel programming more accessible. However, the performance pitfall of this technique is squashing speculatively...

3.

Fast and efficient commits for Lazy-Lazy hardware transactional memory

Epifanio Gaona José L. Abellán Manuel E. Acacio 《The Journal of supercomputing》2015,71(12):4305-4326

4.

Efficient execution of speculative threads and transactions with hardware transactional memory

《Future Generation Computer Systems》2014

Thread-level speculation (TLS) was researched to automatically parallelize portions of serial programs for execution, and transactional memory (TM) was studied as a promising alternative to locks for parallel programming due to its simplicity. Both TLS and TM require similar underlying support. In the paper, we present SeTM (sequential transactional memory), a hardware-enhanced TM system which supports TLS at minor extra cost. Signatures are an effective way to buffer speculative state in TM and TLS, but they cripple TM and TLS performance because of false positives in conflict detection, especially for conflict-intensive TLS. SeTM adopts R/W bits and signatures concurrently to ameliorate this bad influence. Additionally, SeTM introduces a fast rollback mechanism, which provides fast abort recovery for eager log-based HTM and TLS. The most important contribution of SeTM is the conflict-tolerant mechanism, which tolerates some ambiguous data conflicts in TLS. Finally, in order to achieve efficient execution of unordered transactions, we add an extra ordering mechanism to SeTM. With this ordering mechanism, the transactions in TM can also gain performance improvements with the support of the conflict-tolerant mechanism. Our evaluation covers TM and TLS separately. For the TLS applications, six representative benchmarks have been adopted to evaluate the above model. Our experimental results show that our scheme improves the execution performance of most tested codes at a modest hardware cost. For a set of important scientific loops, we report a highest speedup of 6.5 with 15 cores. In addition, experimental results show good scalability of the SeTM system. For the TM applications, the benchmarks from STAMP also gain significant performance improvements with respect to LogTM-SE.

5.

Optimised memory allocation for less false abortion and better performance in hardware transactional memory

Xiuhong Li Altenbek Gulila 《International Journal of Parallel, Emergent and Distributed Systems》2020,35(4):483-491

This paper introduces and tackles a special performance hazard in Hardware Transactional Memory (HTM): false abortion. False abortion causes many unnecessary transaction aborts in HTM and can greatly impact performance, making HTM less useful when it is adopted as a fast path for Software Transactional Memory. By introducing a new memory allocator design, we are able to put objects that are likely to be accessed together from different threads into different cache lines and thus avoid conflicts between hardware transactions in different threads. Experiments show that our method can reduce transaction aborts by 47% and achieve a speedup of up to 1.67× (22% on average), while consuming only 14% more memory, showing great potential to enhance current HTM technology.
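
The placement idea in this abstract, keeping objects touched by different threads out of the same cache line, can be sketched with a small address-assignment model. This is not the paper's allocator: the class name, the 64-byte line size, and the simplification of using the allocating thread as a proxy for "likely to be accessed from different threads" are all assumptions for illustration.

```python
CACHE_LINE = 64  # bytes; assumed line size, not taken from the paper


class PerThreadLineAllocator:
    """Hypothetical bump allocator that never lets two threads share a line."""

    def __init__(self, line_size=CACHE_LINE):
        self.line_size = line_size
        self.next_free_line = 0  # index of the next unclaimed cache line
        self.cursor = {}         # thread id -> (line index, bytes used in it)

    def alloc(self, thread_id, size):
        """Return a byte address for a `size`-byte object owned by `thread_id`."""
        if size > self.line_size:
            raise ValueError("object larger than one cache line")
        line, used = self.cursor.get(thread_id, (None, self.line_size))
        if line is None or used + size > self.line_size:
            # Claim a fresh line rather than packing into another thread's line.
            line, used = self.next_free_line, 0
            self.next_free_line += 1
        self.cursor[thread_id] = (line, used + size)
        return line * self.line_size + used


if __name__ == "__main__":
    allocator = PerThreadLineAllocator()
    a = allocator.alloc(thread_id=0, size=16)
    b = allocator.alloc(thread_id=1, size=16)
    print(a // CACHE_LINE, b // CACHE_LINE)  # different line indices
```

The cost of this kind of placement is partially filled lines, which is consistent with the abstract's report of extra memory consumption.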

6.

Embedded-TM: Energy and complexity-effective hardware transactional memory for embedded multicore systems

Cesare Ferri Samantha Wood Tali Moreshet R. Iris Bahar Maurice Herlihy 《Journal of Parallel and Distributed Computing》2010

We investigate how transactional memory can be adapted for embedded systems. We consider energy consumption and complexity to be driving concerns in the design of these systems and therefore adapt simple hardware transactional memory (HTM) schemes in our architectural design. We propose several different cache structures and contention management schemes to support HTM and evaluate them in terms of energy, performance, and complexity. We find that ignoring energy considerations can lead to poor design choices, particularly for resource-constrained embedded platforms. We conclude that with the right balance of energy efficiency and simplicity, HTM will become an attractive choice for future embedded system designs.

7.

Research on transactional memory (total citations: 1; self-citations: 0; citations by others: 1)

Huang Guorui, Zhang Ping, Wei Guangbo, Ma Hang 《Computer Engineering and Design》2010,31(2)

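To study parallel programming on multicore processor systems, this work investigates the transactional memory model. It explains transactional memory, introduces the ways in which transactional memory systems are implemented, and uses four transactional memory systems to describe these implementations in detail. It focuses on six key technologies that affect the development of transactional memory, including implementation approach, data structure organization, concurrency control, conflict detection, and contention management. It concludes that transactional memory will evolve toward combined hardware-software designs, higher performance, stronger correctness guarantees, and meeting the needs of multicore applications.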

8.

Towards formally specifying and verifying transactional memory

Simon Doherty Lindsay Groves Victor Luchangco Mark Moir 《Formal Aspects of Computing》2013,25(5):769-799

Over the last decade, great progress has been made in developing practical transactional memory (TM) implementations, but relatively little attention has been paid to precisely specifying what it means for them to be correct, or formally proving that they are. In this paper, we present TMS1 (Transactional Memory Specification 1), a precise specification of correct behaviour of a TM runtime library. TMS1 targets TM runtimes used to implement transactional features in an unmanaged programming language such as C or C++. In such contexts, even transactions that ultimately abort must observe consistent states of memory; otherwise, unrecoverable errors such as divide-by-zero may occur before a transaction aborts, even in a correct program in which the error would not be possible if transactions were executed atomically. We specify TMS1 precisely using an I/O automaton (IOA). This approach enables us to also model TM implementations using IOAs and to construct fully formal and machine-checked correctness proofs for them using well established proof techniques and tools. We outline key requirements for a TM system. To avoid precluding any implementation that satisfies these requirements, we specify TMS1 to be as general as we can, consistent with these requirements. The cost of such generality is that the condition does not map closely to intuition about common TM implementation techniques, and thus it is difficult to prove that such implementations satisfy the condition. To address this concern, we present TMS2, a more restrictive condition that more closely reflects intuition about common TM implementation techniques. We present a simulation proof that TMS2 implements TMS1, thus showing that to prove that an implementation satisfies TMS1, it suffices to prove that it satisfies TMS2. We have formalised and verified this proof using the PVS specification and verification system.

9.

A model of dynamic separation for transactional memory

Martín Abadi Tim Harris Katherine F. Moore 《Information and Computation》2010,208(10):1093-1117

Dynamic separation is a new programming discipline for systems with transactional memory. We study it formally in the setting of a small calculus with transactions. We provide a precise formulation of dynamic separation and compare it with other programming disciplines. Furthermore, exploiting dynamic separation, we investigate some possible implementations of the calculus and we establish their correctness.

10.

Boosting performance of transactional memory through O-GEHL predictors

Ehsan Atoofian 《Microprocessors and Microsystems》2014

Time-based Software Transactional Memory (STM) exploits a global clock to validate transactional data and guarantee consistency of transactions. While this method is simple to implement, it results in contention over the clock if transactions commit simultaneously. The alternative method is thread local clock (TLC), which exploits local variables to maintain consistency of transactions. However, TLC may increase false aborts and degrade performance of STMs. In this paper, we analyze global clock and TLC in the context of STM systems, highlighting both the implementation trade-offs and the performance implications of the two techniques. We demonstrate that neither global clock nor TLC is optimal across applications. To counter this challenge, we introduce two optimization techniques. The first is Adaptive Clock (AC), which dynamically selects one of the two validation techniques based on the probability of conflicts. AC is a speculative approach and relies on software O-GEHL predictors to speculate future conflicts. The second is AC+, which reduces the timing overhead of O-GEHL predictors by implementing the predictors in hardware. In addition, we exploit information theory to eliminate unnecessary computational resources and reduce storage requirements of the O-GEHL predictors. Our evaluation with TL2 and the STAMP benchmark suite reveals that AC is effective and improves execution time of transactional applications by up to 65%.

11.

Design of a transactional memory framework for data centers

Sun Yong 《Computer Engineering and Applications》2011,47(27):74-76

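Targeting the characteristics of cloud computing data centers built from computer clusters, this paper proposes a distributed programming framework based on transactional memory. The framework encapsulates cloud computing tasks as transactions and automatically handles the scheduling, load balancing, and fault recovery of all transactions. It encapsulates the data center's distributed data as transactional objects and guarantees the ACID properties when transactions access them. Compared with similar work, it does not require users to deal with the parallel control of their programs, which makes it simple and easy to use. The framework has been implemented in a simulation environment, and experimental results show that it has good scalability and fault tolerance.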

12.

Improving performance of software transactional memory through contention locality

Ehsan Atoofian 《The Journal of supercomputing》2013,64(2):527-547

In this paper, we introduce contention locality in Transactional Memory (TM), which describes the likelihood that a previously aborted transaction conflicts again in the future. We find that conflicts are highly predictable in TMs and we propose two optimization techniques based on contention locality. The first is Speculative Contention Avoidance (SCA). SCA dynamically controls the number of concurrently executing transactions and serializes those transactions that are likely to conflict. As such, SCA reduces contention in TMs and improves performance. The second is Adaptive Validation (AV). We show that there is no single validation policy that works well across all applications. AV adjusts validation based on applications’ behavior and improves performance of TMs. In this paper, SCA and AV are evaluated using Transactional Locking II (TL2) and the STAMP v0.9.10 benchmark suite. The evaluation reveals that SCA and AV are effective and improve performance significantly.

13.

Soft-error mitigation by means of decoupled transactional memory threads

Daniel Sánchez Juan M. Cebrián José M. García Juan L. Aragón 《Distributed Computing》2015,28(2):75-90

14.

Implementation tradeoffs in the design of flexible transactional memory support

Arrvindh Shriraman Sandhya Dwarkadas Michael L. Scott 《Journal of Parallel and Distributed Computing》2010

We present FlexTM (FLEXible Transactional Memory), a high performance TM framework that allows software to determine when (eagerly, lazily, or in a mixed fashion) and how to manage conflicts, while employing hardware to manage transactional state and to track conflicts. FlexTM coordinates four decoupled hardware mechanisms: read and write signatures, which summarize per-thread access sets; per-thread conflict summary tables (CSTs), which identify the processors with which conflicts have occurred; Programmable Data Isolation, which buffers speculative updates in the local cache and uses an overflow table to handle unbounded updates; and Alert-On-Update, which notifies a thread immediately when a specified location is written by another processor. The CSTs enable an STM-inspired commit protocol that manages conflicts in a decentralized manner (no global arbitration) and allows parallel commits.

15.

TurboLock: increasing associativity of lock table in transactional memory

Amir Ghanbari Bavarsad Ehsan Atoofian 《Computing》2015,97(6):649-661

16.

An analytic framework for performance modeling of software transactional memory

Armin Heindl Gilles Pokam 《Computer Networks》2009,53(8):1202-1214

Analytic models based on discrete-time Markov chains (DTMC) are proposed to assess the algorithmic performance of Software Transactional Memory (STM) systems. Base STM variants are compared: optimistic STM with in-place memory updates and write buffering, and pessimistic STM. Starting from an absorbing DTMC, closed-form analytic expressions are developed, which are quickly solved iteratively to determine key parameters of the considered STM systems, like the mean number of transaction restarts and the mean transaction length. Since the models reflect complex transactional behavior in terms of read/write locking, data consistency checks and conflict management independent of implementation details, they highlight the algorithmic performance advantages of one system over the other. Because these differences are at times small, they are often blurred by the implementation of STM systems and are difficult to discern even with statistically significant discrete-event simulations.

17.

Identifying the optimal level of parallelism in transactional memory applications

Diego Didona Pascal Felber Derin Harmanci Paolo Romano Jörg Schenker 《Computing》2015,97(9):939-959

18.

Window-based greedy contention management for transactional memory: theory and practice

Gokarna Sharma Costas Busch 《Distributed Computing》2012,25(3):225-248

We consider greedy contention managers for transactional memory for M × N execution windows of transactions with M threads and N transactions per thread. We present, formally analyze, and experimentally evaluate three new randomized greedy contention management algorithms for transaction windows. Assuming that each transaction has duration τ and conflicts with at most C other transactions inside the window, the first algorithm, Offline-Greedy, produces a schedule of length O(τ · (C + N · log(MN))) with high probability. The offline algorithm depends on knowing the conflict graph, which evolves while the execution of the transactions progresses. The second algorithm, Online-Greedy, produces a schedule whose length is only a logarithmic factor worse than Offline-Greedy, but does not require knowledge of the conflict graph. The third algorithm, Adaptive-Greedy, is the adaptive version of the previous algorithms; it produces a schedule of asymptotically the same length as the online algorithm by adaptively guessing the value of C. All of the algorithms exhibit a competitive ratio very close to O(s), where s is the number of shared resources, and at the same time, our algorithms provide new non-trivial tradeoffs for greedy transaction scheduling that parameterize window sizes and transaction conflicts within the execution window. We evaluate these window-based algorithms experimentally using the sorted linked list, red-black tree, skip list, and vacation benchmarks. The evaluation results confirm their benefits in practical performance throughput and other metrics such as aborts per commit ratio and execution time overhead, along with the non-trivial provable properties of the algorithms.

19.

A JSON-based object serialization algorithm (total citations: 4; self-citations: 0; citations by others: 4)

Zhang Tao, Huang Qiang, Mao Leiya, Gao Xing 《Computer Engineering and Applications》2007,43(15):98-100

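Web development based on Ajax currently relies mainly on XML for data exchange. However, XML is a structured document format that must be parsed manually on both the server and the client, which consumes more system resources; using XML for data exchange therefore leads to poor performance, insufficient compatibility, and low responsiveness. JSON (JavaScript Object Notation) is a lightweight data-interchange format that is easily parsed by browsers that support JavaScript. This paper proposes a JSON-based object serialization algorithm that analyzes the JSON grammar and builds an object navigation graph in order to transparently serialize Java objects into JSON expressions. The client can then use its JavaScript engine to parse JSON responses, which effectively resolves the drawbacks of parsing XML.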
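
The general idea of walking an object graph and emitting JSON can be sketched as follows. This is a minimal illustration, not the algorithm from the paper above (which targets Java objects and is not described here in enough detail to reproduce): the helper name `to_jsonable`, the decision to reject cyclic references, and the example classes are all assumptions.

```python
import json


def to_jsonable(obj, _path=None):
    """Recursively convert an object graph into JSON-compatible values.

    `_path` tracks the ids of objects on the current traversal path so that
    cyclic references are reported instead of recursing forever.
    """
    path = set() if _path is None else _path
    if obj is None or isinstance(obj, (bool, int, float, str)):
        return obj
    if id(obj) in path:
        raise ValueError("cyclic reference cannot be serialized")
    path.add(id(obj))
    try:
        if isinstance(obj, (list, tuple)):
            return [to_jsonable(item, path) for item in obj]
        if isinstance(obj, dict):
            return {str(k): to_jsonable(v, path) for k, v in obj.items()}
        # Plain objects: follow their attributes, i.e. the edges of the graph.
        return {k: to_jsonable(v, path) for k, v in vars(obj).items()}
    finally:
        path.discard(id(obj))


class Point:  # hypothetical example type
    def __init__(self, x, y):
        self.x, self.y = x, y


class Segment:  # hypothetical example type
    def __init__(self, start, end):
        self.start, self.end = start, end


if __name__ == "__main__":
    segment = Segment(Point(0, 1), Point(2, 3))
    print(json.dumps(to_jsonable(segment), sort_keys=True))
```

A browser can then parse the resulting string with `JSON.parse`, with no XML parsing step on the client.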

20.

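Application and study of serialization techniques in virtual chemistry experiments (total citations: 1; self-citations: 0; citations by others: 1)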

Lu Yuncan, Hu Shan, Yang Chun, Tan Liang 《Computers and Applied Chemistry》2008,25(1):111-114

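Virtual laboratories are an active research topic in education, and storing experiments is a very important functional module of such systems. Taking a secondary-school chemistry experiment simulation system (CVExperiment) developed with VC and OpenGL as an example, this paper describes in detail the serialization of relatively complex objects and proposes different serialization methods according to the different relationships between objects.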