Similar Articles
20 similar articles found.
1.
We establish a lower bound on the number of remote memory references required by N-process mutual exclusion algorithms based on reads, writes, or comparison primitives such as test-and-set and compare-and-swap. Our bound improves an earlier lower bound established by Cypher. Our lower bound is of importance for two reasons. First, it almost matches the time complexity of the best known algorithms based on reads, writes, or comparison primitives. Second, our lower bound suggests that it is likely that, from an asymptotic standpoint, comparison primitives are no better than reads and writes when implementing local-spin mutual exclusion algorithms. Thus, comparison primitives may not be the best choice to provide in hardware if one is interested in scalable synchronization. Received: January 2002 / Accepted: September 2002. Work supported by NSF grants CCR 9732916, CCR 9972211, CCR 9988327, ITR 0082866, and CCR 0208289.
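For reference, a minimal test-and-set spin lock in C11 atomics is sketched below. It illustrates the kind of comparison-primitive-based lock the lower bound applies to; it is not the construction analyzed in the paper, and in the distributed-shared-memory cost model every failed test-and-set in its spin loop counts as one remote memory reference.

```c
#include <stdatomic.h>

/* Minimal test-and-set spin lock (sketch, not the paper's algorithm). */
typedef struct { atomic_flag locked; } tas_lock;
#define TAS_LOCK_INIT { ATOMIC_FLAG_INIT }

static void tas_acquire(tas_lock *l) {
    /* Each failed test-and-set on the shared flag is a remote memory
     * reference under the DSM cost model -- the quantity the bound counts. */
    while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
        ;  /* spin */
}

static void tas_release(tas_lock *l) {
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```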

2.
We consider the time complexity of adaptive mutual exclusion algorithms, where “time” is measured by counting the number of remote memory references required per critical-section access. For systems that support (only) read, write, and comparison primitives (such as compare-and-swap), we establish a lower bound that precludes a deterministic algorithm with o(k) time complexity, where k is point contention. In particular, it is impossible to construct a deterministic O(log k) algorithm based on such primitives.

3.
A mutual exclusion mechanism that is both fair and space efficient can be highly valuable for shared memory systems under time and memory constraints such as embedded real-time systems. Several algorithms that utilize only one shared variable and guarantee a certain level of fairness have been proposed. However, these use hypothetical read-modify-write operations that have never been implemented in any system. This paper presents two fair algorithms that do not use such operations, each of which uses a single additional shared variable. The proposed algorithms employ commonly available operations, fetch&store and read/write, on two shared variables. The first algorithm satisfies the bounded-bypass condition. The second is an improvement on the first that satisfies the FIFO condition, which is the most stringent fairness condition. Additionally, it is shown that achieving the bounded-bypass condition using the same set of operations requires two shared variables. Both of the algorithms are thus optimal with respect to the number of shared variables.
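As background on how fetch&store yields fairness, the sketch below shows the well-known CLH-style queue lock, which uses only fetch&store (atomic exchange) plus reads and writes and grants the lock in FIFO order. Unlike the algorithms described above, it needs one queue node per process rather than a constant number of shared variables, so it is context only, not the paper's construction.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct clh_node { _Atomic bool locked; } clh_node;
typedef struct { _Atomic(clh_node *) tail; } clh_lock;

void clh_init(clh_lock *l) {
    clh_node *dummy = malloc(sizeof *dummy);
    atomic_store(&dummy->locked, false);
    atomic_store(&l->tail, dummy);
}

/* Returns the predecessor's node, which the caller should reuse as its own
 * node on the next acquisition. */
clh_node *clh_acquire(clh_lock *l, clh_node *me) {
    atomic_store(&me->locked, true);
    /* fetch&store (atomic exchange) appends `me` to the queue and returns the
     * predecessor, fixing a FIFO order among contenders. */
    clh_node *pred = atomic_exchange(&l->tail, me);
    while (atomic_load(&pred->locked))
        ;  /* spin locally on the predecessor's flag */
    return pred;
}

void clh_release(clh_node *me) {
    atomic_store(&me->locked, false);  /* hand the lock to the successor */
}
```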

4.
Mutual exclusion is a fundamental distributed coordination problem. Shared-memory mutual exclusion research focuses on local-spin algorithms and uses the remote memory references (RMRs) metric. Attiya, Hendler, and Woelfel (40th STOC, 2008) established an Ω(log N) lower bound on the number of RMRs incurred by processes as they enter and exit the critical section, where N is the number of processes in the system. This matches the upper bound of Yang and Anderson (Distrib. Comput. 9(1):51–60, 1995). The upper and lower bounds apply for algorithms that only use read and write operations. The lower bound of Attiya et al., however, only holds for deterministic algorithms. The question of whether randomized mutual exclusion algorithms, using reads and writes only, can achieve sub-logarithmic expected RMR complexity remained open. We answer this question in the affirmative by presenting starvation-free randomized mutual exclusion algorithms for the cache coherent (CC) and the distributed shared memory (DSM) model that have sub-logarithmic expected RMR complexity against the strong adversary. More specifically, each process incurs an expected number of O(log N / log log N) RMRs per passage through the entry and exit sections, while in the worst case the number of RMRs is O(log N).

5.
Most weak memory consistency models are incapable of supporting a solution to mutual exclusion using only read and write operations to shared variables. Processor Consistency-Goodman's version (PC-G) is an exception. Ahamad et al. showed that Peterson's mutual exclusion algorithm is correct for PC-G, but Lamport's bakery algorithm is not. This paper derives a lower bound on the number and type (single-writer or multi-writer) of variables that a mutual exclusion algorithm must use in order to be correct for PC-G. Specifically, any such solution for n processes must use at least one multi-writer variable and n single-writer variables. Peterson's algorithm for two processes uses one multi-writer and two single-writer variables, and therefore establishes that this bound is tight for two processes. This paper presents a new n-process algorithm for mutual exclusion that is correct for PC-G and achieves the bound for any n. While Peterson's algorithm is fair, this extension to arbitrary n is not fair. Six known algorithms that use the same number and type of variables are shown to fail to guarantee mutual exclusion when the memory consistency model is only PC-G, as opposed to the Sequential Consistency model for which they were designed. A corollary of our investigation is that, in contrast to Sequential Consistency, multi-writer variables cannot be implemented from single-writer variables in a PC-G system.
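For concreteness, Peterson's two-process algorithm uses exactly the resources in the tight two-process bound above: one multi-writer variable (turn) and two single-writer flags. A minimal C sketch follows; sequentially consistent C11 atomics stand in here for the PC-G memory model studied in the paper.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool flag[2];   /* single-writer: flag[i] is written only by process i */
static atomic_int  turn;      /* multi-writer: written by both processes */

void peterson_lock(int self) {          /* self is 0 or 1 */
    int other = 1 - self;
    atomic_store(&flag[self], true);    /* announce interest */
    atomic_store(&turn, other);         /* give priority to the other process */
    while (atomic_load(&flag[other]) && atomic_load(&turn) == other)
        ;  /* busy-wait */
}

void peterson_unlock(int self) {
    atomic_store(&flag[self], false);
}
```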

6.
First-Come-First-Served (FCFS) mutual exclusion (ME) is the problem of ensuring that processes attempting to concurrently access a shared resource do so one by one, in a fair order. In this paper, we close the complexity gap between FCFS ME and ME in the asynchronous shared memory model where processes communicate using atomic reads and writes only, and do not fail. Our main result is the first known FCFS ME algorithm that makes O(log N) remote memory references (RMRs) per passage and uses only atomic reads and writes. Our algorithm is also adaptive to point contention. More precisely, the number of RMRs a process makes per passage in our algorithm is Θ(min(k, log N)), where k is the point contention. Our algorithm matches known RMR complexity lower bounds for the class of ME algorithms that use reads and writes only, and beats the RMR complexity of prior algorithms in this class that have the FCFS property.
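The classic FCFS read/write algorithm is Lamport's bakery, sketched below for an assumed process count N; it provides the FCFS property using only reads and writes, but each passage scans all N processes and its busy-waiting is not local, which is the kind of cost the O(log N)-RMR algorithm described above avoids. The sketch is background, not the paper's algorithm.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define N 8                       /* assumed number of processes */

static atomic_bool choosing[N];
static atomic_int  number[N];

void bakery_lock(int i) {
    /* Doorway: take a ticket larger than every ticket currently visible. */
    atomic_store(&choosing[i], true);
    int max = 0;
    for (int j = 0; j < N; j++) {
        int nj = atomic_load(&number[j]);
        if (nj > max) max = nj;
    }
    atomic_store(&number[i], max + 1);
    atomic_store(&choosing[i], false);
    /* Wait behind every process with a smaller (ticket, id) pair. */
    for (int j = 0; j < N; j++) {
        if (j == i) continue;
        while (atomic_load(&choosing[j]))
            ;
        while (atomic_load(&number[j]) != 0 &&
               (atomic_load(&number[j]) < atomic_load(&number[i]) ||
                (atomic_load(&number[j]) == atomic_load(&number[i]) && j < i)))
            ;
    }
}

void bakery_unlock(int i) { atomic_store(&number[i], 0); }
```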

7.
We consider asynchronous multiprocessors where processes communicate only by reading or writing shared memory. We show how to implement consensus, compare-and-swap and other comparison primitives, as well as load-linked/store-conditional (LL/SC) using only a constant number of remote memory references (RMRs), in both the cache-coherent and the distributed-shared-memory models of such multiprocessors. Our implementations are blocking, rather than wait-free: they ensure progress provided all processes that invoke the implemented primitive are live. Our results imply that any algorithm using read and write operations, and either comparison primitives or LL/SC, can be simulated by an algorithm that uses read and write operations only, with at most a constant-factor increase in RMR complexity.
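The blocking flavor of such simulations can be seen in the toy compare-and-swap object below, where a pthread mutex stands in for the read/write-only, constant-RMR lock that the paper actually constructs; the point is only the progress condition, namely that every invocation completes provided all invoking processes remain live.

```c
#include <pthread.h>
#include <stdbool.h>

/* Blocking CAS object (sketch). The mutex is a stand-in; the paper builds the
 * required mutual exclusion from reads and writes with O(1) RMRs. */
typedef struct {
    pthread_mutex_t lock;
    long value;
} blocking_cas;

#define BLOCKING_CAS_INIT { PTHREAD_MUTEX_INITIALIZER, 0 }

bool bcas(blocking_cas *o, long expected, long desired) {
    pthread_mutex_lock(&o->lock);
    bool ok = (o->value == expected);
    if (ok)
        o->value = desired;        /* swap only if the expected value was found */
    pthread_mutex_unlock(&o->lock);
    return ok;
}
```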

8.
In the problem of mutual exclusion, concurrent access to a shared resource using a structural program abstraction called a critical section (CS) must be synchronized such that at any time only one process can enter the CS. In a distributed system, due to the lack of both a shared memory and a global clock, and due to unpredictable message delay, the design of a distributed mutual exclusion algorithm that is free from deadlock and starvation is much more complex than that in a centralized system. Based on different assumptions about communication topologies and a widely varying amount of information maintained by each site about other sites, several distributed mutual exclusion algorithms have been proposed. In this paper, we survey and analyze several well-known distributed mutual exclusion algorithms according to their related characteristics. We also compare the performance of these algorithms by a simulation study. Finally, we present a comparative analysis of these algorithms.

9.
Irregular parallel algorithms pose a significant challenge for achieving high performance because of the difficulty of predicting memory access patterns or execution paths. Within an irregular application, fine-grained synchronization is one technique for managing the coordination of work, but in practice the actual performance for irregular problems depends on the input, the access pattern to shared data structures, the relative speed of processors, and the hardware support for synchronization primitives. In this paper, we focus on lock-free and mutual exclusion protocols for handling fine-grained synchronization. Mutual exclusion and lock-free protocols have received a fair amount of attention in coordinating accesses to shared data structures from concurrent processes. Mutual exclusion offers a simple programming abstraction, while lock-free data structures provide better fault tolerance and eliminate problems associated with critical sections such as priority inversion and deadlock. These synchronization protocols, however, are seldom used in parallel algorithm designs, especially for algorithms under the SPMD paradigm, as their implementations are highly hardware dependent and their costs are hard to characterize. Using graph-theoretic algorithms for illustrative purposes, we present experimental results on two shared-memory multiprocessors, the IBM pSeries 570 and the Sun Enterprise 4500, showing that irregular parallel algorithms with efficient fine-grained synchronization may yield good performance.
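A minimal illustration of the two styles contrasted here: a lock-free update retries a compare-and-swap until it succeeds, whereas a mutual-exclusion version would wrap the same update in a critical section. The names below are ours, chosen for illustration; real irregular codes update richer shared structures such as adjacency lists or visited flags.

```c
#include <stdatomic.h>

static _Atomic long shared_total;     /* hypothetical shared accumulator */

/* Lock-free update: retry the CAS until no other process interfered. */
void add_lockfree(long delta) {
    long old = atomic_load(&shared_total);
    while (!atomic_compare_exchange_weak(&shared_total, &old, old + delta))
        ;  /* on failure, `old` is reloaded with the current value */
}
```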

10.
In this paper we consider the mutual exclusion problem on a multiple access channel. Mutual exclusion is one of the fundamental problems in distributed computing. In the classic version of this problem, n processes execute a concurrent program that occasionally triggers some of them to use shared resources, such as memory, a communication channel, or a device. The goal is to design a distributed algorithm to control entries and exits to/from the shared resource (also called a critical section), in such a way that at any time there is at most one process accessing it. In our considerations, the shared resource is the shared communication channel itself (a multiple access channel), and the main challenge arises because the channel is also the only means of communication between these processes. We consider both the classic and a slightly weaker version of mutual exclusion, called ε-mutual-exclusion, where for each period of a process staying in the critical section the probability that there is some other process in the critical section is at most ε. We show that there are channel settings where the classic mutual exclusion is not feasible even for randomized algorithms, while ε-mutual-exclusion is. In more relaxed channel settings, we prove an exponential gap between the makespan complexity of the classic mutual exclusion problem and its weaker ε-mutual-exclusion version. We also show how to guarantee fairness of mutual exclusion algorithms, i.e., that each process that wants to enter the critical section will eventually succeed.

11.
This paper is concerned with synchronization under read/write atomicity in shared-memory multiprocessors. We present a new algorithm for N-process mutual exclusion that requires only read and write operations and that has O(log N) time complexity, where time is measured by counting remote memory references. The time complexity of this algorithm is better than that of all prior solutions to the mutual exclusion problem that are based upon atomic read and write instructions; in fact, the time complexity of most prior solutions is unbounded. Performance studies are presented that show that our mutual exclusion algorithm exhibits scalable performance under heavy contention. In fact, its performance rivals that of the fastest queue-based spin locks based on strong primitives such as compare-and-swap and fetch-and-add. We also present a modified version of our algorithm that generates only O(1) memory references in the absence of contention. Jae-Heon Yang received the B.S. and M.S. degrees in Computer Engineering from Seoul National University in 1985 and 1987, respectively, and the Ph.D. degree in Computer Science from the University of Maryland at College Park in 1994. Since June 1994, he has been an Assistant Professor of Computer Science at Mills College in Oakland, California. From 1987 to 1989, he was a junior researcher at the Korea Telecommunication Authority Research Center. His research interests include distributed computing and operating systems. James H. Anderson received the M.S. degree in Computer Science from Michigan State University in 1982, the M.S. degree in Computer Science from Purdue University in 1983, and the Ph.D. degree in Computer Sciences from the University of Texas at Austin in 1990. Since August 1993, he has been an Assistant Professor of Computer Science at the University of North Carolina at Chapel Hill. Prior to joining the University of North Carolina, he was an Assistant Professor of Computer Science for three years at the University of Maryland at College Park. Professor Anderson's main research interests are within the area of concurrent and distributed computing. His current interests include wait-free algorithms, scalable synchronization mechanisms for shared-memory systems, and object-sharing strategies for hard real-time applications. A preliminary version was presented at the Twelfth Annual ACM Symposium on Principles of Distributed Computing, Ithaca, New York, August 1993 [15]. Work supported, in part, by NSF Contracts CCR-9109497 and CCR-9216421 and by the Center for Excellence in Space Data and Information Sciences (CESDIS).
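The O(log N) structure can be pictured as an arbitration tree: each process climbs a binary tree of two-process locks from its leaf to the root, acquiring one lock per level, and releases them in reverse order when it leaves the critical section. The sketch below uses a plain Peterson lock at every internal node; this classic tournament construction is shown for intuition only and is not the algorithm of the paper, whose per-node protocol is designed so that all busy-waiting is on locally accessible variables.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NPROC    8                      /* assumed: number of processes, a power of two */
#define MAXDEPTH 3                      /* log2(NPROC) */

/* One two-process Peterson lock per internal tree node (nodes 1..NPROC-1). */
static atomic_bool flag[NPROC][2];
static atomic_int  turn[NPROC];

static void node_lock(int n, int side) {
    atomic_store(&flag[n][side], true);
    atomic_store(&turn[n], 1 - side);
    while (atomic_load(&flag[n][1 - side]) && atomic_load(&turn[n]) == 1 - side)
        ;  /* busy-wait (not a local spin in this naive version) */
}

static void node_unlock(int n, int side) {
    atomic_store(&flag[n][side], false);
}

void tree_lock(int pid, int nodes[MAXDEPTH], int sides[MAXDEPTH]) {
    int v = NPROC + pid;                /* start at this process's leaf */
    for (int d = 0; d < MAXDEPTH; d++) {
        int side = v & 1;               /* which child we climb up from */
        v /= 2;
        node_lock(v, side);
        nodes[d] = v;
        sides[d] = side;
    }
}

void tree_unlock(int nodes[MAXDEPTH], int sides[MAXDEPTH]) {
    for (int d = MAXDEPTH - 1; d >= 0; d--)   /* release in reverse order, root first */
        node_unlock(nodes[d], sides[d]);
}
```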

12.
We present an adaptive algorithm for N-process mutual exclusion under read/write atomicity in which all busy waiting is by local spinning. In our algorithm, each process p performs O(k) remote memory references to enter and exit its critical section, where k is the maximum “point contention” experienced by p. The space complexity of our algorithm is Θ(N), which is clearly optimal. Our algorithm is the first mutual exclusion algorithm under read/write atomicity that is adaptive when time complexity is measured by counting remote memory references. A preliminary version of this paper was presented at the 14th International Symposium on Distributed Computing [6].

13.
Unified Parallel C (UPC) is a parallel extension of ANSI C based on the Partitioned Global Address Space (PGAS) programming model, which provides a shared memory view that simplifies code development while it can take advantage of the scalability of distributed memory architectures. Therefore, UPC allows programmers to write parallel applications on hybrid shared/distributed memory architectures, such as multi-core clusters, in a more productive way, accessing remote memory by means of different high-level language constructs, such as assignments to shared variables or collective primitives. However, the standard UPC collectives library includes a reduced set of eight basic primitives with quite limited functionality. This work presents the design and implementation of extended UPC collective functions that overcome the limitations of the standard collectives library, allowing, for example, the use of a specific source and destination thread or defining the amount of data transferred by each particular thread. This library fulfills the demands made by the UPC developers community and implements portable algorithms, independent of the specific UPC compiler/runtime being used. The use of a representative set of these extended collectives has been evaluated using two applications and four kernels as case studies. The results obtained confirm the suitability of the new library to provide easier programming without trading off performance, thus achieving high productivity in parallel programming to harness the performance of hybrid shared/distributed memory architectures in high performance computing.

14.
There has been great progress from the traditional allocation algorithms designed for small memories to more modern algorithms exemplified by McKusick's and Karels' allocator (McKusick MK, Karels MJ. Design of a general purpose memory allocator for the 4.3BSD UNIX kernel. In USENIX Conference Proceedings, Berkeley, CA, June 1988). Nonetheless, none of these algorithms have been designed to meet the needs of UNIX kernels supporting commercial data-processing applications in a shared-memory multiprocessor environment. On a shared-memory multiprocessor, memory is a global resource. Therefore, allocator performance depends on synchronization primitives and manipulation of shared data as well as on raw CPU speed. Synchronization primitives and access to shared data depend on system bus interactions. The speed of system buses has not kept pace with that of CPUs, as witnessed by the ever-larger caches found on recent systems. Thus, the performance of synchronization primitives, and of memory allocators that use them, has not received the full benefit of increased CPU performance. An earlier paper (McKenney PE, Slingwine J. Efficient kernel memory allocation on shared-memory multiprocessors. In USENIX Conference Proceedings, Berkeley, CA, February 1993) describes an allocator designed to meet this situation. This article reviews the motivation for and design of the allocator and presents the experience gained during the seven years that the allocator has been in production use. Copyright © 2001 John Wiley & Sons, Ltd.
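The core design idea behind allocators of this kind is to keep a per-CPU cache of free blocks so that the common allocation path touches no shared data and needs no synchronization primitives, with the global, lock-protected pool consulted only to refill or drain that cache. The fixed-block-size sketch below is a hypothetical illustration of that idea, not the allocator described in the article.

```c
#include <pthread.h>
#include <stdlib.h>

#define NCPU     4
#define BLK_SIZE 256
#define BATCH    32      /* blocks moved per refill/drain of a per-CPU cache */

struct block { struct block *next; };

static struct block   *percpu_cache[NCPU];   /* touched only by the owning CPU */
static int             percpu_count[NCPU];
static struct block   *global_pool;
static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;

void *blk_alloc(int cpu) {
    if (!percpu_cache[cpu]) {                 /* local cache empty: refill in a batch */
        pthread_mutex_lock(&global_lock);
        for (int i = 0; i < BATCH && global_pool; i++) {
            struct block *b = global_pool;
            global_pool = b->next;
            b->next = percpu_cache[cpu];
            percpu_cache[cpu] = b;
            percpu_count[cpu]++;
        }
        pthread_mutex_unlock(&global_lock);
    }
    struct block *b = percpu_cache[cpu];
    if (!b)
        return malloc(BLK_SIZE);              /* pool exhausted: fall back */
    percpu_cache[cpu] = b->next;
    percpu_count[cpu]--;
    return b;
}

void blk_free(int cpu, void *p) {
    struct block *b = p;
    b->next = percpu_cache[cpu];
    percpu_cache[cpu] = b;
    if (++percpu_count[cpu] > 2 * BATCH) {    /* cache too large: drain half */
        pthread_mutex_lock(&global_lock);
        while (percpu_count[cpu] > BATCH) {
            struct block *d = percpu_cache[cpu];
            percpu_cache[cpu] = d->next;
            d->next = global_pool;
            global_pool = d;
            percpu_count[cpu]--;
        }
        pthread_mutex_unlock(&global_lock);
    }
}
```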

15.
High-level parallel programming models supporting dynamic fine-grained threads in a global object space are becoming increasingly popular for expressing irregular applications based on sophisticated adaptive algorithms and pointer-based data structures. However, implementing these multithreaded computations on scalable parallel machines poses significant challenges, particularly with respect to object caching. Object caching techniques must be able to tolerate unresponsive processors and protocol handler occupancy delays. This paper examines whether these challenges can be offset by leveraging responsive general-purpose communication architectural features (such as remote memory access and atomic operations), possibly compensating for the lack of more sophisticated hardware primitives by relying upon increased involvement of the run-time system and the compiler. A detailed performance analysis of four irregular applications, using the Illinois Concert System on the Cray T3D and the SGI Origin 2000, finds that existing software distributed shared memory (DSM) systems are capable of delivering good performance only in the presence of a high level of responsive communication architecture support (specifically, support for remote atomic operations). Recognizing that this situation stems from the synchronous request–reply nature of DSM protocols, we present a composable object caching framework, called view caching, which exploits knowledge of application data access semantics to construct custom protocols that require reduced processor synchronization. View caching protocols are more tolerant to responsiveness and occupancy delays and are able to exploit even lower level responsive communication primitives (such as nonatomic remote memory accesses) for a performance benefit.

16.
We present an improvement to the Disk Paxos protocol by Gafni and Lamport which utilizes extended functionality and flexibility provided by Active Disks and supports unmediated concurrent data access by an unlimited number of processes. The solution facilitates coordination by an infinite number of clients using finite shared memory. It is based on a collection of fault-prone read-modify-write objects that emulate a new, reliable shared memory abstraction called a ranked register. The required read-modify-write objects are readily available in Active Disks and in Object Storage Device controllers, making our solution suitable for state-of-the-art Storage Area Network (SAN) environments. A preliminary version of this work appears in Proceedings of the 21st ACM Symposium on Principles of Distributed Computing (PODC 2002), August 2002.

17.
Concurrent systems in which there is a known upper bound Δ on memory access time are considered. Two prototypical synchronization problems, mutual exclusion and consensus, are studied, and solutions that have constant (i.e., independent of Δ and the total number of processes) time complexity in the absence of contention are presented. For mutual exclusion, in the absence of contention, a process needs only five accesses to the shared memory to enter its critical section, and in the presence of contention, the winning process may need to delay itself for 4 ⋅ Δ time units. For consensus, in the absence of contention, a process decides after four accesses to the shared memory, and in the presence of contention, it may need to delay itself for Δ time units. Received: July 1993 / Accepted: February 1996
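A well-known way to see how a known bound Δ on memory access time buys cheap synchronization is Fischer's delay-based protocol, sketched below. It is background for the timing-based approach only, not the algorithm of the paper: Fischer's protocol always pays a Δ delay, whereas the result above needs just five contention-free shared accesses and delays only under contention.

```c
#include <stdatomic.h>
#include <unistd.h>

/* Fischer's timing-based mutual exclusion (sketch). DELTA_US must upper-bound
 * the time from passing the owner==0 test until the subsequent write of
 * `owner` becomes visible to all processes. */
#define DELTA_US 10                     /* assumed bound, in microseconds */

static atomic_int owner;                /* 0 = free, otherwise a process id >= 1 */

void fischer_lock(int id) {
    for (;;) {
        while (atomic_load(&owner) != 0)
            ;                           /* wait until the slot looks free */
        atomic_store(&owner, id);
        usleep(DELTA_US);               /* wait out every concurrent writer */
        if (atomic_load(&owner) == id)
            return;                     /* our write survived: we hold the lock */
    }
}

void fischer_unlock(void) {
    atomic_store(&owner, 0);
}
```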

18.
Due to advances in fiber optics and VLSI technology, interconnection networks that allow simultaneous broadcasts are becoming feasible. Distributed shared memory (DSM) implementations on such networks promise high performance even for small applications with small granularity. This paper, after summarizing the architecture of one such implementation, called the Simultaneous Multiprocessor Optical Exchange Bus (SOME-Bus), presents simple algorithms for improving the performance of parallel programs running on the SOME-Bus multiprocessor implementing cache-coherent DSM. The algorithms are based on run-time data redistribution via a dynamic page migration protocol. They use memory access references together with the average channel utilization, average channel waiting time, number of messages in the channel queue, or short-term average channel waiting time reported by each node and gathered by hardware monitors to make correct decisions about the placement of shared data. Simulations with four parallel codes on a 64-processor SOME-Bus show that the algorithms yield significant performance improvements, including reductions in execution time, number of remote memory accesses, average channel waiting time, and average network latency, and an increase in average channel utilization.

19.
We present an efficient and practical lock-free method for semiautomatic (application-guided) memory reclamation based on reference counting, aimed for use with arbitrary lock-free dynamic data structures. The method guarantees the safety of local as well as global references, supports arbitrary memory reuse, uses atomic primitives that are available in modern computer systems, and provides an upper bound on the amount of memory waiting to be reclaimed. To the best of our knowledge, this is the first lock-free method that provides all of these properties. We provide analytical and experimental study of the method. The experiments conducted have shown that the method can also provide significant performance improvements for lock-free algorithms of dynamic data structures that require strong memory management.
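At its core, reference-count-based reclamation keeps a per-node counter of outstanding references and frees a node when that counter drops to zero. The sketch below shows only that naive core with C11 atomics; the difficult part the paper addresses, namely making the increment safe against a node being reclaimed concurrently while using only commonly available atomic primitives and bounding unreclaimed memory, is exactly what this naive version lacks.

```c
#include <stdatomic.h>
#include <stdlib.h>

typedef struct node {
    _Atomic long refs;        /* number of outstanding references */
    struct node *next;
    int key;
} node;

/* Unsafe if `n` may be reclaimed concurrently -- the problem the paper solves. */
static void node_ref(node *n) {
    atomic_fetch_add(&n->refs, 1);
}

static void node_unref(node *n) {
    if (atomic_fetch_sub(&n->refs, 1) == 1)
        free(n);              /* last reference dropped: reclaim the node */
}
```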

20.
Let n be the number of threads that can compete for a shared resource R. The mutual exclusion problem involves coordinating these n concurrent threads in accessing R in a mutually exclusive way. This paper addresses two basic questions related to the First-Come-First-Served (FCFS) mutual exclusion algorithms that use only read–write operations: one is regarding the lower bound on the shared space requirement and the other is about fairness.
