首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
We show how a sophisticated, lock-free concurrent stack implementation can be derived from an abstract specification in a series of verifiable steps. The algorithm is based on the scalable stack algorithm of Hendler et al. (Proceedings of the sixteenth annual ACM symposium on parallel algorithms, 27–30 June 2004, Barcelona, Spain, pp 206–215), which allows push and pop operations to be paired off and eliminated without affecting the central stack, thus reducing contention on the stack, and allowing multiple pairs of push and pop operations to be performed in parallel. Our algorithm uses a simpler data structure than Hendler, Shavit and Yerushalmi’s, and avoids an ABA problem. We first derive a simple lock-free stack algorithm using a linked-list implementation, and discuss issues related to memory management and the ABA problem. We then add an abstract model of the elimination process, from which we derive our elimination algorithm. This allows the basic algorithmic ideas to be separated from implementation details, and provides a basis for explaining and comparing different variants of the algorithm. We show that the elimination stack algorithm is linearisable by showing that any execution of the implementation can be transformed into an equivalent execution of an abstract model of a linearisable stack. Each step in the derivation is either a data refinement which preserves the level of atomicity, an operational refinement which may alter the level of atomicity, or a refactoring step which alters the structure of the system resulting from the preceding derivation. We verify our refinements using an extension of Lipton’s reduction method, allowing concurrent and non-concurrent aspects to be considered separately.  相似文献   

2.
The current literature offers two extremes of nonblocking software synchronization support for concurrent data structure design: intricate designs of specific structures based on single-location operations such as compare-and-swap (CAS), and general-purpose multilocation transactional memory implementations. While the former are sometimes efficient, they are invariably hard to extend and generalize. The latter are flexible and general, but costly. This paper aims at a middle ground: reasonably efficient multilocation operations that are general enough to reduce the design difficulties of algorithms based on CAS alone. We present an obstruction-free implementation of an atomic k -location-compare single-location-swap (KCSS) operation. KCSS allows for simple nonblocking manipulation of linked data structures by overcoming the key algorithmic difficulty in their design: making sure that while a pointer is being manipulated, neighboring parts of the data structure remain unchanged. Our algorithm is efficient in the common uncontended case: A successful k-location KCSS operation requires only two CAS operations, two stores, and 2k noncached loads when there is no contention. We therefore believe our results lend themselves to efficient and flexible nonblocking manipulation of list-based data structures in today’s architectures. A preliminary version of this paper appeared in the Proceedings of the Fifteenth Annual ACM Symposium on Parallel Algorithms and Architectures, pages 314–323, San Diego, California, USA, 2003.  相似文献   

3.
Summary This paper is concerned with synchornization under read/write atomicity in shared memory multi-processors. We present a new algorithm forN-process mutual exclusion that requires only read and write operations and that hasO(logN) time complexity, where time is measured by counting remote memory references. The time complexity of this algorithm is better than that of all prior solutions to the mutual exclusion problem that are based upon atomic read and write instructions; in fact, the time complexity of most prior solutions is unbounded. Performance studies are presented that show that our mutual exclusion algorithm exhibits scalable performance under heavy contention. In fact, its performance rivals that of the fastest queue-based spin locks based on strong primitives such as compare-and-swap and fetch-and-add. We also present a modified version of our algorithm that generates onlyO(1) memory references in the absence of contention. Jae-Heon Yang received the B.S. and M. S. degrees in Computer Engineering from Seoul National University in 1985 and 1987, respectively, and the Ph.D. degree in Computer Science from the University of Maryland at College Park in 1994. Since June 1994, he has been an Assistant Professor of Computer Science at Mills College in Oakland, California. From 1987 to 1989, he was a junior researcher at the Korea Telecommunication Authority Research Center. His research interests include distributed computing and operating systems. James H. Anderson received the M. S. degree in Computer Science from Michigan State University in 1982, the M.S. degree in Computer Science from Purdue University in 1983, and the Ph.D. degree in Computer Sciences from the University of Texas at Austin in 1990. Since August 1993, he has been an Assistant Professor of Computer Science at the University of North Carolina at Chapel Hill. Prior to joining the University of North Carolina, he was an Assistant Professor of Computer Science for three years at the University of Maryland at College Park Professor Anderson's main research interests are within the area of coneurrent and distributed computing. His current interests include wait-free algorithms, scalabde synchronization mechanisms for shared-memory systems, and object-sharing strategies for hard real-time applications.Preliminary version was presented at the Twelfth Annual ACM Symposium on Principles of Distributed Computing Ithaca, New York, August 1993 [15]. Work supported, in part, by NSF Contracts CCR-9109497 and CCR-9216421 and by the Center for Excellence in Space Data and Information Sciences (CESDIS)  相似文献   

4.
The compare-and-swap register (CAS) is a synchronization primitive for lock-free algorithms. Most uses of it, however, suffer from the so-called ABA problem. The simplest and most efficient solution to the ABA problem is to include a tag with the memory location such that the tag is incremented with each update of the target location. This solution, however, is theoretically unsound and has limited applicability. This paper presents a general lock-free pattern that is based on the synchronization primitive CAS without causing ABA problem or problems with wrap around. It can be used to provide lock-free functionality for any data type. Our algorithm is a CAS variation of Herlihy’s LL/SC methodology for lock-free transformation. The basis of our techniques is to poll different locations on reading and writing objects in such a way that the consistency of an object can be checked by its location instead of its tag. It consists of simple code that can be easily implemented using C-like languages. A true problem of lock-free algorithms is that they are hard to design correctly, which even holds for apparently straightforward algorithms. We therefore develop a reduction theorem that enables us to reason about the general lock-free algorithm to be designed on a higher level than the synchronization primitives. The reduction theorem is based on Lamport’s refinement mappings, and has been verified with the higher-order interactive theorem prover PVS. Using the reduction theorem, fewer invariants are required and some invariants are easier to discover and formulate without considering the internal structure of the final implementation.  相似文献   

5.
First-in-first-out (FIFO) queues are among the most fundamental and highly studied concurrent data structures. The most effective and practical dynamic-memory concurrent queue implementation in the literature is the lock-free FIFO queue algorithm of Michael and Scott, included in the standard Java TM Concurrency Package. This work presents a new dynamic-memory concurrent lock-free FIFO queue algorithm that in a variety of circumstances performs better than the Michael and Scott queue. The key idea behind our new algorithm is a novel way of replacing the singly-linked list of Michael and Scott, whose pointers are inserted using a costly compare-and-swap (CAS) operation, by an “optimistic” doubly - linked list whose pointers are updated using a simple store, yet can be “fixed” if a bad ordering of events causes them to be inconsistent. We believe it is the first example of such an “optimistic” approach being applied to a real world data structure. A preliminary version of this paper appeared in the proceedings of the 18th International Conference on Distributed Computing (DISC) 2004, pages 117–131.  相似文献   

6.
在分析了传统栈的基础上,提出了一种多个栈共享内存空间的通用算法,以提高内存空间利用率。  相似文献   

7.
Models of forest ecosystems are needed to understand how climate and land-use change can impact biodiversity. In this paper we describe an ecological dispersal model developed for the specific case of predicting seed dispersal by trees on a landscape for use in a forest simulation model. We present efficient approximation algorithms for computing seed dispersal. These algorithms allow us to simulate large landscapes for long periods of time. We also present experimental results that (1) quantify the inherent uncertainty in the dispersal model and (2) describe the variation of the approximation error as a function of the approximation parameters. Based on these experiments, we provide guidelines for choosing the right approximation parameters, for a given model simulation.  相似文献   

8.
Jason Gait 《Software》1986,16(3):225-233
This paper reports on an experimental study of the probe effect, defined as an alteration in the frequency of run-time computational errors observed when delays are introduced into concurrent programs. If the concurrent program being studied has no synchronization errors, then there is no probe effect. In the presence of synchronization errors, the frequency of observable output errors for a sample experimental program starts at a high value for small delays, oscillates rapidly as the delay is increased, and apparently settles at zero errors for larger values of delay. Thus, for sufficiently large delays, the probe effect can almost completely mask synchronization errors in concurrent programs. For sufficiently large concurrent process sets, even small values of embedded delay may mask synchronization errors, provided side effects in shared memory are not included in the observation.  相似文献   

9.
David H. Hartley 《Software》1992,22(2):101-110
Random access to local variables stored in a stack frame is more difficult for compiled functions when the target processor lacks register-plus-offset addressing. One alternative technique employs a roving pointer which the program increments or decrements as needed between stack accesses. Processors which support auto-increment and auto-decrement addressing modes are often capable of performing these increments for free when consecutive accesses are to adjacent stack locations. For these processors, the compiler's chosen ordering for the local variables in the stack frame can substantially affect the execution speed of the compiled program. This paper is concerned with finding an ordering for the local variables in the frame that maximizes the likelihood that two consecutive references at run-time will be to the same or to adjacent stack locations. We have formulated a solution to this problem in terms of finding a Hamiltonian path in a graph. Although this problem is NP-complete in general, we have developed a heuristic algorithm that delivers good results with acceptable performance.  相似文献   

10.
为了提高嵌入式系统设计的灵活性,适应未来嵌入式系统发展的方向,提出了一个基于Xilinx FPGA利用片上可编程系统(system on a programmable chip,SOPC)技术实现以太网数据传输的方法.对整个系统的硬件结构,Xilinx SOPC设计所使用的工具、流程和采用的关键技术进行了介绍.在此基础上应用Xilinx的嵌入式开发套件,以Xilinx公司提供的知识产权(intellectual property,IP)核为基础,搭建了一个SOPC平台.将uIP协议栈应用于软件设计中,实现了驱动程序和应用程序的开发.与上位机进行通信测试的结果表明了该设计的可行性和有效性.  相似文献   

11.
We present an algorithm for detecting periodicity in sequences produced by repeated application of a given function. Our algorithm uses logarithmic memory with high probability, runs in linear time, and is guaranteed to stop within the second loop through the cycle. We also present a partitioning technique that offers a time/memory tradeoff. Our algorithm is especially well suited for sequences where the cycle length is typically small compared to the length of the acyclic prefix.  相似文献   

12.
We consider a language of operations which pass parameters by means of a stack. An algebra over the set of type signatures is introduced, which allows the type signature of a program to be obtained from the type signatures of its constituent operations.Although the theories apply in principle to any stack based language, they have been evolved with particular regard to the proposed ANSI Standard Forth language, which is currently implemented in a type free manner. We hope this work will stimulate an interest in Forth amongst those applying algebraic techniques in software engineering, and we hope to lay the theoretical foundations for implementing practical type checkers to support Forth.  相似文献   

13.
In a distributed system based on Transputer components there is one clock for each processing element, and the definition of the global system time requires the choice of a hardware or software synchronization method. This paper describes the RING_SYNC algorithm, based on a ring-structured synchronization scheme. RING_SYNC has no provision for fault tolerance, but it introduces little overhead, thanks to the optimization of both the number of messages exchanged at sync time and the resynchronization frequency. The implementation of the algorithm together with the tests performed for measuring the synchronization error and their results are discussed extensively, and some typical applications are pointed out.  相似文献   

14.
Stack filters are operators that commute with the thresholding operation, i.e., thresholding a signal, applying the binary filter on each thresholded binary signals, and then summing up (stacking) the results yields the same result as applying the multi-level (gray-scale) filter on the original signal. Several approaches for designing optimal stack filters from training data, where optimality is characterized in terms of costs based on input-output joint observations, have been proposed. This work considers stack filter design from training data under a general statistical framework developed in the context of morphological image operator design. This framework (1) provides a common point of view for distinct design approaches, being useful for comparative analysis or for emphasizing differences, (2) clearly answers the issue of why binary signals from different threshold levels, although following distinct distributions, can be pooled together in the cost estimation process, and (3) helps to show that several stack filter design approaches based on lattice diagrams search methods share a common underlying formulation.  相似文献   

15.
The problem of maximal clique enumeration (MCE) is to enumerate all of the maximal cliques in a graph. Once enumerated, maximal cliques are widely used to solve problems in areas such as 3-D protein structure alignment, genome mapping, gene expression analysis, and detection of social hierarchies. Even the most efficient serial MCE algorithms require large amounts of time to enumerate the maximal cliques in networks arising from these problems that contain hundreds, thousands, or larger numbers of vertices. The previous attempts to provide practical solutions to the MCE problem through parallel implementation have had limited success, largely due to a number of challenges inherent to the nature of the MCE combinatorial search space. On the one hand, MCE algorithms often create a backtracking search tree that has a highly irregular and hard-or-impossible to predict structure; therefore, almost any static decomposition of the search tree by parallel processors results in highly unbalanced processor execution times. On the other hand, the data-intensive nature of the MCE problem often makes naive dynamic load distribution strategies that require extensive data movement prohibitively expensive. As a result, good scaling of the overall execution time of parallel MCE algorithms has been reported for only up to a couple hundred processors. In this paper, we propose a parallel, scalable, and memory-efficient MCE algorithm for distributed and/or shared memory high performance computing architectures, whose runtime scales linearly for thousands of processors on real-world application graphs with hundreds and thousands of nodes. Its scalability and efficiency are attributed to the proposed: (a) representation of the search tree decomposition to enable parallelization; (b) parallel depth-first backtracking search to both constrain the search space and minimize memory requirement; (c) least stringent synchronization to minimize data movement; and (d) on-demand work stealing intelligently coupled with work stack splitting to minimize computing elements’ idle time. To the best of our knowledge, the proposed parallel MCE algorithm is the first to achieve a linear scaling runtime using up to 2048 processors on Cray XT machines for a number of real-world biological networks.  相似文献   

16.

Context

Different method calls may have different contributions to the precision of the final application when abstracted into the call strings. The existing call string based pointer analysis algorithms do not consider such contribution difference and hence may not achieve best cost-effectiveness.

Objective

To be more cost-effective, we try to leverage the contribution information of each method call in call string based pointer analysis.

Method

The paper firstly proposes a contribution-based call stack abstraction method which abstracts the call stacks into call strings with the contribution information under consideration. Then, we apply the new call stack abstraction method to the pointer analysis of AspectJ programs and propose a concern-sensitive points-to analysis method. Besides, the new abstraction method is also applied to multi-threaded Java programs and results in a thread-sensitive pointer analysis method.

Results

The experimental results show that the two pointer analysis methods with contribution-based call stack abstraction can be more cost-effective than the ordinary call string based approaches for an application that detects harmful advices and an application that detects inter-thread data flow.

Conclusion

These pointer analysis methods more concretely and more clearly show that the contribution-based call stack abstraction can lead to better cost-effectiveness for the given applications.  相似文献   

17.
A fast scalable algorithm for discontinuous optical flow estimation   总被引:4,自引:0,他引:4  
Multiple moving objects, partially occluded objects, or even a single object moving against the background gives rise to discontinuities in the optical flow field in corresponding image sequences. While uniform global regularization based moderately fast techniques cannot provide accurate estimates of the discontinuous flow field, statistical optimization based accurate techniques suffer from excessive solution time. A `weighted anisotropic' smoothness based numerically robust algorithm is proposed that can generate discontinuous optical flow field with high speed and linear computational complexity. Weighted sum of the first-order spatial derivatives of the flow field is used for regularization. Less regularization is performed where strong gradient information is available. The flow field at any point is interpolated more from those at neighboring points along the weaker intensity gradient component. Such intensity gradient weighted regularization leads to Euler-Lagrange equations with strong anisotropies coupled with discontinuities in their coefficients. A robust multilevel iterative technique, that recursively generates coarse-level problems based on intensity gradient weighted smoothing weights, is employed to estimate discontinuous optical flow field. Experimental results are presented to demonstrate the efficacy of the proposed technique  相似文献   

18.
The advances in nanometer technology and integrated circuit technology enable the graphics card to attach individual memory and one or more processing units, named GPU, in which most of the graphing instructions can be processed in parallel. Obviously, the computation resource can be used to improve the execution efficiency of not only graphing applications but other time consuming applications like data mining. The Clustering Affinity Search Technique is a famous clustering algorithm, which is widely used in clustering the biological data. In this paper, we will propose an algorithm that can utilize the GPU and the individual memory of graphics card to accelerate the execution. The experimental results show that our proposed algorithm can deliver excellent performance in terms of execution time and is scalable to very large databases.  相似文献   

19.
A scalable, incremental learning algorithm for classification problems   总被引:5,自引:0,他引:5  
In this paper a novel data mining algorithm, Clustering and Classification Algorithm-Supervised (CCA-S), is introduced. CCA-S enables the scalable, incremental learning of a non-hierarchical cluster structure from training data. This cluster structure serves as a function to map the attribute values of new data to the target class of these data, that is, classify new data. CCA-S utilizes both the distance and the target class of training data points to derive the cluster structure. In this paper, we first present problems with many existing data mining algorithms for classification problems, such as decision trees, artificial neural networks, in scalable and incremental learning. We then describe CCA-S and discuss its advantages in scalable, incremental learning. The testing results of applying CCA-S to several common data sets for classification problems are presented. The testing results show that the classification performance of CCA-S is comparable to the other data mining algorithms such as decision trees, artificial neural networks and discriminant analysis.  相似文献   

20.
We develop an optimal algorithm for the numerical solution of semi-coercive variational inequalities by combining dual-primal FETI algorithms with recent results for bound and equality constrained quadratic programming problems. The discretized version of the model problem, obtained by using the FETI-DP methodology, is reduced by the duality theory of convex optimization to a quadratic programming problem with bound and equality constraints, which is solved by a new algorithm with a known rate of convergence given in terms of the spectral condition number of the quadratic problem. We present convergence bounds that guarantee the scalability of the algorithm. These results are confirmed by numerical experiments.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号