首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 625 毫秒
1.
For in-order processors, the stack distance theory is a well-known means to fast model LRU-cache behaviors . However, it cannot be applied directly on out-of-order processors due to the changing of stack distance histograms by mechanisms such as reordering executions, speculative loads, load-in-store operations and non-blocking issues.This paper proposes an Artificial Neural Network (ANN) model to fast forecast private LRU-cache behaviors on out-of-order processors. To verify our model in real commercial applications, the evaluation scenarios chosen in this paper, not only include traditional embedded benchmark suits, such as Mibench 1.0 and Mediabench II, but also embrace Android applications from Mobybench 2.0 benchmark suit as well.Compared with results from Gem5 simulations, the average root mean square error of our ANN model is less than 6% with the prediction speed increasing about 2.5× –3×.  相似文献   

2.
Shared pools and stacks are two coordination structures with a history of applications ranging from simple producer/consumer buffers to job-schedulers and procedure stacks. This paper introduces elimination trees, a novel form of diffracting trees that offer pool and stack implementations with superior response (on average constant) under high loads, while guaranteeing logarithmic time ``deterministic' termination under sparse request patterns. Received February 28, 1996, and in final form January 23, 1997.  相似文献   

3.
The computer industry is currently examining the use of strong synchronization operations such as double compare-and-swap (DCAS) as a means of supporting non-blocking synchronization on future multiprocessor machines. However, before such a strong primitive will be incorporated into hardware design, its utility needs to be proven by developing a body of effective non-blocking data structures using DCAS. As part of this effort, we present two new linearizable non-blocking implementations of concurrent deques using the DCAS operation. The first uses an array representation, and improves on previous algorithms by allowing uninterrupted concurrent access to both ends of the deque while correctly handling the difficult boundary cases when the deque is empty or full. The second uses a linked-list representation, and is the first non-blocking, dynamically-sized deque implementation. It too allows uninterrupted concurrent access to both ends of the deque. We have proved these algorithms correct with the aid of a mechanical theorem prover; we describe these proofs in the paper.  相似文献   

4.
Shared counters are among the most basic coordination structures in distributed computing. Known implementations of shared counters are either blocking, non-linearizable, or have a sequential bottleneck. We present the first counter algorithm that is both linearizable, non-blocking, and can provably achieve high throughput in k-synchronous executions—executions in which process speeds vary by at most a constant factor k. The algorithm is based on a novel variation of the software combining paradigm that we call bounded-wait combining (BWC). It can thus be used to obtain implementations, possessing the same properties, of any object that supports combinable operations, such as a stack or a queue. Unlike previous combining algorithms where processes may have to wait for each other indefinitely, in the BWC algorithm, a process only waits for other processes for a bounded period of time and then “takes destiny in its own hands”. In order to reason rigorously about the parallelism attainable by our algorithm, we define a novel metric for measuring the throughput of shared objects, which we believe is interesting in its own right. We use this metric to prove that our algorithm achieves throughput of Ω(N/ log N) in k-synchronous executions, where N is the number of processes that can participate in the algorithm. Our algorithm uses two tools that we believe may prove useful for obtaining highly parallel non-blocking implementation of additional objects. The first are “synchronous locks”, locks that are respected by processes only in k-synchronous executions and are disregarded otherwise; the second are “pseduo-transactions”—a weakening of regular transactions that allows higher parallelism. A preliminary version of this paper appeared in [11]. D. Hendler is supported in part by a grant from the Israel Science Foundation. S. Kutten is supported in part by a grant from the Israel Science Foundation.  相似文献   

5.
Analytic resolution is a proof procedure for predicate calculus based on the ideas of semantic trees and analytic tableaux. It is related to the unit preference with set-of-support strategy, and incorporates some features of model elimination. The philosophy is to expect and compensate for “blind alleys” by a stack discipline. This eliminates pollution of the search space by a bad choice of the next step in a proof. Experimental results included compare favourably with others from the literature.  相似文献   

6.
We identify a class of feedforward nonlinear systems that are linearizable by a coordinate change. Then we develop explicit expressions for the Lyapunov-based integrator forwarding recursive procedure of Sepulchre, Jankovic, and Kokotovic, which has its roots in a coordinate transformation proposed by Mazenc and Praly. The explicit expressions that we develop allow us to also find closed-form control laws for several classes of systems that are not feedback linearizable, including some that are in the feedforward form and others that are in what we refer to as the "block-feedforward" form. Performance advantages of Lyapunov-based forwarding controllers over nested saturation controllers have been well illustrated in the literature on examples. The analytical expressions for the Lyapunov functions and the control laws allow us to give quantitative performance bounds.  相似文献   

7.
Summary.  The counting problem requires n asynchronous processes to assign themselves successive values. A solution is linearizable if the order of the values assigned reflects the real-time order in which they were requested. Linearizable counting lies at the heart of concurrent time-stamp generation, as well as concurrent implementations of shared counters, FIFO buffers, and similar data structures. We consider solutions to the linearizable counting problem in a multiprocessor architecture in which processes communicate by applying read-modify-write operations to a shared memory. Linearizable counting algorithms can be judged by three criteria: the memory contention produced, whether processes are required to wait for one another, and how long it takes a process to choose a value (the latency). A solution is ideal if it has low contention, low latency, and it eschews waiting. The conventional software solution, where processes synchronize at a single variable, avoids waiting and has low latency, but has high contention. In this paper we give two new constructions based on counting networks, one with low latency and low contention, but that requires processes to wait for one another, and one with low contention and no waiting, but that has high latency. Finally, we prove that these trade-offs are inescapable: an ideal linearizable counting algorithm is impossible. Since ideal non-linearizable counting algorithms exist, these results establish a substantial complexity gap between linearizable and non-linearizable counting. Received: November 1991 / Accepted: July 1995  相似文献   

8.
《Computer Networks》2008,52(2):303-326
This paper proposes a novel Internet Protocol (IP) packet forwarding architecture for IP routers. This architecture is comprised of a non-blocking Multizone Pipelined Cache (MPC) and of a hardware-supported IP routing lookup method. The paper also describes a method for expansion-free software lookups. The MPC achieves lower miss rates than those reported in the literature. The MPC uses a two-stage pipeline for a half-prefix/half-full address IP cache that results in lower activity than conventional caches. MPC’s updating technique allows the IP routing lookup mechanism to freely decide when and how to issue update requests. The effective miss penalty of the MPC is reduced by using a small non-blocking buffer. This design caches prefixes but requires significantly less expansion of the routing table than conventional prefix caches. The hardware-based IP lookup mechanism uses a Ternary Content Addressable Memory (TCAM) with a novel Hardware-based Longest Prefix Matching (HLPM) method. HLPM has lower signaling activity in order to process short matching prefixes as compared to alternative designs. HLPM has a simple solution to determine the longest matching prefix and requires a single write for table updates.  相似文献   

9.
This article describes an algorithm for incremental parsing of expressions in the context of syntax-directed editors for programming languages. Since a syntax-directed editor represents programs as trees and statements and expressions as nodes in trees, making minor modifications in an expression can be difficult. Consider, for example, changing a “ + ” operator to a “1” operator or adding a short subexpression at a syntactically but not structurally correct position, such as inserting “) 1 (d“ at the # mark in” (a + b # + c)”. To make these changes in a typical syntax-directed editor, the user must understand the tree structure and type a number of tree-oriented construction and manipulation commands. This article describes an algorithm that allows the user to think in terms of the syntax of the expression as it is displayed on the screen (in infix notation) rather than in terms of its internal representation (which is effectively prefix), while maintaining the benefits of syntax-directed editing. This algorithm is significantly different from other incremental parsing algorithms in that it does not involve modifications to a traditional parsing algorithm or the overhead of maintaining a parser stack or any data structure other than the syntax tree. Instead, the algorithm applies tree transformations, in real-time as each token is inserted or deleted, to maintain a correct syntax tree.  相似文献   

10.
Recursive probability trees (RPTs) are a data structure for representing several types of potentials involved in probabilistic graphical models. The RPT structure improves the modeling capabilities of previous structures (like probability trees or conditional probability tables). These capabilities can be exploited to gain savings in memory space and/or computation time during inference. This paper describes the modeling capabilities of RPTs as well as how the basic operations required for making inference on Bayesian networks operate on them. The performance of the inference process with RPTs is examined with some experiments using the variable elimination algorithm.  相似文献   

11.
The main result of the paper states that almost any analytic single-input control system, which is truly nonlinear, that is not feedback linearizable, with controllable linearization at an equilibrium point, does not admit any symmetry preserving that point. By almost any system, we mean that we exclude a small class of odd systems, that admit just one nontrivial symmetry conjugated to minus identity. The obtained results are based on a recent classification of nonlinear single-input systems under formal feedback. We also describe symmetries of feedback linearizable systems.  相似文献   

12.
The possibility that certain nonlinear systems can be linearized by the nonlinear feedback group has recently attracted a great deal of attention in the literature. At least for mechanical systems, the apparent ubiquity of linearizable systems is often accounted for by the claim that scalar control systems defined on R2 are generically linearizable. Since there are several possible interpretations of the term ‘generic’ and since, until now, no rigorous proof of genericity for any use of the term has been offered, I thought it worthwhile to investigate in what sense, if any, this ‘folklore theorem’ is valid. Contrary to the prevailing intuition, linearizable systems fail to be dense in any topology for which evaluation of vector fields at points is a continous operation. This includes, for example, both the weak and strong C-topologies.What, then, are the global properties of the class of linearizable systems? In this note, it is also shown that this class has no interior points in any reasonable, complete topology making feedback operations continuous. In particular, linearizable systems in R2 are far from being structurally stable in the weak C-topology. In the other hand, one of the main results presented here is that globally linearizable systems are open in the strong, or Whitney, topology. Thus, globally linearizable planar systems are structurally stable. If the well-known local conditions actually implied global linearizability, this result would be trivial. Recent work of Boothby shows, however, that global linearization is far more subtle. For this reason, in the course of the proof it is necessary to modify the existing global linearization criteria. In particular, a ‘nonvanishing’ criterion for irjectivity of smooth maps on R″, sharpening the so-called ‘ratio condition’, is presented. This result appears to be of independent interest and leads to improvements of the global linearization condition in R″, due to Hunt, Su and Meyer, and of the global non-linear observability conditions in R″, due to Kou, Elliott and Tarn.  相似文献   

13.
《Theoretical computer science》2003,290(2):1127-1148
There are two approaches to reduce the overhead associated with coordinated checkpointing: first is to minimize the number of synchronization messages and the number of checkpoints; the other is to make the checkpointing process non-blocking. In our previous work (IEEE Parallel Distributed Systems 9 (12) (1998) 1213), we proved that there does not exist a non-blocking algorithm which forces only a minimum number of processes to take their checkpoints. In this paper, we present a min-process algorithm which relaxes the non-blocking condition while tries to minimize the blocking time, and a non-blocking algorithm which relaxes the min-process condition while minimizing the number of checkpoints saved on the stable storage. The proposed non-blocking algorithm is based on the concept of “mutable checkpoint”, which is neither a tentative checkpoint nor a permanent checkpoint. Based on mutable checkpoints, our non-blocking algorithm avoids the avalanche effect and forces only a minimum number of processes to take their checkpoints on the stable storage.  相似文献   

14.
In this paper, the problem of supervisory control of discrete-event systems (DES) with output is presented and discussed at length. In such systems, causal output maps are employed to assign to each sequence of input events a corresponding sequence of output events. When the specification of desired behaviour is given by a formal language over the output alphabet, necessary and sufficient conditions are derived for the existence of non-blocking input as well as non-blocking output supervisory control. After making minor adjustments the theory is applied to non-deterministic discrete-event abstractions of hybrid systems, giving rise to the development of a theory for non-blocking supervisory control of hybrid systems. Our results enable one to apply classical supervisory control theory to design supervisors for DES approximations of hybrid systems, and to import many interesting concepts from classical theory such as modular and hierarchical control.  相似文献   

15.
Data prefetching is a useful approach for reduction of memory access stalls in many scientific applications. However, it suffers from cache pollution severly in some applications. In this paper, we study the effectiveness of combining data prefetching with non-blocking loads on cache pollution and explain why it shows good result in our simulation.  相似文献   

16.
Fast similarity retrieval is critical for content-based image retrieval systems. Tree indexing is a classical technique for fast retrieval, but the practical performance increase offered by the indexing tree depends on the intrinsic dimension of the data. Data with a low intrinsic dimension can be indexed more efficiently than data with high intrinsic dimension. This suggests that an indexing tree that is adapted to the data distribution may be more efficient. This paper proposes two adaptation procedures that are guaranteed to improve indexing efficiency. The procedures are based on a formula for average number of node tests incurred during the retrieval. The formula clearly shows how indexing performance varies with the distribution of feature points and the query. Greedy and optimal tree adaptation procedures are derived based on the formula. Both procedures explicitly enhance the retrieval performance of indexing trees. The optimally adapted tree carries the mathematical guarantee that it is the best performing tree in a set of possible trees obtained by node elimination. The adaptation procedures are applied to kdb-trees and hierarchical clustering trees for indexing synthetic as well as real data sets in medical image databases. Experimental results validate the claim that adaptation procedures increase retrieval efficiency.  相似文献   

17.
Given a graph, finding an optimal vertex ranking and constructing a minimum height elimination tree are two related problems. However, an optimal vertex ranking does not by itself provide enough information to construct an elimination tree of minimum height. On the other hand, an optimal vertex ranking can readily be found directly from an elimination tree of minimum height. On n-vertex trees, the optimal vertex ranking problem already has a linear-time algorithm in the literature. However, there is no linear-time algorithm for the problem of finding a minimum height elimination tree. A naive algorithm for this problem requires O(nlogn) time. In this paper, we propose a linear-time algorithm for constructing a minimum height elimination tree of a tree.  相似文献   

18.
Separation logic allows simple proofs of concurrent algorithms which use blocking mechanisms such as semaphores. It can even deal with non-blocking algorithms. With the addition of mechanisms borrowed from rely-guarantee, we can make reasonably simple proofs of some simple non-blocking algorithms. We show that it extends to proofs of some intricate algorithms, including Simpson’s famous asynchronous four-slot buffer and Harris’s novel three-slot algorithm, in a manner that is arguably simpler than earlier treatments, though we cannot claim that we have yet found proofs that are as simple as we would wish. Our example proofs show functional correctness but do not deal with questions of liveness.  相似文献   

19.
20.
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号