首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 218 毫秒
1.
We consider policy evaluation algorithms within the context of infinite-horizon dynamic programming problems with discounted cost. We focus on discrete-time dynamic systems with a large number of states, and we discuss two methods, which use simulation, temporal differences, and linear cost function approximation. The first method is a new gradient-like algorithm involving least-squares subproblems and a diminishing stepsize, which is based on the -policy iteration method of Bertsekas and Ioffe. The second method is the LSTD() algorithm recently proposed by Boyan, which for =0 coincides with the linear least-squares temporal-difference algorithm of Bradtke and Barto. At present, there is only a convergence result by Bradtke and Barto for the LSTD(0) algorithm. Here, we strengthen this result by showing the convergence of LSTD(), with probability 1, for every [0, 1].  相似文献   

2.
The postal network is an interconnection network that possesses many desirable properties in networking applications. It includes hypercubes and Fibonacci cubes as its special cases. Basically, the postal network forms a series (with series number ) that is based on the sequence N (n)=N (n–1)+N (n–), where n is the dimension and N (n) represents the number of nodes in an n-dimensional postal network in series . In this paper, we study topological properties of postal networks and relationships between different postal networks. One application of postal networks is also shown in implementing barrier synchronization using a special spanning tree called a postal tree. The postal network can also be considered as a flexible version of the hypercube by relaxing the restriction on the number of nodes, and hence, makes it possible to construct multicomputers with arbitrary sizes.  相似文献   

3.
Learning to Play Chess Using Temporal Differences   总被引:4,自引:0,他引:4  
Baxter  Jonathan  Tridgell  Andrew  Weaver  Lex 《Machine Learning》2000,40(3):243-263
In this paper we present TDLEAF(), a variation on the TD() algorithm that enables it to be used in conjunction with game-tree search. We present some experiments in which our chess program KnightCap used TDLEAF() to learn its evaluation function while playing on Internet chess servers. The main success we report is that KnightCap improved from a 1650 rating to a 2150 rating in just 308 games and 3 days of play. As a reference, a rating of 1650 corresponds to about level B human play (on a scale from E (1000) to A (1800)), while 2150 is human master level. We discuss some of the reasons for this success, principle among them being the use of on-line, rather than self-play. We also investigate whether TDLEAF() can yield better results in the domain of backgammon, where TD() has previously yielded striking success.  相似文献   

4.
The notion of obvious inference in predicate logic is discussed from the viewpoint of proof-checker applications in logic and mathematics education. A class of inferences in predicate logic is defined and it is proposed to identify it with the class of obvious logical inferences. The definition is compared with other approaches. The algorithm for implementing the obviousness decision procedure follows directly from the definition.  相似文献   

5.
We present a new definition of optimality intervals for the parametric right-hand side linear programming (parametric RHS LP) Problem () = min{c t x¦Ax =b + ¯b,x 0}. We then show that an optimality interval consists either of a breakpoint or the open interval between two consecutive breakpoints of the continuous piecewise linear convex function (). As a consequence, the optimality intervals form a partition of the closed interval {; ¦()¦ < }. Based on these optimality intervals, we also introduce an algorithm for solving the parametric RHS LP problem which requires an LP solver as a subroutine. If a polynomial-time LP solver is used to implement this subroutine, we obtain a substantial improvement on the complexity of those parametric RHS LP instances which exhibit degeneracy. When the number of breakpoints of () is polynomial in terms of the size of the parametric problem, we show that the latter can be solved in polynomial time.This research was partially funded by the United States Navy-Office of Naval Research under Contract N00014-87-K-0202. Its financial support is gratefully acknowledged.  相似文献   

6.
Technical Update: Least-Squares Temporal Difference Learning   总被引:2,自引:0,他引:2  
Boyan  Justin A. 《Machine Learning》2002,49(2-3):233-246
TD./ is a popular family of algorithms for approximate policy evaluation in large MDPs. TD./ works by incrementally updating the value function after each observed transition. It has two major drawbacks: it may make inefficient use of data, and it requires the user to manually tune a stepsize schedule for good performance. For the case of linear value function approximations and = 0, the Least-Squares TD (LSTD) algorithm of Bradtke and Barto (1996, Machine learning, 22:1–3, 33–57) eliminates all stepsize parameters and improves data efficiency.This paper updates Bradtke and Barto's work in three significant ways. First, it presents a simpler derivation of the LSTD algorithm. Second, it generalizes from = 0 to arbitrary values of ; at the extreme of = 1, the resulting new algorithm is shown to be a practical, incremental formulation of supervised linear regression. Third, it presents a novel and intuitive interpretation of LSTD as a model-based reinforcement learning technique.  相似文献   

7.
A text is a triple=(, 1, 2) such that is a labeling function, and 1 and 2 are linear orders on the domain of ; hence may be seen as a word (, 1) together with an additional linear order 2 on the domain of . The order 2 is used to give to the word (, 1) itsindividual hierarchical representation (syntactic structure) which may be a tree but it may be also more general than a tree. In this paper we introducecontext-free grammars for texts and investigate their basic properties. Since each text has its own individual structure, the role of such a grammar should be that of a definition of a pattern common to all individual texts. This leads to the notion of ashapely context-free text grammar also investigated in this paper.  相似文献   

8.
This paper presents aut, a modern Automath checker. It is a straightforward re-implementation of the Zandleven Automath checker from the seventies. It was implemented about five years ago, in the programming language C. It accepts both the AUT-68 and AUT-QE dialects of Automath. This program was written to restore a damaged version of Jutting's translation of Landau's Grundlagen. Some notable features: It is fast. On a 1 GHz machine it will check the full Jutting formalization (736 K of nonwhitespace Automath source) in 0.6 seconds. Its implementation of -terms does not use named variables or de Bruijn indices (the two common approaches) but instead uses a graph representation. In this representation variables are represented by pointers to a binder. The program can compile an Automath text into one big Automath single line-style -term. It outputs such a term using de Bruijn indices. (These -terms cannot be checked by modern systems like Coq or Agda, because the -typed -calculi of de Bruijn are different from the -typed -calculi of modern type theory.)The source of aut is freely available on the Web at the address .  相似文献   

9.
Incremental Multi-Step Q-Learning   总被引:23,自引:0,他引:23  
Peng  Jing  Williams  Ronald J. 《Machine Learning》1996,22(1-3):283-290
This paper presents a novel incremental algorithm that combines Q-learning, a well-known dynamic-programming based reinforcement learning method, with the TD() return estimation process, which is typically used in actor-critic learning, another well-known dynamic-programming based reinforcement learning method. The parameter is used to distribute credit throughout sequences of actions, leading to faster learning and also helping to alleviate the non-Markovian effect of coarse state-space quantization. The resulting algorithm, Q()-learning, thus combines some of the best features of the Q-learning and actor-critic learning paradigms. The behavior of this algorithm has been demonstrated through computer simulations.  相似文献   

10.
Q()-learning uses TD()-methods to accelerate Q-learning. The update complexity of previous online Q() implementations based on lookup tables is bounded by the size of the state/action space. Our faster algorithm's update complexity is bounded by the number of actions. The method is based on the observation that Q-value updates may be postponed until they are needed.  相似文献   

11.
This paper is an informal introduction to the theory of types which use a connective for the intersection of two types and a constant for a universal type, besides the usual connective for function-types. This theory was first devised in about 1977 by Coppo, Dezani and Sallé in the context of-calculus and its main development has been by Coppo and Dezani and their collaborators in Turin. With suitable axioms and rules to assign types to-calculus terms, they obtained a system in which (i) the set of types given to a term does not change under-conversion, (ii) some interesting sets of terms, for example the solvable terms and the terms with normal form, can be characterised exactly by the types of their members, and (iii) the type-apparatus is not so complex as polymorphic systems with quantifier-containing types and therefore probably not so expensive to implement mechanically as these systems.There are in fact several variant systems with different detailed properties. This paper defines and motivates the simplest one from which the others are derived, and describes its most basic properties. No proofs are given but the motivation is shown by examples. A comprehensive bibliography is included.  相似文献   

12.
When interpolating incomplete data, one can choose a parametric model, or opt for a more general approach and use a non-parametric model which allows a very large class of interpolants. A popular non-parametric model for interpolating various types of data is based on regularization, which looks for an interpolant that is both close to the data and also smooth in some sense. Formally, this interpolant is obtained by minimizing an error functional which is the weighted sum of a fidelity term and a smoothness term.The classical approach to regularization is: select optimal weights (also called hyperparameters) that should be assigned to these two terms, and minimize the resulting error functional.However, using only the optimal weights does not guarantee that the chosen function will be optimal in some sense, such as the maximum likelihood criterion, or the minimal square error criterion. For that, we have to consider all possible weights.The approach suggested here is to use the full probability distribution on the space of admissible functions, as opposed to the probability induced by using a single combination of weights. The reason is as follows: the weight actually determines the probability space in which we are working. For a given weight , the probability of a function f is proportional to exp(– f2 uu du) (for the case of a function with one variable). For each different , there is a different solution to the restoration problem; denote it by f. Now, if we had known , it would not be necessary to use all the weights; however, all we are given are some noisy measurements of f, and we do not know the correct . Therefore, the mathematically correct solution is to calculate, for every , the probability that f was sampled from a space whose probability is determined by , and average the different f's weighted by these probabilities. The same argument holds for the noise variance, which is also unknown.Three basic problems are addressed is this work: Computing the MAP estimate, that is, the function f maximizing Pr(f/D) when the data D is given. This problem is reduced to a one-dimensional optimization problem. Computing the MSE estimate. This function is defined at each point x as f(x)Pr(f/D) f. This problem is reduced to computing a one-dimensional integral.In the general setting, the MAP estimate is not equal to the MSE estimate. Computing the pointwise uncertainty associated with the MSE solution. This problem is reduced to computing three one-dimensional integrals.  相似文献   

13.
Modular Control and Coordination of Discrete-Event Systems   总被引:1,自引:0,他引:1  
In the supervisory control of discrete-event systems based on controllable languages, a standard way to handle state explosion in large systems is by modular supervision: either horizontal (decentralized) or vertical (hierarchical). However, unless all the relevant languages are prefix-closed, a well-known potential hazard with modularity is that of conflict. In decentralized control, modular supervisors that are individually nonblocking for the plant may nevertheless produce blocking, or even deadlock, when operating on-line concurrently. Similarly, a high-level hierarchical supervisor that predicts nonblocking at its aggregated level of abstraction may inadvertently admit blocking in a low-level implementation. In two previous papers, the authors showed that nonblocking hierarchical control can be guaranteed provided high-level aggregation is sufficiently fine; the appropriate conditions were formalized in terms of control structures and observers. In this paper we apply the same technique to decentralized control, when specifications are imposed on local models of the global process; in this way we remove the restriction in some earlier work that the plant and specification (marked) languages be prefix-closed. We then solve a more general problem of coordination: namely how to determine a high level coordinator that forestalls conflict in a decentralized architecture when it potentially arises, but is otherwise minimally intrusive on low-level control action. Coordination thus combines both vertical and horizontal modularity. The example of a simple production process is provided as a practical illustration. We conclude with an appraisal of the computational effort involved.  相似文献   

14.
Continuation passing style (CPS) translations of typed -calculi have numerous applications. However, the range of these applications has been confined by the fact that CPS translations are known for non-dependent type systems only, thus excluding well-known systems like the calculus of constructions (CC) and the logical frameworks (LF). This paper presents techniques for CPS translating systems with dependent types, with an emphasis on pure type-theoretical applications.In the first part of the paper we review several lines of work in which the need for CPS translations of dependent type systems has arisen, and discuss the difficulties involved with CPS translating such systems. One way of overcoming these difficulties is to work with so-called domain-free type systems. Thus, instead of Barendregt's -cube we shall consider the domain-free -cube, and instead of traditional pure type systems, we shall consider domain-free pure type systems.We therefore begin the second part by reviewing the domain-free -cube, which includes domain-free versions of CC and LF, and then present CPS translations for all the systems of the domain-free -cube. We also introduce Direct Style (DS) (i.e., inverse CPS) translations for all the systems of the domain-free -cube; such DS translations, which have been used in a number of applications, were previously formulated for untyped and simply-typed languages only.In the third part we review domain-free pure type systems and generalize the CPS translations of the domain-free -cube to a large class of domain-free pure type systems which includes most of the systems that appear in the literature, including those of the domain-free -cube. Many translations that appear in the literature arise as special cases of ours.In the fourth part of the paper we present two approaches to CPS translations of traditional pure type systems. The first, indirect, technique lifts the CPS translation of domain-free pure type systems to the analogous class of traditional pure type systems by using results that relate derivations in domain-free and traditional pure type systems. The second, direct, approach translates derivations, requiring a certain order on derivations to be well-founded. Both techniques yield translations for most of the systems that appear in the literature, including those of Barendregt's -cube.  相似文献   

15.
Let (X, #) be an orthogonality space such that the lattice C(X, #) of closed subsets of (X, #) is orthomodular and let (, ) denote the free orthogonality monoid over (X, #). Let C0(, ) be the subset of C(, ), consisting of all closures of bounded orthogonal sets. We show that C0(, ) is a suborthomodular lattice of C(, ) and we provide a necessary and sufficient condition for C0(, ) to carry a full set of dispersion free states.The work of the second author on this paper was supported by National Science Foundation Grant GP-9005.  相似文献   

16.
This paper aims to provide a basis for renewed talk about use in computing. Four current discourse arenas are described. Different intentions manifest in each arena are linked to failures in translation, different terminologies crossing disciplinary and national boundaries non-reflexively. Analysis of transnational use discourse dynamics shows much miscommunication. Conflicts like that between the Scandinavian System Development School and the usability approach have less current salience. Renewing our talk about use is essential to a participatory politics of information technology and will lead to clearer perception of the implications of letting new systems becoming primary media of social interaction.  相似文献   

17.
'Racial' disparities among cancers, particularly of the breast and prostate, are something of a mystery. For the US, in the face of slavery and its sequelae, centuries of interbreeding has greatly leavened genetic differences between Blacks and Whites, but marked contrasts in disease prevalence and progression persist. Adjustment for socioeconomic status and lifestyle, while statistically accounting for much of the variance in breast cancer, only begs the question of ultimate causality. Here we propose a more basic biological explanation that extends the theory of immune cognition to include an elaborate tumor control mechanism constituting the principal selection pressure acting on pathologically mutating cell clones. The interplay between them occurs in the context of an embedding, highly structured, system of culturally-specific psychosocial stress. A rate distortion argument finds that larger system able to literally write an image of itself onto the disease process, in terms of enhanced risk behaviour, accelerated mutation rate, and depressed mutation control. The dynamics are analogous to punctuated equilibrium in simple evolutionary systems, accounting for the staged nature of disease progression. We conclude that 'social exposures' are, for human populations, far more than incidental cofactors in cancer etiology. Rather, they are part of the basic biology of the disorder. The aphorism that culture is as much a part of human biology as the enamel on our teeth appears literally true at a fundamental cellular level.  相似文献   

18.
19.
Our starting point is a definition of conditional event EH which differs from many seemingly similar ones adopted in the relevant literature since 1935, starting with de Finetti. In fact, if we do not assign the same third value u (undetermined) to all conditional events, but make it depend on EH, it turns out that this function t(EH) can be taken as a general conditional uncertainty measure, and we get (through a suitable – in a sense, compulsory – choice of the relevant operations among conditional events) the natural axioms for many different (besides probability) conditional measures.  相似文献   

20.
Optimal shape design problems for an elastic body made from physically nonlinear material are presented. Sensitivity analysis is done by differentiating the discrete equations of equilibrium. Numerical examples are included.Notation U ad set of admissible continuous design parameters - U h ad set of admissible discrete design parameters - function fromU h ad defining shape of body - h function fromU h ad defining approximated shape of body - vector of nodal values of h - { n} sequence of functions tending to - () domain defined by - K bulk modulus - shear modulus - penalty parameter for contact condition - V() space of virtual displacements in() - V h(h) finite element approximation ofV() - J cost functional - J h discretized cost functional - J algebraic form ofJ h - (u) stress tensor - e(u) strain tensor - K stiffness matrix - f force vector - b(q) term arising from nonlinear boundary conditions - q vector of nodal degrees of freedom - p vector of adjoint state variables - J Jacobian of isoparametric mapping - |J| determinant ofJ - N vector of shape function values on parent element - L matrix of shape function derivatives on parent element - G matrix of Cartesian derivatives of shape functions - X matrix of nodal coordinates of element - D matrix of elastic coefficients - B strain-displacement matrix - P part of boundary where tractions are prescribed - u part of boundary where displacements are prescribed - variable part of boundary - strain invariant  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号