Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
This paper presents a new Q-learning-based approach to mobile robot path planning in complex unknown static environments. As a computational approach to learning through interaction with the environment, reinforcement learning algorithms have been widely used for intelligent robot control, especially in the field of autonomous mobile robots. However, the learning process is slow and cumbersome, and practical applications require rapid convergence. To address the slow convergence and long learning time of Q-learning-based mobile robot path planning, a state-chain sequential feedback Q-learning algorithm is proposed for quickly searching for the optimal path of mobile robots in complex unknown static environments. The state chain is built during the search. After an action is chosen and the reward is received, the Q-values of the state-action pairs on the previously built state chain are sequentially updated with one-step Q-learning. As more Q-values are updated after each action, the number of actual steps needed for convergence decreases, and so does the learning time; here a step is a state transition. Extensive simulations validate the efficiency of the proposed approach for mobile robot path planning in complex environments. The results show that the new approach converges quickly and that the robot finds the collision-free optimal path in complex unknown static environments in much less time than with the one-step Q-learning algorithm and the Q(λ)-learning algorithm.
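The state-chain idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the backward sweep order over the chain, the toy three-step corridor, the function names `one_step_update` and `learn_from_episode`, and the values of `alpha` and `gamma` are all assumptions chosen so the arithmetic is easy to follow. Exploration is omitted; the transitions are supplied directly.

```python
from collections import defaultdict


def one_step_update(q, s, a, r, s2, actions, terminal, alpha, gamma):
    """Standard one-step Q-learning update for a single transition."""
    best_next = 0.0 if s2 in terminal else max(q[(s2, b)] for b in actions)
    q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])


def learn_from_episode(transitions, actions, terminal, alpha=1.0, gamma=0.5,
                       feedback=True):
    """Process one episode given as (state, action, reward, next_state) tuples.

    With feedback=True, every new transition is appended to the state chain
    and the whole chain is re-updated, newest transition first (the sweep
    order is an assumption of this sketch). With feedback=False, only the
    newest transition is updated, i.e. plain one-step Q-learning.
    """
    q = defaultdict(float)
    chain = []
    for s, a, r, s2 in transitions:
        chain.append((s, a, r, s2))
        if feedback:
            for cs, ca, cr, cs2 in reversed(chain):
                one_step_update(q, cs, ca, cr, cs2, actions, terminal,
                                alpha, gamma)
        else:
            one_step_update(q, s, a, r, s2, actions, terminal, alpha, gamma)
    return q


# Hypothetical corridor 0 -> 1 -> 2 -> 3, reward 1.0 on reaching state 3.
episode = [(0, "R", 0.0, 1), (1, "R", 0.0, 2), (2, "R", 1.0, 3)]
with_chain = learn_from_episode(episode, ("L", "R"), {3})
plain = learn_from_episode(episode, ("L", "R"), {3}, feedback=False)
print(with_chain[(0, "R")], plain[(0, "R")])  # prints: 0.25 0.0
```

In this toy run the reward reaches the start state within a single episode when the chain is swept, whereas plain one-step Q-learning only updates the last state-action pair. This matches the abstract's claim that updating more Q-values per action reduces the number of actual steps needed.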

2.
A linear algebraic approach to multisequence shift-register synthesis   Cited by: 1 (self-citations: 0; citations by others: 1)
An efficient algorithm that synthesizes all shortest linear-feedback shift registers generating K given sequences with possibly different lengths over a field is derived, and its correctness is proved. The proposed algorithm generalizes the Berlekamp-Massey and Feng-Tzeng algorithms and is based on Massey's ideas. The time complexity of the algorithm is $O(K\lambda N) \lesssim O(KN^2)$, where N is the length of a longest sequence and λ is the linear complexity of the sequences.
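For orientation, here is a sketch of the classical single-sequence Berlekamp-Massey algorithm over GF(2), which is the special case (K = 1, binary field) that the abstract's algorithm generalizes. It is not the multisequence algorithm of the paper; the function name and the test sequence are illustrative assumptions.

```python
def berlekamp_massey_gf2(bits):
    """Return (L, c) for a binary sequence.

    L is the linear complexity and c = [1, c1, ..., cL] is the connection
    polynomial, meaning bits[n] = c1*bits[n-1] ^ ... ^ cL*bits[n-L] for
    every n >= L.
    """
    n = len(bits)
    c = [0] * (n + 1)  # current connection polynomial
    b = [0] * (n + 1)  # polynomial saved at the last length change
    c[0] = b[0] = 1
    L, m = 0, -1       # current length, index of last length change
    for i in range(n):
        # Discrepancy between bits[i] and the current register's prediction.
        d = bits[i]
        for j in range(1, L + 1):
            d ^= c[j] & bits[i - j]
        if d:
            t = c[:]
            shift = i - m
            for j in range(n + 1 - shift):
                c[j + shift] ^= b[j]
            if 2 * L <= i:  # the register must grow
                L, m, b = i + 1 - L, i, t
    return L, c[:L + 1]


# Sequence generated by s[n] = s[n-1] ^ s[n-3] from the seed 1, 0, 0.
print(berlekamp_massey_gf2([1, 0, 0, 1, 1, 1, 0, 1]))  # prints: (3, [1, 1, 0, 1])
```

The inner discrepancy loop costs O(L) per symbol, which is where the λN factor in the quoted complexity bound comes from in the single-sequence case.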

3.
We study how a mobile robot can learn an unknown environment in a piecemeal manner. The robot's goal is to learn a complete map of its environment, while satisfying the constraint that it must return every so often to its starting position (for refueling, say). The environment is modeled as an arbitrary, undirected graph, which is initially unknown to the robot. We assume that the robot can distinguish vertices and edges that it has already explored. We present a surprisingly efficient algorithm for piecemeal learning an unknown undirected graph $G=(V,E)$ in which the robot explores every vertex and edge in the graph by traversing at most $O(E+V^{1+o(1)})$ edges. This nearly linear algorithm improves on the best previous algorithm, in which the robot traverses at most $O(E+V^2)$ edges. We also give an application of piecemeal learning to the problem of searching a graph for a "treasure."

4.
M. R. Crisci, E. Russo. Calcolo, 1977, 14(3): 243-259
Summary. A new polynomial approximation of $J_\lambda(x)$, valid for $x \ge 0$ and real $\lambda$, is found using the τ-method, and an estimate of its error is given. A region of the $(x,\lambda)$-plane in which the numerical results are correct to the fifth significant digit is then determined empirically.

Work carried out within the activities of the Mathematical Research Groups of the C.N.R.

5.
In this paper we address the problem of simultaneous learning and coordination in multiagent Markov decision problems (MMDPs) with infinite state-spaces. We separate this problem into two distinct subproblems: learning and coordination. To tackle the problem of learning, we survey Q-learning with soft-state aggregation (Q-SSA), a well-known method from the reinforcement learning literature (Singh et al. in Advances in neural information processing systems. MIT Press, Cambridge, vol 7, pp 361–368, 1994). Q-SSA allows the agents in the game to approximate the optimal Q-function, from which the optimal policies can be computed. We establish the convergence of Q-SSA and introduce a new result describing the rate of convergence of this method. In tackling the problem of coordination, we start by pointing out that knowledge of the optimal Q-function is not enough to ensure that all agents adopt a jointly optimal policy. We propose a novel coordination mechanism that, given knowledge of the optimal Q-function for an MMDP, ensures that all agents converge to a jointly optimal policy in every relevant state of the game. This coordination mechanism, approximate biased adaptive play (ABAP), extends biased adaptive play (Wang and Sandholm in Advances in neural information processing systems. MIT Press, Cambridge, vol 15, pp 1571–1578, 2003) to MMDPs with infinite state-spaces. Finally, we combine Q-SSA with ABAP, which leads to a novel algorithm in which learning of the game and coordination take place simultaneously. We discuss several important properties of this new algorithm and establish its convergence with probability 1. We also provide simple illustrative examples of its application.

6.
We consider the following problem: given an undirected weighted graph $G=(V,E,c)$ with nonnegative weights, minimize the function $c(\delta(\Pi)) - \lambda|\Pi|$ for all values of the parameter $\lambda$. Here $\Pi$ is a partition of the set of nodes, the first term is the cost of edges whose endpoints belong to different components of the partition, and $|\Pi|$ is the number of components. The best currently known algorithm for this problem has complexity $O(|V|^2)$ maximum flow computations. We improve it to $|V|$ parametric maximum flow computations. We observe that the complexity can be improved further for families of graphs which admit a good separator, e.g. for planar graphs.
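To make the objective concrete, the following sketch evaluates the cut cost minus λ times the number of components for a given partition, and minimizes it by brute force on a tiny graph. The brute-force enumeration is only an illustration and is unrelated to the parametric maximum flow method of the abstract; the function names and the example triangle graph are assumptions.

```python
def partition_objective(edges, partition, lam):
    """Cost of edges crossing between blocks, minus lam times the block count."""
    block = {node: i for i, part in enumerate(partition) for node in part}
    cut = sum(cost for u, v, cost in edges if block[u] != block[v])
    return cut - lam * len(partition)


def set_partitions(items):
    """Yield every partition of a list as a list of blocks (lists)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in set_partitions(rest):
        yield [[first]] + smaller  # first forms its own block
        for i in range(len(smaller)):  # or first joins an existing block
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]


def brute_force_min(nodes, edges, lam):
    """Minimum objective value over all partitions (exponential time)."""
    return min(partition_objective(edges, p, lam) for p in set_partitions(nodes))


# Hypothetical triangle graph with edge costs 1, 2 and 3.
nodes = ["a", "b", "c"]
edges = [("a", "b", 1), ("b", "c", 2), ("a", "c", 3)]
print(brute_force_min(nodes, edges, 2))  # prints: -2 (one block is optimal)
print(brute_force_min(nodes, edges, 5))  # prints: -9 (all singletons are optimal)
```

For small λ the cut term dominates and a single block is best; for large λ every node ends up in its own component. The problem in the abstract asks for the optimal partition at every value of λ between these extremes.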

7.
L. Gatteschi. Calcolo, 1979, 16(4): 447-458
In this paper we obtain a new asymptotic formula for the ultraspherical polynomial $P_n^{(\lambda)}(x)$, as $n \to \infty$, with an error term which is $O(n^{\lambda-5})$ uniformly in the interval $-1+\delta \le x \le 1-\delta$, $\delta > 0$. Very accurate approximations for the zeros of $P_n^{(\lambda)}(x)$ are also derived from this formula.

Work carried out within the Gruppo Nazionale per l'Informatica Matematica of the C.N.R.

8.
For compact Euclidean bodies $P$, $Q$, we define $\lambda(P, Q)$ to be the smallest ratio $r/s$ where $r > 0$, $s > 0$ satisfy $sQ' \subseteq P \subseteq rQ''$. Here $sQ$ denotes a scaling of $Q$ by the factor $s$, and $Q'$, $Q''$ are some translates of $Q$. This function gives us a new distance function between bodies which, unlike previously studied measures, is invariant under affine transformations. If homothetic bodies are identified, the logarithm of this function is a metric. (Two bodies are homothetic if one can be obtained from the other by scaling and translation.) For integer $k \ge 3$, define $\lambda(k)$ to be the minimum value such that for each convex polygon $P$ there exists a convex $k$-gon $Q$ with $\lambda(P, Q) \le \lambda(k)$. Among other results, we prove that $2.118\ldots \le \lambda(3) \le 2.25$ and $\lambda(k) = 1 + \Theta(k^{-2})$. We give an $O(n^2 \log^2 n)$-time algorithm which, for any input convex $n$-gon $P$, finds a triangle $T$ that minimizes $\lambda(T, P)$ among triangles. However, in linear time we can find a triangle $t$ with $\lambda(t, P) \le 2.25$. Our study is motivated by the attempt to reduce the complexity of the polygon containment problem, and also the motion-planning problem. In each case we describe algorithms which run faster when certain implicit slackness parameters of the input are bounded away from 1. These algorithms illustrate a new algorithmic paradigm in computational geometry for coping with complexity.

Work of all authors was partially supported by the ESPRIT II Basic Research Actions Program of the EC under Contract No. 3075 (project ALCOM). Rudolf Fleischer and Kurt Mehlhorn acknowledge also DFG (Grant SPP Me 620/6). Chee Yap acknowledges also DFG (Grant Be 142/46-1) and NSF (Grants DCR-84-01898 and CCR-87-03458). This research was performed when Günter Rote and Chee Yap were at the Freie Universität Berlin.

9.
The paper addresses the problem of determining an outer interval solution of the parametric eigenvalue problem $A(p)x = \lambda x$, $A(p) \in \mathbb{R}^{n \times n}$, for the general case where the matrix elements $a_{ij}(p)$ are continuous nonlinear functions of the parameter vector $p$, with $p$ belonging to the interval vector $\mathbf{p}$. A method for computing an interval enclosure of each eigenpair $(\lambda_\mu, x^{(\mu)})$, $\mu = 1, \ldots, n$, is suggested for the case where $\lambda_\mu$ is a simple eigenvalue. It is based on an affine interval approximation of $a_{ij}(p)$ in $\mathbf{p}$ and reduces, essentially, to setting up and solving a real system of $n$ or $2n$ incomplete quadratic equations for each real or complex eigenvalue, respectively.

10.
This paper introduces the path planning of a 1 cm³ mobile microrobot that is designed for microassembly in a microfactory. Since the conventional path planning method cannot achieve high microassembly positioning accuracy, a supervised learning assisted reinforcement learning (SL-RL) method has been developed. In this mixed learning method, reinforcement learning (RL) is used to search for a movement path in the normal learning area, but when the microrobot moves into the buffer area, supervised learning (SL) is employed to prevent it from moving out of the boundary. SL-RL uses a gradient descent algorithm based on uniform grid tile coding under SARSA(λ) to handle the large learning state space. In addition to the uniform grid tile model, two irregular tile models, an uneven grid tile model and a cobweb tile model, are designed to partition the microrobot state space. The main conclusions demonstrated by simulations are as follows: first, the SL-RL method achieves higher positioning accuracy than the conventional path planning method; second, the SL-RL method achieves higher positioning accuracy and learning efficiency than the single RL method; and third, the irregular tile models show higher learning efficiency than the uniform tile model. The cobweb tile model performs especially well.

11.
G. Pesamosca. Calcolo, 1978, 15(2): 181-196
Summary. A definition of $f(A)$ in terms of real constituent matrices only is given for any real matrix $A$ and any real analytic function $f(\lambda)$. An efficient algorithm for computing $f(A)$ is then presented.

Work carried out at the Fondazione Ugo Bordoni, under the agreement in force between the Amministrazione P. T. and the Fondazione Ugo Bordoni.

12.
Let $P(d)$ be a program implementing a partial recursive function φ. Let $\mathcal{O}_P$ denote a function defined on the domain of φ that maps an input data $d_0$ onto the path of computation of $P$ on the input $d_0$. Let $Q(p, d)$ be a program returning a value if and only if $p = \mathcal{O}_P(d)$, and let the value of the program be $Q(\mathcal{O}_P(d), d) = P(d)$. Program $Q(p, d)$, which is totally absurd from the point of view of its practical computation on concrete input data, may be practically useful when it is analyzed by a metaprogram. It is shown in the paper how program $Q(p, d)$ can be used for verification of a postcondition imposed on program $P(d)$. The proposed method was tested on verification tasks for cache coherence protocols and other distributed computing systems.

13.
F. Di-Guglielmo. Calcolo, 1971, 8(3): 185-213
Summary. The present paper is devoted to the approximate solution of variational elliptic boundary value problems of the form α(u, v) = (f, v) ∀v ∈ V by using approximations of the Hilbert space V with several degrees of freedom, as constructed in a preceding paper [7]. These approximations lead to finite difference schemes involving several arbitrary parameters, whose solutions converge to the exact solution of the boundary value problem if the values of these parameters are small enough. This fact can be used to reduce the error between the exact and the approximate solution by a suitable choice of these arbitrary parameters, so as to avoid the use of very small step lengths. The method may prove useful in cases where the coercivity constant of the bilinear form α(u, v) is small compared to its continuity constant, and more generally for problems of the form α(u, v) − λ(u, v) = (f, v), where the constant λ is close to an eigenvalue of the boundary value problem.

14.
This paper presents a numerical study of the bottom-up and top-down inference processes in hierarchical models using the And-Or graph as an example. Three inference processes are identified for each node A in a recursively defined And-Or graph in which a stochastic context-sensitive image grammar is embedded: the α(A) process detects node A directly based on image features, the β(A) process computes node A by binding its child node(s) bottom-up, and the γ(A) process predicts node A top-down from its parent node(s). All three processes contribute to computing node A from images in complementary ways. The objective of our numerical study is to explore how much information each process contributes and how these processes should be integrated to improve performance. We study them in the task of object parsing using an And-Or graph formulated under the Bayesian framework. Firstly, we isolate and train the α(A), β(A) and γ(A) processes separately by blocking the other two processes. Then, the information contributions of each process are evaluated individually based on their discriminative power, compared with their respective human performance. Secondly, we integrate the three processes explicitly for robust inference to improve performance and propose a greedy pursuit algorithm for object parsing. In experiments, we choose two hierarchical case studies: one is junctions and rectangles in low-to-middle-level vision and the other is human faces in high-level vision. We observe that (i) the effectiveness of the α(A), β(A) and γ(A) processes depends on the scale and occlusion conditions, (ii) the α(face) process is stronger than the α processes of facial components, while β(junctions) and β(rectangle) work much better than their α processes, and (iii) the integration of the three processes improves performance in ROC comparisons.

15.
We consider summation of consecutive values φ(v), φ(v + 1), ..., φ(w) of a meromorphic function φ(z), where v, w ∈ ℤ. We assume that φ(z) satisfies a linear difference equation L(y) = 0 with polynomial coefficients, and that a summing operator for L exists (such an operator can be found, if it exists, by the Accurate Summation algorithm or, alternatively, by Gosper's algorithm when ord L = 1). The notion of bottom summation, which covers the case where φ(z) has poles in ℤ, is introduced. The text was submitted by the authors in English.

16.
This paper describes our experience in designing, developing and deploying systems for supporting human–robot teams during disaster response. It is based on R&D performed in the EU-funded project NIFTi. NIFTi aimed at building intelligent, collaborative robots that could work together with humans in exploring a disaster site, to make a situational assessment. To achieve this aim, NIFTi addressed key scientific design aspects in building up situation awareness in a human–robot team, developing systems using a user-centric methodology involving end users throughout the entire R&D cycle, and regularly deploying implemented systems under real-life circumstances for experimentation and testing. This has yielded substantial scientific advances in the state-of-the-art in robot mapping, robot autonomy for operating in harsh terrain, collaborative planning, and human–robot interaction. NIFTi deployed its system in actual disaster response activities in Northern Italy, in July 2012, aiding in structure damage assessment.

17.
Exploiting the cone structure of the set of unnormalized mixed quantum states, we offer an approach to detect separability independently of the dimensions of the subsystems. We show that any mixed quantum state can be decomposed as $\rho = (1-\lambda)C_\rho + \lambda E_\rho$, where $C_\rho$ is a separable matrix whose rank equals that of $\rho$ and the rank of $E_\rho$ is strictly lower than that of $\rho$. With the simple choice $C_\rho = M_1 \otimes M_2$ we have a necessary condition of separability in terms of $\lambda$, which is also sufficient if the rank of $E_\rho$ equals 1. We give a first extension of this result to detect genuine entanglement in multipartite states and show a natural connection between the multipartite separability problem and the classification of pure states under stochastic local operations and classical communication. We argue that this approach is not exhausted with the first simple choices included herein.

18.
We present a randomized algorithm for finding maximum matchings in planar graphs in time $O(n^{\omega/2})$, where $\omega$ is the exponent of the best known matrix multiplication algorithm. Since $\omega < 2.38$, this algorithm breaks through the $O(n^{1.5})$ barrier for the matching problem. This is the first result of this kind for general planar graphs. We also present an algorithm for generating perfect matchings in planar graphs uniformly at random using $O(n^{\omega/2})$ arithmetic operations. Our algorithms are based on the Gaussian elimination approach to maximum matchings introduced in [16]. This research was supported by KBN Grant 4T11C04425.

19.
20.
A queuing system with an infinite buffer, a single server, and exponential service times is considered. The arrival process is a doubly stochastic Poisson flow whose intensity λ(t) is a piecewise-constant process with exponentially distributed constancy intervals. It is assumed that the values of the process λ(t) to the left and right of a discontinuity point are independent. The nonstationary and stationary characteristics are determined using the method of generating functions. Existence and uniqueness of the stationary mode and stabilization of the nonstationary mode of the queuing system are proved. The results of numerical analysis and applications of the considered queuing system are discussed.
