Similar Literature
20 similar documents found.
1.
Reinforcement learning (RL) has been applied to many fields and applications, but dilemmas remain between exploration and exploitation in the action selection policy. The best-known RL algorithms are Q-learning and Sarsa, but they have different characteristics: generally speaking, Sarsa converges faster, while Q-learning achieves better final performance. However, Sarsa is easily stuck in local minima, and Q-learning needs more time to learn. Most of the literature investigates the action selection policy. Instead of studying an action selection strategy, this paper focuses on how to combine Q-learning with the Sarsa algorithm and presents a new method, called backward Q-learning, which can be implemented in both the Sarsa algorithm and Q-learning. The backward Q-learning algorithm directly tunes the Q-values, which in turn indirectly affect the action selection policy. Therefore, the proposed RL algorithms can enhance learning speed and improve final performance. Finally, three experiments (cliff walk, mountain car, and the cart–pole balancing control system) are used to verify the feasibility and effectiveness of the proposed scheme. All the simulations illustrate that the backward Q-learning based RL algorithms outperform the well-known Q-learning and Sarsa algorithms.
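A minimal tabular sketch of the combination (hedged: the `env` interface with `reset`/`step`/`actions`, the helper names, and all hyperparameters are illustrative, not the paper's exact formulation):

```python
import random
from collections import defaultdict

def epsilon_greedy(Q, s, actions, eps):
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=lambda b: Q[(s, b)])

def backward_q_episode(env, Q, alpha=0.1, gamma=0.99, eps=0.1):
    """One Sarsa episode followed by a backward Q-learning sweep that
    replays the episode's transitions in reverse with max backups."""
    memory, s = [], env.reset()          # `env` is an assumed interface
    a, done = epsilon_greedy(Q, s, env.actions, eps), False
    while not done:
        s2, r, done = env.step(a)
        a2 = epsilon_greedy(Q, s2, env.actions, eps)
        # On-policy Sarsa update while the episode runs.
        Q[(s, a)] += alpha * (r + gamma * Q[(s2, a2)] - Q[(s, a)])
        memory.append((s, a, r, s2))
        s, a = s2, a2
    # Backward sweep: directly tune Q-values, which the action-selection
    # policy reads, so the policy is indirectly improved.
    for s, a, r, s2 in reversed(memory):
        best = max(Q[(s2, b)] for b in env.actions)
        Q[(s, a)] += alpha * (r + gamma * best - Q[(s, a)])

# Usage sketch: Q = defaultdict(float); call backward_q_episode per episode.
```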

2.
This paper addresses a new method for combining supervised learning and reinforcement learning (RL). Applying supervised learning to robot navigation faces serious challenges, such as inconsistent and noisy data, difficulty in gathering training data, and high error in the training data. RL capabilities, such as training from a single scalar evaluation signal and a high degree of exploration, have encouraged researchers to use RL for robot navigation. However, RL algorithms are time-consuming and suffer from a high failure rate in the training phase. Here, we propose Supervised Fuzzy Sarsa Learning (SFSL) as a novel idea for exploiting the advantages of both supervised and reinforcement learning. A zero-order Takagi–Sugeno fuzzy controller with several candidate actions per rule is the main module of the robot's controller, and the aim of training is to find the best action for each fuzzy rule. In the first step, a human supervisor drives an E-puck robot within the environment and the training data are gathered. In the second step, as hard tuning, the training data are used to initialize the value (worth) of each candidate action in the fuzzy rules. Afterwards, the fuzzy Sarsa learning module, a critic-only fuzzy reinforcement learner, fine-tunes the parameters of the conclusion parts of the fuzzy controller online. The proposed algorithm is used for driving an E-puck robot in an environment with obstacles. The experimental results show that the proposed approach decreases the learning time and the number of failures, and improves the quality of the robot's motion in the testing environments.
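A compact sketch of the two-step idea under assumed interfaces (`membership(state)` returning normalized rule firing strengths and `demos` as (state, supervisor action) pairs are illustrative simplifications):

```python
import numpy as np

class FuzzySarsa:
    """Zero-order Takagi-Sugeno controller sketch: each rule holds q-values
    for a fixed set of candidate actions; the fired control output is the
    firing-strength-weighted mix of the per-rule choices."""
    def __init__(self, n_rules, candidates, alpha=0.05, gamma=0.95, eps=0.1):
        self.cand = np.asarray(candidates, dtype=float)
        self.q = np.zeros((n_rules, len(candidates)))
        self.alpha, self.gamma, self.eps = alpha, gamma, eps

    def supervised_init(self, demos, membership):
        """'Hard tuning': credit each rule's candidate closest to the
        supervisor's action, in proportion to the rule's firing strength."""
        for state, u_human in demos:
            phi = membership(state)            # assumed: sums to 1
            j = np.argmin(np.abs(self.cand - u_human))
            self.q[:, j] += phi                # initial worth of candidates

    def act(self, phi):
        """Epsilon-greedy choice of one candidate per rule, then TS mixing."""
        choice = np.where(np.random.rand(len(phi)) < self.eps,
                          np.random.randint(self.q.shape[1], size=len(phi)),
                          self.q.argmax(axis=1))
        u = phi @ self.cand[choice]            # blended control output
        Q = phi @ self.q[np.arange(len(phi)), choice]
        return u, Q, choice

    def update(self, phi, choice, r, Q, Q_next):
        delta = r + self.gamma * Q_next - Q    # Sarsa TD error
        self.q[np.arange(len(phi)), choice] += self.alpha * delta * phi
```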

3.
One of the key problems in reinforcement learning (RL) is balancing exploration and exploitation. Another is learning and acting in large Markov decision processes (MDPs) where compact function approximation has to be used. This paper introduces REKWIRE, a provably efficient, model-free algorithm for finite-horizon RL problems with value function approximation (VFA) that addresses the exploration-exploitation tradeoff in a principled way. The crucial element of this algorithm is a reduction of RL to online regression in the recently proposed KWIK learning model. We show that, if the KWIK online regression problem can be solved efficiently, then the sample complexity of exploration of REKWIRE is polynomial. Therefore, the reduction suggests a new and sound direction to tackle general RL problems. The efficiency of our algorithm is verified on a set of proof-of-concept experiments where popular, ad hoc exploration approaches fail.
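The KWIK ("knows what it knows") protocol underlying the reduction can be illustrated with a toy online learner that either predicts or admits ignorance; this sketch is illustrative and is not REKWIRE itself:

```python
import math

class KwikMeanLearner:
    """Toy KWIK online regressor for one unknown mean: it either predicts
    or answers None ("I don't know") and then receives the label. The
    Hoeffding-based sample threshold is illustrative."""
    def __init__(self, epsilon=0.1, delta=0.05):
        # n >= ln(2/delta) / (2 eps^2) i.i.d. samples make the empirical
        # mean epsilon-accurate with probability at least 1 - delta.
        self.needed = math.ceil(math.log(2 / delta) / (2 * epsilon ** 2))
        self.total, self.n = 0.0, 0

    def predict(self):
        if self.n < self.needed:
            return None        # admit ignorance; the caller must explore here
        return self.total / self.n

    def observe(self, y):      # labels arrive only for "I don't know" steps
        self.total += y
        self.n += 1
```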

4.
This paper deals with a scheduling problem on a single machine with an availability constraint. The problem is known to be NP-complete and admits several approximation algorithms. In this paper we study the approximation scheme described in He et al. [Y. He, W. Zhong, H. Gu, Improved algorithms for two single machine scheduling problems, Theoretical Computer Science 363 (2006) 257–265]. We compute an improved relative error for this heuristic and prove that the new bound is tight. We also present computational experiments testing this heuristic on random instances. These experiments include an implementation of the fully polynomial-time approximation scheme given in Kacem and Ridha Mahjoub [I. Kacem, A. Ridha Mahjoub, Fully polynomial time approximation scheme for the weighted flow-time minimization on a single machine with a fixed non-availability interval, Computers and Industrial Engineering 56 (2009) 1708–1712].

5.
We introduce a new framework for designing and analyzing algorithms. Our framework applies best to problems that are inapproximable according to the standard worst-case analysis. We circumvent such negative results by designing guarantees for classes of instances, parameterized according to properties of the optimal solution. Given our parameterized approximation, called PArametrized by the Signature of the Solution (PASS) approximation, we design algorithms with optimal approximation ratios for problems with additive and submodular objective functions such as the capacitated maximum facility location problems. We consider two types of algorithms for these problems. For greedy algorithms, our framework provides a justification for preferring a certain natural greedy rule over some alternative greedy rules that have been used in similar contexts. For LP-based algorithms, we show that the natural LP relaxation for these problems is not optimal in our framework. We design a new LP relaxation and show that this LP relaxation coupled with a new randomized rounding technique is optimal in our framework. In passing, we note that our results strictly improve over previous results of Kleinberg et al. (J. ACM 51(2):263–280, 2004) concerning the approximation ratio of the greedy algorithm.

6.
We formalize the problem of Structured Prediction as a Reinforcement Learning task. We first define a Structured Prediction Markov Decision Process (SP-MDP), an instantiation of Markov decision processes for structured prediction, and show that learning an optimal policy for this SP-MDP is equivalent to minimizing the empirical loss. This link between the supervised learning formulation of structured prediction and reinforcement learning (RL) allows us to use approximate RL methods for learning the policy. The proposed model makes weak assumptions both on the nature of the structured prediction problem and on the supervision process: it assumes nothing about the decomposition of loss functions, the data encoding, or the availability of optimal policies for training. It therefore copes with a large range of structured prediction problems. Moreover, it scales well and can be used to solve both complex and large-scale real-world problems. We describe two series of experiments. The first provides an analysis of RL on classical sequence prediction benchmarks and compares our approach with state-of-the-art SP algorithms. The second introduces a tree transformation problem on which most previous models fail; this is a complex instance of the general labeled tree mapping problem. We show that RL exploration is effective and leads to successful results on this challenging task. This clearly confirms that RL can be used for large and complex structured prediction problems.
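A toy sketch of how sequence labeling fits this MDP view (the encoding below is illustrative, not the paper's exact SP-MDP construction):

```python
class SequenceLabelingMDP:
    """Toy SP-MDP for sequence labeling: a state is the input plus the
    partially built output, an action appends one label, and the sparse
    reward is the negative Hamming loss once the output is complete."""
    def __init__(self, x, y_true, labels):
        self.x, self.y_true, self.labels = x, y_true, labels
        self.y = []                         # partial structured output

    def state(self):
        return (tuple(self.x), tuple(self.y))

    def step(self, label):
        self.y.append(label)
        done = len(self.y) == len(self.x)
        # The loss needs no decomposition: it is evaluated only at the end.
        reward = -sum(a != b for a, b in zip(self.y, self.y_true)) if done else 0.0
        return self.state(), reward, done

# Usage sketch: an RL policy picks the labels; the episode return equals
# minus the loss, so maximizing return minimizes the empirical loss.
```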

7.
Ensemble Algorithms in Reinforcement Learning
This paper describes several ensemble methods that combine multiple different reinforcement learning (RL) algorithms in a single agent. The aim is to enhance learning speed and final performance by combining the chosen actions or action probabilities of different RL algorithms. We designed and implemented four different ensemble methods combining the following five different RL algorithms: $Q$-learning, Sarsa, actor–critic (AC), $QV$-learning, and AC learning automaton. The intuitively designed ensemble methods, namely, majority voting (MV), rank voting, Boltzmann multiplication (BM), and Boltzmann addition, combine the policies derived from the value functions of the different RL algorithms, in contrast to previous work where ensemble methods have been used in RL for representing and learning a single value function. We show experiments on five maze problems of varying complexity; the first problem is simple, but the other four maze tasks are of a dynamic or partially observable nature. The results indicate that the BM and MV ensembles significantly outperform the single RL algorithms.
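A possible sketch of two of the combination rules, majority voting and Boltzmann multiplication, over per-algorithm preference values (array shapes, the temperature, and the example numbers are illustrative):

```python
import numpy as np

def boltzmann_multiplication(action_prefs, tau=1.0):
    """Combine per-algorithm preference values (e.g., Q-values) by
    multiplying their Boltzmann distributions, then renormalizing.
    `action_prefs` is a list of arrays, one per RL algorithm."""
    probs = np.ones_like(action_prefs[0], dtype=float)
    for prefs in action_prefs:
        e = np.exp((prefs - prefs.max()) / tau)   # numerically stable softmax
        probs *= e / e.sum()
    return probs / probs.sum()

def majority_voting(action_prefs):
    """Each algorithm votes for its greedy action; ties broken by index."""
    votes = np.zeros_like(action_prefs[0], dtype=float)
    for prefs in action_prefs:
        votes[int(np.argmax(prefs))] += 1.0
    return votes / votes.sum()

# Illustrative use: three algorithms' Q-values over four actions.
prefs = [np.array([1.0, 2.0, 0.5, 1.5]),
         np.array([0.2, 1.8, 1.9, 0.1]),
         np.array([1.1, 1.0, 2.2, 0.3])]
print(boltzmann_multiplication(prefs), majority_voting(prefs))
```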

8.
Agents can learn to improve their coordination with their teammates and increase team performance. Training instances are finite, and each training instance is an opportunity for the learning agents to improve their coordination. In this article, we focus on allocating training instances to learning agent pairs, i.e., pairs that improve coordination with each other, with the goal of team formation. Agents learn at different rates, so the allocation of training instances affects the performance of the team formed. We build upon previous work on the Synergy Graph model, which is learned entirely from data and represents agents' capabilities and compatibility in a multi-agent team. We formally define the learning agents team formation problem and compare it with the multi-armed bandit problem. We consider learning agent pairs that improve linearly and geometrically, i.e., where the marginal improvement decreases by a constant factor. We contribute algorithms that allocate the training instances and compare them against algorithms from the multi-armed bandit literature. In our simulations, our algorithms perform similarly to the bandit algorithms in the linear case and outperform them in the geometric case. Further, we apply our model and algorithms to a multi-agent foraging problem, demonstrating the efficacy of our algorithms in general multi-agent problems.
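A hedged sketch of greedy allocation for the geometric case, assuming each pair's improvement parameters (c, r) are known up front (the paper estimates such quantities from data):

```python
import heapq

def allocate_training(pairs, budget):
    """Greedy allocation of a finite budget of training instances to agent
    pairs whose coordination improves geometrically: pair p's next instance
    yields marginal gain c_p * r_p**n after n prior instances (0 < r_p < 1).
    With decreasing marginal gains, always taking the largest gain is
    optimal. `pairs` maps pair id -> (c, r); values here are illustrative."""
    heap = [(-c, pid, 0) for pid, (c, r) in pairs.items()]  # max-heap by gain
    heapq.heapify(heap)
    counts = {pid: 0 for pid in pairs}
    for _ in range(budget):
        neg_gain, pid, n = heapq.heappop(heap)
        counts[pid] = n + 1
        c, r = pairs[pid]
        heapq.heappush(heap, (-c * r ** (n + 1), pid, n + 1))
    return counts

print(allocate_training({"AB": (1.0, 0.5), "BC": (0.8, 0.9)}, budget=5))
```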

9.
We investigate the parameterized computational complexity of the satisfiability problem for modal logic and attempt to pinpoint relevant structural parameters which cause the problem's combinatorial explosion, beyond the number of propositional variables v. To this end we study the modality depth, a natural measure which has appeared in the literature, and show that, even though modal satisfiability parameterized by v and the modality depth is FPT, the running time's dependence on the parameters is a tower of exponentials (unless P = NP). To overcome this limitation we propose possible alternative parameters, namely diamond dimension and modal width. We show fixed-parameter tractability results using these measures where the exponential dependence on the parameters is much milder (doubly and singly exponential respectively) than in the case of modality depth, thus leading to FPT algorithms for modal satisfiability with much more reasonable running times. We also give lower bound arguments which prove that our algorithms cannot be improved significantly unless the Exponential Time Hypothesis fails.

10.
11.
ProbLog is a probabilistic extension of Prolog. Given the complexity of exact inference under ProbLog's semantics, approximate inference is necessary in many machine learning applications. Current approximate inference algorithms for ProbLog, however, either require dealing with large numbers of proofs or do not guarantee a low approximation error. In this paper we introduce a new approximate inference algorithm that addresses these shortcomings. Given a user-specified parameter k, this algorithm approximates the success probability of a query based on at most k proofs and ensures that the calculated probability $p$ satisfies $(1-1/e)\,\hat{p} \le p \le \hat{p}$, where $\hat{p}$ is the highest probability that can be calculated based on any set of k proofs. Furthermore, a useful feature of the calculated set of proofs is that it is diverse. Our experiments show the utility of the proposed algorithm.
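A toy sketch of the greedy, submodular flavor behind such a guarantee: grow a set of at most k proofs, each step adding the proof with the largest marginal gain in success probability (the probability computation below enumerates worlds and only suits tiny examples; this is not the paper's algorithm verbatim):

```python
from itertools import product

def dnf_prob(proofs, fact_prob):
    """Exact P(at least one proof succeeds) by enumerating truth assignments
    of the probabilistic facts involved; exponential, toy-sized inputs only."""
    facts = sorted({f for proof in proofs for f in proof})
    total = 0.0
    for bits in product([False, True], repeat=len(facts)):
        world = dict(zip(facts, bits))
        if any(all(world[f] for f in proof) for proof in proofs):
            weight = 1.0
            for f, b in world.items():
                weight *= fact_prob[f] if b else 1.0 - fact_prob[f]
            total += weight
    return total

def greedy_k_proofs(all_proofs, fact_prob, k):
    """Greedily pick at most k proofs, maximizing the marginal gain in
    success probability at each step; the chosen set tends to be diverse,
    since redundant proofs add little probability mass."""
    chosen = []
    for _ in range(min(k, len(all_proofs))):
        best = max((p for p in all_proofs if p not in chosen),
                   key=lambda p: dnf_prob(chosen + [p], fact_prob))
        chosen.append(best)
    return chosen, dnf_prob(chosen, fact_prob)

# Toy query with three proofs over probabilistic facts a..d.
proofs = [("a", "b"), ("a", "c"), ("d",)]
print(greedy_k_proofs(proofs, {"a": 0.8, "b": 0.6, "c": 0.5, "d": 0.3}, k=2))
```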

12.
Similarity functions are a fundamental component of many learning algorithms. When dealing with string or tree-structured data, measures based on the edit distance are widely used, and there exist a few methods for learning them from data. However, these methods offer no theoretical guarantee as to the generalization ability and discriminative power of the learned similarities. In this paper, we propose an approach to edit similarity learning based on loss minimization, called GESL. It is driven by the notion of $(\epsilon,\gamma,\tau)$-goodness, a theory that bridges the gap between the properties of a similarity function and its performance in classification. Using the notion of uniform stability, we derive generalization guarantees that hold for a large class of loss functions. We also provide experimental results on two real-world datasets which show that edit similarities learned with GESL induce more accurate and sparser classifiers than other (standard or learned) edit similarities.

13.
Human experts, as well as autonomous agents in a referral network, must decide whether to accept a task or refer it to a more appropriate expert, and if so, to whom. For the referral network to improve over time, the experts must learn to estimate the topical expertise of other experts. This article extends concepts from multi-agent reinforcement learning and active learning to distributed learning in referral networks. Among a wide array of algorithms evaluated, Distributed Interval Estimation Learning (DIEL), based on Interval Estimation Learning, was found to be superior for learning appropriate referral choices, compared to ε-Greedy, Q-learning, Thompson Sampling, and Upper Confidence Bound (UCB) methods. In addition to a synthetic data set, we compare the performance of the stronger learning-to-refer algorithms on a referral network of high-performance Stochastic Local Search (SLS) SAT solvers, where expertise does not obey any known parameterized distribution. An evaluation of overall network performance and a robustness analysis are conducted across the learning algorithms, with an emphasis on capacity constraints and evolving networks, where experts with known expertise drop off and new experts of unknown performance enter, situations that arise in real-world scenarios but were heretofore ignored.
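A minimal sketch of the interval-estimation referral rule (the confidence multiplier and the data layout are illustrative; DIEL's actual interval construction may differ):

```python
import math

def diel_choice(history, z=1.96):
    """Refer to the expert with the highest upper confidence bound on
    observed reward; unexplored experts get priority. `history` maps
    expert -> list of past task rewards; z is an illustrative multiplier."""
    def upper(rewards):
        n = len(rewards)
        if n < 2:
            return float("inf")                # try unexplored experts first
        mean = sum(rewards) / n
        var = sum((r - mean) ** 2 for r in rewards) / (n - 1)
        return mean + z * math.sqrt(var / n)   # interval's upper end
    return max(history, key=lambda e: upper(history[e]))

# High variance keeps expert_b's interval wide, so it is tried again.
print(diel_choice({"expert_a": [0.9, 0.7, 0.8], "expert_b": [0.4, 0.95]}))
```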

14.
Relational reinforcement learning (RRL) combines traditional reinforcement learning (RL) with a strong emphasis on a relational (rather than attribute-value) representation. Earlier work applied RRL to a learning version of the classic Blocks World planning problem (a version where the learner does not know what the result of taking an action will be) and to the Tetris game. Learning results that depend on the structure of the training examples were obtained, such as training in a mixed 3–5 block environment and then performing in a 3- or 10-block environment. Here, we instead take a function approximation approach to RL for the Blocks World problem. We obtain similar learning accuracies with better running times, allowing us to consider much larger problem sizes. For instance, we can train on 15 blocks and then perform well in worlds with 100–800 blocks, using less running time than the relational method required to perform well on 3–10 blocks.

15.
Multi-instance learning was first proposed by Dietterich et al. (Artificial Intelligence 89(1–2):31–71, 1997) when they were investigating the problem of drug activity prediction. Here, the training set is composed of labeled bags, each of which consists of many unlabeled instances, and the goal of this learning framework is to learn a classifier from the training set that correctly labels unseen bags. Following Dietterich et al., many studies of this learning framework have appeared and many new algorithms have been proposed, for example, DD, EM-DD, and Citation-kNN. All of these algorithms work on the full feature set. However, as in single-instance learning, different features of the training set have different effects on classifier training. In this paper, we study the problem of feature selection in multi-instance learning. We extend the data reliability measure so that it selects the key features in the multi-instance scenario.
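For context, a sketch of the minimal Hausdorff bag distance that Citation-kNN-style methods build on; feature selection in this setting amounts to dropping coordinates before the distance is computed (toy data below):

```python
def minimal_hausdorff(bag_a, bag_b):
    """Minimal Hausdorff distance between two bags of feature vectors,
    the bag-level metric used by Citation-kNN: the Euclidean distance
    between the closest pair of instances across the two bags."""
    def d(u, v):
        return sum((x - y) ** 2 for x, y in zip(u, v)) ** 0.5
    return min(d(u, v) for u in bag_a for v in bag_b)

# Two toy bags of 2-D instances; a feature-selection step would drop
# uninformative coordinates before this distance is computed.
print(minimal_hausdorff([(0.0, 1.0), (2.0, 2.0)], [(1.0, 1.0), (5.0, 0.0)]))
```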

16.
We study the problem of determining the spanning tree congestion of a graph and present some sharp contrasts in its parameterized complexity. First, we show that on apex-minor-free graphs, a general class of graphs containing planar graphs, graphs of bounded treewidth, and graphs of bounded genus, the problem of determining whether a given graph has spanning tree congestion at most k can be solved in linear time for every fixed k. We also show that for every fixed k and d the problem is solvable in linear time for graphs of degree at most d. In contrast, if we allow just one vertex of unbounded degree, the problem immediately becomes NP-complete for any fixed k ≥ 8. Moreover, the hardness result holds for graphs excluding the complete graph on 6 vertices as a minor. We also observe that for k ≤ 3 the problem becomes polynomial-time solvable.
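A brute-force sketch of the quantity being parameterized (pure Python, small graphs only; edge lists are assumed undirected):

```python
def spanning_tree_congestion(graph_edges, tree_edges):
    """Removing a tree edge e splits the tree into two components; the
    congestion of e is the number of graph edges with one endpoint on
    each side. The spanning tree congestion is the maximum over e."""
    def side_of(root, banned):
        seen, stack = {root}, [root]
        while stack:
            u = stack.pop()
            for a, b in tree_edges:
                if (a, b) == banned:
                    continue
                if a == u and b not in seen:
                    seen.add(b); stack.append(b)
                elif b == u and a not in seen:
                    seen.add(a); stack.append(a)
        return seen
    cong = {}
    for e in tree_edges:
        side = side_of(e[0], e)        # vertices on e's first endpoint's side
        cong[e] = sum((u in side) != (v in side) for u, v in graph_edges)
    return max(cong.values()), cong

# K4 with a star spanning tree rooted at vertex 0: congestion 3.
g = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(spanning_tree_congestion(g, tree_edges=[(0, 1), (0, 2), (0, 3)]))
```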

17.
Reinforcement learning is an important branch of machine learning and artificial intelligence that has attracted broad attention from society and industry in recent years. The main problem that reinforcement learning algorithms address is how an agent learns a policy by interacting directly with its environment. However, as the dimensionality of the state space increases, traditional reinforcement learning methods often face the curse of dimensionality and struggle to learn effectively. Hierarchical reinforcement learning decomposes a complex reinforcement learning problem into several subproblems that are solved separately, which can yield better results than tackling the whole problem directly. Hierarchical reinforcement learning is a promising route to large-scale reinforcement learning problems, yet it has received comparatively little attention. This paper introduces and reviews the major families of hierarchical reinforcement learning methods.
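The survey covers several families of methods; as one concrete illustration, a hedged sketch of the options/SMDP view of hierarchical RL (the `env` and `option` interfaces are assumed, not from the survey itself):

```python
from collections import defaultdict

def smdp_q_update(Q, s, option, reward_sum, tau, s_next, options,
                  alpha=0.1, gamma=0.99):
    """SMDP Q-learning backup over options, one standard way of making
    hierarchical RL concrete: a temporally extended option that ran for
    tau steps is backed up with discount gamma**tau."""
    best_next = max(Q[(s_next, o)] for o in options)
    Q[(s, option)] += alpha * (reward_sum + gamma ** tau * best_next
                               - Q[(s, option)])

def run_option(env, s, option, gamma=0.99):
    """Run one option to termination; `option.policy(s)` and
    `option.done(s)` are assumed interfaces for its internal sub-policy."""
    reward_sum, tau = 0.0, 0
    while not option.done(s):
        s, r, _ = env.step(option.policy(s))
        reward_sum += gamma ** tau * r
        tau += 1
    return reward_sum, tau, s

# Usage sketch: Q = defaultdict(float); alternate run_option / smdp_q_update.
```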

18.
Unit commitment problem (UCP) aims at optimizing generation cost to meet a given load demand under several operational constraints. We propose a fuzzy reinforcement learning (RL) approach for an efficient and reliable solution to the unit commitment problem. In particular, we cast the UCP as a multiagent fuzzy reinforcement learning task in which individual generators act as players optimizing the cost of meeting a given load over a twenty-four-hour period. The unit commitment task is fuzzified, and the most optimal unit commitment solution is generated by applying RL to this fuzzy multigenerator setup. Our proposed multiagent RL framework does not assume any a priori task or system knowledge; the generators gradually learn to produce the most optimal output based solely on their collective generation. We view the UCP as a sequential decision-making task with reward/penalty to reduce the collective generation cost of the generators. To the best of our knowledge, ours is the first attempt at solving the UCP with fuzzy reinforcement learning. We test our approach on a ten-generating-unit system with several equality and inequality constraints. Simulation results and comparisons against several recent UCP solution methods demonstrate the superiority and viability of our proposed multiagent fuzzy reinforcement learning technique.
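A heavily simplified sketch of the multiagent view (stateless independent learners sharing one scalar team reward; the fuzzification the paper relies on is omitted, and all names and numbers are toy):

```python
import random
from collections import defaultdict

def team_reward(commit, outputs, demand, costs, penalty=1e3):
    """Shared reward for one hour: negative generation cost, with a penalty
    when the committed units cannot cover the load (illustrative shaping)."""
    supplied = sum(out for on, out in zip(commit, outputs) if on)
    cost = sum(c for on, c in zip(commit, costs) if on)
    return -cost - penalty * max(0.0, demand - supplied)

def multiagent_step(Qs, hour, demand, outputs, costs, eps=0.1, alpha=0.1):
    """Each generator independently picks on/off by epsilon-greedy on its
    own Q-table keyed by the hour; all agents then learn from the same
    scalar team reward (a bandit-style simplification of the setup)."""
    commit = []
    for Q in Qs:
        if random.random() < eps:
            commit.append(random.random() < 0.5)
        else:
            commit.append(Q[(hour, True)] >= Q[(hour, False)])
    r = team_reward(commit, outputs, demand, costs)
    for Q, on in zip(Qs, commit):
        Q[(hour, on)] += alpha * (r - Q[(hour, on)])
    return commit, r

# Three toy units over one hour: capacities, costs, and demand are made up.
Qs = [defaultdict(float) for _ in range(3)]
for _ in range(2000):
    multiagent_step(Qs, hour=0, demand=150.0,
                    outputs=[100.0, 80.0, 50.0], costs=[60.0, 55.0, 40.0])
```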

19.
Multi-task learning, the learning of a set of tasks together, can improve performance on the individual tasks. Gaussian process models have been applied to learning a set of tasks on different data sets by constructing joint priors for the functions underlying the tasks. In these previous Gaussian process models, the setting has been symmetric, in the sense that all tasks are assumed to be equally important, whereas in settings such as transfer learning the goal is asymmetric: to enhance performance on a target task given the other tasks. We propose a focused Gaussian process model which introduces an 'explaining away' model for each of the additional tasks to capture their non-related variation, in order to focus the transfer on the task of interest. This focusing helps reduce the key problem of negative transfer, which may even decrease performance if the tasks are not related closely enough. In experiments, our model improves performance compared to single-task learning, symmetric multi-task learning using hierarchical Dirichlet processes, transfer learning based on predictive structure learning, and symmetric multi-task learning with Gaussian processes.

20.
Engineers and researchers are paying increasing attention to reinforcement learning (RL) as a key technique for realizing adaptive and autonomous decentralized systems. In general, however, it is not easy to put RL into practical use. Our approach mainly deals with the problem of designing state and action spaces. Previously, an adaptive state space construction method called the 'state space filter' and an adaptive action space construction method called 'switching RL' were proposed, each assuming that the other space is fixed. We then unified these two construction methods into a single method that mimics an infant's perceptual and motor development, based on introducing and referring to 'entropy'. In this paper, a computational experiment was conducted on a 'robot navigation problem' with a three-dimensional continuous state space and a two-dimensional continuous action space, which is more complicated than the usual 'path planning problem'. The results confirm the validity of the proposed method.
