Similar Documents
10 similar documents found (search time: 46 ms)
1.
Technical Note: Q-Learning   (cited 6 times: 0 self-citations, 6 by others)
Q-learning (Watkins, 1989) is a simple way for agents to learn how to act optimally in controlled Markovian domains. It amounts to an incremental method for dynamic programming which imposes limited computational demands. It works by successively improving its evaluations of the quality of particular actions at particular states. This paper presents and proves in detail a convergence theorem for Q-learning based on that outlined in Watkins (1989). We show that Q-learning converges to the optimum action-values with probability 1 so long as all actions are repeatedly sampled in all states and the action-values are represented discretely. We also sketch extensions to the cases of non-discounted, but absorbing, Markov environments, and where many Q values can be changed each iteration, rather than just one.
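For readers unfamiliar with the update this note analyzes, a minimal sketch of one-step tabular Q-learning follows; the environment interface (`reset`, `step`, `actions`), the step sizes, and the ε-greedy exploration are illustrative assumptions, not details taken from the paper.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
    """One-step tabular Q-learning with epsilon-greedy exploration.

    `env` is assumed to expose reset() -> state, step(action) ->
    (next_state, reward, done), and a list of discrete actions in
    `env.actions`; this interface is an illustrative assumption.
    """
    Q = defaultdict(float)  # Q[(state, action)], implicitly zero-initialized

    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # epsilon-greedy choice over the discrete action set
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: Q[(state, a)])

            next_state, reward, done = env.step(action)

            # one-step backup toward the greedy bootstrap target
            best_next = max(Q[(next_state, a)] for a in env.actions)
            target = reward + (0.0 if done else gamma * best_next)
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            state = next_state
    return Q
```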

2.
A Parallel Q-Learning Algorithm Based on Multiple Agents   (cited 1 time: 0 self-citations, 1 by others)
A parallel Q-learning algorithm for multiple agents is proposed. The learning system contains several agents whose learning environments, learning tasks, and capabilities are identical. Within each learning cycle, every agent learns in its own independent environment; when the cycle ends, the agents' learning results are fused, the fused result is shared by all agents, and the next cycle of learning proceeds from it. Experimental results demonstrate the feasibility and effectiveness of the method.
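A minimal sketch of the learn-then-fuse cycle described above; the abstract does not specify the fusion operator, so element-wise averaging of Q-tables, and the helper functions `make_env` and `learn_one_cycle`, are illustrative assumptions.

```python
import copy

def fuse_q_tables(q_tables):
    """Fuse the Q-tables of several independent learners by averaging.

    Averaging is only one plausible fusion rule; the paper's exact
    operator is not given in the abstract.
    """
    keys = set().union(*(q.keys() for q in q_tables))
    return {k: sum(q.get(k, 0.0) for q in q_tables) / len(q_tables) for k in keys}

def parallel_q_learning(make_env, learn_one_cycle, n_agents=4, n_cycles=10):
    """Run identical Q-learners in parallel cycles and share a fused table.

    `make_env()` and `learn_one_cycle(env, q_table) -> q_table` are
    hypothetical helpers standing in for the per-agent environment and one
    learning period of ordinary Q-learning.
    """
    shared_q = {}
    envs = [make_env() for _ in range(n_agents)]
    for _ in range(n_cycles):
        # every agent starts the cycle from the shared (fused) table
        results = [learn_one_cycle(env, copy.deepcopy(shared_q)) for env in envs]
        shared_q = fuse_q_tables(results)
    return shared_q
```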

3.
Computers and algorithms are widely used to support stock market decision making. A few questions regarding the profitability of algorithmic stock trading are: can computers be trained to beat the markets? Can an algorithm make decisions for optimal profits? And so forth. In this research work, our objective is to answer some of these questions. We propose an algorithm that uses deep Q-Reinforcement Learning techniques to make trading decisions. Trading in stock markets involves potential risk because the price is affected by various uncertain events, ranging from political influences to economic constraints. Models that trade using predictions may not always be profitable, mainly due to the influence of various unknown factors on the future stock price. Trend Following is a trading idea in which trading decisions, such as buying and selling, are taken purely according to the observed market trend. A stock trend can be up, down, or sideways. Trend Following does not predict the stock price but follows reversals in the trend direction. A trend reversal can be used to trigger a buy or a sell of a certain stock. In this research paper, we describe a deep Q-Reinforcement Learning agent that learns Trend Following trading by being rewarded for its trading decisions. Our results are based on experiments performed on actual stock market data from the American and Indian stock markets. The results indicate that the proposed model outperforms forecasting-based methods in terms of profitability. We also limit risk by confirming trading actions with the trend before actual trading.
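The risk-limiting step mentioned at the end of the abstract, confirming the agent's action against the observed trend before trading, could look like the following sketch; the moving-average indicator, window, and thresholds are illustrative assumptions rather than the paper's exact procedure.

```python
def detect_trend(prices, window=20, threshold=0.01):
    """Classify the recent trend as 'up', 'down', or 'sideways' by comparing
    two consecutive moving averages; window and threshold are illustrative."""
    if len(prices) < 2 * window:
        return "sideways"
    recent = sum(prices[-window:]) / window
    earlier = sum(prices[-2 * window:-window]) / window
    change = (recent - earlier) / earlier
    if change > threshold:
        return "up"
    if change < -threshold:
        return "down"
    return "sideways"

def confirm_action(agent_action, prices):
    """Execute the learned action only when it agrees with the observed trend,
    mirroring the idea of confirming trades with the trend before execution."""
    trend = detect_trend(prices)
    if agent_action == "buy" and trend == "up":
        return "buy"
    if agent_action == "sell" and trend == "down":
        return "sell"
    return "hold"
```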

4.
Continuous-Action Q-Learning   (cited 1 time: 0 self-citations, 1 by others)
This paper presents a Q-learning method that works in continuous domains. Other characteristics of our approach are the use of an incremental topology preserving map (ITPM) to partition the input space, and the incorporation of bias to initialize the learning process. A unit of the ITPM represents a limited region of the input space and maps it onto the Q-values of M possible discrete actions. The resulting continuous action is an average of the discrete actions of the winning unit weighted by their Q-values. Then, TD(λ) updates the Q-values of the discrete actions according to their contribution. Units are created incrementally and their associated Q-values are initialized by means of domain knowledge. Experimental results in robotics domains show the superiority of the proposed continuous-action Q-learning over the standard discrete-action version in terms of both asymptotic performance and speed of learning. The paper also reports a comparison of discounted-reward against average-reward Q-learning in an infinite horizon robotics task.
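As an illustration of the action-blending step described above, a sketch of turning the winning unit's discrete actions and their Q-values into a single continuous action; the softmax weighting is an assumption made here to keep the weights positive and normalized, and may differ from the paper's exact scheme.

```python
import numpy as np

def blended_action(discrete_actions, q_values):
    """Combine the winning unit's discrete actions into one continuous action,
    weighting each discrete action by its Q-value.

    A numerically stable softmax of the Q-values is used as the weighting,
    which is an illustrative choice rather than the paper's formula.
    """
    q = np.asarray(q_values, dtype=float)
    w = np.exp(q - q.max())          # softmax weights, stable under large Q
    w /= w.sum()
    return float(w @ np.asarray(discrete_actions, dtype=float))
```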

5.
A Q-Learning Algorithm Based on Empirical Knowledge   (cited 1 time: 0 self-citations, 1 by others)
To improve the learning and convergence speed of Q-learning, the typical reinforcement-learning method in agent systems, and to make full use of environmental information during learning, this paper proposes a Q-learning algorithm based on empirical knowledge. The algorithm uses a function carrying empirical knowledge so that the agent learns a system model while performing model-free learning, avoiding repeated learning of the environment model and thereby accelerating learning. Simulation results show that the algorithm builds the learning process on a better foundation and approaches the optimal state faster; its learning efficiency and convergence speed are clearly better than those of standard Q-learning.
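The abstract does not give the exact form of the knowledge function. A generic way to learn a model alongside model-free Q-learning is a Dyna-Q-style step, sketched below; this is named explicitly as Dyna-Q because it is a stand-in, not the paper's own mechanism.

```python
import random

def dyna_q_step(Q, model, s, a, r, s_next, actions, alpha=0.1, gamma=0.95,
                planning_steps=5):
    """One real Q-learning update plus a few model-based planning updates.

    Q is assumed to be a defaultdict(float) keyed by (state, action), and
    `model` a dict remembering observed transitions; both are illustrative
    assumptions for this sketch.
    """
    def backup(state, action, reward, next_state):
        best_next = max(Q[(next_state, b)] for b in actions)
        Q[(state, action)] += alpha * (reward + gamma * best_next
                                       - Q[(state, action)])

    backup(s, a, r, s_next)          # model-free update from real experience
    model[(s, a)] = (r, s_next)      # remember the observed transition

    # replay remembered transitions so the environment need not be re-learned
    for _ in range(planning_steps):
        (ps, pa), (pr, pnext) = random.choice(list(model.items()))
        backup(ps, pa, pr, pnext)
```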

6.
This paper investigates how adaptive software agents may utilize reinforcement learning algorithms such as Q-learning to make economic decisions such as setting prices in a competitive marketplace. For a single adaptive agent facing fixed-strategy opponents, ordinary Q-learning is guaranteed to find the optimal policy. However, for a population of agents each trying to adapt in the presence of other adaptive agents, the problem becomes non-stationary and history dependent, and it is not known whether any global convergence will be obtained, and if so, whether such solutions will be optimal. In this paper, we study simultaneous Q-learning by two competing seller agents in three moderately realistic economic models. This is the simplest case in which interesting multi-agent phenomena can occur, and the state space is small enough so that lookup tables can be used to represent the Q-functions. We find that, despite the lack of theoretical guarantees, simultaneous convergence to self-consistent optimal solutions is obtained in each model, at least for small values of the discount parameter. In some cases, exact or approximate convergence is also found even at large discount parameters. We show how the Q-derived policies increase profitability and damp out or eliminate cyclic price wars compared to simpler policies based on zero lookahead or short-term lookahead. In one of the models (the Shopbot model) where the sellers' profit functions are symmetric, we find that Q-learning can produce either symmetric or broken-symmetry policies, depending on the discount parameter and on initial conditions.
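A minimal sketch of simultaneous Q-learning by two price-setting sellers; the choice of the opponent's last price as state, the hypothetical `profit(my_price, other_price)` function, and the small discount factor (the regime in which the abstract reports convergence) are all illustrative assumptions rather than the paper's exact models.

```python
import random

def simultaneous_pricing_q(profit, prices, episodes=5000,
                           alpha=0.1, gamma=0.5, epsilon=0.1):
    """Two seller agents Q-learn prices against each other simultaneously.

    Each seller observes the opponent's last posted price as its state and
    posts a new price as its action; `profit` is a hypothetical stand-in for
    the economic models studied in the paper.
    """
    Q = [{(s, a): 0.0 for s in prices for a in prices} for _ in range(2)]
    last = [random.choice(prices), random.choice(prices)]

    for _ in range(episodes):
        new = []
        for i in range(2):
            state = last[1 - i]                 # observe the opponent's price
            if random.random() < epsilon:
                new.append(random.choice(prices))
            else:
                new.append(max(prices, key=lambda a: Q[i][(state, a)]))
        for i in range(2):
            state, action, next_state = last[1 - i], new[i], new[1 - i]
            reward = profit(new[i], new[1 - i])
            best_next = max(Q[i][(next_state, a)] for a in prices)
            Q[i][(state, action)] += alpha * (reward + gamma * best_next
                                              - Q[i][(state, action)])
        last = new
    return Q
```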

7.
Applying the Q-Learning Algorithm to Dribbling in RoboCup   (cited 1 time: 0 self-citations, 1 by others)
The Robot World Cup (RoboCup) is one of the most influential robot soccer competitions in the world, and the simulation league is an important part of it. Given the importance of dribbling in simulation-league matches, we apply the Q-learning algorithm to dribbling training so that the agent itself can learn and adapt, acquiring knowledge from the environment on its own. This paper describes the method and experimental procedure for 1 vs. 1 dribbling training with Q-learning in a specific scenario, and the training method is validated by applying it to the training of an actual team.

8.
Multi-Step Reinforcement Learning Algorithms for the Average-Reward Model   (cited 3 times: 0 self-citations, 3 by others)
Average-reward reinforcement learning with an unknown model is discussed. By combining temporal-difference learning with the R-learning algorithm, several methods from the discounted setting are extended to the average-reward criterion, and two classes of algorithms are proposed: R(λ)-learning and TTD(λ)-learning. Existing R-learning can be viewed as the special case of R(λ)-learning and TTD(λ)-learning with λ=0. Simulation results show that R(λ)- and TTD(λ)-learning with intermediate values of λ improve on existing methods in both reliability and convergence speed.
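For context, the one-step R-learning update that R(λ) extends with multi-step (eligibility-trace) information can be sketched as follows; the step sizes and the greedy-only update of the average-reward estimate follow the standard formulation of R-learning, not anything specific to this paper.

```python
def r_learning_update(Q, rho, s, a, r, s_next, actions, alpha=0.1, beta=0.01):
    """One step of average-reward R-learning (Schwartz-style).

    Q is a dict keyed by (state, action); the function updates Q in place and
    returns the updated average-reward estimate rho. The R(lambda) trace
    mechanics themselves are omitted from this sketch.
    """
    best_here = max(Q[(s, b)] for b in actions)
    best_next = max(Q[(s_next, b)] for b in actions)
    was_greedy = Q[(s, a)] == best_here

    # relative-value (average-reward) temporal-difference error
    delta = r - rho + best_next - Q[(s, a)]
    Q[(s, a)] += alpha * delta

    # the average-reward estimate is adjusted only after greedy actions
    if was_greedy:
        rho += beta * (r - rho + best_next - best_here)
    return rho
```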

9.
A Multi-Step Q-Learning Algorithm for Collaborative Design Task Scheduling   (cited 3 times: 0 self-citations, 3 by others)
An objective model of the task-scheduling problem is first established, and, based on an analysis of the Q-learning algorithm, a Markov decision process formulation of the scheduling problem is given. To address the slow value updates of Q-learning on task scheduling, a multi-step Q-learning scheduling algorithm is proposed that updates the value function with multi-step information. Application examples show that the algorithm improves convergence speed and solves the task-scheduling problem effectively.
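The core of such an algorithm is an n-step value backup; a sketch is given below, where the trajectory layout and the treatment of scheduling steps as a generic discrete MDP are illustrative assumptions.

```python
def multi_step_backup(Q, trajectory, actions, n=3, alpha=0.1, gamma=0.9):
    """Update Q with an n-step return instead of a one-step target.

    `trajectory` is a list of (state, action, reward) tuples followed by the
    state reached after the last transition; Q is a dict keyed by
    (state, action). Both conventions are assumptions for this sketch.
    """
    *steps, bootstrap_state = trajectory
    steps = steps[-n:]                    # use at most the last n transitions
    s0, a0, _ = steps[0]

    # discounted sum of the rewards collected over the n steps
    G = sum((gamma ** i) * r for i, (_, _, r) in enumerate(steps))
    # bootstrap from the greedy value of the state reached after n steps
    G += (gamma ** len(steps)) * max(Q[(bootstrap_state, a)] for a in actions)

    Q[(s0, a0)] += alpha * (G - Q[(s0, a0)])
```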

10.
A building energy consumption prediction method based on the Q-learning algorithm is proposed. The building energy prediction problem is modeled as a standard Markov decision process, a deep belief network is used to model the building's energy-consumption state, and real-time prediction of building energy consumption is achieved with the Q-learning algorithm. Experiments on public building energy data released by the Baltimore Gas and Electric Company in the United States show that the proposed model with Q-learning predicts building energy consumption effectively, and that, on this basis, Q-learning combined with a deep belief network achieves higher prediction accuracy. The experiments also examine the influence of the algorithm's parameters on performance.
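One plausible reading of this MDP framing, sketched below under heavy assumptions, treats the discretized current reading as the state, the predicted next consumption level as the action, and the negative absolute prediction error as the reward; the deep-belief-network state model used in the paper is omitted entirely.

```python
import random
from collections import defaultdict

def train_energy_predictor(series, levels, episodes=50,
                           alpha=0.1, gamma=0.9, epsilon=0.1):
    """Q-learning over a historical consumption series.

    `series` is assumed to be already discretized into the values in
    `levels`; this state/action/reward framing is only an illustrative
    interpretation of the abstract, not the paper's exact model.
    """
    Q = defaultdict(float)
    for _ in range(episodes):
        for t in range(len(series) - 1):
            state = series[t]                       # current discretized reading
            if random.random() < epsilon:
                prediction = random.choice(levels)
            else:
                prediction = max(levels, key=lambda a: Q[(state, a)])
            reward = -abs(prediction - series[t + 1])   # negative prediction error
            next_state = series[t + 1]
            best_next = max(Q[(next_state, a)] for a in levels)
            Q[(state, prediction)] += alpha * (reward + gamma * best_next
                                               - Q[(state, prediction)])
    return Q
```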

