首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
Dynamic control theory has long been used in solving optimal asset allocation problems, and a number of trading decision systems based on reinforcement learning methods have been applied in asset allocation and portfolio rebalancing. In this paper, we extend the existing work in recurrent reinforcement learning (RRL) and build an optimal variable weight portfolio allocation under a coherent downside risk measure, the expected maximum drawdown, E(MDD). In particular, we propose a recurrent reinforcement learning method, with a coherent risk adjusted performance objective function, the Calmar ratio, to obtain both buy and sell signals and asset allocation weights. Using a portfolio consisting of the most frequently traded exchange-traded funds, we show that the expected maximum drawdown risk based objective function yields superior return performance compared to previously proposed RRL objective functions (i.e. the Sharpe ratio and the Sterling ratio), and that variable weight RRL long/short portfolios outperform equal weight RRL long/short portfolios under different transaction cost scenarios. We further propose an adaptive E(MDD) risk based RRL portfolio rebalancing decision system with a transaction cost and market condition stop-loss retraining mechanism, and we show that the proposed portfolio trading system responds to transaction cost effects better and outperforms hedge fund benchmarks consistently.  相似文献   

2.
The portfolio management for trading in the stock market poses a challenging stochastic control problem of significant commercial interests to finance industry. To date, many researchers have proposed various methods to build an intelligent portfolio management system that can recommend financial decisions for daily stock trading. Many promising results have been reported from the supervised learning community on the possibility of building a profitable trading system. More recently, several studies have shown that even the problem of integrating stock price prediction results with trading strategies can be successfully addressed by applying reinforcement learning algorithms. Motivated by this, we present a new stock trading framework that attempts to further enhance the performance of reinforcement learning-based systems. The proposed approach incorporates multiple Q-learning agents, allowing them to effectively divide and conquer the stock trading problem by defining necessary roles for cooperatively carrying out stock pricing and selection decisions. Furthermore, in an attempt to address the complexity issue when considering a large amount of data to obtain long-term dependence among the stock prices, we present a representation scheme that can succinctly summarize the history of price changes. Experimental results on a Korean stock market show that the proposed trading framework outperforms those trained by other alternative approaches both in terms of profit and risk management.  相似文献   

3.
Driessens  Kurt  Džeroski  Sašo 《Machine Learning》2004,57(3):271-304
Reinforcement learning, and Q-learning in particular, encounter two major problems when dealing with large state spaces. First, learning the Q-function in tabular form may be infeasible because of the excessive amount of memory needed to store the table, and because the Q-function only converges after each state has been visited multiple times. Second, rewards in the state space may be so sparse that with random exploration they will only be discovered extremely slowly. The first problem is often solved by learning a generalization of the encountered examples (e.g., using a neural net or decision tree). Relational reinforcement learning (RRL) is such an approach; it makes Q-learning feasible in structural domains by incorporating a relational learner into Q-learning. The problem of sparse rewards has not been addressed for RRL. This paper presents a solution based on the use of reasonable policies to provide guidance. Different types of policies and different strategies to supply guidance through these policies are discussed and evaluated experimentally in several relational domains to show the merits of the approach.  相似文献   

4.
 We analyze learning classifier systems in the light of tabular reinforcement learning. We note that although genetic algorithms are the most distinctive feature of learning classifier systems, it is not clear whether genetic algorithms are important to learning classifiers systems. In fact, there are models which are strongly based on evolutionary computation (e.g., Wilson's XCS) and others which do not exploit evolutionary computation at all (e.g., Stolzmann's ACS). To find some clarifications, we try to develop learning classifier systems “from scratch”, i.e., starting from one of the most known reinforcement learning technique, Q-learning. We first consider thebasics of reinforcement learning: a problem modeled as a Markov decision process and tabular Q-learning. We introduce a formal framework to define a general purpose rule-based representation which we use to implement tabular Q-learning. We formally define generalization within rules and discuss the possible approaches to extend our rule-based Q-learning with generalization capabilities. We suggest that genetic algorithms are probably the most general approach for adding generalization although they might be not the only solution.  相似文献   

5.
This letter proposes a new reinforcement learning (RL) paradigm that explicitly takes into account input disturbance as well as modeling errors. The use of environmental models in RL is quite popular for both offline learning using simulations and for online action planning. However, the difference between the model and the real environment can lead to unpredictable, and often unwanted, results. Based on the theory of H(infinity) control, we consider a differential game in which a "disturbing" agent tries to make the worst possible disturbance while a "control" agent tries to make the best control input. The problem is formulated as finding a min-max solution of a value function that takes into account the amount of the reward and the norm of the disturbance. We derive online learning algorithms for estimating the value function and for calculating the worst disturbance and the best control in reference to the value function. We tested the paradigm, which we call robust reinforcement learning (RRL), on the control task of an inverted pendulum. In the linear domain, the policy and the value function learned by online algorithms coincided with those derived analytically by the linear H(infinity) control theory. For a fully nonlinear swing-up task, RRL achieved robust performance with changes in the pendulum weight and friction, while a standard reinforcement learning algorithm could not deal with these changes. We also applied RRL to the cart-pole swing-up task, and a robust swing-up policy was acquired.  相似文献   

6.
针对建筑节能领域中传统控制方法对于建筑物相关设备控制存在收敛速度慢、不稳定等问题,结合强化学习中经典的Q学习方法,提出一种强化学习自适应控制方法--RLAC。该方法通过对建筑物内能耗交换机制进行建模,结合Q学习方法,求解最优值函数,进一步得出最优控制策略,确保在不降低建筑物人体舒适度的情况下,达到建筑节能的目的。将所提出的RLAC与On/Off以及Fuzzy-PD方法用于模拟建筑物能耗问题进行对比实验,实验结果表明,RLAC具有较快的收敛速度以及较好的收敛精度。  相似文献   

7.
Reinforcement learning is the problem of generating optimal behavior in a sequential decision-making environment given the opportunity of interacting with it. Many algorithms for solving reinforcement-learning problems work by computing improved estimates of the optimal value function. We extend prior analyses of reinforcement-learning algorithms and present a powerful new theorem that can provide a unified analysis of such value-function-based reinforcement-learning algorithms. The usefulness of the theorem lies in how it allows the convergence of a complex asynchronous reinforcement-learning algorithm to be proved by verifying that a simpler synchronous algorithm converges. We illustrate the application of the theorem by analyzing the convergence of Q-learning, model-based reinforcement learning, Q-learning with multistate updates, Q-learning for Markov games, and risk-sensitive reinforcement learning.  相似文献   

8.
Relational reinforcement learning (RRL) combines traditional reinforcement learning (RL) with a strong emphasis on a relational (rather than attribute-value) representation. Earlier work used RRL on a learning version of the classic Blocks World planning problem (a version where the learner does not know what the result of taking an action will be) and the Tetris game. Learning results based on the structure of training examples were obtained, such as learning in a mixed 3–5 block environment and being able to perform in a 3 or 10 block environment. Here, we instead take a function approximation approach to RL for the Blocks World problem. We obtain similar learning accuracies, with better running times, allowing us to consider much larger problem sizes. For instance, we can train on 15 blocks and then perform well on worlds with 100–800 blocks–using less running time than the relational method required to perform well for 3–10 blocks.  相似文献   

9.
提出一种改进深度强化学习算法(NDQN),解决传统Q-learning算法处理复杂地形中移动机器人路径规划时面临的维数灾难.提出一种将深度学习融于Q-learning框架中,以网络输出代替Q值表的深度强化学习方法.针对深度Q网络存在严重的过估计问题,利用更正函数对深度Q网络中的评价函数进行改进.将改进深度强化学习算法与...  相似文献   

10.
样本有限关联值递归Q学习算法及其收敛性证明   总被引:5,自引:0,他引:5  
一个激励学习Agent通过学习一个从状态到动作映射的最优策略来解决问题,求解最优决策一般有两种途径:一种是求最大奖赏方法,另一种最求最优费用方法,利用求解最优费用函数的方法给出了一种新的Q学习算法,Q学习算法是求解信息不完全Markov决策问题的一种有效激励学习方法。Watkins提出了Q学习的基本算法,尽管他证明了在满足一定条件下Q值学习的迭代公式的收敛性,但是在他给出的算法中,没有考虑到在迭代过程中初始状态与初始动作的选取对后继学习的影响,因此提出的关联值递归Q学习算法改进了原来的Q学习算法,并且这种算法有比较好的收敛性质,从求解最优费用函数的方法出发,给出了Q学习的关联值递归算法,这种方法的建立可以使得动态规划(DP)算法中的许多结论直接应用到Q学习的研究中来。  相似文献   

11.
李金娜  尹子轩 《控制与决策》2019,34(11):2343-2349
针对具有数据包丢失的网络化控制系统跟踪控制问题,提出一种非策略Q-学习方法,完全利用可测数据,在系统模型参数未知并且网络通信存在数据丢失的情况下,实现系统以近似最优的方式跟踪目标.首先,刻画具有数据包丢失的网络控制系统,提出线性离散网络控制系统跟踪控制问题;然后,设计一个Smith预测器补偿数据包丢失对网络控制系统性能的影响,构建具有数据包丢失补偿的网络控制系统最优跟踪控制问题;最后,融合动态规划和强化学习方法,提出一种非策略Q-学习算法.算法的优点是:不要求系统模型参数已知,利用网络控制系统可测数据,学习基于预测器状态反馈的最优跟踪控制策略;并且该算法能够保证基于Q-函数的迭代Bellman方程解的无偏性.通过仿真验证所提方法的有效性.  相似文献   

12.
基于每阶段平均费用最优的激励学习算法   总被引:4,自引:0,他引:4  
文中利用求解最优费用函数的方法给出了一种新的激励学习算法,即基于每阶段平均费用最优的激励学习算法。这种学习算法是求解信息不完全Markov决策问题的一种有效激励学习方法,它从求解分阶段最优平均费用函数的方法出发,分析了最优解的存在性,分阶段最优平均费用函数与初始状态的关系以及与之相关的Bellman方程。这种方法的建立,可以使得动态规划(DP)算法中的许多结论直接应用到激励学习的研究中来。  相似文献   

13.
RRL is a relational reinforcement learning system based on Q-learning in relational state-action spaces. It aims to enable agents to learn how to act in an environment that has no natural representation as a tuple of constants. For relational reinforcement learning, the learning algorithm used to approximate the mapping between state-action pairs and their so called Q(uality)-value has to be very reliable, and it has to be able to handle the relational representation of state-action pairs. In this paper we investigate the use of Gaussian processes to approximate the Q-values of state-action pairs. In order to employ Gaussian processes in a relational setting we propose graph kernels as a covariance function between state-action pairs. The standard prediction mechanism for Gaussian processes requires a matrix inversion which can become unstable when the kernel matrix has low rank. These instabilities can be avoided by employing QR-factorization. This leads to better and more stable performance of the algorithm and a more efficient incremental update mechanism. Experiments conducted in the blocks world and with the Tetris game show that Gaussian processes with graph kernels can compete with, and often improve on, regression trees and instance based regression as a generalization algorithm for RRL. Editors: David Page and Akihiro Yamamoto  相似文献   

14.
This paper presents a neural architecture for learning category nodes encoding mappings across multimodal patterns involving sensory inputs, actions, and rewards. By integrating adaptive resonance theory (ART) and temporal difference (TD) methods, the proposed neural model, called TD fusion architecture for learning, cognition, and navigation (TD-FALCON), enables an autonomous agent to adapt and function in a dynamic environment with immediate as well as delayed evaluative feedback (reinforcement) signals. TD-FALCON learns the value functions of the state-action space estimated through on-policy and off-policy TD learning methods, specifically state-action-reward-state-action (SARSA) and Q-learning. The learned value functions are then used to determine the optimal actions based on an action selection policy. We have developed TD-FALCON systems using various TD learning strategies and compared their performance in terms of task completion, learning speed, as well as time and space efficiency. Experiments based on a minefield navigation task have shown that TD-FALCON systems are able to learn effectively with both immediate and delayed reinforcement and achieve a stable performance in a pace much faster than those of standard gradient-descent-based reinforcement learning systems.  相似文献   

15.
基于有限样本的最优费用关联值递归Q学习算法   总被引:4,自引:2,他引:4  
一个激励学习Agent通过学习一个从状态到动作映射的最优策略来求解决策问题。求解最优决策一般有两种途径,一种是求最大奖赏方法,另一种是求最优费用方法。该文利用求解最优费用函数的方法给出了一种新的Q学习算法。Q学习算法是求解信息不完全Markov决策问题的一种有效激励学习方法。文章从求解最优费用函数的方法出发,给出了Q学习的关联值递归算法,这种方法的建立,可以使得动态规划(DP)算法中的许多结论直接应用到Q学习的研究中来。  相似文献   

16.
Stock trading is an important decision-making problem that involves both stock selection and asset management. Though many promising results have been reported for predicting prices, selecting stocks, and managing assets using machine-learning techniques, considering all of them is challenging because of their complexity. In this paper, we present a new stock trading method that incorporates dynamic asset allocation in a reinforcement-learning framework. The proposed asset allocation strategy, called meta policy (MP), is designed to utilize the temporal information from both stock recommendations and the ratio of the stock fund over the asset. Local traders are constructed with pattern-based multiple predictors, and used to decide the purchase money per recommendation. Formulating the MP in the reinforcement learning framework is achieved by a compact design of the environment and the learning agent. Experimental results using the Korean stock market show that the proposed MP method outperforms other fixed asset-allocation strategies, and reduces the risks inherent in local traders.  相似文献   

17.
The present study aims at contributing to the current state-of-the art of activity-based travel demand modelling by presenting a framework to simulate sequential data. To this end, the suitability of a reinforcement learning approach to reproduce sequential data is explored. Additionally, as traditional reinforcement learning techniques are not capable of learning efficiently in large state and action spaces with respect to memory and computational time requirements on the one hand, and of generalizing based on infrequent visits of all state-action pairs on the other hand, the reinforcement learning technique as used in most applications, is enhanced by means of regression tree function approximation.Three reinforcement learning algorithms are implemented to validate their applicability: the traditional Q-learning and Q-learning with bucket-brigade updating are tested against the improved reinforcement learning approach with a CART function approximator. These methods are applied on data of 26 diary days. The results are promising and show that the proposed techniques offer great opportunity of simulating sequential data. Moreover, the reinforcement learning approach improved by introducing a regression tree function approximator learns a more optimal solution much faster than the two traditional Q-learning approaches.  相似文献   

18.
Kernel-Based Reinforcement Learning   总被引:5,自引:0,他引:5  
Ormoneit  Dirk  Sen  Śaunak 《Machine Learning》2002,49(2-3):161-178
We present a kernel-based approach to reinforcement learning that overcomes the stability problems of temporal-difference learning in continuous state-spaces. First, our algorithm converges to a unique solution of an approximate Bellman's equation regardless of its initialization values. Second, the method is consistent in the sense that the resulting policy converges asymptotically to the optimal policy. Parametric value function estimates such as neural networks do not possess this property. Our kernel-based approach also allows us to show that the limiting distribution of the value function estimate is a Gaussian process. This information is useful in studying the bias-variance tradeoff in reinforcement learning. We find that all reinforcement learning approaches to estimating the value function, parametric or non-parametric, are subject to a bias. This bias is typically larger in reinforcement learning than in a comparable regression problem.  相似文献   

19.
Q学习算法在库存控制中的应用   总被引:9,自引:0,他引:9  
Q学习算法是Watkins提出的求解信息不完全马尔可夫决策问题的一种强化学习 方法.这里提出了一种新的探索策略,并将该策略和Q学习算法有效结合来求解一类典型的 有连续状态和决策空间的库存控制问题.仿真表明,该方法所求解的控制策略和用值迭代法 在模型已知的情况下所求得的最优策略非常逼近,从而证实了Q学习算法在一些系统模型 未知的工程控制问题中的应用潜力.  相似文献   

20.
Q学习算法是Watkins提出的求解信息不完全马尔可夫决策问题的一种强化学习方法.这里提出了一种新的探索策略,并将该策略和Q学习算法有效结合来求解一类典型的有连续状态和决策空间的库存控制问题.仿真表明,该方法所求解的控制策略和用值迭代法在模型已知的情况下所求得的最优策略非常逼近,从而证实了Q学习算法在一些系统模型未知的工程控制问题中的应用潜力.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号