Similar Literature
 20 similar records found (search time: 15 ms)
1.
A Learning Controller Design Method for Two-Wheel-Driven Mobile Robots*
A reinforcement-learning-based path-following control method for two-wheel-driven mobile robots is proposed. By modeling the optimal design of the robot's motion controller as a Markov decision process, the controller parameters are optimized in a self-learning fashion using the kernel-based least-squares policy iteration (KLSPI) algorithm. Unlike traditional tabular and neural-network-based reinforcement learning methods, KLSPI applies kernel methods for feature selection and value-function approximation during policy evaluation, which improves generalization performance and learning efficiency. Simulation results show that the method obtains an optimized path-following control policy within a small number of iterations, which favors its adoption in practical applications.
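KLSPI builds on least-squares policy iteration (LSPI). As a minimal illustration of plain LSPI (with simple one-hot features rather than the kernel-based features the paper uses), the following sketch solves a toy 5-state chain MDP; the MDP, feature map, and all names here are hypothetical, not taken from the paper:

```python
import numpy as np

# Toy 5-state chain MDP: action 1 moves right, action 0 moves left.
# Reward 1 for stepping into the rightmost state, else 0.
N_STATES, ACTIONS, GAMMA = 5, (0, 1), 0.9

def step(s, a):
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    r = 1.0 if s2 == N_STATES - 1 else 0.0
    return r, s2

def phi(s, a):
    """One-hot feature over (state, action) pairs."""
    f = np.zeros(N_STATES * 2)
    f[s * 2 + a] = 1.0
    return f

def lspi(n_iters=20):
    # Sample every (s, a) transition once (the toy MDP is deterministic).
    samples = [(s, a, *step(s, a)) for s in range(N_STATES) for a in ACTIONS]
    w = np.zeros(N_STATES * 2)
    for _ in range(n_iters):
        A = 1e-6 * np.eye(N_STATES * 2)   # small ridge term for invertibility
        b = np.zeros(N_STATES * 2)
        for s, a, r, s2 in samples:
            a2 = max(ACTIONS, key=lambda u: w @ phi(s2, u))  # greedy policy
            f = phi(s, a)
            A += np.outer(f, f - GAMMA * phi(s2, a2))        # LSTD-Q system
            b += r * f
        w_new = np.linalg.solve(A, b)
        if np.allclose(w, w_new, atol=1e-8):
            break
        w = w_new
    return w

w = lspi()
policy = [max(ACTIONS, key=lambda a: w @ phi(s, a)) for s in range(N_STATES)]
print(policy)  # the greedy policy should move right in every state
```

Each iteration evaluates the current greedy policy by solving the LSTD-Q linear system, then improves the policy greedily; with exact (one-hot) features this reduces to exact policy iteration and converges in a few sweeps.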

2.
As an important approach to solving complex sequential decision problems, reinforcement learning (RL) has been widely studied in the community of artificial intelligence and machine learning. However, the generalization ability of RL is still an open problem and it is difficult for existing RL algorithms to solve Markov decision problems (MDPs) with both continuous state and action spaces. In this paper, a novel RL approach with fast policy search and adaptive basis function selection, which is called Continuous-action Approximate Policy Iteration (CAPI), is proposed for RL in MDPs with both continuous state and action spaces. In CAPI, based on the value functions estimated by temporal-difference learning, a fast policy search technique is suggested to search for optimal actions in continuous spaces, which is computationally efficient and easy to implement. To improve the generalization ability and learning efficiency of CAPI, two adaptive basis function selection methods are developed so that sparse approximation of value functions can be obtained efficiently both for linear function approximators and kernel machines. Simulation results on benchmark learning control tasks with continuous state and action spaces show that the proposed approach not only can converge to a near-optimal policy in a few iterations but also can obtain comparable or even better performance than Sarsa-learning and previous approximate policy iteration methods such as LSPI and KLSPI.

3.
吴军, 徐昕, 连传强, 黄岩. 《机器人》, 2011, 33(3): 379-384
A distributed kernel-based reinforcement learning method is proposed to optimize the formation control performance of multi-robot systems. First, basic multi-robot formation control is achieved by adding a virtual leader robot and combining it with a distributed follower control strategy. Second, a kernel-based reinforcement learning method combining least-squares policy iteration with policy evaluation is proposed: the kernel-based least-squares policy iteration algorithm is used offline to obtain an initial optimized formation control policy, and the kernel-based least-squares policy evaluation algorithm is then used to realize online…

4.
Robust motion control is fundamental to autonomous mobile robots. In the past few years, reinforcement learning (RL) has attracted considerable attention in the feedback control of wheeled mobile robots. However, it is still difficult for RL to solve problems with large or continuous state spaces, which is common in robotics. To improve the generalization ability of RL, this paper presents a novel hierarchical RL approach for optimal path tracking of wheeled mobile robots. In the proposed approach, a graph Laplacian-based hierarchical approximate policy iteration (GHAPI) algorithm is developed, in which the basis functions are constructed automatically using the graph Laplacian operator. In GHAPI, the state space of a Markov decision process is divided into several subspaces and approximate policy iteration is carried out on each subspace. Then, a near-optimal path-tracking control strategy can be obtained by GHAPI combined with proportional-derivative (PD) control. The performance of the proposed approach is evaluated by using a P3-AT wheeled mobile robot. It is demonstrated that the GHAPI-based PD control can obtain better near-optimal control policies than previous approaches.
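The core idea behind graph Laplacian basis construction can be sketched in a few lines: the smoothest eigenvectors of the state-transition graph's Laplacian serve as basis functions for value approximation. The snippet below is only an illustration of that idea on a hypothetical 10-state chain graph, not the GHAPI algorithm itself (which adds subspace decomposition and policy iteration on top):

```python
import numpy as np

# Build the combinatorial graph Laplacian of a simple 10-state chain graph,
# whose low-order eigenvectors are smooth functions over the state space.
n = 10
W = np.zeros((n, n))
for i in range(n - 1):               # adjacency: neighboring states connected
    W[i, i + 1] = W[i + 1, i] = 1.0
D = np.diag(W.sum(axis=1))
L = D - W                            # combinatorial graph Laplacian
eigvals, eigvecs = np.linalg.eigh(L) # eigenvalues in ascending order
basis = eigvecs[:, :4]               # keep the 4 smoothest eigenvectors

# Least-squares fit of a sample (smooth) value function onto the basis.
v = np.linspace(0.0, 1.0, n) ** 2
coef, *_ = np.linalg.lstsq(basis, v, rcond=None)
v_hat = basis @ coef
print(np.max(np.abs(v - v_hat)))     # small residual: smooth v is well captured
```

Because smooth value functions concentrate their energy on the low-frequency Laplacian eigenvectors, a handful of automatically constructed basis functions already yields a tight fit.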

5.
Policy Iteration Reinforcement Learning Based on Geodesic Gaussian Bases over State-Action Graphs
In policy-iteration reinforcement learning, basis-function construction is an important factor affecting the approximation accuracy of the action-value function. To provide suitable basis functions for action-value function approximation, a policy-iteration reinforcement learning method based on geodesic Gaussian bases over state-action graphs is proposed. First, a state-action graph description of the Markov decision process is established according to an off-policy method; then, geodesic Gaussian kernel functions are defined on the state-action graph, and a kernel sparsification method based on approximate linear dependence is used to automatically select geodesic Gaussian…

6.
This paper makes a first step toward the integration of two subfields of machine learning, namely preference learning and reinforcement learning (RL). An important motivation for a preference-based approach to reinforcement learning is the observation that in many real-world domains, numerical feedback signals are not readily available, or are defined arbitrarily in order to satisfy the needs of conventional RL algorithms. Instead, we propose an alternative framework for reinforcement learning, in which qualitative reward signals can be directly used by the learner. The framework may be viewed as a generalization of the conventional RL framework in which only a partial order between policies is required instead of the total order induced by their respective expected long-term reward. Therefore, building on novel methods for preference learning, our general goal is to equip the RL agent with qualitative policy models, such as ranking functions that allow for sorting its available actions from most to least promising, as well as algorithms for learning such models from qualitative feedback. As a proof of concept, we realize a first simple instantiation of this framework that defines preferences based on utilities observed for trajectories. To that end, we build on an existing method for approximate policy iteration based on roll-outs. While this approach is based on the use of classification methods for generalization and policy learning, we make use of a specific type of preference learning method called label ranking. Advantages of preference-based approximate policy iteration are illustrated by means of two case studies.
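The notion of learning a ranking function from pairwise preferences can be sketched very simply. The toy example below is not the paper's label-ranking method; it is a hypothetical perceptron-style learner that, given preferences "trajectory x is preferred to trajectory y", pushes a linear utility w·x above w·y:

```python
import numpy as np

def learn_ranker(pairs, dim, epochs=50):
    """Perceptron-style ranking: for each misranked preference pair
    (better, worse), nudge w toward (better - worse)."""
    w = np.zeros(dim)
    for _ in range(epochs):
        for better, worse in pairs:
            if w @ better <= w @ worse:   # misranked pair -> update
                w += better - worse
    return w

# Hypothetical 2-d features for four trajectories; the observed preferences
# induce the total order d > c > b > a.
a, b, c, d = (np.array(v, float) for v in ([0, 1], [1, 1], [2, 1], [3, 1]))
pairs = [(d, c), (c, b), (b, a)]
w = learn_ranker(pairs, dim=2)
ranked = sorted([("a", a), ("b", b), ("c", c), ("d", d)],
                key=lambda t: -(w @ t[1]))
print([name for name, _ in ranked])
```

Only the *order* of utilities matters here, which is exactly the appeal of preference-based RL: no numeric reward magnitudes need to be specified.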

7.
徐昕, 沈栋, 高岩青, 王凯. 《自动化学报》, 2012, 38(5): 673-687
Learning control of dynamical systems based on Markov decision processes (MDPs) has become an interdisciplinary research direction in recent years, spanning machine learning, control theory, and operations research; its main goal is data-driven multi-stage optimal control of systems whose models are complex or uncertain. This paper surveys the research frontier of the theory, algorithms, and applications of MDP-based learning control for dynamical systems, focusing on progress in reinforcement learning (RL) and approximate dynamic programming (ADP), including temporal-difference learning theory, value-function approximation methods for MDPs with continuous state and action spaces, direct policy search and approximate policy iteration, and adaptive critic design algorithms. Finally, applications and development trends in the related research areas are analyzed and discussed.

8.
Exploiting the powerful nonlinear mapping capability of kernel learning, a class of kernel-learning-based prediction models is proposed for short-term traffic flow forecasting. Kernel recursive least squares (KRLS) based on the approximate linear dependence (ALD) technique reduces computational complexity and storage requirements; it is an online kernel learning method suitable for learning on relatively large data sets. The kernel partial least squares (KPLS) method projects the input variables onto latent variables and extracts latent features using the covariance information between input and output variables. The kernel extreme learning machine (KELM) method represents the unknown nonlinear feature mapping of the hidden layer with a kernel function and computes the network output weights by a regularized least-squares algorithm, achieving good generalization at an extremely fast learning speed. To verify the effectiveness of the proposed methods, KELM, KPLS, and ALD-KRLS are applied to different measured traffic flow data sets and compared with existing methods under identical conditions. Experimental results show improvements in both prediction accuracy and training speed for the different kernel learning methods, demonstrating the application potential of kernel learning in short-term traffic flow forecasting.
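The ALD sparsification idea used by ALD-KRLS can be sketched directly: a new sample enters the kernel dictionary only if its feature-space image cannot be approximated by the current dictionary within a tolerance. The following is a minimal, hypothetical illustration of the ALD test with a Gaussian kernel (the kernel width, tolerance, and data are assumptions, not values from the paper):

```python
import numpy as np

def gauss_kernel(x, y, sigma=0.5):
    return np.exp(-np.sum((np.asarray(x) - np.asarray(y)) ** 2) / (2 * sigma ** 2))

def ald_dictionary(samples, nu=0.1, sigma=0.5):
    """Greedy ALD sparsification: keep x only if
    delta = k(x,x) - k^T K^{-1} k exceeds the tolerance nu."""
    dictionary = [samples[0]]
    for x in samples[1:]:
        K = np.array([[gauss_kernel(p, q, sigma) for q in dictionary]
                      for p in dictionary])
        k = np.array([gauss_kernel(p, x, sigma) for p in dictionary])
        c = np.linalg.solve(K + 1e-10 * np.eye(len(dictionary)), k)
        delta = gauss_kernel(x, x, sigma) - k @ c   # ALD residual
        if delta > nu:
            dictionary.append(x)
    return dictionary

# 200 samples on a line: the dictionary stays far smaller than the data set,
# which is what keeps online kernel methods tractable on large streams.
rng = np.random.default_rng(0)
samples = rng.uniform(0, 1, size=(200, 1))
d = ald_dictionary(samples)
print(len(samples), len(d))
```

Because redundant samples are rejected, the per-step cost of the online kernel method stays bounded by the dictionary size rather than growing with the stream length.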

9.
This paper addresses the problem of automatically tuning multiple kernel parameters for the kernel-based linear discriminant analysis (LDA) method. The kernel approach has been proposed to solve face recognition problems under complex distribution by mapping the input space to a high-dimensional feature space. Some recognition algorithms such as the kernel principal components analysis, kernel Fisher discriminant, generalized discriminant analysis, and kernel direct LDA have been developed in the last five years. The experimental results show that the kernel-based method is a good and feasible approach to tackle the pose and illumination variations. One of the crucial factors in the kernel approach is the selection of kernel parameters, which highly affects the generalization capability and stability of the kernel-based learning methods. In view of this, we propose an eigenvalue-stability-bounded margin maximization (ESBMM) algorithm to automatically tune the multiple parameters of the Gaussian radial basis function kernel for the kernel subspace LDA (KSLDA) method, which is developed based on our previously developed subspace LDA method. The ESBMM algorithm improves the generalization capability of the kernel-based LDA method by maximizing the margin maximization criterion while maintaining the eigenvalue stability of the kernel-based LDA method. An in-depth investigation on the generalization performance on pose and illumination dimensions is performed using the YaleB and CMU PIE databases. The FERET database is also used for benchmark evaluation. Compared with the existing PCA-based and LDA-based methods, our proposed KSLDA method, with the ESBMM kernel parameter estimation algorithm, gives superior performance.

10.

In order to curb the model expansion of kernel learning methods and adapt to the nonlinear dynamics in the process of nonstationary time series online prediction, a new online sequential learning algorithm with a sparse update and adaptive regularization scheme is proposed based on the kernel-based incremental extreme learning machine (KB-IELM). For online sparsification, a new method is presented to select the sparse dictionary based on an instantaneous information measure. This method utilizes a pruning strategy, which prunes the least "significant" centers and preserves the important ones by online minimization of the dictionary's redundancy. For the adaptive regularization scheme, a new objective function is constructed based on the basic ELM model. The new model has different structural risks in different nonlinear regions. At each training step, each newly added sample can be assigned an optimal regularization factor by an optimization procedure. Performance comparisons of the proposed method with other existing online sequential learning methods are presented using artificial and real-world nonstationary time series data. The results indicate that the proposed method can achieve higher prediction accuracy, better generalization performance, and stability.


11.

In this paper, we develop a novel non-parametric online actor-critic reinforcement learning (RL) algorithm to solve optimal regulation problems for a class of continuous-time affine nonlinear dynamical systems. To deal with the value function approximation (VFA) with inherent nonlinear and unknown structure, a reproducing kernel Hilbert space (RKHS)-based kernelized method is designed through online sparsification, where the dictionary size is fixed and consists of updated elements. In addition, a linear independence check condition, i.e., an online criterion, is designed to determine whether the online data should be inserted into the dictionary. The RKHS-based kernelized VFA has a variable structure in accordance with the online data collection, which is different from classical parametric VFA methods with a fixed structure. Furthermore, we develop a sparse online kernelized actor-critic learning RL method to learn the unknown optimal value function and the optimal control policy in an adaptive fashion. The convergence of the presented kernelized actor-critic learning method to the optimum is provided. The boundedness of the closed-loop signals during the online learning phase can be guaranteed. Finally, a simulation example is conducted to demonstrate the effectiveness of the presented kernelized actor-critic learning algorithm.


12.
To address the problem that the kernel matrix constructed by the incremental least-squares twin support vector regression machine cannot approximate the original kernel matrix well, an incremental reduced least-squares twin support vector regression (IRLSTSVR) algorithm is proposed. The algorithm first uses a reduction method to assess the correlation among the column vectors of the kernel matrix and selects the samples forming those columns as support vectors, reducing the correlation among the column vectors so that the constructed kernel matrix better approximates the original one and the sparsity of the solution is guaranteed. The inverse matrix is then updated incrementally and efficiently via the block matrix inversion lemma, further shortening the training time. Finally, the feasibility and effectiveness of the algorithm are verified on benchmark data sets. Experimental results show that, compared with existing representative algorithms, IRLSTSVR obtains sparse solutions and generalization performance closer to that of the offline algorithm.
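The block matrix inversion (bordering) lemma mentioned above is what makes the incremental update cheap: given A⁻¹, the inverse of the bordered matrix [[A, b], [bᵀ, c]] is obtained in O(n²) via the Schur complement s = c − bᵀA⁻¹b, instead of re-inverting from scratch in O(n³). A minimal self-contained sketch (the matrices here are random stand-ins, not the paper's kernel matrices):

```python
import numpy as np

def bordered_inverse(A_inv, b, c):
    """Inverse of [[A, b], [b^T, c]] from A_inv = A^{-1} via the
    block matrix inversion lemma (scalar Schur complement s)."""
    b = b.reshape(-1, 1)
    s = float(c - b.T @ A_inv @ b)        # Schur complement
    u = A_inv @ b
    n = A_inv.shape[0]
    M = np.empty((n + 1, n + 1))
    M[:n, :n] = A_inv + (u @ u.T) / s
    M[:n, n:] = -u / s
    M[n:, :n] = -u.T / s
    M[n, n] = 1.0 / s
    return M

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 6))
K = X @ X.T + 6 * np.eye(6)               # symmetric positive-definite matrix
A, b, c = K[:5, :5], K[:5, 5], K[5, 5]
M = bordered_inverse(np.linalg.inv(A), b, c)
print(np.allclose(M, np.linalg.inv(K)))   # prints True: matches direct inverse
```

Appending one training sample corresponds to exactly this bordering of the kernel matrix, which is why the incremental algorithm avoids the cubic cost at every step.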

13.
《Automatica》, 2014, 50(12): 3281-3290
This paper addresses the model-free nonlinear optimal control problem based on data by introducing the reinforcement learning (RL) technique. It is known that the nonlinear optimal control problem relies on the solution of the Hamilton–Jacobi–Bellman (HJB) equation, which is a nonlinear partial differential equation that is generally impossible to be solved analytically. Even worse, most practical systems are too complicated to establish an accurate mathematical model. To overcome these difficulties, we propose a data-based approximate policy iteration (API) method by using real system data rather than a system model. Firstly, a model-free policy iteration algorithm is derived and its convergence is proved. The implementation of the algorithm is based on the actor–critic structure, where actor and critic neural networks (NNs) are employed to approximate the control policy and cost function, respectively. To update the weights of actor and critic NNs, a least-square approach is developed based on the method of weighted residuals. The data-based API is an off-policy RL method, where the “exploration” is improved by arbitrarily sampling data on the state and input domain. Finally, we test the data-based API control design method on a simple nonlinear system, and further apply it to a rotational/translational actuator system. The simulation results demonstrate the effectiveness of the proposed method.

14.

Considering the influence of kernel function selection on the generalization of least-squares support vector machine regression models, a new Lp-norm-constrained multiple-kernel learning algorithm for least-squares support vector machines is proposed. The algorithm provides two solution methods, both solved via a double loop: the outer loop updates the kernel weights, while the inner loop solves the Lagrange multipliers of the least-squares support vector machine. Fully exploiting this multiple-kernel learning algorithm effectively improves the generalization ability of the least-squares support vector machine, and the method is strongly robust to the choice of the penalty parameter. Simulation experiments on univariate and multivariate functions demonstrate the effectiveness of the proposed algorithm.


15.
This paper develops an online adaptive critic algorithm based on policy iteration for partially unknown nonlinear optimal control with infinite horizon cost function. In the proposed method, only a critic network is established, which eliminates the action network, to simplify its architecture. The online least squares support vector machine (LS‐SVM) is utilized to approximate the gradient of the associated cost function in the critic network by updating the input‐output data. Additionally, a data buffer memory is added to alleviate computational load. Finally, the feasibility of the online learning algorithm is demonstrated in simulation on two example systems.

16.
To deal with the nonlinearity, strong coupling, and multiple uncertainties of a mobile ammunition-loading manipulator system, this paper proposes, based on adaptive dynamic programming, a trajectory-tracking control method that contains only a critic network structure, effectively reducing the system tracking error. First, considering the system's nonlinear characteristics, the strong coupling among variables, and the influence of gravity, a dynamic model of the mobile ammunition-loading manipulator is established via the Lagrange equations. Second, to handle the unknown upper bounds of the system uncertainties, a single-network critic structure is built, the Hamilton–Jacobi–Bellman equation is solved by a policy iteration algorithm, and an adaptive dynamic programming trajectory-tracking control method is designed based on Lyapunov stability theory. Finally, simulation experiments compare the proposed control method with an adaptive sliding-mode control method, further verifying its effectiveness.

17.
We consider the revenue management problem of capacity control under customer choice behavior. An exact solution of the underlying stochastic dynamic program is difficult because of the multi-dimensional state space and, thus, approximate dynamic programming (ADP) techniques are widely used. The key idea of ADP is to encode the multi-dimensional state space by a small number of basis functions, often leading to a parametric approximation of the dynamic program’s value function. In general, two classes of ADP techniques for learning value function approximations exist: mathematical programming and simulation. So far, the literature on capacity control largely focuses on the first class. In this paper, we develop a least squares approximate policy iteration (API) approach which belongs to the second class. Thereby, we suggest value function approximations that are linear in the parameters, and we estimate the parameters via linear least squares regression. Exploiting both exact and heuristic knowledge from the value function, we enforce structural constraints on the parameters to facilitate learning a good policy. We perform an extensive simulation study to investigate the performance of our approach. The results show that it is able to obtain competitive revenues compared to and often outperforms state-of-the-art capacity control methods in reasonable computational time. Depending on the scarcity of capacity and the point in time, revenue improvements of around 1% or more can be observed. Furthermore, the proposed approach contributes to simulation-based ADP, bringing forth research on numerically estimating piecewise linear value function approximations and their application in revenue management environments.

18.
Research on Reinforcement Learning Algorithms Based on Neural Networks
BP neural networks are widely used in nonlinear control systems, but as a supervised learning algorithm they require batches of input-output pairs for training, and in systems where the optimal policy is unknown such pairs cannot be obtained in advance. Reinforcement learning, on the other hand, adjusts the policy from experience gathered on the actual system and gradually approaches the optimal policy, with a learning process that needs no teacher's supervision. A learning algorithm combining reinforcement learning with a BP neural network, the RBP model, is proposed. Its basic idea is to control the policy through reinforcement learning and, after a certain number of learning cycles, use the acquired knowledge to train the neural network, so that the network gradually converges to the optimal state. Finally, experiments verify the effectiveness and convergence of the method.

19.
In many classification problems, the class distribution is imbalanced. Learning from imbalanced data is a remarkable challenge in the knowledge discovery and data mining field. In this paper, we propose a scaling kernel-based support vector machine (SVM) approach to deal with the multi-class imbalanced data classification problem. We first use the standard SVM algorithm to gain an approximate hyperplane. Then, we present a scaling kernel function and calculate its parameters using the chi-square test and weighting factors. Experimental results on KEEL data sets show the proposed algorithm can resolve the classifier performance degradation problem due to skewed data distribution and has good generalization.

20.
A Kernel-Based Two-Class Classifier for Imbalanced Data Sets
Many kernel classifier construction algorithms adopt classification accuracy as the performance metric in model evaluation. Moreover, equal weighting is often applied to each data sample in parameter estimation. These modeling practices often become problematic if the data sets are imbalanced. We present a kernel classifier construction algorithm using orthogonal forward selection (OFS) in order to optimize the model generalization for imbalanced two-class data sets. This kernel classifier identification algorithm is based on a new regularized orthogonal weighted least squares (ROWLS) estimator and the model selection criterion of maximal leave-one-out area under curve (LOO-AUC) of the receiver operating characteristics (ROCs). It is shown that, owing to the orthogonalization procedure, the LOO-AUC can be calculated via an analytic formula based on the new regularized orthogonal weighted least squares parameter estimator, without actually splitting the estimation data set. The proposed algorithm can achieve minimal computational expense via a set of forward recursive updating formulae in searching model terms with maximal incremental LOO-AUC value. Numerical examples are used to demonstrate the efficacy of the algorithm.
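The AUC criterion that this entry optimizes (via its LOO-AUC formula) equals the Mann-Whitney U statistic: the probability that a randomly chosen positive sample is scored above a randomly chosen negative one, which, unlike raw accuracy, is insensitive to class imbalance. A minimal sketch of that quantity on hypothetical imbalanced toy data (the scores and labels below are invented for illustration):

```python
import numpy as np

def roc_auc(scores, labels):
    """AUC as the Mann-Whitney U statistic: fraction of (positive, negative)
    pairs ranked correctly, counting ties as half."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

# Imbalanced toy data: 90 negatives, 10 positives.
scores = np.array([0.1] * 80 + [0.6] * 10 + [0.4] * 2 + [0.9] * 8)
labels = np.array([0] * 90 + [1] * 10)
print(roc_auc(scores, labels))
```

A trivial "always negative" classifier would score 90% accuracy on this data while its AUC stays uninformative, which is why AUC-based criteria such as LOO-AUC are preferred for imbalanced model selection.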

