Retrieved 20 similar documents.
1.
2.
A neural network learning model for inverted pendulum balance control is constructed. The model exploits the advantage of a growable-structure neural network: the network's structure and size need not be specified in advance, and the network can grow as needed during learning. The growable-structure network combines supervised and unsupervised learning and can quickly learn the latent relationship between stimulus and response. The network is trained offline by supervised learning and then acts as a controller on the inverted pendulum system, forming a growable-structure pendulum control model. Simulation experiments were carried out in Matlab. The results show that the model accomplishes the balance-control task for a single inverted pendulum, verifying its effectiveness and disturbance-rejection capability.
3.
Reinforcement learning is an important approach to adaptive control and is widely applied to learning control with continuous states, but it suffers from low efficiency and slow convergence. Building on a back-propagation (BP) neural network, an algorithm combining eligibility traces is proposed, realizing multi-step updates in the reinforcement learning process. It solves the problem of back-propagating the local gradient of the output layer to the hidden-layer nodes, thereby enabling fast updates of the hidden-layer weights, and an algorithm description is provided. An improved residual method is also proposed: during network training, the layer weights are combined by an optimized linear weighting, obtaining both the learning speed of gradient descent and the convergence of the residual-gradient method; applied to the hidden-layer weight updates, it improves the convergence of the value function. The algorithm is verified and analyzed through a simulation of an inverted pendulum balancing system. The results show that after a short period of learning the method successfully controls the pendulum and significantly improves learning efficiency.
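The multi-step update through eligibility traces described above can be sketched in its simplest linear form (the paper applies the idea to a BP network's hidden-layer weights; the linear value function and all constants below are illustrative, not the authors'):

```python
def td_lambda_step(w, e, s, r, s_next, alpha=0.1, gamma=0.99, lam=0.9):
    """One TD(lambda) update with accumulating eligibility traces.

    w: weights of a linear value function V(s) = w . s
    e: eligibility trace list (same length as w), updated in place
    """
    v = sum(wi * si for wi, si in zip(w, s))
    v_next = sum(wi * si for wi, si in zip(w, s_next))
    delta = r + gamma * v_next - v                            # TD error
    e[:] = [gamma * lam * ei + si for ei, si in zip(e, s)]    # decay, then accumulate
    w[:] = [wi + alpha * delta * ei for wi, ei in zip(w, e)]  # multi-step credit assignment
    return delta
```

Because the trace vector carries credit back over several past states at once, a single TD error updates many weights, which is the "multi-step update" the abstract refers to.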
4.
The inverted pendulum system is an important application domain of reinforcement learning. This paper first shows that the reinforcement learning algorithms commonly used on the inverted pendulum suffer from a limit-cycle problem: the algorithm fails to converge correctly and the control policy is unstable. Because the instability of the policy is not yet obvious on a simple single inverted pendulum, the limit-cycle problem is often overlooked. To address it, a reinforcement learning algorithm based on an action-continuity criterion is proposed; it modifies the reinforcement signal and improves the exploration policy to overcome the effect of the limit cycle on the pendulum system. The proposed algorithm is applied to the control of a real double inverted pendulum. The experimental results show that it not only controls the pendulum successfully but also keeps the control policy stable.
5.
Inverted pendulum control based on the Q-learning algorithm and a BP neural network. Cited 37 times in total (1 self-citation, 37 by others)
Q-learning is a reinforcement learning method proposed by Watkins [1] for solving Markov decision problems with incomplete information. By effectively combining the Q-learning algorithm with a BP neural network, model-free learning control of the inverted pendulum is achieved without discretizing the state. Simulations show that the method not only successfully balances both deterministic and stochastic inverted pendulum models, but also learns more effectively than methods such as Anderson's [2] AHC (Adaptive Heuristic Critic).
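A minimal sketch of the Q-learning update with a function approximator follows; the paper combines Q-learning with a BP network, while this illustration uses a linear approximator with one weight vector per discrete action (all names and defaults are assumptions, not taken from the paper):

```python
def q_update(w, s, a, r, s_next, n_actions, alpha=0.1, gamma=0.99):
    """One Q-learning step with a linear approximator:
    Q(s, a) = w[a] . s, one weight vector per discrete action,
    so the continuous state s never needs to be discretized."""
    q_sa = sum(wi * si for wi, si in zip(w[a], s))
    q_max = max(sum(wi * si for wi, si in zip(w[b], s_next))
                for b in range(n_actions))
    delta = r + gamma * q_max - q_sa                         # TD error
    # The gradient of Q(s, a) w.r.t. w[a] is just the state features:
    w[a] = [wi + alpha * delta * si for wi, si in zip(w[a], s)]
    return delta
```

With a BP network, the last line becomes a back-propagation pass with `delta` as the output error; the structure of the update is otherwise the same.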
6.
The balance-control problem of the single inverted pendulum system is studied, and three schemes, PD, PI, and PID, are implemented. First, a mathematical model of the system is established; the controller parameters of each scheme are then designed and tuned through simulation experiments. The designed controllers were run as real-time controllers on the physical apparatus, and all successfully balanced the pendulum. The actual control results verify the correctness and effectiveness of each scheme.
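The three schemes differ only in which terms of the control law are active. A textbook PID controller, from which PD (ki = 0) and PI (kd = 0) follow as special cases, can be sketched as follows (the gains and sample time are placeholders that would have to be tuned for the actual pendulum):

```python
class PID:
    """Discrete-time PID controller: u = kp*e + ki*integral(e) + kd*de/dt."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None

    def update(self, error):
        self.integral += error * self.dt
        # No derivative on the very first sample (no previous error yet)
        deriv = 0.0 if self.prev_error is None else (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * deriv
```

In a balance loop, `error` would be the pendulum angle deviation sampled every `dt` seconds, and the returned value the commanded cart force or motor voltage.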
7.
Application of evolutionary neural networks to inverted pendulum control. Cited 2 times in total (1 self-citation, 2 by others)
As a typical nonlinear system, the inverted pendulum is multivariable, fast-moving, and absolutely unstable, which makes an accurate mathematical model hard to establish and its control unusually difficult and complex. Intelligent control theory offers an effective route to this problem. Addressing the shortcomings of the conventional neural-network algorithm for pendulum control (the BP algorithm), this paper combines a genetic algorithm with a neural network and proposes an evolutionary neural-network control method for the inverted pendulum: the controller is structured as a neural network, and a genetic algorithm optimizes the network's connection weights. Experimental studies show that the controller not only has good dynamic and steady-state performance but also strongly suppresses disturbances, while remaining structurally simple and easy to implement.
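The core idea, a genetic algorithm searching over the network's connection weights instead of back-propagation, can be sketched as follows (the selection, crossover, and mutation operators and all constants are illustrative choices, not the authors' exact configuration):

```python
import random

def evolve_weights(fitness, dim, pop_size=20, generations=60, sigma=0.1):
    """Minimal real-coded GA over NN weight vectors: truncation selection,
    uniform crossover, Gaussian mutation. 'fitness' is higher-is-better
    (e.g. negative tracking error of the pendulum under those weights)."""
    pop = [[random.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]          # keep the best half (elitism)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = random.sample(parents, 2)
            child = [x if random.random() < 0.5 else y for x, y in zip(a, b)]
            children.append([g + random.gauss(0.0, sigma) for g in child])
        pop = parents + children
    return max(pop, key=fitness)
```

Because fitness is evaluated by running the controller, no gradient of the pendulum dynamics is ever needed, which is the usual motivation for replacing BP with a GA here.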
8.
The inverted pendulum system, hard to control because of its inherent instability, is therefore an excellent laboratory apparatus for evaluating control strategies in automatic control experiments. For its balance-control problem, a method of controlling the pendulum with a neural network is proposed. Because neural-network training is iterative, a fuzzy controller is added to the system to compensate the control variable output by the network, so that the trained network weights stay near a stable value and the controller remains stable. Simulation results show that the parallel fuzzy neural-network controller designed with this method achieves excellent control of this inherently unstable system.
9.
10.
11.
12.
In this article, we propose a new control method using reinforcement learning (RL) with the concept of sliding mode control (SMC). Remarkable characteristics of SMC are its robustness and stability under deviations from the control conditions; RL, on the other hand, is applicable to complex systems that are difficult to model. However, applying RL to a real system has a serious problem: many trials are required for learning. We aim to develop a new control method with the good characteristics of both. To realize this, we unite the actor-critic method, a kind of RL, with SMC. We verify the effectiveness of the proposed method through a computer simulation of inverted pendulum control that makes no use of the pendulum dynamics. In particular, it is shown that the proposed method learns in fewer trials than reinforcement learning alone.
This work was presented in part at the 13th International Symposium on Artificial Life and Robotics, Oita, Japan, January 31–February 2, 2008.
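The sliding mode component can be illustrated in its simplest first-order form; this is a generic SMC sketch with a boundary layer against chattering, not the authors' combined actor-critic formulation, and the surface slope and gain are placeholders:

```python
def sliding_mode_action(theta, theta_dot, c=1.0, k=2.0, phi=0.05):
    """First-order sliding mode control for a pendulum-like error state.

    s = c*theta + theta_dot defines the sliding surface; the control drives
    the state toward s = 0, after which the error decays along the surface.
    A boundary layer of width phi replaces the pure sign function to
    reduce chattering.
    """
    s = c * theta + theta_dot
    sat = max(-1.0, min(1.0, s / phi))   # saturated sign of s
    return -k * sat
```

In the paper's setting, quantities such as the gain would be adapted by the actor-critic learner rather than fixed; the sketch only shows the SMC structure that learning is combined with.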
13.
Advanced Robotics, 2013, 27(10): 1215-1229
Reinforcement learning is a scheme for unsupervised learning in which robots are expected to acquire behavior skills through self-exploration based on reward signals. There are some difficulties, however, in applying conventional reinforcement learning algorithms to motion control tasks of a robot, because most algorithms are concerned with discrete state spaces and assume complete observability of the state. Real-world environments often have partial observability, so robots have to estimate the unobservable hidden states. This paper proposes a method that solves these two problems by combining a reinforcement learning algorithm with a learning algorithm for a continuous-time recurrent neural network (CTRNN). The CTRNN can learn spatio-temporal structures in a continuous time and space domain, and can preserve the contextual flow by self-organizing an appropriate internal memory structure. This enables the robot to deal with the hidden-state problem. We carried out an experiment on the pendulum swing-up task without rotational speed information. The task is accomplished in several hundred trials using the proposed algorithm. In addition, it is shown that the rotational speed of the pendulum, treated as a hidden state, is estimated and encoded in the activation of a context neuron.
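The continuous-time dynamics of a CTRNN can be sketched with a single Euler integration step (this is a generic textbook CTRNN update, not the specific architecture, sizes, or time constants of this paper):

```python
import math

def ctrnn_step(y, inputs, tau, w, dt=0.01):
    """One Euler step of a continuous-time recurrent neural network:
    tau_i * dy_i/dt = -y_i + sum_j w[i][j] * sigma(y_j) + input_i.

    y: neuron states; tau: per-neuron time constants; w: recurrent weights.
    Returns the updated state list.
    """
    sig = [1.0 / (1.0 + math.exp(-v)) for v in y]   # logistic activation
    return [yi + dt / tau[i] * (-yi
                                + sum(w[i][j] * sig[j] for j in range(len(y)))
                                + inputs[i])
            for i, yi in enumerate(y)]
```

The recurrent term is what lets a "context neuron" integrate past observations over time and thereby encode a hidden state such as the pendulum's unmeasured rotational speed.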
14.
A reinforcement scheme for learning automata, applicable to real situations where the reinforcement received from the environment is delayed, is presented. The scheme divides the state space into regions following the boxes approach of Michie and Chambers. Each region maintains estimates of the reward characteristics of the environment and contains a local automaton that updates action probabilities whenever the system state enters it. Estimates of reward characteristics are obtained using reinforcement received during the period of eligibility. Results obtained through computer simulation of the inverted pendulum problem are compared with the adaptive critic learning developed by Barto et al. (1983).
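The boxes idea, partitioning the four-dimensional cart-pole state into fixed regions, each of which would host its own local automaton, can be sketched as follows (the threshold values are illustrative, not Michie and Chambers' originals):

```python
def box_index(x, x_dot, theta, theta_dot):
    """Map a cart-pole state to one of 3*3*4*3 = 108 discrete 'boxes'.

    Each box would hold its own local learning automaton with action
    probabilities and reward estimates.
    """
    def bin_of(v, edges):
        return sum(v > e for e in edges)          # which interval v falls in
    bins = (bin_of(x, [-0.8, 0.8]),               # 3 cart-position bins
            bin_of(x_dot, [-0.5, 0.5]),           # 3 cart-velocity bins
            bin_of(theta, [-0.1, 0.0, 0.1]),      # 4 pole-angle bins
            bin_of(theta_dot, [-0.8, 0.8]))       # 3 pole-velocity bins
    index, sizes = 0, (3, 3, 4, 3)
    for b, n in zip(bins, sizes):
        index = index * n + b                     # mixed-radix flattening
    return index
```

The controller then keeps a table indexed by `box_index`, so credit from delayed reinforcement is assigned per region rather than per raw continuous state.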
15.
A consideration of human immunity-based reinforcement learning with continuous states. Cited 1 time in total (1 self-citation, 0 by others)
Many reinforcement learning methods have been studied on the assumption that the state is discretized and the environment size is predetermined. However, an operating environment may have a continuous state whose size is not known in advance, e.g., in robot navigation and control. Applying these methods to such an environment may require a large amount of time for learning, or learning may fail altogether. In this study, we improve our previous human immunity-based reinforcement learning method so that it works in continuous state space environments. Since our method selects an action based on the distance between the present state and the memorized action, information about the environment (e.g., its size) is not required in advance. The validity of our method is demonstrated through simulations of the swing-up control of an inverted pendulum.
16.
For reinforcement learning control problems in continuous spaces, a Q-learning method based on a self-organizing fuzzy RBF network is proposed. The network takes the state as input and outputs a continuous action together with its Q-value, realizing a mapping from continuous states to continuous actions. First, the continuous action space is discretized into a fixed number of discrete actions, and a fully greedy policy selects the discrete action with the largest Q-value as the local winning action of each fuzzy rule. A command-fusion mechanism then weights the winning discrete actions by their utility values to obtain the continuous action actually applied to the system. In addition, to simplify the network structure and speed up learning, an improved RAN algorithm and gradient descent are used to adaptively adjust the network's structure and parameters online. Simulation results on inverted pendulum balance control verify the effectiveness of the proposed Q-learning method.
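The command-fusion step, blending the winning discrete actions into one continuous action according to their utility values, might look like the following (the softmax weighting is an illustrative choice; the paper's exact weighting scheme may differ):

```python
import math

def fuse_actions(actions, utilities, temperature=0.2):
    """Command fusion: produce one continuous action as the
    utility-weighted average of discrete candidate actions.

    Weights come from a softmax over utilities, so higher-utility
    actions dominate but nearby candidates still contribute,
    smoothing the resulting control signal.
    """
    m = max(utilities)                                    # for numerical stability
    w = [math.exp((u - m) / temperature) for u in utilities]
    z = sum(w)
    return sum(a * wi for a, wi in zip(actions, w)) / z
```

Lowering `temperature` makes the fusion approach a hard argmax over the discrete actions; raising it approaches a plain average.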
17.
Adaptive discretization of continuous states: a reinforcement learning method based on K-means clustering. Cited 5 times in total (1 self-citation, 5 by others)
A clustering algorithm is used to adaptively discretize the continuous state space, yielding a reinforcement learning method based on K-means clustering. The learning process has two parts: state-space learning, which adaptively discretizes the continuous state space using the K-means clustering algorithm, and policy learning, which seeks the optimal policy using the Sarsa learning algorithm with replacing eligibility traces. Simulation experiments on benchmark continuous-state reinforcement learning problems show that the method adaptively discretizes the continuous state space and ultimately learns the optimal policy. Compared with a reinforcement learning method based on a CMAC network, it has the advantages of saving storage space and shortening computation time.
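The state-space learning part, discretizing continuous states with K-means so that each cluster centre acts as one tabular state, can be sketched as follows (a plain Lloyd's algorithm; the distance metric and iteration count are assumptions):

```python
import random

def nearest(p, centres):
    """Index of the closest centre: the discretized state of point p."""
    return min(range(len(centres)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centres[j])))

def kmeans(points, k, iters=25):
    """Lloyd's algorithm: cluster observed continuous states so each
    centre can serve as one discrete state for tabular Sarsa learning."""
    centres = random.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centres)].append(p)
        for j, g in enumerate(groups):
            if g:  # keep the old centre if the cluster emptied
                centres[j] = [sum(c) / len(g) for c in zip(*g)]
    return centres
```

During policy learning, `nearest(state, centres)` would index the Q-table, so the table size is `k` times the number of actions, which is the memory saving the abstract compares against CMAC.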
18.
For the inverted pendulum system, a structurally growable neural-network control scheme is proposed. Using a growing cell structures algorithm, the network classifies patterns of the state variables in the working domain, and the network's size evolves through the insertion of new neurons. In the output domain, a reinforced Hebbian learning mechanism is adopted for the pendulum control task, so that different neurons respond in the best way to stimuli of different kinds. Simulations show that, through the network's own development, the scheme effectively controls the inverted pendulum system.
19.
20.
An adaptive control scheme based on a fuzzy neural network is proposed. For complex learning tasks in continuous spaces, a competitive Takagi-Sugeno fuzzy reinforcement learning network is presented; its structure integrates Takagi-Sugeno fuzzy inference with action-based value-function reinforcement learning. Correspondingly, an optimized learning algorithm is proposed that trains the competitive Takagi-Sugeno fuzzy reinforcement learning network into a so-called Takagi-Sugeno fuzzy variable-structure controller. Taking a single inverted pendulum control system as an example, simulation studies show that the proposed learning algorithm outperforms other reinforcement learning algorithms.
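The Takagi-Sugeno inference at the heart of such a controller can be sketched in its zero-order form, where each rule pairs a membership function with a constant consequent (this generic sketch omits the competitive and reinforcement learning machinery the paper adds on top):

```python
def ts_inference(x, rules):
    """Zero-order Takagi-Sugeno fuzzy inference.

    rules: list of (membership_fn, consequent) pairs; the output is the
    membership-weighted average of the rule consequents.
    """
    w = [mu(x) for mu, _ in rules]          # firing strength of each rule
    z = sum(w)
    return sum(wi * c for wi, (_, c) in zip(w, rules)) / z if z else 0.0
```

In the paper's controller, the consequents would be control actions tuned by reinforcement learning rather than fixed constants; the inference step itself is unchanged.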