1.

Zhuo Rui, Chen Zonghai, Chen Chunlin (卓睿, 陈宗海, 陈春林) 《Computer Simulation (计算机仿真)》 2005,22(8):157-162

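Autonomous navigation is a key technology for mobile robots. This paper combines reinforcement learning with fuzzy logic to implement navigation control of an autonomous mobile robot in unknown environments. It first introduces the principles of reinforcement learning and then designs a robot navigation framework for unknown environments. The framework consists of a collision-avoidance module, a goal-seeking module, and a behavior-selection module. For this framework, a learning and planning algorithm based on reinforcement learning and fuzzy logic is proposed: after the collision-avoidance and goal-seeking behaviors are learned independently, environmental information from ultrasonic sensors is used for behavior selection, so that the robot reaches the goal while successfully avoiding collisions. Finally, extensive simulation experiments demonstrate the effectiveness of the algorithm.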

2.

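Reinforcement learning based on fuzzy neural networks and its application to robot navigation (total citations: 5; self-citations: 0; citations by others: 5)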

Duan Yong, Xu Xinhe (段勇, 徐心和) 《Control and Decision (控制与决策)》 2007,22(5):525-529

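This paper studies behavior-based control of mobile robots. A fuzzy neural network is combined with reinforcement learning theory to form a fuzzy reinforcement system. The system can acquire both the consequent parts of the fuzzy rules and the parameters of the fuzzy membership functions, and it can also handle reinforcement learning problems with continuous state and action spaces. The residual algorithm is used to train the neural network, which ensures fast and convergent function approximation. The learned result is used as the behavior controller of a reactive autonomous robot, effectively solving the robot navigation problem in complex environments.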

3.

A fuzzy controller with supervised learning assisted reinforcement learning algorithm for obstacle avoidance (total citations: 4; self-citations: 0; citations by others: 4)

Cang Ye Yung N.H.C. Danwei Wang 《IEEE transactions on systems, man, and cybernetics. Part B, Cybernetics》2003,33(1):17-27

Fuzzy logic systems are promising for efficient obstacle avoidance. However, it is difficult to maintain the correctness, consistency, and completeness of a fuzzy rule base constructed and tuned by a human expert. A reinforcement learning method is capable of learning the fuzzy rules automatically. However, it incurs a heavy learning phase and may result in an insufficiently learned rule base due to the curse of dimensionality. In this paper, we propose a neural fuzzy system with mixed coarse learning and fine learning phases. In the first phase, a supervised learning method is used to determine the membership functions for input and output variables simultaneously. After sufficient training, fine learning is applied, which employs a reinforcement learning algorithm to fine-tune the membership functions for output variables. For sufficient learning, a new learning method using a modification of Sutton and Barto's model is proposed to strengthen the exploration. Through this two-step tuning approach, the mobile robot is able to perform collision-free navigation. To deal with the difficulty of acquiring a large amount of training data with high consistency for supervised learning, we develop a virtual environment (VE) simulator, which is able to provide desktop virtual environment (DVE) and immersive virtual environment (IVE) visualization. When a skilled human operator drives a mobile robot in the virtual environment (DVE/IVE), training data are readily obtained and used to train the neural fuzzy system.

4.

Real-world reinforcement learning for autonomous humanoid robot docking

Nicolás Navarro-Guerrero Cornelius Weber Pascal Schroeter Stefan Wermter 《Robotics and Autonomous Systems》2012,60(11):1400-1407

Reinforcement learning (RL) is a biologically supported learning paradigm, which allows an agent to learn through experience acquired by interaction with its environment. Its potential to learn complex action sequences has been proven for a variety of problems, such as navigation tasks. However, the interactive randomized exploration of the state space, common in reinforcement learning, makes it difficult to use in real-world scenarios. In this work we describe a novel real-world reinforcement learning method. It uses a supervised reinforcement learning approach combined with Gaussian distributed state activation. We successfully tested this method in two real scenarios of humanoid robot navigation: first, backward movements for docking at a charging station and second, forward movements to prepare grasping. Our approach reduces the required learning steps by more than an order of magnitude, and it is robust and easy to integrate into conventional RL techniques.

5.

Trajectory-guided navigation policy optimization algorithm for mobile robots

Li Zhongwei, Liu Weipeng, Luo Cai (李忠伟, 刘伟鹏, 罗偲) 《Application Research of Computers (计算机应用研究)》 2024,41(5)

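Mobile robots that use deep reinforcement learning for autonomous navigation in cluttered, obstacle-dense environments face difficult exploration, which leads to low learning efficiency. To address this problem, a trajectory-guided navigation policy optimization (TGNPO) algorithm is proposed. First, imitation learning is used to train an expert policy that provides both expert demonstration behavior and navigation trajectory prediction, in order to guide deep reinforcement learning training comprehensively. Second, the navigation trajectory predicted by the expert policy is fused with the real-time image currently perceived by the robot, and a coordinate attention mechanism extracts the feature regions that guide the robot's future navigation, improving the learning performance of the navigation model. Finally, the navigation trajectory predicted by the expert policy is used to constrain the robot's policy trajectory, reducing ineffective exploration and wrong decisions during navigation. The algorithm was deployed on simulation and physical platforms. The experimental results show that, compared with existing state-of-the-art methods, it achieves significant advantages in learning efficiency and trajectory smoothness, indicating that it can perform robot navigation tasks efficiently and safely.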

6.

Research on a PPO-based robotic arm control method

Guo Kun, Wu Qu, Zhang Yi (郭坤武曲张义) 《Digital Community & Smart Home (数字社区&智能家居)》 2021,(4)

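Many different algorithms are currently applied to robotic arm control, such as traditional adaptive PD control and fuzzy adaptive control, most of which require a mathematical model. There are also control methods based on reinforcement learning, such as DQN (Deep Q Network) and Sarsa. However, in continuous, high-dimensional action spaces these reinforcement learning algorithms suffer from low learning efficiency, difficulty in designing the reward, and poor control performance. This paper studies robotic arm grasping at arbitrary positions based on the PPO (Proximal Policy Optimization) algorithm and compares the experimental data with those of the Actor-Critic algorithm, verifying that PPO achieves good control performance with high and stable learning efficiency.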
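
As a rough illustration of what distinguishes PPO from a plain actor-critic update, the standard clipped surrogate objective can be sketched as below. This is a minimal sketch of the generic PPO objective, not the authors' implementation: the function name `ppo_clip_objective`, the batch representation as plain lists, and the default `epsilon=0.2` are assumptions made for the example.

```python
def ppo_clip_objective(ratios, advantages, epsilon=0.2):
    """Mean clipped surrogate objective over a batch of samples.

    ratios:     new_policy_prob / old_policy_prob for each sampled action
    advantages: advantage estimate for each sampled action
    epsilon:    clipping range (assumed value; a tunable hyperparameter)
    """
    total = 0.0
    for ratio, adv in zip(ratios, advantages):
        # Clip the probability ratio into [1 - epsilon, 1 + epsilon].
        clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
        # Take the more pessimistic of the clipped and unclipped terms.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(ratios)


if __name__ == "__main__":
    # A large ratio with positive advantage is capped by the clip.
    print(ppo_clip_objective([1.5], [1.0]))
```

The clipping removes the incentive to move the new policy far from the old one in a single update, which is one common explanation for the stable learning that PPO tends to show.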

7.

A topological reinforcement learning agent for navigation

Arthur P. S. Braga Aluízio F. R. Araújo 《Neural computing & applications》2003,12(3-4):220-236

This article proposes a reinforcement learning procedure for mobile robot navigation using a latent-like learning schema. Latent learning refers to learning that occurs in the absence of reinforcement signals and is not apparent until reinforcement is introduced. This concept considers that part of a task can be learned before the agent receives any indication of how to perform such a task. In the proposed topological reinforcement learning agent (TRLA), a topological map is used to perform the latent learning. The propagation of the reinforcement signal throughout the topological neighborhoods of the map permits the estimation of a value function that takes, on average, fewer trials and fewer updates per trial than six of the main temporal difference reinforcement learning algorithms: Q-learning, SARSA, Q(λ)-learning, SARSA(λ), Dyna-Q and fast Q(λ)-learning. The RL agents were tested in four different environments designed to consider a growing level of complexity in accomplishing navigation tasks. The tests suggested that the TRLA chooses shorter trajectories (in the number of steps) and/or requires fewer value function updates in each trial than the other six reinforcement learning (RL) algorithms.

8.

A Markov Game-Adaptive Fuzzy Controller for Robot Manipulators

Sharma R. Gopal M. 《Fuzzy Systems, IEEE Transactions on》2008,16(1):171-186

This paper develops an adaptive fuzzy controller for robot manipulators using a Markov game formulation. The Markov game framework offers a promising platform for robust control of robot manipulators in the presence of bounded external disturbances and unknown parameter variations. We propose fuzzy Markov games as an adaptation of fuzzy Q-learning (FQL) to a continuous-action variation of Markov games, wherein the reinforcement signal is used to tune online the conclusion part of a fuzzy Markov game controller. The proposed Markov game-adaptive fuzzy controller uses a simple fuzzy inference system (FIS), is computationally efficient, generates a swift control, and requires no exact dynamics of the robot system. To illustrate the superiority of Markov game-adaptive fuzzy control, we compare the performance of the controller against a) the Markov game-based robust neural controller, b) the reinforcement learning (RL)-adaptive fuzzy controller, c) the FQL controller, d) the H∞ theory-based robust neural game controller, and e) a standard RL-based robust neural controller, on two highly nonlinear robot arm control problems of i) a standard two-link rigid robot arm and ii) a 2-DOF SCARA robot manipulator. The proposed Markov game-adaptive fuzzy controller outperformed other controllers in terms of tracking errors and control torque requirements, over different desired trajectories. The results also demonstrate the viability of FISs for accelerating learning in Markov games and extending Markov game-based control to continuous state-action space problems.

9.

Backward Q-learning: The combination of Sarsa algorithm and Q-learning

Yin-Hao Wang Tzuu-Hseng S. Li Chih-Jui Lin 《Engineering Applications of Artificial Intelligence》2013,26(9):2184-2193

Reinforcement learning (RL) has been applied to many fields and applications, but the dilemma between exploration and exploitation in the action selection policy remains. Two well-known reinforcement learning algorithms are Q-learning and Sarsa, but they possess different characteristics. Generally speaking, the Sarsa algorithm converges faster, while the Q-learning algorithm has a better final performance. However, the Sarsa algorithm easily gets stuck in a local minimum, and Q-learning needs a longer time to learn. Most of the literature investigates the action selection policy. Instead of studying an action selection strategy, this paper focuses on how to combine Q-learning with the Sarsa algorithm, and presents a new method, called backward Q-learning, which can be implemented in the Sarsa algorithm and Q-learning. The backward Q-learning algorithm directly tunes the Q-values, and then the Q-values will indirectly affect the action selection policy. Therefore, the proposed RL algorithms can enhance learning speed and improve final performance. Finally, three experiments, on cliff walk, mountain car, and cart-pole balancing control, are used to verify the feasibility and effectiveness of the proposed scheme. All the simulations illustrate that the backward Q-learning based RL algorithm outperforms the well-known Q-learning and Sarsa algorithms.
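
The difference between the two base algorithms mentioned in this abstract comes down to the bootstrap target of a single update. The sketch below shows the textbook tabular Q-learning and Sarsa updates only; it does not implement the backward Q-learning method of the paper. The table layout (a `defaultdict` keyed by `(state, action)`), the function names, and the default `alpha` and `gamma` values are assumptions made for the example.

```python
from collections import defaultdict


def q_learning_update(q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Off-policy update: bootstrap from the best action in the next state."""
    target = r + gamma * max(q[(s_next, b)] for b in actions)
    q[(s, a)] += alpha * (target - q[(s, a)])


def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy update: bootstrap from the action actually taken next."""
    target = r + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (target - q[(s, a)])


if __name__ == "__main__":
    actions = ["left", "right"]
    q = defaultdict(float)  # unseen (state, action) pairs default to 0.0
    q[(1, "left")], q[(1, "right")] = 1.0, 2.0

    q_learning_update(q, 0, "left", 0.0, 1, actions, alpha=0.5, gamma=1.0)
    sarsa_update(q, 0, "right", 0.0, 1, "left", alpha=0.5, gamma=1.0)
    print(q[(0, "left")], q[(0, "right")])
```

With the same transition, Q-learning moves toward the value of the greedy next action, while Sarsa moves toward the value of whichever next action the behavior policy selected.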

10.

Research on mobile robot navigation strategy based on deep reinforcement learning

Jiang Qizhou, Zeng Bi (江其洲, 曾碧) 《Computer Measurement & Control (计算机测量与控制)》 2019,27(8):217-221

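To address the limitations of mobile robot navigation in complex, dynamically changing environments, a deep reinforcement learning method that combines deep learning with reinforcement learning is adopted. Images of a simulation environment built with OpenCV serve as input data and are fed into a convolutional neural network model created with TensorFlow, which extracts the robot's action-state information. The decision-making capability of reinforcement learning is then used to obtain the optimal navigation policy. Simulation results show that, after training with the deep reinforcement learning method, the mobile robot can still navigate efficiently and accurately from random start points to random end points when parts of the scene change.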

11.

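Application of reinforcement learning to basic action learning of soccer robots (total citations: 1; self-citations: 0; citations by others: 1)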

Duan Yong, Yang Huaiqing, Cui Baoxia, Xu Xinhe (段勇, 杨淮清, 崔宝侠, 徐心和) 《Robot (机器人)》 2008,30(5):1

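This paper studies reinforcement learning algorithms and their application to learning technical actions in robot soccer. When the state and action spaces are too large or the variables are continuous, learning is often too slow or even fails to converge. To address this problem, a reinforcement learning method based on a T-S model fuzzy neural network is proposed, which effectively realizes the mapping from the state space to the action space. The proposed method is then used to design technical actions for a soccer robot, and the problem of robot behavior learning without expert knowledge or an environment model is studied. Finally, experiments demonstrate the effectiveness of the method and show that it meets the needs of robot soccer games.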

12.

Research progress on reinforcement learning for autonomous robots (total citations: 9; self-citations: 1; citations by others: 8)

Chen Weidong, Xi Yugeng, Gu Donglei (陈卫东, 席裕庚, 顾冬雷) 《Robot (机器人)》 2001,23(4):379-384

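Although autonomous robots based on behavior control are highly robust, they lack the necessary adaptability to dynamic environments. Reinforcement learning allows a robot to accomplish tasks through learning, without the designer fully specifying all of the robot's actions in advance. It is a novel learning method developed by combining dynamic programming with supervised learning. Through trial-and-error interaction between the robot and its environment, it uses reward and punishment signals from successes and failures to continually improve the robot's performance until the goal is reached, and it tolerates delayed evaluation. Owing to its outstanding ability to solve complex problems, reinforcement learning has become a very promising robot learning method. This paper systematically reviews the state of research on reinforcement learning in autonomous robots, identifies existing problems, analyzes several approaches to solving them, and discusses future trends.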

13.

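Research on a reinforcement learning algorithm based on neural networks (total citations: 11; self-citations: 0; citations by others: 11)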

Lu Xin, Gao Yang, Li Ning, Chen Shifu (陆鑫, 高阳, 李宁, 陈世福) 《Journal of Computer Research and Development (计算机研究与发展)》 2002,39(8):981-985

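BP neural networks are widely used in nonlinear control systems, but as a supervised learning algorithm, BP requires batches of input-output pairs to train the network, and in systems where the optimal policy is unknown such pairs cannot be obtained in advance. Reinforcement learning, on the other hand, adjusts its policy from experience with the actual system in a process that approaches the optimal policy, and requires no supervision by a teacher. This paper proposes the RBP model, a learning algorithm that combines reinforcement learning with a BP neural network. The basic idea is to learn the control policy by reinforcement learning and, after a certain period of learning, to train the neural network with the acquired knowledge so that the network gradually converges to the optimal state. Finally, experiments verify the effectiveness and convergence of the method.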

14.

Genetic fuzzy neural network control of complex industrial processes (total citations: 3; self-citations: 0; citations by others: 3)

Wang Yaonan, Zhang Chang (王耀南, 张昌) 《Control Theory & Applications (控制理论与应用)》 1999,16(6):886-891

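This paper proposes an effective fuzzy neural network control scheme based on a genetic algorithm and a supervised learning method. The controller uses a parallel-processing inference network and has two important properties: adaptivity and the ability to learn. Simulation and temperature-control experiments show that the proposed method achieves good control performance.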

15.

A fuzzy curved search algorithm for neural network learning

Peitsang Wu 《Computers & Industrial Engineering》2002,43(4):693-702

In this paper, we develop a curved search algorithm, which uses second-order information, as the learning algorithm for a supervised neural network. With the objective of reducing the training time, we introduce a fuzzy controller that adjusts the first- and second-order approximation parameters in the iterative method, both to further reduce the training time and to avoid the spikes in the learning curve that sometimes occur with a fixed step length. Computational results indicate a significant reduction in training time compared with the delta learning rule.

16.

Self-supervised optical flow estimation with an attention mechanism

An Feng, Dai Jun, Han Zhen, Yan Zhongxing (安峰, 戴军, 韩振, 严仲兴) 《Journal of Graphics (图学学报)》 2022,43(5):841-848

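Optical flow estimation is a key module in many computer vision systems and is widely used in action recognition, robot localization and navigation, and other fields. However, end-to-end optical flow estimation is still limited by the lack of data sources; in particular, optical flow data for real scenes is hard to obtain. Synthetic optical flow data makes up the vast majority, and synthetic data cannot fully reflect real scenes (such as swaying leaves or pedestrian reflections), so overfitting is hard to avoid. Unsupervised or self-supervised methods can be trained on massive amounts of video data, which removes the dependence on datasets and makes them an effective way to address the shortage of datasets. On this basis, a self-supervised optical flow network is built, in which the "Teacher" and "Student" modules integrate a recent optical flow network, the sparse correlation volume (SCV) network, reducing redundant computation. An attention model is also introduced as a node of the network to strengthen image features along the channel and spatial dimensions. With SCV and the attention mechanism integrated into the self-supervised optical flow network, test results on the KITTI 2015 dataset match or exceed those of common supervised networks.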

17.

Multi-agent systems with reinforcement hierarchical neuro-fuzzy models

Marcelo França Corrêa Marley Vellasco Karla Figueiredo 《Autonomous Agents and Multi-Agent Systems》2014,28(6):867-895

This paper introduces a new multi-agent model for intelligent agents, called reinforcement learning hierarchical neuro-fuzzy multi-agent system. This class of model uses a hierarchical partitioning of the input space with a reinforcement learning algorithm to overcome limitations of previous RL methods. The main contribution of the new system is to provide a flexible and generic model for multi-agent environments. The proposed generic model can be used in several applications, including competitive and cooperative problems, with the autonomous capacity to create fuzzy rules and expand their own rule structures, extracting knowledge from the direct interaction between the agents and the environment, without any use of supervised algorithms. The proposed model was tested in three different case studies, with promising results. The tests demonstrated that the developed system attained good capacity of convergence and coordination among the autonomous intelligent agents.

18.

Ensemble Algorithms in Reinforcement Learning (total citations: 1; self-citations: 0; citations by others: 1)

Wiering M.A. van Hasselt H. 《IEEE transactions on systems, man, and cybernetics. Part B, Cybernetics》2008,38(4):930-936

This paper describes several ensemble methods that combine multiple different reinforcement learning (RL) algorithms in a single agent. The aim is to enhance learning speed and final performance by combining the chosen actions or action probabilities of different RL algorithms. We designed and implemented four different ensemble methods combining the following five different RL algorithms: Q-learning, Sarsa, actor-critic (AC), QV-learning, and AC learning automaton. The intuitively designed ensemble methods, namely, majority voting (MV), rank voting, Boltzmann multiplication (BM), and Boltzmann addition, combine the policies derived from the value functions of the different RL algorithms, in contrast to previous work where ensemble methods have been used in RL for representing and learning a single value function. We show experiments on five maze problems of varying complexity; the first problem is simple, but the other four maze tasks are of a dynamic or partially observable nature. The results indicate that the BM and MV ensembles significantly outperform the single RL algorithms.
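
To make two of the ensemble ideas named in this abstract concrete, the sketch below combines the action preferences of several learners for one state. It is an illustration under stated assumptions, not the paper's exact formulation: each learner is represented simply as a list of action values, every learner's policy is assumed to be a Boltzmann (softmax) distribution over those values, and the final normalization in `boltzmann_multiplication` is an illustrative choice. All function names are hypothetical.

```python
import math


def boltzmann(values, temperature=1.0):
    """Softmax policy over action values (shifted by the max for stability)."""
    m = max(values)
    exps = [math.exp((v - m) / temperature) for v in values]
    z = sum(exps)
    return [e / z for e in exps]


def majority_voting(value_tables):
    """Count, per action, how many learners rank it as their greedy action.

    Ties inside one learner go to the lowest action index.
    """
    votes = [0] * len(value_tables[0])
    for values in value_tables:
        votes[values.index(max(values))] += 1
    return votes


def boltzmann_multiplication(value_tables, temperature=1.0):
    """Multiply the learners' Boltzmann probabilities per action, then normalize."""
    prefs = [1.0] * len(value_tables[0])
    for values in value_tables:
        for action, p in enumerate(boltzmann(values, temperature)):
            prefs[action] *= p
    z = sum(prefs)
    return [p / z for p in prefs]


if __name__ == "__main__":
    tables = [[1.0, 2.0, 0.0], [0.0, 3.0, 1.0], [5.0, 0.0, 0.0]]
    print(majority_voting(tables))
    print(boltzmann_multiplication(tables))
```

Majority voting uses only each learner's greedy choice, whereas the multiplicative rule also takes into account how strongly each learner prefers an action.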

19.

Intelligent Adaptive Mobile Robot Navigation

S. Nefti M. Oussalah K. Djouani J. Pontnau 《Journal of Intelligent and Robotic Systems》2001,30(4):311-329

This paper deals with the application of a neuro-fuzzy inference system to mobile robot navigation in an unknown, or partially unknown, environment. The final aim of the robot is to reach some pre-defined goal. For this purpose, a form of co-operation between three main sub-modules is performed. These sub-modules correspond to three elementary robot tasks: following a wall, avoiding an obstacle and running towards the goal. Each module acts as a Sugeno-Takagi fuzzy controller where the inputs are the different sensor information and the output corresponds to the orientation of the robot. The rule-base is generated by the controller after some learning process based on a neural architecture close to that used by Wang and Menger. This leads to adaptive neuro-fuzzy inference systems (ANFIS) (one for each module). The adaptive navigation system (ANFIS), based on integrated reactive-cognitive parts, learns and generates the required knowledge for achieving the desired task. However, the generated rule-base suffers from redundancy and abundance of data, most of which are less useful. This makes the assignment of a linguistic label to the associated variable difficult and sometimes counter-intuitive. Consequently, a simplification phase allowing elimination of redundancy is required. For this purpose, we have developed an algorithm based on the class of fuzzy c-means algorithms introduced by Bezdek, together with an inclusion structure. Experimental results confirm the meaningfulness of the elaborated methodology when dealing with navigation of a mobile robot in an unknown, or partially unknown, environment.
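
The kind of controller described in this abstract, with sensor readings as inputs and a steering orientation as output, can be illustrated with a zero-order Sugeno (Takagi-Sugeno) inference step. The sketch below is a generic example, not the rule base learned in the paper: the single input (an obstacle distance), the triangular membership functions, the two rules in `EXAMPLE_RULES`, and their steering angles are all made-up assumptions.

```python
def triangular(x, left, peak, right):
    """Triangular membership function with support (left, right)."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def sugeno_output(distance, rules):
    """Zero-order Sugeno inference: firing-strength-weighted mean of constants.

    rules: list of ((left, peak, right), consequent) pairs, where the
    consequent is a constant steering angle in degrees.
    """
    weights = [triangular(distance, *mf) for mf, _ in rules]
    total = sum(weights)
    if total == 0.0:
        return 0.0  # no rule fires: keep the current heading
    return sum(w * c for w, (_, c) in zip(weights, rules)) / total


# Hypothetical rule base: obstacle "near" -> turn 45 degrees, "far" -> go straight.
EXAMPLE_RULES = [
    ((-1.0, 0.0, 1.0), 45.0),  # near
    ((0.0, 1.0, 2.0), 0.0),    # far
]

if __name__ == "__main__":
    for d in (0.0, 0.5, 1.0):
        print(d, sugeno_output(d, EXAMPLE_RULES))
```

Because the output is a weighted average of the rule consequents, the steering command changes smoothly as the obstacle distance moves between the "near" and "far" regions.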

20.

A survey of reinforcement learning algorithms and applications

Li Ruyang, Peng Huimin, Li Rengang, Zhao Kun (李茹杨, 彭慧民, 李仁刚, 赵坤) 《Computer Systems & Applications (计算机系统应用)》 2020,29(12):13-25

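Reinforcement learning is a research hotspot in machine learning. It studies the process in which an agent interacts with its environment, makes sequential decisions, optimizes its policy, and maximizes cumulative return. Reinforcement learning has great research value and application potential and is a key step toward artificial general intelligence. This paper surveys research progress and trends in reinforcement learning algorithms and applications. It first introduces the basic principles of reinforcement learning, including Markov decision processes, value functions, and the exploration-exploitation problem. It then reviews classical reinforcement learning algorithms, including value-function-based algorithms, policy-search-based algorithms, and algorithms that combine value functions with policy search, and surveys frontier research, mainly multi-agent reinforcement learning and meta-reinforcement learning. Finally, it reviews successful applications of reinforcement learning in game playing, robot control, urban transportation, and business, and closes with a summary and outlook.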
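
The "cumulative return" that an agent maximizes is commonly defined as a discounted sum of rewards, G = r_0 + γ r_1 + γ² r_2 + .... The sketch below computes that quantity for one finished episode. The discounting itself and the name `discounted_return` are assumptions made for the example; the abstract above does not state which form of return the survey uses.

```python
def discounted_return(rewards, gamma=0.99):
    """Return r_0 + gamma * r_1 + gamma**2 * r_2 + ... for one episode."""
    g = 0.0
    # Accumulate from the last step backwards: G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


if __name__ == "__main__":
    print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))
```

The backward recursion used here is the same relation that value functions satisfy in expectation, which is why it appears throughout value-based algorithms.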