Similar Literature
20 similar documents found (search time: 46 ms)
1.
The learning process of a single mobile agent is discussed in the context of reinforcement learning and then extended to the multi-mobile-agent setting, for which a multi-mobile-agent learning algorithm, MMAL (Multi Mobile Agent Learning), is proposed. The algorithm takes full account of the characteristics of mobile-agent learning, enabling mobile agents to make decisions in contexts with uncertainty and conflicting goals, resolving the choice of when an agent should migrate during learning, and greatly reducing the computational cost. The aim is to let agents learn autonomously and cooperatively in stochastic, dynamic environments. Finally, simulation experiments show that the algorithm is an efficient and fast learning method.

2.
We describe a new preteaching method for reinforcement learning using a self-organizing map (SOM). The purpose is to increase the learning rate using a small amount of teaching data generated by a human expert. In our proposed method, the SOM is used to generate the initial teaching data for the reinforcement learning agent from a small amount of teaching data. The reinforcement learning function of the agent is initialized by using the teaching data generated by the SOM in order to increase the probability of selecting the optimal actions it estimates. Because the agent can get high rewards from the start of reinforcement learning, it is expected that the learning rate will increase. The results of a mobile robot simulation showed that the learning rate increased even though the human expert had provided only a small amount of teaching data. This work was presented in part at the 7th International Symposium on Artificial Life and Robotics, Oita, Japan, January 16–18, 2002.
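A minimal sketch of the preteaching idea in this entry, assuming a toy 1-D SOM trained on a handful of expert (state, action) demonstrations and used to seed a tabular Q-function; all sizes, features, and the neighbourhood function are illustrative, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
N_STATES, N_ACTIONS, N_UNITS = 100, 4, 16

# A few expert demonstrations: (state-feature, action) pairs (values invented).
demo_states = rng.random((10, 2))                 # e.g. normalised (x, y) readings
demo_actions = rng.integers(0, N_ACTIONS, 10)

# Fit a tiny 1-D SOM to the demonstrated state features.
weights = rng.random((N_UNITS, 2))
for epoch in range(50):
    lr = 0.5 * (1 - epoch / 50)
    for s in demo_states:
        winner = int(np.argmin(np.linalg.norm(weights - s, axis=1)))
        for u in range(N_UNITS):
            h = np.exp(-abs(u - winner))          # neighbourhood strength
            weights[u] += lr * h * (s - weights[u])

def taught_action(unit_w):
    """Action of the demonstration closest to a SOM unit."""
    nearest = int(np.argmin(np.linalg.norm(demo_states - unit_w, axis=1)))
    return int(demo_actions[nearest])

# Initialise the Q-table with a small bias toward the taught action for each state,
# so the demonstrated behaviour is preferred from the first episode.
state_features = rng.random((N_STATES, 2))        # stand-in for the real state coding
Q = np.zeros((N_STATES, N_ACTIONS))
for s_idx, feat in enumerate(state_features):
    unit = int(np.argmin(np.linalg.norm(weights - feat, axis=1)))
    Q[s_idx, taught_action(weights[unit])] = 1.0
```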

3.
Flexible behavioral decision-making in unknown environments is a prerequisite for mobile robots to accomplish various tasks. Existing decision-making methods lack flexibility when the environment changes dynamically, making it difficult for robots to acquire a continuous, stable learning capability. The authors previously attempted to achieve flexible behavioral decision-making for mobile robots in dynamic environments by integrating cerebellar supervised learning with basal-ganglia reinforcement learning, but that algorithm's ability to adapt to dynamic environments was limited. Building on that work, this paper designs a more biologically meaningful curiosity index to replace the original alertness index, and realizes dynamic, adaptive regulation of the robot's exploration-exploitation trade-off by simulating the dynamic switching of locus coeruleus activity between tonic and phasic modes. In addition, an adaptive regulation factor that varies with the external environment is designed, so that flexible behavioral decision-making based on cerebellar supervised learning and basal-ganglia reinforcement learning is achieved in dynamic environments and the robot obtains a continuous, stable learning capability. Experimental results in dynamic and real-world environments verify the effectiveness of the proposed algorithm.

4.
This article proposes a reinforcement learning procedure for mobile robot navigation using a latent-like learning schema. Latent learning refers to learning that occurs in the absence of reinforcement signals and is not apparent until reinforcement is introduced. This concept considers that part of a task can be learned before the agent receives any indication of how to perform it. In the proposed topological reinforcement learning agent (TRLA), a topological map is used to perform the latent learning. Propagating the reinforcement signal throughout the topological neighborhoods of the map permits the estimation of a value function that, on average, requires fewer trials and fewer updates per trial than six of the main temporal-difference reinforcement learning algorithms: Q-learning, SARSA, Q(λ)-learning, SARSA(λ), Dyna-Q and fast Q(λ)-learning. The RL agents were tested in four different environments designed with a growing level of complexity in the navigation tasks. The tests suggested that the TRLA chooses shorter trajectories (in number of steps) and/or requires fewer value-function updates per trial than the other six reinforcement learning (RL) algorithms.
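A rough illustration (not the authors' TRLA code) of how a reinforcement signal introduced at the goal can be propagated through the neighbourhoods of a topological map so that one trial updates many nodes; the node layout and discount are made up for the example.

```python
import numpy as np

# Topological map: adjacency of nodes discovered during latent exploration (invented).
neighbors = {0: [1], 1: [0, 2], 2: [1, 3, 4], 3: [2], 4: [2, 5], 5: [4]}
goal, gamma = 5, 0.9

V = np.zeros(len(neighbors))
V[goal] = 1.0                                   # reinforcement introduced at the goal

# Spread the signal through the topological neighbourhoods until it settles.
for _ in range(50):
    for node, nbrs in neighbors.items():
        if node != goal:
            V[node] = gamma * max(V[n] for n in nbrs)

def next_node(node):
    """Greedy navigation on the map: move to the neighbour with the highest value."""
    return max(neighbors[node], key=lambda n: V[n])

path = [0]
while path[-1] != goal:
    path.append(next_node(path[-1]))
print(path)                                     # [0, 1, 2, 4, 5]
```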

5.
Existing policy-gradient-based deep reinforcement learning methods suffer from long training times and low learning efficiency when applied to robot navigation in complex indoor scenes such as offices and corridors. This paper proposes a deep reinforcement learning navigation algorithm that combines an advantage structure with a minimized target Q-value. The algorithm introduces the advantage structure into policy-gradient-based deep reinforcement learning to distinguish between actions that share the same state value, improving learning efficiency; in multi-goal navigation scenarios, the state value is estimated separately and map information is used to provide a more accurate value judgment. In addition, because methods for mitigating target-Q overestimation in discrete control are hard to apply within the mainstream actor-critic framework, a Gaussian-smoothing-based minimum target Q-value method is designed to reduce the impact of overestimation on training. Experimental results show that the proposed algorithm effectively speeds up learning: in both single-goal and multi-goal continuous navigation training it converges faster than Soft Actor-Critic (SAC), Twin Delayed Deep Deterministic Policy Gradient (TD3) and Deep Deterministic Policy Gradient (DDPG), keeps the mobile robot well clear of obstacles, and yields a navigation model with good generalization ability.
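A hedged sketch of a Gaussian-smoothing-based minimum target Q-value in an actor-critic setting, in the spirit of the entry above (and of TD3-style target smoothing); the actor and twin target critics are stubbed with toy functions, so only the target computation is meaningful.

```python
import numpy as np

rng = np.random.default_rng(0)

def target_policy(s):                 # stand-in for the target actor network
    return float(np.tanh(s.mean()))

def target_q1(s, a):                  # stand-ins for the two target critics
    return a + 0.10 * float(s.sum())

def target_q2(s, a):
    return a + 0.05 * float(s.sum())

def smoothed_min_target(s_next, r, gamma=0.99, sigma=0.2, clip=0.5):
    """Bellman target built from a noise-smoothed next action and the smaller critic."""
    noise = float(np.clip(rng.normal(0.0, sigma), -clip, clip))
    a_next = float(np.clip(target_policy(s_next) + noise, -1.0, 1.0))
    q_min = min(target_q1(s_next, a_next), target_q2(s_next, a_next))
    return r + gamma * q_min

print(smoothed_min_target(s_next=np.array([0.3, -0.1]), r=1.0))
```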

6.
An approach to learning mobile robot navigation
This paper describes an approach to learning an indoor robot navigation task through trial-and-error. A mobile robot, equipped with visual, ultrasonic and laser sensors, learns to servo to a designated target object. In less than ten minutes of operation time, the robot is able to navigate to a marked target object in an office environment. The central learning mechanism is the explanation-based neural network learning algorithm (EBNN). EBNN initially learns functions purely inductively using neural network representations. With increasing experience, EBNN employs domain knowledge to explain and to analyze training data in order to generalize in a more knowledgeable way. Here EBNN is applied in the context of reinforcement learning, which allows the robot to learn control using dynamic programming.

7.
Asada, Minoru; Noda, Shoichi; Tawaratsumida, Sukoya; Hosoda, Koh. Machine Learning, 1996, 23(2-3): 279-303
This paper presents a method of vision-based reinforcement learning by which a robot learns to shoot a ball into a goal. We discuss several issues in applying the reinforcement learning method to a real robot with a vision sensor by which the robot can obtain information about changes in its environment. First, we construct a state space in terms of the size, position, and orientation of a ball and a goal in an image, and an action space in terms of the action commands sent to the left and right motors of a mobile robot. This causes a state-action deviation problem in constructing state and action spaces that reflect the outputs of physical sensors and actuators, respectively. To deal with this issue, an action set is constructed such that one action consists of a series of the same action primitive, executed successively until the current state changes. Next, to speed up learning, a mechanism of Learning from Easy Missions (LEM) is implemented. LEM reduces the learning time from exponential to almost linear order in the size of the state space. The results of computer simulations and real robot experiments are given.
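The construction of one macro action as a primitive repeated until the coarse state changes can be sketched as follows; the environment interface (step and discretise) is hypothetical.

```python
def execute_macro_action(env, coarse_state, primitive, max_steps=100):
    """Apply `primitive` repeatedly until the discretised state changes (assumed env API)."""
    total_reward = 0.0
    for _ in range(max_steps):
        obs, reward, done = env.step(primitive)      # hypothetical step() signature
        total_reward += reward
        new_state = env.discretise(obs)              # ball/goal size-position-orientation bins
        if new_state != coarse_state or done:
            return new_state, total_reward, done
    return coarse_state, total_reward, False
```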

8.
Rapid, safe, and incremental learning of navigation strategies
In this paper we propose a reinforcement connectionist learning architecture that allows an autonomous robot to acquire efficient navigation strategies in a few trials. Besides rapid learning, the architecture has three further appealing features. First, the robot improves its performance incrementally as it interacts with an initially unknown environment, and it ends up learning to avoid collisions even in those situations in which its sensors cannot detect the obstacles. This is a definite advantage over nonlearning reactive robots. Second, since it learns from basic reflexes, the robot is operational from the very beginning and the learning process is safe. Third, the robot exhibits high tolerance to noisy sensory data and good generalization abilities. All these features make this learning robot's architecture very well suited to real-world applications. We report experimental results obtained with a real mobile robot in an indoor environment that demonstrate the appropriateness of our approach to real autonomous robot control.

9.
This paper presents an artificial emotional-cognitive-system-based autonomous robot control architecture for a four-wheel-driven and four-wheel-steered mobile robot. A discrete stochastic state-space mathematical model is used for the behavioral and emotional transition processes of the autonomous mobile robot in a dynamic, realistic environment. The cognitive mechanism, composed of a rule base and a reinforcement self-learning algorithm, accounts for the deliberative events such as learning, reasoning and memory (rule spaces) of the autonomous mobile robot. The artificial cognitive model of the control architecture has a dynamic associative memory containing behavioral transition rules that can be learned for achieving multi-objective robot tasks, and the motivation module of the architecture acts as a behavioral gain-effect generator for achieving these tasks. According to emotional and behavioral state-transition probabilities, artificial emotions determine sequences of behaviors for long-term action planning. The reinforcement self-learning and reasoning ability of the artificial cognitive model, and the motivational gain effects of the proposed architecture, can be observed in the behavioral sequences executed during simulation. The posture and speed of the robot, the configurations, speeds and torques of the wheels, and all deliberative and cognitive events can be observed from the simulation plant and virtual-reality viewer. This study forms a basis for multi-goal robot tasks and for experiments on artificial-emotion- and cognitive-mechanism-based behavior generation with a real mobile robot.

10.
The main goal of this paper is modelling attention while using it for efficient path planning of mobile robots. The key challenge in pursuing these two goals concurrently is making an optimal, or near-optimal, decision despite the time and processing-power limitations inherent in a typical multi-sensor real-world robotic application. To recognise the environment efficiently under these two limitations, the attention of an intelligent agent is controlled within the reinforcement learning framework. We propose an estimation method that combines mixture-of-experts task estimation with attention learning in perceptual space. An agent learns how to employ its sensory resources, and when to stop observing, by estimating its perceptual space. In this paper, static estimation of the state space in a learning-task problem, examined in the Webots simulator, is performed. Simulation results show that a robot learns how to achieve an optimal policy with a controlled cost by estimating the state space instead of continually updating sensory information.

11.
李保罗, 蔡明钰, 阚震. 《控制与决策》, 2023, 38(7): 1835-1844
To meet the need for robots to execute complex tasks in dynamic, uncertain environments, a linear temporal logic (LTL)-guided model-free safe reinforcement learning algorithm is proposed that maximizes the probability of task completion while guaranteeing safety during learning. First, taking the uncertainties of the environment into account, a Markov decision process (MDP) is constructed; the agent's complex task is specified in LTL and converted into a transition-based limit-deterministic generalized Büchi automaton (tLDGBA) with multiple accepting sets, and an accepting frontier function is used to build a constrained tLDGBA (ctLDGBA) that records the accepting sets still to be visited. Next, a product MDP is constructed on which reinforcement learning searches for the optimal policy. Finally, a safety game is built from the LTL safety specification and the MDP's observation function, and a safety shield designed from this game guarantees the safety of the system during learning. Rigorous analysis proves that the proposed algorithm obtains an optimal policy that maximizes the probability of completing the LTL task, and simulation results verify the effectiveness of the LTL-guided safe reinforcement learning algorithm.
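A toy illustration of the shielding step described above (not the paper's safety-game or ctLDGBA construction): a proposed action is executed only if its successor state stays outside the states forbidden by the safety specification; the grid, labels, and dynamics are invented for the example.

```python
UNSAFE = {(1, 1), (2, 3)}                             # cells forbidden by the safety formula
ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

def successor(cell, action):
    dx, dy = ACTIONS[action]
    return (cell[0] + dx, cell[1] + dy)

def shielded_action(cell, proposed):
    """Keep the learner's action if its successor is safe; otherwise substitute a safe one."""
    if successor(cell, proposed) not in UNSAFE:
        return proposed
    for alt in ACTIONS:                               # fall back to any action that stays safe
        if successor(cell, alt) not in UNSAFE:
            return alt
    raise RuntimeError("no safe action available from this state")

print(shielded_action((1, 0), "up"))                  # 'up' would enter (1, 1); replaced
```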

12.
It was confirmed that a real mobile robot with a simple visual sensor could learn appropriate motions to reach a target object by direct-vision-based reinforcement learning (RL). In direct-vision-based RL, raw visual sensory signals are put directly into a layered neural network, and the neural network is then trained using back propagation, with the training signal generated by reinforcement learning. Because of the time delay in transmitting the visual sensory signals, the actor outputs are trained using the critic output from two time steps ahead. It was shown that a robot with a simple monochrome visual sensor can learn to reach a target object from scratch, without any advance knowledge of the task, by direct-vision-based RL. This work was presented in part at the 7th International Symposium on Artificial Life and Robotics, Oita, Japan, January 16–18, 2002.
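A loose sketch of the delay handling described here: each actor output is paired with the critic's evaluation from two time steps later; the update function and the numbers are placeholders rather than the paper's network.

```python
def train_with_delay(actor_update, actor_outputs, critic_values, delay=2):
    """Pair each actor output with the critic value `delay` steps ahead of it."""
    for t in range(len(actor_outputs) - delay):
        actor_update(actor_outputs[t], critic_values[t + delay])

# Tiny usage example with made-up numbers and a print in place of a weight update.
outs = [0.1, 0.3, -0.2, 0.5]
vals = [0.0, 0.2, 0.4, 0.9]
train_with_delay(lambda a, v: print(f"train output {a} toward critic value {v}"), outs, vals)
```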

13.
To address the limitations of mobile robot navigation in complex, dynamically changing environments, a deep reinforcement learning method combining deep learning with reinforcement learning is adopted. Images of a simulation environment built on the OpenCV platform are used as input data and fed into a convolutional neural network model created in TensorFlow, which extracts the robot's action-state information; combined with the decision-making capability of reinforcement learning, the optimal navigation policy is obtained. Simulation results show that, after training with the deep reinforcement learning method, the mobile robot can still navigate efficiently and accurately from a random start point to a random goal even when parts of the scene have changed.
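A sketch of the kind of TensorFlow convolutional network this entry describes, mapping simulator images to per-action values; the image shape, layer sizes, and number of actions are assumptions.

```python
import tensorflow as tf

n_actions = 5                                             # discrete motion commands (assumed)
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(84, 84, 3)),             # simulator image size is assumed
    tf.keras.layers.Conv2D(16, 8, strides=4, activation="relu"),
    tf.keras.layers.Conv2D(32, 4, strides=2, activation="relu"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(n_actions),                     # one value per action
])
model.compile(optimizer="adam", loss="mse")               # targets would come from the RL update
```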

14.
Online navigation with a known target and unknown obstacles is an interesting problem in mobile robotics. This article presents a technique based on neural networks and reinforcement learning that enables a mobile robot to learn constructed environments on its own. The robot learns to generate efficient navigation rules automatically, without rules being set in advance by experts; this is regarded as the main contribution of this work compared with traditional fuzzy models based on the notion of artificial potential fields. The ability to generalize the learned rules has also been examined. The initial results qualitatively confirmed the efficiency of the model, and further experiments showed at least 32 % improvement in path planning from the first to the third planning trial in a sample environment. An analysis of the results, limitations, and recommendations for future work is included.

15.
The paper describes a self-learning control system for a mobile robot. Based on sensor information, the control system has to provide a steering signal in such a way that collisions are avoided. Since in our case no "examples" are available, the system learns on the basis of an external reinforcement signal which is negative in case of a collision and zero otherwise. We describe the adaptive algorithm used for a discrete coding of the state space, and the adaptive algorithm for learning the correct mapping from the input (state) vector to the output (steering) signal.
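A minimal sketch of the learning signal and the discrete state coding described above, assuming range sensors binned by fixed thresholds and a simple tabular update; the bin edges and update rule are illustrative.

```python
import numpy as np

BINS = np.array([0.2, 0.5, 1.0, 2.0])            # range thresholds in metres (assumed)

def discretise(sensor_ranges):
    """Bin each range reading, forming the discrete state code."""
    return tuple(int(np.digitize(r, BINS)) for r in sensor_ranges)

def reinforcement(collided):
    return -1.0 if collided else 0.0             # negative on collision, zero otherwise

# Illustrative tabular update of the state -> steering preference.
Q = {}
def update(state, steering, collided, alpha=0.1):
    key = (state, steering)
    Q[key] = Q.get(key, 0.0) + alpha * (reinforcement(collided) - Q.get(key, 0.0))

update(discretise([0.4, 1.3, 2.5]), "left", collided=True)
```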

16.
Unmanned aerial vehicles (UAVs) can adapt to complex terrain, but limited battery capacity prevents them from executing tasks for long periods. Cooperating with other unmanned systems (ground vehicles, surface vessels, etc.) can effectively extend a UAV's working time and allow it to complete its assigned task; once the task is finished, landing the UAV quickly and stably on a moving platform is a necessary and challenging job. For the landing problem, this paper proposes a deep reinforcement learning proportional-integral-derivative (PID) method based on the COACH (corrective advice communicated by humans) framework, which provides an optimal path for landing the UAV on a moving platform. The reinforcement learning model is first trained with the corrective-advice framework in simulation; the trained model then outputs control parameters in both simulated and real environments, and these parameters are used to compute the UAV's position control commands. Simulation results and real UAV experiments show that the COACH-based deep reinforcement learning PID method outperforms traditional control methods and completes the landing task on a moving platform stably.
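A hedged sketch of the RL-tuned PID idea in this entry: a standard PID loop turns the position error to the moving platform into a velocity command, with the gains intended to come from the trained model; the gain values and the one-axis setup are placeholders.

```python
class PID:
    def __init__(self, kp, ki, kd, dt=0.05):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral, self.prev_err = 0.0, 0.0

    def step(self, error):
        self.integral += error * self.dt
        deriv = (error - self.prev_err) / self.dt
        self.prev_err = error
        return self.kp * error + self.ki * self.integral + self.kd * deriv

# One controller per axis; the trained model would supply (and re-tune) the gains.
pid_x = PID(kp=1.0, ki=0.02, kd=0.3)             # placeholder gains
uav_x, platform_x = 0.0, 2.5                     # placeholder positions, metres
vx_command = pid_x.step(error=platform_x - uav_x)
print(vx_command)
```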

17.
An unpredictable, time-varying adhesion force is generated between the rubber-unstacking robot and the rubber block, which makes it difficult for the robot to complete the rubber disassembly task smoothly and thereby introduces new robot control problems. To solve these problems, a novel inner/outer-loop impedance control method based on natural-gradient actor-critic (NAC) reinforcement learning is proposed in this paper. The required impedance is applied by the inner/outer-loop impedance control with time-delay estimation, which corrects the modeling error and compensates the nonlinear dynamics term to improve the computational efficiency of the system. In addition, the NAC reinforcement learning algorithm based on recursive least-squares filtering is used to optimize the impedance parameters online, improving the impedance accuracy and robustness in an unstructured dynamic environment. Three stability constraints of the control strategy are derived in the analysis. Finally, experiments on a purpose-built platform verify that the control strategy allows the robot to work smoothly under the unpredictable, time-varying adhesion force, reducing vibration and improving rubber-unstacking performance.
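A rough sketch of an outer-loop impedance relation of the kind this entry optimizes: the contact force deflects the reference trajectory through a mass-damper-spring model whose parameters (m, b, k) are what the learner would tune online; the discretisation and values are illustrative.

```python
def impedance_step(x, dx, f_ext, m=1.0, b=20.0, k=100.0, dt=0.001):
    """One Euler step of m*ddx + b*dx + k*x = f_ext, returning the updated (x, dx)."""
    ddx = (f_ext - b * dx - k * x) / m
    dx = dx + ddx * dt
    x = x + dx * dt
    return x, dx

# Deflection of the reference trajectory under a constant 5 N contact force.
x, dx = 0.0, 0.0
for _ in range(1000):
    x, dx = impedance_step(x, dx, f_ext=5.0)
print(round(x, 4))                                # approaches f_ext / k = 0.05 m
```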

18.
Recently, a novel probabilistic model-building evolutionary algorithm (a so-called estimation of distribution algorithm, or EDA), named probabilistic model building genetic network programming (PMBGNP), has been proposed. PMBGNP uses graph structures for its individual representation, which gives it higher expressive ability than classical EDAs and extends EDAs to a range of problems such as data mining and agent control. This paper proposes a continuous version of PMBGNP for continuous optimization in agent control problems. Unlike other continuous EDAs, the proposed algorithm evolves the continuous variables by reinforcement learning (RL). We compare its performance with several state-of-the-art algorithms on a real mobile robot control problem. The results show that the proposed algorithm outperforms the others with statistically significant differences.

19.
Research on methods for improving the speed of reinforcement learning
The term reinforcement learning comes from behavioral psychology, which treats learning as a trial-and-error process of mapping environmental states to actions. This characteristic inevitably makes intelligent systems harder to build and lengthens the learning time. The main reason reinforcement learning is slow is the absence of an explicit supervisory signal: when interacting with the environment, a reinforcement learning system has to rely on trial and error and an external evaluative signal to adjust its behavior, so the intelligent system necessarily goes through a long learning process. How to speed up reinforcement learning is therefore one of the most important research questions, and this paper discusses methods for improving the speed of reinforcement learning from several perspectives.

20.
This paper presents a reinforcement learning algorithm which allows a robot, with a single camera mounted on a pan-tilt platform, to learn simple skills such as "watch" and "orientation" and to obtain the complex skill called "approach" by combining the previously learned ones. The reinforcement signal the robot receives is a real continuous value, so it is not necessary to estimate an expected reward. Skills are implemented with a generic structure which permits complex skills to be created from the sequencing, output addition, and data flow of the available simple skills.
