Similar Documents
20 similar documents found.
1.
Robust motion control is fundamental to autonomous mobile robots. In the past few years, reinforcement learning (RL) has attracted considerable attention in the feedback control of wheeled mobile robots. However, it is still difficult for RL to solve problems with large or continuous state spaces, which are common in robotics. To improve the generalization ability of RL, this paper presents a novel hierarchical RL approach for optimal path tracking of wheeled mobile robots. In the proposed approach, a graph Laplacian-based hierarchical approximate policy iteration (GHAPI) algorithm is developed, in which the basis functions are constructed automatically using the graph Laplacian operator. In GHAPI, the state space of a Markov decision process is divided into several subspaces and approximate policy iteration is carried out on each subspace. A near-optimal path-tracking control strategy is then obtained by combining GHAPI with proportional-derivative (PD) control. The performance of the proposed approach is evaluated on a P3-AT wheeled mobile robot, and the GHAPI-based PD control is shown to obtain better near-optimal control policies than previous approaches.
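A minimal sketch of the basis-construction idea, not the authors' implementation: build a graph over sampled states, take the low-order eigenvectors of its Laplacian as basis functions, and fit a value function on top of them. The chain graph, the number of eigenvectors k, and the least-squares fit are illustrative assumptions.

```python
import numpy as np

def laplacian_basis(adjacency, k):
    """Return the k smoothest graph-Laplacian eigenvectors as basis functions.

    adjacency: (n, n) symmetric 0/1 matrix over sampled states.
    """
    degree = np.diag(adjacency.sum(axis=1))
    laplacian = degree - adjacency            # combinatorial graph Laplacian
    eigvals, eigvecs = np.linalg.eigh(laplacian)
    return eigvecs[:, :k]                     # columns = basis functions

# Toy example: a 20-state chain graph (stand-in for a sampled state space).
n = 20
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0

Phi = laplacian_basis(A, k=5)                 # (n, 5) feature matrix

# Least-squares fit of a value function V ~ Phi @ w from sampled targets.
targets = -np.abs(np.arange(n) - (n - 1.0))   # e.g. negative distance-to-goal
w, *_ = np.linalg.lstsq(Phi, targets, rcond=None)
V_hat = Phi @ w
```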

2.
Reinforcement learning (RL) is a popular method for solving the path planning problem of autonomous mobile robots in unknown environments. However, the primary difficulty faced by learning robots using RL is that they learn too slowly in obstacle-dense environments. To solve the path planning problem more efficiently in such environments, this paper presents a novel approach in which the robot's learning process is divided into two phases. The first phase accelerates learning of an optimal policy by extending the well-known Dyna-Q algorithm, training the robot to avoid obstacles while following the vector direction. In this phase, the robot's position is represented on a uniform grid; at each time step the robot moves to one of its eight adjacent cells, so the path obtained from the optimal policy may be longer than the true shortest path. The second phase trains the robot to learn a collision-free smooth path that decreases the number of heading changes. The simulation results show that the proposed approach is efficient for path planning of autonomous mobile robots in unknown environments with dense obstacles.
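For reference, a minimal tabular Dyna-Q loop of the kind the first phase builds on; the environment interface (`step`), the hyperparameters, and the action set are assumptions, and the paper's vector-direction shaping and second smoothing phase are not shown.

```python
import random
from collections import defaultdict

def dyna_q(step, start, goal, actions, episodes=200,
           alpha=0.1, gamma=0.95, eps=0.1, planning_steps=20):
    """Minimal tabular Dyna-Q. `step(state, action)` must return
    (next_state, reward); states must be hashable (e.g. grid cells)."""
    Q = defaultdict(float)                      # Q[(s, a)]
    model = {}                                  # learned model: (s, a) -> (s', r)
    for _ in range(episodes):
        s = start
        while s != goal:
            a = (random.choice(actions) if random.random() < eps
                 else max(actions, key=lambda a_: Q[(s, a_)]))
            s2, r = step(s, a)
            best_next = max(Q[(s2, a_)] for a_ in actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            model[(s, a)] = (s2, r)
            for _ in range(planning_steps):     # replay on the learned model
                ps, pa = random.choice(list(model))
                ps2, pr = model[(ps, pa)]
                pbest = max(Q[(ps2, a_)] for a_ in actions)
                Q[(ps, pa)] += alpha * (pr + gamma * pbest - Q[(ps, pa)])
            s = s2
    return Q
```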

3.
A path planning algorithm for an underwater robot navigating through an unspecified environment must incorporate learning and self-adaptation. High-frequency oscillations during underwater motion introduce nonlinearities into the robot's dynamic behavior as well as uncertainties in its hydrodynamic coefficients. Reactive behaviors of the underwater robot are designed by considering the position and orientation of both the target and the nearest obstacle relative to the robot's current position. The human-like reasoning and approximation-based learning of the adaptive neuro-fuzzy inference system (ANFIS) prove effective for underwater multivariable motion control. Several ANFIS models are used to achieve goal reaching and obstacle avoidance while escaping local-minimum situations in both the horizontal and vertical planes of the three-dimensional workspace. An error-gradient approach based on input-output training patterns is used to generate trajectories that optimize both path length and travel time. Simulation and experimental results confirm the robustness and viability of the proposed method, compared with other navigation methodologies, under demanding underwater motion conditions.

4.
The control of soft continuum robots is challenging owing to their mechanical elasticity and complex dynamics. An additional challenge emerges when applying Learning from Demonstration (LfD): the necessary demonstrations are hard to collect because of the inherent control difficulty. In this paper, we provide a multi-level architecture, from low-level control to high-level motion planning, for the Bionic Handling Assistant (BHA) robot, and deploy learning across all levels to enable LfD for a real-world manipulation task. To record the demonstrations, an actively compliant controller is used. A dynamical-systems variant able to encode both position and orientation then maps the recorded 6D end-effector pose data into a virtual attractor space, and a recent LfD method encodes the pose attractors within a single model for point-to-point motion planning. In the proposed architecture, hybrid models combining an analytical approach with machine learning techniques are used to overcome the inherently slow dynamics and model imprecision of the BHA. The performance and generalization capability of the proposed multi-level approach are evaluated in simulation and on the real BHA robot in an apple-picking scenario, which requires high accuracy in controlling the pose of the robot's end-effector.

5.
A new fuzzy-based potential field method is presented in this paper for autonomous mobile robot motion planning in dynamic environments with static or moving targets and obstacles. Two fuzzy models, Mamdani and TSK, are used to develop the total attractive and repulsive forces acting on the mobile robot. The attractive and repulsive forces are estimated from four inputs representing the relative position and velocity between the target and the robot in the x and y directions, on the one hand, and between the obstacle and the robot, on the other. The proposed fuzzy potential field motion planner was investigated in several MATLAB simulation scenarios involving realistic dynamic environments. These simulations showed that, compared with other potential field-based approaches, the proposed approach provides the robot with a collision-free path, lands it softly on the moving target, and resolves the local-minimum problem in both stationary and dynamic environments.
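For context, the classic crisp potential field law that the paper's fuzzy models replace with Mamdani/TSK force estimates can be sketched as follows; the gains, influence radius, and Euler step size are illustrative assumptions.

```python
import numpy as np

def attractive_force(pos, target, k_att=1.0):
    """Pull toward the target, proportional to the offset."""
    return k_att * (target - pos)

def repulsive_force(pos, obstacle, k_rep=1.0, influence=2.0):
    """Push away from an obstacle inside its influence radius."""
    offset = pos - obstacle
    d = np.linalg.norm(offset)
    if d >= influence or d == 0.0:
        return np.zeros_like(pos)
    # Classic FIRAS-style term: grows sharply as the robot nears the obstacle.
    return k_rep * (1.0 / d - 1.0 / influence) / d**2 * (offset / d)

pos = np.array([0.0, 0.0])
target = np.array([5.0, 4.0])
obstacle = np.array([2.0, 1.5])
total = attractive_force(pos, target) + repulsive_force(pos, obstacle)
pos = pos + 0.05 * total          # one Euler step along the resultant force
```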

6.
Reinforcement learning (RL) for robot control is an important technology for future robots, since it lets us design a robot's behavior through the reward function. However, RL for high degree-of-freedom robot control is still an open issue. This paper proposes a discrete action space, DCOB, generated from the basis functions (BFs) given to approximate a value function. The remarkable feature is that reducing the number of BFs, so that the robot can learn the value function quickly, also reduces the size of DCOB, which improves the learning speed. In addition, a method called WF-DCOB is proposed to enhance performance, in which wire-fitting is utilized to search for continuous actions around each discrete action of DCOB. We apply the proposed methods to motion learning tasks on a simulated humanoid robot and a real spider robot. The experimental results demonstrate outstanding performance.

7.
This paper addresses a new method for combining supervised learning and reinforcement learning (RL). Applying supervised learning to robot navigation encounters serious challenges such as inconsistent and noisy data, difficulty in gathering training data, and high error rates in the training data. RL capabilities, such as training from a single scalar evaluation signal and a high degree of exploration, have encouraged researchers to use RL for robot navigation. However, RL algorithms are time-consuming and suffer from a high failure rate in the training phase. Here, we propose Supervised Fuzzy Sarsa Learning (SFSL), a novel idea for utilizing the advantages of both supervised and reinforcement learning. A zero-order Takagi–Sugeno fuzzy controller with several candidate actions per rule serves as the main module of the robot's controller, and the aim of training is to find the best action for each fuzzy rule. In the first step, a human supervisor drives an E-puck robot within the environment and the training data are gathered. In the second step, as hard tuning, the training data are used to initialize the value (worth) of each candidate action in the fuzzy rules. Afterwards, a fuzzy Sarsa learning module, acting as a critic-only fuzzy reinforcement learner, fine-tunes the conclusion-part parameters of the fuzzy controller online. The proposed algorithm is used for driving the E-puck robot in an environment with obstacles. The experimental results show that the proposed approach decreases the learning time and the number of failures, and improves the quality of the robot's motion in the testing environments.
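A hedged sketch of a fuzzy Sarsa update in the style described (per-rule candidate actions whose values are blended by rule firing strengths), not the SFSL implementation: the Gaussian antecedents, the per-rule ε-greedy choice, and the omission of the supervised initialization step are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def firing_strengths(x, centers, sigma=1.0):
    """Normalized Gaussian firing strength of each fuzzy rule at input x."""
    w = np.exp(-np.sum((centers - x) ** 2, axis=1) / (2 * sigma**2))
    return w / w.sum()

n_rules, n_candidates = 5, 3
centers = rng.uniform(-1, 1, size=(n_rules, 2))   # rule antecedent centers
q = np.zeros((n_rules, n_candidates))             # value of each candidate action

def select(x, eps=0.1):
    """Pick one candidate per rule (eps-greedy); the global action would be
    the firing-strength-weighted blend of the chosen candidates."""
    phi = firing_strengths(x, centers)
    choice = np.where(rng.random(n_rules) < eps,
                      rng.integers(n_candidates, size=n_rules),
                      q.argmax(axis=1))
    return phi, choice

def sarsa_update(phi, choice, r, phi2, choice2, alpha=0.1, gamma=0.95):
    """Distribute the TD error over the rules by their firing strengths."""
    q_sa = np.sum(phi * q[np.arange(n_rules), choice])
    q_s2a2 = np.sum(phi2 * q[np.arange(n_rules), choice2])
    delta = r + gamma * q_s2a2 - q_sa
    q[np.arange(n_rules), choice] += alpha * delta * phi
```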

8.
Existing deep reinforcement learning-based methods for manipulator trajectory planning learn inefficiently in unknown environments, and their planning policies lack robustness. To address these problems, this paper proposes A-DPPO, a manipulator trajectory planning method based on a novel bearing reward function. The reward function is designed from the relative direction and relative position, improving learning efficiency by reducing ineffective exploration. Distributed approximate policy optimization (DPPO) is applied to manipulator trajectory planning for the first time, improving the robustness of the planning policy. Experiments show that, compared with existing methods, A-DPPO effectively improves both learning efficiency and the robustness of the planning policy.
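The abstract gives no formula, so the following is a hypothetical shaping reward built from relative position and relative direction in the spirit described; the function name, weights, and inputs are assumptions.

```python
import numpy as np

def bearing_position_reward(ee_pos, ee_step, goal, w_pos=1.0, w_dir=0.5):
    """Hypothetical shaped reward: penalize distance to the goal and reward
    moving in the goal's direction. `ee_step` is the end-effector
    displacement over the last control step."""
    to_goal = goal - ee_pos
    dist = np.linalg.norm(to_goal)
    step_norm = np.linalg.norm(ee_step)
    if step_norm > 0 and dist > 0:
        alignment = np.dot(ee_step, to_goal) / (step_norm * dist)  # cos(angle)
    else:
        alignment = 0.0
    return -w_pos * dist + w_dir * alignment
```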

9.
This article presents a novel method for synchronized robot pose trajectory planning. First, based on triple NURBS curves, a method for describing the synchronized position and orientation of the robot is proposed. Then, by considering geometric and kinematic constraints, especially the angular velocity constraint, and employing a bidirectional interpolation algorithm, a robot pose trajectory planning approach is developed that has limited linear jerk, continuous and bounded angular velocity, and approximately optimal time, without requiring an optimization program. Finally, two robot pose paths, a blade-shaped curve and a fan-shaped curve, are used in simulations, and the results indicate that the proposed planning method satisfies the given constraints, i.e. the linear jerk is limited and the angular velocity is continuous and bounded. Trajectory tracking experiments carried out on a 6-DOF industrial robot further show that the proposed method generates smooth trajectories that ensure stable, impact-free robot motion in practical situations.

10.

This paper proposes a systematic methodology for obtaining a closed-form dynamics formulation of a newly designed fully spherical robot, a 3(RSS)-S parallel manipulator with real co-axial actuated shafts. The proposed robot can rotate completely about a vertical axis and can be used in celestial orientation and rehabilitation applications. After the robot is described, its inverse position, velocity, and acceleration analyses are performed. Next, a methodology for deriving the dynamical equations of motion is developed based on Kane's method. The elaborated approach shows that the inverse dynamics of the manipulator reduce to solving a system of three linear equations in three unknowns. Finally, a computational algorithm for solving the inverse dynamics is presented and several trajectories of the moving platform are simulated.


11.
As an important approach to solving complex sequential decision problems, reinforcement learning (RL) has been widely studied in the artificial intelligence and machine learning communities. However, the generalization ability of RL remains an open problem, and it is difficult for existing RL algorithms to solve Markov decision problems (MDPs) with both continuous state and action spaces. In this paper, a novel RL approach with fast policy search and adaptive basis function selection, called Continuous-action Approximate Policy Iteration (CAPI), is proposed for MDPs with both continuous state and action spaces. In CAPI, based on value functions estimated by temporal-difference learning, a fast policy search technique is suggested for finding optimal actions in continuous spaces, which is computationally efficient and easy to implement. To improve the generalization ability and learning efficiency of CAPI, two adaptive basis function selection methods are developed so that sparse approximations of value functions can be obtained efficiently, both for linear function approximators and for kernel machines. Simulation results on benchmark learning control tasks with continuous state and action spaces show that the proposed approach not only converges to a near-optimal policy in a few iterations but also obtains comparable or even better performance than Sarsa learning and previous approximate policy iteration methods such as LSPI and KLSPI.
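A sketch of the general idea of searching a continuous action space for the greedy action under an estimated Q-function, assuming a one-dimensional action box and a coarse-to-fine grid search; the paper's own search technique and basis selection are not reproduced.

```python
import numpy as np

def greedy_continuous_action(q_fn, state, low, high, coarse=21, refine=3):
    """Approximate argmax_a Q(state, a) over a box [low, high]:
    coarse grid search followed by progressively narrowed local grids."""
    lo, hi = float(low), float(high)
    best_a = lo
    for _ in range(refine):
        grid = np.linspace(lo, hi, coarse)
        values = [q_fn(state, a) for a in grid]
        best_a = grid[int(np.argmax(values))]
        span = (hi - lo) / (coarse - 1)          # shrink around the best cell
        lo, hi = max(low, best_a - span), min(high, best_a + span)
    return best_a

# Toy Q-function with a known maximum at a = 0.3.
q_fn = lambda s, a: -(a - 0.3) ** 2
print(greedy_continuous_action(q_fn, state=None, low=-1.0, high=1.0))
```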

12.
A virtual target tracking approach is proposed for the kinematic control of a mobile robot. The controller generates linear and angular velocity inputs from local measurements of the robot's position and orientation, together with the estimated velocity of the target object. Applying the proposed approach to a cooperative group with an arbitrary number of mobile robots makes it possible to create various formations for cooperative navigation and moving-object tracking. The developed controller is shown to be stable and convergent through theoretical proof and a series of experiments.
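A minimal kinematic (unicycle) tracking law of the kind described, sketched under assumed gains and a simple feedforward of the estimated target speed; the paper's exact control law and stability proof are not reproduced.

```python
import numpy as np

def tracking_control(pose, target_pos, target_vel, k_rho=0.8, k_alpha=2.0):
    """Kinematic unicycle controller toward a (possibly moving) virtual target.

    pose = (x, y, theta); returns (linear velocity v, angular velocity w).
    The gains and the feedforward term are illustrative choices.
    """
    x, y, theta = pose
    dx, dy = target_pos[0] - x, target_pos[1] - y
    rho = np.hypot(dx, dy)                            # distance to target
    alpha = np.arctan2(dy, dx) - theta                # heading error
    alpha = np.arctan2(np.sin(alpha), np.cos(alpha))  # wrap to [-pi, pi]
    v = k_rho * rho + np.hypot(*target_vel)           # feedforward target speed
    w = k_alpha * alpha
    return v, w
```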

13.
In this paper, a practically viable approach for conflict-free, coordinated motion planning of multiple robots is proposed. The presented approach is a two-phase decoupled method that provides the desired coordination among the participating robots in offline mode. In the first phase, a collision-free path with respect to stationary obstacles is obtained for each robot using an A* algorithm. In the second phase, coordination among the robots is achieved by resolving conflicts through a path modification approach: the paths of conflicting robots are modified according to their position in a dynamically computed path modification sequence (PMS). To assess the effectiveness of the developed methodology, coordination is also achieved with alternative strategies: allotting a fixed priority sequence for the motion of each robot, reducing the robots' joint velocities, and delaying the start of each robot. Performance is assessed in terms of the path length traversed by each robot, the time taken to complete the task, and the computational time. The effectiveness of the proposed approach for multi-robot motion planning is demonstrated with two case studies involving tasks with three and four robots. Results from realistic simulations of the multi-robot environment show that the proposed approach ensures rapid, concurrent, and conflict-free coordinated path planning for multiple robots.
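As a reference for the first phase, a compact A* implementation on a 4-connected occupancy grid; the grid model and Manhattan heuristic are assumptions, and the second-phase conflict resolution (PMS) is not shown.

```python
import heapq, itertools

def a_star(grid, start, goal):
    """A* on a 4-connected occupancy grid (0 = free, 1 = obstacle).
    Returns the path as a list of (row, col) cells, or None."""
    h = lambda c: abs(c[0] - goal[0]) + abs(c[1] - goal[1])  # Manhattan
    tie = itertools.count()                      # heap tiebreaker
    open_set = [(h(start), next(tie), start, None)]
    came_from, g_cost = {}, {start: 0}
    while open_set:
        _, _, cell, parent = heapq.heappop(open_set)
        if cell in came_from:
            continue                             # already expanded
        came_from[cell] = parent
        if cell == goal:                         # reconstruct path
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        r, c = cell
        g = g_cost[cell]
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nxt[0] < len(grid) and 0 <= nxt[1] < len(grid[0])
                    and grid[nxt[0]][nxt[1]] == 0
                    and g + 1 < g_cost.get(nxt, float("inf"))):
                g_cost[nxt] = g + 1
                heapq.heappush(open_set, (g + 1 + h(nxt), next(tie), nxt, cell))
    return None
```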

14.
Reinforcement learning (RL) has been widely used as a mechanism by which autonomous robots learn state-action pairs through interaction with their environment. However, most RL methods suffer from slow convergence when deriving an optimal policy in practical applications. To solve this problem, stochastic shortest path-based Q-learning (SSPQL) is proposed, combining a stochastic shortest-path-finding method with Q-learning, a well-known model-free RL method. The rationale is that if a robot incrementally learns an internal state-transition model, it can infer the locally optimal policy using a stochastic shortest-path-finding method. By increasing the values of the state-action pairs comprising these locally optimal policies, the robot can reach a goal quickly, which enhances convergence speed. To demonstrate the validity of this proposed learning approach, several experimental results are presented in this paper.
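A hedged sketch of the shortest-path boosting step in the spirit of SSPQL: run a backward Dijkstra over the incrementally learned transition model and raise the Q-values of actions lying on shortest paths to the goal. The unit step costs, fixed bonus, and deterministic model are simplifying assumptions.

```python
import heapq, itertools
from collections import defaultdict

def shortest_path_boost(Q, model, goal, bonus=0.5):
    """Backward Dijkstra over the learned deterministic model
    `model[(s, a)] = s2`, then boost Q-values of on-path actions."""
    incoming = defaultdict(list)                 # s2 -> [(s, a), ...]
    for (s, a), s2 in model.items():
        incoming[s2].append((s, a))

    dist, tie = {goal: 0}, itertools.count()    # steps-to-goal per state
    heap = [(0, next(tie), goal)]
    while heap:
        d, _, s2 = heapq.heappop(heap)
        if d > dist.get(s2, float("inf")):
            continue                            # stale heap entry
        for s, _a in incoming[s2]:
            if d + 1 < dist.get(s, float("inf")):
                dist[s] = d + 1
                heapq.heappush(heap, (d + 1, next(tie), s))

    for (s, a), s2 in model.items():            # boost on-path actions
        if s in dist and s2 in dist and dist[s2] + 1 == dist[s]:
            Q[(s, a)] += bonus
```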

15.
Reinforcement learning (RL) is an area of machine learning concerned with how an agent learns to make sequential decisions in order to optimize a particular performance measure. To achieve this goal, the agent must choose between 1) exploiting previously acquired knowledge, which may end in local optimality, and 2) exploring to gather new knowledge that is expected to improve current performance. Among RL algorithms, Bayesian model-based RL (BRL) is well known for trading off exploitation and exploration optimally via belief planning, i.e. a partially observable Markov decision process (POMDP). However, solving that POMDP often suffers from the curse of dimensionality and the curse of history. In this paper, we make two major contributions: 1) an integration of temporal abstraction into BRL that results in a hierarchical POMDP formulation, which can be solved online using a hierarchical sample-based planning solver; and 2) a subgoal discovery method for hierarchical BRL that automatically discovers useful macro-actions to accelerate learning. The experiments demonstrate that the proposed approach can scale up to much larger problems, and that the agent discovers useful subgoals that speed up Bayesian reinforcement learning.

16.
This article provides the first survey of computational models of emotion in reinforcement learning (RL) agents. The survey focuses on agent/robot emotions and mostly ignores human user emotions. Emotions are recognized as functional in decision-making by influencing motivation and action selection. Therefore, computational emotion models are usually grounded in the agent's decision-making architecture, of which RL is an important subclass. Studying emotions in RL-based agents is useful for three research fields. For machine learning (ML) researchers, emotion models may improve learning efficiency. For the interactive ML and human–robot interaction community, emotions can communicate state and enhance user investment. Lastly, it allows affective modelling researchers to investigate their emotion theories in a successful class of AI agents. This survey provides background on emotion theory and RL. It systematically addresses (1) from what underlying dimensions (e.g. homeostasis, appraisal) emotions can be derived and how these can be modelled in RL agents, (2) what types of emotions have been derived from these dimensions, and (3) how these emotions may either influence the agent's learning efficiency or serve as social signals. We also systematically compare evaluation criteria and draw connections to important RL sub-domains such as (intrinsic) motivation and model-based RL. In short, this survey provides a practical overview for engineers wanting to implement emotions in their RL agents, and identifies challenges and directions for future emotion-RL research.

17.
邵杰, 杜丽娟, 杨静宇. 《计算机科学》 (Computer Science), 2013, 40(8): 249-251, 292
XCS classifiers have shown strong capability for robot reinforcement learning, but in the multi-robot domain they have been confined to MDP environments and can only solve learning problems with small state spaces. XCSG is proposed here to solve the multi-robot reinforcement learning problem. XCSG builds a low-dimensional approximate function, and gradient descent uses online knowledge to keep that approximation stable, so the Q-table remains in a stable, low-dimensional state. The approximate function Q not only requires less storage but also allows the robots to generalize acquired knowledge online. Simulation experiments show that XCSG effectively resolves the problems of large learning spaces, slow learning speed, and uncertain learning results in multi-robot learning.
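A sketch of the table-replacing idea: a low-dimensional linear Q approximation updated by semi-gradient descent. The feature design and hyperparameters are assumptions, and XCS's classifier population is not modelled.

```python
import numpy as np

def gradient_q_update(w, phi, r, phi_next_best, alpha=0.05, gamma=0.9):
    """One semi-gradient TD step on a linear approximator
    Q(s, a) = w . phi(s, a), replacing a large Q-table with a
    low-dimensional weight vector."""
    delta = r + gamma * (w @ phi_next_best) - (w @ phi)
    return w + alpha * delta * phi

# Toy usage with random 8-dimensional features.
rng = np.random.default_rng(1)
w = np.zeros(8)
phi, phi_next = rng.random(8), rng.random(8)
w = gradient_q_update(w, phi, r=1.0, phi_next_best=phi_next)
```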

18.
A structured artificial neural network (ANN) approach is proposed here to control the motion of a robot manipulator. Many neural network models use threshold units with sigmoid transfer functions and gradient descent-type learning rules; the learning equations used here are those of the backpropagation algorithm. In this work, the kinematics of a six-degrees-of-freedom robot manipulator are solved using an ANN, and work has been undertaken to find the best ANN configurations for this problem. Both the placement and the orientation angles of the robot manipulator are used to find the inverse kinematics solutions.

19.
This article proposes a reinforcement learning procedure for mobile robot navigation using a latent-like learning scheme. Latent learning refers to learning that occurs in the absence of reinforcement signals and is not apparent until reinforcement is introduced; the concept holds that part of a task can be learned before the agent receives any indication of how to perform it. In the proposed topological reinforcement learning agent (TRLA), a topological map is used to perform the latent learning. Propagating the reinforcement signal through the topological neighborhoods of the map permits the estimation of a value function that, on average, requires fewer trials and fewer updates per trial than six of the main temporal-difference reinforcement learning algorithms: Q-learning, SARSA, Q(λ)-learning, SARSA(λ), Dyna-Q and fast Q(λ)-learning. The RL agents were tested in four different environments designed with increasing levels of complexity in the navigation tasks. The tests suggested that the TRLA chooses shorter trajectories (in number of steps) and/or requires fewer value-function updates per trial than the other six reinforcement learning (RL) algorithms.

20.
Reinforcement Learning (RL) is a well-known technique for learning solutions to control problems from an agent's interactions with its domain. However, RL is known to be inefficient in real-world problems where the state space and the set of actions grow quickly. Recently, heuristics, case-based reasoning (CBR) and transfer learning have been used as tools to accelerate the RL process. This paper investigates a class of algorithms called Transfer Learning Heuristically Accelerated Reinforcement Learning (TLHARL) that uses CBR as a heuristic within a transfer learning setting to accelerate RL. The main contributions of this work are a new TLHARL algorithm based on the traditional RL algorithm Q(λ) and the application of TLHARL to two distinct real-robot domains: robot soccer with small-scale robots and humanoid-robot stability learning. Experimental results show that the proposed method significantly improved the learning rate in both domains.
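A minimal heuristically accelerated action-selection rule of the HARL family: the heuristic H (here standing in for the transferred case-based knowledge) biases, but does not replace, the learned Q-values. The names, the ε-greedy wrapper, and the weight `xi` are illustrative assumptions.

```python
import random

def heuristic_greedy(Q, H, state, actions, xi=1.0, eps=0.1):
    """Epsilon-greedy over Q(s, a) + xi * H(s, a). Q and H are dicts keyed by
    (state, action); missing entries default to 0."""
    if random.random() < eps:
        return random.choice(actions)            # exploratory move
    return max(actions, key=lambda a: Q.get((state, a), 0.0)
                                      + xi * H.get((state, a), 0.0))
```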

