
1.

A Bayesian exploration-exploitation approach for optimal online sensing and planning with a visually guided mobile robot

Ruben Martinez-Cantin Nando de Freitas Eric Brochu José Castellanos Arnaud Doucet 《Autonomous Robots》2009,27(2):93-103

We address the problem of online path planning for optimal sensing with a mobile robot. The objective of the robot is to learn the most about its pose and the environment given time constraints. We use a POMDP with a utility function that depends on the belief state to model the finite horizon planning problem. We replan as the robot progresses throughout the environment. The POMDP is high-dimensional, continuous, non-differentiable, nonlinear, non-Gaussian and must be solved in real-time. Most existing techniques for stochastic planning and reinforcement learning are therefore inapplicable. To solve this extremely complex problem, we propose a Bayesian optimization method that dynamically trades off exploration (minimizing uncertainty in unknown parts of the policy space) and exploitation (capitalizing on the current best solution). We demonstrate our approach with a visually guided mobile robot. The solution proposed here is also applicable to other closely related domains, including active vision, sequential experimental design, dynamic sensing and calibration with mobile sensors.
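The exploration-exploitation trade-off described in this abstract is commonly realized in Bayesian optimization through an acquisition function. The sketch below uses expected improvement, which is an assumption here because the abstract does not name the acquisition function. The candidate names and the surrogate means and standard deviations are hypothetical.

```python
import math


def expected_improvement(mean, std, best_so_far):
    """Expected improvement of a candidate over the incumbent (maximization)."""
    if std <= 0.0:
        # No posterior uncertainty: the improvement is deterministic.
        return max(mean - best_so_far, 0.0)
    z = (mean - best_so_far) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mean - best_so_far) * cdf + std * pdf


# Hypothetical surrogate predictions (mean, std) for three candidate policies.
candidates = {
    "near_best": (1.00, 0.05),  # close to the incumbent, little uncertainty
    "uncertain": (0.80, 0.60),  # lower mean, but poorly explored
    "poor": (0.20, 0.05),       # confidently bad
}
best_so_far = 1.0

scores = {
    name: expected_improvement(mean, std, best_so_far)
    for name, (mean, std) in candidates.items()
}
next_policy = max(scores, key=scores.get)
```

In this toy setting the poorly explored candidate scores highest even though its predicted mean is below the incumbent, which is the exploration side of the trade-off.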

2.

Planning in partially-observable switching-mode continuous domains

Emma Brunskill Leslie Pack Kaelbling Tomás Lozano-Pérez Nicholas Roy 《Annals of Mathematics and Artificial Intelligence》2010,58(3-4):185-216

Continuous-state POMDPs provide a natural representation for a variety of tasks, including many in robotics. However, most existing parametric continuous-state POMDP approaches are limited by their reliance on a single linear model to represent the world dynamics. We introduce a new switching-state dynamics model that can represent multi-modal state-dependent dynamics. We present the Switching Mode POMDP (SM-POMDP) planning algorithm for solving continuous-state POMDPs using this dynamics model. We also consider several procedures to approximate the value function as a mixture of a bounded number of Gaussians. Unlike the majority of prior work on approximate continuous-state POMDP planners, we provide a formal analysis of our SM-POMDP algorithm, providing bounds, where possible, on the quality of the resulting solution. We also analyze the computational complexity of SM-POMDP. Empirical results on an unmanned aerial vehicle collision avoidance simulation, and a robot navigation simulation where the robot has faulty actuators, demonstrate the benefit of SM-POMDP over a prior parametric approach.

3.

A Modified Memory-Based Reinforcement Learning Method for Solving POMDP Problems

Lei Zheng Siu-Yeung Cho 《Neural Processing Letters》2011,33(2):187-200

Partially observable Markov decision processes (POMDPs) provide a mathematical framework for agent planning in stochastic and partially observable environments. The classic Bayesian optimal solution can be obtained by transforming the problem into a Markov decision process (MDP) using belief states. However, because the belief state space is continuous and multi-dimensional, the problem is highly intractable. Many practical heuristic-based methods have been proposed, but most of them require a complete POMDP model of the environment, which is not always practical. This article introduces a modified memory-based reinforcement learning algorithm called modified U-Tree that is capable of learning from raw sensor experiences with minimum prior knowledge. This article describes an enhancement of the original U-Tree’s state generation process to make the generated model more compact, and also proposes a modification of the statistical test for reward estimation, which allows the algorithm to be benchmarked against some traditional model-based algorithms with a set of well known POMDP problems.
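The belief-state transformation mentioned in this abstract rests on a Bayes filter over hidden states. The following is a minimal sketch of that update for a discrete POMDP. The two-state model, the table layouts, and all names are hypothetical and are not taken from the article.

```python
def update_belief(belief, action, observation, transition, observation_model):
    """Bayes update of a discrete belief state.

    transition[action][s][s2]          = P(s2 | s, action)
    observation_model[action][s2][obs] = P(obs | s2, action)
    """
    n_states = len(belief)
    unnormalized = []
    for s2 in range(n_states):
        # Prediction step: probability of reaching s2 under the action.
        predicted = sum(transition[action][s][s2] * belief[s] for s in range(n_states))
        # Correction step: weight by the likelihood of the observation.
        unnormalized.append(observation_model[action][s2][observation] * predicted)
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return [p / total for p in unnormalized]


# Hypothetical model: one "listen" action (index 0) that leaves the state
# unchanged and reports the true state with probability 0.85.
transition = [[[1.0, 0.0], [0.0, 1.0]]]
observation_model = [[[0.85, 0.15], [0.15, 0.85]]]

belief = [0.5, 0.5]
belief_after_one = update_belief(belief, 0, 0, transition, observation_model)
belief_after_two = update_belief(belief_after_one, 0, 0, transition, observation_model)
```

Two consistent observations sharpen the belief toward the first state. Planning over such belief vectors is what makes the resulting state space continuous.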

4.

Improving POMDP Tractability via Belief Compression and Clustering

《IEEE transactions on systems, man, and cybernetics. Part B, Cybernetics》2010,40(1):125-136

Partially observable Markov decision process (POMDP) is a commonly adopted mathematical framework for solving planning problems in stochastic environments. However, computing the optimal policy of POMDP for large-scale problems is known to be intractable, where the high dimensionality of the underlying belief space is one of the major causes. In this paper, we propose a hybrid approach that integrates two different approaches for reducing the dimensionality of the belief space: 1) belief compression and 2) value-directed compression. In particular, a novel orthogonal nonnegative matrix factorization is derived for the belief compression, which is then integrated in a value-directed framework for computing the policy. In addition, with the conjecture that a properly partitioned belief space can have its per-cluster intrinsic dimension further reduced, we propose to apply a $k$-means-like clustering technique to partition the belief space to form a set of sub-POMDPs before applying the dimension reduction techniques to each of them. We have evaluated the proposed belief compression and clustering approaches based on a set of benchmark problems and demonstrated their effectiveness in reducing the cost for computing policies, with the quality of the policies being retained.

5.

Global motion planning under uncertain motion, sensing, and environment map

Hanna Kurniawati Tirthankar Bandyopadhyay Nicholas M. Patrikalakis 《Autonomous Robots》2012,33(3):255-272

Uncertainty in motion planning is often caused by three main sources: motion error, sensing error, and imperfect environment map. Despite the significant effect of all three sources of uncertainty on motion planning problems, most planners take into account only one or at most two of them. We propose a new motion planner, called Guided Cluster Sampling (GCS), that takes into account all three sources of uncertainty for robots with active sensing capabilities. GCS uses the Partially Observable Markov Decision Process (POMDP) framework and the point-based POMDP approach. Although point-based POMDPs have shown impressive progress over the past few years, they perform poorly when the environment map is imperfect. This poor performance is due to the extremely high dimensional state space, which translates to the extremely large belief space B. We alleviate this problem by constructing a more suitable sampling distribution based on the observations that when the robot has active sensing capability, B can be partitioned into a collection of much smaller sub-spaces, and an optimal policy can often be generated by sufficient sampling of a small subset of the collection. Utilizing these observations, GCS samples B in two stages: a subspace is sampled from the collection and then a belief is sampled from the subspace. It uses information from the set of sampled sub-spaces and sampled beliefs to guide subsequent sampling. Simulation results on marine robotics scenarios suggest that GCS can generate reasonable policies for motion planning problems with uncertain motion, sensing, and environment map that are unsolvable by the best point-based POMDPs today. Furthermore, GCS handles POMDPs with continuous state, action, and observation spaces. We show that for a class of POMDPs that often occur in robot motion planning, given enough time, GCS converges to the optimal policy. To the best of our knowledge, this is the first convergence result for point-based POMDPs with continuous action space.

6.

Real-time hierarchical POMDPs for autonomous robot navigation

《Robotics and Autonomous Systems》2007,55(7):561-571

This paper proposes a new hierarchical formulation of POMDPs for autonomous robot navigation that can be solved in real-time, and is memory efficient. It will be referred to in this paper as the Robot Navigation–Hierarchical POMDP (RN-HPOMDP). The RN-HPOMDP is utilized as a unified framework for autonomous robot navigation in dynamic environments. As such, it is used for localization, planning and local obstacle avoidance. Hence, the RN-HPOMDP decides at each time step the actions the robot should execute, without the intervention of any other external module for obstacle avoidance or localization. Our approach employs state space and action space hierarchy, and can effectively model large environments at a fine resolution. Finally, the notion of the reference POMDP is introduced. The latter holds all the information regarding motion and sensor uncertainty, which makes the proposed hierarchical structure memory efficient and enables fast learning. The RN-HPOMDP has been experimentally validated in real dynamic environments.

7.

A sampling-based approximate algorithm for POMDPs Cited by: 1 (self-citations: 0, citations by others: 1)

陈茂 (Chen Mao), 陈小平 (Chen Xiaoping) 《计算机仿真》 (Computer Simulation) 2006,23(5):64-67

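The partially observable Markov decision process (POMDP) is a model for action selection by robots in dynamic, uncertain environments. For POMDP models with sparse transition matrices, this paper proposes a fast approximate solution algorithm. The algorithm first samples the belief space using the policy produced by the QMDP algorithm, then quickly generates the POMDP value function through point-based iteration, yielding an approximately optimal action-selection policy. On the same POMDP test models, the policy produced by this algorithm obtains returns comparable to those of policies produced by other approximate algorithms, but the algorithm is faster and the set of vectors representing its policy is smaller than the sets produced by other existing approximate algorithms. It is therefore better suited than those algorithms to solving large-scale POMDP models with sparse state-transition matrices.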
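The QMDP policy that this abstract uses to sample the belief space can be sketched as follows: solve the underlying fully observable MDP by value iteration, then pick the action with the highest belief-weighted Q-value. This is a generic illustration of QMDP, not the paper's implementation; the two-state model and all names are hypothetical.

```python
def qmdp_q_values(T, R, gamma, iterations=200):
    """Value iteration on the underlying MDP.

    T[a][s][s2] = P(s2 | s, a), R[s][a] = immediate reward.
    Returns Q[s][a].
    """
    n_states, n_actions = len(R), len(R[0])
    V = [0.0] * n_states
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(iterations):
        Q = [
            [
                R[s][a] + gamma * sum(T[a][s][s2] * V[s2] for s2 in range(n_states))
                for a in range(n_actions)
            ]
            for s in range(n_states)
        ]
        V = [max(row) for row in Q]
    return Q


def qmdp_action(belief, Q):
    """Choose the action maximizing the belief-weighted Q-value."""
    n_actions = len(Q[0])
    return max(
        range(n_actions),
        key=lambda a: sum(b * Q[s][a] for s, b in enumerate(belief)),
    )


# Hypothetical model: the state never changes, and action i pays 1 in state i.
T = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
R = [[1.0, 0.0], [0.0, 1.0]]

Q = qmdp_q_values(T, R, gamma=0.9)
action_when_mostly_state_0 = qmdp_action([0.8, 0.2], Q)
action_when_mostly_state_1 = qmdp_action([0.3, 0.7], Q)
```

QMDP assumes that all uncertainty disappears after one step, so it never chooses actions purely to gather information. That limitation is one reason to use it only as a cheap sampling policy, as the abstract describes.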

8.

Deep double Q-network incorporating contrastive predictive coding

刘剑锋 (Liu Jianfeng), 普杰信 (Pu Jiexin), 孙力帆 (Sun Lifan) 《计算机工程与应用》 (Computer Engineering and Applications) 2023,59(6):162-170

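In a partially observable Markov decision process (POMDP) with an unknown model, the agent cannot directly access the true state of the environment, and perceptual uncertainty makes learning an optimal policy challenging. To address this, a deep double Q-network reinforcement learning algorithm incorporating contrastive predictive coding representations is proposed, which explicitly models the belief state to obtain a compact, efficient encoding of the history for use in policy optimization. To improve data efficiency, the concept of a belief replay buffer is proposed, which directly stores belief transition pairs rather than observation and action sequences to reduce memory usage. In addition, a phased training strategy is designed that decouples representation learning from policy learning to improve training stability. A POMDP navigation task was designed based on the Gym-MiniGrid environment, and experimental results show that the proposed algorithm captures state-related semantic information and thereby achieves stable, efficient policy learning under POMDPs.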

9.

A POMDP algorithm based on policy iteration and value iteration

孙湧 (Sun Yong), 仵博 (Wu Bo), 冯延蓬 (Feng Yanpeng) 《计算机研究与发展》 (Journal of Computer Research and Development) 2008,45(10)

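Partially observable Markov decision processes are solved by introducing the belief state space, which converts a non-Markov chain problem into a Markov chain problem; their ability to describe the real world makes them an important branch of research on stochastic decision processes. This paper introduces the basic principles and decision procedure of partially observable Markov decision processes and proposes a partially observable Markov decision algorithm based on policy iteration and value iteration. The algorithm draws on ideas from linear programming and dynamic programming to address the "curse of dimensionality" that arises when the belief state space is large, and obtains an approximately optimal solution to the Markov decision problem. Experimental data show that the algorithm is feasible and effective.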

10.

Navigation strategies for multiple autonomous mobile robots moving in formation

P. K. C. Wang 《野外机器人技术杂志》 (Journal of Field Robotics) 1991,8(2):177-195

The problem of deriving navigation strategies for a fleet of autonomous mobile robots moving in formation is considered. Here each robot is represented by a particle with a spherical effective spatial domain and a specified cone of visibility. The global motion of each robot in the world space is described by the equations of motion of the robot's center of mass. First, methods for formation generation are discussed. Then, simple navigation strategies for robots moving in formation are derived. A sufficient condition for the stability of a desired formation pattern for a fleet of robots each equipped with the navigation strategy based on nearest neighbor tracking is developed. The dynamic behavior of robot fleets consisting of three or more robots moving in formation in a plane is studied by means of computer simulation.

11.

Advances in research on partially observable Markov decision processes

仵博 (Wu Bo), 吴敏 (Wu Min) 《计算机工程与设计》 (Computer Engineering and Design) 2007,28(9):2116-2119,2126

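Partially observable Markov decision processes are solved by introducing the belief state space, which converts a non-Markov chain problem into a Markov chain problem; their ability to describe the real world makes them an important branch of research on stochastic decision processes. This paper introduces the basic principles and decision procedure of partially observable Markov decision processes, then presents three representative algorithms: the Witness algorithm of Littman et al., the Incremental Pruning algorithm, and the point-based value iteration algorithm of Pineau et al., and analyzes and compares them. It also describes applications of partially observable Markov decision processes.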

12.

Cooperative multi-robot belief space planning for autonomous navigation in unknown environments

Vadim Indelman 《Autonomous Robots》2018,42(2):353-373

We investigate the problem of cooperative multi-robot planning in unknown environments, which is important in numerous applications in robotics. The research community has been actively developing belief space planning approaches that account for the different sources of uncertainty within planning, recently also considering uncertainty in the environment observed by planning time. We further advance the state of the art by reasoning about future observations of environments that are unknown at planning time. The key idea is to incorporate within the belief indirect multi-robot constraints that correspond to these future observations. Such a formulation facilitates a framework for active collaborative state estimation while operating in unknown environments. In particular, it can be used to identify best robot actions or trajectories among given candidates generated by existing motion planning approaches, or to refine nominal trajectories into locally optimal paths using direct trajectory optimization techniques. We demonstrate our approach in a multi-robot autonomous navigation scenario and consider its applicability for autonomous navigation in unknown obstacle-free and obstacle-populated environments. Results indicate that modeling future multi-robot interaction within the belief makes it possible to determine robot actions (paths) that yield significantly improved estimation accuracy.

13.

A value iteration algorithm for solving POMDPs based on hybrid criteria

刘峰 (Liu Feng) 《模式识别与人工智能》 (Pattern Recognition and Artificial Intelligence) 2016,29(11):961-968

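Point-based value iteration methods are an effective class of algorithms for solving partially observable Markov decision process (POMDP) problems. Most current point-based value iteration algorithms explore the belief point set using a single heuristic criterion, which limits their effectiveness. To address this, the paper proposes a value iteration algorithm that explores the belief point set using hybrid criteria (HHVI) and maintains both an upper and a lower bound on the value function. When expanding the explored point set, it selects belief points whose gap between the upper and lower bounds exceeds a threshold, and, among the successor belief points whose gap exceeds the threshold, explores the one farthest from the already explored point set, so that the explored points are distributed as effectively as possible across the reachable belief space. Experiments on four benchmark problems show that HHVI maintains convergence efficiency and converges to a better global optimal solution.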
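The selection rule described in this abstract combines a bound-gap filter with a farthest-point criterion. The sketch below illustrates that combination under stated assumptions: the L1 distance, the tuple layout, and every name are hypothetical, since the abstract does not specify the distance metric or data structures.

```python
def l1_distance(p, q):
    """L1 distance between two belief vectors (the metric is an assumption)."""
    return sum(abs(x - y) for x, y in zip(p, q))


def select_belief_to_explore(successors, explored, threshold):
    """Pick the next belief point to explore.

    successors: list of (belief, upper_bound, lower_bound) tuples.
    explored:   non-empty list of belief points already in the explored set.
    Returns None when every successor's bound gap is within the threshold.
    """
    # Criterion 1: keep only successors whose value bounds are still far apart.
    candidates = [
        belief for belief, upper, lower in successors if upper - lower > threshold
    ]
    if not candidates:
        return None
    # Criterion 2: among those, take the point farthest from the explored set.
    return max(
        candidates,
        key=lambda belief: min(l1_distance(belief, e) for e in explored),
    )


explored = [[1.0, 0.0], [0.5, 0.5]]
successors = [
    ([0.9, 0.1], 5.0, 4.0),   # large gap, but close to an explored point
    ([0.0, 1.0], 5.0, 4.95),  # far away, but bounds have nearly converged
    ([0.2, 0.8], 6.0, 3.0),   # large gap and far from the explored set
]
chosen = select_belief_to_explore(successors, explored, threshold=0.1)
```

The second successor is the most distant point overall but is filtered out because its bounds have nearly met, so exploring it would add little information.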

14.

Cognitive navigation based on nonuniform Gabor space sampling, unsupervised growing networks, and reinforcement learning

Arleo A. Smeraldi F. Gerstner W. 《Neural Networks, IEEE Transactions on》2004,15(3):639-652

We study spatial learning and navigation for autonomous agents. A state space representation is constructed by unsupervised Hebbian learning during exploration. As a result of learning, a representation of the continuous two-dimensional (2-D) manifold in the high-dimensional input space is found. The representation consists of a population of localized overlapping place fields covering the 2-D space densely and uniformly. This space coding is comparable to the representation provided by hippocampal place cells in rats. Place fields are learned by extracting spatio-temporal properties of the environment from sensory inputs. The visual scene is modeled using the responses of modified Gabor filters placed at the nodes of a sparse Log-polar graph. Visual sensory aliasing is eliminated by taking into account self-motion signals via path integration. This solves the hidden state problem and provides a suitable representation for applying reinforcement learning in continuous space for action selection. A temporal-difference prediction scheme is used to learn sensorimotor mappings to perform goal-oriented navigation. Population vector coding is employed to interpret ensemble neural activity. The model is validated on a mobile Khepera miniature robot.

15.

Motion strategies for exploration and map building under uncertainty with multiple heterogeneous robots

Luis Valentin Lourdes Muñoz-Gómez Rigoberto López-Padilla Moises Alencastre-Miranda 《Advanced Robotics》2014,28(17):1133-1149

In this paper, we present a multi-robot exploration strategy for map building. We consider an indoor structured environment and a team of robots with different sensing and motion capabilities. We combine geometric and probabilistic reasoning to propose a solution to our problem. We formalize the proposed solution using stochastic dynamic programming (SDP) in states with imperfect information. Our modeling can be considered as a partially observable Markov decision process (POMDP), which is optimized using SDP. We apply the dynamic programming technique in a reduced search space that allows us to incrementally explore the environment. We propose realistic sensor models and provide a method to compute the probability of the next observation given the current state of the team of robots based on a Bayesian approach. We also propose a probabilistic motion model, which allows us to take into account errors (noise) on the velocities applied to each robot. This modeling also allows us to simulate imperfect robot motions, and to estimate the probability of reaching the next state given the current state. We have implemented all our algorithms, and simulation results are presented.

16.

Point-based online value iteration algorithm in large POMDP

Bo Wu Hong-Yan Zheng Yan-Peng Feng 《Applied Intelligence》2014,40(3):546-555

The partially observable Markov decision process (POMDP) is an ideal framework for sequential decision-making under uncertainty in stochastic domains. However, solving POMDPs in real-time systems is notoriously computationally intractable. To address this problem, this paper proposes a point-based online value iteration (PBOVI) algorithm that performs value backups at specific reachable belief points, rather than over the entire belief simplex, to speed up computation; uses branch-and-bound pruning to prune the AND/OR tree of belief states online; and reuses belief states that have already been searched to avoid repeated computation. The experiment and simulation results show that the proposed algorithm can simultaneously satisfy the requirements of low error and high timeliness in real-time systems.
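A value backup at a single belief point, the basic operation this abstract builds on, can be sketched as below. This is the generic point-based backup over alpha vectors, not the PBOVI algorithm itself: it omits the online tree search, branch-and-bound pruning, and belief reuse. The toy model and all names are hypothetical.

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def point_based_backup(belief, alpha_vectors, T, O, R, gamma):
    """One value backup at a single belief point.

    T[a][s][s2] = P(s2 | s, a), O[a][s2][o] = P(o | s2, a), R[s][a] = reward.
    Returns the new alpha vector that is best at `belief`.
    """
    n_states = len(belief)
    n_actions = len(T)
    n_observations = len(O[0][0])
    best_alpha, best_value = None, float("-inf")
    for a in range(n_actions):
        alpha_a = [R[s][a] for s in range(n_states)]
        for o in range(n_observations):
            # Project every alpha vector back through (a, o) ...
            projections = [
                [
                    sum(T[a][s][s2] * O[a][s2][o] * alpha[s2] for s2 in range(n_states))
                    for s in range(n_states)
                ]
                for alpha in alpha_vectors
            ]
            # ... and keep only the projection that is best at this belief.
            best_projection = max(projections, key=lambda g: dot(belief, g))
            alpha_a = [x + gamma * y for x, y in zip(alpha_a, best_projection)]
        value = dot(belief, alpha_a)
        if value > best_value:
            best_alpha, best_value = alpha_a, value
    return best_alpha


# Hypothetical model: 2 static states, 2 actions, 1 uninformative observation.
T = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
O = [[[1.0], [1.0]], [[1.0], [1.0]]]
R = [[1.0, 0.0], [0.0, 1.0]]

belief = [0.8, 0.2]
alphas = [[0.0, 0.0]]
first = point_based_backup(belief, alphas, T, O, R, gamma=0.9)
alphas.append(first)
second = point_based_backup(belief, alphas, T, O, R, gamma=0.9)
```

Because the maximization over alpha vectors happens only at the given belief, the cost of one backup grows with the number of stored vectors rather than with the size of the belief simplex.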

17.

Developing reinforcement learning for adaptive co-construction of continuous high-dimensional state and action spaces

Masato Nagayoshi Hajime Murao Hisashi Tamaki 《Artificial Life and Robotics》2012,17(2):204-210

Engineers and researchers are paying more attention to reinforcement learning (RL) as a key technique for realizing adaptive and autonomous decentralized systems. In general, however, it is not easy to put RL into practical use. Our approach mainly deals with the problem of designing state and action spaces. Previously, an adaptive state space construction method called a "state space filter" and an adaptive action space construction method called "switching RL" have been proposed, each after the other space has been fixed. We have since reconstituted these two construction methods as one combined method that mimics an infant's perceptual and motor development, and we have proposed a method based on introducing and referring to "entropy". In this paper, a computational experiment was conducted using a so-called "robot navigation problem" with a three-dimensional continuous state space and a two-dimensional continuous action space, which is more complicated than a so-called "path planning problem". As a result, the validity of the proposed method has been confirmed.

18.

Real user evaluation of a POMDP spoken dialogue system using automatic belief compression

《Computer Speech and Language》2014,28(4):873-887

This article describes an evaluation of a POMDP-based spoken dialogue system (SDS), using crowdsourced calls with real users. The evaluation compares a “Hidden Information State” POMDP system which uses a hand-crafted compression of the belief space, with the same system instead using an automatically computed belief space compression. Automatically computed compressions are a way of introducing automation into the design process of statistical SDSs and promise a principled way of reducing the size of the very large belief spaces which often make POMDP approaches intractable. This is the first empirical comparison of manual and automatic approaches on a problem of realistic scale (restaurant, pub and coffee shop domain) with real users. The evaluation took 2193 calls from 85 users. After filtering for minimal user participation the two systems were compared on more than 1000 calls.

19.

Image-based memory for robot navigation using properties of omnidirectional images Cited by: 4 (self-citations: 0, citations by others: 4)

Emanuele Menegatti Takeshi Maeda Hiroshi Ishiguro 《Robotics and Autonomous Systems》2004,47(4):423-267

This paper proposes a new technique for vision-based robot navigation. The basic framework is to localise the robot by comparing images taken at its current location with reference images stored in its memory. In this work, the only sensor mounted on the robot is an omnidirectional camera. The Fourier components of the omnidirectional image provide a signature for the views acquired by the robot and can be used to simplify the solution to the robot navigation problem. The proposed system can calculate the robot position with variable accuracy (‘hierarchical localisation’) saving computational time when the robot does not need a precise localisation (e.g. when it is travelling through a clear space). In addition, the system is able to self-organise its visual memory of the environment. The self-organisation of visual memory is essential to realise a fully autonomous robot that is able to navigate in an unexplored environment. Experimental evidence of the robustness of this system is given in unmodified office environments.

20.

A deep reinforcement learning navigation algorithm combining an advantage structure and a minimum target Q-value

朱威 (Zhu Wei), 洪力栋 (Hong Lidong), 施海东 (Shi Haidong), 何德峰 (He Defeng) 《控制理论与应用》 (Control Theory & Applications) 2024,41(4):716-728

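Existing policy-gradient-based deep reinforcement learning methods suffer from long training times and low learning efficiency when applied to robot navigation in complex indoor scenes such as offices and corridors. To address this, this paper proposes a deep reinforcement learning navigation algorithm that combines an advantage structure with a minimized target Q-value. The algorithm introduces the advantage structure into policy-gradient-based deep reinforcement learning to distinguish between actions under the same state value and improve learning efficiency; in multi-goal navigation scenarios it estimates the state value separately, using map information to provide more accurate value judgments. In addition, because methods that mitigate target Q-value overestimation in discrete control are difficult to make work in the Actor-Critic framework that dominates reinforcement learning, a minimum target Q-value method based on Gaussian smoothing is designed to reduce the effect of overestimation on training. Experimental results show that the proposed algorithm effectively speeds up learning: in both single-goal and multi-goal continuous navigation training it converges faster than soft actor-critic (SAC), twin delayed deep deterministic policy gradient (TD3), and deep deterministic policy gradient (DDPG), keeps the mobile robot effectively away from obstacles, and the trained navigation model generalizes well.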