Similar Documents
20 similar documents retrieved.
1.
In active perception tasks, an agent aims to select sensory actions that reduce its uncertainty about one or more hidden variables. For example, a mobile robot takes sensory actions to efficiently navigate in a new environment. While partially observable Markov decision processes (POMDPs) provide a natural model for such problems, reward functions that directly penalize uncertainty in the agent’s belief can remove the piecewise-linear and convex (PWLC) property of the value function required by most POMDP planners. Furthermore, as the number of sensors available to the agent grows, the computational cost of POMDP planning grows exponentially with it, making POMDP planning infeasible with traditional methods. In this article, we address the twofold challenge of modeling and planning for active perception tasks. We analyze \(\rho \)POMDP and POMDP-IR, two frameworks for modeling active perception tasks that restore the PWLC property of the value function. We establish the mathematical equivalence of these two frameworks by showing that a given \(\rho \)POMDP and policy can be reduced to a POMDP-IR and an equivalent policy (and vice versa). We prove that the value function for the given \(\rho \)POMDP (and the given policy) and the reduced POMDP-IR (and the reduced policy) is the same. To efficiently plan for active perception tasks, we identify and exploit the independence properties of POMDP-IR to reduce the computational cost of solving POMDP-IR (and \(\rho \)POMDP). We propose greedy point-based value iteration (PBVI), a new POMDP planning method that uses greedy maximization to greatly improve scalability in the action space of an active perception POMDP. Furthermore, we show that, under certain conditions, including submodularity, the value function computed using greedy PBVI is guaranteed to have bounded error with respect to the optimal value function. We establish the conditions under which the value function of an active perception POMDP is guaranteed to be submodular. Finally, we present a detailed empirical analysis on a dataset collected from a multi-camera tracking system employed in a shopping mall. Our method achieves performance similar to existing methods but at a fraction of the computational cost, leading to better scalability for solving active perception tasks.
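The greedy-maximization step that greedy PBVI substitutes for exact maximization over sensor subsets can be illustrated with a short Python sketch: sensors are added one at a time by largest marginal gain under a value estimate assumed to be submodular. The function names and the `value_estimate` callable are hypothetical illustrations, not the paper's API.

```python
def greedy_sensor_selection(candidate_sensors, value_estimate, budget):
    """Greedily build a set of sensors to activate.

    A minimal sketch of greedy maximization used in place of exact
    maximization over all sensor subsets: at each step, add the sensor with
    the largest marginal gain in the (assumed submodular) value estimate.
    `value_estimate(subset)` is a hypothetical callable scoring a subset,
    e.g. the expected reduction in belief uncertainty.
    """
    selected = set()
    for _ in range(budget):
        remaining = [s for s in candidate_sensors if s not in selected]
        if not remaining:
            break
        # Marginal gain of adding each remaining sensor to the current set.
        gains = {s: value_estimate(selected | {s}) - value_estimate(selected)
                 for s in remaining}
        best = max(gains, key=gains.get)
        if gains[best] <= 0:
            break
        selected.add(best)
    return selected
```

For monotone submodular value estimates, this greedy rule carries the classic (1 − 1/e) approximation guarantee, which is the kind of bounded-error result the abstract refers to.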

2.
Uncertainty in motion planning is often caused by three main sources: motion error, sensing error, and an imperfect environment map. Despite the significant effect of all three sources of uncertainty on motion planning problems, most planners take into account only one or at most two of them. We propose a new motion planner, called Guided Cluster Sampling (GCS), that takes into account all three sources of uncertainty for robots with active sensing capabilities. GCS uses the Partially Observable Markov Decision Process (POMDP) framework and the point-based POMDP approach. Although point-based POMDP solvers have shown impressive progress over the past few years, they perform poorly when the environment map is imperfect. This poor performance is due to the extremely high-dimensional state space, which translates to an extremely large belief space B. We alleviate this problem by constructing a more suitable sampling distribution based on the observations that, when the robot has active sensing capability, B can be partitioned into a collection of much smaller sub-spaces, and an optimal policy can often be generated by sufficient sampling of a small subset of the collection. Utilizing these observations, GCS samples B in two stages: a subspace is sampled from the collection, and then a belief is sampled from the subspace. It uses information from the set of sampled sub-spaces and sampled beliefs to guide subsequent sampling. Simulation results on marine robotics scenarios suggest that GCS can generate reasonable policies for motion planning problems with uncertain motion, sensing, and environment map that are unsolvable by the best point-based POMDP solvers today. Furthermore, GCS handles POMDPs with continuous state, action, and observation spaces. We show that for a class of POMDPs that often occur in robot motion planning, given enough time, GCS converges to the optimal policy. To the best of our knowledge, this is the first convergence result for point-based POMDPs with continuous action space.
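The two-stage sampling idea can be sketched in a few lines of Python: pick a belief sub-space from the collection according to a guidance weight, then draw a belief from within it. The `subspace_weight` and `belief_sampler` callables below are hypothetical stand-ins rather than GCS's actual components.

```python
import random

def two_stage_belief_sample(subspaces, subspace_weight, belief_sampler):
    """A minimal sketch of two-stage sampling over a partitioned belief space.

    `subspaces` is a list of belief sub-spaces, `subspace_weight(sub)` is a
    hypothetical scoring function used to guide which sub-space to explore
    (e.g. derived from previously sampled beliefs), and `belief_sampler(sub)`
    draws a belief from within the chosen sub-space.
    """
    weights = [subspace_weight(sub) for sub in subspaces]
    # Stage 1: pick a sub-space in proportion to its guidance weight,
    # falling back to a uniform choice if all weights are zero.
    if sum(weights) > 0:
        chosen = random.choices(subspaces, weights=weights, k=1)[0]
    else:
        chosen = random.choice(subspaces)
    # Stage 2: sample a belief from inside the chosen sub-space.
    return chosen, belief_sampler(chosen)
```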

3.
We consider partially observable Markov decision processes (POMDPs) with ω-regular conditions specified as parity objectives. The class of ω-regular languages provides a robust specification language to express properties in verification, and parity objectives are canonical forms to express them. The qualitative analysis problem, given a POMDP and a parity objective, asks whether there is a strategy to ensure that the objective is satisfied with probability 1 (resp. positive probability). While the qualitative analysis problems are undecidable even for special cases of parity objectives, we establish decidability (with optimal complexity) for POMDPs with all parity objectives under finite-memory strategies. We establish optimal (exponential) memory bounds and EXPTIME-completeness of the qualitative analysis problems under finite-memory strategies for POMDPs with parity objectives. We also present a practical approach, in which we design heuristics to deal with the exponential complexity, and we have applied our implementation to a number of POMDP examples.

4.
Solving partially observable Markov decision processes (POMDPs) is a complex task that is often intractable. This paper examines the problem of finding an optimal policy for POMDPs. While a lot of effort has been made to develop algorithms to solve POMDPs, the question of automatically finding good low-dimensional spaces in multi-agent co-operative learning domains has not been explored thoroughly. To address this question, an online algorithm, CMEAS, is presented to improve the POMDP model. This algorithm is based on a look-ahead search to find the best action to execute at each cycle, so the overwhelming complexity of computing a policy for every possible situation is avoided. A series of simulations demonstrates the good strategy and performance of the proposed algorithm when multiple agents co-operate to find an optimal policy for POMDPs.

5.
Continuous-state partially observable Markov decision processes (POMDPs) are an intuitive choice of representation for many stochastic planning problems with a hidden state. We consider continuous-state POMDPs with finite action and observation spaces, where the POMDP is parametrised by weighted sums of Gaussians, or Gaussian mixture models (GMMs). In particular, we study the problem of optimising the selection of the measurement channel in such a framework. A new error bound for a point-based value iteration algorithm is derived, and a method for constructing a subset of belief states that attempts to reduce the error bound is implemented. In the experiments, applying continuous-state POMDPs to optimal selection of the measurement channel is demonstrated, and the performance of three GMM simplification methods is compared. Convergence of a point-based value iteration algorithm is investigated by considering various metrics for the obtained control policies.

6.
Control in spoken dialog systems is challenging largely because automatic speech recognition is unreliable, and hence the state of the conversation can never be known with certainty. Partially observable Markov decision processes (POMDPs) provide a principled mathematical framework for planning and control in this context; however, POMDPs face severe scalability challenges, and past work has been limited to trivially small dialog tasks. This paper presents a novel POMDP optimization technique, composite summary point-based value iteration (CSPBVI), which enables optimization to be performed on slot-filling POMDP-based dialog managers of a realistic size. Using dialog models trained on data from a tourist information domain, simulation results show that CSPBVI scales effectively, outperforms non-POMDP baselines, and is robust to estimation errors.

7.
Monte-Carlo tree search for Bayesian reinforcement learning
Bayesian model-based reinforcement learning can be formulated as a partially observable Markov decision process (POMDP) to provide a principled framework for optimally balancing exploitation and exploration. A POMDP solver can then be used to solve the problem. If the prior distribution over the environment’s dynamics is a product of Dirichlet distributions, the POMDP’s optimal value function can be represented using a set of multivariate polynomials. Unfortunately, the size of the polynomials grows exponentially with the problem horizon. In this paper, we examine the use of an online Monte-Carlo tree search (MCTS) algorithm for large POMDPs to solve the Bayesian reinforcement learning problem online. We show that such an algorithm successfully searches for a near-optimal policy. In addition, we examine the use of a parameter tying method to keep the model search space small, and propose the use of a nested mixture of tied models to increase robustness of the method when our prior information does not allow us to specify the structure of tied models exactly. Experiments show that the proposed methods substantially improve the scalability of current Bayesian reinforcement learning methods.
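The Bayes-adaptive setting the abstract describes is commonly realized with Dirichlet pseudo-counts over transition dynamics, a conjugate update from observed transitions, and root sampling of a concrete model for each MCTS simulation. The Python sketch below illustrates that pattern under those assumptions; the class and method names are hypothetical and it is not the paper's implementation.

```python
import random
from collections import defaultdict

class DirichletBelief:
    """A minimal sketch of a Dirichlet belief over transition dynamics,
    of the kind that arises when Bayesian RL is cast as a POMDP.
    The pseudo-counts are hypothetical sufficient statistics, not the
    paper's exact data structures.
    """
    def __init__(self, n_states, n_actions, prior=1.0):
        self.n_states = n_states
        # alpha[(s, a)][s'] holds the Dirichlet pseudo-count for s' given (s, a).
        self.alpha = defaultdict(lambda: [prior] * n_states)
        for s in range(n_states):
            for a in range(n_actions):
                _ = self.alpha[(s, a)]  # materialize the prior counts

    def update(self, s, a, s_next):
        """Conjugate Bayesian update: one observed transition adds one count."""
        self.alpha[(s, a)][s_next] += 1.0

    def sample_model(self):
        """Root sampling for MCTS: draw one concrete transition model from
        the posterior and plan in it for a single simulation."""
        model = {}
        for (s, a), counts in self.alpha.items():
            gammas = [random.gammavariate(c, 1.0) for c in counts]
            z = sum(gammas)
            model[(s, a)] = [g / z for g in gammas]
        return model
```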

8.
Partially observable Markov decision processes (POMDPs) extend Markov decision processes (MDPs) and are commonly used to model agents that make decisions in partially observable stochastic environments. To address the poor scalability of methods that solve the complete POMDP, this paper proposes decomposing a multivariate POMDP into a set of restricted POMDPs, solving each of these models independently to obtain a value function, and then combining the value functions of the restricted POMDPs to obtain a policy for the complete POMDP. The method centers on identifying the state variables relevant to each independent task and on constructing a model restricted to a single task. The approach is applied to two rock-sampling problems of different sizes, and the experimental results show that it obtains good policies.

9.
Reinforcement learning (RL) is an area of machine learning concerned with how an agent learns to make decisions sequentially in order to optimize a particular performance measure. To achieve such a goal, the agent has to choose between 1) exploiting previously acquired knowledge, which may end up at a local optimum, and 2) exploring to gather new knowledge that is expected to improve the current performance. Among other RL algorithms, Bayesian model-based RL (BRL) is well known for being able to trade off exploitation and exploration optimally via belief planning, i.e. as a partially observable Markov decision process (POMDP). However, solving that POMDP often suffers from the curse of dimensionality and the curse of history. In this paper, we make two major contributions: 1) a framework that integrates temporal abstraction into BRL, resulting in a hierarchical POMDP formulation that can be solved online using a hierarchical sample-based planning solver; and 2) a subgoal discovery method for hierarchical BRL that automatically discovers useful macro actions to accelerate learning. In the experiments, we demonstrate that the proposed approach can scale up to much larger problems, and that the agent is able to discover useful subgoals for speeding up Bayesian reinforcement learning.

10.
In this paper, we address the problem of suboptimal behavior during online partially observable Markov decision process (POMDP) planning caused by time constraints on planning. Taking inspiration from the related field of reinforcement learning (RL), our solution is to shape the agent’s reward function in order to lead the agent to large future rewards without having to spend as much time explicitly estimating cumulative future rewards, enabling the agent to save time to improve the breadth of planning and build higher-quality plans. Specifically, we extend potential-based reward shaping (PBRS) from RL to online POMDP planning. In our extension, information about belief states is added to the function optimized by the agent during planning. This information provides hints of where the agent might find high future rewards beyond its planning horizon, and thus achieve greater cumulative rewards. We develop novel potential functions measuring information useful to agent metareasoning in POMDPs (reflecting on agent knowledge and/or histories of experience with the environment), theoretically prove several important properties and benefits of using PBRS for online POMDP planning, and empirically demonstrate these results in a range of classic benchmark POMDP planning problems.
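Potential-based shaping over beliefs can be written compactly: the bonus takes the standard form F(b, a, b') = γΦ(b') − Φ(b). In the Python sketch below, Φ is chosen, purely as an illustrative assumption, to be negative belief entropy so that more certain beliefs score higher; it is not claimed to be one of the paper's potential functions.

```python
import math

def belief_entropy(belief):
    """Shannon entropy of a discrete belief state (dict: state -> probability)."""
    return -sum(p * math.log(p) for p in belief.values() if p > 0.0)

def shaping_bonus(belief, next_belief, gamma=0.95):
    """Potential-based shaping bonus F(b, a, b') = gamma * Phi(b') - Phi(b).

    Phi is negative belief entropy here (an illustrative assumption):
    transitions that make the belief more certain receive a positive bonus.
    """
    phi = lambda b: -belief_entropy(b)
    return gamma * phi(next_belief) - phi(belief)

def shaped_reward(reward, belief, next_belief, gamma=0.95):
    """Reward actually optimized during online planning: R plus the bonus."""
    return reward + shaping_bonus(belief, next_belief, gamma)
```

The usual PBRS argument is that this bonus telescopes along any trajectory of the belief-state MDP, so it steers the search toward informative beliefs without changing which policies are optimal in that underlying model.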

11.
Continuous-state POMDPs provide a natural representation for a variety of tasks, including many in robotics. However, most existing parametric continuous-state POMDP approaches are limited by their reliance on a single linear model to represent the world dynamics. We introduce a new switching-state dynamics model that can represent multi-modal state-dependent dynamics. We present the Switching Mode POMDP (SM-POMDP) planning algorithm for solving continuous-state POMDPs using this dynamics model. We also consider several procedures to approximate the value function as a mixture of a bounded number of Gaussians. Unlike the majority of prior work on approximate continuous-state POMDP planners, we provide a formal analysis of our SM-POMDP algorithm, providing bounds, where possible, on the quality of the resulting solution. We also analyze the computational complexity of SM-POMDP. Empirical results on an unmanned aerial vehicle collision avoidance simulation, and a robot navigation simulation where the robot has faulty actuators, demonstrate the benefit of SM-POMDP over a prior parametric approach.

12.
Quantum-mechanical motion of a spin-half particle is examined in the axially symmetric fields of static naked singularities formed by a mass distribution with a quadrupole moment (q-metric). The analysis is performed by means of the method of effective potentials of the Dirac equation, generalized to the case where the radial and angular variables are not separated. If −1 < q < q_lim, |q_lim| ≪ 1, where q is the quadrupole moment in proper units, the naked singularities do not exclude the existence of stationary bound states of Dirac particles for a prolate mass distribution in the q-metric along the axial axis. For an oblate mass distribution, the naked singularities of the q-metric are separated from a Dirac particle by infinitely large repulsive barriers followed by a potential well which deepens while moving away from the equator (from θ = θ_min or θ = π − θ_min) toward the poles. The poles are an exception, and at 0 < q < q*, there are some points θ_i for particle states with j ≥ 3/2.

13.
洪晔  边信黔 《计算机仿真》2007,24(6):146-149
When an autonomous underwater vehicle (AUV) navigates in a complex ocean environment, it must find a relatively good path from a given start point to a goal point that safely avoids all obstacles without collision. This paper proposes a new global path planning method based on partially observable Markov decision processes combined with prediction of obstacle motion. The mathematical model of the partially observable Markov decision process is given; a tree-structured hierarchical partially observable Markov decision model is built and applied to path planning; and two methods, short-term prediction and long-term prediction, are proposed for predicting the trajectories of underwater obstacles. Finally, the global path planning capability of the AUV is verified through simulation experiments, laying a solid foundation for future trials on a real vehicle.
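The abstract distinguishes short-term from long-term prediction of obstacle trajectories. The Python sketch below illustrates only the short-term case under a constant-velocity assumption, which is a hypothetical modeling choice rather than the paper's specific predictor.

```python
import numpy as np

def predict_obstacle_positions(track, horizon, dt=1.0):
    """Short-term obstacle trajectory prediction, a minimal sketch.

    track:   array of past 2-D positions, shape (k, 2), oldest to newest (k >= 2).
    horizon: number of future steps to predict.
    Returns predicted positions for the next `horizon` steps, shape (horizon, 2).
    """
    track = np.asarray(track, dtype=float)
    # Average velocity over the observed track (constant-velocity assumption).
    velocity = (track[-1] - track[0]) / (dt * (len(track) - 1))
    steps = np.arange(1, horizon + 1).reshape(-1, 1)
    return track[-1] + steps * dt * velocity
```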

14.
Partially observable Markov decision processes are solved by introducing a belief state space that converts a non-Markovian problem into a Markovian one; their ability to describe the real world makes them an important branch of research on stochastic decision processes. This paper introduces the basic principles and decision process of partially observable Markov decision processes and proposes a POMDP algorithm based on policy iteration and value iteration. Using ideas from linear programming and dynamic programming, the algorithm addresses the "curse of dimensionality" that arises when the belief state space is large and obtains an approximately optimal solution to the Markov decision problem. Experimental data show that the algorithm is feasible and effective.
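The belief-state construction mentioned above, which converts a POMDP into a Markov decision problem over beliefs, rests on the exact Bayes update of the belief after an action and observation. A minimal tabular Python sketch, with hypothetical dictionary-based models, is shown below.

```python
def belief_update(belief, action, observation, T, O, states):
    """Exact belief update that turns a POMDP into a belief-state MDP.

    A minimal sketch with hypothetical tabular models:
      belief:            dict state -> probability
      T[(s, a)]:         dict s' -> P(s' | s, a)
      O[(a, s_next)]:    dict o  -> P(o  | a, s')
      states:            iterable of all states
    """
    new_belief = {}
    for s_next in states:
        # Predicted probability of s' under the action, before observing.
        prior = sum(T[(s, action)].get(s_next, 0.0) * p
                    for s, p in belief.items())
        # Weight by the likelihood of the received observation.
        new_belief[s_next] = O[(action, s_next)].get(observation, 0.0) * prior
    norm = sum(new_belief.values())
    if norm == 0.0:
        raise ValueError("Observation has zero probability under this belief.")
    return {s: p / norm for s, p in new_belief.items()}
```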

15.
This paper proposes a new hierarchical formulation of POMDPs for autonomous robot navigation that can be solved in real time and is memory-efficient. It is referred to in this paper as the Robot Navigation–Hierarchical POMDP (RN-HPOMDP). The RN-HPOMDP is utilized as a unified framework for autonomous robot navigation in dynamic environments. As such, it is used for localization, planning, and local obstacle avoidance. Hence, the RN-HPOMDP decides at each time step the actions the robot should execute, without the intervention of any other external module for obstacle avoidance or localization. Our approach employs state space and action space hierarchies, and can effectively model large environments at a fine resolution. Finally, the notion of the reference POMDP is introduced. The latter holds all the information regarding motion and sensor uncertainty, which makes the proposed hierarchical structure memory-efficient and enables fast learning. The RN-HPOMDP has been experimentally validated in real dynamic environments.

16.
Online trip planning is a popular service that has greatly benefited many people. However, little attention has been paid to personalized trip planning, which is even more useful. In this paper, we define a highly expressive personalized route planning query, the Personalized and Sequenced Route (PSR) Query, which considers both personalization and a sequenced constraint, and propose a novel framework to deal with the query. The framework consists of three phases: guessing, crossover, and refinement. The guessing phase strives to obtain one high-quality route as the baseline, bounding the search space to a circular region. The crossover phase heuristically improves the quality of multiple guessed routes via a modified genetic algorithm, which further narrows the radius of the search space. The refinement phase examines each candidate point and partial route backwards to rule out impossible ones. The combination of these phases efficiently and effectively narrows the search space within a few iterations. In the experiments, we first evaluate each phase separately, demonstrating its effectiveness, and then evaluate their combination, which offers insight into the merits of the proposed framework.

17.
In 2013, Gaspar-Cunha et al. proposed a set of novel robust multi-objective benchmark functions to increase the difficulty of the current test problems and effectively mimic the characteristics of real search spaces. Despite the merits of the proposed benchmark problems, the robust Pareto optimal fronts are located on the boundaries of the search space, which may make the obtained solutions infeasible under perturbations along the negative side of the second parameter. This paper modifies the test functions proposed by Gaspar-Cunha et al. to mimic real problems better and to allow the parameters to be perturbed by any degree. In fact, the robust fronts are shifted to the centre of the search space, so that any degree of uncertainty can be considered. The paper also provides theoretical and experimental analyses of both sets of test functions.

18.
Partially observable Markov decision processes (POMDPs) provide a mathematical framework for agent planning in stochastic and partially observable environments. The classic Bayesian optimal solution can be obtained by transforming the problem into a Markov decision process (MDP) using belief states. However, because the belief state space is continuous and multi-dimensional, the problem is highly intractable. Many practical heuristic-based methods have been proposed, but most of them require a complete POMDP model of the environment, which is not always practical. This article introduces a modified memory-based reinforcement learning algorithm called modified U-Tree that is capable of learning from raw sensor experiences with minimum prior knowledge. The article describes an enhancement of the original U-Tree’s state generation process to make the generated model more compact, and also proposes a modification of the statistical test for reward estimation, which allows the algorithm to be benchmarked against some traditional model-based algorithms on a set of well-known POMDP problems.

19.
Recently, sparse subspace clustering, as a subspace learning technique, has been successfully applied to several computer vision applications, e.g. face clustering and motion segmentation. The main idea of sparse subspace clustering is to learn an effective sparse representation that is used to construct an affinity matrix for spectral clustering. While most existing sparse subspace clustering algorithms and their extensions rely on convex relaxations, the non-convex and non-smooth l_q (0 < q < 1) norm has demonstrated better recovery performance. In this paper we propose an l_q norm based Sparse Subspace Clustering method (lqSSC), motivated by recent work showing that the l_q norm can enhance sparsity and approximate l_0 better than l_1. However, optimizing the l_q norm under multiple constraints is difficult. To solve this non-convex problem, we make use of the Alternating Direction Method of Multipliers (ADMM) for the l_q norm optimization, updating the variables in an alternating minimization fashion. ADMM splits the unconstrained optimization into multiple terms, such that the l_q norm term can be solved via Smooth Iterative Reweighted Least Squares (SIRLS), which converges with a guarantee. Different from traditional IRLS algorithms, the proposed algorithm is based on gradient descent with adaptive weights, making it well suited to the general sparse subspace clustering problem. Experiments on computer vision tasks (synthetic data, face clustering and motion segmentation) demonstrate that the proposed approach achieves considerably better clustering accuracy than convex subspace clustering methods.
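The reweighting idea behind SIRLS can be illustrated on a generic l_q-regularized least-squares problem: each iteration fixes weights from the previous iterate and solves a weighted ridge system. The Python sketch below uses an assumed objective and hypothetical parameter choices; it is a generic IRLS sketch, not the paper's SIRLS or its ADMM splitting.

```python
import numpy as np

def irls_lq(A, b, q=0.5, lam=0.1, eps=1e-6, iters=50):
    """Iteratively Reweighted Least Squares for an l_q-regularized problem.

    Assumed objective (not the paper's exact formulation):
        min_x  0.5 * ||A x - b||_2^2 + lam * sum_i (x_i^2 + eps)^(q/2)
    Each iteration freezes the weights at the current iterate and solves
    the resulting weighted ridge regression in closed form.
    """
    m, n = A.shape
    x = np.zeros(n)
    AtA, Atb = A.T @ A, A.T @ b
    for _ in range(iters):
        # Weights grow as coordinates shrink, pushing small entries toward zero.
        w = (x ** 2 + eps) ** (q / 2.0 - 1.0)
        x_new = np.linalg.solve(AtA + lam * q * np.diag(w), Atb)
        if np.linalg.norm(x_new - x) < 1e-8:
            x = x_new
            break
        x = x_new
    return x
```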

20.
仵博  吴敏  佘锦华 《软件学报》2013,24(1):25-36
Partially observable Markov decision processes (POMDPs) are an ideal model for sequential decision making in dynamic, uncertain environments, but existing offline algorithms suffer from the "curse of dimensionality" and the "curse of history" in the belief state space, while existing online algorithms cannot simultaneously satisfy the requirements of low error and high real-time performance, which prevents the otherwise ideal POMDP model from being applied in practical engineering. To address this, a point-based online value iteration algorithm for POMDPs (PBOVI) is proposed. The algorithm performs backup operations only at given reachable belief points, avoiding solving over the entire belief simplex and thus accelerating the solution; it prunes the belief-state AND/OR tree online with a branch-and-bound method; and it introduces the idea of reusing belief nodes, reusing belief points already solved at the previous time step to avoid repeated computation. Experimental results show that the algorithm has a low error rate and fast convergence, and satisfies the real-time requirements of the system.
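A point-based backup, the operation PBOVI performs at each sampled reachable belief point, can be sketched in Python with tabular models as follows. The data layout and names are hypothetical, and the sketch omits the paper's branch-and-bound pruning and belief-node reuse.

```python
import numpy as np

def point_based_backup(b, alphas, T, O, R, gamma=0.95):
    """One point-based value-iteration backup at a single belief point.

    A minimal sketch with tabular models (not the paper's PBOVI code):
      b:      belief, shape (S,)
      alphas: list of alpha vectors, each shape (S,)
      T[a]:   transition matrix, shape (S, S), T[a][s, s'] = P(s' | s, a)
      O[a]:   observation matrix, shape (S, Z), O[a][s', o] = P(o | s', a)
      R[a]:   immediate reward vector, shape (S,)
    Returns the backed-up alpha vector maximizing value at b, and its action.
    """
    n_actions = len(T)
    n_obs = O[0].shape[1]
    best_vec, best_val, best_act = None, -np.inf, None
    for a in range(n_actions):
        g_a = R[a].astype(float)
        for o in range(n_obs):
            # Back-project every alpha vector through the (action, observation) pair ...
            candidates = [T[a] @ (O[a][:, o] * alpha) for alpha in alphas]
            # ... and keep the one that is best at this particular belief point.
            best = max(candidates, key=lambda g: float(g @ b))
            g_a = g_a + gamma * best
        val = float(g_a @ b)
        if val > best_val:
            best_vec, best_val, best_act = g_a, val, a
    return best_vec, best_act
```

Restricting this backup to a set of sampled reachable beliefs, rather than the whole belief simplex, is what keeps point-based methods tractable, and is the starting point for the pruning and reuse ideas described in the abstract.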
