期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Point-based online value iteration algorithm in large POMDP

Bo Wu Hong-Yan Zheng Yan-Peng Feng 《Applied Intelligence》2014,40(3):546-555

Partially observable Markov decision process (POMDP) is an ideal framework for sequential decision-making under uncertainty in stochastic domains. However, it is notoriously computationally intractable to solving POMDP in real-time system. In order to address this problem, this paper proposes a point-based online value iteration (PBOVI) algorithm which involves performing value backup at specific reachable belief points, rather than over the entire belief simplex, to speed up computation processes, exploits branch-and-bound pruning approach to prune the AND/OR tree of belief states online, and proposes a novel idea to reuse the belief states that have been searched to avoid repeated computation. The experiment and simulation results show that the proposed algorithm can simultaneously satisfy the requirement of low errors and high timeliness in real-time system. 相似文献

2.

基于点的POMDP算法的预处理方法

卞爱华王崇骏陈世福《软件学报》2008,19(6):1309-1316

基于点的算法是部分可观察马尔可夫决策过程(partially observable Markov decision processes,简称POMDP)的一类近似算法.它们只在一个信念点集上进行Backup操作,避免了线性规划并使用了更少的中间变量,从而将计算瓶颈由选择向量转向了生成向量.但这类算法在生成向量时含有大量重复和无意义计算,针对于此,提出了基于点的POMDP算法的预处理方法(preprocessing method for point-based algorithms,简称PPBA).该方法对每个样本信念点作预处理,并且在生成α-向量之前首先计算出该选取哪个动作和哪些α-向量,从而消除了重复计算.PPBA还提出了基向量的概念,利用问题的稀疏性避免了无意义计算.通过在Perseus上的实验,表明PPBA很大地提高了算法的执行速度. 相似文献

3.

部分可观察马尔可夫决策过程研究进展

仵博吴敏《计算机工程与设计》2007,28(9):2116-2119,2126

部分可观察马尔可夫决策过程是通过引入信念状态空间将非马尔可夫链问题转化为马尔可夫链问题来求解,其描述真实世界的特性使它成为研究随机决策过程的重要分支.介绍了部分可观察马尔可夫决策过程的基本原理和决策过程,然后介绍了3种典型的算法,它们分别是Littman等人的Witness算法、hcremental Pruning算法和Pineau等人的基于点的值迭代算法,对这3种算法进行了分析比较.讲述部分可观察马尔可夫决策过程的应用. 相似文献

4.

基于杂合标准的POMDP值迭代求解算法^*

刘峰《模式识别与人工智能》2016,29(11):961-968

基于点的值迭代方法是求解部分可观测马尔科夫决策过程(POMDP)问题的一类有效算法.目前基于点的值迭代算法大都基于单一启发式标准探索信念点集,从而限制算法效果.基于此种情况,文中提出基于杂合标准探索信念点集的值迭代算法(HHVI),可以同时维持值函数的上界和下界.在扩展探索点集时,选取值函数上下界差值大于阈值的信念点进行扩展,并且在值函数上下界差值大于阈值的后继信念点中选择与已探索点集距离最远的信念点进行探索,保证探索点集尽量有效分布于可达信念空间内.在4个基准问题上的实验表明,HHVI能保证收敛效率,并能收敛到更好的全局最优解. 相似文献

5.

基于点的POMDPs在线值迭代算法

仵博吴敏佘锦华《软件学报》2013,24(1):25-36

部分可观察马尔可夫决策过程(partially observable Markov decision processes,简称POMDPs)是动态不确定环境下序贯决策的理想模型,但是现有离线算法陷入信念状态“维数灾”和“历史灾”问题,而现有在线算法无法同时满足低误差与高实时性的要求,造成理想的POMDPs模型无法在实际工程中得到应用.对此,提出一种基于点的POMDPs在线值迭代算法(point-based online value iteration,简称PBOVI).该算法在给定的可达信念状态点上进行更新操作,避免对整个信念状态空间单纯体进行求解,加速问题求解;采用分支界限裁剪方法对信念状态与或树进行在线裁剪;提出信念状态结点重用思想,重用上一时刻已求解出的信念状态点,避免重复计算.实验结果表明,该算法具有较低误差率、较快收敛性,满足系统实时性的要求. 相似文献

6.

一种基于独立任务的POMDP问题的解决方法_*

房俊恒朱斐刘全伏玉琛凌兴宏《计算机应用研究》2016,33(1)

部分可观测马尔可夫决策过程(POMDP)是马尔可夫决策过程(MDP)的扩展。通常利用POMDPs来模拟在部分可观测的随机环境中决策的Agents。针对完整POMDP的求解方法扩展能力弱的问题,提出把一个多元的POMDP分解成一组受限制的POMDPs,然后分别独立地求解每个这样的模型,获得一个值函数并将这些受限制的POMDPs的值函数结合起来以便获得一个完整POMDP的策略。该方法主要阐述了识别与独立任务相关的状态变量的过程,以及如何构造一个被限制在一个单独任务上的模型。将该方法应用到两个不同规模的岩石采样问题中,实验结果表明,该方法能够获得很好的策略。相似文献

7.

A survey of point-based POMDP solvers

Guy Shani Joelle Pineau Robert Kaplow 《Autonomous Agents and Multi-Agent Systems》2013,27(1):1-51

The past decade has seen a significant breakthrough in research on solving partially observable Markov decision processes (POMDPs). Where past solvers could not scale beyond perhaps a dozen states, modern solvers can handle complex domains with many thousands of states. This breakthrough was mainly due to the idea of restricting value function computations to a finite subset of the belief space, permitting only local value updates for this subset. This approach, known as point-based value iteration, avoids the exponential growth of the value function, and is thus applicable for domains with longer horizons, even with relatively large state spaces. Many extensions were suggested to this basic idea, focusing on various aspects of the algorithm—mainly the selection of the belief space subset, and the order of value function updates. In this survey, we walk the reader through the fundamentals of point-based value iteration, explaining the main concepts and ideas. Then, we survey the major extensions to the basic algorithm, discussing their merits. Finally, we include an extensive empirical analysis using well known benchmarks, in order to shed light on the strengths and limitations of the various approaches. 相似文献

8.

SHP-VI:一种基于最短哈密顿通路的POMDP值迭代算法

冯奇周雪忠黄厚宽张小平《计算机研究与发展》2011,(12)

基于试探(trial-based)的值迭代算法是求解部分可观察Markov决策过程(partially observable Markov decision process,POMDP)模型的一类有效算法,其中FSVI算法是目前最快的算法之一.然而对于较大规模的POMDP问题,FSVI计算MDP值函数的时间是不容忽视的.提出一种基于最短哈密顿通路(shortest Hamiltonian path)的值迭代算法(shortest Hamiltonian path-based value iteration,SHP-VI).该方法用求解最短哈密顿通路问题的蚁群算法计算一条最优信念状态轨迹,然后在这些信念状态上反向更新值函数.通过与FSVI算法的实验比较,结果表明SHP-VI算法很大程度地提高了基于试探的算法计算信念状态轨迹的效率. 相似文献

9.

On the design of communication-aware fault-tolerant scheduling algorithms for precedence constrained tasks in grid computing systems with dedicated communication devices

Qin Zheng Bharadwaj Veeravalli 《Journal of Parallel and Distributed Computing》2009

Fault-tolerant scheduling is an imperative step for large-scale computational Grid systems, as often geographically distributed nodes co-operate to execute a task. By and large, primary-backup approach is a common methodology used for fault tolerance wherein each task has a primary and a backup on two different processors. In this paper, we address the problem of how to schedule DAGs in Grids with communication delays so that service failures can be avoided in the presence of processors faults. The challenge is, that as tasks in a DAG have dependence on each other, a task must be scheduled to make sure that it will succeed when any of its predecessors fails due to a processor failure. We first propose a communication model and determine when communications between a backup and backups of its successors are necessary. Then we determine when a backup can start and its eligible processors so as to guarantee that every DAG can complete upon any processor failure. We develop two algorithms to schedule backups, which minimize response time and replication cost, respectively. We also develop a suboptimal algorithm which targets minimizing replication cost while not affecting response time. We conduct extensive simulation experiments to quantify the performance of the proposed algorithms. 相似文献

10.

基于环境状态分布优化的POMDP值迭代求解算法

朱荣鑫《计算机应用研究》2022,39(2):374-378

基于点的值迭代算法是一类解决POMDP问题的有效算法,PBVI是基于点集的经典算法,但是其算法效率较为低下。FSVI使用内在的MDP最优策略来降低算法复杂度,但求解大规模问题的效果较差。为解决上述问题,提出了基于环境状态分布优化的前向搜索值迭代算法（PBVI-OSD）,通过基于权重值的Q^MDP选出最佳的动作,基于信念状态和转换函数选取最大可能的状态,基于动作和状态从观察中随机选取一个观察概率大于阈值的观察,由此获得更具探索价值的后继信念点集,提升值迭代收敛的质量。在四个基准问题上的实验表明,相比于FSVI和PBVI,PBVI-OSD能保证收敛效率,特别是在大规模问题上能收敛到更好的全局最优解。相似文献

11.

Planning for multiple measurement channels in a continuous-state POMDP

Mikko Lauri Risto Ritala 《Annals of Mathematics and Artificial Intelligence》2013,67(3-4):283-317

Continuous-state partially observable Markov decision processes (POMDPs) are an intuitive choice of representation for many stochastic planning problems with a hidden state. We consider a continuous-state POMDPs with finite action and observation spaces, where the POMDP is parametrised by weighted sums of Gaussians, or Gaussian mixture models (GMMs). In particular, we study the problem of optimising the selection of measurement channel in such a framework. A new error bound for a point-based value iteration algorithm is derived, and a method for constructing a subset of belief states that attempts to reduce the error bound is implemented. In the experiments, applying continuous-state POMDPs for optimal selection of the measurement channel is demonstrated, and the performance of three GMM simplification methods is compared. Convergence of a point-based value iteration algorithm is investigated by considering various metrics for the obtained control policies. 相似文献

12.

Global motion planning under uncertain motion, sensing, and environment map

Hanna Kurniawati Tirthankar Bandyopadhyay Nicholas M. Patrikalakis 《Autonomous Robots》2012,33(3):255-272

Uncertainty in motion planning is often caused by three main sources: motion error, sensing error, and imperfect environment map. Despite the significant effect of all three sources of uncertainty to motion planning problems, most planners take into account only one or at most two of them. We propose a new motion planner, called Guided Cluster Sampling (GCS), that takes into account all three sources of uncertainty for robots with active sensing capabilities. GCS uses the Partially Observable Markov Decision Process (POMDP) framework and the point-based POMDP approach. Although point-based POMDPs have shown impressive progress over the past few years, it performs poorly when the environment map is imperfect. This poor performance is due to the extremely high dimensional state space, which translates to the extremely large belief space?B. We alleviate this problem by constructing a more suitable sampling distribution based on the observations that when the robot has active sensing capability, B can be partitioned into a collection of much smaller sub-spaces, and an optimal policy can often be generated by sufficient sampling of a small subset of the collection. Utilizing these observations, GCS samples B in two-stages, a subspace is sampled from the collection and then a belief is sampled from the subspace. It uses information from the set of sampled sub-spaces and sampled beliefs to guide subsequent sampling. Simulation results on marine robotics scenarios suggest that GCS can generate reasonable policies for motion planning problems with uncertain motion, sensing, and environment map, that are unsolvable by the best point-based POMDPs today. Furthermore, GCS handles POMDPs with continuous state, action, and observation spaces. We show that for a class of POMDPs that often occur in robot motion planning, given enough time, GCS converges to the optimal policy. To the best of our knowledge, this is the first convergence result for point-based POMDPs with continuous action space. 相似文献

13.

Parametric POMDPs for planning in continuous state spaces

《Robotics and Autonomous Systems》2006,54(11):887-897

This work addresses the problem of decision-making under uncertainty for robot navigation. Since robot navigation is most naturally represented in a continuous domain, the problem is cast as a continuous-state POMDP. Probability distributions over state space, or beliefs, are represented in parametric form using low-dimensional vectors of sufficient statistics. The belief space, over which the value function must be estimated, has dimensionality equal to the number of sufficient statistics. Compared to methods based on discretising the state space, this work trades the loss of the belief space’s convexity for a reduction in its dimensionality and an efficient closed-form solution for belief updates. Fitted value iteration is used to solve the POMDP. The approach is empirically compared to a discrete POMDP solution method on a simulated continuous navigation problem. We show that, for a suitable environment and parametric form, the proposed method is capable of scaling to large state-spaces. 相似文献

14.

副版本不可抢占的全局容错调度算法

彭浩陆阳孙峰韩江洪《软件学报》2016,27(12):3158-3171

容错是硬实时系统的关键能力,容错调度算法可以在有错误发生的情况下满足任务的实时性需求.在主副版本机制的容错调度算法中,主版本出错后留给副版本运行的时间窗口小,副版本容易错失截止期.针对副版本需要快速响应的问题,提出副版本不可抢占的全局容错调度算法FTGS-NPB（fault-tolerant global scheduling with non-preemptive backups）,赋予副版本全局最高优先级,使副版本在主版本出错后可以立刻获得处理器资源,并且在运行过程中不会被其他任务抢占.这样,副版本可以在最短时间内响应.分别基于截止期分析和响应时间分析建立了FTGS-NPB的可调度性测试,并分析了两种可调度性测试分别适用于不同的优先级分配算法.仿真实验结果表明,FTGS-NPB可以有效地减少实现容错的代价. 相似文献

15.

Scalable skyline computation using a balanced pivot selection technique

《Information Systems》2014

Skyline queries have recently received considerable attention as an alternative decision-making operator in the database community. The conventional skyline algorithms have primarily focused on optimizing the dominance of points in order to remove non-skyline points as efficiently as possible, but have neglected to take into account the incomparability of points in order to bypass unnecessary comparisons. To design a scalable skyline algorithm, we first analyze a cost model that copes with both dominance and incomparability, and develop a novel technique to select a cost-optimal point, called a pivot point, that minimizes the number of comparisons in point-based space partitioning. We then implement the proposed pivot point selection technique in the existing sorting- and partitioning-based algorithms. For point insertions/deletions, we also discuss how to maintain the current skyline using a skytree, derived from recursive point-based space partitioning. Furthermore, we design an efficient greedy algorithm for the k representative skyline using the skytree. Experimental results demonstrate that the proposed algorithms are significantly faster than the state-of-the-art algorithms. 相似文献

16.

Exploiting submodular value functions for scaling up active perception

Yash Satsangi Shimon Whiteson Frans A. Oliehoek Matthijs T. J. Spaan 《Autonomous Robots》2018,42(2):209-233

In active perception tasks, an agent aims to select sensory actions that reduce its uncertainty about one or more hidden variables. For example, a mobile robot takes sensory actions to efficiently navigate in a new environment. While partially observable Markov decision processes (POMDPs) provide a natural model for such problems, reward functions that directly penalize uncertainty in the agent’s belief can remove the piecewise-linear and convex (PWLC) property of the value function required by most POMDP planners. Furthermore, as the number of sensors available to the agent grows, the computational cost of POMDP planning grows exponentially with it, making POMDP planning infeasible with traditional methods. In this article, we address a twofold challenge of modeling and planning for active perception tasks. We analyze $\rho $POMDP and POMDP-IR, two frameworks for modeling active perception tasks, that restore the PWLC property of the value function. We show the mathematical equivalence of these two frameworks by showing that given a $\rho $POMDP along with a policy, they can be reduced to a POMDP-IR and an equivalent policy (and vice-versa). We prove that the value function for the given $\rho $POMDP (and the given policy) and the reduced POMDP-IR (and the reduced policy) is the same. To efficiently plan for active perception tasks, we identify and exploit the independence properties of POMDP-IR to reduce the computational cost of solving POMDP-IR (and $\rho $POMDP). We propose greedy point-based value iteration (PBVI), a new POMDP planning method that uses greedy maximization to greatly improve scalability in the action space of an active perception POMDP. Furthermore, we show that, under certain conditions, including submodularity, the value function computed using greedy PBVI is guaranteed to have bounded error with respect to the optimal value function. We establish the conditions under which the value function of an active perception POMDP is guaranteed to be submodular. Finally, we present a detailed empirical analysis on a dataset collected from a multi-camera tracking system employed in a shopping mall. Our method achieves similar performance to existing methods but at a fraction of the computational cost leading to better scalability for solving active perception tasks. 相似文献

17.

一种用户行为分析的动态备份机制

李金苗袁烁锋杨冬菊刘倩《电脑与微电子技术》2012,(5):18-22,35

目前大多数分布式存储系统采用静态备份机制保证系统的高可靠性,灵活性差,不适合系统规模及用户访问量不断变化的存储系统．因此提出一种基于用户行为分析的动态备份机制,通过分析用户行为获得相关信息来动态改变备份数量及确定备份位置,具有较强的灵活性,适合大规模的分布式存储系统;实现动态备份系统将其应用于已建立的科技信息数据中心．通过实验分析表明,该动态备份机制能提高系统的可靠性。相似文献

18.

A Modified Memory-Based Reinforcement Learning Method for Solving POMDP Problems

Lei Zheng Siu-Yeung Cho 《Neural Processing Letters》2011,33(2):187-200

Partially observable Markov decision processes (POMDP) provide a mathematical framework for agent planning under stochastic and partially observable environments. The classic Bayesian optimal solution can be obtained by transforming the problem into Markov decision process (MDP) using belief states. However, because the belief state space is continuous and multi-dimensional, the problem is highly intractable. Many practical heuristic based methods are proposed, but most of them require a complete POMDP model of the environment, which is not always practical. This article introduces a modified memory-based reinforcement learning algorithm called modified U-Tree that is capable of learning from raw sensor experiences with minimum prior knowledge. This article describes an enhancement of the original U-Tree’s state generation process to make the generated model more compact, and also proposes a modification of the statistical test for reward estimation, which allows the algorithm to be benchmarked against some traditional model-based algorithms with a set of well known POMDP problems. 相似文献

19.

Random 3-SAT: The Plot Thickens

Cristian Coarfa Demetrios D. Demopoulos Alfonso San Miguel Aguirre Devika Subramanian Moshe Y. Vardi 《Constraints》2003,8(3):243-261

This paper presents an experimental investigation of the following questions: how does the average-case complexity of random 3-SAT, understood as a function of the order (number of variables) for fixed density (ratio of number of clauses to order) instances, depend on the density? Is there a phase transition in which the complexity shifts from polynomial to exponential in the order? Is the transition dependent or independent of the solver? Our experiment design uses three complete SAT solvers embodying different algorithms: GRASP, CPLEX, and CUDD. We observe new phase transitions for all three solvers, where the median running time shifts from polynomial in the order to exponential. The location of the phase transition appears to be solver-dependent. GRASP shifts from polynomial to exponential complexity near the density of 3.8, CPLEX shifts near density 3, while CUDD exhibits this transition between densities of 0.1 and 0.5. This experimental result underscores the dependence between the solver and the complexity phase transition, and challenges the widely held belief that random 3-SAT exhibits a phase transition in computational complexity very close to the crossover point. 相似文献

20.

Improving POMDP Tractability via Belief Compression and Clustering

《IEEE transactions on systems, man, and cybernetics. Part B, Cybernetics》2010,40(1):125-136

Partially observable Markov decision process (POMDP) is a commonly adopted mathematical framework for solving planning problems in stochastic environments. However, computing the optimal policy of POMDP for large-scale problems is known to be intractable, where the high dimensionality of the underlying belief space is one of the major causes. In this paper, we propose a hybrid approach that integrates two different approaches for reducing the dimensionality of the belief space: 1) belief compression and 2) value-directed compression. In particular, a novel orthogonal nonnegative matrix factorization is derived for the belief compression, which is then integrated in a value-directed framework for computing the policy. In addition, with the conjecture that a properly partitioned belief space can have its per-cluster intrinsic dimension further reduced, we propose to apply a $k$-means-like clustering technique to partition the belief space to form a set of sub-POMDPs before applying the dimension reduction techniques to each of them. We have evaluated the proposed belief compression and clustering approaches based on a set of benchmark problems and demonstrated their effectiveness in reducing the cost for computing policies, with the quality of the policies being retained. 相似文献