Similar Literature
20 similar documents found.
1.
A Sampling-Based Approximation Algorithm for POMDPs
The partially observable Markov decision process (POMDP) is a model of a robot's action selection in dynamic, uncertain environments. For POMDP models with sparse transition matrices, this paper proposes a fast approximate solution algorithm. The algorithm first samples the belief space using the policy produced by the QMDP algorithm, then rapidly generates the POMDP value function with a point-based iteration algorithm, yielding an approximately optimal action-selection policy. On the same POMDP test models, the policy produced by this algorithm earns rewards comparable to those of policies produced by other approximate algorithms, but it computes faster and the vector set representing its policy is smaller than those produced by existing approximate algorithms. It is therefore better suited than those algorithms to solving large-scale POMDP models with sparse state-transition matrices.
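As a hedged illustration of the QMDP step described above (a minimal sketch, not the paper's implementation), the snippet below solves the underlying MDP by value iteration and then scores each action against a belief vector; the toy model and all array names are assumptions.

```python
import numpy as np

def qmdp_action(T, R, belief, gamma=0.95, iters=200):
    """Pick an action for a belief state with the QMDP heuristic.

    T: transitions, shape (A, S, S); R: rewards, shape (A, S);
    belief: probability vector over the S states.
    """
    A, S, _ = T.shape
    V = np.zeros(S)
    for _ in range(iters):              # value iteration on the underlying MDP
        Q = R + gamma * T @ V           # Q[a, s]
        V = Q.max(axis=0)
    Q = R + gamma * T @ V
    return int(np.argmax(Q @ belief))   # argmax_a sum_s b(s) * Q(s, a)

# toy 2-state, 2-action example (values assumed for illustration)
T = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.5, 0.5]]])
R = np.array([[1.0, 0.0], [0.2, 0.2]])
print(qmdp_action(T, R, np.array([0.6, 0.4])))
```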

2.
A (1+ε) Approximation Algorithm for k-means Clustering Based on Minimum Cluster Partition
The k-means algorithm is a common method for solving clustering problems. In recent years, many researchers have studied constant-factor and (1+ε) approximation algorithms for it. Using the random sampling technique of Kumar et al., this paper proposes a (1+ε) randomized approximation algorithm for k-means based on minimum cluster partition. The algorithm draws a random sample from the point set, combines the sampled points to identify a subset of each cluster, and uses the centroid of that subset as the centroid of the corresponding sub-cluster. Running the algorithm multiple times yields a (1+ε) approximation of the optimal k-means cost with high probability.
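The sketch below conveys only the random-sampling idea (sample small subsets, use their centroids as candidate centers, repeat, keep the best); it is an assumption-laden simplification, not Kumar et al.'s exact procedure, and the sample size and trial count are arbitrary.

```python
import numpy as np

def kmeans_cost(X, centers):
    # sum of squared distances from each point to its nearest center
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.min(axis=1).sum()

def sampled_kmeans(X, k, sample_size=10, trials=200, rng=None):
    """Repeatedly draw k small random samples; use each sample's centroid
    as a candidate center and keep the best set of centers found."""
    rng = rng or np.random.default_rng(0)
    best_cost, best = np.inf, None
    for _ in range(trials):
        centers = np.array([
            X[rng.choice(len(X), size=sample_size)].mean(axis=0)
            for _ in range(k)
        ])
        c = kmeans_cost(X, centers)
        if c < best_cost:
            best_cost, best = c, centers
    return best, best_cost
```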

3.
A Survey of Dialogue Management in Spoken Dialogue Systems
This paper surveys the role, core problems, and design methods of dialogue management in spoken dialogue systems. The dialogue manager is at the core of the whole system: it controls the flow of the dialogue, interprets user input, and decides, based on domain content, how the system responds to the user. Dialogue management is designed mainly in three ways: state-graph structures (finite-state machines), slot-filling structures, and task-based structures. The paper also proposes a structure based on logical expressions and designs a two-layer state-graph/logical-expression architecture.
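To make the slot-filling approach named above concrete, here is a minimal hedged sketch of a slot-filling dialogue loop; the flight-booking slots are invented for illustration and do not come from the survey.

```python
def slot_filling_manager(slots, user_input_fn):
    """Minimal slot-filling dialogue loop: ask for each unfilled slot
    until the frame is complete, then hand the frame back for action."""
    frame = {s: None for s in slots}
    while any(v is None for v in frame.values()):
        missing = next(s for s, v in frame.items() if v is None)
        frame[missing] = user_input_fn(f"Please provide your {missing}: ")
    return frame

if __name__ == "__main__":
    # hypothetical flight-booking frame
    print(slot_filling_manager(["origin", "destination", "date"], input))
```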

4.
Point-based value iteration algorithms are an effective class of POMDP solvers. PBVI is the classic point-set algorithm but is rather inefficient; FSVI reduces complexity by using the optimal policy of the underlying MDP but performs poorly on large problems. To address this, we propose a forward search value iteration algorithm based on optimizing the distribution of environment states (PBVI-OSD): it selects the best action with a weight-based QMDP, selects the most likely state from the belief state and the transition function, and, given the action and state, randomly selects an observation whose probability exceeds a threshold. This yields a set of successor belief points with higher exploration value and improves the quality of value-iteration convergence. Experiments on four benchmark problems show that, compared with FSVI and PBVI, PBVI-OSD preserves convergence efficiency and, especially on large problems, converges to better global optima.
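The belief-point expansion step described above rests on the standard POMDP belief update; the hedged sketch below shows that update plus a threshold-based observation pick mimicking the description. The tensor layout and threshold value are assumptions.

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """Standard POMDP belief update: b'(s') ∝ O[a, s', o] * sum_s T[a, s, s'] b(s)."""
    bp = O[a, :, o] * (b @ T[a])
    return bp / bp.sum()

def expand_belief(b, a, T, O, threshold=0.1, rng=None):
    """Pick the most likely successor state, then a random observation whose
    probability in that state exceeds a threshold (as PBVI-OSD describes)."""
    rng = rng or np.random.default_rng(0)
    s_next = int(np.argmax(b @ T[a]))        # most likely next state
    probs = O[a, s_next]                     # observation distribution there
    candidates = np.flatnonzero(probs > threshold)
    o = int(rng.choice(candidates))
    return belief_update(b, a, o, T, O)
```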

5.
To counter the many kinds of attacks that wireless sensor networks (WSNs) face, this paper proposes a WSN intrusion detection system (IDS) that fuses the Markov decision process (MDP) with game theory, called the Markov game IDS (MG-IDS). MG-IDS uses game-theoretic and MDP-based anomaly and misuse detection to determine the best defense strategy, and uses the MDP together with an attack-pattern mining algorithm to predict future attack patterns from attack records. Simulation experiments compare MG-IDS with game-theory-only and MDP-only schemes, evaluating defense performance against mixed multi-type attacks at different attack frequencies; the results show that MG-IDS achieves a higher defense success rate.
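Game-theoretic defense selection of the kind described typically reduces to solving a zero-sum matrix game; as a hedged stand-in (the payoff matrix and scenario are invented, not from the paper), the standard linear-programming solution with SciPy looks like this:

```python
import numpy as np
from scipy.optimize import linprog

def minimax_defense(payoff):
    """Optimal mixed defense strategy for a zero-sum game.

    payoff[i, j] = defender's payoff when defender plays i, attacker plays j.
    Maximize v subject to sum_i x_i * payoff[i, j] >= v for every attack j.
    """
    m, n = payoff.shape
    c = np.r_[np.zeros(m), -1.0]               # variables: x_1..x_m, v; minimize -v
    A_ub = np.c_[-payoff.T, np.ones(n)]        # v - sum_i x_i payoff[i, j] <= 0
    b_ub = np.zeros(n)
    A_eq = np.r_[np.ones(m), 0.0].reshape(1, -1)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, 1)] * m + [(None, None)])
    return res.x[:m], res.x[m]                 # strategy, game value

strategy, value = minimax_defense(np.array([[2.0, -1.0], [-1.0, 1.0]]))
print(strategy, value)
```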

6.
钱睿, 乐俊, 刘丹. 《计算机系统应用》, 2017, 26(11): 187-192
To address lag and jumping in the dynamic flight visualization of civil aircraft, this paper studies and proposes a prediction-based real-time trajectory visualization algorithm. The algorithm has three parts: trajectory point prediction, target point advancement, and error correction. Trajectory point prediction improves on the GM(1,1) model by dynamically adjusting the development coefficient at each iteration according to all prior points and the search direction. Target point advancement uses the predicted coordinates in a proposed TSUS (Time Slice Uniform Speed) algorithm, which guarantees that the target actually reaches its destination within the allotted time and can adjust the trajectory's curvature according to the initial velocity direction. Error correction adopts a segmented strategy to balance accuracy and practicality across different situations. Experiments show that the proposed model is usable for rendering visual trajectories of civil aircraft, achieves good results, and improves system usability and user experience.
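For reference, a minimal sketch of the baseline GM(1,1) grey predictor (without the paper's dynamic adjustment of the development coefficient); the sample series is made up.

```python
import numpy as np

def gm11_predict(x0, steps=1):
    """Plain GM(1,1) grey prediction. The paper adapts the development
    coefficient 'a' at each iteration; this is only the textbook baseline.
    x0: 1-D positive series; returns the next `steps` predicted values."""
    x0 = np.asarray(x0, dtype=float)
    x1 = np.cumsum(x0)                                   # accumulated series
    z1 = 0.5 * (x1[1:] + x1[:-1])                        # mean sequence
    B = np.c_[-z1, np.ones(len(z1))]
    a, b = np.linalg.lstsq(B, x0[1:], rcond=None)[0]     # development/grey coeffs
    n = len(x0)
    f = lambda k: (x0[0] - b / a) * np.exp(-a * k) + b / a   # fitted x1
    return [f(k) - f(k - 1) for k in range(n, n + steps)]    # back to x0 scale

print(gm11_predict([2.87, 3.28, 3.34, 3.52, 3.68], steps=2))
```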

7.
周水银, 傅青. 《控制工程》, 2007, 14(2): 212-214
Considering the processing flow and complexity of flow-shop production, this paper builds a 0-1 mixed integer programming model for sequencing jobs in a two-machine flow shop, with the objective of maximizing the weighted number of complete order sets. Since NP-hard problems of this kind are mainly solved with heuristic approximation algorithms, an approximation algorithm combining composite dispatching rules with local search is proposed to solve the mixed integer programming model. A worked example shows that the heuristic yields a satisfactory job sequence for maximizing the number of complete order sets in the two-machine flow shop, demonstrating the algorithm's effectiveness.
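A hedged sketch of the local-search half: two-machine flow-shop completion times plus adjacent-swap improvement. The objective here (count of jobs finished by their due dates) is a simplified stand-in for the paper's weighted complete-order-set objective, and the data are invented.

```python
def completion_times(seq, p1, p2):
    """Completion time of each job on machine 2 in a two-machine flow shop."""
    t1 = t2 = 0
    done = {}
    for j in seq:
        t1 += p1[j]                 # machine 1 finishes job j
        t2 = max(t1, t2) + p2[j]    # machine 2 starts once both are ready
        done[j] = t2
    return done

def swap_local_search(seq, p1, p2, due):
    """Improve the sequence by adjacent swaps while the number of
    on-time jobs (stand-in for 'complete order sets') increases."""
    score = lambda s: sum(1 for j, c in completion_times(s, p1, p2).items()
                          if c <= due[j])
    improved = True
    while improved:
        improved = False
        for i in range(len(seq) - 1):
            cand = seq[:i] + [seq[i + 1], seq[i]] + seq[i + 2:]
            if score(cand) > score(seq):
                seq, improved = cand, True
    return seq

p1, p2 = {0: 3, 1: 2, 2: 4}, {0: 2, 1: 5, 2: 1}
print(swap_local_search([0, 1, 2], p1, p2, due={0: 6, 1: 9, 2: 11}))
```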

8.
The constrained minimum vertex cover problem on bipartite graphs is NP-complete, so it cannot be solved optimally in polynomial time unless P=NP. This paper therefore proposes an approximation algorithm for it based on the chain-implication technique: whenever an instance admits a minimum vertex cover (ku, kl) satisfying the constraints, then for any given approximation ratio δ = 1+ε > 1 the algorithm finds a constrained approximate vertex cover (ku*, kl*) with ratio max{ku*/ku, kl*/kl} ≤ 1+ε, and the whole algorithm runs in time O(2^(2/ε)). It is thus a polynomial time approximation scheme (PTAS) for the constrained minimum vertex cover problem on bipartite graphs.
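To make the problem statement concrete, here is a naive exact check for tiny instances. This is emphatically not the paper's PTAS (the chain-implication construction is far more involved); it only enumerates covers with exactly ku upper-side and kl lower-side vertices.

```python
from itertools import combinations

def constrained_cover_exists(edges, upper, lower, ku, kl):
    """Naive exponential check, small instances only: is there a vertex
    cover using exactly ku upper-side and kl lower-side vertices?"""
    for U in combinations(upper, ku):
        for L in combinations(lower, kl):
            chosen = set(U) | set(L)
            if all(u in chosen or v in chosen for u, v in edges):
                return True
    return False

edges = [("u1", "v1"), ("u1", "v2"), ("u2", "v2")]
print(constrained_cover_exists(edges, ["u1", "u2"], ["v1", "v2"], 1, 1))
```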

9.
Group key management is a core problem in group and multicast security. This paper develops a model of the key cover problem (KCP) and, for the first time, gives mutual reductions between KCP and the vertex cover problem (VCP). Based on the reduction from VCP to KCP, it proves that KCP is NP-complete; based on the reduction from KCP to VCP, it designs a class of approximation algorithms for KCP from VCP algorithms and reports simulation experiments. These results provide a more solid algorithmic foundation for research on group and multicast security.
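The abstract builds KCP approximations on top of VCP algorithms; the textbook maximal-matching 2-approximation for vertex cover below is the kind of VCP building block such a construction could rest on (a generic sketch, not the paper's specific algorithm).

```python
def vertex_cover_2approx(edges):
    """Classic 2-approximation for minimum vertex cover: greedily pick a
    maximal matching and take both endpoints of every matched edge."""
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            cover.update((u, v))    # edge not yet covered: take both ends
    return cover

print(vertex_cover_2approx([(1, 2), (2, 3), (3, 4), (4, 1)]))
```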

10.
Applying Tabu Search to Offline MPLS Planning and Design
The main advantage of offline MPLS planning is that it enables globally optimized design; its chief objectives are to minimize the required hop count while meeting traffic QoS requirements and to allocate traffic sensibly to balance system load. The minimum path set and flow allocation problem (MPSFAP) is NP-complete. This paper proposes a tabu-search-based heuristic for MPSFAP; simulation results show that this approximation algorithm closely approaches the exact solution.
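A hedged generic tabu search skeleton follows. The MPSFAP encoding (candidate paths, flows, QoS constraints) is problem-specific and omitted here; the bit-vector neighborhood and the `sum` objective are placeholders for illustration only.

```python
def tabu_search(init, objective, neighbors, tenure=7, iters=200):
    """Generic tabu search: move to the best non-tabu neighbor each step,
    remember recent solutions in a tabu list, keep the best solution seen."""
    current = best = init
    tabu = []
    for _ in range(iters):
        cands = [n for n in neighbors(current) if n not in tabu]
        if not cands:
            break
        current = min(cands, key=objective)   # best admissible neighbor
        tabu.append(current)
        if len(tabu) > tenure:
            tabu.pop(0)                       # expire the oldest entry
        if objective(current) < objective(best):
            best = current
    return best

# toy stand-in: minimize total "hops" of a 0/1 path-selection vector
flip = lambda s: [tuple(b ^ (i == j) for i, b in enumerate(s))
                  for j in range(len(s))]
print(tabu_search((1, 1, 1, 0), objective=sum, neighbors=flip))
```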

11.
This paper describes a statistically motivated framework for performing real-time dialogue state updates and policy learning in a spoken dialogue system. The framework is based on the partially observable Markov decision process (POMDP), which provides a well-founded, statistical model of spoken dialogue management. However, exact belief state updates in a POMDP model are computationally intractable so approximate methods must be used. This paper presents a tractable method based on the loopy belief propagation algorithm. Various simplifications are made, which improve the efficiency significantly compared to the original algorithm as well as compared to other POMDP-based dialogue state updating approaches. A second contribution of this paper is a method for learning in spoken dialogue systems which uses a component-based policy with the episodic Natural Actor Critic algorithm. The framework proposed in this paper was tested both in simulations and in a user trial. Both indicated that using Bayesian updates of the dialogue state significantly outperforms traditional definitions of the dialogue state. Policy learning worked effectively and the learned policy outperformed all others on simulations. In user trials the learned policy was also competitive, although its optimality was less conclusive. Overall, the Bayesian update of dialogue state framework was shown to be a feasible and effective approach to building real-world POMDP-based dialogue systems.

12.
Building on a performance-sensitivity analysis of the Markov decision process (MDP), this paper studies performance optimization of the partially observable Markov decision process (POMDP). A performance-sensitivity formula for POMDPs is derived, and on this basis two observation-based POMDP optimization algorithms are proposed: a policy-gradient algorithm and a policy-iteration algorithm. An admission-control problem is used as a simulation example to verify the effectiveness of both algorithms.
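The paper's gradients come from its sensitivity formulas; as a generic hedged stand-in, here is a REINFORCE-style estimate of the gradient for a softmax policy conditioned on observations. The episode data and dimensions are invented for illustration.

```python
import numpy as np

def reinforce_gradient(theta, episodes):
    """REINFORCE policy-gradient estimate for a softmax policy over
    observations: grad J ≈ mean over episodes of R * sum_t grad log pi(a_t|o_t).
    theta: array (n_obs, n_actions); episodes: list of (trajectory, return),
    where trajectory is a list of (obs, action) pairs."""
    grad = np.zeros_like(theta)
    for traj, R in episodes:
        for o, a in traj:
            pi = np.exp(theta[o] - theta[o].max())
            pi /= pi.sum()
            g = -pi
            g[a] += 1.0              # gradient of log softmax w.r.t. theta[o]
            grad[o] += R * g
    return grad / len(episodes)

# one ascent step on two fake episodes (observations/actions/returns assumed)
theta = np.zeros((2, 2))
episodes = [([(0, 1), (1, 0)], 3.0), ([(0, 0), (1, 1)], 1.0)]
theta += 0.1 * reinforce_gradient(theta, episodes)
print(theta)
```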

13.
Partially observable Markov decision processes (POMDP) provide a mathematical framework for agent planning under stochastic and partially observable environments. The classic Bayesian optimal solution can be obtained by transforming the problem into a Markov decision process (MDP) using belief states. However, because the belief state space is continuous and multi-dimensional, the problem is highly intractable. Many practical heuristic-based methods have been proposed, but most of them require a complete POMDP model of the environment, which is not always practical. This article introduces a modified memory-based reinforcement learning algorithm called modified U-Tree that is capable of learning from raw sensor experiences with minimal prior knowledge. This article describes an enhancement of the original U-Tree’s state generation process to make the generated model more compact, and also proposes a modification of the statistical test for reward estimation, which allows the algorithm to be benchmarked against some traditional model-based algorithms with a set of well known POMDP problems.
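U-Tree's split decision hinges on a statistical test over reward samples (the original used a Kolmogorov-Smirnov test; this paper modifies the test). A minimal hedged sketch of a KS-style split decision, with invented sample data:

```python
from scipy.stats import ks_2samp

def should_split(rewards_left, rewards_right, alpha=0.05):
    """U-Tree-style split test: split a leaf on a history feature if the two
    candidate children's reward samples look drawn from different
    distributions (two-sample KS test). This follows the original U-Tree;
    the paper replaces/modifies this statistical test."""
    stat, p = ks_2samp(rewards_left, rewards_right)
    return p < alpha

print(should_split([0, 0, 1, 0, 0, 1], [5, 4, 6, 5, 5, 6]))
```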

14.
Trial-based value iteration is an effective class of algorithms for solving partially observable Markov decision process (POMDP) models, and FSVI is currently among the fastest of them. For larger POMDP problems, however, the time FSVI spends computing the MDP value function is not negligible. This paper proposes a shortest-Hamiltonian-path-based value iteration algorithm (SHP-VI): an ant colony algorithm for the shortest Hamiltonian path problem computes an optimal belief-state trajectory, and the value function is then updated backward over these belief states. Experimental comparison with FSVI shows that SHP-VI greatly improves the efficiency with which trial-based algorithms compute belief-state trajectories.
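A compact hedged sketch of ant colony optimisation for the shortest Hamiltonian path, the subroutine SHP-VI relies on; all parameters are arbitrary defaults, `dist` is assumed to be a symmetric NumPy distance matrix with a zero diagonal, and the coupling to backward value updates is omitted.

```python
import numpy as np

def aco_shortest_hamiltonian_path(dist, n_ants=20, iters=100,
                                  alpha=1.0, beta=2.0, rho=0.5, seed=0):
    """Minimal ACO for the shortest Hamiltonian path: ants build vertex
    orderings biased by pheromone and inverse distance; pheromone
    evaporates and is deposited in proportion to path quality."""
    rng = np.random.default_rng(seed)
    n = len(dist)
    tau = np.ones((n, n))                         # pheromone
    eta = 1.0 / (dist + np.eye(n))                # heuristic visibility
    best_len, best = np.inf, None
    for _ in range(iters):
        paths = []
        for _ in range(n_ants):
            path = [rng.integers(n)]
            while len(path) < n:
                i = path[-1]
                unvisited = [j for j in range(n) if j not in path]
                w = np.array([tau[i, j]**alpha * eta[i, j]**beta
                              for j in unvisited])
                path.append(unvisited[rng.choice(len(unvisited), p=w / w.sum())])
            length = sum(dist[path[k], path[k + 1]] for k in range(n - 1))
            paths.append((length, path))
            if length < best_len:
                best_len, best = length, path
        tau *= (1 - rho)                          # evaporation
        for length, path in paths:                # deposit
            for k in range(n - 1):
                tau[path[k], path[k + 1]] += 1.0 / length
    return best, best_len

D = np.array([[0, 2, 9], [2, 0, 4], [9, 4, 0]], dtype=float)
print(aco_shortest_hamiltonian_path(D))
```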

15.
This paper explains how Partially Observable Markov Decision Processes (POMDPs) can provide a principled mathematical framework for modelling the inherent uncertainty in spoken dialogue systems. It briefly summarises the basic mathematics and explains why exact optimisation is intractable. It then describes in some detail a form of approximation called the Hidden Information State model which does scale and which can be used to build practical systems. A prototype HIS system for the tourist information domain is evaluated and compared with a baseline MDP system using both user simulations and a live user trial. The results give strong support to the central contention that the POMDP-based framework is both a tractable and powerful approach to building more robust spoken dialogue systems.

16.
We address the problem of controlling a mobile robot to explore a partially known environment. The robot’s objective is the maximization of the amount of information collected about the environment. We formulate the problem as a partially observable Markov decision process (POMDP) with an information-theoretic objective function, and solve it by applying forward simulation algorithms with an open-loop approximation. We present a new sample-based approximation for mutual information useful in mobile robotics. The approximation can be seamlessly integrated with forward simulation planning algorithms. We investigate the usefulness of POMDP based planning for exploration, and to alleviate some of its weaknesses propose a combination with frontier based exploration. Experimental results in simulated and real environments show that, depending on the environment, applying POMDP based planning for exploration can improve performance over frontier exploration.
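For intuition about sample-based mutual information, here is a generic hedged Monte Carlo plug-in estimator for discrete variables from paired samples; it is not the paper's specific approximation, and the sample data are invented.

```python
from collections import Counter
from math import log

def mutual_information_from_samples(pairs):
    """Plug-in estimate of I(X; Z) from sampled (x, z) pairs:
    I ≈ sum p(x,z) * log[ p(x,z) / (p(x) p(z)) ] with empirical frequencies."""
    n = len(pairs)
    pxz = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    pz = Counter(z for _, z in pairs)
    return sum((c / n) * log((c / n) / ((px[x] / n) * (pz[z] / n)))
               for (x, z), c in pxz.items())

# samples where z mostly copies x -> high mutual information
print(mutual_information_from_samples(
    [(0, 0), (0, 0), (1, 1), (1, 1), (0, 0), (1, 0)]))
```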

17.
Recent scaling up of partially observable Markov decision process (POMDP) solvers toward realistic applications is largely due to point-based methods that quickly converge to an approximate solution for medium-sized domains. These algorithms compute a value function for a finite reachable set of belief points, using backup operations. Point-based algorithms differ on the selection of the set of belief points and on the order by which backup operations are executed on the selected belief points. We first show how current algorithms execute a large number of backups that can be removed without reducing the quality of the value function. We demonstrate that the ordering of backup operations on a predefined set of belief points is important. In the simpler domain of MDP solvers, prioritizing the order of equivalent backup operations on states is known to speed up convergence. We generalize the notion of prioritized backups to the POMDP framework, showing how existing algorithms can be improved by prioritizing backups. We also present a new algorithm, prioritized value iteration, and show empirically that it outperforms current point-based algorithms. Finally, a new empirical evaluation measure (in addition to the standard runtime comparison), which is based on the number of atomic operations and the number of belief points, is proposed in order to provide more accurate benchmark comparisons.
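The MDP-side idea being generalized here is prioritized sweeping: back up the state with the largest Bellman error first. A hedged sketch (lazy heap entries, toy model; not the paper's POMDP algorithm):

```python
import heapq
import numpy as np

def prioritized_value_iteration(T, R, gamma=0.95, eps=1e-6):
    """Prioritized backups for an MDP: repeatedly back up the state with the
    largest Bellman error instead of sweeping states in a fixed order.
    T: (A, S, S) transitions, R: (A, S) rewards."""
    A, S, _ = T.shape
    V = np.zeros(S)
    bellman = lambda s: (R[:, s] + gamma * T[:, s, :] @ V).max()
    heap = [(-abs(bellman(s) - V[s]), s) for s in range(S)]
    heapq.heapify(heap)
    while heap:
        neg_err, s = heapq.heappop(heap)
        if -neg_err < eps:
            break                              # largest remaining error is tiny
        V[s] = bellman(s)
        for s2 in range(S):                    # re-prioritize affected states
            if s2 != s and T[:, s2, s].any():
                err = abs(bellman(s2) - V[s2])
                if err >= eps:
                    heapq.heappush(heap, (-err, s2))
    return V

T = np.array([[[0.8, 0.2], [0.1, 0.9]]])      # one action, two states (toy)
R = np.array([[0.0, 1.0]])
print(prioritized_value_iteration(T, R))
```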

18.
In a spoken dialog system, determining which action a machine should take in a given situation is a difficult problem because automatic speech recognition is unreliable and hence the state of the conversation can never be known with certainty. Much of the research in spoken dialog systems centres on mitigating this uncertainty and recent work has focussed on three largely disparate techniques: parallel dialog state hypotheses, local use of confidence scores, and automated planning. While in isolation each of these approaches can improve action selection, taken together they currently lack a unified statistical framework that admits global optimization. In this paper we cast a spoken dialog system as a partially observable Markov decision process (POMDP). We show how this formulation unifies and extends existing techniques to form a single principled framework. A number of illustrations are used to show qualitatively the potential benefits of POMDPs compared to existing techniques, and empirical results from dialog simulations are presented which demonstrate significant quantitative gains. Finally, some of the key challenges to advancing this method – in particular scalability – are briefly outlined.

19.
The partially observable Markov decision process (POMDP) extends the Markov decision process (MDP); POMDPs are commonly used to model agents deciding in partially observable stochastic environments. Because solution methods for complete POMDPs scale poorly, this paper proposes decomposing a multivariate POMDP into a set of restricted POMDPs, solving each such model independently to obtain a value function, and combining the value functions of the restricted POMDPs to obtain a policy for the complete POMDP. The paper describes the process of identifying the state variables relevant to an independent task and how to construct a model restricted to a single task. Applying the method to two rock-sampling problems of different sizes, the experimental results show that it obtains good policies.
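A hedged sketch of the combination step: each restricted POMDP contributes a value table over its own belief, and the global action greedily maximizes the sum. The additive combination and all numbers below are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def combined_action(sub_beliefs, sub_Q):
    """Combine independently solved restricted POMDPs: score each action by
    summing the sub-models' expected values and act greedily on the sum.
    sub_beliefs[i]: belief over sub-model i's states (length S_i);
    sub_Q[i]: array (S_i, A) of that sub-model's Q-values."""
    scores = sum(b @ Q for b, Q in zip(sub_beliefs, sub_Q))  # length-A vector
    return int(np.argmax(scores))

# two toy sub-models over 2 states, 3 shared actions (values assumed)
b1, b2 = np.array([0.7, 0.3]), np.array([0.4, 0.6])
Q1 = np.array([[1.0, 0.2, 0.0], [0.1, 0.9, 0.0]])
Q2 = np.array([[0.0, 0.5, 0.8], [0.3, 0.1, 0.7]])
print(combined_action([b1, b2], [Q1, Q2]))
```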

20.
Under a model-free partially observable Markov decision process (POMDP), the agent cannot directly access the true state of the environment, and perceptual uncertainty makes learning an optimal policy challenging. This paper proposes a deep double Q-network reinforcement learning algorithm that incorporates contrastive predictive coding representations, explicitly modeling the belief state to obtain a compact, efficient history encoding for policy optimization. To improve data efficiency, a belief replay buffer is introduced that stores belief transitions directly rather than observation-action sequences, reducing memory usage. In addition, a staged training strategy decouples representation learning from policy learning to improve training stability. On POMDP navigation tasks built on the Gym-MiniGrid environment, experimental results show that the proposed algorithm captures state-relevant semantic information and achieves stable, efficient policy learning under partial observability.
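At the heart of contrastive predictive coding is the InfoNCE loss; below is a minimal hedged NumPy sketch of it (the encoder, batch contents, and temperature are assumptions, and the paper's full pipeline around it is omitted).

```python
import numpy as np

def info_nce_loss(z, z_pos, temperature=0.1):
    """InfoNCE loss used by contrastive predictive coding: each encoded
    history z[i] should score its true future encoding z_pos[i] above the
    other futures in the batch. Shapes: (batch, dim)."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    z_pos = z_pos / np.linalg.norm(z_pos, axis=1, keepdims=True)
    logits = z @ z_pos.T / temperature           # logits[i, j]: i matched to j
    logits -= logits.max(axis=1, keepdims=True)  # stabilised log-softmax
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))          # true pairs on the diagonal

rng = np.random.default_rng(0)
h = rng.normal(size=(8, 16))                     # fake history encodings
print(info_nce_loss(h, h + 0.01 * rng.normal(size=h.shape)))  # near-aligned
```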
