20 similar documents found; search took 62 ms
1.
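A partially observable Markov decision process (POMDP) is solved by introducing a belief state space, which converts a non-Markovian problem into a Markovian one. Its ability to describe the real world makes it an important branch of research on stochastic decision processes. This paper introduces the basic principles and decision procedure of POMDPs, then presents three typical algorithms: the Witness algorithm of Littman et al., the Incremental Pruning algorithm, and the point-based value iteration algorithm of Pineau et al. The three algorithms are analyzed and compared, and applications of POMDPs are described.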
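The belief-state idea in the abstract above can be made concrete with the standard Bayes-filter belief update for a discrete POMDP. The following is a minimal sketch, not code from the paper: the function name `update_belief`, the table layouts `T[a][s][s2]` and `O[a][s2][o]`, and the two-state example values are all assumptions chosen for illustration.

```python
def update_belief(belief, action, observation, T, O):
    """Bayes-filter belief update for a discrete POMDP (illustrative sketch).

    belief: list of probabilities over states
    T[a][s][s2]: probability of moving from s to s2 under action a (assumed layout)
    O[a][s2][o]: probability of observing o after reaching s2 under action a (assumed layout)
    """
    n = len(belief)
    unnormalized = []
    for s2 in range(n):
        # Prediction step: probability of landing in s2 before seeing the observation
        predicted = sum(belief[s] * T[action][s][s2] for s in range(n))
        # Correction step: weight by the likelihood of the observation in s2
        unnormalized.append(O[action][s2][observation] * predicted)
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [p / total for p in unnormalized]


# Hypothetical two-state example: the single action leaves the state unchanged
# and yields an observation that matches the true state with probability 0.85.
T = [[[1.0, 0.0], [0.0, 1.0]]]
O = [[[0.85, 0.15], [0.15, 0.85]]]
new_belief = update_belief([0.5, 0.5], action=0, observation=0, T=T, O=O)
print(new_belief)
```

Because the updated belief depends only on the previous belief, the action, and the observation, the process over beliefs is Markovian even though the process over observations is not.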
2.
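Two abstraction modes for Markov decision processes  Total citations: 1, self-citations: 1, citations by others: 1
Introducing Markov decision processes at an abstract level makes it possible to express complex Markov decision processes concisely and declaratively, addressing the problem of representing the large state spaces that conventional Markov decision processes (MDPs) encounter in practice. This paper introduces the basic concepts of two different types of abstract MDP, structured and generalized, together with exact or approximate algorithms for optimal policies in various typical abstract MDPs. These include one algorithm that differs fundamentally from those for conventional MDPs: generalizing the Bellman equation to the abstract state space. The paper also summarizes the research history of these models and offers some outlook on their development, giving readers a thorough, comprehensive, yet focused understanding.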
3.
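The introduction of logical Markov decision processes and relational Markov decision processes makes it possible to express complex Markov decision processes concisely and declaratively. This paper first introduces the concepts of logical and relational Markov decision processes, then focuses on several algorithms that differ fundamentally from those for ordinary Markov decision processes: (1) conversion methods that rely on RL over the ground state space; (2) methods that generalize the Bellman equation to the abstract state space; (3) methods that use a policy bias space to search for approximately optimal policies. Finally, it summarizes the current state of research and offers some outlook on future development.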
4.
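Advances in adaptive decision-making for Markov decision processes  Total citations: 6, self-citations: 0, citations by others: 6
Building on an introduction to general Markov decision processes, this paper analyzes the basic ideas, concrete algorithm implementations, and corresponding conclusions of the main current adaptive decision methods for Markov processes. It summarizes the characteristics of existing adaptive decision algorithms for Markov processes and points out problems that remain to be solved.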
5.
6.
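When handling problems, humans tend to work at two levels: first grasping the problem as a whole, that is, proposing a rough plan, and then carrying it out in detail. In other words, humans are an excellent example of a multiresolution intelligent system: they can generalize bottom-up across multiple levels (viewing the problem at a coarser granularity, similar to abstraction) and instantiate top-down (viewing the problem at a finer granularity, similar to concretization). On this basis, a semi-Markov decision process is constructed from Markov decision processes that each run on one of two layers (the ideal space, i.e. generalization, and the actual space, i.e. instantiation); it is called the joint model of dual Markov decision processes. The paper then discusses an optimal-policy algorithm for this joint model and finally gives an example showing that the joint model economizes on reasoning effort and is a good compromise between computational effectiveness and feasibility.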
7.
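To reduce the difficulty and complexity of generating large numbers of natural yet similar human motions in crowd animation, a learning-based crowd animation generation technique is studied. The technique first builds a motion-pose learning model based on a Gaussian process latent variable model and a latent-space dynamic model, mapping high-dimensional motion poses into a low-dimensional latent space and modeling the dynamic evolution of adjacent poses in that space. It then learns from existing motion data to obtain the probability distribution of the poses that make up the motion, and uses dynamic prediction in the latent space together with Hybrid Monte Carlo sampling to obtain latent trajectories that follow the given distribution. Finally, pose reconstruction yields a series of natural motions that are very similar to, yet different from, the original motion. This produces crowd animation while avoiding the inherent difficulty and complexity of traditional inverse kinematics methods based on geometric and physical constraints.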
8.
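Analysis of commercial customer groups based on a Markov process model  Total citations: 3, self-citations: 0, citations by others: 3
A Markov algorithm is used to build a model for analyzing commercial customer groups. The model predicts the composition of customer groups, and the predictions are then analyzed to provide a basis for enterprises to formulate market strategies.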
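The abstract above does not give the model's details, so the following is only a generic sketch of how a Markov chain can forecast group composition: the current share of customers in each group is multiplied by a transition matrix once per period. The group labels, the matrix `P`, and the initial shares are hypothetical values, not data from the paper.

```python
def step(distribution, P):
    """Advance one period: new[j] = sum over i of distribution[i] * P[i][j]."""
    n = len(P)
    return [sum(distribution[i] * P[i][j] for i in range(n)) for j in range(n)]


def predict(distribution, P, periods):
    """Forecast the group composition after the given number of periods."""
    for _ in range(periods):
        distribution = step(distribution, P)
    return distribution


# Hypothetical groups: 0 = loyal, 1 = occasional, 2 = lost.
# Each row of P sums to 1 and gives the chance of moving between groups in one period.
P = [
    [0.80, 0.15, 0.05],
    [0.20, 0.60, 0.20],
    [0.00, 0.10, 0.90],
]
initial = [0.5, 0.3, 0.2]
after_one = predict(initial, P, 1)
print(after_one)
```

In practice the transition probabilities would be estimated from observed customer movements between periods; here they are simply assumed.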
9.
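Wireless sensor networks have been widely applied in recent years, and energy consumption is a hot research topic in this field. As wireless sensor network technology has developed, multiple transmission rates are now commonly used in sensor networks, and this multi-rate property offers an opportunity to further improve network energy performance. This paper proposes using a Markov decision process to control switching among the multiple rates of a wireless sensor network, thereby making the network more energy efficient. Simulation results show that network energy performance improves without affecting communication quality.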
10.
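As networks grow in scale, network security incidents emerge one after another, and traditional network intrusion detection methods can no longer keep up with current network developments. To address the high false-alarm rate and the low detection rate and detection efficiency of traditional intrusion detection methods, an intrusion detection model based on a Markov decision process is proposed. Within the intrusion detection system, a Markov decision process is built from the basic elements of the Markov model, and a fuzzy analytic hierarchy process is used to assign credit ratings to users, completing the user credit rating ...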
11.
Alexander L. Strehl 《Journal of Computer and System Sciences》2008,74(8):1309-1331
Several algorithms for learning near-optimal policies in Markov Decision Processes have been analyzed and proven efficient. Empirical results have suggested that Model-based Interval Estimation (MBIE) learns efficiently in practice, effectively balancing exploration and exploitation. This paper presents a theoretical analysis of MBIE and a new variation called MBIE-EB, proving their efficiency even under worst-case conditions. The paper also introduces a new performance metric, average loss, and relates it to its less “online” cousins from the literature.
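The abstract does not spell out MBIE-EB, so the sketch below relies on the commonly cited form of its update, in which the empirical model's Bellman backup is augmented with an exploration bonus of beta divided by the square root of the visit count. Treat it as an illustration under that assumption, not as the paper's implementation: the function name, the table layouts, the optimistic value for unvisited pairs, and the fixed number of sweeps are all choices made here.

```python
import math


def mbie_eb_q(n, r_hat, t_hat, gamma, beta, r_max=1.0, sweeps=500):
    """Value iteration on an empirical model with an exploration bonus (sketch).

    n[s][a]: visit counts
    r_hat[s][a]: empirical mean rewards
    t_hat[s][a][s2]: empirical transition probabilities
    """
    num_states, num_actions = len(n), len(n[0])
    q = [[0.0] * num_actions for _ in range(num_states)]
    for _ in range(sweeps):
        v = [max(row) for row in q]  # greedy state values from the previous sweep
        for s in range(num_states):
            for a in range(num_actions):
                if n[s][a] == 0:
                    # Assumption: unvisited pairs receive the optimistic upper bound
                    q[s][a] = r_max / (1.0 - gamma)
                else:
                    bonus = beta / math.sqrt(n[s][a])
                    expected = sum(t_hat[s][a][s2] * v[s2] for s2 in range(num_states))
                    q[s][a] = r_hat[s][a] + bonus + gamma * expected
    return q


# Hypothetical one-state, one-action model: reward 0.5, visited 4 times.
q = mbie_eb_q(n=[[4]], r_hat=[[0.5]], t_hat=[[[1.0]]], gamma=0.5, beta=1.0)
print(q[0][0])
```

The bonus shrinks as a state-action pair is visited more often, which is how this style of algorithm trades exploration against exploitation.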
12.
Visual motion segmentation (VMS) is a key part of many intelligent crowd systems. It can be used to figure out the flow behavior through a crowd and to spot unusual life-threatening incidents like crowd stampedes and crashes, which pose a serious risk to public safety and have resulted in numerous fatalities over the past few decades. Trajectory clustering has become one of the most popular methods in VMS. However, complex data, such as a large number of samples and parameters, makes it difficult for trajectory clustering to produce accurate motion segmentation results. This study introduces a spatial-angular stacked sparse autoencoder model (SA-SSAE) with l2-regularization and softmax, a powerful deep learning method for visual motion segmentation that groups similar motion patterns into the same cluster. The proposed model can extract meaningful high-level features using only spatial-angular features obtained from refined tracklets (a.k.a. ‘trajectories’). We adopt l2-regularization and sparsity regularization, which can learn sparse representations of features, to guarantee the sparsity of the autoencoders. We employ the softmax layer to map the data points into accurate cluster representations. One of the main advantages of the SA-SSAE framework is that it can manage VMS even when individuals move around randomly. This framework helps cluster the motion patterns effectively with higher accuracy. We put forward a new dataset with its manual ground truth, including 21 crowd videos. Experiments conducted on two crowd benchmarks demonstrate that the proposed model can group trajectories more accurately than the traditional clustering approaches used in previous studies. The proposed SA-SSAE framework achieved a 0.11 improvement in accuracy and a 0.13 improvement in the F-measure compared with the best current method using the CUHK dataset.
13.
14.
Quality of software is one of the most critical concerns in software system development, and many products fail to meet the quality objectives when constructed initially. Software quality is highly affected by the development process's actual dynamics. This article proposes the use of the Markov decision process (MDP) for the assessment of software quality because MDP is a useful technique to abstract the model of dynamics of the development process and to test its impact on quality. Additionally, the MDP modeling of the dynamics leads to early prediction of the quality, from the design phases all the way through the different stages of development. The proposed approach is based on the stochastic nature of the software development process, including project architecture, construction strategy of Software Quality Assurance system, its qualification actions, and team assignment strategy. It accepts these factors as inputs, generating a relative quality degree as an output. The proposed approach has been demonstrated for the design phase with a case study taken from the literature. The results prove its robustness and capability to identify appropriate policies in terms of quality, cost, and time. © 2011 Wiley Periodicals, Inc.
15.
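Based on the motion characteristics of high-speed space targets, the range migration trajectory (RMT) and the equivalent motion model of the target are analyzed, and a joint motion compensation algorithm based on the range migration trajectory is proposed. Following the principle of minimizing the global entropy of the range profiles, the algorithm estimates the target's translational parameters from the RMT, then uses them to compensate the range-profile shift and to correct the distortion of the one-dimensional range profile. This achieves joint translational compensation of range alignment and intra-pulse motion for space-target echoes. Results on simulated and measured data show that the algorithm is accurate. More importantly, the range alignment step introduces no random shift errors or phase errors, which is a prerequisite for applying high-resolution imaging methods.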
16.
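A new method is proposed that is computationally efficient and can achieve near-optimal decisions to any given accuracy. The method partitions the parameter set into finitely many regions according to the required decision accuracy, estimates the unknown parameters with a biased maximum likelihood estimator, and, during the decision process, selects the control for the Markov process according to the region that contains the estimated parameters.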
17.
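To achieve multi-character motion synthesis, a method is proposed that combines a multi-character deformable motion model with motion patches. In the patch construction stage, the multi-character deformable motion model is used to add to the patch library multi-character interactive motion patches that share the same semantics but differ in detail. In the patch stitching stage, a strategy combining a random sampling algorithm with a deterministic search algorithm is used to stitch patches together. In the multi-character motion synthesis stage, a top-down strategy handles cases with environmental constraints, and a large-patch-first strategy handles cases with irregularly shaped patches. Experimental results show that the method achieves smooth transitions between motion patches and produces a seamless tiling that is continuous in time and non-overlapping in space.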
18.
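Traditional node localization algorithms for wireless sensor networks (WSNs) adapt poorly to environments with rapidly moving nodes and highly changing topology, leading to large identification errors. To address this problem, a WSN node localization algorithm based on motion trajectory capture and an orthogonal coverage mechanism is proposed. The radio-frequency strength of anchor nodes is captured to perform coverage localization of node trajectories and to obtain the best-performing anchor nodes and their coordinates, mitigating the weak localization caused by anchor node failure or weak signal strength. On this basis, a trajectory capture method is designed using the Lagrange interpolation function. It combines the vertical and horizontal coordinate dimensions to capture node motion vectors precisely, produces a preliminary prediction of the node's coordinates at the next time step with controllable accuracy, and optimizes the anchor nodes' regional coverage of moving nodes. In addition, an orthogonal coverage scheme is used to design a region optimization method based on a filtering mechanism, improving coordinate sampling of the coverage region and the accuracy of network signal localization. Simulation results show that, compared with the 2S-HGR and TDLM mechanisms, the algorithm captures dynamic paths better and localizes coordinates more accurately.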
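The abstract mentions using the Lagrange interpolation function to predict a node's coordinates at the next time step, but gives no formulas. The sketch below shows one plausible reading: fit a Lagrange polynomial to recent samples of each coordinate separately and evaluate it at the next time. The function names and the sample trajectory are hypothetical, and the paper's actual capture method may differ.

```python
def lagrange_predict(times, values, t_next):
    """Evaluate the Lagrange interpolating polynomial through (times, values) at t_next."""
    result = 0.0
    for i, (ti, vi) in enumerate(zip(times, values)):
        # Basis polynomial L_i(t_next): equals 1 at ti and 0 at every other sample time
        weight = 1.0
        for j, tj in enumerate(times):
            if j != i:
                weight *= (t_next - tj) / (ti - tj)
        result += vi * weight
    return result


def predict_position(times, xs, ys, t_next):
    """Predict (x, y) at t_next by interpolating each coordinate dimension independently."""
    return lagrange_predict(times, xs, t_next), lagrange_predict(times, ys, t_next)


# Hypothetical samples of a moving node: x follows t**2, y follows 2*t + 1.
times = [0.0, 1.0, 2.0]
xs = [0.0, 1.0, 4.0]
ys = [1.0, 3.0, 5.0]
x3, y3 = predict_position(times, xs, ys, 3.0)
print(x3, y3)
```

Extrapolating a high-degree polynomial far beyond the sampled window is numerically unstable, so a sketch like this is only reasonable with a few recent samples and a short prediction horizon.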
19.
The operation of complex environmental systems usually accounts for multiple, conflicting objectives, whose presence requires explicitly considering the preference structure of the parties involved. Multi-objective Markov Decision Processes are a useful mathematical framework for the resolution of such sequential decision-making problems. However, the computational requirements of the available optimization techniques limit their application to problems involving few objectives. In real-world applications it is therefore common practice to select a few representative objectives with respect to which the problem is solved. This paper proposes a dimensionality reduction approach, based on the Non-negative Principal Component Analysis (NPCA), to aggregate the original objectives into a reduced number of principal components, with respect to which the optimization problem is solved. The approach is evaluated on the daily operation of a multi-purpose water reservoir (Tono Dam, Japan) with 10 operating objectives, and compared against a 5-objective formulation of the same problem. Results show that the NPCA-based approach provides a better representation of the Pareto front, especially in terms of consistency and solution diversity.