Similar Documents (20 results found)
1.
The goal of dialogue management in a spoken dialogue system is to take actions based on observations and inferred beliefs. To ensure that the actions optimize the performance or robustness of the system, researchers have turned to reinforcement learning methods to learn policies for action selection. To derive an optimal policy from data, the dynamics of the system is often represented as a Markov Decision Process (MDP), which assumes that the state of the dialogue depends only on the previous state and action. In this article, we investigate whether constraining the state space by the Markov assumption, especially when the structure of the state space may be unknown, truly affords the highest reward. In simulation experiments conducted in the context of a dialogue system for interacting with a speech-enabled web browser, models under the Markov assumption did not perform as well as an alternative model which classifies the total reward with accumulating features. We discuss the implications of the study as well as its limitations.
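As a hedged illustration of the "classify total reward from accumulating features" alternative, here is a minimal sketch; the feature dimensionality, synthetic dialogues, labeling rule, and choice of logistic regression are all assumptions for illustration, not the paper's setup.

```python
# Sketch of the non-Markov alternative: accumulate per-turn features over
# a whole dialogue and classify the total reward. Everything here
# (features, data, labeling rule) is hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

def accumulate(turn_features):
    """Sum per-turn feature vectors over an entire dialogue."""
    return np.sum(turn_features, axis=0)

rng = np.random.default_rng(0)
dialogues = [rng.normal(size=(rng.integers(3, 10), 4)) for _ in range(200)]
X = np.array([accumulate(d) for d in dialogues])
labels = (X @ np.array([1.0, -0.5, 0.3, 0.0]) > 0).astype(int)  # synthetic rule

clf = LogisticRegression().fit(X, labels)        # high/low total reward
print("predicted reward class:", clf.predict(X[:5]))
```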

2.
An analytical expression is provided to evaluate the sensitivity (i.e. the derivative with respect to a system parameter) of the cumulative reward distribution for systems modeled by homogeneous Markov reward processes. Both transition rates and reward rates are assumed to be functions of the system parameter. An upper bound is also provided for the error introduced by the numerical evaluation of the sensitivity.
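Schematically, and only for the simpler case of the mean (the paper treats the full cumulative reward distribution), the quantity at stake looks as follows: with transient distribution π(u; θ) and reward-rate vector r(θ) both depending on the parameter θ,

```latex
\mathbb{E}[Y(t)] = \int_0^t \pi(u;\theta)^{\top} r(\theta)\,du,
\qquad
\frac{\partial}{\partial\theta}\,\mathbb{E}[Y(t)]
  = \int_0^t \left[ \frac{\partial \pi(u;\theta)^{\top}}{\partial\theta}\, r(\theta)
  + \pi(u;\theta)^{\top}\, \frac{\partial r(\theta)}{\partial\theta} \right] du .
```

Both the transition-rate dependence (through π) and the reward-rate dependence (through r) contribute a term, matching the abstract's assumptions.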

3.
We present a non-equilibrium analysis and control approach for the Active Queue Management (AQM) problem in communication networks. Using simplified fluid models, we carry out a bifurcation study of the complex dynamic queue behavior to show that non-equilibrium methods are essential for analysis and optimization in the AQM problem. We investigate an ergodic theoretic framework for stochastic modeling of the non-equilibrium behavior in deterministic models and use it to identify parameters of a fluid model from packet level simulations. For computational tractability, we use set-oriented numerical methods to construct finite-dimensional Markov models, including control Markov chains and hidden Markov models. Subsequently, we develop and analyze an example AQM algorithm using a Markov Decision Process (MDP) based control framework. The control scheme developed is optimal with respect to a reward function, defined over the queue size and aggregate flow rate. We implement and simulate our illustrative AQM algorithm in the ns-2 network simulator. The results obtained confirm the theoretical analysis and exhibit promising performance when compared with well-known alternative schemes under persistent non-equilibrium queue behavior.
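A minimal sketch of the set-oriented step described above, under illustrative assumptions: partition the queue-length range into boxes and estimate a finite Markov chain from a simulated trajectory (the noisy fluid recursion below is a stand-in, not the paper's AQM model).

```python
# Ulam-type set-oriented construction of a finite Markov model from a
# simulated trajectory; the queue dynamics below are hypothetical.
import numpy as np

def build_markov_chain(trajectory, n_bins, lo, hi):
    """Partition [lo, hi] into n_bins boxes and count observed
    box-to-box transitions to estimate a transition matrix."""
    bins = np.clip(((trajectory - lo) / (hi - lo) * n_bins).astype(int),
                   0, n_bins - 1)
    P = np.zeros((n_bins, n_bins))
    for i, j in zip(bins[:-1], bins[1:]):
        P[i, j] += 1
    row_sums = P.sum(axis=1, keepdims=True)
    return np.divide(P, row_sums, out=np.full_like(P, 1.0 / n_bins),
                     where=row_sums > 0)

rng = np.random.default_rng(1)
q = [50.0]
for _ in range(5000):   # noisy fluid-like queue recursion (illustrative)
    q.append(max(0.0, q[-1] + 0.9 * (60 - q[-1]) * 0.01 + rng.normal(0, 2)))
P = build_markov_chain(np.array(q), n_bins=20, lo=0.0, hi=120.0)
print(P.shape, P[0].sum())
```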

4.
We consider a discrete time, finite state Markov reward process that depends on a set of parameters. We start with a brief review of (stochastic) gradient descent methods that tune the parameters in order to optimize the average reward, using a single (possibly simulated) sample path of the process of interest. The resulting algorithms can be implemented online, and have the property that the gradient of the average reward converges to zero with probability 1. On the other hand, the updates can have a high variance, resulting in slow convergence. We address this issue and propose two approaches to reduce the variance. These approaches rely on approximate gradient formulas, which introduce an additional bias into the update direction. We derive bounds for the resulting bias terms and characterize the asymptotic behavior of the resulting algorithms. For one of the approaches considered, the magnitude of the bias term exhibits an interesting dependence on the time it takes for the rewards to reach steady-state. We also apply the methodology to Markov reward processes with a reward-free termination state, and an expected total reward criterion. We use a call admission control problem to illustrate the performance of the proposed algorithms.
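A minimal sketch of an online average-reward gradient update on a single sample path, in the spirit of the algorithms reviewed above; the two-state chain, softmax action parameterization, and step sizes are illustrative assumptions, and no variance-reduction bias is included.

```python
# Online policy-gradient update along one sample path, with an
# eligibility trace and a running average-reward estimate. The model
# is a made-up illustration, not the paper's setting.
import numpy as np

rng = np.random.default_rng(2)
theta = np.zeros(2)                   # one logit per action
eta_avg, z, step = 0.0, np.zeros(2), 0.01

def policy(theta):
    p = np.exp(theta - theta.max())
    return p / p.sum()

state = 0
for t in range(20000):
    p = policy(theta)
    a = rng.choice(2, p=p)
    reward = 1.0 if (state == 1 and a == 1) else 0.0  # illustrative reward
    state = rng.integers(2)                           # illustrative dynamics
    grad_log = np.eye(2)[a] - p                # d/dtheta log pi(a)
    z = 0.99 * z + grad_log                    # eligibility trace
    theta += step * (reward - eta_avg) * z     # gradient step
    eta_avg += 0.01 * (reward - eta_avg)       # running average reward
print("learned policy:", policy(theta))
```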

5.
Within the framework of Markov decision models, a computational evaluation method based on trajectory analysis is proposed, which measures the performance of autonomous navigation systems by analyzing driving reward settings and the feature expectations of vehicle trajectories. Assuming the reward function is a linear combination of reward features, the method approximates different autonomous driving policies to solve for the reward settings applied to a sandbox scenario, thereby simulating the feature expectations of the navigation trajectories. Experimental results show that the method achieves both qualitative and quantitative evaluation of trajectory data from autonomous navigation systems.
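A minimal sketch of the feature-expectation computation this method rests on: with a reward linear in features, R(s) = w·φ(s), a policy's value equals w·μ, where μ is the discounted feature expectation of its trajectories. The feature map, trajectories, and weights below are hypothetical.

```python
# Discounted feature expectations from sampled trajectories; the driving
# features and reward weights are made-up illustrations.
import numpy as np

def feature_expectation(trajectories, phi, gamma=0.95):
    """Average discounted feature sums over sampled trajectories."""
    mus = []
    for traj in trajectories:
        mu = sum((gamma ** t) * phi(s) for t, s in enumerate(traj))
        mus.append(mu)
    return np.mean(mus, axis=0)

phi = lambda s: np.array([s["speed"], s["lane_offset"], s["collision"]])
trajs = [[{"speed": 0.8, "lane_offset": 0.1, "collision": 0.0}] * 10]
mu = feature_expectation(trajs, phi)
w = np.array([1.0, -0.5, -10.0])      # hypothetical reward weights
print("estimated policy value:", w @ mu)
```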

6.
The ins and outs of the probabilistic model checker MRMC
The Markov Reward Model Checker (MRMC) is a software tool for verifying properties over probabilistic models. It supports PCTL and CSL model checking, and their reward extensions. Distinguishing features of MRMC are its support for computing time- and reward-bounded reachability probabilities, (property-driven) bisimulation minimization, and precise on-the-fly steady-state detection. Recent tool features include time-bounded reachability analysis for continuous-time Markov decision processes (CTMDPs) and CSL model checking by discrete-event simulation. This paper presents the tool’s current status and its implementation details.

7.
Solution techniques for Markov decision problems rely on exact knowledge of the transition rates, which may be difficult or impossible to obtain. In this paper, we consider Markov decision problems with uncertain transition rates represented as compact sets. We first consider the problem of sensitivity analysis, where the aim is to quantify the range of uncertainty of the average per-unit-time reward given the range of uncertainty of the transition rates. We then develop solution techniques for the problem of obtaining the max-min optimal policy, which maximizes the worst-case average per-unit-time reward. In each of these problems, we distinguish between systems that can have their transition rates chosen independently and those where the transition rates depend on each other. Our solution techniques are applicable to Markov decision processes with fixed but unknown transition rates and to those with time-varying transition rates.
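A hedged sketch of the max-min idea in a discrete-time, discounted analogue (the paper works with transition rates and average reward): robust value iteration where the inner minimization pushes probability mass, within interval bounds, toward low-value successors. All model numbers are illustrative.

```python
# Max-min (robust) value iteration for an MDP whose transition
# probabilities lie in per-(state, action) intervals. Illustrative model.
import numpy as np

def worst_case_dist(lo, hi, values):
    """Distribution in the box [lo, hi] (summing to 1) that minimizes
    expected value: fill low-value successors up to their caps."""
    p = lo.copy()
    budget = 1.0 - p.sum()
    for j in np.argsort(values):          # cheapest successors first
        add = min(hi[j] - p[j], budget)
        p[j] += add
        budget -= add
    return p

# 2 states, 2 actions; interval bounds per (state, action, successor).
lo = np.array([[[0.2, 0.3], [0.1, 0.5]], [[0.4, 0.2], [0.3, 0.3]]])
hi = np.array([[[0.7, 0.8], [0.5, 0.9]], [[0.8, 0.6], [0.7, 0.7]]])
R = np.array([[1.0, 0.0], [0.0, 2.0]])    # reward per (state, action)
gamma, V = 0.9, np.zeros(2)
for _ in range(500):
    Q = np.array([[R[s, a] + gamma * worst_case_dist(lo[s, a], hi[s, a], V) @ V
                   for a in range(2)] for s in range(2)])
    V = Q.max(axis=1)
print("robust values:", V)
```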

8.
Video services are likely to dominate the traffic in future broadband networks. Most of these services will be provided by large-scale public-access video servers. Research to date has shown that disk arrays are a promising technology for providing the storage and throughput required to serve many independent video streams to a large customer population. Large disk arrays, however, are susceptible to disk failures, which can greatly affect their reliability. In this paper, we discuss suitable redundancy mechanisms to increase the reliability of disk arrays and compare the performance of the RAID-3 and RAID-5 redundancy schemes. We use cost and performability analyses to rigorously compare the two schemes over a variety of conditions. Accurate cost models are developed, and Markov reward models (with time-dependent reward structures) are developed and used to give insight into the tradeoffs between system cost and revenue earning potential. The paper concludes that for large-scale video servers, coarse-grained striping in a RAID-5 style of disk array is most cost effective.
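As a toy illustration of the Markov reward modeling involved (with a time-independent reward structure, unlike the paper's), consider a three-state failure/repair chain whose states earn revenue at different rates; all rates and revenues below are made up.

```python
# Tiny Markov reward model: states = number of failed disks, reward
# rate = revenue earned in that state. Illustrative numbers only.
import numpy as np

lam, mu = 1 / 1000.0, 1 / 10.0          # per-hour failure / repair rates
# States: 0 failed, 1 failed (degraded), 2 failed (data loss, down).
Q = np.array([[-5 * lam, 5 * lam, 0.0],
              [mu, -(mu + 4 * lam), 4 * lam],
              [0.0, mu, -mu]])
r = np.array([100.0, 60.0, 0.0])        # revenue rate per state ($/hour)

# Steady-state distribution: solve pi Q = 0 with pi summing to 1.
A = np.vstack([Q.T, np.ones(3)])
pi = np.linalg.lstsq(A, np.array([0.0, 0.0, 0.0, 1.0]), rcond=None)[0]
print("long-run revenue rate:", pi @ r)
```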

9.
A survey of Markov models for web browsing prediction
Web browsing prediction is an important topic in web access pattern mining, and the Markov model is a classic model for this task. This paper first introduces the basic Markov browsing prediction model, including the basic Markov browsing behavior model, its training, and its application to the web browsing prediction problem. It then focuses on extended Markov browsing prediction models, including first-order combined prediction models, higher-order models, mixture models, hidden Markov models, and continuous-time Markov models, surveying the essential motivation behind each extension as well as its learning and prediction methods. Finally, open problems for further research on Markov browsing prediction models are analyzed.
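A minimal sketch of one family the survey covers, a k-th order (higher-order) Markov predictor: condition the next page on the last k pages visited. The session data is hypothetical.

```python
# k-th order Markov next-page prediction from browsing sessions.
from collections import Counter, defaultdict

def train_kth_order(sessions, k):
    counts = defaultdict(Counter)
    for s in sessions:
        for i in range(len(s) - k):
            counts[tuple(s[i:i + k])][s[i + k]] += 1
    return counts

def predict(counts, context):
    dist = counts.get(tuple(context))
    return dist.most_common(1)[0][0] if dist else None

sessions = [["home", "news", "sports", "news"],
            ["home", "news", "sports", "home"],
            ["news", "sports", "news", "home"]]
model = train_kth_order(sessions, k=2)
print(predict(model, ["news", "sports"]))   # most likely next page
```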

10.
Real-time anomaly detection based on a static Markov chain model
A Markov chain model can be used to describe the normal behavior patterns of a system. This paper proposes an anomaly detection method based on a static Markov chain and implements the corresponding algorithm. Experimental results show that the method is simple to implement, achieves high accuracy, and is suitable for real-time detection in different environments.
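A minimal sketch of such a detector, with a hypothetical event alphabet: estimate a transition matrix from normal sequences, then flag new sequences whose average transition log-likelihood falls below a threshold.

```python
# Markov-chain anomaly scoring: low average log-likelihood = anomalous.
import math
from collections import Counter, defaultdict

def train(sequences):
    counts = defaultdict(Counter)
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return {a: {b: n / sum(c.values()) for b, n in c.items()}
            for a, c in counts.items()}

def score(P, seq, floor=1e-6):
    """Average log-probability per transition."""
    lp = [math.log(P.get(a, {}).get(b, floor)) for a, b in zip(seq, seq[1:])]
    return sum(lp) / len(lp)

normal = [list("ababcabcab"), list("abcabcabab")]   # hypothetical events
P = train(normal)
print(score(P, list("abcab")), score(P, list("accca")))  # normal vs anomalous
```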

11.
12.
We generalize and build on the PAC Learning framework for Markov Decision Processes developed in Jain and Varaiya (2006). We consider the reward function to depend on both the state and the action. Both the state and action spaces can potentially be countably infinite. We obtain an estimate for the value function of a Markov decision process, which assigns to each policy its expected discounted reward. This expected reward can be estimated as the empirical average of the reward over many independent simulation runs. We derive bounds on the number of runs needed for the convergence of the empirical average to the expected reward uniformly for a class of policies, in terms of the V-C or pseudo dimension of the policy class. We then propose a framework to obtain an ε-optimal policy from simulation. We provide the sample complexity of such an approach.
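A sketch of the simulation-based estimation, paired with a generic Hoeffding-style run count for a single policy (the paper's bounds hold uniformly over a policy class via its V-C or pseudo dimension, which this sketch does not capture). The two-state chain is hypothetical, and the demo caps the run count.

```python
# Empirical-average value estimation over independent simulation runs.
import math
import numpy as np

def runs_needed(eps, delta, v_max):
    """Hoeffding: n >= v_max^2 ln(2/delta) / (2 eps^2), single policy."""
    return math.ceil(v_max ** 2 * math.log(2 / delta) / (2 * eps ** 2))

def estimate_value(policy, step, init, gamma, horizon, n_runs, rng):
    total = 0.0
    for _ in range(n_runs):
        s, disc = init, 1.0
        for _ in range(horizon):
            s, r = step(s, policy(s), rng)
            total += disc * r
            disc *= gamma
    return total / n_runs

# Hypothetical two-state chain: action 1 in state 1 pays reward 1.
step = lambda s, a, rng: (rng.integers(2), float(s == 1 and a == 1))
rng = np.random.default_rng(3)
n = runs_needed(eps=0.05, delta=0.05, v_max=1 / (1 - 0.9))
print(n, estimate_value(lambda s: 1, step, 0, 0.9, 200, min(n, 2000), rng))
```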

13.
This paper describes efficient procedures for model checking Markov reward models, that allow us to evaluate, among others, the performability of computer-communication systems. We present the logic CSRL (Continuous Stochastic Reward Logic) to specify performability measures. It provides flexibility in measure specification and paves the way for the numerical evaluation of a wide variety of performability measures. The formal measure specification in CSRL also often helps in reducing the size of the Markov reward models that need to be numerically analysed. The paper presents background on Markov reward models, as well as on the logic CSRL (syntax and semantics), before presenting an important duality result between reward and time. We discuss CSRL model-checking algorithms, and present five numerical algorithms and their computational complexity for verifying time- and reward-bounded until-properties, one of the key operators in CSRL. The versatility of our approach is illustrated through a performability case study.
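As a hedged sketch of one ingredient, time-bounded reachability in a CTMC via uniformization (reward bounds add a second dimension, handled by the discretization algorithms the paper presents): make goal states absorbing and sum Poisson-weighted transient terms. The three-state generator is illustrative.

```python
# Time-bounded reachability in a CTMC by uniformization.
import math
import numpy as np

def reach_prob(Q, goal, t, eps=1e-10):
    Q = Q.copy()
    Q[goal, :] = 0.0                        # make goal states absorbing
    lam = max(-Q.diagonal()) * 1.05         # uniformization rate
    P = np.eye(len(Q)) + Q / lam            # uniformized DTMC
    v = np.zeros(len(Q)); v[goal] = 1.0     # indicator of goal states
    result, term, k = np.zeros(len(Q)), math.exp(-lam * t), 0
    acc = term
    while acc < 1 - eps and k < 10_000:     # sum Poisson-weighted terms
        result += term * v
        v = P @ v
        k += 1
        term *= lam * t / k
        acc += term
    return result + term * v

Q = np.array([[-2.0, 2.0, 0.0], [1.0, -3.0, 2.0], [0.0, 0.0, 0.0]])
print(reach_prob(Q, goal=[2], t=1.5))   # P(reach state 2 within time 1.5)
```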

14.
Flame detection based on hidden Markov models
吴铮  孙立  汪亚明  夏一民 《计算机工程》2008,34(20):213-214
A method is proposed for analyzing flames in ordinary video using hidden Markov models. In addition to recognizing flames through motion and color analysis, the flickering behavior of flames is analyzed with a hidden Markov model. Experimental results show that the method effectively distinguishes flames from ordinary moving objects with flame-like colors, reducing the number of false alarms in fire monitoring, which is of practical significance.
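A minimal sketch of the HMM scoring step implied here: discretize a flame-region flicker signal into symbols and score it with the forward algorithm, comparing a rapidly alternating "flame" model against a slowly varying "rigid object" model. All parameters are illustrative, not trained values.

```python
# Forward-algorithm log-likelihood of a symbol sequence under a
# discrete HMM, with rescaling to avoid underflow.
import numpy as np

def forward_loglik(pi, A, B, obs):
    alpha = pi * B[:, obs[0]]
    c = alpha.sum()
    loglik = np.log(c)
    alpha /= c
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        c = alpha.sum()
        loglik += np.log(c)
        alpha /= c
    return loglik

# Two hidden states (flicker up / down), three symbols (region area
# shrinks / stable / grows). Illustrative, untrained parameters.
pi = np.array([0.5, 0.5])
A_flame = np.array([[0.3, 0.7], [0.7, 0.3]])    # rapid alternation
A_rigid = np.array([[0.9, 0.1], [0.1, 0.9]])    # slow changes
B = np.array([[0.6, 0.2, 0.2], [0.2, 0.2, 0.6]])
seq = [0, 2, 0, 2, 1, 0, 2, 0]
print(forward_loglik(pi, A_flame, B, seq), forward_loglik(pi, A_rigid, B, seq))
```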

15.
A weakness of classical Markov decision processes (MDPs) is that they scale very poorly due to the flat state-space representation. Factored MDPs address this representational problem by exploiting problem structure to specify the transition and reward functions of an MDP in a compact manner. However, in general, solutions to factored MDPs do not retain the structure and compactness of the problem representation, forcing approximate solutions, with approximate linear programming (ALP) emerging as a promising MDP-approximation technique. To date, most ALP work has focused on the primal-LP formulation, while the dual LP, which forms the basis for solving constrained Markov problems, has received much less attention. We show that a straightforward linear approximation of the dual optimization variables is problematic, because some of the required computations cannot be carried out efficiently. Nonetheless, we develop a composite approach that symmetrically approximates the primal and dual optimization variables (effectively approximating both the objective function and the feasible region of the LP), leading to a formulation that is computationally feasible and suitable for solving constrained MDPs. We empirically show that this new ALP formulation also performs well on unconstrained problems.
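A minimal sketch of the primal ALP formulation discussed above: approximate V ≈ Φw and solve min_w αᵀΦw subject to Φw ≥ R(s,a) + γ P(·|s,a)ᵀΦw for every state-action pair. The tiny random MDP and two-feature basis are illustrative.

```python
# Primal approximate linear programming for a small MDP.
import numpy as np
from scipy.optimize import linprog

n_s, n_a, gamma = 4, 2, 0.9
rng = np.random.default_rng(4)
P = rng.dirichlet(np.ones(n_s), size=(n_s, n_a))   # P[s, a] is a distribution
R = rng.random((n_s, n_a))
Phi = np.column_stack([np.ones(n_s), np.arange(n_s)])  # constant + linear basis
alpha = np.full(n_s, 1.0 / n_s)                    # state-relevance weights

# Constraints: (Phi[s] - gamma * P[s,a] @ Phi) w >= R[s,a]  ->  -A w <= -R.
A_ub, b_ub = [], []
for s in range(n_s):
    for a in range(n_a):
        A_ub.append(-(Phi[s] - gamma * P[s, a] @ Phi))
        b_ub.append(-R[s, a])
res = linprog(c=alpha @ Phi, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * Phi.shape[1])
print("weights:", res.x, "approx values:", Phi @ res.x)
```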

16.
This article introduces two models that have been used effectively in protein structure prediction: the hidden Markov model and the input hidden Markov model, describing the principles, algorithms, and application examples of each. Numerical experiments show that both methods adapt well to prediction with small samples.

17.
Traditional image saliency detection methods based on absorbing Markov chains can only detect targets that differ strongly from the image background, or salient targets located at the image center; in practice, however, the targets of interest often do not satisfy these conditions. This paper proposes an object-oriented absorbing Markov chain saliency detection algorithm and applies it to saliency detection of golden snub-nosed monkey faces. The algorithm introduces a penalty factor into the traditional absorbing Markov chain saliency detection process, dynamically adjusting absorption times according to prior information. Color weights are rewarded or penalized according to the difference between superpixel blocks and the target color information, guiding the algorithm to correctly extract multiple salient targets. Experiments show that, compared with the traditional algorithm, the proposed algorithm detects the salient targets of interest more accurately, especially when an image contains multiple targets of interest.
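A minimal sketch of the absorption-time computation underlying such methods (without the paper's penalty factor and color reweighting): with boundary superpixels as absorbing states, each transient node's expected absorption time, computed from the fundamental matrix, serves as its saliency score. The five-node affinity graph is illustrative.

```python
# Absorbing-Markov-chain saliency via expected absorption time.
import numpy as np

W = np.array([[0, 5, 1, 0, 0],        # symmetric affinities between nodes
              [5, 0, 4, 1, 0],
              [1, 4, 0, 2, 1],
              [0, 1, 2, 0, 6],
              [0, 0, 1, 6, 0]], float)
P = W / W.sum(axis=1, keepdims=True)  # row-stochastic transition matrix

transient, absorbing = [0, 1, 2], [3, 4]   # absorbing = boundary nodes
Q = P[np.ix_(transient, transient)]
N = np.linalg.inv(np.eye(len(transient)) - Q)  # fundamental matrix
absorb_time = N @ np.ones(len(transient))      # expected steps to absorption
print("saliency scores:", absorb_time)         # larger = more salient
```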

18.
Wavelet analysis has found widespread use in signal processing and many classification tasks. Nevertheless, its use in dynamic pattern recognition has been much more restricted, since most wavelet models cannot handle variable-length sequences properly. Recently, composite hidden Markov models which observe structured data in the wavelet domain were proposed to deal with this kind of sequence. In these models, hidden Markov trees account for local dynamics in a multiresolution framework, while standard hidden Markov models capture longer correlations in time. Although these models have shown promising results in simple applications, only generative approaches have been used so far for parameter estimation. The goal of this work is to take a step forward in the development of dynamic pattern recognizers using wavelet features by introducing a new discriminative training method for these Markov models. The learning strategy relies on the minimum classification error approach and provides re-estimation formulas for fully non-tied models. Numerical experiments on phoneme recognition show important improvements over the recognition rate achieved by the same models trained using maximum likelihood estimation.
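For reference, the minimum classification error criterion mentioned here is typically formulated as follows (a standard schematic, not necessarily the paper's exact notation):

```latex
% Misclassification measure for a sample x of class k, with per-class
% discriminants g_j (e.g., model log-likelihoods):
d_k(x) = -g_k(x)
  + \log \left[ \frac{1}{K-1} \sum_{j \neq k} e^{\eta\, g_j(x)} \right]^{1/\eta}
% Smoothed 0-1 loss, minimized by gradient descent over model parameters:
\ell_k(x) = \frac{1}{1 + e^{-\gamma\, d_k(x)}}
```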

19.
Markov chain Monte Carlo algorithms are computationally expensive for large models. Especially, the so-called one-block Metropolis-Hastings (M-H) algorithm demands large computational resources, and parallel computing seems appealing. A parallel one-block M-H algorithm for latent Gaussian Markov random field (GMRF) models is introduced. Important parts of this algorithm are parallel exact sampling and evaluation of GMRFs. Parallelisation is achieved with parallel algorithms from linear algebra for sparse symmetric positive definite matrices. The parallel GMRF sampler is tested for GMRFs on lattices and irregular graphs, and gives both good speed-up and good scalability. The parallel one-block M-H algorithm is used to make inference for a geostatistical GMRF model with a latent spatial field of 31,500 variables.
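A minimal sketch of the exact-sampling kernel such algorithms build on: factor the precision matrix Q = LLᵀ and back-solve against standard-normal noise, giving a zero-mean GMRF draw. A small dense random-walk precision stands in for the paper's large sparse models, where sparse parallel Cholesky is the point.

```python
# Exact GMRF sampling from a precision matrix via Cholesky.
import numpy as np
from scipy.linalg import cholesky, solve_triangular

n = 100
Q = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)  # RW1-type precision
Q += 0.1 * np.eye(n)                                    # make it proper

L = cholesky(Q, lower=True)                  # Q = L L^T
rng = np.random.default_rng(5)
z = rng.standard_normal(n)
x = solve_triangular(L.T, z, lower=False)    # x ~ N(0, Q^{-1})
print(x[:5])
```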

20.
Admission control of hospitalization considering patient gender is an interesting issue in the study of hospital bed management. This paper addresses the decision on the admission of patients, who must either be admitted immediately into a same-gender room or rejected. Note that a patient is admitted depending on different conditions, such as his/her health condition, gender, the availability of beds, the length of stay, and the reward of hospitalization. Focusing on the key factor, patient gender, this paper sets up an infinite-horizon total discounted reward Markov decision process model with the purpose of maximizing the total expected reward for the hospital, which leads to an optimal dynamic policy. Then, the structural properties of the optimal policy are analyzed. Additionally, a value iteration algorithm is proposed to find the optimal policy. Finally, some numerical experiments are used to discuss how the optimal dynamic policy depends on some key parameters of the system. Furthermore, the performance of the optimal policy is discussed through comparison with three alternative policies by simulating different scenarios.
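A hedged sketch of value iteration on a stripped-down version of this admission problem: the state is the number of free beds in the male and female wards, and the action is to admit or reject the arriving patient. Arrival mix, discharge dynamics, and rewards are illustrative assumptions, not the paper's model.

```python
# Value iteration for a toy gender-based admission-control MDP.
import numpy as np

CAP, gamma, reward = 3, 0.95, 10.0
p_male, p_disch = 0.5, 0.3
states = [(m, f) for m in range(CAP + 1) for f in range(CAP + 1)]
V = {s: 0.0 for s in states}

def expected_after(m, f, V):
    """Each ward frees one bed with prob p_disch before the next arrival."""
    total = 0.0
    for dm in (0, 1):
        for df in (0, 1):
            p = ((p_disch if dm else 1 - p_disch)
                 * (p_disch if df else 1 - p_disch))
            total += p * V[(min(m + dm, CAP), min(f + df, CAP))]
    return total

def bellman(m, f, V):
    val = 0.0
    for is_male, p in ((True, p_male), (False, 1 - p_male)):
        free = m if is_male else f
        nxt = (m - 1, f) if is_male else (m, f - 1)
        admit = (reward + gamma * expected_after(*nxt, V)
                 if free > 0 else -np.inf)
        reject = gamma * expected_after(m, f, V)
        val += p * max(admit, reject)
    return val

for _ in range(300):
    V = {s: bellman(*s, V) for s in states}
print("value with all beds free:", V[(CAP, CAP)])
```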
