Similar Documents
20 similar documents found.
1.
The introduction of logical Markov decision processes and relational Markov decision processes makes it possible to express complex Markov decision processes concisely and declaratively. This paper first introduces the concepts of logical and relational Markov decision processes, and then focuses on several algorithms that differ fundamentally from those for ordinary Markov decision processes: (1) conversion methods that rely on RL over the underlying ground state space; (2) methods that extend the Bellman equation to abstract state spaces; (3) methods that search a policy-bias space for approximately optimal policies. Finally, the current state of research on these models is summarized and some prospects for their development are given.
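For orientation, method (2) above lifts the classical Bellman optimality equation from ground states to abstract states. The two equations below are only a generic sketch: how the lifted reward and transition models $\bar{R}$ and $\bar{P}$ are actually defined depends on the specific logical or relational formalism and is not specified in this abstract.

\[
V^{*}(s) = \max_{a \in A}\Big[\, R(s,a) + \gamma \sum_{s' \in S} P(s' \mid s,a)\, V^{*}(s') \,\Big],
\qquad
V^{*}(\bar{s}) = \max_{\bar{a}}\Big[\, \bar{R}(\bar{s},\bar{a}) + \gamma \sum_{\bar{s}'} \bar{P}(\bar{s}' \mid \bar{s},\bar{a})\, V^{*}(\bar{s}') \,\Big],
\]

where $\bar{s}$ and $\bar{a}$ range over abstract states and abstract actions, and $\gamma \in [0,1)$ is the discount factor.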

2.
Ordering delay test vectors is an effective technique for reducing test power consumption. A new method for delay test vector ordering based on a Markov decision model is proposed. The delay test vectors are reordered: the transition probability of a test vector sequence is defined using a transition-frequency-based induced switching equation together with the Hamming distance, and the order of the test vectors is then determined by these transition probabilities, reducing the switching activity of the circuit under test and thereby lowering both peak and average power consumption. The complete algorithm TVO-MDP is presented, and its optimality and complexity are analyzed. Experimental results confirm the effectiveness of the method.
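As a rough illustration of the ordering idea only (not the authors' TVO-MDP algorithm), the sketch below greedily orders test vectors so that consecutive vectors are close in Hamming distance, using a transition weight that decays with the distance. The exponential weighting and the greedy selection rule are assumptions made for illustration.

# Illustrative sketch: greedy low-switching-activity ordering of test vectors.
# The exponential transition weighting and the greedy rule are assumptions,
# not the TVO-MDP algorithm from the paper.
from math import exp

def hamming(u, v):
    """Number of bit positions in which two equal-length bit strings differ."""
    return sum(a != b for a, b in zip(u, v))

def transition_prob(u, v, beta=1.0):
    """Assumed transition weight: vectors closer in Hamming distance get a higher weight."""
    return exp(-beta * hamming(u, v))

def order_vectors(vectors):
    """Greedily choose the next vector with the highest transition weight (fewest toggles)."""
    remaining = list(vectors)
    ordered = [remaining.pop(0)]          # start from the first vector arbitrarily
    while remaining:
        current = ordered[-1]
        nxt = max(remaining, key=lambda v: transition_prob(current, v))
        remaining.remove(nxt)
        ordered.append(nxt)
    return ordered

if __name__ == "__main__":
    print(order_vectors(["0000", "1111", "0001", "1110", "0011"]))
    # -> ['0000', '0001', '0011', '1111', '1110']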

3.
Self-adaptive systems are able to adjust their behaviour in response to environmental condition changes and are widely deployed as Internetwares. Considered as a promising way to handle the ever-growing complexity of software systems, they have seen an increasing level of interest and are covering a variety of applications, e.g., autonomous car systems and adaptive network systems. Many approaches for the construction of self-adaptive systems have been developed, and probabilistic models, such as Markov decision processes (MDPs), are one of the favoured. However, the majority of them do not deal with the problems of the underlying MDP being obsolete under new environments or unsatisfactory to the given properties. This results in the generated policies from such MDP failing to guide the self-adaptive system to run correctly and meet goals. In this article, we propose a systematic approach to updating an obsolete MDP by exploring new states and transitions and removing obsolete ones, and repairing an unsatisfactory MDP by adjusting its structure in a more meaningful way rather than arbitrarily changing the transition probabilities to values not in line with reality. Experimental results show that the MDPs updated and repaired by our approach are more competent in guiding the self-adaptive systems' correct running compared with the original ones.

4.
In the theory of event-based optimization (EBO), the decision making is triggered by events, which is different from the traditional state-based control in Markov decision processes (MDP). In this paper, we propose a policy gradient approach of EBO. First, an equation of performance gradient in the event-based policy space is derived based on a fundamental quantity called Q-factors of EBO. With the performance gradient, we can find the local optimum of EBO using a gradient-based algorithm. Compared to the policy iteration approach in EBO, this policy gradient approach does not require restrictive conditions and it has a wider application scenario. The policy gradient approach is further implemented based on the online estimation of Q-factors. This approach does not require prior information about the system parameters, such as the transition probability. Finally, we use an EBO model to formulate the admission control problem and demonstrate the main idea of this paper. Such an online algorithm provides an effective implementation of the EBO theory in practice.
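For reference, in the standard average-reward MDP setting the performance-gradient identity built on Q-factors is usually written as below. The event-based version in the paper replaces the sum over states with a sum over events; the form shown here is an assumption based on the usual MDP identity, not the paper's own equation.

\[
\frac{\partial \eta_{\theta}}{\partial \theta}
= \sum_{s} \pi_{\theta}(s) \sum_{a} \frac{\partial P_{\theta}(a \mid s)}{\partial \theta}\, Q_{\theta}(s,a),
\]

where $\eta_{\theta}$ is the long-run average reward, $\pi_{\theta}$ is the stationary state distribution under the parameterized policy, and $Q_{\theta}(s,a)$ are the Q-factors. Gradient ascent then updates $\theta \leftarrow \theta + \alpha\,\partial \eta_{\theta}/\partial \theta$ using online estimates of the Q-factors.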

5.
Two abstraction modes of Markov decision processes (total citations: 1; self-citations: 1; citations by others: 1)
Introducing Markov decision processes at an abstract level makes it possible to express complex Markov decision processes concisely and declaratively, addressing the problem of representing the large state spaces that conventional Markov decision processes (MDPs) encounter in practice. This paper introduces the basic concepts of two different types of abstract MDPs, structured and aggregated, together with exact or approximate algorithms for optimal policies in various typical abstract MDPs, including one algorithm fundamentally different from those for conventional MDPs: extending the Bellman equation to abstract state spaces. The research history of these models is summarized and some prospects for their development are given, so that readers can gain a thorough, comprehensive, and focused understanding of them.

6.
Markov Decision Processes (MDPs) are a formulation for optimization problems in sequential decision making. Solving MDPs often requires implementing a simulator for optimization algorithms to invoke when updating decision making rules known as policies. The combination of simulator and optimizer is subject to failures of specification, implementation, integration, and optimization that may produce invalid policies. We present these failures as queries for a visual analytic system (MDPVIS). MDPVIS addresses three visualization research gaps. First, the data acquisition gap is addressed through a general simulator-visualization interface. Second, the data analysis gap is addressed through a generalized MDP information visualization. Finally, the cognition gap is addressed by exposing model components to the user. MDPVIS generalizes a visualization for wildfire management. We use that problem to illustrate MDPVIS and show the visualization's generality by connecting it to two reinforcement learning frameworks that implement many different MDPs of interest in the research community.

7.
In this paper, two single sample path-based recursive approaches for Markov decision problems are proposed. One is based on the simultaneous perturbation approach and can be applied to the general state problem, but its convergence rate is low. In this algorithm, the small perturbation on current parameters is necessary to get another sample path for comparison, but it may worsen the system. Hence, we introduce another approach, which directly estimates the gradient of the performance for optimization by “potential” theory. This algorithm, however, is limited to finite state space systems, but its convergence speed is higher than the first one. The estimate for gradient can be obtained by using the sample path with current parameters without any perturbation. This approach is more acceptable for practical applications.
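A minimal sketch of the first (simultaneous-perturbation) idea is shown below, assuming a black-box function eta() that runs one sample path and returns its average cost for a given parameter vector. The gain sequences, the perturbation size, and the toy cost function are illustrative assumptions, not the paper's algorithm.

# Sketch of simultaneous perturbation stochastic approximation (SPSA) over a
# sample-path performance; eta(theta) is assumed to be a noisy black-box cost.
import numpy as np

def spsa_minimize(eta, theta0, iters=300, a=0.5, c=0.05, seed=0):
    """Estimate the gradient of eta(theta) from two perturbed sample paths per
    iteration and take a decreasing-step gradient step."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    for k in range(1, iters + 1):
        ak, ck = a / k, c / k ** 0.25                        # decaying gains (illustrative)
        delta = rng.choice([-1.0, 1.0], size=theta.shape)    # Rademacher perturbation
        g = (eta(theta + ck * delta) - eta(theta - ck * delta)) / (2 * ck * delta)
        theta -= ak * g                                       # descend the estimated gradient
    return theta

if __name__ == "__main__":
    # Toy quadratic "performance" standing in for a real sample-path simulator.
    noisy_cost = lambda th: float(np.sum((th - 3.0) ** 2)) + np.random.normal(scale=0.01)
    print(spsa_minimize(noisy_cost, np.zeros(2)))  # should end up near [3, 3]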

8.
By combining Markov decision processes with performance potentials, a policy optimization algorithm for call admission control is presented. The resulting optimal policy is state-dependent; compared with policies that choose actions based only on the bandwidth already occupied at a node, the state-dependent policy achieves better performance values, and the algorithm converges very quickly.

9.
Wireless sensor networks have been widely deployed in recent years, and energy consumption is a hot research topic in this field. As wireless sensor network technology continues to develop, multi-rate transmission is now commonly used in sensor networks, and this multi-rate capability offers an opportunity to further improve energy efficiency. This paper proposes a Markov decision process based approach to controlling the switching among the multiple transmission rates of a wireless sensor network, so as to make the network more energy efficient. Simulation results show that energy efficiency is improved without degrading communication quality.

10.
Kearns, Michael; Mansour, Yishay; Ng, Andrew Y. Machine Learning, 2002, 49(2-3): 193-208
A critical issue for the application of Markov decision processes (MDPs) to realistic problems is how the complexity of planning scales with the size of the MDP. In stochastic environments with very large or infinite state spaces, traditional planning and reinforcement learning algorithms may be inapplicable, since their running time typically grows linearly with the state space size in the worst case. In this paper we present a new algorithm that, given only a generative model (a natural and common type of simulator) for an arbitrary MDP, performs on-line, near-optimal planning with a per-state running time that has no dependence on the number of states. The running time is exponential in the horizon time (which depends only on the discount factor and the desired degree of approximation to the optimal policy). Our algorithm thus provides a different complexity trade-off than classical algorithms such as value iteration—rather than scaling linearly in both horizon time and state space size, our running time trades an exponential dependence on the former in exchange for no dependence on the latter. Our algorithm is based on the idea of sparse sampling. We prove that a randomly sampled look-ahead tree that covers only a vanishing fraction of the full look-ahead tree nevertheless suffices to compute near-optimal actions from any state of an MDP. Practical implementations of the algorithm are discussed, and we draw ties to our related recent results on finding a near-best strategy from a given class of strategies in very large partially observable MDPs (Kearns, Mansour, & Ng. Neural information processing systems 13, to appear).
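A compact recursive sketch of the sparse-sampling idea (sample a fixed number of successor states per action from a generative model and recurse to a fixed depth) is given below. The generative-model interface gen_model(state, action) -> (next_state, reward), the constants gamma, depth and width, and the chain toy problem are assumptions for illustration, not the paper's exact algorithm or constants.

# Sketch of sparse sampling: per-state planning cost independent of the number of states.
import random

def sparse_sample_q(gen_model, state, actions, gamma, depth, width):
    """Estimate Q(state, a) for every action by sampling `width` successors per action
    from the generative model and recursing to the given depth."""
    if depth == 0:
        return {a: 0.0 for a in actions}
    q = {}
    for a in actions:
        total = 0.0
        for _ in range(width):
            next_state, reward = gen_model(state, a)          # one sampled transition
            v_next = max(sparse_sample_q(gen_model, next_state, actions,
                                         gamma, depth - 1, width).values())
            total += reward + gamma * v_next
        q[a] = total / width
    return q

def sparse_sample_action(gen_model, state, actions, gamma=0.9, depth=3, width=5):
    """Return a near-optimal action at `state` from the sampled look-ahead tree."""
    q = sparse_sample_q(gen_model, state, actions, gamma, depth, width)
    return max(q, key=q.get)

def chain_model(s, a, n=4):
    """Toy generative model on a small chain: moves in direction `a` with prob. 0.8,
    reward 1 when the right end is reached (purely illustrative)."""
    s2 = max(0, min(n - 1, s + (a if random.random() < 0.8 else -a)))
    return s2, 1.0 if s2 == n - 1 else 0.0

if __name__ == "__main__":
    print(sparse_sample_action(chain_model, state=0, actions=[+1, -1]))
    # usually prints 1: moving right toward the rewarding end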

11.
We propose a novel approach, called parallel rollout, to solving (partially observable) Markov decision processes. Our approach generalizes the rollout algorithm of Bertsekas and Castanon (1999) by rolling out a set of multiple heuristic policies rather than a single policy. In particular, the parallel rollout approach aims at the class of problems where we have multiple heuristic policies available such that each policy performs near-optimally for a different set of system paths. Parallel rollout automatically combines the given multiple policies to create a new policy that adapts to the different system paths and improves the performance of each policy in the set. We formally prove this claim for two criteria: total expected reward and infinite horizon discounted reward. The parallel rollout approach also resolves the key issue of selecting which policy to roll out among multiple heuristic policies whose performances cannot be predicted in advance. We present two example problems to illustrate the effectiveness of the parallel rollout approach: a buffer management problem and a multiclass scheduling problem.
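The core of parallel rollout can be sketched in a few lines: for each candidate action, roll out every heuristic policy from the resulting state and act greedily with respect to the best of the estimated returns. In this sketch, sim(state, action) is assumed to return a (next_state, reward) pair sampled from a generative model, each element of policies is assumed to map a state to an action, and the rollout horizon and sample counts are illustrative assumptions.

def rollout_return(sim, state, policy, gamma=0.95, horizon=50):
    """Estimate the discounted return of following `policy` from `state` for one rollout."""
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        state, reward = sim(state, policy(state))
        total += discount * reward
        discount *= gamma
    return total

def parallel_rollout_action(sim, state, actions, policies, gamma=0.95, samples=10):
    """Pick the action whose one-step lookahead, followed by the BEST heuristic policy
    (the maximum over the policy set), has the highest estimated value."""
    best_action, best_value = None, float("-inf")
    for a in actions:
        value = 0.0
        for _ in range(samples):
            next_state, reward = sim(state, a)
            value += reward + gamma * max(
                rollout_return(sim, next_state, pi, gamma) for pi in policies)
        value /= samples
        if value > best_value:
            best_action, best_value = a, value
    return best_action

Taking the maximum over the policy set inside the lookahead is what lets the combined policy adapt to whichever heuristic happens to perform well on the current system path.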

12.
We will discuss an expected utility of rewards which are generated by Markov decision processes. This is applied to the optimal stopping problem with a utility treatment. Also a combined model of the decision processes and the stopping problem, called a stopped Markov decision process, is considered under the utility.

13.
洪晔, 边信黔. 计算机仿真, 2007, 24(6): 146-149
An autonomous underwater vehicle navigating a complex ocean environment needs to find a good motion path from a given start point to an end point, passing all obstacles safely and without collision. A new global path planning method is proposed that is based on partially observable Markov decision processes combined with prediction of obstacle motion. The mathematical model of the partially observable Markov decision process is given; a tree-structured, hierarchical partially observable Markov decision model is built and applied to path planning; two methods, short-term and long-term prediction, are proposed for predicting the trajectories of underwater obstacles. Finally, the global path planning capability of the AUV is verified through simulation experiments, laying a solid foundation for future trials on a real vehicle.

14.
Optimizing software testing with the cross-entropy method based on Markov decision processes (total citations: 3; self-citations: 1; citations by others: 2)
张德平, 聂长海, 徐宝文. 软件学报, 2008, 19(10): 2770-2779
This paper studies the software testing optimization problem of minimizing the average testing cost when certain parameters of the software under test are known. The software testing process is treated as a Markov decision process, a Markov decision model of software testing is given, and the cross-entropy method is used to obtain, through a learning strategy, the optimal test profile for optimizing software testing. Simulation results show that the test profile produced by the learning strategy outperforms a random testing strategy: to detect and remove the same number of software defects, the learning strategy needs significantly fewer test cases than random testing, reducing testing cost and improving defect detection efficiency.
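A generic cross-entropy loop of the kind alluded to above might look like the sketch below, which tunes a test profile (a probability distribution over input sub-domains) to minimize a simulated testing cost. The cost model, the Dirichlet sampling, the elite fraction and the smoothing constant are illustrative assumptions, not the paper's settings.

# Sketch of cross-entropy optimization over test profiles.
import numpy as np

def cross_entropy_profile(cost_fn, n_domains, iters=30, pop=100, elite_frac=0.1,
                          smooth=0.7, seed=0):
    """Sample candidate profiles concentrated around the current profile, keep the
    lowest-cost ("elite") samples, and move the profile toward their mean."""
    rng = np.random.default_rng(seed)
    profile = np.full(n_domains, 1.0 / n_domains)      # start from the uniform profile
    n_elite = max(1, int(elite_frac * pop))
    for _ in range(iters):
        candidates = rng.dirichlet(profile * 50.0 + 1.0, size=pop)
        costs = np.array([cost_fn(p) for p in candidates])
        elite = candidates[np.argsort(costs)[:n_elite]]
        profile = smooth * elite.mean(axis=0) + (1 - smooth) * profile
    return profile

if __name__ == "__main__":
    # Toy cost: testing is cheapest when the profile matches the (unknown) defect profile.
    defect_profile = np.array([0.5, 0.3, 0.15, 0.05])
    cost = lambda p: float(np.sum((p - defect_profile) ** 2))
    print(np.round(cross_entropy_profile(cost, 4), 3))  # should approach the defect profile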

15.
This article proposes several two-timescale simulation-based actor-critic algorithms for solution of infinite horizon Markov Decision Processes with finite state-space under the average cost criterion. Two of the algorithms are for the compact (non-discrete) action setting while the rest are for finite-action spaces. On the slower timescale, all the algorithms perform a gradient search over corresponding policy spaces using two different Simultaneous Perturbation Stochastic Approximation (SPSA) gradient estimates. On the faster timescale, the differential cost function corresponding to a given stationary policy is updated and an additional averaging is performed for enhanced performance. A proof of convergence to a locally optimal policy is presented. Next, we discuss a memory efficient implementation that uses a feature-based representation of the state-space and performs TD(0) learning along the faster timescale. The TD(0) algorithm does not follow an on-line sampling of states but is observed to do well on our setting. Numerical experiments on a problem of rate based flow control are presented using the proposed algorithms. We consider here the model of a single bottleneck node in the continuous time queueing framework. We show performance comparisons of our algorithms with the two-timescale actor-critic algorithms of Konda and Borkar (1999) and Bhatnagar and Kumar (2004). Our algorithms exhibit more than an order of magnitude better performance over those of Konda and Borkar (1999).

16.
Decision trees are an important method for data classification, and the choice of test attribute directly affects the number of nodes and the depth of the tree. This paper proposes a method based on knowledge roughness. Comparison shows that, for decision tree construction, the knowledge roughness method from rough set theory requires less computation, builds trees that are more compact than those of the classic ID3 algorithm, and achieves higher classification accuracy.
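A small sketch of how knowledge roughness from rough set theory can be computed for attribute selection is shown below. The roughness measure used here (one minus the ratio of lower- to upper-approximation sizes, averaged over decision classes) is the textbook formulation, and taking the lowest-roughness attribute as the test attribute is an assumed reading of the approach, not the paper's exact procedure.

# Sketch of attribute selection by knowledge roughness (rough set theory).
from collections import defaultdict

def partition(rows, attr):
    """Group row indices by their value of `attr` (the indiscernibility classes)."""
    blocks = defaultdict(set)
    for i, row in enumerate(rows):
        blocks[row[attr]].add(i)
    return list(blocks.values())

def roughness(rows, attr, decision):
    """1 - |lower approximation| / |upper approximation|, averaged over decision classes."""
    attr_blocks = partition(rows, attr)
    decision_classes = partition(rows, decision)
    total = 0.0
    for d in decision_classes:
        lower = sum(len(b) for b in attr_blocks if b <= d)    # blocks fully inside the class
        upper = sum(len(b) for b in attr_blocks if b & d)     # blocks touching the class
        total += 1.0 - (lower / upper if upper else 0.0)
    return total / len(decision_classes)

def best_test_attribute(rows, attrs, decision="class"):
    """Pick the attribute with the smallest roughness (sharpest approximation)."""
    return min(attrs, key=lambda a: roughness(rows, a, decision))

if __name__ == "__main__":
    data = [{"outlook": "sunny", "windy": True,  "class": "no"},
            {"outlook": "sunny", "windy": False, "class": "yes"},
            {"outlook": "rain",  "windy": True,  "class": "no"},
            {"outlook": "rain",  "windy": False, "class": "yes"}]
    print(best_test_attribute(data, ["outlook", "windy"]))
    # prints 'windy': it separates the decision classes exactly (roughness 0)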

17.
Decision trees are an important method for data classification, and the choice of test attribute directly affects the number of nodes and the depth of the tree. This paper proposes a method based on knowledge roughness. Comparison shows that, for decision tree construction, the knowledge roughness method from rough set theory requires less computation, builds trees that are more compact than those of the classic ID3 algorithm, and achieves higher classification accuracy.

18.
黄镇谨, 陆阳, 杨娟, 方欢. 计算机科学, 2013, 40(4): 263-266
Markov decision processes can model complex systems with uncertainty, and analyzing such models requires schedulers (policies) to resolve that uncertainty. First, the time- and space-bounded reachability probability problem under different schedulers is studied, and a definition and classification of uncertainty-resolving schedulers are given. Second, under time-independent schedulers, the equivalence of the bounded reachability probabilities obtained with deterministic and with randomized action selection is proved, and it is shown that time-dependent schedulers achieve better bounded reachability probabilities than time-independent ones. Finally, the correctness of the conclusions is briefly illustrated with an example.

19.
To address the uncertainty in the business logic and quality of service of Web services, as well as temporal-ordering and time-window constraints, this paper proposes a method that uses Markov decision theory to solve the optimal policy planning problem in Web service composition. The method first describes the Web service composition as a task network represented by a directed acyclic graph, in which each node represents a task. Each task is implemented by a corresponding Web service, the arcs between tasks represent temporal-ordering constraints, and task execution must satisfy time-window constraints. On this basis, a Markov decision model of the Web service composition is built, from which the optimal composition policy is obtained.

20.
Most of the methods that generate decision trees for a specific problem use the examples of data instances in the decision tree generation process. This article proposes a method called RBDT-1 (rule-based decision tree) for learning a decision tree from a set of decision rules that cover the data instances rather than from the data instances themselves. The goal is to create on demand a short and accurate decision tree from a stable or dynamically changing set of rules. The rules could be generated by an expert, by an inductive rule learning program that induces decision rules from the examples of decision instances such as AQ-type rule induction programs, or extracted from a tree generated by another method, such as ID3 or C4.5. In terms of tree complexity (number of nodes and leaves in the decision tree), RBDT-1 compares favorably with AQDT-1 and AQDT-2, which are methods that create decision trees from rules. RBDT-1 also compares favorably with ID3 and is as effective as C4.5; both ID3 and C4.5 are well-known methods that generate decision trees from data examples. Experiments show that the classification accuracies of the decision trees produced by all methods under comparison are indistinguishable.
