Similar Literature
20 similar documents retrieved.
1.
To address the impact of dynamically changing offloading scenarios on computation offloading in vehicular edge computing environments, a computation handover strategy based on a Markov decision process is proposed. On the basis of guaranteeing task completion time, the overall computation offloading process is analyzed, which further reduces the impact that introducing computation handover has on offloading performance. The simulation experiments compare four algorithms to examine whether introducing computation handover helps improve offloading performance and how to further reduce its impact on offloading. The results show that the proposed computation handover strategy improves the efficiency of computation offloading and guarantees the user's quality of service.
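For illustration, a minimal value-iteration sketch of such a "stay vs. switch" handover MDP is given below; the states, transition probabilities and cost terms are illustrative assumptions, not the paper's actual model.

```python
import numpy as np

# Minimal value-iteration sketch for a "stay vs. switch" offloading MDP.
# States describe the link to the current edge server; all numbers are
# illustrative assumptions, not taken from the paper.
states = ["good", "degrading", "lost"]
actions = ["stay", "switch"]

# P[a][s][s']: transition probabilities for each action.
P = {
    "stay":   np.array([[0.8, 0.2, 0.0],
                        [0.0, 0.6, 0.4],
                        [0.0, 0.0, 1.0]]),
    "switch": np.array([[0.9, 0.1, 0.0],
                        [0.9, 0.1, 0.0],
                        [0.9, 0.1, 0.0]]),
}
# Immediate cost: task latency in the current state plus a handover penalty.
latency = np.array([1.0, 3.0, 10.0])      # per-slot completion-time cost
switch_cost = 2.0                          # overhead of migrating the task

gamma = 0.9
V = np.zeros(len(states))
for _ in range(200):                       # value iteration to a fixed point
    Q = {a: latency + (switch_cost if a == "switch" else 0.0)
            + gamma * P[a] @ V for a in actions}
    V = np.minimum(Q["stay"], Q["switch"]) # minimize expected total cost

policy = [min(actions, key=lambda a: Q[a][i]) for i in range(len(states))]
print(dict(zip(states, policy)))           # e.g. switch once the link degrades
```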

2.
Taking the intelligent pricing problem of seller agents in B2B electronic markets as the application background, a learning method based on episodic sequence training is proposed, which applies the Q-learning algorithm on top of Cournot myopic adjustment. It combines reinforcement learning, which uses only outcomes as feedback, with a deliberative process aimed at reasoning, thereby improving the online learning performance of the algorithm. Simulation experiments verify the effectiveness of the algorithm and lay a foundation for practical application.
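A minimal sketch of the underlying idea follows: tabular Q-learning over discretized prices, trained episode by episode; the linear demand model, the rival's myopic response and all constants are illustrative assumptions.

```python
import random

# Hedged sketch: tabular Q-learning for a seller agent choosing among a few
# discrete prices, trained episode by episode.  The linear demand model, the
# rival's myopic response and all constants are illustrative assumptions.
prices = [4, 6, 8, 10]
Q = {(s, p): 0.0 for s in prices for p in prices}   # state = rival's last price
alpha, gamma, eps, episodes, steps = 0.1, 0.9, 0.1, 5000, 20

def profit(p, rival):
    demand = max(0.0, 20 - 1.5 * p + 0.5 * rival)   # assumed linear demand
    return p * demand

def rival_response(p):                               # myopic (Cournot-style) rival
    return min(prices, key=lambda r: abs(r - (p + 4) / 2))

for _ in range(episodes):                            # one episode = one price sequence
    s = random.choice(prices)
    for _ in range(steps):
        a = (random.choice(prices) if random.random() < eps
             else max(prices, key=lambda p: Q[(s, p)]))
        s_next = rival_response(a)
        r = profit(a, s_next)
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s_next, p)] for p in prices)
                              - Q[(s, a)])
        s = s_next

print({s: max(prices, key=lambda p: Q[(s, p)]) for s in prices})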

3.
4.
This paper proposes a fuzzy concept of the return cost of the Markov Decision Process (MDP) model, which is an application of dynamic programming to the solution of probabilistic decision processes. The return structure of the process is measured by Triangular Fuzzy Numbers (TFN). The comparison method is based on a ranking method.

The goal of this research is to provide optimal solutions for the finite-stage and infinite-stage cases, which can be manipulated to study real-world situations for the purpose of aiding the decision maker [6,7].
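A small sketch of working with triangular fuzzy returns is given below; the centroid ranking used for the comparison is an assumed choice, not necessarily the ranking method of the paper.

```python
from dataclasses import dataclass

@dataclass
class TFN:
    """Triangular fuzzy number (l, m, u) used as a fuzzy return."""
    l: float
    m: float
    u: float

    def __add__(self, other):            # fuzzy addition of returns
        return TFN(self.l + other.l, self.m + other.m, self.u + other.u)

    def scale(self, k):                  # probability weighting / discounting, k >= 0
        return TFN(k * self.l, k * self.m, k * self.u)

    def rank(self):                      # centroid ranking (an assumed choice)
        return (self.l + self.m + self.u) / 3.0

# Compare the one-step fuzzy returns of two decisions in a single state:
# expected fuzzy return = sum_j p_j * reward_j  (TFN arithmetic).
r_stay = TFN(2, 3, 4).scale(0.7) + TFN(0, 1, 2).scale(0.3)
r_move = TFN(1, 4, 6).scale(0.5) + TFN(1, 2, 3).scale(0.5)
best = max([("stay", r_stay), ("move", r_move)], key=lambda d: d[1].rank())
print(best[0], best[1])
```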


5.
Hierarchical algorithms for Markov decision processes have proved useful for problem domains with multiple subtasks. Although the existing hierarchical approaches are strong in task decomposition, they are weak in task abstraction, which is more important for task analysis and modeling. In this paper, we propose a task-oriented design to strengthen task abstraction. Our approach learns an episodic task model from the problem domain, with which the planner obtains the same control effect with a more concise structure and much better performance than the original model. According to our analysis and experimental evaluation, our approach outperforms existing hierarchical algorithms such as MAXQ and HEXQ.

6.
Many real-life critical systems are described with large models and exhibit both probabilistic and non-deterministic behaviour. Verification of such systems requires techniques to avoid the state-space explosion problem. Symbolic model checking and compositional verification such as assume-guarantee reasoning are two promising techniques to overcome this barrier. In this paper, we propose a probabilistic symbolic compositional verification approach (PSCV) to verify probabilistic systems in which each component is a Markov decision process (MDP). PSCV starts by encoding the system components implicitly using compact data structures. To establish the symbolic compositional verification process, we propose a sound and complete symbolic assume-guarantee reasoning rule. To attain completeness of this rule, we propose to model assumptions using interval MDPs. In addition, we give a symbolic MTBDD-learning algorithm to automatically generate the symbolic assumptions. Moreover, we propose to use causality to generate small counterexamples in order to refine the conjectured assumptions. Experimental results suggest a promising outlook for our probabilistic symbolic compositional approach.

7.
In this paper we consider a completely ergodic Markov decision process with finite state and decision spaces using the average return per unit time criterion. An algorithm is derived which approximates the optimal solution. It will be shown that this algorithm is finite and supplies upper and lower bounds for the maximal average return and a nearly optimal policy with average return between these bounds.
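A sketch of the classical bounding scheme such an algorithm relies on follows: relative value iteration whose successive differences yield upper and lower bounds on the maximal average return; the small random MDP is an illustrative assumption.

```python
import numpy as np

# Sketch of the classical bounding scheme for an ergodic average-reward MDP:
# iterate the optimality operator and use the span of successive differences
# as upper/lower bounds on the maximal average return g*.  The 3-state,
# 2-action model below is an illustrative assumption.
nS, nA = 3, 2
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(nS), size=(nA, nS))   # P[a, s, s']
R = rng.uniform(0, 1, size=(nA, nS))            # R[a, s]: expected one-step reward

V = np.zeros(nS)
eps = 1e-6
while True:
    Q = R + np.einsum("asx,x->as", P, V)        # one dynamic-programming step
    V_new = Q.max(axis=0)
    diff = V_new - V
    lower, upper = diff.min(), diff.max()       # lower <= g* <= upper
    V = V_new - V_new.min()                      # keep values bounded (relative VI)
    if upper - lower < eps:
        break

policy = Q.argmax(axis=0)                        # nearly optimal stationary policy
print("gain bounds:", lower, upper, "policy:", policy)
```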

8.
9.
The immune system is attracting attention as a new paradigm of biological information processing. It is a large-scale system equipped with a complicated biological defense function. It has functions of memory and learning that use interactions such as stimulation and suppression between immune cells. In this article, we propose and construct a reinforcement learning method based on an immune network adapted to a semi-Markov decision process (SMDP). We show through computer simulation that the proposed method is capable of dealing with a problem modeled as an SMDP environment. This work was presented in part at the 13th International Symposium on Artificial Life and Robotics, Oita, Japan, January 31–February 2, 2008.
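A minimal sketch of the SMDP-style Q update such a method builds on is shown below, with the discount raised to the sojourn time; the immune-network mechanics are not reproduced, and the toy environment is an assumption.

```python
import random

# Hedged sketch of an SMDP Q-learning update: actions take a variable sojourn
# time tau, so the discount factor is raised to tau.  The immune-network
# mechanics (stimulation/suppression between cells) are not reproduced here;
# the toy environment is an assumption.
gamma, alpha = 0.95, 0.1
states, actions = range(3), range(2)
Q = {(s, a): 0.0 for s in states for a in actions}

def step(s, a):
    """Assumed environment: returns (reward, sojourn time tau, next state)."""
    tau = random.randint(1, 4)                 # temporally extended action
    r = 1.0 if (s + a) % 3 == 2 else 0.0
    return r, tau, random.choice(list(states))

s = 0
for _ in range(20000):
    a = (random.choice(list(actions)) if random.random() < 0.1
         else max(actions, key=lambda x: Q[(s, x)]))
    r, tau, s2 = step(s, a)
    target = r + (gamma ** tau) * max(Q[(s2, x)] for x in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])  # SMDP Q-learning update
    s = s2

print({s: max(actions, key=lambda a: Q[(s, a)]) for s in states})
```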

10.
Multi-agent reinforcement learning has made remarkable progress in simulation, adversarial games, recommendation systems, and many other areas. However, the complexity of real-world problems leaves reinforcement learning with excessive ineffective exploration, slow training, and difficulty in continuously improving its learning ability. This study embeds rules into multi-agent reinforcement learning and proposes a way of combining rules and learning through joint training: a rule-fused multi-agent reinforcement learning model and a rule-selection model are designed separately and combined organically via joint training, so that the agent can decide, according to the current situation, whether to act by reinforcement learning or by rules, effectively solving the problems of which rules to use during learning and when to use them. The proposed method is experimentally analyzed and validated on the multi-agent confrontation platform released by China Electronics Technology Group Corporation. Against the built-in opponent, the rule-embedded method converges to a 60% win rate after about 14,000 training episodes, whereas the algorithm without embedded rules needs about 17,000 episodes to converge to a 50% win rate. The results show that the rule-embedded method effectively improves both the convergence speed and the final performance of learning.
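A minimal sketch of the rule/learning decision flow described above follows: a selector decides per situation whether a hand-written rule or the learned policy acts; the features, rules and selector condition are illustrative assumptions.

```python
import random

# Hedged sketch of a "rules + learning" decision flow: a selector decides, per
# situation, whether to act by a hand-written rule or by the learned policy.
# The features, rules and threshold are illustrative assumptions; in the paper
# the two models are trained jointly.
def rule_policy(obs):
    # Example rule: retreat when health is low, otherwise engage.
    return "retreat" if obs["health"] < 0.3 else "engage"

def learned_policy(obs):
    # Stand-in for the trained RL policy (e.g. a neural network).
    return random.choice(["engage", "flank", "hold"])

def selector(obs):
    # Stand-in for the rule-selection model: use rules in clear-cut
    # situations, fall back to the learned policy otherwise.
    return "rule" if obs["health"] < 0.3 or obs["enemies"] == 0 else "rl"

def act(obs):
    return rule_policy(obs) if selector(obs) == "rule" else learned_policy(obs)

print(act({"health": 0.2, "enemies": 3}))   # rule fires: "retreat"
print(act({"health": 0.9, "enemies": 2}))   # learned policy decides
```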

11.
Image denoising is a basic but important issue in the field of image processing. Most of the existing methods addressing this issue show desirable performance only when the image complies with their underlying assumptions. In particular, when there is more than one kind of noise, most existing methods may fail to handle the corresponding image. To address this problem, we propose a two-step image denoising method motivated by statistical learning theory. Under the proposed framework, the type and variance of the noise are first estimated with a support vector machine (SVM), and this information is then employed in the proposed denoising algorithm to further improve its denoising performance. Finally, a comparative study is conducted to demonstrate the advantages and effectiveness of the proposed method.
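A hedged two-step sketch of the idea: an SVM first classifies the noise type from simple residual statistics, then a filter matched to that type is applied; the features, noise models and filters are assumptions, not the paper's design.

```python
import numpy as np
from scipy.ndimage import median_filter, gaussian_filter
from scipy.stats import kurtosis
from sklearn.svm import SVC

# Hedged two-step sketch: (1) an SVM classifies the noise type from simple
# residual statistics, (2) a filter matched to that type is applied.  Features,
# noise models and filters are illustrative assumptions.
def features(img):
    resid = img - gaussian_filter(img, 1.0)          # high-frequency residual
    return [resid.std(), kurtosis(resid.ravel()),
            np.mean(np.abs(resid) > 3 * resid.std())]

def add_noise(img, kind, rng):
    if kind == "gaussian":
        return img + rng.normal(0, 0.1, img.shape)
    out = img.copy()                                  # salt-and-pepper (impulse)
    mask = rng.random(img.shape) < 0.1
    out[mask] = rng.choice([0.0, 1.0], size=mask.sum())
    return out

rng = np.random.default_rng(0)
clean = [rng.random((32, 32)) for _ in range(40)]
X, y = [], []
for img in clean:                                     # step 1: train the SVM
    for kind in ("gaussian", "impulse"):
        X.append(features(add_noise(img, kind, rng))); y.append(kind)
clf = SVC(kernel="rbf").fit(X, y)

def denoise(img):                                     # step 2: matched filter
    kind = clf.predict([features(img)])[0]
    return median_filter(img, 3) if kind == "impulse" else gaussian_filter(img, 1.0)

test = add_noise(clean[0], "impulse", rng)
print(clf.predict([features(test)])[0], denoise(test).shape)
```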

12.
In intelligent planning, finding a plan is an NP-hard or even NP-complete problem; when action effects are uncertain, as in planning problems for Markov decision processes, solving the planning problem becomes even harder. Existing planning algorithms for Markov decision processes typically use a single aggregate state node to describe the actual effect of an action, trying to sidestep the internal complexity of a state, whereas many real-world actions produce multiple propositional effects corresponding to multiple proposition nodes. To handle this problem, the concepts of image actions, image path nodes, and image planning graphs are introduced, and on this basis an ant colony planning algorithm for Markov decision processes is proposed, thereby solving the problem. It is also proved that the solution obtained by the algorithm has reliability no lower than a given probability even in an uncertain execution environment.
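A heavily hedged sketch of the ant-colony planning idea follows: pheromone is kept per state-action pair, ants sample trajectories through a stochastic toy domain, and successful trajectories reinforce their actions; the domain, evaporation rate and deposit rule are illustrative assumptions, not the paper's image-planning-graph construction.

```python
import random

# Heavily hedged sketch of ant-colony planning over a stochastic domain:
# pheromone per (state, action), ants sample trajectories, trajectories that
# reach the goal deposit pheromone.  The toy domain and update rules are
# illustrative assumptions.
states, actions, goal = list(range(5)), ["a", "b"], 4
tau = {(s, a): 1.0 for s in states for a in actions}   # pheromone trails

def execute(s, a):
    """Assumed uncertain action effect: the intended move succeeds with prob. 0.8."""
    intended = min(s + (2 if a == "b" else 1), goal)
    return intended if random.random() < 0.8 else s

def choose(s):                                          # pheromone-proportional choice
    total = sum(tau[(s, a)] for a in actions)
    r, acc = random.random() * total, 0.0
    for a in actions:
        acc += tau[(s, a)]
        if r <= acc:
            return a
    return actions[-1]

for _ in range(300):                                    # one ant per iteration
    s, path = 0, []
    for _ in range(10):
        a = choose(s); path.append((s, a)); s = execute(s, a)
        if s == goal:
            for sa in path:                             # deposit on success
                tau[sa] += 1.0 / len(path)
            break
    for key in tau:                                     # evaporation
        tau[key] *= 0.99

plan = {s: max(actions, key=lambda a: tau[(s, a)]) for s in states}
print(plan)
```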

13.
王学宁  贺汉根  徐昕 《控制与决策》2004,19(11):1263-1266
In partially observable Markov decision processes (POMDPs), perceptual aliasing means that memoryless policies obtained with algorithms such as Sarsa may oscillate. To solve this problem, a memory-based reinforcement learning algorithm, the CPnSarsa(λ) learning algorithm, is studied. By redefining states, the agent combines the observation history to identify aliased states. Applying the CPnSarsa(λ) algorithm to several typical POMDPs yields optimal or near-optimal policies, and compared with previous algorithms its convergence speed is greatly improved.
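A minimal sketch of the memory idea follows: ordinary tabular Sarsa(λ) whose state is the recent observation history, so aliased observations become distinguishable; the toy aliased corridor and all constants are assumptions, not CPnSarsa(λ) itself.

```python
import random
from collections import defaultdict

# Hedged sketch: tabular Sarsa(lambda) where the "state" fed to the learner is
# the recent observation history, so perceptually aliased observations become
# distinguishable.  The toy aliased corridor and constants are assumptions.
OBS = {0: "start", 1: "corridor", 2: "corridor", 3: "goal"}   # 1 and 2 are aliased
actions = ["left", "right"]
alpha, gamma, lam, eps, K = 0.1, 0.95, 0.8, 0.1, 2

Q = defaultdict(float)
def policy(h):
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(h, a)])

for _ in range(2000):
    pos, hist = 0, ("start",) * K                 # history of the last K observations
    a = policy(hist)
    E = defaultdict(float)                        # eligibility traces
    for _ in range(30):
        pos = max(0, min(3, pos + (1 if a == "right" else -1)))
        r = 1.0 if pos == 3 else -0.01
        h2 = (hist + (OBS[pos],))[-K:]
        a2 = policy(h2)
        delta = r + gamma * Q[(h2, a2)] - Q[(hist, a)]
        E[(hist, a)] += 1.0
        for key in list(E):                       # Sarsa(lambda) trace update
            Q[key] += alpha * delta * E[key]
            E[key] *= gamma * lam
        hist, a = h2, a2
        if pos == 3:
            break

print(max(actions, key=lambda a: Q[(("start", "corridor"), a)]))
```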

14.
徐明  刘广钟 《计算机应用》2015,35(11):3047-3050
To address the space-time uncertainty caused by the low bandwidth and high latency of underwater acoustic sensor networks, and the fact that the network state cannot be fully observed, a medium access control (MAC) protocol for underwater acoustic sensor networks based on the partially observable Markov decision process (POMDP) is proposed. The protocol first divides the link quality and residual energy of each sensor node into several discrete levels to express the node's state information. The receiving node then predicts the channel occupancy probability from the history of channel state observations and access actions, and thereby derives the optimal channel scheduling policy for the sending nodes; the sending nodes communicate with the receiving node in turn within their allocated time slots according to the scheduling sequence of this policy, transmitting data packets. After communication, the relevant nodes estimate the state of the next time slot from statistics of the network transition probabilities. Simulation experiments show that, compared with traditional MAC protocols for underwater acoustic sensor networks, the POMDP-based MAC protocol improves the packet delivery rate and network throughput while reducing network energy consumption.
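A minimal sketch of the belief bookkeeping implied by such a protocol follows: a two-state (idle/busy) Markov channel, noisy carrier-sense observations, and a Bayesian belief update used to predict the occupancy probability of the next slot; all probabilities are assumptions.

```python
import numpy as np

# Hedged sketch: the channel is modeled as a two-state Markov chain (idle/busy),
# observations are noisy carrier-sense results, and the receiver keeps a belief
# over the channel state to predict the occupancy probability of the next slot.
# All probabilities are illustrative assumptions.
T = np.array([[0.9, 0.1],     # T[s, s']: idle -> idle/busy
              [0.3, 0.7]])    #           busy -> idle/busy
O = np.array([[0.8, 0.2],     # O[s', o]: P(observe idle/busy | true state)
              [0.1, 0.9]])

def belief_update(b, obs):
    """One POMDP belief update: predict with T, correct with the observation."""
    predicted = b @ T
    corrected = predicted * O[:, obs]
    return corrected / corrected.sum()

b = np.array([0.5, 0.5])                     # initial belief over (idle, busy)
for obs in [1, 1, 0, 0, 0]:                  # a run of carrier-sense observations
    b = belief_update(b, obs)
    print("P(next slot busy) =", round((b @ T)[1], 3))
# A sending node would be scheduled in a slot only when the predicted busy
# probability falls below a threshold.
```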

15.
16.
Universal steganalysis of JPEG images based on statistical features
The differences in statistical features between the original image and the stego image before and after information embedding are analyzed. Based on the stego image, an image F2 is obtained by cropping and recompression to substitute for the original image; features are extracted using 15 vector functions for the experiments, and a support vector machine (SVM) is used to classify the images under test, achieving good results.
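A hedged sketch of the calibration idea follows: crop the suspect JPEG by a few pixels, recompress it to obtain an estimate F2 of the cover, and feed the difference of simple statistics to an SVM; the crop size, quality factor, toy embedding and histogram features are assumptions, not the paper's 15 feature functions.

```python
import io
import numpy as np
from PIL import Image
from sklearn.svm import SVC

# Hedged sketch of calibration-based steganalysis: the cropped, recompressed
# image "F2" approximates the cover, and the difference of simple statistics
# between the suspect image and F2 feeds an SVM.  Crop size, quality factor,
# toy LSB embedding and histogram features are illustrative assumptions.
def recompress(arr, quality=75):
    buf = io.BytesIO()
    Image.fromarray(arr).save(buf, format="JPEG", quality=quality)
    return np.asarray(Image.open(io.BytesIO(buf.getvalue())))

def calibration_features(arr):
    calibrated = recompress(arr[4:, 4:])              # cropped + recompressed "F2"
    h1, _ = np.histogram(arr[4:, 4:], bins=16, range=(0, 255), density=True)
    h2, _ = np.histogram(calibrated, bins=16, range=(0, 255), density=True)
    return h1 - h2                                    # suspect vs. cover estimate

rng = np.random.default_rng(0)
X, y = [], []
for _ in range(40):
    cover = recompress(rng.integers(0, 256, (64, 64), dtype=np.uint8))
    stego = cover.copy()
    stego ^= rng.integers(0, 2, stego.shape, dtype=np.uint8)   # toy LSB embedding
    X += [calibration_features(cover), calibration_features(stego)]
    y += [0, 1]

clf = SVC(kernel="rbf").fit(X, y)
print("training accuracy:", clf.score(X, y))
```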

17.
18.
We present the design and implementation of a flexible videoconference system (VCS) using multiagent computing technology. The proposed system, named FVCS, aims to reduce the burden on users in operational environments with insufficient computational resources, such as the Internet environment with small-scale computers at homes and offices, by embedding flexibility into the conventional videoconference system. In this paper, we design and implement FVCS with a knowledge-based multiagent framework to realize its adaptability. We also evaluate the adaptability of the FVCS prototype systems based on an operational situation observed in experiments. From the results of the experiments, we conclude that the multiagent-based design and implementation is reasonable for the construction of FVCS.

19.
Markov decision processes (MDP) are widely used in problems whose solutions may be represented by a certain series of actions. Many papers demonstrate successful MDP use in model problems, robotic control problems, planning problems, etc. In addition, economic problems have the property of multistep motion towards a goal as well. This paper is dedicated to the application of MDP to the problem of pricing policy management. The problem of dynamic pricing is stated in terms of MDP. Additional attention is paid to the method of constructing an MDP model based on data mining. Based on sales data from an actual industrial plant, the construction of an MDP model that includes the search for and generalization of regularities is demonstrated.
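A sketch of this pipeline follows: rewards and transition frequencies are estimated from a (synthetic) sales log, then value iteration on the resulting MDP yields a pricing policy; the states, prices and toy data generator are illustrative assumptions, not the plant's actual data.

```python
import numpy as np
from collections import defaultdict

# Hedged sketch: estimate an MDP model (rewards, transition frequencies) from a
# synthetic sales log, then run value iteration to obtain a pricing policy.
# States, prices and the toy data generator are illustrative assumptions.
prices = [10, 12, 15]
levels = ["low", "mid", "high"]                      # demand-level states
rng = np.random.default_rng(1)

# Synthetic sales log: (state, price, units sold, next state).
log = []
for _ in range(5000):
    s = rng.integers(3)
    p = rng.choice(prices)
    sold = int(rng.poisson(8 - 0.4 * p + 2 * s))
    s2 = min(2, max(0, s + rng.integers(-1, 2)))
    log.append((s, p, sold, s2))

# "Data mining" step: estimate rewards and transition frequencies from the log.
counts = defaultdict(lambda: np.zeros(3))
reward = defaultdict(list)
for s, p, sold, s2 in log:
    counts[(s, p)][s2] += 1
    reward[(s, p)].append(p * sold)

gamma, V = 0.9, np.zeros(3)
for _ in range(200):                                  # value iteration on the model
    Q = {(s, p): np.mean(reward[(s, p)])
         + gamma * (counts[(s, p)] / counts[(s, p)].sum()) @ V
         for s in range(3) for p in prices}
    V = np.array([max(Q[(s, p)] for p in prices) for s in range(3)])

policy = {levels[s]: max(prices, key=lambda p: Q[(s, p)]) for s in range(3)}
print(policy)
```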

20.
张汝波  孟雷  史长亭 《计算机应用》2015,35(8):2375-2379
To address the high repair cost and the only partially observable system environment in the software fault recovery of autonomous underwater vehicles (AUVs), an AUV software fault recovery method based on micro-reboot technology and a partially observable Markov decision process (POMDP) model is proposed. Exploiting the layered structure of the AUV software system, the method builds a three-layer micro-reboot structure that facilitates fine-grained self-repair micro-reboot policies. Based on POMDP theory, a POMDP model for AUV software self-repair is given, and the point-based value iteration (PBVI) algorithm is used to solve it and generate the repair policy, with the objective of minimizing the cumulative repair cost, so that the system can execute repair actions at low cost in a partially observable environment. Simulation results show that the proposed method can resolve AUV software faults caused by software aging and system calls, and that, compared with a two-layer micro-reboot policy and a fixed three-layer micro-reboot policy, it is clearly better in cumulative fault repair time and running stability.
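A hedged sketch of a point-based value iteration (PBVI) backup is given below for a tiny repair POMDP with a hidden healthy/faulty state, continue/micro-reboot actions and negative costs as rewards; the model numbers and belief set are illustrative assumptions, not the paper's AUV self-repair model.

```python
import numpy as np

# Hedged sketch of PBVI for a tiny "repair" POMDP: hidden state healthy/faulty,
# actions continue/micro-reboot, observations ok/alarm, rewards are negative
# repair costs.  All numbers and the belief set are illustrative assumptions.
S, A, Obs = 2, 2, 2                      # states {healthy, faulty}, actions, observations
T = np.array([[[0.95, 0.05],             # T[a, s, s']: a=0 continue
               [0.00, 1.00]],
              [[1.00, 0.00],             # a=1 micro-reboot (repairs the fault)
               [0.90, 0.10]]])
Z = np.array([[[0.9, 0.1],               # Z[a, s', o]: P(o | s', a), o in {ok, alarm}
               [0.2, 0.8]],
              [[0.9, 0.1],
               [0.2, 0.8]]])
R = np.array([[0.0, -5.0],               # R[a, s]: running while faulty is costly
              [-1.0, -3.0]])             # rebooting has its own (smaller) cost
gamma = 0.95
B = [np.array([1.0, 0.0]), np.array([0.5, 0.5]), np.array([0.1, 0.9])]  # belief points

Gamma = [np.zeros(S)]                    # initial alpha-vector set
for _ in range(60):                      # PBVI iterations over the fixed belief set
    new_Gamma = []
    for b in B:
        best_val, best_alpha = -np.inf, None
        for a in range(A):
            g_a = R[a].copy()
            for o in range(Obs):
                # g_{a,o}^i(s) = sum_{s'} T[a,s,s'] Z[a,s',o] alpha_i(s')
                cand = [T[a] @ (Z[a][:, o] * alpha) for alpha in Gamma]
                g_a = g_a + gamma * max(cand, key=lambda g: b @ g)
            if b @ g_a > best_val:
                best_val, best_alpha = b @ g_a, (a, g_a)
        new_Gamma.append(best_alpha)
    Gamma = [g for _, g in new_Gamma]
    policy = {tuple(b): a for b, (a, _) in zip(B, new_Gamma)}

print(policy)   # e.g. micro-reboot only when the belief in "faulty" is high
```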
