Similar Documents
20 similar documents found.
1.
A Bayesian network structure learning algorithm based on genetic algorithms and reinforcement learning
The genetic algorithm searches and optimizes over a problem's solution space by following the adaptive principles of biological inheritance observed in nature. Bayesian networks are a principal method for modeling and reasoning about uncertain knowledge, and learning in a Bayesian network (both parameter learning and structure learning) is an NP-hard problem. Reinforcement learning is an online learning method that updates its learned results from newly arriving sequential data. This paper describes how reinforcement learning can be used to guide a genetic algorithm so that Bayesian network structures are learned effectively.
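As a rough illustration of the genetic-algorithm side of such an approach (not the paper's implementation), the sketch below evolves Bayesian-network structures encoded as upper-triangular adjacency matrices. The scoring function, mutation schedule, and the place where a reinforcement signal adapts the mutation rate are all hypothetical stand-ins; a real system would score structures with BIC/BDeu on data.

```python
# Illustrative sketch only: GA over Bayesian-network adjacency matrices,
# with a hypothetical RL-style hook that adapts the mutation rate from
# whether the best fitness improved in the last generation.
import random

N_VARS = 4          # number of variables in the network
POP_SIZE = 20
GENERATIONS = 50

def random_dag():
    """Random upper-triangular adjacency matrix (guarantees acyclicity)."""
    return [[random.randint(0, 1) if j > i else 0 for j in range(N_VARS)]
            for i in range(N_VARS)]

def fitness(dag, data=None):
    """Placeholder score; a real implementation would use BIC/BDeu on data."""
    n_edges = sum(map(sum, dag))
    return -abs(n_edges - 3)        # pretend 3 edges is the 'true' complexity

def mutate(dag, rate):
    """Flip edges above the diagonal with probability `rate`."""
    return [[(cell ^ 1) if (j > i and random.random() < rate) else cell
             for j, cell in enumerate(row)] for i, row in enumerate(dag)]

population = [random_dag() for _ in range(POP_SIZE)]
mutation_rate, best_prev = 0.1, float("-inf")
for gen in range(GENERATIONS):
    scored = sorted(population, key=fitness, reverse=True)
    best_now = fitness(scored[0])
    # RL hook (hypothetical): treat improvement as reward and adapt the mutation rate.
    mutation_rate = max(0.02, mutation_rate * (0.9 if best_now > best_prev else 1.1))
    best_prev = best_now
    parents = scored[:POP_SIZE // 2]
    population = parents + [mutate(random.choice(parents), mutation_rate)
                            for _ in range(POP_SIZE - len(parents))]

print("best structure:", max(population, key=fitness))
```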

2.
In this paper, an adaptive reinforcement learning approach is developed for a class of discrete-time affine nonlinear systems with unmodeled dynamics. The multigradient recursive (MGR) algorithm is employed to solve the local-optimum problem inherent in the gradient descent method. The MGR radial basis function neural network approximates the utility functions and unmodeled dynamics, and has a faster rate of convergence than the gradient descent method. A novel strategic utility function and cost function are defined for the affine systems. Finally, it is concluded, via the differential Lyapunov function method, that all the signals in the closed-loop system are semiglobally uniformly ultimately bounded, and two simulation examples are presented to demonstrate the effectiveness of the proposed scheme.

3.
Flying ad hoc networks (FANETs) are widely used to provide network communication services in military, emergency-relief, and environmental-monitoring scenarios, and a good routing protocol is essential for reliable transmission under harsh communication conditions. Using reinforcement learning to cast route selection as a Markov decision process has become a research hotspot. To further introduce and analyze the state of research on reinforcement-learning-based FANET routing protocols, this paper first reviews recent improvements to traditional FANET routing protocols; it then presents the latest survey results on reinforcement-learning-based FANET routing protocols in detail, examines how states, actions, and rewards are typically modeled in these routing algorithms, and compares the protocols in terms of routing optimization criteria and the reinforcement learning optimization process; finally, it summarizes the current state of research on reinforcement-learning-based FANET routing and gives an outlook on future work.

4.
Neural reinforcement learning for behaviour synthesis
We present the results of research aimed at improving the Q-learning method through the use of artificial neural networks. Neural implementations are interesting because of their generalisation ability. Two implementations are proposed: one with a competitive multilayer perceptron and the other with a self-organising map. Results obtained on the task of learning an obstacle-avoidance behaviour for the miniature mobile robot Khepera show that the latter implementation is very effective, learning more than 40 times faster than the basic Q-learning implementation. These neural implementations are also compared with several Q-learning enhancements, such as Q-learning with Hamming distance, Q-learning with statistical clustering, and Dyna-Q.
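To make the self-organising-map idea concrete, the following is a minimal sketch (not the Khepera experiment itself) of Q-learning over SOM units: a small codebook quantizes continuous sensor vectors into discrete states, and tabular Q-learning runs on those states. The dimensions, learning rates, and toy environment are invented for illustration.

```python
# Sketch: a 1-D SOM quantizes continuous observations; tabular Q-learning
# is run over the best-matching units. All constants are illustrative.
import numpy as np

rng = np.random.default_rng(0)
N_UNITS, N_ACTIONS, OBS_DIM = 16, 3, 8
prototypes = rng.random((N_UNITS, OBS_DIM))        # SOM codebook vectors
Q = np.zeros((N_UNITS, N_ACTIONS))
alpha, gamma, eps, som_lr = 0.1, 0.95, 0.1, 0.05

def som_unit(obs, adapt=True):
    """Best-matching unit; optionally pull it (and its neighbours) toward obs."""
    d = np.linalg.norm(prototypes - obs, axis=1)
    bmu = int(np.argmin(d))
    if adapt:
        for j in (bmu - 1, bmu, bmu + 1):          # tiny neighbourhood update
            if 0 <= j < N_UNITS:
                prototypes[j] += som_lr * (obs - prototypes[j])
    return bmu

def toy_env_step(obs, action):
    """Stand-in environment: reward for keeping sensor readings low."""
    new_obs = np.clip(obs + rng.normal(0, 0.05, OBS_DIM) - 0.02 * action, 0, 1)
    return new_obs, -float(new_obs.mean())

obs = rng.random(OBS_DIM)
for step in range(5000):
    s = som_unit(obs)
    a = int(rng.integers(N_ACTIONS)) if rng.random() < eps else int(np.argmax(Q[s]))
    obs, r = toy_env_step(obs, a)
    s_next = som_unit(obs, adapt=False)
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
```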

5.
In this work, reinforcement learning (RL) is used to find an optimal policy for a marketing campaign. The data exhibit a complex characterization of the state and action spaces. Two approaches are proposed to circumvent this problem. The first approach is based on the self-organizing map (SOM), which is used to aggregate states. The second approach uses a multilayer perceptron (MLP) to carry out a regression of the action-value function. The results indicate that both approaches can improve a targeted marketing campaign. Moreover, the SOM approach allows an intuitive interpretation of the results, and the MLP approach yields robust results with generalization capabilities.

6.
To improve the control capability of time-delay systems in mobile sensor networks, a reinforcement-learning-based control model for such systems is proposed. A high-order approximate differential equation is used to construct the control objective function of the mobile sensor network time-delay system, maximum likelihood estimation is applied to estimate the network's delay parameters, and reinforcement learning is used for convergence control and adaptive scheduling of the network. A multidimensional measurement registration model for time-delay system control is established, and adaptive control of the time-delay system is achieved under a reinforced tracking-learning optimization scheme. Simulation results show that the proposed method provides good adaptivity, high accuracy in delay parameter estimation, and strong robustness of the control process.

7.
The dynamicity of available resources and network conditions, such as channel capacity and traffic characteristics, has posed major challenges to scheduling in wireless networks. Reinforcement learning (RL) enables wireless nodes to observe their respective operating environments, learn, and make optimal or near-optimal scheduling decisions. Learning, the main intrinsic characteristic of RL, enables wireless nodes to adapt over time to most forms of dynamicity in the operating environment. This paper presents an extensive review of the application of traditional and enhanced RL approaches to various types of scheduling schemes in wireless networks, namely packet, sleep-wake, and task schedulers, as well as the advantages and performance enhancements brought about by RL. Additionally, it presents how various challenges associated with scheduling schemes have been approached using RL. Finally, we discuss various open issues related to RL-based scheduling schemes in wireless networks in order to explore new research directions in this area. Discussions in this paper are presented in a tutorial manner in order to establish a foundation for further research in this field.

8.
In this paper, an integral reinforcement learning (IRL) algorithm on an actor-critic structure is developed to learn online the solution to the Hamilton-Jacobi-Bellman equation for partially unknown constrained-input systems. The technique of experience replay is used to update the critic weights to solve an IRL Bellman equation. This means that, unlike in existing reinforcement learning algorithms, recorded past experiences are used concurrently with current data to adapt the critic weights. It is shown that with this technique, instead of the traditional persistence-of-excitation condition, which is often difficult or impossible to verify online, an easy-to-check condition on the richness of the recorded data is sufficient to guarantee convergence to a near-optimal control law. Stability of the proposed feedback control law is shown, and the effectiveness of the proposed method is illustrated with simulation examples.
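The experience-replay idea for the critic can be sketched as follows: the integral Bellman residual is evaluated on the current sample and on a set of recorded samples simultaneously, and one gradient step reduces all residuals at once. The basis functions, dynamics, and dimensions below are illustrative only and are not the paper's formulation.

```python
# Hedged sketch of experience replay in an IRL critic update.
import numpy as np

def phi(x):
    """Hypothetical critic basis: quadratic features of a 2-D state."""
    return np.array([x[0]**2, x[0]*x[1], x[1]**2])

W = np.zeros(3)                 # critic weights, V(x) ~= W @ phi(x)
replay = []                     # recorded (phi(x_t), phi(x_{t+T}), integral reward)
lr = 0.05

def irl_residual(W, sample):
    phi_t, phi_next, r_int = sample
    # Integral Bellman relation: V(x_t) = r_int + V(x_{t+T})
    return W @ phi_t - r_int - W @ phi_next

def critic_update(W, current_sample):
    batch = replay + [current_sample]          # past experiences + current data
    grad = np.zeros_like(W)
    for s in batch:
        grad += irl_residual(W, s) * (s[0] - s[1])
    return W - lr * grad / len(batch)

# Fabricated data stream standing in for measured trajectories.
rng = np.random.default_rng(1)
for k in range(200):
    x_t, x_next = rng.normal(size=2), rng.normal(size=2)
    r_int = float(x_t @ x_t) * 0.1             # stand-in for the reward integral
    sample = (phi(x_t), phi(x_next), r_int)
    W = critic_update(W, sample)
    if len(replay) < 30:                       # keep a finite, "rich" record
        replay.append(sample)
```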

9.
In this paper, a new formulation for the optimal tracking control problem (OTCP) of continuous-time nonlinear systems is presented. This formulation extends the integral reinforcement learning (IRL) technique, a method for solving optimal regulation problems, to learn the solution to the OTCP. Unlike existing solutions to the OTCP, the proposed method does not need to have or to identify knowledge of the system drift dynamics, and it also takes into account the input constraints a priori. An augmented system composed of the error system dynamics and the command generator dynamics is used to introduce a new nonquadratic discounted performance function for the OTCP. This encodes the input constraints into the optimization problem. A tracking Hamilton–Jacobi–Bellman (HJB) equation associated with this nonquadratic performance function is derived, which gives the optimal control solution. An online IRL algorithm is presented to learn the solution to the tracking HJB equation without knowing the system drift dynamics. Convergence to a near-optimal control solution and stability of the whole system are shown under a persistence of excitation condition. Simulation examples are provided to show the effectiveness of the proposed method.
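For orientation, a commonly used form of such a discounted, nonquadratic performance function and the associated tracking HJB equation in this line of work is sketched below, assuming a tanh-type input bound λ, discount rate γ, and augmented (tracking error plus command generator) state X with dynamics F(X) + G(X)u; the paper's exact expressions may differ.

```latex
\begin{aligned}
V(X(t)) &= \int_t^{\infty} e^{-\gamma(\tau - t)}
           \Big[ X(\tau)^{\top} Q X(\tau) + W\big(u(\tau)\big) \Big]\, d\tau, \\[4pt]
W(u) &= 2 \int_0^{u} \lambda \tanh^{-1}\!\big(v/\lambda\big)^{\top} R \, dv
        \quad \text{(keeps } \lVert u \rVert \le \lambda \text{)}, \\[4pt]
0 &= X^{\top} Q X + W(u^{*}) - \gamma V^{*}(X)
     + \nabla V^{*}(X)^{\top}\big(F(X) + G(X)u^{*}\big).
\end{aligned}
```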

10.
RAM-based neural networks are designed to be efficiently implemented in hardware. The desire to retain this property influences the training algorithms used, and has led to the use of reinforcement (reward-penalty) learning. An analysis of the reinforcement algorithm applied to RAM-based nodes has shown the ease with which unlearning can occur. An amended algorithm is proposed which demonstrates improved learning performance compared to previously published reinforcement regimes.

11.
Wireless sensor networks are vulnerable to various insider attacks, and an intrusion detection system must spend a large amount of energy on attack detection to keep the network secure. For the intrusion detection problem in wireless sensor networks, an attack-defense game model between malicious nodes (MN) and cluster head nodes (CHN) is established, and a reinforcement-learning-based cluster head intrusion detection algorithm, the weighted policy learner with approximate policy prediction (WPL-APP), is proposed. Experiments show that when cluster head nodes use this algorithm for dynamic detection and defense against malicious nodes, the two players quickly reach an evolutionary equilibrium, avoiding both heavy detection energy consumption and fluctuations in network security performance.
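A heavily hedged sketch of a weighted-policy-learner style update for the cluster head's detect / don't-detect decision is given below. The payoff numbers, learning rates, and the simple running-average "policy prediction" of the attacker are all illustrative inventions, not the WPL-APP algorithm from the paper.

```python
# Sketch only: weighted policy-gradient update for a 2-action defender,
# against an approximately predicted attacker policy.
import random

eta = 0.01
pi_detect = 0.5                     # cluster head's probability of detecting
attack_est = 0.5                    # approximate prediction of attack probability

def payoff(detect, attack):
    """Made-up attack-defense payoffs for the cluster head (defender)."""
    if detect and attack:     return  0.7        # catch attack, pay detection energy
    if detect and not attack: return -0.3        # wasted detection energy
    if not detect and attack: return -1.0        # undetected attack
    return 0.0

for step in range(20000):
    attack = random.random() < 0.4               # unknown true attacker behaviour
    attack_est += 0.005 * ((1.0 if attack else 0.0) - attack_est)  # policy prediction
    # Expected payoff of each pure action against the predicted attacker policy.
    q_detect = attack_est * payoff(True, True) + (1 - attack_est) * payoff(True, False)
    q_pass   = attack_est * payoff(False, True) + (1 - attack_est) * payoff(False, False)
    v = pi_detect * q_detect + (1 - pi_detect) * q_pass
    grad = q_detect - v
    # Weighted-policy-learner style weighting: slow down near the boundary
    # the gradient points away from, which helps the dynamics converge.
    weight = pi_detect if grad < 0 else (1 - pi_detect)
    pi_detect = min(1.0, max(0.0, pi_detect + eta * grad * weight))

print("learned detection probability:", round(pi_detect, 3))
```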

12.
Parameter dynamics of RBF networks under the gradient algorithm
Analyzing the dynamics of the parameters during neural network learning is valuable for understanding the network's dynamical behavior and for improving its structure and performance. This paper studies the dynamics of the hidden-node parameters of an RBF network when the sum-of-squared-error loss function is optimized with the gradient algorithm, i.e., the values the hidden-node parameters may take after the algorithm converges. The main conclusions are: if the loss function is nonzero after convergence, each hidden node ends up at a weighted clustering center of the sample inputs; if the loss function is zero, the redundant hidden nodes in the network shrink, decay, drift outward, or coincide with one another. Further experiments show that in an oversized RBF network, shrinking, outward drift, decay, and coincidence of redundant hidden nodes occur frequently.
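The setting analysed can be reproduced with a short sketch: an RBF network trained by plain gradient descent on a sum-of-squared-error loss, whose hidden-node centres, widths, and output weights can then be inspected after convergence. The data, network size, and learning rate below are toy values, not the paper's experiments.

```python
# Sketch: gradient descent on an (oversized) RBF network, for inspecting
# where the hidden-node parameters end up after convergence.
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (200, 1))                 # 1-D sample inputs
y = np.sin(3 * X[:, 0])                          # target function

H = 8                                            # deliberately oversized hidden layer
centers = rng.uniform(-1, 1, H)
widths = np.full(H, 0.5)
weights = rng.normal(0, 0.1, H)
lr = 0.02

def forward(x):
    act = np.exp(-((x[:, None] - centers) ** 2) / (2 * widths ** 2))
    return act, act @ weights

for epoch in range(3000):
    act, pred = forward(X[:, 0])
    err = pred - y
    # Gradients of 0.5 * sum(err^2) w.r.t. output weights, centres, widths.
    g_w = act.T @ err
    g_c = (err[:, None] * act * weights * (X - centers) / widths ** 2).sum(axis=0)
    g_s = (err[:, None] * act * weights * (X - centers) ** 2 / widths ** 3).sum(axis=0)
    weights -= lr * g_w / len(X)
    centers -= lr * g_c / len(X)
    widths = np.clip(widths - lr * g_s / len(X), 0.05, None)   # keep widths positive

# After convergence one can check which nodes have shrunk (small widths),
# drifted outside the input range, decayed (small weights), or coincided.
print(np.round(centers, 2), np.round(widths, 2), np.round(weights, 2))
```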

13.
Learning complex control behaviour for autonomous mobile robots is one of the current research topics. In this article, an intelligent control architecture is presented which integrates learning methods with available domain knowledge. This control architecture is based on reinforcement learning and allows continuous input and output parameters, hierarchical learning, multiple goals, self-organized topology of the networks used, and online learning. As a testbed, this architecture is applied to the six-legged walking machine LAURON to learn leg control and leg coordination.

14.
Reinforcement learning is about learning agent models that make the best sequential decisions in unknown environments. In an unknown environment, the agent needs to explore the environment while exploiting the collected information, which usually forms a sophisticated problem to solve. Derivative-free optimization, meanwhile, is capable of solving sophisticated problems. It commonly uses a sampling-and-updating framework to iteratively improve the solution, in which exploration and exploitation also need to be well balanced. Therefore, derivative-free optimization deals with a core issue similar to that of reinforcement learning, and has been introduced into reinforcement learning approaches under the names of learning classifier systems and neuroevolution/evolutionary reinforcement learning. Although such methods have been developed for decades, derivative-free reinforcement learning has recently been attracting increasing attention; however, a recent survey on this topic is still lacking. In this article, we summarize the methods of derivative-free reinforcement learning to date and organize them by aspects including parameter updating, model selection, exploration, and parallel/distributed methods. Moreover, we discuss some current limitations and possible future directions, hoping that this article can bring more attention to this topic and serve as a catalyst for developing novel and efficient approaches.
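The sampling-and-updating framework mentioned above can be illustrated with a minimal derivative-free (evolutionary) policy search: policy parameters are perturbed, whole-episode returns are evaluated, and the best candidates are recombined, with no gradient of the policy ever computed. The toy task and all constants below are invented for illustration.

```python
# Minimal sketch of derivative-free reinforcement learning via a simple
# elite-selection evolution strategy over linear policy parameters.
import numpy as np

rng = np.random.default_rng(0)

def episode_return(theta):
    """Toy 'drive a point to the origin' task; return is the negative total cost."""
    x, total = np.array([1.0, -1.0]), 0.0
    for _ in range(50):
        u = np.clip(theta @ x, -1, 1)            # linear state-feedback policy
        x = x + 0.1 * np.array([x[1], u])        # crude double-integrator dynamics
        total -= float(x @ x) + 0.01 * float(u) ** 2
    return total

theta = np.zeros(2)
sigma, n_samples, n_elite = 0.5, 30, 5
for gen in range(60):
    # Sampling-and-updating loop: perturb, evaluate, keep the elite, recombine.
    candidates = theta + sigma * rng.normal(size=(n_samples, 2))
    returns = np.array([episode_return(c) for c in candidates])
    elite = candidates[np.argsort(returns)[-n_elite:]]
    theta = elite.mean(axis=0)
    sigma = max(0.05, sigma * 0.97)              # slowly shift from explore to exploit

print("learned policy gains:", theta, "return:", episode_return(theta))
```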

15.
Cognitive radio networks (CRNs) enable unlicensed users (or secondary users, SUs) to sense for and opportunistically operate in underutilized licensed channels, which are owned by the licensed users (or primary users, PUs). The CRN has been regarded as the next-generation wireless network centered on the application of artificial intelligence, which helps the SUs to learn about, as well as to adaptively and dynamically reconfigure, their operating parameters, including the sensing and transmission channels, for network performance enhancement. This motivates the use of artificial intelligence to enhance security schemes for CRNs. Provisioning security in CRNs is challenging because existing techniques, such as entity authentication, are not feasible in the dynamic environment that a CRN presents, since they require pre-registration. In addition, these techniques cannot prevent an authenticated node from acting maliciously. In this article, we advocate the use of reinforcement learning (RL) to achieve optimal or near-optimal solutions for security enhancement through the detection of various malicious nodes and their attacks in CRNs. RL, an artificial intelligence technique, has the ability to learn new attacks and to detect previously learned ones, and has been perceived as a promising approach to enhance the overall security of CRNs. RL, which has been applied to address the dynamic aspects of security schemes in other wireless networks, such as wireless sensor networks and wireless mesh networks, can be leveraged to design security schemes in CRNs. We believe that these RL solutions will complement and enhance existing security solutions applied to CRNs. To the best of our knowledge, this is the first survey article that focuses on the use of RL-based techniques for security enhancement in CRNs.

16.
Shu Lingzhou, Wu Jia, Wang Chen. Journal of Computer Applications, 2019, 39(5): 1495-1499
To address how relevant information can be used effectively to optimize urban traffic signal control while keeping the control algorithm adaptive and robust, a deep reinforcement learning traffic signal control algorithm is proposed in which a deep network builds an agent that controls the traffic of an entire region. First, the agent continuously perceives the state of the traffic environment and selects the control policy that appears optimal in the current state; the environment state is abstracted as a position matrix and a velocity matrix, a representation that captures the main information in the environment while removing redundant information. Then, with the goal of maximizing the global vehicle speed within a finite time, the agent uses a reinforcement learning algorithm to keep adjusting its internal parameters according to the effect of the chosen policy on the traffic environment. Finally, after many iterations, the agent learns how to control the traffic effectively. Experiments in the microscopic traffic simulator Vissim show that, compared with other deep reinforcement learning algorithms, the proposed algorithm achieves better results in global average speed, average queue length, and stability; compared with the baseline, the average speed is increased by 9% and the average queue length is reduced by about 13.4%. The experimental results demonstrate that the method can adapt to a dynamically changing, complex traffic environment.
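The state representation and learning loop described in the abstract can be sketched roughly as follows: the intersection state is abstracted into a position matrix and a velocity matrix, and an agent picks signal phases to maximise overall speed. A linear Q-function stands in for the deep network, the simulator is a random stub rather than Vissim, and all shapes and rewards are invented.

```python
# Sketch only: position/velocity-matrix state encoding with a linear Q-learner
# choosing among candidate signal phases.
import numpy as np

rng = np.random.default_rng(0)
GRID = (8, 8)                    # discretised lane cells around the intersection
N_ACTIONS = 4                    # candidate signal phases
STATE_DIM = 2 * GRID[0] * GRID[1]

W = np.zeros((N_ACTIONS, STATE_DIM))          # linear stand-in for the deep Q-network
alpha, gamma, eps = 0.01, 0.95, 0.1

def encode_state(positions, speeds):
    """Stack the position matrix and the velocity matrix into one feature vector."""
    return np.concatenate([positions.ravel(), speeds.ravel()])

def fake_sim_step(action):
    """Stub simulator: random matrices; reward is the mean speed of occupied cells."""
    positions = (rng.random(GRID) < 0.3).astype(float)
    speeds = rng.random(GRID) * positions * (0.5 + 0.1 * action)
    reward = float(speeds.sum() / max(positions.sum(), 1.0))
    return positions, speeds, reward

positions, speeds, _ = fake_sim_step(0)
s = encode_state(positions, speeds)
for step in range(10000):
    a = int(rng.integers(N_ACTIONS)) if rng.random() < eps else int(np.argmax(W @ s))
    positions, speeds, r = fake_sim_step(a)
    s_next = encode_state(positions, speeds)
    td_error = r + gamma * np.max(W @ s_next) - W[a] @ s
    W[a] += alpha * td_error * s                 # gradient step on the TD error
    s = s_next
```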

17.
18.
Introducing intelligent sensing and control into traffic signal control systems is now a clear trend, and reinforcement learning and deep reinforcement learning in particular show major technical advantages in scalability, stability, and generalizability, making them a research hotspot in this field. This paper studies reinforcement-learning-based traffic signal control. Building on an extensive survey of research on traffic signal control methods, it systematically organizes the classification and applications of reinforcement learning and deep reinforcement learning in intelligent traffic signal control; it summarizes feasible multi-agent cooperative approaches to large-scale traffic signal control and categorizes the traffic-scenario factors that affect large-scale control; and, from the perspective of improving traffic signal controller performance, it identifies the current challenges in this field and promising directions for future research.

19.
The reinforcement and imitation learning paradigms have the potential to revolutionise robotics. Many successful developments have been reported in the literature; however, these approaches have not been explored widely in robotics for construction. The objective of this paper is to consolidate, structure, and summarise research knowledge at the intersection of robotics, reinforcement learning, and construction. A two-strand approach to the literature review was employed: a bottom-up approach that analyses a selected number of relevant publications in detail, and a top-down approach in which a large number of papers were analysed to identify common relevant themes and research trends. This study found that research on robotics for construction has not increased significantly since the 1980s in terms of the number of publications. Also, robotics for construction lacks the development of dedicated systems, which limits their effectiveness. Moreover, unlike manufacturing, construction's unstructured and dynamic characteristics are a major challenge for reinforcement and imitation learning approaches. This paper provides a useful starting point for understanding research on robotics for construction by (i) identifying the strengths and limitations of the reinforcement and imitation learning approaches, and (ii) contextualising the construction robotics problem, both of which will help kick-start research on the subject or boost existing research efforts.

20.
A key characteristic of control system design using a universal learning network (ULN) is that both the controlled system and its controller are represented in a unified framework, and that the learning stage of the ULN can be executed using not only first-order derivatives (gradients) but also higher-order derivatives of the criterion function with respect to the parameters. ULNs have the same generalization ability as neural networks, so a ULN controller is able to control the system well in an environment that differs little from the environment at the learning stage. However, stability cannot be sufficiently realized. In this paper, we propose a robust control method using a ULN and its second-order derivatives. Robust control, as considered here, is defined as follows: even though the initial values of the node outputs are very different from those at the learning stage, the control system is able to reduce their influence on other node outputs and can control the system as in the case of no variation. In order to realize such robust control, a new term concerning the variation is added to the usual criterion function, and the parameters are adjusted so as to minimize this criterion function using second-order derivatives of the criterion function with respect to the parameters. Finally, it is shown that the ULN controller constructed by the proposed method works effectively in a simulation study of a non-linear crane system. This work was presented, in part, at the International Symposium on Artificial Life and Robotics, Oita, Japan, February 18–20, 1996.
