Similar Documents (20 results)
1.

To deepen reinforcement learning research and broaden its range of practical applications, this paper classifies reinforcement learning from the perspective of its theoretical foundation, namely the representation and use of knowledge, and discusses and compares classical stochastic reinforcement learning, fuzzy reinforcement learning, qualitative reinforcement learning, and grey reinforcement learning in some detail. Finally, the future development of reinforcement learning is considered from the perspective of knowledge representation and use.


2.
A knowledge graph is a data representation that uses a graph structure to model entities and the relations between them; it is an important foundation for cognitive intelligence and has received wide attention in both academia and industry. Knowledge graph research mainly comprises four parts: knowledge representation, knowledge extraction, knowledge fusion, and knowledge reasoning. Several challenges remain. For example, knowledge extraction struggles to obtain annotated data, while the training samples produced by distant supervision are noisy; the interpretability and trustworthiness of knowledge reasoning still need improvement; knowledge representation methods rely on manually defined rules or prior knowledge; and knowledge fusion methods do not fully model the interdependencies among entities. Environment-driven reinforcement learning algorithms are well suited to sequential decision problems; by formulating knowledge graph problems as path (sequence) problems, reinforcement learning can be applied to address the issues above, which is of considerable practical value. This paper first reviews the fundamentals of knowledge graphs and reinforcement learning, then gives a comprehensive survey of reinforcement-learning-based knowledge graph research, describes how such methods are applied in practical domains such as intelligent recommendation, dialogue systems, game playing, biomedicine, finance, and security, and finally discusses future directions for combining knowledge graphs with reinforcement learning.

3.
Knowledge reasoning is an important way to address missing knowledge in knowledge graphs. To tackle the poor interpretability and the low accuracy and efficiency of reasoning methods on large-scale knowledge graphs, this paper proposes RLPTransE, a method that combines knowledge representation learning with deep reinforcement learning. Knowledge representation learning maps the knowledge graph into a vector space that carries the semantic information of triples, and a reinforcement learning environment is built in that space. By training a single-step selection policy network and a multi-step reasoning policy network, the agent efficiently mines inference rules while interacting with the environment and uses them to complete the reasoning. Experiments on public datasets show that, compared with other state-of-the-art methods, the proposed method performs better on large-scale reasoning tasks.
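A minimal illustrative sketch (not the authors' code) of the kind of environment such a method operates in: states live in an embedding space, actions follow outgoing edges, and the agent is rewarded for reaching the answer entity. All names, dimensions, and the reward scheme are assumptions made for illustration.

```python
import numpy as np

class EmbeddingPathEnv:
    """Toy path-reasoning environment whose states are embedding vectors."""
    def __init__(self, triples, dim=50, seed=0):
        rng = np.random.default_rng(seed)
        ents = {e for h, _, t in triples for e in (h, t)}
        rels = {r for _, r, _ in triples}
        self.e_emb = {e: rng.normal(size=dim) for e in ents}
        self.r_emb = {r: rng.normal(size=dim) for r in rels}
        self.out_edges = {}
        for h, r, t in triples:
            self.out_edges.setdefault(h, []).append((r, t))

    def reset(self, source, query_rel, target):
        self.cur, self.query, self.target = source, query_rel, target
        return self._state()

    def _state(self):
        # State = [current-entity embedding ; query-relation embedding]
        return np.concatenate([self.e_emb[self.cur], self.r_emb[self.query]])

    def actions(self):
        return self.out_edges.get(self.cur, [])

    def step(self, action_idx):
        rel, nxt = self.actions()[action_idx]
        self.cur = nxt
        done = nxt == self.target
        return self._state(), (1.0 if done else 0.0), done

# Usage: walk from "Paris" towards "Europe" for the query relation "part_of".
env = EmbeddingPathEnv([("Paris", "capital_of", "France"),
                        ("France", "part_of", "Europe")])
state = env.reset("Paris", "part_of", "Europe")
state, reward, done = env.step(0)   # follow "capital_of" to France
```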

4.
In recent years, deep reinforcement learning has been widely and successfully applied to sequential decision-making, and it is particularly strong in settings with high-dimensional inputs and large state spaces. However, related methods also have limitations, such as poor interpretability, inefficient early-stage training, and cold-start problems. To address these issues, this paper proposes a dynamic decision framework that combines explicit knowledge reasoning with deep reinforcement learning. The framework embeds human prior knowledge into agent training via explicit knowledge representation, so that the agent's decisions are guided by the results of knowledge reasoning during reinforcement learning, which improves training efficiency and model interpretability. Explicit knowledge is divided into two kinds: heuristic acceleration knowledge and avoidance-oriented safety knowledge. The former intervenes in the agent's decisions early in training to speed up learning, while the latter prevents the agent from taking catastrophic decisions and makes training more stable. Experiments show that the framework clearly improves training efficiency and model interpretability across different reinforcement learning algorithms and application scenarios.
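An illustrative sketch (under assumptions of my own, not the paper's implementation) of how the two kinds of explicit knowledge described above could intervene in action selection: safety rules veto actions judged catastrophic, and a heuristic rule biases choices during early training. The rule functions are hypothetical placeholders supplied by the caller.

```python
import random

def choose_action(state, q_values, actions, episode,
                  heuristic_rule, safety_rule, epsilon=0.1, warmup=100):
    # Avoidance-style safety knowledge: drop actions the rule marks as unsafe.
    safe_actions = [a for a in actions if not safety_rule(state, a)]
    if not safe_actions:
        safe_actions = actions                     # fall back if everything is vetoed

    # Heuristic acceleration knowledge: early in training, follow the prior
    # knowledge's suggestion when it proposes a safe action.
    if episode < warmup:
        suggested = heuristic_rule(state)
        if suggested in safe_actions:
            return suggested

    # Otherwise plain epsilon-greedy over the remaining safe actions.
    if random.random() < epsilon:
        return random.choice(safe_actions)
    return max(safe_actions, key=lambda a: q_values.get((state, a), 0.0))
```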

5.
Knowledge reasoning is an important method for knowledge graph completion and already plays a significant role in applications such as vertical search and intelligent question answering. As applied research on knowledge reasoning deepens, its interpretability has attracted wide attention. Knowledge reasoning methods based on deep reinforcement learning offer better interpretability and stronger reasoning ability, and can make fuller use of the entity and relation information in a knowledge graph, leading to better reasoning results. This paper briefly introduces knowledge graphs and the state of their research, presents the basic concepts of knowledge reasoning and recent progress, analyzes and compares current deep-reinforcement-learning-based knowledge reasoning methods from the two angles of closed-domain and open-domain reasoning, summarizes the datasets and evaluation metrics involved, and discusses future research directions.

6.
Deep reinforcement learning is a hot topic in artificial intelligence research, but as the work has deepened its shortcomings have become apparent: low data efficiency, weak generalization, difficult exploration, and a lack of reasoning and representation ability, all of which severely limit the application of deep reinforcement learning methods to real-world problems. Knowledge transfer is a very effective way to address these problems. From the perspective of deep reinforcement learning, this paper discusses how knowledge transfer can accelerate agent training and cross-domain transfer, analyzes the forms in which knowledge exists in deep reinforcement learning and the ways it acts, and categorizes knowledge transfer methods according to the basic components of reinforcement learning. Finally, it summarizes open problems and future directions of knowledge transfer in deep reinforcement learning in terms of algorithms, theory, and applications.

7.
This paper proposes a knowledge transfer method for reinforcement learning based on qualitative fuzzy networks. The method builds a qualitative model of the system and uses a qualitative fuzzy network to extract the common features of near-optimal policies defined over qualitative actions, yielding knowledge that is independent of the system parameters. Such knowledge effectively captures the control laws shared by systems with different parameter values and speeds up the convergence of reinforcement learning on a system with new parameter values.

8.
Knowledge reasoning is an important method for completing knowledge graphs; it aims to infer unknown facts or relations from the knowledge already present in the graph. Most reasoning methods still do not take full advantage of the path information between entity pairs and suffer from low reasoning efficiency and poor interpretability. To address this, a knowledge reasoning method called TuckRL (TuckER embedding with reinforcement learning) is proposed, which combines TuckER embeddings with reinforcement learning. Entities and relations are first mapped into a low-dimensional vector space via TuckER embedding, and the path-reasoning process is modeled with a policy-guided reinforcement learning algorithm in the knowledge graph environment. During action selection along the path walk, an action-pruning mechanism reduces the interference of invalid actions, and an LSTM serves as a memory component that stores the agent's historical action trajectory, encouraging more accurate selection of valid actions; reasoning is completed through interaction with the knowledge graph. Experiments on three mainstream large-scale datasets show that TuckRL outperforms most existing reasoning methods, demonstrating the effectiveness of combining embeddings with reinforcement learning for knowledge reasoning.
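For context, the TuckER scoring function that such a method builds on contracts a learned core tensor with the head-entity, relation, and tail-entity embeddings. The sketch below shows that contraction only; dimensions and the random initialization are illustrative, and the reinforcement learning part is omitted.

```python
import numpy as np

de, dr = 20, 10                       # entity / relation embedding sizes (illustrative)
rng = np.random.default_rng(0)
W = rng.normal(size=(de, dr, de))     # core tensor

def tucker_score(e_h, w_r, e_t, core=W):
    # score(h, r, t) = W x_1 e_h x_2 w_r x_3 e_t
    return float(np.einsum("ijk,i,j,k->", core, e_h, w_r, e_t))

e_h, w_r, e_t = rng.normal(size=de), rng.normal(size=dr), rng.normal(size=de)
print(tucker_score(e_h, w_r, e_t))    # higher score = triple judged more plausible
```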

9.
In knowledge reasoning, the action space of each node grows sharply as the length of the reasoning path increases, making reasoning ever harder. To address this, a knowledge reasoning method based on hierarchical reinforcement learning (MutiAg-HRL) is proposed to reduce the size of the action space during reasoning. MutiAg-HRL first calls a high-level agent to reason coarsely over the relations in the knowledge graph: by computing the similarity between candidate next-step relations and the given query relation, it determines the approximate location of the target entity. Guided by the relation chosen by the high-level agent, a low-level agent then carries out fine-grained reasoning and selects the next action. The model also constructs an interaction reward mechanism that promptly rewards the relation and action choices of both agents, preventing the sparse-reward problem. To verify the effectiveness of the method, experiments were conducted on the FB15K-237 and NELL-995 datasets, comparing against 11 mainstream methods including TransE, MINERVA, and HRL; on the link prediction task, MutiAg-HRL improves hits@k by 1.85% and MRR by 2% on average.
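A minimal sketch of the high-level step described above, under simplifying assumptions of my own: candidate relations are scored by cosine similarity to the query relation, and the best one is handed to the low-level agent, which then only considers edges with that label. The embeddings here are random stand-ins.

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def high_level_pick(query_rel, candidate_rels, rel_emb):
    sims = [cosine(rel_emb[query_rel], rel_emb[r]) for r in candidate_rels]
    return candidate_rels[int(np.argmax(sims))]

rng = np.random.default_rng(1)
rel_emb = {r: rng.normal(size=16)
           for r in ["nationality", "born_in", "works_for", "located_in"]}
best = high_level_pick("nationality", ["born_in", "works_for", "located_in"], rel_emb)
# The low-level agent would restrict its action space to edges labelled `best`.
```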

10.
Multi-hop reasoning models mine and exploit the multi-step relations between entities in a knowledge graph, composing them into path information to complete knowledge reasoning. However, most current multi-hop reasoning models for sparse knowledge graphs suffer from scarce data and low reliability of the reasoning paths. To address these problems, this paper proposes a multi-hop reasoning model for knowledge graphs that incorporates semantic information. First, the entities and relations of the knowledge graph are embedded into a vector space, which serves as the external environment for reinforcement learning training. Then, using the semantic information of the query relation and the reasoning path, the (relation, entity) pairs with the highest similarity are selected to expand the action space available to the agent during path search, compensating for the scarcity of data during reasoning. Finally, the semantic similarity between the reasoning path and the query relation is used to evaluate the reliability of the reasoning path and is fed back to the agent as a reward function. Experiments on several public sparse datasets show that the model clearly improves reasoning performance.
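The reward described above can be illustrated with a small sketch under assumptions of my own: relation embeddings are random stand-ins, and the path is summarized by averaging its relation embeddings, which need not be the paper's exact formula.

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def path_reward(path_relations, query_relation, rel_emb):
    """Reward = semantic similarity between the walked path and the query relation."""
    path_vec = np.mean([rel_emb[r] for r in path_relations], axis=0)
    return cosine(path_vec, rel_emb[query_relation])

rng = np.random.default_rng(2)
rel_emb = {r: rng.normal(size=16) for r in ["capital_of", "part_of", "located_in"]}
print(path_reward(["capital_of", "part_of"], "located_in", rel_emb))
```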

11.
Path-based relational reasoning over knowledge graphs has become increasingly popular due to a variety of downstream applications such as question answering in dialogue systems, fact prediction, and recommendation systems. In recent years, reinforcement learning (RL) based solutions for knowledge graphs have been demonstrated to be more interpretable and explainable than other deep learning models. However, the current solutions still struggle with performance issues due to incomplete state representations and large action spaces for the RL agent. We address these problems by developing HRRL (Heterogeneous Relational reasoning with Reinforcement Learning), a type-enhanced RL agent that utilizes the local heterogeneous neighborhood information for efficient path-based reasoning over knowledge graphs. HRRL improves the state representation using a graph neural network (GNN) for encoding the neighborhood information and utilizes entity type information for pruning the action space. Extensive experiments on real-world datasets show that HRRL outperforms state-of-the-art RL methods and discovers more novel paths during the training procedure, demonstrating the explorative power of our method.
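A toy sketch of the type-based action pruning idea mentioned in this abstract, under assumptions of my own: the type table, the expected-tail-type map, and the exact pruning rule are hypothetical examples, not HRRL's implementation.

```python
# Outgoing edges whose target entity has a type incompatible with the query
# relation are dropped before the agent chooses an action.
entity_type = {"Paris": "city", "France": "country", "Louvre": "museum"}
expected_tail_types = {"capital_of": {"country"}, "located_in": {"city", "country"}}

def prune_actions(out_edges, query_relation):
    """out_edges: list of (relation, target_entity) pairs from the current node."""
    allowed = expected_tail_types.get(query_relation)
    if allowed is None:
        return out_edges                         # no type constraint known
    return [(r, t) for r, t in out_edges if entity_type.get(t) in allowed]

edges = [("capital_of", "France"), ("has_landmark", "Louvre")]
print(prune_actions(edges, "capital_of"))        # [('capital_of', 'France')]
```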

12.
One of the difficulties encountered in the application of reinforcement learning methods to real-world problems is their limited ability to cope with large-scale or continuous spaces. In order to solve the curse-of-dimensionality problem that results from making continuous state or action spaces discrete, a new fuzzy Actor-Critic reinforcement learning network (FACRLN) based on a fuzzy radial basis function (FRBF) neural network is proposed. The architecture of FACRLN is realized by a four-layer FRBF neural network that is used to approximate both the action value function of the Actor and the state value function of the Critic simultaneously. The Actor and the Critic networks share the input, rule and normalized layers of the FRBF network, which reduces the storage demands of the learning system and avoids repeated computation of the rule-unit outputs. Moreover, the FRBF network is able to adjust its structure and parameters adaptively, using a novel self-organizing approach, according to the complexity of the task and the progress of learning, which keeps the network economically sized. Experimental studies on a cart-pole balancing control task illustrate the performance and applicability of the proposed FACRLN.
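A rough sketch of the shared-layer idea in this abstract, under simplifying assumptions: one normalized radial-basis hidden layer feeds both an actor head and a critic head. The fuzzy rule semantics, the self-organizing structure adjustment, and the training updates of FACRLN are omitted, and all sizes are illustrative.

```python
import numpy as np

class SharedRBFActorCritic:
    def __init__(self, state_dim=4, n_rbf=10, n_actions=2, seed=0):
        rng = np.random.default_rng(seed)
        self.centers = rng.normal(size=(n_rbf, state_dim))   # RBF centres
        self.widths = np.ones(n_rbf)
        self.w_actor = 0.1 * rng.normal(size=(n_rbf, n_actions))
        self.w_critic = 0.1 * rng.normal(size=n_rbf)

    def _features(self, state):
        d2 = np.sum((self.centers - state) ** 2, axis=1)
        phi = np.exp(-d2 / (2 * self.widths ** 2))
        return phi / (phi.sum() + 1e-12)           # normalized layer shared by both heads

    def forward(self, state):
        phi = self._features(state)
        action_values = phi @ self.w_actor          # actor output
        state_value = float(phi @ self.w_critic)    # critic output
        return action_values, state_value

net = SharedRBFActorCritic()
a_vals, v = net.forward(np.zeros(4))
```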

13.
In this paper, we propose fuzzy logic-based cooperative reinforcement learning for sharing knowledge among autonomous robots. The ultimate goal of this work is to entice bio-insects towards desired goal areas using artificial robots without any human aid. To achieve this goal, we found an interaction mechanism using a specific odor source and performed simulations and experiments [1]. For efficient learning without human aid, we employ cooperative reinforcement learning in a multi-agent domain. Additionally, we design a fuzzy logic-based expertise measurement system to enhance the learning ability. This structure enables the artificial robots to share knowledge while evaluating and measuring the performance of each robot. Through numerous experiments, the performance of the proposed learning algorithms is evaluated.

14.
Scheduling semiconductor wafer manufacturing systems has been viewed as one of the most challenging optimization problems owing to the complicated constraints and dynamic system environment. This paper proposes a fuzzy hierarchical reinforcement learning (FHRL) approach to schedule an SWFS, which controls the cycle time (CT) of each wafer lot to improve on-time delivery by adjusting the priority of each wafer lot. To cope with the layer correlation and wafer correlation of CT due to the re-entrant process constraint, a hierarchical model is presented with a recurrent reinforcement learning (RL) unit in each layer to control the corresponding sub-CT of each integrated circuit layer. In each RL unit, a fuzzy reward calculator is designed to reduce the impact of the uncertainty in the expected finishing time caused by the rematching of a lot to a delivery batch. The results demonstrate that the mean deviation (MD) between the actual and expected completion times of wafer lots under the FHRL approach is only about 30% of that of the compared methods across the whole SWFS.

15.
This paper addresses a new method for combining supervised learning and reinforcement learning (RL). Applying supervised learning to robot navigation encounters serious challenges such as inconsistent and noisy data, difficulty in gathering training data, and high error in the training data. RL capabilities such as training from only a single scalar evaluation signal and a high degree of exploration have encouraged researchers to use RL for the robot navigation problem. However, RL algorithms are time-consuming and suffer from a high failure rate in the training phase. Here, we propose Supervised Fuzzy Sarsa Learning (SFSL) as a novel idea for exploiting the advantages of both supervised and reinforcement learning algorithms. A zero-order Takagi–Sugeno fuzzy controller with several candidate actions for each rule is the main module of the robot's controller. The aim of training is to find the best action for each fuzzy rule. In the first step, a human supervisor drives an E-puck robot within the environment and the training data are gathered. In the second step, as a hard tuning, the training data are used to initialize the value (worth) of each candidate action in the fuzzy rules. Afterwards, the fuzzy Sarsa learning module, a critic-only fuzzy reinforcement learner, fine-tunes the parameters of the conclusion parts of the fuzzy controller online. The proposed algorithm is used for driving the E-puck robot in an environment with obstacles. The experimental results show that the proposed approach decreases the learning time and the number of failures; it also improves the quality of the robot's motion in the testing environments.
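A compact sketch, under simplifying assumptions of my own, of how such a fuzzy controller could select its output: each rule keeps a value for a few candidate actions, each rule picks a candidate epsilon-greedily, and the controller output is the firing-strength-weighted average of the chosen candidates. The membership functions, the supervised initialization, and the Sarsa update itself are omitted.

```python
import random

CANDIDATES = [-1.0, 0.0, 1.0]            # candidate actions per rule (illustrative)

def fuzzy_action(firing_strengths, q, epsilon=0.1):
    """firing_strengths: rule activations; q[i][c] = value of candidate c in rule i."""
    num = den = 0.0
    chosen = []
    for i, w in enumerate(firing_strengths):
        if random.random() < epsilon:
            a = random.choice(CANDIDATES)
        else:
            a = max(CANDIDATES, key=lambda c: q[i][c])
        chosen.append(a)
        num += w * a
        den += w
    action = num / den if den > 0 else 0.0
    return action, chosen                 # `chosen` is what a Sarsa update would credit

q = [{c: 0.0 for c in CANDIDATES} for _ in range(3)]
action, chosen = fuzzy_action([0.2, 0.7, 0.1], q)
```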

16.
While driving a vehicle safely at its handling limit is essential for autonomous vehicles at Level 5 autonomy, it is a very challenging task for current conventional methods. Therefore, this study proposes a novel controller for trajectory planning and motion control in autonomous driving through manifold corners at the handling limit, to improve the speed and shorten the lap time of the vehicle. The proposed controller innovatively combines the advantages of a conventional model-based control algorithm, a model-free reinforcement learning algorithm, and prior expert knowledge to improve the training efficiency for autonomous driving in extreme conditions. The reward shaping of the algorithm draws on the procedures and experience of race training for professional drivers in real time. After training on track maps of different levels of difficulty, the proposed controller implemented a superior strategy compared with the original reference trajectory, and can transfer to other, tougher maps based on the basic driving knowledge learned from the simpler maps, which verifies its superiority and extensibility. We believe this technology can be further applied to daily life to expand the application scenarios and maneuvering envelopes of autonomous vehicles.

17.
In this paper, a dynamic fuzzy energy state based AODV (DFES-AODV) routing protocol for Mobile Ad-hoc NETworks (MANETs) is presented. In the DFES-AODV route discovery phase, each node uses a Mamdani fuzzy logic system (FLS) to decide its Route REQuest (RREQ) forwarding probability. The FLS inputs are the residual battery level and the energy drain rate of the mobile node. Unlike previous related works, the membership function of the residual-energy input is made dynamic. Also, a zero-order Takagi-Sugeno FLS with the same inputs is used as a means of state-space generalization in SARSA-AODV, a reinforcement learning based energy-aware routing protocol. The simulation study confirms that using a dynamic fuzzy system ensures more energy efficiency than its static counterpart. Moreover, DFES-AODV exhibits performance similar to SARSA-AODV and its fuzzy extension FSARSA-AODV. Therefore, the use of dynamic fuzzy logic for adaptive routing in MANETs is recommended.
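A toy sketch of the kind of fuzzy forwarding decision described above, under assumptions of my own: triangular membership functions, a hand-written four-rule base, and a simple weighted-average defuzzification (a simplification of full Mamdani inference). Values and rules are illustrative, not the protocol's.

```python
def tri(x, a, b, c):
    """Triangular membership function peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def forwarding_probability(battery, drain_rate):
    low_b, high_b = tri(battery, -0.01, 0.0, 0.6), tri(battery, 0.4, 1.0, 1.01)
    low_d, high_d = tri(drain_rate, -0.01, 0.0, 0.6), tri(drain_rate, 0.4, 1.0, 1.01)
    rules = [                                  # (firing strength, output level)
        (min(high_b, low_d), 0.9),             # much energy, draining slowly -> forward
        (min(high_b, high_d), 0.5),
        (min(low_b, low_d), 0.4),
        (min(low_b, high_d), 0.1),             # little energy, draining fast -> rarely forward
    ]
    num = sum(w * out for w, out in rules)
    den = sum(w for w, _ in rules)
    return num / den if den > 0 else 0.0

print(forwarding_probability(battery=0.8, drain_rate=0.2))   # high forwarding probability
```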

18.
Deep inverse reinforcement learning is a new research focus in machine learning. It addresses the difficulty of obtaining the reward function in deep reinforcement learning by reconstructing the reward function from expert demonstration trajectories. This paper first introduces classical algorithms from three classes of deep reinforcement learning methods; it then describes classical inverse reinforcement learning algorithms, including methods formalized through apprenticeship learning, maximum margin planning, structured classification, and probabilistic models. It further surveys several frontier directions in deep inverse reinforcement learning, including maximum-margin-based deep inverse reinforcement learning, deep-Q-network-based deep inverse reinforcement learning, maximum-entropy-based deep inverse reinforcement learning, and inverse reinforcement learning when the demonstration trajectories are not from experts. Finally, open problems and development directions of deep inverse reinforcement learning in terms of algorithms, theory, and applications are summarized.
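For context (a standard result, not something specific to this survey): in the maximum-entropy formulation mentioned above, the probability of a trajectory $\tau = (s_0, a_0, s_1, \dots)$ is taken to be proportional to the exponential of its cumulative reward under the parameterized reward $r_\theta$,

$$P(\tau \mid \theta) \propto \exp\Big(\sum_t r_\theta(s_t, a_t)\Big),$$

and $\theta$ is fit by maximizing the likelihood of the expert demonstration trajectories under this model.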

19.
Multi-agent reinforcement learning technologies are mainly investigated from two perspectives: concurrency and game theory. The former chiefly applies to cooperative multi-agent systems, while the latter usually applies to coordinated multi-agent systems. However, they suffer from problems such as credit assignment and multiple Nash equilibria. In this paper, we propose a new multi-agent reinforcement learning model and algorithm, LMRL, from a layered perspective. The LMRL model is composed of an off-line training layer that employs single-agent reinforcement learning to acquire stationary strategy knowledge, and an online interaction layer that employs multi-agent reinforcement learning together with strategy knowledge that can be revised dynamically to interact with the environment. An agent with LMRL can improve its generalization capability, adaptability and coordination ability. Experiments show that the performance of LMRL can be better than those of single-agent reinforcement learning and Nash-Q.

20.
To enable fuzzy Petri nets to describe fuzzy knowledge under variable fuzzy membership criteria, and exploiting the fact that benchmark transformation can express changes in the membership criterion well, fuzzy Petri nets are extended on the basis of qualitative mapping and qualitative benchmark transformation, and the formal definition and basic execution mechanism of the extended net model are given. By using qualitative mappings to describe fuzzy production rules, a new knowledge representation scheme and reasoning method are obtained; the new method is conducive to building a cognition-based learning mechanism for fuzzy Petri nets. The results show that the extended net model has strong knowledge representation ability, is suited to handling cognitively fuzzy and uncertain knowledge, and its reasoning process reflects certain cognitive characteristics, making it especially suitable for building intelligent systems characterized by qualitative judgment.

