
1.

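A Learning Agent Based on Reinforcement Learning (total citations: 22; self-citations: 2; cited by others: 22)

李宁, 高阳, 陆鑫, 陈世福 《Journal of Computer Research and Development》2001,38(9):1051-1056

Reinforcement learning learns the optimal behavior policy of a dynamic system by perceiving environment states and receiving uncertain reward values from the environment. It is one of the core techniques for building intelligent agents. This paper extends the BDI model in AODE, an agent-oriented development environment, by introducing policy and capability as mental components. It then uses reinforcement learning to implement the policy-construction function, which yields a learning agent based on reinforcement learning. The paper also studies the architecture and mode of operation of adaptive agents in AODE. The result gives intelligent agents the ability to learn online in dynamic environments and to satisfy the agent's various mental requirements effectively.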

2.

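A Reinforcement Learning System and Its Learning Algorithms Based on Reliability Optimality (total citations: 3; self-citations: 0; cited by others: 3)

俞星星, 阎平凡 《Information and Control》1997,26(5):332-339

This paper summarizes the main theoretical methods of reinforcement learning. It proposes a description of reinforcement learning systems that distinguishes subjective from objective factors and introduces the concept of a task domain. Earlier reinforcement learning used an expected-value optimality criterion, which describes the capability of a task domain inadequately. To address this, the paper considers a first-passage-time reliability optimality model under a target-level criterion. By combining stochastic approximation theory and temporal-difference theory, respectively, it proposes two algorithms: J-learning, based on probability estimation, and model-free incremental R-learning.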

3.

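A Path Planning Method Based on Reinforcement Learning in Dynamic Environments

庄晓东, 孟庆春, 熊建设, 殷波, 王汉萍 《Robot》2001,(Z1)

This paper introduces reinforcement learning in the context of robot path planning and implements adaptive, reinforcement-learning-based path planning in dynamic environments. Reinforcement learning adopts a stochastic control policy to carry out optimizing policy search and online learning. A BP network with pattern-enhanced inputs estimates the decision parameters, which speeds up the convergence of learning. Simulation experiments show that the method effectively achieves collision avoidance and navigation for robots in dynamic environments.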
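The abstract does not say which stochastic control policy is used. As an illustration only, the sketch below assumes Boltzmann (softmax) action selection, a common stochastic policy in reinforcement learning. The function name `softmax_action` and its `temperature` parameter are hypothetical and not taken from the paper.

```python
import math
import random


def softmax_action(q_values, temperature, rng=random):
    """Sample an action index with probability proportional to exp(Q / temperature).

    Boltzmann (softmax) selection is an assumed example of a stochastic policy;
    it is not necessarily the policy used in the paper.
    Returns (chosen_action, probabilities).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Shift by the maximum Q-value so exp() cannot overflow.
    best = max(q_values)
    weights = [math.exp((q - best) / temperature) for q in q_values]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Higher-valued actions are chosen more often, but every action keeps a
    # nonzero chance of being tried, which provides exploration online.
    action = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return action, probs
```

Lowering `temperature` makes the policy greedier. Raising it makes the choice closer to uniform.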

4.

USING REINFORCEMENT LEARNING TO COORDINATE BETTER (total citations: 3; self-citations: 0; cited by others: 3)

Cora B. Excelente-Toledo, Nicholas R. Jennings 《Computational Intelligence》2005,21(3):217-245

This paper examines the potential and the impact of introducing learning capabilities into autonomous agents that decide at run-time which mechanism to use to coordinate their activities. Specifically, our motivating hypothesis is that agents in dynamic and unpredictable environments need to learn two things: the right situations in which to attempt coordination, and the right coordination method to use in those situations. In particular, the efficacy of learning is evaluated when agents have varying types and amounts of information at the time coordination decisions are taken. The hypothesis is evaluated empirically in a grid-world scenario under two conditions: (a) an agent's predictions about the other agents in the environment are approximately correct, and (b) an agent cannot correctly predict the others' behavior. The results show when, where and why learning is effective for deciding which coordination mechanism to select.

5.

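Research on a Reinforcement Learning Algorithm Based on Neural Networks (total citations: 11; self-citations: 0; cited by others: 11)

陆鑫, 高阳, 李宁, 陈世福 《Journal of Computer Research and Development》2002,39(8):981-985

BP neural networks are widely used in nonlinear control systems. As a supervised learning algorithm, however, a BP network must be trained on input-output pairs supplied in batches. In systems whose optimal policy is unknown, such pairs cannot be obtained in advance. Reinforcement learning, by contrast, adjusts its policy from experience gained in the actual system and approaches the optimal policy without a supervising teacher. This paper proposes RBP, a learning algorithm that combines reinforcement learning with a BP neural network. The model first learns the control policy by reinforcement learning. After a certain period of learning, it uses the learned knowledge to train the neural network, so that the network gradually converges to the optimal state. Experiments verify the effectiveness and convergence of the method.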

6.

AUTOMATIC ALGORITHM DEVELOPMENT USING NEW REINFORCEMENT PROGRAMMING TECHNIQUES

Spencer White, Tony Martinez, George Rudolph 《Computational Intelligence》2012,28(2):176-208

Reinforcement Programming (RP) is a new approach to automatically generating algorithms using reinforcement learning techniques. This paper introduces the RP approach and demonstrates its use to generate a generalized, in-place, iterative sort algorithm. The RP approach improves on earlier results that used genetic programming (GP). The resulting sort is a novel algorithm that is more efficient than comparable sorting routines. RP learns the sort in fewer iterations than GP and with fewer resources. Experiments establish interesting empirical bounds on learning the sort algorithm: a list of size 4 is sufficient to learn the generalized sort algorithm, the training set requires only one element, and learning took fewer than 200,000 iterations. Additionally, RP was used to generate three binary addition algorithms: a full adder, a binary incrementer, and a binary adder.

7.

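A Preliminary Study of Relational Reinforcement Learning Methods

刘全, 周文云, 李志涛 《Computer Applications and Software》2010,27(2):40-43

Reinforcement learning is one of the more important methods in artificial intelligence. It has developed considerably since it was proposed and can be used to solve many problems. In problems with large state spaces, however, ordinary reinforcement learning suffers from the "curse of dimensionality". Relational reinforcement learning was proposed in response: applying reinforcement learning to relational domains can alleviate the curse of dimensionality to some extent. On this basis, the paper briefly introduces the concept of relational reinforcement learning, the related algorithms, and the problems that remain open.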

8.

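A Reinforcement Learning Algorithm Based on Optimal Average Cost per Stage (total citations: 4; self-citations: 0; cited by others: 4)

殷苌茗, 陈焕文, 谢丽娟 《Journal of Computer Applications》2002,22(4):25-27

This paper solves for the optimal cost function to derive a new reinforcement learning algorithm, one based on optimal average cost per stage. The algorithm is an effective reinforcement learning method for Markov decision problems with incomplete information. Starting from the solution of the optimal average cost per stage, the paper analyzes three things: the existence of an optimal solution, the relationship between the optimal average cost per stage and the initial state, and the associated Bellman equation. With this method, many results from dynamic programming (DP) algorithms can be applied directly to the study of reinforcement learning.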
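For reference, standard dynamic-programming texts usually write the average cost per stage of a stationary policy μ and the associated Bellman equation as shown below. The notation is assumed here and may differ from the paper's: stage cost g, transition probabilities p_ij(u), optimal average cost λ*, and differential cost h.

```latex
J_\mu(i) = \lim_{N \to \infty} \frac{1}{N}\,
  \mathbb{E}\Big[\sum_{k=0}^{N-1} g\big(x_k, \mu(x_k)\big) \,\Big|\, x_0 = i\Big]

\lambda^* + h(i) = \min_{u \in U(i)} \Big[\, g(i,u) + \sum_{j=1}^{n} p_{ij}(u)\, h(j) \Big],
  \qquad i = 1, \dots, n
```

Suppose every stationary policy induces a single recurrent class, or a similar condition holds. Then λ* is the same for every initial state, which is one form the relationship with the initial state can take. Without such conditions, the optimal average cost can depend on x_0.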

9.

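Generating Obfuscated Adversarial Malware Samples Based on Deep Reinforcement Learning

严莹子, 王小平, 庄葛巍, 顾臻, 贺青, 史扬 《Computer Applications and Software》2022,(2):315-323+349

This paper designs a model for generating obfuscated adversarial samples of PE-format malware. A deep reinforcement learning algorithm obfuscates malware automatically. Adding history frames and an LSTM network structure gives the deep reinforcement learning model memory. Comparative experiments show that the resulting malware variants achieve a higher evasion rate against machine-learning-based detection models than existing work. On a test set of 918 PE-format malware samples, the evasion rate reaches 39.54%.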

10.

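Multi-Agent Reinforcement Learning Based on Fuzzy Inference

韩伟, 鲁霜 《Computer Applications and Software》2011,28(11):96-98,107

Taking intelligent pricing in electronic markets as the research setting, this paper proposes a fuzzy-inference-based multi-agent reinforcement learning algorithm (FI-MARL). Within the Markov game learning framework, domain knowledge is initialized as a set of fuzzy rules. Agents select actions based on these rules, and reinforcement learning is used to reinforce them. The method effectively incorporates domain knowledge from the application, makes full use of sample information, and reduces the dimensionality of the learning space, which improves online learning performance. In electronic market pricing...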

11.

LEARNING OLIGOPOLISTIC COMPETITION IN ELECTRICITY AUCTIONS (total citations: 1; self-citations: 0; cited by others: 1)

Eric Guerci, Stefano Ivaldi, Marco Raberto, Silvano Cincotti 《Computational Intelligence》2007,23(2):197-220

This paper addresses the efficiency of auction markets in the context of recently liberalized electricity markets. Two auction mechanisms have been used worldwide in designing electricity markets: the uniform and the discriminatory price-setting rules. In this paper, we study the relative efficiency of the two mechanisms within the learning-in-games framework. The behavior of electricity suppliers is modeled with an adaptive learning algorithm. Following a common hypothesis in electricity market modeling, demand is assumed to be constant and inelastic. The results of the computational experiments are interpreted in terms of game-theoretic solutions, namely Nash equilibria and Pareto optima. Several economic scenarios are considered, covering duopoly and tripoly competition at different levels of demand. Results show that, under the proposed conditions, sellers learn to play competitive strategies that correspond to Nash equilibria. Finally, the study establishes that, in the presented computational setting and economic scenarios, the discriminatory auction mechanism is more efficient than the uniform one.
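A minimal market-clearing sketch can illustrate the two price-setting rules compared in the paper. It rests on several assumptions: a single trading period, a fixed inelastic demand, single-price offers, and a uniform price set by the highest accepted offer. The `Offer` type and `clear_market` function are hypothetical and are not the authors' simulator.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    seller: str
    price: float     # asking price per unit
    quantity: float  # units offered


def clear_market(offers, demand, rule):
    """Dispatch the cheapest offers until a fixed, inelastic demand is met.

    rule == "uniform": every accepted unit is paid the marginal (highest accepted) price.
    rule == "discriminatory": every accepted unit is paid its own asking price (pay-as-bid).
    Returns a dict mapping seller -> revenue.
    """
    if rule not in ("uniform", "discriminatory"):
        raise ValueError("rule must be 'uniform' or 'discriminatory'")
    remaining = demand
    accepted = []  # (offer, dispatched quantity); the marginal offer may be partial
    for offer in sorted(offers, key=lambda o: o.price):
        if remaining <= 0:
            break
        q = min(offer.quantity, remaining)
        accepted.append((offer, q))
        remaining -= q
    if remaining > 0:
        raise ValueError("offers cannot cover demand")
    clearing_price = accepted[-1][0].price if accepted else 0.0
    revenue = {}
    for offer, q in accepted:
        unit_price = clearing_price if rule == "uniform" else offer.price
        revenue[offer.seller] = revenue.get(offer.seller, 0.0) + unit_price * q
    return revenue
```

Consider a demand of 30 units and three offers: A offers 20 units at 10, B 20 units at 30, and C 20 units at 50. The uniform rule pays A 600 and B 300, while pay-as-bid pays A 200 and B 300. This compares payments for fixed offers only. In the paper, sellers adapt their offers by learning, so the efficiency ranking cannot be read off a single static example.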

12.

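Reinforcement Learning Fuzzy Neural Network Control Based on Takagi-Sugeno

马力佳, 高岩 《Microcomputer Information》2006,(6S):7-9

This paper proposes an adaptive control scheme based on a fuzzy neural network. For complex learning tasks in continuous spaces, it proposes a competitive Takagi-Sugeno fuzzy reinforcement learning network. The network integrates a Takagi-Sugeno fuzzy inference system with a reinforcement learning method based on an action-value evaluation function. The paper also proposes an optimized learning algorithm that trains this network into a so-called Takagi-Sugeno fuzzy variable-structure controller. Simulation studies on a single inverted pendulum control system show that the proposed learning algorithm outperforms other reinforcement learning algorithms.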
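The Takagi-Sugeno inference step at the core of such a network can be sketched as follows. The sketch covers only first-order Takagi-Sugeno inference for a scalar input with Gaussian membership functions; both choices are assumptions. It does not model the competitive structure or the reinforcement learning training described in the abstract, and `ts_infer` is a hypothetical name.

```python
import math


def gaussian(x, center, sigma):
    """Degree to which x belongs to a Gaussian fuzzy set."""
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))


def ts_infer(x, rules):
    """First-order Takagi-Sugeno inference for a scalar input x.

    Each rule is (center, sigma, a, b), read as
        IF x is Gaussian(center, sigma) THEN y = a * x + b.
    The crisp output is the firing-strength-weighted mean of the rule outputs.
    """
    strengths = [gaussian(x, c, s) for c, s, _, _ in rules]
    total = sum(strengths)
    if total == 0.0:
        raise ValueError("no rule fires for this input")
    outputs = [a * x + b for _, _, a, b in rules]
    return sum(w * y for w, y in zip(strengths, outputs)) / total
```

In a reinforcement learning setting, a learning rule would adjust the consequent parameters `a` and `b`, and possibly the membership centers and widths.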

13.

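Path Planning in 3D Environments Based on Improved Deep Reinforcement Learning

封硕, 舒红, 谢步庆 《Computer Applications and Software》2021,38(1):250-255

This paper proposes an improved deep reinforcement learning algorithm (NDQN). It addresses the curse of dimensionality that the traditional Q-learning algorithm faces in path planning for mobile robots over complex terrain. The method embeds deep learning in the Q-learning framework, replacing the Q-value table with the output of a network. Deep Q-networks suffer from severe overestimation, so a correction function is used to improve the evaluation function in the deep Q-network. Comparing the improved deep reinforcement learning algorithm with...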
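The "traditional Q-learning" table that NDQN replaces with a network output is updated by the standard tabular rule sketched below. This is plain Q-learning, not the paper's NDQN. The function name, the `done` flag and the default step sizes are illustrative assumptions.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9, done=False):
    """One tabular Q-learning step:

        Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))

    `q` is a defaultdict(float) keyed by (state, action); unseen pairs read as 0.0.
    When `done` is True, the next state is terminal and contributes no future value.
    """
    best_next = 0.0 if done else max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


if __name__ == "__main__":
    table = defaultdict(float)
    table[("s1", "right")] = 10.0
    print(q_learning_update(table, "s0", "right", 1.0, "s1", ["left", "right"], alpha=0.5))
```

A deep Q-network swaps the dictionary for a network that outputs Q(s, ·). Taking the max over noisy estimates in the target is the usual source of the overestimation the abstract mentions. The abstract does not specify the paper's correction function. Double DQN is one well-known alternative remedy: it selects the next action with one network and evaluates it with another.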

14.

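Operationality for Explanation-Based Learning (total citations: 1; self-citations: 1; cited by others: 1)

石纯一, 邹晨东 《Chinese Journal of Computers》1993,16(11):801-806

Operationality is a key issue in explanation-based machine learning. This paper presents a method that describes operationality using fuzzy set theory and builds a model of EBL on that basis. The result improves on the work of Keller et al.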

15.

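Reinforcement Learning Fuzzy Neural Network Control Based on Takagi-Sugeno

马力佳, 高岩 《Microcomputer Information》2006,22(16):7-9

This paper proposes an adaptive control scheme based on a fuzzy neural network. For complex learning tasks in continuous spaces, it proposes a competitive Takagi-Sugeno fuzzy reinforcement learning network. The network integrates a Takagi-Sugeno fuzzy inference system with a reinforcement learning method based on an action-value evaluation function. The paper also proposes an optimized learning algorithm that trains this network into a so-called Takagi-Sugeno fuzzy variable-structure controller. Simulation studies on a single inverted pendulum control system show that the proposed learning algorithm outperforms other reinforcement learning algorithms.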

16.

A NEW DESIGN APPROACH FOR FUZZY‐LEARNING FUZZY CONTROLLERS

C.W. Tao, J.S. Taur 《Asian Journal of Control》2000,2(3):212-218

In this paper, a new approach to designing fuzzy-learning fuzzy controllers for a plant without an exact mathematical model is presented. The cost function is defined as the square of the sliding function, which alleviates overshoot when on-line learning is conducted. The learning mechanism of the fuzzy controller uses a set of linguistic rules and is constructed to minimize this cost function. Moreover, to reduce the complexity of the fuzzy-learning fuzzy controller, the fuzzy mechanism used for learning and the fuzzy mechanism in the controller are designed to have identical structures. Finally, simulations show the effectiveness of the fuzzy-learning fuzzy controllers.
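The abstract does not define the sliding function. A common first-order choice shows why penalizing its square discourages overshoot. It is assumed here for illustration only, with tracking error e, reference x_d and a design constant λ > 0:

```latex
e(t) = x_d(t) - x(t), \qquad s(t) = \dot e(t) + \lambda\, e(t), \qquad J(t) = s(t)^2

s(t) \equiv 0 \;\Rightarrow\; \dot e(t) = -\lambda\, e(t)
  \;\Rightarrow\; e(t) = e(t_0)\, e^{-\lambda (t - t_0)}
```

On the surface s = 0, the error decays exponentially without changing sign. Driving J toward zero therefore pushes the closed loop toward a response that does not overshoot.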

17.

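Operationality of Explanation-Based Learning (total citations: 2; self-citations: 1; cited by others: 1)

石纯一, 龚义涛 《Chinese Journal of Computers》1992,15(2):153-157

This paper makes a preliminary study of the formal description of operationality in explanation-based learning (EBL). It emphasizes the role of the execution system in learning and formalizes that system as an axiomatic system. It then proposes the concept of equivalent substitution and gives a mathematical description of operationality. Finally, it verifies a route-planning system, PLAN.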

18.

INCREMENTAL LEARNING OF PROCEDURAL PLANNING KNOWLEDGE IN CHALLENGING ENVIRONMENTS

Douglas J. Pearson, John E. Laird 《Computational Intelligence》2005,21(4):414-439

Autonomous agents that learn about their environment can be divided into two broad classes. Reinforcement learners, one class of existing learners, typically employ weak learning methods to directly modify an agent's execution knowledge. These systems are robust in dynamic and complex environments but generally do not support planning or the pursuit of multiple goals. In contrast, symbolic theory revision systems learn declarative planning knowledge that allows them to pursue multiple goals in large state spaces. However, these approaches are generally applicable only to fully sensed, deterministic environments with no exogenous events. This research investigates the hypothesis that limiting an agent to procedural access to symbolic planning knowledge lets it combine two strengths. These are the powerful, knowledge-intensive learning of theory revision systems and the robust performance of reinforcement learners in complex environments. The system, IMPROV, uses an expressive knowledge representation so that it can learn complex actions that produce conditional or sequential effects over time. Its learning methods require only limited procedural access to the agent's knowledge, so IMPROV's learning remains tractable as the agent's knowledge is scaled to large problems. IMPROV learns to correct operator precondition and effect knowledge in complex environments that include noise, multiple agents and time-critical tasks. It also demonstrates a general learning method that can be easily strengthened by adding many different kinds of knowledge.

19.

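Operationality in Learning Algorithm Frameworks in a Self-Learning Software Automation System

陈道蓄, 徐家福 《Chinese Journal of Computers》1992,15(12):942-946

This paper discusses the operationality criterion in explanation-based learning in the context of algorithm synthesis. For the algorithm framework, a relatively complex target concept for learning, it proposes a method for achieving operationality. It also examines the trade-off between generality and operationality in algorithm automation systems and how that trade-off affects system capability.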

20.

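Reduction Semantics in Garment (total citations: 1; self-citations: 0; cited by others: 1)

郑红军, 张乃孝 《Journal of Computer Research and Development》1998,35(6):486-490

This paper uses algebraic methods to study the reduction semantics of programming languages in Garment. It first explains what reduction semantics means in formal language theory and then proposes an algebraic model of the languages in Garment. Within this algebraic model, it discusses reduction semantics and its properties and gives a sufficient condition for a language to be reducible.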