  Paid full text: 613 articles
  Free within China: 43 articles
  Fully free: 265 articles
  Subject (Automation technology): 921 articles
  Articles by year:
  2023: 4    2022: 77   2021: 63   2020: 24   2019: 29   2018: 17
  2017: 8    2016: 16   2015: 14   2014: 23   2013: 55   2012: 23
  2011: 47   2010: 26   2009: 38   2008: 67   2007: 48   2006: 46
  2005: 47   2004: 31   2003: 23   2002: 46   2001: 30   2000: 22
  1999: 18   1998: 18   1997: 20   1996: 15   1995: 8    1994: 6
  1993: 3    1992: 7    1991: 1    1990: 1
Sort order: 921 query results in total (search time: 222 ms)
1.
Adaptive Ant Colony Algorithm   Total citations: 115 (self-citations: 1, citations by others: 114)   Free full-text PDF download
The ant colony algorithm is a novel simulated-evolution algorithm first proposed by M. Dorigo and colleagues in Italy. Preliminary research has already shown that the algorithm has many desirable properties and offers a new approach to solving complex combinatorial optimization problems, and the method has attracted the interest of many researchers. It also has some drawbacks, however, such as long computation times and a tendency toward stagnation. Research on it within China is still scarce. This paper therefore surveys the current state of research on the ant colony algorithm, in the hope of stimulating related work.
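For readers new to the method, the following is a minimal sketch of the basic ant colony loop on a tiny travelling-salesman instance. The 4-city distance matrix and the parameters (ALPHA, BETA, RHO, Q) are illustrative assumptions, not values from the paper; the adaptive variants the paper surveys would additionally adjust such parameters during the run.

```python
import random

# Minimal ant-colony sketch on a tiny symmetric TSP (illustrative values only).
# ALPHA/BETA weight pheromone vs. heuristic desirability (1/distance); RHO is
# the evaporation rate; Q scales the pheromone deposited per tour.
DIST = [[0, 2, 9, 10],
        [2, 0, 6, 4],
        [9, 6, 0, 3],
        [10, 4, 3, 0]]
N, ALPHA, BETA, RHO, Q = 4, 1.0, 2.0, 0.5, 1.0
tau = [[1.0] * N for _ in range(N)]            # pheromone trails

def tour_length(tour):
    return sum(DIST[tour[i]][tour[(i + 1) % N]] for i in range(N))

def build_tour():
    tour, unvisited = [0], set(range(1, N))
    while unvisited:
        i = tour[-1]
        weights = [(j, (tau[i][j] ** ALPHA) * ((1.0 / DIST[i][j]) ** BETA))
                   for j in unvisited]
        r = random.random() * sum(w for _, w in weights)
        acc = 0.0
        for j, w in weights:                   # roulette-wheel selection
            acc += w
            if acc >= r:
                tour.append(j)
                unvisited.remove(j)
                break
    return tour

best = None
for iteration in range(50):
    tours = [build_tour() for _ in range(8)]   # a colony of 8 ants
    for i in range(N):                         # pheromone evaporation
        for j in range(N):
            tau[i][j] *= (1.0 - RHO)
    for t in tours:                            # pheromone deposit
        for k in range(N):
            a, b = t[k], t[(k + 1) % N]
            tau[a][b] += Q / tour_length(t)
            tau[b][a] = tau[a][b]
        if best is None or tour_length(t) < tour_length(best):
            best = t
print(best, tour_length(best))
```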
2.
A Survey of Reinforcement Learning Research   Total citations: 80 (self-citations: 2, citations by others: 78)   Free full-text PDF download
高阳, 陈世福, 陆鑫. 《自动化学报》(Acta Automatica Sinica), 2004, 30(1): 86-100
Abstract: Reinforcement learning improves its policy through trial and error while interacting with the environment; its capacity for self-learning and online learning has made it an important branch of machine learning research. This paper first introduces the principles and structure of reinforcement learning. It then constructs a two-dimensional classification and discusses two families of algorithms, optimal-search and experience-reinforcement, under Markovian and non-Markovian environments respectively. Next, drawing on recent work, it surveys the core problems of reinforcement learning, including partial observability, function approximation, multi-agent reinforcement learning, and bias techniques. Finally, it briefly reviews applications of reinforcement learning and directions for future development.
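As a concrete instance of the trial-and-error policy improvement the survey describes, here is a hedged sketch of tabular Q-learning on a toy chain environment; the environment, the ε-greedy exploration rule, and all constants are invented for illustration and are not taken from the survey.

```python
import random

# Tabular Q-learning on a toy 5-state chain: actions move left/right,
# reward 1 at the rightmost state. All settings are illustrative.
N_STATES, ACTIONS = 5, (-1, +1)
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(s, a):
    s2 = min(max(s + a, 0), N_STATES - 1)
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0), s2 == N_STATES - 1

def greedy(s):
    # Random tie-breaking so equal-valued actions are chosen uniformly.
    return max(ACTIONS, key=lambda a: (Q[(s, a)], random.random()))

for episode in range(300):
    s = 0
    for _ in range(1000):                  # step cap keeps early episodes short
        a = random.choice(ACTIONS) if random.random() < EPS else greedy(s)
        s2, r, done = step(s, a)
        target = r if done else r + GAMMA * max(Q[(s2, x)] for x in ACTIONS)
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])   # trial-and-error TD update
        if done:
            break
        s = s2

print({s: greedy(s) for s in range(N_STATES)})
```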
3.
Reinforcement Learning with Replacing Eligibility Traces   Total citations: 26 (self-citations: 0, citations by others: 26)
The eligibility trace is one of the basic mechanisms used in reinforcement learning to handle delayed reward. In this paper we introduce a new kind of eligibility trace, the replacing trace, analyze it theoretically, and show that it results in faster, more reliable learning than the conventional trace. Both kinds of trace assign credit to prior events according to how recently they occurred, but only the conventional trace gives greater credit to repeated events. Our analysis is for conventional and replace-trace versions of the offline TD(1) algorithm applied to undiscounted absorbing Markov chains. First, we show that these methods converge under repeated presentations of the training set to the same predictions as two well known Monte Carlo methods. We then analyze the relative efficiency of the two Monte Carlo methods. We show that the method corresponding to conventional TD is biased, whereas the method corresponding to replace-trace TD is unbiased. In addition, we show that the method corresponding to replacing traces is closely related to the maximum likelihood solution for these tasks, and that its mean squared error is always lower in the long run. Computational results confirm these analyses and show that they are applicable more generally. In particular, we show that replacing traces significantly improve performance and reduce parameter sensitivity on the "Mountain-Car" task, a full reinforcement-learning problem with a continuous state space, when using a feature-based function approximator.
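The difference between the two trace types comes down to a single line in the update loop. The sketch below contrasts accumulating and replacing traces inside a TD(λ) value-prediction loop on a small random walk; the environment and constants are illustrative assumptions rather than the paper's experimental setup.

```python
import random

# TD(lambda) state-value prediction on a 5-state random walk, contrasting
# accumulating ("conventional") and replacing eligibility traces.
N, ALPHA, GAMMA, LAM = 5, 0.1, 1.0, 0.9

def run_episode(V, kind):
    e = [0.0] * N                          # eligibility traces
    s = N // 2                             # start in the middle
    while True:
        s2 = s + random.choice((-1, 1))
        done = s2 < 0 or s2 >= N
        r = 1.0 if s2 >= N else 0.0        # reward only at the right terminal
        if kind == "accumulate":
            e[s] += 1.0                    # conventional: revisits pile up
        else:
            e[s] = 1.0                     # replacing: revisits reset to 1
        delta = r + (0.0 if done else GAMMA * V[s2]) - V[s]
        for i in range(N):                 # credit recently visited states
            V[i] += ALPHA * delta * e[i]
            e[i] *= GAMMA * LAM            # traces decay every step
        if done:
            return
        s = s2

V_acc, V_rep = [0.0] * N, [0.0] * N
for _ in range(1000):
    run_episode(V_acc, "accumulate")
    run_episode(V_rep, "replace")
print(V_acc)
print(V_rep)
```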
4.
A Learning Agent Based on Reinforcement Learning   Total citations: 24 (self-citations: 2, citations by others: 22)
Reinforcement learning learns the optimal behavior policy of a dynamic system by perceiving the state of the environment and receiving uncertain reward values from it, and is one of the core techniques for building intelligent Agents. This paper extends the BDI model within the Agent-oriented development environment AODE by introducing policy and capability as mental components, and uses reinforcement learning to implement the policy-construction function, thereby proposing a learning Agent based on reinforcement learning. The structure and operation of adaptive Agents in AODE are studied, giving intelligent Agents the ability to learn online in dynamic environments and allowing them to satisfy the Agents' various mental requirements effectively.
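AODE's actual interfaces are not reproduced here, so the following is only a hypothetical sketch of the idea the abstract describes: a BDI-style agent whose policy mental component is realized by a Q-learning update instead of hand-coded rules. All class and method names are invented.

```python
import random

# Hypothetical sketch only: a BDI-style agent whose "policy" mental component
# is learned with Q-learning. Class and method names are invented; AODE's real
# API is not shown here.
class LearningBDIAgent:
    def __init__(self, actions, alpha=0.1, gamma=0.9, eps=0.1):
        self.beliefs = None                # beliefs: last perceived state
        self.actions = actions             # capability: actions the agent can take
        self.q = {}                        # policy: learned state-action values
        self.alpha, self.gamma, self.eps = alpha, gamma, eps

    def perceive(self, state):
        self.beliefs = state               # belief revision (trivial here)

    def deliberate(self):
        # The policy component: epsilon-greedy choice over learned Q-values.
        if random.random() < self.eps:
            return random.choice(self.actions)
        return max(self.actions,
                   key=lambda a: self.q.get((self.beliefs, a), 0.0))

    def learn(self, s, a, r, s2):
        # Standard Q-learning update realizing the policy-construction function.
        best = max(self.q.get((s2, x), 0.0) for x in self.actions)
        old = self.q.get((s, a), 0.0)
        self.q[(s, a)] = old + self.alpha * (r + self.gamma * best - old)
```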
5.
Long-Ji Lin. Machine Learning, 1992, 8(3-4): 293-321
To date, reinforcement learning has mostly been studied for solving simple learning tasks. Reinforcement learning methods that have been studied so far typically converge slowly. The purpose of this work is thus two-fold: 1) to investigate the utility of reinforcement learning in solving much more complicated learning tasks than previously studied, and 2) to investigate methods that will speed up reinforcement learning. This paper compares eight reinforcement learning frameworks: adaptive heuristic critic (AHC) learning due to Sutton, Q-learning due to Watkins, and three extensions to both basic methods for speeding up learning. The three extensions are experience replay, learning action models for planning, and teaching. The frameworks were investigated using connectionism as an approach to generalization. To evaluate the performance of different frameworks, a dynamic environment was used as a testbed. The environment is moderately complex and nondeterministic. This paper describes these frameworks and algorithms in detail and presents an empirical evaluation of the frameworks.
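Of the three extensions, experience replay is the easiest to show in code. Below is a hedged sketch of Q-learning with a replay buffer on a toy chain; the buffer size, replay batch size, and environment are illustrative assumptions, not Lin's original settings.

```python
import random
from collections import deque

# Q-learning with an experience-replay buffer on a toy chain.
# Buffer size, replay batch size, and environment are illustrative assumptions.
N, ACTIONS, ALPHA, GAMMA, EPS = 6, (-1, +1), 0.1, 0.9, 0.1
Q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}
buffer = deque(maxlen=1000)                # stores (s, a, r, s2, done) tuples

def step(s, a):
    s2 = min(max(s + a, 0), N - 1)
    return s2, (1.0 if s2 == N - 1 else 0.0), s2 == N - 1

def update(s, a, r, s2, done):
    target = r if done else r + GAMMA * max(Q[(s2, x)] for x in ACTIONS)
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])

for episode in range(200):
    s = 0
    for _ in range(500):                   # step cap for early episodes
        if random.random() < EPS:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda x: (Q[(s, x)], random.random()))
        s2, r, done = step(s, a)
        buffer.append((s, a, r, s2, done))
        update(s, a, r, s2, done)          # learn from the live experience...
        for exp in random.sample(list(buffer), min(8, len(buffer))):
            update(*exp)                   # ...then replay stored experiences
        if done:
            break
        s = s2
```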
6.
Incremental Multi-Step Q-Learning   Total citations: 23 (self-citations: 0, citations by others: 23)
Peng, Jing; Williams, Ronald J. Machine Learning, 1996, 22(1-3): 283-290
This paper presents a novel incremental algorithm that combines Q-learning, a well-known dynamic-programming based reinforcement learning method, with the TD(λ) return estimation process, which is typically used in actor-critic learning, another well-known dynamic-programming based reinforcement learning method. The parameter λ is used to distribute credit throughout sequences of actions, leading to faster learning and also helping to alleviate the non-Markovian effect of coarse state-space quantization. The resulting algorithm, Q(λ)-learning, thus combines some of the best features of the Q-learning and actor-critic learning paradigms. The behavior of this algorithm has been demonstrated through computer simulations.
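The abstract does not spell out the update rule, and the Peng-Williams variant handles exploratory actions in a subtler way, so the sketch below instead shows the closely related and simpler Watkins's Q(λ) to illustrate how λ distributes credit along a trajectory; the toy environment and constants are invented.

```python
import random

# Watkins's Q(lambda) on a toy chain -- a simpler relative of Peng & Williams's
# Q(lambda), which (unlike Watkins's) does not cut traces after exploratory
# actions. Environment and constants are illustrative assumptions.
N, ACTIONS = 6, (-1, +1)
ALPHA, GAMMA, LAM, EPS = 0.1, 0.9, 0.8, 0.1
Q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}

def step(s, a):
    s2 = min(max(s + a, 0), N - 1)
    return s2, (1.0 if s2 == N - 1 else 0.0), s2 == N - 1

for episode in range(300):
    e = {k: 0.0 for k in Q}                       # eligibility traces
    s, done = 0, False
    while not done:
        if random.random() < EPS:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda x: (Q[(s, x)], random.random()))
        was_greedy = Q[(s, a)] == max(Q[(s, x)] for x in ACTIONS)
        s2, r, done = step(s, a)
        best_next = 0.0 if done else max(Q[(s2, x)] for x in ACTIONS)
        delta = r + GAMMA * best_next - Q[(s, a)]
        e[(s, a)] += 1.0
        for k in Q:                               # lambda spreads credit back
            Q[k] += ALPHA * delta * e[k]
            # Watkins's variant zeroes all traces after a non-greedy action.
            e[k] = GAMMA * LAM * e[k] if was_greedy else 0.0
        s = s2
```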
7.
Recent Advances in Hierarchical Reinforcement Learning   Total citations: 22 (self-citations: 0, citations by others: 22)
Reinforcement learning is bedeviled by the curse of dimensionality: the number of parameters to be learned grows exponentially with the size of any compact encoding of a state. Recent attempts to combat the curse of dimensionality have turned to principled ways of exploiting temporal abstraction, where decisions are not required at each step, but rather invoke the execution of temporally-extended activities which follow their own policies until termination. This leads naturally to hierarchical control architectures and associated learning algorithms. We review several approaches to temporal abstraction and hierarchical organization that machine learning researchers have recently developed. Common to these approaches is a reliance on the theory of semi-Markov decision processes, which we emphasize in our review. We then discuss extensions of these ideas to concurrent activities, multiagent coordination, and hierarchical memory for addressing partial observability. Concluding remarks address open challenges facing the further development of reinforcement learning in a hierarchical setting.
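The semi-Markov machinery the review emphasizes reduces, in the simplest case, to a Q-learning update in which reward is accumulated, and the discount compounded, over the k primitive steps a temporally-extended activity runs. The sketch below is a hedged illustration of that SMDP update; the corridor environment and the two hand-coded options are invented, not drawn from the review.

```python
import random

# SMDP Q-learning over two temporally-extended options on a toy corridor.
# Each option walks one direction until a "doorway" (state divisible by 3) or
# an end of the corridor. The environment and options are invented.
N, ALPHA, GAMMA, EPS = 7, 0.1, 0.9, 0.1
OPTIONS = (-1, +1)                         # the direction each option follows
Q = {(s, o): 0.0 for s in range(N) for o in OPTIONS}

def run_option(s, o):
    """Follow the option's internal policy until it terminates; return the
    resulting state, the cumulative discounted reward, and the step count k."""
    r_total, k = 0.0, 0
    while True:
        s = min(max(s + o, 0), N - 1)
        r_total += (GAMMA ** k) * (1.0 if s == N - 1 else 0.0)
        k += 1
        if s % 3 == 0 or s in (0, N - 1):  # doorway or corridor end
            return s, r_total, k

for episode in range(500):
    s = random.randrange(1, N - 1)
    while 0 < s < N - 1:                   # states 0 and N-1 are terminal
        if random.random() < EPS:
            o = random.choice(OPTIONS)
        else:
            o = max(OPTIONS, key=lambda x: (Q[(s, x)], random.random()))
        s2, r, k = run_option(s, o)
        best = 0.0 if not (0 < s2 < N - 1) else max(Q[(s2, x)] for x in OPTIONS)
        # SMDP update: the discount compounds over the k steps the option ran.
        Q[(s, o)] += ALPHA * (r + (GAMMA ** k) * best - Q[(s, o)])
        s = s2
```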
8.
A Reinforcement Learning Model Based on Agent Teams and Its Application   Total citations: 22 (self-citations: 2, citations by others: 20)
Multi-Agent learning has attracted considerable attention in recent years. Taking the single-Agent reinforcement learning algorithm Q-learning as its basis, this paper proposes a reinforcement learning model based on Agent teams. The model's most distinctive feature is the introduction of a leading Agent as the protagonist of team learning, with the whole team learning through changes in the leading Agent's role. A concrete application model is designed for the simulated robot soccer domain, Q-learning is extended in several respects, and experiments are carried out. The successful application in simulated robot soccer demonstrates the model's effectiveness.
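The abstract gives no algorithmic detail, so the following is only a hypothetical sketch of the stated idea: each team member keeps its own Q-table, a single leading Agent performs the learning update at each step, and the leading role rotates. The rotation rule, the class, and the commented control loop (including env_step) are all invented placeholders.

```python
import random

# Hypothetical sketch of team learning with a rotating "leading" Agent.
# Each team member keeps its own Q-table; only the current leader updates at a
# given step, and the leading role then rotates.
class TeamAgent:
    def __init__(self, actions, alpha=0.1, gamma=0.9, eps=0.1):
        self.q = {}
        self.actions = actions
        self.alpha, self.gamma, self.eps = alpha, gamma, eps

    def act(self, state):
        if random.random() < self.eps:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q.get((state, a), 0.0))

    def update(self, s, a, r, s2):
        best = max(self.q.get((s2, x), 0.0) for x in self.actions)
        old = self.q.get((s, a), 0.0)
        self.q[(s, a)] = old + self.alpha * (r + self.gamma * best - old)

team = [TeamAgent(actions=(0, 1)) for _ in range(3)]
leader = 0
# Inside the team's control loop (env_step is a hypothetical environment):
#     actions = [agent.act(state) for agent in team]
#     state2, reward = env_step(state, actions)
#     team[leader].update(state, actions[leader], reward, state2)
#     leader = (leader + 1) % len(team)      # placeholder role change
```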
9.
Elevator Group Control Using Multiple Reinforcement Learning Agents   Total citations: 22 (self-citations: 0, citations by others: 22)
Crites, Robert H.; Barto, Andrew G. Machine Learning, 1998, 33(2-3): 235-262
Recent algorithmic and theoretical advances in reinforcement learning (RL) have attracted widespread interest. RL algorithms have appeared that approximate dynamic programming on an incremental basis. They can be trained on the basis of real or simulated experiences, focusing their computation on areas of state space that are actually visited during control, making them computationally tractable on very large problems. If each member of a team of agents employs one of these algorithms, a new collective learning algorithm emerges for the team as a whole. In this paper we demonstrate that such collective RL algorithms can be powerful heuristic methods for addressing large-scale control problems. Elevator group control serves as our testbed. It is a difficult domain posing a combination of challenges not seen in most multi-agent learning research to date. We use a team of RL agents, each of which is responsible for controlling one elevator car. The team receives a global reward signal which appears noisy to each agent due to the effects of the actions of the other agents, the random nature of the arrivals and the incomplete observation of the state. In spite of these complications, we show results that in simulation surpass the best of the heuristic elevator control algorithms of which we are aware. These results demonstrate the power of multi-agent RL on a very large scale stochastic dynamic optimization problem of practical utility.
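A drastically simplified, hedged sketch of the paper's basic setup follows: several independent Q-learning agents that all receive one global reward signal, which from any single agent's viewpoint depends on the other agents' actions. The two-agent toy environment and every constant are invented and stand in for the real elevator simulation.

```python
import random

# Several independent Q-learners sharing one global reward signal -- a
# drastically simplified stand-in for the elevator simulation; everything
# here is invented for illustration.
N_AGENTS, N_STATES, ACTIONS = 2, 3, (0, 1)
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1
Q = [{} for _ in range(N_AGENTS)]          # one Q-table per agent

def choose(i, s):
    if random.random() < EPS:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: (Q[i].get((s, a), 0.0), random.random()))

def global_reward(actions):
    # Toy team objective: the team is rewarded for complementary actions, so
    # each agent's reward depends on what the other agent did ("noisy" signal).
    return 1.0 if actions[0] != actions[1] else 0.0

state = 0
for t in range(5000):
    acts = [choose(i, state) for i in range(N_AGENTS)]
    r = global_reward(acts)                # the same signal goes to every agent
    state2 = random.randrange(N_STATES)    # toy stochastic transition
    for i in range(N_AGENTS):
        best = max(Q[i].get((state2, a), 0.0) for a in ACTIONS)
        old = Q[i].get((state, acts[i]), 0.0)
        Q[i][(state, acts[i])] = old + ALPHA * (r + GAMMA * best - old)
    state = state2
```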
10.
Asynchronous Stochastic Approximation and Q-Learning   Total citations: 21 (self-citations: 6, citations by others: 15)
We provide some general results on the convergence of a class of stochastic approximation algorithms and their parallel and asynchronous variants. We then use these results to study the Q-learning algorithm, a reinforcement learning method for solving Markov decision problems, and establish its convergence under conditions more general than previously available.
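A standard condition in convergence results of this kind is that every state-action pair is updated infinitely often with step sizes satisfying Σαₙ = ∞ and Σαₙ² < ∞. The sketch below shows asynchronous Q-learning with per-pair step sizes α(s,a) = 1/visits(s,a), which meet those conditions; the toy environment is invented, and the sketch illustrates the setting rather than the paper's proof.

```python
import random

# Asynchronous Q-learning with per-pair step sizes alpha = 1/visits(s, a),
# which satisfy the usual stochastic-approximation conditions
# (sum of alpha diverges, sum of alpha^2 converges). Toy environment only.
N, ACTIONS, GAMMA = 4, (0, 1), 0.9
Q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}
visits = {k: 0 for k in Q}

def step(s, a):
    # Invented stochastic transition and reward; not from the paper.
    s2 = (s + a + random.randrange(2)) % N
    return s2, (1.0 if s2 == N - 1 else 0.0)

s = 0
for t in range(20000):
    a = random.choice(ACTIONS)             # exploratory behavior policy; only
    s2, r = step(s, a)                     # the visited pair updates this step
    visits[(s, a)] += 1
    alpha = 1.0 / visits[(s, a)]           # decaying per-pair step size
    target = r + GAMMA * max(Q[(s2, x)] for x in ACTIONS)
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    s = s2
print(Q)
```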