Full-text access type
Paid full text | 613 articles |
Free within China | 43 articles |
Completely free | 265 articles |
Subject classification
Automation technology | 921 articles |
Publication year
2023 | 4 articles |
2022 | 77 articles |
2021 | 63 articles |
2020 | 24 articles |
2019 | 29 articles |
2018 | 17 articles |
2017 | 8 articles |
2016 | 16 articles |
2015 | 14 articles |
2014 | 23 articles |
2013 | 55 articles |
2012 | 23 articles |
2011 | 47 articles |
2010 | 26 articles |
2009 | 38 articles |
2008 | 67 articles |
2007 | 48 articles |
2006 | 46 articles |
2005 | 47 articles |
2004 | 31 articles |
2003 | 23 articles |
2002 | 46 articles |
2001 | 30 articles |
2000 | 22 articles |
1999 | 18 articles |
1998 | 18 articles |
1997 | 20 articles |
1996 | 15 articles |
1995 | 8 articles |
1994 | 6 articles |
1993 | 3 articles |
1992 | 7 articles |
1991 | 1 article |
1990 | 1 article |
Sorted by: 921 results found, search time 222 ms
1.
2.
3.
Reinforcement Learning with Replacing Eligibility Traces   Times cited: 26 (self-citations: 0, by others: 26)
The eligibility trace is one of the basic mechanisms used in reinforcement learning to handle delayed reward. In this paper we introduce a new kind of eligibility trace, the replacing trace, analyze it theoretically, and show that it results in faster, more reliable learning than the conventional trace. Both kinds of trace assign credit to prior events according to how recently they occurred, but only the conventional trace gives greater credit to repeated events. Our analysis is for conventional and replace-trace versions of the offline TD(1) algorithm applied to undiscounted absorbing Markov chains. First, we show that these methods converge under repeated presentations of the training set to the same predictions as two well known Monte Carlo methods. We then analyze the relative efficiency of the two Monte Carlo methods. We show that the method corresponding to conventional TD is biased, whereas the method corresponding to replace-trace TD is unbiased. In addition, we show that the method corresponding to replacing traces is closely related to the maximum likelihood solution for these tasks, and that its mean squared error is always lower in the long run. Computational results confirm these analyses and show that they are applicable more generally. In particular, we show that replacing traces significantly improve performance and reduce parameter sensitivity on the "Mountain-Car" task, a full reinforcement-learning problem with a continuous state space, when using a feature-based function approximator.
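The difference between the conventional (accumulating) trace and the replacing trace reduces to a one-line change in tabular TD(λ). The sketch below is an illustrative reimplementation, not the authors' code; the environment interface (`env.reset()`, `env.step(a)`) and the fixed `policy` callable are assumptions.

```python
import numpy as np

def td_lambda_prediction(env, policy, n_states, episodes=500,
                         alpha=0.1, gamma=1.0, lam=1.0, replacing=True):
    """Tabular TD(lambda) prediction with accumulating or replacing traces."""
    V = np.zeros(n_states)
    for _ in range(episodes):
        e = np.zeros(n_states)          # eligibility trace per state
        s, done = env.reset(), False
        while not done:
            s_next, r, done = env.step(policy(s))
            delta = r + gamma * (0.0 if done else V[s_next]) - V[s]
            if replacing:
                e[s] = 1.0              # replacing trace: reset to 1 on revisit
            else:
                e[s] += 1.0             # accumulating (conventional) trace
            V += alpha * delta * e      # credit all recently visited states
            e *= gamma * lam            # decay all traces
            s = s_next
    return V
```

With `gamma=1.0` and `lam=1.0` this corresponds to the undiscounted offline-style TD(1) setting the abstract analyzes, except that updates here are applied online for brevity.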
4.
A Learning Agent Based on Reinforcement Learning   Times cited: 24 (self-citations: 2, by others: 22)
Reinforcement learning learns the optimal behavior policy of a dynamic system by perceiving environment states and receiving uncertain reward values from the environment, and it is one of the core techniques for constructing intelligent agents. This paper extends the BDI model in the agent-oriented development environment AODE by introducing policy and capability as mental components and using reinforcement learning to implement the policy-construction function, thereby proposing a learning agent based on reinforcement learning. The structure and operating mode of adaptive agents in AODE are studied, giving intelligent agents the ability to learn online in dynamic environments so that the agents' various mental requirements can be satisfied effectively.
5.
Self-Improving Reactive Agents Based on Reinforcement Learning, Planning and Teaching   Times cited: 24 (self-citations: 7, by others: 17)
Long-Ji Lin, Machine Learning, 1992, 8(3-4): 293-321
To date, reinforcement learning has mostly been studied in solving simple learning tasks. Reinforcement learning methods that have been studied so far typically converge slowly. The purpose of this work is thus two-fold: 1) to investigate the utility of reinforcement learning in solving much more complicated learning tasks than previously studied, and 2) to investigate methods that will speed up reinforcement learning. This paper compares eight reinforcement learning frameworks: adaptive heuristic critic (AHC) learning due to Sutton, Q-learning due to Watkins, and three extensions to both basic methods for speeding up learning. The three extensions are experience replay, learning action models for planning, and teaching. The frameworks were investigated using connectionism as an approach to generalization. To evaluate the performance of different frameworks, a dynamic environment was used as a testbed. The environment is moderately complex and nondeterministic. This paper describes these frameworks and algorithms in detail and presents an empirical evaluation of the frameworks.
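Of the three speed-up extensions, experience replay is the simplest to illustrate: transitions are stored and repeatedly re-presented to the learning update. The sketch below is a hedged tabular illustration with an assumed environment interface; the original work used connectionist (neural-network) function approximation rather than a table.

```python
import random
from collections import deque

def q_learning_with_replay(env, n_states, n_actions, episodes=200,
                           alpha=0.1, gamma=0.95, epsilon=0.1,
                           buffer_size=10_000, replay_steps=30):
    """Tabular Q-learning where each real step is followed by replayed updates."""
    Q = [[0.0] * n_actions for _ in range(n_states)]
    buffer = deque(maxlen=buffer_size)

    def update(s, a, r, s2, done):
        target = r if done else r + gamma * max(Q[s2])
        Q[s][a] += alpha * (target - Q[s][a])

    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            a = (random.randrange(n_actions) if random.random() < epsilon
                 else max(range(n_actions), key=lambda i: Q[s][i]))
            s2, r, done = env.step(a)
            buffer.append((s, a, r, s2, done))
            update(s, a, r, s2, done)                      # learn from the real step
            for exp in random.sample(buffer, min(replay_steps, len(buffer))):
                update(*exp)                               # replay stored experiences
            s = s2
    return Q
```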
6.
Incremental Multi-Step Q-Learning   Times cited: 23 (self-citations: 0, by others: 23)
This paper presents a novel incremental algorithm that combines Q-learning, a well-known dynamic-programming based reinforcement learning method, with the TD(λ) return estimation process, which is typically used in actor-critic learning, another well-known dynamic-programming based reinforcement learning method. The λ parameter is used to distribute credit throughout sequences of actions, leading to faster learning and also helping to alleviate the non-Markovian effect of coarse state-space quantization. The resulting algorithm, Q(λ)-learning, thus combines some of the best features of the Q-learning and actor-critic learning paradigms. The behavior of this algorithm has been demonstrated through computer simulations.
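The combination described here is easiest to see in the closely related Watkins-style Q(λ), where an eligibility trace spreads each one-step Q-learning error backward over recent state-action pairs; the paper's own Q(λ) (Peng's variant) differs mainly in how exploratory actions are treated. A minimal tabular sketch, with assumed environment methods:

```python
import numpy as np

def watkins_q_lambda(env, n_states, n_actions, episodes=500,
                     alpha=0.1, gamma=0.95, lam=0.8, epsilon=0.1):
    """Watkins' Q(lambda): Q-learning whose TD error is spread by eligibility traces."""
    rng = np.random.default_rng(0)
    Q = np.zeros((n_states, n_actions))

    def choose(s):
        return int(Q[s].argmax()) if rng.random() > epsilon else int(rng.integers(n_actions))

    for _ in range(episodes):
        e = np.zeros_like(Q)
        s, done = env.reset(), False
        a = choose(s)
        while not done:
            s2, r, done = env.step(a)
            a2 = choose(s2)
            a_star = int(Q[s2].argmax())                   # greedy action at next state
            delta = r + (0.0 if done else gamma * Q[s2, a_star]) - Q[s, a]
            e[s, a] += 1.0
            Q += alpha * delta * e                         # multi-step credit assignment
            if a2 == a_star:
                e *= gamma * lam                           # decay traces after a greedy action
            else:
                e[:] = 0.0                                 # cut traces on an exploratory action
            s, a = s2, a2
    return Q
```

Peng's Q(λ) does not cut traces on exploratory actions, which is what lets credit flow through longer action sequences at the cost of a more involved update.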
7.
Recent Advances in Hierarchical Reinforcement Learning   Times cited: 22 (self-citations: 0, by others: 22)
Reinforcement learning is bedeviled by the curse of dimensionality: the number of parameters to be learned grows exponentially with the size of any compact encoding of a state. Recent attempts to combat the curse of dimensionality have turned to principled ways of exploiting temporal abstraction, where decisions are not required at each step, but rather invoke the execution of temporally-extended activities which follow their own policies until termination. This leads naturally to hierarchical control architectures and associated learning algorithms. We review several approaches to temporal abstraction and hierarchical organization that machine learning researchers have recently developed. Common to these approaches is a reliance on the theory of semi-Markov decision processes, which we emphasize in our review. We then discuss extensions of these ideas to concurrent activities, multiagent coordination, and hierarchical memory for addressing partial observability. Concluding remarks address open challenges facing the further development of reinforcement learning in a hierarchical setting.
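The semi-Markov decision process view emphasized in the review can be summarized by the SMDP Q-learning update: a temporally extended activity (option) executed for k steps with accumulated discounted reward R is treated as a single action. The sketch below is a hedged illustration; the `Option` objects, exposing `policy(s)` and `terminates(s)`, are assumed for the example.

```python
def run_option(env, s, option, gamma=0.95):
    """Execute one option until it terminates; return (R, k, s_next, done)."""
    R, k, done = 0.0, 0, False
    while not done:
        s, r, done = env.step(option.policy(s))
        R += (gamma ** k) * r           # discounted reward accumulated inside the option
        k += 1
        if option.terminates(s):
            break
    return R, k, s, done

def smdp_q_update(Q, s, option_idx, R, k, s_next, done, alpha=0.1, gamma=0.95):
    """One SMDP Q-learning update for an option that ran k steps from s and ended in s_next."""
    target = R if done else R + (gamma ** k) * max(Q[s_next])
    Q[s][option_idx] += alpha * (target - Q[s][option_idx])
```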
8.
A Reinforcement Learning Model Based on Agent Teams and Its Application   Times cited: 22 (self-citations: 2, by others: 20)
Multi-agent learning is a research direction that has attracted considerable attention in recent years. Building on the single-agent Q-learning reinforcement learning algorithm, this paper proposes a reinforcement learning model based on an agent team. The model's defining feature is the introduction of a leading agent that plays the central role in team learning, with learning for the whole team achieved through role switching of the leading agent. A concrete application model is designed for the simulated robot soccer domain, Q-learning is extended in several respects, and experiments are carried out; the successful application in simulated robot soccer demonstrates the effectiveness of the model.
9.
Elevator Group Control Using Multiple Reinforcement Learning Agents   Times cited: 22 (self-citations: 0, by others: 22)
Recent algorithmic and theoretical advances in reinforcement learning (RL) have attracted widespread interest. RL algorithms have appeared that approximate dynamic programming on an incremental basis. They can be trained on the basis of real or simulated experiences, focusing their computation on areas of state space that are actually visited during control, making them computationally tractable on very large problems. If each member of a team of agents employs one of these algorithms, a new collective learning algorithm emerges for the team as a whole. In this paper we demonstrate that such collective RL algorithms can be powerful heuristic methods for addressing large-scale control problems. Elevator group control serves as our testbed. It is a difficult domain posing a combination of challenges not seen in most multi-agent learning research to date. We use a team of RL agents, each of which is responsible for controlling one elevator car. The team receives a global reward signal which appears noisy to each agent due to the effects of the actions of the other agents, the random nature of the arrivals and the incomplete observation of the state. In spite of these complications, we show results that in simulation surpass the best of the heuristic elevator control algorithms of which we are aware. These results demonstrate the power of multi-agent RL on a very large-scale stochastic dynamic optimization problem of practical utility.
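The team architecture described here, one learner per elevator car with a shared global reward, can be sketched schematically as independent Q-learners; this is only a hedged illustration under assumed state and action encodings, not the paper's neural-network-based implementation.

```python
import random

class IndependentQLearner:
    """One learner per elevator car; every car's agent receives the same global reward."""
    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.99, epsilon=0.1):
        self.Q = [[0.0] * n_actions for _ in range(n_states)]
        self.alpha, self.gamma, self.epsilon, self.n_actions = alpha, gamma, epsilon, n_actions

    def act(self, state):
        if random.random() < self.epsilon:
            return random.randrange(self.n_actions)                     # explore
        return max(range(self.n_actions), key=lambda a: self.Q[state][a])

    def learn(self, s, a, global_reward, s_next):
        # Each agent updates its own table from the shared (hence noisy) global reward.
        target = global_reward + self.gamma * max(self.Q[s_next])
        self.Q[s][a] += self.alpha * (target - self.Q[s][a])

# One agent per car; all observe the same reward at each decision point, e.g.:
# agents = [IndependentQLearner(n_states, n_actions) for _ in range(num_cars)]
```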
10.
Asynchronous Stochastic Approximation and Q-Learning   Times cited: 21 (self-citations: 6, by others: 15)
We provide some general results on the convergence of a class of stochastic approximation algorithms and their parallel and asynchronous variants. We then use these results to study the Q-learning algorithm, a reinforcement learning method for solving Markov decision problems, and establish its convergence under conditions more general than previously available.
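The object of study is the standard asynchronous Q-learning iteration with per-entry step sizes satisfying the usual stochastic-approximation (Robbins-Monro) conditions, i.e. the step sizes sum to infinity while their squares sum to a finite value. A hedged sketch with an assumed environment interface, using a 1/n(s, a) step-size schedule:

```python
import numpy as np

def asynchronous_q_learning(env, n_states, n_actions, steps=100_000, gamma=0.95):
    """Q-learning with per-(s, a) step sizes alpha_n = 1/n, which satisfy the
    Robbins-Monro conditions (sum of alpha diverges, sum of alpha^2 converges)."""
    rng = np.random.default_rng(0)
    Q = np.zeros((n_states, n_actions))
    visits = np.zeros((n_states, n_actions))
    s = env.reset()
    for _ in range(steps):
        a = int(rng.integers(n_actions))           # exploration that keeps visiting all (s, a)
        s_next, r, done = env.step(a)
        visits[s, a] += 1
        alpha = 1.0 / visits[s, a]
        target = r + (0.0 if done else gamma * Q[s_next].max())
        Q[s, a] += alpha * (target - Q[s, a])      # only the visited entry is updated (asynchronous)
        s = env.reset() if done else s_next
    return Q
```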