Similar Literature
20 similar records found
1.
付强  陈焕文 《微机发展》2007,17(12):76-79
Machine game-playing is regarded as one of the most challenging research directions in artificial intelligence. Computer Chinese chess is no less difficult than computer chess, yet few researchers work on it, and fewer still on programs with self-learning ability. This paper introduces the principles of human-computer play in Chinese chess, presents several typical self-learning methods for evaluation functions developed in recent years together with their underlying principles, and identifies by comparison the learning method best suited to Chinese chess. The remaining problems of these methods are analyzed and future research directions are proposed.

2.
Machine game-playing is regarded as one of the most challenging research directions in artificial intelligence. Computer Chinese chess is no less difficult than computer chess, yet few researchers work on it, and fewer still on programs with self-learning ability. This paper introduces the principles of human-computer play in Chinese chess, presents several typical learning methods for evaluation functions developed in recent years together with their underlying principles, and identifies by comparison the learning method best suited to Chinese chess. The remaining problems of these methods are analyzed and future research directions are proposed.

3.
Tesauro  Gerald 《Machine Learning》1998,32(3):241-243
The results obtained by Pollack and Blair substantially underperform my 1992 TD Learning results. This is shown by directly benchmarking the 1992 TD nets against Pubeval. A plausible hypothesis for this underperformance is that, unlike TD learning, the hillclimbing algorithm fails to capture nonlinear structure inherent in the problem, and despite the presence of hidden units, only obtains a linear approximation to the optimal policy for backgammon. Two lines of evidence supporting this hypothesis are discussed, the first coming from the structure of the Pubeval benchmark program, and the second coming from experiments replicating the Pollack and Blair results.

4.
Practical Issues in Temporal Difference Learning
This paper examines whether temporal difference methods for training connectionist networks, such as Sutton's TD(λ) algorithm, can be successfully applied to complex real-world problems. A number of important practical issues are identified and discussed from a general theoretical perspective. These practical issues are then examined in the context of a case study in which TD(λ) is applied to learning the game of backgammon from the outcome of self-play. This is apparently the first application of this algorithm to a complex non-trivial task. It is found that, with zero knowledge built in, the network is able to learn from scratch to play the entire game at a fairly strong intermediate level of performance, which is clearly better than conventional commercial programs, and which in fact surpasses comparable networks trained on a massive human expert data set. This indicates that TD learning may work better in practice than one would expect based on current theory, and it suggests that further analysis of TD methods, as well as applications in other complex domains, may be worth investigating.

5.
Co-Evolution in the Successful Learning of Backgammon Strategy
Following Tesauro's work on TD-Gammon, we used a 4,000 parameter feedforward neural network to develop a competitive backgammon evaluation function. Play proceeds by a roll of the dice, application of the network to all legal moves, and selection of the position with the highest evaluation. However, no backpropagation, reinforcement or temporal difference learning methods were employed. Instead we apply simple hillclimbing in a relative fitness environment. We start with an initial champion of all zero weights and proceed simply by playing the current champion network against a slightly mutated challenger and changing weights if the challenger wins. Surprisingly, this worked rather well. We investigate how the peculiar dynamics of this domain enabled a previously discarded weak method to succeed, by preventing suboptimal equilibria in a meta-game of self-learning.
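The champion-versus-mutated-challenger scheme the abstract describes can be sketched in a few lines. Everything here — the tiny linear evaluator, the random one-position "game", and the mutation scale — is an illustrative stand-in, not Pollack and Blair's actual network or matchup rules (they also moved the champion only a fraction toward the winner rather than replacing it outright):

```python
import random

random.seed(0)
DIM = 8  # toy stand-in for the 4,000-parameter network: a plain weight vector

def evaluate(weights, features):
    # linear score of a position's feature vector
    return sum(w * f for w, f in zip(weights, features))

def play(champion, challenger):
    # toy "game": both sides score one random position; the higher score wins
    pos = [random.uniform(-1, 1) for _ in range(DIM)]
    return evaluate(challenger, pos) > evaluate(champion, pos)

def hillclimb(generations=2000, noise=0.05):
    champion = [0.0] * DIM  # start from the all-zero champion, as in the abstract
    for _ in range(generations):
        challenger = [w + random.gauss(0, noise) for w in champion]
        if play(champion, challenger):  # keep the challenger when it wins
            champion = challenger
    return champion

w = hillclimb()
```

No gradient is ever computed: selection pressure from wins and losses alone moves the weights, which is exactly the "weak method" whose surprising success the paper investigates.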

6.
Technical Update: Least-Squares Temporal Difference Learning
Boyan  Justin A. 《Machine Learning》2002,49(2-3):233-246
TD(λ) is a popular family of algorithms for approximate policy evaluation in large MDPs. TD(λ) works by incrementally updating the value function after each observed transition. It has two major drawbacks: it may make inefficient use of data, and it requires the user to manually tune a stepsize schedule for good performance. For the case of linear value function approximations and λ = 0, the Least-Squares TD (LSTD) algorithm of Bradtke and Barto (1996, Machine Learning, 22:1–3, 33–57) eliminates all stepsize parameters and improves data efficiency. This paper updates Bradtke and Barto's work in three significant ways. First, it presents a simpler derivation of the LSTD algorithm. Second, it generalizes from λ = 0 to arbitrary values of λ; at the extreme of λ = 1, the resulting new algorithm is shown to be a practical, incremental formulation of supervised linear regression. Third, it presents a novel and intuitive interpretation of LSTD as a model-based reinforcement learning technique.
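The core of LSTD(0) as summarized above is to accumulate the statistics A = Σ φ(φ − γφ′)ᵀ and b = Σ rφ over observed transitions and then solve Aθ = b once — no stepsize schedule at all. A minimal sketch on a made-up three-state chain (the two-dimensional features and the episode data are illustrative, not from Boyan's paper):

```python
def lstd0(transitions, gamma=0.9):
    # Accumulate A = sum phi (phi - gamma*phi')^T and b = sum r*phi,
    # then solve the 2x2 system A theta = b (Cramer's rule).
    A = [[0.0, 0.0], [0.0, 0.0]]
    b = [0.0, 0.0]
    for phi, r, phi_next in transitions:
        diff = [phi[j] - gamma * phi_next[j] for j in range(2)]
        for i in range(2):
            b[i] += r * phi[i]
            for j in range(2):
                A[i][j] += phi[i] * diff[j]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    theta0 = (b[0] * A[1][1] - b[1] * A[0][1]) / det
    theta1 = (A[0][0] * b[1] - A[1][0] * b[0]) / det
    return [theta0, theta1]

# Toy chain 0 -> 1 -> 2 -> terminal, reward 1 per step, features
# phi(s) = [1, s]; the terminal state has the zero feature vector.
episode = [([1, 0], 1.0, [1, 1]),
           ([1, 1], 1.0, [1, 2]),
           ([1, 2], 1.0, [0, 0])]
theta = lstd0(episode)  # approx [2.710, -0.841]
```

With these features the one-shot solve recovers the TD fixed point directly from the batch of data, which is the data-efficiency advantage over incremental TD(0) that the abstract highlights.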

7.
Learning to Predict by the Methods of Temporal Differences
This article introduces a class of incremental learning procedures specialized for prediction – that is, for using past experience with an incompletely known system to predict its future behavior. Whereas conventional prediction-learning methods assign credit by means of the difference between predicted and actual outcomes, the new methods assign credit by means of the difference between temporally successive predictions. Although such temporal-difference methods have been used in Samuel's checker player, Holland's bucket brigade, and the author's Adaptive Heuristic Critic, they have remained poorly understood. Here we prove their convergence and optimality for special cases and relate them to supervised-learning methods. For most real-world prediction problems, temporal-difference methods require less memory and less peak computation than conventional methods and they produce more accurate predictions. We argue that most problems to which supervised learning is currently applied are really prediction problems of the sort to which temporal-difference methods can be applied to advantage.
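The central idea — assign credit by the difference between successive predictions rather than between a prediction and the final outcome — can be sketched as tabular TD(λ) with an accumulating eligibility trace. The deterministic three-state chain below is an illustrative toy, not an example from the paper:

```python
def td_lambda(episodes, n_states, alpha=0.1, gamma=1.0, lam=0.8):
    # Tabular TD(lambda): each TD error (the difference between successive
    # predictions) is credited to recently visited states via the trace e.
    V = [0.0] * n_states
    for episode in episodes:
        e = [0.0] * n_states                    # eligibility trace
        for s, r, s_next in episode:
            v_next = V[s_next] if s_next is not None else 0.0
            delta = r + gamma * v_next - V[s]   # temporal-difference error
            e[s] += 1.0                         # accumulating trace
            for i in range(n_states):
                V[i] += alpha * delta * e[i]
                e[i] *= gamma * lam
    return V

# Deterministic chain 0 -> 1 -> 2 -> terminal with reward 1 on the last
# step, so under gamma = 1 every state's true value is 1.
walk = [(0, 0.0, 1), (1, 0.0, 2), (2, 1.0, None)]
V = td_lambda([walk] * 200, n_states=3)
```

Note that each step's update touches only the current prediction error and the trace vector, which is why the article can claim lower memory and peak computation than outcome-based supervised methods.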

8.
This paper proposes a new neural network model based on rough sets and temporal concepts: the temporal rough neural network. A time factor is added to the network input, i.e., the inputs are functions of time, turning conventional neurons into temporal neurons. The neurons of a temporal rough neural network are temporal rough neurons, each consisting of a pair of temporal neurons: after the time factor is incorporated, the upper and lower boundaries of the data serve as the network's inputs and outputs. When the inputs and outputs are not single values but sets of time-varying data, prediction models built on classical neural networks produce large errors, whereas a neural network based on temporal rough set theory handles this situation well and captures real problems more faithfully, thus providing a sound theoretical model for this class of problems.

9.
Conventional robot control schemes are basically model-based methods. However, exact modeling of robot dynamics poses considerable problems and faces various uncertainties in task execution. This paper proposes a reinforcement learning control approach for overcoming such drawbacks. An artificial neural network (ANN) serves as the learning structure, and a stochastic real-valued (SRV) unit as the learning method. Initially, force tracking control of a two-link robot arm is simulated to verify the control design. The simulation results confirm that even without information related to the robot dynamic model and environment states, operation rules for simultaneously controlling force and velocity are achievable by repetitive exploration. Hitherto, however, acceptable performance has demanded many learning iterations, and the learning speed proved too slow for practical applications. The approach herein, therefore, improves the tracking performance by combining a conventional controller with a reinforcement learning strategy. Experimental results demonstrate improved trajectory tracking performance of a two-link direct-drive robot manipulator using the proposed method.

10.
11.
Sockets are the foundation of network communication, and combining the C# language with socket programming is an effective way to develop networked applications. On this basis, a networked Gomoku (five-in-a-row) game system was designed, and the techniques used and the implementation approach are described in detail.

12.
A Neural Network Predictive Control System Based on the Temporal Difference Algorithm
To improve the computational efficiency of multi-step predictive control, a design method for an Elman-network multi-step predictive controller based on the temporal difference algorithm is proposed. An Elman network directly predicts the outputs of a nonlinear system several steps ahead. To overcome the BP algorithm's inability to update network weights incrementally in real time, a new learning algorithm combining the temporal difference method with BP is proposed. To simplify computation, a single-value predictive control algorithm performs receding-horizon optimization of the nonlinear system to compute the next control action. Theoretical analysis and simulation results show that the method has a simple structure, a small computational load, and high speed, making it applicable to real-time systems, and that it adapts to some extent to changes in system parameters.

13.
A Neural Network Controller Trained by a Genetic Algorithm

14.
Self-Balancing Control of a Two-Wheeled Robot Based on Reinforcement Learning Rules
A two-wheeled robot is a typical unstable, nonlinear, strongly coupled self-balancing system. With the system model unknown and no prior experience available, a reinforcement learning algorithm is effectively combined with a fuzzy neural network, guaranteeing fast and convergent function approximation, successfully achieving self-learned balance control of the robot, and solving the reinforcement learning problem over its continuous state and action spaces. Simulations and experiments show that the method not only balances the robot within a very short time, but also maintains balance even under large variations in the robot's parameters.

15.
This paper introduces a new framework for constructing learning algorithms. Our methods involve master algorithms which use learning algorithms for intersection-closed concept classes as subroutines. For example, we give a master algorithm capable of learning any concept class whose members can be expressed as nested differences (for example, c1 − (c2 − (c3 − (c4 ∩ c5)))) of concepts from an intersection-closed class. We show that our algorithms are optimal or nearly optimal with respect to several different criteria. These criteria include: the number of examples needed to produce a good hypothesis with high confidence, the worst case total number of mistakes made, and the expected number of mistakes made in the first t trials.
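The subroutine such master algorithms build on is the closure algorithm for an intersection-closed class: predict with the smallest concept in the class containing all positive examples seen so far. A minimal sketch for axis-aligned rectangles, a standard intersection-closed class (the sample points are illustrative):

```python
def closure_learner(positives):
    # Smallest axis-aligned rectangle containing every positive example:
    # the closure of the sample within an intersection-closed class.
    xs = [p[0] for p in positives]
    ys = [p[1] for p in positives]
    return (min(xs), max(xs), min(ys), max(ys))

def predict(box, point):
    # membership test for the learned rectangle
    x0, x1, y0, y1 = box
    return x0 <= point[0] <= x1 and y0 <= point[1] <= y1

box = closure_learner([(1, 1), (2, 3), (4, 2)])   # -> (1, 4, 1, 3)
inside = predict(box, (3, 2))    # point inside the learned rectangle
outside = predict(box, (5, 5))   # point outside it
```

Because the closure never contains a point unnecessarily, it can only err on the positive side, which is the one-sided behavior the nested-difference construction exploits at each level.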

16.
In this paper, a feedforward neural network with sigmoid hidden units is used to design a neural network based iterative learning controller for nonlinear systems with state dependent input gains. No prior offline training phase is necessary, and only a single neural network is employed. All the weights of the neurons are tuned during the iteration process in order to achieve the desired learning performance. The adaptive laws for the weights of neurons and the analysis of learning performance are determined via Lyapunov‐like analysis. A projection learning algorithm is used to prevent drifting of weights. It is shown that the tracking error vector asymptotically converges to zero as the number of iterations goes to infinity, and that all adjustable parameters as well as internal signals remain bounded.

17.
18.
A Rule Learning Algorithm Based on Neural Network Ensembles
Combining neural network ensembles with rule learning, this paper proposes a rule learning algorithm based on neural network ensembles. The algorithm uses a neural network ensemble as the front end of rule learning, employing it to generate the data set from which rules are then learned. Experimental results on data sets from the UCI machine learning repository show that the algorithm produces rules with very strong generalization ability.

19.
20.
Learning Fuzzy Control Rules with Adaptive Neurons
This paper presents a new method that uses adaptive neurons to learn and modify fuzzy control rules. The method can learn the previously effective control rules that bear on the current process output performance, and it automatically adjusts the rules as the process environment changes, thereby improving the process output performance.


Copyright©北京勤云科技发展有限公司  京ICP备09084417号