Similar Documents
20 similar documents found.
1.
Controlling chaos by GA-based reinforcement learning neural network (cited 12 times: 0 self-citations, 12 by others)
Proposes a TD (temporal difference) and GA (genetic algorithm) based reinforcement (TDGAR) neural learning scheme for controlling chaotic dynamical systems based on the technique of small perturbations. The TDGAR learning scheme is a new hybrid GA, which integrates the TD prediction method and the GA to fulfil the reinforcement learning task. Structurally, the TDGAR learning system is composed of two integrated feedforward networks. One neural network acts as a critic network for helping the learning of the other network, the action network, which determines the outputs (actions) of the TDGAR learning system. Using the TD prediction method, the critic network can predict the external reinforcement signal and provide a more informative internal reinforcement signal to the action network. The action network uses the GA to adapt itself according to the internal reinforcement signal. This can usually accelerate GA learning, since an external reinforcement signal may only become available long after a sequence of actions has occurred in reinforcement learning problems. By defining a simple external reinforcement signal, the TDGAR learning system can learn to produce a series of small perturbations to convert the chaotic oscillations of a chaotic system into the desired regular ones with periodic behavior. The proposed method is an adaptive search for the optimum control technique. Computer simulations on controlling two chaotic systems, i.e., the Hénon map and the logistic map, have been conducted to illustrate the performance of the proposed method.
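Below is a minimal Python/NumPy sketch of the general idea in this abstract: a TD(0) critic converts the external reinforcement into an internal TD-error signal, and a GA evolves a small-perturbation control policy whose fitness is the accumulated internal signal. The plant (the logistic map at r = 4, with fixed point x* = 0.75), the one-parameter policy u = k(x* − x), and all names are illustrative assumptions, not the authors' TDGAR implementation.

```python
# Illustrative TD-critic + GA-actor sketch for stabilizing the logistic map
# with small perturbations (hypothetical names, not the TDGAR code).
import numpy as np

rng = np.random.default_rng(0)

def logistic_step(x, u, r=4.0, u_max=0.05):
    """One step of the logistic map with a bounded control perturbation."""
    u = np.clip(u, -u_max, u_max)
    return np.clip(r * x * (1.0 - x) + u, 0.0, 1.0)

class CriticTD:
    """Linear TD(0) critic: predicts the external reinforcement and emits a TD
    error that serves as the internal reinforcement for the action network."""
    def __init__(self, n_features=10, alpha=0.1, gamma=0.9):
        self.w = np.zeros(n_features)
        self.alpha, self.gamma = alpha, gamma

    def features(self, x):
        # Coarse one-hot coding of the state interval [0, 1].
        phi = np.zeros(len(self.w))
        phi[min(int(x * len(self.w)), len(self.w) - 1)] = 1.0
        return phi

    def internal_signal(self, x, r_ext, x_next):
        phi, phi_next = self.features(x), self.features(x_next)
        td_error = r_ext + self.gamma * self.w @ phi_next - self.w @ phi
        self.w += self.alpha * td_error * phi      # TD(0) update
        return td_error                            # internal reinforcement

def rollout(gain, critic, x_target=0.75, steps=200):
    """Accumulated internal reinforcement for one policy u = gain * (x_target - x)."""
    x, score = rng.uniform(0.2, 0.8), 0.0
    for _ in range(steps):
        u = gain * (x_target - x)
        x_next = logistic_step(x, u)
        r_ext = -abs(x_next - x_target)            # simple external reinforcement
        score += critic.internal_signal(x, r_ext, x_next)
        x = x_next
    return score

# The GA adapts the policy gain; the critic's TD error, not the sparse external
# signal, drives selection, which is what accelerates learning in this scheme.
critic, population = CriticTD(), rng.uniform(-2.0, 2.0, size=20)
for generation in range(30):
    fitness = np.array([rollout(k, critic) for k in population])
    parents = population[np.argsort(fitness)][-10:]                 # keep the best half
    children = parents + rng.normal(0.0, 0.1, size=parents.shape)   # mutate
    population = np.concatenate([parents, children])

print("best gain:", population[np.argmax([rollout(k, critic) for k in population])])
```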

2.
This paper proposes a reinforcement fuzzy adaptive learning control network (RFALCON), constructed by integrating two fuzzy adaptive learning control networks (FALCON), each of which has a feedforward multilayer network and is developed for the realization of a fuzzy controller. One FALCON performs as a critic network (fuzzy predictor), the other as an action network (fuzzy controller). Using temporal difference prediction, the critic network can predict the external reinforcement signal and provide a more informative internal reinforcement signal to the action network. The action network performs a stochastic exploratory algorithm to adapt itself according to the internal reinforcement signal. An ART-based reinforcement structure/parameter-learning algorithm is developed for constructing the RFALCON dynamically. During the learning process, structure and parameter learning are performed simultaneously. RFALCON can construct a fuzzy control system through a reward/penalty signal. It has two important features: it reduces the combinatorial demands of system adaptive linearization, and it is highly autonomous.

3.
This paper proposes an adaptive critic tracking control design for a class of nonlinear systems using fuzzy basis function networks (FBFNs). The key component of the adaptive critic controller is the FBFN, which implements an associative learning network (ALN) to approximate unknown nonlinear system functions, and an adaptive critic network (ACN) to generate the internal reinforcement learning signal to tune the ALN. Another important component, the reinforcement learning signal generator, requires the solution of a linear matrix inequality (LMI), which should also be satisfied to ensure stability. Furthermore, the robust control technique can easily reject the effects of the approximation errors of the FBFN and external disturbances. Unlike traditional adaptive critic controllers that learn from trial-and-error interactions, the proposed on-line tuning algorithm for ALN and ACN is derived from Lyapunov theory, thereby significantly shortening the learning time. Simulation results of a cart-pole system demonstrate the effectiveness of the proposed FBFN-based adaptive critic controller.

4.
In this paper, reinforcement learning state- and output-feedback-based adaptive critic controller designs are proposed by using online approximators (OLAs) for general multi-input and multi-output affine unknown nonlinear discrete-time systems in the presence of bounded disturbances. The proposed controller design has two entities: an action network that is designed to produce an optimal signal, and a critic network that evaluates the performance of the action network. The critic estimates the cost-to-go function, which is tuned online using recursive equations derived from heuristic dynamic programming. Here, neural networks (NNs) are used for both the action and the critic, whereas any OLAs, such as radial basis functions, splines, fuzzy logic, etc., can be utilized. For the output-feedback counterpart, an additional NN is designated as the observer to estimate the unavailable system states, and thus the separation principle is not required. The NN weight tuning laws for the controller schemes are also derived while ensuring uniform ultimate boundedness of the closed-loop system using Lyapunov theory. Finally, the effectiveness of the two controllers is tested in simulation on a pendulum balancing system and a two-link robotic arm system.
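As a hedged illustration of the heuristic dynamic programming (HDP) recursion mentioned above, the following Python fragment regresses a linear critic toward the standard one-step cost-to-go target r_k + γ·J(x_{k+1}); the features, learning rate, and function names are placeholders rather than the paper's NN-based design.

```python
# Minimal HDP-style critic update: the cost-to-go estimate J(x_k) is moved
# toward the Bellman target r(x_k, u_k) + gamma * J(x_{k+1}).
import numpy as np

def hdp_critic_target(reward_k, J_next, gamma=0.95):
    """One-step Bellman target for the cost-to-go estimate."""
    return reward_k + gamma * J_next

def critic_update(w, phi_k, target, lr=0.05):
    """Gradient step moving the linear critic J(x_k) = w @ phi(x_k) toward the target."""
    error = target - w @ phi_k
    return w + lr * error * phi_k, error   # updated weights, Bellman error

# Example: one online update, with random features standing in for the NN output.
w, phi_k, phi_next = np.zeros(8), np.random.rand(8), np.random.rand(8)
target = hdp_critic_target(reward_k=0.1, J_next=w @ phi_next)
w, err = critic_update(w, phi_k, target)
print("Bellman error after one step:", round(float(err), 3))
```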

5.
This paper proposes a three-layered parallel fuzzy inference model called reinforcement fuzzy neural network with distributed prediction scheme (RFNN-DPS), which performs reinforcement learning with a novel distributed prediction scheme. In RFNN-DPS, an additional predictor for predicting the external reinforcement signal is not necessary, and the internal reinforcement information is distributed into fuzzy rules (rule nodes). Therefore, using RFNN-DPS, only one network is needed to construct a fuzzy logic system with the abilities of parallel inference and reinforcement learning. Basically, the information for prediction in RFNN-DPS is composed of credit values stored in fuzzy rule nodes, where each node holds a credit vector to represent the reliability of the corresponding fuzzy rule. The credit values are not only accessed for predicting external reinforcement signals, but also provide a more profitable internal reinforcement signal to each fuzzy rule itself. RFNN-DPS performs a credit-based exploratory algorithm to adjust its internal status according to the internal reinforcement signal. During learning, the RFNN-DPS network is constructed by a single-step or multistep reinforcement learning algorithm based on the ART concept. According to our experimental results, RFNN-DPS shows the advantages of simple network structure, fast learning speed, and explicit representation of rule reliability.

6.
This paper presents a new method for learning a fuzzy logic controller automatically. A reinforcement learning technique is applied to a multilayer neural network model of a fuzzy logic controller. The proposed self-learning fuzzy logic controller, which uses the genetic algorithm through a reinforcement learning architecture and is called a genetic reinforcement fuzzy logic controller, can also learn fuzzy logic control rules even when only weak information, such as a binary “success” or “failure” signal, is available. In this paper, the adaptive heuristic critic algorithm of Barto et al. (1987) is extended to include a priori control knowledge of human operators. It is shown that the system can solve a fairly difficult control learning problem more concretely. Also demonstrated is the feasibility of the method when applied to a cart-pole balancing problem via digital simulations.

7.
In this paper we consider the problem of reinforcement learning in a dynamically changing environment. In this context, we study the problem of adaptive control of finite-state Markov chains with a finite number of controls. The transition and payoff structures are unknown. The objective is to find an optimal policy which maximizes the expected total discounted payoff over the infinite horizon. A stochastic neural network model is suggested for the controller. The parameters of the neural net, which determine a random control strategy, are updated at each instant using a simple learning scheme. This learning scheme involves estimation of some relevant parameters using an adaptive critic. It is proved that the controller asymptotically chooses an optimal action in each state of the Markov chain with high probability.
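The sketch below illustrates the kind of scheme described: a randomized (softmax) policy over a finite-state, finite-action discounted problem, updated at each instant from an adaptive-critic (TD-error) estimate. The tabular preference parameterization and the randomly sampled transition/payoff structure are stand-ins for the paper's stochastic neural network model and unknown environment.

```python
# Generic actor-critic for a finite-state, finite-action discounted problem:
# a softmax (random) control strategy updated from a one-step critic estimate.
import numpy as np

rng = np.random.default_rng(1)
n_states, n_actions, gamma = 4, 2, 0.9
theta = np.zeros((n_states, n_actions))   # actor parameters (action preferences)
V = np.zeros(n_states)                     # critic: state-value estimates

# Transition and payoff structure, unknown to the learner, sampled here for the demo.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.uniform(0, 1, size=(n_states, n_actions))

def softmax_policy(s):
    p = np.exp(theta[s] - theta[s].max())
    return p / p.sum()

s = 0
for t in range(20000):
    probs = softmax_policy(s)
    a = rng.choice(n_actions, p=probs)
    s_next = rng.choice(n_states, p=P[s, a])
    r = R[s, a]
    td_error = r + gamma * V[s_next] - V[s]   # adaptive-critic signal
    V[s] += 0.05 * td_error                   # critic update
    grad = -probs
    grad[a] += 1.0                            # d log pi(a|s) / d theta[s]
    theta[s] += 0.01 * td_error * grad        # actor update
    s = s_next

print("greedy action per state:", theta.argmax(axis=1))
```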

8.
In this paper, an adaptive control algorithm is proposed for a class of robot manipulator systems with unknown functions and dead-zone input by using a reinforcement learning scheme. The parameters of the dead zone are assumed to be unknown but bounded. The unknown functions are approximated by neural networks, which form one part of the reinforcement learning scheme, namely the action network. The other part, called the critic network, is used to approximate the reinforcement signal. The prominent advantage of the proposed approach is that an optimal control input can be obtained by using the two networks; compared with previous results for robot manipulators with dead zone, an additional term is given to compensate for the effect of the dead zone, and a special design procedure is used to resolve the difficulties in constructing the controllers and adaptation laws. Based on Lyapunov analysis theory, all the signals of the closed-loop system are proved to be bounded, and the system output can track the reference signal to within a bounded compact set. Finally, a simulation example is given to illustrate the effectiveness of the approach.

9.
In this article, we focus on developing a neural-network-based critic learning strategy for robust dynamic stabilization of a class of uncertain nonlinear systems. A general type of uncertainty, involved both in the internal dynamics and in the input matrix, is considered. An auxiliary system with the actual action and an auxiliary signal is constructed after dynamics decomposition and combination for the original plant. The validity of transforming the control problem from robust stabilization to optimal feedback design is also established theoretically. After that, an adaptive critic learning method based on a neural network is established to derive the approximate optimal solution of the transformed control problem. The critic weight can be initialized to a zero vector, which clearly facilitates the learning process. Numerical simulation is finally presented to illustrate the effectiveness of the critic learning approach for neural robust stabilization.

10.
Adaptive critic (AC) based controllers are typically discrete and/or yield a uniformly ultimately bounded stability result because of the presence of disturbances and unknown approximation errors. A continuous-time AC controller is developed that yields asymptotic tracking of a class of uncertain nonlinear systems with bounded disturbances. The proposed AC-based controller consists of two neural networks (NNs) – an action NN, also called the actor, which approximates the plant dynamics and generates appropriate control actions; and a critic NN, which evaluates the performance of the actor based on some performance index. The reinforcement signal from the critic is used to develop a composite weight tuning law for the action NN based on Lyapunov stability analysis. A recently developed robust feedback technique, robust integral of the sign of the error (RISE), is used in conjunction with the feedforward action neural network to yield a semiglobal asymptotic result. Experimental results are provided that illustrate the performance of the developed controller.

11.
This paper proposes a reinforcement neural-network-based fuzzy logic control system (RNN-FLCS) for solving various reinforcement learning problems. The proposed RNN-FLCS is constructed by integrating two neural-network-based fuzzy logic controllers (NN-FLCs), each of which is a connectionist model with a feedforward multilayered network developed for the realization of a fuzzy logic controller. One NN-FLC performs as a fuzzy predictor, and the other as a fuzzy controller. Using the temporal difference prediction method, the fuzzy predictor can predict the external reinforcement signal and provide a more informative internal reinforcement signal to the fuzzy controller. The fuzzy controller performs a stochastic exploratory algorithm to adapt itself according to the internal reinforcement signal. During the learning process, both structure learning and parameter learning are performed simultaneously in the two NN-FLCs using the fuzzy similarity measure. The proposed RNN-FLCS can construct a fuzzy logic control and decision-making system automatically and dynamically through a reward/penalty signal or through very simple fuzzy information feedback such as “high,” “too high,” “low,” and “too low.” The proposed RNN-FLCS is best applied to learning environments where obtaining exact training data is expensive. It also preserves the advantages of the original NN-FLC, such as the ability to find proper network structure and parameters simultaneously and dynamically and to avoid the rule-matching time of the inference engine. Computer simulations were conducted to illustrate its performance and applicability.

12.
This paper proposes a combination of online clustering and a Q-value based genetic algorithm (GA) learning scheme for fuzzy system design (CQGAF) with reinforcements. CQGAF fulfills GA-based fuzzy system design in a reinforcement learning environment where only weak reinforcement signals such as "success" and "failure" are available. In CQGAF, there are no fuzzy rules initially; they are generated automatically. The precondition part of a fuzzy system is constructed online by an aligned clustering-based approach, which achieves a flexible partition. The consequent part is then designed by Q-value based genetic reinforcement learning. Each individual in the GA population encodes the consequent-part parameters of a fuzzy system and is associated with a Q-value. The Q-value estimates the discounted cumulative reinforcement obtained by the individual and is used as its fitness value for GA evolution. At each time step, an individual is selected according to the Q-values, and the corresponding fuzzy system is built and applied to the environment, with an evaluative critic signal received in return. With this critic, Q-learning with an eligibility trace is executed. After each trial, the GA searches for better consequent parameters based on the learned Q-values. Thus, in CQGAF, evolution is performed immediately after the end of each trial, in contrast to a general GA, where many trials are performed before evolution. The feasibility of CQGAF is demonstrated through simulations on cart-pole balancing, magnetic levitation, and chaotic system control problems with only binary reinforcement signals.
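To make the Q-value-based genetic reinforcement idea concrete, here is a simplified, hypothetical sketch: each GA individual carries a Q-value that is updated by Q-learning with an eligibility trace during a trial and then reused as its fitness, with evolution performed right after the trial. The three-parameter individuals and the synthetic trial are placeholders, not the CQGAF fuzzy consequent design.

```python
# Simplified sketch: Q-values attached to GA individuals serve both as the
# Q-learning state-action estimates and as the fitness used for evolution.
import numpy as np

rng = np.random.default_rng(2)
pop_size, gamma, lam, alpha = 10, 0.95, 0.8, 0.1
population = rng.normal(0.0, 1.0, size=(pop_size, 3))   # consequent-part parameters
q_values = np.zeros(pop_size)                            # one Q-value per individual

def run_trial(params):
    """Placeholder trial: per-step reinforcements for the selected individual."""
    quality = -np.sum((params - np.array([0.5, -0.2, 1.0])) ** 2)
    return [quality + rng.normal(0, 0.1) for _ in range(20)]

for trial in range(200):
    # Select an individual according to its Q-value (softmax selection).
    prefs = np.exp(q_values - q_values.max())
    i = rng.choice(pop_size, p=prefs / prefs.sum())

    # Q-learning with an eligibility trace accumulated on the selected individual.
    trace = np.zeros(pop_size)
    for r in run_trial(population[i]):
        trace *= gamma * lam
        trace[i] += 1.0
        td_error = r + gamma * q_values.max() - q_values[i]
        q_values += alpha * td_error * trace

    # After the trial, evolve the parameters using the learned Q-values as fitness.
    order = np.argsort(q_values)
    worst, best = order[: pop_size // 2], order[pop_size // 2 :]
    population[worst] = population[best] + rng.normal(0, 0.05, size=population[best].shape)
    q_values[worst] = q_values[best]          # children inherit their parents' Q-values

print("best individual:", population[np.argmax(q_values)])
```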

13.
gripper     
Grasping of objects has been a challenging task for robots. The complex grasping task can be defined as object contact control and manipulation subtasks. In this paper, the object contact control subtask is defined as the ability of the fingers of a gripper to follow a trajectory accurately. The object manipulation subtask is defined in terms of maintaining a predefined force applied by the fingers on the object. A sophisticated controller is necessary, since the process of grasping an object without a priori knowledge of the object's size, texture, and softness, and of the gripper and contact dynamics, is rather difficult. Moreover, the object has to be secured accurately and reasonably fast without damaging it. Since the gripper, contact dynamics, and object properties are not typically known beforehand, an adaptive critic neural network (NN)-based hybrid position/force control scheme is introduced. The feedforward action-generating NN in the adaptive critic NN controller compensates for the nonlinear gripper and contact dynamics. The learning of the action-generating NN is performed online based on a critic NN output signal. The controller ensures that a three-finger gripper tracks a desired trajectory while applying the desired forces on the object for manipulation. Novel NN weight tuning updates are derived for the action-generating and critic NNs so that Lyapunov-based stability analysis can be shown. Simulation results demonstrate that, compared to conventional schemes, the proposed scheme successfully allows the fingers of a gripper to secure objects without knowledge of the underlying gripper and contact dynamics.

14.
A Chebyshev polynomial-based unified model (CPBUM) neural network is introduced and applied to control a magnetic bearing system. First, we show that the CPBUM neural network not only has the same universal approximation capability as conventional feedforward/recurrent neural networks, but also learns faster. It turns out that the CPBUM neural network is more suitable for controller design than conventional feedforward/recurrent neural networks. Second, we propose an inverse system method, based on the CPBUM neural network, to control a magnetic bearing system. The proposed controller has two structures, namely off-line and on-line learning structures. We derive a new learning algorithm for each proposed structure. The experimental results show that the proposed neural network architecture provides greater flexibility and better performance in controlling magnetic bearing systems.
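A small sketch of the Chebyshev-polynomial expansion that such a unified-model network builds on: the input is expanded through the Chebyshev recurrence and a single linear layer is fit on the expansion. This is a generic functional-link style illustration (here fit by least squares), not the CPBUM architecture or its off-line/on-line learning algorithms.

```python
# Chebyshev feature expansion plus a linear output layer, fit by least squares.
import numpy as np

def chebyshev_features(x, order=5):
    """T_0..T_order via the recurrence T_{n+1}(x) = 2x T_n(x) - T_{n-1}(x)."""
    T = [np.ones_like(x), x]
    for _ in range(2, order + 1):
        T.append(2 * x * T[-1] - T[-2])
    return np.stack(T, axis=-1)

# Fit a nonlinear map on the expanded features.
x = np.linspace(-1, 1, 200)
y = np.sin(2 * np.pi * x) * np.exp(-x ** 2)
Phi = chebyshev_features(x, order=9)
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print("max fit error:", float(np.max(np.abs(Phi @ w - y))))
```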

15.
Reinforcement learning has been widely used in applications for planning, control, and decision making. Rather than using instructive feedback as in supervised learning, reinforcement learning makes use of evaluative feedback to guide the learning process. In this paper, we formulate a pattern classification problem as a reinforcement learning problem. The problem is realized with a temporal difference method in a FALCON-R network. FALCON-R is constructed by integrating two basic FALCON-ART networks as function approximators, where one acts as a critic network (fuzzy predictor) and the other as an action network (fuzzy controller). This paper serves as a guideline for formulating a classification problem as a reinforcement learning problem using FALCON-R. The strengths of applying the reinforcement learning method to the pattern classification application are demonstrated. We show that such a system can converge faster, is able to escape from local minima, and has excellent disturbance rejection capability.
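The following is an illustrative sketch of casting classification as reinforcement learning, as described above: the learner sees only evaluative feedback (+1 for a correct decision, −1 otherwise) rather than the true label, and a simple policy-gradient update exploits that feedback. The linear policy and synthetic data are assumptions; the paper's FALCON-R network is not reproduced here.

```python
# Classification with evaluative (reward) feedback only: a REINFORCE-style
# update on a linear softmax policy over the two classes.
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)        # hidden ground truth

w = np.zeros((2, 2))                            # action preferences per class
for epoch in range(20):
    for x, label in zip(X, y):
        scores = w @ x
        p = np.exp(scores - scores.max()); p /= p.sum()
        a = rng.choice(2, p=p)                  # exploratory action (predicted class)
        r = 1.0 if a == label else -1.0         # evaluative feedback, not the label
        grad = -p
        grad[a] += 1.0
        w += 0.05 * r * np.outer(grad, x)       # policy-gradient update

accuracy = np.mean((w @ X.T).argmax(axis=0) == y)
print(f"training accuracy: {accuracy:.2f}")
```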

16.
Based on feedback linearization theory, this paper presents how a reinforcement learning scheme, adopted to construct artificial neural networks (ANNs), can effectively linearize a nonlinear system. The proposed reinforcement linearization learning system (RLLS) consists of two subsystems: an evaluation predictor (EP), which is a long-term policy selector, and a short-term action selector composed of linearizing control (LC) and reinforce predictor (RP) elements. In addition, a reference model plays the role of the environment, providing the reinforcement signal to the linearizing process. The RLLS thus receives reinforcement signals to accomplish the linearizing behavior and control the nonlinear system such that it behaves similarly to the reference model. Eventually, the RLLS performs identification and linearization concurrently. Simulation results demonstrate that the proposed learning scheme, applied to linearizing a pendulum system, provides better control reliability and robustness than conventional ANN schemes. Furthermore, a PI controller is used to control the linearized plant, where the affine system behaves like a linear system.

17.
A nonaffine discrete-time system represented by the nonlinear autoregressive moving average with eXogenous input (NARMAX) representation with unknown nonlinear system dynamics is considered. An equivalent affine-like representation in terms of the tracking error dynamics is first obtained from the original nonaffine nonlinear discrete-time system so that a reinforcement-learning-based near-optimal neural network (NN) controller can be developed. The control scheme consists of two linearly parameterized NNs. One NN is designated as the critic NN, which approximates a predefined long-term cost function, and an action NN is employed to derive a near-optimal control signal for the system to track a desired trajectory while simultaneously minimizing the cost function. The NN weights are tuned online. Using the standard Lyapunov approach, the stability of the closed-loop system is shown. The net result is a supervised actor-critic NN controller scheme which can be applied to a general nonaffine nonlinear discrete-time system without needing the affine-like representation. Simulation results demonstrate satisfactory performance of the controller.

18.
A modified fuzzy cerebellar model articulation controller (FCMAC) with reinforcement learning capability is introduced in this article. The model utilizes a likelihood scheme to predict the evaluation of successive actions; based on the approximated evaluation model, the proper output (action) is always selected. The structure of the proposed FCMAC consists of three parts: a fuzzy quantizer, which represents the associative mapping function from the receptive field to the actual memory; an action evaluation module, which models and produces the expected evaluation signal; and an action selection unit, which generates an action with the expectation of better performance using a probability distribution function that estimates an optimal action selection policy. To demonstrate its excellent performance, the proposed self-improving model is implemented as a neural network controller for the swing control of a pendulum system. The results from both simulation and experiment demonstrate the improved performance and applicability of the proposed learning model.

19.
Model reference adaptive control based on reinforcement learning (cited 3 times: 0 self-citations, 3 by others)
This paper proposes a model reference adaptive control method based on reinforcement learning. The controller employs the adaptive heuristic critic algorithm, which consists of two parts: an adaptive evaluation unit and an associative search unit. The reference model specifies the system's performance index, and the reinforcement signal fed back by the system is used to update the controller parameters online. Simulation results show that the reinforcement-learning-based model reference adaptive control method achieves stable and robust control of a class of complex nonlinear systems; the method not only responds quickly but also has a high learning rate and good real-time performance.
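A compact, speculative sketch of the adaptive heuristic critic structure this abstract describes: a reference model supplies the desired response, the tracking error is turned into a reinforcement signal, and an evaluation unit plus an associative search unit update the controller parameters online. The first-order reference model, the stand-in plant, and all gains are illustrative choices, not the paper's design.

```python
# Model-reference adaptive control with an adaptive-heuristic-critic style pair:
# evaluation unit (critic) and associative search unit (controller).
import numpy as np

rng = np.random.default_rng(4)
a_m, b_m = 0.8, 0.2              # stable first-order reference model
theta = np.zeros(3)              # associative search unit (controller) weights
v = np.zeros(3)                  # adaptive evaluation unit (critic) weights
y, y_m = 0.0, 0.0

for k in range(5000):
    ref = np.sign(np.sin(0.05 * k))                 # square-wave command signal
    phi = np.array([y, ref, 1.0])                   # simple state features
    noise = 0.05 * rng.normal()                     # exploration around the mean action
    u = theta @ phi + noise
    y = 0.9 * y + 0.3 * np.tanh(u)                  # stand-in nonlinear plant
    y_m = a_m * y_m + b_m * ref                     # reference model: desired response
    r = -(y - y_m) ** 2                             # reinforcement from the tracking error
    phi_next = np.array([y, ref, 1.0])
    internal = r + 0.95 * (v @ phi_next) - v @ phi  # internal reinforcement (TD error)
    v += 0.01 * internal * phi                      # update the evaluation unit
    theta += 0.02 * internal * noise * phi          # update the search unit (reward x eligibility)

print("final tracking error:", round(abs(y - y_m), 3))
```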

20.
Selection of learning parameters for CMAC-based adaptive critic learning (cited 3 times: 0 self-citations, 3 by others)
The CMAC-based adaptive critic learning structure consists of two CMAC modules: the action module and the critic module. Learning occurs in both modules. The critic module learns to evaluate the system status; it transforms the system response, usually an occasionally provided reinforcement signal, into organized, useful information. Based on the knowledge developed in the critic module, the action module learns the control technique. One difficulty in using this scheme lies in the selection of learning parameters. In our previous study on the CMAC-based scheme, the best set of learning parameters was selected from a large number of test simulations, and the chosen parameter values are not necessarily adequate for general cases. A general guideline for parameter selection therefore needs to be developed. In this study, this problem is investigated: the effects of the parameters are studied analytically and verified by simulations. The results provide a good guideline for parameter selection.
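To make the modules and the learning parameters in question concrete, here is a generic CMAC (tile-coding) sketch; the number of tilings, tile resolution, and learning rate are exactly the kind of parameters whose selection the paper analyzes. The cosine target and all settings are illustrative assumptions, not the paper's configuration.

```python
# Generic CMAC (tile-coding) module with offset tilings and LMS-style updates.
import numpy as np

class CMAC:
    def __init__(self, n_tilings=8, tiles_per_dim=10, lows=(-1.0,), highs=(1.0,), lr=0.1):
        self.n_tilings, self.tiles, self.lr = n_tilings, tiles_per_dim, lr
        self.lows, self.highs = np.array(lows), np.array(highs)
        self.w = np.zeros((n_tilings,) + (tiles_per_dim,) * len(lows))

    def _active_cells(self, x):
        """Indices of the one active cell in each offset tiling."""
        x = (np.asarray(x) - self.lows) / (self.highs - self.lows)   # normalize to [0, 1]
        cells = []
        for t in range(self.n_tilings):
            offset = t / (self.n_tilings * self.tiles)                # shifted tilings
            idx = np.clip(((x + offset) * self.tiles).astype(int), 0, self.tiles - 1)
            cells.append((t,) + tuple(idx))
        return cells

    def predict(self, x):
        return sum(self.w[c] for c in self._active_cells(x))

    def update(self, x, target):
        """Spread the prediction error evenly over the active cells; the learning
        rate and number of tilings are key tunable parameters of the scheme."""
        err = target - self.predict(x)
        for c in self._active_cells(x):
            self.w[c] += self.lr * err / self.n_tilings

# Example: a critic module learning an evaluation of a 1-D state.
critic = CMAC()
for _ in range(2000):
    s = np.random.uniform(-1, 1)
    critic.update([s], target=np.cos(np.pi * s))   # stand-in evaluation signal
print(round(float(critic.predict([0.0])), 2))       # close to cos(0) = 1 after training
```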
