Similar Documents
 20 similar documents found (search time: 15 ms)
1.
We present a training approach using concepts from the theory of stochastic learning automata that eliminates the need for computing gradients. The approach also offers the flexibility of tailoring a number of specific training algorithms through the selection of linear and nonlinear reinforcement rules for updating automaton action probabilities. Training efficiency is demonstrated on two complex temporal learning scenarios, viz., learning of time-dependent continuous trajectories and feedback controller design for continuous dynamical plants. For the first problem, it is shown that training algorithms can be tailored, following the present approach, for a recurrent neural net to learn to generate a benchmark circular trajectory more accurately than is possible with existing gradient-based training procedures. For the second problem, it is shown that recurrent neural-network-based feedback controllers can be trained for different control objectives.
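The reinforcement rules mentioned above can be made concrete with a small sketch. The following is a minimal, hypothetical illustration of the linear reward-inaction (L_R-I) rule for updating automaton action probabilities; the two-action environment, reward probabilities, and learning rate are invented for the example and are not the paper's training algorithm.

```python
import random

def lri_update(p, action, reward, lr=0.1):
    """Linear reward-inaction (L_R-I) rule: on reward, shift probability mass
    toward the chosen action; on penalty, leave the probabilities unchanged."""
    if reward:
        p = [pi + lr * (1.0 - pi) if i == action else pi * (1.0 - lr)
             for i, pi in enumerate(p)]
    return p

# Toy environment: action 0 is rewarded with prob 0.9, action 1 with prob 0.2.
random.seed(0)
p = [0.5, 0.5]
for _ in range(500):
    a = 0 if random.random() < p[0] else 1
    r = random.random() < (0.9 if a == 0 else 0.2)
    p = lri_update(p, a, r)  # with high probability, mass drifts to the better action
```

Note that the update preserves the simplex constraint exactly: the probabilities always sum to one.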

2.

The paper observes a similarity between the stochastic optimal control of discrete dynamical systems and the learning of multilayer neural networks. It focuses on contemporary deep networks with nonconvex, nonsmooth loss and activation functions. The machine learning problems are treated as nonconvex nonsmooth stochastic optimization problems. As a model of nonsmooth nonconvex dependences, so-called generalized-differentiable functions are used. The backpropagation method for calculating stochastic generalized gradients of the learning quality functional for such systems is substantiated on the basis of the Hamilton–Pontryagin formalism. Stochastic generalized gradient learning algorithms are extended to training nonconvex nonsmooth neural networks. The performance of a stochastic generalized gradient algorithm is illustrated on a linear multiclass classification problem.
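As a rough illustration of descending along stochastic generalized (Clarke) gradients of a nonsmooth loss with diminishing step sizes, here is a toy one-dimensional sketch; the loss |w|, the noise model, and the step schedule are assumptions chosen for the example, not the paper's setting.

```python
import random

def generalized_grad(w):
    # A generalized (Clarke) gradient of the nonsmooth loss |w|: sign(w),
    # with any value in [-1, 1] admissible at the kink w = 0.
    return (w > 0) - (w < 0)

def sgg_descent(w, steps=200, lr0=1.0):
    """Stochastic generalized gradient descent with diminishing steps lr0/t."""
    for t in range(1, steps + 1):
        g = generalized_grad(w) + random.gauss(0.0, 0.01)  # noisy gradient oracle
        w -= (lr0 / t) * g
    return w

random.seed(1)
w_star = sgg_descent(5.0)  # approaches the minimizer w = 0 of |w|
```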


3.
Algorithms for solving optimal control problems for linear discrete systems and linear continuous systems (without discretization) are discussed. The algorithms are based on a new approach to solving linear programming problems worked out in Minsk (USSR). A new method for solving nonlinear programming problems is justified; it uses a network interpretation of nonlinear functions and special network operations. Results of numerical experiments (on geometric programming problems) are given. In conclusion, an algorithm for solving the optimal control problem for a system with nonlinear input is described.

4.
赵恒军  李权忠  曾霞  刘志明 《软件学报》2022,33(7):2538-2561
The design of safe controllers for cyber-physical systems (CPS) is an active research direction. Existing safe controller design based on formal methods suffers from over-reliance on models and poor scalability. Intelligent control based on deep reinforcement learning can handle high-dimensional, nonlinear, complex systems and uncertain systems, and is becoming a very promising CPS control technology, but it lacks safety guarantees. To address the safety shortcomings of reinforcement learning control, this paper studies safe reinforcement learning algorithms and intelligent control applications around a typical industrial oil-pump control case. First, the safe reinforcement learning problem for industrial oil-pump control is formalized and a simulation environment for the pump is built. Then, by designing the output-layer structure and activation function, a neural-network pump controller is constructed such that the linear inequality constraints on the pump's switching times are satisfied. Finally, to better balance the safety and optimality objectives, a new safe reinforcement learning algorithm based on the augmented Lagrangian multiplier method is designed and implemented. Comparative experiments on the industrial oil-pump case show that the controllers generated by this algorithm surpass existing comparable algorithms in both safety and optimality. In further evaluation, the generated neural-network controllers passed rigorous formal verification with a probability of 90%, while achieving an optimality loss as low as 2% relative to the theoretically optimal controller. The proposed method is expected to generalize to more application scenarios, and the case-study methodology may serve as a reference for other researchers in safe intelligent control and formal verification.
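The augmented Lagrangian multiplier idea used above to trade off safety and optimality can be sketched on a scalar toy problem. The objective, constraint, and all constants below are invented for illustration; this is only the standard augmented Lagrangian scheme the paper builds on, not its actual algorithm.

```python
def solve(x=0.0, lam=0.0, rho=10.0, outer=30, inner=200, lr=0.01):
    """Augmented Lagrangian sketch for min f(x) s.t. g(x) <= 0 on a toy
    problem: f(x) = (x - 2)^2, g(x) = x - 1 (both invented)."""
    f_grad = lambda v: 2.0 * (v - 2.0)
    g = lambda v: v - 1.0
    g_grad = 1.0
    for _ in range(outer):
        for _ in range(inner):
            # gradient step on the augmented Lagrangian in x
            x -= lr * (f_grad(x) + max(lam + rho * g(x), 0.0) * g_grad)
        lam = max(lam + rho * g(x), 0.0)  # dual ascent on the multiplier
    return x, lam

x_opt, lam_opt = solve()  # constrained optimum: x = 1, multiplier = 2 (by KKT)
```

The inner loop approximately minimizes the augmented Lagrangian over the primal variable; the outer loop performs dual ascent on the multiplier of the safety constraint.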

5.
Research on gradient algorithms for reinforcement learning with neural networks
徐昕  贺汉根 《计算机学报》2003,26(2):227-233
For Markov decision problems with continuous state and discrete action spaces, a new gradient-descent reinforcement learning algorithm is proposed that uses multilayer feedforward neural networks for value-function approximation. The algorithm adopts a nearly greedy, continuously differentiable Boltzmann-distribution action-selection policy and approximates the optimal value function of the Markov decision process by minimizing the sum of squared Bellman residuals under a non-stationary action policy. The convergence of the algorithm and the performance of the near-optimal policy are analyzed theoretically, and simulation studies on the Mountain-Car learning control problem further verify the algorithm's learning efficiency and generalization performance.
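The Boltzmann action-selection policy referred to above is a softmax over Q-values, nearly greedy at low temperature yet continuously differentiable; a minimal sketch (the Q-values and temperatures are arbitrary example numbers):

```python
import math

def boltzmann(q_values, temperature):
    """Softmax (Boltzmann) action probabilities: continuously differentiable
    in the Q-values, and increasingly greedy as the temperature is lowered."""
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / temperature) for q in q_values]
    z = sum(exps)
    return [e / z for e in exps]

hot = boltzmann([1.0, 2.0, 3.0], temperature=5.0)   # near-uniform exploration
cold = boltzmann([1.0, 2.0, 3.0], temperature=0.5)  # nearly greedy selection
```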

6.
《Advanced Robotics》2013,27(10):1215-1229
Reinforcement learning is a scheme for unsupervised learning in which robots are expected to acquire behavior skills through self-exploration based on reward signals. There are some difficulties, however, in applying conventional reinforcement learning algorithms to motion control tasks of a robot, because most algorithms are concerned with discrete state spaces and are based on the assumption of complete observability of the state. Real-world environments often have partial observability; therefore, robots have to estimate the unobservable hidden states. This paper proposes a method to solve these two problems by combining a reinforcement learning algorithm with a learning algorithm for a continuous-time recurrent neural network (CTRNN). The CTRNN can learn spatio-temporal structures in a continuous time and space domain, and can preserve the contextual flow by self-organizing an appropriate internal memory structure. This enables the robot to deal with the hidden-state problem. We carried out an experiment on the pendulum swing-up task without rotational speed information. As a result, this task was accomplished in several hundred trials using the proposed algorithm. In addition, it is shown that information about the rotational speed of the pendulum, which is considered a hidden state, is estimated and encoded in the activation of a context neuron.
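CTRNN dynamics can be sketched with a simple Euler integration of the usual leaky-integrator equations; the weights, time constants, and inputs below are arbitrary toy values, not the paper's trained network.

```python
import math

def ctrnn_step(y, w, tau, inputs, dt=0.01):
    """One Euler step of CTRNN leaky-integrator dynamics:
       tau_i * dy_i/dt = -y_i + sum_j w[i][j] * sigmoid(y_j) + I_i"""
    sig = lambda v: 1.0 / (1.0 + math.exp(-v))
    return [yi + (dt / tau[i]) * (-yi
                                  + sum(w[i][j] * sig(y[j]) for j in range(len(y)))
                                  + inputs[i])
            for i, yi in enumerate(y)]

# Two mutually coupled neurons with a constant drive on the first (toy values).
y, w = [0.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]]
for _ in range(100):
    y = ctrnn_step(y, w, tau=[1.0, 1.0], inputs=[0.5, 0.0])
```

The leak term keeps the state bounded, and the recurrent coupling is what lets such a network carry contextual (hidden-state) information over time.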

7.
For the optimal control problem of systems with input constraints and unmeasurable states, this paper combines the actor-critic near-optimal algorithm from reinforcement learning with the backstepping method and proposes an optimal tracking control strategy. First, a neural network is used to construct a nonlinear observer that estimates the unmeasurable states of the system. Then, a non-quadratic utility function is designed to handle the input constraints. Compared with existing optimal methods, the proposed optimal tracking control method not only retains backstepping's advantage in handling...

8.
A family of two-layer discrete-time neural net (NN) controllers is presented for the control of a class of mnth-order MIMO dynamical systems. No initial learning phase is needed, so the control action is immediate; in other words, the neural network (NN) controller exhibits a learning-while-functioning feature instead of a learning-then-control feature. A two-layer NN is used which is linear in the tunable weights. The structure of the neural net controller is derived using a filtered-error approach. It is shown that delta-rule-based tuning, when employed for closed-loop control, can yield unbounded NN weights if: 1) the net cannot exactly reconstruct a certain required function, or 2) there are bounded unknown disturbances acting on the dynamical system. Certainty equivalence is not used, overcoming a major problem in discrete-time adaptive control. New online tuning algorithms for discrete-time systems are derived which are similar to ϵ-modification for continuous-time systems in that they include a modification to the learning-rate parameter and a correction term to the standard delta rule.
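The flavor of an e-modification-style correction to the delta rule can be shown on a toy linear-in-the-weights example; the learning rate, damping coefficient, and regression task below are invented for the sketch and are not the paper's controllers.

```python
def e_mod_update(w, x, err, lr=0.1, kappa=0.01):
    """Delta rule plus an e-modification-style damping term: the extra
    -lr*kappa*|err|*w_i term keeps the weights bounded when the target cannot
    be reconstructed exactly or bounded disturbances act on the plant."""
    return [wi + lr * err * xi - lr * kappa * abs(err) * wi
            for wi, xi in zip(w, x)]

# Toy linear-in-the-weights tracking task (values invented for the sketch).
w, x, target = [0.0, 0.0], [1.0, 0.5], 1.0
for _ in range(100):
    err = target - (w[0] * x[0] + w[1] * x[1])
    w = e_mod_update(w, x, err)
```

Because the damping is scaled by |err|, it vanishes as the tracking error goes to zero, so it bounds the weights without biasing the converged solution the way a constant leakage term would.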

9.
In recent years, the rapid development of reinforcement learning and adaptive dynamic programming algorithms, together with their successful application to a series of challenging problems (such as optimal decision-making and optimal coordinated control of large-scale multi-agent systems), has made them a research focus in artificial intelligence, systems and control, and applied mathematics. This paper first briefly introduces the fundamentals and core ideas of reinforcement learning and adaptive dynamic programming, and on that basis surveys the development of these two closely related families of algorithms across research fields, emphasizing their evolution from sequential decision (optimal control) problems for a single agent (plant) to sequential decision (optimal coordinated control) problems for multi-agent systems. Further, after briefly reviewing the structural evolution of adaptive dynamic programming algorithms and their progression from model-based offline planning to model-free online learning, the paper surveys progress on applying adaptive dynamic programming to the optimal coordinated control of multi-agent systems. Finally, some challenging open topics in multi-agent reinforcement learning and in solving the optimal coordinated control of multi-agent systems via adaptive dynamic programming are presented.

10.
This paper develops a low-order controller design method for linear continuous time-invariant single-input, single-output systems requiring only the solution of a convex optimization problem. The technique integrates several well-known results in control theory. An important step is the use of coprime factors so that, based on strictly positive real functions, feedback stabilization using low-order controllers becomes a zero-placement problem which is convex. From this result, we develop algorithms to solve two optimal control problems.

11.
Although the potential of the powerful mapping and representational capabilities of recurrent network architectures is generally recognized by the neural network research community, recurrent neural networks have not been widely used for the control of nonlinear dynamical systems, possibly due to the relative ineffectiveness of simple gradient descent training algorithms. Developments in the use of parameter-based extended Kalman filter algorithms for training recurrent networks may provide a mechanism by which these architectures will prove to be of practical value. This paper presents a decoupled extended Kalman filter (DEKF) algorithm for training of recurrent networks with special emphasis on application to control problems. We demonstrate in simulation the application of the DEKF algorithm to a series of example control problems ranging from the well-known cart-pole and bioreactor benchmark problems to an automotive subsystem, engine idle speed control. These simulations suggest that recurrent controller networks trained by Kalman filter methods can combine the traditional features of state-space controllers and observers in a homogeneous architecture for nonlinear dynamical systems, while simultaneously exhibiting less sensitivity than do purely feedforward controller networks to changes in plant parameters and measurement noise.
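A single Kalman-filter weight update of the kind the DEKF applies per weight group can be sketched as follows; the linear toy model, the measurement-noise value, and the initial covariance are assumptions for illustration (for a linear model the extended Kalman filter reduces to recursive least squares).

```python
import numpy as np

def dekf_group_step(w, P, H, err, r=1.0):
    """One Kalman-filter update for a single weight group, as the DEKF applies
    independently per group (cross-group covariances are dropped). H is the
    derivative of the network output w.r.t. this group's weights."""
    H = H.reshape(-1, 1)
    a = 1.0 / (r + float(H.T @ P @ H))   # inverse innovation variance (scalar)
    K = a * (P @ H)                      # Kalman gain
    w = w + K.ravel() * err              # weight update driven by output error
    P = P - a * (P @ H) @ (H.T @ P)      # approximate error-covariance update
    return w, P

# Toy linear 'network' y = w.x (invented); the filter recovers the target weights.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
w, P = np.zeros(2), 100.0 * np.eye(2)
for _ in range(50):
    x = rng.normal(size=2)
    err = true_w @ x - w @ x             # innovation: target minus network output
    w, P = dekf_group_step(w, P, x, err)
```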

12.
An adaptive control scheme based on a fuzzy neural network is proposed. For complex learning tasks over continuous spaces, a competitive Takagi-Sugeno fuzzy reinforcement learning network is presented whose structure integrates a Takagi-Sugeno fuzzy inference system with action-value-based reinforcement learning. Correspondingly, an optimized learning algorithm is proposed that trains the competitive Takagi-Sugeno fuzzy reinforcement learning network into a so-called Takagi-Sugeno fuzzy variable-structure controller. Simulation studies on a single inverted-pendulum control system show that the proposed learning algorithm outperforms other reinforcement learning algorithms.

13.
The problem of learning multiple continuous trajectories by means of recurrent neural networks with (in general) time-varying weights is addressed. The learning process is transformed into an optimal control framework where both the weights and the initial network state to be found are treated as controls. For such a task, a learning algorithm is proposed which is based on a variational formulation of Pontryagin's maximum principle. The convergence of this algorithm, under reasonable assumptions, is also investigated. Numerical examples of learning nontrivial two-class problems are presented which demonstrate the efficiency of the approach proposed.

14.
A reinforcement learning method based on K-means clustering with adaptive discretization of continuous states
文锋  陈宗海  卓睿  周光明 《控制与决策》2006,21(2):143-148
A clustering algorithm is used to adaptively discretize a continuous state space, yielding a reinforcement learning method based on K-means clustering. The learning process has two parts: state-space learning, which adaptively discretizes the continuous state space using the K-means clustering algorithm, and policy learning, which finds the optimal policy using the Sarsa algorithm with replacing eligibility traces. Simulation experiments on benchmark reinforcement learning problems with continuous states show that the method adaptively discretizes the continuous state space and ultimately learns the optimal policy. Compared with a reinforcement learning method based on CMAC networks, the method saves storage space and reduces computation time.
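The two learning parts can be caricatured in a few lines: online K-means to discretize a one-dimensional state space, plus a greedy tabular value update over the resulting clusters. This is a bandit-style simplification, not the paper's Sarsa with replacing eligibility traces; the toy reward and all constants are invented.

```python
import random

def nearest(centers, s):
    return min(range(len(centers)), key=lambda i: abs(centers[i] - s))

def kmeans_online(centers, s, lr=0.05):
    """Online K-means: drag the nearest center toward the observed state,
    adaptively discretizing the continuous state space."""
    i = nearest(centers, s)
    centers[i] += lr * (s - centers[i])
    return i

random.seed(0)
centers = [0.2, 0.5, 0.8]                  # initial cluster centers (toy values)
q = [[0.0, 0.0] for _ in centers]          # Q-table indexed by (cluster, action)
for _ in range(1000):
    s = random.random()                    # state sampled uniformly from [0, 1)
    i = kmeans_online(centers, s)
    a = 0 if q[i][0] >= q[i][1] else 1     # greedy action for this cluster
    r = 1.0 if (s > 0.5) == (a == 1) else 0.0  # toy reward: match the half-line
    q[i][a] += 0.1 * (r - q[i][a])         # one-step value update toward r
```

The appeal over a fixed grid or a CMAC tiling is that the centers migrate to where the states actually occur, so the table stays small.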

15.
In this paper, real‐time results for a novel continuous‐time adaptive tracking controller algorithm for nonlinear multiple input multiple output systems are presented. The control algorithm includes the combination of a recurrent high order neural network with block control transformation using a high order sliding modes technique as control law. A neural network is used to identify the dynamic plant behavior where a filtered error algorithm is used to train the neural identifier. A decentralized high order sliding mode, named the twisting algorithm, is used to design chattering‐reduced independent controllers to solve the trajectory tracking problem for a robot arm with three degrees of freedom. Stability analyses are given via a Lyapunov approach.

16.
Most of the existing numerical optimization methods are based upon a discretization of some ordinary differential equations. In order to solve some convex and smooth optimization problems coming from machine learning, in this paper, we develop efficient batch and online algorithms based on a new principle, i.e., the optimized discretization of continuous dynamical systems (ODCDSs). First, a batch learning projected gradient dynamical system with Lyapunov's stability and monotonic property is introduced, and its dynamical behavior guarantees the accuracy of the discretization-based optimizer and the applicability of the line search strategy. Furthermore, under fair assumptions, a new online learning algorithm achieving regret O(√T) or O(log T) is obtained. By using the line search strategy, the proposed batch learning ODCDS exhibits insensitivity to the step sizes and faster decrease. With only a small number of line search steps, the proposed stochastic algorithm shows sufficient stability and approximate optimality. Experimental results demonstrate the correctness of our theoretical analysis and the efficiency of our algorithms.
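The core object, a projected gradient dynamical system, can be sketched with a plain Euler discretization on a box-constrained toy objective; the objective, box, and step size are assumptions for illustration (the paper's ODCDS uses an optimized discretization with line search rather than a fixed step).

```python
def project(x, lo=0.0, hi=1.0):
    """Euclidean projection onto the box [lo, hi]^n."""
    return [min(max(v, lo), hi) for v in x]

def pgd_flow(grad, x, steps=100, dt=0.1):
    """Plain Euler discretization of the projected gradient dynamical system
       dx/dt = P(x - grad f(x)) - x,
    whose equilibria are the minima of f over the box."""
    for _ in range(steps):
        g = grad(x)
        target = project([xi - gi for xi, gi in zip(x, g)])
        x = [xi + dt * (ti - xi) for xi, ti in zip(x, target)]
    return x

# Toy convex objective f(x) = (x0 - 2)^2 + (x1 + 1)^2 over the box [0, 1]^2;
# the constrained minimizer is (1, 0).
grad = lambda x: [2.0 * (x[0] - 2.0), 2.0 * (x[1] + 1.0)]
x_min = pgd_flow(grad, [0.5, 0.5])
```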

17.
Synthesizing optimal controllers for large scale uncertain systems is a challenging computational problem. This has motivated the recent interest in developing polynomial-time algorithms for computing reduced dimension models for uncertain systems. Here we present algorithms that compute lower dimensional realizations of an uncertain system, and compare their theoretical and computational characteristics. Three polynomial-time dimensionality reduction algorithms are applied to the Shell Standard Control Problem, a continuous stirred-tank reactor (CSTR) control problem, and a large scale benchmark problem, where it is shown that the algorithms can reduce the computational effort of optimal controller synthesis by orders of magnitude. These algorithms allow robust controller synthesis and robust control structure selection to be applied to uncertain systems of increased dimensionality.

18.
This paper presents an approach that is suitable for Just-In-Time (JIT) production for a multi-objective scheduling problem in a dynamically changing shop floor environment. The proposed distributed learning and control (DLC) approach integrates part-driven distributed arrival-time control (DATC) and machine-driven distributed reinforcement-learning-based control. With DATC, part controllers adjust their associated parts' arrival times to minimize due-date deviation. Within the restricted pattern of arrivals, machine controllers concurrently search for optimal dispatching policies. The machine control problem is modeled as a semi-Markov decision process (SMDP) and solved using Q-learning. The DLC algorithms are evaluated using simulation for two types of manufacturing systems: family scheduling and dynamic batch sizing. Results show that the DLC algorithms achieve significant performance improvement over usual dispatching rules in complex real-time shop floor control problems for JIT production.

19.
Reinforcement learning (RL) is an effective method for the design of robust controllers for unknown nonlinear systems. Standard RL methods for robust control, such as actor-critic (AC) algorithms, depend on estimation accuracy. Uncertainty in the worst case requires a large state-action space, which causes overestimation and computational problems. In this article, the RL method is modified with the k-nearest neighbor and double Q-learning algorithms. The modified RL does not need a neural estimator, as AC does, and can stabilize the unknown nonlinear system under worst-case uncertainty. The convergence property of the proposed RL method is analyzed. Simulations and experimental results show that the modified RLs are much more robust than classic controllers such as the proportional-integral-derivative, sliding mode, and optimal linear quadratic regulator controllers.
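The double Q-learning component can be sketched in tabular form: one table selects the greedy next action and the other evaluates it, which counters the overestimation bias of plain Q-learning. The two-state toy MDP and all constants below are invented for illustration; the paper additionally combines this with a k-nearest-neighbor approximation, omitted here.

```python
import random

def double_q_update(qa, qb, s, a, r, s2, alpha=0.1, gamma=0.9):
    """Double Q-learning: pick the greedy next action with one table and
    evaluate it with the other, randomly alternating which table is updated."""
    if random.random() < 0.5:
        best = max(range(len(qa[s2])), key=lambda j: qa[s2][j])
        qa[s][a] += alpha * (r + gamma * qb[s2][best] - qa[s][a])
    else:
        best = max(range(len(qb[s2])), key=lambda j: qb[s2][j])
        qb[s][a] += alpha * (r + gamma * qa[s2][best] - qb[s][a])

# Toy two-state, two-action MDP (invented): only (s=0, a=0) pays a reward,
# and the next state is drawn uniformly.
random.seed(0)
qa = [[0.0, 0.0], [0.0, 0.0]]
qb = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(2000):
    s, a = random.randrange(2), random.randrange(2)
    r = 1.0 if (s, a) == (0, 0) else 0.0
    double_q_update(qa, qb, s, a, r, s2=random.randrange(2))
```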

20.
There is no exact method to determine the optimal topology of a multi-layer neural network for a given problem; usually the designer selects a topology for the network and then trains it. Since determination of the optimal topology of neural networks belongs to the class of NP-hard problems, most existing algorithms for determining the topology are approximate. These algorithms can be classified into four main groups: pruning algorithms, constructive algorithms, hybrid algorithms, and evolutionary algorithms. They can produce near-optimal solutions, but most of them use hill climbing and may get stuck at local minima. In this article, we first introduce a learning automaton and study its behaviour, and then present an algorithm based on the proposed learning automaton, called the survival algorithm, for determining the number of hidden units of three-layer neural networks. The survival algorithm uses learning automata as a global search method to increase the probability of obtaining the optimal topology. The algorithm treats the optimization of the topology of neural networks as object partitioning rather than as searching or parameter optimization, as in existing algorithms. In the survival algorithm, training begins with a large network, and then, by adding and deleting hidden units, a near-optimal topology is obtained. The algorithm has been tested on a number of problems, and simulations show that the networks generated are near optimal.
