1.
In this paper, a new iterative adaptive dynamic programming (ADP) method is proposed to solve a class of continuous-time nonlinear two-person zero-sum differential games. The idea is to use the ADP technique to obtain the optimal control pair iteratively, so that the performance index function reaches the saddle point of the zero-sum differential game. If a saddle point does not exist, a mixed optimal control pair is obtained instead, which makes the performance index function reach the mixed optimum. Stability analysis of the nonlinear systems is presented, and the convergence of the performance index function is also proved. Two simulation examples are given to illustrate the performance of the proposed method.
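For orientation only (generic notation assumed here, not quoted from the paper): the saddle point sought by such a zero-sum formulation satisfies

    J(u^{*}, w) \le J(u^{*}, w^{*}) \le J(u, w^{*}) \quad \forall\, u, w,
    \qquad
    J^{*} = \min_{u}\max_{w} J(u, w) = \max_{w}\min_{u} J(u, w).

When the min-max and max-min values differ, no saddle point exists, which is the situation the mixed optimal control pair is intended to handle.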
2.
For the optimal control problem with a quadratic performance index function for a class of nonlinear systems with time delays in both the state and the control variables, this paper proposes an optimal control scheme based on a new iterative adaptive dynamic programming (ADP) algorithm. By introducing a delay matrix function and applying dynamic programming theory, an explicit expression of the optimal control is obtained, and the optimal control input is then computed via the adaptive critic technique. A convergence proof is given to guarantee that the performance index function converges to the optimum. To implement the proposed algorithm, neural networks are employed to approximate the performance index function, compute the optimal control policy, solve the delay matrix function, and model the nonlinear system. Finally, two simulation examples are given to illustrate the effectiveness of the proposed optimal control strategy.
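As a rough illustration of the problem class (generic symbols assumed here, with state delay \sigma, control delay \tau, and weighting matrices Q \succeq 0, R \succ 0; not necessarily the paper's exact formulation):

    J = \int_{0}^{\infty} \big[ x^{\top}(t)\, Q\, x(t) + u^{\top}(t)\, R\, u(t) \big]\, dt,
    \qquad
    \dot{x}(t) = f\big(x(t),\, x(t-\sigma),\, u(t),\, u(t-\tau)\big).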
3.
Approximate Dynamic Programming for Self-Learning Control
This paper introduces a self-learning control approach based on approximate dynamic programming. Dynamic programming was introduced by Bellman in the 1950s for solving optimal control problems of nonlinear dynamical systems. Due to its high computational complexity, the applications of dynamic programming have been limited to simple and small-scale problems. The key step in finding approximate solutions to dynamic programming is to estimate the performance index in dynamic programming. The optimal control signal can then be determined by minimizing (or maximizing) the performance index. Artificial neural networks are very efficient tools for representing the performance index in dynamic programming. This paper uses neural networks both for estimating the performance index in dynamic programming and for generating optimal control signals, thereby achieving optimal control through self-learning.
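A minimal sketch of this kind of value-iteration-with-approximator loop, assuming a toy scalar system and a least-squares critic over polynomial features in place of the neural networks (the system, cost, and all names are illustrative and not taken from the paper):

    import numpy as np

    def dynamics(x, u):                 # toy scalar plant, assumed for illustration
        return 0.9 * x + 0.1 * u

    def utility(x, u):                  # quadratic stage cost
        return x ** 2 + u ** 2

    def features(x):                    # polynomial features for the critic
        x = np.asarray(x, dtype=float)
        return np.stack([np.ones_like(x), x, x ** 2], axis=-1)

    states = np.linspace(-1.0, 1.0, 41)     # sampled training states
    actions = np.linspace(-1.0, 1.0, 81)    # discretized control candidates
    w = np.zeros(3)                          # critic weights, V_0 = 0

    for _ in range(50):                      # value-iteration sweeps
        targets = []
        for x in states:
            q = [utility(x, u) + features(dynamics(x, u)) @ w for u in actions]
            targets.append(min(q))           # greedy (minimizing) control
        # refit the critic to the new targets (least squares stands in for NN training)
        w, *_ = np.linalg.lstsq(features(states), np.array(targets), rcond=None)

    print("approximate cost-to-go at x = 0.5:", features(0.5) @ w)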
4.
In this paper, a finite-horizon neuro-optimal tracking control strategy for a class of discrete-time nonlinear systems is proposed. Through system transformation, the optimal tracking problem is converted into designing a finite-horizon optimal regulator for the tracking error dynamics. Then, with convergence analysis in terms of cost function and control law, the iterative adaptive dynamic programming (ADP) algorithm via the heuristic dynamic programming (HDP) technique is introduced to obtain the finite-horizon optimal tracking controller, which drives the cost function to within an ε-error bound of its optimal value. Three neural networks are used as parametric structures to implement the algorithm, approximating the cost function, the control law, and the error dynamics, respectively. Two simulation examples are included to complement the theoretical discussions.
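To make the system-transformation step concrete in generic notation (assumed here, not quoted from the paper): with reference trajectory r_k, the tracking error is regulated over the finite horizon N,

    e_k = x_k - r_k,
    \qquad
    J = \sum_{k=0}^{N-1} \big( e_k^{\top} Q\, e_k + u_k^{\top} R\, u_k \big).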
5.
Application of Adaptive Single Exponential Smoothing to Short-Term Traffic Flow Prediction
Short-term traffic flow prediction is one of the key technologies for traffic planning and management. Exponential smoothing is widely used in short-term traffic flow prediction because its computation is simple and it requires few observations, but it lacks an effective method for selecting the smoothing coefficient. This paper proposes an adaptive single exponential smoothing method: approximate dynamic programming is introduced to optimize the smoothing coefficient against actual traffic flow data, so that the coefficient is updated automatically as the prediction proceeds, which ensures the real-time performance and accuracy of the prediction. A rigorous theoretical derivation proves the convergence of this prediction method, and simulation results verify the effectiveness of the algorithm.
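A minimal sketch of single exponential smoothing with an online-adjusted coefficient; the Trigg-Leach adaptive-response rule below stands in for the ADP-based coefficient optimization described in the abstract and is purely illustrative (data and names are made up):

    import numpy as np

    def adaptive_ses(flow, phi=0.2):
        """One-step-ahead forecasts with a Trigg-Leach adaptive smoothing coefficient."""
        s = flow[0]            # smoothed value, used as the next forecast
        e_s, a_s = 0.0, 1e-6   # smoothed error and smoothed absolute error
        preds = []
        for y in flow[1:]:
            preds.append(s)
            err = y - s
            e_s = phi * err + (1 - phi) * e_s
            a_s = phi * abs(err) + (1 - phi) * a_s
            alpha = abs(e_s) / a_s      # adaptive coefficient in (0, 1]
            s = s + alpha * err         # update the forecast
        return np.array(preds)

    flow = np.array([120, 135, 150, 160, 158, 170, 165, 180], dtype=float)  # vehicles per interval
    print(adaptive_ses(flow))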
6.
We review the literature on approximate dynamic programming, with the goal of better understanding the theory behind practical algorithms for solving dynamic programs with continuous and vector-valued states and actions and complex information processes. We build on the literature that has addressed the well-known problem of multidimensional (and possibly continuous) states, and the extensive literature on model-free dynamic programming, which also assumes that the expectation in Bellman’s equation cannot be computed. However, we point out complications that arise when the actions/controls are vector-valued and possibly continuous. We then describe some recent research by the authors on approximate policy iteration algorithms that offer convergence guarantees (with technical assumptions) for both parametric and nonparametric architectures for the value function.
7.
This paper investigates the choice of function approximator for an approximate dynamic programming (ADP) based control strategy. The ADP strategy allows the user to derive an improved control policy given a simulation model and some starting control policy (or, alternatively, closed-loop identification data), while circumventing the ‘curse of dimensionality’ of the traditional dynamic programming approach. In ADP, one fits a function approximator to state vs. ‘cost-to-go’ data and solves the Bellman equation with the approximator in an iterative manner. A proper choice and design of function approximator is critical for convergence of the iteration and for the quality of the final learned control policy, because approximation errors can grow quickly in the loop of optimization and function approximation. Typical classes of approximators used in related approaches are parameterized global approximators (e.g. artificial neural networks) and nonparametric local averagers (e.g. k-nearest neighbor). In this paper, we assert on the basis of some case studies and a theoretical result that a certain type of local averager should be preferred over global approximators, as the former ensures monotonic convergence of the iteration. However, a converged cost-to-go function does not necessarily lead to a stable control policy on-line, due to the problem of over-extrapolation. To cope with this difficulty, we propose that a penalty term be included in the objective function of each minimization to discourage the optimizer from finding a solution in regions of the state space where the local data density is too low. A nonparametric density estimator, which can be naturally combined with a local averager, is employed for this purpose.
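An illustrative sketch (not the paper's implementation) of a k-nearest-neighbour cost-to-go averager whose estimate is inflated by a penalty when the query point lies where the local data density is low; the class name, constants, and the inverse-distance density proxy are all assumptions:

    import numpy as np

    class PenalizedKNNCostToGo:
        def __init__(self, states, costs, k=5, penalty_weight=10.0):
            self.X = np.asarray(states, dtype=float)   # visited states
            self.J = np.asarray(costs, dtype=float)    # associated cost-to-go samples
            self.k = k
            self.penalty_weight = penalty_weight

        def __call__(self, x):
            d = np.linalg.norm(self.X - np.asarray(x, dtype=float), axis=1)
            idx = np.argsort(d)[: self.k]
            estimate = self.J[idx].mean()              # local averager
            density = 1.0 / (d[idx].mean() + 1e-8)     # crude inverse-distance density proxy
            penalty = self.penalty_weight / density    # large when neighbours are far away
            return estimate + penalty                  # discourages over-extrapolation

    # toy usage with random data (purely illustrative)
    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(200, 2))
    J = (X ** 2).sum(axis=1)                           # pretend cost-to-go samples
    approx = PenalizedKNNCostToGo(X, J)
    print(approx([0.1, -0.2]), approx([5.0, 5.0]))     # in-sample vs. far-extrapolated query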
8.
Natural actor-critic algorithms
We present four new reinforcement learning algorithms based on actor-critic, natural-gradient and function-approximation ideas, and we provide their convergence proofs. Actor-critic reinforcement learning methods are online approximations to policy iteration in which the value-function parameters are estimated using temporal difference learning and the policy parameters are updated by stochastic gradient descent. Methods based on policy gradients in this way are of special interest because of their compatibility with function-approximation methods, which are needed to handle large or infinite state spaces. The use of temporal difference learning in this way is of special interest because in many applications it dramatically reduces the variance of the gradient estimates. The use of the natural gradient is of interest because it can produce better conditioned parameterizations and has been shown to further reduce variance in some cases. Our results extend prior two-timescale convergence results for actor-critic methods by Konda and Tsitsiklis by using temporal difference learning in the actor and by incorporating natural gradients. Our results extend prior empirical studies of natural actor-critic methods by Peters, Vijayakumar and Schaal by providing the first convergence proofs and the first fully incremental algorithms.
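A sketch of one incremental actor-critic step with linear function approximation; the natural-gradient correction that distinguishes the paper's algorithms is omitted, so this only shows the TD-critic / policy-gradient-actor structure (toy, hypothetical code):

    import numpy as np

    def actor_critic_step(phi_s, phi_s_next, grad_log_pi, reward,
                          v_weights, theta, gamma=0.99, alpha_v=0.1, alpha_pi=0.01):
        # critic: TD error under the linear value function V(s) = phi(s) . v
        delta = reward + gamma * phi_s_next @ v_weights - phi_s @ v_weights
        v_weights = v_weights + alpha_v * delta * phi_s        # TD(0) critic update
        # actor: stochastic-gradient policy update, with delta as the advantage estimate
        theta = theta + alpha_pi * delta * grad_log_pi
        return v_weights, theta

    # toy usage with random feature vectors
    rng = np.random.default_rng(1)
    v, th = np.zeros(4), np.zeros(4)
    v, th = actor_critic_step(rng.normal(size=4), rng.normal(size=4),
                              rng.normal(size=4), reward=1.0, v_weights=v, theta=th)
    print(v, th)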
9.
A Signal Timing Optimization Strategy for Intersections Based on Queue Length Balancing
齐驰  侯忠生  贾琰 《控制与决策》2012,27(8):1191-1194,1200
The rapid development of urban traffic and the continuous growth of travel demand have made urban congestion increasingly severe, causing ever-greater time delays and economic losses. In view of this, an intersection signal timing optimization strategy based on queue length balancing is proposed. Following the control idea of balancing queue lengths, the green time of each phase is adjusted dynamically in real time so as to reach the control goal of balanced queues, which guarantees full utilization of the green time. By introducing approximate dynamic programming, the algorithm acquires self-learning and adaptive properties and does not depend on a traffic flow model. Simulation results verify the effectiveness of the algorithm.
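An illustrative-only sketch of the queue-balancing idea: reallocate a fixed cycle's green time in proportion to each phase's measured queue length; the ADP-based self-learning layer described in the abstract is not reproduced, and all parameter values are assumptions:

    def balance_green_times(queue_lengths, cycle_green=90.0, min_green=10.0):
        """Split the cycle's green time across phases in proportion to queue length."""
        n = len(queue_lengths)
        spare = cycle_green - n * min_green      # green time left after each phase's minimum
        total_q = sum(queue_lengths) or 1.0      # avoid division by zero when all queues are empty
        return [min_green + spare * q / total_q for q in queue_lengths]

    print(balance_green_times([12, 30, 6, 18]))  # phase queues in vehicles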
10.
We propose two approximate dynamic programming (ADP)-based strategies for control of nonlinear processes using input-output data. In the first strategy, which we term ‘J-learning,’ one builds an empirical nonlinear model using closed-loop test data and performs dynamic programming with it to derive an improved control policy. In the second strategy, called ‘Q-learning,’ one tries to learn an improved control policy in a model-less manner. Compared to the conventional model predictive control approach, the new approach offers some practical advantages in using nonlinear empirical models for process control. Besides the potential reduction in the on-line computational burden, it offers a convenient way to control the degree of model extrapolation in the calculation of optimal control moves. One major difficulty associated with using an empirical model within the multi-step predictive control setting is that the model can be excessively extrapolated into regions of the state space where identification data were scarce or nonexistent, leading to performance far worse than predicted by the model. Within the proposed ADP-based strategies, this problem is handled by imposing a penalty term designed on the basis of the local data distribution. A CSTR example is provided to illustrate the proposed approaches.
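A generic tabular Q-learning update, included only to make the ‘Q-learning’ terminology concrete; the paper's version works with empirical models, continuous states, and a data-density penalty, none of which appear in this toy sketch:

    import numpy as np

    def q_update(Q, s, a, cost, s_next, alpha=0.1, gamma=0.95):
        # cost-minimizing formulation: bootstrap with the minimum over next actions
        Q[s, a] += alpha * (cost + gamma * Q[s_next].min() - Q[s, a])
        return Q

    Q = np.zeros((5, 3))                    # 5 states, 3 actions (toy sizes)
    Q = q_update(Q, s=0, a=1, cost=2.0, s_next=3)
    print(Q[0, 1])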