Similar literature
Found 20 similar documents; search time: 218 ms
1.
In this paper, we aim to solve the finite-horizon optimal control problem for a class of discrete-time nonlinear systems with unfixed initial state using the adaptive dynamic programming (ADP) approach. A new ε-optimal control algorithm based on the iterative ADP approach is proposed, which makes the performance index function converge iteratively, in finite time, to the greatest lower bound of all performance indices within an error bound determined by ε. The optimal number of control steps can also be obtained by the proposed ε-optimal control algorithm when the initial state of the system is unfixed. Neural networks are used to approximate the performance index function and to compute the optimal control policy, respectively, to facilitate the implementation of the ε-optimal control algorithm. Finally, a simulation example is given to show the results of the proposed method.

2.
This paper studies an online iterative algorithm for solving discrete-time multi-agent dynamic graphical games with input constraints. To obtain the optimal strategy of each agent, a set of coupled Hamilton-Jacobi-Bellman (HJB) equations must be solved, which is very difficult with traditional methods, and the game becomes more complex still when the control input of each agent is constrained. An online iterative algorithm is proposed that finds the solution to the dynamic graphical game without requiring the drift dynamics of the agents; in effect, the algorithm solves the Bellman equations online. The solution employs a distributed policy iteration process that uses only the local information available to each agent. It can be proved that, under certain conditions, when all agents update their strategies simultaneously, the whole multi-agent system reaches a Nash equilibrium. In the implementation, for each agent, two layers of neural networks are used to fit the value function and the control strategy, respectively. Finally, a simulation example is given to show the effectiveness of the method.

3.
This paper addresses two kinds of optimal control problems of probabilistic mix-valued logical control networks by using the semi-tensor product of matrices, and presents a number of new results on the optimal finite-horizon control and the first-passage model based control problems, respectively. Firstly, the probabilistic mix-valued logical control network is expressed in an algebraic form by the semi-tensor product method, based on which the optimal finite-horizon control problem is studied and a new algorithm for choosing a sequence of control actions is established to minimize a given cost functional over finite steps. Secondly, the first-passage model of probabilistic mix-valued logical networks is given and a new algorithm for designing the optimal control scheme is proposed to maximize the corresponding probability criterion. Finally, an illustrative example is studied to support our new results/algorithms.

4.
A new visual servo control scheme for a robotic manipulator is presented in this paper, where a back propagation (BP) neural network is used to map image features directly to joint angles without requiring robot kinematics or camera calibration. To speed up convergence and avoid local minima of the neural network, this paper uses a genetic algorithm to find the optimal initial weights and thresholds and then uses the BP algorithm to train the neural network on the given data. The proposed method effectively combines the good global search ability of genetic algorithms with the accurate local search of the BP neural network. The Simulink model of the PUMA560 robot visual servo system based on the improved BP neural network is built with the Robotics Toolbox for Matlab. The simulation results indicate that the proposed method can accelerate convergence of the image errors and provides a simple and effective way of robot control.

5.
Although computer architectures incorporate fast processing hardware resources, high-performance real-time implementation of a complex control algorithm requires an efficient design and software coding of the algorithm so as to exploit special features of the hardware and avoid associated architecture shortcomings. This paper presents an investigation into the analysis and design mechanisms that lead to a reduction in execution time when implementing real-time control algorithms. The proposed mechanisms are exemplified by means of one algorithm, which demonstrates their applicability to real-time applications. An active vibration control (AVC) algorithm for a flexible beam system simulated using the finite difference (FD) method is considered to demonstrate the effectiveness of the proposed methods. A comparative performance evaluation of the proposed design mechanisms is presented and discussed through a set of experiments.

6.
This paper is concerned with the optimal linear quadratic Gaussian (LQG) control problem for discrete time-varying systems with multiplicative noise and multiple state delays. The main contributions are twofold. First, by virtue of Pontryagin's maximum principle, we solve the forward and backward stochastic difference equations (FBSDEs) and show the relationship between the state and the costate. Second, based on the solution to the FBSDEs and the coupled difference Riccati equations, the necessary and sufficient condition for the optimal problem is obtained. Meanwhile, an explicit analytical expression is given for the optimal LQG controller. Numerical examples are shown to illustrate the effectiveness of the proposed algorithm.
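
As a rough illustration of the kind of Riccati difference equation involved, the sketch below runs the backward recursion for a deliberately simplified problem: full state feedback, no state delays, and a single scalar multiplicative noise w_k with zero mean and variance sigma2 acting through x_{k+1} = (A + w_k A_bar) x_k + (B + w_k B_bar) u_k. These simplifications, the function name `riccati_multiplicative`, and all parameter names are assumptions made here for illustration; this is not the controller derived in the paper.

```python
import numpy as np

def riccati_multiplicative(A, B, A_bar, B_bar, sigma2, Q, R, QN, N):
    """Backward Riccati recursion for a finite-horizon quadratic cost
    sum_k (x'Qx + u'Ru) + x_N' QN x_N under multiplicative noise.

    Assumed (hypothetical) model: x_{k+1} = (A + w A_bar) x + (B + w B_bar) u,
    with w zero-mean, variance sigma2, independent across steps.
    Returns P_0 and the gains K_0..K_{N-1}, where u_k = -K_k x_k.
    """
    P = np.array(QN, dtype=float)
    gains = []
    for _ in range(N):
        # The noise adds sigma2-weighted copies of the usual quadratic terms.
        S = R + B.T @ P @ B + sigma2 * (B_bar.T @ P @ B_bar)
        M = B.T @ P @ A + sigma2 * (B_bar.T @ P @ A_bar)
        K = np.linalg.solve(S, M)
        P = Q + A.T @ P @ A + sigma2 * (A_bar.T @ P @ A_bar) - M.T @ K
        gains.append(K)
    gains.reverse()  # gains were computed from step N-1 down to step 0
    return P, gains
```

With sigma2 = 0 the recursion reduces to the standard finite-horizon LQR Riccati recursion, which gives a simple sanity check.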

7.
Research on a combined model and algorithm for dynamic traffic assignment and signal control   Total citations: 7, self-citations: 0, citations by others: 7
This paper presents a generalized bi-level programming model of combined dynamic traffic assignment and traffic signal control, and in particular analyzes a procedure for determining the equilibrium queuing delays on saturated links for dynamic network signal control satisfying the FIFO (first-in-first-out) rule. The chaotic optimization algorithm proposed in this paper not only yields the optimal signal settings but also calculates, at each interval, the link inflow and outflow rates for the dynamic user optimal problem, and provides real-time information to travelers. Finally, a numerical example is given to illustrate the application of the proposed model and solution algorithm, and a comparison shows that this model has better system performance.

8.
The rotation matrix estimation problem is a key issue in mobile robot localization, navigation, and control. Based on quaternion theory and epipolar geometry, an extended Kalman filter (EKF) algorithm is proposed to estimate the rotation matrix by using a single-axis gyroscope and the image point correspondences from a monocular camera. The experimental results show that the precision of the mobile robot's yaw angle estimated by the proposed EKF algorithm is much better than that of the image-only and gyroscope-only methods, which demonstrates that the method is a preferable way to estimate the rotation for autonomous mobile robot applications.
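
The link between a quaternion, the rotation matrix, and the yaw angle can be sketched as follows. The code shows only the standard unit-quaternion-to-rotation-matrix conversion and a yaw extraction under the Z-Y-X Euler convention; the EKF itself is not reproduced, and the function names and the (w, x, y, z) ordering are assumptions made here.

```python
import math
import numpy as np

def quat_to_rotation(q):
    """Rotation matrix of a quaternion given as (w, x, y, z).

    The quaternion is normalized first, so any nonzero input is accepted.
    """
    q = np.asarray(q, dtype=float)
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ])

def yaw_from_rotation(R):
    """Yaw (rotation about the z axis) under the Z-Y-X Euler convention."""
    return math.atan2(R[1, 0], R[0, 0])
```

For a pure rotation about the z axis by an angle psi, the quaternion is (cos(psi/2), 0, 0, sin(psi/2)) and `yaw_from_rotation` returns psi.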

9.
This paper presents a fast adaptive iterative algorithm to solve linearly separable classification problems in R^n. In each iteration, a subset of the sampling data (n points, where n is the number of features) is adaptively chosen and a hyperplane is constructed such that it separates the chosen n points at a margin and best classifies the remaining points. The classification problem is formulated and the details of the algorithm are presented. Further, the algorithm is extended to solving quadratically separable classification problems. The basic idea is based on mapping the physical space to another larger one where the problem becomes linearly separable. Numerical illustrations show that few iteration steps are sufficient for convergence when classes are linearly separable. For nonlinearly separable data, given a specified maximum number of iteration steps, the algorithm returns the best hyperplane that minimizes the number of misclassified points occurring through these steps. Comparisons with other machine learning algorithms on practical and benchmark datasets are also presented, showing the performance of the proposed algorithm.
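
The idea of mapping to a larger space where a quadratically separable problem becomes linearly separable can be illustrated with a generic quadratic feature map. This is a minimal sketch under the assumption that the map appends all monomials x_i x_j (i <= j) to the original features; the function name `quadratic_features` is hypothetical and the paper's exact construction may differ.

```python
from itertools import combinations_with_replacement

import numpy as np

def quadratic_features(X):
    """Map each row x in R^n to (x_1..x_n, x_i * x_j for i <= j).

    The output has n + n(n+1)/2 columns. A quadratic decision boundary in
    the original space is a hyperplane in this lifted space.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[1]
    quad = [X[:, i] * X[:, j]
            for i, j in combinations_with_replacement(range(n), 2)]
    return np.column_stack([X] + quad)

# Points inside (-1) and outside (+1) the unit circle are not linearly
# separable in R^2, but x1^2 + x2^2 - 1 is linear in the lifted features.
X = np.array([[0.1, 0.2], [2.0, 0.0], [0.0, -3.0], [0.5, -0.5]])
labels = np.array([-1, 1, 1, -1])
w = np.array([0.0, 0.0, 1.0, 0.0, 1.0])  # weights on x1, x2, x1^2, x1*x2, x2^2
b = -1.0
predicted = np.sign(quadratic_features(X) @ w + b)
```

Any linear classifier can then be run on the lifted features in place of the original ones.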

10.
This paper presents an approximate/adaptive dynamic programming (ADP) algorithm that uses the idea of integral reinforcement learning (IRL) to determine online the Nash equilibrium solution for the two-player zero-sum differential game with linear dynamics and infinite-horizon quadratic cost. The algorithm is built around an iterative method that has been developed in the control engineering community for solving the continuous-time game algebraic Riccati equation (CT-GARE), which underlies the game problem. We show how the ADP techniques enhance the capabilities of the offline method, allowing an online solution without requiring complete knowledge of the system dynamics. The feasibility of the ADP scheme is demonstrated in simulation for a power system control application. The adaptation goal is the best control policy, that is, the one that will optimally counter the highest load disturbance.

11.
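Luo Yanhong, Zhang Huaguang, Cao Ning, Chen Bing. Acta Automatica Sinica (《自动化学报》), 2009, 35(11): 1436-1445
A greedy iterative DHP (dual heuristic programming) algorithm is proposed to solve the approximately optimal stabilization problem for a class of nonlinear systems with control constraints. To handle the control constraints, a nonquadratic functional is first introduced to convert the constrained problem into an unconstrained one; then, based on the costate function, a greedy iterative DHP algorithm is proposed to solve the HJB (Hamilton-Jacobi-Bellman) equation of the system. At each iteration step, a neural network is used to approximate the costate function of the system, and the optimal control policy is then computed directly from the costate function, which eliminates the control network used in conventional approximate dynamic programming methods. Finally, two simulation examples demonstrate the effectiveness and feasibility of the proposed optimal control scheme.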

12.
In this paper, a new dual iterative adaptive dynamic programming (ADP) algorithm is developed to solve optimal control problems for a class of nonlinear systems with time delays in the state and control variables. The idea is to use dynamic programming theory to derive expressions for the optimal performance index function and the optimal control. Then, the dual iterative ADP algorithm is introduced to obtain the optimal solutions iteratively, where in each iteration the performance index function and the system states are both updated. A convergence analysis is presented to prove that the performance index function reaches the optimum under the proposed method. Neural networks are used to approximate the performance index function and to compute the optimal control policy, respectively, to facilitate the implementation of the dual iterative ADP algorithm. Simulation examples are given to demonstrate the validity of the proposed optimal control scheme.

13.
This paper proposes a novel finite-time optimal control method based on input-output data for unknown nonlinear systems using an adaptive dynamic programming (ADP) algorithm. In this method, a single-hidden-layer feed-forward network (SLFN) with an extreme learning machine (ELM) is used to construct the data-based identifier of the unknown system dynamics. Based on the data-based identifier, the finite-time optimal control method is established by the ADP algorithm. Two other SLFNs with ELM are used in the ADP method to facilitate the implementation of the iterative algorithm; they approximate the performance index function and the optimal control law at each iteration, respectively. A simulation example is provided to demonstrate the effectiveness of the proposed control scheme.

14.
In this paper, a novel iterative adaptive dynamic programming (ADP) algorithm, called the generalised policy iteration ADP algorithm, is developed to solve optimal tracking control problems for discrete-time nonlinear systems. The idea is to use two iteration procedures, an i-iteration and a j-iteration, to obtain the iterative tracking control laws and the iterative value functions. By system transformation, we first convert the optimal tracking control problem into an optimal regulation problem. Then the generalised policy iteration ADP algorithm, which is based on the general idea of interacting policy iteration and value iteration algorithms, is introduced to deal with the optimal regulation problem. The convergence and optimality properties of the generalised policy iteration algorithm are analysed. Three neural networks are used to implement the developed algorithm. Finally, simulation examples are given to illustrate the performance of the present algorithm.

15.
In this paper, a novel value iteration adaptive dynamic programming (ADP) algorithm, called the “generalized value iteration ADP” algorithm, is developed to solve infinite-horizon optimal tracking control problems for a class of discrete-time nonlinear systems. The developed generalized value iteration ADP algorithm can be initialized with an arbitrary positive semi-definite function, which overcomes a disadvantage of traditional value iteration algorithms. A convergence property is established to guarantee that the iterative performance index function converges to the optimum. Neural networks are used to approximate the iterative performance index function and to compute the iterative control policy, respectively, to implement the iterative ADP algorithm. Finally, a simulation example is given to illustrate the performance of the developed algorithm.
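
The effect of initializing value iteration with an arbitrary positive semi-definite function can be seen in a toy case. The sketch below assumes a scalar linear system x_{k+1} = a x_k + b u_k with stage cost q x^2 + r u^2 and value functions of the form V_i(x) = p_i x^2, so that each iteration reduces to updating the single number p_i. This linear-quadratic regulation example, the function name, and the parameters are assumptions chosen for illustration; the paper treats nonlinear tracking problems with neural network approximators.

```python
def value_iteration_scalar(a, b, q, r, p0, tol=1e-12, max_iter=10_000):
    """Value iteration V_{i+1}(x) = min_u [q x^2 + r u^2 + V_i(a x + b u)]
    for V_i(x) = p_i x^2, started from any p0 >= 0.

    Minimizing over u in closed form gives
        p_{i+1} = q + a^2 * r * p_i / (r + b^2 * p_i).
    Returns the final p and the number of iterations performed.
    """
    if p0 < 0:
        raise ValueError("p0 must be non-negative (positive semi-definite V_0)")
    p = float(p0)
    for i in range(max_iter):
        p_next = q + (a * a * r * p) / (r + b * b * p)
        if abs(p_next - p) < tol:
            return p_next, i + 1
        p = p_next
    return p, max_iter

# Two very different initial functions lead to the same limit.
p_from_zero, _ = value_iteration_scalar(1.2, 1.0, 1.0, 1.0, p0=0.0)
p_from_large, _ = value_iteration_scalar(1.2, 1.0, 1.0, 1.0, p0=100.0)
```

In this toy case the limit is the positive root of the scalar algebraic Riccati equation, regardless of the starting value.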

16.
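For nonlinear optimal control problems with complex constraints, an improved Gauss pseudospectral method (IGPM) is proposed. Analytical solutions to such problems are hard to obtain; in particular, some problems have no analytical model and some parameters can only be obtained from lookup tables, which makes them difficult to solve with traditional methods. Building on the traditional Gauss pseudospectral method, the nonlinear terminal-state integral constraint is equivalently transformed into a linear form, which yields the IGPM. The costate variables can be computed through the costate mapping theorem and used to check optimality, so that the IGPM attains the same accuracy as indirect methods. Methods for computing the costate variables at the initial time and the control variables at the endpoint times are also given. To improve the accuracy of the solution, an iterative algorithm based on the IGPM is proposed. Finally, the algorithm is applied to trajectory optimization for the ascent phase of a hypersonic vehicle; the results show that the optimal trajectory essentially satisfies the path constraints and the optimality conditions.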

17.
In this article, using singular perturbation theory and the adaptive dynamic programming (ADP) approach, an adaptive composite suboptimal control method is proposed for linear singularly perturbed systems (SPSs) with unknown slow dynamics. First, the system is decomposed into fast and slow subsystems and the original optimal control problem is reduced to two subproblems in different time scales. Afterward, the fast subproblem is solved based on the known model of the fast subsystem and a fast optimal control law is designed by solving the algebraic Riccati equation corresponding to the fast subsystem. Then, the slow subproblem is reformulated by introducing a system transformation for the slow subsystem. An online learning algorithm is proposed to design a slow optimal control law by using the information of the original system state in the framework of ADP. As a result, the obtained fast and slow optimal control laws constitute the adaptive composite suboptimal control law for the original SPSs. Furthermore, convergence of the learning algorithm, suboptimality of the adaptive composite suboptimal control law, and stability of the whole closed-loop system are analyzed by singular perturbation theory. Finally, a numerical example is given to show the feasibility and effectiveness of the proposed methods.

18.
This paper investigates the choice of function approximator for an approximate dynamic programming (ADP) based control strategy. The ADP strategy allows the user to derive an improved control policy given a simulation model and some starting control policy (or alternatively, closed-loop identification data), while circumventing the ‘curse-of-dimensionality’ of the traditional dynamic programming approach. In ADP, one fits a function approximator to state vs. ‘cost-to-go’ data and solves the Bellman equation with the approximator in an iterative manner. A proper choice and design of the function approximator is critical for convergence of the iteration and for the quality of the final learned control policy, because an approximation error can grow quickly in the loop of optimization and function approximation. Typical classes of approximators used in related approaches are parameterized global approximators (e.g. artificial neural networks) and nonparametric local averagers (e.g. k-nearest neighbor). In this paper, we assert on the basis of some case studies and a theoretical result that a certain type of local averager should be preferred over global approximators, as the former ensures monotonic convergence of the iteration. However, a converged cost-to-go function does not necessarily lead to a stable control policy on-line, owing to the problem of over-extrapolation. To cope with this difficulty, we propose that a penalty term be included in the objective function in each minimization to discourage the optimizer from finding a solution in regions of the state space where the local data density is inadequately low. A nonparametric density estimator, which can be naturally combined with a local averager, is employed for this purpose.
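
The combination of a local averager with a density-based penalty can be sketched as follows. This is a minimal illustration under assumptions made here: a k-nearest-neighbor mean as the local averager, a Gaussian kernel density estimate, and a penalty proportional to the shortfall of the density below a threshold `rho_min`. The function names, the penalty form, and the default parameter values are hypothetical and are not taken from the paper.

```python
import numpy as np

def knn_cost_to_go(x, states, costs, k=3):
    """Local averager: mean cost-to-go of the k stored states nearest to x."""
    dist = np.linalg.norm(states - x, axis=1)
    nearest = np.argsort(dist)[:k]
    return costs[nearest].mean()

def kernel_density(x, states, bandwidth=0.5):
    """Gaussian kernel density estimate of the stored states at x."""
    dim = states.shape[1]
    sq_dist = np.sum((states - x) ** 2, axis=1)
    norm = (2.0 * np.pi * bandwidth ** 2) ** (dim / 2.0)
    return np.mean(np.exp(-0.5 * sq_dist / bandwidth ** 2)) / norm

def penalized_cost(x, states, costs, k=3, bandwidth=0.5,
                   rho_min=0.05, weight=100.0):
    """Cost-to-go estimate plus a penalty where the data density is low.

    The penalty is zero wherever the estimated density is at least rho_min,
    so well-sampled regions are left untouched.
    """
    shortfall = max(0.0, rho_min - kernel_density(x, states, bandwidth))
    return knn_cost_to_go(x, states, costs, k) + weight * shortfall
```

An optimizer minimizing `penalized_cost` over candidate successor states is steered away from regions where the averager would have to extrapolate.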

19.
In this paper, a novel iterative adaptive dynamic programming (ADP) algorithm is developed to solve infinite-horizon optimal control problems for discrete-time nonlinear systems. When the iterative control law and the iterative performance index function in each iteration cannot be accurately obtained, it is shown that the iterative controls can make the performance index function converge to within a finite error bound of the optimal performance index function. Stability properties are presented to show that the system can be stabilized under the iterative control law, which makes the present iterative ADP algorithm feasible for implementation both on-line and off-line. Neural networks are used to approximate the iterative performance index function and to compute the iterative control policy, respectively, to implement the iterative ADP algorithm. Finally, two simulation examples are given to illustrate the performance of the present method.

20.
Based on adaptive dynamic programming (ADP), the fixed-point tracking control problem is solved by a value iteration (VI) algorithm. First, a class of discrete-time (DT) nonlinear systems with disturbance is considered. Second, the convergence of the VI algorithm is given. It is proven that the iterative cost function precisely converges to the optimal value, and the control input and disturbance input also converge to their optimal values. Third, a novel analysis pertaining to the range of the discount factor is presented, where the cost function serves as a Lyapunov function. Finally, neural networks (NNs) are employed to approximate the cost function, the control law, and the disturbance law. Simulation examples are given to illustrate the effective performance of the proposed method.

