Similar literature
 Found 20 similar articles (search time: 46 ms)
1.
Iterative learning control (ILC) can be based on the inverse plant, but this approach can be degraded by modelling errors, particularly at high frequencies. Starting from that observation, this article investigates the construction and properties of a multi-parameter parameter-optimal ILC algorithm that uses an approximate polynomial representation of the inverse with natural high-frequency attenuation. In its simplest form, the algorithm replicates the original work of Owens and Feng but, more generally, it is capable of producing significant improvements to the observed convergence rate. As the number of parameters increases, convergence rates approach that of the ideal plant inverse algorithm. Introducing compensation into the algorithm provides a formal link to previously published gradient and norm-optimal ILC algorithms and indicates that the polynomial approach can be regarded as an approximation to those control laws. Simulation examples verify the theoretical performance predictions.

2.
In this paper, a recursive hierarchical parametric estimation (RHPE) algorithm is proposed for stochastic nonlinear systems that can be described by Wiener-Hammerstein (W-H) mathematical models. The parameter estimation problem is formulated using the prediction error approach and gradient techniques. The convergence analysis of the developed RHPE algorithm is derived using stochastic gradient-based theory. A Wiener-Hammerstein hydraulic process is used to demonstrate the efficiency of the proposed approach.

3.
A new sample path analysis approach, based on the smoothing property of conditional expectation, is proposed for estimating the performance sensitivity of discrete event dynamical systems. Several examples are presented to show how this approach overcomes a difficulty of ordinary infinitesimal perturbation analysis. The basic message is that one can learn more about the system performance by observing and analyzing the sample path than by using the conventional simulation approach. It is also pointed out that the classical queueing theory approach to performance sensitivity and the sample-path-based infinitesimal perturbation analysis approach can be unified in the framework of the new approach, smoothed (conditional) perturbation analysis.

4.
Perturbation analysis via coupling   Cited by: 1 (self-citations: 0, citations by others: 1)
Perturbation analysis is an efficient method for performance analysis of discrete event dynamic systems. It yields gradient information from only one sample path observation. Over the last two decades, various perturbation analysis techniques have been developed to handle a large class of problems. Coupling is a method of generating multiple random samples, and it has a wide range of applications in applied probability. The paper is concerned with perturbation analysis via coupling. This approach offers great versatility in the form of the gradient estimators, which is potentially useful for variance reduction and for efficient implementation. Several known perturbation analysis techniques can be viewed as special ways of coupling. The coupling method is further applied to gradient estimation of Markov chains. The method is used not only in deriving a gradient estimator but also in its implementation. It is proved that the estimator is strongly consistent. Finally, different coupling schemes are compared using an illustrative example.

5.
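Based on performance potential theory and the equivalent Markov process method, this paper studies a simulation optimization algorithm for a class of semi-Markov decision processes (SMDPs) under parameterized randomized stationary policies, and briefly analyzes the convergence of the algorithm. Through the equivalent Markov process of the SMDP, a uniformized Markov chain is defined. The gradient of the SMDP's average-cost performance measure with respect to the policy parameters is then estimated from a single sample path of this uniformized Markov chain, in order to find an optimal (or suboptimal) policy. The algorithm uses a neural network to approximate the parameterized randomized stationary policy, which saves computer memory, avoids the "curse of dimensionality", and makes the method suitable for performance optimization of systems with large state spaces. Finally, a simulation example illustrates the application of the algorithm.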

6.
Production lines with limited storage capacities can be modeled as cyclic queueing networks with finite buffers and general service times. A new technique, called perturbation analysis of discrete event dynamic systems, is applied to these queueing networks. An estimate of the gradient of the system throughput is obtained by perturbation analysis based on only one sample trajectory of the system. We show that the estimate is strongly consistent. Using this perturbation analysis estimate of the gradient, we can apply the Robbins-Monro stochastic procedure to optimize the system throughput. Compared to the conventional Kiefer-Wolfowitz optimization procedure, this approach saves a large amount of computation. For a real system, the gradient estimate can be obtained without changing any parameters in the system. The results also hold for systems with general routing in which no server can block more than one other server simultaneously.
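
The Robbins-Monro procedure mentioned in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: the queueing network and its perturbation analysis estimator are not reproduced, and `make_noisy_gradient` is a hypothetical stand-in that returns the gradient of a toy concave objective corrupted by Gaussian noise. The gain sequence `a / n` is a common textbook choice, not one taken from the paper.

```python
import random


def robbins_monro(grad_estimate, theta0, steps, a=0.5):
    """Stochastic ascent: theta_{n+1} = theta_n + (a / n) * g_n."""
    theta = theta0
    for n in range(1, steps + 1):
        theta += (a / n) * grad_estimate(theta)
    return theta


def make_noisy_gradient(optimum, sigma, seed):
    """Hypothetical stand-in for a single-sample-path gradient estimate.

    Returns the gradient of J(theta) = -(theta - optimum)^2 plus noise.
    """
    rng = random.Random(seed)

    def grad(theta):
        return -2.0 * (theta - optimum) + rng.gauss(0.0, sigma)

    return grad


theta_hat = robbins_monro(make_noisy_gradient(3.0, 1.0, seed=0),
                          theta0=0.0, steps=5000)
print(theta_hat)  # should be close to the maximizer 3.0
```

Each iteration uses one gradient estimate, whereas a Kiefer-Wolfowitz scheme would need at least two noisy evaluations of the objective per parameter to form a finite difference, which is the computational saving the abstract refers to.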

7.
An adaptive iterative learning control algorithm based on a pulse neural network (PNN) is proposed for trajectory tracking of uncertain robot systems. Sliding mode variable structure control is used to improve robustness to disturbances and perturbations, and a boundary layer is used to eliminate the chattering of sliding mode control. In the iteration domain, the unknown parameters are tuned and used as part of the controller. Running in parallel, the PNN performs real-time state estimation to improve the system convergence. We analyze the stability and convergence of this algorithm using a Lyapunov-like methodology. The simulation results show that the expected control objective can be achieved using the proposed algorithm.

8.
This paper is concerned with the joint estimation of states and parameters of a special class of nonlinear systems, i.e., bilinear systems. The key is to investigate new methods for interactive state and parameter estimation of the considered system based on interactive estimation theory. Because the system states are unknown, a bilinear state observer is established based on the Kalman filtering principle. The unavailable states are then updated recursively by the state observer outputs. Once the state estimates are obtained, a bilinear state observer-based hierarchical stochastic gradient algorithm is developed using the gradient search. To improve the convergence rate and the parameter estimation accuracy, a bilinear state observer-based hierarchical multi-innovation stochastic gradient algorithm is proposed by expanding the scalar innovation to an innovation vector. The convergence analysis indicates that the parameter estimates converge to their true values. A numerical example illustrates the effectiveness of the proposed algorithms.

9.
Zhang Chunyuan (张春元), Zhu Qingxin (朱清新). Control and Decision (《控制与决策》), 2015, 30(12): 2161-2167

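To address the slow convergence and poor convergence quality of traditional actor-critic (AC) methods on sequential decision problems in continuous spaces, an AC algorithm framework based on symmetric perturbation sampling is proposed. First, the framework uses a Gaussian distribution as the policy distribution and, at each time step, symmetrically perturbs the current action mean to generate two actions that interact with the environment in parallel. Then, the agent's behavior action is selected according to the larger of the two temporal-difference (TD) errors, and the value function parameters are updated. Finally, the policy parameters are updated based on the average regular gradient or the incremental natural gradient of the two. Theoretical analysis and simulation results show that the proposed framework has good convergence and computational efficiency.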


10.
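To address problems of the actor-critic (AC) algorithm such as the high dimensionality of experience learning samples and the low robustness of the policy gradient model, this work exploits the information-cooperation advantages of multi-agent systems, builds attention-mechanism networks that act as agents, and introduces a multi-layer parallel attention network model to improve the AC algorithm, yielding a soft AC algorithm based on multi-layer parallel attention. Applied to robot path planning in dynamic unknown environments, it strengthens the robustness of the actor's policy gradient and reduces the critic's regression error, so that the optimal path-planning solution converges quickly. Experimental results show that the algorithm effectively overcomes local optima in robot path planning and offers fast computation and stable convergence.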

11.
Automatica, 1987, 23(4): 491-496
Nonlinear systems can be identified with nonlinear filters by combined estimation of parameters and state. Often these methods are very complex or have convergence problems, as for example the Extended Kalman Filter. In this paper a new algorithm for recursive nonlinear system identification is presented. Convergence problems are eliminated by an improved calculation of the gradient as the total derivative of the prediction error. By separately estimating parameters and states, computation time is reduced. The efficacy of the new identification algorithm is illustrated by studying its performance in a gravimetric filling system.

12.
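A mobile robot moving in a complex environment has difficulty obtaining a good path. The Q-learning algorithm, which is based on Markov processes, can obtain a good path through trial-and-error learning, but it converges slowly, needs many iterations, and its trial-and-error approach cannot be applied in real environments. Here a gravitational potential field is added to the Q-learning algorithm as initial prior information about the environment. On this basis, the environment is searched layer by layer for trap regions, and Q-value iteration is skipped for concave trap regions, which speeds up the convergence of path planning. Trial-and-error learning on obstacles is also removed, so that the algorithm can effectively avoid obstacles from the initial state and is suitable for learning directly in real environments. Complex maps are built with Python and the pygame module to verify the path-planning performance of the improved Q-learning algorithm with the initial gravitational potential field and trap search. Simulation experiments show that the improved algorithm reaches the target position quickly and effectively after a small number of iterations, along a good path.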

13.
In the theory of event-based optimization (EBO), decision making is triggered by events, which differs from the traditional state-based control in Markov decision processes (MDPs). In this paper, we propose a policy gradient approach for EBO. First, an equation for the performance gradient in the event-based policy space is derived from a fundamental quantity called the Q-factors of EBO. With the performance gradient, we can find a local optimum of EBO using a gradient-based algorithm. Compared to the policy iteration approach in EBO, this policy gradient approach does not require restrictive conditions and applies to a wider range of scenarios. The policy gradient approach is further implemented based on the online estimation of Q-factors. This approach does not require prior information about the system parameters, such as the transition probabilities. Finally, we use an EBO model to formulate the admission control problem and demonstrate the main idea of this paper. Such an online algorithm provides an effective implementation of the EBO theory in practice.

14.
This paper is concerned with numerical solutions to general linear matrix equations, which include the well-known Lyapunov and Sylvester matrix equations as special cases. A gradient-based iterative algorithm is proposed to approximate the exact solution. A necessary and sufficient condition guaranteeing the convergence of the algorithm is presented. A sufficient condition that is easy to compute is also given. The optimal convergence factor, for which the convergence rate of the algorithm is maximized, is established. The proposed approach not only gives a complete understanding of gradient-based iterative algorithms for solving linear matrix equations, but can also serve as a bridge between linear system theory and numerical computing. A numerical example shows the effectiveness of the proposed approach.
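
As a rough illustration of the kind of iteration this abstract describes, the sketch below applies plain gradient descent on the squared Frobenius norm of the residual to the Sylvester equation AX + XB = C. It is not the paper's algorithm: the step size `mu` is a hand-picked assumption that happens to be small enough for this 2x2 example, not the optimal convergence factor the paper derives, and the helper names are hypothetical.

```python
def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def transpose(P):
    return [list(row) for row in zip(*P)]


def combine(P, Q, alpha=1.0):
    """Return P + alpha * Q elementwise."""
    return [[p + alpha * q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]


def solve_sylvester_gradient(A, B, C, mu, iterations):
    """Iterate X <- X + mu * (A^T R + R B^T) with R = C - A X - X B."""
    X = [[0.0] * len(B) for _ in range(len(A))]
    At, Bt = transpose(A), transpose(B)
    for _ in range(iterations):
        R = combine(combine(C, matmul(A, X), -1.0), matmul(X, B), -1.0)
        X = combine(X, combine(matmul(At, R), matmul(R, Bt)), mu)
    return X


# Toy data (assumed): build C from a known solution, then recover it.
A = [[2.0, 1.0], [1.0, 2.0]]
B = [[1.0, 0.0], [0.0, 2.0]]
X_true = [[1.0, 2.0], [3.0, 4.0]]
C = combine(matmul(A, X_true), matmul(X_true, B))
X_hat = solve_sylvester_gradient(A, B, C, mu=0.05, iterations=200)
```

For these symmetric matrices the map X -> AX + XB has eigenvalues 2, 3, 4 and 5, so the error contracts by a factor of at most 0.8 per iteration with `mu = 0.05`. A larger `mu` (0.08 or more here) would make the iteration diverge, which is why a convergence condition on the step size matters.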

15.
Consider the problem of developing a controller for general (nonlinear and stochastic) systems where the equations governing the system are unknown. Using discrete-time measurements, this paper presents an approach for estimating a controller without building or assuming a model for the system. Such an approach has potential advantages in accommodating complex systems with possibly time-varying dynamics. The controller is constructed through use of a function approximator, such as a neural network or polynomial. This paper considers the use of the simultaneous perturbation stochastic approximation algorithm, which requires only system measurements. A convergence result for stochastic approximation algorithms with time-varying objective functions and feedback is established. It is shown that this algorithm can greatly enhance efficiency over more standard stochastic approximation algorithms based on finite-difference gradient approximations.
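
The simultaneous perturbation idea can be sketched as follows. This is a minimal illustration, not the controller-design procedure of the paper: `noisy_loss` is a hypothetical measurement function standing in for the closed-loop system, and the gain constants `a`, `c`, `alpha` and `gamma` are assumed values of the kind commonly suggested for SPSA.

```python
import random


def spsa_minimize(loss, theta, iterations, a=0.1, c=0.1,
                  alpha=0.602, gamma=0.101, seed=0):
    """Minimize `loss` using only two noisy evaluations per iteration."""
    rng = random.Random(seed)
    theta = list(theta)
    for k in range(iterations):
        a_k = a / (k + 1) ** alpha   # step-size gain
        c_k = c / (k + 1) ** gamma   # perturbation size
        # Perturb every coordinate at once with random signs.
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        plus = [t + c_k * d for t, d in zip(theta, delta)]
        minus = [t - c_k * d for t, d in zip(theta, delta)]
        diff = loss(plus) - loss(minus)
        # The same difference is reused for every coordinate.
        theta = [t - a_k * diff / (2.0 * c_k * d)
                 for t, d in zip(theta, delta)]
    return theta


noise_rng = random.Random(1)


def noisy_loss(theta):
    """Hypothetical measurement: quadratic loss plus sensor noise."""
    return ((theta[0] - 1.0) ** 2 + (theta[1] + 2.0) ** 2
            + noise_rng.gauss(0.0, 0.01))


theta_hat = spsa_minimize(noisy_loss, [0.0, 0.0], iterations=2000)
```

A finite-difference scheme would need two loss measurements per parameter per iteration, while this sketch needs two in total regardless of the dimension of `theta`, which is the efficiency gain the abstract points to.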

16.
This paper deals with a new approach based on Q-learning for solving the problem of mobile robot path planning in complex unknown static environments. As a computational approach to learning through interaction with the environment, reinforcement learning algorithms have been widely used for intelligent robot control, especially in the field of autonomous mobile robots. However, the learning process is slow and cumbersome, and practical applications require rapid convergence. To address the slow convergence and long learning time of Q-learning-based mobile robot path planning, a state-chain sequential feedback Q-learning algorithm is proposed for quickly searching for the optimal path of mobile robots in complex unknown static environments. The state chain is built during the search. After one action is chosen and the reward is received, the Q-values of the state-action pairs on the previously built state chain are sequentially updated with one-step Q-learning. As more Q-values are updated after each action, the number of actual steps needed for convergence decreases and thus the learning time decreases, where a step is a state transition. Extensive simulations validate the efficiency of the proposed approach for mobile robot path planning in complex environments. The results show that the new approach has a high convergence speed and that the robot can find the collision-free optimal path in complex unknown static environments in much less time than the one-step Q-learning algorithm and the Q(λ)-learning algorithm.
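
One possible reading of the state-chain update is sketched below on a toy corridor. Everything specific here is an assumption for illustration: the six-state corridor, the reward of 1 at the goal, the learning rate of 1.0 (reasonable only because this toy environment is deterministic), and the choice to sweep the chain from the newest transition back to the oldest. The paper's environments and exact update order may differ.

```python
import random

ACTIONS = (-1, 1)   # move left or right along a corridor
GOAL = 5            # states are 0..5, the episode ends at state 5


def step(state, action):
    nxt = min(max(state + action, 0), GOAL)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL


def train(episodes=20, alpha=1.0, gamma=0.9, epsilon=0.2,
          max_steps=10_000, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(GOAL + 1) for a in ACTIONS}

    def update(s, a, r, s2):
        # Ordinary one-step Q-learning backup.
        target = r + gamma * max(q[(s2, b)] for b in ACTIONS)
        q[(s, a)] += alpha * (target - q[(s, a)])

    for _ in range(episodes):
        state, chain = 0, []
        for _ in range(max_steps):
            best = max(q[(state, a)] for a in ACTIONS)
            greedy = [a for a in ACTIONS if q[(state, a)] == best]
            explore = rng.random() < epsilon
            action = rng.choice(ACTIONS) if explore else rng.choice(greedy)
            nxt, reward, done = step(state, action)
            chain.append((state, action, reward, nxt))
            # Feed the new information back along the whole chain,
            # newest transition first.
            for s, a, r, s2 in reversed(chain):
                update(s, a, r, s2)
            state = nxt
            if done:
                break
    return q


q_table = train()
```

With plain one-step Q-learning the goal reward would move back only one state per episode. Here the backward sweep carries it along every transition visited in the episode as soon as the goal is reached, at the cost of more updates per step.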

17.
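Image-based visual servoing can be used to control the motion of a robot manipulator effectively. However, as many researchers have pointed out, when the initial and desired positions are far apart, this control strategy suffers from convergence and stability problems because of its local nature. By defining adequate image feature trajectories in the image plane and tracking them, we can take full advantage of the local convergence and stability inherent in image-based visual servoing, and thus avoid the problems that arise when the initial and desired positions are far apart. Image-space path planning has therefore become an active research topic in robotics in recent years. However, almost all existing results have been proposed for eye-in-hand vision systems. This paper proposes an uncalibrated visual path-planning algorithm for scene-camera vision systems. The algorithm computes the image feature trajectories directly in projective space, which guarantees that they are consistent with a rigid-body motion. By decomposing the projective representations of the rotation and translation into canonical forms, their paths in projective space can be interpolated easily. The image feature trajectories in the image plane are then generated from the projective path. In this way, the algorithm requires no knowledge of the feature-point structure or the camera intrinsic parameters. To verify the feasibility and performance of the proposed algorithm, simulation results based on a PUMA560 manipulator are presented.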

18.
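A global path-planning method based on particle filtering is proposed, in which paths are represented by multi-segment Ferguson spline curves so that the resulting path is smooth and first-order continuous. The optimal path is treated as the true state and all other paths as noise-corrupted states, so that the search for the optimal path becomes a filtering process for the true state in the state space. A particle filter, guided by a path evaluation function, filters the state space to obtain the optimal path. Simulation results show that the method is simple to implement, converges quickly, and yields smooth paths that a robot can follow directly.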
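
The path representation named in this abstract can be illustrated with a short sketch. It covers only the spline part: a Ferguson segment is a cubic Hermite curve defined by two end points and two end tangents, and sharing the tangent at each interior knot gives first-order continuity. The particle filter and the path evaluation function are not reproduced, and the knots and tangents below are made-up values.

```python
def ferguson_point(p0, p1, t0, t1, u):
    """Evaluate one Ferguson (cubic Hermite) segment at u in [0, 1]."""
    h00 = 2 * u**3 - 3 * u**2 + 1    # weight of the start point
    h01 = -2 * u**3 + 3 * u**2       # weight of the end point
    h10 = u**3 - 2 * u**2 + u        # weight of the start tangent
    h11 = u**3 - u**2                # weight of the end tangent
    return tuple(h00 * a + h01 * b + h10 * c + h11 * d
                 for a, b, c, d in zip(p0, p1, t0, t1))


def sample_path(knots, tangents, samples_per_segment=10):
    """Sample a multi-segment path; adjacent segments share knot tangents."""
    path = []
    for i in range(len(knots) - 1):
        for k in range(samples_per_segment):
            u = k / samples_per_segment
            path.append(ferguson_point(knots[i], knots[i + 1],
                                       tangents[i], tangents[i + 1], u))
    path.append(tuple(float(c) for c in knots[-1]))
    return path


# Hypothetical two-segment path from (0, 0) to (2, 0) via (1, 1).
knots = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
tangents = [(1.0, 0.0), (1.0, 0.0), (1.0, 0.0)]
path = sample_path(knots, tangents)
```

In a planner of this kind, a candidate path would then be the vector of interior knots and tangents, so that perturbing that vector always produces another smooth path.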

19.
In realistic mobile ad hoc network scenarios, the hosts usually travel to pre-specified destinations and often exhibit non-random motion behaviors. In such mobility patterns, the future motion behavior of a mobile host is correlated with its past and current mobility characteristics. Memoryless mobility models are therefore not capable of realistically emulating such mobility behavior. In this paper, an adaptive learning automata-based mobility prediction method is proposed in which the prediction is based on the Gauss-Markov random process and exploits the correlation of the mobility parameters over time. Using a continuous-valued reinforcement scheme, the proposed algorithm learns how to predict future mobility behaviors relying only on the mobility history. It therefore requires no a priori knowledge of the distribution parameters of the mobility characteristics. Furthermore, since in realistic mobile ad hoc networks the mobile hosts move according to a wide variety of mobility models, the proposed algorithm can be tuned to duplicate a wide spectrum of mobility patterns with various degrees of randomness. Since the proposed method predicts the basic mobility characteristics of the host (i.e., speed, direction and degree of randomness), it can also be used to estimate various ad hoc network parameters such as link availability time, path reliability and route duration. The convergence properties of the proposed algorithm are also studied, and a strong convergence theorem is presented to show the convergence of the algorithm to the actual characteristics of the mobility model. The simulation results conform to the theoretically expected convergence results and show that the proposed algorithm precisely estimates the motion behaviors.
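
The Gauss-Markov process that this abstract builds on can be sketched as a one-line recursion. The sketch shows only the mobility model in its usual textbook form, not the learning-automata predictor; the parameter names and the example values are assumptions. The memory parameter `alpha` plays the role of the degree of randomness: 0 gives memoryless motion and 1 gives fully deterministic motion.

```python
import math
import random


def gauss_markov_trace(alpha, mean, sigma, initial, steps, seed=0):
    """Generate a speed (or direction) sequence from a Gauss-Markov model.

    value_n = alpha * value_{n-1} + (1 - alpha) * mean
              + sqrt(1 - alpha^2) * x_{n-1},  x ~ N(0, sigma^2)
    """
    rng = random.Random(seed)
    value, trace = initial, [initial]
    for _ in range(steps):
        noise = rng.gauss(0.0, sigma)
        value = (alpha * value + (1.0 - alpha) * mean
                 + math.sqrt(1.0 - alpha**2) * noise)
        trace.append(value)
    return trace


# Assumed example: speeds in m/s with strong correlation between steps.
speeds = gauss_markov_trace(alpha=0.9, mean=10.0, sigma=2.0,
                            initial=10.0, steps=100)
```

A predictor that knew `alpha` and `mean` could forecast the next value as `alpha * value + (1 - alpha) * mean`. The method in the abstract is described as learning this behavior from the mobility history alone, without being given those parameters.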

20.
We consider a closed Jackson-like queueing network with arbitrary service time distributions and derive an unbiased second derivative estimator of the throughput over N customers served at some node with respect to a parameter of the service distribution at that node. Our approach is based on observing a single sample path of this system and evaluating all second-order effects on interdeparture times that result from the parameter perturbation. We then define an estimator as a conditional expectation over appropriate observable quantities, as in Smoothed Perturbation Analysis (SPA). This process recovers the first derivative estimator along the way (which can also be derived using other techniques), and gives new insights into higher-order event order change phenomena and into the type of sample path information we need to condition on for higher-order derivative estimation. Despite the complexity of the analysis, the final algorithm we obtain is relatively simple. Our estimators can be used in conjunction with other techniques to obtain rational approximations of the entire throughput response surface as a function of system parameters.
