Similar Articles
19 similar articles found (search time: 250 ms)
1.
宋春跃, WANG Hui, 李平. Acta Automatica Sinica, 2008, 34(8): 1028-1032
For the optimal control of linear hybrid switching systems with diffusion terms, a Monte Carlo statistical prediction method is proposed to reduce the computational complexity of the optimization. First, the continuous-time optimal control problem is converted into a discrete-time Markov decision process by numerical solution techniques; then, on a number of finite state subspaces, a reflecting-boundary technique is used to compute the optimal control policy of each subspace; finally, exploiting the structural properties of these optimal policies, statistical prediction is used to infer the optimal control policy over the entire state space. The method effectively reduces the computational complexity of optimal control for linear hybrid switching systems with large state spaces and multidimensional variables, and the simulation results at the end of the paper verify its effectiveness.
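A minimal sketch of the first step described above — converting a continuous-time diffusion control problem into a discrete-time MDP (here via the Markov-chain approximation on a grid with reflecting boundaries) and solving it by value iteration. The one-dimensional dynamics dx = u dt + σ dW, the grids, the quadratic cost, and the discount factor are illustrative assumptions, not the authors' system:

```python
import numpy as np

# Markov-chain approximation of the 1-D controlled diffusion
# dx = u*dt + sigma*dW on a grid with reflecting boundaries,
# solved as a discounted discrete-time MDP by value iteration.
sigma, dt, gamma = 0.5, 0.002, 0.99
xs = np.linspace(-2.0, 2.0, 81)               # state grid
us = np.linspace(-1.0, 1.0, 21)               # control grid
h = xs[1] - xs[0]                             # dt chosen small so p_stay >= 0

V = np.zeros(len(xs))
for _ in range(500):                          # value iteration sweeps
    V_new = np.empty_like(V)
    for i, x in enumerate(xs):
        best = np.inf
        for u in us:
            p_dif = sigma**2 * dt / (2 * h**2)     # diffusion probability
            p_up = p_dif + max(u, 0.0) * dt / h    # upwinded drift terms
            p_dn = p_dif + max(-u, 0.0) * dt / h
            p_stay = 1.0 - p_up - p_dn
            iu, idn = min(i + 1, len(xs) - 1), max(i - 1, 0)  # reflect at edges
            q = (x**2 + u**2) * dt + gamma * (
                p_up * V[iu] + p_dn * V[idn] + p_stay * V[i])
            best = min(best, q)
        V_new[i] = best
    if np.max(np.abs(V_new - V)) < 1e-9:
        break
    V = V_new
```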

2.
周雷, 孔凤, 唐昊, 张建军. Control Theory & Applications, 2011, 28(11): 1665-1670
This paper studies optimal control of the look-ahead distance in a single-station conveyor-serviced production station (CSPS) system in order to improve the system's working efficiency. The CSPS optimal control problem is modeled as a semi-Markov decision process. Since conventional Q-learning cannot directly handle the continuous look-ahead distance of the CSPS system, Q-value function approximation by a cerebellar model articulation controller (CMAC) network is combined with online learning techniques, yielding an online Q-learning algorithm and a model-free online policy iteration algorithm. Simulation results show that the proposed algorithms improve both learning speed and optimization accuracy.
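A minimal sketch of CMAC-style (tile-coding) Q-value approximation with an online Q-learning update over a continuous action, as the abstract describes; the discrete state set, tiling sizes, learning rate, and reward interface are placeholder assumptions rather than the CSPS model:

```python
import numpy as np

# CMAC (tile-coding) approximation of Q(s, a) for a continuous
# look-ahead action a in [0, 1], with an online Q-learning update.
n_tilings, n_tiles, n_states = 8, 10, 5
w = np.zeros((n_states, n_tilings * n_tiles))   # one weight vector per state
actions = np.linspace(0.0, 1.0, 21)             # discretized action search
alpha, gamma = 0.1 / n_tilings, 0.95

def features(a):
    """Active tile indices for action a under n_tilings offset tilings."""
    idx = []
    for t in range(n_tilings):
        offset = t / (n_tilings * n_tiles)
        tile = min(int((a + offset) * n_tiles), n_tiles - 1)
        idx.append(t * n_tiles + tile)
    return idx

def q(s, a):
    return w[s, features(a)].sum()

def update(s, a, r, s_next):
    """One online Q-learning step on the CMAC weights."""
    target = r + gamma * max(q(s_next, b) for b in actions)
    td = target - q(s, a)
    w[s, features(a)] += alpha * td
```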

3.
刘重阳, 韩美佳. Control and Decision, 2020, 35(10): 2407-2414
Considering the time-delay phenomenon in the fed-batch fermentation of 1,3-propanediol (1,3-PD), a nonlinear delay differential equation is proposed to describe the process. Taking the per-unit-time yield of 1,3-PD at the terminal time as the performance index, and the feed rates of glycerol and alkali together with the terminal time of the fermentation as the control vector, a time-delay optimal control model with control and state constraints is established. To solve this optimal control problem, it is first transformed, via a time-scaling transformation, into an equivalent optimal control problem with a fixed terminal time; the equivalent problem is then approximated by a sequence of finite-dimensional optimization problems using the control parameterization method; finally, an improved particle swarm optimization (PSO) method is constructed to solve the resulting approximate problems. Numerical results show that the per-unit-time yield of 1,3-PD at the terminal time is about 58% higher than previously reported results.
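A sketch of control parameterization plus PSO under stated assumptions: the control is piecewise constant on n_seg subintervals and a plain global-best PSO searches the parameter vector. The scalar dynamics, bounds, and objective below are toy stand-ins for the nonlinear delay fermentation model:

```python
import numpy as np

# Control parameterization: u is piecewise constant on n_seg segments;
# the objective simulates a toy system x' = -x + u_k and maximizes x(T).
n_seg, T = 10, 1.0

def objective(u):                        # u: (n_seg,) parameter vector
    x, dt = 1.0, T / n_seg
    for uk in u:
        for _ in range(20):              # Euler steps per segment
            x += (dt / 20) * (-x + uk)
    return -x                            # minimize => maximize terminal state

# Minimal global-best PSO over the parameterized controls in [0, 2].
rng = np.random.default_rng(0)
n_p, iters, w_in, c1, c2 = 30, 200, 0.7, 1.5, 1.5
pos = rng.uniform(0, 2, (n_p, n_seg)); vel = np.zeros_like(pos)
pbest, pval = pos.copy(), np.array([objective(p) for p in pos])
gbest = pbest[pval.argmin()]
for _ in range(iters):
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    vel = w_in * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, 0, 2)
    val = np.array([objective(p) for p in pos])
    improved = val < pval
    pbest[improved], pval[improved] = pos[improved], val[improved]
    gbest = pbest[pval.argmin()]
```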

4.
For a class of continuous-time linear Markov jump systems, this paper proposes a new policy iteration algorithm for solving the nonzero-sum differential feedback Nash control problem. The Nash equilibrium of the two-level nonzero-sum differential game with linear dynamics and infinite-horizon quadratic costs is obtained by computing coupled numerical iterative solutions. At each policy level, a policy iteration algorithm computes the minimal infinite-horizon value function associated with each given set of feedback control policies. The Markov jump system is then decomposed into N parallel subsystems by subsystem decomposition, and the algorithm is applied to the jump system. The proposed policy iteration algorithm readily solves the coupled algebraic Riccati equations associated with the nonzero-sum differential policies and remains effective for high-dimensional systems. A simulation example demonstrates the effectiveness and feasibility of the proposed design method.
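The building block behind such coupled-Riccati policy iteration is the Kleinman iteration for a single continuous-time LQR problem: policy evaluation solves a Lyapunov equation and policy improvement updates the gain. A sketch on an assumed two-state system (the Markov jump coupling terms and the second player are omitted here):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Kleinman-style policy iteration for one LQR problem: evaluate a gain K
# by solving a Lyapunov equation, then improve it; converges to the
# stabilizing solution of the algebraic Riccati equation.
A = np.array([[0.0, 1.0], [-1.0, -0.5]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)

K = np.array([[1.0, 1.0]])                 # initial stabilizing gain (assumed)
for _ in range(20):
    Ak = A - B @ K
    # policy evaluation: solve Ak' P + P Ak + Q + K' R K = 0
    P = solve_continuous_lyapunov(Ak.T, -(Q + K.T @ R @ K))
    K_new = np.linalg.solve(R, B.T @ P)    # policy improvement
    if np.max(np.abs(K_new - K)) < 1e-10:
        break
    K = K_new
```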

5.
An optimal control method for wastewater treatment processes based on genetic algorithms
This paper analyzes and studies the optimal control problem of minimizing the operating cost of wastewater treatment subject to a limit on the total discharge of organic matter, gives a mathematical model for multivariable optimal control of the wastewater treatment process, and proposes a new computational method that uses a genetic algorithm to solve the optimal control problem. The design and implementation of the binary encoding and of the fitness function evaluation are described. The method avoids the difficulty of guessing initial values for an iterative scheme and improves computational efficiency. Numerical simulation shows that genetic-algorithm-based optimization is feasible and effective for the wastewater treatment process.
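A minimal binary-encoded genetic algorithm of the kind described — roulette-wheel selection, one-point crossover, bit-flip mutation; the decoding range and the fitness function are placeholders, not the wastewater treatment cost model:

```python
import numpy as np

# Minimal binary-encoded GA: the fitness function is a stand-in
# (maximize a negated quadratic cost), not the treatment-plant model.
rng = np.random.default_rng(1)
n_bits, pop_size, gens, pc, pm = 16, 40, 100, 0.8, 0.01

def decode(bits):                         # bit string -> real value in [0, 10]
    return int("".join(map(str, bits)), 2) / (2**n_bits - 1) * 10.0

def fitness(bits):
    u = decode(bits)
    return -(u - 3.7)**2                  # higher is better

pop = rng.integers(0, 2, (pop_size, n_bits))
for _ in range(gens):
    f = np.array([fitness(ind) for ind in pop])
    p = f - f.min() + 1e-9; p /= p.sum()  # roulette-wheel selection
    pop = pop[rng.choice(pop_size, pop_size, p=p)]
    for i in range(0, pop_size - 1, 2):   # one-point crossover
        if rng.random() < pc:
            cut = rng.integers(1, n_bits)
            tmp = pop[i, cut:].copy()
            pop[i, cut:] = pop[i + 1, cut:]
            pop[i + 1, cut:] = tmp
    mask = rng.random(pop.shape) < pm     # bit-flip mutation
    pop[mask] ^= 1
best = max(pop, key=fitness)
```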

6.
For the optimal control problem of a class of discrete-time nonlinear systems with control delay and saturating actuators, the control coupling term in the performance index is handled by reconstructing the performance index function and applying the corresponding system transformation; a suitable functional is then introduced to deal with control saturation. A new performance index function is given, and the optimal control is obtained by an iterative adaptive dynamic programming (ADP) algorithm. To implement the algorithm, neural networks are used as function approximators for solving the optimal control problem. Simulation results verify the effectiveness of the method.
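One standard way to handle control saturation in ADP, which this sketch assumes, is the nonquadratic functional W(u) = 2ū∫₀ᵘ tanh⁻¹(s/ū) ds together with bounded actions u = ū·tanh(·); a tabular value iteration on a toy scalar system stands in for the neural-network implementation:

```python
import numpy as np

# Value-iteration ADP with bounded control u = u_bar*tanh(.) and the
# nonquadratic saturation cost W (closed form of 2*u_bar*int atanh(s/u_bar) ds).
# Dynamics are a placeholder; a table over a state grid replaces the NN.
u_bar, gamma = 1.0, 0.95
xs = np.linspace(-3, 3, 121)
us = u_bar * np.tanh(np.linspace(-3, 3, 41))    # bounded action samples

def f(x, u):                                     # assumed dynamics
    return 0.9 * x + u

def W(u):                                        # saturation cost, closed form
    v = np.clip(u / u_bar, -0.999, 0.999)
    return 2 * u_bar * (u * np.arctanh(v) + 0.5 * u_bar * np.log(1 - v**2))

V = np.zeros_like(xs)
for _ in range(200):                             # value iteration sweeps
    V = np.array([min(x**2 + W(u) + gamma * np.interp(f(x, u), xs, V)
                      for u in us)               # interp clamps at grid edges
                  for x in xs])
```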

7.
This paper studies the optimal control of a demand-driven conveyor-serviced production station (CSPS) system operating under variable service rates; the main goal is to model the stochastic optimal control problem and provide a solution approach. Taking the remaining capacities of the buffer and of the finished-goods store as the joint state, and the station's look-ahead distance and part service rate as control variables, the optimal control problem is formulated as a semi-Markov decision process (SMDP) model. This model provides a theoretical basis for solving for the system's optimal control policy under the average or discounted criterion by policy iteration and related methods; in particular, optimization methods such as a Q-learning algorithm based on simulated annealing can be introduced to seek approximate solutions, overcoming the curse of dimensionality and the modeling difficulties encountered in exact solution. Simulation results demonstrate the effectiveness of the established mathematical model and the proposed optimization methods.
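A sketch of Q-learning with simulated-annealing-style (Boltzmann) exploration, in which a decreasing temperature schedule gradually shifts from exploration to exploitation; the state/action sets and the transition and reward kernel are toy placeholders for the CSPS SMDP:

```python
import numpy as np

# Q-learning with annealed Boltzmann exploration on a placeholder chain.
rng = np.random.default_rng(2)
n_s, n_a, alpha, gamma = 6, 4, 0.1, 0.9
Q = np.zeros((n_s, n_a))

def step(s, a):                         # assumed transition/reward kernel
    s2 = (s + a + rng.integers(0, 2)) % n_s
    return s2, -abs(s2 - n_s // 2)      # reward peaks at the middle state

s, T0 = 0, 2.0
for k in range(1, 20001):
    T = T0 / np.log(1 + k)              # annealing temperature schedule
    p = np.exp((Q[s] - Q[s].max()) / T); p /= p.sum()
    a = rng.choice(n_a, p=p)            # Boltzmann action selection
    s2, r = step(s, a)
    Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
    s = s2
```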

8.
A simulation optimization algorithm for CTMDPs based on randomized stationary policies
Based on Markov performance potential theory and the neuro-dynamic programming (NDP) method, this paper studies simulation-based optimization of a class of continuous-time Markov decision processes (CTMDPs) under randomized stationary policies. The proposed algorithm first transforms the continuous-time process into its uniformized Markov chain, and then estimates the gradient of the average-cost performance index with respect to the policy parameters from a single sample path, in order to search for a suboptimal policy. The method is well suited to performance optimization of systems with large state spaces. A numerical example of a controlled Markov process is given.
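A sketch of estimating the policy gradient of the average cost from a single sample path of a (uniformized) Markov chain under a softmax randomized stationary policy, using likelihood-ratio (score-function) terms; the chain, costs, and trace discount are assumptions for illustration and may differ from the paper's estimator:

```python
import numpy as np

# Single-sample-path gradient estimate of the average cost w.r.t. the
# parameters of a softmax policy, via an eligibility trace of scores.
rng = np.random.default_rng(3)
n_s, n_a = 4, 2
theta = np.zeros((n_s, n_a))             # policy parameters

def policy(s):
    p = np.exp(theta[s] - theta[s].max()); return p / p.sum()

def step(s, a):                          # assumed uniformized transitions
    return (s + 1) % n_s if a == 0 else rng.integers(0, n_s)

cost = lambda s, a: float(s) + 0.5 * a

s, N = 0, 50000
grad, avg_c = np.zeros_like(theta), 0.0
z = np.zeros_like(theta)                 # eligibility trace of score function
for t in range(1, N + 1):
    p = policy(s); a = rng.choice(n_a, p=p)
    score = -p; score[a] += 1.0          # d log pi(a|s) / d theta[s]
    z *= 0.99                            # discounted trace (bias/variance knob)
    z[s] += score
    c = cost(s, a)
    avg_c += (c - avg_c) / t             # running average cost
    grad += ((c - avg_c) * z - grad) / t # running average gradient estimate
    s = step(s, a)
```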

9.
Performance-potential-based average-cost optimal policies for Markov control processes
This paper studies the average-cost optimal control problem for a class of discrete-time Markov control processes. Using the basic properties of Markov performance potentials, and under quite general assumptions, the optimality equation of the infinite-horizon average-cost model on compact action sets and an existence theorem for its solutions are derived directly. An iterative algorithm for computing optimal stationary control policies is proposed, and its convergence is discussed. Finally, an example is analyzed to illustrate the application of the algorithm.
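A sketch of average-cost policy iteration via performance potentials on a small assumed MDP: policy evaluation solves the Poisson equation (I − P_π)g + η·1 = c_π with a normalization pinning down the potentials g, and policy improvement minimizes c(s,a) + P(s,a)·g:

```python
import numpy as np

# Average-cost policy iteration on an assumed 3-state, 2-action MDP:
# P[a] are transition matrices, c[a] per-state costs.
n_s = 3
P = [np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.2, 0.7]]),
     np.array([[0.1, 0.8, 0.1], [0.3, 0.3, 0.4], [0.5, 0.4, 0.1]])]
c = [np.array([2.0, 1.0, 3.0]), np.array([1.5, 2.5, 0.5])]

pi = np.zeros(n_s, dtype=int)            # initial policy
for _ in range(20):
    Ppi = np.array([P[pi[s]][s] for s in range(n_s)])
    cpi = np.array([c[pi[s]][s] for s in range(n_s)])
    # evaluation: solve (I - Ppi) g + eta*1 = cpi with g[0] = 0
    A = np.zeros((n_s + 1, n_s + 1))
    A[:n_s, :n_s] = np.eye(n_s) - Ppi; A[:n_s, n_s] = 1.0; A[n_s, 0] = 1.0
    sol = np.linalg.solve(A, np.append(cpi, 0.0))
    g, eta = sol[:n_s], sol[n_s]         # potentials and average cost
    # improvement: minimize c(s,a) + P(s,a) @ g over actions
    pi_new = np.array([min(range(2), key=lambda a: c[a][s] + P[a][s] @ g)
                       for s in range(n_s)])
    if np.array_equal(pi_new, pi):
        break
    pi = pi_new
```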

10.
This paper proposes an energy-optimal control method for multimedia server cluster systems based on a Markov switching state-space control model. By building a stochastic control model of the multimedia server cluster, the energy-optimal control problem is formulated as a constrained optimization problem. Combining the Lagrange multiplier method with performance potential theory, an online policy iteration algorithm is proposed. The algorithm searches for the optimal control policy online along sample paths, without requiring accurate system parameter information. Simulation experiments demonstrate the effectiveness of the algorithm.

11.
The principle of optimality of dynamic programming leads to the derivation of a partial differential equation (PDE) for solving optimal control problems, namely the Hamilton-Jacobi-Bellman (HJB) equation. In general, this equation cannot be solved analytically; thus many computational strategies have been developed for optimal control problems. Many problems in financial mathematics involve the solution of stochastic optimal control (SOC) problems. In this work, the variational iteration method (VIM) is applied to solving SOC problems. In fact, solutions for the value function and the corresponding optimal strategies are obtained numerically. We solve a stochastic linear regulator problem to investigate the applicability and simplicity of the presented method and prove its convergence. In particular, for Merton's portfolio selection model as a problem of portfolio optimization, the proposed numerical method is applied for the first time and its usefulness is demonstrated. For the nonlinear case, we investigate its convergence using Banach's fixed point theorem. The numerical results confirm the simplicity and efficiency of our method.

12.
A method is presented for solving the infinite time Hamilton-Jacobi-Bellman (HJB) equation for certain state-constrained stochastic problems. The HJB equation is reformulated as an eigenvalue problem, such that the principal eigenvalue corresponds to the expected cost per unit time, and the corresponding eigenfunction gives the value function (up to an additive constant) for the optimal control policy. The eigenvalue problem is linear and hence there are fast numerical methods available for finding the solution.
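One well-known instance of such a linear eigenvalue reformulation is the linearly solvable (Todorov-style) setting, which the following sketch assumes: with desirability z = exp(−V), the average-cost HJB becomes the linear eigenproblem diag(e^{−q})·P·z = e^{−λ}·z, whose principal eigenvalue yields the expected cost per unit time λ and whose eigenvector gives the value function up to an additive constant. The passive dynamics P and state cost q below are placeholders:

```python
import numpy as np

# Principal-eigenvalue solution of a linearly solvable average-cost HJB:
# G z = exp(-lam) z with G = diag(exp(-q)) @ P and z = exp(-V).
n = 50
q = np.linspace(0, 1, n)**2                       # assumed state cost
P = np.zeros((n, n))                              # random-walk passive dynamics
for i in range(n):
    P[i, max(i - 1, 0)] += 0.5
    P[i, min(i + 1, n - 1)] += 0.5

G = np.diag(np.exp(-q)) @ P
w, Vec = np.linalg.eig(G)
k = np.argmax(w.real)                             # Perron (principal) eigenvalue
z = np.abs(Vec[:, k].real)                        # positive eigenvector
lam = -np.log(w[k].real)                          # expected cost per unit time
V = -np.log(z); V -= V.min()                      # value function up to a constant
```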

13.
This paper considers mobile-to-base-station power control for lognormal fading channels in wireless communication systems within a centralized information stochastic optimal control framework. Under a bounded power rate of change constraint, the stochastic control problem and its associated Hamilton-Jacobi-Bellman (HJB) equation are analyzed by the viscosity solution method; then the degenerate HJB equation is perturbed to admit a classical solution and a suboptimal control law is designed based on the perturbed HJB equation. When a quadratic type cost is used without a bound constraint on the control, the value function is a classical solution to the degenerate HJB equation and the feedback control is affine in the system power. In addition, in this case we develop approximate, but highly scalable, solutions to the HJB equation in terms of a local polynomial expansion of the exact solution. When the channel parameters are not known a priori, one can obtain on-line estimates of the parameters and get adaptive versions of the control laws. In numerical experiments with both of the above cost functions, the following phenomenon is observed: whenever the users have different initial conditions, there is an initial convergence of the power levels to a common level and then subsequent approximately equal behavior which converges toward a stochastically varying optimum.

14.
Optimal portfolios with regime switching and Value-at-Risk constraint
We consider the optimal portfolio selection problem subject to a maximum Value-at-Risk (MVaR) constraint when the price dynamics of the risky asset are governed by a Markov-modulated geometric Brownian motion (GBM). Here, the market parameters including the market interest rate of a bank account, the appreciation rate and the volatility of the risky asset switch over time according to a continuous-time Markov chain, whose states are interpreted as the states of an economy. The MVaR is defined as the maximum value of the VaRs of the portfolio in a short time duration over different states of the chain. We formulate the problem as a constrained utility maximization problem over a finite time horizon. By utilizing the dynamic programming principle, we shall first derive a regime-switching Hamilton-Jacobi-Bellman (HJB) equation and then a system of coupled HJB equations. We shall employ an efficient numerical method to solve the system of coupled HJB equations for the optimal constrained portfolio. We shall provide numerical results for the sensitivity analysis of the optimal portfolio, the optimal consumption and the VaR level with respect to model parameters. These results are also used to investigate the effect of the switching regimes.

15.
Consumption and investment strategies in a market with abnormal fluctuations
This paper discusses consumption and investment strategies with borrowing allowed in a market with abnormal fluctuations, and illustrates an approach for applying stochastic optimal control theory to modern finance. First, a stochastic model of the uncertainty in the financial market is given; using Itô's formula, the stochastic differential equation of the wealth process associated with the consumption and investment strategies is obtained, and a stochastic control model of the optimal consumption and investment problem is established. According to stochastic optimal control theory, the Hamilton-Jacobi-Bellman (HJB) equation satisfied by the objective function is derived. By analyzing the HJB equation, piecewise expressions for the optimal consumption and investment strategies are obtained; the case of a HARA utility function is then discussed, yielding concrete consumption and investment strategies.
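For CRRA utility (a special case of HARA), this kind of HJB analysis reduces to Merton's classical closed form for the optimal risky-asset fraction, shown below with illustrative parameter values:

```python
# Merton's closed-form optimal risky fraction under CRRA utility:
# pi* = (mu - r) / (gamma * sigma^2). Parameter values are illustrative.
mu, r, sigma, gamma = 0.08, 0.03, 0.2, 2.0    # drift, rate, volatility, risk aversion
pi_star = (mu - r) / (gamma * sigma**2)        # optimal fraction in the risky asset
print(f"optimal risky fraction: {pi_star:.3f}")  # -> 0.625
```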

16.
It is well known that stochastic control systems can be viewed as Markov decision processes (MDPs) with continuous state spaces. In this paper, we propose to apply the policy iteration approach in MDPs to the optimal control problem of stochastic systems. We first provide an optimality equation based on performance potentials and develop a policy iteration procedure. Then we apply policy iteration to the jump linear quadratic problem and obtain the coupled Riccati equations for their optimal solutions. The approach is applicable to linear as well as nonlinear systems and can be implemented on-line on real-world systems without identifying all the system structure and parameters.

17.
In control systems theory, the Markov decision process (MDP) is a widely used optimization model involving selection of the optimal action in each state visited by a discrete-event system driven by Markov chains. The classical MDP model is suitable for an agent/decision-maker interested in maximizing expected revenues, but does not account for minimizing variability in the revenues. An MDP model in which the agent can maximize the revenues while simultaneously controlling the variance in the revenues is proposed. This work is rooted in machine learning/neural network concepts, where updating is based on system feedback and step sizes. First, a Bellman equation for the problem is proposed. Thereafter, convergent dynamic programming and reinforcement learning techniques for solving the MDP are provided along with encouraging numerical results on a small MDP and a preventive maintenance problem.
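A sketch of one way to realize such a variance-penalized MDP with reinforcement learning: track running estimates of the mean and second moment of the one-step reward per state-action pair and act greedily on E[r] − κ·Var[r]; the chain, rewards, and penalty weight are toy assumptions, and the paper's exact Bellman equation may differ:

```python
import numpy as np

# Variance-penalized Q-learning sketch: action 1 has higher mean reward
# but also higher variance, so the penalized policy may avoid it.
rng = np.random.default_rng(4)
n_s, n_a, alpha, gamma, kappa = 4, 2, 0.05, 0.9, 0.5
Q = np.zeros((n_s, n_a))                  # expected discounted reward
M = np.zeros((n_s, n_a))                  # running estimate of E[r^2]
R_bar = np.zeros((n_s, n_a))              # running mean of one-step reward

def step(s, a):
    r = (1.0 + a) + rng.normal(0, 1.0 + 2 * a)
    return (s + 1) % n_s, r

def greedy(s):
    score = Q[s] - kappa * (M[s] - R_bar[s]**2)  # mean - kappa * variance
    return int(np.argmax(score))

s = 0
for _ in range(50000):
    a = rng.integers(0, n_a) if rng.random() < 0.1 else greedy(s)
    s2, r = step(s, a)
    R_bar[s, a] += alpha * (r - R_bar[s, a])
    M[s, a] += alpha * (r * r - M[s, a])
    Q[s, a] += alpha * (r + gamma * Q[s2, greedy(s2)] - Q[s, a])
    s = s2
```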

18.
In this paper, we present an empirical study of iterative least squares minimization of the Hamilton-Jacobi-Bellman (HJB) residual with a neural network (NN) approximation of the value function. Although the nonlinearities in the optimal control problem and NN approximator preclude theoretical guarantees and raise concerns of numerical instabilities, we present two simple methods for promoting convergence, the effectiveness of which is presented in a series of experiments. The first method involves the gradual increase of the horizon time scale, with a corresponding gradual increase in value function complexity. The second method involves the assumption of stochastic dynamics which introduces a regularizing second derivative term to the HJB equation. A gradual reduction of this term provides further stabilization of the convergence. We demonstrate the solution of several problems, including the 4-D inverted-pendulum system with bounded control. Our approach requires no initial stabilizing policy or any restrictive assumptions on the plant or cost function, only knowledge of the plant dynamics. In the Appendix, we provide the equations for first- and second-order differential backpropagation.
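A sketch of both ideas on a one-dimensional LQR toy problem: least-squares minimization of the HJB residual over collocation states, with a quadratic model V(x) = p·x² standing in for the NN and an annealed second-derivative term playing the role of the assumed stochastic-dynamics regularizer. All constants are illustrative:

```python
import numpy as np

# Least-squares minimization of the HJB residual for dx = (a*x + b*u)dt,
# cost q*x^2 + r*u^2. V(x) = p*x^2 replaces the NN; the annealed sig2
# term regularizes via (sig2/2) * V''(x), then is reduced to zero.
a, b, q, r = 0.5, 1.0, 1.0, 1.0
xs = np.linspace(-2.0, 2.0, 41)               # collocation states

def residual(p, sig2):
    Vx, Vxx = 2 * p * xs, 2 * p               # dV/dx, d2V/dx2
    u = -b * Vx / (2 * r)                      # HJB-minimizing control
    return q * xs**2 + r * u**2 + Vx * (a * xs + b * u) + 0.5 * sig2 * Vxx

p = 2.0                                        # init near the stabilizing branch
for it in range(3000):
    sig2 = max(0.5 * (1.0 - it / 1500), 0.0)   # annealed regularizer
    base = np.sum(residual(p, sig2)**2)
    grad = (np.sum(residual(p + 1e-6, sig2)**2) - base) / 1e-6
    p -= 1e-4 * grad                           # gradient step on squared residual
# p approaches the stabilizing Riccati root (a + sqrt(a^2 + q*b^2/r)) * r / b^2
```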

19.
This correspondence deals with a class of ergodic control problems for systems described by Markov chains with strong and weak interactions. These systems are composed of a set of subchains that are weakly coupled. Using results already available in the literature one formulates a limit control problem the solution of which can be obtained via an associated nondifferentiable convex programming (NDCP) problem. The technique used to solve the NDCP problem is the Analytic Center Cutting Plane Method (ACCPM) which implements a dialogue between, on one hand, a master program computing the analytical center of a localization set containing the solution and, on the other hand, an oracle proposing cutting planes that reduce the size of the localization set at each main iteration. The interesting aspect of this implementation comes from two characteristics: (i) the oracle proposes cutting planes by solving reduced-size Markov Decision Problems (MDP) via a linear program (LP) or a policy iteration method; (ii) several cutting planes can be proposed simultaneously through a parallel implementation on processors. The correspondence concentrates on these two aspects and shows, on a large-scale MDP obtained from the numerical approximation “à la Kushner-Dupuis” of a singularly perturbed hybrid stochastic control problem, the important computational speed-up obtained.
