Similar Documents
20 similar documents found.
1.
A set of neural networks is employed to develop control policies that are better than fixed, theoretically optimal policies when applied to a combined physical inventory and distribution system in a nonstationary demand environment. Specifically, we show that model-based adaptive critic approximate dynamic programming techniques can be used with systems characterized by discrete-valued states and controls. The control policies embodied by the trained neural networks outperformed the best fixed policies (found by either linear programming or genetic algorithms) in a high-penalty-cost environment with time-varying demand.
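To make the adaptive critic idea concrete, the following is a minimal, hypothetical sketch of model-based approximate dynamic programming for a single-item inventory problem: a tabular critic is updated by temporal differences while the actor does a one-step lookahead through an assumed demand model. The tabular critic (in place of the paper's neural networks), the cost parameters, and the seasonal demand law are all assumptions made for illustration.

```python
# Hypothetical sketch of model-based adaptive critic ADP for a single-item
# inventory problem with nonstationary demand; a tabular critic stands in for
# the paper's neural networks, and all parameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
MAX_INV, MAX_ORDER, T = 20, 10, 52          # assumed capacity, order limit, season length
HOLD, PENALTY = 1.0, 10.0                   # assumed holding and shortage costs

def step(inv, order, demand):
    """One inventory transition: order arrives, demand is served, costs accrue."""
    inv_after = min(inv + order, MAX_INV)
    shortfall = max(demand - inv_after, 0)
    next_inv = max(inv_after - demand, 0)
    return next_inv, HOLD * next_inv + PENALTY * shortfall

V = np.zeros(MAX_INV + 1)                   # critic: value of each inventory level
alpha, gamma, eps = 0.1, 0.95, 0.1

for episode in range(500):
    inv = 0
    for t in range(T):
        d_hat = int(round(3 + 2 * np.sin(2 * np.pi * t / T)))   # forecast from the demand model
        # actor: one-step lookahead through the model using the current critic
        q = [step(inv, a, d_hat)[1] + gamma * V[step(inv, a, d_hat)[0]]
             for a in range(MAX_ORDER + 1)]
        a = int(np.argmin(q)) if rng.random() > eps else int(rng.integers(MAX_ORDER + 1))
        d = rng.poisson(3 + 2 * np.sin(2 * np.pi * t / T))       # realized nonstationary demand
        next_inv, cost = step(inv, a, d)
        # critic: TD(0) update toward the observed one-step cost-to-go
        V[inv] += alpha * (cost + gamma * V[next_inv] - V[inv])
        inv = next_inv
```

After training, ordering by the same one-step lookahead against `V` gives a state-feedback policy that adapts to the assumed seasonal demand.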

2.
This paper is concerned with the optimal control of linear discrete-time systems subject to unknown but bounded state disturbances and mixed polytopic constraints on the state and input. It is shown that the class of admissible affine state feedback control policies with knowledge of prior states is equivalent to the class of admissible feedback policies that are affine functions of the past disturbance sequence. This implies that a broad class of constrained finite horizon robust and optimal control problems, where the optimization is over affine state feedback policies, can be solved in a computationally efficient fashion using convex optimization methods. This equivalence result is used to design a robust receding horizon control (RHC) state feedback policy such that the closed-loop system is input-to-state stable (ISS) and the constraints are satisfied for all time and all allowable disturbance sequences. The cost to be minimized in the associated finite horizon optimal control problem is quadratic in the disturbance-free state and input sequences. The value of the receding horizon control law can be calculated at each sample instant using a single, tractable and convex quadratic program (QP) if the disturbance set is polytopic, or a tractable second-order cone program (SOCP) if the disturbance set is given by a 2-norm bound.
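The computational core of such a receding-horizon policy is a finite-horizon quadratic program solved at every sample instant. Below is a minimal nominal (disturbance-free) sketch of that QP using cvxpy; the double-integrator model, bounds, weights, and horizon are assumptions, and the paper's affine disturbance-feedback variables and robust constraint tightening are deliberately omitted.

```python
# Minimal receding-horizon QP sketch (nominal case only; the robust affine
# disturbance-feedback terms from the paper are omitted). All data are assumed.
import numpy as np
import cvxpy as cp

A = np.array([[1.0, 1.0], [0.0, 1.0]])    # assumed double-integrator dynamics
B = np.array([[0.5], [1.0]])
Q, R, N = np.eye(2), 0.1 * np.eye(1), 10  # assumed weights and horizon

def rhc_step(x0):
    x = cp.Variable((2, N + 1))
    u = cp.Variable((1, N))
    cost, cons = 0, [x[:, 0] == x0]
    for k in range(N):
        cost += cp.quad_form(x[:, k], Q) + cp.quad_form(u[:, k], R)
        cons += [x[:, k + 1] == A @ x[:, k] + B @ u[:, k],
                 cp.norm(u[:, k], "inf") <= 1.0,       # assumed input constraint
                 cp.norm(x[:, k + 1], "inf") <= 5.0]   # assumed state constraint
    cp.Problem(cp.Minimize(cost), cons).solve()
    return u[:, 0].value                   # apply only the first input, then re-solve

print(rhc_step(np.array([3.0, 0.0])))
```

In a closed loop, `rhc_step` would be called at every sample with the measured state, which is the receding-horizon pattern the paper makes robust.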

3.
This paper deals with the finite approximation of the first passage models for discrete-time Markov decision processes with varying discount factors. For a given control model \(\mathcal{M}\) with denumerable states and compact Borel action sets, and possibly unbounded reward functions, under reasonable conditions we prove that there exists a sequence of control models \(\mathcal{M}_{n}\) such that the first passage optimal rewards and policies of \(\mathcal{M}_{n}\) converge to those of \(\mathcal{M}\), respectively. Based on the convergence theorems, we propose a finite-state and finite-action truncation method for the given control model \(\mathcal{M}\), and show that the first passage optimal reward and policies of \(\mathcal{M}\) can be approximated by those of the solvable truncated finite control models. Finally, we give the corresponding value and policy iteration algorithms to solve the finite approximation models.
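As an illustration of the last step, the sketch below runs value iteration on an assumed, already-truncated finite model: the target state is made absorbing and reward-free, so the iteration approximates the expected total discounted reward up to first passage. A single constant discount factor is used for simplicity, whereas the paper allows varying discount factors and unbounded rewards.

```python
# Value iteration on an assumed truncated finite MDP with an absorbing target,
# as a stand-in for the solvable finite approximation models in the abstract.
import numpy as np

n_states, n_actions, target = 5, 2, 4
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
r = rng.uniform(0, 1, size=(n_states, n_actions))                  # rewards r(s, a)
P[target] = 0.0
P[target, :, target] = 1.0          # target state is absorbing ...
r[target] = 0.0                     # ... and earns nothing afterwards
beta = 0.9                          # constant discount factor (a simplification)

V = np.zeros(n_states)
for _ in range(500):
    Q = r + beta * (P @ V)          # Q[s, a] = r(s, a) + beta * sum_s' P[s, a, s'] V[s']
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new

policy = Q.argmax(axis=1)
print("values:", np.round(V, 3), "policy:", policy)
```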

4.
This paper examines a codesign problem in industrial networked control systems (NCS) whereby physical systems are controlled over wireless fading channels. The considered wireless channels are assumed to be stochastically dependent on the physical states of moving machines in the industrial working space. In this paper, the moving machines are modeled as Markov decision processes, whereas the characteristics of the correlated fading channels are modeled as a binary random process whose probability measure depends on both the physical states of the moving machines and the transmission power of the communication channels. Under such a state-dependent fading channel model, sufficient conditions to ensure the stochastic safety of the NCS are first derived. Using the derived safety conditions, a codesign problem is then formulated as a constrained joint optimization problem that seeks optimal control and transmission power policies which simultaneously minimize an infinite-horizon cost on both communication resources and control effort. This paper shows that such optimal policies can be obtained in a computationally efficient manner using convex programming methods. Simulation results of an autonomous forklift truck and a networked DC motor system are presented to illustrate the advantage and efficacy of the proposed codesign framework for industrial NCS.

5.
We consider a receding horizon approach as an approximate solution to two-person zero-sum Markov games with infinite-horizon discounted cost and average cost criteria. We first present error bounds from the optimal equilibrium value of the game when both players take "correlated" receding horizon policies that are based on exact or approximate solutions of receding finite horizon subgames. Motivated by the worst-case optimal control of queueing systems by Altman, we then analyze error bounds when the minimizer plays the (approximate) receding horizon control and the maximizer plays the worst-case policy. We finally discuss some methods, whose complexity is independent of the state-space size, to compute the value of the subgame approximately for the approximate receding horizon control, along with heuristic receding horizon policies for the minimizer.
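A receding-horizon policy for a zero-sum game repeatedly solves finite-horizon subgames, and the innermost computation is the value of a zero-sum matrix game. The sketch below solves one such stage game by linear programming for the minimizer's mixed strategy; the payoff matrix is assumed, and this is only the building block, not the paper's receding-horizon or approximate-value scheme.

```python
# Value and minimizer strategy of a zero-sum matrix game via linear programming.
# The payoff matrix is an illustrative assumption (row player maximizes).
import numpy as np
from scipy.optimize import linprog

A = np.array([[3.0, -1.0], [-2.0, 4.0]])
m, n = A.shape

# Minimizer chooses a mixed strategy y over columns to minimize max_i (A y)_i.
# Decision vector is [y_1 ... y_n, v]; minimize v subject to A y <= v * 1.
c = np.zeros(n + 1); c[-1] = 1.0
A_ub = np.hstack([A, -np.ones((m, 1))])
b_ub = np.zeros(m)
A_eq = np.hstack([np.ones((1, n)), np.zeros((1, 1))])   # probabilities sum to one
b_eq = np.array([1.0])
bounds = [(0, None)] * n + [(None, None)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
y, v = res.x[:n], res.x[-1]
print("minimizer strategy:", np.round(y, 3), "game value:", round(v, 3))
```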

6.
Industrial polymerization plants experience frequent changes of products, driven by end-use properties that must meet various market requirements. Efficient grade transition policies are essential to save time and materials. In this study, gas-phase catalytic polymerization in a fluidized bed reactor is modeled with a single-phase model, and dynamic optimization is implemented to determine optimal operating sequences for grade changes. Two optimization formulations, a single-stage and a multi-stage formulation, are introduced and compared. The superiority of the multi-stage formulation is demonstrated, owing to better control of each stage during the transition and a further reduction of off-grade time. Subsequently, an on-line optimal control framework is established by combining shrinking-horizon nonlinear model predictive control with an expanding-horizon weighted least-squares estimator for process states and unknown parameters. The results of a case study indicate that the designed framework is able to handle process uncertainty while reducing the transition time.

7.
This paper solves a finite-horizon partially observed risk-sensitive stochastic optimal control problem for discrete-time nonlinear systems and obtains small-noise and small-risk limits. The small-noise limit is interpreted as a deterministic partially observed dynamic game, and new insights into the optimal solution of such game problems are obtained. Both the risk-sensitive stochastic control problem and the deterministic dynamic game problem are solved using information states, dynamic programming, and associated separated policies. A certainty equivalence principle is also discussed. The authors' results have implications for the nonlinear robust stabilization problem. The small-risk limit is a standard partially observed risk-neutral stochastic optimal control problem.

8.
This paper derives methods for the calculation of optimal stabilization policies under the assumption that monetary and fiscal control are exercised by separate authorities who may have different objectives. Each authority minimizes its own quadratic cost functional subject to the constraint of a linear econometric model. Nash solution strategies are calculated for this discrete-time differential game, both in the context of open-loop and closed-loop behavior (in the closed-loop framework each authority can continually revise its policy in response to the evolving strategy of the other authority). The results are applied to a small econometric model and show how the degree of fiscal or monetary control depends on the particular conflict situation, and how conflicting policies are "suboptimal" in comparison with coordinated policies.

9.
Using the idea of predictive guidance, this paper studies the guidance problem of the yaw channel of a launch vehicle. The basic principles of perturbation predictive guidance are described, the basic computational formulas of perturbation predictive guidance are derived for the two cases of navigation computation and disturbance measurement, and simplified forms of predictive guidance together with the conditions for simplification are studied. In addition, the paper introduces the concept of a zero-control unbiased state for predictive guidance, formulates the optimal control problem of steering toward the zero-control unbiased state, and derives a closed-form solution of the optimal control. These concepts and results require further study and are offered for reference only.

10.
A method is presented for designing controllers for linear time-invariant systems whose states are not all available or accessible for measurement and where the structure of the controller is constrained to be a linear time-invariant combination of the measurable states of the system. Two types of structure constraints are considered: 1) each control channel is constrained to be a linear, time-invariant combination of one set of measurable states; 2) each control channel is constrained to be a linear, time-invariant combination of different sets of measurable states. The control system, subject to these constraints, is selected such that the resulting closed-loop system performs as "near" to some known optimal system as possible, i.e., it is suboptimal. The nearness of the optimal system to the suboptimal system is defined in two ways, and thus two types of suboptimal controllers are found.

11.
This article develops distributed optimal control policies via Q-learning for multi-agent systems (MASs) by solving dual games. According to game theory, first, the distributed consensus problem is formulated as a multi-player non-zero-sum game, where each agent is viewed as a player focusing only on its local performance and the whole MAS achieves Nash equilibrium. Second, for each agent, the anti-disturbance problem is formulated as a two-player zero-sum game, in which the control input and the external disturbance are a pair of opponents. Specifically, (1) an offline, data-driven, off-policy distributed tracking algorithm based on momentum policy gradient (MPG) is developed, which can effectively achieve consensus of MASs with a guaranteed bounded synchronization error. (2) An actor-critic-disturbance neural network is employed to implement the MPG algorithm and obtain the optimal policies. Finally, numerical and practical simulation results are presented to verify the effectiveness of the tracking policies developed via the MPG algorithm.

12.
We present a simulation-based algorithm called "Simulated Annealing Multiplicative Weights" (SAMW) for solving large finite-horizon stochastic dynamic programming problems. At each iteration of the algorithm, a probability distribution over candidate policies is updated by a simple multiplicative weight rule, and with proper annealing of a control parameter, the generated sequence of distributions converges to a distribution concentrated only on the best policies. The algorithm is "asymptotically efficient" in the sense that, for the goal of estimating the value of an optimal policy, a provably convergent finite-time upper bound for the sample mean is obtained.
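A minimal sketch of the multiplicative-weight idea: a probability distribution over a small set of candidate policies is reweighted by simulated returns with a decaying step size, so mass concentrates on the better policies. The reward model, policy set, and annealing schedule below are assumptions for illustration and do not reproduce the paper's algorithm or its convergence guarantees.

```python
# Multiplicative-weights search over candidate policies, in the spirit of SAMW.
# The simulated returns and the annealing schedule are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)
n_policies, T = 5, 200
true_values = np.array([0.2, 0.5, 0.9, 0.4, 0.7])   # unknown mean returns (assumed)

def simulate_return(i):
    # stand-in for one simulated episode under policy i, reward clipped to [0, 1]
    return float(np.clip(true_values[i] + 0.2 * rng.standard_normal(), 0.0, 1.0))

w = np.ones(n_policies) / n_policies                 # start from the uniform distribution
for t in range(1, T + 1):
    eta = np.sqrt(np.log(n_policies) / t)            # assumed annealed step size
    rewards = np.array([simulate_return(i) for i in range(n_policies)])
    w *= np.exp(eta * rewards)                       # multiplicative weight update
    w /= w.sum()                                     # renormalize to a distribution

print("final distribution over policies:", np.round(w, 3))
```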

13.
For a linear control system with a quadratic performance index, the optimal control takes the form of feedback of all the state variables. However, if some states are not fed back in the control system, it is impossible to obtain the optimal feedback control by using the usual mathematical optimization techniques such as dynamic programming or the maximum principle.

This paper presents the optimal control of output feedback systems for a quadratic performance index by using a new parameter optimization technique.

Since the optimal feedback gains depend on the initial states in the output feedback control system, two cases are considered here: (1) the initial states are known, and (2) the statistical properties of the initial states, such as the mean and covariance matrices, are known. Furthermore, the proposed method for optimal output feedback control is also applied to sampled-data systems.
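For case (1), where the initial state is known, the idea can be illustrated by simulating the closed loop under a static output-feedback gain and minimizing the resulting finite-horizon quadratic cost numerically. The plant, weights, horizon, and the use of a generic optimizer below are assumptions; the paper's own parameter optimization technique is not reproduced.

```python
# Numerically optimizing a static output-feedback gain for a quadratic cost
# with a known initial state. All system data are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize

A = np.array([[1.0, 0.1], [0.0, 0.95]])   # assumed discrete-time plant
B = np.array([[0.0], [0.1]])
C = np.array([[1.0, 0.0]])                # only the first state is measured
Q, R, N = np.eye(2), np.array([[0.1]]), 100
x_init = np.array([1.0, 0.0])             # known initial state (case (1))

def cost(k_flat):
    K = k_flat.reshape(1, 1)              # output feedback u = -K y with y = C x
    x, J = x_init.copy(), 0.0
    for _ in range(N):
        u = -K @ (C @ x)
        J += x @ Q @ x + u @ R @ u
        x = A @ x + B @ u
    return J

res = minimize(cost, x0=np.array([1.0]), method="Nelder-Mead")
print("optimized output-feedback gain:", res.x, "cost:", res.fun)
```

For case (2), the scalar cost above would be replaced by its expectation over the assumed initial-state distribution.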

14.
This paper considers the feasibility of using the optimal parameters of stationary inventory control policies to design inventory control rules in supply systems operating in a real market. The efficiencies of long-sighted and myopic inventory control policies are compared. Different approaches to designing the optimal stationary policies are investigated; a comparative evaluation of these approaches is given, and the specifics of their application are discussed. The optimal parameters of the stationary inventory control policies as functions of the market state, and in particular of the inflation rate, are estimated via simulation experiments.
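A minimal simulation-based sketch of the idea: estimate the long-run cost of a stationary (s, S) policy by Monte Carlo and pick the best parameters over a grid, with a crude inflation factor applied to per-period costs. The demand law, cost rates, and inflation adjustment are assumptions, not the paper's market model.

```python
# Grid search over stationary (s, S) inventory policies by simulation.
# Demand, costs, and the inflation adjustment are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(3)
HOLD, SHORT, ORDER_FIX, T = 1.0, 9.0, 5.0, 100
INFLATION = 0.01                                   # assumed per-period inflation rate

def avg_cost(s, S, n_runs=20):
    total = 0.0
    for _ in range(n_runs):
        inv, cost = S, 0.0
        for t in range(T):
            if inv <= s:                           # reorder up to S
                cost += ORDER_FIX * (1 + INFLATION) ** t
                inv = S
            inv -= rng.poisson(4)                  # assumed demand law
            cost += (HOLD * max(inv, 0) + SHORT * max(-inv, 0)) * (1 + INFLATION) ** t
        total += cost / T
    return total / n_runs

best = min(((s, S) for s in range(0, 8) for S in range(8, 24, 2)),
           key=lambda p: avg_cost(*p))
print("estimated best stationary (s, S) policy:", best)
```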

15.
In this paper, we study the optimisation problem of transmission power and delay in a multi-hop wireless network consisting of multiple nodes. The goal is to determine the optimal policy of transmission rates at the various buffer and channel states in order to minimise the power consumption and the queueing delay of the whole network. Under the assumptions of interference-free links and independently and identically distributed (i.i.d.) channel states, we formulate this problem using a semi-open Jackson network model for data transmission and a Markov model for channel-state transitions. We derive a difference equation for the system performance under any two different policies. The necessary and sufficient condition for an optimal policy is obtained. We also prove that the system performance is monotonic with respect to (w.r.t.) the transmission rate and that the optimal transmission rate is either maximal or minimal; that is, 'bang-bang' control is optimal. This optimality structure greatly reduces the problem's complexity. Furthermore, we develop an iterative algorithm to find the optimal solution. Finally, we conduct simulation experiments to demonstrate the effectiveness of our approach. We hope our work can offer some insights into solving this complicated optimisation problem.

16.
State estimation and safe controller synthesis for a general form of decentralized control architecture for discrete-event systems are investigated. For this architecture, controllable events are assigned to be either "conjunctive" or "disjunctive." A new state estimator that accounts for past local control actions when calculating the set of estimated system states is presented. The new state estimator is applied to a previous general decentralized control law. The new control method generates a controlled language at least as large as that generated by the original method if a safety condition is satisfied. An algorithm for generating locally maximal control policies for a given state estimate is also discussed. The algorithm allows a degree of "steering" of the controlled system through an event priority mechanism.

17.
Flow control is considered for M (⩾ 2) transmitting stations sending packets to a single receiver over a slotted, time-multiplexed link. The optimal allocation problem is generalized to the case of nonidentical holding costs at the M transmitters. Qualitative properties of optimal discounted and time-average policies that reduce the computational complexity of the M-dimensional optimal flow control algorithm are derived. For M = 2, a simple relationship between the optimal allocations for states x and x + e_i (i = 1, 2) is established that leads to significant computational savings in the optimal algorithm.

18.
This paper reports on applications of optimal control theory to the analysis of macroeconomic policies for Slovenia on its way into the Euro Area. For this purpose, the model SLOPOL4, a macroeconometric model of Slovenia, is used. Optimal policies are calculated using the OPTCON algorithm, an algorithm for determining (approximately) optimal solutions to deterministic and stochastic control problems. We determine optimal exchange-rate and fiscal policies for Slovenia as solutions to optimum control problems with a quadratic objective function and the model SLOPOL4 as a constraint. Several optimization experiments under different assumptions about the exchange-rate regime are carried out. The sensitivity of the results with respect to several assumptions is investigated; in particular, the reaction of the optimal paths to variations in the stochastic character of the model parameters is examined. If the stochastic nature of more parameters is taken into consideration, the resulting policies are closer to the deterministic solution than when only a few parameters are stochastic.

19.
Motivated by time-sensitive e-service applications, we consider the design of effective policies in a Markovian model for the dynamic control of both admission and routing of a single class of real-time transactions to multiple heterogeneous clusters of web servers, each having its own queue and server pool. Transactions come with response-time deadlines and stay until completion even if the deadlines are missed. Per-job rejection and deadline-miss penalties are incurred. Since computing an optimal policy is intractable, we aim to design near-optimal heuristic policies that are tractable for large-scale systems. Four policies are developed: the static optimal Bernoulli-splitting (BS) policy, and three index policies based, respectively, on individually optimal (IO) actions, one-step policy improvement (PI), and restless bandit (RB) indexation. A computational study demonstrates that PI is the best of these policies, being consistently near optimal. In the pure-routing case, both the PI and RB policies are nearly optimal.
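To illustrate the simplest of the four policies, the sketch below computes a static Bernoulli-splitting probability for two clusters treated as M/M/1 queues by minimizing a weighted mean delay. The arrival and service rates and the M/M/1 delay objective are assumptions; the deadlines, rejections, and index policies of the paper are not modeled here.

```python
# Static Bernoulli-splitting (BS) routing probability for two assumed M/M/1
# clusters, chosen to minimize the arrival-weighted mean sojourn time.
import numpy as np
from scipy.optimize import minimize_scalar

lam, mu1, mu2 = 1.5, 1.0, 1.2      # assumed arrival and service rates

def mean_delay(p):
    r1, r2 = p * lam, (1 - p) * lam
    if r1 >= mu1 or r2 >= mu2:
        return np.inf               # this split would overload a cluster
    return p / (mu1 - r1) + (1 - p) / (mu2 - r2)   # M/M/1 sojourn times, weighted

res = minimize_scalar(mean_delay, bounds=(0.0, 1.0), method="bounded")
print("splitting probability to cluster 1:", round(res.x, 3))
```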

20.
This note deals with the problems of fault diagnosis and fault-tolerant control for systems with delayed measurements and states. The main contribution consists of two aspects. First, by solving a Riccati equation and a Sylvester equation, an optimal fault-tolerant control law is designed for systems with delayed measurements and states. The existence and uniqueness of the optimal fault-tolerant control law are proved. Second, the problem that the optimal fault-tolerant control law is physically unrealizable is solved by proposing a novel fault diagnoser for systems with delayed measurements and states. Finally, a numerical example is given to demonstrate the feasibility and validity of the proposed schemes.
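The two computational ingredients named in the note, a discrete-time Riccati equation and a Sylvester equation, can be solved directly with SciPy, as sketched below for illustrative matrices. This shows only the generic solvers, not the paper's delayed-measurement design or its fault diagnoser.

```python
# Solving a discrete algebraic Riccati equation and a Sylvester equation with
# SciPy; the matrices are illustrative assumptions, not the paper's system.
import numpy as np
from scipy.linalg import solve_discrete_are, solve_sylvester

A = np.array([[1.0, 0.2], [0.0, 0.9]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.array([[1.0]])

# LQR-type state-feedback gain from the discrete algebraic Riccati equation
P = solve_discrete_are(A, B, Q, R)
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
print("state-feedback gain K:", K)

# Sylvester equation A X + X B2 = C, e.g. for an observer-type coupling term
B2 = np.array([[0.5, 0.0], [0.1, 0.3]])
C = np.eye(2)
X = solve_sylvester(A, B2, C)
print("Sylvester residual norm:", np.linalg.norm(A @ X + X @ B2 - C))
```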
