期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Data mining for state space orthogonalization in adaptive dynamic programming

《Expert systems with applications》2017

Dynamic programming (DP) is a mathematical programming approach for optimizing a system that changes over time and is a common approach for developing intelligent systems. Expert systems that are intelligent must be able to adapt dynamically over time. An optimal DP policy identifies the optimal decision dependent on the current state of the system. Hence, the decisions controlling the system can intelligently adapt to changing system states. Although DP has existed since Bellman introduced it in 1957, exact DP policies are only possible for problems with low dimension or under very limiting restrictions. Fortunately, advances in computational power have given rise to approximate DP (ADP). However, most ADP algorithms are still computationally-intractable for high-dimensional problems. This paper specifically considers continuous-state DP problems in which the state variables are multicollinear. The issue of multicollinearity is currently ignored in the ADP literature, but in the statistics community it is well known that high multicollinearity leads to unstable (high variance) parameter estimates in statistical modeling. While not all real world DP applications involve high multicollinearity, it is not uncommon for real cases to involve observed state variables that are correlated, such as the air quality ozone pollution application studied in this research. Correlation is a common occurrence in observed data, including sources in meteorology, energy, finance, manufacturing, health care, etc.ADP algorithms for continuous-state DP achieve an approximate solution through discretization of the state space and model approximations. Typical state space discretizations involve full-dimensional grids or random sampling. The former option requires exponential growth in the number of state points as the state space dimension grows, while the latter option is typically inefficient and requires an intractable number of state points. The exception is computationally-tractable ADP methods based on a design and analysis of computer experiments (DACE) approach. However, the DACE approach utilizes ideal experimental designs that are (nearly) orthogonal, and a multicollinear state space will not be appropriately represented by such ideal experimental designs. While one could directly build approximations over the multicollinear state space, the issue of unstable model approximations remains unaddressed. Our approach for handling multicollinearity employs data mining methods for two purposes: (1) to reduce the dimensionality of a DP problem and (2) to orthogonalize a multicollinear DP state space and enable the use of a computationally-efficient DACE-based ADP approach. Our results demonstrate the risk of ignoring high multicollinearity, quantified by high variance inflation factors representing model instability. Our comparisons using an air quality ozone pollution case study provide guidance on combining feature selection and feature extraction to guarantee orthogonality while achieving over 95% dimension reduction and good model accuracy. 相似文献

2.

Least squares approximate policy iteration for learning bid prices in choice-based revenue management

《Computers & Operations Research》2017

We consider the revenue management problem of capacity control under customer choice behavior. An exact solution of the underlying stochastic dynamic program is difficult because of the multi-dimensional state space and, thus, approximate dynamic programming (ADP) techniques are widely used. The key idea of ADP is to encode the multi-dimensional state space by a small number of basis functions, often leading to a parametric approximation of the dynamic program’s value function. In general, two classes of ADP techniques for learning value function approximations exist: mathematical programming and simulation. So far, the literature on capacity control largely focuses on the first class.In this paper, we develop a least squares approximate policy iteration (API) approach which belongs to the second class. Thereby, we suggest value function approximations that are linear in the parameters, and we estimate the parameters via linear least squares regression. Exploiting both exact and heuristic knowledge from the value function, we enforce structural constraints on the parameters to facilitate learning a good policy. We perform an extensive simulation study to investigate the performance of our approach. The results show that it is able to obtain competitive revenues compared to and often outperforms state-of-the-art capacity control methods in reasonable computational time. Depending on the scarcity of capacity and the point in time, revenue improvements of around 1% or more can be observed. Furthermore, the proposed approach contributes to simulation-based ADP, bringing forth research on numerically estimating piecewise linear value function approximations and their application in revenue management environments. 相似文献

3.

Kernel-based reinforcement learning in average-cost problems

Ormoneit D. Glynn P. 《Automatic Control, IEEE Transactions on》2002,47(10):1624-1636

Reinforcement learning (RL) is concerned with the identification of optimal controls in Markov decision processes (MDPs) where no explicit model of the transition probabilities is available. We propose a class of RL algorithms which always produces stable estimates of the value function. In detail, we use "local averaging" methods to construct an approximate dynamic programming (ADP) algorithm. Nearest-neighbor regression, grid-based approximations, and trees can all be used as the basis of this approximation. We provide a thorough theoretical analysis of this approach and we demonstrate that ADP converges to a unique approximation in continuous-state average-cost MDPs. In addition, we prove that our method is consistent in the sense that an optimal approximate strategy is identified asymptotically. With regard to a practical implementation, we suggest a reduction of ADP to standard dynamic programming in an artificial finite-state MDP. 相似文献

4.

Optimal Constrained Self-learning Battery Sequential Management in Microgrid Via Adaptive Dynamic Programming

下载免费PDF全文

Qinglai Wei Derong Liu Yu Liu Ruizhuo Song 《IEEE/CAA Journal of Automatica Sinica》2017,4(2):168-176

This paper concerns a novel optimal self-learning battery sequential control scheme for smart home energy systems. The main idea is to use the adaptive dynamic programming (ADP) technique to obtain the optimal battery sequential control iteratively. First, the battery energy management system model is established, where the power efficiency of the battery is considered. Next, considering the power constraints of the battery, a new non-quadratic form performance index function is established, which guarantees that the value of the iterative control law cannot exceed the maximum charging/discharging power of the battery to extend the service life of the battery. Then, the convergence properties of the iterative ADP algorithm are analyzed, which guarantees that the iterative value function and the iterative control law both reach the optimums. Finally, simulation and comparison results are given to illustrate the performance of the presented method. 相似文献

5.

A dynamic programming approximation for downlink channel allocation in cognitive femtocell networks

Xudong Xiang Jianxiong Wan Chuang Lin Xin Chen 《Computer Networks》2013,57(15):2976-2991

Both femtocells and cognitive radio (CR) are envisioned as promising technologies for the NeXt Generation (xG) cellular networks. Cognitive femtocell networks (CogFem) incorporate CR technology into femtocell deployment to reduce its demand for more spectrum bands, thereby improving the spectrum utilization. In this paper, we focus on the channel allocation problem in CogFem, and formulate it as a stochastic dynamic programming (SDP) problem aiming at optimizing the long-term cumulative system throughput of individual femtocells. However, the multi-dimensional state variables resulted from complex exogenous stochastic information make the SDP problem computationally intractable using standard value iteration algorithms. To address this issue, we propose an approximate dynamic programming (ADP) algorithm in pursuit of an approximate solution to the SDP problem. The proposed ADP algorithm relies on an efficient value function approximation (VFA) architecture that we design and a stochastic gradient learning strategy to function, enabling each femtocell to learn and improve its own channel allocation policy. The algorithm is computationally attractive for large-scale downlink channel allocation problems in CogFem since its time complexity does not grow exponentially with the number of femtocells. Simulation results have shown that the proposed ADP algorithm exhibits great advantages: (1) it is feasible for online implementation with a fair rate of convergence and adaptability to both long-term and short-term network dynamics; and (2) it produces high-quality solutions fast, reaching approximately 80% of the upper bounds provided by optimal backward dynamic programming (DP) solutions to a set of deterministic counterparts of the formulated SDP problem. 相似文献

6.

基于数据的自学习优化控制:研究进展与展望

刘德荣李宏亮王鼎《自动化学报》2013,39(11):1858-1870

自适应动态规划(Adaptive dynamic programming, ADP)方法可以解决传统动态规划中的"维数灾"问题, 已经成为控制理论和计算智能领域最新的研究热点. ADP方法采用函数近似结构来估计系统性能指标函数, 然后依据最优性原理来获得近优的控制策略. ADP是一种具有学习和优化能力的智能控制方法, 在求解复杂非线性系统的最优控制问题中具有极大的潜力. 本文对ADP的理论研究、算法实现、相关应用等方面进行了全面的梳理, 涵盖了最新的研究进展, 并对ADP的未来发展趋势进行了分析和展望. 相似文献

7.

Choice of approximator and design of penalty function for an approximate dynamic programming based control approach 总被引：2，自引：1，他引：1

Jong Min Lee Niket S. Kaisare Jay H. Lee 《Journal of Process Control》2006,16(2):135-156

This paper investigates the choice of function approximator for an approximate dynamic programming (ADP) based control strategy. The ADP strategy allows the user to derive an improved control policy given a simulation model and some starting control policy (or alternatively, closed-loop identification data), while circumventing the ‘curse-of-dimensionality’ of the traditional dynamic programming approach. In ADP, one fits a function approximator to state vs. ‘cost-to-go’ data and solves the Bellman equation with the approximator in an iterative manner. A proper choice and design of function approximator is critical for convergence of the iteration and the quality of final learned control policy, because an approximation error can grow quickly in the loop of optimization and function approximation. Typical classes of approximators used in related approaches are parameterized global approximators (e.g. artificial neural networks) and nonparametric local averagers (e.g. k-nearest neighbor). In this paper, we assert on the basis of some case studies and a theoretical result that a certain type of local averagers should be preferred over global approximators as the former ensures monotonic convergence of the iteration. However, a converged cost-to-go function does not necessarily lead to a stable control policy on-line due to the problem of over-extrapolation. To cope with this difficulty, we propose that a penalty term be included in the objective function in each minimization to discourage the optimizer from finding a solution in the regions of state space where the local data density is inadequately low. A nonparametric density estimator, which can be naturally combined with a local averager, is employed for this purpose. 相似文献

8.

The computational order of a DACE dynamical model

Dorin Drignei 《Computational statistics & data analysis》2007,51(8):3654-3665

Multivariate design and analysis of computer experiments (DACE) methodology can be useful in situations where a dynamical computer model produces time series data sets. The main result of this paper determines the computational order of prediction from a dynamical statistical model underpinned by the dynamical computer model. Furthermore, it is shown that the computational orders of predictions from this dynamical statistical model and from a black box statistical model are comparable, but the likelihood optimization of the former model is more efficient. A virus dynamics example shows that the dynamical statistical model predictions can be more accurate than both the black box statistical model predictions and a coarse numerical solution of similar computational order. 相似文献

9.

Low-discrepancy sampling for approximate dynamic programming with local approximators

《Computers & Operations Research》2014

Approximate dynamic programming (ADP) relies, in the continuous-state case, on both a flexible class of models for the approximation of the value functions and a smart sampling of the state space for the numerical solution of the recursive Bellman equations. In this paper, low-discrepancy sequences, commonly employed for number-theoretic methods, are investigated as a sampling scheme in the ADP context when local models, such as the Nadaraya–Watson (NW) ones, are employed for the approximation of the value function. The analysis is carried out both from a theoretical and a practical point of view. In particular, it is shown that the combined use of low-discrepancy sequences and NW models enables the convergence of the ADP procedure. Then, the regular structure of the low-discrepancy sampling is exploited to derive a method for automatic selection of the bandwidth of NW models, which yields a significant saving in the computational effort with respect to the standard cross validation approach. Simulation results concerning an inventory management problem are presented to show the effectiveness of the proposed techniques. 相似文献

10.

DHP算法优化控制开发工具的设计与实现

宋绍剑王琦廖碧莲林小峰《计算机工程与设计》2011,32(12)

针对目前自适应动态规划的研究中大多采用Matlab语言编程,存在代码执行效率低、使用不灵活等缺点,采用面向对象的编程技术和C++语言设计了一种基于双启发式动态规划算法(DHP)和执行依赖双启发式动态规划(ADDHP)算法的优化控制开发工具,介绍了开发工具的设计开发思路、工作流程、功能特点、主要界面和使用方法.将该工具用于三容液位实验装置的实时仿真控制,实验结果表明,该开发工具具有使用灵活、训练速度快、执行效率高等优点. 相似文献

11.

Approximate dynamic programming approach for process control

Jay H. Lee Weechin Wong 《Journal of Process Control》2010,20(9):1038-1048

We assess the potentials of the approximate dynamic programming (ADP) approach for process control, especially as a method to complement the model predictive control (MPC) approach. In the artificial intelligence (AI) and operations research (OR) research communities, ADP has recently seen significant activities as an effective method for solving Markov decision processes (MDPs), which represent a type of multi-stage decision problems under uncertainty. Process control problems are similar to MDPs with the key difference being the continuous state and action spaces as opposed to discrete ones. In addition, unlike in other popular ADP application areas like robotics or games, in process control applications first and foremost concern should be on the safety and economics of the on-going operation rather than on efficient learning. We explore different options within ADP design, such as the pre-decision state vs. post-decision state value function, parametric vs. nonparametric value function approximator, batch-mode vs. continuous-mode learning, and exploration vs. robustness. We argue that ADP possesses great potentials, especially for obtaining effective control policies for stochastic constrained nonlinear or linear systems and continually improving them towards optimality. 相似文献

12.

Adaptive dynamic programming and optimal control of nonlinear nonaffine systems 总被引：1，自引：0，他引：1

Tao Bian Yu Jiang Zhong-Ping Jiang 《Automatica》2014

In this paper, a novel optimal control design scheme is proposed for continuous-time nonaffine nonlinear dynamic systems with unknown dynamics by adaptive dynamic programming (ADP). The proposed methodology iteratively updates the control policy online by using the state and input information without identifying the system dynamics. An ADP algorithm is developed, and can be applied to a general class of nonlinear control design problems. The convergence analysis for the designed control scheme is presented, along with rigorous stability analysis for the closed-loop system. The effectiveness of this new algorithm is illustrated by two simulation examples. 相似文献

13.

自适应动态规划综述 总被引：10，自引：14，他引：10

张化光张欣罗艳红杨珺《自动化学报》2013,39(4):303-311

自适应动态规划(Adaptive dynamic programming, ADP)是最优控制领域新兴起的一种近似最优方法, 是当前国际最优化领域的研究热点. ADP方法利用函数近似结构来近似哈密顿--雅可比--贝尔曼(Hamilton-Jacobi-Bellman, HJB)方程的解, 采用离线迭代或者在线更新的方法, 来获得系统的近似最优控制策略, 从而能够有效地解决非线性系统的优化控制问题. 本文按照ADP的结构变化、算法的发展和应用三个方面介绍ADP方法. 对目前ADP方法的研究成果加以总结, 并对这一研究领域仍需解决的问题和未来的发展方向作了进一步的展望. 相似文献

14.

Neural network and regression spline value function approximations for stochastic dynamic programming

Cristiano Cervellera Aihong Wen Victoria C.P. Chen 《Computers & Operations Research》2007

Dynamic programming is a multi-stage optimization method that is applicable to many problems in engineering. A statistical perspective of value function approximation in high-dimensional, continuous-state stochastic dynamic programming (SDP) was first presented using orthogonal array (OA) experimental designs and multivariate adaptive regression splines (MARS). Given the popularity of artificial neural networks (ANNs) for high-dimensional modeling in engineering, this paper presents an implementation of ANNs as an alternative to MARS. Comparisons consider the differences in methodological objectives, computational complexity, model accuracy, and numerical SDP solutions. Two applications are presented: a nine-dimensional inventory forecasting problem and an eight-dimensional water reservoir problem. Both OAs and OA-based Latin hypercube experimental designs are explored, and OA space-filling quality is considered. 相似文献

15.

Neuro-optimal tracking control for a class of discrete-time nonlinear systems via generalized value iteration adaptive dynamic programming approach

Qinglai Wei Derong Liu Yancai Xu 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2016,20(2):697-706

In this paper, a novel value iteration adaptive dynamic programming (ADP) algorithm, called “generalized value iteration ADP” algorithm, is developed to solve infinite horizon optimal tracking control problems for a class of discrete-time nonlinear systems. The developed generalized value iteration ADP algorithm permits an arbitrary positive semi-definite function to initialize it, which overcomes the disadvantage of traditional value iteration algorithms. Convergence property is developed to guarantee that the iterative performance index function will converge to the optimum. Neural networks are used to approximate the iterative performance index function and compute the iterative control policy, respectively, to implement the iterative ADP algorithm. Finally, a simulation example is given to illustrate the performance of the developed algorithm. 相似文献

16.

强化学习与自适应动态规划:从基础理论到多智能体系统中的应用进展综述

温广辉杨涛周佳玲付俊杰徐磊《控制与决策》2023,38(5):1200-1230

近年来,强化学习与自适应动态规划算法的迅猛发展及其在一系列挑战性问题(如大规模多智能体系统优化决策和最优协调控制问题)中的成功应用,使其逐渐成为人工智能、系统与控制和应用数学等领域的研究热点.鉴于此,首先简要介绍强化学习和自适应动态规划算法的基础知识和核心思想,在此基础上综述两类密切相关的算法在不同研究领域的发展历程,着重介绍其从应用于单个智能体(控制对象)序贯决策(最优控制)问题到多智能体系统序贯决策(最优协调控制)问题的发展脉络和研究进展.进一步,在简要介绍自适应动态规划算法的结构变化历程和由基于模型的离线规划到无模型的在线学习发展演进的基础上,综述自适应动态规划算法在多智能体系统最优协调控制问题中的研究进展.最后,给出多智能体强化学习算法和利用自适应动态规划求解多智能体系统最优协调控制问题研究中值得关注的一些挑战性课题. 相似文献

17.

A stochastic dual dynamic programming method for two-stage distributionally robust optimization problems

Xiaojiao Tong Xiao Luo Bo Rao 《Optimization methods & software》2020,35(5):1002-1021

This paper studies a class of two-stage distributionally robust optimization (TDRO) problems which comes from many practical application fields. In order to set up some implementable solution method, we first transfer the TDRO problem to its equivalent robust counterpart (RC) by the duality theorem of optimization. The RC reformulation of TDRO is a semi-infinite stochastic programming. Then we construct a conditional value-at-risk-based sample average approximation model for the RC problem. Furthermore, we analyse the error bound of the approximation model and obtain the convergent results with respect to optimal value and optimal solution set. Finally, a so-called stochastic dual dynamic programming approach is proposed to solve the approximate model. Numerical results validate the solution approach of this paper. 相似文献

18.

基于回归正交试验设计的弹翼结构优化设计 总被引：2，自引：0，他引：2

金华戴金海陈琪锋《计算机仿真》2007,24(10):42-44,130

如何在设计空间域中寻找翼型的最优尺寸,是优化导弹战术技术性能指标的有效有段.卷弧翼弹弹翼结构优化设计涉及到多因素多水平问题,采用回归正交试验设计方法可以有效地减少试验次数,并进而完成结构优化设计.文中通过回归正交试验设计方法建立了卷弧翼弹弹翼结构设计的一阶响应面模型,并进行了方差分析,采用SQP算法对有约束的非线性规化问题进行寻优计算,得到了翼型的一阶响应面模型的最优解.所采用的方法和思想也适用于一般导弹外形的结构优化设计. 相似文献

19.

Finite horizon optimal control of non-linear discrete-time switched systems using adaptive dynamic programming with ε-error bound

Chunbin Qin Yanhong Luo Binrui Wang 《International journal of systems science》2014,45(8):1683-1693

In this paper, we aim to solve the finite-horizon optimal control problem for a class of non-linear discrete-time switched systems using adaptive dynamic programming(ADP) algorithm. A new ε-optimal control scheme based on the iterative ADP algorithm is presented which makes the value function converge iteratively to the greatest lower bound of all value function indices within an error according to ε within finite time. Two neural networks are used as parametric structures to implement the iterative ADP algorithm with ε-error bound, which aim at approximating the value function and the control policy, respectively. And then, the optimal control policy is obtained. Finally, a simulation example is included to illustrate the applicability of the proposed method. 相似文献

20.

The methodology of knowledge acquisition and modeling for troubleshooting in automotive braking system

Janus S. Liang 《Robotics and Computer》2012,28(1):24-34

相似文献