Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
2.
This paper studies identification of systems in which only quantized output observations are available. An identification algorithm for system gains is introduced that employs empirical measures from multiple sensor thresholds and optimizes their convex combinations. Strong convergence of the algorithm is first derived. The algorithm is then extended to a scenario of system identification with communication constraints, in which the sensor output is transmitted through a noisy communication channel and observed after transmission. The main results of this paper demonstrate that these algorithms achieve the Cramér-Rao lower bounds asymptotically, and hence are asymptotically efficient algorithms. Furthermore, under some mild regularity conditions, these optimal algorithms achieve error bounds that approach optimal error bounds of linear sensors when the number of thresholds becomes large. These results are further extended to finite impulse response and rational transfer function models when the inputs are designed to be periodic and full rank.

3.

In an emerging paradigm, design is viewed as a sequential decision process (SDP) in which mathematical models of increasing fidelity are used in sequence to systematically contract sets of design alternatives. The key idea behind SDP is to sequence models of increasing fidelity to provide sequentially tighter bounds on the decision criteria, thereby removing inefficient designs from the tradespace with the guarantee that the antecedent model only removes design solutions that are dominated when analyzed using the more detailed, high-fidelity model. In general, efficiency in the SDP is achieved by using less expensive (low-fidelity) models early in the design process, before using high-fidelity models later on. However, the set of multi-fidelity models and discrete decision states results in a combinatorial number of model sequences, some of which require significantly fewer model evaluations than others. Unfortunately, the optimal modeling policy cannot be determined at the outset of the SDP because the computational costs of executing all models on all designs and the discriminatory power of the resulting bounds are unknown. In this paper, the model selection problem is formulated as a finite Markov decision process (MDP) and an online reinforcement learning (RL) algorithm, namely Q-learning, is used to obtain and follow an approximately optimal modeling policy, thereby overcoming the optimal modeling policy limitation of the current SDP. The outcome is a Reinforcement Learning based Design (RL-D) methodology able to learn efficient sequencing of models from sample estimates of the computational cost and discriminatory power of different models while analyzing design alternatives in the tradespace throughout the design process. Through application to two different design examples, the RL-D is shown to (1) effectively identify the approximately optimal modeling policy and (2) efficiently converge upon a choice set.
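
As a rough illustration of the Q-learning step this abstract refers to, the sketch below shows the standard tabular update and an epsilon-greedy action choice. It is not the RL-D implementation: the state labels, the action names `"low"` and `"high"` (standing for model fidelities), the reward (taken here as a negative computational cost), and the values of `alpha`, `gamma`, and `epsilon` are all assumptions made for the example.

```python
import random
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """One tabular Q-learning step.

    Moves Q(s, a) toward reward + gamma * max over a' of Q(s', a').
    """
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    return q[(state, action)]


def epsilon_greedy(q, state, actions, epsilon=0.1, rng=random):
    """Explore with probability epsilon, otherwise pick the highest-valued action."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


# Hypothetical usage: running the "low" fidelity model in state "s0" costs 1 unit.
actions = ["low", "high"]
q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
q_update(q, "s0", "low", reward=-1.0, next_state="s1", actions=actions)
```

With `epsilon=0`, `epsilon_greedy` simply follows the current estimates, which is one way to read "obtain and follow" a learned modeling policy.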


4.
This paper proposes a fuzzy concept of return cost for the Markov decision process (MDP) model, which is an application of dynamic programming to the solution of probabilistic decision processes. The return structure of the process is measured by triangular fuzzy numbers (TFNs). The comparison method is based on the ranking method.

The goal of this research is to provide optimal solutions for the finite-stage and infinite-stage cases, which can be used to study real-world situations and to aid the decision maker [6,7].


5.
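To design a sound reverse-logistics network for recycling used household appliances, a consumer delivery-satisfaction function is constructed by combining prospect theory with a geometric-mean aggregation method, and a joint location-pricing decision model for this type of network is established. The model determines where to open collection points and recycling centers, the purchase price for used appliances at each collection point, and the collection points that satisfy consumers. To make the model easier to solve, a hybrid genetic algorithm that embeds a greedy algorithm and an exact linear-programming algorithm is proposed, and its implementation steps are described. Finally, a numerical example verifies the effectiveness of the model and the algorithm.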

6.
In this paper, the performance of punctured convolutional codes of short constraint lengths is discussed. The punctured codes are used to provide error protection to a particular user in an asynchronous code division multiple access (A-CDMA) system. Perfect channel estimation is assumed at the receiver. A slow fading Rician or Rayleigh channel is assumed. Maximum likelihood decoding through a Viterbi algorithm is used to decode the received symbols. Soft decision decoding for perfect phase tracking of the received signal is considered. Analytical bounds, which are useful in predicting the performance of the A-CDMA system, are derived and plotted for the cases of infinite and finite channel memory. The upper bounds with Viterbi decoding are derived and plotted for the various punctured codes considered. The simulated results are found to agree very well with their upper bounds and predicted results.

7.
Near-Optimal Reinforcement Learning in Polynomial Time   (cited by: 1; self-citations: 0; citations by others: 1)
Michael Kearns, Satinder Singh. Machine Learning, 2002, 49(2-3): 209-232
We present new algorithms for reinforcement learning and prove that they have polynomial bounds on the resources required to achieve near-optimal return in general Markov decision processes. After observing that the number of actions required to approach the optimal return is lower bounded by the mixing time T of the optimal policy (in the undiscounted case) or by the horizon time T (in the discounted case), we then give algorithms requiring a number of actions and total computation time that are only polynomial in T and the number of states and actions, for both the undiscounted and discounted cases. An interesting aspect of our algorithms is their explicit handling of the Exploration-Exploitation trade-off.

8.
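A Reinforcement Learning Algorithm Based on Optimal Average Cost per Stage   (cited by: 4; self-citations: 0; citations by others: 4)
This paper uses the method of solving for the optimal cost function to give a new reinforcement learning algorithm, namely one based on the optimal average cost per stage. The algorithm is an effective reinforcement learning method for solving Markov decision problems with incomplete information. Starting from the method for solving the per-stage optimal average cost function, the paper analyzes the existence of an optimal solution, the relationship between the per-stage optimal average cost function and the initial state, and the associated Bellman equation. Establishing this method allows many results from dynamic programming (DP) algorithms to be applied directly to reinforcement learning research.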

9.
In this note, we present a sampling algorithm, called recursive automata sampling algorithm (RASA), for control of finite-horizon Markov decision processes (MDPs). By extending in a recursive manner Sastry's learning automata pursuit algorithm designed for solving nonsequential stochastic optimization problems, RASA returns an estimate of both the optimal action from a given state and the corresponding optimal value. Based on the finite-time analysis of the pursuit algorithm by Rajaraman and Sastry, we provide an analysis for the finite-time behavior of RASA. Specifically, for a given initial state, we derive the following probability bounds as a function of the number of samples: 1) a lower bound on the probability that RASA will sample the optimal action and 2) an upper bound on the probability that the deviation between the true optimal value and the RASA estimate exceeds a given error.

10.
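Point-based value iteration is an effective class of algorithms for solving partially observable Markov decision process (POMDP) problems. Most current point-based value iteration algorithms explore the belief point set using a single heuristic criterion, which limits their effectiveness. To address this, the paper proposes a value iteration algorithm that explores the belief point set using hybrid criteria (HHVI) and maintains both an upper and a lower bound on the value function. When expanding the explored point set, the algorithm selects for expansion the belief points whose gap between the upper and lower bounds exceeds a threshold, and among the successor belief points whose gap exceeds the threshold it explores the one farthest from the already explored point set, so that the explored points are distributed as effectively as possible over the reachable belief space. Experiments on four benchmark problems show that HHVI maintains convergence efficiency and converges to a better global optimal solution.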
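
The successor-selection rule described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the HHVI code: beliefs are represented as tuples of probabilities, the distance to the explored set is taken to be the Euclidean distance to its nearest member (the abstract does not name a metric), and `upper`, `lower`, and `select_belief` are hypothetical names.

```python
import math


def select_belief(successors, explored, upper, lower, threshold):
    """Pick the next belief point to explore.

    Keeps only successors whose upper/lower bound gap exceeds `threshold`,
    then returns the one farthest from the explored set (None if no
    successor qualifies).
    """
    def dist_to_explored(belief):
        # Assumed metric: Euclidean distance to the nearest explored belief.
        return min(math.dist(belief, e) for e in explored)

    candidates = [b for b in successors if upper(b) - lower(b) > threshold]
    if not candidates:
        return None
    return max(candidates, key=dist_to_explored)


# Hypothetical two-state example with made-up bound functions.
explored = [(1.0, 0.0)]
successors = [(0.9, 0.1), (0.5, 0.5), (0.0, 1.0)]
chosen = select_belief(successors, explored,
                       upper=lambda b: 1.0, lower=lambda b: b[0],
                       threshold=0.2)
```

Here `(0.9, 0.1)` is filtered out because its bound gap is below the threshold, and of the two remaining candidates `(0.0, 1.0)` lies farthest from the explored set.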

11.
The computation of ϵ-optimal policies for continuous time Markov decision processes (CTMDPs) over finite time intervals is a difficult problem because the optimal policy may change at arbitrary times. Numerical algorithms based on time discretization or uniformization have been proposed for the computation of optimal policies. The uniformization based algorithm has been shown to be more reliable and often also more efficient, but it is currently only available for processes where the gain or reward does not depend on the decision taken in a state. In this paper, we present two new uniformization based algorithms for computing ϵ-optimal policies for CTMDPs with decision dependent rewards over a finite time horizon. Due to a new and tighter upper bound, the newly proposed algorithms can not only be applied to decision dependent rewards but also outperform the available approach for rewards that do not depend on the decision. In particular, for models where the policy changes only rarely, optimal policies can be computed much faster.

12.
In this paper, we consider a demand function that follows the product-life-cycle shape, so that the decision maker can determine the optimal number of inventory replenishments and the corresponding optimal replenishment time points over a finite planning horizon. The objective function of the total relevant costs considered in our model is mathematically formulated as a mixed-integer nonlinear programming problem. A complete search procedure is provided to find the optimal solution by employing the properties derived in this paper and the Nelder–Mead algorithm. Also, based on the search procedure developed in this paper, a decision support system is implemented on a personal computer to solve the proposed problem.

13.
14.
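Given the rapid development of the new retail model in China and the fuzzy uncertainty in consumer demand for and returns of fresh products, a fuzzy programming model of a closed-loop logistics network for fresh products under new retail is constructed, covering decisions on minimum total logistics cost, facility location, and delivery vehicle routing. To solve the model, demand and return quantities are treated as triangular fuzzy parameters, and a fuzzy chance-constrained method converts the fuzzy constraints into equivalent crisp conditions. Using a fresh-food e-commerce company in Shanghai as a case study, a sensitivity analysis of the confidence level and solutions obtained with both a genetic algorithm and a particle swarm algorithm verify the effectiveness and feasibility of the model, providing a reference for decision makers.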

15.
This paper investigates the limit behavior of Markov decision processes made of independent objects evolving in a common environment, when the number of objects (N) goes to infinity. In the finite horizon case, we show that when the number of objects becomes large, the optimal cost of the system converges to the optimal cost of a discrete time system that is deterministic. Convergence also holds for optimal policies. We further provide bounds on the speed of convergence by proving second order results that resemble central limit theorems for the cost and the state of the Markov decision process, with explicit formulas for the limit. These bounds (of order \(1/\sqrt{N}\)) are proven to be tight in a numerical example. One can even go further and get convergence of order \(\sqrt{\log N}/N\) to a stochastic system made of the mean field limit and a Gaussian term. Our framework is applied to a brokering problem in grid computing. Several simulations with growing numbers of processors are reported. They compare the performance of the optimal policy of the limit system used in the finite case with classical policies by measuring its asymptotic gain. Several extensions are also discussed. In particular, for infinite horizon cases with discounted costs, we show that first order limits hold and that second order results also hold as long as the discount factor is small enough. As for infinite horizon cases with non-discounted costs, examples show that even the first order limits may not hold.

16.
Almost every engineering and manufacturing system consists of several subsystems, which are in general nonidentical and are subjected to stochastic failures and repairs. The system success logic can be represented using a combinatorial reliability model in terms of the states of subsystems, whereas the success logic of each subsystem can be represented using a k-out-of-n structure. The long run cost associated with the downtime can be lowered by adding spares to each subsystem, which in turn can increase the operational and maintenance costs. Thus, it is desirable to find the optimal number of components in each subsystem that minimizes the overall cost associated with the system. The main contributions of this paper are the following: 1) formulation of an average cost function of complex repairable systems and 2) development of a new method to obtain tighter bounds for the optimal number of spares for each subsystem. The tighter bounds are extremely useful to reduce the search space and hence improve the efficiency of the optimization algorithm. With the proposed bounds, for a series system consisting of m parallel subsystems, the computational complexity to find the near optimal solution, which is the optimal solution in most cases, is O(m).
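
To make the k-out-of-n structure concrete, the sketch below computes the availability of one subsystem and of a series arrangement of subsystems. It is a textbook calculation offered only as an illustration, not the paper's cost model: it assumes independent, identical components within a subsystem, each up with steady-state probability `a`, and the function names are hypothetical.

```python
from math import comb


def k_out_of_n_availability(k, n, a):
    """Probability that at least k of n independent components are up,
    when each component is up with probability a."""
    return sum(comb(n, i) * a**i * (1 - a)**(n - i) for i in range(k, n + 1))


def series_availability(subsystems):
    """Availability of a series system: every (k, n, a) subsystem must be up."""
    result = 1.0
    for k, n, a in subsystems:
        result *= k_out_of_n_availability(k, n, a)
    return result


# Adding one spare to a 2-out-of-2 subsystem turns it into 2-out-of-3.
without_spare = k_out_of_n_availability(2, 2, 0.9)
with_spare = k_out_of_n_availability(2, 3, 0.9)
```

The comparison at the end shows the trade-off the abstract describes: a spare raises availability (and so lowers downtime cost) while adding a component that must be operated and maintained.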

17.
Automatica, 2014, 50(11): 2943-2950
In this paper, an economic model predictive control algorithm is proposed which ensures satisfaction of transient average constraints, i.e., constraints on input and state variables averaged over some finite time period. We believe that this stricter form of average constraints (compared to previously proposed asymptotic average constraints) is of independent interest in various applications such as the operation of a chemical reactor, where e.g. the amount of inflow or the heat flux during some fixed period of time must not exceed a certain value. Besides guaranteeing fulfillment of transient average constraints for the closed-loop system, we show that closed-loop average performance bounds and convergence results established in the setting of asymptotic average constraints also hold in case of transient average constraints. Furthermore, we illustrate our results with a chemical reactor example.

18.
Optimal Choice of AR and MA Parts in Autoregressive Moving Average Models   (cited by: 2; self-citations: 0; citations by others: 2)
This paper deals with the Bayesian method of choosing the best model for a given one-dimensional series among a finite number of candidates belonging to autoregressive (AR), moving average (MA), ARMA, and other families. The series could be either a sequence of observations in time as in speech applications, or a sequence of pixel intensities of a two-dimensional image. The observation set is not restricted to be Gaussian. We first derive an optimum decision rule for assigning the given observation set to one of the candidate models so as to minimize the average probability of error in the decision. We also derive an optimal decision rule so as to minimize the average value of the loss function. Then we simplify the decision rule when the candidate models are different Gaussian ARMA models of different orders. We discuss the consistency of the optimal decision rule and compare it with the other decision rules in the literature for comparing dynamical models.

19.
The average cost optimal control problem is addressed for Markov decision processes with unbounded cost. It is found that the policy iteration algorithm generates a sequence of policies which are c-regular, where c is the cost function under consideration. This result only requires the existence of an initial c-regular policy and an irreducibility condition on the state space. Furthermore, under these conditions the sequence of relative value functions generated by the algorithm is bounded from below and “nearly” decreasing, from which it follows that the algorithm is always convergent. Under further conditions, it is shown that the algorithm does compute a solution to the optimality equations and hence an optimal average cost policy. These results provide elementary criteria for the existence of optimal policies for Markov decision processes with unbounded cost and recover known results for the standard linear-quadratic-Gaussian problem. In particular, in the control of multiclass queueing networks, it is found that there is a close connection between optimization of the network and optimal control of a far simpler fluid network model.

20.
We present Byzantine Disk Paxos, an asynchronous shared-memory consensus algorithm that uses a collection of n > 3t disks, t of which may fail by becoming non-responsive or arbitrarily corrupted. We give two constructions of this algorithm; that is, we construct two different t-tolerant (i.e., tolerating up to t disk failures) building blocks, each of which can be used, along with a leader oracle, to solve consensus. One building block is a t-tolerant wait-free shared safe register. The second building block is a t-tolerant regular register that satisfies a weaker termination (liveness) condition than wait freedom: its write operations are wait-free, whereas its read operations are guaranteed to return only in executions with a finite number of writes. We call this termination condition finite writes (FW), and show that wait-free consensus is solvable with FW-terminating registers and a leader oracle. We construct each of these t-tolerant registers from n > 3t base registers, t of which can be non-responsive or Byzantine. All the previous t-tolerant wait-free constructions in this model used at least 4t + 1 fault-prone registers, and we are not familiar with any prior FW-terminating constructions in this model. We further show tight lower bounds on the number of invocation rounds required for optimal resilience reliable register constructions, or more generally, constructions that use fewer than 4t + 1 fault-prone registers. Our lower bounds show that such constructions are inherently more costly than constructions that use 4t + 1 registers, and that our constructions have optimal round complexity. Furthermore, our wait-free construction is early-stopping, and it achieves the optimal round complexity with any number of actual failures. A preliminary version of this paper, by the same authors and with the same title, appears in Proceedings of the 23rd ACM Symposium on Principles of Distributed Computing (PODC ’04), July 2004, pages 226–235.
