Similar literature
 10 similar documents found (search time: 140 ms)
1.
This paper develops and compares local search algorithms for the no-wait flow-shop problem with the makespan criterion (Cmax). We present several variants of descending search and tabu search. The algorithms use multimoves, which perform several moves simultaneously in a single iteration and thereby accelerate convergence to good solutions. In addition, the tabu search algorithms use a dynamic tabu list that further helps to avoid becoming trapped in a local optimum. The proposed algorithms are evaluated empirically and found to be more effective than existing algorithms at finding high-quality solutions. The presented ideas can be applied in any local search procedure.
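
As an illustration of the objective in this abstract, the following minimal sketch computes the no-wait makespan of a job permutation and improves it with a plain pairwise-swap descent. It is not the paper's method: the multimoves and the dynamic tabu list are not reproduced, and the names `start_delay`, `makespan`, and `swap_descent` as well as the processing-time layout `p[job][machine]` are assumptions made for this example.

```python
from itertools import combinations

def start_delay(p_i, p_j):
    """Minimum gap between the start of job i and the start of the next job j
    so that neither job ever waits between machines (no-wait constraint)."""
    best, done_i, done_j = 0, 0, 0
    for k in range(len(p_i)):
        done_i += p_i[k]                    # job i leaves machine k at this offset
        best = max(best, done_i - done_j)   # job j must not reach machine k earlier
        done_j += p_j[k]                    # offset at which job j reaches machine k+1
    return best

def makespan(p, order):
    """Cmax of a permutation: sum of start delays plus the last job's total time."""
    total = sum(start_delay(p[a], p[b]) for a, b in zip(order, order[1:]))
    return total + sum(p[order[-1]])

def swap_descent(p, order):
    """Descending search over pairwise swaps; keeps a swap only if Cmax drops."""
    order = list(order)
    best = makespan(p, order)
    improved = True
    while improved:
        improved = False
        for a, b in combinations(range(len(order)), 2):
            order[a], order[b] = order[b], order[a]
            cost = makespan(p, order)
            if cost < best:
                best, improved = cost, True
            else:
                order[a], order[b] = order[b], order[a]  # undo a non-improving swap
    return order, best
```

For two jobs with processing times `[[2, 3], [1, 2]]`, the order `[0, 1]` has makespan 7 and the descent moves to `[1, 0]` with makespan 6.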

2.
This article proposes three novel time-varying policy iteration algorithms for the finite-horizon optimal control problem of continuous-time affine nonlinear systems. We first propose a model-based time-varying policy iteration algorithm. The method considers time-varying solutions to the Hamilton–Jacobi–Bellman equation for finite-horizon optimal control. Based on this algorithm, value function approximation is applied to the Bellman equation by establishing neural networks with time-varying weights. A novel update law for the time-varying weights is put forward based on the idea of iterative learning control, which obtains optimal solutions more efficiently than previous works. Considering that system models may be unknown in real applications, we propose a partially model-free time-varying policy iteration algorithm that applies integral reinforcement learning to acquire the time-varying value function. Moreover, an analysis of convergence, stability, and optimality is provided for every algorithm. Finally, simulations for different cases are given to verify the convenience and effectiveness of the proposed algorithms.

3.
In this paper, we propose two multirate generalised policy iteration (GPI) algorithms applied to discrete-time linear quadratic regulation problems. The proposed algorithms are extensions of the existing GPI algorithm, which consists of approximate policy evaluation and policy improvement steps. The two proposed schemes, named heuristic dynamic programming (HDP) and dual HDP (DHP), are based on multirate GPI and use multi-step estimation (the M-step Bellman equation) at the approximate policy evaluation step to estimate the value function and its gradient, called the costate, respectively. We then show that these two methods with the same update horizon can be considered equivalent in the iteration domain. Furthermore, monotonically increasing and decreasing convergences, the so-called value iteration (VI)-mode and policy iteration (PI)-mode convergences, are proved to hold for the proposed multirate GPIs. General convergence properties in terms of eigenvalues are also studied. Data-driven online implementation methods for the proposed HDP and DHP are demonstrated, and finally we present the results of numerical simulations performed to verify the effectiveness of the proposed methods.

4.
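Reinforcement learning learns a mapping from environment states to actions so as to obtain the maximum reward signal. Three approaches can be used to maximize the return in reinforcement learning: value iteration, policy iteration, and policy search. This paper introduces the principles and algorithms of reinforcement learning, studies the discrete-space value iteration algorithm both with and without an environment model, and applies the algorithm to grid-world problems with a fixed starting point and with random starting points. Experimental results show that, compared with the policy iteration algorithm, this algorithm converges quickly and achieves good accuracy.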
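
To make the model-based case concrete, here is a minimal sketch of value iteration on a small grid world. The abstract does not give the grid size, reward scheme, or discount factor, so the 4 x 4 grid, the reward of -1 per step, the terminal goal cell, and the function name `value_iteration` are all assumptions made for this example.

```python
def value_iteration(n=4, goal=(3, 3), gamma=0.9, tol=1e-8):
    """Value iteration on an n x n grid with deterministic moves.

    Assumed setup (not from the abstract): every step costs -1 and the
    goal cell is terminal with value 0.
    """
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
    V = {(r, c): 0.0 for r in range(n) for c in range(n)}
    while True:
        delta = 0.0
        for s in V:
            if s == goal:
                continue                          # terminal state keeps value 0
            best = float("-inf")
            for dr, dc in moves:
                nr, nc = s[0] + dr, s[1] + dc
                if not (0 <= nr < n and 0 <= nc < n):
                    nr, nc = s                    # hitting a wall leaves the agent in place
                # Bellman optimality backup for a known, deterministic model
                best = max(best, -1.0 + gamma * V[(nr, nc)])
            delta = max(delta, abs(best - V[s]))
            V[s] = best                           # in-place update within the sweep
        if delta < tol:
            return V
```

With `gamma=1.0` the value of each cell equals minus its Manhattan distance to the goal, so the greedy policy with respect to `V` follows a shortest path from any starting cell, fixed or random.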

5.
This paper deals with inventory systems with limited resource for a single item or multiple items under continuous review (r, Q) policies. For the single-item system with a stochastic demand and limited resource, it is shown that an existing algorithm can be applied to find an optimal (r, Q) policy that minimizes the expected system costs. For the multi-item system with stochastic demands and limited resource commonly shared among all items, an optimization problem is formulated for finding optimal (r, Q) policies for all items, which minimize the expected system costs. Bounds on the parameters (i.e., r and Q) of the optimal policies and bounds on the minimum expected system costs are obtained. Based on the bounds, an algorithm is developed for finding an optimal or near-optimal solution. A method is proposed for evaluating the quality of the solution. It is shown that the algorithm proposed in this paper finds a solution that is (i) optimal/near-optimal and/or (ii) significantly better than the optimal solution with unlimited resource.

6.
The ability to produce join results before having read an entire input (early) reduces query response time. This is especially important for interactive applications, and for joins in mediator systems that may have to wait on network delays when reading the inputs. Although several early join algorithms have been proposed, there has been no formal treatment of how different reading policies affect the number of results produced. In this work, we show that alternate reading is optimal among fixed reading policies, and we provide expressions for the expected number of results produced over time. Further, we analyze policies that adapt their execution to the tuples already read and to the distribution of the inputs. We present a greedy, adaptive algorithm that is optimal in that it outperforms all reading policies, on average. However, the greedy policy is shown to perform only marginally better than the alternating policy. Thus, the alternating policy emerges as a policy that is easy to implement, requires no knowledge of the input distributions, is optimal among fixed policies, and is nearly optimal among all policies.
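
A minimal sketch of the alternating reading policy follows. The abstract does not name a join implementation, so the symmetric hash join used here, the function name `alternating_join`, and the default choice of the first tuple field as the join key are assumptions for illustration only.

```python
from itertools import zip_longest

def alternating_join(left, right, key=lambda t: t[0]):
    """Equi-join that reads one tuple from each input in turn and emits
    every match as soon as both partners have been read."""
    seen_left, seen_right = {}, {}
    end = object()                                # marks an exhausted input
    for l, r in zip_longest(left, right, fillvalue=end):
        if l is not end:
            seen_left.setdefault(key(l), []).append(l)
            for match in seen_right.get(key(l), ()):   # probe tuples read so far
                yield (l, match)
        if r is not end:
            seen_right.setdefault(key(r), []).append(r)
            for match in seen_left.get(key(r), ()):
                yield (match, r)
```

Because it is a generator, a consumer can take the first results with `next()` before either input has been read to the end, which is the early-output behaviour the abstract analyses.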

7.
An optimal probabilistic-planning algorithm solves a problem, usually modeled by a Markov decision process, by finding an optimal policy. In this paper, we study the k best policies problem. The problem is to find the k best policies of a discrete Markov decision process. The k best policies, k > 1, cannot be found directly using dynamic programming. Naïvely, finding the k-th best policy can be Turing reduced to the optimal planning problem, but the number of problems queried in the naïve algorithm is exponential in k. We show empirically that solving the k best policies problem by using this reduction requires unreasonable amounts of time even when k = 3. We then provide two new algorithms. The first is a complete algorithm, based on our theoretical contribution that the k-th best policy differs from the i-th policy, for some i < k, on exactly one state. The second is an approximate algorithm that skips many less useful policies. We show that both algorithms have good scalability. We also show that the approximate algorithm runs much faster and finds interesting, high-quality policies.

8.
We investigate the problem of scheduling n jobs in s-stage hybrid flowshops with parallel identical machines at each stage. The objective is to find a schedule that minimizes the sum of weighted completion times of the jobs. This problem has been proven to be NP-hard. In this paper, an integer programming formulation is constructed for the problem. A new Lagrangian relaxation algorithm is presented in which precedence constraints are relaxed to the objective function by introducing Lagrangian multipliers, unlike the commonly used method of relaxing capacity constraints. In this way the relaxed problem can be decomposed into machine type subproblems, each of which corresponds to a specific stage. A dynamic programming algorithm is designed for solving parallel identical machine subproblems where jobs may have negative weights. The multipliers are then iteratively updated along a subgradient direction. The new algorithm is computationally compared with the commonly used Lagrangian relaxation algorithms which, after capacity constraints are relaxed, decompose the relaxed problem into job level subproblems and solve the subproblems by using the regular and speed-up dynamic programming algorithms, respectively. Numerical results show that the new Lagrangian relaxation method produces better schedules in much shorter computation time, especially for large-scale problems.

9.
An M/G/1 queue where the server may take repeated vacations is considered. Whenever a busy period terminates, the server takes a vacation of random duration. At the end of each vacation, the server may either take a new vacation or resume service; if the queue is found empty, the server always takes a new vacation. The cost structure includes a holding cost per unit of time and per customer in the system and a cost each time the server is turned on. One discounted cost criterion and two average cost criteria are investigated. It is shown that the vacation policy that minimizes the discounted cost criterion over all policies (randomized, history dependent, etc.) converges to a threshold policy as the discount factor goes to zero. This result relies on a nonstandard use of the value iteration algorithm of dynamic programming and is used to prove that both average cost problems are minimized by a threshold policy.

10.
Tabucol is a tabu search algorithm that tries to determine whether the vertices of a given graph can be colored with a fixed number k of colors such that no edge has both endpoints with the same color. This algorithm was proposed in 1987, one year after Fred Glover's article that launched tabu search. While better-performing local search algorithms have since been proposed, Tabucol remains very popular and is often chosen as a subroutine in hybrid algorithms that combine a local search with a population-based method. In order to explain this unfailing success, we make a thorough survey of local search techniques for graph coloring problems, and we point out the main differences between all these techniques.
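
The idea can be sketched as a tabu search over improper k-colorings that recolors one conflicting vertex per iteration. This is a simplified illustration and not the published algorithm: the fixed tenure, the simplified aspiration rule, the random tie-breaking, and the name `tabucol` with its parameters are assumptions made for this example.

```python
import random

def tabucol(edges, n, k, max_iters=10000, tenure=7, seed=0):
    """Search for a proper k-coloring of vertices 0..n-1; return it or None."""
    rng = random.Random(seed)
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    color = [rng.randrange(k) for _ in range(n)]
    current = sum(1 for u, v in edges if color[u] == color[v])  # conflicting edges
    tabu = {}  # (vertex, color) -> iteration until which that assignment is forbidden
    for it in range(max_iters):
        if current == 0:
            return color
        best_delta, candidates = None, []
        for v in range(n):
            old_cost = sum(1 for w in adj[v] if color[w] == color[v])
            if old_cost == 0:
                continue                       # only conflicting vertices are moved
            for c in range(k):
                if c == color[v]:
                    continue
                delta = sum(1 for w in adj[v] if color[w] == c) - old_cost
                is_tabu = tabu.get((v, c), 0) > it
                if is_tabu and current + delta > 0:
                    continue                   # aspiration: allow only if it solves the instance
                if best_delta is None or delta < best_delta:
                    best_delta, candidates = delta, [(v, c)]
                elif delta == best_delta:
                    candidates.append((v, c))
        if not candidates:
            continue                           # every move is tabu; let the list age
        v, c = rng.choice(candidates)
        tabu[(v, color[v])] = it + tenure      # forbid returning v to its old color
        color[v] = c
        current += best_delta
    return None
```

The best non-tabu move is taken even when it increases the number of conflicts, which is what lets the search leave local optima; a return value of `None` only means that no proper coloring was found within `max_iters`, not that none exists.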
