Similar Documents (20 results)
1.
A Survey of Neuro-Dynamic Programming   (Cited by 1; self 1, other 0)
Neuro-dynamic programming is an optimization method developed in recent years. It uses computer simulation and function approximation to simplify the search of the state space, and can effectively overcome the "curse of dimensionality"; it therefore has broad application prospects. This paper surveys neuro-dynamic programming in the hope of assisting related research.

2.
赵坤  嵇启春  李玲燕 《计算机工程》2013,(12):242-246,254
To solve the robot maze-solving problem in unknown environments, a dynamic discrete potential field path planning algorithm is proposed. To improve path optimization performance, a grid model with boundary nodes is built: the obstacle state and the potential value are defined at the boundary nodes of each grid cell, and the potential field is constructed by computing the cumulative cost of connectable neighboring nodes. To speed up the search, the potential field distribution is changed dynamically as environment information is updated; a real-time replanned path is obtained by following the direction of steepest potential descent, guiding the robot toward the goal, and path convergence is judged from the visit status of the pre-planned path so that useless grid expansion is avoided. Simulation results show that the algorithm enables the robot to quickly and efficiently plan an optimized path with few turns and small turning angles in complex, unknown maze environments.
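The grid potential construction described above can be illustrated with a small sketch (not the authors' code): a cumulative-cost potential is propagated from the goal over a 4-connected grid by a BFS wavefront, and the path is recovered by steepest descent of the potential. The grid encoding (0 = free, 1 = obstacle) and the 4-connectivity are illustrative assumptions.

    from collections import deque

    def build_potential(grid, goal):
        # Propagate a cumulative-cost potential outward from the goal over free
        # cells (0 = free, 1 = obstacle) using a BFS wavefront on a 4-connected grid.
        rows, cols = len(grid), len(grid[0])
        INF = float("inf")
        pot = [[INF] * cols for _ in range(rows)]
        pot[goal[0]][goal[1]] = 0
        frontier = deque([goal])
        while frontier:
            r, c = frontier.popleft()
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if (0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0
                        and pot[r][c] + 1 < pot[nr][nc]):
                    pot[nr][nc] = pot[r][c] + 1
                    frontier.append((nr, nc))
        return pot

    def steepest_descent_path(pot, start):
        # Follow the direction of fastest potential decrease from start to the goal.
        if pot[start[0]][start[1]] == float("inf"):
            return None                      # start is not connected to the goal
        path, (r, c) = [start], start
        while pot[r][c] > 0:
            r, c = min(((r + dr, c + dc)
                        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                        if 0 <= r + dr < len(pot) and 0 <= c + dc < len(pot[0])),
                       key=lambda rc: pot[rc[0]][rc[1]])
            path.append((r, c))
        return path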

3.
Robust control policies for closed queuing networks with uncertain routing probabilities   (Cited by 1; self 0, other 1)
The paper is concerned with robust control problems for exponential controlled closed queuing networks (CCQNs) under uncertain routing probabilities. As the rows of some parameter matrices, such as infinitesimal generators, may be dependent, we first transform the objective vector under discounted-cost criteria into a weighted-average cost. Through the solution to the Poisson equation, i.e., Markov performance potentials, we then unify the discounted-cost and average-cost problems and derive the gradient formula of the new objective function with respect to the routing probabilities. Solution techniques for searching for the optimal robust control policy are discussed. Finally, a numerical example is presented and analyzed.

4.
Research on mobile robot path planning based on a dynamic artificial potential field method   (Cited by 3; self 1, other 2)
The attractive potential field and attraction function are constructed from the relative position and relative velocity between the robot and the goal, while the repulsive potential field and repulsion function are constructed from the relative position, relative velocity, and relative acceleration between the robot and obstacles. Using a position-based "divide and conquer" strategy, the robot's environment is decomposed into different scenarios; information about the surroundings obtained from sensors is used to formulate and execute scenario-motion rules. A complex dynamic robot simulation environment is built to verify the feasibility of the improved algorithm for automatic obstacle avoidance in dynamic environments. With the IN-RE robot as the experimental platform, automatic obstacle-avoidance path-planning experiments in dynamic environments verify the feasibility of the proposed dynamic artificial potential field method.
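A minimal 2-D sketch of one dynamic artificial potential field step in this spirit (not the paper's exact force functions): the attraction uses the robot-goal relative position and relative velocity, and the repulsion uses the robot-obstacle relative position; the gains k_p, k_v, eta and the influence radius rho0 are illustrative assumptions.

    import numpy as np

    def attractive_force(p_robot, v_robot, p_goal, v_goal, k_p=1.0, k_v=0.5):
        # Attraction built from relative position and relative velocity to the goal;
        # k_p and k_v are illustrative gains, not values taken from the paper.
        return k_p * (p_goal - p_robot) + k_v * (v_goal - v_robot)

    def repulsive_force(p_robot, p_obstacle, eta=1.0, rho0=2.0):
        # Repulsion active only inside the influence radius rho0 (classic FIRAS form).
        diff = p_robot - p_obstacle
        rho = np.linalg.norm(diff)
        if rho >= rho0 or rho == 0.0:
            return np.zeros_like(diff)
        return eta * (1.0 / rho - 1.0 / rho0) / rho**2 * (diff / rho)

    # one planning step: move along the resultant force
    p, v = np.array([0.0, 0.0]), np.array([0.0, 0.0])
    goal_p, goal_v = np.array([5.0, 5.0]), np.array([0.2, 0.0])
    obstacle = np.array([2.0, 2.5])
    total_force = attractive_force(p, v, goal_p, goal_v) + repulsive_force(p, obstacle)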

5.
Mobile robot path planning based on a dynamic fuzzy artificial potential field method   (Cited by 1; self 1, other 1)
The traditional artificial potential field method suffers from the local minimum problem in path planning and cannot meet the real-time, safety, and reachability requirements of mobile robot path planning in dynamic environments. To address these shortcomings, a velocity vector is introduced, the potential field force functions are modified, and the method is combined with fuzzy control to adjust the repulsive field coefficient in real time, overcoming the defects of the artificial potential field method. The effectiveness of the method is verified on the MATLAB platform; experimental results show that it outperforms path planning based on the standard artificial potential field model.

6.
Dynamic obstacle-avoidance planning for manipulators based on an improved potential field method   (Cited by 2; self 0, other 2)
谢龙  刘山 《控制理论与应用》2018,35(9):1239-1249
This paper proposes a dynamic obstacle-avoidance planning algorithm for rigid manipulators based on an improved potential field method. The potential field generates an attractive force acting only on the manipulator's end-effector and a repulsive force acting on the point of the manipulator closest to the obstacle, from which an attractive velocity and a repulsive velocity are derived according to the laws of dynamics. The attractive velocity is constructed directly in Cartesian space from the target velocity, so that the manipulator can track a moving target; the repulsive velocity, whose direction depends on changes in the obstacle's velocity, gives the manipulator links the ability to avoid moving obstacles as well. Finally, the two velocities are combined in the manipulator's joint space, planning a collision-free path and controlling the manipulator to track the dynamic target. Simulation results demonstrate the effectiveness of the method.
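The final joint-space combination of the two Cartesian velocities can be sketched as follows (not the paper's implementation): the attractive velocity at the end-effector and the repulsive velocity at the link point closest to the obstacle are mapped to joint space with damped least-squares pseudoinverses of the corresponding Jacobians, which are assumed to be supplied by the robot's kinematic model; the damping value is an illustrative choice.

    import numpy as np

    def joint_velocity(J_ee, J_obs, v_attract, v_repulse, damping=1e-2):
        # J_ee: Jacobian (3 x n) at the end-effector; J_obs: Jacobian (3 x n) at the
        # link point closest to the obstacle. Both are assumed to come from the
        # robot model.
        def dls_pinv(J):
            # damped least-squares pseudoinverse, robust near singularities
            return J.T @ np.linalg.inv(J @ J.T + damping * np.eye(J.shape[0]))
        return dls_pinv(J_ee) @ v_attract + dls_pinv(J_obs) @ v_repulse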

7.
Motion planning for mobile robots in dynamic environments based on artificial potential fields   (Cited by 13; self 0, other 13)
韩永  刘国栋 《机器人》2006,28(1):45-49
The shortcomings of the traditional potential field method in dynamic environments are analyzed, and on this basis the concept of a velocity potential field is introduced; the traditional potential functions are improved, and new attraction and repulsion functions are derived. Under the new potential functions, the robot can quickly adjust the magnitude and direction of its velocity, rapidly escaping the threat of obstacles and quickly reaching or tracking the target. Simulation experiments verify the effectiveness of the new potential field method.

8.
Robot dynamic obstacle-avoidance planning based on a new artificial potential field function   (Cited by 13; self 0, other 13)
The artificial potential field method is widely used for robot path planning, but when a conic-section function is used as the mathematical model of the attractive field, chattering occurs at the goal point. Based on an analysis of the cause of the chattering, an exponential term is added to the attractive field function, which removes the singular point and eliminates the chattering. A sensitivity parameter is then introduced into the repulsive field function so that the distance between the robot and obstacles during motion can be controlled flexibly. By tuning the sensitivity, the method also overcomes the drawback of the traditional potential field method that the robot cannot reach the goal when the goal lies within the repulsive influence range. Finally, simulations of the new potential field method are presented.

9.
Addressing the characteristics of local robot path planning and the shortcomings of traditional artificial potential field theory, an improved repulsive potential field function is adopted that takes the robot-goal relative distance and velocity into account in order to solve the local minimum problem. A neuro-fuzzy system is introduced, balancing the robustness and responsiveness of the system, and the approach is effectively validated in an application example.

10.
A simulation framework based on neuro-dynamic programming is proposed for scheduling re-entrant production systems. A state set is built according to the characteristics of re-entrant production systems, and the scheduling problem is formulated as a corresponding Markov decision process. With a suitable performance index, neuro-dynamic programming is used to generate the schedule at each step, and the policy is optimized in simulation. A simulation example verifies the effectiveness of the method, and a comparison of the results of three scheduling policies demonstrates the superiority of the neuro-dynamic programming approach. The framework can also be extended to other types of production scheduling problems.

11.
A unified NDP method based on TD(0) learning for MDPs under average and discounted criteria   (Cited by 3; self 0, other 0)
To meet the needs of real, large-scale Markov systems, simulation-based learning optimization of Markov decision processes (MDPs) is discussed. Starting from the defining equations, a temporal-difference formula for performance potentials that is unified under the average- and discounted-cost criteria is established; a neural network is used to represent the estimate of the performance potential, and a parametric TD(0) learning formula and algorithm are derived for approximate policy evaluation. Then, based on the approximated potentials, approximate policy iteration realizes a neuro-dynamic programming (NDP) optimization method unified under both criteria. The results also apply to semi-Markov decision processes. A numerical example shows that the neural policy-iteration algorithm in the paper works for both criteria and verifies that the average-cost problem is the limit of the discounted problem as the discount factor tends to zero.
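A minimal sketch of the TD(0) evaluation step with a linear approximator in place of the paper's neural network (the feature map phi, the simulated transitions, the discount factor, and the step size are illustrative assumptions):

    import numpy as np

    def td0_evaluate(sample_path, phi, n_features, gamma=0.95, alpha=0.01):
        # TD(0) policy evaluation with a linear architecture V(s) ~= w . phi(s).
        # sample_path: iterable of (state, cost, next_state) transitions obtained
        # from simulation; phi(state) returns a feature vector of length n_features.
        w = np.zeros(n_features)
        for s, cost, s_next in sample_path:
            delta = cost + gamma * w @ phi(s_next) - w @ phi(s)   # TD error
            w += alpha * delta * phi(s)                           # gradient step
        return w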

12.
Based on performance potential theory and the equivalent Markov process method, a simulation-based optimization algorithm for a class of semi-Markov decision processes (SMDPs) under parameterized stochastic stationary policies is studied, and its convergence is briefly analyzed. Through the equivalent Markov process of the SMDP, a uniformized Markov chain is defined, and the gradient of the SMDP's average-cost performance measure with respect to the policy parameters is estimated from a single sample path of this uniformized chain in order to find optimal (or suboptimal) policies. The algorithm uses a neural network to approximate the parameterized stochastic stationary policy, saving memory and avoiding the "curse of dimensionality", which makes it suitable for performance optimization of systems with large state spaces. A simulation example illustrates the application of the algorithm.
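Not the paper's potential-based estimator; a likelihood-ratio (GPOMDP-style) sketch of estimating the average-cost gradient of a softmax-parameterized stationary policy from a single simulated sample path. The simulator interface step(s, a) -> (cost, next_state), the state-action features phi_sa, and the trace parameter beta are placeholder assumptions.

    import numpy as np

    def single_path_gradient(theta, phi_sa, step, s0, n_actions,
                             horizon=10000, beta=0.99):
        # Estimate d(average cost)/d(theta) for a softmax policy from one sample path.
        def action_probs(s):
            prefs = np.array([theta @ phi_sa(s, a) for a in range(n_actions)])
            prefs -= prefs.max()                       # numerical stability
            p = np.exp(prefs)
            return p / p.sum()

        grad = np.zeros_like(theta)
        trace = np.zeros_like(theta)
        s = s0
        for t in range(1, horizon + 1):
            p = action_probs(s)
            a = np.random.choice(n_actions, p=p)
            cost, s_next = step(s, a)
            # score function (gradient of the log policy) of the chosen action
            score = phi_sa(s, a) - sum(p[b] * phi_sa(s, b) for b in range(n_actions))
            trace = beta * trace + score               # discounted eligibility trace
            grad += (cost * trace - grad) / t          # running average
            s = s_next
        return grad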

13.
NDPS: a packet scheduling algorithm for wireless multimedia networks   (Cited by 1; self 1, other 0)
Packet scheduling is one of the key technologies for future wireless multimedia networks. The main difficulties are the high error rate of wireless links, the diversity of traffic types, and the unknown packet-arrival model. This paper models the packet scheduling process as a Markov decision process and solves the resulting problem with neuro-dynamic programming. The proposed wireless multimedia packet scheduling algorithm (NDPS) achieves three performance goals simultaneously: differentiated service for different traffic types, maximized utilization of the wireless bandwidth, and guaranteed fairness of service. Simulation results show that NDPS outperforms two popular scheduling algorithms.

14.
Cao's work shows that, by defining an α-dependent equivalent infinitesimal generator A_α, a semi-Markov decision process (SMDP) with both average- and discounted-cost criteria can be treated as an α-equivalent Markov decision process (MDP), and the performance potential theory can also be developed for SMDPs. In this work, we focus on establishing error bounds for potential- and A_α-based iterative optimization methods. First, we introduce an α-uniformized Markov chain (UMC) for an SMDP via A_α and a uniformization parameter, and show their relations. In particular, we obtain that their performance potentials, as solutions of the corresponding Poisson equations, are proportional, so that the potential-based studies of an SMDP and the α-UMC are unified. Using these relations, we derive the error bounds for a potential-based policy-iteration algorithm and a value-iteration algorithm, respectively, when there exist various calculation errors. The obtained results can be applied directly to the special models, i.e., continuous-time MDPs and Markov chains, and can be extended to some simulation-based optimization methods such as reinforcement learning and neuro-dynamic programming, where estimation errors or approximation errors are common. Finally, we give an application example on the look-ahead control of a conveyor-serviced production station (CSPS), and show the corresponding error bounds.

15.
A simulation-based optimization algorithm for CTMDPs under stochastic stationary policies   (Cited by 2; self 2, other 2)
Based on Markov performance potential theory and the neuro-dynamic programming (NDP) method, simulation-based optimization of a class of continuous-time Markov decision processes (MDPs) under stochastic stationary policies is studied. The algorithm converts the continuous-time process into its uniformized Markov chain and then estimates the gradient of the average-cost performance measure with respect to the policy parameters from a single sample path of that chain, in order to find suboptimal policies. The method is suitable for performance optimization of systems with large state spaces. A numerical example of a controlled Markov process is given.

16.
Applying a method that combines Markov decision processes with performance potentials, a policy optimization algorithm for call admission control is presented. The resulting optimal policy is state-dependent; compared with policies that choose actions based only on the bandwidth already occupied at a node, the state-dependent policy achieves better performance values, and the algorithm converges quickly.

17.
High-level semi-Markov modelling paradigms such as semi-Markov stochastic Petri nets and process algebras are used to capture realistic performance models of computer and communication systems but often have the drawback of generating huge underlying semi-Markov processes. Extraction of performance measures such as steady-state probabilities and passage-time distributions therefore relies on sparse matrix-vector operations involving very large transition matrices. Previous studies have shown that exact state-by-state aggregation of semi-Markov processes can be applied to reduce the number of states. This can, however, lead to a dramatic increase in matrix density caused by the creation of additional transitions between remaining states. Our paper addresses this issue by presenting the concept of state space partitioning for aggregation. We present a new deterministic partitioning method which we term barrier partitioning. We show that barrier partitioning is capable of splitting very large semi-Markov models into a number of partitions such that first passage-time analysis can be performed more quickly and using up to 99% less memory than existing algorithms.

18.
Basic Ideas for Event-Based Optimization of Markov Systems   (Cited by 5; self 0, other 0)
The goal of this paper is two-fold: First, we present a sensitivity point of view on the optimization of Markov systems. We show that Markov decision processes (MDPs) and the policy-gradient approach, or perturbation analysis (PA), can be derived easily from two fundamental sensitivity formulas, and such formulas can be flexibly constructed, by first principles, with performance potentials as building blocks. Second, with this sensitivity view we propose an event-based optimization approach, including event-based sensitivity analysis and event-based policy iteration. This approach utilizes the special feature of a system characterized by events and illustrates how the potentials can be aggregated using that feature and how the aggregated potential can be used in policy iteration. Compared with the traditional MDP approach, the event-based approach has several advantages: the number of aggregated potentials may scale with the system size even though the number of states grows exponentially in the system size, which reduces the policy space and saves computation; the approach does not require actions at different states to be independent; and it utilizes the special feature of a system and does not need to know the exact transition probability matrix. The main ideas of the approach are illustrated by an admission control problem. (Supported in part by a grant from Hong Kong UGC.)

19.
Feature-Based Methods for Large Scale Dynamic Programming   (Cited by 5; self 0, other 0)
We develop a methodological framework and present a few different ways in which dynamic programming and compact representations can be combined to solve large scale stochastic control problems. In particular, we develop algorithms that employ two types of feature-based compact representations; that is, representations that involve feature extraction and a relatively simple approximation architecture. We prove the convergence of these algorithms and provide bounds on the approximation error. As an example, one of these algorithms is used to generate a strategy for the game of Tetris. Furthermore, we provide a counter-example illustrating the difficulties of integrating compact representations with dynamic programming, which exemplifies the shortcomings of certain simple approaches.
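Not the authors' algorithms; a minimal sketch of one method in this family: fitted (approximate) value iteration with a linear architecture over extracted features, where the backed-up values at a set of sampled states are projected back onto the feature space by least squares. The transition matrices P[a], cost vectors cost[a], and feature matrix Phi are assumed inputs, and gamma and the iteration count are illustrative.

    import numpy as np

    def fitted_value_iteration(P, cost, Phi, gamma=0.95, iterations=100):
        # Linear architecture: V(s) ~= Phi[s] . w, one feature row per sampled state.
        # P[a] is the transition matrix and cost[a] the cost vector of action a,
        # restricted to the sampled states.
        n_states, n_features = Phi.shape
        w = np.zeros(n_features)
        for _ in range(iterations):
            v = Phi @ w
            # Bellman backup at every sampled state, for every action
            q = np.stack([cost[a] + gamma * (P[a] @ v) for a in range(len(P))])
            target = q.min(axis=0)                       # greedy (minimum-cost) backup
            # project the backed-up values onto the feature space by least squares
            w, *_ = np.linalg.lstsq(Phi, target, rcond=None)
        return w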

20.
A stochastic resource allocation model, based on the principles of Markov decision processes (MDPs), is proposed in this paper. In particular, a general-purpose framework is developed which takes into account resource requests for both instant and future needs. The framework can handle two types of reservations (i.e., specified and unspecified time-interval reservation requests) and implement an overbooking business strategy to further increase business revenues. The resulting dynamic pricing problems can be regarded as sequential decision-making problems under uncertainty, which are solved by means of stochastic dynamic programming (DP) based algorithms. In this regard, Bellman's backward principle of optimality is exploited in order to provide all the implementation mechanisms for the proposed reservation pricing algorithm. The curse of dimensionality, the inevitable issue of DP, arises for both instant resource requests and future resource reservations. In particular, an approximate dynamic programming (ADP) technique based on linear function approximations is applied to solve such scalability issues. Several examples are provided to show the effectiveness of the proposed approach.
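Not the paper's model; a minimal backward-induction (Bellman) sketch for a toy finite-horizon pricing problem with a discrete remaining-capacity state. The price menu and the demand model accept_prob(price) are illustrative assumptions; a linear-approximation ADP variant would replace the exact value table V with a fitted architecture.

    import numpy as np

    def backward_induction(T, capacity, prices, accept_prob):
        # V[t, c]: optimal expected revenue with c units left and periods t..T-1 to go.
        V = np.zeros((T + 1, capacity + 1))
        policy = np.zeros((T, capacity + 1))
        for t in range(T - 1, -1, -1):
            for c in range(capacity + 1):
                best_value, best_price = V[t + 1, c], 0.0     # option: offer nothing
                for p in (prices if c > 0 else []):
                    q = accept_prob(p)                        # assumed demand model
                    value = q * (p + V[t + 1, c - 1]) + (1 - q) * V[t + 1, c]
                    if value > best_value:
                        best_value, best_price = value, p
                V[t, c], policy[t, c] = best_value, best_price
        return V, policy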
