Similar Documents
 20 similar documents found; search time: 31 ms
1.
Rolling Q-learning robot path planning with prior knowledge in unknown environments   Cited by: 1 (self-citations: 0, by others: 1)
胡俊  朱庆保 《控制与决策》2010,25(9):1364-1368
A rolling Q-learning path-planning algorithm with prior knowledge is proposed for robots in unknown environments. The algorithm injects prior knowledge of the environment into the Q-value initialization as heuristic search information, avoiding blind exploration in the early learning phase and speeding up convergence. Meanwhile, rolling learning addresses the robot's limited field of view in large-scale environments and the curse of dimensionality caused by the growth of the Q-learning state space. Simulation results show that with this algorithm a robot can quickly plan an optimized obstacle-avoiding path from start to goal in complex unknown environments, with satisfactory results.
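The core of the scheme this abstract describes, tabular Q-learning with the Q-table initialized from a heuristic rather than zeros, can be sketched as follows. The 3x3 grid, reward values, and negative-Manhattan-distance heuristic are illustrative assumptions, not the authors' settings:

```python
import random

random.seed(0)
GOAL = (2, 2)
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]

def heuristic_q_init():
    """Prior knowledge: states closer to the goal start with higher Q-values."""
    return {((x, y), a): -(abs(GOAL[0] - x) + abs(GOAL[1] - y))
            for x in range(3) for y in range(3) for a in range(4)}

def step(s, a):
    # Deterministic grid transition, clipped to the map; -1 per step, +10 at goal.
    nx = min(2, max(0, s[0] + ACTIONS[a][0]))
    ny = min(2, max(0, s[1] + ACTIONS[a][1]))
    ns = (nx, ny)
    return ns, (10.0 if ns == GOAL else -1.0)

def train(episodes=300, alpha=0.5, gamma=0.9, eps=0.1):
    q = heuristic_q_init()          # heuristic init instead of zeros
    for _ in range(episodes):
        s = (0, 0)
        while s != GOAL:
            a = (random.randrange(4) if random.random() < eps
                 else max(range(4), key=lambda a: q[(s, a)]))
            ns, r = step(s, a)
            # Standard Q-learning update
            q[(s, a)] += alpha * (r + gamma * max(q[(ns, b)] for b in range(4)) - q[(s, a)])
            s = ns
    return q

q = train()
# Greedy rollout from the start should reach the goal.
s, hops = (0, 0), 0
while s != GOAL and hops < 20:
    s, _ = step(s, max(range(4), key=lambda a: q[(s, a)]))
    hops += 1
print(s, hops)
```

With the heuristic initialization, the greedy action already points roughly toward the goal in the first episode, which is what removes the early blind exploration the abstract refers to.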

2.
To address the curse of dimensionality that traditional Q-learning suffers in mobile-robot path planning in complex environments, an improved method is proposed. It integrates deep learning into the Q-learning framework, replacing the Q-table with a network output to overcome the dimensionality problem. An experience-replay memory and a two-network structure break the correlation between training samples and improve convergence. Finally, simulation environments are modeled with the grid method, and experiments on maps of varying complexity show that traditional Q-learning struggles to plan paths in large state spaces, while deep reinforcement learning plans paths well in complex state environments.
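The experience-replay component mentioned above can be sketched in a few lines. This is a generic replay buffer under assumed capacity and batch-size values; the Q-network and its target copy are omitted:

```python
import random
from collections import deque

class ReplayMemory:
    """Bounded transition store; random minibatch sampling breaks the temporal
    correlation between consecutive transitions."""
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions evicted first

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)

mem = ReplayMemory(capacity=100)
for t in range(150):                 # 150 pushes: the first 50 are evicted
    mem.push(t, 0, 0.0, t + 1, False)
batch = mem.sample(8)
```

Training then alternates: act in the environment, `push` the transition, and fit the evaluation network on a `sample`d minibatch while the second (target) network is updated only periodically.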

3.
When the basic Q-learning algorithm is applied to path planning, random action selection makes early search inefficient and planning slow, and sometimes no complete feasible path is found. An improved robot path-planning algorithm fusing ant colony optimization with dynamic Q-learning is therefore proposed. Drawing on the pheromone-increment mechanisms of the elitist-ant and rank-based-ant models, a new pheromone-increment update rule is designed to improve the robot's exploration efficiency; the pheromone matrix of the improved ant colony algorithm initializes the Q-table, reducing ineffective early exploration; and a dynamic selection strategy is designed to improve both convergence speed and stability. Simulations on 2D static grid maps with different obstacle densities show that the proposed method effectively reduces the number of iterations and the time needed to find an optimal path.

4.
Modeling and solving the repair-crew scheduling problem on damaged road networks based on Q-learning   Cited by: 1 (self-citations: 0, by others: 1)
Repairing a damaged road network is an important part of disaster emergency response; the core question is how to plan the repair activities of road repair crews so as to quickly open lifelines for post-disaster rescue. This paper first builds a mathematical model of crew repair and route planning, then introduces a Markov decision process to simulate the crews' repair activities, and solves for the optimal crew scheduling policy with a Q-learning algorithm. Comparative experiments show that the method lets repair crews carry out repairs of damaged road sections from a global, long-term perspective, improves transport and repair efficiency to a certain extent, and offers a useful reference for governments conducting emergency rescue and the rapid, safe evacuation of disaster victims.

5.
Reinforcement based mobile robot navigation in dynamic environment   Cited by: 1 (self-citations: 0, by others: 1)
In this paper, a new approach based on Q-learning is developed for solving the problem of mobile robot path planning in an unknown dynamic environment. Q-learning has been used widely for solving real-world problems, especially in robotics, since it has been proved to give reliable and efficient solutions thanks to its simple and well-developed theory. However, most researchers who applied Q-learning to mobile robot navigation dealt with static environments; they avoided dynamic environments because the problem is more complex and has an infinite number of states, which makes training the intelligent agent very difficult. In this paper, the Q-learning algorithm is applied to mobile robot navigation in a dynamic environment by limiting the number of states through a new definition of the state space. This reduces the size of the Q-table and hence increases the speed of the navigation algorithm. The conducted simulation scenarios indicate the strength of the proposed approach: the robot achieves a high hit rate and reaches its target along a collision-free path in most cases, which is the most desirable feature of any navigation algorithm.
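One way to realize the state-limiting idea described above is to encode only robot-relative quantities instead of raw positions, so the state count stays fixed no matter how large or dynamic the world is. The sector encoding below illustrates the principle and is an assumption, not the paper's actual state definition:

```python
import math

def sector(dx, dy, n=8):
    """Quantize a direction vector into one of n angular sectors."""
    ang = math.atan2(dy, dx) % (2 * math.pi)
    return int(ang / (2 * math.pi / n)) % n

def encode_state(robot, target, obstacles, near_radius=2.0):
    """State = (target sector, nearest-obstacle sector, obstacle-near flag).
    At most 8 * 9 * 2 = 144 states regardless of map size."""
    tgt = sector(target[0] - robot[0], target[1] - robot[1])
    if not obstacles:
        return (tgt, -1, 0)
    ob = min(obstacles, key=lambda o: math.hypot(o[0] - robot[0], o[1] - robot[1]))
    d = math.hypot(ob[0] - robot[0], ob[1] - robot[1])
    return (tgt, sector(ob[0] - robot[0], ob[1] - robot[1]), 1 if d < near_radius else 0)

# Target due east, nearest obstacle due north and within the near radius.
s = encode_state((0.0, 0.0), (5.0, 0.0), [(0.0, 1.0), (4.0, 4.0)])
print(s)
```

Because moving obstacles re-enter the same small set of relative states, the Q-table stays small enough to train even though the underlying environment is dynamic.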

6.
Finding effective ways to collect the usage of network resources across applications has become a key requirement for a distributed control plane to improve controllers' decision-making performance. This paper explores an efficient way of combining dynamic NetView sharing among distributed controllers with the intra-service resource announcements and processing requirements that occur in them, and proposes a rapid multipath distribution mechanism. First, we establish a resource-collecting model and prove that a prisoner's-dilemma problem exists in the distributed resource-collecting process in a Software-Defined Network (SDN). Second, we present a bypass path-selection algorithm and a Q-learning-based diffluence algorithm to resolve this dilemma. Finally, simulation results show that the proposed approach improves resource-collecting efficiency through its self-adaptive path-transmission-ratio mechanism, which ensures high utilization of the test network.

7.
Many-to-many relief-distribution path planning in dynamic disaster environments is of great practical significance: it must cope with road-network changes over time while planning, find the best matching between the various emergency supply depots and delivery points, and guarantee both timeliness and a high solution success rate. Current static plan-based methods (SPO) and dynamic path-planning methods (DPO) cannot guarantee theoretically optimal solutions in dynamic disaster environments, and may even leave some delivery points un...

8.
The Search and Rescue optimization algorithm (SAR), proposed in 2020, is a metaheuristic that simulates search-and-rescue behavior and is used to solve constrained optimization problems in engineering. However, SAR converges slowly and its individuals cannot adaptively select operators. A new reinforcement-learning-improved SAR algorithm (RLSAR) is therefore proposed. It redesigns SAR's local-search and global-search operations, adds a path-adjustment operation, and trains a reinforcement-learning model with the asynchronous advantage actor-critic algorithm (A3C) so that SAR individuals gain the ability to adaptively select operators. All agents are trained in dynamic environments in which the number, positions, and sizes of threat zones are randomly generated, and the trained model is then examined from three aspects: the contribution of each action, the length of the paths planned under different threat zones, and the operator sequence executed by each individual. Experiments show that RLSAR converges faster than standard SAR, differential evolution, and the squirrel search algorithm, and successfully plans more economical, safe, and effective feasible paths for UAVs in randomly generated 3D dynamic environments, indicating that the proposed algorithm can serve as an effective UAV path-planning method.

9.
To address the dynamic, temporal, and stochastic nature of grid services, a Q-learning-based dynamic grid-service selection method is presented for solving service composition in grid environments with incomplete information. For service compositions satisfying the Markov decision process, a grid-service description model supporting incomplete-information descriptions is proposed, covering the whole life cycle of the composition. An improved Q-learning algorithm dynamically and adaptively evaluates the candidate choices in service selection and gives the optimal selection decision in each situation. Simulations show that the method outperforms the traditional greedy selection algorithm in both effectiveness and practicality.

10.
A Q-learning-based channel-selection algorithm for opportunistic spectrum access   Cited by: 1 (self-citations: 0, by others: 1)
The channel-selection problem in opportunistic spectrum access under unknown environments is studied. Applying Q-learning theory from intelligent control to channel selection, a secondary-user channel-selection model is built and a Q-learning-based channel-selection algorithm is proposed. Through continual interaction with and learning from the environment, the algorithm guides secondary users toward the channels with the largest cumulative reward, maximizing secondary-user throughput. The Boltzmann learning rule is introduced to trade off channel exploration against exploitation. Simulations show that, compared with random selection, the algorithm adaptively selects channels with better availability, without prior knowledge of the channel environment or a prediction model, effectively raising secondary-user throughput with fast convergence.
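The Boltzmann rule used above to balance exploration and exploitation can be sketched as follows; the example Q-values and temperature are illustrative assumptions:

```python
import math, random

def boltzmann_select(q_values, temperature):
    """Softmax over Q-values: channels with higher Q are chosen more often,
    but every channel keeps a nonzero exploration probability. Lower
    temperature -> more exploitation; higher -> more exploration."""
    prefs = [math.exp(q / temperature) for q in q_values]
    total = sum(prefs)
    probs = [p / total for p in prefs]
    r, acc = random.random(), 0.0
    for i, p in enumerate(probs):       # sample a channel index from probs
        acc += p
        if r < acc:
            return i, probs
    return len(probs) - 1, probs

random.seed(1)
idx, probs = boltzmann_select([1.0, 2.0, 0.5], temperature=1.0)
```

In practice the temperature is annealed downward over time, so the secondary user explores early and settles on the best channel later.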

11.
Research on a Q-learning algorithm based on ART2   Cited by: 1 (self-citations: 0, by others: 1)
To overcome the curse of dimensionality faced when Q-learning is applied to intelligent systems with continuous state spaces, a Q-learning algorithm based on ART2 is proposed. By introducing an ART2 neural network, the Q-learning agent learns a suitable incremental clustering of the state space for the task at hand, so that without any prior knowledge it can carry out two levels of online learning, behavioral decision making and state-space pattern clustering, in an unknown environment, continuously improving its control policy through interaction with the environment and thereby raising learning accuracy. Simulations show that a mobile robot using the ARTQL algorithm continuously improves its navigation performance by learning through interaction with the environment.

12.
Path selection is one of the fundamental problems in emergency logistics management. Two mathematical models for path selection in emergency logistics management are presented, taking more real-world factors in times of disaster into account. First, a single-objective path-selection model is presented, accounting for the fact that the travel speed on each arc is affected by the spread of the disaster. The objective is to minimize the total travel time along a path; the travel speed on each arc is modeled as a continuously decreasing function of time, and a modified Dijkstra algorithm is designed to solve the model. Building on the first model, we further consider the chaos, panic, and congestion that arise during a disaster. A multi-objective path-selection model is presented that minimizes both the total travel time along a path and the path complexity, where complexity is modeled as the total number of arcs in the path, and an ant colony optimization algorithm is proposed to solve it. Simulation results show the effectiveness and feasibility of the models and algorithms presented in this paper.
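The single-objective model's key feature, arc travel times that grow as speeds decay over time, can be handled by a label-setting Dijkstra variant like the sketch below. The graph, the linear speed-decay model, and its parameters are assumptions for illustration, not the paper's formulation:

```python
import heapq

def speed(v0, t, decay=0.05, floor=0.2):
    """Travel speed decays linearly with departure time, down to a floor."""
    return v0 * max(floor, 1.0 - decay * t)

def earliest_arrival(graph, source, target):
    """graph: {node: [(neighbor, arc_length, free_speed), ...]}.
    Dijkstra on earliest-arrival labels with time-dependent arc costs."""
    dist = {source: 0.0}
    pq = [(0.0, source)]
    while pq:
        t, u = heapq.heappop(pq)
        if u == target:
            return t
        if t > dist.get(u, float("inf")):
            continue
        for v, length, v0 in graph.get(u, []):
            nt = t + length / speed(v0, t)   # cost depends on departure time t
            if nt < dist.get(v, float("inf")):
                dist[v] = nt
                heapq.heappush(pq, (nt, v))
    return float("inf")

g = {
    "A": [("B", 10.0, 10.0), ("C", 5.0, 5.0)],
    "B": [("D", 5.0, 10.0)],
    "C": [("D", 5.0, 5.0)],
    "D": [],
}
print(earliest_arrival(g, "A", "D"))
```

Label-setting remains correct here because, with a decreasing speed, a later departure never yields an earlier arrival (the FIFO property).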

13.
An improved deep reinforcement learning algorithm (NDQN) is proposed to overcome the curse of dimensionality that traditional Q-learning faces in mobile-robot path planning over complex terrain. Deep learning is integrated into the Q-learning framework, with a network output replacing the Q-table. Because the deep Q-network suffers from severe overestimation, a correction function is used to improve its evaluation function. The improved deep reinforcement learning algorithm is ...

14.
UAV online path planning in a low-altitude dangerous environment with dense obstacles, static threats (STs), and dynamic threats (DTs) is a complicated, dynamic, uncertain, real-time problem. We propose a novel method to obtain a feasible and safe path. First, STs are modeled with an intuitionistic fuzzy set (IFS) to express their uncertainties, and methods for ST assessment and synthesis are presented; a reachability-set (RS) estimator of DTs, based on the rapidly-exploring random tree (RRT), is developed to predict the threat posed by a DT. Second, a subgoal selector is proposed and integrated into the planning system to decrease planning cost, accelerate path search, and reduce threats along a path. A receding horizon (RH) is introduced to solve online path planning in a dynamic, partially unknown environment. A local path planner is constructed by improving the dynamic-domain rapidly-exploring random tree (DDRRT) to deal with complex obstacles, and RRT* is embedded in the planner to optimize paths. Monte Carlo simulations comparing against traditional methods show that our algorithm performs well at online path planning, with a high probability of successful penetration.

15.
阳杰  张凯 《微处理机》2021,(1):47-51
In unknown continuous environment states, Q-learning path planning converges slowly during trial-and-error and easily falls into local optima, which hinders exploration of real unknown environments. To solve this, a potential-field reinforcement learning algorithm with a region-expansion strategy based on the Metropolis criterion is proposed for the Q-learning path-planning problem. The algorithm initializes state information with potential-field prior knowledge of the environment, eliminating initial blindness and improving learning efficiency, while introducing a Metrop...

16.
A mobile robot struggles to find a good path in complex environments. Q-learning, based on Markov processes, can obtain a relatively good path through trial-and-error learning, but it converges slowly, requires many iterations, and its trial-and-error style cannot be applied in real environments. A gravitational potential field is added to the Q-learning algorithm as initial environmental prior information; on top of this, trap regions in the environment are searched layer by layer and Q-value iteration over concave trap regions is eliminated, accelerating the convergence of path planning. Trial-and-error learning on obstacles is also removed, so the algorithm avoids obstacles effectively from the initial state and is suitable for direct learning in real environments. Complex maps built with Python and the pygame module verify the path-planning effect of the improved Q-learning algorithm with the initial gravitational potential field and trap search. Simulations show that the improved algorithm reaches the target position quickly and effectively after fewer iterations, with a better path.

17.
After an earthquake, rapid detection of survivors and scientific rescue require the support of multi-source data. Remote-sensing data, on-site environment data, hazardous-facility distribution data, vital-sign data, operating-status data of life-detection equipment, and historical data together form the multi-source heterogeneous dataset of emergency-rescue scenarios. Addressing the needs of unified data supervision and multi-dimensional analysis in emergency rescue, this paper studies 3D spatial fusion analysis of multi-source heterogeneous data in depth, proposes a plugin-free 3D spatial fusion scheme based on WebGL rendering, and develops an emergency-rescue data-fusion visualization system. The system fuses and presents the multi-source heterogeneous data required by rescue scenarios in a unified spatio-temporal framework, helping field personnel perform coordinated situation analysis and commanders make decisions, and greatly improving the timeliness of on-site emergency-rescue work.

18.
Debris flows are characterized by temporal and spatial clustering, sustained motion, disaster chains, single typology, and diverse damage modes and affected objects. Selecting the location of a disaster rescue center is a typical nonlinear, non-convergent problem, and models based purely on linear techniques and constraints deviate considerably when faced with complex debris-flow indicators. A design method for a rescue-center siting model based on pseudolite testing combined with parallel computing is proposed. Taking debris flows as an example, a debris-flow motion and deposition model is built with the discrete element method and GPU parallel algorithms; using the post-confluence motion and deposition characteristics of channel debris flows and the disaster area as simulation references, the debris-flow motion is simulated and its coverage is predicted from the results. Pseudolite technology then computes the extent of disaster spread and the distances to susceptible points; with minimal construction and rescue costs as the premise, the user-selected position and altitude as known conditions, minimal expected loss as the constraint, and the dilution of precision as the criterion, the core candidate area for the rescue site is obtained and the pseudolite-test-based rescue-center siting model is constructed. Simulations show that the model sites the center with high accuracy and efficiency and can provide data support for debris-flow disaster rescue.

19.
Reinforcement learning (RL) has been applied in many fields and applications, but the trade-off between exploration and exploitation in the action-selection policy remains a dilemma. The best-known RL algorithms are Q-learning and Sarsa, which have different characteristics: generally speaking, Sarsa converges faster while Q-learning achieves a better final performance, yet Sarsa easily gets stuck in local minima and Q-learning needs longer to learn. Most of the literature investigates the action-selection policy. Instead of studying an action-selection strategy, this paper focuses on how to combine Q-learning with Sarsa, and presents a new method, called backward Q-learning, which can be implemented within both the Sarsa algorithm and Q-learning. Backward Q-learning directly tunes the Q-values, and the Q-values in turn affect the action-selection policy; the proposed RL algorithms therefore enhance learning speed and improve final performance. Finally, three experiments, cliff walk, mountain car, and cart-pole balancing control, verify the feasibility and effectiveness of the proposed scheme. All the simulations show that the backward-Q-learning-based RL algorithm outperforms the well-known Q-learning and Sarsa algorithms.
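The combination described, Sarsa updates online plus a reversed Q-learning replay when the episode ends, can be sketched on a toy chain task. The 1-D environment and all hyperparameters below are illustrative assumptions, not the paper's benchmarks:

```python
import random

N, GOAL = 8, 7                      # states 0..7, goal at the right end
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1

def step(s, a):                     # a: 0 = left, 1 = right; +1 on reaching goal
    ns = max(0, min(N - 1, s + (1 if a else -1)))
    return ns, (1.0 if ns == GOAL else 0.0)

def choose(q, s):
    # Epsilon-greedy with random tie-breaking.
    if random.random() < EPS or q[s][0] == q[s][1]:
        return random.randrange(2)
    return 0 if q[s][0] > q[s][1] else 1

def train(episodes=50):
    q = [[0.0, 0.0] for _ in range(N)]
    for _ in range(episodes):
        s, memory = 0, []
        a = choose(q, s)
        while s != GOAL:
            ns, r = step(s, a)
            na = choose(q, ns)
            # Forward pass: ordinary on-policy Sarsa update.
            q[s][a] += ALPHA * (r + GAMMA * q[ns][na] - q[s][a])
            memory.append((s, a, r, ns))
            s, a = ns, na
        # Backward pass: replay the episode in reverse with Q-learning updates,
        # so the terminal reward propagates back along the whole trajectory.
        for s0, a0, r0, ns0 in reversed(memory):
            q[s0][a0] += ALPHA * (r0 + GAMMA * max(q[ns0]) - q[s0][a0])
    return q

random.seed(0)
q = train()
```

After training, the greedy policy should prefer moving right (toward the goal) in every non-goal state, and the backward replay is what makes that happen in far fewer episodes than plain Sarsa.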

20.
Coal-mine accident rescue is a difficult task; the keys to rescue are determining where the accident occurred, the extent of its impact, and the shortest rescue path. Using the distinctive spatial-analysis capability of GIS, this paper builds a relational model of the coal-mine project and a roadway database, establishes a real-time shortest-path search based on an optimized Dijkstra algorithm that indicates the shortest rescue path for emergency rescue, and gives the analysis process and a proof of feasibility.


Copyright © 北京勤云科技发展有限公司  京ICP备09084417号