Deep reinforcement learning for solving the joint scheduling problem of machines and AGVs in job shop
Cite this article: SUN Ai-hong, LEI Qi, SONG Yu-chuan, YANG Yun-fan. Deep reinforcement learning for solving the joint scheduling problem of machines and AGVs in job shop[J]. Control and Decision, 2024, 39(1): 253-262
Authors: SUN Ai-hong, LEI Qi, SONG Yu-chuan, YANG Yun-fan
Affiliation: State Key Laboratory of Mechanical Transmission, Chongqing University, Chongqing 400044, China
Funding: National Natural Science Foundation of China (51205429)
Abstract: This paper addresses the joint scheduling of automated guided vehicles (AGVs) and machines in a job shop, with the objective of minimizing the completion time. It proposes an integrated algorithmic framework based on a convolutional neural network and deep reinforcement learning. First, the disjunctive graph of job shop scheduling with AGVs is analyzed, and the problem is transformed into a sequential decision problem formulated as a Markov decision process. Next, according to the characteristics of the problem, a spatial state based on the disjunctive graph and five direct state features are designed. The action space is two-dimensional, consisting of operation selection and AGV assignment. Because the processing times and the effective transportation times in the job shop are fixed values, the reward function is constructed on this basis to guide the agent's learning. Finally, a 2D-PPO algorithm tailored to the two-dimensional action space is designed for training and learning, so that joint AGV and machine scheduling decisions can be made quickly. Tests on problem instances show that the 2D-PPO-based scheduling algorithm achieves good learning performance and scalability.
Keywords: job shop scheduling; automated guided vehicle; deep reinforcement learning; Markov decision process; proximal policy optimization; joint scheduling
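The abstract frames joint scheduling as an MDP with a two-dimensional action. One action picks which job's next operation to schedule and which AGV carries it. The sketch below illustrates what a single environment step could look like. It is a hypothetical illustration, not the authors' environment. The class name `JobShopAGVEnv` is invented here, and so is the layout: location 0 is a load/unload station and machine m sits at location m + 1. The sketch also assumes unlimited machine input buffers. Its reward is the negative increase of the partial makespan, so the episode return equals minus the final makespan. The paper's own reward, built from the fixed processing and effective transportation times, is not reproduced.

```python
from dataclasses import dataclass


@dataclass
class JobShopAGVEnv:
    """Hypothetical minimal simulator for joint machine/AGV job-shop scheduling.

    Assumptions (illustrative only, not from the paper):
    - location 0 is a load/unload station where all jobs and AGVs start;
      machine m sits at location m + 1;
    - an AGV carries one job per trip: an empty trip to the job, then a
      loaded ("effective") trip to the job's next machine;
    - machines have unlimited input buffers, so the AGV is released on arrival;
    - the final trip back to the load/unload station is ignored;
    - reward = -(increase of the partial makespan), so the episode return
      equals -makespan.
    """
    jobs: list    # jobs[j] = [(machine, proc_time), ...] in routing order
    travel: list  # travel[a][b] = AGV travel time from location a to location b
    n_agvs: int

    def reset(self):
        n_machines = 1 + max(m for ops in self.jobs for m, _ in ops)
        self.next_op = [0] * len(self.jobs)        # index of each job's next operation
        self.job_ready = [0.0] * len(self.jobs)    # finish time of each job's last operation
        self.job_loc = [0] * len(self.jobs)        # current location of each job
        self.machine_free = [0.0] * n_machines     # time each machine becomes idle
        self.agv_free = [0.0] * self.n_agvs        # time each AGV becomes idle
        self.agv_loc = [0] * self.n_agvs           # current location of each AGV
        self.makespan = 0.0
        return self.legal_jobs()

    def legal_jobs(self):
        """Jobs that still have an unscheduled operation (the operation dimension)."""
        return [j for j, ops in enumerate(self.jobs) if self.next_op[j] < len(ops)]

    def step(self, job, agv):
        """Apply the 2-D action: schedule `job`'s next operation, transported by `agv`."""
        assert job in self.legal_jobs(), "job has no remaining operation"
        machine, proc_time = self.jobs[job][self.next_op[job]]
        target = machine + 1
        # Empty trip: the AGV drives from its position to the job's position.
        reach_job = self.agv_free[agv] + self.travel[self.agv_loc[agv]][self.job_loc[job]]
        # Loading waits until the job has finished its previous operation.
        depart = max(reach_job, self.job_ready[job])
        # Loaded trip to the machine of the selected operation.
        arrive = depart + self.travel[self.job_loc[job]][target]
        start = max(arrive, self.machine_free[machine])
        end = start + proc_time

        self.agv_free[agv], self.agv_loc[agv] = arrive, target
        self.machine_free[machine] = end
        self.job_ready[job], self.job_loc[job] = end, target
        self.next_op[job] += 1

        old = self.makespan
        self.makespan = max(old, end)
        reward = old - self.makespan  # <= 0; sums to -makespan over an episode
        return reward, not self.legal_jobs()


# Usage: two jobs, two machines, one AGV, a hand-picked action sequence.
env = JobShopAGVEnv(
    jobs=[[(0, 3), (1, 2)], [(1, 4), (0, 1)]],
    travel=[[0, 1, 2], [1, 0, 1], [2, 1, 0]],
    n_agvs=1,
)
env.reset()
total, done = 0.0, False
for job in (0, 1, 0, 1):
    reward, done = env.step(job, agv=0)
    total += reward
print(env.makespan, total, done)  # 10.0 -10.0 True
```

The sketch selects a job rather than an explicit operation ID, because only each job's next unscheduled operation is eligible at any step. The AGV dimension then determines the empty trip to the job and the loaded trip to its next machine. Both trips feed into the operation's start time.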
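The abstract names a 2D-PPO algorithm for the two-dimensional action space but gives no further detail. One standard way to apply PPO to such a space is assumed below, and it is not necessarily the authors' design. The policy gets two heads, one for operations and one for AGVs, which are treated as conditionally independent given the state. Their log-probabilities are added, so the usual clipped surrogate applies to the joint probability ratio. The sketch uses NumPy only and omits the networks, including the CNN over the disjunctive-graph state. The logits are random placeholders, and the function names are hypothetical.

```python
import numpy as np


def masked_log_softmax(logits, mask):
    """Log-softmax over the last axis; entries where mask is False get log-prob -inf."""
    z = np.where(mask, logits, -np.inf)
    z = z - z.max(axis=-1, keepdims=True)  # assumes at least one legal entry per row
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def joint_log_prob(op_logits, agv_logits, op_mask, op_act, agv_act):
    """log pi(op, agv | s) = log pi_op(op | s) + log pi_agv(agv | s).

    Hypothetical factorization: the operation head and the AGV head are
    treated as conditionally independent given the state. (An AGV head
    conditioned on the chosen operation would also fit a 2-D action space.)
    """
    lp_op = masked_log_softmax(op_logits, op_mask)
    lp_agv = masked_log_softmax(agv_logits, np.ones_like(agv_logits, dtype=bool))
    rows = np.arange(len(op_act))
    return lp_op[rows, op_act] + lp_agv[rows, agv_act]


def ppo_clip_loss(new_logp, old_logp, adv, eps=0.2):
    """Standard PPO clipped surrogate on the joint ratio, negated for minimization."""
    ratio = np.exp(new_logp - old_logp)
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * adv
    return -np.mean(np.minimum(unclipped, clipped))


# Usage with random placeholder logits (no networks are trained here).
rng = np.random.default_rng(0)
batch, n_ops, n_agvs = 4, 5, 2
op_logits = rng.normal(size=(batch, n_ops))
agv_logits = rng.normal(size=(batch, n_agvs))
op_mask = np.ones((batch, n_ops), dtype=bool)
op_mask[:, -1] = False  # e.g. the last job has no remaining operation
op_act = np.array([0, 1, 2, 3])
agv_act = np.array([0, 1, 1, 0])
adv = np.array([1.0, -0.5, 0.3, 2.0])

old_logp = joint_log_prob(op_logits, agv_logits, op_mask, op_act, agv_act)
# Before any policy update new == old, so every ratio is 1 and the loss is -mean(adv).
print(ppo_clip_loss(old_logp, old_logp, adv))
```

Summing the two log-probabilities means a single clipping range constrains the change in the joint policy. Clipping each head separately would be another plausible reading of "2D-PPO", and the abstract does not say which one is used.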