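Multi-depot vehicle routing optimization based on end-to-end deep reinforcement learning
Cite this article: LEI Kun, GUO Peng, WANG Qixin, ZHAO Wenchao, TANG Liansheng. Multi-depot vehicle routing optimization based on end-to-end deep reinforcement learning[J]. Application Research of Computers (计算机应用研究), 2022, 39(10): 3013-3019.
Authors: LEI Kun  GUO Peng  WANG Qixin  ZHAO Wenchao  TANG Liansheng
Affiliations: School of Mechanical Engineering, Southwest Jiaotong University, Chengdu 610031; School of Mechanical Engineering, Southwest Jiaotong University, Chengdu 610031; Technology and Equipment of Rail Transit Operation and Maintenance Key Laboratory of Sichuan Province, Southwest Jiaotong University, Chengdu 610031; School of Economics and Management, Ningbo University of Technology, Ningbo, Zhejiang 315211
Funding: Major Humanities and Social Sciences Research Program of Zhejiang Provincial Universities (2018QN060)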
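Abstract (translated from the Chinese): To improve the efficiency of solving the multi-depot vehicle routing problem (MDVRP), an end-to-end deep reinforcement learning framework is proposed. First, the MDVRP is modeled as a Markov decision process (MDP), with definitions of its state, action, and reward. An improved graph attention network (GAT) serves as the encoder, producing feature embeddings of the graph representation of the MDVRP, and a Transformer-based decoder is designed. The model is trained with an improved REINFORCE algorithm. The model is not constrained by graph size: once trained, it can be applied to instances with any number of depots and customers. Finally, randomly generated instances and public benchmark instances verify the feasibility and effectiveness of the proposed framework. Even for MDVRP instances with 100 customer nodes, the trained model needs only 2 ms on average to obtain solutions that compare favorably with those of existing methods.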

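Keywords (translated from the Chinese): multi-depot vehicle routing problem  deep reinforcement learning  graph neural network  REINFORCE algorithm  Transformer model
Received: 2022/3/7
Revised: 2022/9/9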

End-to-end deep reinforcement learning framework for multi-depot vehicle routing problem
LEI Kun, GUO Peng, WANG Qixin, ZHAO Wenchao, TANG Liansheng. End-to-end deep reinforcement learning framework for multi-depot vehicle routing problem[J]. Application Research of Computers, 2022, 39(10): 3013-3019.
Authors: LEI Kun  GUO Peng  WANG Qixin  ZHAO Wenchao  TANG Liansheng
Affiliation: Southwest Jiaotong University
Abstract: This paper proposes an end-to-end deep reinforcement learning framework to improve the efficiency of solving the multi-depot vehicle routing problem (MDVRP). It first formulates the MDVRP as a Markov decision process (MDP), defining its state, action, and reward. It then uses an improved graph attention network (GAT) as the encoder to embed features of the graph representation of the MDVRP, and designs a Transformer-based decoder. An improved REINFORCE algorithm trains the resulting encoder-decoder model. The model is not bounded by the size of the graph: once trained, it can solve MDVRP instances of different scales. Finally, results on randomly generated instances and published standard instances verify the feasibility and effectiveness of the proposed framework. Notably, even on MDVRP instances with 100 customer nodes, the trained model takes only two milliseconds on average to obtain a solution that is highly competitive with existing methods.
Keywords:multi-depot vehicle routing problem  deep reinforcement learning  graph neural network  REINFORCE algorithm  Transformer model
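
Illustrative sketch (not from the paper): the abstract says the MDVRP is modeled as an MDP with a state, an action, and a reward, but it does not give the definitions. The Python sketch below shows one plausible reading under stated assumptions: the state is the vehicle position, remaining capacity, and set of served customers; an action is the next node to visit, restricted by a feasibility mask; the reward is the negative distance travelled. The names `MDVRPEnv`, `rollout`, and `random_policy`, the single-vehicle-at-a-time route construction, and the toy instance are all hypothetical. A uniform random policy stands in for the learned encoder-decoder.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


class MDVRPEnv:
    """Toy MDVRP episode (assumed formulation). Nodes 0..n_depots-1 are depots."""

    def __init__(self, depots, customers, demands, capacity):
        self.coords = list(depots) + list(customers)
        self.n_depots = len(depots)
        self.demands = [0] * self.n_depots + list(demands)
        self.capacity = capacity
        self.reset()

    def reset(self):
        self.visited = [False] * len(self.coords)
        self.pos = None    # current node; None means no route is open
        self.home = None   # depot the open route must return to
        self.load = 0      # remaining vehicle capacity
        self.routes = []

    def feasible_actions(self):
        """Action mask: nodes the vehicle may move to from the current state."""
        unserved = [i for i in range(self.n_depots, len(self.coords))
                    if not self.visited[i]]
        if self.pos is None:
            # Open a new route at any depot while customers remain.
            return list(range(self.n_depots)) if unserved else []
        reachable = [i for i in unserved if self.demands[i] <= self.load]
        if self.pos == self.home:
            return reachable            # a fresh route must serve a customer first
        return reachable + [self.home]  # otherwise returning home is also allowed

    def step(self, action):
        """Apply an action; return (reward, done). Reward is minus the distance moved."""
        if self.pos is None:
            self.pos = self.home = action
            self.load = self.capacity
            self.routes.append([action])
            return 0.0, False
        reward = -dist(self.coords[self.pos], self.coords[action])
        self.routes[-1].append(action)
        if action == self.home:
            self.pos = self.home = None
        else:
            self.visited[action] = True
            self.load -= self.demands[action]
            self.pos = action
        done = self.pos is None and all(self.visited[self.n_depots:])
        return reward, done


def random_policy(env, actions):
    """Placeholder for a learned policy: pick uniformly among feasible actions."""
    return random.choice(actions)


def rollout(env, policy):
    """Run one episode and return the total reward (minus the total route length)."""
    env.reset()
    total, done = 0.0, False
    while not done:
        reward, done = env.step(policy(env, env.feasible_actions()))
        total += reward
    return total


# Hypothetical toy instance: 2 depots, 5 customers, vehicle capacity 3.
random.seed(0)
env = MDVRPEnv(depots=[(0.0, 0.0), (1.0, 1.0)],
               customers=[(0.2, 0.1), (0.8, 0.9), (0.5, 0.5), (0.1, 0.9), (0.9, 0.2)],
               demands=[1, 2, 1, 1, 2],
               capacity=3)
episode_return = rollout(env, random_policy)
```

In a REINFORCE-style setup, `episode_return` would be the quantity weighting the log-probabilities of the sampled actions; how the paper's improved variant defines its baseline is not stated in the abstract.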