
End-to-end deep reinforcement learning framework for multi-depot vehicle routing problem

Cite this article:	LEI Kun, GUO Peng, WANG Qixin, ZHAO Wenchao, TANG Liansheng. End-to-end deep reinforcement learning framework for multi-depot vehicle routing problem[J]. Application Research of Computers, 2022, 39(10): 3013-3019.

Authors:	LEI Kun, GUO Peng, WANG Qixin, ZHAO Wenchao, TANG Liansheng

Affiliations:	School of Mechanical Engineering, Southwest Jiaotong University, Chengdu 610031; School of Mechanical Engineering, Southwest Jiaotong University, Chengdu 610031; Technology and Equipment of Rail Transit Operation and Maintenance Key Laboratory of Sichuan Province, Southwest Jiaotong University, Chengdu 610031; School of Economics and Management, Ningbo University of Technology, Ningbo, Zhejiang 315211

Funding:	Major Humanities and Social Sciences Research Program of Zhejiang Provincial Universities (2018QN060)

Received:	2022-03-07
Revised:	2022-09-09

Abstract:	To improve the efficiency of solving the multi-depot vehicle routing problem (MDVRP), this paper proposed an end-to-end deep reinforcement learning framework. First, it formulated the MDVRP as a Markov decision process (MDP), defining its state, action, and reward. It then used an improved graph attention network (GAT) as the encoder to embed the graph representation of the MDVRP, designed a Transformer-based decoder, and trained the resulting encoder-decoder model with an improved REINFORCE algorithm. The model is not bounded by the size of the graph: once trained, it can solve MDVRP instances with any number of depots and customers. Finally, results on randomly generated instances and published benchmark instances verified the feasibility and effectiveness of the framework. Notably, even on MDVRP instances with 100 customer nodes, the trained model takes only 2 ms on average to obtain solutions that are highly competitive with those of existing methods.

Keywords:	multi-depot vehicle routing problem; deep reinforcement learning; graph neural network; REINFORCE algorithm; Transformer model
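The abstract names the MDP components (state, action, reward) but does not define them. As a hedged illustration only, the sketch below assumes one common way to cast the MDVRP as an MDP: nodes 0..n_depots-1 are depots, every route starts and ends at the same depot, the state is (current node, home depot, remaining load, unserved customers), an action is the next node to visit, infeasible nodes are masked out, and the reward is the negative travel distance. The class `MDVRPEnv` and its free "switch depot before a route starts" action are hypothetical, not the authors' definitions.

```python
import math


class MDVRPEnv:
    """Toy MDP for the multi-depot VRP (hypothetical formulation, not the paper's).

    Nodes 0..n_depots-1 are depots, the remaining nodes are customers.
    Every route starts and ends at the same depot; each move is rewarded
    with the negative Euclidean distance travelled. Assumes every customer
    demand is <= capacity.
    """

    def __init__(self, depots, customers, demands, capacity):
        self.coords = list(depots) + list(customers)
        self.n_depots = len(depots)
        self.demands = [0.0] * self.n_depots + list(demands)
        self.capacity = capacity
        self.reset()

    def reset(self, start_depot=0):
        self.home = start_depot      # depot that owns the active route
        self.current = start_depot   # node where the active vehicle stands
        self.load = self.capacity    # remaining capacity of the active vehicle
        self.unvisited = set(range(self.n_depots, len(self.coords)))
        return self.state()

    def state(self):
        # State: position, home depot, remaining load, unserved customers.
        return (self.current, self.home, self.load, frozenset(self.unvisited))

    def done(self):
        # All customers served and the last vehicle is back at its depot.
        return not self.unvisited and self.current == self.home

    def feasible_actions(self):
        """Action mask: node indices the policy may choose next."""
        if self.done():
            return []
        fits = [c for c in sorted(self.unvisited) if self.demands[c] <= self.load]
        if self.current < self.n_depots:
            # Route not started yet: serve a customer or move to another depot.
            others = [d for d in range(self.n_depots) if d != self.current]
            return fits + others
        # Mid-route: continue to a customer that fits, or close the route at home.
        return fits + [self.home]

    def step(self, action):
        if action not in self.feasible_actions():
            raise ValueError(f"infeasible action {action}")
        (x0, y0), (x1, y1) = self.coords[self.current], self.coords[action]
        at_depot = self.current < self.n_depots
        if action < self.n_depots:
            # Closing a route costs the return trip; switching depots is free.
            reward = 0.0 if at_depot else -math.hypot(x1 - x0, y1 - y0)
            self.home, self.load = action, self.capacity
        else:
            reward = -math.hypot(x1 - x0, y1 - y0)
            self.load -= self.demands[action]
            self.unvisited.discard(action)
        self.current = action
        return self.state(), reward, self.done()
```

On a toy instance with depots at (0, 0) and (10, 0), unit-demand customers at (1, 0) and (9, 0), and capacity 1, replaying the action sequence 2, 0, 1, 3, 1 serves each customer from its nearer depot and accumulates a total reward of -4, the negative of the total route length.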
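The encoder is described only as an "improved" GAT; the improvements are not stated in the abstract. For orientation, the sketch below is a standard single-head GAT layer on a fully connected graph of depot and customer nodes, with attention logits e_ij = LeakyReLU(a^T [W h_i || W h_j]) normalized by a softmax over j. Because W and a are shared by all nodes, the same layer accepts any number of nodes, which is the property behind the claim that the model is not bounded by graph size. The function name `gat_layer`, the LeakyReLU slope 0.2, and the omission of multi-head attention and of the output nonlinearity are assumptions.

```python
import math


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def gat_layer(h, W, a):
    """Hypothetical single-head GAT layer on a fully connected graph.

    h: node feature vectors, one per depot/customer (any number of nodes)
    W: shared projection matrix, d_out rows of length d_in
    a: shared attention vector of length 2 * d_out
    Returns h'_i = sum_j alpha_ij * W h_j, with alpha_i = softmax_j(e_ij).
    """
    z = [matvec(W, hi) for hi in h]
    d_out = len(W)
    out = []
    for zi in z:
        # e_ij = LeakyReLU(a^T [z_i || z_j]) for every node j, including i itself
        e = [leaky_relu(sum(ak * v for ak, v in zip(a, zi + zj))) for zj in z]
        m = max(e)  # subtract the max for a numerically stable softmax
        w = [math.exp(x - m) for x in e]
        s = sum(w)
        alpha = [x / s for x in w]
        out.append([sum(alpha[j] * z[j][k] for j in range(len(z))) for k in range(d_out)])
    return out
```

With an all-zero attention vector every alpha_ij is uniform, so each node's output is simply the mean of the projected features; a learned a lets each node weight its neighbors unequally.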
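The Transformer-based decoder selects one node per step. A minimal sketch of the final pointer-attention step used in many attention-based routing models is shown below, assuming a single head, a decoder context vector `query`, node embeddings `keys`, the feasibility mask from the MDP, and tanh clipping with an assumed constant C = 10: a compatibility score is computed for every node, infeasible nodes receive -inf, and a softmax gives the action distribution. The names `decode_step` and `choose` are hypothetical, and the paper's decoder may differ.

```python
import math
import random


def decode_step(query, keys, mask, clip=10.0):
    """Hypothetical pointer-attention step: probabilities over nodes.

    query: decoder context vector (length d)
    keys:  one embedding per node, length d each (any number of nodes)
    mask:  True where the node is a feasible action
    clip:  tanh clipping constant C (assumed value)
    """
    if not any(mask):
        raise ValueError("no feasible action")
    d = len(query)
    scores = [
        clip * math.tanh(sum(q * k for q, k in zip(query, key)) / math.sqrt(d)) if ok else -math.inf
        for key, ok in zip(keys, mask)
    ]
    m = max(scores)  # finite because at least one node is feasible
    exps = [math.exp(s - m) if ok else 0.0 for s, ok in zip(scores, mask)]
    z = sum(exps)
    return [e / z for e in exps]


def choose(probs, greedy=True, rng=random):
    """Greedy decoding takes the arg-max; sampling is typically used in training."""
    if greedy:
        return max(range(len(probs)), key=probs.__getitem__)
    return rng.choices(range(len(probs)), weights=probs)[0]
```

Masked nodes always get probability exactly 0, so the decoder can never propose an infeasible move; and since scores are computed per node, the same parameters work for 3 nodes or 100.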
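The abstract says the model is trained with an "improved" REINFORCE algorithm but does not describe the improvement. The sketch below shows only the plain estimator such variants build on, grad J ≈ (L(pi) - b) * grad log p(pi), where L(pi) is the tour length of a sampled solution and b is a baseline. The exponential moving-average baseline, the hyperparameters, and the toy policy (a softmax over three fixed candidate solutions instead of a node-by-node decoder) are all assumptions for illustration.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def reinforce(costs, steps=2000, lr=0.1, beta=0.9, seed=0):
    """Hypothetical sketch: plain REINFORCE with a moving-average baseline.

    The toy 'policy' is a softmax over candidate solutions whose tour lengths
    are `costs`; a real model would produce these probabilities by decoding
    node by node.
    """
    rng = random.Random(seed)
    logits = [0.0] * len(costs)
    baseline = None
    for _ in range(steps):
        probs = softmax(logits)
        a = rng.choices(range(len(costs)), weights=probs)[0]  # sample a solution
        cost = costs[a]
        if baseline is None:
            baseline = cost
        advantage = cost - baseline  # positive means worse than recent samples
        # d log p(a) / d logit_j = 1[j == a] - p_j; stepping against
        # advantage * gradient lowers the expected tour length.
        for j in range(len(logits)):
            logits[j] -= lr * advantage * ((1.0 if j == a else 0.0) - probs[j])
        baseline = beta * baseline + (1 - beta) * cost
    return softmax(logits)
```

On the toy costs [5.0, 3.0, 4.0], the returned distribution concentrates on the cheapest candidate (index 1). The baseline does not bias the gradient but reduces its variance; a greedy-rollout baseline is a common alternative in neural routing work, and which one the paper uses is not stated in the abstract.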