
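Automated Negotiation Strategy Based on the TD3 Algorithm
Cite this article: CHEN Zuo-Ming, ZHAN Jie-Yu. Automated Negotiation Strategy Based on TD3 Algorithm[J]. Computer Systems & Applications, 2023, 32(3): 15-24.
Authors: CHEN Zuo-Ming, ZHAN Jie-Yu
Affiliation: School of Computer Science, South China Normal University, Guangzhou 510631, China
Funding: National Natural Science Foundation of China, Young Scientists Fund (62006085)
Abstract: Negotiation is the process by which people communicate on certain issues in search of a mutually acceptable agreement. Automated negotiation aims to lower negotiation costs, improve negotiation efficiency, and optimize negotiation outcomes through the use of negotiating agents. In recent years, deep reinforcement learning has begun to be applied to automated negotiation with good results, but problems remain: agents take a long time to train, strategies depend on specific negotiation domains, and negotiation information is underused. To address these problems, this paper proposes a negotiation strategy based on the TD3 deep reinforcement learning algorithm. Pre-training lowers the exploration cost of the training process; optimized state and action definitions make the strategy more robust so that it adapts to different negotiation scenarios; and a multi-head semantic neural network together with an opponent preference prediction module makes full use of the interaction information exchanged during negotiation. Experimental results show that the strategy completes negotiation tasks well in different negotiation environments.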

Keywords: automated negotiation; negotiation strategy; deep reinforcement learning; TD3 algorithm; preference prediction
Received: 2022/7/27
Revised: 2022/8/26

Automated Negotiation Strategy Based on TD3 Algorithm
CHEN Zuo-Ming, ZHAN Jie-Yu. Automated Negotiation Strategy Based on TD3 Algorithm[J]. Computer Systems & Applications, 2023, 32(3): 15-24.
Authors: CHEN Zuo-Ming, ZHAN Jie-Yu
Affiliation: School of Computer Science, South China Normal University, Guangzhou 510631, China
Abstract: Negotiation refers to the process in which people communicate with each other on certain topics to reach an agreement. Automated negotiation aims to reduce negotiation costs, improve negotiation efficiency, and optimize negotiation results by using negotiating agents. In recent years, deep reinforcement learning techniques have been applied to the field of automated negotiation with good results. However, problems remain, such as the long training time of agents, dependence on specific negotiation domains, and insufficient use of negotiation information. Therefore, this study proposes a negotiation strategy based on the TD3 deep reinforcement learning algorithm. It reduces the exploration cost of the training process through pre-training, and it improves the robustness of the negotiation strategy by optimizing the state and action definitions so that the strategy adapts to different negotiation scenarios. In addition, it makes full use of the interaction information of the negotiation through a multi-head semantic neural network and an opponent preference prediction module. The experimental results show that the strategy performs the negotiation task well in different negotiation environments.
Keywords: automated negotiation; negotiation strategy; deep reinforcement learning; TD3 algorithm; preference prediction