
Imitation Learning to Rank (模仿排序学习模型)

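Cite this article:	ZENG Wei, YU Weijie, XU Jun, LAN Yanyan, CHENG Xueqi. Imitation Learning to Rank [J]. Journal of Chinese Information Processing (中文信息学报), 2020, 34(1): 97-105.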

Authors:	ZENG Wei, YU Weijie, XU Jun, LAN Yanyan, CHENG Xueqi

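Affiliations:	1. CAS Key Lab of Network Data Science and Technology, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; 2. University of Chinese Academy of Sciences, Beijing 100049, China; 3. Beijing Key Laboratory of Big Data Management and Analysis Methods, Gaoling School of Artificial Intelligence, Renmin University of China, Beijing 100872, China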

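Funding:	National Natural Science Foundation of China (61872338, 61832017, 61773362, 61425016, 61472401, 61722211, 61906180); Beijing Outstanding Young Scientist Program (BJJWZYJH012019100020098); Beijing Academy of Artificial Intelligence (BAAI2019ZD0305); Renmin University of China Research Fund (2018030246); Youth Innovation Promotion Association of the Chinese Academy of Sciences, Outstanding Member Program (20144310, 2016102); National Key R&D Program of China (2016QY02D0405)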

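Abstract:	Document ranking has long been one of the key tasks in information retrieval (IR). Benefiting from the strong modeling capacity of Markov decision processes and the strong solving capacity of reinforcement learning methods, ranking models based on reinforcement learning have been proposed in recent years and have achieved good results. However, because the candidate documents contain a large number of irrelevant documents, reinforcement learning methods based on trial and error are inefficient. To address this problem, this paper proposes IR-DAGGER, a learning-to-rank algorithm based on imitation learning, which builds an expert policy from document label information and improves learning efficiency while maintaining ranking accuracy. To evaluate IR-DAGGER, experiments were conducted on the OHSUMED dataset for relevance ranking and the TREC dataset for diversified ranking. The results show that IR-DAGGER improves both the accuracy and the efficiency of document ranking on both datasets.
Keywords:	ranking; imitation learning; reinforcement learning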
Imitation Learning to Rank

ZENG Wei, YU Weijie, XU Jun, LAN Yanyan, CHENG Xueqi. Imitation Learning to Rank [J]. Journal of Chinese Information Processing, 2020, 34(1): 97-105.

Authors:	ZENG Wei, YU Weijie, XU Jun, LAN Yanyan, CHENG Xueqi

Affiliation:	1. CAS Key Lab of Network Data Science and Technology, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; 2. University of Chinese Academy of Sciences, Beijing 100049, China; 3. Gaoling School of Artificial Intelligence, Beijing Key Laboratory of Big Data Management and Analysis Methods, Renmin University of China, Beijing 100872, China

Abstract:	Document ranking is a central task in many IR applications. In recent years, reinforcement learning has been applied to learn document ranking models, and a number of methods have been developed. Although these methods have achieved preliminary success, they still suffer from the sparseness of relevant documents among the candidates. In this paper, we propose to involve ground-truth ranking lists in the learning process, yielding a novel imitation learning-based learning-to-rank algorithm called IR-DAGGER. It utilizes ranking lists sampled by an expert policy, which improves learning efficiency while maintaining ranking accuracy. Experimental results on OHSUMED and TREC show that IR-DAGGER outperforms state-of-the-art baselines on the tasks of relevance ranking and diverse ranking, indicating the effectiveness and efficiency of imitation learning in document ranking.

Keywords:	learning to rank; imitation learning; reinforcement learning
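
The abstract describes the idea only at a high level: an expert policy is built from document labels, and the ranker learns by imitating it rather than by trial and error. The following is a minimal sketch of a generic DAgger-style loop applied to ranking, not the authors' IR-DAGGER implementation. Everything concrete here is an assumption made for illustration: the linear scoring function, the perceptron update, the rule that the expert always picks the highest-labeled remaining document, the toy features and labels, and all function names.

```python
import random

def score(x, w):
    # Linear relevance score (an assumed model form, not from the paper).
    return sum(xi * wi for xi, wi in zip(x, w))

def expert_action(remaining, labels):
    # Assumed expert policy: pick the remaining document with the highest label.
    return max(remaining, key=lambda d: (labels[d], -d))

def learner_action(remaining, feats, w):
    # Learner policy: pick the remaining document with the highest score.
    return max(remaining, key=lambda d: (score(feats[d], w), -d))

def rollout(feats, labels, w, beta, rng):
    # Build one ranking step by step; at every visited state, record what
    # the expert would have chosen, whichever policy actually acts.
    remaining, visited = list(range(len(feats))), []
    while remaining:
        best = expert_action(remaining, labels)
        visited.append((tuple(remaining), best))
        use_expert = rng.random() < beta
        chosen = best if use_expert else learner_action(remaining, feats, w)
        remaining.remove(chosen)
    return visited

def fit(dataset, feats, dim, epochs=50):
    # Perceptron-style imitation: move w toward the expert's choice
    # whenever the learner disagrees with it.
    w = [0.0] * dim
    for _ in range(epochs):
        mistakes = 0
        for remaining, best in dataset:
            pred = learner_action(remaining, feats, w)
            if pred != best:
                mistakes += 1
                w = [wi + b - p for wi, b, p in zip(w, feats[best], feats[pred])]
        if mistakes == 0:
            break
    return w

def dagger_rank(feats, labels, iterations=5, seed=0):
    rng = random.Random(seed)
    dim = len(feats[0])
    w, dataset = [0.0] * dim, []
    for i in range(iterations):
        beta = 1.0 if i == 0 else 0.0  # expert acts first, learner afterwards
        dataset.extend(rollout(feats, labels, w, beta, rng))
        w = fit(dataset, feats, dim)   # retrain on the aggregated data
    return w

def rank(feats, w):
    # Final ranking: documents sorted by descending learned score.
    return sorted(range(len(feats)), key=lambda d: (-score(feats[d], w), d))

# Toy query with four candidate documents (hypothetical data).
feats = [[0.1, 0.9], [0.7, 0.2], [0.4, 0.8], [0.9, 0.5]]
labels = [0, 2, 1, 3]
w = dagger_rank(feats, labels)
print(rank(feats, w))
```

In this sketch the supervision at each step comes directly from the labels, so the learner receives a correct target at every position in the list even when most candidates are irrelevant. That is the efficiency argument the abstract makes against pure trial-and-error exploration.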
This article is indexed in VIP and other databases.