Chinese-Vietnamese Cross-Language News Event Retrieval Incorporating Event Entity Knowledge
Cite this article:XUE Zhenyu,YU Zhengtao,GAO Shengxiang.Chinese-Vietnamese Cross-Language News Event Retrieval Incorporating Event Entity Knowledge[J].Computer Engineering,2022,48(8):274.
Authors:XUE Zhenyu  YU Zhengtao  GAO Shengxiang
Affiliation:1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China;2. Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500, China
Funding:National Natural Science Foundation of China (61972186, 61762056, 61472168); National Key Research and Development Program of China (2018YFC0830105, 2018YFC0830101, 2018YFC0830100); Yunnan Provincial Major Science and Technology Special Project (202002AD080001); Yunnan Provincial High-Tech Talent Project (201606, 202105AC160018); Yunnan Provincial Basic Research Program (202001AS070014, 2018FB104).
Abstract:Existing Chinese-Vietnamese cross-language news event retrieval methods rarely use event entity knowledge from the news domain. In addition, when a candidate document contains multiple events, the events unrelated to the query sentence reduce the matching accuracy between the query sentence and the candidate document, which degrades retrieval performance. This study proposes a Chinese-Vietnamese cross-language news event retrieval model that incorporates event entity knowledge. A query translation method translates each Chinese event query sentence into a Vietnamese event query sentence, which turns the cross-language news event retrieval problem into a monolingual one. Because a query sentence contains only a single event, whereas the coexistence of multiple events in a candidate document hinders exact matching between the query sentence and the document, event trigger words are used to divide the candidate document into event ranges, which reduces the interference of events unrelated to the query. On this basis, a knowledge graph and the event trigger words are used to obtain knowledge-rich representations of event entities. Through the interaction between the query sentence and the document event ranges, the model extracts ranking features both between event entity knowledge representations and words and among the event entity knowledge representations themselves. Experimental results on a Chinese-Vietnamese bilingual news dataset show that, compared with baseline models such as BM25, Conv-KNRM, and ATER, the proposed model achieves better cross-language news event retrieval performance, with the NDCG and MAP metrics improving by up to 0.7122 and 0.5872, respectively.
Keywords:cross-language retrieval  event entity  event trigger word  event range  learning to rank  event retrieval
Received:2021-05-10
Revised:2021-09-05
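
The abstract reports results in terms of NDCG and MAP. For readers unfamiliar with these ranking metrics, the following is a minimal sketch of how they are commonly computed; it is not the authors' evaluation code. It assumes the linear-gain form of DCG (gain rel / log2(rank + 1)), and an average precision that is normalized by the number of relevant documents present in the ranked list. The function names and the toy relevance lists are illustrative assumptions, not taken from the paper.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k items (linear gain, assumed form)."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances, k):
    """DCG of the given ranking divided by the DCG of the ideal reordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def average_precision(relevances):
    """Mean of precision@rank taken at each rank holding a relevant document.

    Assumes the ranked list contains every relevant document for the query.
    """
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevances, start=1):
        if rel > 0:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings):
    """Average of per-query average precision over all queries."""
    if not rankings:
        return 0.0
    return sum(average_precision(r) for r in rankings) / len(rankings)


if __name__ == "__main__":
    # Hypothetical relevance labels for two queries, in ranked order.
    query_a = [1, 0, 1]
    query_b = [0, 1]
    print(round(ndcg_at_k(query_a, 3), 4))
    print(round(mean_average_precision([query_a, query_b]), 4))
```

Both metrics lie in the range 0 to 1, so the reported gains of up to 0.7122 (NDCG) and 0.5872 (MAP) are absolute differences on that scale.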