
Research on Filling for Person Names in Image Captioning

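Citation:	ZHANG Jiashuo, HONG Yu, TANG Jian, CHENG Meng, YAO Jianmin. Research on Filling for Person Names in Image Captioning[J]. Journal of Chinese Information Processing, 2019, 33(9): 96-106.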

Authors:	ZHANG Jiashuo, HONG Yu, TANG Jian, CHENG Meng, YAO Jianmin

Affiliation:	School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China

Funding:	National Natural Science Foundation of China (61672367, 61672368)

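Abstract:	Thanks to advances in deep learning and the emergence of large-scale annotated image datasets, image captioning, a task that combines computer vision and natural language processing, has attracted wide attention. Inspired by neural machine translation, earlier work treats image captioning as a special kind of translation: the image is regarded as the source-side representation of the information and is translated, through an encoding-decoding process, into a natural-language sentence on the target side. Existing research has therefore adopted end-to-end neural network models and achieved good generation results. However, image captioning still faces many challenges, one of the most notable being the production of precise, specific wording. A precise caption is usually a tangible, concrete description, such as “Messi takes a penalty kick”, whereas current machine-generated captions tend to be shallow and monotonous, such as “a person is playing a ball”. To address this problem, this paper explores making captions more concrete and, in a preliminary experiment, focuses on recognizing and filling person-name entities in captions. Technically, the paper takes automatically generated image captions as input, removes abstract references to persons (e.g., “a person”, “a man”, “he”) or incorrect designations, and treats the resulting expression, which now contains a syntactic gap, as a cloze question; this allows reading comprehension techniques that target Who questions to be applied. Specifically, the paper uses the R-NET reading comprehension model to extract person-name entities and fill them into the captions. In addition, it attempts to extract person-name entities using both local information from the text in which the image appears and global information from external links. Experimental results show that the method effectively improves caption quality, raising BLEU by 2.93%; they also show that using global information helps find and fill in the correct person-name entities.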
Keywords:	image captioning; entity information; reading comprehension
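The cloze-style pipeline described in the abstract (remove a generic person mention, turn the gap into a Who question, let a reader extract a name from the accompanying text, fill the gap) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the list of generic person mentions, the Who-question template, and the `toy_reader` function are all hypothetical. In the paper the answer comes from a trained R-NET reading comprehension model; here `toy_reader` merely picks the candidate name mentioned most often in the passage so that the example runs without a model.

```python
import re
from collections import Counter
from typing import Callable, Optional, Sequence

# Hypothetical set of generic person mentions (the abstract's examples include
# "a person", "man", "he", "player"); longer phrases come first in the alternation.
GENERIC_PERSON = re.compile(
    r"\b(?:a person|a man|a woman|the man|the woman|a player|man|he|she)\b",
    re.IGNORECASE,
)
BLANK = "___"

# A reader maps (question, passage) to an answer span, or None if it finds none.
Reader = Callable[[str, str], Optional[str]]


def make_cloze(caption: str) -> Optional[str]:
    """Blank out the first generic person mention; return None if there is none."""
    cloze, n = GENERIC_PERSON.subn(BLANK, caption, count=1)
    return cloze if n else None


def cloze_to_who_question(cloze: str) -> str:
    """Hypothetical template: '___ is kicking a ball' -> 'Who is kicking a ball?'"""
    question = cloze.replace(BLANK, "who").rstrip(" .")
    return question[0].upper() + question[1:] + "?"


def toy_reader(candidates: Sequence[str]) -> Reader:
    """Stand-in for a reading comprehension model such as R-NET.

    It ignores the question and returns the candidate name that occurs most
    often in the passage (None if no candidate occurs at all).
    """
    def read(question: str, passage: str) -> Optional[str]:
        counts = Counter({name: passage.count(name) for name in candidates})
        if not counts:
            return None
        name, freq = counts.most_common(1)[0]
        return name if freq > 0 else None
    return read


def fill_caption(caption: str, passage: str, reader: Reader) -> str:
    """Replace a generic person mention with the name the reader extracts."""
    cloze = make_cloze(caption)
    if cloze is None:
        return caption  # no generic person mention to replace
    answer = reader(cloze_to_who_question(cloze), passage)
    return cloze.replace(BLANK, answer) if answer else caption


if __name__ == "__main__":
    passage = "Messi scored again. Messi takes the penalty kick as Ronaldo watches."
    reader = toy_reader(["Messi", "Ronaldo"])
    print(fill_caption("a person is kicking a ball", passage, reader))
```

Run as a script, this prints `Messi is kicking a ball`. Keeping the reader as a pluggable callable is a design choice of this sketch: the `passage` argument is presumably where the paper's local/global distinction would enter, being either only the document that contains the image (local information) or that document together with externally linked documents (global information).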
Research on Filling for Person Names in Image Captioning

ZHANG Jiashuo,HONG Yu,TANG Jian,CHENG Meng,YAO Jianmin.Research on Filling for Person Names in Image Captioning[J].Journal of Chinese Information Processing,2019,33(9):96-106.

Authors:	ZHANG Jiashuo, HONG Yu, TANG Jian, CHENG Meng, YAO Jianmin

Affiliation:	School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China

Abstract:	Current image captioning is challenged by the specificity of captions: an exact caption contains tangible, specific entities, whereas generated captions tend to be crude and monotonous (e.g., “Messi takes the penalty kick” vs. “a person is playing a ball”). Focusing on the identification and filling of person entities, this paper transforms the task into a cloze problem with a syntactic vacancy by removing common person representations (e.g., “man”, “player”) from the generated image caption. To introduce a reading comprehension framework that addresses the Who question, this paper uses R-NET to extract and fill the person-name entity. In addition, we attempt to use both local and global information to extract the person-name entity, where local information refers to the source document in which the image is located and global information refers to related documents reached through external links. Experiments show that the proposed method effectively improves the quality of generated image captions and increases BLEU by 2.93%.

Keywords:	image captioning; entity information; reading comprehension
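The reported gain is measured in BLEU. The abstract does not say which variant was used (n-gram order, corpus- vs. sentence-level, smoothing, tokenization), so the sketch below assumes the standard unsmoothed corpus-level BLEU-4: clipped n-gram precisions for n = 1..4, their geometric mean, and a brevity penalty. It only illustrates what the metric rewards and does not reproduce the paper's 2.93% figure.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count all n-grams (as tuples) in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(candidates: List[List[str]],
                references: List[List[List[str]]],
                max_n: int = 4) -> float:
    """Unsmoothed corpus-level BLEU; references[i] holds the references for candidates[i]."""
    clipped = [0] * max_n  # matched n-gram counts, clipped by reference counts
    totals = [0] * max_n   # total candidate n-gram counts
    cand_len = ref_len = 0
    for cand, refs in zip(candidates, references):
        cand_len += len(cand)
        # Effective reference length: the reference closest in length (ties -> shorter).
        ref_len += min((abs(len(r) - len(cand)), len(r)) for r in refs)[1]
        for n in range(1, max_n + 1):
            cand_counts = ngrams(cand, n)
            max_ref: Counter = Counter()
            for r in refs:
                max_ref |= ngrams(r, n)  # element-wise maximum over references
            clipped[n - 1] += sum(min(c, max_ref[g]) for g, c in cand_counts.items())
            totals[n - 1] += sum(cand_counts.values())
    if min(clipped) == 0:
        return 0.0  # without smoothing, any zero precision makes BLEU zero
    log_precision = sum(math.log(c / t) for c, t in zip(clipped, totals)) / max_n
    brevity_penalty = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    refs = [[["messi", "is", "playing", "a", "ball"]]]
    generic = [["a", "person", "is", "playing", "a", "ball"]]
    filled = [["messi", "is", "playing", "a", "ball"]]
    print(corpus_bleu(generic, refs), corpus_bleu(filled, refs))
```

In this toy case the generic caption scores (4/6 · 3/5 · 2/4 · 1/3)^(1/4) = (1/15)^(1/4) ≈ 0.51, while the caption with the name filled in matches the reference exactly and scores 1.0, which shows why replacing a vague person mention with the correct name can raise BLEU.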
