From Vision to Text: A Brief Survey for Image Captioning

Cite this article:	WEI Zhongyu, FAN Zhihao, WANG Ruize, CHENG Yijing, ZHAO Wangrong, HUANG Xuanjing. From Vision to Text: A Brief Survey for Image Captioning[J]. Journal of Chinese Information Processing, 2020, 34(7): 19-29

Authors:	WEI Zhongyu, FAN Zhihao, WANG Ruize, CHENG Yijing, ZHAO Wangrong, HUANG Xuanjing

Affiliation:	1. School of Data Science, Fudan University, Shanghai 200433, China; 2. Academy for Engineering and Technology, Fudan University, Shanghai 200433, China; 3. School of Computer Science and Technology, Fudan University, Shanghai 200433, China

Funding:	National Natural Science Foundation of China (71991471); National Social Science Fund of China (20ZDA060); Science and Technology Commission of Shanghai Municipality (18DZ1201000, 17JC1420200)

Abstract:	Cross-modal research, especially work connecting vision and language, has attracted growing attention in recent years. This survey reviews the literature on image captioning, a core task in vision-and-language research. It covers four aspects: frameworks for vision-based text generation, key problems in vision-based text generation, performance evaluation of image captioning models, and the main stages in the development of image captioning models. It closes with several directions for future research: feature alignment across the visual and language modalities, the design of automatic evaluation metrics, and diverse image caption generation.

Keywords:	image captioning; cross-modality alignment; literature review
