A image caption method of construction scene based on attention mechanism and encoding-decoding architecture

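Cite this article:	Yuan-jun NONG, Jun-jie WANG, Hong CHEN, Wen-han SUN, Hui GENG, Shu-yue LI. A image caption method of construction scene based on attention mechanism and encoding-decoding architecture [J]. Journal of Zhejiang University (Engineering Science), 2022, 56(2): 236-244.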

Authors:	Yuan-jun NONG, Jun-jie WANG, Hong CHEN, Wen-han SUN, Hui GENG, Shu-yue LI

Affiliation:	College of Engineering, Ocean University of China, Qingdao, Shandong 266100

Funding:	Shandong Provincial Key Research and Development Program (2019GHY112081)

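Abstract:	An image captioning method for construction scenes, based on an attention mechanism and an encoder-decoder architecture, is proposed to describe images of complex construction scenes such as poor lighting, night-time construction, and dense small targets at long range. A convolutional neural network serves as the encoder and extracts rich visual features from construction images. A long short-term memory network serves as the decoder; it captures the semantic features among the words within a sentence and learns the mapping between image features and word semantic features. An attention mechanism is introduced to focus on highly salient features and suppress non-salient ones, reducing interference from noise. To verify the effectiveness of the proposed method, an image captioning dataset covering 10 common construction scenes was built. Experimental results show that the proposed method achieves high accuracy, performs well at image captioning in complex construction scenes such as poor lighting, night-time construction, and dense small targets at long range, and has strong generalization and adaptability.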
Keywords:	image caption; construction scene; attention mechanism; encoding; decoding
A image caption method of construction scene based on attention mechanism and encoding-decoding architecture

Yuan-jun NONG, Jun-jie WANG, Hong CHEN, Wen-han SUN, Hui GENG, Shu-yue LI. A image caption method of construction scene based on attention mechanism and encoding-decoding architecture [J]. Journal of Zhejiang University (Engineering Science), 2022, 56(2): 236-244.

Authors:	Yuan-jun NONG, Jun-jie WANG, Hong CHEN, Wen-han SUN, Hui GENG, Shu-yue LI

Abstract:	An image caption method for construction scenes based on an attention mechanism and an encoding-decoding architecture was proposed in order to realize image captioning in complex construction scenes such as poor light, night construction, and long-distance dense small targets. A convolutional neural network was used to construct the encoder, which extracts rich visual features from construction images. A long short-term memory network was used to construct the decoder, which captures the semantic features among words in a sentence and learns the mapping relationship between image features and the semantic features of words. An attention mechanism was introduced to focus on salient features, suppress non-salient features, and reduce the interference of noise information. An image caption dataset containing ten common construction scenes was constructed in order to verify the effectiveness of the proposed method. Experimental results show that the proposed method achieves high accuracy, has good image caption performance in complex construction scenes such as poor light, night construction, and long-distance dense small targets, and has strong generalization and adaptability.

Keywords:	image caption; construction scene; attention mechanism; encoding; decoding
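
The attention step described in the abstract (weighting encoder features so that salient regions dominate and non-salient ones are suppressed) can be illustrated with a minimal sketch. The abstract does not give the paper's scoring function, so this example substitutes plain dot-product soft attention as an assumption. The names `softmax`, `soft_attention`, `features`, and `hidden` are hypothetical and chosen only for illustration, and the toy vectors stand in for real CNN features and LSTM states.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(features, hidden):
    """Return (weights, context) for one decoding step.

    features: list of region feature vectors from the encoder
    hidden:   decoder hidden state; assumed here to have the same
              length as each feature vector so a dot product works
    """
    # Relevance score of each region: dot product with the decoder state
    # (an assumed scoring function, not taken from the paper).
    scores = [sum(f_d * h_d for f_d, h_d in zip(f, hidden)) for f in features]
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the region features.
    dim = len(hidden)
    context = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return weights, context


# Toy input: three image regions with 2-dimensional features.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hidden = [2.0, 0.0]
weights, context = soft_attention(features, hidden)
```

In this toy input the second region has a score of zero against the decoder state, so it receives the smallest weight and contributes least to the context vector. In a full captioning model the context vector would be fed to the decoder together with the previous word at each step.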
