
Survey of Transformer Research in Computer Vision

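Cite this article:	LI Xiang, ZHANG Tao, ZHANG Zhe, WEI Hongyang, QIAN Yurong. Survey of Transformer Research in Computer Vision[J]. Computer Engineering and Applications, 2023, 59(1): 1-14.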

Authors:	LI Xiang, ZHANG Tao, ZHANG Zhe, WEI Hongyang, QIAN Yurong

Affiliation:	College of Software, Xinjiang University, Urumqi 830002, China

Funding:	National Natural Science Foundation of China (61966035)

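Abstract:	The Transformer is a deep neural network based on the self-attention mechanism. In recent years, Transformer-based models have become a popular research direction in computer vision, and their architectures have been continually improved and extended, for example with local attention mechanisms and pyramid structures. This paper surveys vision models built on improved Transformer architectures from two perspectives: performance optimization and structural improvement. It also compares the structural strengths and weaknesses of Transformers and CNNs and introduces a new hybrid CNN+Transformer architecture. Finally, it summarizes the development of Transformers in computer vision and discusses future prospects.
Keywords:	Transformer; convolutional neural network (CNN); hybrid structure; computer vision; deep learning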
Survey of Transformer Research in Computer Vision

LI Xiang, ZHANG Tao, ZHANG Zhe, WEI Hongyang, QIAN Yurong. Survey of Transformer Research in Computer Vision[J]. Computer Engineering and Applications, 2023, 59(1): 1-14.

Authors:	LI Xiang, ZHANG Tao, ZHANG Zhe, WEI Hongyang, QIAN Yurong

Affiliation:	College of Software, Xinjiang University, Urumqi 830002, China

Abstract:	Transformer is a deep neural network based on the self-attention mechanism. In recent years, Transformer-based models have become a hot research direction in computer vision, and their structures are constantly being improved and expanded with, for example, local attention mechanisms and pyramid structures. Vision models based on improved Transformer structures are reviewed and summarized from two aspects: performance optimization and structural improvement. In addition, the advantages and disadvantages of the Transformer and convolutional neural network (CNN) structures are compared and analyzed, and a new hybrid CNN+Transformer structure is introduced. Finally, the development of Transformer in computer vision is summarized and its prospects are discussed.

Keywords:	Transformer; convolutional neural network (CNN); hybrid structure; computer vision; deep learning
