
Survey of Transformer Research in Computer Vision
Citation:LI Xiang, ZHANG Tao, ZHANG Zhe, WEI Hongyang, QIAN Yurong. Survey of Transformer Research in Computer Vision[J]. Computer Engineering and Applications, 2023, 59(1): 1-14.
Authors:LI Xiang  ZHANG Tao  ZHANG Zhe  WEI Hongyang  QIAN Yurong
Affiliation:College of Software, Xinjiang University, Urumqi 830002, China
Funding:National Natural Science Foundation of China (61966035)
Abstract:The Transformer is a deep neural network based on the self-attention mechanism. In recent years, Transformer-based models have become an active research direction in computer vision, and their architectures have been continually improved and extended, for example with local attention mechanisms and pyramid structures. This survey reviews and summarizes vision models built on improved Transformer architectures from two perspectives: performance optimization and structural improvement. It also compares the structural strengths and weaknesses of Transformers and convolutional neural networks (CNNs), and introduces a new hybrid CNN+Transformer structure. Finally, it summarizes the development of Transformers in computer vision and discusses future prospects.
Keywords:Transformer  convolutional neural network (CNN)  hybrid structure  computer vision  deep learning