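Real-time visual tracking based on multi-level feature enhancement
Citation: FEI Dasheng, SONG Huihui, ZHANG Kaihua. Real-time visual tracking based on multi-level feature enhancement[J]. Journal of Computer Applications, 2020, 40(11): 3300-3305. DOI: 10.11772/j.issn.1001-9081.2020040514
Authors: FEI Dasheng  SONG Huihui  ZHANG Kaihua
Affiliation: 1. Jiangsu Key Laboratory of Big Data Analysis Technology (Nanjing University of Information Science and Technology), Nanjing 210044; 2. Jiangsu Collaborative Innovation Center on Atmospheric Environment and Equipment Technology (Nanjing University of Information Science and Technology), Nanjing 210044
Funding: National Natural Science Foundation of China (61872189, 61876088); Natural Science Foundation of Jiangsu Province (BK20191397, BK20170040).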
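Abstract: The fully convolutional Siamese tracking network (SiamFC) tends to drift off the target, and eventually lose it, when distractors with similar semantic information appear. To address this problem, a real-time visual tracking network based on multi-level feature enhancement (MFESiam) is designed, which strengthens the representation ability of high-level and shallow features separately to improve the robustness of the tracker. First, for shallow features, a lightweight yet effective feature fusion strategy is used, and a data augmentation technique simulates changes that occur in complex scenes, such as occlusion, similar-object distraction, and fast motion, to enhance the texture characteristics of the shallow features. Second, for high-level features, a pixel-aware global contextual attention module (PCAM) is proposed to improve the long-term localization of the target. Finally, extensive experiments are conducted on three challenging tracking benchmarks: OTB2015, GOT-10K, and the 2018 Visual Object Tracking benchmark (VOT2018). The results show that the success rate of the proposed algorithm exceeds that of the SiamFC baseline by 6.3 percentage points on OTB2015 and by 4.1 percentage points on GOT-10K, while the tracker runs at 45 frames per second, which is real-time. On the VOT2018 real-time challenge, the expected average overlap of the proposed algorithm surpasses that of the 2018 winner, the high-performance Siamese region proposal network tracker (SiamRPN), which verifies the effectiveness of the proposed algorithm.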

Keywords: visual tracking; data augmentation; attention mechanism; global context; long-term localization
Received: 2020-04-23
Revised: 2020-06-30

Multi-level feature enhancement for real-time visual tracking
FEI Dasheng, SONG Huihui, ZHANG Kaihua. Multi-level feature enhancement for real-time visual tracking[J]. Journal of Computer Applications, 2020, 40(11): 3300-3305. DOI: 10.11772/j.issn.1001-9081.2020040514
Authors:FEI Dasheng  SONG Huihui  ZHANG Kaihua
Affiliation: 1. Jiangsu Key Laboratory of Big Data Analysis Technology (Nanjing University of Information Science and Technology), Nanjing, Jiangsu 210044, China; 2. Jiangsu Collaborative Innovation Center on Atmospheric Environment and Equipment Technology (Nanjing University of Information Science and Technology), Nanjing, Jiangsu 210044, China
Abstract: The Fully-Convolutional Siamese visual tracking network (SiamFC) drifts away from the target when distractors with similar semantic information appear, which leads to tracking failure. To solve this problem, a Multi-level Feature Enhanced Siamese network (MFESiam) was designed to improve the robustness of the tracker by enhancing the representation capabilities of the high-level and shallow-level features separately. Firstly, for shallow-level features, a lightweight and effective feature fusion strategy was adopted, and a data augmentation technique was used to simulate changes in complex scenes, such as occlusion, similar-object interference and fast motion, to enhance the texture characteristics of the shallow features. Secondly, for high-level features, a Pixel-aware global Contextual Attention Module (PCAM) was proposed to improve localization by capturing long-range dependencies. Finally, extensive experiments were conducted on three challenging tracking benchmarks: OTB2015, GOT-10K and 2018 Visual-Object-Tracking (VOT2018). Experimental results show that the success rate of the proposed algorithm on OTB2015 and GOT-10K is higher than that of the baseline SiamFC by 6.3 and 4.1 percentage points respectively, and that the algorithm runs at 45 frames per second, achieving real-time tracking. In the VOT2018 real-time challenge, the expected average overlap of the proposed algorithm surpasses that of the champion, the high-performance Siamese Region Proposal Network tracker (SiamRPN), which verifies the effectiveness of the proposed algorithm.
Keywords: visual tracking; data augmentation; attention mechanism; global context; long-term localization
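
For context on the SiamFC baseline named in the abstract: SiamFC-style trackers score candidate positions by cross-correlating the feature map of the target template with the feature map of a larger search region, and take the peak of the response as the target position. The sketch below illustrates only that matching step in NumPy on toy arrays. It is not the MFESiam implementation and does not model PCAM or the feature fusion strategy; the function names `cross_correlate` and `locate_peak`, the array shapes, and the planted toy data are assumptions made for illustration.

```python
import numpy as np


def cross_correlate(template, search):
    """Slide `template` over `search` and return a response map.

    template: array of shape (C, h, w), features of the target exemplar.
    search:   array of shape (C, H, W), features of the larger search region.
    Returns an array of shape (H - h + 1, W - w + 1).
    """
    _, h, w = template.shape
    _, big_h, big_w = search.shape
    scores = np.empty((big_h - h + 1, big_w - w + 1))
    for y in range(scores.shape[0]):
        for x in range(scores.shape[1]):
            window = search[:, y:y + h, x:x + w]
            # Inner product over all channels and spatial positions.
            scores[y, x] = np.sum(window * template)
    return scores


def locate_peak(scores):
    """Return the (row, col) index of the highest response."""
    index = np.unravel_index(np.argmax(scores), scores.shape)
    return tuple(int(i) for i in index)


# Toy data (assumed): an all-positive template planted at row 4, column 2
# of an otherwise empty search feature map.
template = np.arange(1, 2 * 3 * 3 + 1, dtype=float).reshape(2, 3, 3)
search = np.zeros((2, 8, 8))
search[:, 4:7, 2:5] = template

scores = cross_correlate(template, search)
peak = locate_peak(scores)
```

In this toy setting the response peaks where the template was planted. A distractor whose features resemble the template would produce a second strong peak in the same map, which is the drift scenario the abstract describes.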