Attention meets involution in visual tracking期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

Attention meets involution in visual tracking

Abstract:	In visual tracking, both convolution and attention are widely employed for feature enhancement and fusion. However, convolution does not adequately model global dependencies of samples due to its operation on local neighbors, while attention gives too much attention to global dependencies and too little to local dependencies. It is intrinsically infeasible to combine both methods to integrate global and local information. However, a recently-proposed model called involution uses kernels differing in spatial extent but sharing across channels, making it possible to take advantage of both convolution and attention. We propose an attention-involution (Att-Inv) model that uses an attention mechanism to generate involution kernels to take both global and local dependencies of samples into account. To improve the performance of our tracker, we develop and implement strategies of backbone network modification, template updates, and regression of bounding box distributions. We evaluate our tracker using benchmarks such as GOT10k, LaSOT, TrackingNet and OxUvA. Experimental results show that it is competitive with state-of-the-art trackers.

Keywords:	Visual tracking Involution Attention
本文献已被 ScienceDirect 等数据库收录！