Abstract: | Discriminative correlation filter (DCF) played a dominant role in visual tracking tasks in early years. However, with the recent development of deep learning, the Siamese based networks begin to prevail. Unlike DCF, most Siamese network based tracking methods take the first frame as the reference, while ignoring the information from the subsequent frames. As a result, these methods may fail under unforeseeable situations (e.g. target scale/size changes, variant illuminations, occlusions etc.). Meanwhile, other deep learning based tracking methods learn discriminative filters online, where the training samples are extracted from a few fixed frames with predictable labels. However, these methods have the same limitations as Siamese-based trackers. The training samples are prone to have cumulative errors, which ultimately lead to tracking loss. In this situation, we propose SiamET, a Siamese-based network using Resnet-50 as its backbone with enhanced template module. Different from existing methods, our templates are acquired based on all historical frames. Extensive experiments have been carried out on popular datasets to verify the effectiveness of our method. It turns out that our tracker achieves superior performances than the state-of-the-art methods on 4 challenging benchmarks, including OTB100, VOT2018, VOT2019 and LaSOT. Specifically, we achieve an EAO score of 0.480 on VOT2018 with 31 FPS. Code is available at https://github.com/yu-1238/SiamET |