Title: Presentation attack detection based on two-stream vision transformers with self-attention fusion

Affiliation: 1. University of Surrey, Centre for Vision, Speech and Signal Processing (CVSSP), Guildford, UK; 2. Data Mining Laboratory, Department of Engineering, College of Farabi, University of Tehran, Tehran, Iran

Abstract: To address the performance degradation of existing presentation attack detection methods under illumination variation, this paper proposes a two-stream vision transformer framework (TSViT) based on transfer learning in two complementary color spaces. Face images in the RGB color space and in the multi-scale retinex with color restoration (MSRCR) space are fed to TSViT to learn discriminative features for presentation attack detection. To fuse the features from the two sources (RGB images and MSRCR images) effectively, a self-attention-based feature fusion method is built, which captures the complementarity of the two feature sets. Experiments and analysis on the Oulu-NPU, CASIA-MFSD, and Replay-Attack databases show that TSViT outperforms most existing methods in intra-database testing and generalizes well in cross-database testing.

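The abstract describes fusing features from the RGB and MSRCR streams with self-attention so that tokens from both sources attend to each other. The paper's exact fusion module is not given here, so the following is only a minimal NumPy sketch of that idea: the two token sequences are concatenated, passed through scaled dot-product self-attention (with hypothetical projection matrices `w_q`, `w_k`, `w_v` that would be learned in practice), and mean-pooled into a single fused descriptor.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_fuse(feat_rgb, feat_msrcr, w_q, w_k, w_v):
    """Fuse two token sequences via joint scaled dot-product self-attention.

    feat_rgb, feat_msrcr: (n_tokens, d) features from each stream
                          (hypothetical shapes, not from the paper).
    w_q, w_k, w_v: (d, d) projection matrices (learned in a real model).
    Returns a single d-dimensional fused descriptor.
    """
    # Concatenate tokens so attention spans both streams jointly.
    tokens = np.concatenate([feat_rgb, feat_msrcr], axis=0)   # (2n, d)
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    # Each token attends to tokens of BOTH streams, capturing complementarity.
    attn = softmax(q @ k.T / np.sqrt(tokens.shape[1]))        # (2n, 2n)
    fused = attn @ v                                          # (2n, d)
    return fused.mean(axis=0)                                 # (d,)
```

In a trained TSViT-style model, the classification head would operate on this fused representation; the pooling and projection details here are illustrative choices, not the authors' implementation.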
Keywords: Presentation attack detection; Multi-scale retinex with color restoration; Vision transformer; Deep learning; Feature fusion
Indexed in ScienceDirect and other databases.
|