Flexible modal face presentation attack detection based on atrous single stream vision Transformer network


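Cite this article:	Xiao Lixuan, Feng Jun, Gao Yuhao, He Jingjing. Flexible modal face presentation attack detection based on atrous single stream vision Transformer network[J]. Application Research of Computers, 2024, 41(3): 916-922

Authors:	Xiao Lixuan, Feng Jun, Gao Yuhao, He Jingjing

Affiliation:	School of Information Science and Technology, Shijiazhuang Tiedao University

Funding:	National Natural Science Foundation of China (61772070, 61972267); Key Science and Technology Research Project of Higher Education Institutions of Hebei Province (ZD2021333)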

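Abstract:	Flexible modal face presentation attack detection removes the modal-consistency constraint that traditional multi-modal methods impose between model training and deployment, so a single unified model can be deployed on demand to real-world scenarios with varying modalities. However, existing approaches still leave room for improvement in model performance and demand substantial computing resources. To address this, the paper proposes a single-stream flexible modal face presentation attack detection network based on the vision Transformer (ViT) architecture. An atrous patch embedding module is proposed to reduce computational redundancy and lower the input vector dimension; a modal encoding token is designed to distinguish the features of different modalities; and a non-padding strategy is adopted to handle missing modalities. Experimental results on public multi-modal datasets show that the method achieves the best average ACER values of 2.69% and 33.81% in intra-domain and cross-domain evaluations, respectively. Compared with three existing methods, it offers better in-domain and out-of-domain generalization and more balanced performance across sub-protocols, and its computation and parameter counts are far lower than those of multi-stream methods, making it better suited to flexible and efficient deployment in scenarios with missing modalities.
Keywords:	face presentation attack detection; flexible modal; multi-modal; vision Transformer
Received:	2023-07-16
Revised:	2024-02-04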
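
The abstract names a modal encoding token and a non-padding strategy but gives no implementation details. The following is only a minimal sketch of the general idea, under loudly stated assumptions: the modality names (`rgb`, `depth`, `ir`), the additive scalar codes in `MODALITY_CODE`, and the function `build_sequence` are all hypothetical and are not taken from the paper. It shows how a single-stream model could receive a shorter token sequence when a modality is missing, instead of a zero-filled placeholder.

```python
from typing import Dict, List, Optional

Token = List[float]

# Hypothetical per-modality codes (assumption). In a real network these
# would be learned vectors; scalars keep the sketch dependency-free.
MODALITY_CODE: Dict[str, float] = {"rgb": 0.1, "depth": 0.2, "ir": 0.3}


def build_sequence(patch_tokens: Dict[str, Optional[List[Token]]]) -> List[Token]:
    """Concatenate the tokens of the available modalities into one sequence.

    A modality that is missing (None or absent) contributes no tokens at all,
    so nothing is padded; the sequence simply gets shorter.
    """
    sequence: List[Token] = []
    for name in ("rgb", "depth", "ir"):
        tokens = patch_tokens.get(name)
        if tokens is None:
            continue  # non-padding: skip the missing modality entirely
        code = MODALITY_CODE[name]
        # Mark every token with its modality code so that the shared
        # encoder can tell the modalities apart.
        sequence.extend([[value + code for value in token] for token in tokens])
    return sequence


# Four dummy 2-dimensional patch tokens per modality.
dummy = [[0.0, 0.0] for _ in range(4)]
full = build_sequence({"rgb": dummy, "depth": dummy, "ir": dummy})
partial = build_sequence({"rgb": dummy, "depth": None, "ir": dummy})
print(len(full), len(partial))
```

Because self-attention accepts variable-length sequences, the same weights could in principle serve every modality combination, which is consistent with the single unified model the abstract describes.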
Flexible modal face presentation attack detection based on atrous single stream vision Transformer network

Xiao Lixuan, Feng Jun, Gao Yuhao and He Jingjing. Flexible modal face presentation attack detection based on atrous single stream vision Transformer network[J]. Application Research of Computers, 2024, 41(3): 916-922

Authors:	Xiao Lixuan, Feng Jun, Gao Yuhao, He Jingjing

Abstract:	Flexible modal face presentation attack detection breaks through the modal-consistency limitation that traditional multi-modal methods place on model training and deployment, allowing a unified model to be deployed flexibly, on demand, to real scenarios with diverse modalities. However, model performance still needs improvement and the demand for computing resources remains high. Therefore, this paper proposes a single-stream flexible modal face presentation attack detection network based on the vision Transformer (ViT). It proposes an atrous patch embedding module to reduce operational redundancy and the input vector dimension, designs a modal encoding token to distinguish the features of different modalities, and adopts a non-padding strategy to handle missing modalities. Experimental results on publicly available multi-modal datasets show that the method obtains the best average ACER of 2.69% and 33.81% in the intra-domain and cross-domain evaluations, respectively. Compared with three existing methods, it achieves better in-domain and out-of-domain generalization and more balanced performance across sub-protocols. Its computation and parameter counts are far lower than those of multi-stream methods, making it more suitable for flexible and efficient deployment in modal absence scenarios.

Keywords:	face presentation attack detection; flexible modal; multi-modal; vision Transformer
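
The results above are reported as ACER (average classification error rate), which is the mean of APCER (attack presentations accepted as bona fide) and BPCER (bona fide presentations rejected as attacks). A minimal sketch of that computation follows. The function name `acer`, the score convention (higher means more likely bona fide), the use of the worst-case APCER over attack types, and the sample scores are assumptions made for illustration; the paper's exact evaluation protocol is not given in the abstract.

```python
from typing import Dict, List


def acer(attack_scores: Dict[str, List[float]],
         bona_fide_scores: List[float],
         threshold: float) -> float:
    """Return ACER = (APCER + BPCER) / 2 at a fixed decision threshold.

    Assumption: a score >= threshold means the sample is classified as
    bona fide. APCER is taken as the worst case over attack types.
    """
    # APCER per attack type: fraction of attacks wrongly accepted.
    apcer_per_type = [
        sum(score >= threshold for score in scores) / len(scores)
        for scores in attack_scores.values()
    ]
    apcer = max(apcer_per_type)
    # BPCER: fraction of bona fide samples wrongly rejected.
    bpcer = sum(score < threshold for score in bona_fide_scores) / len(bona_fide_scores)
    return (apcer + bpcer) / 2


# Made-up scores, purely illustrative.
attacks = {"print": [0.1, 0.9, 0.2, 0.3], "replay": [0.2, 0.1]}
bona_fide = [0.8, 0.4, 0.9, 0.7]
print(acer(attacks, bona_fide, threshold=0.5))  # 0.25
```

In this toy case one of four print attacks is accepted (APCER 0.25) and one of four bona fide samples is rejected (BPCER 0.25), so the ACER is 0.25; lower is better.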
