International Journal of Computer Vision

Towards Unified Defense for Face Forgery and Spoofing Attacks
via Dual Space Reconstruction Learning

Junyi Cao1     Ke-Yue Zhang2     Taiping Yao2     Shouhong Ding2     Xiaokang Yang1     Chao Ma1    


1 MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University    
2 Youtu Lab, Tencent    



Abstract


Real-world face recognition systems are vulnerable to diverse face attacks, ranging from digitally manipulated forgeries to physically crafted spoofing attacks. Existing works primarily use an image classification network to address one type of attack while disregarding the other. However, in real-world scenarios, face recognition systems encounter diverse attacks simultaneously, rendering such single-attack detection solutions ineffective. Moreover, excessive reliance on a classifier can easily fail on face attacks with unknown patterns, as the category-level differences learned by classification backbones do not generalize well to new attacks. Considering that real data are captured from actual individuals, while attack samples are generated by various distinct techniques, we focus on extracting compact representations of real faces. This allows us to identify the fundamental differences between genuine and attack images, enabling us to address both manipulated artifacts and spoofing attacks simultaneously. Concretely, we propose a dual space reconstruction learning framework that models the commonalities of genuine faces in both the spatial and frequency domains. With the learned characteristics of real faces, the model is more likely to segregate diverse attack samples as outliers from genuine images. In addition, we introduce a dynamic filtering module that filters out redundant information introduced by the reconstruction and enhances the critical divergence between real and attack samples, yielding better classification features. Since the training samples cover only limited style variations, which hampers generalization to unseen domains, we further design a consistency regularized training strategy that mimics distribution shifts during training and imposes specific constraints to encourage style-irrelevant features.
Moreover, given the lack of accessible benchmarks for unified evaluation of detection competence against both face forgery and spoofing attacks, we establish a new challenging benchmark, named UniAttack, to foster the exploration of effective solutions to face attack detection. Both qualitative and quantitative results on existing and proposed benchmarks demonstrate the superiority of our method over state-of-the-art approaches.
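The intuition behind dual space reconstruction can be illustrated with a minimal sketch (not the paper's implementation; the function name and error definitions below are our assumptions): a model trained to reconstruct only genuine faces should produce small residuals on real inputs and large residuals on attack samples, measured both in the pixel domain and in the Fourier domain.

```python
import numpy as np

def dual_space_reconstruction_error(face, reconstruction):
    """Spatial- and frequency-domain residuals between an input face
    and its reconstruction (both HxW float arrays in [0, 1])."""
    # Spatial-domain residual: mean absolute pixel-wise difference.
    spatial_err = np.abs(face - reconstruction).mean()
    # Frequency-domain residual: compare log-magnitude spectra of 2D FFTs,
    # which emphasizes differences in frequency content rather than phase.
    spec_in = np.log1p(np.abs(np.fft.fft2(face)))
    spec_rec = np.log1p(np.abs(np.fft.fft2(reconstruction)))
    freq_err = np.abs(spec_in - spec_rec).mean()
    return spatial_err, freq_err
```

At inference time, an attack sample would then be flagged when either residual is an outlier with respect to the residuals observed on genuine faces.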


New Benchmark: UniAttack


In real-world scenarios, the integrity of deployed face recognition systems is threatened by both digitally manipulated face forgeries and physically crafted spoofing samples. Regrettably, current research predominantly revolves around singular approaches that target either face forgeries or spoofing samples, neglecting comprehensive defense against both kinds of malicious attacks. This oversight in unified detection strategies can be attributed, in part, to the absence of clearly defined benchmarks for evaluating model performance on both face forgery and spoofing attacks. To advance the development of effective countermeasures against a diverse range of facial attacks, we present a new benchmark, named UniAttack, specifically designed for the unified detection of face forgery and spoofing attacks. This benchmark aims to foster the exploration of robust and generalizable solutions that enhance the security and reliability of face recognition systems in the presence of sophisticated malicious threats.


The proposed UniAttack benchmark contains three evaluation protocols. Protocol I is designed for intra-dataset assessment. Protocol II is used for evaluating models' competence in a cross-dataset scenario. Protocol III is for cross-type evaluation in which models are tested on unseen attack methods.


Experiments



Intra-dataset testing on FaceForensics++

Intra-dataset testing comparisons on FaceForensics++. Our method performs favorably against current state-of-the-art approaches.

Cross-dataset testing on OCIM

Cross-dataset testing on OULU-NPU, CASIA-MFSD, Replay-Attack, and MSU-MFSD.


Visual Results



Reconstruction visualization

Reconstruction visualization. (a): intra-dataset testing on FaceForensics++; (b): cross-dataset testing on the I&C&M to O protocol. Real faces are reconstructed well with little blur, while attack faces cannot be faithfully restored, as evidenced by the large reconstruction differences in both the spatial and frequency spaces. Best viewed in color and zoomed in.

Grad-CAM visualization

Grad-CAM visualization. (a): intra-dataset testing on FaceForensics++; (b): cross-dataset testing on the O&M&I to C protocol. The first row shows the input images; the second and third rows show Grad-CAM heatmaps of the baseline method and our approach, respectively. Best viewed in color.


Downloads



Reference