Current face video forgery detectors rely on large or dual-stream backbones. We show that a single, lightweight fusion of two handcrafted cues can achieve higher accuracy with a much smaller model. Building on the Xception baseline (21.9 million parameters), we construct two detectors. LFWS adds a 1×1 convolution to combine a low-frequency Wavelet-Denoised Feature (WDF) with the phase-only Spatial-Phase Shallow Learning (SPSL) map. LFWL merges WDF with Local Binary Patterns (LBP) in the same way. This extra module adds only 292 parameters, so the total stays at 21.9 million. That is smaller than F3Net (22.5 million) and less than half the size of SRM (55.3 million). Despite this minimal overhead, the fused models raise the average area under the curve (AUC) from 74.8% to 78.6% on FaceForensics++ and from 70.5% to 74.9% on DFDC-Preview. These are gains of 3.8 and 4.4 percentage points over the Xception baseline. They also consistently outperform F3Net, SRM, and SPSL across eight public benchmarks, without extra data or test-time augmentation. These results show that carefully paired handcrafted features, combined through the lightweight fusion block, can deliver state-of-the-art robustness at substantially lower cost. Our findings suggest a need to reevaluate scale-driven design choices in face video forgery detection.
- ‡ Carnegie Mellon University
- ** Work done while at Apple
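The fusion step described in the abstract can be illustrated as a 1×1 convolution over channel-stacked cue maps. What follows is a minimal, dependency-free sketch under stated assumptions, not the authors' implementation. It assumes the WDF and SPSL (or LBP) maps are already computed and share the same spatial size. It also assumes they are concatenated along the channel axis before the convolution and that the convolution has a bias term. The helper names `concat_channels`, `conv1x1`, and `conv1x1_param_count` are hypothetical. The code does not compute the cues themselves or model the Xception backbone.

```python
from typing import List

# Hypothetical representation: a feature map is a list of channels,
# each channel an H x W grid of floats.
FeatureMap = List[List[List[float]]]


def concat_channels(a: FeatureMap, b: FeatureMap) -> FeatureMap:
    """Stack two feature maps with the same height and width along the channel axis."""
    if not a or not b:
        raise ValueError("feature maps must have at least one channel")
    if len(a[0]) != len(b[0]) or len(a[0][0]) != len(b[0][0]):
        raise ValueError("feature maps must share height and width")
    return a + b


def conv1x1(x: FeatureMap, weight: List[List[float]], bias: List[float]) -> FeatureMap:
    """1x1 convolution: each output pixel is a weighted sum over the input
    channels at the same spatial position, plus a per-output-channel bias."""
    c_in = len(x)
    h, w = len(x[0]), len(x[0][0])
    if len(bias) != len(weight):
        raise ValueError("need one bias per output channel")
    out: FeatureMap = []
    for o, row in enumerate(weight):
        if len(row) != c_in:
            raise ValueError("each weight row must have one entry per input channel")
        out.append([
            [bias[o] + sum(row[c] * x[c][i][j] for c in range(c_in)) for j in range(w)]
            for i in range(h)
        ])
    return out


def conv1x1_param_count(c_in: int, c_out: int, bias: bool = True) -> int:
    """One weight per (output, input) channel pair, plus one bias per output channel."""
    return c_in * c_out + (c_out if bias else 0)


if __name__ == "__main__":
    # Toy single-channel cue maps on a 2x2 grid (illustrative values only).
    wdf = [[[1.0, 2.0], [3.0, 4.0]]]
    spsl = [[[0.5, 0.0], [1.0, -1.0]]]
    fused = conv1x1(concat_channels(wdf, spsl), weight=[[0.5, 2.0]], bias=[0.1])
    print(fused)
    print(conv1x1_param_count(c_in=2, c_out=1))
```

A 1×1 convolution mixes channels at each pixel without looking at neighboring pixels. That is why it adds so few parameters: `conv1x1_param_count` returns c_in·c_out + c_out. The abstract reports 292 added parameters but does not state the channel configuration, so the toy sizes here are not meant to reproduce that figure.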

