Introduction
Your deepfake detector just returned clean on a presentation attack. Not because the attack was sophisticated. Because the detector was never designed to catch it.
Deepfake detection and presentation attack detection (PAD) are treated as interchangeable in most vendor decks and many internal threat models. They are not. They operate on different signal types, look for different artifacts, and fail in different directions. Conflating them doesn't just create a gap in coverage. It creates a gap that an attacker can map and exploit deliberately.
Key Takeaways
- Re-capturing a deepfake through a screen destroys the AI generation artifacts that deepfake detectors rely on.
- Presentation attack detection looks for liveness cues, depth signals, and temporal coherence. Not generation artifacts.
- The two problems require different architectures, not different confidence thresholds on the same model.
- Before asking how accurate your detector is, know which attack path you are actually exposed to.
Why Your Deepfake Detector Returns Clean on a Screen Attack
Deepfake detectors are trained to find the fingerprints that AI generation leaves behind: spatial frequency anomalies, blending boundary artifacts, inconsistencies in eye reflections and skin texture rendered at pixel level. These signals are subtle, and they are also fragile.
When an attacker re-captures a deepfake by playing it on a screen and filming it with a phone, which is the simplest and most common presentation attack vector, something important happens to those signals. The scaling applied on playback and the recompression applied on capture flatten them. The moiré pattern introduced by filming a display at a slight angle scrambles them. Color shift from the display's color rendering and the camera's white balance buries them further. By the time the video reaches your detector, the generation artifacts that the model was trained to find are gone. The signal has been laundered through physics.
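A toy illustration of that laundering effect, not a model of any real detector: the sketch below builds a hypothetical one-dimensional scan line (smooth shading plus a faint alternating "generation artifact"), then passes it through a moving-average blur that stands in for the combined effect of display, lens, and re-encoding. The signal shapes, the kernel size, and the energy measure are all assumptions chosen for clarity.

```python
import math

def high_freq_energy(signal):
    """Mean squared difference between neighbouring samples.
    A crude proxy for how much high-frequency content is present."""
    diffs = [(b - a) ** 2 for a, b in zip(signal, signal[1:])]
    return sum(diffs) / len(diffs)

def recapture(signal, kernel=5):
    """Toy re-capture model (assumption): a moving-average blur standing in
    for display scaling, camera optics, and recompression."""
    half = kernel // 2
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out

# Hypothetical scan line: smooth shading plus a faint pixel-level artifact.
n = 256
shading = [0.5 + 0.3 * math.sin(2 * math.pi * i / n) for i in range(n)]
artifact = [0.02 * (1 if i % 2 == 0 else -1) for i in range(n)]
original = [s + a for s, a in zip(shading, artifact)]

before = high_freq_energy(original)
after = high_freq_energy(recapture(original))
print(f"high-frequency energy before: {before:.6f}, after: {after:.6f}")
```

The smooth shading survives the blur almost untouched, while the pixel-level pattern loses most of its energy. A detector that keys on that pattern has far less to work with after re-capture, even though the face looks the same to a human viewer.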
This is not a theoretical edge case. It is the default attack path for opportunistic fraud. The attacker does not need to understand your detection architecture. They just need a screen and a phone.
What PAD Is Actually Looking For
Presentation attack detection operates on entirely different assumptions. It does not ask whether a face was generated by a model. It asks whether a face is physically present in three-dimensional space.
PAD systems look for liveness cues: involuntary micro-movements, pupillary response to light changes, the subtle mechanical texture of skin under directional illumination. They look for depth signals: the parallax behavior of a face that occupies three-dimensional space versus a flat surface that does not. They look for temporal coherence: whether the motion patterns across a video sequence are consistent with a person responding to a real environment in real time, or with a recording playing back at fixed frame intervals.
None of these signals care whether the face on the screen was generated by a diffusion model or filmed in a studio. They care whether the object presenting that face has physical depth and live behavior. A printed photo and a deepfake played on a phone fail for the same PAD reason: neither is a living person. The generative provenance is irrelevant to the attack surface PAD is covering.
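To make the depth cue concrete, here is a minimal sketch that assumes a depth sensor returns distance samples along one horizontal line across the presented face. It fits a straight line to those samples and measures how far they deviate from it. The function names, the 3 mm threshold, and the sample profiles are hypothetical; a real PAD system would fit a plane over a full depth map and combine this with other cues.

```python
import math

def planarity_residual(xs, zs):
    """RMS distance of depth samples from their least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_z = sum(zs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxz = sum((x - mean_x) * (z - mean_z) for x, z in zip(xs, zs))
    slope = sxz / sxx
    intercept = mean_z - slope * mean_x
    squared = [(z - (slope * x + intercept)) ** 2 for x, z in zip(xs, zs)]
    return math.sqrt(sum(squared) / n)

def looks_flat(xs, zs, threshold_mm=3.0):
    """Flag a presented object as flat (assumed threshold, for illustration)."""
    return planarity_residual(xs, zs) < threshold_mm

# Sample positions in millimetres across the face region.
xs = [float(x) for x in range(-70, 71, 10)]

# Hypothetical profiles: a screen held at a tilt, and a face whose nose
# sits roughly 25 mm closer to the sensor than the cheeks.
tilted_screen = [400.0 + 0.2 * x for x in xs]
face = [400.0 - 25.0 * math.exp(-(x / 30.0) ** 2) for x in xs]

print("screen flat:", looks_flat(xs, tilted_screen))
print("face flat:", looks_flat(xs, face))
```

Nothing in this check inspects pixels for generation artifacts. A tilted screen still fits a line almost perfectly, whatever is displayed on it, which is why the generative provenance of the face does not enter into the decision.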
The Practical Implication: Attack Path First, Detector Second
The question most teams ask is: how accurate is our detector? That is the wrong first question. The right first question is: what attack paths are we actually exposed to?
If your threat model is synthetic identity fraud at onboarding, where an attacker submits a fully AI-generated face to a KYC flow, then deepfake detection, properly implemented, is the right primary control. The generation artifacts will be present in the submitted media because there is no re-capture step.
If your threat model is presentation attacks, where an attacker holds a phone up to a camera, then PAD is the right control, and a deepfake detector is covering a different risk entirely. Running a deepfake detector on that flow and reporting its accuracy is technically correct and operationally meaningless for the actual threat.
In practice, most real-world identity fraud attempts involve both vectors, or at least an attacker who will pivot between them based on what gets through. A pipeline that runs only PAD is exposed to synthetic media submitted directly, with no re-capture step. A pipeline that runs only deepfake detection is exposed to re-captured material. The architectures are complements, not substitutes, and they need to be deployed as such.
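One way to picture the complementary deployment is a decision step that consults both controls and rejects if either fires. The sketch below assumes each detector emits a score between 0 and 1 where higher means more suspicious; the `Scores` type, the `decide` function, the thresholds, and the example values are all hypothetical and do not reflect any vendor's API.

```python
from dataclasses import dataclass

@dataclass
class Scores:
    pad_attack: float  # hypothetical PAD score: likelihood of a presentation attack
    synthetic: float   # hypothetical deepfake score: likelihood of generated media

def decide(scores, pad_threshold=0.5, synthetic_threshold=0.5):
    """Reject if either control fires, and record which one did."""
    reasons = []
    if scores.pad_attack >= pad_threshold:
        reasons.append("presentation_attack")
    if scores.synthetic >= synthetic_threshold:
        reasons.append("synthetic_media")
    return ("reject" if reasons else "accept", reasons)

def deepfake_only(scores, synthetic_threshold=0.5):
    """The single-control pipeline, for comparison."""
    return "reject" if scores.synthetic >= synthetic_threshold else "accept"

# Illustrative values: a re-captured deepfake has lost its generation
# artifacts but fails liveness; directly submitted synthetic media is the reverse.
recaptured_deepfake = Scores(pad_attack=0.9, synthetic=0.1)
submitted_synthetic = Scores(pad_attack=0.1, synthetic=0.9)

print(decide(recaptured_deepfake))
print(decide(submitted_synthetic))
print(deepfake_only(recaptured_deepfake))
```

Recording the reason alongside the verdict keeps the two attack paths separate in reporting, so accuracy can be measured per path rather than averaged across both.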
The Gap That Conflation Creates
When teams conflate these problems, a predictable failure mode follows. They evaluate vendors on accuracy benchmarks, select a system that performs well on deepfake detection datasets, deploy it as their primary identity fraud control, and then discover it returns clean results on physical presentation attacks. The postmortem usually frames this as a detector accuracy problem. It is not. It is a threat model scoping problem.
The solution is not a better deepfake detector. It is understanding the distinction between the two problems before writing a vendor brief or setting an accuracy threshold. That distinction is architectural, not parametric. You cannot tune your way from one coverage area to the other.
The question worth sitting with: when your organization defines its detection requirements, is it starting from the attack paths it faces, or from the detection categories its current vendor supports?


















