Deepfake Detection and PAD Are Not the Same Problem

Most identity pipelines run deepfake detectors on presentation attacks and get a clean pass. Here's why the architecture matters, and what each detector is actually solving.
Mohamed Ochalhi
May 20, 2026

Introduction

Your deepfake detector just returned clean on a presentation attack. Not because the attack was sophisticated. Because the detector was never designed to catch it.

Deepfake detection and presentation attack detection (PAD) are treated as interchangeable in most vendor decks and many internal threat models. They are not. They operate on different signal types, look for different artifacts, and fail in different directions. Conflating them doesn't just create a gap in coverage. It creates a gap that an attacker can map and exploit deliberately.

Attack vector × detector coverage

  • Injected synthetic face (AI-generated, no re-capture). Deepfake detector: detected, generation artifacts intact. PAD system: partial, no depth signal to test.
  • Deepfake via screen re-capture (phone filmed, artifacts laundered). Deepfake detector: missed, moiré and compression destroy the signal. PAD system: detected, no liveness or depth present.
  • Printed photo (physical paper, held to camera). Deepfake detector: varies, depends on photo origin. PAD system: detected, flat surface fails liveness and depth.
  • Virtual camera injection (deepfake piped into video stream). Deepfake detector: detected, artifacts present in raw stream. PAD system: blind spot, no physical presence to measure.
  • Live real person (legitimate user in frame). Deepfake detector: pass, no generation artifacts. PAD system: pass, liveness and depth confirmed.

Key Takeaways

  • Re-capturing a deepfake through a screen destroys the AI generation artifacts that deepfake detectors rely on.
  • Presentation attack detection looks for liveness cues, depth signals, and temporal coherence. Not generation artifacts.
  • The two problems require different architectures, not different confidence thresholds on the same model.
  • Before asking how accurate your detector is, know which attack path you are actually exposed to.

Why Your Deepfake Detector Returns Clean on a Screen Attack

Deepfake detectors are trained to find the fingerprints that AI generation leaves behind: spatial frequency anomalies, blending boundary artifacts, inconsistencies in eye reflections and skin texture rendered at pixel level. These signals are subtle, and they are also fragile.

The simplest and most common presentation attack is to play a deepfake on a screen and film it with a phone. That re-capture step does something important to those signals. The compression applied during screen playback flattens them. The moiré pattern introduced by filming a display at a slight angle scrambles them. Color shift from the display's white balance adjustment buries them further. By the time the video reaches your detector, the generation artifacts that the model was trained to find are gone. The signal has been laundered through physics.

Deepfake detection: what the detector hunts

  • Spatial frequency anomalies
  • Blending artifacts at face boundaries
  • Specular highlight inconsistencies

Path 1: direct injection

  • AI face generated: a generative model produces a synthetic face.
  • Injected into video stream: no re-capture, so the artifacts travel intact.
  • Deepfake detector: spatial frequencies intact, signal clear.
  • Result: detected. Generation artifacts are clearly present and the detector catches them.

Path 2: screen re-capture

  • AI face generated: same generative model, same synthetic face.
  • Played on screen, filmed with phone: re-capture introduces physical interference.
  • Physics launders the signal: compression flattens spatial frequencies, the moiré pattern scrambles boundary artifacts, and color shift buries highlight fingerprints.
  • Result: returns clean. The signal was destroyed by re-capture physics and the detector has nothing to find.
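The laundering effect can be illustrated with a toy one-dimensional signal. The sketch below is not a model of any real detector or camera: it assumes a "generation artifact" can be stood in for by a small high-frequency sine wave, and that re-capture can be stood in for by a simple moving-average blur. The names `dft_magnitude`, `box_blur`, and the bin choices are all illustrative assumptions.

```python
import cmath
import math


def dft_magnitude(signal, k):
    """Normalized magnitude of the k-th DFT bin of a real-valued sequence."""
    n = len(signal)
    total = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(signal))
    return abs(total) / n


def box_blur(signal, width=5):
    """Circular moving average: a crude stand-in for re-capture softening."""
    n = len(signal)
    half = width // 2
    return [
        sum(signal[(i + d) % n] for d in range(-half, half + 1)) / (2 * half + 1)
        for i in range(n)
    ]


N = 256             # samples in the toy scanline
CONTENT_BIN = 3     # low-frequency "face" content (assumed)
ARTIFACT_BIN = 100  # high-frequency "generation artifact" (assumed)

# Direct injection: the artifact rides on top of the content untouched.
injected = [
    math.sin(2 * math.pi * CONTENT_BIN * i / N)
    + 0.05 * math.sin(2 * math.pi * ARTIFACT_BIN * i / N)
    for i in range(N)
]

# Screen re-capture: the same signal after a low-pass blur.
recaptured = box_blur(injected)

# Fraction of each component's amplitude that survives the blur.
artifact_kept = dft_magnitude(recaptured, ARTIFACT_BIN) / dft_magnitude(injected, ARTIFACT_BIN)
content_kept = dft_magnitude(recaptured, CONTENT_BIN) / dft_magnitude(injected, CONTENT_BIN)

print(f"artifact amplitude kept: {artifact_kept:.3f}")
print(f"content amplitude kept:  {content_kept:.3f}")
```

In this toy setup the face-like content survives almost untouched while only a small fraction of the high-frequency component remains, which is the asymmetry the re-capture path exploits. Real re-capture adds moiré and color shift on top of blur, so this sketch covers only one of the three effects.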

This is not a theoretical edge case. It is the default attack path for opportunistic fraud. The attacker does not need to understand your detection architecture. They just need a screen and a phone.

What PAD Is Actually Looking For

Presentation attack detection operates on entirely different assumptions. It does not ask whether a face was generated by a model. It asks whether a face is physically present in three-dimensional space.

PAD systems look for liveness cues: involuntary micro-movements, pupillary response to light changes, the subtle mechanical texture of skin under directional illumination. They look for depth signals: the parallax behavior of a face that occupies three-dimensional space versus a flat surface that does not. They look for temporal coherence: whether the motion patterns across a video sequence are consistent with a person responding to a real environment in real time, or with a recording playing back at fixed frame intervals.

None of these signals care whether the face on the screen was generated by a diffusion model or filmed in a studio. They care whether the object presenting that face has physical depth and live behavior. A printed photo and a deepfake played on a phone fail for the same PAD reason: neither is a living person. The generative provenance is irrelevant to the attack surface PAD is covering.
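The depth signal can be made concrete with a pinhole-camera sketch. Under that model, a small sideways camera movement shifts each point in the image by an amount inversely proportional to its distance, so points at different depths shift by different amounts. The code below assumes a flat surface held parallel to the camera; the depths, focal length, and camera shift are hypothetical values chosen for illustration, and `parallax_spread` is not taken from any PAD product.

```python
from statistics import pstdev


def parallax_shifts(depths_m, camera_shift_m, focal_px):
    """Image shift in pixels of each point for a sideways camera move (pinhole model)."""
    return [focal_px * camera_shift_m / z for z in depths_m]


def parallax_spread(depths_m, camera_shift_m=0.02, focal_px=1000.0):
    """Standard deviation of the per-point shifts: zero when every point is at one depth."""
    return pstdev(parallax_shifts(depths_m, camera_shift_m, focal_px))


# Hypothetical distances (meters) to nose tip, both eyes, both ears.
LIVE_FACE = [0.38, 0.40, 0.40, 0.47, 0.47]
# The same landmarks shown on a screen or printed photo parallel to the camera.
FLAT_SURFACE = [0.40, 0.40, 0.40, 0.40, 0.40]

print("live face spread (px):   ", round(parallax_spread(LIVE_FACE), 2))
print("flat surface spread (px):", round(parallax_spread(FLAT_SURFACE), 2))
```

The flat surface produces identical shifts for every landmark, so the spread collapses to zero regardless of what face is displayed on it. That is why the check is indifferent to how the face was produced.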

Presentation attack detection: what the system measures

  • Printed photo (physical paper held to camera). Liveness cues: no micro-movements. Depth / parallax: flat surface. Temporal coherence: static image. Caught by PAD: fails all three signals.
  • Screen replay (video played on screen, filmed with phone). Liveness cues: no pupillary response. Depth / parallax: flat screen. Temporal coherence: fixed frame intervals. Caught by PAD: fails depth and liveness.
  • Silicone mask (physical 3D object worn over real face). Liveness cues: no micro-movements or pupil response. Depth / parallax: has physical depth. Temporal coherence: unnatural motion patterns. Caught by PAD: passes depth, fails liveness.
  • Live real person (legitimate user in front of camera). Liveness cues: micro-movements and pupil response. Depth / parallax: three-dimensional face. Temporal coherence: natural, non-repeating motion. Passes PAD: all three signals confirmed.

The Practical Implication: Attack Path First, Detector Second

The question most teams ask is: how accurate is our detector? That is the wrong first question. The right first question is: what attack paths are we actually exposed to?

If your threat model is synthetic identity fraud at onboarding, where an attacker submits a fully AI-generated face to a KYC flow, then deepfake detection, properly implemented, is the right primary control. The generation artifacts will be present in the submitted media because there is no re-capture step.

If your threat model is presentation attacks, where an attacker holds a phone up to a camera, then PAD is the right control, and a deepfake detector is covering a different risk entirely. Running a deepfake detector on that flow and reporting its accuracy is technically correct and operationally meaningless for the actual threat.

In practice, most real-world identity fraud attempts involve both vectors, or at least an attacker who will pivot between them based on what gets through. A pipeline that runs only PAD is exposed to injected synthetic media. A pipeline that runs only deepfake detection is exposed to re-captured material. The architectures are complements, not substitutes, and they need to be deployed as such.
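A minimal sketch of what "complements, not substitutes" could look like in a pipeline: every deployed detector runs on every capture, and a capture is accepted only if none of them flags it. The two detector functions are stand-ins that simply encode the coverage described in this article; the feature names (`generation_artifacts_intact`, `physical_capture`, `live_3d_face`) are hypothetical and do not correspond to any real product's output.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Capture = Dict[str, bool]
Detector = Callable[[Capture], bool]  # returns True when it flags an attack


def deepfake_detector(capture: Capture) -> bool:
    # Stand-in: fires only when generation artifacts survive in the media.
    return capture["generation_artifacts_intact"]


def pad_system(capture: Capture) -> bool:
    # Stand-in: fires when a physical camera saw something that is not a live 3D face.
    return capture["physical_capture"] and not capture["live_3d_face"]


@dataclass(frozen=True)
class Decision:
    accepted: bool
    flagged_by: List[str]


def evaluate(capture: Capture, detectors: Dict[str, Detector]) -> Decision:
    """Run every detector; accept only when none of them flags the capture."""
    flagged_by = [name for name, detect in detectors.items() if detect(capture)]
    return Decision(accepted=not flagged_by, flagged_by=flagged_by)


SCREEN_RECAPTURE = {"generation_artifacts_intact": False, "physical_capture": True, "live_3d_face": False}
VIRTUAL_CAMERA = {"generation_artifacts_intact": True, "physical_capture": False, "live_3d_face": False}
LIVE_PERSON = {"generation_artifacts_intact": False, "physical_capture": True, "live_3d_face": True}

BOTH = {"deepfake": deepfake_detector, "pad": pad_system}

for label, capture in [("screen re-capture", SCREEN_RECAPTURE),
                       ("virtual camera", VIRTUAL_CAMERA),
                       ("live person", LIVE_PERSON)]:
    print(label, evaluate(capture, BOTH))
```

Passing only `{"deepfake": deepfake_detector}` lets the screen re-capture through, and passing only `{"pad": pad_system}` lets the virtual camera injection through. Keeping `flagged_by` in the result also records which control caught an attempt, which helps when an attacker pivots between vectors.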

The Gap That Conflation Creates

The gap: which system catches which attack

The two cells that matter: screen re-capture passes deepfake detection because physics destroyed the signal, and virtual camera injection passes PAD because there was never a physical surface to test. Deepfake detection hunts generative artifacts; PAD measures physical presence.

  • Injected synthetic face (AI-generated, no re-capture). Deepfake detection: caught, generation artifacts intact. PAD: partial, no depth signal to test.
  • Screen re-capture (deepfake filmed via phone, artifacts laundered). Deepfake detection: blind spot, physics destroyed the signal. PAD: caught, no liveness or depth.
  • Printed photo (physical paper held to camera). Deepfake detection: varies, depends on photo origin. PAD: caught, flat surface fails depth and liveness.
  • Virtual camera injection (deepfake piped directly into video stream). Deepfake detection: caught, artifacts present in raw stream. PAD: blind spot, no physical surface to test.
  • Live real person (legitimate user in frame). Deepfake detection: pass, no generation artifacts. PAD: pass, liveness and depth confirmed.
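The coverage comparison above can also be written down as data and queried at planning time. The sketch below transcribes the four attack rows into a dictionary and lists which attack paths in a threat model have no detector that reliably catches them. Treating "partial" and "varies" as not covered is an assumption made here for the sake of a conservative answer, and the `uncovered` helper is hypothetical.

```python
from typing import Dict, List

# Transcribed from the coverage comparison above.
COVERAGE: Dict[str, Dict[str, str]] = {
    "injected synthetic face": {"deepfake": "caught", "pad": "partial"},
    "screen re-capture": {"deepfake": "blind spot", "pad": "caught"},
    "printed photo": {"deepfake": "varies", "pad": "caught"},
    "virtual camera injection": {"deepfake": "caught", "pad": "blind spot"},
}


def uncovered(threat_model: List[str], deployed: List[str]) -> List[str]:
    """Attack paths that no deployed detector is listed as catching."""
    return [
        attack
        for attack in threat_model
        if not any(COVERAGE[attack][detector] == "caught" for detector in deployed)
    ]


all_attacks = list(COVERAGE)
print("deepfake only:", uncovered(all_attacks, ["deepfake"]))
print("PAD only:     ", uncovered(all_attacks, ["pad"]))
print("both:         ", uncovered(all_attacks, ["deepfake", "pad"]))
```

Starting from the threat model and asking what is left uncovered keeps the scoping question ahead of any accuracy benchmark.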

When teams conflate these problems, a predictable failure mode follows. They evaluate vendors on accuracy benchmarks, select a system that performs well on deepfake detection datasets, deploy it as their primary identity fraud control, and then discover it returns clean results on physical presentation attacks. The postmortem usually frames this as a detector accuracy problem. It is not. It is a threat model scoping problem.

The solution is not a better deepfake detector. It is understanding the distinction between the two problems before writing a vendor brief or setting an accuracy threshold. That distinction is architectural, not parametric. You cannot tune your way from one coverage area to the other.

The question worth sitting with: when your organization defines its detection requirements, is it starting from the attack paths it faces, or from the detection categories its current vendor supports?

Mohamed Ochalhi
DuckDuckGoose AI
