Deepfake Detection vs PAD: Know the Gap


Introduction

Your deepfake detector just returned clean on a presentation attack. Not because the attack was sophisticated. Because the detector was never designed to catch it.

Deepfake detection and presentation attack detection (PAD) are treated as interchangeable in most vendor decks and many internal threat models. They are not. They operate on different signal types, look for different artifacts, and fail in different directions. Conflating them doesn't just create a gap in coverage. It creates a gap that an attacker can map and exploit deliberately.

Attack vector × detector coverage

| Attack vector | Deepfake detector | PAD system |
|---|---|---|
| Injected synthetic face (AI-generated, no re-capture) | Detected: generation artifacts intact | Partial: no depth signal to test |
| Deepfake via screen re-capture (phone filmed, artifacts laundered) | Missed: moiré and compression destroy the signal | Detected: no liveness or depth present |
| Printed photo (physical paper, held to camera) | Varies: depends on photo origin | Detected: flat, fails liveness and depth |
| Virtual camera injection (deepfake piped into video stream) | Detected: artifacts present in raw stream | Missed: no physical surface to test |
| Live real person (legitimate user in frame) | Pass: no generation artifacts | Pass: liveness and depth confirmed |

Key Takeaways

Re-capturing a deepfake through a screen destroys the AI generation artifacts that deepfake detectors rely on.
Presentation attack detection looks for liveness cues, depth signals, and temporal coherence. Not generation artifacts.
The two problems require different architectures, not different confidence thresholds on the same model.
Before asking how accurate your detector is, know which attack path you are actually exposed to.

Why Your Deepfake Detector Returns Clean on a Screen Attack

Deepfake detectors are trained to find the fingerprints that AI generation leaves behind: spatial frequency anomalies, blending artifacts at face boundaries, and pixel-level inconsistencies in eye reflections and skin texture. These signals are subtle, and they are also fragile.

When an attacker re-captures a deepfake by playing it on a screen and filming it with a phone, which is the simplest and most common presentation attack vector, something important happens to those signals. Compression and resampling along the playback-and-capture chain flatten them. The moiré pattern that appears when a camera sensor samples a display's pixel grid scrambles them. Color shift from the display and the camera's white balance buries them further. By the time the video reaches your detector, the generation artifacts that the model was trained to find are gone. The signal has been laundered through physics.
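To make the laundering effect concrete, here is a minimal sketch in plain Python. It is a toy, not a model of any real detector or camera: the "scanline" is a made-up one-dimensional signal, the faint 24-cycle ripple stands in for a generation fingerprint, and a moving-average blur stands in for everything the re-capture chain does. The function names `high_freq_fraction` and `box_blur` are hypothetical helpers introduced only for this illustration.

```python
import cmath
import math


def high_freq_fraction(signal, cutoff):
    """Share of non-DC spectral energy at or above `cutoff` cycles per window."""
    n = len(signal)
    energy = []
    for k in range(1, n // 2 + 1):
        # Naive discrete Fourier transform coefficient for frequency bin k.
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))
        energy.append(abs(coeff) ** 2)
    total = sum(energy)
    high = sum(e for k, e in enumerate(energy, start=1) if k >= cutoff)
    return high / total if total else 0.0


def box_blur(signal, width):
    """Circular moving average (odd `width`): a crude stand-in for re-capture blur."""
    n = len(signal)
    half = width // 2
    return [sum(signal[(i + d) % n] for d in range(-half, half + 1)) / width
            for i in range(n)]


N = 64
# Smooth content (2 cycles) plus a faint periodic ripple (24 cycles).
# Both components are invented test signals, not measured data.
scanline = [math.sin(2 * math.pi * 2 * t / N)
            + 0.2 * math.sin(2 * math.pi * 24 * t / N)
            for t in range(N)]
recaptured = box_blur(scanline, width=5)

before = high_freq_fraction(scanline, cutoff=16)
after = high_freq_fraction(recaptured, cutoff=16)
print(f"high-frequency energy share: {before:.4f} -> {after:.4f}")
```

The blur leaves the smooth content almost untouched while cutting the high-frequency share by more than two orders of magnitude in this toy setup. A detector that depends on that band has far less to work with after re-capture, even though the visible face looks the same.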

Deepfake detection: what the detector hunts

Spatial frequency anomalies, blending artifacts at face boundaries, specular highlight inconsistencies.

Path 1: Direct injection

1. AI face generated: a generative model produces a synthetic face.
2. Injected into video stream: no re-capture, so artifacts travel intact.
3. Deepfake detector: spatial frequencies intact, signal clear.

Result: Detected. Generation artifacts are clearly present and the detector catches them.

Path 2: Screen re-capture

1. AI face generated: same generative model, same synthetic face.
2. Played on screen, filmed with phone: re-capture introduces physical interference.
3. Physics launders the signal: compression flattens spatial frequencies, the moiré pattern scrambles boundary artifacts, and color shift buries highlight fingerprints.

Result: Returns clean. The signal is destroyed by re-capture physics and the detector has nothing to find.

This is not a theoretical edge case. It is the default attack path for opportunistic fraud. The attacker does not need to understand your detection architecture. They just need a screen and a phone.

What PAD Is Actually Looking For

Presentation attack detection operates on entirely different assumptions. It does not ask whether a face was generated by a model. It asks whether a face is physically present in three-dimensional space.

PAD systems look for liveness cues: involuntary micro-movements, pupillary response to light changes, the subtle mechanical texture of skin under directional illumination. They look for depth signals: the parallax behavior of a face that occupies three-dimensional space versus a flat surface that does not. They look for temporal coherence: whether the motion patterns across a video sequence are consistent with a person responding to a real environment in real time, or with a recording playing back at fixed frame intervals.

None of these signals depends on whether the face on the screen was generated by a diffusion model or filmed in a studio. What matters is whether the object presenting that face has physical depth and live behavior. A printed photo and a deepfake played on a phone fail for the same PAD reason: neither is a living person. The generative provenance is irrelevant to the attack surface PAD is covering.
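The three signal families above can be sketched as a simple decision record. This is an illustrative sketch only: `PadSignals`, `failed_signals`, and `pad_passes` are hypothetical names, real PAD systems work with continuous scores rather than booleans, and the per-presentation values below are assumptions chosen to mirror the reasoning in the text.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class PadSignals:
    """Outcome of each signal family for one presentation (hypothetical schema).

    There is deliberately no field describing how the face image was produced.
    """
    liveness: bool   # involuntary micro-movements, pupillary response
    depth: bool      # parallax consistent with a three-dimensional face
    temporal: bool   # motion consistent with a live, real-time scene


def failed_signals(signals: PadSignals) -> list[str]:
    """Names of the signal families that did not pass, in declaration order."""
    return [f.name for f in fields(signals) if not getattr(signals, f.name)]


def pad_passes(signals: PadSignals) -> bool:
    """Accept only when every signal family passes."""
    return not failed_signals(signals)


# Assumed outcomes for illustration, not measurements from a real system.
PRESENTATIONS = {
    "printed photo": PadSignals(liveness=False, depth=False, temporal=False),
    "deepfake replayed on a phone": PadSignals(False, False, False),
    "real video replayed on a phone": PadSignals(False, False, False),
    "live person": PadSignals(liveness=True, depth=True, temporal=True),
}

for name, signals in PRESENTATIONS.items():
    verdict = "pass" if pad_passes(signals) else "reject"
    print(f"{name}: {verdict} {failed_signals(signals)}")
```

The deepfake replay and the real-video replay produce identical records, which is the point: nothing in the decision refers to where the pixels came from.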

Presentation attack detection: what the system measures

| Presentation | Liveness cues | Depth / parallax | Temporal coherence | PAD result |
|---|---|---|---|---|
| Printed photo (physical paper held to camera) | No micro-movements | Flat surface | Static image | Caught: fails all three signals |
| Screen replay (video played on screen, filmed with phone) | No pupillary response | Flat screen | Fixed frame intervals | Caught: fails depth and liveness |
| Silicone mask (physical 3D object worn over real face) | No micro-movements or pupil response | Has physical depth | Unnatural motion patterns | Caught: passes depth, fails liveness |
| Live real person (legitimate user in front of camera) | Micro-movements and pupil response | Three-dimensional face | Natural, non-repeating motion | Passes PAD: all three signals confirmed |

The Practical Implication: Attack Path First, Detector Second

The question most teams ask is: how accurate is our detector? That is the wrong first question. The right first question is: what attack paths are we actually exposed to?

If your threat model is synthetic identity fraud at onboarding, where an attacker submits a fully AI-generated face to a KYC flow, then deepfake detection, properly implemented, is the right primary control. The generation artifacts will be present in the submitted media because there is no re-capture step.

If your threat model is presentation attacks, where an attacker holds a phone up to a camera, then PAD is the right control, and a deepfake detector is covering a different risk entirely. Running a deepfake detector on that flow and reporting its accuracy is technically correct and operationally meaningless for the actual threat.

In practice, most real-world identity fraud attempts involve both vectors, or at least an attacker who will pivot between them based on what gets through. A pipeline that runs only PAD is exposed to injected synthetic media. A pipeline that runs only deepfake detection is exposed to re-captured material. The architectures are complements, not substitutes, and they need to be deployed as such.
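One way to check the "complements, not substitutes" claim is to encode the coverage described in this article as data and ask which attack vectors each pipeline leaves open. The sketch below does that. The ratings are the article's qualitative labels, not benchmark results, and `COVERAGE` and `uncovered` are hypothetical names introduced for the example.

```python
# Qualitative coverage per attack vector, as described in this article.
# "partial" and "varies" are treated as not reliably covered.
COVERAGE = {
    "injected synthetic face": {"deepfake": "detected", "pad": "partial"},
    "screen re-capture": {"deepfake": "missed", "pad": "detected"},
    "printed photo": {"deepfake": "varies", "pad": "detected"},
    "virtual camera injection": {"deepfake": "detected", "pad": "missed"},
}


def uncovered(detectors):
    """Attack vectors that no detector in the pipeline reliably detects."""
    return sorted(
        vector
        for vector, ratings in COVERAGE.items()
        if not any(ratings[d] == "detected" for d in detectors)
    )


print("deepfake only:", uncovered(["deepfake"]))
print("PAD only:     ", uncovered(["pad"]))
print("both:         ", uncovered(["deepfake", "pad"]))
```

Under these assumed ratings, each single-detector pipeline leaves two vectors open, and the open sets do not overlap. Running both closes all four.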


The Gap That Conflation Creates

The gap: which system catches which attack

The two cells that matter: screen re-capture passes deepfake detection because physics destroyed the signal. Virtual camera injection passes PAD because there was never a physical surface to test.

| Attack vector | Deepfake detection (hunts generative artifacts) | PAD (measures physical presence) |
|---|---|---|
| Injected synthetic face (AI-generated, no re-capture) | Caught: generation artifacts intact | Partial: no depth signal to test |
| Screen re-capture (deepfake filmed via phone, artifacts laundered) | Blind spot: physics destroyed the signal | Caught: no liveness or depth |
| Printed photo (physical paper held to camera) | Varies: depends on photo origin | Caught: flat, fails depth and liveness |
| Virtual camera injection (deepfake piped directly into video stream) | Caught: artifacts present in raw stream | Blind spot: no physical surface to test |
| Live real person (legitimate user in frame) | Pass: no generation artifacts | Pass: liveness and depth confirmed |

When teams conflate these problems, a predictable failure mode follows. They evaluate vendors on accuracy benchmarks, select a system that performs well on deepfake detection datasets, deploy it as their primary identity fraud control, and then discover it returns clean results on physical presentation attacks. The postmortem usually frames this as a detector accuracy problem. It is not. It is a threat model scoping problem.

The solution is not a better deepfake detector. It is understanding the distinction between the two problems before writing a vendor brief or setting an accuracy threshold. That distinction is architectural, not parametric. You cannot tune your way from one coverage area to the other.
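A small sketch shows why threshold tuning cannot close the gap. All scores below are invented placeholders, not outputs of any real detector. The re-captured scores are set equal to the genuine ones by construction, to represent the premise that re-capture removes the signal the model scores on. `best_separation` is a hypothetical helper.

```python
def best_separation(genuine_scores, attack_scores):
    """Best achievable (attack detection rate - genuine rejection rate)
    over every threshold, where a score >= threshold means "flag as fake"."""
    best = 0.0
    for threshold in sorted(set(genuine_scores) | set(attack_scores)):
        detected = sum(s >= threshold for s in attack_scores) / len(attack_scores)
        rejected = sum(s >= threshold for s in genuine_scores) / len(genuine_scores)
        best = max(best, detected - rejected)
    return best


# Placeholder deepfake-detector scores (higher means "more likely generated").
genuine = [0.05, 0.10, 0.12, 0.20]
injected = [0.80, 0.85, 0.90, 0.95]     # artifacts intact: scores are high
recaptured = [0.05, 0.10, 0.12, 0.20]   # assumed indistinguishable from genuine

print("injected:   ", best_separation(genuine, injected))
print("re-captured:", best_separation(genuine, recaptured))
```

With these placeholder scores, some threshold separates injected media perfectly, while no threshold separates re-captured media at all: every setting that flags an attack also rejects a legitimate user at the same rate. When the signal is absent, the remaining lever is a different detector, not a different threshold.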

The question worth sitting with: when your organization defines its detection requirements, is it starting from the attack paths it faces — or from the detection categories its current vendor supports?

Mohamed Ochalhi

DuckDuckGoose AI
