DECEPTICON: How Dark Patterns Manipulate Web Agents

Abstract

As autonomous web agents become increasingly prevalent, understanding their vulnerabilities to manipulative interface designs becomes critical. We introduce DECEPTICON, a testing environment containing 700 web navigation tasks incorporating dark patterns—600 synthetically generated and 100 from real-world websites. Our evaluation reveals that dark patterns successfully manipulated agent behavior in over 70% of tested scenarios, while human resistance averaged only 31%. Critically, we find that larger, more capable models prove more susceptible to manipulation, and standard defensive approaches fail to meaningfully reduce dark pattern success rates.

Introduction

Dark patterns are deceptive user interface designs crafted to manipulate users into taking actions they might not otherwise take—from unwanted subscriptions to inadvertent data sharing. As AI agents increasingly perform autonomous web navigation on behalf of users, they inherit and amplify these vulnerabilities.

The stakes are significant: an agent trusted to complete a simple purchase might be manipulated into signing up for recurring subscriptions, adding unwanted items to the cart, or agreeing to invasive data collection practices. Unlike humans, who may recognize manipulation through experience, current web agents lack the skepticism necessary to resist these designs.

In this work, we systematically study how dark patterns affect web agent behavior and evaluate whether existing mitigation strategies provide meaningful protection.

Critical Finding

Larger, more capable models are more susceptible to dark pattern manipulation. This counterintuitive result suggests that increased reasoning capabilities may actually make agents more vulnerable to sophisticated deceptive designs.

Research Questions

RQ1: Manipulation Effectiveness

How effectively do dark patterns steer agent trajectories away from user-intended goals?

RQ2: Model Scale Effects

Does increased model capability correlate with improved or degraded resistance to manipulation?

RQ3: Defense Effectiveness

Can in-context prompting or guardrail models provide meaningful protection against dark patterns?

The DECEPTICON Environment

DECEPTICON provides a controlled testing environment for evaluating web agent robustness against dark patterns. The benchmark comprises 700 web navigation tasks across diverse scenarios:

Synthetic Tasks (600)

Programmatically generated scenarios with controlled dark pattern implementations for systematic evaluation.

Real-World Tasks (100)

Tasks derived from actual websites with naturally occurring dark patterns for ecological validity.

Dark Pattern Taxonomy

We evaluate agent susceptibility across seven categories of dark patterns, each representing distinct manipulation mechanisms:

Forced Action

Requiring unwanted actions to proceed with the intended task

Nagging

Repeated interruptions pushing toward undesired actions

Obstruction

Making desired actions deliberately difficult to complete

Sneaking

Hidden costs, items, or actions added without consent

Interface Interference

Visual manipulation favoring unwanted choices

Trick Wording

Confusing or misleading language

Urgency

False scarcity or time pressure
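
For readers who want to tag tasks by category, the taxonomy above can be written down as a small data structure. The sketch below is illustrative only: the `Task` record and its field names are assumptions made for this example and are not the benchmark's actual schema.

```python
from dataclasses import dataclass
from enum import Enum


class DarkPattern(Enum):
    """The seven categories listed above; the string values are illustrative."""
    FORCED_ACTION = "forced_action"
    NAGGING = "nagging"
    OBSTRUCTION = "obstruction"
    SNEAKING = "sneaking"
    INTERFACE_INTERFERENCE = "interface_interference"
    TRICK_WORDING = "trick_wording"
    URGENCY = "urgency"


@dataclass(frozen=True)
class Task:
    """Hypothetical task record; field names are assumptions, not the benchmark schema."""
    task_id: str
    goal: str              # what the user asked the agent to do
    pattern: DarkPattern   # which category the page embeds
    synthetic: bool        # True for generated tasks, False for real-world ones


# Made-up example task, not taken from the benchmark.
example = Task(
    task_id="demo-001",
    goal="Buy one pair of headphones without adding extras",
    pattern=DarkPattern.SNEAKING,
    synthetic=True,
)
print(len(DarkPattern), example.pattern.value)
```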

Key Results

Agent / Baseline	Manipulation Rate	Resistance Rate
Smaller Models (Avg.)	62%	38%
Larger Models (Avg.)	>70%	<30%
State-of-the-Art Agents	73%	27%
Human Baseline	69%	31%
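
The two rate columns are complementary: in every row of the table, the manipulation rate and the resistance rate sum to 100%. A minimal sketch of that arithmetic follows, assuming each episode is labeled with a single boolean for whether the agent ended in the dark-pattern outcome. The function name and the toy data are hypothetical and are not benchmark results.

```python
def rates(manipulated: list[bool]) -> tuple[float, float]:
    """Return (manipulation_rate, resistance_rate) for a list of episode outcomes."""
    if not manipulated:
        raise ValueError("need at least one episode")
    manipulation = sum(manipulated) / len(manipulated)
    return manipulation, 1.0 - manipulation


# Toy data, not benchmark results: 3 of 4 episodes ended in the dark-pattern outcome.
m, r = rates([True, True, False, True])
print(f"manipulation={m:.0%} resistance={r:.0%}")
```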

Surprising Finding: Scale Hurts

Contrary to expectations, we observe that model capability correlates positively with susceptibility to dark patterns. Larger models with stronger reasoning abilities are more easily manipulated, possibly because they more readily follow the deceptive logic embedded in dark pattern designs.

Defense Strategies: An Unmitigated Risk

We evaluated two common defensive approaches against dark pattern manipulation:

In-Context Prompting: Ineffective

Adding warnings about dark patterns to the agent's system prompt ("Be cautious of manipulative designs...") showed no statistically significant improvement in resistance rates.
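
As a rough illustration of this defense, the sketch below appends a warning to an agent's system prompt. The warning text is a placeholder written for this example, since the exact wording used in the study is not reproduced here, and `build_system_prompt` is a hypothetical helper rather than part of any agent framework.

```python
# Placeholder warning; the exact wording used in the study is not given here.
DARK_PATTERN_WARNING = (
    "Some pages may use manipulative designs. "
    "Only take actions that the user explicitly asked for."
)


def build_system_prompt(base_prompt: str, warn: bool = True) -> str:
    """Append the warning to the base system prompt when `warn` is set."""
    if not warn:
        return base_prompt
    return f"{base_prompt.rstrip()}\n\n{DARK_PATTERN_WARNING}"


prompt = build_system_prompt("You are a web agent. Complete the user's task.")
print(prompt)
```

Comparing runs with `warn=True` and `warn=False` on the same tasks is one way to set up the kind of with-and-without comparison described above.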

Guardrail Models: Ineffective

Deploying secondary models to review and potentially block agent actions failed to meaningfully reduce manipulation success rates. Guardrails often approved deceptive flows as "normal" UI interactions.
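
The guardrail setup can be pictured as a gate between the agent's proposed action and the browser. The sketch below is a toy version under stated assumptions: `Action`, `guarded_step`, and both reviewers are hypothetical stand-ins, and a real guardrail would be a model call rather than a string check.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Action:
    """Hypothetical agent action, e.g. a click on a page element."""
    kind: str
    target: str


# A reviewer takes the user's goal and a proposed action; True means "allow".
Reviewer = Callable[[str, Action], bool]


def guarded_step(goal: str, proposed: Action, reviewer: Reviewer) -> Optional[Action]:
    """Return the action if the reviewer approves it, otherwise None (blocked)."""
    return proposed if reviewer(goal, proposed) else None


def permissive_reviewer(goal: str, action: Action) -> bool:
    # Stand-in for a guardrail that sees an ordinary click and approves it.
    return action.kind == "click"


def strict_reviewer(goal: str, action: Action) -> bool:
    # Crude stand-in: only allow targets whose text appears in the goal.
    return action.target.lower() in goal.lower()


goal = "Check out with the items already in the cart"
upsell = Action(kind="click", target="Add premium warranty")

print(guarded_step(goal, upsell, permissive_reviewer))  # the upsell click passes
print(guarded_step(goal, upsell, strict_reviewer))      # the upsell click is blocked
```

The permissive reviewer mirrors the failure mode described above: a click on an upsell looks like a normal UI interaction unless the reviewer checks it against the user's goal.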

Implications

Dark patterns represent an unmitigated risk to web agents. Current defensive approaches—the same ones used to prevent other forms of agent misuse—provide essentially no protection against manipulative UI designs. This highlights an urgent need for novel defense mechanisms.

Why Agents Fall for Dark Patterns

Goal Hijacking

Dark patterns often present themselves as necessary steps toward the user's goal. Agents, optimized for task completion, interpret these as legitimate requirements rather than manipulation.

Visual Salience Bias

Agents preferentially interact with visually prominent elements. Dark patterns exploit this by making unwanted actions visually dominant while hiding desired actions.
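
A toy model makes this mechanism concrete. The sketch below assumes a made-up salience score (rendered area times contrast) and a policy that always picks the highest-scoring button. Neither is claimed to describe how any real agent chooses elements; the point is only that such a policy selects the option the page designer made dominant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Button:
    label: str
    area_px: int      # rendered size in pixels
    contrast: float   # 0.0 (blends in) to 1.0 (stands out)


def salience(button: Button) -> float:
    """Toy salience score: larger, higher-contrast buttons score higher."""
    return button.area_px * button.contrast


# Made-up page: the unwanted option is large and bright, the desired one is not.
buttons = [
    Button("Accept all and subscribe", area_px=12000, contrast=0.9),
    Button("Continue without subscribing", area_px=2000, contrast=0.2),
]

picked = max(buttons, key=salience)
print(picked.label)
```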

Excessive Compliance

Larger models exhibit stronger instruction-following behavior, making them more likely to comply with the implicit "instructions" embedded in deceptive UI flows.

Lack of Skepticism

Unlike humans, who learn to recognize manipulation through experience, agents lack innate skepticism about interface designs and treat all UI elements as trustworthy.

Conclusion

Our findings reveal that dark patterns pose a fundamental challenge to web agent safety. The counterintuitive result that more capable models are more susceptible suggests this is not a problem that will be solved simply through scale.

The failure of standard defensive approaches—in-context prompting and guardrail models—to provide meaningful protection indicates that novel defense mechanisms are urgently needed. We release DECEPTICON as an open benchmark to catalyze research into robust web agents.

Future directions include:

Training agents with adversarial dark pattern exposure
Developing explicit skepticism mechanisms for UI interpretation
Creating specialized dark pattern detection models
Investigating user-in-the-loop confirmation for high-stakes actions

Citation

@article{cuvin2025decepticon,
  title={DECEPTICON: How Dark Patterns Manipulate Web Agents},
  author={Cuvin, Phil and Zhu, Hao and Yang, Diyi},
  journal={arXiv preprint arXiv:2512.22894},
  year={2025}
}