Abstract
As autonomous web agents become increasingly prevalent, understanding their vulnerabilities to manipulative interface designs becomes critical. We introduce DECEPTICON, a testing environment containing 700 web navigation tasks incorporating dark patterns—600 synthetically generated and 100 from real-world websites. Our evaluation reveals that dark patterns successfully manipulated agent behavior in over 70% of tested scenarios, while human resistance averaged only 31%. Critically, we find that larger, more capable models prove more susceptible to manipulation, and standard defensive approaches fail to meaningfully reduce dark pattern success rates.
Introduction
Dark patterns are deceptive user interface designs crafted to manipulate users into taking actions they might not otherwise take—from unwanted subscriptions to inadvertent data sharing. As AI agents increasingly perform autonomous web navigation on behalf of users, they inherit and amplify these vulnerabilities.
The stakes are significant: an agent trusted to complete a simple purchase might be manipulated into signing up for recurring subscriptions, adding unwanted items to cart, or agreeing to invasive data collection practices. Unlike humans who may recognize manipulation through experience, current web agents lack the skepticism necessary to resist these designs.
In this work, we systematically study how dark patterns affect web agent behavior and evaluate whether existing mitigation strategies provide meaningful protection.
Critical Finding
Larger, more capable models are more susceptible to dark pattern manipulation. This counterintuitive result suggests that increased reasoning capabilities may actually make agents more vulnerable to sophisticated deceptive designs.
Research Questions
RQ1: Manipulation Effectiveness
How effectively do dark patterns steer agent trajectories away from user-intended goals?
RQ2: Model Scale Effects
Does increased model capability correlate with improved or degraded resistance to manipulation?
RQ3: Defense Effectiveness
Can in-context prompting or guardrail models provide meaningful protection against dark patterns?
The DECEPTICON Environment
DECEPTICON provides a controlled testing environment for evaluating web agent robustness against dark patterns. The benchmark comprises 700 web navigation tasks across diverse scenarios:
Programmatically generated scenarios with controlled dark pattern implementations for systematic evaluation.
Tasks derived from actual websites with naturally occurring dark patterns for ecological validity.
Dark Pattern Taxonomy
We evaluate agent susceptibility across seven categories of dark patterns, each representing distinct manipulation mechanisms:
Forced Action
Requiring unwanted actions to proceed with intended task
Nagging
Repeated interruptions pushing toward undesired actions
Obstruction
Making desired actions deliberately difficult to complete
Sneaking
Hidden costs, items, or actions added without consent
Interface Interference
Visual manipulation favoring unwanted choices
Trick Wording
Confusing or misleading language
Urgency
False scarcity or time pressure
Key Results
| Agent / Baseline | Manipulation Rate | Resistance Rate |
|---|---|---|
| Smaller Models (Avg.) | 62% | 38% |
| Larger Models (Avg.) | >70% | <30% |
| State-of-the-Art Agents | 73% | 27% |
| Human Baseline | 69% | 31% |
Surprising Finding: Scale Hurts
Contrary to expectations, we observe that model capability correlates positively with susceptibility to dark patterns. Larger models with stronger reasoning abilities are more easily manipulated, possibly because they more readily follow the deceptive logic embedded in dark pattern designs.
Defense Strategies: An Unmitigated Risk
We evaluated two common defensive approaches against dark pattern manipulation:
In-Context Prompting
IneffectiveAdding warnings about dark patterns to the agent's system prompt ("Be cautious of manipulative designs...") showed no statistically significant improvement in resistance rates.
Guardrail Models
IneffectiveDeploying secondary models to review and potentially block agent actions failed to meaningfully reduce manipulation success rates. Guardrails often approved deceptive flows as "normal" UI interactions.
Implications
Dark patterns represent an unmitigated risk to web agents. Current defensive approaches—the same ones used to prevent other forms of agent misuse—provide essentially no protection against manipulative UI designs. This highlights an urgent need for novel defense mechanisms.
Why Agents Fall for Dark Patterns
Goal Hijacking
Dark patterns often present themselves as necessary steps toward the user's goal. Agents, optimized for task completion, interpret these as legitimate requirements rather than manipulation.
Visual Salience Bias
Agents preferentially interact with visually prominent elements. Dark patterns exploit this by making unwanted actions visually dominant while hiding desired actions.
Excessive Compliance
Larger models exhibit stronger instruction-following behavior, making them more likely to comply with the implicit "instructions" embedded in deceptive UI flows.
Lack of Skepticism
Unlike humans who develop pattern recognition for manipulation through experience, agents lack innate skepticism about interface designs and treat all UI elements as trustworthy.
Conclusion
Our findings reveal that dark patterns pose a fundamental challenge to web agent safety. The counterintuitive result that more capable models are more susceptible suggests this is not a problem that will be solved simply through scale.
The failure of standard defensive approaches—in-context prompting and guardrail models—to provide meaningful protection indicates that novel defense mechanisms are urgently needed. We release DECEPTICON as an open benchmark to catalyze research into robust web agents.
Future directions include:
- Training agents with adversarial dark pattern exposure
- Developing explicit skepticism mechanisms for UI interpretation
- Creating specialized dark pattern detection models
- Investigating user-in-the-loop confirmation for high-stakes actions
Citation
@article{cuvin2025decepticon,
title={DECEPTICON: How Dark Patterns Manipulate Web Agents},
author={Cuvin, Phil and Zhu, Hao and Yang, Diyi},
journal={arXiv preprint arXiv:2512.22894},
year={2025}
}