Hilary Torn


I lead adversarial evaluation research on AI systems: how they can be deceived, how they learn to lie, and how to catch it before deployment. I spent 20 years building and leading teams and designing marketing experiments around persuasion and behavior change. I now apply that experience to the systems that need it most.


Featured Posts

I Tricked AI Safety Monitors Using Plain English


I adapted a jailbreaking algorithm to fool AI agent monitors using plain English, with no model access and no GPUs. The attacks transferred across model families, reaching success rates of up to 73.7% on models they were never optimized against.


Hilary Torn

Mar 31, 2026