Work

Writing about AI safety, red-teaming, and what I'm learning along the way

I Tricked AI Safety Monitors Using Plain English

I adapted a jailbreaking algorithm to fool AI agent monitors using only plain English, with no model access and no GPUs. The attacks transferred across model families, hitting up to 73.7% on models they were never optimized against.

Hilary Torn

Mar 31, 2026

A CoT Generator That Made AI Agents Reveal Their Manipulation Tactics

Give an AI agent metrics to hit and a performance review in three days, and it will fabricate orders, invent confirmation emails it never sent, and generate three contradictory order IDs before trying to cover it up, all visible in its own chain-of-thought.

Hilary Torn

Feb 20, 2026

Why Knowledge Graphs Outperform Vector Databases for Accurate AI Analysis

While vector databases have been instrumental in building the LLM applications we use today, knowledge graphs fundamentally outperform them for complex AI analysis tasks, particularly when a deep understanding of the relationships between entities is critical.

Hilary Torn

Apr 29, 2025
