Windtunnel · Interactive Demo

A/B test your AI agent prompts
before you ship them.

Windtunnel runs your baseline and challenger prompts against the same test inputs, then uses an LLM judge to score each response pair. If your challenger regresses, it gets blocked.
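The comparison described above can be sketched roughly as follows. This is only an illustration and not Windtunnel's actual API: the names `compare_prompts`, `Model`, and `Judge` are hypothetical, the verdict labels are assumed, and the stub model and judge stand in for real LLM calls.

```python
from typing import Callable, List

# Hypothetical signatures (assumptions, not Windtunnel's API):
# a model maps (prompt, user_input) -> response text;
# a judge maps (user_input, baseline_response, challenger_response) -> verdict.
Model = Callable[[str, str], str]
Judge = Callable[[str, str, str], str]  # "better", "worse", or "neutral"


def compare_prompts(baseline: str, challenger: str, inputs: List[str],
                    model: Model, judge: Judge) -> List[str]:
    """Run both prompts on the same inputs and judge each response pair."""
    verdicts = []
    for user_input in inputs:
        base_resp = model(baseline, user_input)
        chal_resp = model(challenger, user_input)
        # The verdict describes the challenger relative to the baseline.
        verdicts.append(judge(user_input, base_resp, chal_resp))
    return verdicts


# Stubs so the sketch runs without any LLM; replace with real calls.
def fake_model(prompt: str, user_input: str) -> str:
    return f"[{prompt}] {user_input}"


def fake_judge(user_input: str, base_resp: str, chal_resp: str) -> str:
    # Toy rule for illustration only: the longer response wins.
    if len(chal_resp) > len(base_resp):
        return "better"
    if len(chal_resp) < len(base_resp):
        return "worse"
    return "neutral"


if __name__ == "__main__":
    questions = ["How do I reset my password?", "Can I export my data?"]
    print(compare_prompts("v2", "v3", questions, fake_model, fake_judge))
```

Keeping the model and judge as plain callables means the same loop works with any provider; only the two functions passed in need to change.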

Run · Sample Data · BLOCKED

Support Bot — Prompt v2 vs v3

Ran 8 production interactions · Completed just now

🚫 Deploy Blocked
Regression rate of 50% (4 of 8 interactions) exceeds the 30% threshold. Fix your challenger prompt before shipping.
exit code: 1
Total: 8
Regressed: 4
Improved: 3
Neutral: 1

Interaction Results

✗ Worse: How do I reset my password?
✓ Better: What payment methods do you accept?
✗ Worse: Can I export my data?
✓ Better: How long does shipping take?
✗ Worse: Do you offer a free trial?
✓ Better: How do I contact support?
✗ Worse: Is my data secure?
— Neutral: Can I change my subscription plan?
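The blocked result follows from simple arithmetic on these eight verdicts. Here is a minimal sketch of such a gate, assuming the regression rate is the share of "worse" verdicts and that "exceeds" means strictly greater than the threshold; the function name `gate` and the verdict labels are hypothetical, not Windtunnel's implementation.

```python
from collections import Counter
from typing import List, Tuple

THRESHOLD = 0.30  # the 30% threshold shown in this demo run

# Verdicts for the eight sample interactions, in the order listed above.
VERDICTS = ["worse", "better", "worse", "better",
            "worse", "better", "worse", "neutral"]


def gate(verdicts: List[str], threshold: float = THRESHOLD) -> Tuple[Counter, float, int]:
    """Return (counts, regression_rate, exit_code) for a list of verdicts."""
    if not verdicts:
        raise ValueError("no verdicts to evaluate")
    counts = Counter(verdicts)
    # Assumption: regression rate = regressed interactions / total interactions.
    rate = counts["worse"] / len(verdicts)
    # Assumption: a non-zero exit code signals a blocked deploy.
    exit_code = 1 if rate > threshold else 0
    return counts, rate, exit_code


if __name__ == "__main__":
    counts, rate, exit_code = gate(VERDICTS)
    print(f"regressed={counts['worse']} improved={counts['better']} "
          f"neutral={counts['neutral']} rate={rate:.0%} exit code: {exit_code}")
```

In a CI pipeline, a script like this could pass the result to `sys.exit(exit_code)` so that a non-zero code fails the job.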

Ready to protect your own agent?

Create a free account and connect your first project in under 2 minutes.