Catch prompt regressions before they reach users. Record production traffic, replay it against your new prompt, and block bad deploys automatically.
Integration
Three steps from zero to protected deploys.
2 lines of code capture every real user interaction your agent handles in production.
wt.record(user_input=q, agent_output=r)
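The SDK's internals aren't described here. As a rough illustration only, assuming recording amounts to appending each input/output pair to a local log, a stand-in recorder could look like this. `record`, `load`, and `traffic.jsonl` are hypothetical names, not the windtunnel API.

```python
import json
import time
from pathlib import Path

# Hypothetical stand-in for a traffic recorder; this is NOT the windtunnel SDK.
LOG_PATH = Path("traffic.jsonl")

def record(user_input: str, agent_output: str, path: Path = LOG_PATH) -> None:
    """Append one production interaction to a JSON Lines log."""
    entry = {"ts": time.time(), "user_input": user_input, "agent_output": agent_output}
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")

def load(path: Path = LOG_PATH) -> list[dict]:
    """Read the recorded interactions back so they can be replayed later."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

JSON Lines keeps each write a single append, so the log stays readable even if the process stops between interactions.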
Replay recorded production interactions through both the old and the new prompt simultaneously.
result = wt.run_windtunnel(baseline_prompt=v1, challenger_prompt=v2)
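How `run_windtunnel` schedules its calls isn't shown. The following is a minimal sketch of the general idea: every recorded input goes through both prompts, with the calls running concurrently on a thread pool. `replay` and the `call_model(prompt, user_input)` callback are hypothetical placeholders for your own LLM client.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical signature: call_model(prompt, user_input) -> model response text.
ModelFn = Callable[[str, str], str]

def replay(interactions: list[dict], baseline_prompt: str, challenger_prompt: str,
           call_model: ModelFn, max_workers: int = 8) -> list[dict]:
    """Run each recorded input through both prompts and pair the outputs."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit baseline and challenger calls for every input up front
        # so they execute concurrently rather than one after another.
        jobs = [
            (item["user_input"],
             pool.submit(call_model, baseline_prompt, item["user_input"]),
             pool.submit(call_model, challenger_prompt, item["user_input"]))
            for item in interactions
        ]
        # Collect results in the original order so pairs line up with inputs.
        return [
            {"user_input": q, "baseline": b.result(), "challenger": c.result()}
            for q, b, c in jobs
        ]
```

Threads suit this case because each model call is network-bound; `max_workers` caps how many requests are in flight at once.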
An LLM-as-judge compares the baseline and challenger responses, and the check fails CI if the regression exceeds your threshold.
- run: windtunnel check --fail-on-regression
CI/CD
name: Windtunnel Check
# Run only on pull requests that touch prompt files.
on:
  pull_request:
    paths: ['prompts/**']
jobs:
  windtunnel:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install windtunnel-ai
      # A literal block scalar (|) keeps the backslash line continuations intact for the shell.
      - run: |
          windtunnel check \
            --baseline @prompts/baseline.txt \
            --challenger @prompts/challenger.txt \
            --fail-on-regression
        # API keys are read from repository secrets.
        env:
          WINDTUNNEL_API_KEY: ${{ secrets.WINDTUNNEL_API_KEY }}
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
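The page doesn't say how regressions are scored. This is a minimal sketch of a threshold gate of the kind `--fail-on-regression` describes. It assumes the LLM judge labels each replayed pair `better`, `same`, or `worse`, and that the threshold is the maximum allowed fraction of `worse` verdicts. The verdict labels, `regression_rate`, and `gate` are all hypothetical.

```python
from collections import Counter

# Hypothetical verdict labels a judge might assign to each (baseline, challenger) pair.
VERDICTS = {"better", "same", "worse"}

def regression_rate(verdicts: list[str]) -> float:
    """Fraction of replayed interactions where the challenger was judged worse."""
    unknown = set(verdicts) - VERDICTS
    if unknown:
        raise ValueError(f"unexpected verdicts: {sorted(unknown)}")
    if not verdicts:
        return 0.0
    return Counter(verdicts)["worse"] / len(verdicts)

def gate(verdicts: list[str], threshold: float) -> int:
    """Return a process exit code: 1 (fail CI) if regressions exceed the threshold, else 0."""
    rate = regression_rate(verdicts)
    print(f"regression rate {rate:.1%} (threshold {threshold:.1%})")
    return 1 if rate > threshold else 0
```

A CI script would pass the result to `sys.exit(gate(verdicts, threshold=0.05))`. Any non-zero exit code marks the workflow step, and therefore the pull request check, as failed.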