Documentation

Everything you need to start catching prompt regressions.

Quick Start

Install the SDK and set your API keys. You'll be running checks in under 2 minutes.

pip install windtunnel-ai

export WINDTUNNEL_API_KEY=wt_your_key_here
export ANTHROPIC_API_KEY=sk-ant-...

Get your API key

Find your API key in the API Keys section of the dashboard.

Record Interactions

Wrap your agent to automatically record every interaction to Windtunnel.

from windtunnel import WindTunnel

wt = WindTunnel(api_key="wt_your_key")

# In your agent handler:
response = your_agent.run(user_message)

wt.record(
    user_input=user_message,
    agent_output=response,
    prompt_version="v1",       # track your prompt version
    model="claude-haiku-4-5",  # optional
)

Recorded interactions become the test suite for future checks. The more you record, the better your coverage.
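If you'd rather not call wt.record by hand in every handler, the wrapping can be sketched as a decorator. This helper is illustrative, not part of the SDK; it assumes only that record accepts the keyword arguments shown above, and the stub recorder stands in for a real client.

```python
from functools import wraps

def with_recording(record, prompt_version="v1", model="claude-haiku-4-5"):
    """Wrap an agent handler so every call is recorded.

    `record` is any callable with wt.record's keyword signature
    (pass wt.record in production).
    """
    def decorator(handler):
        @wraps(handler)
        def wrapped(user_message):
            response = handler(user_message)
            record(
                user_input=user_message,
                agent_output=response,
                prompt_version=prompt_version,
                model=model,
            )
            return response
        return wrapped
    return decorator

# Usage with a stub recorder that just collects the kwargs:
recorded = []

@with_recording(lambda **kw: recorded.append(kw), prompt_version="v1")
def my_agent(user_message):
    return f"echo: {user_message}"

print(my_agent("hello"))          # echo: hello
print(recorded[0]["user_input"])  # hello
```

In production you would pass wt.record instead of the stub, and every handler call is captured without touching the handler body.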

Run a Check

Compare your baseline prompt against a challenger. Windtunnel replays your recorded interactions through both and scores the results with an LLM judge.

windtunnel check \
  --baseline @prompts/v1.txt \
  --challenger @prompts/v2.txt \
  --n 20 \
  --fail-on-regression

The check prints progress as it replays interactions:

🌪️ Windtunnel check starting...
Fetching 20 production interactions...
Testing interaction 1/20...

If the challenger holds up, the check passes:

✅ DEPLOY APPROVED — 5% regression rate (1/20 worse)
Run ID: run_abc123

If too many interactions regress, the check fails and, with --fail-on-regression, exits non-zero:

🚫 DEPLOY BLOCKED — 60% regression rate (12/20 worse)
Run ID: run_xyz789 · Exit code: 1
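The regression rate in the summary line is simply worse / total. A minimal sketch of that arithmetic (the approve/block threshold below is a placeholder for illustration, not Windtunnel's documented cutoff):

```python
def summarize(worse: int, total: int, block_at: float = 0.5) -> str:
    """Format a verdict line from judge counts.

    block_at is an assumed threshold, used only for this sketch.
    """
    rate = worse / total
    verdict = "DEPLOY BLOCKED" if rate >= block_at else "DEPLOY APPROVED"
    return f"{verdict} — {rate:.0%} regression rate ({worse}/{total} worse)"

print(summarize(1, 20))   # DEPLOY APPROVED — 5% regression rate (1/20 worse)
print(summarize(12, 20))  # DEPLOY BLOCKED — 60% regression rate (12/20 worse)
```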

CI/CD Integration

Add Windtunnel to your GitHub Actions workflow to automatically block merges when prompt quality degrades. Copy windtunnel.yml from the dashboard into .github/workflows/ in your repo, then add your API key as a repository secret.

# .github/workflows/windtunnel.yml
name: Windtunnel Check

on:
  pull_request:
    branches: [main]

jobs:
  windtunnel-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - run: pip install windtunnel-ai

      - name: Run Windtunnel check
        env:
          WINDTUNNEL_API_KEY: ${{ secrets.WINDTUNNEL_API_KEY }}
        run: |
          python - <<'EOF'
          import os, sys, json
          from pathlib import Path
          from windtunnel import WindTunnel

          wt = WindTunnel(api_key=os.environ["WINDTUNNEL_API_KEY"])

          # Reads prompts from env vars or falls back to files
          baseline   = os.environ.get("BASELINE_PROMPT")   or Path("prompts/baseline.txt").read_text()
          challenger = os.environ.get("CHALLENGER_PROMPT") or Path("prompts/challenger.txt").read_text()

          # Loads tests from windtunnel_tests.json or uses 3 example interactions
          if Path("windtunnel_tests.json").exists():
              interactions = json.loads(Path("windtunnel_tests.json").read_text())
          else:
              interactions = [
                  {"user_input": "What is 2+2?",
                   "baseline_output": "4", "challenger_output": "4"},
                  {"user_input": "What is the capital of France?",
                   "baseline_output": "Paris.", "challenger_output": "Paris is the capital of France."},
                  {"user_input": "Reverse a string in Python.",
                   "baseline_output": "Use s[::-1].", "challenger_output": "Use s[::-1] or reversed(s)."},
              ]

          result = wt.check(baseline_prompt=baseline, challenger_prompt=challenger,
                            interactions=interactions)

          print(f"Verdict: {result['verdict']}  |  Regression rate: {result['regression_rate']:.0%}")
          sys.exit(1 if result["verdict"] == "BLOCKED" else 0)
          EOF

Setup

1. Download windtunnel.yml from the dashboard and place it in .github/workflows/.

2. Add WINDTUNNEL_API_KEY to your repo's Settings → Secrets and variables → Actions.

3. Optionally add windtunnel_tests.json to your repo root with your test interactions.

The script exits with code 1 when the verdict is BLOCKED, which fails the PR check and blocks the merge automatically.
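If you maintain windtunnel_tests.json by hand, it's easy to drop a key. A small validator for the shape the CI script above expects (this helper is illustrative, not part of the SDK):

```python
import json

REQUIRED_KEYS = {"user_input", "baseline_output", "challenger_output"}

def validate_tests(raw: str) -> list:
    """Parse windtunnel_tests.json content and check each entry's keys."""
    interactions = json.loads(raw)
    if not isinstance(interactions, list):
        raise ValueError("windtunnel_tests.json must contain a JSON array")
    for i, item in enumerate(interactions):
        missing = REQUIRED_KEYS - item.keys()
        if missing:
            raise ValueError(f"entry {i} is missing keys: {sorted(missing)}")
    return interactions

sample = '[{"user_input": "What is 2+2?", "baseline_output": "4", "challenger_output": "4"}]'
print(len(validate_tests(sample)))  # 1
```

Running it in a pre-commit hook or at the top of the workflow step surfaces a malformed tests file before any API calls are made.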

Python SDK Reference

WindTunnel(api_key)
wt = WindTunnel(
    api_key: str,                   # required — your wt_* key
    anthropic_api_key: str = None,  # falls back to ANTHROPIC_API_KEY env
    supabase_url: str = None,       # optional override
    supabase_key: str = None        # optional override
)
wt.record(...)
wt.record(
    user_input: str,           # required
    agent_output: str,         # required
    prompt_version: str = 'v1',
    model: str = 'claude-haiku-4-5',
    metadata: dict = {},
    session_id: str = None     # auto-generated if not provided
) -> dict
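
A call with the optional fields filled in might look like the following. The metadata keys and session_id value are made up for illustration; session_id lets you group related turns under one conversation.

```python
# Hypothetical handler turn; metadata keys are illustrative only.
record_kwargs = dict(
    user_input="Where is my order?",
    agent_output="It shipped yesterday and arrives Friday.",
    prompt_version="v3",
    model="claude-haiku-4-5",
    metadata={"channel": "web", "locale": "en-US"},
    session_id="sess_42",
)
# wt.record(**record_kwargs)  # uncomment with a real client
print(sorted(record_kwargs))
```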
wt.run_windtunnel(...)
wt.run_windtunnel(
    baseline_prompt: str,          # required
    challenger_prompt: str,        # required
    n_interactions: int = 10,
    baseline_version: str = 'v1',
    challenger_version: str = 'v2',
    run_name: str = None
) -> {
    run_id: str,
    verdict: 'APPROVED' | 'BLOCKED' | 'NEUTRAL',
    total: int,
    better: int,
    worse: int,
    neutral: int,
    regression_rate: float    # 0.0 – 1.0
}
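
The return value can be consumed directly to gate a deploy. A sketch against a sample result dict (the counts below are fabricated for illustration):

```python
def gate(result: dict) -> int:
    """Return a process exit code from a run_windtunnel-style result dict."""
    print(f"{result['verdict']}: {result['worse']}/{result['total']} worse "
          f"({result['regression_rate']:.0%})")
    return 1 if result["verdict"] == "BLOCKED" else 0

sample_result = {
    "run_id": "run_demo", "verdict": "APPROVED",
    "total": 10, "better": 4, "worse": 1, "neutral": 5,
    "regression_rate": 0.1,
}
print(gate(sample_result))  # 0
```

In a deploy script, pass the real return value of wt.run_windtunnel and call sys.exit on the result.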

CLI Reference

windtunnel check
windtunnel check [OPTIONS]

Options:
  --api-key TEXT        Windtunnel API key  [env: WINDTUNNEL_API_KEY]
  --anthropic-key TEXT  Anthropic API key   [env: ANTHROPIC_API_KEY]
  --baseline TEXT       Baseline prompt or @file.txt  [required]
  --challenger TEXT     Challenger prompt or @file.txt  [required]
  --n INTEGER           Interactions to test  [default: 10]
  --fail-on-regression  Exit 1 if verdict is BLOCKED
windtunnel status
windtunnel status [OPTIONS]

Options:
  --api-key TEXT  Windtunnel API key  [env: WINDTUNNEL_API_KEY]

Verifies your connection and prints your project ID.