Don't let broken
agent changes merge.

Maida gates your PRs with behavioral checks for AI agents. It compares current runs against checked-in baselines and fails CI when structural behavior regresses: more steps, unexpected tool calls, loops, latency spikes, or cost blowups.

Example CI output
$ maida assert <RUN_ID> --baseline baselines/main.json --policy .maida/policy.yaml --format markdown

[FAIL] step_count       6 -> 22 (+267%)
[FAIL] tool_calls       4 -> 19 (+375%)
[FAIL] new_tool_path    salesforce_api
[FAIL] cost_estimate    $0.012 -> $0.041 (+242%)
[PASS] loop_risk        none detected

4 checks failed. PR blocked.
Output evals: Was the answer good?
Production tools: What happened after deploy?
Maida: Did this PR change agent behavior before merge?
The problem

Code changes break agents.
Nobody notices.

A prompt tweak can preserve the final answer while changing the execution path completely. The agent may take more steps, call a new tool, retry more often, or enter a loop. Manual smoke tests and output-only checks often miss that kind of regression.

Output evals ask whether the answer was good. Maida asks whether behavior changed.

Output evals are useful. Production tools are useful. Maida sits earlier in the workflow: before merge, inside CI, where a PR can still be blocked.

How it works

Install. Record. Baseline.
Block regressions.

Maida is the pre-merge behavioral regression gate for AI agents.

01

Install Maida

Install the `maida-ai` package from PyPI. It provides the `maida` CLI and the `maida` Python module.

pip install maida-ai
02

Instrument your agent

Wrap one agent entrypoint so Maida can record runs.

from maida import trace

@trace
def run_agent(user_input: str):
    # Your existing agent code
    ...
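
If decorating the entrypoint is awkward, the GitHub Action section below also mentions traced_run(). Its signature is not documented on this page, so the sketch below is an assumption about how a direct call might be wrapped:

from maida import traced_run

def run_agent(user_input: str):
    # Your existing agent code
    ...

# Hypothetical usage: assumes traced_run() takes the callable plus its arguments
# and records the run the same way the @trace decorator does.
result = traced_run(run_agent, "summarize today's support tickets")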
03

Capture a known-good baseline

Run your agent once on main, find the run ID, and check in the baseline JSON.

python my_agent.py
maida list --json
maida baseline <RUN_ID> --out baselines/my_agent.json
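
In a script, these steps chain together. The JSON shape of `maida list --json` is not shown on this page, so the jq filter below is an assumption: it treats the output as an array of run objects, newest first, each with an `id` field.

python my_agent.py
RUN_ID=$(maida list --json | jq -r '.[0].id')   # assumed field name and ordering
maida baseline "$RUN_ID" --out baselines/my_agent.json
git add baselines/my_agent.json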
04

Assert future runs

Compare each new run against the baseline and fail when policy says behavior regressed.

python my_agent.py
maida assert <RUN_ID> --baseline baselines/my_agent.json --policy .maida/policy.yaml --format markdown
maida diff <RUN_ID> --baseline baselines/my_agent.json
maida view <RUN_ID>
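
A minimal CI-side sketch of the same check, assuming the run-ID lookup above and that `maida assert` exits non-zero when a check fails, which is what lets it block the job. Appending the markdown output to $GITHUB_STEP_SUMMARY surfaces it in the GitHub Actions job summary.

python my_agent.py
RUN_ID=$(maida list --json | jq -r '.[0].id')   # assumed, as above
maida assert "$RUN_ID" \
  --baseline baselines/my_agent.json \
  --policy .maida/policy.yaml \
  --format markdown >> "$GITHUB_STEP_SUMMARY"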
GitHub Action

Block bad PRs
before merge.

Add maida-ai/maida-assert@v2 to your pull request workflow. The script you pass as agent-script must use @trace or traced_run() so Maida can record a run.

See the action repo for the current inputs: agent-script, baseline, policy, maida-version, python-version, extra-args, and post-comment.

.github/workflows/agent-regression-check.yml
name: Agent Regression Check

on: [pull_request]

jobs:
  agent-check:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
    steps:
      - uses: actions/checkout@v4
      - uses: maida-ai/maida-assert@v2
        with:
          agent-script: my_agent.py
          baseline: baselines/my_agent.json
          policy: .maida/policy.yaml
          python-version: '3.12'
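
The remaining inputs named above can be layered onto the same step. The values below are illustrative assumptions, not documented defaults:

      - uses: maida-ai/maida-assert@v2
        with:
          agent-script: my_agent.py
          baseline: baselines/my_agent.json
          policy: .maida/policy.yaml
          python-version: '3.12'
          maida-version: '0.4.0'           # assumed: pins the maida-ai release installed in CI
          extra-args: '--format markdown'  # assumed: extra flags passed through to `maida assert`
          post-comment: 'true'             # assumed: posts the check result as a PR comment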
Behavioral checks

What Maida flags
before merge.

These are behavioral regression signals. They do not prove output quality. They tell you when the agent's execution behavior changed relative to a baseline.

01
step count
02
tool call count
03
unexpected tool path
04
loop risk
05
guardrail events
06
latency envelope
07
cost envelope
08
stop condition
.maida/policy.yaml
assert:
  no_loops: true
  no_guardrails: true
  step_tolerance: 0.5
  expect_status: ok

Policy-as-code

Commit the policy file beside your baseline. The policy defines which behavior changes the CI check should tolerate and which changes should block the pull request.
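
A commented copy of the policy above, with assumed semantics for each key; the exact meanings are not spelled out on this page.

assert:
  no_loops: true        # assumed: fail when loop risk is detected
  no_guardrails: true   # assumed: fail when any guardrail event fires
  step_tolerance: 0.5   # assumed: allow up to 50% more steps than the baseline
  expect_status: ok     # assumed: the run must reach its expected terminal state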

Example regressions Maida catches

These are product examples, not customer testimonials.

Prompt change increases steps

The final answer still looks fine, but the agent now takes 22 steps instead of 6. Maida flags the step-count regression before the PR merges.

New tool path appears

A refactor introduces a tool call that never appeared in the baseline. Maida fails the check until the behavior is fixed or explicitly accepted in policy.

Agent stops reaching its final state

The run produces partial output, but no longer reaches the expected terminal condition. Maida catches the missing stop condition in CI.

Loop risk appears

A model or prompt change causes repeated calls with similar inputs. Maida records the pattern and fails the PR when policy says loops are not allowed.

Local-first

Runs stay in
your environment.

Maida runs on your machine or your GitHub Actions runner. No cloud account is required, and there is no telemetry by default. Runs stay in your environment unless you explicitly export them.

Maida writes local JSONL files under ~/.maida/runs/. You can inspect them, archive them, or delete them.
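
A small sketch for inspecting those files locally; the per-record schema and the exact file extension are assumptions, so it only prints each record's keys.

import json
from pathlib import Path

runs_dir = Path.home() / ".maida" / "runs"

# List every recorded run and the fields each JSONL record carries,
# without assuming a particular record schema.
for run_file in sorted(runs_dir.glob("*.jsonl")):
    print(run_file.name)
    with run_file.open() as fh:
        for line in fh:
            if line.strip():
                print("  ", sorted(json.loads(line).keys()))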

Start with one agent workflow

Try Maida on
your next agent change.

Install Maida, capture a known-good baseline, and run a behavioral check against your next prompt, model, tool, or framework change. No cloud account required. No telemetry by default. Runs locally or in your GitHub Actions runner.

Install `maida-ai`
Add `@trace` to one agent entrypoint
Capture a known-good baseline
Add `.maida/policy.yaml`
Run `maida-ai/maida-assert@v2` on pull requests
FAQ

Common questions

Do I need a Maida cloud account?
No. Maida is local-first. It runs on your machine or CI runner, and runs stay in your environment unless you explicitly export them.
Can I use Maida in GitHub Actions?
Yes. Use maida-ai/maida-assert@v2 to run behavioral checks in CI.
Is Maida an eval tool?
No. Output evals ask whether the answer is good. Maida asks whether the agent's execution behavior changed before merge.
Why is the package called `maida-ai` but the CLI is `maida`?
The PyPI package is maida-ai. It installs the maida command-line tool and the maida Python module.