Maida gates your PRs with behavioral checks for AI agents. It compares current runs against checked-in baselines and fails CI when structural behavior regresses: more steps, unexpected tool calls, loops, latency spikes, or cost blowups.
```
$ maida assert <RUN_ID> --baseline baselines/main.json --policy .maida/policy.yaml --format markdown

[FAIL] step_count     6 -> 22   (+267%)
[FAIL] tool_calls     4 -> 19   (+375%)
[FAIL] new_tool_path  salesforce_api
[FAIL] cost_estimate  $0.012 -> $0.041  (+242%)
[PASS] loop_risk      none detected

4 checks failed. PR blocked.
```
A prompt tweak can preserve the final answer while changing the execution path completely. The agent may take more steps, call a new tool, retry more often, or enter a loop. Manual smoke tests and output-only checks often miss that kind of regression.
Output evals ask whether the answer was good. Maida asks whether behavior changed.
Output evals are useful. Production observability tools are useful. Maida sits earlier in the workflow: before merge, inside CI, where a PR can still be blocked.
Maida is the pre-merge behavioral regression gate for AI agents.
Install the `maida-ai` package from PyPI. It provides the `maida` CLI and the `maida` Python module.
```
pip install maida-ai
```
Wrap one agent entrypoint so Maida can record runs.
```python
from maida import trace

@trace
def run_agent(input: str):
    # Your existing agent code
    ...
```
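If decorating the entrypoint is awkward, Maida also supports `traced_run()` (mentioned again in the CI setup below). Its exact signature is an assumption here; this sketch assumes it wraps a callable and its arguments:

```python
from maida import traced_run

# Assumed signature: traced_run(fn, *args, **kwargs) returns fn's result
# while recording the run. Check the Maida docs for the real API.
result = traced_run(run_agent, "summarize yesterday's support tickets")
```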
Run your agent once on main, find the run ID, and check in the baseline JSON.
```
python my_agent.py
maida list --json
maida baseline <RUN_ID> --out baselines/my_agent.json
```
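The baseline's exact schema is whatever `maida baseline` writes. As a rough mental model, it records the structural metrics the assert output reports; the field names below are illustrative assumptions, not the real format:

```json
{
  "run_id": "<RUN_ID>",
  "step_count": 6,
  "tool_calls": 4,
  "cost_estimate": 0.012,
  "status": "ok"
}
```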
Compare each new run against the baseline and fail when policy says behavior regressed.
```
python my_agent.py
maida assert <RUN_ID> --baseline baselines/my_agent.json --policy .maida/policy.yaml --format markdown
maida diff <RUN_ID> --baseline baselines/my_agent.json
maida view <RUN_ID>
```
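If you want to script the whole loop, something like this works, assuming `maida list --json` prints runs newest-first with an `id` field (both assumptions; check the actual output):

```sh
python my_agent.py
RUN_ID=$(maida list --json | jq -r '.[0].id')   # assumed JSON shape
maida assert "$RUN_ID" --baseline baselines/my_agent.json --policy .maida/policy.yaml
```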
Add `maida-ai/maida-assert@v2` to your pull request workflow. The script you point `agent-script` at must use `@trace` or `traced_run()` so Maida can record a run.
See the action repo for the current inputs: `agent-script`, `baseline`, `policy`, `maida-version`, `python-version`, `extra-args`, and `post-comment`.
```yaml
name: Agent Regression Check

on: [pull_request]

jobs:
  agent-check:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
    steps:
      - uses: actions/checkout@v4
      - uses: maida-ai/maida-assert@v2
        with:
          agent-script: my_agent.py
          baseline: baselines/my_agent.json
          policy: .maida/policy.yaml
          python-version: '3.12'
```
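The other inputs from the list above are optional. A fuller step might look like this; the values are illustrative guesses, not documented defaults:

```yaml
      - uses: maida-ai/maida-assert@v2
        with:
          agent-script: my_agent.py
          baseline: baselines/my_agent.json
          policy: .maida/policy.yaml
          python-version: '3.12'
          maida-version: 'latest'          # illustrative; pin a release in real use
          extra-args: '--format markdown'  # assumed to be forwarded to `maida assert`
          post-comment: 'true'             # assumed to post results as a PR comment
```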
These are behavioral regression signals. They do not prove output quality. They tell you when the agent's execution behavior changed relative to a baseline.
```yaml
assert:
  no_loops: true
  no_guardrails: true
  step_tolerance: 0.5
  expect_status: ok
```
Commit the policy file beside your baseline. The policy defines which behavior changes the CI check should tolerate and which changes should block the pull request.
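As a mental model, assume `step_tolerance: 0.5` means the current run may take up to 50% more steps than the baseline (an assumption about the semantics, not documented behavior). The step check would then reduce to something like this sketch:

```python
def within_step_tolerance(baseline_steps: int, current_steps: int,
                          tolerance: float) -> bool:
    """Hypothetical sketch of the step-count check, not Maida's actual code."""
    return current_steps <= baseline_steps * (1 + tolerance)

# The failing run from the example output: 6 baseline steps, 22 current.
# With tolerance 0.5 the ceiling is 9 steps, so the check fails.
assert not within_step_tolerance(6, 22, 0.5)
```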
These are product examples, not customer testimonials.
The final answer still looks fine, but the agent now takes 22 steps instead of 6. Maida flags the step-count regression before the PR merges.
A refactor introduces a tool call that never appeared in the baseline. Maida fails the check until the behavior is fixed or explicitly accepted in policy.
The run produces partial output, but no longer reaches the expected terminal condition. Maida catches the missing stop condition in CI.
A model or prompt change causes repeated calls with similar inputs. Maida records the pattern and fails the PR when policy says loops are not allowed.
Runs on your machine or GitHub Actions runner. No cloud account required. No telemetry by default. Runs stay in your environment unless you explicitly export them.
Maida writes local JSONL files under `~/.maida/runs/`. You can inspect them, archive them, or delete them.
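Because the files are plain JSONL, a few lines of Python are enough to inspect them. The event field names are up to Maida's format, and this sketch assumes run files sit directly under that directory with a `.jsonl` extension, so it only counts records:

```python
import json
from pathlib import Path

runs_dir = Path.home() / ".maida" / "runs"
for path in sorted(runs_dir.glob("*.jsonl")):
    # Each line is one JSON event recorded during a traced run.
    events = [json.loads(line) for line in path.read_text().splitlines() if line]
    print(f"{path.name}: {len(events)} events")
```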
Install Maida, capture a known-good baseline, and run a behavioral check against your next prompt, model, tool, or framework change.