Don't let broken agent changes merge.

Maida gates your PRs with behavioral checks for AI agents. It compares current runs against checked-in baselines and fails CI when structural behavior regresses: more steps, unexpected tool calls, loops, latency spikes, or cost blowups.

Example CI output
$ maida assert <RUN_ID> --baseline baselines/main.json --policy .maida/policy.yaml --format markdown

[FAIL] step_count       6 -> 22 (+267%)
[FAIL] tool_calls       4 -> 19 (+375%)
[FAIL] new_tool_path    salesforce_api
[FAIL] cost_estimate    $0.012 -> $0.041 (+242%)
[PASS] loop_risk        none detected

4 checks failed. PR blocked.
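Each percentage in the report is the relative change from the baseline value. Here is a minimal sketch of that arithmetic. It assumes a plain (current - baseline) / baseline formula rounded to a whole percent. `pct_change` and `format_row` are hypothetical helpers for illustration and are not part of Maida's API.

```python
def pct_change(baseline: float, current: float) -> float:
    """Relative change from baseline to current, in percent (assumed formula)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero to compute a relative change")
    return (current - baseline) / baseline * 100


def format_row(status: str, name: str, baseline: float, current: float, unit: str = "") -> str:
    """Render one report line in the style of the example output above."""
    change = pct_change(baseline, current)
    return f"[{status}] {name:<16} {unit}{baseline:g} -> {unit}{current:g} ({change:+.0f}%)"


# Values taken from the example CI output.
print(format_row("FAIL", "step_count", 6, 22))
print(format_row("FAIL", "tool_calls", 4, 19))
print(format_row("FAIL", "cost_estimate", 0.012, 0.041, unit="$"))
```

The three calls reproduce the step_count, tool_calls, and cost_estimate lines of the example report.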

Output evals: Was the answer good?
Production tools: What happened after deploy?
Maida: Did this PR change agent behavior before merge?

The problem

Code changes break agents.
Nobody notices.

A prompt tweak can preserve the final answer while changing the execution path completely. The agent may take more steps, call a new tool, retry more often, or enter a loop. Manual smoke tests and output-only checks often miss that kind of regression.

Output evals ask whether the answer was good. Maida asks whether behavior changed.

Output evals are useful. Production tools are useful. Maida sits earlier in the workflow: before merge, inside CI, where a PR can still be blocked.

How it works

Install. Record. Baseline.
Block regressions.

Maida is the pre-merge behavioral regression gate for AI agents.

01

Install Maida

Install the `maida-ai` package from PyPI. It provides the `maida` CLI and the `maida` Python module.

pip install maida-ai

02

Instrument your agent

Wrap one agent entrypoint so Maida can record runs.

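from maida import trace

@trace
def run_agent(input: str):
    # Your existing agent code
    ...

if __name__ == "__main__":
    # Call the entrypoint so `python my_agent.py` records a run
    run_agent("example input")
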
03

Capture a known-good baseline

Run your agent once on main, find its run ID with `maida list --json`, and commit the baseline JSON to your repository.

python my_agent.py
maida list --json
maida baseline <RUN_ID> --out baselines/my_agent.json

04

Assert future runs

Compare each new run against the baseline. `maida assert` fails when the policy says behavior has regressed, and `maida diff` and `maida view` help you inspect what changed.

python my_agent.py
maida assert <RUN_ID> --baseline baselines/my_agent.json --policy .maida/policy.yaml --format markdown
maida diff <RUN_ID> --baseline baselines/my_agent.json
maida view <RUN_ID>

OpenTelemetry traces

Local trace evidence, OTel-compatible.

Maida stores each run as OpenTelemetry-compatible spans and keeps the files local by default. The same span data feeds baselines, assertions, diffs, and the timeline viewer.

spans.jsonl

Append-only span records with trace_id, span_id, parent_span_id, timing, attributes, and status.

meta.json

Run metadata keyed by the OTel trace ID, including status, counts, duration, and run name.

events

A compatibility projection via spans_to_events() for baselines, policy checks, diffs, and the UI.
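Because spans.jsonl is JSON Lines with parent_span_id links, a short script can rebuild the span tree of each trace. This is a minimal sketch under a few assumptions: one JSON object per line, a null or missing parent_span_id on root spans, and an optional `name` key. The `name` key is an assumption, so the sketch falls back to span_id when it is absent. `load_spans`, `build_tree`, and `render_tree` are hypothetical helpers, not Maida functions.

```python
import json
from collections import defaultdict
from typing import Iterable


def load_spans(lines: Iterable[str]) -> list[dict]:
    """Parse JSON Lines input, one span object per non-blank line."""
    return [json.loads(line) for line in lines if line.strip()]


def build_tree(spans: list[dict]) -> dict[str, dict]:
    """Group spans by trace_id and index each span under its parent's span_id."""
    traces: dict[str, dict] = defaultdict(
        lambda: {"roots": [], "children": defaultdict(list)}
    )
    for span in spans:
        trace = traces[span["trace_id"]]
        parent = span.get("parent_span_id")
        if parent:
            trace["children"][parent].append(span)
        else:
            # Assumption: root spans have no parent_span_id (null or missing).
            trace["roots"].append(span)
    return dict(traces)


def render_tree(trace: dict, span: dict, depth: int = 0) -> list[str]:
    """Return one indented line per span, depth-first from the given span."""
    lines = ["  " * depth + span.get("name", span["span_id"])]
    for child in trace["children"].get(span["span_id"], []):
        lines.extend(render_tree(trace, child, depth + 1))
    return lines


sample = [
    '{"trace_id": "t1", "span_id": "a", "parent_span_id": null, "name": "run_agent"}',
    '{"trace_id": "t1", "span_id": "b", "parent_span_id": "a", "name": "llm_call"}',
    '{"trace_id": "t1", "span_id": "c", "parent_span_id": "a", "name": "tool_call"}',
]
trees = build_tree(load_spans(sample))
for root in trees["t1"]["roots"]:
    print("\n".join(render_tree(trees["t1"], root)))
```

For a real run, pass an open file handle instead of `sample`, for example `with open(path) as f: spans = load_spans(f)`, where `path` points at a spans.jsonl file under ~/.maida/runs/.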

GitHub Action

Block bad PRs before merge.

Add maida-ai/maida-assert@v2 to your pull request workflow. Your agent-script must use @trace or traced_run() so Maida can record a run.

See the action repository for the current list of inputs: agent-script, baseline, policy, maida-version, python-version, extra-args, and post-comment.

.github/workflows/agent-regression-check.yml
name: Agent Regression Check

on: [pull_request]

jobs:
  agent-check:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
    steps:
      - uses: actions/checkout@v4
      - uses: maida-ai/maida-assert@v2
        with:
          agent-script: my_agent.py
          baseline: baselines/my_agent.json
          policy: .maida/policy.yaml
          python-version: '3.12'

Behavioral checks

What Maida flags before merge.

These are behavioral regression signals. They do not prove output quality. They tell you when the agent's execution behavior changed relative to a baseline.

01 step count
02 tool call count
03 unexpected tool path
04 loop risk
05 guardrail events
06 latency envelope
07 cost envelope
08 stop condition

.maida/policy.yaml
assert:
  no_loops: true
  no_guardrails: true
  step_tolerance: 0.5
  expect_status: ok

Policy-as-code

Commit the policy file beside your baseline. The policy defines which behavior changes the CI check should tolerate and which changes should block the pull request.
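One possible reading of the example policy is this. `step_tolerance: 0.5` allows the step count to grow by up to 50% over the baseline. `no_loops` and `no_guardrails` fail the check if any loop or guardrail event is recorded. `expect_status` requires the run to end with that status. The sketch below encodes that reading in plain Python. These semantics are an assumption for illustration, and the Maida documentation is authoritative on how each key is evaluated. `RunSummary` and `evaluate` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class RunSummary:
    """Hypothetical per-run summary; field names are illustrative only."""
    steps: int
    loops_detected: int
    guardrail_events: int
    status: str


def evaluate(policy: dict, baseline: RunSummary, current: RunSummary) -> list[str]:
    """Return failure messages; an empty list means the check passes."""
    failures = []
    tolerance = policy.get("step_tolerance")
    if tolerance is not None:
        # Assumed meaning: allow up to (1 + tolerance) x the baseline step count.
        limit = baseline.steps * (1 + tolerance)
        if current.steps > limit:
            failures.append(f"step_count {baseline.steps} -> {current.steps} exceeds limit {limit:g}")
    if policy.get("no_loops") and current.loops_detected > 0:
        failures.append(f"loop_risk {current.loops_detected} loop(s) detected")
    if policy.get("no_guardrails") and current.guardrail_events > 0:
        failures.append(f"guardrail_events {current.guardrail_events} event(s)")
    expected = policy.get("expect_status")
    if expected is not None and current.status != expected:
        failures.append(f"status expected {expected!r}, got {current.status!r}")
    return failures


# Mirrors the example .maida/policy.yaml as a Python dict.
policy = {"no_loops": True, "no_guardrails": True, "step_tolerance": 0.5, "expect_status": "ok"}
baseline = RunSummary(steps=6, loops_detected=0, guardrail_events=0, status="ok")
print(evaluate(policy, baseline, RunSummary(steps=8, loops_detected=0, guardrail_events=0, status="ok")))
print(evaluate(policy, baseline, RunSummary(steps=22, loops_detected=1, guardrail_events=0, status="ok")))
```

The first call passes because 8 steps is within 1.5 x 6 = 9. The second fails on both step count and loop risk. In practice you would load `.maida/policy.yaml` with a YAML parser rather than writing the dict by hand.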

Example regressions Maida catches

These are product examples, not customer testimonials.

Prompt change increases steps

The final answer still looks fine, but the agent now takes 22 steps instead of 6. Maida flags the step-count regression before the PR merges.

New tool path appears

A refactor introduces a tool call that never appeared in the baseline. Maida fails the check until the behavior is fixed or explicitly accepted in policy.
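At its core, detecting a new tool path is a set difference: it finds the tools the current run called that never appear in the baseline. This is a minimal sketch. `salesforce_api` comes from the example report above, and the other tool names are hypothetical. The `allowed` set stands in for behavior that has been explicitly accepted. How Maida's policy file expresses that acceptance is not shown here, so this parameter is an assumption.

```python
def new_tool_paths(baseline_tools: list[str], current_tools: list[str],
                   allowed: frozenset[str] = frozenset()) -> list[str]:
    """Return tools called in the current run but absent from the baseline.

    Tools in `allowed` are treated as explicitly accepted and are not reported.
    Results keep the order of first appearance in the current run.
    """
    known = set(baseline_tools) | allowed
    seen: set[str] = set()
    result = []
    for tool in current_tools:
        if tool not in known and tool not in seen:
            seen.add(tool)
            result.append(tool)
    return result


baseline = ["search_docs", "search_docs", "summarize"]
current = ["search_docs", "salesforce_api", "summarize", "salesforce_api"]
print(new_tool_paths(baseline, current))
print(new_tool_paths(baseline, current, frozenset({"salesforce_api"})))
```

The first call reports `salesforce_api`. The second reports nothing, because that tool has been accepted.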

Agent stops reaching its final state

The run produces partial output but no longer reaches the expected terminal condition. Maida catches the missing stop condition in CI.

Loop risk appears

A model or prompt change causes repeated calls with similar inputs. Maida records the pattern and fails the PR when policy says loops are not allowed.
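A simple loop-risk heuristic flags a tool that is called several times in a row with the same arguments. The sketch below is a swapped-in illustration, not Maida's detection logic. It approximates "similar inputs" as arguments that are equal after key sorting. The default threshold of 3 consecutive identical calls is an arbitrary assumption. `call_key` and `detect_loops` are hypothetical names.

```python
import json


def call_key(tool: str, args: dict) -> tuple[str, str]:
    """Normalize a call so that argument key order does not affect comparison."""
    return tool, json.dumps(args, sort_keys=True)


def detect_loops(calls: list[tuple[str, dict]], threshold: int = 3) -> list[tuple[str, int]]:
    """Return (tool, streak_length) for each streak of identical consecutive calls >= threshold."""
    loops: list[tuple[str, int]] = []
    prev = None
    streak = 0
    for tool, args in calls:
        key = call_key(tool, args)
        streak = streak + 1 if key == prev else 1
        prev = key
        if streak == threshold:
            loops.append((tool, streak))
        elif streak > threshold:
            # Same streak is still growing: update its recorded length.
            loops[-1] = (tool, streak)
    return loops


calls = [
    ("search", {"q": "acme"}),
    ("search", {"q": "acme"}),
    ("search", {"q": "acme"}),
    ("search", {"q": "acme"}),
    ("summarize", {"text": "..."}),
]
print(detect_loops(calls))
```

Here the four identical `search` calls are reported as a single streak of length 4.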

Local-first

Runs stay in your environment.

Maida runs on your machine or your GitHub Actions runner. No cloud account is required, and there is no telemetry by default. Maida.AI does not receive your traces unless you explicitly export them or configure external telemetry.

Maida writes local OTel span files under ~/.maida/runs/. You can inspect them, archive them, or delete them. Local traces can still contain sensitive content depending on what your code records, so redaction stays on by default.
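For inspection or cleanup scripts, the standard library is enough to walk the runs directory. This is a minimal sketch that assumes each run lives in its own subdirectory of ~/.maida/runs/ with a meta.json inside. That layout and the meta.json key names (such as `status`) are assumptions, so check your own ~/.maida/runs/ before scripting archives or deletes. `list_runs` is a hypothetical helper.

```python
import json
from pathlib import Path


def list_runs(runs_dir: Path) -> list[tuple[str, dict]]:
    """Return (directory name, parsed meta.json) for each run directory that has one.

    Assumed layout: runs_dir/<run_id>/meta.json. Directories without meta.json are skipped.
    """
    runs: list[tuple[str, dict]] = []
    if not runs_dir.is_dir():
        return runs
    for run_dir in sorted(p for p in runs_dir.iterdir() if p.is_dir()):
        meta_path = run_dir / "meta.json"
        if meta_path.is_file():
            runs.append((run_dir.name, json.loads(meta_path.read_text())))
    return runs
```

For example, `for name, meta in list_runs(Path.home() / ".maida" / "runs"): print(name, meta.get("status"))` prints one line per recorded run.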

Start with one agent workflow

Try Maida on your next agent change.

Install Maida, capture a known-good baseline, and run a behavioral check against your next prompt, model, tool, or framework change. Everything runs locally or on your GitHub Actions runner, with no cloud account and no telemetry by default.

Install `maida-ai`
Add `@trace` to one agent entrypoint
Capture a known-good baseline
Add `.maida/policy.yaml`
Run `maida-ai/maida-assert@v2` on pull requests

FAQ

Common questions

Do I need a Maida cloud account?
No. Maida is local-first. It runs on your machine or CI runner, and Maida.AI does not receive runs by default unless you explicitly export them or configure external telemetry.
Can I use Maida in GitHub Actions?
Yes. Use maida-ai/maida-assert@v2 to run behavioral checks in CI.
Is Maida an eval tool?
No. Output evals ask whether the answer is good. Maida asks whether the agent's execution behavior changed before merge.
Why is the package called `maida-ai` but the CLI is `maida`?
The PyPI package is maida-ai. It installs the maida command-line tool and the maida Python module.