Maida gates your PRs with behavioral checks for AI agents. It compares current runs against checked-in baselines and fails CI when structural behavior regresses: more steps, unexpected tool calls, loops, latency spikes, or cost blowups.
$ maida assert <RUN_ID> --baseline baselines/main.json --policy .maida/policy.yaml --format markdown
[FAIL] step_count 6 -> 22 (+267%)
[FAIL] tool_calls 4 -> 19 (+375%)
[FAIL] new_tool_path salesforce_api
[FAIL] cost_estimate $0.012 -> $0.041 (+242%)
[PASS] loop_risk none detected
4 checks failed. PR blocked.
A prompt tweak can preserve the final answer while changing the execution path completely. The agent may take more steps, call a new tool, retry more often, or enter a loop. Manual smoke tests and output-only checks often miss that kind of regression.
Output evals ask whether the answer was good. Maida asks whether behavior changed.
Output evals are useful. Production tools are useful. Maida sits earlier in the workflow: before merge, inside CI, where a PR can still be blocked.
Maida is the pre-merge behavioral regression gate for AI agents.
Install the `maida-ai` package from PyPI. It provides the `maida` CLI and the `maida` Python module.
pip install maida-ai
Wrap one agent entrypoint so Maida can record runs.
from maida import trace

@trace
def run_agent(input: str):
    # Your existing agent code
    ...
Run your agent once on main, find the trace-backed run ID, and check in the baseline JSON.
python my_agent.py
maida list --json
maida baseline <RUN_ID> --out baselines/my_agent.json
Compare each new run against the baseline and fail when policy says behavior regressed.
python my_agent.py
maida assert <RUN_ID> --baseline baselines/my_agent.json --policy .maida/policy.yaml --format markdown
maida diff <RUN_ID> --baseline baselines/my_agent.json
maida view <RUN_ID>
Maida stores each run as OpenTelemetry-compatible spans and keeps the files local by default. Those spans still feed the same baselines, assertions, diffs, and timeline viewer.
Append-only span records with trace_id, span_id, parent_span_id, timing, attributes, and status.
Run metadata keyed by the OTel trace ID, including status, counts, duration, and run name.
A compatibility projection via spans_to_events() for baselines, policy checks, diffs, and the UI.
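To make the record shape concrete, here is a minimal sketch of span records linked by parent_span_id and flattened into time-ordered events. The `Span` dataclass, its field types, and `flatten_to_events` are hypothetical illustrations. They are not Maida's on-disk schema, and this is not the implementation of spans_to_events().

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    # Field names follow the list above; the types are assumptions.
    trace_id: str
    span_id: str
    parent_span_id: Optional[str]
    name: str
    start_ns: int
    end_ns: int
    status: str = "ok"
    attributes: dict = field(default_factory=dict)


def flatten_to_events(spans: list[Span]) -> list[dict]:
    """Project spans into flat events ordered by start time, with nesting depth."""
    by_id = {s.span_id: s for s in spans}

    def depth(span: Span) -> int:
        # Walk up the parent chain until a root (or unknown parent) is reached.
        d = 0
        while span.parent_span_id is not None and span.parent_span_id in by_id:
            span = by_id[span.parent_span_id]
            d += 1
        return d

    return [
        {
            "name": s.name,
            "depth": depth(s),
            "duration_ms": (s.end_ns - s.start_ns) / 1e6,
            "status": s.status,
        }
        for s in sorted(spans, key=lambda s: s.start_ns)
    ]


# Illustrative data only: one root span with two children.
spans = [
    Span("t1", "a", None, "run_agent", 0, 9_000_000),
    Span("t1", "b", "a", "llm_call", 1_000_000, 4_000_000),
    Span("t1", "c", "a", "tool:search", 5_000_000, 8_000_000),
]
events = flatten_to_events(spans)
for event in events:
    print(event)
```

Keeping the parent link on every span lets a flat, append-only file be turned back into a tree or a timeline without storing nested structures.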
Add `maida-ai/maida-assert@v2` to pull requests. Your `agent-script` must use `@trace` or `traced_run()` so Maida can record a run.
See the action repo for the current inputs: agent-script, baseline, policy, maida-version, python-version, extra-args, and post-comment.
name: Agent Regression Check
on: [pull_request]
jobs:
  agent-check:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
    steps:
      - uses: actions/checkout@v4
      - uses: maida-ai/maida-assert@v2
        with:
          agent-script: my_agent.py
          baseline: baselines/my_agent.json
          policy: .maida/policy.yaml
          python-version: '3.12'
These are behavioral regression signals. They do not prove output quality. They tell you when the agent's execution behavior changed relative to a baseline.
assert:
  no_loops: true
  no_guardrails: true
  step_tolerance: 0.5
  expect_status: ok
Commit the policy file beside your baseline. The policy defines which behavior changes the CI check should tolerate and which changes should block the pull request.
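One way to read `step_tolerance` is as the maximum allowed relative increase in step count over the baseline. That reading is an assumption made for illustration, and Maida's actual policy semantics may differ. The helpers `relative_change` and `step_check` below are hypothetical and not part of the `maida` module.

```python
def relative_change(baseline: float, current: float) -> float:
    """Fractional change from baseline to current (0.5 means +50%)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (current - baseline) / baseline


def step_check(baseline_steps: int, current_steps: int, step_tolerance: float) -> str:
    """Return a report line; FAIL when the increase exceeds the tolerance."""
    change = relative_change(baseline_steps, current_steps)
    # Assumed semantics: tolerance is the largest acceptable fractional increase.
    verdict = "FAIL" if change > step_tolerance else "PASS"
    return f"[{verdict}] step_count {baseline_steps} -> {current_steps} ({change:+.0%})"


line = step_check(6, 22, step_tolerance=0.5)
print(line)  # [FAIL] step_count 6 -> 22 (+267%)
```

Under this reading, a tolerance of 0.5 would let a 6-step baseline grow to 9 steps before the check fails.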
These are product examples, not customer testimonials.
The final answer still looks fine, but the agent now takes 22 steps instead of 6. Maida flags the step-count regression before the PR merges.
A refactor introduces a tool call that never appeared in the baseline. Maida fails the check until the behavior is fixed or explicitly accepted in policy.
The run produces partial output, but no longer reaches the expected terminal condition. Maida catches the missing stop condition in CI.
A model or prompt change causes repeated calls with similar inputs. Maida records the pattern and fails the PR when policy says loops are not allowed.
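Two of the signals above, a tool that never appeared in the baseline and repeated calls with similar inputs, can be sketched in a few lines. This assumes a run has already been reduced to an ordered list of (tool name, input) pairs. That shape, the helper names, and the `max_repeats` threshold are all hypothetical, and "similar" is approximated here as equal after normalizing case and whitespace. Maida's own detection logic is not shown in this document and may work differently.

```python
from collections import Counter

# Hypothetical trace shape: an ordered list of (tool_name, input) pairs.
ToolCall = tuple[str, str]


def new_tools(baseline: list[ToolCall], current: list[ToolCall]) -> set[str]:
    """Tools called in the current run that never appear in the baseline."""
    return {name for name, _ in current} - {name for name, _ in baseline}


def repeated_calls(calls: list[ToolCall], max_repeats: int = 2) -> list[ToolCall]:
    """Calls seen more than max_repeats times after normalizing the input."""
    counts = Counter((name, " ".join(arg.lower().split())) for name, arg in calls)
    return [call for call, n in counts.items() if n > max_repeats]


# Illustrative data only.
baseline = [("search", "acme renewal date"), ("crm_lookup", "acme")]
current = [
    ("search", "acme renewal date"),
    ("salesforce_api", "acme"),
    ("search", "Acme  renewal date"),
    ("search", "ACME renewal date"),
]

print(new_tools(baseline, current))  # {'salesforce_api'}
print(repeated_calls(current))       # [('search', 'acme renewal date')]
```

Exact matching after normalization is the simplest possible similarity test. A real check would likely need something fuzzier to catch retries that rephrase the input.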
Runs on your machine or GitHub Actions runner. No cloud account required. No telemetry by default. Maida.AI does not receive your traces unless you explicitly export them or configure external telemetry.
Maida writes local OTel span files under `~/.maida/runs/`. You can inspect them, archive them, or delete them. Local traces can still contain sensitive content depending on what your code records, so redaction stays on by default.
Install Maida, capture a known-good baseline, and run a behavioral check against your next prompt, model, tool, or framework change. No cloud account required. No telemetry by default. Runs locally or in your GitHub Actions runner, and Maida.AI does not receive your traces by default.