Is this another agent observability tool, like LangSmith or Braintrust?
No. LangSmith and Braintrust observe and evaluate the agents you build. Armature tests how your users' agents behave on the product you ship: your MCP, your CLI. Think of it as the layer that replaces UI testing: you used to test the UI your human users clicked through; now you test the MCP their agents call. We grade the agents' behavior and the reasoning behind it, the part you'd typically uncover in user interviews with humans.
How is this different from Datadog Synthetics or Playwright-based testing?
Synthetics tools test deterministic browser flows; they were built for the click-based web. Armature spawns real LLM agents that reason their way through your MCP and CLI the way your users' agents do. We catch the regressions a script can't see: the wrong tool picked, the off-script retry, the path that only fails on Opus 4.7.
Can't I just ask my Claude Code to test my MCP?
It's tempting, but the result is biased. Your Claude Code already knows how to use your MCP and has full context from your codebase, exactly the context your real users won't have. You'd also need to re-run it on every harness, every time you ship a change, every time a new model lands, and regularly even when nothing on your side has changed, because labs ship harness updates and reasoning improvements that quietly change behavior on production MCPs.
Do I need to manually write all my own test scenarios?
No. An AI agent discovers your tools, configures monitors automatically, and suggests realistic end-to-end workflows: the kind your users would actually prompt their agent for. It also defines the right evaluation criteria, grounded in both the real outcome and the tester agent's reasoning trace. You can edit anything, but the default setup is on its feet in under a minute.
Which MCPs and CLIs do you support?
Any MCP that speaks the protocol over HTTP, SSE, or stdio, and any CLI we can spawn in a sandbox. Bring your own bearer token, API key, or basic auth. Secrets are stored server-side and never exposed. OAuth support is coming soon.
How does billing work?
Everything is fully included in your plan, within the quotas listed above. No hidden charges, no metered surprises. If you need more than what's available, talk to us and we'll work out something custom.
Is there a free tier or a free trial?
We don't offer a free tier or a 7-day trial because we want you to enjoy full access from day one, and for long enough to see real value. Instead, we make it completely risk-free: if you're not 100% satisfied within 30 days, tell us and get a full refund. No questions asked.