$1.3M vs $400: how a solo founder is outshipping a team on two Claude Code accounts

2026-05-16 · vivekchand · 5 min read

Peter Steinberger told The Decoder he spends $1.3 million a month running 100 AI agents at OpenClaw. Same week, I started running a stripped-down version of the same idea on two Claude Code 20× Max accounts. Total spend: about $400 a month. One week in, the dashboard you're reading this on has shipped one merged PR for every $4 I paid Anthropic.

Steinberger / OpenClaw: $1.3M / mo · 100 agents · code review at scale

ClawMetry: ~$400 / mo · 2 Claude Code Max seats · ships product

This is not a "look at my cool prompt" post. It's the opposite. It's about why every AI coding tool that shipped in the last year — Claude Code, Codex, Cursor, Hermes, Aider, Windsurf, the next half-dozen — has the same blind spot, and why filling it doesn't need a hundred-agent army.

The blind spot

You press enter in your favorite AI coding tool. The agent spawns sub-agents. Calls tools. Burns tokens. Sometimes loops. Sometimes hallucinates a file path. Sometimes it actually works.

You have no idea what happened until either the code runs, or your credit card statement does.

The terminal scrollback isn't enough. The provider dashboard tells you total tokens but not which tool call burned them. The OS doesn't know what a "sub-agent" is.

That gap is the product. ClawMetry is the dashboard that watches every agentic tool the way Datadog watches every service. Real-time. Cross-framework. Open source. One command to install.
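To make the gap concrete, here is a minimal sketch of per-tool attribution: the same token total a provider dashboard would show, broken down by which tool call and which agent spent it. The event shape (agent, tool, tokens) is an assumption for illustration, not ClawMetry's actual schema, and the numbers are made up.

```python
from collections import defaultdict

# Hypothetical event records; every agent framework logs its own shape.
events = [
    {"agent": "main", "tool": "read_file", "tokens": 1200},
    {"agent": "main", "tool": "bash", "tokens": 300},
    {"agent": "sub-1", "tool": "read_file", "tokens": 4100},
    {"agent": "sub-1", "tool": "web_search", "tokens": 900},
]


def tokens_by(events, key):
    """Sum token usage grouped by one field of each event, largest first."""
    totals = defaultdict(int)
    for event in events:
        totals[event[key]] += event["tokens"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


by_tool = tokens_by(events, "tool")
by_agent = tokens_by(events, "agent")
total = sum(event["tokens"] for event in events)
```

The provider only ever reports `total`. The two grouped views are what tell you that one sub-agent rereading files is where the money went.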

Where the ideas come from

People assume you need a roadmap. We have one, but it's downstream of five inputs:

One angry user. "I messaged Diya on Telegram and ClawMetry shows nothing." Five PRs later, every channel adapter has a proper tap path and an honest empty state. Angry users are the highest-conversion idea source on earth.
Two AI personas with strong opinions. A Steve Jobs UX persona and an Elon Musk product persona post pre-merge comments on every open PR. They catch tier-strategy holes, em-dashes in copy, missing CTAs at peak intent. They auto-open P0 issues when something's wrong. The cron loop drains those P0s the same day.
Dogfooding. The agents that ship ClawMetry are observed by ClawMetry. When something goes weird, we see it in the Brain tab in real time. The "Tools panel returned zero on v3" bug bit us for ten releases because nobody else was running both halves of the loop. Now there's a test class that re-asserts every fast path against the production event shape every fifteen minutes.
The market scan. Once a week, what shipped in the agentic ecosystem? Claude Code added sub-agent spawning → we ship a subagent tree visualization. Reasoning models got popular → we ship a per-LLM-call timeline. Hermes shipped → we ship a Hermes adapter. The bet is that every framework will eventually need the same observability primitives. The work is to keep the layer cross-tool.
Founders we trust. A 15-minute conversation with the right one beats two weeks of A/B testing.

The loop, in one breath

Four always-on crons on the second Claude Code seat. Each one fires every 10–30 minutes, spawns a fresh session, runs its prompt, writes a state file, exits.

Issue picker. Picks the oldest open P0 across three repos. Opens a PR. 3-agent concurrency cap, because at 4 they clobber each other.
Persona review. Steve + Elon comment on every open PR before it merges.
Comment scanner. Never lets a real human's comment sit longer than 10 minutes.
MOAT mandate. Re-runs the canonical end-to-end test every 15 minutes. If it goes red, the next fire fixes it inline.
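The issue picker's rule (oldest open P0 first, never more than three agents at once) fits in a few lines. This is a sketch in Python rather than the bash the loop actually runs in; the `Issue` fields, repo names, and function name are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

MAX_CONCURRENT_AGENTS = 3  # the post's cap: at 4 the agents clobber each other


@dataclass
class Issue:
    repo: str
    number: int
    priority: str
    opened_at: datetime
    is_open: bool = True


def pick_next_issue(issues, running_agents) -> Optional[Issue]:
    """Return the oldest open P0, or None if at the cap or nothing qualifies."""
    if running_agents >= MAX_CONCURRENT_AGENTS:
        return None
    candidates = [i for i in issues if i.is_open and i.priority == "P0"]
    return min(candidates, key=lambda i: i.opened_at, default=None)


# Hypothetical backlog spread across three repos.
backlog = [
    Issue("repo-a", 12, "P0", datetime(2026, 5, 14, 9, 0)),
    Issue("repo-b", 7, "P0", datetime(2026, 5, 13, 18, 30)),
    Issue("repo-c", 3, "P1", datetime(2026, 5, 10, 8, 0)),
    Issue("repo-a", 9, "P0", datetime(2026, 5, 12, 11, 0), is_open=False),
]

chosen = pick_next_issue(backlog, running_agents=2)
blocked = pick_next_issue(backlog, running_agents=3)
```

With two agents running, `chosen` is the repo-b issue: the oldest P0 that is still open. With three running, the picker returns nothing and waits for the next fire.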

Plus me on the first seat with one rule at the top of every prompt:

Decide, state a one-line rationale, act. Don't bounce ship-or-hold tradeoffs back to me. Drive every PR end-to-end: open it, get CI green, merge, watch the deploy, smoke-test the live URL, bug-sweep adjacent surfaces.

The whole coordination layer is three plain-text files in ~/.claude/state/. No queue, no database, no infra. cat and awk. About 200 lines of bash inside the cron prompts.
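A rough idea of what plain-text coordination can look like, sketched in Python instead of cat and awk. The file name, the tab-separated line format, and the cron names are assumptions for illustration; the real state files may look nothing like this. The demo writes to a temporary directory rather than ~/.claude/state/.

```python
import tempfile
from pathlib import Path


def record_run(state_file: Path, cron: str, status: str, timestamp: str) -> None:
    """Append one tab-separated line; earlier history is never rewritten."""
    with state_file.open("a", encoding="utf-8") as f:
        f.write(f"{timestamp}\t{cron}\t{status}\n")


def last_status(state_file: Path, cron: str):
    """Return the most recent status recorded for a cron, or None."""
    if not state_file.exists():
        return None
    latest = None
    for line in state_file.read_text(encoding="utf-8").splitlines():
        _timestamp, name, status = line.split("\t")
        if name == cron:
            latest = status  # later lines overwrite earlier ones
    return latest


state_dir = Path(tempfile.mkdtemp())
state_file = state_dir / "runs.tsv"  # hypothetical file name

record_run(state_file, "moat-mandate", "green", "2026-05-16T10:00:00Z")
record_run(state_file, "moat-mandate", "red", "2026-05-16T10:15:00Z")
record_run(state_file, "issue-picker", "opened-pr", "2026-05-16T10:20:00Z")

moat = last_status(state_file, "moat-mandate")
scanner = last_status(state_file, "comment-scanner")
```

Here `moat` comes back as "red", which is the signal the next fire would act on, and `scanner` is `None` because that cron has not written anything yet. An append-only file that every fresh session can read top to bottom is the entire protocol.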

One week in

We started this loop on a Wednesday. It's been roughly seven days. The dashboard's MOAT test count went from 17 to 56. Twenty-something PRs landed today alone. The cron caught a real regression on main and fixed it in the next cycle. Two external contributors landed PRs without ever waiting on me.

Steinberger's setup ships code reviews. This one ships product.

The reason a small fleet works at all is leverage from other people's hard work. The Claude Code team gave us a CLI good enough to script. OpenClaw gave us the agent runtime everything plugs into. Clawpatch — also from the OpenClaw team — is teaching us how to ship automated code review with explicit fixes instead of just commentary, and we're folding its GitHub Actions integration into our own CI loop this week. None of this would exist without those foundations. We just put a focused loop on top.

Why this matters past the stunt value

Every AI coding tool you'll use this year is going to ship without observability. Not because the teams are lazy — because "what just happened?" is not their core problem. It's a cross-tool problem, which means no single tool will solve it.

You can wait for the obvious answer (some big incumbent rolls out an agentic Datadog in 18 months). Or you can install the one that's already shipping daily, on a budget roughly three thousand times smaller than the marquee operation ($1.3M versus about $400 a month), with a loop you could copy this weekend.

The dashboard that watches the agents that ship the dashboard

Open source. Free forever for one node. Works on Claude Code, Codex, Cursor, Hermes, OpenClaw, and every channel your agents touch.

Try ClawMetry free

Found this useful? Tell one person. Building something similar? Email vivek@clawmetry.com or drop into GitHub Discussions.