OpenInfer × ClawMetry

Run AI anywhere.
Watch every turn.

Edge, on-prem, or cloud. Any hardware your agents can reach. See every CPU and GPU each turn ran on, what it cost, and which sub-agent caused the spend.

Install ClawMetry Start OpenInfer beta

📅 Book a 30-min demo

511k+ installs 387 stars 127+ countries E2E encrypted · local-first

The Promise

What if AI was:

The three pillars OpenInfer ships against. ClawMetry adds the fourth so you can prove it.

Low
Cost

Maximize ROI

Sovereign

Your control

Reliable

Always on

Observable

Every turn, every silicon

HARDWARE

Agnostic

EASY

To deploy

RESOURCE

Unbound

"We make it possible for the 90% of agentic workloads that are routine and always on, to run on leaner and often under-utilized compute topologies."

Behnam Bastani · CEO, OpenInfer

The Substrate

Meet OpenInfer

An inference OS that treats compute as a scheduling problem. Latency-critical turns run on GPU. The other 90 percent run on CPU and Graviton. Sessions migrate across processors without re-paying the prefill cost.

Learn more about OpenInfer

Distributed inference for heterogeneous compute +

A unified inference layer that schedules across every chip you have. GPUs handle the latency-critical work. CPUs absorb the long-tail. Sessions migrate at prefill and decode boundaries.

Built with deployment in mind +

Drop-in OpenAI-compatible endpoint. No agent code changes. A single config.json points your OpenClaw workspace at OpenInfer and the OS layer takes over.

Always-on collaborative AI +

Background agents that never sleep cost a fortune on premium silicon. OpenInfer keeps them alive on the cheap pool so always-on becomes economical.

Data center-grade inference where data lives +

Run at the edge, on-prem, or in cloud. The same OS layer schedules across whatever hardware is sitting in front of your data, so the data never has to leave.

Performance

Proven at the Edge

Real benchmarks on commodity hardware. +50% capacity on a single AWS g6e.16xlarge by recruiting otherwise-idle CPUs into the inference fabric.

Learn more

The Eyes

Meet ClawMetry

OpenInfer makes the substrate schedule smartly. ClawMetry shows you what actually ran. Every turn, every chip, every dollar, every sub-agent. Local-first and end-to-end encrypted, so the substrate stops being a black box.

Read the joint launch post

Per-turn silicon attribution +

Every LLM turn annotated with the chip it ran on. Scroll a Telegram chat replay and see "this turn cost $0.0008, ran on EPYC, 1.4s end-to-end" right next to the user's message and the agent's reply.

Cost split by route +

Two new lines in the Tokens tab: GPU pool, CPU pool. "86 percent of our token spend went through the other-90 percent pool this week" becomes a number you can show finance.

Sub-agent spawn tree +

A runaway that fans out 17 sub-agents shows up as a tree, each leaf with its own cost. Loop and stuck detection flags it, and a budget alert pages you before the quota wall.

Local-first, E2E encrypted +

An HTTP interceptor on the OpenClaw process. No new instrumentation. Your agents' conversations never leave your machine in plaintext; only simple totals reach the optional cloud in readable form. pip install clawmetry.

Together

The cost curve of heterogeneous compute. The explainability of a single-vendor stack.

Two ten-minute integrations. No code changes. No new dashboard to learn.

Get started