Helm - The control plane for autonomous AI agents

02 What Helm does

Four primitives. They cover almost everything you'd want from agent oversight.

01

Live event stream

See every action as it happens.

Tool calls, model output, file changes, sub-agent spawns - streamed over WebSocket the moment they fire. Every event is timestamped, typed, and replayable. No more guessing what your agent did while you were away.
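As a rough sketch of what "typed and replayable" can mean in practice, the TypeScript below models the four event kinds named above as a discriminated union and replays a run by folding its events in sequence order. The names `HelmEvent`, `RunSummary`, and `replayRun`, along with every field, are assumptions made for illustration; they are not Helm's published schema.

```typescript
// Hypothetical event shapes; the real schema is not published in this text.
type HelmEvent =
  | { seq: number; at: string; type: "tool_call"; tool: string }
  | { seq: number; at: string; type: "model_output"; text: string }
  | { seq: number; at: string; type: "file_change"; path: string }
  | { seq: number; at: string; type: "subagent_spawn"; agentId: string };

interface RunSummary {
  toolCalls: number;
  filesTouched: string[];
  subAgents: number;
}

// Replaying a run means folding its ordered events into a view of the run.
function replayRun(events: HelmEvent[]): RunSummary {
  const summary: RunSummary = { toolCalls: 0, filesTouched: [], subAgents: 0 };
  const ordered = events.slice().sort((a, b) => a.seq - b.seq);
  for (const ev of ordered) {
    switch (ev.type) {
      case "tool_call":
        summary.toolCalls += 1;
        break;
      case "file_change":
        if (summary.filesTouched.indexOf(ev.path) === -1) {
          summary.filesTouched.push(ev.path);
        }
        break;
      case "subagent_spawn":
        summary.subAgents += 1;
        break;
      case "model_output":
        break; // text output does not change this particular summary
    }
  }
  return summary;
}

// Events may arrive out of order; the sequence number restores the order.
const sampleEvents: HelmEvent[] = [
  { seq: 2, at: "2025-01-01T00:00:02Z", type: "file_change", path: "src/auth.ts" },
  { seq: 1, at: "2025-01-01T00:00:01Z", type: "tool_call", tool: "Bash" },
  { seq: 3, at: "2025-01-01T00:00:03Z", type: "model_output", text: "done" },
];
const sampleSummary = replayRun(sampleEvents);
```

Because each event carries its own type tag, any client can rebuild the same view from the same log.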

02

Permission gates

The agent stops and asks before it acts.

Mark which tools require approval - destructive shell, money movement, outbound email, production writes. The agent blocks and waits. You allow or deny from any device - phone, web, terminal.
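One way to picture the blocking behaviour: the agent awaits a promise that only a human decision resolves. The sketch below is a minimal in-memory version under that assumption. `PermissionGate`, `waiting`, and `decide` are hypothetical names, and the tool names are examples only.

```typescript
type GateDecision = "allow" | "deny";

interface PendingApproval {
  id: number;
  tool: string;
  resolve: (decision: GateDecision) => void;
}

class PermissionGate {
  private nextId = 1;
  private pending = new Map<number, PendingApproval>();
  private gatedTools: Set<string>;

  constructor(gatedTools: string[]) {
    this.gatedTools = new Set(gatedTools);
  }

  // Ungated tools pass straight through; gated ones wait for a human.
  request(tool: string): Promise<GateDecision> {
    if (!this.gatedTools.has(tool)) {
      return Promise.resolve<GateDecision>("allow");
    }
    return new Promise<GateDecision>((resolve) => {
      const id = this.nextId++;
      this.pending.set(id, { id, tool, resolve });
    });
  }

  // What a phone, web, or terminal client would list for the operator.
  waiting(): { id: number; tool: string }[] {
    return Array.from(this.pending.values()).map((p) => ({ id: p.id, tool: p.tool }));
  }

  // Returns false when the id is unknown or was already decided.
  decide(id: number, decision: GateDecision): boolean {
    const approval = this.pending.get(id);
    if (!approval) return false;
    this.pending.delete(id);
    approval.resolve(decision);
    return true;
  }
}

const gate = new PermissionGate(["SendEmail", "Bash"]);
const emailDecision = gate.request("SendEmail"); // pending until decide() runs
emailDecision.then((d) => console.log("SendEmail decision:", d));
const queued = gate.waiting();
```

The agent side only ever sees a promise, so it does not need to know which device the decision came from.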

03

Mid-run steering

Redirect without restarting.

Inject a message into a running agent: "exclude companies under 50 employees", "use the v2 endpoint", "stop and ask me before touching the database". The agent reads it on the next turn and adapts.
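"Reads it on the next turn" suggests a small inbox that clients write to and the agent loop drains once per turn. Here is a minimal sketch of that idea. `SteeringInbox` and `buildTurnInput` are illustrative names, and the `[operator]` prefix is an assumption about how guidance might be marked.

```typescript
class SteeringInbox {
  private queue: string[] = [];

  // Called by any client surface while the run is in progress.
  inject(message: string): void {
    this.queue.push(message);
  }

  // Called by the agent loop at the start of each turn; empties the queue.
  drain(): string[] {
    const messages = this.queue;
    this.queue = [];
    return messages;
  }
}

// Builds the input for the next turn, putting operator guidance first.
function buildTurnInput(inbox: SteeringInbox, pendingWork: string): string {
  const steering = inbox.drain().map((m) => `[operator] ${m}`);
  return steering.concat([pendingWork]).join("\n");
}

const inbox = new SteeringInbox();
inbox.inject("exclude companies under 50 employees");
const turnOne = buildTurnInput(inbox, "continue sourcing candidates");
const turnTwo = buildTurnInput(inbox, "continue sourcing candidates");
```

The first turn after the injection carries the operator message. Later turns do not repeat it, because draining empties the inbox.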

04

Run-level control

Browse, diff, and roll back.

Browse every file the agent reads or writes. View diffs against the starting state. Roll a bad run back in one click. Per-run cost and token usage are tracked, and hard budget caps pause the agent before it overspends.
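Diffing against the starting state comes down to comparing two snapshots of the workspace. The sketch below assumes a snapshot is a plain map from path to content (or a content hash). `WorkspaceSnapshot`, `diffWorkspace`, and `rollBack` are hypothetical names, not part of a documented file API.

```typescript
type WorkspaceSnapshot = Record<string, string>; // path -> content or content hash

interface WorkspaceDiff {
  added: string[];
  modified: string[];
  deleted: string[];
}

function diffWorkspace(start: WorkspaceSnapshot, current: WorkspaceSnapshot): WorkspaceDiff {
  const diff: WorkspaceDiff = { added: [], modified: [], deleted: [] };
  for (const path of Object.keys(current).sort()) {
    if (!(path in start)) diff.added.push(path);
    else if (start[path] !== current[path]) diff.modified.push(path);
  }
  for (const path of Object.keys(start).sort()) {
    if (!(path in current)) diff.deleted.push(path);
  }
  return diff;
}

// Rolling back restores a copy of the starting snapshot wholesale.
function rollBack(start: WorkspaceSnapshot): WorkspaceSnapshot {
  return { ...start };
}

const startState: WorkspaceSnapshot = { "auth.ts": "sessions", "README.md": "v1" };
const afterRun: WorkspaceSnapshot = { "auth.ts": "jwt", "jwt.ts": "new file" };
const runDiff = diffWorkspace(startState, afterRun);
const restored = rollBack(startState);
```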

03 Built differently

There's a lot of agent tooling. Most of it stops short.

Observability is necessary. It isn't sufficient. When an agent is about to send the wrong email, commit the wrong row, or burn the wrong budget, you need more than a dashboard that tells you it happened.

The status quo

Watch what the agent did.

Logs, traces, dashboards. Read-only.

→

Helm

Watch and control what the agent does.

Permission gates, mid-run steering, kill switches. Live.

The status quo

Send your traces to a SaaS.

Your prompts, code, customer data on someone else's box.

→

Helm

Run on the hardware you already own.

Laptop, VPS, cluster. No telemetry. No accounts. Open source.

The status quo

Pick a model. Get locked in.

Frameworks built around one provider's quirks. Switching means rewriting.

→

Helm

Provider-agnostic at the core. Adapters at the edge.

Anthropic adapter ships in v0. OpenAI, Google, and local adapters next - same primitives, no rewrite.

The status quo

Built for chatbots.

Tokens in, tokens out. Conversation transcripts.

→

Helm

Built for agents that take action.

Tool calls, file writes, money movement, system changes - first-class.

The status quo

One surface. Watch from your laptop.

A web UI that ends when you close the tab.

→

Helm

CLI, web, and mobile. One backend.

Approve a permission from your phone. Steer from your terminal. Same WebSocket stream.

04 Provider-agnostic by design

The control plane doesn't care which provider runs the agent.

Event log, permission gates, file API, security primitives, surfaces - none of it is provider-specific. Adapters live at the edge. One file per provider, all speaking the same internal interface.
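"One file per provider, all speaking the same internal interface" could look roughly like the sketch below. The `ProviderAdapter` interface, its `runTurn` method, and the registry are assumptions for illustration, and the adapter shown is a stub that makes no model call.

```typescript
interface TurnResult {
  text: string;
  inputTokens: number;
  outputTokens: number;
}

// The hypothetical contract every adapter file would implement.
interface ProviderAdapter {
  readonly provider: string;
  runTurn(prompt: string): Promise<TurnResult>;
}

class AdapterRegistry {
  private adapters = new Map<string, ProviderAdapter>();

  register(adapter: ProviderAdapter): void {
    this.adapters.set(adapter.provider, adapter);
  }

  has(provider: string): boolean {
    return this.adapters.has(provider);
  }

  // The control plane asks for an adapter by name and never sees its internals.
  get(provider: string): ProviderAdapter {
    const adapter = this.adapters.get(provider);
    if (!adapter) throw new Error(`no adapter registered for ${provider}`);
    return adapter;
  }
}

// Stub standing in for a real adapter; it performs no network request.
const stubAnthropicAdapter: ProviderAdapter = {
  provider: "anthropic",
  runTurn: (prompt) =>
    Promise.resolve({ text: `stub reply to: ${prompt}`, inputTokens: 0, outputTokens: 0 }),
};

const registry = new AdapterRegistry();
registry.register(stubAnthropicAdapter);
```

Adding a provider then means adding one more object that satisfies `ProviderAdapter`. Nothing upstream of the registry changes.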

Honest current state: v0 ships with the Anthropic adapter. The others are on the roadmap, not in the box.

Anthropic: Claude 4.x (v0)

OpenAI: GPT · o-series

Gemini

Ollama · llama.cpp

Grok (planned)

Mistral: Mistral · Mixtral (planned)

What this unlocks once adapters land: routing by task. A research blueprint might use a fast model for batch summaries and a frontier model for the synthesis turn. A code-migration blueprint might use a local model for boilerplate and a frontier model only for the hard parts. Helm already tracks tokens and cost per provider, per blueprint, and per run, so the budget caps you set stay meaningful as you mix providers.
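To make the routing idea concrete, here is a small sketch of a task-to-provider table with a cost ledger on top. The task kinds, the provider labels, and especially the per-token prices are invented for the example; real prices differ by provider and change over time.

```typescript
type TaskKind = "batch_summary" | "synthesis";

interface ModelRoute {
  provider: string;
  usdPerMillionTokens: number; // hypothetical price, for illustration only
}

const taskRoutes: Record<TaskKind, ModelRoute> = {
  batch_summary: { provider: "fast-model", usdPerMillionTokens: 1 },
  synthesis: { provider: "frontier-model", usdPerMillionTokens: 10 },
};

class CostLedger {
  private usdByProvider = new Map<string, number>();

  // Records usage against whichever provider the task is routed to.
  record(task: TaskKind, tokens: number): void {
    const route = taskRoutes[task];
    const cost = (tokens / 1000000) * route.usdPerMillionTokens;
    const soFar = this.usdByProvider.get(route.provider) ?? 0;
    this.usdByProvider.set(route.provider, soFar + cost);
  }

  forProvider(provider: string): number {
    return this.usdByProvider.get(provider) ?? 0;
  }

  // A single total is what a per-run budget cap would be compared against.
  total(): number {
    let sum = 0;
    this.usdByProvider.forEach((usd) => {
      sum += usd;
    });
    return sum;
  }
}

const ledger = new CostLedger();
ledger.record("batch_summary", 2000000); // 2M tokens on the cheap route
ledger.record("synthesis", 500000); // 0.5M tokens on the expensive route
```

Keeping one ledger across providers is what lets a single cap stay meaningful when a run mixes models.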

05 What you can run

Any agent. Any tools. Any workflow.

Helm doesn't care what your agent is doing. It observes, gates, and routes - whatever the domain. A few common shapes:

⌘

Code & migrations

"Migrate auth from sessions to JWT across the API and update tests."

Edit · Bash · Git · Tests

⌖

Research & briefs

"Summarize Q1 moves from 8 competitors and email the brief to leadership."

WebFetch · Search · Database · Email

≣

Data pipelines

"Ingest yesterday's events into the warehouse and alert on anomalies."

SQL · S3 · Slack · Schedule

⊡

Customer ops

"Triage today's support tickets, tag them, and draft replies for review."

Tickets API · CRM · Tags · Email

◈

DevOps automation

"Rotate the prod DB password and update every service that uses it."

SSH · Vault · Kubernetes · Oncall

¶

Content workflows

"Draft posts from this week's release notes and queue them for review."

Source · Draft · Publish · Review

⊕

Bring your own tools

Plug in custom tools, hooks, sub-agents, and full framework integrations. Define a Blueprint once - model, tools, permissions, budget - and reuse it across runs. Today this works through the Anthropic Agent SDK and MCP (Model Context Protocol); the same Blueprint shape carries over to every adapter that follows.
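A Blueprint bundles model, tools, permissions, and budget, so it could plausibly be a plain typed object with a validation pass. The field names below (`requireApproval`, `budgetUsd`) and the `validateBlueprint` helper are guesses at a shape, not a documented format, and the model value is a placeholder.

```typescript
interface Blueprint {
  name: string;
  model: string;
  tools: string[];
  requireApproval: string[]; // tools that must pass a permission gate
  budgetUsd: number;
}

// Returns a list of problems; an empty list means the blueprint is usable.
function validateBlueprint(bp: Blueprint): string[] {
  const problems: string[] = [];
  if (bp.tools.length === 0) problems.push("no tools listed");
  for (const tool of bp.requireApproval) {
    if (bp.tools.indexOf(tool) === -1) {
      problems.push(`approval set for unknown tool: ${tool}`);
    }
  }
  if (!(bp.budgetUsd > 0)) problems.push("budget must be positive");
  return problems;
}

const marketIntel: Blueprint = {
  name: "market-intel",
  model: "your-model-id", // placeholder, not a real model identifier
  tools: ["WebFetch", "Search", "Database", "SendEmail"],
  requireApproval: ["SendEmail"],
  budgetUsd: 5.0,
};

const marketIntelProblems = validateBlueprint(marketIntel);
```

Validating once at definition time means every run that reuses the blueprint starts from a known-good configuration.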

Anthropic Agent SDK · MCP protocol · Custom hooks · Sub-agents · Bash & Python tools

06 Across roles

Wherever there's repeatable cognitive work, an agent earns its keep.

And wherever it can cause real-world consequences - money moving, messages sending, records changing - Helm keeps you in the loop. Some examples:

▲

Bookkeeping & accounting

"End-of-month reconciliation across 4 bank accounts and the ledger - match invoices, flag discrepancies, draft the close report."

Why Helm: permission gate on every journal post · append-only audit for the next external review

◇

Recruiting & sourcing

"Pull 50 candidates against the JD, score them, draft personalized outreach for the top 10."

Why Helm: steer mid-run when criteria shift · budget cap so the search doesn't drift open-ended

⌬

Customer support

"Triage today's 200 tickets, tag them, route to the right queue, draft first replies for senior review."

Why Helm: permission gate on outbound replies · live event stream of every draft as it forms

◬

Sales & SDRs

"Research 30 inbound leads, enrich the CRM, draft tailored follow-ups in the team's voice."

Why Helm: per-account budget · diff view of CRM changes before they land in production

¶

Marketing & content ops

"Draft 12 posts from this week's release notes and product updates. Queue them for editor review."

Why Helm: workspace control - browse drafts, roll back bad runs, push approved ones

§

Paralegals & legal ops

"Review a stack of NDAs against our standard terms. Redline the deltas. Summarize for the partner."

Why Helm: append-only audit for regulatory review · permission gate before any document leaves the workspace

≡

Supply chain & procurement

"Compare 5 vendor quotes for the Q2 reorder. Score on price, lead time, quality history. Draft the recommendation."

Why Helm: permission gate before any PO commits · resumable run when vendor APIs rate-limit

⌘

Internal ops & admin

"Compile the weekly status from Linear, GitHub, and Notion. Slack the leadership channel by Monday 9am."

Why Helm: schedule as a recurring run · mobile push if the agent stalls or asks for input

⌕

Researchers & analysts

"Pull every Q1 earnings transcript from peers, extract guidance changes, build a comparison table."

Why Helm: resumable runs - if the agent times out hitting a paywall, restart from the last good state

07 How you use it

One backend. Three surfaces.

Same data, same auth, same WebSocket stream. Use whichever fits the moment.

CLI

Provision, deploy, debug.

$ helm status

● backend healthy · 2 active runs

$ helm runs --status running

4f3a · q1-competitor-brief · 00:02:14

9b1c · events-etl-nightly · 00:11:42

Web

Full dashboard at your desk.

q1-competitor-brief running

market-intel · $0.42 / $5.00 · 47 events

db-password-rotate paused

infra-ops · permission · 11 min ago

Mobile

Push for permissions when away.

Notification

q1-competitor-brief wants to SendEmail to leadership@

Allow · Deny

08 Security & reliability

Built for workloads you'd never put on someone else's box.

Helm runs on infrastructure you own. Every safeguard below ships on by default - none of them are toggles.

API key isolation

Your model API key never enters the agent container. A host-level proxy injects auth on outgoing requests only.
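The heart of a host-side auth proxy is a header rewrite: drop any credentials the agent tried to send, then attach the key that only the host holds. The sketch below shows just that step as a pure function. `injectAuth` is a hypothetical name, and using the `x-api-key` header follows Anthropic's API convention as an assumption about what the proxy targets.

```typescript
type HeaderMap = Record<string, string>;

// Rewrites outgoing headers on the host; the key never enters the container.
function injectAuth(agentHeaders: HeaderMap, hostKey: string): HeaderMap {
  const outgoing: HeaderMap = {};
  for (const name of Object.keys(agentHeaders)) {
    const value = agentHeaders[name];
    if (value === undefined) continue;
    const lower = name.toLowerCase();
    // Anything the agent supplied as a credential is discarded.
    if (lower === "x-api-key" || lower === "authorization") continue;
    outgoing[lower] = value;
  }
  outgoing["x-api-key"] = hostKey;
  return outgoing;
}

const fromAgent: HeaderMap = {
  "Content-Type": "application/json",
  "Authorization": "Bearer something-the-agent-made-up",
};
const forwarded = injectAuth(fromAgent, "key-read-from-host-file");
```

Since the rewrite runs outside the container, a compromised agent has no key to exfiltrate.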

Egress firewall

The agent can only reach an explicit allowlist of domains. iptables HELM_EGRESS drops everything else.

Container hardening

Non-root user, cap_drop ALL, no-new-privileges, read-only rootfs, memory + PID limits.
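The settings listed above each correspond to a standard `docker run` flag. As a sketch, a launcher might assemble the arguments like this. The image name, the user id, and the limit values are placeholders, and `hardenedRunArgs` is an illustrative helper rather than Helm's launcher.

```typescript
interface SandboxLimits {
  user: string; // non-root uid:gid
  memory: string; // for example "512m"
  pids: number;
}

// Builds the argument list for `docker run` with the hardening flags applied.
function hardenedRunArgs(image: string, limits: SandboxLimits): string[] {
  return [
    "run",
    "--rm",
    "--user", limits.user,
    "--cap-drop", "ALL",
    "--security-opt", "no-new-privileges",
    "--read-only",
    "--memory", limits.memory,
    "--pids-limit", String(limits.pids),
    image,
  ];
}

const runArgs = hardenedRunArgs("helm-agent:local", {
  user: "1000:1000",
  memory: "512m",
  pids: 256,
});
```

Passing the result as an argument array rather than a shell string avoids quoting problems if any value ever comes from configuration.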

Workspace boundaries

Path-traversal checks run on every file operation. Any path that resolves outside the agent's workspace is rejected, so agents cannot read or write beyond it.
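A path-traversal check normalizes the requested path and refuses anything that would climb above the workspace root. The sketch below does this for POSIX-style relative paths. `resolveInWorkspace` is a hypothetical helper. It works purely on strings, so a real check would also need to resolve symlinks on disk.

```typescript
// Resolves a workspace-relative POSIX path; returns null if it would escape.
function resolveInWorkspace(workspaceRoot: string, requested: string): string | null {
  if (requested.charAt(0) === "/") return null; // absolute paths are refused outright
  const parts: string[] = [];
  for (const segment of requested.split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") {
      if (parts.length === 0) return null; // would climb above the root
      parts.pop();
    } else {
      parts.push(segment);
    }
  }
  const root = workspaceRoot.replace(/\/+$/, "");
  return [root].concat(parts).join("/");
}

const insidePath = resolveInWorkspace("/work/run-1", "src/../notes.md");
const escapedPath = resolveInWorkspace("/work/run-1", "a/../../etc/passwd");
```

The `a/../../etc/passwd` case is the interesting one. It starts inside the workspace, and only the second `..` pushes it past the root.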

Command sandbox

Pre-tool hooks block rm -rf /, mkfs, reverse shells, credential reads, and curl-pipe-to-shell.
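A pre-tool hook of this kind can be pictured as a list of patterns checked before a shell command runs. The patterns below are simplified illustrations, not Helm's rules. Pattern blocklists are easy to evade, so they work best as one layer next to the container and network controls.

```typescript
interface BlockRule {
  name: string;
  pattern: RegExp;
}

// Illustrative rules only; real rule sets need far more care.
const blockRules: BlockRule[] = [
  { name: "rm on filesystem root", pattern: /\brm\s+(-[a-zA-Z]+\s+)*\/(\s|$)/ },
  { name: "mkfs", pattern: /\bmkfs(\.\w+)?\b/ },
  { name: "curl-pipe-to-shell", pattern: /\b(curl|wget)\b[^|]*\|\s*(sudo\s+)?(ba|z)?sh\b/ },
  { name: "reverse shell", pattern: /\/dev\/tcp\/|\bnc\b.*\s-e\b/ },
  { name: "credential read", pattern: /\.aws\/credentials|\.ssh\/id_[a-z0-9]+|\/etc\/shadow/ },
];

// Returns the name of the first rule that matches, or null to let the command run.
function checkCommand(command: string): string | null {
  for (const rule of blockRules) {
    if (rule.pattern.test(command)) return rule.name;
  }
  return null;
}

const rootDelete = checkCommand("rm -rf /");
const buildDelete = checkCommand("rm -rf ./build");
const pipedInstall = checkCommand("curl https://example.com/install.sh | sh");
```

Returning the rule name rather than a boolean gives the event log something specific to record when a command is refused.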

JWT + rate limiting

Every endpoint requires a JWT with a 24-hour lifetime. The auth route is rate-limited to 5 requests per minute per IP. WebSocket connections authenticate with a signed query token.
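A limit of 5 requests per minute per IP can be enforced with a sliding window over recent request timestamps. The sketch below is one minimal way to do it; the page does not say which algorithm Helm uses. `SlidingWindowLimiter` is an illustrative name, and the clock is passed in so the behaviour stays deterministic.

```typescript
class SlidingWindowLimiter {
  private hits = new Map<string, number[]>();
  private limit: number;
  private windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed and records it; false if over the limit.
  allow(ip: string, nowMs: number): boolean {
    const previous = this.hits.get(ip) ?? [];
    const recent = previous.filter((t) => nowMs - t < this.windowMs);
    if (recent.length >= this.limit) {
      this.hits.set(ip, recent);
      return false;
    }
    recent.push(nowMs);
    this.hits.set(ip, recent);
    return true;
  }
}

const authLimiter = new SlidingWindowLimiter(5, 60000);
const firstFive: boolean[] = [];
for (let second = 0; second < 5; second++) {
  firstFive.push(authLimiter.allow("203.0.113.7", second * 1000));
}
const sixthAttempt = authLimiter.allow("203.0.113.7", 5000);
const afterWindow = authLimiter.allow("203.0.113.7", 61000);
```

Rejected attempts are not recorded here, so a client that keeps hammering the route does not extend its own lockout. Whether that is desirable is a policy choice.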

Append-only audit · Reliable

Every event recorded with a monotonic sequence number. Runs are fully replayable, immutable, and exportable.
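The monotonic sequence number is what makes gaps and reordering detectable. Here is an in-memory sketch of an append-only log with that property, plus a JSON Lines export. `AppendOnlyLog`, `hasNoGaps`, and the record fields are assumptions; the page names SQLite as the actual store.

```typescript
interface AuditRecord {
  seq: number;
  type: string;
  payload: string;
}

class AppendOnlyLog {
  private records: AuditRecord[] = [];

  // The only write operation: there is no update and no delete.
  append(type: string, payload: string): AuditRecord {
    const record: AuditRecord = Object.freeze({
      seq: this.records.length + 1,
      type,
      payload,
    });
    this.records.push(record);
    return record;
  }

  // Hands out a copy so callers cannot reorder or drop stored records.
  all(): AuditRecord[] {
    return this.records.slice();
  }

  // One JSON object per line, in sequence order.
  exportJsonLines(): string {
    return this.records.map((r) => JSON.stringify(r)).join("\n");
  }
}

// Verifies that sequence numbers start at 1 and increase by exactly 1.
function hasNoGaps(records: AuditRecord[]): boolean {
  let expected = 1;
  for (const record of records) {
    if (record.seq !== expected) return false;
    expected += 1;
  }
  return true;
}

const auditLog = new AppendOnlyLog();
auditLog.append("tool_call", "Bash");
auditLog.append("permission", "SendEmail allowed");
const exported = auditLog.exportJsonLines();
```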

Resumable runs · Reliable

Run state lives in SQLite in WAL mode with foreign-key checks. Restart the backend mid-run; state survives and the agent picks up where it left off.

Budget enforcement · Reliable

Per-run USD caps pause the agent before it overspends. Permission timeouts auto-deny so runs never hang forever.
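Both rules reduce to small checks: compare projected spend to the cap before spending, and treat an unanswered permission request as a denial once it has waited too long. The sketch below uses integer cents to avoid floating-point drift. `BudgetGuard`, `settlePermission`, and the five-minute timeout are illustrative assumptions.

```typescript
type PermissionOutcome = "allow" | "deny" | "pending";

class BudgetGuard {
  private spentCents = 0;
  private capCents: number;

  constructor(capCents: number) {
    this.capCents = capCents;
  }

  // Checked before the spend happens, so the cap is never crossed.
  tryCharge(estimatedCents: number): "running" | "paused" {
    if (this.spentCents + estimatedCents > this.capCents) return "paused";
    this.spentCents += estimatedCents;
    return "running";
  }

  spent(): number {
    return this.spentCents;
  }
}

// An explicit decision wins; otherwise the request auto-denies at the timeout.
function settlePermission(
  decision: "allow" | "deny" | undefined,
  waitedMs: number,
  timeoutMs: number,
): PermissionOutcome {
  if (decision !== undefined) return decision;
  return waitedMs >= timeoutMs ? "deny" : "pending";
}

const budgetGuard = new BudgetGuard(500); // a $5.00 cap, in cents
const firstCharge = budgetGuard.tryCharge(42);
const secondCharge = budgetGuard.tryCharge(460); // 42 + 460 would exceed 500
const timedOut = settlePermission(undefined, 300000, 300000);
```

A refused charge leaves the recorded spend untouched, so raising the cap and resuming does not double-count anything.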

09 Under the hood

Boring stack on purpose.

One Hono server in a Docker container. SQLite with WAL. Append-only event log. Caddy out front for TLS. Three clients over the same WebSocket. Host-isolated key. That's it.

Your infrastructure

Clients: CLI (Ink · TypeScript), Web (React · Vite · Tailwind), Mobile (React Native · Expo)

→ WebSocket · REST · JWT

Hono backend: Node 20 · single process · Docker container

Control plane: RunManager · Event log · SQLite (WAL) · File API · Metrics · Push

→ localhost · auth-injected

API key proxy: runs on host · agent container has no key · key file chmod 600

→ HTTPS · TLS 1.3

Model provider API: Anthropic in v0 · adapter slot accepts OpenAI, Google, local next - the only address that ever leaves your network

10 Who it's for

People running real agents on real work.

Individuals & automators

Long-running automations on your laptop or VPS - code refactors, research briefs, monthly reports. Watch progress from anywhere, kill bad runs, keep costs bounded.

Small teams & ops

A shared agent runner on internal infra. Per-blueprint budgets, audit logs, permission gates per workspace. Bring your own tooling via MCP.

Builders & framework authors

Anyone composing agentic workflows. Helm gives you observability, replay, and steering primitives so you can focus on the agent's logic.

In development

Source release and self-host instructions land soon.

Building this in the open. No accounts to create, nothing to sign up for - just docs and a repo when it's ready.

Watch your agents work in real time

An agent is running for hours on your behalf.
You can't sit there watching the terminal.
