Helm

An open control plane for autonomous agents

Watch your agents work in real time

Helm is a self-hosted control plane for autonomous AI agents - any workflow, any tools, any infrastructure. See every action as it happens. Gate the risky parts. Steer mid-run in plain English.

Your data, your machine, your model key. Open source. No telemetry. No accounts.

Live · run #4f3a · q1-competitor-brief
running · 00:02:14
14:02:11 Pulling Q1 announcement pages from 8 competitors. Looking for pricing changes and product launches.
14:02:13 WebFetch · stripe.com/blog
14:02:14 ok · 47 KB · 12 posts parsed
14:02:18 WriteRow · competitive_intel.csv · 142 entries
14:02:19 written
Permission requested
SendEmail "Q1 Competitor Brief" → leadership@
Allow Deny
You: Exclude any company with fewer than 50 employees first.
14:02:28 Got it. Filtering employee_count ≥ 50

Mocked - same event types Helm streams from real runs to browser, terminal, or phone.

An agent is running for hours on your behalf.
You can't sit there watching the terminal.

Agents move fast. They call APIs, hit databases, send messages, write files, spend money. When something goes sideways you find out after the fact - in a wrecked record, a duplicate email blast, a $40 bill. By then it's already done.

Helm sits between you and the agent. You see what it's about to do, gate the dangerous parts, and redirect it without losing context. Same primitives whether the agent is refactoring code, drafting a brief, or rotating a secret.

Four primitives. They cover almost everything you'd want from agent oversight.

01
Live event stream

See every action as it happens.

Tool calls, model output, file changes, sub-agent spawns - streamed over WebSocket the moment they fire. Every event is timestamped, typed, and replayable. No more guessing what your agent did while you were away.
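
A minimal sketch of what "timestamped and typed" could look like on a client. The event names and fields are assumptions for illustration, not Helm's actual wire format.

```typescript
// Hypothetical event shapes; Helm's real schema may differ.
type HelmEvent =
  | { type: "tool_call"; ts: string; tool: string; target: string }
  | { type: "tool_result"; ts: string; ok: boolean; detail: string }
  | { type: "model_output"; ts: string; text: string }
  | { type: "permission_request"; ts: string; tool: string; summary: string };

// Parse one raw WebSocket message payload (JSON text) into an event.
// A real client would validate the shape instead of trusting the cast.
function parseEvent(raw: string): HelmEvent {
  return JSON.parse(raw) as HelmEvent;
}

// Render an event as a single log line, in the style of the mocked stream.
function formatEvent(e: HelmEvent): string {
  switch (e.type) {
    case "tool_call":
      return `${e.ts} ${e.tool} · ${e.target}`;
    case "tool_result":
      return `${e.ts} ${e.ok ? "ok" : "error"} · ${e.detail}`;
    case "model_output":
      return `${e.ts} ${e.text}`;
    case "permission_request":
      return `${e.ts} Permission requested: ${e.tool} ${e.summary}`;
  }
}
```

Because the union is discriminated on `type`, the compiler flags any event kind the renderer forgets to handle.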

02
Permission gates

The agent stops and asks before it acts.

Mark which tools require approval - destructive shell, money movement, outbound email, production writes. The agent blocks and waits. You allow or deny from any device - phone, web, terminal.
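
One way the block-and-wait behavior could be modeled, as a sketch under stated assumptions: `PermissionGate`, `request`, and `decide` are hypothetical names, and the timeout that auto-denies mirrors the behavior described under budget enforcement further down.

```typescript
type Decision = "allow" | "deny";

// Hypothetical gate: holds pending requests until a human decides
// or the timeout fires.
class PermissionGate {
  private pending = new Map<string, (d: Decision) => void>();
  private gatedTools: Set<string>;
  private timeoutMs: number;

  constructor(gatedTools: Set<string>, timeoutMs: number) {
    this.gatedTools = gatedTools;
    this.timeoutMs = timeoutMs;
  }

  // Ungated tools resolve immediately; gated tools wait for decide().
  request(requestId: string, tool: string): Promise<Decision> {
    if (!this.gatedTools.has(tool)) return Promise.resolve<Decision>("allow");
    return new Promise<Decision>((resolve) => {
      const timer = setTimeout(() => {
        this.pending.delete(requestId);
        resolve("deny"); // auto-deny so the run never hangs forever
      }, this.timeoutMs);
      this.pending.set(requestId, (d) => {
        clearTimeout(timer);
        this.pending.delete(requestId);
        resolve(d);
      });
    });
  }

  // Called when the operator taps Allow or Deny on any client.
  // Returns false if the request is unknown or already settled.
  decide(requestId: string, decision: Decision): boolean {
    const settle = this.pending.get(requestId);
    if (!settle) return false;
    settle(decision);
    return true;
  }
}
```

The agent side simply awaits `request(...)`, so the same code path serves an instant allow, a human decision, and a timeout.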

03
Mid-run steering

Redirect without restarting.

Inject a message into a running agent: "exclude companies under 50 employees", "use the v2 endpoint", "stop and ask me before touching the database". The agent reads it on the next turn and adapts.
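
"Reads it on the next turn" can be pictured as a queue the agent loop drains before each model call. This is a sketch only; `SteeringInbox` and `beginTurn` are hypothetical names, not Helm's API.

```typescript
type Message = { role: "user" | "assistant"; content: string };

// Hypothetical steering inbox: operators push messages at any time,
// and the agent loop drains them at the start of its next turn.
class SteeringInbox {
  private queue: string[] = [];

  inject(text: string): void {
    this.queue.push(text);
  }

  // Returns queued messages in arrival order and empties the queue.
  drain(): Message[] {
    const drained = this.queue.map((content): Message => ({ role: "user", content }));
    this.queue = [];
    return drained;
  }
}

// Start of one agent turn: fold any steering messages into the
// conversation history before the next model call.
function beginTurn(history: Message[], inbox: SteeringInbox): Message[] {
  return [...history, ...inbox.drain()];
}
```

Since steering arrives as ordinary conversation messages, the run keeps its full context and nothing restarts.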

04
Run-level control

Browse, diff, and roll back.

Browse every file the agent reads or writes. View diffs against the starting state. Roll a bad run back in one click. Cost and token usage are tracked per run, and hard budget caps pause the agent before it overspends.

Any agent. Any tools. Any workflow.

Helm doesn't care what your agent is doing. It observes, gates, and routes - whatever the domain. A few common shapes:

Code & migrations

"Migrate auth from sessions to JWT across the API and update tests."

Edit · Bash · Git · Tests
Research & briefs

"Summarize Q1 moves from 8 competitors and email the brief to leadership."

WebFetch · Search · Database · Email
Data pipelines

"Ingest yesterday's events into the warehouse and alert on anomalies."

SQL · S3 · Slack · Schedule
Customer ops

"Triage today's support tickets, tag them, and draft replies for review."

Tickets API · CRM · Tags · Email
DevOps automation

"Rotate the prod DB password and update every service that uses it."

SSH · Vault · Kubernetes · Oncall
Content workflows

"Draft posts from this week's release notes and queue them for review."

Source · Draft · Publish · Review
Bring your own tools

Helm runs anything that speaks the Claude Agent SDK or MCP - plug in custom tools, hooks, sub-agents, or full framework integrations. Define a Blueprint once (model, tools, permissions, budget) and reuse it across runs. Your stack, your way.
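
To make the Blueprint idea concrete, here is a sketch of what one might contain. The interface and every field name are assumptions; the source only says a Blueprint bundles model, tools, permissions, and budget.

```typescript
// Hypothetical Blueprint shape; field names are illustrative only.
interface Blueprint {
  name: string;
  model: string;
  tools: string[];
  requireApproval: string[]; // tools that hit a permission gate
  budgetUsd: number;         // hard per-run cap
}

const competitorBrief: Blueprint = {
  name: "q1-competitor-brief",
  model: "your-model-id", // placeholder, not a real model name
  tools: ["WebFetch", "WriteRow", "SendEmail"],
  requireApproval: ["SendEmail"],
  budgetUsd: 5.0,
};

// Basic sanity checks before a Blueprint is reused across runs.
function validateBlueprint(bp: Blueprint): string[] {
  const errors: string[] = [];
  if (bp.budgetUsd <= 0) errors.push("budgetUsd must be positive");
  for (const tool of bp.requireApproval) {
    if (!bp.tools.includes(tool)) errors.push(`gated tool not in tools: ${tool}`);
  }
  return errors;
}
```

Validating up front catches a gate that names a tool the agent was never given, before the mistake is reused across runs.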

Claude Agent SDK · MCP servers · Custom hooks · Sub-agents · Bash & shell · Python

Wherever there's repeatable cognitive work, an agent earns its keep.

And wherever it can cause real-world consequences - money moving, messages sending, records changing - Helm keeps you in the loop. Some examples:

Bookkeeping & accounting

"End-of-month reconciliation across 4 bank accounts and the ledger - match invoices, flag discrepancies, draft the close report."

Why Helm: permission gate on every journal post · append-only audit for the next external review
Recruiting & sourcing

"Pull 50 candidates against the JD, score them, draft personalized outreach for the top 10."

Why Helm: steer mid-run when criteria shift · budget cap so the search doesn't drift open-ended
Customer support

"Triage today's 200 tickets, tag them, route to the right queue, draft first replies for senior review."

Why Helm: permission gate on outbound replies · live event stream of every draft as it forms
Sales & SDRs

"Research 30 inbound leads, enrich the CRM, draft tailored follow-ups in the team's voice."

Why Helm: per-account budget · diff view of CRM changes before they land in production
Marketing & content ops

"Draft 12 posts from this week's release notes and product updates. Queue them for editor review."

Why Helm: workspace control - browse drafts, roll back bad runs, push approved ones
Paralegals & legal ops

"Review a stack of NDAs against our standard terms. Redline the deltas. Summarize for the partner."

Why Helm: append-only audit for regulatory review · permission gate before any document leaves the workspace
Supply chain & procurement

"Compare 5 vendor quotes for the Q2 reorder. Score on price, lead time, quality history. Draft the recommendation."

Why Helm: permission gate before any PO commits · resumable run when vendor APIs rate-limit
Internal ops & admin

"Compile the weekly status from Linear, GitHub, and Notion. Slack the leadership channel by Monday 9am."

Why Helm: schedule as a recurring run · mobile push if the agent stalls or asks for input
Researchers & analysts

"Pull every Q1 earnings transcript from peers, extract guidance changes, build a comparison table."

Why Helm: resumable runs - if the agent times out hitting a paywall, restart from the last good state

One backend. Three surfaces.

Same data, same auth, same WebSocket stream. Use whichever fits the moment.

CLI

Provision, deploy, debug.

$ helm status
● backend healthy · 2 active runs
$ helm runs --status running
4f3a · q1-competitor-brief · 00:02:14
9b1c · events-etl-nightly · 00:11:42
Web

Full dashboard at your desk.

q1-competitor-brief running
market-intel · $0.42 / $5.00 · 47 events
db-password-rotate paused
infra-ops · permission · 11 min ago
Mobile

Push for permissions when away.

Notification
q1-competitor-brief wants to SendEmail to leadership@
Allow Deny

Built for workloads you'd never put on someone else's box.

Helm runs on infrastructure you own. Every safeguard below ships on by default - none of them are toggles.

API key isolation

Your model API key never enters the agent container. A host-level proxy injects auth on outgoing requests only.
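
The header-rewriting step of such a proxy might look like the sketch below. `injectAuth` is a hypothetical helper; the only external fact it relies on is that the Anthropic API authenticates with an `x-api-key` header.

```typescript
// Headers as a plain record, to keep the sketch dependency-free.
type HeaderMap = Record<string, string>;

// Build the headers the host-side proxy forwards upstream: drop any
// credentials the container tried to send, then add the real key.
function injectAuth(incoming: HeaderMap, apiKey: string): HeaderMap {
  const outgoing: HeaderMap = {};
  for (const [name, value] of Object.entries(incoming)) {
    const lower = name.toLowerCase();
    if (lower === "x-api-key" || lower === "authorization") continue;
    outgoing[lower] = value;
  }
  outgoing["x-api-key"] = apiKey; // header name used by the Anthropic API
  return outgoing;
}
```

The container can send whatever it likes; only the host process ever holds the real key.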

Egress firewall

The agent can only reach an explicit allowlist of domains. iptables HELM_EGRESS drops everything else.
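
Helm enforces this at the network layer with iptables. The sketch below only illustrates the allow-or-drop decision at the hostname level, using a hypothetical `isAllowedHost` helper.

```typescript
// Hypothetical allowlist check; this is not how iptables rules are
// written, only the decision they encode.
function isAllowedHost(url: string, allowlist: string[]): boolean {
  let host: string;
  try {
    host = new URL(url).hostname.toLowerCase();
  } catch {
    return false; // unparseable URLs are dropped
  }
  // Exact match only: "api.anthropic.com.evil.test" must not pass.
  return allowlist.some((allowed) => allowed.toLowerCase() === host);
}
```

Matching the whole hostname, rather than a prefix or substring, is what keeps look-alike domains out.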

Container hardening

Non-root user, cap_drop ALL, no-new-privileges, read-only rootfs, memory + PID limits.

Workspace boundaries

Path-traversal checks on every file operation. Agents cannot read or write outside their workspace.
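
A path-traversal check along these lines could be as small as the sketch below, which uses Node's built-in `path` module. `resolveInWorkspace` is a hypothetical helper, and it does not resolve symlinks.

```typescript
import * as path from "node:path";

// Resolve a requested path against the workspace root and reject anything
// that escapes it. Symlinks are not handled here; a real check would also
// need to resolve them (for example with fs.realpath) before comparing.
function resolveInWorkspace(workspaceRoot: string, requested: string): string {
  const root = path.resolve(workspaceRoot);
  const target = path.resolve(root, requested);
  const rel = path.relative(root, target);
  if (rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel)) {
    throw new Error(`path escapes workspace: ${requested}`);
  }
  return target;
}
```

Comparing the resolved path against the root, instead of scanning the input for `..`, also catches absolute paths.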

Command sandbox

Pre-tool hooks block rm -rf /, mkfs, reverse shells, credential reads, and curl-pipe-to-shell.
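
A pre-tool hook of this kind might start from a pattern denylist like the sketch below. The rule set is hypothetical and deliberately small, and regexes over shell text are easy to evade, so treat it as an illustration of the hook shape rather than a complete filter.

```typescript
// Hypothetical denylist; a real hook would need a far more careful parser.
const BLOCKED: { name: string; pattern: RegExp }[] = [
  { name: "rm -rf /", pattern: /\brm\s+-(?:rf|fr)\s+\/(?:\s|$)/ },
  { name: "mkfs", pattern: /\bmkfs(?:\.\w+)?\b/ },
  { name: "curl-pipe-to-shell", pattern: /\b(?:curl|wget)\b[^|]*\|\s*(?:sudo\s+)?(?:ba|z)?sh\b/ },
  { name: "reverse shell via /dev/tcp", pattern: /\/dev\/tcp\// },
];

// Runs before a shell tool call; a blocked command never reaches the shell.
function checkCommand(command: string): { allowed: boolean; reason?: string } {
  for (const rule of BLOCKED) {
    if (rule.pattern.test(command)) return { allowed: false, reason: rule.name };
  }
  return { allowed: true };
}
```

Returning the matched rule name lets the event stream show why a command was refused.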

JWT + rate limiting

Every endpoint requires a JWT that expires after 24 hours. The auth route is rate-limited to 5 requests per minute per IP. WebSocket connections authenticate with a signed query token.
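
The 5 req/min/IP limit could be implemented several ways. The sketch below assumes a sliding window, which the source does not specify, and `RateLimiter` is a hypothetical class.

```typescript
// Sliding-window limiter keyed by client IP. The clock is passed in so
// the behavior can be tested without waiting.
class RateLimiter {
  private hits = new Map<string, number[]>();
  private limit: number;
  private windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request is admitted, false if it should be rejected.
  allow(ip: string, nowMs: number): boolean {
    const recent = (this.hits.get(ip) ?? []).filter((t) => nowMs - t < this.windowMs);
    if (recent.length >= this.limit) {
      this.hits.set(ip, recent);
      return false;
    }
    recent.push(nowMs);
    this.hits.set(ip, recent);
    return true;
  }
}

// Usage: five requests per minute per IP on the auth route.
const authLimiter = new RateLimiter(5, 60_000);
```

Rejected requests are not recorded, so a client that keeps hammering the route does not extend its own lockout.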

Append-only audit · Reliable

Every event recorded with a monotonic sequence number. Runs are fully replayable, immutable, and exportable.
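
The monotonic sequence number is what makes replay cheap: a client remembers the last `seq` it saw and asks for everything after it. A minimal in-memory sketch follows. `AppendOnlyLog` is a hypothetical class, and Helm itself is described as persisting to SQLite rather than memory.

```typescript
interface LoggedEvent<T> {
  readonly seq: number; // strictly increasing, starting at 1
  readonly payload: T;
}

// No update or delete methods exist: the only write path is append().
class AppendOnlyLog<T> {
  private events: LoggedEvent<T>[] = [];

  append(payload: T): LoggedEvent<T> {
    const entry: LoggedEvent<T> = Object.freeze({ seq: this.events.length + 1, payload });
    this.events.push(entry);
    return entry;
  }

  // Replay everything after a given sequence number, e.g. on reconnect.
  replayAfter(seq: number): readonly LoggedEvent<T>[] {
    return this.events.filter((e) => e.seq > seq);
  }
}
```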

Resumable runs · Reliable

Run state lives in SQLite WAL with foreign-key checks. Restart the backend mid-run: state survives and the agent picks up where it left off.

Budget enforcement · Reliable

Per-run USD caps pause the agent before it overspends. Permission timeouts auto-deny so runs never hang forever.
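
Pausing before the overspend implies checking a projected cost ahead of each call rather than tallying afterwards. A sketch of that check is below, with a hypothetical `RunBudget` class and amounts held in integer cents to avoid floating-point drift.

```typescript
// Hypothetical per-run budget. Amounts are integer cents.
class RunBudget {
  private spentCents = 0;
  private capCents: number;

  constructor(capCents: number) {
    this.capCents = capCents;
  }

  // Check a projected cost before the call is made, so the run pauses
  // before the cap is crossed rather than after.
  tryReserve(estimatedCents: number): "proceed" | "pause" {
    if (this.spentCents + estimatedCents > this.capCents) return "pause";
    this.spentCents += estimatedCents;
    return "proceed";
  }

  get remainingCents(): number {
    return this.capCents - this.spentCents;
  }
}
```

A paused run keeps its reserved total, so raising the cap or approving the spend can resume it without recounting.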

Boring stack on purpose.

One Hono server in a Docker container. SQLite with WAL. Append-only event log. Caddy out front for TLS. Three clients over the same WebSocket. Host-isolated key. That's it.

Your infrastructure

Clients
CLI: Ink · TypeScript
Web: React · Vite · Tailwind
Mobile: React Native · Expo

↓ WebSocket · REST · JWT

Hono backend
Node 20 · single process · Docker container
RunManager, Event log, SQLite · WAL, File API, Metrics, Push

↓ localhost · auth-injected

API key proxy
runs on host · agent container has no key
key file · chmod 600

↓ HTTPS · TLS 1.3

api.anthropic.com
the only endpoint that ever leaves your network

People running real agents on real work.

Individuals & automators

Long-running automations on your laptop or VPS - code refactors, research briefs, monthly reports. Watch progress from anywhere, kill bad runs, keep costs bounded.

Small teams & ops

A shared agent runner on internal infra. Per-blueprint budgets, audit logs, permission gates per workspace. Bring your own tooling via MCP.

Builders & framework authors

Anyone composing agentic workflows. Helm gives you observability, replay, and steering primitives so you can focus on the agent's logic.

In development

Source release and self-host instructions land soon.

Building this in the open. No accounts to create, nothing to sign up for - just docs and a repo when it's ready.