Sandboxing the Agent

A proactive agent is only useful if you let it act. It reads your inbox, drafts the deck, files the ticket, runs the script. The moment it can do real work on your behalf, it also becomes the most over-permissioned process on your machine, and the only one taking instructions from text it found on the internet.

The thing we kept getting wrong

The first instinct, and the one most of the market still runs on, is to isolate execution. Put the agent in a container, or a microVM, or a locked-down OS profile, so that if it runs something hostile, the blast stays inside the box.

That is necessary, but more often than not it is not the part that bites you.

The part that bites you is credentials. An agent that can call Stripe needs a Stripe key. An agent that can push to GitHub needs a token. An agent that can read your Drive needs an OAuth grant. In almost every sandbox we looked at, those secrets live inside the box with the agent. The isolation keeps a compromised agent from escaping to the host. It does nothing to stop that same agent, talked into it by a prompt injection, from reading the key out of its own environment and mailing it somewhere.

Six questions we graded everyone on

We needed a way to compare things that looked superficially similar. A Firecracker microVM from one vendor and a "kernel isolated secure workspace" from another are the same primitives wrapped in very different opinions. So we wrote down the six questions that actually separate them.

  1. Credential model. Does the agent hold real secrets, or synthetic placeholders that something outside the box swaps in?
  2. Egress control. Can you inspect, rewrite, block, or allowlist every outbound request, at the application layer, not just by IP?
  3. Isolation boundary. V8 isolate, WASM, Firecracker microVM, gVisor, Type-1 hypervisor, or a plain container?
  4. Runtime mutability. Can policy change while the sandbox is running, or do you tear down and rebuild?
  5. TTL and lifecycle. Ephemeral, persistent, or perpetual, and who decides how long a session lives?
  6. Host and sandbox RPC. A typed, capability-safe bridge, or sockets and environment variables?

What the landscape actually looks like

We ran the top 17 providers through the grid. Plot every tool on two axes, isolation strength along the bottom and credential isolation up the side, and the market sorts itself into a picture that is honestly a little uncomfortable.

A few things stood out once it was drawn.

The bottom row is full. Plenty of tools build a genuinely strong box (Firecracker, Kata, even a per-agent kernel) and then leave the real credentials sitting inside it. E2B, Vercel, Fly Sprites, Edera, the container crowd: strong-to-excellent isolation, with bring-your-own secrets and no proxy in the path.

The top row is where it got interesting, because the tools that did isolate credentials mostly weren’t built like infrastructure. Cloudflare does it beautifully with its globalOutbound hook, but you’re limited to what V8 will run and it won’t autoscale. Gondolin does it locally on your laptop. nono does it as an OS primitive. IronClaw does it at the WASM tool boundary. Good ideas, each living in a corner of the market. Vercel comes closest, with autoscaling and strong egress proxies, but the autoscaling comes at a very high cost, not to mention vendor lock-in.

The pattern worth stealing

The good credential models all do the same trick, just in different runtimes. The agent never gets a real secret. It gets a placeholder. The real swap happens somewhere the agent can’t reach.

Without that indirection, your security depends on the agent never being convinced to misbehave, which is not a property you can guarantee about something whose job is to follow instructions.

Here is what that looks like from inside the box. The agent sees this and behaves completely normally:

# inside the sandbox: the agent only ever sees a placeholder
$ echo $STRIPE_KEY
sk_live_PLACEHOLDER_9f2c4a...      # looks real, is not

$ curl https://api.stripe.com/v1/charges \
    -u "$STRIPE_KEY:" -d amount=2000 -d currency=usd

# what actually happens on the way out:
#   1. request hits the L7 egress proxy, which terminates TLS for the session
#   2. proxy checks policy: is api.stripe.com allowed for this session?
#   3. proxy swaps the placeholder in the Authorization header for the real key from the vault
#   4. proxy forwards the request to Stripe over its own TLS connection
#   ← 200 OK. The real sk_live_… never entered the VM.
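Steps 2 and 3 can be sketched in a few lines of Python. This assumes HTTP Basic auth as in the curl call above. The vault contents, the placeholder value, the allowlist, and the `swap_basic_auth` helper are all hypothetical stand-ins. A real proxy would also handle TLS, stream bodies, and fetch secrets from an actual secret store.

```python
import base64


class PolicyDenied(Exception):
    """Raised when the proxy refuses to forward a request."""


# Hypothetical vault: placeholder -> real secret. In the design above this
# lives outside the VM; a plain dict stands in for it here.
VAULT = {"sk_live_PLACEHOLDER_example": "sk_live_REAL_example"}

# Hypothetical per-session allowlist (step 2).
ALLOWED_HOSTS = {"api.stripe.com"}


def swap_basic_auth(host: str, headers: dict[str, str]) -> dict[str, str]:
    """Return a copy of headers with the placeholder replaced by the real key.

    Assumes Basic auth in the form curl's `-u "$KEY:"` produces, and an
    exact-case "Authorization" header key.
    """
    if host not in ALLOWED_HOSTS:
        raise PolicyDenied(f"egress to {host} is not allowed for this session")

    scheme, _, encoded = headers.get("Authorization", "").partition(" ")
    if scheme != "Basic":
        raise PolicyDenied("expected a Basic Authorization header")

    user, _, password = base64.b64decode(encoded).decode().partition(":")
    real_key = VAULT.get(user)
    if real_key is None:
        # Anything that is not a known placeholder is refused, including a
        # real key the agent obtained some other way.
        raise PolicyDenied("credential is not a known placeholder")

    swapped = dict(headers)  # never mutate the caller's headers
    token = base64.b64encode(f"{real_key}:{password}".encode()).decode()
    swapped["Authorization"] = f"Basic {token}"
    return swapped
```

The key design choice is refusing unknown credentials rather than passing them through. Without that rule, a real key smuggled into the box would still leave through the proxy.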

That is the whole idea, and it is the thing we decided to build our layer around.

What we built

Our sandbox is four pieces that only mean something together. Take any one away and the model leaks.

A Firecracker microVM per session. Every session gets its own guest kernel and shares no state with any other session. We considered going all the way to a Type-1 hypervisor like Edera for a per-agent kernel, and for the most paranoid deployments that is still on the table, but Firecracker gave us the isolation we needed without the operational weight that keeps solo developers out.

A deny-by-default L7 egress proxy. Nothing leaves unless policy lets it. Because the proxy sits at the application layer, it can match on SNI, path, method, and body, not just an IP allowlist. It can refuse a POST whose body looks like an API key, flag a transfer that is suspiciously large, and shut down DNS-over-HTTPS tunneling before it starts.
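The matching described here can be sketched as a deny-by-default decision function. The `Rule` and `Request` shapes, the 1 MB threshold, and the exact regexes are assumptions for illustration. The `sk_live_`, `ghp_`, and `AKIA` prefixes are the public formats for Stripe live secret keys, GitHub personal access tokens, and AWS access key IDs. Only the POST form of DNS-over-HTTPS, which carries the `application/dns-message` media type from RFC 8484, is caught here.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One allow rule. Anything no rule matches is denied."""
    host: str         # compared against the SNI / Host of the request
    method: str       # e.g. "POST"
    path_prefix: str  # e.g. "/v1/charges"


@dataclass(frozen=True)
class Request:
    host: str
    method: str
    path: str
    content_type: str  # simplification: assumes no parameters like "; charset="
    body: bytes


# Key-shaped patterns; prefixes are real, lengths are assumptions.
KEY_PATTERNS = [
    re.compile(rb"sk_live_[0-9A-Za-z]{10,}"),  # Stripe live secret key
    re.compile(rb"ghp_[0-9A-Za-z]{36}"),       # GitHub personal access token
    re.compile(rb"AKIA[0-9A-Z]{16}"),          # AWS access key ID
]
MAX_BODY_BYTES = 1_000_000  # hypothetical "suspiciously large" threshold


def decide(req: Request, rules: list[Rule]) -> str:
    """Return "deny", "flag", or "allow" for one outbound request."""
    if req.content_type == "application/dns-message":
        return "deny"  # DoH tunnel, even to an otherwise allowed host
    if not any(
        r.host == req.host
        and r.method == req.method
        and req.path.startswith(r.path_prefix)
        for r in rules
    ):
        return "deny"  # deny by default
    if any(p.search(req.body) for p in KEY_PATTERNS):
        return "deny"  # body looks like it carries an API key
    if len(req.body) > MAX_BODY_BYTES:
        return "flag"  # allowed, but surfaced for review
    return "allow"
```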

Per-tool credential scoping. Gmail, Slack, Stripe, GitHub, AWS, and your internal APIs each get their own placeholder. A leaked GitHub placeholder cannot touch AWS, because they were never the same secret and neither one was ever real inside the box.
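Scoping can be sketched by binding each placeholder to one tool and the hosts that tool may reach. The placeholder strings, host sets, and `resolve` helper below are hypothetical. What matters is that resolution depends on the destination: a GitHub placeholder presented to an AWS endpoint resolves to nothing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScopedCredential:
    tool: str
    real_secret: str
    allowed_hosts: frozenset[str]


# Hypothetical placeholders: each maps to a different real secret and host set.
SCOPES = {
    "ph_github_3b1e": ScopedCredential(
        "github", "REAL_GITHUB_TOKEN", frozenset({"api.github.com"})),
    "ph_aws_77c0": ScopedCredential(
        "aws", "REAL_AWS_SECRET", frozenset({"sts.amazonaws.com", "s3.amazonaws.com"})),
}


def resolve(placeholder: str, host: str) -> Optional[str]:
    """Return the real secret only when the placeholder is scoped to this host."""
    cred = SCOPES.get(placeholder)
    if cred is None or host not in cred.allowed_hosts:
        return None  # unknown placeholder or wrong destination: nothing to swap in
    return cred.real_secret
```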

Signed-API re-signing. This is the piece almost nothing in open source handles. HMAC and SigV4 requests can’t be fixed with a header swap, because the secret never travels in the request. The client sends a signature computed with the secret over the request contents, and a signature made with the placeholder is worthless to the real API. So the proxy terminates the request, strips the placeholder signature, and re-signs with the real key it holds. The agent builds a request it could never validly sign, and the proxy makes it valid on the way out.
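Here is a sketch of re-signing for a generic HMAC-SHA256 scheme. The signing string (method, path, and body joined by newlines) and the `X-Signature` header are hypothetical. SigV4 canonicalizes much more of the request and would need its own implementation. The sketch also verifies the agent's placeholder signature before re-signing. That is an assumed design choice, and it confirms that the bytes the proxy is about to sign are the bytes the agent signed.

```python
import hashlib
import hmac


def sign(secret: bytes, method: str, path: str, body: bytes) -> str:
    """Hypothetical scheme: hex HMAC-SHA256 over method, path, body joined by newlines."""
    message = method.encode() + b"\n" + path.encode() + b"\n" + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def resign(headers: dict[str, str], method: str, path: str, body: bytes,
           placeholder_secret: bytes, real_secret: bytes) -> dict[str, str]:
    """Verify the agent's placeholder signature, then replace it with a real one."""
    claimed = headers.get("X-Signature", "")
    expected = sign(placeholder_secret, method, path, body)
    if not hmac.compare_digest(claimed, expected):
        raise ValueError("placeholder signature does not match the request")
    # Strip the placeholder signature and sign the same bytes with the real key.
    out = {k: v for k, v in headers.items() if k != "X-Signature"}
    out["X-Signature"] = sign(real_secret, method, path, body)
    return out
```

Simply swapping in the real key would not work. The upstream API recomputes the HMAC with the real key, so a signature made with the placeholder always fails verification.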

The shape borrows from the best ideas we found in the survey (Cloudflare’s interception model and Gondolin’s local placeholder injection) and puts them under a boundary you can run production workloads on.

What it buys you at runtime

Static policy is easy. The interesting behavior shows up while a session is live.

Because policy is runtime-mutable, rules can follow the session instead of being frozen at launch. Pause before a risky action. Cap spend. Narrow tool access after a sensitive operation. And when egress starts looking wrong, the credential behind the placeholder can be revoked in milliseconds, mid-session, with nothing to change inside the VM: the agent keeps the same placeholder, which simply stops resolving to a real key.
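Runtime-mutable policy can be sketched as a per-session object that the proxy consults on every request. A narrowed scope or a revocation then applies to the very next call. `SessionPolicy` and its methods are hypothetical names, and this version keeps everything in memory behind a lock purely for illustration.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class SessionPolicy:
    """Hypothetical per-session policy, mutable while the session runs."""
    spend_cap_cents: int
    spent_cents: int = 0
    allowed_tools: set[str] = field(default_factory=set)
    revoked: set[str] = field(default_factory=set)
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False, compare=False)

    def narrow(self, tool: str) -> None:
        """Drop a tool, e.g. after a sensitive operation."""
        with self._lock:
            self.allowed_tools.discard(tool)

    def revoke(self, placeholder: str) -> None:
        """Stop a placeholder from resolving; nothing inside the VM changes."""
        with self._lock:
            self.revoked.add(placeholder)

    def authorize(self, tool: str, placeholder: str, amount_cents: int = 0) -> bool:
        """Checked by the proxy on every request; also enforces the spend cap."""
        with self._lock:
            if tool not in self.allowed_tools or placeholder in self.revoked:
                return False
            if self.spent_cents + amount_cents > self.spend_cap_cents:
                return False
            self.spent_cents += amount_cents
            return True
```

Spend is only recorded when a request is authorized. A refused charge therefore does not eat into the cap.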

The guest stays stateless on purpose. Placeholder secrets, code, and cache live outside the VM and mount read-only, and the real keys never enter it at all, so there is nothing durable inside for a compromised agent to poison for the next session. Every syscall is logged through seccomp-notify, which means the escape surface is observable rather than assumed, and we can hold ourselves to a public SandboxEscapeBench score instead of a marketing claim.
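A per-session microVM with read-only mounts could be described with a Firecracker config file, built here in Python. The field names follow Firecracker's JSON config format as we understand it. The paths, sizes, boot arguments, and tap-device naming are hypothetical. The host-side routing that forces guest traffic through the egress proxy is not shown.

```python
import json


def session_vm_config(session_id: str, kernel: str, scratch_rootfs: str,
                      code_image: str, cache_image: str) -> dict:
    """Build a Firecracker config-file dict for one session (all paths hypothetical)."""
    return {
        "boot-source": {
            "kernel_image_path": kernel,
            "boot_args": "console=ttyS0 reboot=k panic=1 pci=off",
        },
        "drives": [
            # Fresh per-session root filesystem, discarded when the session ends.
            {"drive_id": "rootfs", "path_on_host": scratch_rootfs,
             "is_root_device": True, "is_read_only": False},
            # Shared images attached read-only so the guest cannot poison them.
            {"drive_id": "code", "path_on_host": code_image,
             "is_root_device": False, "is_read_only": True},
            {"drive_id": "cache", "path_on_host": cache_image,
             "is_root_device": False, "is_read_only": True},
        ],
        "machine-config": {"vcpu_count": 2, "mem_size_mib": 1024},
        "network-interfaces": [
            # Linux caps interface names at 15 characters, hence the truncation.
            {"iface_id": "eth0", "host_dev_name": f"tap-{session_id[:8]}"},
        ],
    }


def write_config(path: str, config: dict) -> None:
    """Write the config as JSON for the VMM to load at startup."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2)
```

Keeping the only writable drive per-session and disposable is what makes the guest stateless. Anything shared across sessions is attached read-only.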

What’s next

Re-signing more protocols. One MCP server per microVM instead of per process. Cheaper persistence so a long-running agent doesn’t have to choose between a tight blast radius and an affordable bill. The same direction the whole Foundations for Proactive AI series is heading: give the agent enough room to do real work, and not one inch more.

Try Fluso.