Every major shift in computing has demanded a new kind of infrastructure. Virtual machines made the cloud possible. Containers made microservices practical. Now AI agents are forcing us to rethink compute isolation all over again — and the answers are not as obvious as they first appear.
I have spent the past year and a half building infrastructure for autonomous AI agents at a platform that runs millions of agent sessions per week. The question that keeps me up at night is deceptively simple: what does an agent actually need from the machine it runs on? Not what a human developer needs. Not what a web service needs. What does an LLM-driven process need when it is trying to accomplish a task on behalf of a user?
The industry is converging on a few answers, but I think there is still widespread confusion about the tradeoffs. This post is my attempt to lay out the landscape as I see it.
The Agent Cloud Thesis
AWS platformized raw compute and storage. It gave developers building blocks — EC2, S3, Lambda — and let them assemble whatever they needed. The next platform shift will do the same thing for AI agents, but the building blocks are different. An agent does not need a load balancer or a CDN. It needs:
- Model access. Bulk-purchased API quotas for foundation models so agents can call the right model for the right subtask without the user managing a dozen API keys.
- Application interfaces. Not just human-facing UIs, but machine-legible control surfaces. Think of a document editor that an LLM can manipulate through tool calls and structured intermediate representations, not by simulating mouse clicks on a rendered page.
- External integrations. OAuth-mediated read/write access to the user's existing accounts — email, calendars, code repositories, cloud storage — so the agent can act in the real world.
- Stateful identity. Payment, persistent storage, project sharing, and audit trails tied to a login account. The agent operates as a delegate of a real person.
This is the toolkit we are building for LLMs to use. The "product" is not an app — it is an environment. And the central engineering question becomes: what shape should that environment take?
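As a sketch of what a machine-legible control surface could look like, here is a hypothetical tool definition for a document editor, written in the JSON-schema style most model tool-calling APIs use. The tool name, fields, and `dispatch` helper are illustrative assumptions, not any real product's API:

```python
# Hypothetical tool definition for a document editor's agent-facing
# control surface. The schema style mirrors common tool-calling APIs;
# the names and fields are illustrative, not a real product's API.
replace_section = {
    "name": "replace_section",
    "description": "Replace the body of a named section in the document.",
    "parameters": {
        "type": "object",
        "properties": {
            "doc_id": {"type": "string", "description": "Document identifier."},
            "section": {"type": "string", "description": "Heading of the section to replace."},
            "markdown": {"type": "string", "description": "New section body, as Markdown."},
        },
        "required": ["doc_id", "section", "markdown"],
    },
}

def dispatch(tool_call: dict, registry: dict) -> dict:
    """Route a model-emitted tool call to its handler -- the structured
    alternative to simulating mouse clicks on a rendered page."""
    handler = registry[tool_call["name"]]
    return handler(**tool_call["arguments"])
```

The point is the shape, not the specifics: the editor exposes named operations with typed arguments, and the model's output is routed straight to a handler instead of a pixel.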
Three Models of Compute
In practice, the industry has landed on three broad approaches to running agent workloads.
Orchestrated Services: Public Transit
The first approach is running agents as orchestrated services — typically Kubernetes pods or cloud functions. The agent's code runs in a managed container, calls model APIs, and returns results. The platform handles scaling, routing, and lifecycle. This is how most AI platforms start, because it leverages existing cloud-native patterns that engineering teams already understand.
The strength is efficiency. Containers are lightweight, fast to start, and easy to scale horizontally. You can pack many agent sessions onto the same node. The weakness is rigidity. A container is not a general-purpose computer. If the agent needs to install a Python package, run a shell pipeline, or modify its own environment in unexpected ways, you are fighting the abstraction instead of using it.
For well-defined workflows — an agent that processes documents, another that writes and runs SQL queries — orchestrated services work beautifully. They are public transit: efficient, predictable, and terrible for going off-route.
Sandboxes: The Robo-Taxi
Sandboxes are lightweight, ephemeral compute environments that give agents something closer to a real machine. A sandbox typically provides a Linux userspace with a filesystem, network access, and the ability to install packages and run arbitrary code — but with hard boundaries on resource consumption and session lifetime. Think of them as containers with more freedom and less permanence.
This is the approach that has gained the most traction for general-purpose agents. The appeal is clear: sandboxes strip away the heavy baggage of virtual machines and focus on what agents actually need. CPU and RAM to run code. A filesystem for state. Network access for APIs. Nothing more. The isolation is enforced by the runtime, not by the agent's good behavior.
The tricky part is state management. Sandboxes are designed to be disposable, but agents need continuity. A coding agent that spends twenty minutes setting up a development environment should not lose that work when the session ends. This pushes you toward snapshot-and-restore patterns — checkpointing the filesystem and restoring it on the next session. It works, but it adds complexity, and the restore latency becomes a product-quality issue that users notice.
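A minimal version of that snapshot-and-restore loop, assuming the sandbox workspace is a plain directory and the snapshot store is a tar archive. Real platforms use filesystem-level or even memory snapshots, but the shape is the same:

```python
# Sketch of snapshot-and-restore for an ephemeral sandbox workspace.
# Assumes a directory-based workspace and a tar.gz snapshot store;
# production systems typically snapshot at the filesystem or VM level.
import tarfile
from pathlib import Path

def snapshot(workspace: Path, store: Path) -> None:
    """Checkpoint the sandbox filesystem before the session ends."""
    with tarfile.open(store, "w:gz") as tar:
        tar.add(workspace, arcname=".")

def restore(store: Path, workspace: Path) -> None:
    """Rehydrate a fresh sandbox from the last checkpoint. The time this
    takes is the restore latency users notice at session start."""
    workspace.mkdir(parents=True, exist_ok=True)
    with tarfile.open(store, "r:gz") as tar:
        tar.extractall(workspace)
```

Everything the agent built lives in the archive; the sandbox around it is disposable.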
Sandboxes are robo-taxis: you get a private vehicle for your trip, it takes you wherever you need to go, and you do not own it when the ride is over.
Virtual Machines: The Private Car
Then there are full virtual machines — persistent, stateful, individually provisioned. The user (or their agent) gets a real Linux box with root access, running 24/7 or on-demand, with its own disk that survives reboots. This is the most powerful option and the most expensive.
The case for VMs rests on a word: customization. A sufficiently powerful agent operating inside a VM can install arbitrary software, configure services, stand up databases, build cron jobs, create network tunnels — anything a skilled system administrator would do. This is not hypothetical. We are already seeing agents that, given a fresh Ubuntu box and a natural-language description of what the user wants, will methodically install, configure, and test a complete working environment.
The result is a machine that is deeply adapted to one person's habits and needs. It can beat any uniform platform service on the long tail of user-specific details, because it is not uniform. It is bespoke. And the bespoke-ness compounds over time as the agent learns what the user needs and modifies the environment accordingly.
The Case for Sandboxes
My honest assessment, after building with both approaches: sandboxes are the more promising technology for the near and medium term. Here is why.
VMs are quick to set up — spin one up, hand it to the agent, done. But achieving truly foolproof managed hosting requires enormous long-term investment. VMs accumulate entropy. Packages conflict, disks fill up, configurations drift, services crash and do not restart. A human sysadmin handles this with experience and intuition. Delegating it to an agent means the agent needs sysadmin-level reliability, which today's models do not consistently deliver.
Sandboxes sidestep this problem by being ephemeral. When something goes wrong, you throw away the sandbox and start fresh. The state you care about — code, data, configuration — lives in a snapshot or an external store. The compute environment is a disposable shell around it. This aligns well with how agents actually work: they operate in bursts of activity, not as long-running daemons.
There is also a cost argument. Persistent VMs burn money around the clock, whether the agent is active or not. Sandboxes scale to zero. For a platform running millions of sessions, the difference in infrastructure cost is not incremental — it is structural.
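Back-of-the-envelope, with illustrative prices rather than any provider's actual rates: a persistent VM bills for every hour of the month, while a scale-to-zero sandbox bills only for active minutes.

```python
# Illustrative cost comparison. The $0.05/hr rate and 30 active
# minutes/day are assumptions for the arithmetic, not real pricing.
HOURS_PER_MONTH = 730

def monthly_cost_vm(rate_per_hour: float) -> float:
    """A persistent VM bills around the clock, idle or not."""
    return rate_per_hour * HOURS_PER_MONTH

def monthly_cost_sandbox(rate_per_hour: float, active_minutes_per_day: float) -> float:
    """A scale-to-zero sandbox bills only while a session is running."""
    return rate_per_hour * (active_minutes_per_day / 60) * 30

vm = monthly_cost_vm(0.05)                 # 36.50 per month
sandbox = monthly_cost_sandbox(0.05, 30)   # 0.75 per month
```

At these assumed numbers the gap is roughly 50x per user, which is what "structural, not incremental" means at the scale of millions of sessions.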
The Case for VMs
And yet I cannot dismiss VMs entirely, because they offer something sandboxes fundamentally do not: a sense of place.
A VM is a machine you can point at and say, "that is mine." It has an IP address, a hostname, a filesystem that accumulates your history. When an agent builds something inside a VM — installs a service, writes a script, configures a workflow — the user can SSH in and see it. Touch it. Modify it. There is a participatory quality to building up a personal computing environment that reading documentation does not replicate.
This might sound like sentimentality, but I think it matters for product design. Geek users — power users, developers, tinkerers — derive genuine satisfaction from having a machine they can customize. And here is the thing: as agents get more capable, the definition of "geek user" expands. Actions that today require Linux expertise — setting up a self-hosted application, configuring a database, writing automation scripts — become accessible to anyone who can describe what they want in natural language. The creative satisfaction of participatory building, currently limited to technical users, could be available to everyone.
The distinction is not just technical. VMs offer the highest degree of personal customization freedom. Agents can install software that does not exist in any pre-provisioned image, build communication tools, configure services in idiosyncratic ways. Users bear the price of that freedom — occasional breakage, the need for a "reinstall" button — but they also reap the rewards.
The Hybrid Path
In practice, I think the answer is not either-or. The right architecture is a hybrid, with the choice of compute model depending on the task and the user.
Consider a layered approach. Most agent tasks — document processing, code generation, data analysis, API orchestration — run perfectly well in sandboxes. These are the bulk of the workload. The sandbox provides a clean environment, the agent does its work, the results are persisted to external storage, and the sandbox is recycled.
For power users who want persistent environments, you offer VMs as a tier. Priced competitively with cloud providers. Pre-installed with the platform's CLI tools and API credentials. Some automatic maintenance — security patches, disk cleanup, health checks — but with the ability for users (or their agents) to take manual control. This is not a mass-market product, at least not yet. It is an offering for the people who want a private car and are willing to pay for parking.
The interesting middle ground is what I think of as the "soft customization" approach. Instead of giving every user a VM, you provide a sufficiently universal base environment — a well-stocked sandbox image with common languages, frameworks, packages, and pre-configured integrations. All deep customization happens through "soft" artifacts: configuration files, prompt templates, memory stores, shell scripts, environment variables. The hypothesis is that a rich enough base image, combined with the right soft customization hooks, can approximate the flexibility of a VM without the operational burden.
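One way to wire in those "soft" hooks: on sandbox start, overlay the user's customization artifacts onto the universal base image. A hedged sketch, with hypothetical file names and layout:

```python
# Sketch of applying "soft customization" artifacts at sandbox start.
# The profile layout (env.json manifest, *.sh scripts) is hypothetical,
# not a real platform's convention.
import json
import os
import shutil
from pathlib import Path

def apply_soft_customization(profile_dir: Path, workspace: Path) -> dict:
    """Overlay a user's soft artifacts onto a fresh sandbox: environment
    variables from a JSON manifest are merged into the process env, and
    the user's shell scripts are copied into the workspace."""
    env = {}
    manifest = profile_dir / "env.json"
    if manifest.exists():
        env = json.loads(manifest.read_text())
        os.environ.update(env)
    for artifact in profile_dir.glob("*.sh"):
        shutil.copy(artifact, workspace / artifact.name)
    return env
```

The base image stays uniform and the per-user delta stays small, portable, and trivially re-applied after every recycle — that is the whole bet.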
The challenge is that "sufficiently universal" is a moving target. Every new use case reveals another package that is not installed, another system service that is not available, another kernel module that is missing. You end up in an arms race between the base image and the long tail of user needs. VMs win the arms race by definition, because they do not constrain the solution space. Sandboxes can only win it by being very, very good at predicting what users will need.
The Desktop Question
There is one more dimension worth addressing: GUI operation. When agents become capable of directly operating graphical desktop environments — clicking buttons, reading screens, navigating applications — does that change the calculus?
The short answer is: not yet, and probably not for a while. Current network bandwidth and latency are not sufficient for a great remote desktop experience, especially when the agent needs to process visual information at interactive speeds. Streaming a desktop to an agent, having it decide what to click, sending the click, waiting for the screen to update, streaming the new frame — the round-trip latency adds up fast.
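The arithmetic is unforgiving. With illustrative numbers for a single click-and-observe step (assumptions for the sake of the sum, not measurements of any particular system):

```python
# Illustrative per-step latencies for remote GUI operation, in ms.
# All four numbers are assumptions chosen to show the shape of the
# problem, not benchmarks.
STEP_MS = {
    "stream_frame_to_agent": 100,   # encode + network + decode
    "vision_model_inference": 800,  # decide what to click
    "send_input_event": 50,         # click travels back
    "app_redraws_screen": 150,      # UI responds to the click
}

per_step = sum(STEP_MS.values())          # 1100 ms per interaction
task_of_40_clicks = per_step * 40 / 1000  # 44 seconds for one short task
```

Even under these friendly assumptions, a forty-click task costs most of a minute before the agent has done any actual reasoning about the work itself.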
The longer answer is more nuanced. If vision-language models truly solve the GUI operation problem — reliably, quickly, at scale — then some form of desktop environment probably becomes necessary. Not for every task, but for the substantial category of tasks that involve applications without good APIs. Enterprise software, legacy tools, niche desktop applications — the long tail of GUI-only workflows is enormous.
At that point, you might be running a fleet of headless desktop VMs that agents connect to via optimized streaming protocols. It would look a lot like the virtual desktop infrastructure (VDI) that enterprises have been building for decades, except the "user" is an AI model and the interaction patterns are fundamentally different. Whether this is better served by full VMs or by specialized desktop sandboxes is an open question that depends on how the technology evolves.
What I Am Betting On
If I had to place a single bet on the architecture that wins over the next three to five years, here is where I would put my money:
- Sandboxes as the default. Ephemeral, fast-starting, snapshot-restorable compute environments become the standard substrate for agent workloads. The base images get richer over time. The snapshot/restore latency drops to sub-second. State management becomes a solved problem at the platform level, invisible to both the agent and the user.
- VMs as the power tier. Available for users who need them, managed with a light touch. The platform provides guardrails and escape hatches — automatic backups, a reinstall button, health monitoring — but fundamentally trusts the user (and their agent) to manage the machine. Priced to cover costs, not to maximize margin. A loss leader for the most engaged users on the platform.
- Machine-legible interfaces over GUI automation. Rather than teaching agents to click buttons on a screen, we build proper programmatic interfaces for the applications agents need to use. Tool calls, structured intermediate representations, well-documented APIs. This is more work upfront, but it is faster, more reliable, and cheaper to run than vision-based GUI automation. The desktop VM play is a fallback for applications we do not control, not the primary path.
The underlying thesis is that the infrastructure layer should make agents more capable, not just give them a place to run. A sandbox with pre-configured API access, pre-authenticated integrations, and a rich toolkit is more valuable than a bare VM with root access. The platform's job is to reduce the distance between "the agent decides what to do" and "the thing is done." Every millisecond of latency, every authentication step, every missing package in the base image — these are friction that makes agents less useful.
We are still early. The shape of agent infrastructure will change as models get more capable, as new use cases emerge, as the economics shift. But the core question — what does an agent need from its environment? — will remain central. The platforms that answer it well will define the next era of cloud computing.