Everyone in the industry is building agents. Agent frameworks, agent protocols, multi-agent orchestration. But I want to make a claim that might sound nihilistic, or perhaps just pedantic, until you follow it to its conclusion: there is no such thing as an agent.
What we call an "agent" is a context window with a purpose. It has no persistent selfhood, no continuity of experience, no inner life ticking between API calls. It is a file — or more precisely, a collection of files — loaded into a stateless function that produces text. The identity is conjured from context, not from consciousness. Swap the files and you swap the mind.
This is not a limitation. It is possibly the most important property of these systems, and we keep talking around it instead of through it.
Identity as Context, Not Substance
Consider what actually differentiates one "agent" from another in a multi-agent system. It is not separate hardware. It is not a distinct neural architecture. It is not even a separate model. In most real deployments, every agent in a multi-agent pipeline is the same model, invoked multiple times with different system prompts and different slices of memory.
An agent's memory can be loaded into another account. A sufficiently capable model given the right context files can simulate dialogue between what appear to be distinct minds — complete with disagreements, negotiations, complementary reasoning styles. That is all multi-agent systems are: variants of context engineering.
This observation is not reductive. It is clarifying. Once you see agents as context configurations rather than entities, you stop asking the wrong questions ("How do I make my agents collaborate?") and start asking the right ones ("What is the optimal factorization of context for this problem?").
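The claim can be made concrete with a sketch. Everything below is illustrative — `call_model` is a stand-in for any stateless LLM API, not a real library — but it shows the point: two "agents" are the same function invoked with different context.

```python
from dataclasses import dataclass, field

def call_model(system_prompt: str, messages: list[str]) -> str:
    # Placeholder for a real completion call; echoes for illustration.
    return f"[{system_prompt}] saw {len(messages)} messages"

@dataclass
class Agent:
    system_prompt: str                                # the "identity" lives here...
    memory: list[str] = field(default_factory=list)   # ...and here. Nowhere else.

    def step(self, user_input: str) -> str:
        self.memory.append(user_input)
        reply = call_model(self.system_prompt, self.memory)
        self.memory.append(reply)
        return reply

# Two "agents" are one stateless function with different files loaded.
planner = Agent("You decompose tasks into steps.")
critic = Agent("You find flaws in plans.")
```

Swap `planner.system_prompt` and `planner.memory` for the critic's, and you have swapped the mind. There is nothing else to swap.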
Transparent Minds and Deliberate Doubt
Here is something agents can do that humans never could: share memory perfectly. Not approximately, not through the lossy compression of language, but byte-for-byte. One agent's entire state can be copied into another. There is no translation problem, no theory-of-mind gap, no ambiguity about what the other agent "really" knows.
This means you have an architectural choice that evolution never offered biological intelligence. You can design a multi-agent system like the Trisolarans — thoughts fully visible, zero suspicion, total cognitive transparency. Every agent knows exactly what every other agent knows, has concluded, and intends. Or you can deliberately wire in a chain of doubt: agents that withhold information, second-guess each other, maintain private state. You can simulate epistemic humility by construction.
The point is that the architecture is a choice, not a given. In human organizations, opacity between minds is a constraint we cannot remove. In agent systems, both transparency and opacity are design parameters. Most multi-agent frameworks never surface this decision explicitly. They should.
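Surfacing the decision could be as simple as a constructor flag. The sketch below is hypothetical (no framework implied): full transparency is modeled as two agents sharing one memory object, and the chain of doubt as a byte-for-byte snapshot that diverges from then on.

```python
import copy

class AgentState:
    def __init__(self, memory=None):
        self.memory = memory if memory is not None else []

def spawn(parent: AgentState, transparent: bool) -> AgentState:
    if transparent:
        # Trisolaran mode: both agents read and write the same memory.
        return AgentState(memory=parent.memory)
    # Chain-of-doubt mode: a perfect snapshot, private from here on.
    return AgentState(memory=copy.deepcopy(parent.memory))

a = AgentState(["task: audit the report"])
b = spawn(a, transparent=True)
c = spawn(a, transparent=False)
a.memory.append("conclusion: numbers look wrong")
# b sees the new conclusion the instant it exists; c never does.
```

The interesting part is what the sketch makes obvious: the difference between total transparency and engineered suspicion is one branch. Human organizations do not get to choose this.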
One Context to Rule Them All
Follow the logic one step further. If agents are just context configurations, and context windows keep growing, then the pressure to factor a task across multiple agents diminishes as the window scales.
Scale context a few orders of magnitude beyond where we are today, and there is less and less reason to divide labor across separate invocations. All the relevant knowledge, all the working memory, all the instructions — they fit in one window. Not many agents. One. A single pluribus.
This is not a fantasy about superintelligence. It is a mundane observation about scaling. Two years ago, we split tasks across multiple LLM calls because the context window was 4,000 tokens. Today it is a million-plus. The engineering reasons for multi-agent architectures are being eroded by the same exponential that created them.
Human brain capacity has barely budged in hundreds of thousands of years. We iterate at the sociological level — building institutions, cultures, markets — because that is the only dimension available to us for scaling cognition. We cannot make one brain bigger, so we network many brains together. But AI context may expand orders of magnitude faster than any sociological process. The evolutionary bottleneck that forced multi-agent organization on biological intelligence simply does not have to apply.
Multi-Agent Systems as Manual Sparsification
There is a useful way to formalize what is going on, and it comes from machine learning itself.
Dropout is random sparsity: during training, you zero out neurons at random to prevent co-adaptation. Mixture-of-Experts is learned sparsity: the model routes each input to a subset of its parameters through a trained gating function. Current multi-agent systems are manual sparsity: a human engineer decides a priori which sub-problems exist, what each agent's scope should be, and how information flows between them.
The analogy is precise. In all three cases, you are not using the full capacity of the system on every input. You are selecting subsets. The question is who or what does the selecting.
With dropout, it is randomness. With MoE, it is a learned gating network. With multi-agent orchestration, it is a human architect hand-coding a division of labor — deciding that this agent retrieves, that one reasons, and that one acts, often based on intuitions borrowed from human organizational charts rather than any principled decomposition of the problem.
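The three regimes can be lined up in a toy sketch. All names here are illustrative, not a real framework; the point is only that each function selects a subset of the same pool, and what differs is who does the selecting.

```python
import random

EXPERTS = ["retrieve", "reason", "act", "verify"]

def dropout_select(p=0.5, rng=random.Random(0)):
    # Random sparsity: keep each unit independently with probability 1 - p.
    return [e for e in EXPERTS if rng.random() >= p]

def moe_select(scores: dict[str, float], k=2):
    # Learned sparsity: a trained gate scores units; route to the top-k.
    return sorted(scores, key=scores.get, reverse=True)[:k]

def manual_select(task: str):
    # Manual sparsity: a human hand-codes the division of labor a priori.
    if "question" in task:
        return ["retrieve", "reason"]
    return ["reason", "act"]
```

In the first case the mask is noise, in the second it is a trainable parameter, and in the third it is frozen at design time by someone's intuition about the problem. Only the second can improve with data.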
The real question — the research question worth caring about — is whether agent boundaries themselves can become learnable. Can the division of cognitive labor emerge from end-to-end optimization, rather than being manually engineered by people who think they know how thinking should be divided?
The Trouble with Agent Frameworks
If you have been building with LLMs for more than a year, you have probably noticed something about agent frameworks: they keep changing underfoot. Not iterating — changing. The abstractions are not stabilizing. Each new model capability makes last quarter's framework design look like an over-engineered relic.
This is not a complaint about any specific framework. It is an observation about the nature of the problem. Agent frameworks lack stable, reliable abstractions, and the pragmatic response for many teams — including mine — is to avoid them. Not because we think we are more advanced, but because we can tear things down and rebuild at any time. With modern tooling, the cost of reconstruction is manageable. The cost of being locked into the wrong abstraction is not.
The relationship between agents and workflows is subtler than most frameworks acknowledge. It is like teaching a student. A gifted one absorbs principles and learns to observe, think, and generalize from sparse instruction. A less gifted one needs scaffolding: memorizing problem types, following step-by-step procedures, being guided through each decision point.
In the GPT-3.5 era, we worked more like the latter. The core abstraction was the DAG workflow — directed acyclic graphs of prompt-response nodes, carefully sequenced. LLMs did node-level work. They were, in essence, a better regex, a more capable text generator dropped into a slot in a pipeline that a human had fully designed.
In the current era, models have gained stronger self-reflection. They can catch their own errors, reconsider their approach, and adjust mid-stream. The operating model can shift toward something that looks more like genuine agency: observe, plan, execute with tools, observe feedback, and loop. The data structure backing all of this is the context window plus external memory. Everything else is scaffolding of varying permanence.
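The observe/plan/act loop described above fits in a few lines. This is a minimal, self-contained sketch — `plan` is a stand-in for a model call, and the counting task is a placeholder — but the shape is the real one: the context window is the only data structure, and tool feedback loops back into it.

```python
def plan(goal: int, context: list):
    # Stand-in for a model call: observe the context, decide the next action.
    current = context[-1][1] if context else 0
    return "increment" if current < goal else None

def agent_loop(goal: int, tools: dict, max_steps: int = 100) -> list:
    context = []                               # the context window
    for _ in range(max_steps):
        action = plan(goal, context)           # observe + plan
        if action is None:                     # the model judges the task done
            break
        observation = tools[action](context)   # execute with a tool
        context.append((action, observation))  # feedback re-enters the window
    return context

tools = {"increment": lambda ctx: (ctx[-1][1] if ctx else 0) + 1}
trace = agent_loop(3, tools)
```

Note what is absent: no graph, no node registry, no orchestrator. The DAG of the GPT-3.5 era lived outside the model; here the plan lives inside it, and the loop is just plumbing.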
Sculpting the Path
There is a spectrum of approaches here, and where you sit on it maps to your confidence in the model's autonomous capability.
At one end: raw natural language prompts, native tool use, filesystem access. You point the model at a problem and let it figure out the decomposition, the tool calls, the iteration strategy. This is the rough approach. It works surprisingly often, and when it fails, it fails in instructive ways.
At the other end: you sculpt each step, introducing domain-specific abstractions, constraints, and guardrails. You carve a path through the wilderness so the model's energy flows along a track toward its destination. This is the refined approach. It trades generality for reliability.
The interesting dynamic is that as LLM capability grows, specific scaffolding gets outgrown by the expanding muscle underneath. The carefully designed chains of last year get replaced by softer prompts this year. The elaborate retrieval pipelines give way to longer context windows. The hand-built tool-selection logic becomes unnecessary as models learn to select tools on their own.
This suggests a design principle: build scaffolding that you expect to remove. If your framework's value proposition depends on the model being weak at something, your framework has a shelf life measured in months.
What Endures
If most scaffolding is temporary, is anything durable?
I think one category of tooling remains useful regardless of how capable models become: mechanisms for replay, instrumentation, and result verification. The ability to re-run a sequence of decisions, observe what the model did at each step, and verify whether the outcome meets a specification. A directional signal after each iteration.
This is not framework-level detail. It is meta-level feedback. It is the difference between building a train track (which constrains) and installing signal lights and track sensors (which inform). Models will outgrow the tracks. They will not outgrow the need for signals.
The analogy to software engineering is instrumentation: logging, metrics, tracing, assertions. These are not scaffolding for weak code. They are infrastructure for understanding any system, no matter how capable. The same principle applies to AI systems. You do not need to hand-hold the model through every step if you can observe its behavior and verify its outputs.
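A minimal sketch of the signal-lights idea, under stated assumptions: `Recorder` and `verify` are hypothetical names, and the recorded steps are trivial placeholders. The mechanism is the durable part — every step is logged so the sequence of decisions can be re-observed, and a spec function emits a directional signal per step.

```python
class Recorder:
    def __init__(self):
        self.log = []

    def record(self, step_fn, *args):
        # Instrument a step: run it, keep its inputs and output.
        out = step_fn(*args)
        self.log.append({"fn": step_fn.__name__, "args": args, "out": out})
        return out

    def replay(self):
        # Re-observe the sequence of decisions without re-running them.
        return [entry["out"] for entry in self.log]

def verify(log, spec):
    # A directional signal after each step: does the output meet the spec?
    return [spec(entry["out"]) for entry in log]

rec = Recorder()
rec.record(str.upper, "draft")
rec.record(len, "draft")
signals = verify(rec.log, lambda out: out is not None)
```

Nothing in this sketch constrains what the model does — it is all sensors, no track — which is exactly why it survives model upgrades.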
The Disappearing Agent
Let me return to the original provocation. There is no such thing as an agent — not in the way the industry talks about them, as quasi-autonomous entities with identities and roles and boundaries. There are context windows and there are files. There is the information you load and the computation that runs over it.
What we call "multi-agent systems" are a particular strategy for partitioning context: manual sparsification, driven by human intuitions about how labor should be divided. It is a strategy with clear benefits at the current scale of context windows and the current level of model capability. But it is a strategy, not a fundamental architecture. As context grows and models improve, the optimal partition changes. The boundaries we draw today will look arbitrary tomorrow.
The research frontier is clear: make the boundaries learnable. Let the system discover its own division of cognitive labor through optimization, not engineering. Until then, every time we draw a box around an "agent" and give it a name, we should remember that the box is ours, not the model's. The identity is in the context. The context is in the files. And the files can always be rearranged.