
Code Architecture in the Age of AI

Something fundamental has shifted in who — or what — consumes the code we write. For decades, our APIs were called by other engineers' code. Humans were the ultimate readers, writers, and orchestrators. That assumption is quietly breaking down, and our architecture has not caught up.

Today, the callers of your APIs increasingly fall into three categories. First, code generated by AI-assisted development tools that engineers use at their desks. Second, one-shot programs that large language models write and execute in sandboxed environments to complete a task, then throw away. Third — and this is the one most people underestimate — models calling your APIs directly as tool calls, with no human-written code in the loop at all.

All three consumers are, at bottom, large models. They are either generating the code that calls you, or they are the code that calls you. This is not a future prediction. It is the current state of things, accelerating. And it carries a design consequence that most engineering organizations have not internalized: your APIs should be built for LLMs as first-class consumers.

What LLMs Actually Need

When a model is your caller, the design priorities shift. Elaborate client SDKs with fluent builder patterns and method chaining become liabilities rather than assets. Models do not benefit from syntactic sugar the way human developers do. What they need is predictability, clear error messages, self-describing schemas, and simple operational semantics. They need to know that if they provide valid input, they will get a well-structured response — every time, with no ambient state or hidden prerequisites.

The practical implication is that we should be crystallizing our battle-tested abstractions — the reliable pieces we have built over years — and packaging them into LLM-friendly forms. Not dumbed-down APIs, but honest APIs. Interfaces where the contract is fully expressed in the schema, where failure modes are enumerated rather than emergent, where a caller does not need tribal knowledge to get correct behavior.
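A sketch of what "the contract fully expressed in the schema" might look like: a hypothetical tool definition where types, bounds, defaults, and every failure mode are stated up front. The endpoint name, parameters, and error codes here are invented for illustration, not any real platform's API.

```python
import json

# Hypothetical tool schema for an LLM consumer. Everything a caller needs
# is stated here: types, bounds, defaults, and the enumerated errors.
UPLOAD_FILE_TOOL = {
    "name": "upload_file",
    "description": "Upload a file to storage. Succeeds or returns one "
                   "of the enumerated errors; there are no silent fallbacks.",
    "parameters": {
        "type": "object",
        "properties": {
            "path": {"type": "string", "description": "Destination path."},
            "content_b64": {"type": "string", "description": "Base64 payload."},
            "timeout_seconds": {
                "type": "integer",
                "minimum": 1,
                "maximum": 300,
                "default": 30,
                "description": "Zero is rejected, never coerced to a default.",
            },
        },
        "required": ["path", "content_b64"],
    },
    # Failure modes enumerated, not emergent.
    "errors": ["INVALID_PATH", "PAYLOAD_TOO_LARGE", "TIMEOUT", "QUOTA_EXCEEDED"],
}

print(json.dumps(UPLOAD_FILE_TOOL["errors"]))
```

Nothing here is clever; the point is that a model reading only this object knows exactly what to send and exactly what can come back.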

This is less about adding features and more about removing ambiguity. Every implicit assumption in your API is a potential hallucination trigger for a model consumer. Every undocumented side effect is a failure mode that no amount of prompt engineering will reliably avoid. Think about it from the model's perspective: it has no institutional memory, no watercooler conversations, no years of accumulated context about why the timeout parameter silently falls back to 30 seconds if you pass zero. It reads the schema, it reads the error, and it acts. If the schema lies or the error is vague, the model will do the wrong thing confidently.
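The timeout example can be made concrete. Here is a hypothetical handler (the endpoint and field names are invented) that rejects the ambiguous input loudly instead of silently substituting a default, which gives a model caller something it can actually act on:

```python
def start_job(payload: dict, timeout_seconds: int = 30) -> dict:
    """Hypothetical endpoint. The contract is enforced, not guessed at."""
    # An implicit fallback ("0 means 30") is invisible in the schema and
    # therefore a hallucination trigger for a model caller. Fail loudly.
    if timeout_seconds <= 0:
        return {
            "ok": False,
            "error": "INVALID_TIMEOUT",
            "detail": f"timeout_seconds must be >= 1; got {timeout_seconds}. "
                      "Omit the field to use the default of 30.",
        }
    return {"ok": True, "job_id": "job-1", "timeout_seconds": timeout_seconds}

print(start_job({}, timeout_seconds=0)["error"])  # → INVALID_TIMEOUT
```

The error names the field, the constraint, the value received, and the remedy: exactly the information a caller with no institutional memory needs.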

A Sandboxed Surprise

I recently ran an experiment that made this concrete. I asked an agent to collect video URLs on a particular topic and save them as JSON files. Then I asked it to download those videos to cloud storage. I expected the obvious path: the agent would use the cloud platform's built-in media ingestion endpoint, which has a dedicated method for pulling from video hosting services. A single API call per URL. Clean, efficient, the way I would have designed the workflow.

Instead, the agent did something I did not anticipate. It parsed the JSON files in its sandbox environment, installed a video downloading tool itself, and pulled all dozen-plus videos directly — writing a batch program on the fly to parallelize the downloads, then uploading the results to cloud storage through the standard file API.

My first reaction was that it had done it wrong. It ignored the purpose-built integration. But sitting with it, I realized the agent had actually revealed something important about what matters in our architecture.

Three Lessons from One Task

That single interaction demonstrated three things worth internalizing:

  • Models can write programs in sandboxes to complete batch tasks without multi-turn interaction. What I would have accomplished with careful orchestration code — iterating through URLs, handling errors, managing parallelism — the agent just wrote as a throwaway script. It was effectively making parallel tool calls by writing its own tool. The implication is that simple orchestration logic is no longer a durable artifact. It is ephemeral, generated on demand.
  • The main value we provide is reliable foundational capabilities. The agent did not need a specialized video import endpoint. It needed a storage API that reliably accepted file uploads. The simpler, more general primitive was more useful than the purpose-built integration, because the agent could compose its own solution on top of a dependable foundation.
  • Simple and dependable beats clever and specialized. What I might have spent an afternoon writing with an AI coding tool — parsing URLs, handling retries, managing temp files — the agent wrote in seconds, in a sandbox, tailored to the exact task. Our effort is better spent making the underlying APIs rock-solid than building clever wrappers that anticipate specific workflows.
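The shape of the script the agent wrote can be sketched in a few lines. The `fetch` function below is a stand-in for the real download-then-upload step (the actual script installed a downloader and called the platform's plain file-upload API); the URLs are placeholders.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def fetch(url: str) -> dict:
    """Stand-in for 'download the video, upload it to storage'."""
    return {"url": url, "status": "uploaded"}


def run_batch(json_blob: str, workers: int = 8) -> list[dict]:
    """Parse the saved JSON, then fan out the work in parallel:
    effectively parallel tool calls, written as a throwaway program."""
    urls = json.loads(json_blob)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, urls))


results = run_batch('["https://example.com/v1", "https://example.com/v2"]')
print(len(results))  # → 2
```

The whole thing is disposable by design: generated for one task, run once, thrown away. Only the storage API underneath it needs to endure.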

The Unix Lesson

There is a historical parallel that I keep returning to. Before Unix, professional operating systems were built by systems programmers for end users. The interfaces were opaque, the design paternalistic. Unix inverted this: built by developers, for developers, using itself. The result was not just a good OS but a generative ecosystem that scaled to purposes its creators never imagined, because the primitives were honest and combinable.

We are at a similar inflection point. The consumers of our platforms are increasingly models. If we build our APIs with models as first-class users — simple contracts, predictable behavior, composable primitives — we get the same emergent capability. Do not try to anticipate the workflows. Make the primitives so reliable and so clearly specified that the workflows compose themselves.

Closing the Development Loop

Here is where the architectural argument meets a product design question. Right now, most engineers who work with AI coding tools operate in a split world. They collaborate with models in their editor, iterating on code. Then they copy the results into production systems, test them through conventional pipelines, and deploy through conventional channels. The model helps you write the code, but the environment where the code runs is opaque to the model.

What I want is to collapse that gap. Let environment feedback and human feedback reach the model more directly. Instead of crafting solutions in an editor and transplanting them into production, what if the entire development experience lived in the agent environment? The model writes code, runs it, sees the results, adjusts — all within the same feedback loop, against real infrastructure, with real data.
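That loop can be sketched with the model stubbed out. `propose_fix` below is an invented placeholder for a real model call; the failing program is a toy. The structure, though, is the whole idea: run against the real environment, feed the failure back, revise, repeat.

```python
import traceback


def propose_fix(source: str, error: str) -> str:
    """Placeholder for a model call: given failing code and the error it
    produced, return a revised version. Stubbed for illustration."""
    return source.replace("1 / 0", "1 / 1")


def develop(source: str, max_rounds: int = 3) -> str:
    """Write, run, observe, adjust: the same feedback loop whether the
    editor is a human or a model."""
    for _ in range(max_rounds):
        try:
            exec(source, {})
            return source  # the environment accepted it
        except Exception:
            source = propose_fix(source, traceback.format_exc())
    raise RuntimeError("no working version found")


final = develop("result = 1 / 0")
print(final)  # → result = 1 / 1
```

Today that loop usually runs against a scratch sandbox; collapsing the gap means letting it run, carefully, against real infrastructure.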

The counter-argument is obvious and worth taking seriously: most users do not want to be in the loop. Minimizing user involvement and black-boxing the implementation better suits how people actually want to interact with software. And that is correct — for production. For the end user who wants a button that says "do the thing," transparency is overhead. But during development and exploration, the tighter loop is enormously valuable. You learn more in ten minutes of watching an agent struggle with your API than in a week of reading usage analytics. The question is whether you can have both: a transparent, collaborative mode for building and iterating, and a hardened, autonomous mode for production execution.

A Thousand Parallel Explorations

This leads to what I think is the most interesting architectural pattern emerging right now. Imagine launching a thousand parallel agent sessions, each exploring a different configuration for a given problem. Not hypothetically — literally forking the exploration space and letting agents search it in parallel.

Most will fail. That is fine — search is supposed to be wasteful at the leaves. The valuable output is the patterns that emerge across successful runs. You crystallize those patterns into hardened primitives, which become the foundation for the next round of exploration. The architecture is an engine for converting exploration into reliability.
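As a toy sketch of the search-then-crystallize pattern: fork a configuration space, let most attempts fail, and extract what the successes share. The configuration space and success rule here are invented for illustration.

```python
from itertools import product


def try_config(cfg: dict) -> bool:
    """Stand-in for one agent session exploring one configuration.
    Most configurations 'fail'; success here is an arbitrary toy rule."""
    return cfg["batch"] >= 4 and cfg["retries"] >= 2


# Fork the exploration space: one session per configuration.
space = [{"batch": b, "retries": r}
         for b, r in product([1, 2, 4, 8], [0, 1, 2, 3])]
successes = [cfg for cfg in space if try_config(cfg)]

# Crystallize the pattern across successful runs: here, the minimal
# values that still succeeded. These become hardened defaults for the
# next round of exploration.
crystallized = {k: min(c[k] for c in successes) for k in successes[0]}
print(crystallized)  # → {'batch': 4, 'retries': 2}
```

The interesting output is not any single run but `crystallized`: the search was wasteful at the leaves, and the pattern is what survives.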

This is memoization operating at a higher level of abstraction — caching the discovery that a particular approach works, not just a function's return value.
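The analogy can be made literal: cache which approach worked for a class of task, rather than a function's return value. The task types and approach names below are invented placeholders; `discover_approach` stands in for an expensive agent exploration.

```python
approach_cache: dict[str, str] = {}


def discover_approach(task_type: str) -> str:
    """Stand-in for an expensive exploration that finds a working
    approach for this class of task."""
    return {"bulk_download": "sandbox_script",
            "single_upload": "direct_api_call"}.get(task_type, "explore_more")


def solve(task_type: str) -> str:
    """Memoize at the level of 'what approach works', so later runs
    reuse the discovery instead of re-searching."""
    if task_type not in approach_cache:
        approach_cache[task_type] = discover_approach(task_type)
    return approach_cache[task_type]


solve("bulk_download")
solve("bulk_download")  # second call hits the cache; no re-exploration
print(approach_cache)
```

The cache key is a task signature, not an argument list, and the cached value is a strategy, not a result: the same mechanism, one level of abstraction up.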

Build for the Actual Caller

The practical takeaway is straightforward, even if the implications are not. Look at your APIs, your data formats, your platform primitives. Ask who is actually calling them today, and who will be calling them in twelve months. If the answer is increasingly "models" — directly or through generated code — then your design priorities should reflect that.

Make your contracts explicit. Make your errors informative. Make your primitives composable. Strip away the conveniences that assume a human is reading the docs and building intuition over time. A model does not build intuition. It either has enough information in the schema and the error response to do the right thing, or it does not.

And be honest about what is worth building versus what is worth letting agents generate on the fly. The era of writing elaborate orchestration code by hand, of building bespoke integrations for every workflow, of anticipating every use case in a monolithic platform — that era is closing. What endures is the foundation: simple capabilities, reliably delivered, clearly described. Everything above that line is increasingly the agent's job.

Build for the actual caller. Right now, the actual caller is changing. Our architecture should change with it.