
Taming Randomness

There is a paradox at the heart of every agent system: the thing that makes LLMs useful — their ability to handle situations you did not anticipate — is also the thing that makes them dangerous. Deterministic code and stochastic intelligence are fundamentally incompatible substances. Fuse them wrong, and you get either a very expensive regex or a very creative disaster.

The tension is sharpest when the output resists symbolization — when the agent produces not tokens but continuous artifacts: a rendered frame, a synthesized voice, a composed scene. Text can be approximately right and still useful. But perceptual media is unforgiving. A wrong frame is visible. A wrong note is audible. The output lives in sensory space, where humans are ruthless detectors of incoherence, and where "close enough" is never close enough.

The solution we arrived at is a pattern I call Blueprint: a structured, declarative representation that sits between the LLM's intent and the code's execution. Not a workflow. Not a prompt template. A spine.

Iron Wraps Meat, Meat Wraps Iron

Before Blueprint, we oscillated between two failure modes. In Chinese there is a vivid pair of phrases for this: 铁包肉 (iron wraps meat) and 肉包铁 (meat wraps iron). They describe exactly the dichotomy we faced.

Iron wraps meat: you treat the LLM as a function call. It slots into a pipeline you fully control. The system is deterministic, testable, deployable — and dead. Every edge case requires a new code path. Every creative decision is pre-made by an engineer. The model's generality — the entire reason you are paying for it — goes unused. You bought a sports car and put it on rails.

Meat wraps iron: you hand the LLM the keys. It decides what tools to call, in what order, with what parameters. Occasionally it produces something transcendent. More often it hallucinates a tool that does not exist, calls the right tool with wrong arguments, or enters a reasoning loop that burns tokens without converging. When the output is expensive to produce and impossible to unsee, "occasionally transcendent" is not a production strategy.

The failure is symmetric. One approach over-constrains; the other under-constrains. Both treat control as a binary: either the code is in charge or the model is. Blueprint breaks this binary by introducing a third artifact — neither code nor prompt, but structured state. The LLM proposes, the structure constrains, the code executes. Each operates in its own domain. None needs to understand the others.

The Spine

The best analogy I have found comes from an unlikely source: Theo Jansen's Strandbeest — skeletal sculptures made of PVC pipe that walk on beaches, powered by nothing but wind. They look alive. They are not. They are geometry.

Jansen's genius is not in the wind and not in the movement. It is in the linkage ratios — the precise geometric relationships between rigid segments that convert chaotic force into smooth gait. No segment is flexible. Every joint is hard. The flexibility is emergent, arising from the arrangement of inflexible parts.

Blueprint works the same way. It is a structured document — think XML or a typed schema — that describes the desired state of a complex artifact. Not the artifact itself, but its topology: what parts exist, how they relate, what constraints bind them. Every field has a type. Every reference must resolve. Every relationship must be consistent.

The LLM's job is to populate this structure. The code's job is to validate and execute it. The structure's job is to make invalid states unrepresentable — or at least, cheaply detectable.

This is the spine. It does not tell the LLM how to think. It tells the LLM what shape its thoughts must take before they are allowed to become real.
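To make this concrete, here is a minimal sketch of what such a structure might look like. Everything here is illustrative — the names, the fields, the flat reference list — not our actual schema; the point is only that topology is typed and dangling references are cheap to detect:

```python
from dataclasses import dataclass, field

# Hypothetical Blueprint shape: typed nodes, explicit references.
@dataclass
class Node:
    node_id: str
    node_type: str                                 # every field has a type
    refs: list[str] = field(default_factory=list)  # ids this node depends on

@dataclass
class Blueprint:
    root: str
    nodes: dict[str, Node] = field(default_factory=dict)

    def unresolved(self) -> list[str]:
        """Every reference must resolve; return the ones that do not."""
        missing = []
        if self.root not in self.nodes:
            missing.append(self.root)
        for node in self.nodes.values():
            missing += [r for r in node.refs if r not in self.nodes]
        return missing
```

A Blueprint whose root node references a nonexistent "B" fails `unresolved()` before any expensive generation runs — invalid states are caught at the structural layer, not in the rendered output.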

Not DNA — Sheet Music

People instinctively reach for the DNA metaphor when they encounter structured representations of complex artifacts. But DNA is a prescription: it specifies the complete construction of an organism, and the organism's job is faithful replication. A mutation is an error.

Blueprint is closer to sheet music. A score specifies rhythm, structure, key signature, dynamics — but leaves enormous space for interpretation. Two pianists playing the same Chopin nocturne produce recognizably different performances, both valid. The score constrains without dictating.

Blueprint specifies topology, not texture. It says: these parts exist, they relate in these ways, these boundaries must hold. What it does not specify is the substance that fills the space between the boundaries — the thing you actually see, hear, or experience. That is the performer's domain: the generative model, operating within constraints but free to interpret.

This distinction matters because it determines how you handle errors. If Blueprint were DNA, any deviation would be a defect requiring termination. But because it is a score, deviation is expected — the question is whether the deviation stays within the key signature.

The Semantic Phase-Locked Loop

Which brings us to the hard part: what happens when the LLM writes a bad score?

Traditional software handles errors through exceptions — binary, discrete, terminal. The operation succeeds or it throws. This model fails for LLMs because LLM errors are not binary. They are analog. A response is not "right" or "wrong"; it is more or less aligned with what you need. An LLM that omits a required component has not crashed — it has drifted.

We handle this with what I call a Semantic Phase-Locked Loop, borrowing from control theory. In electronics, a PLL locks an oscillator to a reference frequency by continuously measuring phase error and applying correction. The oscillator is never perfectly on frequency — it is always drifting and always being corrected. Stability is not a state but a dynamic equilibrium.

Our Validator works the same way. When the LLM produces a Blueprint, the Validator does not just check "valid or invalid." It generates a structured semantic error signal: "node B references node F, which does not exist in the current graph; the nearest valid nodes satisfying the same type constraint are F′ and G." This signal returns to the LLM, which revises the Blueprint, which gets validated again.

The feedback is not "you are wrong." It is "you are this far off, in this direction, and here is what the feasible region looks like."

This is fundamentally different from retry-on-failure, which discards the failed attempt and starts over. Our loop preserves the attempt and nudges it toward validity. The LLM learns — within the conversation context — where the boundaries are, and subsequent generations are more accurate. The system converges rather than rolling dice repeatedly.
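A minimal sketch of the loop, with a stub standing in for the LLM revision step. All names are hypothetical; what matters is that the error signal carries direction ("nearest_valid"), not just pass/fail, so each round nudges the attempt instead of rerolling it:

```python
from difflib import get_close_matches

def error_signals(nodes: set, edges: dict) -> list[dict]:
    """One signal per dangling reference, with nearby valid candidates."""
    return [
        {"node": src, "missing": ref,
         "nearest_valid": get_close_matches(ref, sorted(nodes), n=2)}
        for src, ref in edges.items() if ref not in nodes
    ]

def converge(nodes, edges, revise, max_rounds=5):
    for _ in range(max_rounds):
        signals = error_signals(nodes, edges)
        if not signals:
            return edges                 # locked: every reference resolves
        edges = revise(edges, signals)   # preserve the attempt, nudge it
    raise RuntimeError("did not lock within budget")
```

With nodes `{"A", "Fp", "G"}` and an edge `B → F`, the signal reports that "F" is missing and that "Fp" is the nearest valid candidate — exactly the "this far off, in this direction" feedback described above.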

Crystallizing Memory

There is a second problem that Blueprint solves, one that has nothing to do with validation.

LLMs forget. In long production sessions involving dozens of revision rounds, the model's working memory degrades. Earlier decisions get pushed out of the context window or compressed beyond usefulness. Ask the model to modify a parameter it specified forty messages ago, and it may not remember what it specified.

Blueprint solves this by being the memory. Not the LLM's memory — the system's memory. Every decision the LLM makes gets crystallized into the structure: choices, relationships, constraints, parameters. They are no longer floating in a conversation transcript. They are nodes in a graph, addressable by path.

This is what makes precision editing possible. You can use an XPath-like query to reach into a deeply nested part of the Blueprint and modify a single node — three layers deep, fortieth revision, one attribute — without regenerating the artifact above or below it. The LLM does not need to remember the decision. The structure remembers it.

The shift is from episodic memory (conversation history) to structural memory (typed, indexed state). Episodic memory is fuzzy and decays. Structural memory is exact and persistent. This is memoization at its most literal: an expensive creative decision gets frozen into a durable, addressable location.
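A sketch of what path-addressable structural memory looks like in practice. The slash-separated path syntax here is illustrative (the real mechanism is XPath-like, as noted above), and the dict-of-dicts state is a stand-in for the typed graph:

```python
# One decision lives at one address. Editing it touches exactly one node;
# nothing above or below it is regenerated.
def set_by_path(state: dict, path: str, value) -> dict:
    *parents, leaf = path.split("/")
    node = state
    for key in parents:
        node = node[key]   # descend; a missing key is a structural error
    node[leaf] = value     # modify the single addressed node
    return state
```

Fortieth revision, three layers deep, one attribute: `set_by_path(state, "scene/section/tempo", 120)` changes the tempo and nothing else, whether or not the model remembers choosing 90 in the first place.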

State, Not Process

Here is the conceptual distinction that trips up most people encountering Blueprint for the first time: it is not a workflow.

Workflows are imperative. They describe a sequence: first do A, then do B, then if C then D else E. They are process descriptions. Their failure mode is brittle — if step B fails, the chain breaks, and recovery means re-engineering the sequence. Worse, a generated workflow inherits all the fragility of the model that generated it: one probabilistic misstep in the middle, and the entire downstream path derails.

Blueprint is declarative. It describes the desired end state: the artifact should have these properties, these constraints should hold, these references should resolve. It says nothing about the order in which the LLM should arrive at this state. The model can fill in components in any order, revise any part at any time, approach the problem from whatever angle makes sense.

The analogy I keep coming back to is hooves. A horse's body is extraordinarily flexible — muscles, tendons, joints that absorb shock and adapt to terrain in real time. But its hooves are rigid. The point of contact with reality is hard. You can be as flexible as you like in how you move, but where you touch the ground, physics is non-negotiable.

Blueprint defines the hooves: the invariants that must hold regardless of how the model gets there. Root exists. References resolve. Types are consistent. Required fields are present. Temporal ordering holds. These are not suggestions. They are physics. Everything else — the creative substance, the perceptual texture, the thing the audience actually experiences — is the flexible body above the hooves.
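The hooves can be written directly as code: each invariant is a predicate over the whole state, indifferent to how or in what order the model arrived at that state. The specific checks and the dict shape below are illustrative, not exhaustive:

```python
# Declarative invariants: properties of the end state, not steps of a process.
INVARIANTS = [
    ("root exists", lambda bp: bp.get("root") in bp.get("nodes", {})),
    ("references resolve", lambda bp: all(
        ref in bp["nodes"]
        for node in bp["nodes"].values()
        for ref in node.get("refs", []))),
    ("required fields present", lambda bp: all(
        "type" in node for node in bp["nodes"].values())),
]

def violated(bp: dict) -> list[str]:
    """Names of the invariants that do not hold, regardless of construction order."""
    return [name for name, holds in INVARIANTS if not holds(bp)]
```

Because the checks are order-free, the model can fill the structure in any sequence it likes; `violated()` only cares whether physics holds at the point of contact.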

Muscle Memory

But declaring what must be true is not enough. An LLM that knows the constraints but not the strategies will stall. It knows the red lines but not the good moves.

Consider: you want the agent to extend an existing composition — add a new section that flows naturally from what came before. The Blueprint can express the constraint: the new section must maintain continuity with the boundary of the old. But it cannot teach the craft of smooth transitions. The LLM knows the goal but not the technique.

This is where Playbooks come in — lightweight, injectable strategy guides. Not workflows with fixed steps, but heuristic nudges: "When extending, anchor to the existing boundary. Verify continuity across the seam. Prefer gradual divergence over abrupt change."

Playbooks are muscle memory. They encode best practices discovered through iteration — the kind of tacit knowledge a seasoned practitioner carries but rarely writes down. Injected at runtime, they give the LLM enough tactical guidance to act confidently within the Blueprint's constraints, without being straitjacketed into a rigid procedure.

The layering is deliberate:

  • Blueprint guards the invariants: what must be true. Declarative, rigid, auditable.
  • Playbook suggests the strategies: what tends to work. Heuristic, soft, replaceable.
  • Code handles execution: what actually happens. Deterministic, testable, fast.

Each layer operates in its natural domain. The Blueprint does not try to be clever. The Playbook does not try to be rigid. The code does not try to be creative. The interactions between layers produce behavior that none could achieve alone — just like Jansen's linkages.
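The layering can be sketched as a prompt-assembly step in which Playbooks are data, injected at runtime. The registry keys, tag names, and playbook prose below are all hypothetical; the point is that strategy text is swappable without touching the Blueprint schema or the executor:

```python
# Playbooks as injectable strategy text: heuristic, soft, replaceable.
PLAYBOOKS = {
    "extend": (
        "When extending, anchor to the existing boundary. "
        "Verify continuity across the seam. "
        "Prefer gradual divergence over abrupt change."
    ),
}

def assemble_prompt(task: str, blueprint_xml: str, system: str) -> str:
    parts = [system]
    if task in PLAYBOOKS:            # inject tactics only when one applies
        parts.append(f"<playbook>{PLAYBOOKS[task]}</playbook>")
    parts.append(f"<blueprint>{blueprint_xml}</blueprint>")
    return "\n\n".join(parts)
```

Swapping a Playbook changes how the model tends to move; it cannot change what the Blueprint permits, and it never touches the deterministic code underneath.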

The Paved Road

The obvious counterargument: why not just let the LLM write code in a sandbox? Give it full access, let it compose the artifact programmatically. Maximum flexibility, zero structural overhead.

Because sandbox mode is jungle exploration. It works — sometimes spectacularly — but the cost per attempt is high, the failure modes are opaque, and debugging requires reading generated code that the model itself may not be able to explain on the next turn. When your output is perceptual — when a human will see it, hear it, feel it — "try it and see" is not a search strategy. It is a prayer.

Blueprint is a paved road. You sacrifice some off-road capability in exchange for predictable throughput, lower cost per iteration, and a map that tells you where you are at every point. The road does not prevent detours — we are building escape hatches for cases where the structure genuinely cannot express what the model needs. But the default path should be structured, because structure is what makes iteration cheap and debugging possible.

Blueprint is the skeleton, and the skeleton's job is to make the wind's energy productive rather than dissipative. Sandbox mode skips the skeleton. Sometimes the wind sculpts something beautiful on its own. Usually it just blows sand around.

The road has to come first. You can always build an off-ramp later. But if you start in the jungle, paving your way out is a lot harder than never having entered.

Engineering Serendipity

We do not use iron to cage flesh. We use iron to support it. A spine does not restrict movement — it makes movement possible. Without one, you are an invertebrate: flexible, sure, but confined to the ocean floor.

The phrase that captures our design philosophy is engineering serendipity. It sounds like a contradiction — engineering is deterministic, serendipity is not. That is exactly the point. We are not trying to eliminate randomness. We are building structures that make randomness land somewhere useful rather than somewhere catastrophic.

Blueprint gives the agent a skeleton. And with a skeleton, it can finally walk.