Sense & Motion | The Volatile Layer and the Durable Layer

OpenAI shipped GPT-5.5 on April 23, six weeks after GPT-5.4. The model is another step forward, but the story for organizations is more about the speed of change.

OpenAI's GPT-5.5 release on April 23 followed GPT-5.4 by six weeks. By standard measures, the new model is a real advancement. It leads Claude Opus 4.7 and Gemini 3.1 Pro on some benchmarks and performs materially better on agentic coding and computer-operating tasks, which has driven a positive early enterprise reception.

But the cadence is the noteworthy part. Through the GPT-4 era, OpenAI's flagship updates arrived four to eight months apart. The 5-series has been faster from the start: GPT-5 in August 2025, GPT-5.1 in November, then 5.2, 5.3, and 5.4 through the winter, and 5.5 on April 23. That's six numbered flagships in roughly nine months. Anthropic is running a similar compression, with Opus 4.5 in November, 4.6 on February 5, and 4.7 on April 16. Stepping back across the five major labs (OpenAI, Anthropic, Google, Meta, DeepSeek), the count of frontier-class releases has roughly tripled in three years.

The lifespan of "this is the model to use" is now roughly the duration of some of the fastest enterprise RFPs, which is to say, not long enough to anchor a real business decision. By the time most large organizations finish vendor evaluation, security review, and budget approval to switch from one model to another, the relative ranking will have already changed at least once.

The labs and a small number of frontier startups can keep up with that race because their entire product depends on it. For organizations that are using AI rather than building it, the more useful question is which capabilities, in which workflows, are now reliable enough that you can build something durable on top of them. That question has a different shape, and a different answer, than "which model is best this week."

The volatile layer

The model is the volatile layer. The price changes with each release, the behavior changes, and prompts that worked last week sometimes stop working as expected. Anthropic's Opus 4.7, shipped on April 16, is a recent illustration. It included a number of behavior changes from prior versions, and teams running long-established production prompts reported needing to rework them after the upgrade. None of the scaffolding or code changed; the model underneath did.

This is not a complaint about model providers. It's a feature of the cadence. Frontier labs are training larger models, post-training them differently, retuning safety behaviors, and shipping the result faster than most enterprise change-management processes can absorb. None of that is going to slow down. Microsoft, Alphabet, Meta, and Amazon collectively committed roughly $725 billion to AI infrastructure in 2026, a 77% increase over 2025. That money is buying capacity for more frequent releases, not fewer.

Pricing moves the same way the capability does. DeepSeek's V4 release on April 24 sits a few months behind the absolute frontier on benchmarks, but is priced at $1.74/M input tokens against $5 for GPT-5.5 or Opus 4.7. The economically optimal model for a given workload, meaning acceptable quality at the lowest defensible cost, shifts on its own cycle. It moves separately from the absolute capability frontier. An organization committed to a single model on a multi-year contract is betting against two volatile axes at once.
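The selection rule in that paragraph, acceptable quality at the lowest defensible cost, can be sketched in a few lines of Python. The two input-token prices are the ones quoted above; the eval scores, the quality floors, and the monthly token volume are hypothetical placeholders that would come from an organization's own evals and usage data, and the sketch ignores output-token pricing entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str
    input_price_per_m: float  # USD per million input tokens
    eval_score: float         # hypothetical pass rate on an internal eval suite (0 to 1)


def monthly_input_cost(option: ModelOption, input_tokens: int) -> float:
    """Input-token cost in USD for a month's volume (output tokens ignored)."""
    return option.input_price_per_m * input_tokens / 1_000_000


def cheapest_acceptable(options, quality_floor):
    """Return the lowest-priced option whose eval score meets the floor, or None."""
    acceptable = [o for o in options if o.eval_score >= quality_floor]
    return min(acceptable, key=lambda o: o.input_price_per_m, default=None)


# Prices are the ones quoted in the text; eval scores are made up for illustration.
options = [
    ModelOption("deepseek-v4", 1.74, 0.91),
    ModelOption("gpt-5.5", 5.00, 0.95),
]

volume = 2_000_000_000  # hypothetical: 2 billion input tokens per month

for floor in (0.90, 0.94):
    pick = cheapest_acceptable(options, floor)
    print(floor, pick.name, round(monthly_input_cost(pick, volume), 2))
```

With these made-up scores, raising the quality floor from 0.90 to 0.94 changes which model is picked. The choice depends on the workload's threshold as much as on the price list, and both inputs can change with any release.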

So the practical posture for organizations using these models is to assume the volatile layer will keep moving on capability, behavior, and price simultaneously, and to make decisions accordingly.

The durable layer

The durable layer is everything around the model. Evals (evaluation tests used to measure how well an AI model or agent performs on specific tasks) encode the firm's judgment about what "good" looks like for its work. They are how an organization stops treating each new release as a lottery ticket and starts treating it as a measurable comparison against work that already matters. The proprietary data and business context the model is allowed to see are the input, and over time they become the most defensible piece of the system. The redesigned processes that wrap the model are where most of the actual productivity gain shows up: clear handoffs, override rules, accountability for what gets shipped. They survive the model swap. The team's accumulated judgment about when to trust output, plus the relationship with whichever vendor hosts and governs the model, compounds across releases. The model itself does not.

The argument that AI's value accrues to the surrounding system rather than to the model itself is not new. Menlo Ventures, in its analysis of vertical AI businesses, frames defensible advantage as coming from "compounding data, cross-customer signal, and expanding workflow coverage" rather than from which model the business runs on. The volatile-and-durable framing here is one way to make that observation concrete for organizations deciding where to put their AI investment over the next twelve months.

Stanford's Enterprise AI Playbook, published in April, analyzed 51 successful real-world AI deployments across 41 organizations. Its central finding is that 77% of implementation challenges are non-technical. They come from organizational design, data infrastructure, role redesign, and executive sponsorship. The technical model itself is rarely the bottleneck. That data point is the empirical version of the volatile-versus-durable observation: the things that determine AI's value to an organization sit around the model, not inside it.

A useful test for any AI investment in the current environment: if the underlying model were swapped out tomorrow, what proportion of the work survives? An eval suite survives. A well-defined workflow with clear handoffs survives. A clean dataset survives. A vendor relationship survives. Time spent picking the model itself does not survive, because the model could be different next month.

This frames the build-versus-buy conversation differently from the way it tends to be posed. The question is not "do we build our own AI or buy one?" The question is which parts of the durable layer are worth owning, and which parts are worth renting from a vendor with better economics. For most BC organizations the answer is going to be: own your evals, your prompts, your data, and your workflow definitions; rent the model and the runtime.

What it means inside an organization

For the purposes of vendor selection, GPT-5.5 is not the thing being purchased in any meaningful sense. The thing being purchased is a relationship with whoever hosts and governs the model on the customer's behalf. That can be OpenAI directly, Anthropic, Microsoft via Azure, AWS via Bedrock, Google via Vertex, or Cohere if Canadian sovereignty is part of the procurement requirement.

Then as new models emerge, the move is to run an existing eval suite against them, document what changes in behavior on workflows that matter, and decide how to move.
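One way to picture that step is a small regression harness: the same eval cases run against the current model and a candidate, with the differences recorded case by case. The sketch below is illustrative only. `EvalCase`, `run_suite`, and `diff_results` are hypothetical names, and the two "models" are stand-in functions rather than real provider API calls.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Model = Callable[[str], str]


@dataclass(frozen=True)
class EvalCase:
    case_id: str
    prompt: str
    check: Callable[[str], bool]  # encodes what "good" means for this case


def run_suite(model: Model, cases: List[EvalCase]) -> Dict[str, bool]:
    """Run every case against one model and record pass/fail per case."""
    return {c.case_id: c.check(model(c.prompt)) for c in cases}


def diff_results(baseline: Dict[str, bool], candidate: Dict[str, bool]) -> Dict[str, List[str]]:
    """List cases that regressed (pass -> fail) or improved (fail -> pass)."""
    return {
        "regressed": sorted(k for k in baseline if baseline[k] and not candidate[k]),
        "improved": sorted(k for k in baseline if not baseline[k] and candidate[k]),
    }


# Stand-in models: plain functions in place of real provider calls.
def current_model(prompt: str) -> str:
    return "TOTAL: 42" if "invoice" in prompt else "I am not sure."


def candidate_model(prompt: str) -> str:
    return "The total is 42." if "invoice" in prompt else "Refund approved."


cases = [
    EvalCase("invoice-format", "Summarize this invoice.",
             lambda out: out.startswith("TOTAL:")),
    EvalCase("refund-decision", "Decide on this refund request.",
             lambda out: "approved" in out.lower()),
]

baseline = run_suite(current_model, cases)
candidate = run_suite(candidate_model, cases)
report = diff_results(baseline, candidate)
print(report)
```

In this toy run the candidate improves one case and regresses another because its output format changed. That is the kind of behavior shift worth documenting before deciding whether to move. The cases and checks belong to the durable layer; only the function passed in as the model changes.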

The corollary applies to internal expectations. The pace of model releases is going to keep generating headlines about new state-of-the-art capabilities. Most of those headlines will not change what should be happening inside an organization on a Tuesday morning. The teams that compound advantage in this environment are not the ones who switch fastest; they are the ones who build a durable layer that absorbs each new release without rewrites, and who develop the judgment to route different kinds of work to different models rather than waiting for one model to win everything.

Where the next six weeks land

The race between OpenAI, Anthropic, Google, and the Chinese open-weights labs will continue. Benchmarks will continue to be set and beaten. The capability frontier will keep moving, and the cadence is unlikely to slow until a hardware constraint forces it to.

The thing that decides whether an organization's AI investment compounds is not which model is selected this quarter; it is whether the durable layer underneath survives the next four model releases without rework. GPT-5.5 is a useful prompt to ask that question, not because the model is special, but because the cadence makes the question unavoidable.
