You're not talking to an AI - you're talking to a platform

When most people refer to "AI" these days, they don't mean a large language model. They mean a clean chat interface, the ability to upload a spreadsheet and get a reliable analysis back, and an assistant that remembers what you said last Tuesday.

What people are really experiencing is a sophisticated platform wrapped around a set of large language models. Understanding the gap between the two is important, both for anyone trying to build AI capabilities outside the vendors' walled gardens and for understanding why that has become so difficult.

The fact that people conflate model capabilities with the wider platform is understandable - and it is exactly what vendors want. They are seeking to establish "moats" that protect their revenue and relegate everybody else to content providers in an "app store" style ecosystem. How else are they going to turn a profit if reasoning is increasingly commoditised by cheap, open-source models?

The "lazy intern" problem

Large language models are improving all the time, but they can be surprisingly brittle. They don't remember anything between invocations, so you must build mechanisms that remember past interactions and decide what to pass into the model. If you push in too much information, the model can become overwhelmed and give unreliable answers.
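As a rough illustration of the "decide what to pass into the model" part, here is a minimal Python sketch of one common approach: keep the system prompt, then fill the remaining budget with the most recent turns. The `Message` type, the `build_context` function and the one-word-per-token estimate are all assumptions made for illustration. Real systems count tokens with the target model's own tokenizer.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str      # "system", "user" or "assistant"
    content: str


def estimate_tokens(text: str) -> int:
    # Crude assumption: about one token per whitespace-separated word.
    # Real systems should use the target model's tokenizer instead.
    return len(text.split())


def build_context(system: Message, history: list[Message], budget: int) -> list[Message]:
    """Keep the system prompt plus as many of the most recent turns as fit in `budget`."""
    used = estimate_tokens(system.content)
    kept: list[Message] = []
    # Walk backwards from the newest turn so the most recent context survives.
    for msg in reversed(history):
        cost = estimate_tokens(msg.content)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    # Restore chronological order before the list is sent to the model.
    return [system] + kept[::-1]


if __name__ == "__main__":
    system = Message("system", "You are a helpful assistant.")
    history = [
        Message("user", "My name is Sam and I like cycling."),
        Message("assistant", "Nice to meet you, Sam."),
        Message("user", "What should I ride this weekend?"),
    ]
    for m in build_context(system, history, budget=20):
        print(f"{m.role}: {m.content}")
```

With a budget of 20, the oldest turn, which is the one containing the user's name, is the first to be dropped. Simple trimming loses exactly the kind of fact a user expects to be remembered, which is why platforms layer summarisation and long-term memory on top of it.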

Models have improved to the point where they are surprisingly good at reasoning and problem solving, but they remain fundamentally unreliable. They generate text to fit the pattern of an answer regardless of whether it is correct, and they have no intrinsic mechanism to verify their output. This means that they report fabricated nonsense with the same absolute conviction as verifiable facts.

This requires sophisticated solutions that manage a context window, persist relevant memory, and ground responses in verifiable facts. The wider the range of questions, the more sophisticated these solutions need to be.
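One small piece of grounding fits in a few lines. This sketch assumes that the model has been prompted to wrap its supporting evidence in double quotes, which models do not do by default. Under that assumption, a guardrail can at least check that every quotation actually appears in the source. The function names are hypothetical. A substring check only catches invented quotations; it says nothing about whether conclusions drawn from genuine quotes are correct.

```python
import re


def extract_quotes(answer: str) -> list[str]:
    # Assumes the model was prompted to wrap supporting evidence in double quotes.
    return re.findall(r'"([^"]+)"', answer)


def normalise(text: str) -> str:
    # Lowercase and collapse whitespace so trivial formatting differences don't fail the check.
    return " ".join(text.split()).lower()


def unsupported_quotes(answer: str, source: str) -> list[str]:
    """Return quotations in `answer` that do not appear verbatim in `source`."""
    src = normalise(source)
    return [q for q in extract_quotes(answer) if normalise(q) not in src]


if __name__ == "__main__":
    source = ("The supplier shall deliver the goods within 30 days. "
              "Payment is due within 60 days of invoice.")
    reply = 'Delivery is due "within 30 days" and payment "within 90 days of invoice".'
    print(unsupported_quotes(reply, source))
```

Here the second quotation is flagged because the source says 60 days, not 90. A platform might then retry, drop the claim, or warn the user.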

If you ask a large language model to analyse a long document, it may do a reasonable job on the first few pages and the last few, yet quietly gloss over everything in between. Models are getting better at handling large contexts, but the "lost in the middle" problem remains: models tend to weigh tokens at the edges of the context window more heavily than those buried in the centre. The effect is like handing a task to a "lazy intern". The model skims material rather than reading it, and it's not even honest about which bits it skimmed.

This is why LLMs cannot reliably count things, aggregate columns of numbers, or identify every instance of a particular clause across a hundred-page contract. They will try. They will sound confident. They will be wrong in a slightly different way each time.

ChatGPT and Claude get around this not by solving the underlying problem, but by routing these requests through a code execution layer. When you ask ChatGPT to analyse your spreadsheet, it writes Python, runs it in a sandboxed environment, and returns the result. It's a clever workaround, but it requires a lot of plumbing. You need orchestration, a secure execution environment, a way to pass data back and forth, and logic to decide when to write code versus when to just answer. That's non-trivial, and it doesn't come free with the model.
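A toy version of this plumbing fits on one page. In this Python sketch, everything is hypothetical. `needs_code` is a keyword router; production platforms usually let the model make that decision through tool calling. `hypothetical_generate_code` stands in for a model call and returns a hard-coded script that counts clause mentions with a regular expression. `run_untrusted` executes that script in a separate process with a timeout. A child process is not a secure sandbox, and real platforms isolate execution far more strictly.

```python
import os
import re
import subprocess
import sys
import tempfile

# Hypothetical routing rule: counting and aggregation questions go to code execution.
ROUTE_TO_CODE = re.compile(r"\b(how many|count|sum|total|average)\b", re.IGNORECASE)


def needs_code(question: str) -> bool:
    """Decide whether to answer via generated code rather than directly."""
    return bool(ROUTE_TO_CODE.search(question))


def hypothetical_generate_code(question: str, data_path: str) -> str:
    """Stand-in for a model call that writes Python; hard-coded for illustration."""
    return (
        "import pathlib, re\n"
        f"text = pathlib.Path({data_path!r}).read_text(encoding='utf-8')\n"
        "print(len(re.findall(r'\\bindemn', text, flags=re.IGNORECASE)))\n"
    )


def run_untrusted(code: str, timeout: float = 5.0) -> str:
    """Run code in a separate Python process with a timeout.

    NOTE: this is NOT a secure sandbox. Production platforms isolate execution
    in containers or VMs with no network access and strict resource limits.
    """
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=timeout,
    )
    if result.returncode != 0:
        raise RuntimeError(result.stderr)
    return result.stdout.strip()


def answer(question: str, data_path: str) -> str:
    if needs_code(question):
        code = hypothetical_generate_code(question, data_path)
        return run_untrusted(code)
    return "(question would be sent straight to the model)"


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as f:
        f.write("The Supplier shall indemnify the Buyer. "
                "Indemnification survives termination. Indemnity caps apply.")
        path = f.name
    try:
        print(answer("How many indemnity clauses are there?", path))
    finally:
        os.remove(path)
```

The count comes from deterministic code rather than from the model's reading of the text, so it is the same every time. Even this toy version needs a router, a code generator, an executor and error handling, and none of it is part of the model.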

The scaffolding that nobody sees

There's an influential Google paper on technical debt in machine learning systems ("Hidden Technical Debt in Machine Learning Systems", 2015) which points out that the model itself is often a surprisingly small part of the overall system. AI platforms are no different. They need to handle a lot of concerns quietly on your behalf.

A production AI system must manage context across long conversations, deciding what to keep, summarise, or discard. It needs memory systems that can capture useful facts, past interactions, and user preferences, then recall them at the right moment. It must handle document uploads through retrieval and search mechanisms rather than simply dumping everything into a context window. It needs orchestration layers to choose tools, execute workflows, recover from failures, and assemble results.
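Retrieval, as an alternative to dumping a whole document into the context window, can be shown with a deliberately simple Python sketch. The text is split into overlapping word windows, and the windows are ranked by how often they mention query terms. The chunk sizes and the raw term-count scoring are assumptions chosen for readability. Production systems typically use BM25 or embedding similarity, often followed by a re-ranking step.

```python
import re
from collections import Counter


def chunk(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` words that overlap by `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]


def terms(text: str) -> list[str]:
    # Lowercased alphanumeric terms; no stemming or stop-word removal.
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Return up to k chunks ranked by how often they mention query terms."""
    query_terms = set(terms(query))

    def score(c: str) -> int:
        counts = Counter(terms(c))
        return sum(counts[t] for t in query_terms)

    ranked = sorted(chunks, key=score, reverse=True)
    # Drop chunks that share no terms with the query at all.
    return [c for c in ranked[:k] if score(c) > 0]
```

Only the top-ranked chunks are passed to the model, which keeps the context short and focused. Choosing chunk boundaries, scoring and how many chunks to include is part of the scaffolding the platform has to get right.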

Beyond that sit guardrails, observability, audit trails, caching, access control, streaming, session management, and the user interface itself. None of this comes with a language model. It all has to be built and it's easy to underestimate the effort required.

The race for a "moat"

AI vendors are not just in the business of selling superior models. They also want to own the wider platform, including the integrations and developer ecosystem. They may be reaching for something analogous to the app ecosystem created by Apple. If you can become the leading "operating system for AI", then everybody else is locked in as a content provider, giving the platform owner a licence to print money.

It's not just Anthropic and OpenAI who are doing this: cloud-based agent platforms are everywhere these days. No self-respecting cloud vendor is without its own agent platform, established vendors such as Salesforce and ServiceNow are pivoting heavily towards agent development, and frameworks such as LangChain are evolving into fully featured platforms.

The promise is much the same across all these vendors: you avoid the overhead of building your own scaffolding in return for significant lock-in. Organisations with sensitive data, regulatory requirements, or genuine differentiation may still want to build their own capabilities, but for everybody else it will make sense to lean into pre-baked capabilities.

Despite the scramble to establish platforms, the increasingly agentic applications provided by Anthropic and OpenAI are likely to create the more enduring "moat". They are evolving towards a "universal interface" that can be used for pretty much any task. No more specialised tools or dedicated applications - you can accomplish anything through a hyper-personalised experience that adapts to the task at hand. If nothing in its immediate bag of tools and gadgets fits, it can just build something on the fly.

In this sense, they are trying to define how people experience AI. This is a distinction that is easy to miss - when people talk about "AI", they are usually talking about the platform built around a model. The model generates the words, but the platform provides the memory, tools, context, retrieval, orchestration, and user experience that make this output useful.

The hard problem is no longer about building an agent, but about building the scaffolding that turns agents into a product.