In the early days of internet applications, the unit of work was a function call, a database query, or an HTTP request. The last of these was formalized into APIs, and we saw the rise of API design and microservices architecture. Services talked to each other through structured interfaces and well-defined contracts. Infrastructure patterns evolved around this model: Layer 4 handled traffic at the socket level, while Layer 7 handled application-level routing based on paths, headers, and cookies.
But is something fundamental changing with AI? In AI-native systems, the primary unit of work is no longer a structured API call. It's a prompt. A prompt isn't just data; it's an open-ended instruction expressed in natural language. It doesn't follow a spec. It's not typed or versioned the way an API might be. It might ask for a summary, a chart, a line of code, a meal plan, or all of the above. And it's often sent to a generic endpoint like /v1/chat/completions with a POST payload that looks identical regardless of intent.
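To make that concrete, here is a minimal sketch, using Python's requests library against a placeholder OpenAI-style chat completions endpoint, of two requests with entirely different intents that look identical at the HTTP layer (the URL and API key are stand-ins):

```python
import requests

API_URL = "https://api.example.com/v1/chat/completions"  # placeholder endpoint
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}        # placeholder key

# Two very different tasks: rewriting marketing copy vs. generating code.
# Same method, same path, same headers, same JSON shape -- only the
# free-form text inside "content" differs.
pitch_request = {
    "model": "gpt-4",
    "messages": [{"role": "user",
                  "content": "Can you turn this bullet list into a short product pitch? ..."}],
}
code_request = {
    "model": "gpt-4",
    "messages": [{"role": "user",
                  "content": "Write a Python function that deduplicates a list."}],
}

for payload in (pitch_request, code_request):
    # A Layer 7 proxy sees two indistinguishable POSTs; the intent is
    # buried in natural language, not in any routable field.
    requests.post(API_URL, headers=HEADERS, json=payload, timeout=30)
```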
So how do we route it? How do we inspect it? How do we handle prompts at scale?
Routing at Layer 7 made sense when applications were deterministic. /cart/checkout clearly meant one thing. But in an LLM-based system, the meaning is embedded in free-form language: "Can you turn this bullet list into a short product pitch?"
That's a meaningful task, but there's no structured metadata telling us what it is. You have to understand the intent to process and route it properly. Your application may want to quickly reject jailbreak attempts. Maybe the prompt goes to GPT-4 if the task is creative writing, to Claude if it's summarization, or to a cheaper local model if the quality bar is lower. Or maybe it's routed to an agent well suited for that task.
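As a rough sketch of what that decision might look like in code: the routing table and the classify_intent heuristic below are illustrative assumptions (a real system would likely use a dedicated classifier model rather than keyword matching):

```python
from dataclasses import dataclass

@dataclass
class Route:
    target: str   # model, agent, or "reject"
    reason: str

# Illustrative routing table: intent label -> where the prompt should go.
ROUTES = {
    "jailbreak":        Route("reject", "blocked before it ever reaches a model"),
    "creative_writing": Route("gpt-4", "higher quality bar for open-ended writing"),
    "summarization":    Route("claude", "good fit for condensing long inputs"),
    "general":          Route("local-small-model", "cheaper path when the bar is lower"),
}

def classify_intent(prompt: str) -> str:
    """Toy keyword heuristic standing in for a real intent classifier."""
    p = prompt.lower()
    if "ignore your instructions" in p or "ignore previous instructions" in p:
        return "jailbreak"
    if "summarize" in p or "tl;dr" in p:
        return "summarization"
    if "pitch" in p or "story" in p or "poem" in p:
        return "creative_writing"
    return "general"

def route(prompt: str) -> Route:
    return ROUTES[classify_intent(prompt)]

print(route("Can you turn this bullet list into a short product pitch?"))
# -> Route(target='gpt-4', reason='higher quality bar for open-ended writing')
```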
This kind of routing isn’t application-layer in the traditional sense. It’s more of an intent-layer. Would it then be fair to define a new layer of the OSI model? What about Layer 8?
Layer 8 is not technically part of the OSI model. It's a new framing, more of a metaphor, reflecting the way we now have to build and operate AI-native systems. If Layer 7 routes based on protocol semantics (like Host headers and URL paths), then Layer 8 routes based on goal semantics: the underlying intent behind the prompt.
It's the layer where intent is inferred from the prompt itself, unsafe requests are screened out, and work is dispatched to the model or agent best suited for the task.
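One way to see the distinction is to put a Layer 7 rule and a Layer 8 rule side by side. The field names here are made up for illustration, not any real proxy's configuration schema:

```python
# Layer 7: match on protocol semantics -- host, path, headers.
l7_rule = {
    "match":      {"host": "shop.example.com", "path_prefix": "/cart/checkout"},
    "forward_to": "checkout-service",
}

# Layer 8: match on goal semantics -- the inferred intent behind the prompt.
l8_rule = {
    "match":      {"intent": "summarization", "min_confidence": 0.8},
    "forward_to": "claude",
}
```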
This isn’t theoretical. Many systems are already implementing Layer 8 logic today, just without calling it that.
If you're building with LLMs or agents, you're already facing problems that traditional infra can't handle well: deciding which model or agent should serve a given prompt, catching jailbreak attempts before they reach a model, and understanding what a request is actually trying to accomplish.
Technically these are Layer 7 concerns. But something subtle is happening as AI changes workload patterns. Handling prompts is fundamentally about understanding what the user wants to do, not just how they asked.
This shift means the edge needs to be smarter. Not just "is this request valid?" but "what is this request trying to accomplish?" That's the promise of intent-aware, model-integrated infrastructure.
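One hypothetical shape for that smarter edge: a gateway step that infers intent before forwarding, rejects obvious misuse right there, and surfaces the inferred intent to the rest of the stack. The helper functions and the x-prompt-intent header are assumptions for illustration, not an existing gateway's API:

```python
from typing import Optional

def infer_intent(prompt: str) -> str:
    """Stand-in for an intent classifier running at the edge."""
    p = prompt.lower()
    if "ignore your instructions" in p:
        return "jailbreak"
    return "summarization" if "summarize" in p else "general"

def handle_at_edge(prompt: str) -> Optional[dict]:
    """Decide at the edge what this request is trying to accomplish."""
    intent = infer_intent(prompt)
    if intent == "jailbreak":
        return None  # rejected at the edge, never reaches a model
    # Tag the request with its inferred intent so downstream routing,
    # observability, and cost controls can act on it without re-reading
    # the free-form prompt.
    return {
        "headers": {"x-prompt-intent": intent},  # hypothetical header
        "body": {"messages": [{"role": "user", "content": prompt}]},
    }

print(handle_at_edge("Summarize this meeting transcript for me."))
```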
Just as application load balancers helped scale APIs, it's this prompt-aware, intent-aware infrastructure layer that will help scale AI applications in a platform- and language-agnostic way.
Prompts are becoming the atomic unit of work in the AI-native world. And just as we needed Layer 7 logic to handle the rise of APIs, we now need Layer 8 to handle the rise of prompts. Once you recognize this shift, everything from observability to routing to planning starts to look different. The infrastructure that thrives in this new environment will be the kind that natively understands prompts: platform-agnostic, framework-friendly, and transparent to developers, so they can build and ship AI applications to production faster.