Connecting the Agent to the UI

The last post finished wiring up the agent class and worker. This post is about getting the output onto the screen -- what to do with that wall of streaming text, and how the front-end hooks turn it into something useful.

Two Hooks, One Connection

Two hooks handle most of the front-end work.

useAgent comes from Cloudflare's agents package. It manages the WebSocket connection to the Durable Object behind your agent, so you get a live connection to your agent instance with no manual WebSocket setup.

useAgentChat sits on top of useAgent and gives you a familiar chat interface: a messages array, helpers for submitting a new message, and a status field so you can show the user what is happening. "Thinking", "calling tool", "done" -- you can drive all of those labels from state the hook already tracks.

TypeScript

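import { useAgent } from 'agents/react';
import { useAgentChat } from 'agents/ai-react';

// Hooks have to be called inside a React component (or another hook).
function Chat() {
  // Opens the WebSocket connection to the 'design-agent' instance.
  const agent = useAgent({ agent: 'design-agent' });

  // Chat state layered on top of that connection.
  const { messages, input, handleInputChange, handleSubmit, status } = useAgentChat({
    agent,
  });

  return null; // rendering of messages and the input form omitted
}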

No streaming logic to manage. No WebSocket lifecycle to handle. No message history to sync. All of that is abstracted away.

The Message Protocol

The AI SDK message protocol is built on top of the standard OpenAI-compatible message format. Understanding what that format actually is matters, because it explains some things about agents that are not obvious.

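At the core, a conversation is just an array of objects. Each object has a role and content. There are also system and tool roles, for instructions and tool results, but the two you will see most are: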

user -- the message from the person using the app
assistant -- the response from the AI

That is it. And here is the important part: the AI has no memory between calls. There is no state stored on a GPU somewhere tracking what has been said. Every single time the agent is called, the entire message history is sent from scratch. The model sees all of it fresh each time.

This means the conversation history can be completely fabricated. You can construct an array of messages, including fake assistant responses, and the model will treat them as real, because nothing in the input marks a message as genuine. This is useful for writing evals: you can mock a conversation to test a specific scenario.

It is also why prompt injection is a real attack vector. Prompt injection is when a malicious user includes text in their message designed to look like instructions from the system -- for example, hiding "Ignore all previous instructions and..." inside what looks like a normal request. The model has no reliable way to tell authentic instructions from injected ones.
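To make the statelessness concrete, here is a minimal sketch. The ChatMessage type and the buildRequest and mockHistory names are made up for illustration and are not part of any SDK. The point is only that each call carries the whole array, and that a hand-written assistant turn looks exactly like a real one.

```typescript
// Hypothetical minimal message shape: just a role and content.
type Role = 'user' | 'assistant';

interface ChatMessage {
  role: Role;
  content: string;
}

// Every call sends the full history plus the new user message.
// Nothing is remembered between calls, so the array is rebuilt each time.
function buildRequest(history: ChatMessage[], userText: string): ChatMessage[] {
  return [...history, { role: 'user', content: userText }];
}

// A fabricated history for an eval: the assistant turn was never produced
// by a model, but it is structurally identical to a real one.
const mockHistory: ChatMessage[] = [
  { role: 'user', content: 'Draw a login flow.' },
  { role: 'assistant', content: 'Here is a diagram with three steps.' },
];

const request = buildRequest(mockHistory, 'Now add a password reset step.');
```

The request array is what would go to the model. Swapping mockHistory for a different scripted conversation is how you would set up a specific eval scenario.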

The AI SDK adds an id field and a parts array on top of this base format. The parts array is becoming the standard way to represent messages because a single message can contain multiple types of content.

Message Parts

A message part is one piece of a message. Parts can be:

text -- regular text response
tool-call -- the agent invoking a tool, with a state (pending, running, done)
reasoning -- the model's internal thinking steps, if exposed

Reasoning tokens are worth knowing about. Some models show their thinking process before answering. You have probably seen this in apps where it says "thought for 30 seconds" with an expandable block of reasoning. That is a reasoning part.

Not every model exposes reasoning over its API (Application Programming Interface -- the way your code communicates with the model's servers). OpenAI's hosted reasoning models show a summary of their thinking inside ChatGPT, but the API does not return the raw reasoning tokens. Many open-weight models (models whose internal parameters are publicly released, so anyone can run or modify them), such as DeepSeek's, do expose them. If you are building a UI and want to show the thinking process, you need a model that surfaces reasoning through the API.

Parts are useful for rendering. The LLM itself does not care about them. From the model's perspective, everything is just tokens. The parts structure exists so your UI can decide how to display tool calls, reasoning steps, file references, or source URLs as distinct components.
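As a rough sketch of what "rendering by part" means, the snippet below switches on the part type. The MessagePart and SketchMessage shapes are assumptions modelled on the list above, not the AI SDK's actual types, and plain strings stand in for UI components.

```typescript
// Hypothetical part shapes based on the three kinds listed above.
// A real SDK's types will differ in naming and detail.
type MessagePart =
  | { type: 'text'; text: string }
  | { type: 'tool-call'; toolName: string; state: 'pending' | 'running' | 'done' }
  | { type: 'reasoning'; text: string };

interface SketchMessage {
  id: string;
  role: 'user' | 'assistant';
  parts: MessagePart[];
}

// Decide how each part should be displayed. Strings stand in for
// components to keep the sketch framework-free.
function describePart(part: MessagePart): string {
  switch (part.type) {
    case 'text':
      return `text: ${part.text}`;
    case 'tool-call':
      return `tool ${part.toolName} (${part.state})`;
    case 'reasoning':
      return `reasoning (collapsed): ${part.text.length} chars`;
  }
}

// One message can mix several kinds of parts, rendered in order.
function describeMessage(message: SketchMessage): string[] {
  return message.parts.map(describePart);
}
```

In a React app each case would return a different component instead of a string, for example an expandable block for reasoning and a status chip for a tool call.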

Streaming

Streaming works the same way as any other streaming you have dealt with: instead of waiting for the full response to be generated and then sending it all at once, tokens are sent as they are produced. The AI SDK and the AIChatAgent base class handle all of the streaming plumbing on the server side. On the client, the useAgentChat hook reassembles the stream into the messages array progressively.
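Conceptually, the client-side reassembly looks something like the sketch below. This is not the hook's real implementation; StreamedMessage and applyDelta are hypothetical names, and a real stream carries more than text. It only shows the idea of folding each incoming chunk into the messages array.

```typescript
// Hypothetical simplified message: a single growing text field.
interface StreamedMessage {
  id: string;
  role: 'user' | 'assistant';
  text: string;
}

// Fold one streamed chunk into the messages array without mutating it,
// so a UI framework can detect the change and re-render.
function applyDelta(
  messages: StreamedMessage[],
  id: string,
  delta: string,
): StreamedMessage[] {
  const exists = messages.some((m) => m.id === id);
  if (!exists) {
    // The first chunk of a new assistant message creates the entry.
    return [...messages, { id, role: 'assistant', text: delta }];
  }
  // Later chunks are appended to the message with the matching id.
  return messages.map((m) => (m.id === id ? { ...m, text: m.text + delta } : m));
}
```

Each call returns a new array, which is why the text appears to type itself out on screen as chunks arrive.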

The practical challenge on the front end is that tool call arguments arrive as partial JSON. Calling JSON.parse on incomplete JSON throws a SyntaxError. Your UI needs to handle partial data gracefully -- either by waiting until a tool call is complete before rendering it, or by showing a loading state while the tool call is in progress.
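One simple way to handle this is to treat a failed parse as "not finished yet". The helper names below (tryParseArgs, collectArgs) are illustrative, not SDK functions, and the sketch assumes the arguments arrive as string chunks.

```typescript
// Result of trying to read tool-call arguments that may still be streaming.
type ParsedArgs =
  | { complete: true; value: unknown }
  | { complete: false };

// JSON.parse throws on truncated input, so a failed parse is treated
// as "keep showing the loading state" rather than as a fatal error.
function tryParseArgs(buffer: string): ParsedArgs {
  try {
    return { complete: true, value: JSON.parse(buffer) };
  } catch {
    return { complete: false };
  }
}

// Accumulate chunks and only surface the arguments once they parse.
function collectArgs(chunks: string[]): ParsedArgs {
  let buffer = '';
  let latest: ParsedArgs = { complete: false };
  for (const chunk of chunks) {
    buffer += chunk;
    latest = tryParseArgs(buffer);
  }
  return latest;
}
```

A successful parse is a reasonable signal for object-shaped arguments, but if the protocol you are using reports when a tool call is complete, prefer that signal over guessing from the JSON.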

Tool Statuses

The message protocol lets you attach statuses to tool calls. As the agent works through a request you can show the user what is happening at each step:

"Searching for products..."
"Found 12 results"
"Generating diagram..."
"Done"

This is the difference between a chat interface that just spins until something appears and one that keeps the user informed throughout a multi-step operation. For agents that run long sequences of tool calls, this matters a lot for perceived quality.
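A sketch of how those labels might be derived is below. The ToolStatus shape, the tool names, and the resultCount field are hypothetical; the three states are the pending, running, and done states mentioned earlier.

```typescript
type ToolState = 'pending' | 'running' | 'done';

// Hypothetical status record for one tool call.
interface ToolStatus {
  toolName: string;
  state: ToolState;
  resultCount?: number; // only known once the tool has finished
}

// Human-readable verbs for tools we know about (illustrative names).
const TOOL_VERBS: Record<string, string> = {
  searchProducts: 'Searching for products',
  generateDiagram: 'Generating diagram',
};

// Turn a tool status into the short line shown to the user.
function statusLabel(status: ToolStatus): string {
  if (status.state === 'done') {
    return status.resultCount === undefined
      ? 'Done'
      : `Found ${status.resultCount} results`;
  }
  // Fall back to a generic label for tools without a custom verb.
  const verb = TOOL_VERBS[status.toolName] ?? `Running ${status.toolName}`;
  return `${verb}...`;
}
```

Keeping the mapping in one function means new tools get a sensible default label, and you can refine the wording per tool later.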

When to Drop the Abstractions

On the front end, the answer is almost never. These hooks do not lock you into a specific UI. They handle the WebSocket connection, message syncing, and state management. You still control everything visual. Unless you have unusual requirements like gRPC (a high-performance communication protocol often used between internal services), custom authentication, or a non-chat interface, the abstractions are the right default.

On the back end, you will hit a wall much sooner. I would be surprised if there are production agents making significant revenue on a basic generic tool loop. The default loop gives you too little control to keep improving reliability.

A concrete example: ChatGPT's "Auto" model selector runs your message through a classifier LLM first -- a separate AI model whose only job is to categorise the request (simple vs complex, safe vs risky, etc.) -- then routes it to the appropriate model based on that classification. That is not something you get from a generic tool loop. You have to build it. And when you build it, you also have to eval the classifier itself. The complexity compounds quickly.
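The classify-then-route pattern can be sketched in a few lines. Everything here is an assumption for illustration: the Complexity labels, the model names, and the stand-in lengthClassifier, which in a real system would be a call to a separate, smaller LLM. It is not a description of how ChatGPT implements its selector.

```typescript
type Complexity = 'simple' | 'complex';

// A classifier takes the user message and returns a label.
// In a real system this would call a separate model.
type Classifier = (message: string) => Promise<Complexity>;

// Hypothetical model names keyed by classification.
const MODEL_FOR: Record<Complexity, string> = {
  simple: 'small-fast-model',
  complex: 'large-reasoning-model',
};

function pickModel(label: Complexity): string {
  return MODEL_FOR[label];
}

// Step 1: classify the request. Step 2: route to the matching model.
async function routeMessage(message: string, classify: Classifier): Promise<string> {
  const label = await classify(message);
  return pickModel(label);
}

// Stand-in classifier for local testing: long messages count as complex.
const lengthClassifier: Classifier = async (message) =>
  message.length > 80 ? 'complex' : 'simple';
```

Because the classifier is passed in as a function, you can swap in a scripted one and check the routing on a fixed set of messages, which is one way to start evaluating the classifier itself.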

That is the general pattern: start with the abstractions, use evals to find where the default behaviour is failing, then drop down to a lower level for exactly that part and no further.

AI chat is a constraint, not a destination. The current chat interface exists because it is what we have. The interesting AI experiences being built right now are the ones that go beyond chat entirely. If you are building something that is just a chat widget, it has been solved. Use something off the shelf. Save your engineering time for the parts that actually require it.
