Tools, Schemas, and the System Prompt

The last post covered the agent loop and what tools fundamentally are. This post is about actually writing them well.

Define the Schema First

Before writing any tool, you need TypeScript types for whatever the agent will be producing. A schema is just a description of the shape of some data -- what fields it has, what type each field is, what values are allowed. If you do not understand the shape of the output, you cannot teach the agent to produce it reliably.

For the diagramming app, agents generate Excalidraw elements. Every element shares a base shape:

TypeScript

type BaseElement = {
  id: string;
  x: number;
  y: number;
  width: number;
  height: number;
  backgroundColor: string;
  fillStyle: 'hachure' | 'cross-hatch' | 'solid';
  strokeWidth: number;
  roughness: number;
  opacity: number;
  angle: number;
  groupIds: string[];
  isDeleted: boolean;
  boundElements: Array<{ type: 'arrow' | 'text'; id: string }> | null;
};

Specific element types (RectangleElement, EllipseElement, ArrowElement, etc.) extend this base. A union of all of them (a TypeScript union type means "it can be any one of these") gives you ExcalidrawElement. This lives in src/schema.ts. Everything else imports from it.
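
To make the union idea concrete, here is a minimal sketch using simplified, hypothetical element shapes (the real types carry many more fields, and the `type` tag and `describeElement` helper are assumptions for illustration). It shows how checking a tag field lets TypeScript narrow the union to one member.

```typescript
// Simplified, hypothetical element shapes; the real ones carry more fields.
type BaseFields = { id: string; x: number; y: number; width: number; height: number };

type RectangleElement = BaseFields & { type: 'rectangle' };
type EllipseElement = BaseFields & { type: 'ellipse' };
type ArrowElement = BaseFields & { type: 'arrow'; points: Array<[number, number]> };

// The union: a value of this type is exactly one of the members.
type ExcalidrawElement = RectangleElement | EllipseElement | ArrowElement;

// Checking the `type` tag narrows the union, so `points` is only
// accessible inside the arrow branch.
function describeElement(el: ExcalidrawElement): string {
  switch (el.type) {
    case 'arrow':
      return `arrow with ${el.points.length} points`;
    case 'rectangle':
    case 'ellipse':
      return `${el.type} ${el.width}x${el.height}`;
  }
}
```

Because every member carries a distinct `type` value, the compiler can tell which fields exist in each branch without any casts.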

A practical tip: define your schemas in Zod first and use z.infer<typeof schema> to derive TypeScript types. Zod is a JavaScript library for describing and validating data shapes -- you define what a piece of data should look like, and Zod checks that real data matches that definition. Write the shape once instead of twice.

Writing the Tools

The tools file imports two things: tool from the Vercel AI SDK, and z from Zod.

TypeScript

import { tool } from 'ai';
import { z } from 'zod';

The first tool, generateDiagram, asks the LLM to produce a complete diagram in one shot:

TypeScript

export const tools = {
  generateDiagram: tool({
    description: `Generate a complete diagram as an array of Excalidraw elements.
Use this when the user asks you to create, draw, or design a new diagram.`,
    parameters: z.object({
      elements: z.array(excalidrawElementSchema).describe('Array of Excalidraw elements'),
    }),
    execute: async ({ elements }) => {
      return elements;
    },
  }),
};

This tool is intentionally naive. Asking the LLM to one-shot an entire diagram will not produce great results. That is the point. You build it, measure it, and improve from there.

The Input Schema Trick

The execute function is just a pass-through here. So why bother?

The answer is structured output -- instead of the AI returning free-form text, it returns data in a specific format you defined. When a tool has an input schema and strict mode is enabled, OpenAI constrains the model to follow that schema whenever it decides to call the tool: it must generate JSON that matches the shape you defined. Without strict mode, adherence is best-effort rather than guaranteed.

So by making generateDiagram a tool with an Excalidraw-shaped schema, you get a strong guarantee: whatever comes back will match the Excalidraw element shape you described. The execute does not need to do anything clever. The schema is doing the work.

TypeScript

execute: async ({ elements }) => {
  return elements; // pass-through -- the schema enforced the shape
},

It helps to separate three things that are easy to conflate:

Input schema -- the shape the LLM must produce as its argument. This forces structured output at the tool level.
Return value of execute -- what gets fed back to the LLM as context. No contract. Return whatever is most useful.
Structured output -- a separate concept for the LLM's final response, not individual tool calls.

Think of the return value as a pre-processing step: what context does the LLM need after this tool runs? Sometimes it is the raw result. Sometimes it is a summarised version. If a tool returns 100,000 rows, you would not send all of them back. You would send 10 with a hint: "99,990 more available -- call loadMore to paginate." That is good context design.
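
A minimal sketch of that idea, assuming a hypothetical `previewRows` helper and `RowPage` shape (neither comes from the AI SDK). It returns a small sample plus a hint about the remainder, which is what a tool's execute could hand back instead of the full result.

```typescript
// Hypothetical shape for what a tool hands back to the model.
type RowPage<T> = {
  rows: T[];           // the small sample the model actually sees
  totalCount: number;  // how many rows exist in total
  hint: string | null; // how to get more, or null if nothing is left
};

// Return only the first `limit` rows plus a hint about the remainder.
function previewRows<T>(allRows: T[], limit = 10): RowPage<T> {
  const rows = allRows.slice(0, limit);
  const remaining = allRows.length - rows.length;
  return {
    rows,
    totalCount: allRows.length,
    hint: remaining > 0
      ? `${remaining} more available -- call loadMore to paginate.`
      : null,
  };
}
```

The hint names the tool to call next, so the model does not have to guess how to reach the rest of the data.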

Describing the Schema Well

Every Zod field can have a .describe() call. This is a hint for the LLM about what the field means.

TypeScript

z.array(excalidrawElementSchema).describe('Array of Excalidraw elements to render on the canvas')

For non-obvious fields like points on an arrow, add .describe('Array of [x, y] coordinate pairs'). Without the hint, the model infers. With it, it follows.

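For optional fields in OpenAI-compatible schemas, .nullable() works more reliably than .optional(). In strict mode OpenAI expects every property to be listed as required and represents a missing value as null, which is likely why .optional() on its own tends to be ignored: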

TypeScript

z.string().nullable().optional() // nullable is what actually matters
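
One consequence of preferring nullable fields is that arguments tend to arrive with explicit null values rather than missing keys. Below is a minimal sketch of normalising them before use. The `dropNulls` helper and the two type aliases are hypothetical, and the flat string/number value type is a simplifying assumption.

```typescript
// Hypothetical, simplified argument shapes for illustration only.
type ArgsFromModel = Record<string, string | number | null>;
type CleanArgs = Record<string, string | number>;

// Remove keys whose value is null, so "not provided" fields do not
// overwrite existing values when the object is merged later.
function dropNulls(args: ArgsFromModel): CleanArgs {
  const clean: CleanArgs = {};
  for (const key of Object.keys(args)) {
    const value = args[key];
    if (value !== null && value !== undefined) {
      clean[key] = value;
    }
  }
  return clean;
}
```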

Tool Descriptions Are Not the First Thing to Optimise

A common early mistake is writing long, detailed tool descriptions to coax the model into correct behaviour. This is prompt-level optimisation (trying to fix bad behaviour by rewriting the instructions), and it is one of the weakest levers available.

A reliable warning sign: if you catch yourself writing "please" in a tool description, you have run out of prompt-engineering room. That is the signal to stop coaxing with words and rethink the tool design itself.

The description should tell the LLM when to use the tool and what it does. That is it. The schema communicates the shape. The execute handles the logic.

The Second Tool: modifyDiagram

modifyDiagram lets the agent update an existing element. The naive implementation has one obvious flaw: the agent cannot see the canvas.

To modify something, you need to know what is there. This tool does not pass the current canvas state in. The agent guesses from chat history. It will get it wrong. That is expected, and it is why you build it this way first: so evals confirm what you suspect and give you a measurement to improve from.

TypeScript

modifyDiagram: tool({
  description: `Modify an existing element on the canvas by ID. Set only the fields you want to change.`,
  parameters: z.object({
    elementId: z.string().describe('The ID of the element to modify'),
    updates: excalidrawUpdateSchema,
  }),
  execute: async ({ elementId, updates }) => {
    return { elementId, updates };
  },
}),
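
The execute above only echoes its input, so something else has to apply the change. Here is a minimal sketch of how the canvas side might consume `{ elementId, updates }`, assuming a hypothetical `applyUpdate` helper and a heavily simplified element type. It returns a specific error when the ID does not exist, which is the likely failure when the agent is guessing IDs from chat history.

```typescript
// Hypothetical minimal element; the real schema has many more fields.
type CanvasElement = { id: string; x: number; y: number; backgroundColor: string };
type ElementUpdate = Partial<Omit<CanvasElement, 'id'>>;

type ModifyResult =
  | { ok: true; elements: CanvasElement[] }
  | { ok: false; error: string };

// Apply a patch to one element by ID without mutating the input array.
function applyUpdate(
  elements: CanvasElement[],
  elementId: string,
  updates: ElementUpdate,
): ModifyResult {
  if (!elements.some((el) => el.id === elementId)) {
    // A specific error is useful context if it is fed back to the model.
    return { ok: false, error: `No element with id "${elementId}" on the canvas.` };
  }
  return {
    ok: true,
    elements: elements.map((el) => (el.id === elementId ? { ...el, ...updates } : el)),
  };
}
```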

The System Prompt Is Internal Instructions

"System prompt" is a slightly misleading name. I think of it as internal instructions: the baseline context the creator provides for every conversation, separate from what the user types.

It sits at position zero in the conversation history and persists across every turn. A better mental model: it is a fixed entry point for injecting context. You put in information the agent needs to function correctly, whether static instructions, dynamic context based on the current state, or per-user configuration.

TypeScript

const systemPrompt = `You are a diagram design assistant.
You help users create and modify diagrams on an Excalidraw canvas.

Guidelines:
- Use unique IDs for every element
- Keep at least 20px between elements
- Add labels to shapes and arrows
- Use a clean left-to-right or top-to-bottom layout
- Default stroke color: #1e1e1e`;
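
The prompt above is purely static. As a minimal sketch of the "entry point for injecting context" idea, the following assembles a prompt from static instructions, dynamic canvas state, and a per-user setting. `PromptContext`, `buildSystemPrompt`, and the field names are all hypothetical.

```typescript
// Hypothetical inputs; none of these names come from a real API.
type PromptContext = {
  elementCount: number; // dynamic: current canvas state
  preferredLayout: 'left-to-right' | 'top-to-bottom' | null; // per-user setting
};

const STATIC_INSTRUCTIONS = `You are a diagram design assistant.
You help users create and modify diagrams on an Excalidraw canvas.`;

// Build the system prompt: static text first, then whatever context applies.
function buildSystemPrompt(ctx: PromptContext): string {
  const sections: string[] = [STATIC_INSTRUCTIONS];
  sections.push(`Canvas state: ${ctx.elementCount} element(s) currently on the canvas.`);
  if (ctx.preferredLayout !== null) {
    sections.push(`User preference: use a ${ctx.preferredLayout} layout.`);
  }
  return sections.join('\n\n');
}
```

Keeping the static part first means the opening sentence stays identical across users and turns, while the injected sections change.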

A few things worth knowing:

Match your words to your tool names. If tools are called generateDiagram and modifyDiagram, use "generate" and "modify" in the system prompt. The model will associate them.
The system prompt is stable. Conversation history can be compressed or trimmed. The system prompt is always at position zero.
A bad opening sentence biases everything. Every message gets re-evaluated on every turn, so that one sentence at position zero shapes the entire conversation. This is also why leaked system prompts are so revealing: you can see exactly what the creators were struggling to fix.
The more you pile in, the worse it gets. Start short. Add only what evals tell you is missing.
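
The stability point can be sketched as a trimming function that drops older turns but always keeps the system message at position zero. This is a hypothetical helper with an assumed `ChatMessage` shape, not part of any SDK, and real trimming strategies (summarising, token budgets) are more involved.

```typescript
// Assumed, simplified message shape for illustration.
type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string };

// Keep the system message at index 0 and only the most recent `maxRecent`
// non-system messages after it.
function trimConversation(messages: ChatMessage[], maxRecent: number): ChatMessage[] {
  const system = messages.filter((m) => m.role === 'system').slice(0, 1);
  const rest = messages.filter((m) => m.role !== 'system');
  // slice(-0) would return everything, so handle zero explicitly.
  const recent = maxRecent > 0 ? rest.slice(-maxRecent) : [];
  return [...system, ...recent];
}
```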
