
Structured Output and Tool Use, End to End

How I get reliable JSON out of a model, design tools it can call, validate and repair the output, and wire the whole thing into a real app.

By Matt Goren · Updated June 25, 2026 · 8 min read

For a long time I treated getting JSON out of a model as a battle. I would beg in the prompt ("Respond ONLY with valid JSON, no preamble"), and most of the time it worked. Then one response in fifty would arrive wrapped in "Here's the JSON you asked for:" and my parser would blow up in production. The fix was to stop asking and start constraining. Modern models support real structured output and real tool use, and once you wire those in properly, the model becomes a dependable component instead of a creative writer you have to wrestle.

This guide walks the whole path: getting reliable JSON, designing tools the model can call, validating and repairing what comes back, and stitching it into an app. I will use the official @anthropic-ai/sdk because that is what I build on, but the same patterns carry over to other SDKs.

Getting reliable JSON out of a model

There are two ways to ask a model for structured data, and only one of them holds up.

The fragile way: ask in the prompt

You write "return a JSON object with fields name and email" and parse whatever comes back. This works until it does not. The model adds a friendly sentence, wraps the JSON in a markdown fence, or trails a comment. Your JSON.parse throws, and now you are writing brittle regex to claw the JSON out of prose. I have done this. It is a tax you pay forever.

The reliable way: constrain with a schema

The better approach is structured output, where you hand the API a JSON schema and the model is constrained to produce output that matches it. The response is the data, not a story containing the data. With the Anthropic SDK this is output_config.format with a json_schema:

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

// Example input: the unstructured text to extract from.
const raw = "Hi, Dana Lee here (dana@example.com). Could someone walk me through a demo?";

const res = await client.messages.create({
  model: "claude-opus-4-8",
  max_tokens: 1024,
  output_config: {
    format: {
      type: "json_schema",
      schema: {
        type: "object",
        properties: {
          name: { type: "string" },
          email: { type: "string", format: "email" },
          wants_demo: { type: "boolean" },
        },
        required: ["name", "email", "wants_demo"],
        additionalProperties: false,
      },
    },
  },
  messages: [{ role: "user", content: "Extract the contact info: " + raw }],
});
```

A few rules I have learned to respect. Every object needs additionalProperties: false and a required list, or the schema is looser than you think. Keep the schema to supported features; basic types, enums, and anyOf are safe, but exotic constraints like string-length minimums are not enforced and get stripped. And give fields plain, descriptive names, because the names are part of how the model decides what goes where. This single switch eliminated almost all of my JSON parsing failures.
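The first rule is easy to break in nested schemas, where an inner object quietly loses its additionalProperties: false. Below is a minimal sketch of a checker that walks a schema and lists every object missing either additionalProperties: false or a required list. The Schema type and the findLooseObjects name are illustrative, not part of any SDK, and the walk only follows properties, items, and anyOf.

```typescript
// Minimal JSON Schema shape: only the keywords this check walks.
// Other keywords (format, description, enum, ...) are allowed but ignored.
type Schema = {
  type?: string;
  properties?: Record<string, Schema>;
  required?: string[];
  additionalProperties?: boolean | Schema;
  items?: Schema;
  anyOf?: Schema[];
  [keyword: string]: unknown;
};

// Lists every object schema that lacks additionalProperties: false or a required list.
function findLooseObjects(schema: Schema, path = "$"): string[] {
  const problems: string[] = [];
  if (schema.type === "object") {
    if (schema.additionalProperties !== false) {
      problems.push(`${path}: additionalProperties is not false`);
    }
    if (!Array.isArray(schema.required)) {
      problems.push(`${path}: no required list`);
    }
  }
  // Recurse into nested properties, array items, and anyOf branches.
  for (const [key, child] of Object.entries(schema.properties ?? {})) {
    problems.push(...findLooseObjects(child, `${path}.${key}`));
  }
  if (schema.items) {
    problems.push(...findLooseObjects(schema.items, `${path}[]`));
  }
  (schema.anyOf ?? []).forEach((branch, i) => {
    problems.push(...findLooseObjects(branch, `${path}.anyOf[${i}]`));
  });
  return problems;
}

// Example: the top level is closed, but the nested address object is not.
const contactSchema: Schema = {
  type: "object",
  properties: {
    name: { type: "string" },
    address: { type: "object", properties: { city: { type: "string" } } },
  },
  required: ["name", "address"],
  additionalProperties: false,
};
const problems = findLooseObjects(contactSchema);
// problems: ["$.address: additionalProperties is not false", "$.address: no required list"]
```

Running a check like this in a unit test over every schema you ship catches the loose nested object before the model ever sees it.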

Tool-forcing as an alternative

Before structured output existed, the common trick was to define a single tool with your schema and force the model to call it. That still works and is sometimes handy, but for "I just want the answer in this shape," reach for structured output first; it is the purpose-built tool for the job.
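For comparison, here is roughly what the tool-forcing version looks like: one tool whose input_schema is the shape you want, plus tool_choice set to { type: "tool", name } so the model must call it, with the answer read back from the tool_use block's input. This is a sketch under assumptions: the types are simplified local stand-ins rather than the SDK's own, and record_contact is a hypothetical tool name.

```typescript
// Simplified local stand-ins for response content blocks (not the SDK's types).
type ToolUseBlock = { type: "tool_use"; id: string; name: string; input: unknown };
type TextBlock = { type: "text"; text: string };
type ContentBlock = ToolUseBlock | TextBlock;

const TOOL_NAME = "record_contact"; // hypothetical tool name

// Request params that force the model to answer by calling one tool whose
// input_schema is the shape we want back.
function forcedToolRequest(schema: Record<string, unknown>, userText: string, model: string) {
  return {
    model,
    max_tokens: 1024,
    tools: [
      {
        name: TOOL_NAME,
        description: "Record the contact details found in the user's text.",
        input_schema: schema,
      },
    ],
    // Force this specific tool instead of letting the model decide.
    tool_choice: { type: "tool" as const, name: TOOL_NAME },
    messages: [{ role: "user" as const, content: userText }],
  };
}

// The "answer" is the argument object of the forced tool call.
function extractToolInput(content: ContentBlock[], toolName = TOOL_NAME): unknown {
  const call = content.find(
    (b): b is ToolUseBlock => b.type === "tool_use" && b.name === toolName,
  );
  if (!call) throw new Error(`model did not call ${toolName}`);
  return call.input;
}
```

In a real app you would send these params with client.messages.create, pass res.content to extractToolInput, and still validate the returned input at the boundary.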

Designing tools the model can call

Structured output shapes the final answer. Tool use is different and more powerful: it lets the model decide to call functions you define, then act on what they return. You give the model a search function, a database lookup, a "send email" action, and it weaves them into its reasoning. This is the backbone of agents.

A tool definition is a name, a description, and an input schema:

const tools = [
  {
    name: "get_order_status",
    description:
      "Look up the current status of a customer order. " +
      "Call this whenever the user asks where their order is or " +
      "mentions an order number.",
    input_schema: {
      type: "object",
      properties: {
        order_id: { type: "string", description: "The order ID, e.g. ORD-1024" },
      },
      required: ["order_id"],
    },
  },
];

The design lessons that matter most:

Write the description for the model's decision, not your documentation. The model reads the description to decide whether to call the tool. So say what it does and, crucially, when to call it. "Call this whenever the user asks where their order is" earns far more reliable triggering than a bare "Gets order status." The trigger condition is the load-bearing part.

Keep the active tool set small and focused. A handful of well-named tools beats twenty overlapping ones. Too many choices blur the model's decision and it starts calling the wrong one. If you genuinely have a large library, load only the relevant few per request rather than dumping all of them in.
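One way to load only the relevant few per request is to route on the user's message before calling the model. The sketch below uses a deliberately naive technique, keyword matching against a per-tool keywords list. That keywords field is hypothetical, our own routing metadata, and it is stripped before the tools are sent; a production router might use embeddings or a cheap classifier call instead.

```typescript
// A tool definition plus a local-only keywords list used for routing.
// `keywords` is a hypothetical field of our own, never sent to the API.
type RoutedTool = {
  name: string;
  description: string;
  input_schema: Record<string, unknown>;
  keywords: string[];
};

type ApiTool = Omit<RoutedTool, "keywords">;

// Naive keyword router: keep tools whose keywords appear in the message,
// rank by number of hits, cap at `limit`, and drop the routing field.
function selectTools(library: RoutedTool[], userText: string, limit = 5): ApiTool[] {
  const text = userText.toLowerCase();
  return library
    .map((tool) => ({
      tool,
      hits: tool.keywords.filter((k) => text.includes(k.toLowerCase())).length,
    }))
    .filter(({ hits }) => hits > 0)
    .sort((a, b) => b.hits - a.hits) // stable sort keeps library order on ties
    .slice(0, limit)
    .map(({ tool }) => ({
      name: tool.name,
      description: tool.description,
      input_schema: tool.input_schema,
    }));
}
```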

Promote actions to dedicated tools when you need control. A single generic "run this command" tool gives you almost no leverage. A specific send_email tool, by contrast, is something you can validate, gate behind a confirmation, log, and render in your UI. Anything irreversible or sensitive, like sending a message or deleting data, deserves its own tool so your code sits in the path.
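A sketch of what "your code sits in the path" can look like: a small dispatcher that logs every call and refuses to run any tool marked sensitive unless an approval callback says yes. The registry, the sensitive flag, and the approve callback are hypothetical names for illustration, send_email is a stub, and tools run synchronously here only to keep the example short.

```typescript
type ToolCall = { id: string; name: string; input: Record<string, unknown> };
type ToolOutcome = { ok: true; output: string } | { ok: false; error: string };
type ToolSpec = {
  run: (input: Record<string, unknown>) => string;
  sensitive: boolean; // irreversible or outward-facing: needs a human yes first
};

// Runs one tool call through code you control: unknown tools are rejected,
// sensitive tools need approval, and every attempt is logged.
function dispatch(
  call: ToolCall,
  registry: Record<string, ToolSpec>,
  approve: (call: ToolCall) => boolean, // stand-in for your confirmation UI
  log: (line: string) => void = console.log,
): ToolOutcome {
  const spec = registry[call.name];
  if (!spec) return { ok: false, error: `unknown tool: ${call.name}` };
  if (spec.sensitive && !approve(call)) {
    log(`denied ${call.name} ${JSON.stringify(call.input)}`);
    return { ok: false, error: "The user declined this action." };
  }
  log(`running ${call.name} ${JSON.stringify(call.input)}`);
  try {
    return { ok: true, output: spec.run(call.input) };
  } catch (err) {
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}

// Example registry; send_email is a stub that only pretends to send.
const registry: Record<string, ToolSpec> = {
  get_order_status: { run: (i) => `Order ${String(i.order_id)} has shipped.`, sensitive: false },
  send_email: { run: (i) => `Queued email to ${String(i.to)}.`, sensitive: true },
};
```

A denied or failed call comes back as { ok: false }, which you can hand to the model as an error result rather than silently dropping it.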

Mark tools strict when you need a guaranteed shape. Setting strict: true on a tool definition (alongside name and input_schema, not on the tool choice) makes the model's call validate exactly against the schema, which removes a class of "the arguments were almost right" bugs.
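Here is the get_order_status tool from earlier with strict mode on, mainly to show where the flag goes: on the tool definition itself, next to name and input_schema. The additionalProperties: false line is an addition, made on the assumption that strict tools follow the same closed-object rule as structured output schemas.

```typescript
// get_order_status with strict mode on. `strict` is a property of the tool
// definition itself, next to name and input_schema (not of tool_choice).
const getOrderStatusTool = {
  name: "get_order_status",
  description:
    "Look up the current status of a customer order. " +
    "Call this whenever the user asks where their order is or " +
    "mentions an order number.",
  strict: true,
  input_schema: {
    type: "object",
    properties: {
      order_id: { type: "string", description: "The order ID, e.g. ORD-1024" },
    },
    required: ["order_id"],
    // Added on the assumption that strict tools follow the same
    // closed-object rule as structured output schemas.
    additionalProperties: false,
  },
};
```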

Validating and repairing output

Here is the discipline that separates a demo from a product: even with schemas, you validate at the boundary. Schema-constrained output is dramatically more reliable, not infallible. The model can hit its token limit mid-object, or, on the safety-classifier models, decline a request and return a refusal with empty content. Code that reaches straight for response.content[0].text will crash on exactly those cases.

So my receiving path always does three things. It checks the stop reason first, because a max_tokens truncation or a refusal is not a parse problem and must be handled separately. It parses defensively with a real validator (I use a schema library like Zod so the parsed object is typed and any drift throws a clean error I can catch). And it has a repair path: if validation fails, I can send the bad output back to the model with the error message and ask it to fix the specific field, or I retry once with a tightened prompt. One repair attempt catches almost everything; if it fails twice, that is a real signal worth surfacing, not swallowing.

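```typescript
import { z } from "zod";
const ContactSchema = z.object({ name: z.string(), email: z.string().email(), wants_demo: z.boolean() });

// Inside your response handler; handleRefusal, retryWithMoreTokens and repair are your own code.
if (res.stop_reason === "refusal") return handleRefusal(res);
if (res.stop_reason === "max_tokens") return retryWithMoreTokens();

const text = res.content.find((b) => b.type === "text")?.text ?? "";
let json: unknown;
try { json = JSON.parse(text); } catch (err) { return repair(text, err); } // malformed JSON throws
const parsed = ContactSchema.safeParse(json);
if (!parsed.success) return repair(text, parsed.error);
```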

The point is not paranoia. It is that the boundary between an unpredictable model and your deterministic code is exactly where things break, so that is where the guardrail goes.
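The receiving path described above (stop reason first, defensive parse, one repair attempt, then surface the failure) fits in one function. This is a minimal sketch, not a drop-in implementation: the response type is a simplified stand-in, a hand-written validator takes the place of Zod so the example has no dependencies, and repair is a hypothetical callback that would re-prompt the model with the bad output and the error. It is synchronous only to keep the sketch short; in a real app it would be an async model call.

```typescript
// Simplified response shape: only the fields this sketch reads.
type ModelResponse = { stop_reason: string; content: { type: string; text?: string }[] };

// A validator returns the typed value, or an error message the model can act on.
type Checked<T> = { value: T } | { error: string };
type Validator<T> = (json: unknown) => Checked<T>;

type Outcome<T> =
  | { kind: "ok"; value: T }
  | { kind: "refusal" }
  | { kind: "truncated" }
  | { kind: "invalid"; error: string };

// Defensive parse: malformed JSON becomes an error message, not an exception.
function parseAndValidate<T>(text: string, validate: Validator<T>): Checked<T> {
  let json: unknown;
  try {
    json = JSON.parse(text);
  } catch (err) {
    return { error: `not valid JSON: ${err instanceof Error ? err.message : String(err)}` };
  }
  return validate(json);
}

// Stop reason first, then parse and validate, then exactly one repair attempt.
// `repair` is a hypothetical callback that would send the bad text and the
// error back to the model; it is synchronous here only for brevity.
function receive<T>(
  res: ModelResponse,
  validate: Validator<T>,
  repair: (badText: string, error: string) => string,
): Outcome<T> {
  if (res.stop_reason === "refusal") return { kind: "refusal" };
  if (res.stop_reason === "max_tokens") return { kind: "truncated" };
  const text = res.content.find((b) => b.type === "text")?.text ?? "";
  const first = parseAndValidate(text, validate);
  if ("value" in first) return { kind: "ok", value: first.value };
  const second = parseAndValidate(repair(text, first.error), validate);
  if ("value" in second) return { kind: "ok", value: second.value };
  // Failing twice is a real signal: surface it instead of swallowing it.
  return { kind: "invalid", error: second.error };
}

type Contact = { name: string; email: string; wants_demo: boolean };

// Hand-written stand-in for a Zod schema, to keep the sketch dependency-free.
const validateContact: Validator<Contact> = (json) => {
  const o = json as Partial<Contact> | null;
  if (typeof o !== "object" || o === null) return { error: "expected a JSON object" };
  if (typeof o.name !== "string") return { error: "name must be a string" };
  if (typeof o.email !== "string" || !o.email.includes("@")) return { error: "email is not valid" };
  if (typeof o.wants_demo !== "boolean") return { error: "wants_demo must be a boolean" };
  return { value: { name: o.name, email: o.email, wants_demo: o.wants_demo } };
};
```

Returning a tagged outcome instead of throwing keeps refusals, truncations, and persistent validation failures visible as distinct cases the caller has to handle.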

Wiring it into an app

Tool use is not one request; it is a loop. The model asks to call a tool, you run it, you feed the result back, and the model continues until it is done. The shape:

  1. Send the user message plus the tool definitions.
  2. If the response's stop reason is tool_use, pull out each tool call, execute it in your code, and collect the results.
  3. Append the model's turn and a single user turn carrying all the tool results, then call again.
  4. Repeat until the stop reason is end_turn, and cap the loop so a misbehaving model cannot spin forever.
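The four steps map onto a short loop. In this sketch, callModel and runTool are hypothetical stand-ins for client.messages.create (with your tools attached) and your own tool implementations, the message types are simplified local versions of the API's shapes, and both callbacks are synchronous so the example runs without a network. Error handling for failing tools is left out here, and any stop reason other than tool_use or end_turn is thrown rather than guessed at, since truncation and refusals need their own handling.

```typescript
// Simplified local versions of the API's message and content-block shapes.
type TextBlock = { type: "text"; text: string };
type ToolUseBlock = { type: "tool_use"; id: string; name: string; input: Record<string, unknown> };
type ToolResultBlock = { type: "tool_result"; tool_use_id: string; content: string };
type Message = {
  role: "user" | "assistant";
  content: string | (TextBlock | ToolUseBlock | ToolResultBlock)[];
};
type Reply = { stop_reason: string; content: (TextBlock | ToolUseBlock)[] };

// callModel stands in for client.messages.create with your tools attached;
// runTool stands in for your own tool implementations. Both are hypothetical.
function runToolLoop(
  userText: string,
  callModel: (messages: Message[]) => Reply,
  runTool: (call: ToolUseBlock) => string,
  maxTurns = 10,
): string {
  // Step 1: start with the user's message.
  const messages: Message[] = [{ role: "user", content: userText }];
  for (let turn = 0; turn < maxTurns; turn++) {
    const reply = callModel(messages);
    // Step 4: a finished answer ends the loop.
    if (reply.stop_reason === "end_turn") {
      return reply.content
        .filter((b): b is TextBlock => b.type === "text")
        .map((b) => b.text)
        .join("");
    }
    // Truncation, refusals, etc. need their own handling; do not guess.
    if (reply.stop_reason !== "tool_use") {
      throw new Error(`unexpected stop_reason: ${reply.stop_reason}`);
    }
    // Step 2: execute every tool call the model made in this turn.
    const results = reply.content
      .filter((b): b is ToolUseBlock => b.type === "tool_use")
      .map((call): ToolResultBlock => ({
        type: "tool_result",
        tool_use_id: call.id,
        content: runTool(call),
      }));
    // Step 3: echo the assistant turn, then ONE user turn with all results.
    messages.push({ role: "assistant", content: reply.content });
    messages.push({ role: "user", content: results });
  }
  // The cap from step 4: never spin forever.
  throw new Error(`no final answer after ${maxTurns} turns`);
}
```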

Two details I always get wrong if I am not careful. When the model makes several tool calls at once, all of their results go back in one user message, not several; splitting them quietly teaches the model to stop calling tools in parallel. And every tool result must carry the matching tool-use id, failures included: a failed tool should still return a result, flagged as an error, so the model can recover. Dropping it instead leaves the conversation in a broken, unanswerable state.
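Both details live in the code that packs tool results. In this sketch the types are simplified local versions of the API's tool_use and tool_result blocks: every call from one assistant turn is executed, each result keeps its tool-use id, a thrown error becomes an is_error result instead of vanishing, and everything goes back as a single user message. runTool is a hypothetical callback and is synchronous here for brevity; with real async tools you would wrap the calls in Promise.all.

```typescript
// Simplified local versions of the API's tool_use and tool_result blocks.
type ToolUseBlock = { type: "tool_use"; id: string; name: string; input: Record<string, unknown> };
type ToolResultBlock = {
  type: "tool_result";
  tool_use_id: string;
  content: string;
  is_error?: boolean;
};

// Runs every tool call from one assistant turn and packs ALL results into a
// single user message. A failing tool still answers its own tool_use_id,
// flagged is_error, so the model sees the failure and can recover.
function buildToolResultTurn(
  calls: ToolUseBlock[],
  runTool: (call: ToolUseBlock) => string, // hypothetical: your tool code
): { role: "user"; content: ToolResultBlock[] } {
  const content = calls.map((call): ToolResultBlock => {
    try {
      return { type: "tool_result", tool_use_id: call.id, content: runTool(call) };
    } catch (err) {
      return {
        type: "tool_result",
        tool_use_id: call.id,
        content: `Error: ${err instanceof Error ? err.message : String(err)}`,
        is_error: true,
      };
    }
  });
  return { role: "user", content };
}
```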

The official SDKs ship a tool runner that drives this loop for you, executing your functions and looping until the model is done. I reach for it when I want the default behavior, and I write the loop by hand when I need to gate a tool behind human approval or log every step. Either way, the mental model is the same: structured output gives me a dependable shape for answers, tool use gives the model dependable hands, and validation sits between them so the whole thing survives contact with real users.

If you are building toward full agents on top of these primitives, my guide on agents that actually work covers the loop, the failure modes, and when an agent is overkill. For where structured output and tool use sit in the broader stack, see the building with LLMs field guide.

FAQ

How do I force a model to return clean JSON? Use the API's structured output mode, where you pass a JSON schema and the model is constrained to match it, rather than asking for JSON in the prompt and hoping. With the Anthropic SDK that's output_config.format with a json_schema.

What's the difference between structured output and tool use? Structured output shapes the model's final answer into a fixed schema. Tool use lets the model call functions you define and act on the results. They share the same schema machinery, and real apps use both together.

Do I still need to validate output if I used a schema? Yes. Schema-constrained output is far more reliable, but you still validate at the boundary, parse defensively, and have a repair path for the rare malformed or refused response.

How many tools should I give a model at once? Keep the active set small and focused. Too many tools blur the model's decision about which to call. If you have a large library, load only the relevant few per request.

What should a tool description contain? Say what the tool does and, just as important, when to call it. A clear trigger condition in the description meaningfully improves how reliably the model reaches for the right tool.
