Building With Claude: Strengths, Quirks, and How to Get the Most Out of It
How I build with Claude in production: where it shines, which tier to use, prompt caching, structured output, extended thinking, and the honest limits.
Claude is the model I reach for first, and I want to be specific about why instead of waving at benchmarks. I run RunOctopus, a content engine that is really thousands of orchestrated model calls reading a merchant's entire store, reasoning about what to publish, judging its own output, and shipping it live. That workload is unforgiving: long inputs, careful rules, real money downstream. Claude earns its place there by being good at exactly the things that break most AI features in production.
This is the practical guide I wish I had when I started: where Claude actually shines, how the model tiers map to real decisions, the levers that change your cost and latency the most, and the honest quirks you need to design around. Model specifics move fast, so treat the names and tiers as a snapshot and the reasoning as the durable part.
Where Claude shines
The headline strength is following nuanced instructions. A lot of models do fine when you ask for one thing. Claude holds up when you ask for one thing with seven constraints, three exceptions, and a tone requirement, and it does not quietly drop the constraints it finds inconvenient. For anything that is really a spec dressed up as a prompt, that reliability is the whole game.
Sustained reasoning over long context is the second pillar. Claude supports a large context window, and it does not just technically accept the tokens — it actually uses information buried in the middle of a long document rather than fixating on the beginning and end. When I feed it a merchant's full catalog plus brand guidelines plus prior content, it reasons across all of it coherently. That is the difference between a model that summarizes and a model that synthesizes.
Tool use and agentic work is the third. Claude is strong at deciding when to call a tool, calling it with well-formed arguments, reading the result, and continuing toward a goal across many steps without losing the thread. Long-horizon loops are where weaker models fall apart — they forget the objective, repeat themselves, or hallucinate a tool result. Claude stays on task. If you are building agents, this is the capability that matters most.
And then there is writing. Claude produces prose that reads like a careful human wrote it, not like a content mill set to maximum buzzword. It handles voice and register well, so when I tell it to write in a specific first-person builder voice, it actually does, instead of defaulting to the same flat marketing cadence every model seems to fall into. For a content business, that is not a nice-to-have; it is the product.
The model tiers and when to use each
Anthropic's Claude family splits into tiers, and choosing well is the single biggest lever on your cost and latency. As of this writing the lineup is Opus 4.8, Sonnet 4.6, Haiku 4.5, and Fable 5.
Opus 4.8 is the most capable. It is what you want for genuinely hard reasoning, complex multi-step agentic loops, tricky code, and any step where a wrong answer is expensive enough to justify paying more and waiting longer. Do not reach for it reflexively. Most tasks do not need it, and using it everywhere is how you light money on fire.
Sonnet 4.6 is the balanced default, and I mean default literally. Start here for almost everything. It is fast enough for interactive use, capable enough for the large majority of production work, and priced so you can run it at volume. The right instinct is to assume Sonnet can do the job until an eval proves it cannot, then move up.
Haiku 4.5 is the fast, cheap tier. Use it for high-volume and latency-sensitive work: classification, extraction, routing, simple transformations, the inner loop of something that runs millions of times. A well-prompted Haiku call on a narrow task often matches a bigger model while costing a fraction and returning in a blink.
Fable 5 rounds out the family as a newer entry. As always, check current docs for exact positioning, because the lineup shifts.
The pattern I follow: cascade. Cheap model for the bulk work, escalate to a bigger one only on the steps that need it. A judge step that scores output can run on Sonnet while the generation runs on Haiku, or a hard sub-problem can escalate to Opus while the orchestration stays cheaper. Match the tier to the difficulty of each step, not to the importance of the overall feature.
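A minimal sketch of that cascade, assuming each step in a pipeline carries a difficulty label that your own evals assign. The `Difficulty` labels, the step names, and the Haiku and Opus model ID strings are illustrative assumptions, not confirmed identifiers; check current docs for exact model IDs.

```typescript
// Hypothetical difficulty labels; in practice these come from your own evals.
type Difficulty = "routine" | "standard" | "hard";

// Model IDs are assumptions modeled on the ID style used in this article.
const MODEL_BY_DIFFICULTY: Record<Difficulty, string> = {
  routine: "claude-haiku-4-5",
  standard: "claude-sonnet-4-6",
  hard: "claude-opus-4-8",
};

interface PipelineStep {
  name: string;
  difficulty: Difficulty;
}

// Pick the tier per step, not per feature.
function modelForStep(step: PipelineStep): string {
  return MODEL_BY_DIFFICULTY[step.difficulty];
}

// Hypothetical pipeline: cheap generation, mid-tier judging, one hard escalation.
const pipeline: PipelineStep[] = [
  { name: "generate-draft", difficulty: "routine" },
  { name: "judge-draft", difficulty: "standard" },
  { name: "resolve-conflicting-rules", difficulty: "hard" },
];

const plan = pipeline.map((step) => ({ step: step.name, model: modelForStep(step) }));
console.log(plan);
```

Keeping the mapping in one table means an eval result changes a single label rather than a scattering of hard-coded model names.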
The levers that change everything
Three Claude capabilities will reshape your cost and latency more than any prompt tweak, so learn them early.
Prompt caching. If a big chunk of your prompt stays constant across requests — a long system prompt, a reference document, a fat set of tool definitions — caching lets you reuse that processed prefix instead of paying to reprocess it every call. In agent loops, where the same system prompt and tools ride along on every turn, this is a large saving on both cost and time. The rule of thumb: put the stable, expensive stuff at the front of your prompt and cache it, then vary only the tail.
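As a rough sketch of "stable prefix first, varying tail last": the request below puts the long reference text in a system block marked with `cache_control`, and only the user message changes between calls. The local types mirror the Messages API request shape as I understand it, and `buildCachedRequest` and the placeholder text are hypothetical; verify the field names against current docs before relying on them.

```typescript
// Local types that mirror the request shape as I understand it; verify in docs.
interface SystemTextBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

interface CachedRequest {
  model: string;
  max_tokens: number;
  system: SystemTextBlock[];
  messages: { role: "user" | "assistant"; content: string }[];
}

// Stable, expensive content goes first and carries the cache marker;
// only the user message at the tail varies between calls.
function buildCachedRequest(stableReference: string, userTurn: string): CachedRequest {
  return {
    model: "claude-sonnet-4-6",
    max_tokens: 1024,
    system: [
      { type: "text", text: "You are a precise assistant." },
      { type: "text", text: stableReference, cache_control: { type: "ephemeral" } },
    ],
    messages: [{ role: "user", content: userTurn }],
  };
}

const brandGuide = "...long brand guidelines..."; // placeholder text
const firstCall = buildCachedRequest(brandGuide, "Draft a description for item 1.");
const secondCall = buildCachedRequest(brandGuide, "Draft a description for item 2.");

// The prefix is byte-identical across calls, which is what makes it cacheable.
console.log(JSON.stringify(firstCall.system) === JSON.stringify(secondCall.system));
```

Either object would be passed to `client.messages.create`. Anything that changes the prefix, even whitespace, is likely to miss the cache, so build it from constants rather than templates.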
Structured output. When you need machine-readable results, do not parse freeform prose and hope. Define the exact shape with a tool definition or a schema, give one clean example, and let Claude fill it. It follows these contracts well. You still validate on your side, because the schema is your contract and defensive parsing is just good engineering, but the drift rate is low enough to build on.
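One way this can look, sketched under assumptions: a tool whose `input_schema` is the target shape, plus a hand-written validator for whatever comes back. The `ProductSummary` shape, the `record_summary` tool name, and `parseSummary` are hypothetical, and forcing the tool through `tool_choice` reflects my understanding of the API rather than something this article confirms.

```typescript
// Hypothetical target shape for illustration.
interface ProductSummary {
  title: string;
  bullets: string[];
}

// Tool definition whose input schema is the target shape.
const summaryTool = {
  name: "record_summary",
  description: "Record a product summary in the required shape.",
  input_schema: {
    type: "object",
    properties: {
      title: { type: "string" },
      bullets: { type: "array", items: { type: "string" } },
    },
    required: ["title", "bullets"],
  },
};

// Extra request fields that ask the model to answer through the tool.
const structuredParams = {
  tools: [summaryTool],
  tool_choice: { type: "tool", name: summaryTool.name },
};

// Defensive validation: never trust the shape just because a schema was supplied.
function parseSummary(input: unknown): ProductSummary {
  if (typeof input !== "object" || input === null) {
    throw new Error("summary must be an object");
  }
  const { title, bullets } = input as Record<string, unknown>;
  if (typeof title !== "string" || title.length === 0) {
    throw new Error("title must be a non-empty string");
  }
  if (!Array.isArray(bullets) || !bullets.every((b) => typeof b === "string")) {
    throw new Error("bullets must be an array of strings");
  }
  return { title, bullets: bullets as string[] };
}

// Example: the `input` of a tool_use block, as it might come back.
const parsed = parseSummary({ title: "Linen shirt", bullets: ["Breathable", "Machine washable"] });
console.log(structuredParams.tool_choice.name, parsed.bullets.length);
```

In a real project a schema library could replace the hand-rolled checks; the point is only that validation runs on your side regardless of how well the model follows the schema.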
Extended thinking. Claude can reason through a problem before it answers, which materially helps hard, multi-step tasks — thorny logic, planning, careful analysis. It is not free. It adds latency and tokens, and for simple work it buys you nothing. Turn it on for the genuinely hard calls and leave it off for routine extraction and formatting. Like the tiers, it is a per-step decision, not a global switch.
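A small sketch of treating thinking as a per-step decision. It assumes the `thinking: { type: "enabled", budget_tokens }` request field and the constraint that the budget must be smaller than `max_tokens`; both reflect my reading of the docs and may differ by model, so confirm before use. `requestForStep` and the token numbers are hypothetical.

```typescript
interface ThinkingConfig {
  type: "enabled";
  budget_tokens: number;
}

interface StepRequest {
  model: string;
  max_tokens: number;
  thinking?: ThinkingConfig;
}

// Hypothetical helper: enable thinking only for steps flagged as hard.
function requestForStep(isHard: boolean): StepRequest {
  const base: StepRequest = { model: "claude-sonnet-4-6", max_tokens: 1024 };
  if (!isHard) return base; // routine extraction and formatting: no thinking

  // Assumption: the thinking budget counts toward max_tokens,
  // so leave headroom for the visible answer on top of it.
  const budget = 4000;
  return {
    ...base,
    max_tokens: budget + 2000,
    thinking: { type: "enabled", budget_tokens: budget },
  };
}

console.log(requestForStep(false));
console.log(requestForStep(true));
```

The returned fields would be merged into the arguments for `client.messages.create` alongside `messages`.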
A short SDK snippet
Here is the shape of a basic call with the official SDK, @anthropic-ai/sdk:
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY

const msg = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 1024,
  system: "You are a precise assistant. Follow the schema exactly.",
  messages: [
    { role: "user", content: "Summarize this in three bullet points: ..." },
  ],
});

// msg.content is an array of content blocks, not a plain string.
console.log(msg.content);
From here you layer on tools (pass a tools array and handle tool_use blocks), structured output (define a tool whose input schema is your target shape), caching (mark stable prompt segments), and extended thinking — all on the same messages.create surface. Check current docs for the exact parameter names, because the SDK evolves.
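To make "handle tool_use blocks" concrete, here is a sketch of the step that sits between two model calls: scan the response content for `tool_use` blocks, run the matching local function, and build `tool_result` blocks to send back in the next user message. The block shapes follow my understanding of the Messages API; the `get_inventory` tool, its return value, and the `runTools` helper are hypothetical.

```typescript
// Minimal local types for the content blocks this sketch cares about.
type ResponseBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown };

interface ToolResultBlock {
  type: "tool_result";
  tool_use_id: string;
  content: string;
  is_error?: boolean;
}

type ToolHandler = (input: unknown) => string;

// Hypothetical local tool registry with canned data.
const toolHandlers: Record<string, ToolHandler | undefined> = {
  get_inventory: (input) => {
    const sku = (input as { sku?: string }).sku ?? "unknown";
    return JSON.stringify({ sku, in_stock: 12 });
  },
};

// Turn every tool_use block into a tool_result block, flagging unknown tools.
function runTools(blocks: ResponseBlock[]): ToolResultBlock[] {
  const results: ToolResultBlock[] = [];
  for (const block of blocks) {
    if (block.type !== "tool_use") continue;
    const handler = toolHandlers[block.name];
    if (handler) {
      results.push({ type: "tool_result", tool_use_id: block.id, content: handler(block.input) });
    } else {
      results.push({
        type: "tool_result",
        tool_use_id: block.id,
        content: `Unknown tool: ${block.name}`,
        is_error: true,
      });
    }
  }
  return results;
}

// Sample response content, shaped like msg.content from a tool-using call.
const sampleContent: ResponseBlock[] = [
  { type: "text", text: "Checking stock." },
  { type: "tool_use", id: "toolu_example", name: "get_inventory", input: { sku: "A-1" } },
];
const toolResults = runTools(sampleContent);
console.log(toolResults);
```

In a loop, the results go back as the content of a `user` message and the call repeats until the response contains no more `tool_use` blocks. Returning an error result for an unknown tool, rather than throwing, lets the model see the failure and recover.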
The honest quirks and limits
Claude is good, not magic, and building well means designing around the rough edges.
It has a training cutoff. It does not know what happened after it was trained, so for anything time-sensitive you supply the facts in context or give it a tool to fetch them. Do not assume it knows today's news, today's prices, or your private data.
It can be confidently wrong. Like every model, Claude will sometimes produce a fluent, plausible, incorrect answer. The fix is not to trust harder; it is to ground it with real context, give it tools for facts and math instead of asking it to recall or compute from memory, and verify anything headed for a money or destructive path. In my own systems, the rule is that the model never writes to an irreversible action without a check in front of it.
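The "check in front of it" rule can be as plain as a deterministic gate that every model-proposed action must pass before it executes. This is a sketch under assumptions, not a description of how my systems implement it: the action kinds, the `gate` function, and the refund threshold are all hypothetical.

```typescript
// Hypothetical action shape; the kinds and the threshold are illustrative only.
interface ProposedAction {
  kind: "publish" | "refund" | "delete";
  amountCents?: number;
}

type GateVerdict = { allowed: true } | { allowed: false; reason: string };

const MAX_AUTO_REFUND_CENTS = 5_000; // assumption: anything larger needs a human

// Deterministic check that runs before any model-proposed action executes.
function gate(action: ProposedAction): GateVerdict {
  if (action.kind === "delete") {
    return { allowed: false, reason: "deletes always require human approval" };
  }
  if (action.kind === "refund") {
    const amount = action.amountCents;
    if (amount === undefined || !Number.isInteger(amount) || amount <= 0) {
      return { allowed: false, reason: "refund amount missing or invalid" };
    }
    if (amount > MAX_AUTO_REFUND_CENTS) {
      return { allowed: false, reason: "refund exceeds automatic limit" };
    }
  }
  return { allowed: true };
}

console.log(gate({ kind: "refund", amountCents: 120_000 }));
console.log(gate({ kind: "publish" }));
```

The gate is ordinary code with no model in the loop, so a fluent but wrong answer cannot talk its way past it.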
It will follow your instructions, including the bad ones. The flip side of strong instruction-following is that a sloppy or contradictory prompt gets faithfully executed. If your output is wrong in a systematic way, look at your prompt before you blame the model — most of the time the spec was the bug.
And it can be verbose if you let it. Claude defaults to thoroughness. When you want terse, say so explicitly, give the format, and it complies. The model is not fighting you; it just guesses long when you leave the length open.
None of this is disqualifying. It is the normal texture of building with a probabilistic system. You get the upside — reasoning, long context, tool use, real writing — by respecting the limits and engineering for them. For the wider playbook on shipping any model into production, see my building with LLMs field guide, and if you are weighing Claude against the alternatives, my Claude vs GPT vs Gemini comparison for builders lays out how I actually choose.
FAQ
Which Claude model should I default to? Default to Sonnet 4.6. It is the balanced tier and handles the overwhelming majority of production work well. Drop to Haiku 4.5 for high-volume, latency-sensitive, or simple tasks where cost matters, and reach up to Opus 4.8 for the genuinely hard reasoning, long-horizon agentic loops, and steps where a wrong answer is expensive.
What is Claude genuinely best at? Nuanced instruction-following, sustained reasoning over long context, multi-step tool use and agentic work, and writing that does not sound like a press release. If your task involves reading a lot, following careful rules, and producing something a human will actually read, Claude is a strong pick.
What is prompt caching and when should I use it? Prompt caching lets you reuse a large, stable chunk of your prompt — a system prompt, a long document, a set of tool definitions — across many requests so you do not pay full price to reprocess it every time. Use it whenever a big prefix stays constant between calls, which is most agent loops and document-heavy workflows.
What is extended thinking and does it always help? Extended thinking lets the model reason through a problem before answering, which improves hard multi-step tasks. It does not always help. For simple extraction, classification, or formatting, it adds latency and cost without improving the answer, so turn it on for genuinely hard reasoning and leave it off for routine calls.
How do I get reliable structured output from Claude? Define the exact shape you want with tool use or a schema, give one clear example, and validate the result on your side. Treat the schema as the contract and parse defensively. Claude follows structured-output instructions well, but you still own validation because any model can occasionally drift.
What are Claude's honest limits? It has a training cutoff so it does not know recent events without you supplying them, it can state wrong things confidently, and it is not a calculator or a database. Give it tools for facts and math, ground it with real context, and never ship its output to a money or destructive path without verification.