
Prompting Claude vs GPT: What Actually Differs

The prompting habits that carry between Claude and GPT, the ones that don't, and how each family wants to be steered in production.

By Matt Goren · Updated June 26, 2026 · 7 min read

I ship against both Claude and GPT in production, and the question I get most is some version of: "I have a prompt that works on one — will it work on the other?" The honest answer is mostly, and then the last 20% will bite you. The fundamentals of good prompting are portable. The steering — how each family wants to be talked to, structured, and corrected — is not. This piece is about telling those two layers apart so you don't waste a week debugging a "model problem" that's really a phrasing problem.

I'm writing about the model families as I use them today (2026): Anthropic's Claude — Opus 4.8, Sonnet 4.6, Haiku 4.5 — and OpenAI's GPT line behind ChatGPT and the API. I'm not going to invent benchmark numbers; the families leapfrog constantly and any score I quote would be stale. This is about behavior and craft.

What's portable: the 80% that carries everywhere

Before the differences, be clear about how much is shared. If you internalize one thing, make it this: most of what makes a prompt good is model-agnostic. These habits pay off on Claude, GPT, Gemini, and the open models alike.

Specificity beats cleverness. Tell the model exactly what you want, for whom, in what format, with what constraints. Vague prompts fail everywhere.
Examples steer better than adjectives. One or two concrete input/output examples (few-shot) move both families harder than a paragraph of description.
Define the output shape. Ask for JSON with named fields, or a fixed structure, and say what to do when data is missing. This is the single biggest reliability lever on any model.
Give the model room to think. Letting a model reason through a hard problem before it answers improves quality across the board.
Put stable rules in the system prompt, the task in the user turn. Both families honor this split.
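A minimal sketch of the output-shape habit above, assuming an invoice-extraction task: state the fields and the missing-data rule in the prompt, then check every reply against the same definition. The field names (`vendor`, `total`, `due_date`) and the helper `parse_reply` are hypothetical, chosen only for illustration, and no provider API is called.

```python
import json

# Hypothetical schema for illustration: field name -> allowed Python types.
# None is allowed everywhere because the prompt tells the model to use null
# when a value is missing from the input.
SCHEMA = {
    "vendor": (str, type(None)),
    "total": (int, float, type(None)),
    "due_date": (str, type(None)),
}

# The same schema, stated in words for the prompt.
OUTPUT_RULES = (
    "Return a single JSON object with exactly these fields: "
    + ", ".join(SCHEMA)
    + ". If a value is not present in the input, use null. Do not guess."
)


def parse_reply(reply: str) -> dict:
    """Parse a model reply and check it against SCHEMA.

    Raises ValueError if the reply is not valid JSON, is not an object,
    has missing or extra fields, or has a field of the wrong type.
    """
    try:
        data = json.loads(reply)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply is not a JSON object")
    if set(data) != set(SCHEMA):
        raise ValueError(f"unexpected field set: {sorted(data)}")
    for field, allowed in SCHEMA.items():
        if not isinstance(data[field], allowed):
            raise ValueError(f"field {field!r} has the wrong type")
    return data
```

Deriving the prompt text and the validator from one definition keeps them from drifting apart, and the check itself is the same whichever family produced the reply.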

If your prompt is failing on both models, the problem is the prompt, not the family. I cover this portable core in depth in prompt engineering for production. Get that right first, then worry about the deltas below.

Where Claude and GPT actually differ

Now the interesting part — the 20% that doesn't transfer cleanly.

Structure and delimiters

Claude has a strong, well-documented affinity for structure. When I write Claude prompts I delimit sections with XML-style tags such as <instructions>, <context>, <examples>, and <document>, and it reliably keeps them straight. The tags make long, multi-part prompts dramatically easier to steer and to read.

<instructions>
Summarize the document for a busy executive. Three bullets, no preamble.
</instructions>

<document>
{{ pasted text }}
</document>

GPT also benefits from clear structure, but it's less ceremonial about it — Markdown headers, numbered lists, and plain delimiters work well. It tends to be a touch more forgiving of loose, conversational prompts, and it's happy to fill in gaps you left open. The flip side: that helpfulness means GPT will sometimes do more than you asked, where Claude tends to do exactly what you said.
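One way to keep the same content while switching between the two delimiter styles described above is to store the sections as data and render them per family. This is a sketch only: the function name `render_sections` and the `"xml"` / `"markdown"` style labels are made up for this example.

```python
def render_sections(sections: dict[str, str], style: str) -> str:
    """Render named prompt sections as XML-style tags or Markdown headers."""
    parts = []
    for name, body in sections.items():
        body = body.strip()
        if style == "xml":
            # Tag-delimited form, e.g. <instructions> ... </instructions>
            parts.append(f"<{name}>\n{body}\n</{name}>")
        elif style == "markdown":
            # Header-delimited form, e.g. "## Instructions"
            parts.append(f"## {name.capitalize()}\n{body}")
        else:
            raise ValueError(f"unknown style: {style!r}")
    # Blank line between sections in both styles.
    return "\n\n".join(parts)
```

Calling `render_sections({"instructions": "...", "document": "..."}, "xml")` reproduces the shape of the tagged prompt above, and passing `"markdown"` gives the header-based equivalent from the same dictionary.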

Literalness and steering

The practical difference I feel most: Claude tends to be careful and literal with explicit instructions. If I tell it "output only JSON, no prose," it takes that literally. That precision is a gift when you write tightly and an annoyance when you're sloppy: vague instructions get a careful, narrow answer rather than a helpful guess. GPT, as a strong generalist, more often infers your intent and rounds toward "what you probably meant."

The takeaway for steering: with Claude, say precisely what you want and what you don't want — it will honor both. With GPT, you sometimes need to actively constrain its helpfulness ("do not add explanations, do not suggest next steps") to stop it from over-delivering.

Neither is "better." They reward different prompting discipline. And both shift version to version, so treat these as starting hypotheses you confirm on your own task, not laws.

Tone and refusals

Each family has its own default voice and its own way of declining. Claude leans toward thoughtful, slightly more cautious phrasing and will explain its reasoning when it pushes back; GPT has its own characteristic style. When a model refuses something benign, the fix is usually clarifying intent and context rather than fighting it — and the wording that unblocks one family may not be what unblocks the other. Keep your refusal-handling phrasing in the model-specific layer, not the shared core.

Tool use and structured output

Both families support tool calling (function calling) and structured output, and both are good at it — but the request and response shapes differ, and the phrasing that makes tool calls reliable differs too. With Claude I'm explicit in the system prompt about when to call a tool versus answer directly, and I lean on its structured tool schema. The mechanics — how you declare tools, how the model signals a call, how you return results — are provider-specific plumbing. Write your tool-orchestration logic with a thin adapter per provider so the business logic stays shared. I go deeper on Claude's specifics, including its tool and system-prompt conventions, in building with Claude.
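To illustrate the thin-adapter idea, here is a sketch that keeps one provider-neutral tool description and converts it per family. The `get_weather` tool is hypothetical. The two output layouts are assumptions based on the providers' published tool-declaration formats (Anthropic's Messages API and OpenAI's Chat Completions API), so confirm them against the current API references before relying on them.

```python
# A provider-neutral tool description. The tool itself (get_weather) is a
# hypothetical example; "parameters" holds an ordinary JSON Schema object.
TOOL = {
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}


def to_claude_tool(tool: dict) -> dict:
    """Assumed Anthropic layout: the JSON Schema goes under input_schema."""
    return {
        "name": tool["name"],
        "description": tool["description"],
        "input_schema": tool["parameters"],
    }


def to_gpt_tool(tool: dict) -> dict:
    """Assumed OpenAI layout: the definition is nested under "function"."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool["description"],
            "parameters": tool["parameters"],
        },
    }


# Business logic picks an adapter by family name and never builds
# provider-specific dictionaries itself.
ADAPTERS = {"claude": to_claude_tool, "gpt": to_gpt_tool}
```

With this split, adding a family means adding one function to `ADAPTERS`; the tool description and the code that decides which tools to offer stay shared.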

A portability strategy that survives contact with production

Here's how I actually structure prompts when I want to run against more than one family without a maintenance nightmare.

Write a portable core. The instructions, the examples, the schema, the task description — these go in one shared block written in clean, structured, model-neutral language. This is 80% of the prompt and it changes rarely.

Keep a thin per-model layer. A small override file per family handles the parts that genuinely differ: how output formatting is requested, how tools are declared, how refusals are nudged, whether you wrap sections in XML tags or Markdown. When you add a model, you write one small adapter, not a whole new prompt.

core/extract.txt        # shared instructions + schema + examples
adapters/claude.txt     # XML wrapping, tool schema, system-prompt rules
adapters/gpt.txt        # markdown structure, function-call format
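A loader for the layout above could be as small as the following sketch. It assumes the directory names shown (`core/` and `adapters/`) and plain-text files; the function name `build_prompt` and the choice to place the adapter text before the core are illustrative assumptions, not a fixed convention.

```python
from pathlib import Path


def build_prompt(root: Path, task: str, family: str) -> str:
    """Join the shared core for a task with the adapter for one model family."""
    core = (root / "core" / f"{task}.txt").read_text(encoding="utf-8").strip()
    adapter = (root / "adapters" / f"{family}.txt").read_text(encoding="utf-8").strip()
    # The adapter goes first here so family-specific rules frame the shared
    # core; this ordering is an arbitrary choice for the sketch.
    return f"{adapter}\n\n{core}"
```

For example, `build_prompt(Path("."), "extract", "claude")` would read `core/extract.txt` and `adapters/claude.txt`. A missing adapter raises `FileNotFoundError`, which is arguably the right failure: a family with no adapter has not been tuned yet.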

Re-tune, never blind-port. When I move a prompt to a new family, I run it against a fixed set of real test cases (twenty or so inputs with known-good outputs) and read every result. The portable core usually survives. The deltas show up in formatting and edge cases, and I fix them in the adapter. This takes about an hour and saves you from shipping a prompt that's quietly worse, say by 15%, because you assumed it transferred.

Don't trust folklore — trust your eval. Every "Claude likes X, GPT likes Y" rule, including the ones in this article, is a hypothesis until you've checked it on your task and your model versions. The families change with every release. A small, boring eval harness is worth more than any blog post, mine included.
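The fixed-test-set routine described above can be sketched in a few lines. Everything here is an assumption for illustration: `model` stands in for whatever function sends a prompt to a provider and returns text, and exact string match is the simplest possible comparison (many tasks will need something looser, such as comparing parsed JSON).

```python
from typing import Callable


def run_eval(model: Callable[[str], str], cases: list[tuple[str, str]]) -> list[dict]:
    """Run every (input, expected) case through `model` and record the outcome."""
    results = []
    for prompt_input, expected in cases:
        actual = model(prompt_input)
        results.append({
            "input": prompt_input,
            "expected": expected,
            "actual": actual,
            # Exact match after trimming whitespace; swap in a task-specific
            # comparison where that is too strict.
            "passed": actual.strip() == expected.strip(),
        })
    return results


def pass_rate(results: list[dict]) -> float:
    """Fraction of cases that passed; 0.0 for an empty run."""
    if not results:
        return 0.0
    return sum(r["passed"] for r in results) / len(results)
```

Because `run_eval` keeps the actual output for each case, the results can still be read one by one, and running the same `cases` list against each family's adapter makes the differences visible side by side.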

The honest bottom line

If you write good prompts — specific, structured, example-driven, schema-constrained — they will largely carry between Claude and GPT. The differences are real but they live at the edges: Claude rewards precise, literal, tag-structured instructions and does exactly what you say; GPT is a forgiving generalist that infers intent and sometimes over-helps. Build a portable core, keep a thin per-model adapter, and re-tune against real test cases when you switch. That's the whole game. The model family matters less than the discipline you bring to it.

FAQ

Is a prompt that works on GPT going to work on Claude? Mostly, yes — the fundamentals are portable. Clear instructions, examples, and a defined output schema help both. But the optimal phrasing, structure, and steering differ enough that you should re-tune, not copy-paste, when you switch families.

Does Claude really care about XML tags? Claude responds very well to structure, and XML-style tags are a clean way to delimit sections like instructions, context, and examples. It is not magic — any clear delimiter helps — but tags are a reliable habit that makes Claude prompts easier to read and steer.

How important is the system prompt versus the user message? Very. Both families treat the system prompt as the place for durable role, rules, and tone. Put the stable instructions there and the specific task in the user turn. Claude in particular leans heavily on a well-written system prompt.

Should I tune prompts per model or write one prompt for all of them? Write portable prompts where you can, then keep a thin model-specific layer for the parts that differ — output formatting, refusal handling, tool-call phrasing. A shared core plus small per-model overrides is the maintainable pattern.

Which family follows instructions more literally? It varies by version and task, so test rather than trust folklore. In my experience Claude tends to be careful and literal with explicit instructions, which rewards precise wording, while GPT is a strong generalist that often fills gaps helpfully. Both reward specificity.
