Open-Weight vs Closed Models: What Builders Should Actually Use
Open-weight or closed API model for your product? Capability, cost, privacy, control, support, and total cost of ownership, decided by use case.
"Should I use an open-source model?" is one of the most common questions I get from people building with AI, and it's usually asked as if it were a moral or tribal choice. It isn't. For a builder it's an engineering and economics decision with a handful of concrete trade-offs, and once you lay them out the answer for any given job becomes pretty clear. The split that matters isn't "free vs paid" — it's open-weight models you can download and run yourself versus closed models you reach through a provider's API. This is a head-to-head on the dimensions that actually decide your build, with a verdict by use case at the end. For the wider map of who makes what, I keep that in the frontier model landscape.
What we're actually comparing
By open-weight I mean models whose weights are published so you can download them and run them on your own hardware or a rented GPU — the strong open families released by a handful of large labs. You control the box they run on. (People say "open-source," but most of these release the weights, not the full training recipe and data, so "open-weight" is the more honest term.)
By closed I mean models you don't get to hold — you send a request to a provider's API and get a response back. Anthropic's Claude family lives here, and I'll name its tiers precisely because I build on them daily: Opus 4.8 for the hardest reasoning, Sonnet 4.6 as the balanced default, Haiku 4.5 for fast and cheap, plus Fable 5. OpenAI's GPT family and Google's Gemini family are also closed-API; I'll describe them by role rather than quote specifics, since those revise fast and a stale number misleads. The frontier — the top of the capability curve — is still closed today, but the gap to the best open-weight models is narrower than the tribal version of this debate admits, and it's moving.
The dimensions that matter
| Dimension | Open-weight (self-hosted) | Closed (API) |
|---|---|---|
| Capability ceiling | Strong and rising; trails the frontier on hardest reasoning and agent reliability | Highest available; the frontier tiers live here |
| Cost shape | Per-token cheap, but you pay GPUs + ops; wins only at high, steady volume | Pay-per-call; cheaper in total below high volume; scales with usage |
| Privacy / data control | Strongest — nothing leaves your environment; good for air-gapped/regulated | Strong with enterprise terms (no-training, data handling), but data leaves your box |
| Control / customization | Full — fine-tune, quantize, pin a version forever, modify serving | Limited to what the provider exposes; versions can deprecate |
| Support / reliability | You own uptime, scaling, and incidents | Provider owns uptime and scaling; you get an SLA |
| Speed to ship | Slower — infra, serving, and ops to stand up | Fastest — an API key and you're building |
| Total cost of ownership | Infra + engineering + ops time, not just tokens | Mostly the API bill + light integration |
Read that as a map of trade-offs, not a winner. Now the rows worth expanding.
Capability
The best closed frontier models still hold the top of the curve for the hardest work — sustained multi-step reasoning, the most reliable tool calling, long agent runs that don't drift. That's the honest state of things, and specifics move fast. But here's the part the tribal debate misses: a huge share of real product work isn't at the top of that curve. Classification, extraction, summarization, routing, straightforward generation — strong open-weight models handle these at a quality that's genuinely fine for production. So the capability question isn't "is open-weight as smart as the frontier?" (not quite, at the top) but "is it smart enough for this job?" (very often, yes).
Cost and total cost of ownership
This is where people get fooled. The per-token price of running your own open-weight model can look dramatically lower than an API bill, and that headline number sells a lot of self-hosting projects that shouldn't happen. The trap is that tokens are only one line. Self-hosting means you're now paying for GPUs, the engineering to stand up serving, autoscaling for traffic spikes, monitoring, on-call when it falls over, and the ongoing ops time to keep it healthy. Fold all of that in and the picture flips: below a high and steady volume, a closed API is almost always cheaper in total cost of ownership, because the provider amortizes the infrastructure across everyone. Self-hosting's economics only turn favorable when your volume is large, predictable, and you have a team that can run inference infrastructure as a real competency. Most builders are nowhere near that line.
Privacy and self-hosting
Self-hosting gives you the strongest data control that exists: the data never leaves your environment, full stop. For air-gapped systems or anything where "the data physically cannot go to a third party" is a hard requirement, that's the deciding factor and a legitimate reason to take on the ops burden. But don't reach for self-hosting out of vague unease. The major closed providers offer enterprise terms — no-training-on-your-data guarantees, defined data handling and retention — that clear the privacy bar for the large majority of products. Decide on your actual compliance requirement written down, not a gut feeling that "the cloud is scary."
Control and lock-in
Open-weight's real superpower is control. You can fine-tune on your own data, quantize to fit cheaper hardware, modify how it's served, and — underrated — pin a version forever. A closed provider can deprecate the exact model you tuned your evals against, and you have to re-validate. With weights on disk, the model behaves the same next year as today unless you change it.
The flip side is the lock-in fear people pin on closed models, and it's mostly overstated if you build right. The actual risk isn't the price — it's wiring your product so tightly around one provider's quirks that you can't leave. The fix is identical whichever way you lean: put a thin routing layer between your app and the model so the thing underneath is swappable. Do that and a closed API stops being a marriage and becomes a replaceable part — which is how I architect every build, including the ones I run on Claude by choice.
Speed to ship and support
If you're trying to get something in front of users this week, closed APIs win on raw speed: an API key and you're generating. Standing up your own serving stack — model loading, batching, autoscaling, monitoring — is real engineering before you write a line of product. And once you're live, the support question is just "who gets paged at 3am?" With a closed API, the provider owns uptime and you get an SLA. Self-host and that pager is yours. For a small team, that operational ownership is often the hidden cost that dwarfs the token savings.
Verdict: decision rules by use case
Stop treating this as identity and route by job. Here's how I actually decide:
- Shipping fast / small team / early product: use a closed API. Speed to ship and zero ops are worth far more than token savings you can't yet realize. Start here by default.
- High-volume, low-stakes fan-out (classification, extraction, tagging, routing, first-pass drafts): a cheap open-weight model is a great fit if your volume is high and steady enough to amortize the infra — otherwise the cheap tier of a closed API (Haiku-class) gets you the same savings with none of the ops.
- Frontier judgment (hardest reasoning, final-quality output, long reliable agent runs, anything irreversible and user-facing): pay for a closed frontier tier — Opus 4.8 is my default here. The top of the curve is still closed, and that's the job worth paying for.
- Hard privacy / air-gapped / regulated where data physically cannot leave your environment: self-host open-weight. This is the clearest open-weight win, and it's about compliance, not cost.
- You need full control — pin a version forever, fine-tune deeply, modify serving, escape deprecation: open-weight, eyes open about the ops bill.
- The honest default for most builds: hybrid. Run a cheap model (open-weight if you're at scale, closed-cheap if not) for the high-volume fan-out, and escalate the hard or high-stakes calls to a closed frontier model. You bank most of the cost savings where it's safe and keep the quality ceiling where it counts.
And the meta-rule that outlasts every version: don't hard-wire your product to one model, open or closed. Build the thin routing layer, keep the model swappable, and let the price-performance and privacy math — which shifts every few months — pick the engine. The builders who win treat models as interchangeable parts behind a stable interface, not as a flag to plant.
FAQ
Are open-weight models good enough to build a real product on?
For many jobs, yes. The strong open-weight families now handle classification, extraction, summarization, routing, and a lot of straightforward generation at a quality that's fine for production. Where they still trail the best closed frontier models is the hardest reasoning, the most reliable tool use, and long multi-step agent work. So the honest answer is: open-weight is good enough for a large slice of the work, and the frontier closed models are still worth paying for at the top of the difficulty curve.
Is self-hosting an open-weight model cheaper than a closed API?
Only at scale, and only after you count everything. The per-token price of running your own model can look far lower than an API bill, but you're now paying for GPUs, ops time, scaling, monitoring, and the engineering to keep it all up. Below a high and steady volume, a closed API is almost always cheaper in total cost of ownership because someone else eats the infrastructure. Self-hosting wins when volume is large, predictable, and you have the team to run it.
Do I need to self-host to keep my data private?
Not necessarily. Self-hosting an open-weight model gives you the strongest possible data control because nothing leaves your environment, and for some regulated or air-gapped use cases that's the deciding factor. But the major closed providers also offer enterprise terms with no-training guarantees and data-handling commitments that satisfy most privacy requirements. Decide based on your actual compliance bar, not a vague unease.
What's the real lock-in risk with a closed model?
The risk isn't usually the price — it's building your whole product so tightly around one provider's quirks that you can't leave. The fix is the same whichever way you go: put a thin routing layer between your app and the model so the model underneath is swappable. Do that and a closed API stops being a marriage and becomes a part you can replace.
Can I mix open-weight and closed models in one product?
Yes, and it's often the best design. Run a cheap open-weight model for high-volume, low-stakes fan-out work, and escalate the hard or high-stakes calls to a closed frontier model. You get most of the cost savings of open-weight where it's safe, and the quality ceiling of the frontier where it counts. That hybrid is what a lot of mature builds actually look like.
Use the free, no-API prompt generators to put it into practice.
Big Model vs Small Model: When Cheap and Fast Wins
Frontier model or small fast one? Quality, cost, latency, and reliability head to head, plus the fan-out-cheap, escalate-to-frontier pattern.
ComparisonClaude vs GPT vs Gemini: Picking a Model as a Builder
Choosing an LLM to build on, not chat with: reasoning, tool use, context, cost tiers, and where each family actually wins.
GuideBuilding With Grok (xAI): Where It Fits
An honest operator's take on xAI's Grok — its real-time and X-data edge, where you'd reach for it, and the tradeoffs to weigh.