“Should we fine-tune?” gets asked at every kickoff. Almost always, the honest answer is “not yet, and possibly not at all.” The question usually comes from a reasonable place: fine-tuning sounds like the serious, committed option, the one that real AI companies choose. In practice it is the last tool we reach for, not the first. Here is the decision pattern we use at Accolades, which maps three techniques to three different problems.

Three techniques, three jobs

The fastest way to cut through the jargon is to notice that each technique answers a different failure. When a system is not doing what you want, the fix depends on why it is failing.

Prompt engineering solves behavior

You want the model to follow a specific persona, format outputs a specific way, or reason in a specific structure. The best prompt is the smallest prompt that hits your eval bar. Iterating on the prompt and system instructions handles the vast majority of “the model is not doing what we want” complaints.

A concrete example: a client wants customer emails classified into eight categories with a confidence score and a one-line justification. That is entirely a behavior problem. No retrieval, no training — a well-structured prompt with clear category definitions and a few worked examples gets you there, and you can change the categories next quarter by editing text instead of retraining anything.
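
A minimal sketch of what such a prompt could look like in code, under stated assumptions: the eight category names, their definitions, and the two worked examples are hypothetical placeholders, and the JSON reply format is one reasonable choice rather than a requirement. The model call itself is left out; the sketch only builds the prompt text and validates a reply.

```python
import json

# Hypothetical categories and definitions, for illustration only.
CATEGORIES = {
    "billing": "Invoices, charges, refunds.",
    "shipping": "Delivery status, delays, lost parcels.",
    "returns": "Return or exchange requests.",
    "technical": "Product not working as expected.",
    "account": "Login, password, profile changes.",
    "sales": "Pre-purchase questions and quotes.",
    "feedback": "Praise, complaints, suggestions.",
    "other": "Anything that fits none of the above.",
}

# Hypothetical worked examples: (email, category, confidence, justification).
EXAMPLES = [
    ("I was charged twice this month.", "billing", 0.95, "Mentions a duplicate charge."),
    ("My parcel has not arrived yet.", "shipping", 0.9, "Asks about a late delivery."),
]


def build_prompt(email: str) -> str:
    """Assemble category definitions, worked examples, and the new email."""
    lines = ["Classify the customer email into exactly one category.", "", "Categories:"]
    for name, definition in CATEGORIES.items():
        lines.append(f"- {name}: {definition}")
    lines += [
        "",
        "Reply with JSON only, using the keys category, confidence (0 to 1), "
        "and justification (one sentence).",
        "",
    ]
    for text, category, confidence, why in EXAMPLES:
        lines.append(f"Email: {text}")
        lines.append(json.dumps(
            {"category": category, "confidence": confidence, "justification": why}
        ))
        lines.append("")
    lines.append(f"Email: {email}")
    return "\n".join(lines)


def parse_reply(reply: str) -> dict:
    """Validate a model reply against the contract the prompt sets."""
    data = json.loads(reply)
    if data["category"] not in CATEGORIES:
        raise ValueError(f"unknown category: {data['category']!r}")
    if not 0.0 <= float(data["confidence"]) <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return data
```

Changing the categories next quarter means editing the CATEGORIES dictionary and rerunning the evals; nothing is retrained.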

RAG solves knowledge

Retrieval-augmented generation is for when you want the model to answer using information it could not possibly have memorized at training time — your private documents, your knowledge base, last week’s policy update. RAG injects that context into the prompt at inference time. The model itself does not change.
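
To make "injects that context into the prompt" concrete, here is a toy sketch. It is an illustration under assumptions, not a production design: the documents are made-up snippets, and the word-overlap scoring is a deliberately simple stand-in for the embedding search a real system would more likely use.

```python
import re

# Hypothetical snippets standing in for a private knowledge base.
DOCUMENTS = [
    "Returns are accepted within 30 days of delivery with a receipt.",
    "Standard shipping takes 3 to 5 business days.",
    "Service contracts renew annually unless cancelled in writing.",
]


def tokens(text):
    """Lowercase word set; a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, documents, k=2):
    """Return the k documents sharing the most words with the question."""
    q = tokens(question)
    ranked = sorted(documents, key=lambda d: len(q & tokens(d)), reverse=True)
    return ranked[:k]


def build_grounded_prompt(question, documents, k=2):
    """Place the retrieved passages in the prompt; the model is unchanged."""
    passages = retrieve(question, documents, k)
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer using only the passages below and cite the passage number. "
        "If the passages do not contain the answer, say so.\n\n"
        f"Passages:\n{context}\n\nQuestion: {question}"
    )
```

Updating the system's knowledge here means editing DOCUMENTS, which mirrors the point that the model itself does not change.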

The tell that you need RAG is a model that writes fluently but is confidently wrong about your specifics: your return policy, your product SKUs, your service contracts. No amount of prompt cleverness fixes ignorance; the model needs the source material in front of it. Retrieval is also the backbone of trustworthy systems generally. It is much easier to make a model cite what it just read than to hope it remembers correctly, a point we cover in more depth in Building AI Agents That Don’t Hallucinate.

Fine-tuning solves style and patterns that prompts cannot scale

You have thousands of examples of “what good looks like” in your specific domain — call transcripts, code in a niche framework, formal contracts — and you need the model to generate that style by default without a massive prompt. Fine-tuning teaches it.

Note what fine-tuning is not for: teaching the model facts. Facts belong in retrieval, where you can update them tomorrow. A fine-tuned model’s knowledge is frozen at training time, and refreshing it means another training run, another eval pass, and another deployment. Fine-tuning earns its keep on form, tone, and domain-specific patterns, not on content.

The trade-offs that decide it

Each layer you add carries an ongoing cost, and the costs are not symmetric.

Prompt engineering is nearly free to change. An edit, an eval run, a deploy. That agility is worth protecting, which is why we exhaust it first.

RAG adds real infrastructure: a document pipeline, chunking decisions, an embedding index, retrieval quality to measure and maintain. Most RAG failures in the wild are retrieval failures — the model answered badly because it was handed the wrong passages. But in exchange, your system’s knowledge is as fresh as your last document sync, and you can trace any answer back to its source. For businesses whose information changes weekly, that freshness is not optional.
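
Because retrieval quality is the thing to measure, one simple starting metric is hit rate at k: the share of test questions for which at least one relevant document appears in the top k results. The sketch below assumes a small hand-labeled test set; the query strings and document ids are hypothetical.

```python
def hit_rate_at_k(results, relevant, k):
    """Fraction of queries whose top-k retrieved ids include a relevant id.

    results:  query -> ranked list of retrieved document ids
    relevant: query -> set of document ids a human marked as relevant
    """
    hits = 0
    for query, retrieved_ids in results.items():
        if set(retrieved_ids[:k]) & relevant[query]:
            hits += 1
    return hits / len(results)


# Hypothetical retriever output and labels for three test questions.
results = {
    "return window": ["doc_returns", "doc_shipping"],
    "contract renewal": ["doc_shipping", "doc_contracts"],
    "sku lookup": ["doc_shipping", "doc_returns"],
}
relevant = {
    "return window": {"doc_returns"},
    "contract renewal": {"doc_contracts"},
    "sku lookup": {"doc_skus"},
}

top1 = hit_rate_at_k(results, relevant, k=1)  # 1 of 3 queries hit
top2 = hit_rate_at_k(results, relevant, k=2)  # 2 of 3 queries hit
```

A query that misses at every k, like the "sku lookup" case here, points at the document pipeline or the index rather than at the model.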

Fine-tuning adds the heaviest ongoing commitment: training data curation, versioned models, regression testing every time you retrain, and a hosting story. It also quietly couples you to a specific base model at a specific moment, in a field where the strongest available model changes every few months. Every one of those costs is justified when the pattern-matching load is real and stable. Most of the time it is not.

When to combine them

Production systems almost always need two of the three. Most commonly: prompt engineering to set the contract, plus RAG to ground the answer in your data. Fine-tuning enters the picture later, when prompts have become unmanageably long because they are doing too much pattern-matching, or when per-request costs at scale make shorter prompts a genuine line item.
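
Whether shorter prompts are a genuine line item is a matter of arithmetic. The sketch below shows the shape of that calculation; every number in it is a placeholder chosen to keep the sums easy to follow, not a real price or traffic figure.

```python
def monthly_prompt_cost(prompt_tokens, requests_per_month, price_per_million_tokens):
    """Input-token spend per month for a fixed-size prompt."""
    return prompt_tokens * requests_per_month * price_per_million_tokens / 1_000_000


# Placeholder figures: a 4,000-token prompt versus a 500-token prompt,
# one million requests a month, at 1.0 currency units per million tokens.
long_prompt = monthly_prompt_cost(4_000, 1_000_000, 1.0)
short_prompt = monthly_prompt_cost(500, 1_000_000, 1.0)
savings = long_prompt - short_prompt
```

The savings figure is only one side of the ledger; it has to be weighed against the training, evaluation, and hosting costs that fine-tuning brings with it.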

The order matters. Start with prompt engineering against the strongest available foundation model. Add RAG once accuracy plateaus and the diagnosis is “the model does not know our data.” Only consider fine-tuning when you have shipped, you have an eval suite that tells you what “better” means, and the cost case for shorter prompts at scale is real. Teams that fine-tune before they have evals are tuning blind: they cannot tell whether the new model is better, only that it is different.
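
The ordering can be written down as a small decision function. This is one hypothetical encoding of the rule of thumb, not a tool we ship; the field names and returned labels are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class ProjectState:
    # All field names are hypothetical, chosen to mirror the prose.
    prompt_iteration_exhausted: bool    # prompt edits no longer move the evals
    failure_is_missing_knowledge: bool  # errors trace to data the model never saw
    has_rag: bool
    shipped: bool
    has_eval_suite: bool
    prompt_cost_case_is_real: bool


def next_step(state: ProjectState) -> str:
    """Return the cheapest technique that has not yet been exhausted."""
    if not state.prompt_iteration_exhausted:
        return "prompt engineering"
    if state.failure_is_missing_knowledge and not state.has_rag:
        return "add RAG"
    if state.shipped and state.has_eval_suite and state.prompt_cost_case_is_real:
        return "consider fine-tuning"
    return "hold: ship and build evals first"
```

Fine-tuning is only reachable when all three of its preconditions hold, which is the guard against tuning blind.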

Most projects never reach step three. That is a feature, not a bug — every layer you add is a layer you have to maintain.

How this plays out in an engagement

In our own AI development work, this decision is a discovery-phase output, not a kickoff assumption. We map the workflow, look at the data you actually have, and write down which failure mode you are fighting (behavior, knowledge, or pattern) before anyone proposes an architecture. The proof-of-concept phase then tests the cheapest technique that could plausibly work, on your real data, against measurable criteria. If prompt engineering alone clears the bar, you have just avoided months of unnecessary infrastructure; if it does not, we know precisely which gap RAG or fine-tuning has to close. The full three-phase rhythm is laid out in how a real AI engagement runs.

The pattern to take away: match the technique to the failure, start with the cheapest layer, and let evidence — not enthusiasm — pull you toward the expensive ones. If you are staring at a vendor proposal that leads with fine-tuning and you cannot articulate which of the three problems it solves, that is worth a conversation before you sign it.

RAG, Fine-Tuning, or Prompt Engineering? Choosing the Right AI Approach