The shape of an LLM application that survives a year
There's a very specific kind of LLM application that ships, gets used, and is still running a year later — the one where the model is a component in a system, not the system itself. Below are the patterns we've seen consistently produce that outcome.
RAG (retrieval-augmented generation): when it's right and when it isn't
The default architecture for "use the LLM to answer questions about my documents". Fundamentals:
- Chunk the corpus into retrievable units (paragraphs, sections, document fragments)
- Embed the chunks; store in a vector database
- On a query, embed the query, retrieve top-k chunks
- Pass query + retrieved context to the LLM, ask it to answer with citations
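A minimal sketch of that loop, assuming an OpenAI-style client and a chromadb collection as the vector store; the model names, the collection name, and `k` are illustrative choices, not recommendations:

```python
from openai import OpenAI
import chromadb

llm = OpenAI()
collection = chromadb.Client().create_collection("docs")

def index(chunks: list[str]) -> None:
    # Embed each chunk once and store it alongside the raw text.
    embeddings = [
        e.embedding
        for e in llm.embeddings.create(model="text-embedding-3-small", input=chunks).data
    ]
    collection.add(ids=[str(i) for i in range(len(chunks))],
                   documents=chunks, embeddings=embeddings)

def answer(query: str, k: int = 5) -> str:
    # Embed the query, pull the top-k chunks, ask the model to answer with citations.
    q_emb = llm.embeddings.create(model="text-embedding-3-small", input=[query]).data[0].embedding
    hits = collection.query(query_embeddings=[q_emb], n_results=k)["documents"][0]
    context = "\n\n".join(f"[{i+1}] {chunk}" for i, chunk in enumerate(hits))
    prompt = (f"Answer using only the context below. Cite sources as [n].\n\n"
              f"Context:\n{context}\n\nQuestion: {query}")
    resp = llm.chat.completions.create(model="gpt-4o-mini",
                                       messages=[{"role": "user", "content": prompt}])
    return resp.choices[0].message.content
```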
Where it works: factual Q&A over a defined corpus, especially when the corpus is large and changes often.
Where it doesn't:
- Tasks requiring synthesis across many chunks — naive RAG retrieves locally relevant chunks, misses the global picture
- Tasks where the answer requires reasoning about the corpus structure (e.g. "how often does this term appear")
- Cases where the user's query doesn't lexically match the documents — retrieval misses the right chunks
Mitigations: hybrid retrieval (lexical + semantic), reranking, query rewriting, hierarchical / graph-based RAG. Each adds engineering cost.
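To make the first mitigation concrete, here's a sketch of hybrid retrieval's merge step using reciprocal rank fusion; the lexical and vector retrievers are assumed to exist elsewhere, only the fusion is shown:

```python
def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[str]:
    # Each result list is a ranking of chunk ids, best first.
    scores: dict[str, float] = {}
    for results in result_lists:
        for rank, chunk_id in enumerate(results):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# fused = reciprocal_rank_fusion([lexical_search(q), vector_search(q)])[:5]
```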
Tool use is the more durable pattern
Letting the model call tools (functions) is increasingly the architecture that holds up. The model becomes the orchestration layer; the tools do the actual work. The pattern:
- Define a small, well-named set of tools — search, fetch, compute, write
- Each tool has a strict input schema and a structured output
- The model is given the toolset and instructed to use them when relevant
- The application validates and executes the tool calls; results go back to the model
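A sketch of that validate-and-execute loop, assuming an OpenAI-style tool-calling API; the tool name, its schema, and the model are illustrative:

```python
import json
from openai import OpenAI

client = OpenAI()

TOOLS = [{
    "type": "function",
    "function": {
        "name": "search_orders",
        "description": "Search orders by customer email.",
        "parameters": {
            "type": "object",
            "properties": {"email": {"type": "string"}},
            "required": ["email"],
        },
    },
}]

def search_orders(email: str) -> list[dict]:
    # Stand-in: the real tool queries your system of record.
    return []

def run(messages: list[dict]) -> str:
    while True:
        resp = client.chat.completions.create(model="gpt-4o-mini",
                                              messages=messages, tools=TOOLS)
        msg = resp.choices[0].message
        if not msg.tool_calls:
            return msg.content                    # no more tool work requested
        messages.append(msg)
        for call in msg.tool_calls:
            args = json.loads(call.function.arguments)  # validate before executing
            result = search_orders(**args)
            messages.append({"role": "tool", "tool_call_id": call.id,
                             "content": json.dumps(result)})
```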
Tool use survives model upgrades better than RAG-only architectures. The application's value is in the tools (which you own) and the orchestration (which the model handles).
Agents — the careful version
Multi-step agents (model plans, executes, evaluates, replans) are useful for narrow domains where the cost of incorrect autonomy is bounded:
- Code generation with test feedback (write code, run tests, fix failures)
- Data analysis with iterative queries (ask, examine, refine)
- Customer support triage (gather information, classify, route)
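A sketch of the bounded loop for the first example (code generation with test feedback); `generate_patch`, the output file, and the step limit are stand-ins for whatever your agent actually uses:

```python
import subprocess

MAX_STEPS = 5  # bound the autonomy: hand back to a human after this

def generate_patch(task: str, feedback: str) -> str:
    # Stand-in: call the model with the task plus prior test output.
    return ""

def run_agent(task: str) -> bool:
    feedback = ""
    for _ in range(MAX_STEPS):
        patch = generate_patch(task, feedback)
        with open("solution.py", "w") as f:
            f.write(patch)
        result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
        if result.returncode == 0:
            return True                            # intermediate signal: tests pass
        feedback = result.stdout + result.stderr   # feed failures back in
    return False                                   # stuck: escalate, stop burning tokens
```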
Where they fail:
- Long-horizon tasks without good intermediate signals — the agent drifts off-task
- Tasks where every step has irreversible side effects — the cost of a bad step compounds
- Tasks where the user expects determinism — agents are inherently non-deterministic
The moat — what's actually defensible
The model is not your moat. The model upgrades for everyone simultaneously. What is defensible:
- Proprietary data and the right to use it — the corpus and its rights structure
- Domain-specific evaluation — the eval set that lets you ship reliably in your domain
- Workflow integration — the user's existing tools, processes, deployments
- Trust and accountability — being the company that takes responsibility when the model is wrong
- Enterprise-grade plumbing — auth, audit, compliance, multi-tenancy
The LLM application that competes on "we have the best prompt" is the one that loses next quarter to someone with the same prompt and a better business.
The cost conversation
LLM costs are real and they scale with usage:
- Cache aggressively — same query, same answer, no API call
- Right-size the model — don't use the most powerful model for tasks the cheaper model handles
- Limit the context window — most production calls don't need 200k tokens
- Stream where the user is waiting; batch where they're not
- Track per-feature spend — cost attribution is essential
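For the first item, a sketch of what "same query, same answer, no API call" looks like; a real deployment would back this with Redis or similar, the in-process dict just shows the shape:

```python
import hashlib
import json

_cache: dict[str, str] = {}

def cached_completion(client, model: str, messages: list[dict]) -> str:
    # Key on the exact model + messages so only identical calls are reused.
    key = hashlib.sha256(json.dumps({"model": model, "messages": messages},
                                    sort_keys=True).encode()).hexdigest()
    if key not in _cache:
        resp = client.chat.completions.create(model=model, messages=messages)
        _cache[key] = resp.choices[0].message.content
    return _cache[key]
```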
One pattern we'd warn about
The "wrap everything in an LLM" temptation. If a deterministic algorithm can do the job, use it. LLMs for the parts that genuinely need natural language understanding; SQL / code / regex for everything else.
One pattern that always pays off
Logging the full conversation (input, intermediate steps, output, model version, latency, cost) for every production call. Enables eval set construction, regression debugging, and cost optimisation. Storage is cheap; replayability is gold.
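A sketch of what that record can look like; the field names are illustrative, the point is that every production call emits one:

```python
import json
import time
from dataclasses import dataclass, field, asdict

@dataclass
class LLMCallLog:
    request_id: str
    model_version: str
    messages: list[dict]            # full input, including system prompt
    intermediate_steps: list[dict]  # tool calls, retrieved chunks, retries
    output: str
    latency_ms: float
    cost_usd: float
    timestamp: float = field(default_factory=time.time)

def log_call(record: LLMCallLog) -> None:
    # Append-only JSONL is enough to start: it replays and it diffs.
    with open("llm_calls.jsonl", "a") as f:
        f.write(json.dumps(asdict(record)) + "\n")
```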
What's your LLM stack? And — for the agent folks — what's the longest-horizon task you've had reliably autonomous in production?