What “Grounded” Means in an LLM Workflow
A grounded workflow is a process where an LLM’s outputs are explicitly tied to verifiable sources of truth—documents, databases, APIs, logs, or other authoritative systems—so that the model is not asked to “invent” facts. In practice, grounding means two things: (1) the model is given access to relevant evidence at the time it generates an answer, and (2) the workflow enforces checks so the final output stays consistent with that evidence.
Grounding is not a single feature; it is a design pattern. You decide what the truth sources are, how the model can access them, and what rules prevent unsupported claims. A grounded workflow often includes retrieval (getting the right evidence), transformation (summarizing or extracting), reasoning (connecting evidence to the user’s request), and verification (ensuring the response is supported and safe).
Examples of “truth sources” commonly used for grounding include: internal policy documents, product manuals, a CRM database, a ticketing system, a code repository, a pricing API, a knowledge base, or a curated set of FAQs. The key is that the workflow treats these sources as the authority, and the LLM as a language interface that must cite, quote, or otherwise anchor its statements to what it was given.
Core Principles for Designing Grounded Workflows
1) Separate “knowledge” from “language”
In a grounded workflow, the LLM is primarily responsible for language tasks: interpreting the user's intent, drafting a response, restructuring information, and explaining. The workflow and its connected systems are responsible for knowledge: fetching the right records, enforcing business rules, and determining what is allowed to be said.
This separation reduces risk. If the model is asked to recall policy details from memory, it may be wrong. If the workflow fetches the policy section and the model is asked to explain it, the model’s job becomes much more reliable.
2) Make evidence explicit and inspectable
Grounding works best when the evidence is visible to the system and (when appropriate) to the user. Internally, you should be able to log: what sources were retrieved, which passages were used, and what version of the document was referenced. Externally, you may show citations, quotes, or links.
Inspectability is essential for debugging. If an answer is wrong, you need to know whether retrieval failed (wrong evidence), the evidence was outdated (wrong source), or generation failed (misinterpretation).
3) Constrain outputs to what is supported
A grounded workflow should include rules like: “If the evidence does not contain the answer, say you don’t know and propose next steps.” Another common rule: “Do not provide numbers unless they appear in the evidence.” These constraints can be implemented through instructions, structured output formats, and post-generation checks.
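For instance, the "no unsupported numbers" rule can be enforced with a small post-generation check. The sketch below is illustrative: the function name and the evidence format are assumptions, not part of any particular framework.

import re

def unsupported_numbers(answer: str, evidence_texts: list[str]) -> list[str]:
    """Return numbers that appear in the answer but in none of the evidence passages."""
    evidence_blob = " ".join(evidence_texts)
    numbers_in_answer = re.findall(r"\d+(?:\.\d+)?", answer)
    return [n for n in numbers_in_answer if n not in evidence_blob]

# Usage: if the list is non-empty, reject the draft and ask the model to revise.
issues = unsupported_numbers(
    "The refund window is 14 days.",
    ["Customers may request a refund within 14 days..."],
)
assert issues == []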
4) Prefer structured data when possible
If the user asks, “What is the current price of Plan X?” the best grounding is a pricing API or database query, not a paragraph from a PDF. Unstructured documents are useful, but structured sources reduce ambiguity and make verification easier.
5) Design for change: versioning and freshness
Policies, product specs, and prices change. Grounded workflows should track document versions, timestamps, and data freshness. If the workflow cannot guarantee freshness, it should disclose that limitation (e.g., “Based on the latest policy document retrieved, last updated on…”).
A Practical Blueprint: The Grounded Answer Pipeline
The following pipeline is a common, adaptable structure. You can implement it for customer support, internal Q&A, compliance assistance, analytics explanations, and more.
Step 1: Classify the request and choose a route
Start by identifying what kind of request it is, because different request types require different grounding strategies. Typical routes include:
- Document Q&A (needs retrieval from a knowledge base)
- Record lookup (needs database/API access)
- Procedure guidance (needs policy + step list)
- Drafting (needs templates + constraints)
- Computation (needs a calculator/tool, not free-form text)
Routing can be done with a lightweight classifier (rules or an LLM) that outputs a structured label, such as {"route":"policy_qa"}. The route determines which tools and sources are allowed.
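As a minimal sketch, routing can start with keyword rules and fall back to a default route. The route names and keywords below are hypothetical; a production system might replace this with an LLM classifier that returns the same structured label.

# Minimal routing sketch (hypothetical routes and keywords).
ROUTE_KEYWORDS = {
    "policy_qa": ["policy", "allowed", "refund", "eligib"],
    "record_lookup": ["order", "account", "customer", "status"],
    "computation": ["how many", "total", "difference", "calculate"],
}

def route_request(question: str) -> dict:
    q = question.lower()
    for route, keywords in ROUTE_KEYWORDS.items():
        if any(k in q for k in keywords):
            return {"route": route}
    return {"route": "document_qa"}  # default route when no rule matches

print(route_request("Can I get a refund within 14 days?"))  # {'route': 'policy_qa'}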
Step 2: Define the “answer contract” (what the output must contain)
Before retrieving anything, define the response format and requirements. For example, a policy Q&A answer contract might require:
- A short direct answer
- Supporting quotes from the policy
- Citations (document name + section)
- Any exceptions or conditions
- If insufficient evidence: a clear “not found” response and suggested next steps
By setting an answer contract, you reduce the chance the model fills gaps with speculation.
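One way to make the contract concrete is to express it as a schema that the generation step must fill. The field names below are illustrative, assuming the policy Q&A contract described above.

# A policy Q&A answer contract expressed as a dataclass (illustrative field names).
from dataclasses import dataclass, field

@dataclass
class PolicyAnswer:
    direct_answer: str                                    # short direct answer
    quotes: list[str] = field(default_factory=list)       # supporting quotes from the policy
    citations: list[str] = field(default_factory=list)    # document name + section
    exceptions: list[str] = field(default_factory=list)   # exceptions or conditions
    not_found: bool = False                               # set when evidence is insufficient
    next_steps: list[str] = field(default_factory=list)   # suggested follow-ups when not_found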
Step 3: Retrieve evidence (documents, records, or both)
Retrieval is the step where the workflow collects candidate evidence. The design choice here is not only “how to search,” but also “how much to retrieve” and “how to filter.” Practical tactics include:
- Query rewriting: transform the user question into a search-friendly query (e.g., expand acronyms, include product names, add synonyms).
- Scoped retrieval: limit search to relevant collections (e.g., only HR policies, only the “EU region” docs).
- Metadata filters: filter by version, department, region, product line, or effective date.
- Top-k retrieval: retrieve multiple candidates, not just one, to reduce missed context.
When retrieving from structured systems (databases/APIs), retrieval should be parameterized and validated. For example, if the user asks for “customer 123,” the workflow should confirm that “123” matches the expected ID format and that the user is authorized to access that record.
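A sketch of that validation, assuming a hypothetical access-control map and ID format, might look like this; the real parameterized query or API call would replace the placeholder return.

import re

AUTHORIZED_CUSTOMERS = {"agent_42": {"123", "456"}}  # hypothetical ACL: agent -> customer IDs

def fetch_customer_record(agent_id: str, customer_id: str) -> dict:
    # Validate the ID format before it ever reaches a query.
    if not re.fullmatch(r"\d{1,10}", customer_id):
        raise ValueError("customer_id does not match the expected format")
    # Enforce authorization before retrieval, not after.
    if customer_id not in AUTHORIZED_CUSTOMERS.get(agent_id, set()):
        raise PermissionError("agent is not authorized for this customer")
    # Placeholder for the real parameterized query or API call.
    return {"customer_id": customer_id, "status": "active"}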
Step 4: Normalize and prepare evidence for the model
Raw evidence often needs cleanup. Documents may contain headers, footers, duplicated sections, or irrelevant boilerplate. Records may contain fields the model should not see. Evidence preparation typically includes:
- Redaction: remove sensitive fields (e.g., personal identifiers) unless explicitly needed and authorized.
- Chunk selection: keep only the most relevant passages.
- Canonical formatting: convert evidence into a consistent structure (e.g., Source, Excerpt, URL, LastUpdated).
Well-structured evidence makes it easier to enforce grounding. A common pattern is to provide evidence as a list of items, each with an ID that can be cited later.
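A minimal preparation sketch, assuming a hypothetical redaction list and raw items that carry either a text excerpt or structured fields, could look like this.

SENSITIVE_FIELDS = {"email", "phone", "national_id"}  # hypothetical redaction list

def prepare_evidence(raw_items: list[dict]) -> list[dict]:
    """Redact sensitive fields and convert each item to a canonical, citable shape."""
    prepared = []
    for i, item in enumerate(raw_items, start=1):
        fields = {k: v for k, v in item.get("fields", {}).items()
                  if k not in SENSITIVE_FIELDS}
        prepared.append({
            "id": f"E{i}",                              # stable ID the model can cite
            "source": item.get("source", "unknown"),
            "excerpt": item.get("excerpt") or fields,   # text passage or allowlisted fields
            "last_updated": item.get("last_updated"),
        })
    return prepared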
Step 5: Generate an answer that must cite evidence
Now the LLM produces the response under strict instructions: use only the provided evidence, cite the evidence IDs, and do not guess. The model should be asked to produce both the user-facing answer and a machine-checkable representation of which evidence supports which claims.
Example of a structured response format:
{"answer":"...","citations":[{"claim":"...","evidence_ids":["E2","E5"]}],"unknowns":["..."]}This structure enables automated checks in the next step.
Step 6: Verify grounding and enforce constraints
Verification can be lightweight or strict depending on risk. Common checks include:
- Citation presence: every factual claim must have at least one evidence ID.
- Quote match: for high-stakes cases, require direct quotes for key statements.
- Numeric validation: numbers in the answer must appear in evidence or be computed from structured data with a logged calculation.
- Policy constraints: ensure the answer complies with content rules (e.g., includes required legal disclaimers, excludes restricted internal information).
- Freshness checks: if evidence is older than a threshold, require a warning or a re-fetch.
If verification fails, the workflow can: (1) retrieve more evidence, (2) ask the model to revise with stricter constraints, or (3) escalate to a human.
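Two of these checks, citation presence and freshness, can be sketched as deterministic code. The response shape follows the structured format from Step 5; the threshold and evidence field names are assumptions.

from datetime import date, timedelta

MAX_EVIDENCE_AGE = timedelta(days=180)  # hypothetical freshness threshold

def verify_response(response: dict, evidence: dict[str, dict], today: date) -> list[str]:
    """Return verification failures; an empty list means the draft passes.
    `evidence` maps evidence IDs to item metadata including a last_updated date."""
    failures = []
    # Citation presence: every claim must cite at least one known evidence ID.
    for c in response.get("citations", []):
        ids = c.get("evidence_ids", [])
        if not ids:
            failures.append(f"claim without evidence: {c.get('claim')}")
        elif any(eid not in evidence for eid in ids):
            failures.append(f"claim cites unknown evidence ID: {c.get('claim')}")
    # Freshness: flag any evidence older than the threshold.
    for eid, item in evidence.items():
        updated = date.fromisoformat(item["last_updated"])
        if today - updated > MAX_EVIDENCE_AGE:
            failures.append(f"stale evidence: {eid} last updated {item['last_updated']}")
    return failures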
Step 7: Log, monitor, and iterate
Grounded workflows improve over time when you log the right artifacts: the user question, route, retrieved evidence IDs, the final answer, verification results, and user feedback. These logs help you identify patterns like “retrieval often misses the right doc for this product line” or “answers fail numeric validation when the evidence is a scanned PDF.”
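A logging sketch, assuming a JSON-lines file stands in for a real log store, might capture these artifacts per interaction.

import json, time, uuid

def log_interaction(question, route, evidence_ids, answer, verification_failures, feedback=None):
    """Append one structured record per interaction to a JSON-lines file."""
    record = {
        "id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "question": question,
        "route": route,
        "evidence_ids": evidence_ids,
        "answer": answer,
        "verification_failures": verification_failures,
        "user_feedback": feedback,
    }
    with open("grounded_workflow.log.jsonl", "a") as f:
        f.write(json.dumps(record) + "\n")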
Design Patterns You Can Reuse
Pattern A: Retrieval-Augmented Response with Citations
Use this when the truth source is primarily unstructured text (policies, manuals, runbooks). The key design decisions are: how to scope retrieval, how to format evidence, and how to enforce citations.
Practical checklist:
- Maintain a curated document set with clear ownership and update cadence.
- Attach metadata: department, region, effective date, version, and audience.
- Require citations per paragraph or per claim.
- When evidence conflicts, instruct the model to surface the conflict and prefer the newest effective version.
Pattern B: Tool-First, LLM-Second
Use this when the answer depends on live or structured data (account status, inventory, pricing). The workflow should call tools first, then ask the LLM to explain results.
Example: A user asks, “Is my order delayed and why?” The workflow should query order status and shipment events, then have the LLM summarize in plain language. The LLM should not infer delay reasons without an event record that supports it.
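A tool-first sketch for this example, assuming hypothetical billing_api and llm interfaces (the method names are placeholders, not a real API), keeps the model out of the fact-finding path.

def answer_order_delay(order_id: str, billing_api, llm) -> str:
    """Tool-first: fetch facts with tools, then ask the model only to explain them."""
    status = billing_api.get_order_status(order_id)      # structured fact (hypothetical call)
    events = billing_api.get_shipment_events(order_id)   # structured facts (hypothetical call)
    prompt = (
        "Explain the order status to the customer in plain language. "
        "Use only the facts below; if no event explains a delay, say the reason is not recorded.\n"
        f"Status: {status}\nEvents: {events}"
    )
    return llm.generate(prompt)  # hypothetical generation call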
Pattern C: Extract-Then-Compose
Use this when you need high precision. Instead of asking the model to “answer the question,” ask it to extract specific fields from evidence (e.g., eligibility criteria, deadlines, exceptions), then compose the final response from those extracted fields.
This reduces the chance of the model blending multiple sources incorrectly. It also makes verification easier because each extracted field can be checked against evidence.
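A two-pass sketch, assuming a hypothetical llm client with a structured-output helper, separates extraction from composition.

EXTRACTION_FIELDS = ["eligibility_criteria", "deadline", "exceptions"]  # fields to pull from evidence

def extract_then_compose(evidence: list[dict], llm) -> str:
    # Pass 1: extraction only. The model fills named fields, each with its supporting evidence ID.
    extraction_prompt = (
        "From the evidence items below, extract these fields as JSON: "
        f"{EXTRACTION_FIELDS}. For each field, include the evidence ID it came from. "
        "If a field is not present in the evidence, return null for it.\n"
        f"Evidence: {evidence}"
    )
    extracted = llm.generate_json(extraction_prompt)  # hypothetical structured-output call
    # (Verification of each extracted field against its cited evidence would happen here.)
    # Pass 2: composition only. The model writes prose from the verified fields, nothing else.
    compose_prompt = f"Write a short answer using only these verified fields: {extracted}"
    return llm.generate(compose_prompt)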
Pattern D: Decision Support with Explicit Uncertainty
Some tasks involve incomplete evidence. A grounded workflow can still help by clearly separating: what is known from evidence, what is unknown, and what actions would resolve the unknowns. The output contract might require an unknowns list and a next_steps list.
This pattern is especially useful for internal operations: “Based on the logs we retrieved, we can confirm X and Y. We cannot confirm Z because the audit log is missing for that time window.”
Step-by-Step Example: Grounded Customer Support Answer
Scenario
A customer asks: “Can I cancel my subscription and get a refund if I’m within 14 days?” Your truth sources are: a refund policy document and the customer’s subscription record.
Step 1: Route
Classify as policy_qa + account_lookup because the answer depends on both policy and the customer’s purchase date.
Step 2: Answer contract
- State whether the customer is eligible
- Quote the relevant policy clause
- Include the customer’s purchase date and day count (computed)
- If not eligible, explain why and provide alternatives
Step 3: Retrieve evidence
- Fetch policy sections related to cancellations, refunds, and time windows
- Fetch the customer’s subscription start date and current status from the billing system
Step 4: Prepare evidence
Provide the model with a compact evidence bundle:
E1: {"source":"Refund Policy v3","section":"2.1","excerpt":"Customers may request a refund within 14 days of purchase...","last_updated":"2025-09-01"} E2: {"source":"Billing API","field":"purchase_date","value":"2026-01-03"} E3: {"source":"Billing API","field":"status","value":"active"}Step 5: Generate with citations
Instruct the model: “Use only E1–E3. If you compute day count, show the calculation basis. Cite E1 for policy and E2 for dates.”
Step 6: Verify
- Check that the refund eligibility statement cites E1 and E2
- Check that the day count is correct (computed by code, not by the model)
- Check that no additional policy claims appear without citations
Notice the design choice: the workflow should compute “days since purchase” with deterministic code and pass the result to the model, rather than asking the model to do date arithmetic.
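A minimal version of that deterministic computation, using the E2 purchase date above and a hypothetical evaluation date, might be:

from datetime import date

def days_since_purchase(purchase_date_iso: str, today: date) -> int:
    """Deterministic date arithmetic; the result is passed to the model, not computed by it."""
    return (today - date.fromisoformat(purchase_date_iso)).days

# Example with the E2 value above and a hypothetical "today" of 2026-01-12 -> 9 days.
assert days_since_purchase("2026-01-03", date(2026, 1, 12)) == 9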
Common Failure Modes and How to Design Against Them
Failure Mode 1: Retrieval returns plausible but wrong evidence
If your search retrieves a similar policy for a different region or product, the model may confidently answer using the wrong rules. Mitigations:
- Use metadata filters (region, product, audience) as mandatory constraints.
- Require the model to restate the scope of the evidence it used (e.g., “This applies to EU customers”).
- When scope is missing, force a clarification question before answering.
Failure Mode 2: Evidence is correct but incomplete
The model may fill gaps. Mitigations:
- Enforce an “insufficient evidence” path in the answer contract.
- Retrieve more broadly when confidence is low (increase top-k, expand query).
- Add a second pass that asks the model: “What information is missing to answer fully?” and then retrieve that information.
Failure Mode 3: Conflicting sources
Two documents may disagree due to version drift. Mitigations:
- Prefer sources with higher authority (e.g., official policy over a wiki summary).
- Prefer the newest effective date when both are authoritative.
- Require the model to surface conflicts rather than silently choosing.
Failure Mode 4: The model cites evidence but misrepresents it
Citations alone do not guarantee correctness. Mitigations:
- For critical claims, require direct quotes.
- Use automated checks that compare key phrases or numbers in the answer to the cited excerpts.
- Use a verification pass: a separate prompt that asks the model to judge whether each claim is supported by the cited evidence, returning a pass/fail per claim.
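The verification pass in the last mitigation can be sketched as a per-claim judge call; the llm client and prompt wording below are assumptions.

def claims_supported(response: dict, evidence: dict[str, str], llm) -> dict[str, bool]:
    """Second pass: ask the model to judge, per claim, whether the cited excerpts support it."""
    verdicts = {}
    for c in response.get("citations", []):
        excerpts = [evidence[eid] for eid in c.get("evidence_ids", []) if eid in evidence]
        prompt = (
            "Does the evidence below fully support the claim? Answer only 'pass' or 'fail'.\n"
            f"Claim: {c['claim']}\nEvidence: {excerpts}"
        )
        verdicts[c["claim"]] = llm.generate(prompt).strip().lower() == "pass"  # hypothetical call
    return verdicts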
Failure Mode 5: Sensitive data leakage
Grounding can accidentally increase exposure if retrieval pulls sensitive content. Mitigations:
- Apply access control before retrieval (who can ask, what they can see).
- Redact sensitive fields during evidence preparation.
- Use allowlists of fields for structured records.
- Log and audit what evidence was retrieved and shown.
Designing the Evidence Pack: A Practical Template
The “evidence pack” is the bundle you pass to the model. A good evidence pack is compact, structured, and includes metadata that helps the model avoid mistakes.
Recommended fields per evidence item:
- id: stable identifier like E7
- source: document name or system name
- type: policy, manual, ticket, database_record, api_response
- scope: region/product/audience if applicable
- timestamp: last updated or retrieval time
- excerpt: the relevant text or fields
- url/path: link for humans (optional)
Example evidence pack in JSON-like form:
[{"id":"E1","source":"Security Runbook","type":"manual","scope":"Prod","timestamp":"2026-01-10","excerpt":"If login failures exceed 5/min, check rate limiter..."},{"id":"E2","source":"Auth Logs","type":"log","timestamp":"2026-01-13T09:10Z","excerpt":"rate_limiter_triggered=true; threshold=5/min"}]This structure makes it easier to require precise citations and to run automated checks.
Grounded Workflows for Multi-Step Tasks (Plans, Reports, and Analyses)
Grounding is not only for Q&A. It is also essential when the model produces multi-step outputs like project plans, incident reports, or compliance summaries. The risk in multi-step tasks is that early assumptions propagate. A grounded design keeps each step tied to evidence and makes assumptions explicit.
Pattern: Plan with “Inputs, Assumptions, Evidence, Actions”
Require the model to structure plans like this:
- Inputs (evidence-backed): facts pulled from sources
- Assumptions (explicit): anything not supported by evidence
- Actions: steps to take
- Open questions: what must be confirmed
Then enforce a rule: actions must reference either an input or an assumption. This prevents the model from adding steps that have no basis.
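That rule is easy to enforce mechanically if inputs, assumptions, and actions carry IDs; the field names in this sketch (inputs, assumptions, actions, based_on) are hypothetical.

def validate_plan(plan: dict) -> list[str]:
    """Every action must reference at least one known input or assumption by ID."""
    known_ids = ({item["id"] for item in plan.get("inputs", [])}
                 | {item["id"] for item in plan.get("assumptions", [])})
    problems = []
    for action in plan.get("actions", []):
        refs = set(action.get("based_on", []))
        if not refs or not refs <= known_ids:
            problems.append(f"action with no valid basis: {action.get('description')}")
    return problems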
Step-by-step workflow for a grounded report
- 1) Collect: retrieve relevant tickets, logs, and policy excerpts.
- 2) Extract: pull key fields (dates, systems, error codes) into a structured table.
- 3) Compose: generate the narrative report using only extracted fields and cited excerpts.
- 4) Verify: check that every timeline entry maps to a log/ticket ID; check that remediation steps align with the runbook.
Implementation Notes: Making Grounding Work in Real Systems
Use deterministic code for calculations and transformations
Whenever the workflow needs arithmetic, date differences, sorting, filtering, or formatting that must be correct, do it with code and pass the results to the model. The model can explain the result, but it should not be the calculator.
Prefer constrained outputs for machine consumption
If the output will trigger actions (creating a ticket, updating a record, sending an email), require structured output. For example, generate:
{"action":"create_ticket","priority":"high","summary":"...","evidence_ids":["E2","E4"]}Then validate fields with a schema before executing anything.
Design escalation paths
A grounded workflow should know when to stop. Define thresholds for escalation, such as:
- No relevant evidence found
- Conflicting evidence with no clear authority
- High-risk domain (financial, medical, legal) requiring human review
- User requests an action that requires approval
Escalation can mean asking clarifying questions, handing off to a human agent, or generating a draft for review rather than a final answer.
Measure grounding quality with operational metrics
Beyond general quality metrics, grounded workflows benefit from specific measures:
- Evidence coverage: percentage of factual claims with citations
- Retrieval success rate: how often the correct source appears in top-k
- Verification pass rate: how often responses pass automated checks
- Freshness compliance: how often evidence meets recency requirements
- Escalation rate: how often the system correctly refuses or escalates
These metrics point directly to what to fix: retrieval, evidence preparation, generation constraints, or verification rules.
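A sketch of computing two of these metrics from the interaction log, assuming the hypothetical field names used in the logging sketch earlier (verification_failures, plus an escalated flag), might be:

def grounding_metrics(records: list[dict]) -> dict:
    """Compute verification pass rate and escalation rate from interaction logs.
    Evidence coverage and retrieval success need labeled claims/sources and are omitted here."""
    total = len(records) or 1
    return {
        "verification_pass_rate": sum(1 for r in records if not r.get("verification_failures")) / total,
        "escalation_rate": sum(1 for r in records if r.get("escalated")) / total,
    }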