RAG Clarification Loop: Ask Once, Learn the Default

This article is part of Enterprise Document Intelligence, a series that builds an enterprise RAG system from four bricks: parsing, question parsing, retrieval, and generation.

It extends Article 6 (question parsing) on the case where the question is not precise enough: ask one focused clarification, learn the default from the answer, stay silent next time.

Where this article sits in the series: a companion extending Article 6 (question parsing)

The question-parsing brick turns the user’s text into a typed ParsedQuestion. This companion picks up the failure mode that brick names in one bullet and develops it as its own pattern. The question is missing a piece of information the system needs (which document? which page? which clause type?). The cheap fix is to ask. The right fix is to ask, then learn the default so the next case is silent. Two Pydantic schemas and one short loop close the gap.

The question-parsing brick sketches the pipeline: the user types free text, question parsing produces a typed ParsedQuestion, the dispatcher routes on the typed fields, retrieval scopes the corpus. This companion expands one bullet inside that sketch: when ParsedQuestion has a missing or low-confidence field, the system can either (a) silently infer a default, (b) refuse and ask the user, or (c) do both with a learned policy. The third option is the production pattern. This companion ships the contract and a worked broker example.

1. The Failure Mode the Main Article Only Mentions

The question-parsing brick covers the happy path. The user types “what is the deductible on Acme Premier?”, the question parser identifies the entity (Acme Premier), the intent (deductible lookup), and the schema field to fill (deductible_amount), and the dispatcher routes. Most production traffic does not fit the happy path.

The common failures on a single uploaded contract, ordered by frequency on real broker traffic:

Ambiguous field type: “what is the limit?”. Most contracts have several: coverage limit per occurrence, aggregate limit, sub-limit per peril, claim deductible. The system has to guess which one.
Missing page scope: “what does it say?” on a 200-page document. Where in the document: the summary, the exclusions, the schedule? The system can answer if it knows where to look.
Ambiguous date scope: “what is the deductible on the home contents cover?” on a contract with an old schedule and a renewal endorsement. Which schedule applies?
Ambiguous intent: “the warranty section”. Read it to cite a clause, summarise it, or extract the conditions? Each path uses different bricks downstream.
Implicit entity: “the policyholder” on a contract that lists a corporate insured, a beneficiary, and an additional named insured. Which role does the user mean?

Every one of these is a question about one document the user already pinned. The corpus-level version of the same failure (which document? which policy in a portfolio?) lives one layer up and is touched on at the end of section 6.

The typed ParsedQuestion has the fields. What is missing is the loop that fills them when the user did not.
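A minimal sketch of the detection step that starts the loop, assuming a hypothetical shape for ParsedQuestion in which each field carries its own confidence. ParsedField, fields_needing_clarification, and the 0.7 cut-off are illustrative assumptions, not the schema from the main article, and plain dataclasses stand in for the Pydantic models so the snippet runs without dependencies.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ParsedField:
    """Hypothetical per-field result: a value plus the parser's confidence."""
    value: Optional[str]
    confidence: float  # 0..1


@dataclass
class ParsedQuestion:
    """Hypothetical stand-in for the typed ParsedQuestion."""
    fields: dict[str, ParsedField] = field(default_factory=dict)


def fields_needing_clarification(pq: ParsedQuestion, threshold: float = 0.7) -> list[str]:
    """Return the names of fields that are missing or below the threshold."""
    return [
        name
        for name, parsed in pq.fields.items()
        if parsed.value is None or parsed.confidence < threshold
    ]


# "what is the limit?": the intent is clear, the limit type and page scope are not.
pq = ParsedQuestion(fields={
    "intent": ParsedField("field_lookup", 0.95),
    "limit_type": ParsedField("aggregate_limit", 0.41),
    "source_page": ParsedField(None, 0.0),
})
needs = fields_needing_clarification(pq)
```

Here needs contains limit_type and source_page; each of those names is what the loop would go on to resolve, either by asking or by applying a learned default.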

2. The Two-Pydantic-Schema Contract

Two structured objects do the work. The first is a ClarificationRequest the system emits when a field on ParsedQuestion is below the confidence threshold. The second is a ClarificationDefault the system stores after each request, so the next equivalent question is answered without asking.

from datetime import datetime
from pydantic import BaseModel, Field

class ClarificationRequest(BaseModel):
    """Emitted when a ParsedQuestion field is below confidence threshold."""
    target_field: str                          # field on ParsedQuestion to fill
    question_to_user: str                      # plain-English question to show
    candidate_values: list[str]                # values the system can propose
    proposed_default: str | None = None        # the value the system would pick
    proposed_default_reason: str | None = None # one-sentence why
    audit: dict = Field(default_factory=dict)  # request_id, model, prompt_version

class ClarificationDefault(BaseModel):
    """The learned answer, refreshed across requests."""
    target_field: str                          # which ParsedQuestion field
    doctype: str                               # broker_contract, invoice, ...
    sub_conditions: dict = Field(default_factory=dict)   # stratifying keys
    candidate_votes: dict[str, float]          # value -> weighted vote count
    confidence: float                          # 0..1, drives ask/apply decision
    sample_size: int
    last_refreshed: datetime

The first object is the request to the user. The second is what the system learns from many requests, so it stops asking the easy ones.
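One way the two objects could connect, sketched under assumptions: the stored candidate_votes of a learned default feed the proposed_default and proposed_default_reason of the next request. The helper propose_default, the vote numbers, the wording of the reason, and the request_id value are all hypothetical. The request is built as a plain dict of keyword arguments so the snippet runs without Pydantic.

```python
from typing import Optional


def propose_default(candidate_votes: dict[str, float]) -> tuple[Optional[str], Optional[str]]:
    """Return (proposed_default, proposed_default_reason) from stored votes.

    Returns (None, None) when no candidate has a positive vote count.
    """
    positive = {value: n for value, n in candidate_votes.items() if n > 0}
    if not positive:
        return None, None
    best = max(positive, key=positive.get)
    reason = f"'{best}' was confirmed most often on earlier requests."
    return best, reason


# Illustrative vote counts for the ambiguous "what is the limit?" question.
votes = {
    "coverage_limit_per_occurrence": 7.5,
    "aggregate_limit": 2.0,
    "sub_limit_per_peril": 0.5,
}
proposed, reason = propose_default(votes)

# Keyword arguments matching the ClarificationRequest fields.
request_fields = {
    "target_field": "limit_type",
    "question_to_user": "This contract has several limits. Which one do you mean?",
    "candidate_values": sorted(votes, key=votes.get, reverse=True),
    "proposed_default": proposed,
    "proposed_default_reason": reason,
    "audit": {"request_id": "req-0001"},  # illustrative value
}
```

With the schemas defined earlier in this section, ClarificationRequest(**request_fields) would validate this payload.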

3. The Worked Broker Example

The clarification loop fires once per request, not once per conversation turn. Each request below is a separate event over time: the user uploads a contract, asks one question, the system either asks for clarification or applies a learned default, the answer ships. The next request can be days later. This is not a multi-turn conversation (V2 Bonus B04 covers that pattern separately).

The user is a junior claims adjuster at the broker. She uploads a new contract and types “qui est l’assureur?” (who is the insurer?). The system handles the request:

Case 1 (first time the system sees this user / this contract type). ParsedQuestion’s target_field parses as insurer_name. The system has no learned default for where to look. It opens a ClarificationRequest:

I will look on page 1, since that is where the insurer is usually named on a broker contract. Is that the right starting point?

The user clicks Yes. The system reads page 1, finds the insurer, answers. A ClarificationDefault is written: for target_field = insurer_name on doctype = broker_contract, the default source_page = 1 gets a +1 vote.

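Case 2 (a week later, a different contract). Same question shape: “who is the insurer?”. The system reads its learned defaults. source_page = 1 is the recommended default with confidence 0.78 from 12 prior cases. That sits in the middle band of the gate described in section 4, where the system usually applies the default and only occasionally asks; on this request it applies the default silently and answers. No clarification fired.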

Case 12 (a contract where page 1 is a coversheet, not the body). Page 1 has no insurer name. The system reads source_page = 1 from learned defaults, fails, detects the failure (the schema field comes back null), and falls back to asking:

Page 1 did not name an insurer on this contract. Should I try the table of contents to find where it is named, or do you want to point me to a page?

The user says try TOC. The system reads the TOC, finds the insurer-information section, retrieves, and answers. The learned default is now stratified: source_page = 1 for broker contracts with page_1_kind = body, source_page = TOC for broker contracts with page_1_kind = coversheet. The classifier for page_1_kind is a small learned column.
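The stratified lookup in Case 12 can be sketched as follows, assuming that the most specific matching row wins. DefaultRow is a reduced, hypothetical stand-in for a ClarificationDefault row (it keeps only the keys and the winning value), and lookup and doc_features are illustrative names rather than part of the series' code.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DefaultRow:
    """Stand-in for one learned-default row, reduced to what lookup needs."""
    target_field: str
    doctype: str
    value: str
    sub_conditions: dict[str, str] = field(default_factory=dict)


def lookup(
    rows: list[DefaultRow],
    target_field: str,
    doctype: str,
    doc_features: dict[str, str],
) -> Optional[DefaultRow]:
    """Return the most specific row whose sub_conditions all match the document."""
    matches = [
        row for row in rows
        if row.target_field == target_field
        and row.doctype == doctype
        and all(doc_features.get(k) == v for k, v in row.sub_conditions.items())
    ]
    # More sub-conditions means a more specific row; prefer it over a generic one.
    return max(matches, key=lambda row: len(row.sub_conditions), default=None)


rows = [
    DefaultRow("insurer_name", "broker_contract", "1", {"page_1_kind": "body"}),
    DefaultRow("insurer_name", "broker_contract", "TOC", {"page_1_kind": "coversheet"}),
]
hit = lookup(rows, "insurer_name", "broker_contract", {"page_1_kind": "coversheet"})
```

When no row matches, lookup returns None, which is the signal to fall back to a ClarificationRequest rather than guess.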

4. The Mechanism for Learning the Default

The learned default is a small table, one row per (target_field, doctype, optional sub-conditions). Each row tracks the candidate values the system has tried, the user’s votes (explicit Yes / No, or implicit when the user accepts the answer without correction), and a confidence band.

The update rules:

Explicit user agreement: the user clicks Yes on a proposed default. The default’s vote count increments. Confidence rises.
Implicit acceptance: the system applies a default silently, the answer is judged correct (downstream eval signal from the per-failure-mode evaluation layer), and the user issues no correction. Counted as a soft +1.
Explicit disagreement: the user says No or corrects. The default’s vote count for the proposed value drops, and the candidate the user named gains.
Failure detection: the default’s candidate value returns null from the schema. Counted as a stratification signal, not a vote drop, because the value might be right for some contracts and wrong for others.

The confidence determines whether the system asks or just applies. Below 0.6, always ask. Above 0.85, always apply silently. Between the two thresholds, ask occasionally to refresh the signal.

from datetime import datetime, timezone
from typing import Literal

Signal = Literal["explicit_yes", "explicit_no", "implicit_ok", "failure"]

def update(
    default: ClarificationDefault,
    value: str,
    signal: Signal,
    corrected_value: str | None = None,
) -> ClarificationDefault:
    """Apply one signal to a ClarificationDefault row and return a new row."""
    now = datetime.now(timezone.utc)
    if signal == "failure":
        # Stratification signal only: votes, sample_size and confidence stay
        # untouched, so a null on one contract does not penalise the default.
        return default.model_copy(update={"last_refreshed": now})

    votes = dict(default.candidate_votes)
    if signal == "explicit_yes":
        votes[value] = votes.get(value, 0.0) + 1.0
    elif signal == "explicit_no":
        votes[value] = votes.get(value, 0.0) - 1.0
        if corrected_value is not None:
            # The candidate the user named gains a vote.
            votes[corrected_value] = votes.get(corrected_value, 0.0) + 1.0
    elif signal == "implicit_ok":
        votes[value] = votes.get(value, 0.0) + 0.5  # soft +1

    n_new = default.sample_size + 1
    top = max(votes.values())
    # Share of observed requests backing the leading candidate, clamped to 0..1.
    confidence_new = min(1.0, max(0.0, top) / n_new)
    return default.model_copy(update={
        "candidate_votes": votes,
        "confidence": confidence_new,
        "sample_size": n_new,
        "last_refreshed": now,
    })

def gate(default: ClarificationDefault) -> Literal["apply", "ask_occasionally", "ask"]:
    """Per-row gate: above 0.85 apply silently, below 0.60 ask, in between refresh occasionally."""
    if default.confidence > 0.85:
        return "apply"
    if default.confidence < 0.60:
        return "ask"
    return "ask_occasionally"
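The middle band of the gate says only to ask occasionally, without fixing how often. One possible reading, sketched here as an assumption, is a fixed refresh probability: should_ask, the refresh_rate of 0.2, and the injectable random generator are hypothetical and not part of the article's contract. The function takes the confidence directly so it runs on its own.

```python
import random
from typing import Optional


def should_ask(
    confidence: float,
    refresh_rate: float = 0.2,
    rng: Optional[random.Random] = None,
) -> bool:
    """Resolve the three-way gate into a yes/no decision for one request.

    Below 0.60 always ask, above 0.85 never ask, and in between ask on
    roughly a refresh_rate share of requests to keep the signal fresh.
    """
    if confidence < 0.60:
        return True
    if confidence > 0.85:
        return False
    rng = rng or random.Random()
    return rng.random() < refresh_rate


# Seeded generator so a replay of the same requests makes the same choices.
rng = random.Random(0)
asked = sum(should_ask(0.78, refresh_rate=0.2, rng=rng) for _ in range(10_000))
ask_share = asked / 10_000
```

Injecting the generator keeps the decision reproducible in tests; a production variant might instead derive the choice from a hash of the request ID so that the audit trail can replay it.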

The discipline that matters: every clarification asked and every default applied lands on the audit surface. Each clarification is written as a row on the storage layer’s query_log (alongside the user’s question, the model version, and the dispatch decision). Each default application records both the default value used and the ID of the ClarificationDefault row as it stood at the timestamp of the request, so the audit trail can reconstruct exactly how the system arrived at any given answer, including which learned default was in effect and how confident it was at the time.
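A sketch of what one such audit row could look like, under stated assumptions: only query_log and the kinds of information recorded come from the text above, while QueryLogEntry, its column names, the log_entry helper, and the sample values are hypothetical. The row is serialised as a JSON line purely for illustration.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class QueryLogEntry:
    """Hypothetical query_log row; column names are illustrative."""
    request_id: str
    question: str
    model_version: str
    dispatch_decision: str                      # "apply", "ask" or "ask_occasionally"
    default_row_id: Optional[str] = None        # learned-default row in effect
    default_value: Optional[str] = None         # value applied or proposed
    default_confidence: Optional[float] = None  # snapshot at request time
    logged_at: str = ""                         # ISO 8601 timestamp, UTC


def log_entry(**columns) -> str:
    """Serialise one audit row as a JSON line, stamped with the current UTC time."""
    entry = QueryLogEntry(
        logged_at=datetime.now(timezone.utc).isoformat(),
        **columns,
    )
    return json.dumps(asdict(entry), ensure_ascii=False)


line = log_entry(
    request_id="req-0042",
    question="qui est l'assureur?",
    model_version="parser-v3",
    dispatch_decision="apply",
    default_row_id="insurer_name/broker_contract/body",
    default_value="1",
    default_confidence=0.78,
)
```

Storing the confidence as a snapshot, rather than joining to the live table later, is what lets the trail show how confident the default was at the time of the request, even after later votes have moved it.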