Python Data Modeling in Practice: Dataclasses, Pydantic, and Type Hints

Error Modeling and Validation Feedback Design

Chapter 11

Estimated reading time: 11 minutes

Why error modeling matters

Validation is not only about rejecting bad inputs; it is also about communicating what went wrong in a way that helps the caller fix it. In practice, the “caller” might be a REST client, a CLI user, a batch job, or another internal service. If your system returns vague messages like “invalid request” or throws raw exceptions, you create friction: debugging takes longer, support tickets increase, and clients implement brittle parsing of error strings.

Error modeling is the deliberate design of error types, error payloads, and validation feedback so that failures are predictable, machine-readable, and actionable. The goal is to make failures as well-structured as successes: consistent codes, stable fields, and clear mapping to the input that caused the problem.

This chapter focuses on designing error models and validation feedback in Python systems that use dataclasses, Pydantic, and type hints. We will not re-explain validation basics; instead, we will design the shape of errors, how to aggregate them, and how to present them across boundaries (HTTP, CLI, internal APIs) without leaking implementation details.

Principles for validation feedback

1) Errors should be structured, not stringly-typed

A human-readable message is useful, but it should not be the primary contract. Prefer stable identifiers (error codes) and structured fields (path, expected, actual, constraints) so clients can reliably react.

  • Good: code="too_short", path=["user", "email"], min_length=5
  • Risky: message="Email is too short" (clients start parsing text)

2) Errors should point to a location

When validating nested input, the consumer needs to know which field failed. Use a path representation that works for objects and arrays. A common approach is a list of segments, where each segment is a string key or integer index.

  • Example path: ["items", 2, "quantity"]
  • Alternative: JSON Pointer string: "/items/2/quantity"

Pick one and standardize it. Lists are easy to build and transform; JSON Pointer is easy to display and aligns with RFC 6901.
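If you need both forms, converting from the segment list to a JSON Pointer is straightforward. A minimal sketch (the function name is ours, not a library API):

```python
from typing import Sequence, Union

PathSegment = Union[str, int]

def path_to_json_pointer(path: Sequence[PathSegment]) -> str:
    """Convert a segment-list path to a JSON Pointer string (RFC 6901)."""
    if not path:
        return ""  # the empty pointer refers to the whole document
    parts = []
    for seg in path:
        # RFC 6901 escaping: "~" becomes "~0", "/" becomes "~1"
        text = str(seg).replace("~", "~0").replace("/", "~1")
        parts.append(text)
    return "/" + "/".join(parts)

print(path_to_json_pointer(["items", 2, "quantity"]))  # /items/2/quantity
```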

3) Support multiple errors at once

Fail-fast is fine for internal invariants, but for user-facing input you often want to report all issues in one response. This reduces the “fix one error, resubmit, discover next error” loop.

Design your error model to hold a list of issues, not just one.
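As a sketch of the accumulate-all-issues style, using a hypothetical `check_user` validator that returns plain `(code, path)` tuples instead of raising on the first problem:

```python
def check_user(data: dict) -> list[tuple[str, list]]:
    """Collect every issue rather than stopping at the first one."""
    issues: list[tuple[str, list]] = []
    if not data.get("email"):
        issues.append(("required", ["email"]))
    if not data.get("name"):
        issues.append(("required", ["name"]))
    age = data.get("age")
    if age is not None and age < 0:
        issues.append(("out_of_range", ["age"]))
    return issues  # an empty list means the input is valid

print(check_user({"age": -1}))
```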

4) Separate developer diagnostics from user-facing messaging

Operational debugging needs stack traces, exception types, and internal context. Users need concise and safe messages. Your error payload can include both, but keep them separated and ensure sensitive details are not exposed outside trusted boundaries.

  • User message: safe, actionable, no secrets
  • Debug info: behind a feature flag, only in logs, or only in internal responses

5) Make error codes stable and versionable

Error codes become part of your API contract. Treat them like public identifiers: stable naming, documented meaning, and careful changes. If you must change semantics, introduce a new code rather than reusing an old one.

A practical error model for validation

Start with a small set of reusable types. The following example uses dataclasses to define a transport-agnostic error representation. You can serialize it to JSON for HTTP, print it for CLI, or attach it to logs.

from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Mapping, Sequence, Union

PathSegment = Union[str, int]

@dataclass(frozen=True)
class ValidationIssue:
    code: str
    message: str
    path: Sequence[PathSegment] = field(default_factory=tuple)
    details: Mapping[str, Any] = field(default_factory=dict)

@dataclass(frozen=True)
class ValidationErrorReport(Exception):
    issues: Sequence[ValidationIssue]

    def __str__(self) -> str:
        # Keep __str__ readable for logs/CLI; do not rely on it as a contract.
        return f"Validation failed with {len(self.issues)} issue(s)"

Key design choices:

  • ValidationIssue is the atomic unit: one problem at one location.
  • code is stable and machine-readable.
  • message is human-readable and can be localized later.
  • details holds structured metadata (min/max, allowed values, pattern, etc.).
  • ValidationErrorReport aggregates issues and can be raised as an exception in internal flows.

Designing a code system

Define a small vocabulary of codes and reuse them across fields and models. A typical set for input validation:

  • required
  • type_mismatch
  • invalid_format
  • too_short / too_long
  • out_of_range
  • not_allowed
  • not_unique
  • conflict
  • invalid_state (cross-field rule)

Keep codes generic; put specifics in details. For example, instead of email_too_short, use too_short with details {"min_length": 6} and a path pointing to the email field.
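A sketch of one generic helper (the `too_short` function here is illustrative) that can then serve every length-constrained field:

```python
def too_short(path: list, value: str, min_length: int) -> dict:
    """Build a generic too_short issue; the field is identified by path, not by the code."""
    return {
        "code": "too_short",
        "message": f"Value must be at least {min_length} characters",
        "path": path,
        "details": {"min_length": min_length, "actual_length": len(value)},
    }

issue = too_short(["user", "email"], "a@b", min_length=6)
print(issue["code"], issue["details"])
```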

Step-by-step: mapping validation failures to your error model

Step 1: Decide where validation happens and what it returns

Even if you validate in multiple places, you should normalize failures into a single representation at the boundary where you respond to callers. That normalization layer is where you convert framework-specific errors (Pydantic errors, custom exceptions, database constraint violations) into ValidationIssue objects.

Step 2: Normalize paths

Different sources represent paths differently. Pydantic uses tuples like ("items", 2, "quantity"). A JSON Schema validator might use JSON Pointer. Normalize them into your chosen format (here: list/tuple of segments).

Step 3: Normalize codes

Frameworks often have many granular error types. Map them into your stable code set. Preserve original information in details if useful.

Step 4: Provide safe messages

Messages should be actionable and avoid internal jargon. If you need localization, treat code + details as the source of truth and generate messages at the edge.
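One way to generate messages at the edge is a per-locale template table keyed by code. A minimal sketch, with illustrative template strings:

```python
# Template table keyed by (locale, code); details supply the placeholders.
TEMPLATES = {
    ("en", "too_short"): "Must be at least {min_length} characters",
    ("en", "required"): "This field is required",
    ("pt", "too_short"): "Deve ter pelo menos {min_length} caracteres",
}

def render_message(locale: str, code: str, details: dict) -> str:
    """Resolve a localized message from code + details; fall back to English, then to the code."""
    template = TEMPLATES.get((locale, code))
    if template is None:
        template = TEMPLATES.get(("en", code), code)
    return template.format(**details)

print(render_message("pt", "too_short", {"min_length": 6}))
```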

Example: converting Pydantic errors to a stable report

Pydantic exposes a structured list of errors. The exact shape differs slightly between major versions, but the idea is consistent: each error contains a location, a type, and a message. The adapter below focuses on the common fields and maps them into our model.

from typing import Any

try:
    # Pydantic v2
    from pydantic import ValidationError as PydanticValidationError
except Exception:  # pragma: no cover
    PydanticValidationError = Exception  # type: ignore

CODE_MAP = {
    "missing": "required",
    "string_too_short": "too_short",
    "string_too_long": "too_long",
    "int_parsing": "type_mismatch",
    "float_parsing": "type_mismatch",
    "value_error": "invalid_format",
}

def _normalize_loc(loc: Any) -> tuple[PathSegment, ...]:
    if loc is None:
        return ()
    if isinstance(loc, (list, tuple)):
        return tuple(loc)
    return (str(loc),)

def pydantic_to_report(err: PydanticValidationError) -> ValidationErrorReport:
    issues: list[ValidationIssue] = []

    # v2: err.errors() returns list[dict]
    for e in err.errors():
        loc = _normalize_loc(e.get("loc"))
        pyd_type = e.get("type", "value_error")
        code = CODE_MAP.get(pyd_type, "invalid")

        details = dict(e.get("ctx") or {})
        details["pydantic_type"] = pyd_type

        issues.append(
            ValidationIssue(
                code=code,
                message=e.get("msg", "Invalid value"),
                path=loc,
                details=details,
            )
        )

    return ValidationErrorReport(issues=tuple(issues))

Notes:

  • The mapping table is intentionally small. Expand it based on the error types you actually see.
  • We store the original Pydantic type in details for debugging and analytics.
  • We do not expose raw exception strings as the primary contract.
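The adapter's mapping logic can be exercised without Pydantic installed by feeding it stub error dicts shaped like Pydantic v2's `errors()` output. A self-contained sketch of the same mapping:

```python
# Mirrors the chapter's CODE_MAP; the stub dicts mimic Pydantic v2's errors() output.
CODE_MAP = {"missing": "required", "string_too_short": "too_short"}

def map_errors(errors: list[dict]) -> list[dict]:
    """Normalize framework-style error dicts into stable (code, path, details) issues."""
    issues = []
    for e in errors:
        pyd_type = e.get("type", "value_error")
        issues.append({
            "code": CODE_MAP.get(pyd_type, "invalid"),
            "path": tuple(e.get("loc") or ()),
            "details": {**(e.get("ctx") or {}), "pydantic_type": pyd_type},
        })
    return issues

stub = [{"type": "string_too_short", "loc": ("user", "email"), "ctx": {"min_length": 6}}]
print(map_errors(stub))
```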

Designing HTTP error payloads

For an HTTP API, you typically want a top-level envelope with a stable shape. A common pattern:

  • error: a short category (e.g., validation_error)
  • issues: list of field-level issues
  • request_id: correlation id for support/logs

Example JSON shape (shown as Python dict for clarity):

def report_to_http_payload(report: ValidationErrorReport, request_id: str | None = None) -> dict:
    return {
        "error": "validation_error",
        "request_id": request_id,
        "issues": [
            {
                "code": i.code,
                "message": i.message,
                "path": list(i.path),
                "details": dict(i.details),
            }
            for i in report.issues
        ],
    }

Design tips:

  • Keep issues always present (even if empty) to simplify clients.
  • Use consistent HTTP status codes (often 400 for syntactic/field validation, 422 for semantic validation, depending on your conventions).
  • Do not include stack traces or internal exception names in the payload.

CLI and batch feedback design

For CLI tools, the same ValidationErrorReport can be rendered differently. The key is to keep it readable and to show paths clearly.

def format_path(path: Sequence[PathSegment]) -> str:
    if not path:
        return "<root>"
    out = []
    for seg in path:
        if isinstance(seg, int):
            out.append(f"[{seg}]")
        else:
            if out:
                out.append(".")
            out.append(seg)
    return "".join(out)

def report_to_cli_text(report: ValidationErrorReport) -> str:
    lines = []
    for issue in report.issues:
        lines.append(f"- {format_path(issue.path)}: {issue.code} ({issue.message})")
    return "\n".join(lines)

Batch jobs often need both human-readable output and machine-readable logs. You can log the JSON payload while printing a concise summary to stderr.
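A minimal sketch of that dual-channel pattern (the function name is ours): a structured JSON line for log pipelines on stdout, a one-line human summary on stderr:

```python
import json
import sys

def emit_batch_feedback(issues: list[dict]) -> None:
    """Machine-readable JSON line for log pipelines, concise summary for humans."""
    print(json.dumps({"error": "validation_error", "issues": issues}))  # structured log line
    print(f"validation failed: {len(issues)} issue(s)", file=sys.stderr)  # human summary

emit_batch_feedback([{"code": "required", "path": ["email"]}])
```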

Cross-field and business-rule validation: modeling global issues

Some validations are not tied to a single field: “end_date must be after start_date”, “either email or phone is required”, “shipping address required when delivery_method=shipping”. These are still validation issues, but the path might be:

  • the root path [] to indicate a global issue, or
  • a synthetic path like ["__root__"], or
  • multiple issues, one per involved field, plus a global summary

A practical approach is to emit:

  • one global issue with code invalid_state and details listing involved fields, and
  • optionally field-level issues to highlight where the user should look

def date_order_issues(start: str | None, end: str | None) -> list[ValidationIssue]:
    # ISO 8601 date strings (YYYY-MM-DD) compare correctly as plain strings.
    if start and end and end <= start:
        return [
            ValidationIssue(
                code="invalid_state",
                message="end_date must be after start_date",
                path=(),
                details={"fields": ["start_date", "end_date"]},
            )
        ]
    return []

Handling collections: index-specific errors

When validating lists, index-specific paths are essential. If the third item has a problem, the client should be able to highlight exactly that row.

Example issue path: ["items", 2, "sku"]. This supports UI patterns like “scroll to row 3 and highlight SKU”.

When items can be reordered, consider including a stable identifier in details (e.g., {"item_id": "..."}) so clients can match errors even if indices change.
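A sketch of attaching a stable identifier alongside the index (the field and id names are illustrative):

```python
def sku_issue(index: int, item: dict) -> dict:
    """Index gives the position; item_id in details survives client-side reordering."""
    return {
        "code": "invalid_format",
        "message": "SKU is not valid",
        "path": ["items", index, "sku"],
        "details": {"item_id": item.get("id")},
    }

issue = sku_issue(2, {"id": "itm_123", "sku": "??"})
print(issue["path"], issue["details"])
```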

Errors from external systems: database constraints and uniqueness

Not all validation failures come from input parsing. Some are discovered when interacting with external systems:

  • Unique constraint violation (email already exists)
  • Foreign key violation (referenced object missing)
  • Optimistic concurrency conflict

These should still be expressed in your stable error model. The key is to avoid leaking vendor-specific messages (e.g., raw SQL error text). Map them to codes like not_unique, not_found, or conflict, and attach safe details.

def unique_violation(field: str, value: str) -> ValidationIssue:
    return ValidationIssue(
        code="not_unique",
        message=f"{field} must be unique",
        path=(field,),
        details={"value": value},
    )

Exception taxonomy: when to raise vs return

Internally, you may choose between returning a ValidationErrorReport (as a value) or raising it (as an exception). The design choice should be consistent per layer:

  • At boundaries (HTTP handlers, CLI entrypoints): catch exceptions and convert to payloads/exit codes.
  • In application services: raising can be convenient to abort flows and bubble up a report.
  • In domain logic: prefer precise exceptions for invariants, then translate them into validation issues at the boundary if they are user-correctable.

A useful pattern is to treat “user-correctable” problems as validation issues, and “programmer/operational” problems as internal errors. If a failure is not actionable by the caller, it should not be presented as a validation issue.
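A sketch of that boundary translation with plain dicts and a hypothetical `ValidationFailed` exception:

```python
class ValidationFailed(Exception):
    """User-correctable problems, raised by application services."""
    def __init__(self, issues: list[dict]):
        super().__init__(f"{len(issues)} issue(s)")
        self.issues = issues

def handle_request(raw: dict) -> tuple[int, dict]:
    """Boundary layer: catch the exception and convert it to status + payload."""
    try:
        if not raw.get("email"):
            raise ValidationFailed([{"code": "required", "path": ["email"]}])
        return 200, {"ok": True}
    except ValidationFailed as exc:
        return 400, {"error": "validation_error", "issues": exc.issues}

print(handle_request({}))
```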

Step-by-step: building a validation feedback pipeline

Step 1: Define your stable error schema

Choose fields and naming conventions. Decide on path format, code vocabulary, and whether details is allowed to contain arbitrary keys.

Step 2: Implement adapters from common sources

Create small functions that convert:

  • Pydantic validation errors
  • Custom domain exceptions that represent user-correctable problems
  • Database constraint errors

Each adapter should output ValidationErrorReport or a list of ValidationIssue.

Step 3: Implement renderers for each channel

Renderers convert the stable report into:

  • HTTP JSON payload + status code
  • CLI text + exit code
  • Structured logs (JSON)

Keep renderers dumb: no business logic, only formatting and policy (e.g., hide details in production).

Step 4: Add correlation and observability hooks

Include a request id in HTTP responses and logs. Consider adding an error_id or hashing the set of codes for analytics. Avoid logging raw user input if it may contain secrets; log paths and codes instead.
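A sketch of deriving such a fingerprint from the set of codes (this approach is one option, not a standard):

```python
import hashlib

def error_fingerprint(codes: list[str]) -> str:
    """Hash the sorted, de-duplicated codes so identical failure shapes group together."""
    canonical = ",".join(sorted(set(codes)))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]

print(error_fingerprint(["required", "too_short", "required"]))
```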

Designing for localization and UX

If you anticipate multiple languages or different UX surfaces, treat message as a presentation layer concern. Two common strategies:

  • Server-generated messages: simplest; server returns localized messages based on request locale.
  • Client-generated messages: server returns code + details; client maps to localized strings.

Even if you keep server-generated messages, still include stable codes so clients can implement behavior (e.g., highlight fields, show specific help) without parsing text.

Security and privacy considerations

Validation feedback can accidentally leak sensitive information. Common pitfalls:

  • Echoing secrets back in error details (passwords, tokens)
  • Revealing whether an account exists (“email already registered”) in contexts where that is sensitive
  • Returning internal exception messages that include table names or stack traces

Mitigations:

  • Redact sensitive fields before putting values into details.
  • Use generic messages where necessary, while still providing a code (e.g., conflict).
  • Keep debug metadata in logs, not in public responses.
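A minimal redaction sketch, assuming an illustrative deny-list of sensitive key names:

```python
SENSITIVE_KEYS = {"password", "token", "secret", "authorization"}

def redact_details(details: dict) -> dict:
    """Replace values of sensitive keys before they reach an error payload."""
    return {
        k: "[REDACTED]" if k.lower() in SENSITIVE_KEYS else v
        for k, v in details.items()
    }

print(redact_details({"password": "hunter2", "min_length": 8}))
```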

Testing error contracts

Error payloads are part of your API contract and deserve tests. Focus on stability:

  • Codes do not change unexpectedly
  • Paths are correct for nested structures and lists
  • Multiple issues are returned when expected
  • Details contain expected keys and do not contain sensitive values

def test_pydantic_error_mapping_has_stable_code_and_path():
    # In a real test, trigger an actual Pydantic failure and run it through the adapter;
    # here we build the report directly to show what the assertions look like.
    report = ValidationErrorReport(
        issues=(
            ValidationIssue(code="required", message="Field required", path=("email",)),
        )
    )

    assert report.issues[0].code == "required"
    assert list(report.issues[0].path) == ["email"]

In real tests, generate actual Pydantic errors and run them through your adapter. Snapshot testing can be effective for full payloads, but be careful to avoid brittle snapshots that include variable messages. Prefer asserting on codes, paths, and key details.

Now answer the exercise about the content:

When designing validation feedback for APIs or CLIs, which approach best makes errors machine-readable and actionable?

Actionable validation feedback should be structured: stable codes and fields like path and details, and it should support returning multiple issues at once. Human-readable messages help, but they should not be the contract.

Next chapter

Testing Data Models: Invariants, Edge Cases, and Contracts
