Why “Input Boundaries” Need Strong Validation
In a real system, most defects around data modeling do not come from your carefully constructed domain objects; they come from the edges where data enters: HTTP requests, CLI arguments, message queues, partner webhooks, CSV imports, and configuration files. These boundaries are where you must assume data is incomplete, incorrectly typed, inconsistently formatted, or even malicious. “Pydantic-style validation” refers to a practical approach: define explicit input schemas that parse and validate raw data into well-typed, trustworthy objects, producing actionable error messages when the input is wrong.
The goal is not to “make everything a Pydantic model.” The goal is to create a robust boundary layer that converts untrusted input into safe, normalized values before the rest of the application touches it. This chapter focuses on how to design those boundary schemas, how to validate and normalize common patterns, and how to integrate them without leaking boundary concerns into the domain.
Core Idea: Parse, Validate, Normalize, Then Hand Off
Pydantic-style validation typically follows a pipeline:
- Parse: Convert raw input types (strings, numbers, dicts) into Python types.
- Validate: Enforce constraints (required fields, ranges, formats, allowed values, cross-field rules).
- Normalize: Canonicalize values (trim whitespace, lowercase, normalize timezones, apply defaults).
- Hand off: Provide a validated object to the next layer (service/domain), or return structured errors to the caller.
This approach is especially effective when inputs are messy. For example, a client might send "42" for an integer, "TRUE" for a boolean, or a date in multiple formats. A boundary schema can accept reasonable variations, normalize them, and reject the rest with clear errors.
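As a minimal sketch of this pipeline (assuming Pydantic v2; `SignupInput` is a hypothetical model), lax-mode parsing coerces reasonable string representations and rejects the rest with a structured error:

```python
from pydantic import BaseModel, ValidationError

class SignupInput(BaseModel):
    age: int       # "42" (a string) is coerced to 42 in lax mode
    active: bool   # "true"/"false" strings are coerced to booleans

inp = SignupInput.model_validate({'age': '42', 'active': 'true'})
assert inp.age == 42 and inp.active is True

# Values that cannot be parsed are rejected with a structured error
try:
    SignupInput.model_validate({'age': 'forty-two', 'active': 'true'})
    raise AssertionError('expected a ValidationError')
except ValidationError as e:
    assert e.errors()[0]['loc'] == ('age',)
```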
Choosing the Right Tooling: Pydantic v2 Concepts You’ll Use
Pydantic v2 is a common implementation of this style. The most relevant building blocks for boundary validation are:
- BaseModel: Defines the input schema and performs parsing/validation.
- Field constraints: Declarative constraints like min/max, length, regex/pattern, and defaults.
- field_validator: Per-field validation and normalization.
- model_validator: Cross-field validation (e.g., “end_date must be after start_date”).
- ConfigDict: Model configuration (extra fields, strictness, aliasing, etc.).
- Error reporting: Structured errors that can be returned to API clients.
The patterns below are transferable even if you use another validation library: the key is to keep boundary schemas explicit and separate from domain objects.
Step-by-Step: Building a Boundary Schema for an API Request
Consider an endpoint that creates a user account. The boundary must handle raw JSON input, validate it, normalize it, and then pass a clean command object to the application layer.
Step 1: Define the input model with constraints
```python
from pydantic import (
    BaseModel, Field, ConfigDict, EmailStr,
    ValidationError, field_validator, model_validator,
)

class CreateUserInput(BaseModel):
    model_config = ConfigDict(extra='forbid')

    email: EmailStr
    display_name: str = Field(min_length=1, max_length=50)
    age: int | None = Field(default=None, ge=13, le=130)
    marketing_opt_in: bool = False
```

Key points:
- `extra='forbid'` rejects unknown fields, which prevents silent acceptance of misspelled keys and reduces attack surface.
- `EmailStr` parses and validates email format.
- `Field(min_length=..., max_length=...)` enforces basic string constraints.
- `ge`/`le` enforce numeric bounds.
Step 2: Normalize fields with field_validator
Normalization belongs at the boundary because it’s about accepting a variety of input representations and producing a canonical form.
```python
class CreateUserInput(BaseModel):
    model_config = ConfigDict(extra='forbid')

    email: EmailStr
    display_name: str = Field(min_length=1, max_length=50)
    age: int | None = Field(default=None, ge=13, le=130)
    marketing_opt_in: bool = False

    @field_validator('display_name')
    @classmethod
    def normalize_display_name(cls, v: str) -> str:
        v = v.strip()
        if not v:
            raise ValueError('display_name cannot be blank')
        return v
```

This ensures that " Alice " becomes "Alice", and that whitespace-only names are rejected with a clear message.
Step 3: Add cross-field rules with model_validator (when needed)
Not every model needs cross-field validation, but it’s common for filters, date ranges, and conditional requirements.
```python
class CreateUserInput(BaseModel):
    model_config = ConfigDict(extra='forbid')

    email: EmailStr
    display_name: str = Field(min_length=1, max_length=50)
    age: int | None = Field(default=None, ge=13, le=130)
    marketing_opt_in: bool = False

    @field_validator('display_name')
    @classmethod
    def normalize_display_name(cls, v: str) -> str:
        v = v.strip()
        if not v:
            raise ValueError('display_name cannot be blank')
        return v

    @model_validator(mode='after')
    def check_marketing_requires_age(self):
        if self.marketing_opt_in and self.age is None:
            raise ValueError('age is required when marketing_opt_in is true')
        return self
```

This demonstrates a conditional requirement: opting into marketing requires an age value.
Step 4: Validate incoming data and return structured errors
At the boundary (e.g., a web handler), you validate raw input and handle errors in a consistent way.
```python
def parse_create_user(payload: dict) -> CreateUserInput:
    try:
        return CreateUserInput.model_validate(payload)
    except ValidationError as e:
        # In a web API you would map this to a 400 response
        # with e.errors() as the body.
        raise
```

Pydantic’s ValidationError includes a list of error objects with locations and messages. This is ideal for client-facing APIs because you can point to exactly which field failed and why.
Strict vs Coercive Parsing: Decide Per Boundary
One of the most important design decisions is whether your boundary should coerce types (accept "123" for an int) or be strict (reject anything not already the correct type). Coercion can improve usability for external clients, but strictness can reduce ambiguity and unexpected behavior.
You can tune this per model. For example, to enforce strict types for a sensitive internal boundary (like a message consumed from a queue where producers are controlled), you can use strict types and configuration.
```python
from pydantic import BaseModel, ConfigDict, StrictInt, StrictBool

class InternalEventInput(BaseModel):
    model_config = ConfigDict(extra='forbid')

    event_id: str
    retry_count: StrictInt
    is_replay: StrictBool
```

With strict types, "1" will not be accepted as an integer. This is useful when you want producers to fix their serialization rather than relying on the consumer to guess.
Handling “Extra Fields”: Forbid, Ignore, or Allow
Unknown fields are a common source of subtle bugs. Pydantic supports different strategies:
- Forbid: Reject unknown keys. Best for public APIs and security-sensitive boundaries.
- Ignore: Drop unknown keys. Useful when you want forward compatibility with clients sending extra data.
- Allow: Keep unknown keys. Useful for pass-through scenarios, but increases complexity.
```python
class LenientInput(BaseModel):
    model_config = ConfigDict(extra='ignore')

    q: str
```

When you choose ignore, document it and ensure you are not accidentally discarding important data. For most create/update commands, forbid is the safer default.
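To make the three strategies concrete, the following sketch (model and payload names are illustrative, assuming Pydantic v2) shows how each one treats the same unknown key:

```python
from pydantic import BaseModel, ConfigDict, ValidationError

class ForbidInput(BaseModel):
    model_config = ConfigDict(extra='forbid')
    q: str

class IgnoreInput(BaseModel):
    model_config = ConfigDict(extra='ignore')
    q: str

class AllowInput(BaseModel):
    model_config = ConfigDict(extra='allow')
    q: str

payload = {'q': 'search term', 'debug': True}

# forbid: the unknown key is a validation error
try:
    ForbidInput.model_validate(payload)
    raise AssertionError('expected a ValidationError')
except ValidationError as e:
    assert e.errors()[0]['loc'] == ('debug',)

# ignore: the unknown key is silently dropped
assert 'debug' not in IgnoreInput.model_validate(payload).model_dump()

# allow: the unknown key is kept and exposed via model_extra
assert AllowInput.model_validate(payload).model_extra == {'debug': True}
```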
Aliases and Field Names: Accept External Conventions Without Leaking Them
External inputs often use different naming conventions (camelCase) than Python code (snake_case). A boundary schema can accept external names while exposing internal names to your code.
```python
from pydantic import BaseModel, Field, ConfigDict

class PaginationInput(BaseModel):
    model_config = ConfigDict(extra='forbid', populate_by_name=True)

    page: int = Field(ge=1, default=1, alias='pageNumber')
    page_size: int = Field(ge=1, le=200, default=50, alias='pageSize')
```

With populate_by_name=True, the model can accept either page_size or pageSize. This is useful during migrations or when multiple clients exist.
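A quick usage check (payload values are illustrative) confirms that both the external camelCase spelling and the internal snake_case spelling are accepted:

```python
from pydantic import BaseModel, ConfigDict, Field

class PaginationInput(BaseModel):
    model_config = ConfigDict(extra='forbid', populate_by_name=True)

    page: int = Field(ge=1, default=1, alias='pageNumber')
    page_size: int = Field(ge=1, le=200, default=50, alias='pageSize')

# External camelCase names are accepted via the aliases...
ext = PaginationInput.model_validate({'pageNumber': 2, 'pageSize': 25})
assert ext.page == 2 and ext.page_size == 25

# ...and internal snake_case names work because populate_by_name=True
internal = PaginationInput.model_validate({'page': 2, 'page_size': 25})
assert internal.page == 2 and internal.page_size == 25
```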
Reusable Validation Patterns for Common Boundary Problems
Pattern 1: Trim and collapse whitespace
Inputs from forms and CSV often contain inconsistent whitespace. Normalize it early.
```python
import re

from pydantic import BaseModel, field_validator

class SearchInput(BaseModel):
    q: str

    @field_validator('q')
    @classmethod
    def normalize_query(cls, v: str) -> str:
        v = v.strip()
        v = re.sub(r'\s+', ' ', v)
        if len(v) < 2:
            raise ValueError('q must be at least 2 characters after trimming')
        return v
```

Pattern 2: Parse flexible date/time formats into a canonical form
Clients may send timestamps in multiple formats. Decide what you accept, parse it, and normalize to a consistent timezone.
```python
from datetime import datetime, timezone

from pydantic import BaseModel, field_validator, model_validator

class ReportRangeInput(BaseModel):
    start: datetime
    end: datetime

    @field_validator('start', 'end')
    @classmethod
    def ensure_timezone(cls, v: datetime) -> datetime:
        # If naive, assume UTC (or reject, depending on your policy)
        if v.tzinfo is None:
            v = v.replace(tzinfo=timezone.utc)
        return v.astimezone(timezone.utc)

    @model_validator(mode='after')
    def check_order(self):
        if self.end <= self.start:
            raise ValueError('end must be after start')
        return self
```

This boundary model guarantees that downstream code always receives UTC-aware datetimes and a valid range.
Pattern 3: Validate lists with constraints and per-item normalization
Batch operations often accept lists of identifiers. You typically want to enforce size limits and normalize each item.
```python
from pydantic import BaseModel, Field, field_validator

class BatchLookupInput(BaseModel):
    ids: list[str] = Field(min_length=1, max_length=100)

    @field_validator('ids')
    @classmethod
    def normalize_ids(cls, v: list[str]) -> list[str]:
        cleaned = []
        for item in v:
            s = item.strip()
            if not s:
                raise ValueError('ids cannot contain blank values')
            cleaned.append(s)
        # Optional: enforce uniqueness while preserving order
        seen = set()
        unique = []
        for s in cleaned:
            if s not in seen:
                seen.add(s)
                unique.append(s)
        return unique
```

This prevents empty IDs, enforces a maximum batch size, and optionally deduplicates.
Boundary Models as “Commands” and “Queries”
A practical way to keep boundaries clean is to define separate models for:
- Commands: create/update actions with required fields and strict constraints.
- Queries: filters and pagination with optional fields and normalization.
For example, a query boundary often needs to accept optional filters but still enforce that at least one filter is present, or that certain combinations are valid.
```python
from pydantic import BaseModel, Field, model_validator

class UserSearchQuery(BaseModel):
    email: str | None = None
    display_name: str | None = Field(default=None, min_length=1, max_length=50)
    is_active: bool | None = None

    @model_validator(mode='after')
    def require_some_filter(self):
        if self.email is None and self.display_name is None and self.is_active is None:
            raise ValueError('at least one filter must be provided')
        return self
```

This prevents “unbounded” queries that could accidentally scan large datasets.
Mapping Validated Input to Application/Domain Types
Boundary models should not become the universal data structure across your codebase. A common pattern is:
- Validate raw input into a boundary model.
- Transform it into an internal command/query object (could be a dataclass) used by services.
- Keep the transformation explicit so that boundary concerns (aliases, lenient parsing) do not leak inward.
```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CreateUserCommand:
    email: str
    display_name: str
    age: int | None
    marketing_opt_in: bool

def to_command(inp: CreateUserInput) -> CreateUserCommand:
    return CreateUserCommand(
        email=str(inp.email),
        display_name=inp.display_name,
        age=inp.age,
        marketing_opt_in=inp.marketing_opt_in,
    )
```

This keeps your application layer independent from Pydantic while still benefiting from robust validation at the boundary.
Error Translation: Turning Validation Errors into Client-Friendly Responses
Pydantic provides structured error details. A boundary layer typically converts these into your API’s error format. The key is to preserve:
- Location: which field failed (including nested paths).
- Message: a human-readable explanation.
- Type/code: a stable identifier for programmatic handling.
```python
from pydantic import ValidationError

def format_validation_error(e: ValidationError) -> dict:
    return {
        'error': 'validation_failed',
        'details': [
            {
                'loc': err['loc'],
                'msg': err['msg'],
                'type': err['type'],
            }
            for err in e.errors()
        ],
    }
```

This is especially helpful for frontends that want to highlight specific fields.
Security and Robustness Considerations at the Boundary
Limit sizes to prevent resource abuse
Always constrain:
- String lengths (names, descriptions, free-form text).
- List sizes (batch endpoints).
- Numeric ranges (pagination, counts).
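These limits can all be expressed declaratively on the boundary model. A hypothetical sketch (field names and bounds are illustrative, assuming Pydantic v2):

```python
from pydantic import BaseModel, Field

class CommentInput(BaseModel):
    # Bound free-form text so a single request cannot carry megabytes
    body: str = Field(min_length=1, max_length=5_000)
    # Bound list sizes so one call cannot fan out into unbounded work
    tag_ids: list[int] = Field(default_factory=list, max_length=20)
    # Bound pagination so clients cannot request arbitrarily large pages
    page_size: int = Field(default=50, ge=1, le=200)
```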
Even if your database has constraints, validating early prevents wasted work and reduces the risk of denial-of-service patterns.
Be deliberate about permissive parsing
Permissive parsing can be convenient, but it can also hide client bugs. A good compromise is:
- Be permissive for public-facing inputs where you expect variability (e.g., booleans from forms).
- Be strict for internal events and service-to-service boundaries where producers are controlled.
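For the permissive end of that spectrum, Pydantic's lax boolean parsing already accepts the representations HTML forms commonly send. A small sketch (`NewsletterForm` is a hypothetical model):

```python
from pydantic import BaseModel

class NewsletterForm(BaseModel):
    subscribed: bool

# Lax parsing accepts common form-style truthy representations
for raw in ('on', 'true', 'yes', '1', 1):
    assert NewsletterForm.model_validate({'subscribed': raw}).subscribed is True
```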
Prefer explicit normalization rules
Normalization should be deterministic and documented. Examples:
- Lowercase emails (if your system treats them case-insensitively).
- Normalize datetimes to UTC.
- Strip whitespace and collapse internal whitespace for search queries.
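The first rule above can be implemented as a small, deterministic validator. A sketch (`EmailInput` is a hypothetical model; this assumes your system treats emails case-insensitively):

```python
from pydantic import BaseModel, field_validator

class EmailInput(BaseModel):
    email: str

    @field_validator('email')
    @classmethod
    def normalize_email(cls, v: str) -> str:
        # Deterministic, documented rule: trim, then lowercase
        return v.strip().lower()
```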
Testing Boundary Schemas Like Any Other Critical Component
Because boundary schemas define what your system accepts, they deserve focused tests. The most valuable tests are table-driven: a list of payloads that should pass and a list that should fail with specific error locations.
```python
import pytest

from pydantic import ValidationError

def test_create_user_rejects_unknown_fields():
    payload = {
        'email': 'a@example.com',
        'display_name': 'Alice',
        'unknown': 123,
    }
    with pytest.raises(ValidationError) as exc:
        CreateUserInput.model_validate(payload)
    errors = exc.value.errors()
    assert errors[0]['loc'] == ('unknown',)

def test_create_user_trims_display_name():
    payload = {'email': 'a@example.com', 'display_name': '  Alice  '}
    inp = CreateUserInput.model_validate(payload)
    assert inp.display_name == 'Alice'
```

These tests act as executable documentation for clients and for future maintainers.
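A table-driven variant collects rejected payloads and their expected error locations in one list. The sketch below redefines a simplified `CreateUserInput` (using a deliberately crude `pattern` instead of EmailStr so it is self-contained; payloads are illustrative):

```python
import pytest

from pydantic import BaseModel, ConfigDict, Field, ValidationError

class CreateUserInput(BaseModel):
    model_config = ConfigDict(extra='forbid')
    # Simplified format check for the sketch; real code might use EmailStr
    email: str = Field(pattern=r'^[^@\s]+@[^@\s]+$')
    display_name: str = Field(min_length=1, max_length=50)

# Each row: (payload that must be rejected, expected error location)
REJECTED = [
    ({'display_name': 'Alice'}, ('email',)),                       # missing field
    ({'email': 'not-an-email', 'display_name': 'A'}, ('email',)),  # bad format
    ({'email': 'a@example.com', 'display_name': ''}, ('display_name',)),
]

@pytest.mark.parametrize('payload,expected_loc', REJECTED)
def test_rejected_payloads(payload, expected_loc):
    with pytest.raises(ValidationError) as exc:
        CreateUserInput.model_validate(payload)
    assert exc.value.errors()[0]['loc'] == expected_loc
```

Adding a new rejection case is then a one-line change to the table rather than a new test function.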