Why collections, nesting, and optionality change the shape of your model
Once your models move beyond a single object, you start modeling relationships: “an order has many lines”, “a user has one profile”, “a payload contains a list of events”, “a record may or may not have a middle name”. These three themes—collections, nested structures, and optionality—are where many real-world bugs hide: accidental mutation of shared lists, confusing None with “empty”, accepting malformed nested data, or losing type information when you pass dictionaries around.
This chapter focuses on practical patterns for representing these shapes in Python using type hints, dataclasses, and Pydantic. The goal is to make your intent explicit: what can be repeated, what can be missing, what must be present, and how nested objects are validated and serialized.
Modeling collections: choosing the right container type
Python offers many collection types, and your choice communicates rules. Use the narrowest type that matches your intent.
- list[T]: ordered, allows duplicates, mutable. Good for “lines in the order as entered”.
- tuple[T, ...]: ordered, allows duplicates, immutable. Good for “a snapshot of items” or “a fixed set of values”.
- set[T]: unordered, unique elements. Good for “tags” or “capabilities”.
- frozenset[T]: immutable set. Good for “a stable set of permissions”.
- dict[K, V]: mapping keyed by something meaningful (IDs, codes). Good for “items by SKU”.
- collections.abc.Sequence[T] or Iterable[T]: interface types when you want to accept many inputs without committing to a concrete container.
A common modeling mistake is to default to list everywhere. If your domain meaning is “unique”, use a set. If your meaning is “read-only snapshot”, use a tuple or frozenset. This reduces the need for extra validation code.
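A minimal sketch of the difference (the variable names are illustrative):

# "List everywhere" hides intent: are duplicates allowed? May callers mutate this?
tags_as_list: list[str] = ["python", "typing", "python"]

# Narrower types encode the rules directly
tags: set[str] = {"python", "typing"}                        # duplicates collapse automatically
permissions: frozenset[str] = frozenset({"read", "write"})   # stable, cannot be mutated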
Dataclasses: safe defaults for collection fields
With dataclasses, never use a mutable object as a default value directly. Use default_factory so each instance gets its own container.
from dataclasses import dataclass, field

@dataclass
class Cart:
    # Each Cart gets its own list
    item_ids: list[str] = field(default_factory=list)
    # Each Cart gets its own set
    tags: set[str] = field(default_factory=set)
This is not just a style rule; it prevents shared state across instances. It also makes your model behavior predictable when you append or add items.
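A quick check using the Cart class above shows why:

a = Cart()
b = Cart()
a.item_ids.append("SKU1")
a.tags.add("gift")

# b is unaffected: each instance received its own containers from default_factory
assert b.item_ids == [] and b.tags == set()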
Pydantic: collection parsing and element validation
Pydantic will parse and validate collection elements based on the type parameter. If you declare list[int], it will validate each element as an int (and may coerce from strings depending on configuration).
from pydantic import BaseModel

class Metrics(BaseModel):
    samples: list[int]

m = Metrics(samples=["1", 2, 3])
assert m.samples == [1, 2, 3]
When you want stricter behavior (no coercion), use strict types (e.g., StrictInt) or configure the model. The key modeling idea is that “collection of T” is a first-class constraint, not a comment.
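For example, a sketch using StrictInt for the element type (the StrictMetrics name is illustrative):

from pydantic import BaseModel, StrictInt, ValidationError

class StrictMetrics(BaseModel):
    samples: list[StrictInt]  # element-level strictness: no str -> int coercion

try:
    StrictMetrics(samples=["1", 2, 3])
except ValidationError as exc:
    print(exc)  # "1" is rejected because it is a string, not an int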
Nested structures: composing models instead of passing dicts
Nesting is how you express “has-a” relationships. Instead of representing nested data as a dict with ad-hoc keys, compose models. This gives you: (1) validation at the boundary, (2) better IDE support, and (3) consistent serialization.
Dataclasses: nested dataclasses and conversion boundaries
Dataclasses do not validate input by default. They are excellent for internal domain structures, but you typically need an explicit conversion step when ingesting untrusted data (e.g., JSON). A practical approach is to use Pydantic at the boundary and convert to dataclasses for internal use, or use Pydantic models throughout if that fits your architecture.
Here is a nested dataclass structure for an order with lines and a shipping address:
from dataclasses import dataclass, field

@dataclass
class Address:
    street: str
    city: str
    postal_code: str

@dataclass
class OrderLine:
    sku: str
    quantity: int

@dataclass
class Order:
    order_id: str
    shipping: Address
    lines: list[OrderLine] = field(default_factory=list)
To build this from a dictionary, you must explicitly construct nested objects. That explicitness is a feature: it forces you to decide what happens when keys are missing or types are wrong.
def order_from_dict(data: dict) -> Order:
    shipping = Address(**data["shipping"])
    lines = [OrderLine(**item) for item in data.get("lines", [])]
    return Order(order_id=data["order_id"], shipping=shipping, lines=lines)
Step-by-step, this conversion does three important things: it chooses defaults (empty lines if missing), it creates nested objects (Address, OrderLine), and it centralizes the boundary assumptions in one place.
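A small usage sketch (the sample payload is illustrative):

data = {
    "order_id": "A42",
    "shipping": {"street": "1 Main", "city": "Oslo", "postal_code": "0001"},
    # "lines" omitted on purpose: the converter falls back to an empty list
}
order = order_from_dict(data)
assert order.lines == []
assert order.shipping.city == "Oslo"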
Pydantic: nested models with automatic parsing
Pydantic shines for nested structures because it will recursively parse nested dictionaries into nested models. You declare the shape once, and parsing follows it.
from pydantic import BaseModel

class AddressModel(BaseModel):
    street: str
    city: str
    postal_code: str

class OrderLineModel(BaseModel):
    sku: str
    quantity: int

class OrderModel(BaseModel):
    order_id: str
    shipping: AddressModel
    lines: list[OrderLineModel] = []

payload = {
    "order_id": "A100",
    "shipping": {"street": "1 Main", "city": "Oslo", "postal_code": "0001"},
    "lines": [{"sku": "SKU1", "quantity": "2"}],
}

order = OrderModel.model_validate(payload)
assert order.lines[0].quantity == 2
Notice the nested parsing: shipping becomes an AddressModel, and each line becomes an OrderLineModel. This is especially useful when your input is JSON from APIs.
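When the input is a raw JSON string rather than a dict, Pydantic v2 can parse it directly with model_validate_json; a brief sketch reusing the payload above:

import json

raw = json.dumps(payload)  # stand-in for a JSON body received over HTTP
order = OrderModel.model_validate_json(raw)
assert order.shipping.city == "Oslo"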
One caution: in Pydantic, using [] as a default is handled safely (Pydantic copies defaults), but it is still a good habit to prefer default_factory when you want to emphasize “new list per instance”. In Pydantic v2 you can write:
from pydantic import BaseModel, Field

class OrderModel(BaseModel):
    order_id: str
    shipping: AddressModel
    lines: list[OrderLineModel] = Field(default_factory=list)
Optionality: modeling “missing”, “unknown”, and “empty”
Optionality is not just about allowing None. It is about distinguishing different meanings:
- Missing: the field was not provided at all.
- Unknown: the field exists but the value is unknown (often represented as None).
- Empty: the field exists and is intentionally empty (e.g., empty string, empty list).
Conflating these leads to subtle bugs. For example, “no tags provided” might mean “leave tags unchanged” in a patch request, while “tags: []” might mean “clear all tags”. Your model should make this distinction visible.
Type hints for optional fields
In Python typing, Optional[T] is shorthand for T | None. It means the value may be None, not that the key may be missing. Whether a key can be omitted depends on your framework or constructor defaults.
from dataclasses import dataclass

@dataclass
class Profile:
    display_name: str
    bio: str | None  # can be None, but must be passed unless a default is given
If you want the field to be omittable in a constructor, you must provide a default:
@dataclass
class Profile:
    display_name: str
    bio: str | None = None  # can be omitted
Pydantic: optional vs required, and the “missing vs None” distinction
In Pydantic, a field is required unless it has a default. Optional only affects whether None is accepted as a value.
from pydantic import BaseModel

class UserIn(BaseModel):
    email: str
    phone: str | None  # required, but may be None

# UserIn.model_validate({"email": "a@b.com"})  # validation error: phone missing
To make it optional (omittable), provide a default:
class UserIn(BaseModel):
    email: str
    phone: str | None = None  # omittable and may be None
For patch/update models, you often want “missing means do not change”. Pydantic v2 tracks which fields were actually provided (internally via the PydanticUndefined sentinel), and you can pass exclude_unset=True when dumping to recover exactly what was set.
from pydantic import BaseModel

class UserPatch(BaseModel):
    display_name: str | None = None
    phone: str | None = None

patch = UserPatch.model_validate({"display_name": "New"})
changes = patch.model_dump(exclude_unset=True)
assert changes == {"display_name": "New"}
This step-by-step pattern is useful:
- Validate patch payload into a patch model.
- Dump with exclude_unset=True to get only provided fields.
- Apply changes to your domain object explicitly.
It prevents accidental overwrites where omitted fields become None.
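To see the difference between “explicitly null” and “omitted”, compare the following, reusing the UserPatch model above:

explicit_null = UserPatch.model_validate({"phone": None})
not_provided = UserPatch.model_validate({})

# "phone": None was provided, so it survives exclude_unset...
assert explicit_null.model_dump(exclude_unset=True) == {"phone": None}
# ...while a truly omitted field disappears entirely
assert not_provided.model_dump(exclude_unset=True) == {}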
Collections with optional elements vs optional collections
These two are different and should be modeled differently:
- Optional collection: list[T] | None means the entire list may be missing/unknown.
- Collection of optional elements: list[T | None] means the list exists, but some elements may be missing/unknown.
This matters when validating and when writing business logic. For example, a list of phone numbers might allow null entries in raw input, but your internal model might want to filter them out.
from pydantic import BaseModel

class RawPhones(BaseModel):
    phones: list[str | None]

raw = RawPhones.model_validate({"phones": ["123", None, "456"]})
clean = [p for p in raw.phones if p is not None]
If instead the entire field may be absent:
class RawPhones(BaseModel):
    phones: list[str] | None = None
Now your logic must handle None at the container level, which often indicates a different meaning (“not provided”).
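A minimal sketch of that container-level branch (how you resolve None is a policy choice, shown here only as an example):

raw = RawPhones.model_validate({})  # field omitted entirely
if raw.phones is None:
    phones = []          # or: keep the existing value, or reject the request
else:
    phones = raw.phones  # an explicit list, possibly empty, replaces the old one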
Nested optionality: optional nested objects and partial nested updates
Nested optionality appears when a nested object may not exist (e.g., a user may not have a company profile), or when you accept partial updates to nested structures.
Optional nested object
from pydantic import BaseModel

class Company(BaseModel):
    name: str
    vat_id: str | None = None

class User(BaseModel):
    email: str
    company: Company | None = None
This expresses that company may be absent or explicitly null in JSON. Your code should treat these cases intentionally. For example, “company omitted” in a patch might mean “do not change company”, while “company: null” might mean “remove company”. To represent that distinction, use a patch model and exclude_unset again.
Step-by-step: partial update of a nested structure
Suppose you have a user record and want to apply a patch that may include nested company changes. A robust approach is to create patch models that mirror the structure but make every field optional and omittable.
from pydantic import BaseModel

class CompanyPatch(BaseModel):
    name: str | None = None
    vat_id: str | None = None

class UserPatch(BaseModel):
    email: str | None = None
    company: CompanyPatch | None = None
Now apply it in steps:
def apply_user_patch(user: dict, patch: UserPatch) -> dict:
    changes = patch.model_dump(exclude_unset=True)
    # Top-level fields
    for key in ("email",):
        if key in changes:
            user[key] = changes[key]
    # Nested company
    if "company" in changes:
        if changes["company"] is None:
            user["company"] = None
        else:
            user.setdefault("company", {})
            user["company"].update(changes["company"])
    return user
This pattern makes the semantics explicit:
- Omitted company: no change.
- company: null: remove company.
- company: { ... }: update only provided nested fields.
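A usage sketch with illustrative data, reusing apply_user_patch and the patch models above:

user = {"email": "a@b.com", "company": {"name": "Acme", "vat_id": "NO123"}}

# Only company.name was provided, so vat_id is left untouched
patch = UserPatch.model_validate({"company": {"name": "Acme AS"}})
updated = apply_user_patch(user, patch)
assert updated["company"] == {"name": "Acme AS", "vat_id": "NO123"}

# An explicit null removes the company entirely
removal = UserPatch.model_validate({"company": None})
assert apply_user_patch(updated, removal)["company"] is None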
Modeling dictionaries and keyed collections
Lists are not always the best representation for “many”. If elements have a natural key, a dictionary can encode uniqueness and enable fast lookups. The type hint dict[str, T] also documents what the key represents.
Example: inventory by SKU
from pydantic import BaseModel

class StockItem(BaseModel):
    sku: str
    on_hand: int

class Inventory(BaseModel):
    # Keyed by SKU
    items: dict[str, StockItem]

inv = Inventory.model_validate({
    "items": {
        "SKU1": {"sku": "SKU1", "on_hand": 10},
        "SKU2": {"sku": "SKU2", "on_hand": 0},
    }
})
When using keyed collections, consider whether you want to duplicate the key inside the value (here sku appears twice). Sometimes you can remove redundancy by modeling the value without the key and relying on the dict key as the identifier. Other times redundancy is useful to detect mismatches (e.g., payload says key SKU1 but value has sku="SKU9"), in which case you can add a validation rule to enforce consistency.
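One way to enforce that consistency in Pydantic v2 is an after-model validator; a sketch reusing StockItem from above (the CheckedInventory name is illustrative):

from pydantic import BaseModel, model_validator

class CheckedInventory(BaseModel):
    items: dict[str, StockItem]

    @model_validator(mode="after")
    def keys_match_skus(self) -> "CheckedInventory":
        # Reject payloads where the dict key and the embedded sku disagree
        for key, item in self.items.items():
            if key != item.sku:
                raise ValueError(f"dict key {key!r} does not match item sku {item.sku!r}")
        return self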
Constraining collections: length, uniqueness, and element rules
Beyond “list of T”, you often need constraints: at least one element, maximum size, unique items, or element-level patterns.
Pydantic: constrained collections and element types
In Pydantic v2 you can use Field constraints for length and other properties. For uniqueness, prefer set when possible; otherwise validate.
from pydantic import BaseModel, Field

class Survey(BaseModel):
    # At least 1 answer
    answers: list[str] = Field(min_length=1)
    # Tags are unique by construction
    tags: set[str] = Field(default_factory=set)
If you must keep order and also enforce uniqueness, you can validate that len(list) == len(set(list)) with a custom validator (implementation details depend on your chosen approach). The key modeling decision is whether uniqueness is a property of the domain (use set) or a property of a particular workflow (validate a list).
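One possible implementation, sketched with a field validator (the OrderedUnique model is illustrative):

from pydantic import BaseModel, field_validator

class OrderedUnique(BaseModel):
    answers: list[str]

    @field_validator("answers")
    @classmethod
    def no_duplicates(cls, value: list[str]) -> list[str]:
        # Order is preserved; duplicates are rejected rather than silently dropped
        if len(value) != len(set(value)):
            raise ValueError("answers must not contain duplicates")
        return value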
Serialization and shape control for nested and optional fields
When you serialize models (to JSON/dicts), optionality affects output. You often want to omit fields that are None or omit fields that were never set.
Pydantic: dumping nested models with omission rules
from pydantic import BaseModel

class Address(BaseModel):
    street: str
    city: str
    postal_code: str | None = None

class User(BaseModel):
    email: str
    address: Address | None = None

u = User.model_validate({"email": "a@b.com", "address": {"street": "1 Main", "city": "Oslo"}})
as_dict = u.model_dump(exclude_none=True)
# postal_code omitted because it's None
exclude_none=True is useful when None means “not present” in your output contract. If None has meaning (e.g., explicit null), do not exclude it.
For patch semantics, exclude_unset=True is the more important tool: it omits fields that were not provided, even if their default is None.
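The two options answer different questions, as this short sketch with the User model above shows:

u2 = User.model_validate({"email": "a@b.com", "address": None})

# address was explicitly provided as null, so exclude_unset keeps it...
assert u2.model_dump(exclude_unset=True) == {"email": "a@b.com", "address": None}
# ...while exclude_none drops it regardless of how it got there
assert u2.model_dump(exclude_none=True) == {"email": "a@b.com"}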
Practical recipe: modeling an API payload with nested lists and optional fields
This step-by-step example shows a realistic payload: a batch of events, each with optional metadata and nested context. The model should validate element types, allow optional fields, and keep the structure explicit.
Step 1: define nested models
from pydantic import BaseModel, Field

class DeviceContext(BaseModel):
    os: str
    app_version: str | None = None

class Event(BaseModel):
    event_id: str
    type: str
    timestamp_ms: int
    device: DeviceContext | None = None
    attributes: dict[str, str] = Field(default_factory=dict)

class EventBatch(BaseModel):
    source: str
    events: list[Event] = Field(min_length=1)
Step 2: validate incoming data
payload = {
    "source": "mobile",
    "events": [
        {
            "event_id": "e1",
            "type": "open",
            "timestamp_ms": 1700000000000,
            "device": {"os": "iOS", "app_version": "1.2.3"},
            "attributes": {"campaign": "winter"},
        },
        {
            "event_id": "e2",
            "type": "click",
            "timestamp_ms": 1700000001000,
            "device": None,
            "attributes": {},
        },
    ],
}

batch = EventBatch.model_validate(payload)
assert batch.events[0].device.os == "iOS"
Step 3: serialize with the right omission behavior
out = batch.model_dump(exclude_none=True)
# device omitted for events where device is None
This recipe highlights the core techniques:
- Use nested models to avoid raw dicts.
- Use Field(default_factory=...) for safe container defaults.
- Use min_length to enforce non-empty collections.
- Use exclude_none (or not) depending on your output contract.