Python Data Modeling in Practice: Dataclasses, Pydantic, and Type Hints

Modeling Collections, Nested Structures, and Optionality

Chapter 8

Estimated reading time: 12 minutes

Why collections, nesting, and optionality change the shape of your model

Once your models move beyond a single object, you start modeling relationships: “an order has many lines”, “a user has one profile”, “a payload contains a list of events”, “a record may or may not have a middle name”. These three themes (collections, nested structures, and optionality) are where many real-world bugs hide. Typical examples are accidental mutation of shared lists, confusing None with “empty”, accepting malformed nested data, and losing type information when you pass dictionaries around.

This chapter focuses on practical patterns for representing these shapes in Python using type hints, dataclasses, and Pydantic. The goal is to make your intent explicit: what can be repeated, what can be missing, what must be present, and how nested objects are validated and serialized.

Modeling collections: choosing the right container type

Python offers many collection types, and your choice communicates rules. Use the narrowest type that matches your intent.

  • list[T]: ordered, allows duplicates, mutable. Good for “lines in the order as entered”.
  • tuple[T, ...]: ordered, allows duplicates, immutable. Good for “a snapshot of items” or “a fixed set of values”.
  • set[T]: unordered, unique elements. Good for “tags” or “capabilities”.
  • frozenset[T]: immutable set. Good for “a stable set of permissions”.
  • dict[K, V]: mapping keyed by something meaningful (IDs, codes). Good for “items by SKU”.
  • collections.abc.Sequence[T] or Iterable[T]: interface types when you want to accept many inputs without committing to a concrete container.

A common modeling mistake is to default to list everywhere. If your domain meaning is “unique”, use a set. If your meaning is “read-only snapshot”, use a tuple or frozenset. This reduces the need for extra validation code.
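
A minimal sketch of how the container choice itself enforces rules. The Role and Article classes are hypothetical. Duplicates collapse in a set or frozenset without any validation code, and a tuple offers no way to append after construction.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Role:
    name: str
    # frozenset: a stable, read-only set of permissions.
    # It is hashable, so a plain default is allowed.
    permissions: frozenset[str] = frozenset()

@dataclass
class Article:
    # set: uniqueness is enforced by the container itself
    tags: set[str] = field(default_factory=set)
    # tuple: an ordered, read-only snapshot of revision ids
    revisions: tuple[str, ...] = ()

admin = Role("admin", frozenset({"read", "write", "read"}))
article = Article(tags={"python", "typing", "python"}, revisions=("r1", "r2"))

print(sorted(admin.permissions))  # ['read', 'write']
print(sorted(article.tags))       # ['python', 'typing']

# Tuples have no append(), so mutation fails loudly
try:
    article.revisions.append("r3")  # type: ignore[attr-defined]
except AttributeError:
    print("revisions is read-only")
```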

Dataclasses: safe defaults for collection fields

With dataclasses, never use a mutable object as a default value directly. Use default_factory so each instance gets its own container.

from dataclasses import dataclass, field

@dataclass
class Cart:
    # Each Cart gets its own list
    item_ids: list[str] = field(default_factory=list)
    # Each Cart gets its own set
    tags: set[str] = field(default_factory=set)

This is not just a style rule; it prevents shared state across instances. It also makes your model behavior predictable when you append or add items.
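
The following sketch shows both halves of this rule. The standard-library dataclass decorator rejects a bare list default with a ValueError when the class is defined. With default_factory, each instance gets an independent list. BadCart is a hypothetical name used only to trigger the error.

```python
from dataclasses import dataclass, field

# dataclasses refuse a bare mutable default at class-definition time
error_message = ""
try:
    @dataclass
    class BadCart:
        item_ids: list[str] = []
except ValueError as exc:
    error_message = str(exc)

@dataclass
class Cart:
    # A new, empty list is created for every Cart instance
    item_ids: list[str] = field(default_factory=list)

first = Cart()
second = Cart()
first.item_ids.append("sku-1")

print(error_message)    # the message suggests using default_factory
print(second.item_ids)  # [] : the second cart was not affected
```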

Pydantic: collection parsing and element validation

Pydantic parses and validates each collection element against the type parameter. If you declare list[int], every element is validated as an int. In the default (lax) mode, numeric strings such as "1" are coerced to int.

from pydantic import BaseModel

class Metrics(BaseModel):
    samples: list[int]

m = Metrics(samples=["1", 2, 3])
assert m.samples == [1, 2, 3]

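When you want stricter behavior (no coercion), you have two options. You can use strict types (e.g., StrictInt), or you can enable strict mode for the whole model with model_config = ConfigDict(strict=True). The key modeling idea is that “collection of T” is a first-class constraint, not a comment.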
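
A minimal sketch comparing the two strict options, assuming Pydantic v2. StrictMetrics, StrictConfigMetrics, and the is_rejected helper are hypothetical names. In both models the string "1" is rejected rather than coerced.

```python
from pydantic import BaseModel, ConfigDict, StrictInt, ValidationError

# Option 1: strict element type; only real ints are accepted
class StrictMetrics(BaseModel):
    samples: list[StrictInt]

# Option 2: strict mode for every field of the model
class StrictConfigMetrics(BaseModel):
    model_config = ConfigDict(strict=True)
    samples: list[int]

def is_rejected(model: type[BaseModel], data: dict) -> bool:
    """Return True if validating data against model fails."""
    try:
        model.model_validate(data)
    except ValidationError:
        return True
    return False

print(is_rejected(StrictMetrics, {"samples": ["1", 2, 3]}))        # True
print(is_rejected(StrictConfigMetrics, {"samples": ["1", 2, 3]}))  # True
print(StrictMetrics(samples=[1, 2, 3]).samples)                     # [1, 2, 3]
```

Use the element-level type when only some values must be strict. Use the model config when the entire payload should be taken literally.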

Nested structures: composing models instead of passing dicts

Nesting is how you express “has-a” relationships. Instead of representing nested data as a dict with ad-hoc keys, compose models. This gives you: (1) validation at the boundary, (2) better IDE support, and (3) consistent serialization.

Dataclasses: nested dataclasses and conversion boundaries

Dataclasses do not validate input by default. They are excellent for internal domain structures, but you typically need an explicit conversion step when ingesting untrusted data (e.g., JSON). A practical approach is to use Pydantic at the boundary and convert to dataclasses for internal use, or use Pydantic models throughout if that fits your architecture.

Here is a nested dataclass structure for an order with lines and a shipping address:

from dataclasses import dataclass, field

@dataclass
class Address:
    street: str
    city: str
    postal_code: str

@dataclass
class OrderLine:
    sku: str
    quantity: int

@dataclass
class Order:
    order_id: str
    shipping: Address
    lines: list[OrderLine] = field(default_factory=list)

To build this from a dictionary, you must explicitly construct nested objects. That explicitness is a feature: it forces you to decide what happens when keys are missing or types are wrong.

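from typing import Any

def order_from_dict(data: dict[str, Any]) -> Order:
    # Required nested object: a missing "shipping" key raises KeyError
    shipping = Address(**data["shipping"])
    # Optional collection: treat a missing "lines" key as an empty list
    lines = [OrderLine(**item) for item in data.get("lines", [])]
    return Order(order_id=data["order_id"], shipping=shipping, lines=lines)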

Step-by-step, this conversion does three important things: it chooses defaults (empty lines if missing), it creates nested objects (Address, OrderLine), and it centralizes the boundary assumptions in one place.
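
The conversion above does not check types. For example, a quantity of "2" would be stored as a string. If you want validation at the boundary but plain dataclasses internally, Pydantic v2's TypeAdapter can validate a payload directly into standard-library dataclasses and recurses into nested ones. The sketch below assumes Pydantic v2 and redefines the same shapes so that it runs on its own.

```python
from dataclasses import dataclass, field

from pydantic import TypeAdapter, ValidationError

@dataclass
class Address:
    street: str
    city: str
    postal_code: str

@dataclass
class OrderLine:
    sku: str
    quantity: int

@dataclass
class Order:
    order_id: str
    shipping: Address
    lines: list[OrderLine] = field(default_factory=list)

# Build the adapter once and reuse it wherever data enters the system
order_adapter = TypeAdapter(Order)

payload = {
    "order_id": "A100",
    "shipping": {"street": "1 Main", "city": "Oslo", "postal_code": "0001"},
    "lines": [{"sku": "SKU1", "quantity": "2"}],
}

order = order_adapter.validate_python(payload)
print(type(order.shipping).__name__)  # Address
print(order.lines[0].quantity)        # 2 (coerced from "2")

# Malformed nested data is rejected instead of being stored silently
bad_payload = {**payload, "lines": [{"sku": "SKU1", "quantity": "two"}]}
try:
    order_adapter.validate_python(bad_payload)
except ValidationError as exc:
    print(exc.error_count())  # 1
```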

Pydantic: nested models with automatic parsing

Pydantic shines for nested structures because it will recursively parse nested dictionaries into nested models. You declare the shape once, and parsing follows it.

from pydantic import BaseModel

class AddressModel(BaseModel):
    street: str
    city: str
    postal_code: str

class OrderLineModel(BaseModel):
    sku: str
    quantity: int

class OrderModel(BaseModel):
    order_id: str
    shipping: AddressModel
    lines: list[OrderLineModel] = []

payload = {
    "order_id": "A100",
    "shipping": {"street": "1 Main", "city": "Oslo", "postal_code": "0001"},
    "lines": [{"sku": "SKU1", "quantity": "2"}],
}

order = OrderModel.model_validate(payload)
assert order.lines[0].quantity == 2

Notice the nested parsing: shipping becomes an AddressModel, and each line becomes an OrderLineModel. This is especially useful when your input is JSON from APIs.
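
For JSON input you can skip json.loads and call model_validate_json. When nested data is invalid, the error location records the full path to the bad value. This is a simplified sketch assuming Pydantic v2. Shipping is left out to keep the example short.

```python
from pydantic import BaseModel, ValidationError

class OrderLineModel(BaseModel):
    sku: str
    quantity: int

class OrderModel(BaseModel):
    order_id: str
    lines: list[OrderLineModel] = []

# Parse JSON text directly; nested objects become nested models
good_json = '{"order_id": "A100", "lines": [{"sku": "SKU1", "quantity": "2"}]}'
order = OrderModel.model_validate_json(good_json)
print(order.model_dump())
# {'order_id': 'A100', 'lines': [{'sku': 'SKU1', 'quantity': 2}]}

# Errors inside nested structures report where the bad value lives
bad_json = '{"order_id": "A100", "lines": [{"sku": "SKU1", "quantity": "two"}]}'
error_loc = None
try:
    OrderModel.model_validate_json(bad_json)
except ValidationError as exc:
    error_loc = exc.errors()[0]["loc"]
print(error_loc)  # ('lines', 0, 'quantity')
```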

One caution: in Pydantic, using [] as a default is handled safely (Pydantic copies defaults), but it is still a good habit to prefer default_factory when you want to emphasize “new list per instance”. In Pydantic v2 you can write:

from pydantic import BaseModel, Field

class OrderModel(BaseModel):
    order_id: str
    shipping: AddressModel
    lines: list[OrderLineModel] = Field(default_factory=list)
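
A quick check that both spellings give each instance its own list, assuming Pydantic v2. The model names below are hypothetical.

```python
from pydantic import BaseModel, Field

class LineModel(BaseModel):
    sku: str

class WithLiteralDefault(BaseModel):
    lines: list[LineModel] = []  # Pydantic copies this default per instance

class WithFactory(BaseModel):
    lines: list[LineModel] = Field(default_factory=list)  # new list per call

results = {}
for model in (WithLiteralDefault, WithFactory):
    a = model()
    b = model()
    a.lines.append(LineModel(sku="SKU1"))
    # Record (len of a.lines, len of b.lines); b must be unaffected
    results[model.__name__] = (len(a.lines), len(b.lines))

print(results)  # {'WithLiteralDefault': (1, 0), 'WithFactory': (1, 0)}
```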

Optionality: modeling “missing”, “unknown”, and “empty”

Optionality is not just about allowing None. It is about distinguishing different meanings:

  • Missing: the field was not provided at all.
  • Unknown: the field exists but the value is unknown (often represented as None).
  • Empty: the field exists and is intentionally empty (e.g., empty string, empty list).

Conflating these leads to subtle bugs. For example, “no tags provided” might mean “leave tags unchanged” in a patch request, while “tags: []” might mean “clear all tags”. Your model should make this distinction visible.

Type hints for optional fields

In Python typing, Optional[T] is shorthand for T | None. It means the value may be None, not that the key may be missing. Whether a key can be omitted depends on your framework or constructor defaults.

from dataclasses import dataclass

@dataclass
class Profile:
    display_name: str
    bio: str | None  # can be None, but must be passed unless a default is given

If you want the field to be omittable in a constructor, you must provide a default:

@dataclass
class Profile:
    display_name: str
    bio: str | None = None  # can be omitted

Pydantic: optional vs required, and the “missing vs None” distinction

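In Pydantic v2, a field is required unless it has a default. Optional (T | None) only affects whether None is accepted as a value. Pydantic v1 behaved differently: there, an Optional field without a default implicitly defaulted to None.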

from pydantic import BaseModel

class UserIn(BaseModel):
    email: str
    phone: str | None  # required, but may be None

# UserIn.model_validate({"email": "a@b.com"})  # validation error: phone missing

To make it optional (omittable), provide a default:

class UserIn(BaseModel):
    email: str
    phone: str | None = None  # omittable and may be None

For patch/update models, you often want “missing means do not change”. During validation, Pydantic v2 records which fields were explicitly provided in the model's model_fields_set attribute. model_dump(exclude_unset=True) uses that record to omit every field that was not provided, including fields whose default is None.

from pydantic import BaseModel

class UserPatch(BaseModel):
    display_name: str | None = None
    phone: str | None = None

patch = UserPatch.model_validate({"display_name": "New"})
changes = patch.model_dump(exclude_unset=True)
assert changes == {"display_name": "New"}

This step-by-step pattern is useful:

  • Validate patch payload into a patch model.
  • Dump with exclude_unset=True to get only provided fields.
  • Apply changes to your domain object explicitly.

It prevents accidental overwrites where omitted fields become None.

Collections with optional elements vs optional collections

These two are different and should be modeled differently:

  • Optional collection: list[T] | None means the entire list may be missing/unknown.
  • Collection of optional elements: list[T | None] means the list exists, but some elements may be missing/unknown.

This matters when validating and when writing business logic. For example, a list of phone numbers might allow null entries in raw input, but your internal model might want to filter them out.

from pydantic import BaseModel

class RawPhones(BaseModel):
    phones: list[str | None]

raw = RawPhones.model_validate({"phones": ["123", None, "456"]})
clean = [p for p in raw.phones if p is not None]

If instead the entire field may be absent:

class RawPhones(BaseModel):
    phones: list[str] | None = None

Now your logic must handle None at the container level, which often indicates a different meaning (“not provided”).

Nested optionality: optional nested objects and partial nested updates

Nested optionality appears when a nested object may not exist (e.g., a user may not have a company profile), or when you accept partial updates to nested structures.

Optional nested object

from pydantic import BaseModel

class Company(BaseModel):
    name: str
    vat_id: str | None = None

class User(BaseModel):
    email: str
    company: Company | None = None

This expresses that company may be absent or explicitly null in JSON. Your code should treat these cases intentionally. For example, “company omitted” in a patch might mean “do not change company”, while “company: null” might mean “remove company”. To represent that distinction, use a patch model and exclude_unset again.

Step-by-step: partial update of a nested structure

Suppose you have a user record and want to apply a patch that may include nested company changes. A robust approach is to create patch models that mirror the structure but make every field optional and omittable.

from pydantic import BaseModel

class CompanyPatch(BaseModel):
    name: str | None = None
    vat_id: str | None = None

class UserPatch(BaseModel):
    email: str | None = None
    company: CompanyPatch | None = None

Now apply it in steps:

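def apply_user_patch(user: dict, patch: UserPatch) -> dict:
    # Only fields present in the payload (also inside nested models)
    changes = patch.model_dump(exclude_unset=True)

    # Top-level fields
    for key in ("email",):
        if key in changes:
            user[key] = changes[key]

    # Nested company
    if "company" in changes:
        if changes["company"] is None:
            # Explicit null: remove the company
            user["company"] = None
        else:
            # Start from an empty dict if the user has no company yet
            # (key missing or currently None), then merge provided fields
            if user.get("company") is None:
                user["company"] = {}
            user["company"].update(changes["company"])

    # The input dict is mutated in place and also returned
    return user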

This pattern makes the semantics explicit:

  • Omitted company: no change.
  • company: null: remove company.
  • company: { ... }: update only provided nested fields.

Modeling dictionaries and keyed collections

Lists are not always the best representation for “many”. If elements have a natural key, a dictionary can encode uniqueness and enable fast lookups. The type hint dict[str, T] also documents what the key represents.

Example: inventory by SKU

from pydantic import BaseModel

class StockItem(BaseModel):
    sku: str
    on_hand: int

class Inventory(BaseModel):
    # Keyed by SKU
    items: dict[str, StockItem]

inv = Inventory.model_validate({
    "items": {
        "SKU1": {"sku": "SKU1", "on_hand": 10},
        "SKU2": {"sku": "SKU2", "on_hand": 0},
    }
})

When using keyed collections, consider whether you want to duplicate the key inside the value (here sku appears twice). Sometimes you can remove redundancy by modeling the value without the key and relying on the dict key as the identifier. Other times redundancy is useful to detect mismatches (e.g., payload says key SKU1 but value has sku="SKU9"), in which case you can add a validation rule to enforce consistency.
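
One way to enforce that consistency is sketched below, assuming Pydantic v2. It uses an after-mode model_validator on the Inventory model from above, and the validator name is hypothetical. A ValueError raised inside the validator surfaces as a ValidationError.

```python
from pydantic import BaseModel, ValidationError, model_validator

class StockItem(BaseModel):
    sku: str
    on_hand: int

class Inventory(BaseModel):
    # Keyed by SKU; each value repeats its own sku
    items: dict[str, StockItem]

    @model_validator(mode="after")
    def keys_match_skus(self) -> "Inventory":
        # Runs after field validation, so values are StockItem instances
        for key, item in self.items.items():
            if key != item.sku:
                raise ValueError(f"key {key!r} does not match sku {item.sku!r}")
        return self

ok = Inventory.model_validate({"items": {"SKU1": {"sku": "SKU1", "on_hand": 10}}})

mismatch_detected = False
try:
    Inventory.model_validate({"items": {"SKU1": {"sku": "SKU9", "on_hand": 3}}})
except ValidationError:
    mismatch_detected = True
print(mismatch_detected)  # True
```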

Constraining collections: length, uniqueness, and element rules

Beyond “list of T”, you often need constraints: at least one element, maximum size, unique items, or element-level patterns.

Pydantic: constrained collections and element types

In Pydantic v2 you can use Field constraints for length and other properties. For uniqueness, prefer set when possible; otherwise validate.

from pydantic import BaseModel, Field

class Survey(BaseModel):
    # At least 1 answer
    answers: list[str] = Field(min_length=1)
    # Tags are unique by construction
    tags: set[str] = Field(default_factory=set)

If you must keep order and also enforce uniqueness, check that len(value) == len(set(value)) in a custom validator. The key modeling decision is whether uniqueness is a property of the domain (use set) or a property of a particular workflow (validate a list).
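
A minimal sketch of the ordered-but-unique case with a field_validator, assuming Pydantic v2. The ranked_choices field and the is_valid helper are hypothetical additions to the Survey model. Note that by default the validator does not run on the default value, which is fine here because an empty list is trivially unique.

```python
from pydantic import BaseModel, Field, ValidationError, field_validator

class Survey(BaseModel):
    # At least 1 answer
    answers: list[str] = Field(min_length=1)
    # Hypothetical field: order matters (a ranking) and entries must be unique
    ranked_choices: list[str] = Field(default_factory=list)

    @field_validator("ranked_choices")
    @classmethod
    def choices_must_be_unique(cls, value: list[str]) -> list[str]:
        if len(value) != len(set(value)):
            raise ValueError("ranked_choices must not contain duplicates")
        return value

def is_valid(data: dict) -> bool:
    """Return True if data validates as a Survey."""
    try:
        Survey.model_validate(data)
    except ValidationError:
        return False
    return True

print(is_valid({"answers": ["yes"], "ranked_choices": ["b", "a"]}))  # True
print(is_valid({"answers": ["yes"], "ranked_choices": ["a", "a"]}))  # False
print(is_valid({"answers": []}))                                     # False
```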

Serialization and shape control for nested and optional fields

When you serialize models (to JSON/dicts), optionality affects output. You often want to omit fields that are None or omit fields that were never set.

Pydantic: dumping nested models with omission rules

from pydantic import BaseModel

class Address(BaseModel):
    street: str
    city: str
    postal_code: str | None = None

class User(BaseModel):
    email: str
    address: Address | None = None

u = User.model_validate({"email": "a@b.com", "address": {"street": "1 Main", "city": "Oslo"}})

as_dict = u.model_dump(exclude_none=True)
# postal_code omitted because it's None

exclude_none=True is useful when None means “not present” in your output contract. If None has meaning (e.g., explicit null), do not exclude it.

For patch semantics, exclude_unset=True is the more important tool: it omits fields that were not provided, even if their default is None.

Practical recipe: modeling an API payload with nested lists and optional fields

This step-by-step example shows a realistic payload: a batch of events, each with optional metadata and nested context. The model should validate element types, allow optional fields, and keep the structure explicit.

Step 1: define nested models

from pydantic import BaseModel, Field

class DeviceContext(BaseModel):
    os: str
    app_version: str | None = None

class Event(BaseModel):
    event_id: str
    type: str
    timestamp_ms: int
    device: DeviceContext | None = None
    attributes: dict[str, str] = Field(default_factory=dict)

class EventBatch(BaseModel):
    source: str
    events: list[Event] = Field(min_length=1)

Step 2: validate incoming data

payload = {
    "source": "mobile",
    "events": [
        {
            "event_id": "e1",
            "type": "open",
            "timestamp_ms": 1700000000000,
            "device": {"os": "iOS", "app_version": "1.2.3"},
            "attributes": {"campaign": "winter"}
        },
        {
            "event_id": "e2",
            "type": "click",
            "timestamp_ms": 1700000001000,
            "device": None,
            "attributes": {}
        }
    ]
}

batch = EventBatch.model_validate(payload)
assert batch.events[0].device.os == "iOS"

Step 3: serialize with the right omission behavior

out = batch.model_dump(exclude_none=True)
# device omitted for events where device is None

This recipe highlights the core techniques:

  • Use nested models to avoid raw dicts.
  • Use Field(default_factory=...) for safe container defaults.
  • Use min_length to enforce non-empty collections.
  • Use exclude_none (or not) depending on your output contract.

Now answer the exercise about the content:

In a Pydantic patch/update workflow, how can you avoid accidentally overwriting fields that were not provided in the input payload?

In patch models, fields may default to None, but omitted fields should mean no change. Dumping with exclude_unset=True keeps only fields that were actually provided, preventing unintended overwrites.
