
Python Data Modeling in Practice: Dataclasses, Pydantic, and Type Hints

Dataclasses for Clean Domain Objects

Chapter 3

Why dataclasses fit domain objects

Domain objects are the small, focused types that represent concepts in your problem space: Money, EmailAddress, OrderLine, CustomerId, TimeRange, and so on. In Python, you can always model these with plain classes, but you quickly end up writing repetitive code: an __init__, comparisons, a readable __repr__, and sometimes defensive copying or immutability rules. dataclasses reduce that boilerplate while keeping you in “normal Python” (no framework required) and letting you express intent through field definitions and a small set of configuration flags.

For clean domain objects, the key idea is: use dataclasses to generate the mechanical parts (construction, representation, comparison), and write explicit domain logic for invariants and behaviors. A dataclass should not be “just a bag of fields”; it should be a type that protects its own validity and provides operations that make sense in the domain.

Choosing the right dataclass options

frozen: prefer immutability for value objects

Many domain types are value objects: their identity is defined by their attributes, and they are safe to share. For these, @dataclass(frozen=True) is a strong default. It prevents accidental mutation. Together with the default eq=True, it also generates a __hash__ based on the fields, so instances are hashable as long as every field value is hashable. That makes them usable as dictionary keys or set elements.

from dataclasses import dataclass
@dataclass(frozen=True, slots=True)
class Money:
    amount: int  # store minor units (e.g., cents)
    currency: str
    def __post_init__(self) -> None:
        if self.amount < 0:
            raise ValueError("Money.amount must be non-negative")
        if len(self.currency) != 3 or not self.currency.isalpha():
            raise ValueError("Money.currency must be a 3-letter code")
        object.__setattr__(self, "currency", self.currency.upper())

With frozen=True, assigning to a field after construction raises dataclasses.FrozenInstanceError. You can still normalize values in __post_init__ via object.__setattr__, which bypasses the generated __setattr__. This is a common pattern for enforcing canonical forms (uppercasing currency codes, trimming whitespace, normalizing phone numbers).

slots: reduce accidental attributes and memory footprint

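slots=True (available since Python 3.10) generates __slots__ for the class, so instances have no per-instance __dict__. Assigning an attribute that is not a declared field raises AttributeError, and memory usage is usually lower. It also helps keep domain objects “closed” to ad-hoc fields, which is often desirable for correctness.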

eq and order: define comparisons intentionally

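By default, dataclasses generate an __eq__ that compares instances of the same class field by field. For value objects, that is usually correct.

For entities (objects whose identity persists while their attributes change), field-based equality can be wrong in both directions:

- Two different entities might temporarily share the same attributes.
- The same entity seen before and after an update (say, a name change) would compare unequal.

In those cases, model identity explicitly (for example, with a customer_id field) and compare by that identity. You can do this with a hand-written __eq__ or by excluding the other fields from comparison.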

@dataclass(slots=True)
class Customer:
    customer_id: str
    name: str
    email: str
    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Customer):
            return NotImplemented
        return self.customer_id == other.customer_id

Alternatively, keep the dataclass-generated equality but ensure the identity field is the only field participating in comparison by marking other fields with compare=False.

from dataclasses import dataclass, field
@dataclass(slots=True)
class Customer:
    customer_id: str
    name: str = field(compare=False)
    email: str = field(compare=False)

Step-by-step: building a clean value object with invariants

This section walks through a practical pattern you can reuse: define fields, enforce invariants in __post_init__, and add domain methods that keep the object valid.

Step 1: start with a minimal dataclass

from dataclasses import dataclass
@dataclass(frozen=True, slots=True)
class EmailAddress:
    value: str

At this point, EmailAddress("not an email") is allowed. The type exists, but it does not protect itself.

Step 2: enforce invariants and normalize

Domain objects should be hard to misuse. Add validation and canonicalization in __post_init__. Keep it lightweight; you can use a simple rule set that matches your domain needs.

@dataclass(frozen=True, slots=True)
class EmailAddress:
    value: str
    def __post_init__(self) -> None:
        v = self.value.strip()
        if "@" not in v:
            raise ValueError("Invalid email address")
        local, _, domain = v.partition("@")
        if not local or not domain or "." not in domain:
            raise ValueError("Invalid email address")
        object.__setattr__(self, "value", v.lower())

Now the object guarantees a basic invariant: it always contains a normalized email string.

Step 3: add domain-friendly behavior

Instead of scattering string operations across the codebase, add methods that express domain intent.

@dataclass(frozen=True, slots=True)
class EmailAddress:
    value: str
    def __post_init__(self) -> None:
        v = self.value.strip()
        if "@" not in v:
            raise ValueError("Invalid email address")
        local, _, domain = v.partition("@")
        if not local or not domain or "." not in domain:
            raise ValueError("Invalid email address")
        object.__setattr__(self, "value", v.lower())
    @property
    def domain(self) -> str:
        return self.value.split("@", 1)[1]
    def is_corporate(self, corporate_domain: str) -> bool:
        return self.domain == corporate_domain.lower()

The rest of the system can now depend on EmailAddress rather than raw strings, reducing repeated checks and edge cases.

Entities with dataclasses: identity, mutation, and invariants

Entities often change over time (status, address, assigned agent). For these, frozen=True may be inappropriate. You can still use dataclasses to generate the initializer and representation, while carefully controlling mutation through methods that enforce rules.

Encapsulate state changes through methods

A common anti-pattern is to expose mutable fields and let any caller assign to them. A cleaner approach is to keep fields “public” (Python does not enforce privacy) but treat them as internal and provide explicit methods for state transitions. You can also use properties if you want stricter control.

from dataclasses import dataclass, field
from datetime import datetime
@dataclass(slots=True)
class Order:
    order_id: str
    created_at: datetime
    status: str = field(default="draft")
    paid_at: datetime | None = field(default=None)
    def mark_paid(self, when: datetime) -> None:
        if self.status == "cancelled":
            raise ValueError("Cannot pay a cancelled order")
        if self.paid_at is not None:
            raise ValueError("Order is already paid")
        if when < self.created_at:
            raise ValueError("paid_at cannot be earlier than created_at")
        self.status = "paid"
        self.paid_at = when
    def cancel(self) -> None:
        if self.status == "paid":
            raise ValueError("Cannot cancel a paid order")
        self.status = "cancelled"

The dataclass provides a clean constructor and readable representation, while the methods ensure transitions remain valid.

Use __post_init__ to validate initial state

Even for mutable entities, validate that the object starts in a consistent state.

@dataclass(slots=True)
class Order:
    order_id: str
    created_at: datetime
    status: str = field(default="draft")
    paid_at: datetime | None = field(default=None)
    # mark_paid() and cancel() from the previous listing are unchanged and omitted here.
    def __post_init__(self) -> None:
        allowed = {"draft", "paid", "cancelled"}
        if self.status not in allowed:
            raise ValueError(f"Invalid status: {self.status}")
        if self.status == "paid" and self.paid_at is None:
            raise ValueError("paid_at must be set when status is paid")
        if self.status != "paid" and self.paid_at is not None:
            raise ValueError("paid_at must be None unless status is paid")

Modeling collections safely: default factories and defensive design

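Domain objects often contain collections: order lines, tags, applied discounts. In dataclasses, never use a mutable object as a default value directly. The dataclass decorator rejects list, dict, and set defaults with a ValueError when the class is defined. Use default_factory so each instance gets its own list, dict, or set.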

from dataclasses import dataclass, field
@dataclass(slots=True)
class Cart:
    cart_id: str
    item_skus: list[str] = field(default_factory=list)

Beyond avoiding shared defaults, consider how callers can mutate collections. If you want stronger protection, expose tuples instead of lists, or provide methods that manage updates.

@dataclass(slots=True)
class Cart:
    cart_id: str
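    # init=False: the list is always created here, never handed in (and shared) by a caller.
    _item_skus: list[str] = field(default_factory=list, init=False, repr=False)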
    def add(self, sku: str) -> None:
        if not sku:
            raise ValueError("sku must be non-empty")
        self._item_skus.append(sku)
    def remove(self, sku: str) -> None:
        self._item_skus.remove(sku)
    @property
    def item_skus(self) -> tuple[str, ...]:
        return tuple(self._item_skus)

This pattern keeps mutation inside the class, where you can enforce invariants (no duplicates, maximum size, allowed SKUs) without relying on callers.

Composing domain objects: nested dataclasses

Dataclasses compose naturally: a domain object can contain other domain objects. This encourages small, reusable types and keeps validation close to the data it protects.

from dataclasses import dataclass
@dataclass(frozen=True, slots=True)
class Address:
    line1: str
    city: str
    postal_code: str
    country_code: str
    def __post_init__(self) -> None:
        if not self.line1.strip():
            raise ValueError("Address.line1 is required")
        if not self.city.strip():
            raise ValueError("Address.city is required")
        if len(self.country_code) != 2:
            raise ValueError("country_code must be 2 letters")
        object.__setattr__(self, "country_code", self.country_code.upper())
@dataclass(slots=True)
class Customer:
    customer_id: str
    name: str
    email: EmailAddress
    shipping_address: Address

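Notice how Customer does not need to re-validate email or address rules; it relies on those types to be correct by construction. Type hints are not checked at runtime, though. The guarantee holds only when callers actually pass EmailAddress and Address instances rather than raw strings or dicts, ideally enforced by a static type checker.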

Controlling what appears in repr and comparisons

Domain objects often contain sensitive or noisy fields (password hashes, tokens, large blobs). Dataclasses let you exclude fields from repr and from comparisons using field(repr=False) and field(compare=False).

from dataclasses import dataclass, field
@dataclass(slots=True)
class ApiCredentials:
    client_id: str
    client_secret: str = field(repr=False)
    def __post_init__(self) -> None:
        if not self.client_id:
            raise ValueError("client_id is required")
        if len(self.client_secret) < 16:
            raise ValueError("client_secret is too short")

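This keeps logs and debugging output safer by default. Note that repr=False affects only the generated __repr__: the attribute is still readable, and dataclasses.asdict still includes it.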

Creating derived values: computed fields and caching

Sometimes a domain object needs a derived value (for example, a normalized key, a display label, or a precomputed total). With dataclasses, you can compute it in __post_init__ and store it in a field marked init=False. This is useful when the derived value is expensive or you want it to participate in equality/hashing in a controlled way.

from dataclasses import dataclass, field
@dataclass(frozen=True, slots=True)
class ProductCode:
    raw: str
    normalized: str = field(init=False)
    def __post_init__(self) -> None:
        r = self.raw.strip()
        if not r:
            raise ValueError("ProductCode.raw is required")
        n = r.upper().replace("-", "")
        object.__setattr__(self, "raw", r)
        object.__setattr__(self, "normalized", n)

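Because the object is frozen, the computed field is stable. As written, both raw and normalized take part in equality and hashing. As a result, ProductCode("ab-123") and ProductCode("AB123") compare unequal even though they normalize to the same code. Choose which fields define equality by setting compare flags: for example, field(compare=False) on raw makes equality depend on normalized alone.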

Replacing values in immutable objects: dataclasses.replace

When using frozen dataclasses, you cannot mutate fields. Instead, create a new instance with a small change using dataclasses.replace. This is especially useful for value objects that evolve through transformations (apply discount, change time zone, adjust rounding) while keeping the original instance intact.

from dataclasses import dataclass, replace
@dataclass(frozen=True, slots=True)
class Percentage:
    value: int  # 0..100
    def __post_init__(self) -> None:
        if not (0 <= self.value <= 100):
            raise ValueError("Percentage must be between 0 and 100")
    def clamp(self, min_value: int, max_value: int) -> "Percentage":
        v = min(max(self.value, min_value), max_value)
        return replace(self, value=v)

This keeps transformations explicit and testable: each method returns a new valid instance.

Interoperability: converting to dictionaries without leaking internals

Domain objects often need to cross boundaries: persistence, messaging, or API responses. Dataclasses provide dataclasses.asdict, but it recursively converts nested dataclasses and collections, which can be convenient but also too permissive (it may include internal fields you intended to hide, and it copies everything).

A practical approach for clean domain objects is to define explicit conversion methods that return exactly what you want to expose. This keeps boundary decisions out of the core dataclass mechanics.

from dataclasses import dataclass
@dataclass(frozen=True, slots=True)
class Address:
    line1: str
    city: str
    postal_code: str
    country_code: str
    def to_record(self) -> dict[str, str]:
        return {
            "line1": self.line1,
            "city": self.city,
            "postal_code": self.postal_code,
            "country_code": self.country_code,
        }

For entities with internal mutable collections or private fields, explicit conversion methods are even more valuable because they prevent accidental leakage of internal state.

Testing domain objects built with dataclasses

Dataclasses make tests simpler because construction and equality are straightforward. Focus tests on invariants and behavior: invalid inputs raise errors, normalization happens, and domain methods enforce rules.

  • Test that invalid construction fails (e.g., empty email, negative money).
  • Test canonicalization (currency uppercased, whitespace trimmed).
  • Test that state transitions are guarded (cannot pay twice, cannot cancel after payment).
  • Test that equality matches domain meaning (value objects compare by value; entities compare by identity).

import pytest
from datetime import datetime, timedelta, timezone
# Money and Order are assumed to be imported from the module under test.
def test_money_currency_is_normalized():
    m = Money(amount=100, currency="usd")
    assert m.currency == "USD"
def test_order_cannot_be_paid_twice():
    # datetime.utcnow() is deprecated since Python 3.12; use an aware UTC timestamp.
    created = datetime.now(timezone.utc)
    o = Order(order_id="o1", created_at=created)
    o.mark_paid(created)
    with pytest.raises(ValueError, match="already paid"):
        o.mark_paid(created + timedelta(seconds=1))

Practical checklist for clean dataclass-based domain objects

  • Use frozen=True for value objects; prefer immutability when possible.
  • Use slots=True to prevent accidental attributes and reduce memory overhead.
  • Validate and normalize in __post_init__; keep objects valid by construction.
  • For entities, define identity-based equality (custom __eq__ or compare=False on non-identity fields).
  • Use default_factory for mutable defaults; consider exposing immutable views (tuples) of internal collections.
  • Hide sensitive fields from repr with repr=False.
  • Prefer explicit conversion methods (to_record, to_dict) over blanket asdict when crossing boundaries.
  • Keep domain behavior in methods; avoid letting callers mutate fields directly in ways that bypass invariants.

Exercise

When designing a dataclass-based domain object, which approach best helps keep the object valid and hard to misuse?

Clean domain objects should protect their own validity. Use dataclasses for construction and representation, then validate and normalize in __post_init__ and add methods that enforce domain rules and safe state transitions.

Next chapter

Enforcing Invariants with Post-Initialization and Validators
