What “Testing a Data Model” Really Means
Testing data models is not the same as testing CRUD operations or verifying that “a field exists.” A data model is a compact set of rules about what values are allowed, how values relate to each other, and what must always be true. When you test a data model, you are testing those rules as executable contracts.
In practice, model tests answer questions like:
- Which invariants must hold for every instance, regardless of how it is created?
- Which edge cases are likely to break assumptions (empty strings, boundary numbers, time zones, NaN, large collections, Unicode)?
- Which contracts exist between layers (e.g., “this model accepts external input,” “this model is internal-only,” “this model can be serialized to JSON without loss”)?
- How do we prevent regressions when the model evolves (new fields, relaxed constraints, stricter constraints)?
A useful mental model is to treat each data model as a small specification. Your tests should read like that specification and should fail loudly when the specification is violated.
Test Surfaces: Invariants, Edge Cases, and Contracts
Invariants
An invariant is a property that must always be true for a valid instance. Invariants can be local (single field) or relational (multiple fields). Examples include:
- Numeric bounds: quantity > 0, discount between 0 and 1.
- Cross-field rules: end_date must be after start_date; if status is “paid” then paid_at must be set.
- Normalization rules: emails are lowercased; whitespace trimmed; currency codes uppercased.
- Uniqueness constraints inside a collection: line item SKUs must be unique.
Model tests should verify both acceptance (valid examples) and rejection (invalid examples). For invariants, rejection tests are often more important because they prevent silent corruption.
Continue in our app.
You can listen to the audiobook with the screen off, receive a free certificate for this course, and also have access to 5,000 other free online courses.
Or continue reading below...Download the app
Edge cases
Edge cases are values that are technically valid types but sit at the boundaries of meaning or representation. They are a common source of production bugs because they bypass “happy path” assumptions.
- Empty and whitespace-only strings, very long strings, Unicode normalization (e.g., accented characters).
- Zero, negative, and extremely large numbers; floating point edge cases like NaN and Infinity.
- Time zones, DST transitions, naive vs aware datetimes.
- Empty lists vs missing lists; duplicates; ordering differences.
- Optional fields: None vs empty string vs missing key.
Edge-case testing is where you encode the “we learned this the hard way” knowledge into a permanent safety net.
Contracts
A contract is a promise about how the model behaves at its boundaries. Contracts are especially relevant when models are used across layers (API boundary, persistence boundary, messaging boundary). Typical contracts include:
- Construction contract: “This model rejects invalid input with a predictable exception type.”
- Serialization contract: “This model can be converted to JSON and back without losing meaning.”
- Compatibility contract: “Adding a new optional field does not break older payloads.”
- Stability contract: “Equality and hashing behave consistently for use as dict keys or set members.”
Contract tests are about integration points, but you can still write them as unit tests around the model itself.
Organizing Model Tests: A Practical Structure
For each model, aim for a small, repeatable test template. A common structure is:
- Valid construction tests: a few representative valid instances.
- Invariant violation tests: one test per rule, with minimal failing inputs.
- Normalization tests: verify canonicalization (case-folding, trimming, rounding rules).
- Edge-case tests: boundary values and tricky representations.
- Contract tests: serialization round-trips, equality/hash behavior, backward compatibility.
Keep tests focused: each test should fail for one reason. When a test fails, it should immediately tell you which rule was broken.
Example Model: Order Discount Rules (Invariant + Edge Cases)
Consider a simplified discount model with cross-field rules. The exact implementation can be dataclasses or Pydantic; the testing approach is the same: treat the model as a contract and test the contract.
from dataclasses import dataclass, field
from datetime import datetime, timezone
from decimal import Decimal
@dataclass(frozen=True)
class Discount:
code: str
percent: Decimal | None = None
amount: Decimal | None = None
starts_at: datetime | None = None
ends_at: datetime | None = None
def __post_init__(self):
# Example invariants (implementation details may differ in your codebase)
if not self.code or not self.code.strip():
raise ValueError("code must be non-empty")
if (self.percent is None) == (self.amount is None):
raise ValueError("exactly one of percent or amount must be set")
if self.percent is not None and not (Decimal("0") < self.percent <= Decimal("1")):
raise ValueError("percent must be in (0, 1]")
if self.amount is not None and self.amount <= Decimal("0"):
raise ValueError("amount must be > 0")
if self.starts_at and self.ends_at and self.ends_at <= self.starts_at:
raise ValueError("ends_at must be after starts_at")
if (self.starts_at and self.starts_at.tzinfo is None) or (self.ends_at and self.ends_at.tzinfo is None):
raise ValueError("datetimes must be timezone-aware")
def is_active(self, now: datetime) -> bool:
if now.tzinfo is None:
raise ValueError("now must be timezone-aware")
if self.starts_at and now < self.starts_at:
return False
if self.ends_at and now >= self.ends_at:
return False
return TrueEven if your real model differs, the tests below illustrate how to cover invariants, edge cases, and contracts.
Step-by-step: Writing invariant tests with pytest
Step 1: Create a helper that builds a valid instance with defaults. This reduces noise and makes each test focus on one rule.
import pytest
from datetime import datetime, timezone
from decimal import Decimal
def valid_discount(**overrides):
base = dict(
code="WELCOME10",
percent=Decimal("0.10"),
amount=None,
starts_at=datetime(2026, 1, 1, tzinfo=timezone.utc),
ends_at=datetime(2026, 2, 1, tzinfo=timezone.utc),
)
base.update(overrides)
return Discount(**base)Step 2: Test valid construction first. This ensures your helper is correct and provides a baseline.
def test_discount_valid_percent_construction():
d = valid_discount()
assert d.code == "WELCOME10"
assert d.percent == Decimal("0.10")
def test_discount_valid_amount_construction():
d = valid_discount(percent=None, amount=Decimal("5.00"))
assert d.amount == Decimal("5.00")Step 3: Write one test per invariant. Use minimal failing inputs and assert the exception type and message fragment.
def test_discount_code_must_be_non_empty():
with pytest.raises(ValueError, match="code must be non-empty"):
valid_discount(code=" ")
def test_discount_requires_exactly_one_of_percent_or_amount():
with pytest.raises(ValueError, match="exactly one"):
valid_discount(amount=Decimal("1.00")) # percent still set
with pytest.raises(ValueError, match="exactly one"):
valid_discount(percent=None, amount=None)
def test_discount_percent_bounds():
with pytest.raises(ValueError, match="percent must be"):
valid_discount(percent=Decimal("0"), amount=None)
with pytest.raises(ValueError, match="percent must be"):
valid_discount(percent=Decimal("1.5"), amount=None)
def test_discount_amount_must_be_positive():
with pytest.raises(ValueError, match="amount must be"):
valid_discount(percent=None, amount=Decimal("0"))
def test_discount_end_must_be_after_start():
with pytest.raises(ValueError, match="ends_at must be after"):
valid_discount(
starts_at=datetime(2026, 2, 1, tzinfo=timezone.utc),
ends_at=datetime(2026, 2, 1, tzinfo=timezone.utc),
)Step 4: Add edge-case tests that reflect real-world failures. For example, timezone awareness and boundary time comparisons.
def test_discount_rejects_naive_datetimes():
naive = datetime(2026, 1, 1)
with pytest.raises(ValueError, match="timezone-aware"):
valid_discount(starts_at=naive)
def test_is_active_boundary_end_is_exclusive():
d = valid_discount(
starts_at=datetime(2026, 1, 1, tzinfo=timezone.utc),
ends_at=datetime(2026, 1, 2, tzinfo=timezone.utc),
)
assert d.is_active(datetime(2026, 1, 1, 0, 0, tzinfo=timezone.utc)) is True
assert d.is_active(datetime(2026, 1, 2, 0, 0, tzinfo=timezone.utc)) is FalseNotice how the edge-case test encodes a contract: “ends_at is exclusive.” If you later change that rule, the test will force you to update the spec intentionally.
Testing Normalization as a Contract
Many models normalize inputs (trim, case-fold, canonicalize). Normalization is not just a convenience; it is part of the model’s contract. If the model promises that codes are stored uppercase, you should test that behavior explicitly.
Suppose your model normalizes discount codes to uppercase and strips whitespace. You want tests that prove:
- Normalization happens consistently.
- Normalization does not silently change meaning in unexpected ways.
def test_discount_code_is_normalized():
d = Discount(
code=" welcome10 ",
percent=Decimal("0.10"),
amount=None,
starts_at=datetime(2026, 1, 1, tzinfo=timezone.utc),
ends_at=datetime(2026, 2, 1, tzinfo=timezone.utc),
)
# This assertion depends on your model's normalization rules
assert d.code == "WELCOME10"If your model does not normalize but instead rejects non-canonical forms, then the test should assert rejection. The key is to make the rule explicit and stable.
Property-Based Testing for Invariants and Edge Cases
Example-based tests are necessary but not sufficient for tricky domains. Property-based testing generates many inputs automatically and checks that a property always holds. This is especially effective for:
- Numeric boundaries (percent ranges, rounding rules).
- String normalization and Unicode edge cases.
- Collections (duplicates, ordering, empty vs large).
With Hypothesis, you can express invariants as properties. For example, “if construction succeeds, then percent is within bounds and exactly one of percent/amount is set.”
import pytest
from decimal import Decimal
from hypothesis import given, strategies as st
decimals_0_2 = st.decimals(min_value=Decimal("-2"), max_value=Decimal("2"), allow_nan=False, allow_infinity=False)
@given(p=decimals_0_2)
def test_discount_percent_property(p):
# For values in (0, 1], construction should succeed; otherwise fail.
if Decimal("0") < p <= Decimal("1"):
d = Discount(code="X", percent=p, amount=None)
assert d.percent == p
assert d.amount is None
else:
with pytest.raises(ValueError):
Discount(code="X", percent=p, amount=None)This test explores many values you would not think to write by hand. It also forces you to be precise about boundary conditions.
Testing Serialization Round-Trips as a Contract
Even if serialization mechanics were covered elsewhere, the testing angle is different: you are verifying a contract. A round-trip test checks that converting to a transport form and back preserves meaning.
For models with datetimes, decimals, and optional fields, round-trips can fail subtly (timezone loss, decimal stringification, missing vs null). A contract test should pin down your intended behavior.
import json
from dataclasses import asdict
def test_discount_json_round_trip_contract():
d1 = Discount(
code="WELCOME10",
percent=Decimal("0.10"),
amount=None,
starts_at=datetime(2026, 1, 1, tzinfo=timezone.utc),
ends_at=None,
)
# Example: define a stable JSON representation (you may have a dedicated serializer)
payload = {
"code": d1.code,
"percent": str(d1.percent) if d1.percent is not None else None,
"amount": str(d1.amount) if d1.amount is not None else None,
"starts_at": d1.starts_at.isoformat() if d1.starts_at else None,
"ends_at": d1.ends_at.isoformat() if d1.ends_at else None,
}
raw = json.dumps(payload)
loaded = json.loads(raw)
d2 = Discount(
code=loaded["code"],
percent=Decimal(loaded["percent"]) if loaded["percent"] is not None else None,
amount=Decimal(loaded["amount"]) if loaded["amount"] is not None else None,
starts_at=datetime.fromisoformat(loaded["starts_at"]) if loaded["starts_at"] else None,
ends_at=datetime.fromisoformat(loaded["ends_at"]) if loaded["ends_at"] else None,
)
assert d2 == d1This test is not about “how to serialize,” but about ensuring that your chosen representation is stable and reversible. If you later change the representation (e.g., epoch seconds instead of ISO strings), you update the contract intentionally.
Backward/Forward Compatibility Tests for Evolving Models
Models evolve. Fields are added, renamed, or made optional. Compatibility tests prevent accidental breaking changes, especially at boundaries where older payloads may still exist.
Step-by-step: Golden payload tests
Step 1: Store a few representative payloads as “golden” fixtures (as dicts in tests, or JSON files). Include tricky cases: missing optional fields, nulls, and boundary values.
GOLDEN_V1 = {
"code": "WELCOME10",
"percent": "0.10",
"amount": None,
"starts_at": "2026-01-01T00:00:00+00:00",
# ends_at intentionally missing in v1
}
def test_can_load_older_payload_missing_optional_fields():
loaded = GOLDEN_V1
d = Discount(
code=loaded["code"],
percent=Decimal(loaded["percent"]) if loaded.get("percent") else None,
amount=Decimal(loaded["amount"]) if loaded.get("amount") else None,
starts_at=datetime.fromisoformat(loaded["starts_at"]) if loaded.get("starts_at") else None,
ends_at=datetime.fromisoformat(loaded["ends_at"]) if loaded.get("ends_at") else None,
)
assert d.code == "WELCOME10"Step 2: Add a forward-compatibility test if your boundary allows unknown fields (common in event payloads). The contract might be: “unknown keys are ignored.” If that is your rule, test it.
def test_unknown_fields_do_not_break_loading_contract():
payload = dict(GOLDEN_V1)
payload["new_field"] = "ignored"
# Your loader should ignore unknown fields; this example does so by only reading known keys.
d = Discount(
code=payload["code"],
percent=Decimal(payload["percent"]) if payload.get("percent") else None,
amount=None,
starts_at=datetime.fromisoformat(payload["starts_at"]),
ends_at=None,
)
assert d.code == "WELCOME10"These tests force you to decide and document compatibility rules. If you prefer strictness (reject unknown fields), invert the test accordingly.
Testing Equality, Hashing, and “Identity-Like” Behavior
Even when a model is not an entity, equality and hashing are contracts that affect correctness in subtle ways. If a model is used as a dict key, placed in a set, or compared in tests, you should pin down expected behavior.
Common pitfalls include:
- Mutable fields used in hashing (leading to “lost” set members).
- Floating point fields causing surprising equality behavior.
- Datetime comparisons across time zones.
Write tests that reflect how the model is used. For example, if a model is frozen and hashable, verify it behaves correctly in sets.
def test_discount_is_hashable_and_set_stable():
d1 = Discount(code="A", percent=Decimal("0.10"), amount=None)
d2 = Discount(code="A", percent=Decimal("0.10"), amount=None)
s = {d1}
assert d2 in sIf your model intentionally should not be hashable, test that too (e.g., TypeError on hashing). The point is to make the contract explicit.
Testing Collection Invariants: Uniqueness, Totals, and Ordering
Models that contain collections often have invariants that are easy to forget: uniqueness, non-emptiness, totals matching sums, and ordering constraints. These invariants are fertile ground for edge cases (empty list, duplicates, large lists).
Here is a small example of an order with line items where SKUs must be unique and totals must match.
from dataclasses import dataclass
from decimal import Decimal
@dataclass(frozen=True)
class LineItem:
sku: str
qty: int
unit_price: Decimal
@dataclass(frozen=True)
class Order:
items: tuple[LineItem, ...]
total: Decimal
def __post_init__(self):
if not self.items:
raise ValueError("order must have at least one item")
skus = [i.sku for i in self.items]
if len(set(skus)) != len(skus):
raise ValueError("duplicate sku")
computed = sum((i.unit_price * i.qty for i in self.items), Decimal("0"))
if computed != self.total:
raise ValueError("total mismatch")Step-by-step: Testing collection invariants
Step 1: Build a valid order fixture.
def valid_order(**overrides):
items = (
LineItem(sku="SKU1", qty=1, unit_price=Decimal("10.00")),
LineItem(sku="SKU2", qty=2, unit_price=Decimal("5.00")),
)
base = dict(items=items, total=Decimal("20.00"))
base.update(overrides)
return Order(**base)Step 2: Test each invariant with minimal failing examples.
def test_order_requires_at_least_one_item():
import pytest
with pytest.raises(ValueError, match="at least one"):
Order(items=(), total=Decimal("0"))
def test_order_rejects_duplicate_skus():
import pytest
items = (
LineItem(sku="SKU1", qty=1, unit_price=Decimal("10.00")),
LineItem(sku="SKU1", qty=1, unit_price=Decimal("10.00")),
)
with pytest.raises(ValueError, match="duplicate sku"):
Order(items=items, total=Decimal("20.00"))
def test_order_total_must_match_sum():
import pytest
with pytest.raises(ValueError, match="total mismatch"):
valid_order(total=Decimal("19.99"))Step 3: Add edge-case tests: large quantities, zero quantities, and rounding rules if applicable. If your domain disallows qty=0, test that. If it allows it, test that totals still match.
def test_order_large_quantities_do_not_overflow_decimal_logic():
items = (LineItem(sku="BULK", qty=10_000, unit_price=Decimal("0.01")),)
o = Order(items=items, total=Decimal("100.00"))
assert o.total == Decimal("100.00")Contract Tests for “Public Constructors” and Factories
Many codebases expose a public constructor or factory method that is the supported way to create a model (for example, from user input, from a database row, or from an event). Even if the internal model is strict, the factory may apply defaults, coercions, or compatibility behavior. That factory is a contract and deserves dedicated tests.
When writing these tests, focus on:
- What inputs are accepted and how they are interpreted.
- What defaults are applied and under which conditions.
- What errors are raised for invalid inputs (type, message, and field context if applicable).
def discount_from_payload(payload: dict) -> Discount:
# Example factory: tolerant of missing optional keys
return Discount(
code=payload["code"],
percent=Decimal(payload["percent"]) if payload.get("percent") else None,
amount=Decimal(payload["amount"]) if payload.get("amount") else None,
starts_at=datetime.fromisoformat(payload["starts_at"]) if payload.get("starts_at") else None,
ends_at=datetime.fromisoformat(payload["ends_at"]) if payload.get("ends_at") else None,
)
def test_factory_accepts_missing_optional_keys():
d = discount_from_payload({"code": "X", "percent": "0.2"})
assert d.percent == Decimal("0.2")
assert d.amount is NoneThese tests protect the boundary behavior that other parts of the system rely on, even if the internal model rules remain unchanged.
Common Testing Pitfalls and How to Avoid Them
Over-testing implementation details
Tests should assert externally visible behavior: acceptance/rejection, normalized outputs, stable serialization, and defined comparisons. Avoid asserting internal helper calls or exact validator ordering unless that ordering is part of the contract.
Too few negative tests
Model tests often skew toward “it works” examples. For invariants, the most valuable tests are the ones that prove invalid states cannot exist. Aim for at least one negative test per rule.
Ambiguous boundary rules
If you cannot write a crisp test for a boundary (e.g., whether an end timestamp is inclusive), the rule is not clear. Decide the rule and encode it as a test. This turns ambiguity into a stable contract.
Ignoring representation edge cases
Decimals, datetimes, and Unicode strings are frequent sources of subtle bugs. Add targeted tests for:
- Decimal string parsing and formatting ("0.10" vs "0.1").
- Timezone-aware datetimes and DST boundaries.
- Unicode normalization if identifiers are user-provided.