Offline Data Modeling for Syncable Domains

Chapter 3

Why Offline Data Modeling Is Different

Offline-first data modeling is the practice of designing your domain data so it can be created, read, updated, and deleted locally, then synchronized safely with a server when connectivity is available. The key difference from traditional “online-only” modeling is that the local database is not a cache; it is a primary source of truth for the user’s work. That changes how you represent identity, relationships, edits, and conflicts.

A syncable domain model must answer questions that online-only models often ignore: How do we uniquely identify objects before the server assigns an ID? How do we represent partial knowledge (e.g., a referenced object not yet downloaded)? How do we record intent (the user's actions) so it can be replayed? How do we merge concurrent edits without losing meaning? These questions are answered by adding explicit metadata and by choosing data shapes that are stable under reordering, duplication, and retries.

Core Principles for Syncable Domain Models

1) Stable identity from the moment of creation

Every entity that can be created offline needs a globally unique identifier generated on the client. Avoid “temporary IDs” that must be replaced later, because replacement ripples through foreign keys, caches, and UI state. Use UUIDv4, ULID, KSUID, or another collision-resistant ID scheme. ULID/KSUID have the additional benefit of being roughly time-sortable, which can help with local ordering and debugging.
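A minimal sketch of a ULID-style generator, to show why such IDs are time-sortable. It assumes the ULID layout (a 48-bit millisecond timestamp followed by 80 bits of randomness, both in Crockford base32, 26 characters total). The function names are hypothetical, and `Math.random` is only a placeholder: a real app should use a cryptographically secure source or an established ULID library.

```typescript
// Crockford base32 alphabet (no I, L, O, U). It is in ascending ASCII order,
// so comparing encoded strings matches comparing the encoded numbers.
const CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ";

// Encode a millisecond timestamp (must fit in 48 bits) as 10 base32 chars,
// most significant digit first.
function encodeTime(ms: number): string {
  if (!Number.isInteger(ms) || ms < 0 || ms > 2 ** 48 - 1) {
    throw new RangeError("timestamp out of 48-bit range");
  }
  let out = "";
  let t = ms;
  for (let i = 0; i < 10; i++) {
    const mod = t % 32;
    out = CROCKFORD.charAt(mod) + out;
    t = (t - mod) / 32;
  }
  return out;
}

// 16 base32 chars = 80 bits of randomness.
// ASSUMPTION: Math.random is a placeholder, not a secure random source.
function encodeRandom(random: () => number = Math.random): string {
  let out = "";
  for (let i = 0; i < 16; i++) {
    out += CROCKFORD.charAt(Math.floor(random() * 32));
  }
  return out;
}

// Hypothetical helper: a 26-character ID generated on the client at creation time.
function newId(nowMs: number = Date.now(), random?: () => number): string {
  return encodeTime(nowMs) + encodeRandom(random);
}
```

Because the timestamp comes first, IDs created later sort after earlier ones, which is what makes them handy for local ordering and debugging. This sketch does not guarantee ordering for IDs created within the same millisecond.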

2) Separate domain fields from sync metadata

Keep business meaning (e.g., title, amount, dueDate) separate from synchronization concerns (e.g., version, updatedAt, deletedAt, lastSyncedAt, dirty). This separation reduces accidental coupling and makes it easier to evolve sync strategies without rewriting domain logic.
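One way to express this separation in TypeScript is to declare domain fields and sync metadata as separate types and combine them only for the stored row. The type and function names below (`TaskFields`, `SyncMeta`, `isOverdue`, `markSynced`) are illustrative, not part of any library.

```typescript
// Domain fields: what the user cares about.
interface TaskFields {
  title: string;
  notes: string;
  status: "todo" | "doing" | "done";
  dueAt: number | null;
}

// Sync metadata: bookkeeping owned by the sync layer.
interface SyncMeta {
  updatedAt: number;
  deletedAt: number | null;
  serverRevision: string | null;
  lastSyncedAt: number | null;
}

// The stored row combines both, keyed by a stable client-generated ID.
type TaskRow = { id: string } & TaskFields & SyncMeta;

// Domain logic only depends on TaskFields, so sync changes cannot leak into it.
function isOverdue(task: TaskFields, nowMs: number): boolean {
  return task.status !== "done" && task.dueAt !== null && task.dueAt < nowMs;
}

// The sync layer updates metadata without touching domain fields.
function markSynced(row: TaskRow, revision: string, nowMs: number): TaskRow {
  return { ...row, serverRevision: revision, lastSyncedAt: nowMs };
}
```

Since `isOverdue` accepts a full `TaskRow` as well, switching sync strategies later (for example, replacing `lastSyncedAt` with something else) does not require touching it.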

3) Model deletions explicitly

In syncable systems, deletion is usually a state, not an immediate physical removal. If you physically delete a row locally, you may lose the ability to tell the server about the deletion later, and you may break referential integrity for other offline objects. Use tombstones: mark an entity as deleted with deletedAt (and optionally deletedBy) and keep it until the deletion is acknowledged by the server and no longer needed for conflict resolution.
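A minimal tombstone sketch. It assumes a hypothetical `deleteAckedAt` field that the sync layer sets once the server acknowledges the deletion, plus a retention window so late-arriving conflicts can still see the tombstone. The helper names are illustrative.

```typescript
interface Row {
  id: string;
  deletedAt: number | null;
  // ASSUMPTION: hypothetical field set when the server acknowledges the delete.
  deleteAckedAt: number | null;
}

// Soft delete: record the intent instead of removing the row.
function softDelete(row: Row, nowMs: number): Row {
  return row.deletedAt !== null ? row : { ...row, deletedAt: nowMs };
}

// Rows the UI should show.
function visible(rows: Row[]): Row[] {
  return rows.filter((r) => r.deletedAt === null);
}

// Physically remove a tombstone only after the server has acknowledged the
// deletion AND the retention window has passed.
function purgeTombstones(rows: Row[], nowMs: number, retentionMs: number): Row[] {
  return rows.filter(
    (r) =>
      r.deletedAt === null ||
      r.deleteAckedAt === null ||
      nowMs - r.deleteAckedAt < retentionMs
  );
}
```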

4) Prefer mergeable data shapes

Some shapes merge cleanly (sets, maps with per-field timestamps, append-only logs), while others are fragile (single “blob” JSON fields, positional arrays where order matters). When possible, represent user-editable collections as sets of items with stable IDs rather than arrays that rely on index positions.
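To see why positional arrays are fragile, consider replaying an edit that was recorded offline against a list that another device has since changed. This small sketch (hypothetical helper names) compares an edit addressed by index with one addressed by stable ID.

```typescript
interface Item { id: string; text: string; }

// Fragile: the edit targets a position.
function editAtIndex(items: Item[], index: number, text: string): Item[] {
  return items.map((it, i) => (i === index ? { ...it, text } : it));
}

// Mergeable: the edit targets a stable ID.
function editById(items: Item[], id: string, text: string): Item[] {
  return items.map((it) => (it.id === id ? { ...it, text } : it));
}

const base: Item[] = [{ id: "x", text: "milk" }, { id: "y", text: "eggs" }];
// Device A inserts an item at the front. Meanwhile device B, offline,
// edits "eggs", which was index 1 (id "y") in B's copy.
const afterInsert: Item[] = [{ id: "z", text: "bread" }, ...base];

const byIndex = editAtIndex(afterInsert, 1, "a dozen eggs"); // overwrites "milk"
const byId = editById(afterInsert, "y", "a dozen eggs");     // updates "eggs"
```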

5) Capture intent when meaning matters

Sometimes you cannot safely merge by “last write wins” without violating user expectations. For example, “increment counter by 1” is not the same as “set counter to 5.” If the user’s action is inherently an operation, model it as an operation (event/command) rather than a final state overwrite. This is especially important for financial totals, inventory adjustments, and collaborative edits.
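The difference between "set" and "increment" shows up as soon as two devices act concurrently. In this sketch (illustrative types only), both devices start from 4 while offline and each adds one. Sent as state overwrites, both say "set to 5" and one increment is lost; sent as operations, both survive.

```typescript
type SetOp = { kind: "set"; value: number };
type AddOp = { kind: "add"; delta: number };

// Apply one operation to the current value.
function applyOp(current: number, op: SetOp | AddOp): number {
  return op.kind === "set" ? op.value : current + op.delta;
}

const serverValue = 4;

// Each device computed 4 + 1 locally and sends the resulting state.
const asState: SetOp[] = [{ kind: "set", value: 5 }, { kind: "set", value: 5 }];
// Each device sends its intent instead.
const asIntent: AddOp[] = [{ kind: "add", delta: 1 }, { kind: "add", delta: 1 }];

const fromState = asState.reduce(applyOp, serverValue);   // one increment lost
const fromIntent = asIntent.reduce(applyOp, serverValue); // both increments kept
```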

Choosing a Synchronization-Friendly Data Strategy

State-based replication (snapshot/field merge)

In state-based replication, you synchronize the current state of entities. Conflicts are resolved by comparing versions, timestamps, or per-field metadata. This is often simpler to implement and works well when entities are small and edits are not highly concurrent.

To make state-based replication safer offline, add:

Entity version: a monotonically increasing number or server-issued revision token.
Per-field metadata when needed: e.g., titleUpdatedAt, notesUpdatedAt.
Server acknowledgement markers: e.g., syncStatus or pendingMutationCount.
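Per-field metadata lets a merge keep concurrent edits to different fields. This sketch pairs each field with its edit time (equivalent to `title` + `titleUpdatedAt` in a flat row) and merges field by field with last-write-wins. Names are illustrative, and it inherits the clock-drift risk of client timestamps.

```typescript
// A field value paired with the time it was last edited.
interface Stamped { value: string; at: number; }

// Pick the later edit; break exact ties by comparing values so every
// replica picks the same winner regardless of merge order.
function pickLater(a: Stamped, b: Stamped): Stamped {
  if (a.at !== b.at) return a.at > b.at ? a : b;
  return a.value >= b.value ? a : b;
}

interface TaskDoc { title: Stamped; notes: Stamped; }

// Per-field LWW: an edit to the title on one device and an edit to the
// notes on another device both survive the merge.
function mergeTask(local: TaskDoc, remote: TaskDoc): TaskDoc {
  return {
    title: pickLater(local.title, remote.title),
    notes: pickLater(local.notes, remote.notes),
  };
}
```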

Operation-based replication (mutation log)

In operation-based replication, you synchronize a log of user actions (mutations). The server applies operations in order (or in a causally consistent way) and returns acknowledgements. This approach is robust to retries and supports richer conflict handling, but requires careful design of operations and idempotency.

Operation-based replication typically needs:

Mutation IDs (unique per operation) for idempotency.
Entity IDs targeted by the operation.
Preconditions (optional): e.g., “apply only if baseVersion == 7”.
Deterministic application on both client and server.
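"Deterministic application" means the result of applying a mutation depends only on the current state and the mutation itself: no clocks, randomness, or I/O inside the apply step. A minimal sketch, assuming client and server can share the same reducer code (otherwise both sides must implement identical rules). The mutation types are hypothetical.

```typescript
interface TaskState { title: string; status: "todo" | "doing" | "done"; }

type Mutation =
  | { mutationId: string; entityId: string; type: "create"; title: string }
  | { mutationId: string; entityId: string; type: "rename"; title: string }
  | { mutationId: string; entityId: string; type: "setStatus"; status: TaskState["status"] };

// Pure reducer: the same (state, mutation) always yields the same new state.
function apply(state: Record<string, TaskState>, m: Mutation): Record<string, TaskState> {
  const current = state[m.entityId];
  switch (m.type) {
    case "create":
      // Creating an entity that already exists is a no-op.
      return current ? state : { ...state, [m.entityId]: { title: m.title, status: "todo" } };
    case "rename":
      return current ? { ...state, [m.entityId]: { ...current, title: m.title } } : state;
    case "setStatus":
      return current ? { ...state, [m.entityId]: { ...current, status: m.status } } : state;
    default:
      return state;
  }
}
```

Replaying the same log on the client (optimistically) and on the server (authoritatively) then produces the same state.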

Hybrid approach

Many apps use a hybrid: state-based sync for simple entities and operation logs for sensitive or high-conflict areas (e.g., counters, collaborative lists). Your offline data model should allow both by keeping entity identity stable and by storing a local mutation outbox.

A Practical Domain Example: Tasks with Projects and Comments

Consider a domain with Projects, Tasks, and Comments. Users can create and edit everything offline. Tasks belong to a Project, and Comments belong to a Task. Users can also reorder tasks within a project.

Entity tables (domain + metadata)

// Project (client-generated id, tombstone deletion, optional revision token from server)
interface Project {
  id: string;                    // ULID/UUID generated on client
  name: string;
  color: string | null;
  createdAt: number;
  updatedAt: number;
  deletedAt: number | null;
  // sync metadata
  serverRevision: string | null;
  lastSyncedAt: number | null;
}

// Task
interface Task {
  id: string;
  projectId: string;             // foreign key to Project.id
  title: string;
  notes: string;
  status: 'todo' | 'doing' | 'done';
  dueAt: number | null;
  // ordering: stable rank rather than array index
  sortKey: string;               // e.g., LexoRank-like string or fractional index
  createdAt: number;
  updatedAt: number;
  deletedAt: number | null;
  serverRevision: string | null;
  lastSyncedAt: number | null;
}

// Comment
interface Comment {
  id: string;
  taskId: string;
  body: string;
  createdAt: number;
  updatedAt: number;
  deletedAt: number | null;
  serverRevision: string | null;
  lastSyncedAt: number | null;
}

Key modeling choices:

Client-generated IDs allow offline creation without later ID replacement.
Tombstones preserve deletion intent until synced.
sortKey supports reordering without relying on array indices, which are conflict-prone.
serverRevision can store an ETag/revision from the server to detect conflicts.

Step-by-Step: Designing an Offline-First Data Model

Step 1: List user-editable entities and relationships

Start from the domain, not from the API. Enumerate entities users can create or modify offline, and define relationships: one-to-many, many-to-many, optional references, and ownership boundaries (e.g., per workspace/account).

For each relationship, decide whether the client must function with partial data. For example, a Task may reference a Project that is not yet downloaded if the user was invited to a workspace but hasn’t synced everything. If partial data is possible, ensure your UI and local schema can tolerate missing referenced rows (e.g., show “Unknown project” placeholder and keep the foreign key as-is).

Step 2: Choose identity strategy and key types

Pick a single ID strategy across platforms. ULID is popular for offline-first because it is sortable and URL-friendly; UUIDv4 is widely supported and simple. Store IDs as strings to avoid platform differences.

Also decide if you need a separate localId. In most cases, you do not. A single globally unique ID is enough and reduces complexity.

Step 3: Add minimal sync metadata per entity

At minimum, include:

updatedAt: local timestamp of last modification (used for UI and sometimes for merge hints).
deletedAt: tombstone marker.
serverRevision (optional but recommended): a token the server returns after accepting changes.
lastSyncedAt (optional): useful for diagnostics and cleanup policies.

Avoid adding too many flags early. If you also maintain an outbox (recommended), you can derive “dirty” status by checking whether there are pending mutations for that entity.
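Deriving "dirty" from the outbox can be a one-line query. This sketch uses an in-memory array and a trimmed-down outbox shape (hypothetical); against a SQL outbox table the same check would be an EXISTS query filtered by entity ID and status.

```typescript
interface OutboxEntry {
  id: string;
  entityId: string;
  status: "pending" | "inFlight" | "acked" | "failed";
}

// An entity is dirty if any of its mutations has not been acknowledged yet.
// Failed mutations count as dirty: the server has not accepted them.
function isDirty(entityId: string, outbox: OutboxEntry[]): boolean {
  return outbox.some((m) => m.entityId === entityId && m.status !== "acked");
}
```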

Step 4: Decide how to represent collections and ordering

Offline reorder is a classic conflict hotspot. If you store an integer position and two devices reorder concurrently, you can easily end up with duplicates or gaps that require renumbering.

Prefer one of these patterns:

Fractional indexing: assign positions like 1.0, 1.5, 2.0 and insert between; periodically rebalance.
String ranks: LexoRank-like strings that can generate a value between two ranks without renumbering the whole list.
Move operations: store reorder as operations (moveTask(taskId, beforeTaskId)) in an operation log.

For many apps, a sortKey string rank is a pragmatic choice: it keeps state-based sync feasible while reducing conflicts.
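A simplified string-rank generator, in the spirit of fractional indexing (it is not LexoRank itself). Assumptions: keys use the lowercase letters `a` to `z` as digits, `""` stands for "start of list", `null` for "end of list", and keys never end in the lowest digit, which guarantees there is always room between two keys. `keyBetween` is a hypothetical name.

```typescript
// Rank digits, smallest first. Keys are compared as plain strings, so the
// alphabet must be in ascending character order.
const DIGITS = "abcdefghijklmnopqrstuvwxyz";
const ZERO = DIGITS.charAt(0);

// Returns a key that sorts strictly between `a` and `b`.
function keyBetween(a: string, b: string | null): string {
  if (b !== null && a >= b) throw new Error("a must sort before b");
  if (a.endsWith(ZERO) || (b !== null && b.endsWith(ZERO))) {
    throw new Error("keys must not end with the zero digit");
  }
  if (b !== null) {
    // Copy the shared prefix, treating missing digits of `a` as zero.
    let n = 0;
    while ((a.charAt(n) || ZERO) === b.charAt(n)) n++;
    if (n > 0) return b.slice(0, n) + keyBetween(a.slice(n), b.slice(n));
  }
  const digitA = a ? DIGITS.indexOf(a.charAt(0)) : 0;
  const digitB = b !== null ? DIGITS.indexOf(b.charAt(0)) : DIGITS.length;
  if (digitB - digitA > 1) {
    // Room for a single digit between the two first digits.
    return DIGITS.charAt(Math.round((digitA + digitB) / 2));
  }
  // First digits are adjacent.
  if (b !== null && b.length > 1) return b.slice(0, 1);
  return DIGITS.charAt(digitA) + keyBetween(a.slice(1), null);
}
```

Moving a task between neighbours with sort keys `"h"` and `"n"` writes only that task's row, with `keyBetween("h", "n")`; no other row is renumbered. Two devices inserting into the same gap concurrently can compute the same key, so sorting by `(sortKey, id)` keeps the order deterministic. Keys grow longer with repeated inserts at one spot, which is when periodic rebalancing helps.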

Step 5: Model many-to-many relationships with join entities

For tags, collaborators, or memberships, use explicit join tables with their own IDs and tombstones. This makes adds/removes mergeable and auditable.

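interface Tag {
  id: string;
  name: string;
  createdAt: number;
  updatedAt: number;
  deletedAt: number | null;
  serverRevision: string | null;
}

interface TaskTag {
  id: string;            // client-generated
  taskId: string;
  tagId: string;
  createdAt: number;
  deletedAt: number | null;
  serverRevision: string | null;
}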

Do not store tags as an array field inside Task if you expect concurrent edits. Arrays are hard to merge; join entities behave like a set of edges.
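Treating join rows as a set of edges makes merging two replicas straightforward. In this sketch (hypothetical helpers), rows are unioned by ID and a tombstone on either side wins, so a stale copy cannot resurrect a removed tag; re-adding a tag later would create a new join row with a new ID.

```typescript
interface TaskTag {
  id: string; // client-generated join-row ID
  taskId: string;
  tagId: string;
  createdAt: number;
  deletedAt: number | null;
}

// Union by ID; if either replica has tombstoned a row, keep the tombstone.
function mergeTaskTags(a: TaskTag[], b: TaskTag[]): TaskTag[] {
  const byId: Record<string, TaskTag> = {};
  for (const row of [...a, ...b]) {
    const seen = byId[row.id];
    if (!seen || (seen.deletedAt === null && row.deletedAt !== null)) {
      byId[row.id] = row;
    }
  }
  return Object.keys(byId).map((id) => byId[id]);
}

// Live tag IDs for a task. Two devices may attach the same tag independently
// (different join IDs), so duplicates are removed for display.
function tagIdsFor(taskId: string, rows: TaskTag[]): string[] {
  return rows
    .filter((r) => r.taskId === taskId && r.deletedAt === null)
    .map((r) => r.tagId)
    .filter((tagId, i, all) => all.indexOf(tagId) === i);
}
```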

Step 6: Introduce an outbox for pending mutations

Even if you use state-based sync, an outbox helps you guarantee delivery and handle retries without duplicating effects. The outbox stores what needs to be sent to the server, in order, with idempotency keys.

interface OutboxMutation {
  id: string;            // mutationId (UUID)
  entityType: 'Project' | 'Task' | 'Comment' | 'TaskTag';
  entityId: string;
  type: 'create' | 'update' | 'delete';
  payload: string;       // JSON payload or structured columns
  createdAt: number;
  attemptCount: number;
  lastAttemptAt: number | null;
  status: 'pending' | 'inFlight' | 'acked' | 'failed';
}

Important modeling detail: the outbox should be written in the same local transaction as the entity change. That ensures you never have a local update without a corresponding sync intent.
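The "same transaction" rule can be illustrated with a hypothetical in-memory store whose `transaction` method commits only if the callback finishes without throwing. It stands in for a real local database; with SQLite, for example, both writes would go between BEGIN and COMMIT.

```typescript
interface TaskRecord { id: string; title: string; updatedAt: number; }
interface OutboxRecord { id: string; entityId: string; type: "update"; payload: string; }

// Hypothetical in-memory store standing in for SQLite, IndexedDB, etc.
class LocalDb {
  tasks: Record<string, TaskRecord> = {};
  outbox: OutboxRecord[] = [];

  // Runs `work` against a shallow copy and commits only if it returns normally.
  // `work` must replace rows rather than mutate them in place.
  transaction(work: (tx: LocalDb) => void): void {
    const tx = new LocalDb();
    tx.tasks = { ...this.tasks };
    tx.outbox = [...this.outbox];
    work(tx); // a throw here skips the commit below
    this.tasks = tx.tasks;
    this.outbox = tx.outbox;
  }
}

// The entity change and its sync intent are written together or not at all.
function renameTask(db: LocalDb, id: string, title: string, mutationId: string, nowMs: number): void {
  db.transaction((tx) => {
    const task = tx.tasks[id];
    if (!task) throw new Error(`unknown task ${id}`);
    tx.tasks[id] = { ...task, title, updatedAt: nowMs };
    tx.outbox.push({ id: mutationId, entityId: id, type: "update", payload: JSON.stringify({ title }) });
  });
}
```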

Step 7: Define conflict resolution semantics per field

Not all fields should resolve the same way. Decide per field (or per entity) whether to use:

Last write wins (LWW): acceptable for cosmetic fields like color, sometimes for titles.
Merge: e.g., union of a set, or per-field LWW on a map.
Operational: counters, balances, or any field where “add” differs from “set”.

Encode these decisions in your model by choosing appropriate shapes. For example, represent a checklist as items with IDs rather than a single text blob, so you can merge item-level changes.
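As an illustration of the checklist case: if the checklist were a single text blob resolved by last-write-wins, one device checking an item and another adding an item would overwrite each other. Modeled as items with IDs, the two changes merge. A minimal sketch with hypothetical names; ties keep the first argument's copy, so a production merge would also need a deterministic tie-breaker.

```typescript
interface ChecklistItem { id: string; text: string; done: boolean; updatedAt: number; }

// Item-level merge: union by ID; for the same item, the later edit wins.
function mergeChecklist(a: ChecklistItem[], b: ChecklistItem[]): ChecklistItem[] {
  const byId: Record<string, ChecklistItem> = {};
  for (const item of [...a, ...b]) {
    const seen = byId[item.id];
    if (!seen || item.updatedAt > seen.updatedAt) byId[item.id] = item;
  }
  return Object.keys(byId).map((id) => byId[id]);
}

// Device A checked off "Pack charger"; device B added "Print tickets".
const deviceA: ChecklistItem[] = [{ id: "c1", text: "Pack charger", done: true, updatedAt: 10 }];
const deviceB: ChecklistItem[] = [
  { id: "c1", text: "Pack charger", done: false, updatedAt: 1 },
  { id: "c2", text: "Print tickets", done: false, updatedAt: 12 },
];
const merged = mergeChecklist(deviceA, deviceB); // keeps the check and the new item
```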

Handling Partial Data and Placeholders

Offline-first apps often operate with incomplete local knowledge: you may have a Task referencing a Project that hasn’t synced yet, or you may have a Comment whose parent Task is tombstoned locally but not yet removed.

Modeling techniques that help:

Nullable foreign keys only when the relationship is truly optional. Otherwise keep the foreign key and allow the referenced row to be missing temporarily.
Soft constraints: enforce referential integrity in application logic rather than hard database constraints if missing references are expected during sync.
Placeholder entities: create a minimal stub row when you first see a reference, then fill it in when data arrives.

Example: when receiving a Task from the server with projectId that is not in the local DB, insert a Project stub with name = '(Loading...)' and a flag like isStub = true (if you choose to add it). Keep stub flags out of your core domain if possible; they are UI/sync concerns.
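The stub pattern from this example, sketched against an in-memory map. `isStub` is the optional flag mentioned above, and the helper names are hypothetical.

```typescript
interface ProjectRow { id: string; name: string; isStub: boolean; }
interface IncomingTask { id: string; projectId: string; title: string; }

// When a Task arrives that references an unknown Project, insert a minimal
// stub so local joins and lists keep working until the real row arrives.
function ensureProjectStub(projects: Record<string, ProjectRow>, task: IncomingTask): void {
  if (!projects[task.projectId]) {
    projects[task.projectId] = { id: task.projectId, name: "(Loading...)", isStub: true };
  }
}

// When the real Project is downloaded, it simply replaces the stub.
function upsertProject(projects: Record<string, ProjectRow>, id: string, name: string): void {
  projects[id] = { id, name, isStub: false };
}
```

The task keeps its original `projectId` throughout, so nothing needs to be rewritten when the real Project shows up.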

Time, Timestamps, and Causality

Relying on client timestamps for conflict resolution can be risky because device clocks drift. Prefer server-issued revisions for authoritative ordering. Still, local timestamps are valuable for UI (“edited 2 minutes ago”) and for local sorting.

Common patterns:

Use serverRevision for concurrency control: send the last known revision with updates; server rejects if stale.
Use client updatedAt for local UX: show recent edits immediately and keep ordering stable offline.
Store both createdAt and updatedAt: createdAt is often immutable; updatedAt changes per edit.
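A sketch of revision-based concurrency control on the server side, similar in spirit to HTTP `If-Match` with ETags (where a mismatch yields 412 Precondition Failed). The handler and the `"r" + counter` token format are hypothetical; clients should treat the revision as an opaque string.

```typescript
interface ServerTask { id: string; title: string; revision: string; }

type UpdateResult =
  | { status: "ok"; revision: string }
  | { status: "conflict"; current: ServerTask };

// Apply the update only if the client's last-known revision is still current;
// otherwise return the current state so the client can resolve the conflict.
function handleUpdate(
  store: Record<string, ServerTask>,
  id: string,
  baseRevision: string,
  title: string
): UpdateResult {
  const current = store[id];
  if (!current) throw new Error(`unknown task ${id}`);
  if (current.revision !== baseRevision) return { status: "conflict", current };
  const next = "r" + (Number(current.revision.slice(1)) + 1); // hypothetical token scheme
  store[id] = { ...current, title, revision: next };
  return { status: "ok", revision: next };
}
```

The client stores the returned revision in `serverRevision` and sends it with its next update.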

If you need stronger causality (e.g., collaborative editing), consider vector clocks or CRDTs, but only if your product truly needs them; they add complexity to the model and payloads.

Idempotency and Deduplication in the Model

Offline sync involves retries. Your model should assume the same mutation may be sent multiple times. Idempotency is achieved by including a unique mutation ID and ensuring the server records it (or can derive it) to avoid applying duplicates.

Model-level implications:

Outbox mutations need a stable id that never changes across retries.
For create operations, the entity ID itself can serve as an idempotency key if the server treats “create with same ID” as upsert.
For operations like “add tag”, the join entity ID can prevent duplicates: creating the same TaskTag.id twice is safe if the server upserts.
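A sketch of server-side deduplication for an "add tag" mutation, assuming a hypothetical ledger of processed mutation IDs and an upsert keyed by the client-generated `TaskTag.id`. In a real database the ledger entry and the row write would share one transaction.

```typescript
interface AddTagMutation {
  mutationId: string; // stable across retries
  joinId: string;     // client-generated TaskTag.id
  taskId: string;
  tagId: string;
}

// Hypothetical server state: join rows keyed by ID, plus processed mutation IDs.
interface ServerState {
  taskTags: Record<string, { taskId: string; tagId: string }>;
  processed: Record<string, true>;
}

function handleAddTag(state: ServerState, m: AddTagMutation): "applied" | "duplicate" {
  // A retry of an already-applied mutation is acknowledged but not re-applied.
  if (state.processed[m.mutationId]) return "duplicate";
  // Upsert by the client-generated ID: writing the same row twice has no extra effect.
  state.taskTags[m.joinId] = { taskId: m.taskId, tagId: m.tagId };
  state.processed[m.mutationId] = true;
  return "applied";
}
```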

Schema Evolution: Designing for Change

Offline databases live on devices for a long time. Your model should anticipate migrations and mixed-version sync.

Practical guidelines:

Additive changes are easiest: add nullable columns or new tables rather than changing existing meanings.
Keep payloads forward-compatible: if you store outbox payload as JSON, include a schemaVersion or operation version so the server can interpret it.
Avoid renaming fields without mapping: support both old and new fields during a transition.
Be careful with enum changes: unknown enum values should not crash the app; store as strings and handle defaults.
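Two of these guidelines in code: an enum parser that tolerates values it does not know, and a payload reader that maps older versions forward. The `"blocked"` status and the v1/v2 payload shapes are hypothetical examples, not part of the chapter's model.

```typescript
type Status = "todo" | "doing" | "done";
const KNOWN_STATUSES: Status[] = ["todo", "doing", "done"];

// An unknown value (e.g. a status added by a newer app version) falls back to
// a default instead of crashing. Keep the raw string if you must round-trip it.
function parseStatus(raw: string): Status {
  return (KNOWN_STATUSES as string[]).indexOf(raw) >= 0 ? (raw as Status) : "todo";
}

// Hypothetical payload history: v1 had only `title`; v2 added `dueAt`.
interface PayloadV2 { schemaVersion: 2; title: string; dueAt: number | null; }

function readPayload(json: string): PayloadV2 {
  const data = JSON.parse(json) as { schemaVersion?: number; title?: string; dueAt?: number | null };
  const version = data.schemaVersion ?? 1; // payloads without a version are treated as v1
  if (version > 2) throw new Error(`unsupported schemaVersion ${version}`);
  return { schemaVersion: 2, title: data.title ?? "", dueAt: data.dueAt ?? null };
}
```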

Worked Example: Modeling a Counter Safely

Suppose Tasks have a pomodoroCount that users can increment offline. If you model it as a simple integer field and sync state with LWW, two devices incrementing concurrently can lose increments.

Better modeling options:

Option A: Operation-based increments

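// Task field remains an integer, but changes are synced as operations.
// This extends the OutboxMutation `type` union beyond 'create' | 'update' | 'delete'
// with 'incrementPomodoro'; the payload is shown decoded here.
OutboxMutation {
  type: 'incrementPomodoro',
  entityId: taskId,
  payload: { delta: 1 }
}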

The server applies increments additively, making retries safe with mutation IDs.

Option B: Per-device contributions (CRDT-like, simplified)

interface TaskPomodoroShard {
  id: string;            // e.g., taskId + ':' + deviceId
  taskId: string;
  deviceId: string;
  value: number;         // monotonically increasing
  updatedAt: number;
  serverRevision: string | null;
}

Total is the sum across shards. This is mergeable but adds complexity and requires device identity management.
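Option B is essentially a grow-only counter (G-Counter) CRDT. Because each device only ever raises its own shard, merging replicas by taking the maximum per shard is order-independent and tolerates re-delivery. A minimal sketch with hypothetical helper names; it supports increments only, not decrements.

```typescript
interface Shard { taskId: string; deviceId: string; value: number; }

// Keep the highest value seen for each (taskId, deviceId) shard.
function mergeShards(a: Shard[], b: Shard[]): Shard[] {
  const byKey: Record<string, Shard> = {};
  for (const s of [...a, ...b]) {
    const key = s.taskId + ":" + s.deviceId;
    const seen = byKey[key];
    if (!seen || s.value > seen.value) byKey[key] = s;
  }
  return Object.keys(byKey).map((k) => byKey[k]);
}

// The displayed pomodoroCount is the sum of the task's shards.
function pomodoroTotal(taskId: string, shards: Shard[]): number {
  return shards.filter((s) => s.taskId === taskId).reduce((sum, s) => sum + s.value, 0);
}
```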

Choosing between these is a modeling decision: if you already have an outbox and operation processing, Option A is usually simpler.

Validation and Invariants in an Offline Model

Offline edits must be validated locally, but some invariants depend on server knowledge (e.g., uniqueness across a workspace). Model your data to support “optimistic acceptance” locally while allowing later correction.

Local-only constraints: required fields, type checks, ranges. Enforce immediately.
Server-validated constraints: global uniqueness, permission checks. Represent potential rejection by storing sync error state per mutation (e.g., status='failed' and errorCode in the outbox) and by keeping the local entity so the user can fix it.
Derived fields: avoid storing derived totals that can go stale; compute from base data when possible, or store with clear recomputation rules.
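The split between local and server-validated constraints could look like this. The 200-character title limit and the `DUPLICATE_NAME` error code are hypothetical, and `errorCode` is an extra outbox column not present in the outbox schema shown earlier.

```typescript
interface TaskInput { title: string; dueAt: number | null; }

// Local-only constraints: checked immediately, before anything is written.
function validateLocally(input: TaskInput): string[] {
  const errors: string[] = [];
  if (input.title.trim() === "") errors.push("title is required");
  if (input.title.length > 200) errors.push("title is too long"); // hypothetical limit
  if (input.dueAt !== null && !Number.isFinite(input.dueAt)) errors.push("dueAt must be a timestamp");
  return errors;
}

interface OutboxItem {
  id: string;
  status: "pending" | "inFlight" | "acked" | "failed";
  errorCode: string | null; // ASSUMPTION: extra column for server rejections
}

// Server-validated constraints: record the rejection on the mutation and
// leave the local entity in place so the user can fix it and resubmit.
function recordRejection(item: OutboxItem, errorCode: string): OutboxItem {
  return { ...item, status: "failed", errorCode };
}
```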

Checklist: What to Verify in Your Offline Data Model

Every offline-creatable entity has a client-generated globally unique ID.
Deletions are represented as tombstones and are syncable.
Relationships are robust to partial data and missing references.
Collections are modeled as sets/join entities rather than arrays when concurrency is expected.
Ordering uses a conflict-resistant strategy (rank keys or move operations).
There is a durable outbox written transactionally with entity changes.
Mutations are idempotent via mutation IDs and server-side deduplication.
Conflict semantics are defined per field/entity and reflected in the data shape.
Schema evolution is considered (additive changes, tolerant parsing, versioned payloads).

Now answer the exercise about the content:

In an offline-first app where users can delete items while offline, what is the most reliable way to model deletions so they can be synced safely later?

Tombstones preserve deletion intent offline. Keeping a deletedAt marker prevents losing the ability to tell the server about the deletion later and helps avoid breaking relationships until the deletion is acknowledged.