What a Sync Engine Is (and What It Is Not)
A sync engine is the subsystem that reconciles changes between a local data store and a remote system of record. It is responsible for detecting what changed, packaging those changes into network operations, applying remote changes locally, and keeping enough metadata to do this reliably across app restarts, intermittent connectivity, and partial failures.
A sync engine is not the same as a cache, a networking layer, or a database abstraction. It sits above local persistence and above HTTP/WebSocket clients, orchestrating them with deterministic rules. It also is not a UI concern: the UI should observe local state and sync status, but should not decide which records to push or how to resolve conflicts.
Architecturally, a sync engine is easiest to reason about when you treat it as a state machine that consumes events (local mutations, remote updates, retries, auth changes) and produces effects (API calls, local writes, status updates). The rest of this chapter focuses on designing that architecture.
Core Architectural Goals
- Determinism: given the same local state and the same remote responses, the engine should produce the same outcome. This makes debugging and recovery possible.
- Idempotency: operations should be safe to retry without duplicating side effects.
- Crash safety: if the app is killed at any point, the engine can resume without losing or duplicating work.
- Isolation: local user actions should be recorded immediately and independently of network availability; sync should be asynchronous.
- Observability: the engine should expose progress, backlog size, last sync time, and error states in a way that can be logged and surfaced to the UI.
High-Level Components
1) Local Change Capture (Mutation Recorder)
This component records user-initiated changes as durable entries. Instead of directly calling the API, the app writes to local tables/collections and also writes a corresponding mutation record (often called an outbox entry). The mutation record is what the sync engine replays later.
Key responsibilities:
- Create mutation records atomically with local state changes.
- Assign stable identifiers to mutations (UUID) and to affected entities.
- Store enough information to replay the change (operation type, payload, preconditions, dependencies).
2) Outbox Processor (Push Pipeline)
The push pipeline reads pending mutation records and turns them into remote API calls. It handles ordering, batching, retries, and mapping server responses back into local state.
Key responsibilities:
- Select next mutations to send (by priority, entity, dependency graph).
- Ensure idempotency via idempotency keys or mutation IDs.
- Handle partial failures and retry policies.
- Update mutation state (pending, in-flight, applied, failed, dead-letter).
3) Inbox Applier (Pull Pipeline)
The pull pipeline fetches remote changes and applies them locally. It may use incremental tokens, timestamps, or server-provided change feeds. The engine should treat remote changes as inputs that must be validated, ordered, and applied transactionally.
Key responsibilities:
- Fetch remote deltas using a cursor/token.
- Apply changes to local store with conflict handling.
- Advance the cursor only after successful local commit.
4) Conflict Resolver
Conflict resolution is the policy layer that decides what to do when local and remote changes overlap. The resolver should be pluggable per entity type because different domains need different rules (for example, notes vs. inventory counts).
Key responsibilities:
- Detect conflicts (version mismatch, concurrent edits, invariant violations).
- Resolve automatically when possible (merge fields, last-writer-wins, server-wins, client-wins, custom merge).
- Escalate to manual resolution when required (produce a conflict record for UI workflows).
5) Scheduler and Triggers
The scheduler decides when to run push and pull. It reacts to triggers such as app start, foregrounding, periodic timers, network availability, authentication refresh, and explicit user actions (pull-to-refresh or “sync now”).
Key responsibilities:
- Debounce frequent triggers (avoid syncing on every keystroke).
- Respect platform constraints (background execution limits).
- Coordinate push and pull ordering (often push first, then pull).
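As a sketch of the debouncing responsibility above, the scheduler can collapse bursts of triggers (keystrokes, connectivity flaps) into a single run after a quiet window, while letting an explicit "sync now" bypass the delay. All names here are illustrative, not a prescribed API:

```typescript
// Illustrative trigger debouncer: collapses bursts of sync triggers
// into one run after a quiet period; "manual" flushes immediately.
type Trigger = "appStart" | "foreground" | "timer" | "networkUp" | "manual";

class SyncScheduler {
  private timer: ReturnType<typeof setTimeout> | null = null;
  private pendingReasons: Trigger[] = [];

  constructor(
    private runSync: (reasons: Trigger[]) => void,
    private quietMs = 500,
  ) {}

  onTrigger(reason: Trigger): void {
    this.pendingReasons.push(reason);
    // "sync now" bypasses debouncing; everything else waits for a quiet
    // window so a burst of triggers collapses into a single sync run.
    if (reason === "manual") {
      this.flush();
      return;
    }
    if (this.timer !== null) clearTimeout(this.timer);
    this.timer = setTimeout(() => this.flush(), this.quietMs);
  }

  private flush(): void {
    if (this.timer !== null) clearTimeout(this.timer);
    this.timer = null;
    const reasons = this.pendingReasons;
    this.pendingReasons = [];
    this.runSync(reasons);
  }
}
```

Passing the accumulated reasons to `runSync` also feeds observability: logs can show why a sync cycle started.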
6) Sync Metadata Store
Beyond domain data, the engine needs metadata: outbox entries, per-entity version vectors or etags, last successful cursor, retry counters, and tombstones. Keep this metadata in the same local database for transactional integrity, but separate it logically from domain tables.
Data Structures You Will Likely Need
Outbox Entry Schema
An outbox entry should be durable and expressive enough to replay operations safely.
```
OutboxEntry = {
  id: UUID,
  entityType: string,
  entityId: string,
  op: 'create' | 'update' | 'delete',
  payload: JSON,
  baseVersion: string | null,
  idempotencyKey: string,
  createdAt: timestamp,
  state: 'pending' | 'in_flight' | 'applied' | 'failed' | 'dead',
  attemptCount: number,
  lastError: string | null,
  dependsOn: UUID[]
}
```

baseVersion is the local view of the server version when the mutation was created (etag, revision, or logical clock). It enables optimistic concurrency checks.
dependsOn is critical when you create related entities offline (for example, create a project, then create tasks in that project). Dependencies allow the engine to send mutations in a valid order.
Inbox Change Schema (Optional)
Some architectures store fetched remote changes in an inbox table before applying them. This can improve crash safety and allow validation before touching domain tables.
```
InboxChange = {
  id: UUID,
  cursor: string,
  entityType: string,
  entityId: string,
  op: 'upsert' | 'delete',
  payload: JSON,
  serverVersion: string,
  receivedAt: timestamp,
  appliedAt: timestamp | null
}
```

Per-Entity Sync State
Even if you do not store an inbox, you often need per-entity metadata to support merges and conflict detection.
```
EntitySyncState = {
  entityType: string,
  entityId: string,
  serverVersion: string | null,
  lastPulledAt: timestamp | null,
  deleted: boolean
}
```

Step-by-Step: Designing the Push Pipeline
Step 1: Define Mutation Boundaries
Decide what constitutes one mutation. A good rule is: one mutation should map to one server-side command that can be retried idempotently. Avoid bundling unrelated edits into one mutation because it complicates retries and conflict resolution.
Example: editing a task title and toggling completion could be two mutations if your API has separate endpoints, or one mutation if your API supports patching multiple fields with a single idempotent request.
Step 2: Record Mutations Atomically
When the user changes local state, write the domain update and the outbox entry in a single database transaction. This guarantees that the UI never shows a change that the sync engine cannot later push.
```
// Pseudocode transaction: update task + enqueue mutation
BEGIN TRANSACTION;
  UPDATE Task SET title = ? WHERE id = ?;
  INSERT INTO OutboxEntry(...) VALUES (...);
COMMIT;
```

Step 3: Select the Next Work Item
Outbox processing should be deterministic. Common strategies:
- FIFO per entity: process mutations in creation order for each entityId.
- Topological order: if you use dependsOn, process mutations whose dependencies are applied.
- Priority lanes: user-visible operations (sending a message) may be prioritized over background updates.
A practical approach is: pick the oldest pending mutation whose dependencies are satisfied, then lock it for processing.
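That selection rule ("oldest pending whose dependencies are satisfied") can be sketched as a pure function over the OutboxEntry schema above; only the fields it needs are modeled here:

```typescript
// Illustrative outbox selection: oldest pending entry whose dependencies
// have all been applied. Field names mirror the OutboxEntry schema.
type EntryState = "pending" | "in_flight" | "applied" | "failed" | "dead";

interface OutboxEntry {
  id: string;
  createdAt: number;
  state: EntryState;
  dependsOn: string[];
}

function selectNext(entries: OutboxEntry[]): OutboxEntry | null {
  const applied = new Set(
    entries.filter((e) => e.state === "applied").map((e) => e.id),
  );
  const eligible = entries.filter(
    (e) => e.state === "pending" && e.dependsOn.every((d) => applied.has(d)),
  );
  // Deterministic: oldest creation time wins; tie-break on id.
  eligible.sort(
    (a, b) => a.createdAt - b.createdAt || a.id.localeCompare(b.id),
  );
  return eligible[0] ?? null;
}
```

Because the sort is total (creation time, then id), two runs over the same outbox always pick the same entry, which supports the determinism goal.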
Step 4: Acquire a Processing Lease
To avoid double-processing (especially if you have multiple workers or app instances), mark the mutation as in-flight with a lease expiration. If the app crashes, the lease expires and the mutation becomes eligible again.
```
// Lease a pending mutation for processing
UPDATE OutboxEntry
SET state = 'in_flight', leaseUntil = now() + 30s
WHERE id = ? AND state = 'pending';
```

Step 5: Build an Idempotent Request
Every push request should include an idempotency key derived from the outbox entry ID. If the server supports it, send it as a header. If not, include it in the request body and implement deduplication server-side.
Also include concurrency controls when available (If-Match with etag, or a baseVersion field). This allows the server to reject stale updates with a clear error (409 conflict or 412 precondition failed).
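A minimal sketch of this step, assuming a REST-style API: the mutation ID becomes the idempotency key, and baseVersion (when present) becomes an If-Match precondition. The endpoint shapes and header names are common conventions, not requirements of any particular server:

```typescript
// Sketch: turn an outbox entry into an idempotent HTTP request description.
interface Mutation {
  id: string; // doubles as the idempotency key
  entityType: string;
  entityId: string;
  op: "create" | "update" | "delete";
  payload: unknown;
  baseVersion: string | null; // etag seen when the mutation was recorded
}

interface RequestSpec {
  method: "POST" | "PATCH" | "DELETE";
  path: string;
  headers: Record<string, string>;
  body?: string;
}

function buildRequest(m: Mutation): RequestSpec {
  const headers: Record<string, string> = {
    "Content-Type": "application/json",
    // Same key on every retry of this mutation, so the server can dedupe.
    "Idempotency-Key": m.id,
  };
  // Optimistic concurrency: let the server reject stale updates (412/409).
  if (m.baseVersion !== null) headers["If-Match"] = m.baseVersion;

  switch (m.op) {
    case "create":
      return { method: "POST", path: `/${m.entityType}`, headers,
               body: JSON.stringify(m.payload) };
    case "update":
      return { method: "PATCH", path: `/${m.entityType}/${m.entityId}`,
               headers, body: JSON.stringify(m.payload) };
    case "delete":
      return { method: "DELETE", path: `/${m.entityType}/${m.entityId}`,
               headers };
  }
}
```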
Step 6: Handle Responses and Map to Local State
On success, you typically need to:
- Mark the outbox entry as applied.
- Update the entity’s serverVersion and any server-assigned fields (canonical timestamps, server IDs, normalized content).
- Resolve temporary IDs if the entity was created offline.
Temporary ID mapping is a common requirement. If you create an entity offline with a client-generated ID, you can keep that ID permanently if the server accepts it. If the server assigns IDs, you must store a mapping and rewrite references.
```
// Example: server returns { serverId: 'A12', clientTempId: 'tmp-9' }
BEGIN TRANSACTION;
  UPDATE Project SET id = 'A12' WHERE id = 'tmp-9';
  UPDATE Task SET projectId = 'A12' WHERE projectId = 'tmp-9';
  UPDATE OutboxEntry SET entityId = 'A12' WHERE entityId = 'tmp-9';
  INSERT INTO IdMap(tempId, serverId) VALUES ('tmp-9', 'A12');
COMMIT;
```

Step 7: Retry and Backoff Strategy
Classify errors into buckets:
- Transient: timeouts, 502/503, network errors. Retry with exponential backoff and jitter.
- Auth: 401/403. Pause sync, refresh credentials, then resume.
- Conflict: 409/412. Invoke conflict resolver; may require pulling latest state before retrying.
- Permanent validation: 400 with field errors. Mark mutation failed and create a user-visible remediation record.
Store attemptCount and lastError on the outbox entry. After N attempts, move to dead-letter to prevent infinite loops and to keep the rest of the queue flowing.
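The classification buckets and the backoff policy can both be pure functions, which keeps them testable. This sketch uses "full jitter" (a random delay up to an exponentially growing ceiling) so many clients recovering at once do not retry in lockstep; the base and cap values are illustrative:

```typescript
// Sketch: error classification plus exponential backoff with full jitter.
type ErrorClass = "transient" | "auth" | "conflict" | "permanent";

function classify(status: number | null): ErrorClass {
  if (status === null) return "transient"; // network error or timeout
  if (status === 401 || status === 403) return "auth";
  if (status === 409 || status === 412) return "conflict";
  if (status === 429 || status >= 500) return "transient";
  return "permanent"; // e.g. 400 validation failures
}

// Delay before attempt n (1-based): random in [0, min(cap, base * 2^(n-1))).
function backoffMs(
  attempt: number,
  baseMs = 1000,
  capMs = 60_000,
  rand: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** (attempt - 1));
  return Math.floor(rand() * ceiling);
}
```

Injecting `rand` as a parameter makes the jitter deterministic under test, in line with the determinism goal stated earlier.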
Step-by-Step: Designing the Pull Pipeline
Step 1: Choose a Delta Mechanism
The pull pipeline works best with an incremental feed rather than full downloads. Typical options include:
- Change cursor: server returns a cursor token; client stores it and requests changes since that cursor.
- Per-collection revision: server returns a monotonically increasing revision number.
- Timestamp-based: request changes since lastPulledAt (be careful with clock skew; prefer server timestamps).
From an engine perspective, the key is that the token/cursor advances only after local application succeeds.
Step 2: Fetch Pages and Persist Progress Safely
Remote change feeds are often paginated. For each page:
- Fetch changes with current cursor.
- Apply changes locally in a transaction.
- Persist the new cursor in the same transaction.
```
// Pseudocode: pull one page and apply it atomically
BEGIN TRANSACTION;
  for change in page.changes:
    applyChange(change)
  UPDATE SyncCursor SET value = page.nextCursor;
COMMIT;
```

This ensures that if the app crashes after applying half a page, you do not advance the cursor prematurely.
Step 3: Apply Remote Changes with Local Awareness
When applying a remote upsert/delete, you must consider local pending mutations. A common rule is: do not blindly overwrite fields that the user has edited locally but not yet pushed.
Practical techniques:
- Field-level dirty tracking: store which fields have local edits pending; merge remote updates into non-dirty fields.
- Entity-level shadow copy: keep a “base” snapshot of last-synced state; compute diffs to merge.
- Rebase pending mutations: if server state changed, rewrite pending patches to apply on top of the new base.
Which technique you choose depends on your domain complexity. The architectural point is to make this logic explicit in the conflict resolver layer, not scattered across network callbacks.
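Of the techniques above, field-level dirty tracking is the simplest to sketch: given the set of fields with pending local edits, a remote snapshot is merged into everything except those fields. The shapes here are illustrative:

```typescript
// Sketch of field-level dirty tracking: merge a remote snapshot into local
// state while preserving fields the user edited locally but has not pushed.
type Fields = Record<string, unknown>;

function mergeRemote(
  local: Fields,
  remote: Fields,
  dirty: Set<string>,
): Fields {
  const merged: Fields = { ...local };
  for (const [field, value] of Object.entries(remote)) {
    // Remote wins only for fields with no pending local edit.
    if (!dirty.has(field)) merged[field] = value;
  }
  return merged;
}
```

The `dirty` set would be maintained by the mutation recorder (it knows which fields each pending outbox entry touches) and cleared when those mutations are marked applied.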
Coordinating Push and Pull
Preferred Ordering: Push Then Pull
In many products, pushing first reduces conflicts because the server sees the client’s latest intent before the client ingests remote changes. A typical cycle:
- Run push until outbox is empty or blocked (auth/conflict/permanent failure).
- Run pull to fetch remote deltas.
- Optionally run push again if pull introduced server-side transformations that require follow-up.
When Pull First Makes Sense
If the server enforces strict version checks, pulling first can reduce 412/409 errors by ensuring the client has the latest serverVersion before attempting updates. In that case, the engine can do:
- Pull latest versions for entities that have pending mutations (targeted pull).
- Then push those mutations with updated baseVersion.
This is a good example of why the scheduler should be policy-driven and configurable.
Conflict Handling as an Architectural Boundary
Conflict Detection Inputs
Design the resolver API so it receives the same set of inputs regardless of platform:
- Local current entity state.
- Local base (last-synced) entity state, if available.
- Remote entity state or remote change payload.
- Pending local mutations affecting the entity.
- Error context (HTTP status, server error codes, violated constraints).
Resolver Outputs
The resolver should output one of:
- Auto-merge result: a merged entity to write locally plus possibly a rewritten mutation to push.
- Server-wins/client-wins decision: apply one side and discard or transform the other.
- Manual conflict record: create a conflict item for UI, pause further mutations for that entity.
Keep the resolver pure (no I/O) when possible. This makes it testable with fixtures.
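As a sketch of that pure boundary, here is a three-way, field-level resolver built from the inputs listed above (local state, base snapshot, remote state). The names and the decision type are assumptions about how your engine models outcomes, not a fixed API:

```typescript
// Sketch of a pure, pluggable resolver: no I/O, just inputs to a decision.
interface Entity {
  [field: string]: unknown;
}

type Resolution =
  | { kind: "merged"; entity: Entity }
  | { kind: "serverWins" }
  | { kind: "clientWins" }
  | { kind: "manual"; reason: string };

function resolveByBase(
  local: Entity,
  base: Entity | null,
  remote: Entity,
): Resolution {
  // Without a base snapshot we cannot tell who changed what.
  if (base === null) return { kind: "manual", reason: "no base snapshot" };
  const merged: Entity = { ...remote };
  for (const field of Object.keys({ ...local, ...remote })) {
    const localChanged = local[field] !== base[field];
    const remoteChanged = remote[field] !== base[field];
    if (localChanged && remoteChanged && local[field] !== remote[field]) {
      // Both sides edited the same field to different values: escalate.
      return { kind: "manual", reason: `both sides edited "${field}"` };
    }
    if (localChanged) merged[field] = local[field];
  }
  return { kind: "merged", entity: merged };
}
```

Because the function touches no database or network, fixtures for it are just three object literals and an expected decision.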
State Machine Model for the Engine
Modeling the sync engine as a state machine helps avoid ad-hoc logic. A simplified set of states:
- Idle: no work or waiting for triggers.
- Pushing: processing outbox entries.
- Pulling: fetching and applying remote changes.
- BlockedAuth: cannot proceed until auth is refreshed.
- BlockedConflict: requires conflict resolution or user action.
- Backoff: waiting before retry after transient failure.
Events that cause transitions include: mutationEnqueued, networkAvailable, timerFired, pushSucceeded, pushFailedTransient, pushFailedAuth, pushFailedConflict, pullSucceeded, pullFailedTransient, authRefreshed.
```
// Example transition sketch
if state == Idle    and event == mutationEnqueued:    state = Pushing
if state == Pushing and outboxEmpty:                  state = Pulling
if state == Pushing and event == pushFailedAuth:      state = BlockedAuth
if state == Pulling and event == pullFailedTransient: state = Backoff
```

This model is especially useful when implementing the engine across platforms (iOS/Android/Flutter/React Native) because it clarifies what must be consistent and what can be platform-specific (threading, background scheduling APIs).
Transactions, Ordering, and Invariants
Atomicity Rules
- Recording a local change and enqueuing its outbox entry must be atomic.
- Applying a remote page and advancing the cursor must be atomic.
- Marking an outbox entry applied and updating serverVersion must be atomic.
These rules prevent “phantom” states where the UI shows something that cannot be reconciled later.
Ordering Rules
Define ordering explicitly, and encode it in the outbox selection logic:
- Mutations for the same entity should generally be processed in order.
- Creates must precede updates/deletes for the same entity.
- Dependent entities must wait for parent creation if the server requires server IDs.
If you violate ordering, you will see hard-to-debug errors like 404 on update because the create has not been pushed yet.
API Design Considerations That Affect the Engine
Command-Style Endpoints vs. Resource Patches
A sync engine is simpler when the server supports idempotent commands with clear semantics. For example, POST /tasks/{id}:complete with an idempotency key is often easier to replay than a generic patch that depends on current server state.
If you use patch endpoints, prefer sending explicit field updates and include baseVersion to detect conflicts.
Server-Generated Side Effects
Servers often add derived fields (updatedAt, normalized text, computed totals). The engine should treat server responses as authoritative and write them back locally. This implies your push handler should always parse the response body and update local entities, not just mark the mutation applied.
Operational Concerns: Observability and Debuggability
Expose Sync Metrics
At minimum, track:
- Outbox pending count, oldest pending age.
- Last successful push time, last successful pull time.
- Current state (Idle/Pushing/Pulling/Blocked).
- Last error summary and the entity/mutation involved.
These metrics should be accessible to the UI (for status indicators) and to logs (for support diagnostics).
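One way to make those metrics uniformly available is a single immutable snapshot type that both the UI and the logger consume; the shape below is illustrative:

```typescript
// Sketch: a metrics snapshot the engine exposes to UI and logs.
interface SyncMetrics {
  state: "Idle" | "Pushing" | "Pulling" | "BlockedAuth"
       | "BlockedConflict" | "Backoff";
  outboxPending: number;
  oldestPendingAgeMs: number | null;
  lastPushAt: number | null;
  lastPullAt: number | null;
  lastError: { summary: string; mutationId: string } | null;
}

function snapshot(
  now: number,
  state: SyncMetrics["state"],
  pendingCreatedAts: number[], // createdAt of each pending outbox entry
  lastPushAt: number | null,
  lastPullAt: number | null,
  lastError: SyncMetrics["lastError"],
): SyncMetrics {
  return {
    state,
    outboxPending: pendingCreatedAts.length,
    oldestPendingAgeMs: pendingCreatedAts.length
      ? now - Math.min(...pendingCreatedAts)
      : null,
    lastPushAt,
    lastPullAt,
    lastError,
  };
}
```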
Structured Logging with Correlation IDs
Use the outbox entry ID as a correlation ID across logs: mutation recorded, request sent, response received, local commit completed. This makes it possible to reconstruct what happened on a user device.
Reference Implementation Blueprint (Putting It Together)
The following blueprint shows how the pieces interact without prescribing a specific framework:
- Domain layer writes to local store and enqueues outbox entries in the same transaction.
- SyncEngine exposes triggerSync(reason) and maintains a state machine.
- OutboxRepository provides deterministic selection and leasing of pending entries.
- RemoteClient executes idempotent requests and returns typed results (success, transient error, auth error, conflict, validation error).
- ConflictResolver is a pure module invoked on conflict outcomes.
- PullClient fetches change pages using a stored cursor; ChangeApplier applies them transactionally.
- SyncMetadataStore persists cursor, versions, and engine bookkeeping.
```
// Simplified engine loop
while (hasTrigger):
  if (canPush):
    result = pushNextOutboxEntry()
    handle(result)
  else if (canPull):
    result = pullNextPage()
    handle(result)
  else:
    waitForEvent()
```

Even if your actual implementation is event-driven rather than a loop, this blueprint helps ensure you have clear boundaries: recording, pushing, pulling, resolving, and scheduling.