Optimistic Updates, Operation Queues, and Idempotency

Chapter 8

Estimated reading time: 14 minutes


Why Optimistic Updates Matter in Offline-First Apps

In an offline-first mobile app, users expect the interface to respond immediately when they tap “Like”, edit a title, or move an item to another list. Waiting for the network round-trip (or blocking when offline) makes the app feel broken, even if it is technically correct. Optimistic updates solve this by applying the user’s intent locally first, updating the UI instantly, and then reconciling with the server later.

Optimistic updates are not “fake” updates; they are a deliberate UX and data strategy: you record the user’s intent as an operation, apply a local projection of the result, and then attempt to make the server reflect the same change. If the server later rejects the change, you must either roll back or compensate in a way that keeps the UI and local state coherent.

To implement optimistic updates reliably, you need two supporting pillars: an operation queue (to persist and replay user intents) and idempotency (to ensure retries do not create duplicate effects). This chapter focuses on how these three pieces fit together and how to implement them in a resilient way.

Optimistic Updates: The Core Pattern

What an optimistic update actually is

An optimistic update is a local state transition that assumes the remote write will eventually succeed. The key is that you do not just mutate local data; you also record enough information to later synchronize the same intent to the server and to recover if the app restarts mid-flight.

Conceptually, each user action becomes an operation with:


  • Intent: what the user wanted (e.g., “rename task T to ‘Buy milk’”).
  • Local application: how you update local state immediately (e.g., set task.title locally).
  • Remote application: how you call the API later (e.g., PATCH /tasks/T).
  • Reconciliation: what to do if the server response differs or fails (e.g., revert title, show error, or keep local and mark as conflicted).

Optimistic UI vs optimistic data

Many teams implement “optimistic UI” by updating the visible UI state but not persisting the change. In offline-first apps, you typically need optimistic data: the local database should reflect the change immediately so that the app remains consistent across screens, survives process death, and supports background sync.

A practical approach is to treat the local database as the source of truth for the UI, and apply optimistic updates by writing to the database plus recording an operation in the queue in the same transaction.

When optimistic updates are safe and when they are risky

Optimistic updates are easiest when the operation is:

  • Deterministic: applying it locally yields the same result as the server would (e.g., setting a field to a value).
  • Commutative or mergeable: order doesn’t matter much (e.g., adding tags to a set).
  • Reversible: you can undo it if needed (e.g., toggling a boolean).

They are trickier when the server enforces constraints you cannot fully validate locally (permissions, uniqueness, quotas), or when the server assigns canonical values (slugs, sequence numbers). In those cases, you can still be optimistic, but you must plan for correction after the server responds.

Operation Queues: Turning User Actions into Durable Work

What an operation queue is

An operation queue is a persistent list of pending work items representing user intents that must be sent to the server. It is not just a retry mechanism; it is the backbone that makes optimistic updates durable and replayable.

Each operation should be stored in local persistence (e.g., SQLite/Room, Core Data, Realm) so that if the app is killed, the queue remains and can resume later.

Recommended operation record shape

A minimal, practical schema for an operation might include:

  • op_id: a unique identifier for the operation (UUID).
  • entity_type and entity_id: what it targets.
  • type: create/update/delete/move/add_member/etc.
  • payload: JSON describing the intended change (patch fields, new values).
  • created_at, attempt_count, next_attempt_at.
  • state: pending, in_flight, succeeded, failed_transient, failed_permanent, blocked.
  • idempotency_key: value to send to the server to dedupe retries.
  • depends_on: optional reference to another op_id if ordering is required.

Keep payloads as small as possible but sufficient to replay the intent. Avoid storing entire entity snapshots unless you need them for rollback or conflict UI.
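
To make the shape concrete, here is a minimal sketch of the record as a TypeScript type. The field names mirror the list above; the OperationState union and the exact value types are assumptions, not a prescribed schema.

// Sketch of a queued operation record; field names follow the schema above.
type OperationState =
  | "pending"
  | "in_flight"
  | "succeeded"
  | "failed_transient"
  | "failed_permanent"
  | "blocked";

interface OperationRecord {
  op_id: string;                        // UUID for this operation
  entity_type: string;                  // e.g., "task"
  entity_id: string;                    // target entity
  type: string;                         // "create" | "update" | "delete" | ...
  payload: Record<string, unknown>;     // intended change, e.g., { title: "Buy milk" }
  created_at: number;                   // epoch milliseconds
  attempt_count: number;
  next_attempt_at: number;              // epoch milliseconds; 0 = eligible now
  state: OperationState;
  idempotency_key: string;              // reused for every retry of this operation
  depends_on?: string;                  // op_id that must succeed first, if any
}

Later sketches in this chapter reuse this OperationRecord type.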

Queue ordering: global vs per-entity

Many apps start with a single FIFO queue. This is simple but can cause unnecessary blocking: one failing operation can stall unrelated entities. A more resilient approach is to process operations per entity (or per “sync group”), allowing independent progress.

Common strategies:

  • Per-entity serialization: operations for the same entity must run in order; different entities can run concurrently.
  • Dependency graph: a create operation must succeed before updates to that entity can run.
  • Priority lanes: user-visible operations (e.g., sending a message) may be prioritized over background maintenance.

Coalescing and squashing operations

When offline, users may edit the same item repeatedly. Sending every intermediate change wastes bandwidth and increases conflict surface. Coalescing means combining multiple queued operations into a smaller set that yields the same final state.

Examples:

  • Three “rename” operations on the same task can be squashed into one with the final title.
  • A “set completed=true” followed by “set completed=false” can cancel out entirely.
  • Multiple “add tag” operations can be merged into a single set-add operation.

Coalescing must preserve semantics. Be careful with operations that have side effects (e.g., “send email”, “charge card”)—those should generally not be squashed.
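
As an illustration, here is a sketch of squashing consecutive pending field-update operations for the same entity, reusing the OperationRecord type sketched earlier. The merge rule only applies to plain “set field” updates; anything else passes through untouched.

// Merge adjacent pending "update" ops for the same entity into one.
// Later field values win; non-update ops (and their side effects) are preserved.
function coalesceUpdates(ops: OperationRecord[]): OperationRecord[] {
  const result: OperationRecord[] = [];
  for (const op of ops) {
    const last = result[result.length - 1];
    const mergeable =
      op.type === "update" &&
      last !== undefined &&
      last.type === "update" &&
      last.entity_id === op.entity_id &&
      last.state === "pending" &&
      op.state === "pending";
    if (mergeable) {
      // Keep the earlier op's identity (op_id, idempotency_key); fold in newer fields.
      last.payload = { ...last.payload, ...op.payload };
    } else {
      result.push(op);
    }
  }
  return result;
}

Note that the surviving operation keeps its original op_id and idempotency_key, so a retry of the merged operation still deduplicates correctly.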

Step-by-step: Writing an optimistic update with a durable queue

This is a practical sequence you can implement in any platform stack (a sketch of steps 2-4 follows the list):

  • Step 1: Validate locally. Check basic constraints you can enforce offline (non-empty title, max length). If invalid, reject immediately.
  • Step 2: Create an operation record. Generate op_id and idempotency_key. Build payload (e.g., {"title":"Buy milk"}).
  • Step 3: Apply local mutation. Update the entity in the local database to reflect the new title. Mark it as having pending changes if you track that.
  • Step 4: Persist both atomically. In a single local transaction, write the entity update and insert the operation into the queue. This prevents “UI changed but no op recorded” and “op recorded but UI not updated” inconsistencies.
  • Step 5: Update UI from local DB. The UI should observe the database and reflect the new title immediately.
  • Step 6: Background processor picks it up. A worker reads pending operations, sends them, and updates operation state based on the response.
  • Step 7: Reconcile server response. If the server returns canonical fields (e.g., updated_at, normalized title), write them to local DB. Mark operation succeeded and clear pending flags.
  • Step 8: Handle failure. If transient, schedule retry with backoff. If permanent, mark failed and trigger a UI affordance (error badge, “tap to resolve”).
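
Here is the promised sketch of steps 2-4 in TypeScript. The LocalDb wrapper, its table names, and the has_pending_writes flag are assumptions standing in for whatever local store you use (Room, Core Data, SQLite, etc.); the point is that the entity write and the queue insert share one transaction.

import { randomUUID } from "crypto";   // Node-style import; any UUID generator works here

// Hypothetical local DB wrapper exposing transactional writes.
interface LocalTx {
  update(table: string, id: string, fields: Record<string, unknown>): Promise<void>;
  insert(table: string, row: unknown): Promise<void>;
}
interface LocalDb {
  transaction<T>(work: (tx: LocalTx) => Promise<T>): Promise<T>;
}

async function renameTaskOptimistically(db: LocalDb, taskId: string, title: string): Promise<void> {
  // Step 2: build the operation record with a fresh op_id and idempotency_key.
  const op: OperationRecord = {
    op_id: randomUUID(),
    entity_type: "task",
    entity_id: taskId,
    type: "update",
    payload: { title },
    created_at: Date.now(),
    attempt_count: 0,
    next_attempt_at: 0,
    state: "pending",
    idempotency_key: randomUUID(),
  };

  // Steps 3-4: entity update and queue insert commit together or not at all.
  await db.transaction(async (tx) => {
    await tx.update("tasks", taskId, { title, has_pending_writes: true });
    await tx.insert("operations", op);
  });
}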

Idempotency: Making Retries Safe

The problem retries create

Offline-first apps retry. Networks drop, timeouts happen, the app is backgrounded mid-request, or the server responds but the client never receives it. Without idempotency, a retry can apply the same operation twice, producing duplicates (two comments, two payments, two “likes”).

Idempotency means: performing the same operation multiple times results in the same final state as performing it once.

Client-generated idempotency keys

A practical technique is to generate an idempotency key per operation and send it with the request (header or field). The server stores the key and the result for a period of time. If it receives the same key again, it returns the same result without applying the change again.

Guidelines:

  • Generate a UUID per operation and persist it in the operation record.
  • Reuse the same key for all retries of that operation.
  • Do not reuse keys across different intents, even if payload is identical.
  • Prefer server endpoints that explicitly support idempotency for non-idempotent actions (POST create, side-effect operations).
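
In practice the key simply travels with every attempt of the request. A sketch, assuming an Idempotency-Key header (a common convention in Stripe-style APIs) and an illustrative PATCH endpoint; your server may expect a different header or a body field.

// Send a queued operation, reusing its stored idempotency key on every retry.
async function sendOperation(baseUrl: string, authToken: string, op: OperationRecord): Promise<Response> {
  return fetch(`${baseUrl}/${op.entity_type}s/${op.entity_id}`, {
    method: "PATCH",
    headers: {
      "Content-Type": "application/json",
      "Authorization": `Bearer ${authToken}`,
      "Idempotency-Key": op.idempotency_key,   // same value for every retry of this op
    },
    body: JSON.stringify(op.payload),
  });
}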

Idempotent operation design by type

Different operation types have different idempotency approaches:

  • Set field (PATCH): naturally idempotent if it sets a value (title = X). Retrying yields the same state.
  • Increment (POST /increment): not idempotent by default. Prefer “set to value” or include an idempotency key so the server increments only once per key.
  • Create (POST): can be made idempotent by using a client-generated ID (UUID) as the resource ID, or by idempotency key mapping to the created resource.
  • Delete: often idempotent if deleting a missing resource returns success or a consistent “already deleted” response.
  • Add to set: idempotent if server treats it as set semantics (add member if not present). If server uses list semantics, you need idempotency keys or unique constraints.

Step-by-step: Making “create” safe with client IDs

Creates are the most common source of duplicates. A robust pattern is to assign the new object an ID on the client before sending it.

  • Step 1: Generate entity_id locally (UUID).
  • Step 2: Insert the new entity locally with that ID and mark it as pending creation.
  • Step 3: Enqueue a create operation whose payload includes the same entity_id and fields.
  • Step 4: Server accepts client-provided IDs (or maps idempotency_key to a created server ID).
  • Step 5: On retry, the server sees the same ID and returns the existing resource rather than creating a second one.

If your server cannot accept client IDs, you can still use idempotency keys: the first successful create stores a mapping from key to created resource, and retries return that resource.
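
A sketch of the client-ID variant, reusing the LocalDb wrapper and OperationRecord type from earlier; the table names, pending_creation flag, and payload fields are illustrative.

// Create a task with a client-generated ID so retries cannot create duplicates.
async function createTaskOptimistically(db: LocalDb, title: string): Promise<string> {
  const taskId = randomUUID();               // canonical ID chosen on the client
  const op: OperationRecord = {
    op_id: randomUUID(),
    entity_type: "task",
    entity_id: taskId,
    type: "create",
    payload: { id: taskId, title },          // server stores the same ID
    created_at: Date.now(),
    attempt_count: 0,
    next_attempt_at: 0,
    state: "pending",
    idempotency_key: randomUUID(),
  };
  await db.transaction(async (tx) => {
    await tx.insert("tasks", { id: taskId, title, pending_creation: true });
    await tx.insert("operations", op);
  });
  return taskId;
}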

Reconciling Optimistic State with Server Reality

Canonicalization and server-calculated fields

Even when an operation succeeds, the server may return canonical values: normalized text, computed counters, permissions-based fields, or updated timestamps. Your optimistic local state should be treated as a prediction. After success, overwrite local fields with the server response where appropriate.

A common approach is to store a “pending” marker and show subtle UI indicators (e.g., a small spinner or “sending…” state) until the operation is acknowledged. Once acknowledged, clear the marker and apply canonical fields.

Handling permanent failures without breaking UX

Some failures are not transient: permission denied, validation rules you couldn’t check offline, or the entity no longer exists on the server. For these, endless retries are harmful. Mark the operation as permanently failed and surface a resolution path.

Resolution patterns:

  • Rollback: revert the optimistic change to the last known good state.
  • Compensate: apply a new local operation that brings the UI to a valid state (e.g., remove a member that couldn’t be added).
  • User intervention: show “Tap to fix” and allow editing to satisfy constraints.

Choose based on the domain. For example, a failed “send message” might remain visible with a “retry” button; a failed “rename” might revert and show an error toast.

Out-of-order acknowledgments and race conditions

When you allow concurrency, you can receive responses out of order. If you apply server responses naively, you can overwrite newer optimistic changes with older acknowledgments.

Mitigations (the patch-based approach is sketched after the list):

  • Per-entity sequencing: process operations for an entity in order and only send the next after the previous is acknowledged.
  • Client operation versioning: store a local “mutation sequence” number on the entity; only apply server responses if they correspond to the latest acknowledged sequence.
  • Patch-based reconciliation: apply only the fields that the operation touched, not a full entity overwrite.
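
A sketch of patch-based reconciliation: only the fields the acknowledged operation touched, plus explicitly server-owned fields, are copied back, so a newer optimistic change to an unrelated field is not clobbered. The serverOwnedFields default is an assumption.

// Apply a server acknowledgment without overwriting newer optimistic changes.
function applyAck(
  localEntity: Record<string, unknown>,
  op: OperationRecord,
  serverEntity: Record<string, unknown>,
  serverOwnedFields: string[] = ["updated_at"],
): Record<string, unknown> {
  const next = { ...localEntity };
  // Copy back only what this operation touched...
  for (const field of Object.keys(op.payload)) {
    if (field in serverEntity) next[field] = serverEntity[field];
  }
  // ...plus fields the server always owns (timestamps, computed values).
  for (const field of serverOwnedFields) {
    if (field in serverEntity) next[field] = serverEntity[field];
  }
  return next;
}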

Designing the Operation Processor

State machine for operations

A simple, effective state machine:

  • pending: ready to send when allowed.
  • in_flight: currently being sent; store a lease timestamp to recover if the app dies.
  • succeeded: acknowledged; can be deleted or archived.
  • failed_transient: retry later with backoff.
  • failed_permanent: do not retry automatically; requires user action or code path to resolve.
  • blocked: waiting on dependency (e.g., update blocked until create succeeds).

Use a lease mechanism for in_flight operations: if an op is in_flight longer than a timeout, return it to pending. This prevents “stuck forever” after crashes.
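
A sketch of that lease recovery, shown over an in-memory array for brevity; in a real queue this would be a single UPDATE against the operations table, and lease_expires_at is an assumed extra column.

// Return expired in-flight operations to pending so a crashed worker
// cannot leave them stuck forever.
function reclaimExpiredLeases(
  ops: Array<OperationRecord & { lease_expires_at?: number }>,
  now: number = Date.now(),
): void {
  for (const op of ops) {
    if (op.state === "in_flight" && (op.lease_expires_at ?? 0) <= now) {
      op.state = "pending";
      delete op.lease_expires_at;
    }
  }
}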

Retry policy and backoff

Retries should be deliberate. A typical policy:

  • Retry transient network errors with exponential backoff (e.g., 1s, 2s, 4s, 8s… capped).
  • Retry 5xx server errors similarly, possibly with jitter.
  • Do not retry 4xx validation/permission errors automatically; mark permanent.
  • Respect server rate limits (429) by using Retry-After.

Store next_attempt_at in the operation record so retries survive restarts and background scheduling differences across platforms.
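
One way to compute next_attempt_at, using exponential backoff with full jitter and honoring Retry-After when present; the base delay and cap are illustrative defaults.

// Exponential backoff with full jitter, capped; a 429's Retry-After wins if given.
function computeNextAttemptAt(
  attemptCount: number,
  retryAfterSeconds?: number,
  now: number = Date.now(),
  baseMs: number = 1_000,
  capMs: number = 5 * 60_000,
): number {
  if (retryAfterSeconds !== undefined) return now + retryAfterSeconds * 1_000;
  const ceiling = Math.min(capMs, baseMs * 2 ** attemptCount);
  return now + Math.random() * ceiling;   // full jitter: anywhere up to the ceiling
}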

Step-by-step: Processing loop (platform-agnostic)

  • Step 1: Fetch eligible operations where state is pending or failed_transient and next_attempt_at <= now, and dependencies are satisfied.
  • Step 2: Claim operations by setting state=in_flight and lease_expires_at=now+timeout in a transaction to avoid multiple workers sending the same op.
  • Step 3: Send request with idempotency_key and payload. Include auth and any required headers.
  • Step 4: Interpret response: success (2xx), transient failure (timeouts/5xx/429), permanent failure (validation/permission).
  • Step 5: On success, apply server response to local DB (canonical fields), mark op succeeded, clear pending markers on the entity if no more queued ops remain for it.
  • Step 6: On transient failure, set state=failed_transient, increment attempt_count, compute next_attempt_at with backoff, release lease.
  • Step 7: On permanent failure, set state=failed_permanent, attach error_code/message for UI, and optionally revert/compensate local state.
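
A condensed sketch of one pass of this loop. The OpQueue and SyncApi interfaces are placeholders for your own storage and transport layers; claiming, response classification, and state updates happen behind them exactly as described in the steps above.

type SendOutcome =
  | { kind: "success"; serverEntity: Record<string, unknown> }
  | { kind: "transient"; retryAfterSeconds?: number }
  | { kind: "permanent"; errorCode: string };

interface OpQueue {
  claimEligibleOps(now: number): Promise<OperationRecord[]>;                          // steps 1-2
  markSucceeded(op: OperationRecord, server: Record<string, unknown>): Promise<void>; // step 5
  scheduleRetry(op: OperationRecord, retryAfterSeconds?: number): Promise<void>;      // step 6
  markPermanentFailure(op: OperationRecord, errorCode: string): Promise<void>;        // step 7
}

interface SyncApi {
  send(op: OperationRecord): Promise<SendOutcome>;   // steps 3-4
}

async function processOnce(queue: OpQueue, api: SyncApi): Promise<void> {
  const claimed = await queue.claimEligibleOps(Date.now());
  for (const op of claimed) {
    const outcome = await api.send(op);
    if (outcome.kind === "success") {
      await queue.markSucceeded(op, outcome.serverEntity);
    } else if (outcome.kind === "transient") {
      await queue.scheduleRetry(op, outcome.retryAfterSeconds);
    } else {
      await queue.markPermanentFailure(op, outcome.errorCode);
    }
  }
}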

Practical Examples

Example 1: Toggling a “Like” with optimistic count

Scenario: user taps Like on a post. You want immediate feedback and correct counts.

  • Optimistic local update: set post.liked_by_me=true and post.like_count += 1.
  • Enqueue operation: type=like_post, payload={post_id, action:"like"}.
  • Idempotency: use idempotency_key so repeated retries don’t double-like.

Edge case: user taps Like then quickly Unlike while offline. Coalesce by keeping only the final intent (unlike) and adjust local count accordingly. The queue can squash the like/unlike pair into a single “set liked=false” operation if your server supports it, or into two operations that cancel out and can be dropped.

Example 2: Editing a title repeatedly while offline

Scenario: user edits a note title five times.

  • Optimistic local update: each keystroke might update local state, but you should not enqueue an operation per keystroke.
  • Queue strategy: enqueue a single update operation when the user commits (blur/save), or debounce operation creation (e.g., after 1–2 seconds of inactivity).
  • Coalescing: if an update op already exists for that note and is pending, merge the payload to the latest title.

This reduces queue size and makes reconciliation simpler because the server sees only the final title.
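
A small sketch of the debounce approach; the 1.5-second window and the enqueueRename callback are illustrative, and the callback is expected to coalesce with any already-pending update as described above.

// Enqueue a single rename operation only after the user stops typing.
function makeDebouncedRename(
  enqueueRename: (taskId: string, title: string) => void,
  delayMs: number = 1_500,
): (taskId: string, title: string) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (taskId, title) => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => enqueueRename(taskId, title), delayMs);
  };
}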

Example 3: Create then attach (dependency handling)

Scenario: user creates a task and immediately adds it to a project while offline.

  • Create task: generate task_id locally, insert task, enqueue create_task op.
  • Attach to project: enqueue add_task_to_project op that depends_on create_task op_id (or depends on task pending creation flag).
  • Processor: send create first; once succeeded, send attach. If create fails permanently, mark attach blocked/failed and surface UI.

Data Modeling Tips for Optimistic and Queued Writes

Tracking pending state on entities

To support UX like “sending…” indicators and to avoid confusing the user, store lightweight metadata on entities:

  • pending_ops_count or has_pending_writes
  • last_local_modified_at
  • last_error (optional, for failed_permanent)

These fields help the UI show status without needing to query the entire queue each time.

Local rollback vs forward-only compensation

Rollback requires you to know the previous value. You can store a “before” snapshot in the operation payload (e.g., old_title) or rely on a local history table. Compensation is often simpler: instead of reverting, you apply a new operation that corrects state (e.g., if adding a collaborator fails, remove them locally and show an error).

For collaborative domains where conflicts are common, forward-only compensation tends to be more robust than deep rollback chains, but it depends on UX expectations.
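
For the rollback path, a minimal sketch that stores the previous value alongside the intent, reusing the earlier OperationRecord and LocalDb types; the before field is an assumption and would be stripped from the payload before it is sent to the server.

// Keep the previous value when enqueueing so a permanent failure can revert it.
function buildRenameOp(taskId: string, oldTitle: string, newTitle: string): OperationRecord {
  return {
    op_id: randomUUID(),
    entity_type: "task",
    entity_id: taskId,
    type: "update",
    payload: { title: newTitle, before: { title: oldTitle } },   // send only `title` upstream
    created_at: Date.now(),
    attempt_count: 0,
    next_attempt_at: 0,
    state: "pending",
    idempotency_key: randomUUID(),
  };
}

// Revert the optimistic rename after a permanent failure.
async function rollbackRename(db: LocalDb, op: OperationRecord): Promise<void> {
  const before = op.payload.before as { title: string };
  await db.transaction(async (tx) => {
    await tx.update("tasks", op.entity_id, { title: before.title, has_pending_writes: false });
  });
}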

Testing Strategies Specific to Optimistic Updates and Idempotency

Failure injection tests

Optimistic systems fail in the gaps: after local commit but before enqueue, after enqueue but before local commit, after server success but before client receives response. You should test these explicitly.

  • Kill the app immediately after local transaction commit; verify queue persists and UI state is consistent on restart.
  • Simulate request timeout after server applied the change; verify retry does not duplicate (idempotency works).
  • Simulate 400 validation error; verify operation becomes failed_permanent and UI offers a fix.

Deterministic replay tests

Because operations are durable, you can write tests that replay a recorded queue against a fake server and assert final local state. This is especially valuable when you implement coalescing rules: you want to ensure squashing preserves semantics.

// Pseudocode: replay queued ops deterministically against a fake server
seedLocalDb()
enqueue(opRename("t1","A"))
enqueue(opRename("t1","B"))
coalesceQueueForEntity("task","t1")
processAll()
assert(local.task("t1").title == "B")
assert(server.task("t1").title == "B")

Exercise

In an offline-first app, what practice best ensures an optimistic change survives app restarts and can be synced reliably later?


Reliable optimistic updates require optimistic data: persist the local mutation and the queued operation together atomically so the UI stays consistent and the intent can be replayed after process death.

Next chapter

Conflict Detection and Resolution Patterns
