Local Persistence Strategy Selection and Tradeoffs

Chapter 4

Estimated reading time: 14 minutes


Why local persistence strategy selection matters

In an offline-first app, local persistence is not just a cache; it is the system of record while the device is offline and often the primary source of truth for UI rendering. The persistence strategy you choose determines how quickly screens load, how reliably data survives crashes and OS evictions, how expensive sync becomes, and how hard it is to evolve the schema over time. “Strategy” here means the combination of storage technologies (database, key-value store, file system), data layout (tables/collections/files), and operational patterns (transactions, batching, compaction, encryption, migrations).

This chapter focuses on selecting a local persistence strategy and understanding tradeoffs. It avoids rehashing offline-first requirements, connectivity modeling, and domain modeling; instead, it assumes you already know what data must be available offline and that you have a syncable domain model. The goal is to map those needs to concrete storage choices and operational practices.

Decision dimensions: what you must evaluate

1) Access patterns and query shape

Start by listing the app’s dominant read/write patterns. The same dataset can be stored in multiple ways, but the best choice depends on how you query it.

  • Point lookups (get by id, get settings): key-value stores and document stores perform well.

  • Filtered lists (tasks by status, messages by thread, orders by date): relational databases with indexes excel; document stores can work if you precompute indexes or store denormalized views.


  • Full-text search: often requires a dedicated index (SQLite FTS, Lucene-based engines) or a server-side fallback; if offline search is required, plan for it explicitly.

  • Aggregations (counts, sums, grouping): relational databases are strong; key-value stores require precomputed counters or map-reduce-like logic.

  • Streaming UI (live lists, reactive queries): choose a persistence layer that supports observation efficiently or can be wrapped with change notifications.

2) Consistency and transactional needs

Offline-first apps frequently perform multi-entity updates: create an item, update a parent’s counters, append to an outbox, and update a local “last modified” marker. If these steps can partially apply, you may show inconsistent UI or lose sync intents. A storage engine with robust transactions (ACID) reduces risk and simplifies code. If you use multiple stores (e.g., database + file system), you must design around the lack of cross-store atomicity.

3) Volume, growth, and retention

Estimate not only current dataset size but also growth over months. Consider:

  • Hot vs cold data: recent items accessed often vs archives rarely opened.

  • Attachment-heavy data: images, audio, and PDFs should usually live in the file system, with metadata in a database.

  • Retention policies: do you purge old logs, cache entries, or synced content?

Some engines degrade when the dataset becomes large without periodic maintenance (vacuum/compaction). Plan for that operational cost.

4) Schema evolution and migrations

Offline-first apps must evolve without breaking existing installs. Evaluate how painful migrations are:

  • Relational schema migrations: powerful but require careful versioning and backfills.

  • Document/JSON storage: flexible but can accumulate “schema drift” and require runtime handling of multiple shapes.

  • Binary formats: fast and compact but harder to migrate and debug.

5) Performance: latency, throughput, and battery

Mobile performance is dominated by I/O latency and wakeups. Many small writes drain battery and cause jank. A good strategy uses batching, transactions, and background scheduling. Also consider startup time: loading a huge JSON blob into memory is often slower than indexed queries that fetch only what you need.
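
As an illustration of batching, here is a minimal sketch in the same technology-agnostic pseudocode used later in this chapter; the db API and the scheduleFlush helper are assumptions, not a specific library:

  // Coalesce many small writes into one transaction on a background queue.
  const pendingWrites = [];

  function queueWrite(table, row) {
    pendingWrites.push({ table, row });
    scheduleFlush();               // debounce: flush after e.g. 250 ms or 50 queued writes
  }

  function flush() {
    if (pendingWrites.length === 0) return;
    db.transaction(() => {         // one commit (and one fsync) instead of dozens
      for (const w of pendingWrites) {
        db.insert(w.table, w.row);
      }
    });
    pendingWrites.length = 0;
  }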

6) Security and privacy

Local persistence often contains sensitive data. Evaluate:

  • Encryption at rest: database-level encryption (e.g., SQLCipher) vs file-level encryption.

  • Key management: OS keystore/keychain integration, biometric gating, and key rotation.

  • Data minimization: store only what you need offline; avoid persisting secrets like access tokens in plain text.

7) Cross-platform parity and tooling

If you ship on multiple platforms, consider whether the same persistence approach exists everywhere, and whether debugging tools are available. SQLite is widely supported; some embedded databases are platform-specific. Tooling matters: being able to inspect a database on-device speeds up development and incident response.

Common local persistence building blocks and their tradeoffs

Relational database (SQLite-family)

Best for: complex queries, joins, sorting/filtering, transactional integrity, incremental reads, and predictable performance with indexes.

Tradeoffs:

  • Pros: ACID transactions, mature tooling, strong indexing, supports partial reads, good for outbox/inbox tables, widely available across platforms.

  • Cons: schema migrations require discipline; denormalization may be needed for performance; large write bursts can cause contention if not batched; encryption may require an additional library.

When it shines: messaging, task management, inventory, any app with lists and filters that must work offline.

Document store / object database

Best for: storing nested objects close to your domain model, rapid iteration, and cases where you mostly fetch whole documents by id.

Tradeoffs:

  • Pros: flexible schema, fewer joins, often simpler mapping from in-memory objects, can be efficient for “load one object graph” screens.

  • Cons: ad-hoc querying can be limited or slower; indexing capabilities vary; handling partial updates and conflict merges can be trickier; large documents can cause write amplification if you rewrite the whole blob frequently.

When it shines: note-taking with rich nested content, forms with variable fields, offline content packs where each item is naturally a document.

Key-value store (preferences, small caches)

Best for: small configuration, feature flags, tokens (secure storage), and tiny caches.

Tradeoffs:

  • Pros: simple API, fast for small values, minimal overhead.

  • Cons: not suitable for large datasets; lacks query capabilities; can become a dumping ground that is hard to migrate; may not be transactional across multiple keys.

Rule of thumb: if you need to list, filter, or paginate, you likely need a database.

File system (attachments and large blobs)

Best for: images, audio, video, PDFs, and any large binary data.

Tradeoffs:

  • Pros: efficient streaming, avoids bloating the database, easier to integrate with media APIs, can use OS-level file protection.

  • Cons: you must manage lifecycle (cleanup, orphan detection); atomicity is limited; path handling differs across platforms; backups and restores need planning.

Typical pattern: store metadata (id, mime type, size, checksum, local path, remote url, sync state) in a database, and store the bytes in files.
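
A minimal sketch of that split, using the metadata fields listed above as columns (the db API and exact column names are assumptions):

  // Bytes live on disk; everything you need to query lives in the database.
  db.exec(`
    CREATE TABLE attachments (
      id          TEXT PRIMARY KEY,
      entity_id   TEXT NOT NULL,
      mime_type   TEXT NOT NULL,
      size_bytes  INTEGER NOT NULL,
      checksum    TEXT NOT NULL,
      local_path  TEXT,             -- null until the file is downloaded
      remote_url  TEXT,
      sync_state  TEXT NOT NULL     -- e.g. 'pending_upload', 'synced', 'missing'
    )`);
  // The actual image/audio/PDF bytes are written to local_path by the file system, never into the table.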

Hybrid approach (most common)

Many offline-first apps use a relational database for structured data and an outbox, a key-value store for small settings, and the file system for attachments. The tradeoff is increased complexity: you must coordinate writes and handle partial failures. The benefit is using the right tool for each data type.

Strategy patterns you can choose from

Pattern A: “Database as the local source of truth”

All UI reads come from the database. Network responses are written to the database, and the UI observes database changes. This pattern produces consistent offline behavior and reduces “double state” bugs.

Tradeoffs: requires careful schema design and migrations; you must optimize queries and indexes; you must avoid writing too frequently on the main thread.

Pattern B: “In-memory state with persistence as a backing store”

The app keeps an in-memory store as the primary state and periodically persists snapshots or deltas. This can feel fast but is risky if the app is killed before persistence completes.

Tradeoffs: harder to guarantee durability; requires explicit flush points; more complex crash recovery; can lead to divergence between memory and disk if not designed carefully.

Pattern C: “Event log + materialized views”

Persist an append-only log of user actions (events) and build queryable views (tables/documents) from it. This can simplify sync and conflict handling because you preserve intent.

Tradeoffs: more storage and complexity; requires compaction/snapshotting; rebuilding views can be expensive; debugging can be harder without good tooling.
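
A minimal sketch of the pattern, assuming an events table plus a tasks_view table and hypothetical newId/now helpers:

  // Append the user's intent, then update the materialized view in the same transaction.
  function completeTask(taskId) {
    db.transaction(() => {
      db.insert('events', {
        id: newId(),
        type: 'task_completed',
        entity_id: taskId,
        created_at: now(),
      });
      db.update('tasks_view', { id: taskId }, { status: 'done', updated_at: now() });
    });
  }
  // Rebuilding the view means replaying events in order; compaction means snapshotting the view
  // and truncating events older than the snapshot.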

Step-by-step selection process

Step 1: Inventory entities and operations

Create a table of entities and the operations you must support offline. For each entity, list:

  • Typical screen queries (e.g., “list by date”, “filter by status”, “search by keyword”).

  • Write frequency (rare, moderate, high).

  • Size characteristics (small records vs large payloads).

  • Relationships (one-to-many, many-to-many).

This inventory prevents you from choosing a store based on familiarity rather than fit.

Step 2: Classify data into three buckets

  • Structured queryable data: belongs in a database with indexes.

  • Small configuration and flags: belongs in key-value storage (or secure storage for secrets).

  • Large blobs: belongs in the file system with metadata in the database.

If a piece of data doesn’t clearly fit, decide based on access pattern: “Do I need to query inside it?” If yes, store it in a queryable form; if no, store it as a blob with metadata.

Step 3: Choose the primary store and commit to a single read path

Pick one primary store for UI reads (commonly a database). Then enforce a rule: screens read from that store, not from network responses directly. Network becomes an input that updates the store. This reduces edge cases where online and offline code paths diverge.
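
A minimal repository-style sketch of this rule, where both the UI and the sync layer go through the database; db.observe and api.fetchTasks are assumed names, not a specific framework:

  // UI reads observe the database; the network only ever writes into it.
  function observeOpenTasks(onChange) {
    return db.observe(
      `SELECT * FROM tasks WHERE status = 'open' ORDER BY updated_at DESC`,
      onChange);
  }

  async function refreshFromServer() {
    const response = await api.fetchTasks();   // network is just an input...
    db.transaction(() => {
      for (const task of response.tasks) {
        db.upsert('tasks', task);              // ...that updates the store
      }
    });
    // Nothing is returned to the UI: observers fire automatically when rows change.
  }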

Step 4: Design for durability with explicit write boundaries

Define what must be persisted atomically. For example, when a user creates a task offline, you may need to write:

  • The task row/document.

  • An outbox entry describing the pending operation.

  • A local index update (e.g., list ordering key).

In a relational database, group these into a single transaction. In a hybrid design, you may need a two-phase approach: write the database transaction first, then write the file, then update the database with the file path. If the file write fails, you can keep a “needs_attachment” flag and retry.
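
A sketch of that write boundary for the offline task creation above, with all three writes in one transaction (db API, newId/now, and the ordering-key helper are assumptions):

  function createTaskOffline(title) {
    const taskId = newId();
    db.transaction(() => {
      db.insert('tasks', { id: taskId, title, status: 'open', updated_at: now() });
      db.insert('outbox', {
        id: newId(), entity_type: 'task', entity_id: taskId,
        operation: 'create', payload: JSON.stringify({ title }), created_at: now(),
      });
      db.update('list_order', { list: 'inbox' }, { ordering_key: nextOrderingKey() });
    });
    // Either all three rows exist after this call, or none of them do.
  }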

Step 5: Plan indexes and query budgets

For each list screen, define a “query budget”: maximum time and maximum rows scanned. Then add indexes accordingly. Common indexes include:

  • (user_id, updated_at) for incremental sync and “recently updated” lists.

  • (status, due_date) for task filters.

  • (thread_id, created_at) for message timelines.

Be careful: too many indexes slow down writes. Favor indexes that serve multiple screens and sync operations.
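
The indexes above, written out in SQL-style statements wrapped in the assumed db API (table and index names are illustrative):

  // Each index should serve both a list screen and a sync query.
  db.exec('CREATE INDEX IF NOT EXISTS idx_items_user_updated ON items(user_id, updated_at)');
  db.exec('CREATE INDEX IF NOT EXISTS idx_tasks_status_due ON tasks(status, due_date)');
  db.exec('CREATE INDEX IF NOT EXISTS idx_messages_thread_created ON messages(thread_id, created_at)');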

Step 6: Define retention and compaction

Offline-first does not mean “store everything forever.” Define policies such as:

  • Keep only the last N days of logs/outbox attempts.

  • Archive old records into a separate table or mark them as cold.

  • Delete orphaned attachments not referenced by any row.

Schedule maintenance in background-friendly windows and avoid doing heavy cleanup on app startup.
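
A sketch of a background maintenance pass implementing those policies; the thresholds, table names, and file helpers (daysAgo, listFiles, deleteFile) are assumptions:

  function runMaintenance() {
    db.transaction(() => {
      // Drop old, already-delivered outbox attempts.
      db.exec(`DELETE FROM outbox WHERE status = 'done' AND created_at < :cutoff`,
              { cutoff: daysAgo(30) });
      // Mark old records cold instead of deleting them.
      db.exec(`UPDATE reports SET is_cold = 1 WHERE updated_at < :cutoff`,
              { cutoff: daysAgo(180) });
    });
    // Delete attachment files that no row references anymore.
    for (const file of listFiles('attachments/')) {
      if (!db.exists('attachments', { id: file.attachmentId })) deleteFile(file.path);
    }
  }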

Step 7: Validate with failure scenarios

Test your strategy against realistic failures:

  • App killed during write.

  • Disk full.

  • Corrupted attachment file.

  • Migration interrupted.

For each scenario, define what the user sees and what the system does on next launch (replay outbox, mark item as needing attention, retry cleanup).

Concrete example: choosing between SQLite and document storage for a task app

Suppose you have tasks with subtasks, tags, and comments. Screens include: “My tasks” filtered by status and due date, “Tasks by tag,” and “Task detail” showing nested content. Writes occur frequently (checking off subtasks, adding comments).

Option 1: Relational (SQLite)

Schema sketch: tasks, subtasks, tags, task_tags join table, comments, plus outbox.

Benefits: fast filtered lists with indexes; efficient pagination; transactional updates when toggling a subtask and updating a task’s completion percentage; easy to compute counts.

Costs: more tables and joins; migrations when adding fields; need to design queries carefully for nested detail screens.

Option 2: Document per task

Schema sketch: one document contains task + subtasks + tags + comments.

Benefits: task detail is a single read; schema flexibility for optional fields.

Costs: listing tasks by tag requires indexing or scanning; adding a comment rewrites the document (write amplification); conflict resolution can be harder if multiple parts of the document change independently.

Selection: if list screens and filters are core and writes are frequent, relational tends to be the safer default. If the app is mostly “open one item and edit it” with minimal cross-item queries, document storage can be simpler.

Operational tradeoffs you must design for

Write amplification and batching

Write amplification happens when a small logical change triggers large physical writes (e.g., rewriting a whole JSON blob, updating many indexes, or frequent fsync). Mitigations:

  • Batch multiple changes in one transaction.

  • Use append-only outbox entries rather than rewriting a single “pending” record.

  • Prefer partial updates (UPDATE specific columns) over rewriting entire rows/documents when possible.

Concurrency and locking

Mobile apps often have background sync, UI writes, and cleanup jobs running concurrently. With SQLite, long transactions can block readers/writers. Mitigations:

  • Keep transactions short.

  • Use WAL mode where available to improve concurrency.

  • Move heavy writes off the main thread and throttle background jobs.
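
A sketch of those mitigations; the PRAGMA is standard SQLite, while the wrapper API and the chunks helper are assumptions:

  // Enable write-ahead logging once at startup so readers are not blocked by the writer.
  db.exec('PRAGMA journal_mode = WAL');

  // Keep each sync transaction short: commit per batch instead of per full sync.
  for (const batch of chunks(incomingChanges, 200)) {
    db.transaction(() => {
      for (const change of batch) db.upsert(change.table, change.row);
    });
  }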

Migrations without breaking offline usage

Migrations should be designed so the app remains usable even if migration takes time. Practical techniques:

  • Additive changes first: add new columns/tables, write both old and new fields, then later remove old fields in a future version.

  • Background backfills: migrate large datasets incrementally rather than in one blocking step.

  • Versioned decoding: for document/JSON storage, accept multiple shapes and normalize on write.
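
A sketch of versioned decoding for document/JSON storage, normalizing old shapes on read and persisting the latest shape on the next write (field names are assumptions):

  function decodeTask(raw) {
    const doc = JSON.parse(raw);
    switch (doc.schema_version ?? 1) {
      case 1:
        // v1 stored tags as a comma-separated string; v2 uses an array.
        return { ...doc, tags: doc.tags ? doc.tags.split(',') : [], schema_version: 2 };
      case 2:
        return doc;
      default:
        throw new Error(`Unknown schema_version: ${doc.schema_version}`);
    }
  }
  // "Normalize on write": whenever the document is saved again, it is persisted as version 2.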

Encryption and performance

Encryption adds CPU overhead and can increase I/O. If you encrypt the database, measure:

  • Cold-start query latency.

  • Bulk sync write throughput.

  • Battery impact during background sync.

Sometimes a mixed approach works: encrypt sensitive tables/fields or store sensitive blobs in encrypted files, while keeping non-sensitive caches unencrypted. The tradeoff is complexity and the risk of misclassification.

Implementation blueprint: a practical hybrid persistence layout

1) Database tables/collections

  • Domain tables: normalized or lightly denormalized entities needed for offline UI.

  • Outbox: pending operations to sync (one row per intent), with fields like id, entity_type, entity_id, operation, payload, created_at, attempt_count, last_error.

  • Sync metadata: per-entity or per-collection cursors, last successful sync time, and server version markers.

  • Attachment metadata: maps entity references to local file paths and checksums.
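
A sketch of the outbox table described above, with the listed fields as columns (SQL-style, wrapped in the assumed db API):

  db.exec(`
    CREATE TABLE outbox (
      id            TEXT PRIMARY KEY,
      entity_type   TEXT NOT NULL,
      entity_id     TEXT NOT NULL,
      operation     TEXT NOT NULL,       -- e.g. 'create', 'update', 'delete', 'upload_attachment'
      payload       TEXT,                -- JSON describing the pending intent
      created_at    INTEGER NOT NULL,
      attempt_count INTEGER NOT NULL DEFAULT 0,
      last_error    TEXT
    )`);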

2) File storage layout

Use a deterministic directory structure to simplify cleanup and debugging:

  • attachments/{entity_type}/{entity_id}/{attachment_id}

  • temp/ for in-progress downloads

  • staging/ for files awaiting database linkage

Write downloads to temp, verify checksum/size, then move/rename to final path (rename is typically atomic within the same filesystem). Only after the move succeeds, update the database metadata in a transaction.

3) Step-by-step: persisting a user action safely

Example: user adds a photo to a report while offline.

  • Step 1: generate ids locally (report_id, attachment_id) and create a staging file path.

  • Step 2: write the image bytes to staging/attachment_id and fsync/close.

  • Step 3: move the file to attachments/report/report_id/attachment_id.

  • Step 4: in a single database transaction, insert attachment metadata, update the report row (e.g., has_attachments=1), and insert an outbox entry describing “upload_attachment”.

  • Step 5: UI reads from the database and shows the attachment immediately using the local path.

If the app crashes after Step 2 but before Step 4, you can detect orphaned staging files on next launch and either retry linking or delete them. If it crashes after Step 4, the database references a file that should exist; if the move failed, you can mark the attachment as missing and prompt a retry.
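
A sketch of that launch-time recovery pass, using the directory layout above; listFiles, deleteFile, and fileExists are assumed helpers:

  function recoverAttachmentsOnLaunch() {
    // Staging files with no database row: the crash happened before Step 4.
    for (const file of listFiles('staging/')) {
      if (!db.exists('attachments', { id: file.attachmentId })) deleteFile(file.path);
    }
    // Rows whose final file is missing: the move failed or the file was lost.
    for (const row of db.query(`SELECT id, path FROM attachments WHERE sync_state != 'missing'`)) {
      if (!fileExists(row.path)) {
        db.update('attachments', { id: row.id }, { sync_state: 'missing' });  // prompt a retry later
      }
    }
  }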

4) Step-by-step: optimizing list screens

Example: “Reports” list sorted by updated time with a status filter.

  • Step 1: define the query: WHERE status IN (...) ORDER BY updated_at DESC LIMIT 50.

  • Step 2: add an index on (status, updated_at DESC).

  • Step 3: store derived fields needed for the list (e.g., comment_count, attachment_count) in the report row to avoid expensive joins; update them transactionally when related rows change.

  • Step 4: paginate using updated_at + id as a stable cursor to avoid duplicates when items update.
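
A sketch of the cursor query from Step 4, using (updated_at, id) as a keyset cursor; the row-value comparison is supported by recent SQLite versions, otherwise expand it into explicit AND/OR conditions:

  // Next page: pass the last row's (updated_at, id) back as the cursor.
  const rows = db.query(
    `SELECT * FROM reports
     WHERE status IN ('open', 'in_review')
       AND (updated_at, id) < (:cursorUpdatedAt, :cursorId)
     ORDER BY updated_at DESC, id DESC
     LIMIT 50`,
    { cursorUpdatedAt, cursorId });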

Tradeoff checklist: quick selection guide

Choose a relational database when

  • You need multiple filtered lists, sorting, pagination, and aggregations offline.

  • You need strong transactional guarantees across multiple entities and an outbox.

  • You expect the dataset to grow and you want predictable performance with indexes.

Choose a document/object store when

  • Your primary access is “load one object graph by id” and you rarely need cross-entity queries.

  • Your schema changes frequently and you can tolerate runtime handling of multiple shapes.

  • You can design conflict handling around document-level updates (or your engine provides it).

Always add file storage when

  • You have large binary data or media.

  • You need streaming reads/writes and want to avoid database bloat.

Use key-value storage only for

  • Small settings and feature flags.

  • Lightweight caches with clear eviction rules.

Code-oriented sketches (technology-agnostic)

Transactional write with outbox (SQL-style pseudocode)

BEGIN TRANSACTION;

INSERT INTO tasks(id, title, status, updated_at)
VALUES(:id, :title, 'open', :now);

INSERT INTO outbox(id, entity_type, entity_id, operation, payload_json, created_at)
VALUES(:outbox_id, 'task', :id, 'create', :payload, :now);

COMMIT;

Attachment metadata + file move (pseudocode)

// 1) Write to a temp path
writeFile(tempPath, bytes);

// 2) Verify checksum
assert sha256(tempPath) == expected;

// 3) Move into the final location
moveFile(tempPath, finalPath);

// 4) Commit metadata + outbox atomically
db.transaction(() => {
  db.insert('attachments', { id: attachmentId, entityId, path: finalPath, checksum });
  db.insert('outbox', { operation: 'upload_attachment', entityId, attachmentId });
});

These sketches highlight the core idea: make the database the coordinator for durable intent (outbox) and metadata, while the file system holds large bytes. The exact APIs differ by platform, but the tradeoffs and failure handling remain the same.

Now answer the exercise about the content:

When an offline-first app needs attachments like images or PDFs, which persistence approach best matches the recommended tradeoffs?


The suggested pattern is a hybrid: large blobs live in the file system for efficient streaming and to avoid database bloat, while the database stores metadata (path, checksum, sync state) and an outbox entry to preserve durable sync intent.

Next chapter

Caching Layers, Read Policies, and Freshness Guarantees
