OLTP vs OLAP: Why Operational Databases and Analytics Systems Differ

Chapter 3

Estimated reading time: 8 minutes

Two Different Jobs: Running the Business vs Understanding the Business

Operational systems and analytics systems solve different problems. An operational database exists to record business events reliably (a sale, a payment, a shipment). An analytics system exists to analyze those events across time, customers, products, and other perspectives.

Aspect            | OLTP (Operational)                   | OLAP (Analytics)
Primary workload  | Many small writes + point reads      | Fewer, large scans + aggregations
Typical query     | “Update order #123 status”           | “Revenue trend by region for 24 months”
Data model        | Normalized (many related tables)     | Denormalized (wide tables, pre-joins)
Concurrency       | High: many users updating at once    | Moderate: fewer users, heavier queries
Storage           | Row-oriented common                  | Columnar common
Performance goal  | Low-latency transactions, integrity  | Fast reads, fast aggregates, history

OLTP Systems: Transaction-Focused by Design

What OLTP optimizes for

  • Fast writes: inserting a new order line, updating inventory, recording a payment.
  • Data integrity: constraints, foreign keys, and transactional guarantees help prevent inconsistent states (e.g., an order line referencing a non-existent order).
  • High concurrency: many users and services touching the same data at the same time (cashiers, web checkout, warehouse scanners).
  • Normalization: data is split into multiple tables to reduce duplication and avoid update anomalies.

How normalization shows up in practice

A simplified order flow in an OLTP database might look like this:

customers(customer_id, name, ...)
orders(order_id, customer_id, order_ts, status, ...)
order_lines(order_id, line_no, product_id, qty, unit_price, ...)
products(product_id, sku, name, category_id, ...)
payments(payment_id, order_id, amount, payment_ts, method, ...)

This structure is excellent for day-to-day operations: you can update a single order, add a line, or record a payment without rewriting large records. It also supports strict constraints and transactional updates.
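For example, adding a line and recording its payment can run as one small transaction against this schema (the values are illustrative):

-- Add a line to order 123 and record its payment atomically;
-- the foreign keys keep both rows tied to an existing order.
BEGIN;
INSERT INTO order_lines (order_id, line_no, product_id, qty, unit_price)
VALUES (123, 3, 987, 2, 19.90);
INSERT INTO payments (payment_id, order_id, amount, payment_ts, method)
VALUES (55001, 123, 39.80, CURRENT_TIMESTAMP, 'card');
UPDATE orders SET status = 'paid' WHERE order_id = 123;
COMMIT;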

Mapping Common Source Systems to OLTP Characteristics

POS (Point of Sale)

Typical OLTP traits:

  • Write-heavy bursts during peak hours: each basket produces multiple inserts (sale header, sale lines, discounts, taxes, payment records).
  • Low-latency requirement: a cashier cannot wait seconds for a commit.
  • Concurrency: many terminals writing simultaneously.
  • Strict correctness: inventory decrements and payment capture must be consistent.

Example operational query patterns:

  • Insert a sale and its lines
  • Update inventory for a SKU
  • Fetch today’s price for a barcode
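In SQL terms, these patterns might look roughly like the sketch below (the sales, sale_lines, inventory, and prices tables and their columns are hypothetical):

-- Insert a sale and its lines
INSERT INTO sales (sale_id, store_id, terminal_id, sale_ts)
VALUES (9001, 12, 3, CURRENT_TIMESTAMP);
INSERT INTO sale_lines (sale_id, line_no, product_id, qty, unit_price)
VALUES (9001, 1, 555, 1, 4.50);

-- Update inventory for a SKU
UPDATE inventory
SET on_hand_qty = on_hand_qty - 1
WHERE store_id = 12 AND product_id = 555;

-- Fetch today’s price for a barcode
SELECT unit_price
FROM prices
WHERE barcode = '7891234567890'
  AND CURRENT_DATE BETWEEN valid_from AND valid_to;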

CRM (Customer Relationship Management)

Typical OLTP traits:

  • Frequent updates: leads change status, contacts get updated, activities are logged.
  • Many small reads: “open opportunities for account X”, “last 10 activities”.
  • Normalized entities: accounts, contacts, opportunities, activities, owners, territories.
  • Concurrency: sales teams and automation workflows updating the same records.

CRM data is operationally “alive”: records are edited, overwritten, merged, and reassigned. That is great for operations but creates challenges for analytics that need stable historical views.

ERP (Enterprise Resource Planning)

Typical OLTP traits:

  • Complex transactional workflows: procure-to-pay, order-to-cash, manufacturing, accounting postings.
  • Strong integrity constraints: financial postings and inventory movements must reconcile.
  • Normalized, highly relational schemas: many tables and reference data relationships.
  • Mixed workload: still transaction-oriented, but often with some reporting features that can tempt teams to run analytics directly on the ERP database.

OLAP Systems: Analysis-Focused by Design

What OLAP optimizes for

  • Fast reads over large volumes: scanning millions/billions of rows.
  • Aggregations: sums, counts, averages across many dimensions and long time ranges.
  • Denormalization: fewer joins at query time; data is shaped for analysis.
  • Columnar storage (common): reads only the columns needed, compresses well, accelerates aggregates.
  • Historical analysis: keeping past states, not just the current value.

Why denormalization helps analytics

Analytics queries often need many attributes at once (product category, store region, customer segment, promotion flag) and then aggregate measures across time. Joining many normalized OLTP tables repeatedly is expensive and unpredictable under load. OLAP systems commonly store data in fewer, wider structures so queries can scan and aggregate efficiently.

Example of an analytics-friendly wide table (illustrative):

sales_analytics(
  sale_date,
  store_id, store_region,
  product_id, product_category,
  customer_id, customer_segment,
  qty, net_amount, discount_amount,
  payment_method,
  promotion_id
)
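A hedged sketch of how a wide table like this might be populated from the normalized schema shown earlier; the stores and categories tables, and columns such as store_id on orders, segment, discount_amount, payment_method, and promotion_id, are assumptions added for illustration:

-- Pre-join normalized OLTP tables into one scan-friendly row per order line.
INSERT INTO sales_analytics
SELECT
  CAST(o.order_ts AS DATE)    AS sale_date,
  o.store_id,
  st.region                   AS store_region,
  ol.product_id,
  cat.name                    AS product_category,
  o.customer_id,
  c.segment                   AS customer_segment,
  ol.qty,
  ol.qty * ol.unit_price      AS net_amount,
  ol.discount_amount,
  o.payment_method,
  ol.promotion_id
FROM order_lines ol
JOIN orders     o   ON o.order_id      = ol.order_id
JOIN customers  c   ON c.customer_id   = o.customer_id
JOIN products   p   ON p.product_id    = ol.product_id
JOIN categories cat ON cat.category_id = p.category_id
JOIN stores     st  ON st.store_id     = o.store_id;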

Columnar storage intuition (why it matters)

In a row-oriented layout, reading a few columns across many rows still pulls entire rows from disk/memory. In a columnar layout, a query like “sum net_amount by month” can read primarily sale_date and net_amount, skipping many other columns. This is especially effective with compression and vectorized execution.

How Analytics Requirements Drive OLAP Design

Requirement 1: Historical snapshots (the “what did we know then?” problem)

Operational systems typically store the current state. Analytics often needs the state as of a point in time.

  • CRM example: An opportunity’s amount or stage changes. If you only keep the latest value, you cannot accurately answer: “What was the pipeline forecast at the end of last month?”
  • ERP example: A supplier’s terms or a product’s cost changes. Finance may need to analyze margins using the cost that was valid at the time of sale.
  • POS example: Product category assignments change. Trend analysis by category becomes inconsistent if history is not preserved.

Practical implication: OLAP designs commonly include snapshotting or history tracking so reports can be reproduced and audited.

Step-by-step: implementing a basic daily snapshot pattern

  1. Choose the snapshot grain: e.g., daily snapshot of inventory by store and SKU, or daily snapshot of open opportunities by sales rep.
  2. Pick the snapshot time: e.g., 23:59 local time, or a consistent UTC cutoff.
  3. Extract the current state from the OLTP source (or from a staging area).
  4. Write into a snapshot table with a snapshot_date column.
  5. Ensure idempotency: if the job reruns, it should overwrite that day’s snapshot or use a deterministic merge.

Example snapshot table shape:

inventory_snapshot(
  snapshot_date,
  store_id,
  product_id,
  on_hand_qty,
  on_order_qty,
  cost,
  PRIMARY KEY (snapshot_date, store_id, product_id)
)
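A minimal sketch of the idempotent daily load for step 5, assuming the current state sits in a staging table called inventory_current:

-- Rerunning the job for the same day first clears that day's rows, then reloads them,
-- so the result is deterministic no matter how many times it runs.
DELETE FROM inventory_snapshot
WHERE snapshot_date = DATE '2025-06-30';

INSERT INTO inventory_snapshot
  (snapshot_date, store_id, product_id, on_hand_qty, on_order_qty, cost)
SELECT DATE '2025-06-30', store_id, product_id, on_hand_qty, on_order_qty, cost
FROM inventory_current;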

Requirement 2: Trend analysis across long time ranges

Trend analysis typically means scanning large time windows and grouping by multiple attributes. OLAP systems are built to handle queries like:

-- month is derived from sale_date; exact date-truncation syntax varies by engine
SELECT DATE_TRUNC('month', sale_date) AS month,
       store_region,
       product_category,
       SUM(net_amount) AS net_revenue
FROM sales_analytics
WHERE sale_date BETWEEN '2024-01-01' AND '2025-12-31'
GROUP BY DATE_TRUNC('month', sale_date), store_region, product_category;

In OLTP, this kind of query often requires multiple joins and touches many rows, competing with transactional workloads.
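For contrast, a rough sketch of the equivalent question against the normalized OLTP schema (again assuming stores and categories tables and a store_id on orders) shows how join-heavy it becomes:

-- Same trend question over normalized tables: four extra joins plus a two-year scan,
-- all competing with live transactional traffic.
SELECT DATE_TRUNC('month', o.order_ts) AS month,
       st.region                       AS store_region,
       cat.name                        AS product_category,
       SUM(ol.qty * ol.unit_price)     AS net_revenue
FROM order_lines ol
JOIN orders     o   ON o.order_id      = ol.order_id
JOIN products   p   ON p.product_id    = ol.product_id
JOIN categories cat ON cat.category_id = p.category_id
JOIN stores     st  ON st.store_id     = o.store_id
WHERE o.order_ts >= DATE '2024-01-01'
  AND o.order_ts <  DATE '2026-01-01'
GROUP BY DATE_TRUNC('month', o.order_ts), st.region, cat.name;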

Requirement 3: Wide tables and “bring many attributes” queries

BI users frequently ask questions that require many descriptive fields at once: “Show revenue by region, store format, product category, brand, customer segment, and promotion type.”

OLTP impact: those attributes are usually spread across many normalized tables, so the query becomes join-heavy and can be slow and lock-prone under concurrency.

OLAP response: pre-join or denormalize into analytics-ready structures so the query is mostly scanning and aggregating.

Practical Implications You’ll See in Real Systems

Why BI queries can slow down OLTP

  • Resource contention: large scans consume CPU, memory, and I/O that the transactional workload needs.
  • Locking and blocking: depending on isolation levels and database behavior, long-running reads can block writes or create pressure on versioning/undo mechanisms.
  • Join amplification: normalized schemas require many joins; the optimizer may choose plans that are expensive and unpredictable as data grows.
  • Index trade-offs: OLTP indexes are tuned for point lookups and writes; adding analytics-friendly indexes can slow inserts/updates.

Example scenario: A POS database is handling checkout traffic. A BI analyst runs a query that scans all sales lines for the last year to compute category trends. The scan saturates I/O, and checkout transactions start timing out. Even if the query is “read-only,” it can still degrade operational performance.

Why snapshots matter (beyond “history is nice to have”)

  • Reproducibility: finance and operations need to reproduce last month’s numbers even if master data changed.
  • Consistency across reports: if each report queries “current state,” two reports run a day apart can disagree about the past.
  • Correct trend lines: without snapshots, you may unknowingly trend today’s classifications backward in time (e.g., a product moved categories).

Data latency and architecture choices (real-time vs hourly vs daily)

Latency is the acceptable delay between an operational event and its availability for analysis. It directly affects system design and cost.

Real-time / near real-time (seconds–minutes)
  • Typical use cases: fraud detection, live operational dashboards, inventory alerts
  • Common architectural approach: change data capture (CDC) or streaming into an analytics store; incremental updates
  • Trade-offs: higher complexity; careful handling of late/out-of-order events; more operational overhead

Hourly
  • Typical use cases: sales performance monitoring, call center KPIs
  • Common architectural approach: micro-batch ingestion; incremental loads; partitioned tables
  • Trade-offs: balance of freshness and simplicity; still needs robust incremental logic

Daily
  • Typical use cases: financial reporting, executive summaries, long-term trends
  • Common architectural approach: nightly batch ETL/ELT; daily snapshots
  • Trade-offs: simple and stable; less timely for operational decisions

Step-by-step: choosing a latency target for a BI dataset

  1. List decisions the dataset supports: “Do we need to react within minutes, or is next-day fine?”
  2. Identify operational risk: will analytics queries ever hit OLTP directly? If yes, prioritize isolating workloads.
  3. Estimate data volume and change rate: high-frequency changes push you toward incremental/CDC patterns.
  4. Define freshness SLAs: e.g., “Data is complete up to the last full hour.”
  5. Design ingestion accordingly: streaming/CDC for near real-time, micro-batch for hourly, batch + snapshots for daily.
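As a rough illustration of step 5 for an hourly target, an incremental micro-batch might load only rows changed since the last successful run; the staging_sales table, updated_at column, and the bind parameters are assumptions an orchestrator would supply:

-- Hourly micro-batch: pick up only rows that changed since the previous watermark,
-- up to a fixed cutoff so reruns of the same window are deterministic.
INSERT INTO sales_analytics
  (sale_date, store_id, store_region, product_id, product_category,
   customer_id, customer_segment, qty, net_amount, discount_amount,
   payment_method, promotion_id)
SELECT sale_date, store_id, store_region, product_id, product_category,
       customer_id, customer_segment, qty, net_amount, discount_amount,
       payment_method, promotion_id
FROM staging_sales
WHERE updated_at > :last_loaded_ts
  AND updated_at <= :cutoff_ts;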

Putting It Together: From OLTP Sources to OLAP-Ready Data

Operational sources like POS, CRM, and ERP are optimized for correctness and transaction throughput. Analytics requirements—history, trends, and wide attribute-rich queries—push you toward an OLAP design that favors read performance, denormalization, columnar storage, and precomputed aggregates or partitions.

A common practical pattern is:

  • OLTP systems handle transactions and remain protected from heavy analytical scans.
  • Ingestion layer (batch, micro-batch, or CDC) copies and reshapes data.
  • OLAP store serves BI queries with structures designed for scanning, grouping, and historical analysis.

Now answer the exercise about the content:

Why do analytics systems often use denormalized, wide tables instead of the highly normalized schemas common in operational databases?

Answer: Analytics queries often need many attributes at once and then aggregate over large time windows. Denormalized wide tables reduce join-heavy plans and make scans and aggregations faster in OLAP-oriented designs.

Next chapter

From Raw Data to BI-Ready Data: Warehouses, Marts, and the Semantic Layer
