OLTP vs OLAP: Why Operational Databases and Analytics Systems Differ

Chapter 3

Estimated reading time: 8 minutes

Two Different Jobs: Running the Business vs Understanding the Business

Operational systems and analytics systems solve different problems. An operational database exists to record business events reliably (a sale, a payment, a shipment). An analytics system exists to analyze those events across time, customers, products, and other perspectives.

Aspect            | OLTP (Operational)                   | OLAP (Analytics)
Primary workload  | Many small writes + point reads      | Fewer, large scans + aggregations
Typical query     | “Update order #123 status”           | “Revenue trend by region for 24 months”
Data model        | Normalized (many related tables)     | Denormalized (wide tables, pre-joins)
Concurrency       | High: many users updating at once    | Moderate: fewer users, heavier queries
Storage           | Row-oriented common                  | Columnar common
Performance goal  | Low-latency transactions, integrity  | Fast reads, fast aggregates, history

OLTP Systems: Transaction-Focused by Design

What OLTP optimizes for

  • Fast writes: inserting a new order line, updating inventory, recording a payment.
  • Data integrity: constraints, foreign keys, and transactional guarantees help prevent inconsistent states (e.g., an order line referencing a non-existent order).
  • High concurrency: many users and services touching the same data at the same time (cashiers, web checkout, warehouse scanners).
  • Normalization: data is split into multiple tables to reduce duplication and avoid update anomalies.

How normalization shows up in practice

A simplified order flow in an OLTP database might look like this:

customers(customer_id, name, ...)
orders(order_id, customer_id, order_ts, status, ...)
order_lines(order_id, line_no, product_id, qty, unit_price, ...)
products(product_id, sku, name, category_id, ...)
payments(payment_id, order_id, amount, payment_ts, method, ...)

This structure is excellent for day-to-day operations: you can update a single order, add a line, or record a payment without rewriting large records. It also supports strict constraints and transactional updates.
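For example, adding a line and recording its payment can run as one small transaction against this schema (the values are illustrative):

-- Add a line to order 123 and record its payment atomically;
-- the foreign keys keep both rows tied to an existing order.
BEGIN;
INSERT INTO order_lines (order_id, line_no, product_id, qty, unit_price)
VALUES (123, 3, 987, 2, 19.90);
INSERT INTO payments (payment_id, order_id, amount, payment_ts, method)
VALUES (55001, 123, 39.80, CURRENT_TIMESTAMP, 'card');
UPDATE orders SET status = 'paid' WHERE order_id = 123;
COMMIT;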

Mapping Common Source Systems to OLTP Characteristics

POS (Point of Sale)

Typical OLTP traits:

  • Write-heavy bursts during peak hours: each basket produces multiple inserts (sale header, sale lines, discounts, taxes, payment records).
  • Low-latency requirement: a cashier cannot wait seconds for a commit.
  • Concurrency: many terminals writing simultaneously.
  • Strict correctness: inventory decrements and payment capture must be consistent.

Example operational query patterns:

  • Insert a sale and its lines
  • Update inventory for a SKU
  • Fetch today’s price for a barcode
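In SQL terms, these patterns might look roughly like the sketch below (the sales, sale_lines, inventory, and prices tables and their columns are hypothetical):

-- Insert a sale and its lines
INSERT INTO sales (sale_id, store_id, terminal_id, sale_ts)
VALUES (9001, 12, 3, CURRENT_TIMESTAMP);
INSERT INTO sale_lines (sale_id, line_no, product_id, qty, unit_price)
VALUES (9001, 1, 555, 1, 4.50);

-- Update inventory for a SKU
UPDATE inventory
SET on_hand_qty = on_hand_qty - 1
WHERE store_id = 12 AND product_id = 555;

-- Fetch today’s price for a barcode
SELECT unit_price
FROM prices
WHERE barcode = '7891234567890'
  AND CURRENT_DATE BETWEEN valid_from AND valid_to;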

CRM (Customer Relationship Management)

Typical OLTP traits:

  • Frequent updates: leads change status, contacts get updated, activities are logged.
  • Many small reads: “open opportunities for account X”, “last 10 activities”.
  • Normalized entities: accounts, contacts, opportunities, activities, owners, territories.
  • Concurrency: sales teams and automation workflows updating the same records.

CRM data is operationally “alive”: records are edited, overwritten, merged, and reassigned. That is great for operations but creates challenges for analytics that need stable historical views.

ERP (Enterprise Resource Planning)

Typical OLTP traits:

  • Complex transactional workflows: procure-to-pay, order-to-cash, manufacturing, accounting postings.
  • Strong integrity constraints: financial postings and inventory movements must reconcile.
  • Normalized, highly relational schemas: many tables and reference data relationships.
  • Mixed workload: still transaction-oriented, but often with some reporting features that can tempt teams to run analytics directly on the ERP database.

OLAP Systems: Analysis-Focused by Design

What OLAP optimizes for

  • Fast reads over large volumes: scanning millions/billions of rows.
  • Aggregations: sums, counts, averages across many dimensions and long time ranges.
  • Denormalization: fewer joins at query time; data is shaped for analysis.
  • Columnar storage (common): reads only the columns needed, compresses well, accelerates aggregates.
  • Historical analysis: keeping past states, not just the current value.

Why denormalization helps analytics

Analytics queries often need many attributes at once (product category, store region, customer segment, promotion flag) and then aggregate measures across time. Joining many normalized OLTP tables repeatedly is expensive and unpredictable under load. OLAP systems commonly store data in fewer, wider structures so queries can scan and aggregate efficiently.

Example of an analytics-friendly wide table (illustrative):

sales_analytics(
  sale_date,
  store_id, store_region,
  product_id, product_category,
  customer_id, customer_segment,
  qty, net_amount, discount_amount,
  payment_method,
  promotion_id
)
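A hedged sketch of how a wide table like this might be populated from the normalized schema shown earlier; the stores and categories tables, and columns such as store_id on orders, segment, discount_amount, payment_method, and promotion_id, are assumptions added for illustration:

-- Pre-join normalized OLTP tables into one scan-friendly row per order line.
INSERT INTO sales_analytics
SELECT
  CAST(o.order_ts AS DATE)    AS sale_date,
  o.store_id,
  st.region                   AS store_region,
  ol.product_id,
  cat.name                    AS product_category,
  o.customer_id,
  c.segment                   AS customer_segment,
  ol.qty,
  ol.qty * ol.unit_price      AS net_amount,
  ol.discount_amount,
  o.payment_method,
  ol.promotion_id
FROM order_lines ol
JOIN orders     o   ON o.order_id      = ol.order_id
JOIN customers  c   ON c.customer_id   = o.customer_id
JOIN products   p   ON p.product_id    = ol.product_id
JOIN categories cat ON cat.category_id = p.category_id
JOIN stores     st  ON st.store_id     = o.store_id;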

Columnar storage intuition (why it matters)

In a row-oriented layout, reading a few columns across many rows still pulls entire rows from disk/memory. In a columnar layout, a query like “sum net_amount by month” can read primarily sale_date and net_amount, skipping many other columns. This is especially effective with compression and vectorized execution.

How Analytics Requirements Drive OLAP Design

Requirement 1: Historical snapshots (the “what did we know then?” problem)

Operational systems typically store the current state. Analytics often needs the state as of a point in time.

  • CRM example: An opportunity’s amount or stage changes. If you only keep the latest value, you cannot accurately answer: “What was the pipeline forecast at the end of last month?”
  • ERP example: A supplier’s terms or a product’s cost changes. Finance may need to analyze margins using the cost that was valid at the time of sale.
  • POS example: Product category assignments change. Trend analysis by category becomes inconsistent if history is not preserved.

Practical implication: OLAP designs commonly include snapshotting or history tracking so reports can be reproduced and audited.

Step-by-step: implementing a basic daily snapshot pattern

  1. Choose the snapshot grain: e.g., daily snapshot of inventory by store and SKU, or daily snapshot of open opportunities by sales rep.
  2. Pick the snapshot time: e.g., 23:59 local time, or a consistent UTC cutoff.
  3. Extract the current state from the OLTP source (or from a staging area).
  4. Write into a snapshot table with a snapshot_date column.
  5. Ensure idempotency: if the job reruns, it should overwrite that day’s snapshot or use a deterministic merge.

Example snapshot table shape:

inventory_snapshot(
  snapshot_date,
  store_id,
  product_id,
  on_hand_qty,
  on_order_qty,
  cost,
  PRIMARY KEY (snapshot_date, store_id, product_id)
)
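A minimal sketch of the idempotent daily load for step 5, assuming the current state sits in a staging table called inventory_current:

-- Rerunning the job for the same day first clears that day's rows, then reloads them,
-- so the result is deterministic no matter how many times it runs.
DELETE FROM inventory_snapshot
WHERE snapshot_date = DATE '2025-06-30';

INSERT INTO inventory_snapshot
  (snapshot_date, store_id, product_id, on_hand_qty, on_order_qty, cost)
SELECT DATE '2025-06-30', store_id, product_id, on_hand_qty, on_order_qty, cost
FROM inventory_current;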

Requirement 2: Trend analysis across long time ranges

Trend analysis typically means scanning large time windows and grouping by multiple attributes. OLAP systems are built to handle queries like:

-- month is derived from sale_date; exact date-truncation syntax varies by engine
SELECT DATE_TRUNC('month', sale_date) AS month,
       store_region,
       product_category,
       SUM(net_amount) AS net_revenue
FROM sales_analytics
WHERE sale_date BETWEEN '2024-01-01' AND '2025-12-31'
GROUP BY DATE_TRUNC('month', sale_date), store_region, product_category;

In OLTP, this kind of query often requires multiple joins and touches many rows, competing with transactional workloads.
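For contrast, a rough sketch of the equivalent question against the normalized OLTP schema (again assuming stores and categories tables and a store_id on orders) shows how join-heavy it becomes:

-- Same trend question over normalized tables: four extra joins plus a two-year scan,
-- all competing with live transactional traffic.
SELECT DATE_TRUNC('month', o.order_ts) AS month,
       st.region                       AS store_region,
       cat.name                        AS product_category,
       SUM(ol.qty * ol.unit_price)     AS net_revenue
FROM order_lines ol
JOIN orders     o   ON o.order_id      = ol.order_id
JOIN products   p   ON p.product_id    = ol.product_id
JOIN categories cat ON cat.category_id = p.category_id
JOIN stores     st  ON st.store_id     = o.store_id
WHERE o.order_ts >= DATE '2024-01-01'
  AND o.order_ts <  DATE '2026-01-01'
GROUP BY DATE_TRUNC('month', o.order_ts), st.region, cat.name;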

Requirement 3: Wide tables and “bring many attributes” queries

BI users frequently ask questions that require many descriptive fields at once: “Show revenue by region, store format, product category, brand, customer segment, and promotion type.”

OLTP impact: those attributes are usually spread across many normalized tables, so the query becomes join-heavy and can be slow and lock-prone under concurrency.

OLAP response: pre-join or denormalize into analytics-ready structures so the query is mostly scanning and aggregating.

Practical Implications You’ll See in Real Systems

Why BI queries can slow down OLTP

  • Resource contention: large scans consume CPU, memory, and I/O that the transactional workload needs.
  • Locking and blocking: depending on isolation levels and database behavior, long-running reads can block writes or create pressure on versioning/undo mechanisms.
  • Join amplification: normalized schemas require many joins; the optimizer may choose plans that are expensive and unpredictable as data grows.
  • Index trade-offs: OLTP indexes are tuned for point lookups and writes; adding analytics-friendly indexes can slow inserts/updates.

Example scenario: A POS database is handling checkout traffic. A BI analyst runs a query that scans all sales lines for the last year to compute category trends. The scan saturates I/O, and checkout transactions start timing out. Even if the query is “read-only,” it can still degrade operational performance.

Why snapshots matter (beyond “history is nice to have”)

  • Reproducibility: finance and operations need to reproduce last month’s numbers even if master data changed.
  • Consistency across reports: if each report queries “current state,” two reports run a day apart can disagree about the past.
  • Correct trend lines: without snapshots, you may unknowingly trend today’s classifications backward in time (e.g., a product moved categories).

Data latency and architecture choices (real-time vs hourly vs daily)

Latency is the acceptable delay between an operational event and its availability for analysis. It directly affects system design and cost.

Real-time / near real-time (seconds–minutes)
  • Typical use cases: fraud detection, live operational dashboards, inventory alerts
  • Common architectural approach: change data capture (CDC) or streaming into an analytics store; incremental updates
  • Trade-offs: higher complexity; careful handling of late/out-of-order events; more operational overhead

Hourly
  • Typical use cases: sales performance monitoring, call center KPIs
  • Common architectural approach: micro-batch ingestion; incremental loads; partitioned tables
  • Trade-offs: balance of freshness and simplicity; still needs robust incremental logic

Daily
  • Typical use cases: financial reporting, executive summaries, long-term trends
  • Common architectural approach: nightly batch ETL/ELT; daily snapshots
  • Trade-offs: simple and stable; less timely for operational decisions

Step-by-step: choosing a latency target for a BI dataset

  1. List decisions the dataset supports: “Do we need to react within minutes, or is next-day fine?”
  2. Identify operational risk: will analytics queries ever hit OLTP directly? If yes, prioritize isolating workloads.
  3. Estimate data volume and change rate: high-frequency changes push you toward incremental/CDC patterns.
  4. Define freshness SLAs: e.g., “Data is complete up to the last full hour.”
  5. Design ingestion accordingly: streaming/CDC for near real-time, micro-batch for hourly, batch + snapshots for daily.
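As a rough illustration of step 5 for an hourly target, an incremental micro-batch might load only rows changed since the last successful run; the staging_sales table, updated_at column, and the bind parameters are assumptions an orchestrator would supply:

-- Hourly micro-batch: pick up only rows that changed since the previous watermark,
-- up to a fixed cutoff so reruns of the same window are deterministic.
INSERT INTO sales_analytics
  (sale_date, store_id, store_region, product_id, product_category,
   customer_id, customer_segment, qty, net_amount, discount_amount,
   payment_method, promotion_id)
SELECT sale_date, store_id, store_region, product_id, product_category,
       customer_id, customer_segment, qty, net_amount, discount_amount,
       payment_method, promotion_id
FROM staging_sales
WHERE updated_at > :last_loaded_ts
  AND updated_at <= :cutoff_ts;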

Putting It Together: From OLTP Sources to OLAP-Ready Data

Operational sources like POS, CRM, and ERP are optimized for correctness and transaction throughput. Analytics requirements—history, trends, and wide attribute-rich queries—push you toward an OLAP design that favors read performance, denormalization, columnar storage, and precomputed aggregates or partitions.

A common practical pattern is:

  • OLTP systems handle transactions and remain protected from heavy analytical scans.
  • Ingestion layer (batch, micro-batch, or CDC) copies and reshapes data.
  • OLAP store serves BI queries with structures designed for scanning, grouping, and historical analysis.

Now answer the exercise about the content:

Why do analytics systems often use denormalized, wide tables instead of the highly normalized schemas common in operational databases?

Answer: Analytics queries often need many attributes at once and then aggregate over large time windows. Denormalized wide tables reduce join-heavy plans and make scans and aggregations faster in OLAP-oriented designs.

Next chapter

From Raw Data to BI-Ready Data: Warehouses, Marts, and the Semantic Layer
