1) Key logistics data categories
Logistics teams make decisions from data, but not all data is the same. To specify requirements clearly (for integrations, dashboards, alerts, and automation), separate your data into three categories: master data, transactional data, and reference data. Each category has different owners, update frequency, and quality risks.
Master data (relatively stable “nouns”)
Master data describes the entities you plan and execute with. It changes, but not per event. Master data errors typically create repeated operational failures (wrong picks, wrong routing, wrong billing).
- Items / SKUs: SKU code, description, dimensions, weight, hazardous flag, temperature range, units of measure (UoM), case pack, barcode(s), shelf-life rules, handling constraints (stackable, fragile).
- Locations: site codes, warehouse zones, bin locations, dock doors, geo-coordinates, address validation fields, operating hours, cut-off times.
- Carriers: carrier ID, service offerings, SCAC (where applicable), account numbers, EDI/API endpoints, insurance limits, accessorial rules.
- Customers / Ship-to: ship-to IDs, addresses, delivery windows, appointment rules, special instructions, compliance requirements (labels, ASN rules), contact details.
Transactional data (event “verbs”)
Transactional data records what happened (or is planned to happen) in operations. It is high-volume and time-stamped. Transactional data is what most dashboards and alerts are built on.
- Receipts: purchase order receipt, ASN receipt, inbound appointment, receiving exceptions (over/short/damaged), putaway tasks.
- Picks: pick waves/batches, pick tasks, pick confirmations, substitutions, short picks, cycle count adjustments.
- Shipments: shipment creation, load building, packing, label generation, manifesting, tendering, departure, proof of delivery, claims.
Reference data (shared “rules and codes”)
Reference data standardizes meaning across systems and teams. It is often overlooked, but it prevents “apples vs oranges” reporting.
- Calendars: working days, holidays, peak season overrides, cut-off calendars by site/carrier/service.
- Service levels: promised ship-by/deliver-by definitions, SLA tiers, on-time measurement rules (e.g., “on-time if delivered within window”).
- Status codes: standardized order/shipment statuses and exception codes (e.g., “Picked”, “Packed”, “Loaded”, “Departed”, “Delivered”, “Damaged”).
- Units and conversions: UoM codes, conversion factors, dimensional weight rules.
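UoM conversion factors are a classic piece of reference data. A minimal sketch, using an illustrative factor table (the SKU ID and factors are made-up examples, not real master data):

```python
# Conversion factors keyed by (sku, from_uom, to_uom) -- illustrative values only.
CONVERSIONS = {
    ("SKU-100", "CASE", "EACH"): 12,
    ("SKU-100", "PALLET", "CASE"): 40,
}

def to_each(sku: str, qty: float, uom: str) -> float:
    """Convert a quantity to the base UoM (EACH) via the factor table."""
    if uom == "EACH":
        return qty
    if uom == "CASE":
        return qty * CONVERSIONS[(sku, "CASE", "EACH")]
    if uom == "PALLET":
        # Pallets convert to cases first, then cases to eaches.
        return to_each(sku, qty * CONVERSIONS[(sku, "PALLET", "CASE")], "CASE")
    raise ValueError(f"No conversion path for UoM {uom!r}")

print(to_each("SKU-100", 2, "CASE"))    # 24
print(to_each("SKU-100", 1, "PALLET"))  # 480
```

Keeping these factors in one shared table, rather than hard-coded in each system, is exactly what prevents the ERP-vs-WMS quantity mismatches discussed later.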
2) Data granularity: order line vs shipment vs handling unit
Granularity means the level at which data is recorded and reported. In logistics, the same physical flow can be described at multiple levels. Choosing the wrong level leads to misleading KPIs and broken integrations.
Common granularity levels
- Order header: one customer order (may contain many lines and multiple shipments).
- Order line: one SKU line on an order (SKU + quantity + UoM). Often the best level for fill rate and backorder analysis.
- Pick task: a work instruction (e.g., pick X units from location A). Best for labor productivity and process bottlenecks.
- Handling unit (HU): a physical unit like a carton, tote, pallet, or container with an ID (SSCC, license plate). Best for traceability and warehouse visibility.
- Shipment: a transport movement (may include multiple orders/handling units). Best for transportation cost and on-time delivery.
- Stop: a pickup/delivery location within a route. Best for multi-stop performance and dwell time.
Why granularity matters (practical examples)
- Fill rate: If you measure fill rate at order header, one missing line makes the whole order “not filled,” which can overstate problems. Measuring at order line shows which SKUs drive shortages.
- On-time performance: If you measure on-time at shipment, a late partial shipment can hide that the first shipment met the promise. If you measure at order line, you can align to the customer promise per line (though this requires more data).
- Traceability: If you only track at shipment level, you cannot reliably answer “Which pallet/carton contained lot X?” Handling unit granularity enables recalls and exception handling.
- Cost allocation: Freight cost is naturally at shipment level; allocating it to order line requires rules (by weight, cube, value, or units). Without explicit allocation logic, dashboards will disagree.
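The fill rate example above can be made concrete. A minimal sketch with made-up order data, showing how the same shipments produce different numbers at header vs line granularity:

```python
# Illustrative orders: each line is (sku, qty_ordered, qty_shipped).
orders = {
    "SO-1": [("SKU-A", 10, 10), ("SKU-B", 5, 0)],  # one line shorted
    "SO-2": [("SKU-A", 4, 4)],                     # shipped complete
}

def header_fill_rate(orders):
    """An order counts as filled only if every line shipped in full."""
    filled = sum(all(shp >= qty for _, qty, shp in lines)
                 for lines in orders.values())
    return filled / len(orders)

def line_fill_rate(orders):
    """Share of order lines shipped in full."""
    lines = [l for ls in orders.values() for l in ls]
    return sum(shp >= qty for _, qty, shp in lines) / len(lines)

print(header_fill_rate(orders))  # 0.5 -- one missing line fails the whole order
print(line_fill_rate(orders))    # 2 of 3 lines filled
```

Header-level reporting shows 50% fill; line-level reporting shows two of three lines filled and points directly at SKU-B as the shortage driver.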
Step-by-step: choose the right granularity for a KPI
- Write the decision the KPI supports (e.g., “Which SKUs cause late orders?”).
- Identify the action owner (inventory planner, warehouse supervisor, carrier manager).
- Pick the lowest level needed to act (SKU line for SKU issues; shipment for carrier issues; HU for warehouse traceability).
- Define roll-up rules (how lines roll up to orders; how cartons roll up to shipments; how partials are treated).
- Confirm data availability (do you have HU IDs? timestamps? status codes?).
- Document the definition in a shared metric dictionary (name, formula, granularity, filters, exceptions).
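The steps above can be captured as a structured entry in the shared metric dictionary. A hypothetical entry (field names are an assumption, not a standard schema):

```python
# Hypothetical metric dictionary entry covering the six steps above.
metric = {
    "name": "line_fill_rate",
    "decision": "Which SKUs cause late orders?",
    "owner": "inventory planner",
    "granularity": "order line",
    "formula": "lines shipped in full / total lines",
    "rollup": "lines -> order: an order is filled only if all its lines are filled",
    "filters": ["exclude cancelled lines"],
    "required_fields": ["order_id", "line_id", "sku", "qty_ordered", "qty_shipped"],
}

# Anyone building a dashboard can check the agreed granularity before querying.
print(metric["name"], "@", metric["granularity"])
```

Storing definitions as data (rather than prose in a wiki) also lets BI pipelines validate that a report actually uses the agreed formula and filters.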
3) Data quality dimensions with logistics examples
Data quality is not abstract; it shows up as rework, missed cut-offs, chargebacks, and customer complaints. Use four core dimensions to diagnose issues: accuracy, completeness, timeliness, and consistency.
Accuracy (is it correct?)
- Example: SKU dimensions wrong → cartonization fails, cube utilization looks poor, carrier bills dimensional weight surcharges.
- Example: ship-to address incorrect → delivery failures, re-delivery fees, on-time KPI unfairly penalized.
- Check: sample audit against physical measurement, carrier invoice, or customer master records.
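The dimensional weight surcharge mentioned above follows a simple rule: carriers bill the greater of actual and dimensional weight. A minimal sketch; the divisor 139 in³/lb is a commonly used domestic value, but the actual divisor is set by your carrier contract:

```python
def billable_weight(actual_lb: float, l_in: float, w_in: float, h_in: float,
                    dim_divisor: float = 139.0) -> float:
    """Billable weight = max(actual weight, dimensional weight).

    dim_divisor is in cubic inches per pound; confirm the value
    in your carrier contract before using this operationally.
    """
    dim_weight = (l_in * w_in * h_in) / dim_divisor
    return max(actual_lb, dim_weight)

# A light but bulky carton: dim weight (~57.6 lb) dominates the 10 lb actual.
print(billable_weight(10, 20, 20, 20))
```

This is why a wrong dimension in the item master silently inflates freight cost: the carrier measures the carton and bills the dimensional weight you never planned for.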
Completeness (is anything missing?)
- Example: missing carrier service level → cannot compute promised delivery date; dashboards show blanks or default values.
- Example: missing HU ID on outbound cartons → cannot reconcile scan events; “in transit” visibility breaks.
- Check: required-field rules by process step (e.g., cannot manifest without weight and dimensions).
Timeliness (is it available when needed?)
- Example: late departure scan → shipment appears “not shipped” at cut-off, triggering false expedite actions.
- Example: delayed receipt posting → inventory shows unavailable, causing unnecessary backorders.
- Check: event timestamp vs system posting timestamp; define acceptable latency (e.g., <5 minutes for scan events, <1 hour for carrier status updates).
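The timeliness check above can be automated by comparing event and posting timestamps against a per-event-type threshold. A minimal sketch with illustrative events and the latency thresholds assumed from the text:

```python
from datetime import datetime, timedelta

# Illustrative events: (event_type, occurred_at, posted_at).
events = [
    ("departure_scan", datetime(2024, 5, 1, 14, 0), datetime(2024, 5, 1, 14, 2)),
    ("carrier_status", datetime(2024, 5, 1, 9, 0), datetime(2024, 5, 1, 10, 30)),
]

# Acceptable posting latency per event type (thresholds from the text above).
SLA = {
    "departure_scan": timedelta(minutes=5),
    "carrier_status": timedelta(hours=1),
}

late = [(etype, posted - occurred)
        for etype, occurred, posted in events
        if posted - occurred > SLA[etype]]

print(late)  # the carrier status posted 90 minutes late, over its 1-hour SLA
```

Running this continuously turns "the dashboard felt stale" into a measurable latency breach with a named event type.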
Consistency (does it match across systems and time?)
- Example: UoM mismatch (ERP in cases, WMS in eaches) → inventory and shipped quantities disagree; fill rate becomes unreliable.
- Example: status code mismatch (“Shipped” means “packed” in one system) → on-time and cycle time metrics diverge.
- Check: reconciliation reports (quantities, statuses, IDs) and standardized code lists.
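The UoM-mismatch reconciliation above can be sketched as a normalize-then-compare check. The snapshots and case packs below are made-up examples:

```python
# Illustrative inventory snapshots: ERP tracks cases, WMS tracks eaches.
erp_cases = {"SKU-A": 10, "SKU-B": 3}
wms_eaches = {"SKU-A": 120, "SKU-B": 30}
case_pack = {"SKU-A": 12, "SKU-B": 12}  # eaches per case

# Normalize the ERP view to eaches, then flag SKUs where the systems disagree.
mismatches = {
    sku: (erp_cases[sku] * case_pack[sku], wms_eaches[sku])
    for sku in erp_cases
    if erp_cases[sku] * case_pack[sku] != wms_eaches[sku]
}

print(mismatches)  # SKU-B: ERP implies 36 eaches, WMS shows 30
```

The essential step is converting both systems to the same base UoM before comparing; comparing raw quantities across systems with different UoMs produces false mismatches everywhere.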
Step-by-step: run a weekly data quality triage
- Pick 5–10 critical fields tied to operational pain (e.g., weight, dimensions, ship-to address, promised date, HU ID, carrier service).
- Define thresholds (e.g., <1% missing, <0.5% invalid, <15 minutes latency for key events).
- Measure using automated queries or BI checks (missing rate, invalid rate, duplicates, outliers).
- Classify issues by root cause: master data setup, process non-compliance, integration mapping, or timing delays.
- Assign owners and due dates; track recurrence (same SKU/location/customer repeatedly failing).
- Implement prevention: validation at entry, controlled picklists, mandatory scans, automated address validation, UoM conversion rules.
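The measurement step of the triage can be sketched as two small checks, a missing rate and an invalid (out-of-plausible-range) rate, run over the critical fields. The records and plausibility bounds below are illustrative:

```python
# Illustrative item records; None marks a missing value.
items = [
    {"sku": "A", "weight_kg": 1.2,   "length_cm": 30},
    {"sku": "B", "weight_kg": None,  "length_cm": 25},
    {"sku": "C", "weight_kg": 950.0, "length_cm": 30},  # implausible weight
]

def missing_rate(rows, field):
    """Share of records where the field is not populated."""
    return sum(r[field] is None for r in rows) / len(rows)

def invalid_rate(rows, field, lo, hi):
    """Share of populated values outside the plausible range [lo, hi]."""
    vals = [r[field] for r in rows if r[field] is not None]
    return sum(not (lo <= v <= hi) for v in vals) / len(vals)

print(missing_rate(items, "weight_kg"))             # 1 of 3 missing
print(invalid_rate(items, "weight_kg", 0.01, 500))  # 1 of 2 populated invalid
```

Comparing these rates against the thresholds agreed in step two (e.g., <1% missing, <0.5% invalid) turns the weekly triage into a pass/fail report rather than a debate.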
4) Practical data ownership and governance
Governance does not need to be heavy. It needs to be explicit: who can create/change data, who approves, how changes are logged, and how downstream systems are notified. Without this, teams “fix” data locally and dashboards become untrustworthy.
Typical ownership model (adapt to your org)
| Data domain | Primary owner | Contributors | Approval needed? | Common risks |
|---|---|---|---|---|
| Item master (dimensions, weight, UoM) | Master data / Product ops | Warehouse, Procurement | Yes (quality gate) | Freight cost errors, slotting errors |
| Location master (bins, zones, dock doors) | Warehouse operations | Engineering, IT | Often (for structural changes) | Pick path issues, inventory misplacement |
| Customer ship-to and delivery rules | Customer service / Sales ops | Transportation, Billing | Yes (address validation) | Failed deliveries, chargebacks |
| Carrier master and services | Transportation | IT (integration), Finance | Yes (contract alignment) | Wrong tendering, invoice disputes |
| Reference codes (statuses, reason codes) | Process owner + BI | IT, Ops | Yes (cross-system) | Inconsistent reporting |
Approval flows (lightweight but enforceable)
Use a simple flow for changes that affect cost, compliance, or customer promise.
- Create/change request: submitted via ticket/form with required fields (what changes, why, effective date, impacted sites).
- Validation: automated checks (mandatory fields, format, duplicates) + business checks (e.g., dimensions within plausible ranges).
- Approval: domain owner approves; for high-impact fields, require a second approver (e.g., transportation approves carrier services; finance approves billing codes).
- Publish: update master/reference data in the system of record; propagate to dependent systems.
- Communicate: notify affected teams (warehouse, customer service) for process changes.
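The validation step of this flow can be automated before any human approver sees the request. A minimal sketch, assuming a hypothetical request schema and illustrative plausibility ranges:

```python
# Hypothetical change-request validation run before routing to the approver.
REQUIRED = {"field", "new_value", "reason", "effective_date", "sites"}
PLAUSIBLE = {"weight_kg": (0.01, 500), "length_cm": (0.1, 300)}

def validate(request: dict) -> list:
    """Return a list of validation errors; an empty list means 'route to approver'."""
    errors = [f"missing: {f}" for f in sorted(REQUIRED - request.keys())]
    field = request.get("field")
    if field in PLAUSIBLE:
        lo, hi = PLAUSIBLE[field]
        v = request.get("new_value")
        if not (isinstance(v, (int, float)) and lo <= v <= hi):
            errors.append(f"{field}={v} outside plausible range {lo}-{hi}")
    return errors

errs = validate({"field": "weight_kg", "new_value": 950,
                 "reason": "re-measured", "effective_date": "2024-06-01",
                 "sites": ["DC1"]})
print(errs)  # the 950 kg weight fails the plausibility check
```

Automated checks like this keep the flow lightweight: approvers only see requests that are structurally complete and plausible, so their review can focus on business impact.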
Change logging (what to record)
- Who changed it (user/service account).
- What changed (field-level before/after).
- When (timestamp + effective date).
- Why (reason code, ticket ID).
- Impact scope (sites, customers, carriers, SKUs).
Step-by-step: set up a “data issue to fix” workflow
- Create a shared intake (form or ticket type) with categories: item, location, customer, carrier, reference, transactional defect.
- Require evidence: screenshot, order/shipment IDs, and the field suspected to be wrong.
- Route automatically to the domain owner based on category.
- Define SLA: e.g., critical shipping blockers within 4 hours; non-critical within 3 business days.
- Close the loop: confirm fix and document prevention (validation rule, training, integration mapping update).
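The routing and SLA steps above can be sketched as a small lookup table. The owner names and SLA values are assumptions illustrating the text's examples:

```python
from datetime import timedelta

# Assumed routing table: issue category -> (domain owner, resolution SLA).
ROUTES = {
    "item":     ("master_data_team",     timedelta(days=3)),
    "carrier":  ("transportation_team",  timedelta(days=3)),
    "customer": ("customer_service_ops", timedelta(days=3)),
}
# Critical shipping blockers override category routing, per the 4-hour SLA above.
CRITICAL_ROUTE = ("warehouse_ops", timedelta(hours=4))

def route(category: str, critical: bool):
    """Pick the owner and SLA for an intake ticket."""
    return CRITICAL_ROUTE if critical else ROUTES[category]

print(route("item", critical=False))  # master data team, 3 business days
print(route("item", critical=True))   # warehouse ops, 4 hours
```

Even this trivial table is useful to write down: it forces the workshop to name an owner and an SLA for every intake category before the form goes live.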
5) Minimum data model checklist for warehouse and transportation visibility
A “minimum data model” is the smallest set of entities and fields that enables reliable visibility across warehouse and transportation. Use this as a requirements baseline for dashboards, integrations, and operational alerts.
Core entities and required fields
Item (SKU)
- SKU ID, description
- Base UoM + conversion factors (each/case/pallet)
- Weight, dimensions (L/W/H) + dimension UoM
- Hazmat/temperature flags (if applicable)
- Barcode(s) / GTIN (if used)
Location (site + warehouse locations)
- Site ID, address, time zone
- Operating calendar (working days, cut-offs)
- Warehouse location IDs (zone/bin) where inventory is stored (if bin-level visibility is required)
Customer / Ship-to
- Customer ID, ship-to ID
- Validated address + geo (optional but useful)
- Delivery window / appointment requirements
- Service level expectation (SLA tier)
Carrier + service
- Carrier ID, service codes
- Tendering method (API/EDI/manual)
- Transit time assumptions (by lane, if available)
- Accessorial rules (liftgate, residential, appointment)
Order (demand)
- Order ID, order date/time, requested ship date
- Ship-from site, ship-to ID
- Promised ship-by / deliver-by (or SLA tier + rule to compute)
- Order lines: line ID, SKU, quantity, UoM, allocation status
Inventory (availability)
- On-hand, allocated, available quantities by SKU (and by location/bin if needed)
- Lot/batch/expiry (if relevant)
- Last updated timestamp
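The relationship between these quantities is simple but worth making explicit: available stock is on-hand minus allocated, summed across bins. A minimal sketch with made-up bin data:

```python
# Illustrative inventory snapshot per SKU/bin; "available" is derived, not stored.
inventory = [
    {"sku": "SKU-A", "bin": "A-01-01", "on_hand": 120, "allocated": 30},
    {"sku": "SKU-A", "bin": "A-01-02", "on_hand": 48,  "allocated": 48},
]

def available_by_sku(rows):
    """Available = on_hand - allocated, rolled up from bin to SKU."""
    out = {}
    for r in rows:
        out[r["sku"]] = out.get(r["sku"], 0) + (r["on_hand"] - r["allocated"])
    return out

print(available_by_sku(inventory))  # SKU-A: 90 eaches available
```

Deriving availability from on-hand and allocated, rather than storing it separately, avoids the drift that happens when two systems each maintain their own "available" field.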
Handling unit (warehouse traceability)
- HU ID (license plate/SSCC)
- HU type (carton/tote/pallet)
- Contents (SKU/qty/lot) or link to packing details
- Current status and last scan timestamp
Shipment (transport visibility)
- Shipment ID, linked order IDs / HU IDs
- Carrier + service, tracking number(s)
- Planned vs actual milestones: tendered, picked up, departed, delivered
- Origin/destination, ship date/time, delivery date/time
- Weight, cube, package count
Events (the backbone of visibility)
Visibility improves dramatically when you treat scans and status updates as first-class data.
- Event ID, event type (picked/packed/loaded/departed/delivered/exception)
- Timestamp + time zone
- Entity link (order line, HU, shipment, stop)
- Source system and confidence (scan vs manual entry vs carrier feed)
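Treating events as first-class data means giving them a real record type rather than scattering timestamps across entity tables. A minimal sketch of such a record, with the fields listed above (the field names are an assumption, not a standard):

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class Event:
    """One scan or status update, linked to the entity it describes."""
    event_id: str
    event_type: str        # picked / packed / loaded / departed / delivered / exception
    occurred_at: datetime  # tz-aware timestamp of the physical event
    entity_type: str       # order_line / handling_unit / shipment / stop
    entity_id: str
    source: str            # scan / manual / carrier_feed
    confidence: str        # e.g. "high" for scans, "low" for manual entry

e = Event("EVT-1", "departed",
          datetime(2024, 5, 1, 14, 0, tzinfo=timezone.utc),
          "shipment", "SHP-1001", "scan", "high")

# Reject naive timestamps at the boundary; time-zone bugs are a common
# cause of "shipped before it was picked" anomalies in event data.
assert e.occurred_at.tzinfo is not None
print(e.event_type, e.entity_id)
```

The `(entity_type, entity_id)` link is what lets one event table serve order-line, handling-unit, and shipment visibility at the same time.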
Workshop-style checklist: assess current data readiness
Run this as a 60–90 minute workshop with warehouse ops, transportation, customer service, IT/integration, and BI/analytics. The goal is to identify gaps that block trustworthy dashboards and automation.
A. Inventory and warehouse readiness
- Do we have a single, agreed SKU ID across systems? If not, what is the cross-reference?
- Are UoM conversions documented and consistent (each/case/pallet)?
- What % of SKUs have weight and dimensions populated and within plausible ranges?
- Do we track bin/location at the level required for operations and reporting?
- Do outbound cartons/pallets have handling unit IDs and are they scanned at key steps?
- Are cycle count adjustments captured with reason codes and timestamps?
B. Order and promise readiness
- Is there a clear definition of promised date (ship-by vs deliver-by) and where it is stored?
- Can we distinguish partial shipments vs complete orders in data?
- Do we capture cancellations, substitutions, and backorders with standardized reason codes?
C. Transportation readiness
- Do all shipments have carrier, service, and tracking number captured at ship time?
- Do we receive carrier status events with timestamps (pickup, in-transit, delivered, exceptions)?
- Are accessorials captured consistently (what happened vs what was billed)?
- Do we have a standard lane definition (origin/destination codes) for transit time analysis?
D. Data quality controls
- Which fields are mandatory at creation time (order, shipment, item)? Are validations enforced?
- What is the acceptable latency for key events (warehouse scans, carrier updates)?
- Do we have a data dictionary for KPIs (definition, granularity, filters, owner)?
- How do we detect duplicates (duplicate shipments, duplicate tracking numbers, duplicate HUs)?
E. Ownership and governance
- Is there a named owner for each domain (item, location, customer, carrier, reference codes)?
- Is there an approval flow for high-impact changes (dimensions, addresses, service levels)?
- Do we maintain a change log with who/what/when/why and effective date?
- Is there a standard data issue workflow with SLAs and root-cause tagging?
Scoring template (use during the workshop)
| Area | Score (0–2) | Evidence | Top gap | Owner | Next action |
|---|---|---|---|---|---|
| SKU master completeness (weight/dims/UoM) | |||||
| Handling unit traceability (IDs + scans) | |||||
| Order promise fields (ship-by/deliver-by) | |||||
| Carrier tracking + milestone events | |||||
| Status code consistency across systems | |||||
| Governance (owners, approvals, change logs) |
Scoring guidance: 0 = not available/unknown, 1 = partially available/inconsistent, 2 = available, consistent, and used operationally.