Statistical Thinking for Data-Driven Decisions

Chapter 1

Estimated reading time: 7 minutes


A practical workflow for statistical thinking

Statistical thinking is a disciplined way to move from a real-world question to a decision, while openly accounting for variability and uncertainty. The core workflow is: (1) define the decision question, (2) identify variables, (3) collect or access data, (4) summarize evidence, and (5) decide under uncertainty.

Key terms you will use throughout the workflow

  • Population: the full set of units you care about (all customers, all manufactured parts, all patients in a region).
  • Sample: the subset you actually observe.
  • Parameter: a numerical feature of the population (true average delivery time, true defect rate). Usually unknown.
  • Statistic: a numerical feature computed from the sample (sample mean, sample proportion). Used to learn about parameters.
  • Variability: natural differences across units or over time (customers differ, days differ, measurements differ).
  • Uncertainty: what you don’t know because you only see a sample, measurements are imperfect, or the process changes.

Descriptive vs inferential goals

Descriptive statistics summarize what you observed in your data (e.g., “in this month’s sample, Option A had a 4.2% defect rate”). Inferential statistics use the sample to make a claim about the population or future outcomes (e.g., “Option A likely has a lower defect rate than Option B overall”). Many decisions require both: describe what happened, then infer what will likely happen if you choose an option.

Scenario 1: Choosing between two options (A vs B)

Many data-driven decisions reduce to comparing two options: two marketing messages, two suppliers, two product designs, two workflows. Statistical thinking helps you avoid being fooled by random ups and downs.

Step 1 — Define the decision question (and success metric)

Write the decision in a way that forces clarity about the outcome and the unit of analysis.

  • Decision: Choose Option A or Option B.
  • Outcome (metric): conversion rate, average cost, defect rate, time-to-complete, satisfaction score.
  • Unit: a customer, an order, a part, a day, a session.
  • Time horizon: next week, next quarter, ongoing.

Example question: “Which email subject line yields a higher conversion rate among new subscribers over the next month?”


Step 2 — Identify variables

List the variables you will measure and how they relate to the decision.

  • Treatment/option variable: which option each unit receives (A or B).
  • Outcome variable: what you measure (converted: yes/no).
  • Context variables (potential confounders): device type, region, day of week, customer segment.

Be explicit about variable type because it affects summaries and comparisons:

  • Categorical: A/B, region, yes/no.
  • Numeric: time, cost, rating.
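As a purely illustrative sketch, the snippet below shows how variable type drives the summary you compute; the column names and values are hypothetical:

```python
import pandas as pd

# Hypothetical A/B records; names and values are illustrative only.
df = pd.DataFrame({
    "option":       ["A", "B", "A", "B", "A"],      # categorical
    "converted":    [1, 0, 0, 1, 1],                # categorical (yes/no coded 0/1)
    "time_on_page": [34.2, 12.5, 48.0, 22.1, 30.7], # numeric (seconds)
})

# Categorical variables get counts and proportions ...
print(df["option"].value_counts(normalize=True))

# ... numeric variables get measures of location and spread.
print(df["time_on_page"].describe())
```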

Step 3 — Collect or access data (with a plan)

To compare A vs B fairly, aim for data where the only systematic difference is the option itself.

  • Prefer random assignment when possible (A/B test). Randomization helps balance context variables on average.
  • If observational (no random assignment), record key context variables so you can check comparability and interpret results cautiously.
  • Define inclusion rules: who counts as “new subscriber,” what time window, how to handle duplicates.
  • Define measurement rules: what counts as a conversion, how long after exposure you measure it.

Data structure example:

user_id   option   converted   device    signup_date
101       A        1           mobile    2026-01-02
102       B        0           desktop   2026-01-02
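
A minimal sketch of how such a table might be assembled, assuming Python with pandas and NumPy; the user count, column names, and random seed are illustrative only:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n_users = 6  # tiny illustration; a real test would use thousands

users = pd.DataFrame({
    "user_id": range(101, 101 + n_users),
    # Random assignment: each new subscriber gets A or B with equal
    # probability, which balances context variables (device, region, ...)
    # on average.
    "option": rng.choice(["A", "B"], size=n_users),
    "device": rng.choice(["mobile", "desktop"], size=n_users),
    "signup_date": pd.Timestamp("2026-01-02"),
})
users["converted"] = 0  # filled in once the measurement window closes
print(users)
```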

Step 4 — Summarize evidence (separate signal from noise)

Start descriptively: compute the sample statistics for each option, then compare them.

Example: In a sample of 2,000 users per option:

  • Option A: 220 conversions → sample conversion rate p̂_A = 220/2000 = 0.11
  • Option B: 200 conversions → sample conversion rate p̂_B = 200/2000 = 0.10
  • Observed difference: Δ = p̂_A − p̂_B = 0.01 (1 percentage point)

Now interpret with uncertainty: ask whether a 1-point difference is likely to persist beyond this sample. Two practical tools:

  • Stability checks: Does the difference look similar across days, segments, or devices? If A wins only on one day or one segment, variability may be driving the result.
  • Uncertainty summaries: Use a confidence interval or a standard error conceptually to express “how much this estimate could wiggle” if you repeated the sampling.

Even without doing full calculations in this chapter, you can adopt the habit: estimate + uncertainty, not just estimate.
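For readers who want to see the habit in code, here is a minimal sketch of "estimate + uncertainty" for the example above, using the standard normal approximation for a difference in proportions (the 1.96 multiplier gives an approximate 95% interval):

```python
import math

# Numbers from the example above.
conv_a, n_a = 220, 2000
conv_b, n_b = 200, 2000

p_a = conv_a / n_a   # 0.11
p_b = conv_b / n_b   # 0.10
diff = p_a - p_b     # 0.01 (1 percentage point)

# Standard error of the difference in proportions (normal approximation).
se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)

# Approximate 95% confidence interval: estimate ± 1.96 * SE.
lo, hi = diff - 1.96 * se, diff + 1.96 * se
print(f"difference = {diff:.3f}, 95% CI ≈ ({lo:.3f}, {hi:.3f})")
# difference = 0.010, 95% CI ≈ (-0.009, 0.029)
```

Note that the interval includes zero: at this sample size, the observed 1-point gap could plausibly be sampling noise, which is exactly why the decision step below matters.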

Step 5 — Make a decision under uncertainty

Statistical decisions are rarely “certain.” Combine evidence with practical constraints.

  • Decision threshold: What minimum improvement matters? (e.g., at least +0.5 percentage points)
  • Costs and risks: Is one option riskier (brand impact, compliance, operational complexity)?
  • Reversibility: If you can switch back easily, you may accept more uncertainty.
  • Value of more data: If the decision is high-stakes, it may be worth collecting more data to reduce uncertainty.

Decision framing example: “Adopt A if it improves conversion by at least 0.5 points and does not reduce conversion in any major segment; otherwise continue testing.”
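
One way to make such a framing executable is a small decision-rule sketch. The function name, thresholds, and inputs below are hypothetical, and using the CI lower bound is one conservative reading of "improves by at least 0.5 points":

```python
def decide(diff_lower_bound, segment_diffs, min_lift=0.005):
    """Hedged decision rule matching the framing above (thresholds are examples).

    diff_lower_bound: lower end of the CI for (rate_A - rate_B), overall.
    segment_diffs: observed A-minus-B differences per major segment.
    """
    if diff_lower_bound >= min_lift and all(d >= 0 for d in segment_diffs.values()):
        return "adopt A"
    return "continue testing"

# Illustrative inputs, not real results.
print(decide(-0.009, {"mobile": 0.012, "desktop": 0.008}))  # continue testing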

Scenario 2: Is performance improving over time?

Another common decision is whether a process change improved outcomes: a new onboarding flow, a new machine setting, a new policy.

Step-by-step: before/after with variability in mind

  • Define the question: “Did average handling time decrease after the new script?”
  • Identify variables: handling time (numeric), period (before/after), agent, call type.
  • Collect data: choose comparable windows (e.g., 4 weeks before and 4 weeks after), ensure consistent measurement.
  • Summarize: compare averages and spreads; look at distributions, not only means.
  • Decide: consider whether changes could be due to seasonality, staffing, or mix of call types.

A key statistical habit here is to ask: What else changed? If call volume doubled or call types shifted, the observed difference might not be attributable to the script alone.
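
A minimal sketch of the "compare averages and spreads" step, using synthetic handling times (the gamma distributions below are invented for illustration, not real call data):

```python
import numpy as np
import pandas as pd

# Hypothetical handling times (minutes) for comparable 4-week windows.
rng = np.random.default_rng(0)
calls = pd.DataFrame({
    "period": ["before"] * 500 + ["after"] * 500,
    "handling_time": np.concatenate([
        rng.gamma(shape=4, scale=2.0, size=500),  # before: mean ~8 min
        rng.gamma(shape=4, scale=1.8, size=500),  # after: mean ~7.2 min
    ]),
})

# Compare location *and* spread, not just the mean.
summary = calls.groupby("period")["handling_time"].agg(
    ["mean", "median", "std", lambda s: s.quantile(0.9)]
)
summary.columns = ["mean", "median", "std", "p90"]
print(summary)
```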

Scenario 3: Predicting outcomes for planning (not certainty)

Sometimes the decision is about planning resources: inventory, staffing, budget. Statistics helps you treat forecasts as ranges rather than single numbers.

From point estimate to range

  • Question: “How many support tickets should we staff for next Monday?”
  • Data: past Mondays, recent trend, known events (product launch).
  • Summary: typical level and variability (e.g., median and spread).
  • Decision: staff for a high-percentile scenario if under-staffing is costly; staff closer to typical if over-staffing is costly.

Statistical thinking here is less about “being right” and more about being prepared for plausible variation.
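
In code, this planning logic can be as small as a median and a high percentile; the ticket counts below are hypothetical:

```python
import numpy as np

# Hypothetical ticket counts from recent Mondays.
past_mondays = np.array([182, 170, 195, 210, 188, 176, 201, 230])

typical = np.median(past_mondays)       # plan near this if over-staffing is costly
busy = np.percentile(past_mondays, 90)  # plan near this if under-staffing is costly

print(f"typical Monday ≈ {typical:.0f} tickets, 90th percentile ≈ {busy:.0f}")
```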

Common pitfalls (and how to avoid them)

Pitfall 1: Treating numbers as exact truth

A sample statistic is not the population parameter. A conversion rate of 11% in your sample is an estimate, not a permanent fact.

  • Antidote: Always pair an estimate with uncertainty language: “about,” “approximately,” “likely within a range.”
  • Practice: report the estimate together with a confidence interval (or at least acknowledge sampling variability).

Pitfall 2: Confusing individual outcomes with long-run patterns

Statistics describes patterns across many units, not guarantees for one unit. Even if Option A has a higher average conversion rate, many individuals will still not convert.

  • Antidote: Keep the unit of inference clear: “On average,” “in the long run,” “for the population.”

Pitfall 3: Ignoring context and measurement quality

Bad measurement can dominate good analysis. If “conversion” is tracked inconsistently, or if a sensor drifts, your statistics summarize error.

  • Antidote: Define metrics precisely, audit data pipelines, check missingness, and verify that measurement is comparable across groups and time.
  • Quick checks: Are there sudden jumps due to logging changes? Are some segments under-recorded? Are there duplicates?
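
The quick checks above translate directly into a few lines of pandas; the tiny event log below is invented to make the checks runnable:

```python
import pandas as pd

# Tiny hypothetical event log (column names are illustrative).
df = pd.DataFrame({
    "user_id": [101, 102, 102, 104],  # note the duplicate 102
    "device": ["mobile", "desktop", "desktop", None],
    "converted": [1, 0, 0, 1],
    "signup_date": pd.to_datetime(["2026-01-02"] * 3 + ["2026-01-03"]),
})

print("duplicate user_ids:", df.duplicated(subset=["user_id"]).sum())
print("missingness by column:\n", df.isna().mean())
print("rows per day:\n", df.groupby("signup_date").size())    # sudden jumps?
print("rows per segment:\n", df.groupby("device").size())     # under-recorded?
```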

Pitfall 4: Comparing groups that aren’t comparable

If Option A was shown mostly to mobile users and Option B mostly to desktop users, the difference may reflect device mix rather than option quality.

  • Antidote: Use random assignment when possible; otherwise stratify summaries by key context variables and interpret causality cautiously.

Pitfall 5: Overreacting to small samples or short windows

Early results can swing widely due to variability. A “winner” after 20 observations may reverse after 2,000.

  • Antidote: Plan a sample size or stopping rule in advance; monitor stability over time; avoid peeking-driven decisions.
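
To plan a sample size in advance, a rough normal-approximation formula for comparing two proportions is often enough for a first pass. This sketch fixes a two-sided alpha of 0.05 and power of 0.80, and is not a substitute for a proper power analysis:

```python
import math

def n_per_group(p_baseline, min_lift):
    """Rough per-group sample size to detect `min_lift` over `p_baseline`
    (two-sided alpha = 0.05, power = 0.80; normal approximation)."""
    z = 1.96 + 0.84                    # z_{alpha/2} + z_{power}
    p_avg = p_baseline + min_lift / 2  # average rate under the alternative
    return math.ceil(2 * p_avg * (1 - p_avg) * z**2 / min_lift**2)

# Detecting a 0.5-point lift over a 10% baseline takes a large sample:
print(n_per_group(0.10, 0.005))  # ≈ 57,700 users per group
```

The takeaway: detecting a 0.5-point lift over a 10% baseline needs tens of thousands of users per group, which is why a "winner" after 20 observations means very little.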

A compact checklist you can apply immediately

  • Question: What decision am I making, and what metric defines success?
  • Variables: What is the outcome, what is the option/exposure, what context variables matter?
  • Data: What is the population, what is my sample, and how was it collected?
  • Evidence: What are the key descriptive summaries, and how variable are they?
  • Uncertainty: How confident am I that the observed difference will persist?
  • Decision: What threshold, costs, and risks determine the action?

Now answer this exercise about the chapter content:

When comparing Option A vs Option B, what best reflects statistical thinking about the observed difference in sample conversion rates?


Answer: good decisions combine estimate + uncertainty. You compare sample statistics, check stability across segments and days, and acknowledge variability before acting under uncertainty.

Next chapter

Data Types, Measurement, and Organizing a Dataset
