Random Variables and Common Distribution Patterns

Chapter 9

What a Random Variable Is (and Why It Matters)

A random variable is a rule that assigns a numerical value to the outcome of an uncertain process. It lets you move from “something uncertain happens” to “a number is produced,” which makes uncertainty measurable and comparable.

  • Example (discrete): Let X be the number of defective items in a sample of 10. Possible values are 0, 1, 2, …, 10.
  • Example (continuous): Let Y be the time (in seconds) it takes a webpage to load. Possible values are any real number in a range (e.g., 0.8, 1.03, 2.51).
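
To make the distinction concrete, here is a minimal Python sketch (the parameter values are illustrative assumptions, not taken from the examples above) that draws a few values of each kind:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Discrete: number of defective items in a sample of 10,
# assuming (for illustration) a 5% defect chance per item.
x = rng.binomial(n=10, p=0.05, size=5)
print("Defect counts:", x)           # only integers between 0 and 10

# Continuous: page load time in seconds, assuming (for illustration)
# a log-normal shape, which is common for durations.
y = rng.lognormal(mean=0.3, sigma=0.4, size=5)
print("Load times:", y.round(2))     # any positive real number
```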

A probability distribution is a model describing how likely each value (or range of values) is. In practice, distributions are used as simplified descriptions of data patterns: they help you anticipate typical outcomes and how much variability to expect.

Discrete vs Continuous Random Variables

Discrete random variables

Discrete random variables take countable values (often counts). Their distributions are described with a list or formula of probabilities for each possible value.

  • Interpretation focus: Which values are most likely? How concentrated are probabilities around those values?
  • Modeling mindset: You are modeling a process that produces counts (defects, clicks, arrivals, successes).

Continuous random variables

Continuous random variables take values on a continuum. For continuous variables, the probability of any single exact value is zero; probabilities are assigned to intervals instead (e.g., between 1.8 and 2.2 seconds).

  • Interpretation focus: Where is the distribution centered? How wide is it? How likely are extreme values?
  • Modeling mindset: You are modeling measurements (time, weight, temperature, voltage) that can vary smoothly.
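
As a sketch of "probability over an interval," the snippet below computes P(1.8 ≤ Y ≤ 2.2) as a difference of cumulative probabilities. The model (load time roughly normal with mean 2.0 s and standard deviation 0.3 s) is purely an illustrative assumption:

```python
from scipy.stats import norm

# Hypothetical model: load time Y ~ Normal(mean=2.0 s, sd=0.3 s).
mu, sigma = 2.0, 0.3

# P(1.8 <= Y <= 2.2) = F(2.2) - F(1.8), where F is the CDF.
p_interval = norm.cdf(2.2, loc=mu, scale=sigma) - norm.cdf(1.8, loc=mu, scale=sigma)
print(f"P(1.8 <= Y <= 2.2) = {p_interval:.3f}")

# P(Y = 2.0 exactly) is zero; only intervals carry probability.
```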

Probability Distributions as Models for Data Patterns

In real work, you rarely know the “true” distribution. You choose a distribution that captures the main pattern of the data-generating process. A useful distribution model should help you answer questions like:

  • Typical value: What outcomes are most common or expected?
  • Variability: How spread out are outcomes? How often do you see unusual values?
  • Parameter meaning: What do the model’s parameters represent in the real situation?

Think of a distribution as a compressed story about a process: a few parameters summarize a recurring pattern.

Bernoulli: The Success/Failure Building Block

A Bernoulli random variable models a single trial with two outcomes, often coded as:

  • 1 = success
  • 0 = failure

The model has one parameter:

  • p = P(success)

How to interpret p and variability

  • Typical value: If p is close to 1, successes are typical; if close to 0, failures are typical.
  • Variability: Outcomes are most variable when p is near 0.5 (success and failure both common). When p is near 0 or 1, outcomes are more predictable.
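
The "most variable near p = 0.5" claim can be checked directly: a Bernoulli variable's variance is p(1 − p), which peaks at p = 0.5. A quick sketch:

```python
# Variance of a Bernoulli(p) outcome is p * (1 - p).
for p in [0.05, 0.25, 0.5, 0.75, 0.95]:
    print(f"p = {p:.2f}  ->  variance = {p * (1 - p):.4f}")

# The variances rise toward p = 0.5 and fall off symmetrically,
# matching the intuition that near-0 or near-1 outcomes are predictable.
```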

Practical checklist: when a Bernoulli model fits

  1. Two outcomes only: Each trial can be classified as success/failure (or yes/no).
  2. Clear definition of success: The rule for labeling success is consistent.
  3. Stable probability: The chance of success is roughly constant from trial to trial.
  4. Independence (approx.): One trial’s outcome doesn’t strongly change the next trial’s probability.

Example: “Did a user click the button?” for a single page view can be modeled as Bernoulli with p = click-through probability.

Binomial: Counting Successes Across Repeated Trials

The binomial distribution models the number of successes in n Bernoulli trials:

X = number of successes in n trials

It has two parameters:

  • n: number of trials
  • p: probability of success on each trial

Interpretation: typical count and variability

Even without heavy computation, you can interpret the binomial model using these ideas:

  • Typical count (center): around n × p. This is the “expected” number of successes in repeated runs of the process.
  • Variability (spread): increases with n and is largest when p is near 0.5. When p is very small or very large, outcomes cluster more tightly.
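
These two facts correspond to the formulas mean = n·p and standard deviation = √(n·p·(1 − p)). The sketch below checks them against simulation, using illustrative values of n and p:

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n, p = 50, 0.3                               # illustrative values, not from the text

counts = rng.binomial(n, p, size=100_000)    # many repeated runs of n trials

print("Theoretical mean:", n * p)
print("Simulated mean:  ", counts.mean().round(3))
print("Theoretical sd:  ", round((n * p * (1 - p)) ** 0.5, 3))
print("Simulated sd:    ", counts.std().round(3))
```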

Step-by-step: deciding whether a binomial model is reasonable

  1. Define the trial: What is one repeatable unit? (one email sent, one product inspected, one patient treated)
  2. Define success: What counts as success? (opened email, defect found, symptom resolved)
  3. Check the “same p” idea: Are trials similar enough that the success probability is roughly constant?
  4. Check independence: Are outcomes mostly independent? (Sampling without replacement from a small batch can violate this.)
  5. Confirm the outcome is a count: You want the number of successes out of n, not the time to first success or the size of a measurement.

Concrete example: quality checks

You inspect n = 20 items from a production line. Suppose the long-run defect rate is about p = 0.05. A binomial model describes the distribution of X, the number of defective items in those 20.

  • Typical value: n × p = 20 × 0.05 = 1 defect is a reasonable “typical” count.
  • Interpretation of unusual outcomes: Seeing 0 defects might happen sometimes; seeing 6 defects would be surprising and could indicate a process shift.
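
A minimal sketch using scipy makes these judgments quantitative for X ~ Binomial(n = 20, p = 0.05):

```python
from scipy.stats import binom

n, p = 20, 0.05

print("P(X = 0) =", round(binom.pmf(0, n, p), 3))   # "no defects" is fairly common (~0.36)
print("P(X >= 6) =", round(binom.sf(5, n, p), 6))   # sf(5) = P(X > 5); well under 1%
```

Seeing 0 defects in 20 items is unremarkable, while 6 or more defects would be strong evidence that the 5% defect rate no longer holds.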

Common binomial pitfalls (interpretation-focused)

  • Changing conditions: If early items come from one machine and later items from another, p may not be constant.
  • Dependence: Defects can cluster (e.g., a miscalibrated tool causes streaks), making variability larger than the binomial model expects.
  • Not truly two-outcome: If outcomes have multiple categories (no defect / minor / major), forcing success/failure can hide important structure.

Normal Distribution: A Useful Model for Aggregated Measurements

The normal distribution is a continuous distribution often used to model measurements that result from the combined effect of many small influences (instrument noise, small biological differences, many tiny process variations). It is especially useful because sums and averages of many small effects tend to look approximately normal (an informal statement of the central limit theorem).

Parameters and interpretation

The normal distribution is described by two parameters:

  • μ (mu): the center (typical value)
  • σ (sigma): the spread (typical deviation from the center)

Interpretation is the main advantage:

  • μ sets the location: shifting μ moves the whole curve left/right.
  • σ sets the variability: larger σ means a wider curve and more extreme values are plausible.

Practical “normal thinking” without heavy computation

When data are roughly normal, you can use simple rules of thumb to interpret typical ranges:

  • Most values fall within about μ ± 2σ.
  • Very unusual values are often beyond about μ ± 3σ.

These are not exact guarantees, but they are useful for quick reasoning about what is typical versus surprising.
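
You can check these rules of thumb numerically against the normal CDF; a minimal sketch with scipy:

```python
from scipy.stats import norm

# Probability of falling within mu +/- k*sigma for a normal distribution
# (location and scale cancel out, so the standard normal suffices).
for k in (1, 2, 3):
    inside = norm.cdf(k) - norm.cdf(-k)
    print(f"within +/-{k} sigma: {inside:.4f}   beyond: {1 - inside:.4f}")

# Roughly 95% falls within +/-2 sigma; only about 0.3% lands beyond +/-3 sigma.
```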

Example: process measurement

Suppose the fill volume of a machine is approximately normal with μ = 500 ml and σ = 4 ml.

  • Typical range: about 500 ± 8 → roughly 492 to 508 ml.
  • Potential outliers: values beyond 500 ± 12 → below 488 or above 512 ml are rare and may indicate a special cause (calibration drift, sensor issue).
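
As a usage example, a simulation sketch of this fill process (assuming the normal model holds exactly, with the μ and σ above) shows how rarely values cross those limits:

```python
import numpy as np

rng = np.random.default_rng(seed=2)
mu, sigma = 500.0, 4.0                               # ml, from the example above

fills = rng.normal(mu, sigma, size=1_000_000)

typical = np.mean((fills >= 492) & (fills <= 508))   # within mu +/- 2*sigma
extreme = np.mean((fills < 488) | (fills > 512))     # beyond mu +/- 3*sigma

print(f"within 492-508 ml: {typical:.3%}")           # about 95%
print(f"beyond 488/512 ml: {extreme:.3%}")           # about 0.3%
```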

Normal as an Approximation (Including for Counts)

Normal models can sometimes approximate other distributions when you are aggregating many small contributions or when counts are not too close to their boundaries.

Binomial-to-normal intuition

A binomial count can look approximately normal when:

  • n is large enough, and
  • p is not extremely close to 0 or 1 (so the distribution isn’t heavily skewed or piled up near 0 or n).

Interpretation benefit: you can reason about “typical” counts using the binomial center (n×p) and a normal-like spread, and you can treat unusually high/low counts as signals worth investigating.
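
A sketch comparing an exact binomial tail probability with its normal approximation (matching mean n·p and sd √(n·p·(1 − p)); the parameters and threshold are illustrative assumptions):

```python
from math import sqrt
from scipy.stats import binom, norm

n, p = 200, 0.3                            # illustrative: large n, p not near 0 or 1
mu, sd = n * p, sqrt(n * p * (1 - p))

threshold = 75                             # an "unusually high" count to evaluate
exact = binom.sf(threshold, n, p)          # exact P(X > 75)
approx = norm.sf(threshold + 0.5, mu, sd)  # normal approx with continuity correction

print(f"exact  P(X > {threshold}) = {exact:.4f}")
print(f"approx P(X > {threshold}) = {approx:.4f}")
```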

When Normal-Based Thinking Breaks

The normal model is powerful, but it is not universal. Misusing it often leads to underestimating how often extreme values occur or to predicting impossible values.

1) Strongly skewed data

Examples: income, time-to-complete a task, waiting times, file sizes. These often have a long right tail.

  • What goes wrong with normal: It may place too much probability on negative values (impossible for many measures) and underestimate the chance of very large values.
  • Practical alternative: Consider a transformation (often a log transform) or summarize with robust measures (median and IQR) when making comparisons.

2) Small samples

With few observations, it is hard to tell whether the underlying pattern is truly symmetric and bell-shaped.

  • What goes wrong with normal: A small sample can look “roughly normal” by chance, or hide skew/outliers that matter.
  • Practical alternative: Use visual checks (histogram/density/boxplot) and emphasize robust summaries; avoid overconfident normal-based assumptions.

3) Bounded measures

Examples: percentages (0–100), proportions (0–1), ratings (1–5), test scores with a max, probabilities.

  • What goes wrong with normal: It can imply values below the minimum or above the maximum, especially when variability is large or the mean is near a boundary.
  • Practical alternative: Work on a scale that respects bounds (e.g., logit transform for proportions not at 0/1), or use models designed for bounded outcomes.

4) Heavy tails and outliers

Some processes produce extreme values more often than the normal model predicts (e.g., network latency spikes, financial returns).

  • What goes wrong with normal: It underestimates the frequency of extreme events, leading to overly optimistic risk assessments.
  • Practical alternative: Use robust summaries (median, IQR, trimmed mean) and consider models that allow heavier tails if modeling is required.

Practical Alternatives When Normal Isn’t a Good Fit

Transformations (to make patterns more regular)

Transformations change the scale to reduce skewness or stabilize variability.

  • Log transform: useful for positive, right-skewed data (times, sizes, monetary amounts). Interpretation often becomes multiplicative (ratios) rather than additive (differences).
  • Square-root transform: sometimes useful for count-like measurements where variability grows with the mean.
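
A minimal sketch of the log transform in action (the simulated right-skewed data is an illustrative assumption) shows how it symmetrizes the distribution:

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(seed=3)

# Right-skewed positive data, e.g. task completion times in seconds.
times = rng.lognormal(mean=1.0, sigma=0.8, size=10_000)

print("skewness before log:", round(skew(times), 2))          # strongly positive
print("skewness after  log:", round(skew(np.log(times)), 2))  # near zero
```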

Robust summaries (to reduce sensitivity to outliers)

When the shape is irregular or outliers are meaningful, robust summaries often communicate “typical” and “spread” more reliably than normal-based summaries.

  • Typical value: median
  • Variability: IQR (interquartile range)
  • Practical comparison: compare medians and IQRs across groups rather than relying on a normal model.
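
Computing these robust summaries is a one-liner each; a sketch with numpy (and a deliberately planted outlier to show the contrast with the mean):

```python
import numpy as np

data = np.array([1.2, 1.4, 1.5, 1.6, 1.8, 2.0, 2.1, 45.0])  # one extreme outlier

q1, q3 = np.percentile(data, [25, 75])

print("mean:  ", data.mean().round(2))   # dragged upward by the outlier
print("median:", np.median(data))        # barely affected
print("IQR:   ", round(q3 - q1, 2))      # spread of the middle 50%
```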

Step-by-step: choosing a distribution model (interpretation-first)

  1. Identify the outcome type: count of successes (discrete) vs measurement (continuous).
  2. Check constraints: can values be negative? are they bounded? are there natural limits?
  3. Match the data-generating story: repeated success/failure trials → Bernoulli/binomial; aggregated measurement noise → often normal.
  4. Assess shape quickly: is it roughly symmetric or clearly skewed/heavy-tailed?
  5. Decide your tool: if normal seems plausible, interpret via μ and σ; if not, use transformations or robust summaries for interpretation.

Summary Table: Common Patterns and What to Look For

Situation | Random variable type | Common model | Key parameters to interpret | Interpretation focus
Single yes/no outcome | Discrete | Bernoulli | p | How likely success is; most variable near p = 0.5
Number of successes in n trials | Discrete | Binomial | n, p | Typical count near n × p; variability depends on n and p
Aggregated measurement (many small effects) | Continuous | Normal | μ, σ | Center and spread; typical range about μ ± 2σ
Positive, right-skewed measurement | Continuous | Transform + normal-like thinking | Transform choice | Interpret ratios; use median/IQR if needed
Bounded proportion/percentage | Continuous-ish (bounded) | Transform or bounded-model thinking | Bounds + center | Avoid impossible values; use robust summaries near boundaries

Exercise

A team models a single page view as a Bernoulli random variable where 1 = “user clicked” and 0 = “user did not click.” Which statement best interprets the parameter p?


Answer: In a Bernoulli model there are two outcomes (0/1), and the single parameter p represents P(success), i.e., the probability that the outcome equals 1 (a click).

Next chapter: Estimation: Confidence Intervals as Ranges of Plausible Values
