Why “center” and “spread” go together
A single number rarely describes a dataset well. Measures of center summarize a typical value, while measures of spread summarize how variable the values are around that center. Two datasets can share the same mean but have very different variability; likewise, two datasets can share the same median but differ in how extreme the tails are.
A small dataset we will use throughout
We will compute each statistic on the same quantitative dataset so you can compare them directly. Suppose these are eight delivery times (in minutes):
Data: 8, 9, 10, 10, 11, 12, 13, 30
Notice the last value (30) is much larger than the rest; it behaves like an outlier and will help illustrate sensitivity.
Measures of central tendency
Mean (arithmetic average)
What it captures: the balance point of the data; it uses every value and reflects the overall level.
When it is useful: when the distribution is roughly symmetric and outliers are not dominating; when you need a value that works well in further calculations (e.g., variance, standard deviation).
What it is sensitive to: outliers and strong skew; a single extreme value can pull the mean noticeably.
Step-by-step: calculating the mean
1) Add all values:
8 + 9 + 10 + 10 + 11 + 12 + 13 + 30 = 103
2) Divide by the number of observations (n = 8):
Mean = 103 / 8 = 12.875
Interpretation (plain language): “If these delivery times were evenly redistributed, each would be about 12.9 minutes.”
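The two steps translate directly into code. This is a minimal Python sketch. The name `delivery_times` is illustrative, and the standard library's `statistics.mean` is used only as a cross-check.

```python
import statistics

# The eight delivery times (minutes) from the example above.
delivery_times = [8, 9, 10, 10, 11, 12, 13, 30]

# Step 1: add all values.
total = sum(delivery_times)

# Step 2: divide by the number of observations.
n = len(delivery_times)
mean = total / n

print(f"sum = {total}, n = {n}, mean = {mean}")  # sum = 103, n = 8, mean = 12.875
print(statistics.mean(delivery_times))          # 12.875
```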
Median (middle value)
What it captures: the 50th percentile; half the values are at or below it, and half are at or above it.
When it is useful: when the distribution is skewed or includes outliers; when you want a typical value that is not pulled by extremes.
What it is sensitive to: much less sensitive to outliers; it can change if values around the middle change, but extreme tails have limited impact.
Step-by-step: calculating the median
1) Sort the data (already sorted):
8, 9, 10, 10, 11, 12, 13, 30
2) With an even number of observations (n = 8), the median is the average of the 4th and 5th values:
4th value = 10, 5th value = 11
3) Average them:
Median = (10 + 11) / 2 = 10.5
Interpretation (plain language): “A typical delivery time is about 10.5 minutes; half the deliveries are faster than 10.5 minutes and half are slower.”
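The sort-then-take-the-middle rule can be sketched as a small Python function. `median` here is a hypothetical helper written for illustration; the standard library's `statistics.median` applies the same rule and is used as a check.

```python
import statistics

def median(values):
    """Sort, then take the middle value (odd n) or the average of the
    two middle values (even n)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    # Even n: average the (n/2)-th and (n/2 + 1)-th values (1-based positions),
    # which sit at 0-based indices mid - 1 and mid.
    return (ordered[mid - 1] + ordered[mid]) / 2

delivery_times = [8, 9, 10, 10, 11, 12, 13, 30]
print(median(delivery_times))             # 10.5
print(statistics.median(delivery_times))  # 10.5
```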
Mode (most frequent value)
What it captures: the most common value(s). A dataset can be unimodal (one mode), bimodal (two modes), or have no mode (all values occur equally often).
When it is useful: when the most common outcome matters (e.g., most common order size); when data are discrete or rounded (e.g., times recorded to the nearest minute).
What it is sensitive to: depends heavily on how values are recorded (rounding/binning); can be unstable in small samples; ignores how far other values are from the mode.
Step-by-step: finding the mode
Count frequencies:
- 8 (1 time)
- 9 (1 time)
- 10 (2 times)
- 11 (1 time)
- 12 (1 time)
- 13 (1 time)
- 30 (1 time)
The most frequent value is 10, so:
Mode = 10
Interpretation (plain language): “The most common delivery time in this sample is 10 minutes.”
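Counting frequencies is exactly what `collections.Counter` does. The sketch below keeps every value tied for the top count (so bimodal data report two modes) and, following the convention described above, returns an empty list when all values occur equally often. The function name `modes` is hypothetical.

```python
from collections import Counter

def modes(values):
    """Return every most-frequent value, sorted; return [] when all
    distinct values are equally frequent (the 'no mode' convention)."""
    if not values:
        raise ValueError("mode of empty data is undefined")
    counts = Counter(values)
    top = max(counts.values())
    winners = sorted(v for v, c in counts.items() if c == top)
    if len(winners) == len(counts) and len(counts) > 1:
        return []
    return winners

delivery_times = [8, 9, 10, 10, 11, 12, 13, 30]
print(Counter(delivery_times)[10])  # 2
print(modes(delivery_times))        # [10]
print(modes([1, 1, 2, 2, 3]))       # [1, 2]  (bimodal)
print(modes([1, 2, 3]))             # []      (no mode)
```

The standard library's `statistics.multimode` also returns all tied values, but when every value is equally frequent it returns all of them rather than an empty list, so the "no mode" case needs the extra check shown here.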
Measures of variability (spread)
Range
What it captures: the total spread from minimum to maximum.
When it is useful: quick sense of overall span; useful for rough checks (e.g., “Are values within a plausible limit?”).
What it is sensitive to: extremely sensitive to outliers because it uses only the min and max.
Step-by-step: calculating the range
Identify min and max:
Min = 8, Max = 30
Compute:
Range = 30 - 8 = 22
Interpretation (plain language): “Delivery times vary by 22 minutes from fastest to slowest in this sample.”
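A short Python sketch of the range, plus a hypothetical change to show that only the two extremes matter: replacing the maximum (30 becomes 60, an invented value) changes the range by the same amount. The variable is named `data_range` to avoid shadowing Python's built-in `range`.

```python
delivery_times = [8, 9, 10, 10, 11, 12, 13, 30]

lo, hi = min(delivery_times), max(delivery_times)
data_range = hi - lo
print(lo, hi, data_range)  # 8 30 22

# Hypothetical: make the slowest delivery even slower (60 minutes).
slower = delivery_times[:-1] + [60]
print(max(slower) - min(slower))  # 52
```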
Interquartile range (IQR)
What it captures: the spread of the middle 50% of the data: IQR = Q3 - Q1.
When it is useful: when data are skewed or have outliers; pairs naturally with the median.
What it is sensitive to: relatively robust; extreme values have little effect because it focuses on the middle half.
Step-by-step: calculating Q1, Q3, and IQR
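We will use the common “median of halves” method for an even-sized dataset: split the sorted data into a lower and an upper half, then take the median of each. Other conventions, including the defaults in many software packages, interpolate between values and can give slightly different quartiles.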
Sorted data:
8, 9, 10, 10, 11, 12, 13, 30
Lower half (first 4 values):
8, 9, 10, 10
Upper half (last 4 values):
11, 12, 13, 30
Compute Q1 as the median of the lower half:
Q1 = (9 + 10) / 2 = 9.5
Compute Q3 as the median of the upper half:
Q3 = (12 + 13) / 2 = 12.5
Then:
IQR = Q3 - Q1 = 12.5 - 9.5 = 3.0
Interpretation (plain language): “The middle 50% of delivery times fall within a 3-minute band (from about 9.5 to 12.5 minutes).”
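A Python sketch of the median-of-halves steps. `median_of_halves_quartiles` is a hypothetical helper; the handling of odd n (leaving the middle value out of both halves) is an assumption, since the worked example only covers even n. The last two lines show that the standard library's `statistics.quantiles` uses interpolation-based conventions and gives different quartiles for the same data.

```python
import statistics

def median_of_halves_quartiles(values):
    """Q1 and Q3 as the medians of the lower and upper halves of the sorted data.
    Assumption: for odd n, the middle value is excluded from both halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + (n % 2):]
    return statistics.median(lower), statistics.median(upper)

delivery_times = [8, 9, 10, 10, 11, 12, 13, 30]
q1, q3 = median_of_halves_quartiles(delivery_times)
iqr = q3 - q1
print(q1, q3, iqr)  # 9.5 12.5 3.0

# Library conventions differ (quartiles returned as [Q1, median, Q3]):
print(statistics.quantiles(delivery_times, n=4))                      # [9.25, 10.5, 12.75]
print(statistics.quantiles(delivery_times, n=4, method="inclusive"))  # [9.75, 10.5, 12.25]
```

None of these conventions is wrong; when reporting an IQR, it helps to say which method produced it.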
Variance
What it captures: average squared distance from the mean; it quantifies spread using all observations.
When it is useful: foundational for many statistical methods; helpful when you need a mathematically convenient measure of variability.
What it is sensitive to: outliers strongly affect it because deviations are squared; also influenced by skew.
Two common versions:
- Population variance (when you truly have every value):
σ^2 = Σ(x - μ)^2 / N
- Sample variance (when data are a sample):
s^2 = Σ(x - x̄)^2 / (n - 1)
In practice, you will often compute sample variance using n - 1 (Bessel’s correction) to reduce bias when estimating population variance from a sample.
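Python's standard library provides both versions: `statistics.pvariance` divides by N and `statistics.variance` divides by n - 1. Treating the eight delivery times as a complete population below is purely for comparison.

```python
import statistics

delivery_times = [8, 9, 10, 10, 11, 12, 13, 30]
n = len(delivery_times)

pop_var = statistics.pvariance(delivery_times)  # sum of squared deviations / N
samp_var = statistics.variance(delivery_times)  # sum of squared deviations / (n - 1)

print(pop_var)             # 44.109375  (= 352.875 / 8)
print(round(samp_var, 4))  # 50.4107    (= 352.875 / 7)

# The two differ only by the factor n / (n - 1).
print(abs(samp_var - pop_var * n / (n - 1)) < 1e-9)  # True
```

Because n - 1 is smaller than n, the sample variance is always a little larger than the population formula applied to the same numbers; the gap shrinks as n grows.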
Step-by-step: calculating the sample variance
We already have the mean:
x̄ = 12.875
Compute each deviation and squared deviation:
| x | x - x̄ | (x - x̄)^2 |
|---|---|---|
| 8 | -4.875 | 23.765625 |
| 9 | -3.875 | 15.015625 |
| 10 | -2.875 | 8.265625 |
| 10 | -2.875 | 8.265625 |
| 11 | -1.875 | 3.515625 |
| 12 | -0.875 | 0.765625 |
| 13 | 0.125 | 0.015625 |
| 30 | 17.125 | 293.265625 |
Sum the squared deviations:
Σ(x - x̄)^2 = 23.765625 + 15.015625 + 8.265625 + 8.265625 + 3.515625 + 0.765625 + 0.015625 + 293.265625 = 352.875
Divide by n - 1 = 7:
s^2 = 352.875 / 7 = 50.410714...
Interpretation (plain language): “Measured in squared minutes, the spread around the mean is about 50.41 (roughly the average squared distance from the mean, except that we divide by n - 1 instead of n). Because this is in squared units, it is usually reported via the standard deviation for easier interpretation.”
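The deviation table can be rebuilt row by row in Python as a check on the hand arithmetic. This is an illustrative sketch; the column formatting is arbitrary.

```python
delivery_times = [8, 9, 10, 10, 11, 12, 13, 30]
n = len(delivery_times)
mean = sum(delivery_times) / n  # 12.875

# One row per value: x, its deviation from the mean, and the squared deviation.
print(f"{'x':>3} {'x - mean':>9} {'(x - mean)^2':>13}")
sum_sq = 0.0
for x in delivery_times:
    dev = x - mean
    sq = dev ** 2
    sum_sq += sq
    print(f"{x:>3} {dev:>9.3f} {sq:>13.6f}")

sample_var = sum_sq / (n - 1)
print(f"sum of squares = {sum_sq}")           # sum of squares = 352.875
print(f"sample variance = {sample_var:.6f}")  # sample variance = 50.410714
```

A useful sanity check on any such table: the unsquared deviations always sum to zero, because the mean is the balance point.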
Standard deviation
What it captures: typical distance from the mean, in the original units (minutes here). It is the square root of variance.
When it is useful: when the mean is an appropriate center (roughly symmetric data without extreme outliers); for comparing variability across groups measured in the same units; for many modeling and inference procedures.
What it is sensitive to: outliers and skew, because it inherits sensitivity from the mean and squared deviations.
Step-by-step: calculating the sample standard deviation
Take the square root of the sample variance:
s = sqrt(50.410714...) ≈ 7.10
Interpretation (plain language): “Delivery times typically differ from the mean by about 7.1 minutes.”
Notice how large this is relative to most of the data (8–13). The single value 30 inflates the standard deviation substantially.
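A Python sketch of the square-root step, cross-checked against the standard library's `statistics.stdev` (sample standard deviation). The final lines drop the 30 as a hypothetical comparison to show how much that single value inflates the result.

```python
import math
import statistics

delivery_times = [8, 9, 10, 10, 11, 12, 13, 30]

s = math.sqrt(statistics.variance(delivery_times))
print(round(s, 2))                                        # 7.1
print(math.isclose(s, statistics.stdev(delivery_times)))  # True

# Hypothetical comparison: the same data without the 30-minute delivery.
without_outlier = delivery_times[:-1]
print(round(statistics.stdev(without_outlier), 2))        # 1.72
```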
Comparing sensitivity: what changes when an outlier appears?
In our dataset, most values cluster between 8 and 13, but 30 is far away. Here is how the statistics respond:
- Mean is pulled upward (12.875) compared with the cluster near 10–12.
- Median stays near the middle of the cluster (10.5).
- Mode remains 10 because it depends on frequency, not magnitude.
- Range becomes large (22) because it uses the maximum.
- IQR stays small (3.0) because it focuses on the middle 50%.
- Variance/SD become large because the outlier creates a huge squared deviation.
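To see these responses side by side, the sketch below computes every statistic from this section for the original data and for a hypothetical version in which the 30 is replaced by 14, a value that fits the cluster (n stays 8). `summarize` is a hypothetical helper; it uses sample variance/SD and the median-of-halves quartiles, and relies on `statistics.mode`, which returns a single value.

```python
import statistics

def summarize(values):
    """Center and spread summaries from this section (sample variance/SD,
    median-of-halves quartiles)."""
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = statistics.median(ordered[:half])
    q3 = statistics.median(ordered[half + len(ordered) % 2:])
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "mode": statistics.mode(ordered),
        "range": ordered[-1] - ordered[0],
        "IQR": q3 - q1,
        "variance": statistics.variance(ordered),
        "SD": statistics.stdev(ordered),
    }

original = [8, 9, 10, 10, 11, 12, 13, 30]
no_outlier = [8, 9, 10, 10, 11, 12, 13, 14]  # hypothetical: 30 replaced by 14

a, b = summarize(original), summarize(no_outlier)
for name in a:
    print(f"{name:>8}: {a[name]:8.3f} -> {b[name]:8.3f}")
```

Median, mode, and IQR come out identical for both datasets, while the mean, range, variance, and SD all shrink once the outlier is gone.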
Robust summaries: median and IQR
What “robust” means in practice
A statistic is robust if a small number of extreme values does not change it much. Robust summaries are especially helpful when:
- Data are right-skewed (a long tail of large values), common with money amounts and waiting times.
- Outliers may reflect rare events (e.g., a delayed shipment) rather than the typical process.
- You want a summary that represents the “typical” case rather than the “average including extremes.”
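A quick robustness check in Python: keep seven typical deliveries fixed and let the single longest one become more and more extreme (the values 300 and 3000 are invented for illustration). The median does not move; the mean follows the extreme value.

```python
import statistics

base = [8, 9, 10, 10, 11, 12, 13]

# Hypothetical: the one long delivery grows from 30 to 300 to 3000 minutes.
for extreme in (30, 300, 3000):
    data = base + [extreme]
    print(f"last value {extreme:>4}: mean = {statistics.mean(data):8.3f}, "
          f"median = {statistics.median(data)}")
```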
Median + IQR as a pair
Median answers “What is typical?” while IQR answers “How variable is the typical middle?” Together they describe the central bulk of the data without being dominated by extremes.
For the delivery-time data:
- Median = 10.5 minutes (typical delivery)
- IQR = 3.0 minutes (middle half is fairly tight)
This combination communicates that most deliveries are close to 10–12 minutes, even though a rare long delivery exists.
When to prefer mean + standard deviation
Mean and standard deviation are often preferred when:
- The distribution is approximately symmetric and outliers are not extreme.
- You plan to use methods that assume or work best with mean/SD summaries.
- You want a measure that reflects the impact of rare but important extremes (e.g., risk management contexts).
Quick reference table
| Statistic | Captures | Useful when | Sensitive to |
|---|---|---|---|
| Mean | Balance point | Roughly symmetric data; further calculations | Outliers, skew |
| Median | Middle (50th percentile) | Skewed data; outliers present | Much less sensitive to outliers |
| Mode | Most common value | Discrete/rounded data; “most frequent” matters | Rounding/binning; small-sample instability |
| Range | Min to max span | Quick overall spread check | Outliers (very) |
| IQR | Middle 50% spread | Skew/outliers; robust variability | Relatively robust |
| Variance | Average squared deviation | Modeling/math convenience | Outliers (very) |
| Std. deviation | Typical deviation from mean | Symmetric data; comparisons; modeling | Outliers, skew |
Decision prompts: choosing a summary in context
Salaries in a company
- If a few executives earn far more than most employees, which center better reflects a “typical” salary: mean or median?
- Which spread better describes typical variability among most employees: standard deviation or IQR?
Test scores on a fairly consistent exam
- If scores are roughly symmetric with no extreme values, which pair is more informative: mean + standard deviation or median + IQR?
- If a handful of students missed the exam and got zeros, how might that change your choice?
Customer wait times at a service desk
- Wait times often have occasional long delays. Would you report the “typical” wait using the median or the mean?
- Would you track improvement using IQR (middle spread) or range (extremes)?
Home prices in a neighborhood
- With a few luxury homes, which center would you use for a listing summary: median or mean?
- Which spread would best communicate the typical market variability: IQR or standard deviation?