Describing Quantitative Data with Center and Spread

Chapter 4

Why “center” and “spread” go together

A single number rarely describes a dataset well. Measures of center summarize a typical value, while measures of spread summarize how variable the values are around that center. Two datasets can share the same mean but have very different variability; likewise, two datasets can share the same median but differ in how extreme the tails are.

A small dataset we will use throughout

We will compute each statistic on the same quantitative dataset so you can compare them directly. Suppose these are eight delivery times (in minutes):

Data: 8, 9, 10, 10, 11, 12, 13, 30

Notice the last value (30) is much larger than the rest. It behaves like an outlier and will help show which statistics are sensitive to extreme values.

Measures of central tendency

Mean (arithmetic average)

What it captures: the balance point of the data; it uses every value and reflects the overall level.

When it is useful: when the distribution is roughly symmetric and outliers are not dominating; when you need a value that works well in further calculations (e.g., variance, standard deviation).

What it is sensitive to: outliers and strong skew; a single extreme value can pull the mean noticeably.

Step-by-step: calculating the mean

1) Add all values:

8 + 9 + 10 + 10 + 11 + 12 + 13 + 30 = 103

2) Divide by the number of observations (n = 8):

Mean = 103 / 8 = 12.875

Interpretation (plain language): “If the 103 total minutes were split evenly across the eight deliveries, each would take about 12.9 minutes.”
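The two steps above can be sketched in Python as a cross-check. This is a minimal illustration, not part of the original chapter. It assumes the data are stored in a plain list named `times` and compares the hand calculation with the standard library's `statistics.mean`.

```python
from statistics import mean

# Delivery times in minutes (the chapter's example dataset)
times = [8, 9, 10, 10, 11, 12, 13, 30]

# Step 1: add all values. Step 2: divide by the number of observations.
total = sum(times)
n = len(times)
manual_mean = total / n

print(total, n, manual_mean)  # 103 8 12.875
print(mean(times))            # 12.875 (standard library gives the same result)
```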

Median (middle value)

What it captures: the 50th percentile; half the values are at or below it, and half are at or above it.

When it is useful: when the distribution is skewed or includes outliers; when you want a typical value that is not pulled by extremes.

What it is sensitive to: much less sensitive to outliers; it can change if values around the middle change, but extreme tails have limited impact.

Step-by-step: calculating the median

1) Sort the data (already sorted):

8, 9, 10, 10, 11, 12, 13, 30

2) With an even number of observations (n = 8), the median is the average of the 4th and 5th values:

4th value = 10, 5th value = 11

3) Average them:

Median = (10 + 11) / 2 = 10.5

Interpretation (plain language): “A typical delivery time is about 10.5 minutes; half the deliveries are faster than 10.5 minutes and half are slower.”
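A short sketch of the sort-then-pick-the-middle procedure follows. The helper name `median_of` is hypothetical and introduced only for illustration; it handles both odd and even n and is checked against `statistics.median`, which uses the same averaging rule for even n.

```python
from statistics import median

def median_of(values):
    """Hypothetical helper: middle value of the sorted data."""
    data = sorted(values)            # Step 1: sort
    n = len(data)
    mid = n // 2
    if n % 2 == 1:                   # odd n: a single middle value
        return data[mid]
    # even n: average the two middle values (the 4th and 5th when n = 8)
    return (data[mid - 1] + data[mid]) / 2

times = [8, 9, 10, 10, 11, 12, 13, 30]
print(median_of(times))  # 10.5
print(median(times))     # 10.5 (standard library, same rule)
```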

Mode (most frequent value)

What it captures: the most common value(s). A dataset can be unimodal (one mode), bimodal (two modes), or have no mode (all values occur equally often).

When it is useful: when the most common outcome matters (e.g., most common order size); when data are discrete or rounded (e.g., times recorded to the nearest minute).

What it is sensitive to: depends heavily on how values are recorded (rounding/binning); can be unstable in small samples; ignores how far other values are from the mode.

Step-by-step: finding the mode

Count frequencies:

  • 8 (1 time)
  • 9 (1 time)
  • 10 (2 times)
  • 11 (1 time)
  • 12 (1 time)
  • 13 (1 time)
  • 30 (1 time)

The most frequent value is 10, so:

Mode = 10

Interpretation (plain language): “The most common delivery time in this sample is 10 minutes.”
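Counting frequencies is a natural fit for `collections.Counter`. The sketch below is an illustration under one assumption that the chapter states only in words: when every distinct value occurs equally often, it reports no mode (an empty list). The helper name `modes` is hypothetical.

```python
from collections import Counter

def modes(values):
    """Hypothetical helper: all most-frequent values, or [] if there is no mode."""
    counts = Counter(values)                     # value -> frequency
    top = max(counts.values())
    winners = sorted(v for v, c in counts.items() if c == top)
    # Assumed convention: if every distinct value ties, report no mode
    if len(winners) == len(counts) and len(counts) > 1:
        return []
    return winners

times = [8, 9, 10, 10, 11, 12, 13, 30]
print(modes(times))            # [10]       (unimodal)
print(modes([1, 1, 2, 2, 3]))  # [1, 2]     (bimodal)
print(modes([4, 5, 6]))        # []         (no mode)
```

The standard library's `statistics.multimode` also returns every most-common value, but for data where all values tie it returns all of them rather than an empty list, so the "no mode" convention has to be applied separately.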

Measures of variability (spread)

Range

What it captures: the total spread from minimum to maximum.

When it is useful: quick sense of overall span; useful for rough checks (e.g., “Are values within a plausible limit?”).

What it is sensitive to: extremely sensitive to outliers because it uses only the min and max.

Step-by-step: calculating the range

Identify min and max:

Min = 8, Max = 30

Compute:

Range = 30 - 8 = 22

Interpretation (plain language): “Delivery times vary by 22 minutes from fastest to slowest in this sample.”
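The range needs only the minimum and maximum, which is exactly why one extreme value can dominate it. In the sketch below, the 60-minute limit used for the rough plausibility check is a hypothetical threshold chosen for illustration, not a value from the chapter.

```python
times = [8, 9, 10, 10, 11, 12, 13, 30]

lo, hi = min(times), max(times)
data_range = hi - lo          # named to avoid shadowing Python's built-in range()
print(lo, hi, data_range)     # 8 30 22

# Rough plausibility check; 60 minutes is a hypothetical upper limit
MAX_PLAUSIBLE_MINUTES = 60
print(all(0 < t <= MAX_PLAUSIBLE_MINUTES for t in times))  # True
```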

Interquartile range (IQR)

What it captures: the spread of the middle 50% of the data: IQR = Q3 - Q1.

When it is useful: when data are skewed or have outliers; pairs naturally with the median.

What it is sensitive to: relatively robust; extreme values have little effect because it focuses on the middle half.

Step-by-step: calculating Q1, Q3, and IQR

We will use the common “median of halves” method for an even-sized dataset.

Sorted data:

8, 9, 10, 10, 11, 12, 13, 30

Lower half (first 4 values):

8, 9, 10, 10

Upper half (last 4 values):

11, 12, 13, 30

Compute Q1 as the median of the lower half:

Q1 = (9 + 10) / 2 = 9.5

Compute Q3 as the median of the upper half:

Q3 = (12 + 13) / 2 = 12.5

Then:

IQR = Q3 - Q1 = 12.5 - 9.5 = 3.0

Interpretation (plain language): “The middle 50% of delivery times fall within a 3-minute band (from about 9.5 to 12.5 minutes).”
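Below is a sketch of the "median of halves" method. The function names are hypothetical. For odd n it leaves the overall median out of both halves, which is only one of several conventions and is an assumption here (the chapter covers only the even case). It also assumes at least two values.

```python
from statistics import quantiles

def median_sorted(data):
    """Median of an already-sorted, non-empty list."""
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2

def quartiles_median_of_halves(values):
    """Q1 and Q3 as the medians of the lower and upper halves (needs n >= 2)."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]                                      # first half
    upper = data[half:] if n % 2 == 0 else data[half + 1:]   # second half
    return median_sorted(lower), median_sorted(upper)

times = [8, 9, 10, 10, 11, 12, 13, 30]
q1, q3 = quartiles_median_of_halves(times)
iqr = q3 - q1
print(q1, q3, iqr)  # 9.5 12.5 3.0

# The standard library interpolates between data points ("exclusive" method by default),
# so its quartiles differ slightly for this small dataset.
print(quantiles(times, n=4))  # [9.25, 10.5, 12.75]
```

Software packages use different quartile rules, so a reported IQR (3.0 here by hand, 3.5 from `statistics.quantiles`) depends on the method. State which one you used when reporting it.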

Variance

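What it captures: the average squared distance from the mean (exactly an average for the population version; the sample version divides by n - 1 instead of n). It quantifies spread using all observations.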

When it is useful: foundational for many statistical methods; helpful when you need a mathematically convenient measure of variability.

What it is sensitive to: outliers strongly affect it because deviations are squared; also influenced by skew.

Two common versions:

  • Population variance (when you truly have every value): σ^2 = Σ(x - μ)^2 / N
  • Sample variance (when data are a sample): s^2 = Σ(x - x̄)^2 / (n - 1)

In practice, you will usually compute the sample variance with n - 1 (Bessel’s correction). Dividing by n would systematically underestimate the population variance; dividing by n - 1 makes s^2 an unbiased estimator of it.
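The difference between the two versions is only the divisor. This sketch uses the standard library's `statistics.pvariance` (divides by N) and `statistics.variance` (divides by n - 1) on the chapter's data. Treating these eight values as a whole "population" is done purely for comparison.

```python
from statistics import pvariance, variance

times = [8, 9, 10, 10, 11, 12, 13, 30]
n = len(times)

pop_var = pvariance(times)   # sigma^2: sum of squared deviations / N
samp_var = variance(times)   # s^2: sum of squared deviations / (n - 1)

print(pop_var)    # 44.109375
print(samp_var)   # about 50.41

# Same numerator, different divisor, so the ratio is n / (n - 1) = 8 / 7
print(samp_var / pop_var, n / (n - 1))
```

The gap shrinks as n grows, because n / (n - 1) approaches 1.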

Step-by-step: calculating the sample variance

We already have the mean:

x̄ = 12.875

Compute each deviation and squared deviation:

x | x - x̄ | (x - x̄)^2
8 | -4.875 | 23.765625
9 | -3.875 | 15.015625
10 | -2.875 | 8.265625
10 | -2.875 | 8.265625
11 | -1.875 | 3.515625
12 | -0.875 | 0.765625
13 | 0.125 | 0.015625
30 | 17.125 | 293.265625

Sum the squared deviations:

Σ(x - x̄)^2 = 23.765625 + 15.015625 + 8.265625 + 8.265625 + 3.515625 + 0.765625 + 0.015625 + 293.265625 = 352.875

Divide by n - 1 = 7:

s^2 = 352.875 / 7 = 50.410714...

Interpretation (plain language): “Measured in squared minutes, the squared distances from the mean average about 50.41 (using the n - 1 divisor). Because squared units are hard to interpret, variance is usually reported through the standard deviation.”
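The deviation table and the final division can be reproduced row by row. This sketch is an illustration only; the variable names (`rows`, `ss`, `s2`) are chosen here and do not come from the chapter.

```python
times = [8, 9, 10, 10, 11, 12, 13, 30]
n = len(times)
x_bar = sum(times) / n                   # 12.875

# One row per observation: value, deviation from the mean, squared deviation
rows = [(x, x - x_bar, (x - x_bar) ** 2) for x in times]
for x, dev, sq in rows:
    print(f"{x:>3} | {dev:>7.3f} | {sq:>10.6f}")

ss = sum(sq for _, _, sq in rows)        # sum of squared deviations
s2 = ss / (n - 1)                        # divide by n - 1 = 7
print(ss, round(s2, 6))                  # 352.875 50.410714
```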

Standard deviation

What it captures: typical distance from the mean, in the original units (minutes here). It is the square root of variance.

When it is useful: when the mean is an appropriate center (roughly symmetric data without extreme outliers); for comparing variability across groups measured in the same units; for many modeling and inference procedures.

What it is sensitive to: outliers and skew, because it inherits sensitivity from the mean and squared deviations.

Step-by-step: calculating the sample standard deviation

Take the square root of the sample variance:

s = sqrt(50.410714...) ≈ 7.10

Interpretation (plain language): “Delivery times typically differ from the mean by about 7.1 minutes.”

Notice how large this is relative to most of the data (8–13). The single value 30 inflates the standard deviation substantially.
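Taking the square root returns the spread to the original units. This sketch computes it from the sample variance and checks the result against `statistics.stdev`, which computes the sample standard deviation directly.

```python
import math
from statistics import stdev, variance

times = [8, 9, 10, 10, 11, 12, 13, 30]

s2 = variance(times)   # sample variance, in squared minutes
s = math.sqrt(s2)      # sample standard deviation, back in minutes
print(round(s, 2))     # 7.1

print(math.isclose(s, stdev(times)))  # True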

Comparing sensitivity: what changes when an outlier appears?

In our dataset, most values cluster between 8 and 13, but 30 is far away. Here is how the statistics respond:

  • Mean is pulled upward to 12.875, above six of the eight values, even though most values cluster near 10–12.
  • Median stays near the middle of the cluster (10.5).
  • Mode remains 10 because it depends on frequency, not magnitude.
  • Range becomes large (22) because it uses the maximum.
  • IQR stays small (3.0) because it focuses on the middle 50%.
  • Variance/SD become large because the outlier creates a huge squared deviation.
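One way to see these sensitivities is to recompute the statistics with and without the value 30. The sketch below does that with the standard library; removing the value here only illustrates sensitivity and is not a recommendation to discard outliers. IQR is left out because quartile conventions for odd n vary. The helper name `describe` is hypothetical.

```python
from statistics import mean, median, stdev

with_outlier = [8, 9, 10, 10, 11, 12, 13, 30]
without_outlier = [8, 9, 10, 10, 11, 12, 13]   # same data with the 30 removed

def describe(values):
    """Hypothetical helper: a few center and spread summaries."""
    return {
        "mean": round(mean(values), 3),
        "median": median(values),
        "range": max(values) - min(values),
        "sd": round(stdev(values), 2),
    }

for label, data in [("with 30", with_outlier), ("without 30", without_outlier)]:
    print(label, describe(data))
```

Dropping the single value moves the median only from 10.5 to 10, while the mean falls from 12.875 to about 10.43, the range from 22 to 5, and the standard deviation from about 7.1 to about 1.72.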

Robust summaries: median and IQR

What “robust” means in practice

A statistic is robust if a small number of extreme values does not change it much. Robust summaries are especially helpful when:

  • Data are right-skewed (a long tail of large values), common with money amounts and waiting times.
  • Outliers may reflect rare events (e.g., a delayed shipment) rather than the typical process.
  • You want a summary that represents the “typical” case rather than the “average including extremes.”

Median + IQR as a pair

Median answers “What is typical?” while IQR answers “How variable is the typical middle?” Together they describe the central bulk of the data without being dominated by extremes.

For the delivery-time data:

  • Median = 10.5 minutes (typical delivery)
  • IQR = 3.0 minutes (middle half is fairly tight)

This combination communicates that the middle half of deliveries take roughly 9.5 to 12.5 minutes, even though one unusually long delivery exists.

When to prefer mean + standard deviation

Mean and standard deviation are often preferred when:

  • The distribution is approximately symmetric and outliers are not extreme.
  • You plan to use methods that assume or work best with mean/SD summaries.
  • You want a measure that reflects the impact of rare but important extremes (e.g., risk management contexts).

Quick reference table

Statistic | Captures | Useful when | Sensitive to
Mean | Balance point | Roughly symmetric data; further calculations | Outliers, skew
Median | Middle (50th percentile) | Skewed data; outliers present | Much less sensitive to outliers
Mode | Most common value | Discrete/rounded data; “most frequent” matters | Rounding/binning; small-sample instability
Range | Min-to-max span | Quick overall spread check | Outliers (very sensitive)
IQR | Middle 50% spread | Skew/outliers; robust variability | Relatively robust
Variance | Average squared deviation | Modeling/math convenience | Outliers (very sensitive)
Std. deviation | Typical deviation from mean | Symmetric data; comparisons; modeling | Outliers, skew

Decision prompts: choosing a summary in context

Salaries in a company

  • If a few executives earn far more than most employees, which center better reflects a “typical” salary: mean or median?
  • Which spread better describes typical variability among most employees: standard deviation or IQR?

Test scores on a fairly consistent exam

  • If scores are roughly symmetric with no extreme values, which pair is more informative: mean + standard deviation or median + IQR?
  • If a handful of students missed the exam and got zeros, how might that change your choice?

Customer wait times at a service desk

  • Wait times often have occasional long delays. Would you report the “typical” wait using the median or the mean?
  • Would you track improvement using IQR (middle spread) or range (extremes)?

Home prices in a neighborhood

  • With a few luxury homes, which center would you use for a listing summary: median or mean?
  • Which spread would best communicate the typical market variability: IQR or standard deviation?

Exercise

A dataset of delivery times is right-skewed because it includes one unusually large value. Which pair of statistics is most appropriate to describe a typical delivery time and the typical variability without being dominated by that extreme value?


Answer: the median and the IQR. For skewed data with outliers, the median gives a typical value and the IQR summarizes the spread of the middle 50%. Both are far more robust to extreme values than the mean/SD or the range.

Next chapter

Understanding Distributions: Shape, Skew, and Outliers

Statistics Fundamentals: From Data to Decisions (12 pages)