From Samples to Populations: Sampling, Bias, and Variability

Chapter 7

Estimated reading time: 9 minutes


Why samples differ (even when nothing changes)

In practice we rarely observe an entire population, so we use a sample to estimate a population quantity (a proportion, a mean, a rate). A key idea is that different random samples from the same population will not be identical. This natural sample-to-sample fluctuation is called sampling variability.

Two kinds of error matter when we generalize from a sample to a population:

  • Systematic error (bias): the sampling or measurement process pushes estimates consistently too high or too low.
  • Random error (variance): estimates bounce around from sample to sample because of chance selection.

Good sampling aims to reduce bias (systematic error) and to quantify variability (random error). Increasing sample size mainly reduces variability; it does not automatically remove bias.

Random sampling (conceptual) and why it helps

Random sampling means that the selection of units is governed by chance in a way that gives each unit a known, nonzero probability of selection. Conceptually, random sampling helps because it prevents the researcher (or the data collection process) from consistently favoring certain types of units.

Thought experiment: repeated samples

Imagine a population where the true support for a policy is 52%. If you repeatedly take simple random samples of size 50 and compute the sample proportion each time, you might see results like 46%, 54%, 60%, 50%… The population did not change; the sample did.


If you increase the sample size to 500 and repeat the process, the sample proportions will typically cluster much closer to 52%. This illustrates:

  • Unbiasedness (in expectation): random sampling does not systematically overshoot or undershoot the truth.
  • Stability increases with n: larger samples reduce variability.
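
If you want to see this for yourself, the short Python sketch below runs the thought experiment under the stated assumptions: true support of 52%, sample sizes of 50 and 500, and 1,000 repeated samples, all chosen for illustration.

import random

def sample_proportion(true_p, n):
    # Draw n yes/no responses at random and return the sample proportion.
    return sum(random.random() < true_p for _ in range(n)) / n

random.seed(1)
for n in (50, 500):
    estimates = [sample_proportion(0.52, n) for _ in range(1000)]
    mean_est = sum(estimates) / len(estimates)      # close to 0.52 for both sizes
    spread = max(estimates) - min(estimates)        # much narrower for n = 500
    print(f"n={n}: mean of estimates ~ {mean_est:.3f}, range of estimates ~ {spread:.3f}")

The mean of the estimates sits near 52% at both sample sizes (unbiasedness in expectation), while the range of the estimates shrinks sharply as n grows (stability increases with n).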

Sampling frames: what you can sample from

A sampling frame is the operational list or mechanism that defines which units can be selected (a customer database, voter registry, list of clinics, set of transactions). Your target population may be broader than your frame.

Common frame problems

  • Undercoverage: some population members are missing from the frame (e.g., a phone survey frame missing people without stable phone access).
  • Overcoverage: the frame includes units not in the population (e.g., duplicate records, outdated entries).
  • Mismatched unit: frame units don’t match the analysis unit (e.g., sampling households but analyzing individuals without a within-household selection rule).

Frame issues often create bias because the excluded or misrepresented groups can differ systematically on the outcome of interest.

Selection bias: when inclusion depends on the outcome (or its drivers)

Selection bias occurs when the probability of being included in the sample is related to the outcome you want to estimate, or to strong predictors of that outcome. This can happen even with large datasets.

Example: estimating average delivery time

If your dataset contains only deliveries that were successfully completed, and failed deliveries are more likely to be delayed, then the sample will systematically underestimate the true average delivery time for all attempted deliveries. More rows won’t fix the missingness mechanism; the estimate remains biased.
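
The sketch below illustrates this mechanism under invented assumptions: delivery times average about 40 minutes, and attempts longer than 60 minutes are assumed to fail (and therefore drop out of the data) 70% of the time. The exact numbers are made up; the point is the direction of the error.

import random

random.seed(2)
# True delivery times (minutes) for all attempted deliveries.
attempted = [random.expovariate(1 / 40) for _ in range(100_000)]
# Slow deliveries are more likely to fail and be excluded from the dataset.
completed = [t for t in attempted if not (t > 60 and random.random() < 0.7)]

true_mean = sum(attempted) / len(attempted)
observed_mean = sum(completed) / len(completed)
print(f"true mean (all attempts):       {true_mean:.1f} min")
print(f"observed mean (completed only): {observed_mean:.1f} min")  # systematically lower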

Practical step-by-step: diagnose selection bias risk

  1. Define the target population precisely (who, where, when).
  2. Describe the inclusion rule: how did a unit end up in the dataset?
  3. List exclusion pathways (not in frame, not contacted, opted out, data not recorded, filtered by business rules).
  4. Ask whether exclusions correlate with the outcome (or its key drivers). If yes, bias risk is high.
  5. Check representativeness signals: compare sample composition to known population benchmarks when available (e.g., region, age bands, customer segments).
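
Step 5 can be as simple as comparing subgroup shares side by side. Here is a minimal sketch with hypothetical region names and percentages; in practice the benchmarks would come from a census, registry, or customer database.

# Hypothetical example of step 5: compare sample composition to known
# population benchmarks. Region names and shares are made up.
population_share = {"North": 0.30, "South": 0.25, "East": 0.25, "West": 0.20}
sample_share     = {"North": 0.45, "South": 0.20, "East": 0.20, "West": 0.15}

for region, pop in population_share.items():
    diff = sample_share[region] - pop
    flag = "  <-- check" if abs(diff) > 0.05 else ""
    print(f"{region:5s} population {pop:.0%}  sample {sample_share[region]:.0%}  gap {diff:+.0%}{flag}")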

Nonresponse: missingness created by refusal or inability to participate

Nonresponse bias arises when those who do not respond differ systematically from responders on the outcome. Nonresponse is not just “less data”; it can change what the data represent.

Two patterns to watch

  • Unit nonresponse: entire sampled units provide no data (e.g., survey not completed).
  • Item nonresponse: some questions/fields are missing (e.g., income not reported).

Practical step-by-step: reduce and assess nonresponse bias

  1. Track response rates overall and by subgroup (region, device type, time of day, customer tier).
  2. Compare early vs late responders as a proxy check (late responders sometimes resemble nonresponders).
  3. Use follow-ups targeted at low-response groups.
  4. Document reasons for nonresponse when possible (cannot contact, refusal, language barrier).
  5. Consider weighting or adjustment if you have auxiliary variables measured for both responders and nonresponders (or known population margins).
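
As a minimal illustration of the weighting idea in step 5, the sketch below reweights subgroup means toward known population margins; the tier names, response counts, and scores are hypothetical.

# Minimal post-stratification sketch: reweight responders so subgroup shares
# match known population margins. All numbers are hypothetical.
population_share = {"tier_A": 0.60, "tier_B": 0.40}    # known population margins
responders       = {"tier_A": 300,  "tier_B": 100}     # responders per subgroup
mean_response    = {"tier_A": 4.1,  "tier_B": 3.2}     # e.g., mean satisfaction score

n_total = sum(responders.values())
unweighted = sum(responders[g] * mean_response[g] for g in responders) / n_total
weighted   = sum(population_share[g] * mean_response[g] for g in responders)

print(f"unweighted mean: {unweighted:.2f}")   # dominated by the over-represented tier
print(f"weighted mean:   {weighted:.2f}")     # adjusted toward population shares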

Measurement bias: when the recorded value differs systematically from the true value

Even with a perfect sample, poor measurement can bias results. Measurement bias happens when the measurement process systematically overstates or understates the true value.

Common sources

  • Instrument bias: miscalibrated sensors, inconsistent scales, rounding rules.
  • Question wording / mode effects: leading questions, different answers online vs phone.
  • Observer/recording bias: inconsistent coding, subjective classification, data entry defaults.
  • Timing bias: measuring at a time that systematically differs from the target (e.g., measuring satisfaction immediately after support contact vs a week later).

Practical step-by-step: audit measurement quality

  1. Write an operational definition of each key variable (what exactly counts?).
  2. Map the data pipeline: where is the value generated, transformed, and stored?
  3. Look for systematic missingness (e.g., a field missing more often for certain groups).
  4. Check consistency across sources (same event recorded in two systems).
  5. Run spot checks against ground truth (manual review, calibration records, gold-standard subset).
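
For step 5, one simple check is to compare recorded values against a manually reviewed gold-standard subset; the paired values below are invented for illustration.

# Compare recorded values to a gold-standard subset to estimate systematic
# measurement error. The paired values are made up for illustration.
recorded      = [10.2, 11.5, 9.8, 12.0, 10.9, 11.1]
gold_standard = [ 9.7, 11.0, 9.4, 11.3, 10.4, 10.6]

differences = [r - g for r, g in zip(recorded, gold_standard)]
mean_bias = sum(differences) / len(differences)
print(f"mean recorded-minus-truth difference: {mean_bias:+.2f}")
# Differences consistently on one side suggest measurement bias, not random noise.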

Simulating sampling variability (a repeatable mental model)

You can understand sampling variability by imagining a “machine” that repeatedly draws samples from the same population and computes an estimate each time. The collection of those estimates forms a distribution of possible outcomes.

Simulation recipe (conceptual, no software required)

  1. Assume a population truth (e.g., true defect rate = 3%).
  2. Choose a sample size n (e.g., n = 50).
  3. Draw a sample (imagine selecting 50 items at random from a huge stream).
  4. Compute the estimate (sample defect rate).
  5. Repeat many times and note how much the estimates vary.

What you will observe:

  • With small n, estimates swing widely (high variability).
  • With large n, estimates cluster tightly (low variability).
  • If the sampling process is biased (e.g., only day-shift production), the cluster can be tight but centered on the wrong value.

Mini thought experiment: “tight but wrong” vs “wide but right”

Scenario                 | What happens when you repeat samples?                       | Interpretation
Large n, biased frame    | Estimates are very similar each time, but consistently off  | Low variance, high bias
Small n, random sampling | Estimates vary a lot, but average out around the truth      | High variance, low bias
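
The sketch below simulates the two rows of the table, reusing the defect-rate example from the recipe (true rate 3%). The "biased frame" is assumed to reach only a subset where the defect rate is 1%, e.g., day-shift production; all numbers are illustrative.

import random

def simulate(defect_rate, n, reps=1000):
    # Repeatedly draw samples of size n and summarize the resulting estimates.
    estimates = []
    for _ in range(reps):
        estimates.append(sum(random.random() < defect_rate for _ in range(n)) / n)
    mean = sum(estimates) / len(estimates)
    spread = max(estimates) - min(estimates)
    return mean, spread

random.seed(3)
mean_biased, spread_biased = simulate(0.01, n=5000)   # large n, biased frame (sees only the 1% subset)
mean_random, spread_random = simulate(0.03, n=50)     # small n, random sampling from the full population
print(f"large n, biased frame:    mean ~ {mean_biased:.3f}, range ~ {spread_biased:.3f}  (tight but wrong)")
print(f"small n, random sampling: mean ~ {mean_random:.3f}, range ~ {spread_random:.3f}  (wide but right on average)")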

How sample size affects stability (variance) without fixing bias

As sample size increases, random fluctuations average out. Practically, this means your estimate becomes more stable from sample to sample. But if your data collection systematically misses or mismeasures certain units, increasing n can simply give you a very precise estimate of the wrong quantity.

Rule-of-thumb intuition

  • To cut random noise roughly in half, you often need about four times the sample size, because many standard errors shrink like 1/sqrt(n) (see the numeric check after this list).
  • Bias does not shrink automatically with n; it requires fixing the process (frame, selection, response, measurement).
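
A quick numeric check of the first bullet, using the standard error of a sample proportion, sqrt(p(1 - p)/n), with an illustrative p = 0.5:

from math import sqrt

p = 0.5
for n in (100, 400, 1600):
    se = sqrt(p * (1 - p) / n)
    print(f"n={n:5d}  standard error ~ {se:.4f}")
# Each 4x increase in n roughly halves the standard error; none of this
# removes bias in the sampling process itself.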

Practical step-by-step: separating bias from variance in a real project

  1. List potential bias sources: frame gaps, selection filters, nonresponse, measurement issues.
  2. Assess variance: if you can resample (bootstrap) or split the data by time/region, do estimates change a lot? (See the bootstrap sketch after this list.)
  3. Increase n only after bias checks: if the process is biased, more data mostly increases confidence in the biased result.
  4. Use design improvements: better frame, randomization, follow-ups, standardized measurement.
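
Step 2 mentions bootstrap resampling as a way to gauge variance. Below is a minimal sketch with invented data; in practice you would resample your own observations.

import random

random.seed(4)
sample = [random.gauss(100, 15) for _ in range(200)]   # stand-in for an observed sample

# Resample with replacement many times and record the estimate each time.
boot_means = []
for _ in range(2000):
    resample = [random.choice(sample) for _ in range(len(sample))]
    boot_means.append(sum(resample) / len(resample))

boot_means.sort()
low, high = boot_means[50], boot_means[-50]            # rough interval: ~2.5% in each tail
print(f"estimate: {sum(sample)/len(sample):.1f}, bootstrap range ~ [{low:.1f}, {high:.1f}]")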

Common sampling designs (conceptual) and their bias/variance tradeoffs

Different designs can improve feasibility and precision, but each introduces its own risks if implemented poorly.

  • Simple random sample: conceptually clean; requires a good frame.
  • Stratified sampling: sample within important subgroups (e.g., region, customer tier) to ensure coverage and improve precision for subgroup estimates (see the sketch after this list).
  • Cluster sampling: sample groups (stores, schools) then units within them; cheaper but can increase variance if units within clusters are similar.
  • Convenience sampling: easiest operationally; often high bias risk because inclusion is not governed by a known chance mechanism.
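
To make the stratified-sampling point concrete, the sketch below compares a simple random sample with a proportionally stratified one when two customer tiers differ strongly; the tier names, sizes, and values are invented.

import random

random.seed(5)
# Hypothetical population: 80% tier A (mean ~80), 20% tier B (mean ~40).
population = [("A", random.gauss(80, 5)) for _ in range(8000)] + \
             [("B", random.gauss(40, 5)) for _ in range(2000)]

def srs_mean(n):
    # Simple random sample of n units from the whole population.
    return sum(v for _, v in random.sample(population, n)) / n

def stratified_mean(n):
    # Sample each tier in proportion to its population share, then combine.
    total = 0.0
    for tier, share in (("A", 0.8), ("B", 0.2)):
        units = [v for t, v in population if t == tier]
        k = int(n * share)
        total += share * sum(random.sample(units, k)) / k
    return total

srs_ests = [srs_mean(100) for _ in range(500)]
strat_ests = [stratified_mean(100) for _ in range(500)]
print(f"simple random: range ~ {max(srs_ests) - min(srs_ests):.2f}")
print(f"stratified:    range ~ {max(strat_ests) - min(strat_ests):.2f}")  # typically tighter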

Checklist: does this dataset support a population-level decision?

Use this checklist before making claims about a population (customers, voters, patients, transactions, devices) based on a dataset.

A. Define the decision and the target population

  • What decision will be made from the estimate?
  • Who exactly is the population (eligibility rules, geography, time window)?
  • What is the unit of analysis (person, account, visit, order)?

B. Sampling frame and coverage

  • What is the sampling frame (list/system) and how was it constructed?
  • Who is missing (undercoverage) and who is duplicated or irrelevant (overcoverage)?
  • Do key subgroups appear in expected proportions compared with benchmarks?

C. Selection mechanism

  • What determined inclusion (filters, opt-in, platform constraints, business rules)?
  • Could inclusion depend on the outcome or its drivers (selection bias risk)?
  • Were any records removed (e.g., outliers, failed events) in a way that could be outcome-related?

D. Nonresponse and missing data

  • What is the response rate (or completion rate) overall and by subgroup?
  • Are missingness patterns related to the outcome (e.g., dissatisfied users less likely to respond)?
  • Is there a plan for follow-up, weighting, or adjustment using auxiliary information?

E. Measurement quality

  • Are variables defined operationally and measured consistently?
  • Any known instrument, mode, timing, or coding biases?
  • Are there validation checks against a trusted source or audit sample?

F. Variability and stability

  • Is the sample size adequate for the needed precision (especially for subgroups)?
  • Do estimates change materially across reasonable splits (time periods, regions, data sources)?
  • If you repeated the sampling process, would you expect similar results?

G. Scope of inference

  • Does the dataset support inference to the full target population, or only to the frame/participants?
  • Are you implicitly generalizing beyond the time window or context of data collection?
  • Have limitations been documented in a way decision-makers can act on (e.g., “applies to active users only”)?

Now answer the exercise about the content:

A team increases its sample size from 50 to 500 but still uses a sampling frame that systematically misses a key subgroup. What is the most likely effect on their estimate?


Answer: Increasing sample size mainly reduces random sampling variability, making estimates cluster more tightly. If the sampling frame has undercoverage (or other systematic issues), the result can be very precise but still biased.

Next chapter

Probability Basics for Interpreting Uncertainty
