Why experimentation (incrementality) complements attribution
Attribution helps you assign credit for conversions across touchpoints. Experimentation helps you answer a different question: did this change cause additional results that would not have happened otherwise? That “additional results” idea is called incrementality.
Incrementality testing is especially useful when:
- Multiple channels touch the same people and attribution feels “too good to be true.”
- You are changing something (a page, creative, budget) and want causal proof.
- You need confidence to scale spend or roll back a change.
The core logic is simple: compare a treatment group (exposed to the change) to a control group (not exposed), while keeping everything else as similar as possible.
Step 1: Write a clear hypothesis
A good hypothesis is specific, measurable, and tied to a user action. Use this structure:
- Change: what you will do differently
- Audience/scope: who/where it applies
- Expected impact: direction and (optionally) size
- Primary metric: what will determine success
Examples
- Landing page: “If we shorten the signup form from 6 fields to 3 fields for paid search traffic, then signup conversion rate will increase.”
- Email: “If we use a benefit-led subject line for trial users, then email click-through rate will increase.”
- Budget: “If we increase Brand Search spend by 20% in selected regions, then total purchases will increase versus similar regions with no change.”
Keep hypotheses focused on one change. If you change multiple things at once, you may learn that performance changed, but not why.
Step 2: Choose a primary metric (and define it)
Your primary metric is the single number that decides whether the test “wins.” Pick a metric that is:
- Closest to value (e.g., purchases, qualified leads)
- Sensitive enough to move during the test window
- Hard to game (avoid metrics that can rise while real outcomes fall)
Common primary metrics by test type
- Landing page A/B: conversion rate to signup/purchase, or revenue per visitor
- Creative test: cost per acquisition (CPA) or conversion rate from click to purchase (if you can measure it reliably)
- Email subject line: click-through rate (CTR) or downstream conversions from email traffic
- Budget holdout / geo split: incremental purchases, incremental revenue, or incremental leads
Write the metric definition in plain language (what counts, what time window, what population). Example: “Primary metric = purchases within 7 days of first session during the test period.”
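To make the definition unambiguous, it can help to express it as a small calculation. Below is a minimal sketch in Python, assuming a hypothetical list of per-user records with a first-session timestamp and an optional purchase timestamp (the field names and dates are illustrative, not taken from any specific analytics tool):

```python
from datetime import datetime, timedelta

# Hypothetical per-user records (assumed structure, not a specific analytics export).
users = [
    {"user_id": "u1", "first_session": datetime(2024, 3, 1), "purchase_at": datetime(2024, 3, 5)},
    {"user_id": "u2", "first_session": datetime(2024, 3, 2), "purchase_at": None},
    {"user_id": "u3", "first_session": datetime(2024, 3, 3), "purchase_at": datetime(2024, 3, 20)},
]

window = timedelta(days=7)  # "purchases within 7 days of first session"
converted = sum(
    1 for u in users
    if u["purchase_at"] is not None and u["purchase_at"] - u["first_session"] <= window
)
print(f"{converted} of {len(users)} users purchased within 7 days ({converted / len(users):.1%})")
```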
Step 3: Set up control vs treatment
Every incrementality test needs a comparison. The best comparison depends on what you can randomize.
Control vs treatment basics
- Treatment: users/regions/campaigns that receive the change
- Control: users/regions/campaigns that do not receive the change
- Randomization: assign units to control/treatment randomly when possible (reduces bias)
In many digital tests, the “unit” is a user (A/B testing). In other cases, the unit might be a region (geo split) or a campaign budget (holdout).
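For user-level tests, a common way to randomize is deterministic hashing of a user ID, so each user always lands in the same group. The sketch below is one minimal way to do it, assuming a string user ID and a 50/50 split; the salt string and function name are illustrative:

```python
import hashlib

def assign_variant(user_id: str, salt: str = "lp-form-test") -> str:
    """Deterministically assign a user to control or treatment (50/50 split).

    Hash-based assignment keeps each user in the same group on every visit;
    the salt keeps assignments independent across different tests.
    """
    bucket = int(hashlib.md5(f"{salt}:{user_id}".encode()).hexdigest(), 16) % 100
    return "treatment" if bucket < 50 else "control"

print(assign_variant("user-123"))
```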
What you compare
At the end, you compare the primary metric between groups. The difference is your estimated incremental lift:
Incremental lift = (Treatment outcome) - (Control outcome)

Sometimes you’ll express it as a percent:
% lift = (Treatment - Control) / Control
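As a concrete illustration, here is a minimal Python sketch with assumed conversion rates (the numbers are made up for the example):

```python
# Hypothetical results (assumed numbers, for illustration only).
control_rate = 0.040    # 4.0% conversion in the control group
treatment_rate = 0.046  # 4.6% conversion in the treatment group

incremental_lift = treatment_rate - control_rate  # absolute lift
pct_lift = incremental_lift / control_rate        # relative (%) lift

print(f"absolute lift: {incremental_lift:.1%}, relative lift: {pct_lift:.0%}")
```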
Step 4 (high level): Minimum Detectable Effect (MDE)
Minimum Detectable Effect is the smallest lift your test is likely to detect as “real” given your traffic/volume and how noisy the metric is. You don’t need advanced statistics to use the concept:
- If your MDE is 5%, a true lift of 1% will probably look like random fluctuation.
- If you expect only a small improvement, you typically need more volume (more visitors, more emails, more time) to detect it.
Use MDE as a planning check:
- Low volume? Choose bigger changes, longer run time, or a more sensitive metric (e.g., conversion rate instead of revenue if revenue is too sparse).
- High volume? You can test smaller improvements and still detect them.
Practical rule of thumb: if you can’t realistically run long enough to detect the lift you care about, redesign the test (bigger change, broader audience) before launching.
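If you want a rough planning number, the standard two-proportion sample-size formula gives an approximate visitors-per-group estimate. The sketch below assumes a conversion-rate metric, a two-sided test at 5% significance and 80% power; the baseline rate and target lift are illustrative:

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(baseline_rate: float, mde_relative: float,
                          alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate visitors needed per group to detect a relative lift (two-sided test).

    Standard two-proportion formula; treat the result as a planning estimate,
    not a guarantee.
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + mde_relative)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# e.g. 4% baseline conversion, hoping to detect a 5% relative lift (4.0% -> 4.2%)
print(sample_size_per_group(0.04, 0.05))  # roughly 150,000+ visitors per group
```

Small relative lifts on low baseline rates require very large samples, which is why low-volume tests should aim for bigger changes or more sensitive metrics.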
Step 5: Add guardrail metrics (to prevent “winning the wrong way”)
Guardrail metrics are secondary metrics that must not get worse beyond an acceptable threshold. They protect against unintended consequences.
Examples of guardrails
- Landing page test: page load time, bounce rate, refund rate, lead quality rate
- Creative test: frequency, negative feedback rate, unsubscribe rate (if applicable)
- Email subject line: unsubscribe rate, spam complaint rate
- Budget increase: CPA, conversion rate, return rate, customer support contacts
Guardrails should be defined with a clear “do not exceed” rule (e.g., “unsubscribe rate must not increase by more than 0.1 percentage points”).
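One way to make “do not exceed” rules operational is to write each guardrail as a simple threshold check. A minimal sketch, with assumed metric values and thresholds:

```python
# Hypothetical guardrail readings against pre-agreed "do not exceed" thresholds
# (metric names, values, and thresholds are assumed for illustration).
guardrails = {
    # metric: (control value, treatment value, max allowed worsening)
    "unsubscribe_rate": (0.0020, 0.0024, 0.0010),  # absolute rate, +0.1pp allowed
    "page_load_time_s": (1.80, 2.10, 0.20),        # seconds, +200ms allowed
}

for name, (control, treatment, max_worsening) in guardrails.items():
    change = treatment - control
    status = "BREACH" if change > max_worsening else "ok"
    print(f"{name}: +{change:.4f} ({status})")
```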
Practical test types (with beginner-friendly steps)
1) Landing page A/B test
Goal: isolate whether a page change improves conversions.
Step-by-step
- Pick one main change: headline, form length, CTA copy, layout, trust badges, pricing display.
- Create Variant A (control): current page.
- Create Variant B (treatment): page with the single change.
- Split traffic: typically 50/50 for speed and simplicity.
- Primary metric: conversion rate (e.g., signups/visitors) or revenue per visitor.
- Guardrails: page speed, downstream quality (e.g., % qualified leads).
- Run time: run through at least one full business cycle (often 1–2 weeks) so weekday/weekend behavior is represented.
- Decision: ship if primary metric improves and guardrails stay within limits.
Beginner pitfall: changing both the headline and the form and the hero image at once. If it wins, you won’t know what caused it.
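When it is time to decide, a two-proportion z-test is one common way to check whether the difference in conversion rate looks larger than random noise. A minimal sketch, with assumed signup counts and traffic:

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Compare conversion rates of control (A) and treatment (B)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)      # pooled rate under the null hypothesis
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided
    return p_b - p_a, (p_b - p_a) / p_a, p_value

abs_lift, rel_lift, p = two_proportion_z_test(conv_a=240, n_a=6000, conv_b=288, n_b=6000)
print(f"absolute lift {abs_lift:.2%}, relative lift {rel_lift:.0%}, p-value {p:.3f}")
```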
2) Creative testing (ads)
Goal: learn which message or visual drives better outcomes at similar delivery conditions.
Step-by-step
- Choose one creative variable: hook, offer, image style, call-to-action, length.
- Keep targeting and budget consistent: avoid changing audience and creative simultaneously.
- Run creatives in parallel: same time window to avoid seasonality effects.
- Primary metric: ideally conversions or CPA; if conversion volume is low, use a closer proxy (e.g., landing page view rate) but treat it as directional.
- Guardrails: frequency, negative feedback, conversion rate from click to conversion (to avoid clickbait).
- Decision: promote the winner and archive losers; document what message angle worked.
Beginner pitfall: judging creative only by click-through rate. A creative can attract clicks that don’t convert.
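That pitfall is easy to see in numbers: a creative can win on clicks and still lose on cost per acquisition. A minimal sketch with assumed delivery results:

```python
# Hypothetical delivery results for two creatives (assumed numbers).
creatives = {
    "hook_a": {"spend": 1_200.0, "clicks": 2_400, "purchases": 60},
    "hook_b": {"spend": 1_200.0, "clicks": 3_100, "purchases": 52},
}

for name, c in creatives.items():
    cpa = c["spend"] / c["purchases"]                 # cost per acquisition
    click_to_purchase = c["purchases"] / c["clicks"]  # post-click conversion rate
    print(f"{name}: {c['clicks']} clicks, CPA ${cpa:.2f}, click-to-purchase {click_to_purchase:.2%}")
```

Here the second creative draws more clicks but converts worse, so it costs more per purchase.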
3) Budget holdout test (incrementality of spend)
Goal: estimate how much incremental value a channel/campaign produces by holding out spend for a portion of the audience or time.
Two common approaches
- Audience holdout: exclude a random slice of eligible users from seeing ads (control), while the rest can be targeted (treatment).
- Time-based holdout: pause or reduce spend during selected periods and compare to similar periods (less ideal because time effects can confound results).
Step-by-step (audience holdout)
- Define eligible audience: who would normally be targeted.
- Create holdout group: a random percentage (e.g., 10–20%) that receives no ads from that campaign.
- Run as usual for treatment: keep creative and bidding stable.
- Primary metric: purchases/revenue/leads per eligible user (not per click).
- Guardrails: overall CPA, customer mix, brand search volume (if relevant).
- Compute incrementality: compare outcomes per eligible user between treatment and holdout.
Beginner pitfall: comparing “ad-attributed conversions” between groups. Holdouts are about total outcomes, not attributed ones.
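A minimal sketch of the holdout comparison, using assumed counts (note the denominator is eligible users, not clicks or ad-attributed conversions):

```python
# Hypothetical counts per group (assumed numbers); denominators are eligible users.
treatment_users, treatment_purchases = 90_000, 2_150
holdout_users, holdout_purchases = 10_000, 210

rate_treatment = treatment_purchases / treatment_users
rate_holdout = holdout_purchases / holdout_users

incremental_rate = rate_treatment - rate_holdout            # extra purchases per eligible user
incremental_purchases = incremental_rate * treatment_users  # scaled to the treated population
pct_lift = incremental_rate / rate_holdout

print(f"purchase rate: treatment {rate_treatment:.2%} vs holdout {rate_holdout:.2%}")
print(f"estimated incremental purchases: {incremental_purchases:.0f} ({pct_lift:.0%} lift)")
```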
4) Geo split test (conceptual)
Goal: test incrementality when user-level randomization is hard by splitting by geography (cities, DMAs, regions).
Conceptual setup
- Select matched geos: choose regions with similar baseline performance and seasonality.
- Assign geos: some are treatment (change applied), others are control (no change).
- Run simultaneously: same dates to reduce time-based confounding.
- Primary metric: conversions or revenue per capita / per site visitor from those geos.
- Guardrails: shipping constraints, stockouts, customer support load (geo differences can matter).
Beginner pitfall: picking one “big city” as treatment and one “small city” as control. Mismatched geos can make results misleading.
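At its simplest, the read-out is a comparison of the same per-capita metric across the two groups of geos during the same dates. A minimal sketch with assumed, illustrative numbers (in practice you would also want matched baselines and more geos per group):

```python
# Hypothetical conversions per 1,000 residents during the test window (assumed numbers).
treatment_geos = {"Geo A": 1.8, "Geo B": 2.1, "Geo C": 1.9}
control_geos = {"Geo D": 1.6, "Geo E": 1.7, "Geo F": 1.8}

def avg(per_geo: dict) -> float:
    return sum(per_geo.values()) / len(per_geo)

lift = avg(treatment_geos) - avg(control_geos)
print(f"treatment avg {avg(treatment_geos):.2f} vs control avg {avg(control_geos):.2f} "
      f"-> lift of {lift:.2f} conversions per 1,000 residents")
```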
5) Email subject line test
Goal: improve engagement and downstream actions by testing subject lines.
Step-by-step
- Choose one variable: curiosity vs direct, personalization, urgency, length.
- Randomly split recipients: A gets subject line A, B gets subject line B.
- Keep everything else identical: sender name, send time, email body.
- Primary metric: click-through rate or conversions from email traffic.
- Guardrails: unsubscribe rate, spam complaints.
- Run time: long enough for most opens/clicks to occur (often 24–72 hours depending on your list behavior).
- Decision: send the winning subject line to the remaining audience (if you’re doing a staged send) or use it as the new default for future campaigns.
Beginner pitfall: changing both the subject line and the offer in the email body, then attributing the lift to the subject line.
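A minimal sketch of the random recipient split, assuming a hypothetical mailing list; the seed is fixed only so the example is reproducible:

```python
import random

# Hypothetical recipient list (assumed data).
recipients = [f"user{i}@example.com" for i in range(10_000)]

random.seed(42)
random.shuffle(recipients)
half = len(recipients) // 2
group_a, group_b = recipients[:half], recipients[half:]  # A gets subject line A, B gets subject line B

print(len(group_a), "recipients in A,", len(group_b), "recipients in B")
```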
Test hygiene: rules that prevent misleading results
1) One main change
Design the test so you can explain the result with one sentence: “We changed X, and Y happened.” If you must change multiple things, treat it as a package test and document that you learned about the package, not each component.
2) Pre-define success criteria
Before launching, write:
- Primary metric and required direction (increase/decrease)
- Minimum improvement you care about (e.g., “at least +3% conversion rate lift”)
- Guardrail thresholds (e.g., “unsubscribe rate not worse than +0.1pp”)
- Decision rule (ship, iterate, or stop)
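Writing the decision rule down before launch can be as simple as a small function the team agrees on in advance. A minimal sketch, with assumed thresholds:

```python
def decide(primary_lift_pct: float, guardrails_ok: bool, min_lift_pct: float = 0.03) -> str:
    """Apply a pre-agreed decision rule (thresholds here are assumed for illustration)."""
    if not guardrails_ok:
        return "stop or iterate: a guardrail was breached"
    if primary_lift_pct >= min_lift_pct:
        return "ship"
    return "iterate: lift below the minimum improvement you care about"

print(decide(primary_lift_pct=0.05, guardrails_ok=True))
```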
3) Sufficient run time
Run long enough to:
- Cover normal variability (weekday/weekend, paydays, typical cycles)
- Accumulate enough conversions to make the result stable
If volume is low, prefer fewer, larger tests over many tiny tests that never reach clarity.
4) Avoid peeking
“Peeking” means checking results repeatedly and stopping the test the moment it looks good. This increases the chance you ship a false winner. Instead:
- Set a planned end date (or required sample size) in advance.
- Only make decisions at the planned checkpoint(s).
- If you must monitor, monitor for breakage (tracking issues, site errors), not for early wins.
A simple test documentation template (idea → decision)
Copy/paste and fill this in for every test. Keeping a consistent template makes your testing program easier to manage and easier to learn from.
| Section | What to write |
|---|---|
| Test name | Short, searchable name (e.g., “LP Form 6→3 Fields”) |
| Owner | Who is responsible for setup, QA, and decision |
| Background | What problem/opportunity prompted the test (1–2 sentences) |
| Hypothesis | If [change], then [primary metric] will [increase/decrease] for [audience] because [reason]. |
| Type | Landing page A/B, creative test, email subject line, budget holdout, geo split |
| Unit of randomization | User, session, email recipient, region, campaign audience |
| Control | What stays as-is (include screenshots/links if relevant) |
| Treatment | What changes (exactly one main change if possible) |
| Primary metric | Name + definition (numerator/denominator, time window) |
| Guardrail metrics | List + thresholds (what “must not get worse”) |
| Planned duration | Start/end dates or required sample size; note business cycle coverage |
| MDE (high level) | Smallest lift you want to be able to detect; note if test is underpowered |
| Success criteria | “Ship if primary metric improves by at least X and guardrails are within limits.” |
| QA checklist | Tracking verified, variant assignment works, pages load, emails render, budgets applied correctly |
| Results snapshot | Control vs treatment values, absolute difference, % lift |
| Decision | Ship / iterate / stop + what you will do next |
| Learnings | What you learned about users/messages/offers (not just “B won”) |
Example (filled in briefly)
Test name: LP Form 6→3 Fields (Paid Search)
Owner: Jamie
Type: Landing page A/B
Hypothesis: If we reduce the form from 6 fields to 3 for paid search visitors, then signup conversion rate will increase because the effort is lower.
Primary metric: Signup conversion rate = signups / unique visitors (same session).
Guardrails: Lead qualification rate (must not drop > 2pp), page load time (must not worsen > 200ms).
Duration: 14 days (covers two weekends).
Success criteria: Ship if +3% or more lift and guardrails OK.