Why experimentation (incrementality) complements attribution
Attribution helps you assign credit for conversions across touchpoints. Experimentation helps you answer a different question: did this change cause additional results that would not have happened otherwise? That “additional results” idea is called incrementality.
Incrementality testing is especially useful when:
- Multiple channels touch the same people and attribution feels “too good to be true.”
- You are changing something (a page, creative, budget) and want causal proof.
- You need confidence to scale spend or roll back a change.
The core logic is simple: compare a treatment group (exposed to the change) to a control group (not exposed), while keeping everything else as similar as possible.
Step 1: Write a clear hypothesis
A good hypothesis is specific, measurable, and tied to a user action. Use this structure:
- Change: what you will do differently
- Audience/scope: who/where it applies
- Expected impact: direction and (optionally) size
- Primary metric: what will determine success
Examples
- Landing page: “If we shorten the signup form from 6 fields to 3 fields for paid search traffic, then signup conversion rate will increase.”
- Email: “If we use a benefit-led subject line for trial users, then email click-through rate will increase.”
- Budget: “If we increase Brand Search spend by 20% in selected regions, then total purchases will increase versus similar regions with no change.”
Keep hypotheses focused on one change. If you change multiple things at once, you may learn that performance changed, but not why.
Step 2: Choose a primary metric (and define it)
Your primary metric is the single number that decides whether the test “wins.” Pick a metric that is:
- Closest to value (e.g., purchases, qualified leads)
- Sensitive enough to move during the test window
- Hard to game (avoid metrics that can rise while real outcomes fall)
Common primary metrics by test type
- Landing page A/B: conversion rate to signup/purchase, or revenue per visitor
- Creative test: cost per acquisition (CPA) or conversion rate from click to purchase (if you can measure it reliably)
- Email subject line: click-through rate (CTR) or downstream conversions from email traffic
- Budget holdout / geo split: incremental purchases, incremental revenue, or incremental leads
Write the metric definition in plain language (what counts, what time window, what population). Example: “Primary metric = purchases within 7 days of first session during the test period.”
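To make the definition unambiguous, it can help to express it as a small calculation. Below is a minimal sketch in Python, assuming a hypothetical list of per-user records with a first-session timestamp and an optional purchase timestamp (the field names and dates are illustrative, not taken from any specific analytics tool):

```python
from datetime import datetime, timedelta

# Hypothetical per-user records (assumed structure, not a specific analytics export).
users = [
    {"user_id": "u1", "first_session": datetime(2024, 3, 1), "purchase_at": datetime(2024, 3, 5)},
    {"user_id": "u2", "first_session": datetime(2024, 3, 2), "purchase_at": None},
    {"user_id": "u3", "first_session": datetime(2024, 3, 3), "purchase_at": datetime(2024, 3, 20)},
]

window = timedelta(days=7)  # "purchases within 7 days of first session"
converted = sum(
    1 for u in users
    if u["purchase_at"] is not None and u["purchase_at"] - u["first_session"] <= window
)
print(f"{converted} of {len(users)} users purchased within 7 days ({converted / len(users):.1%})")
```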
Step 3: Set up control vs treatment
Every incrementality test needs a comparison. The best comparison depends on what you can randomize.
Control vs treatment basics
- Treatment: users/regions/campaigns that receive the change
- Control: users/regions/campaigns that do not receive the change
- Randomization: assign units to control/treatment randomly when possible (reduces bias)
In many digital tests, the “unit” is a user (A/B testing). In other cases, the unit might be a region (geo split) or a campaign budget (holdout).
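For user-level tests, a common way to randomize is deterministic hashing of a user ID, so each user always lands in the same group. The sketch below is one minimal way to do it, assuming a string user ID and a 50/50 split; the salt string and function name are illustrative:

```python
import hashlib

def assign_variant(user_id: str, salt: str = "lp-form-test") -> str:
    """Deterministically assign a user to control or treatment (50/50 split).

    Hash-based assignment keeps each user in the same group on every visit;
    the salt keeps assignments independent across different tests.
    """
    bucket = int(hashlib.md5(f"{salt}:{user_id}".encode()).hexdigest(), 16) % 100
    return "treatment" if bucket < 50 else "control"

print(assign_variant("user-123"))
```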
What you compare
At the end, you compare the primary metric between groups. The difference is your estimated incremental lift:
Incremental lift = (Treatment outcome) - (Control outcome)

Sometimes you’ll express it as a percent:
% lift = (Treatment - Control) / Control
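As a concrete illustration, here is a minimal Python sketch with assumed conversion rates (the numbers are made up for the example):

```python
# Hypothetical results (assumed numbers, for illustration only).
control_rate = 0.040    # 4.0% conversion in the control group
treatment_rate = 0.046  # 4.6% conversion in the treatment group

incremental_lift = treatment_rate - control_rate  # absolute lift
pct_lift = incremental_lift / control_rate        # relative (%) lift

print(f"absolute lift: {incremental_lift:.1%}, relative lift: {pct_lift:.0%}")
```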
Step 4 (high level): Minimum Detectable Effect (MDE)
Minimum Detectable Effect is the smallest lift your test is likely to detect as “real” given your traffic/volume and how noisy the metric is. You don’t need advanced statistics to use the concept:
- If your MDE is 5%, a true lift of 1% will probably look like random fluctuation.
- If you expect only a small improvement, you typically need more volume (more visitors, more emails, more time) to detect it.
Use MDE as a planning check:
- Low volume? Choose bigger changes, longer run time, or a more sensitive metric (e.g., conversion rate instead of revenue if revenue is too sparse).
- High volume? You can test smaller improvements and still detect them.
Practical rule of thumb: if you can’t realistically run long enough to detect the lift you care about, redesign the test (bigger change, broader audience) before launching.
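If you want a rough planning number, the standard two-proportion sample-size formula gives an approximate visitors-per-group estimate. The sketch below assumes a conversion-rate metric, a two-sided test at 5% significance and 80% power; the baseline rate and target lift are illustrative:

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(baseline_rate: float, mde_relative: float,
                          alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate visitors needed per group to detect a relative lift (two-sided test).

    Standard two-proportion formula; treat the result as a planning estimate,
    not a guarantee.
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + mde_relative)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# e.g. 4% baseline conversion, hoping to detect a 5% relative lift (4.0% -> 4.2%)
print(sample_size_per_group(0.04, 0.05))  # roughly 150,000+ visitors per group
```

Small relative lifts on low baseline rates require very large samples, which is why low-volume tests should aim for bigger changes or more sensitive metrics.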
Step 5: Add guardrail metrics (to prevent “winning the wrong way”)
Guardrail metrics are secondary metrics that must not get worse beyond an acceptable threshold. They protect against unintended consequences.
Examples of guardrails
- Landing page test: page load time, bounce rate, refund rate, lead quality rate
- Creative test: frequency, negative feedback rate, unsubscribe rate (if applicable)
- Email subject line: unsubscribe rate, spam complaint rate
- Budget increase: CPA, conversion rate, return rate, customer support contacts
Guardrails should be defined with a clear “do not exceed” rule (e.g., “unsubscribe rate must not increase by more than 0.1 percentage points”).
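One way to make “do not exceed” rules operational is to write each guardrail as a simple threshold check. A minimal sketch, with assumed metric values and thresholds:

```python
# Hypothetical guardrail readings against pre-agreed "do not exceed" thresholds
# (metric names, values, and thresholds are assumed for illustration).
guardrails = {
    # metric: (control value, treatment value, max allowed worsening)
    "unsubscribe_rate": (0.0020, 0.0024, 0.0010),  # absolute rate, +0.1pp allowed
    "page_load_time_s": (1.80, 2.10, 0.20),        # seconds, +200ms allowed
}

for name, (control, treatment, max_worsening) in guardrails.items():
    change = treatment - control
    status = "BREACH" if change > max_worsening else "ok"
    print(f"{name}: +{change:.4f} ({status})")
```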
Practical test types (with beginner-friendly steps)
1) Landing page A/B test
Goal: isolate whether a page change improves conversions.
Step-by-step
- Pick one main change: headline, form length, CTA copy, layout, trust badges, pricing display.
- Create Variant A (control): current page.
- Create Variant B (treatment): page with the single change.
- Split traffic: typically 50/50 for speed and simplicity.
- Primary metric: conversion rate (e.g., signups/visitors) or revenue per visitor.
- Guardrails: page speed, downstream quality (e.g., % qualified leads).
- Run time: run through at least one full business cycle (often 1–2 weeks) so weekday/weekend behavior is represented.
- Decision: ship if primary metric improves and guardrails stay within limits.
Beginner pitfall: changing both the headline and the form and the hero image at once. If it wins, you won’t know what caused it.
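When it is time to decide, a two-proportion z-test is one common way to check whether the difference in conversion rate looks larger than random noise. A minimal sketch, with assumed signup counts and traffic:

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Compare conversion rates of control (A) and treatment (B)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)      # pooled rate under the null hypothesis
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided
    return p_b - p_a, (p_b - p_a) / p_a, p_value

abs_lift, rel_lift, p = two_proportion_z_test(conv_a=240, n_a=6000, conv_b=288, n_b=6000)
print(f"absolute lift {abs_lift:.2%}, relative lift {rel_lift:.0%}, p-value {p:.3f}")
```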
2) Creative testing (ads)
Goal: learn which message or visual drives better outcomes at similar delivery conditions.
Step-by-step
- Choose one creative variable: hook, offer, image style, call-to-action, length.
- Keep targeting and budget consistent: avoid changing audience and creative simultaneously.
- Run creatives in parallel: same time window to avoid seasonality effects.
- Primary metric: ideally conversions or CPA; if conversion volume is low, use a closer proxy (e.g., landing page view rate) but treat it as directional.
- Guardrails: frequency, negative feedback, conversion rate from click to conversion (to avoid clickbait).
- Decision: promote the winner and archive losers; document what message angle worked.
Beginner pitfall: judging creative only by click-through rate. A creative can attract clicks that don’t convert.
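That pitfall is easy to see in numbers: a creative can win on clicks and still lose on cost per acquisition. A minimal sketch with assumed delivery results:

```python
# Hypothetical delivery results for two creatives (assumed numbers).
creatives = {
    "hook_a": {"spend": 1_200.0, "clicks": 2_400, "purchases": 60},
    "hook_b": {"spend": 1_200.0, "clicks": 3_100, "purchases": 52},
}

for name, c in creatives.items():
    cpa = c["spend"] / c["purchases"]                 # cost per acquisition
    click_to_purchase = c["purchases"] / c["clicks"]  # post-click conversion rate
    print(f"{name}: {c['clicks']} clicks, CPA ${cpa:.2f}, click-to-purchase {click_to_purchase:.2%}")
```

Here the second creative draws more clicks but converts worse, so it costs more per purchase.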
3) Budget holdout test (incrementality of spend)
Goal: estimate how much incremental value a channel/campaign produces by holding out spend for a portion of the audience or time.
Two common approaches
- Audience holdout: exclude a random slice of eligible users from seeing ads (control), while the rest can be targeted (treatment).
- Time-based holdout: pause or reduce spend during selected periods and compare to similar periods (less ideal because time effects can confound results).
Step-by-step (audience holdout)
- Define eligible audience: who would normally be targeted.
- Create holdout group: a random percentage (e.g., 10–20%) that receives no ads from that campaign.
- Run as usual for treatment: keep creative and bidding stable.
- Primary metric: purchases/revenue/leads per eligible user (not per click).
- Guardrails: overall CPA, customer mix, brand search volume (if relevant).
- Compute incrementality: compare outcomes per eligible user between treatment and holdout.
Beginner pitfall: comparing “ad-attributed conversions” between groups. Holdouts are about total outcomes, not attributed ones.
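A minimal sketch of the holdout comparison, using assumed counts (note the denominator is eligible users, not clicks or ad-attributed conversions):

```python
# Hypothetical counts per group (assumed numbers); denominators are eligible users.
treatment_users, treatment_purchases = 90_000, 2_150
holdout_users, holdout_purchases = 10_000, 210

rate_treatment = treatment_purchases / treatment_users
rate_holdout = holdout_purchases / holdout_users

incremental_rate = rate_treatment - rate_holdout            # extra purchases per eligible user
incremental_purchases = incremental_rate * treatment_users  # scaled to the treated population
pct_lift = incremental_rate / rate_holdout

print(f"purchase rate: treatment {rate_treatment:.2%} vs holdout {rate_holdout:.2%}")
print(f"estimated incremental purchases: {incremental_purchases:.0f} ({pct_lift:.0%} lift)")
```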
4) Geo split test (conceptual)
Goal: test incrementality when user-level randomization is hard by splitting by geography (cities, DMAs, regions).
Conceptual setup
- Select matched geos: choose regions with similar baseline performance and seasonality.
- Assign geos: some are treatment (change applied), others are control (no change).
- Run simultaneously: same dates to reduce time-based confounding.
- Primary metric: conversions or revenue per capita / per site visitor from those geos.
- Guardrails: shipping constraints, stockouts, customer support load (geo differences can matter).
Beginner pitfall: picking one “big city” as treatment and one “small city” as control. Mismatched geos can make results misleading.
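At its simplest, the read-out is a comparison of the same per-capita metric across the two groups of geos during the same dates. A minimal sketch with assumed, illustrative numbers (in practice you would also want matched baselines and more geos per group):

```python
# Hypothetical conversions per 1,000 residents during the test window (assumed numbers).
treatment_geos = {"Geo A": 1.8, "Geo B": 2.1, "Geo C": 1.9}
control_geos = {"Geo D": 1.6, "Geo E": 1.7, "Geo F": 1.8}

def avg(per_geo: dict) -> float:
    return sum(per_geo.values()) / len(per_geo)

lift = avg(treatment_geos) - avg(control_geos)
print(f"treatment avg {avg(treatment_geos):.2f} vs control avg {avg(control_geos):.2f} "
      f"-> lift of {lift:.2f} conversions per 1,000 residents")
```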
5) Email subject line test
Goal: improve engagement and downstream actions by testing subject lines.
Step-by-step
- Choose one variable: curiosity vs direct, personalization, urgency, length.
- Randomly split recipients: A gets subject line A, B gets subject line B.
- Keep everything else identical: sender name, send time, email body.
- Primary metric: click-through rate or conversions from email traffic.
- Guardrails: unsubscribe rate, spam complaints.
- Run time: long enough for most opens/clicks to occur (often 24–72 hours depending on your list behavior).
- Decision: send the winning subject line to the remaining audience (if you’re doing a staged send) or use it as the new default for future campaigns.
Beginner pitfall: changing both the subject line and the offer in the email body, then attributing the lift to the subject line.
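A minimal sketch of the random recipient split, assuming a hypothetical mailing list; the seed is fixed only so the example is reproducible:

```python
import random

# Hypothetical recipient list (assumed data).
recipients = [f"user{i}@example.com" for i in range(10_000)]

random.seed(42)
random.shuffle(recipients)
half = len(recipients) // 2
group_a, group_b = recipients[:half], recipients[half:]  # A gets subject line A, B gets subject line B

print(len(group_a), "recipients in A,", len(group_b), "recipients in B")
```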
Test hygiene: rules that prevent misleading results
1) One main change
Design the test so you can explain the result with one sentence: “We changed X, and Y happened.” If you must change multiple things, treat it as a package test and document that you learned about the package, not each component.
2) Pre-define success criteria
Before launching, write:
- Primary metric and required direction (increase/decrease)
- Minimum improvement you care about (e.g., “at least +3% conversion rate lift”)
- Guardrail thresholds (e.g., “unsubscribe rate not worse than +0.1pp”)
- Decision rule (ship, iterate, or stop)
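Writing the decision rule down before launch can be as simple as a small function the team agrees on in advance. A minimal sketch, with assumed thresholds:

```python
def decide(primary_lift_pct: float, guardrails_ok: bool, min_lift_pct: float = 0.03) -> str:
    """Apply a pre-agreed decision rule (thresholds here are assumed for illustration)."""
    if not guardrails_ok:
        return "stop or iterate: a guardrail was breached"
    if primary_lift_pct >= min_lift_pct:
        return "ship"
    return "iterate: lift below the minimum improvement you care about"

print(decide(primary_lift_pct=0.05, guardrails_ok=True))
```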
3) Sufficient run time
Run long enough to:
- Cover normal variability (weekday/weekend, paydays, typical cycles)
- Accumulate enough conversions to make the result stable
If volume is low, prefer fewer, larger tests over many tiny tests that never reach clarity.
4) Avoid peeking
“Peeking” means checking results repeatedly and stopping the test the moment it looks good. This increases the chance you ship a false winner. Instead:
- Set a planned end date (or required sample size) in advance.
- Only make decisions at the planned checkpoint(s).
- If you must monitor, monitor for breakage (tracking issues, site errors), not for early wins.
A simple test documentation template (idea → decision)
Copy/paste and fill this in for every test. Keeping a consistent template makes your testing program easier to manage and easier to learn from.
| Section | What to write |
|---|---|
| Test name | Short, searchable name (e.g., “LP Form 6→3 Fields”) |
| Owner | Who is responsible for setup, QA, and decision |
| Background | What problem/opportunity prompted the test (1–2 sentences) |
| Hypothesis | If [change], then [primary metric] will [increase/decrease] for [audience] because [reason]. |
| Type | Landing page A/B, creative test, email subject line, budget holdout, geo split |
| Unit of randomization | User, session, email recipient, region, campaign audience |
| Control | What stays as-is (include screenshots/links if relevant) |
| Treatment | What changes (exactly one main change if possible) |
| Primary metric | Name + definition (numerator/denominator, time window) |
| Guardrail metrics | List + thresholds (what “must not get worse”) |
| Planned duration | Start/end dates or required sample size; note business cycle coverage |
| MDE (high level) | Smallest lift you want to be able to detect; note if test is underpowered |
| Success criteria | “Ship if primary metric improves by at least X and guardrails are within limits.” |
| QA checklist | Tracking verified, variant assignment works, pages load, emails render, budgets applied correctly |
| Results snapshot | Control vs treatment values, absolute difference, % lift |
| Decision | Ship / iterate / stop + what you will do next |
| Learnings | What you learned about users/messages/offers (not just “B won”) |
Example (filled in briefly)
Test name: LP Form 6→3 Fields (Paid Search)
Owner: Jamie
Type: Landing page A/B
Hypothesis: If we reduce the form from 6 fields to 3 for paid search visitors, then signup conversion rate will increase because the effort is lower.
Primary metric: Signup conversion rate = signups / unique visitors (same session).
Guardrails: Lead qualification rate (must not drop > 2pp), page load time (must not worsen > 200ms).
Duration: 14 days (covers two weekends).
Success criteria: Ship if +3% or more lift and guardrails OK.