Practical Bayesian Statistics for Real-World Decisions: From Intuition to Implementation

Mini Case Study: Updating a Belief with a Simple Beta–Binomial Calculation

Chapter 4

Mini Case Study Setup: A Conversion Rate You Need to Decide On

Imagine you run a small e-commerce site and you are considering a change to the checkout flow. The change costs money to implement and maintain, so you only want to ship it if the true conversion rate after the change is likely to be above a minimum acceptable level. You ran a small pilot: you exposed n visitors to the new flow and observed x purchases. Your goal is to update your belief about the unknown conversion probability p using a simple Beta–Binomial calculation, then translate that updated belief into a decision-relevant probability such as “What is the probability p exceeds our threshold?”

In this mini case study we will focus on the mechanics and interpretation of the Beta–Binomial update: how to choose a practical prior, how to compute the posterior parameters, how to summarize the posterior with quantities you can act on, and how to do quick sensitivity checks. We will not re-derive general Bayesian terminology; instead we will treat the Beta prior and Binomial data model as a working tool you can apply immediately.

Why Beta–Binomial Is the Workhorse for Yes/No Outcomes

When each trial results in a success or failure (purchase or no purchase, defect or no defect, click or no click), a Binomial model is a natural starting point: x successes out of n independent trials, with success probability p. The Beta distribution is a convenient prior for p because after observing Binomial data, the posterior is also Beta. This “conjugacy” means you can update beliefs by simple arithmetic on two parameters rather than by numerical integration or simulation.

Practically, Beta–Binomial is useful because it gives you: (1) a full distribution for p, not just a point estimate; (2) a transparent way to encode prior information as pseudo-counts; and (3) closed-form expressions for many summaries (mean, variance) and easy computation for others (tail probabilities, quantiles) using standard statistical functions.
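
If you want to see conjugacy rather than take it on faith, here is a minimal sketch (Python with SciPy and NumPy; the prior Beta(2, 2) and the data x = 7, n = 50 are made-up illustrative values, not the case study data) comparing the closed-form posterior Beta(α + x, β + n − x) against a brute-force grid posterior built from prior density times likelihood:

# Conjugacy check (Python): closed-form update vs. brute-force grid.
# The prior Beta(2, 2) and data x = 7, n = 50 are illustrative values.
import numpy as np
from scipy.stats import beta, binom
a0, b0 = 2, 2
x, n = 7, 50
closed_form = beta(a0 + x, b0 + n - x)  # Beta(9, 45)
p = np.linspace(0.001, 0.999, 999)
unnorm = beta.pdf(p, a0, b0) * binom.pmf(x, n, p)  # prior times likelihood
grid_post = unnorm / (unnorm.sum() * (p[1] - p[0]))  # normalize numerically
print(np.max(np.abs(grid_post - closed_form.pdf(p))))  # small: curves agree up to grid error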

Case Study Data: A Small Pilot with Sparse Evidence

Suppose the pilot exposed n = 200 visitors to the new checkout flow and observed x = 18 purchases. The raw conversion rate is 18/200 = 0.09 (9%). The question is not “Is 9% good?” but “Given this limited sample, what do we believe about the true conversion probability p, and how likely is it to exceed our business threshold?”

Assume the business rule is: ship the change only if there is at least an 80% probability that p is above 8% (0.08). This is a concrete decision criterion that can be computed directly from the posterior distribution.

Step 1: Choose a Prior That Matches Your Situation

The Beta distribution is parameterized by two positive numbers, often written as α (alpha) and β (beta). A helpful interpretation is that α and β behave like prior counts of successes and failures, though you should treat them as an intuition rather than a literal dataset. The prior mean is α/(α+β), and the strength (how concentrated the prior is) grows with α+β.

Option A: Weakly Informative Prior (Let the Pilot Speak)

If you want the pilot to dominate and you do not have strong prior knowledge, a common choice is Beta(1, 1), which is uniform on [0, 1] and corresponds to essentially no preference for any conversion rate. Another weak choice is Beta(0.5, 0.5), the Jeffreys prior, which places more of its mass near 0 and 1; Beta(1, 1) is often easier to explain to stakeholders.

Option B: Prior from Historical Baseline (Encode What You Already Know)

Suppose the current checkout flow historically converts at about 7%, and you have seen enough traffic over time that you feel moderately confident the baseline is around that value. You might encode this as a Beta prior with mean 0.07 and an “effective sample size” m representing how many pseudo-observations you want the prior to contribute. A practical construction is: α = 0.07·m and β = 0.93·m. If you pick m = 50, then α = 3.5 and β = 46.5. This prior says: before seeing the pilot, you believe p is around 7%, with about the weight of 50 trials.
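
As a convenience, you can wrap this construction in a tiny helper. The sketch below is Python; the function name beta_from_mean_strength is ours, not a library function.

# Mean-strength prior construction (Python); the helper name is ours
def beta_from_mean_strength(prior_mean, m):
    # m = alpha + beta is the prior "effective sample size"
    return prior_mean * m, (1 - prior_mean) * m
a0, b0 = beta_from_mean_strength(0.07, 50)
print(a0, b0)  # 3.5 46.5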

In this case study we will compute results under both priors to show how the update works and how sensitive the decision is to reasonable prior choices.

Step 2: Apply the Beta–Binomial Update (The Arithmetic)

If the prior is Beta(α, β) and you observe x successes out of n trials, then the posterior is Beta(α + x, β + n − x). That is the entire update rule.

Posterior with Weak Prior Beta(1, 1)

Prior: α = 1, β = 1. Data: x = 18, n = 200, so failures are n − x = 182. Posterior parameters are: α_post = 1 + 18 = 19 and β_post = 1 + 182 = 183. So p | data ~ Beta(19, 183).

Posterior with Baseline Prior Beta(3.5, 46.5)

Prior: α = 3.5, β = 46.5. Posterior parameters are: α_post = 3.5 + 18 = 21.5 and β_post = 46.5 + 182 = 228.5. So p | data ~ Beta(21.5, 228.5).

Notice what happened: the data added 18 to α and 182 to β. This is why people call α and β “pseudo-counts”: the update looks like adding observed successes and failures to prior successes and failures.

Step 3: Compute Actionable Posterior Summaries

Once you have Beta(α_post, β_post), you can compute summaries that map directly to decisions. The most common are the posterior mean (a smoothed estimate), the posterior variance (uncertainty), and tail probabilities such as P(p > threshold).

Posterior Mean (A Smoothed Conversion Estimate)

The posterior mean is α_post/(α_post + β_post). With Beta(19,183), the mean is 19/(19+183) = 19/202 ≈ 0.0941 (9.41%). With Beta(21.5,228.5), the mean is 21.5/(21.5+228.5) = 21.5/250 = 0.086 (8.6%).

These are both close to the observed 9%, but they differ because the baseline prior pulls the estimate toward 7%. This is not “bias” in a bad sense; it is the explicit incorporation of prior information. The key is whether that prior information is appropriate and whether your decision is robust to it.

Posterior Variance (How Uncertain Are We?)

The Beta variance is (αβ)/[(α+β)^2(α+β+1)]. For Beta(19,183), the variance is (19·183)/[(202^2)(203)]. For Beta(21.5,228.5), it is (21.5·228.5)/[(250^2)(251)]. You do not need to compute these by hand in practice, but it helps to know what drives them: uncertainty shrinks as α+β grows. Here α+β is about 202 or 250, so the posterior is reasonably concentrated, but still wide enough that tail probabilities matter.
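
If you do want the numbers, the formula is two lines of code. This sketch (Python) evaluates the variance and standard deviation for both posteriors, directly from the formula above:

# Posterior variance and standard deviation from the formula above (Python)
def beta_var(a, b):
    return (a * b) / ((a + b) ** 2 * (a + b + 1))
for a, b in [(19, 183), (21.5, 228.5)]:
    v = beta_var(a, b)
    print(a, b, v, v ** 0.5)  # standard deviations near 0.020 and 0.018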

Decision Probability: P(p > 0.08)

Your shipping rule is: ship if P(p > 0.08 | data) ≥ 0.80. For a Beta distribution, this probability is 1 − F(0.08), where F is the Beta cumulative distribution function. In software, this is typically computed via a regularized incomplete beta function. Conceptually, you are measuring how much posterior mass lies above your threshold.

For Beta(19,183), you would compute P(p > 0.08) = 1 − BetaCDF(0.08; 19, 183). For Beta(21.5,228.5), compute 1 − BetaCDF(0.08; 21.5, 228.5). The exact numeric values depend on the CDF evaluation, but the workflow is the same: (1) compute posterior parameters, (2) evaluate the tail probability at the threshold, (3) compare to your decision rule.

Even without exact numbers, you can reason about direction: the weak prior posterior mean is ~9.4%, comfortably above 8%, so P(p > 0.08) is likely fairly high. The baseline prior posterior mean is ~8.6%, only slightly above 8%, so P(p > 0.08) will be lower. Whether it crosses 0.80 is an empirical question you answer with the Beta CDF.

Step 4: Do the Calculation in Practice (Spreadsheet or Code)

In real work you will compute Beta CDFs and quantiles using a tool. Below are minimal examples showing the exact steps. The important part is that the only “Bayesian” arithmetic you do manually is updating α and β; everything else is standard distribution functions.

Python (SciPy) Step-by-Step

from scipy.stats import beta  # Beta distribution
# Data
n = 200
x = 18
threshold = 0.08
# Prior A: weak Beta(1,1)
a0, b0 = 1, 1
a_post = a0 + x
b_post = b0 + (n - x)
p_mean = a_post / (a_post + b_post)
prob_above = 1 - beta.cdf(threshold, a_post, b_post)
# Optional: 90% credible interval endpoints (quantiles)
ci_low, ci_high = beta.ppf([0.05, 0.95], a_post, b_post)
print(a_post, b_post, p_mean, prob_above, ci_low, ci_high)
# Prior B: baseline mean 0.07 with strength m=50
m = 50
a0, b0 = 0.07*m, 0.93*m
a_post = a0 + x
b_post = b0 + (n - x)
p_mean = a_post / (a_post + b_post)
prob_above = 1 - beta.cdf(threshold, a_post, b_post)
ci_low, ci_high = beta.ppf([0.05, 0.95], a_post, b_post)
print(a_post, b_post, p_mean, prob_above, ci_low, ci_high)

This code produces: posterior parameters, posterior mean, probability above the threshold, and a central 90% interval. You can swap 0.05 and 0.95 for other quantiles if you need different bounds.

R Step-by-Step

# Data
n <- 200
x <- 18
threshold <- 0.08
# Prior A: Beta(1,1)
a0 <- 1; b0 <- 1
a_post <- a0 + x
b_post <- b0 + (n - x)
p_mean <- a_post / (a_post + b_post)
prob_above <- 1 - pbeta(threshold, a_post, b_post)
ci <- qbeta(c(0.05, 0.95), a_post, b_post)
c(a_post=a_post, b_post=b_post, mean=p_mean, prob_above=prob_above, ci_low=ci[1], ci_high=ci[2])
# Prior B: baseline mean 0.07, strength m=50
m <- 50
a0 <- 0.07*m; b0 <- 0.93*m
a_post <- a0 + x
b_post <- b0 + (n - x)
p_mean <- a_post / (a_post + b_post)
prob_above <- 1 - pbeta(threshold, a_post, b_post)
ci <- qbeta(c(0.05, 0.95), a_post, b_post)
c(a_post=a_post, b_post=b_post, mean=p_mean, prob_above=prob_above, ci_low=ci[1], ci_high=ci[2])

In R, pbeta gives the CDF and qbeta gives quantiles. The logic mirrors the Python version exactly.

Step 5: Translate the Posterior into a Shipping Decision

Now apply the rule: ship if P(p > 0.08) ≥ 0.80. The computation produces a single number, prob_above, which you compare to 0.80. This is often easier to communicate than a p-value because it directly answers the operational question.

If the weak prior yields prob_above above 0.80 but the baseline prior yields prob_above below 0.80, you have learned something important: the decision is sensitive to prior assumptions. That does not mean the method failed; it means the pilot is not strong enough to overcome reasonable disagreement about the baseline. In that case, you can either (1) collect more data, (2) revisit the prior strength m, or (3) adjust the decision threshold to reflect the cost of being wrong.

Step 6: A Quick Sensitivity Check Using Prior Strength

A practical way to stress-test your decision is to vary the prior strength m while keeping the prior mean fixed. For example, keep the baseline mean at 7% but try m = 10, 50, 200. This answers: “If we are only slightly confident in the baseline, do we ship? What if we are very confident?”

Construct priors as α = 0.07·m and β = 0.93·m, then update with the same data. As m increases, the posterior mean will be pulled closer to 7% and the posterior will become more concentrated around that baseline, making it harder for a small pilot to push P(p > 0.08) above 0.80. As m decreases, the posterior approaches what you get from the weak prior, and the pilot dominates.

# Example loop (Python)
from scipy.stats import beta
n, x = 200, 18
threshold = 0.08
for m in [10, 50, 200]:
    a0, b0 = 0.07*m, 0.93*m
    a_post, b_post = a0 + x, b0 + (n - x)
    prob_above = 1 - beta.cdf(threshold, a_post, b_post)
    mean = a_post / (a_post + b_post)
    print(m, a_post, b_post, mean, prob_above)

This sensitivity check is not academic; it is a practical negotiation tool. You can show stakeholders how much the decision depends on how strongly they believe the baseline is around 7%.

Step 7: Interpreting the Posterior as “Effective Counts”

Because the Beta prior and posterior can be read as pseudo-counts, you can explain the update in plain terms. Under Beta(1,1), the posterior Beta(19,183) can be described as “as if we had seen 18 successes and 182 failures, plus one success and one failure for smoothing.” Under the baseline prior Beta(3.5,46.5), the posterior Beta(21.5,228.5) is “as if we started with about 3.5 successes and 46.5 failures worth of belief, then added the pilot outcomes.”

This framing helps when someone asks, “How much is the prior influencing the result?” You can answer with a number: the prior contributes m pseudo-trials. If m = 50, the pilot’s 200 trials still dominate (the prior carries 50/250 = 20% of the posterior weight), but the prior is not negligible.

Step 8: A Second Mini Scenario: Defect Rate with a Safety Threshold

To reinforce the mechanics, consider a manufacturing example where p is a defect probability. Suppose you inspect n = 40 units from a new supplier and find x = 1 defect. You will accept the supplier if there is at least a 90% probability that the defect rate is below 5% (p < 0.05). This is the same Beta–Binomial update, just with a “less than” threshold.

Pick a weak prior Beta(1,1). Posterior is Beta(1+1, 1+39) = Beta(2,40). The acceptance probability is P(p < 0.05) = BetaCDF(0.05; 2, 40). If that probability is at least 0.90, accept; otherwise, request more inspection or reject.

If you have historical supplier performance suggesting a typical defect rate around 2% with moderate confidence, you might use a prior mean 0.02 with strength m = 100: α = 2, β = 98. Then posterior becomes Beta(3, 137). Compute P(p < 0.05) = BetaCDF(0.05; 3, 137). Again, the update is just adding observed defects and non-defects to α and β.
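
The Python version of this scenario mirrors the earlier code, with the tail direction flipped to “less than”:

# Defect-rate acceptance check (Python), both priors from the text
from scipy.stats import beta
n, x = 40, 1
threshold = 0.05
for a0, b0 in [(1, 1), (2, 98)]:  # weak prior; supplier-history prior
    a_post, b_post = a0 + x, b0 + (n - x)
    prob_below = beta.cdf(threshold, a_post, b_post)  # P(p < 0.05)
    print(a_post, b_post, prob_below)  # accept if prob_below >= 0.90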

Step 9: Common Implementation Pitfalls (and How to Avoid Them)

Mixing Up α and β

α corresponds to successes and β to failures when you use the update α + x and β + (n − x). If you accidentally swap them, your posterior mean becomes (failures)/(total) and everything flips. A quick sanity check is: if x is small, the posterior mean should be small; if x is large, it should be large.
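
A two-line check on the Step 2 posterior makes the flip obvious:

# Swap check (Python): Beta(19, 183) vs. the accidentally swapped Beta(183, 19)
a_post, b_post = 19, 183
print(a_post / (a_post + b_post))  # ~0.094, plausible for 18 successes in 200
print(b_post / (a_post + b_post))  # ~0.906, the flipped mean after a swap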

Using a Prior That Is Too Strong Without Realizing It

If you set m to a very large number, the prior will dominate even large pilots. This can be appropriate (for example, when you truly have extensive historical data that is comparable), but it should be explicit. Always report α+β as “prior effective sample size” so the team can see the weight you are giving it.

Forgetting That the Binomial Model Assumes Identical Trials

The Binomial likelihood assumes each trial has the same p and that trials are independent. In conversion settings, p may vary by traffic source, device type, or day. If the pilot mixes heterogeneous traffic, the Beta–Binomial posterior is still a useful summary, but you should be cautious: the “true p” you are estimating is an average over a mixture. If heterogeneity is large, consider stratifying the data (compute separate posteriors by segment) before aggregating decisions.
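
As a sketch of the stratified approach, suppose the pilot’s 200 visitors split into two segments (the per-segment counts below are hypothetical, chosen only to sum to the pilot totals). You compute a posterior per segment and examine each tail probability separately:

# Stratified posteriors (Python); the per-segment counts are hypothetical
from scipy.stats import beta
threshold = 0.08
segments = {"mobile": (5, 90), "desktop": (13, 110)}  # (x, n) per segment
for name, (x, n) in segments.items():
    a_post, b_post = 1 + x, 1 + (n - x)  # weak Beta(1, 1) prior per segment
    prob_above = 1 - beta.cdf(threshold, a_post, b_post)
    print(name, a_post, b_post, prob_above)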

Confusing the Posterior Mean with the Decision Probability

A posterior mean above 8% does not automatically imply P(p > 0.08) is high. With small n, the distribution can be wide and still place substantial mass below the threshold. Always compute the tail probability that matches your decision rule.
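
A toy example (the small-sample numbers x = 2, n = 20 are made up) makes the distinction concrete: the posterior mean sits well above 8%, yet the tail probability falls short of an 80% bar.

# Mean vs. tail probability (Python); the toy numbers x = 2, n = 20 are made up
from scipy.stats import beta
a_post, b_post = 1 + 2, 1 + 18  # Beta(3, 19) under a Beta(1, 1) prior
print(a_post / (a_post + b_post))  # mean ~0.136, well above 0.08
print(1 - beta.cdf(0.08, a_post, b_post))  # ~0.77, below the 0.80 bar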

Step 10: A Compact Checklist You Can Reuse

  • Define the unknown rate p and the decision threshold (e.g., p > 0.08) plus the required confidence (e.g., 80%).
  • Choose a Beta(α, β) prior: specify prior mean and prior strength m = α+β if using the mean-strength parameterization.
  • Collect data: x successes out of n trials.
  • Update: α_post = α + x, β_post = β + (n − x).
  • Compute decision probability: for “greater than,” use 1 − BetaCDF(threshold; α_post, β_post); for “less than,” use BetaCDF(threshold; α_post, β_post).
  • Run a sensitivity check by varying m (and possibly the prior mean) within a reasonable range.
  • Document α, β, n, x, and the resulting decision probability so the update is auditable (one possible packaging of these steps as a function is sketched below).
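
One way to package the checklist as a reusable routine is sketched below (Python; the function name and interface are ours, not from a library).

# A reusable packaging of the checklist (Python); the name and interface are ours
from scipy.stats import beta
def beta_binomial_decision(a0, b0, x, n, threshold, required_prob,
                           direction="greater"):
    # Update the Beta(a0, b0) prior with x successes in n trials,
    # then compare the relevant tail probability to required_prob.
    a_post, b_post = a0 + x, b0 + (n - x)
    tail_cdf = beta.cdf(threshold, a_post, b_post)
    prob = 1 - tail_cdf if direction == "greater" else tail_cdf
    return {"a_post": a_post, "b_post": b_post,
            "prob": prob, "decide": prob >= required_prob}
# Case study rule: ship if P(p > 0.08) >= 0.80, with the weak Beta(1, 1) prior
print(beta_binomial_decision(1, 1, 18, 200, 0.08, 0.80))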

Now answer the exercise about the content:

In a Beta–Binomial decision setup, which quantity should be compared to a rule like ship only if there is at least an 80% probability that p exceeds 0.08?

The decision rule is defined in terms of a probability statement about p. You compute the posterior, then evaluate the tail probability above the threshold (for greater-than rules: 1 minus the Beta CDF) and compare it to 0.80.

Next chapter

Building Practical Bayesian Models for Proportions and Rates
