
Practical Bayesian Statistics for Real-World Decisions: From Intuition to Implementation


Interpreting Uncertainty: Credible Intervals and Decision-Relevant Probabilities

Chapter 3


What a Credible Interval Actually Means

A Bayesian credible interval is a range of parameter values that contains a specified fraction of the posterior probability. If you report a 95% credible interval for a parameter θ as [a, b], you are saying: given your model and the observed data, the posterior probability that θ lies between a and b is 0.95. This is a probability statement about the unknown quantity itself, conditional on the assumptions you made (model structure, data quality, and prior choices already established earlier in the course).

This interpretation is often the main practical advantage of Bayesian uncertainty summaries: you can directly talk about “how likely” a parameter is to be in a range, and you can use that probability in decision rules. In real-world decisions, you rarely need a single best estimate; you need to understand what values are plausible and how much probability mass lies in regions that trigger different actions.

Credible Intervals vs. Confidence Intervals (Only What You Need for Decisions)

In practice, people often confuse Bayesian credible intervals with frequentist confidence intervals because both look like “a range with 95%.” For decision-making, the key difference is interpretability: a 95% credible interval is a 95% probability statement about the parameter given the data and model; a 95% confidence interval is a procedure that, over repeated hypothetical samples, would cover the true parameter 95% of the time. When you are deciding whether to ship a product, change a policy, or allocate budget, you typically want a probability statement about the unknown right now, not a long-run coverage guarantee of a method.

That said, you should treat credible intervals as conditional on your modeling assumptions. If the model is misspecified, the interval can be misleading. Decision-relevant uncertainty is not just about computing an interval; it is about ensuring the interval reflects the uncertainties that matter (measurement error, selection bias, unmodeled heterogeneity) to the extent your model captures them.

Two Common Types of Credible Intervals: Equal-Tailed and Highest Posterior Density

Equal-tailed credible interval

An equal-tailed 95% credible interval places 2.5% of posterior probability below the lower bound and 2.5% above the upper bound. It is defined by posterior quantiles: [q0.025, q0.975]. This is easy to compute from posterior samples and is stable across many problems.


Highest posterior density (HPD) interval

An HPD (or HDI) interval is the narrowest interval containing 95% posterior probability. Every point inside the interval has posterior density at least as high as any point outside. HPD intervals are often shorter than equal-tailed intervals when the posterior is skewed or multimodal, but they can be harder to compute robustly and can be unintuitive when the posterior has multiple separated peaks (the “interval” might not be a single contiguous range unless you allow sets).

For decision-making, the choice matters when the posterior is asymmetric or when the decision boundary is near a tail. Equal-tailed intervals are often preferred for communication; HPD intervals are often preferred when you want the tightest uncertainty summary. If you are going to convert uncertainty into an action, you usually need more than an interval anyway: you need a probability of exceeding a threshold, or an expected loss calculation.

Step-by-Step: Computing a Credible Interval from Posterior Samples

In applied Bayesian work, you often have posterior draws from MCMC or another sampling method. Credible intervals then become simple operations on these draws. The steps below apply to any parameter: a mean difference, a conversion rate, a risk ratio, a time-to-failure parameter, or a predicted outcome.

Step 1: Collect posterior draws

Assume you have S posterior samples θ(1), θ(2), …, θ(S). These are draws from p(θ | data). You may have them from a probabilistic programming tool or from an analytic posterior you sampled directly.

Step 2: Choose the interval type and level

Pick a probability mass, commonly 0.90, 0.95, or 0.99. Higher levels communicate more caution but produce wider ranges. Decide whether you want equal-tailed quantiles or an HPD interval.

Step 3: Compute the bounds

For an equal-tailed 95% interval, compute the 2.5th and 97.5th percentiles of the samples. For an HPD interval, sort samples and find the shortest window containing 95% of them.

Step 4: Report with context

Report the interval alongside a point summary (often posterior mean or median) and, crucially, interpret it in the language of the decision. If the decision depends on whether θ exceeds a threshold, also report that probability directly.

# Equal-tailed 95% credible interval from posterior draws theta (a 1-D array of S samples)
import numpy as np

level = 0.95
alpha = (1 - level) / 2
lower = np.quantile(theta, alpha)       # 2.5th percentile of the draws
upper = np.quantile(theta, 1 - alpha)   # 97.5th percentile of the draws
median = np.quantile(theta, 0.5)        # point summary to report alongside the interval
prob_above_0 = np.mean(theta > 0)       # decision-relevant probability that theta exceeds 0
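The HPD computation from Step 3 deserves a sketch of its own, since the shortest-window search is less obvious than taking quantiles. A minimal implementation, assuming a roughly unimodal posterior and draws in a NumPy array (the function name here is illustrative):

import numpy as np

def hpd_interval(draws, level=0.95):
    # Shortest contiguous interval containing `level` of the posterior draws.
    # Assumes a roughly unimodal posterior; for multimodal posteriors the true
    # HPD region may be a union of intervals, which this sketch cannot return.
    sorted_draws = np.sort(draws)
    n = len(sorted_draws)
    window = int(np.ceil(level * n))   # number of draws each candidate window must cover
    widths = sorted_draws[window - 1:] - sorted_draws[:n - window + 1]
    start = int(np.argmin(widths))     # index where the shortest window begins
    return sorted_draws[start], sorted_draws[start + window - 1]

hpd_lower, hpd_upper = hpd_interval(theta, level=0.95)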

Decision-Relevant Probabilities: Go Beyond “Is Zero in the Interval?”

A common but weak decision habit is to look at whether a credible interval includes a reference value (like 0 for a difference or 1 for a ratio) and then declare “effect” or “no effect.” Real decisions are rarely binary in that way. What you usually need is a probability that the effect is large enough to matter, or a probability that a risk exceeds a safety limit, or the probability that a new option is better than the current one by a practically meaningful margin.

Decision-relevant probabilities are posterior probabilities of events that map directly to actions. Examples include: P(θ > 0), P(θ > δ), P(risk < limit), P(ROI > 1.2), or P(metric improves by at least 2%). These are computed directly from posterior draws by counting how many draws satisfy the condition.

Step-by-step: Probability of exceeding a practical threshold

Suppose θ is the lift in conversion rate (in percentage points) from a new checkout flow relative to the old one. Your business might only care if lift exceeds δ = 0.5 percentage points, because smaller gains do not cover engineering and rollout costs.

  • Define the event: θ > 0.5.
  • Compute: P(θ > 0.5 | data) ≈ (1/S) Σ I(θ(s) > 0.5).
  • Use it in a rule: ship if this probability exceeds a chosen decision threshold, such as 0.9.
# Probability that the lift clears the practical threshold
import numpy as np

delta = 0.5                                         # minimum meaningful lift, in percentage points
prob_practically_positive = np.mean(theta > delta)  # fraction of posterior draws above delta

This approach is more aligned with real-world constraints than simply checking whether 0 lies inside a 95% interval. It also forces you to specify what “meaningful” means, which is a decision requirement, not a statistical one.

Worked Example: Launch Decision with a Minimum Effect Size

Imagine you ran an experiment comparing a new onboarding flow (B) to the current flow (A). Let θ be the difference in retention at day 7: retention(B) − retention(A), measured in percentage points. After fitting your model, you have posterior draws for θ.

Summaries you might compute

  • Posterior median of θ: 0.8 percentage points.
  • 95% credible interval: [−0.1, 1.7].
  • P(θ > 0): 0.96.
  • P(θ > 0.5): 0.78.

Notice how these numbers support different narratives. The interval includes slightly negative values, so a simplistic “interval includes zero” check might discourage shipping. But P(θ > 0) is high, suggesting improvement is likely. Yet the probability of exceeding the minimum meaningful lift (0.5) is only 0.78, which might be below your organization’s bar for rollout.

Decision-relevant interpretation could be: “There is a 96% chance retention improved, but only a 78% chance the improvement is at least 0.5 points. If rollout is costly or risky, we may want more evidence; if rollout is cheap and reversible, we might proceed.” The same posterior supports different actions depending on costs, reversibility, and risk tolerance.
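All four summaries in this example come from the same vector of posterior draws. A minimal sketch, where theta is assumed to hold the draws of the retention difference:

import numpy as np

median = np.quantile(theta, 0.5)                          # e.g., 0.8 percentage points
ci_lower, ci_upper = np.quantile(theta, [0.025, 0.975])   # e.g., [-0.1, 1.7]
prob_improved = np.mean(theta > 0)                        # e.g., 0.96
prob_meaningful = np.mean(theta > 0.5)                    # e.g., 0.78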

Credible Intervals for Predictions (Not Just Parameters)

Many decisions depend on future outcomes, not on a parameter in isolation. Bayesian analysis naturally supports predictive uncertainty: you can compute credible intervals for predicted quantities such as next month’s demand, the number of incidents next week, or the expected revenue under a policy change. These are often more decision-relevant than parameter intervals because they incorporate both parameter uncertainty and outcome variability.

For example, suppose you forecast weekly demand. A credible interval for the mean demand might be narrow, but a predictive interval for actual demand will be wider because it includes randomness in outcomes. If you are deciding inventory levels, staffing, or safety stock, you need the predictive distribution, not just the uncertainty about the mean.

Step-by-step: Predictive interval from posterior predictive draws

  • For each posterior draw of parameters, simulate a future outcome ỹ(s) from the model.
  • Collect S simulated outcomes ỹ(1), …, ỹ(S).
  • Compute quantiles of ỹ to form a predictive credible interval (often called a posterior predictive interval).
  • Compute probabilities of operational events: P(ỹ > capacity), P(stockout), P(wait time > SLA).
# Predictive interval and operational risk from posterior predictive draws y_tilde
# (one simulated future outcome per posterior draw of the parameters)
import numpy as np

lower = np.quantile(y_tilde, 0.05)                  # 90% predictive interval, lower bound
upper = np.quantile(y_tilde, 0.95)                  # 90% predictive interval, upper bound
prob_exceed_capacity = np.mean(y_tilde > capacity)  # probability of an operational event
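The generation step that the comment above glosses over looks like this in a minimal sketch; the Poisson outcome model and the array name lambda_draws are assumptions for illustration, not part of the example above:

import numpy as np

rng = np.random.default_rng(0)
# lambda_draws: assumed posterior draws of mean weekly demand (one per posterior sample)
y_tilde = rng.poisson(lam=lambda_draws)  # combines parameter uncertainty with outcome randomness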

Choosing the Probability Level: 90%, 95%, 99% Is a Decision Choice

The level of a credible interval is not a universal standard; it is a communication and decision choice. Wider intervals (99%) emphasize caution and worst-case planning; narrower intervals (90%) emphasize typical ranges and may be more useful for fast iteration. The right level depends on the cost of being wrong and on how asymmetric the consequences are.

In operational settings, you might use different levels for different decisions. For example, you might plan staffing using a 90% predictive interval (accepting occasional overload) but set safety constraints using a 99% interval (rarely tolerating violations). The Bayesian workflow makes it straightforward to compute any level once you have posterior samples.
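Computing several levels from the same draws is immediate; a minimal sketch, assuming posterior predictive demand draws in y_tilde:

import numpy as np

staffing_low, staffing_high = np.quantile(y_tilde, [0.05, 0.95])   # 90%: routine staffing plans
safety_low, safety_high = np.quantile(y_tilde, [0.005, 0.995])     # 99%: safety constraints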

One-Sided Credible Bounds for Safety and Compliance

Many decisions are one-sided: you care that a failure rate is below a limit, that a toxicity probability is below a threshold, or that latency is under an SLA. In those cases, a one-sided credible bound is often clearer than a two-sided interval.

A 95% upper credible bound u for a risk r means P(r ≤ u | data) = 0.95. If your policy is “approve only if the 95% upper bound is below the regulatory limit,” you have a transparent, probability-based rule. Similarly, a 95% lower bound can support decisions that require minimum performance.

Step-by-step: Compute a one-sided bound from posterior draws

  • Upper bound at 95%: u = q0.95 of the posterior draws.
  • Lower bound at 95%: l = q0.05 of the posterior draws.
  • Compare to a limit: approve if u < limit (for safety) or if l > target (for performance).
# One-sided 95% upper credible bound on a risk, compared against a regulatory limit
import numpy as np

upper_95 = np.quantile(risk_draws, 0.95)  # P(risk <= upper_95 | data) ≈ 0.95
approve = upper_95 < risk_limit           # transparent, probability-based approval rule

From Intervals to Decisions: Expected Loss and Utility (Practical Framing)

Credible intervals summarize uncertainty, but decisions require a criterion. A practical Bayesian approach is to compute expected loss (or expected utility) under each action using the posterior. You do not need a complicated framework to start; you can define a simple cost function that reflects what you care about.

Suppose action A is “do nothing” and action B is “roll out the change.” Let θ be the true lift in revenue per user. If rollout costs C and the benefit is proportional to θ, your net gain might be G = k·θ − C. Under the posterior, you can compute P(G > 0) and E[G]. You might choose the action with higher expected gain, or require that the probability of loss is below a tolerance.

Step-by-step: Expected net gain from posterior draws

  • For each posterior draw θ(s), compute gain G(s) = k·θ(s) − C.
  • Estimate expected gain: E[G] ≈ mean(G(s)).
  • Estimate risk of loss: P(G < 0) ≈ mean(G(s) < 0).
  • Decide using a rule aligned with your risk tolerance.
# Expected net gain and downside risk of rollout, across posterior draws of theta
import numpy as np

G = k * theta - C           # net gain implied by each posterior draw
expected_gain = np.mean(G)  # posterior expected net gain of rolling out
prob_loss = np.mean(G < 0)  # probability the rollout loses money

This connects uncertainty directly to business or policy outcomes. Notice that an interval alone does not tell you expected gain; two posteriors can have the same 95% interval but very different tail behavior and therefore different downside risk.

Communicating Uncertainty to Stakeholders: Use Plain-Language Probability Statements

Credible intervals are useful, but many stakeholders make better decisions when you translate them into probability statements tied to outcomes. Instead of saying “the 95% credible interval is [−0.1, 1.7],” you can say “there is a 96% chance the change improves retention, and a 78% chance it improves retention by at least 0.5 points.” This is still rigorous, but it is easier to act on.

A practical communication pattern is to present three layers: a point estimate (median), an interval (e.g., 90% or 95%), and one or two decision-relevant probabilities (exceeding a meaningful threshold, violating a constraint, or beating the status quo). This keeps uncertainty visible without forcing non-technical readers to interpret quantiles.

Common Pitfalls When Interpreting Credible Intervals

Pitfall 1: Treating the interval as a guarantee

A 95% credible interval does not guarantee the true value is inside; it states posterior probability under your model. If important sources of uncertainty are missing (unmodeled confounding, data drift, measurement bias), the interval can be overconfident. In decision contexts, overconfidence is often more harmful than imprecision.

Pitfall 2: Using “includes zero” as the decision rule

Whether 0 is inside a 95% interval is not a decision criterion unless your utility function is aligned with that exact rule. Most decisions depend on effect size, costs, and downside risk. Replace “includes zero” with probabilities of exceeding meaningful thresholds or with expected loss calculations.

Pitfall 3: Ignoring asymmetry and tail risk

Two distributions can share the same median and similar intervals but differ in tail probability. If rare negative outcomes are costly (safety, compliance, reputational risk), you should compute explicit tail probabilities such as P(θ < −δ) or P(loss > L). Credible intervals can hide tail structure if you only report one range.
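Tail probabilities are one line each once you have draws; a minimal sketch, where theta holds effect draws and loss_draws holds simulated losses (both names illustrative), with delta and L set by the decision context:

import numpy as np

prob_meaningful_harm = np.mean(theta < -delta)  # P(theta < -delta): meaningfully negative effect
prob_large_loss = np.mean(loss_draws > L)       # P(loss > L): rare but costly outcome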

Pitfall 4: Confusing parameter uncertainty with outcome uncertainty

A credible interval for a parameter (like the average effect) is not the same as the uncertainty in future outcomes. For planning and operations, you often need posterior predictive uncertainty, which is typically wider. If you staff based on a parameter interval, you may systematically under-prepare for variability.

Checklist: Turning Posterior Uncertainty into Actionable Numbers

When you finish a Bayesian analysis and need to support a decision, you can use the following checklist to ensure you are extracting decision-relevant uncertainty rather than just reporting a generic interval.

  • Compute a credible interval for the key quantity (parameter or prediction) at a level appropriate to the decision.
  • Define a practical threshold δ (minimum effect size, safety limit, SLA) and compute the posterior probability of meeting it.
  • Compute a downside probability for the main risk (e.g., probability of harm, probability of loss, probability of exceeding capacity).
  • If possible, compute expected loss or expected net gain under each action using posterior draws.
  • Report results in plain language: “probability of benefit,” “probability of meeting target,” and “probability of violating constraint,” alongside the interval.

Now answer the exercise about the content:

Which statement best explains why decision-making often benefits from computing P(θ > δ) in addition to reporting a 95% credible interval?


A credible interval summarizes where posterior mass lies, but actions often require the probability that the effect clears a meaningful cutoff (for costs, safety, or targets). Computing P(θ > δ) maps uncertainty directly to the decision rule.

Next chapter

Mini Case Study: Updating a Belief with a Simple Beta–Binomial Calculation
