What “Regression with Uncertainty” Means in Practice
Regression is often introduced as “predicting an average outcome given inputs,” but real decisions need more than an average. Regression with uncertainty means you treat the prediction as a distribution: a range of plausible outcomes with probabilities attached. In Bayesian regression, uncertainty shows up in two places: uncertainty about the model parameters (for example, the slope relating price to demand) and uncertainty in the outcome even if parameters were known (for example, day-to-day demand noise). A predictive distribution combines both, letting you answer decision questions like “What is the probability next week’s demand exceeds capacity?” or “What is the expected profit if we set price to $X, accounting for uncertainty?”
Two Kinds of Uncertainty: Parameter vs. Outcome Noise
When you fit a regression model, you estimate parameters such as an intercept and coefficients. With finite data, those parameters are not known exactly. Parameter uncertainty is the uncertainty in those coefficients after seeing data. Separately, even if you knew the true coefficients, outcomes vary due to unobserved factors and randomness; this is outcome noise (also called residual variance). A predictive distribution for a new observation must include both: it averages over plausible parameter values and then adds the residual variability. In practice, this is the difference between a narrow “confidence band around the mean prediction” and a wider “prediction interval for actual outcomes.” For decisions, the prediction interval is usually what matters because costs and constraints depend on realized outcomes, not just the mean.
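As a minimal numerical sketch of this difference (made-up posterior draws standing in for real sampler output), the prediction interval is visibly wider than the band around the mean:

import numpy as np

rng = np.random.default_rng(1)
n_draws = 4000

# Made-up posterior draws for a one-feature model (stand-ins for real sampler output)
intercept = rng.normal(50.0, 2.0, n_draws)       # parameter uncertainty
slope = rng.normal(-3.0, 0.4, n_draws)
sigma = np.abs(rng.normal(8.0, 1.0, n_draws))    # residual (outcome) noise

x_new = 10.0
mu = intercept + slope * x_new                   # distribution of the MEAN outcome at x_new
y_new = rng.normal(mu, sigma)                    # distribution of a REALIZED outcome at x_new

print("80% band for the mean:        ", np.quantile(mu, [0.10, 0.90]).round(1))
print("80% prediction interval for y:", np.quantile(y_new, [0.10, 0.90]).round(1))
# The second interval is wider: it adds residual noise on top of parameter uncertainty.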
From Point Forecasts to Predictive Distributions
A point forecast answers “What is the most likely value?” A predictive distribution answers “What values could happen, and how likely are they?” This changes how you communicate and act. For example, a point forecast of 1,000 orders tomorrow is not actionable without context: if there is a 30% chance of 1,400 orders, staffing decisions differ from a scenario where orders are almost surely between 950 and 1,050. Predictive distributions also allow asymmetric decision-making: if under-staffing is much more costly than over-staffing, you may choose a staffing level based on a high quantile (say the 90th percentile) rather than the mean.
Core Object: The Posterior Predictive Distribution
The key output of Bayesian regression for forecasting is the posterior predictive distribution. Conceptually, you generate it by: (1) drawing a set of plausible regression parameters from the posterior, (2) using those parameters to compute the mean outcome for a new input, and (3) drawing a new outcome by adding residual noise according to the likelihood. Repeating this many times yields a distribution of possible outcomes. This distribution directly supports probability statements such as “There is a 12% chance revenue falls below $50k” or “There is an 80% chance churn stays under 3%.”
Choosing a Regression Likelihood That Matches the Data
Predictive distributions are only as good as the likelihood you choose for the outcome. For continuous outcomes like delivery time or revenue per user, a Normal likelihood is common, but it can be fragile if there are outliers or heavy tails. A Student-t likelihood often improves robustness by allowing occasional extreme values without distorting the fit. For counts (orders, signups), a Poisson likelihood is a starting point, but real data often show over-dispersion (variance larger than mean), where a Negative Binomial likelihood is more realistic. For binary outcomes (purchase/no purchase), logistic regression yields predictive probabilities rather than numeric outcomes. The practical rule: pick a likelihood that captures the scale, constraints (nonnegative, bounded), and tail behavior of your outcome, because that choice shapes the predictive distribution you will act on.
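As a quick illustration of how the likelihood shapes tail behavior, the sketch below (with hypothetical parameters) draws from a Normal, a Student-t, and a Negative Binomial that share roughly the same mean and compares an extreme quantile:

import numpy as np

rng = np.random.default_rng(2)
n_draws, mean = 20000, 500.0

normal_draws = rng.normal(mean, 50.0, n_draws)
student_t_draws = mean + 50.0 * rng.standard_t(df=3, size=n_draws)   # heavier tails
alpha = 10.0                                                          # hypothetical dispersion
negbin_draws = rng.negative_binomial(alpha, alpha / (alpha + mean), n_draws)

for name, draws in [("Normal", normal_draws), ("Student-t", student_t_draws),
                    ("Negative Binomial", negbin_draws)]:
    print(f"{name:>17}  99.5th percentile: {np.quantile(draws, 0.995):.0f}")

The extreme quantiles differ substantially even though the means are comparable, which is exactly what matters when decisions hinge on tail events.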
Step-by-Step: Building a Predictive Distribution for a Continuous Outcome
This walkthrough outlines how to go from data to actionable forecasts for a continuous target such as “daily demand” or “time to resolve a ticket.” The focus is on what you compute and how you use it, not on re-deriving Bayes’ rule.
Step 1: Define the prediction target and decision context
Write down what you need to predict and what decision it informs. Example: predict next-day demand to choose staffing. Identify the cost of under- and over-staffing, or at least the operational constraint (capacity). This determines whether you will use the mean, a quantile, or an expected-utility calculation from the predictive distribution.
Step 2: Choose features and a baseline model
Select inputs that are available at prediction time: day-of-week, marketing spend, price, seasonality indicators, backlog, etc. Start with a simple linear model for interpretability, then add structure if needed (interactions, nonlinear terms, hierarchical effects). Keep a clear separation between features known at forecast time and those only known afterward.
Step 3: Fit the model and obtain posterior draws
Fit the Bayesian regression model using an inference method that yields posterior samples (for example, MCMC) or an approximation that still allows sampling (for example, variational inference with sampling). The practical output you want is a set of draws of coefficients and noise parameters. These draws represent plausible worlds consistent with your data and modeling assumptions.
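A minimal fitting sketch, assuming PyMC (version 4 or later) as the probabilistic programming tool; any tool that returns posterior samples works the same way, and the priors and data here are placeholders:

import numpy as np
import pymc as pm

# Placeholder training data: replace with your real feature matrix and outcomes
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 3))
y = 50 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(0, 5, 200)

with pm.Model() as model:
    intercept = pm.Normal("intercept", mu=0, sigma=50)
    beta = pm.Normal("beta", mu=0, sigma=10, shape=X.shape[1])
    sigma = pm.HalfNormal("sigma", sigma=10)
    mu = intercept + pm.math.dot(X, beta)
    pm.Normal("y_obs", mu=mu, sigma=sigma, observed=y)
    idata = pm.sample(1000, tune=1000, chains=4)   # MCMC draws of all parameters

# Flatten chains into plain arrays of posterior draws for the later steps
beta_draws = idata.posterior["beta"].stack(d=("chain", "draw")).values.T
intercept_draws = idata.posterior["intercept"].stack(d=("chain", "draw")).values
sigma_draws = idata.posterior["sigma"].stack(d=("chain", "draw")).values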
Step 4: Generate posterior predictive samples for new inputs
For each new input vector x_new and for each posterior draw of parameters, compute the mean prediction (for example, mu = intercept + x_new · beta). Then draw an outcome y_new from the likelihood using that mu and the drawn noise parameter. Collect many y_new draws to form the posterior predictive distribution.
Step 5: Summarize the predictive distribution for action
Compute summaries aligned with the decision: mean for average planning, median for robustness, and quantiles for risk control. For staffing, you might use the 90th percentile demand if under-staffing is costly. For inventory, you might choose a reorder point based on a service level (for example, 95th percentile of demand during lead time). Also compute probabilities of threshold events: P(demand > capacity), P(time_to_resolve > SLA), etc.
Step 6: Validate with predictive checks and out-of-sample evaluation
Evaluate whether the predictive distribution is calibrated: when you say “80% prediction interval,” does it contain the truth about 80% of the time? Use backtesting: generate predictive distributions for historical periods using only information available then, and check coverage, sharpness (interval width), and decision outcomes. If intervals are too narrow, you are underestimating uncertainty; if too wide, you may be leaving value on the table.
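A small coverage-check sketch; the interval bounds here are placeholders for what your backtest actually produced at each historical period:

import numpy as np

def interval_coverage(actuals, lower, upper):
    """Fraction of realized outcomes falling inside their prediction intervals, plus average width."""
    actuals, lower, upper = map(np.asarray, (actuals, lower, upper))
    covered = (actuals >= lower) & (actuals <= upper)
    return covered.mean(), (upper - lower).mean()

# Placeholder backtest arrays: one entry per historical period, produced using only
# information available at that time (replace with your real backtest output)
actuals = np.array([512, 478, 630, 401, 555])
lower   = np.array([430, 420, 500, 380, 460])       # model's 10th percentile at the time
upper   = np.array([600, 590, 700, 540, 650])       # model's 90th percentile at the time

coverage, avg_width = interval_coverage(actuals, lower, upper)
print(f"Empirical coverage of nominal 80% intervals: {coverage:.2f} (target ~0.80)")
print(f"Average interval width (sharpness): {avg_width:.0f}")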
Worked Example: Demand Forecasting with a Risk-Aware Staffing Rule
Suppose you forecast daily customer support tickets. Let y be the number of tickets tomorrow, and x include day-of-week indicators and a marketing campaign flag. A Negative Binomial regression is often appropriate for counts with over-dispersion. After fitting the model, you generate posterior predictive samples for tomorrow. You obtain a distribution with mean 520 tickets, median 505, 80% interval [420, 650], and 95% interval [360, 780]. Capacity with current staffing is 600 tickets.
From the predictive samples you estimate P(y > 600) = 0.28. If exceeding capacity causes SLA breaches with a high penalty, you might staff to cover the 90th percentile of y. If the 90th percentile is 690 tickets, you plan staffing for 690. If extra staffing is expensive, you might instead compute expected cost: for each predictive sample y_s, compute cost(y_s, staff_level) and average across samples; then choose the staff level that minimizes expected cost. The key is that the predictive distribution lets you quantify the probability and magnitude of overload, not just the average.
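The probability and quantile above are read directly off the predictive samples. A sketch of that computation, using synthetic Negative Binomial draws as a stand-in for the real posterior predictive samples (so the numbers will only roughly match those quoted):

import numpy as np

rng = np.random.default_rng(6)
# Stand-in for posterior predictive samples of tomorrow's ticket count
mean_tickets, dispersion = 520.0, 12.0
p = dispersion / (dispersion + mean_tickets)
y_pred_samples = rng.negative_binomial(dispersion, p, size=10000)

capacity = 600
p_overload = (y_pred_samples > capacity).mean()      # estimate of P(y > capacity)
q90 = np.quantile(y_pred_samples, 0.90)              # staffing level covering the 90th percentile

print(f"Estimated P(tickets > {capacity}): {p_overload:.2f}")
print(f"90th percentile of predicted tickets: {q90:.0f}")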
Predicting the Mean vs. Predicting an Observation
Many regression outputs focus on uncertainty around the mean response E[y|x]. But decisions often depend on the realized y. For example, if you are planning inventory, you care about actual demand, not the mean demand. In Bayesian regression, you can produce both: (1) the posterior distribution of the mean outcome at x (parameter uncertainty only), and (2) the posterior predictive distribution of a new observation (parameter uncertainty plus residual noise). The second is wider and is the correct object for most operational decisions. A common mistake is to use uncertainty bands for the mean as if they were prediction intervals, leading to systematic under-preparation.
Actionable Forecasts: Turning Predictive Samples into Decisions
Once you have predictive samples, decision-making becomes a computation problem. You can map each simulated outcome to a business consequence and then aggregate. This is more flexible than relying on a single metric like RMSE because it aligns evaluation with what you actually care about.
Quantile rules for asymmetric costs
If the cost of under-forecasting differs from over-forecasting, choose a quantile. For example, if under-staffing is twice as costly as over-staffing, you might choose a higher quantile than the median. In general, the optimal quantile depends on the ratio of marginal costs. Predictive distributions make this easy: pick the quantile of the predictive samples corresponding to your risk tolerance.
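Under a simple linear cost model this has a closed form: plan at the quantile equal to the critical ratio c_under / (c_under + c_over). A sketch with hypothetical costs, assuming y_pred_samples is an array of predictive draws:

import numpy as np

rng = np.random.default_rng(7)
y_pred_samples = rng.normal(520, 80, size=10000)   # stand-in predictive draws

cost_under = 2.0    # cost per unit of demand you fail to cover (hypothetical)
cost_over = 1.0     # cost per unit of unused capacity (hypothetical)

critical_ratio = cost_under / (cost_under + cost_over)     # here 2/3
plan_level = np.quantile(y_pred_samples, critical_ratio)

print(f"Critical ratio: {critical_ratio:.2f} -> plan for {plan_level:.0f} units")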
Threshold probability rules
Sometimes the decision is triggered by a probability crossing a threshold: “Launch a mitigation plan if P(churn > 5%) > 0.2” or “Order emergency stock if P(demand > inventory) > 0.1.” These rules are straightforward with predictive samples: estimate the probability by the fraction of samples exceeding the threshold.
Expected utility / expected cost optimization
For richer decisions, define a utility (or cost) function U(action, outcome). For each candidate action, compute the average utility across predictive samples. Choose the action with the highest expected utility. This approach naturally handles nonlinearities, caps, penalties, and constraints. For example, profit might be revenue minus staffing cost minus SLA penalty, where SLA penalty applies only if y exceeds capacity. Predictive distributions allow you to compute expected profit under uncertainty rather than pretending the mean outcome will occur.
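A sketch of this expected-cost search over candidate staffing capacities, with hypothetical cost parameters and synthetic predictive draws standing in for real model output:

import numpy as np

rng = np.random.default_rng(8)
y_pred_samples = rng.negative_binomial(12, 12 / (12 + 520), size=10000)  # stand-in demand draws

staff_cost_per_ticket = 1.0      # cost of each unit of capacity you provision (hypothetical)
sla_penalty_per_ticket = 5.0     # cost per ticket above capacity (hypothetical)

def expected_cost(capacity, demand_samples):
    overload = np.maximum(demand_samples - capacity, 0)
    costs = staff_cost_per_ticket * capacity + sla_penalty_per_ticket * overload
    return costs.mean()          # average over the predictive distribution

candidate_levels = np.arange(450, 801, 10)
exp_costs = [expected_cost(c, y_pred_samples) for c in candidate_levels]
best = candidate_levels[int(np.argmin(exp_costs))]
print(f"Expected-cost-minimizing capacity: {best}")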
Practical Step-by-Step: Expected Profit Pricing with Bayesian Regression
Consider a pricing decision where demand depends on price. You want a price that maximizes expected profit, not just expected demand. Let y be units sold tomorrow, and let price be p. Suppose you fit a regression model for demand as a function of price and other covariates (seasonality, channel mix). The steps below show how to use the predictive distribution to choose p.
Step 1: Define candidate prices and a profit function
Create a grid of candidate prices p in a feasible range. Define profit for a realized demand y as profit(p, y) = (p - unit_cost) * min(y, capacity) - penalty * max(0, y - capacity). This includes capacity limits and penalties for stockouts or overtime.
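In code, with hypothetical values for unit_cost, capacity, and penalty, the profit function is only a few lines:

import numpy as np

unit_cost, capacity, penalty = 4.0, 1000, 2.5    # hypothetical business parameters

def profit(p, y):
    """Profit at price p for realized demand y (scalar or array of predictive samples)."""
    y = np.asarray(y, dtype=float)
    sold = np.minimum(y, capacity)               # cannot sell beyond capacity
    stockout = np.maximum(y - capacity, 0.0)     # unmet demand incurs a penalty
    return (p - unit_cost) * sold - penalty * stockout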
Step 2: For each price, generate predictive samples of demand
For each candidate price p, form x_new(p) and generate posterior predictive samples y_s(p). This yields a distribution of possible demands at that price, incorporating both parameter and outcome uncertainty.
Step 3: Compute expected profit and risk metrics
Compute average profit across samples for each price: E[profit(p, y)]. Also compute risk metrics like the 10th percentile profit (a downside measure) or P(profit < 0). This helps if you need to avoid catastrophic outcomes even when expected profit is high.
Step 4: Choose a price based on your objective
If you maximize expected profit, pick the p with the highest average profit. If you are risk-averse, you might maximize a conservative metric like the 10th percentile profit or impose a constraint like P(profit < 0) < 0.05. The predictive distribution makes these criteria computable without pretending demand is deterministic.
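A sketch that puts these steps together. The demand_samples_at_price function is a hypothetical stand-in for posterior predictive sampling at each candidate price, and profit is the function sketched under Step 1:

import numpy as np

rng = np.random.default_rng(9)

def demand_samples_at_price(p, n=5000):
    """Stand-in for posterior predictive demand at price p (replace with real model output)."""
    expected = 2000 - 80 * p                        # hypothetical downward-sloping demand
    return np.maximum(rng.normal(expected, 150, n), 0)

results = []
for p in np.arange(8.0, 16.5, 0.5):                 # candidate price grid
    y_s = demand_samples_at_price(p)
    profits = profit(p, y_s)                        # profit() from the Step 1 sketch
    results.append((p, profits.mean(), np.quantile(profits, 0.10), (profits < 0).mean()))

# Pick the price with the highest expected profit; inspect downside metrics alongside it
best_price, best_exp, p10, p_loss = max(results, key=lambda r: r[1])
print(f"Best price: {best_price:.2f}, expected profit: {best_exp:.0f}, "
      f"10th pct profit: {p10:.0f}, P(loss): {p_loss:.2f}")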
Handling Nonlinearity and Interactions Without Losing Uncertainty
Real relationships are rarely perfectly linear. You can incorporate nonlinearity while still producing predictive distributions. Common approaches include polynomial terms, splines, or Gaussian processes for smooth nonlinear effects, and interaction terms when the effect of one variable depends on another (for example, marketing spend may have different returns by channel). The important point is not the specific technique but maintaining the ability to sample from the posterior and generate posterior predictive draws. When adding flexibility, watch for overfitting: a model that fits historical noise too closely can produce predictive distributions that are miscalibrated (too confident) out of sample.
Hierarchical Regression for Grouped Forecasts
Many forecasting problems involve groups: stores, regions, product categories, sales reps. A separate regression per group can be unstable for small groups; a pooled regression can miss meaningful differences. Hierarchical regression (partial pooling) lets group-level parameters vary while sharing information across groups. For predictive distributions, this matters because uncertainty is larger for sparse groups, and the model should reflect that. In practice, hierarchical regression yields predictive distributions that are appropriately wider for groups with little data and tighter for groups with abundant data, improving both calibration and decision quality in long-tail settings.
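A minimal partial-pooling sketch, assuming PyMC, a single group-varying intercept, and placeholder priors and data:

import numpy as np
import pymc as pm

# Placeholder grouped data: demand for 12 stores with very different amounts of history
rng = np.random.default_rng(10)
n_groups = 12
group_idx = rng.integers(0, n_groups, size=400)
x = rng.normal(size=400)
y = 100 + 5 * rng.normal(size=n_groups)[group_idx] + 3.0 * x + rng.normal(0, 8, 400)

with pm.Model() as hierarchical_model:
    mu_a = pm.Normal("mu_a", mu=100, sigma=50)          # population-level intercept
    sigma_a = pm.HalfNormal("sigma_a", sigma=20)        # how much groups vary around it
    a_group = pm.Normal("a_group", mu=mu_a, sigma=sigma_a, shape=n_groups)  # partial pooling
    beta = pm.Normal("beta", mu=0, sigma=10)
    sigma = pm.HalfNormal("sigma", sigma=20)

    mu = a_group[group_idx] + beta * x
    pm.Normal("y_obs", mu=mu, sigma=sigma, observed=y)
    idata = pm.sample(1000, tune=1000)

# Posterior predictive draws for a sparse group come out wider than for a data-rich group:
# its a_group posterior is pulled toward mu_a but remains more uncertain.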
Forecasting with Future Covariates: Scenario-Based Predictions
Sometimes the inputs x for the future are themselves uncertain: next month’s marketing spend, economic indicators, or competitor actions. You can still produce actionable forecasts by using scenarios. Create a set of plausible future covariate scenarios (for example, low/medium/high spend), assign probabilities if you can, and generate predictive distributions conditional on each scenario. If you can model the covariates probabilistically, you can propagate that uncertainty by sampling x_new as well, then sampling y_new given x_new and parameters. This yields a forecast distribution that includes uncertainty from both the regression and the future inputs, which is often critical for planning.
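A sketch of scenario mixing, where sample_demand_given_spend is a hypothetical stand-in for posterior predictive sampling conditional on a spend level, and the scenario probabilities are illustrative:

import numpy as np

rng = np.random.default_rng(11)

def sample_demand_given_spend(spend, n):
    """Stand-in for posterior predictive sampling of demand at a given marketing spend."""
    return rng.normal(800 + 0.01 * spend, 90, n)

# Scenarios for next month's spend, with illustrative probabilities
scenarios = {"low": (50_000, 0.25), "medium": (80_000, 0.50), "high": (120_000, 0.25)}

# Mix scenario-conditional predictive samples in proportion to scenario probability
mixed = np.concatenate([
    sample_demand_given_spend(spend, n=int(8000 * prob))
    for spend, prob in scenarios.values()
])

capacity = 1800
print("Scenario-weighted 80% interval:", np.quantile(mixed, [0.10, 0.90]).round(0))
print(f"P(demand > {capacity}) = {(mixed > capacity).mean():.2f}")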
Calibration and Sharpness: Making Predictive Distributions Trustworthy
A useful predictive distribution is calibrated (probabilities match frequencies) and sharp (as narrow as possible while remaining calibrated). Calibration can be checked with coverage tests: among predictions with a 90% interval, about 90% of realized outcomes should fall inside. Sharpness is about informativeness: overly wide intervals are safe but not helpful. Improving calibration may require changing the likelihood (for example, Normal to Student-t), adding missing predictors, modeling heteroskedasticity (noise that changes with x), or accounting for time dependence. For example, if variance increases with marketing spend, a constant-variance model will be miscalibrated in high-spend periods, producing intervals that are too narrow when you most need caution.
Implementation Pattern: Posterior Predictive Sampling in Code
The mechanics of posterior predictive sampling are similar across probabilistic programming tools. The Python-style sketch below shows the essential loop: draw parameters, compute the mean, draw an outcome, repeat. You can adapt it to linear regression, logistic regression (drawing probabilities and then Bernoulli outcomes), or count models.
# Inputs: arrays of posterior parameter draws (beta_draws, intercept_draws, sigma_draws),
# a new feature vector x_new, and a decision threshold T.
# Example for a Normal likelihood with residual sd sigma.
import numpy as np

rng = np.random.default_rng(0)
S = len(sigma_draws)                               # one predictive draw per posterior draw
y_pred_samples = np.empty(S)
for s in range(S):
    beta_s = beta_draws[s]                         # coefficient vector for draw s
    intercept_s = intercept_draws[s]               # intercept for draw s
    sigma_s = sigma_draws[s]                       # residual sd for draw s
    mu_s = intercept_s + x_new @ beta_s            # mean prediction under this draw
    y_pred_samples[s] = rng.normal(mu_s, sigma_s)  # draw the outcome, adding residual noise

# Summaries for decisions
mean_pred = y_pred_samples.mean()
p_over_threshold = (y_pred_samples > T).mean()
q90 = np.quantile(y_pred_samples, 0.90)

Common Pitfalls That Break Actionability
- Using uncertainty around the mean as if it were uncertainty about outcomes, leading to underestimation of risk.
- Ignoring likelihood mismatch (for example, Normal for heavy-tailed outcomes), producing overconfident predictive distributions.
- Forgetting that future inputs may be uncertain; treating planned spend or traffic as fixed can understate forecast variance.
- Evaluating only point metrics (RMSE/MAE) and not checking calibration or decision outcomes.
- Communicating a single interval without tying it to a decision rule; stakeholders need “what we do if the 90th percentile happens,” not just “here is a band.”