Why proportions and rates show up everywhere
Many real decisions depend on quantities that are naturally bounded at zero and one (proportions) or nonnegative counts per unit exposure (rates). Examples include conversion rate (purchases per visit), defect proportion (defects per item), email open rate (opens per delivered email), churn proportion (customers who leave per month), incident rate (accidents per 10,000 hours), and support ticket rate (tickets per active user-day). Building practical Bayesian models for these outcomes means choosing a likelihood that matches the data-generating process, adding structure for real-world complications (unequal exposure, seasonality, heterogeneity, overdispersion), and producing outputs that map to decisions (risk of being below a target, probability of beating a baseline, expected loss under different actions).
This chapter focuses on two workhorse families: binomial-type data for proportions and Poisson-type data for rates. The emphasis is on practical model building: how to encode exposure, how to handle small samples, how to pool across segments, and how to check whether the model is adequate for decision-making.
Modeling proportions: binomial likelihood with real-world twists
When the binomial model fits
Use a binomial likelihood when you observe y “successes” out of n “trials,” where each trial is comparable and the success probability is approximately constant within the group and time window you are modeling. Typical examples: y purchases out of n sessions, y defects out of n units inspected, y renewals out of n customers up for renewal. The key practical requirement is that n is known and that each trial is a yes/no outcome.
In practice, the “constant probability” assumption is often only approximately true. Traffic mix changes, some users are more likely to convert than others, and measurement can be noisy. You can still start with a binomial model as a baseline, then add structure when diagnostics or business context suggest it.
Step-by-step: a baseline Bayesian model for a proportion
Step 1: Define the unit of analysis and the time window. Decide what constitutes a trial and a success. For example, a trial might be a unique session, and a success might be a purchase within that session. Choose a window where the probability is plausibly stable (e.g., daily or weekly).
Step 2: Collect y and n for each group you care about. For a single group, you have one pair (y, n). For multiple segments (countries, devices), you have (y_g, n_g) for each segment g.
Step 3: Choose a parameterization. Let p be the underlying conversion probability. The likelihood is y ~ Binomial(n, p).
Step 4: Choose a prior that reflects plausible ranges and stabilizes small samples. A convenient choice is a Beta distribution on p. In practice, you can set it using “prior mean and prior sample size” intuition: choose a prior mean m (your best guess for p) and a prior strength s (how many pseudo-trials the prior is worth). Then set alpha = m*s and beta = (1-m)*s. This makes it easy to encode domain knowledge without overconfidence.
Step 5: Compute posterior quantities that map to decisions. Common decision outputs include: the posterior mean of p, a credible interval for p, the probability that p exceeds a target (e.g., P(p > 0.03)), and the probability that p is better than a baseline or competitor (e.g., P(p_A > p_B)).
Step 6: Do posterior predictive checks. Simulate y_rep from the posterior predictive distribution and compare to observed y. For a single group, check whether the observed y is typical under the model. For multiple groups, check whether the model reproduces the spread of observed rates across groups.
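A minimal numpy sketch of Step 6 for a single group, assuming a conjugate Beta posterior from Steps 4 and 5 (the counts and posterior parameters below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data and Beta posterior, e.g. Beta(19, 930) after updating
# a weak prior with y = 18 successes in n = 900 trials.
y_obs, n = 18, 900
alpha_post, beta_post = 19.0, 930.0

# Posterior predictive: draw p, then a replicated success count for the same n.
p_draws = rng.beta(alpha_post, beta_post, size=10_000)
y_rep = rng.binomial(n, p_draws)

# Check whether the observed count is typical under the model.
print("95% predictive interval:", np.percentile(y_rep, [2.5, 97.5]))
print("P(y_rep >= y_obs):", round((y_rep >= y_obs).mean(), 3))
```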
Practical example: conversion rate with a target threshold
Suppose you ran a landing page for one week and observed y = 42 purchases out of n = 2000 sessions. You care about whether the conversion rate is at least 2.5% to justify scaling ad spend. A practical Bayesian workflow is to compute P(p > 0.025) and compare it to a decision threshold (for example, scale only if this probability exceeds 0.9).
Even without showing the algebra, the modeling steps are clear: define p, use a binomial likelihood, choose a Beta prior with a reasonable mean (say 2%) and modest strength (say s = 50), and then compute the posterior probability of exceeding 2.5%. This produces a decision-relevant probability rather than a binary “significant/not significant” statement.
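Because the Beta prior is conjugate to the binomial likelihood, the posterior is available in closed form and the decision probability is a one-liner in scipy. A sketch of the workflow above, using the numbers from this example:

```python
from scipy import stats

# Prior: mean m = 0.02, strength s = 50 pseudo-trials (values from the text).
m, s = 0.02, 50
alpha0, beta0 = m * s, (1 - m) * s          # Beta(1, 49)

# Data: 42 purchases out of 2000 sessions.
y, n = 42, 2000

# Conjugate update: Beta(alpha0 + y, beta0 + n - y).
posterior = stats.beta(alpha0 + y, beta0 + n - y)

# Decision quantity: probability the conversion rate exceeds 2.5%.
p_exceeds = posterior.sf(0.025)             # survival function = P(p > 0.025)
print(f"Posterior mean: {posterior.mean():.4f}")
print(f"P(p > 0.025) = {p_exceeds:.3f}")    # compare to the 0.9 decision threshold
```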
Handling unequal exposure and rates: Poisson models for counts
When a Poisson rate model fits
Rates arise when you count events over exposure: incidents per hour, purchases per user-day, clicks per impression, claims per policy-year. Here you observe a count y and an exposure E (time at risk, number of users, number of impressions). A common baseline model is y ~ Poisson(E * lambda), where lambda is the event rate per unit exposure.
The Poisson assumption implies events occur independently and the variance equals the mean. In many operational settings, the variance is larger than the mean (overdispersion) due to unobserved heterogeneity, bursts, or clustering. You can still start with Poisson and then upgrade to a negative binomial or a hierarchical model when needed.
Step-by-step: a baseline Bayesian model for a rate with exposure
Step 1: Define the event and exposure. For example, y is the number of safety incidents and E is total hours worked. Ensure exposure is measured consistently and corresponds to the time/units at risk.
Step 2: Choose the likelihood. Use y ~ Poisson(E * lambda). This makes lambda interpretable as “events per unit exposure.”
Step 3: Choose a prior for lambda that is positive and reasonably flexible. A common choice is a Gamma distribution because it is defined on positive values and is convenient computationally. You can set it using a prior mean and prior strength as well: if you believe the rate is around r events per unit and you want the prior to be worth k units of exposure, set the Gamma shape to r*k and the rate to k. The prior mean is then r, and the prior carries roughly k exposure units' worth of information.
Step 4: Compute decision quantities. Examples: P(lambda < target_rate), expected number of events next month given planned exposure, or probability that one site’s rate is higher than another’s.
Step 5: Posterior predictive checks. Simulate counts for the observed exposures and compare the distribution of simulated counts to the observed counts. Pay special attention to whether the model underestimates the frequency of zeros or the frequency of very large counts.
Practical example: incident rate per 10,000 hours
Imagine a plant recorded y = 3 incidents over E = 25,000 hours. Management wants to know whether the incident rate is below 2 per 10,000 hours. The model uses y ~ Poisson(E * lambda). The decision quantity is P(lambda < 2/10000). This directly answers “How confident are we that we meet the safety target?” and can be paired with a policy such as “if probability of meeting target is below 0.8, invest in additional training.”
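A scipy sketch of this computation using the conjugate Gamma-Poisson update; the prior mean and strength below are hypothetical choices you would set from domain knowledge:

```python
from scipy import stats

# Data: 3 incidents over 25,000 hours; target is 2 incidents per 10,000 hours.
y, E = 3, 25_000
target = 2 / 10_000

# Gamma prior set via prior mean r and prior strength k (hypothetical values:
# a rate guess of 1.5 per 10,000 hours, worth 10,000 hours of exposure).
r, k = 1.5 / 10_000, 10_000
a0, b0 = r * k, k                    # shape, rate; prior mean = a0/b0 = r

# Conjugate update for y ~ Poisson(E * lambda): Gamma(a0 + y, b0 + E).
a_post, b_post = a0 + y, b0 + E
posterior = stats.gamma(a_post, scale=1 / b_post)   # scipy uses scale = 1/rate

# Decision quantity: how confident are we that we meet the safety target?
print(f"P(lambda < {target}) = {posterior.cdf(target):.3f}")
```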
Overdispersion: when simple binomial/Poisson is too optimistic
Recognizing overdispersion in proportions
For proportions, overdispersion often shows up when observed variability across days or segments is larger than what a binomial model would predict. For example, daily conversion rates might swing more than expected given daily traffic. This can happen due to changing traffic quality, promotions, or unmodeled covariates.
A practical remedy is to introduce an extra layer of randomness in p. One approach is a beta-binomial model: instead of treating p as fixed, treat p as varying according to a Beta distribution across repeated periods or subgroups. This increases predictive uncertainty and prevents overconfident decisions.
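For intuition, scipy's betabinom lets you compare the predictive spread of the two models directly. The Beta parameters below are hypothetical but share the same 2% mean:

```python
from scipy import stats

# Same mean conversion rate (2%), but p varies day to day as Beta(a, b).
n = 2000                      # sessions per day
a, b = 4, 196                 # hypothetical Beta; mean = a/(a+b) = 0.02

binom = stats.binom(n, 0.02)
betabinom = stats.betabinom(n, a, b)   # marginal after integrating p out

# The means match, but the beta-binomial predicts much wider daily swings.
print("Binomial:      mean", binom.mean(), "sd", round(binom.std(), 1))
print("Beta-binomial: mean", betabinom.mean(), "sd", round(betabinom.std(), 1))
```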
Recognizing overdispersion in rates
For counts, overdispersion appears when the variance of counts is much larger than the mean after accounting for exposure. If you see many more zero days and occasional spikes than Poisson predicts, you likely need a model that allows extra variability.
A common practical upgrade is the negative binomial model, which can be interpreted as a Poisson model whose rate varies across observations. In Bayesian terms, you can give lambda a Gamma distribution across days or sites, which yields a negative binomial marginal distribution for the counts.
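A quick simulation makes the equivalence concrete: mixing a Poisson over a Gamma-distributed rate reproduces the negative binomial (the mixing parameters below are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Let the rate vary: lambda ~ Gamma(shape=r, rate=b), then y ~ Poisson(lambda).
r, b = 2.0, 0.5
lam = rng.gamma(shape=r, scale=1 / b, size=100_000)
y = rng.poisson(lam)

# Marginally, y is negative binomial with size r and prob p = b / (b + 1).
nb = stats.nbinom(r, b / (b + 1))
print("Simulated mean/var:", round(y.mean(), 2), round(y.var(), 2))
print("NegBin    mean/var:", round(nb.mean(), 2), round(nb.var(), 2))
```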
Step-by-step: upgrading to a hierarchical model to handle heterogeneity
Step 1: Identify the level at which heterogeneity occurs. Is the conversion probability different by country? Is the incident rate different by site? Is there day-to-day variation?
Step 2: Build a group-level model. For proportions, you can model each group’s probability p_g and allow them to vary around a shared distribution. For rates, you can model each group’s rate lambda_g similarly.
Step 3: Use partial pooling. The shared distribution causes small-sample groups to borrow strength from the overall pattern, reducing extreme estimates driven by noise. This is especially valuable when you have many segments with sparse data.
Step 4: Validate with posterior predictive checks at multiple levels: within-group counts and across-group variability. The model should reproduce both the typical within-group noise and the observed spread across groups.
Modeling multiple groups: partial pooling for proportions and rates
Why partial pooling matters operationally
In real systems, you rarely have one proportion or one rate. You have dozens: conversion by channel, defect rate by supplier, incident rate by site, churn by cohort. If you estimate each segment independently, small segments produce unstable estimates and can trigger bad decisions (e.g., pausing a campaign because of a few unlucky days). If you fully pool everything, you ignore real differences and miss opportunities to act.
Partial pooling is the practical compromise: each segment gets its own parameter, but the parameters are tied together through a shared distribution. This shrinks noisy segment estimates toward the overall mean while still allowing strong signals to stand out.
Practical blueprint: hierarchical logistic model for proportions
For segment g, observe y_g successes out of n_g trials. Model y_g ~ Binomial(n_g, p_g). To keep p_g in (0,1) and allow additive effects, work on the log-odds scale: logit(p_g) = eta_g. Then model eta_g as coming from a shared distribution across segments, such as eta_g ~ Normal(mu, sigma). Here mu is the overall log-odds and sigma controls how different segments can be.
Operational interpretation: sigma near zero means segments are similar and heavy pooling is appropriate; larger sigma means segments truly differ and the model will allow more separation. This structure is robust in sparse segments because the posterior for eta_g is pulled toward mu when n_g is small.
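As one possible instantiation, here is how this blueprint might look in PyMC (the tool choice and the segment counts below are illustrative assumptions, not part of the blueprint itself):

```python
import numpy as np
import pymc as pm

# Hypothetical segment data: y_g successes out of n_g trials for G segments.
y = np.array([12, 30, 4, 90, 7])
n = np.array([500, 1200, 150, 4000, 260])

with pm.Model() as model:
    mu = pm.Normal("mu", 0, 2)              # overall log-odds
    sigma = pm.HalfNormal("sigma", 1)       # between-segment spread
    eta = pm.Normal("eta", mu, sigma, shape=len(y))
    p = pm.Deterministic("p", pm.math.invlogit(eta))
    pm.Binomial("obs", n=n, p=p, observed=y)
    idata = pm.sample(random_seed=1)

# Per-segment decision output: P(p_g > 2%) from the posterior draws.
draws = idata.posterior["p"].values.reshape(-1, len(y))
print((draws > 0.02).mean(axis=0))
```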
Practical blueprint: hierarchical log-rate model for counts with exposure
For segment g, observe y_g events with exposure E_g. Model y_g ~ Poisson(E_g * lambda_g). Work on the log scale: log(lambda_g) = theta_g. Then model theta_g ~ Normal(mu, sigma). This is a standard Poisson log-normal hierarchical model. It naturally handles unequal exposures and yields stabilized estimates for small-exposure segments.
Decision outputs can be computed per segment: P(lambda_g > threshold), expected events next period given planned exposure, and rank-ordering segments by posterior probability of being in the top or bottom quantile.
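A matching PyMC sketch for the rate version, again with hypothetical data and priors:

```python
import numpy as np
import pymc as pm

# Hypothetical per-site data: incident counts y_g with exposures E_g (hours).
y = np.array([3, 11, 0, 7])
E = np.array([25_000, 60_000, 8_000, 40_000])

with pm.Model() as model:
    mu = pm.Normal("mu", -8, 2)             # overall log-rate per hour
    sigma = pm.HalfNormal("sigma", 1)
    theta = pm.Normal("theta", mu, sigma, shape=len(y))
    lam = pm.Deterministic("lam", pm.math.exp(theta))
    pm.Poisson("obs", mu=E * lam, observed=y)
    idata = pm.sample(random_seed=1)

# Per-site decision output: P(lambda_g > 2 per 10,000 hours).
draws = idata.posterior["lam"].values.reshape(-1, len(y))
print((draws > 2 / 10_000).mean(axis=0))
```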
Including predictors: turning proportions and rates into regression models
When you need predictors instead of segments
Segments are a coarse way to explain variation. Often you have continuous or multiple predictors: price, time on site, device type, marketing channel, seasonality, staffing level, weather. Bayesian regression models for proportions and rates let you estimate how these predictors change the probability or rate while quantifying uncertainty.
Logistic regression for proportions
If each observation i is a binary outcome (success/failure), use y_i ~ Bernoulli(p_i) with logit(p_i) = X_i * beta. If your data are aggregated (y_i successes out of n_i trials for row i), use y_i ~ Binomial(n_i, p_i) with the same logit link. The coefficients beta are interpretable as changes in log-odds per unit change in a predictor, and you can convert them into changes in probability at relevant baseline values.
Practical tip: for decision-making, report effects in probability space for realistic scenarios (e.g., predicted conversion at typical traffic mix) rather than only reporting log-odds. Compute posterior distributions of predicted conversion for scenarios you care about and compare them.
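For example, given posterior draws of the coefficients from whatever tool you fit with, the scenario calculation is a few lines of numpy (the draws below are simulated stand-ins for real model output):

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in posterior draws of logistic coefficients (n_draws x n_features).
beta_draws = rng.normal([-3.9, 0.4], 0.1, size=(4000, 2))

# Scenario: typical traffic mix, encoded as a feature row [intercept, mobile].
x_scenario = np.array([1.0, 1.0])

# Posterior distribution of predicted conversion for this scenario.
logit_p = beta_draws @ x_scenario
p = 1 / (1 + np.exp(-logit_p))
print("Predicted conversion:", round(p.mean(), 4),
      "90% interval:", np.percentile(p, [5, 95]).round(4))
```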
Poisson regression for rates with exposure offsets
For event counts with exposure, use y_i ~ Poisson(E_i * lambda_i) and log(lambda_i) = X_i * beta. Equivalently, log(E_i * lambda_i) = log(E_i) + X_i * beta, where log(E_i) is an offset. This ensures that doubling exposure doubles expected counts, holding predictors fixed.
Practical tip: always include exposure explicitly. If you model counts without exposure, you can mistakenly attribute differences in volume to differences in rate, leading to incorrect operational decisions.
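The offset mechanics in a few lines of numpy, with hypothetical coefficients: two rows with the same exposure differ in expected counts only through their predictors, and doubling E would double both:

```python
import numpy as np

# Expected count is E_i * exp(X_i @ beta), i.e. log-mean = log(E_i) + X_i @ beta.
beta = np.array([-8.2, 0.3])            # hypothetical coefficients
X = np.array([[1.0, 0.0], [1.0, 1.0]])  # [intercept, night_shift]
E = np.array([25_000, 25_000])          # same exposure, different predictors

expected_counts = np.exp(np.log(E) + X @ beta)
print(expected_counts)  # doubling E doubles these, holding X fixed
```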
Decision-focused outputs: what to compute after fitting
Probability of meeting a target
For a proportion p or rate lambda, a common decision question is whether you meet a service level or KPI threshold. Compute P(p > p_target) or P(lambda < lambda_target), depending on whether higher or lower is better. This supports policies like “ship if probability of meeting target exceeds 95%” or “invest if probability of missing target exceeds 30%.”
Expected impact under an action
Often you need to translate a change in proportion or rate into business units. For conversion, the expected number of incremental purchases from an intervention is exposure * (p_new - p_old). For incident reduction, the expected number of incidents avoided is exposure * (lambda_old - lambda_new). Use posterior draws to compute a distribution of incremental impact, then evaluate expected utility or expected cost.
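A sketch of this calculation from posterior draws; the Beta posteriors below are hypothetical stand-ins for whatever model produced them:

```python
import numpy as np

rng = np.random.default_rng(3)

# Stand-in posterior draws for the old and new conversion rates.
p_old = rng.beta(43, 2007, size=10_000)
p_new = rng.beta(55, 1995, size=10_000)

exposure = 100_000                          # planned sessions next month
incremental = exposure * (p_new - p_old)    # distribution of extra purchases

print("Expected incremental purchases:", round(incremental.mean(), 1))
print("P(incremental > 0):", round((incremental > 0).mean(), 3))
```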
Comparisons and ranking with uncertainty
For A/B tests, compute P(p_A > p_B) or P(lambda_A < lambda_B). For many segments, compute the probability each segment is worst or best, or the probability it exceeds a threshold. This avoids overreacting to noisy rank-ordering based on point estimates.
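For instance, with posterior rate draws for several segments, the probability that each segment is worst is just a tally over draws (the draws below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(4)

# Stand-in posterior draws of a rate for each of 4 segments (draws x segments).
lam_draws = rng.gamma([2, 5, 3, 9], 1e-4, size=(10_000, 4))

# Probability each segment has the worst (highest) rate.
worst = lam_draws.argmax(axis=1)
print(np.bincount(worst, minlength=4) / len(worst))
```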
Model checking and common failure modes
Posterior predictive checks you can do quickly
For proportions: simulate replicated success counts y_rep for each row given posterior draws of p (or p_i). Compare the distribution of y_rep/n to observed rates. Look for systematic underestimation of variability, too many extreme days, or patterns by subgroup.
For rates: simulate replicated counts given posterior draws of lambda and the observed exposures. Compare the frequency of zeros, the tail behavior (spikes), and the relationship between mean and variance across groups.
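A minimal sketch of such a check for daily counts, with hypothetical observations and posterior rate draws:

```python
import numpy as np

rng = np.random.default_rng(5)

# Observed daily counts and stand-in posterior draws of the daily rate.
y_obs = np.array([0, 0, 3, 0, 1, 7, 0, 2, 0, 0])
lam_draws = rng.gamma(3.0, 1 / 2.0, size=10_000)

# Replicate datasets of the same size; compare zero counts and maxima.
y_rep = rng.poisson(lam_draws[:, None], size=(10_000, len(y_obs)))
print("Observed zeros:", (y_obs == 0).sum(),
      "| predictive mean zeros:", round((y_rep == 0).sum(axis=1).mean(), 1))
print("Observed max:", y_obs.max(),
      "| P(max(y_rep) >= observed max):",
      round((y_rep.max(axis=1) >= y_obs.max()).mean(), 3))
```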
Common data issues
Changing definitions: if “trial” or “success” changes over time (tracking changes, bot filtering), your model will interpret it as a real change in p. Guard against this by versioning metrics and adding indicators for measurement regime changes.
Non-independence: repeated trials from the same user violate the simple binomial assumption. A practical workaround is to define trials at the user level (e.g., user converts within a week) or to use hierarchical models with user-level random effects when feasible.
Zero inflation: if you have more zeros than Poisson predicts (e.g., many days with no incidents), consider a model that explicitly allows a structural zero process or add predictors that explain when events are possible.
Seasonality and time trends: if conversion or incident rates drift over time, a static p or lambda will be misleading. Add time effects (day-of-week, month) or a time-series component so the model can adapt.
Implementation sketch: fitting and using these models in practice
Workflow checklist
- Define outcome type: proportion (successes/trials) or rate (counts/exposure).
- Choose baseline likelihood: Binomial/Bernoulli for proportions; Poisson for rates with exposure.
- Add structure as needed: hierarchical effects for segments, predictors for drivers, overdispersion via beta-binomial or negative binomial/log-normal random effects.
- Fit the model with MCMC or variational inference in a probabilistic programming tool.
- Validate with posterior predictive checks focused on decision-relevant behavior (tails, zeros, across-segment spread).
- Compute decision metrics: probability of meeting targets, expected impact, probability of beating baseline, and risk metrics tied to cost.
Pseudocode templates
Use these templates as a mental model for implementation; the exact syntax depends on your tool.
```
# Proportion (aggregated) with hierarchical segments (logistic-normal)
for g in 1..G:
    y[g] ~ Binomial(n[g], p[g])
    logit(p[g]) = eta[g]
    eta[g] ~ Normal(mu, sigma)
mu ~ Normal(0, 2)
sigma ~ HalfNormal(1)
# Decision: P(p[g] > target) from posterior draws
```

```
# Rate (counts with exposure) with predictors and segment random effects
for i in 1..N:
    y[i] ~ Poisson(E[i] * lambda[i])
    log(lambda[i]) = X[i] * beta + u[segment[i]]
u[g] ~ Normal(0, sigma_u)
beta ~ Normal(0, 1)
sigma_u ~ HalfNormal(1)
# Decision: P(lambda_new < threshold) and expected counts for planned exposure
```

Choosing between model variants
If you have one group and stable conditions, a simple binomial or Poisson model may be sufficient. If you have many small segments, use hierarchical models to stabilize estimates. If you see extra variability, upgrade to beta-binomial or negative binomial/log-normal random effects. If you need to explain variation and forecast under changing conditions, use regression with predictors and exposure offsets.