June 11, 2026 · 10 min read · methodology

The Poisson Distribution for Football Predictions — A Mathematical Guide
Every football fan has stared at a fixture list and thought: "What are the chances?" Bookmakers answer that question thousands of times a day. Behind their odds sits a deceptively simple piece of 19th-century mathematics — the Poisson distribution. This guide breaks down exactly how it works, how to build your own prediction model from scratch, and where the approach falls apart.
What Is the Poisson Distribution?
Siméon Denis Poisson was a French mathematician who published the distribution bearing his name in 1837, in a study of the probability of judgments in criminal and civil cases. The Poisson distribution gives the probability that an event occurs a given number of times within a fixed window of time or space, assuming events occur independently at a known average rate.
The core formula is elegant:
$$P(x) = \frac{e^{-\lambda}\,\lambda^{x}}{x!}$$
Where P(x) is the probability of exactly x events occurring, λ (lambda) is the expected average, e is Euler's number (≈2.71828), and x! is the factorial of x. For football, x represents goals scored and λ represents the expected number of goals for a team in a match.
The beauty of applying Poisson to football is that goals are relatively rare events distributed across a 90-minute window. A team might score zero, one, two, or occasionally three or more goals — and the Poisson distribution captures this spread remarkably well.
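As a minimal sketch, the formula translates directly into Python using only the standard library. The function name `poisson_pmf` and the example rate λ = 1.5 are illustrative choices, not part of any particular model.

```python
import math


def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for a Poisson variable with mean lam: e^(-lam) * lam^x / x!."""
    if x < 0:
        return 0.0  # negative goal counts are impossible
    return math.exp(-lam) * lam**x / math.factorial(x)


# Goal distribution for a team expected to score 1.5 goals (illustrative rate)
for goals in range(6):
    print(f"{goals} goals: {poisson_pmf(goals, 1.5):.4f}")
```

With λ = 1.5, one goal is the single most likely outcome (about 33%), followed by two goals (about 25%) and zero (about 22%).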
Why Football Fits the Poisson Model
Not every sport maps cleanly onto Poisson assumptions. Football does for several reasons:
- Low scoring: The average Premier League match produces around 2.7 goals. Poisson works best when events are infrequent relative to the observation window.
- Discrete outcomes: You can't score 1.5 goals. Poisson handles count data — whole numbers only.
- Independence (roughly): Each goal-scoring opportunity is approximately independent of the previous one, though this assumption has limits we'll address later.
- Consistent rate: Team scoring rates stay relatively stable across a season, so a season's worth of results gives a usable estimate of λ.
One of the most influential academic applications came from Michael Maher in 1982, who published "Modelling Association Football Scores" in the journal Statistica Neerlandica. His work demonstrated that independent Poisson distributions for home and away goals, built from team-specific attack and defence parameters, could describe match results with surprising accuracy.
Building a Poisson Prediction Model: Step by Step
Here's the practical process for building your own model. You need nothing more than a spreadsheet and historical match data.
Step 1: Gather Your Data
Collect at least one full season of match results for the league you want to predict. You need: home team, away team, home goals, away goals. Free datasets are available at football-data.co.uk, covering dozens of leagues and going back decades.
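If you prefer scripting to a spreadsheet, a few lines of Python turn a results file into a list of matches. This sketch assumes the column names used in football-data.co.uk's CSV files (`HomeTeam`, `AwayTeam`, and `FTHG`/`FTAG` for full-time home and away goals); check the header row of the file you download, since layouts can differ between leagues. The team names in the inline sample are hypothetical.

```python
import csv
import io
from typing import NamedTuple


class Match(NamedTuple):
    home: str
    away: str
    home_goals: int
    away_goals: int


def load_matches(csv_text: str) -> list[Match]:
    """Parse results, assuming football-data.co.uk-style column names."""
    matches = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Skip rows with a missing score
        if not row.get("FTHG") or not row.get("FTAG"):
            continue
        matches.append(
            Match(row["HomeTeam"], row["AwayTeam"], int(row["FTHG"]), int(row["FTAG"]))
        )
    return matches


# Tiny inline sample with hypothetical teams, standing in for a downloaded file
sample = """Date,HomeTeam,AwayTeam,FTHG,FTAG
01/08/2025,Alpha,Beta,2,1
02/08/2025,Gamma,Alpha,0,0
03/08/2025,Beta,Gamma,3,2
"""
matches = load_matches(sample)
print(len(matches), matches[0])
```

For a downloaded file, pass in its text instead of `sample`; the file's encoding may need to be specified when you read it.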
Step 2: Calculate League Averages
Compute the overall averages for the league:
- Average home goals (H_avg): Total home goals ÷ total matches. In the 2024/25 Premier League, this figure sat around 1.55.
- Average away goals (A_avg): Total away goals ÷ total matches. Typically around 1.20 in the Premier League.
The gap between these two numbers reflects home advantage, a persistent effect across virtually every football league on earth. With the figures above, home teams score roughly 30% more goals than away teams (1.55 ÷ 1.20 ≈ 1.29).
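The averages are just column means. The sketch below uses a hypothetical three-team league in which every team plays every other team home and away, stored as `(home_team, away_team, home_goals, away_goals)` tuples.

```python
# Hypothetical double round-robin: (home_team, away_team, home_goals, away_goals)
matches = [
    ("Alpha", "Beta", 2, 1),
    ("Beta", "Gamma", 1, 1),
    ("Gamma", "Alpha", 3, 0),
    ("Beta", "Alpha", 0, 2),
    ("Alpha", "Gamma", 1, 1),
    ("Gamma", "Beta", 2, 2),
]


def league_averages(matches):
    """Return (H_avg, A_avg): mean home and away goals per match."""
    n = len(matches)
    h_avg = sum(m[2] for m in matches) / n
    a_avg = sum(m[3] for m in matches) / n
    return h_avg, a_avg


h_avg, a_avg = league_averages(matches)
print(f"H_avg = {h_avg:.2f}, A_avg = {a_avg:.2f}, ratio = {h_avg / a_avg:.2f}")
```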
Step 3: Calculate Attack and Defence Strength
For each team, compute four ratings relative to the league average:
- Home Attack Strength = Team's home goals scored per home match ÷ H_avg
- Home Defence Strength = Team's home goals conceded per home match ÷ A_avg
- Away Attack Strength = Team's away goals scored per away match ÷ A_avg
- Away Defence Strength = Team's away goals conceded per away match ÷ H_avg
A team with Home Attack Strength of 1.50 scores 50% more goals at home than the league average. A team with Away Defence Strength of 0.70 concedes 30% fewer goals on the road than average.
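The four ratings can be computed in one pass over the results. This sketch uses a hypothetical three-team league; it divides each team's per-match averages (not season totals) by the league averages, and only rates teams that have played at least one home and one away match, to avoid dividing by zero.

```python
from collections import defaultdict

# Hypothetical double round-robin: (home_team, away_team, home_goals, away_goals)
matches = [
    ("Alpha", "Beta", 2, 1),
    ("Beta", "Gamma", 1, 1),
    ("Gamma", "Alpha", 3, 0),
    ("Beta", "Alpha", 0, 2),
    ("Alpha", "Gamma", 1, 1),
    ("Gamma", "Beta", 2, 2),
]


def team_strengths(matches):
    """Return {team: ratings} with home/away attack and defence strengths."""
    n = len(matches)
    h_avg = sum(m[2] for m in matches) / n  # league home goals per match
    a_avg = sum(m[3] for m in matches) / n  # league away goals per match

    # Per team and venue: [goals scored, goals conceded, matches played]
    home = defaultdict(lambda: [0, 0, 0])
    away = defaultdict(lambda: [0, 0, 0])
    for home_team, away_team, hg, ag in matches:
        home[home_team][0] += hg
        home[home_team][1] += ag
        home[home_team][2] += 1
        away[away_team][0] += ag
        away[away_team][1] += hg
        away[away_team][2] += 1

    ratings = {}
    for team in set(home) & set(away):  # teams with both home and away games
        h_for, h_against, h_n = home[team]
        a_for, a_against, a_n = away[team]
        ratings[team] = {
            "home_attack": (h_for / h_n) / h_avg,
            "home_defence": (h_against / h_n) / a_avg,
            "away_attack": (a_for / a_n) / a_avg,
            "away_defence": (a_against / a_n) / h_avg,
        }
    return ratings


for team, r in sorted(team_strengths(matches).items()):
    print(team, {k: round(v, 2) for k, v in r.items()})
```

In this toy league, Gamma's home attack rating of about 1.67 means it scores two-thirds more at home than the average home side.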
Step 4: Predict Expected Goals (λ)
For a specific fixture, multiply the opposing strengths together with the league average:
Home team λ = Home team's Home Attack Strength × Away team's Away Defence Strength × H_avg
Away team λ = Away team's Away Attack Strength × Home team's Home Defence Strength × A_avg
This is the critical step. The expected goals for each team are not their season averages — they're adjusted for the specific opponent they're facing.
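Step 4 then reduces to two multiplications per fixture. The ratings in this sketch are hypothetical; the league averages are the Premier League figures quoted in Step 2. The parameter names are ours, chosen to make explicit which side each rating belongs to.

```python
def expected_goals(home_attack, away_defence, away_attack, home_defence, h_avg, a_avg):
    """Return (home λ, away λ) for one fixture.

    home_attack / home_defence: the home side's home ratings.
    away_attack / away_defence: the visiting side's away ratings.
    """
    lam_home = home_attack * away_defence * h_avg
    lam_away = away_attack * home_defence * a_avg
    return lam_home, lam_away


# Hypothetical ratings: a strong home attack against a leaky away defence
lam_h, lam_a = expected_goals(
    home_attack=1.40, away_defence=1.10,
    away_attack=0.80, home_defence=0.90,
    h_avg=1.55, a_avg=1.20,
)
print(f"home λ = {lam_h:.2f}, away λ = {lam_a:.2f}")
```

A quick sanity check: two exactly average teams (all ratings 1.0) get the league averages back unchanged.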
Step 5: Build the Probability Matrix
Plug each team's λ into the Poisson formula for every plausible scoreline from 0-0 to, say, 5-5. This creates a matrix where each cell represents the probability of that exact scoreline; six or more goals for one side is rare but possible, so a 5-5 grid covers slightly less than 100% of outcomes. For independent Poisson variables, the probability of any specific scoreline (home: h, away: a) is:
P(h, a) = P_home(h) × P_away(a)
For example, if the home team's λ = 1.8 and the away team's λ = 1.2:
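- $P(0\text{-}0) = e^{-1.8} \times e^{-1.2} = 0.1653 \times 0.3012 \approx 4.98\%$
- $P(1\text{-}0) = (1.8\,e^{-1.8}) \times e^{-1.2} = 0.2975 \times 0.3012 \approx 8.96\%$
- $P(2\text{-}1) = \left(\frac{1.8^{2}}{2!}\,e^{-1.8}\right) \times (1.2\,e^{-1.2}) = 0.2678 \times 0.3614 \approx 9.68\%$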
Sum all cells where home goals > away goals for the home win probability. Do the same for draws and away wins. You now have match outcome probabilities — and correct score probabilities for any market you care about.
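Steps 4 and 5 can be combined into a short sketch that builds the scoreline matrix and sums it into home/draw/away probabilities. It assumes independent Poisson goal counts, as in the basic model, and uses a 0-15 grid (the `max_goals` default is our own choice) so that the probability left outside the grid is negligible.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    return math.exp(-lam) * lam**k / math.factorial(k)


def score_matrix(lam_home: float, lam_away: float, max_goals: int = 15) -> list[list[float]]:
    """matrix[h][a] = P(home scores h, away scores a) under independent Poisson."""
    home = [poisson_pmf(h, lam_home) for h in range(max_goals + 1)]
    away = [poisson_pmf(a, lam_away) for a in range(max_goals + 1)]
    return [[ph * pa for pa in away] for ph in home]


def outcome_probs(matrix: list[list[float]]) -> tuple[float, float, float]:
    """Sum matrix cells into (home win, draw, away win)."""
    home_win = draw = away_win = 0.0
    for h, row in enumerate(matrix):
        for a, p in enumerate(row):
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
    return home_win, draw, away_win


m = score_matrix(1.8, 1.2)
print(f"P(2-1) = {m[2][1]:.4f}")
hw, d, aw = outcome_probs(m)
print(f"home {hw:.1%}, draw {d:.1%}, away {aw:.1%}")
```

Feeding in the Chelsea vs Sunderland values from the worked example below, `outcome_probs(score_matrix(3.06, 0.41))` returns roughly 88.9%, 8.4% and 2.7%, in line with the figures reported there.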
Worked Example: Chelsea vs Sunderland (EPL 2016/17)
David Sheehan's widely cited tutorial provides a clean demonstration. Using 2016/17 Premier League data and Poisson regression, his model produced:
- Chelsea (home) expected goals: λ = 3.06
- Sunderland (away) expected goals: λ = 0.41
The resulting probabilities told a clear story:
- Chelsea win: 88.9%
- Draw: 8.4%
- Sunderland win: 2.7%
How did this compare to the betting markets? Betfair's exchange had implied probabilities of 88.5%, 8.7%, and 3.4% respectively. The Poisson model was within a percentage point of the market on every outcome — impressive for a model built on nothing but historical goal data.
The Dixon-Coles Upgrade: Fixing the Weak Spots
The basic Poisson model has a known flaw: it underestimates low-scoring draws. In real football, 0-0 and 1-1 results happen more often than independent Poisson variables predict. Mark Dixon and Stuart Coles addressed this in their landmark 1997 paper, "Modelling Association Football Scores and Inefficiencies in the Football Betting Market."
Their fix introduces a correction factor (τ) applied to the four lowest-scoring outcomes: 0-0, 0-1, 1-0, and 1-1. The correction accounts for the fact that goals in football are not truly independent — when one team scores, the other team's tactical approach changes, which affects the probability of subsequent goals.
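The correction is usually written in terms of the home and away expected goals (λ and μ here) and a dependence parameter ρ fitted from data. The sketch below applies it on top of the independent Poisson probabilities; ρ = −0.1 is an arbitrary illustrative value, not an estimate from the paper. With ρ < 0, the 0-0 and 1-1 probabilities rise while 1-0 and 0-1 fall, and the four adjustments cancel exactly, so the probabilities still sum to one.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    return math.exp(-lam) * lam**k / math.factorial(k)


def tau(h: int, a: int, lam: float, mu: float, rho: float) -> float:
    """Dixon-Coles adjustment for low scores (lam = home λ, mu = away λ)."""
    if h == 0 and a == 0:
        return 1 - lam * mu * rho
    if h == 0 and a == 1:
        return 1 + lam * rho
    if h == 1 and a == 0:
        return 1 + mu * rho
    if h == 1 and a == 1:
        return 1 - rho
    return 1.0  # all other scorelines are left unchanged


def dc_prob(h: int, a: int, lam: float, mu: float, rho: float) -> float:
    """P(h, a): independent Poisson probabilities scaled by tau.

    rho must keep every tau non-negative, otherwise this is not a valid distribution.
    """
    return tau(h, a, lam, mu, rho) * poisson_pmf(h, lam) * poisson_pmf(a, mu)


lam, mu, rho = 1.8, 1.2, -0.1  # rho chosen for illustration only
for h, a in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    indep = poisson_pmf(h, lam) * poisson_pmf(a, mu)
    print(f"{h}-{a}: independent {indep:.4f}, Dixon-Coles {dc_prob(h, a, lam, mu, rho):.4f}")
```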
The Dixon-Coles model also down-weights older matches with an exponential time decay, so attack and defence estimates can follow changes in form, such as those after managerial switches or shifts in player availability. This upgraded model remains the gold standard baseline for academic football prediction research.
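Dixon and Coles gave each past match a weight that shrinks exponentially with its age, φ(t) = exp(−ξt), inside the model's likelihood. The sketch below applies the same idea to a simpler quantity, a time-weighted average of goals per match, which could stand in for the plain per-match average in Step 3. The decay rate ξ = 0.005 per day and the match history are hypothetical; in practice ξ is tuned by checking predictions on held-out matches.

```python
import math


def time_weights(days_ago: list[float], xi: float) -> list[float]:
    """Exponential decay: a match played t days ago gets weight exp(-xi * t)."""
    return [math.exp(-xi * t) for t in days_ago]


def weighted_goals_per_match(goals: list[int], days_ago: list[float], xi: float) -> float:
    """Time-weighted average goals per match, favouring recent matches."""
    w = time_weights(days_ago, xi)
    return sum(wi * g for wi, g in zip(w, goals)) / sum(w)


# Hypothetical team: weak early-season scoring, stronger recent form
goals = [0, 1, 0, 2, 3, 2]
days_ago = [250, 210, 170, 60, 30, 7]
xi = 0.005  # hypothetical decay rate per day (half-life of about 139 days)
print(f"plain average:    {sum(goals) / len(goals):.2f}")
print(f"weighted average: {weighted_goals_per_match(goals, days_ago, xi):.2f}")
```

Setting ξ = 0 recovers the plain average, which makes the decay rate easy to tune: larger values react faster to form but rest on fewer effective matches.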
Real-World Accuracy: 2026 World Cup Models
How well do Poisson-based models actually perform? Recent open-source projects targeting the 2026 World Cup provide concrete numbers:
| Model | Method | Metric | Result | Baseline |
|---|---|---|---|---|
| Cup26 Dixon-Coles | Elo + DC Poisson + 10k Monte Carlo | Accuracy (1X2) | 61% | Home pick: 49% |
| Cup26 (clear favourites) | Same model, p ≥ 50% | Accuracy (1X2) | 67% | Higher Elo: 60% |
| Cup26 | Same model | Brier score (calibration) | 0.53-0.54 | Uniform (1/3 each): 0.67 |
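A Brier score measures the accuracy of probability forecasts: the squared gaps between the predicted probabilities and what actually happened (1 for the result that occurred, 0 for the others), summed over home win, draw and away win and averaged across matches. Lower is better. Assigning 1/3 to every outcome always scores 0.67, so the Poisson-based model at 0.53 substantially beats that uninformed baseline and performs respectably against professional bookmakers.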
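For three outcomes, the version consistent with the 0.67 figure sums the squared errors over home win, draw and away win. A minimal sketch, with hypothetical forecasts:

```python
def brier_score(probs: tuple[float, float, float], outcome: int) -> float:
    """Brier score for one match, summed over the three outcomes.

    probs: (p_home, p_draw, p_away); outcome: 0 = home win, 1 = draw, 2 = away win.
    """
    return sum((p - (1.0 if i == outcome else 0.0)) ** 2 for i, p in enumerate(probs))


def mean_brier(forecasts, outcomes) -> float:
    """Average Brier score over many matches (lower is better)."""
    return sum(brier_score(p, o) for p, o in zip(forecasts, outcomes)) / len(outcomes)


uniform = (1 / 3, 1 / 3, 1 / 3)
favourite = (0.80, 0.15, 0.05)  # hypothetical forecast
print(f"uniform forecast: {brier_score(uniform, 0):.3f}")  # same for any outcome
print(f"favourite wins:   {brier_score(favourite, 0):.3f}")
print(f"favourite loses:  {brier_score(favourite, 2):.3f}")
```

The asymmetry is the point: a confident forecast earns a very low score when it is right and a heavy penalty when it is wrong, so the average rewards probabilities that match how often things actually happen.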
One notable project by Hicruben on GitHub ran a walk-forward backtest on 920 real international matches from October 2023 to May 2026, predicting each match using only data available before it. The model achieved 61% correct result prediction overall and 67% when a clear favourite existed. Simply picking the higher-rated team without any Poisson modelling hit only 60%. The mathematical framework adds a measurable edge.
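A walk-forward backtest predicts each match from the results available before it, then moves one match forward. The sketch below shows only the loop, with a deliberately simple, hypothetical predictor (back the side with the better goal difference so far) standing in for a real rating model; the function names and toy results are ours.

```python
from typing import Callable, Sequence

# (home_team, away_team, home_goals, away_goals), in chronological order
Match = tuple[str, str, int, int]


def result(m: Match) -> str:
    """'H', 'D' or 'A' for a finished match."""
    if m[2] > m[3]:
        return "H"
    if m[2] == m[3]:
        return "D"
    return "A"


def walk_forward_accuracy(
    matches: Sequence[Match],
    predict: Callable[[Sequence[Match], str, str], str],
    warmup: int,
) -> float:
    """Share of correct picks when each match is predicted from earlier matches only."""
    if warmup >= len(matches):
        raise ValueError("warmup leaves no matches to test")
    hits = 0
    for i in range(warmup, len(matches)):
        history = matches[:i]  # match i and everything after it stay hidden
        home, away = matches[i][0], matches[i][1]
        if predict(history, home, away) == result(matches[i]):
            hits += 1
    return hits / (len(matches) - warmup)


def goal_difference_pick(history: Sequence[Match], home: str, away: str) -> str:
    """Hypothetical baseline: back whichever side has the better goal difference so far."""
    gd: dict[str, int] = {}
    for h, a, hg, ag in history:
        gd[h] = gd.get(h, 0) + hg - ag
        gd[a] = gd.get(a, 0) + ag - hg
    return "H" if gd.get(home, 0) >= gd.get(away, 0) else "A"


matches = [
    ("Alpha", "Beta", 2, 1),
    ("Gamma", "Delta", 0, 0),
    ("Alpha", "Gamma", 1, 0),
    ("Beta", "Delta", 2, 2),
    ("Delta", "Alpha", 0, 3),
    ("Gamma", "Beta", 1, 2),
]
print(walk_forward_accuracy(matches, goal_difference_pick, warmup=2))
```

Because the predictor never sees the match it is scored on, the accuracy is an out-of-sample estimate; refitting a heavier model at every step is slower but follows the same loop.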
Where Poisson Breaks Down
No model is perfect. The Poisson approach has well-documented limitations:
- Independence assumption: Goals are not truly independent. A team leading 2-0 plays differently than one trailing 0-2. The Poisson model ignores score-state effects.
- Over-dispersion: Real goal distributions have higher variance than Poisson predicts, so lopsided results (7-0, 5-1) turn up more often than the model expects.
- No context: Injuries, suspensions, fixture congestion, motivation, weather, and tactical matchups are invisible to a pure Poisson model. These factors matter enormously.
- Small samples: In a 20-team league, each team plays only 19 home and 19 away matches per season, so attack and defence strength estimates carry significant uncertainty.
- No time decay: A goal scored in August counts the same as one scored in May. Recent form is not automatically weighted more heavily.
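Over-dispersion is easy to check on your own data: for a Poisson variable the variance equals the mean, so the variance-to-mean ratio should sit near 1. A minimal sketch with hypothetical goal counts:

```python
from statistics import mean, variance


def dispersion_index(goal_counts: list[int]) -> float:
    """Sample variance divided by mean: about 1 for Poisson counts, above 1 if over-dispersed."""
    return variance(goal_counts) / mean(goal_counts)


# Hypothetical goals scored by one team across 16 matches
goals = [0, 1, 1, 2, 0, 3, 1, 0, 2, 1, 5, 0, 1, 2, 0, 7]
print(f"mean {mean(goals):.2f}, variance {variance(goals):.2f}, index {dispersion_index(goals):.2f}")
```

One caveat: pooling teams with different scoring rates pushes the ratio above 1 even if each team's goals are Poisson, so compare like with like (for example, one team's matches) before drawing conclusions.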
David Sheehan, whose Python Poisson tutorial is one of the most-referenced on the internet, puts it bluntly: "A simple Poisson model is, well, overly simplistic. But it's a good starting point and a nice intuitive way to learn about statistical modelling."
Poisson vs xG: How They Relate
Expected Goals (xG) and Poisson models are often confused, but they operate at different levels:
- xG is bottom-up: It rates individual shots based on distance, angle, body part, and defensive pressure. A shot from 6 yards with an open goal might have 0.85 xG; a 30-yard speculative effort might have 0.03.
- Poisson is top-down: It takes aggregate goal data and models the distribution of match scores. It doesn't care about individual shots.
The modern approach combines both. Instead of using raw goals scored as the input for λ, advanced models use xG values, which better reflect the quality of chances created rather than the luck involved in finishing. A team that generates 2.5 xG per match but scores only 1.8 is likely to see its scoring rise toward its xG as finishing luck evens out; feeding its xG into a Poisson model captures this signal.
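Swapping goals for xG changes only the input to the strength calculation. In this sketch the team's per-match numbers and the league average home xG of 1.50 are hypothetical; the key design choice is to divide xG by the league's average xG, not its average goals, so the ratio stays on a like-for-like scale.

```python
def attack_strength(per_match: list[float], league_avg: float) -> float:
    """Team's average per-match value divided by the matching league average."""
    return (sum(per_match) / len(per_match)) / league_avg


# Hypothetical home matches for one team
home_goals = [1, 3, 0, 2, 1]
home_xg = [2.1, 2.6, 1.9, 2.4, 1.7]
league_home_goals = 1.55  # H_avg from Step 2
league_home_xg = 1.50     # hypothetical league average home xG

print(f"goal-based attack: {attack_strength(home_goals, league_home_goals):.2f}")
print(f"xG-based attack:   {attack_strength(home_xg, league_home_xg):.2f}")
```

Here the goal-based rating (about 0.90) undersells the team's chance creation (about 1.43 on xG), which is the gap an xG-fed model is meant to pick up.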
Several 2026 World Cup prediction projects on GitHub use exactly this pipeline: xG data feeds into a bivariate Poisson model (Dixon-Coles variant), which then runs Monte Carlo simulations to forecast tournament outcomes.
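Monte Carlo simulation replaces exact enumeration with repeated random draws. For a single match the exact matrix is easier, but simulation scales to whole tournaments, where group tables and knockout brackets make enumeration impractical. The sketch below covers only the single-match building block; to stay within the standard library it samples Poisson goal counts with Knuth's multiplication method (with NumPy you could use `numpy.random.Generator.poisson` instead). The function names, seed and simulation count are illustrative.

```python
import math
import random


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) count with Knuth's multiplication method.

    Runtime grows with lam, which is fine for football-sized rates.
    """
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate_match(lam_home: float, lam_away: float,
                   n_sims: int = 100_000, seed: int = 42) -> list[float]:
    """Monte Carlo estimate of [home win, draw, away win] probabilities."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    counts = [0, 0, 0]
    for _ in range(n_sims):
        h = sample_poisson(lam_home, rng)
        a = sample_poisson(lam_away, rng)
        if h > a:
            counts[0] += 1
        elif h == a:
            counts[1] += 1
        else:
            counts[2] += 1
    return [c / n_sims for c in counts]


print([round(p, 3) for p in simulate_match(1.8, 1.2)])
```

With 100,000 simulations the estimates typically land within a few tenths of a percentage point of the exact matrix sums; a tournament simulation repeats this for every fixture and tallies who advances.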
Modern Extensions: Beyond Basic Poisson
Researchers have extended the Poisson framework in several directions:
- Bivariate Poisson: Models home and away goals jointly rather than independently, using a shared component to capture correlation between them.
- Zero-inflated Poisson: Adds extra probability mass to the 0-0 scoreline, addressing the fact that goalless draws are more common than standard Poisson predicts.
- Cox processes: Doubly stochastic Poisson models where the rate parameter λ itself varies randomly — useful for modelling in-match goal intensity that changes with game state.
- Environmental modifiers: Some 2026 models integrate altitude, weather, pitch condition, and crowd size as adjustments to λ.
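The bivariate Poisson has a compact closed form. This sketch uses the common trivariate-reduction construction, in which home goals X = X1 + X3 and away goals Y = X2 + X3 share an independent component X3; the parameter values are hypothetical. Because Cov(X, Y) = λ3 ≥ 0, this particular form can only express non-negative correlation between the two scores.

```python
import math


def bivariate_poisson_pmf(x: int, y: int, lam1: float, lam2: float, lam3: float) -> float:
    """P(X = x, Y = y) where X = X1 + X3 and Y = X2 + X3 with independent
    X1 ~ Poisson(lam1), X2 ~ Poisson(lam2), X3 ~ Poisson(lam3)."""
    base = (
        math.exp(-(lam1 + lam2 + lam3))
        * lam1**x / math.factorial(x)
        * lam2**y / math.factorial(y)
    )
    ratio = lam3 / (lam1 * lam2)
    # Sum over k, the possible values of the shared component X3
    shared = sum(
        math.comb(x, k) * math.comb(y, k) * math.factorial(k) * ratio**k
        for k in range(min(x, y) + 1)
    )
    return base * shared


# Hypothetical parameters: mean home goals 1.5 + 0.2, mean away goals 1.0 + 0.2
lam1, lam2, lam3 = 1.5, 1.0, 0.2
for score in [(0, 0), (1, 1), (2, 1)]:
    print(score, round(bivariate_poisson_pmf(*score, lam1, lam2, lam3), 4))
```

With λ3 = 0 the sum collapses to its first term and the model reduces to two independent Poisson distributions, so it nests the basic model.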
A 2023 arXiv paper by Maia et al. on "Stochastic Modelling of Football Matches" found that red cards decrease goal intensity by more than 30%, while trailing teams increase their goal rate by 10-20%. These are exactly the kind of dynamic adjustments that basic Poisson misses.
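In a simple in-play version of the model, such effects can enter as multipliers on the remaining scoring rate. This is a sketch under strong assumptions, not Maia et al.'s model: it treats the scoring rate as constant across 90 minutes, ignores stoppage time, applies the red-card reduction to the side that lost a player, and uses hypothetical multipliers picked inside the ranges quoted above.

```python
import math


def remaining_lambda(lam_match: float, minute: int, multiplier: float = 1.0) -> float:
    """Expected goals still to come, assuming a constant per-minute scoring rate
    over 90 minutes (stoppage time ignored), scaled by a game-state multiplier."""
    minutes_left = max(0, 90 - minute)
    return lam_match * minutes_left / 90 * multiplier


lam_match = 1.5  # hypothetical pre-match expected goals for one team
# Hypothetical multipliers chosen inside the ranges quoted in the text
scenarios = {"level": 1.0, "trailing (+15%)": 1.15, "after a red card (-35%)": 0.65}
for label, mult in scenarios.items():
    lam_rest = remaining_lambda(lam_match, minute=60, multiplier=mult)
    p_goal = 1 - math.exp(-lam_rest)  # Poisson P(at least one more goal)
    print(f"{label:<24} remaining λ {lam_rest:.2f}, P(scores again) {p_goal:.1%}")
```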
How to Use Poisson Predictions on FanPick
Understanding Poisson output gives you a structured edge in prediction games. Here's how to apply it:
- Calculate match outcome probabilities — home win, draw, away win — before placing your prediction.
- Identify value in correct score markets — the probability matrix shows which scorelines are most likely.
- Weight your confidence — when Poisson gives a team 80%+ win probability, that's a high-confidence pick. When it's 45-55%, treat it as a coin flip.
- Combine with context — use Poisson as your baseline, then adjust for injuries, form, and motivation.
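The confidence rule of thumb above can be written as a small helper. The 80% and 55% cut-offs follow the text; the "medium" band in between is our own hypothetical label, since the text does not name it.

```python
def confidence_tier(p_home: float, p_draw: float, p_away: float) -> tuple[str, str]:
    """Return (most likely outcome, confidence tier) for one match."""
    probs = {"home": p_home, "draw": p_draw, "away": p_away}
    pick = max(probs, key=probs.get)
    top = probs[pick]
    if top >= 0.80:
        tier = "high"
    elif top > 0.55:
        tier = "medium"  # hypothetical band: the text does not name 55-80%
    else:
        tier = "low (coin flip)"  # at or below 55%, including the 45-55% band
    return pick, tier


print(confidence_tier(0.889, 0.084, 0.027))
print(confidence_tier(0.48, 0.27, 0.25))
```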
Key Takeaways
- The Poisson distribution models the probability of each team scoring a specific number of goals, using an expected goals rate (λ) derived from historical data.
- Building a basic model requires only league averages and team-level attack/defence strength ratings — no advanced statistics background needed.
- The Dixon-Coles upgrade (1997) corrects the model's tendency to underestimate low-scoring draws and remains the academic gold standard.
- 2026 World Cup Poisson models achieve 61-67% accuracy on match outcomes, beating naive baselines of 49-60%.
- Modern models combine Poisson with xG data and Monte Carlo simulation for tournament-level forecasting.
- The model's main weakness is context blindness — it cannot see injuries, motivation, or tactical matchups. Use it as a baseline, not a crystal ball.