Feature Engineering for Football Prediction Models — The Inputs That Actually Matter
June 26, 2026 · 11 min read
You can build the most elegant Poisson model or train a gradient boosting algorithm on millions of shots — but if your input features are garbage, your predictions will be too. Feature engineering is the unglamorous work that separates models that beat the market from models that look good on paper.
Why Features Beat Algorithms
A 2024 study by Robberechts and Davis at the ECML-PKDD conference found that the choice of input features affects prediction accuracy more than the choice of algorithm. A logistic regression with well-engineered features outperformed a neural network trained on raw match statistics. The reason is straightforward: algorithms learn patterns, but features define what patterns are possible.
Professional football prediction models at clubs like Liverpool and Brentford use 50+ variables per match. But you do not need an Opta subscription or a data science PhD to build useful features. The public data available through FBref, Understat, and Football-Data.co.uk gives you enough raw material to construct features that capture team strength, recent form, tactical context, and match conditions.
The key insight is this: raw statistics like "shots on target" or "possession percentage" are inputs. Features are transformations of those inputs that carry predictive signal. The difference between 12 shots and 15 shots is noise. The difference between a team averaging 2.1 xG over their last 5 matches and 0.8 xG over the same span — that is signal.
The Three Tiers of Football Features
Every football prediction model draws from three tiers of data. Understanding each tier helps you decide where to invest your effort for the biggest accuracy gains.
Tier 1: Raw Match Statistics
These are the numbers you see in any post-match report: goals, shots, shots on target, possession, passes completed, corners, fouls, yellow cards. They are freely available from dozens of sources and easy to collect.
The problem with raw statistics is that they are noisy and context-dependent. A team that had 65% possession against a bottom-of-the-table side is not the same as a team that had 65% possession against Manchester City. Raw possession tells you almost nothing about future performance without adjusting for opponent strength.
Useful raw features to collect:
- Goals scored and conceded (home/away split): Home and away performance differs dramatically — the home team wins roughly 45% of matches across top European leagues.
- Shots and shots on target: Volume metrics that indicate attacking intent, though quality matters more.
- Corners and set pieces: Often overlooked, but set-piece goals account for roughly 25-30% of all goals in top leagues.
- Cards and fouls: Indicators of playing style — aggressive pressing teams concede more fouls but also win the ball higher up the pitch.
Tier 2: Advanced Metrics
This is where prediction accuracy starts to improve meaningfully. Advanced metrics adjust raw statistics for context, quality, and opponent strength. The most important ones for prediction models include:
- Expected Goals (xG): The probability that each shot results in a goal, based on shot location, angle, body part, and assist type. A team averaging 2.1 xG per match is creating high-quality chances, regardless of whether those chances are being converted.
- Expected Goals Against (xGA): The xG of shots conceded. This measures defensive quality more reliably than goals conceded, which is heavily influenced by goalkeeper performance and luck.
- Expected Assists (xA): The probability that a completed pass becomes an assist. Captures creative quality beyond just the final shot.
- PPDA (Passes Per Defensive Action): Measures pressing intensity. Teams with low PPDA press aggressively high up the pitch; teams with high PPDA sit deep and concede possession.
- Progressive passes and carries: Passes and carries that move the ball at least 10 yards toward goal or into the penalty area. Measures how effectively a team advances the ball.
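To make the PPDA definition concrete, here is a minimal sketch that divides the passes a team allows its opponent by the team's own defensive actions. The function name, the set of actions that count, and the sample numbers are assumptions for illustration; data providers differ on the exact pitch zone and action types they use.

```python
def ppda(opponent_passes: int, defensive_actions: int) -> float:
    """Passes allowed per defensive action; lower means more intense pressing.

    Assumption: both counts cover the same pitch zone, and defensive_actions
    sums tackles, interceptions, challenges, and fouls.
    """
    if defensive_actions <= 0:
        raise ValueError("defensive_actions must be positive")
    return opponent_passes / defensive_actions


# Hypothetical match counts, not real data.
pressing_team = ppda(opponent_passes=240, defensive_actions=30)
deep_block_team = ppda(opponent_passes=420, defensive_actions=24)

print(pressing_team)    # 8.0
print(deep_block_team)  # 17.5
```

The lower value belongs to the side that presses high; the higher value belongs to the side that sits deep.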
Tier 3: Engineered and Derived Features
This is where the real work happens. Tier 3 features are combinations, transformations, and aggregations of Tier 1 and Tier 2 data that capture patterns invisible in raw numbers. This is what separates a spreadsheet from a prediction model.
The Five Essential Engineered Features
1. Time-Decay Form
The simplest form feature is a rolling average of the last N matches. But this treats a match from last week identically to a match from two months ago. Time-decay weighting fixes this by giving more weight to recent matches.
A common approach is exponential decay: weight = λ^(days_ago), where λ is a decay factor between 0 and 1. A λ of 0.95 means a match 30 days ago gets about 21% of the weight of today's match. A λ of 0.99 is nearly flat; 0.90 is aggressive recency bias.
Research from the Journal of Sports Analytics suggests that a 5-match rolling window with exponential decay (λ ≈ 0.95) captures team form better than longer windows. Short windows (3 matches) are too volatile; long windows (15+ matches) converge to overall team strength and lose the form signal entirely.
League position converges to underlying team strength after roughly 15-20 matches. Before that threshold, form-based features carry significant predictive value. After it, you are mostly measuring noise around a team's true quality.
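The exponential decay weighting described above can be sketched in a few lines. This is an illustrative version, not a reference implementation: the function names, the choice to normalise the weights so they sum to 1, and the sample xG values are all assumptions.

```python
from datetime import date


def decay_weight(days_ago: int, lam: float = 0.95) -> float:
    """Exponential decay weight: lam ** days_ago."""
    return lam ** days_ago


def time_decay_form(matches, as_of, lam=0.95, window=5):
    """Weighted average of the last `window` values played before `as_of`.

    matches: list of (match_date, value) tuples, e.g. value = xG created.
    Weights are normalised to sum to 1 (an assumption for this sketch).
    """
    past = sorted(
        (m for m in matches if m[0] < as_of),
        key=lambda m: m[0],
        reverse=True,
    )[:window]
    if not past:
        return None
    weights = [decay_weight((as_of - played).days, lam) for played, _ in past]
    weighted_sum = sum(w * value for w, (_, value) in zip(weights, past))
    return weighted_sum / sum(weights)


# Hypothetical xG values for one team, not real data.
xg_history = [
    (date(2026, 4, 4), 0.8),
    (date(2026, 4, 11), 1.0),
    (date(2026, 4, 18), 1.4),
    (date(2026, 4, 25), 1.9),
    (date(2026, 5, 2), 2.1),
]
form = time_decay_form(xg_history, as_of=date(2026, 5, 9))
plain_mean = sum(v for _, v in xg_history) / len(xg_history)

print(round(decay_weight(30), 2))  # 0.21
print(form > plain_mean)           # True: the recent, stronger matches count more
```

Filtering on `m[0] < as_of` keeps the feature strictly pre-match, so the same function can be reused safely in backtests.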
2. Strength-of-Schedule Adjustment
A team that has beaten three relegation candidates is not the same as a team that has beaten three top-six sides. Strength-of-schedule (SoS) adjustment normalizes performance metrics by opponent quality.
The simplest approach: calculate each opponent's Elo rating or league position, then adjust the team's xG, goals, and results by the average opponent strength. A team averaging 1.8 xG against opponents with an average Elo of 1500 is performing at a different level than a team averaging 1.8 xG against opponents with an average Elo of 1700.
For prediction purposes, this feature is especially valuable early in the season when sample sizes are small. A team's 3-0-0 record means little if all three wins came against bottom-half teams. The SoS-adjusted version tells you whether those wins were dominant (high xG against weak opponents) or fortunate (low xG, clinical finishing against weak opponents).
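One simple way to apply a strength-of-schedule adjustment is to scale each match value by the opponent's Elo relative to a baseline. The linear scaling rule, the baseline of 1500, and the function name below are assumptions chosen for illustration; other adjustment schemes (regression-based, for example) are equally plausible.

```python
def sos_adjusted_mean(values, opponent_elos, baseline_elo=1500.0):
    """Average of per-match values, each scaled by opponent strength.

    Assumption: a value earned against an opponent rated above the baseline
    is scaled up in proportion, and scaled down against weaker opponents.
    """
    if not values or len(values) != len(opponent_elos):
        raise ValueError("values and opponent_elos must be non-empty and equal length")
    adjusted = [v * (elo / baseline_elo) for v, elo in zip(values, opponent_elos)]
    return sum(adjusted) / len(adjusted)


# Two hypothetical teams with the same raw xG but different schedules.
raw_xg = [1.8, 1.8, 1.8]
vs_average_sides = sos_adjusted_mean(raw_xg, [1500, 1500, 1500])
vs_strong_sides = sos_adjusted_mean(raw_xg, [1700, 1700, 1700])

print(round(vs_average_sides, 2))  # 1.8
print(round(vs_strong_sides, 2))   # 2.04
```

The same 1.8 xG reads as a stronger attacking performance once the tougher schedule is taken into account.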
3. Home/Away Splits
Home advantage has declined over the past century — from roughly 60% home win rates in the early 1900s to about 45% today — but it remains one of the strongest single features in any football model. The key is to split all performance metrics by venue.
Instead of using a team's overall xG of 1.5 per match, use their home xG (say, 1.9) when predicting a home match and their away xG (say, 1.1) when predicting an away match. This simple split typically improves prediction accuracy by 2-4 percentage points in backtesting.
Research on matches played in empty stadiums during the COVID-19 pandemic found that home advantage dropped by roughly 40%. This suggests that crowd influence is a major component of home advantage, a factor worth tracking separately if you have attendance data.
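A venue split can be sketched as follows. The match record layout (dictionaries with `home`, `away`, `home_xg`, `away_xg` keys), the team labels, and the values are hypothetical; adapt them to whatever your data source provides.

```python
def venue_split_xg(matches, team):
    """Return the team's average xG at home, away, and overall."""
    home = [m["home_xg"] for m in matches if m["home"] == team]
    away = [m["away_xg"] for m in matches if m["away"] == team]

    def mean(xs):
        return sum(xs) / len(xs) if xs else None

    return {"home": mean(home), "away": mean(away), "overall": mean(home + away)}


def xg_for_fixture(split, is_home):
    """Pick the venue-specific average that matches the upcoming fixture."""
    return split["home"] if is_home else split["away"]


# Hypothetical results, not real data.
matches = [
    {"home": "A", "away": "B", "home_xg": 2.0, "away_xg": 0.9},
    {"home": "C", "away": "A", "home_xg": 1.4, "away_xg": 1.0},
    {"home": "A", "away": "D", "home_xg": 1.8, "away_xg": 1.1},
    {"home": "B", "away": "A", "home_xg": 1.6, "away_xg": 1.2},
]
split = venue_split_xg(matches, "A")
print({k: round(v, 2) for k, v in split.items()})
# {'home': 1.9, 'away': 1.1, 'overall': 1.5}
```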
4. Elo Ratings as a Strength Baseline
Elo ratings provide a single number that captures team strength, updated after every match. They are more than a standalone feature: they are the foundation upon which other features are built. A team's Elo rating tells you their overall quality; their xG tells you how they create chances; their form tells you how they are playing right now.
The most useful Elo-derived features include:
- Elo difference: Home Elo minus Away Elo. The single strongest predictor of match outcome in most models.
- Elo-adjusted form: Time-decay form weighted by opponent Elo. A 5-match winning streak against top-50 Elo opponents means more than the same streak against bottom-50 opponents.
- Elo momentum: The rate of change in Elo over the last 10 matches. Captures whether a team is improving or declining beyond their current level.
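The Elo-derived features above can be sketched with the standard Elo formulas. The K-factor of 20, the optional home-advantage offset, and the definition of momentum as average rating change per match are assumptions for this example; football Elo systems vary in all three.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expectation for side A on the usual 400-point scale."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(home, away, home_result, k=20.0, home_adv=0.0):
    """Return updated (home, away) ratings.

    home_result: 1.0 for a home win, 0.5 for a draw, 0.0 for an away win.
    k and home_adv are tunable assumptions, not fixed constants.
    """
    delta = k * (home_result - expected_score(home + home_adv, away))
    return home + delta, away - delta


def elo_difference(home, away):
    """Home Elo minus away Elo."""
    return home - away


def elo_momentum(history, window=10):
    """Average rating change per match over the last `window` matches.

    history: ratings after each match, oldest first.
    """
    recent = history[-(window + 1):]
    if len(recent) < 2:
        return 0.0
    return (recent[-1] - recent[0]) / (len(recent) - 1)


print(update_elo(1500.0, 1500.0, 1.0))         # (1510.0, 1490.0)
print(elo_difference(1650.0, 1580.0))          # 70.0
print(elo_momentum([1500.0, 1510.0, 1520.0]))  # 10.0
```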
5. Context and Situational Features
Context features capture the circumstances surrounding a match that pure statistics miss. They are often the difference between a model that looks good in backtesting and one that actually works in production.
- Schedule congestion: Matches played in the last 7, 14, and 30 days. Teams playing their third match in eight days perform measurably worse — roughly 0.15 fewer xG per match, according to data from the Premier League.
- Rest days: Days since the last match. The sweet spot is 4-6 days of rest. Fewer than 3 days increases injury risk and reduces pressing intensity; more than 10 days can indicate a break that disrupts rhythm.
- Travel distance: Especially relevant in continental competitions and the World Cup. Teams traveling across multiple time zones show measurable performance dips in the first match after arrival.
- Derby and rivalry indicator: A binary feature for matches between historical rivals. Derby matches show higher card counts, more fouls, and a compressed outcome distribution — the weaker team performs better than their Elo would predict.
- Referee tendencies: Historical card rates, foul rates, and penalty award rates for the assigned referee. Some referees award 40% more penalties than average — a meaningful edge in markets that do not price this in.
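Rest days and schedule congestion are straightforward to derive from a list of match dates. The sketch below assumes you already have each team's previous match dates; the function names, output keys, and fixture list are hypothetical.

```python
from datetime import date


def rest_days(previous_matches, kickoff):
    """Days between the most recent earlier match and the upcoming kickoff."""
    past = [d for d in previous_matches if d < kickoff]
    if not past:
        return None
    return (kickoff - max(past)).days


def congestion(previous_matches, kickoff, windows=(7, 14, 30)):
    """Count matches played within each look-back window before kickoff."""
    counts = {}
    for w in windows:
        counts[f"matches_last_{w}d"] = sum(
            1 for d in previous_matches if 0 < (kickoff - d).days <= w
        )
    return counts


# Hypothetical fixture list, not real data.
played = [date(2026, 3, 1), date(2026, 3, 8), date(2026, 3, 14), date(2026, 3, 17)]
kickoff = date(2026, 3, 21)

print(rest_days(played, kickoff))   # 4
print(congestion(played, kickoff))
# {'matches_last_7d': 2, 'matches_last_14d': 3, 'matches_last_30d': 4}
```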
Feature Selection: What to Keep and What to Cut
More features are not always better. Correlated features add noise without adding signal. If your model includes both "goals scored" and "xG scored," you are feeding it two measurements of roughly the same thing. The model cannot cleanly separate their effects, and both become less useful as a result.
Three practical rules for feature selection:
- Check correlations first. If two features have a Pearson correlation above 0.85, keep the one with better predictive power in isolation. For football, this typically means keeping xG over goals, Elo over league position, and PPDA over possession.
- Test each feature individually. Before adding a feature to your model, check its standalone predictive power. A feature that cannot improve a simple baseline model on its own is unlikely to help a complex one.
- Use cross-validation, not intuition. Your gut feeling about which features matter is wrong more often than you think. Use time-series cross-validation (train on past, test on future) to evaluate each feature's contribution objectively.
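The first and third rules can be sketched without any external libraries. The snippet below flags feature pairs whose Pearson correlation exceeds 0.85 and generates expanding-window splits in which training data always precedes test data. The function names, the fold layout, and the toy feature values are assumptions for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no correlation information
    return cov / math.sqrt(vx * vy)


def correlated_pairs(features, threshold=0.85):
    """List (name_a, name_b, r) for pairs with |r| above the threshold."""
    names = sorted(features)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(features[a], features[b])
            if abs(r) > threshold:
                flagged.append((a, b, round(r, 3)))
    return flagged


def walk_forward_splits(n_matches, n_folds=3):
    """Yield (train_idx, test_idx); matches must be in chronological order."""
    fold_size = n_matches // (n_folds + 1)
    if fold_size < 1:
        raise ValueError("not enough matches for the requested folds")
    for k in range(1, n_folds + 1):
        train_end = k * fold_size
        test_end = n_matches if k == n_folds else train_end + fold_size
        yield list(range(train_end)), list(range(train_end, test_end))


# Toy feature columns, not real data.
features = {
    "xg": [1.0, 1.5, 2.0, 2.5, 3.0],
    "goals_scored": [1, 1, 2, 3, 3],
    "rest_days": [3, 7, 4, 6, 5],
}
print(correlated_pairs(features))  # [('goals_scored', 'xg', 0.949)]
for train_idx, test_idx in walk_forward_splits(10, n_folds=3):
    print(train_idx, test_idx)
```

In this toy data only the goals and xG columns are flagged, which is the kind of pair where you would keep one and drop the other.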
Common Feature Engineering Mistakes
Every prediction model builder makes these mistakes at least once. Knowing them in advance saves weeks of debugging.
- Using post-match data to predict the match. This is the most common data leakage error. If you include a match's own "shots on target" as a feature for predicting that match's outcome, you are using information that only exists after the match is played. Use only data available before kickoff, such as averages over previous matches.
- Ignoring the target variable's distribution. Match outcomes are not normally distributed. Draws happen about 25% of the time, and goals follow a Poisson-like distribution. If your model assumes a normal distribution of outcomes, it will systematically overestimate draws and underestimate extreme results.
- Overfitting to historical patterns. Football changes. Rule changes such as the back-pass rule (1992), VAR, and the five-substitution rule each altered how the game is played. A model trained on 2010-2020 data may not generalize to 2026 because the underlying game has changed.
- Treating all matches equally. A Champions League group stage match and a dead-rubber league match on the final day are not the same. Motivation, squad rotation, and tactical approach all differ. Adding a match-importance feature helps the model distinguish between these contexts.
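The leakage mistake is easiest to avoid by construction: build each match's feature only from matches that came before it. A minimal sketch, assuming the values are listed in chronological order (the function name and sample numbers are illustrative):

```python
def pre_match_rolling_mean(values, window=5):
    """For match i, average up to `window` values from matches before i.

    The value from match i itself is never included, so the feature could
    have been computed before kickoff. Returns None when there is no history.
    """
    features = []
    for i in range(len(values)):
        history = values[max(0, i - window):i]
        features.append(sum(history) / len(history) if history else None)
    return features


# Hypothetical shots on target per match, oldest first.
shots_on_target = [4, 6, 5, 7]
print(pre_match_rolling_mean(shots_on_target, window=2))
# [None, 4.0, 5.0, 5.5]
```

The first match has no history, and the last match's feature uses only the two matches played before it.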
Building Your First Feature Pipeline
If you are starting from scratch, here is a minimal feature set that will get you to a working prediction model. These 10 features, properly engineered, can achieve 50-55% accuracy on match outcome prediction — enough to beat most casual predictors and find value in soft markets.
- Home team Elo rating
- Away team Elo rating
- Home team xG per match (last 5, time-decay weighted)
- Away team xG per match (last 5, time-decay weighted)
- Home team xGA per match (last 5, time-decay weighted)
- Away team xGA per match (last 5, time-decay weighted)
- Strength-of-schedule adjusted home form
- Strength-of-schedule adjusted away form
- Rest days for each team
- Head-to-head record (last 5 meetings)
From this foundation, you can add complexity incrementally: PPDA for pressing style, progressive passes for build-up quality, schedule congestion for fatigue effects, and referee tendencies for officiating bias. Each addition should be validated through backtesting before it enters your production model.
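One way to keep the starter feature set organised is a small record type with one field per model input. The field names and sample values below are hypothetical, and encoding head-to-head as the home side's points per game over the last five meetings is an assumption. Note that "rest days for each team" takes two columns, so the vector has 11 numbers.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class MatchFeatures:
    """Pre-match inputs for one fixture (all values known before kickoff)."""
    home_elo: float
    away_elo: float
    home_xg_form: float    # last 5, time-decay weighted
    away_xg_form: float
    home_xga_form: float
    away_xga_form: float
    home_sos_form: float   # strength-of-schedule adjusted
    away_sos_form: float
    home_rest_days: int
    away_rest_days: int
    h2h_home_ppg: float    # assumed encoding of the head-to-head record

    def as_vector(self):
        """Flatten to a list of floats in a fixed column order."""
        return [float(v) for v in astuple(self)]


# Hypothetical fixture, not real data.
row = MatchFeatures(
    home_elo=1650.0, away_elo=1580.0,
    home_xg_form=1.9, away_xg_form=1.1,
    home_xga_form=0.9, away_xga_form=1.4,
    home_sos_form=2.0, away_sos_form=1.0,
    home_rest_days=6, away_rest_days=3,
    h2h_home_ppg=1.8,
)
vector = row.as_vector()
print(len(vector))  # 11
```

A fixed column order makes it harder to misalign features between training and prediction as new inputs are added.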
Key Takeaways
- Features matter more than algorithms. A logistic regression with well-engineered features beats a neural network with raw data. Invest your time in feature quality, not model complexity.
- Time-decay form and strength-of-schedule adjustment are the two highest-impact transformations. They convert noisy raw statistics into predictive signals that capture how a team is actually playing right now.
- Home/away splits and Elo ratings provide the foundation. Nearly every prediction model starts with these two features. They are simple, robust, and carry more signal than most people realize.
- Avoid data leakage at all costs. Only use information that would be available before the match kicks off. Statistics from the match you are predicting are not features; they are part of the outcome you are trying to predict.
- Validate everything with time-series cross-validation. Football changes over time. A feature that worked in 2018 may be useless in 2026 because the rules, tactics, or data landscape have shifted.