Saturday, August 25, 2018

Predicting Baseball's Postseason

Having recently tried my hand at numerically predicting the notoriously unpredictable March Madness basketball tournament, with moderate success, I naturally wanted to try predicting the outcome of the MLB postseason next, since baseball is dearer to my heart and its historical data is so plentiful. In this post I'll describe the groundwork I've done toward this goal so that I'll be ready to generate an optimized bracket when the postseason rolls around. In particular, I'll focus on Elo ratings and how I've tailored them on a team-by-team basis from first principles for predicting MLB postseasons.

Identifying free parameters in Elo

Typically, implementations of the Elo rating system tune only the K-factor, which controls the speed at which ratings respond to performance. This is well-documented, and values have been published for numerous competitive sports. However, focusing only on the K-factor ignores a few important considerations that can improve predictive performance:
  • The relationship between rating difference and prediction certainty
  • The effect of home field advantage
More sophisticated models also take into account weather conditions, individual players (particularly pitchers), and other progressively finer factors. However, for the sake of keeping scope manageable, I'll be looking only at game logs showing the teams, home field, and game score. Everything will be inferred from this basic data.

Relationship between rating differences and prediction certainty

At the heart of the Elo system is the logistic function or "S-curve":

Logistic function; L = 1, k = 1, x_0 = 0; source: Wikipedia

Equally-rated teams should have equal win probabilities, so x_0 must be set to 0. Similarly, a team's win probability cannot exceed 100%, so L must be set to 1. This leaves a single free parameter, k, which controls the slope of the S-curve, i.e. the rate at which prediction confidence changes as the rating gap between the two teams widens.
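
As a minimal sketch of this curve in Python: the function below assumes the logistic is evaluated directly on the rating gap (team A's rating minus team B's), centered on a gap of zero and capped at 1. The name `win_probability` and that exact form are illustrative assumptions, not necessarily what the published source code does.

```python
import math


def win_probability(rating_a: float, rating_b: float, k: float) -> float:
    """Probability that team A beats team B on a neutral field.

    Assumed form: 1 / (1 + exp(-k * (rating_a - rating_b))).
    """
    x = k * (rating_a - rating_b)
    # Branch on the sign of x so math.exp never overflows for large gaps.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


if __name__ == "__main__":
    # With k = 11, a rating gap of 0.1 already gives roughly a 75% favorite.
    for gap in (0.0, 0.05, 0.1, 0.2):
        print(gap, round(win_probability(1000.0 + gap, 1000.0, 11.0), 3))
```

A larger k makes the curve steeper, so the same rating gap translates into a more confident prediction.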

The effect of home field advantage

Historically, teams win about 54.1% of their home games and 45.9% of their away games. In other words, for evenly-matched teams, playing at home confers an advantage of about 4.1 percentage points in win probability, and playing away carries a disadvantage of the same size, relative to a theoretical neutral field.

Despite seasonal noise, this value has been remarkably consistent over the live-ball era. The modern game, which I'm arbitrarily defining as the last 10 years, falls neatly in line with this apparent historical constant:

Suppose home field advantage is set at 4.1%. When calculating Elo, I add this amount to the home team's win probability and subtract it from the away team's. This has the intuitive effect of making away wins slightly more consequential than home wins, and home losses slightly more consequential than away losses.
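
To make that asymmetry concrete, here is a rough sketch of a single-game update. It assumes the textbook Elo rule (rating change = K * (result - expected result)) and a logistic curve evaluated on the raw rating gap; the function names and the clamping of the adjusted probability to [0, 1] are illustrative assumptions.

```python
import math


def neutral_win_probability(gap: float, k: float) -> float:
    """Logistic win probability for a rating gap on a neutral field."""
    x = k * gap
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def home_win_probability(home: float, away: float, k: float, h_f_a: float) -> float:
    """Neutral-field probability shifted toward the home team by h_f_a."""
    p = neutral_win_probability(home - away, k) + h_f_a
    return min(max(p, 0.0), 1.0)  # clamp: an assumption, keeps p a valid probability


def update_ratings(home, away, home_won, K, k, h_f_a):
    """Return (new_home, new_away) after one game; the update is zero-sum."""
    p_home = home_win_probability(home, away, k, h_f_a)
    delta = K * ((1.0 if home_won else 0.0) - p_home)
    return home + delta, away - delta


if __name__ == "__main__":
    # Two evenly rated teams: compare what each side gains from a win.
    h, _ = update_ratings(1000.0, 1000.0, True, K=0.08, k=11.0, h_f_a=0.041)
    _, a = update_ratings(1000.0, 1000.0, False, K=0.08, k=11.0, h_f_a=0.041)
    print("home win gain:", h - 1000.0)
    print("away win gain:", a - 1000.0)
```

For evenly rated teams, the away side was expected to win less often, so its win moves the ratings further than a home win does.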

Optimizing Elo for predictive accuracy

With all this in mind, I define an "Elo configuration" as a set of three values:
  1. K: controls the speed at which ratings adjust to performance
  2. k: controls the relationship between prediction confidence and rating difference
  3. h_f_a: controls how significant the home field advantage is
To optimize these parameters for predictive accuracy, my approach is:
  1. Pick a set of parameters that define an Elo configuration
  2. With this configuration, calculate day-by-day Elo ratings for all teams from the start of the live-ball era, 1920, to present
  3. For each year requested, extract the final game of each postseason series, representing a team being eliminated
  4. For each of these games, look up the regular-season-end Elo ratings for both teams, and predict the winner to be the one with the higher Elo rating
  5. Compare the prediction to the actual result
  6. Loop back to step 1 and repeat with another Elo configuration
  7. ...
  8. After running a sufficiently large number of Elo configurations, select the one that maximizes prediction accuracy over the requested time domain
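
The steps above can be sketched roughly as follows. The data layout is hypothetical: `games` is a chronological list of `(season, home, away, home_won, is_postseason)` tuples, and `series` is a list of `(season, winner, loser)` tuples, one per postseason series. The update rule is the textbook Elo rule, which is an assumption about the details.

```python
import math
from itertools import product

ELO_MEAN = 1000.0  # arbitrary starting rating for every team


def win_prob(gap, k):
    """Logistic win probability for a rating gap (neutral field)."""
    x = k * gap
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def preseason_snapshots(games, K, k, h_f_a):
    """Replay all games in order; return {season: ratings before its postseason}."""
    ratings, snapshots = {}, {}
    for season, home, away, home_won, is_postseason in games:
        if is_postseason and season not in snapshots:
            snapshots[season] = dict(ratings)  # freeze before any postseason game
        r_home = ratings.get(home, ELO_MEAN)
        r_away = ratings.get(away, ELO_MEAN)
        p_home = min(max(win_prob(r_home - r_away, k) + h_f_a, 0.0), 1.0)
        delta = K * ((1.0 if home_won else 0.0) - p_home)
        ratings[home] = r_home + delta
        ratings[away] = r_away - delta
    return snapshots


def accuracy(games, series, K, k, h_f_a):
    """Fraction of series where the higher-rated team (by snapshot) won."""
    snaps = preseason_snapshots(games, K, k, h_f_a)
    correct = 0
    for season, winner, loser in series:
        snap = snaps[season]
        # Ties count as misses in this sketch.
        if snap.get(winner, ELO_MEAN) > snap.get(loser, ELO_MEAN):
            correct += 1
    return correct / len(series)


def best_configuration(games, series, Ks, ks, h_f_as):
    """Exhaustively try every (K, k, h_f_a) combination and keep the best."""
    return max(product(Ks, ks, h_f_as),
               key=lambda cfg: accuracy(games, series, *cfg))
```

Because the snapshot is frozen at the first postseason game of each season, postseason results only influence predictions for later seasons.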
A few other notes on my Elo calculation:
  • Elo mean is arbitrarily set at 1,000
  • Ratings are initialized in 1920, the start of the live-ball era
  • Ratings are carried over directly between seasons, without regressions or resets
  • Postseason games count toward Elo ratings, but only in subsequent seasons, i.e. the predictor uses a snapshot of Elo right before the requested postseason to generate predictions, and does not "peek ahead"
Each Elo configuration takes about 1 second to evaluate on my Dell Precision 5520, which is fairly time-expensive. I approached the optimization by first sweeping the three parameters over logarithmically-spaced ranges to get in the right ballpark (pun intended), then sweeping them over linearly-spaced ranges around the best configuration found in the prior step.
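
A coarse-to-fine sweep of this kind might look like the sketch below. Everything here is illustrative: `score` stands in for "run the full Elo evaluation and return prediction accuracy", and the grid size and the ±50% refinement window are assumed values, not the ones I actually used.

```python
from itertools import product


def logspace(lo, hi, n):
    """n logarithmically spaced values from lo to hi (both positive)."""
    return [lo * (hi / lo) ** (i / (n - 1)) for i in range(n)]


def linspace(lo, hi, n):
    """n linearly spaced values from lo to hi."""
    return [lo + (hi - lo) * i / (n - 1) for i in range(n)]


def coarse_to_fine(score, bounds, n=5, span=0.5):
    """Two-stage grid search over a tuple of parameters.

    score:  callable taking a parameter tuple, higher is better
    bounds: one (lo, hi) pair per parameter for the coarse log sweep
    span:   the fine sweep covers best * (1 - span) .. best * (1 + span)
    """
    coarse = product(*(logspace(lo, hi, n) for lo, hi in bounds))
    best = max(coarse, key=score)
    fine = product(*(linspace(b * (1 - span), b * (1 + span), n) for b in best))
    return max(fine, key=score)


if __name__ == "__main__":
    # Toy objective with a known peak, standing in for the real evaluation.
    def toy_score(cfg):
        K, k, h_f_a = cfg
        return -((K - 0.08) ** 2 + ((k - 11.0) / 10.0) ** 2 + (h_f_a - 0.03) ** 2)

    print(coarse_to_fine(toy_score, [(0.001, 1.0), (0.1, 100.0), (0.001, 1.0)]))
```

At about 1 second per configuration, a 5 x 5 x 5 grid costs roughly two minutes per stage, which is why the coarse pass matters before refining.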

With this approach, a time domain window is necessary to optimize the Elo parameters. I consider two cases, and show the results below:

"Live-ball" era: 1920-present
  • K = 0.075
  • k = 11.0714
  • h_f_a = 0.017857
This configuration has an overall prediction accuracy of 70%. The breakdown by decade is:
  • 1920s: 100%
  • 1930s: 80%
  • 1940s: 80%
  • 1950s: 70%
  • 1960s: 67%
  • 1970s: 83%
  • 1980s: 68%
  • 1990s: 64%
  • 2000s: 69%
  • 2010s: 65%
An interesting trend here is that baseball seems to be becoming progressively less predictable, especially starting in the 1980s. Causes for this would be an interesting topic for research in and of itself.

"Modern" era: 2008-2017
  • K = 0.08
  • k = 11
  • h_f_a = 0.03
This configuration has an overall prediction accuracy of 68%. The breakdown by year is:
  • 2008: 86%
  • 2009: 71%
  • 2010: 57%
  • 2011: 71%
  • 2012: 66%
  • 2013: 44%
  • 2014: 67%
  • 2015: 78%
  • 2016: 56%
  • 2017: 89%
It's worth noting that the "live-ball" configuration from the prior section is very nearly as accurate, and its parameter values are very close. The "modern" configuration correctly predicts 56 of 82 matches, compared to 55 of 82 for the "live-ball" configuration, so the two are nearly interchangeable.

Snapshot of today's teams

What do these Elo configurations say about predicting the postseason? To get some intuition, let's look at how today's teams have fared in the last month:

The ratings here can be thought of as a metric for how well the teams would fare in the postseason if it were to start immediately, i.e. their postseason strength. What's interesting to note is how quickly teams can rise above the pack with a strong win streak or regress to mediocrity with a slump. For example, look at the Pirates, who went from 2nd best to 2nd worst in the last month due to a slump. My takeaway here is that postseason strength is highly dependent on "hotness" - that is, how well a team has been performing recently, as opposed to how well they've done all season. Here's the full breakdown:

Some historical perspectives

The 2017 postseason was an especially predictable one, in the sense that the teams that finished the regular season on top also generally finished the postseason on top, i.e. there were few upsets.

By comparison, look at 2013's postseason:

The Red Sox finished the regular season with mediocre strength, but caught fire and upset several teams on their way to win the World Series.


Once the 2018 postseason bracket is finalized (including the Wild Card games), I'll generate two bracket predictions: one optimized for points (where the value per correct prediction doubles with each successive round), and another optimized for percentage. For the point-optimized bracket, I'll use Monte Carlo simulations in conjunction with the Elo method developed here to account for the probabilistic difficulty of each team's path to the World Series.
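
As a rough preview of the Monte Carlo idea, the sketch below simulates a single-elimination bracket many times and counts how often each team wins it all. It makes several simplifying assumptions: the field size is a power of two, each round is decided by one random draw using the logistic win probability, and home field is ignored. The ratings in the example are made up.

```python
import math
import random
from collections import Counter


def win_prob(gap, k):
    """Logistic win probability for a rating gap (neutral field)."""
    x = k * gap
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def simulate_bracket(ratings, bracket, k, rng):
    """Play one random bracket; adjacent entries in `bracket` meet first."""
    alive = list(bracket)
    while len(alive) > 1:
        next_round = []
        for a, b in zip(alive[0::2], alive[1::2]):
            p_a = win_prob(ratings[a] - ratings[b], k)
            next_round.append(a if rng.random() < p_a else b)
        alive = next_round
    return alive[0]


def title_odds(ratings, bracket, k, trials=10_000, seed=0):
    """Estimate each team's championship probability by repeated simulation."""
    rng = random.Random(seed)
    wins = Counter(simulate_bracket(ratings, bracket, k, rng) for _ in range(trials))
    return {team: wins[team] / trials for team in bracket}


if __name__ == "__main__":
    # Hypothetical ratings for a four-team field.
    ratings = {"A": 1000.10, "B": 1000.00, "C": 1000.05, "D": 999.95}
    print(title_odds(ratings, ["A", "B", "C", "D"], k=11.0))
```

The resulting odds capture path difficulty: a strong team drawn against other strong teams early will show lower title odds than its rating alone suggests.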

Source code

Available on my GitHub, minus the scraper. 
