Pitt associate professor Konstantinos Pelechrinis predicted the Patriots would win the Super Bowl last week. Or, more accurately, his model did.
Pelechrinis has been teaching at the School of Computing and Information at Pitt since 2010. He developed a model two years ago that’s meant to predict the outcome of football games. The model has been rather successful so far, accurately predicting the winner 74 percent of the time in a study released last April.
When Pelechrinis tested to see if the model could predict this year’s Super Bowl, it predicted a 25-22 Patriots victory. The Eagles, of course, won the Super Bowl 41-33 this past Sunday.
After the Eagles’ victory, The Pitt News caught up with Pelechrinis to talk about his model and what this year’s Super Bowl means for it going forward.
The Pitt News: What gave you the idea to build a model to predict football scores?
Konstantinos Pelechrinis: It was more about building a model for teaching purposes. I’m teaching a sports analytics class and that is a really nice way to engage students — building predictive models with some data that students can actually understand and relate to rather than just using abstract data and abstract equations and models. It was mainly because I wanted to give students something they could relate to, but obviously I’m a big sports fan.
TPN: What was the process of creating this model like?
KP: The process was a typical data analysis process, where I collected data from the past 10 seasons of the NFL. We extracted the model parameters from each of the games and identified how they correlate with winning, and then we used some statistical techniques … in order to project how teams are going to perform … and the model was finally giving out the win probability.
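As a rough illustration of the kind of pipeline he describes, here is a minimal sketch in Python, assuming a logistic-regression model fit on per-game feature differences. The features, the synthetic data and the choice of logistic regression are all assumptions made for illustration, not details of Pelechrinis’ actual model.

```python
# Minimal sketch of a win-probability pipeline, using scikit-learn.
# The features and the logistic-regression choice are assumptions for
# illustration; the article does not specify the actual model.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-in for ~10 seasons of per-game data: each row is one game's
# feature differences (home minus away), e.g. passing efficiency,
# rushing efficiency, turnover margin.
n_games = 2560
X = rng.normal(size=(n_games, 3))
true_w = np.array([1.2, 0.6, 0.9])
y = (rng.random(n_games) < 1 / (1 + np.exp(-(X @ true_w + 0.15)))).astype(int)

# Fit a model that maps game features to a home-team win probability.
model = LogisticRegression().fit(X, y)

# Project an upcoming matchup from its (assumed) feature differences
# and read off the win probability the model "finally gives out."
upcoming = np.array([[0.3, -0.1, 0.4]])
print("Home win probability:", model.predict_proba(upcoming)[0, 1])
```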
TPN: Why was the model wrong for this year’s Super Bowl?
KP: The model is a probabilistic model, in the sense that its first object is the win probability. So the win probability was 59 percent for the Patriots and 41 percent for the Eagles. Then there is a simple interpolation, based on how many points on average each team scores, that produces this 25-22. So I’m not that surprised that the model was wrong in regard to the actual score, because the main object of the model is the probability rather than the score. The score is more of a post-processing step that makes some very strong assumptions.
In terms of probability, the Patriots had a higher probability from the model than the Eagles did, but that’s the nice thing, and it’s actually one of the reasons I like to use it as an example for the class. What the model gives is a probability. So if the game were repeated 100 times, you’d expect 59 of them to go to the Patriots and 41 to the Eagles, but you only get one shot.
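His “repeated 100 times” reading of the 59-41 split can be made concrete with a quick simulation. The sketch below only illustrates that interpretation; it is not part of his model.

```python
import numpy as np

rng = np.random.default_rng(1)
p_patriots = 0.59  # the model's stated pre-game win probability

# Treat each simulated replay of the game as a single coin flip that
# comes up "Patriots" with probability 0.59.
replays = rng.random(100_000) < p_patriots
print("Patriots win share across replays:", replays.mean())  # ~0.59

# Any single replay still goes to the Eagles about 41 percent of the
# time, which is why one wrong pick does not by itself refute the model.
```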
TPN: Was there anything from the game that surprised you that was hard to account for?
KP: The biggest play call, obviously, would be the fourth-and-1 at the end of the second quarter, where the Eagles scored a touchdown instead of settling for three points. We have other models, not the same one we’re discussing, but other in-game models that supported this decision. Also, what surprised me a lot from the Patriots’ point of view: in their first drive, they faced a fourth-and-4 on the 8-yard line. Instead of going for it, they settled for a field goal, but then on their next drive they faced another fourth-and-4, went for it and failed.
So two decisions from Bill Belichick were very big surprises for me because we are not used to seeing Belichick make bad decisions, and these were bad decisions from the perspective of win probability.
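The win-probability framing of those fourth-down calls comes down to a simple expected-value comparison. Here is a toy sketch of that logic; every number in it is a made-up placeholder, not output from Pelechrinis’ in-game models.

```python
# Toy win-probability comparison for a fourth-down call. All numbers
# below are hypothetical placeholders used only to show the structure
# of the comparison.
p_convert = 0.60        # assumed chance of converting the fourth down
wp_if_convert = 0.55    # assumed win probability after a successful conversion
wp_if_fail = 0.42       # assumed win probability after a failed attempt
wp_kick = 0.47          # assumed win probability after kicking the field goal

wp_go = p_convert * wp_if_convert + (1 - p_convert) * wp_if_fail
print(f"Go for it: {wp_go:.3f} vs. kick: {wp_kick:.3f}")
# In this framing, the "right" call is whichever branch leaves the
# team with the higher expected win probability.
```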
TPN: What about the botched snap on the extra point and the missed field goal?
KP: Yeah, so obviously you have to take chances, but that one was messed up by the unit. Going for the extra point there was the right decision; the unit just messed up the execution. Now, Philly made a decision that some people may have questioned, where they missed their first extra point and then later went for two in order to somehow leverage that first miss. The data didn’t really support one or the other; both gave about the same win probability.
TPN: Will you make adjustments or will you leave the model as-is?
KP: What I’m doing is, after every season, I’m recreating the model, and I’m keeping track of how the model parameters change. So from year to year there are not big changes. But, for example, if you use data from 20 years back, obviously the model is completely different. Now you have a pass-first game, so the passing statistics are way more inflated than in the past. So I do make a few tweaks from year to year. They are not big changes.
TPN: Are there any games in particular where your model was spot-on or really close to being spot-on?
KP: The way we evaluate the model is we take all the games and look at what win probability we give the home team. Let’s say we then take all the games where the home team was given a 60 percent probability and we ask, “Okay, of these games, how many times did the home team win?” If our probabilities are correct, you expect the home team to win in 60 percent of those instances, which also means there’s a 40 percent probability they lose.
So at the end of the day, what you are building is called a reliability curve, where you have your predicted probability on the x-axis and the actual outcome rate for those instances on the y-axis. And if the points fall close to the diagonal, then your model gives you very good probabilities.
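For readers who want to see what that evaluation looks like in practice, here is a small sketch of computing such a reliability (calibration) curve. The synthetic data and the ten-bin scheme are assumptions for illustration, not Pelechrinis’ actual evaluation code.

```python
import numpy as np

def reliability_curve(pred_prob, outcomes, n_bins=10):
    """Bin predicted home-win probabilities and pair each bin's mean
    prediction (x-axis) with the observed home-win rate (y-axis)."""
    pred_prob = np.asarray(pred_prob, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(pred_prob, edges) - 1, 0, n_bins - 1)
    xs, ys = [], []
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            xs.append(pred_prob[mask].mean())  # mean predicted probability
            ys.append(outcomes[mask].mean())   # observed win rate in the bin
    return np.array(xs), np.array(ys)

# Synthetic, well-calibrated predictions: the paired columns should sit
# close to the diagonal, which is what a well-calibrated model looks like.
rng = np.random.default_rng(2)
p = rng.random(5000)
wins = (rng.random(5000) < p).astype(float)
print(np.round(np.column_stack(reliability_curve(p, wins)), 2))
```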