tree-of-thought OP t1_j8171sh wrote on February 10, 2023 at 9:34 PM

Source: nflfastr. They have rich play-by-play data for every NFL game of the last ~25 years. I was able to get a time series of win probability for every Super Bowl since the 2000 season.

Tools: R. nflfastr to get the data, data.table to clean it and develop excitement scoring metrics, and ggplot2 to visualize. My collaborator built interactive visualizations for this project in Flourish. Those visualizations are linked lower in this comment.

Explanation: I've seen win probability graphs used as a shorthand for the excitement of a game. I wanted to develop a metric which takes in a win probability time series and outputs an "excitement score." Ultimately, I decided on three different factors that should contribute to the excitement score...

How close is the eventual winner's average win probability to zero? This is intended to capture how surprising the eventual outcome was.
What is the average absolute distance between the win probability and 50%? This is intended to capture how closely contested the game was.
What is the root mean square of all the changes in win probability from one play to the next? This is intended to capture how "back and forth" the game was.
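Here is a rough sketch of those three factors, in Python rather than the R/data.table code actually used for the project. It assumes `wp` is a list of the eventual winner's win probability after each play, on a 0 to 1 scale. The function names are made up for illustration, and flipping the first two so that a higher number means more exciting (`1 - mean` and `0.5 - mean distance`) is my assumption, not something spelled out above.

```python
import math


def surprise(wp):
    """1 minus the eventual winner's mean win probability.

    Assumption: a lower average for the winner means a more surprising result,
    so the value is flipped to make higher = more surprising.
    """
    return 1.0 - sum(wp) / len(wp)


def closeness(wp):
    """0.5 minus the mean absolute distance from 50%.

    Assumption: flipped so that higher = more closely contested.
    """
    return 0.5 - sum(abs(p - 0.5) for p in wp) / len(wp)


def volatility(wp):
    """Root mean square of the play-to-play changes in win probability."""
    if len(wp) < 2:
        raise ValueError("need at least two plays to measure changes")
    deltas = [b - a for a, b in zip(wp, wp[1:])]
    return math.sqrt(sum(d * d for d in deltas) / len(deltas))


# Toy five-play "game" (made-up numbers, not real Super Bowl data)
toy_game = [0.5, 0.4, 0.6, 0.9, 1.0]
print(round(surprise(toy_game), 3))    # 0.32
print(round(closeness(toy_game), 3))   # 0.28
print(round(volatility(toy_game), 3))  # 0.194
```
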

I took each of these scores, scaled (but did not center) them, and then used their Euclidean norm as the composite score.
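A minimal Python sketch of that step, under one assumption: "scaled but not centered" is taken to mean dividing each metric by its root mean square across games with an n - 1 denominator, which is what R's `scale(x, center = FALSE)` does. The original code may have scaled differently. The function names and input numbers are hypothetical.

```python
import math


def scale_no_center(values):
    """Divide by the root mean square (n - 1 denominator); the mean is NOT subtracted.

    Assumption: mirrors R's scale(x, center = FALSE). Needs at least two values.
    """
    n = len(values)
    rms = math.sqrt(sum(v * v for v in values) / (n - 1))
    return [v / rms for v in values]


def composite_scores(*metric_columns):
    """Euclidean norm across the scaled metrics, one score per game.

    Each argument is one metric measured across all games, in the same game order.
    """
    scaled = [scale_no_center(col) for col in metric_columns]
    return [math.sqrt(sum(x * x for x in game)) for game in zip(*scaled)]


# Three made-up games, three made-up metric columns
scores = composite_scores(
    [0.30, 0.10, 0.20],  # hypothetical "surprise" values
    [0.25, 0.05, 0.15],  # hypothetical "closeness" values
    [0.06, 0.02, 0.04],  # hypothetical "back and forth" values
)
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(ranking)  # game indices, most exciting first
```

Skipping the centering keeps every scaled value non-negative, so a game that is low on all three metrics ends up with a small norm instead of a large one.
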

The plots above are ordered by the composite score descending from left to right, top to bottom. It seems to work pretty well! Especially at the lower end of the scale--those are all pretty clearly games that were lopsided and foregone conclusions early on.

I've gotten the feedback that Super Bowl LIII between the Patriots and Rams is evidence of a "bug" in the metric. That game was very tight, which accounts for its high score (in the metric sense), but it was tight because it was low scoring (in the FOOTBALL sense!), with neither team performing very well.

Stuff I might tinker with to make it better:

Assign different weights to the three constituent metrics
Weight the constituent metrics differently at different points in the game
Factor in how much scoring is happening
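For the first idea, one possible form is a weighted Euclidean norm. This is only a sketch in Python: the function name and the weights are hypothetical, nothing here has been tuned, and it does not cover the in-game weighting or scoring ideas.

```python
import math


def weighted_composite(components, weights):
    """Weighted Euclidean norm: sqrt(sum of w_i * x_i^2) over the scaled metrics."""
    if len(components) != len(weights):
        raise ValueError("need exactly one weight per component")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    return math.sqrt(sum(w * x * x for x, w in zip(components, weights)))


# Equal weights reproduce the plain Euclidean norm
print(weighted_composite([3.0, 4.0, 0.0], [1.0, 1.0, 1.0]))  # 5.0

# Hypothetical weights that emphasize the "back and forth" metric
print(weighted_composite([1.0, 1.0, 1.0], [0.5, 0.5, 2.0]))
```
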