College football advanced stats: Explaining the deep S&P+ system

Time for a quick word association game: If I say, “college football analytics,” what do you think of? Something involving Moneyball or when to go for it on fourth down, right?

A standard definition of sports analytics is gathering information and applying it in a way that yields a competitive advantage. Translation: doing what football coaches do, only with more help from computers.

Analytics are a path toward a winning edge. They see every game. They separate emotion from reality. They separate what you can control from what you cannot.

Sure, they can also give you a better idea of when to go for it on fourth down. But they can do more than that.

In 2016, 35 percent of games involving FBS teams were decided by one possession. Over a 12-game schedule, that means the average team plays about four games that could be decided by a single play. The difference between going 1-3 and 3-1 in those games could be the difference between a bowl bid and holidays at home, between a good season and a conference title, and between losing your job and getting a contract extension.

A team that went 3-1 might have just been luckier than a team that went 1-3. Aside from maybe hockey, football involves more randomness, more luck-of-the-bounce, than any other major American sport. How do we measure quality while filtering out as much noise as possible?
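To put rough numbers on that luck, here is a small Python sketch. It takes the 35 percent figure above and assumes, purely for illustration, that every one-possession game is an independent 50-50 coin flip; real close games are not pure coin flips, so treat this as a thought experiment rather than a model.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k wins in n independent games, each won with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

ONE_POSSESSION_SHARE = 0.35  # from the text: 2016 FBS games decided by one possession
GAMES = 12

expected_close_games = GAMES * ONE_POSSESSION_SHARE  # about four per team

# Assumption for illustration: each of four close games is a pure coin flip.
CLOSE_GAMES = 4
p_3_1_or_better = sum(binomial_pmf(k, CLOSE_GAMES, 0.5) for k in (3, 4))
p_1_3_or_worse = sum(binomial_pmf(k, CLOSE_GAMES, 0.5) for k in (0, 1))

print(f"Expected one-possession games: {expected_close_games:.1f}")
print(f"P(3-1 or better): {p_3_1_or_better:.4f}")
print(f"P(1-3 or worse):  {p_1_3_or_worse:.4f}")
```

Under the coin-flip assumption, 3-1 and 1-3 are equally likely (5 chances in 16 each), so two teams of identical quality can easily end up on opposite sides of that divide.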

The intent of S&P+ is to filter out the noise.

My goal throughout 10 years of working with college football advanced stats has been to dial in to what actually wins and loses games and evaluate teams appropriately. Since 2007, I have been collecting play-by-play and drive data in an attempt to create tempo- and opponent-adjusted measures that evaluate teams as honestly as possible.

I have created a system called the S&P+ ratings that can be used to make weekly predictions, analyze tendencies, break down matchups, and home in as closely as possible on a team’s true strengths, assets, and liabilities.

It is the basis for what is, for better or worse, the most comprehensive set of college football data and measurement available.

What exactly is S&P+, though?

Perhaps I should explain it as clearly as possible if I’m going to use it so frequently in my work.

Let’s start by addressing what S&P+ is not.

It is not a résumé tool. If you’re looking for something transitive — Team A beat Team B and has more wins than Team C, so Team A should be ranked ahead of Team B and Team C — you aren’t going to find it here.

S&P+ is designed to track overall team efficiency.

It can be used to make predictions, similar to the analytics systems Vegas uses.

It is, at its heart, a tempo- and opponent-adjusted measure of what college football teams can most consistently do to win football games.

S&P+ is presented in the form of an adjusted points per game figure.

For instance, if Team A’s S&P+ rating is plus-19.0, it is 19 points better than the average college football team. If Team B’s rating is minus-12.0, it is 12 points worse than average. If Team A and Team B played on a neutral field, S&P+ would favor Team A by 31 points (19 minus negative 12).
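Because the ratings are expressed in points relative to an average team, the neutral-field projection is just a subtraction. A minimal Python sketch (the function name is hypothetical, and no home-field adjustment is included since the example above is explicitly a neutral field):

```python
def neutral_field_margin(rating_a: float, rating_b: float) -> float:
    """Projected margin for team A over team B on a neutral field.

    Each rating is points better (+) or worse (-) than an average team,
    so the projected margin is the difference between the two ratings.
    """
    return rating_a - rating_b

margin = neutral_field_margin(19.0, -12.0)
print(f"Team A favored by {margin:.1f}")  # Team A favored by 31.0
```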

In terms of performance, S&P+ tends to hit between 51 and 54 percent against the Vegas spread, a solid range.
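For readers wondering how a figure like "51 to 54 percent against the spread" is tallied, here is one way to do it in Python. This is a sketch under stated assumptions, not the author's bookkeeping: all margins are expressed from the home team's perspective (so a home team favored by 3.5 has a Vegas margin of 3.5), games where the model matches the line are skipped, pushes are skipped, and the sample games are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Game:
    # All margins are from the home team's perspective (an assumed convention).
    model_margin: float   # model's projected margin
    vegas_margin: float   # projected margin implied by the Vegas spread
    actual_margin: float  # final scoring margin

def ats_hit_rate(games):
    """Share of decided picks that beat the spread; no-pick games and pushes are skipped."""
    wins = losses = 0
    for g in games:
        if g.model_margin == g.vegas_margin or g.actual_margin == g.vegas_margin:
            continue  # model agrees with the line, or the game pushed
        picked_home = g.model_margin > g.vegas_margin
        home_covered = g.actual_margin > g.vegas_margin
        if picked_home == home_covered:
            wins += 1
        else:
            losses += 1
    return wins / (wins + losses) if wins + losses else float("nan")

# Hypothetical games for illustration.
games = [
    Game(7.0, 3.5, 10),   # picked home, home covered: hit
    Game(-2.0, 1.0, -7),  # picked away, home failed to cover: hit
    Game(14.0, 10.0, 3),  # picked home, home failed to cover: miss
    Game(5.0, 3.0, 3),    # push: ignored
]
print(f"ATS hit rate: {ats_hit_rate(games):.3f}")
```

Over a full season of several hundred games, a rate in the low 50s is what the paragraph above describes.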

It thrives in the realm of win probability — teams given, say, 66 percent chances of winning will win 66 percent of those games over time.
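The "66 percent means 66 percent" claim is a calibration property, and checking it means grouping forecasts by probability and comparing each group's average forecast with how often those teams actually won. A minimal Python sketch, assuming you have a list of (win probability, won) pairs; the bin width and the sample forecasts are hypothetical, not real S&P+ output:

```python
from collections import defaultdict

def calibration_table(preds, n_bins=10):
    """Bucket (win_probability, won) pairs and compare forecast vs. observed.

    Returns (bin_lower, bin_upper, mean_forecast, observed_win_rate, count)
    for every non-empty bin. A well-calibrated model has mean_forecast close
    to observed_win_rate in each bin once the counts are large.
    """
    bins = defaultdict(list)
    for prob, won in preds:
        # Clamp so a probability of exactly 1.0 lands in the top bin.
        idx = min(int(prob * n_bins), n_bins - 1)
        bins[idx].append((prob, won))
    table = []
    for idx in sorted(bins):
        rows = bins[idx]
        mean_forecast = sum(p for p, _ in rows) / len(rows)
        observed = sum(1 for _, won in rows if won) / len(rows)
        table.append((idx / n_bins, (idx + 1) / n_bins, mean_forecast, observed, len(rows)))
    return table

# Hypothetical forecasts, not real S&P+ output.
preds = [(0.66, True), (0.66, True), (0.66, False), (0.9, True), (0.9, True)]
for lo, hi, forecast, observed, n in calibration_table(preds):
    print(f"{lo:.1f}-{hi:.1f}: forecast {forecast:.2f}, observed {observed:.2f} (n={n})")
```

With only a handful of games per bin the observed rates bounce around; the claim in the text is about what happens "over time," as the counts grow.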

The S&P+ ratings are an overall prediction tool, but honestly, the predictions are mainly intended as validation. If you know the ratings can at least slightly beat Vegas from year to year, then you can trust them in your analysis as well.

(At Football Outsiders, S&P+ is presented both in PPG form and as an overall percentile rating. Since the average and standard deviation for points scored can change from year to year — having a plus-19.0 rating means something different now than it would have meant in 2005 — you can use the percentile rating to compare the quality of teams from one year or era to another.)
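The exact percentile formula Football Outsiders uses isn't described here, so the following Python sketch rests on an assumption: that a season's team ratings are roughly normal, centered on 0.0 (the average team), with a season-specific standard deviation. The two standard deviations below are hypothetical; the point is only that the same plus-19.0 rating maps to different percentiles when the spread of ratings changes.

```python
from statistics import NormalDist

def rating_percentile(rating, season_sd, season_mean=0.0):
    """Approximate share of teams rating below `rating` in one season.

    Assumes that season's ratings are roughly normal with the given
    mean (0.0 = average team) and standard deviation.
    """
    return NormalDist(mu=season_mean, sigma=season_sd).cdf(rating)

# The same plus-19.0 rating in two hypothetical seasons with different spreads.
print(f"Tighter season: {rating_percentile(19.0, season_sd=12.0):.3f}")
print(f"Wider season:   {rating_percentile(19.0, season_sd=16.0):.3f}")
```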

What goes into S&P+?

My original S&P ratings, derived long ago, were based on two measures: Success rate and equivalent Points per play. It was an attempt at an OPS-style measure for football, a look at both efficiency and explosiveness. As so many things do, however, it has grown more complicated.
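As a rough illustration of the OPS analogy, here is a Python sketch that assumes the original S&P simply added success rate (efficiency, like on-base percentage) to equivalent points per play (explosiveness, like slugging). The per-play equivalent-point values are taken as given inputs, because the table that converts field position into points isn't described here; the sample values are hypothetical.

```python
def success_rate(successes):
    """Share of plays flagged as successful."""
    return sum(successes) / len(successes)

def original_sp(successes, equivalent_points):
    """OPS-style sum of efficiency and explosiveness (assumed form)."""
    if len(successes) != len(equivalent_points):
        raise ValueError("need one success flag and one equivalent-points value per play")
    points_per_play = sum(equivalent_points) / len(equivalent_points)
    return success_rate(successes) + points_per_play

# Hypothetical four-play sample.
flags = [True, False, True, True]
eq_points = [0.5, -0.2, 1.5, 0.2]
print(f"S&P: {original_sp(flags, eq_points):.2f}")
```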

In its current state, S&P+ is based around the core concepts of the Five Factors of winning football: efficiency, explosiveness, field position, finishing drives, and turnovers.

Since efficiency is by far the most replicable and least random aspect of football — big plays and turnovers decide games, but are incredibly random by nature — my success rate measure is the single biggest contributor to the S&P+ ratings.

Explosiveness does play a role, and to emphasize the importance of finishing drives, a team’s success rate during scoring opportunities (first downs inside the opponent’s 40) is given slightly more weight. Special teams efficiency plays a role in both field position and finishing drives, and sack rates are one of the few reliable, non-random factors that contribute to a team’s turnover margin. Both are thrown into the blender as well.
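Conceptually, the "blender" is a weighted combination of factor measures. The Python sketch below shows only that general shape. The weights are hypothetical placeholders chosen to respect the ordering described above (success rate largest, scoring-opportunity success rate getting extra emphasis), not the real S&P+ weights, and the inputs are assumed to be z-scores oriented so that higher is always better.

```python
# HYPOTHETICAL weights for illustration only; not the actual S&P+ weights.
WEIGHTS = {
    "success_rate": 0.40,              # efficiency: the biggest single input, per the text
    "scoring_opp_success_rate": 0.15,  # extra emphasis on finishing drives
    "explosiveness": 0.15,
    "special_teams": 0.15,             # feeds field position and finishing drives
    "sack_rate": 0.15,                 # oriented so that a positive value is good for the team
}

def blended_score(z_scores, weights=WEIGHTS):
    """Weighted sum of per-factor z-scores (each oriented so higher is better)."""
    missing = weights.keys() - z_scores.keys()
    if missing:
        raise KeyError(f"missing inputs: {sorted(missing)}")
    return sum(weights[name] * z_scores[name] for name in weights)

team = {  # hypothetical standardized inputs for one offense
    "success_rate": 1.2,
    "scoring_opp_success_rate": 0.8,
    "explosiveness": -0.3,
    "special_teams": 0.1,
    "sack_rate": 0.5,
}
print(f"Blended score: {blended_score(team):.3f}")
```

Standardizing each input first keeps factors measured on very different scales (a rate, a points figure) from swamping one another; the weights then express relative importance.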

OK, then what is success rate if it’s so important?

The goal of success rate is to create an on-base percentage-style efficiency measure. Depending on a given down and distance, each play is deemed successful or non-successful:

  • First downs: gaining at least 50 percent of necessary yardage (usually 5 yards) is successful.
  • Second downs: gaining at least 70 percent of necessary yardage is successful.
  • Third or fourth downs: gaining at least 100 percent of necessary yardage is successful.
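The three rules above translate directly into code. A minimal Python sketch, treating "necessary yardage" as the yards to go for a first down; edge cases such as penalties or plays without a down are not handled, and the sample plays are hypothetical:

```python
# Share of needed yardage (in percent) a play must gain to count as a success.
THRESHOLD_PCT = {1: 50, 2: 70, 3: 100, 4: 100}

def is_successful(down, distance, gain):
    """Return True if gaining `gain` yards on this down and distance is a success."""
    if down not in THRESHOLD_PCT:
        raise ValueError(f"invalid down: {down}")
    # Compare in integer percent so exact thresholds (e.g. 7 of 10 on 2nd down) are not lost to float rounding.
    return gain * 100 >= THRESHOLD_PCT[down] * distance

def success_rate(plays):
    """plays: iterable of (down, distance, gain) tuples."""
    plays = list(plays)
    return sum(is_successful(*p) for p in plays) / len(plays)

plays = [
    (1, 10, 5),  # 5 of 10 on 1st down: success
    (1, 10, 4),  # 4 of 10: not enough
    (2, 10, 7),  # 7 of 10 on 2nd down: success
    (2, 10, 6),  # 6 of 10: not enough
    (3, 3, 3),   # converted on 3rd down: success
    (4, 1, 0),   # stopped on 4th-and-1: failure
]
print(f"Success rate: {success_rate(plays):.3f}")  # Success rate: 0.500
```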

This is intentionally simple, but it can do powerful things over time, especially when adjusted for opponent.

Opponent adjustments are easily the messiest piece of college football ratings.

Professional sports don’t have to deal with the dramatic difference in schedule strength that afflicts the 130-team Football Bowl Subdivision. Technically, Alabama and South Alabama are playing for the same national title, but their schedules differ just a little bit.

Tiny samples are an issue.

There’s another component that makes college football data particularly tricky: sample size. A major league baseball team will make about 4,500 outs in a given season. An NBA team will have about 8,000 to 8,500 possessions. A college football team, meanwhile, is guaranteed only about 12 games, 170 possessions, and 850 or so plays.
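One way to see why those sample sizes matter: if you treat every play as an independent trial with the same chance of success (a simplifying assumption; real plays are neither independent nor identical), the noise in an observed rate shrinks with the square root of the number of plays. A Python sketch using the play and possession counts above, with a 50 percent success chance chosen as a worst case for noise:

```python
from math import sqrt

def rate_standard_error(p, n):
    """Binomial standard error of an observed rate over n independent trials."""
    return sqrt(p * (1 - p) / n)

samples = [
    ("one game (~70 plays)", 70),
    ("college football season", 850),
    ("NBA season (possessions)", 8000),
]
for label, n in samples:
    se = rate_standard_error(0.5, n)
    print(f"{label:>26}: n={n:>5}, SE = {se:.3f} (95% range about ±{1.96 * se:.3f})")
```

A full college football season leaves roughly three times the uncertainty of a sample ten times its size, and a single game is far noisier still.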

The small sample leads us to overreact to individual results more than we do in any other sport. From my book Study Hall: College Football, Its Stats and Its Stories, published in 2013:

Imagine if college basketball teams played just 12 games, as in college football. If you sampled 12 games from a 30-game, real-life college basketball schedule, you could define a team’s season in drastically different ways. Take 2013 NCAA basketball champion Louisville, for example. … They were great. The team had athleticism, length, and incredible toughness. For head coach Rick Pitino’s style, this team was the culmination of his search for, as Bob Dylan would call it, that “thin, wild mercury sound.”

But a 12-game sample could have told you two completely different stories.

* Sample A non-conference schedule: Kentucky (W, 80-77), Manhattan (W, 79-51), at Charleston (W, 80-38), vs. Missouri (W, 84-61)

* Sample A Big East schedule: at Syracuse (W, 58-53), at UConn (W, 73-58), at Rutgers (W, 68-48), at Seton Hall (W, 73-58), Marquette (W, 70-51), St. John’s (W, 72-58), Notre Dame (W, 73-57), USF (W, 64-38)

Sample A record: 12-0 (8-0 in conference)

* Sample B non-conference schedule: WKU (W, 78-55), Northern Iowa (W, 51-46), at Memphis (W, 87-78), at Duke (L, 71-76)

* Sample B Big East schedule: at Villanova (L, 64-73), at Georgetown (L, 51-53), at DePaul (W, 79-58), at USF (W, 59-41), Syracuse (L, 68-70), Pitt (W, 64-61), Providence (W, 80-62), Cincinnati (W, 67-51)

Sample B record: 8-4 (5-3 in conference) […]

The same team, with the same strengths and weaknesses and similar strengths of schedule, produced both of those ranges of results. Instinctively, we know small samples can make things weird. But when we see a computer rank a 9-3 team ahead of an 11-1 one, we still freak out a little bit.
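The same point can be made with a little arithmetic. The Python sketch below assumes a hypothetical team that truly wins 75 percent of its games and treats its 12 games as independent; it then lists how likely each final record is. Both 11-1 and 9-3 are well within the normal range for that one team.

```python
from math import comb

def record_distribution(games, win_prob):
    """P(exactly w wins) for w = 0..games, assuming independent games
    with the same per-game win probability."""
    return [comb(games, w) * win_prob**w * (1 - win_prob) ** (games - w)
            for w in range(games + 1)]

# Hypothetical: a team that truly wins 75 percent of its games.
dist = record_distribution(12, 0.75)
p_11_plus = sum(dist[11:])     # 11-1 or 12-0
p_9_or_worse = sum(dist[:10])  # 9-3 or worse
print(f"P(11-1 or better): {p_11_plus:.3f}")
print(f"P(9-3 or worse):   {p_9_or_worse:.3f}")
for wins in (12, 11, 10, 9, 8):
    print(f"{wins}-{12 - wins}: {dist[wins]:.3f}")
```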

S&P+ is not built to care about us freaking out.

When do preseason projections get completely phased out?

It takes a while for a college football team to generate enough data to be statistically meaningful on its own. Therefore, for predictive purposes, you have to lean pretty heavily on preseason projections, which serve as reliable priors.

My S&P+ projections incorporate recent performance, recent recruiting, and returning production; early in the season, they carry significant weight, and they are phased out with increasing speed. After a team has played seven games, its preseason projections are completely phased out of the S&P+ equation.
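The exact phase-out curve isn't given here, so the Python sketch below uses a hypothetical quadratic schedule that matches the two properties described: the preseason weight falls faster each week, and it reaches zero once a team has played seven games. The blend itself is a simple weighted average of the preseason projection and the in-season rating.

```python
PHASE_OUT_GAMES = 7  # from the text: projections are fully gone after seven games

def preseason_weight(games_played):
    """HYPOTHETICAL accelerating phase-out: the weight drops a little after
    game 1, faster each week after that, and hits zero at seven games."""
    if games_played >= PHASE_OUT_GAMES:
        return 0.0
    return 1.0 - (games_played / PHASE_OUT_GAMES) ** 2

def blended_rating(preseason, in_season, games_played):
    """Weighted average of the preseason projection and in-season rating."""
    w = preseason_weight(games_played)
    return w * preseason + (1 - w) * in_season

for g in range(9):
    print(f"after {g} games: preseason weight {preseason_weight(g):.3f}")
```

A quadratic is only one of many curves with these properties; anything that starts near 1.0, declines with increasing speed, and reaches zero at seven games would fit the description.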

There is a decent case for keeping the preseason numbers involved for much longer than that, but maintaining anything beyond seven weeks has not, in my experience, changed predictive accuracy to any major degree.

A couple of years ago, I attempted an even more rapid phaseout, and while that resulted in a decent performance against Vegas, it made the numbers incredibly volatile — the average error per game increased, and the ratings changed dramatically from week to week. Keeping the projections involved past the halfway point of the season smooths out the volatility.

Where else can I learn about S&P+ and college football advanced stats?

I like your attitude! Start with this 2015 glossary post at Football Study Hall, and then, when you think you’re ready, immerse yourself in the Football Study Hall stat profiles.
