In today’s post, I want to detail some of the mathematics behind the J-Press Rankings system. At some point in the future, I will also dump some of the Python code directly into a new post for others to play with.
At its core, the J-Press system is an Elo rating system, modified for college football and transformed into probabilities. I'd encourage you to Google Elo rating systems to learn more about them.
To begin with, all teams start with an Elo Ranking of 1000. As teams win or lose, they gain or lose Elo Ranking, respectively.
Now, suppose two teams, Home and Away, are facing each other. Let's call Home's Elo Ranking H and Away's Elo Ranking A. The first important quantity to compute is the Expected percentage for each team: a number between 0 and 1 that is the probability of that team winning that game. It is computed using the following formulas:
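The post does not show its exact formula, so here is a minimal sketch using the classic Elo expected-score formula; the base-10 and 400-point constants are the textbook defaults and are assumptions on my part, not confirmed values from the system.

```python
def expected_scores(home_elo: float, away_elo: float) -> tuple[float, float]:
    """Return (E_home, E_away), each between 0 and 1, summing to 1.

    Textbook Elo expected score; the 10 and 400 constants are the
    standard defaults, assumed here since the post omits the formula.
    """
    e_home = 1.0 / (1.0 + 10 ** ((away_elo - home_elo) / 400.0))
    return e_home, 1.0 - e_home
```

With this form, two equal-Elo teams each get an expected score of 0.5, and a 400-point Elo gap corresponds to roughly a 10-to-1 edge.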
The actual outcome (O) of the game is assigned as a "1" ("Win") or a "0" ("Loss").
Another aspect factored into the system is "game importance". This uses each team's existing PRating (probability rating), which will be discussed later. To begin with, each team is given a PRating of 1/N, where N is the total number of teams in Division I NCAA football for that year. You can think of the PRating as "the probability that this team wins the National Championship"; it will be the main output of the ratings later. Game Importance (denoted K) is calculated as follows:
where HP is the PRating of the Home team, AP is the PRating of the Away team, and AvgP is the average PRating of all N teams currently. For two teams with PRatings equal to the average PRating, K = 40, so this is the standard default Game Importance. It can grow, though, up to K = 320 for high-impact games between highly rated teams. This value essentially controls how quickly a team can gain Elo Ranking (and therefore PRating). In insignificant games against small foes, the numbers change very little, but in a marquee matchup between juggernauts, they can change greatly.
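The post specifies only the boundary behavior (K = 40 when both teams sit at the average PRating, capped at 320), not the formula itself. This sketch scales K linearly with the teams' combined PRatings relative to the average; the linear form and the hard cap are assumptions.

```python
def game_importance(hp: float, ap: float, avg_p: float,
                    base_k: float = 40.0, max_k: float = 320.0) -> float:
    """Game Importance K: 40 for two average-PRated teams, capped at 320.

    The linear scaling by (HP + AP) / (2 * AvgP) is an assumption; the
    post only gives the default value and the cap.
    """
    return min(max_k, base_k * (hp + ap) / (2.0 * avg_p))
```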
The change in Elo Ranking, after Home and Away have played each other, is calculated by:
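The update formula itself is not shown; assuming the system follows the textbook Elo update, the change is K times the gap between the actual outcome and the expected score, applied symmetrically to both teams.

```python
def elo_change(k: float, outcome: float, expected: float) -> float:
    """Textbook Elo update: K * (O - E).

    O is 1 for a win and 0 for a loss; E is the expected score computed
    beforehand. Assumed form, since the post omits the equation.
    """
    return k * (outcome - expected)
```

Note the update is zero-sum under this form: with K = 40 and E = 0.5, the winner gains 20 points and the loser gives up 20.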
There is also a small bonus added to the Elo Rankings on some occasions. If a team won by 9 to 16 points, it is given a 10-point Elo Ranking boost; if it won by 17 or more points, it is given a 15-point boost instead. This factors in margin of victory, which does have some predictive power in identifying the more successful teams. The boost is only modest, however: teams cannot "pad" their victory by, say, 40 points and expect to benefit from that. That doesn't work. Close games (won by 8 or fewer points) receive no boost.
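These thresholds are stated explicitly in the post, so the bonus rule can be written down directly:

```python
def margin_bonus(margin: int) -> int:
    """Elo Ranking bonus for margin of victory, per the post's rules:
    17+ points -> +15, 9-16 points -> +10, 8 or fewer -> no bonus.
    """
    if margin >= 17:
        return 15
    if margin >= 9:
        return 10
    return 0
```

Because the bonus is flat above 17 points, winning by 40 earns no more than winning by 17.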
The Elo Rankings are not what is shown publicly, though. I strongly feel that an Elo Ranking (such as 1453) tells you very little, and percentages are much more intuitive to people. So, a transformation using the Softmax function from multinomial logistic regression is used to turn all of the Elo Rankings into PRatings. (As mentioned, the initial PRating is just 1/N, but once games are played, this naive guess is updated.) If the ith team's Elo Ranking is Ri, the calculation to transform it to a PRating is as follows:
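The exact exponent structure is not shown (the post mentions an "exponent-tower" form below), so this sketch uses a plain temperature-scaled softmax; the temperature of 100 is an assumption, as is the max-subtraction trick, which is only there for numerical stability and does not change the result.

```python
import math

def pratings(elos: list[float], temperature: float = 100.0) -> list[float]:
    """Softmax over Elo Rankings, yielding PRatings that sum to 1.

    Temperature-scaled softmax is an assumed stand-in for the post's
    exact "exponent-tower" formula; lower temperatures weight the top
    teams more heavily. Subtracting the max avoids overflow.
    """
    m = max(elos)
    weights = [math.exp((r - m) / temperature) for r in elos]
    total = sum(weights)
    return [w / total for w in weights]
```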
This function has useful properties. First, and most importantly, it neatly puts all N teams into a probability distribution, meaning their PRatings all add up to 1. Second, the exponent-tower structure heavily weights higher-ranked teams; this is what we want – we want to be able to distinguish among the top 10 (and ideally, top 4) teams out of over 100. We don't particularly care whether the ranking system distinguishes well between the 78th- and 79th-ranked teams, but we definitely want enough predictive power to see differences between, say, the 2nd- and 3rd-best teams.
The code then sets the average PRating (AvgP, used in the Game Importance calculation above) equal to the median of all teams' PRatings. This keeps outliers from heavily skewing the results.
The process isn't finished yet, though! The iterative process above starts all teams with a 1000 Elo Ranking and a 1/N PRating, which is a naive assumption – we can do better. Using last year's final Elo Rankings, we can form starting rankings for the upcoming season as follows:
Whereas last year's Elo Rankings may range from as low as 500 to as high as 2500, this transformation squeezes them monotonically into a range of 800 to 1200, with most landing between 950 and 1050. Teams that were poor last year start the current year at a slight disadvantage, and teams that were great get a slight boost.
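The actual squashing function is not given, so this sketch uses a tanh squash chosen only to reproduce the described behavior (monotone, bounded near 800–1200, mostly within 950–1050); the spread and scale constants are assumptions.

```python
import math

def carryover_elo(last_elo: float, center: float = 1000.0,
                  spread: float = 200.0, scale: float = 600.0) -> float:
    """Squeeze last season's final Elo monotonically toward the center.

    A tanh squash standing in for the post's unspecified transformation:
    output stays within center +/- spread (here 800 to 1200), with
    moderate Elos landing close to the center.
    """
    return center + spread * math.tanh((last_elo - center) / scale)
```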
The final tweak to the numbers is with the PRatings. When a Top 25 for a given week is produced, the assumption (rightly or wrongly) is that a team not in the Top 25 has 0 chance of winning the national championship that year. Of course, as new data enters over new weeks, the teams change, but the assumption remains. Therefore, when it is time to produce the Top 25 list, all existing teams are ranked in order of the current PRating, the Top 25 teams are selected, and new PRatings are calculated just for these Top 25 teams. This means the Top 25 teams all add up to 100%, and all teams out of the Top 25 are effectively snuffed (until next week’s games, anyway).
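This selection-and-renormalization step follows directly from the description; only the function and variable names here are mine.

```python
def top25_pratings(ratings: dict[str, float], n: int = 25) -> dict[str, float]:
    """Keep the n highest-PRated teams and renormalize them to sum to 1.

    Teams outside the top n are dropped entirely, i.e. assigned an
    implicit championship probability of 0 for that week's list.
    """
    top = sorted(ratings.items(), key=lambda kv: kv[1], reverse=True)[:n]
    total = sum(p for _, p in top)
    return {team: p / total for team, p in top}
```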
One last caveat concerns teams that are not in Division I. They ARE factored into games, but they are treated as "new" teams, with a 1000 Elo Ranking and a 1/N PRating, each time. Early in the season, then, facing a small non-Division I team may seem like a smart way to boost your J-Press PRating, but the K factor (Game Importance) severely limits the usefulness of this strategy. Later in the season, facing a non-Division I team helps very little, since such teams have a poor Elo Ranking to begin with. So scheduling many non-Division I opponents to try to boost your J-Press ranking will have little to no useful effect.
That's the system! A future post will give some of the code details, but as far as the math goes, this is essentially the heart of the operation. Hope you enjoyed the read!