Mathematics Behind the J-Press Rankings

In today’s post, I want to detail some of the mathematics behind the J-Press Rankings system. At some point in the future, I will also dump some of the Python code directly into a new post for others to play with.

At its core, the J-Press system is an Elo Rankings system, modified for college football and transformed into probabilities. I’d encourage you to Google more about Elo Rankings systems.

To begin with, all teams start with an Elo Ranking of 1000. As teams win or lose, they gain or lose Elo Ranking, respectively.

Now, suppose two teams, Home and Away, are facing each other. Let’s call Home’s Elo Ranking H and Away’s Elo Ranking A. The first important quantity to compute is the Expected score for each team; this is a number between 0 and 1 giving the probability of that team winning that game. It is computed using the following formulas:

Expected_{Home} = \frac{10^{H/800}}{10^{H/800}+10^{A/800}}

Expected_{Away} = \frac{10^{A/800}}{10^{H/800}+10^{A/800}}
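As a rough illustration, here is a minimal Python sketch of these two formulas (the function and variable names are illustrative, not necessarily what the actual J-Press code uses):

```python
def expected_scores(home_elo, away_elo):
    """Expected win probabilities for the Home and Away teams; the two values sum to 1."""
    home_strength = 10 ** (home_elo / 800)
    away_strength = 10 ** (away_elo / 800)
    total = home_strength + away_strength
    return home_strength / total, away_strength / total

# Example: a 1100-rated home team against a 1000-rated away team
exp_home, exp_away = expected_scores(1100, 1000)   # roughly 0.57 and 0.43
```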

The actual outcome (O) of the game is recorded as a “1” (“Win”) or a “0” (“Loss”).

Another aspect factored into the system is “game importance”. This uses each team’s existing PRating (probability rating), which will be discussed later. To begin with, each team is given a PRating of 1/N, where N is the total number of teams in Division I NCAA football for that year. You can think of a PRating as “the probability that this team wins the National Championship”; it will be the main output of the ratings later. Game Importance (denoted K) is calculated as follows:

K = 20 \cdot \min (16, \frac{HP + AP}{AvgP}) ,

where HP is the PRating of the Home team, AP is the PRating of the Away team, and AvgP is the average PRating of all N teams at that point. For two teams whose PRatings both equal the average, K = 40, so this is the standard default Game Importance. It can grow, though, up to K = 320 for matchups between very highly rated teams. This value essentially controls how fast or slow a team can gain Elo Ranking (and therefore PRating): in small, insignificant games against weak foes, the numbers change very little, but in juggernaut matchups, they can change greatly.
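A similarly hedged sketch of the Game Importance calculation (again, the names are illustrative):

```python
def game_importance(home_prating, away_prating, avg_prating):
    """Game Importance K: 40 when both teams sit at the average PRating, capped at 320."""
    return 20 * min(16, (home_prating + away_prating) / avg_prating)

# Two average teams: 20 * min(16, 2) = 40.  Two elite teams can hit the cap: 20 * 16 = 320.
```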

The change in Elo Ranking, after Home and Away have played each other, is calculated by:

NewHomeRanking = H + K \cdot ( O_{Home} - Expected_{Home})

NewAwayRanking = A + K \cdot ( O_{Away} - Expected_{Away})

There is also a small bonus added to the Elo Rankings on some occasions. If a team won by 9 to 16 points, it is given a 10-point Elo Ranking boost; if a team won by 17 or more points, it is given a 15-point boost instead. This factors in margin of victory, which does have some predictive power for which teams are more successful. However, the boost is only modest, and teams cannot “pad” their victory by, say, 40 points and expect to benefit from that. That doesn’t work. Close games (won by 8 or fewer points) receive no boost.
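Putting the update rule and the margin-of-victory bonus together, a sketch of the per-team update might look like this (the helper itself is hypothetical; the thresholds follow the description above):

```python
def update_elo(rating, expected, won, margin, k):
    """Return a team's new Elo Ranking after one game.

    won:    True for a win, False for a loss (the outcome O above).
    margin: points the team won by (ignored on a loss).
    """
    outcome = 1 if won else 0
    new_rating = rating + k * (outcome - expected)
    if won:
        if margin >= 17:
            new_rating += 15    # big-margin bonus
        elif margin >= 9:
            new_rating += 10    # moderate-margin bonus
    return new_rating
```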

The Elo Rankings are not what is shown publicly, though. I strongly feel that a raw Elo Ranking (such as 1453) tells you very little, and that percentages are much more intuitive to people. So a transformation based on the softmax function from multinomial logistic regression is used to convert all of the Elo Rankings into PRatings. (As mentioned, the initial PRating is just 1/N, but once games are played, this naive guess is updated.) If the ith team’s Elo Ranking is R_i, the calculation to transform it into a PRating is as follows:

PRating_i = \frac{ 2.5 \cdot (\ln R_i)^{2.5 \cdot \ln R_i}}{\sum_{j=1}^{N} 2.5 \cdot (\ln R_j)^{2.5 \cdot \ln R_j}}

This function has useful properties. First, and most importantly, it nicely puts all N teams into a probability distribution, meaning their PRatings all add up to 1. Second, the exponent-tower structure used here heavily weights higher-ranked teams; this is what we want – we want to be able to distinguish among the top 10 (and ideally, top 4) teams out of over 100 teams. We don’t particularly care whether the ranking system distinguishes well between the 78th- and 79th-ranked teams, but we definitely want some predictive power to see differences between the 2nd- and 3rd-best teams, for instance.
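In Python, the transformation might be sketched as follows, assuming the Elo Rankings are kept in a simple list (an assumption purely for illustration):

```python
import math

def pratings(elo_rankings):
    """Convert a list of Elo Rankings into PRatings that sum to 1."""
    weights = [2.5 * (math.log(r) ** (2.5 * math.log(r))) for r in elo_rankings]
    total = sum(weights)
    return [w / total for w in weights]

# A 1200-rated team gets roughly four times the share of a 1000-rated team,
# which is exactly the kind of top-heavy separation we want.
```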

The code then sets the average PRating (used in an above calculation) equal to the median of the PRatings of all teams. This keeps outliers from heavily skewing the results.

The process isn’t yet finished, though! The existing iterative process starts all teams with 1000 Elo Ranking and 1/N PRating, which is a naive assumption – we can do better. Using last year’s final Elo Rankings, we can form starting ranks for the upcoming season as follows:

newRank = \frac{1000 \cdot \ln (CurrentRank)} {3 \cdot \ln (10)}

Whereas teams may have Elo Rankings as low as 500 and as high as 2500 from last year, this transformation squeezes them monotonically into roughly the range of 900 to 1130, with most teams between 950 and 1050. This gives teams that were poor last year a slight disadvantage starting off the current year, and teams that were great get a slight boost.
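A sketch of that carry-over transformation:

```python
import math

def preseason_elo(final_elo):
    """Compress last season's final Elo Ranking toward 1000 for the new season."""
    return 1000 * math.log(final_elo) / (3 * math.log(10))

# preseason_elo(500) is about 900, preseason_elo(1000) is 1000, preseason_elo(2500) is about 1133
```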

The final tweak to the numbers is with the PRatings. When a Top 25 for a given week is produced, the assumption (rightly or wrongly) is that a team not in the Top 25 has 0 chance of winning the national championship that year. Of course, as new data enters over new weeks, the teams change, but the assumption remains. Therefore, when it is time to produce the Top 25 list, all existing teams are ranked in order of their current PRating, the top 25 teams are selected, and new PRatings are calculated just for these 25 teams. This means the Top 25 PRatings all add up to 100%, and all teams outside the Top 25 are effectively zeroed out (until next week’s games, anyway).
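A minimal sketch of that final renormalization, assuming the PRatings live in a dict keyed by team name (a data structure I am assuming purely for illustration):

```python
def top25(pratings_by_team):
    """Return the 25 highest-rated teams with their PRatings renormalized to sum to 1."""
    leaders = sorted(pratings_by_team.items(), key=lambda kv: kv[1], reverse=True)[:25]
    total = sum(p for _, p in leaders)
    return [(team, p / total) for team, p in leaders]
```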

One last caveat concerns teams that are not in Division I. They ARE factored into games, but they are treated as “new” teams with a 1000 Elo Ranking and a 1/N PRating each time. Therefore, early in the season, facing a small team that isn’t in Division I may seem like a smart way to boost your J-Press PRating, but the K factor (Game Importance) severely limits the usefulness of this strategy. And later in the season, facing a non-Division I team helps very little, since such a team has a poor Elo Ranking to begin with. So scheduling many non-Division I teams to try to boost your J-Press ranking will have little to no useful effect.

That’s the system! A future post will give some of the code details, but this is essentially the heart of the operation, as far as the math goes. Hope you enjoyed the read!

How Grades Are Damaging Students

In 2021, the 100-point grading scale is still the primary grading system used by most schools, colleges, and universities. Yet, for over a century, it has been known that this 100-point grading scale suffers from extreme statistical problems that render the results essentially useless. But we continue to use it! In fact, it has become so ingrained that change seems unlikely in any realistic way.

The problem is that an assessment can be statistically analyzed (i.e., graded) in many different ways, depending on the teacher. One teacher may give a student a 50; another may give a 90. Over time, this variance builds up, so that a student’s final grade is simply an accumulation of rampant subjectivity. This also helps explain racial biases present in many school systems across the country.

The problem, when it really comes down to it, is that the rating scale – 0 to 100 – is so large that it makes any specific grade largely meaningless. What’s the difference between a 70 and a 71? Nothing? Well, why HAVE the difference if it means nothing? Additionally, in most systems, failing grades occupy 60% or more of the entire rating scale: if a 20 and a 60 are both failing and both convey the same deep gravity of a student’s ineptitude at the topic, why is it necessary to separate them by 40 points?

The entire system is “simple” and “intuitive” because it expresses everything as a percentage out of 100, but this creates more measurement and statistical problems than it solves! As “intuitive” as it may be, I argue it is definitely NOT a valid instrument of measurement in most cases.

But alas, I doubt the situation will meaningfully change in the coming years. Without a real push to a rating scale that makes sense (such as a simple 1-4 rating) or an assessment system that bypasses grading altogether (for example, see mastery.org), I wonder how many more millions of students will have their academic trajectories damaged or destroyed beyond repair at the whims of the 100-point system. I personally believe most students are creative and educationally astute, until we stomp it out of them.

I am running a grading experiment, if you will. Go to https://forms.gle/TQhdfy851784Y1x69 to play along – it’ll take 2-3 minutes to grade a short mock test of a 3rd grader. My aim is to illustrate, empirically, how wildly grades can vary, given the whims and fancies of the grader. Maybe, one day, this issue will get the attention it deserves. Until then, here’s an A+ for sticking with me. Meaningless? Yes, why, yes, it is…

An April Look At Covid

Covid has continued to ravage the planet in recent weeks, but the collective appetite of Americans for public health measures is quickly approaching zero. On my personal Facebook page, over the course of the pandemic, I have periodically shared some stats and analysis about the current state of the pandemic. Today, I will do so here.

The model I have been using for some months now to predict deaths from Covid is much simpler than what some high-end statisticians use, but it has still proven effective for the most part.

First, I used https://www.worldometers.info/coronavirus/country/us/ to track and record the daily cases and deaths in the United States. I started with June 1, 2020, and here’s my reasoning: before June 1 (in the early days of the pandemic), case and death counts were sketchy, and I wanted accurate data for my predictions. Granted, by May and June of 2020, the data were largely good, but I settled on June 1 to be totally sure the data were accurate.

Here’s the key assumption I make in the model: N days after X people become infected with Covid-19, a certain percentage P of those X people will die. The main statistics I had to figure out were N and P.

One of the first things I did, since this is time-series data, was run the cross-correlation function (CCF) to test which lags are most prominent between cases and deaths. The best lag was -22; that is, case counts preceded deaths by about 22 days. Over the past 6 months I have been running this model, the 22-day lag has remained remarkably consistent. The message is clear: someone who will die from Covid typically dies, on average, about 22 days after diagnosis.
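Here is a rough sketch of how such a lag search could be done with NumPy; I am assuming `cases` and `deaths` are equal-length arrays of daily counts, and this is a simplified stand-in for a full CCF routine:

```python
import numpy as np

def best_lag(cases, deaths, max_lag=40):
    """Find the lag (in days) at which daily cases correlate best with daily deaths."""
    best, best_corr = 0, -1.0
    for lag in range(1, max_lag + 1):
        corr = np.corrcoef(cases[:-lag], deaths[lag:])[0, 1]
        if corr > best_corr:
            best, best_corr = lag, corr
    return best, best_corr   # on the data described here, the best lag lands around 22 days
```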

Figuring out the percentage P was a little trickier, but I programmed a loop that computed the sum of squared differences between P times the 22-day-old case counts and the current-day deaths, and kept the P that minimized that error, to within one-hundredth of one percent. This number has NOT been stable over the 6 months I have run the model. Actually, it has been monotonically decreasing. (This makes sense – as a virus propagates, it is advantageous to the virus to become less lethal over time.)
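A sketch of that search, with the caveat that the names and step sizes are illustrative rather than the exact loop I ran:

```python
import numpy as np

def best_death_rate(cases, deaths, lag=22):
    """Grid-search the percentage P (to 0.01%) that best maps lagged cases onto deaths."""
    lagged_cases = cases[:-lag]
    current_deaths = deaths[lag:]
    candidates = np.arange(0.0001, 0.0500, 0.0001)    # steps of one-hundredth of one percent
    errors = [np.sum((p * lagged_cases - current_deaths) ** 2) for p in candidates]
    return candidates[int(np.argmin(errors))]
```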

For example, in late November (near Thanksgiving), the virus had about a 1.65% death rate, in that 1.65% of the cases counted on Day 1 had turned into deaths by Day 23. Today, on April 18, that number is down to 1.41%. This also makes sense because vaccines are taking effect, and many of the most vulnerable have already been vaccinated. Still, it is a perilously small drop – only about 0.25 percentage points over the past 6 months. And the overall death rate, going back to June 1, still stands at 1.46% according to my model. There is some hope that the falling death rate will eventually make the virus a nonfactor, but its rate of decline has plateaued (it was 1.45% for the month of January).

Here is the plot of the two time series:

[Figure: Death rates of Covid-19 victims since June 1, 2020, ending April 17, 2021]

There are so many interesting tidbits in this graph. First, notice the big dips – these mostly correspond to holidays, when reporting went down because facilities were closed. The slight dip in blue/red around day 90 is Labor Day, the larger dip around day 160 (blue) and day 182 (red) is Thanksgiving, and the huge dip around day 190 (blue) and day 212 (red) is the Christmas/New Year’s holiday. They are “fake” dips, in that the true numbers likely did not actually dip during these times – it is just a lack of human recording during the holidays.

Also notice how well these two plots match overall: there is only a cumulative error of 5.8% when using the red graph (cases) to predict the blue graph (deaths), and the correlation between them is 0.97.

What does this tell us about the coming weeks? Well, the upward trend in cases (the red line) will likely produce an upward trend in deaths (the blue line) soon. If the prediction (the dotted line) is correct, we could see over 1,000 deaths per day again over the next 3-4 weeks.

No doubt vaccines will make a difference, but we are quickly approaching a limit: we will soon run out of people willing to get vaccinated. We likely will not reach herd immunity. This gives the pandemic air and room to keep going and keep mutating. I fully suspect the pandemic, in some form or fashion, will be with us for most or all of 2021. Why? The tail of the cases graph above was already sliding down at an increasingly slow rate, and now it is heading back up again. I think if this pandemic does end anytime soon, it will be a slow, agonizing end, not a definitive stopping point.

We, as Americans and as a planet, should take heed of the alarming signs. Will we? Doubt it. We are just too sick and tired of it all. I feel like Americans as a whole just want to let it ride and take our chances, and if that’s the case, chances we will most certainly take.

Until next time.

2020 Final Rankings

Here are the final rankings, after the bowls and playoffs, for the 2020 season.

Rank   Team                 PRating
1      Alabama              0.484583
2      Ohio State           0.112337
3      Clemson              0.06403
4      Oklahoma             0.046337
5      Georgia              0.043602
6      BYU                  0.042289
7      Texas A&M            0.034773
8      Louisiana            0.028634
9      Notre Dame           0.028483
10     Cincinnati           0.026721
11     Coastal Carolina     0.017277
12     Ball State           0.009516
13     San José State       0.009075
14     Appalachian State    0.0082
15     Liberty              0.004924
16     Memphis              0.004506
17     Indiana              0.004449
18     Iowa State           0.004042
19     Tulsa                0.003575
20     Oregon               0.002551
21     Buffalo              0.002124
22     Florida              0.002095
23     Army                 0.001838
24     Marshall             0.001787
25     NC State             0.001787

Preseason Rankings CFB 2021

Here are the J-Press Rankings for the 2021 College Football Preseason. These are the ratings each team starts with heading into Week 1 of the season. They are based on the final ratings from the 2020 season, including bowl games and playoffs.

Rank   Team                 PRating
1      Alabama              0.066617
2      BYU                  0.050249
3      Ball State           0.047514
4      Oklahoma             0.04743
5      Ohio State           0.04676
6      Coastal Carolina     0.046509
7      Texas A&M            0.046216
8      Liberty              0.044592
9      Georgia              0.044426
10     Louisiana            0.043514
11     Clemson              0.043307
12     Cincinnati           0.042976
13     Notre Dame           0.04059
14     San José State       0.037294
15     Kentucky             0.037092
16     Texas                0.033646
17     NC State             0.033607
18     North Carolina       0.032576
19     West Virginia        0.032379
20     Army                 0.032221
21     Indiana              0.031041
22     Tulsa                0.031002
23     Ole Miss             0.029791
24     Mississippi State    0.029752
25     Iowa State           0.028898

The Data Overload

It has been said that oil was the commodity of the 20th century, but data will be the commodity of the 21st century. In the past 30 years, computer processing power has grown exponentially, and with it, the capacity and ability of our technology to gather data has skyrocketed. Statisticians and data scientists have been grappling with the very real problem in recent years of how to analyze and make meaning of the copious amounts of data that they now have access to, but I assert that we as a society are also struggling with the data overload in our day-to-day lives.

For example, it would appear that police interactions and mishaps with increasingly angry communities are on the rise. It would also appear that gun violence, divorce rates, and even global temperatures are increasing at alarming rates. In all of these cases, I believe the onslaught of new data we now have, data that weren’t available in accessible forms decades ago, is influencing these trends or even misleading us about them.

Take police interactions today. It is entirely possible that body cams, cell phone footage, and 24-hour news coverage have made these types of situations appear many times more common than in days past, even to crisis levels, when perhaps the prevalence is really not much different than it has been in the past few decades. That is not to say there is not a real problem, just that the problem has been extant for many decades and isn’t a new phenomenon.

Or gun violence and divorce rates – these also appear to be reaching “crisis” levels today, and arguably are actually on the rise. However, this could perhaps be a feedback loop created by the myriad data drowning people every day. “Back in the day”, wives were only told about marriage by the church, their elders, and the carefully curated network television shows on the few channels that existed. Today, the secular world is literally breathing down your neck every second of every day with portrayals of “picture perfect” couples, ideas of what happiness is, what you deserve, and so on. Also, many relationships fractured by constant fights or abuse now have ample data to support getting a divorce, whereas in the “good ole days”, wives rarely had a way out. (Note: this is not a comment on whether divorce is “right” or “wrong”, merely that wives (and husbands) have more data now to make decisions than previously.) Gun violence is a similar feedback loop, with more and more people seeing gun violence in their everyday data and increasingly seeing it as a “solution” to their problems.

Climate change is the other example I mentioned – now, I don’t doubt for one second that climate change is real or that mankind may be influencing it. I do claim that the mountains of data we have today, versus the vacuum of data centuries ago, create a bias in our views. It seems very likely that the world is warming and that humans may be at least part of the cause; but extrapolating into the past or future from sparse or weak data, and reading trends into those extrapolations, is a dangerous idea.

I think humanity may not even realize the flood of data that sweeps over us every day now, or how it is influencing our decisions in even subtle ways. It has simply never been the case that humans had access to so much information at every moment of the day – smartphones, TVs, computers, vehicles, and even our homes are becoming “smart” and drowning us in data. Our brains were not meant to handle such overload. Memes, trends, and fads race across the neural world as mental viruses. Just as data scientists are struggling to grapple with a new age of Big Data, each and every one of us is incrementally and rapidly evolving our mind’s framework, one datum at a time…