This is the first week of Official Rankings (Weeks 1-6 were Preliminary, as the system calibrated). Rankings will continue all the way into the post-season.

The last post was so well received that I have thoroughly enjoyed developing this final one. Over the past two weeks, I have been combing through two fantastic older books about grades: Teaching Without Grades (1968) and Making Sense of College Grades (1986). Both are excellent, and even though the second deals with college in particular, 85%+ of the book still applies to K12 realms.

I think it would be a shame to spend two posts lamenting the travesty of our current grading system without also proposing some solutions. Granted, the 5-letter, 100-point 4.00 GPA system is fully ingrained, but little by little, we can dream about chipping away at it.

At issue is that evaluation is essentially a qualitative subjective action, not an objective quantitative one, so any measurement assigning numbers to people will have limitations in usefulness at best and abhorrent misuses at worst. My solution, melded together from the two books and some self-study in the matter, attempts to establish grading (evaluation is really the better word) on a firm footing.

Proposal #1: Remove GPA entirely and convert to a new letter system. GPA is a derived statistic of derived statistics and has essentially lost any relevant information carried to it by the dead carcasses of other statistics, so it’s an easy call to toss – it is meaningless in every conceivable way.

The new letter system reduces the number of letters from 5 to 4 (wait, hear me out! It truly is a reduction!), but in practice, the system only uses 2 letters heavily – N for No Credit and C for Credit. On any assignment, a student can be given no credit or given credit…and that’s essentially it: no 100-point scale, no convoluted curve, no magical mathematical genie lamp to make the teacher’s implicit wishes come true.

There are two other letters that are used on occasion, P and H. H is for Honors, which is given rarely (~5%) to assignments that exceed expectations and showcase high mastery. The other letter, P, is a temporary assignment of Pending and is for the gray middle areas where a clean call between N and C cannot be made. A student who receives an assignment grade of Pending must produce evidence of further mastery (in whatever way is mutually agreed upon) to receive credit (C). If the student fails to do so or takes no action, the P reverts to an N.

Proposal #2: Alongside the letter evaluation, each assignment will also be accompanied by a Narrative. The Narrative gives the student, the teacher’s future self, and any other stakeholders a vivid, descriptive picture of the student’s mastery and how it has evolved over time. A suggested length of 1 medium-to-large paragraph seems doable. Granted, this will add time and effort for the teacher, but I feel the effort would be worthwhile.

Traditionalists may be tempted to argue that this system will blur the lines and delete valuable information about a student’s progress (such as the 78 the student made last week or the 3.7 GPA that the student has). Yet, this is a smoke-and-mirrors fantasy. The truth is that the math and stats are being bastardized by the education system in the name of “objectivity” and “fairness” when the results are really anything but. Honestly, the Narrative along with the evaluation letter gives far MORE information to students and stakeholders about where a student stands and paints a far more complete picture than a unidimensional number can ever hope to. Realize “academic ability” is a very multidimensional trait, so expecting a unidimensional stat to tell you what you need is naive and perhaps far worse.

We do still live in a world of grades, though, and districts, software programs, parents, and students will still demand or require numeric grades, sadly. Until no-numbers grading can take root, I suggest the following: No Credit is recorded as an F/50%, Credit as a B/85%, and Honors as an A/100%. Pending skips over the C/75% range and isn’t recorded. (In my view, C’s and 70’s are among the greatest criminals in the grading system – in the murky gray middle, no one really knows if a student has mastered the standard or not, and so the student slips by to repeat the tragedy in the next course.)
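The stopgap mapping above, together with the rule that a Pending eventually becomes C or N, can be sketched in a few lines of Python. This is only an illustration and is not taken from the gradebook; the names LETTER_TO_PERCENT, to_numeric, and resolve_pending are made up for the example.

```python
# Hypothetical helper names; the percentages are the stopgap values suggested in the post.
LETTER_TO_PERCENT = {"N": 50, "C": 85, "H": 100}


def to_numeric(letter):
    """Return the stopgap percentage for a letter, or None when nothing is recorded."""
    if letter == "P":
        return None  # Pending is temporary, so no number is recorded for it
    return LETTER_TO_PERCENT[letter]


def resolve_pending(letter, showed_further_mastery):
    """Turn a Pending into C (further evidence produced) or N (no evidence or inaction)."""
    if letter != "P":
        return letter  # N, C, and H are already final
    return "C" if showed_further_mastery else "N"
```

For example, resolve_pending("P", False) gives "N", which then maps to 50 through to_numeric.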

When calculating end-of-semester/course grades, it is INCORRECT to use the mean (average), since grades are (by definition) ordinal data. (So if you take away nothing else from all this, stop using the mean/average to calculate grades!) The median is better if you have to use numeric grades. However, if using qualitative data such as letters, the mode is the most appropriate to use. Simply find the most commonly occurring letter, and take that as the overall evaluation for the semester/grading period/course. Don’t give assignments different weights; that only introduces huge subjectivity and biases the evaluation toward your personal preference. The best practice is, if you don’t want to count a daily grade the same as a test, just don’t count the daily work as a grade – simple as that! (Those saying “but…then the students won’t work!” have to ask themselves what the real purpose of grades is anyway…the books above will help tremendously.)
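Here is a small Python sketch of the mode and median calculations described above, using only the standard library. The sample evaluations are invented for illustration, and the function name overall_evaluation is hypothetical. It returns every tied letter rather than picking one, since the post does not say how a tie should be broken.

```python
from statistics import mean, median, multimode


def overall_evaluation(letters):
    """Return the most common letter(s); more than one entry means a tie."""
    return multimode(letters)


# Invented sample data: seven assignments for one student.
letters = ["C", "C", "N", "H", "C", "N", "C"]
percents = [85, 85, 50, 100, 85, 50, 85]  # the same assignments as stopgap numbers

print(overall_evaluation(letters))  # ['C']
print(median(percents))             # 85
print(round(mean(percents), 1))     # 77.1
```

Notice that the mean lands near 77, a score this student never actually received on any assignment, while the mode and the median both report what the student earned most of the time.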

Finally, for anyone wishing to take the dive into this radical realm of evaluation, I have happily spent some time creating a gradebook that accomplishes everything in this post. This gradebook allows you to fill out narratives and assign evaluations, and the Excel VBA code takes care of most of the tiny details for you. It’ll even create Microsoft Word reports for individual students, or you can produce entire folders of reports all at once for an entire class.

I’ve included the zipped file at the end of this post. You’ll have to unzip it to use it. There are two files in the folder – both of these MUST remain in the folder for the code to work. Simply make copies of the Gradebook file for each of your classes. Also, since it contains macros, you must enable macros and click through the warning messages for the code to work.

Thanks everyone for such a warm response to this series of posts, and I truly hope we can work together to change math, stats, and education for the better!

As I touched on in How Grades Are Damaging Students, I have become a firm believer in educational reform and in getting rid of our current notion of grading altogether. Using some of the same mathematics as in my post last week, Does God Play Dice?, I wanted to explore the role random error plays in the grades students receive.

I wanted to simulate classes of students receiving grades from teachers, and teachers then using that information, for instance, to rank the Top 10. This practice is still done in many high schools for graduation purposes. But this exploration goes far beyond a Top 10 and touches on every single decision made with grades, from class placement, to demotion/promotion in school, to college choices, career options, and beyond. Truly, grades can have lifelong consequences on young human beings, which is why teachers should be even more cognizant of the actions they take.

Using R, I created classes of various sizes (25, 50, 100, and 500), and in each I picked a Top 10, with the following caveat: student grades were based on two components. The first is the student’s true grade, which is modeled by a logit-normal distribution (as recommended here). The second is “random” error, which deserves some discussion. In theory, this component is pure random error (it’s modeled by a continuous uniform distribution), but in practice it also stands in for teachers’ implicit biases, feelings about the students, and explicit and implicit notions of what work is “excellent”, “good”, “fair”, “poor”, <insert arbitrary quality indicator here>, etc. This “random” error also takes into account confounding factors contributing to the grade, such as distractions, test anxiety, confidence, conscientiousness, stresses, and so on.

The model based 95%, 90%, or 85% of the overall grade on the true ability grade and the remaining 5%, 10%, or 15%, respectively, on “random” error. Each simulation was repeated 10,000 times, and the results were averaged.

The main two metrics I looked for are as follows: 1) If the error played NO role, how many of the Top 10 would have NOT made it into the Top 10 otherwise? (listed as #1 in the table below); and, 2) How many grade points (on a 100-point scale), on average, did the Top 10 benefit from, due to the error factor? (listed as #2 in the table below). The results are shown:

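| % Error Allowed | Class Size | #1 (approx) | #2 (approx) |
|-----------------|------------|-------------|-------------|
| 5%              | 25         | 0.5         | 2.1         |
| 5%              | 50         | 0.75        | 2.4         |
| 5%              | 100        | 1           | 2.9         |
| 5%              | 500        | 1.5         | 3.2         |
| 10%             | 25         | 1           | 5.6         |
| 10%             | 50         | 1.5         | 6           |
| 10%             | 100        | 2           | 6.4         |
| 10%             | 500        | 2.5         | 7.1         |
| 15%             | 25         | 1           | 8           |
| 15%             | 50         | 2.5         | 9.4         |
| 15%             | 100        | 3.5         | 10.1        |
| 15%             | 500        | 5           | 12.6        |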

Thus, even in a class of 25 and assuming a modest 5% error, those that score the top grades benefit from over 2 points on a 100-point scale due to error alone! Think about how many students have an 88 versus a 90 or how often the Valedictorian is chosen by just tenths of a point, and you see how even in this smallest scenario, the results are shocking!

That’s to say nothing about when the class size gets larger. Imagine a lecture hall at a college of 100 students, and let’s say there’s 10% of the grade based on “error” (or the arbitrariness and capriciousness of the professor). If this professor grades in such a way that only the Top 10 will receive an A, those Top 10 will have benefitted an average of 6.4 points (out of 100) just by whim and fancy alone, not by skills they showcased, and a full 2 of them shouldn’t even be in the Top 10 if error played no role!

This becomes devastating when you think of all the college and career choices people make over the 20+ years of schooling they endure based on the results of grades, GPA, and class rankings.

See, here’s the sad and dubious truth about grades: They use mathematics and statistics to produce the illusion of objectivity. In reality, grades are ALWAYS subjective; unless measuring rote memorization or simple fact-and-recall (which arguably shouldn’t be graded anyway), there is ALWAYS human judgement involved in the grading process. The veneer of grades simply allows administrators, policy-makers, and sometimes teachers the façade of evidence needed to keep others happy and to satisfy parents and students.

Note, I’m not advocating for a wholesale scrapping of the education system altogether. Rather, I think society needs to reimagine what mastery looks like. If a student has complete mastery of a concept, what does that look like? What evidence can you show that isn’t on a scale, rubric, or number line? Describe what you know and how you know it.

Until society can see grading for the ghost of a solution that it is, the injustice we parade as objective scoring will continue to deliver both lucky breaks and heartaches to students who are none the wiser.

(P.S. The R code I used is almost a facsimile of the code in my Does God Play Dice? post with a few label modifications, changing the distribution assumptions, and such. Feel free to play with it if you want.)
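For anyone who would rather tinker in Python, here is a rough sketch of the class simulation described in this post. It is not my R code, and several details are assumptions made only for the example: the logit-normal parameters (mu = 1, sigma = 1), the reading of metric #2 as the points contributed by the error term, and the function names themselves. Expect the numbers to differ from the table above.

```python
import math
import random


def simulate_class(class_size, error_weight, rng, top_n=10, mu=1.0, sigma=1.0):
    """Simulate one class; return (displaced count, average error points in the Top N)."""
    true_grades, errors, observed = [], [], []
    for _ in range(class_size):
        # Logit-normal true grade on (0, 1): logistic transform of a normal draw.
        # mu and sigma are assumed values, not taken from the post.
        z = rng.gauss(mu, sigma)
        true_grade = 1.0 / (1.0 + math.exp(-z))
        error = rng.random()  # continuous uniform "random" error on [0, 1)
        true_grades.append(true_grade)
        errors.append(error)
        observed.append((1.0 - error_weight) * true_grade + error_weight * error)

    ids = range(class_size)
    top_observed = sorted(ids, key=lambda i: observed[i], reverse=True)[:top_n]
    top_true = set(sorted(ids, key=lambda i: true_grades[i], reverse=True)[:top_n])

    # Metric #1: members of the observed Top N who would miss it on true grade alone.
    displaced = sum(1 for i in top_observed if i not in top_true)
    # Metric #2 (assumed reading): points on a 100-point scale that came from the error term.
    error_points = sum(100.0 * error_weight * errors[i] for i in top_observed) / top_n
    return displaced, error_points


def run_simulation(class_size, error_weight, trials=1000, seed=0):
    """Average both metrics over many simulated classes."""
    rng = random.Random(seed)
    total_displaced = 0
    total_points = 0.0
    for _ in range(trials):
        displaced, points = simulate_class(class_size, error_weight, rng)
        total_displaced += displaced
        total_points += points
    return total_displaced / trials, total_points / trials
```

Calling run_simulation(100, 0.10) returns the two averages for a class of 100 with 10% error. Setting error_weight to 0 is a handy sanity check, since nobody should be displaced when error plays no role.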

A small-town athlete makes it to the pros. The owner of a small company goes on to become the CEO of a Fortune 500 company decades later. A small-time YouTuber, over the course of several years, becomes a millionaire with billions of views on her videos. We like to think of success as the result of all of our hard work…but how much of it boils down to pure luck? How much of our life is just a roll of the dice?

After watching an awesome video on YouTube by Veritasium about life and luck (look it up!), I decided to fiddle with the numbers myself. Here’s what I did: I created samples of various sizes (100 to 10,000) and assigned each individual a random Skill Level, from 0% to 100%. I also assigned each individual a random Luck Score, also from 0% to 100%.

I then simulated an “event” (making the NFL, becoming an astronaut, making it as CEO of a Fortune 500 company, becoming President, put whatever you want in the blank) by making 95% of that event’s outcome be based on Skill and just a measly 5% based on that individual’s Luck.

Then, I took the Top 1, 3, or 10 from the sample and checked their Luck Scores. This experiment is then repeated 1,000 times in a simulation, and the results are averaged together.

You’d think that since Luck (in this example) only accounts for 5% of an individual’s outcome, it wouldn’t matter much. Let’s see.

When the sample size is 100, the Top 10 had average Luck Scores of 56%, the Top 3 had average Luck Scores of 65%, and the Top 1 had average Luck Scores of 74%. Already, it’s becoming pretty obvious that Luck does play a role.

When the sample size is 1,000, the Top 10 had average Luck Scores of 81%, the Top 3 had average Luck Scores of 88%, and the Top 1 had average Luck Scores of 91%.

Finally, when the sample size is 10,000, the Top 10 had average Luck Scores of 94%, the Top 3 had average Luck Scores of 96%, and the Top 1 had average Luck Scores of 97%. Stop for a moment and think about that.

Just how big of a role does luck play? I also checked to see, on average, how many of the Top 10 in each sample would’ve been chosen based on Skill alone. In other words, if luck played no role, how many would still have been selected?

In a sample of 100, over 9 of the Top 10 would still have been chosen had luck not played a role. That is encouraging. As sample size rises, though, luck takes over. In a sample of 1,000, only about 5 of the 10 would still have been selected had luck not played a role. And in a sample of 10,000, only about 2 of the 10 would still have been selected had luck not played a role.

The results are summarized in the table below:

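|                                  | Sample: 100 | Sample: 1,000 | Sample: 10,000 |
|----------------------------------|-------------|---------------|----------------|
| Top 10 Avg Luck Score:           | 56%         | 81%           | 94%            |
| Top 3 Avg Luck Score:            | 65%         | 88%           | 96%            |
| Top 1 Avg Luck Score:            | 74%         | 91%           | 97%            |
| Number in Top 10 on Skill Alone: | 9.2         | 5.4           | 1.9            |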

A sample of 10,000 is a modest size, but many situations in life (such as the number of players competing to be in the NFL one day) have much higher numbers than 10,000. Even with sizes as small as 10,000, Luck has an overwhelming role.

And this only assumes 5% Luck. If an event depends more on Luck, these numbers tilt even further in Luck’s favor.
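If you want to try other Luck weights yourself, here is a rough Python sketch of the experiment described above. It is not the R code I used; the function name and its parameters are made up for the example, and Skill and Luck are simply drawn uniformly between 0 and 1.

```python
import heapq
import random


def top_luck(sample_size, luck_weight=0.05, top_n=10, trials=1000, seed=0):
    """Return (average Luck Score of the Top N, average number who'd make it on Skill alone)."""
    rng = random.Random(seed)

    def outcome(person):
        skill, luck = person
        return (1.0 - luck_weight) * skill + luck_weight * luck

    def skill_only(person):
        return person[0]

    luck_total = 0.0
    kept_total = 0
    for _ in range(trials):
        # Each person is a (skill, luck) pair, both uniform on [0, 1).
        people = [(rng.random(), rng.random()) for _ in range(sample_size)]
        winners = heapq.nlargest(top_n, people, key=outcome)
        best_by_skill = set(heapq.nlargest(top_n, people, key=skill_only))
        luck_total += sum(luck for _, luck in winners) / top_n
        kept_total += sum(1 for person in winners if person in best_by_skill)
    return luck_total / trials, kept_total / trials
```

For example, top_luck(1000, luck_weight=0.10) reruns the 1,000-person experiment with Luck counting for 10% instead of 5%. Pass top_n=1 or top_n=3 to look at the Top 1 or Top 3 instead.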

This is not meant to be depressing but revelatory: Much of what happens in life is beyond your control, so don’t be envious or jealous or upset about the apparent success of those around you. And likewise, don’t be haughty and demeaning if you are successful and others around you seemingly aren’t. In our self-centered view of the world, we like to attribute success to our hard efforts, and sometimes that may be true. Other times, we may simply have been in the right place at the right time, clicked on the right link, met with the right recruiter, gone to the right college, etc.

Here is the R code I used. Feel free to modify it and play with it yourself. (Don’t mind my bad coding practices!)

It truly is profound to realize how much luck plays a role in our daily lives. Knowing this information, though, we can be the best version of ourselves possible. But luck is everywhere, all the time, randomness in disguise. Heck, if you even SAW this post with a search engine’s algorithm and reached this very point to read this sentence, that may have just been a roll of the dice…