Tuesday, June 11, 2019

Specification Grading - followup

You may remember my long and somewhat bleary-eyed article about the new method of grading, called “Specification grading” or “Mastery grading” (depending on whom you ask). I wrote it mid-semester, and at that point I was convinced that it is the best grading system achievable; a method that pretty much solves all problems that we ever faced; the final form; grading as it was originally conceived in the mind of God…

Now, as the semester is over, I still think that this method is good and promising, but the picture is a bit more nuanced, and there are fewer “free lunches” coming with it than I expected.

In this follow-up, I share my impressions of specification grading, split into three parts: “Things that are great” about this method; things that are sort of “Different about it” (not good or bad necessarily, but different; something to keep in mind); and finally, things that are problematic. I finish with a hint of how I hope to further adjust my grading system next semester.

And again, if you are new to this method, read this post first:

Why specification grading is great

Students are more motivated. This is not surprising, as it is the main reason people developed this grading approach to begin with! Grading is an artificial, extrinsic motivation, which also happens to put people on a scale, classifying them into “worthy” and “unworthy” (at least in their own minds), which is known to be extremely bad for morale.

The trick in fighting this mentality is of course to separate the helpful formative feedback from summative assessment. In point-based grading systems, you typically do it with a rubric, by trying to objectify students’ work as something that exists outside of their ego, and does not reflect back on their self-worth. It helps to some extent, but it’s not enough. With specification grading, you do it by marking all assignments on a pass/fail scale, which is in most cases rather permissive, and providing constructive, formative feedback in some other ways. And it seems to be working: in my anonymous course assessments, about 70% of students who filled out the survey claimed that they either worked harder in this course than in comparable point-based courses, or that they learned more, despite putting in about the same amount of effort, just because they knew better what skills and topics to concentrate on. A few students also claimed that the low-stress environment allowed them to spend more time on studying selected topics that they were genuinely interested in, which is of course great, if true. (The problem with questions like “did you work hard in this course” is of course that it’s unpleasant to say “no” to a question like that, so I bet these responses were a bit self-congratulatory. But still, students think that they were motivated! It is certainly a “yay” in my book!)

Students are less stressed. This is perhaps the most obvious benefit of this approach, and it makes teaching so much more fun, as the spirit in the classroom gets so much more cheerful and enthusiastic! 90% of students who filled out the survey said that they were less (or even “way less”) stressed in this course, compared to a point-based course. One student said that they could not keep track of what topics they passed and what topics they didn’t, which made them anxious. I wonder why they wouldn’t stop by and ask during office hours, but even so, if less than 10% of students are stressed about the course, these days it counts as a huge win! High anxiety is arguably the main obstacle that prevents this generation from learning well, which makes it my personal enemy in the classroom. And specification grading does a great job of curbing it.

Students learn way more. It’s about the 7th time that I have taught some variant of this course, and in the past, in a typical class graded on a point-based system, about 60% of all written tests reached a passing score, and only about 5-10% of students reached proficiency in 80% of topics or more. In this latest rendition of this same class, but taught with “Specification grading”, 88% of all tests got a passing grade, and 90% of students (!!!) reached proficiency in 80% of topics or more. This is not at all surprising, considering that they could keep trying until they passed (some students only passed their tests on a 4th or even 5th attempt!), but it is still amazing! The level at which they knew stuff by the end of the course was just plain incredible! I had actual productive conversations about synaptic transmission and plasticity, with one student after another, which never happened in the past. For first-years, and at this scale, it was simply mind-blowing!

Office hours are fun and productive, as students are forced to talk about their misunderstandings while they are trying to pass the test. That was potentially the best part of my experience with the new system: I never ever had office hours that full, and that fun before! Ever! Note that it may not be automatically true for your course: it only happened in my course because I made all retakes verbal. If a student passed the quiz on the first attempt, they didn’t have to talk to me. If they needed to retake a quiz, however, they had to come to office hours and describe the topic to me, as I was asking them questions (Socratic if they fumbled; deeper follow-up questions if they were doing well). As I wrote in my previous account, it created an ideal setup for learning: if a student did not pass the test, we just ended up talking about their misconceptions and lacunae in their understanding. I could afford doing it, with 20 students in class, but of course it is not easily scalable (and actually I may not be able to repeat this experience next semester, as amazing as it was).

You get a much better picture of what exactly they misunderstand, and why. Again, this may be less true if your quizzes are always written, but even then, you get so much more information from several attempts than from a single one. With one test, you never know whether it was you who explained it badly, or whether they lack some background info, or whether they just partied the night before the test. But if you meet several times in a row, with a dozen different students, having open-ended conversations about different concepts you taught, it gives you an opportunity to essentially look at the course through their eyes. If I had to revamp this course, or any other course for that matter, I would have switched to this “Mastery Grading + Verbal Exams” mode for one semester before the revamp, just to collect the data! Going through this experience made me really rethink the emphasis I put on different topics within the course, just because I discovered that some of them are hard for students in ways I was never able to identify before.

You get to know students better, as you can observe their thinking, and even talk about their interests a bit. This, again, is only true if you decide to utilize your office hours for topic retakes, but at least for a small class, it is very achievable.

Despite higher satisfaction and better learning, it does not seem to inflate letter-grades. My average grade in this course was typically either a “B” or a “B plus”, and if I reverse-calculate the average grade for my “Specification-graded class”, I end up with a similar average score. The distribution seems to be a bit more narrow this time (it is harder to get a C, but also harder to get an A), but the average grade is about the same.

Neutral points to keep in mind

Specification Grading puts a stronger emphasis on tests, and makes students study for the test. It’s not really news, if you think of it; students always study for a test. Yet, if you give them an opportunity to retake the test several times, it makes this effect so much stronger! If the tests are good, it’s not a problem, but if they were kind of tangential to your course in the past; if you used them just to shake things up a bit, and make sure everybody comes to class prepared, then be aware. Now they will come to the forefront of students’ minds. For example, if you test the vocabulary, in the past you could use this to check whether they did the reading. Now, they will actually put an effort into memorizing the vocabulary, which may be weird if your goal was the reading itself. My class developed a bit of a prepping vibe to it, as if its main goal was not to introduce students to the discipline, but rather to prepare them for taking higher-level classes in neuroscience. It was not that much of a problem, as it was always my goal; I just never before thought of it as my top priority. It was number three or four on the list in the syllabus, maybe, but not number one. And yet, by using old quizzes with a new grading system, I kind of inadvertently made it the top priority, which was a curious thing to realize after the fact.

A different type of student falls through the cracks. The strongest students and the weakest students fare about the same in both grading systems, but the middle gets somewhat different. With a point-based system, you are rewarding “natural talent” (whatever it means), social background (better schooling prior to the course), the ability to think quickly on-the-fly, and the ability to handle test anxiety. With mastery grading, you are rewarding persistence, and willingness to engage in potentially taxing interactions with the professor (as showing up to office hours every week may not be easy for some students). It means that you may see weak but persistent students succeed, while some really smart ones may fail, by repeatedly missing retake opportunities you offer. This may feel unusual, and almost counter-intuitive, but after some meditation on the topic, I decided that it is probably more fair that way. Or, at least, not any less fair than a point-based system.

What makes Specification Grading problematic

It may require more of your time. Again, if you always had fully loaded office hours, it would not be much of a difference, but if you were like me, with students barely showing up, now suddenly you’ll have four more hours of intense one-on-one tutoring every week. Depending on your goals, it may be great, but it is also something to keep in mind, especially if you tend to naturally enjoy human interactions, as it may get dangerous for your productivity. There is probably a reason why Mastery Grading seems to be more popular in math and physics, compared to life or social sciences: it is relatively easy to create 5 different versions of essentially the same assignment if you teach calculus or thermodynamics. Moreover, you may even have a TA to grade the work. With life sciences, it is harder, as it is quite rare that one could easily make several equivalent quizzes for the same topic. And grading writing assignments (short or long answers), while possible, is incredibly taxing, even if you do it pass/fail, and typically it cannot be delegated to a TA. Verbal exams are at least fun, but even then, the sheer amount of feedback you have to give, if every student attempts at least some assignment more than once, may be overwhelming.

Students tend to create backlogs, which puts an extra strain on the last 2-3 weeks of the course. Because everyone can retake at least something, the closer you get to the final deadline for the course, the more your office starts to look like an Apple Store on a Black Friday. I am not sure what to do about it, but it is clearly a problem.

Attendance gets slightly worse (unless you take attendance, making it part of the specification, which I typically don’t). With point-based grading, my daily quizzes served as an auto-attendance taker, as if a student missed a class, they also missed a quiz. With the Mastery approach, they could always retake the quiz, so more arrogant students just started skipping classes. I’m guessing if I made my retake policy a bit less permissive, it could have helped (some educators use “retake tokens” that can be exchanged for a retake, but that are made intentionally scarce, with only 3 or 5 tokens issued to every person).

Like every system, this system can be gamed. You have to be vigilant, as some students will try to game your grading system by deliberately failing tests the first time around, cramming for the test, once they know it, and then trying to pass them without actually understanding much, through simple rote-memorization (which is a well known approach, apparently: https://www.edutopia.org/article/allowing-test-retakes-without-getting-gamed). You can also see students come to the very door of your office with a handwritten list, quickly cram it for several more seconds, drop it on the floor, and enter your office, trying to start answering right away, while the data is still in their working memory. The trick here is to set limits either on the total number of retakes, or on the maximal frequency of retakes (I only allowed retakes once a week, which helped a lot). And for last-second-crammers, you just have to make sure to disrupt the narrative, and start with asking a few follow-up questions. After one-two attempts like that, students realize that actually honestly learning things gives a better result, and is actually easier than just memorizing answers to possible questions.
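For readers who prefer to see such limits spelled out, here is a minimal Python sketch of a retake policy that combines a one-week cool-down with optional scarce “retake tokens”. The function name, the per-topic bookkeeping, and the exact 7-day window are assumptions made for illustration; they are not taken from any actual course records.

```python
from datetime import date, timedelta

# Assumption: "once a week" is modelled as a 7-day cool-down per topic.
COOLDOWN = timedelta(days=7)


def can_retake(topic, today, last_attempt, tokens_left=None):
    """Return True if a student may attempt `topic` again on `today`.

    last_attempt: dict mapping topic name -> date of the most recent attempt.
    tokens_left:  remaining retake tokens, or None if tokens are not used.
    """
    # Token variant: no tokens left means no retake, regardless of timing.
    if tokens_left is not None and tokens_left <= 0:
        return False
    previous = last_attempt.get(topic)
    # A topic that was never attempted has no cool-down to wait out.
    return previous is None or today - previous >= COOLDOWN


attempts = {"synaptic transmission": date(2019, 3, 4)}
print(can_retake("synaptic transmission", date(2019, 3, 8), attempts))
print(can_retake("synaptic transmission", date(2019, 3, 11), attempts))
```

Either limit can be used on its own: leave `tokens_left` as `None` for a pure cool-down policy, or track a small token count per student if scarcity matters more than timing.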

As this system is unusual, it feels less transparent. This is an unpleasant topic, but I think it needs to be mentioned. Look, as I said before, students are quite happy about this approach, and report 90% satisfaction with it, saying that they’d prefer it to a point-based system any time. However, when I asked them whether this system was easy to understand, only 50% of students said that it was easy; the other 50% said that they were “confused”, and that the point-based system felt easier, even if only because it is more familiar. If your goal is to teach well, it is not necessarily a problem. However, if your employment is dependent on student evaluations, you will almost certainly get lower scores on the “Grading criteria were clear” point of the rubric, which for some people may be a problem. Generally, for most places, it may be a good idea to try new grading systems only after tenure. Bard College is a bit more open to experiments, in this regard, and seems to value experimentation more, but I’m not tenured myself, so it is yet to be tested =)

If you make getting a “B” easy, but getting an “A” hard, it may lead to hard conversations. This is another unpleasant topic, but also worth mentioning. The problem here is that specification grading mostly helps students in the middle of the curve: those who don’t pass all assignments on the first attempt, and are grateful for extra chances. Those students who tend to be doing well, on the other hand, may feel weird about this new unfamiliar system. Which means that, for a change, most conflicts you may have about final grades won’t be about students asking to change a D to a C, but rather about students asking to change a “B plus” to an “A”. And these conversations are just generally less pleasant, as you may have to negotiate with students who are more entitled, more pushy, and better equipped to work their way through the academic hierarchy. Moreover, the problem is further complicated by the fact that from the pedagogical point of view, it makes lots of sense to make “top-level” assignments, which separate “decent B-level work” from “excellent A-level work”, a bit more creative and ambitious. It totally makes sense, if you want to teach better, to tailor your assignments to your students’ needs, and to encourage them to step outside of the basic framework of the course. But it also means that your stronger students will feel uniquely challenged, compared to other assignments in the class. In a “normal” course, it rarely presents a problem: students never actually know how hard the final will be, and what curve you will use, but somehow it is a part of an accepted status quo. However, in a course that feels unusual, and is intentionally designed to be extremely transparent, this final assignment may suddenly feel “unfair”, just because it is surrounded by sparser scaffolding, compared to “basic assignments” that guarantee a “C” or a “B”. There are of course ways to help with this problem, but even so, be aware that “I deserve an A” situations are inherently more taxing for an educator’s psyche than “I deserve to pass” ones.

Summary, and further plans

Will I personally stick to the specification grading next year? Yes and no. Look, while in my original write-up I used “Specification grading” and “Mastery grading” interchangeably, actually “Mastery” is just a subset of “Specification”, and I expect that going a bit broader than “pure Mastery” can help with the workload a lot. For example, you might have read about that professor who refused to give fine-tuned letter grades, but instead gave everyone a default of a “B”, and required deeper extra work for an “A”: https://www.thecrimson.com/article/1975/5/14/a-quiet-act-of-impiety-prichard/

While the article presents it as a revolutionary rebellion, essentially they just used a 2-level specification system, in which some basic level of engagement resulted in one grade, and honest extra effort was needed for the other. And I suspect that in life sciences, it may be better to shift from math-style “mastery grading” towards this more relaxed approach. This would allow us to retain the best part of the “specification” model: that instead of trying to give a holistic assessment of every student’s “worth”, we would put the responsibility on students themselves, allowing them to set their personal goals, and choose the amount of work they want to do for the course. But I am less sure that I need to keep the “train them until they reach a bar” mindset. It works well in some courses (maybe in core courses, like chemistry or genetics, it would actually make lots of sense), but in expository courses (both 100s and 300s) it may be too restrictive.

In fact, I have already sketched a new version of my syllabus, for next semester, but I hope to find time to write a separate post about it. So stay tuned!

Tuesday, April 9, 2019

Mastery (specification) grading, and why you should try it

Students come up during our Friday class, ask me for a completely optional voluntary quiz, I hand it to them, then they say “thank you!” An unbelievable occurrence that happens all the time with Mastery grading. -- (@katemath)

This semester I tried a new approach to grading, called “Specification grading” (also known as Mastery-Based grading), and honestly, I think it is the best thing I’ve tried in my teaching career, ever. It is so good, it pains me to think that I didn’t discover it earlier, and honestly I don’t understand why all faculty in the world haven’t switched to it yet. It’s not even “revolutionary”, it is just how grading should be. How it was always meant to be.

So now, as I have hopefully grabbed your attention with these empty superlatives, let me explain how specification grading works in practice. It may sound a bit confusing and underwhelming at first, so I’ll start with the definitions, then provide a rationale for them, and then show how I adjusted my courses to this grading scheme, as an example.

There are three main principles:
  1. Instead of assigning grade-points, and then calculating letter-grades based on these points, you define a set of criteria students need to meet to get an A. You also describe what share of these criteria students would need to meet to get a B, or a C. Some people call these criteria “bundles”, some call them “standards”, or “benchmarks”.
  2. You give students several opportunities to meet each of these criteria. It can mean that they are allowed to take a test several times, until they pass. Or maybe they can rework and improve a piece of writing several times, until it is good enough. But that’s the key point: you don’t grade HOW and WHEN they arrived at a given level of mastery; you only care WHETHER they demonstrated this level of mastery before the course ends. 
  3. As your time is limited, and students can now try every assignment several times, you cannot afford to grade individual assignments on a point-based system: you either accept an assignment as “good enough” (if it passes your benchmark), or you say “try again”, and provide targeted feedback. Crucially, this feedback is completely uncoupled from the final grade.

This list may, at first, seem rather flat for the grandiose claim with which I started this letter. So, why does it work exactly? Here are the benefits, compared to a point-based system:
  • Students can try to pass a benchmark several times. It means that they keep practicing the task, which is of course the best way to learn!
  • It also means that they have less anxiety (and as you have surely heard by now, anxiety is the biggest issue modern students face, for systemic / societal reasons). Imagine how liberating it is for a student to know that, at least in principle, in your course there are no irreversible mistakes. Given time, they can fix everything. If they are sick on a day of an exam, if they have test anxiety, they know that there’s a safety net. They can try again. They still need to plan their work (as you won’t accept it after the course is over), and they still need to put the effort in, but this system is infinitely easier on the nerves.
  • There may be more than one way to pass a benchmark, which is great for inclusion. Say, you can decide to use a timed test for the first attempt, a short essay on a matching topic for the 2nd attempt, and an alternative between a longer take-home essay or a short verbal exam for all subsequent attempts. It’s up to you how you define the benchmarks! But this way if a student has test anxiety, or an attention deficit, even if undocumented (which is common for students from traditionally underrepresented groups), they can always use this extra flexibility to meet your criteria.
  • This improvement in students’ well-being is good for you, as now you don’t have to meet with sad anxious people, or entitled grade-grubbers, justifying each point on each quiz. If they think that you misunderstood them, they are always welcome to try again, and make sure that this time they are easy to understand. (Of course, some extreme students may still try to extort grades from you, but even then, this type of grading is really not conducive to grade-grubbing.)
  • It is also good for you, as it makes your grading easier, and almost painless. You no longer have to obsess about “marginal cases”. As any assignment can be retaken, it is much easier, psychologically, to say “Try again! This little thing is missing, but I’m sure you can get it right next time!” In fact, you don’t even have to say “try again”: you may choose to go with “fix this one thing, and it will be a ‘pass’”. You no longer have to be obsessively specific, documenting and justifying every little point that you give or take away. The only thing that matters is that you give some good, pointed, clear feedback about those few things, or maybe even that one thing, that matters.
  • For writing assignments that go through several revisions, the process of meeting the criteria becomes more of an interaction; similar to how we work with reviewers, when submitting a research paper to a journal. You tell students what they need to improve, if they want you to accept their work. You can also require that they make it easy for you to see and assess these changes (you can ask them to bring the previous copy with them, or “track changes”, or submit a list of responses - whatever works for you). And then you only recheck this one part that they improved; you don’t have to re-read the entire thing.
  • As a side-effect of this “benchmark” approach, you may actually make some bundles optional, and required for an “A” only (but say, not for a “B”). Now it’s up to the student whether they want to take the challenge. This moves the initiative in learning to the student; makes them “buy” into the assignment, and commit to it. It improves their learning, as now they own it!
  • And incidentally, it makes life a bit easier for you (which is important, as some of the points I described above could make your life a bit harder). If a student doesn’t want to attempt an assignment, as they are fine with a “B”, they just don’t. It means that you have less work to grade. And it is critical, as you have more to grade for those students who want to give another assignment another try!

How to convert an existing course to a specification grading system? Here’s my process:
  1. Identify key criteria your students need to meet. Some of them may come from skills you teach (writing, presentation); some may represent different conceptual topics of your course (cell division, species diversity). Enumerate them. 
  2. For each criterion, define how you can tell whether a student “got it”. Try to find a test that may be repeated several times. Say, when teaching math, a professor may create 5 variants of an assignment, with 4 problems in each of them, and define “success” as solving 3 problems out of 4 (in which case “retake attempts” would not be infinite, as one could only try each test 5 times, but it is still a lot!). Or you can make students work on a piece of writing, with a rubric, until they meet 9/10 of the criteria of the rubric. It’s up to you how you define it; the only two things that matter are that students understand what they need to do, and that they can have several tries at it.
  3. Once you defined an “A”, define what a “B” and a “C” would look like. You can define pluses and minuses as well, if you want to, or you can just state on your syllabus that pluses and minuses are reserved for intermediate cases.
  4. Plan your assignments in a way that would give students time to attempt them. Say, you can no longer have a final exam on the completion week. But you can use the completion week to give students an extra opportunity to get passes on some of the prior topics.
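The math example from step 2 (5 variants of a test, 4 problems each, “success” meaning 3 problems solved) can be sketched in a few lines of Python. This is only an illustration under those stated numbers; the function name and the wording of the returned messages are made up for the example.

```python
def attempt_outcome(solved_per_attempt, variants=5, needed=3):
    """Summarize where a student stands on one criterion.

    solved_per_attempt: number of problems solved on each attempt, in order.
    variants: how many versions of the test exist (so at most this many tries).
    needed:   how many problems must be solved on one attempt to pass.
    """
    # Only the first `variants` attempts can exist, one per test version.
    used = solved_per_attempt[:variants]
    for number, solved in enumerate(used, start=1):
        if solved >= needed:
            return f"pass on attempt {number}"
    remaining = variants - len(used)
    if remaining > 0:
        return f"try again ({remaining} variants left)"
    return "no variants left"


print(attempt_outcome([1, 2, 3]))
print(attempt_outcome([2, 2]))
```

Note that nothing here records a score: an attempt either clears the bar or it does not, which is the whole point of the pass / try again scheme.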

Now, some examples. I am teaching my “Introduction to Neurobiology” on a specification-based grading system this semester, and it works marvelously. Here’s an excerpt from the syllabus:

Course material consists of four "Bundles":
  1. Core bundle (weekly reading reflections and practical homework). 
  2. Knowledge bundle (tested via quizzes). 
  3. Lab bundle (lab work and lab reports) 
  4. Depth bundle (paper reviews)
Depending on what end-course grade you want to get, you need to complete different bundles to different levels. Your grade is in your hands!
End-course grade | Core bundle | Knowledge bundle | Lab bundle | Depth bundle
A | not more than 1 reflection missed | All quizzes completed | Good lab work, all lab reports | 4 paper reviews
B | up to 2 reflections missed | 90% of quizzes completed | Good lab work, 80% of lab reports | (none)
C | up to 4 reflections missed | 70% of quizzes completed | 50% of lab reports | (none)
D | up to 5 reflections missed | 50% of quizzes completed | (none) | (none)

All individual assignments are graded on a pass/fail (or rather, “pass / try again”) basis. There are no points, and no partial credits; you just need to pass. Simple!

The core bundle is easy, but it is the only one that is unforgiving: you need to submit your short responses (typically, 2-3 sentences) on time, before 9 am the day the assignment is due (usually Monday). Make sure to set aside time for it, and put it on your schedule. That is the only one that cannot be redone.

For all other bundles, you can have several attempts to pass them. If you miss or fail a quiz, you can try again during allotted times in class (there will be one opportunity before the spring break, and one more during the completion week). Better yet, you can come to drop-in hours, or schedule a visit (send me an email), and pass this topic as a short verbal exam (conversation). You can have as many tries as you need; the only limitation is that you can only give it a try once a week, so if you tried but haven't passed, you need to have a week-long cool-down.

For lab work, you need to be present, and meaningfully engaged (e.g. not texting in a corner, but organizing your team, building things, taking notes, collaborating, logging and analyzing data etc.)

For lab reports and paper reviews, it is expected that you may have to go through several iterations. I will provide feedback, and request revisions, until your work passes.

The description of the paper review assignment will be shared separately.

No work will be accepted after the last day of Completion week.

Pluses and minuses will be used for intermediate cases, at the discretion of the instructor. The easiest way to get a plus is to go above the minimal set of requirements for a grade.
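To make the bundle logic in this excerpt concrete, here is a minimal Python sketch of how a base letter grade could be looked up from a student's progress. It assumes that the four requirement rows in the excerpt correspond to A, B, C and D, from most to least demanding, and that anything below the last row is an “F”; both are assumptions, as are all names in the code. Pluses and minuses are left out, since the syllabus leaves them to the instructor's discretion.

```python
from dataclasses import dataclass


@dataclass
class Progress:
    reflections_missed: int   # Core bundle
    quiz_share: float         # Knowledge bundle: fraction of quizzes passed (0.0-1.0)
    good_lab_work: bool       # Lab bundle: present and meaningfully engaged
    lab_report_share: float   # Lab bundle: fraction of lab reports passed
    paper_reviews: int        # Depth bundle


# One row per grade, most demanding first (letters are assumed, see above):
# (grade, max reflections missed, min quiz share,
#  good lab work required, min lab report share, min paper reviews)
REQUIREMENTS = [
    ("A", 1, 1.0, True, 1.0, 4),
    ("B", 2, 0.9, True, 0.8, 0),
    ("C", 4, 0.7, False, 0.5, 0),
    ("D", 5, 0.5, False, 0.0, 0),
]


def base_grade(p):
    """Return the highest grade whose requirements are all met."""
    for grade, max_missed, min_quiz, need_lab, min_reports, min_reviews in REQUIREMENTS:
        if (p.reflections_missed <= max_missed
                and p.quiz_share >= min_quiz
                and (p.good_lab_work or not need_lab)
                and p.lab_report_share >= min_reports
                and p.paper_reviews >= min_reviews):
            return grade
    return "F"  # assumption: the excerpt does not say what happens below the last row


print(base_grade(Progress(0, 1.0, True, 1.0, 3)))
```

In this sketch a student who passes everything except the fourth paper review lands on the second row, which matches the idea that the Depth bundle is what separates the top grade from the rest.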

In practice, for the “Knowledge” bundle, which corresponds to different conceptual topics we study (such as action potential, synaptic transmission, retinal circuitry etc.), I give students several attempts to pass them, and all of these attempts are slightly different. First they have a 10-min “pop quiz” in class. If they aren’t successful with it right away, they can either come to my office hours and have a verbal micro-exam, or they can wait till the special class time (15 min set aside during the lab time) when I make them provide a long answer in class. The verbal exam idea worked particularly well for me: for the first time ever, I actually have students come to my office hours, and these office hours are uniquely productive! I ask them to explain a topic to me. If they are faltering, I help them. If they manage to explain the topic with only 1-2 prods, they earn a pass. If not, (and that’s the surprisingly great part!) our discussion just naturally evolves into a productive tutoring session. We transition right from their answer into a discussion about what they miss, or misunderstand, or need to express better, and then I tell them to come again in a week, and give it another try.

Only imagine how pleasant it is to meet with a student who “failed” a topic back two months ago, but who can now explain this topic to you reasonably well, even though you haven’t recently referred to it in class! It means that they went back to the material, studied it, and figured it out. They are happy, you are happy, it’s a win-win!

Does this approach necessarily make the course too easy? No, because now you can reasonably expect a somewhat better level of understanding! My mid-term results were 30% “A”, 40% “B”, 10% “C”, and 20% “D”, which is pretty similar to my grade distributions in prior years, when I used point-based grading. But who got As and Bs this time was slightly different: people who would typically be B students, but who persevered, got an A, while some smart but scattered students got a B, or even a “C”. At this point, all students can still get an A if they want to, but they need to put in some effort if they want to earn it. (And in fact I had a bout of activity immediately after the spring break, as students received mid-term grades, didn’t like them, and closed some of the gaps, so current running grades are actually a bit higher than the mid-term grades).

Does this approach increase the workload on the faculty? I want to tentatively say “no”; at least, not if you assume that office hours were already spent meeting with students, before you switched to this method, and also assuming that you don’t overcommit. It was an increase for me personally, as I almost never had students attend my office hours before, and now they are packed, but it is still contained within about 2-3 hours a week, so it seems doable. It also makes planning office hours more important, as this approach essentially makes office hours attendance semi-required for a student to succeed. Which means that if a student has a time conflict, you may have to meet with them outside of your normal office hours, which may be hard sometimes. But that said, grading got so much easier, and (crucially) it is no longer painful, or emotionally draining, so it is still a net win. You know that you don’t close the door of opportunity, which gives you a moral permission to be a bit more demanding when working with “gray zone” answers, which is a huge relief, and makes the process almost pleasant.

The only word of caution here: there may be students who, despite all your statements and encouragement, both in class and in the syllabus, would be hesitant to come to office hours, especially if your standard drop-in hours don’t work with their schedule. They would be ashamed to be a burden, so they’ll just sit there, waiting for in-class opportunities to retake the quiz, but they won’t be proactive about it. It may be a personality issue, or an issue of culture, or both (think socioeconomic status, gender, introversion), but either way, if one needs to come to your office to retake the quiz, make sure that you require every student to show up to office hours at least once. And then talk to them. Make sure they believe you when you say that retaking a quiz is part of the deal, and not some extra favor they have to ask for.

Finally, as another example, here’s what I plan to do with my Statistics class next fall. For topical goals, I’ll still use problem quizzes, but will prepare several versions of them. Verbal exams are great, but because stats is closer to math, I think problems could work better. For skills (data wrangling, R coding, visualization, figure captions), I’ll introduce several in-class data workshops that would each count towards “closing” a topic. This would be a change to my current process: in the past, I offered all workshops as take-home assignments, graded some of them, and provided formative feedback on the rest. All workshops were submitted openly but anonymously (every student could see every other student’s submission), and the final exam was optional (if you liked your running grade, you could skip it). Now I will have to split all workshops into two types: those that are take-home, and those submitted by the end of class. (I still plan for both types to be submitted openly and anonymously.) Take-home assignments will count towards “participation” (every student needs to submit something vaguely reasonable), while in-class assignments will serve as mid-terms. Except that these mid-terms will be “anticumulative”: if you proved that you know topic A, you don’t have to polish this part of an assignment anymore, and you can concentrate on the topic B part of it. (I still need to put some thought into making it transparent to students, but it seems doable.)

To sum up, I think specification grading (aka Mastery grading) is obviously better than the point-based grading: both for the students, and for the faculty. It seems to improve learning; it is more fair and inclusive, and it is unexpectedly pleasant to use! So I definitely encourage you to give it a try!


Kate Owens. A Beginner’s Guide to Standards Based Grading (A practical blog post) https://blogs.ams.org/matheducation/2015/11/20/a-beginners-guide-to-standards-based-grading/

Sadler, D. R. (2005). Interpretations of criteria‐based assessment and grading in higher education. Assessment & evaluation in higher education, 30(2), 175-194. https://uncw.edu/cas/documents/sadler2005.pdf

Carberry, A., Siniawski, M., Atwood, S. A., & Diefes-Dux, H. A. (2016, June). Best practices for using standards-based grading in engineering courses. In Proceedings of the 123rd ASEE Annual Conference and Exposition, New Orleans, LA.

A giant open repository of materials on Mastery-based grading (curated by Dr. Rachel Weir, Allegheny College):

Nilson, L. (2015). Specifications grading: Restoring rigor, motivating students, and saving faculty time. Stylus Publishing, LLC. (Book)

Monday, March 4, 2019

Advice to former self #3: Grading is overrated

I don’t know about you, but grading used to absolutely terrify me. Maybe that’s because of my unproductive perfectionism, literal thinking, and a tendency to complicate things, but I just could not get grading right. I mean, one can argue whether “right” is ever achievable, but I like to think that for most topics out there, I know at least a general direction towards what constitutes “better”, and so I can try to set on a decent trajectory. For point-based grading, it was always different. There seems to be no “ideal grading”, no win-win solution. If you consider all possible ways to grade, you’ll get a weird optimization landscape, where all extreme cases are just plain horrible, and somewhere in the middle sits a mediocre maximum of “least painful grading”. It is a depressing, Leibniz-style philosophy: the best grading is the one that makes everyone about equally unhappy, and the reasons for that unhappiness are as diverse as possible (that is, the mean(gradient)==0, which in practice means that different students should complain about different things, without any single common theme).

Let me elaborate, and let me start with the most basic question: why do we grade? I can probably come up with three main reasons: (1) we want to give students some feedback; (2) we want to loan them a bit of our willpower, to help them do the work, by providing some external motivation; and finally (3) we want to make sure that good students are rewarded with signalling tokens, such as a good GPA (some people may call it “justice”, or “fairness”). Even if you don’t believe in objective justice, or your ability to discern it (and I certainly have strong doubts here), we still want to trust young doctors who will operate on us 20-30 years from now, right? Which means that I want to reward good students, rather than bad ones. So it all boils down to feedback, coaching, and fairness.

Now, if we try to translate these “goals” into practical criteria of a “good grading system”, we can probably capture the essence of it with five guiding principles:

  1. Grades should be informative (to serve as productive feedback)
  2. They should be encouraging (to help coaching)
  3. They should assess something that matters; something that is relevant (to be fair)
  4. They should be quantifiable, measurable (again, to be fair)
  5. Grading process should be time-efficient (time is limited, and we want to maximize impact)

The problem with these statements is that they are all, to some degree, contradictory. To be truly informative, grades should be brutally honest, but this would make them extremely disheartening. For example, if you only grade on a pass/fail basis, and each assignment can only be attempted once, then to achieve the highest information transfer, you would need to adjust your grading criteria for every student, to maintain an average failure rate of 50%. Can you imagine what would happen to a human if they keep failing at a 50% rate, whatever they do? They will probably quit, won’t they? I am not sure what the most “encouraging” rate of failures is, but judging from computer games, it should be at about 10%, just to give it a bit of spice, while still keeping it safe. Which of course would make for a very inefficient training system.
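To put rough numbers on the 50% versus 10% comparison, here is a minimal sketch. It assumes that “information transfer” can be read as the Shannon entropy of a single pass/fail outcome; that reading, and the function name, are illustrative assumptions rather than a formal model of grading.

```python
import math


def pass_fail_information(p_fail: float) -> float:
    """Shannon entropy, in bits, of one pass/fail outcome.

    p_fail is the probability that the student fails the assignment.
    (Hypothetical helper, named here for illustration only.)
    """
    if p_fail in (0.0, 1.0):
        # A fully predictable outcome carries no information.
        return 0.0
    p_pass = 1.0 - p_fail
    return -(p_fail * math.log2(p_fail) + p_pass * math.log2(p_pass))


# Compare the "most informative" rate with the "computer game" rate.
for rate in (0.5, 0.1):
    bits = pass_fail_information(rate)
    print(f"failure rate {rate:.0%}: {bits:.3f} bits per assignment")
```

Under this assumption, a 50% failure rate yields 1 bit per assignment, while a 10% failure rate yields a bit less than half of that, which is one way to see why the “encouraging” rate makes for an inefficient training system.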

Or consider another point: WHAT do we grade? Is it the final performance, the growth, or the effort? Final performance is easy to quantify (objective, measurable), but it is fundamentally unfair, as in every class, and especially in skill-based classes like math and CS, some students would start with a baggage of transferable skills, and some will have to catch up a lot. And it does not just “feel” unfair; grading of the final product is fundamentally misplaced, as it does not measure any relevant skills of each student: their ability to learn, to grow, to persevere, or their “true potential” (whatever it means, as “true potential” is of course unmeasurable by definition). So grading by performance is bad.

To battle this issue, you may be tempted to grade growth, but this approach contradicts principle #5 (time efficiency). To reliably measure a slope you need 3-4 times more point estimates than if you only measure the final product. Slope is just a very noisy thing to assess, and so, again, we are caught in a contradiction. Either our estimates of “growth” are so noisy that they become irrelevant, and thus unfair, or we spend all of our time on grading, which warps our curriculum, and hurts our teaching and research.

What options are left? We can try to grade on effort, as a time-efficient proxy for growth. But then again, effort can be gamed more easily than any other measure, and also effort is actually a rather bad proxy for growth, as efforts may be so easily misplaced (through inefficient work, procrastination, etc.).

In practice most people I know use a system that somehow combines these three aspects into one “index”. Say, 40% of the grade comes from a final exam (product), 30% from participation (essentially, effort), 30% comes from labs, with 2 worst labs dropped (essentially - growth). Each part of this equation on its own is unfair, but the reasons are different, and we hope that the final formula is more-or-less OK “on average”.
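As a minimal sketch, the example index above could be computed like this. The 0-100 score scale, the function name, and the sample numbers are assumptions made for illustration; only the 40/30/30 weights and the two dropped labs come from the example.

```python
def composite_grade(final_exam, participation, labs, n_dropped=2):
    """Combine product, effort, and growth into one index (0-100 scale assumed).

    Weights follow the example: 40% final exam, 30% participation,
    30% labs, with the n_dropped lowest lab scores ignored.
    """
    kept = sorted(labs)[n_dropped:]  # drop the worst labs
    if not kept:
        raise ValueError("need more labs than the number dropped")
    lab_average = sum(kept) / len(kept)
    return 0.4 * final_exam + 0.3 * participation + 0.3 * lab_average


# Hypothetical student: one missed lab and one weak lab are forgiven.
score = composite_grade(final_exam=80, participation=100,
                        labs=[0, 50, 90, 70, 80])
print(round(score, 1))
```

Even in this small form, the index already hides a sort, a drop rule, and three weights, so a student looking at one lab score cannot easily tell how much it matters.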

But then we run into yet another issue: students are not good at understanding grading rubrics! Imagine somebody teaching calculus one, and having a grading rubric that essentially uses formulas like grade = a*x1 + b*max(x2,x3), where each of the x values is also somewhat curved. No wonder students never understand grading rubrics! As a result, half of them give up on easy but critical assignments (which earns them a C), and the other half come to your office hours arguing about 1 point on a 20-point daily quiz, which translates to something like 0.00002 of their final GPA. They just don’t get it! I experienced it first-hand in my second year of teaching: for several months I was working on my rubric, hoping to make it fair, objective, and transparent. By the end of the semester I learned that all of my students were convinced that my grades were completely subjective, and took into account my personal guesses about the intrinsic “worth” of each person. In other words, they assumed the exact opposite of what I was telling them (or at least what I thought I was telling them), and of what was formally written on the syllabus. Because they didn’t understand it.

Ironically, it means that the more balanced your grading system is, the less transparent it seems to students, which makes them anxious, and violates requirement #2, as at this point grades no longer serve as a good encouragement.

And contradictions don’t even end here! We have yet another one, about curving grades vs. using fixed thresholds. If you have adjustable thresholds for As and Bs (what point-score corresponds to each letter grade), you can change assignments from one semester to another (which is good), or even on the fly (even better!), and also you can adjust your criteria if a snow day steals a lecture, or you get sick and fail to explain something well enough. You can correct your grading! But as a payment for that, students get really upset, as they are unable to translate their test results into letter grades, which obviously increases anxiety, and decreases their performance. There is a solution for that: curving, where you give a fixed share of As, Bs, etc. But if you officially introduce curving on your syllabus, students start competing with each other, which ruins course dynamics, kills collaborative assignments, and makes everyone unhappy. So no luck here as well. As Ken Bain describes in his famous book, good grading is always somewhere in-between curving and fixed thresholds.

There’s also a question of how hard you should make your assignments, given that students come to class with different skills, and different abilities. Should you serve the top 10%, and explicitly give up on the bottom 50%? (That’s how the Soviet model of STEM education worked: recruit the best, let the rest die.) Or should you aim at about the median student? (American system.) Or would you go for the lower third? (Don’t they do it in Finland?) Whichever option you choose, some students will complain that the course is going too fast, and some that it is going too slow. And all you can do is make sure that the proportion of complaints is “on target”, whichever your target is (in the US usually 50/50).

To sum up: grading is horrible, unpleasant, and imperfect by design. That’s probably why everyone hates it. (see also: https://rtalbert.org/traditional-grading-the-great-demotivator/ )

So, how to fix grading? Actually, at this point it seems that I have a very good solution (I wish I had found it earlier!), which was described in 2015 by Linda Nilson, and is in fact nothing short of revolutionary! But that will be a topic of a separate (next) post!!

Wednesday, December 19, 2018

Advice to former self #2: Don't be a drill sergeant

  • Remember: It is NOT your responsibility to bring students to some predefined point B. You give an opportunity, not a guarantee. Make sure this opportunity is good, fair, inclusive, but don’t be a drill sergeant (pointless, painful). Have fun, and limit, contain your time and efforts.

Well, this one is easy, and probably even less controversial than the first one, but it took me a while to believe in it, and the realization was rather painful. Wasted a few semesters in needless bitterness and anxiety!

As a zealous neophyte, I binged on books and articles about pedagogy and teaching techniques: active learning, spaced repetition, concepts transfer. I designed my syllabi, and then my classes, with the highest impact in mind, and the effect was rather peculiar: students learned A TON, and, based on my internal “before and after” tests, their progress was quite astounding. But they were also angry, bitter, and overall unhappy.

Now, there are several communities online where jaded sad professors rant, under the veil of anonymity, about how students are ungrateful, and how the over-reliance on course evaluations spawned an inflation of praise, good grades, participation prizes, “the coddling of the mind”, and what not. “In our times”, they say… And I don’t really buy that. For one, I think it is unfair for older people to berate modern students for their “weakness”, as the “real world” that meets college graduates these days is so different from what it was even just 20 years ago: more competitive, less predictable. And also, the memories we have of our own past are shaped by survivorship bias, and by creative reinterpretation of facts. Just because we came to peace with memories of a tough course that we hated back at school, does not mean that this course was any good. It just means that we grew older, and forgot just how unnecessarily painful it was.

It is really easy to concoct an image of oneself as a suffering hero, a self-sacrificial teacher whose true effect on young lives will be evident only 10 years from now, and only to a select few. One day they will stop in their tracks to suddenly realize: yes, this class back in college was hard and painful, but now I see how my professor truly taught me some Calculus! And now I’m so grateful for that!

But this would all be complete and utter nonsense. It goes without saying that course evaluations are a horrid way of evaluating faculty, but it does not mean that, as a professor, you should not care about whether students like your courses; about the emotional effect these courses have. In a way, nothing is more important than this fleeting emotional effect. If your students don’t like math while in your class, why would they ever return to math on their own? They will never use it, they will run away from it, and all your supposedly “efficient” teaching will be wasted on them, wasted completely. And because of that, there is nothing wrong with being lax and forgiving, if it makes students more engaged.

It all sounds so obvious, and maybe it was always obvious to you, dear reader; maybe you see it as a straw-man argument, but for me it was a tough realization. I spent two years or so working as a drill sergeant, prepping students for battle, as if a race of evil aliens was just about to descend on Earth in a few weeks’ time. And it totally did not work. So these days I’m trying to be as lax as I can get, without having the students completely spoiled (I’ll later describe some practical solutions in a separate post). I am sure that with my Russian heritage and upbringing, even the most chill and kind version of me is still reasonably scary and unnecessarily intense, but hey, at least I’m trying!

So here’s my current approach: I downplay extrinsic motivation to a bare minimum, and make it very clear from the very beginning. Here’s the class, my goal is to be here for you, and to provide you with a nice set of opportunities. I will also regularly remind you about best practices, but I will not attempt to punish you for not following them. I don't think it is my job. My job is to open the doors for you, and to show why I think the topics we are studying are fun. But it is up to you to decide how much you want to get from this class. What are your goals? Of all the options on the table, which ones are you planning to use?

I think it is a win-win. Easier, more pleasant teaching, which is also much more effective in the long-term.