Tuesday, June 11, 2019

Specification Grading - followup

You may remember my long and somewhat bleary-eyed article about a new method of grading, called “Specification grading” or “Mastery grading” (depending on whom you ask). I wrote it mid-semester, and at that point I was convinced that it was the best grading system achievable: a method that pretty much solves every problem we have ever faced; the final form; grading as it was originally conceived in the mind of God…

Now that the semester is over, I still think that this method is good and promising, but the picture is a bit more nuanced, and it comes with fewer “free lunches” than I expected.

In this follow-up, I share my impressions of specification grading, split into three parts: “Things that are great” about this method; things that are sort of “Different about it” (not good or bad necessarily, but different; something to keep in mind); and finally, things that are problematic. I finish with a hint of how I hope to further adjust my grading system next semester.

And again, if you are new to this method, read this post first:
https://khakhalin.blogspot.com/2019/04/mastery-specification-grading-and-why.html

Why specification grading is great


Students are more motivated. This is not surprising, as it is the main reason people developed this grading approach to begin with! Grading is an artificial, extrinsic motivator, which also happens to put people on a scale, classifying them as “worthy” and “unworthy” (at least in their own minds), and that is known to be extremely bad for morale:
https://medium.com/bits-and-behavior/grading-is-ineffective-harmful-and-unjust-lets-stop-doing-it-52d2ef8ffc47

The trick in fighting this mentality is, of course, to separate helpful formative feedback from summative assessment. In point-based grading systems, you typically do it with a rubric, trying to objectify a student’s work as something that exists outside of their ego and does not reflect on their self-worth. It helps to some extent, but it’s not enough. With specification grading, you do it by marking all assignments on a pass/fail scale, which in most cases is rather permissive, and providing constructive, formative feedback in other ways. And it seems to be working: in my anonymous course assessments, about 70% of the students who filled out the survey said that they either worked harder in this course than in comparable point-based courses, or that they learned more while putting in about the same amount of effort, simply because they knew better which skills and topics to concentrate on. A few students also said that the low-stress environment allowed them to spend more time studying selected topics they were genuinely interested in, which is of course great, if true. (The problem with questions like “did you work hard in this course?” is, of course, that it’s unpleasant to answer “no”, so I bet these responses were a bit self-congratulatory. But still, students think that they were motivated! That is certainly a “yay” in my book!)

Students are less stressed. This is perhaps the most obvious benefit of this approach, and it makes teaching so much more fun, as the spirit in the classroom becomes so much more cheerful and enthusiastic! 90% of the students who filled out the survey said that they were less (or even “way less”) stressed in this course than in a point-based course. One student said that they could not keep track of which topics they had passed and which they hadn’t, which made them anxious. I wonder why they didn’t stop by and ask during office hours, but even so, if fewer than 10% of students are stressed about a course, these days it counts as a huge win! High anxiety is arguably the main obstacle that prevents this generation from learning well, which makes it my personal enemy in the classroom. And specification grading does a great job of curbing it.

Students learn way more. This is about the seventh time I have taught some variant of this course, and in the past, in a typical class graded on a point-based system, about 60% of all written tests reached a passing score, and only about 5-10% of students reached proficiency in 80% of topics or more. In this latest rendition of the same class, taught with “Specification grading”, 88% of all tests got a passing grade, and 90% of students (!!!) reached proficiency in 80% of topics or more. This is not at all surprising, considering that they could keep trying until they passed (some students only passed their tests on the 4th or even 5th attempt!), but it is still amazing! The level at which they knew stuff by the end of the course was just plain incredible! I had actual productive conversations about synaptic transmission and plasticity with one student after another, which had never happened in the past. For first-years, and at this scale, it was simply mind-blowing!

Office hours are fun and productive, as students are forced to talk about their misunderstandings while they are trying to pass the test. That was potentially the best part of my experience with the new system: I had never had office hours that full, or that fun, before! Ever! Note that this may not automatically be true for your course: it only happened in mine because I made all retakes verbal. If a student passed a written quiz on the first attempt, they didn’t have to talk to me. If they needed to retake a quiz, however, they had to come to office hours and explain the topic to me while I asked them questions (Socratic ones if they fumbled; deeper follow-up questions if they were doing well). As I wrote in my previous account, this created an ideal setup for learning: if a student did not pass the test, we just ended up talking about the misconceptions and lacunae in their understanding. I could afford to do this with 20 students in the class, but of course it is not easily scalable (and I may not actually be able to repeat this experience next semester, as amazing as it was).

You get a much better picture of what exactly they misunderstand, and why. Again, this may be less true if your quizzes are always written, but even then, you get so much more information from several attempts than from a single one. With one test, you never know whether it was you who explained it badly, whether they lack some background knowledge, or whether they just partied the night before the test. But if you meet several times in a row with a dozen different students, having open-ended conversations about the concepts you taught, you get an opportunity to essentially look at the course through their eyes. If I had to revamp this course, or any other course for that matter, I would switch to this “Mastery Grading + Verbal Exams” mode for one semester before the revamp, just to collect the data! Going through this experience made me really rethink the emphasis I put on different topics within the course, simply because I discovered that some of them are hard for students in ways I had never been able to identify before.

You get to know students better, as you can observe their thinking, and even talk about their interests a bit. This, again, is only true if you decide to utilize your office hours for topic retakes, but at least for a small class, it is very achievable.

Despite higher satisfaction and better learning, it does not seem to inflate letter grades. My average grade in this course was typically either a “B” or a “B plus”, and if I reverse-calculate the average grade for my specification-graded class, I end up with a similar average. The distribution seems to be a bit narrower this time (it is harder to get a C, but also harder to get an A), but the average grade is about the same.

Neutral points to keep in mind


Specification Grading puts a stronger emphasis on tests, and makes students study for the test. It’s not really news, if you think about it; students always study for the test. Yet if you give them an opportunity to retake the test several times, this effect becomes so much stronger! If the tests are good, it’s not a problem, but if they were somewhat tangential to your course in the past (if you used them just to shake things up a bit and make sure everybody came to class prepared), then be aware: now they will come to the forefront of students’ minds. For example, if you test vocabulary, in the past you could use it to check whether students did the reading. Now they will actually put effort into memorizing the vocabulary, which may be weird if your goal was the reading itself. My class developed a bit of a prep-course vibe, as if its main goal was not to introduce students to the discipline, but rather to prepare them for taking higher-level classes in neuroscience. That was not much of a problem, as it was always one of my goals; I just had never thought of it as my top priority. It was number three or four on the list in the syllabus, maybe, but not number one. And yet, by using old quizzes with a new grading system, I inadvertently made it the top priority, which was a curious thing to realize after the fact.

A different type of student falls through the cracks. The strongest and the weakest students fare about the same under both grading systems, but the middle looks somewhat different. With a point-based system, you are rewarding “natural talent” (whatever that means), social background (better schooling prior to the course), the ability to think quickly on the fly, and the ability to handle test anxiety. With mastery grading, you are rewarding persistence and a willingness to engage in potentially taxing interactions with the professor (as showing up to office hours every week may not be easy for some students). It means that you may see weak but persistent students succeed, while some really smart ones fail by repeatedly missing the retake opportunities you offer. This may feel unusual, almost counter-intuitive, but after some meditation on the topic, I decided that it is probably fairer that way. Or, at least, not any less fair than a point-based system.

What makes Specification Grading problematic


It may require more of your time. Again, if you always had fully loaded office hours, it would not make much of a difference, but if you were like me, with students barely showing up, you will suddenly have four more hours of intense one-on-one tutoring every week. Depending on your goals, that may be great, but it is also something to keep in mind, especially if you naturally enjoy human interaction, as it can become dangerous for your productivity. There is probably a reason why Mastery Grading seems to be more popular in math and physics than in the life or social sciences: it is relatively easy to create 5 different versions of essentially the same assignment if you teach calculus or thermodynamics. Moreover, you may even have a TA to grade the work. In the life sciences it is harder, as one can rarely make several equivalent quizzes for the same topic. And grading writing assignments (short or long answers), while possible, is incredibly taxing, even if you do it pass/fail, and typically it cannot be delegated to a TA. Verbal exams are at least fun, but even then, the sheer amount of feedback you have to give, if every student attempts at least some assignments more than once, may be overwhelming.

Students tend to create backlogs, which puts extra strain on the last 2-3 weeks of the course. Because everyone can retake at least something, the closer you get to the final deadline, the more your office starts to look like an Apple Store on Black Friday. I am not sure what to do about it, but it is clearly a problem.

Attendance gets slightly worse (unless you take attendance and make it part of the specification, which I typically don’t). With point-based grading, my daily quizzes served as an automatic attendance check: if a student missed a class, they also missed a quiz. With the Mastery approach, they could always retake the quiz, so the more arrogant students just started skipping classes. I’m guessing that if I had made my retake policy a bit less permissive, it could have helped (some educators use “retake tokens” that can be exchanged for a retake, but are made intentionally scarce, with only 3 or 5 tokens issued to each person).

Like every system, this one can be gamed. You have to be vigilant, as some students will try to game your grading system by deliberately failing tests the first time around, cramming for the test once they have seen it, and then trying to pass without actually understanding much, through simple rote memorization (which is a well-known approach, apparently: https://www.edutopia.org/article/allowing-test-retakes-without-getting-gamed). You may also see students come to the very door of your office with a handwritten list, cram it for several more seconds, drop it on the floor, and enter your office, trying to start answering right away, while the information is still in their working memory. The trick here is to set limits either on the total number of retakes, or on the maximum frequency of retakes (I only allowed retakes once a week, which helped a lot). And with last-second crammers, you just have to make sure to disrupt the narrative, and start by asking a few follow-up questions. After one or two attempts like that, students realize that honestly learning things gives a better result, and is actually easier than memorizing answers to possible questions.

As this system is unusual, it feels less transparent. This is an unpleasant topic, but I think it needs to be mentioned. Look, as I said before, students are quite happy with this approach: they report 90% satisfaction with it, saying that they’d prefer it to a point-based system any time. However, when I asked them whether the system was easy to understand, only 50% of students said that it was; the other 50% said that they were “confused”, and that a point-based system felt easier, even if only because it is more familiar. If your goal is to teach well, this is not necessarily a problem. However, if your employment depends on student evaluations, you will almost certainly get lower scores on the “Grading criteria were clear” item of the rubric, which for some people may be a problem. At most places, it may generally be a good idea to try new grading systems only after tenure. Bard College is a bit more open in this regard, and seems to value experimentation more, but I’m not tenured myself, so that is yet to be tested =)

If you make getting a “B” easy, but getting an “A” hard, it may lead to hard conversations. This is another unpleasant topic, but also worth mentioning. The problem here is that specification grading mostly helps students in the middle of the curve: those who don’t pass all assignments on the first attempt, and are grateful for extra chances. Students who tend to do well, on the other hand, may feel weird about this new, unfamiliar system. This means that, for a change, most conflicts about final grades won’t be about students asking to change a D to a C, but rather about students asking to change a “B plus” to an “A”. And these conversations are generally less pleasant, as you may have to negotiate with students who are more entitled, more pushy, and better equipped to work their way through the academic hierarchy. Moreover, the problem is further complicated by the fact that, from a pedagogical point of view, it makes a lot of sense to make the “top-level” assignments that separate “decent B-level work” from “excellent A-level work” a bit more creative and ambitious. If you want to teach better, it makes total sense to tailor your assignments to your students’ needs, and to encourage them to step outside the basic framework of the course. But it also means that your stronger students will feel uniquely challenged by these assignments, compared to everything else in the class. In a “normal” course, this rarely presents a problem: students never actually know how hard the final will be, or what curve you will use, but somehow that is part of the accepted status quo. However, in a course that feels unusual, and is intentionally designed to be extremely transparent, this final assignment may suddenly feel “unfair”, just because it comes with sparser scaffolding than the “basic assignments” that guarantee a “C” or a “B”.

There are, of course, ways to help with this problem, but even so, be aware that “I deserve an A” situations are inherently more taxing for an educator’s psyche than “I deserve to pass” ones.

Summary, and further plans


Will I personally stick with specification grading next year? Yes and no. Look, while in my original write-up I used “Specification grading” and “Mastery grading” interchangeably, “Mastery” is actually just a subset of “Specification”, and I expect that going a bit broader than “pure Mastery” can help with the workload a lot. For example, you might have read about the professor who refused to give fine-tuned letter grades, and instead gave everyone a default of a “B”, requiring deeper extra work for an “A”: https://www.thecrimson.com/article/1975/5/14/a-quiet-act-of-impiety-prichard/

While the article presents it as a revolutionary rebellion, the professor essentially used a 2-level specification system, in which a basic level of engagement resulted in one grade, and honest extra effort was needed for the other. And I suspect that in the life sciences, it may be better to shift from math-style “mastery grading” towards this more relaxed approach. This would allow me to retain the best part of the “specification” model: instead of trying to give a holistic assessment of every student’s “worth”, we would put the responsibility on students themselves, allowing them to set their personal goals, and choose the amount of work they want to do for the course. But I am less sure that I need to keep the “train them until they reach a bar” mindset. It works well in some courses (in core courses like chemistry or genetics, it may actually make a lot of sense), but in expository courses (both 100- and 300-level) it may be too restrictive.

In fact, I have already sketched a new version of my syllabus for next semester, and I hope to find time to write a separate post about it. So stay tuned!