Tuesday, August 20, 2019

Specification ungrading

This is a third post about my adventures in alternative grading (here are the links to the first, and the second posts in the series).

Briefly: after reflecting on my “Mastery grading” saga last semester, I decided to once again change everything in my “Intro to Neurobiology” class. This time around, I’ll try a revolutionary yet controversial “Ungrading” approach, toned down using a hybrid “Specification” scheme. I am pretty excited for this experiment, as I feel that it may be uniquely suited for teaching intro classes in particular. If this sounds interesting, read along!

First of all, why change the grading system again? See, I kind of enjoyed my experience with mastery grading last semester, but it made me realize several things. One: it was really hard on my time, and I don’t think it is sustainable in the long term, just because weekly verbal exams with several students, even if very short, are a huge time commitment. Two: as we spent all office hours talking about science, we had no time left to talk about more human, and thus potentially more important things, such as life, college, careers, art, courses, learning strategies etc. The existential component was gone, and it is a shame, especially for an introductory course that is mostly taken by first-years.

But the most important concern is that, based on student evaluations, as well as some anecdotal evidence, students didn’t enjoy the class as much, compared to its previous renditions. In fact, it is the first time in history that my course appreciation rating tanked slightly below 4.0 on a 5-point scale (p=0.03, compared to my typical performance of about 4.5). This numerical analysis is, of course, all sorts of problematic (I hope to write a separate, referenced post on this topic), but it is still curious. Students learned a ton more than in any of my previous classes; and not by a bit, but something like 3-4 times more (see my previous analyses). Moreover, at least nominally, in an abstract, cerebral way, they knew that they learned a ton, as I kept reporting back to them in class. They also unambiguously reported lower levels of anxiety. And still, they liked the class significantly less (both in the statistical, and in the IRL sense of this word). Weird!

There are several ways to look at it, philosophically. One can of course say that it is more important to be a good professor than students’ favorite professor [1]. One could also argue that I should have done a better job pitching the class to the students. Or that we can just ignore these ratings, as first-years, taking their first course in the discipline, don’t have a good reference point for judging anything. Or we can go full grouchy and complain that students’ goals are no longer about education, as they don’t want to walk uphill both ways. But regardless of whether any of it is true, the fact is that I worked more, students learned better, and yet they were about 10% less happy (4.5 to 4.0), on average, than in all previous years. And I don’t like it. Because arguably, the main goal of an intro course is not to prep students for the second-year sequence, but to give them a taste of, and an appreciation for science. Here, the emotional goal, the enjoyment of the process, may be more important than objective learning. And if this is true, then having lower student appreciation is actually worrisome. Not to mention that the second most important goal for a typical intro course is, probably, the metacognitive training: learning how to learn. Which, again, suffered with “mastery grading”, because of its emphasis on skills. And so, even though it was fun to try, I think that “mastery grading” may just be poorly suited for intro courses in biology; at least of the type we teach here at Bard. 200-level biology, or some electives: maybe. 100-level math: maybe, for different reasons. But not the bio intros.

What could be an alternative? Do I have to return to a vastly inferior point-based grading, just because it feels more familiar to students, and so less intrusive? Maybe not! Introducing: yet another revolutionary grading technique, called “The Ungrading”! While lots is written about it [2, 3], here are the key points:

Lots of feedback, but no letter grades or even points
2-3 written self-reflections (self-evaluations), in which students are encouraged to, essentially, grade themselves, and give themselves advice on how to do better [4]
Followed by a one-on-one discussion in office hours
The final grade in the course, for each student, is assigned by this student themself. If the grade does not feel right to the instructor, they negotiate. In extreme cases, the instructor reserves the right to override the grade, but in practice it happens very rarely, and even then, usually upwards, rather than downwards.

What are the benefits of this approach? The biggest two are: (1) a replacement of extrinsic motivation (grades) with intrinsic motivation (learning), which is known to be better [5], and (2) explicit introduction of metacognitive exercises and discussions. Which was precisely what mastery grading lacked, as it shifted the focus on prepping a bit too much for my liking. And again, prepping is OK and fun in some classes, if students are into it, and know what they are doing. But in an exploratory intro class, a metacognition-oriented design may be much more impactful [6].

And yet, in its pure form, “Ungrading” feels very dangerous. It is known to be paradoxically anxiety-triggering: it is so unlike anything students have experienced before that they freak out, and cannot believe that it is not a trap [7]. And even at the existential level, pure self-evaluation can be incredibly stressful (think of the impostor syndrome!). This downside could potentially be corrected with thoughtful one-on-one discussions on inclusion, and personalized advice on learning strategies, but I think that for most faculty, me included, it would be a tough ask. It would come dangerously close to life coaching, or maybe even therapy, which we are never good at.

Moreover, pure “Ungrading” feels particularly dangerous for teaching a highly heterogeneous group of students: if a third of your class suspects that they may not belong, while another third are competitive “gunners” that try to out-achieve anything that moves, it is easy to see how a system of complete self-grading can be perceived negatively by everyone, and become really detrimental for everyone’s learning. And then there are the mundane issues of class attendance and the use of mind-altering substances immediately before class. In essence, pure “Ungrading” feels like a case of a very “Not-inclusive pedagogy”, in the sense that it does not offer students enough structure and motivational support to help them succeed with their studies [8].

Is there a way to fix this system, and negate the negative sides of it, while keeping the good? I think the answer is “yes”: if we combine Ungrading with a simple version of Specification grading, to make sure that the course has some strong, fine-grained structure, but still retains the reflective aspect of “Ungrading”. The structure will help to reduce anxiety, both for those students who don’t yet know how to work and how to feel about their own success in class, and for those that are concerned with permissive grading systems being somehow unjust. Remember that prof who gave a “B” to everyone, but required extra work for an “A”, which really was the most run-down version of specification grading? [10] We could use a similar approach! The coolest thing about specifications is that they give you a powerful tool for combining very different types of assignments, by separating them into different strata (aka “Bundles”). With specifications, we can have the best of all worlds: a structured “minimum required performance”, a free-form reflective “Ungrading”, and a “challenging bonus assignment” on top, for those who want to walk an extra mile. We can reap all the benefits of not grading, without paying the price of it!

But enough rambling; at this point I’ll just post the draft of my new syllabus below, which I hope is rather self-explanatory. What do you think? Would it work? Any suggestions, while I can still make changes to it?

Intro to Neurobiology, Draft Syllabus

(Section on grading)

This course is graded using a “Specification Grading” system, which is known to work better than the standard “Point-based” system, both in terms of final results, and student experience. The gist here is that each letter grade comes with a certain list of criteria, and it is up to you to pick the level you want to achieve.

To get a C:

Participate in 80% of classes
Participate in 80% of labs
Submit 60% of short written assignments (weekly reading reflections, labs, short exercises etc.) They don’t have to be perfect, but they need to be reasonable. If any given submission is problematic, I’ll let you know within a week, so that you could improve next time.

To get a B:

Participate in 90% of classes
Participate in 90% of labs
Submit 90% of short written assignments
Submit 2 reflective letters (each about 3 pages long; see the descriptions below). One before the mid-semester break; the other one before the completion week. The letters should be meaningful; if they are not, I’ll let you know, and give you a chance to resubmit.
Come to office hours at least once in the first half, and once in the second half of the semester.

To get an A:

On top of everything specified in “B”, write a good final essay about a neuroscience paper; then meet with me during the completion week, to discuss both the paper and the essay. See the instructions below.
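For readers who think in code, the three bundles above can be sketched as a small function. This is only an illustration of the rules as written: the `StudentRecord` fields and function names are made up for this example, "reasonable" and "meaningful" submissions are assumed to have been judged already, and the in-between outcomes for the essay (A minus, B plus) as well as anything below a C are left out, since they depend on judgment rather than thresholds.

```python
from dataclasses import dataclass


@dataclass
class StudentRecord:
    """Hypothetical per-student summary; field names are illustrative only."""
    class_attendance: float          # fraction of classes attended, 0.0-1.0
    lab_attendance: float            # fraction of labs attended, 0.0-1.0
    short_assignments: float         # fraction of short written assignments submitted
    reflective_letters: int = 0      # number of meaningful letters submitted (0-2)
    office_visit_first_half: bool = False
    office_visit_second_half: bool = False
    good_final_essay: bool = False   # good essay plus the completion-week discussion


def meets_c(r: StudentRecord) -> bool:
    # 80% of classes, 80% of labs, 60% of short assignments
    return (r.class_attendance >= 0.8
            and r.lab_attendance >= 0.8
            and r.short_assignments >= 0.6)


def meets_b(r: StudentRecord) -> bool:
    # 90% across the board, two letters, one office visit in each half
    return (r.class_attendance >= 0.9
            and r.lab_attendance >= 0.9
            and r.short_assignments >= 0.9
            and r.reflective_letters >= 2
            and r.office_visit_first_half
            and r.office_visit_second_half)


def base_grade(r: StudentRecord) -> str:
    """Return the highest bundle whose every requirement is satisfied."""
    if meets_b(r):
        return "A" if r.good_final_essay else "B"
    if meets_c(r):
        return "C"
    return "below C"
```

Note that the essay only counts once every "B" requirement is met: a great essay on top of a single reflective letter still lands in the "C" bundle under this reading.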

Reflective letters

To get a “B” or higher, you need to write two letters, reflecting on this class, and your work in it.

For the first reflective letter (to be written before the mid-semester break), please answer these questions. Each answer should be about half a page (200-300 words).

Please describe your personal goals for this course. What do you want to take from it? What do you want to learn and understand? Why?
Describe the coolest, most interesting thing you learned in this course so far. Try to write as if you were describing it to your friend who is not taking this course. Please tell me why you, personally, find it interesting.
Describe one topic you studied in this course that you still do not completely understand, but would like to understand. Try to explain how you know that you don’t understand it fully yet. This is a tricky question, as it is hard to notice that you don’t understand something, and it is even harder to write about it. Note that if what separates you from knowing this topic is a few minutes of googling, it clearly does not count as an answer. Try to think deeply about everything we studied in this course, and whether you really got it. It is an important skill: to know what you know, what you don’t, why you want to know it, and how to get there.
Describe how you study for this course (half a page). How much do you study each week? How do you use this study time? What sources do you use? Have you tried to study with someone? Have you looked for help? Note that I will not judge your studying schedule, even if you confess that you don’t study at all. I just want to know what you do, and I want you to think about it.
Tell me something about this class, your work in it, and your experience of it. What seems to work well? What does not? Why? How to make it work better? If there are any particular things you would like me to address, please let me know. I’d like to know what you think, and how you feel.
If you had to give yourself a mid-term letter grade for the first half of this course, what grade would it be? Why not a higher, or a lower one? Note that this self-grading won’t affect your actual grade at all, so let’s just honestly think, and later talk about it.

Second reflective letter (to be written in the second half of the semester, before the completion week). Each response should take about half a page (200-300 words).

Describe the most important, most useful thing, either conceptually or practically, that you learned in this course. Not necessarily the most interesting one, but the one that is consequential, for you, personally. What makes it important?
Reflect on your work in the second half of this course, compared to the first half of it. Is there anything that you do differently now, compared to 2-3 months earlier? Have you tried to deliberately change something about your studies? Did it work? Have you stopped doing something because it did not feel productive?
If you had to give yourself a letter grade for this course, what grade would it be? Why not a higher grade? Why not a lower one? Again, this self-grading won’t affect your actual grade, but I’d like us to talk about it.
What are your plans for next semester, and next year? What do you want to do? What do you want to learn? Are they the same plans that you had as you arrived at Bard, or did anything change? If yes, why? Did the courses you took this semester affect your plans for the future, in any way?

Final essay, and a discussion of it

This assignment is only required if you want to go an extra mile (above a letter grade of “B”). If you put good work in it, you can get an “A”; if you don’t put enough work, you can get an “A minus”, or a “B plus”, depending on how far you get.

First, you need to find a neuroscience paper that you’d like to write about, from a journal named eLife ( https://elifesciences.org/ ). It is an open-access journal, so all papers are freely available online. It may make sense to find a paper that has something to do with something we learned in class, but if you want to go rogue, it is also possible (just make sure it’s neuroscience). Also, it has to be a primary paper: an experiment, a model, or a meta-analysis, but not a literature review, or an opinion piece.

Send me an e-mail with a link to the paper, to claim it. If I think that the paper is a poor choice, I’ll let you know (it’s unlikely, but just in case).

The earliest you can claim the paper is a week before the mid-semester break. The latest you can claim it, is immediately before the completion week. The more time you have, the better you can prepare, and the more feedback I can give you, when you ask for it.

Read the paper of your choice. Read some of the papers that it references, to learn the background (especially those referenced in the “introduction” section). Make sure you understand what question this paper asks, and why it is important. What are the consequences of knowing it? What are the hypotheses the authors had? You need to understand the methods. You need to know the figures, and understand what they are trying to say. You also need to read the exchanges that the authors and the reviewers had before the paper got accepted (eLife is one of the few journals that publish these exchanges openly). You need to get a good idea of what happened there, at the review stage; what the concerns were, and how the authors responded to these concerns.

Then write your essay. It should clearly address the following points:

The rationale for the paper (what makes its question important), and the background needed to understand the study. Explain what we need to know before we even start reading the paper of your choice.
The narrow question (or questions) the paper posed, and associated hypotheses.
The methods used. You need to have a good grasp of the general idea of each method, and some meaningful details about them. Outline weak and strong points of these methods, compared to other methods the authors could have conceivably used within the same time and financial budgets.
The overall structure of the paper, including the key message of each panel in each figure. How do these messages contribute to the answer the paper eventually provides? How do different figures interact with each other?
The answer eventually provided by the paper. Make sure to refer back to the narrow question the authors posed, and the hypotheses they had. What does this answer mean, in a broader scale of things?
What are the limitations of this study? What follow-up studies can be inspired by this study? Some of these are probably outlined in the Discussion, some may be identified from the peer review materials, some you can discover yourself. Make sure to go beyond the obvious; if a limitation is true for every paper ever written, it is probably not the most useful thing to discuss.

You need to submit the final version of the essay by the first day of completion week, and we’ll schedule a meeting to talk about it. In the meeting, we will discuss the paper you wrote about, using your essay to answer the six questions outlined above.

There will also be a deadline to submit a draft version of your essay. If you submit something by this deadline, I will read your draft, and offer you feedback. We can also meet during drop-in hours, to talk about your project.

The measure of success here is the depth of your research towards understanding this one single paper. Some topics, methods, and analyses are inherently more complicated, while some are simpler, so we are not going after some predefined level that you need to reach. Rather, I want to see your independent work, and your ability to use what we learned in class to learn more science on your own. If the paper is “easy”, I will expect a deeper understanding of it, and I may ask more follow-up questions. If the paper is hard or long, a deep understanding of only one part of it, perhaps as little as one figure, may suffice. What matters is the amount of thought you put in your project. You definitely don’t need to become perfect (it is impossible to know everything!) but you need to kinda become a specialist in this one particular paper, compared to a person reading it for the first time.

Tuesday, June 11, 2019

Specification Grading - followup

You may remember my long and somewhat bleary-eyed article about the new method of grading, called “Specification grading” or “Mastery grading” (depending on whom you ask). I wrote it mid-semester, and at that point I was convinced that it is the best grading system achievable; a method that pretty much solves all problems that we ever faced; the final form; grading as it was originally conceived in the mind of God…

Now, as the semester is over, I still think that this method is good and promising, but the picture is a bit more nuanced, and there are fewer “free lunches” coming with it than I expected.

In this follow-up, I share my impressions of specification grading, split into three parts: “Things that are great” about this method; things that are sort of “Different about it” (not good or bad necessarily, but different; something to keep in mind); and finally, things that are problematic. I finish with a hint of how I hope to further adjust my grading system next semester.

And again, if you are new to this method, read this post first:
https://khakhalin.blogspot.com/2019/04/mastery-specification-grading-and-why.html

Why specification grading is great

Students are more motivated. This is not surprising, as it is the main reason people developed this grading approach to begin with! Grading is an artificial, extrinsic motivation, which also happens to put people on a scale, classifying them into “worthy” and “unworthy” (at least in their own minds), which is known to be extremely bad for the morale:
https://medium.com/bits-and-behavior/grading-is-ineffective-harmful-and-unjust-lets-stop-doing-it-52d2ef8ffc47

The trick in fighting this mentality is of course to separate the helpful formative feedback from summative assessment. In point-based grading systems, you typically do it with a rubric, by trying to objectify student’s work as something that exists outside of their ego, and does not reflect back on their self-worth. It helps to some extent, but it’s not enough. With specification grading, you do it by marking all assignments on a pass/fail scale, which is in most cases rather permissive, and providing constructive, formative feedback in some other ways. And it seems to be working: in my anonymous course assessments, about 70% of students who filled the survey claimed that they either worked harder in this course than in comparable point-based courses, or that they learned more, despite putting in about the same amount of effort, just because they knew better, what skills and topics to concentrate on. A few students also claimed that the low-stress environment allowed them to spend more time on studying selected topics that they were genuinely interested in, which is of course great, if true. (The problem with questions like “did you work hard in this course” is of course that it’s unpleasant to say “no” to a question like that, so I bet these responses were a bit self-congratulatory. But still, students think that they were motivated! It is certainly a “yay” in my book!)

Students are less stressed. This is perhaps the most obvious benefit of this approach, and it makes teaching so much more fun, as the spirit in the classroom gets so much more cheerful and enthusiastic! 90% of students who filled the survey said that they were less (or even “way less”) stressed in this course, compared to a point-based course. One student said that they could not keep track of what topics they passed and what topics they didn’t, which made them anxious. I wonder why they wouldn’t stop by and ask during office hours, but even so, if less than 10% of students are stressed about the course, these days it counts as a huge win! High anxiety is arguably the main obstacle that prevents this generation from learning well, which makes it my personal enemy in the classroom. And specification grading does a great job in curbing it.

Students learn way more. It’s about the 7th time that I taught some variant of this course, and in the past, in a typical class graded on a point-based system, about 60% of all written tests reached a passing score, and only about 5-10% of students reached proficiency in 80% of topics or more. In this latest rendition of this same class, but taught with “Specification grading”, 88% of all tests got a passing grade, and 90% of students (!!!) reached proficiency in 80% of topics or more. This is not at all surprising, considering that they could keep trying until they passed (some students only passed their tests on a 4th or even 5th attempt!), but it is still amazing! The level at which they knew stuff by the end of the course was just plain incredible! I had actual productive conversations about synaptic transmission and plasticity, with one student after another, which never happened in the past. For first-years, and at this scale, it was simply mind-blowing!

Office hours are fun and productive, as students are forced to talk about their misunderstandings while they are trying to pass the test. That was potentially the best part of my experience with a new system: I never ever had office hours that full, and that fun before! Ever! Note that it may not be automatically true for your course: it only happened in my course because I made all retakes verbal. If a student passed the quiz on the first attempt, they didn’t have to talk to me. If they needed to retake a quiz however, they had to come to the office hours, and describe the topic to me, as I was asking them questions (Socratic if they fumbled; deeper follow-up questions if they were doing well). As I wrote in my previous account, it created an ideal setup for learning: if a student did not pass the test, we just ended up talking about their misconceptions and lacunae in their understanding. I could afford doing it, with 20 students in class, but of course it is not easily scalable (and actually I may not be able to repeat this experience next semester, as amazing as it was).

You get a much better picture of what exactly they misunderstand, and why. Again, this may be less true if your quizzes are always written, but even then, you get so much more information from several attempts than from a single one. With one test, you never know whether it was you, who explained it badly, or whether they lack some background info, or whether they just partied the night before the test. But if you meet several times in a row, with a dozen different students, having open-ended conversations about different concepts you taught, it gives you an opportunity to essentially look at the course through their eyes. If I had to revamp this course, or any other course for that matter, I would switch to this “Mastery Grading + Verbal Exams” mode for one semester before the revamp, just to collect the data! Going through this experience made me really rethink the emphasis I put on different topics within the course, just because I discovered that some of them are hard for students in ways I was never able to identify before.

You get to know students better, as you can observe their thinking, and even talk about their interests a bit. This, again, is only true if you decide to utilize your office hours for topic retakes, but at least for a small class, it is very achievable.

Despite higher satisfaction and better learning, it does not seem to inflate letter-grades. My average grade in this course was typically either a “B” or a “B plus”, and if I reverse-calculate the average grade for my “Specification-graded class”, I end up with a similar average score. The distribution seems to be a bit more narrow this time (it is harder to get a C, but also harder to get an A), but the average grade is about the same.

Neutral points to keep in mind

Specification Grading puts a stronger emphasis on tests, and makes students study for the test. It’s not really news, if you think of it; students always study for a test. Yet, if you give them an opportunity to retake the test several times, it makes this effect so much stronger! If the tests are good, it’s not a problem, but if they were kind of tangential to your course in the past; if you used them just to shake things up a bit, and make sure everybody comes to class prepared, then be aware. Now they will come to the forefront of students’ minds. For example, if you test the vocabulary, in the past you could use this to check whether they did the reading. Now, they will actually put an effort into memorizing the vocabulary, which may be weird if your goal was the reading itself. My class developed a bit of a prepping vibe to it, as if its main goal was not to introduce students to the discipline, but rather to prepare them for taking higher-level classes in neuroscience. It was not that much of a problem, as it was always my goal; I just never thought of it before as my top priority. It was number three or four on the list in the syllabus, maybe, but not number one. And yet, by using old quizzes with a new grading system, I kind of inadvertently made it the top priority, which was a curious thing to realize after the fact.

A different type of student falls through the cracks. The strongest students and the weakest students fare about the same in both grading systems, but the middle gets somewhat different. With a point-based system, you are rewarding “natural talent” (whatever it means), social background (better schooling prior to the course), the ability to think quickly on-the-fly, and the ability to handle test anxiety. With mastery grading, you are rewarding persistence, and willingness to engage in potentially taxing interactions with the professor (as showing up to office hours every week may not be easy for some students). It means that you may see weak but persistent students succeed, while some really smart ones may fail, by repeatedly missing retake opportunities you offer. This may feel unusual, and almost counter-intuitive, but after some meditation on the topic, I decided that it is probably more fair that way. Or, at least, not any less fair than a point-based system.

What makes Specification Grading problematic

It may require more of your time. Again, if you always had fully loaded office hours, it would not be much of a difference, but if you were like me, with students barely showing up, now suddenly you’ll have four more hours of intense one-on-one tutoring every week. Depending on your goals, it may be great, but it is also something to keep in mind, especially if you tend to naturally enjoy human interactions, as it may get dangerous for your productivity. There is probably a reason why Mastery Grading seems to be more popular in math and physics, compared to life or social sciences: it is relatively easy to create 5 different versions of essentially the same assignment if you teach calculus or thermodynamics. Moreover, you may even have a TA to grade the work. With life sciences, it is harder, as it is quite rare that one could easily make several equivalent quizzes for the same topic. And grading writing assignments (short or long answers), while possible, is incredibly taxing, even if you do it pass/fail, and typically it cannot be delegated to a TA. Verbal exams are at least fun, but even then, the sheer amount of feedback you have to give, if every student attempts at least some assignment more than once, may be overwhelming.

Students tend to create backlogs, which puts an extra strain on the last 2-3 weeks of the course. Because everyone can retake at least something, the closer you get to the final deadline for the course, the more your office starts to look like an Apple Store on a Black Friday. I am not sure what to do about it, but it is clearly a problem.

Attendance gets slightly worse (unless you take attendance, making it part of the specification, which I typically don’t). With point-based grading, my daily quizzes served as an auto-attendance taker, as if a student missed a class, they also missed a quiz. With the Mastery approach, they could always retake the quiz, so more arrogant students just started skipping classes. I’m guessing if I made my retake policy a bit less permissive, it could have helped (some educators use “retake tokens” that can be exchanged for a retake, but that are made intentionally scarce, with only 3 or 5 tokens issued to every person).

Like every system, this system can be gamed. You have to be vigilant, as some students will try to game your grading system by deliberately failing tests the first time around, cramming for the test, once they know it, and then trying to pass them without actually understanding much, through simple rote-memorization (which is a well known approach, apparently: https://www.edutopia.org/article/allowing-test-retakes-without-getting-gamed). You can also see students come to the very door of your office with a handwritten list, quickly cram it for several more seconds, drop it on the floor, and enter your office, trying to start answering right away, while the data is still in their working memory. The trick here is to set limits either on the total number of retakes, or on the maximal frequency of retakes (I only allowed retakes once a week, which helped a lot). And for last-second-crammers, you just have to make sure to disrupt the narrative, and start with asking a few follow-up questions. After one-two attempts like that, students realize that actually honestly learning things gives a better result, and is actually easier than just memorizing answers to possible questions.
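Both kinds of limits mentioned above (a minimum gap between retakes, and a scarce supply of "retake tokens") are easy to state precisely. Here is a minimal sketch, where the function name, its parameters, and the choice to count the gap from the student's previous attempt are all assumptions made for illustration, not a description of how I actually tracked it:

```python
from datetime import date, timedelta
from typing import Optional


def may_retake(last_attempt: Optional[date],
               today: date,
               tokens_left: Optional[int] = None,
               min_gap_days: int = 7) -> bool:
    """Decide whether a retake is allowed today under a hypothetical policy.

    last_attempt: date of the student's previous attempt, or None if there was none
    tokens_left:  remaining retake tokens, or None if the course does not use tokens
    min_gap_days: minimum number of days between attempts (7 = once a week)
    """
    # Token limit: caps the total number of retakes
    if tokens_left is not None and tokens_left <= 0:
        return False
    # Frequency limit: caps how often retakes can happen
    if last_attempt is not None and today - last_attempt < timedelta(days=min_gap_days):
        return False
    return True
```

For example, `may_retake(date(2019, 4, 1), date(2019, 4, 5))` is refused because only four days have passed, while the same request on `date(2019, 4, 8)` goes through.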

As this system is unusual, it feels less transparent. This is an unpleasant topic, but I think it needs to be mentioned. Look, as I said before, students are quite happy about this approach, and report 90% satisfaction with it, saying that they’d prefer it to a point-based system any time. However, when I asked them whether this system was easy to understand, only 50% of students said that it was easy; the other 50% said that they were “confused”, and that the point-based system felt easier, even if only because it is more familiar. If your goal is to teach well, it is not necessarily a problem. However, if your employment is dependent on student evaluations, you will almost certainly get lower scores on the “Grading criteria were clear” point of the rubric, which for some people may be a problem. Generally, for most places, it may be a good idea to try new grading systems only after tenure. Bard College is a bit more open to experiences, in this regard, and seems to value experimentation more, but I’m not tenured myself, so it is yet to be tested =)

If you make getting a “B” easy, but getting an “A” hard, it may lead to hard conversations. This is another unpleasant topic, but also worth mentioning. The problem here is that specification grading mostly helps students in the middle of the curve: those who don’t pass all assignments on the first attempt, and are grateful for extra chances. Those students who tend to be doing well, on the other hand, may feel weird about this new unfamiliar system. Which means that, for a change, most conflicts you may have about final grades won’t be about students asking to change a D to a C, but rather about students asking to change a “B plus” to an “A”. And these conversations are just generally less pleasant, as you may have to negotiate with students who are more entitled, more pushy, and better equipped to work their way through the academic hierarchy. Moreover, the problem is further complicated by the fact that, from the pedagogical point of view, it makes lots of sense to make the “top-level” assignments that separate “decent B-level work” from “excellent A-level work” a bit more creative and ambitious. It totally makes sense, if you want to teach better, to tailor your assignments to your students’ needs, and to encourage them to step outside of the basic framework of the course. But it also means that your stronger students will feel uniquely challenged by this assignment, compared to other assignments in the class. In a “normal” course, this rarely presents a problem: students never actually know how hard the final will be, and what curve you will use, but somehow it is a part of an accepted status quo. However, in a course that feels unusual, and is intentionally designed to be extremely transparent, this final assignment may suddenly feel “unfair”, just because it is surrounded by sparser scaffolding, compared to “basic assignments” that guarantee a “C” or a “B”. There are of course ways to help with this problem, but even so, be aware that “I deserve an A” situations are inherently more taxing for an educator’s psyche than “I deserve to pass” ones.

Summary, and further plans

Will I personally stick to the specification grading next year? Yes and no. Look, while in my original write-up I used “Specification grading” and “Mastery grading” interchangeably, actually “Mastery” is just a subset of “Specification”, and I expect that going a bit broader than “pure Mastery” can help with the workload a lot. For example, you might have read about that professor who refused to give fine-tuned letter grades, but instead gave everyone a default of a “B”, and required deeper extra work for an “A”: https://www.thecrimson.com/article/1975/5/14/a-quiet-act-of-impiety-prichard/

While the article presents it as a revolutionary rebellion, essentially they just used a 2-level specification system, in which some basic level of engagement resulted in one grade, and honest extra effort was needed for the other. And I suspect that in life sciences, it may be better to shift from math-style “mastery grading” towards this more relaxed approach. This would allow us to retain the best part of the “specification” model: that instead of trying to give a holistic assessment of every student’s “worth”, we would put the responsibility on students themselves, allowing them to set their personal goals, and choose the amount of work they want to do for the course. But I am less sure that I need to keep the “train them until they reach a bar” mindset. It works well in some courses (maybe in core courses, like chemistry or genetics, it would actually make lots of sense), but in expository courses (both 100s and 300s) it may be too restrictive.

In fact, I have already sketched a new version of my syllabus, for next semester, but I hope to find time to write a separate post about it. So stay tuned!

Tuesday, April 9, 2019

Mastery (specification) grading, and why you should try it

Students come up during our Friday class, ask me for a completely optional voluntary quiz, I hand it to them, then they say “thank you!” An unbelievable occurrence that happens all the time with Mastery grading. -- (@katemath)

This semester I tried a new approach to grading, called “Specification grading” (also known as Mastery-Based grading), and honestly, I think it is the best thing I’ve tried in my teaching career, ever. It is so good, it pains me to think that I haven’t discovered it earlier, and honestly I don’t understand why all faculty in the world haven’t switched to it yet. It’s not even “revolutionary”, it is just how grading should be. How it was always meant to be.

So now, as I hopefully grabbed your attention with these empty superlatives, let me explain how specification grading works in practice. It may sound a bit confusing and underwhelming at first, so I’ll start with the definitions, then provide a rationale for them, and then will show how I adjusted my courses to this grading scheme, as an example.

There are three main principles:

1. Instead of assigning grade-points, and then calculating letter-grades based on these points, you define a set of criteria students need to meet to get an A. You also describe what share of these criteria students would need to meet to get a B, or a C. Some people call these criteria “bundles”, some call them “standards”, or “benchmarks”.
2. You give students several opportunities to meet each of these criteria. It can mean that they are allowed to take a test several times, until they pass. Or maybe they can rework and improve a piece of writing several times, until it is good enough. But that’s the key point: you don’t grade HOW and WHEN they arrived at a given level of mastery; you only care WHETHER they demonstrated this level of mastery before the course ends.
3. As your time is limited, and students can now try every assignment several times, you cannot afford to grade individual assignments on a point-based system: you either accept an assignment as “good enough” (if it passes your benchmark), or you say “try again”, and provide targeted feedback. Crucially, this feedback is completely uncoupled from the final grade.

This list may, at first, seem rather flat for the grandiose claim with which I started this letter. So, why does it work exactly? Here are the benefits, compared to a point-based system:

- Students can try to pass a benchmark several times. It means that they keep practicing the task, which is of course the best way to learn!
- It also means that they have less anxiety (and as you have surely heard by now, anxiety is the biggest issue modern students face, for systemic / societal reasons). Imagine how liberating it is for a student to know that, at least in principle, in your course there are no irreversible mistakes. Given time, they can fix everything. If they are sick on a day of an exam, if they have test anxiety, they know that there’s a safety net. They can try again. They still need to plan their work (as you won’t accept it after the course is over), and they still need to put the effort in, but this system is infinitely easier on the nerves.
- There may be more than one way to pass a benchmark, which is great for inclusion. Say, you can decide to use a timed test for the first attempt, a short essay on a matching topic for the 2nd attempt, and a choice between a longer take-home essay and a short verbal exam for all subsequent attempts. It’s up to you how you define the benchmarks! But this way if a student has test anxiety, or an attention deficit, even if undocumented (which is common for students from traditionally underrepresented groups), they can always use this extra flexibility to meet your criteria.
- This improvement in students’ well-being is good for you, as now you don’t have to meet with sad anxious people, or entitled grade-grubbers, justifying each point on each quiz. If they think that you misunderstood them, they are always welcome to try again, and make sure that this time they are easy to understand. (Of course, some extreme students may still try to extort grades from you, but even then, this type of grading is really not conducive to grade-grubbing)
- It is also good for you, as it makes your grading easier, and almost painless. You no longer have to obsess about “marginal cases”. As any assignment can be retaken, it is much easier, psychologically, to say “Try again! This little thing is missing, but I’m sure you can get it right next time!” In fact, you don’t even have to say “try again”: you may choose to go with “fix this one thing, and it will be a pass”. You no longer have to be obsessively specific, documenting and justifying every little point that you give or take away. The only thing that matters is that you give some good, pointed, clear feedback about those few things, or maybe even that one thing, that matter.
- For writing assignments that go through several revisions, the process of meeting the criteria becomes more of an interaction; similar to how we work with reviewers, when submitting a research paper to a journal. You tell students what they need to improve, if they want you to accept their work. You can also require that they make it easy for you to see and assess these changes (you can ask them to bring the previous copy with them, or “track changes”, or submit a list of responses - whatever works for you). And then you only recheck this one part that they improved; you don’t have to re-read the entire thing.
- As a side-effect of this “benchmark” approach, you may actually make some bundles optional, and required for an “A” only (but say, not for a “B”). Now it’s up to the student whether they want to take the challenge. This moves the initiative in learning to the student; makes them “buy” into the assignment, and commit to it. It improves their learning, as now they own it!
- And incidentally, it makes life a bit easier for you (which is important, as some of the points I described above could make your life a bit harder). If a student doesn’t want to attempt an assignment, as they are fine with a “B”, they just don’t. It means that you have less work to grade. And it is critical, as you have more to grade for those students who want to give an assignment another try!

How to convert an existing course to a specification grading system? Here’s my process:

1. Identify key criteria your students need to meet. Some of them may come from skills you teach (writing, presentation); some may represent different conceptual topics of your course (cell division, species diversity). Enumerate them.
2. For each criterion, define how you can tell whether a student “got it”. Try to find a test that may be repeated several times. Say, when teaching math, a professor may create 5 variants of an assignment, with 4 problems in each of them, and define “success” as solving 3 problems out of 4 (in which case “retake attempts” would not be infinite, as one could only try each test 5 times, but it is still a lot!). Or you can make students work on a piece of writing, with a rubric, until they meet 9/10 of the criteria of the rubric. It’s up to you how you define it; the only two things that matter are that students understand what they need to do, and that they can have several tries at it.
3. Once you have defined an “A”, define what a “B” and a “C” would look like. You can define pluses and minuses as well, if you want to, or you can just state on your syllabus that pluses and minuses are reserved for intermediate cases.
4. Plan your assignments in a way that gives students time to attempt them. Say, you can no longer have a final exam on the completion week. But you can use the completion week to give students an extra opportunity to get passes on some of the prior topics.
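To make the math example from the process above concrete, here is a minimal sketch in Python of a “pass / try again” check for one benchmark, assuming 5 prepared variants and a pass mark of 3 solved problems out of 4. The function name `quiz_status` and the status strings are hypothetical, invented for this illustration only.

```python
from typing import List

VARIANTS = 5   # prepared versions of the test, so at most 5 attempts
REQUIRED = 3   # problems a student must solve (out of 4) to pass


def quiz_status(solved_per_attempt: List[int]) -> str:
    """Summarize one benchmark from the number of problems solved on each attempt."""
    if len(solved_per_attempt) > VARIANTS:
        raise ValueError("more attempts recorded than variants available")
    if any(solved >= REQUIRED for solved in solved_per_attempt):
        # A single successful attempt is enough; earlier failures do not count.
        return "pass"
    if len(solved_per_attempt) < VARIANTS:
        return "try again"
    return "no attempts left"


if __name__ == "__main__":
    print(quiz_status([1, 2, 3]))  # third attempt reaches the bar
```

Note that the function never looks at how many attempts it took, or in what order: only whether the bar was reached at least once.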

Now, some examples. I am teaching my “Introduction to Neurobiology” on a specification-based grading system this semester, and it works marvelously. Here’s an excerpt from the syllabus:

Course material consists of four "Bundles":

Core bundle (weekly reading reflections and practical homework).

Knowledge bundle (tested via quizzes).

Lab bundle (lab work and lab reports)

Depth bundle (paper reviews)

Depending on what end-course grade you want to get, you need to complete different bundles to different levels. Your grade is in your hands!

End-course grade	Core	Knowledge	Lab	Depth
A	not more than 1 reflection missed	All quizzes completed	Good lab work, all lab reports	4 paper reviews
B	up to 2 reflections missed	90% of quizzes completed	Good lab work, 80% of lab reports
C	up to 4 reflections missed	70% of quizzes completed	50% of lab reports
D	up to 5 reflections missed	50% of quizzes completed

All individual assignments are graded on a pass/fail (or rather, “pass / try again”) basis. There are no points, and no partial credits; you just need to pass. Simple!

The core bundle is easy, but it is the only one that is unforgiving: you need to submit your short responses (typically, 2-3 sentences) on time, before 9 am the day the assignment is due (usually Monday). Make sure to set aside time for it, and put it on your schedule. That is the only one that cannot be redone.

For all other bundles, you can have several attempts to pass them. If you miss or fail a quiz, you can try again during allotted times in class (there will be one opportunity before the spring break, and one more during the completion week). Better yet, you can come to drop-in hours, or schedule a visit (send me an email), and pass this topic as a short verbal exam (conversation). You can have as many tries as you need; the only limitation is that you can only give it a try once a week, so if you tried but haven't passed, you need to have a week-long cool-down.

For lab work, you need to be present, and meaningfully engaged (e.g. not texting in a corner, but organizing your team, building things, taking notes, collaborating, logging and analyzing data etc.)

For lab reports and paper reviews, it is expected that you may have to go through several iterations. I will provide feedback, and request revisions, until your work passes.

The description of the paper review assignment will be shared separately.

No work will be accepted after the last day of Completion week.

Pluses and minuses will be used for intermediate cases, at the discretion of the instructor. The easiest way to get a plus is to go above the minimal set of requirements for a grade.
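The grade table in this excerpt is essentially a lookup rule: find the highest grade for which every bundle requirement is met. Below is a minimal sketch of that rule in Python. It rests on several assumptions that are not in the syllabus: empty table cells are read as “no requirement”, anything below the “D” row is labeled “F”, and pluses and minuses (which are left to the instructor’s discretion) are ignored. The names `Progress`, `REQUIREMENTS` and `base_grade` are made up for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Progress:
    reflections_missed: int   # Core bundle
    quiz_share: float         # Knowledge bundle: fraction of quizzes passed, 0.0 to 1.0
    good_lab_work: bool       # Lab bundle: present and meaningfully engaged
    lab_report_share: float   # Lab bundle: fraction of lab reports passed
    paper_reviews: int        # Depth bundle


# Rows copied from the syllabus table, highest grade first.
# (grade, max reflections missed, min quiz share,
#  good lab work required, min lab report share, min paper reviews)
# Assumption: an empty cell in the table means "no requirement".
REQUIREMENTS = [
    ("A", 1, 1.0, True, 1.0, 4),
    ("B", 2, 0.9, True, 0.8, 0),
    ("C", 4, 0.7, False, 0.5, 0),
    ("D", 5, 0.5, False, 0.0, 0),
]


def base_grade(p: Progress) -> str:
    """Return the highest letter grade whose requirements are all met."""
    for grade, max_missed, min_quiz, need_lab, min_reports, min_reviews in REQUIREMENTS:
        if (p.reflections_missed <= max_missed
                and p.quiz_share >= min_quiz
                and (p.good_lab_work or not need_lab)
                and p.lab_report_share >= min_reports
                and p.paper_reviews >= min_reviews):
            return grade
    return "F"  # Assumption: the syllabus does not name a grade below "D".


if __name__ == "__main__":
    # Everything done except one of the four paper reviews.
    print(base_grade(Progress(0, 1.0, True, 1.0, 3)))
```

Because each row is checked as a whole, a student who skips the Depth bundle entirely can still earn a “B”, which is exactly the “your grade is in your hands” idea.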

In practice, for the “Knowledge” bundle, which corresponds to different conceptual topics we study (such as action potential, synaptic transmission, retinal circuitry etc.), I give students several attempts to pass them, and all of these attempts are slightly different. First they have a 10-min “pop quiz” in class. If they aren’t successful with it right away, they can either come to my office hours and have a verbal micro-exam, or they can wait till the special class time (15 min set aside during the lab time) when I make them provide a long answer in class. The verbal exam idea worked particularly well for me: for the first time ever, I actually have students come to my office hours, and these office hours are uniquely productive! I ask them to explain a topic to me. If they are faltering, I help them. If they manage to explain the topic with only 1-2 prods, they earn a pass. If not, (and that’s the surprisingly great part!) our discussion just naturally evolves into a productive tutoring session. We transition right from their answer into a discussion about what they miss, or misunderstand, or need to express better, and then I tell them to come again in a week, and give it another try.

Just imagine how pleasant it is to meet with a student who “failed” a topic two months ago, but who can now explain this topic to you reasonably well, even though you haven’t recently referred to it in class! It means that they went back to the material, studied it, and figured it out. They are happy, you are happy, it’s a win-win!

Does this approach necessarily make the course too easy? No, because now you can reasonably expect a somewhat better level of understanding! My mid-term results were 30% “A”, 40% “B”, 10% “C”, and 20% “D”, which is pretty similar to my grade distributions in prior years, when I used point-based grading. But who got As and Bs this time was slightly different: people who would typically be B students, but who persevere, got an A, while some smart but scattered students got a B, or even a “C”. At this point, all students can still get an A if they want to, but they need to put in some effort if they want to earn it. (And in fact I had a bout of activity immediately after the spring break, as students received mid-term grades, didn’t like them, and closed some of the gaps, so current running grades are actually a bit higher than the mid-term grades).

Does this approach increase the workload on the faculty? I want to tentatively say “no”; at least, not if you assume that office hours were already spent meeting with students, before you switched to this method, and also assuming that you don’t overcommit. It was an increase for me personally, as I almost never had students attend my office hours before, and now they are packed, but it is still contained within about 2-3 hours a week, so it seems doable. It also makes planning office hours more important, as this approach essentially makes office hours attendance semi-required for a student to succeed. Which means that if a student has a time conflict, you may have to meet with them outside of your normal office hours, which may be hard sometimes. But that said, grading got so much easier, and (crucially) it is no longer painful, or emotionally draining, that it is still a net win. You know that you don’t close the door of opportunity, which gives you a moral permission to be a bit more demanding when working with “gray zone” answers, which is a huge relief, and makes the process almost pleasant.

The only word of caution here: there may be students who, despite all your statements and encouragement, both in class and in the syllabus, would be hesitant to come to office hours, especially if your standard drop-in hours don’t work with their schedule. They would be ashamed to be a burden, so they’ll just sit there, waiting for in-class opportunities to retake the quiz, but they won’t be proactive about it. It may be a personality issue, or an issue of culture, or both (think socioeconomic status, gender, introversion), but either way, if one needs to come to your office to retake the quiz, make sure that you demand from each student that they show up to office hours at least once. And then talk to them. Make sure they believe you when you say that retaking a quiz is part of the deal, and not some extra favor they have to ask for.

Finally, as another example, here’s what I plan to do with my Statistics class next fall. For topical goals, I’ll still use problem quizzes, but will prepare several versions of them. While verbal exams are great, because stats are closer to math, I think problems could work better. For skills (data wrangling, R coding, visualization, figure captions), I’ll introduce several in-class data workshops that would each count towards “closing” a topic. This would be a change to my current process: in the past, I offered all workshops as take-home assignments; graded some of them, and provided formative feedback on the rest. All workshops were submitted openly but anonymously (every student could see every other student’s submission), and the final exam was optional (if you liked your running grade, you could skip it). Now I will have to split all workshops into two types: those that are take-home, and those submitted by the end of class. (I still plan for both types to be submitted openly and anonymously). Take-home assignments will count towards “participation” (every student needs to submit something vaguely reasonable), while in-class assignments will serve as mid-terms. Except that these mid-terms will be “anticumulative”: if you proved that you know topic A, you don’t have to polish this part of an assignment anymore, you can concentrate on the topic B part of it. (I still need to put some thought into making it transparent to students, but it seems doable).
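The “anticumulative” bookkeeping can be stated in a few lines. This is only a sketch of the idea in Python, under the assumption that each in-class workshop is tagged with the topics it covers and that each student has a set of topics already “closed”; the function name `remaining_topics` and the topic labels are hypothetical.

```python
from typing import List, Set


def remaining_topics(workshop_topics: List[str], closed_topics: Set[str]) -> List[str]:
    """List the parts of a workshop a student still needs to work on.

    Topics the student has already closed are skipped, so the assignment
    shrinks over the semester instead of growing.
    """
    return [topic for topic in workshop_topics if topic not in closed_topics]


if __name__ == "__main__":
    # A student who already proved topic A only needs the topic B part.
    print(remaining_topics(["A", "B"], {"A"}))
```

Sharing this per-student list before each workshop might be one way to keep the scheme transparent.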

To sum up, I think specification grading (aka Mastery grading) is obviously better than the point-based grading: both for the students, and for the faculty. It seems to improve learning; it is more fair and inclusive, and it is unexpectedly pleasant to use! So I definitely encourage you to give it a try!

References

Kate Owens. A Beginner’s Guide to Standards Based Grading (A practical blog post) https://blogs.ams.org/matheducation/2015/11/20/a-beginners-guide-to-standards-based-grading/

Sadler, D. R. (2005). Interpretations of criteria‐based assessment and grading in higher education. Assessment & evaluation in higher education, 30(2), 175-194. https://uncw.edu/cas/documents/sadler2005.pdf

Carberry, A., Siniawski, M., Atwood, S. A., & Diefes-Dux, H. A. (2016, June). Best practices for using standards-based grading in engineering courses. In Proceedings of the 123rd ASEE Annual Conference and Exposition, New Orleans, LA.
https://www.asee.org/file_server/papers/attachment/file/0006/9169/SBG_ASEE_final_submitted.pdf

A giant open repository of materials on Mastery-based grading (curated by Dr. Rachel Weir, Allegheny College):
https://drive.google.com/drive/folders/1GNSqfOb0LZS6BeAuc1tqPDZWKkPk11KT

Nilson, L. (2015). Specifications grading: Restoring rigor, motivating students, and saving faculty time. Stylus Publishing, LLC. (Book)