Yes, the grade reliability problem can be solved

This guest blog from Dennis Sherwood marks the third in a series about grade reliability.

“Have you read that blog about one grade in every four being wrong?”

“Yes. I have. I had no idea that the results of school exams were so unreliable. And so variable by subject too – 4% wrong for Maths, 44% wrong for History. There’s another blog too, showing the evidence that scripts marked at grade boundaries have only about a 50% chance of being awarded the right grade, even for Maths.”

“Really? I haven’t seen that one. I’ll look it up. I’ve always known grades were a bit wobbly, but I’d thought that was because of bad marking. So I was surprised that the cause is not sloppy markers, but the fact that you and I might not give exactly the same mark to the same essay.”

“Yes. If I give 59 and you give 60, just on the other side of a grade boundary, the candidate gets a different grade. That can’t be fair.”

“No. It isn’t. But it happens, and seems to happen a lot more often than I’d thought. That concept of ‘fuzzy marks’ explains a lot. So obvious!”

“Yes. But only with hindsight – I wish I’d thought of it! But if the problem is fuzziness, then the solution is to eliminate it. That’s easy. Instead of writing essays, the students could do unambiguous – but still tough – multiple choice questions, like they do in America. There’s only one right answer, and a computer can do the marking, which makes everything faster, and cheaper too. Problem solved!”

“Maybe. But I don’t think that does much for teaching and learning.”

“Probably not – there are always trade-offs. But if it’s a choice between essays and multiple choice, and if the outcome of multiple choice is fair and reliable grades, then perhaps people might prefer multiple choice.”

“Oh no! Not another referendum!”

“I didn’t mean that! But the policy on how the results of exams are recorded on candidates’ certificates is important, and the current policy, with one grade in four being wrong, just can’t be right.”

“I agree. It can’t be. But something you just said has got me thinking. You said ‘…if it’s a choice between essays and multiple choice…’. That implies that there are only two choices. That can’t be true. There must be other possibilities.”

“Other possibilities?”

“Yes. What about this? We know marks are fuzzy – if I mark a script 59, you might give 60, someone else 58, whatever. Suppose we can discover that the lowest mark an examiner might give is 57, and the highest 61. That must be possible for a statistician to do. Today, the grade is determined by the mark I give, 59. Why not base the grade on the highest mark an examiner might give, 61?”

“Highest mark? That’s crazy!”

“Is it? Perhaps not so crazy if the candidate were to appeal, and be re-marked 61, resulting in an up-grade anyway. And not so crazy when you think of all those candidates who don’t appeal, and are stuck with a grade lower than they deserve. Why not give candidates the ‘benefit of the doubt’ in the first place?”

“Mmm … I see what you mean … But doesn’t that drive grade inflation?”

“That’s a good point, but I don’t think so. Grade inflation happens year-on-year; this is a once-only policy change, a re-calibration, a change in the base-line – just as happened when GCSE grades were changed from A*, A, B… to 9, 8, 7…”

“Yes, I see what you mean. But you trigger another idea too. What about basing grades on the lowest possible mark, 57? That would ensure that no one would be awarded a grade as a result of being lucky – everyone would certainly make the grade.”

“That’s a really good idea, especially for exams in brain surgery – I’d like to be assured that my brain surgeon was fully qualified, and hadn’t been lucky enough to have been given the ‘benefit of the doubt’!”

“Yes – and the same applies to exams in gas fitting, and electrical maintenance too. So let’s hope they do this for the new T-Levels.”

“And the driving test. But for GCSE Geography, why not give candidates the ‘benefit of the doubt’?”

“Why not? And this idea makes the appeals process fairer too. If a candidate is given a mark of, say, 59, but the grade is awarded on, say, the ‘high’ mark of 61, then all marks between 57 and 61 have been taken into account. So if the candidate appeals, and if the re-mark is, say, 60, this possibility has already been recognised, and so the original grade is confirmed. That makes sense – and that’s why this results in much more reliable grades.”

“Yes. It looks like solving the grade unreliability problem isn’t just an either/or decision between what we do now and multiple choice. Basing grades on the ‘high end’ or the ‘low end’ are two further possible solutions.”

“Yes, and there must be others, too. I wonder what they are?”

“Well, here’s one. Why have grades at all? How about having the certificate show the given mark, 59, and also a measure of the range – in this case, from 57 to 61?”

“What an intriguing idea!”
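
For readers who like to see ideas written down precisely, the options raised in this conversation can be sketched in a few lines of Python. This is only an illustration, not a proposal for how an exam board would implement anything. The grade boundary at 60 and the range of 57 to 61 around a mark of 59 come from the conversation; the grade letters, the other boundaries, the fixed fuzziness of 2 marks and all the function names are assumptions made up for the example.

```python
from bisect import bisect_right

# Hypothetical grade boundaries: the lowest mark that earns each grade.
# Only the boundary at 60 comes from the conversation; the rest are invented.
BOUNDARIES = [(40, "D"), (50, "C"), (60, "B"), (70, "A")]

# Assumed fuzziness: a given mark of 59 could have been anything from 57 to 61.
FUZZINESS = 2


def grade_for(mark):
    """Return the grade for a single mark ("U" below the lowest boundary)."""
    cutoffs = [cutoff for cutoff, _ in BOUNDARIES]
    index = bisect_right(cutoffs, mark)
    return BOUNDARIES[index - 1][1] if index else "U"


def award(mark, policy="given"):
    """Award a grade on the given mark, the 'high' end or the 'low' end."""
    offset = {"given": 0, "high": FUZZINESS, "low": -FUZZINESS}[policy]
    return grade_for(mark + offset)


def remark_already_recognised(original_mark, remark):
    """True if a re-mark falls inside the range already taken into account."""
    return abs(remark - original_mark) <= FUZZINESS


def certificate_without_grade(mark):
    """Show the given mark and its range instead of a grade."""
    return f"{mark} (range {mark - FUZZINESS} to {mark + FUZZINESS})"


if __name__ == "__main__":
    for policy in ("given", "high", "low"):
        print(policy, award(59, policy))
    print(remark_already_recognised(59, 60))
    print(certificate_without_grade(59))
```

With these made-up boundaries, a given mark of 59 earns the lower grade today and under the ‘low end’ policy, and the higher grade under the ‘benefit of the doubt’ policy. A re-mark of 60 falls inside the range already recognised, so the original grade would stand.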

The current policy of basing grades on a single mark is badly flawed. There must be several other possibilities, possibilities that result in assessments that are reliable and fair, and that do not damage teaching and learning. There needs to be a thorough, and independent, project to identify what those possibilities are. And given that no solution can ever be perfect, once they have all been identified, each (including the option of maintaining the status quo) must then be evaluated wisely so that the very best solution can be chosen and then implemented. Our young people deserve nothing less.
