This blog was kindly contributed by Dennis Sherwood, one of the UK’s leading experts in organisational creativity, innovation, systems thinking and system dynamics.
“There’s something missing!”, he said, setting down his briar.
His companion peered over his newspaper. “How interesting! Another non-barking-dog case?”
“Alas, no. But something that should be there isn’t. I wonder why…”
“As ever, your thought processes elude me. Please be so kind as to explain.”
“I’ve been looking at Ofqual’s statistics on reviews of marking for this summer’s school exams in England. There are 16 spreadsheets and over 5,000 numbers, but one particular statistic isn’t there. That said, there are some items of interest. Did you know that 1.1% of this year’s GCSE grades were changed after being challenged?”
“As a matter of fact, I didn’t. That’s jolly good, though, isn’t it? The other 99% must therefore be right, so that shows just how excellent our exam system is!”
“A howler, my good man! A howler of the first magnitude!”
“Why so? I saw such a claim on an exam board’s website some time ago, and I remember hearing the Minister for Schools himself imply this in an interview on the Today programme the morning this year’s GCSE results were announced.”
“But did you also hear the rebuttal on the following day’s More or Less?”
“No, I don’t think I did. But I still do not follow your reasoning.”
“This one truly is elementary, my good fellow. The key point is that a grade can be changed only if it has been challenged. I’ve studied Ofqual’s statistics in detail. This year, 5,199,335 GCSE grades were awarded, and 56,680 were changed.”
“I know my arithmetic’s a bit rusty, but even I can spot that 50,000 is 1% of 5 million. So that tallies with 1.1% of grades being changed.”
“Yes. But only 279,925 grades were challenged.”
“Ah! Good point. Yes… that implies that about 1 challenge in every 5 resulted in a grade change. So those particular grades, as originally awarded, must have been wrong. That presents a very different picture, casting considerable doubt on the reliability of grades in general.”
“Indeed. But the most important point is that 4,919,410 grades – that’s about 95% of the total number of grades awarded – were not challenged. So no one knows whether those grades are right or wrong. No one has looked.”
“But surely any candidate who is concerned would raise a challenge, so the likelihood that there are as-yet-undetected grade errors within those 4,919,410 unchallenged grades must be very small.”
“Must it? I wonder. There are many barriers to raising a challenge, not least the fee. It could be that every one of those 4,919,410 unchallenged grades is wrong.”
“Most unlikely, I grant you. But equally unlikely is the contrary hypothesis that all the 4,919,410 unchallenged grades are right.”
“I see… Yes… it’s most unlikely that all those 4,919,410 unchallenged grades are wrong, and also most unlikely they are all right. So how many would have been changed, had they been challenged? How reliable are the grades awarded each August?”
“Precisely! No one knows! That’s the missing statistic!”
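For readers who want to check the arithmetic in the exchange above, here is a minimal sketch in Python. The three input figures are those quoted in the dialogue; the variable names and the `pct` helper are illustrative choices of mine, not anything published by Ofqual.

```python
# Figures quoted in the dialogue for this summer's GCSEs in England.
awarded = 5_199_335      # grades awarded
challenged = 279_925     # grades for which a review was requested
changed = 56_680         # grades changed after review

# Grades that nobody asked to have reviewed.
unchallenged = awarded - challenged


def pct(part, whole):
    """Return part as a percentage of whole (hypothetical helper)."""
    return 100 * part / whole


# Changed grades as a share of ALL grades awarded (the headline figure).
changed_of_awarded = round(pct(changed, awarded), 1)

# Changed grades as a share of the grades actually challenged.
changed_of_challenged = round(pct(changed, challenged), 1)

# Share of all grades that were never looked at again.
unchallenged_of_awarded = round(pct(unchallenged, awarded), 1)

print(f"Changed, as % of all grades awarded: {changed_of_awarded}")
print(f"Changed, as % of challenged grades:  {changed_of_challenged}")
print(f"Unchallenged grades:                 {unchallenged:,}")
print(f"Unchallenged, as % of all grades:    {unchallenged_of_awarded}")
```

The same 56,680 changed grades give about 1.1% when divided by all grades awarded, but about 20% when divided by the grades that were actually challenged. Neither figure says anything about the 4,919,410 grades that were never reviewed.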
Each year Ofqual publishes over 5,000 statistics on challenges and reviews of GCSE, AS and A level grades. But the most important – those that measure the reliability of the awarded grades, analysed by subject, by level and by exam board – are nowhere to be seen.
In the past, it might have been argued that these statistics were not available. But in November 2018, Ofqual published the reliabilities of the grades for 14 subjects. As discussed in a previous HEPI blog, for maths, the reliability (averaged across all levels and all marks) is about 96%; for biology, about 85%; geography, about 65%; and history, about 56%. Ofqual knows how to determine these measures. So why are they not routinely published?
The question is all the more pressing in the light of Ofqual’s two-paragraph announcement, posted on its website on 11th August 2019 in response to a front-page article in that day’s Sunday Times. Within the first paragraph we read “…more than one grade could well be a legitimate reflection of a student’s performance…”.
“More than one grade”. But only one grade appears on the certificate. What other grades might also be “a legitimate reflection of a student’s performance”? Why aren’t they declared? How many are there? Are some of them higher? In which case, how much higher? How reliable is that single grade that appears on my certificate? Why are grades unreliable? For how long has this been happening? And – most importantly – what is being done to ensure that the assessments awarded to students are both reliable and trustworthy?