The UK's only independent think tank devoted to higher education.

The mystery of the missing statistic

  • 27 December 2019
  • By Dennis Sherwood

This blog was kindly contributed by Dennis Sherwood, one of the UK’s leading experts in organisational creativity, innovation, systems thinking and system dynamics.


“There’s something missing!”, he said, setting down his briar.

His companion peered over his newspaper. “How interesting! Another non-barking-dog case?”

“Alas, no. But something that should be there isn’t. I wonder why…”

“As ever, your thought processes elude me. Please be so kind as to explain.”

“I’ve been looking at Ofqual’s statistics on reviews of marking for this summer’s school exams in England. There are 16 spreadsheets and over 5,000 numbers, but one particular statistic isn’t there. That said, there are some items of interest. Did you know that 1.1% of this year’s GCSE grades were changed after being challenged?”

“As a matter of fact, I didn’t. That’s jolly good, though, isn’t it? The other 99% must therefore be right, so that shows just how excellent our exam system is!”

“A howler, my good man! A howler of the first magnitude!”

“Why so? I saw such a claim on an exam board’s website some time ago, and I remember hearing the Minister for Schools himself imply this in an interview on the Today programme the morning this year’s GCSE results were announced.”

“But did you also hear the rebuttal on the following day’s More or Less?”

“No, I don’t think I did. But I still do not follow your reasoning.”

“This one truly is elementary, my good fellow. The key point is that a grade can be changed only if it has been challenged. I’ve studied Ofqual’s statistics in detail. This year, 5,199,335 GCSE grades were awarded, and 56,680 were changed.”

“I know my arithmetic’s a bit rusty, but even I can spot that 50,000 is 1% of 5 million. So that tallies with 1.1% of grades being changed.”

“Yes. But only 279,925 grades were challenged.”

“Ah! Good point. Yes… that implies that about 1 challenge in every 5 resulted in a grade change. So those particular grades, as originally awarded, must have been wrong. That presents a very different picture, casting considerable doubt on the reliability of grades in general.”

“Indeed. But the most important point is that 4,919,410 grades – that’s about 95% of the total number of grades awarded – were not challenged. So no one knows whether those grades are right or wrong. No one has looked.”

“But surely any candidate who is concerned would raise a challenge, so the likelihood that there are as-yet-undetected grade errors within those 4,919,410 unchallenged grades must be very small.”

“Must it? I wonder. There are many barriers to raising a challenge, not least the fee. It could be that every one of those 4,919,410 unchallenged grades is wrong.”

“That’s preposterous!”

“I agree. But so is the contrary hypothesis that all the 4,919,410 unchallenged grades are right.”

“I see… Yes… it’s most unlikely that all those 4,919,410 unchallenged grades are wrong, and also most unlikely they are all right. So how many would have been changed, had they been challenged? How reliable are the grades awarded each August?”

“Precisely! No one knows! That’s the missing statistic!”
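The arithmetic running through the dialogue above can be checked with a short calculation. The three input figures are taken directly from Ofqual's 2019 statistics as quoted in the text; everything else is derived from them:

```python
# GCSE grade figures for summer 2019 in England (Ofqual statistics, as quoted above)
awarded = 5_199_335    # total GCSE grades awarded
challenged = 279_925   # grades challenged via a review of marking
changed = 56_680       # grades changed following a challenge

unchallenged = awarded - challenged  # grades no one has ever re-examined

print(f"Changed, as % of all grades awarded: {changed / awarded:.1%}")      # ~1.1%
print(f"Changed, as % of grades challenged:  {changed / challenged:.1%}")   # ~20%, i.e. 1 in 5
print(f"Unchallenged, as % of all grades:    {unchallenged / awarded:.1%}") # ~95%
```

The contrast between the first two lines is the whole point of the dialogue: 1.1% sounds reassuring only if the denominator is quietly switched from "grades challenged" to "grades awarded".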


Each year Ofqual publishes over 5,000 statistics on challenges and reviews of GCSE, AS and A level grades. But the most important – those that measure the reliability of the awarded grades, analysed by subject, by level and by exam board – are nowhere to be seen.

In the past, it might have been argued that these statistics were not available. But in November 2018, Ofqual published the reliabilities of the grades for 14 subjects. As discussed in a previous HEPI blog, for maths, the reliability (averaged across all levels and all marks) is about 96%; for biology, about 85%; geography, about 65%; and history, about 56%. Ofqual knows how to determine these measures. So why are they not routinely published?
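The per-subject reliability figures above translate directly into an expected rate of grade changes under a fair re-mark. A minimal sketch, using the averaged figures from Ofqual's November 2018 study as cited in the text (the per-100-script framing is purely illustrative):

```python
# Average grade reliability by subject (Ofqual, November 2018, as cited above):
# the likelihood that the originally awarded grade survives a fair re-mark.
reliability = {"maths": 0.96, "biology": 0.85, "geography": 0.65, "history": 0.56}

# (1 - reliability) is therefore the expected share of grades that would change.
for subject, r in reliability.items():
    print(f"{subject:9s}: ~{(1 - r) * 100:.0f} grade changes expected per 100 re-marked scripts")
```

On these figures, nearly half of all history grades would be expected to change on a fair re-mark, which is why the absence of routinely published reliability statistics matters.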

The question is especially pertinent in the light of Ofqual’s two-paragraph announcement, posted to its website on 11 August 2019 in response to a front-page article in that day’s Sunday Times. Within the first paragraph we read “…more than one grade could well be a legitimate reflection of a student’s performance…”.

“More than one grade”. But only one grade appears on the certificate. What other grades might also be “a legitimate reflection of a student’s performance”? Why aren’t they declared? How many are there? Are some of them higher? In which case, how much higher? How reliable is that single grade that appears on my certificate? Why are grades unreliable? For how long has this been happening? And – most importantly – what is being done to ensure that the assessments awarded to students are both reliable and trustworthy?


  1. Helen Poisson says:

    Good article, Dennis! Thank you for sending it to me. My eldest grandson is 15 now and expected to do well next year. We shall not hesitate….

  2. Glen Thomas says:

    Improving the precision of exams will naturally mean longer exams — it’s in the nature of sampling a student’s abilities that the more you measure, the better the average measurement. Do we really want to move to twice as many papers per subject, just to satisfy some arbitrary target of precision? It is naïve to expect that a single grade can capture a child’s entire appreciation and skill in a subject, so how precise does the rough measure need to be?

    (The research referred to also doesn’t mention marking ‘errors’, only differences from an assumed ‘true’ grade which may or may not exist, especially for the more subjectively marked subjects.)

  3. Hi Glen, thank you. You mention ‘precision’. I agree with you that to achieve this in exams is very difficult, for it requires convergence on the ‘right’ mark, a most slippery concept. A more achievable objective is ‘reliability’ – this being the likelihood that the originally-awarded grade is confirmed after a fair re-mark.

    Currently, GCSE, AS and A level grades are certainly not precise, and have reliabilities (depending on the subject and the given mark) which can be as low as around 30%.

    As discussed in another blog, there are many ways in which the reliability can be increased to a number approaching 100%, for all subjects and all marks. And although the grade is still not ‘precise’, the fact that it is reliable is important: it means that the grade can be trusted, and that there is no nagging question “I wonder what might have happened if another examiner had marked my script?”.
