This blog on last year’s results fiasco has been kindly written for HEPI by Dennis Sherwood, whose past articles for HEPI can all be found here.
History, they say, is written by the winners. But occasionally, a loser gets a look in too, usually along the lines of ‘But it wasn’t my fault!’.
Cue the report Is the algorithm working for us? Algorithms, qualifications and fairness, published on 14 June 2021, and written by Roger Taylor, who was Chair of Ofqual, the regulator of school exams in England, before, during, and for a shortish while after, the summer 2020 exam grade fiasco.
This report, without doubt, deserves a ‘teacher assessed grade’ of A* for this summer’s A level in Reputation Management, as awarded in accordance with the grade descriptors defined by the Joint Council for Qualifications for A level History (Ancient) – in particular the requirements to ‘demonstrate relevant and accurate knowledge and understanding’ and to ‘reach reasoned … evidence-based conclusions about historical events’.
So let’s examine some evidence relevant to the report’s Appendix, entitled ‘Explaining the Ofqual decision-making process’. This process, without doubt, is important, and Mr Taylor writes that:
Ofqual put forward two possible ways forward that were consistent with its primary objective: hold exams in a socially-distanced environment or, alternatively, use ‘non-qualification’ leaving certificates to issue grades, while making clear they were not equivalent to A-level grades.
The view of the government was that neither approach recommended by Ofqual would command public confidence.
My understanding of these statements is that the Government rejected the two proposals recommended by Ofqual, and that the fatal decision to use the ‘mutant algorithm’ was the Government’s alone. This paints Ofqual, and by implication its Chair, in a benevolent light.
Is this an ‘accurate understanding’ of ‘historical events’?
To answer that, we need to find an original source. Let me suggest looking, in the first instance, at the transcript of the Parliamentary Education Select Committee hearing of 2 September 2020, at which Roger Taylor was a key witness.
In response to Question 984, one of Mr Taylor’s colleagues stated that Ofqual had offered the Department for Education its advice on the options as to how summer 2020’s grades might be awarded. This led the Committee to request sight of what those options were, resulting in the disclosure on 9 September 2020 of an ‘official sensitive’ Ofqual document, dated 16 March (that’s two days before Boris Johnson’s announcement that the summer 2020 school exams would not take place), entitled ‘Contingency planning for Covid-19 – options and risks’.
Study of this primary source shows that Ofqual’s preferred option, Option A, was ‘to continue with business as usual, with the exam timetable operating as published but with additional papers prepared as a contingency in a small number of subjects’, so validating the first part of the statement just quoted.
The second part, however, is more troublesome, for the paper discusses a total of 11 options, of which three were short-listed, and the remaining eight ‘not presented in the main paper because they were less likely to meet our objectives’. Or, in simpler terms, dropped. And at the very end of the reject list is Option K, ‘Issue a standardised leaving certificate’, two of the ‘arguments against’ being:
Schools are also likely to expect a refund of exam fees.
This would call into question the future of GCSEs.
Most prescient. And yes, the option of a leaving certificate had been considered by Ofqual. But discarded. So it is ‘interesting’ that this thrown-in-the-bin possibility is highlighted in Mr Taylor’s report as being ‘recommended by Ofqual’, but rejected by Government.
There was another option that Mr Taylor’s report does not mention at all:
Option C: Issue grades based on teacher estimates which have been statistically moderated at a centre / cohort level to bring them into line with previous years’ results.
That seems to me to be quite a good, if brief, description of what actually happened, and this option was on Ofqual’s short-list of three. Yet Mr Taylor implies that the ‘mutant algorithm’ was an ex cathedra imposition from on high.
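To make concrete what 'statistically moderated at a centre / cohort level' could mean in practice, here is a minimal, hypothetical sketch. The grade scale, the centre's data and the quota rule are all illustrative assumptions on my part, not Ofqual's actual model:

```python
# Illustrative sketch of Option C-style moderation: rank candidates by
# teacher estimate, then re-award grades so the centre's distribution
# matches its historical share of each grade. All figures are invented.

GRADES = ["A*", "A", "B", "C", "D", "E", "U"]  # highest to lowest

def moderate_centre(teacher_grades, historical_share):
    """Re-award grades to match the centre's historical distribution."""
    n = len(teacher_grades)
    # Candidate indices from strongest to weakest teacher estimate.
    order = sorted(range(n), key=lambda i: GRADES.index(teacher_grades[i]))
    moderated = [None] * n
    cursor = 0
    for grade in GRADES:
        quota = round(historical_share.get(grade, 0) * n)
        for i in order[cursor:cursor + quota]:
            moderated[i] = grade
        cursor += quota
    # Any candidates left over after rounding get the lowest grade.
    for i in order[cursor:]:
        moderated[i] = GRADES[-1]
    return moderated

# A centre whose teachers estimated mostly As, but whose history
# supports fewer top grades:
teacher = ["A", "A", "A", "B", "B", "C"]
history = {"A": 1/6, "B": 2/6, "C": 2/6, "D": 1/6}
print(moderate_centre(teacher, history))  # ['A', 'B', 'B', 'C', 'C', 'D']
```

Even this toy version shows the essential problem: four of the six candidates are moved down from their teacher's estimate, and the rule cannot know which individuals deserved it.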
It is of course possible that Ofqual had been heavily influenced in the days prior to 16 March, and that Option C originated elsewhere. I don’t know. But even if that’s what did happen, Ofqual’s document of 16 March suggests that the use of an algorithm had Ofqual’s approval and endorsement, even if it was not their first choice of ‘exams as usual’ – which, after all, is not a surprising first choice for an organisation existing solely in the context of exams.
Before any algorithm can be used for any purpose, it is of course essential that there is proof that it delivers trustworthy outcomes. Mr Taylor discusses this in a section entitled ‘The problem with bias and accuracy’, in which he states that:
The problem of accuracy in this much larger number of results was known from the outset. Ofqual raised the problem publicly in its consultation documents in the spring and at its summer symposium in June. It explained why lowering grades through moderation would leave many candidates with lower grades than they would have got in an exam, while others would get higher grades. Unfortunately, there was no way of knowing who they were and so there was nothing that could be done about it.
To me, this reads as if Ofqual not only performed a public service in flagging an important problem, but did so well before the announcement of the A level results on 13 August.
Ofqual’s consultation was indeed published in the spring, on 15 April, but the issue of accuracy is discussed much more in relation to teacher marking and assessment than to the algorithm. And yes, some limited information about the operation of the algorithm was presented at Ofqual’s summer symposium. But this took place not in June but on 21 July 2020, just three weeks before the A level results came out.
I mention that because one of the presentation slides was featured in a HEPI blog dated 26 July. This attracted 86 comments, including several from Huy Duong, whose subsequent analysis of that slide’s data led him to estimate that nearly 40% of A level grades would be down-graded. This prediction was reported in the Guardian on 7 August, and turned out to be correct. If Huy Duong was able to do that on his own initiative from the fragmentary information available to him on 21 July, then surely Ofqual could have done much more, much sooner.
Furthermore, it wasn’t until 13 August that Ofqual revealed that throughout the summer they had been testing the algorithm against historic results they knew to be only 75% reliable. No wonder there ‘was a problem with accuracy’. I am therefore singularly unconvinced by Mr Taylor’s statement that ‘there was nothing that could be done about it’.
I must also take (great) issue with Mr Taylor’s claim that there was:
… broad consensus in advance that it was the right thing to do … The consensus crossed party lines: Labour in Wales, SNP in Scotland, Conservatives in England and the Northern Irish administration all supported the approach. Teachers’ leaders, universities, schools and colleges also supported the approach. Even students, in advance of the results, could understand why it seemed the sensible thing to do. When a misjudgment happens on this scale it warrants reflection. How could quite so many people be so wide of the mark?
So many people were ‘so wide of the mark’ for a very simple reason. There was no ‘mark’.
That is because no one knew what the algorithm was going to do, or how it was going to do it. Myself included. In March, when I read that exams were cancelled and that teachers were being asked to submit grades based on their expert judgement, I assumed there would be a process to check a school’s submission against its history, for which an ‘algorithm’ would, of necessity, be used. This would identify outliers, prompting the exam board to engage in a dialogue with the school accordingly. But by May, I came to realise that my assumption was wrong, for there were hints that Ofqual was using not a simple sense-checker but a much more complex algorithm intended to predict each individual’s grades. It could be argued, validly, that I was part of the original ‘broad consensus’. But that was based on my making a totally false assumption, itself based on minimal information.
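The 'simple sense-checker' I had assumed might look something like the following hypothetical sketch: compare a school's submitted grades against its own history, and flag the submission for dialogue only if the deviation is large. The threshold and the data are my illustrative assumptions:

```python
# Hypothetical sense-checker: flag a centre for follow-up dialogue if
# its submitted share of top grades exceeds its historical share by
# more than a chosen threshold. Figures and threshold are invented.

def flag_outlier(submitted_counts, historical_counts, threshold=0.15):
    """Return True if the submitted A*/A share exceeds the centre's
    historical A*/A share by more than `threshold`."""
    def top_share(counts):
        total = sum(counts.values())
        return (counts.get("A*", 0) + counts.get("A", 0)) / total
    return top_share(submitted_counts) - top_share(historical_counts) > threshold

# A centre submitting 50% top grades against a 25% history is flagged;
# the grades themselves are left untouched.
print(flag_outlier({"A": 10, "B": 6, "C": 4},
                   {"A": 5, "B": 9, "C": 6}))  # True
```

The crucial difference from what actually happened is that a check like this prompts a conversation with the school; it does not rewrite any individual candidate's grade.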
That might have been all my fault. But let me point out that I was not alone in the dark. On 11 July 2020, the Education Select Committee published a report presenting their findings to that time as regards ‘the fairness, transparency and accessibility of this year’s exam arrangements’. This is paragraph 28:
Ofqual must be completely transparent about its standardisation model and publish the model immediately to allow time for scrutiny. In addition, Ofqual must publish an explanatory memorandum on decisions and assumptions made during the model’s development. This should include clearly setting out how it has ensured fairness for schools without 3 years of historic data, and for settings with small, variable cohorts.
To me, that’s a clear, and direct, instruction. With which Ofqual refused to comply. Point blank. So we all had to wait for A level results day, 13 August, to see the details of the algorithm. If that information had been made public in March, when the so-called ‘broad consensus’ was allegedly built, my belief is that the more likely outcome would have been an uproarious ‘NO!!!’.
As regards appeals, words fail me, for I just don’t know what to say in response to this statement in Appendix 1:
People are understandably mystified as to why Ofqual allowed some results to be awarded knowing that they would need to be changed on appeal. The reason for this was very strong legal advice that to make changes in advance of the award would quite likely result in the whole approach being rejected by the courts following one of the many judicial reviews that a number of law firms planned to request.
Well, ‘mystified’ might be one word; nor am I enlightened by this legalistic explanation. But the appeals process has been highly problematic ever since 2016, when Ofqual changed the rules to make it harder to appeal, with consequences that could well cause great trouble this August.
One final point.
Towards the end of the paper, Mr Taylor states ‘the key error was misjudging what people would accept’.
Indeed. And in my view the lion’s share of that misjudgement is on Ofqual’s shoulders. If that ‘misjudgement’ happened in the spring and summer of 2020, what confidence can anyone have that a similarly tragic-only-with-hindsight ‘misjudgement’ has not taken place in the spring of 2021, and is unfolding now? And who is being held to account? Mr Taylor, in winning his A* in Reputation Management, is clearly pointing his finger at the Government. My finger is pointing somewhere else.