This blog was kindly contributed by Dennis Sherwood, who has been blogging for HEPI about A levels, exams and Ofqual for many years. You can find Dennis on Twitter @noookophile.
Friday, 18 June 2021 is the deadline for teachers to submit this year’s school exam grades, so bringing the current act (The Deed) of this year’s school grading drama to a close. Three acts are to follow: The Outcome, the announcement of the results on 10 August (for A level and AS) and 12 August (GCSE); and then the most distressing – and possibly most tragic – act of all, The Reckoning, an act which has not yet finished for those who are still battling the outcomes of last year’s disaster. After that, the final act, The Aftermath, will play out over the years to come. So let me take this opportunity to reflect on acts past, and to anticipate acts future.
Act 1 – The Stitch-up
An office. A man is pacing the room, studying a letter.
The drama opens with the Government’s announcement on 4 January that schools will be closed, and that ‘it is not possible or fair for all exams to go ahead this summer as normal’. In the central scene, Simon Lebus, the interim Chief Regulator and Chief Executive Officer of Ofqual, the regulator of school exams in England, is reading a letter, dated 13 January, from Gavin Williamson, the Secretary of State for Education, informing him of the Government’s decision that ‘This year we are asking teachers to assess their students’.
He understands the significance of those words. Reeling from the summer 2020 grading fiasco in which the Department for Education and Ofqual took the brunt of the blame for relying on a ‘mutant algorithm’, the Department has lurched to the opposite extreme, passing the buck to teachers, requiring them to submit ‘teacher assessed grades’ (TAGs) for each of their students.
To do this, teachers have to identify, for each subject, which of the ten GCSE grades (9 to 1, plus U), six AS grades (A to E, plus U) or seven A level grades (A* to E, plus U) should be awarded to each student.
Except for the most (or least) able students who are indisputably within the top (or bottom) grades, I think this is impossible. How can a teacher reliably decide whether to award a student grade B or grade A in A level Geography, or – harder – grade 3 or grade 4 in GCSE English? Imagine that universities were required to award ten classes of degree. Would that be sensible or meaningful? Would the outcomes be reliable and trustworthy? I might, of course, be wrong, and doing teachers a grave injustice, for which I apologise. So if anyone reading this thinks – or indeed knows – otherwise, please post a comment, and let’s talk about it. I’m happy to change my mind.
But let me cite three reasons in support of my case.
First, even the so-called ‘gold standard’ of ‘real’ exams cannot reliably distinguish between adjacent grades. At a hearing of the Education Select Committee held on 2 September 2020, Ofqual’s then Acting Chief Regulator, Dame Glenys Stacey, acknowledged that exam grades are ‘reliable to one grade either way’. This statement was unqualified, and so – presumably – applies to all grades in all subjects at all levels. And it is equivalent to, but less dramatic than, ‘an A level certificate showing ABB really means any set of grades from A*AA to BCC, but no one knows which’, and ‘on average, 1 grade in every 4 is wrong’. If the exam system is unable to distinguish reliably between grades B and A in A level Geography, how can a teacher?
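To make the arithmetic behind 'one grade either way' concrete, here is a small Python sketch. It is an illustration only, not anything published by Ofqual, and the function names are hypothetical. For each grade on an A level certificate it finds the grades within one step of it on the A* to U scale, then reports the best and worst certificates consistent with that statement, and how many certificates are consistent with it in total.

```python
from __future__ import annotations

from itertools import product

# A level grades from highest to lowest, as listed in the text (A* to E, plus U).
A_LEVEL_GRADES = ["A*", "A", "B", "C", "D", "E", "U"]


def one_grade_either_way(grade: str) -> list[str]:
    """Grades within one step of `grade`, ordered from highest to lowest."""
    i = A_LEVEL_GRADES.index(grade)
    lo = max(i - 1, 0)                        # cannot go above A*
    hi = min(i + 1, len(A_LEVEL_GRADES) - 1)  # cannot go below U
    return A_LEVEL_GRADES[lo:hi + 1]


def plausible_range(certificate: list[str]) -> tuple[str, str]:
    """Best and worst certificates consistent with 'one grade either way'."""
    best = "".join(one_grade_either_way(g)[0] for g in certificate)
    worst = "".join(one_grade_either_way(g)[-1] for g in certificate)
    return best, worst


def consistent_certificates(certificate: list[str]) -> list[str]:
    """Every certificate consistent with 'one grade either way'."""
    options = [one_grade_either_way(g) for g in certificate]
    return ["".join(combo) for combo in product(*options)]


print(plausible_range(["A", "B", "B"]))               # ('A*AA', 'BCC')
print(len(consistent_certificates(["A", "B", "B"])))  # 27
```

For ABB the sketch prints ('A*AA', 'BCC') and 27: each of the three grades could plausibly be one of three grades, so a single certificate stands for any of 27 possible combinations.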
This dovetails with the second reason. How does a teacher learn what, say, a grade A in A level Geography looks like, and how it is different from a grade B, so honing the skill to assess students correctly? This skill can be learnt only from experience, and by comparison of prior judgements to actual outcomes. If there is a discrepancy, the teacher recalibrates. But this goes awry when the actual grade is wrong, and the discrepancy arises not from an error in the teacher’s original judgement, but from an error in the ‘official’ outcome. Since, on average, 1 awarded grade in every 4 is wrong and has been wrong for years, teachers have been denied any opportunity to learn.
And thirdly, to help teachers identify individual grades, on 26 March, the Joint Council for Qualifications published ‘grade descriptors to assist with determining grades’ for both GCSE, and AS and A level. To take one at random, this is the descriptor for grade B in A level Geography:
Characteristics that differentiate a grade B from a grade A:
– Candidates will typically draw from a narrower range of examples and ideas. Coverage will be less comprehensive.
– Connections and relationship may not be as fully explained, and deconstruction of the question will be less effective.
– Conclusions will be characterised by less complexity and provide partial coverage of the question/issues.
What, precisely, do those comparative phrases ‘narrower range’, ‘less comprehensive’, ‘not as fully explained’ and the rest mean? Comparisons are about rankings, which a teacher can probably do anyway. Grading is about absolutes, and the practical problem the teacher faces is whether any two closely ranked students are both ‘strong Bs’, both ‘just-made-it As’, or one of each. How do those descriptors help? How does the teacher decide?
Yes, it’s all about judgement. Of which more in Act 4.
The seeds of what might turn out to be a tragedy were sown here. In my opinion, teachers have been required to do the impossible. They have been stitched up. To my mind, assigning students to fewer bands – say four – would have been much more sensible, albeit with the inevitable hair-splitting at each of the three boundaries.
Act 2 – The Deed
A small room, late at night. A teacher is agonising over where to draw a line on a list.
And so teachers have been obliged to determine their students’ TAGs. But how? On what basis? Their judgement, of course. But suppose that judgement is challenged? Judgement alone cannot be sufficient; there must be documentary evidence too. Evidence that will bear scrutiny from the exam boards; evidence that will be strong enough to defend that judgement against an appeal. So the school needs to amass specific items of evidence for each student – mock exams, class tests… – each of which needs to be formally assessed, with the aggregate assessment determining the student’s TAG.
As teachers around the land will attest, assembling all this evidence has caused a huge amount of unpaid additional work, ending with those agonising decisions as to where to draw the grade boundaries on those lists of marks. Students, too, have been under pressure, some feeling that the experience of being in the spotlight continuously for weeks has been more stressful than the formal exams.
All of this is well-intentioned, but it brings any number of so-called ‘unintended consequences’: from the individual student whose (totally legitimate) special personal circumstances can’t be taken into account, because that would imply that not all students had been treated in the same way, to the admission only of rock-solid ‘evidence’ that can be ‘proven’ by a ‘mark’, to the exclusion of everything else.
But there are three fatal flaws. First, the inevitability that different schools will use different assessment protocols, each minutely detailed and followed to obsession. Yet different. Secondly, the possibility of bias, which is very hard to detect. And thirdly, within all those protocols are marks – such as the mark given to Ali’s Geography mock exam script by Sam, a conscientious teacher. And everyone – even Ofqual – knows that if that same script were marked by a different, equally conscientious, teacher, that mark might, just might, be different…
… all of which cause Ali, and Ali’s parents or guardians, to wonder ‘what might have happened had Ali attended the school up the road, and had been assessed using a different protocol?’ and ‘suppose that script had been marked by a different teacher?’
These create doubt as regards the reliability of the final outcome. And that doubt erodes trust…
Act 3 – The Outcome
Locations around the country.
We shall see. My hunch is that A level results will be close to UCAS predictions, for they are in my view ‘safe’: these grades are likely to match most students’ hopes, so reducing the likelihood of appeals; furthermore, according to UCAS guidelines, ‘predicted grades … should be aspirational but achievable’, and the inclusion here of ‘achievable’ should satisfy Ofqual’s requirement for the ‘reasonable exercise of academic judgement’. Yes, that will have an impact on higher education admissions, which may, or may not, have been anticipated. And it also implies some grade inflation, even beyond last year – for unlike last year’s grades, this year’s grades are not explicitly constrained by the results of prior years.
For GCSE, my expectation is some grade inflation here too, if only because I would expect fewer grade 3s in general, and in particular for key large-cohort subjects such as English and Maths: few teachers will wish to ‘award’ grade 3 (or lower), consigning their students to the social dustbin of ‘The Forgotten Third’. So I think many 3/4 borderline students will be given the benefit of the doubt, with knock-on effects up the grades. And, for the most part, rightly so in my opinion. How can a national education system build in an outcome of ‘failure’, after 11 years of schooling, for one third of the students?
Act 4 – The Reckoning
A kitchen. Three people are having breakfast. One is distraught; the two others are engaged in earnest conversation.
Ali has been awarded grade B for A level Geography and is upset. So are Ali’s parents. They fear something has gone wrong; or, more subtly, they recognise that everything underpinning that grade B is fundamentally a matter of someone’s judgement – judgement as regards the protocol that the school followed, judgement as regards the weighting assigned to each component, judgement as regards where to draw those oh-so-important grade boundaries on those lists of marks, judgement as regards those marks themselves as given in that mock exam…
Yes, there is evidence that the mock exam mark was 62, and there is an audit trail of how that 62 is the sum of the marks given for each question. There are no mistakes. But might a different teacher have given that same script a different, and perhaps higher, mark – a teacher in a different school, a teacher who does not know Ali and so is much less likely to be biased? A higher mark that would have made all the difference between grade B and grade A? Of course, everyone – including Ofqual – knows this is a possibility.
This is a familiar situation in all matters of judgement: for example, if there is concern about a medical diagnosis, or legal advice. To resolve these doubts, the recourse is to an expert second opinion. And if the second opinion concurs with the original, the joint opinion will often be accepted; if there is a difference of opinion, the enquirer will follow whichever is the more palatable.
In the case of exam grades, the recourse sought by any appellant is, in general, the same – an expert second opinion. When there were ‘real’ exams, that was a fair re-mark; this year is much more complex since there will be doubts not just about the marks actually given, but about the process too – which items of work were taken into account, which not; the weighting given to [this] rather than [that]; why the critical grade boundary was drawn [here] rather than [there]….
It is an expert, trusted, independent, unbiased second opinion that an appellant seeks. And it was expert second opinions that Ofqual used to determine the reliabilities of GCSE, AS and A level grades, as presented in two landmark reports: Marking Consistency Metrics (November 2016) and Marking Consistency Metrics – An update (November 2018). Figure 12 of the 2018 report shows measurements of the reliabilities of the grades for each of 14 subjects, and is the key evidence that, on average across all subjects, about 1 exam grade in every 4, as actually awarded, is wrong.
But in the summer of 2016, just a few months before the first Marking Consistency Metrics report was published, Ofqual announced a change in the rules for appeals, denying access to an expert second opinion except in very limited circumstances. In my view, the consequences of that change were pernicious when there were exams, and more pernicious last summer when there weren’t; I expect them to be even more so this summer, since so much more is based on local judgement.
Let me mention two consequences in particular. First, the significant narrowing of the grounds for appeal has, as deliberately intended, suppressed the number of appeals. But a reduction in the number of appeals, and hence a reduction in the number of grading errors discovered and corrected, does not imply a reduction in the number of errors actually made. They’re still there; just concealed.
Secondly, it has caused those appellants with sufficient tenacity and energy to fight their way through the appeals obstacle course to attempt to find ever more ingenious ways by which the allowed grounds for appeal – such as ‘marking errors’ and ‘procedural errors’ – can be ‘bent’ to result in that much-sought-after expert second opinion. And if they fail, let’s try ‘maladministration’ or ‘malpractice’. All of which makes the defenders ever more stubborn.
The barriers to overcome are high, and many appeals fail, to the great distress of the frustrated appellants, who feel that being denied an expert second opinion is being denied justice. Which leaves a bitter taste. A taste which will be even more bitter this year.
In the past, appeals required the school to pay an initial fee (refunded for appeals ruled in the student’s favour, but still acting as a disincentive, especially for state-funded schools), and the battle was against ‘the system’, as personified, usually, by the impersonal exam board. This year, there is no fee (so that disincentive has been removed) but the battle will be against ‘my school’ and ‘my teacher’, which can be bitter indeed – especially for those parents who have other children still at the school, and still beholden to the same teachers. Nasty. And let’s not forget that much of this will happen just because teachers were forced to undertake – and will be forced to defend – the impossible task of distinguishing between ten GCSE grades, six AS grades and seven A level grades. Maybe that B really was an A… after all, ‘real’ exam grades are only ‘reliable to one grade either way’…
No one knows how many appeals there will be, but in anticipation of a surge, and to allow more time for appeals to be processed, Ofqual have brought forward the date on which A level results will be announced to 10 August. It might be wise, however, to anticipate that not all appeals will have been resolved by the deadline for HE admissions.
For some this will be a tragedy. A tragedy caused by Ofqual’s requirement this year for teachers to make impossible judgements-of-Solomon; by Ofqual’s long-established policy of denying access to an expert second opinion and so not allowing fair re-marks; by Ofqual’s continued blindness to the consequences of the reality that different teachers can, and do, give different marks to the same script; by Ofqual’s fundamental failure to devise a way to award assessments that do not penalise students for this ambiguity.
Act 5 – The Aftermath
Locations around the country.
This year’s process has set student and parent against school and teacher, with exam boards, Ofqual and the Department for Education as bystanders. And with Ofsted, the inspector of schools, nowhere in sight. Schools and teachers have been placed in an invidious position. And the students are the victims. Yet I believe that it is the ‘bureaucrats’ who are responsible. But to whom are the bureaucrats accountable? Ultimately, to those students, parents and teachers – and the rest of us as well. So the stronger, and the more united, the voice of protest from those students, parents, and teachers, and from all of us, the greater the likelihood that someone with the power to take the required action will take notice – especially if the media are alongside.
I would expect the bureaucrats to be concerned about grade inflation too, seeking to ensure that, when exams return, as they surely will, the percentage of top grades is brought back down to the pre-Covid levels of 2018 and 2019. When this happens, however, the students – understandably – will cry ‘unfair!’ since far fewer students will be awarded top grades simply as the result of a stroke of the bureaucrats’ pen. A knotty problem. To which there is an ‘interesting’ solution. Rather than reverting to the existing grading structure, why not adopt a totally new assessment structure that is an explicit and deliberate break with the past? That way, it will be obvious that direct comparisons of, say, 2023 results with those of 2018 are not valid, and that the post-Covid exam assessment era is different. One such break would be to throw grades away, and award assessments on the basis of, for example, ‘mark ± fuzziness’, which offers the added benefit of delivering grade reliability too. Another would be to introduce more far-reaching reforms, such as those advocated by so many pressure groups.
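‘Mark ± fuzziness’ is mentioned only briefly here, so the following Python sketch is one possible reading of it, not a specification of the author’s proposal: an assessment is reported as a range of marks rather than as a single grade. Every number other than the mark of 62 for Ali’s mock exam (from Act 4) is an assumption made purely for illustration: a hypothetical grade A boundary of 63, a hypothetical fuzziness of 4 marks, and a hypothetical second examiner’s mark of 65 for the same script.

```python
from dataclasses import dataclass

# Hypothetical values, for illustration only.
GRADE_A_BOUNDARY = 63  # assumed minimum mark for grade A
FUZZINESS = 4          # assumed fuzziness, in marks


def grade_from_mark(mark: int) -> str:
    """Conventional grading: a single mark is compared with a single boundary."""
    return "A" if mark >= GRADE_A_BOUNDARY else "B"


@dataclass(frozen=True)
class FuzzyAssessment:
    """An assessment reported as 'mark ± fuzziness' instead of a grade."""
    mark: int
    fuzziness: int

    @property
    def low(self) -> int:
        return self.mark - self.fuzziness

    @property
    def high(self) -> int:
        return self.mark + self.fuzziness

    def contains(self, other_mark: int) -> bool:
        """True if another examiner's mark lies within this assessment's range."""
        return self.low <= other_mark <= self.high

    def __str__(self) -> str:
        return f"{self.mark} ± {self.fuzziness}"


first_mark, second_mark = 62, 65  # two hypothetical examiners, same script
print(grade_from_mark(first_mark), grade_from_mark(second_mark))  # B A
assessment = FuzzyAssessment(first_mark, FUZZINESS)
print(assessment, assessment.contains(second_mark))  # 62 ± 4 True
```

With a single boundary, a three-mark difference between two equally conscientious examiners moves the script from grade B to grade A. Reported as 62 ± 4, the second examiner’s 65 lies inside the range, so the two judgements are consistent rather than contradictory. How large the fuzziness should be for each subject is a separate question that this sketch does not attempt to answer.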
The moral – It’s all about trust
Fundamentally, though, everything comes back to trust.
If exam results are reliable and trustworthy, they are respected and there is no reason to appeal. If teachers are trusted, their assessments are respected, and there is no reason to appeal.
But the events of last year damaged trust. And I fear that, in August, when the results are announced, teachers will (for the most part, unfairly) be under attack, weakening that trust even more. Oh dear. It’s not the teachers’ fault. It’s a consequence of botched policy.