This blog was kindly contributed by Dennis Sherwood who has been blogging for HEPI about A levels, exams and Ofqual for many years. You can find Dennis on Twitter @noookophile.
Friday, 18 June 2021 is the deadline for teachers to submit this year’s school exam grades, so bringing the current act (The Deed) of this year’s school grading drama to a close. Three acts are to follow: The Outcome, the announcement of the results on 10 August (for A level and AS) and 12 August (GCSE); and then the most distressing – and possibly most tragic – act of all, The Reckoning, an act which has not yet finished for those who are still battling the outcomes of last year’s disaster. After that, the final act, The Aftermath, will play out over the years to come. So let me take this opportunity to reflect on acts past, and to anticipate acts future.
Act 1 – The Stitch-up
An office. A man is pacing the room, studying a letter.
The drama opens with the Government’s announcement on 4 January that schools will be closed, and that ‘it is not possible or fair for all exams to go ahead this summer as normal’. In the central scene, Simon Lebus, the interim Chief Regulator and Chief Executive Officer of Ofqual, the regulator of school exams in England, is reading a letter, dated 13 January, from Gavin Williamson, the Secretary of State for Education, informing him of the Government’s decision that ‘This year we are asking teachers to assess their students’.
He understands the significance of those words. Reeling from the summer 2020 grading fiasco in which the Department for Education and Ofqual took the brunt of the blame for relying on a ‘mutant algorithm’, the Department has lurched to the opposite extreme, passing the buck to teachers, requiring them to submit ‘teacher assessed grades’ (TAGs) for each of their students.
To do this, teachers have to identify, for each subject, which of the ten GCSE grades (9 to 1, plus U), six AS grades (A to E, plus U) or seven A-level grades (A* to E, plus U) should be awarded to each student.
Except for the most (or least) able students who are indisputably within the top (or bottom) grades, I think this is impossible. How can a teacher reliably decide whether to award a student grade B or grade A in A level Geography, or – harder – grade 3 or grade 4 in GCSE English? Imagine that universities were required to award ten classes of degree. Would that be sensible or meaningful? Would the outcomes be reliable and trustworthy? I might, of course, be wrong, and doing teachers a grave injustice, for which I apologise. So if anyone reading this thinks – or indeed knows – otherwise, please post a comment, and let’s talk about it. I’m happy to change my mind.
But let me cite three reasons in support of my case.
First, even the so-called ‘gold standard’ of ‘real’ exams cannot reliably distinguish between adjacent grades. At a hearing of the Education Select Committee held on 2 September 2020, Ofqual’s then Acting Chief Regulator, Dame Glenys Stacey, acknowledged that exam grades are ‘reliable to one grade either way’. This statement was unqualified, and so – presumably – applies to all grades in all subjects at all levels. And it is equivalent to, but less dramatic than, ‘an A level certificate showing ABB really means any set of grades from A*AA to BCC, but no one knows which’, and ‘on average, 1 grade in every 4 is wrong’. If the exam system is unable to distinguish reliably between grades B and A in A level Geography, how can a teacher?
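To see how modest marking variability can produce both of those statements at once, here is a minimal simulation sketch. Every number in it – the mark distribution, the size of the marking ‘noise’, the grade boundaries – is an assumption of mine for illustration, not an Ofqual figure.

```python
import random

# Illustrative assumptions only (mine, not Ofqual's):
# - a candidate's 'definitive' mark (the senior examiner's mark) is normally distributed;
# - an ordinary, equally conscientious marker gives that mark plus some random marking noise;
# - seven grades (index 0 = U up to index 6 = A*) are set by boundaries ten marks apart.
BOUNDARIES = [30, 40, 50, 60, 70, 80]

def grade_index(mark):
    """0 for U, 1 for E, ... 6 for A*, using the assumed boundaries."""
    return sum(mark >= b for b in BOUNDARIES)

def simulate(n=200_000, marking_sd=3.0, seed=1):
    rng = random.Random(seed)
    different = more_than_one = 0
    for _ in range(n):
        definitive = rng.gauss(60, 12)                   # senior examiner's mark
        awarded = definitive + rng.gauss(0, marking_sd)  # the mark actually given
        gap = abs(grade_index(definitive) - grade_index(awarded))
        different += gap >= 1
        more_than_one += gap >= 2
    return different / n, more_than_one / n

if __name__ == "__main__":
    diff, far = simulate()
    print(f"grades differing from the definitive grade: {diff:.0%}")
    print(f"of which more than one grade away:           {far:.2%}")
```

With these made-up numbers the share of changed grades comes out at roughly a quarter, and virtually every change is to an adjacent grade, which is exactly the picture painted by ‘reliable to one grade either way’ and ‘1 grade in every 4 is wrong’. The point is qualitative: small marking differences near closely spaced boundaries are enough to produce it.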
This dovetails with the second reason. How does a teacher learn what, say, a grade A in A level Geography looks like, and how it is different from a grade B, so honing the skill to assess students correctly? This skill can be learnt only from experience, and by comparison of prior judgements to actual outcomes. If there is a discrepancy, the teacher recalibrates. But this goes awry when the actual grade is wrong, and the discrepancy arises not from an error in the teacher’s original judgement, but from an error in the ‘official’ outcome. Since, on average, 1 awarded grade in every 4 is wrong and has been wrong for years, teachers have been denied any opportunity to learn.
And thirdly, to help teachers identify individual grades, on 26 March, the Joint Council for Qualifications published ‘grade descriptors to assist with determining grades’ for both GCSE, and AS and A level. To take one at random, this is the descriptor for grade B in A level Geography:
Characteristics that differentiate a grade B from a grade A:
– Candidates will typically draw from a narrower range of examples and ideas. Coverage will be less comprehensive.
– Connections and relationships may not be as fully explained, and deconstruction of the question will be less effective.
– Conclusions will be characterised by less complexity and provide partial coverage of the question/issues.
What, precisely, do those comparative phrases ‘narrower range’, ‘less comprehensive’, ‘not as fully explained’ and the rest mean? Comparisons are about rankings, which a teacher can probably do anyway. Grading is about absolutes, and the practical problem the teacher faces is whether any two closely-ranked students are both ‘strong Bs’, both ‘just-made-it As’, or one of each. How do those descriptors help? How does the teacher decide?
Yes, it’s all about judgement. Of which more in Act 4.
The seeds of what might turn out to be a tragedy were sown here. In my opinion, teachers have been required to do the impossible. They have been stitched up. To my mind, assigning students to fewer bands – say four – would have been much more sensible, albeit with the inevitable hair-splitting at each of the three boundaries.
Act 2 – The Deed
A small room, late at night. A teacher is agonising over where to draw a line on a list.
And so teachers have been obliged to determine their students’ TAGs. But how? On what basis? Their judgement, of course. But suppose that judgement is challenged? Judgement alone cannot be sufficient; there must be documentary evidence too. Evidence that will bear scrutiny from the exam boards; evidence that will be strong enough to defend that judgement against an appeal. So the school needs to amass specific items of evidence for each student – mock exams, class tests… – each of which needs to be formally assessed, with the aggregate assessment determining the student’s TAG.
As teachers around the land will attest, assembling all this evidence has caused a huge amount of unpaid additional work, ending with those agonising decisions as to where to draw the grade boundaries on those lists of marks. Students, too, have been under pressure, some feeling that the experience of being in the spotlight continuously for weeks has been more stressful than the formal exams.
All of this is well-intentioned, but with any number of so-called ‘unintended consequences’, from the individual student whose (totally legitimate) special personal circumstances can’t be taken into account because that would imply that not all students had been treated in the same way, to the admission of only rock-solid ‘evidence’ that can be ‘proven’ by a ‘mark’, to the exclusion of everything else.
But there are three fatal flaws. First, the inevitability that different schools will use different assessment protocols, each minutely detailed and followed to obsession. Yet different. Secondly, the possibility of bias, which is very hard to detect. And thirdly, within all those protocols are marks – such as the mark given to Ali’s Geography mock exam script by Sam, a conscientious teacher. And everyone – even Ofqual – knows that if that same script were marked by a different, equally conscientious, teacher, that mark might, just might, be different…
… all of which cause Ali, and Ali’s parents or guardians, to wonder ‘what might have happened had Ali attended the school up the road, and had been assessed using a different protocol?’ and ‘suppose that script had been marked by a different teacher?’
These create doubt as regards the reliability of the final outcome. And that doubt erodes trust…
Act 3 – The Outcome
Locations around the country.
We shall see. My hunch is that A level results will be close to UCAS predictions, for they are in my view ‘safe’: these grades are likely to match most students’ hopes, so reducing the likelihood of appeals; furthermore, according to UCAS guidelines, ‘predicted grades … should be aspirational but achievable’, and the inclusion here of ‘achievable’ should satisfy Ofqual’s requirement for the ‘reasonable exercise of academic judgement’. Yes, that does have an impact, which may, or may not, have been anticipated, on higher education admissions. And it also implies some grade inflation, even beyond last year – for unlike last year’s grades, this year’s grades are not explicitly constrained by the results of prior years.
For GCSE, my expectation is some grade inflation here too, if only because I would expect fewer grade 3s in general, and in particular for key large cohort subjects such as English and Maths: few teachers will wish to ‘award’ grade 3 (or lower), consigning their students to the social dustbin of ‘The Forgotten Third’. So I think many 3/4 border-line students will be given the benefit of the doubt, with knock-on effects up the grades. And, for the most part, rightly so in my opinion. How can a national education system build in an outcome of ‘failure’, after 11 years of schooling, for one third of the students?
Act 4 – The Reckoning
A kitchen. Three people are having breakfast. One is distraught; the two others are engaged in earnest conversation.
Ali has been awarded grade B for A level Geography and is upset. So are Ali’s parents. They fear something has gone wrong; or, more subtly, they recognise that everything underpinning that grade B is fundamentally a matter of someone’s judgement – judgement as regards the protocol that the school followed, judgement as regards the weighting assigned to each component, judgement as regards where to draw those oh-so-important grade boundaries on those lists of marks, judgement as regards those marks themselves as given in that mock exam…
Yes, there is evidence that the mock exam mark was 62, and there is an audit trail of how that 62 is the sum of the marks given for each question. There are no mistakes. But might a different teacher have given that same script a different, and perhaps higher, mark – a teacher in a different school, a teacher who does not know Ali and so is much less likely to be biased? A higher mark that would have made all the difference between grade B and grade A? Of course, everyone – including Ofqual – knows this is a possibility.
This is a familiar situation in all matters of judgement: for example, if there is concern about a medical diagnosis, or legal advice. To resolve these doubts, the recourse is to an expert second opinion. And if the second opinion concurs with the original, the joint opinion will often be accepted; if there is a difference of opinion, the enquirer will follow whichever is the more palatable.
In the case of exam grades, the recourse sought by any appellant is, in general, the same – an expert second opinion. When there were ‘real’ exams, that was a fair re-mark; this year is much more complex since there will be doubts not just about the marks actually given, but about the process too – which items of work were taken into account, which not; the weighting given to [this] rather than [that]; why the critical grade boundary was drawn [here] rather than [there]….
It is an expert, trusted, independent, unbiased second opinion that an appellant seeks. And it was expert second opinions that Ofqual used to determine the reliabilities of GCSE, AS and A level grades, as presented in their two landmark reports, Marking Consistency Metrics, of November 2016, and November 2018’s Marking Consistency Metrics – An update, Figure 12 of which shows measurements of the reliabilities of the grades for each of 14 subjects – the key evidence that, on average across all subjects, about 1 exam grade in every 4, as actually awarded, is wrong.
But in the summer of 2016, just a few months before the first Marking Consistency Metrics report was published, Ofqual announced a change in the rules for appeals, denying access to an expert second opinion except in very limited circumstances. In my view, the consequences of that change have been pernicious when there were exams, more pernicious last summer when there weren’t, and I expect even more so this coming summer since so much more is based on local judgement.
Let me mention two consequences in particular. First, the significant narrowing of the grounds for appeal has, as deliberately intended, suppressed the number of appeals. But a reduction in the number of appeals, and hence a reduction in the number of grading errors discovered and corrected, does not imply a reduction in the number of errors actually made. They’re still there; just concealed.
Secondly, it has caused those appellants with sufficient tenacity and energy to fight their way through the appeals obstacle course to attempt to find ever more ingenious ways by which the allowed grounds for appeal – such as ‘marking errors’ and ‘procedural errors’ – can be ‘bent’ to result in that much-sought-after expert second opinion. And if they fail, let’s try ‘maladministration’ or ‘malpractice’. All of which makes the defenders ever more stubborn.
The barriers to overcome are high, and many appeals fail, to the great distress of the frustrated appellants, who feel that being denied an expert second opinion is a denial of justice. Which leaves a bitter taste. A taste which will be even more bitter this year.
In the past, appeals required the school to pay an initial fee (refunded for appeals ruled in the student’s favour, but still acting as a disincentive, especially for state-funded schools), and the battle was against ‘the system’, as personified, usually, by the impersonal exam board. This year, there is no fee (so that disincentive has been removed) but the battle will be against ‘my school’ and ‘my teacher’, which can be bitter indeed – especially for those parents who have other children still at the school, and still beholden to the same teachers. Nasty. And let’s not forget that much of this will happen just because teachers were forced to undertake – and will be forced to defend – the impossible task of distinguishing between ten GCSE grades, six AS grades and seven A level grades. Maybe that B really was an A… after all, ‘real’ exam grades are only ‘reliable to one grade either way’…
No one knows how many appeals there will be, but in anticipation of a surge, and to allow more time for appeals to be processed, Ofqual have brought forward the date on which A level results will be announced to 10 August. It might be wise, however, to anticipate that not all appeals will have been resolved by the deadline for HE admissions.
For some this will be a tragedy. A tragedy caused by Ofqual’s requirement this year for teachers to make impossible judgements-of-Solomon; by Ofqual’s long-established policy of denying access to an expert second opinion and so not allowing fair re-marks; by Ofqual’s continued blindness to the consequences of the reality that different teachers can, and do, give different marks to the same script; by Ofqual’s fundamental failure to devise a way to award assessments that do not penalise students for this ambiguity.
Act 5 – The Aftermath
Locations around the country.
This year’s process has set student and parent against school and teacher, with exam boards, Ofqual and the Department for Education as bystanders. And with Ofsted, the inspector of schools, nowhere in sight. Schools and teachers have been placed in an invidious position. And the students are the victims. Yet I believe that it is the ‘bureaucrats’ who are responsible. But to whom are the bureaucrats accountable? Ultimately, to those students, parents and teachers – and the rest of us as well. So the stronger, and the more united, the voice of protest from those students, parents, and teachers, and from all of us, the greater the likelihood that someone with the power to take the required action will take notice – especially if the media are alongside.
I would expect the bureaucrats to be concerned about grade inflation too, seeking to ensure that, when exams return, as they surely will, the percentage of top grades is brought back down to the pre-Covid levels of 2018 and 2019. When this happens, however, the students – understandably – will cry ‘unfair!’ since far fewer students will be awarded top grades simply as the result of a stroke of the bureaucrats’ pen. A knotty problem. To which there is an ‘interesting’ solution. Rather than reverting to the existing grading structure, why not adopt a totally new assessment structure that is an explicit and deliberate break with the past? That way, it will be obvious that direct comparisons of, say, 2023 results with those of 2018 are not valid, and that the post-Covid exam assessment era is different. One such break would be to throw grades away, and award assessments on the basis of, for example, ‘mark ± fuzziness’ (sketched below), which offers the added benefit of delivering grade reliability too. Another would be to introduce more far-reaching reforms, such as those advocated by so many pressure groups.
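As a concrete illustration of what a ‘mark ± fuzziness’ entry might look like, here is a minimal sketch. The class name, the field names and the example numbers are all mine, and the ‘fuzziness’ is simply assumed to be a published standard error of measurement; this is one possible rendering, not a worked-out proposal.

```python
from dataclasses import dataclass

@dataclass
class ReportedResult:
    """One possible 'mark ± fuzziness' certificate entry (illustrative only)."""
    subject: str
    mark: int          # the mark actually given
    fuzziness: float   # e.g. a published standard error of measurement

    def band(self):
        """The range of marks another, equally legitimate marker might plausibly have given."""
        return (self.mark - self.fuzziness, self.mark + self.fuzziness)

    def __str__(self):
        lo, hi = self.band()
        return f"{self.subject}: {self.mark} ± {self.fuzziness:g} (plausible range {lo:g} to {hi:g})"

# Ali's Geography mock mark of 62, with an assumed fuzziness of 4 marks:
print(ReportedResult("A level Geography", 62, 4))
```

A result reported this way cannot be ‘one grade out’, because no single grade is claimed in the first place; the uncertainty that currently hides behind a crisp letter is made visible instead.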
The moral – It’s all about trust
Fundamentally, though, everything comes back to trust.
If exam results are reliable and trustworthy, they are respected and there is no reason to appeal. If teachers are trusted, their assessments are respected, and there is no reason to appeal.
But the events of last year damaged trust. And I fear that, in August, when the results are announced, teachers will (for the most part, unfairly) be under attack, weakening that trust even more. Oh dear. It’s not the teachers’ fault. It’s a consequence of botched policy.
In recent years, research on this subject has helped more people understand what exams and other forms of assessment can and cannot do.
Accuracy and precision are very difficult, in fact impossible, to achieve in assessment processes involving individual judgements by humans or AI.
As the recently published book Noise points out, this is not limited to education, criminal justice or commercial insurance risk.
In addition, there is the question of bias to take into account.
Teachers have been given an impossible task. It seems to me there should be fewer grades and, when it comes to university admission, more weight given to interviews (which face a different set of problems).
The debate continues.
Thank you, Dennis, for a very sharp article.
May I suggest one thing? There was something in between the DfE and Ofqual’s “our algorithm knows best” phase and their passing the buck to the teachers: the months from August to January, when they heard no evil and saw no evil about a certain pandemic, and charged headlong towards their guarantee that exams would take place.
Thank you, Albert and Huy – yes, indeed.
And since this blog was published this morning, Ofqual has now confirmed the process for this year’s appeals, the two key grounds being:
“The Centre did not follow its procedure properly”
and
“The result reflects an unreasonable exercise of academic judgement on the part of the Centre”.
The first is a process error, and the second denies access to an expert second opinion unless the original opinion can be proven to be “unreasonable”. This explicitly excludes the possibility of another opinion being “different but legitimate”, which is to my mind the heart of the matter.
The full details are on
https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/991828/6791-1_GQAA_guidance_consultation_decisions_08JUN21.pdf
and, in more detail (‘Condition GQAA4’) on
https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/991757/6795_General_qualificatons_alternative_awarding_framework.pdf
Last year Ofqual lost the plot by having, as their primary priority, the aim of keeping grade inflation unsupportably low, and by virtually ignoring fairness for individuals.
I wonder if this year they have lost the plot again. The plot behind cancelling exams, as stated by the PM in January, was to even out the effects on grades of the different levels of learning loss that different students suffer. Looking at the rules and guidelines that Ofqual have developed, it is hard to see how they would even that out better than not cancelling exams in the first place. Most of these rules and guidelines seem to be aimed at solving the problems that the solution has created, rather than solving the original problem.
If a student who is truly grade 8 is given a grade 7, how can they ever prove that grade 7 is unreasonable? This is hard enough for exact subjects and must be virtually impossible for non-exact ones. It is difficult to imagine that the rule “The result reflects an unreasonable exercise of academic judgement on the part of the Centre” will provide sufficient redress for students. It looks like a rule that’s designed to make Ofqual look good rather than add anything for students.
Normally in life, when we take two readings, both of which are reasonable, we take the average. That is also what we teach students in schools. However, in both normal years and Covid years, Ofqual’s rule is that the second reading must be ignored. It is a rule designed to protect the system rather than the people that system is supposed to serve, which, in turn, is the kind of rule we see in countries where the people in charge have absolute power and there are aligned interest groups.
Dear Huy Duong, my Son was a 2020 student. He was awarded a CAG of a 4 in Biology but took the exams just 78 days later and received an 8! Would anyone believe a 16-year-old who said he was capable of an 8 when he received a 4? How could he have appealed? What evidence did he have that his Centre had got his grades incorrect? The Department for Education is reluctant to look into the case. I wonder why?
Dear Mrs G, Congratulations to your son for getting out of the bad situation that the adults threw him into. If I had to hazard a guess, I would put my money on his teachers knowing that grade 4 was wrong, but the school management not allowing them to do anything about it. The school management had its own skin to protect, even if in their conscience they knew they were likely to be wrong about your son. For example, the Head had signed a declaration that grade 4 was a “true reflection” of your son’s performance, and he/she will be reluctant to admit that he/she signed a wrong declaration. As for the exam boards and Ofqual, they don’t care; they want to crush students like your son and move on.
Dennis, If Ofqual has finally come clean about grades being “reliable to one grade either way” (I think they should be more honest and say grades are ONLY reliable to one grade either way), perhaps integrity requires the exam boards to put a disclaimer on certificates saying that their grades are only reliable to one grade either way. Perhaps that should be made better known to students, employers and education providers, so that they don’t reject candidates who “miss” a grade? Without that honesty and openness in admitting that grades are unreliable, students might be perceived to have missed a grade and will suffer.
However, I have the impression that for years Ofqual has not tried to make that known. For example, when the Times or the Sunday Times published an article saying that 1/4 of grades are wrong (or something like that, I don’t remember), they successfully forced that newspaper to apologise. Could it be that the newspaper was right?
Huy, Mrs G…
May I echo Huy’s words about your son’s achievement!
And yes, the newspaper was right, yet it was forced to make this retraction:
https://www.thetimes.co.uk/article/corrections-amp-clarifications-pwjqk965s.
The full ruling is here:
https://www.ipso.co.uk/rulings-and-resolution-statements/ruling/?id=06272-19.
The fact that grades are unreliable has been known since at least 2005. Here is an extract from page 70 of a report published by AQA (https://filestore.aqa.org.uk/content/research/CERP_RP_MM_01052005.pdf):
“However, to not routinely report the levels of unreliability associated with examinations leaves awarding bodies open to suspicion and criticism. For example, Satterly (1994) suggests that the dependability of scores and grades in many external forms of assessment will continue to be unknown to users and candidates because reporting low reliabilities and large margins of error attached to marks or grades would be a source of embarrassment to awarding bodies. Indeed it is unlikely that an awarding body would unilaterally begin reporting reliability estimates or that any individual awarding body would be willing to accept the burden of educating test users in the meanings of those reliability estimates.”
The lead author of that report is Dr Michelle Meadows, then at AQA and now Ofqual’s Executive Director for Strategy, Risk and Research.
You are correct, Huy: teachers absolutely knew that their system was flawed, and in my Son’s case they admitted that he was disadvantaged by the system. I fully believe that if a student is able to show, by taking the autumn series resits (he also improved his Chemistry grade from a 6 to a 7), that his teachers were 4 grade boundaries out, then it should be investigated. How is a student meant to have confidence in the very people they trusted for 5 years when this happens? Not only did my Son lose the grades he deserved, but his school told him he couldn’t continue to study Biology at A level, a compulsory subject for veterinary medicine. This was the student who then achieved an 8 after being out of education for 6 months!
The system failed the students who really had to work hard for their grades.
As a parent I hope one day the pain will disappear.
Mrs G,
So the school management’s mutant algorithm derailed your son’s grade and, instead of apologising, the school management used the derailed grade to deny your son’s right to study biology? It makes me sad and angry that in a democratic country with the rule of law and human rights, school managements can treat students like that.
Did he get to study it at a different school?
Dennis,
Thank you. That 2005 article by Michelle Meadows and Lucy Billington is really good. May I quote from the conclusion,
“The need to routinely report reliability statistics alongside grades

Please (1971) and Newton (2003) pointed out that even with high values of the marker reliability coefficient, the proportion of candidates likely to be wrongly graded is likely to be large. Indeed Baird and Mac (1999) reported a meta-analysis of reliability studies conducted by the AEB in the early 1980s to show the relationship between inter-marker reliability measures and the proportion of candidates getting the same grade. They demonstrate that even near perfect reliability estimates of 0.98 are associated with up to 15 per cent of the candidates not achieving the same grade. A reduction in reliability to 0.90, which is still a reasonable figure, saw between 40 per cent and 50 per cent of candidates not receiving the same grade.

As discussed earlier, given the variability in the marker reliability estimates that has been documented, teachers, examiners and the consumers of examination results need to be better informed about the importance and limitations of reliability in the evaluation of attainment. This has been argued for a long time and by a number of authors. As early as 1968, Skurnik and Nuttall voiced concern that awarding bodies issue certificates which conceal margins of error of unstated magnitude. Skurnik and Nuttall cited the good practice of a number of public examination bodies in the USA that attempt to communicate the margin of error inherent in the assessment. They issue the results of tests in the form of a band of scores for each candidate, based upon the standard error, as well as a single score for each person. They also publish the reliability coefficient associated with the examination.

This was also the view held by the Joint Matriculation Board (JMB) in 1969 when it proposed a revision to the A level grade scale that recognised the uncertainty in the measurement. The proposal was taken up by the government but abandoned by the Secretary of State for Education and Science after extensive consultations. The JMB continued to draw attention to its suggestion that “results should be accompanied by a statement of the possible margin of error” (JMB, 1983, p.65-66).”
It’s clear that there is a need to routinely report reliability statistics alongside grades. Why don’t Ofqual and the exam boards do it?
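The relationship quoted above – a marker reliability as high as 0.98 still re-grading a sizeable share of candidates – can seem counter-intuitive, so here is a minimal simulation sketch of the mechanism. The score distribution, the seven-grade boundary spacing and the sample size are all assumptions of mine; the exact percentages it prints depend entirely on those assumptions, but they come out of the same order as the figures Baird and Mac report.

```python
import math
import random

# Assumptions (mine, purely for illustration): candidates' 'true' attainment is
# standard normal; each marker's score is that true value plus independent
# marking noise, scaled so the two markers' scores have correlation rho;
# seven grades are set by fixed boundaries on the score scale.
BOUNDARIES = [-2.0, -1.2, -0.4, 0.4, 1.2, 2.0]

def grade(score):
    return sum(score >= b for b in BOUNDARIES)

def share_regraded(rho, n=200_000, seed=1):
    """Fraction of candidates whose grade differs between two markers
    whose scores are correlated with coefficient rho."""
    rng = random.Random(seed)
    changed = 0
    for _ in range(n):
        true = rng.gauss(0, math.sqrt(rho))           # component shared by both markers
        m1 = true + rng.gauss(0, math.sqrt(1 - rho))  # marker 1's score
        m2 = true + rng.gauss(0, math.sqrt(1 - rho))  # marker 2's score
        changed += grade(m1) != grade(m2)
    return changed / n

if __name__ == "__main__":
    for rho in (0.98, 0.90):
        print(f"marker reliability {rho:.2f}: about {share_regraded(rho):.0%} "
              f"of candidates receive a different grade")
```

The mechanism is the one the report describes: a reliability coefficient summarises agreement across the whole mark range, while a grade change only needs two markers to disagree by a few marks near a boundary, so even ‘near perfect’ agreement still re-grades a worryingly large minority.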
Yes, Huy, my Son had attended the same school for 5 years, receiving 301 positive points to 7 negatives. He sat his mocks in November 2019, just six weeks into academic year 11. In his Biology mock he was 2 marks short of a 6 (we still have the paper). He was also 1 mark short of a 7 in Geography in his November mocks. He applied to the sixth form at the school he had been at to study Chemistry, Biology and Geography. The requirements were that you had a 6 in each of the subjects and also a 5 in Maths to study Biology. My Son had a 6 in Maths, a 6 in Geography and a 6 in Chemistry, so he met all the requirements except for Biology, where the teachers had predicted a 4 but the algorithm had taken his grade up to a 5. So my Son missed out on a place to study Biology, even though we know that not all the places were filled. My Son explained that he wanted to be a vet, but they ignored him and offered him a different subject. He offered to re-sit year 11, which they ignored; he asked if they could offer him a place to study Biology, Chemistry and Geography and then take the re-sit so he could prove he was capable, but they said no. Every single barrier was put in front of my Son on August 20th 2020. He then had to join a different sixth form to study Biology, Chemistry and Geography. He went back to the school and they concluded that they should have lowered his target grades to manage our expectations; his target grades in Chemistry and Biology were both a 7, and he received an 8 in Biology and a 7 in Chemistry when he did the resits in November, after starting a new sixth form doing 3 A levels in the middle of a global pandemic. They also said they rushed the process.
My Son has a hearing disability, so he has to sit exams in a room on his own; however, the data used for his grades was not from assessments taken under those conditions.
Sadly, my Son struggled to eat in September because of the trauma, and he also developed severe eczema all over both his eyelids, which he had never had before. He has now been prescribed steroid cream for it every day and is at risk of steroid-induced glaucoma.
On the positive side, he applied for a place on the RVC veterinary summer school; there were 70 places worldwide, and my Son received one.
As for human rights, my Son has no rights in this situation.