This guest post has been kindly contributed by Huy Duong, who has been described by the Guardian as ‘the father who foresaw A-level algorithm flaws’. Huy thanks Professor George Constantinides, Professor Rob Cuthbert, Professor Mike Larkin and Mr Dennis Sherwood for important discussion and help in writing this article.
Ofqual’s calculated grades, which have been scrapped, were flawed for at least four reasons.
- First, the input for the grade calculation consisted of the Centre-Assessment Grades and rankings, which inevitably contained some errors.
- The second input was historical performance at the school-subject cohort level, which has only a weak association with the ability of the 2020 cohort and can, furthermore, be volatile.
- The third input was the 2020 cohort’s prior attainment, but Professor George Constantinides found that corrections for prior attainment are made in a dubious way, resulting in anomalous grades.
- Finally, the appeal procedure to protect the students against these three fundamental flaws was itself flawed.
Notwithstanding those flaws, the bottom-line question was, ‘How confident is Ofqual that its algorithm does not downgrade the wrong students?’, or alternatively, ‘What is the probability that an awarded grade is correct?’ The answers to these questions are vital for public debate and policy making. Yet Ofqual only disclosed them to the public on A-level results day, in Awarding GCSE, AS, A level, advanced extension awards and extended project qualifications in summer 2020: interim report. Did it disclose such information to the Department for Education for its decision making? Strangely, the Education Secretary pleaded ignorance. That seems wrong on many levels.
In testing against grades awarded by exams, Ofqual’s best model got A-level Biology grades wrong around 35% of the time and French grades wrong around 50% of the time, while for GCSEs, it awarded around 25% wrong Maths grades and around 45% wrong History grades.
Ofqual’s interim report (Figure 7.25 on p.81) shows the probabilities that 2018 exam grades in different subjects were correct. For example, a 2018 Maths grade had a 96% chance of being correct, compared with 74% for Economics and 61% for English Language – see Dennis Sherwood’s latest HEPI blog for more information. Superimposed on that figure, for comparison, is Ofqual’s estimate of the probability that a 2020 grade awarded by its chosen model is correct. Alarmingly, this probability ranges from 50% to 75%, implying that Ofqual’s 2020 grades had a 25% to 50% chance of being incorrect.
The accuracy of Ofqual’s 2020 grades depended on the subject and the cohort size. For example, 2020 A-level Biology grades awarded to a cohort of 49 students had almost a 35% chance of being incorrect compared to grades awarded by exams, but if the cohort size was just 24, that figure rose to 45%.
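To put those percentages in concrete classroom terms, a rough expected-value calculation (a sketch using the error rates and cohort sizes quoted above, not Ofqual’s actual model) shows how many students in a single class might have received the wrong grade:

```python
# Illustrative arithmetic only: expected number of incorrect grades
# in one cohort, given the error rates quoted in the article.

def expected_wrong_grades(cohort_size, error_rate):
    """Expected number of students awarded an incorrect grade."""
    return cohort_size * error_rate

# A-level Biology, cohort of 49, ~35% chance a grade is incorrect
print(round(expected_wrong_grades(49, 0.35)))  # ~17 students

# The same subject with a cohort of 24, ~45% chance of error
print(round(expected_wrong_grades(24, 0.45)))  # ~11 students
```

In other words, in a typical Biology class of this size, roughly a third of the students could expect an incorrect grade.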
Therefore, even Ofqual’s best model significantly worsened grade accuracy for most A-level subjects when the cohort size was below 50, which is common (almost 62% of the total in 2019). For GCSEs, even with larger cohorts, the best model would have worsened the grade accuracy for Maths and Sciences. A very conservative figure of 25% wrong grades would have amounted to 180,000 wrong A-level grades and 1.25 million wrong GCSE grades.
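The headline figures follow from simple arithmetic. The entry totals below (roughly 720,000 A-level grades and 5 million GCSE grades) are an assumption inferred by working backwards from the article’s own numbers, not figures taken from Ofqual:

```python
# Back-of-envelope check of the headline figures. The entry totals
# are assumptions implied by the article's 180,000 / 1.25 million
# figures at a 25% error rate.

ERROR_RATE = 0.25  # the "very conservative" 25% figure

a_level_entries = 720_000    # assumed total A-level grades awarded
gcse_entries = 5_000_000     # assumed total GCSE grades awarded

print(int(a_level_entries * ERROR_RATE))  # 180000 wrong A-level grades
print(int(gcse_entries * ERROR_RATE))     # 1250000 wrong GCSE grades
```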
With so many wrong grades awarded in 2020, Ofqual was never going to maintain the currency and integrity of grades anyway. Reducing grade inflation (for example, from 12% to 2% for A-levels) would have meant very little against the backdrop of grade inaccuracies. Ofqual’s claim of making grades consistent between schools and between years is also questionable when so many grades would have been wrong. Very little would have been achieved, at the cost of injustice, disruption and distress for hundreds of thousands of students, and of teachers’ loss of faith in Ofqual and the Department for Education.
In the end, the ill-advised ‘standardisation’ left two legacies.
- The first, ironically, is an increase in grade inflation due to the upgrading that took place during its operation.
- The second is some Centre-Assessment Grades that are wrong – for example, because some teachers consciously or subconsciously tried to make the 2020 grade distribution similar to historical data – which the students might not be able to appeal against.
There have always been two extremes for this year’s grading: either aggressively keeping grade inflation down to a few percent at high risk of injustice, or using CAGs, which entails higher grade inflation but lower risk of injustice. There should have been a rational debate to find a compromise point and to devise a safety net for those failed by that compromise. Instead, the Department for Education and Ofqual’s dogmatic insistence on the first extreme, their bluster and their lack of transparency made that debate impossible and led the country into the crisis.
After some inept handling of that crisis, they collapsed and lurched to the second extreme, still leaving students without a well-designed appeal process as a safety net against that extreme’s limitations.