This blog has been kindly written for HEPI by Dennis Sherwood of Silver Bullet Machine, who has been tracking the state of this year’s public exams for HEPI.
On Friday, Ofqual announced the key principles that will be used to ensure this year’s GCSE, AS and A level grades will be ‘as fair as they can be’. Since there will be no exams, for each subject:
- Each school has to submit, to the appropriate exam board, their suggested ‘centre assessment grade’ for each candidate, and also the rank order of candidates.
- To ensure that grading is fair across the country, the board will then apply a ‘standardisation model’ to compare the submitted centre assessment grades with historical data on the actual grades awarded to that school’s candidates in 2017, 2018 and 2019 for A level and, for GCSE, in those years in which the exams have been graded 9, 8, 7….
- If a school’s centre assessment grades are different from those resulting from the ‘standardisation model’, some or all of the grades will be adjusted before being issued, but the submitted rank order will not be changed.
- Overall, the board will make sure that, at a national level, grade distributions are broadly in line with previous years.
To me, this all makes good sense. The rules are simple. There are no ‘behind-the-scenes’ statistics and the process can be replicated at every school. So teachers can have confidence that their centre assessment grades, submitted in compliance with their historical averages, will have a high likelihood of being confirmed rather than over-ruled. A measure of success of the process is therefore the ratio of confirmed centre assessment grades to the total number submitted, as determined for each school, each subject, each board and overall. The closer this number is to unity, the better.
Ofqual’s key objective is to prevent grade inflation – which is what the fourth bullet point is all about. To achieve that, the distribution of grades for each subject within each school must be the same as the average over recent years, hence the ‘standardisation model’. If this is the case for each school, then the aggregate will work too.
Subject, that is, to one nasty problem, illustrated by this table, which shows the numbers of candidates awarded grade A* at six different schools in each of the last three years:
The total number of A*s awarded each year was 58, 62 and 60, and so from the board’s point of view, ‘no grade inflation’ means about 60 awards this year, and certainly not more than 62.
Each school has an average of 10, and a range of ± 2. If each school submits 10 this year, the sum is 60, which is fine.
But suppose each school says, ‘This has been a good year, and we think 11 students merit an A*. We’d really like to go for 12, but we appreciate that’s pushing the boat out; 11 should be safe.’
So the six schools each submit 11 candidates, giving a total of 66. Even though each school has behaved reasonably, grade inflation occurs. The board must intervene.
But how? Does the board go to each school and ask, ‘Please explain’? In response, the school will indeed explain.
School C, for example, might claim that their data shows they have been on an improvement path, and so their 11 grade A*s are justified. But by the same token, if school C is improving, school D is declining. Should they be awarded only six?
Every school will have a reason why ‘we are a special case’, and these will be impossible to judge fairly. So to me, the most sensible option is for the board to apply exactly the same rule in exactly the same way to everyone, and reduce each school’s number to 10. That’s why the boards need the rank orders, so they can down-grade the lowest-ranked students.
To prevent grade inflation, for every school that submits 11, another has to submit 9. Which just won’t happen. So it’s in everyone’s interests to submit the average, 10.
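The capping rule just described can be sketched in a few lines of code. This is my own toy illustration, not Ofqual’s actual standardisation model: it assumes the board simply trims a school’s proposed A* count back to its historical average, using the submitted rank order to decide which candidates are moved down.

```python
# Toy sketch of the capping rule (an assumption about how a board might
# implement it, not Ofqual's published standardisation model).
# Each school submits a list of proposed grades in rank order, best
# candidate first. If the school proposes more A*s than its historical
# average, the lowest-ranked A* candidates are moved down a grade.

def cap_top_grade(ranked_grades, historical_average, top="A*", next_down="A"):
    """ranked_grades: proposed grades, best-ranked candidate first."""
    capped = list(ranked_grades)
    surplus = capped.count(top) - historical_average
    # Walk up from the bottom of the rank order, downgrading surplus A*s.
    for i in range(len(capped) - 1, -1, -1):
        if surplus <= 0:
            break
        if capped[i] == top:
            capped[i] = next_down
            surplus -= 1
    return capped

# A school that submits 11 A*s against a historical average of 10:
submitted = ["A*"] * 11 + ["A"] * 4
print(cap_top_grade(submitted, 10))  # the lowest-ranked A* becomes an A
```

Under this rule, the eleventh-ranked candidate loses the A* regardless of how good the school believes that candidate to be, which is exactly the mechanism that later catches Isaac.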
But what about poor Isaac at school G? He is particularly gifted at Physics, and his school recommends him for an A*, even though no candidate at the school has achieved above grade B in Physics for years. The submission on behalf of Isaac will easily be identified as an outlier and so is quite likely to be disallowed. Isaac, however, will not be consulted; nor will his teacher. So Isaac will be awarded grade B, consistent with his place at the top of the rank order. He will be a victim, and his school too, for this year’s process traps all schools as prisoners of their pasts.
But before we weep too much on Isaac’s behalf, let us remember that he is just one of the huge number of people disadvantaged (to say the very least) by this most pernicious virus. That is a pity, but many people have suffered far more gravely, and without recourse to the autumn exam at which Isaac can prove his A*++.
And although Isaac is indeed a victim this year, let us not forget the 750,000 annual victims of the exam system in England – those who, in each of the last several years, were awarded a grade lower than they merited. Neither they, nor anyone else, knows this has happened, so they truly are victims of an unseen, unreported and unpunished injustice.
Despite these problems, I still believe that this year’s grades, resulting from the rank orders submitted on the basis of teacher judgement, will be fairer than those based on the rank orders determined by exam marks.
To illustrate this, here is a chart from my simulation of the marking and grading of 2019 A level Economics:
Suppose ten students from the same school take the exam, and their marks run from 55 to 64 inclusive, one candidate on each mark, as shown on the left. Candidate A is given the highest mark; candidate B the lowest. On the right are the results of my simulation of what happens when each script is marked by a different examiner, with all examiners, in both cases, drawn from the same pool. As can be seen, most of the marks are different: not because marking is sloppy, but because marking is ‘fuzzy’ – or, to use Ofqual’s words, because ‘it is possible for two examiners to give different but appropriate marks to the same answer’.
The consequences of fuzziness are dramatic. Candidate B is now ranked fourth, higher than candidate A, who is no longer ranked first, but ninth. Much else has changed too.
If all marks from 53 to 66 fall within the same grade width, then these changes in rank order do not matter. But if there are any grade boundaries within this range, the grades corresponding to the marks on the left are highly unreliable. Which is one of the explanations behind Ofqual’s infamous statement that ‘more than one grade could well be a legitimate reflection of a student’s performance’.
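To make the fuzziness argument concrete, here is a toy simulation in the same spirit as the chart. It is my own sketch, not the author’s Economics model: the spread of marks between the named candidates A and B, the size of the examiner-to-examiner variation (up to three marks either way) and the grade boundary at 60 are all invented for illustration.

```python
import random

random.seed(0)

# Candidate A has the top mark (64), candidate B the bottom (55); the
# remaining eight fill in 56-63 (an assumption - the post names only A and B).
candidates = list("ABCDEFGHIJ")
first_marks = dict(zip(candidates, [64, 55, 56, 57, 58, 59, 60, 61, 62, 63]))

# 'Fuzzy' re-marking: a second, equally legitimate examiner's mark is
# assumed to differ by up to three marks either way.
second_marks = {c: m + random.randint(-3, 3) for c, m in first_marks.items()}

def rank_order(marks):
    """Candidates sorted from highest mark to lowest."""
    return sorted(marks, key=marks.get, reverse=True)

BOUNDARY = 60  # an assumed grade boundary

print("first ranking: ", rank_order(first_marks))
print("second ranking:", rank_order(second_marks))
flipped = [c for c in candidates
           if (first_marks[c] >= BOUNDARY) != (second_marks[c] >= BOUNDARY)]
print("candidates whose grade flips across the boundary:", flipped)
```

Running this repeatedly with different seeds shows the point of the chart: the rank order is rarely preserved, and any candidate within a few marks of the boundary can find their grade determined by which examiner happened to mark their script.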
The rank orders used to determine GCSE, AS and A level grades have been unreliable for about a decade. Surely the rank orders based on teacher assessments will be more reliable, and the resulting grades therefore fairer.
And even if poor Isaac is a victim of this year’s process, there will be far fewer victims than in previous years. That’s why this year’s process is very likely to give fairer results than hitherto – and more trusted too, especially if all schools submit centre assessment grades that comply with the rules, so that the ratio of confirmed grades to the number of grades submitted is nearly one.