This blog was kindly contributed by Dennis Sherwood and Rob Cuthbert. Dennis runs the Silver Bullet Consultancy and is a frequent contributor to the HEPI blog. Rob Cuthbert is an independent academic consultant and Emeritus Professor of higher education management. @RobCuthbert
On 11 July, the Education Select Committee, chaired by Robert Halfon, published a report entitled “Getting the grades they’ve earned – Covid-19: the cancellation of exams and ‘calculated grades’”. It is a compelling read.
Recognising the urgency and significance of a number of issues associated with this year’s grades, the Committee has published its first report even while it is still taking evidence on other matters. If effectively implemented, the Committee’s recommendations will ensure that more students are awarded a fair grade than would have happened otherwise. That said, there are still some further problems to address.
For this year’s GCSE, AS and A-levels, schools were asked in June to submit, for each subject, proposed ‘centre assessment grades’ for each student, as well as a top-to-bottom rank order. Since then, the exam boards have been carrying out ‘statistical standardisation’ to ‘make sure that grades are fair between schools and colleges’, and ‘that at a national level, grade distributions are broadly in line with other years’.
If the results of the standardisation algorithm do not match a school’s submitted grades, the exam board, unilaterally and without consultation, will over-rule the school’s recommendations, and ‘slice’ the school’s rank order as they wish to determine the grades to be awarded in August.
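Ofqual has not published how this ‘slicing’ works, so the following is only a toy sketch of the general idea: walk down the school’s rank order handing out each grade until a target quota, derived from historic results rather than the submitted grades, is used up. The function name, the students and the quotas are all invented for illustration.

```python
# Hypothetical sketch of 'slicing' a rank order against a target grade
# distribution. Ofqual's actual algorithm is unpublished (the article's
# central complaint), so the mechanics here are assumed, not confirmed.

def slice_rank_order(ranked_students, target_counts):
    """Assign grades by walking down the rank order, handing out each
    grade until its target quota is exhausted."""
    grades = {}
    position = 0
    for grade, count in target_counts:  # best grade first
        for student in ranked_students[position:position + count]:
            grades[student] = grade
        position += count
    return grades

# A school submits a rank order; the target counts come from the
# school's historic grade distribution, not its submitted grades.
ranked = ["Amy", "Ben", "Cal", "Dee", "Eli", "Fay"]
targets = [("A", 1), ("B", 2), ("C", 2), ("D", 1)]
print(slice_rank_order(ranked, targets))
```

Note what this mechanism implies: only one student can receive the A, whatever centre assessment grades the school proposed, which is why the rank order, not the proposed grades, ends up determining the outcome.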
For this process to be trusted, three conditions must be fulfilled:
- Teachers must submit unbiased information.
- Statistical standardisation must be seen to be fair, reliable and trustworthy, so that every individual student is confident of being awarded the grade he or she truly merits.
- There must be an easily accessible appeals process, providing a safety net for students with robust evidence supporting their belief that their awarded grade is lower than they merit.
Should any of these conditions not be fulfilled, there will be trouble, as the International Baccalaureate has already shown – to quote a recent headline “IB results ‘scandal’: Why 2020 grades have sparked fury”.
The most significant recommendations relate to statistical standardisation, for it is this process that determines candidates’ grades. But no one knows how; and there have been fears that the algorithm might unfairly disadvantage, for example, schools with small cohorts, schools on a steady improvement path, and bright students in schools with relatively weak past performance:
Ofqual’s decision not to include trajectory in their standardisation process was criticised in the Sutton Trust’s written evidence, which suggested that the ‘turnaround’ schools disadvantaged by this decision “are likely to disproportionately serve poorer communities”.
A submission from University College London’s Centre for Education Policy and Equalising Opportunities warned that the use of historic performance data for standardisation could penalise “atypical” students such as high achievers in historically low-performing schools.
Ofqual’s failure to publish the details has been strongly criticised, for example:
However, little detail has yet been published on Ofqual’s model, and we agree with the Royal Statistical Society’s conclusion that “more transparency” is needed urgently.
Having considered these, and many other, written submissions, as well as taking oral evidence, one of the Committee’s key recommendations is that:
Ofqual must be completely transparent about its standardisation model and publish the model immediately to allow time for scrutiny. In addition, Ofqual must publish an explanatory memorandum on decisions and assumptions made during the model’s development. This should include clearly setting out how it has ensured fairness for schools without 3 years of historic data, and for settings with small, variable cohorts.
This specifies two requirements: Ofqual must publish the details of statistical standardisation ‘immediately’, and also ensure fairness, a concept of many dimensions.
Fairness between one year and the next is supposedly addressed by the standardisation model, which fixes the national distribution of grades. But that kind of collective fairness not only denies the possibility of generational improvement, it may also still allow collective unfairness to particular groups, especially the disadvantaged and some minority ethnic groups. And it may involve massive individual unfairnesses as the model downgrades many students without consultation. In France, for example, the government policy in the exceptional circumstances of this year has been to avoid penalising students taking the French Baccalaureate, allowing some grade inflation rather than risking many students getting grades lower than they merit.
Fairness may also involve issues of bias or discrimination, and the report pays particular attention to the possibility that BAME students, those with special educational needs, and those on free school meals might be treated unfairly.
Since the algorithm has so far remained hidden, we cannot know how or whether any bias is embedded either in the algorithm or the way it uses historic data. It is also possible that there is bias within this year’s centre assessment grades, especially given the known bias in predicted grades for university entrance. This, however, is a distraction. The context for this year’s centre assessment grades is very different from university entrance, as has been made clear in Ofqual’s Guidance Notes; more importantly, centre assessment grades are surely irrelevant in that some can, and will, be over-ruled by statistical standardisation if an exam board so chooses. Indeed, we wonder why they were asked for in the first place – all the exam boards need is the student rank order.
The rank orders – which the exam boards have committed not to change – are fundamental to this year’s process. If a teacher, deliberately or unconsciously, has been biased, the place to look is in the rank order, which is nothing to do with the boards or with Ofqual, but the school’s responsibility, alone. The quest for bias might therefore end up quite close to home, which might be uncomfortable, but is probably not surprising.
So what can be done by students who believe they have been treated unfairly?
One recourse is to the autumn exams, but this option is fraught with problems.
In the understandable rush to introduce a completely new system, after the Secretary of State’s announcement on 20 March, it probably seemed reasonable at first to invent a system in which dissatisfaction could be tackled by an opportunity to take an autumn examination. Over time this choice has unravelled. If initial results match the allowed national distribution and autumn exam candidates succeed in achieving higher grades, then grade inflation is bound to follow – unless other candidates are downgraded, which is unthinkable. Are autumn exam candidates being set up to fail? Or will the August results be scaled down to allow some headroom in the national distribution?
Furthermore, students sitting autumn exams face a compulsory gap year, because the exams will be too late for a 2020-2021 start. This in itself may be discriminatory, especially for disadvantaged students. The impact of autumn-awarded grades on admission prospects for 2021 is uncertain. Some universities are refusing deferred entry for 2021, others will honour offers but with added conditions. The competition for 2021 entry is likely to be much more intense as 2020 students reapply, a larger 2021 cohort apply for the first time, and international students from 2020 and 2021 return in much larger numbers.
A different, and more immediate, recourse is to a fit-for-purpose appeals process, which the currently-defined process surely is not:
Centres will be able to appeal against the operation of the standardisation model where the wrong data was used to calculate results for learners, or where there was an administrative error in the issuing of results by an exam board.
This is narrow, technical, and – importantly – an appeal against the process, not against the outcome. If a student believes he or she has been awarded the wrong grade, the process by which that grade was determined is neither here nor there: it’s the outcome that matters. And to deny an appeal against an unfair outcome using the defence that a perverse process was conducted fully in accordance with its own flawed rules flies in the face of natural justice.
The Select Committee has made laudable recommendations to make it easier for the less “well-heeled and sharp-elbowed” to appeal, but the grounds for appeal remain technical and in practice impossibly narrow. Far better would be a broader basis for appeals on the grounds that the awarded grade is believed to be unfair, regardless of the wealth, ethnicity or any other characteristic of the candidate. There needs to be convincing evidence, but the principle of such an appeal is fundamental. This would not be an appeal against academic judgment; on the contrary, it would be an appeal to restore the academic judgment of teachers overturned by a statistical algorithm.
There are two nearby precedents. In Scotland, the SQA recently confirmed that the appeals process will be free, and allow for ‘further, evidence-based consideration of grades if schools and colleges do not think awarded grades fairly reflect learner performance’. And the Republic of Ireland has a three-stage process, of which the third is ‘If the student remains unhappy with the outcome after stages 1 and 2 he/she can seek a review by Independent Appeal Scrutineers’.
Looking ahead, there is a danger that, in trying to meet the Committee’s recommendations, and to preserve the precious national grade distribution, Ofqual will merely tinker with their model to prevent the worst excesses of unfairness to high-profile groups such as socioeconomically disadvantaged and BAME students, with the price being paid by others – who will indeed be discriminated against but are considered more ‘expendable’. We need instead some modest relaxation of the policy of “no grade inflation” to compensate for all the individual unfairnesses.
The consequences would be minimal. For university entry, most universities will be sympathetic because they will be short of students – the trough of the 18-year-old demographic and major loss of international students. For broader employment purposes in the longer term, young people this year and for the next few years are very likely to be disadvantaged by the recession; the exam system need not make it even worse.
The Select Committee’s report is direct, relevant and hard-hitting. But it does not bring this important episode to a close.
When we can see inside the statistical standardisation black box we will know more about its inherent unfairnesses, especially those attributable to an overarching requirement to enforce “no grade inflation” even at the expense of fairness to individual candidates. Those unfairnesses will require attention.
And we need an appeals process which is easily accessible, broader, and – most importantly – fairer.
The authors gratefully acknowledge the valuable and insightful contributions made by Huy Duong, Mike Larkin and George Constantinides to the lively discussions that preceded the writing of this blog.
Thank you for an excellent analysis. May I have a few quibbles?
Quote 1: “Fairness between one year and the next is supposedly addressed by the standardisation model, which fixes the national distribution of grades”. Even with the caveat “supposedly” this is still too generous towards the model (which I appreciate is not the intention of the authors) and could be misleading. Making the 2020 grade distribution the same as previous years’ at a centre-subject level, as the model attempts to do (albeit with some tolerances that are not disclosed to the public), does not mean fairness between 2020 and those years, and does not mean fairness between centres either. Specifically, the problems of statistics with small numbers mean that the model does very little to ensure that an A awarded in A-level maths at a school in 2020 is at a similar level to an A awarded at the same school in 2017-2019, and very little to ensure that an A awarded in A-level maths at a school in 2020 is at the same level as an A awarded at a different school in the same year. As such, “standardisation” is a term that the model hardly deserves.
Quote 2: “But that kind of collective fairness not only denies the possibility of generational improvement…”. Generational improvement is only slow, and the model has already allowed for that by taking the cohort’s previous attainment into account (KS2 for GCSE and GCSE for A-level). However, the more serious problem is that the random variation from year to year for A-levels at typical comprehensive schools is likely to be far greater than the generational improvement trend, and that no statistical modelling can adequately deal with this variation for A-levels at typical comprehensive schools. As such, the model is likely to fail for A-level students at typical comprehensive schools, especially for students at the two ends of the grade distribution bell curve.
Quote 3: “it may also still allow collective unfairness to particular groups, especially the disadvantaged and some minority ethnic groups”. I think the model is blind to ethnicity, and if so then rightly so. Any unfairness to any ethnic group will have happened at the ranking stage, and there is not much the model can do about it. It wouldn’t be right to attempt to use the model to correct for potential ethnic bias, which is unquantifiable.
Thank you so much for writing this article, I am a year 13 student and this article perfectly highlights every single one of my concerns with the “calculated grading” process, specifically the “statistical standardisation”, and with the Autumn exams.
Ofqual’s standardisation model manifestly fails to account for the high academic ability and previous achievements of students who are outliers with respect to the current and previous cohorts at their school. This fundamental flaw in Ofqual’s process is still not getting the recognition in the mainstream media that I strongly feel it should. I agree with the previous comment that the random variation between cohorts is a very important factor. Adjusting for cohort prior attainment (at GCSE level) does not suffice: for this to help high-achieving students in Year 13, the whole (or a significant proportion of the) cohort would have had to achieve better GCSE results than previous cohorts, and the effect of a few outlier students with high GCSE results would be negligible. The only plausible way I can think of to account for this would be to look at the expected progression given a student’s prior attainment at an INDIVIDUAL – not centre – level.
As I understand it, the Autumn exams are potentially problematic as those who will take the Autumn exams will not be a “normal cohort” and this has the likely potential to skew the exam results, with a greater proportion of higher achieving students choosing to sit the Autumn exams, which will result in these exams being relatively more difficult than those in previous years. Given the previous feats of Ofqual I have absolutely no faith, at this point, that these exams are going to be fair.
Ofqual has become so set on achieving national consistency and a uniform grade distribution compared with previous years that it comes at the cost of natural justice. The fact that I fear that, whatever teacher-assessed grades or ranking I am given, it will be theoretically impossible for me to meet my offers (due to the prior attainment at my school) is simply not good enough on behalf of Ofqual.
Even if Ofqual aren’t going to change their process, they should feel an obligation to officially recognise this issue and liaise directly with universities to inform them of the unfairness of their grading process and recommend that they are flexible and/or hold offers open for students to take the exams in the Autumn. Also, Ofqual should introduce an appeals process where teachers can submit evidence (e.g. mock results) on behalf of students to evidence that the teacher-assessed grades they gave were a result of that student being an outlier not “over-generous” or “over-inflated”. If Ofqual decide against introducing a process like this, I strongly feel, if the “statistical standardisation” has bumped down a student’s grades by e.g. 1 grade or more, the student’s universities should be informed (by teachers or Ofqual) and told of the extreme flaws in the standardisation model.
The ranking system is very flawed: there is much room for bias and favouritism within it. Within a large comprehensive you can have many pupils who are working at the same level, and it is then at a personal level that a teacher ranks them. There are also concerns with SEND pupils. Teachers are supposed to grade such pupils as to how they would perform with additional measures in place, but in many instances few assessments will have been done with the special measures in place. I don’t think SEND pupils should be subjected to moderation, as the way their grading would have been worked out would be different from the cohort as a whole. For example, if a pupil would have had 50% extra time and rest breaks in their exams, they wouldn’t have had ample opportunity to show their performance under those conditions except perhaps in mocks. Teachers will have had to work out their likely grade under the optimised exam conditions. It would be impossible to put them into the ranking in a fair manner for themselves or other pupils. Without the firm track record, their performance is more likely to be underestimated. I wonder if there will be means for SEND pupils to appeal if they feel that the awarded grades don’t reflect how they would have performed with extra measures in place.
Hi Huy, William and Renata – Thank you for these most articulate and powerful comments.
A ray of hope, perhaps: I’ve just read about a change to the appeals process for IB in which they will work with schools to review what they are calling “extraordinary cases” – https://www.tes.com/news/coronavirus-ib-speak-each-school-about-grading-concerns?fbclid=IwAR1bKd1AU8hxRB80TOQGKdXUp_a51PMxlzhVC3r3OBRu5BlwF5ggMyF6F0Y
The IB people have been forced into doing that as a result of the huge row following the announcement of the IB results. (https://www.tes.com/news/coronavirus-ib-results-scandal-what-you-need-know)
Maybe Ofqual will be smart enough to change their rules before the GCSE and A level results are out in August.
Hi William and Renata Hamvas,
Yes, Ofqual’s approach has, among other problems, the two serious ones that you mentioned. Theoretical grade distributions are bell-shaped. At the top and bottom, the numbers are small, especially for A-levels, so although teachers can rank the students well there, the numbers are too small for sensible statistical modelling, and the so-called standardisation is likely to fail. In the middle of the distribution the numbers are larger, especially for the main GCSE subjects, so statistical modelling has a better chance of working there; but, as the Royal Statistical Society has warned, the teachers’ ranking might be unreliable in that same area. That warning makes sense: if you are a teacher of a class of 30, you can probably rank a handful of students at the top and a handful at the bottom well, but for those in the middle you probably think in terms of ‘he’s a high B, she’s a low C, he’s a high D’, rather than a fine-grained ranking of every student. Additionally, a large cohort might be taught in multiple classes by multiple teachers, so producing a single ranking for a centre-subject means making compromises that involve students you don’t teach, and it’s bound to be unreliable.
So both the notions of centre-subject ranking and standardisation by using centre-subject level historical data are too simplistic and too optimistic.
We have a warning from the IB results, the question is what will be done about it for A-levels and GCSEs. Earlier this year we also had a warning from what’s happening in other countries, but we still stumbled our way to worse infection and death rates than theirs.
It seems Ofqual’s attempts to be “fair” have had unintended side effects.
The arguments above clearly describe how, under the chosen process, many individuals who are not taking the usual exams this year will be disadvantaged by being given a grade that does not reflect their individual ability.
The only fair way to deal with this is to allow flexible, evidence-based appeals to prevent injustice.
A report on TES, posted late on the afternoon of 15 July, states that Ofqual have refused to comply with the requirement of the Select Committee to publish the details of their statistical standardisation process (https://www.tes.com/news/coronavirus-ofqual-will-not-confirm-when-grading-details-published).
That is startling.
And makes it even more important for the appeals process to be changed to allow appeals on grounds of unfairness.
France rightly recognised that it is a choice between either more grade inflation overall or injustice for a larger proportion of students. Specifically, the more weight you put on the teachers’ predicted grades the more likely you are to have grade inflation, and the more weight you put on your invalid use of statistics (eg when the numbers are too small), the more likely the model will give wrong grades. So the more weight you put on invalid use of statistics, the better your appeal process needs to be – but even so you will still have some grade inflation because the students who are wrongly given grades that are too high are not going to appeal for correction, while those who are wrongly given grades that are too low will appeal, so that selection bias will lead to grade inflation.
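That selection-bias argument can be illustrated with a toy model. All the numbers below are invented for illustration: grades are treated as integers, the model is assumed to mis-grade each student by at most one grade in either direction with equal probability, and every under-graded student is assumed to appeal successfully while no over-graded student appeals.

```python
# Toy illustration (assumed numbers) of selection bias in appeals:
# if a model mis-grades symmetrically, but only the under-graded
# students appeal and are corrected, the average awarded grade
# drifts upward, i.e. some grade inflation is unavoidable.
import random

random.seed(1)
N = 10_000
true_grades = [random.randint(3, 8) for _ in range(N)]  # numeric grades

# Model awards each student their true grade -1, 0 or +1, equally likely.
awarded = [g + random.choice([-1, 0, 1]) for g in true_grades]

# Only students graded below their true grade appeal; appeals succeed.
after_appeal = [t if a < t else a for t, a in zip(true_grades, awarded)]

err_before = sum(awarded) / N - sum(true_grades) / N
err_after = sum(after_appeal) / N - sum(true_grades) / N
print(f"average error before appeals: {err_before:+.3f}")
print(f"average error after appeals:  {err_after:+.3f}")
```

Before appeals the errors cancel on average; after appeals roughly a third of students (those marked down) gain a grade, so the average awarded grade ends up about a third of a grade above the true average – inflation produced purely by one-sided appeals, exactly as the comment describes.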
The problem is that Ofqual keeps up the pretence that it can both deliver fair grades and avoid grade inflation, when it can’t. This is probably why it refuses to subject its model to public scrutiny: the public would see that it can’t deliver. It wants to keep up the pretence and go for a fait accompli. Whether that fait accompli will be some grade inflation or a lot of injustice, it doesn’t want the public that it’s supposed to serve to know.
Very interesting reading, thank you all – my question, I suppose, is a quasi-philosophical one. It seems that many are deeming the choice to be a binary one: control grade inflation, or injustice for many. It is perhaps helpful also to recognise that grade inflation is not necessarily fair, or fairer.
Could a system be “fairer”? Of course, but can it be perfect? No (I hope the contributors, whatever our politics, would agree). Any system that predicts grades without actually assessing performance will be unfair. At universities I have witnessed many a student cry foul over grades given by their own lecturers; some such complaints have indeed reached the law courts. Are teachers’ actual and/or predicted grades fair, or fairer? From the perspective of students who feel marked down, clearly not – but notably also not to those who feel their classmates have been awarded better marks than actually deserved. Worse, when we drill into these student complaints we too often see allegations of lack of teaching support, poor facilities, not adhering to the curriculum and ineffectual teaching. How should these environmental factors be taken into account, if at all? How does one produce such evidence in the light of the increasing calls for “evidence” to help decide a student’s grades? An impossible endeavour, perhaps?
Indeed, there are those amongst us who would probably argue assessments are unfair, period.
Perhaps an objective solution is for schools to provide a record of what the student has learnt and been able to demonstrate, by means of learning outcomes? But then on the other hand the university admissions system (and employment opportunities) are almost entirely oriented towards grades. Globally, there is also no appetite to abolish grades (at least that is my perception). And I believe there will be many who would argue that the abolition of grades is unfair too.
Can anyone (the government, ofqual, ofqual critics, government critics, schools, universities, employers, parents and children) actually win trying to construct a model of fairness? Perhaps not. As with the trend of the day, this seems increasingly to be a political and politicised tussle.