This new blog has been written by Dennis Sherwood, who has been tracking the story of this year’s school exams for HEPI.
Friday, 12 June was the deadline for schools to submit their ‘centre assessment grades’ and rank orders for this year’s GCSE, AS and A-level students, and over the next several weeks those grades will be checked against a statistical algorithm to determine whether or not each submission meets certain criteria. If the result is ‘yes’, the school’s centre assessment grades will be confirmed. But if not, the exam board will over-rule the school’s submission, and determine whatever grades they think fit.
On Monday, 15 June, FFT Education Datalab posted a most informative blog presenting some findings from their recent service in which they checked schools’ draft GCSE grades against the corresponding actual grades awarded in 2019. As FFT point out, their study was based on draft grades, not the grades actually sent to the boards, and compared against only a single year, 2019. But even with those caveats in mind, the results are dramatic. Drawing on data from over 1,900 schools (over half of the secondary schools in England), every GCSE subject they studied was over-bid: the draft grades were, on average, higher than the grades actually awarded in 2019.
If this study is indicative of what is happening at the boards, then Ofqual has a choice. Either to ‘let it be’, on the grounds that ‘this year is special’. Or to enforce the ‘no grade inflation’ policy, so causing the boards to intervene, unilaterally and without consultation, to throw the schools’ grades away and place grade boundaries wherever they wish.
I don’t know what will happen. My hunch is ‘no grade inflation’. Looking back over the decades, the successive year-on-year increase in top grades was trumpeted as ‘proof’ of better teaching, better students, better education policies. Until about 2010, when ‘no grade inflation’ became the mantra, as first implemented in 2012 and maintained ever since. At that time, the Secretary of State for Education was Michael Gove. And his special adviser was Dominic Cummings. Enough said.
To my mind, the over-ruling of many centre assessment grades (that’s more than 25%) would be a great disappointment, and a measure of the failure of this year’s process; a process that, initially, held so much promise. I fear that the outcome will be a pretext for the powerful to say, ‘You had your chance, teachers, to show you could be trusted. And you blew it.’ The authoritarian pendulum will swing to the far right.
And, I fear, the blame will fall on the teachers.
One reason for this is that teachers have been placed right in the centre of the firing line by statements of the form ‘this year, students’ grades will be based on teacher assessments’, which have appeared widely in the press and in the broadcast media: over the last few weeks, I have heard those words on news bulletins on both the BBC and Channel 4. A reader, listener or viewer might therefore reasonably infer that an individual student’s teacher, perhaps in consultation with colleagues at the same school, has the last word on each student’s grade. This in turn has stimulated an important discussion on teacher bias.
The truth, however, is that the grades awarded to students in August will not be those submitted by teachers. Rather, they will be the grades resulting from ‘statistical standardisation’ by which each exam board will ‘make sure that grades are fair between schools and colleges’. The grades to be awarded are those of the exam boards, not the teachers. This truth, however, is not widely known.
And a second reason is that teachers have been working, if not in the dark, then at best in very poor light.
In the press release for Gavin Williamson’s announcement on 20 March confirming this year’s exams would be cancelled, we read:
Ofqual will develop and set out a process that will provide a calculated grade to each student which reflects their performance as fairly as possible, and will work with the exam boards to ensure this is consistently applied for all students. The exam boards will be asking teachers, who know their students well, to submit their judgement about the grade that they believe the student would have received if exams had gone ahead.
This clearly states that teachers will be asked to submit their assessments, and that there will be a centrally-administered process, to be ‘set out’, that will ensure a uniform standard across the country.
The press release also contains a direct quotation from the Minister:
I have asked exam boards to work closely with the teachers who know their pupils best to ensure their hard work and dedication is rewarded and fairly recognised.
I read ‘work closely with’ as implying consultation, co-operation, and listening, so when I wrote my blog the following day, I was optimistic. Yes, asking teachers for their assessments of students, and working closely with them, is showing trust in teachers. Great.
But as the weeks have passed, and as the process has become somewhat clearer – but only somewhat – my initial optimism has been tempered.
My original hope, stimulated by the quotations just cited, was that there would be a dialogue between the schools and the boards after the schools had submitted their assessments. Schools would therefore have the opportunity to explain why ‘Isaac’, an especially talented student, really does merit an A* in Physics, even though the school has never achieved a grade higher than B in the past.
That hope was dashed when Ofqual published their consultation document on 15 April. I was disappointed; but I can understand that an explicit statement that such a dialogue will take place opens the door to ‘optimists’, and that it would require much time, effort and wisdom to distinguish between the legitimate and the fraudulent. The absence of this dialogue is unfair to Isaac, and a blow to trust and integrity – wounds that will need to be healed. But Ofqual’s decision does make some pragmatic sense.
What makes no sense to me at all has been the failure of Ofqual and the exam boards (and the SQA too) to ‘set out’ (to quote the press release of 20 March) the full details of how, precisely, ‘statistical standardisation’ will work.
Yes, Ofqual have made statements such as:
The standardisation model will draw on … historical outcomes for each centre … and will consider, for A level, historical data from 2017, 2018 and 2019 … and for GCSE, data from 2018 and 2019 except where there is only a single year of data from the reformed specifications.
That’s helpful, for it rules out results before 2017 for A level, and rules in only GCSEs graded 9, 8, 7….
Those words ‘draw on’ and ‘consider’, though, are vague. Yes, they do imply that a school’s results, as actually achieved in any subject in the past, will be used to determine a benchmark against which this year’s submission will be compared. But how, precisely?
If I were a Head, I would wish to ensure that the grades my school submits are as close to the exam board’s benchmark as possible, so (presumably) increasing the likelihood that they will be confirmed, rather than over-ruled. To comply with the benchmark, I need to know how the benchmark is computed; I need to know if some clever statistics will be in place so that outliers such as Isaac will be acknowledged. But if I’m not told the rules, I can only guess.
For example, for A-level, I can ‘draw on’ and ‘consider’ the results of the last three years by calculating an average, weighting the results of each of the last three years equally. An alternative, still ‘drawing on’ and ‘considering’ the results of the last three years, is to weight the best year more heavily, and the worst year more lightly, resulting in a more favourable benchmark. Is one acceptable, and the other not? How do I know?
A further ambiguity concerns the year-on-year variability. Suppose that in each of the previous years, my cohort has been 100 students, of whom 22, 26 and 18 were awarded grade A. The (equally weighted) average for this year is 22. But since the number of awards has varied from 18 to 26, any submission in that range is feasible, if not reasonable, and could well comply with Gavin Williamson’s statement that ‘hard work and dedication’ will be ‘rewarded and fairly recognised’. If 24 students are submitted for grade A, and if the exam board’s statistical standardisation determines a benchmark of 22, then the last two students in the rank order will be down-graded unilaterally and without consultation. Even worse: if, in good faith, every school ‘bids up a bit’, then grade inflation is blown sky high, forcing the boards to intervene. That’s exactly what FFT Education Datalab’s findings suggest has actually happened – as is inevitable if schools do not know what the benchmark is, how closely they must comply with it, and that even ‘modest optimism’ is likely to be penalised.
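To make the ambiguity concrete, here is a toy sketch (in Python, purely illustrative – the weights are my own assumptions, not anything Ofqual has published) of two ways of ‘drawing on’ and ‘considering’ those three years of grade-A results:

```python
# Toy illustration of two ways a school might "draw on" three years of
# results to guess its benchmark -- NOT Ofqual's actual model, whose
# details were never published.
grade_A_counts = [22, 26, 18]  # grade-A awards in each of the last three years

# Equal weighting: the simple three-year average.
equal_benchmark = sum(grade_A_counts) / len(grade_A_counts)

# Favourable weighting: best year weighted most heavily. The weights are
# illustrative assumptions, chosen only to sum to 1.
weights_by_rank = [0.5, 0.3, 0.2]  # applied to best, middle, worst year
favourable_benchmark = sum(
    w * c for w, c in zip(weights_by_rank, sorted(grade_A_counts, reverse=True))
)

print(equal_benchmark)       # 22.0
print(favourable_benchmark)  # 0.5*26 + 0.3*22 + 0.2*18 = 23.2
```

Both calculations ‘draw on’ and ‘consider’ exactly the same three years of data, yet they differ by more than a full grade-A place – which is precisely the head teacher’s problem: without the published rules, there is no way to know which, if either, the board will use.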
I’m not arguing against setting the benchmark at the simple average 22. What I am arguing is that I think it would have been very helpful to everyone if the rules had been ‘set out’ in full. That way, every school would have been able to replicate the process in advance. They would therefore know, before the grades were submitted, whether those grades are compliant or not, and so have some awareness of the likelihood of their grades being confirmed or over-ruled. A school could still submit non-compliant grades if it wished, but it would be doing so knowingly.
Likewise, students and parents would know exactly how the process is being conducted, and be absolutely clear that the grades, as awarded, will be determined not by the teachers, but by the exam boards.
What has actually happened, however, is to me unsatisfactory, and could lead to trouble. Given that teachers are being required, in essence, to second-guess the answer-the-exam-boards-have-thought-of-first, have they been set up to fail? Are teachers at risk of being piggies-in-the-middle, unfairly blamed for grades that they did not recommend? When teachers’ grades are seen to have been over-ruled downwards – as the FFT Education Datalab results suggest will happen – will students be disappointed that their teachers’ assessments have been ignored? Will parents become angry at the limited, and highly technical, grounds for appeal?
If the boards already ‘know the answer’, as I suspect they do, then surely it would have been both more honest, and far simpler, for each board to have written to each school saying ‘our statistical algorithm has determined that, for your cohort of 53 students for 2020 A level Geography, you are allocated 4 A*s, 10 As, 15 Bs… Please enter in each grade box the names of the students to be awarded each grade, ensuring that no grade exceeds its allocation’ (that idea, by the way, is not my own, but emerged during a conversation with Rob Cuthbert, editor of SRHE News and Blog).
That would have been much easier for the centres: no worrying about the grades, no agonising over the rank order for all the students. And it would have focused attention on the right place – on deciding which students are to be on which side of each grade boundary, as fairly as humanly possible.
I still believe that this year’s results will be more fair than hitherto, for, fundamentally, the rank orders determined by teachers are, in my opinion, more reliable than the lottery of exam marks. But I also believe that the whole process would have been far better had the rules been published, and even better still if there had been some opportunity for schools to justify their Isaacs.
But if FFT Education Datalab’s findings are indeed a sign of things-to-come, those things could be quite nasty, with teachers, totally unfairly, taking the fall.
Oh dear. What a missed opportunity.
I have some sympathy with Ofqual and the awarding bodies here. With no exams, if people want differentiated awards, the only available evidence is prior attainment or teacher estimates. The former will be absent for some and will penalise late developers. The latter is better, but needs some adjustment to calibrate it against what people understand awarded grades to mean, and to reduce (if not remove) incentives to exploit the system. There isn’t an ideal way of doing this, but I agree that what it is looking like – allocating from a previous pool of awarded grades, controlled against shifts in group prior attainment – is not the best one. The problem is that the random variability of exam grades in a normal year, at the school-by-subject level, will substantially exceed the size of the effects they are trying to control out. Unfairness will necessarily follow. A perfect solution to this doesn’t exist. But the limitations could be better expressed, and there is a strong case for letting grades rise and giving people the benefit of the doubt (if universities were released from treating exam-awarded conditions as binding).
There are two areas where Government has gone wrong in my view. The first is in not using the circumstances as an opportunity to convey that exam-awarded grades are themselves noisy measures of what you are really interested in, rather than an absolute truth. If this were understood, a more sensible discussion about what to do could be had. The second is dogmatically sticking to the calculated grades for university admissions. Deciding who should be confirmed is an entirely different task, for which universities already have to hand the data for considerably superior estimates, and can demonstrate that these work fairly and accurately (https://wonkhe.com/blogs/we-can-make-admissions-work-without-a-levels/#comments). They should have been encouraged to use this to confirm students now and give everyone more preparation time. Not recognising this has heaped unnecessary and unwarranted pressure on students and universities for August, will lead to worse outcomes and will damage equality. This was entirely avoidable and a significant policy failure.
Just to clarify (and I know that’s what Dennis meant): exam boards have always set the grade boundaries anyway, so the fact that they set them this year is not the problem per se.
The difference from previous years is that, in the past, for each subject a board would set the same boundaries for all schools, and to do that it used national data for the students being assessed. This year, for each subject, the boards will be setting one set of boundaries per school, using only that school’s data – and that data is not data on the students being assessed. First of all, for a given subject, data from the past two or three years for a single school is not necessarily fit for the purpose of setting grade boundaries: it is not necessarily representative of the students being assessed, and it might fluctuate so much that it doesn’t represent anything. Secondly, the grade boundaries for different schools might not be the same, which would be fundamentally unfair.
Given these problems, I don’t think we can measure teachers’ success or failure in this process by how well they comply with the grade quotas set by the exam boards. It could be that the board has failed to set the right grade quotas for a subject at a school.
It seems to me that the algorithms will take no account of short term school improvement, so the schools who are currently under Ofsted pressure with grades of 3 or 4, who arguably have the most to lose in the accountability system, will have no opportunity this year to demonstrate progress over time in their outcomes. I hope that future inspections will take this into account when considering the published results for 2020.
Yes, all good points, thank you everyone.
Will Hazell, formerly a journalist at TES and now with the i, has written a piece that sensationalises matters somewhat, but the overall message is much the same…
One way to address the problems is to allow the school to appeal for individual A-level students using those students’ GCSE data, possibly also the GCSE data of the other students in the same grade band that the school has submitted.
That will help to save Isaac from an unfair grade. It will also save any “Isabella” who is truly capable of a grade but is pushed down because the school happens to have a stronger cohort this year.
Clearly the issue here is how to differentiate the truly capable candidates and those whose luck is being pushed by the schools. Arguably doing that for Isaac and Isabella by looking at their GCSE record and their cohort’s GCSE record is going to be more accurate than by looking at Tom, Dick and Harry’s A-levels in the previous 3 years. If Ofqual doesn’t allow that, it looks like it’s failing in its duty to Isaac and Isabella.
The excitement will hit fever pitch when the first school goes against Cummings/OFQUAL’s diktat and releases its predicted grades as a defence against parent anger on 22 or 23 August. Teachers and parents against the government would be a disastrous PR outcome for Boris. Remember most parents are already seething at what they see as a piecemeal response to education by the government since March.
Returning to this thread in August 2020, schools will now indeed be obliged to release predicted grades and rank order to students on results day via FOI Subject Access Requests. Undoubtedly these will reveal that students predicted to pass by those who know them best have been over-ruled (and subsequently failed) by an unproven algorithm. If this somewhat arbitrary application of historical data to “moderate” teacher-predicted grades results in students missing out on university and college places, it could surely become the basis of a challenge at the ECHR under Article 2 of Protocol 1, the Right to Education? Although the UK left the EU on 31 January 2020, my understanding is that the UK remains subject to EU law and the rulings of the European Court of Justice throughout the transition period.
Thank you – how can you make your observation known more widely?
A thought, if I may… revealing rank orders is somewhat problematic…
According to the Information Commissioner’s Office
“An organisation needs to consider if the information they are providing would reveal something about another individual.”
“If a student requests information about their rank order and it could reveal information about other students, you’ll have to decide whether it’s reasonable to disclose this information rather than withhold it.”
What does this mean in practice?
If a cohort is 2, then revealing the rank order to either one “reveals information about other students”, and so maybe is not allowed.
But if the cohort is 50, and 49 students have independently been told their rank orders, and if they talk to one another, then the situation is exactly the same.
So that suggests that a teacher can’t tell 49 students out of 50… which raises the question “how many students can be told?”.
For a cohort of a given size, what is the maximum number of candidates that can, independently, be told their rank orders so that, if they talk to one another, the probability of being able to guess any other student’s rank order is no more than 50%?
Or is the safest thing to say to all students, no matter how small or large the cohort, “on request, you can be told your CAG, but nothing else”?
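For what it’s worth, the ‘how many can be told?’ question has a blunt answer under a deliberately naive model – my own toy assumption that every undisclosed rank looks equally likely to the students comparing notes; real ICO reasoning would of course be subtler:

```python
# Toy model of the rank-order disclosure question -- an assumption-laden
# sketch, not ICO guidance. Suppose k of n students are told their own ranks
# and pool that knowledge; each untold student's rank is then one of the
# n - k remaining ranks, assumed equally likely to the guessers.
def guess_probability(n: int, k: int) -> float:
    """Chance of correctly guessing a particular untold student's rank."""
    return 1.0 / (n - k)

def max_disclosable(n: int, threshold: float = 0.5) -> int:
    """Largest k whose pooled-knowledge guess probability stays within threshold."""
    return max(k for k in range(n) if guess_probability(n, k) <= threshold)

print(max_disclosable(50))  # 48: telling all 49 others pins down the last rank
print(max_disclosable(2))   # 0: in a cohort of 2, neither can safely be told
```

On this model the 50% threshold is breached only at the very end: up to n − 2 disclosures leave at least two ranks unaccounted for, so no other student’s rank can be guessed with better than even odds. Which rather suggests the real difficulty is the principle, not the arithmetic.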
Thanks for your reply and input. I think that a FOI request will suffice for more general centre grade-distribution data, and a second, more personal Subject Access Request will work for an individual student’s CAG and rank position. Incidentally, my understanding is that for any cohort over 15, CAGs have been totally disregarded and standardisation is happening based on automated implementation of prior performance and historical data. This has an additional important and unforeseen consequence – it undermines the right not to be subject to automated decision-making. By taking the “human” element from the decision (the CAG) and leaving it up to the algorithm to determine the grades, the exam boards are failing to safeguard students against the risk that a “potentially damaging decision is taken by solely automated means, ie without human intervention.” This is from the ICO website: “When does the right apply?
Individuals have the right not to be subject to a decision when:
it is based on automated processing; and
it produces an adverse legal effect on, or significantly affects, the individual.
You must ensure that individuals are able to:
obtain human intervention;
express their point of view; and
obtain an explanation of the decision and challenge it.” It could be argued that by removing the “human” element and stopping individual student challenges, Ofqual and the DfE are on dodgy ground. An automated decision that leads to borderline students ultimately failing appears to be something that “similarly affects the data subject”. Potentially denying the right to education, it is just the kind of arbitrary decision the ECHR could judge on. The relevant legislation is in section 14 of the DPA 2018 (https://www.legislation.gov.uk/ukpga/2018/12/section/14/enacted)
Mark! Wow! I had no idea about that hugely important “automaton” point – and I wouldn’t be surprised if a lot of others don’t know about it either.
How can more people learn of this?
Many thanks for raising it here!
Not sure Dennis. I am sure it will become relevant from tomorrow as – unlike Scottish students – students in the rest of the UK will have no right of appeal (the “obtain an explanation of the decision and challenge it” mentioned above). The question is around the arbitrariness of the now-automated decision for cohorts of 15 or more – based, as it is, on comparative outcomes (with a third-year sawtooth-effect bolt-on: the predicted +2% “improvement” on last year’s GCSE cohort). Is Ofqual’s historical (and statistical) preoccupation with preventing grade inflation worth sacrificing this year’s high flyers and borderliners? And legally, is it even permissible?
Hi Mark – and every one else too. I’m writing this on GCSE results day, 20 August. The world seems to be a very different place…
So thank you all for your thoughts and post: many of our concerns have been addressed, and perhaps, collectively, we had some influence too…
You were one of the first commentators not only to point to the oncoming standardisation train-crash but explain both the antecedents and potential consequences. I have enjoyed playing a small part in the discussions here.
hi Mark – thank you; that’s very gracious – and thank you for your contributions too.
Fortunately, your ‘automaton’ point disappeared of its own accord – in this context, anyway… but not in general, I fear… if you haven’t come across this recent book by Durham Professor Louise Amoore…