This blog was kindly contributed by Dennis Sherwood, who has been tracking the goings on at Ofqual in relation to this year’s public exam results for HEPI. This is the ninth in the series as we rapidly approach the release of the GCSE, AS and A level grades in the next three or four weeks. You can find Dennis on Twitter @noookophile
The slide pack presenting the key messages from Ofqual’s 21st July Summer Symposium is an important document. Its purpose – presumably – is to reassure students, teachers, parents, carers, colleges, universities and employers about the robustness and fairness of this year’s exam-free GCSE, AS and A level grades, to be announced on 13th and 20th August.
Slide 12, reproduced here, is of particular importance: it is the evidence that the grades teachers have been asked to submit – Centre Assessment Grades (CAGs) – have resulted in a distribution significantly more ‘optimistic’ than recent history, so driving grade inflation.
That is important, for “no grade inflation” has been a cardinal feature of Ofqual’s policy since 2012. As a consequence, the exam boards, unilaterally and without consultation, will over-rule the schools’ submitted grades: in August, the grades that candidates receive will be those determined by the results of Ofqual’s statistical, and largely historically-based, model, the details of which remain obscure.
To limit grade inflation, it is therefore highly probable that many – perhaps as many as 30% – of the submitted grades (which number rather less than six million) will be down-graded. 30% of six million is about 1.8 million. That’s a far larger number than the number of candidates (remembering, of course, that the great majority of candidates sit more than one subject), so it’s very likely that every young person in the country will be awarded at least one grade lower than the grade their teachers submitted on their behalf. That could lead to trouble – big trouble – especially since the current rules for appeal are narrow and technical, and (unlike in Scotland) do not allow for appeals on the grounds of unfairness.
Given the significance of this slide, it should be clear, and contain complete and correct information.
With those thoughts in mind, this is my ‘examiner’s report’.
Point 1 – What do the columns mean?
It is good practice, as recommended in the Government’s own guidelines, to ensure all data items – such as the columns and rows in a table, and the axes of a graph – are clearly defined. What do the columns mean?
Anyone reading the chart might answer ‘the percentage of students awarded the corresponding grade’.
If that is the case, then 91.0% of 2019 A level candidates were awarded grade D. That is unlikely. So the columns must represent something else, for example – and I’m guessing, for the chart doesn’t tell me – the cumulative percentage of students awarded the grade identified at the top of the column, and all higher grades. That’s plausible, for 91.0% of A level candidates might well have been awarded one of grades A*, A, B, C or D.
But that’s not obvious, and it’s easy to make the wrong inference, as, for example, did the journalists in The Telegraph and Schools Week (in the original piece, subsequently corrected).
It’s also highly misleading, for a quick glance might result in the reader thinking that, for A level, 13.4% of grade Bs were overbid. This is not the case. The percentage of candidates actually awarded grade B in 2019 was 26.1% of the total 2019 cohort; the corresponding number for this year’s CAGs is 27.2%. The difference is only 1.1%, not 13.4%. But I had to work that out for myself.
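To make the distinction concrete, here is a minimal Python sketch (mine, not Ofqual's) that recovers the per-grade shares from a cumulative "this grade and above" row. The example figures are the published 2019 GCSE cumulative percentages quoted later in this piece (20.6% at grades 9–7, 67.0% at 9–4, 98.3% at 9–1):

```python
def marginal_from_cumulative(cum):
    """Convert a cumulative 'this grade and above' row of percentages
    into the percentage awarded each individual grade band."""
    return [c - p for p, c in zip([0.0] + cum[:-1], cum)]

# 2019 GCSE cumulative percentages (grades 9-7, 9-4, 9-1), per JCQ
cumulative = [20.6, 67.0, 98.3]
per_band = [round(x, 1) for x in marginal_from_cumulative(cumulative)]
print(per_band)  # [20.6, 46.4, 31.3]
```

Differences between two cumulative rows (the 13.4% on the slide) and differences between the per-band shares they imply (the 1.1% above) are quite different quantities, which is exactly why the slide needed to say which it was showing.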
Point 2 – What do the numbers mean?
In presenting data, there is no place for the ‘obvious’. That the entries in the tables are percentages, rather than just numbers, is ‘obvious’ only to ‘the initiated’. The table should not be presented on the assumption that only ‘the initiated’ will look at it: the nature of the numbers should be explicitly stated so that everyone can immediately understand.
Point 3 – What is the third row?
The third row in each of the three tables is not defined. I need to work out for myself that it is the result of subtracting the number in the first row from the corresponding number in the second row in the A level and GCSE tables; but, confusingly, the second from the first in the AS table (of which more shortly).
Point 4 – What does the third row mean?
Looking across the third row in each of the three tables, there is a steady rise, followed by a steady fall. What is the significance of that pattern? What does the third row actually mean?
Point 5 – The AS table
The rows in the AS table are the other way around. That’s not only sloppy, but also, as noted in point 3, highly confusing. And why are the row heights different in each of the three tables? That’s pretty sloppy too.
Point 6 – The GCSE table
Why are there some blanks? Are the corresponding entries zero? This is very confusing too, especially since the CAGs row is complete.
Point 7 – The A level comparison is misleading
As far as I understand this year’s process, the statistical standardisation algorithm is likely to compare A level CAGs to the corresponding average over 2017, 2018 and 2019. So why is only 2019 shown as the comparison on Slide 12? Why not 2017 or 2018? Or – much better – the average?
Reference to authoritative sources of historic grade patterns, such as JCQ and Stubbs, will show that the distribution of grades in 2017 favours rather higher grades as compared to 2018, which in turn favours higher grades than 2019. Comparing this year’s CAGs against only 2019 portrays them in the worst possible light. Yet it is this comparison that was – presumably deliberately – chosen.
According to the (cohort weighted) average over the last three years, the percentage of candidates awarded A* to C is 76.7%, as compared to the figure of 75.8% shown on the slide – 0.9% closer to the CAG figure of 87.0%.
The CAGs are still over-bid, but that 0.9% is ‘interesting’ in that it is nearly one-half of the ‘about 2%’ hinted at for this year’s allowed grade inflation – and the other half is largely accounted for by the year-on-year variation of the A* to C % (75.8% in 2019, 77.0% in 2018 and 77.4% in 2017).
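For what it's worth, the averaging is easy to reproduce. A short Python sketch, assuming equal cohort sizes across the three years purely for illustration (the real entry numbers are not given on the slide, so the weights here are an assumption):

```python
def weighted_average(values, weights):
    """Cohort-weighted average of grade percentages."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# A* to C percentages for 2017, 2018 and 2019, as quoted in the text
pcts = [77.4, 77.0, 75.8]
# Hypothetical equal cohort sizes -- the actual entry numbers aren't published here
cohorts = [1, 1, 1]
print(round(weighted_average(pcts, cohorts), 1))  # 76.7
```

With the true cohort sizes as weights the figure would shift slightly, but even this crude version shows how much choosing 2019 alone flatters the comparison.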
Point 8 – The AS comparison is misleading
By the same token, the AS CAGs should be compared not to 2019 alone, but to the average over 2017, 2018 and 2019.
Point 9 – The GCSE comparison is misleading
Likewise, the GCSE CAGs should be compared to the average over 2018 and 2019.
Point 10 – Where do the numbers come from?
When numbers are shown in a table or on a graph, it is good practice to identify the sources. But the sources are nowhere to be seen.
In fact, JCQ and Stubbs verify the 2019 grade distributions for A level and AS.
But those three, sparse, numbers for GCSE are a puzzle. Reference to JCQ or Stubbs will show that, for 2019 GCSE in England, 20.6% of candidates were awarded grades 9, 8 or 7; 67.0%, grades 9 to 4; 98.3%, grades 9 to 1.
But the corresponding numbers shown in the table are 24.7, 72.7 and 98.5 (I deliberately omit the % symbol to make a point).
I’ve searched high and low for that set of numbers – in individual 2019 subjects, and in 2018, but I just can’t find them.
Where do they come from? What is the significance of these numbers, and – more importantly – the difference between these and the corresponding CAGs? Assuming, of course, that those numbers are indeed the CAGs…
Slide 12 is a model of bad practice rather than good. And as for quality control…
Indeed, if this chart were presented in an examination, it would be given a very poor mark – an irony indeed given Ofqual’s role.
Some might consider my points – such as the absence of the % symbol, or the ordering of the rows in the AS table – as trivial, picking nits. Not so. There is an obligation on anyone presenting data to be complete and explicit, in all details. In an exam, the imperative of including the % symbol will be in the mark scheme. Not everyone reading the slides is a confident statistician. The information must be clear and explicit, and easily and correctly understood by anyone looking at the slide, not just the expert who knows how to read between the lines – or indeed numbers.
But two points in particular are fundamentally important.
Point 7, the failure to compare A level CAGs fairly against the three-year average, is, to me, a deliberate distortion, with intent to mislead. Ofqual knows that their statistical model uses an average, yet they present only a single comparative year – which just happens to be the one year that portrays CAGs in the worst possible light.
Point 10, the GCSE numbers, is even worse. Are those just arbitrary numbers, pulled out-of-the-blue? Everyone reading that chart (once they’ve worked out point 1, that the numbers are cumulative) will infer – and believe – that 24.7% of students for 2019 GCSE were awarded grades 9, 8 and 7, as indeed reported in The Times; the erroneous figure of 72.7% for the award of grades 9 to 4 is reported in The Times, TES and inews. Yet authoritative sources show that these numbers are 20.6% and 67.0% respectively. No one – other than an obsessive nerd like me – will ever dream of checking the numbers to original sources; everyone will take them on trust. But if just one number is wrong, that trust is broken. How many other numbers in the slide pack might be wrong?
It may be that all the points I have made were addressed in the Symposium. But I wasn’t there. I have seen only the slide pack I downloaded. Good practice is for the slide pack to be intelligible and complete in its own right.
And while talking about the slide pack, another puzzle.
I accessed the Ofqual website soon after the slides were posted, and downloaded a file with the filename 2020_Summer-symposium_slides_amended_210720. I went to the website just a minute ago, and downloaded a file with the filename 2020_Summer-symposium_slides_210720. The filenames are different, as are the files: the earlier (‘amended’) version has some slide notes that do not appear in the version now available. Why have the notes been deleted?
Another brilliant analysis, Dennis. A systematic dismantling of incomplete and misleading data which, as you say, was published with the intention of assuaging the concerns of teachers, students, parents and other stakeholders in the process of awarding final grades. You have shown why it has had precisely the opposite effect. Indeed, the selective use of historical data – one year’s results rather than three – shows CAGs in a worse light than the average outcomes produce. If they had compared the CAGs with 2017 results the gap would have narrowed even more than the 0.9% that you have identified. A cynic might conclude that the real motive behind the publication of the tables is to justify Ofqual’s claim that results this year will be 2% higher than last year. When one adds into the mix the fact that top A level grades in 2019 were the lowest since 2007, the 2% grade inflation claim becomes even more apparent than real. The reality is more like an iceberg: the size and danger which Ofqual’s standardisation process represents lie beneath the surface. It’s what we can’t see, but which you have shown is there. I hope for the sake of millions of hardworking students – including my two daughters – that you are wrong in your assessment of the sheer number who will suffer CAG reduction with all the potential consequences that that will entail. I fear, however, that your forecast will prove to be substantially correct. If so, the sound of violins playing on the deck as the ship goes down will be deafening.
Interesting, I wonder how many points they will score when they show their workings!
A couple of observations.
I was also confused by the 2019 GCSE figures when I looked for a break-down of the 7-9 grades. The footnote says “Based on target age for qualifications”, so I took that to mean that excluding all the early takers and re-sits makes the 2019 figures higher and therefore makes the moderation gap look less scary, although the difference from published figures does look high. Maybe they intend to moderate the Y11s separately from the others. I agree it would be helpful if they provided more underlying data. I might try to dig out the post appeals report OFQUAL produces each year to see if it is in there.
On AS levels, for England, a total comparison with 3 year averages would probably have been more misleading than just comparing with 2019 as the number of students taking AS levels has fallen sharply with the introduction of linear A levels. 2017 was the first year that most AS grades did not contribute to A level grades, but a lot of schools still entered all the students for them as external exams are a good motivator. In 2018, for students I know, it seemed that lower performing schools still used them for all students, but selective schools and higher performing comprehensives just entered students who had studied 4 subjects and dropped 1 at the end of Y12 (most comps do not like students doing 4 linear A levels) but wanted something for the year of work they had done. They were a self-selecting strong cohort. The numbers fell again in 2019. Another statistical complication!
And looking more closely at the formatting of the tables … Why are the rows in the tables of differing, inconsistent heights? Why are the column headers left-justified but the column contents right-justified? These tables manage to break all the rules about how to present data.
Hi Tania! Two great points, thank you.
The GCSE numbers are, to me, a real puzzle. Even when I’ve tried adding up the ‘right’ numbers, but with transcription errors – the sort of mistake I keep on making – I can’t get them to work.
It may be that the numbers Ofqual are using are only for 16-year-olds; but that would be odd too, for those who have been obliged to do re-sits in the past have sat the same papers, and so will – I think – be in the all-England stats on JCQ and Stubbs.
It’s possible that those numbers are just mistakes – and it’s a bit of a coincidence that “72.7” is in the AS table too. We all make mistakes and I certainly do. Quality control is supposed to pick that up – so that’s one black mark to Ofqual: before anything like this is ever used in public, it should be checked, double-checked and triple-checked.
But the super-big black mark is the failure to be explicit about sources. That really is a no-no.
Your point about AS is really good too – and I certainly hadn’t thought of those issues. As you suggest, it is indeed quite possible that 2019 is the ‘least worst’ comparator for 2020, given the trends in AS.
But if that is the case, why didn’t Ofqual, on day 1, say “for AS, the comparator will be 2019”? In fact, they seem to have implied the opposite, for page 10 of Ofqual’s “Guidance” document says “For AS/A levels, the standardisation will consider historical data from 2017, 2018 and 2019”.
But what does “consider” mean? It could mean that each of those three years is cohort-weighted to compute the average; it could mean that 2019 is weighted much more heavily for AS. Who knows?
Dennis, I agree with many of the points you make, particularly insofar as presentation of the data is concerned. However, I think this section is misleading:
“… a quick glance might result in the reader thinking that, for A level, 13.4% of grade Bs were overbid. This is not the case. The percentage of candidates actually awarded grade B in 2019 was 26.1% of the total 2019 cohort; the corresponding number for this year’s CAGs is 27.2%. The difference is only 1.1%, not 13.4%.”
You are right, of course, in saying that the actual percentage awarded grade B in 2019 and the percentage predicted grade B for 2020 are almost equal (1.1% difference), but that figure doesn’t address the nature of the discrepancy.
Consider a simple and artificial example of an exam in which there are only 4 grades, A, B, C and F. Suppose we have
2019 actuals: 25%, 25%, 25%, 25%
2020 CAGs: 50%, 25%, 25%, 0%
Then the percentage predicted a grade B is spot on, as is the percentage predicted a grade C. But all of the individuals predicted B or C have been ‘promoted’ by one grade from where they ‘should’ be. So the percentage match for a grade doesn’t tell us what we want to know. It counts heads, but it doesn’t count the right heads. In fact 75% of candidates have been over-predicted.
The cumulatives, however, do tell us what we want to know. In my example, the cumulatives are
2019: 25%. 50%, 75%, 100%
2020: 50%, 75%, 100%, 100%
The over-predictions are 25%, 25%, 25%, 0%. The sum of these figures is 75% and that is the proportion of candidates that have been over-predicted.
A similar arithmetical exercise on the Ofqual data suggests that 50.5% of grades have been over-predicted by centres.
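The head-count described above can be checked mechanically. A small Python sketch of the calculation: sum the positive gaps between the two cumulative distributions, using the toy four-grade example from this comment:

```python
def pct_over_predicted(actual_cum, cag_cum):
    """Percentage of candidates predicted above their 'true' grade:
    the sum of the positive gaps between two cumulative distributions."""
    return sum(max(c - a, 0) for a, c in zip(actual_cum, cag_cum))

# Toy four-grade example (cumulative percentages at A, B, C, F)
actual_2019 = [25, 50, 75, 100]
cags_2020 = [50, 75, 100, 100]
print(pct_over_predicted(actual_2019, cags_2020))  # 75
```

Run against the cumulative rows on slide 12 in the same way, this is the calculation that yields the 50.5% figure quoted above.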
GCSEs are seriously broken; this whole system needs to change. Look, everyone understands why the standardisation process used by Ofqual is important to ensure grades are fair and accurate. But for God’s sake, why should students who don’t deserve to be downgraded be downgraded? What this government should be doing is looking at the process and seeing what can be done to ensure students progress onto the next stage of their lives so they can study what they want, not looking at whether too many students are getting 9s.
Hi Craig – thank you too.
One thought, if I may…
It could be that a quite large number of CAGs being downgraded is in fact a ‘good thing’, if not the ‘right thing’, weird though that might appear. As a thought-experiment, suppose there’s only one school, with two candidates – good but not that good. Suppose that they truly merit (however that might be determined) an A and a B respectively. The school, though, wishes to give them ‘the benefit of the doubt’, and submits CAGs of A* and A.
The machine then does its thing, and down-grades both of them, awarding A and B.
All the CAGs – 100% of them – have been down-graded, but the result is right.
The problem here is that it’s the school, not the machine, that got it wrong. And I fear that this might have happened a lot this year.
So, for example, a teacher who fears that a parent will make a fuss might submit an A*, knowing that the machine will down-grade. The teacher can then, in all honesty, say to the parent “I recommended A*, but nasty Ofqual…”
Unfortunately, it’s quite possible that some ‘games’ of this nature have been played, for there was no ‘authority’ to stop that, and I think it’s a pity that ASCL, NAHT, HMC, the unions and the rest didn’t step in and stop them. And although Heads had to sign a ‘declaration’, well…
Talking of Heads, here’s another game (and I thank Mike Larkin for telling me about this one!). A subject is taught in two sets. One has a very aggressive teacher, the other a new, uncertain one. They meet to agree the overall subject rank order. I wonder which teacher’s students have the lion’s share of the higher ranks? And if the Head is relatively weak, and seeks to avoid confrontation…
Which is another – and I think powerful – reason why we need a fair appeals process. The problem here is not with the algorithm or with the superficial, technical, features of the process. It’s deep within the process and – like bias – is very well hidden. But known only to the student. And can be uncovered, and resolved, only by a fair appeals process.
Oh… at the start I said “only one thought”… it turned out to be quite a long one…
The other point is that for many GCSE subjects, the new specs were only examined for the first time in 2018, so there are only 2 years’ worth of results to compare with.
I read with interest another thought-provoking piece from you. If one stands back from the CAGs and attempts to be as objective as possible, the sheer level of anticipated increase in attainment is unsustainable. That is so whatever criticisms can legitimately be laid at Ofqual’s door – and they undoubtedly can.
The underlying theme of the scenario that you postulate is typical of any negotiation: you initially go in with your highest offer, even if it is more than you think what you are selling is worth. If the other side accept it, all well and good. If not, you leave yourself with room to negotiate a reasonably acceptable settlement. Ofqual must have known that this was likely to happen, if for no other reason than that schools and teachers will be in the front line when results come out and pupils/parents in their droves start banging on doors and asking “what grade did you give Jonny and where did you rank him?” This is an inevitable consequence of a results-based, performance-related system with league tables placed on the high altar to be worshipped. It chimes with the experience of a friend who is a diligent, highly experienced senior maths teacher at an outstanding all-girls school. When the maths department met in May, their internal process of awarding grades and ranking was influenced by the position of each class, with teachers understandably fighting the corner of their own students in the ranking process and teachers ‘going high’ in the first round of bidding before reducing their sights in the second round of negotiations, when it was clear from the first that the predicted outcomes were unsustainable based on attainment in previous years. My friend was relieved that she ‘went high’ initially, as the pupils in her class would have lost out when reductions in opening positions were universally agreed and applied. Another consequence of this haggling within schools is that pupils in lower sets have little, if any, chance of doing better this year than their peers in higher sets. Such an arbitrary approach to awarding grades ignores the inevitable fact of exam life that some students in lower classes would have out-performed some in higher classes.
This example of how the process was undertaken in one school is, I suspect, broadly representative of most schools whose teachers received no instruction or training, particularly in the ranking process which is likely to be crucial to students at the bottom of a grade when standardisation is applied by the Ofqual algorithm which your meticulous analysis in an earlier blog has shown to be mired in confusion and inaccuracy.
The grade inflation which applying CAGs without modification would lead to cannot be explained solely on poker-playing between teachers within schools. The failure of the Ofqual modelling to take into account a school’s trajectory of improvement, outstanding students in historically low-achieving schools, and seeking to ensure a level playing field for BAME students are all likely to have contributed to the optimism in the CAGs. The concern with your scenario of the A and B students must be that the pupils whose schools have submitted intellectually honest assessments will suffer if the response is an across the board reduction in grades. The solid A and B students who have been awarded those grades by their teachers may find themselves reduced to B and C respectively. It could also lead to a domino effect in order to fit the requisite number of grades into pre-determined distribution curves. In an ideal world, the answer to this would be for Ofqual and the exam boards to identify those centres whose grades may have been deliberately inflated in the expectation of them being knocked down. However, if, as seems likely, computer modelling will determine the grades which are ultimately given I seriously question whether any bespoke centre-by-centre analysis will be undertaken. The only element of teacher assessments which now seems likely to carry substantial weight is ranking order, with historical data being the first and foremost reference point in determining the distribution of grades in each subject at each centre.
Ultimately, this year’s exam cohort finds itself in an unprecedented situation. It will be impossible to find a solution acceptable to everyone. However, in seeking to reach a fair and equitable outcome, it seems that CAGs may have unwittingly driven Ofqual towards results which place even more emphasis on the achievements of previous pupils than on the abilities of the class of 2020.
Oh! I’ve just noticed something else too – something that’s missing… and it’s always harder to spot something that isn’t there than to identify a problem with something that is! Dogs not barking and all that…
There’s a column missing.
A level, AS and GCSE all have a grade U, “unclassified” or “ungraded”, for which no certificate is issued.
So each table should have an additional column U to the right, with entries of 100.0 in each of the upper two rows, and 0.0 in the third – all candidates must end up with a ‘real’ grade or a U. Ending each row like that is also far better than that string of odd-looking numbers like 97.6 and 99.7.
That would also make it much more obvious that the numbers are both cumulative, and percentages. Even so, the table should make “cumulative” explicit, and include those % symbols.
We are sorry there was initially a mistake in the reporting of AS results in this chart – and that our speaking notes were inadvertently published with the slides. We removed these because, in the absence of a presenter, they did not aid the reader’s understanding.
Mr. Sherwood and readers of this blog can be assured of the accuracy of the data we have published. Where figures vary from other sources, for example JCQ data, this is because the centre assessment grade data available at this stage is incomplete. Where subject specific CAG data for 2020 is unavailable, for example combined science GCSE in the table we published, the equivalent subject data has been removed from the JCQ figures quoted to ensure – as confirmed on the slide – comparisons are ‘like for like.’
“And looking more closely at the formatting of the tables … Why are the rows in the tables of differing, inconsistent heights. Why are the column headers left justified but column contents right justified?”
Because those are the default settings in Excel. Most tables presented in my university have the same faults. A tiny minority of people responsible for presenting data (in the [semi-]public sector, at least) actually know how to use spreadsheets. Universities don’t bother to train people, and apparently neither does Ofqual.
The OCR archive results confirm your conclusion that the percentages are cumulative. You have to do a bit of maths to work out the grade-specific real numbers but it’s an easy calculation to make. The benefit of the cumulative numbers/percentages is that you can immediately see how many students and what proportion were awarded A*/A grades, A*/A/B and so on.
Regarding some of your questions on what slide 12 says, please have a look at the first chart and the first 6 columns of first table at my web page here:
I think they present what slide 12 says for A-level in a reasonably clear way.
I think it’s incredibly sad that Ofqual are not allowing an appeals process for students. In my mind this year’s grades are worthless if students cannot genuinely submit an appeal with information supporting why they feel they have been awarded an incorrect grade.
It’s also ludicrous to think that students can resit exams in October: they have been out of school for 7 months, many would not have finished the syllabus, and schools are actively encouraging students not to consider resits. There is also the matter of who pays the exam fees… Resits become a biased system that favours the middle class.
Dr Michelle Meadows… it’s not too late to allow an appeals process. I doubt you want the same situation as the IBO, with 80 irate headteachers demanding appeals. Much better to do the right thing and allow an appeals process beforehand.
Dear Dr Meadows
Thank you for your comment.
I must confess, however, I am somewhat puzzled.
1. Thank you in particular for the information that “there was initially a mistake in the reporting of AS results in this chart”, and for your gracious apology. I don’t think I referred specifically to the AS numbers, so that is very helpful. May I enquire as to the nature of the mistake? Was it one that has subsequently been corrected? Or might there be a mistake in the version of the chart that I copied for this blog? I have just downloaded the slide pack from the Ofqual website, and it seems to be the same as the (note-free) version I downloaded a few days ago.
2. As regards the notes, they certainly aided my understanding, and I thank you for them – my regret is that the notes I have are only for slides 17, 18 and 19. These were most informative about statistical standardisation and small cohorts, and I would have found it most helpful for the other slides to be annotated too, not least slide 12.
3. And I’m afraid I am still confused by, precisely, what ‘like for like’ actually means. I just don’t understand where those three figures for 2019 GCSE came from.
My objective is clarity, for only through clarity can confidence and trust be built.
Dear Dr Meadows,
I was also lucky enough to have the version of the slides with some notes. “We removed these because, in the absence of a presenter, they did not aid the reader’s understanding”. Dennis has been very diplomatic in his reply, I’ll be more blunt. The statement is plainly not true. In fact all three slides with notes benefit from them, and the third slide is basically content free without the notes. So the opposite of what you said is true – with a presenter they might be less needed as presumably the presenter would have covered them. Without a presenter they are essential, which begs the question why were they really removed? Given how misleading this statement is, and the various other mistakes and poor practice already pointed out, the rest of your post doesn’t reassure me as a reader of the blog one bit.
As always, data tends to be used to prove what you want it to say in education. My only worry is that this will be a blanket downgrading when my school has been honest and fair and has resulted in CAGs which are not over inflated in comparison with last year (although I’m based in Wales where WJEC have a monopoly).
Hope you’re well!
How does Ofqual compare the GCSE performance of the 2020 A-level cohort with that of the 2017-2019 cohorts? The 2020 cohort is the first one that did the new GCSEs in most subjects, which are supposed to be harder than the old GCSEs. Additionally, the new GCSEs were new to teachers in most subjects, so the teachers might have been less experienced in teaching them compared with previous years.
The A level class of 2020 have had a pretty raw deal throughout their education. They were the first cohort to do the more rigorous content of KS2 exams. They were then the first year group to do the full suite of GCSEs using the 1-9 grading system in 2018. They have now had their A level exams cancelled and their futures are at the mercy of a computer program. There is an unanswerable case for Ofqual to err in their favour when a reasonable doubt arises in the process of awarding their grades.
Personally I think the least that Ofqual can do is ensure that CAGs along with the made up grades are submitted to Universities via UCAS and both grades are entered on the Certificates.
Computer generated results were never going to be fair to everyone and these young people deserve to be fairly treated which is what Gavin Williamson stated would happen when exams were cancelled.
Chris – thank you!
And sorry it’s taken me so long to acknowledge your analysis, which is absolutely right, so many thanks.
I was in fact answering the question “what is the difference between the percentage of grade Bs bid this year and the percentage actually awarded last year?”, which is “interesting”, but nowhere near as important as the question I thought I was answering, but wasn’t: “what percentage of grade Bs were over-bid?”.
So I much appreciate your pointing this out.
So after all the spin Ofqual has in effect admitted that its calculated GCSE grades are not accurate enough for purpose.
A-level grades will be even less accurate than GCSE ones.
This implicit admission is the most honesty we have seen from Ofqual so far. This year’s grading is an impossible problem in many cases, and what students and society need is for Ofqual to be honest and not to mislead the public into believing that this year’s grades are equivalent to previous years’.
Hi Huy – thank you, yes, I agree. Subject to one thing…
Sally Collier’s statement that “this year’s grades are equivalent to previous years” is more true than many might think – but in a perhaps rather unexpected way.
You are right in saying that this year’s grades are unreliable. Sally Collier’s statement is true because previous years’ grades have been unreliable too! (see, for example, https://www.hepi.ac.uk/2019/02/25/1-school-exam-grade-in-4-is-wrong-thats-the-good-news/)
But Ofqual have tended to be rather coy about that…
…and talking of the (un)reliability of exam grades, I’m reminded of this…
“However, to not routinely report the levels of unreliability associated with examinations leaves awarding bodies open to suspicion and criticism. For example, Satterly (1994) suggests that the dependability of scores and grades in many external forms of assessment will continue to be unknown to users and candidates because reporting low reliabilities and large margins of error attached to marks or grades would be a source of embarrassment to awarding bodies. Indeed it is unlikely that an awarding body would unilaterally begin reporting reliability estimates or that any individual awarding body would be willing to accept the burden of educating test users in the meanings of those reliability estimates.”
That’s a paragraph from page 70 of a report published by the exam board AQA in 2005…
…the lead author of which is Dr Michelle Meadows, then at AQA and now at Ofqual…
…and also the author of a previous comment on this page.
But I guess previous years’ grade inaccuracy is very different from this year’s. Previous years’ inaccuracy comes from subjectivity in marking, and is worse for non-exact subjects than for exact ones. This year’s inaccuracy comes from a kind of postcode lottery and affects both exact and non-exact subjects. In previous years, a student could minimise the risk from marking inaccuracy by building up a safety margin. This year, for some students, it’s like being born into a lower caste where you’re not allowed to get the A* you deserve regardless of how good you are.
Hi Huy – you’re absolutely right here: the reasons behind the unreliability are quite different. But as the rank order ‘cross over’ diagram in the “Isaac” blog suggests (https://www.hepi.ac.uk/2020/05/18/two-and-a-half-cheers-for-ofquals-standardisation-model-just-so-long-as-schools-comply/), it is still possible that, despite everything, the year’s grades could still be more reliable than the grades determined by marking exams.
It is a great pity that schools were asked to submit centre assessment grades this year. They have caused a huge amount of trouble – trouble that I fear is not over yet.
But the overall message must be that the way assessment has been done for years – and this year too – is severely broken. It must be changed.
But if Ofqual didn’t ask for CAGs, then the small and medium cohorts (eg for A-levels) would be in big trouble, because the number getting each grade in each subject fluctuates so much from year to year that using a statistical model would be no better than using a random number generator.
To take an extreme example, French A-level in this data set, https://sites.google.com/view/2020-ofqual-grade-calculation/data-from-a-typical-comprehensive-school, has an A* rate fluctuating from 0% to 100%. Without CAGs, what could be done? It would be no better than random for Ofqual to try to calculate how many A*s should be awarded.
Even with cohort sizes of 17 to 20, the English literature A* rate fluctuated by a factor of 6, which is probably much more than the fluctuation due to unreliability in marking.
Hi again Huy – yes!!! You are of course absolutely right – CAGs are needed for all cases in which history fails, such as no history (a new school) or small cohorts (whatever small is). I was wrong not to have identified that, so thank you.
“No CAGs” doesn’t work universally. You’re right.
So they could have said, “For cohorts > [whatever], and all schools with ‘history’, send us rank orders only; everyone else sends CAGs”. But that is full of problems too.
They might have thought of that (which is good), and then decided “so everyone sends CAGs”, which is more fair, but has led us to where we are.
If they’d have said “everyone sends CAGs, keep an eye on grade inflation, and be prepared for very rigorous scrutiny of outliers” – rather like the “trust the teachers” option in the ‘hindsight’ blog (https://www.hepi.ac.uk/2020/07/23/hindsight-is-a-wonderful-thing/) – would that have worked better, even though there would be some grade inflation?
I am with Huy on the need for CAGs. My son’s school had around 70 people doing Maths A level, but last year 14% A*s, previous year 3%. Also, what about the outstanding students in a poorly achieving school? Schools needed to be able to communicate the situation for individual students so that OFQUAL at least had the option to consider them.
I think there may still be a way that they could do an algorithm which calculated a centre average score for the last 3 years and a centre average score for the CAGs, for the whole centre rather than at a subject level, and then applied a moderation factor based on that centre submission as adjusted for prior attainment. I really hope they do not ring-fence subjects in the moderation.
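To make the suggestion concrete, here is a minimal sketch of how such a whole-centre moderation factor might be computed. The function name, the points scale and the way prior attainment feeds in are all my own assumptions for illustration – Ofqual has not published its method:

```python
# Hypothetical sketch of a whole-centre moderation factor (my own assumption
# of how it might work -- not Ofqual's published method).

def centre_moderation_factor(historic_avg_score, cag_avg_score,
                             prior_attainment_uplift=0.0):
    """Ratio by which a centre's CAG scores might be scaled back.

    historic_avg_score: mean points per entry over the last three years
    cag_avg_score: mean points per entry implied by this year's CAGs
    prior_attainment_uplift: allowance for a stronger cohort, e.g. 0.05
        if prior attainment is 5% higher than last year's cohort
    """
    allowed = historic_avg_score * (1.0 + prior_attainment_uplift)
    return min(1.0, allowed / cag_avg_score)

# Example: the centre bids 9% above its three-year average, but prior
# attainment shows the cohort is 5% stronger, so only part of the bid
# is trimmed back.
factor = centre_moderation_factor(40.0, 40.0 * 1.09, prior_attainment_uplift=0.05)
print(round(factor, 3))  # 0.963 -- CAG scores trimmed by about 3.7%
```

The same factor would then be applied across every subject in the centre, rather than ring-fencing each subject against its own history.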
Craig – every year feels like it has disadvantages for your own children! The way universities have treated 8/9 as A* has been a big advantage for the applicants with lower 8s who would have got As in the old system, for example Maths GCSE at AQA – in 2016 4% were awarded A*, but in 2019 8.2% were awarded 8/9, so actually they were probably winners in the GCSE lottery. For A levels the grades overall will be higher than last year so as a cohort they are at an advantage, but within that there will be winners and losers in the statistical lottery.
So Ofqual is subtly and patronisingly putting the blame on the teachers, “You got it wrong but it’s not your fault because we haven’t shown you how”, while portraying itself as the hero stepping in to save the day.
It tries to hide the fact that its standardisation is not much more reliable than the teachers’ predictions – the standardisation simply works in the opposite direction. Some teachers have wrongly put some students up a grade, Ofqual will wrongly put some students down a grade, so it all evens out, sorted, all well and good, job done by Ofqual.
Then it is shifting what effectively amounts to handling appeals on to the education providers, who will have to deal with the mess of students being awarded unreliable grades.
I think your proposal would have worked better. Also I think in these circumstances it is better to accept some grade inflation. The principle I would use here is it is better to let a few criminals get away with it than to imprison a few wrong people. That is also the principle that France has followed, but Ofqual is doing the opposite.
“If they’d have said “everyone sends CAGs, keep an eye on grade inflation, and be prepared for very rigorous scrutiny of outliers” – rather like the “trust the teachers” option in the ‘hindsight’ blog (https://www.hepi.ac.uk/2020/07/23/hindsight-is-a-wonderful-thing/) – would that have worked better, even though there would be some grade inflation?”
But the problem is Ofqual doesn’t want the hassle of rigorous scrutiny of outliers. I wonder how much work the exam boards have saved in all of this. Surely giving some students a proper appeal process will be less work than doing exams for everyone?
The information you gave is really interesting, “My son’s school had around 70 people doing Maths A level, but last year 14% A*s, previous year 3%.”
It shows that even with cohort sizes of around 70, and for a subject for which marking is consistent, the A* rate fluctuates by a factor of nearly 5. Statistical modelling using historical data is not going to work in this case. Eg, allowing 14% A* is likely to contribute to grade inflation; allowing 8% A* is likely to be unjust.
In my son’s school, the largest cohort sizes are around 30, and the fluctuations are similar or worse. It’s a typical comprehensive with 1100 students (Y7 to 13), so I think the problem is common across the country.
Huy, I have posted about this on another of Dennis’s excellent blogs. I think it was partly because the first proper year of the linear maths A level favoured ability more than exam training so good comprehensives did better than before and some of the schools which trained to the exam had disappointing results. But the variability every year is significant.
The Ofqual plot thickens.
I’ve been talking to quite a few people, and some light has been shed on these words to be found in the comment by Ofqual’s Dr Michelle Meadows, somewhere higher up:
“Where subject specific CAG data for 2020 is unavailable, for example combined science GCSE in the table we published, the equivalent subject data has been removed from the JCQ figures quoted to ensure – as confirmed on the slide – comparisons are ‘like for like.’ ”
What this means is that the CAG figures as shown are not the complete set of CAGs submitted – or in Ofqual-speak “unavailable”!!! What can’t be “available”? The deadline for submission was 12 June!!!
So these numbers represent an aggregate-so-far of CAGs in certain subjects from certain schools. To be “fair” the same subjects have been aggregated in the 2019 GCSE row, which explains why the numbers for 2019 GCSE do not match the 2019 whole subject aggregates, as shown in sources such as JCQ and Stubbs, which I quoted.
That is indeed an explanation. But it is outrageous!
How many CAGs have been aggregated? And in which, and how many, subjects? 3 subjects? 10 subjects? And over how many schools?
What on earth do those numbers signify? And what inferences, if any, can be drawn by the comparison of this unknown population of CAGs?
I just don’t believe it, and I surely couldn’t make it up.
Are the A level results complete? Or are some small cohort subjects missing so that the numbers just happen to match JCQ and Stubbs? Who knows?
I have received this from Ofqual in response to my query about how the grading process will be carried out in relation to small cohorts. It provides a partial answer to my question, and the principle of applying a sliding scale seems to me to be preferable to arbitrary cohort-size starting and cut-off points. Needless to say, the devil will be in the detail, because what the reply does not disclose is how much deviation will be applied at any point on the scale.
“The standardisation process will be sensitive to the fact that centres with smaller entries (because of their centre size or subject cohort) usually see more year-on year variation in results than in centres with larger entries.
Centres with smaller entries will have greater weight placed upon the centre assessment grades when calculating results.
We do not have one cut-off point to define a ‘small entry’ in a subject – instead, the process will use a sliding scale to adjust the weighting given to the centre assessment grade or statistical evidence depending on the number of students taking a subject at a centre.
The process will only place more weight on the statistical evidence than the centre assessment grades where we believe it will increase the likelihood of students getting the grades they would have likely achieved had they taken exams in 2020.”
In principle, the application of a sliding scale would seem to be preferable to arbitrary starting/staging/cut-off points based on cohort sizes. However, the response only provides a partial answer to my question because it does not disclose the level of deviation to be applied at any point on the scale from a standardised norm based on historical data. That detail intersects with my other major concern, which I have still been unable to bottom, namely the question of rounding up/down. Those important caveats obviously temper the reassurance which the above statement may initially appear to give.
Yes, it should be a sliding scale, and we need to know the profile of that scale. Also, a good scale will need to depend on more than just the cohort size: it also needs to depend on the number of students getting each grade and its year-on-year variability. For example:
School X has 50 students taking Physics each year. In 2017, 2018 and 2019, it had 15, 17 and 16 students getting A*. It is somewhat reasonable to think that this year’s A* rate is somewhere near the average of 32%.
School Y also has 50 students taking Physics each year. In 2017, 2018 and 2019, it had 2, 17 and 11 students getting A*. Due to the large variability, it is less reasonable to think that this year’s A* rate is somewhere near the average of 20%. After all, it was 4% for a third of the time and 34% for another third.
So the same cohort size might be large enough to make a somewhat reasonable prediction for one school, but not large enough for another.
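To illustrate (purely my own sketch – nothing suggests Ofqual’s model actually works this way), a variability-aware sliding scale could put more weight on the CAGs wherever the historical rates jump around:

```python
# A hypothetical variability-aware weighting (my own illustration only):
# the more a school's historical A* rate fluctuates, the more weight
# goes on the CAGs rather than on the historical average.
from statistics import mean, pstdev

def cag_weight(historic_rates, scale=2.0):
    """Weight on the CAGs, in [0, 1], rising with year-on-year variability.

    historic_rates: recent A* rates, e.g. [0.30, 0.34, 0.32]
    scale: assumed tuning constant mapping variability to weight
    """
    m = mean(historic_rates)
    if m == 0:
        return 1.0  # no usable history -- trust the CAGs entirely
    cv = pstdev(historic_rates) / m  # coefficient of variation
    return min(1.0, scale * cv)

school_x = [15/50, 17/50, 16/50]  # stable history: 30%, 34%, 32%
school_y = [2/50, 17/50, 11/50]   # volatile history: 4%, 34%, 22%
print(round(cag_weight(school_x), 2))  # 0.1 -- history is informative
print(round(cag_weight(school_y), 2))  # 1.0 -- history tells us little
```

On these numbers, school X’s history would dominate while school Y’s CAGs would be trusted almost entirely – exactly the distinction a single cohort-size cut-off cannot make.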
But we don’t know if Ofqual’s model takes that into account. It likes to keep its model safe from public scrutiny.
Supposing school Y predicts that 20% will get A* but only 17% truly deserve it, and Ofqual reduces that to 10%, it might well wrongly downgrade 41% of the students who deserve A*s. In that case, its so-called standardisation is actually worse than the school’s over-prediction. Unfortunately, Ofqual has fooled the public into thinking that the teachers are wrong and its statistical model is right.
This table would make an excellent case study in sixth form Core Maths on how not to present data. I would award Ofqual a ‘C’ for statistical presentation – which presumably would then get downgraded to a ‘D’.
> it is still possible that, despite everything, the year’s grades could still be more reliable than the grades determined by marking exams
I disagree with this. It’s not wrong, exactly; it’s just that there’s actually no meaningful comparison to be drawn between exam reliability and CAG reliability, since there’s no useful concept of reliability for CAGs.
Exam grades are about assessing an artifact that a student has produced. You can measure how reliable they are by comparing with the result of using a more carefully-calibrated measuring instrument (a senior examiner).
Predicted grades are about forecasting what a student will do in the future. You can measure how reliable they are by waiting to see what happens.
Centre-assessed grades are about predicting what would happen in an alternate universe. Since it’s difficult to create alternate universes, there’s no way to measure reliability — in fact, there’s no meaningful idea of reliability to measure.
Thank you… some thoughts if I may.
I fear that this year’s grades will not be based on teachers’ predictions or centre assessment grades. The results of the statistical algorithm will be used to superpose grade boundaries on the schools’ rank orders, and teachers’ predictions will be ignored (although there will be many instances in which the algorithm gives the same result as the teachers – but this is a coincidence). The algorithm ‘wins’, as is made very clear from slide 18 of the pack presented at Ofqual’s Summer Symposium (https://www.gov.uk/government/publications/awarding-qualifications-in-summer-2020#summer-symposium).
That throws the spotlight onto the rank order. And it’s the rank order that’s important, for the grade boundaries are determined to comply with the “no grade inflation” policy – only [so many] students are allowed within [this grade].
So my assertion that this year’s exam-free grades might be more reliable than previous years’ exam-based grades is primarily about the reliability of a rank order determined by teachers as compared to the reliability of a rank order determined by single-marked exams.
One way of thinking about this is to imagine all students ‘duplicated’ and lined up in two adjacent long rows, one in which the rank order is determined by teachers; the other by exam marks. Both rows are ‘sliced’ with the same number of students in each ‘slice’ labelled as a grade.
Which row has the greater number of students actually awarded the grade they truly merit (recognising, very much, your artefact point)?
The reliability of exams – the ‘exam row’ – has been measured: the overall sound-bite message is that, on average, 1 grade in 4 is wrong (https://www.hepi.ac.uk/2019/02/25/1-school-exam-grade-in-4-is-wrong-thats-the-good-news/).
You’re right that there is no measure of the reliability of the teacher-determined rank order, so my next bit is no more, no less, than my belief: I believe that a conscientious teacher, a teacher with integrity, and a teacher who cares, is likely to do a better job of ranking the students than a ‘1 grade in 4 is wrong’ exam. So I believe that the ‘teacher row’ is likely to give a better, more reliable, set of grades than the ‘exam-mark row’.
A great pity of this year’s process has been the obligation on teachers to have only one student on each ‘rung’ of the rank order ‘ladder’, for I consider that truly impossible to do. But in general, I’d trust the teachers.
That’s the basis of my assertion – there’s rather more about this on https://www.hepi.ac.uk/2020/05/18/two-and-a-half-cheers-for-ofquals-standardisation-model-just-so-long-as-schools-comply/ and https://www.hepi.ac.uk/2020/06/01/no-test-is-better-than-a-bad-test/.
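The ‘two rows’ thought experiment can even be simulated crudely. This is my own sketch, with an arbitrary noise level and number of grade bands (they are not Ofqual figures); it simply shows how slicing a noisily-measured rank order into fixed bands produces wrong grades even when the bands themselves are the ‘right’ sizes:

```python
# A crude Monte Carlo of the 'two rows' thought experiment (my own sketch;
# the noise level and number of grade bands are arbitrary assumptions).
# Students have a true ability; an exam measures it with marking noise;
# both rows are sliced into equal-sized grade bands; we then count how
# many students' exam-based grades differ from their 'true merit' grades.
import random

random.seed(1)
N, BANDS = 10_000, 8   # students; grade bands
NOISE = 0.35           # assumed marking noise relative to the ability spread

true_ability = [random.gauss(0, 1) for _ in range(N)]
exam_mark = [a + random.gauss(0, NOISE) for a in true_ability]

def band(scores):
    """Slice a score list into BANDS equal bands; returns each student's band."""
    order = sorted(range(N), key=lambda i: scores[i])
    g = [0] * N
    for pos, i in enumerate(order):
        g[i] = pos * BANDS // N  # same number of students per band
    return g

true_grade, exam_grade = band(true_ability), band(exam_mark)
wrong = sum(t != e for t, e in zip(true_grade, exam_grade)) / N
print(f"{wrong:.0%} of exam-based grades differ from the 'true merit' grade")
```

The mismatch rate depends entirely on the assumed noise, which is the point: whichever row has the less noisy ranking gives the more reliable grades.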
Thank you for reading this far! As I said, this is only my belief…
“Centre-assessed grades are about predicting what would happen in an alternate universe. Since it’s difficult to create alternate universes, there’s no way to measure reliability — in fact, there’s no meaningful idea of reliability to measure.”
Unfortunately, Ofqual’s definition of reliability here is “consistency with some imaginary 2017-2019 grade distribution at the centre-subject level”. It works on the basis that just because it pushes the grades at the centre-subject level closer to that imaginary distribution, it is being more reliable than the teachers, and it can claim the prize of “standardisation”.
This year’s distributions may well match previous years’ perfectly — that’s not hard to do. But that has almost nothing to do with whether individual grades are “reliable” (which is what the calibration of exam marking is concerned with). The distribution could be made to match just as well if the ranking was generated by rolling dice rather than by teachers.
Ofqual have been confused about this throughout, which is why they keep talking about grade inflation, “currency”, etc. In fact, the value of exams from year to year depends on two factors:
1. there’s a reasonably reliable and objective nationwide ranking system (i.e. exams)
2. the grade boundaries fall in roughly the same places each year
Ofqual has been entirely focused on keeping the second (grade boundaries) the same as previous years. But without the first (objective nationwide ranking), that’s nowhere near sufficient to make the resulting grades trustworthy, useful, or even meaningful.
I agree with you 90%. The 10% is to do with the possibility that Ofqual might not be confused but is deliberately misleading the public.
In a letter to my MP, I wrote the below:
… due to the weak links between the grade distribution in the past three years and the current cohort’s ability, the supposedly standardised 2020 grades for that school are not necessarily consistent with those of previous years. Furthermore, the supposedly standardised 2020 grades for that school are not necessarily consistent with those of other schools for this year either. Therefore Ofqual’s standardisation for this year is not true standardisation, and Ofqual’s claim that it will bring consistency to grades is unlikely to be fulfilled for A-levels. Attempting to enforce a standardisation that is not fit for purpose will result in injustice…
Thank you for your thoughts, and for this interesting series of blog posts.
The idea of “the grade they truly merit” is an appealing one, but I think that if you try to specify exactly what you mean by it you’ll run into difficulties. (Your phrase “however that might be determined” in an earlier comment hints at these difficulties.)
Regarding your larger point: let’s assume that teachers do an excellent job of ranking — e.g. let’s assume that the teachers in each school can create rank orders that perfectly line up with the marks that children go on to achieve in exams. Even in this ideal scenario Ofqual’s system produces harmful outcomes: most notably, the grades of high-achieving children in low-achieving schools will be dragged down to match their peers, and there’s nothing that the children or their teachers can do to avoid it.
If there were a high-quality national ranking (as in normal years), it would make some sense to impose grade boundaries. Unfortunately, instead, there’s a large collection of rankings, and a correspondingly large collection of grade boundaries that will guarantee that children at the wrong sort of school will be given low grades, even if they were on track to achieve highly in exams.
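To make that concrete, here is a small sketch with made-up numbers of my own. When each school’s grade counts are fixed by its own history, even a perfectly-ranked high achiever at a school with no historical A*s cannot be awarded one:

```python
# Sketch of the point above, with made-up numbers: each school's grade
# counts are fixed by its history, so the grades are handed out down the
# rank order regardless of what any individual was on track to achieve.

def award(ranked_students, grade_counts):
    """Hand out grades down the rank order, best grade first."""
    grades = [g for g, n in grade_counts for _ in range(n)]
    return dict(zip(ranked_students, grades))

# School A: history allows two A*s for this cohort of five.
print(award(["p", "q", "r", "s", "t"], [("A*", 2), ("A", 2), ("B", 1)]))
# School B: an equally able cohort, but history allows no A*s at all.
print(award(["v", "w", "x", "y", "z"], [("A", 2), ("B", 2), ("C", 1)]))
# Student "v" may be on track for an A* in exams, but the boundaries
# inherited from the school's past cap them at an A.
```

No improvement in the ranking – by teachers or by anyone else – can rescue student “v”; only the inherited grade counts matter.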
I tweeted the CAG/GCSE comparison table to see what the wider community thinks about it. Last time I looked, more than 8,000 people had viewed the tweet and there are plenty of comments, most of them pretty damning. Here are a few:
“What are the numbers? Percentages? What’s it supposed to mean?” – reporter, Liverpool Echo
“The way the rows have been flipped around [for different exams] is making my eyes hurt” – Sky News journalist
“I sat here and ‘de-cumulativised’ the first table over lunch as it was annoying me and I was genuinely interested [in knowing what the actual grade stats are]” – Head of Maths, Secondary School.
“So the 3rd row in each table is the cumulative difference between last year and CAG? Why? How does that give anything useful?” – maths teacher.
“The numbers don’t add up to 100%! If adding in those who got ‘no grade’ makes this work, do it! And why have they put the worst grade first at GCSE?” – software consultant [who assumed that Grade 9 is the lowest GCSE grade, as many people do].
The point is that thoughtful, intelligent people are struggling to make sense of the table, and in some cases are drawing wrong conclusions from it. And that’s before the revelation that the data in the table is incomplete.
Thanks again, and I totally agree with you.
(1) ‘Truly merit’ is indeed an “appealing concept” – and needs a bit of thinking through!
(2) Enforcing “no grade inflation”, and constraining this year’s grades to previous years’ history will indeed result in unfairnesses and injustices. Doubly so since the current rules deny appeals except on the most limited and technical grounds, which is truly iniquitous.
But despite all that, I still think it quite likely that this year’s outcomes will be more fair than previous years – that’s not ‘fair’ in an absolute sense, but ‘more fair’ than the past.
I believe this to be likely not so much because this year is so good – it isn’t – but because previous years have been so poor.
Over each of about the last ten years, on average, 1 grade in 4 has been wrong (where ‘right’ means, as you know, the grade that would have been given had a senior examiner marked the script, this being the only benchmark, however flawed). So in the past 25% of the grades have been wrong.
That’s a very low standard for comparison.
So even with all the problems of this year, I still believe that the outcomes just can’t be 25% wrong. They have to be better than that!
Unfortunately, the waters are muddied by the likelihood that more than 25% – perhaps 30% or more – of the teachers’ centre assessment grades are likely to be over-ruled by statistical standardisation. But that’s probably because an unknown number – and possibly quite a large unknown number – of those were over-the-top. Which is in itself a great pity, if not a tragedy (https://www.tes.com/news/exams-gcse-alevel-grading-issue-risk-concern). We are indeed where we are.
But we could have been in such a better place… (https://www.hepi.ac.uk/2020/07/23/hindsight-is-a-wonderful-thing/).
“Unfortunately, the waters are muddied by the likelihood that more than 25% – perhaps 30% or more – of the teachers’ centre assessment grades are likely to be over-ruled by statistical standardisation.”
By my calculation, which I briefly presented here, https://sites.google.com/view/2020-ofqual-grade-calculation/ofqual-grade-calculation-problems, a net 39% of predicted grades will be downgraded by the “standardisation”. If that is correct then the percentage of grades modified by the “standardisation” will be higher than 39% because some grades will be upgraded and some will be downgraded.
According to Ofqual’s figures on slide 12, for every 100 students who are worthy of an A*, the teachers have predicted 177. This is not very accurate. Then, out of the 177 students predicted an A*, the standardisation will knock about 64 down to an A. The problem is that the ‘standardisation’ can’t identify the correct 64 students to knock down. It will be a lottery with different dice for different types of school (which are associated with different social classes).
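For what it’s worth, the arithmetic behind those figures can be checked in a few lines (the 100/177/64 numbers are as quoted above; the assumption that the model cannot tell deserving predictions from undeserving ones is the ‘lottery’ point):

```python
# Checking the arithmetic above (per 100 students truly worthy of an A*,
# teachers predicted 177, and standardisation knocks about 64 back to A).
worthy, predicted, knocked_down = 100, 177, 64
kept = predicted - knocked_down    # A*s surviving standardisation
p_keep = kept / predicted          # chance any given predicted A* survives
# If the model cannot tell deserving predictions from undeserving ones,
# the expected number of worthy students losing their A* is:
worthy_downgraded = worthy * (1 - p_keep)
print(kept, round(p_keep, 2), round(worthy_downgraded))  # 113 0.64 36
```

So on these figures, roughly a third of the genuinely A*-worthy students would lose their A* purely by chance.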
Suppose you are a talented student:
If you’re in a poor performing comprehensive, the dice are loaded against you. If you’re in a medium comprehensive, the dice are all over the place. If you’re in a high performing grammar or private school, then the dice predict your grades more accurately.
That in itself is an unfairness that goes beyond whether more or less than 25% of this year’s grades will be accurate.
The other thing is the 25% inaccuracy in other years probably relates more to non-exact subjects. Maths and science grades are probably more accurate. But this year the grade lottery affects all subjects, so it’s more unfair for those subjects.
Thank you; my “30%” is a number I have been using for some time, based on the estimate of 37% suggested by FFT Education Datalab in their answer to a question I asked relating to their report of 15 June (https://ffteducationdatalab.org.uk/2020/06/gcse-results-2020-a-look-at-the-grades-proposed-by-schools/). I rounded down by quite a bit to ‘play safe’ – your estimate could well be spot-on, which is fine – and the number I’ve been using is still safe!
You are also right in saying that exam grades in some subjects are more reliable than in others – according to Ofqual’s data, Maths (all flavours) is about 96%; History about 56%. Also, within a subject, the reliability differs (dramatically) by mark – even in Maths, scripts marked very close to grade boundaries are 50% reliable at best: you might as well toss a coin (https://www.hepi.ac.uk/2019/02/25/1-school-exam-grade-in-4-is-wrong-thats-the-good-news/)
So my “1 grade in 4 is wrong” statement is very much a “headline”, and I use it as a guide to the overall reliability of all grades over all subjects at all levels. For specific subjects, or for the clusters of subjects that a student typically takes at A level, the number is different, as you correctly identify.
Last year, about 6 million grades were awarded, and I think it is a fair estimate to say that about 4.5 million were ‘fair’ – the grade the student validly merited – whilst about 1.5 million weren’t. A few of the 1.5 million were the result of some form of marking or admin error, but at least 90%, if not 95%, were the result of ‘fuzziness’, and the problem with rank orders determined by a single exam mark.
This year about 6 million grades will be awarded too, and I still stand by my belief that more than 4.5 million will be fair, and fewer than 1.5 million unfair. So 1 million unfair grades this year still ‘beats’ my very low target, and that number is a desperate tragedy – but still a lesser tragedy than the last decade or so.
My conclusion is two-fold, and you know both elements well!
(1) A fair appeals process this year so that as many of the unfairnesses as possible can be rectified.
(2) The whole system needs a shake-up so that every student receives an assessment that is both fair and reliable – which, even for an exam, is not difficult to do (https://www.hepi.ac.uk/2019/07/16/students-will-be-given-more-than-1-5-million-wrong-gcse-as-and-a-level-grades-this-summer-here-are-some-potential-solutions-which-do-you-prefer/).
Also, there’s some more information published today by FFT Education Datalab, analysing their sample of 1,900 schools (about one-half of the state secondaries in England) by type of school – https://ffteducationdatalab.org.uk/2020/07/gcse-results-2020-optimistic-grades-and-school-type/. Here is one extract from their report:
“We would expect lower attaining schools to improve the most as a group, but what Ofqual’s statistical moderation process is unlikely to have been able to detect is which of those would have improved, and which would not, had exams been in place.”
That to me is further evidence of a need for a fair appeals process.
So, thanks again! Always good to talk!!!
Documents published by Ofqual have generally indicated that moderation will be carried out by their algorithm at a subject level and have not revealed what analysis will be carried out at Exam Centre level. In schools with performance which varies significantly from year to year this could result in some strange results.
I have an example of a different way of moderating a school’s results: not by each subject individually, but for the school as a whole. Unfortunately, I don’t think I can embed a picture here, but HEPI have kindly created this link – https://ibb.co/9WQFr6z – which depicts a table describing what’s going on.
In the imaginary scenario shown in the table, a school has results for the last 3 years for English and Maths. The Maths department did well in 2017, but their best maths teacher left at the end of 2017, and results have declined. The English department have changed their approach to teaching, brought in new resources and set up extra revision sessions and are on an upward trajectory.
The school submits total CAGs for 2020 which are 9% higher than their 3 year averages – Maths proposes the same grades as last year, but the English department is forecasting increased grades based on the very high results in January mocks and the generally high standard of work the students have been producing.
We know that Ofqual will not take account of trajectories, but the prior attainment data shows that the current cohort is 5% stronger than last year’s cohort, so the grades at that centre can go up.
There are many ways in which moderation could be tackled, this just looks at the difference in applying an algorithm to the exam centre as a whole and applying it at a subject level.
For Maths the CAGs are A,C,C,D,D. If moderated at a subject level, the good historic results for Maths mean that their grades are uplifted to A*,B,C,C,D. If moderated at a centre level they are reduced slightly as the centre as a whole has “over-bid” by 9%, so the results are A,C,D,D,E.
For English, CAGs of A*,A,C,C,D, are hit very hard in subject level moderation due to poor performance in 2017 and 2018, so the CAGs are slashed to B,B,D,D,E, but if a centre level approach is taken they are cut modestly to A*,A,C,D,D.
It seems obvious that the centre level approach has applied the same overall level of moderation in a much fairer way.
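For anyone who wants to reproduce the English part of the worked example mechanically, here is a small sketch. The per-student shift vectors are simply read off the table above; they are not derived from any real moderation model:

```python
# Reproducing the worked example mechanically. The per-student shift
# vectors below are read straight off the table; they are not derived
# from any real moderation model.
SCALE = ["A*", "A", "B", "C", "D", "E", "U"]

def moderate(cags, shifts):
    """Apply a per-student shift in grade steps (+1 = one grade down)."""
    return [SCALE[min(len(SCALE) - 1, max(0, SCALE.index(g) + s))]
            for g, s in zip(cags, shifts)]

english = ["A*", "A", "C", "C", "D"]
# Subject-level moderation: harsh, driven by the weak 2017-18 English history.
print(moderate(english, [2, 1, 1, 1, 1]))  # ['B', 'B', 'D', 'D', 'E']
# Centre-level moderation: the 9% over-bid is spread across the whole centre.
print(moderate(english, [0, 0, 0, 1, 0]))  # ['A*', 'A', 'C', 'D', 'D']
```

The contrast is stark: the same students lose between one and two grades each under subject-level moderation, but at most one grade under the centre-level approach.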
I agree that your suggestion would give fairer grades. For example:
For Westminster School from 2017-2019 the overall A* and A*/A rates varied by factors of 1.02 and 1.09 respectively. (BTW I find that level of consistency amazing). So your suggestion will work very well.
For my son’s school, over the same period, the overall A* and A*/A rates varied by factors of 3 and 2 respectively, so your suggestion would still not work well enough. But it would still be better than “standardising” at the centre-subject level.
However, it would introduce a new problem in that the different subjects within the same school will have to fight for a limited number of high grades.
“However, it would introduce a new problem in that the different subjects within the same school will have to fight for a limited number of high grades.”
But I would trust humans, rather than an inadequate statistical model, to sort that out, so your suggestion is definitely better. It should have been used, and the schools could – and should – have stepped up to that challenge, as they are in the best position to do so.
Your suggestion is another argument showing that Ofqual’s claim that what it’s doing is the fairest possible is a false claim.
What did the (now removed) notes on the slides explain? Particularly in relation to small cohorts – I find that slide entirely inscrutable.
My child is in a cohort of 9 for a physics A level, with huge variability in grades over prior few years, but with new teacher and lots of able students in class this year. She thinks she’ll be high in rankings and should be okay but is worried about friends who also need high grades for engineering and maths courses.
(It’s something I loathe about this system – the way they are compared to their classmate and know their own success could directly cause their friend’s desolation. Especially an issue in small cohorts – but DD doesn’t believe my suspicion that 9 is a small cohort and more weight will hopefully go on CAG.)
Hi Clare – thank you, and, yes, your loathing is indeed valid. Ugh.
The file I downloaded has notes to only slides 17, 18 and 19, the last being the ‘small cohort’ picture. Here they are – but I must repeat the health warning provided by Ofqual’s Dr Michelle Meadows, to the effect that they do “not aid the reader’s understanding”. Perhaps you would let us all know your view on that…
Slide 17, “Testing to find the best standardisation model”
“There are two main steps to the standardisation process. The first step is to produce a predicted grade distribution for every centre in each subject. There are a range of approaches that might have been taken to coming up with these predictions, so over the last couple of months we’ve been testing a wide range of approaches to identify the model that comes up with the most accurate predictions in the fairest way. The second step is to take those predicted grade distributions and combine them with the centre assessment grades and rank orders provided by centres. It is this second step which determines which candidates get which grades.
The approach that we have identified as being the most accurate and the fairest we are calling the Direct Centre-level Performance approach that I’ll describe now.”
Slide 18, “The direct centre-level performance approach”
Here are the five sources of evidence that exam boards have available to them this summer for the awarding of grades.
The first group of information is the historical performance data in the subject. This is a record of the grade that every candidate taking the qualification has achieved over the past three years. The number of years of historical data that is used depends on the type of qualification being considered, but the example shown here is based on three years of data.
The second source of data is the prior attainment for those candidates whose results data we have. Prior-attainment information is used routinely in any year for the purposes of setting and maintaining standards and can be a valuable source of evidence to identify changes in the ability of a cohort of students. When considering a GCSE, the prior attainment information is the candidates’ KS2 attainment. When considering AS and A level, the prior attainment data that is used is candidates’ mean GCSE grade. This source of data covers all of the prior attainment information for those candidates in the first box.
The third data source is the prior attainment information for candidates sitting the subject this year. This information is the same as that in the last box, but for candidates who would have been sitting their exams this summer. Of course, we don’t have the results data for these candidates, as that is what this process needs to determine.
The final two sources of evidence are the rank orders of candidates and the centre assessment grades. This is the information provided by centres on which we will rely to allocate grades to individual candidates.
The first step in the process is to generate a historical grade distribution for each centre entering candidates in the subject this summer. That means drawing on the historical results data, identifying candidates previously entering the subject with that centre, combining all of that data together and determining the proportion of candidates achieving each grade to produce this distribution.
Once we have that distribution, we need to consider whether it needs to be adjusted to better reflect the ability profile of students entering the subject this year. To do that we draw on two of the sources of data that I mentioned earlier: the prior attainment information for those candidates we just identified in the historical data, and the prior attainment of those who would have been sitting their exams with the centre this summer. This allows us to determine whether this historical grade distribution should be adjusted upwards or downwards compared to the historical data. In this example, the prior attainment for candidates entering the subject through this centre suggests that they are a slightly more able group and, therefore, the grade distribution is adjusted to have a higher proportion of grades at the top end and a lower proportion at the bottom. This produces the predicted grade distribution for the centre in the subject this year.
Once we have that grade distribution we need to determine the grades for each individual candidate at the centre. We know that the statistical model – and indeed any statistical model – will not be capable of predicting the individual grades for individual candidates. To determine which candidates get which grades we, therefore, rely on the rank orders which teachers have produced based on the relative ability of their students. To do this we take the proportion of candidates predicted to achieve each grade, as shown here with the bars from the previous plot stacked on top of one another and, next to that, we lay the rank order of candidates generated by teachers. This enables us to draw a line across, in the same way that candidates are divided up into grades in a normal year, to replicate the predicted proportions of candidates achieving each grade.
Once exam boards have allocated those grades, there is then a check at the national level to compare whether the results of this process appear to have been too generous or too severe on candidates this summer. At which point slight adjustments will be made before allocating the grades to individual candidates based on the rank orders teachers have provided.
There will be many instances where the prior attainment record for candidates is not available, usually because the student didn’t sit those earlier assessments. Where only a proportion of students in the centre entering for the subject have a valid prior-attainment record, this is taken into account when performing the adjustment. Where the whole centre has prior attainment information, the full adjustment described above is applied. Where no candidates have prior attainment information, no adjustment is possible on this basis and, therefore, the historical grade distribution for the centre is simply carried forward as the prediction for the centre this year. Where it is somewhere in between, which is the case for most centres, the adjustment is applied proportionally, based on the proportion of candidates with prior attainment.”
Slide 19, “Centres with small numbers of candidates entered in each subject”
“For centres with a small entry in the subject it is also necessary to take a slightly different approach which draws on the centre assessment grades more directly. Where a centre has either a small number of candidates entering for a subject this year or has had a small number of candidates entering in the past, it would be technically unsound to use the statistical evidence in the same way. The intention is therefore for exam boards to balance the sources of evidence differently depending on the number of candidates. Where the number of candidates is very small, those candidates will be awarded their centre assessment grades. Where there are slightly more candidates and we have slightly more confidence in the statistics, exam boards will be giving some weight to the centre assessment grades and some weight to the statistical evidence in a tapered way as the number of candidates increases.
When we have a relatively large number of candidates, our confidence in the stats is high
As the number of candidates reduces, so too does our confidence in the statistical evidence, down to the point that our confidence in the statistics is very low when there are very few candidates
As the confidence in the statistics gets low it’s important that we reflect that in the balance of evidence we use
The two sources of evidence we have are the statistics and the CAGs provided by teachers
Where the number of candidates is high, our confidence is high, meaning we put the balance of evidence on the statistics.
When there are very few candidates and our confidence is low, that balance shifts, with the CAGs becoming the primary source of information.”
I read that as the use of a formula with two thresholds:
hi = the cohort size at and above which ‘history’ is weighted 100%, and CAGs 0%
lo = the cohort size at and below which CAGs are weighted 100% and ‘history’ 0%
such that cohorts intermediate between lo and hi are proportionately weighted according to a formula something like
CAGs x (hi – cohort)/(hi – lo) + History x (cohort – lo)/(hi – lo)
It would be interesting to know what hi and lo actually are…
Thanks so much for all this, Dennis. Even as a layperson, in terms of both education and statistics, those notes did indeed aid my understanding of the slides greatly, and as such I am confused by Dr Meadows’ statement above.
None of this messy and limited information is helpful, as other parents have mentioned, to hard working teenagers who are angst ridden about the fact that it is possibly statistically impossible for them and their friends from the same class to all get what they need, based on school’s prior results.
The other thing I have observed impacting the mental and physical health of this cohort is the short time available to study for exams in October, after results come in. As a result, my daughter and her friends have continued studying extremely hard in case they have to take them, adding considerable stress to their summers. It seemed like this had been recognised, with the initial announcement that results would be released earlier, only for that to be dashed away.
Anyway, thank you again for all your analysis. And yes, how to know what hi/lo numbers actually are!
…thank you; and I do hope everything works out as you and your daughter hope…
> I must repeat the health warning provided by Ofqual’s Dr Michelle Meadows, to the effect that they do “not aid the reader’s understanding”. Perhaps you would let us all know your view on that…
It appears that Dr Meadows is not telling the truth.
Here’s a more likely explanation: the notes were removed precisely because they did aid the reader’s understanding. Ofqual has justified keeping its model secret by saying that if the details were released then some teachers would be able to work out what their students’ grades would be in advance of results day.
The speaker notes that Dennis has kindly shared allow some teachers to do precisely that. They say “Where the number of candidates is very small, those candidates will be awarded their centre assessment grades.” This information does not appear in the slides (which say only that small cohorts will be subject to less statistical adjustment), and it allows teachers with “classes” of a single student to work out now what the final results will be.
With 9 students spread over 7 grades A*-U, you will have very small numbers in each grade band. A*, A, E and U will have even smaller numbers. So a cohort of 9 must count as very small.
It’s sad that Dr Michelle Meadows spun the removal of the slide notes into “We removed these because, in the absence of a presenter, they did not aid the reader’s understanding.”
Clearly they aid the reader’s understanding more than their absence does. They don’t fully clarify the issue (but then neither do Ofqual’s answers to our enquiries on the details of its model), but they do help the reader to understand that slide better.
It looks like the real reason for the removal of the notes is that they aid the reader’s understanding more than Ofqual would like them to.
The results from Scotland are out, and there’s a lot of information here (https://www.sqa.org.uk/sqa/64717.html), including a description of their methodology here (https://www.sqa.org.uk/sqa/files_ccc/SQAAwardingMethodology2020Report.pdf), which I haven’t been through yet. There’s also a summary on https://www.sqa.org.uk/sqa/files_ccc/SQAChiefExaminingOfficer2020NQReport.pdf?fbclid=IwAR3_AfffsXUXw8cul2wDfaBrPO7NXAIgbx__U4ey9tmJiznLj2KEinvk-Ww.
Some of the headline figures are, for the % A – C grades as actually awarded:
Advanced Higher (= A level): 84.9% (2019: 79.4%), a difference of 5.5 percentage points
National 5 (= GCSE): 81.1% (2019: 78.2%), a difference of 2.9 percentage points
5.5 percentage points of grade inflation for A level is quite large; I wonder why the SQA chose that number, for they could easily have drawn their line lower.
They also give information on the CAGs; for example, the % A – C ‘bid for’ was:
Advanced Higher: 92.8%, this being 13.4 percentage points above 2019, and 7.9 above the 84.9% they allowed.
National 5: 88.6%, this being 10.4 percentage points above 2019, and 7.5 above the 81.1% they allowed.
Overall, the total number of grades awarded was 511,070 (rather less than 10% of England).
377,308 (73.8%) of the corresponding CAGs were confirmed, and 133,762 (26.2%) were changed, 124,564 down and 9,198 up.
Scottish exam results:
“A total of 133,000 individual results were adjusted by the Scottish Qualifications Authority (SQA) from the initial estimates of grades that were submitted by teachers – a quarter of the total.
Of these, 6.9% of the estimates were adjusted up, while 93.1% were adjusted down. Almost all (96%) were adjusted by a single grade.
SQA figures also showed that the Higher pass rate for pupils from the most deprived backgrounds was reduced by 15.2 percentage points, compared to only 6.9 percentage points for the wealthiest pupils.”
This is what I fear for A-level results in England and Wales: talented students who go to comprehensive schools will be more likely to have their predicted grades knocked down than those who go to private and grammar schools.
I’m getting a bit confused about the Scottish Highers results. Some people are reporting a 5.5% increase in pass rate, others a lower increase. Then it’s being said that the rate has fallen more for disadvantaged students than for wealthier ones, but has fallen for both.
Can someone clarify?
Gordon: the Higher pass (grade A-C) rate is up from 74.8% (2019) to 78.9% (2020).
Two possible explanations for your 5.5% confusion:
(1) the Higher rate has been reported as a 5.5% increase, since 78.9 is 105.5% of 74.8
(2) you’re thinking of the Advanced Highers pass rate (which has increased from 79.4% to 84.9%)
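The first explanation turns on the easy-to-miss difference between a relative change and a percentage-point change. Using the Higher A – C rates quoted above:

```python
# The two readings of "a 5.5% increase", using the Higher A-C rates
# quoted above (74.8% in 2019, 78.9% in 2020).
prev, now = 74.8, 78.9
point_change = now - prev                  # 4.1 percentage points
relative_change = (now / prev - 1) * 100   # ~5.5% relative increase
```

So the same pair of figures can honestly be reported as “up 4.1 points” or “up 5.5%”, which is very likely the source of the conflicting reports.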
A couple of thoughts if I may…
Forgetting CAGs for the moment, and just focusing on the actual grades as awarded, the definitive data source, I think, is the set of the Excel spreadsheets that can be downloaded from
Page ‘scqf 7’ gives the results by subject for Advanced Highers, Scotland’s A levels: This year, 84.9% of candidates were awarded grades A – C; in 2019, 79.4%. This year’s number is 5.5 percentage points greater.
Likewise, page ‘scqf 5’ is for National 5 = GCSE: this year, 81.1% of candidates were awarded A – C; in 2019, 78.2%. This year’s number is 2.9 percentage points higher.
I find the BBC report “SQA figures also showed that the Higher pass rate for pupils from the most deprived backgrounds was reduced by 15.2 percentage points, compared to only 6.9 percentage points for the wealthiest pupils” a bit of a muddle.
It would be quite reasonable for ‘Alex Reader’ to interpret that as “this year, students from deprived schools have had their grades pushed down 15.2%, but those rich kids, only 6.9%” – applying those numbers to actual grades, perhaps as compared to last year. Or even, and very differently, “this year, kids from deprived schools could pass their exams with a mark 15.2% lower than an ‘ordinary student’, while rich kids could pass with a mark 6.9% lower”. In fact, the more I think about it, the more likely that second explanation appears to be consistent with the words – however bizarre that actually is!
I think these are most unlikely. Far more likely – but requiring more knowledge – is “This year, deprived schools have submitted very ‘optimistic’ CAGs for Scottish Highers (= AS), bidding for 94.1% of candidates to be awarded grades A – C. In fact, only 78.9% of candidates have actually been awarded A – C (that’s page ‘scqf 6’), and so these schools overbid by 15.2 percentage points. Private schools also overbid, seeking to award 85.8% of their candidates A – C, 6.9 percentage points greater than the actual award.”
If these figures are true, then a higher proportion of ‘deprived’ CAGs were downgraded than for ‘wealthy’ CAGs. There are, I think, two possible reasons:
(1) The statistical model is intrinsically unfair, or biased against deprived schools.
(2) For whatever reasons, teachers in deprived schools were more ‘optimistic’ than those in ‘wealthy’ schools, so their CAGs were systematically more over-the-top.
It would be “interesting”, and I think important, to untangle those two, and to ask the question “why?”. And also to ask “Would this not have happened had the SQA (and maybe Ofqual too) been much clearer on the rules?”.
As I noted earlier, 5.5% grade inflation for Advanced Highers is to me quite large, and triggers the story of the three teachers:
Teacher 1 anticipated “no grade inflation”, and submitted a grade profile right on that button.
Teacher 2 was a bit optimistic, and submitted grades representing about 5% grade inflation.
Teacher 3 was over-the top, and submitted grades at 15% grade inflation.
Is the outcome that Teacher 3’s cohort are all downgraded; Teacher 2’s grades are accepted, as are Teacher 1’s?
Is Teacher 1 thinking “I must be a right charlie, playing what I thought was the game honestly. I have two students who might have been awarded As if only I’d pushed the boat out a bit more…”?
Have honest Scottish teachers been penalised? And could that have been avoided if the SQA – and Ofqual too – had been honest with everyone at the start, and said “this year, because of the special circumstances, we will allow [5%] Advanced Higher grade inflation. If you go 0.01% over that, we’ll knock you back. But keep within that and you’ll be OK”?
It’s all about the clarity of the rules…
Well, my explanation of the BBC quote is wrong!
I’ve just seen an item on TES (https://www.tes.com/news/SQA-results-day-poorest-far-more-likely-have-higher-pass-downgraded) that provides some more, and more important, information:
“Based on teacher judgements this year, the Higher pass rate for Scotland’s most deprived pupils would have been 85.1 per cent – but after moderation it was lowered to 69.9 per cent, figures published today reveal.”
I’m not sure if that means “the aggregate of all pupils known to be deprived, regardless of their schools” or an aggregate over “deprived schools” or “schools with more than [x]% of deprived pupils”, but whatever that means, it seems that 85.1% of that population were given CAGs of A – C.
After the standardisation model was run, and the final grades determined, then it seems as if 69.9% of that same population ended up with A – C, 15.2 percentage points different – which is the BBC’s number.
That explanation is different, and seems to make sense: that key number 85.1 doesn’t appear in the BBC report; the only reference to “85” in that report is a quote by Nicola Sturgeon, which, by itself, is a bit out of context.
Ho hum… interpreting these numbers can be quite a challenge!!!!
> If these figures are true, then a higher proportion of ‘deprived’ CAGs were downgraded than for ‘wealthy’ CAGs.
I think this is the right interpretation, and I’d like to suggest why this has happened.
In private schools and schools in wealthy areas, achievement is usually very high. In some cases, there is not really any scope for optimism. For example at Jordanhill, a school in a wealthy area of Glasgow, 92% of children passed 5 or more National 5s in 2019. So even if teachers were as optimistic as possible, and estimated that every pupil would pass, if the grades were adjusted down to last year’s level, that’d only be an 8% drop. Prediction in these circumstances is really easy: teachers can just pick random numbers somewhere between last year’s results and 100%, and they can be sure that the adjustments will be pretty small.
In contrast, in schools in deprived areas, exam performance is a lot more variable. There’s more scope for optimism, and prediction is harder, so we should expect adjustments to be larger.
One more point. You suggest as a possibility:
> The statistical model is intrinsically unfair, or biased against deprived schools.
If it’s the same approach as Ofqual’s then it’s not biased against deprived schools, exactly. Schools will receive the same average marks (with a little inflation thrown in) as they usually do. But it’s severely biased against high-performing pupils in deprived schools, who have been robbed of the opportunity to outperform their environment.
Splendid! Great points! Thank you!
A further thought on the ‘Three Teacher’ story (four comments above)…
…or did the statistical standardisation algorithm shift the honest teacher’s grades up? Might that explain the 9,198 grades that did in fact go up?
Just had a quick look at the Excel spreadsheet and there appears to be some massive grade inflation! In the sheet “scqf 7”, which looks like the Advanced Highers sheet, it says that in 2020, 38.4% of candidates were awarded an A, compared to 31.8% in 2019. That is an increase of almost 21% in the top grade.
In UCAS points, A at Advanced Higher is equivalent to an A* at A level – 38.4% of candidates got the equivalent of an A*?!!! It appears that 40.9% of all candidates for Maths got an A, imagine 40.9% of A level candidates getting A*!
Have I read the figures correctly? Do the SQA really give almost 40% of Advanced Highers candidates the equivalent of an A*?
Good morning Tania
Your reading of the spreadsheet looks pretty good to me!
And as you surely will have seen too, grade B is also up (27.6% this year, 24.9% last), with grade C down (18.9% this year, 22.7% last) – so that’s a ‘drift’ to the higher grades.
Grade D is up a little (9.6%, 8.4%), but grade U has more-than-halved (5.5%, 12.2%).
The inflation of grade A (6.6 percentage points) is greater than the A – C ‘official’ measure (5.5), so it looks as if the SQA were more concerned about the overall A – C measure than about the distribution over grades.
Is this a tacit legitimisation of the game-player? I have a cohort of 10, and the A – C measure is, say, 60%. So I submitted 6 grade As and 4 grade Ds… Did I get away with it?
‘scqf 5’ (= GCSE) shows a broadly similar pattern – A and B up; C, D and U down – but in this case the overall A – C measure (2.9 percentage points) is higher than the increase in grade A (2.4), so the up-drift is less pronounced.
It would be fascinating to analyse individual school’s submissions to identify the extent to which submitted grades were in line with “no grade inflation”, or systematically rounded up, or were within the upper limit of statistical variation over the last three years, or were maybe just playing games… As I said in an earlier comment, if only the rules had been made clear from the outset…
Hmm, looks to me like the Scottish students should be on the whole very happy with that drift to higher grades and fall in fails. Of course the statistical method will have produced unfairly low results for some candidates, but they are able to appeal if the school can show that their grades were unfairly reduced. The only losers are the people who were hoping to get university places in Clearing as presumably a higher number of people will have achieved their offers.
Just reflect – a D in Advanced Highers is worth 32 UCAS points, which is a C at A level. In UCAS terms, 94.5% of AH candidates got equivalent to A level C or above. And then they pay no university fees – border envy!
I do not think English students will have such an easy ride.
It will be worse for English and Welsh students. 25% of Scottish predicted grades have been downgraded and the students have access to free appeal. 40% of English and Welsh predicted grades will be downgraded and the students don’t have access to proper appeal.
If your prediction is accurate, it follows that the chance of a student keeping all three teacher-assessed grades is less than 22% (60% x 60% x 60%). Each subject will have the same probability and, with most students taking three subjects at A level, there are eight grade permutations – assuming that the range of outcomes is limited to two grades (e.g. A/B, B/C and so on). In Scotland, it is alarming to see a number of students who have dropped two or more grades from their predicted ones. Even if A level students in England are only at risk of dropping one grade per subject, it seems likely that thousands will not meet the conditions of their first-choice university. It is little wonder, then, that Ofqual are asking universities to be flexible with students. Add to this the negligible grounds for appealing grades and we are on the cusp of a perfect storm.
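The arithmetic in that comment can be checked in two lines. The 60% survival rate per subject is, of course, an assumption carried over from the preceding comment, and the calculation also assumes the three subjects are independent, which in practice they may not be (the same centre-level model touches all of them).

```python
# Sketch of the arithmetic above: if each of three subjects
# independently has a 60% chance of the CAG surviving moderation,
# the chance that all three survive is 0.6 cubed. The independence
# assumption is the weak point, not the multiplication.
p_single = 0.60
p_all_three = p_single ** 3    # 0.216, i.e. less than 22%

# And with each subject either kept or dropped one grade, three
# subjects give 2^3 = 8 possible grade permutations.
permutations = 2 ** 3
```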
It’s worse if you need high grades for your offers. Oxbridge STEM entrance requirements are often A*A*A and offers can be higher. Engineering entrance requirements for Russell group are often A*AA. Medicine entrance requirements are probably about the same.
By my calculation, 36% of predicted A*s and 43% of predicted As will be downgraded.
It’s disappointing that the educational leaders and MPs are mostly silent.
Dear Dr Meadows, as you have shown an interest in this blog previously perhaps you’d want to consider this. An article in TES today with regards to the situation in Scotland quotes the following –
On Scottish Qualifications Authority results day yesterday, the SQA also published a methodology report on how it awarded students their grades.
Statistician Professor Guy Nason, of Imperial College London’s department of mathematics, has analysed the SQA document in a paper that includes the following key observations:
1. The SQA process of awarding grades should be like a driving test – but it’s not
“In a national exam, a student might have their grade changed, but this would only be loosely influenced by other students as the whole national cohort is used to form grade boundaries,” says Professor Nason. “However, in the SQA’s new standardisation process, a student might have their grade changed as a result of what students and teachers in their local centre had been doing. By contrast, if you take a driving test in the UK, it is a national test set to national standards. Your result should not depend on what has been happening in your local town.”
2. Even for a leading statistician, it is difficult to understand some of what the SQA means
On one passage of the SQA document, Professor Nason asks: “Is this statement saying, ‘We tried different fudge factors to get the outcome we want’? It is essential that the SQA publishes full algorithmic details about what is going on here, preferably with anonymised sample data for us to even begin to understand what is being attempted, and then we can come to some assessment of whether it is suitable and fair.”
3. It was an advantage to take a course in a school or college with no history of offering that course in the past
“If your teachers are typical, then there will be a tendency to over-predict and, since you are in a new centre, your grades will not be modified [as they are elsewhere],” says Professor Nason. He adds that “the differing treatments of new, recent and established centres again means that we have to question whether these are ‘national exams’ with all students being treated in the same way”.
4. SQA needs to publish more details about its methodology
Professor Nason says: “In due course, we would expect to see the full publication of algorithms and sample data sets so that the community can come to a mature understanding of what has happened and, hopefully, feed positive alterations into future models. Reading about the SQA’s process is enlightening, but the wordy explanation is sometimes confusing and ambiguous.”
5. The priorities of the SQA’s system of standardising results are seriously questionable
“The problem at the heart of the statistical standardisation is that it can be simultaneously unfair to individuals, but also maintain the integrity of the system,” says Professor Nason. “However, if system integrity damages the life chances of individuals, then it is not much of a system.”
Dr Meadows, given that Ofqual’s model would seem to operate in the same way (although we can’t tell yet, as you’ve declined to release the details), how would you argue that the same points couldn’t be made about your model? Is Ofqual happy to damage the life chances of individuals in pursuit of maintaining the “integrity of the system”? Because, unlike in Scotland, students in England won’t have the right to appeal the outcome of the model’s workings.
“The SQA process of awarding grades should be like a driving test – but it’s not”
It shouldn’t be like a driving test. There was no test: that’s the entire reason for the process.
“if you take a driving test in the UK, it is a national test set to national standards. Your result should not depend on what has been happening in your local town.”
But the Department for Education gave Ofqual the brief of developing a system for awarding grades without assessments and without permitting a significant difference in the distribution of grades between 2019 and 2020. While exam boards’ marker standardisation is intended to establish something like a national standard, there is always a degree of norm-referencing, as the grade distributions are required to be stable year-on-year. That’s why we get nonsense like GCSE maths exams with a pass mark (grade 3/4 boundary) of 17% in the first year of the new-specification GCSEs.
“Exam results appeal process ‘more important’ than ever, says Nicola Sturgeon, amid criticism over ‘discriminatory’ grading”
Why is it that Scotland can have that view but not England and Wales?
“Scotland’s exam result fiasco is coming to England”
“GCSE and A-level pupils will receive life sentences based on dubious models with no real right of appeal”
It looks like the appeal rules are being interpreted more generously now – https://www.gov.uk/government/news/appeal-arrangements-for-as-a-levels-and-gcses
It used to imply that a major cohort change was required for an appeal; now it says:
“or where – because of the ability profile of the students – a centre was expecting results this year to show a very different pattern of grades to results in previous years. That could include where the grades of unusually high or low ability students been affected by the model because they fall outside the pattern of results in that centre in recent years. In most cases, this will only be apparent by reviewing centre wide data. Therefore centres, rather than individual students, will be best placed to consider whether this has occurred”
Promising! I would be happy if the school had to support any appeal, as it did in the past, although that was a loose requirement.
This contains a claim about the “small cohort” numbers: a subject cohort of 5 and under gets the CAG; 6–15 is on the sliding scale; at 16 and over it’s the algorithm.
I’m not a statistician, but these numbers seem incredibly small as a basis for reliable data, especially as subject cohort size changes every year.
A cohort of 6 will often mean only 1–5 in one or more of the prior 3 years; will the grades of these 6 really carry weight in the model with such minimal historical data? Likewise, does a cohort of 16, with prior cohorts of 15 and under, give no weight at all to the CAGs?
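The point about small cohorts can be illustrated with a quick simulation. This is a hedged sketch, not Ofqual’s actual model (whose details remain unpublished): it assumes a hypothetical “true” grade distribution for a centre and shows how wildly the observed grade proportions of a 15-student cohort swing from one draw to the next, purely by chance.

```python
import random

random.seed(42)

# Hypothetical historical grade distribution for one centre,
# across the 7 A level bands A*-U (these proportions are invented
# for illustration only).
grades = ["A*", "A", "B", "C", "D", "E", "U"]
hist_dist = [0.05, 0.15, 0.25, 0.25, 0.15, 0.10, 0.05]

def sample_cohort(n):
    """Draw a cohort of n students from the 'true' distribution and
    return the observed proportion of the cohort in each grade band."""
    counts = {g: 0 for g in grades}
    for _ in range(n):
        r = random.random()
        cum = 0.0
        for g, p in zip(grades, hist_dist):
            cum += p
            if r < cum:
                counts[g] += 1
                break
    return [counts[g] / n for g in grades]

# Three cohorts of 15 drawn from the SAME underlying distribution:
# the observed proportions differ substantially between draws, which
# is the sampling noise any model fitted to such cohorts inherits.
for trial in range(3):
    print([round(p, 2) for p in sample_cohort(15)])
```

With only 15 students spread over 7 bands, a single student moving band shifts a proportion by almost 7 percentage points, so year-to-year “patterns” at this cohort size are largely noise.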
I thought I posted this last night but it does not seem to have worked.
It looks like Ofqual is relaxing the interpretation of appeals on the grounds that the cohort is different from previous years, without admitting that anything is changing.
Previously I replied to you, “With 9 students spread over 7 grades A*-U, you will have very small numbers in each grade band. A*, A, E and U will have even smaller numbers. So a cohort of 9 must count as very small.”
Unfortunately I didn’t expect how far Ofqual would go in abusing statistics. A cohort of 15 means you have 15 students spread over 7 grades for A levels (A*-U) and over 10 grades for GCSEs (9-U), so the number of students in each grade band is really, really small, far too small for sensible statistical modelling.
Also, the reliability of Ofqual’s prediction (and it is a prediction, not standardisation) doesn’t just depend on cohort sizes, it also depends on the cohorts’ variability.
It is a good example of how not to use statistics.
After months and months of telling us that they are refining the details of their model, all we get is something that is so crude.
I don’t know what the statisticians at Ofqual have been thinking.
I think any real statisticians at Ofqual were overruled long since. I wouldn’t be surprised if it were a disgruntled statistician that leaked the 5,15 band. Still don’t know what the numbers really mean, though; there are both prior cohorts and the current one to consider. If the 15 number were reasonable (it doesn’t look it), then it should mean that there need to be at least 45 spread over the prior three years; or, if the syllabus only started the previous year, 45 in that year alone. But I am getting used to the way Ofqual behaves, so I’ll bet heavily against that. Once they have decided to ignore proper confidence limits at one point in the calculation, they might as well ignore them at every stage of the calculation.
Information on appeals is out: https://www.jcq.org.uk/wp-content/uploads/2020/08/JCQ-June-2020-appeals-guidance.pdf
It’s a two-(or more)-stage process. Each stage takes up to 42 days. Centres which appeal on 13 August might hear back by 5 November.
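A quick back-of-the-envelope check of that timeline, assuming both stages run to the full 42 days (the dates are derived from the figures in the JCQ guidance, not stated separately by it):

```python
from datetime import date, timedelta

# An appeal lodged on A level results day, with each of the
# two stages taking the maximum 42 days allowed.
lodged = date(2020, 8, 13)
stage = timedelta(days=42)

after_stage_1 = lodged + stage       # first-stage outcome
after_stage_2 = after_stage_1 + stage  # second-stage outcome

print(after_stage_1)  # 2020-09-24
print(after_stage_2)  # 2020-11-05
```

So a two-stage appeal started on 13 August could indeed conclude on 5 November, weeks after most university terms begin.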
So the appeal results will be far too late for university admission. And it’s after the A-level Autumn exams have started, which puts the student in a strange situation.
Ofqual has in effect shifted the responsibility for handling appeals on to universities and post-16 education providers, which will have to decide, when a student misses their offer grades, whether that was due to the unreliability of the standardisation or due to the student’s ability.
Hi everyone – thank you for your lively comments!
I write this on 20 August, GCSE results day.
Ofqual didn’t do too well on the ‘Slide 12’ paper… and seems to have come something of a cropper as regards the whole exam…