The UK's only independent think tank devoted to higher education.

Two and a half cheers for Ofqual’s ‘standardisation model’ for GCSE, AS and A level grades – so long as schools comply

  • 18 May 2020
  • By Dennis Sherwood

This blog has been kindly written for HEPI by Dennis Sherwood of Silver Bullet Machine, who has been tracking the state of this year’s public exams for HEPI.

On Friday, Ofqual announced the key principles that will be used to ensure this year’s GCSE, AS and A level grades will be ‘as fair as they can be’. Since there will be no exams, for each subject:

  • Each school has to submit, to the appropriate exam board, their suggested ‘centre assessment grade’ for each candidate, and also the rank order of candidates.
  • To ensure that grading is fair across the country, the board will then apply a ‘standardisation model’ to compare the submitted centre assessment grades to historical data relating to the actual grades awarded to that school’s candidates in 2017, 2018 and 2019 for A level, and, for GCSE, those years in which the exams have been graded 9, 8, 7….
  • If a school’s centre assessment grades are different from those resulting from the ‘standardisation model’, some or all of the grades will be adjusted before being issued, but the submitted rank order will not be changed.
  • Overall, the board will make sure that, at a national level, grade distributions are broadly in line with previous years.

To me, this all makes good sense. The rules are simple. There are no ‘behind-the-scenes’ statistics and the process can be replicated at every school. So teachers can have confidence that their centre assessment grades, submitted in compliance with their historical averages, will have a high likelihood of being confirmed rather than over-ruled. A measure of success of the process is therefore the ratio of confirmed centre assessment grades to the total number submitted, as determined for each school, each subject, each board and overall. The closer this number is to unity, the better.

Ofqual’s key objective is to prevent grade inflation – which is what the fourth bullet point is all about. To achieve that, the distribution of grades for each subject within each school must be the same as the average over recent years, hence the ‘standardisation model’. If this is the case for each school, then the aggregate will work too.

Subject to one nasty problem, as illustrated in this table, which shows the numbers of candidates awarded grade A* in six different schools over each of the last three years:

The total number of A*s awarded each year was 58, 62 and 60, and so from the board’s point of view, ‘no grade inflation’ means about 60 awards this year, and certainly not more than 62.

Each school has an average of 10, and a range of ± 2. If each school submits 10 this year, the sum is 60, which is fine.

But suppose each school says, ‘This has been a good year, and we think 11 students merit an A*. We’d really like to go for 12, but we appreciate that’s pushing the boat out; 11 should be safe.’

So the six schools each submit 11 candidates, giving a total of 66. Even though each school has behaved reasonably, grade inflation occurs. The board must intervene.

But how? Does the board go to each school and say ‘please explain?’, in response to which the school will indeed explain.

School C, for example, might claim that their data shows they have been on an improvement path, and so their 11 grade A*s are justified. But by the same token, if school C is improving, school D is declining. Should they be awarded only six?

Every school will have a reason why ‘we are a special case’, and these will be impossible to judge fairly. So to me, the most sensible option is for the board to apply exactly the same rule in exactly the same way to everyone, and reduce each school’s number to 10. That’s why the boards need the rank orders, so they can down-grade the lowest-ranked students.

To prevent grade inflation, for every school that submits 11, another has to submit 9. Which just won’t happen. So it’s in everyone’s interests to submit the average, 10.
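The arithmetic of the six-school example can be checked with a minimal sketch. The school labels, the function name `cap_to_average` and the "cap at the historical average" rule are illustrative assumptions drawn from the example above, not Ofqual's published method.

```python
# Hypothetical figures from the worked example in the text:
# six schools, each with a three-year average of 10 A*s, each submitting 11.
historical_average = {school: 10 for school in "ABCDEF"}
submitted = {school: 11 for school in "ABCDEF"}


def cap_to_average(submitted, historical_average):
    """Cap each school's submitted A* count at its historical average (assumed rule)."""
    return {
        school: min(count, historical_average[school])
        for school, count in submitted.items()
    }


total_submitted = sum(submitted.values())
confirmed = cap_to_average(submitted, historical_average)
total_confirmed = sum(confirmed.values())

# Ratio of confirmed grades to submitted grades: the measure of success described earlier.
confirmation_ratio = total_confirmed / total_submitted

print(total_submitted)               # 66
print(total_confirmed)               # 60
print(round(confirmation_ratio, 3))  # 0.909
```

Had every school submitted its average of 10, nothing would have been over-ruled and the ratio would be exactly 1.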

But what about poor Isaac at school G? He is particularly gifted at Physics, and his school recommends him for an A*, even though the school has not achieved above grade B for years. The submission on behalf of Isaac will easily be identified as an outlier and so is quite likely to be disallowed. Isaac, however, will not be consulted; nor will his teacher. So Isaac will be awarded grade B, consistent with his place at the top of the rank order. He will be a victim, and his school too, for this year’s process traps all schools as prisoners of their pasts.

But before we weep too much on Isaac’s behalf, let us remember that Isaac is just one of the huge number of people disadvantaged (to say the very least) by this most pernicious virus, and although this is a pity, many people have suffered far more gravely, and without recourse to the autumn exam at which Isaac can prove his A*++. 

And although Isaac is indeed a victim this year, let us not forget the 750,000 annual victims of the exam system in England – those who, in each of the last several years, were awarded a grade lower than they merited. Neither they, nor anyone else, knows this has happened, so they truly are victims of an unseen, unreported and unpunished injustice.

Despite these problems, I still believe that this year’s grades, resulting from the rank orders submitted on the basis of teacher judgement, will be fairer than those based on the rank orders determined by exam marks.

To illustrate this, here is a chart from my simulation of the marking and grading of 2019 A level Economics:

Suppose ten students from the same school take the exam, and each receives marks from 55 to 64 inclusive, as shown on the left. Candidate A is given the highest mark; candidate B the lowest. On the right are the results of my simulation of what happens when each script is marked by a different examiner, with all examiners, in both cases, drawn from the same pool. As can be seen, most of the marks are different: not because marking is sloppy, but because marking is ‘fuzzy’ – or, to use Ofqual’s words, because ‘it is possible for two examiners to give different but appropriate marks to the same answer’.

The consequences of fuzziness are dramatic. Candidate B is now ranked fourth, higher than candidate A, who is no longer ranked first, but ninth. Much else has changed too.

If all marks from 53 to 66 fall within the same grade width, then these changes in rank order do not matter. But if there are any grade boundaries within this range, the grades corresponding to the marks on the left are highly unreliable. Which is one of the explanations behind Ofqual’s infamous statement that ‘more than one grade could well be a legitimate reflection of a student’s performance’.
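A toy version of this effect can be sketched in a few lines. This is not the simulation behind the chart: the candidate labels, the uniform spread of up to 2 marks either way (chosen so that re-marks fall between 53 and 66) and the fixed seed are all assumptions made for illustration.

```python
import random


def remark(marks, fuzz=2, seed=0):
    """Return a second set of marks, each shifted by a whole number of marks
    in [-fuzz, +fuzz], standing in for a different examiner (assumed noise model)."""
    rng = random.Random(seed)
    return {name: mark + rng.randint(-fuzz, fuzz) for name, mark in marks.items()}


def rank_order(marks):
    """Candidates sorted from highest mark to lowest; ties broken by label."""
    return sorted(marks, key=lambda name: (-marks[name], name))


# Ten hypothetical candidates with original marks 64 down to 55.
original = {f"cand{i:02d}": 64 - i for i in range(10)}
second = remark(original)

print(rank_order(original))
print(rank_order(second))
```

Comparing the two printed lists shows how far each candidate moves; changing `seed` gives a different, equally legitimate second marking.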

The rank orders used to determine GCSE, AS and A level grades have been unreliable for about a decade. Surely the rank orders based on teacher assessments will be more reliable, and the resulting grades therefore fairer.

And even if poor Isaac is a victim of this year’s process, there will be far fewer victims than in previous years. That’s why this year’s process is very likely to give fairer results than hitherto – and more trusted too, especially if all schools submit centre assessment grades that comply with the rules, so that the ratio of confirmed grades to the number of grades submitted is nearly one.

14 comments

  1. Kieran McLaughlin says:

    Fascinating as ever

  2. Charles Ben-Nathan says:

    I think there is a flaw in the conclusion that the teacher assessment will only be more trusted if grades fall in line with previous years.

    As teachers we are best placed to know how students will perform and the levels of the understanding of their subject. That said, all teachers will know that on results day they are often surprised by those students who over perform and those who under-perform. This is impossible to account for with CAGs and rankings. By the very definition of the words “over perform” and “under-perform” these are deviations from expectation. We can’t expect the unexpected, at least not so precisely. Furthermore, it is much easier to under-perform than over perform.

    For a student to make an unexpected leap up a grade may be evidence of some last minute hard work that teachers were unaware of, of elements of the subject suddenly clicking, and of course some marginal pupils will tip themselves over a boundary. There is also the randomness of marking that you do allude to. In terms of the boundary pupils, some marginal pupils tip themselves the other way, and also the vagaries of marking will push some pupils below their ‘true scores’. However, I suspect that in terms of exam day performance there is really only net downward pressure. Pupils are much more likely to go into an exam and forget crucial information they know than have information arrive in their minds that they never learnt. Similarly, they are more likely to make mistakes with technique than have a lucky guess, or the perfect paragraph appear.

    This means that exam grades and results generally are always lower than they ‘should’ be and what students deserve based on their proven ability in the subject. I would disagree with your conclusion and say rather than a modest increase in grades discrediting teacher assessment, a maintenance of the grades would discredit teacher assessment which would be then seen as falling into line with and condoning a very flawed exam system.

  3. Rob Cuthbert says:

    Who will be the one to tell Isaac: “you’re suffering unfairly but most people aren’t, so that’s OK”! The difference here is that unfairness is a direct consequence of the system adopted, not an unintended byproduct of fuzzy marking. As I understand it there is no appeal against anything but arithmetical or similar errors in the process. The standardisation method itself cannot be criticised, and it is designed to deliver a similar profile of results overall, not to be fair to individuals. Penalisation of schools/centres with poor track records is therefore baked into the method, and there is no recourse other than an Autumn ‘resit’ – which probably imposes a gap year for all who wish to resit, with inevitable financial and academic penalties. Is planned unfairness better than accidental unfairness in a flawed system? It might take a judicial review to find the answer.

  4. Hi Rob – you raise very valid, and important points, in a most articulate way! And I agree with you about the unfairness to Isaac.

    But I don’t think I agree that the unfairness of exam marking is an “unintended byproduct”. It may not have been deliberately intended at the outset, like some form of cruel punishment. Indeed, that is not the case, for in the ‘old days’, great care was taken over candidates close to boundaries. That has long since gone for school exams, and the continued existence of the problem is to me not an “unintended byproduct” but cast-iron evidence of a deliberate policy not to solve it.

    Ofqual’s announcement of 11 August 2019 states “…more than one grade could well be a legitimate reflection of a student’s performance…”, preceded by “This is not new, the issue has existed as long as qualifications have been marked and graded.”

    So the unfairness attributable to fuzzy marking is known to have deleterious consequences, and has been known for a long time. But nothing has been done to fix it. Which is not difficult to do – see, for example, https://www.hepi.ac.uk/2019/07/16/students-will-be-given-more-than-1-5-million-wrong-gcse-as-and-a-level-grades-this-summer-here-are-some-potential-solutions-which-do-you-prefer/.

    I certainly don’t think that this year’s process will be ‘perfect’: I just hope that it will result in fairer outcomes than the past, which it might and can. In England at any rate – what’s happening in Scotland seems to me to be nuts (https://www.tes.com/news/its-impossible-meet-sqa-grading-demands).

    I also hope that fairer outcomes this year will enable an exploration of an overall better way of doing things. A better way that makes sure all Isaacs are never victims again.

  5. Janet Hunter says:

    Hello Dennis,
    Thank you for an interesting read. I came across your article, after reading the government document outlining how GCSE and A Level grades will be calculated, as a parent of a Year 13 outlier, looking for an explanation of what the standardisation process might mean for such pupils. Your article was clear and helpful but did not do much to dispel my fear that 13 August might not be the day of celebration we were looking forward to 3 months ago.
    I have a couple of observations I would like to address here and would welcome your thoughts.
    Firstly, looking at the example you have given here, there are 6 schools who regularly have around 10 students achieving A* in Physics and one school that does well to achieve a top grade of B. Isaac, being the student who doesn’t fit the statistical model of his school is bumped down to a B to avoid overall grade inflation. My first question then is what happens to the grades of the other students taking Physics in his school? If the next pupil in the rankings was teacher assessed as a grade B, surely knocking two grade levels off Isaac’s assessment will also have to impact on the grade for the second pupil, since as the teacher assessment had them two grades apart, it would make no sense for them to end up with the same grade. Any grade reduction for outliers would therefore also end up disadvantaging other students in that school.
    Secondly, your explanation of fuzzy marking suggests that one in every four grades in a normal year is wrong. This might be advantageous or disadvantageous to an individual student. If A Level students on average take 3 subjects, it might not affect every student and for students who are affected, is likely to only affect one subject.
    However in your example of Isaac above, whose assessed A star grade is bumped down because it does not fit the historical school statistics, the underlying assumption is that his school does not have pupils of Isaac’s ability (so the assessment must be wrong) and/or the teachers are not able to teach to an A star level. But let’s assume that Isaac is an A star Physics pupil. The chances are that he works to a high level and is an outlier in his other subjects too. Therefore, unless the school just has a particularly poor Physics department, there is a high probability that he will be marked down in his other subjects as well. Let’s assume that he has been predicted A* in Physics and As in his other subjects and has firm and conditional offers at Universities based on these predicted grades. In a normal year he might be unlucky and receive a lower grade in one subject, giving him either AAA or A*AB. This will be disappointing but unlikely to be a disaster. This year however, due to his school statistics, and other, more successful, schools bumping up their numbers slightly, he loses a grade in every one of his subjects. He now has ABB and his university choices look a lot more shaky, as does his self-confidence, all because he went to a school with a comparatively lower historical level of achievement. That’s a bit more problematic than inconsistent marking, as it embeds inequality between different institutions. I truly hope that does not happen this summer.

  6. Hi Janet

    Thank you for this very thoughtful contribution; you raise important points, which will be on many people’s minds.

    I’ll try to answer as clearly and helpfully as I can, but let me firstly note that I am not an ‘insider’, I don’t work for Ofqual, the boards, or government. I am an independent, and the documents published by Ofqual are my only source of information. But I have thought about things a bit too.

    Your first point about the grades of Isaac’s friends.

    What I believe will happen is that school G will recognise Isaac’s gifts, and suggest he is awarded an A* in Physics; the other Physics students are OK but not exceptional, so Laura and Mary are assessed as B, and Peter as C. They then submit those grades (the distribution being 25% A*, 50% B, 25% C), and the rank order Isaac, Laura, Mary, Peter.

    The board then carries out their process of ‘statistical standardisation’. On Friday, Ofqual published the results of their recent consultation. They haven’t spelt out all the details, but they did say “…The statistical standardisation model should place more weight on historical evidence of centre performance (given the prior attainment of students) than the submitted centre assessment grades…” (page 10).

    Things were more explicit in a previous statement published on 15th May: “For AS/A levels, the standardisation will consider historical data from 2017, 2018 and 2019. For GCSEs, it will consider data from 2018 and 2019, except where there is only a single year of data from the reformed specifications.”
    Furthermore, they’ve also said “the trajectory of centres’ results should not be included in the statistical standardisation process” (page 11).

    My hunch is therefore that the boards will compare the submitted distribution to the corresponding three-year average for students from that school in that subject, and not more than that. But that’s a hunch, not definitive!

    So let’s suppose that for school G, that average, for Physics, is 50% B, 50% C.

    That’s it. Isaac and Laura will be awarded a B; Mary and Peter a C.
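The hunch described here, fitting the submitted rank order to the school's historical distribution, can be written as a short sketch. The function name `standardise` and the rounding of cumulative shares are assumptions; Ofqual has not published the detail of its model.

```python
def standardise(rank_order, historical_shares):
    """Assign grades down the rank order so that each grade's share of the
    cohort matches the historical share.

    rank_order: candidate names, best first.
    historical_shares: (grade, share) pairs, best grade first; shares sum to 1.
    """
    n = len(rank_order)
    awarded = {}
    cumulative = 0.0
    start = 0
    for grade, share in historical_shares:
        cumulative += share
        end = round(cumulative * n)  # index just past the last holder of this grade (assumed rounding)
        for name in rank_order[start:end]:
            awarded[name] = grade
        start = end
    return awarded


submitted_rank_order = ["Isaac", "Laura", "Mary", "Peter"]
school_g_physics = [("B", 0.5), ("C", 0.5)]  # the supposed three-year average for school G

print(standardise(submitted_rank_order, school_g_physics))
# {'Isaac': 'B', 'Laura': 'B', 'Mary': 'C', 'Peter': 'C'}
```

Note that the submitted grades never enter the calculation: only the rank order and the history do, which is why Isaac's A* disappears.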

    The rank order as submitted by the schools will be maintained (pages 11 and 12 here) – which is important, for it indicates that there’s something the boards will not do. They won’t say “Ah! We know that Laura only got a 2 in Physics GCSE. No way could she merit a B at A level. So we’ll (unilaterally) downgrade her to a D, which, according to all our statistics, is the highest grade ever achieved at A level Physics by a student who got a 2 at GCSE.” They can’t do that because that would require a change in the rank order from Isaac, Laura, Mary, Peter to Isaac, Mary, Peter, Laura. Which they have undertaken not to do.

    One more point here if I may. In practice, that average, 50% B, 50% C, will be associated with some variability, which might cause those submitting centre assessment grades to seek to use some ‘wriggle room’, and to believe that it’s safe to submit grades that are above the average, but not over-the-top. That’s dangerous, as my blog describes. The only ‘wriggle room’ is what Ofqual might tolerate as regards overall subject-level grade inflation. But I’m not holding my breath on that one. The last thing Ofqual want is to be seen to have lost control.

    Your second point about Isaac’s performance in other subjects. Your hypothesis that he might be pretty bright across-the-piece makes sense to me, and your point is powerful.

    Ofqual have confirmed that all the ‘statistical standardisation’ will “operate at subject level, not centre level”, so, within each school, each subject is taken on its own. So if Isaac’s school has a good historic track record of A*s in, say, Further Maths, Isaac’s A* here is safe. Where Isaac really loses out is when he is a star in an otherwise uniformly dull sky, and I really can’t see a way out of that one. What might happen, though, is that university admissions officers decide not to take a ‘legalistic’ approach this year, and – being aware of this year’s context – look beyond the published grades.

    So your point about a gifted individual being unfairly dragged down by the school’s weak past record is certainly valid. It can happen and it probably will. But I sincerely hope in very few cases, and I also hope that university admissions officers are alert to this possibility, and that the school will make a big fuss on that student’s behalf.

    But even with all this, I still believe that there will be less unfairness this year than in previous years, primarily because of the (in my view) better reliability of a teacher’s rank order as compared to the exam-rank-order-lottery, as discussed in the blog.

    So let me finish with this thought. Consider candidates who sat A level in 2019.

    Chris did Maths, Further Maths and Physics. Ali did English Language, English Literature and History.

    What is the probability that both Chris and Ali were awarded the grades (of all flavours) that they truly merited?

    Using the grade reliability data published in November 2018 in Ofqual’s Marking consistency metrics – An update, my calculation gives these results:

    Chris: 0.96 x 0.96 x 0.88 = 81%

    Ali: 0.61 x 0.58 x 0.56 = 20%.

    And the probabilities that all three awarded grades are wrong are:

    Chris: 0.04 x 0.04 x 0.12 = about 0.02 % (say, 2 in 10,000)
    Ali: 0.39 x 0.42 x 0.44 = about 7% (that’s 700 in 10,000)

    As a by-the-by, the cohort for 2019 A level English Language (in England) was 14,144; English Literature, 40,824; History, 51,438. It is by no means improbable that 10,000 of those did all three. Of whom about 700 were awarded three wrong grades. And there could be a gender bias as regards subject combinations too.
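For anyone who wants to check the arithmetic, here is a minimal sketch using Ali's three subjects. It assumes, as the multiplication above implicitly does, that grading errors in different subjects are independent; the function names are illustrative.

```python
from math import prod


def all_right(reliabilities):
    """Probability that every grade is right, assuming independence between subjects."""
    return prod(reliabilities)


def all_wrong(reliabilities):
    """Probability that every grade is wrong, under the same assumption."""
    return prod(1 - r for r in reliabilities)


# English Language, English Literature, History (reliabilities quoted above).
ali = [0.61, 0.58, 0.56]

print(f"{all_right(ali):.0%}")         # 20%
print(f"{all_wrong(ali):.0%}")         # 7%
print(round(10_000 * all_wrong(ali)))  # 721
```

The last figure is the expected number, out of a hypothetical 10,000 candidates taking all three subjects, who receive three wrong grades: the "about 700" mentioned above.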

    This unfairness has been hidden for years. And aggravated by the fact that since 2016, the ‘victims’ have had no grounds for appeal.

    So, I hope that’s helpful. Thanks again, and if you have any further points, please do post another ‘comment’, or do get in touch with me directly at dennis@silverbulletmachine.com.

  7. Victoria says:

    A very interesting article and comments which raise many valid points. I am interested in your views on the consultation outcome which stated that Ofqual were still looking at how the statistical analysis could be sensitive to small centres and cohorts. How do you think this might apply to Isaac and his friends? Is there more hope for him to receive the results he deserves in this scenario?

  8. Michael Bell says:

    Hi Janet

    I am in the same position as you are i.e. the parent of a Year 13 outlier, with my daughter’s predicted A level grades of A*/A/A in a school that hasn’t managed above a C in the subjects she’s taking (she achieved 9s at GCSE so we have objective evidence she is a high achiever).

    Like you I have huge fears for 13 August, and to me, regardless of the points Dennis makes, it still seems inherently wrong that my child will suffer simply because she happens to attend a school that has poor historic performance. Someone else has posted here “Penalisation of schools/centres with poor track records is therefore baked into the method” – this is undoubtedly true and must surely be challenged? A judicial review has been mentioned as well and I wonder whether this is the only way we can ensure our children receive the results they deserve.

    There must surely be other parents/students out there in the same position and we should be looking to join together to take action on this.

  9. Hi Victoria

    Yes, small cohorts present particular, and important, problems. As I mentioned in my response to Janet, I don’t “know” the answer, but I think I can offer some thoughts.

    The extreme case is the historic cohort of zero. At my school, for the first time, I have a student in, say, A level Arabic; the student has been taught by a new teacher to my school, a Syrian refugee whose day-job is to teach maths, but who has been kind enough to teach this student too. Though not a native speaker, the student is certainly diligent, but the teacher, being new, has no basis to judge the standard. So we submit an A*.

    The board has no relevant history against which to judge this submission, although they do have information on previous whole cohorts. But that information has no relevance in this specific case.

    To me, the only sensible outcome here is for the board to accept the submission and award the A*. Arabic is a small-cohort subject overall, and that A* will have little effect on the whole cohort distribution; even if it does, no politician is going to kick up a fuss about grade inflation in Arabic – and it’s the political fuss that Ofqual wishes to avoid.

    That, I think, is an easy case; much tougher is a school which, for several years, has had a small cohort – say 10 or fewer – in a mainstream subject such as, say, Physics.

    I’ve done lots of simulations of small cohorts, and the ‘take home’ message is that the statistics are, in general, all-over-the-place, in that (with reference to the table in the blog) the range for any grade is often quite wide. So if Isaac’s school has had 1 grade A* in the last three years, there is at least a precedent, even though 1 A* this year is well above the average of 0.33 per year. The worst situation for Isaac is the one I portrayed: small historic annual cohorts and no grade higher than a B.

    But even so, the school could still submit an A* for Isaac, and have that grade confirmed, for the boards know that constraining submissions to the grade average (in the table, 10) has increasingly less statistical validity the smaller the cohort. And I am confident that they wish to be as fair as possible to everyone, whilst still under orders from Ofqual to stop grade inflation.

    So, if I were the board, I would adopt a rule along the lines of “If the aggregate grade distribution for this subject across all schools threatens grade inflation, so that some schools’ submitted grades must be over-ruled, start with the schools with the largest cohorts, keep going with progressively smaller cohorts until everything works, and then stop”.

    That ensures “no grade inflation”, but puts small cohorts at the end of the process, with the possibility that the process never reaches them. To me that makes sense – it recognises that the statistics work better for large cohorts than small ones, and is pragmatic: the process has to start somewhere, so that gives a rule for where to start. In the example in my blog, this did not apply: there were only six schools, with identical cohorts. So imagine other columns for schools with cohorts of 100 each; schools A to F could be left alone if schools J and K had gone over the top. So I think my “play safe, not games” message is still valid, and, I trust, sensible.
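The "largest cohorts first" rule might look something like the sketch below. Everything in it is hypothetical: the cohort sizes, the national cap of 62 (borrowed from the blog's example), the school figures and the choice to cut an over-ruled school straight back to its own average. It illustrates the idea only, not anything the boards have said they will do.

```python
def trim_largest_first(schools, national_cap):
    """Over-rule schools in descending order of cohort size until the national
    total of top grades is within the cap, then stop.

    schools: list of dicts with 'name', 'cohort', 'submitted', 'average'
    (the last two are counts of the top grade).
    """
    confirmed = {s["name"]: s["submitted"] for s in schools}
    excess = sum(confirmed.values()) - national_cap
    for s in sorted(schools, key=lambda s: s["cohort"], reverse=True):
        if excess <= 0:
            break  # everything works: smaller cohorts are never reached
        cut = max(0, s["submitted"] - s["average"])  # back to the school's own average
        confirmed[s["name"]] -= cut
        excess -= cut
    return confirmed


schools = [
    {"name": "J", "cohort": 100, "submitted": 30, "average": 25},
    {"name": "K", "cohort": 100, "submitted": 30, "average": 25},
    {"name": "A", "cohort": 20, "submitted": 11, "average": 10},
    {"name": "G", "cohort": 4, "submitted": 1, "average": 0},
]

print(trim_largest_first(schools, national_cap=62))
# {'J': 25, 'K': 25, 'A': 11, 'G': 1}
```

In this made-up case the two large schools absorb the whole correction, so school A's extra A* and Isaac's outlier at school G both survive.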

    But let me stress once again that I don’t know, and what actually happens might be different…

  10. Michael, Janet – yes, it is wrong.

    Ofqual’s original announcement of 20th March (https://www.gov.uk/government/news/further-details-on-exams-and-grades-announced) stated that teachers will be invited to submit grades, but did not mention rank orders; it also left the door open for some form of dialogue between boards and schools to enable schools to justify outliers.

    That door closed with Ofqual’s consultation document, published on 15th April (https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/879627/Exceptional_arrangements_for_exam_grading_and_assessment_in_2020.pdf), which introduced rank orders; the door was firmly bolted when the results of the consultation were announced on 22nd May (https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/887048/Summer_2020_Awarding_GCSEs_A_levels_-_Consultation_decisions_22MAY2020.pdf).

    Second-guessing all this, I can only conclude that Ofqual feared that too many schools would seek to ‘game’ the process, submitting over-the-top grades, blowing grade inflation sky high. I also conclude that they decided it would be at best time consuming and at worst impossible to distinguish between the chancers and the valid outliers. The upshot of all that, most unfortunately (and I really do mean that), is that the “Isaacs” will be damaged. And my understanding of the process for appeals leads me to believe that the only grounds for appeal are very narrow and technical, and would not help “Isaac”.

    But I still believe that teacher rank orders are much more reliable than the exam lottery, even if the grades determined by those rank orders are constrained by history. And it is my fervent hope that this year’s process will be seen to be a success, and does indeed open the door to the design and implementation of a future assessment process in which all the candidates, including “Isaac”, are given awards that are both fair and reliable.

    In case it might be of interest, may I draw your attention to a consultation on “The Impact of Covid-19 on education and children’s services”, currently being undertaken by the Parliamentary Committee on Education. This specifically includes “The effect of cancelling formal exams, including the fairness of qualifications awarded and pupils’ progression to the next stage of education or employment”.

    This consultation is currently accepting evidence – https://committees.parliament.uk/work/202/the-impact-of-covid19-on-education-and-childrens-services/.

    If as many people as possible in your position might be willing to make a submission…

  11. Michael Bell says:

    Dennis,

    Appreciate you taking the time to provide the link to the current consultation, thanks.

    Michael

  12. Janet Hunter says:

    Hello Dennis,
    I just wanted to say thanks for your reply. I did have another question about small cohorts, which is covered by your answer to Victoria’s post. In my son’s case, we perhaps don’t need to be too concerned as it is a new school with only one year’s worth of A Level results. For two of his subjects, there are no historical results and for the other two, the previous year’s cohorts were 3-5 students. There are GCSE results of course, but these should show an outlier in 2018!
    Michael, I agree, it is a huge concern. I agree with Dennis’s point that universities will no doubt give a bit more flexibility this year, so in the longer term grades below what you know a pupil would have achieved (and with plenty of evidence to support this which won’t have been considered) perhaps won’t matter too much, but the sense of injustice experienced on the day of results itself might be quite damaging. I would certainly want to take some sort of action in that event.

  13. hi everyone – super, my pleasure, thank you!

  14. E. Richardson says:

    My daughter is a Cambridge offer holder due to study medicine after shadowing GPs, hospital doctors and surgeons, a personal statement, very high GCSE results, taking interviews and scoring highly in the BMAT admissions test. She requires a minimum A*A*A to complete her offer and is expected to achieve A*A*A*A*, after achieving A*A*A*A* in her mocks and all other assessments. Unfortunately the school is not high achieving. Cambridge has already stated that if the offer is not achieved she may take the Autumn exams but not be able to start the course until 2021. Could she be forced to take a year out, delaying her 6 year medical course through no fault of her own, due to standardisation and students’ results from years ago? There are going to be many thousands of high achieving students requiring the grades they deserve to progress but not in high achieving schools.
