The UK's only independent think tank devoted to higher education.

The price of grade unreliability, and an idea for the new Chief Regulator’s in-tray

  • 13 July 2021
  • By Dennis Sherwood

This blog was kindly contributed by Dennis Sherwood who has written a wealth of articles for HEPI on A levels and Ofqual. You can find, follow and debate Dennis on Twitter @noookophile.

On Friday 9 July, the Times ran an article under the headline ‘One better grade at GCSE could scoop £200,000’; similar stories appeared in the Telegraph, TES and the Daily Mail too. The source of these stories is impeccable: a press release from Schools’ Minister Nick Gibb entitled ‘Higher GCSE grades linked to lifetime earnings boost’, with these opening paragraphs: 

Those who perform just one GCSE grade better than their counterparts across nine subjects have been shown to earn on average over £200,000 more throughout their lives.

For the first time ever, statisticians and economists at the Department for Education have established a direct link between GCSE attainment and an increase in lifetime earnings after tracking the earnings of more than two million people in England.

And later:

The research looked at people in England who sat their GCSE exams between 2002 and 2005 alongside earnings records and found that those who achieved just one grade higher than their counterparts in one subject saw an increase in their lifetime earnings by an average of £23,000.

Those who secured one grade higher than their counterparts across nine subjects are likely to earn on average £207,000 more in their lifetime. The research, which took 18 months to develop, will be pivotal in creating new policies going forward. For example, when a new policy is developed to help pupils achieve better GCSE grades, this data will be used to create a quantitative, monetary value to evidence how a policy can affect earnings outcomes.

Most of the press, understandably, focused on the £200,000 figure, which is a big number. My eye, however, was drawn to the words ‘those who achieved just one grade higher than their counterparts in one subject’. For those words caused me to wonder if Nick Gibb has digested the importance of a statement made to the hearing of the Education Select Committee on 2 September 2020 by Ofqual’s then Acting Chief Regulator, Dame Glenys Stacey. In reply to the hearing’s very last question, from Ian Mearns, Labour MP for Gateshead, Dame Glenys acknowledged that exam grades are ‘reliable to one grade either way’.

Juxtapose the claim from Nick Gibb that ‘just one grade’ can make a substantial difference to an individual’s lifetime earnings with the statement from Dame Glenys that exam grades are ‘reliable to one grade either way’. What quantum of lifetime earnings has been lost by those students who were awarded one grade too low? Earnings lost not because the individuals lacked the intrinsic ability to gain those earnings, but because Ofqual (or rather, given the timing of 2002 to 2005, Ofqual’s predecessor, the now-defunct Qualifications and Curriculum Authority) had ‘awarded’ them grades that were wrong. Perhaps the statisticians and economists at the Department for Education might take a moment or two to estimate that.

This is all the more ironic in the context of an online speech given on that same day, 9 July, by Nick Gibb’s boss, the Secretary of State, Gavin Williamson. I quote from towards the end:

Education’s purpose is to unlock an individual’s potential so that they can get the job and career that they crave. If education fails to do that, then education itself has let them down.

While exam grades continue to be ‘reliable to one grade either way’, then – for many – that lock remains firmly, and unfairly, bolted. And, for them, it isn’t ‘education’ that has let them down; it is policy. Ministers and regulators could, at a stroke, change the grading policy to one that makes grades fully reliable and trustworthy.

One policymaker with the power to do this is Dr Jo Saxton, who was appointed as a Policy Adviser to Gavin Williamson in March 2020, just after the decision to cancel the summer 2020 school exams.

A few weeks ago, Gavin Williamson named Dr Saxton as his preferred candidate to be Ofqual’s next Chief Regulator, the most senior position in England’s school exam system. The post is to be taken up on 17 September 2021 for a term of five years; five years that will be critical for the development and implementation of key policies concerning exams, assessment, and the curriculum. And given the proximity of the candidate to those who hold power, and the necessity for Ofqual to be independent, the choice of Dr Saxton for this role was somewhat controversial.

Perhaps with such controversies in mind, the Cabinet Office has published guidance on how candidates for especially significant roles should be selected:

Pre-appointment scrutiny by select committees is an important part of the process for some of the most significant public appointments made by Ministers. It is designed to provide an added level of scrutiny to verify that the recruitment meets the principles set out in the Governance Code on Public Appointments.

Ofqual’s Chief Regulator is one such appointment; accordingly, on 6 July, Dr Saxton appeared before the Education Select Committee. Although the powers of the Committee are limited – there is no right of veto, for example – a pre-appointment hearing does allow the qualities of the candidate to be examined in public, and for some important stakes to be planted firmly in the ground for future reference.

This indeed happened at the hearing. The questions asked were relevant, incisive and probing, and several enquired into the nature of the policy advice that Dr Saxton has provided over the last tumultuous 15 months, and her ideas and objectives for the future. I was therefore looking for Dr Saxton to express some decisive opinions on matters such as grade inflation, next year’s exams, Post-Qualification Admissions, lost learning, teacher training, exam reform, exam appeals, and the fall-out from summer 2020.

So, for example, Christian Wakeford, Conservative MP for Bury South, asked about the damage to Ofqual’s reputation associated with the ‘scandal and fiasco of last year’, and how public confidence in Ofqual might best be restored. To which Dr Saxton replied:

In terms of how I can help going forward, I think it’s bringing my 20 years on the front line of education. My method of working is to be collaborative, to try really hard to listen to people and understand their needs, and I would continue to work in that way as Chief Regulator to build on the expertise that Ofqual has.

Given last year’s ‘scandal and fiasco’ (to quote Mr Wakeford), and the possible consequences of the events about to take place this year, that question is of considerable significance. But how relevant was the answer?

The next question was from Ian Mearns, who referred back to the question he had asked last year:

One of your predecessors, Dame Glenys Stacey, at our hearing on 2 September 2020, acknowledged that exam grades are reliable to one grade either way. Also, according to the Education Act of 2011, Ofqual has a duty to secure a reliable indication of knowledge, skills and understanding. In your opinion, is a reliability of ‘one grade either way’ reliable enough? And if not, what do you propose to do about it?

Dr Saxton’s initial response was that she had indeed spoken on many occasions to Dame Glenys, but not about that particular issue. She then said:

As I understand it, what the research shows is that marking quality in England is pretty much the international gold standard.

After talking about the marking of Maths, History and English, Dr Saxton continued, somewhat hesitantly, saying, more as a question than a statement:

Isn’t it actually that the grades are within one percentage point of a boundary? So I think that’s where the interpretation that it is either way comes from. But, overall, this is such a critical thing and it’s one of Ofqual’s key duties…

And then the cruncher:

… I think it’s really really important that we do everything we can, we turn over every stone, to make sure that marking is as good as it possibly could be, and that I would be really willing as Chief Regulator to re-open that research and look at it again.

After a mention of her experience of marking scripts at university, Dr Saxton concluded by saying:

I agree with you that it is a critical thing, and that we should leave no stone unturned in ensuring that the public have confidence in the grades that young people are awarded.

One stone that perhaps might beneficially be turned over is the stone under which Ofqual’s two key research reports have, as far as it appears Dr Saxton is concerned, been hidden. Given that the delivery of reliable, trustworthy, meaningful grades is arguably Ofqual’s most important duty – as that figure of £23,000 per additional grade highlights in just one respect – might I draw Dr Saxton’s attention to Marking Consistency Metrics, published by Ofqual in 2016 (especially Figure 14 on page 25), and November 2018’s Marking Consistency Metrics – An update (Figure 12 on page 21)?

What I find more worrying, however, is Dr Saxton’s repeated reference to the quality of marking when the question was about the reliability of grading. Marking and grading are different, and the difference between them is important. At present, and until AI comes along, marking is carried out by humans who assign marks to candidates’ answers in accordance with their professional judgement, within the guidelines of the exam’s mark scheme. In contrast, grading, which happens after all the marking has been completed, is the formulaic determination of the grade according to a rule such as ‘all scripts marked from 66 to 70 inclusive are designated grade A’, with the grade boundaries set by Ofqual’s policy on grade inflation, which specifies the percentage of students allowed each grade.

Marking can be of the highest quality (as it is), yet the grades resulting from those high-quality marks can be hugely unreliable (as they are) simply because ‘it is possible for two examiners to give different but appropriate marks to the same answer’. The solution to the problem of grade unreliability is therefore not to be found in marking, but in changing the policy defining how the grade is determined from the original mark.
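To make the distinction between marking and grading concrete, here is a minimal sketch in Python. Only the rule that marks from 66 to 70 earn grade A comes from the example above; the other boundaries, the function names and the `tolerance` parameter (standing in for the spread of ‘different but appropriate marks’) are assumptions made purely for illustration, not Ofqual’s actual figures or method.

```python
# Hypothetical grade boundaries: the lowest mark that earns each grade.
# Only the 66-70 band for grade A is taken from the example rule in the text;
# every other band is invented for illustration.
BOUNDARIES = [(71, "A*"), (66, "A"), (61, "B"), (56, "C")]


def grade(mark):
    """Grading: a purely formulaic lookup of the grade for a given mark."""
    for lowest_mark, letter in BOUNDARIES:
        if mark >= lowest_mark:
            return letter
    return "U"


def possible_grades(mark, tolerance):
    """Every grade a script could receive if equally legitimate marks
    range from mark - tolerance to mark + tolerance (an assumed spread)."""
    return {grade(m) for m in range(mark - tolerance, mark + tolerance + 1)}


# A script in the middle of the band gets the same grade either way...
print(possible_grades(68, 1))
# ...but a script on the boundary can receive either of two grades.
print(possible_grades(66, 1))
```

With an assumed tolerance of one mark, a script given 68 can only be grade A, whereas a script given 66 could be grade A or grade B depending on which examiner happened to mark it. The marking is sound in both cases; the different grades arise solely from the rule that converts the mark into a grade.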

Now that the Select Committee have confirmed Dr Saxton’s appointment, delivering exam grades that are fully reliable and trustworthy should, in my opinion, be at the top of her in-tray. For Dr Saxton is right. Ofqual should indeed ‘leave no stone unturned in ensuring that the public have confidence in the grades that young people are awarded’.



  1. A quick update if I may, please.

    I wrote this blog over the weekend of 10 and 11 July, and my source for the events at the Select Committee meeting was the video recording available on

    The question from Christian Wakeford is timed at just after 10:50:00; the question from Ian Mearns at about 10:51:30.

    The transcript of the meeting was published on 12 July. Christian Wakeford’s question is Q27; Ian Mearns’s, Q28.

  2. Huy Duong says:

    Instead of giving a grade, which is unreliable and opaque, why can’t Ofqual give the score, an indication of the uncertainty in the score, and the percentile in the exam board’s subject cohort for that year? E.g., (70, ±10, 82), which means “The student got a score of 70% plus or minus 10% and they are in the top 18% of students”. That’s more reliable and informative than “grade A”. It’s also something that’s meaningful internationally. And Ofqual’s bogey man of grade inflation will go away by himself. If someone can’t understand what that means, they are not qualified to assess the CV.

    Why is there such attachment to the inferior system of “grade X”? Is it so that Ofqual can have a job?
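The reporting format suggested in the comment above could be sketched as follows. This is only an illustration: the class `ScoreReport`, the helper functions and the made-up cohort are all hypothetical, and the definition of percentile used here (the share of the cohort scoring strictly below the student) is an assumption, not anything specified by Ofqual or by the commenter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoreReport:
    score: float        # percentage score awarded
    uncertainty: float  # plus-or-minus margin, in percentage points
    percentile: float   # share of the cohort scoring strictly below (assumed definition)


def percentile_rank(score, cohort_scores):
    """Percentage of the cohort whose score is strictly below the given score."""
    below = sum(1 for s in cohort_scores if s < score)
    return 100.0 * below / len(cohort_scores)


def make_report(score, uncertainty, cohort_scores):
    """Bundle a score, its stated uncertainty and its cohort percentile."""
    return ScoreReport(score, uncertainty, percentile_rank(score, cohort_scores))


def describe(report):
    """Render the report as a sentence like the one in the comment."""
    return (f"score {report.score:g}% plus or minus {report.uncertainty:g}%, "
            f"top {100 - report.percentile:g}% of the cohort")


# Invented cohort of 50 scores: 41 below 70, one at 70, eight above.
cohort = [50] * 41 + [70] + [90] * 8
print(describe(make_report(70, 10, cohort)))
```

For this invented cohort the report corresponds to the commenter’s (70, ±10, 82) example. How the uncertainty itself would be estimated is left open here, as it is in the comment.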
