Secondary school grading inaccuracy: what are the implications for the humanities?

Author:: Gabriel Roberts
Published:: 6 January 2021

As the Government confirms ‘we will not be asking students to sit GCSE and A Levels’ in 2021 and while we all await further details, the HEPI blog considers the (in)accuracy of the regular grading process.

This blog has been kindly contributed by Gabriel Roberts, an English teacher at a secondary school in London. He has previously written about the humanities in higher education.

In several contributions to this blog, Dennis Sherwood has highlighted the inaccuracy of exam grading in the UK. According to his analysis, about 1 in 4 GCSE, AS, and A-Level grades in recent years have been wrong, meaning that a different grade would have been awarded if all the components of the qualification had been marked by a senior examiner.

Sherwood’s conclusions are particularly troubling for the humanities, where grading accuracy is lowest. In English and History, for instance, grading is only about 60% accurate. This means that out of 10 randomly selected pupils receiving a grade in these subjects, four will – on average – receive the wrong grade. For pupils whose mark is close to a grade boundary or whose grade is not at either extreme (a B, say, rather than an A* or an E), the odds are worse.

This causes a range of problems. Most obviously, pupils, their parents, universities, employers, and pretty much everyone else expect exam grades to be reliable indicators of learning, academic ability, and resolve (and expectations will be even higher if the UK switches to post-qualification university applications, as has been proposed). Additionally, pupils may apply or not apply for humanities courses because of mistaken judgements about their level of ability, based on over-marked or under-marked papers. Moreover, whatever negative effects there are may affect girls more than boys, because of who takes humanities subjects.

Why is grading accuracy lower in the humanities?

One answer is that the humanities are more subjective. It may be that the things that the humanities deal with—interpretation, persuasion, evocation and so on—are simply harder to judge with precision than things in the social and hard sciences.

This is probably right up to a point. But thinking that grading inaccuracy reflects what subjects are really like underplays the role of the exams themselves. The humanities may be subjective, and this may mean that a certain amount of grading inaccuracy is inevitable, but some of the inaccuracy may also result from how the exams are designed and marked.

Here there is definite room for improvement. English Language mark schemes, for example, routinely require examiners to assess the grammatical accuracy of students’ writing but without defining what accuracy means. As a result, one examiner may judge that a certain construction is a mistake while another may not, leading to different marks.

Similarly, in English Literature, mark schemes often require examiners to assess candidates’ ability to analyse structure, form, and language, but without either defining where structure ends and form begins or explaining whether candidates need to analyse all of these things or just some of them in order to attain the highest marks. In other cases, assessment objectives relate imprecisely to more detailed descriptions of what candidates need to do. This means that examiners have to choose whether to mark work against the objectives or the descriptions, even though the latter are supposed to be elaborations of the former. Different examiners may handle this differently, again leading to different marks.

What could be done?

One option is to write more intelligible mark schemes. This would be very welcome, but it might lead to diminishing returns. Defining in greater detail what counts as grammatically correct English, for instance, might lead to greater grading accuracy but would make marking slower and bury examiners under even more paperwork. Other options include training examiners more thoroughly and introducing more double marking, although these are likely to be expensive.

Another option may arise from the digitisation of exams. Already, a large number of pupils use computers for typing in exams as part of a special access arrangement (although it is not clear how many because this use of computers is not covered in the statistics published by Ofqual on special exam arrangements). Additionally, some schools are considering equipping pupils with laptops as standard (something so far more common in the US), some have already gone paperless, some pupils bring laptops to school even where there are no special arrangements, and large numbers of laptops (though not enough) have been supplied to schools because of the Coronavirus pandemic. All this makes writing by hand look increasingly old-fashioned. The time may not be far off, then, when exams are completed on computers or, perhaps further down the line, when scripts are digitised using optical mark recognition software, which is already used for multiple choice exams.

If either of those developments takes place, computer-assisted marking may follow. English Language would be the obvious place to start. As things are, examiners are often asked to assess the accuracy of candidates’ spelling, punctuation, and grammar, the variety of sentence constructions that they use, the breadth of their vocabulary, and so on. These are all things which can be assessed automatically by software like Grammarly and AccioIbis. Soon, the marking of English Language exams may involve a computer assessing readily measurable characteristics like breadth of vocabulary and a human assessing more elusive ones like tone.

If computer-assisted marking becomes available, exam boards will have a strong incentive to use it, even if it is not very sophisticated, because it will be faster and cheaper than human examiners. It may also be more consistent. Designing an algorithm to scan a text for different kinds of grammatical mistake (the makers of Grammarly reckon that it can detect 250 of these reliably) may be easier than training human examiners to apply the same standard of grammatical correctness consistently across hundreds of scripts.

This might be great news: pupils might enjoy more accurate grading; and examiners might be spared much drudgery. But it might also make English banal. If computer-assisted marking is unsophisticated, pupils may find themselves being taught simple procedures to game it: use four kinds of punctuation mark per hundred words; use at least one polysyllabic word per sentence; and so on. Indeed, the same thing could happen without computers if ambiguous mark schemes were replaced with less ambiguous ones which were unimaginative and formulaic.
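To make the worry concrete, here is a minimal sketch in Python of the kind of unsophisticated checker that might be gamed. It is purely illustrative and is not how any exam board or commercial tool works: the function names, the vowel-run syllable estimate, and the three features chosen are all assumptions made for the example.

```python
import re

# Hypothetical set of punctuation marks the checker recognises.
PUNCTUATION = set(".,;:!?-()\"'")
WORD = r"[A-Za-z]+(?:'[A-Za-z]+)?"


def count_syllables(word):
    """Very rough syllable estimate: count runs of vowels (an assumption)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def naive_features(text):
    """Return crude surface features of a piece of writing."""
    words = re.findall(WORD, text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # How many different kinds of punctuation mark appear at all.
    kinds = len({ch for ch in text if ch in PUNCTUATION})
    # How many sentences contain at least one word of three or more syllables.
    with_poly = sum(
        1
        for s in sentences
        if any(count_syllables(w) >= 3 for w in re.findall(WORD, s))
    )
    distinct = {w.lower() for w in words}
    return {
        "kinds_of_punctuation": kinds,
        "share_of_sentences_with_polysyllable": (
            with_poly / len(sentences) if sentences else 0.0
        ),
        # Distinct words divided by total words: a crude vocabulary measure.
        "type_token_ratio": len(distinct) / len(words) if words else 0.0,
    }


plain = "The sea was calm. We sat and watched it."
gamed = ("The sea, undeniably, was calm; remarkably so! "
         "We sat (contentedly): watching it?")

print(naive_features(plain))
print(naive_features(gamed))
```

The second passage says no more than the first, and arguably says it worse, yet it does far better on the punctuation and polysyllable counts. A checker of this sort rewards the formula rather than the writing.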

This kind of teaching should be avoided. One problem is that it inculcates passivity in response to mere stipulation: the exam board stipulates that something constitutes good practice and the pupils learn how to do it, without thinking about whether it is good practice and, if it is, what makes it so. Another is that it makes it difficult for pupils to imbibe general principles and apply them to new cases, since much of what they learn is designed for a particular assessment.

At bottom, there may be a trade-off between two purposes which teaching should fulfil: the creation of a reliable ranking of pupils so that they can make well-informed decisions about their futures and so that universities and employers can make well-informed decisions about who to admit and recruit; and the imparting of knowledge and skills and the development of interests which will serve pupils in later life. These are not necessarily convergent. Quick-fix solutions to the problem of grading inaccuracy in the humanities may perhaps lead to more accurate grading but only at the cost of attenuating what the subjects are really about.

Comments

Gordon Dent says:
“This might be great news: pupils might enjoy more accurate grading; and examiners might be spared much drudgery. But it might also make English banal. If computer-assisted marking is unsophisticated, pupils may find themselves being taught simple procedures to game it: use four kinds of punctuation mark per hundred words; use at least one polysyllabic word per sentence; and so on…
“This kind of teaching should be avoided. One problem is that it inculcates passivity in response to mere stipulation: the exam board stipulates that something constitutes good practice and the pupils learn how to do it, without thinking about whether it is good practice and, if it is, what makes it so. Another is that it makes it difficult for pupils to imbibe general principles and apply them to new cases, since much of what they learn is designed for a particular assessment.”
It’s fairly apparent that this already happens with human marking. Certainly, the number of undergraduate students with good grades in GCSE English language and maths who can neither write coherently nor perform simple real-world calculations suggests they have been trained in how to answer exam questions rather than how to apply principles of language and mathematics.
There is a trade-off between transparency and validity of assessments. The more explicit you are about what will be assessed and what the criteria will be for marking, the more students (and teachers) will focus on ticking these boxes rather than learning.
