
Can GCSE and A level exam grades be trusted?

Roughly speaking, if there were 54,000 marking errors in the 300,000 GCSE and A level grades that were challenged last year, how many might there be in the 5,700,000 grades that were not challenged? Dennis Sherwood looks at the problems of exam grade reliability

“Grades can be relied on.”

The delivery of reliable GCSE, AS and A level grades is indeed what we all want; what we all expect.

The words above are from Ofqual’s chief regulator, Dr Jo Saxton, spoken on June 29, just after this year’s exams had been completed, at a hearing of the House of Lords Education for 11-16 Year Olds Committee.

The full context can be seen, and heard, from 12:34:13 on the Parliament TV recording here; this, to me, is the key statement: “I can assure this committee and young people who will receive their grades this summer that they can be relied on, that they will be fair, and that the quality assurance around them is as good as it is possible to be.”

 

Three reasons why grades can be wrong

A student can be awarded the wrong grade as a result of three very different circumstances.

 

1, Administrative or clerical error

The first, and most obvious, is a consequence of an administrative or clerical error such as the failure to mark a question, the omission of a question’s mark from the total, or a mistake in adding each question’s mark to derive the total.

These are all matters of “good housekeeping”, and we would all expect the exam boards to have robust internal systems. Such errors should therefore be extremely rare.

 

2, Mistakes in the awarding of marks

The second concerns mistakes in the awarding of marks that contribute to the final grade. For non-examined assessment, in which internally awarded marks are subsequently moderated, it is possible that the moderator might not follow the guidelines, resulting in a “moderation error”; for written examinations, the examiner might fail to comply with the appropriate mark scheme, resulting in a “marking error”.

Central to the identification of these errors is the concept of “tolerance”, which recognises that different, equally qualified, examiners might give the same answer (slightly) different marks (see Swan, 2016).

Accordingly, if a senior examiner were to give a particular question 13 marks out of 20, another examiner might give 12, another might give 15, and another still might judge it to be worth just 9.

If the tolerance for this particular question is 2 marks, then – given the senior examiner’s mark of 13 – the marks of 12 and 15 are accepted. The mark of 9, however, is deemed a marking error.
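As a concrete illustration, here is a minimal sketch of the tolerance rule described above. The figures (a senior examiner's mark of 13 and a tolerance of 2) are taken from the worked example in the text; the function itself is hypothetical, not any exam board's actual procedure.

```python
# Illustrative sketch of the tolerance rule described in the text.
# The senior examiner's mark (13/20) and the tolerance (2) come from the
# worked example above; the function is hypothetical.

def is_marking_error(examiner_mark: int, senior_mark: int, tolerance: int) -> bool:
    """A mark counts as a marking error if it differs from the senior
    examiner's mark by more than the agreed tolerance."""
    return abs(examiner_mark - senior_mark) > tolerance

senior_mark = 13   # senior examiner's mark for the question (out of 20)
tolerance = 2      # agreed tolerance for this question

for mark in (12, 15, 9):
    status = "marking error" if is_marking_error(mark, senior_mark, tolerance) else "within tolerance"
    print(f"mark {mark}: {status}")

# Output: 12 and 15 are within tolerance; 9 is a marking error.
```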

A vital aspect of the exam board’s quality assurance – which Dr Saxton informed us at the House of Lords’ hearing is “as good as it is possible to be” – is therefore to monitor examination marking as it takes place, so that any marking errors are spotted as they happen and appropriate corrective action is taken, ensuring that the mark on which each candidate’s grade is based is fully legitimate.

 

3, Tolerance

The third reason why a grade might be wrong is directly attributable to tolerance.

Returning to the example above, marks of 11, 12, 13, 14 and 15 out of 20 for that particular question are all within tolerance, as discussed – all are legitimate. When considering this question's impact on the exam as a whole, the candidate’s total marks might therefore be, say, 63, 64, 65, 66 or 67.

If grade A is defined as “all marks from 61 to 70”, the candidate is awarded grade A regardless. But if, instead, the A/A* grade boundary is 65/66, then the candidate’s certificate shows grade A or A* as the result not of the candidate’s performance but of the lottery of which of the examiners above did the marking.
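Again purely as an illustration, here is a short sketch of the “grade lottery” just described, using the hypothetical totals and the 65/66 A/A* boundary from the example (these are not real grade boundaries).

```python
# Illustrative sketch of how legitimate, within-tolerance marks can straddle
# a grade boundary. The totals (63-67) and the A/A* boundary at 65/66 are the
# hypothetical figures from the text.

def grade(total_mark: int) -> str:
    if total_mark >= 66:   # A* starts at 66 in this example
        return "A*"
    if total_mark >= 61:   # A covers 61-65 in this example
        return "A"
    return "B or below"

# All five totals are within tolerance of the senior examiner's total,
# so all five are equally legitimate.
for total in (63, 64, 65, 66, 67):
    print(f"total {total}: grade {grade(total)}")

# Output: totals 63-65 give grade A, totals 66-67 give grade A*.
```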

 

Which grade is ‘right’?

Ofqual resolves this issue by defining the grade attributable to a subject senior examiner’s mark as “definitive” or “true”. But since no one knows whether the grade actually awarded corresponds to that “definitive/true” grade, this is not helpful – and it leads us to question the reliability of the grade on the candidate’s certificate.

Nor does the appeals process allow the “definitive/true” grade to be discovered.

 

How the appeals process does and doesn’t work

We all know that mistakes can happen in any system. Yes, quality assurance processes are intended to minimise their incidence, but however good those are, some errors will get through.

That’s why the appeals process is so important – to catch those, hopefully very few, errors that do happen.

So how good is the exam appeals process in doing that? Here are some facts.

Most importantly, the unreliability of a grade affected by the third reason discussed above – a grade that might have been higher, or lower, had different examiners marked the scripts – can never be discovered by the current appeals process.

That is because this third reason was deliberately excluded from the appeals process by Ofqual’s 2016 reforms (Ofqual, 2016), which allow a script to be remarked only if a marking error is discovered.

As discussed in my article for SecEd earlier this year – entitled Exams 2023: The ‘mistaken’ grades that will not be found (Sherwood, 2023) – this category of unreliable grades is not attributable to marking errors, implying that a “review of marking” will confirm the originally awarded grade even if a senior examiner would have awarded a higher one.

This situation arises much more frequently than one might think: this August, for example, of the approximately six million grades awarded, we can estimate that about one quarter – that’s about 1.5 million – would have been different had senior examiners marked the scripts, with about half of those (750,000) resulting in higher grades, and about half (the other 750,000) resulting in lower grades.

I explain how and why we can know this in my previous SecEd article (Sherwood, 2023).

Given the importance of those grades to each student, and the denial of any appeal, I find it very hard to agree with Dr Saxton’s assertion that grades “can be relied on, that they will be fair”.

What the appeals process does do is to protect against the first two reasons that grades can be wrong: clerical and administrative mistakes, moderation errors, and marking errors attributable to an examiner’s failure to comply with the mark scheme.

So, how many marking errors are there? Let’s take a deeper look…

 

How many marking errors are there?

Just before Christmas every year, Ofqual publishes a raft of statistics relating to the appeals process, including the numbers of grades awarded, challenged, and changed, and also information on the numbers of administrative mistakes, moderation errors, and marking errors (see Ofqual 2022 for the latest iteration of these figures).

Let’s consider the following figures:

Figure 1: Appeals process statistics from Ofqual relating to exams in England since 2016 (when Ofqual introduced the marking error rule for appeals) until 2022 (the last year for which data are available), but excluding 2020 and 2021 (the Covid years of centre and teacher assessed grades)

Figure 2: Appeals process statistics from Ofqual relating to exams in England since 2016 (excluding 2020 and 2021) showing the ratio of challenges and grade changes

 

Looking at figure 2, we can see that the percentages in each column are more-or-less stable from year to year, and the eye is immediately drawn to the second column, which shows that only about 1% of grades are changed each year.

This figure is often cited as evidence that grades can indeed be trusted, for if only 1% are changed, surely the other 99% must be right – for example, this is explicitly stated on the Pearson/Edexcel website.

The inference that “because only 1% of grades are changed, the remaining 99% are right” is, however, a grave numeracy error, for as the first column in figure 2 shows, only about 5% of grades are challenged.

This is important, for a grade can be changed only if it has been challenged in the first place. Accordingly, the number of grades changed can be meaningfully related only to the number of grades challenged – not the number of grades awarded.

No-one knows what might have happened if any, or all, of the 95% of unchallenged grades had been challenged.

And so it is the third column in figure 2 that is the most significant, showing that about 20% of challenges result in a grade change – that is, about 1 challenge in 5.

The fourth column (figure 2) is important too, for it shows that more than 90% of the grade changes result from marking errors, these being failures of examiners to comply with the mark scheme – failures that were not spotted by the exam boards’ quality assurance processes.

Taking easy-to-use round numbers for the summer 2022 exams in England (the sketch after this list walks through the same arithmetic):

  • The total number of GCSE, AS and A level grades awarded was about six million.
  • Some 5% of those grades were challenged, so that’s about 300,000.
  • Of those challenges, about 20% resulted in a grade change – that’s 60,000.
  • Of this 60,000, about 90% were attributable to marking errors – that’s about 54,000.
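For readers who like to see the arithmetic laid out, here is the same chain of round numbers as a short sketch. The percentages are the approximate figures from figure 2, rounded as in the list above; the variable names are mine.

```python
# Round-number arithmetic from the bullet list above (summer 2022, England).
# All percentages are the approximate figures discussed in the text.

grades_awarded    = 6_000_000                    # total GCSE, AS and A level grades
grades_challenged = grades_awarded * 0.05        # ~5% challenged      -> 300,000
grades_changed    = grades_challenged * 0.20     # ~20% of challenges  -> 60,000
marking_errors    = grades_changed * 0.90        # ~90% marking errors -> 54,000

print(f"challenged: {grades_challenged:,.0f}")                    # 300,000
print(f"changed:    {grades_changed:,.0f}")                       # 60,000
print(f"marking errors behind changes: {marking_errors:,.0f}")    # 54,000
```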

The actual number of marking errors in that sample of 300,000, however, must be greater than 54,000, for those 54,000 were the ones that resulted in a grade change. There will be, of course, a number of marking errors that were discovered, but which on correction did not result in a grade change. I wonder what that number might be.

Teachers across the country will have that collective knowledge, for they will know how many of the challenges they raised did indeed result in a change in the mark, but not the grade. That is, I feel, an important number.

 

Undiscovered marking errors

And there’s another important number too – namely, the number of marking errors that remain undiscovered simply because no challenge was raised.

And that could be a very big number indeed. If there are at least 54,000 marking errors in the 300,000 challenged grades, how many might there be in the 5,700,000 grades that were not challenged?
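Purely for illustration, here is what a naive proportional scaling would suggest. It rests on the assumption – almost certainly wrong, since challenged grades are a self-selected sample – that unchallenged grades contain discoverable marking errors at the same rate as challenged ones, so treat the output as a thought experiment rather than an estimate.

```python
# A deliberately naive extrapolation, for illustration only.
# Challenged grades are a self-selected sample, so scaling their error rate
# to the unchallenged grades is an assumption, not a finding.

marking_errors_found = 54_000       # errors found among challenged grades
grades_challenged    = 300_000
grades_unchallenged  = 5_700_000

error_rate = marking_errors_found / grades_challenged     # 18% of challenged grades
naive_estimate = error_rate * grades_unchallenged          # roughly 1,000,000

print(f"error rate among challenged grades: {error_rate:.0%}")
print(f"naive scaled figure for unchallenged grades: {naive_estimate:,.0f}")
```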

I find it remarkable that so many marking errors are actually discovered and I shudder to think about those that remain, lurking undiscovered, in the unappealed grades.

And since all these marking errors are consequences of the exam boards’ collective failure to control the quality of the original marking, I find it very difficult to be reassured by Dr Saxton’s assertion that “the quality assurance … is as good as it is possible to be”.

Dennis Sherwood is an independent consultant and author of Missing the Mark: Why so many school exam grades are wrong, and how to get results we can trust (Canbury Press, 2022). Visit https://bit.ly/3NHO5Xk

 

Further information & resources