Pause for REFlection: Time to review the role of generative AI in REF2029
- This blog has been kindly written for HEPI by Richard Watermeyer (Professor of Higher Education and Co-Director of the Centre for Higher Education at the University of Bristol), Tom Crick (Professor of Digital Policy at Swansea University) and Lawrie Phipps (Professor of Digital Leadership at the University of Chester and Senior Research Lead at Jisc).
For as long as there have been national research assessment exercises (REF, RAE or otherwise), there have been efforts to improve the way in which research is evaluated and Quality-Related (QR) research funding consequently distributed. Where REF2014 stands out for its introduction of impact as a measure of what counts as research excellence, REF2029 has been all about research culture. Yet while impact has become an integral dimension of the REF, the installation of research culture as a criterion of excellence (within a far weightier environment component or, as has been proposed, a People, Culture and Environment (PCE) statement) appears far less assured, especially when set against a three-month extension to REF2029 plans.
A temporary pause on proceedings has been announced by Sir Patrick Vallance, the UK Government’s Minister for Science, as a means to ensure that the REF provides ‘a credible assessment of quality’. The implication is that the hitherto proposed formula (many parts of which remain formally undeclared, much to the frustration of universities’ REF personnel and indeed researchers) is not quite fit for purpose, and certainly not if the REF is to ‘support the government’s economic and social missions’. Thus, it may transpire that research culture is ultimately downplayed or omitted from the REF. For some, this volte-face, if it materialises, may be greeted with relief: a pragmatic step back from the jaws of an accountability regime that has become excessively complex, costly and inefficient (if not estranged from the core business of evaluating and then funding so-called ‘excellent’ research), despite proclamations at the conclusion of every instalment that next time it will be less burdensome.
While the potential backtrack on research culture and the possible abandonment of PCE statements will be seized upon to explain the REF’s most recent hiatus, these may be only cameos in a discussion of its wider credibility and utility; a discussion which appears to be reaching its apotheosis, not least given the financial difficulties endemic to the UK sector, which the REF, with its substantial cost, is counted as further exacerbating. Moreover, as we are finding in our current research, the REF may have entered a period not of incremental reform and tinkering at the edges but of wholesale revision, and this as a consequence of higher education’s seemingly unstoppable colonisation by artificial intelligence.
With recent funding from Research England, we have undertaken to consult research leaders and specialist REF personnel embedded across 17 UK HEIs – including large, research-intensive institutions and those with a historically more modest REF footprint – to gain an understanding of existing views of, and practices in, the adoption of generative AI tools for REF purposes. While our study has thrown up multiple views as to the utility and efficacy of using generative AI tools for REF purposes, it has nonetheless revealed broad consensus that the REF will inevitably become more AI-infused and AI-enabled, if not ultimately (if it is to survive) entirely automated. The use of generative AI for narrative generation, evidence reconnaissance and the scoring of core REF components (research outputs and impact case studies) has been mooted as offering significant cost- and labour-saving affordances, and as an application which might also move closer to ongoing, real-time assessment of research quality, unrestricted to seven-year assessment cycles. Yet the use of generative AI has also been (often strongly) cautioned against for the myriad ways in which it is implicated in bias and inaccuracy (as a ‘black box’ tool) and can itself be gamed, for instance through ‘adversarial white text’. This is coupled with wider ongoing scientific and technical considerations regarding transparency, provenance and reproducibility. Some even interpret its use as antithetical to the terms of responsible research evaluation set out by collectives like CoARA and COPE.
Notwithstanding such objections, we are witnessing these tools being used extensively (if, in many settings, tacitly and tentatively) by academics and professional services staff involved in REF preparations. We are also being presented with a view that the use of GenAI tools by REF panels in four years’ time is a fait accompli, especially given the speed at which the tools are being innovated. It may even be that GenAI tools could be purposed in ways that circumvent the challenges of human judgement in the evaluation of research culture, as the current pause intimates. Moreover, if the credibility and integrity of the REF ultimately rest on its capacity to demonstrate excellence via alignment with Government missions (particularly ‘R&D for growth’), then we are already seeing evidence of how AI technologies can achieve this.
While arguments have previously been made that the REF offers good value for (public) money, the immediate joint contexts of severe financial hardship for the sector, ambivalence as to the organisational credibility of the REF as currently proposed, and the attractiveness of AI solutions may produce a new calculation. This is a calculation, however, which the sector must own, transparently and honestly. It should not be wholly outsourced, and especially not to one of a small number of dominant technology vendors. A period of review must attend not only to the constituent parts of the REF but to how these are actioned and responded to. A guidebook for GenAI use in the REF is urgently needed, and it must place consistent practice at its heart. The current and likely escalating impact of generative AI on the REF cannot be overlooked if it is to be claimed as a credible assessment of quality. The question then remains: is three months enough?
Notes
- The REF-AI study is due to report in January 2026. It is a research collaboration between the universities of Bristol and Swansea, and Jisc.
- With generous thanks to Professor Huw Morris (UCL IoE) for his input into earlier drafts of this article.