A guest blog kindly contributed by Mick Fletcher, Honorary Fellow at the Post 14 Centre for Education and Work, UCL
Once upon a time the UK government took pride in the fact that it was a world leader in open data. Back in 2012 the Cabinet Office released a White Paper entitled ‘Unleashing the potential’, with a foreword by the Paymaster General, Francis Maude, boldly declaring ‘the future is Open’. The document proclaimed that ‘unfettered access to anonymised data should be extended to support improvements in the quality, choice and efficiency of healthcare, education, transport and a whole host of other public services’. It argued for a cultural change in government so that access to data by users, public sector organisations and third parties was seen as enabling rather than threatening, and as something to be promoted rather than restricted. The situation feels very different today.
The focus of attention these days, particularly with the introduction of the GDPR (General Data Protection Regulation), seems to have swung back to controlling access to data rather than liberating it. There are good reasons to pay attention to data security since the extent of misuse of large datasets by companies such as Facebook and Cambridge Analytica has raised alarm and exposed some serious vulnerabilities. Nevertheless, it is worth reminding ourselves why the Open Data Initiative encouraged greater use of public data and what benefits were expected to flow from ‘unfettered access’.
The arguments hinge on quality and efficiency. It is very expensive to gather, process and store data securely; it makes sense, therefore, to get the most value from it. The more useful it is, particularly to those who have to supply it, the greater the effort that will be put into its accuracy and timeliness. If the better use of data improves the choices made by citizens, by government and by those who run public services, we all benefit. The more analysts who look at the data, the more likely it is that the right conclusions will be drawn, the less likely it is that there will be mistakes or manipulation, and the better placed citizens will be to hold the executive arm of government to account. Moreover, a unique characteristic of working with data is that researchers can add value without taking value away from anything or anyone else; engaging a wide range of bodies in analysing data should therefore be a ‘win-win’.
Let me declare an interest. For most of my professional life I have worked with a range of public and private organisations to analyse data that has been collected from FE colleges and other education providers and present it in ways that enable the education system to operate more effectively. Sometimes the research has been conducted on behalf of government or other public agencies in order to understand the impact of policy changes. Sometimes the work has been with individual institutions seeking better to understand their own context and performance; and sometimes with nationally based research organisations seeking to inform public debate.
Needless to say, such use of public data has always been subject to the most rigorous data protocols. There have been strict limits on what the data we access can be used for, and on where, how and by whom it is processed. It has always been considered particularly important that no individuals could be identified – though one is bound to reflect on the contrast between this level of security and the masses of highly personal information held on almost every one of us by the internet giants, finance companies and even supermarkets.
DfE, in common with other government departments, is currently tightening its procedures in response to the GDPR. The situation seems to be that instead of a presumption that anonymised data should be available to the research community unless there are good reasons to withhold it, there is now a requirement to have an explicit legal entitlement to share data before anything can be released. In technical terms, there needs to be a ‘legal gateway’: legislation that has anticipated and authorised certain uses of specific data sets. For example, use of Longitudinal Education Outcomes (LEO) data linking HESA data with salary information is restricted to the uses allowed by the Small Business, Enterprise and Employment Act 2015, and is subject to HESA’s charitable purposes.
The restrictions impact on higher education in two distinct ways. For those researchers who study aspects of the education system, there can be serious delays in accessing data about schools, colleges or HEIs, and doubt about whether some data can be accessed at all. Exactly where such legal gateways exist, and where they do not, is currently unclear. When a research question involves the use of different datasets – for example, looking at patterns of A level provision requires access to both the National Pupil Database (NPD) for schools and the Individualised Learner Record (ILR) for FE – the complexity is magnified. Researchers from a range of organisations concerned with education have described the current position to me as a nightmare, and no doubt the position is replicated across other parts of the public sector: exactly the same issues must arise in relation to health, for example, or justice.
While not every department will be affected by these issues, every HEI will feel the difficulties they pose for institutional research. Many will be concerned to understand how they are placed in relation to the growing focus on higher technical education – sub-degree provision at levels 4 and 5. Mapping the current pattern of provision requires both HESA and ILR data; understanding the potential for recruitment, particularly as and when T levels take off, will require use of the NPD and the ILR. At the moment it is not clear how and when access to the relevant data might be made available.
In all cases the need for a ‘legal gateway’ effectively restricts access to uses anticipated by legislators. There is a real concern among researchers that the only questions that can be answered are those that government wants answered, a concern amplified by the worry that hard-pressed civil servants will be risk-averse when making difficult judgements about who can have access to what. There are also understandable fears that decisions about access to data are not wholly transparent, and that some organisations might have an ‘inside track’.
I have no problem with increased levels of data security being introduced, and I accept that in the short term this could mean temporary delays in gaining access to the information the research community and research users need. We have to acknowledge that there is a bigger picture, and that the public require assurance that data collected by public bodies is only made available to researchers subject to clear and robust safeguards. My concern, however, is that there is a bigger picture still that needs to be borne in mind – the one painted by the Open Data Initiative, which declared that access to anonymised data, rather than restriction, should be the default position.
When new security arrangements have been put in place, therefore, the burden of justification should clearly pass back to those seeking to restrict access to such data rather than to those seeking to use it for public benefit. We need to remind ourselves, and remind those who hold public data, of the many benefits to the wider community when independent and impartial researchers can access data collected at public expense. Many see the effective use of big data as a major industry of the future; it would be distinctly odd were we to go back to treating it as a nationalised industry.