Eerke Boiten

As part of our podcast series we spoke to Eerke Boiten, Professor of Cyber Security and Director of the Cyber Technology Institute at De Montfort University, who is talking in the Health Tech stream at the 2020 conference. Eerke talks about data sharing in health, why anonymisation isn’t really safe, and suggests that we need to think about health data in a new way to move forward. 

You did your PhD in the Netherlands. What drew you to computer science and informatics? 

As a 17-year-old I was taught about the mathematics and logic behind it. And that was the most fascinating thing I saw at secondary school. I’ve told 17-year-olds that they shouldn’t take career advice from me, because I went to University at 17 wanting to do computing, and I still haven’t got a proper job. 

I’ve played with computers whenever I got the chance. I think in my career I’ve moved away from them a little bit, up to the point where I had a computer repaired last week. The repairman had googled me, with my rather unique name, and found out that I knew a lot about data protection, but from what he could see on the internet he didn’t realise that I was a computer scientist by origin.

You are now Professor of Cyber Security at the Cyber Technology Institute at De Montfort University.  What sparked your interest in cyber security?


It was a chance encounter in some sense, because I had a student who did a degree elsewhere, and he’d studied a lot of cryptography, which we weren’t teaching an awful lot of. I was using particular mathematical methods to prove correctness of programs, and he came to me after his studies and said, “Do you think we could apply your sort of stuff to cryptography?”

And we went ahead and did that, and I had a year’s study leave, developed a national network through all sorts of contacts around that, and things kicked off. I rolled into having a side interest in security. And then cyber security kicked off nationally, and that turned out to be the right thing for me. 

The first 20-odd years of my career were very technical, very mathematical methods for computer science. And that gets a bit lonely at some point. Cyber security provides the option to talk to other disciplines. I talk to lawyers, psychologists, anthropologists, all those sorts of people on a fairly regular basis, because the topic material that I’m looking at now is intrinsically interdisciplinary. 

I’ve still got links to the area in which I grew up scientifically. Most of my highly cited publications are still in that area. But yes, I’ve also been looking at cyber security for the last 10-15 years in various ways.

You’re an expert reviewer on The Nuffield Council on Bioethics report on the collection, linking, and use of data in biomedical research and health care. At what point did you become interested in data and privacy with respect to health? 

That’s an interesting story by itself. There’s a computer-science-based author on that report on using health data, and they needed a reviewer. At the point that I was selected for that, I had not a single scientific publication on health data anywhere whatsoever. But I had written a lot of comment pieces on it; I was fully in with the politics on it. 


In some sense I‘d been radicalised as a privacy activist, by hearing things on the radio that just couldn’t be true. A guy called Tim Kelsey was on the radio at one point. He ran the care.data programme for the NHS. And he said, we’re going to share this data about you, and at that point they were talking about selling it broadly as well, in a form that nobody can recognise you from. And I thought, I don’t trust that. 

I see my background coming into all this policy data privacy area, as technical realism of some kind. Nothing irks me as badly as when somebody makes a claim for political reasons that technically is just wrong. And it’s moved on from there. 

What about health interests you?

The data in that area are fascinating. It’s an area where it’s obvious that the data is rich, and there’s an awful lot that we don’t know. It is always sitting there waiting for the next buzzword to come and get at it. Big data, AI, people want to put it on the blockchain, people want to do innovation with it – which is of course a fantastic new concept that nobody’s ever thought of before – so it’s in the limelight. And it plays out interestingly in political processes as well.

You wrote an opinion piece in The Guardian in which you argue that rather than sharing anonymised health data, a solution might be a facility where researchers can interrogate data under tightly controlled conditions for specific registered purposes. Can you talk about the benefits of this arrangement?

Sharing of health data is focusing on the wrong thing. It is not the data that needs to be shared. It’s the utility from the data that needs to be shared. We need to make sure that the value of the data can be used in different contexts in a safe way. And if you share the data itself, it’s no-holds-barred. There is no such thing as a safe copy, because of the richness of the data. Anonymisation doesn’t really work. 
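To make the anonymisation point concrete, here is a minimal sketch, entirely invented for illustration, of the kind of linkage attack that makes rich “anonymised” data risky: quasi-identifiers left in the released records are joined to an auxiliary dataset. All names, postcodes and diagnoses below are hypothetical.

```python
# Hypothetical, invented records: no real data or people.
anonymised_health = [
    # (postcode_prefix, birth_year, sex, diagnosis) with direct identifiers removed
    ("LE1", 1972, "F", "type 2 diabetes"),
    ("LE2", 1985, "M", "asthma"),
    ("LE1", 1990, "F", "hypertension"),
]

public_register = [
    # (name, postcode_prefix, birth_year, sex), e.g. compiled from open sources
    ("Alice Example", "LE1", 1972, "F"),
    ("Bob Example", "LE2", 1985, "M"),
]

def reidentify(health_rows, register_rows):
    """Link the two datasets on the quasi-identifiers they share."""
    matches = []
    for postcode, year, sex, diagnosis in health_rows:
        candidates = [name for name, p, y, s in register_rows
                      if (p, y, s) == (postcode, year, sex)]
        if len(candidates) == 1:  # a unique combination re-identifies the person
            matches.append((candidates[0], diagnosis))
    return matches

print(reidentify(anonymised_health, public_register))
# [('Alice Example', 'type 2 diabetes'), ('Bob Example', 'asthma')]
```

The richer the released data, the more combinations of attributes become unique, which is why a “safe copy” is so hard to achieve.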

You need to find alternative mechanisms that allow people to access the utility of the data. There are two main thrusts to that. One is physical access measures, in which you make sure that you control the circumstances in which people access the data. And for census data, for example, that’s already being done. But clearly that doesn’t scale to the level of use that you ideally want in, for example, health data usage.

Now, there are some models around, which say we need to make everyone keep control of their own data, and give access to it in controlled ways. And that may or may not be a good model. I’m sure that using blockchain as the underlying model for that is a bad idea because you can’t put health data on the blockchain, for all sorts of reasons.


But in principle, I do believe that there’s scope for a cryptography-based solution, which makes sure that the results from the data can be computed in a way that ensures that the data itself doesn’t get shared more broadly than it absolutely needs to be. So privacy-enhancing techniques, multi-party computation, homomorphic encryption, all that sort of stuff has a role to play in this area. Scaling that to workable solutions is still a long time away. It’s about making sure that there’s an infrastructure for people to actually do those computations. Some of the techniques are not efficient yet. 
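As a rough illustration of the multi-party computation idea, here is a minimal sketch, assuming a toy setting with three hypothetical hospitals and honest participants, of additive secret sharing: an aggregate result is computed without any party seeing another’s raw figures. It is not a description of any real deployment, and real systems need authenticated channels, protection against malicious parties, and proper key management.

```python
import secrets

MODULUS = 2**61 - 1  # all arithmetic is done modulo a large prime

def share(value, n_parties):
    """Split a secret count into n random shares that sum to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

# Each hospital's private case count (illustrative numbers only).
private_counts = {"hospital_a": 120, "hospital_b": 75, "hospital_c": 203}

# Every hospital splits its count; each of the three parties holds one share per input.
all_shares = {name: share(count, 3) for name, count in private_counts.items()}

# Each party sums the shares it holds locally; a single share reveals nothing.
partial_sums = [sum(all_shares[name][i] for name in private_counts) % MODULUS
                for i in range(3)]

# Only the combination of the partial sums reveals the total, never the inputs.
total = sum(partial_sums) % MODULUS
print(total)  # 398 == 120 + 75 + 203
```

The point of the sketch is that the utility (the total) is shared while the underlying data stays with its holders, which is the shift in thinking Eerke describes.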

What do you enjoy most about this work? And why do you think it’s important that academics engage in public debate in this way?

I’m a bit of an old-fashioned academic. I think experts actually matter a little bit. In the political space there’s a lot of misdescription of technical stuff, there’s a lot of spin, and I like to cut through that a bit. So in that sense, if I’m able to do that, I have a role to play: to make sure that the political debate is at least technically honest.

I said jokingly that I was radicalised as a privacy research activist. I have to skate carefully as an academic between having an insight and having an opinion. But I think that the crucial point was that I discovered, yes, I can have a meaningful voice in this debate, and in these cyber security related and privacy related debates. Because there are always issues going on there, and I’ve stuck with that and enjoyed that. It is important to have an informed debate about these issues to make sure that we cut out the spin a little bit.

Do you think enough academics step up and engage in public debate, or do you think there’s still room for that to happen? 

I think the transition for academics to get into debates like that is actually becoming a lot easier. I think Twitter is magical in that respect. But maybe it’s easy for me to talk about that, as a white bloke. Whatever I do on Twitter, I won’t get as much abuse as a black woman might. But in principle, Twitter gives you a chance to get in touch with issues really quickly. You can float an opinion out there, and if there’s a journalist who thinks this is an interesting angle on a particular technology-related story that they’re writing at the moment, then you’re in there.

That’s basically how a lot of my engagement with debate, the press, and also some governmental organisations has happened in practice. So should more academics do it? I think some people enjoy it, some people don’t, some people are more extroverted than others. We need the academics who keep their head down and work on the brilliant technical results in the labs, who know what their results mean at a deep fundamental level, and can predict scientific development over the next 10, 20, 30 years.


But yes, it’s obviously important that we do link the technological developments, scientific developments, to what they mean out there. I guess that is the big theme of thinking about doing AI responsibly. For example, that you shouldn’t just think about what’s possible. You should also think about what’s responsible, with what the possibilities offer you. 

What are your thoughts around building AI responsibly?

Here my age is showing to some extent. I took AI as an undergraduate topic; it was the promising thing in 1985. That was about halfway through the growth period of AI, so I don’t see it as a shiny new thing. I’ve also seen the debates about responsible software engineering over the years. Should there be codes of practice for software engineers from professional organisations, things like that. I think the professional organisations in computing have had a thread running for many years about dealing with responsible algorithms.

We’ve had a trend over many years of thinking about responsibility for programming, responsibility for algorithms. The idea that technology is not ethically neutral, or anything like that. That’s one thread. The other thread is going from mundane data protection issues – somebody’s got a database – to surveillance capitalism, where data gets generated and observed in all sorts of places and all sorts of ways that people are barely aware of. 


In my view, AI is a combination of two dramatic movements. On the one hand, the abandonment of algorithmic responsibility. We haven’t written a program; we use AI techniques which generate what people now call an algorithm, which just detects a pattern. There are no conscious design choices in there. This acts in combination with the abandonment of responsibility for gathering and guarding data responsibly. The AI ethics crisis to me is two dimensions of abandoning responsibility happening at the same time, with potentially dramatic effects. 

Where does the responsibility lie: with the computer engineers and computer scientists, or with the business leaders? 

I think as a technologist you have a responsibility. Even as an undergraduate in the Netherlands in the mid 80s, I had courses on computing and society, and how the stuff you’re doing has an impact that you need to think responsibly about. Maybe my university at the time was a little bit exceptional in that area.

But ethics is always there. There’s no such thing as just a tool. There’s no such thing as, we can do this, so we should do this, and then let somebody else worry about what can potentially be done with it. These discussions are older than the computing industry. These are discussions that you can find when you read up on the development of the atomic bomb, arms industry, and all of that business. 

In April you were a signatory to a letter asking the government’s Digital Health Agency NHSX questions about its plans for a contact tracing app with respect to data protection. What are the key things that governments must get right when developing a contact tracing app? 


I’ve seen this issue unhelpfully described as privacy versus saving lives, and that’s definitely what it isn’t. A contact tracing app is something that intrinsically infringes privacy, because it is intended to find people who are potentially infected by a disease and whose freedom of movement we need to restrain from that point onward, for the collective good.

And if you look back into the Universal Declaration of Human Rights, you see that privacy is not an unlimited right, it is in context. And this is a context in which it is entirely justifiable that privacy is infringed, that we record sensitive personal information, medical information, about people in order to do something with it that has a serious impact on their lives. 

And if you look at Data Protection law, it has public health as a justification for a large amount of data processing, including sensitive personal data. So it’s not privacy at all costs, it is the stuff around that. It’s the transparency, accountability, it’s about data minimisation and things like that. Any project that gathers a lot of data incurs risks. And the responsible practice, which is also legally required these days, is to treat those risks seriously, and to analyse them in a very broad sense before projects start. 

The UK government has been getting that quite consistently wrong for anything to do with COVID-19. And the argument has always been, but we’re in a hurry. Not thinking something through in a hurry, and then taking the wrong decision and having to U-turn at the last minute, I mean, we’re seeing that with A-level results today. 

And that’s the main thing we were asking for: openness on what data are you collecting and what’s going to happen with the data. Can you justify that the amount of data you’re collecting centrally is actually necessary to hold in a central place? Or are there alternatives in place which allow you to collect significantly less data centrally? By doing so, you could reduce a large amount of privacy risk, and reduce the security risk from third parties.


If you have a large central database, that might be attractive to cyber criminals; if such a database doesn’t exist, then you also don’t have to defend it. But I think a fundamental difference between the academics and the UK government – because the boundaries are thin in this sort of area – is that the academics and lawyers generally feel that in any sort of assessment of risks with data, we should look at function creep, effectively treating the owner of the data as a potential adversary. That’s standard privacy thinking. If we have this data anyway, what else could we do with it? What else could a less scrupulous manager want to do with it in the future? And is that a reason to put extra safeguards in?

In the discussion about contact tracing, lots of extra safeguards had been discussed: technological ones, legal ones, having a special commissioner to oversee these things. The Human Rights Committee in Parliament had a concrete proposal in there, and none of that was taken up. The crucial Data Protection Impact Assessment, which would have assessed all these risks, was done late, and was not shared with people it might have been shared with. It’s not required to publish them, but it’s a good thing to do in terms of transparency and trust.

NHSX was set up to get it wrong. NHSX is a unit that was added onto the NHS in order to do innovation. I think that ties in directly with the tech optimism of Matt Hancock in particular. He was the one supporting Babylon Health, the AI doctor which was getting everything wrong. And he had his own app, which had its own data protection problems, probably because he thought it was cool. The NHS has a digital infrastructure, and they’ve been working in this area for a long time. So despite them getting things wrong, it’s more the politicians around them getting things wrong.

They roughly know what they’re doing, but they developed a new unit to do innovation and they gave them a large budget to transform the NHS with AI, and then they gave that unit the task of developing an app. So, that was always going to be full of techno-optimism. The thinking there was rich with, what else can we do? They talked to the Oxford epidemiologist of the Big Data Centre.


Don’t call yourself an innovation unit, because innovation is a means to an end. Don’t call yourself a big data laboratory, because having lots of data for something is only a means to an end. Having lots of data by itself is a risk. Yes, there are things that you want to do with lots of data, like weather prediction and less risky things like that as well. But having lots of data is just a tool. So that sort of thinking was rich in the context of developing the NHS app initially, and that’s come to a rather unfortunate crash. 

It’s almost like you could do a Data Protection Impact Assessment for all of the potential things that you could do with health data. I think it would make sense to create an updated global picture of: this is what health data is, these are the risks inherent in gathering it, these are the risks for which we know solutions. 

There are basic ones, like protecting data against theft by third parties, which is almost completely covered by having encryption around. That’s a cheap one. Just like in traditional risk analysis, losing data has backups as a standard solution. Some risks are easy to mitigate. And then look at the bigger risks: what mitigations do we have in place around those, and what are we lacking? And I think in the current UK situation, you would find that there are gaps. There’s the ICO, which is looking at the data protection side of things, but is massively understaffed, and massively underactive in enforcement.


There’s the National Data Guardian looking at things. But again, that’s a small unit. And I think there’s a lack of transparency on the plans. I think if organisations like that had a better view of what plans, what ideas, might actually be around, they could advise more constructively. For this contact tracing app, it became clear to me and many others that more legal safeguards, specific to these contexts of use, would be useful even though we have the Data Protection Act. 

So there might be a bigger picture to analyse and continue to update: what are the real risks, and what mitigations do we currently have available? And I think the big step forward would just be to say, okay, data sharing is a bad thing. Utility sharing is a good thing. But data sharing is a bad thing. 

And change your thinking dramatically around that. There are positive aspects of the open data movement, but some of those things are a little bit happy-clappy: if we put things out there, innovation will happen by itself, in hack jams and so on. But we should look more at methods that allow sharing the utility in a way that doesn’t share the data. Sharing data is only a means to an end. 

Can you tell us what you’re going to be covering in your talk at the conference?

I will explain why data sharing is unsafe, the bigger picture around that, and maybe a little bit of the power that lies in people holding data in the first place. The modern story about AI responsibility is moving from ethics to power. There’s massive power in holding the data in the first place. So some of that story, and then some examples of why it’s really unsafe. There are famous examples aplenty. And some glimpses at what the alternatives might be.

What are you looking forward to most about the 2020 A+T conference? 

I always enjoy talking to people from different disciplines. Through my personal interdisciplinary journey of the last ten-ish years I’ve realised that there’s always a lot to learn. There’s a rich background sometimes to the simplistic technological problems that I’m thinking of, or conversely, that the way I think about some problems turns out to be a complete mismatch with how things are in the bigger context.  I’m always keen to talk to social scientists with an interest in technology because we complement each other in a way.

You can listen to the full interview with Eerke on our podcast here.

Read more from Eerke at The Conversation. 
