Researchers and technology companies have begun racing to automate the process of locating people infected with the coronavirus and contacting anyone who may have been exposed to them. This process, known as contact tracing, is going to be a key component of public health efforts. In conjunction with widespread testing, it can help protect essential workers during the pandemic, help others eventually feel safe enough to venture out, and allow societies to start functioning again.
We are in the midst of a coronavirus pandemic. When did you realize that dealing with the pandemic would require a technological solution?
The research in my group has always been about privacy, health and machine learning. When I started seeing cases from China and Asia, I thought, ‘There must be something we can do’. I was at a conference in Florida with a lot of top officials from the CDC and the FDA, and nobody seemed to have a clue. Nobody was even thinking about it. We started thinking about it in February, and by the end of February we started building the solution.
What was the exact nature of the problem that you wanted to solve?
Telling people that they might be at risk, so that they can self-isolate and we can stop the spread of the disease. It is an old idea, from the Ebola days.
How is what you are doing different from the solutions being used by China and other Asian countries?
China’s solution is a top-down system, where they have a bird’s eye-view into everything that is going on, and they can orchestrate anything in the society. Can we do the same thing here, but without a bird’s eye-view, where everybody is doing the right thing for themselves, and we can still orchestrate that in a privacy-preserved way? That was the question.
What I realized at this conference in Florida is that all the research I have been doing in my group at MIT is a perfect match for this problem.
Isn’t it too late for such technological solutions, given that the virus has already spread throughout the world?
There are multiple ways to think about it. One is that about 18 percent of the population (health workers, grocery store delivery people, restaurant owners) is still working. We need a solution for them, because they are essential workers. If they start shutting down hospitals or fire stations or grocery stores, the rest of the socio-economic system will collapse. So, keeping the essential workers healthy and functioning requires this solution.
The second is that to get the economy back on the rails, people will need to feel like there is some safety mechanism. You need some kind of dosimeter for people to know when they are taking too many risks and when they should probably stay away from certain places. If you want to get the economy back on track you need this early warning system.
And third, this is not going away anytime soon. We are going to see a second wave, a third wave of this epidemic. We need to be prepared for it.
Can we dig deeper into the design principles behind preserving privacy that you want to follow? Did these principles evolve as the work proceeded?
Some of them evolved as the work proceeded. After looking at the scenario and talking to a lot of health professionals, we realized that the notion of privacy in health is very different from the notion of privacy in other areas. In this case, healthy people should have complete privacy, but infected people have to go to the hospital, so there is a human touchpoint. You don’t expect those infected to be anonymous forever.
We have to ensure that healthy people have complete privacy. They should never have to give away any data about themselves. For infected people, their redacted information can be part of the system, but in such a way that their identity is not disclosed and they cannot be re-identified from the information collected about them.
And that achieves a really good balance to get started. In the future, we will use encryption, so that the privacy of the infected person is also completely preserved.
Can you contrast your approach with the top-down approaches that have been used in China or Singapore or South Korea? What about those solutions bothers you?
In terms of the top-down solutions, they actually work really well. The problem is that over-zealous government officials can take the law into their own hands. So what ended up happening in Asia is that if government officials knew exactly where potentially infected persons were living, they would just go to their houses and either forcibly remove them or weld-shut their houses. They would take draconian measures. These local officials thought that was the right thing to do. But those draconian measures impacted certain populations in a disproportionate way.
That’s where privacy comes in. Privacy is for people who don’t have much power, so that local, over-zealous officials cannot use arbitrary rules to create challenges in their lives. It’s not a luxury that only a well-to-do person needs; privacy is about the livelihood and dignity of an under-served person.
It’s important not to reveal the identity of the potentially exposed person. We should create a humane and harmonious mechanism, so that health officials can intervene, but ensure that they don’t have complete power over this individual.
What are the various components of the solution your team has developed and how does it work?
There are two main components. One is called Safe Paths and the other is called Safe Places. Safe Paths is a citizen-facing app. In version 1 of the app, people start logging their locations on their own phones. And they can compare their location trails with publicly available data. If the government starts releasing data per county or per zip code, they can see where they are with respect to that data. They can check to see if they passed through a county or zip code designated as a hot zone.
In version 1, the app is merely logging location data, but that location data does not leave the phone?
That’s correct. Complete privacy for the healthy person.
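As an illustration of version 1 as described above, the sketch below keeps a location log in local memory and compares it with a published list of hot zones. It is not the Safe Paths code: the `LocationPoint` record, the `visited_hot_zones` function and the zip codes are hypothetical, and it assumes the phone has already resolved each logged point to a zip code.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LocationPoint:
    """One entry in the on-device log (hypothetical format)."""
    time: datetime
    lat: float
    lon: float
    zip_code: str  # assumed to be resolved locally on the phone


def visited_hot_zones(trail, hot_zip_codes):
    """Return the sorted hot-zone zip codes that appear in the local trail.

    Only the public hot-zone list comes in; the trail is never sent anywhere.
    """
    hot = set(hot_zip_codes)
    return sorted({p.zip_code for p in trail if p.zip_code in hot})


# Made-up example data.
trail = [
    LocationPoint(datetime(2020, 4, 7, 14, 0), 42.3601, -71.0942, "02139"),
    LocationPoint(datetime(2020, 4, 8, 16, 0), 42.3736, -71.1097, "02138"),
]
published_hot_zones = ["02138", "10001"]

print(visited_hot_zones(trail, published_hot_zones))  # ['02138']
```

The comparison runs entirely on the device, which is the property the answer above describes: the healthy person downloads public data and shares nothing.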
What about someone who tests positive? What is their responsibility at that point?
This will be functionality built into version 2. The expectation is that the infected person will get a one-time password from the testing site to self-report their trails. It’s up to them to offer their data to public health officials, so that it’ll be available digitally in the system. That’s the role.
It remains a voluntary action?
Exactly.
Right now, with manual contact tracing, the process typically involves a phone call. A health official will call saying these are your test results, let’s talk about it: what’s your condition, do you have somebody who can take care of you, what kind of symptoms you have, and so on. And in the process, the official will also convince those infected to share information about everywhere they have been. Each call lasts over 30 minutes. We are hoping that it’ll be a shorter call now, because the health officials won’t have to ask questions about where you have been. That information is available digitally.
How will health officials process that location information?
They will use the web-based app called Safe Places. The official would do a short interview with the infected person, and they would use the Safe Places web-app to get location data from the infected person, and then release that information on what looks like a Google Map, using the Safe Places web tool.
Will that data be on a central server somewhere, which the Safe Paths app can access?
The Safe Places web tool will upload it to the city server or the state server, and then everybody in that state or town can download it.
So, health officials will have the GPS trails of people who have been infected. Will all identifying information be removed?
Yes. House location, work location, anything that the infected person doesn’t want to release into the public domain will be redacted. We expect public health officials to only release redacted data of infected people.
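One simple way to picture this kind of redaction is to drop every trail point that lies within some radius of a place the person wants to keep private. The sketch below does that with a great-circle distance; the `redact_trail` function, the 200-meter radius and the coordinates are assumptions made for illustration, not details of the Safe Places tool.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    r = 6371000.0  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def redact_trail(trail, sensitive_places, radius_m=200.0):
    """Keep only points farther than radius_m from every sensitive place.

    trail: list of (label, lat, lon); sensitive_places: list of (lat, lon).
    The radius is a hypothetical parameter.
    """
    return [
        (label, lat, lon)
        for (label, lat, lon) in trail
        if all(haversine_m(lat, lon, s_lat, s_lon) > radius_m
               for (s_lat, s_lon) in sensitive_places)
    ]


# Made-up coordinates for a home, an office, and three logged points.
home = (42.3601, -71.0942)
office = (42.3625, -71.0860)
trail = [
    ("near home", 42.3602, -71.0943),
    ("grocery store", 42.3736, -71.1097),
    ("office", 42.3625, -71.0860),
]

redacted = redact_trail(trail, [home, office])
print([label for label, _, _ in redacted])  # ['grocery store']
```

A real redaction step would also have to consider whether the remaining points still allow re-identification, which a radius filter alone does not guarantee.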
What comes next?
Once the redacted information is downloaded by the Safe Paths app on to your phone, you can immediately see if there is an overlap between your movements and those of infected people. So, if the infected person was at Starbucks on Tuesday at 2PM, and was at the grocery store on Wednesday at 4PM and at a wedding on Saturday morning, then you can download that data, and check to see if you were at Starbucks at the same time, or at the grocery store at the same time, or at the wedding at the same time.
And if you were, then it gives you some notion of your risk. Then you can decide, based on your risk profile, whether you should self-quarantine, or call the number to get some guidance or check your symptoms.
You are using GPS to log these trails. What kind of resolution will you get with GPS?
About 10-20 meters. We think it is a sufficiently good resolution to decide your risk, especially if you stay in the same location as the infected person for more than 10 minutes.
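The overlap check described above can be sketched as a comparison of two timestamped trails. The example below assumes both phones log a point every five minutes, and it counts the points where the two people were within 20 meters of each other at about the same time. The function names, the sampling interval and the coordinates are hypothetical; the 20-meter radius and the 10-minute threshold are taken from the figures quoted in the interview.

```python
import math
from datetime import datetime, timedelta


def approx_distance_m(lat1, lon1, lat2, lon2):
    """Flat-earth distance approximation, adequate at the scale of tens of meters."""
    mean_lat = math.radians((lat1 + lat2) / 2)
    dx = math.radians(lon2 - lon1) * math.cos(mean_lat)
    dy = math.radians(lat2 - lat1)
    return 6371000.0 * math.hypot(dx, dy)


def exposure_minutes(my_trail, infected_trail, radius_m=20.0,
                     max_gap=timedelta(minutes=2), sample_minutes=5):
    """Estimate minutes spent near the infected person.

    Each trail is a list of (time, lat, lon), assumed to be sampled every
    sample_minutes. A point counts if any infected point is within radius_m
    and within max_gap in time.
    """
    hits = 0
    for t, lat, lon in my_trail:
        if any(abs(t - it) <= max_gap
               and approx_distance_m(lat, lon, ilat, ilon) <= radius_m
               for it, ilat, ilon in infected_trail):
            hits += 1
    return hits * sample_minutes


start = datetime(2020, 4, 7, 14, 0)
# Infected person stays at one spot for 15 minutes (made-up coordinates).
infected = [(start + timedelta(minutes=5 * i), 42.36010, -71.09420) for i in range(3)]
# I sit a few meters away for the same 15 minutes, then leave.
mine = [(start + timedelta(minutes=5 * i), 42.36015, -71.09425) for i in range(3)]
mine.append((start + timedelta(minutes=15), 42.37360, -71.10970))

minutes = exposure_minutes(mine, infected)
print(minutes, minutes >= 10)  # 15 True
```

Because the redacted trails are downloaded to the phone, a check like this can run locally without the healthy person uploading anything.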
Can you also combine GPS location data with Bluetooth to get finer resolution?
Yes. We are using Bluetooth as well. Right now, GPS gives you a resolution of 10-20 meters. If you use a combination of GPS and Bluetooth, the resolution can be anywhere from zero to tens of meters. It’s just a parameter and we can change it based on what the epidemiology of the virus tells us. It’s not a technological challenge.
From the perspective of the public health official, what else do they need to do?
We have a whole separate pipeline of solutions that will be released in coming weeks for public health officials, where we will create “heat maps”, which show the epidemic’s hotspots, while still preserving the privacy of an individual. The information is crowd-sourced from people’s phones in a privacy-preserving way.
If you don’t create a privacy-preserving solution, then nobody will use it. They will just leave their phone switched off, or they will not load the app, or they’ll load the app and disable the location services and so on. How do you create a Google Maps-like experience, where people are willing to give up their location, and in return they get to see the traffic? We don’t want to force people to give us their location, but we want to be able to show where the traffic is, where the disease hotspots are. We have a mathematical way of achieving that.
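The interview does not say which method the team uses for heat maps. As one generic illustration of how aggregate hotspots can be shown without exposing any individual, the sketch below adds Laplace noise to per-cell counts, a standard differential privacy technique. The function names, the cell labels and the `epsilon` value are hypothetical, and the sketch assumes each person contributes to at most one cell, so that one person changes any count by at most 1.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_heat_map(reports, all_cells, epsilon=1.0, rng=None):
    """Return a noisy count for every cell in a fixed, public grid.

    reports: one cell label per person (assumption: at most one each).
    Noise is added to empty cells too, so an absent report is not revealed.
    """
    rng = rng or random.Random()
    counts = Counter(reports)
    scale = 1.0 / epsilon  # sensitivity 1 divided by the privacy budget
    return {cell: counts[cell] + laplace_noise(scale, rng) for cell in all_cells}


# Made-up reports over a three-cell grid.
reports = ["A", "A", "A", "B"]
cells = ["A", "B", "C"]

heat_map = noisy_heat_map(reports, cells, epsilon=1.0, rng=random.Random(0))
for cell, value in heat_map.items():
    print(cell, round(value, 2))
```

A smaller `epsilon` means more noise and stronger privacy; with few reports per cell the noise can swamp the signal, which fits the point made later in the interview that different population densities call for different methods.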
Can you talk a little bit more about the mathematical way?
There are various methods, whether it’s based on hashing, or encryption, or differential privacy, or homomorphic encryption. We are hoping to use a collection of those methods. Some of them work well when there are few users. If you are, say, in rural Oklahoma it might work. Some of them work when you are in a very dense region, for example, in New York City. Different densities will require different solutions.
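To give a feel for the hashing option mentioned above, the sketch below snaps each point to a grid cell and a time bucket, hashes the result with SHA-256, and compares the two sets of hashes. This is an illustrative assumption rather than the team's algorithm: the `cell_hash` function, the cell size of 0.0002 degrees (roughly 20 meters in latitude) and the 10-minute bucket are all hypothetical choices.

```python
import hashlib
import math
from datetime import datetime, timezone


def cell_hash(lat, lon, when, cell_deg=0.0002, bucket_minutes=10):
    """Hash a (grid cell, time bucket) pair so raw coordinates are not published."""
    lat_cell = math.floor(lat / cell_deg)
    lon_cell = math.floor(lon / cell_deg)
    time_bucket = int(when.timestamp()) // (bucket_minutes * 60)
    key = f"{lat_cell}:{lon_cell}:{time_bucket}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def hashed_trail(trail):
    """Turn a list of (lat, lon, time) points into a set of hashes."""
    return {cell_hash(lat, lon, when) for lat, lon, when in trail}


def utc(day, hour, minute):
    return datetime(2020, 4, day, hour, minute, tzinfo=timezone.utc)


# Made-up trails: both people are in the same cell between 14:00 and 14:10.
infected = [(42.36010, -71.09425, utc(7, 14, 3)),
            (42.37360, -71.10975, utc(8, 16, 2))]
mine = [(42.36015, -71.09427, utc(7, 14, 7)),
        (42.35005, -71.06005, utc(7, 18, 0))]

matches = hashed_trail(mine) & hashed_trail(infected)
print(len(matches))  # 1
```

Two caveats apply to this naive version. Points that are close together but fall on opposite sides of a cell boundary will not match, and because the space of cells and time buckets is small, plain hashes can be reversed by trying every possibility. That is one reason stronger tools such as encryption are also on the list.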
Where are you in terms of figuring out the right algorithms?
We are writing some papers and keeping them in the public domain. We don’t want to implement them until we have enough peer review and feedback. We want to know about possible attacks against these systems. Once the ideas have matured, then we will start implementing them in our software.
Is this openness and transparency a necessary and good way to achieve your aims?
I think so. When it comes to health, people are very particular. Either you have a very good public image, like Google or Apple, and people are willing to trust you, or you have a solution that preserves privacy from the start, so that people feel comfortable contributing data.
What about the code itself? Will it be open source?
Yes, all the code is open source. That’s another way to develop trust.
There are many teams that are working on Bluetooth-only solutions. Could you describe how they work?
In Bluetooth, there are two phones. Let’s call them Alice and Bob. If they are next to each other, they will exchange a Bluetooth token. Later on, when Bob is declared positive, he will upload his tokens into central servers, and Alice’s phone will download tokens from the server every day, to see if the token she received from Bob when the phones were together is the same as the token she received from the server. If they match, she knows that she was close to an infected person.
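The Alice and Bob exchange above can be sketched in a few lines. The `Phone` class and its methods are hypothetical and there is no real Bluetooth involved; tokens are simply random strings passed between objects, and the "server" is a plain list.

```python
import secrets


class Phone:
    """Toy model of a phone taking part in token-based proximity tracing."""

    def __init__(self, owner):
        self.owner = owner
        self.sent_tokens = []      # tokens this phone has broadcast
        self.heard_tokens = set()  # tokens received from nearby phones

    def broadcast(self):
        """Create and remember a fresh random token."""
        token = secrets.token_hex(16)
        self.sent_tokens.append(token)
        return token

    def hear(self, token):
        self.heard_tokens.add(token)

    def check_exposure(self, published_tokens):
        """True if any token heard earlier was later published as positive."""
        return bool(self.heard_tokens & set(published_tokens))


def meet(a, b):
    """Two phones next to each other exchange one token each."""
    b.hear(a.broadcast())
    a.hear(b.broadcast())


alice, bob, carol = Phone("Alice"), Phone("Bob"), Phone("Carol")
meet(alice, bob)    # Alice and Bob were close to each other
meet(alice, carol)  # Carol met Alice but never met Bob

server = []                     # stand-in for the central server
server.extend(bob.sent_tokens)  # Bob tests positive and uploads his tokens

print(alice.check_exposure(server))  # True
print(carol.check_exposure(server))  # False
```

The match tells Alice that she was near someone who later tested positive, but nothing in the tokens records where the encounter took place.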
What are the pros and cons of the Bluetooth-based approach?
Bluetooth-based methods are also great. But while they might tell you that you crossed paths with someone who was infected, they don’t tell you where it happened. If it happened in a grocery store versus Starbucks, the healthy person needs to know, because maybe they were wearing a mask in the grocery store, but not in Starbucks. So, the context is important for people to know whether they should take this information seriously or not.
Another important problem with Bluetooth is that its coverage only scales as the square of the fraction of people using the app, because both people in an encounter need to have it. So in Singapore they have 12 percent penetration of their Bluetooth app. The square of that is 1.44 percent. This means that if 12 percent of the population uses the app, it’s only capturing 1.44 percent of the total encounters.
What about GPS-based solutions?
They don’t have that problem, because they scale linearly. If 12 percent of the population is using the app, then 12 percent of the places they went to can be marked as hotspots.
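The 12 percent and 1.44 percent figures quoted above come from multiplying the adoption rate by itself: a Bluetooth encounter is only recorded when both people run the app, and 0.12 × 0.12 = 0.0144. The short sketch below makes that arithmetic explicit and contrasts it with the linear GPS case. The function names are hypothetical, and the model assumes app users are spread evenly and independently through the population.

```python
def bluetooth_coverage(adoption):
    """Share of encounters captured when both parties must run the app."""
    return adoption ** 2


def gps_coverage(adoption):
    """Share of visited places that can be marked when one party suffices."""
    return adoption


for adoption in (0.12, 0.50, 0.80):
    print(f"adoption {adoption:.0%}: "
          f"Bluetooth {bluetooth_coverage(adoption):.2%}, "
          f"GPS {gps_coverage(adoption):.2%}")
```

At 12 percent adoption the first line of output shows 1.44 percent for Bluetooth and 12 percent for GPS, matching the numbers in the interview.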
How do Google and Apple fit into the scheme of things?
Using Bluetooth was very clunky. The good news is that Apple and Google have decided to create an API to do Bluetooth-based proximity detection. That makes life easy, and now other folks simply have to build apps on top of it, to provide GPS solutions, or to provide the ability to call, or to do what we are doing, which is to create solutions for public health, such as dashboards and heat maps.
Is there any concern, now that Google and Apple are involved, about what happens to the data that might end up in their servers?
These are both seen as benevolent companies, so people aren’t worried about them misusing the data. Nevertheless, it does lock out other players from providing more meaningful solutions. That’s a threat. It’s the same as when Microsoft integrated Internet Explorer into Windows, which prevented Netscape from growing as an internet browser. We could get a similar situation here. If Google and Apple start locking away the Bluetooth tokens, then nobody else can provide such solutions. That could stifle innovation.
France has challenged Apple and Google, arguing that their actions may stifle innovation.
Are there concerns about security?
Absolutely. While it’d be difficult to hack into the Apple and Google servers, it’s possible for a third party to release an app and listen to these Bluetooth tokens. That signal can be used in many ways.
In this context, do you feel that the GPS-based solutions are more secure?
Yes. In the GPS-based solutions, your phone is not constantly sending out tokens.
What else worries you about such technologies going forward, given that some form of technological contact tracing is going to become really important?
I see more problems with the Bluetooth solutions, because they will definitely get used for commercial reasons, and this will become commonplace. Then we won’t be able to turn the clock back.
Do you mean that our phones will constantly be generating these Bluetooth signals and that leaves you open for other ways of using it?
Exactly. And it’ll take some time before things stabilize. The same thing happened with GPS. Even today there are many apps that exploit the GPS and sell your data, but things are stabilizing now. People are realizing that GPS is not low-stakes information; rather, it’s pretty high-stakes information, and you should not be giving it away.
Could Bluetooth tokens also become high-stakes information at some point?
Yes, over time.
The success of all of this technology will depend on how widely it’s accepted and adopted. What are the main obstacles you see for widespread adoption of this contact-tracing technology?
There are many obstacles. People have to trust the technology; it needs to be available to many players; and so on. It’s not going to be an easy battle.
What about people who don’t use smartphones—young people, the elderly, people who cannot afford them? How does this solution reach them?
The good thing about the GPS-based solution is that if infected people can release their redacted trails, these can be shown on TV news channels, like they show weather maps, or it could be printed in a newspaper and so on. It’s very easy to disseminate that information.
Do you have any final thoughts?
To overcome this crisis, ideally we need careful orchestration by the government and large businesses. To do that, they need a bird’s-eye view. That can be invasive and in conflict with individual freedom, dignity and privacy. It could also compromise trade secrets and create national security concerns that organizations and countries need to consider. So the need of the hour is new technology that can both preserve privacy and provide the bird’s-eye view.
We need tools for citizens and organizations to coordinate among themselves. Safe Paths is about creating these privacy-preserving coordination tools. In the short term, we hope to beat the virus and this public health crisis. In the medium term, we want to build the coordination backbone to help restart the economy. And in the longer term, we want to help build resilient societies.
Further info on our Safe Paths/ Safe Places initiative: