Raluca Popa is designing a secure computation platform that will enable organizations to perform collaborative machine learning studies on their aggregate data, while maintaining the privacy of their data and without sharing it with each other.
Raluca Ada Popa is an Assistant Professor of Computer Science. Her research is in the area of computer security, systems and applied cryptography, with an emphasis on computing on encrypted data. She received her B.S., M.Eng., and Ph.D. from the Massachusetts Institute of Technology.
Area of Research
Secure Collaborative Learning
Organizations that own sensitive data often wish to conduct collaborative studies informed by the aggregate data from all of these organizations, but they cannot do so because their data cannot be shared due to privacy concerns or regulations. With her Bakar Fellows Spark Award, Raluca Ada Popa will design and build a data encryption platform that will enable collaborative machine learning studies by performing these multi-party computations under encryption. Each participating group will have a private key to encrypt its own data before submitting it for analysis. Training will be performed on the combined encrypted data from all collaborating groups, and the resulting final model, also encrypted, can be decrypted by each organization using its own private key. In this way, the organizations will not share their sensitive data with any other party, yet they will all learn the results of the study. Examples of such uses could be hospitals seeking to predict influenza hot spots or better cancer treatments, or banks seeking to share information to detect money laundering or fraud, potentially saving thousands of lives and millions of dollars annually.
The border between what is public and what is private has become hard to patrol. Social media’s siren song plays out billions of times a day, connecting us far and wide, while sometimes hyper-sharing the most private facts of our lives.
The tension between the need to share and the need to rope off critical information constrains a wide range of enterprises too. Banks, for example, can’t collaborate on concerns of common interest without compromising customer confidentiality.
Raluca Ada Popa, assistant professor of computer science, designs computer systems to protect confidentiality by computing over encrypted data, while at the same time allowing joint access to the results of data analysis. With the support of the Bakar Fellows program her lab plans to build and test a new encryption system.
She describes the new systems and the encryption that underlies them, and she discusses her work with industry and health care systems eager to collaborate without compromising precious data.
Q. How does the push-pull of data sharing and confidentiality affect business operations?
A. Banking is a good example. Banks need to keep their transactions private or they risk losing clients to competing banks. But if they try to stay totally sequestered from competitors, they lose the chance to tackle problems best solved together.
We’re involved in a collaboration now with five banks in Canada. We’re helping them address a big problem: Criminals launder money across different banks, but the banks can’t share data to track the laundering because they don’t want to expose their clients’ data to each other.
We are developing encryption algorithms and a system that can allow each bank to supply information for a shared strategy, a model to detect money laundering, without disclosing its clients’ data. Each bank’s data can be computed on, but it can’t be read by the other banks.
Q. How can a bank share data without the others seeing it?
A. That’s the magic of the encryption technology. Only the model is accessible to all the banks. It’s as if you supply your data to a blindfolded machine learning system that can compute on the data but cannot see it. You end up with a useful model without compromising your confidential data.
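The "blindfolded" idea can be illustrated with a technique far simpler than the system described in the interview: additive secret sharing. The sketch below is not the lab's algorithm. It is a toy example in which several parties learn only the sum of their private numbers. The function names, the modulus, and the sample counts are all hypothetical, and the sketch assumes the parties follow the protocol honestly and do not collude.

```python
import secrets

# Hypothetical modulus: shares are drawn uniformly below it, so any single
# share reveals nothing about the value it was derived from.
MODULUS = 2**61 - 1


def make_shares(value, n_parties):
    """Split `value` into n_parties random shares that sum to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def secure_sum(private_values):
    """Simulate all parties in one process and return the aggregate sum."""
    n = len(private_values)
    # Each party i splits its own value and sends share j to party j.
    all_shares = [make_shares(v, n) for v in private_values]
    # Party j adds up the shares it received, one from every party.
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)
    ]
    # Only the partial sums are published; together they give the total.
    return sum(partial_sums) % MODULUS


# Hypothetical counts of flagged transactions held by five banks.
flagged_counts = [12, 7, 30, 0, 5]
print(secure_sum(flagged_counts))
```

Each party sees only random-looking shares from the others, yet the published partial sums add up to the true total. Training a full machine learning model under encryption requires much more machinery than this, but the principle of computing on data that no single party can read is the same.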
Q. You are working with medical centers to tackle similar kinds of problems, right?
A. Yes, we have proposed a solution to a problem faced by a major Bay Area health care provider. Researchers want to develop a good flu predictor to improve their vaccination program. They need patient data from hospitals over a wide geographic area, but they are blocked by each hospital’s obligation to protect patient confidentiality. Just as with the banks, they need a way to analyze data to develop a useful model without divulging sensitive information.
The encryption algorithm we are developing will allow them to supply enough data to allow a machine learning program to develop the model, while still “masking” the patient records. Researchers can access only the data needed to develop the model.
Q. This seems like such a pervasive problem. Aren’t there already algorithms and other strategies to allow this now?
A. Our peers in the theoretical cryptography community have developed a number of general-purpose solutions, but these are orders of magnitude too slow for many problems. For example, our system called Helen is about 1,000 times faster than current technology for the same level of security.
Q. How would you get new clients to adopt the system of submitting their data to the machine learning algorithm when that data is at the very heart of what makes them competitive?
A. Yes, clients indeed need to gain confidence in the power and privacy of the encryption algorithm. We already have a formal mathematical proof of the security of the algorithm. You need the community to understand, analyze, and build trust in it. Our system is going through extensive security and code reviews and tests by experts. And we are also setting up “hackathons” for hackers to try to attack the system.
My senior graduate student, Rishabh Poddar, is a Bakar Innovation Fellow this year and leads this effort on the student side. In addition to working toward adoption of his research on this project, he is considering launching a startup company to offer this technology. Organizations have a huge need to work together while protecting their sensitive data, and many industry sectors need such a technology.