Businesses often employ AI in applications to unlock intelligent functionality, such as predicting relevant product recommendations for customers. Recently, businesses have started building AI-powered applications that provide predictive functionality using sensitive information, a significant benefit to users. For instance, there are now AI applications trained on medical records to help predict patient diagnoses, and others trained on private emails to suggest the next sentence of a message.
But training predictive AI models on data that includes sensitive information complicates compliance with data protection regulations. In the context of AI, data protection regulations require businesses to ensure that training data is safeguarded against unlawful access by unauthorized parties. Certain types of AI models, however, have inherent characteristics that make protecting the privacy of training data difficult. For instance, if appropriate safeguards are not taken, predictive AI models, such as the generative sequence models commonly used in sentence-completion applications, can unintentionally memorize sensitive information included in their training data. This unintentional memorization creates the risk that the model will leak sensitive information, such as a business's trade secrets or a user's credit card number, as a prediction in response to a new, previously unseen input.
An adversary can exploit this information leakage to infer details about the intellectual property contained in the training data (e.g., sensitive training examples protected by trade secrets) or about the parameters of the model itself (e.g., the weights between nodes of a neural network, which could also be protected by trade secrets). This is a significant problem because collecting training data and training an AI model to generate its parameters can be costly and effort-intensive, and businesses have an interest in protecting that investment. Adding to the problem, the adversary can make these inferences with nothing more than API access to the AI model: by probing the model with a large number of queries and analyzing the outputs, the adversary can draw conclusions about the contents of the training data or the parameters of the model.
This article provides a primer on two common AI privacy attacks that an adversary could use to extract intellectual property, such as trade secrets, from the training data or from the parameters of the AI model itself. It also provides an overview of the Information Commissioner's Office's (ICO) recommendations for safeguarding the intellectual property included in training data. The ICO is the UK's independent body set up to uphold information rights.
Sensitive Information Can Show up in a Variety of Different Forms Within Training Data
Sensitive information includes any data that is protected using a security measure, such as encryption. An example of sensitive information is any intellectual property, such as trade secrets, contained within training data. Trade secrets can include, for example, certain financial, business, scientific, technical, economic, or engineering data. Training data can include sensitive information in the form of structured data (e.g., a data element containing a customer’s private email) or unstructured data (e.g., the customer’s annual spend included in the text of a chat transcript or in an audio recording).
While a single data element can contain sensitive information (e.g., a private email address), a combination of individually non-sensitive data elements can also be sensitive. For example, a study has shown that 87 percent of individuals in the U.S. can be uniquely identified using a combination of three non-sensitive data elements: zip code, gender, and date of birth. Far more complex, though, is when sensitive information can be inferred from the context of otherwise non-sensitive information. For example, a transcript of a chat session between a user and a chatbot may include the statement: “I have a bad connection on the 4th floor of my office, so I’m now in the lobby in front of the G St. Buffet.” If there is only one G St. Buffet, the user's location can be recognized from the context of the unstructured transcript text.
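To make the quasi-identifier risk concrete, the short sketch below counts how many records in a small, entirely hypothetical customer table are uniquely pinned down by the combination of zip code, gender, and date of birth. The column names and values are illustrative assumptions, not fields from any real dataset or study.

```python
# Minimal sketch: measuring how identifying a combination of individually
# "non-sensitive" columns can be. All data here is made up for illustration.
import pandas as pd

records = pd.DataFrame({
    "zip_code": ["20001", "20001", "30309", "30309", "94105"],
    "gender":   ["F", "M", "F", "F", "M"],
    "dob":      ["1990-01-01", "1990-01-01", "1985-06-15",
                 "1985-06-15", "1978-03-02"],
})

quasi_identifiers = ["zip_code", "gender", "dob"]

# Count how many records share each (zip code, gender, date of birth) combination.
group_sizes = records.groupby(quasi_identifiers).size()

# A combination that occurs exactly once pins down a single individual.
unique_share = (group_sizes == 1).sum() / len(records)
print(f"{unique_share:.0%} of records are uniquely identified by {quasi_identifiers}")
```

The same kind of count, run against a real table before it is released as training data, gives a rough sense of how re-identifiable "non-sensitive" columns are in combination.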
Keeping Sensitive Information in Training Data Secure Is a Complex Challenge
Protecting the training data of a predictive AI model against unintentional loss to adversaries is a challenging task. Many AI applications run on systems that use large sets of training data, validation data, and testing data: training data is used to train the model, validation data is used to tune the model's hyperparameters, and testing data is used to evaluate the performance of the final model. If a business uses a third-party machine-learning-as-a-service (MLaaS) vendor to build a predictive AI model, the business may need to authorize the MLaaS vendor to access its training, validation, and testing data. Authorizing access for a third party adds complexity to complying with data protection regulations because it can open a path for a privacy attack.
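For reference, the sketch below shows one conventional way to carve a dataset into the three subsets described above, using scikit-learn. The split ratios and the synthetic dataset are arbitrary assumptions for illustration.

```python
# Illustrative train / validation / test split on a synthetic dataset.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Hold out 20% as the test set, used only to evaluate the final model.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0)

# Carve a validation set out of the remaining data for hyperparameter tuning.
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.25, random_state=0)  # 0.25 of 80% = 20%

print(len(X_train), len(X_val), len(X_test))  # 600, 200, 200
```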
Additionally, a trained predictive AI model inherently memorizes aspects of its training data to some extent (e.g., the weights between nodes of a classifier model can represent memorized correlations within the training data). If appropriate safeguards are not employed, an adversary can exploit this inherent memorization characteristic of predictive AI models to extract rare or unique sensitive information within that training data simply by making inferences on model predictions. There are several types of privacy attacks that adversaries can perform to infer the contents of the training data or of the parameters of the model itself. Two main types of privacy attacks — a model inversion attack and a membership inference attack — are discussed below.
Model Inversion Attacks on Predictive AI Models
In a model inversion attack, an adversary aims to expose the unknown sensitive features of a target training example using that example's known non-sensitive features and the output of the predictive AI model. To illustrate, in a real-world model inversion attack, data scientists built a predictive AI model trained to predict the correct dosage of an anticoagulant to prescribe to a patient. The model was built to receive certain genetic biomarkers and other demographic information about patients as input. An adversary who had access to some of the demographic information about the patients in the training data used a model inversion attack to infer those patients' sensitive genetic biomarkers, even though the adversary never had access to the training data itself.
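As a rough illustration of the intuition behind such an attack (not a reconstruction of the published one), the sketch below assumes the adversary knows a patient's demographic features and the model's output class, and can query the model freely. It enumerates candidate values for a single unknown "genetic marker" feature and keeps the value the model is most confident about. The toy model, feature names, and data are all assumptions.

```python
# Sketch of the model-inversion intuition: search over candidate values of an
# unknown sensitive feature and keep the one that best explains the model's
# known output. The data, features, and model here are entirely synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy training data: [age, weight, genetic_marker] -> dosage class (0 or 1).
genetic_marker = rng.integers(0, 3, size=500)            # sensitive feature: 0, 1, or 2
age = rng.integers(20, 80, size=500)
weight = rng.normal(75, 12, size=500)
dosage_class = (genetic_marker + (age > 60)).clip(0, 1)  # contrived label rule

X = np.column_stack([age, weight, genetic_marker])
model = LogisticRegression(max_iter=1000).fit(X, dosage_class)

def invert_genetic_marker(model, known_age, known_weight, known_label):
    """Return the candidate marker value the model considers most plausible."""
    best_value, best_confidence = None, -1.0
    for candidate in (0, 1, 2):                 # enumerate the unknown sensitive feature
        query = np.array([[known_age, known_weight, candidate]])
        confidence = model.predict_proba(query)[0, known_label]
        if confidence > best_confidence:
            best_value, best_confidence = candidate, confidence
    return best_value

# The adversary knows only this patient's demographics and predicted dosage class.
print(invert_genetic_marker(model, known_age=70, known_weight=80.0, known_label=1))
```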
Membership Inference Attack on Predictive AI Models
An adversary can perform a membership inference attack to infer whether or not a given user record was included in the training data of a predictive AI model. This is a black-box privacy attack: the adversary needs only the ability to query the model and observe its outputs, not access to the training data or to the model's internal parameters. To illustrate, suppose electronic health records are used to train a predictive AI model that predicts the optimal time to discharge patients from a hospital. If an adversary can query the trained model with arbitrary patient features and receive its output (e.g., through an API), the adversary could launch a membership inference attack. While a membership inference attack does not reveal the information contained in a given training example, it does reveal the existence of that example in the training data. In some cases, the mere existence of a user record within training data is itself sensitive, for instance, when the user is enrolled in a confidential genomic study.
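One simplified form of this attack is a confidence-threshold test: an overfitted model tends to be noticeably more confident on records it was trained on than on records it has never seen. The sketch below illustrates that idea with a deliberately overfit toy model and an arbitrary threshold; it is an illustration of the intuition, not a production-grade attack.

```python
# Sketch of a confidence-threshold membership inference test on a toy model.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=15, random_state=0)
X_member, X_nonmember, y_member, _ = train_test_split(
    X, y, test_size=0.5, random_state=0)

# Unpruned trees memorize their training set, which is what the attack exploits.
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_member, y_member)

def guess_membership(model, record, threshold=0.95):
    """Guess 'member' when the model is unusually confident about the record."""
    confidence = model.predict_proba(record.reshape(1, -1)).max()
    return confidence >= threshold

member_flags = [guess_membership(model, r) for r in X_member]
nonmember_flags = [guess_membership(model, r) for r in X_nonmember]
print("flagged as members among true members:    ", np.mean(member_flags))
print("flagged as members among true non-members:", np.mean(nonmember_flags))
```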
The ICO’s Recommendations for Protecting the Privacy of Trade Secrets Contained in Training Data
The ICO recommends assessing the privacy risks involved in providing a predictive AI model to others outside the enterprise. Its recommendations include the following:
- The ICO recommends safeguarding against privacy attacks, such as model inversion attacks and membership inference attacks, by avoiding building a predictive AI model that overfits its training data. A model that overfits has learned the noise in the training data, which means it has unintentionally memorized particular training examples rather than the generalizable patterns within the data. If that training data includes trade secrets or other sensitive information, the memorized information is at risk of being unintentionally revealed as an output of the model. (A simple overfitting diagnostic is sketched after this list.)
- Some predictive AI models are trained to output a confidence score along with the model's prediction. The confidence score represents the model's own estimate of how likely its prediction is to be accurate. Confidence scores, however, can be exploited by adversaries in a privacy attack: the score indicates the extent to which the model has seen the input before. If the input describes a target user whose membership in the training data is being inferred, the confidence score can support an inference about whether information on that user was included in the training data. In light of this, the ICO recommends balancing the need for end-users to know the confidence of a model's prediction against the vulnerability created by exposing the confidence score (the second sketch after this list shows one way to coarsen the scores an API returns).
- If the predictive AI model is accessible to anyone, for example through a public API, the model may be vulnerable to black-box privacy attacks, in which an adversary queries the model and receives predictions through that API. An adversary can transmit a large number of queries, collect the outputs, and evaluate the relationship between the inputs and outputs to infer characteristics of the training data or of the model itself. Monitoring the queries transmitted to the model can therefore help identify an AI privacy attack in progress (the second sketch after this list also illustrates simple per-caller query monitoring).
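On the overfitting recommendation, one simple diagnostic is to compare a model's accuracy on its training data with its accuracy on held-out data; a large gap suggests the model has memorized training examples. The sketch below shows this check, and the effect of regularizing the model, on a synthetic dataset. It is a generic illustration, not a procedure prescribed by the ICO.

```python
# Sketch: use the train/held-out accuracy gap as a rough overfitting signal,
# and regularization (shallower trees) to shrink it. Data and settings are toy.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, n_informative=5,
                           flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for max_depth in (None, 4):   # None = unconstrained trees, 4 = regularized
    model = RandomForestClassifier(max_depth=max_depth, random_state=0)
    model.fit(X_train, y_train)
    gap = model.score(X_train, y_train) - model.score(X_test, y_test)
    print(f"max_depth={max_depth}: train/test accuracy gap = {gap:.3f}")
```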
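For the confidence-score and query-monitoring recommendations, one illustrative hardening pattern is to coarsen (or withhold) confidence scores before returning them and to count queries per caller so that unusually heavy probing can be flagged. The endpoint wrapper below is a hypothetical sketch; the function name, threshold, and response format are assumptions, not part of any particular API.

```python
# Sketch: a prediction-serving wrapper that coarsens confidence scores and
# flags callers who send an unusually large number of queries.
from collections import Counter

QUERY_COUNTS = Counter()
QUERY_ALERT_THRESHOLD = 10_000   # arbitrary per-caller limit for illustration

def serve_prediction(model, features, caller_id):
    QUERY_COUNTS[caller_id] += 1
    if QUERY_COUNTS[caller_id] > QUERY_ALERT_THRESHOLD:
        # In practice: alert, throttle, or require review of this caller.
        print(f"warning: caller {caller_id} exceeded {QUERY_ALERT_THRESHOLD} queries")

    probabilities = model.predict_proba([features])[0]
    label = int(probabilities.argmax())

    # Return only the label plus a coarse confidence bucket, not raw scores.
    confidence_bucket = "high" if probabilities.max() >= 0.9 else "low"
    return {"label": label, "confidence": confidence_bucket}

# Hypothetical usage with any scikit-learn-style classifier:
# response = serve_prediction(trained_model, features=[0.2, 1.5, 3.1], caller_id="key-123")
```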
Consider the ICO’s Recommendations When Making Your AI Model Externally Available
Protecting the privacy of intellectual property in training data, or in the parameters (e.g., feature weights) of a model, is important for several reasons. Data protection regulations require protecting this data from unauthorized access. Further, collecting training data and generating a model's parameters can be a significant investment, and that investment is worth protecting. Predictive AI models add complexity to the goal of protecting the privacy of training data. The ICO's recommendations offer a safer path to opening a predictive AI model to the public (e.g., through a public API) while protecting the model's training data from unintentional loss to adversaries.