Data Anonymization: Use Personal Data AND Respect Privacy

By Stephan Kessler

Today’s information technology allows many organizations to generate, integrate, store, and analyze data of unprecedented size and complexity. In many cases, this is personal data, and its usage is restricted by privacy laws and regulations. In 2018, the European Union’s General Data Protection Regulation (GDPR) took effect; it permits the processing of personal data only on a lawful basis, such as the consent of the person affected or a contract with that person. Similar laws are in place or being passed in other countries, such as the US, where enforcement of the California Consumer Privacy Act (CCPA) began on July 1, 2020. Besides the legal and regulatory implications, organizations have to demonstrate careful treatment of personal data at a time when privacy and security are increasingly important. With this in mind, organizations should stay current on best practices and technologies for data anonymization.

Analyzing collected personal data via analytics or machine learning can create tremendous value and act as a competitive differentiator for organizations, but proper data anonymization is not a simple task. Even after sensitive information has been identified and directly identifying attributes such as social security numbers or addresses have been removed, a data set is still not anonymous. A record without identifiers can be relinked to an individual with the help of external knowledge: the combination of zip code, age, and sex, for instance, is unique for a large number of people. So, how can organizations ensure that personal data can be used without compromising the privacy of individuals?
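The relinking risk described above can be checked on a data set before it is shared. The following is a minimal sketch, not a complete risk assessment: the records, the field names, and the `unique_records` helper are invented for illustration. It counts how many records remain unique on zip code, age, and sex after the direct identifiers have been removed.

```python
from collections import Counter

# Hypothetical records with direct identifiers (name, SSN, address) already removed.
records = [
    {"zip": "10115", "age": 34, "sex": "F", "diagnosis": "asthma"},
    {"zip": "10115", "age": 34, "sex": "F", "diagnosis": "flu"},
    {"zip": "10117", "age": 52, "sex": "M", "diagnosis": "diabetes"},
    {"zip": "10119", "age": 29, "sex": "F", "diagnosis": "flu"},
]

# Attributes an attacker could plausibly know from an external source.
QUASI_IDENTIFIERS = ("zip", "age", "sex")


def unique_records(rows, quasi_identifiers):
    """Return the rows whose quasi-identifier combination occurs exactly once."""
    def key(row):
        return tuple(row[q] for q in quasi_identifiers)

    counts = Counter(key(row) for row in rows)
    return [row for row in rows if counts[key(row)] == 1]


at_risk = unique_records(records, QUASI_IDENTIFIERS)
print(f"{len(at_risk)} of {len(records)} records are unique on {QUASI_IDENTIFIERS}")
```

Any record reported as unique could be matched one-to-one against an outside source that contains the same three attributes along with a name.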

First of all, let us define some technicalities of data anonymization. The term “anonymization method” refers to technical measures that modify personal data in such a way that certain privacy guarantees can be kept:

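K-anonymity ensures that each person is indistinguishable from at least k-1 others with respect to the quasi-identifying attributes, such as zip code, age, and sex. An individual can then no longer be singled out through the kind of linkage described above.
Differential privacy, in turn, is known from statistical databases and typically adds calibrated random noise to query results, so that the presence or absence of any single individual has only a small, bounded effect on the output.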
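To make the two guarantees more concrete, here is a small sketch using only the Python standard library. It is an illustration under stated assumptions rather than a production implementation: the sample rows, the generalization rule (3-digit zip prefix, 10-year age band), and all function names are hypothetical. The first part checks k-anonymity after generalizing the quasi-identifiers; the second part answers a counting query with Laplace noise, a common way to achieve differential privacy for counts.

```python
import random
from collections import Counter

# Hypothetical patient rows without direct identifiers.
rows = [
    {"zip": "10115", "age": 34, "sex": "F", "diagnosis": "flu"},
    {"zip": "10117", "age": 38, "sex": "F", "diagnosis": "asthma"},
    {"zip": "10119", "age": 31, "sex": "F", "diagnosis": "flu"},
    {"zip": "10243", "age": 52, "sex": "M", "diagnosis": "diabetes"},
    {"zip": "10245", "age": 57, "sex": "M", "diagnosis": "flu"},
]


def raw_key(row):
    """Quasi-identifiers exactly as stored."""
    return (row["zip"], row["age"], row["sex"])


def generalize(row):
    """Coarsen quasi-identifiers: 3-digit zip prefix and a 10-year age band."""
    low = (row["age"] // 10) * 10
    return (row["zip"][:3] + "**", f"{low}-{low + 9}", row["sex"])


def smallest_group(rows, key):
    """Size of the smallest group of rows that share the same key."""
    return min(Counter(key(r) for r in rows).values())


def is_k_anonymous(rows, k, key=generalize):
    """True if every row shares its key with at least k-1 other rows."""
    return smallest_group(rows, key) >= k


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(rows, predicate, epsilon, rng):
    """Counting query with Laplace noise.

    Adding or removing one person changes a count by at most 1
    (sensitivity 1), so the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for r in rows if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(42)  # fixed seed only to make the demo repeatable
print("smallest group, raw:", smallest_group(rows, raw_key))
print("smallest group, generalized:", smallest_group(rows, generalize))
print("2-anonymous after generalization:", is_k_anonymous(rows, k=2))
print("noisy flu count:", noisy_count(rows, lambda r: r["diagnosis"] == "flu", 0.5, rng))
```

A smaller epsilon means more noise and stronger privacy, at the cost of less accurate answers. Likewise, a larger k requires coarser generalization and therefore loses more detail.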

Anonymization in an Enterprise Stack

Now let us take a closer look at how these technical measures are applied in an enterprise stack. In most cases, the organization’s IT stack is divided into three planes (infrastructure, data, and application), located on premises, in the cloud, or across both in a hybrid architecture.

Applications typically collect personal data and store it in the Data Management layer for further processing. The Data Management layer can then exchange the data between different applications. Because anonymization applied here is independent of any single application, and every application that draws on the data benefits from it, this layer is the ideal place to apply anonymization. The main usage scenario for anonymization is to create data sets that can be analyzed without identifying individual persons. The second usage scenario is to protect confidential company data when a company contributes information to a shared data set but does not want anyone to learn its actual figures. In this case, the company itself takes the role of the individual and must be protected.

Anonymization Use Cases

A timely example is today’s hospitals, which want to help combat COVID-19 by leveraging insights from previous patients. The challenge is that patient data is highly sensitive and confidential and requires special protection under data privacy laws. Through anonymization, data about COVID-19 patients can provide useful insights for clinicians seeking information on how to treat a newly diagnosed patient, without compromising the privacy of previous patients.

Another everyday example is centered on optimizing business travel bookings and expenses. Through data anonymization, a travel agent can work with secure and confidential data sets to optimize booking processes and reduce associated fees while responding to traveler requests and providing high customer satisfaction.

Personas and the Anonymization Process

So how exactly does this work? The first step is to ensure proper data integration in the Data Management layer. When personal data is involved, this requires the use of “personas” such as:

1. The Data Consumer who requires access to the data
2. The Data Controller who oversees the management and security of all personal data
3. The Data Protection and Privacy Officer who is responsible for ensuring that the organization treats personal data according to privacy laws and guidelines

Each of these personas has a distinct role. For example, the Data Controller is responsible for defining the anonymization parameters and obtains approval from the Data Protection and Privacy Officer, who also approves the Data Consumer. The Data Controller then creates an entity and grants the Data Consumer access to a privacy view, which is a different representation of the original data set. Additionally, the Data Protection and Privacy Officer can request information on the privacy view in order to audit the applied anonymization methods. In the end, the Data Consumer only has access rights to the anonymized representation of the original data.
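The interaction between the three personas can be sketched as a small model. This is a hypothetical illustration only: the class names, method names, and the approval policy (k of at least 5) are assumptions made for the example and do not correspond to any particular product’s API.

```python
from dataclasses import dataclass, field


@dataclass
class PrivacyView:
    """Hypothetical anonymized representation of an original data set."""
    source: str
    method: str                                      # e.g. "k-anonymity"
    parameters: dict
    approved_for: set = field(default_factory=set)   # consumers approved by the officer
    granted_to: set = field(default_factory=set)     # consumers granted access


class DataProtectionOfficer:
    def approve(self, view, consumer):
        """Approve the view's parameters and the consumer (assumed policy: k >= 5)."""
        ok = view.method == "k-anonymity" and view.parameters.get("k", 0) >= 5
        if ok:
            view.approved_for.add(consumer)
        return ok

    def audit(self, view):
        """Report which anonymization method and parameters the view applies."""
        return {
            "source": view.source,
            "method": view.method,
            "parameters": dict(view.parameters),
            "granted_to": sorted(view.granted_to),
        }


class DataController:
    def define_view(self, source, method, **parameters):
        """Define the anonymization parameters for a new privacy view."""
        return PrivacyView(source, method, parameters)

    def grant(self, view, consumer):
        """Grant access only after the officer has approved this consumer."""
        if consumer not in view.approved_for:
            raise PermissionError(f"{consumer} is not approved for this view")
        view.granted_to.add(consumer)


def can_read(consumer, target):
    """A consumer may read only privacy views it was granted, never the raw data."""
    return isinstance(target, PrivacyView) and consumer in target.granted_to


controller = DataController()
officer = DataProtectionOfficer()

view = controller.define_view("patients", "k-anonymity", k=5)
officer.approve(view, "analyst")
controller.grant(view, "analyst")

print(can_read("analyst", view))        # access to the privacy view
print(can_read("analyst", "patients"))  # no access to the original data set
print(officer.audit(view))
```

The design choice worth noting is that the grant step fails unless approval has already happened, so the Data Consumer never gains access through the Data Controller alone.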

Figure 1: Personas interacting with personal data.
Image Source: SAP

Summary

Companies need to keep two main points in mind to ensure a successful data anonymization strategy. First, simple measures like removing identifiers are not enough to anonymize a data set. Second, anonymization makes personal and sensitive data usable for analytics and machine learning in a form that no longer identifies individuals, whereas such processing has historically required an individual’s consent. As concerns about personal data usage continue to rise globally, countries and government bodies will further restrict access to, and processing of, personal and sensitive data. By implementing proper data anonymization techniques and technology, companies can find a middle ground in which they better serve consumers through personal data without compromising their identity.
