Click to learn more about author Dimitri Sirota.
Effectively responding to Data Subject Access Requests (DSARs) is perhaps the most challenging aspect of complying with emerging privacy regulations such as GDPR and the impending California Consumer Privacy Act (CCPA). These requests are rooted in the concept that a person’s data belongs to them and not to the organization retaining it, and responding to them within the time parameters set by the legislation requires newfound agility for enterprises.
It is no longer enough to fulfill these requests in an ad hoc manner, in which a request triggers a series of emails to different members of the organization to unearth what data they possess related to a particular person. This method is not only inefficient; it often replicates personal information in a way that compounds the problem. And with the potential to have to fulfill hundreds if not thousands of requests simultaneously, such a process cannot scale. As a result, businesses could suffer brand damage, incur fines, and face civil action.
Companies collect personal data in many places across the data center and cloud. Data is collected in every type of data store and application. And while organizations have developed technology for detailing their data stores and applications, they have nothing similar for the data residing in them. Complicating matters further is the fact that with DSARs, it is insufficient to simply find personally identifiable information (PII) residing in the data stores. Complying with the intent of the legislation necessitates enhanced accountability to subjects by providing them with all of the personal data relating to them.
This means being able to look across all data stores and applications and identify what data is personal, to whom it belongs, and whether consent exists. Finally, there is the matter of who decides whether a company has satisfactorily fulfilled a DSAR. In the case of GDPR, when it comes to personal data rights, that responsibility doesn’t belong to a regulator. Instead, compliance is determined by the individual data subject, usually a consumer or employee. This means that there are effectively 500 million “regulators” in Europe.
Getting Personal: You Need More Than Classification
DSARs are a means of protecting personal data rights under GDPR and other global data privacy regulations. To fulfill these requests, organizations must know exactly whose data they hold, where, and in what context. As mundane as this understanding of organizational assets sounds, in practice it is extremely difficult to achieve. This is because legacy technology cannot identify personal information (PI), including contextual personal information, cannot automatically determine to whom it belongs, and cannot look everywhere an organization keeps personal data.
Until recently, manually finding a person’s data when asked was an acceptable practice. With loose regulations and infrequent requests, an inaccurate, time-consuming process was of little organizational concern. A request would be routed to a person who would, in turn, ask various application owners and data store owners to report back on what each system contained. This labor-intensive, imprecise process relied heavily on search, failed to provide contextual personal information, and could not scale.
But now, in the post-GDPR era, organizations are starting to turn to technology that automates the DSAR intake and fulfillment process. Prior to the advent of purpose-built tools, the default technology came in the form of data classification-based security or eDiscovery tools. These products were designed to find keywords or PII in files, email, and databases by relying on pattern matching with regular expressions. For use cases in PCI or HIPAA, where exact search criteria were available and only a limited volume of data had to be scanned, they worked, albeit slowly. However, they prove inadequate for privacy use cases like DSARs because they are not identity aware: they are unable to look everywhere, can’t identify general PI, and, most critically, have no way of correlating any PI accurately back to an individual.
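The limitation of pattern-based classification described above can be illustrated with a minimal sketch. The patterns, the `classify` function, and the sample record are hypothetical and are not taken from any particular product; they only show the general shape of a regular-expression scan.

```python
import re

# Hypothetical patterns of the kind a classification tool might ship with.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "card_number": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}


def classify(text):
    """Return (label, matched_text) pairs for every pattern hit in text."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.group()))
    return hits


# Made-up sample record for illustration only.
record = (
    "Order 1182 shipped to jane@example.com, "
    "paid with 4111 1111 1111 1111, born 1984-03-07"
)
hits = classify(record)
for label, value in hits:
    print(label, value)
```

In this sketch the scan reports an email address and a card number, but it misses the birthday because no pattern was defined for it, and nothing in the output says which person the values belong to. That gap is the missing identity awareness the article refers to.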
DSAR Automation at Scale: How To
Automating DSARs at scale, which current and pending privacy regulation makes a necessity, will require two types of innovation not afforded by older data classification tools. First, organizations will need to find and inventory what data they have on an individual. Second, once this is achieved, they must be able to operationalize request and fulfillment activity to accommodate differing request portals, configurable response types, analyst workflows, consent integration, and batch processing for higher-volume scenarios.
Discovering and inventorying the data an organization has collected on every individual presents legitimate challenges. There is the hurdle of defining what is personal, there is the need to look across unstructured data, structured data, Big Data, cloud, and more, and there is the obvious requirement to be able to sort data by person. With legacy classification capable only of finding pre-defined data sets, finding all personal information is impossible. Finding PI, as opposed to only PII, requires the ability to determine whether data is by or about someone, since it is the context around a person that makes data personal. A new method of interrogating data stores that avoids classification-style dependencies is required to look across the entirety of data, from cloud to application.
This new method must map or visualize data without copying or duplicating it, as PI is extremely sensitive. Centralizing data and creating a jackpot of sensitive information is a great way to attract attackers, who are growing more sophisticated by the minute. Lastly, the technology must be able to automatically correlate data back to a person. While that is fairly easy for uniquely identifiable data like a credit card number, it is very hard for semi-identifiable data such as a birthday, GPS coordinate, IP address, cookie, or shopping preference. Moreover, the technology will need to both resolve identities, to ensure any requestor gets all of their data accurately, and disambiguate similar identities, to ensure persons with the same name don’t get confused with one another.
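To make the correlation and disambiguation requirements above more concrete, here is a minimal sketch under stated assumptions. The identity records, the attribute names, the `correlate` and `build_index` functions, and the two-attribute threshold are all hypothetical; real identity resolution would be considerably more involved. The index keeps only pointers to where data lives, not copies of the values.

```python
from collections import defaultdict

# Hypothetical known identities; note that two people share a name.
IDENTITIES = {
    "p1": {"name": "Alex Kim", "email": "alex.kim@example.com",
           "birthday": "1990-05-01", "ip": "203.0.113.7"},
    "p2": {"name": "Alex Kim", "email": "akim@example.org",
           "birthday": "1985-11-23", "ip": "198.51.100.4"},
}
UNIQUE = {"email"}                 # assumption: one match is enough
SEMI = {"name", "birthday", "ip"}  # assumption: needs several agreeing values


def correlate(finding, min_semi_matches=2):
    """Return the person id a finding belongs to, or None if unclear."""
    attrs = finding["attrs"]
    # A uniquely identifying attribute settles ownership immediately.
    for pid, person in IDENTITIES.items():
        if any(k in attrs and attrs[k] == person[k] for k in UNIQUE):
            return pid
    # Otherwise count agreeing semi-identifiable attributes per person.
    scores = {
        pid: sum(1 for k in SEMI if k in attrs and attrs[k] == person[k])
        for pid, person in IDENTITIES.items()
    }
    best = max(scores.values())
    winners = [pid for pid, score in scores.items() if score == best]
    # Assign only when one person clearly wins; ties stay unassigned.
    if best >= min_semi_matches and len(winners) == 1:
        return winners[0]
    return None


def build_index(findings):
    """Map person id -> list of locations; values are never copied."""
    index = defaultdict(list)
    for finding in findings:
        index[correlate(finding)].append(finding["location"])
    return dict(index)


# Made-up findings, each pointing at a record in some data store.
FINDINGS = [
    {"location": "crm.contacts/17",
     "attrs": {"email": "alex.kim@example.com"}},
    {"location": "weblogs/2019-03-02#88",
     "attrs": {"name": "Alex Kim", "ip": "198.51.100.4"}},
    {"location": "support.tickets/402",
     "attrs": {"name": "Alex Kim"}},
]
index = build_index(FINDINGS)
print(index)
```

In this sketch the email match resolves the first finding directly, the name plus IP address together single out the second person, and the finding with only a shared name is left unassigned under the `None` key rather than being guessed, which is the disambiguation behavior the paragraph calls for.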
Getting Privacy Right
Beginning with Europe and moving on to Brazil, India, and the U.S., new privacy regulations are emerging on a seemingly weekly basis. Although each has its own intricacies, they all share a common foundation: companies need to know what data they have on individuals and whether they are using that data in legitimate, approved ways. For individuals, these new privacy regulations will most strongly manifest themselves as a set of new personal data rights, such as the rights to access, port, correct, or delete one’s data. For organizations subject to the new mandates, personal data rights will represent the greatest challenge, since they require organizations to know their own data to a degree of detail never before needed.
To succeed in this new age of regulation, organizations must employ technology and tactics that allow them to find personal data by identity across the petabyte-scale volumes of information they keep in their data stores and applications. But finding and understanding data in a privacy-centric way is not sufficient to meet the emerging requirements of modern DSARs. Companies will also need the operational capabilities to report on that data to an individual or act on it at the individual’s request. You can’t protect what you can’t find. You can’t accommodate what you can’t process. These are the realities of solving today’s essential privacy problem: DSARs.