Click to learn more about author Roland Bullivant.
One of the key tasks early in a GDPR compliance programme is to locate where Personal Data is stored across an organisation’s information ecosystem. Often this is referred to as an Information Audit or Readiness Assessment.
To comply with GDPR, an organisation must be able to point accurately to where Personal Data is held. This information is critical to meeting GDPR requirements such as the upgraded Rights of Data Subjects in terms of Subject Access Requests, Rectification, Deletion and more.
This article explores the challenges posed by the most popular ERP and CRM packages in terms of discovering the location of Personal Data. It looks at the advantages and disadvantages of traditional approaches organisations might employ to find Personal Data in those systems, and explores whether there are software driven approaches which can help to solve this problem.
The size and complexity of the Information Audit/Readiness Assessment phase for GDPR will vary significantly. Smaller organisations are likely to have fewer systems and less complex business processes, so it should be simpler.
Larger organisations will by their very nature tend to rely on multiple computer systems to manage their operations. They may have a variety of Cloud-based Software-as-a-Service, ERP and CRM packages as well as home grown developments. Multiple Information Management and Business Intelligence solutions may also be part of the IT estate. Then of course there are desktop applications such as Excel and email plus files containing unstructured information.
We recently conducted research into five of the largest and most widely used application packages to understand the scale of the challenge encountered by their customers when locating Personal Data for GDPR compliance.
The research reveals that the task facing organisations with large and complex ERP and CRM systems is significant. SAP contains over 900,000 fields. There are about 140,000 in JD Edwards and 100,000 in Microsoft Dynamics AX 2012. These attributes may (or may not) contain personal information that requires discovery and assessment. The size and complexity of the databases that underpin these applications mean that businesses that are not well advanced in discovering Personal Data in these systems, or are doing so manually, may not be ready in time for GDPR.
So how do you go about finding and recording Personal Data in enterprise CRM and ERP packages?
Many organisations are addressing the challenge by creating a repository of Personal Data. Some are implementing Data Catalogue or Glossary solutions as a component of their overall Governance or Compliance programme. Others are relying on less sophisticated approaches such as spreadsheets. These provide a central location for storing the information which identifies Personal Data for GDPR and will include elements of the source application metadata, which in turn identify areas to be addressed further.
Organisations have never before been required to categorise data in this way, so how could they go about it now?
Looking for Documentation
Searching through documentation may seem a natural first port of call when trying to understand and locate Personal Data attributes. However, even assuming it exists and is up to date with any changes made to the data model during implementation, there are still challenges.
If the information does exist in this static form, navigating it to find the relevant tables and attributes from amongst thousands will be a significant challenge. In addition, sharing any useful data will require it to be rekeyed or copied and pasted into the Data Catalogue or similar repository.
Consider the SAP data model example above. Searching through documentation for 900,000 attributes for individual pieces of Personal Data would be impractical at best.
Manual Investigation
This typically involves someone being tasked with scouring the relational database (RDBMS) system catalogue for any information which might provide clues as to where Personal Data attributes exist. This is a perfectly acceptable approach for small database systems, where a package is limited in scope or was developed in house. It is even possible to reverse engineer their schema using modelling or other tools.
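For a small database, a catalogue scan of this kind can be sketched in a few lines. The example below is illustrative only: it uses an in-memory SQLite database, and the table names, column names and keyword list are all assumptions invented for the sketch rather than taken from any real package.

```python
import re
import sqlite3

# Hypothetical keyword list; a real audit would agree this with the data protection team.
PERSONAL_DATA_HINTS = re.compile(r"name|birth|email|phone|address", re.IGNORECASE)


def scan_catalogue(conn):
    """Return (table, column) pairs whose column name hints at Personal Data."""
    hits = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info returns one row per column; index 1 is the column name.
        for column in conn.execute(f'PRAGMA table_info("{table}")'):
            if PERSONAL_DATA_HINTS.search(column[1]):
                hits.append((table, column[1]))
    return hits


# Invented schema standing in for a small in-house system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, full_name TEXT, date_of_birth TEXT)")
conn.execute("CREATE TABLE invoice (id INTEGER, customer_id INTEGER, total REAL)")

catalogue_hits = scan_catalogue(conn)
print(catalogue_hits)
```

The scan only works because the column names here are meaningful. It finds nothing useful when physical names are cryptic codes.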
However, in the case of large packaged ERP and CRM systems the lack of meaningful metadata in the database system catalogue means that this approach will take a long time and require the skills of highly technical staff.
Another approach might be to try to discern useful information about the data itself through profiling, although this is difficult to do with any degree of confidence if the content of the tables to be profiled is not known.
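As a rough illustration of what profiling means here, the sketch below tests the values in a column against a couple of simple patterns and reports which patterns most of the values match. The patterns, the 80% threshold and the column name `zz_fld_07` are assumptions made for the example; real profiling tools use far richer rules.

```python
import re

# Illustrative patterns only; real profiling needs many more, plus locale handling.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "date": re.compile(r"^\d{4}-\d{2}-\d{2}$"),
}


def profile_column(values, threshold=0.8):
    """Return the pattern names matched by at least `threshold` of the non-empty values."""
    sample = [v for v in values if v]
    if not sample:
        return []
    matched = []
    for label, pattern in PATTERNS.items():
        share = sum(1 for v in sample if pattern.match(v)) / len(sample)
        if share >= threshold:
            matched.append(label)
    return matched


# A cryptically named column, of the kind often found in packaged applications.
zz_fld_07 = ["anna@example.com", "", "raj@example.org", "li@example.net"]
print(profile_column(zz_fld_07))
```

A match suggests what a column contains but does not confirm its business meaning, which is why profiling alone gives limited confidence.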
Asking Application or Technical Specialists
The most common method for identifying the metadata needed for programmes such as GDPR is probably to ask technical specialists.
They are likely to have the most familiarity with the application and its data model. They are also likely to have access to any technical tools supplied by the vendor which can be used to locate the information required.
These experts may have an intimate knowledge of the application under scrutiny and may be familiar with customisations to the data model. However, they are often busy and so delays may occur.
Once the information has been found there may be a challenge in transferring it to the Data Catalogue efficiently. This may involve the need to collate the source information, write specific software programmes to import data into the data catalogue, use copy and paste or even rekey information.
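One way to avoid rekeying is to write findings to a delimited file that the repository can import. The following is a minimal sketch, assuming the target accepts CSV; the column layout, field values and function name are hypothetical, since each Data Catalogue defines its own import format.

```python
import csv
import io

# Hypothetical column layout; a real Data Catalogue defines its own import format.
FIELDNAMES = ["source_system", "table_name", "field_name", "description", "category"]


def to_catalogue_csv(findings):
    """Serialise findings (a list of dicts) as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    writer.writerows(findings)
    return buffer.getvalue()


# Invented finding for illustration.
findings = [
    {
        "source_system": "ERP",
        "table_name": "T0002",
        "field_name": "GBD01",
        "description": "Date of birth",
        "category": "Personal Data",
    }
]

catalogue_csv = to_catalogue_csv(findings)
print(catalogue_csv)
```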
Hiring External Consultants
Another common approach to solving the problem of identifying the tables which contain relevant Personal Data is to engage external consultants or application experts from the package vendors.
In addition to the cost, there may be delays if the consultants have to familiarise themselves with the way the system has been implemented before they can start being productive. Once again, the issue of integrating this information into the Data Catalogue may also prove a challenge.
A further drawback is that using external consultants means that an organisation’s own data professionals may never get the opportunity to gain sufficient knowledge about this field. This can hamper future efforts to maintain the GDPR programme internally.
Internet Search
Using the internet to locate Personal Data attributes from popular application packages is also common.
This approach presents significant challenges as the results are unlikely to reflect the version of the system that has been implemented or its customisations. In addition, it would then be necessary to undertake a manual comparison between what has been found and what is in place.
Often the models found using this method are part of documentation, which means that they are static and there is no way to make use of that information in other software tools without rekeying it.
Best Guess and Hypothesis Testing
Finally, when faced with the problem of Personal Data discovery some organisations resort to using guesswork or hypothesis testing methods to try to find what they need. They rely on data observation, insight and on trying to find an appropriate start point from which to launch searches.
It can be a frustrating, time-consuming and potentially inaccurate process, especially when dealing with ERP and CRM systems containing thousands of data tables.
How About a Metadata Discovery Software-Driven Approach?
The quickest and most effective way to find Personal Data across an organisation’s IT ecosystem is to use software tools to do the heavy lifting in terms of uncovering the relevant metadata and sharing it with the Data Catalogue or other repository. This works well for many sources: small home grown databases, smaller applications, some Cloud-based systems, flat files etc.
Most tools designed to extract metadata, such as those from Data Catalogue, Data Governance or Data Modelling vendors, do a good job of connecting to these source systems. For example, most Data Catalogue solutions incorporate some form of scanner, crawler or other mechanism which connects to a source, identifies the metadata and imports it automatically.
Information and Data Management vendors with broader software portfolios generally provide much of the same type of connectivity and functionality for the type of sources outlined above.
However, for organisations running enterprise CRM or ERP applications from SAP, Oracle, Salesforce, Microsoft and others, using such software for this is unlikely to be effective unless it has a way to access their rich metadata.
Their application metadata is usually in the form of large, complex and customised data models which are hidden from view. Metadata extraction requires specialist technology to retrieve it and make it usable. This is because in most instances any useful metadata such as business names for tables and attributes or relationships between tables is held in a series of data dictionary tables not in the database system catalogue.
In addition, once all the metadata is available, it is necessary to be able to locate Personal Data attributes before bringing them into whatever repository is being used. As an example, imagine the time it would take to find all instances of a Personal Data attribute such as “birth” in an SAP system with 90,000 tables and 900,000 attributes without some form of specialist solution.
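The idea of searching business descriptions rather than physical column names can be sketched as follows. This is a simplified illustration, not a representation of any vendor's actual data dictionary: the `app_dictionary` table, its columns and every table and field code in it are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical dictionary table: physical names are cryptic, descriptions carry the meaning.
conn.execute(
    "CREATE TABLE app_dictionary (table_name TEXT, field_name TEXT, description TEXT)"
)
conn.executemany(
    "INSERT INTO app_dictionary VALUES (?, ?, ?)",
    [
        ("T0002", "GBD01", "Date of birth"),
        ("T0002", "GBO01", "Place of Birth"),
        ("T0450", "NTW01", "Net weight"),
    ],
)


def find_fields(conn, term):
    """Search business descriptions, not physical column names, for a term."""
    return conn.execute(
        "SELECT table_name, field_name, description FROM app_dictionary "
        "WHERE LOWER(description) LIKE ? ORDER BY table_name, field_name",
        (f"%{term.lower()}%",),
    ).fetchall()


birth_fields = find_fields(conn, "birth")
for row in birth_fields:
    print(row)
```

A keyword scan of the database system catalogue would miss both fields, because neither physical name contains the word "birth".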
One might imagine that vendors of the packaged application would have usable solutions to this. However, they do not have tools designed to make it quick and easy for data analysts and architects to access, understand and exploit relevant metadata in the context of a GDPR programme.
Other software vendors who need to provide functionality for discovering and recording Personal Data take a variety of approaches. These may include offering templates for common business topics in specific packages as a jumping-off point for discovery. This helps their customers to make a start, but there is still work to do to compare the content of the templates with the system as implemented.
Others provide connectors, which do just that: they connect. However, they provide little in the way of assistance in navigating a package’s data model.
A small number of vendors, like ourselves, do provide software with specialist capabilities to discover, extract and exploit the rich metadata in these ERP and CRM packages as implemented.
By doing this they provide the user with the ability to locate specific Personal Data attributes quickly and accurately. Results can then be consumed by Data Catalogue, Modelling or even spreadsheet repositories.
The challenge posed by GDPR with respect to Personal Data discovery is new. For many source systems existing technology solutions can be made to fit the requirement.
It is possible to locate Personal Data in the largest and most complex ERP and CRM packages. It is, however, a challenge that requires a different approach, using specialist software, if it is to be met quickly and effectively. Sticking to the traditional time-consuming, expensive and manually intensive methods will not deliver an effective solution.