Unlocking value from data is a journey. It involves investing in data infrastructure, analysts, scientists, and processes for managing data consumption. Even when data operations teams progress along this journey, growing pains crop up as more users want more data. The problems can spike quickly or grow subtly over time. You don’t have to grin and bear it; some tools and approaches can fix this. However, your data team should recognize that the following “time sink” issues are real and have a strategy to deal with them.
1. Access requires automation to scale
When you successfully set up a data catalog or implement a process for users to find and request access to data, administering access becomes an enormous problem. In typical data architectures, granting access to sensitive data often consists of a long list of manual tasks. For instance, creating and updating user accounts for multiple services can be very time-consuming.
Put another way, no Data Governance plan survives contact with users. Once you build your data infrastructure on a legacy Data Governance model, you will spend all your time providing access. For example, one global company I talked to had successfully set up a data pipeline to move customer information from an on-premises system to its cloud data warehouse. It implemented tools for self-service access, yet demand was so high that the team spent the next three weeks solely focused on granting access to that one system.
The only way to scale access is to automate it:
- No-code approaches let you quickly enable or block access to a data set within a cloud data warehouse, associate that policy with specific users, and apply various masking techniques within minutes (a sketch of the kind of policy these tools generate follows this list).
- You can also see who your users are, what roles they have, and the sensitive data they’re accessing.
- You can then identify areas where you can apply access policy and make it easy to create an audit trail for governance. (I’ll come back to auditing later.)
- More mature organizations grappling with thousands of data users may already have a data catalog solution. Integrating a control and protection solution with the data catalog allows you to create the policies, manage them in the catalog, and automatically enforce them in connected databases.
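To make that concrete, here is a minimal sketch of the kind of policy a no-code tool would generate and apply on your behalf, using Snowflake-style dynamic masking and grant syntax; the customers table, email column, and role names are hypothetical.

```sql
-- Hypothetical example: mask email addresses for everyone except a privileged role.
CREATE MASKING POLICY email_mask AS (val STRING) RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() IN ('PII_ANALYST') THEN val
    ELSE '***MASKED***'
  END;

-- Attach the policy to the sensitive column, then grant broad read access safely.
ALTER TABLE customers MODIFY COLUMN email SET MASKING POLICY email_mask;
GRANT SELECT ON TABLE customers TO ROLE marketing_analyst;
```

The point is not to write statements like these by hand for every column (that is exactly the time sink) but to let an automated layer generate, apply, and audit them consistently across thousands of columns.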
2. Migration needs to mature
Once you set up your initial cloud data warehouse and schema for consumption, you’ll want to move more data sets into the cloud. However, manual data migration approaches can bog you down and limit your ability to unlock insight from multiple sources. Instead, you can gain efficiencies by maturing your migration process and tools:
- Eliminate manual discovery and migration tasks with an extract, transform, and load (ETL) SaaS platform. ETL platforms simplify connecting to multiple data sources, collect data from many sites, convert the source data into a tabular format that is easier to analyze, and move it into the cloud warehouse.
- Start using a transformation tool like dbt, which models and transforms data directly in the cloud data warehouse (see the sketch after this list).
- Follow a three-zone pattern for migration – raw, staging, and production.
- Maintain existing access and masking policies even as you add or move data or change the schema in the cloud data platform. For instance, every time an email column is copied or moved by an automated pipeline, you need to know about it and reapply the masking policy to the new location. You will also have to create an auditable record of every data movement for governance.
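As an illustration of the raw-to-staging step in that three-zone pattern, here is a minimal dbt-style model that keeps the transformation inside the warehouse; the raw source and customers table names are assumptions, not a prescription.

```sql
-- models/staging/stg_customers.sql  (hypothetical dbt model)
-- Moves customer data from the raw zone into the staging zone.
{{ config(materialized='view', schema='staging') }}

SELECT
    id           AS customer_id,
    LOWER(email) AS email,      -- sensitive column: masking policy must follow it
    created_at
FROM {{ source('raw', 'customers') }}
```

Each time a model like this creates a new object, the access and masking policies from the previous section need to follow the data to its new location.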
3. Governance auditing must simplify
Now that data is accessible to more people, you should establish a Data Governance model to ensure the continued privacy, compliance, and protection of all that data. You must be able to answer questions about what type of data is in a specific database and who has accessed it. Frequently, this requires the data team to dig through query or access logs and build their own access charts and graphs. When you have a big data footprint with lots of users touching it, you can't afford to waste time manually applying role-based access or creating reports. The only way to scale auditing is to simplify it, so that you can:
- Visualize and track access to sensitive data across your organization, with an alerting system that tells you who is accessing your data, from where, and how.
- Keep access and masking policies in lockstep with changing schema.
- Understand whether access patterns are normal or out of normal ranges.
- Create and automate thresholds that block access, or allow it with an alert, based on rules you can apply quickly.
- Automate classification and reporting to show granular relationships, such as how the same user role is accessing different data columns (a sketch of such a query follows this list).
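As a sketch of what that automated reporting might run under the hood, here is Snowflake-style SQL over a hypothetical access_log table; real tools would read the warehouse's own access history, and the table, column, and threshold values here are assumptions.

```sql
-- Hypothetical access_log: (event_time, user_name, user_role, table_name, column_name)
-- Which roles are reading which sensitive columns, and is any volume out of range?
SELECT
    user_role,
    column_name,
    COUNT(*) AS reads_last_7_days
FROM access_log
WHERE table_name = 'CUSTOMERS'
  AND event_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
GROUP BY user_role, column_name
HAVING COUNT(*) > 1000          -- alert threshold; tune per organization
ORDER BY reads_last_7_days DESC;
```

Queries like this one are what a control and protection layer automates into dashboards, alerts, and audit-ready reports.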
Should This Even Be Your Job?
Finally, these all add up to the most significant time sink of all: the fact that data engineers and DBAs handle data control and protection. Does that even make sense? Because you're the ones handling the data, moving it from place to place, and writing the SQL that most tools require to grant and limit access, the work has fallen to the data team.
But is that the best use of your time and talents? Wouldn't it make more sense for the teams whose jobs focus on Data Governance and security to actually manage Data Governance and security? With the right no-code control and protection solution, you could hand these tasks off to other teams – invite them to implement the policies, pick which data to mask, download audit trails, and set up alerts. Then get it all off your plate and move on to what you were trained to do: extract value from data. Now that's eliminating the ultimate time sink!