Data has been called the new oil. Now on a trajectory towards increased regulation, the data gushers of yore are being tamed. Dedicated agencies such as Britain’s recently approved Digital Markets Unit and the new California Privacy Protection Agency (“CalPPA”) will enforce compliance. Data will become trackable, accountable, and, if misused, a source of punishable offenses.
Just as the oil industry went from wildcatted fortunes to managed wells on EPA-approved plots of land transported through OSHA-certified tankers, so data will eventually be held and transported in ways that enable full traceability. That requires a big change in the data stack: from data lake to data ecosystem, and from Data Management to data services. The first step is understanding where data architectures are today and their limitations.
Wildcatted Data
Dazzled by data, organizations in the past decade built their own specialized stacks to store the valuable packets. In big companies, this meant that every department received its own stack, leading to a virtual city of stacks. The marketing department might use HubSpot, Marketo, Mailchimp, Salesforce, Google Analytics, and/or WordPress. HR, meanwhile, might use Kronos, Workday, Gusto, ADP, or Zenefits. Customer support’s data stack would be different, with Zendesk, SurveyMonkey, Hootsuite, and so on.
Data moves between innumerable team members, customers, contractors, and third-party vendors. Keeping every data packet managed and visible under these conditions is nearly impossible. Yet visibility, transparency, and integrity are exactly what compliance agencies demand.
What It Takes to Become Compliant
To meet government visibility and accountability requirements, data in all those different stacks, traveling through tens of thousands of pipes and databases, must be integrated into a single, fully visible system.
Compliance is a daunting task. New agencies are popping up around the world to regulate data and digital competition, and no two agencies have the exact same requirements. In addition to Britain’s Digital Markets Unit and CalPPA, each EU member state has its own Data Protection Authority. Japan and South Korea both have an agency called the Personal Information Protection Commission; Brazil has a National Data Protection Authority.
It helps to break down compliance into its most basic elements. What do you need to have in place to ensure that your data systems fit regulations at the most elementary level?
1. Data Integrity
This refers to the accuracy and consistency of data. It requires audit trails, regular archiving, and systems that automatically validate data accuracy and consistency. Data must be clean and interoperable.
The data stack, in other words, has to get smarter. It should automatically prevent someone from inputting the wrong address, credit card expiration date, or other data. It should also reconcile semantic mismatches (for example, “surname” in one system must match “last name” in another). The idea is that data stays current and accurate, so that if an agency demands an audit or other evidence, the entire data team doesn’t grind to a halt digging for the right information.
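As a loose illustration of what that smarter stack might do (the field names and rules below are hypothetical, not any particular vendor’s API), a validation layer could reject malformed input and map synonymous field names onto one canonical schema before a record is ever stored:

```python
import re
from datetime import datetime

# Hypothetical mapping that reconciles semantic mismatches across systems:
# whatever label a source system uses, the value lands under one canonical name.
CANONICAL_FIELDS = {"surname": "last_name", "last name": "last_name"}

def validate_expiration(expiry: str) -> bool:
    """Reject credit card expiration dates that are malformed or already past."""
    match = re.fullmatch(r"(0[1-9]|1[0-2])/(\d{2})", expiry)
    if not match:
        return False
    month, year = int(match.group(1)), 2000 + int(match.group(2))
    now = datetime.now()
    return (year, month) >= (now.year, now.month)

def normalize_record(raw: dict) -> dict:
    """Map source-specific field names onto the canonical schema."""
    return {CANONICAL_FIELDS.get(k.lower(), k.lower()): v for k, v in raw.items()}

record = normalize_record({"Surname": "Platz", "expiry": "12/39"})
if not validate_expiration(record["expiry"]):
    raise ValueError("invalid credit card expiration date")  # caught at input time, not audit time
```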
2. Data Provenance
Data provenance is a record of everyone and everything that influenced data from its point of origin. Where does each piece of data, including unstructured data, come from? Where has it gone? In a world where cameras surveil the public, cars contain computers, and the average enterprise uses 1,295 cloud vendors, these questions are not easily answerable.
To even begin to solve for data provenance, teams need tooling that can query data at any app or endpoint, run custom code, and deploy databases that keep a historical record. Hopefully, those systems will be enough to appease auditors.
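One rough sketch of such a historical record (an append-only, hash-chained log; the structure is illustrative, not a standard) would record every actor and action that touches a piece of data, in a form that makes retroactive tampering detectable:

```python
import hashlib
import json
import time

def provenance_entry(prev_hash: str, actor: str, action: str, data_id: str) -> dict:
    """Append-only provenance record: each entry hashes the previous one,
    so any retroactive edit to the history breaks the chain and is detectable."""
    entry = {
        "data_id": data_id,      # which piece of data was touched
        "actor": actor,          # who or what system touched it
        "action": action,        # e.g. "created", "exported", "shared with vendor"
        "timestamp": time.time(),
        "prev_hash": prev_hash,  # links this entry to the one before it
    }
    entry["hash"] = hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()
    return entry

# A data item's history, from point of origin through every downstream use.
log = [provenance_entry("genesis", "crm-import", "created", "customer/42")]
log.append(provenance_entry(log[-1]["hash"], "marketing-sync", "exported", "customer/42"))
```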
To further complicate the matter, regulations are sometimes written without implementation in mind. Adhering to GDPR, for example, may prevent you from complying with another jurisdiction’s rules. Organizations are being pushed into a situation where hiring an expert, such as a law firm, becomes the only way to even understand what they are up against. Those who can’t immediately afford such an expense may wait until absolutely necessary, but delay carries its own costs: hurried compliance work later on can pull IT systems offline and hurt productivity, and it can alienate increasingly privacy-conscious consumers. There is a way around this.
Master Data Services
Compliance will be easier to attain when marketing, HR, sales, DevOps, and all other departments are connected. This is the emerging practice of Master Data Management, which involves building a single platform or “golden record” for organization-wide data assets.
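As a toy sketch of the idea (the records, fields, and survivorship rule below are invented for illustration), building a golden record means consolidating each department’s view of the same entity into one authoritative row:

```python
# Hypothetical records for the same customer, drawn from three department stacks.
crm     = {"id": "cust-42", "surname": "Platz", "email": "b@example.com"}
billing = {"id": "cust-42", "last_name": "Platz", "email": "brian@example.com", "plan": "pro"}
support = {"id": "cust-42", "email": "brian@example.com", "open_tickets": 2}

# Map each system's field names onto one canonical schema.
FIELD_MAP = {"surname": "last_name"}

def golden_record(*sources: dict) -> dict:
    """Merge department views of one entity into a single golden record.
    Toy survivorship rule: later (more trusted) sources win on conflicts."""
    merged: dict = {}
    for record in sources:
        merged.update({FIELD_MAP.get(k, k): v for k, v in record.items()})
    return merged

print(golden_record(crm, billing, support))
# {'id': 'cust-42', 'last_name': 'Platz', 'email': 'brian@example.com',
#  'plan': 'pro', 'open_tickets': 2}
```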
In its current form, this golden record reads more like hieroglyphics: a static, read-only representation drawn from a data lake that is legible to data scientists and data compliance teams but few others. This approach comes up short in the face of tomorrow’s compliance questions. Imagine bringing up static, read-only records to track a criminal who used a driverless car, or to trace a piece of recruiting data an HR rep captured during a casual online browsing session. Placing stringent accountability on static, read-only records is like trying to pay your monthly mortgage with bags of coins: time-consuming, bulky, outmoded, and cost-prohibitive to ship.
From Data Lake to Data Ecosystem
In the early 1900s, oil became the energy source for nearly everything. Oil — alongside the electric light bulb and car — changed our cities, houses, transportation patterns, and many other things. We now live in an interconnected oil ecosystem, one that would be far less functional if oil were sourced, stored, shared, and managed differently by every city and town.
The evolution from data lakes to Master Data Management runs along similar lines. Master Data Management organically connects all users, supporting an ecosystem. Data integrity and Data Governance are baked in, leading to the privacy and accountability that agencies demand. Master Data Management spits out a “golden record” on demand, with little interpretation required.
Put technically, it shifts accountability from a read-only master data lake to a dynamic, bidirectional data fabric. It’s a new type of organic stack that enables advanced use of data, the same way oil infrastructure enabled an ecosystem of innovation and services (with similar ecosystems now growing around renewables).
As societies around the world continue to react to the consequences of unregulated data, which fueled tech monopolies and virulent disinformation, it is time to do more than react with fear to looming regulations. It is time to rethink Data Architecture to create the ecosystem of tomorrow.