To date, many organizations have focused on formalizing data consumption practices through distribution technology and access-based delivery mechanisms for analytics and AI functions. However, with data protection laws and growing awareness across the world, firms have extended this formalization to data collection management, which is, in fact, the first stage of the data lifecycle.
1. Formalizing Data Collection from Customers and Third Parties
Managing Data Quality at Sources: Across the multitude of native and digital channels, harmonizing Data Quality rules brings consistency in sourcing correct customer data. There will be an increase in AI-based discovery of data rules, making it much easier for data offices to fix bad data at scale. For example, validity rules for mobile numbers can be kept consistent across all channels that collect this data from customers and partners. This approach removes ambiguity in quality monitoring even when the data is siloed; a minimal sketch follows the list below.
- Measuring accuracy using AI: clean data is crucial for getting outcomes from machine learning capabilities, and scale and diversity of data matter as well
- Data Quality and Data Governance can maximize your AI outcomes
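As a minimal sketch, assuming a simple Python rule registry (the channel names and the E.164-style pattern are illustrative, not taken from any particular platform), a single shared validity rule can be applied to every sourcing channel so quality scores stay comparable:

```python
import re

# Hypothetical, centrally defined validity rule: one definition,
# reused by every channel that captures a mobile number.
MOBILE_RULE = {
    "name": "mobile_number_validity",
    "pattern": re.compile(r"^\+?[1-9]\d{7,14}$"),  # E.164-style length check
}

def is_valid_mobile(value: str) -> bool:
    """Apply the shared validity rule to a single value."""
    return bool(MOBILE_RULE["pattern"].match(value.strip()))

def profile_channel(channel: str, records: list) -> dict:
    """Measure rule conformance for one sourcing channel."""
    total = len(records)
    failures = [r for r in records if not is_valid_mobile(r.get("mobile", ""))]
    return {
        "channel": channel,
        "rule": MOBILE_RULE["name"],
        "records": total,
        "valid_pct": 100 * (total - len(failures)) / total if total else 100.0,
    }

# The same rule profiles every channel, so quality scores stay comparable
# even when the underlying data sits in different silos.
print(profile_channel("branch", [{"mobile": "+14155550123"}, {"mobile": "12345"}]))
print(profile_channel("mobile_app", [{"mobile": "+919876543210"}]))
```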
Classifying and Labelling Legitimate Data: An important aspect is classifying data curated from customers or third parties as private, zero-copy data. Data protection best practice suggests minimizing the data curated from customers to reduce the threat surface for data security risk.
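A minimal sketch of field-level classification and a minimization check, assuming an illustrative set of onboarding fields and labels (none of these names come from the article):

```python
from dataclasses import dataclass
from enum import Enum

class Classification(Enum):
    PUBLIC = "public"
    INTERNAL = "internal"
    PRIVATE = "private"    # personal data curated from customers or third parties

@dataclass(frozen=True)
class FieldLabel:
    name: str
    classification: Classification
    purpose: str           # why the field is collected; supports minimization reviews

# Illustrative label register for an onboarding form.
ONBOARDING_FIELDS = [
    FieldLabel("full_name", Classification.PRIVATE, "identity verification"),
    FieldLabel("mobile", Classification.PRIVATE, "transaction alerts"),
    FieldLabel("favourite_colour", Classification.INTERNAL, ""),  # no documented purpose
]

def minimization_review(fields):
    """Flag fields collected without a documented purpose: candidates to stop curating."""
    return [f.name for f in fields if not f.purpose]

print(minimization_review(ONBOARDING_FIELDS))  # ['favourite_colour']
```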
Extended Trust to Customers: As trust builds up with customers, they will be more forthcoming in providing additional zero-copy data to improve the services and products they receive. For example, if I am traveling to Europe, I will provide my travel dates and locations to my bank, which enables it to offer me a forex card, enable international transactions, and increase my card limits.
Ownership and Stewardship: Successful data ownership should traverse divisional silos. Data ownership is often not a full-time job for most data owners, while it can be a full-time profession for data stewards. We see an increased focus in organizations on enabling full-time stewards.
At the outset, the context of data should also be recorded by stewards and owners in a central namespace such as a catalog. Data Governance is a methodology that helps implement individual and shared ownership of data across the organization. Without its context, a data element can be of little relevance to the consumers who feed AI and analytical models. Deriving accountability for privacy will be a further enabler.
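A minimal, tool-agnostic sketch of what such a catalog record might hold; the element, owner, and steward names are purely illustrative:

```python
# A minimal, tool-agnostic catalog record: context, ownership, and stewardship
# live together in one central namespace.
catalog_entry = {
    "element": "customer.mobile_number",
    "definition": "Primary mobile number the customer has consented to be contacted on",
    "business_context": "Used for transaction alerts and step-up authentication",
    "owner": "Head of Retail Banking",   # accountable, usually not a full-time data role
    "steward": "Customer Data Steward",  # responsible day to day, often full-time
    "classification": "private",
    "privacy_accountability": "Consent captured at collection; retention seven years",
}

def fit_for_modelling(entry: dict) -> bool:
    """Without a definition and business context, an element is of little
    relevance to the consumers who feed AI and analytical models."""
    return bool(entry.get("definition")) and bool(entry.get("business_context"))

print(fit_for_modelling(catalog_entry))  # True
```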
2. Increased Data Awareness and Literacy
Knowledge of data-in-context, data processes, the best techniques to provision data, and the tools that enable these self-service methods is crucial to democratizing data. With technology advancements, including virtualization, self-service discovery catalogs, and data delivery mechanisms, internal data consumers can shop for and provision data in shorter cycles. In 2020, it took organizations anywhere between one and three weeks to provision complex data that requires integration from multiple sources.
An increase in data awareness will also help data consumers explore available dark data that can provide predictive insights, creating new user stories that propel customer journeys.
Measuring Benefits from Data Management and Aligning to Value Chains: A lack of focus is common across organizations because they treat Data Governance as an extension of either compliance or a risk function. Data Literacy will, in fact, change the attitude of business owners toward actively managing and governing data. There are immediate and cumulative benefits from actively governing data, whether by defining data or by fixing bad-quality data, but a value-realization framework is needed to actively manage the benefits of Data Management services.
Data Ethics: Blending data privacy and data sharing can bolster innovation in business ecosystems while unlocking the economic value of data. The first step for any startup or well-established organization is to build a controlled environment that can govern and manage data well. This will further cascade trust in the internal data hosted by functions like marketing and create a culture of sharing for “digital and customer-centricity.”
Data Governance is known to have a cascading positive impact on corporate governance; at the same time, people outside the organization start trusting it as a steward of their data. A well-matured organization can be called a “data trust,” where control of the data is held by its customers. Though organizations act as either controllers or processors of data, they can be seen as carrying an ethical, fiduciary duty to maintain the integrity of people’s data.
Data Protection: With governments increasingly focused on data protection and governance policy, awareness of these methodologies will assist in preparing for compliance with the laws. Governing data will also bolster digital transformation.
Consumers have already started embracing digital handshakes with marketers at a rate even faster than the previous year. U.S. consumers spent more than $66 billion online in July 2020, 55 percent more than a year earlier.
Use of AI in Data Protection: The development of the artificial intelligence and data protection domains is largely driven by economic and societal needs. While artificial intelligence improves customer services by wrangling trillions of pieces of big data and learning from it, data protection is poised to build the trust people need to share data with organizations. A recent Gartner survey showed that over 40 percent of privacy compliance technology would rely on AI by 2023.
3. Data Distribution Management
There is merit in driving events in customer journeys based on insights derived by a deep learning model that crunches real-time data. This requires data to be pipelined in real time, rather than in mini-batches or batches to a data lake or cloud warehouse, before artificial intelligence models run on it.
A simple question to ask yourself: do you want to process streams of data before an application's state changes, or are you okay with pipelining data into a lake or warehouse and deriving insights within a timeframe of, say, 15-30 minutes?
Breaking this down, Data Architecture and engineering are about having the right stack available, while satisfying security requirements, to move data internally through batch, real-time with low latency, and semi-real-time with acceptable latency.
Another example: While a customer fills out an application form for a home loan, a back-propagation-trained model that uses real-time data can drive decisions based on demography, such as offering flood or fire insurance or prompting for other protection plans. It can also predict fraud or flag typos from customers in correlated data such as income or place of work.
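A minimal sketch of the streaming option, using an in-process queue as a stand-in for a real streaming platform; the event fields and scoring thresholds are illustrative assumptions:

```python
import queue
import threading

# An in-process queue stands in for a streaming platform: the point is that each
# application event is scored as it arrives, not 15-30 minutes later in a batch.
events = queue.Queue()

def score_application(event: dict) -> str:
    """Illustrative real-time decision, taken before the application's state changes."""
    if event["stated_income"] > 10 * event["bureau_income_estimate"]:
        return "flag_possible_typo_or_fraud"
    if event["property_zone"] == "flood_prone":
        return "offer_flood_insurance"
    return "no_action"

def stream_consumer() -> None:
    while True:
        event = events.get()
        if event is None:  # sentinel to stop the consumer
            break
        print(event["application_id"], "->", score_application(event))

consumer = threading.Thread(target=stream_consumer)
consumer.start()

# Events produced while the customer is still filling in the form.
events.put({"application_id": "A1", "stated_income": 2_000_000,
            "bureau_income_estimate": 90_000, "property_zone": "urban"})
events.put({"application_id": "A2", "stated_income": 85_000,
            "bureau_income_estimate": 80_000, "property_zone": "flood_prone"})
events.put(None)
consumer.join()
```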
The Data Management enablers and processes below are the focus areas that affect data distribution gaps:
- Data Delivery Management: How is data being delivered from sources?
- Platform Governance: Are there processes that make storage policy-relevant along with costs?
- Data Provisioning: Are sources of truth certified across the landscape?
- Metadata and Meaning of Data: Is there a unified data fabric or a business information model to build confidence through standards?
- Integration Management: Are common standards and canonical models leveraged? (See the sketch after this list.)
- Data Availability: Is data discoverable by rightful consumers with ease?
Gartner predicts that by 2023, organizations can accelerate time to integrated delivery by 30 percent by employing data fabrics.
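A minimal sketch of a canonical model, assuming illustrative source systems (a CRM and an onboarding app); the field mappings are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class CanonicalCustomer:
    """Shared, source-agnostic shape that every integration maps into."""
    customer_id: str
    full_name: str
    mobile: str
    email: str

def from_crm(payload: dict) -> CanonicalCustomer:
    # Source-specific field names stay confined to this adapter.
    return CanonicalCustomer(
        customer_id=payload["cust_no"],
        full_name=payload["name"],
        mobile=payload["cell"],
        email=payload["email_addr"],
    )

def from_onboarding_app(payload: dict) -> CanonicalCustomer:
    return CanonicalCustomer(
        customer_id=payload["id"],
        full_name=f'{payload["first_name"]} {payload["last_name"]}',
        mobile=payload["mobile_number"],
        email=payload["email"],
    )

# Downstream consumers only ever see the canonical shape, whichever source the data came from.
print(from_crm({"cust_no": "C-100", "name": "A. Rao",
                "cell": "+919876543210", "email_addr": "a.rao@example.com"}))
```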
Platform Governance: In the past few months, firms have accelerated digital transformation across multiple journeys for onboarding and servicing customers. This has been possible by integrating and aggregating multiple sources as well as taming “data swamps” to deliver quality data.
Platform Governance is the mantra for healthy delivery across big data and native data platforms. The growing number of platforms, including native data warehouses, data lakes, and cloud warehouses, fueled by the cost parameters of compute and storage, has increased the complexity facing platform teams.
A formalized methodology is required to maintain authorized provisioning sources, integration approaches, redundancy maintenance, and other use cases such as deleting specific instances of customer data.
- Read more here about Platform Governance and how it differs from Data Governance.
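A minimal sketch of such a formalized registry and erasure routine, with platform and domain names that are purely illustrative:

```python
# Illustrative registry of authorized provisioning sources per data domain.
AUTHORIZED_SOURCES = {
    "customer_profile": {"crm_prod"},            # certified source of truth
    "transactions": {"core_banking", "cards"},
}

def is_authorized(domain: str, platform: str) -> bool:
    """Only certified platforms may provision data for a given domain."""
    return platform in AUTHORIZED_SOURCES.get(domain, set())

def erase_customer(customer_id: str, platforms: list) -> dict:
    """Illustrative erasure workflow: record where a specific customer's data
    instances, including redundant copies, were deleted."""
    return {platform: f"deleted instances of {customer_id}" for platform in platforms}

print(is_authorized("customer_profile", "spreadsheet_extract"))     # False
print(erase_customer("C-100", ["crm_prod", "data_lake", "cloud_dw"]))
```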
4. Future-Proofing Businesses with Data Strategy
Analyzing the organization's data and digital strategy at each point of change in organizational strategy will help keep benefits aligned.
The data-related goals below have been identified as supporting organizational goals:
- Availability of reliable and useful data for decision making — internal balanced scorecard dimension
- Adequate use of data and technology solutions — customer balanced scorecard dimension
- Realized benefits from data-enabled investments and services portfolio — financial balanced scorecard dimension
A maturity assessment of the current state elicits the problem statements below:
- Data collections, analytics, and decisions in value chains are often time-consuming and expensive.
- Data is a core input to every business process and is supported by applications. Data collection, access, and delivery dependencies must be defined and verified across the organization.
- Read more here on conducting a maturity assessment for your organization to assess the current state of the landscape.
- Read more here about creating a Data Strategy in alignment with digital goals.
Today’s data landscapes in enterprises increasingly rest on the core principles of data discovery, correct data interpretation, coverage, availability, and interoperability.
5. Governing Data on the Modern Cloud
Governing data involves creating a control environment if your change strategy is regulatory-driven; otherwise, it is about creating an enabling environment that helps the organization monetize data for benefits. Moreover, the prime focus of leadership must be understanding the business value of Data Governance in the cloud. Most organizations will prefer a hybrid cloud setup. With data spread across multiple cloud providers, including Azure, AWS, and GCP, as well as traditional on-premises systems, governing data becomes even more important. Each cloud provider maintains its own catalogs, which can be integrated with the enterprise catalog, with a push model being a relevant choice. Data security in a hybrid cloud is an evolving area, and guidance on the best approaches to encrypting and anonymizing data is still maturing.
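A hedged sketch of the push model, assuming a hypothetical enterprise catalog REST endpoint; neither the URL nor the payload shape reflects any vendor's actual API:

```python
import json
from urllib import request

# Hypothetical enterprise catalog endpoint; a real deployment would use the
# vendor's documented API and authentication instead.
ENTERPRISE_CATALOG_URL = "https://catalog.example.internal/api/entries"

def push_entry(entry: dict) -> int:
    """Push one provider-catalog entry to the enterprise catalog (push model)."""
    req = request.Request(
        ENTERPRISE_CATALOG_URL,
        data=json.dumps(entry).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return resp.status

# Entry harvested from one cloud provider's native catalog (fields illustrative).
azure_entry = {
    "source_cloud": "azure",
    "dataset": "curated.customer_profile",
    "classification": "private",
    "owner": "Head of Retail Banking",
}
# push_entry(azure_entry)  # call only inside an environment that hosts such a catalog
```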
IDC predicts that by 2021, over 90 percent of enterprises in APAC will rely on a mix of on-premises as well as private and public clouds and legacy platforms to meet their infrastructure needs.