With the rising concern surrounding the manipulation of data and misuse of statistical methods in Data Science, it is becoming imperative that strong Data Governance policies and practices are put in place to curb any degeneration of data and the scientific methods used to arrive at data-driven conclusions.
Data Governance (DG) is expected to play a key role in future Data Science (DS) practices because it offers phased validity checks at multiple points before, during, and after the data analysis process to prevent data misuse and the application of corrupt scientific methods.
This Forbes publication states that Data Science has “become about lending false credibility to decisions that have already been made.” The implication is that in the pre-Data Science era, industry leaders made equally good business decisions without the help of data analysis, and that Data Science has merely provided a “scientific crutch” to justify those decisions.
The additional implication in the Forbes post is that decision-makers may be basing their decisions on misused statistical and research methodologies. That means the data is consciously manipulated by people to suit the specific needs of business decision-makers. This is where Data Governance plays a critical role: it ensures, through checks and balances, that the data in Data Science cannot be manipulated. Also of concern are frequent data breaches, which compromise all subsequent data-related activities.
DG policies and procedures usually relate to the usability, integrity, security, and availability of data used in an enterprise. Thus, DG principles and practices are critical for all business-process functions such as regulatory compliance, legacy upgrades, M&A activities, business intelligence systems, risk management, data lakes, or data warehouses. Data Governance vs Data Quality: Managing Data-Driven Solutions argues that Data Quality overlaps with Data Governance. This post looks at the intersection points between DG and DS. What Every Data Scientist Needs to Know about Data Governance recommends that, regardless of their individual role within an enterprise, data experts should be familiar with the basic etiquette of handling data assets.
First Point of Intersection between Data Science and Data Governance: Big Data Comes to the Rescue
Big Data typically reside on mobile, social, cloud, or IoT devices, which have a natural tendency to lose some integrity during high-speed data transfers. Moreover, the data pipelines are not free of security and privacy threats. Major data-breach incidents have happened during data transfers. Thus, businesses must make Data Security their top priority for all data-driven practices.
The white paper The Intersection of Big Data, Data Governance, and MDM locates the intersection of Big Data analytics, Data Governance, and MDM in social governance (governing the data arising from social channels).
In Big Data analytics, DG crosses paths with DS during the following initiatives:
- Maintenance of compliance with regulations such as the EU General Data Protection Regulation (GDPR).
- Quality checks of multi-structured data.
- Information Governance (IG), which includes creating “policies, processes and controls” to manage enterprise data in an end-to-end value chain.
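The quality checks mentioned in the list above could be sketched roughly as follows. This is only a minimal illustration, not a governance product or any vendor's API: the field names (`user_id`, `email`, `consent`) and the rules themselves are hypothetical, chosen to show how a record can be tested against declared policies before it reaches analysis.

```python
from typing import Any

# Hypothetical rules (assumptions for illustration only): each field name
# maps to a predicate that the field's value must satisfy.
RULES = {
    "user_id": lambda v: isinstance(v, int) and v > 0,
    "email": lambda v: isinstance(v, str) and "@" in v,
    "consent": lambda v: isinstance(v, bool),
}


def check_record(record: dict[str, Any]) -> list[str]:
    """Return a list of rule violations for one record (empty if clean)."""
    problems = []
    for field, is_valid in RULES.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not is_valid(record[field]):
            problems.append(f"invalid value for {field}: {record[field]!r}")
    return problems


# Two made-up records: the first is clean, the second breaks every rule.
records = [
    {"user_id": 1, "email": "a@example.com", "consent": True},
    {"user_id": -5, "email": "not-an-email"},
]

# Map each record's position to its list of violations.
report = {i: check_record(r) for i, r in enumerate(records)}
```

In practice such rules would come from the governance policies themselves rather than being hard-coded, and a failed check would typically quarantine the record instead of silently dropping it.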
The Age of Analytics: Competing in a Data-Driven World indicates that in the last five years, the rapid growth of diverse data types, along with the vastly improved predictive capabilities of machine learning (ML) and deep learning (DL) algorithms, has brought Big Data analytics to the forefront of business activities. Currently, no competitive business can think of separating its business goals from its data-analytics goals.
In the next five years, businesses will rely even more on data and analytics to make their critical day-to-day decisions and to plan their future actions; thus the trustworthiness of data and analytics (disruptive technologies) is of highest concern. While advanced algorithms promise to solve a wide range of business problems across industry sectors, the promises will remain largely unrealized unless the data and the analytic practices are of the highest quality.
In the future, more processing power and more data will augment the benefits rendered by neural networks and reinforcement learning. However, the future data troves will have to be governed with solid policies and practices to deliver the expected benefits.
Second Point of Intersection between DG and DS: Data Governance by Itself is an Expanding Market Segment
In the Data Science world, the importance of Data Governance will continue to grow, as evidenced by marketplace news. Your Choices for Data Governance Are Growing indicates that diverse Data Governance platforms and solutions are increasingly flooding the markets. These platforms include sophisticated solutions for policy enforcement, policy monitoring, Data-Governance stewardship, and data discovery technologies. The ultimate goal of the DG solutions will be to maintain data at the highest level of quality, while managing master data or the entire information lifecycle.
With the rise in data-breach scandals, such as the one involving Facebook and Cambridge Analytica, data ownership, accountability of usage, and data protection are assuming high importance in business circles.
Breachlevelindex has reported a daily loss of “5 million data records,” which amounts to the loss of “60 records per second.” This alarming statistic indicates that clean and honest behavioral practices have to be implemented in data-driven activities. This forms the core of data ethics. Data Ethics: The New Data Governance Challenge explains this new concept in DG. The article offers a full-length discussion of important aspects of data handling such as ownership, responsibility, security, privacy, confidentiality, informed consent, and more.
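As a quick sanity check on the breach figures quoted above, the per-second rate can be recomputed from the daily total. The sketch below assumes the losses are spread evenly over a 24-hour day, which is a simplification made only for this calculation.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in a day

records_per_day = 5_000_000  # daily figure quoted in the text
records_per_second = records_per_day / SECONDS_PER_DAY

print(round(records_per_second, 1))  # 57.9
```

The result, roughly 58 records per second, is consistent with the rounded figure of 60 quoted in the text.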
Third Point of Intersection between DG and DS: Building Ethical Models
Why Data Governance Leads to Data-Driven Success describes how DG has added value to the management of corporate data assets in the era of data-powered analytics. In an environment where Data Quality and integration drive the success of data-driven insights, advanced data models and algorithms are only as good as the “data they are applied on.” Without excellent data, even the best models and algorithms will fail to deliver results.
Governance in Data Science indicates that the governance data scientist role will be integral to ensuring that data for predictive models are properly validated.
Fourth Point of Intersection between DG and DS: Data Lakes
As one of its 5 Predictions for 2019: Business Value From Data, Forrester Research has identified the adoption of “data fabric technology” with data lakes. Businesses that are serious about extracting the most value from their technology investments are using data lakes. In 2019, as observed by Forrester, the addition of data fabric technology will ensure automated Data Governance and the deployment of policies related to scaled data. With these technologies in place, data service providers will be able to grant their customers access to a wide range of data sources.
Fifth Point of Intersection between DS and DG: The Chief Data Officer
In a data-democratic era, the Chief Data Officer (CDO) is the new leader who drives the efforts and initiatives of citizen data scientists in an enterprise. With the growth and popularity of self-service data analytics platforms across organizations, the democratic power of data-driven activities is gradually surfacing.
In the near future, the absence of highly qualified data scientists will not remain as much of a challenge as it was initially made out to be. Now, thanks to advanced technologies like ML and DL, automated and semi-automated data-analytics platforms will empower ordinary business users to get their daily jobs done without the help and support of an IT team.
The CDO will drive this process from the front and ensure all the democratic rights of citizen data scientists are preserved. In this era, Data Governance is really about governing the people, and guiding them in appropriate data-handling behavior.
Final Note
The Future of Data Governance: Balancing Data Governance and Data Management looks far ahead to a time when citizen business users will be empowered to make important decisions, backed by well-governed technology. That day is just around the corner.