The shift in the business perception of data has catapulted Data Management to new heights. Data Science is now a core component of Data Management, yet the two are often seen as separate activities. Working among data analysts, data engineers, and DBAs, data scientists spend their time getting the data infrastructure right for data analysis and competitive intelligence. In the growing next-generation data market, however, Data Management and analytics will be the core differentiators for market success, so Data Management and Data Science must work together.
A Forbes post refers to an Everest Group study that states the global Data Management and analytics market will reach $135 billion by 2025. Over the years, vendors in this market have moved from a function-to-process orientation to a platform orientation. In a platform orientation, data is no longer viewed as a byproduct of business processes, but rather as the nerve center of the business.
Today, enterprise data is viewed as a “strategic asset” instead of just a business commodity. Most major businesses around the globe and across sectors have embraced the “data first” corporate strategy. This heightened perception of data as an asset has mainly stemmed from rapid advancements in data technologies, tools, and practices. Moreover, when the Covid-19 pandemic brought a digital push to organizations, data and data technologies received “overnight stardom.” One can hardly think of a business entity without data-driven practices at present. Recently, with the efforts to promote Data Literacy throughout the enterprise, business leaders have taken a decisive stance towards educating employees, executives, customers, and all other stakeholders about the power of data.
This trend is not about to slow down in the future of global business.
According to the Data Management Association (DAMA), Data Management encompasses every activity pertaining to the security, control, delivery, and value enhancement of data. This holistic discipline thus includes all strategies, policies, procedures, technologies, tools, and practices related to data. Quite logically then, Data Management includes such subject areas as databases, data integration, data quality, data governance, and data security. Although these subfields come under Data Management, each one is an independent subject area of study.
Data Management vs. Data Science: The Fundamental Difference
The Data Management function of an organization is in overall control of enterprise data acquisition, storage, quality, governance, and integrity, and thus oversees the development and implementation of all data-related policies within that organization. However, the Data Management team only manages the data assets; it does not usually get involved in the core technical applications of the data. The Data Management function owns all the data. In the webinar Data Management vs Data Strategy, Peter Aiken talked about “prioritizing organizational Data Management needs versus Data Strategy needs.”
On the other hand, the Data Science function in an organization conceives, develops, implements, and practices all “technical application” of the data assets. In this sense, the “technical applications” imply the science, technology, craft, and business practices involving the enterprise data.
The Data Science team never owns any data; it simply collects, stores, processes, and analyzes the data, then reports data-driven outcomes to the rest of the organization for business gains. The data scientist is considered an expert on Data Science and associated technologies, who relies on highly specialized knowledge (statistics, computer science, AI, and so on) to advise the enterprise on data-driven practices.
In actual practice, the Data Science function is under the Data Management function in the organization. The Data Science team brings a set of core technical skills to the organization to implement best practices, as set up by Data Management policies, procedures, and guidelines.
Data Management Practices vs. Data Science Practices
With data rising exponentially in volume and complexity, Data Management has become one of the most important aspects of business functioning. Data Management practices involve setting up data-related policies, procedures, roles, responsibilities, and stringent access-control mechanisms.
A well-structured Data Management strategy, which focuses on Data Governance for maximizing business value, is now a central theme of discussion among business leaders and operators. The Data Management team in an enterprise conceives and develops all the policies.
The data professionals in the different parts of an organization are responsible for implementing and following all policies and guidelines in their daily data-related work. Data Governance has been identified as a core component of Data Management, as explained in Data Management vs. Data Governance: Improving Organizational Data Strategy.
In the Data Science world, the strategic policies, procedures, and guidelines play a major role in the implementation of the data technology projects, although none of the management roles are directly present at this stage. In other words, the organizational data strategists conclude their work by shaping the policies, procedures, and guidelines for managing data; then it is the data scientists’ or other data professionals’ duty to adhere to the policies and guidelines to ensure that the organizational-data-strategy blueprint is intact.
Data Management strategists will also think about possible violations and penalties in order to oversee the implementation of the enterprise Data Strategy through the use of controls.
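As a rough illustration of how such policies and controls might be enforced in day-to-day data work, the Python sketch below checks a data access request against a role-based policy and records any violation for later review. The roles, the dataset name, and the `check_access` helper are all hypothetical, chosen only for illustration; a real enterprise would rely on its own governance tooling.

```python
from dataclasses import dataclass, field

# Hypothetical policy table: (role, dataset) -> actions that role may perform.
POLICY = {
    ("data_scientist", "customer_orders"): {"read"},
    ("data_engineer", "customer_orders"): {"read", "write"},
}


@dataclass
class AuditLog:
    """Collects policy violations so the Data Management team can review them."""
    violations: list = field(default_factory=list)

    def record(self, role, dataset, action):
        self.violations.append((role, dataset, action))


def check_access(role, dataset, action, log):
    """Return True if the policy allows the action; otherwise log a violation."""
    allowed = action in POLICY.get((role, dataset), set())
    if not allowed:
        log.record(role, dataset, action)
    return allowed


log = AuditLog()
read_ok = check_access("data_scientist", "customer_orders", "read", log)
write_ok = check_access("data_scientist", "customer_orders", "write", log)
```

In this sketch the data scientist may read the dataset but not write to it, and the denied write is what ends up in the audit log. The separation mirrors the split described above: the policy table is owned by Data Management, while the check runs wherever data professionals do their work.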
Data Scientist vs. the Data Manager: Comparison of Roles
The data manager, often vested with the charge of data-centric activities in an organization, is generally not required to demonstrate technical skills. The data manager has a team of highly technical staff with direct accountability for the quality, governance, and day-to-day management of enterprise data.
On the other hand, the data scientist is a technically qualified individual whose primary responsibility is to analyze data and extract competitive intelligence or insights from it. Data scientists often possess a collection of technical skills in statistics, mathematics, computer science, operations research (OR), and so on.
The data scientist usually works under the data manager, coordinating all the analytics-specific processes in full compliance with regulatory requirements.
What the Data Scientist Should Know About Data Management
Towards Data Science states that several recent technology movements have required data scientists to rethink Data Management practices for advanced analytics. These technology movements are:
- Reduced cost and rising capacity of data storage
- Rise of IoT devices with streaming data
- The reinvention of data lakes to store and analyze multi-type data
- Big data analytics
- Use of machine learning models
With the above taking center-stage in modern businesses, the data scientist now faces the challenge of building the right governance-enabled data infrastructure to conduct advanced analytics and extract value-added insights.
Augmented Data Management: Relieving the Data Scientist
When the personal computer emerged in the mid-1980s, everyone thought that it was just a matter of time before these dumb wizards would take over human labor. Fortunately, to date, humans and personal computers are working in harmony, and have actually enhanced mutual worth! Now, with the emergence of AI and associated technologies, humanity is once again concerned about machines replacing human labor. Contrary to popular belief, advancements in machines have traditionally made humans superior, more efficient, and more productive beings. This is especially true in the fields of Data Management and Data Science: the presence of AI and associated technologies will only “augment” human expertise, not replace it.
In a typical augmented Data Management system, five core Data Science activities, namely data integration, Data Quality, Master Data Management (MDM), Metadata Management, and Database Management Systems (DBMS), are fully or partially automated through tools.
The data scientist is relieved of the “drudgery of data preparation” through the use of advanced AI, ML, or analytics tools. Typically, about 80 percent of a data scientist’s time is spent on preparing data for analytics; these tools remove that time-consuming engagement, leaving ample time for complex analytics work, which may include model development or data interpretation.
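To make the data preparation burden a little more concrete, here is a minimal sketch of one routine check that such tools might automate: profiling a batch of records for missing required values and exact duplicate rows. The `profile_records` function, the field names, and the sample records are hypothetical and not taken from any particular product.

```python
def profile_records(records, required_fields):
    """Count missing required values and exact duplicate rows in a batch."""
    missing = {name: 0 for name in required_fields}
    seen = set()
    duplicates = 0
    for row in records:
        # A value counts as missing if the field is absent, None, or empty.
        for name in required_fields:
            if row.get(name) in (None, ""):
                missing[name] += 1
        # Rows are duplicates when every field/value pair matches an earlier row.
        key = tuple(sorted(row.items()))
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return {"rows": len(records), "missing": missing, "duplicates": duplicates}


# Hypothetical sample batch with two missing emails and one repeated row.
records = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": ""},
    {"id": 1, "email": "a@example.com"},
    {"id": 3, "email": None},
]
report = profile_records(records, ["id", "email"])
```

A report like this is only a starting point; augmented tools would typically go further and suggest or apply fixes, which is where most of the saved time would come from.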
The author, Brandon Cosley (a creative data scientist, sci-fi fan, and adventurer), describes AI as follows:
“AI is not only for engineers. If you want your organization to become better at using AI, this is the course to tell everyone — especially your non-technical colleagues — to take.”
Hopefully the essence of the above quote is clear: AI technologies are for everyone, not just for technology nerds, because they make humans “better and more productive in their jobs by closing the skill gap.” The last and most significant recommendation from the author involves “a change in mindset, skillset, and dataset,” beautifully interpreted through a graphic. Do not forget to review the graphic in the linked post.
The Role of Data Regulations in Data Management and Data Science
The emergence of data regulations such as the General Data Protection Regulation (GDPR) and CCPA has added a new dimension to existing Data Management practices overlapping Data Science. The new regulations offer better governance mechanisms, especially in the areas of data privacy, data security, and ethics, but complicate the AI-powered Data Science platform. Now, data managers not only have to think of implementing strict controls for data privacy, security, and ethics, but also have to worry about the impact of advanced technologies (AI, ML) on Data Governance.
In the new world of regulation-centric Data Governance, Data Management and Data Science practices will remain parallel activities, but will intersect at several points.
The net result of such a collision? Vendors and service providers will merge, acquire, and integrate.
From a strictly technical standpoint, Gartner has laid down the following observable shifts in enterprise Data Management and Data Science practices:
- Learning by doing
- Business information architecture
- Thinking of a data hub for enhanced Data Governance
- To centralize or de-centralize and the new CDO role, whether it’s Chief Data or Chief Digital
How Do Data Management and Data Science Align?
In an ideal business scenario, Data Management and Data Science practices align to get the best results. So, how can the two practices align?
- Through mutual agreements on preserving Data Governance guidelines
- Through better understanding of how and where Data Management and Data Science overlap
- Through having a well-structured Data Science framework in place, so that junior data scientists can get the job done
According to a discussion on Quora, Data Management focuses on well-governed data collection and data access. Data Science focuses on deriving strategic business decisions from data analysis. The absence of Data Management indicates the risk of “Data Science delivering bad analytics due to poor quality or inaccessible data.”
Data Management & Data Science Trends in 2022
In this article, the author Mark Van de Wiel highlights five trends that are about to dominate Data Management in 2022.
- Rising adoption of AI/ML platforms
- Accelerated use of Cloud SaaS platforms
- In the public cloud space, Google Cloud, AWS, and Azure will steal the show
- Change data capture (CDC) will be the preferred mode of data synchronization activities
- Data Fabric will enhance Data Management efficiencies while reducing costs.
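For readers unfamiliar with CDC (commonly expanded as change data capture), the idea is to synchronize a target by replaying an ordered log of row-level changes rather than recopying the whole source. The sketch below is a simplified illustration under that assumption; the event format and the `apply_changes` helper are hypothetical and do not represent any vendor's API.

```python
def apply_changes(target, events):
    """Apply ordered insert/update/delete events to a target table keyed by id."""
    for event in events:
        op, key = event["op"], event["key"]
        if op in ("insert", "update"):
            # Merge the changed columns into the existing row, if there is one.
            target[key] = {**target.get(key, {}), **event["row"]}
        elif op == "delete":
            target.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return target


# Hypothetical replica and change log captured from a source system.
replica = {1: {"name": "Ada", "city": "London"}}
change_log = [
    {"op": "update", "key": 1, "row": {"city": "Paris"}},
    {"op": "insert", "key": 2, "row": {"name": "Grace", "city": "New York"}},
    {"op": "delete", "key": 1},
]
apply_changes(replica, change_log)
```

Because only the changes travel, the replica stays in step with the source without a full reload, which is the main appeal of this approach for synchronization work.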
Needless to say, along with the above game-changing trends, the following trends will also sweep both the Data Management and Data Science fields in 2022:
- Shift from on-premise DM and Data Science to managed services on the cloud
- Growth of cloud-friendly data technologies such as containerization, data fabric, and so on
- Data Science for all (democratization of data and all data activities)
- Automated Data Management and automated Data Science in the form of Augmented DM, Augmented DS, embedded AI, and Self-Service Analytics