I was privileged to deliver a workshop at Enterprise Data World (EDW) 2024. Publishing this review is my way of expressing gratitude to the fantastic team at DATAVERSITY, and to Tony Shaw personally, for organizing this prestigious live event. Part 1 of this article covered the key takeaways on data governance discussed at EDW 2024. Part 2 covered trending topics in data architecture and modeling. Part 3 discusses key trends in applying artificial intelligence to data management practices, as well as developments in other areas of data management, such as data quality, master data management, metadata management, data visualization, and data analytics.
Artificial Intelligence in Data Management
Artificial intelligence (AI), generative AI, and machine learning have become buzzwords in the data management community. First, let’s look at their definitions. Many approaches exist to define these disciplines and their relationships. In this article, I will use the definitions from reliable sources. However, they may not be the “only correct ones.”
According to Gartner, “Artificial intelligence (AI) applies advanced analysis and logic-based techniques, including machine learning, to interpret events, support and automate decisions, and take actions.”
“Generative AI is a type of artificial intelligence technology that can produce various types of content, including text, imagery, audio, and synthetic data,” as TechTarget defines.
IBM defines machine learning (ML) as “a branch of artificial intelligence (AI) and computer science that focuses on using data and algorithms to enable AI to imitate the way that humans learn, gradually improving its accuracy.”
These three technologies are interrelated. First, it is important to realize that generative AI (GenAI) and ML are types of artificial intelligence.
AI includes all forms of computational techniques that mimic human abilities.
Machine learning is a technique through which systems learn and make data-based decisions. ML is based on statistical methods and is used in other AI types, including GenAI.
GenAI is a specialized application of ML in which the algorithms do not just make decisions or classifications but produce new data instances that mimic the characteristics of the training data they have learned from. These models capture the essential patterns of the data and utilize this knowledge to generate synthetic data samples similar to the original dataset.
According to Roochir Purani of IBM, GenAI’s key use cases and deliverables are “written content augmentation and creation, questions answering and discovering, language tone, summarization, simplification, classification of content for specific use cases, chatbot performance improvement, software coding, and synthetic data.”
Somehow, we have created a fetish for GenAI, hoping it will solve our challenges in managing data properly. Let me illustrate this with a simple example: I wanted to visualize the relationship between AI, GenAI, and ML described above.
I checked various graphical options on Google and then asked ChatGPT to visualize this simple relationship. Figure 1 compares the results of human and GenAI work.
I am curious: Which of the images appeals to you? I think you get my point about fetishizing GenAI capabilities.
So, let us be realistic about the role of AI and its sub-areas in data management. AI is a technology that can be embedded in different data management IT tools. It is not a standalone feature that can be used independently of other functionalities. Rather, AI empowers different data management capabilities and processes.
So, let me share some takeaways:
Data management capabilities can be enhanced by AI technologies.
Several experts at EDW shared their expertise in this area. According to them, AI can empower the following data management capabilities:
- Data governance: AI can automate stewardship activities and assist in content analysis and summarization.
- Metadata management: AI can assist in documenting and aligning technical and business metadata, identifying dependencies between different metadata objects, and generating metadata.
- Data modeling and architecture: AI can help create business glossaries, document and link logical and technical data models, build data classifications, and integrate and aggregate data from various sources.
- Data quality: AI can be used in data profiling, translating data quality requirements written in business language into data quality checks in technical languages, recommending data quality (DQ) rules based on scans of source systems, and cleansing data.
- Data lifecycle management: AI and ML can optimize the processing and storage of large data volumes using distributed computing, data compression, and predictive caching techniques. These technologies can also support real-time processing, ingesting, analyzing, and acting upon data as it arrives.
- Other use cases: Purani predicts the following emerging GenAI use cases: “producing synthetic data helping augment scarce and incomplete data,” “enterprise applications proactively suggesting actions based on historical transaction data,” “encapsulating and modernization of legacy systems code,” “assistance role in managing complex projects,” “improving process workflows,” and “negotiating contracts and optimizing bids.”
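To make the data quality bullet more concrete, here is a minimal sketch of what the output of such a translation might look like: each business-language requirement paired with an executable technical check, plus a simple profiling step that reports pass rates. The rule wording, field names (`email`, `age`, `country`), and thresholds are hypothetical assumptions; in an AI-assisted workflow, a tool might propose rules like these for a data steward to review.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DQRule:
    """A business requirement paired with its technical check (hypothetical)."""
    business_statement: str
    check: Callable[[dict], bool]


# Hypothetical rules; field names and thresholds are illustrative assumptions.
RULES = [
    DQRule("Every customer must have an email address",
           lambda r: bool(r.get("email"))),
    DQRule("Customer age must be between 18 and 120",
           lambda r: isinstance(r.get("age"), int) and 18 <= r["age"] <= 120),
    DQRule("Country must be a two-letter code",
           lambda r: isinstance(r.get("country"), str)
           and len(r["country"]) == 2 and r["country"].isalpha()),
]


def profile(records: list[dict], rules: list[DQRule]) -> dict[str, float]:
    """Return the share of records passing each rule (a simple profiling step)."""
    results = {}
    for rule in rules:
        passed = sum(1 for r in records if rule.check(r))
        # An empty data set is treated as fully compliant.
        results[rule.business_statement] = passed / len(records) if records else 1.0
    return results


if __name__ == "__main__":
    sample = [
        {"email": "a@example.com", "age": 34, "country": "US"},
        {"email": "", "age": 17, "country": "USA"},
    ]
    for statement, rate in profile(sample, RULES).items():
        print(f"{rate:.0%}  {statement}")
```

Keeping the business statement next to the check preserves the link between what the business asked for and what the system enforces, which is the alignment the speakers emphasized.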
AI requires governance.
According to Stephanie Paradis and Gretchen Burnham of First San Francisco Partners, data governance for AI should focus on defining business cases, curating training data sets, controlling synthetic data sets used in production, and governing the models themselves.
Douglas R. Briggs of Daugherty Business Solutions stressed that good governance for AI should “balance support for innovation with risk and impact,” take into account the concerns of a “broad spectrum of interested parties,” “provide clear and effective guidance for practitioners,” integrate with “existing organizational governance,” and “remain flexible and agile to adapt.”
Challenges in leveraging AI models exist.
Purani mentioned the following challenges related to AI adoption: “AI projects are not always aligned with business strategy,” “failing to consider what tangible capabilities AI projects need,” “misunderstanding the probabilistic nature of AI,” and “articulating what business value they want AI to create, but not recalibrating organizational behavior in ways that will deliver value with the humans who interact with AI.”
Adopting AI impacts multiple business capabilities and operations.
These include business strategy, the operating model, enterprise architecture, engineering and operations, and change management. AI adoption also affects people, processes, and technology.
Sonny Rivera of TIFIN AG believes, “We need to stop micro-optimizing archaic processes and start reimagining them in a GenAI world – across the whole data-to-insight value chain.”
Other Data Management Capabilities
Even when discussing fancy stuff like AI, we must still come down to earth and focus on applying and improving foundational data management (DM) capabilities.
Let me share with you some takeaways from the conference that relate to core DM areas of expertise.
Data Quality (DQ)
- According to C. Lwanga Yonke of Padouk Consulting, LLC, the key challenges to improving data or information quality include inefficient collaboration between data producers and consumers, decentralized data administration, and DQ being considered an IT task. As Teri Hinds of First San Francisco Partners noted, another reason is that DQ means different things to different people.
- Multiple frameworks and methodologies exist to establish DQ management. It was interesting to see that three presenters described DQ using different concepts (e.g., activities or capabilities) and held quite different views on the scope of this capability.
- Establishing a data quality business function is one way to improve data quality. This requires investment in people, processes, and technology.
- Establishing data quality management is a must for financial institutions due to the need to comply with regulations. Risk management becomes an important component of overall data management, as demonstrated by Gerard Koster of Dentons.
- Managing data quality in cloud environments and in an agile-oriented culture has its own specific challenges.
Master Data Management
The DAMA Dictionary defines master data as “the data that provides the context for business activity data in the form of common and abstract concepts related to this activity. It includes the details (definitions and identifiers) of internal and external objects involved in business transactions, such as customers, products, employees, vendors, and controlled domains (code values.)”
DAMA-DMBoK2 separates master and reference data. However, in practice, I’ve seen situations where professionals combine these two data types because distinguishing them is challenging.
In my practice, I have also experienced some other challenges. Sometimes, the same data (e.g., contracts) can be classified as either master or transactional data, depending on an organization’s business model. I also don’t see a real difference between managing master data and managing other data. The data management techniques and required capabilities are the same as those applied to any data type. Of course, the outcomes, like data architecture, may differ.
Let’s come back to one of the key conference takeaways: According to Donna Burbank of Global Data Strategy, a successful MDM initiative requires alignment between data architecture, data governance and stewardship, and business processes. MDM must be considered in the context of a more comprehensive data management strategy.
I believe all other data management capabilities, such as data quality, metadata management, and other types of enterprise architecture, also enable MDM.
Metadata Management
My general observation is that the topic of metadata was largely overlooked at the conference. This may be because many people do not realize the role and importance of metadata. One reason is the complexity of the metadata concept. For example, unlike other data, metadata can be represented by a single element (e.g., a data owner) or a complex construct (e.g., data lineage).
In my workshop, I demonstrated that metadata management has two key goals: to enable the data lifecycle and to manage the metadata lifecycle. Most data management capabilities, like data governance, enterprise architecture, and data quality, produce, exchange, and consume three key types of metadata: business, technical, and operational. Various metadata constructs combine different types of metadata. For example, data lineage combines business and technical metadata, sometimes enriched with operational metadata. The data observability concept combines all three metadata types.
Knowledge graphs, a metadata-related topic, were discussed in several presentations.
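As an illustration of how a lineage construct can combine metadata types, here is a minimal Python sketch. Each data set node carries technical metadata (physical name, hosting system), business metadata (glossary term, owner), and optional operational metadata (last load status), and the edges record which data sets feed which. All class names, fields, and data set names are hypothetical; real metadata tools use their own models.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    name: str                      # technical metadata: physical name
    system: str                    # technical metadata: hosting system
    business_term: str             # business metadata: glossary term
    owner: str                     # business metadata: accountable owner
    last_load_status: str = "n/a"  # operational metadata (optional enrichment)


@dataclass
class Lineage:
    datasets: dict[str, Dataset] = field(default_factory=dict)
    upstream: dict[str, list[str]] = field(default_factory=dict)

    def add(self, ds: Dataset, sources: tuple[str, ...] = ()) -> None:
        """Register a data set and the data sets it is derived from."""
        self.datasets[ds.name] = ds
        self.upstream[ds.name] = list(sources)

    def trace(self, name: str) -> list[str]:
        """Return all upstream data set names (depth-first, no duplicates)."""
        seen: list[str] = []
        stack = list(self.upstream.get(name, []))
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.append(current)
                stack.extend(self.upstream.get(current, []))
        return seen


if __name__ == "__main__":
    lin = Lineage()
    lin.add(Dataset("crm.customers", "CRM", "Customer", "Sales Ops", "success"))
    lin.add(Dataset("erp.orders", "ERP", "Order", "Finance", "success"))
    lin.add(Dataset("dwh.customer_orders", "DWH", "Customer Order", "Data Office"),
            ("crm.customers", "erp.orders"))
    lin.add(Dataset("report.revenue", "BI", "Revenue", "CFO"), ("dwh.customer_orders",))
    for n in lin.trace("report.revenue"):
        ds = lin.datasets[n]
        print(n, "|", ds.business_term, "|", ds.owner, "|", ds.last_load_status)
```

Tracing upstream from a report answers both a technical question (which tables and systems feed it) and a business question (which terms and owners are involved), which is why lineage is described as a combined metadata construct.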
According to Gartner, “Knowledge graphs are machine-readable representations of the physical and digital worlds. They include entities (people, companies, digital assets) and their relationships, which adhere to a graph data model – a network of nodes (vertices) and links (edges/arcs).”
Let me share a couple of takeaways on this topic:
- The EDM Council, represented by Elisa Kendall, “promotes adopting data content standards to promote innovation across industries.” It does this by developing and standardizing industry ontologies.
- Knowledge graphs enable data integration via semantic layers.
According to Dan Collier and Jeremy Debattista, implementing knowledge graphs requires a mindset adjustment to embrace data “as a valuable asset, one that could fuel growth and success.” This includes training staff, preparing data architecture, integrating and interlinking data assets, and improving data quality.
Knowledge graphs enhance existing business processes, allow for the representation of diverse data sources, relationships, and metadata, help map models of business domains, create a foundation for data governance, and ensure data processing transparency by documenting data lineage.
Data Visualization
Data visualization represents information in graphical format using charts, graphs, maps, and other visual tools.
Let’s discuss a couple of takeaways.
- Michael Scofield of Loma Linda University demonstrated various data visualization techniques. He stated that graphics have two goals: “to figure out what is going on” and “to explain to decision-makers what they need to know about reality.” Data visualization helps “see things that other people cannot” and provides “unique insights and exclusive understanding of what’s happening.” Knowing the audience and normalizing data to acknowledge context are the key success factors in expressing information.
- Kulev Rail, Dr. Deepak Singh, and Dr. Arunkumar Ranganathan discussed technological developments in data visualization. According to them, data visualization is a dynamic, human-centered, analytical, and discovery process that requires multiple methods. Three factors ensure good data visualization: data, design, and function. Advanced data visualization focuses on complex data and includes interactive dashboards, 3D visualization, augmented or virtual reality, etc. The benefits of advanced data visualization are improved operational efficiency, enhanced risk management, increased collaboration, reduced costs, and improved decision-making.
Data Analytics
According to Amazon, “Data analytics converts raw data into actionable insights. It includes a range of tools, technologies, and processes used to find trends and solve problems using data. Data analytics can shape business processes, improve decision-making, and foster business growth.”
Gartner recognizes four maturity levels of data analytics: descriptive, diagnostic, predictive, and prescriptive. While many companies are still at the first and perhaps the second maturity level, the ultimate goal is to reach the upper levels. Notably, predictive and prescriptive analytics are linked to the AI capabilities discussed earlier.
Prashanth Southekal shared his insights regarding predictive analytics in business. The key takeaways were:
- There are several insight sources: intuition, science, data, and analytics. Four key components in data analytics are algorithms, data, assumptions, and ethics.
- Predictive analytics uses historical data to make predictions about the future.
- Five key criteria for selecting data analytics projects that improve business performance are practicality, relevance, the applicability of data analytics concepts, implementation and change management, and quantifiable business impact.
Part 3 is the last part of this article. In conclusion, I strongly advocate for the invaluable experience of engaging in face-to-face interactions with leading data management experts at live conferences. These interactions are not only an excellent opportunity to acquire in-depth knowledge but also a way to stay abreast of the latest industry trends and innovations. Additionally, they offer a unique platform to gather fresh ideas that can be effectively implemented within your organization, driving growth and fostering innovation.