
The Convergence of the Data Mesh and Data Fabric: Data Architecture’s New Era

By Sean Martin

The most meaningful development in contemporary data architecture isn’t the growing interest in the concepts of the data mesh and the data fabric. Instead, it’s the potential for convergence of these two architectural approaches into a single architecture that supports both decentralization and centralization of integrated data, local data ownership, universal accessibility, and top-down and bottom-up implementation methods. 

In reality, data meshes and data fabrics are more similar than they are different. Rather than opposing one another, I would argue that they’re complementary constructions for making data available across (and between) organizations. When properly implemented with knowledge graph technologies, they become a powerful approach for devising reusable, integrated data products that can span both business domains and the enterprise as a whole.

Combining Top-Down and Bottom-Up Methodologies

What are the core principles of a data mesh and a data fabric? The data mesh concept is a bottom-up philosophy for assigning responsibility for data to specific business units or business domain expertise groups while de-emphasizing centralized infrastructure like data warehouses. A data fabric is a top-down, self-service-driven methodology for integrating data from many parts of an organization. It typically assigns responsibility for contributing datasets closer to where the data is produced. It is also purported to use artificial intelligence (AI) on metadata to automate the discovery and integration of data into a centralized version of the truth – an approach that is becoming increasingly viable with the rise of data description and integration solutions based on generative AI (genAI).

However, in practice, what both data architectures provide is needed. At a higher level, a data fabric can integrate the data products of a data mesh that exist locally across the organization. When those data assets are well described via semantic technologies, organizations can unify these architectures to increase dataset reusability; reduce costs, time to value, and extract, transform, load (ETL) and extract, load, transform (ELT) processing; and better exploit data relationships in richer representations.

Complementary Architectures

When it comes time to implement a data fabric approach, it’s almost impossible to do so without using some ideas and techniques borrowed from the data mesh philosophy. A data mesh localizes data management duties to business groups instead of combining them across domains in centralized options like data lakes and data warehouses.

Data fabrics can do the same thing; implementing one doesn’t involve centralizing everything into a single data warehouse, for example. It requires the opposite: establishing domain experts, sourcing data, implementing service-level agreements (SLAs) for it, then formalizing metadata so datasets are clean, reliable, and reusable. Data mesh supporters call these curated datasets “data products.” The output of a data fabric is a data product too, albeit one situated at a higher level and integrating data sourced from across the organization (instead of across a single business unit).
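To make this concrete, the following Python sketch models the kind of metadata a curated data product might carry – an accountable owner, an SLA, and a description. The field names and values are illustrative assumptions, not a standard schema:

```python
from dataclasses import dataclass, field

# Toy sketch of the metadata a curated "data product" might carry.
# All fields here are illustrative assumptions, not a standard schema.
@dataclass
class DataProduct:
    name: str
    owner: str                 # the accountable business domain team
    sla_freshness_hours: int   # data is refreshed at least this often
    description: str
    tags: list = field(default_factory=list)

sap_orders = DataProduct(
    name="sap-orders",
    owner="order-management-domain",
    sla_freshness_hours=24,
    description="Curated order records sourced from the SAP system",
    tags=["orders", "sales"],
)

# The domain team retains control of the asset while publishing it.
print(sap_orders.owner)
```

The point of the sketch is that ownership and service guarantees travel with the dataset itself, which is what makes it reusable by the rest of the organization.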

For example, let’s say a company wants to make an SAP system a source for its data fabric. The data owners for that source make the data reusable so it’s available to the rest of the organization, exposing it where it makes the most sense while retaining control over those assets. Data mesh adherents could (and usually do) advocate the same things for their sources.

The Key Role of the Knowledge Graph

The semantic open data standards underpinning knowledge graphs are ideal for data mesh and data fabric architectures – and their synthesis. Semantic technologies excel at providing uniform, standards-based descriptions of data assets or products in business-friendly terminology designed for understanding and seamless collaboration between users, systems, and applications.

The crux of semantic technology is focused on sharing models and associated well-described data. Experts can implement these technology standards on integrated data so they can be reused by anyone requiring that data product – regardless of whether it’s for a data mesh or data fabric. In addition, the standards readily support combining data products to make additional products for emerging use cases, like connecting data from different domains for a data fabric. Doing so can be as simple as combining knowledge graphs from individual domains by establishing a single conceptual linkage (graph edge) between them through a shared entity or relationship so that the combined data can be queried.
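The graph-combination step described above can be sketched in plain Python, with each domain graph modeled as a set of (subject, predicate, object) triples. The entity and predicate names are invented for the example; a real implementation would use RDF graphs and SPARQL:

```python
# Two domain knowledge graphs as sets of (subject, predicate, object)
# triples, joined by a shared entity ("Customer:42"). All identifiers
# are made up for illustration.
sales_graph = {
    ("Customer:42", "placedOrder", "Order:7"),
    ("Order:7", "hasTotal", "99.00"),
}
support_graph = {
    ("Customer:42", "openedTicket", "Ticket:3"),
    ("Ticket:3", "hasStatus", "resolved"),
}

# "Integration" here is just a set union; the shared entity is the
# conceptual linkage (graph edge) that connects the two domains.
fabric = sales_graph | support_graph

# A cross-domain query: everything known about Customer:42 after merging.
facts = {(p, o) for (s, p, o) in fabric if s == "Customer:42"}
print(facts)
```

Once merged, a single query surfaces both the sales and the support facts about the shared customer – which is exactly the cross-domain payoff the text describes.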

Simultaneously, semantic knowledge graph technology is ideal for implementing data fabrics. Data fabrics entail integrating data from a plethora of sources, schemas, and data types (including both structured and complex unstructured information), and beyond. Consequently, the resulting models become more intricate and detailed; this demands technologies that accommodate complex relationships and descriptions for connecting this data. Semantic knowledge graphs fulfill this obligation at the higher level of abstraction necessary for weaving a data fabric.

Two-Tiered Architecture

An easy way to conceptualize the data fabric and data mesh architectures is as two tiers of a common architecture. For the first tier, a data mesh is the bottom-up approach closest to the data sources and to an understanding of the data in the context of the business. This tier provisions the data, which is described with rich metadata according to semantic standards to produce reusable data products from business domain groups. The objective is to make these localized descriptions meaningful and accessible throughout the enterprise. Semantic technologies accomplish this goal with standards like RDF and OWL, plus taxonomies, so datasets are readily understood by all.
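As a minimal sketch of how such semantic descriptions make a dataset self-describing, the following Python snippet models RDF-style triples with an invented namespace and dataset; in practice this would be expressed in an RDF serialization such as Turtle against a shared ontology:

```python
# Toy sketch: describing a dataset in business-friendly terms using
# RDF-style triples. The namespace, dataset URI, and comment text are
# hypothetical; rdf:type / rdfs:label / rdfs:comment are real RDF(S) terms.
EX = "http://example.org/"  # hypothetical namespace

description = {
    (EX + "dataset/orders", "rdf:type", EX + "ontology/DataProduct"),
    (EX + "dataset/orders", "rdfs:label", "Customer Orders"),
    (EX + "dataset/orders", "rdfs:comment",
     "Orders placed by customers, owned by the order-management domain"),
}

# Any consumer can read the human-friendly label without knowing the
# producing domain's internal schema.
label = next(o for (s, p, o) in description if p == "rdfs:label")
print(label)
```

The natural-language label and comment are what make the dataset legible across the enterprise – and, as discussed later, legible to genAI tooling as well.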

The data fabric is the top-down approach above the data mesh. It integrates any data product across domains, locations, and datasets. This construction is great for devising new data products by combining existing ones across domains. As such, data fabrics encompass all business domains while retaining the meaning and the specific business ownership of those data assets. Combined, organizations benefit from the best elements of each architecture.

The Role of AI

The ability of AI to automate the data integration implicit in data fabrics – and their unification with a data mesh – has perhaps been exaggerated in the past. However, the advent of genAI is changing this; examples include automated classification and ontological description of data, data mapping, and data cleansing. One unexpected benefit of using OWL ontology metadata to describe data in natural language is that data assigned meaning via this standard is both intelligible and actionable by genAI solutions. This unanticipated synergy arises because large language models (LLMs) have been trained on gargantuan amounts of natural language text.

Today, when it comes to data integration, AI use is still somewhat limited. Data fabric supporters claim this approach automates data integration via metadata, which is typically a significant part of prudent data integrations. However, contemporary integration processes revolve around the actual data itself as much as they do metadata. More traditional AI and machine learning do have utility in integrating data for data fabrics, and practical uses exist. For instance, AI can automate the creation of the knowledge graphs that describe data in the effort to unify data mesh and data fabric architectures. Moreover, there are numerous techniques for identifying connections in datasets and making intelligent suggestions about them to accelerate the population of a domain-specific knowledge graph. Examples include semantic inferencing, in which self-describing statements about data are combined to derive new ones.
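Semantic inferencing of the kind just described can be illustrated with a tiny sketch: applying one RDFS-style rule (subClassOf is transitive) to a handful of statements derives a new statement that was never asserted. The class names are invented for the example:

```python
# Toy sketch of semantic inferencing: combining existing statements to
# derive new ones. We apply one RDFS-style rule -- subClassOf is
# transitive -- until no new triples appear. Class names are invented.
triples = {
    ("RetailCustomer", "subClassOf", "Customer"),
    ("Customer", "subClassOf", "Party"),
}

def infer_transitive(triples, predicate="subClassOf"):
    """Repeatedly apply (A p B) + (B p C) => (A p C) to a fixed point."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for (a, p1, b) in list(inferred):
            for (b2, p2, c) in list(inferred):
                if p1 == p2 == predicate and b == b2:
                    new = (a, predicate, c)
                    if new not in inferred:
                        inferred.add(new)
                        changed = True
    return inferred

# Derives the new statement: RetailCustomer subClassOf Party.
print(infer_transitive(triples))
```

Production reasoners implement many such entailment rules (and far more efficiently), but the principle – new statements devised from existing ones – is the same.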

There are also approaches typified by symbolic reasoning and OWL-based reasoning. Germane unsupervised learning techniques include various means of dimensionality reduction and clustering. Supervised learning applications include link prediction, which can be powered by graph neural networks. There is an abundance of entity resolution techniques for determining whether an entity in one dataset is the same as, or related to, an entity in another – and increasingly, these techniques rely on AI and machine learning. But the scale, complexity, and myriad distinctions between data in integration processes still require human effort alongside automation. We can expect this balance to shift toward full automation over the next one to three years as genAI finds its way into data integration solutions.
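A very simple form of the entity resolution mentioned above can be sketched with name similarity alone. The records and the 0.6 threshold are illustrative assumptions; real systems combine many signals (identifiers, attributes, and, increasingly, ML models):

```python
from difflib import SequenceMatcher

# Toy entity-resolution sketch: decide whether records in two datasets
# refer to the same entity using string similarity. The records and the
# 0.6 threshold are illustrative assumptions, not tuned values.
dataset_a = ["Acme Corporation", "Globex Inc"]
dataset_b = ["ACME Corp.", "Initech LLC"]

def similar(x, y, threshold=0.6):
    """Case-insensitive similarity ratio compared against a threshold."""
    return SequenceMatcher(None, x.lower(), y.lower()).ratio() >= threshold

# Candidate matches across the two datasets.
matches = [(a, b) for a in dataset_a for b in dataset_b if similar(a, b)]
print(matches)  # pairs likely referring to the same entity
```

Here only the two Acme records clear the threshold, so they would be flagged as the same entity for a human (or a downstream model) to confirm.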

The Benefits of Combining Data Meshes and Data Fabrics

The convergence of a data mesh and data fabric into a two-tiered, knowledge graph-powered architecture yields significant advantages. It minimizes the amount of ETL and ELT processing required for transforming data. Well-described semantically tagged data is also inherently reusable. Semantic technologies make data self-describing in natural language business terminology, so once domain experts introduce those descriptions as a model, they can be reused without limit within and across domains.

Lowered costs are another significant benefit. Because semantic data is reusable, organizations can spend less on cleansing raw data and wrangling it. The current costs of mapping, cleansing, and normalizing raw data are considerable; with semantics, this process needs to be done just once. These savings add up quickly. 

There are also time-to-value benefits: less time preparing data means faster access to analytics, insights, and business action. It also means more data integrations can support additional business cases without increasing data engineering resources. Finally, there’s a heightened capacity to ascertain, manage, and interconnect relationships among disparate datasets. This ensures a better understanding of data’s importance for discovery and exploration, which enhances analytics and the value organizations gain from it.

A Symbiotic Relationship and a New Era for Data Architecture

The concepts of data mesh and data fabric work well together to fulfill similar goals. They localize responsibility for data to business units without conventional centralization methods, creating curated, reusable data products across an enterprise. A data mesh incorporates a bottom-up approach to this task, while a data fabric’s approach is top-down. 

Uniting these approaches into a single architecture can usher in a new era of data architecture – particularly when their implementations are streamlined and their efficacy enhanced by the rich, self-describing nature of knowledge graphs.