In today’s complex business environment, data lakes and data warehouses may not be sufficient to meet organizational requirements. From the perspective of agility, both data lakes and data warehouses have limitations when it comes to maintaining and managing various types of data. Enter data mesh.
The idea of a data mesh was born in 2019, when Zhamak Dehghani, then a consultant at ThoughtWorks, discussed the limitations of “centralized and monolithic” data platforms in an article. She proposed data mesh to overcome the limitations of data lakes and data warehouses.
Data mesh has been described as a distributed data architecture with global interoperability standards. The primary goals of a data mesh are to provision decentralized, domain-oriented, self-serve data infrastructure for extracting value from analytic and historical data at scale.
What Are the Core Principles of a Data Mesh Architecture?
Data mesh relies on self-serve data infrastructure as a platform. The platform is deliberately “domain-agnostic”: it gives every domain team the same tools for building and sharing data, while global standardization of data rules and regulations keeps the teams’ output interoperable. The logical architecture of the self-serve platform is organized in three planes: the data infrastructure provisioning plane, the data product developer experience plane, and the data mesh supervision plane.
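To make the three planes (infrastructure provisioning, data product developer experience, and mesh supervision) more concrete, here is a minimal Python sketch, assuming a toy platform in which "infrastructure" is just a storage path. Every class and method name (InfrastructurePlane, DeveloperExperiencePlane, SupervisionPlane, provision_storage, create_data_product, search) and the storage:// URI scheme are hypothetical illustrations, not the API of any real data mesh product.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    # Hypothetical minimal record of a data product on the mesh.
    domain: str
    name: str
    storage_uri: str


class InfrastructurePlane:
    """Provisioning plane: hands out low-level resources (here, only a storage path)."""

    def provision_storage(self, domain: str, name: str) -> str:
        # A real platform would create storage, compute, access roles, and so on.
        return f"storage://{domain}/{name}"


class DeveloperExperiencePlane:
    """Lets a domain team declare a data product without touching raw infrastructure."""

    def __init__(self, infra: InfrastructurePlane) -> None:
        self.infra = infra

    def create_data_product(self, domain: str, name: str) -> DataProduct:
        uri = self.infra.provision_storage(domain, name)
        return DataProduct(domain=domain, name=name, storage_uri=uri)


@dataclass
class SupervisionPlane:
    """Mesh-wide view: keeps a catalog so other domains can discover products."""

    catalog: list[DataProduct] = field(default_factory=list)

    def register(self, product: DataProduct) -> None:
        self.catalog.append(product)

    def search(self, keyword: str) -> list[DataProduct]:
        return [p for p in self.catalog if keyword in p.name]


infra = InfrastructurePlane()
dev = DeveloperExperiencePlane(infra)
mesh = SupervisionPlane()

orders = dev.create_data_product("sales", "orders_daily")
mesh.register(orders)
print(mesh.search("orders"))
```

None of the planes inspects what the data means; that is what keeps the platform domain-agnostic while each domain decides what to build on top of it.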
Unlike traditional monolithic data infrastructure, which handles ETL in one central location, a data mesh supports distributed, domain-specific data ownership and treats “data as a product,” with each domain managing its own data pipeline.
Influenced by data fabric, data marts, microservices, and domain-driven design, the four core principles of data mesh can be summarized as follows:
- Domain-oriented data ownership: Ownership of operational and analytical data shifts from the central data team to the domain teams that have domain-specific knowledge.
- Data as a product: Each domain team is responsible for productizing its data so that other domains can discover, understand, and use it.
- Self-service data platform: A dedicated data platform team provides domain-agnostic infrastructure and tools that domain teams use to build, deploy, and maintain interoperable data products.
- Federated governance: Executed by a governance guild of domain and platform representatives, this governance model ensures that all data products are interoperable through global standardization of applicable rules and regulations.
What Are the Components of a Data Mesh Architecture?
Because data mesh is designed as a distributed system architecture with interconnected data hubs, it can be described with a networking analogy that has four core components: hub nodes, spokes, links, and routing protocols. In a data mesh, all four are logical rather than physical components.
- The hub nodes are the data products, each owned by a domain team. Hub nodes often enforce the quality and security policies that apply to the data they serve, and they are implemented in software.
- Spokes connect each hub to the systems around it. In data mesh terms, they are a data product’s input and output ports, through which it receives source data and serves data to consumers.
- The links are the logical connections between data products, for example when one domain’s data product consumes another’s output. They run over the organization’s existing network rather than over dedicated cabling.
- Routing protocols correspond to the global interoperability standards, such as shared schemas, addressing conventions, and access policies, that govern how data is exchanged between hubs and spokes so that it flows smoothly across domains.
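Read in data mesh terms, the hubs are data products, the spokes are their input and output ports, and the links are the dependencies between products. The Python sketch below is a hypothetical model under those assumptions (the DataProduct fields, the example product names, and the links and build_order helpers are all illustrative). It derives the links from each product’s input ports and uses the standard library’s graphlib.TopologicalSorter (Python 3.9+) to find an order in which upstream products can be refreshed before their consumers. The interoperability standards that play the role of routing protocols are not modeled here.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class DataProduct:
    # A "hub": one domain-owned data product (hypothetical model).
    name: str
    domain: str
    # "Spokes": input ports name the upstream products this one reads;
    # output ports name the datasets it serves to consumers.
    input_ports: list[str] = field(default_factory=list)
    output_ports: list[str] = field(default_factory=list)


def links(products: list[DataProduct]) -> dict[str, set[str]]:
    """Derive the logical links: product name -> names of upstream products it consumes."""
    return {p.name: set(p.input_ports) for p in products}


def build_order(products: list[DataProduct]) -> list[str]:
    """Order products so every upstream product comes before its consumers."""
    return list(TopologicalSorter(links(products)).static_order())


mesh = [
    DataProduct("orders", "sales", output_ports=["orders_daily"]),
    DataProduct("shipments", "logistics", input_ports=["orders"],
                output_ports=["shipments_daily"]),
    DataProduct("revenue", "finance", input_ports=["orders", "shipments"],
                output_ports=["revenue_monthly"]),
]
print(build_order(mesh))
```

If two products ended up consuming each other, static_order() would raise graphlib.CycleError, a useful early warning that the links between domains have become circular.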
What Advantages Does a Data Mesh Architecture Provide?
Data mesh architecture provides three main advantages: simplicity, scale, and resilience. All three benefits are critical to IT organizations of any size seeking to deliver quality services to customers.
According to Thoughtworks, a data mesh is intended to overcome the limitations of the traditional centralized data lake or data warehouse architectures. Data mesh achieves this ambitious goal by leaning on today’s distributed architectures and self-service data infrastructure.
As data mesh grows in popularity, business and IT are moving closer together, whether by building integrated domain teams or by having engineering teams provide a domain’s data as a service to the rest of the enterprise, for example to support C-level executives and management.
What Roles Do Domains Play in a Data Mesh?
In a data mesh, the domains along a customer’s journey publish their data as data products for others to access. A data domain can offer one or more data products and can also hold supporting data that is used to build those products but is not itself accessible on the mesh. Each domain is responsible for the storage, management, and maintenance of its data. The domain ownership principle says that each team or unit that owns a domain, such as content distribution, must also own the data created in it.
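The split between published data products and private supporting data can be sketched in a few lines of Python. This is a hypothetical, in-memory illustration assuming a single content distribution domain; the Domain and Mesh classes, the raw_cdn_logs and daily_views datasets, and the read method are invented for the example.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Domain:
    """Hypothetical domain that owns, stores, and maintains its own data."""

    name: str
    # Internal inputs used to build products; never addressable on the mesh.
    supporting: dict[str, list[dict]] = field(default_factory=dict)
    # Data products the domain has chosen to publish for other domains.
    published: dict[str, list[dict]] = field(default_factory=dict)

    def publish(self, product: str, rows: list[dict]) -> None:
        self.published[product] = rows


class Mesh:
    """Hypothetical access layer: consumers can only reach published data products."""

    def __init__(self, domains: list[Domain]) -> None:
        self.domains = {d.name: d for d in domains}

    def read(self, domain: str, product: str) -> list[dict]:
        # Looks only in `published`, so asking for supporting data raises KeyError.
        return self.domains[domain].published[product]


# The content distribution domain builds a product from its private raw logs.
raw_logs = [{"video": "intro"}, {"video": "intro"}, {"video": "demo"}]
content = Domain("content_distribution", supporting={"raw_cdn_logs": raw_logs})
views = Counter(row["video"] for row in content.supporting["raw_cdn_logs"])
content.publish("daily_views",
                [{"video": v, "views": n} for v, n in sorted(views.items())])

mesh = Mesh([content])
print(mesh.read("content_distribution", "daily_views"))
```

Other domains see only the aggregated daily_views product; the raw logs stay inside the owning domain, which remains free to change how it stores or processes them.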
From an architectural standpoint, data mesh gives domain teams autonomy over how they deploy operational and analytical data. Whereas data warehouse and data lake teams have “centralized data ownership,” data mesh grants data ownership to individual domain teams. Decentralizing data ownership and architecture in this way reduces the load on any single central team by spreading responsibility across several domains and their associated data.
To be feasible as a domain-oriented architecture, data mesh requires a governance model that promotes decentralization, domain orientation, and interoperability. With data mesh, the new customer-facing domain teams focus on meeting the data needs of one particular business domain, which allows them to build deeper domain knowledge and continually improve their analytics results.
How Does the Federated Governance Model Support Data Mesh?
Federated governance in a data mesh ensures that teams can reliably use the data available to them from other domains. Organizations implementing data mesh should clearly identify which domain teams own which datasets, and all teams share responsibility for keeping the data they publish on the mesh of high quality.
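One way to make federated governance concrete is policy as code: the governance group agrees on a small set of global rules, and every domain runs the same checks against its own data products before publishing them. The Python sketch below assumes that approach; the three policies (has_owner, has_schema, pii_is_tagged), the rule that a column named email counts as PII, and the product names are hypothetical examples, not a standard.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class DataProduct:
    name: str
    owner: Optional[str]      # owning domain team, if declared
    schema: dict[str, str]    # column name -> type
    tags: set[str] = field(default_factory=set)


# A policy returns an error message, or None if the product complies.
Policy = Callable[[DataProduct], Optional[str]]


def has_owner(p: DataProduct) -> Optional[str]:
    return None if p.owner else f"{p.name}: no owning domain team declared"


def has_schema(p: DataProduct) -> Optional[str]:
    return None if p.schema else f"{p.name}: no published schema"


def pii_is_tagged(p: DataProduct) -> Optional[str]:
    # Hypothetical global rule: an "email" column must carry the "pii" tag.
    if "email" in p.schema and "pii" not in p.tags:
        return f"{p.name}: PII column 'email' is not tagged"
    return None


# Defined once by the governance group, executed locally by every domain.
GLOBAL_POLICIES: list[Policy] = [has_owner, has_schema, pii_is_tagged]


def check(product: DataProduct, policies: list[Policy] = GLOBAL_POLICIES) -> list[str]:
    """Return every policy violation for one data product."""
    return [msg for policy in policies if (msg := policy(product)) is not None]


customers = DataProduct("customers", owner="crm", schema={"id": "int", "email": "str"})
print(check(customers))  # the email column is not tagged yet
customers.tags.add("pii")
print(check(customers))
```

Because the rules live in one shared list while each domain runs check() on its own products, the standard is global but enforcement stays with the domains, which is the balance federated governance aims for.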
The federated governance model in data mesh is designed to support a distributed systems architecture. Simply put, that architecture is a set of independent data products, with independent life cycles, built and deployed by independent teams. Data products are the nodes of a data mesh; each one wraps the three structural components it needs to perform its function (code, data and metadata, and infrastructure) and provides access to the domain’s analytical data as a product. In this case the “data functional unit” is not just a stage in a pipeline but the whole domain, which gathers, handles, processes, and serves its data. The result is a network of interconnected data hubs across domains.
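The idea that a data product wraps code, data and metadata, and infrastructure, and has its own life cycle, can be illustrated with a small Python sketch. The DataProduct and Infrastructure classes, their fields, the storage:// URI, and the cron schedule string are hypothetical; a real platform would deploy and run these pieces rather than hold them in memory.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Rows = list[dict]


@dataclass(frozen=True)
class Infrastructure:
    # Declared resources the product runs on (not actually provisioned here).
    storage_uri: str
    schedule: str  # refresh schedule as a cron expression


@dataclass
class DataProduct:
    domain: str
    name: str
    code: Callable[[Rows], Rows]     # transformation that builds the product
    metadata: dict[str, str]         # description, owner, and other metadata
    infrastructure: Infrastructure   # where and when the product runs
    data: Optional[Rows] = None      # the served data, filled in by refresh()

    def refresh(self, source_rows: Rows) -> None:
        # Each product refreshes on its own life cycle, independent of others.
        self.data = self.code(source_rows)


def active_only(rows: Rows) -> Rows:
    return [r for r in rows if r["active"]]


active_customers = DataProduct(
    domain="crm",
    name="active_customers",
    code=active_only,
    metadata={"owner": "crm-team", "description": "Customers with an active subscription"},
    infrastructure=Infrastructure("storage://crm/active_customers", "0 2 * * *"),
)
active_customers.refresh([{"id": 1, "active": True}, {"id": 2, "active": False}])
print(active_customers.data)
```

Keeping all three parts in one unit is what lets a domain team build, deploy, and change a product without coordinating a release with a central pipeline.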
How Is Data Mesh Different from Data Fabric?
According to James Serra, both data fabric and data mesh offer architectures to access data over a variety of technologies and platforms, but while the data fabric is focused on technologies, data mesh is focused on changing data sources and changing Data Management environments.
In a data fabric, data is controlled through a unified data governance point, because data access is centralized. In a data mesh, by contrast, the distributed infrastructure helps speed up processes and yields richer datasets, because data is maintained closer to domain experts. Distributed architectures reduce the centralized processing and intervention that delay data delivery.
Conclusion
As enterprise data ecosystems continue to grow, there is an urgent need to upgrade existing Data Management architectures into more scalable and resilient environments that also promise simplicity. The data mesh architecture appears to be a strong answer, offering these benefits and challenging the rigidity of traditional data ecosystems.