Data Architecture describes the infrastructure that connects a Business Strategy and Data Strategy with technical execution. Ideally, Data Architecture happens within a systematic framework, providing a foundation for people and systems to work with data.
Three types of components underlie the architecture infrastructure and connect to drive insights, make data-driven decisions, and manage risk. They include:
- Outcomes: Models, definitions, and data flows, depicted at various levels, usually referred to as “architecture artifacts”
- Activities: The work that forms, deploys, and fulfills the architecture's intentions
- Behaviors: Collaborations, mindsets, and skills impacting business teams, roles, and enterprise architecture
Specific examples of each Data Architecture component type appear in the “Examples of Data Architecture Components” section below.
With the demand for insights from streaming data, many Data Architecture foundations face the need for modernization, primarily to support sales, purchases, and business intelligence (BI). As new technologies and data formats become available and data speed and ingestion grow, Data Architecture will continue to evolve and change in organizations.
Data Architecture Defined
Other definitions of Data Architecture highlight its structure around data flow, noting that it is a set of rules, policies, and models that determine what kind of data gets collected and how it gets used, processed, and stored within a database system.
The Harvard Business Review highlights that Data Architecture covers how data gets from production to consumption and all data activities, including those in between, such as transformation or storage.
Data requirements and standardization also play a central role when identifying whether something is a Data Architecture. DAMA DMBoK states that the specifications used to describe an existing state, define data requirements, guide data integration, and control data assets as put forth in a Data Strategy form a Data Architecture.
Expressing Data Architecture relies on a common vocabulary describing integrated requirements, ensuring data assets are stored, arranged, managed, and used to support strategy. This shared vocabulary ensures Data Architecture produces accessible data for systems and teams across an organization.
Examples of Data Architecture Components
This section uses examples to expand on the three different Data Architecture components mentioned at the beginning of this article.
Outcomes: As mentioned above, data architecture outcomes consist of models, definitions, and data flows depicted at various levels, usually called architecture artifacts. For example:
- The enterprise data model (EDM): The EDM connects other modeling components to show a holistic and consistent data view throughout the organization. Through the EDM, conceptual models that capture business requirements map to physical models, the technical blueprints that show where systems need to integrate.
- Automated data flows: Automated data flows depict how data moves in the organization and play a critical role in improving DataOps by creating a predictable delivery.
- The business glossary: A business glossary defines the components of conceptual data models and provides meaningful definitions connected to the business requirements. These shared meanings underlie conversations between IT and business, leading to the development of an agreed-upon data infrastructure.
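As a rough illustration of how a business glossary can tie conceptual terms to the physical model, the sketch below stores each term with its definition and the place where the data lives. The class name, term names, table names, and column names are all hypothetical, chosen only for this example; a real glossary would typically live in a catalog tool rather than in code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlossaryTerm:
    """A business glossary entry linked to its (hypothetical) physical location."""
    name: str             # conceptual-model term used by the business
    definition: str       # agreed-upon business meaning
    physical_table: str   # assumed table name in the physical model
    physical_column: str  # assumed column name in that table


# Hypothetical entries for illustration only.
GLOSSARY = {
    term.name: term
    for term in [
        GlossaryTerm("Customer", "A party that has placed at least one order.",
                     "crm.customers", "customer_id"),
        GlossaryTerm("Order Value", "Total price of an order before tax.",
                     "sales.orders", "order_total"),
    ]
}


def locate(term_name: str) -> str:
    """Resolve a business term to the physical column that stores it."""
    term = GLOSSARY[term_name]
    return f"{term.physical_table}.{term.physical_column}"


print(locate("Order Value"))  # sales.orders.order_total
```

Keeping the definition and the physical location in one record is one way to give business and IT a single entry to discuss when they agree on meaning.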
Activities: Data Architecture activities form, deploy, and fulfill architecture intentions. Some modern examples include:
- Migrating to the cloud: By migrating to the cloud, organizations can outsource computation, storage, security, maintenance, and management resources. Freeing up these resources gives organizations more computing power and lowers the cost of storing data for insights. Consequently, companies implement cloud deployments to update their data architectures according to their business and data strategies.
- Creating data pipelines: Companies create pipelines to scale data and optimize data movement through the organization. A pipeline consists of processing elements connected in series, with the output of one element acting as the input for the next. Organizations plan to upgrade their data architectures to include these pipelines to improve data integration across the enterprise and sharing between teams.
- Setting up containerized applications or Kubernetes: Containerized applications package code and its dependencies into self-contained components that can be reused in different data products. Data Architecture, typically in the cloud, uses containerization to speed up digital transformation, creating value by continuously deploying technologies at scale and making business operations more efficient.
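The data pipeline activity described above, where the output of one element becomes the input of the next, can be sketched in a few lines. This is a minimal in-memory illustration, not a production design; the stage names, the record fields, and the "pos" source label are assumptions made up for the example.

```python
from functools import reduce
from typing import Callable

Record = dict
Stage = Callable[[list[Record]], list[Record]]


def drop_incomplete(records: list[Record]) -> list[Record]:
    """Remove records that have no amount."""
    return [r for r in records if r.get("amount") is not None]


def to_cents(records: list[Record]) -> list[Record]:
    """Add an integer amount in cents alongside the original amount."""
    return [{**r, "amount_cents": round(r["amount"] * 100)} for r in records]


def tag_source(records: list[Record]) -> list[Record]:
    """Label each record with a hypothetical source system."""
    return [{**r, "source": "pos"} for r in records]


def run_pipeline(records: list[Record], stages: list[Stage]) -> list[Record]:
    """Apply stages in series: each stage receives the previous stage's output."""
    return reduce(lambda data, stage: stage(data), stages, records)


raw = [
    {"id": 1, "amount": 12.5},
    {"id": 2, "amount": None},
    {"id": 3, "amount": 3.0},
]
clean = run_pipeline(raw, [drop_incomplete, to_cents, tag_source])
print(clean)
```

Because every stage has the same shape (a list of records in, a list of records out), stages can be reordered, reused, or tested on their own.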
Behaviors: To recap, behaviors comprise collaborations, mindsets, and skills impacting business divisions and enterprise architecture. Find some examples of Data Architecture behaviors below:
- A Data Quality (DQ) mindset: Data architecture relies on people having a Data Quality mindset to get excellent and accurate insights. Additionally, good DQ practices inform accurate data architecture models and schemas that lead to more effective and efficient use of technology. Focusing on Data Quality fosters better business and IT working relationships, which improves data architecture.
- Data Governance (DG) collaborations: DG collaboration harmonizes and formalizes access to, relationship with, and ownership of enterprise data. This connection between DG and data architecture results in a clear blueprint of where the data lives, how to use automation, and how to set accessibility by group.
- Metadata management skills: Metadata provides the context that defines datasets. Companies need metadata management to distinguish one data entity from another clearly.
Data Architecture translates shared metadata to data models. Business and IT communicate about metadata and use automation to create, update, and maintain metadata to align with Data Architecture’s data models.
Data Architecture Patterns
Businesses choose technologies based on commonly available Data Architecture patterns according to their strategies. These forms sort themselves into two architectural types: centralized and distributed architectures.
Centralized data architectures: Centralized data architectures organize data storage in one repository and have one view of the business data across functions. Data Architecture patterns for this type include:
- Data Warehouse: A data warehouse comprises a single repository for all information and remains attractive because it organizes information in a single schema for quick access.
- Data Mart: A data mart is a subset of a data warehouse designed to service a specific business line or purpose.
- Data Lake: A data lake holds vast data types and structures that can be ingested, stored, assessed, and analyzed. It has an undefined structure.
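To make the difference between these three centralized patterns concrete, here is a small sketch using Python's built-in `sqlite3` and `json` modules. It is a toy under stated assumptions: the `sales` table, the `west_sales_mart` view, and the sample records are invented for illustration, and a view stands in for a data mart, which in practice is often a separate physical store.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Warehouse: one repository whose schema is defined before data is loaded.
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("west", "widget", 120.0), ("east", "widget", 80.0), ("west", "gadget", 40.0)],
)

# Data mart: a subset of the warehouse serving one business line
# (modeled here as a view over the warehouse table).
conn.execute(
    "CREATE VIEW west_sales_mart AS "
    "SELECT product, amount FROM sales WHERE region = 'west'"
)
mart_rows = conn.execute(
    "SELECT product, amount FROM west_sales_mart ORDER BY product"
).fetchall()

# Data lake: raw records of differing shapes, stored as-is;
# structure is only interpreted when the data is read.
lake = [
    '{"event": "click", "page": "/home"}',
    '{"sensor": 7, "temp_c": 21.5}',
]
lake_keys = [sorted(json.loads(raw)) for raw in lake]

print(mart_rows)
print(lake_keys)
```

The warehouse rejects nothing here only because the inserts already match its schema, while the lake accepts two records with no fields in common.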
Distributed data architectures: Distributed data architectures lead to a single view but use multiple platforms and processes to store and compute data. Businesses like distributed data architectures for flexibility, domain adherence, and sharing capabilities. Distributed architectures generally contain one or more of the following technical designs:
- Data Lakehouse: The data lakehouse, a term coined by Databricks, combines the data architectures of a data lake and a data warehouse.
- Data Mesh: The data mesh acknowledges that organizations will have multiple data warehouses and lakes organized by different domains and recommends four core principles to extend the collaborative Data Architecture.
- Data Fabric: The data fabric combines intelligent and automated algorithms and unifies disparate data across systems, providing access to integrated enterprise data. This Data Architecture is designed to scale as organizations grow.
- Data Cloud: The data cloud is a newer concept, according to William McKnight, and its Data Architecture involves multiple partners and umbrella businesses that access their shared data with a few clicks. Its Data Architecture pattern will solidify as companies determine how best to leverage generative AI.
Common Use Cases
Data Architecture meets many use cases, and some examples are provided below:
- Engaging in DataOps: DataOps monitors and improves enterprise data flow, creates predictable business and data services, and makes existing data architecture components adaptable.
- Increasing Data Quality to leverage machine learning (ML): Machine learning describes algorithms that adjust outputs from analyzing and synthesizing new input patterns, leading to better insight and solutions. Companies improve Data Quality with a good data architecture, leading to better training material for and recommendations from ML.
- Breaking down data silos: A large West Coast utility wanted to break silos for better integration. Data architecture focused on data flows, storage, and governance/administration to achieve this objective.
- Implementing a data fabric: Domino’s implemented a data fabric to integrate and unify distributed data across various data types and locations. This gave the company access to insights across multiple data formats and services.
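For the Data Quality use case above, one simple way to picture the link between quality and ML is a completeness check that runs before data is handed to training. The sketch below is illustrative only: the function names, the field names, and the 0.9 threshold are assumptions, and completeness is just one of several Data Quality dimensions.

```python
def quality_report(rows: list[dict], required: list[str]) -> dict[str, float]:
    """Return, per required field, the share of rows with a non-empty value."""
    total = len(rows)
    report = {}
    for field in required:
        filled = sum(1 for r in rows if r.get(field) not in (None, ""))
        report[field] = filled / total if total else 0.0
    return report


def passes_gate(report: dict[str, float], threshold: float = 0.9) -> bool:
    """Allow training only if every field meets the (assumed) threshold."""
    return all(score >= threshold for score in report.values())


# Hypothetical training candidates; one row is missing "age".
rows = [
    {"customer_id": 1, "age": 34},
    {"customer_id": 2, "age": None},
    {"customer_id": 3, "age": 51},
    {"customer_id": 4, "age": 29},
]
report = quality_report(rows, ["customer_id", "age"])
print(report)               # {'customer_id': 1.0, 'age': 0.75}
print(passes_gate(report))  # False
```

A gate like this makes the quality expectation explicit, so a failing dataset is fixed upstream instead of quietly degrading the model.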
Roles and Responsibilities
While a data architect plays the most prominent role in developing, updating, and maintaining an organization’s Data Architecture, everyone participates in data architecture. For example, a worker who locks a device against unauthorized access makes the data infrastructure less risky, supporting the business strategy.
Each person engages with Data Architecture differently, depending on their position. Example roles are described below:
- Data architects: The data architect provides clear specifications, models, and definitions, connecting the business with its data. This role has expanded, requiring business understanding when proposing a technical implementation.
- Data engineers: Data engineers build Data Architecture and maintain data systems. They may perform some manual processes, like ETL, to transform data sets for use by other systems.
- Subject matter experts (SMEs): SMEs organize, manage, and deliver data sets, improving on the existing architecture. SMEs advise the data architect on needs and implementations. Some SMEs can change Data Architecture through an application, like a drag-and-drop interface.
- Legal teams: Data architectures must comply with regulations, and about 65% of the world’s population was projected to be covered by laws similar to the European GDPR by 2023. Lawyers need to advise companies on how to comply with these regulations within existing and updated data architectures.
- Project managers: A project manager ensures that teams develop, implement, maintain, and update data architecture to specifications in data models, data flows, or other outcomes.
Why Is Data Architecture Important?
A good Data Architecture prevents a data user from spending more time extracting and organizing data than analyzing it. Data Architecture makes technologies and tools more valuable to an organization through its standardization.
Its benefits include:
- Preparing organizations to evolve quickly through a modular approach
- Facilitating alignment of IT and business systems
- Managing complex data and information delivery throughout the enterprise
- Delivering insights into new business activities
- Managing risks from security breaches