The term “data architecture” is defined as a set of models, policies, rules, and standards governing data flow and management within an organization. Thus, data architecture, a subset of enterprise architecture, involves the operating rules governing data-related activities like data capture, data storage, data integration, and data usage. In other words, this collection of models, standards, policies, and rules regulates what data is collected and how it is stored, organized, integrated, and used within enterprise data systems. This data architecture 101 article will define data architecture and explore data architecture components, principles, trends, and challenges.
What Is Data Architecture?
According to the Data Management Body of Knowledge (DMBoK 2), data architecture is akin to a “master plan” to manage data assets. Internal organizational policies and business policies often guide the data architecture design. Data architecture deliverables include multilayer infrastructures for data platforms and Data Governance tools, and specifications and standards for data collection, integration, transformation, and storage.
The data architect develops the architectural blueprint, which is aligned with the organizational goals, culture, and contextual requirements. Typically, during multi-party data projects, the data architect acts as the central figure who is responsible for coordinating the many departments and stakeholders around the organizational objectives.
What Are the Components of Data Architecture?
The most basic components of an enterprise data architecture include the following:
- Data pipelines comprise the entire process of data collection, storage, cleaning, and analysis, along with the flow of data from one point to another.
- Cloud storage indicates the presence of a hosted service with data residing on one or more remote cloud servers accessed via the internet.
- Cloud computing refers to the process of storing, managing, and analyzing data on cloud hosts. The biggest benefits of cloud computing are low cost, data security, and zero on-premise infrastructure requirements.
- APIs facilitate communication between the host system and a service requester.
- AI and ML models provide automated capabilities to the data architecture. These models automate many of the core functions of the data architecture.
- Data streaming supports continuous data-flow capability, where data may need to be processed close to the source, in real time.
- Real-time analytics is the analysis of data as it arrives, for instant, actionable insights.
- Kubernetes is a one-stop platform for handling computing, networking, and storage workloads.
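To make the data pipeline component more concrete, here is a minimal sketch in Python of the collect, clean, and analyze stages chained together. The function names and the sample sensor records are hypothetical and exist only for illustration; a real pipeline would read from source systems and write to storage rather than work on an in-memory list.

```python
from statistics import mean


def collect():
    """Stand-in for data capture; the records below are made-up sample data."""
    return [
        {"sensor": "a", "reading": "21.5"},
        {"sensor": "b", "reading": ""},  # missing value
        {"sensor": "a", "reading": "22.5"},
    ]


def clean(records):
    """Drop records with a missing reading and convert the rest to floats."""
    return [
        {"sensor": r["sensor"], "reading": float(r["reading"])}
        for r in records
        if r["reading"]
    ]


def analyze(records):
    """Compute the average reading per sensor."""
    by_sensor = {}
    for r in records:
        by_sensor.setdefault(r["sensor"], []).append(r["reading"])
    return {sensor: mean(values) for sensor, values in by_sensor.items()}


def run_pipeline():
    """Move data from one stage to the next: collect -> clean -> analyze."""
    return analyze(clean(collect()))


result = run_pipeline()
print(result)
```

Each stage takes the previous stage's output as its input, so a stage can be tested or replaced on its own.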
What Are the Core Principles?
Here are some of the core principles around which a data architecture is built:
- Automation: Automation removes many of the handicaps of legacy data systems. A modern data architecture enables building processes in hours, provides quick access to any type of data through data pipelines, facilitates agile data integration, and allows continuous data flow.
- Security: Security features built into modern data architectures ensure that data is available only on a need-to-know basis. Data handling is designed to comply with regulations like HIPAA and GDPR.
- User orientation: Modern data architectures provide access to data to users when and where they need it.
- Resilience: Data architectures promise high data availability, disaster recovery measures, and adequate backup/restore capabilities.
- Flexible data pipelines: Modern data architectures support data streaming and “data bursts.”
- Collaboration: A well-designed data architecture facilitates collaboration by removing silos and allowing data from all parts of the organization to coexist in a single location.
- AI-driven: AI and ML together reinforce the automated capabilities of a data architecture by providing alerts and recommendations for changing conditions.
- Elasticity: This characteristic empowers organizations to make use of on-demand scaling features quickly and affordably. This same characteristic also frees administrators from many tedious tasks so that they can concentrate on what is important.
- Simplicity: Data architectures allow organizations to limit data movement and data duplication and advocate for a uniform database platform.
You can also review this DATAVERSITY® article about additional data architecture principles.
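As an illustration of the security principle above, the following Python sketch shows one simple way to express "need-to-know" access: each role sees only the fields a policy allows. The roles, field names, and the `view_for` helper are all hypothetical. This is not a compliance mechanism for HIPAA or GDPR, only a sketch of the idea.

```python
# Hypothetical role-to-field policy; roles and field names are illustrative only.
POLICY = {
    "analyst": {"patient_id", "diagnosis_code"},
    "billing": {"patient_id", "invoice_total"},
}


def view_for(role, record):
    """Return only the fields that the given role is allowed to see."""
    allowed = POLICY.get(role, set())  # unknown roles see nothing
    return {key: value for key, value in record.items() if key in allowed}


record = {"patient_id": 17, "diagnosis_code": "J45", "invoice_total": 120.0}

print(view_for("analyst", record))
print(view_for("billing", record))
print(view_for("guest", record))
```

Defaulting unknown roles to an empty set means access has to be granted explicitly. That is one reasonable reading of "need-to-know."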
What Are the Latest Trends and Challenges?
New technologies and architectures to capture, process, manage, and analyze a wide range of data throughout the organization are constantly emerging. In addition to innovations in storage and processing on the cloud, enterprises are moving to new approaches for data architectures that enable them to manage the diversity, veracity, and volumes that come with big data.
From a long list of data architecture trends shaping 2021, ones worth noting were the democratization of data access, an AI-ready architecture, the rise of data architects, data fabrics, data catalogs, DevOps, and, of course, the cloud. Beyond 2021, here are some trends to watch:
- Many data architecture leaders are moving away from the enterprise-wide central data lake to a domain-based design, which can be customized and fit for purpose, in order to shorten the time-to-market for new products and services built on top of the data.
- The move toward highly modular data architectures is here to stay. The modular concept uses best-of-breed, open-source components that can be replaced by new technologies when needed, without impacting the rest of the data architecture.
- More recently, the data architecture concept has emerged out of increasing adoption of cloud computing in enterprises, followed by a dramatic move toward cloud-based platforms for all or most Data Management tasks.
- Big data architecture is the conceptual or physical system to ingest, process, store, manage, access, and analyze massive amounts, velocities, and diverse data that are challenging to deal with in traditional databases. Big data technologies are very specialized, using frameworks and languages not commonly found in more general-purpose application architectures.
- AI-driven cloud services capable of processing a variety of data types are on the rise. With greater data access demands comes the need for supporting AI and ML within a complex, multi-cloud, hybrid environment.
- As the data fabric enables faster data analytics in a variety of cloud setups, growth of the data fabric also means growth in hybrid and multi-cloud.
- High-performance data architecture models, though still in a nascent stage, are developing rapidly. Advanced Data Science applications, such as predictive analytics and applied AI and ML, need data environments that are both high volume and high speed, and at the moment, that condition is achievable only with hybrid and multi-cloud environments.
- A well-designed big data architecture makes it easier for the business to crunch data and forecast future trends so that it can make smart decisions. Enterprises are using big data analytics technologies to optimize their business intelligence and analytics initiatives, moving beyond slower report tools dependent on data warehousing technologies and toward smarter, more reactive applications that provide greater insight into customer behaviors, business processes, and overall operations.
- Related to the big data trend above, cloud infrastructure providers like AWS, Microsoft, Google, and IBM can handle nearly unlimited amounts of new data and provide on-demand storage and computing capabilities.
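The modular trend in the list above, where a component can be replaced without impacting the rest of the architecture, can be sketched in Python with a shared interface. The `Storage` protocol, the two implementations, and the `ingest` function are hypothetical names chosen for this example. They do not correspond to any particular product.

```python
from typing import Protocol


class Storage(Protocol):
    """Interface that the rest of the architecture depends on."""

    def save(self, key: str, value: str) -> None: ...

    def load(self, key: str) -> str: ...


class InMemoryStorage:
    """One component: keeps only the latest value per key."""

    def __init__(self):
        self._data = {}

    def save(self, key, value):
        self._data[key] = value

    def load(self, key):
        return self._data[key]


class AppendLogStorage:
    """Replacement component: keeps every write, returns the most recent one."""

    def __init__(self):
        self._log = []

    def save(self, key, value):
        self._log.append((key, value))

    def load(self, key):
        for logged_key, logged_value in reversed(self._log):
            if logged_key == key:
                return logged_value
        raise KeyError(key)


def ingest(storage: Storage, events: dict) -> None:
    """Calling code relies only on the interface, not on a concrete component."""
    for key, value in events.items():
        storage.save(key, value)


for backend in (InMemoryStorage(), AppendLogStorage()):
    ingest(backend, {"order-1": "created", "order-2": "shipped"})
    print(type(backend).__name__, backend.load("order-2"))
```

Because `ingest` is written against the interface, swapping one storage component for the other requires no change to the calling code.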
One big challenge is that while new technologies to process and store data are coming, the amount of data is doubling in size roughly every two years. Gartner analysts estimate that more than 50% of business-critical data will be created and processed outside the data center or the cloud by 2025. The three main drivers for the future of data infrastructure can be described as moving toward the public cloud, SaaS services, and enhanced data engineering.
The second challenge comes with hybrid and multi-cloud. While these environments are becoming increasingly popular, ongoing data security and management challenges will cause enterprises to look more at hybrid and less at multi-cloud options. Only cloud platforms, with their diverse solutions, can provide the speed, scale, and ease of use of an enterprise-level Data Management platform, though Data Quality will continue to be a lingering issue.
The modern data platform is built around a business-centric value chain, not IT-centric coding processes, in which the complexities of the traditional architecture are abstracted to a single, self-service platform, which converts streams of events into analytic-ready data.
The third challenge is that the new data technologies dramatically add to the complexity of data architecture, frequently hindering the organization’s continued ability to deploy new capabilities, maintain existing infrastructure, and guarantee integrity of AI models.
What’s the Difference Between Data Modeling and Architecture?
While both data architecture and data modeling strive to align data technologies with business goals, data architecture is solely focused on maintaining relationships between business functions, technologies, and data. In short, data architecture sets standards across data systems, acting as a vision or a pattern for how data systems interact. Data architects develop a vision and blueprint of an organization’s data infrastructure, and data engineers are in charge of implementing this vision.
On the other hand, data modelers work with application developers to understand the business processes implemented by the developed applications and to identify the best representation for the data supporting each application. Data modeling is built around data and their relationships, with a model describing the information that is intended to be stored. It is most valuable when the end product is generated software code for an application or a functional specification that helps guide decisions about the software.
Entity-relationship models use diagrams to depict data and its relationships. A data model may translate directly to an entity-relationship model, a relational database, a business-oriented language, or an object-modeling language.
During the enterprise architecture data modeling process, data models must be made logical and easily understood by the people who extract insights about data objects. The process requires appropriate planning and involves three phases: a conceptual model, a logical model, and a physical model.
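As a rough illustration of the three phases, the Python sketch below carries one small example from a conceptual statement, to a logical model, to a physical schema. The customer and order entities, their attributes, and the table names are assumptions made up for this example. SQLite is used only because it ships with Python.

```python
import sqlite3
from dataclasses import dataclass

# Conceptual model (in words): a Customer places Orders.


# Logical model: entities, attributes, and their relationship,
# independent of any particular database engine.
@dataclass
class Customer:
    customer_id: int
    name: str


@dataclass
class Order:
    order_id: int
    customer_id: int  # each order belongs to exactly one customer
    total: float


# Physical model: the same structure expressed as DDL for one engine (SQLite).
DDL = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    total       REAL NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)

customer = Customer(1, "Ada")
order = Order(10, customer.customer_id, 42.5)
conn.execute(
    "INSERT INTO customer VALUES (?, ?)",
    (customer.customer_id, customer.name),
)
conn.execute(
    "INSERT INTO customer_order VALUES (?, ?, ?)",
    (order.order_id, order.customer_id, order.total),
)

rows = conn.execute(
    "SELECT c.name, o.total FROM customer AS c "
    "JOIN customer_order AS o ON o.customer_id = c.customer_id"
).fetchall()
print(rows)
```

The logical model stays the same if the database engine changes. Only the physical DDL would need to be rewritten.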
Data Architecture 101: Further Reading
Ready to learn more? Here are some additional data architecture 101 resources discussing best practices, challenges, and aligning data governance with data architecture.