Data Architecture principles are a set of policies that govern the enterprise data framework: the operating rules for collecting, integrating, using, and managing data assets. Their basic purpose is to keep the underlying data framework clean, consistent, and auditable. The overall enterprise Data Strategy is built around these principles.
In recent years, DA principles have gone through a major overhaul to accommodate modern Data Management systems, processes, and procedures. Modern-day DA principles lay the foundation for a Data Architecture that supports highly optimized business processes and keeps pace with recent Data Management trends.
Here is a list of Data Management trends that forced global organizations to take a critical look at their existing Data Architecture:
- Shift from on-premises to cloud-based data platforms
- Reduced costs of stream processing, favoring real-time over batch processing
- Pre-packaged commercial data platforms replaced by scalable, customizable modular solutions
- Data reuse and APIs for data access
- Shift from data lakes to domain-based data storage
- Shift from predefined data models to flexible data schemas
Within an enterprise, every user wants clean, easily accessible data that is updated routinely. An effective Data Architecture standardizes all Data Management processes for quick delivery of data to the people who need it. Existing Data Architecture designs need to change to keep pace with evolving Data Management requirements.
As a McKinsey author observes, “many new and advanced technology platforms have been deployed alongside legacy infrastructure” in global enterprises in recent years. Novel technology solutions such as data lakes, customer analytics platforms, and stream processing have put tremendous pressure on the performance capabilities of the underlying Data Architecture. In many cases, existing Data Architectures have failed to deliver enhanced support, or even to maintain existing data infrastructures.
Additionally, with the rising adoption of AI and ML platforms for business analytics and BI activities, the time has come for an overhaul of enterprise Data Architecture. As is true for any technology transformation, the Data Architecture principles “developed, tried, and tested” for present-day Data Architectures are quite different from those of legacy Data Architectures.
This post reviews some core principles that define an AI-ready, modern Data Architecture.
The Top Five Essential Principles of Data Architecture
As enterprise data continues to grow exponentially, global businesses are responding by implementing strong data literacy and Data Governance programs. However, to derive maximum business value from data, organizations need a strategic mindset along with advanced technologies.
To leverage data as a competitive asset, organizations have now turned to fundamental DA principles for answers. The rest of the post will focus on five essential Data Architecture principles for success with enterprise data activities:
- Data Quality (DQ) is the core ingredient of a strong Data Architecture. Well-governed, high-quality data supports accurate models, strong schemas, and reliable insights. Though often overlooked, DQ is foundational: as this KDnuggets post reminds us, Data Quality is one of the most forgotten aspects of Data Architectures. (See the first sketch after this list.)
- Data Governance (DG) is a critical factor in building a Data Architecture. Closely linked to the principle above, DG policies govern enterprise data regardless of source, type, or volume. At any point in the data lifecycle, users must know the data’s location, formats, ownership, usage relationships, and all other pertinent information. Data Governance policies are thus critical to the success of a Data Architecture, acting as a “watchdog” over scalability, DQ, and compliance matters.
- Data provenance is necessary for periodic audits. Data provenance is metadata that traces data from its original source through every step of its processing. If users do not know how data was collected, cleaned, and prepared, they have no basis for judging the reliability of the underlying Data Architecture. (See the second sketch after this list.)
- Data in context is a necessary element. A discriminating attribute distinguishes one data entity from another. Users first need to understand which entities exist in the data and which attributes distinguish them from one another. Until this step is completed, users cannot understand the context of the data or its role in extracting insights. Discriminating attributes help data architects understand data in context, a necessary step for data modeling.
- Granularity of detail for each attribute needs to be understood. Data architects have to determine the level of detail required for each attribute. The Data Architecture needs to store and retrieve every attribute at the correct level of detail, making this a critical step in building a high-performance Data Architecture. (This item and the previous one are illustrated together in the third sketch after this list.)
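To make the Data Quality principle concrete, here is a minimal sketch of a rule-based quality check using pandas. The column names (customer_id, email, signup_date) and the specific rules are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical customer extract; in practice this would come from a source system.
df = pd.DataFrame({
    "customer_id": [1, 2, 2, 4],
    "email": ["a@example.com", None, "b@example.com", "not-an-email"],
    "signup_date": ["2024-01-05", "2024-02-10", "2024-02-10", "2025-13-01"],
})

def quality_report(frame: pd.DataFrame) -> dict:
    """Run a few simple, rule-based Data Quality checks."""
    return {
        # Completeness: no missing email addresses.
        "missing_emails": int(frame["email"].isna().sum()),
        # Uniqueness: customer_id should identify exactly one row.
        "duplicate_ids": int(frame["customer_id"].duplicated().sum()),
        # Validity: dates must parse; invalid dates become NaT.
        "invalid_dates": int(
            pd.to_datetime(frame["signup_date"], errors="coerce").isna().sum()
        ),
    }

print(quality_report(df))
# {'missing_emails': 1, 'duplicate_ids': 1, 'invalid_dates': 1}
```

A report like this would typically run at every ingestion point, so quality problems surface before the data reaches downstream consumers.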
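The data provenance principle can likewise be made tangible with a small lineage record attached to each processing step. The field names and pipeline stages below are illustrative assumptions, not a standard.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ProvenanceRecord:
    """One entry in a dataset's lineage: where it came from and what was done."""
    source: str       # original system or upstream dataset
    step: str         # transformation applied at this stage
    performed_by: str # job or user responsible
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

# Lineage accumulates as the data moves through the pipeline (hypothetical stages).
lineage = [
    ProvenanceRecord("crm_db.customers", "extracted raw table", "nightly_etl"),
    ProvenanceRecord("staging.customers", "deduplicated on customer_id", "dq_job"),
    ProvenanceRecord("warehouse.customers", "joined with orders", "mart_builder"),
]

for record in lineage:
    print(f"{record.timestamp}  {record.source}: {record.step} ({record.performed_by})")
```

With a trail like this, an auditor can answer how any given table was collected, cleaned, and prepared.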
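Finally, data in context and attribute granularity can be illustrated together with a small, hypothetical entity definition in which discriminating attributes identify each entity and every attribute carries a deliberate level of detail.

```python
from dataclasses import dataclass
from datetime import date, datetime

# Hypothetical entities: the discriminating attributes (customer_id vs. order_id)
# are what tell a Customer apart from an Order in the shared data.

@dataclass
class Customer:
    customer_id: str  # discriminating attribute for this entity
    segment: str      # coarse granularity: one of a few segment labels
    signup_date: date # day-level granularity is enough for cohort analysis

@dataclass
class Order:
    order_id: str        # discriminating attribute for this entity
    customer_id: str     # context: links the order back to its customer
    placed_at: datetime  # second-level granularity, needed for sessionization
    amount_usd: float    # stored at cent precision, not rounded to dollars

# The same real-world moment, captured at two deliberate levels of detail:
customer = Customer("C-1001", "retail", date(2024, 1, 5))
order = Order("O-9001", "C-1001", datetime(2024, 1, 5, 14, 23, 55), 129.99)
print(customer, order, sep="\n")
```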
Although some other DA principles contribute to building an enterprise Data Architecture, a discussion of them is beyond the scope of this post.
Principles of a Modern Big Data Architecture
Any discussion of Data Architecture that leaves out big data misses a critical aspect. Big data refers to petabytes of multi-structured, multi-type data that must be managed for meaningful analysis. Here are some principles for building a modern Big Data Architecture:
- Centralized Data Management: In this approach, data silos are replaced with a centralized view of business data across functions. Such a system also supports a 360-degree view of customer data, with the ability to correlate data from different business functions. (See the first sketch after this list.)
- Custom User Interfaces: Because the data is centrally shared, the system can provide multiple user-friendly interfaces, each aligned with its purpose: an OLAP interface for BI, an SQL interface for analytics, or the R programming language for data science work.
- Common Vocabulary for Data Use: An enterprise data hub ensures easy comprehension and analysis of shared data through a common vocabulary, which may include product catalogs, calendar dimensions, and KPI definitions, irrespective of how the data is consumed or used. A common vocabulary eliminates unnecessary disputes and reconciliation efforts. (See the second sketch after this list.)
- Restricted Data Movement: Frequent data movement carries high costs in money, accuracy, and time. Cloud and Hadoop platforms address this: both support multi-workload environments that process data sets in parallel, in place. This type of architecture removes the need to move data, optimizing both cost and time.
- Data Curation: Data curation is essential to reduce user frustration with accessing data stored in clusters. Curation steps such as cleansing raw data, modeling relationships, and setting dimensions and measures enhance the overall user experience and help realize the maximum value of shared data. (See the third sketch after this list.)
- System Security Features: Centralized Data Management platforms like Google BigQuery or Amazon Redshift require stringent security and access-control policies for raw data. Today, many technology solutions provide Data Architectures with built-in security and self-service features without compromising access control.
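As a toy illustration of the centralized, 360-degree view above, here is a sketch that correlates two hypothetical extracts from different business functions on a shared customer key. The tables and column names are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical extracts from two business functions, sharing one customer key.
sales = pd.DataFrame({
    "customer_id": ["C-1001", "C-1002"],
    "lifetime_value": [4200.0, 310.0],
})
support = pd.DataFrame({
    "customer_id": ["C-1001", "C-1002"],
    "open_tickets": [0, 3],
})

# With data centralized, one join yields a cross-functional customer view.
customer_360 = sales.merge(support, on="customer_id", how="outer")
print(customer_360)
#   customer_id  lifetime_value  open_tickets
# 0      C-1001          4200.0             0
# 1      C-1002           310.0             3
```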
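One lightweight way to realize the common-vocabulary principle is to publish shared definitions as a single module that every consumer imports. The KPI names and formulas below are hypothetical examples, not a prescribed standard.

```python
# shared_vocabulary.py -- a hypothetical single source of truth for KPI definitions.
# Every team computes "churn_rate" the same way because they import it from here.

KPI_DEFINITIONS = {
    "churn_rate": "customers lost in period / customers at start of period",
    "average_order_value": "total revenue / number of orders",
}

def average_order_value(total_revenue: float, order_count: int) -> float:
    """Compute AOV exactly as the shared definition specifies."""
    if order_count == 0:
        return 0.0
    return total_revenue / order_count

def churn_rate(customers_lost: int, customers_at_start: int) -> float:
    """Compute churn exactly as the shared definition specifies."""
    if customers_at_start == 0:
        return 0.0
    return customers_lost / customers_at_start

# Any consumer -- a BI dashboard or an ad-hoc notebook -- gets identical numbers:
print(average_order_value(total_revenue=120_000.0, order_count=800))  # 150.0
print(churn_rate(customers_lost=25, customers_at_start=500))          # 0.05
```

Because every consumer calls the same functions, reconciliation disputes over how a metric was computed largely disappear.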
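The data curation item can likewise be sketched in a few lines of pandas. The raw column names and the cleansing rules are assumptions chosen to show the typical steps: normalizing the schema, cleansing a dimension, and typing a measure.

```python
import pandas as pd

# Hypothetical raw cluster extract with inconsistent formatting.
raw = pd.DataFrame({
    "REGION ": [" east", "West ", "east", None],
    "sales_usd": ["1,200", "950", "1,200", "400"],
})

def curate(frame: pd.DataFrame) -> pd.DataFrame:
    """Basic curation: normalize names, cleanse values, type the measures."""
    curated = frame.copy()
    # Normalize column names so consumers see a stable schema.
    curated.columns = [c.strip().lower() for c in curated.columns]
    # Cleanse the dimension: trim whitespace, unify case, drop unknowns.
    curated["region"] = curated["region"].str.strip().str.title()
    curated = curated.dropna(subset=["region"])
    # Cleanse the measure: strip thousands separators and cast to a numeric type.
    curated["sales_usd"] = curated["sales_usd"].str.replace(",", "").astype(float)
    return curated

print(curate(raw))
#   region  sales_usd
# 0   East     1200.0
# 1   West      950.0
# 2   East     1200.0
```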
The above Data Architecture principles can substantially enhance the effectiveness of a Big Data Architecture. For further information, you may wish to view some DA best practices.