Data Modeling describes the plans and activities around diagramming data requirements for business operations across one or more systems. The resulting blueprint, the data model, defines unique data entities (such as tables, systems, concepts, and hubs) and their characteristics (such as inheritance, importance, and physical structures), along with how data flows between entities.
Organizations create or update data models at any time, depending on the business context and purpose. For example, Data Modeling may happen at the beginning of a sprint for developing functionality in an app, during bug triage and resolution, to work out how to resolve data performance problems when a system is down, or as part of a Data Governance request to clarify, through discussion, the definitions of critical data elements.
Data Modeling Defined
Most Data Modeling descriptions cover processes used to understand and meet business needs through Data Architecture design and implementation. Data Modeling includes discovering, analyzing, representing, and communicating data requirements visually, according to the Data Management Body of Knowledge (DMBoK).
Functional and technical analysts, such as data scientists, do Data Modeling to guide the planning and construction of data platforms. Conceptually, Data Modeling compares to class modeling in programming: with Data Modeling, engineers identify data entities, attributes, and relationships, whereas with class modeling, engineers identify classes, the containers that group related code and data.
Types of Data Models
Depending on their purpose, data models generally fall into one of three types:
- Conceptual: Conceptual data models capture business understanding of entities, attributes, and relationships.
- Logical: Logical data models describe how the systems within a larger platform function together based on rules and data structures, providing a good way to understand data flow.
- Physical: Physical data models cover how a single project or application works, helping teams understand and interface with its technical components.
The best type to choose depends on the audience and on the business operations that will happen next. Organizations may take different approaches to each type of data model by using one or more Data Modeling techniques.
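To make the progression between model types concrete, here is a minimal Python sketch built on a hypothetical employee/department example. The conceptual level is only a sentence; the logical level is a dictionary of entities, attributes, and keys; and the physical level is the set of SQLite `CREATE TABLE` statements generated from it. The entity names, the logical type labels, and the `SQLITE_TYPES` mapping are illustrative assumptions, not a standard.

```python
import sqlite3

# Conceptual level (assumed): "A Department employs many Employees."

# Logical level: entities, typed attributes, keys, and references,
# still independent of any particular database engine.
LOGICAL_MODEL = {
    "Department": {
        "attributes": {"department_id": "identifier", "name": "text"},
        "primary_key": "department_id",
        "references": {},
    },
    "Employee": {
        "attributes": {
            "employee_id": "identifier",
            "full_name": "text",
            "department_id": "identifier",
        },
        "primary_key": "employee_id",
        "references": {"department_id": "Department"},
    },
}

# Assumed mapping from logical types to SQLite column types.
SQLITE_TYPES = {"identifier": "INTEGER", "text": "TEXT"}


def to_physical_ddl(model: dict) -> list[str]:
    """Turn the logical model into physical SQLite CREATE TABLE statements."""
    statements = []
    for entity, spec in model.items():
        lines = []
        for attr, logical_type in spec["attributes"].items():
            column = f"{attr} {SQLITE_TYPES[logical_type]}"
            if attr == spec["primary_key"]:
                column += " PRIMARY KEY"
            lines.append(column)
        for attr, target in spec["references"].items():
            target_pk = model[target]["primary_key"]
            lines.append(f"FOREIGN KEY ({attr}) REFERENCES {target}({target_pk})")
        statements.append(f"CREATE TABLE {entity} (\n  " + ",\n  ".join(lines) + "\n)")
    return statements


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    for ddl in to_physical_ddl(LOGICAL_MODEL):
        print(ddl)
        conn.execute(ddl)  # the physical model actually runs on SQLite
```

The same logical model could feed a different type mapping for another database, which is one reason teams keep the logical and physical levels separate.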
Data Modeling vs. Data Architecture
Data Modeling
Data Modeling happens within Data Architecture to formally document and concretely lay out what data assets exist, where they come from, where they go, and how to describe them. Data modelers also forecast and analyze, through visual representations, the best ways to optimize data storage and compute resources for business activities.
Aligning core business rules with data definitions through blueprints, which leads to better Data Architecture, is a prominent goal of Data Modeling. Additionally, when data is modeled, companies and their customers get an inside view of data structures and their relationships. Businesses use this documented perspective to pursue larger Data Architecture goals, like agreeing on data definitions and requirements.
Technical professionals like engineers, data architects, and data scientists construct and revise data models, sometimes with the assistance of AI. They depict an understanding of Data Architecture and the business operations that function with it.
Data Architecture
In contrast, Data Architecture takes a big-picture view, going beyond Data Modeling by covering business and technical perspectives through an overarching framework. This methodology aligns with an organization’s Data Strategy and business strategy while supporting guidance, roles, priorities, and processes determined through Data Governance.
Data Architecture covers more Data Management activities than Data Modeling.
Data Architecture comprises the outcomes, activities, and behaviors needed to achieve sufficient Data Quality, whereas Data Modeling provides a guide to better Data Quality during implementation.
Since Data Architecture covers the organization’s entire data infrastructure, components, and behaviors, everyone participates in Data Architecture activities, including the non-technical subject matter experts (SMEs), executives, data consumers, data producers, IT personnel, and data engineers.
In contrast, some businesspeople may not be involved in the Data Modeling activities around creating technical diagrams. However, they may use the resulting data models in Data Architecture work, such as settling data definitions, identifying new opportunities, and changing business operations for more efficiency.
Benefits of Data Modeling
Benefits of Modeling Relational Data Structures
Organizations typically embrace modeling relational databases, as over 70% of businesses have at least one relational database management system (RDBMS), either on-premises or in the cloud. An RDBMS also organizes data into tables of records (rows) and attributes (columns) and supports SQL, making it a good candidate for Data Modeling.
As a mature technology, the RDBMS comes with built-in Data Modeling functionality in most products, where administrators can see the relationships between tables, keys, and attributes. Organizations can also use this technology to update Data Architecture components. They find advantages in modeling relational data at the outset, before developing an application, or when deconstructing relational database systems to plan potential changes. With a data model, organizations better understand and control their relational data assets.
Consequently, relational Data Modeling gives organizations:
- Information to develop higher-quality automated data tools
- Reduced costs on rework
- Faster time to release apps or to update existing ones
- Understanding of how to define data and how business activities translate into data
- Better RDBMS performance
- Insights for improving Data Quality
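One way a relational model feeds Data Quality is through declared keys and constraints that the RDBMS enforces on every write. The sketch below uses Python's built-in `sqlite3` module with a hypothetical `department`/`purchase` schema; note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON` is issued on the connection.

```python
import sqlite3


def build_purchasing_db() -> sqlite3.Connection:
    """Create a small, hypothetical relational model in an in-memory SQLite DB."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE department (department_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    conn.execute(
        """CREATE TABLE purchase (
               purchase_id INTEGER PRIMARY KEY,
               department_id INTEGER NOT NULL REFERENCES department(department_id),
               amount REAL NOT NULL CHECK (amount > 0)
           )"""
    )
    conn.execute("INSERT INTO department VALUES (1, 'Finance')")
    conn.commit()  # keep the reference row even if a later insert is rolled back
    return conn


def try_insert(conn: sqlite3.Connection, department_id: int, amount: float) -> bool:
    """Return True if the row satisfies the model's constraints, False otherwise."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO purchase (department_id, amount) VALUES (?, ?)",
                (department_id, amount),
            )
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    conn = build_purchasing_db()
    print(try_insert(conn, 1, 250.0))   # valid department and amount: accepted
    print(try_insert(conn, 99, 250.0))  # unknown department: rejected by the foreign key
    print(try_insert(conn, 1, -5.0))    # negative amount: rejected by the CHECK constraint
```

Because the rules live in the model itself, every application writing to these tables gets the same protection.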
As an example, see the Kimball diagram on departmental purchases among company employees below:
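The Kimball approach organizes this kind of data as a star schema: a central fact table of measurable events (here, purchases) joined to dimension tables that describe them. The sketch below approximates that pattern in SQLite; the table names, columns, and sample rows are hypothetical and are not taken from the diagram.

```python
import sqlite3

# Hypothetical star schema: one fact table keyed to three dimension tables.
STAR_SCHEMA = """
CREATE TABLE dim_department (department_key INTEGER PRIMARY KEY, department_name TEXT);
CREATE TABLE dim_employee   (employee_key INTEGER PRIMARY KEY, employee_name TEXT);
CREATE TABLE dim_date       (date_key INTEGER PRIMARY KEY, calendar_month TEXT);
CREATE TABLE fact_purchase (
    department_key INTEGER REFERENCES dim_department(department_key),
    employee_key   INTEGER REFERENCES dim_employee(employee_key),
    date_key       INTEGER REFERENCES dim_date(date_key),
    amount         REAL  -- the measure being analyzed
);
"""

# Made-up sample rows for illustration only.
SAMPLE_ROWS = """
INSERT INTO dim_department VALUES (1, 'Finance'), (2, 'Marketing');
INSERT INTO dim_employee   VALUES (10, 'Avery'), (11, 'Jordan');
INSERT INTO dim_date       VALUES (20240101, '2024-01'), (20240201, '2024-02');
INSERT INTO fact_purchase  VALUES
    (1, 10, 20240101, 120.0),
    (1, 11, 20240201, 80.0),
    (2, 11, 20240101, 300.0);
"""


def spend_by_department(conn: sqlite3.Connection) -> list[tuple[str, float]]:
    """Typical dimensional query: join the fact table to a dimension and aggregate."""
    return conn.execute(
        """SELECT d.department_name, SUM(f.amount)
           FROM fact_purchase AS f
           JOIN dim_department AS d ON d.department_key = f.department_key
           GROUP BY d.department_name
           ORDER BY d.department_name"""
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(STAR_SCHEMA + SAMPLE_ROWS)
    print(spend_by_department(conn))
```

Swapping `dim_department` for `dim_employee` or `dim_date` in the join answers the same question by employee or by month, which is the main appeal of the star layout.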
Benefits of Modeling Unstructured Data
Around 2008, database technologies advanced to handle non-relational data, both to counter drawbacks of the RDBMS and to discover new business opportunities. This modernization allowed data to be ingested, stored, transformed, and used in real time without requiring a predefined schema.
Consequently, Data Modeling became even more important for understanding an organization’s data, and its processes needed to evolve to keep up with rapid changes in data structures and their relationships.
Data modelers began iterating on their models for model-driven development, alongside the actual code and data deliverables. Modeling also drew on newer capabilities to capture the current context, infrastructure layout, and real-time requirements. These capabilities have proven invaluable as Data Management technologies have progressed to include cloud computing, which typically bills by resource usage, and AI, which can automatically identify improvements to data models and recommend them.
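Working without a predefined schema means the structure has to be discovered from the data itself, an approach often called schema-on-read. A minimal Python sketch, assuming hypothetical JSON order events, profiles each field's observed types and how often it appears:

```python
import json
from collections import defaultdict

# Hypothetical semi-structured records arriving without a predefined schema.
RAW_EVENTS = [
    '{"order_id": 1, "customer": "c-100", "total": 25.5}',
    '{"order_id": 2, "customer": "c-101", "total": 40, "coupon": "SPRING"}',
    '{"order_id": 3, "customer": "c-100", "items": ["sku-1", "sku-2"]}',
]


def infer_schema(raw_records: list[str]) -> dict[str, dict]:
    """Schema-on-read: derive each field's Python types and its share of records."""
    types = defaultdict(set)
    counts = defaultdict(int)
    for raw in raw_records:
        record = json.loads(raw)
        for field, value in record.items():
            types[field].add(type(value).__name__)
            counts[field] += 1
    total = len(raw_records)
    return {
        field: {"types": sorted(types[field]), "present_in": counts[field] / total}
        for field in sorted(types)
    }


if __name__ == "__main__":
    for field, profile in infer_schema(RAW_EVENTS).items():
        print(field, profile)
```

A profile like this is raw material for a model: `total` showing both `int` and `float`, or `coupon` appearing in only a third of records, are exactly the details a modeler would want to settle with data producers.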
Data Modeling in real time offers the following benefits (in addition to the benefits listed under modeling relational data):
- Recommends resources for cloud storage and computing optimization
- Increases cost-effectiveness of system utilization, whether on-premise or in the cloud
- Identifies necessary changes to existing data models based on patterns in data representation
- Reduces software errors and the time to fix them
- Manages risk better
- Narrows in on specific information sectors through small and wide data models
- Identifies new business opportunities based on changing data trends
- Clarifies the business uses of data and the technical needs those uses create
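Identifying needed model changes from patterns in the data can start with a plain comparison between the documented model and what actually arrives. In this sketch the documented model, the sample records, and the `min_share` threshold for proposing a new field are all assumptions for illustration.

```python
# Hypothetical documented model for an "order" entity: field -> expected type name.
DOCUMENTED_MODEL = {"order_id": "int", "customer": "str", "total": "float"}

# Hypothetical records observed in production.
OBSERVED_RECORDS = [
    {"order_id": 1, "customer": "c-100", "total": 25.5, "channel": "web"},
    {"order_id": 2, "customer": "c-101", "total": 40, "channel": "app"},
    {"order_id": 3, "customer": "c-100", "total": 12.0},
]


def propose_model_changes(documented: dict, records: list, min_share: float = 0.5) -> list:
    """Flag differences between a documented model and what the data shows.

    min_share is an assumed threshold: an undocumented field must appear in at
    least this fraction of records before we suggest adding it to the model.
    """
    suggestions = []
    total = len(records)
    observed = {}
    for record in records:
        for field, value in record.items():
            observed.setdefault(field, set()).add(type(value).__name__)
    for field, seen_types in sorted(observed.items()):
        share = sum(field in r for r in records) / total
        if field not in documented and share >= min_share:
            suggestions.append(f"add {field} ({'/'.join(sorted(seen_types))})")
        elif field in documented and seen_types != {documented[field]}:
            suggestions.append(f"review type of {field}: saw {sorted(seen_types)}")
    for field in sorted(set(documented) - set(observed)):
        suggestions.append(f"{field} no longer appears in the data")
    return suggestions


if __name__ == "__main__":
    for suggestion in propose_model_changes(DOCUMENTED_MODEL, OBSERVED_RECORDS):
        print(suggestion)
```

The output is a list of suggestions rather than automatic edits, so a person still decides which changes reach the model.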
Data Modeling Use Cases
High-quality data models have increased business productivity and the value organizations get from their data. See some examples below:
- Producing multivariate models by diagramming time series data to identify seasonal adjustments to collect data for the Census Bureau
- Using a metadata-driven modeling application for smoother integrations between legacy, non-relational, and proprietary database systems
- Constructing a knowledge graph of data entities and the context of their relationships to fuel a smart data catalog
- Creating a relational data model used for reporting on departmental purchases
- Generating no-code data models that accelerate the insurance underwriting process and improve customer data flow
- Creating a common data model from disparate source data systems to do supply chain planning
- Updating older data models to train AI algorithms used for precision marketing
- Modeling data through a knowledge graph to improve fraud detection
- Creating a data flow diagram (DFD), as shown below, to describe bank interactions and analyze them for efficiencies
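Knowledge graph use cases such as the fraud detection example rest on storing entities and their relationships as subject-predicate-object triples. As a hedged sketch with made-up accounts, phones, and devices, the function below surfaces pairs of accounts linked through the same phone or device, a pattern that can warrant a fraud review.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical knowledge graph stored as (subject, predicate, object) triples.
TRIPLES = [
    ("account:A", "has_phone", "phone:555-0101"),
    ("account:B", "has_phone", "phone:555-0101"),
    ("account:C", "has_phone", "phone:555-0199"),
    ("account:A", "uses_device", "device:d1"),
    ("account:C", "uses_device", "device:d1"),
]


def shared_links(triples, subject_prefix="account:"):
    """Find pairs of subjects connected through the same (predicate, object) node.

    Shared phones or devices between otherwise unrelated accounts can be a
    fraud signal; this function only surfaces them for human review.
    """
    by_object = defaultdict(set)
    for subject, predicate, obj in triples:
        if subject.startswith(subject_prefix):
            by_object[(predicate, obj)].add(subject)
    pairs = []
    for (predicate, obj), subjects in sorted(by_object.items()):
        for a, b in combinations(sorted(subjects), 2):
            pairs.append((a, b, predicate, obj))
    return pairs


if __name__ == "__main__":
    for link in shared_links(TRIPLES):
        print(link)
```

The same triples could also feed a data catalog: the relationships carry the context, so new questions become new traversals rather than new tables.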
How Does Data Modeling Work?
Data Modeling typically involves several steps. First, the data modeler needs to understand the business problem and its requirements. To do so, the data modeler combines reviewing existing documentation of data models, talking with stakeholders and potential data consumers to learn what they want, and keeping in touch with people involved in Data Governance about data definitions, metadata management, or standard usage.
If it is unclear what Data Architecture already exists, the data modeler uses a bottom-up method to piece together the existing systems. From there, the modeler identifies the technical areas used in business operations and clarifies business processes with those who own and use that data.
Once the data modeler has enough information to understand the business use cases and existing data components, they sketch a visual model of the data entities, relationships, and flows. Then, the data modeler shares the data model with data producers, consumers, and stakeholders for review and feedback on how it affects their work.
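For a relational system, the bottom-up step can start by reading the schema straight from the database catalog. This sketch uses SQLite's `sqlite_master` table and its `PRAGMA table_info` and `PRAGMA foreign_key_list` statements; other RDBMS products expose similar metadata (for example, through `information_schema`), and the `LEGACY_DDL` schema is hypothetical.

```python
import sqlite3

# Hypothetical legacy schema standing in for an undocumented production database.
LEGACY_DDL = """
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE invoice (
    invoice_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(customer_id),
    total REAL
);
"""


def reverse_engineer(conn: sqlite3.Connection) -> dict:
    """Bottom-up discovery: read tables, columns, and foreign keys from a live database."""
    model = {}
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    )]
    for table in tables:
        # Table names come from the catalog itself rather than from user input.
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = {row[1]: row[2] for row in conn.execute(f"PRAGMA table_info({table})")}
        # PRAGMA foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
        foreign_keys = [
            (row[3], row[2], row[4])  # (local column, referenced table, referenced column)
            for row in conn.execute(f"PRAGMA foreign_key_list({table})")
        ]
        model[table] = {"columns": columns, "foreign_keys": foreign_keys}
    return model


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(LEGACY_DDL)
    for table, details in reverse_engineer(conn).items():
        print(table, details)
```

The recovered structure shows what exists; the conversations with data owners described above explain why it exists and how it is used.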
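The visual model does not have to start in a drawing tool. One option, assumed here, is to keep entities and relationships as data and render them as Mermaid `erDiagram` text, which many documentation tools can display as a diagram for reviewers. The entities and the `to_mermaid` helper are hypothetical.

```python
# Hypothetical entities gathered during requirements discovery:
# entity -> list of (type, attribute, key marker).
ENTITIES = {
    "CUSTOMER": [("int", "customer_id", "PK"), ("string", "name", "")],
    "ORDER": [("int", "order_id", "PK"), ("int", "customer_id", "FK"), ("float", "total", "")],
}
# (parent, child, label): each parent relates to zero or more children.
RELATIONSHIPS = [("CUSTOMER", "ORDER", "places")]


def to_mermaid(entities: dict, relationships: list) -> str:
    """Render entities and relationships as Mermaid erDiagram source text."""
    lines = ["erDiagram"]
    for parent, child, label in relationships:
        # ||--o{ reads as "exactly one" to "zero or more" in crow's-foot notation.
        lines.append(f"    {parent} ||--o{{ {child} : {label}")
    for name, attributes in entities.items():
        lines.append(f"    {name} {{")
        for attr_type, attr_name, key in attributes:
            lines.append(f"        {attr_type} {attr_name} {key}".rstrip())
        lines.append("    }")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_mermaid(ENTITIES, RELATIONSHIPS))
```

Keeping the model as text makes review feedback easy to track: each revision is a diff that producers, consumers, and stakeholders can comment on.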
Getting Feedback on Data Models Toward Approved Data Architecture
Who should give feedback depends on the conceptual nature of the data model and the context for the business operations depicted by the model. Based on feedback and prioritization from critical stakeholders, the data modeler updates the visualization. Any changes to the data model go back for re-review before being implemented in production.
Central to data model feedback is getting agreement between business and IT about what to build, implement, or change. Approval of data definitions, usages, and relationships by the appropriate group, with sign-off from relevant representatives, makes the data model more applicable to the organization.
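The rule that changes go back for re-review before reaching production can be expressed as a small state check. In this minimal sketch, the `ModelRevision` class, the approver roles, and the definitions are hypothetical: a revision is ready only when every required approver has signed off, and any edit clears earlier approvals.

```python
from dataclasses import dataclass, field


@dataclass
class ModelRevision:
    """Hypothetical record of a data model moving through review."""
    definitions: dict[str, str]
    required_approvers: set[str]
    approvals: set[str] = field(default_factory=set)

    def approve(self, reviewer: str) -> None:
        # Only designated representatives can sign off.
        if reviewer not in self.required_approvers:
            raise ValueError(f"{reviewer} is not a designated approver")
        self.approvals.add(reviewer)

    def change(self, term: str, definition: str) -> None:
        # Any edit invalidates earlier sign-off, so the model goes back for re-review.
        self.definitions[term] = definition
        self.approvals.clear()

    @property
    def ready_for_production(self) -> bool:
        return self.approvals == self.required_approvers


if __name__ == "__main__":
    revision = ModelRevision(
        definitions={"customer": "A party that has placed at least one order"},
        required_approvers={"business", "it"},
    )
    revision.approve("business")
    revision.approve("it")
    print(revision.ready_for_production)  # both sides signed off
    revision.change("customer", "A party with an active account")
    print(revision.ready_for_production)  # the edit needs re-review
```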
Sometimes, existing data models need to be updated in real time through an app to address a critical issue, such as a blockage in data flow. In that situation, there may not be time to research the change and get a formal, signed-off review.
Doing so requires careful judgment, as altering a known good original data model can create unanticipated problems for other users outside of the fix. As a rule of thumb, Data Modeling works best when modelers follow established best practices.
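Before applying an urgent, unreviewed change, a quick comparison of the old and new versions of an entity can show whether other users are likely to be affected. This sketch assumes a simple rule of thumb that added columns are additive, while removed or retyped columns are breaking; real compatibility rules depend on how consumers actually use the data, and the column names are hypothetical.

```python
def classify_change(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Compare two versions of an entity (column -> type) and sort the differences.

    Assumed rule of thumb: added columns are usually safe for existing consumers,
    while removed columns or changed types can break them.
    """
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    retyped = sorted(c for c in set(old) & set(new) if old[c] != new[c])
    return {"additive": added, "breaking": removed + retyped}


if __name__ == "__main__":
    old_order = {"order_id": "INTEGER", "total": "REAL", "status": "TEXT"}
    new_order = {"order_id": "INTEGER", "total": "TEXT", "priority": "INTEGER"}
    result = classify_change(old_order, new_order)
    print(result)
    if result["breaking"]:
        print("Breaking change: route through review before applying.")
```

An empty `breaking` list does not prove a fix is safe, but a non-empty one is a strong hint that the change should wait for review.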
Image used under license from Shutterstock.com