Businesses can benefit from Data Modeling in a variety of important ways. Data models serve two primary purposes. They can be designed to represent the organization’s current data system, providing an understanding of how the data flows through an organization, or be developed to show a new desired data system.
The model can be used to create a new streamlined and efficient database, or to improve on a current one. Consequently, Data Modeling has become an important step in the process of developing and improving a database system.
Data Modeling provides a visual representation – typically in the form of a diagram – of how the data flows (or will flow) through a business. At its core, Data Modeling is about learning and understanding an organization’s data flow so that bottlenecks, inefficiencies, opportunities, and needed improvements can be identified.
The data model should be an extension of the database type. For example, using a relational data model with an object-oriented database might be a mistake.
Christopher Bradley, an information strategist at DMA Advisors, said in his presentation at a DATAVERSITY online conference, “The main purpose of a data model is actually not to design a database – it’s to describe a business.” Later he added, “There’s no one definitive statement about what a data model is, but data has to be understood to be managed – and data models are the best tool to provide that understanding.”
Data models can provide a blueprint for developing the optimum data flow for your business.
Data Governance as a Part of Data Models
When a data model is developed, the organization’s Data Governance program should be included as part of the process. Many of the goals are the same, and some experts state that regulatory compliance requirements, government policies, and business rules (typically assigned to the Data Governance program) should be a part of the Data Modeling process. By design, data models promote consistency in naming conventions and semantics, along with improved Data Quality, as do Data Governance programs.
If a Data Governance program does not currently exist, developing one as a part of the data model is ideal. Creating them separately would consume much more time than creating them simultaneously. (The data steward, a part of the Data Governance program, could be assigned responsibility for implementing and maintaining the data model.)
The Three Phases of Building a Data Model
There are three phases the data model moves through as it evolves: the conceptual phase, the logical phase, and the physical phase. Each phase provides a foundation for the next phase and supports understanding the business’s data flow as the model is developed. (There is software available for this process.)
The conceptual data model: The initial step in developing a data model is the conceptual phase. At this point, the goal is to identify the various entities, attributes, and their relationships within the system – without going into any significant detail.
An entity can be described as an object (for example: a person, a restaurant, a vehicle) whose data will be stored in the database. Attributes are bits of information about an entity (a class, as an entity, would have the names of the attending students as attributes). Relationships describe how these entities within the database relate to one another.
Conceptual data models are often considered a discovery stage in the data model’s development and should present the basic structure with a minimum of detail.
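As a rough illustration of the vocabulary used above, the following minimal sketch records entities, attributes, and a relationship as plain Python objects. The class names (`Entity`, `Relationship`) and the sample values are assumptions made for this example; they do not come from any particular modeling tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entity:
    """An object whose data will be stored, such as a person or a vehicle."""
    name: str
    attributes: List[str] = field(default_factory=list)


@dataclass
class Relationship:
    """A named link describing how one entity relates to another."""
    source: str
    verb: str
    target: str


def describe(rel: Relationship) -> str:
    """Render a relationship as a short sentence, as a conceptual diagram would."""
    return f"{rel.source} {rel.verb} {rel.target}"


# Hypothetical entities, chosen only for illustration.
student = Entity("Student", ["name", "email"])
course = Entity("Class", ["title", "room"])
enrollment = Relationship(student.name, "attends", course.name)

print(describe(enrollment))
```

At the conceptual stage this is about as much detail as is needed: names, a few attributes, and how the entities connect.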
The logical data model: This version of the data model adds another level of information to the conceptual model and expands on the framework. Relationships become a focus in this phase, and are expanded upon, becoming more detailed. The problems and issues that exist within the system should be listed during the logical data model phase.
When the logical data model is completed, designers and managers often step back to consider what is needed for a new database, or the changes needed to optimize an established database. During this phase, new software is often considered, as well as changes in staff behavior as they handle the data.
The physical data model: This third phase should be focused on creating a fairly detailed diagram of the system’s current data flows, but may also evolve into a diagram of a desired future model.
The physical data model is a more mature version of the logical model, and much more detailed. It should be based on an accurate, detailed representation of the business’s data flow.
The third phase should become a final actionable blueprint of the desired flow of data, with all the instructions needed to alter the data’s flow or build the database.
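When the target is a relational database, the "instructions needed to build the database" usually take the form of table definitions. The sketch below shows one possible shape for such a blueprint, using Python's built-in `sqlite3` module. The `customer` table, its columns, and the index name are hypothetical and exist only for this example.

```python
import sqlite3

# In-memory database, so the sketch leaves nothing behind on disk.
conn = sqlite3.connect(":memory:")

# A physical model spells out types, keys, constraints, and indexes.
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    full_name   TEXT NOT NULL,
    email       TEXT NOT NULL UNIQUE
);
CREATE INDEX idx_customer_name ON customer (full_name);
""")

# Read the column names back from the database to confirm the structure.
columns = [row[1] for row in conn.execute("PRAGMA table_info(customer)")]
print(columns)
```

The conceptual and logical models would name the customer entity and its attributes; only the physical model commits to data types, uniqueness rules, and indexes.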
Specific Data Models
The type of data model selected should be based on the type of database that is being used, or will be used. Selecting the type of database depends on the goals of the organization. Organizations have a selection of different designs for developing and visualizing a data model.
Different data models offer different designs and resolve different problems, and selecting the best fit requires a basic understanding of those models. Each can be modified and adjusted to suit the particular needs of a business.
Specific data models include:
Relational data models: This model maps out the various connections linking different tables of data. The relational database model has been the most popular model since the late 1970s. Its primary strengths are that it is very familiar, easy to use, and reasonably efficient. It is a mature model and works with an ever-increasing number of apps useful for doing business.
It uses tables and columns for data storage, and each table stores information that is relevant to a single entity; links between the tables are referred to as “relationships.”
A relational data model normally limits the types of data format it will accept, and (unless involved with the cloud) has limited storage.
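The idea of one table per entity, linked by relationships, can be sketched with Python's built-in `sqlite3` module. The `restaurant` and `vehicle` tables and their contents are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE restaurant (
    restaurant_id INTEGER PRIMARY KEY,
    name          TEXT NOT NULL
);
CREATE TABLE vehicle (
    vehicle_id    INTEGER PRIMARY KEY,
    plate         TEXT NOT NULL,
    -- The relationship: each vehicle belongs to one restaurant.
    restaurant_id INTEGER NOT NULL REFERENCES restaurant (restaurant_id)
);
INSERT INTO restaurant VALUES (1, 'Harbor Grill');
INSERT INTO vehicle VALUES (10, 'ABC-123', 1);
""")

# Follow the relationship with a join.
rows = conn.execute("""
    SELECT v.plate, r.name
    FROM vehicle AS v
    JOIN restaurant AS r ON r.restaurant_id = v.restaurant_id
""").fetchall()
print(rows)

# A vehicle pointing at a restaurant that does not exist is rejected.
try:
    conn.execute("INSERT INTO vehicle VALUES (11, 'XYZ-999', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```

The foreign key is what turns two independent tables into a relationship that the database itself can check.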
NoSQL data models: This model is not reinforced with, nor supported by, a relational Database Management system. As a consequence, it does not support relationships within the storage process. However, as a database system, it does have massive amounts of storage and will accept all kinds of data formats.
NoSQL databases are generally used for research purposes, primarily because of the massive amounts of data they can store. (Massive amounts of data used to be called “Big Data,” prior to big data becoming the norm.)
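To give a feel for how a store can accept differently shaped records, here is a minimal sketch of a document-style collection. It uses an ordinary in-memory dictionary and JSON text rather than any real NoSQL product, and the `find` helper and sample documents are hypothetical.

```python
import json

# Two documents with different fields; no shared table layout is required.
documents = [
    {"_id": 1, "type": "person", "name": "Ana", "tags": ["staff"]},
    {"_id": 2, "type": "vehicle", "plate": "ABC-123", "owner": {"name": "Ana"}},
]

# Store each document as JSON text, keyed by its identifier.
store = {doc["_id"]: json.dumps(doc) for doc in documents}


def find(store, **criteria):
    """Return every document whose top-level fields match all criteria."""
    results = []
    for raw in store.values():
        doc = json.loads(raw)
        if all(doc.get(key) == value for key, value in criteria.items()):
            results.append(doc)
    return results


vehicles = find(store, type="vehicle")
print(vehicles[0]["owner"]["name"])
```

Note that the link between the vehicle and its owner is embedded inside the document instead of being expressed as a relationship between tables.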
Hierarchical data models: This model resembles a tree structure, with a main trunk and branches (the tree is often drawn upside down). Descriptions of the hierarchical data model often use a parent/child analogy, with the parent being the main trunk or a prominent branch, and small branches described as children. (A prominent branch can be both a child of the main trunk and a parent of smaller branches.)
This is an early data model design and was replaced by the relational data model. If an organization works with small amounts of data, it can be quite efficient in helping to make decisions. However, it does not work well with the huge amounts of data normally flowing through modern businesses.
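The parent/child structure described above can be sketched as a small tree in Python. The `Node` class and the sample organization are assumptions made for this example.

```python
class Node:
    """One record in the hierarchy; every node has at most one parent."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []

    def add_child(self, name):
        """Create a child record under this node and return it."""
        child = Node(name, parent=self)
        self.children.append(child)
        return child

    def path(self):
        """Walk up to the trunk and return the full path to this node."""
        parts = []
        node = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))


# Hypothetical hierarchy: trunk, prominent branch, small branch.
company = Node("Company")
sales = company.add_child("Sales")
emea = sales.add_child("EMEA")

print(emea.path())
```

Here `sales` is both a child of `company` and a parent of `emea`, which matches the prominent-branch case in the analogy. Reaching any record means walking the tree from the trunk down (or from the record up).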
Object-oriented data models: A strength of this data model is its ability to closely model the “real world.” The object-oriented database communicates a more accurate and realistic representation of reality and can store and express all the relationships existing with other objects.
An additional strength is that objects can be transformed into complex objects that traditional models cannot easily cope with. Object-oriented databases work with object-oriented programming languages to promote efficient storage and retrieval.
Unlike many traditional databases, object-oriented databases can store a variety of data types, such as pictures, audio, and video. This type of database is becoming more popular, but finding technicians who are comfortable working with it can be a problem.
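The following sketch shows the kind of object an object-oriented model is meant to hold: one that inherits behavior, refers directly to other objects, and carries binary media. All class names and values are hypothetical, and the example covers only the in-memory objects, not how a particular object-oriented database would persist them.

```python
class Person:
    def __init__(self, name):
        self.name = name


class Photo:
    """Binary media kept as part of the object rather than in a separate store."""

    def __init__(self, filename, data):
        self.filename = filename
        self.data = data


class Vehicle:
    def __init__(self, plate, owner, photos=None):
        self.plate = plate
        self.owner = owner  # direct reference to another object
        self.photos = list(photos or [])

    def describe(self):
        return f"{self.plate} owned by {self.owner.name}"


class Truck(Vehicle):
    """A more specific kind of vehicle that extends the general one."""

    def __init__(self, plate, owner, payload_kg, photos=None):
        super().__init__(plate, owner, photos)
        self.payload_kg = payload_kg

    def describe(self):
        return f"{super().describe()}, payload {self.payload_kg} kg"


ana = Person("Ana")
truck = Truck("ABC-123", ana, 1200, [Photo("front.jpg", b"\xff\xd8")])

print(truck.describe())
```

The truck, its owner, and its photo form a single complex object, which is the shape that the object-oriented model stores without first breaking it into tables.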
Network data models: A network database is suited to situations where objects and their relationships need to be represented in a flexible way. Its advantages include flexibility in accessing data and the ability to handle relationships. Its disadvantages are the complexity of the system and the fact that, once established, the structure can be difficult to change.
The network data model is based on mainframe computers used for networking in the 1970s. (It is not currently a popular model.)
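One way to picture the flexibility of the network model is that a record may be owned by several other records at once, unlike the single parent of a hierarchy. The sketch below illustrates that idea with plain dictionaries. The set names, the `connect` and `owners_of` helpers, and the sample records are assumptions made for this example, not the interface of any network database product.

```python
from collections import defaultdict

# set name -> owner record -> list of member records
sets = defaultdict(lambda: defaultdict(list))


def connect(set_name, owner, member):
    """Attach a member record to an owner record within a named set."""
    sets[set_name][owner].append(member)


def owners_of(member):
    """List every (set name, owner) pair that the member belongs to."""
    return sorted(
        (set_name, owner)
        for set_name, by_owner in sets.items()
        for owner, members in by_owner.items()
        if member in members
    )


# Hypothetical records: one part linked to two suppliers and one product.
connect("supplies", "Supplier A", "Part 7")
connect("supplies", "Supplier B", "Part 7")
connect("used_in", "Product X", "Part 7")

print(owners_of("Part 7"))
```

The same part can be reached through any of its owners, which illustrates both the access flexibility and the structural complexity mentioned above.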
Graph data models: The graph Data Modeling process uses nodes (objects) and edges/links (relationships). Graph databases are schemaless and do not store data using columns and rows. During the development of the model, it is decided which entities/objects should be nodes, what the links/relationships are, and what data should be discarded. The model provides a blueprint of the data’s entities, relationships, and attributes. (Some do this modeling on a regular basis to eliminate unnecessary data.)
Graph data models are rapidly becoming popular as a method for developing artificial intelligence.
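The node-and-edge structure can be sketched with ordinary Python collections. The node identifiers, relationship names, and helper functions below are assumptions made for this example rather than the API of any graph database.

```python
# Nodes are objects with properties; there are no fixed columns.
nodes = {
    "alice": {"label": "Person", "name": "Alice"},
    "bob": {"label": "Person", "name": "Bob"},
    "acme": {"label": "Company", "name": "Acme"},
}

# Edges are (source, relationship, destination) triples.
edges = [
    ("alice", "KNOWS", "bob"),
    ("alice", "WORKS_AT", "acme"),
    ("bob", "WORKS_AT", "acme"),
]


def neighbors(node_id, relation):
    """Follow outgoing edges of one relationship type from a node."""
    return [dst for src, rel, dst in edges if src == node_id and rel == relation]


def coworkers(node_id):
    """Find other nodes that share a WORKS_AT destination with this node."""
    found = set()
    for company in neighbors(node_id, "WORKS_AT"):
        for src, rel, dst in edges:
            if rel == "WORKS_AT" and dst == company and src != node_id:
                found.add(src)
    return sorted(found)


print(neighbors("alice", "KNOWS"))
print(coworkers("alice"))
```

Deciding that people and companies are nodes while "knows" and "works at" are edges is exactly the modeling decision described above.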
Entity-relationship data models: These models provide a graphical presentation of an organization’s data structure. They are often used in combination with relational models. The entity-relationship model uses boxes of several different shapes, connected by lines, to communicate objects/entities and their various relationships.
This model offers a framework for analyzing, understanding, and designing databases, and can be used to design relational databases.
Maintaining a Big-Picture Perspective
When designing a data model, it is best to take a holistic view, and not focus on a single problem. Maintaining a perspective that encompasses a variety of problems, and seeking feedback from those who will be working with the system, will produce a more effective model. The model should be kept both as simple as possible and as close to reality as possible.
The data model requires regular updating and maintenance to ensure that changes made within the business are also reflected in the model. While most data models require little maintenance, a formal, scheduled updating process will keep the model up to date and fully functional.