In 2024, companies have developed a renewed interest in the benefits of Data Modeling, engaging in pragmatic planning and diagramming activities around their requirements. Organizations want to document their data architectures to achieve good Data Quality and overcome mounting operational and cultural challenges.
Notably, the average time to resolve a data incident rose by 15 hours between 2022 and 2023. Furthermore, 80% of data executives and business leaders say cultural impediments — people, business processes, and organizational alignment — prevent a data-driven approach.
However, past efforts in diagramming data architectures have proven difficult. Many organizations attempt to model the entire enterprise system at once or fail to understand the data solution they are building. Compounding these issues, some companies still rely on older Data Modeling tools, which can intimidate businesspeople.
Consequently, frustration grows within companies, leading to a tendency to skip modeling until after a data solution has been built – a code-first approach – or to settle for a rudimentary understanding of their data architectures through tribal knowledge. Unfortunately, these situations often result in a painful process of comprehending data systems and fixing problems retroactively.
To change this experience, pragmatic Data Modeling promises a smoother and more efficient design-first approach, empowering businesses to establish a shared understanding of the meaning and context of their data. Pascal Desmarets, founder and CEO of Hackolade, discussed the benefits of pragmatic Data Modeling and shared his expertise in creating visual tools for NoSQL or non-relational databases to show how such a modern approach leads to better experiences.
Adapting to NoSQL Data Architectures
Modern technologies embrace NoSQL database systems that scale quickly and process large volumes of data at speed. However, these systems speak very different languages.
So, data modelers have needed to adopt a different mindset. Desmarets explained:
“If organizations do their Data Modeling as they did in the past, with a relational database management system (RDBMS), they waste time. While different RDBMSs speak the same language with different SQL dialects, newer technologies communicate very differently. A graph data system written in Neo4j’s Cypher is uniquely constructed and differs from an Avro schema used to serialize and exchange data. Both have nothing to do with OpenAPI documentation.”
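To make that contrast concrete, here is a minimal sketch (ours, not Desmarets’) of how one hypothetical Customer entity might be expressed in two of the technologies he names. Every name and field below is illustrative.

```python
# Hypothetical "Customer" entity expressed two different ways; all names
# and fields are illustrative assumptions, not taken from the interview.

# In Neo4j, the entity exists as a graph node created by a Cypher statement.
cypher_stmt = """
CREATE (c:Customer {customer_id: $id, name: $name, email: $email})
"""

# In Avro, the same entity is a record schema used to serialize and
# exchange data between systems.
avro_schema = {
    "type": "record",
    "name": "Customer",
    "fields": [
        {"name": "customer_id", "type": "string"},
        {"name": "name", "type": "string"},
        {"name": "email", "type": ["null", "string"], "default": None},
    ],
}
```

Neither notation maps one-to-one onto the other, which is why each target technology needs its own deliberate translation of the same logical concept.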
Integrating NoSQL technologies thoughtfully into the larger data infrastructure is crucial for businesses seeking to seize new opportunities and mitigate emerging threats promptly. Despite the steep learning curve for modelers, the proliferation of these systems offers more options for event-driven architectures and microservices, a collection of services that provides a set of functional features for an application.
Data Architecture intricacy will only increase as developers apply open plug-in data structures or write their apps to use more boutique services. Moreover, many firms have a mosaic of different technologies in their data stacks and pipelines. For organizations to make sense of what they are building, Desmarets advises that Data Modeling tools must speak the language of all these technologies and adapt with consistent translations, an approach known as polyglot persistence.
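As a rough illustration of what such translation involves, the sketch below renders one hypothetical logical model into both an Avro schema and a SQL table definition. The logical_model structure, type mappings, and emitter functions are assumptions for illustration, not Hackolade’s actual mechanism.

```python
import json

# A hypothetical technology-neutral model; its structure is an assumption.
logical_model = {
    "entity": "Order",
    "attributes": {
        "order_id": "string",
        "amount": "decimal",
        "placed_at": "timestamp",
    },
}

# Consistent type translations per target technology.
AVRO_TYPES = {"string": "string", "decimal": "double", "timestamp": "long"}
SQL_TYPES = {"string": "VARCHAR(255)", "decimal": "NUMERIC(12,2)",
             "timestamp": "TIMESTAMP"}

def to_avro(model: dict) -> dict:
    """Render the logical model as an Avro record schema."""
    return {
        "type": "record",
        "name": model["entity"],
        "fields": [{"name": k, "type": AVRO_TYPES[v]}
                   for k, v in model["attributes"].items()],
    }

def to_sql(model: dict) -> str:
    """Render the same logical model as a SQL CREATE TABLE statement."""
    cols = ",\n  ".join(f"{k} {SQL_TYPES[v]}"
                        for k, v in model["attributes"].items())
    return f"CREATE TABLE {model['entity']} (\n  {cols}\n);"

print(json.dumps(to_avro(logical_model), indent=2))
print(to_sql(logical_model))
```

Because both outputs derive from one model, the translations stay consistent as the model evolves.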
The Benefit of Polyglot Persistence
Polyglot persistence prevents organizations from losing data, or ending up with inconsistent or wrong data, due to bad translations among Data Architecture components or schemas. Both kinds of failure are implicated when AI applications hallucinate, create incorrect recommendations, or retrieve the wrong results.
Desmarets explained:
“Schemas represent the data contracts used between data producers and consumers. These contracts need to enforce Data Quality and consistency. Data application systems evolve so quickly, with changes made during sprints. So, a data modeling tool like Hackolade, which supports over thirty target technologies, is indispensable for polyglot persistence and data exchanges.”
Since schemas travel in all directions across so many different non-relational technologies, polyglot persistence is necessary so that people and systems can communicate their data concepts effectively.
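One way to picture a schema acting as a data contract is a consumer that validates every incoming event before processing it. The sketch below assumes the third-party jsonschema package (pip install jsonschema) and a hypothetical order contract.

```python
# A sketch of a schema as a data contract between producer and consumer.
# The contract and event shapes are hypothetical.
from jsonschema import ValidationError, validate

order_contract = {
    "type": "object",
    "required": ["order_id", "amount"],
    "properties": {
        "order_id": {"type": "string"},
        "amount": {"type": "number", "minimum": 0},
    },
}

def consume(event: dict) -> None:
    # Reject contract violations at the boundary rather than letting
    # bad data flow downstream into analytics or AI applications.
    try:
        validate(instance=event, schema=order_contract)
    except ValidationError as exc:
        raise ValueError(f"Contract violation: {exc.message}") from exc
    print(f"Processing order {event['order_id']}")

consume({"order_id": "A-100", "amount": 42.5})  # conforms: processed
consume({"order_id": "A-101", "amount": -1})    # violates: raises ValueError
```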
Purposeful Modeling
Managers often seek solutions to their complex problems that are cost-effective and purposeful. In pursuit of this, some may be tempted to rely solely on industry models – predesigned data mappings tailor-made for a business sector – or on generative AI solutions.
However, starting with industry models or generative AI leads to more work rather than a good result. Desmarets observes that teams overestimate these tools’ ability to produce a competent model and assume they no longer need subject matter experts (SMEs).
Implemented on their own, these agnostic models can amount to a mere academic exercise that fails to address business needs. Businesspeople must fill in the gaps because they know the vocabulary, terms, and purpose behind the data.
Desmarets suggested using industry models or generative AI resources only after engaging with business experts, then consulting these technologies as a checklist to ensure necessary functionality is included. He stated:
“Include industry models or generative AI in data modeling, but not as a free-flow prompt that is your starting point. These tools do not function as a magic wand – say you are a bank manager and ask the technology to spit out a data model. That approach is never going to work.”
By involving SMEs in the data modeling process, organizations can ensure that the resulting model serves the specific purposes of their business. SMEs are invaluable resources for creating a data model with a clear and meaningful purpose.
The Benefit of Domain-Driven Design
As companies embrace the involvement of SMEs in modeling activities, Data Architecture decisions evolve from the responsibility of a few technical individuals to become a collaborative effort. This shift toward collaboration is further exemplified as organizations adopt data mesh, a decentralized sociotechnical Data Architecture approach to sharing, accessing, and managing analytical data.
Pragmatic Data Modeling brings significant benefits to organizations by emphasizing domain-driven design. According to Desmarets, domain-driven design principles derive from the domain-driven development methodology, which focuses on building systems from well-bounded components.
He stated that the key principles of domain-driven design include (see the sketch after this list):
- Breaking down complex problems into smaller, manageable pieces
- Using consistent terminology across the different phases of the project and business units
- Involving SMEs and working closely with them
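As a rough sketch of the first two principles, the example below breaks one business into two bounded contexts, each using the vocabulary its own SMEs would recognize. All contexts, entities, and fields are hypothetical.

```python
# Two bounded contexts from a hypothetical business, each modeled with
# the consistent terminology its own subject matter experts use.
from dataclasses import dataclass

# --- Sales context: SMEs here speak of "prospects" and "quotes". ---
@dataclass
class Prospect:
    prospect_id: str
    company_name: str

@dataclass
class Quote:
    quote_id: str
    prospect_id: str
    total: float

# --- Billing context: the same business, but SMEs here speak of
#     "accounts" and "invoices". Keeping contexts separate avoids
#     forcing one muddled vocabulary onto everyone. ---
@dataclass
class Account:
    account_id: str
    legal_name: str

@dataclass
class Invoice:
    invoice_id: str
    account_id: str
    amount_due: float
```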
In this context, integrating modeling tools with AI capabilities, such as those offered by Hackolade, becomes invaluable. These tools help SMEs “model and spell out data requirements better and more efficiently,” said Desmarets. By harnessing business professionals’ expertise and leveraging AI’s capabilities, organizations can better capture relevant query patterns and maximize the effectiveness of Data Modeling tools.
A Single Source of Truth
Designing and implementing a data solution works best when everyone is on the same page about what is available now and what needs to change. So, having a single source of truth is critical in getting to that shared understanding needed to run continuous integration/continuous delivery (CI/CD) pipelines.
Problematically, many companies can point to multiple applications that each act as a single source of truth – such as data catalogs, Databricks (a unified analytics platform), Collibra (a Data Governance platform), or any other Data Management suite. Desmarets cautioned:
“With multiple sources of truth, there is no longer any standardization because each version of a source diverges from the others. … As the development of Data Architecture happens so fast, the number of revisions mounts, resulting in many links in the chain. Results start to diverge, and it takes little time for schemas in production to differ too much from the baseline data models held by Data Governance.”
To address this challenge, Desmarets recommended synchronizing data models with all the places developers update and submit their code, such as GitHub, Jenkins, or other CI/CD pipeline tools. Consequently, engineers do not need to record their changes in a separate program they must learn, which improves their efficiency and reduces the risk of confusion from all the different versions. Moreover, the synchronization processes generate metadata about Data Architecture changes, providing additional understanding of the single source of truth.
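A hedged sketch of that synchronization idea: a step in the CI/CD pipeline reads the schema committed alongside the application code and pushes it to the downstream platforms that consume it. The file path, endpoint URLs, and publish_schema helper are hypothetical; a real integration would use each platform’s own API and authentication.

```python
# Hypothetical CI step: publish the schema committed with the code to
# downstream catalogs, so governance views never drift from the repo.
import json
import pathlib
import urllib.request

def publish_schema(schema: dict, target_url: str) -> None:
    """PUT the schema to a downstream platform (illustrative only)."""
    req = urllib.request.Request(
        target_url,
        data=json.dumps(schema).encode(),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    urllib.request.urlopen(req)  # real code: auth, retries, error handling

def ci_sync_step() -> None:
    # The schema lives in the same repository as the code, so the
    # version deployed is the version governed: one source of truth.
    schema = json.loads(pathlib.Path("schemas/order.avsc").read_text())
    for url in ("https://catalog.example.com/schemas/order",
                "https://governance.example.com/models/order"):
        publish_schema(schema, url)

if __name__ == "__main__":
    ci_sync_step()
```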
The Benefit of Metadata as Code
Organizations should use automated tools to synchronize data model versions across various systems through metadata, the information describing the Data Architecture. Desmarets suggested accomplishing this task by treating metadata as code. That way, developer updates synchronize with the other data applications and views.
He explained the principles of metadata as code (see the sketch below):
- Data models should align with the metadata as code originating from the same lifecycle or version.
- As developers deploy their code and changes to the system, the schema should automatically propagate to the target technologies, which also serve as sources of truth – e.g., data catalogs, Databricks, Collibra, etc.
- Data model synchronization should occur automatically, as implemented in the Hackolade suite.
With metadata as code, data models can be updated and kept accurate in real time, allowing businesses to manage Data Architecture updates efficiently and point to a single source of truth.
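To illustrate the first principle, the sketch below pins a model version in the application code and asserts at deploy time that the schema file in the same repository declares the matching version. The file layout and the doc-version property are assumptions for illustration.

```python
# Metadata as code: the data model and the application share one
# lifecycle, so drift between them can be caught automatically.
import json
import pathlib

# Bumped in the same commit as any schema change (illustrative).
APP_MODEL_VERSION = "2.3.0"

def check_model_alignment(schema_path: str = "schemas/order.avsc") -> None:
    schema = json.loads(pathlib.Path(schema_path).read_text())
    # "doc-version" is a hypothetical custom property on the schema.
    schema_version = schema.get("doc-version")
    if schema_version != APP_MODEL_VERSION:
        raise RuntimeError(
            f"Model drift: code expects {APP_MODEL_VERSION}, "
            f"schema declares {schema_version}"
        )

check_model_alignment()
```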
Conclusion
Pragmatic Data Modeling offers compelling benefits, as businesses recognize the importance of establishing a common understanding of data and its context for good Data Quality. Desmarets highlighted three key benefits:
- Polyglot persistence
- Domain-driven design
- Metadata as code
When revamping schemas, it is crucial to consider these functionalities.
Looking ahead, AI in Data Modeling practices promises to make Data Architecture updates seamless. Desmarets expects modeling tools to evolve from relying on user input to offering intelligent suggestions, providing valuable insights for better constructions. Who knows – future Data Modeling may even enable key customers to propose and sell recommendations back to their vendors, creating a win-win situation for everyone involved.