Think of an organization trying to create a single understanding of its information and of the instances of that data across its estate. Consider different groups of people contributing to and using this model from different perspectives and for varying reasons. And view this in the context of Data Governance, Data Architecture, or Business Intelligence. A seemingly simple task becomes as complicated as six blind men building a model of an elephant. Each blind man has a different perspective of the elephant. This is similar to stakeholders and staff, scattered across the organization, having different conceptions and implementations of Data Governance.
As a result, many organizations end up with silos of knowledge that are fundamentally different, owned by different groups and used for different purposes. This presents risk and cost to an organization where Data Governance is important. The data architect is located at the center of this and often has the most mature and detailed view of information and data. However, data architects struggle to unite the silos and the teams involved.
Jamie Knowles, Product Manager for IDERA, has listened to the pain of data architects. In a recent DATAVERSITY® interview, Knowles discussed the challenges of integrating different viewpoints of Data Governance. He provided suggestions on how the data architect can streamline Data Governance into an enterprise view of the organization's information ecosystem.
Three Different Implementations of Data Governance
Knowles sees problems with Data Governance arising from the different activities of separate “tribes.” Each tribe believes that it is concerned with knowledge of the organization's information and data.
The first tribe works within IT, traditionally taking ownership for information and data. They recognize that an organization needs to have both information and data conceptions connected into a single data model. Enter the data architect with traditional logical and physical data models. According to Knowles:
“Data architects connect the perceptions of information of users and consumers with technology. In the past, data architects constructed logical data models to understand information and use them to document and design databases held as physical data models. They will manage well-defined models rich in valuable knowledge.”
The second tribe is more recent and is a powerful group responsible for Data Governance. This group often resides outside of IT and has leadership in or close to the C-level group such as the Chief Data Officer.
“The Data Governance tribe provides standards for concepts that everyone needs to understand to do business, based on the values of the organization and their importance. For example, they need to define clearly what a customer means to an enterprise and what rules apply to them. This includes related concepts such as orders, employees, addresses, and products.”
The Data Governance tribe consists of a network of data stewards who catalog information as business terms and assign business owners to those concepts. Data stewards and owners take responsibility for business information, including attributes, rules, quality, and requirements. A business glossary shares this enterprise vocabulary with consistent meanings.
The third tribe is the data tribe that supports the Data Governance tribe:
“The data tribe deals with data, which is the raw material that makes up the information an organization needs. This includes documents, databases, and flat files. This data perspective conceives of Data Governance as lists of data assets, with inventories of technical metadata around the ingredients that comprise information.”
The data tribe establishes data custodians, typically IT or operations, to handle Data Governance. Data custodians document the structure of assets, accessibility and security details. They also capture data quality, and document technical usage. All of this information is held in a data dictionary. The data custodians classify their data assets in the data dictionary with the business glossaries to form a data catalog, which is designed for easier searching among available data sets.
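The relationship between the three documents can be made concrete with a small sketch. This is an illustration only, not how ER/Studio or any cataloging product stores its metadata: the class names, fields, and sample assets below are all hypothetical. It shows data dictionary entries being classified with business glossary terms, and a catalog built as an index from each term to the assets that carry it.

```python
from dataclasses import dataclass, field


@dataclass
class BusinessTerm:
    """A business glossary entry (hypothetical shape), maintained by a data steward."""
    name: str
    definition: str
    owner: str  # business owner assigned to the concept


@dataclass
class DataAsset:
    """A data dictionary entry (hypothetical shape), maintained by a data custodian."""
    location: str   # e.g. "crm.customers.email"
    data_type: str
    custodian: str
    terms: set[str] = field(default_factory=set)  # glossary terms used as classifications


def build_catalog(glossary, assets):
    """Index asset locations by the glossary terms that classify them."""
    catalog = {term.name: [] for term in glossary}
    for asset in assets:
        for term_name in asset.terms:
            if term_name not in catalog:
                # A classification must point at a term the stewards have defined.
                raise KeyError(f"{asset.location} uses unknown term {term_name!r}")
            catalog[term_name].append(asset.location)
    return {name: sorted(locations) for name, locations in catalog.items()}


# Sample data, invented for illustration.
glossary = [
    BusinessTerm("Customer", "A person or organization that places orders.", "Head of Sales"),
    BusinessTerm("Address", "A postal location used for delivery or billing.", "Head of Operations"),
]
assets = [
    DataAsset("crm.customers.email", "VARCHAR(255)", "IT Operations", {"Customer"}),
    DataAsset("crm.customers.postcode", "VARCHAR(10)", "IT Operations", {"Customer", "Address"}),
    DataAsset("billing.invoices.ship_to", "VARCHAR(400)", "Finance IT", {"Address"}),
]

catalog = build_catalog(glossary, assets)
print(catalog["Address"])
```

Searching the catalog by a business term such as "Address" then returns every asset classified with it, regardless of which system or custodian holds the data.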
The fourth and last tribe is concerned with the analysis of data to provide insights to the organization. This is the Business Intelligence tribe that is concerned with data warehouses, data lakes, and analysis tools.
“The Business Intelligence tribe has the challenge of providing accurate data in context to solve organizational questions. Knowing what data they have, where it resides, and the context and quality of that data is vital. Their goal is to make good data available to consumers quickly and efficiently. They also need to understand the rules that apply to data.”
Knowles sees each tribe operating Data Governance separately and not synchronized with each other.
A Unified Ecosystem of Information
A Data Governance toolset needs to tie together information and data in the main types of data models from different tools used by different tribes into a contiguous standardized Data Governance model across the organization. This single view is rarely seen in practice. Knowles said:
“Our customers have different tools. They may have Collibra or Informatica that owns the business glossary. The data cataloging tools may live somewhere in there as well or with another vendor. It is vital that we bring all these bodies of knowledge together. We want to allow people and tools to collaborate across this single understanding.”
Traditionally, the data architect has created logical models for designing a database, consolidating data systems, or migrating to new technology. Consider an organization that moves data to a shiny new data system, or to a data warehouse such as Snowflake in the cloud. This needs to be done within a Data Governance framework. There could be legal and security implications of moving sensitive data to foreign locations. The content of this data system needs to be well understood and the rules associated with it considered. If it is a data warehouse, then the data needs to be advertised to consumers along with its quality and context.
Enter IDERA’s ER/Studio platform. According to Knowles, ER/Studio takes care of the standard Data Architecture tasks, using data models to document and design new and modified data assets. Also, ER/Studio facilitates Data Governance Management as a toolset. Users can build one or more of the three Data Governance documents: a business glossary, a logical data model, and a data catalog. ER/Studio is unique in that it allows data architects to be core to this unification process connecting these models together in one toolset.
Extending Capabilities to Unify Data Governance
Knowles sees that ER/Studio streamlines informational, logical, and physical Data Governance implementations in a single place while performing the typical Data Architecture tasks such as documenting how and where to put data into a platform. Knowles and IDERA plan on taking these capabilities a step further by making Data Governance integration even more accessible.
In the coming year, ER/Studio will grow its Data Governance functionality, starting with the business glossary.
- Advanced Ontology Support: IDERA understands that the information model held as business glossary terms may synthesize into an ontology containing taxonomies of classified concepts. Knowles said:
“One may form the concept of a person of which a customer, employee and associate are different types each with sub types. Here the business glossary forms a taxonomy tree. But one also thinks of customers within a broader ontology. For example, a person that places an order for a product can be thought of as a customer making an order for a product. In this context, the customer person has specific attributes, rules, and constraints that we need to understand.”
ER/Studio has taxonomy capabilities within its business glossary tool. IDERA plans on adding capabilities to the business glossary for visualizing ontology models and easily finding the important information and where it resides. These visualizations will allow the user to explore the ontology and then see how this information is realized as data assets. The user will be able to ask the important questions: “What information is important to us?”, “How does it relate?”, and “Where is it?”
- Data Classification in Data Architect: As we have described, a core part of Data Governance is connecting the information model to knowledge of the data assets. Data architects are key to providing human knowledge on this to support other streams such as components for artificial intelligence and machine learning. ER/Studio will provide better capabilities to perform this classification process in the tool that they use for their core tasks. Knowles stressed the importance of using data architects in this way because of their unique knowledge and experience of the assets.
- Business Term Harvesting: This functionality uses the logical model. “Based on the hugely valuable knowledge capital in logical models, organizations can generate a list of business terms, their definitions, and relationships to seed their business glossaries,” stated Knowles.
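As a rough illustration of the harvesting idea in the last item, the sketch below walks a simplified logical model and emits candidate business terms with definitions and relationships. It is not ER/Studio's implementation or API: the `Entity` class, the `harvest_terms` function, and the sample model are hypothetical, and a real logical model carries far more detail.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """One entity in a simplified, hypothetical logical model."""
    name: str
    definition: str
    attributes: dict[str, str] = field(default_factory=dict)  # attribute name -> definition
    related_to: list[str] = field(default_factory=list)       # names of related entities


def harvest_terms(entities):
    """Turn entities and their attributes into candidate glossary terms."""
    terms = {}
    for entity in entities:
        terms[entity.name] = {
            "definition": entity.definition,
            "related": sorted(entity.related_to),
        }
        for attr, definition in entity.attributes.items():
            # Qualify attribute terms with the entity name so that the same
            # attribute name on two entities does not collide.
            terms[f"{entity.name} {attr}"] = {
                "definition": definition,
                "related": [entity.name],
            }
    return terms


# Sample model, invented for illustration.
model = [
    Entity("Customer", "A person who places an order for a product.",
           {"Name": "The customer's full legal name."}, ["Order"]),
    Entity("Order", "A request by a customer to purchase products.",
           {"Date": "The date the order was placed."}, ["Customer", "Product"]),
]

seed = harvest_terms(model)
for name in sorted(seed):
    print(name, "->", seed[name]["related"])
```

The output is only a seed list. Data stewards would still need to review the generated terms, merge duplicates, and assign business owners before the terms enter the glossary.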
Besides connecting to other Data Governance platforms, IDERA will increase support for database platforms in ER/Studio. This includes the recent support for Snowflake and Azure Synapse Analytics (formerly Azure SQL Data Warehouse), as well as enhanced features for Microsoft SQL Server. ER/Studio will also improve how its supported platforms integrate with cloud databases, and enhance its user interface, performance, and security. Finally, integration with the WhereScape product (owned by IDERA’s parent company, Idera, Inc.) is also key, because it can deliver a seamless journey from requesting information to sourcing, preparing, and delivering it through the warehouse.
Knowles cautioned that Data Governance initiatives take many forms according to each tribe's perspective and knowledge. One group can create a business glossary while another group creates a data catalog, like the blind men modeling the elephant. This presents risk to the organization. To mitigate this risk, these diverse understandings of information and data need to operate together within a Data Governance ecosystem.
“Data architects must be recognized for the knowledge that they hold and connect deeper into Data Governance, uniting different conceptions and implementations into a single, high-level information ecosystem viewpoint.”
Image used under license from Shutterstock.com