When you use Google, pick a movie from Netflix, talk to Siri or Alexa, or look for your nephew on Facebook, you’re benefiting from Knowledge Graph technology. DATAVERSITY® recently caught up with the three co-founders of TopQuadrant, CEO Irene Polikoff, CTO Ralph Hodgson, and CMO Robert Coyne, to get their perspective on Knowledge Graph technology and how it fits with today’s Data Governance needs.
Data Governance
Polikoff said that 15 to 20 years ago, what we now call Data Governance was called Enterprise Architecture. The goal at that time was to capture everything about an enterprise’s data processing — an expensive and time-consuming task Polikoff likened to “boiling the ocean.” Coyne agreed with Polikoff and added that in trying to document all of the connections that exist, a small group of very technical people would wallpaper a room with enormous charts. From that process, the technical side did get a sort of understanding, but it was too complicated and inaccessible for business users to benefit.
Efforts to make that task more manageable by limiting the size and number of data stores and applications did not hold up, Polikoff said: within a few years of each “rationalization” effort, companies would end up with twice as much data as when they started. “That’s just the nature of the business.”
As data stores and sources proliferated and their processing grew more complex, the need for Data Governance slowly emerged from the Data Management perspective. Understanding the context for data became more important, she said.
“If you’re trying to describe your data in order to better manage it, you have to describe it from the perspective of how it’s being used. Capturing the context surrounding the creation and use of data is necessary and that’s essentially what enterprise architecture is.”
Unlike typical operational systems that support transaction processing, a system for Data Governance is about connections across the digital landscape, so it’s essential to have a platform that is able to flexibly and incrementally build connections meaningful to different stakeholders, she said. “Now we see that the enterprise Data Governance space must be a fundamentally business-driven activity instead of an IT-driven activity,” added Coyne.
Hodgson said that relying on metadata for context requires fidelity, or “acuity,” with what’s going on. “So you don’t grab the metadata just once, you keep checking whether the metadata is the way it should be.” With Data Architecture, as soon as models are built and the “wallpaper” is up, something changes in the enterprise and the charts go out of date. “Change is the key driver behind the reframing of this as Data Governance instead of Data Architecture,” he said.
Coyne said that Data Governance is meant to provide a comprehensive view of the data lifecycle all the way from the business to the most technical elements. People sometimes assume that they have effective Data Governance in place because they have a small piece of that whole lifecycle. But a more connected, comprehensive lifecycle capability across the whole data ecosystem is what is needed in today’s world in order to achieve what he calls Data Governance 2.0.
“Many existing technical tools that have been here for a long time provide a piece, but going forward, we need a more comprehensive solution, and that’s where the Knowledge Graph approach plays a strong role,” he said.
Hodgson sees organizations with people at different levels having different relationships to the company’s metadata. Some are putting Data Governance to work.
“They’re not just collecting metadata, they’re wanting to do something — maybe it’s lineage, maybe it’s compliance, maybe it’s figuring out what they need to archive, how long they need to archive it for.”
Others are collecting metadata, and there are groups interacting with it because they’re working on a project. “Bringing all of this together requires a methodology that Data Governance hasn’t had before.”
Knowledge Graph Technology
Knowledge Graphs are often used in social networks, for fraud detection in online transactions, and as recommendation engines. A Knowledge Graph is adaptable, reusable, and accretive, combining flexibility with structure and meaning.
Powerful, enterprise-wide associations can be made using simple constructs with Knowledge Graphs and Graph Databases. Mirroring the way we think, a Knowledge Graph uses a set of nodes, edges, and properties to represent and store data. Relationships between data points often matter more than the individual points themselves.
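As a rough illustration of the nodes, edges, and properties described above, here is a minimal in-memory sketch in Python. The `Node`, `Edge`, and `Graph` classes and the sample asset names are hypothetical, invented for this example; they are not TopQuadrant's data model or any particular graph database's API.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A data point in the graph, with free-form properties."""
    node_id: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    """A labeled, directed relationship between two nodes."""
    source: str
    label: str
    target: str
    properties: dict = field(default_factory=dict)


class Graph:
    """Hypothetical toy property graph: nodes by id, edges in a list."""

    def __init__(self):
        self.nodes = {}
        self.edges = []

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = Node(node_id, properties)

    def add_edge(self, source, label, target, **properties):
        self.edges.append(Edge(source, label, target, properties))

    def neighbors(self, node_id, label=None):
        """Targets reachable from node_id in one hop, optionally by label."""
        return [
            e.target
            for e in self.edges
            if e.source == node_id and (label is None or e.label == label)
        ]


# Illustrative (made-up) enterprise assets and relationships.
graph = Graph()
graph.add_node("customer_table", kind="database table", owner="Sales")
graph.add_node("customer", kind="glossary term")
graph.add_node("crm_app", kind="application")
graph.add_edge("customer_table", "describes", "customer")
graph.add_edge("crm_app", "writes_to", "customer_table", since=2018)

print(graph.neighbors("crm_app"))
```

In this sketch the edges carry as much information as the nodes: the useful answer is which application writes to which table, and which business term that table describes.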
TopQuadrant defines a Knowledge Graph as an interconnected set of information, able to meaningfully bridge enterprise data silos, and provide a holistic view of the organization through relationships.
Because enterprise data is among an enterprise’s most important assets, capturing its full range of contexts (both technical and business) through connections across all assets in the enterprise ecosystem is foundational to effective Data Governance. That calls for an open, extensible, and smart approach, and Knowledge Graphs are one powerful way to provide it, said Hodgson.
Why are Knowledge Graphs important?
Hodgson outlines four key abilities Knowledge Graphs provide for the Data Modeling process:
- Extensibility: Able to accommodate diverse data and metadata that evolves over time
- Introspection/Query Ability: Models can be inspected to find what things are knowable and findable
- Semantic: The meaning of the data is stored within the graph alongside the data to understand connections
- Intelligence Enabling: The ability to infer dependencies and other relationships between objects
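The four abilities listed above can be loosely sketched with a plain set of subject-predicate-object triples in Python. This is only an assumption-laden toy: the predicate names (`subClassOf`, `type`, `ownedBy`, `retentionYears`) and the sample entities are made up for illustration, and a real deployment would use a standards-based store and query language, not hand-written loops.

```python
# Data and its meaning (the class hierarchy) live in the same graph.
triples = {
    ("Table", "subClassOf", "DataAsset"),
    ("Report", "subClassOf", "DataAsset"),
    ("DataAsset", "subClassOf", "Asset"),
    ("orders", "type", "Table"),
    ("orders", "ownedBy", "finance_team"),
}

# Extensibility: a new kind of fact is just another triple,
# with no schema migration needed in this toy model.
triples.add(("orders", "retentionYears", "7"))


def predicates(triples):
    """Introspection: which kinds of relationships does the graph hold?"""
    return sorted({p for _, p, _ in triples})


def superclasses(cls, triples):
    """Follow subClassOf links transitively from cls."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == "subClassOf" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def inferred_types(entity, triples):
    """Inference: direct types plus every superclass of those types."""
    direct = {o for s, p, o in triples if s == entity and p == "type"}
    inferred = set(direct)
    for cls in direct:
        inferred |= superclasses(cls, triples)
    return inferred


print(predicates(triples))
print(sorted(inferred_types("orders", triples)))
```

Nothing states directly that `orders` is an `Asset`; the sketch derives it from the class hierarchy stored alongside the data.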
TopQuadrant
TopQuadrant’s mission is to make information meaningful, thereby empowering people. They consider their commitment to Semantic Web standards a key part of their success. The Semantic Web is an enhancement of the current web where meaning (i.e. semantics) is machine processable. Having data that uses vocabulary that computers understand makes it easier to find, share, and combine data/information.
Polikoff said that customer demand led to managing vocabularies using recognized semantic standards, and the company then used Knowledge Graph technologies to extend the product and manage all types of metadata: structured, unstructured, business, technical, operational, etc. “You need to have some level of automation in order to assist people who work in the Data Governance space,” she said, so they added supervised Machine Learning, to make “Cognitive Data Governance” possible. As Hodgson remarked, “Data Governance is about connecting things, so we created a platform that leverages knowledge graphs to build comprehensive relationships.”
TopBraid
TopBraid Enterprise Data Governance solutions use Knowledge Graphs, Rules, and supervised Machine Learning to manage metadata and address all three aspects of Data Governance:
- Executive Governance: Creating controls, processes, and policies, or formalizing them if they already exist informally
- Representative Governance: Creating models of the information to be captured, such as glossaries, data sources, applications, reference data, and so on, and using those models to describe these assets
- Applied Governance: Using the information captured to address specific needs. For some, it could be the ability to assess the impact of a change in data sources. For others, it could be about tracing Data Lineage in order to satisfy regulatory compliance requirements. Yet another common goal is improving the quality and consistency of data
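Two of the Applied Governance needs mentioned above, assessing the impact of a change in a data source and tracing Data Lineage, both come down to walking connections in a graph. The Python sketch below assumes a hypothetical `feeds` mapping of made-up asset names; it illustrates the idea only and does not represent how TopBraid implements either capability.

```python
from collections import deque

# Hypothetical lineage edges: each asset maps to the assets it feeds.
feeds = {
    "crm_db": ["staging_customers"],
    "staging_customers": ["warehouse_customers"],
    "warehouse_customers": ["churn_report", "regulatory_filing"],
    "billing_db": ["warehouse_invoices"],
    "warehouse_invoices": ["regulatory_filing"],
}


def downstream(asset, feeds):
    """Impact analysis: everything affected if `asset` changes."""
    seen = set()
    queue = deque([asset])
    while queue:
        current = queue.popleft()
        for nxt in feeds.get(current, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def upstream(asset, feeds):
    """Lineage tracing: every source that contributes to `asset`."""
    reverse = {}
    for src, targets in feeds.items():
        for target in targets:
            reverse.setdefault(target, []).append(src)
    # Walking the reversed edges downstream is walking lineage upstream.
    return downstream(asset, reverse)


print(sorted(downstream("crm_db", feeds)))
print(sorted(upstream("regulatory_filing", feeds)))
```

The same set of connections answers both questions: walked forward it gives impact, walked backward it gives lineage.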
Hodgson said he’s now seeing metadata silos confronting people who need to cross the landscape of their ecosystem to “figure out what’s moving, where it’s coming from, where it’s going to, and how it affects things,” a situation he likens to a railroad system. In his view, the biggest problem the industry faces is that each vendor offers its own unique model. “We have an open architecture model. We use standards. That’s something people are beginning to be aware of and appreciate,” he said.
His role as CTO at TopQuadrant is to break down complexity and find the best way to provide a useful experience for business people as well as technical people. He sees increasing excitement about Knowledge Graphs (aka Semantic Technology) and the value that Data Governance can offer. “People don’t want just to have pretty pictures. When they do Data Governance, they want to be sure that it has an impact.”