One of the greatest strengths of graph databases is that they treat the “relationships” between data as being as important as the data itself. Many graph database tools can also display query results visually, as a graph. Graph databases are designed to hold data without restricting it to a fixed, predetermined model. As a consequence, they are very good at handling complex, exploratory queries that follow many interconnected relationships.
Given a query and a starting point, a graph database searches the surrounding data, collecting information from potentially thousands of nodes and relationships, while leaving any data outside the search perimeter untouched.
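A minimal Python sketch of this kind of local search follows. It is an illustration, not how any particular graph database implements traversal: it assumes the graph fits in an in-memory adjacency list, and the node names and the `max_depth` parameter (standing in for the search perimeter) are hypothetical.

```python
from collections import deque

# A tiny graph as an adjacency list: each node maps to the nodes it connects to.
# The node names are hypothetical, chosen only for illustration.
GRAPH = {
    "start": ["a", "b"],
    "a": ["c"],
    "b": ["d"],
    "c": ["e"],
    "d": [],
    "e": ["far_away"],
    "far_away": [],
}

def local_search(graph, start, max_depth):
    """Breadth-first search that stops expanding nodes max_depth hops from start.

    Returns the set of visited nodes; neighbors of nodes at the depth limit
    are never expanded, so data beyond the perimeter is left alone.
    """
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not look past the search perimeter
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, depth + 1))
    return visited

print(sorted(local_search(GRAPH, "start", max_depth=2)))  # ['a', 'b', 'c', 'd', 'start']
```

With `max_depth=2`, the nodes `e` and `far_away` are never reached; raising the limit widens the perimeter one hop at a time.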
Graph databases use nodes to represent entities such as accounts, businesses, people, or any other item being tracked. (In a relational database, a node is roughly equivalent to a row, also called a record or tuple.)
Edges are the lines connecting the nodes, and they represent relationships. For example, a chess club manager researching the relationship between Alice and Bob would query the graph database and be shown basic information about Alice and Bob, their relationship to each other, and their relationships with the chess club. (A query covering “all” members would return the same kind of basic information, but the result would be much larger and more complex.)
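The chess club example can be modeled in a few lines of Python. This sketch assumes a simple in-memory representation rather than a real graph database: nodes are stored in a dictionary, edges as (start, relationship, end) tuples, and the relationship names `MEMBER_OF` and `PLAYS_AGAINST` are hypothetical.

```python
# Nodes are entities; each edge is stored as (start_node, relationship, end_node).
# All names and relationship labels are hypothetical examples.
nodes = {
    "Alice": {"type": "Person"},
    "Bob": {"type": "Person"},
    "Chess Club": {"type": "Club"},
}
edges = [
    ("Alice", "MEMBER_OF", "Chess Club"),
    ("Bob", "MEMBER_OF", "Chess Club"),
    ("Alice", "PLAYS_AGAINST", "Bob"),
]

def relationships_of(name):
    """Return every edge that touches the given node, at either end."""
    return [edge for edge in edges if name in (edge[0], edge[2])]

# The manager's query: basic information plus all relationships for each person.
for person in ("Alice", "Bob"):
    print(person, nodes[person], relationships_of(person))
```

Querying for every member would simply loop over more nodes, which is why the result grows quickly with the size of the club.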
Edges, and the relationships they represent, are what make graph databases unique. Meaningful patterns can be found by examining the edges (relationships) connecting the nodes.
Edges can be either directed or undirected. In an undirected graph, edges have “no specific direction” and represent a two-way, or bi-directional, relationship; each edge can be traversed in both directions. In a directed graph, on the other hand, each edge points in a single direction, from one node to another, and represents a one-way relationship.
A directed graph could be used to represent a website: each page is a node, and each one-way edge is a hyperlink running from the linking page (the “start” or “from” node) to the linked destination (the “end” or “to” node). Facebook friendships are an example of an undirected graph, because once a friend request is accepted, the connection applies to both users equally. Communication goes back and forth in a two-way relationship.
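The difference between the two kinds of edges shows up in how an adjacency list is built. In this sketch (the page and person names are hypothetical), an undirected edge is simply stored in both directions, while a directed edge is stored only from its start node to its end node.

```python
from collections import defaultdict

def build_graph(edge_list, directed):
    """Build an adjacency list; undirected edges are stored in both directions."""
    adjacency = defaultdict(set)
    for start, end in edge_list:
        adjacency[start].add(end)
        if not directed:
            adjacency[end].add(start)  # a two-way relationship
    return adjacency

# Hyperlinks are one-way: home.html links to about.html, not the reverse.
links = build_graph([("home.html", "about.html")], directed=True)
# Friendships are two-way: if Alice and Bob are friends, each is the other's friend.
friends = build_graph([("Alice", "Bob")], directed=False)

print(links["about.html"])  # set(): no edge leads back to home.html
print(friends["Bob"])       # {'Alice'}
```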
Only databases that natively store relationships can store, process, and query connections between data efficiently. Other databases, such as relational (SQL) databases, can find relationships between data when responding to a query, but they must compute them with expensive JOIN operations.
Graph databases, on the other hand, are designed to store their connections (relationships) alongside the data itself, so following a relationship does not require a JOIN at query time. (In SQL database management systems, by contrast, most complex queries require JOINs.)
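The contrast can be sketched in Python. This is a deliberately simplified model, not a benchmark: the relational-style function emulates a JOIN by scanning two hypothetical tables to match ids, whereas the graph-style functions follow connections stored with each node. Real relational databases use indexes and optimized join algorithms, so the sketch only shows where the work happens, not how fast either approach is.

```python
# Relational style: people and friendships live in separate, hypothetical tables.
people = [
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"},
    {"id": 3, "name": "Carol"},
]
friendships = [
    {"person_id": 1, "friend_id": 2},
    {"person_id": 2, "friend_id": 3},
]

def friends_via_join(name):
    """Emulate a JOIN: match ids across both tables, then map ids back to names."""
    person_id = next(p["id"] for p in people if p["name"] == name)
    friend_ids = {f["friend_id"] for f in friendships if f["person_id"] == person_id}
    return [p["name"] for p in people if p["id"] in friend_ids]

# Graph style: each node stores its own connections.
graph = {"Alice": ["Bob"], "Bob": ["Carol"], "Carol": []}

def friends_via_graph(name):
    """Follow stored connections directly; no matching step is needed."""
    return graph[name]

def friends_of_friends_via_graph(name):
    """A second hop is just another lookup per neighbor."""
    return [fof for friend in graph[name] for fof in graph[friend]]

print(friends_via_join("Alice"), friends_via_graph("Alice"))  # ['Bob'] ['Bob']
print(friends_of_friends_via_graph("Alice"))                  # ['Carol']
```

Each additional hop (friends of friends, and so on) means another JOIN in the relational style, but only another lookup in the graph style.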
The origins of graph databases can be traced back to graph theory, which began with the work of the 18th-century Swiss mathematician Leonhard Euler.
In the last few years, two types of graph databases have gained significant popularity: knowledge graphs and property graphs. Both provide flexibility, a focus on relationships, and insights drawn from existing data. Property graphs have been described as simpler, and as a potential first step toward adopting a knowledge graph.
Property Graphs (Also called Labeled-Property Graphs)
A property graph uses nodes, relationships, labels, and “properties.” Both the nodes and the relationships connecting them are named and can store properties. Nodes can be given labels that mark them as part of a group. Property graphs use “directed edges,” so each relationship has a start node and an end node.
Relationships can also be assigned properties. This feature is useful for attaching additional metadata to the relationships between nodes.
Marko Rodriguez, cofounder and CEO of RReduX, stated:
“The term ‘property graph’ has come to denote an attributed, multi-relational graph. That is, a graph where the edges are labeled and both vertices and edges can have any number of key/value properties associated with them.”
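The property graph model described above can be sketched with two Python dataclasses. The class and field names, the `Person` and `Company` labels, and the `WORKS_FOR` relationship with its `since` and `role` properties are all hypothetical; real property graph databases define their own storage formats and query languages.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    id: int
    labels: set  # e.g. {"Person"}; labels group nodes together
    properties: dict = field(default_factory=dict)  # key/value pairs

@dataclass
class Relationship:
    start: Node  # every relationship has a start node...
    end: Node    # ...and an end node, so it is directed
    type: str    # the relationship's name, e.g. "WORKS_FOR"
    properties: dict = field(default_factory=dict)  # metadata about the connection

alice = Node(1, {"Person"}, {"name": "Alice"})
acme = Node(2, {"Company"}, {"name": "Acme"})
# Properties on the relationship describe the connection itself, not either node.
job = Relationship(alice, acme, "WORKS_FOR", {"since": 2020, "role": "analyst"})

print(f'{job.start.properties["name"]} -[{job.type} {job.properties}]-> {job.end.properties["name"]}')
```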
Labeled-property graphs were first developed in Sweden as part of an effort to build an enterprise content management (ECM) system. The developers chose a graph system that emphasized efficient storage, which in turn promoted fast query speeds and quick traversals across connected data. Property graph databases have focused on fast startup times and performance, but they continue to evolve and develop new capabilities.
Property graphs use relevant, easy-to-recognize labels for modeling data and its connections, so the data can be structured in ways that humans understand easily. When the data is modeled with these relevant terms, queries against it are easy to read.
Some of the basic features that make property graphs so popular include:
- User-friendly: The visual images of graphs, with nodes and their properties, are generally easy to understand
- No fixed schema: This feature works well with structured and semi-structured data
- Relationships have start and end points: Property graphs always have a start and end point (and a direction)
- Relationships are assigned properties: The edges can have values assigned to them which can specify such things as capacity, length, or other characteristics
- Internal IDs: Property graph databases internally assign IDs to nodes and edges, while relationship types, node labels, and property names are identified by text strings
Knowledge Graphs
A knowledge graph is basically a map of an organization’s data. It can be restricted to a specific domain, or used as an enterprise knowledge graph, mapping all the data a company has stored.
Knowledge graphs are sometimes called “semantic networks.” This is because they are based on the semantic web, a system designed to structure the metadata of web pages and their links, making them machine-readable. (Like property graphs, knowledge graphs also use directed edges.)
The semantic web, first proposed in 2001, never caught on widely because it developed a reputation for being too academic. However, Google adapted its basic concepts for its search engine in 2012 and called the result the Google Knowledge Graph. To stay competitive, Microsoft, Yahoo, LinkedIn, and others also built their own “knowledge graphs.”
Knowledge graphs use semantic information to provide context that supports human insight. A knowledge graph represents a collection of interlinked descriptions of entities (businesses, people, events, concepts) and provides a framework for data integration and analytics. Because they make metadata machine-readable, knowledge graphs have great potential for use with machine learning and artificial intelligence.
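Knowledge graphs built on semantic web standards such as RDF store facts as subject-predicate-object triples. The sketch below uses plain Python sets rather than an RDF store or a query language such as SPARQL; the predicate names are illustrative, and `match` is a hypothetical helper in which `None` acts as a wildcard. It answers a two-hop question by chaining two pattern matches.

```python
# Facts stored as (subject, predicate, object) triples.
# The predicate names are illustrative, not from any standard vocabulary.
triples = {
    ("Leonhard_Euler", "born_in", "Basel"),
    ("Basel", "located_in", "Switzerland"),
    ("Leonhard_Euler", "field", "Mathematics"),
}

def match(subject=None, predicate=None, obj=None):
    """Return the triples matching a pattern; None matches anything."""
    return {
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    }

# Two-hop question: in which country was Euler born?
city = next(iter(match("Leonhard_Euler", "born_in")))[2]
country = next(iter(match(city, "located_in")))[2]
print(country)  # Switzerland
```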
Several specific applications rely on knowledge graphs, including data-heavy services such as:
- Intelligent content
- Package reuse
- Content recommendations
- Knowledge graph-powered drug discovery
- Semantic searches
- Investment market intelligence
- Information discovery in regulatory documents
- Advanced drug safety analytics
Knowledge graphs are very useful when working with a data fabric. Their semantics (and their use of graphs) support the discovery layers and data orchestration in a data fabric. Combining the two makes the data fabric more flexible and easier to build out incrementally, which lowers risk and speeds up deployment.
This approach allows an organization to develop the fabric in stages. It can start with a single domain or a high-value use case, then gradually expand with more data, users, and use cases.
A data fabric architecture, combined with a knowledge graph, supports useful capabilities in many key areas. The combination can be used to connect related data from across silos with remarkable flexibility, connecting millions of related data points.
Businesses can gain considerable flexibility in data integration. Knowledge graphs can also make complicated data significantly easier to use and understand by establishing a semantic layer of business definitions.
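The idea of a semantic layer connecting silos can be sketched as follows. Everything here is hypothetical: two source systems describe the same customer with different field names, a dictionary maps each source field to a shared business term, and the shared customer identifier links the records into one set of triples. A real data fabric would rely on dedicated integration and knowledge graph tooling rather than a dictionary.

```python
# Two hypothetical source systems describe the same customer with different field names.
crm_records = [{"cust_no": "C-17", "full_name": "Dana Smith"}]
billing_records = [{"customer_ref": "C-17", "amount_due": 250.0}]

# A hypothetical semantic layer: each source field maps to one shared business term.
SEMANTIC_LAYER = {
    "cust_no": "customer_id",
    "customer_ref": "customer_id",
    "full_name": "customer_name",
    "amount_due": "outstanding_balance",
}

def to_triples(records, source):
    """Translate source records into (entity, business_term, value) triples."""
    triples = set()
    for record in records:
        terms = {SEMANTIC_LAYER[key]: value for key, value in record.items()}
        entity = terms.pop("customer_id")  # the shared key links the silos
        for term, value in terms.items():
            triples.add((entity, term, value))
        triples.add((entity, "found_in", source))  # keep track of provenance
    return triples

graph = to_triples(crm_records, "CRM") | to_triples(billing_records, "Billing")
for triple in sorted(graph, key=str):
    print(triple)
```

Adding another silo in this sketch means adding its field mappings to the semantic layer, which mirrors the incremental, domain-by-domain growth described above.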
Natural language processing (NLP) is the branch of artificial intelligence that allows computers to understand both text and spoken words. Recent breakthroughs in NLP (question answering, entity recognition, relation classification, text classification) have made it necessary for businesses to use NLP to stay competitive.
Knowledge graphs, when combined with NLP, become a powerful tool for data mining and research. NLP packages can be used to build a knowledge graph data model that offers useful data insights.
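The pipeline from text to a knowledge graph can be illustrated with a toy extractor. This is an assumption-laden stand-in, not an NLP package: a single hand-written regular expression plays the role that trained entity recognition and relation classification models would play, and the company and person names are hypothetical. Each extracted (subject, relation, object) triple could then be added to a graph.

```python
import re

# Toy stand-in for an NLP pipeline: one hand-written pattern recognizes sentences
# shaped like "<Entity> <relation> <Entity>." with a small fixed set of relations.
PATTERN = re.compile(r"^([A-Z][\w ]*?) (acquired|founded|employs) ([A-Z][\w ]*?)\.?$")

def extract_triples(text):
    """Split text into sentences and return (subject, relation, object) triples."""
    triples = []
    for sentence in re.split(r"(?<=\.)\s+", text.strip()):
        found = PATTERN.match(sentence)
        if found:
            triples.append(found.groups())
    return triples

text = "Acme acquired Globex. Jane Doe founded Acme. The weather was mild."
print(extract_triples(text))  # [('Acme', 'acquired', 'Globex'), ('Jane Doe', 'founded', 'Acme')]
```

The sentence about the weather yields no triple, which hints at why real systems depend on trained models: a fixed pattern only captures the relations it was written for.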
Some semantic platforms have incorporated both artificial intelligence and knowledge graphs. The PoolParty Semantic Suite appears to be a leader in combining the two.
Having an AI-supported knowledge graph that can display and link data assets can provide a competitive edge. Andreas Blumauer, CEO and founder of the Semantic Web Company, said:
“We can finally use a really large amount of data and make complex queries over it, validate different data sources at the same time and put them together.”