Generative AI has huge potential, but it also faces problems. When generative AI produces information that is not factually accurate in response to a user request, a so-called hallucination, it can have a big impact on users. Relying on large language model (LLM) training data alone is not enough to prevent hallucinations. According to the Vectara Hallucination Leaderboard, GPT-4 Turbo has a hallucination rate of 2.5%, followed by Snowflake Arctic at 2.6% and Intel Neural Chat 7B at 2.8%.
To deal with this issue and improve results, retrieval-augmented generation (RAG) lets users draw on their own company data through vector search. However, RAG is not perfect either. When a company's documents frequently reference each other, or when the same data is repeated across different documents, a purely vector-search-based approach becomes less effective.
The issue here is that RAG focuses on information similar to the question prompt in order to return results. This makes it harder to answer questions that involve multiple topics or that require multiple hops, as vector search finds results matching the prompt but cannot jump to other linked results.
As an example, say that you have a product catalog with files on each product. Some of those products may be very similar, with minor differences in size or additional functionality depending on which version you look at. When a customer asks about a product, you would want your LLM to respond with the right information about the category and about any specific product features too. You would not want your LLM to recommend one product that doesn’t have the right features when another in the same line does. Product documentation may also reference other information, e.g., through a link in the document, which means the chunk returned may not give the end user the full picture.
Combining RAG and Knowledge Graph Data
To overcome the problem of including the right level of detail, we can combine RAG with knowledge graphs, so that we can point to more specific files with the right data for a response. A knowledge graph represents distinct entities as nodes, and edges indicate the relationships between those entities. For instance, a knowledge graph can use connections between nodes to represent conditions and facts that might otherwise confuse the LLM because they seem similar.
When used for RAG, entities relevant to the question are extracted, and then the knowledge sub-graph containing those entities and the information about them is retrieved. This approach allows you to extract multiple facts from a single source that are associated with a variety of entities within the knowledge graph. It also means you can retrieve just the relevant facts from a given source rather than the whole chunk, which might include irrelevant information.
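The retrieval step described above can be sketched in a few lines of Python. This is a minimal illustration rather than a production design: the product names, the `Edge` type, and the substring-based `extract_entities` helper are all hypothetical, and a real system would more likely use an LLM or a named-entity model for the extraction step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """One fact in the graph: source --relation--> target."""
    source: str
    relation: str
    target: str


# Hypothetical product facts, invented for illustration only.
EDGES = [
    Edge("Trail Pro", "is_a", "Tent"),
    Edge("Trail Pro", "sleeps", "2 people"),
    Edge("Trail Lite", "is_a", "Tent"),
    Edge("Trail Lite", "sleeps", "1 person"),
]


def extract_entities(question, edges):
    """Naive extraction: keep every known node name that appears in the question."""
    names = {e.source for e in edges} | {e.target for e in edges}
    lowered = question.lower()
    return {name for name in names if name.lower() in lowered}


def retrieve_subgraph(entities, edges):
    """Return only the edges that touch one of the extracted entities."""
    return [e for e in edges if e.source in entities or e.target in entities]


question = "How many people does the Trail Pro sleep?"
entities = extract_entities(question, EDGES)
facts = retrieve_subgraph(entities, EDGES)
```

Only the two facts about the matched product end up in `facts`, so the prompt sent to the LLM carries those facts rather than a whole document chunk that also describes the similar product.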
Alongside this, it means that you can deal with the problem of having multiple sources that include some of the same information. In a knowledge graph, each of these sources would produce the same node or edge. Rather than treating each of these sources as a distinct fact and then retrieving multiple copies of the same data, that repeated data will be treated as one node or edge and thus retrieved only once. In practice, this means that you can then either retrieve a wider variety of facts to include in the response, or allow your search to focus only on facts that appear in multiple sources.
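One way to picture this merging is to key each fact by its (subject, relation, object) triple and record which sources mentioned it. The sketch below assumes the triples have already been extracted; the file names and facts are hypothetical.

```python
from collections import defaultdict

# (source document, (subject, relation, object)) pairs; all values are made up.
extracted = [
    ("manual.pdf", ("Trail Pro", "sleeps", "2 people")),
    ("faq.html", ("Trail Pro", "sleeps", "2 people")),
    ("faq.html", ("Trail Pro", "is_a", "Tent")),
]


def merge_facts(pairs):
    """Collapse repeated triples into one edge, remembering every source."""
    merged = defaultdict(set)
    for doc, triple in pairs:
        merged[triple].add(doc)
    return dict(merged)


graph = merge_facts(extracted)

# Optionally keep only the facts that at least two sources agree on.
corroborated = [triple for triple, docs in graph.items() if len(docs) >= 2]
```

Three extracted statements become two edges, and the source sets make it possible to filter for facts that appear in more than one document.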
Knowledge graphs also make it easier to find related information that is relevant for a request, even when it is two or three steps away from the initial search. In a conventional RAG approach, you would have to carry out multiple rounds of querying to get the same level of response, which takes more computation and is likely to cost more as well.
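A bounded breadth-first traversal is one simple way to collect everything within a few hops of a starting entity. The adjacency list below is a hypothetical example, and `neighbors_within` is an illustrative helper, not part of any library.

```python
from collections import deque

# Hypothetical adjacency list: each node maps to the nodes it links to.
GRAPH = {
    "Trail Pro": ["Trail Pro Footprint"],
    "Trail Pro Footprint": ["Care Guide"],
    "Care Guide": ["Warranty Policy"],
    "Warranty Policy": [],
}


def neighbors_within(graph, start, max_hops):
    """Breadth-first search that stops expanding once max_hops is reached.

    Returns a dict mapping each reachable node to its hop distance from start.
    """
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in graph.get(node, []):
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    return distance


reachable = neighbors_within(GRAPH, "Trail Pro", max_hops=2)
```

With a limit of two hops, the care guide linked from the accessory is reached, while the warranty policy three hops away is left out.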
Creating Knowledge Graphs for Use Alongside RAG
To create and use a knowledge graph as part of your overall generative AI system, you have several options. For instance, you may want to import an existing set of data that you know is accurate already. Alternatively, you can create your own knowledge graph from your data directly, which can be beneficial when you want to curate your information and check that it is accurate. However, this can be time-intensive and difficult to keep updated when you have a large amount of data, or when you want to add new information quickly.
One interesting approach is to use your LLM to extract information from your content and summarize the data. This automated approach can make it easier to manage information at scale, while still giving you the up-to-date knowledge graph that you need. As an example, you can use LangChain's LLMGraphTransformer to take a set of existing unstructured data, apply a structure, and then organize that data. You can then use prompt engineering and knowledge engineering to refine the automated extraction so that it produces a relevant knowledge graph.
Storing the Knowledge Graph Data
Once you create the knowledge graph, you will have to store it so it can be accessed and used for requests. At this point, you have two options: use a dedicated graph database to store the whole graph, or add the knowledge graph to your existing database.
While it may seem intuitive to use a graph database to store your knowledge graph, it isn’t actually necessary. A full graph database is worthwhile if you plan to run full graph queries using the likes of Gremlin or Cypher. However, graph databases are designed for more complex queries that search for paths with specific sequences of properties, i.e., graph analytics. That capability is overkill for retrieving a knowledge sub-graph, and it opens the door to other problems, such as queries whose performance goes off the rails.
Retrieving the knowledge sub-graph around a few nodes is a simple graph traversal, so you may not need the full capabilities of a dedicated graph database. Traversals often go only two or three hops deep, because anything further away is unlikely to be relevant to the specific vector search query in any case. This means that your requests will normally be expressed as a few rounds of simple queries (one for each step) or a SQL join. In effect, the simpler you can keep your queries, the better the quality of the results that you can then provide to your LLM.
Adopting these simpler, coarse-grained knowledge graphs eliminates the need for a separate graph database and makes it easier to use knowledge graphs with RAG. It also simplifies operations, as you can carry out transactional writes to both the graph and other data in the same place. A side benefit is that it should be easier to scale up the amount of data that you have for querying too.
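To make the SQL join case concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `edges` table layout and the sample rows are assumptions made for illustration; any relational store with a similar table would work in the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: one row per edge in the knowledge graph.
conn.execute("CREATE TABLE edges (source TEXT, relation TEXT, target TEXT)")
conn.executemany(
    "INSERT INTO edges VALUES (?, ?, ?)",
    [
        ("Trail Pro", "has_accessory", "Footprint"),
        ("Footprint", "documented_in", "Care Guide"),
        ("Trail Lite", "has_accessory", "Stuff Sack"),
    ],
)


def two_hop(connection, start):
    """Follow two edges from start by joining the edges table to itself."""
    return connection.execute(
        """
        SELECT e1.source, e1.relation, e1.target, e2.relation, e2.target
        FROM edges AS e1
        JOIN edges AS e2 ON e2.source = e1.target
        WHERE e1.source = ?
        """,
        (start,),
    ).fetchall()


paths = two_hop(conn, "Trail Pro")
```

Each additional hop is one more self-join, or one more round of the same single-table lookup, which is why a traversal of this depth does not need a graph query language.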
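As a rough illustration of writing graph data and other data in one transaction, the sketch below stores a document and the edges extracted from it in the same SQLite database. The table names and the `ingest` helper are hypothetical; the point is only that both writes succeed or fail together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: source documents and graph edges live side by side.
conn.execute("CREATE TABLE documents (id TEXT PRIMARY KEY, body TEXT)")
conn.execute(
    "CREATE TABLE edges (source TEXT, relation TEXT, target TEXT, doc_id TEXT)"
)


def ingest(connection, doc_id, body, triples):
    """Store a document and its extracted edges atomically."""
    # Used as a context manager, the connection commits on success
    # and rolls back if any statement inside the block raises.
    with connection:
        connection.execute(
            "INSERT INTO documents VALUES (?, ?)", (doc_id, body)
        )
        connection.executemany(
            "INSERT INTO edges VALUES (?, ?, ?, ?)",
            [(s, r, t, doc_id) for s, r, t in triples],
        )


ingest(conn, "doc-1", "The Trail Pro sleeps two.", [("Trail Pro", "sleeps", "2 people")])
```

If the edge data is malformed, the document row inserted a moment earlier is rolled back too, so the graph and the documents it was built from do not drift apart.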
Planning Ahead Around RAG and Knowledge Graphs
For projects where you have a lot of data that you want to make available for generative AI, RAG is the natural choice. However, you may need to combine RAG with other techniques to improve the accuracy of your responses. Using knowledge graphs with RAG helps you overcome the issue of having multiple similar documents or content assets. By looking at how you can combine these data techniques, you can deliver better results for your users without having to implement and manage multiple data platforms.