Hospitals are one of the best examples to spotlight the complexities of unstructured data. Between physicians’ notes in electronic health records (EHRs), emails, text files, photos, videos, and other files, the majority of patient data cannot be read by machines. Research firm IDC estimates that the world’s data will grow from 33 zettabytes in 2018 to 175 zettabytes by 2025, and that upwards of 80% of it will be unstructured. The hospital example illustrates the core challenge: making sense of unstructured data when it is stored across disparate systems. Health care is just one prominent sector awash with unstructured information that could yield critical clinical and business insights. That’s where graph technology comes in.
The (Unstructured) Data Deluge
Graphs are one way to contextualize and explain data, and the graphs themselves can grow particularly large, with data sizes of 10 to 100 terabytes. Graph technology has proven especially beneficial when data is large, continually evolving, and rife with high-value relationships.
Knowledge graphs – which make connections between seemingly disparate sources to provide specific business insights – have existed for decades. Historically, they have been associated with search engines such as Google, which use them to enhance and hasten search results, and with social networks such as LinkedIn and Facebook, which use them to understand their users and surface relevant content (including relevant ads and common friend connections).
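As a minimal illustration of the idea (a sketch, not any vendor’s product; the entities and relationship labels below are hypothetical), a knowledge graph can be modeled as typed nodes joined by labeled edges, and the “common friend connections” above fall out of a simple two-hop lookup:

```python
import networkx as nx

# Minimal knowledge-graph sketch: typed nodes joined by labeled edges.
# All entities and relations here are hypothetical placeholders.
kg = nx.MultiDiGraph()
kg.add_node("alice", type="user")
kg.add_node("bob", type="user")
kg.add_node("carol", type="user")
kg.add_edge("alice", "bob", relation="friend_of")
kg.add_edge("bob", "carol", relation="friend_of")

def friends(g, user):
    """Direct 'friend_of' neighbors of a user."""
    return {v for _, v, d in g.out_edges(user, data=True)
            if d["relation"] == "friend_of"}

# Friends-of-friends: the two-hop pattern behind "people you may know."
direct = friends(kg, "alice")
fof = {v for f in direct for v in friends(kg, f)} - direct - {"alice"}
print(fof)  # {'carol'}
```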
In recent years, graph computing companies have proliferated, with the benefits of graph databases, analytics, and AI trickling down from the big tech titans to a slew of organizations and industries. Gartner predicts that by 2025, graph technologies will be used in 80% of data and analytics innovations, up from a mere 10% in 2021. This raises the obvious question: given graph technology’s long legacy, why is demand for it ballooning now?
Barriers to Data Insights
One barrier to embracing graphs has been earlier approaches to gleaning insights from unstructured datasets. Traditional graph databases aimed to address the shortcomings of relational databases but were not built with analytics in mind. As a result, organizations hit performance limitations when traversing massive knowledge graphs or processing queries at low latency and scale.
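To see why deep traversals are the pain point, consider a k-hop neighborhood query, sketched below in plain Python (the graph is a toy placeholder). Each additional hop multiplies the frontier by the average degree – and in a relational database, each hop costs another self-join – which is where engines not built for analytics hit their limits:

```python
from collections import deque

def k_hop_neighborhood(adj, start, k):
    """Breadth-first expansion to depth k over an adjacency-list graph."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand past k hops
        for nbr in adj.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, depth + 1))
    return seen - {start}

# Toy adjacency list; production workloads traverse billions of edges.
adj = {"a": ["b", "c"], "b": ["d"], "c": ["d", "e"], "d": ["f"]}
print(k_hop_neighborhood(adj, "a", 2))  # {'b', 'c', 'd', 'e'}
```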
Another barrier has been the lack of standardization in graph technology, which makes it costly for an organization to move from one legacy database to another. Today, the industry still needs to make strides in cultivating the right tools and open standards, such as common libraries that let data scientists easily process large-scale graphs.
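Open-source libraries such as networkx hint at what such common tooling looks like, albeit at single-machine scale; the snippet below runs a standard graph analytic (PageRank) on a small built-in sample graph. At the billions-of-edges scale discussed later, distributed engines would be required:

```python
import networkx as nx

# networkx is one widely used open-source Python library for graph work.
# karate_club_graph() is a small built-in sample graph; at enterprise
# scale a distributed engine would be needed instead.
g = nx.karate_club_graph()

# PageRank scores each node by its position in the relationship structure.
scores = nx.pagerank(g, alpha=0.85)
print(sorted(scores, key=scores.get, reverse=True)[:3])  # most central nodes
```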
From Niche to Norm
For data-forward organizations, there are a few key solutions. A unified platform – one that combines graph query, graph analytics, graph mining, and responsive graph AI – can offer unparalleled flexibility and scalability in understanding massive datasets. Such a platform can bring together disparate systems and reduce time to insight – how long it takes for an analysis of the data to produce actionable feedback. That, in turn, lets those insights be shared faster, facilitating quicker decision-making and fostering innovation. The rate of insight matters because industries that rely on graph computing benefit directly from real-time intelligence, such as monitoring network traffic for suspicious activity and alerting teams the moment such activity is discovered.
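The network-monitoring example can be reduced to a small sketch: treat each observed connection as an edge in a streaming graph and raise an alert when a host’s fan-out crosses a threshold. The threshold and the flow stream below are simplified assumptions, not a production detection rule:

```python
from collections import defaultdict

# Simplified assumption: more than 3 distinct destinations is "suspicious."
FANOUT_THRESHOLD = 3
contacts = defaultdict(set)

def ingest_flow(src, dst):
    """Record one observed connection and alert on unusual fan-out."""
    contacts[src].add(dst)
    if len(contacts[src]) > FANOUT_THRESHOLD:
        print(f"ALERT: {src} has contacted {len(contacts[src])} distinct hosts")

# Hypothetical stream of (source, destination) network flows.
for dst in ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4", "10.0.0.5"]:
    ingest_flow("10.0.9.9", dst)
```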
For virtually every industry – from financial services to health care and pharmaceuticals – analytics and intelligence are only as good as an organization’s ability to truly understand and act on the vast amounts of data it holds. Beyond a unified platform, another option is to create a metadata layer on top of the disparate systems – an approach often called a “data lakehouse” – and then build a knowledge graph on top of that metadata. In this case, the metadata extracts information from the disparate underlying systems and unifies it into a knowledge graph that can provide actionable insights.
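A rough sketch of that pattern, with hypothetical source systems and fields: each record in the underlying stores contributes only lightweight metadata, which is linked to a shared patient entity so that one graph query spans systems that were never integrated:

```python
import networkx as nx

# Hypothetical records living in two separate systems.
ehr_notes = [{"patient_id": "p1", "note": "elevated glucose", "source": "ehr"}]
imaging = [{"patient_id": "p1", "study": "chest x-ray", "source": "pacs"}]

kg = nx.Graph()
for i, record in enumerate(ehr_notes + imaging):
    pid = record["patient_id"]
    kg.add_node(pid, type="patient")
    # Each record contributes only its metadata; the raw file stays put.
    record_id = f"{record['source']}:{i}"
    kg.add_node(record_id, **record)
    kg.add_edge(pid, record_id, relation="has_record")

# One query now spans both systems: everything linked to patient p1.
print([kg.nodes[n] for n in kg.neighbors("p1")])
```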
As organizations continue to experience an exponential rise in data, more enterprises will organically amass graphs with billions of nodes and hundreds of billions of edges. The most data-forward organizations will have to build scalable systems that not only reduce time to insight but also address the underlying complexities of unstructured data and legacy architectures.