The Panama Papers. The Paradise Papers. Human trafficking. Russian social media trolls. Self-driving cars.
If you’re wondering what all the above have in common, the answer is: Graph Databases. Their ability to store data connections in the service of gaining a better understanding of relationships across nodes has been recognized for use cases such as e-commerce recommendation engines and fraud detection.
According to a Neo4j case study:
“The key technology in enabling real-time recommendations is the Graph Database, a technology that is fast leaving traditional relational databases behind. Graph Databases easily outperform relational and other NoSQL data stores for connecting masses of buyer and product data (and connected data in general) to gain insight into customer needs and product trends.”
The possibility of applying Graph Databases to whole new worlds is intriguing. Consider that Neo4j, a primary player in the graph database space, has already been used by the International Consortium of Investigative Journalists to explore 2.6 TB of data as it looked into the complex world of secret shell companies and offshore accounts – and their shady operations – that are linked to world leaders and other politicians and public officials.
The Consortium used it again to wade through some 1.4 TB of data as it continued to study the tax haven activities of the rich promoted by the offshore financial industry. In addition, the U.S. government has run a project that uses the technology to analyze dark web data and spot human trafficking rings.
Pierre Romera, the Consortium’s Chief Technology Officer, said of its use of Neo4j and graph technologies:
“With the Paradise Papers, those documents represented 1.4 TB of data and were gathered from different sources. Putting them in a single database was a challenge for us. With Neo4j and [visualisation tool] Linkurious, and after a few weeks of research, we were able to propose to our 382 journalists a way to explore the data and also to share visualisations from stories they were working on. It’s surprising how intuitive a graph database can be for non-tech savvy people. Thanks to this approach, we could both investigate and prepare the future releases.”
NBC News also released a Neo4j database of more than 200,000 deleted Twitter messages from Russian trolls that may have influenced the 2016 U.S. presidential election. Anyone can explore it in the Neo4j Sandbox to answer questions such as: What are the trolls’ most commonly used hashtags? Which trolls retweet other trolls? Which legitimate influencers received and possibly (unknowingly) propagated bad information? The Sandbox lets users spin up a private hosted instance of Neo4j pre-populated with interesting datasets. “The net is that you have technology being employed to create fake news and we show how to use technology to combat that problem,” says Neo4j CMO Lance Walter.
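To make the first two of those questions concrete, here is a minimal Python sketch of the underlying counting logic. It is only an illustration: the records, account names, and field names (`user`, `hashtags`, `retweet_of`) are invented for this example and are not taken from the NBC News dataset or its schema. In the Sandbox these questions would be asked as graph queries against Neo4j itself.

```python
from collections import Counter

# Hypothetical toy records, invented for illustration only.
tweets = [
    {"user": "troll_a", "hashtags": ["news", "politics"], "retweet_of": None},
    {"user": "troll_b", "hashtags": ["news"], "retweet_of": "troll_a"},
    {"user": "troll_c", "hashtags": ["politics", "news"], "retweet_of": "troll_a"},
    {"user": "troll_a", "hashtags": ["local"], "retweet_of": "troll_c"},
    {"user": "troll_b", "hashtags": [], "retweet_of": "outsider"},
]
troll_accounts = {"troll_a", "troll_b", "troll_c"}


def top_hashtags(records, n=3):
    """Count every hashtag across all records and return the n most common."""
    counts = Counter(tag for r in records for tag in r["hashtags"])
    return counts.most_common(n)


def troll_retweet_edges(records, trolls):
    """Count (retweeter, original author) pairs where both accounts are trolls."""
    return Counter(
        (r["user"], r["retweet_of"])
        for r in records
        if r["user"] in trolls and r["retweet_of"] in trolls
    )


hashtag_counts = top_hashtags(tweets)
retweet_edges = troll_retweet_edges(tweets, troll_accounts)
```

Each retweet pair is effectively an edge between two account nodes, which is why this kind of question maps naturally onto a graph.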
A recent article has also discussed how Graph Databases will be at the core of the $7 trillion self-driving car market, Walter says. The piece notes that the average autonomous vehicle will generate 1 gigabyte of data per second, and that accessing and interpreting this information will drive vehicle optimization. It also mentions cross-referencing geospatial and geolocation data. “Graph Databases are good at handling telemetry data,” Walter says, which helps solve real-time safe driving problems. “All these data points are connected – the fact that a car is quickly coming up behind me and there’s an obstacle in front of me,” he says, and it’s critical that a self-driving car can quickly receive query results about how to react in such a scenario.
“In an esoteric sense, in this situation, there’s a huge difference between taking 20 seconds to run that query vs. 7 milliseconds, so performance is at a premium,” he says. With index-free adjacency, native graph processing engines such as Neo4j provide greater efficiency when processing data in a graph because connected nodes physically point to each other in the database. The graph paradigm illustrates data connectedness, while NoSQL and relational paradigms focus on data collection. There’s no need to do a round trip across the network to see connections. “We optimize for that,” says Walter.
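The idea behind index-free adjacency can be sketched in a few lines of Python. This is a toy in-memory illustration under stated assumptions, not a description of how Neo4j stores data on disk: the `Node` class, the helper functions, and the node names are all hypothetical. It contrasts following direct references between connected nodes with finding the same neighbors by scanning a separate table of edge rows.

```python
class Node:
    """A graph node that holds direct references to its adjacent nodes."""

    def __init__(self, name):
        self.name = name
        self.neighbors = []  # references to Node objects, not keys to look up


def connect(a, b):
    """Link two nodes in both directions."""
    a.neighbors.append(b)
    b.neighbors.append(a)


def within_two_hops(start):
    """Names of nodes reachable in one or two hops, found by following references."""
    seen = {start.name}
    result = []
    for first in start.neighbors:
        for node in (first, *first.neighbors):
            if node.name not in seen:
                seen.add(node.name)
                result.append(node.name)
    return result


def neighbors_by_scan(edge_rows, name):
    """Contrast case: find neighbors by scanning every row of an edge table."""
    return [b if a == name else a for a, b in edge_rows if name in (a, b)]


# Hypothetical scenario loosely based on the driving example in the text.
me, car_behind, obstacle, lane = (
    Node(n) for n in ("me", "car_behind", "obstacle", "lane")
)
connect(me, car_behind)
connect(me, obstacle)
connect(obstacle, lane)

edge_table = [("me", "car_behind"), ("me", "obstacle"), ("obstacle", "lane")]
```

In the reference-following version, the cost of each hop depends only on how many neighbors a node has. In the scan version, every hop touches the whole edge table (or an index over it), so the cost grows with the total size of the data.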
Graph Database Market Reaches Out
In fact, while Neo4j continues to support and deliver value to the developer community at large, it is now focusing primarily on the large enterprise market, which of course includes big-name vehicle manufacturers partnering with tech firms on self-driving cars and platforms.
“The hope is that the developer [in large businesses] gets his hands on [the core open source Neo4j database] and shows it to someone who lights up about it and ultimately moves the business to the enterprise version,” says Walter.
The enterprise version of Neo4j 3.4 adds security, horizontally scalable multi-clustering, administration tools and impressive performance gains over previous versions. “People are analyzing data sets with billions of nodes, and in every release we show huge leaps forward in performance and scale,” he says. Neo4j already counts among those users seven of the top ten retailers, 12 of the 25 biggest banks and two of the top three telecom providers. “You don’t want the recommendation engine for Walmart e-commerce or eBay ShopBot to go offline,” he says.
It’s an advantage having huge commercial customers, he notes, with large data problems to solve. “They pull you to be enterprise-grade in solving graph problems.” And they validate the category, too, with their use cases and significant returns-on-investment.
Graph Databases are indeed becoming more mainstream in the enterprise, and Neo4j sees its open source contributions as helping grow that market. It has contributed its Cypher query language to the open source community, where it is being used by vendors that typically have large footprints in big companies, such as SAP. SAP uses Cypher as the graph interface to its SAP HANA in-memory computing and real-time analytics solution. “Cypher is a great way for even competing vendors to build the most elegant interface for creating, querying and analyzing graph data,” says Walter, who added that Neo4j has introduced a Cypher interface to the popular Gremlin programming language for Graph Databases.
“If Cypher is widely available, more people will have more success with Graph Databases and as the leader with the best technology and most complete platform, we think all roads ultimately will lead to Neo4j,” he affirms. “Competitors using our technology makes their products better and also moves the battle to who is best at solving graph problems.”
Mega-vendors getting into the graph database category – from SAP to Microsoft to Amazon – only serves to generate interest in the sector.
Building out from its core engine to a complete platform also hits the enterprise market sweet spot. Getting data in from relational sources or Hadoop, providing rich analytics through various graph algorithms, and making graphs meaningful to business users through visualizations are all integral to the Neo4j Graph Platform.
“We will continue to build that platform out, bringing the power of the graph to more user communities,” Walter says.