In its just-released report, Cool Vendors in In-Memory Computing Technology 2016, Gartner highlighted a handful of interesting and innovative solutions within the In-Memory Computing (IMC) space for enabling a range of real-time analytics scenarios. It gave a first-time look at IMC-enabled analytics products from Bottlenose, Striim, and Zoomdata that target use cases including event stream processing, context brokering, and Hybrid Transaction/Analytical Processing (HTAP). The research firm also provided a retrospective on two vendors it had previously singled out as cool IMC players: MemSQL and Finch Computing, a division of Qbase, LLC that was formerly known as Synthos Technologies.
DATAVERSITY® recently had the opportunity to talk with Finch CTO Scott Lightner about what the company has been doing with FinchDB, its in-memory, JSON-based NoSQL DBMS, contextual search, and real-time analytics solution. In its report, Gartner cites FinchDB as notable for its innovative approach to supporting a wide range of applications – context mining and brokering, intelligent decision automation, fraud detection, cyber security, sentiment analysis, ad placement, and alerting among them – that require real-time, in-context analytics on multi-terabyte datasets with low latency and at Web scale.
Lightner says that the company is indeed familiar with high-stakes, high-volume data environments, given Qbase’s position as a federal government IT services contractor with customers in the intelligence community and the U.S. Department of Defense. Finch Computing was carved out as a separate division from Qbase in order to commercialize technologies on top of an IP portfolio of 25 patents covering in-memory databases, enterprise-scale compression, analytical modeling and text analysis, among other things, he says.
The purpose was to go beyond standard database storage and retrieval functions: to drive real-time, predictive analytics; to deliver custom scored-and-ranked results that change as the data changes; to apply predictive models on the fly; and to find faster, better insights from data by asking better questions of it, Lightner says:
“We had to build a lot of technology that just didn’t exist. We had to merge components of multiple technologies together (search, analytics, and databases),” he explains. “We also had to fundamentally change what we expect each of them to do for us, so that we could enable people to make better decisions in true real time.”
The result is FinchDB, a platform that allows people to access massive amounts of data and to do analytics on it that are based on a real-time picture of the data at the exact moment a user queries it, Lightner says. “We pass-in context in every model applied to every query. Nothing is pre-linked or pre-computed,” he says.
Think Different
As Gartner’s report shows (and the research firm states upfront that it’s not providing an exhaustive survey of products in the area), other vendors are exploring the opportunity to apply in-memory computing in the service of real-time analytics. Lightner agrees that there are lots of in-memory computing vendors trying to tackle the problem of getting faster, more complex analytics and insights from huge datasets. But what’s required is the “paradigm shift that will enable businesses to really make sense of the massive amounts of data, of various types, that the world is creating – in true real time,” he says.
Finch’s approach of combining what he calls the “best of” analytics, search, and databases in one in-memory computing platform was underpinned by a few key ideas:
- Flexibility: So that FinchDB can dynamically adapt to changing data and to individual business events (because it can pass in context in every model on every query)
- Scalability: So that it can run on commodity hardware and grow with increasing data needs
- Adaptability: So that it can handle complexity and change with ease. “Enterprise data today is nothing if not complex and always changing!” Lightner says.
As data grows in volume and complexity, and as more people in the enterprise want to understand data for different reasons, you have to be able to ask better and different kinds of questions than existing technologies allow, Lightner says. Often, products limit the kinds of questions users can ask, and how they can ask them:
“For example, if you’re focused on answering a specific question, or set of questions, or a specific type of question, you’re missing everything else you didn’t ask. Or didn’t know you didn’t know. Or didn’t know you needed to ask,” Lightner says.
The concept behind FinchDB is to rethink all the prescribed views about what database, analytics, and search technologies ought to do, and how they ought to do it (batch processing, model building, and so on). Technology has to change to suit the way in which people want to use data in the enterprise today, and to support the real-time decision-making needs that are emerging across multiple functions and multiple business moments, he says.
A Look Inside FinchDB
At its core, FinchDB is an all in-memory NoSQL, JSON, doc-style database, and a distributed, redundant, fault-tolerant system. Everything in FinchDB is persisted to disk, but all computations are performed on the in-memory image of the data. It can consume data that is structured or unstructured, words or numbers, streaming or static, internal or external.
Lightner highlights the following among the key patented capabilities that enable FinchDB to function as it does:
- Compression: This enables its brand of in-memory computing at scale, by dramatically compressing a dataset to as little as 16 percent of its original size while also preserving the ability to decompress a single record or field in fractions of a millisecond.
- Embedded Models in the Query: This invention ensures that every database query can specify which analytical models and parameters to apply on the fly. The models act on query answers in place, without the data first having to be retrieved from the database, he says: “It’s an incredible differentiating feature and contributes to both FinchDB’s lightning fast, true real-time performance and its ability to offer dynamic scoring and ranking in our query responses.”
- Real-time, On-The-Fly Entity Linking: Its in-memory computing architecture, coupled with its ability to process all relevant, contextual data around an entity, makes it capable of instantly finding connections in the data and performing real-time knowledge discovery and Machine Learning, so that it gets smarter and better the more data it ingests.
- Scored and Ranked Results: FinchDB uniquely offers these based on a real-time picture of the data, at the moment a user queries it. At that moment, he explains, FinchDB analyzes all possible answers and returns the best ones to a user, along with a confidence score for each.
- Fuzzy Searching: FinchDB allows a user to perform fuzzy searches and to find exactly what they’re looking for, whether or not they’ve spelled it exactly right, or have the exact right information to build a query.
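To make the ideas in this list more concrete, here is a minimal Python sketch of a query that names its own scoring model, tolerates misspellings, and returns scored and ranked results. It is not FinchDB code and does not use FinchDB's API, which the article does not describe: the records, the field names, the `query` function, and the two scoring models are all hypothetical, and the standard library's `difflib.SequenceMatcher` merely stands in for whatever fuzzy matching FinchDB actually uses.

```python
from difflib import SequenceMatcher

# Hypothetical in-memory records; the fields are invented for illustration.
RECORDS = [
    {"id": 1, "name": "John Roberts", "risk": 0.2},
    {"id": 2, "name": "Jon Robertson", "risk": 0.9},
    {"id": 3, "name": "Joan Robbins", "risk": 0.5},
]


def similarity(a, b):
    """Case-insensitive string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Hypothetical scoring models that a query can select by name.
def name_only(record, sim):
    return sim


def name_and_risk(record, sim):
    # Blend the fuzzy-match strength with a per-record attribute.
    return 0.7 * sim + 0.3 * record["risk"]


MODELS = {"name_only": name_only, "name_and_risk": name_and_risk}


def query(term, model="name_only", min_similarity=0.6, limit=5):
    """Fuzzy-match `term` against every record, score each hit with the
    model named in the query, and return the best hits first."""
    scorer = MODELS[model]
    hits = []
    for record in RECORDS:
        sim = similarity(term, record["name"])
        if sim >= min_similarity:
            hits.append({
                "id": record["id"],
                "name": record["name"],
                "score": round(scorer(record, sim), 3),
            })
    # Nothing is precomputed: scores reflect the records as they are now.
    hits.sort(key=lambda hit: hit["score"], reverse=True)
    return hits[:limit]


if __name__ == "__main__":
    # The misspelled term still finds candidates; the model changes the ranking.
    for model in MODELS:
        print(model, query("Jhon Roberts", model=model))
```

Because the model is chosen per query and scores are computed at query time, the same misspelled search can rank records differently depending on which model the caller passes in, which is the behavior Lightner describes in general terms.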
Lightner says that Finch is working with development partners on a variety of use cases. These include banks using FinchDB for fraud detection and for understanding criminal networks on the dark web; information service providers using it to mine and enrich their content libraries as they change and are updated with new information; marketers using it as a recommendation engine to serve up personalized, real-time ad content; and the U.S. intelligence community using it to identify and disambiguate entities in streaming text. Finch is also using it internally on a proof-of-concept project to deliver better, more relevant news alerts, since it’s capable of on-the-fly, real-time knowledge discovery.
Finch for Text is a separate solution from Finch Computing that turns unstructured text into structured, descriptive information about that text, making it more usable in the enterprise. It also uses the computing power of FinchDB to algorithmically determine what is in the text, he says. It can disambiguate identically named entities in text – such as whether a piece of content is talking about John Roberts the Supreme Court Justice, or John Roberts the Fox News anchor. It “understands context and can quickly and accurately isolate 15 different types of entities and correctly determine their identities. Our unique approach to doing that is also part of our IP portfolio,” he says.
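The general idea of using surrounding context to tell identically named entities apart can be illustrated with a toy Python sketch. It is an assumption-laden illustration, not Finch's patented approach, which the article does not detail: the candidate list, the hand-picked context words, and the `disambiguate` function are all hypothetical, and simple keyword overlap stands in for the real models.

```python
import re

# Hypothetical candidates, each with a few hand-picked context words.
CANDIDATES = {
    "John Roberts (Supreme Court Justice)":
        {"supreme", "court", "justice", "ruling", "opinion"},
    "John Roberts (Fox News anchor)":
        {"fox", "news", "anchor", "broadcast", "correspondent"},
}


def disambiguate(text):
    """Rank candidates by the share of their context words found in `text`."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    scored = []
    for entity, context in CANDIDATES.items():
        overlap = len(words & context)
        scored.append((entity, overlap / len(context)))
    # Highest-scoring candidate first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


if __name__ == "__main__":
    sentence = "John Roberts wrote the majority opinion for the Supreme Court."
    for entity, score in disambiguate(sentence):
        print(f"{score:.2f}  {entity}")
```

A production system would presumably draw on far richer context than a fixed word list, but the sketch shows why the text around a name, rather than the name itself, is what settles which John Roberts is meant.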
FinchDB, says Lightner, is ready to change the “prescribed views about what in-memory computing database technology ought to do. And, similarly, about what analytics and search technologies ought to do for us.”