There’s Data Discovery and then there’s Smart Data Discovery. The former has become associated with discovering answers buried in Big Data in order to drive business value, whether to enhance decision-making, boost revenue, or improve the customer experience.
Smart Data Discovery has been defined by Gartner as “a next-generation Data Discovery capability that provides business users or citizen data scientists with insights from advanced analytics.” Smart Data Discovery combines the best in brain research, visual perception, Advanced Analytics, Machine Learning, Natural Language Processing and Natural Language Generation. By this year, Gartner believes that:
“Smart, governed, Hadoop-based, search-based and visual-based Data Discovery will converge into a single set of next-generation Data Discovery capabilities as components of a modern business intelligence and analytics platform.”
Rohit Mahajan is the CTPO and Co-Founder of Io-Tahoe, a Machine Learning Smart Data Discovery platform that enables organizations to automatically discover data relationships across heterogeneous and distributed enterprises.
The technology is backed by a handful of Machine Learning patents (pending approval) for looking beyond metadata towards greater visibility into complex data sets. Io-Tahoe offers clients assessments of its Machine Learning algorithms’ ability to auto-discover data relationships. “It’s so fine-tuned that true positive results are significantly high and our false positive results are significantly reduced,” Mahajan says.
Io-Tahoe’s discovery capabilities work within and across relational and non-relational data stores, Mahajan says, providing a blueprint of what is in these environments. The non-relational capabilities are critical, given how many companies’ Data Lakes have turned into Data Swamps, filled with data that they can’t find, use, or operationalize. Io-Tahoe also ingests legacy data stores, such as those connected to platforms like the decades-old AS/400, and is looking toward a future of ingesting truly unstructured data from social platforms like Twitter and Facebook.
“We help to maximize your data investments,” says Mahajan. “We allow organizations to maintain operability within the Data Lake and all the continual changes going on there, through data ingestion adaptability. Many of these tasks can be automated, thereby ensuring the ease of rediscovering the various data relationships that already exist in the Data Lake.”
Io-Tahoe takes its understanding of Smart Data Discovery a step further with its latest version, which brings in a Data Catalog based on the core engine. This allows data owners and data stewards to use a Machine Learning-based smart catalog to create, maintain, manage, search, and enrich business rules; define policies for critical data elements; and provide Data Governance workflow functionality. “It’s really about complete business rule management and enrichment,” says Mahajan. “It doesn’t matter the underlying technologies; the Data Catalog allows organizations to become truly data driven.”
Why Smart is Smart
“We call our platform a Smart Discovery solution because our philosophy is that it’s only the data that can tell you the truest and most accurate story,” as opposed to documents, diagrams, or metadata, Mahajan says. Effective and comprehensive access to a company’s data – regardless of where it is retained – requires a clear view into not just its metadata but its contents, as well.
To get this access, Io-Tahoe’s platform brute-forces actual data to discover the relationships in relational data stores. Data flow within discovery also aligns data that is replicated across multiple systems under different names. Again, Io-Tahoe is not just relying on metadata but brute-forcing real data to understand how what’s called “order entry” in one system, “ticket ID” in another, and “order ID” in a third all reflect the same data within a multi-part workflow.
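To make the idea of comparing actual values rather than column names concrete, here is a minimal sketch. It is not Io-Tahoe’s algorithm, which the article does not describe; it assumes a simple containment score (the share of one column’s distinct values that also appear in another) and uses hypothetical table names, column names, and a hypothetical threshold.

```python
from itertools import permutations


def containment(child_values, parent_values):
    """Fraction of distinct child values that also appear in the parent column."""
    child = set(child_values)
    if not child:
        return 0.0
    return len(child & set(parent_values)) / len(child)


def candidate_links(columns, threshold=0.9):
    """Compare every ordered pair of columns by value, ignoring their names.

    `columns` maps a hypothetical "table.column" label to its list of values.
    Returns (child, parent, score) tuples whose score meets the threshold.
    """
    links = []
    for (a_name, a_vals), (b_name, b_vals) in permutations(columns.items(), 2):
        score = containment(a_vals, b_vals)
        if score >= threshold:
            links.append((a_name, b_name, score))
    return links


# Hypothetical data: three systems holding the same order numbers under different names.
columns = {
    "sales.order_entry": [1001, 1002, 1003, 1004],
    "support.ticket_id": [1002, 1003],
    "billing.order_id": [1001, 1002, 1003, 1004],
    "billing.amount": [25, 40, 25, 90],
}
links = candidate_links(columns)
```

In this toy data the three differently named order columns are linked to one another while `billing.amount` is not, even though no column names match. Comparing every pair this way grows quadratically with the number of columns, which is why a real system would need something more selective than this sketch.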
For data issues in the regulatory space, where global organizations must comply with regulations such as GDPR and with rules protecting personally identifiable information (PII), Io-Tahoe relies on the same algorithm-based Smart Data Discovery, defining policies that discover the appropriate sensitive data. A significant number of out-of-the-box policies are built into the solution so that users can start identifying sensitive data fields in their applications, systems, and complete data landscape. Users can also set up their own policies and upload their own reference data sets to enhance policies that are specific to their organization.
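One way to picture a policy combined with an uploaded reference data set is sketched below. The policy format, the labels, the email pattern, and the 80 percent match ratio are all assumptions made for illustration; the article does not say how Io-Tahoe represents its policies.

```python
import re

# Deliberately loose, illustrative pattern; not a full email validator.
EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def make_policies(reference_surnames):
    """Build hypothetical policies: a label mapped to a test on one string value.

    `reference_surnames` stands in for an organization-specific reference data set.
    """
    surnames = {s.lower() for s in reference_surnames}
    return {
        "email": lambda v: bool(EMAIL.match(v)),
        "surname": lambda v: v.lower() in surnames,
    }


def classify_field(values, policies, min_ratio=0.8):
    """Return labels of policies matched by at least `min_ratio` of non-empty values."""
    cleaned = [str(v).strip() for v in values if v is not None and str(v).strip()]
    if not cleaned:
        return []
    hits = []
    for label, matches in policies.items():
        ratio = sum(1 for v in cleaned if matches(v)) / len(cleaned)
        if ratio >= min_ratio:
            hits.append(label)
    return hits


policies = make_policies(["Smith", "Garcia"])
contact_labels = classify_field(["a@x.com", "b@y.org", None, "c@z.net"], policies)
name_labels = classify_field(["Smith", "garcia", "SMITH"], policies)
```

Because the test runs on field contents, a column would be flagged even if its name gave no hint that it held email addresses or surnames.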
As an example, Mahajan said that Io-Tahoe is discovering sensitive data for a client that is focused on addressing GDPR across relational and Data Lake technology:
“That’s the kind of scale we work at,” he says. “That’s not just smart Data Discovery, but discovery that spans huge sets of data across heterogeneous landscapes and must be done within a short period of time. We can do this because we approach it based on smart sampling of data to determine if a field hosts sensitive data or not.”
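The article does not explain what makes the sampling “smart,” so the sketch below substitutes plain uniform random sampling as a stand-in. It only illustrates why sampling saves time: a few hundred values are inspected instead of the whole field. The pattern, sample size, and threshold are hypothetical.

```python
import random
import re

# Hypothetical pattern for a US Social Security number style value.
SSN_LIKE = re.compile(r"^\d{3}-\d{2}-\d{4}$")


def looks_sensitive(column, matcher, sample_size=500, min_ratio=0.8, seed=0):
    """Estimate from a random sample whether a column mostly holds matching values.

    Uses uniform random sampling as a placeholder for whatever strategy
    a production tool would apply.
    """
    if not column:
        return False
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    if len(column) <= sample_size:
        sample = column
    else:
        sample = rng.sample(column, sample_size)
    hits = sum(1 for v in sample if matcher(str(v)))
    return hits / len(sample) >= min_ratio


def is_ssn(value):
    return bool(SSN_LIKE.match(value))


# Hypothetical columns with 100,000 rows each; only 500 values are inspected per column.
ssn_column = [f"{i % 900 + 100}-{i % 90 + 10}-{i % 9000 + 1000}" for i in range(100_000)]
quantity_column = [str(i) for i in range(100_000)]

ssn_flag = looks_sensitive(ssn_column, is_ssn)
quantity_flag = looks_sensitive(quantity_column, is_ssn)
```

The trade-off is that a sample can miss sensitive values that appear only rarely in a field, so the sample size and threshold would need tuning against the true and false positive rates discussed earlier.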
The Data Catalog, Mahajan says, is a major differentiator; it is underpinned by Io-Tahoe’s Smart Data Discovery. This means the Data Catalog utilizes Machine Learning algorithms which sit on top of its Smart Data Discovery capabilities to enhance information about data automatically, regardless of the underlying technology, and build a more accurate and automated Data Catalog. That moves Data Governance competence forward.
Mahajan mentioned a recent white paper on GDPR and Data Governance, and how it’s necessary to ensure that an organization’s data “does not turn into a liability.” It takes better Data Governance of all data assets to achieve such success:
“Organizations need to understand and comply with a variety of Data Governance regulations, many of which assess financial penalties for non-compliance. More than 80 countries and independent territories, including nearly every country in Europe and many in Latin America and the Caribbean, Asia, and Africa, have now adopted comprehensive data protection laws.”
Such regulations are not going away and are likely only to get more stringent. To be confident that their compliance is accurate, organizations need to perform Smart Data Discovery at a rapid pace, couple it with effective Data Governance, and know that the results are reliable.
Io-Tahoe’s Smart Data Discovery solution allows all of that, through its innovative “algorithmic approach” and Machine Learning technology that “makes data available to everyone in the organization,” says Mahajan. “We help to untangle that complicated maze of relationships,” thereby helping organizations to bring together their entire enterprise Data Management stack, including Data Science, Data Analytics, Data Governance, and everything in between.
Photo Credit: Sergey Nivens/Shutterstock.com