Data discovery describes the processes used to understand the data sets on hand for data integration and/or data analysis. This step occurs during design and should combine technical search by tools with subject matter expertise from people. During data discovery, a high-level view is taken to assess data preparation or data quality needs. Data discovery can be broken into two concepts:
- Manual Data Discovery: Within the past 20 years, before advances in machine learning, data specialists mapped data through human brain power alone. Simply put, people thought critically about what data is available, where and why it is stored, and what needs to be provided to the end customer. Companies monitored metadata and data lineage to learn about data categorization and flow. Data stewards, usually people with sophisticated technical knowledge who take care of data assets, document the rules and standards that guide the data discovery process. In these approaches, people conceptualize and/or draw out a map to comprehend all the data in an organization.
- Smart Data Discovery: With advancements in technology over the last year or two, the definition of data discovery has expanded to include automated ways of presenting data to reveal deeper business insights. Smart data discovery represents a leap forward, using augmented analytics and machine learning. Artificial intelligence prepares, conceptualizes, integrates, and presents hidden patterns and insights, usually through visuals. Here, the overall understanding and analysis of the available data sets resides in the machines: computers receive queries, do some processing in a black box, and come up with their reasoned answers.
Some in the Data Science field may equate data discovery with automated smart data discovery tools. However, both manual and automated approaches fit best under the definition of data discovery, as both are discussed in articles and implied by the term. As AnalyticsWeek states, “Machine learning is the intermediary that improves the data discovery process to make it suitable for the prominent data governance and regulatory compliance concerns contemporary enterprises face.”
Other Definitions of Data Discovery Include:
- “A result that allows business users to leverage Advanced Analytics and create citizen data scientists.” (Kartik Patel)
- “Tools that clean and prepare data, find hidden patterns and correlations, and deliver insights without user intervention.” (Paramita (Guha) Ghosh)
- “Practices, architectural techniques and tools for achieving the consistent access and delivery of data across…. the enterprise to meet the data consumption requirements of all applications and business processes.” (Gartner)
- “Information relationship mapping that is aided by Machine Learning.” (Forbes)
- The activity of finding where sensitive data resides so that it can be adequately protected or securely removed. (MIT IST)
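The last definition above, finding where sensitive data resides, can be illustrated with a minimal Python sketch. It assumes records are available as simple dictionaries, and the two regular expressions and the function name `find_sensitive_columns` are hypothetical and chosen only for illustration. Production discovery tools rely on much richer rules, metadata, and machine learning.

```python
import re

# Hypothetical patterns, for illustration only; real scanners use far richer rules.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_sensitive_columns(rows):
    """Return {column name: sorted list of pattern labels found in that column}."""
    hits = {}
    for row in rows:
        for column, value in row.items():
            for label, pattern in PATTERNS.items():
                if pattern.search(str(value)):
                    hits.setdefault(column, set()).add(label)
    return {column: sorted(labels) for column, labels in hits.items()}


# Made-up sample records, not taken from any real system.
sample_rows = [
    {"name": "Ada", "contact": "ada@example.com", "note": "SSN 123-45-6789"},
    {"name": "Bob", "contact": "n/a", "note": ""},
]

sensitive = find_sensitive_columns(sample_rows)
print(sensitive)
```

A report like this only flags where to look. A data steward would still need to confirm each hit and decide whether the data should be protected or securely removed.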
Data Discovery Use Cases Include:
- Processing insurance claims
- Minimizing the risk of fraud
- Understanding relationships between business prospects
- Analyzing social media
Businesses Use Data Discovery to:
- Identify subtle patterns
- Comply with laws, such as the General Data Protection Regulation (GDPR)
- Allow non-technical people or citizen data scientists access to data analysis
- Test for and ensure greater data completeness
- Reduce costs by up to 80 percent
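As one concrete illustration of the data completeness item in the list above, the following Python sketch computes the share of non-missing values per column. The function name `completeness`, the sample insurance-claim records, and the choice to treat `None` and empty strings as missing are all assumptions made for this example, not a standard definition.

```python
def completeness(rows, columns):
    """Return the fraction of non-missing values per column (0.0 to 1.0)."""
    total = len(rows)
    report = {}
    for column in columns:
        # Assumption: None, an empty string, or an absent key counts as missing.
        filled = sum(1 for row in rows if row.get(column) not in (None, ""))
        report[column] = filled / total if total else 0.0
    return report


# Made-up sample records, for illustration only.
records = [
    {"claim_id": "C1", "amount": 120.0, "adjuster": "Lee"},
    {"claim_id": "C2", "amount": None, "adjuster": ""},
    {"claim_id": "C3", "amount": 75.5},
    {"claim_id": "C4", "amount": 10.0, "adjuster": "Kim"},
]

report = completeness(records, ["claim_id", "amount", "adjuster"])
print(report)
```

Columns with low scores are natural candidates for a closer look during data preparation.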