Data mining is an older, now allied, subset of machine learning and artificial intelligence that deals with large data sets. It combines pattern recognition technologies with statistical and mathematical techniques to forecast business trends and find useful patterns. “Data mining is also known as Knowledge Discovery in Data (KDD).” Text mining, a component of data mining, analyzes documents by automatically classifying their content into ontologies that can be easily searched.
Data and text mining techniques include:
- Profiling: Characterizing norms and detecting anomalies
- Data Reduction: Replacing a large data set with a smaller set that contains much of the important information in the larger set, for easier processing and analysis
- Association: Associating and learning, unsupervised, “to find relationships between study elements based on transactions involving them.” This includes “frequent item set mining, rule discovery and market-based analysis”
- Clustering: Grouping elements together by shared characteristics (e.g. customer segmentation)
- Self-organizing Maps: Analyzing clusters using neural network methods
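As a rough illustration of the association technique listed above, the following minimal sketch counts how often pairs of items appear together across transactions and keeps the pairs that meet a minimum support. The function name `frequent_item_sets`, the `min_support` parameter, and the toy shopping baskets are assumptions made for this example; they do not come from any particular library or data set.

```python
from collections import Counter
from itertools import combinations


def frequent_item_sets(transactions, min_support=0.5, size=2):
    """Return item sets of the given size whose support meets min_support.

    Support is the fraction of transactions that contain the whole item set.
    """
    counts = Counter()
    for basket in transactions:
        # Deduplicate and sort so each combination is counted once per basket
        # and always appears in the same order.
        for combo in combinations(sorted(set(basket)), size):
            counts[combo] += 1
    total = len(transactions)
    return {
        combo: n / total
        for combo, n in counts.items()
        if n / total >= min_support
    }


# Hypothetical toy transactions, invented for illustration only.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "milk", "eggs"],
]

pairs = frequent_item_sets(transactions, min_support=0.5)
for items, support in sorted(pairs.items()):
    print(items, support)
```

Real frequent item set miners prune the search space instead of enumerating every combination, so this brute-force count is only practical for small examples.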
Data mining is commonly done in the Python programming language. It promises to become more efficient as big data techniques and tools continue to improve. Much as forecasters have gotten better at predicting when and where a storm or hurricane will hit, data mining continues to advance, for example through data mining customized to a particular business.
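To give a sense of what a small data mining task can look like in Python, here is a minimal sketch of profiling: it characterizes the norm of a series with its mean and standard deviation, then flags values that sit far from that norm. The function name `find_anomalies`, the threshold of two standard deviations, and the order counts are assumptions chosen for illustration, not a recommended rule.

```python
from statistics import mean, stdev


def find_anomalies(values, threshold=2.0):
    """Return values lying more than `threshold` sample standard deviations from the mean."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        # All values are identical, so nothing deviates from the norm.
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical daily order counts, invented for illustration only.
daily_orders = [102, 98, 101, 99, 100, 103, 97, 250]

outliers = find_anomalies(daily_orders)
print(outliers)
```

A single extreme value inflates both the mean and the standard deviation, so on real data a median-based measure may be a more robust choice.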
Other Definitions of Data Mining Include:
- “The process in which businesses sift through data in order to find relevant information.” (David Anderson)
- “The process of discovering meaningful correlations, patterns and trends by sifting through large amounts of data stored in repositories. Data mining employs pattern recognition technologies, as well as statistical and mathematical techniques.” (Gartner IT Glossary)
- “The process of discovering useful patterns and trends in large data sets.” (O'Reilly)
- “A rapidly growing field that is concerned with developing techniques to assist managers to make intelligent use of” electronic repositories. (MIT)
- “An indispensable technology for businesses and researchers in many fields. Drawing on work in such areas as statistics, machine learning, pattern recognition, databases, and high-performance computing, data mining extracts useful information.” (MIT Press)
Data Mining Use Cases Include:
- A retailer who “found a way to pinpoint which first time customers were likely to become long term spenders”
- An insurance company that reduced costs and sped up customer service after “discovering which offices processed certain common claim types more efficiently than any other”
- A law enforcement agency that dropped an ineffective process for prioritizing cases and developed a better one
- “A manufacturer identified warning signs of chemical spills, providing information needed to prevent future accidents, protect the environment and avoid costly capital investment and litigation”
Businesses Use Data Mining To:
- Help avoid costly mistakes
- Make data processing more relevant
- Identify gaps in
- Increase profits
- Forecast short-term price movement