A data lake is an environment where a vast amount of data, of various types and structures, can be ingested, stored, assessed, and analyzed. Data lake technologies can scale to massive volumes of data, and because the data is stored in a relatively raw form, datasets are easy to combine.
A data lake architecture can centralize data over distributed storage, providing a scalable, fast, secure, and economical solution.
Data lakes serve many purposes, including:
- An environment for data scientists to mine and analyze vast amounts of raw, structured, and unstructured data
- A central storage area for raw data, with minimal (if any) transformation
- Alternate storage for detailed historical data from a data warehouse
- An online archive for records
- An environment to ingest streaming data with automated pattern identification
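
To make the "raw storage, minimal transformation" idea concrete, here is a minimal sketch that uses a local directory as a stand-in for distributed storage. The directory layout (`raw/<source>/<day>/`), the function names, and the sample records are all assumptions made for illustration; they are not part of any particular data lake product. Records are written exactly as received, and structure is applied only when two datasets are combined at read time.

```python
import json
import tempfile
from pathlib import Path


def ingest_raw(lake_root: Path, source: str, day: str, records: list[dict]) -> Path:
    """Append records unchanged to <lake_root>/raw/<source>/<day>/part-0000.jsonl.

    The path layout is a hypothetical convention, not a standard.
    """
    target_dir = lake_root / "raw" / source / day
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "part-0000.jsonl"
    with target.open("a", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record) + "\n")
    return target


def read_source(lake_root: Path, source: str) -> list[dict]:
    """Read every raw record for one source; the data is parsed only at this point."""
    rows: list[dict] = []
    for path in sorted((lake_root / "raw" / source).glob("*/*.jsonl")):
        with path.open(encoding="utf-8") as fh:
            rows.extend(json.loads(line) for line in fh if line.strip())
    return rows


def join_on(left: list[dict], right: list[dict], key: str) -> list[dict]:
    """Combine two raw datasets at read time with an inner join on a shared key."""
    index: dict = {}
    for right_row in right:
        if key in right_row:
            index.setdefault(right_row[key], []).append(right_row)
    return [
        {**left_row, **right_row}
        for left_row in left
        if key in left_row
        for right_row in index.get(left_row[key], [])
    ]


def demo() -> list[dict]:
    """Ingest two made-up sources into a temporary lake, then combine them."""
    with tempfile.TemporaryDirectory() as tmp:
        lake = Path(tmp)
        ingest_raw(lake, "orders", "2024-01-01",
                   [{"customer_id": 1, "total": 30.0}, {"customer_id": 2, "total": 12.5}])
        ingest_raw(lake, "customers", "2024-01-01",
                   [{"customer_id": 1, "name": "Ada"}])
        return join_on(read_source(lake, "orders"),
                       read_source(lake, "customers"),
                       "customer_id")


if __name__ == "__main__":
    print(demo())
```

Nothing is reshaped on the way in, so a new source can be added by calling `ingest_raw` with a new name. The cost of interpreting the data moves to the reader, which is the trade-off implied by storing data in a relatively raw form.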