by Angela Guess
Andrew Brust recently wrote in Data Informed, “With the shift to the client-server architecture in the mid-90s, databases became more centralized and the procedures for interacting with them became more formal. In response, a new category of tooling emerged under the moniker of ETL – extract, transform, and load. ETL tools allowed data transformations to be expressed graphically, and they included the ‘plumbing’ necessary to run the transformations on a scheduled basis and manage exceptions that occurred during processing. They also worked very well for loading data warehouses from source databases, and this in fact became ETL’s primary use case.”
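The extract-transform-load pattern Brust describes can be sketched in a few lines. This is a minimal illustration only: the source “database” and target “warehouse” are plain in-memory Python structures, and the field names (`name`, `amount`) are invented for the example, not drawn from any real ETL product.

```python
def extract(source):
    """Pull raw rows from the source system (here, just a list)."""
    return list(source)

def transform(rows):
    """Apply cleaning rules: normalize names, drop rows with missing amounts.
    Skipping bad records stands in for the exception handling ETL tools manage."""
    out = []
    for row in rows:
        if row.get("amount") is None:
            continue  # bad record: skip rather than fail the whole run
        out.append({
            "name": row["name"].strip().title(),
            "amount": float(row["amount"]),
        })
    return out

def load(rows, warehouse):
    """Append the transformed rows to the target store."""
    warehouse.extend(rows)
    return warehouse

# Illustrative data: one record is malformed and will be dropped.
source_db = [
    {"name": "  alice smith ", "amount": "10.5"},
    {"name": "bob jones", "amount": None},
    {"name": "carol lee", "amount": "7"},
]
warehouse = []
load(transform(extract(source_db)), warehouse)
```

A real ETL tool would add the scheduling and monitoring “plumbing” around this pipeline; the three-stage structure is the common core.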
Brust goes on, “ETL tools persist to this day, but their applicability to big data is limited. That’s because big data systems are fed with data coming from things as wide ranging as log files, sensor readings, and even digital images, sound, and video. Working with data like that simply has a different scope, and frameworks for such work are premised on less formal structures than is typical ETL. Because of this mismatch between big data and ETL, and also because of the trend toward self-service, a new category of tools has emerged, known as self-service data preparation. The category has grown big enough to merit comparative product reports from various analyst firms and has even prompted one ETL vendor (a most iconic one, in fact) to bring its own data-prep tool to market.”
Photo credit: Flickr/okfn