by Angela Guess
Aaron Auld recently wrote in ITProPortal, “Today data scientists can be deprived of their strengths when moving to larger datasets – datasets in the realm of ‘big data’ – because large scale tools are too inflexible to support the data science style of working. Michael Stonebraker, winner of the 2014 Turing Award, said: ‘…the change will come when business analysts who work with SQL on large amounts of data give way to data scientists, which will involve more sophisticated analysis, predictive modelling, regressions and Bayesian classification. That stuff at scale doesn’t work well on anyone’s engine right now. If you want to do complex analytics on big data, you have a big problem right now’.”
Auld goes on, “How do data scientists overcome this? In the past they have fallen back on two compensation strategies. Either they have worked in a batch-oriented fashion but lost major components of their powerful style of interactive working, or they have utilised small subsets of the data, missing insights and lacking the ability to ‘drill down’ to the most granular levels. However, there are new ways of working that can help. Combining the two top trends of recent years, namely ‘big data’ and ‘data science’, the new field of ‘big data science’ is blazing the trail for data scientists. It uses a new type of database that is designed to handle large data volumes while still delivering an agile, flexible and interactive feel that matches the exploratory style of a data scientist. It allows users to perform advanced analytics tasks on large volumes of data in an interactive fashion – right in the database, using any programming language.”
Photo credit: Flickr/allyrose18