by Angela Guess
Rick Delgado recently wrote in Dataconomy, “You probably had some big ideas in mind when you first started thinking about adopting big data solutions for your business… Hiring a qualified data science team is usually one of the first priorities, along with all the investment in equipment and technology needed to properly collect and analyze all the big data you’ll want. Over time though, that excitement might have worn off. Insights from big data analytics were likely coming in, but not at the pace you were hoping for. Is this a result of your data scientists simply not getting the job done well enough? Is it a case of laziness on their part? As easy as it is to think that big data insights should be reached one after the other in a short amount of time, more than likely the data scientists on your staff are doing everything they can. There are reasons for them not being more inventive, and it has nothing to do with their work ethic.”
Delgado continues, “There’s a lot that goes into a data scientist’s job. Some of their time is spent exploring the vast amounts of data they have to work with. Some of it requires preparations of data visualizations. And still other times they’re working on extract, transform, and load (ETL). While these are all valuable tasks in their own right, chances are most of their time is taken up in something far less glamorous. It’s sometimes referred to as data cleaning, but other terms include data wrangling and data munging. Many data scientists jokingly refer to themselves as data janitors, with a lot of time spent getting rid of the bad data so that they can finally get around to utilizing the good data. After all, bad data can alter results, leading to incorrect and inaccurate insights. The costs of bad data are high, with some research stating it costs a typical business more than $13 million every year. So data cleaning is important, but it’s time-consuming and not all that fun.”
Photo credit: Flickr/go_greener_oz