by Angela Guess
Andrew Rosenblum recently wrote in Business2Community, “You’ve read about many of the kinds of big data projects that you can use to learn more about your data in our What Can a Data Scientist Do for You? article—now, we’re going to take a look at tools that data scientists use to mine that data: performing statistical techniques like clustering or linear modeling, and then turning them into a story through visualization and reporting. You don’t need to know how to use these yourself, but having a sense of the differences between these tools will help you gauge what tools might be best for your business and what skills to look for in a data scientist.”
Rosenblum goes on, “Once the data scientist has completed the often time-consuming process of ‘cleaning’ and preparing the data for analysis, R is a popular software package for actually doing the math and visualizing the results. An open-source statistical modeling language, R has traditionally been popular in the academic community, which means that lots of data scientists will be familiar with it. R has literally thousands of extension packages that allow statisticians to undertake specialized tasks, including text analysis, speech analysis, and tools for genomic sciences. The center of a thriving open-source ecosystem, R has become increasingly popular as programmers have created additional add-on packages for handling big datasets and parallel processing techniques that have come to dominate statistical modeling today.”
Photo credit: Flickr/Clearwater1967