
A Deep Look at the Role of Data Scientists

by Angela Guess

Suzanne Rose recently wrote in DZone, “The distinction between statisticians and data scientists is that statisticians are given data and run regressions, while data scientists find the data, organize and analyze it, and then communicate the relevance in an understandable, actionable way to their organization. In order to have actionable data, data scientists need quality data—and that starts with quality sources. Data sources are divided into three main categories: databases, applications, and third-party data. Databases can be structured or unstructured. Structured databases run on SQL and store data in a finite number of columns. Generally, structured databases are used by organizations like banks, financial institutions, and operations that need perfect, reliable data.”

Rose goes on, “Unstructured databases are much more flexible than structured databases. This allows for less friction when querying vast amounts of data and allows it to be examined in ways that structured data cannot. This comes with a sacrifice of perfection and complete consistency, but allows for some of the greatest recommendation engines such as Google and Yahoo.”
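As a rough illustration of the distinction Rose draws, the Python sketch below contrasts a fixed-column SQL table with schema-free records. It uses only the standard library's sqlite3 and json modules. The table name `accounts`, its columns, and the sample records are hypothetical and do not come from Rose's article, and a list of JSON strings only stands in for a real unstructured store.

```python
import json
import sqlite3

# Structured: the schema fixes a finite set of columns up front.
# (Hypothetical table and data, for illustration only.)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, owner TEXT NOT NULL, balance REAL NOT NULL)"
)
conn.execute(
    "INSERT INTO accounts (owner, balance) VALUES (?, ?)", ("alice", 120.50)
)


def insert_with_extra_column(connection):
    """Try to write a field the schema does not define.

    Returns True if the database rejects the write.
    """
    try:
        connection.execute(
            "INSERT INTO accounts (owner, balance, nickname) VALUES (?, ?, ?)",
            ("bob", 10.0, "bobby"),
        )
    except sqlite3.OperationalError:
        # SQLite reports an unknown column as an OperationalError.
        return True
    return False


# Schema-free: each record carries whatever fields it happens to have.
documents = [
    json.dumps({"owner": "alice", "balance": 120.50}),
    json.dumps({"owner": "bob", "nickname": "bobby", "tags": ["new"]}),
]


def fields_seen(docs):
    """Collect every field name that appears in any record."""
    keys = set()
    for doc in docs:
        keys.update(json.loads(doc).keys())
    return sorted(keys)
```

The SQL table refuses a field it was never told about, which is the consistency Rose associates with banks and financial institutions. The JSON records accept any shape, which is the flexibility (and the loss of guaranteed consistency) she attributes to unstructured databases.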

She continues, “According to Bernard Marr, the first data center was built in 1965 by the US government to house 742 million tax returns and 175 million fingerprints. Since then government data has become one of the most reliable big data sources for research, and the practice of company investment in big data has become commonplace. Vendors like Amazon Web Services house massive volumes of public data, and others like Factual sell lucrative business data. If you have the integration and computing power, running regressions on massive amounts of data like this can offer valuable trends that save millions for large enterprises.”

Read more here.

photo credit: Flickr/ dirkcuys
