Steve Miller

Important Big Data Additions to the R Analyst’s Toolchest

Steve Miller · February 1, 2017 · March 14, 2017

After my partner read my last blog, Frequencies in R — Part 2, where I used R’s data.table and dplyr packages to construct performant frequencies procedures on an in-memory 27M+-row, 30-attribute data.table, he asked if I’d compared my results with the equivalent functionality of R’s MonetDB.R package. I have […]

Frequencies in R — Part 2

Steve Miller · January 4, 2017 · January 3, 2017

In last month’s blog, I compared several functions that compute frequencies and crosstabs in R. The ones I’ve worked with primarily, and the foci of Part 1, were table from the base package, xtabs from the stats package, and count from Hadley Wickham’s plyr package. Tests were conducted on a data set […]

Frequencies in R — Part 1

Steve Miller · December 7, 2016 · December 7, 2016

I’m often asked to name the most common statistical procedure used in my company’s Data Science work. My answer, only partly in jest, is frequencies and crosstabs — to help with the mundane tasks of profiling and exploring data. Indeed, frequency distributions and the dotplots that showcase […]

A Common File Format for Python Pandas and R Data Frames

Steve Miller · November 2, 2016 · November 1, 2016

I’ve been doing analysis on a Chicago Crime data set off and on for the last few months, using the now ubiquitous Jupyter Notebook to manage my work. Trouble is, I like to switch between data science language leaders R and Python, using the best of each for data munging, […]

Efficient Machine Learning in H2O with R and Python, Part 1

Steve Miller · October 5, 2016 · October 3, 2016

One of the major benefits of working with R and Python for analytics is that there are always new and freely available treats from their vibrant open source ecosystems. And now more and more, data scientists are able to reap the benefits of working with data in R, Python […]

Identifying and Deleting “Empty” Columns in R data.frames

Steve Miller · September 7, 2016 · September 6, 2016

Toward the end of last month’s blog on SAS, R, Python, and WPS, I mentioned a current project challenge of identifying and eliminating “mostly” null columns from wide SAS data sets. As the team discovered, such columns can impose a significant drag on performance. My take is that while […]

SAS, R, or Python – Enter World Programming System (WPS)

Steve Miller · August 3, 2016 · August 3, 2016

With more than a little serendipity, I came across a report detailing the results of the third annual survey by Burtch Works Executive Recruiting, entitled “SAS, R, or Python Survey 2016: Which Tool Do Analytics Pros Prefer?” The survey asked each respondent to name the single […]

My Stock Market Index Dashboard with R, Plotly, and the Plotly Cloud

Steve Miller · July 6, 2016 · July 6, 2016

It’s been difficult for me to ponder my 2016 stock index dashboard this week, the markets roiled by the turmoil of Brexit taking a toll on my fragile investment psyche. Alas, I dutifully update the underlying data and run the visualization daily, hoping for the best […]

New Jobs Analysis with Python

Steve Miller · June 1, 2016 · June 1, 2016

The presidential race is heating up as primaries come to an end. And if it’s Trump vs. Clinton, there’ll be no shortage of strong opinion among the electorate as to which offers the best policies for economics, defense, energy, health care, etc. Last year I posted […]

Web Scraping for Data Science — Part 2

Steve Miller · May 4, 2016 · April 27, 2016

Read Part 1 of this blog series here. Between R and Python, analytics pros are covered on most data science bases. In last month’s blog, I discussed simple web scraping using Python in a Jupyter notebook, the nifty CSS selector-generating tool SelectorGadget, and the Python XML and HTML handling package lxml. […]