Click to learn more about author Steve Miller.
It seems I write something on the nature of Data Science every year. Early on, my take on DS was in motion, but now it’s much more grounded.
My points of Data Science departure are the iconic “what is DS” pronouncements from Drew Conway and David Donoho. For Conway, DS is the intersection of Substantive Expertise, Math & Statistics Knowledge, and Hacking Skills. Donoho sees “greater Data Science”, GDS, as the combination of Data Exploration and Preparation, Data Representation and Transformation, Computing with Data, Data Modeling, Data Visualization and Presentation, and Science about Data Science.
Expanding a bit on Conway and reorganizing Donoho, I now define Data Science as Technology/Computing, Quantitative Methods/Models, Substance/Business, and Science/Research Methodology. Technology/Computing and Quantitative Methods/Models are what one would expect. Substance/Business is equivalent to Conway’s Substantive Expertise and includes, for example, vertical business knowledge in insurance, financial services, and health care, and/or horizontal expertise in marketing, accounting, and supply chain. My Science/Research Methodology looks a lot like Donoho’s Science about Data Science, and includes knowledge of research methodology and design, along with expertise in both conducting research and delivering readily shareable, reproducible results.
Several years ago I attended a Strata Data Science round-table run by three well-known PhD Data Scientists – one in Math, one in CS, and one in Stats. Asked about the optimal academic background for aspiring Data Scientists, all three were guarded, opining that quality Data Scientists come from many disciplines, and that it’s as much the individuals and the work environments as the grad school programs that make quality practitioners. One did offer, however, that he felt the Computational Social Science discipline was probably the closest to Data Science in academia today.
That especially resonated with me, having been trained in Quantitative Social Science, CSS’s progenitor, many years ago. Back then, of course, computational capacity was limited, but I had the good fortune that my career tracked with quickly emerging technology. In fact, I now see most of my 40-year work background through the lens of computational social science.
Probably no one has done more for the development of CSS than Gary King, Director of the Institute for Quantitative Social Science at Harvard. The IQSS has blazed trails in CSS over the last 15 years, helping to change the landscape of social science methodology and research design in the directions of big data, new analytic techniques, and computation.
“In the last half-century, the information base of social science research has primarily come from three sources: survey research, end-of-period government statistics, and one-off studies of particular people, places, or events. In the next half-century, these sources will still be used and improved, but the number and diversity of other sources of information are increasing exponentially and are already many orders of magnitude more informative than ever before…. big data is not only about the data; what made it all possible are the remarkable concomitant advances in the methods of extracting information from, and creating, preserving, and analyzing those data and the resulting theoretical and empirical understanding of how individuals, groups, and societies think and behave.”
Indeed, the IQSS has played a leadership role in the evolution of social science research methodology that is foundational to Data Science. “Over the last 20 years, political methodology has built a bridge to the discipline of statistics and the methodological subfields of the other social sciences (such as econometrics, sociological methodology, and psychometrics)… Political science graduate students are now trained at a high enough level in political methodology so that they can move from the end of a sequence in political science directly into advanced courses in these other fields. Students in these other fields.” In short, many students graduating with Master’s degrees and PhDs in the social sciences are ready to immediately assume positions as Data Scientists – often to the chagrin of academia.
I was reminded of these developments when I came across a new Masters in Computational Social Science curriculum at the University of Chicago. Several years ago, I wrote on a then-new UofC program on computational public policy that impressed me a lot. I arranged a call with MCSS program faculty Chad Cyrenne and Ben Soltoff, asked questions related to my four pillars of Data Science noted above, and liked what they had to say.
The MCSS is a meaty two-year, thesis-based program that appears to align closely with my thoughts on Data Science. In the first year, students take a three-course computation sequence taught by computer science faculty. Check the Technology/Computing box. There’s also the three-course perspectives sequence that focuses on research methodology, so check off Science/Research Methodology as well. Then there are three approved courses in math/stats, covering Quantitative Methods/Models. Finally, in year two, students take three computational social science electives in addition to a three-course thesis sequence; in tandem, these address Substance/Business and embellish the other categories as well.
The MCSS is part of a growing cross-university collaboration in Data Science at the UofC. At that same Strata, I listened to a talk by Michael Franklin, a UC Berkeley professor who’d just accepted a position at the UofC. And there’s the Computation Institute, the Center for Data and Applied Computing, and the Center for Data Science and Public Policy, among other UofC DS initiatives.
The first MCSS cohort graduates in June. Starting this Fall, each new class will include roughly 35 students, half of whom are women, with a notable mix of foreign candidates. About 50% of students will come from the social sciences, the remainder from business and STEM. Roughly half the graduates will continue on for a PhD, while half go immediately into the Data Science work world. I expect the latter to be in high demand.
To my thinking, MCSS conceptually compares favorably to the many one- to two-year Masters in Analytics/Data Science programs extant today. With a full two-year, thesis-based curriculum from a social science powerhouse like the UofC, I believe MCSS grads will likely be more grounded in Science/Research Methodology than their Masters in DS peers, allowing them to progress quickly in their careers. After all, Data Science is science, and I’ve repeatedly seen the benefit of hiring students with strong research and methodology backgrounds.
I am, however, a bit hesitant to unconditionally recommend MCSS grads for hire right now – deviously hoping to hoard a few for my consultancy before word spreads too far.