We’re living in a supposedly post-industrial era. So it can feel a bit odd to refer to the need to bring “industrialization,” an “assembly line” approach, or “factory”-like processes to the development of business analytics.
Behind any such discussion, the specter of automation usually lurks, as the author of this recent O’Reilly article makes explicit in her first sentence. But I think she greatly overstates the importance of automation in the industrialization of any function. She doesn’t go so far as to predict massive unemployment for human data scientists, but one wouldn’t be surprised to see others read that into her otherwise excellent discussion of trends in business analytics.
Contrary to what many believe, automation isn’t the essence of industrialization. You can achieve many of the same benefits without turning personnel into de-skilled drones doomed to stoke a heartless machine. Even in enterprises that we all take as the very definition of industrial, such as manufacturing, mining, and milling, a wide range of human functions remains indispensable. Fundamentally, industrializing any business function, even one we associate with the post-industrial economy, involves organizing it to achieve greater scale, speed, efficiency, and predictability.
With that in mind, you can industrialize enterprise analytics by reconstituting its operational processes around the following core principles:
- Role specialization: This has been the core principle of industrial production since long before there were machines to do the work. The key specializations in an industrialized analytics process are data scientists, data engineers, application developers, business analysts, and subject-domain specialists. As I discussed in this recent post, role specialization within collaborative environments enables team members to pool their skills and specialties in the exploration, development, deployment, testing, and management of machine learning and other data-driven business-analytics models.
- Workflow patterning: This is the foundation of scale, speed, efficiency, and quality control in any industrial process. The principal patterns in the analytics lifecycle are standardized tasks, flows, and rules governing the creation and deployment of machine-learning models, statistical algorithms, and other repeatable data/analytics artifacts. As I discussed in this recent post, the primary workflow patterns fall into categories such as data discovery, acquisition, ingestion, aggregation, transformation, cleansing, prototyping, exploration, modeling, governance, logging, auditing, and archiving. In a typical enterprise analytics practice, some of these patterns may be largely automated, while others fall toward the manual, collaborative, and agile end of the spectrum (a minimal sketch of one such patterned workflow appears after this list).
- Tool-driven acceleration: This is what people usually associate with task automation. As I discussed in this recent post, many analytics lifecycle processes are being accelerated through cloud-based tools and platforms such as massively parallel Spark runtime engines; through unified workbenches for fast, flexible sharing and collaboration within analytics development teams; and through on-demand, self-service collaboration environments that provide each specialist with tools and interfaces geared to their specific tasks. To ensure industrial-grade quality controls, collaboration tools should also enable the checks and balances needed to monitor and assure automated outputs. For example, data science teams should be able to create workflows for manual reviews of machine-generated models prior to their deployment into production applications (see the review-gate sketch after this list). This is analogous to how high-throughput manufacturing facilities dedicate personnel to test samples of their production runs before they’re shipped to the customer.
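To make the workflow-patterning idea concrete, here is a minimal sketch of what codifying a few standardized stages might look like. It is illustrative only: the stage names (ingest, cleanse, model) and the simple logged pipeline runner are hypothetical placeholders, not the API of any particular analytics platform.

```python
# Minimal sketch: standardized workflow stages run as a logged pipeline.
# Stage names and data shapes are hypothetical, for illustration only.
import logging
from typing import Callable, Dict, List

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("analytics-pipeline")


def ingest(data: Dict) -> Dict:
    # Placeholder: pull raw records from a source system.
    data["records"] = [{"amount": 120.0}, {"amount": -5.0}, {"amount": 300.0}]
    return data


def cleanse(data: Dict) -> Dict:
    # Placeholder: drop records that fail a simple validity rule.
    data["records"] = [r for r in data["records"] if r["amount"] >= 0]
    return data


def model(data: Dict) -> Dict:
    # Placeholder: produce a trivial "model" artifact (here, just a mean).
    amounts = [r["amount"] for r in data["records"]]
    data["model"] = {"mean_amount": sum(amounts) / len(amounts)}
    return data


def run_pipeline(stages: List[Callable[[Dict], Dict]]) -> Dict:
    # Logging each stage gives the auditing/logging pattern a natural hook.
    data: Dict = {}
    for stage in stages:
        log.info("running stage: %s", stage.__name__)
        data = stage(data)
    return data


if __name__ == "__main__":
    result = run_pipeline([ingest, cleanse, model])
    log.info("model artifact: %s", result["model"])
```

The point is not the toy logic but the shape: once stages are named and composable, they can be reused, logged, audited, and selectively automated or handed off to human specialists.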
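In the same spirit, here is a hedged sketch of the manual-review gate described above: a machine-generated model is registered as pending review and can be promoted to deployment only after a named human reviewer signs off. The registry class and its status values are invented for illustration; real model-management tools expose their own approval and promotion mechanisms.

```python
# Minimal sketch: a manual-review gate in front of model deployment.
# The ModelRegistry class and its status values are hypothetical.
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ModelRecord:
    name: str
    version: int
    status: str = "pending_review"  # pending_review -> approved | rejected
    reviewer: Optional[str] = None


class ModelRegistry:
    def __init__(self) -> None:
        self._models: Dict[str, ModelRecord] = {}

    def register(self, name: str, version: int) -> ModelRecord:
        record = ModelRecord(name=name, version=version)
        self._models[f"{name}:{version}"] = record
        return record

    def approve(self, name: str, version: int, reviewer: str) -> None:
        record = self._models[f"{name}:{version}"]
        record.status = "approved"
        record.reviewer = reviewer

    def can_deploy(self, name: str, version: int) -> bool:
        # Deployment stays blocked until a human reviewer has signed off.
        return self._models[f"{name}:{version}"].status == "approved"


if __name__ == "__main__":
    registry = ModelRegistry()
    registry.register("churn-classifier", 3)
    assert not registry.can_deploy("churn-classifier", 3)  # blocked before review
    registry.approve("churn-classifier", 3, reviewer="data-science-lead")
    assert registry.can_deploy("churn-classifier", 3)      # promoted after sign-off
```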
Increasingly, the coinage “InsightOps” is being used to refer to the intensifying industrialization of enterprise data-science processes. Check out an excellent recent column by my colleagues Tim Vincent and Bill O’Connell on this topic. Clearly modeled on the established DevOps paradigm, the term refers to a development and deployment lifecycle for data-driven enterprise analytics assets that is agile enough to incorporate shifting blends of automated, collaborative, and self-service processes.
As this trend picks up speed, and even as automation pushes deeper into core analytics functions, enterprise analytics developers will be in even greater demand. Industrialization of their work processes will free them to focus on problem solving, domain knowledge, interactive exploration, and application development.