
Data Science vs. Domain Expertise: Who Can Best Deliver Solutions?


The passionate debate nowadays is not whether data scientists can deliver business solutions, but rather whether domain experts play a major role in the delivery of such solutions. On one end of the spectrum, thought leaders seem to feel that domain experts must be involved at every stage of the design, development, and implementation of a machine learning system. On the other end of the spectrum, KDnuggets and Kaggle have repeatedly proved that expert solutions can be built and tested for performance without the intervention of domain experts. The ideal position lies somewhere between these two extremes.

Domain Experts Know Their Business

In cases where vast amounts of transactional or process data are stored in Word files or Excel worksheets, domain experts must be available to interpret the esoteric operational or procedural information, so that data scientists can gain a better understanding of business processes within particular domains. Without this deep knowledge or insight into domain operations, data scientists cannot provide custom solutions tailored for highly specific business tasks or operational decisions.

This debate is not a new one. As Jeremy Howard of Kaggle.com pointed out back in 2013, ordinary business users tend to draw unfounded conclusions from their Excel spreadsheets (along with other legacy analytics tools) and influence Data Scientists with retrofitted data interpretations. So, in a way, Domain Experts can limit the capability of a good Machine Learning system. Thus, these experts are better utilized for their understanding of business issues and practical judgment than for data interpretation. Moreover, domain expertise has proved to play a solid role in defining classes, developing data dictionaries, designing training for algorithms, and evaluating the business outcomes of such training among business users. The data discovery part of Machine Learning is best left to the Data Scientists, who enable data to guide users to data-driven insights.

While data technologies in general are going mainstream, aligning the goals of Domain Experts and Data Scientists may be an ongoing challenge. The more Data Scientists and Domain Experts learn to share and move towards common business objectives, the larger the impact data systems or solutions will have on day-to-day business operations.

Is Lack of Domain Knowledge a Handicap for Data Scientists?

In the article Do Data Scientists Need to be Domain Experts, Bhavani Raskutti (with over 30 years of experience) wistfully admits that most of her clients remain skeptical of her capabilities as an outsider to the industry domain. She affirms that strong Machine Learning skills can create an effective channel to work with Domain Experts to deliver data-driven business solutions.

What continues to be baffling is that, amidst this controversy over the necessity of domain knowledge in building data systems, data mining competitions organized by Kaggle or KDD have proved year after year that Data Science can deliver solutions without the presence of Domain Experts. In these competitions, Data Scientists with minimal or no domain knowledge have submitted excellent entries. In fact, well-known Data Scientists like David Vogel and Claudia Perlich have won competitions across different domains, establishing that solid Data Science skills are universally applicable.

There is one counterargument to these success stories. In many of these competitions, the Domain Experts had provided the initial business hypothesis by asking the appropriate questions and preparing the data. The competitors then took that material, developed a model, and tested its performance. Google has demolished this reliance on a business hypothesis too by offering an alternative method for understanding businesses in the absence of a hypothesis.

Another school of thought argues that Domain Experts, when involved, should participate iteratively rather than in a fixed sequence of steps. So where is the value of Domain Experts most likely to be perceived during a Data Science project? Many times, while capturing and exploring raw data, Data Scientists are stumped by gaps or anomalies in the collected data, which Domain Experts, with their practical wisdom and operational experience, can easily fill in. During this dual-engagement process, good Data Scientists mature into Domain Experts over a period of time.

Data Visualization and Working Together

Another important role that Domain Experts can play is during data visualization, when data is seen and interpreted for rare insights. An example of this was found during the study of sensor and maintenance data in an airline fleet. Although no prior model existed, an interpretive analysis of the results of path analysis led to improved understanding of aircraft safety conditions, which would not have been possible without sound domain expertise.

Two safe conclusions can be drawn from the above discussions on the cross-functional significance of both Data Science and Domain Expertise in developing robust solutions:

  1. So long as Machine Learning equips Data Scientists to ask relevant questions about a domain, direct collaboration between Data Scientists and Domain Experts will not only enrich both parties with new knowledge, but also strengthen the value of their partnership. It is not Data Science vs. Domain Expertise, but Data Science and Domain Expertise.
  2. Machine Learning offers an alternative mode of learning that requires no prior domain knowledge, thus easily overcoming domain biases.

Data Scientists with strong Machine Learning skills and an analytical mind can quickly grasp and solve business problems by exchanging and sharing their acquired domain learning with Domain Experts at different stages of system development. The problem isn’t an either/or issue, but rather requires both parties to come to the table.

The remaining issue for Data Scientists is proper skills training in advanced technologies. As far back as 2011-2012 it was argued that what the industry needed was not more Data Scientists, but Data Scientists with advanced data technology skills such as Big Data and Machine Learning. A McKinsey survey demonstrated that most businesses do not have the skilled manpower to take advantage of cutting-edge data technologies. This problem still exists today for many enterprises, even after the upsurge in Data Scientists entering the workplace, and it’s not likely to go away anytime soon. Thus, modern Data Scientists have to become more tech savvy, serve as moderators between technologies like Hadoop, NoSQL, and R, and deliver timely data-rich information and insights to business leaders. The Domain Experts can aid in the visualization and explanation of the insights, but the Data Scientists also need the ability and training to present them in a comprehensible manner.

The Collaborative Strength of Data Science and Domain Expertise

Finally, the undisputed fact is that Domain Experts run the daily business; so if Data Scientists succeed in providing an advanced, data-enabled decision machine to these business experts when they need it and where they need it, then the Data Scientists have proved their worth. The ideal solution may be to create templates for standard data inputs for data capture, connect the data tools for seamless analytics activities, and provide excellent visualization platforms like dashboards for quick and effective decision making. These template-driven solutions can equip Domain Experts to directly input necessary data and arrive at results on their own.

When Domain Experts have ready-made Machine Learning systems at their disposal, they can select any standard domain-specific analytics package available in the market to study the data trends and patterns and gain hidden insights. The Domain Expert’s greatest strength is the ability to identify which questions need to be answered, and the Data Scientist’s role is to maneuver and leverage advanced data technologies to build expert systems to answer those questions.

All said and done, the ultimate goal of Domain Experts ought to be gradually reducing their reliance on Data Scientists, as too much dependence on Data Science can result in the same bottlenecks that existing processes and data silos suffer from. This ongoing tug of war between two contradictory views remains unresolved, but a strong collaboration is best for everyone in the end.
