Sentiment analysis is not entirely new. Businesses have been attempting to generate insights from various pieces of content for some time now. With automated data collection becoming more accessible, sentiment analysis can be done by nearly anyone.
While it’s often used by research and development departments for product analysis or by financial institutions to drive investment decisions, these aren’t the only use cases. It has been widely used in academic and political research, for example, to measure inflation or track lobbying across the United States.
Some use cases, however, run into an issue of impact. If, as some researchers have done, you try to predict the outcome of an election through Twitter sentiment, some accounts will clearly be more impactful than others, even if everything else is equal.
How Is Sentiment Measured?
While we can usually judge the sentiment of text-based content intuitively, imparting that knowledge to machines is a fairly complicated task. Nowadays, machine learning is used most of the time.
Usually, a training dataset is created with a significant amount of data (sentences) labeled as positive, neutral, or negative. The machine learning model then learns correlations between syntax, word choice, and sentiment.
Once training is complete, the model can accept sentences or paragraphs of text and evaluate the sentiment hidden within them. This is currently the preferred method for sentiment analysis.
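To make the learning step concrete, here is a minimal sketch of the idea using a bag-of-words Naive Bayes classifier written from scratch. The tiny training set, lexicon, and labels are illustrative assumptions, not a production model or dataset:

```python
import math
from collections import Counter, defaultdict

# Hypothetical labeled training data (a real dataset would be much larger).
TRAIN = [
    ("the product is great and works well", "positive"),
    ("i love the fast friendly service", "positive"),
    ("terrible quality and awful support", "negative"),
    ("the app crashes and i hate it", "negative"),
]

def train(examples):
    """Count word frequencies per label to estimate P(word | label)."""
    word_counts = defaultdict(Counter)  # label -> word frequencies
    label_counts = Counter()
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in text.split():
            word_counts[label][word] += 1
            vocab.add(word)
    return word_counts, label_counts, vocab

def classify(text, word_counts, label_counts, vocab):
    """Pick the label maximizing log prior + log likelihood (Laplace smoothing)."""
    total = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.split():
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train(TRAIN)
print(classify("great service i love it", *model))  # prints "positive"
```

Production systems use far richer models, but the principle is the same: the model picks up which words and patterns co-occur with which labels.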
There are various powerful tools you can access for extremely low prices. One of the most popular options is Google Cloud Natural Language AI, which can take huge volumes of data and provide not only sentiment analysis but also entity extraction, syntax analysis, and topic categorization.
Rule-based systems have been used to build sentiment analysis algorithms as well. These, however, can struggle when complex language appears. For example, contrastive conjunctions (“The food was great, but the service was awful”) or anaphora (“The music stopped, and that upset everyone”) are harder to integrate into handcrafted rules.
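A rule-based system can be sketched as a lexicon lookup plus special-case rules. The toy scorer below adds one such rule, a heuristic (assumed here, not a standard) that the clause after “but” carries the dominant sentiment; the word lists are illustrative:

```python
# Tiny illustrative sentiment lexicon (a real one would hold thousands of words).
POSITIVE = {"great", "good", "excellent", "love"}
NEGATIVE = {"awful", "bad", "terrible", "hate"}

def clause_score(clause):
    """Count positive words minus negative words in a clause."""
    words = clause.lower().replace(",", "").split()
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in words)

def rule_based_sentiment(sentence):
    if " but " in sentence.lower():
        before, after = sentence.lower().split(" but ", 1)
        # Heuristic: the contrastive clause after "but" gets double weight.
        score = clause_score(before) + 2 * clause_score(after)
    else:
        score = clause_score(sentence)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(rule_based_sentiment("The food was great, but the service was awful"))
# prints "negative": the post-"but" clause dominates
```

Every such linguistic pattern needs its own handwritten rule, which is exactly why these systems grow complicated as language complexity rises.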
Yet, none of these approaches will give you the context of sentiment. If you work in, say, a PR department that measures the perception of a brand, you’d likely want to know how impactful sentiment could be, especially if negative press coverage is in play.
The Case for SERPs
Search engines have ranking algorithms that assess a large number of factors before providing specific results. Google, in particular, has some of the most complicated algorithms out there. A common thread, however, is that search engines measure trustworthiness.
We know, and I use the word with some caution, that Google evaluates a website’s expertise, the number of links pointing back to it, domain age, and a multitude of other factors before granting it a specific position in the SERP. In other words, Google weighs the quality of any content before it reaches the results page.
Given the widespread use of search engines today, we can assume that anything ranking highly in search engine results pages will have a greater impact than something sitting at the bottom of the page. We don’t talk about page 2.
In other words, Google and other search engines do content weighting for us. All we have to do is harness the power of SERP results when doing sentiment analysis.
Sentiment Weighting
Following the approach outlined above, we can use existing machine learning models and tools to perform sentiment analysis on any content we find online. Additionally, if it’s online and visible, it will be indexed by Google and other search engines.
There are some exceptions to the rule. Some websites might publish content in sponsored sections or disallow indexing, which would cause Google either to rank those pages lower than usual or not list them at all.
These, however, are exceptions rather than the rule. In most cases, searching for a keyword will produce all-encompassing SERP results, and those results will often contain the very snippets needed for sentiment analysis.
I can foresee several ways to build a weighting system. One is to assign flat values to ranking positions (e.g., 1, 0.9, 0.8…) to represent perceived impact. A drawback of this approach is that it ignores context: some searches are so narrow that only scarcely visited blogs rank for them, and a flat scale would overstate their impact.
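The flat-value scheme can be sketched in a few lines. The weights (1.0, 0.9, 0.8, …, with an assumed floor of 0.1) and the example sentiment scores are illustrative:

```python
def rank_weight(position):
    """Flat weighting: position 1 -> 1.0, 2 -> 0.9, ..., floored at 0.1."""
    return max(1.0 - 0.1 * (position - 1), 0.1)

def weighted_sentiment(results):
    """results: list of (position, sentiment_score) pairs, scores in [-1, 1]."""
    total_weight = sum(rank_weight(pos) for pos, _ in results)
    return sum(rank_weight(pos) * s for pos, s in results) / total_weight

# Hypothetical page-one results: one strongly negative piece at position 1.
serp = [(1, -0.8), (2, 0.4), (3, 0.6)]
print(round(weighted_sentiment(serp), 3))  # prints 0.015
```

Note how the negative result at position 1 nearly cancels two positive results below it, which is the intended effect of weighting by rank.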
Such a weighting system would need to be enriched with third-party data. Luckily, SEO tools have that ground covered. Most of them (e.g., Ahrefs, Mangools) provide estimates of organic traffic. Each tool calculates these differently, as they are all estimates, but the variance is usually not high.
Pulling organic traffic data and combining it with a SERP-based weighting system would give a fairly accurate reflection of most sentiment, outside of social media pages. For social media, follower data from another source would be necessary to evaluate reach.
Static values, however, can produce a lot of noisy data, and other approaches might be more apt. An important property of SERPs is that traffic is not distributed evenly across positions; the first and second results get the lion’s share. While it varies, common estimates suggest the first position receives 30–40% of clicks and the second 15–20%.
Additionally, other data enrichment practices could provide even more accurate results. Google Trends and monthly search volume data both indicate how often a keyword is searched. Combining click distribution, rank, and that search volume gives an estimate of how many clicks a SERP result would receive.
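Putting those pieces together, here is a sketch of a click-based weighting. The click-through-rate curve echoes the rough 30–40% / 15–20% figures above but is an assumption, as are the example search volume and sentiment scores:

```python
# Assumed click-through rates per SERP position (illustrative, not measured).
CTR_BY_POSITION = {1: 0.35, 2: 0.17, 3: 0.11, 4: 0.08, 5: 0.06,
                   6: 0.05, 7: 0.04, 8: 0.03, 9: 0.03, 10: 0.02}

def estimated_clicks(position, monthly_search_volume):
    """Estimate clicks a result gets: search volume times its position's CTR."""
    return monthly_search_volume * CTR_BY_POSITION.get(position, 0.01)

def click_weighted_sentiment(results, monthly_search_volume):
    """results: list of (position, sentiment_score) pairs, scores in [-1, 1]."""
    clicks = [estimated_clicks(pos, monthly_search_volume) for pos, _ in results]
    total = sum(clicks)
    return sum(c * s for c, (_, s) in zip(clicks, results)) / total

# Hypothetical keyword with 10,000 monthly searches.
serp = [(1, -0.8), (2, 0.4), (3, 0.6)]
print(round(click_weighted_sentiment(serp, 10_000), 3))  # prints -0.232
```

Compared with the flat scheme, the steep click curve lets the top result dominate, which better reflects how many people actually see each piece of content.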
Conclusion
I have yet to see anyone suggest using search engine algorithms to enrich other methods of data analysis. Sentiment is one case where the innate nature of search engines can be highly beneficial: they provide an easily accessible window into how impactful specific messages might be.