
Social media analytics, competitive intelligence and sampling bias

  • Writer: Marco Scacchi
  • Sep 24, 2020

Updated: Sep 26, 2020

...and the need to understand the history behind data.


Data is often seen as a panacea for business problems, and we often hear that in the near future big data will become the basis of competition and the key driver of business growth and innovation.


The emphasis on big data and advances in data analytics have, in turn, spurred an increase in the number of organizations integrating (big) data capture and analysis into their decision-making processes. These 'big data' come from a variety of sources, and social media platforms such as Twitter and Facebook are among the most popular.


Thanks to these developments, businesses now have unparalleled access to data about their customers, and this information is used to help detect opinions on specific products or services, analyze consumer behaviors, monitor competitors and identify early warning signs (EWS).


Given these capabilities, it is no wonder that social media analytics has become an essential part of competitive intelligence (CI) processes in many companies. However, while many of us would agree that data should be an integral part of CI and of any business strategy, more data isn't automatically a positive thing. There is another side of the story that we need to be aware of when making strategic decisions using data gathered from social media platforms.


The use of big data from social networks suffers from a series of pitfalls, of which sampling bias and respondent bias are two of the most prominent and most often overlooked. These biases mean that using data from Facebook, Twitter, Instagram and other social media networks to infer what the general population thinks might produce skewed results. Why is this so? First, not everyone uses social media (sampling bias), and of those who actively use it, not all are honest when interacting online (respondent bias).


Moreover, social media networks differ significantly from one another in terms of their users' demographics and social characteristics. Some platforms, like Instagram, mainly cater for young people, whereas others, while reaching wider audiences, are heavily skewed in terms of gender, age, race, religion and income, or are not used by a significant percentage of people. As a consequence, data obtained through social media are often not representative of the general population. They might show different trends compared to customer data gathered through other methods, such as written or phone surveys, which are usually rigorously designed to minimize bias. These discrepancies do not occur because customers have suddenly changed their attitudes, but rather because of the nature of the networks themselves.
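To make the effect of a demographic skew concrete, here is a minimal sketch in Python. Every number in it is hypothetical and chosen only for illustration: the age groups, the population and platform shares, and the approval rates are assumptions, not measured data. It compares a naive estimate taken straight from a skewed platform sample with one reweighted to the population's age mix (a simple form of post-stratification).

```python
# Hypothetical illustration: all shares and approval rates below are made up.

# Assumed share of each age group in the general population.
population_share = {"18-29": 0.20, "30-49": 0.35, "50+": 0.45}

# Assumed share of each age group among users sampled from a platform
# that skews young.
sample_share = {"18-29": 0.55, "30-49": 0.35, "50+": 0.10}

# Assumed share of each group that speaks favourably about a product.
approval_by_group = {"18-29": 0.70, "30-49": 0.50, "50+": 0.30}


def weighted_mean(rates, shares):
    """Average the group rates, weighting each group by its share."""
    total = sum(shares.values())
    return sum(rates[group] * shares[group] for group in rates) / total


# Naive estimate: groups are weighted as they appear on the platform.
naive_estimate = weighted_mean(approval_by_group, sample_share)

# Adjusted estimate: groups are reweighted to match the population.
adjusted_estimate = weighted_mean(approval_by_group, population_share)

print(f"Naive estimate from the platform sample: {naive_estimate:.2f}")
print(f"Estimate reweighted to the population:   {adjusted_estimate:.2f}")
```

Under these assumed numbers the naive figure overstates approval simply because younger users are over-represented. Reweighting of this kind only helps when every group is present in the sample and its members answer honestly, so it does not address respondent bias or groups that are missing from a platform altogether.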


Just to put things into perspective, while around 71% of all US adults use Facebook, only 32% of all Americans use Instagram and less than 20% use Twitter. User bases of this size and demographic makeup are far from a representative sample of the population. The use of these sources to support a competitive intelligence function therefore raises essential questions of representation. Demographic or social groups may behave differently online and offline, and may be under-represented, or not sampled at all, when an analysis is carried out using data from social platforms.


While advances in data analytics represent a clear business opportunity, it is crucial to understand that big data from social media might not always provide the whole picture. Businesses should use a mix of new and traditional data-gathering techniques to minimize bias and extract meaningful and actionable insights.


Those relying only on social media analytics must take care to understand how the data was generated in order to make the right decisions. Otherwise, the risk is having a vast amount of data but too little that is actually useful!

 
 
 
