7.1 What do network interactions reveal about stance?
During this study, we conducted user-level stance detection and opted to use two stance classes (with/against) instead of three (neutral/with/against). This decision was influenced by
Figure 5: Importance of network features shown using SHAP Beeswarm Plots for different user groups based on their main spoken language: (a) All Users; (b) English-speaking Users; (c) Spanish and Portuguese-speaking Users; (d) Other European Language Users; (e) West Asian Language Users; (f) Remaining Language Users. The network feature suffixes are: fr for Friend, rp for Replied to, li for Liked, and rt for Retweeted. Each dot represents a separate user, with Pro-vax users shown in blue and Anti-vax users in red.
the significant online polarisation observed during the COVID-19 pandemic, which was further supported by our prediction and top feature overlap results. The underlying notion is that even if users do not generate sufficient signals, such as text postings or network interactions, to reveal their COVID-19 viewpoints, they still exhibit a high likelihood of being biased towards either pro-vax or anti-vax stances.
We employed a data collection methodology that involved three distinct iterations, leveraging image and hashtag labeling. To ensure robust identification of users with clear stances, we required that pro-vax or anti-vax hashtags appear at least three times per user. This criterion focused the dataset on users who consistently expressed their position through repeated use of specific hashtags: a user with only one or two such hashtags might not hold a clear stance and could be engaging for reasons other than personal belief. A threshold of three thus enhanced the reliability of our classification by prioritising users with a pronounced and consistent alignment with either pro-vax or anti-vax content.
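The thresholding step above can be sketched as follows. This is a minimal illustration, not the study's actual pipeline; the hashtag seed sets are hypothetical placeholders standing in for the real pro-vax and anti-vax lists.

```python
from collections import Counter

# Hypothetical hashtag seed sets (illustrative only, not the study's lists).
PRO_TAGS = {"#getvaccinated", "#vaccineswork"}
ANTI_TAGS = {"#novaccine", "#vaccineinjury"}

def label_user(tweets, threshold=3):
    """Label a user only when one side's hashtags occur at least
    `threshold` times and strictly outnumber the other side's."""
    counts = Counter({"pro": 0, "anti": 0})
    for text in tweets:
        for token in text.lower().split():
            if token in PRO_TAGS:
                counts["pro"] += 1
            elif token in ANTI_TAGS:
                counts["anti"] += 1
    if counts["pro"] >= threshold and counts["pro"] > counts["anti"]:
        return "pro-vax"
    if counts["anti"] >= threshold and counts["anti"] > counts["pro"]:
        return "anti-vax"
    return None  # ambiguous or insufficient signal: user excluded
```

A user with only two stance hashtags falls below the threshold and is excluded, which is exactly the filtering behaviour the criterion is meant to enforce.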
The collected user profiles encompassed 31 different detected languages, with English being the predominant language, spoken by approximately 90% of users. However, this does not imply that the majority of users were monolingual: many profiles indicated proficiency in multiple languages, highlighting the linguistic diversity within the dataset. Interestingly, Indian accounts emerged as prominent advocates for vaccination within our dataset. India's prominence can be attributed to various factors, including its large population, active social media presence, and specific initiatives and campaigns related to COVID-19 vaccination. This significance was further reflected in the top extracted features, with over 30% of the prominent pro-vax features corresponding to Indian accounts.
The high polarisation between pro-vax and anti-vax users was demonstrated by the overlap of the top extracted features: no network accounts were shared between the two classes among the top 10k features. This was further evidenced by the general top features table, where each user class had distinct colour codings. Pro-vax users tended to interact with official health and global organisations such as the World Health Organization (@who), UNICEF (@unicef), and the United Nations (@un). Anti-vax users, on the other hand, engaged with accounts that aligned more closely with their conservative viewpoints.
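The overlap check described above amounts to intersecting the top-k features ranked by importance for each class. A minimal sketch, assuming feature importances are available as plain `{feature: score}` dictionaries (the actual study used SHAP values, not reproduced here):

```python
def topk_overlap(importance_a, importance_b, k=10_000):
    """Jaccard overlap between the top-k entries of two
    feature-importance dicts (feature name -> score)."""
    def top(imp):
        ranked = sorted(imp, key=imp.get, reverse=True)
        return set(ranked[:k])
    a, b = top(importance_a), top(importance_b)
    return len(a & b) / len(a | b) if (a or b) else 0.0
```

An overlap of 0.0 between the two classes' top-10k network features corresponds to the complete separation reported above.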
Moreover, the results of the overlap between the top features indicate that a user’s stance is strongly linked to the type of network interaction they have with the target account (author). Friendship, retweets, and likes generally signify a positive stance towards the author’s viewpoints, while replies indicate a more controversial stance that may require further understanding.
Extracting the top features revealed some unexpected results regarding the influence and impact of certain celebrities. Of particular interest was the anti-vax signal associated with celebrities such as @elonmusk, @jordanbpeterson, and @joerogan. Upon further investigation, we found that this signal stemmed from these celebrities' publicly sceptical attitudes. For instance, @elonmusk opposed remote work mandates and questioned the efficacy of PCR tests, while @jordanbpeterson objected to vaccine passport declarations made by @justintrudeau.
Conducting a more in-depth analysis of features based on users’ spoken language, categorised by region, revealed the distinctiveness of each language group. Notably, influential accounts were closely tied to the language of the respective group, underscoring significant variations in influential accounts across different language-speaking groups. However, it also highlighted instances where English-speaking accounts exhibited global influence on the stance classification, such as @conspiracyb0t. We interpret this phenomenon as being attributable to the global nature of the COVID-19 discourse, transcending linguistic boundaries. Additionally, many engaged users in this topic demonstrated a sufficient understanding of English, enabling interaction with accounts in their non-native languages.
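The per-group analysis above requires bucketing each user's main detected language into a region group matching the Figure 5 panels. A minimal sketch, in which the ISO-639-1 code assignments are hypothetical and do not reproduce the paper's exact mapping:

```python
# Hypothetical ISO-639-1 code groupings mirroring the Figure 5 panels;
# the study's actual assignment is not reproduced here.
LANG_GROUPS = {
    "en": "English",
    "es": "Spanish/Portuguese", "pt": "Spanish/Portuguese",
    "fr": "Other European", "de": "Other European", "it": "Other European",
    "ar": "West Asian", "tr": "West Asian", "fa": "West Asian",
}

def language_group(iso_code):
    """Bucket a user's main detected language into an analysis group."""
    return LANG_GROUPS.get(iso_code, "Remaining")
```

Languages outside the mapped set fall into the "Remaining" group, mirroring the catch-all panel (f) in Figure 5.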
7.2 Global Signals for COVID-19 Pandemic
In conclusion, our research aimed to address three key research questions concerning the creation of an effective stance model for a global topic like COVID-19.

Firstly, we explored the possibility of developing a single stance model capable of classifying users from diverse regions and languages based on their pandemic-related stance. Our findings demonstrated that such a model is indeed feasible, highlighting the existence of commonalities in users' stances across different cultural and linguistic backgrounds.

Secondly, we investigated the performance of different feature sets in stance prediction and examined whether our findings aligned with previous studies. Our results showed the significance of network interactions as powerful signals for stance detection, corroborating prior research in the field.

Lastly, we sought to uncover the main signals common to users worldwide that enabled the classifier to make accurate decisions. Through our analysis, we identified friendship connections, retweets, and likes as key indicators that consistently signal a positive stance towards the author's viewpoints. Replies, by contrast, emerged as a more nuanced signal, representing a controversial stance requiring deeper understanding.

Overall, our research provides valuable insights into the creation of an effective global classifier for stance. By addressing these research questions, we contribute to the understanding of cross-cultural and multilingual stance detection, shedding light on the shared signals that facilitate accurate stance classification worldwide. These findings have implications for better understanding online discourse and public opinion surrounding global topics like COVID-19.
7.3 Limitations and perspectives
While our study provides valuable insights, it is not without limitations. Firstly, the reliance on Twitter data introduces inherent sampling bias, potentially skewing the findings towards the demographics and user behaviours present on the platform. This limits the generalisability of the results to the broader population and calls for cautious interpretation.

Secondly, the binary classification of users into pro-vax or anti-vax stances, while useful for simplicity and clarity, may oversimplify the nuanced spectrum of attitudes individuals hold towards vaccination. Many users occupy a middle ground, expressing neutral or ambivalent sentiments rather than definitively aligning with either pole.

Moreover, relying exclusively on network interactions may fail to capture the diversity of user sentiments accurately, particularly since users might engage with content for reasons beyond expressing personal beliefs, such as sarcasm, critique, or sheer curiosity. Where users have limited interaction content, the predictive accuracy of stance classification could be compromised by the multifaceted nature of user engagement and the potential misinterpretation of their true stance.

Finally, an additional limitation arises from our dependence on image annotation as the starting point of the data collection process. While images can be powerful for elucidating user stances, this approach may bias the dataset towards users with explicit stances. To address this, in future work we plan to implement hashtag- or keyword-based data collection. Leveraging multilingual resources, although potentially more labour-intensive, can be more effective for broadening the collected user data.