7.1 What do network interactions reveal about stance?
During this study, we conducted user-level stance detection and opted to use two stance classes (with/against) instead of three (neutral/with/against). This decision was influenced by
Figure 5: Importance of network features shown using SHAP Beeswarm Plots for different user groups based on their main spoken language: (a) All Users; (b) English-speaking Users; (c) Spanish and Portuguese-speaking Users; (d) Other European Language Users; (e) West Asian Language Users; (f) Remaining Language Users. The network feature suffixes are: fr for Friend, rp for Replied to, li for Liked, and rt for Retweeted. Each dot represents a separate user, with Pro-vax users shown in blue and Anti-vax users in red.
the significant online polarisation observed during the COVID-19 pandemic, which was further supported by our prediction and top feature overlap results. The underlying notion is that even if users do not generate sufficient signals, such as text postings or network interactions, to reveal their COVID-19 viewpoints, they still exhibit a high likelihood of being biased towards either pro-vax or anti-vax stances.
We employed a data collection methodology that involved three distinct iterations, leveraging image and hashtag labeling. To ensure robust identification of users with clear stances, we required that pro-vax or anti-vax hashtags appear at least three times per user. This criterion focused the dataset on users who consistently expressed their position through repeated use of specific hashtags: a user with only one or two such hashtags might not hold a clear stance and could be engaging for reasons other than personal belief. A threshold of three thus enhanced the reliability of our classification by prioritising users with a pronounced and consistent alignment with either pro-vax or anti-vax content.
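The thresholding step above can be sketched as follows. This is a minimal illustration, not the study's actual pipeline; the hashtag seed sets are hypothetical placeholders standing in for the real pro-vax and anti-vax lists.

```python
from collections import Counter

# Hypothetical hashtag seed sets (illustrative only, not the study's lists).
PRO_TAGS = {"#getvaccinated", "#vaccineswork"}
ANTI_TAGS = {"#novaccine", "#vaccineinjury"}

def label_user(tweets, threshold=3):
    """Label a user only when one side's hashtags occur at least
    `threshold` times and strictly outnumber the other side's."""
    counts = Counter({"pro": 0, "anti": 0})
    for text in tweets:
        for token in text.lower().split():
            if token in PRO_TAGS:
                counts["pro"] += 1
            elif token in ANTI_TAGS:
                counts["anti"] += 1
    if counts["pro"] >= threshold and counts["pro"] > counts["anti"]:
        return "pro-vax"
    if counts["anti"] >= threshold and counts["anti"] > counts["pro"]:
        return "anti-vax"
    return None  # ambiguous or insufficient signal: user excluded
```

A user with only two stance hashtags falls below the threshold and is excluded, which is exactly the filtering behaviour the criterion is meant to enforce.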
The collected user profiles encompassed 31 different detected languages, with English being the predominant language, spoken by approximately 90% of users. However, this does not imply that the majority of users were monolingual: many profiles indicated proficiency in multiple languages, highlighting the linguistic diversity within the dataset. Interestingly, Indian accounts emerged as prominent advocates for vaccination within our dataset. India's prominence can be attributed to various factors, including its large population, active social media presence, and specific initiatives and campaigns related to COVID-19 vaccination. This significance was further reflected in the top extracted features, with over 30% of the prominent pro-vax features corresponding to Indian accounts.
The high polarisation between pro-vax and anti-vax users was demonstrated by the overlap of the top extracted features: no network accounts were shared between the two classes among the top 10k features. This was further evidenced by the general top features table, where each user class had distinct colour codings. Pro-vax users tended to interact with official health and global organisations such as the World Health Organization (@who), UNICEF (@unicef), and the United Nations (@un). Anti-vax users, on the other hand, engaged with accounts that aligned more closely with their conservative viewpoints.
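The overlap check described above amounts to intersecting the top-k features ranked by importance for each class. A minimal sketch, assuming feature importances are available as plain `{feature: score}` dictionaries (the actual study used SHAP values, not reproduced here):

```python
def topk_overlap(importance_a, importance_b, k=10_000):
    """Jaccard overlap between the top-k entries of two
    feature-importance dicts (feature name -> score)."""
    def top(imp):
        ranked = sorted(imp, key=imp.get, reverse=True)
        return set(ranked[:k])
    a, b = top(importance_a), top(importance_b)
    return len(a & b) / len(a | b) if (a or b) else 0.0
```

An overlap of 0.0 between the two classes' top-10k network features corresponds to the complete separation reported above.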
Moreover, the results of the overlap between the top features indicate that a user’s stance is strongly linked to the type of network interaction they have with the target account (author). Friendship, retweets, and likes generally signify a positive stance towards the author’s viewpoints, while replies indicate a more controversial stance that may require further understanding.
Extracting the top features revealed some unexpected results regarding the influence and impact of certain celebrities. Of particular interest was the anti-vax signal associated with celebrities such as @elonmusk, @jordanbpeterson, and @joerogan. Upon further investigation, we found that this signal stemmed from these celebrities' publicly sceptical attitudes. For instance, @elonmusk opposed remote work mandates and questioned the efficacy of PCR tests, while @jordanbpeterson objected to vaccine passport declarations made by @justintrudeau.
Conducting a more in-depth analysis of features based on users’ spoken language, categorised by region, revealed the distinctiveness of each language group. Notably, influential accounts were closely tied to the language of the respective group, underscoring significant variations in influential accounts across different language-speaking groups. However, it also highlighted instances where English-speaking accounts exhibited global influence on the stance classification, such as @conspiracyb0t. We interpret this phenomenon as being attributable to the global nature of the COVID-19 discourse, transcending linguistic boundaries. Additionally, many engaged users in this topic demonstrated a sufficient understanding of English, enabling interaction with accounts in their non-native languages.
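The per-group analysis above requires bucketing each user's main detected language into a region group matching the Figure 5 panels. A minimal sketch, in which the ISO-639-1 code assignments are hypothetical and do not reproduce the paper's exact mapping:

```python
# Hypothetical ISO-639-1 code groupings mirroring the Figure 5 panels;
# the study's actual assignment is not reproduced here.
LANG_GROUPS = {
    "en": "English",
    "es": "Spanish/Portuguese", "pt": "Spanish/Portuguese",
    "fr": "Other European", "de": "Other European", "it": "Other European",
    "ar": "West Asian", "tr": "West Asian", "fa": "West Asian",
}

def language_group(iso_code):
    """Bucket a user's main detected language into an analysis group."""
    return LANG_GROUPS.get(iso_code, "Remaining")
```

Languages outside the mapped set fall into the "Remaining" group, mirroring the catch-all panel (f) in Figure 5.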
7.2 Global Signals for COVID-19 Pandemic
In conclusion, our research aimed to address three key research questions concerning the creation of an effective stance model for a global topic like COVID-19.

Firstly, we explored the possibility of developing a single stance model capable of classifying users from diverse regions and languages based on their pandemic-related stance. Our findings demonstrated that such a model is indeed feasible, highlighting the existence of commonalities in users' stances across different cultural and linguistic backgrounds.

Secondly, we investigated the performance of different feature sets in stance prediction and examined whether our findings aligned with previous studies. Our results showed the significance of network interactions as powerful signals for stance detection, corroborating prior research in the field.

Lastly, we sought to uncover the main signals common to users worldwide that enabled the classifier to make accurate decisions. Through our analysis, we identified friendship connections, retweets, and likes as key indicators that consistently signal a positive stance towards the author's viewpoints. Replies, by contrast, emerged as a more nuanced signal, representing a controversial stance requiring deeper understanding.

Overall, our research provides valuable insights into the creation of an effective global classifier for stance. By addressing these research questions, we contribute to the understanding of cross-cultural and multilingual stance detection, shedding light on the shared signals that facilitate accurate stance classification worldwide. These findings have implications for better understanding online discourse and public opinion surrounding global topics like COVID-19.
7.3 Limitations and perspectives
While our study provides valuable insights, it is not without limitations. Firstly, the reliance on Twitter data introduces inherent sampling bias, potentially skewing the findings towards the demographics and user behaviours present on the platform. This limits the generalisability of the results to the broader population and calls for cautious interpretation.

Secondly, the binary classification of users into pro-vax or anti-vax stances, while useful for simplicity and clarity, may oversimplify the nuanced spectrum of attitudes individuals hold towards vaccination. Many users occupy a middle ground, expressing neutral or ambivalent sentiments rather than definitively aligning with either pole.

Moreover, relying exclusively on network interactions may fail to capture the diversity of user sentiments accurately, particularly since users might engage with content for reasons beyond expressing personal beliefs, such as sarcasm, critique, or sheer curiosity. Where users have limited interaction content, the predictive accuracy of stance classification could be compromised by the multifaceted nature of user engagement and the potential misinterpretation of their true stance.

Finally, an additional limitation arises from our dependence on image annotation as the starting point of the data collection process. While images can be powerful for elucidating user stances, this approach may bias the dataset towards users with explicit stances. To address this, in future work we plan to implement hashtag- or keyword-based data collection. Leveraging multilingual resources, although potentially more labour-intensive, can be more effective for broadening the collected user data.