An Augmented Multilingual Twitter Dataset for Studying the COVID-19 Infodemic

doi:10.21203/rs.3.rs-95721/v1


Data Note


https://doi.org/10.21203/rs.3.rs-95721/v1

This work is licensed under a CC BY 4.0 License

Journal Publication

published 20 Oct, 2021

Read the published version in Social Network Analysis and Mining →

Version 1


We present an openly available dataset to facilitate researchers’ exploration of popular discourse about the COVID-19 pandemic. The dataset, whose collection is ongoing, currently consists of over 780 million tweets from around the world in multiple languages. The tweets begin on 22 January 2020, when fewer than 600 COVID-19 cases had been reported worldwide. The dataset was collected using the Twitter API and by rehydrating tweets from another openly available database. To facilitate access for other researchers, the English-language tweet data has been augmented with the output of state-of-the-art Twitter sentiment and named entity recognition algorithms. The dataset and the summary files we provide allow researchers to avoid some computationally intensive analyses, which facilitates more widespread use of social media data to gain insights into issues such as (mis)information diffusion, semantic networks, sentiment, and the evolution of COVID-19 discussions. The insights extracted from such analyses could help inform policy and advocacy work during the current pandemic and future ones.

Systems and Networking

Social Policy

Health Policy

Twitter

COVID-19

named entity recognition

sentiment analysis

dataset


Dataset link: https://github.com/lopezbec/COVID19_Tweets_Dataset
