New technologies have been spreading through society for more than a decade, playing a progressively important role and having a profound impact on it (Behl et al., 2024). Digital evolution is substantially transforming industries towards more sustainable technological adoption (Xin & Ye, 2024; Hmamed et al., 2024). Moreover, citizens no longer behave in the same way and continue to experience changes, from the way they communicate and seek information to the way they work and entertain themselves (Yeo et al., 2022; Hendijani & Marvi, 2020). Reflecting this, the world is witnessing a social phenomenon of global dimensions driven by interconnectivity. Real-time online communication has given rise to a new era of digital experiences that enhance interaction and social connection between individuals and, in these circumstances, the health sector is no exception.
During the 2019 World Health Assembly, governments unanimously agreed on the urgency for the World Health Organization (WHO) to develop a global strategy on digital health, following the statement of WHO Director-General Tedros Adhanom Ghebreyesus. In his statement, he stressed the importance of harnessing the potential of digital technologies to achieve universal health coverage. In this vein, technologies are seen as essential tools to promote health, preserve global security and provide services to vulnerable populations. As a reflection of this, communication, appointment management and, to a lesser extent, medical consultations are currently carried out using these technologies (Møller et al., 2024). All of these aspects fall under the concept of telemedicine.
Telemedicine can be defined as the practice of healthcare through ICT (Information and Communication Technologies), where the patient and the medical professional are not in the same place (Peetso, 2014; Leung et al., 2018). Although this commitment to telemedicine has been evident for years, since the pandemic there has been exponential growth in demand for remote patient monitoring services and in the promotion of teleconsultation, especially by governments in the United States and Europe (Conde et al., 2022). Presently, according to Global Market Insights (Swain & Kharad, 2023), the value of the digital health market has reached USD 233.5 billion and is expected to experience significant growth, reaching USD 981.5 billion by 2032. As Silva and Schwamm (2021) mention, the concept of telemedicine has never been as topical, necessary and disruptive as it is today.
There are research findings evidencing the positive impacts of telemedicine, with numerous benefits for patients, such as highly responsive medical care and real-time remote medical consultations that allow health problems to be addressed quickly and diagnoses to be obtained remotely (Capponi & Corrocher, 2022; Lu et al., 2021; Xing et al., 2020). Nevertheless, complexities and challenges associated with the use of these technologies have also been reported. In this regard, the literature points out that the problem of telemedicine acceptance may hinder the advancement of this technology (Montserrat et al., 2021). The difficulty of telemedicine acceptance may be related to social, cultural, financial and legal barriers, lack of knowledge, low telemedicine literacy and lack of evidence (Arkorful et al., 2019). Therefore, the adoption and implementation of telemedicine has become a key challenge for the healthcare industry (Sims, 2018).
Technologies have also transformed, in parallel, the way information is shared and accessed in the healthcare sector, both for patients and professionals (Chaet et al., 2017), becoming the sector's main tool for consultation and information exchange. Proof of this is the immense amount of user-generated content during the COVID-19 pandemic. In a single day, users shared COVID-19-related content on social networks more than 20 million times, reporting or sharing experiences about the pandemic (Molla, 2020). In fact, searching for health information through social networks has become a routine behaviour among Internet users due to (1) the high level of content that can be found on them (Ojéda-Martín et al., 2021), (2) the high level of interactivity among users (Ojéda-Martín et al., 2021) and, above all, (3) the easy access to information (Afful-Dadzi et al., 2023). In addition, social networks have a high capacity to disseminate both information and user experiences (Afful-Dadzi et al., 2023). Many users rely on other patients' ratings or experiences with the doctor to make a decision, while others are guided by their emotions when making a decision after finding attractive posts (Alalawi et al., 2019).
The ease of access to information is explained by the speed with which information can be obtained on social networks and by their convenience of use, as they can be accessed at any time and place (Farsi et al., 2021), even allowing users, without leaving where they are, to obtain a medical assessment or interact in real time with health experts. Moreover, social networks not only permit interaction between patients and doctors, but also between peers (patients with each other or doctors with each other), and these conversations can even drive and shape decisions such as the choice of a doctor (Dorfman et al., 2019; Farsi et al., 2021). Thus, we can state that social networks are implicated in the eWOM phenomenon.
In their research, Farsi et al. (2022) classified the use of social networks from the patient's perspective into: 1. health information, 2. telemedicine, 3. health care provider search, 4. user support and shared experiences, and 5. positive influence on health behaviour. Furthermore, Farsi (2021) classified the use of social networks from the health care provider's perspective into: 1. health promotion, 2. professional development or practice promotion, 3. recruitment, 4. professional networking and stress relief, 5. professional medical education, 6. telemedicine, 7. scientific research, and 8. critical public health care issues.
In line with this, there are researchers who use this user-generated information to study consumers' shared experiences on social networks prior to purchase and to identify relevant behavioural factors (Mishra, 2022) or behavioural trends. What makes social networks interesting as a source of research is that, in them, consumers express themselves freely and search for information on topics that are relevant and current for them, as in the case of COVID-19 (García & Berton, 2021; Moreno & Iglesias, 2021). Therefore, it is argued that social networks, being valuable Internet-based applications that users employ for information seeking, shared experiences or peer support (Farsi, 2021), are in turn a good source of data for analysing user behaviour.
In any event, despite this advantage as a source of information for researchers, social networks have not been exploited for the analysis of consumer behaviour with respect to telemedicine. This is despite the fact that, as mentioned above, telemedicine is one of the most frequently discussed topics on these networks. Therefore, studying the comments users make on social networks about telemedicine could help to close the significant research gap on the perceptions and use of telemedicine by both patients and professionals.
In response to these gaps, this study aims to take further steps to better understand the use of telemedicine by patients and doctors and, more specifically, their main concerns and doubts about its use.
The aim of this work is to understand, through online comments, the aspects that most interest telemedicine users, in order to identify the issues that most concern and occupy patients and professionals in relation to telemedicine and to determine which are primary, secondary and residual. To achieve this, the following questions will be addressed:
RQ1: What are the most important topics (issues) related to telemedicine?
This work incorporates into the study of telemedicine user behaviour the dynamics of the conversation topics that arise around telemedicine on social networks, specifically on Twitter. Within the digital world, studies show that Twitter is the favourite network of both users and professionals. However, its use differs depending on the group: while users turn to it to search for information on specific health problems (Antheunis et al., 2013), medical professionals use Twitter as a powerful tool to disseminate knowledge and information among the population (Conde et al., 2022). For practitioners, the breadth of information that can be disseminated, together with the fact that it is very accessible to the public, makes it a very useful tool to promote positive health behaviours (Farsi et al., 2022).
Data mining techniques are used to extract and analyse the content of this platform, and machine learning is applied to identify the issues mentioned in both contexts. Finally, a conceptual model is developed to demonstrate the knowledge gained in this environment to understand the use of telemedicine.
The findings provide valuable information for understanding the context of telemedicine and emphasise the importance of reviewing not only the comments themselves but also taking into account the support of other users. This study provides a new perspective that may benefit future research in the field of social listening studies.
The present work uses Natural Language Processing (NLP), a technique based on Machine Learning (ML). Specifically, an automated topic detection model is built using the Python programming language. Once the methodology of the work has been decided, the workflow shown in the figure is followed to carry out the study:
2.1. Data collection
For the data collection, comments on telemedicine from the X network, we used the Snscrape library, setting four keywords, ‘teleconsultation’, ‘virtual assistance’, ‘virtual doctor’ and ‘telemedicine’, as extraction parameters; these were validated by two researchers on online user behaviour and two telemedicine professionals. In total, 26,052 tweets containing the word ‘teleconsultation’, 129,858 comments with ‘telemedicine’, 687 comments with ‘virtual assistance’ and 36 comments containing the word ‘virtual doctor’ were extracted. In total, 156,633 comments were collected in the data collection phase.
It should be noted that in the extraction process we retrieved not only the comments, but also data related to the content of the tweet, e.g. the unique number identifying the tweet, the name of the user, how many times it has been retweeted by other users or commented on, the number of likes obtained, the language of the tweet, the users mentioned in the tweet, the URL of the tweet, the location of the user and the hashtags it contains.
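A minimal sketch of this extraction step is shown below. It assumes the snscrape Python package and its Twitter search scraper; the keyword list and tweet fields follow the description above, but attribute names may vary between snscrape versions (e.g. tweet.content vs tweet.rawContent), so this is illustrative rather than the exact extraction code.

```python
# Illustrative extraction with snscrape; one search per validated keyword.
import snscrape.modules.twitter as sntwitter
import pandas as pd

keywords = ["teleconsultation", "virtual assistance", "virtual doctor", "telemedicine"]
frames = []

for kw in keywords:
    rows = []
    for tweet in sntwitter.TwitterSearchScraper(kw).get_items():
        rows.append({
            "id": tweet.id,                      # unique number identifying the tweet
            "user": tweet.user.username,         # name of the user
            "content": tweet.content,            # text of the comment
            "retweetCount": tweet.retweetCount,  # times retweeted by other users
            "replyCount": tweet.replyCount,      # times commented on
            "likeCount": tweet.likeCount,        # number of likes obtained
            "lang": tweet.lang,                  # language of the tweet
            "mentioned": tweet.mentionedUsers,   # users mentioned in the tweet
            "url": tweet.url,                    # URL of the tweet
            "location": tweet.user.location,     # location of the user
            "hashtags": tweet.hashtags,          # hashtags contained in the tweet
            "keyword": kw,                       # extraction parameter that matched
        })
    frames.append(pd.DataFrame(rows))
```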
2.2. Selection, cleaning, transformation and pre-processing
In this stage, we group the comments into a single data frame and remove duplicate and incomplete comments.
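Continuing the illustrative frames from the extraction sketch above, this step can be expressed as follows; the column names are the hypothetical ones used earlier, not necessarily those of the study's code.

```python
# Grouping and basic cleaning: combine per-keyword frames, drop duplicates and
# incomplete comments.
import pandas as pd

df = pd.concat(frames, ignore_index=True)    # single data frame with all comments
df = df.drop_duplicates(subset="content")    # remove duplicate comments
df = df.dropna(subset=["content"])           # remove incomplete comments
```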
To identify which topics generate consensus, we examine the number of comments that have ‘likes’, ‘retweets’ and ‘replies’ independently and obtain the results shown below:
Table 1. Description of comments

|                           | likeCount | retweetCount | replyCount |
|---------------------------|-----------|--------------|------------|
| mean                      | 5.097883  | 2.150116     | 0.532108   |
| std (standard deviation)  | 67.236642 | 35.817986    | 10.299488  |
| min (minimum obtained)    | 0         | 0            | 0          |
| 25%                       | 0         | 0            | 0          |
| 50%                       | 1         | 0            | 0          |
| 75%                       | 2         | 1            | 0          |
| max (maximum obtained)    | 9020      | 7785         | 2411       |
| comments with at least 1  | 56224     | 35360        | 23186      |

Source: Own elaboration (2024)
It can be observed that more than 50% of the tweets have received at least one ‘like’; the number of ‘retweets’ falls in the middle range, while only 21% of the tweets have received replies or comments. In addition, to compare the topics mentioned in all comments with those that are also supported by the public, comments containing at least one ‘like’, ‘retweet’ or ‘reply’ are extracted, regardless of the weight of each comment. In total, 66,422 comments meet this precondition.
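Under the same assumed column names as above, the descriptive statistics of Table 1 and the "consensus" filter can be sketched as follows.

```python
# Summary statistics for the engagement columns (Table 1) and the subset of
# comments with at least one like, retweet or reply (66,422 in the study).
print(df[["likeCount", "retweetCount", "replyCount"]].describe())

consensus = df[(df["likeCount"] > 0) | (df["retweetCount"] > 0) | (df["replyCount"] > 0)]
print(len(consensus))   # comments supported by the public
```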
Subsequently, a series of procedures is executed to prepare the texts in the appropriate form for processing. The following steps are applied: 1. Convert the texts to lower case and remove non-alphabetic characters (e.g. @, !, ?, numbers, symbols and special characters); 2. Remove stop words, that is, words that do not have a significant weight or are not relevant when interpreting the data. For this purpose, we use the stop-word corpus of the NLTK library. In total there are 322 words, plus two hashtag words, ‘teleconsultation’ and ‘telemedicine’, as they appear in most of the tweets.
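A sketch of this pre-processing is given below. Using NLTK's English stop-word corpus extended with the two hashtag words is an assumption; the 322-word list used in the study may differ from the default NLTK list.

```python
# Pre-processing: lower-casing, removal of non-alphabetic characters,
# stop-word filtering.
import re
import nltk
from nltk.corpus import stopwords

nltk.download("stopwords")
stop_words = set(stopwords.words("english")) | {"teleconsultation", "telemedicine"}

def preprocess(text):
    text = text.lower()                                       # step 1: lower case
    text = re.sub(r"[^a-z\s]", " ", text)                     # step 1: drop @, !, ?, numbers, symbols
    return [t for t in text.split() if t not in stop_words]   # step 2: drop stop words

df["tokens"] = df["content"].apply(preprocess)
```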
2.3. Data mining
After the previous step, we calculated the 100 most mentioned words and obtained the word clouds. Since the colour and size of a word can affect its visual interpretation, we also extracted the 20 most mentioned terms in both cases.
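The frequency count behind the word clouds and the top-20 list can be sketched as follows; the wordcloud package is an assumption about the visualisation library used.

```python
# Word frequencies for the word cloud (top 100) and the reported top-20 terms.
from collections import Counter
from wordcloud import WordCloud

all_tokens = [tok for doc in df["tokens"] for tok in doc]
counts = Counter(all_tokens)

top_100 = dict(counts.most_common(100))   # basis for the word cloud
top_20 = counts.most_common(20)           # terms reported to avoid visual bias

WordCloud(width=800, height=400).generate_from_frequencies(top_100).to_file("wordcloud.png")
```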
To detect the topics most relevant to users in order to build a concept map, we decided to use topic detection algorithms. Topic detection models are useful for clustering documents and organising large amounts of textual data. They allow information to be retrieved from unstructured text and grouped according to context patterns. This technique has been used and recommended not only to detect recurring themes, but also to discover hidden dimensions or patterns in a collection of texts (Stevens et al., 2012; Aggarwal & Gour, 2020; Mishra, 2022; Pardo et al., 2022). There are several common approaches to building a topic detection model. Each applies a different algorithm; however, all are based on the same fundamental assumption: each document is composed of a mixture of topics, and each topic is composed of a collection of words. In the present study, the unsupervised machine learning model known as LDA (Latent Dirichlet Allocation) is applied. According to Stevens et al. (2012), LDA learns the relationships between words, topics and documents by assuming that documents are generated by a specific probabilistic model. Therefore, this model provides better results when dealing with a large volume of documents, which is the case in the present work. This model takes documents as input and finds topics as output, which allows latent topics to be identified in a corpus of documents, and it is able to automatically detect and extract latent semantic relations from large volumes of information (Stevens et al., 2012; Tran et al., 2019). This is why this topic detection model is applied.
To build the LDA topic model, a dictionary and a corpus are created from the extracted comments. For this purpose, the text is split into words so that they can be analysed individually. Then, a lemmatisation process is carried out to group word forms that have a similar meaning. Subsequently, the cleaned tokens from the previous step are used to create a specific dictionary and generate a corpus for the LDA model.
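A sketch of the dictionary and corpus construction is shown below. The WordNet lemmatiser is an assumption, since the paper does not name the lemmatisation tool used.

```python
# Lemmatisation, dictionary and bag-of-words corpus for the LDA model.
import nltk
from nltk.stem import WordNetLemmatizer
from gensim.corpora import Dictionary

nltk.download("wordnet")
lemmatizer = WordNetLemmatizer()

lemmatized = [[lemmatizer.lemmatize(tok) for tok in doc] for doc in df["tokens"]]

dictionary = Dictionary(lemmatized)                        # word <-> id mapping
corpus = [dictionary.doc2bow(doc) for doc in lemmatized]   # bag-of-words corpus
```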
The number of words contained in this dictionary is 185,302. To reduce possible noise when creating the model, the words are filtered by their frequency: words that appear only once and words whose frequency is less than 0.25% of the total size of the corpus are removed, which in our case corresponds to 25 occurrences. Using the fraction of the total corpus size together with the absolute number is useful and provides a more accurate way to measure the relative frequency of terms. Finally, we are left with a corpus of 100,000 entries for the construction of the model.
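With Gensim, this filtering can be expressed with filter_extremes, as sketched below. The thresholds shown (tokens kept only if they appear in at least 25 documents, vocabulary capped at 100,000 entries) are our reading of the figures reported above, not parameters confirmed by the original code; note also that no_below counts documents containing a token rather than total occurrences.

```python
# Frequency filtering of the dictionary, then rebuilding the corpus.
dictionary.filter_extremes(no_below=25, no_above=1.0, keep_n=100_000)
corpus = [dictionary.doc2bow(doc) for doc in lemmatized]
```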
Once the dictionary and the corpus of the two data frames have been obtained, the Latent Dirichlet Allocation (LDA) module of the Gensim library in Python is used. The tweets are fed in as a text corpus and the LDA algorithm is then applied for topic detection. The three main pre-set parameters are: the number of topics, 8 for all comments and 4 for comments with consensus; eta, which governs the distribution of words per topic, 5 for all comments and 7 for comments with consensus; and alpha, which governs the distribution of topics per document, 10.
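A sketch of the model call with Gensim's LdaModel follows. Passing alpha and eta as scalar symmetric priors is one reading of the parameter values reported above; corpus_consensus and dictionary_consensus are assumed to be built in the same way from the subset of comments with consensus.

```python
# LDA models for the two data frames: all comments and comments with consensus.
from gensim.models import LdaModel

lda_all = LdaModel(corpus=corpus, id2word=dictionary,
                   num_topics=8,      # 8 topics for all comments
                   alpha=10.0,        # document-topic prior
                   eta=5.0,           # topic-word prior
                   random_state=42)

lda_consensus = LdaModel(corpus=corpus_consensus, id2word=dictionary_consensus,
                         num_topics=4,  # 4 topics for comments with consensus
                         alpha=10.0,
                         eta=7.0,
                         random_state=42)
```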
Once the algorithm is provided with the three parameters, the model rearranges the distribution of topics within documents and the distribution of keywords within topics to obtain an appropriate composition of the keyword distribution. Topics consist of a probability distribution over the vocabulary words; they are basically a collection of prominent keywords, or the words with the highest probability in the topic, which helps to identify what the topics are about. This involves discovering the hidden topics in the collection, classifying the documents into the discovered topics and using that classification to organise, summarise and search the documents. After the machine learning process, the results shown in figure 4 are obtained. With the results of this step, the first research question can be answered.
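For completeness, the keyword distributions that define each topic can be inspected as follows (an illustrative call; num_words controls how many top keywords are printed per topic).

```python
# Print the most probable keywords of each detected topic.
for topic_id, keywords in lda_all.print_topics(num_words=10):
    print(topic_id, keywords)
```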