Coronavirus disease 2019 (COVID-19), which became a pandemic in 2020, has been the largest global public health emergency in living memory1. The disease is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) [1], which afflicts the respiratory system and therefore causes symptoms such as coughing, breathing difficulties, fever, and fatigue, as well as ageusia and anosmia [11]. Many research efforts are underway into various aspects of COVID-19, including vaccine research [12], antiviral treatments such as Remdesivir [5], large-scale literature data mining such as the COVID-19 Open Research Dataset Challenge (CORD-19)2, and diagnostic tools for detecting the virus at early stages [11]. In this study we investigate the feasibility of high-accuracy audio classification of COVID-19 coughs as a potential diagnostic software application that would be available in the home or workplace through smart devices such as Amazon's Alexa or Google Home.
1WHO COVID-19, http://archive.is/SUtHp
2CORD-19, https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge
1.1 Background
A small number of research projects demonstrate the feasibility of classifiers of this kind. Brown et al. at the University of Cambridge have developed a mobile phone application for crowdsourcing cough and breathing samples from members of the public [3]. They position their research within a long history of using bodily noises to diagnose ailments, following the reasoning that physiological changes can alter the natural sounds that human bodies produce [14]. They classify coughing and breathing audio using Logistic Regression (LR), Gradient Boosting Trees, and Support Vector Machines (SVMs), achieving a best area under the ROC curve (AUC) of 82%. The participant distribution in that paper also shows that the dataset is skewed towards middle-aged people, likely because older participants (those more vulnerable to COVID-19) are less likely to engage with mobile phone crowdsourcing technology. However, the results show no difference in classification performance between age groups.
Imran et al. have also developed a mobile app which can classify a COVID-19 cough from a 2-second audio recording with accuracy in the region of 90%, with some data permutations producing accuracy of 96.76% [7]. That research used Mel-frequency cepstral coefficients (MFCCs), a spectrogram-like representation of audio, with a Convolutional Neural Network (CNN) for classification, which demonstrated better accuracy than the LR and SVM classifiers used by Brown et al.
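To make the MFCC front end concrete, the following sketch extracts MFCC features from a mono audio signal using only NumPy. This is our own minimal illustration of the standard MFCC pipeline (framing, windowing, power spectrum, mel filterbank, log, DCT), not the exact implementation used by Imran et al.; all parameter values shown (sample rate, frame size, filter counts) are illustrative assumptions.

```python
import numpy as np

def hz_to_mel(f):
    # Standard mel-scale mapping.
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, n_fft=512, hop=256, n_mels=26, n_ceps=13):
    """Return an (n_frames, n_ceps) MFCC matrix for a mono signal."""
    # 1. Slice the signal into overlapping frames and apply a Hamming window.
    starts = range(0, len(signal) - n_fft + 1, hop)
    frames = np.array([signal[s:s + n_fft] * np.hamming(n_fft) for s in starts])
    # 2. Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # 3. Triangular mel filterbank, with centres spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        # Rising and falling ramps of the i-th triangular filter.
        fbank[i - 1, bins[i - 1]:bins[i]] = np.linspace(
            0, 1, bins[i] - bins[i - 1], endpoint=False)
        fbank[i - 1, bins[i]:bins[i + 1]] = np.linspace(
            1, 0, bins[i + 1] - bins[i], endpoint=False)
    # 4. Log mel energies, then a type-II DCT to decorrelate them into
    #    cepstral coefficients (the "MFCC image" fed to a CNN).
    log_mel = np.log(power @ fbank.T + 1e-10)
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_mels))
    return log_mel @ dct.T
```

In the CNN-based approaches described above, the resulting two-dimensional coefficient matrix is treated as an image, so standard image-classification architectures can be applied directly to the audio.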
Lastly, Sharma et al. created Coswara, a database of coughing, breathing, and voice sounds (vowel sounds and counting) for COVID-19 diagnosis research. The data are collected, labelled, and quality-controlled through a web application [10]. The data were then classified using a Random Forest (RF), achieving a mean accuracy of 70% for cough sounds.
1.2 Rationale
Research is currently ongoing at the University of Manchester into audio classification in smart environments, as part of a larger programme of work on human behaviour prediction. It seemed likely that the classifier developed in that work could be transferred with little modification to classify COVID-19 coughs. We demonstrate a proof of concept showing that this diagnostic technology could be used at scale in smart home devices for early diagnosis of the virus before patients seek clinical treatment, particularly in cases where they might not seek help until their symptoms have significantly progressed.
1.3 Contribution to research
A demonstration of high-accuracy classification of COVID-19 coughs using an MFCC-based CNN machine learning architecture.
1.4 Use case
In developing our classifier we proposed a use case scenario in which a user has a smart home device such as an Amazon Alexa or Google Home (the scenario could equally apply to a workplace or other location). The device passively monitors for cough sounds and, upon a positive classification of a COVID-19 cough, prompts the user to seek a professional medical diagnosis, or even contacts the relevant local services on the user's behalf. This use case is shown in Figure 1.