Coronavirus disease 2019 (COVID-19) is an infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). It was first identified in December 2019 in Wuhan, the capital of China's Hubei province. High fever, cough, sore throat, headache, fatigue, muscle pain, and shortness of breath are common initial symptoms of COVID-19. Mathematical analysis of infectious diseases using SIR epidemic and endemic models has been studied in prior literature [18]. Luo [21] notes that certain measures taken by governments are based on various predictions, which estimate hospital needs, future deaths, infection peaks, and other quantities. Several regions across the world have gone into lockdown, including travel restrictions, to prevent the spread of this deadly virus [10], [17]. Despite these lockdowns, large numbers of deaths and confirmed cases have occurred worldwide. Data from countries such as China, Japan, and South Korea indicate that although government-imposed lockdowns have reduced the spread of COVID-19, other important factors cannot be neglected. One such factor is wearing masks in public, which has now become an accepted norm [27]. Social distancing, considered an important factor in controlling the spread of the virus, is expected to continue in the near future [19]. However, social distancing and lockdowns alone are not enough to prevent the spread of COVID-19 across the world. Forecasting cumulative deaths and confirmed cases is therefore useful for anticipating the course of the pandemic and taking preparatory measures to prevent large numbers of deaths and infections.
The objective of this research is to predict future cumulative deaths and confirmed cases in the most affected regions of the world using Multivariate Linear Regression. Regression models are known to be more robust to noise when the response and exposure variables are aggregated [29]. The experimental results are evaluated using error metrics such as Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and the R-squared value to validate the effectiveness of the proposed technique. The Multivariate Linear Regression model achieves an accuracy of 99%. The results cover the most affected regions: the United States of America, the United Kingdom, Spain, Italy, and India (countries); Maharashtra, Tamil Nadu, Gujarat, Uttar Pradesh, and Delhi (states in India); and Chennai (a district in Tamil Nadu).
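As an illustration only, the following is a minimal sketch of fitting a Multivariate Linear Regression model by ordinary least squares and computing the four error metrics named above (MAE, MSE, RMSE, R-squared). It assumes NumPy is available. The function names `fit_multivariate_linear_regression`, `predict`, and `error_metrics` and the toy feature columns are hypothetical; the exact features, preprocessing, and fitting procedure used in this work are not specified here.

```python
import numpy as np


def fit_multivariate_linear_regression(X, y):
    """Hypothetical helper: fit y ~ b0 + X @ b by ordinary least squares.

    Returns the intercept b0 and the coefficient vector b.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the intercept is estimated jointly with b.
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1:]


def predict(b0, b, X):
    """Apply the fitted linear model to new feature rows."""
    return b0 + np.asarray(X, dtype=float) @ b


def error_metrics(y_true, y_pred):
    """Compute MAE, MSE, RMSE and R-squared for a set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mae = np.mean(np.abs(residuals))
    mse = np.mean(residuals ** 2)
    rmse = np.sqrt(mse)
    # R-squared: fraction of the variance of y_true explained by the model.
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


if __name__ == "__main__":
    # Hypothetical toy data, NOT real COVID-19 figures: columns are a day
    # index and cumulative confirmed cases; the target is cumulative deaths.
    X = np.array([[1, 100], [2, 180], [3, 270], [4, 390], [5, 520], [6, 700]])
    y = np.array([2, 5, 7, 11, 14, 19])
    b0, b = fit_multivariate_linear_regression(X, y)
    print(error_metrics(y, predict(b0, b, X)))
```

MAE, MSE, and RMSE are expressed in the units of the target (or its square, for MSE), so they are comparable only within a single region, whereas R-squared is unitless. In practice the metrics would be computed on held-out future days rather than on the training data as in this toy run.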