COVID-19 Part 3 – Situation In India (Revisited)

Countdown displays the time before India's second countrywide lock-down ends (3rd May, 2020, 23:59:59)
#StayHome #StaySafe
Countdown displays the time before India's second countrywide lock-down ends (17th May, 2020, 23:59:59)

In my last blog, COVID-19 Part 2 – Situation In The United States, I made two predictions about the spread of COVID-19 in the United States: first, that the United States would report its maximum number of single-day cases on the 12th of April, 2020, and second, that the total number of cases would rise to 1 million by the end of April.

I’m not going to talk about what I got right, as I plan to dedicate a separate blog to that (spoiler: almost both of them). What I want to talk about in this blog is what I got wrong and, based on those corrections, what my model’s numbers are for India in the coming week.

Originally, my calculation said that the USA would hit the 1 million mark on the 22nd of April (refer to the output of the Python script in my previous blog). However, I calculated the margin of error in the computed gradient and reflected it in my prediction (not discussed in that blog), and concluded that it would take another week for the States to reach that mark. Thus, I wrote ‘by the end of April’ instead of ‘on 22nd of April’.

I reviewed the predicted numbers for each day and figured out that they were off because of the way I used Least Squares Residual (LSR) for the regression analysis. This method works well for datasets where the deviation in the data is limited, but real-world data does not follow this rule.

So, in light of the recent data, I tried to determine what I could have done to make my predictions better. Among the solutions I could think of is the use of better algorithms for linear regression. I want to try Total Least Squares and Bayesian Linear Regression and see how they perform, but so far I haven’t found the time to program them and integrate them with the dashboard I created to analyze and visualize this data (screenshots below).

A quick solution is to use only the most recent data and limit the prediction to a shorter term, then adjust the predictions slightly as fresh data arrives each day. This works because the shorter the duration, the fewer the inconsistencies. I know it sounds like cheating in the game of data analysis, but nothing’s stopping me from trying it. I played with my model and noted that had I used the last 7 days of data (as opposed to 30 days) on the day I made those predictions, to predict the events of the next 10 days (as opposed to 17 days), it would have produced much more accurate results.

What my prediction could have been had I used 7 days of data to predict the next 10 days
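
For readers who want to try the short-window idea themselves, here is a minimal sketch in plain Python. It is not the script from my dashboard: the function names (`percent_deltas`, `fit_line`, `fit_recent`) are hypothetical, the fit is an ordinary least-squares line through the day-over-day percent change of the total cases, and the series at the bottom is synthetic, made up purely for illustration (it is not ECDC data).

```python
def percent_deltas(total_cases):
    """Day-over-day percentage growth of a cumulative case series."""
    return [
        100.0 * (today - prev) / prev
        for prev, today in zip(total_cases, total_cases[1:])
    ]


def fit_line(ys):
    """Ordinary least-squares line through ys against day indices 0..n-1.

    Returns (gradient, intercept).
    """
    n = len(ys)
    xs = list(range(n))
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gradient = sxy / sxx
    return gradient, mean_y - gradient * mean_x


def fit_recent(total_cases, window=7):
    """Fit the line on the last `window` percent deltas only.

    Returns (gradient, fitted percent delta on the most recent day).
    Using the fitted value rather than the raw last value is an assumption.
    """
    recent = percent_deltas(total_cases)[-window:]
    gradient, intercept = fit_line(recent)
    latest_fitted = intercept + gradient * (len(recent) - 1)
    return gradient, latest_fitted


# Synthetic series (NOT real data): growth starts at 12% per day
# and drops by 0.5 percentage points each day.
synthetic_cases = [1000.0]
for day in range(10):
    growth = 12.0 - 0.5 * day
    synthetic_cases.append(synthetic_cases[-1] * (1 + growth / 100))

gradient, latest = fit_recent(synthetic_cases, window=7)
print(f"gradient: {gradient:.4f}, latest percent delta: {latest:.4f}")
```

Because the synthetic series is built with a perfectly linear decline, the fit should simply recover the slope that was put in. With real data, changing `window` from 30 to 7 is where the difference described above would show up.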


And that’s exactly what I’m about to do now. As of today (24th of April, 2020), the percent delta in the total number of cases stands at 7.50604, with the gradient of the LSR at -0.196109. Projecting that over the next 10 days and calculating the number of cases from the projection shows that India should have 43,018 cases by the night of 4th of May, 2020, given that we have 23,077 cases as of today’s ECDC report.

Numbers from my model for India’s next 10 days


Here’s the same graph zoomed in on the prediction

Above graph focused on the prediction
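
The projection step itself can be sketched in a few lines of Python. This is one plausible reading of the calculation, not necessarily the exact code behind the graphs: it assumes the percent delta changes by the gradient each day and that each day’s growth compounds on the previous day’s total. The function name `project_cases` is hypothetical; the inputs are the figures quoted earlier in this post.

```python
def project_cases(current_cases, percent_delta, gradient, days):
    """Project cumulative cases forward, one day at a time.

    Assumption: the daily percent delta moves linearly by `gradient`
    per day, and each day's growth compounds on the previous total.
    """
    cases = float(current_cases)
    projection = []
    for day in range(1, days + 1):
        growth = percent_delta + gradient * day  # percent growth on this day
        cases *= 1 + growth / 100
        projection.append(cases)
    return projection


# Figures quoted in this post for 24th April, 2020.
projection = project_cases(
    current_cases=23077,
    percent_delta=7.50604,
    gradient=-0.196109,
    days=10,
)
for day, cases in enumerate(projection, start=1):
    print(f"day {day:2d}: {round(cases):,}")
```

Under those assumptions, the value for day 10 comes out close to the 43,018 figure quoted in this post, which suggests this reading is at least in the right neighbourhood.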

This prediction will be significant to watch, as the date almost coincides with the end of India’s second countrywide lockdown on 3rd May, 2020. So, let’s stay home, wait for the clock above to tick down to zero and see how the numbers grow.
