With the COVID-19 situation in the United States deteriorating and daily cases rising constantly, I decided to look into the available data and see if it reveals something that might help me understand what’s going on. For this post, I used the data from Ourworldindata.org, the same data source I used for my post, COVID-19 Part 1 – Situation In India.
I wanted to know exactly where the States stands in terms of the spread of this disease, based on the data I had from other countries that have passed this stage and have progressed to bringing the disease under control. This would give me an idea of how bad things could get and possibly help me predict when this is going to end.
So far, only China, with its draconian lockdown, and South Korea, with its aggressive testing and isolation, have been able to control the spread of the disease. When I looked at their data, it didn’t reveal much at first.

The graph above plots the number of days since the first reported case in China on the x-axis and the total number of confirmed cases on the y-axis. It’s pretty much a simple logistic growth curve with its inflection point somewhere around day-40.

The graph above is a similar representation of South Korea’s data. The country hasn’t stopped the spread to the extent that China has, but it has certainly managed to suppress the exponential growth and turn it into what appears to me to be a linear rise.
But I made the most interesting discovery when I changed my Python script to calculate and visualize the percent change in the number of confirmed cases (or, the change in the number of confirmed cases for every hundred existing cases).
To simplify, imagine a hypothetical country that had 1 confirmed case on day-1 and discovers a new case on day-2. That’s a 100% rise, so my first data point becomes 100 (on the y-axis). Now, on day-3, this country discovers another new case. This time it’s a 50% change (1 new case for 2 existing cases). So my next data point becomes 50, and so on.
When I plotted this data for China and Korea, I noticed that there were chaotic spikes in the beginning, followed by a period of expansion and then an almost linear, negative slope until the graph approached zero.
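To make the calculation concrete, here is a minimal sketch of the percent-change step in Python. The function name `daily_percent_change` and the sample list are my own illustration, not the original script.

```python
def daily_percent_change(cumulative):
    """Day-over-day percent change of a cumulative case series.

    Each value is the number of new cases per hundred existing cases.
    """
    changes = []
    for previous, current in zip(cumulative, cumulative[1:]):
        if previous == 0:
            # Percent change is undefined before the first case is reported.
            changes.append(None)
            continue
        changes.append(100.0 * (current - previous) / previous)
    return changes


# Hypothetical country: one new confirmed case every day.
hypothetical = [1, 2, 3, 4, 5, 6]
print([round(p, 2) for p in daily_percent_change(hypothetical)])
# prints [100.0, 50.0, 33.33, 25.0, 20.0]
```

The same number of new cases every day produces a shrinking percentage, because the base of existing cases keeps growing.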

I gave it some deep thought and came up with an explanation. The initial spikes are pretty obvious. The number of cases in the beginning is low, so the slightest change becomes a huge shift in percentage (just as in our hypothetical country, where the percentage change was 100% at first, 50% later and, had we continued with 1 new case every day, would have been 33.33%, 25%, 20%…).
Since the number of confirmed cases is quite low in the beginning, governments wait to see how the situation evolves and do not act immediately. This leads to the second section, the period of rapid expansion where the numbers blow up.
This is when governments step in and impose extreme isolation measures to check the spread of the disease, leading to the third section, the linear, negative slope. The beginning of this downward slope almost matches the day when such measures were implemented by the Chinese and South Korean governments. And this downward slope is the most interesting thing that I discovered.

I noticed a similar pattern in the data from the USA. Except this time, the period of expansion was too long! The Chinese expansion started on day-17 and ended on day-29, lasting a total of 13 days. For South Korea, it ran from day-29 to day-41, again adding up to 13 days. But for the States, it started on day-40 and ended on day-63. These 24 days were enough to bring the total number of cases into the hundreds of thousands (more than 300,000 as of this writing).
Because the graph for the States has only recently started to go down after a long period of expansion, there’s no clearly visible downward line as of now. But, to describe the trend mathematically, I started the calculation from day-45, found the best weights by minimizing the sum of squared residuals (SSR) from the data points, and took the slope of the resulting regression line. The coefficient of regression turns out to be -0.871203, meaning the daily percent change falls by about 0.87 percentage points per day.
Then I projected this line forward until it touched zero to see when this mayhem in the United States will come to an end. And here’s what the graph looks like.
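For readers who want to see what minimizing the SSR amounts to for a straight line, here is a minimal sketch in plain Python. It is not the original script. The helper `fit_line` is hypothetical, and because the raw percent-change series is not reproduced in this post, the sample points are the projected "Daily New %" values for day 76 to day 80 listed later in the post. Those points already lie on the fitted line, so the fit only recovers its slope; with the real data from day-45 onward the points would scatter around the line.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept.

    Uses the closed-form solution that minimizes the sum of
    squared residuals (SSR) for a single predictor.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustrative input only: projected values from this post, not raw data.
days = [76, 77, 78, 79, 80]
pct = [14.223377, 13.352174, 12.480971, 11.609768, 10.738565]

slope, intercept = fit_line(days, pct)
print(round(slope, 6), round(intercept, 6))
```

With these points the slope comes back as roughly -0.871203, the coefficient quoted above.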

It touches zero on day-92. With day-75 being the 5th of April, 2020, the number of new cases reported daily should approach zero on the 22nd of April, 2020. Certainly, the new cases won’t be exactly zero because of how a logistic curve behaves and flattens out near the top. The susceptible population in the compartmental SIR model of epidemiology does not become zero. In the real world, this would mean that there would still be people for this disease to infect, either because they have not been infected yet or because the immunity after recovery isn’t enough to prevent re-infection (this is what’s happening in China now). This continues until the disease is completely eradicated by a vaccine.
With the gradient of this slope calculated and the number of confirmed cases available, I wrote another script to predict the count of daily new cases till day-92. “Init infec” is the number of infected people at the beginning of any given day, “Daily New %” is the percentage of “Init infec” expected to get infected on that day (calculated using the coefficient of regression above) and “Daily New Infec” is the number of infections predicted to be reported on that day (this is added to “Init infec” to calculate “Init infec” for the next day). Here’s what I got:
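The day-to-date arithmetic above can be checked with a short sketch. It assumes the slope of -0.871203 and the day-76 value of 14.223377% that appear elsewhere in this post; the constant and function names are my own.

```python
from datetime import date, timedelta

SLOPE = -0.871203            # regression coefficient from the post
DAY_76_PCT = 14.223377       # "Daily New %" on day 76, from the post
REFERENCE_DAY = 75           # day-75 is stated to be 5 April 2020
REFERENCE_DATE = date(2020, 4, 5)


def day_to_date(day):
    """Convert a day index into a calendar date via the day-75 reference."""
    return REFERENCE_DATE + timedelta(days=day - REFERENCE_DAY)


# Solve DAY_76_PCT + SLOPE * (day - 76) = 0 for day.
zero_day = 76 + DAY_76_PCT / -SLOPE

print(int(zero_day), day_to_date(int(zero_day)))
# prints 92 2020-04-22
```

Strictly speaking the line crosses zero a little after day 92, so day 92 is the last whole day on which the projected percentage is still positive.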
=================
Day 76
=================
Init infec = 312237
Daily New % = 14.223377000000001
Daily New Infec = 44410.64564349
=======================================
Day 77
=================
Init infec = 356647.64564349
Daily New % = 13.352174000000002
Daily New Infec = 47620.214213222214
=======================================
Day 78
=================
Init infec = 404267.8598567122
Daily New % = 12.480971000000002
Daily New Infec = 50456.5543510369
=======================================
Day 79
=================
Init infec = 454724.4142077491
Daily New % = 11.609768000000003
Daily New Infec = 52792.44952887872
=======================================
Day 80
=================
Init infec = 507516.86373662786
Daily New % = 10.738565000000003
Daily New Infec = 54500.02829831922
=======================================
Day 81
=================
Init infec = 562016.892034947
Daily New % = 9.867362000000004
Daily New Infec = 55456.24123823741
=======================================
Day 82
=================
Init infec = 617473.1332731844
Daily New % = 8.996159000000004
Daily New Infec = 55548.8648515376
=======================================
Day 83
=================
Init infec = 673021.998124722
Daily New % = 8.124956000000005
Daily New Infec = 54682.74121795452
=======================================
Day 84
=================
Init infec = 727704.7393426766
Daily New % = 7.253753000000004
Daily New Infec = 52785.904361211615
=======================================
Day 85
=================
Init infec = 780490.6437038882
Daily New % = 6.382550000000004
Daily New Infec = 49815.20557972254
=======================================
Day 86
=================
Init infec = 830305.8492836107
Daily New % = 5.511347000000003
Daily New Infec = 45761.036515316824
=======================================
Day 87
=================
Init infec = 876066.8857989275
Daily New % = 4.640144000000003
Daily New Infec = 40650.76503738581
=======================================
Day 88
=================
Init infec = 916717.6508363134
Daily New % = 3.768941000000003
Daily New Infec = 34550.547396606686
=======================================
Day 89
=================
Init infec = 951268.19823292
Daily New % = 2.897738000000003
Daily New Infec = 27565.260062110683
=======================================
Day 90
=================
Init infec = 978833.4582950308
Daily New % = 2.026535000000003
Daily New Infec = 19836.40262405923
=======================================
Day 91
=================
Init infec = 998669.86091909
Daily New % = 1.1553320000000031
Daily New Infec = 11537.952477553772
=======================================
Day 92
=================
Init infec = 1010207.8133966437
Daily New % = 0.2841290000000031
Daily New Infec = 2870.2933581257807
======================
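The projection above can be reproduced with a sketch like the following. It is my reconstruction from the description, not the original script: the function `project` and its parameters are assumptions, while the starting values (312237 infected, 14.223377% on day 76, slope -0.871203) are taken from the output above.

```python
def project(init_infected, first_day, last_day, first_day_pct, slope):
    """Project daily new infections with a linearly falling growth rate.

    Returns a list of (day, init_infec, daily_new_pct, daily_new_infec).
    """
    rows = []
    infected = float(init_infected)
    pct = first_day_pct
    for day in range(first_day, last_day + 1):
        new_cases = infected * pct / 100.0   # "Daily New Infec"
        rows.append((day, infected, pct, new_cases))
        infected += new_cases                # next day's "Init infec"
        pct += slope                         # rate falls by the slope each day
    return rows


rows = project(312237, 76, 92, 14.223377, -0.871203)

# Day with the largest projected number of new infections.
peak = max(rows, key=lambda r: r[3])
print("peak day:", peak[0])

# Cumulative total once day 92's new cases are added.
print("total after day 92:", round(rows[-1][1] + rows[-1][3]))
```

Run as written, the peak should land on day 82 and the total should pass one million, matching the listing above up to floating-point rounding.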
The calculation indicates that on day-82 (12th of April), the USA will report its highest number of cases in a single day, and by the end of April, the USA will have more than 1 million people infected.
I hope these dire predictions turn out to be wrong somehow. But this is what the numbers say!