Time-Series Analysis of New Covid Cases

I used two different Python time-series libraries to build models that predict new Covid cases in the United States.

Data was taken from ‘Our World in Data’, https://covid.ourworldindata.org/data/owidcovid-data.csv, which showed, in part, daily numbers of new Covid cases for the United States from January 23, 2020 through January 9, 2022.
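For anyone who wants to reproduce this step, here is a minimal sketch of pulling the U.S. series out of a file like that. It assumes the CSV has `location`, `date`, and `new_cases` columns, and it treats a blank `new_cases` cell as "nothing reported". The function name `load_us_new_cases` and the sample rows are made up for illustration. It sticks to the standard library; pandas would do the same filter in a notebook.

```python
import csv
import io
from datetime import date


def load_us_new_cases(csv_file, location="United States"):
    """Return a sorted list of (date, new_cases) tuples for one location."""
    rows = []
    for record in csv.DictReader(csv_file):
        if record["location"] != location:
            continue
        raw = record["new_cases"]
        if raw == "":  # assumption: a blank cell means no figure was reported
            continue
        rows.append((date.fromisoformat(record["date"]), float(raw)))
    return sorted(rows)


# Tiny in-memory sample standing in for the real download (values are made up).
sample = io.StringIO(
    "iso_code,location,date,new_cases\n"
    "USA,United States,2020-01-24,1.0\n"
    "CAN,Canada,2020-01-24,0.0\n"
    "USA,United States,2020-01-23,\n"
    "USA,United States,2020-01-25,0.0\n"
)
us_cases = load_us_new_cases(sample)
print(us_cases)
```

With the real data you would pass an open file handle for the downloaded CSV instead of the in-memory sample.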

The first model used Silverkite, LinkedIn’s time-series algorithm, through LinkedIn’s Greykite library. (What’s with the kites?)

The model forecasts new Covid cases in the U.S. 90 days ahead of today’s date (which was January 10, 2022, when I created this post), with a prediction interval of 95%.
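A setup like this can be sketched roughly as follows. This is not the exact notebook code, just a minimal version under a few assumptions: the input DataFrame has columns named `ts` and `y` (my naming), the training data ends on January 9, 2022, and the helper names `forecast_window` and `run_silverkite` are hypothetical. The Greykite imports sit inside the function so the date helper still runs on a machine without Greykite installed.

```python
from datetime import date, timedelta


def forecast_window(train_end, horizon_days=90):
    """Return the first and last calendar dates covered by the forecast."""
    first = train_end + timedelta(days=1)
    last = train_end + timedelta(days=horizon_days)
    return first, last


def run_silverkite(df, horizon_days=90, coverage=0.95):
    """Fit Silverkite via Greykite; df needs 'ts' (date) and 'y' (new cases)."""
    # Imported lazily so the helper above works without Greykite installed.
    from greykite.framework.templates.autogen.forecast_config import (
        ForecastConfig,
        MetadataParam,
    )
    from greykite.framework.templates.forecaster import Forecaster
    from greykite.framework.templates.model_templates import ModelTemplateEnum

    config = ForecastConfig(
        model_template=ModelTemplateEnum.SILVERKITE.name,
        forecast_horizon=horizon_days,  # days to predict past the last training date
        coverage=coverage,              # 0.95 gives a 95% prediction interval
        metadata_param=MetadataParam(time_col="ts", value_col="y", freq="D"),
    )
    result = Forecaster().run_forecast_config(df=df, config=config)
    return result.forecast.df  # forecast with lower and upper interval bounds


first_day, last_day = forecast_window(date(2022, 1, 9))
print(first_day, last_day)
```

The date helper is only there to make the horizon concrete: 90 days past a January 9 training cutoff runs from January 10 through April 9, 2022.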

The results from this analysis are as follows:

The second model I created used Facebook’s Prophet library instead, which brings its own forecasting algorithm rather than Silverkite. When you run the Python commands (I used a Jupyter Notebook), you’ll get messages that Prophet is disabling yearly and daily seasonality unless you change a setting. I kept the defaults because I didn’t want to take those seasonalities into consideration on the first run of the model.
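Why does Prophet switch those off? As far as I understand its "auto" setting, yearly seasonality needs about two years (730 days) of history and daily seasonality needs data recorded more often than once a day. The sketch below mimics that rule with a hypothetical helper, `auto_seasonality`, and shows how the setting could be overridden. The `run_prophet` wrapper, the `ds`/`y` column names Prophet expects, and `interval_width=0.95` (chosen to match the 95% interval of the first model) are assumptions for illustration, not the notebook's exact code.

```python
from datetime import date


def auto_seasonality(first_day, last_day, spacing_days=1):
    """Approximate which seasonalities Prophet's 'auto' setting would keep."""
    span_days = (last_day - first_day).days
    return {
        "yearly": span_days >= 730,                      # about two years of history
        "weekly": span_days >= 14 and spacing_days < 7,  # two weeks, sub-weekly data
        "daily": span_days >= 2 and spacing_days < 1,    # needs sub-daily data
    }


def run_prophet(df, horizon_days=90, yearly="auto", daily="auto"):
    """Fit Prophet; df needs 'ds' (date) and 'y' (new cases) columns."""
    from prophet import Prophet  # lazy import; older installs use `fbprophet`

    model = Prophet(
        interval_width=0.95,        # assumption: match the first model's interval
        yearly_seasonality=yearly,  # pass True to force it on despite short history
        daily_seasonality=daily,
    )
    model.fit(df)
    future = model.make_future_dataframe(periods=horizon_days)
    return model.predict(future)


flags = auto_seasonality(date(2020, 1, 23), date(2022, 1, 9))
print(flags)
```

January 23, 2020 to January 9, 2022 is 717 days, just short of the two-year mark, which fits the yearly seasonality message. Calling `run_prophet(df, yearly=True)` would be one way to turn it back on for a later run.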

The results from this analysis are as follows:

So what do you think of these different predictions? I think both models are going to be wrong in the actual numbers of new cases because of the rise of the Omicron variant, and this dataset did not differentiate between the Covid strains. Omicron is reported to spread more easily but, overall, to affect people less severely than the other Covid strains.

But one thing that I find interesting is how aggressive each model is in forecasting new cases. Greykite shows a drop in cases after the train end date (Jan 9, 2022) that continues through February 2022, but then it quickly moves back up. Prophet also shows a drop after the train end date (Jan 9, 2022) but then follows a slow upward trend.

I hope Prophet is right, but we will see.

Another thing I wanted to do was compare my models with the data posted by the NY Times about U.S. cases. I did a screen capture and, using Photoshop, overlaid the newly reported cases data onto my Greykite model. (Please note: I taught Photoshop for a long time and fully believe using this tool is an important way for data scientists to better communicate their findings.)

The results are as follows:

The NY Times data ended on Jan 8th while the ‘Our World in Data’ data ended on Jan 10th, but the numbers of new cases match nicely between the two sources.
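An overlay shows the match visually. If both series were available as numbers, a quick check like the one below could put a figure on it. This is only a sketch: the function name `mean_abs_pct_diff` is hypothetical, the series are plain `{date: value}` dictionaries, and the sample values are made up purely to show the call.

```python
from datetime import date


def mean_abs_pct_diff(series_a, series_b):
    """Mean absolute percent difference between two {date: value} series,
    computed only on dates present in both, relative to series_b."""
    shared = sorted(set(series_a) & set(series_b))
    diffs = [
        abs(series_a[d] - series_b[d]) / series_b[d]
        for d in shared
        if series_b[d] != 0  # skip days where the reference value is zero
    ]
    if not diffs:
        raise ValueError("no overlapping non-zero dates to compare")
    return 100 * sum(diffs) / len(diffs)


# Made-up numbers purely to show the call; these are not real case counts.
source_a = {date(2022, 1, 6): 110.0, date(2022, 1, 7): 200.0, date(2022, 1, 8): 300.0}
source_b = {date(2022, 1, 7): 200.0, date(2022, 1, 8): 250.0}
pct_diff = mean_abs_pct_diff(source_a, source_b)
print(pct_diff)
```

Because it only looks at dates both sources share, the different end dates of the two datasets don't get in the way. The same function could later score a forecast against the cases that were actually reported.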

I’ll keep watching to see how accurate my forecasting models are at predicting new cases of Covid in the U.S. I’ve posted my Jupyter Notebook and CSV files on my GitHub, located here.

Sara Kubik