
Johns Hopkins COVID-19 Time Series Historical Data (with US State and County data) – Up to 9/1/2021

March 24, 2020

For those of you tracking COVID-19 (Coronavirus) through Johns Hopkins, they modified their time series database schema on March 23, 2020 and deprecated their previous schema immediately.

I recreated the older schema version of the following files (up to date with September 1, 2021 data).

These files have ALL historical data from Johns Hopkins.

You can download those files here:

The URLs below will not change so feel free to link directly.

HAVE YOU SEEN THE NEW ULTIMATE COVID DATA SET? CHECK IT OUT. DAY NUMBERS, QUARANTINE DATES, POLITICAL AFFILIATION, POPULATION, LAND AREA, AND MORE INCLUDED.

GLOBAL WITH REGION/STATE LEVEL (THIS IS THE SCHEMA FROM MARCH 21, 2020)

These files were last updated September 2, 2021 at 10:23am PT (as of Dec 2020, I am refreshing the data once a week [Tue eve]):

time_series_19-covid-Confirmed.csv

time_series_19-covid-Deaths.csv

time_series_19_covid_Recovered.csv

GLOBAL WITH REGION AND US COUNTY/CITY LEVEL

If you want county-level data (combined with tons of other useful data), please click here to visit the “Ultimate COVID-19 data set” on my site.

3/27/2020 Update: I updated PowerQuery to fix some of the complaints about missing latitude and longitude and about multiple instances of various countries. The totals all worked fine before if you grouped them by country, but you would see duplicate regions (subtle naming differences) under the hood. See comments below for more details. There is now a table to override names in the Excel tool.
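If you consume the CSV files outside Excel, the same two ideas (an override table for name variants, then grouping by country) can be sketched in Python. This is only an illustration, not the PowerQuery logic from the workbook. It assumes the legacy column layout of Province/State, Country/Region, Lat, Long followed by one column per date; the NAME_OVERRIDES table and the sample counts are made up for the example.

```python
import csv
import io
from collections import defaultdict

# Hypothetical override table: maps name variants to one canonical name.
NAME_OVERRIDES = {
    "Bahamas, The": "Bahamas",
    "The Bahamas": "Bahamas",
}

# Made-up sample in the assumed legacy layout; the counts are illustrative only.
SAMPLE = """Province/State,Country/Region,Lat,Long,3/21/20,3/22/20
,"Bahamas, The",24.25,-76.0,4,4
,The Bahamas,24.25,-76.0,0,0
,Bahamas,,,0,0
Denmark,Denmark,56.26,9.5,10,12
,Denmark,,,0,0
"""

def totals_by_country(csv_text):
    """Sum every date column per canonical country name."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Assumption: the first four columns are Province/State, Country/Region, Lat, Long.
    date_cols = reader.fieldnames[4:]
    totals = defaultdict(lambda: dict.fromkeys(date_cols, 0))
    for row in reader:
        name = row["Country/Region"]
        country = NAME_OVERRIDES.get(name, name)
        for col in date_cols:
            totals[country][col] += int(row[col] or 0)
    return {country: dict(cols) for country, cols in totals.items()}

totals = totals_by_country(SAMPLE)
```

Grouping this way collapses the duplicate rows into one entry per country, which is why the totals stay correct even when the region names differ.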

These files combine the original March 22, 2020 time series file with the individual daily stats from March 23, 2020 – September 1, 2021. This allows me to maintain state level data in the old schema so it doesn’t break my previously created reports. I will continue to refresh these with the latest data, as necessary.
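For readers who would rather script this than use Excel, here is a rough Python sketch of the combining step: take the wide legacy file and append one new date column per daily file, summed to state level. It is a simplified stand-in, not the PowerQuery script. The daily-file column names (Province_State, Country_Region, Confirmed) are my assumption about the daily report format, and the sample rows are invented.

```python
import csv
import io
from collections import defaultdict

# Made-up samples; the column layouts are assumptions, the numbers are not real data.
WIDE = """Province/State,Country/Region,Lat,Long,3/21/20,3/22/20
Washington,US,47.4,-121.5,100,120
,Italy,43.0,12.0,500,600
"""
DAILY = """FIPS,Admin2,Province_State,Country_Region,Last_Update,Lat,Long_,Confirmed,Deaths,Recovered,Active,Combined_Key
53033,King,Washington,US,2020-03-23 23:19:34,47.49,-121.83,90,5,0,0,"King, Washington, US"
53061,Snohomish,Washington,US,2020-03-23 23:19:34,48.04,-121.71,50,1,0,0,"Snohomish, Washington, US"
,,,Italy,2020-03-23 23:19:34,41.87,12.56,700,60,7,0,Italy
,,,Laos,2020-03-23 23:19:34,19.85,102.49,2,0,0,0,Laos
"""

def read_wide(csv_text):
    """Legacy wide file -> {(state, country): {date: count}}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    series = {}
    for row in reader:
        key = (row["Province/State"], row["Country/Region"])
        series[key] = {col: int(row[col] or 0) for col in reader.fieldnames[4:]}
    return series

def add_daily(series, date_label, daily_csv_text, value_col="Confirmed"):
    """Sum one daily file to state level and append it as a new date column."""
    sums = defaultdict(int)
    for row in csv.DictReader(io.StringIO(daily_csv_text)):
        sums[(row["Province_State"], row["Country_Region"])] += int(row[value_col] or 0)
    known_dates = list(next(iter(series.values()), {}))
    for key in set(series) | set(sums):
        # Regions first seen in a daily file get zeros for all earlier dates.
        # Simplification: regions missing from the daily file get 0 for the new date.
        row = series.setdefault(key, dict.fromkeys(known_dates, 0))
        row[date_label] = sums.get(key, 0)
    return series

combined = add_daily(read_wide(WIDE), "3/23/20", DAILY)
```

Calling add_daily once per daily file, in date order, extends the table one column at a time while keeping the old state-level shape.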

Note: This new time series CSV file includes all US States, even for dates after March 22nd.

This is meant to hold you over until you update your schema or Johns Hopkins fixes their schema. You can connect your data source to the URL links above.
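As one possible way to connect from code instead of a reporting tool, the sketch below downloads a CSV by URL and reshapes the wide table into one record per region per date. The URL you pass to fetch_csv is whichever file link you use from this page (none is hard-coded here). The column layout and the m/d/yy date headers are assumptions based on the legacy schema, so adjust the format string if your headers differ.

```python
import csv
import io
from datetime import datetime
from urllib.request import urlopen

def fetch_csv(url):
    """Download a CSV file and return its text (utf-8-sig strips a BOM if present)."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8-sig")

def to_long(csv_text, date_format="%m/%d/%y"):
    """Wide legacy table -> list of {state, country, date, count} records."""
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for row in reader:
        # Assumption: every column after the first four is a date column.
        for col in reader.fieldnames[4:]:
            records.append({
                "state": row["Province/State"],
                "country": row["Country/Region"],
                "date": datetime.strptime(col, date_format).date(),
                "count": int(row[col] or 0),
            })
    return records

# Offline example with invented numbers; swap in fetch_csv(<file URL>) for live data.
example = to_long(
    "Province/State,Country/Region,Lat,Long,3/21/20,3/22/20\n"
    "Hubei,China,30.97,112.27,10,12\n"
)
```

The long form is usually easier to filter, group, and plot than a table whose column set grows every day.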

Download the PowerQuery tool I built to create the CSV files

I can refresh this file pretty easily since I have set up PowerQuery to do all the work, assuming Johns Hopkins doesn’t change the actual names of countries/regions. (A name change wouldn’t break PowerQuery but would result in multiple instances of a given country/region; the totals should still be okay if this happens.)

Here is a link to download a zip file that contains my PowerQuery Excel sheet so you can refresh the data yourself. Let me know if you have any questions! I set it to refresh when you open the file so it may lag on opening. You just put the latest daily CSV files in the subfolder (all dates you want after March 22nd), open the XLS, then copy/paste the tabs into new CSV files.

Download the Johns Hopkins Excel Workbook (updated on 8/13/2020) that includes all the Power Query scripts to build the CSV files you are downloading on this page, by clicking here

Questions? Just ask.


  1. Thank you so much for doing this!

    Will your data be refreshed daily?

    I don’t understand why Johns Hopkins is so careless in their stewardship role of this data. It is obvious that many, many people and institutions are relying on this data, and the arbitrary format changes and deletions have a wide and adverse impact on all of us. How can they coordinate fighting the virus if they can’t even manage their data?

    We all do appreciate the access to the data. But clearly there is no need for it to be painful or waste our time.

    1. Bert, yes, I will refresh the data daily until it no longer makes sense to. Expect it to be refreshed within 30 minutes of the Johns Hopkins data being refreshed (they update at 5p PT, so these files will be refreshed by 5:30p PT). I also included the PowerQuery tool that I use to refresh the data so others can refresh on their own.

  2. This is great. You are a hero. Like many, I was using the JHU timeline data and very disappointed when they totally changed it. I fought with many alternative solutions yesterday and today, but yours is ultimately better, and it means I don’t have to change anything about my app. I got the sense, however, that they were not going to rely on the recoveries data. So, I worry about them eventually dropping that altogether, which means there won’t be a way to track active cases. Until then, are these URLs going to stay consistent, or will you be moving to GitHub? Thanks!

    1. I will be refreshing these URLs by 5:30p every day. It takes about 5 minutes to refresh. The URLs will not change. I’m hoping this is a temp fix and we don’t need to make it an official github repository.

  3. Hey, thanks, this data set is really helpful to me. Is there a chance you can upload your files to a GitHub repo so I can access the raw file via URL? I noticed one detail was off: France appears as ‘France’ in the legacy data set pre 3/23 and ‘France, France’ post. There’s also a date missing in between the new and old.

    1. Jon, the URLs won’t change. I’m hoping that this is a temp fix until JH fixes their repo, so we won’t need a GitHub repo.

      I did notice the France anomaly when I was putting the query together but ignored it because when grouped by country it all works out. I noticed a few cases where JH or someone upstream moved data from one province to another. And I’m not sure why they created a France, France region. I could fix it, but spot fixing things gets into a rathole for other minor issues like that. Were you breaking out France by region or aggregating? The only two regions I personally have looked at in detail are China and the US.

      1. Hey Ryan, I’m hoping for the same. My application pulls the data via web link, there’s probably some way to manipulate the file as a download in the server, but I’ve always gathered the data with a url to the “raw” data. I’ve uploaded your csv, here’s what it looks like:
        https://raw.githubusercontent.com/vanagonjon/corona-tracker/timeshift/time_series_19-covid-Confirmed.csv
        I understand your hesitation, and I don’t mind hosting it on my repository.

        My application is fairly generic: it simply takes an index of all time series and allows you to plot them against one another, hence France and France, France both show up in a search. Check it out if you like:
        corona.vanagonjon.com

        Thanks again for making this available!

  4. hi, ryan. just FYI, seems there are some little issues with the data, some countries are missing latitude/longitude information and have multiple names, for example Denmark and Bahamas. seems like there are duplicates in the original data, but the actual numbers in the timeseries are not the same.

    1. Jane,

      I believe the issue you are calling out is that there is a “Region: Denmark, Country: Denmark” and a “Region:(blank), Country: Denmark”; Similar issue with France, United Kingdom, Netherlands. Then there are entries for Bahamas: original: “Country: Bahamas, The” and “Country: The Bahamas” and in the daily stats file: “Country: Bahamas”.

      The scope of work I was doing was to combine the files, but I wasn’t trying to resolve name changes that Johns Hopkins makes. Know that the empty region problems (the majority mentioned above) would add up to the correct sum if you combine based on the country. In other words, the country totals should add up and “hide” the problem mentioned. Bahamas was just a variety of names that Johns Hopkins used for some reason. If I get time, I’ll put in a patch to fix these manually, but I really don’t want to get in the habit of fixing naming issues as this could be a day job.

    2. Jane,

      Regarding the lat/long, it does look like any “new” countries that were not in the original time series are missing the lat/long. This would be countries like Belize, “Bahamas” (the new spelling), “Denmark” (the new version), Diamond Princess, “France” (the new empty region version), “Gambia” (the new spelling), Guinea-Bissau, Laos, Mali, Netherlands (new empty region version), Saint Kitts and Nevis, United Kingdom (new empty region version), American Samoa US, Diamond Princess Canada, Martinique France, Northern Mariana Islands US.

      The lat/long can be fixed. I think I relied on the original time series lat/long and didn’t even consider the lat/long from the daily files. The fix would be to use the lat/long from the daily files if there is none from the original time series. I’ll try to take a look at this if I get some time. The majority of areas should be okay though.

    3. Jon, I just updated the CSV files to resolve any naming and Lat/Long issues. PowerQuery is updated as well to handle this. Let me know if you still have problems.

  5. Did the country and state get swapped around today? Totally appreciate what you are doing to maintain this dataset, but making a column change order in a CSV with dynamic columns is a breaking change for many people.

      1. Hi Melih, if you are after State data, I would suggest using the other set of files on this post. If you would like county-level data for the US (along with all the other global data), then use the “-county” version you are referencing. This “county” version actually closely mirrors the format that Johns Hopkins was using around March 11th. If you choose to use the county version, note that the aggregation of all counties within a State will total the same numbers for the same given State you find in the normal (other) version of the CSV files. The county version may have some non-county entries like “California” (versus “San Francisco, California”) that have all 0’s. This is an artifact of what’s in the data sets from Johns Hopkins and you can safely ignore it. In general, I would recommend aggregating (grouping) data if you are trying to view the data set in a certain way (this data set or any data set in general). Let me know if you still have questions.

        The way I personally use this is to separate the county from the State into different columns and that allows me to group by state or look at the county level.

  6. Seems to me that the time-series data for Wuhan are not available. Even though the records released by the government are not reliable, I think they are still useful to the research of epidemiological modeling. Any possibility of including a new row corresponding to Wuhan into the current CSV tables?

    1. Hi Xin, they are available in the data. You need to use the Hubei province (Wuhan is the capital city of this province). You should have no problem finding it with the number of cases early on there. Let me know if you have any questions.

  7. Hi Ryan,
    Thanks for your generosity in making this data available. I noticed that in the April 1 Confirmed file, all US data for 4/1/2020 appear to be zero. Am I missing something?

    Thanks again.

  8. Hi Ryan,

    There seems to be an issue with the state data: the csv reports all zeros on today’s date. It looks like there are a number of US counties listed now which weren’t in the original format. Did JH change their schema again?

  9. Hi Ryan,

    Great work.

    Do you have a time series for daily Active Case for counties? This will show an up or down trend.

    thanks.

    1. Kean, unfortunately I don’t think US hospitals are tracking active cases as much as other countries. Right now I have only seen active cases available at the country level and even that is questionable data.
