COVID-19: Ultimate Data Set [CSV] (refreshed August 9, 2021)

April 14, 2020

This is the ultimate GLOBAL data set for COVID-19. It provides population, land area, ISO codes, confirmed cases, deaths, ICU beds, workers, workers that take public transit, new and cumulative cases/deaths/recovered, day numbers (to compare from the onset of an outbreak or from quarantine dates), and more... at the county level!

Go to the bottom of the post for the download links!

I have been using Johns Hopkins (JH) data to track COVID-19 for quite a while now. JH has done a great job collecting all this data. However, tying other data into it has been challenging, and the schema has changed frequently. Again, I’m grateful for their work.

As you may know, I have been maintaining a consistent schema for a few months now and sharing that on my blog. However, I found myself manually combining and analyzing other data with JH’s data. So, this ultimate data set combines the hard work of Johns Hopkins with other useful data (e.g., population, land area, quarantine dates, outbreak dates, ICU beds, and more).

I have created an ultimate data set for COVID-19, that has the following:


I was considering adding the following:

  • US state-level test data (positive, negative, total)

This file is in beta at the moment and may change. Please provide feedback.

Download here (refreshed with 8/9/2021 data):

Note: Starting in June 2020, the “series2” file includes only the trailing 3 months of data. I was getting an error when generating the full file, likely because of its size and my computer’s ability to process it. All other files have all dates available. If you need the earlier dates, let me know and I’ll add a separate file that you can concatenate.

Dates are exploded into rows (my preference). This file has more info available, such as day numbers, the dates that events occurred, and new cases/deaths/recovered, but it is limited to the trailing three months:

https://www.soothsawyer.com/wp-content/uploads/2020/03/john-hopkins-ryan-format-time-series2.csv (~210MB)

Download the gzip version here (~12MB) for a FASTER DOWNLOAD
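
If you grab the gzip version, you should not need to unpack it to disk before reading it. Here is a minimal Python sketch using only the standard library. The function name `iter_gzip_csv` is made up for illustration, and the UTF-8 encoding is an assumption about the file, not something confirmed above.

```python
import csv
import gzip


def iter_gzip_csv(path):
    """Yield each row of a gzip-compressed CSV as a dict keyed by the header row.

    Assumption: the file is UTF-8 encoded and its first line is a header.
    """
    # "rt" opens the archive in text mode; newline="" is what the csv module expects.
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        # Rows are produced one at a time, so the ~210MB of data is never
        # held in memory all at once.
        yield from csv.DictReader(handle)


# Example use (the local file name is a placeholder):
# for row in iter_gzip_csv("time-series2.csv.gz"):
#     print(row)
```

Because the rows stream one by one, you can filter to a single country or date range as you go instead of loading the whole file.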

Dates are individual columns instead of rows (this makes the file way smaller) and this file has ALL dates:

https://www.soothsawyer.com/wp-content/uploads/2020/03/john-hopkins-ryan-format-time-series1.csv (~3MB)
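
If you prefer rows but need ALL dates, one option is to reshape the date-per-column file yourself. The sketch below shows the idea on a tiny made-up sample. The header names (`country`, `province`) and the helper `wide_to_long` are placeholders, not the file's real schema, and it assumes every column that is not an identifier is a date column.

```python
import csv
import io


def wide_to_long(rows, id_columns, value_name="value"):
    """Turn one-column-per-date rows into one-row-per-date records.

    Assumption: every column not listed in id_columns holds a date's value.
    """
    long_rows = []
    for row in rows:
        # Identifier fields are copied onto every output record.
        ids = {name: row[name] for name in id_columns}
        for column, value in row.items():
            if column in id_columns:
                continue
            long_rows.append({**ids, "date": column, value_name: value})
    return long_rows


# Tiny made-up sample; these headers are placeholders, not the real schema.
sample = io.StringIO(
    "country,province,1/22/20,1/23/20\n"
    "Exampleland,North,1,3\n"
)
records = wide_to_long(
    list(csv.DictReader(sample)), id_columns=("country", "province")
)
for record in records:
    print(record)
```

To use it on the real file, open the CSV with `csv.DictReader` and pass whichever identifier columns it actually contains.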


  1. Hi Ryan,
    There is no new information for new or recovered cases in the whole of China for a few days, have you any information about this?
    I’m wondering about this since I live in China and your site is my only information source.
    Thanks!
    Paul

    1. Hi Paul, There were three new deaths on July 14/15, and there are about 20-30 new cases per day on average (7-day avg). However, to your point, there have been zero new cases for 4 days now. I read an article today (7/21/2020) https://www.wionews.com/world/chinese-mainland-reports-11-new-covid-19-cases-including-8-in-xinjiang-314891 that indicates there were 11 new cases in Xinjiang, where they only had 77 confirmed cases. I have not seen this show up in the data yet. Let’s see if it shows up today or tomorrow in the numbers.

  2. Hi Ryan,
    This is amazing – thank you.
    Where can I find data on confirmed COVID cases for last summer, for May and June only?
    Thanks!
    Nathan
