Update on my County-Level #COVID19 Tracking Project: 34 states complete, 16 partially complete

Since I've been neglecting other ACA/healthcare posts the past couple of weeks, I figured I should at least provide regular updates on why I've been mostly absent.

I've made major progress in updating and revising my breakout of COVID-19 cases and fatalities not just at the state level but at the county level. Again, I've split the states into two separate spreadsheets:

Until a few weeks ago, I was only able to track about a dozen states at the county level, and even that was painfully slow going, since every state health department's website presents its data differently. Some make it easy to collect; others make it difficult. Getting hold of historical data (that is, daily numbers prior to the current date) was even harder; I had to rely on the Internet Archive's Wayback Machine, which can be spotty at times.

Later, someone clued me in to the New York Times' GitHub archive, where the paper's team of data researchers is doing most of the work for me, with data dating back as early as February for some states. This has made the data itself far more accessible for every state and U.S. territory...but it still doesn't speed up the actual data plotting process, which I'm still doing manually.

Then, a week or so ago, someone pointed me towards an even better data repository on GitHub from Johns Hopkins University. It separates out "Unassigned" and "Out of State" cases, has a better data layout and (critically!) includes zeroes for the blank counties (i.e., counties which don't have any reported cases yet). That last point makes a huge difference in speeding up the posting process.
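For anyone curious why those zeroes matter so much, here's a rough Python sketch (my own illustration, not code from either repository) of the extra step needed when a data feed only lists a county once it has reported a case. It assumes the raw data arrives as (date, county, cases) rows; the function name, county names and numbers are all made up for the example.

```python
# A minimal sketch, not code from the NYT or JHU repositories.
# Assumption: rows arrive in "long" format as (date, county, cases) tuples, and a
# county is simply missing from a given date until it reports its first case.

def fill_missing_counties(rows, all_counties, dates):
    """Return {county: [cases per date]} with 0 wherever a county has no row."""
    # Index the reported figures by (date, county) for quick lookup.
    reported = {(date, county): cases for date, county, cases in rows}
    # Walk every known county and every date, defaulting absent entries to 0.
    return {
        county: [reported.get((date, county), 0) for date in dates]
        for county in all_counties
    }


# Hypothetical example data (made-up counties and case counts).
dates = ["2020-03-20", "2020-03-21"]
all_counties = ["County A", "County B", "County C"]
rows = [
    ("2020-03-20", "County A", 3),
    ("2020-03-21", "County A", 5),
    ("2020-03-21", "County B", 1),
]

grid = fill_missing_counties(rows, all_counties, dates)
for county, counts in grid.items():
    print(county, counts)
```

Here "County C" never shows up in the raw rows at all, which is exactly the kind of gap that would otherwise have to be filled in by hand; a feed that already includes the zero rows skips this step entirely.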

One irritating thing about both GitHub repositories, however: for some reason, they've decided to lump all five NYC boroughs together into a single "New York City" listing each day. This is frustrating since the whole point of this project is to break the numbers out into smaller units, and New York City is home to over 8 million people, but it's the best I can do. (The New York State health department does break out the numbers for each of the five boroughs, but only for the current day, and the layout of its site is confusing and hard to work with as well.)

In any event, as of this writing, I've completed the following:

  • Daily county-level data from 3/20 until at least 5/20 for 34 states
  • At least 2 weeks of county-level data for most other states

I hope to finish bringing all 50 states up to date through 5/28 by tomorrow. This should allow for plenty of interesting analysis of trends and counties to keep an eye on. It will also allow me to get back to posting more regular ACA policy updates/etc.
