I’m using my time at home to try new things or return to old ones. I’m cleaning the basement and planting a vegetable garden, things I haven’t done in a few years, and I’m experimenting with data visualization using new and different datasets. Like many of you, I’ve also been transfixed by the news and have been tracking global and local COVID-19 data, which makes for an intense study of data visualization and communicating with data.

So many questions bubble up as we consume data like this. Why these data points? Why did {organization} choose that chart, graph, diagram, or single number to communicate their message? Why do they seem to be collecting this data but not that data? How can I make that cool visualization I saw on that website? 😁

Too close (to a hotspot) for comfort?

I live in New York State, one of the earliest US COVID-19 hotspots. I’ve watched the news in horror as tens of thousands of my statemates (not a word, but I’m using it!) fell ill or passed from the virus. But while tracking a virus by geopolitical borders seems arbitrary in a way, it’s also what we have at hand and makes for understandable reporting to the public. People are naturally concerned with data on a local level. We want to know how close to home the virus has hit. The vast majority of COVID-19 cases in my state, however, have been around the New York City area, and I live more than 350 miles northwest of there, so I’ve been studying data from my county, Monroe – 1300+ square miles (relatively small as counties go), with about 742 thousand people. 

I’m frustrated that our county dashboard and daily data releases rely primarily on single numbers – how many new cases, people hospitalized, deaths since yesterday, etc. I wanted to look at trend data, so I opened Excel and got to work experimenting with data visualization.

What works for whom, when, and under what circumstances? 

This is a question I learned to consider doing evaluation work, and I mention it here because what follows is a number of works in progress… unfinished, unrefined, undecided-upon graphs I made, just to see what the data looks like when I view it in different ways, AND to try my hand at a few chart types I don’t often use given the work I do. Right now, the graphs are for me (well, and you too now!), so I haven’t considered one of the most important questions – who IS my audience? Who will consume these graphs and what do they need in order to make meaning from them?

It’s clear some graphs are not appropriate for the dataset or for communicating certain messages or communicating to certain audiences, but hey, I like experimenting with data visualization and writing about it. One key decision I need to make (and soon!) is how much data I want to see in each graph – one week? Two weeks? A month? As you’ll see, most of these graphs have outgrown the number of data points currently in them. 

Experimenting with Data Visualization

Combo Column Graph with Line

The day-to-day data is so variable and fraught with questions. Was there more testing on some days? Do weekends make a difference? Does the availability of testing in different locations make a difference? For that reason, I thought it best to focus on a 7-day average vs the day-to-day ups and downs, but I still wanted to see the day-to-day. I added a couple of annotations where there were days that may have compelled some people to gather, to watch for potential spikes in the weeks that follow.

Combination column and line charts of daily new cases and deaths
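I built the 7-day average in Excel, but for anyone who prefers a script, here is a minimal Python sketch of the same idea. It assumes a trailing average (the current day plus the six days before it), which is only one way to define it. The function name and the daily counts are made up for illustration and are not my county’s data.

```python
def rolling_average(values, window=7):
    """Trailing average: each result covers the current day
    and the (window - 1) days before it."""
    averages = []
    for i in range(len(values)):
        if i + 1 < window:
            # Not enough history yet to fill a full window
            averages.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            averages.append(sum(chunk) / window)
    return averages


# Hypothetical daily new-case counts (illustrative only)
daily_new_cases = [19, 42, 35, 96, 50, 61, 47, 30, 28]

for day, (count, avg) in enumerate(zip(daily_new_cases, rolling_average(daily_new_cases)), start=1):
    label = "n/a" if avg is None else f"{avg:.1f}"
    print(f"Day {day}: {count} new cases, 7-day average {label}")
```

The first six days have no average because the window is not full yet. In a combo chart, the daily counts would be the columns and the averages would be the line.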

Overlapping Bar Chart

I love overlapping bars. What a fabulous way to show a subset of a set. I frequently use these in my work. This shows me clearly that while hospital admissions have been on the rise, ICU placements have decreased and make up a lesser proportion of those in the hospital. (Note: there is missing data in the early days when the county didn’t report these numbers for a bit.)

Overlapping bar chart of daily hospitalizations and ICU placements

Vertical Dot Plot

I tried the same dataset with a different graph. I love vertical dot plots. So great for comparing two things. While this looks attractive, and to me really emphasizes that gap between how many hospitalized and how many in ICU, it isn’t quite the right chart because ICU placements are a subset of hospitalizations. They’re not two separate groups. I shared these two charts with private Facebook and Slack groups associated with Evergreen Data Academy and everyone there agreed that the overlapping bars communicate this dataset better. That’s why we experiment, though, right? Note too that this graph is getting way too big and the dates on the x-axis are diagonal – a big no-no! I don’t do neck-breaker charts! 🧐

Vertical dot plot of daily new cases

Waterfall Graph

I’ve rarely (if ever?) used a waterfall graph, but wanted to ensure I could make and customize one if I needed to. This dataset isn’t quite the best fit since 1.) there are no decreases and waterfalls work best when you have both increases and decreases, and 2.) the individual data points are really small, and it’s hard to see the differences even when they’re substantial. So, an increase of 96 doesn’t appear much bigger than an increase of 19.

Waterfall chart of daily new cases
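One possible reason the increases look so similar is that each waterfall bar floats on top of the running total, so every bar is small relative to the full axis. Here is a rough Python sketch of that arithmetic. The starting total of 1,000 is an assumption I picked for illustration, and the function names are hypothetical.

```python
def waterfall_bars(start_total, daily_increases):
    """Return (bottom, top) for each floating bar in a waterfall chart."""
    bars = []
    running = start_total
    for increase in daily_increases:
        bars.append((running, running + increase))
        running += increase
    return bars


def share_of_axis(bars):
    """Height of each bar as a fraction of the tallest point on the axis."""
    axis_top = max(top for _, top in bars)
    return [(top - bottom) / axis_top for bottom, top in bars]


# Assumed starting total; 19 and 96 are the two increases mentioned above
bars = waterfall_bars(1000, [19, 96])
for (bottom, top), share in zip(bars, share_of_axis(bars)):
    print(f"Bar from {bottom} to {top} takes up {share:.1%} of the axis")
```

The larger the running total gets, the smaller each day’s bar becomes relative to the whole chart, which is part of why this chart type struggles with a long series of small increases.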

Instance Chart

I have to admit, this is one of the coolest charts I’ve ever made! Instance charts are relatively new on the scene, and I used a tutorial from Evergreen Data Academy to learn this one, so you’re seeing my first attempt! Now here’s one chart that can withstand a LOT of data (people have famously tracked eons of climate data in one of these), so I can keep collecting and adding to this one for a long time!

Instance chart of daily new cases

Excel Table with Indicator Dots

Before I started making charts, I entered data in a simple Excel table. I collect all the data my county reports, and I’m keeping it all in the table for the time being, even though I don’t pay too much attention to certain columns. I wanted to see whether some daily changes were going in the right direction or not, so I added indicator dots to three of my columns to show whether my 7-day averages of new cases per day, new deaths per day, and new hospitalizations per day were improving (i.e., decreasing), staying the same, or getting worse (i.e., increasing). I added conditional formatting rules to highlight highs and lows in some columns and color-coded them, so the “good color” is green whether it’s a high or low point (e.g., low hospitalizations is good; high number of tests is good). I’m currently heartened by the 6-day streak of declining 7-day average of new cases I can see in column E with the green indicator dots.

I’ve rearranged and grouped columns in different ways, changed column headings, etc., and added conditional formatting. Again, a work in progress that I continue to evolve as I think about what I want from this visualization. A note about colors: Dataviz rockstar Stephanie Evergreen (and many others) advise against using traditional stoplight colors – red, yellow, and green – as they can be problematic for those with forms of colorblindness. I couldn’t agree more, and if I were to make this table/dashboard public or for a specific audience I would make different design choices. (Note: The column headings got squished when I shrank this to get the screenshot.)

Excel table with indicator dots; multiple COVID-19 data points
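The logic behind the indicator dots can be sketched in a few lines of Python. This is only an approximation of what my Excel conditional formatting does: it assumes each day’s 7-day average is compared with the previous day’s, and the function names and sample averages are hypothetical.

```python
def indicator(previous, current):
    """Label a change where a decrease is the good direction."""
    if current < previous:
        return "improving"
    if current > previous:
        return "worse"
    return "same"


def indicators(series):
    """One label per day; the first day has nothing to compare against."""
    return [None] + [indicator(p, c) for p, c in zip(series, series[1:])]


def current_streak(labels, target="improving"):
    """Count how many of the most recent days in a row match the target label."""
    streak = 0
    for label in reversed(labels):
        if label != target:
            break
        streak += 1
    return streak


# Hypothetical 7-day averages of new cases (illustrative only)
averages = [50.0, 51.6, 49.6, 48.0, 48.0, 47.1, 45.0]
labels = indicators(averages)
print(labels)
print("Current improving streak:", current_streak(labels))
```

For a measure where higher is better, such as number of tests, the comparison in `indicator` would need to be flipped so that an increase counts as improving.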

Are you an amateur (or professional?) COVID-19 data tracker?

Are you experimenting with visualizing COVID-19 or other data? Feel free to share your work in progress here. I’d love to see it.

Check this out: My friend Elizabeth Grim explores how we think about COVID-19 data in Revisiting COVIDeracy: What’s in a Number?

Here are some of my favorite resources for learning and experimenting with data visualization.