Info Aperture is a blog about information design by Kate M.

Making Every one Count: Fun with small data sets

In the modern world of “big data” we are often focused on giant datasets. The hottest new skill these days is learning how to wrangle and analyze all this data as a “data scientist.” Don’t get me wrong, I think this is great. Never in all of human history have we created and collected so much data to analyze. The potential to understand how things work, and even how we ourselves work, is inspiring and a bit overwhelming.

But a giant dataset isn’t everything, and it may not be what some applications need. Most people know that a higher “n” (sample size) makes your research stronger, but what if you are just starting out? Maybe you are doing something non-experimental, like a program evaluation. Maybe you are evaluating a program that got off the ground 6 months ago and has only had 40 participants. What if it will take years to get a high n? Wouldn’t it make sense to take a look, every once in a while, at what’s going on in your dataset at the beginning, like in a pilot study?

While some people might groan at the idea of reporting on small datasets because of their lack of statistical power or supposedly weak link to how things are in the population, it’s actually a really great opportunity to have some fun visualizing your data with some less common visualization techniques, and to look more closely at the data you do have. Even more importantly, it’s a great way to humanize your data and maintain the dignity and individuality of the people who made your data. When working with small datasets (n<40) I propose visualizing ALL of your participants. Here’s an example of the demographics of a pilot study (n=30).

TYPS_Demographics-01.png

Instead of saying X% were Hispanic or X% went to high school, I used squares to represent each study participant. I used a bar graph formation for ease of reading and comprehension, but these bar graphs are more data-rich than usual because each participant is actually visualized and not just represented by the length of a bar. We see each individual participant and not just the groups we’ve decided to sort them into for the sake of summary.
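If you want to try the one-square-per-participant idea before opening a design tool, here is a minimal Python sketch that prints a text version of such a chart. It is not how I built the graphic above. The `unit_chart` function and the education counts are made up for illustration and are not the pilot study's real data.

```python
def unit_chart(counts, symbol="■"):
    """Return one text row per category, with one symbol per participant."""
    width = max(len(label) for label in counts)
    return [
        f"{label.ljust(width)}  {symbol * n}  ({n})"
        for label, n in counts.items()
    ]

# Hypothetical counts for a pilot study with n = 30 (illustrative only).
education = {
    "Some high school": 12,
    "High school diploma": 11,
    "Some college": 7,
}

chart = unit_chart(education)
print("\n".join(chart))
```

Each row still reads like a bar, but you can count every square, which is the whole point.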

With hundreds or even thousands of participants, we often rely on summaries of our data. We often present percentages or use measures of central tendency that assume our distribution is normal, and it often takes a larger n for that assumption to hold. One outlier in a dataset of 20 is 5% of your dataset, whereas in a dataset of 2,000 it’s only 0.05% of your dataset, which might not make as much of a difference when looking at averages. Percentages get wacky too. Let’s take a look at an example here.
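To make the outlier arithmetic concrete, here is a small Python sketch. The scores are invented for illustration: every participant scores 10 except one outlier who scores 100. The `outlier_share` helper is hypothetical too.

```python
from statistics import mean, median

def outlier_share(n):
    """Percentage of a dataset of size n that a single participant represents."""
    return 100 / n

# Invented data: everyone scores 10 except one outlier who scores 100.
small = [10] * 19 + [100]      # n = 20
large = [10] * 1999 + [100]    # n = 2,000

for data in (small, large):
    n = len(data)
    print(
        f"n={n}: one person is {outlier_share(n):g}% of the data, "
        f"mean={mean(data):.3f}, median={median(data)}"
    )
```

With these made-up numbers, the single outlier pulls the mean of the small dataset from 10 up to 14.5, while the mean of the large dataset barely moves. The median stays at 10 in both.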

skipped the Q.jpg

Here’s a sample where n=7. When you read the corresponding text, which presents percentages, you may notice that almost 15% of our sample skipped the question. That’s a big chunk of “missing data.” A big enough chunk that any good researcher would have to address it in their write-up. While some people may be able to easily discern that only one respondent skipped the question, why not be kind to your audience and just show them what 15% actually is, by visually representing all your respondents? It gives your audience an opportunity to focus on other things that might be more important.
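The same kindness works in text. As a rough sketch, a tiny Python helper can put the raw count next to the percentage so nobody has to do the division in their head. The `describe` function is a hypothetical name of my own, not part of any library.

```python
def describe(count, n, what):
    """Format a result as a raw count first, with the percentage in parentheses."""
    pct = 100 * count / n
    return f"{count} of {n} {what} ({pct:.0f}%)"

# One respondent out of seven skipped the question.
summary = describe(1, 7, "respondents skipped the question")
print(summary)  # 1 of 7 respondents skipped the question (14%)
```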

Another great thing about this approach is that it gives you an opportunity to tell a story with your data representation. In the example below, you can almost anthropomorphize each square and see what each participant’s journey through foster care looked like.

permoutcomes.jpg

Building even further on using a visual symbol to represent each participant, I went and animated our participants with qualities that those participants would have in this video summarizing the report’s results:

In a lot of ways, data visualization is just another form of storytelling. We all know that being able to relate to or follow a character’s story is one way to capture our audience’s attention and encourage engagement with our content. This is just another simple way to do just that.

While we should be careful not to equate our participants’ identities, subjective experiences, or inherent dignity and worth with our simple visual representations, I think we are closer to maintaining these important things when we make every one count by at least attempting to visualize them all as individuals who just happened to take part in our research.

See the full pilot study report where I used squares here.

The report where I used circles is not out yet, but will be soon!
