Wednesday 22 October 2008

Using colour for preattentive processing in stacked bar graphs

Earlier this month Robert Kosara at EagerEyes.org produced a visualisation of the difference, in historical US presidential elections, between the popular vote and the Electoral College (EC) vote, which is cast by the electors whom each state's voters actually choose to vote on their behalf. Questions this visualisation might answer include:

Q1 How big were the popular and EC votes?
Q2 How big was the difference?
Q3 How often and when was the popular vote greater than the EC vote?
Q4 Was the EC vote over 50% (a "majority"; only a "plurality", i.e. more than anyone else, is necessary to actually win)?
Q5 Was the popular vote over 50% (sometimes called a "mandate")?
Q6 Were they on opposite sides of the 50% line?
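To make the six questions concrete, here is a minimal sketch that answers them for a single election. It assumes each election is reduced to the winner's share of the popular vote and of the EC vote, both in percent; the `Election` class, `answer_questions` and the sample numbers are hypothetical, not Robert's data or code.

```python
from dataclasses import dataclass


@dataclass
class Election:
    # Hypothetical record: the winner's shares, both in percent.
    year: int
    popular_pct: float
    ec_pct: float


def answer_questions(e: Election) -> dict:
    """Answer Q1-Q6 for one election, using the 50% line for Q4-Q6."""
    popular_majority = e.popular_pct > 50.0
    ec_majority = e.ec_pct > 50.0
    return {
        "popular": e.popular_pct,                               # Q1
        "ec": e.ec_pct,                                         # Q1
        "difference": abs(e.ec_pct - e.popular_pct),            # Q2
        "popular_exceeds_ec": e.popular_pct > e.ec_pct,         # Q3
        "ec_majority": ec_majority,                             # Q4
        "popular_majority": popular_majority,                   # Q5
        "opposite_sides_of_50": ec_majority != popular_majority,  # Q6
    }


# Made-up example: 48% of the popular vote but 60% of the EC vote.
print(answer_questions(Election(1900, 48.0, 60.0)))
```

Seen this way, Q3 and Q6 are yes/no flags that hold for only a handful of years, which is exactly the kind of rare case preattentive processing should pick out for the reader.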

Robert used a stacked bar graph to answer some of these questions. For consistency I'll use my own version of his graph, but with the original colours:

Using Robert's graph, I found Q3 hard to answer across the years: detecting the difference meant noticing the green and blue areas swap positions, and I had to do that consciously rather than relying on preattentive processing to bring the few instances to my attention.

Kelly O'Day suggested dot plots, with or without lines, but I found the differences in Q2 hard to compare across the years, and the switches in Q3 were still hard to detect. It seemed to me that the blue and green bars were interfering with each other, and strictly speaking were redundant anyway, so in the comments I suggested removing them to make a "floating bar" graph.

(In my original comment I changed the colours from blue and green to purple and teal, in an attempt to bring the hues round the colour circle toward the classic red-blue combination without actually using red and blue, which for obvious reasons would be confusing in this political context. But I've decided the gain in hue discrimination wasn't dramatic enough to be worth the extra change.)

Kelly liked it but said the scale didn't make the difference easy to read. That's true, but I was still trying to show the numbers for Q1 as well as the difference for Q2. That purpose hadn't changed from the original bar graph, and I wouldn't want a graph of the differences alone, aligned along a common scale, because it would lose the Q1 information. I had removed only what I thought was duplicated information.

As a compromise, I present a recoloured stacked bar graph.

Now it's no longer floating, and there's no danger of reading it as a graph of differences only, but the eye is still drawn to the difference bars: to the three instances where the popular vote is greater than the EC vote (Q3), and to the seventeen instances where the EC vote is a majority but the popular vote isn't (Q6).
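The recolouring can be sketched as splitting each bar into two stacked segments: a pale base up to the smaller of the two votes, and a saturated difference segment on top whose hue records which vote was larger. This is a minimal sketch, assuming winner's percentages as input; the hex colours and names such as `stacked_segments` are hypothetical, not the palette used in the graph.

```python
from dataclasses import dataclass

# Hypothetical palette: a pale grey for the shared base, saturated hues for the difference.
PALE_BASE = "#d9d9d9"
SATURATED_EC = "#1f4e9c"       # EC vote larger than the popular vote (the usual case)
SATURATED_POPULAR = "#1a8a3c"  # popular vote larger than the EC vote (the rare Q3 case)


@dataclass
class Segment:
    bottom: float   # where the segment starts, in percent
    height: float   # how tall it is, in percent
    colour: str


def stacked_segments(popular_pct: float, ec_pct: float) -> list[Segment]:
    """Split one election's bar into a pale base plus a saturated difference."""
    base = min(popular_pct, ec_pct)        # shared part: keeps the Q1 information
    diff = abs(ec_pct - popular_pct)       # the part the eye should go to (Q2)
    diff_colour = SATURATED_POPULAR if popular_pct > ec_pct else SATURATED_EC
    return [Segment(0.0, base, PALE_BASE), Segment(base, diff, diff_colour)]


for seg in stacked_segments(48.0, 60.0):
    print(seg)
```

Each segment maps directly onto one call of matplotlib's `Axes.bar` with its `bottom` and `color` parameters. The top of the stack is still the larger vote, so nothing is lost compared with the floating bars; only the emphasis changes.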

I used the same technique (more saturated colours to draw the eye, lighter or less saturated ones to avoid distraction without removing information) in my blog post of a few months ago, "Always show the distribution if you can". There I wanted to emphasise Pentagon-reported military fatalities attributed to terrorist attack (dark green) and hostile action (red), without concealing the rest of the data. It's all there, nothing hidden, but it doesn't overwhelm the eye.
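One way to get the "lighter or less saturated" background colours from the same hues as the highlighted ones is to work in hue/lightness/saturation space with Python's standard `colorsys` module. This is a minimal sketch; `soften` and its default scale factors are hypothetical choices, not the colours used in either post.

```python
import colorsys


def hex_to_rgb(hex_colour: str) -> tuple:
    """'#rrggbb' -> (r, g, b) floats in 0..1."""
    h = hex_colour.lstrip("#")
    return tuple(int(h[i:i + 2], 16) / 255.0 for i in (0, 2, 4))


def rgb_to_hex(rgb) -> str:
    """(r, g, b) floats in 0..1 -> '#rrggbb'."""
    return "#" + "".join(f"{round(c * 255):02x}" for c in rgb)


def soften(hex_colour: str, saturation_scale: float = 0.4, lightness_boost: float = 0.3) -> str:
    """Return a lighter, less saturated version of a colour with the same hue."""
    h, l, s = colorsys.rgb_to_hls(*hex_to_rgb(hex_colour))
    l = l + (1.0 - l) * lightness_boost  # move part of the way towards white
    s = s * saturation_scale             # drain some of the saturation
    return rgb_to_hex(colorsys.hls_to_rgb(h, l, s))


# Keep the categories you want noticed at full strength; soften the rest.
print(soften("#ff0000"))
```

Keeping the hue means a softened category stays recognisably related to its saturated version in the legend, while dropping back from the reader's attention.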

Thursday 9 October 2008

Another Nobel Prize for visual intelligence!

When Al Gore was awarded the Nobel Peace Prize last year, Robert Kosara at Eager Eyes called it "A Nobel Prize for Charts". Now the 2008 Nobel Prize in Chemistry has been awarded to Osamu Shimomura, Martin Chalfie and Roger Tsien for green fluorescent protein (GFP). I think that counts as another prize for the graphical display of information:

Wednesday 8 October 2008

Using spots and rings in tables

jenmoocat asks in the comments about the "spot matrix" table I used to display scores from one to ten for X options in Y categories. My technique has always been to use bubble charts, in a similar way to this heatmap tutorial at More Information Per Pixel. Chandoo at Pointy Haired Dilbert describes a different way, using a table and the Wingdings 2 font.

This is a great alternative, and would work really well in a dashboard. Five distinct score levels are about the most that people can easily distinguish anyway. Beyond that, you're relying on an approximate response to levels of darkness to guide the eye, and it becomes less of a table and more of a map.
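Since five levels is about the useful limit, scores from one to ten have to be binned before they become spots. Here is a minimal sketch that uses Unicode circle characters instead of Chandoo's Wingdings 2 font (a swapped-in technique, so no special font is needed); `score_to_level` and the equal-width binning are hypothetical choices.

```python
# Five "Harvey ball" glyphs from empty to full (Unicode, not Wingdings 2).
SPOTS = ["○", "◔", "◑", "◕", "●"]


def score_to_level(score: int, lo: int = 1, hi: int = 10, levels: int = 5) -> int:
    """Map a score in lo..hi onto 0..levels-1 using equal-width bins."""
    if not lo <= score <= hi:
        raise ValueError(f"score {score} outside {lo}..{hi}")
    fraction = (score - lo) / (hi - lo)           # 0.0 for lo, 1.0 for hi
    return min(int(fraction * levels), levels - 1)  # top score goes in the last bin


def score_to_spot(score: int) -> str:
    return SPOTS[score_to_level(score)]


print("".join(score_to_spot(s) for s in range(1, 11)))
```

With ten scores and five levels, each spot covers two adjacent scores (1-2, 3-4, and so on), which is the information you give up in exchange for a pattern that can be read at a glance.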

Those curious about the history of such tables should have a look at page 174 of Edward Tufte's 1983 classic The Visual Display of Quantitative Information, where Tufte shows and praises a Consumer Reports small multiple of tables of cars and their repair trouble spots by make and year. Tony Rose of DSA Insights points out that this is a sophisticated version of Harvey Balls, made less qualitative and more quantitative.

Edited to add: Thinking about the design of Chandoo's table some more, if you want to try his technique out in your own tables, bring the spots closer together, so that they appear to be words in a sentence. They'll be easier to read that way. And as there are only five columns in the example, if you bring them still closer, they can be like letters in a word, and "read" at a glance. Narrowing the table may require abbreviating the titles or turning them on their side, but I think it's worth it.

The design philosophy to follow is similar to Tufte's "sparkline" philosophy: a tiny picture is like a word, and should be presented at a similar typographical density. Stringing the spots out is liable to make the patterns harder to see.
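The tightening advice can be tried in a plain-text table by writing each row's spots with no separators, so a row reads like a word, and abbreviating the column titles to a single character to match. This is a minimal sketch; `render_compact`, the one-letter abbreviations and the sample rows are hypothetical, and whether the glyphs line up depends on the font giving them equal widths.

```python
SPOTS = "○◔◑◕●"  # five levels, empty to full


def spot(score: int) -> str:
    # Equal-width bins: scores 1-2, 3-4, 5-6, 7-8, 9-10 map to the five glyphs.
    return SPOTS[min((score - 1) * 5 // 9, 4)]


def render_compact(rows: dict[str, list[int]], columns: list[str]) -> str:
    """Render a score table with spots packed together like letters in a word."""
    width = max(len(name) for name in rows)
    # Abbreviate each column title to one character so it sits over one glyph.
    header = " " * (width + 1) + "".join(c[0] for c in columns)
    lines = [header]
    for name, scores in rows.items():
        lines.append(f"{name:<{width}} " + "".join(spot(s) for s in scores))
    return "\n".join(lines)


print(render_compact({"Alpha": [1, 5, 10], "Be": [3, 7, 9]}, ["Price", "Speed", "Quality"]))
```

One-letter titles only work when they are unambiguous; otherwise turning the titles on their side, as suggested above, keeps the columns narrow without losing the labels.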

If you want to avoid privileging one orientation, keep the rows no further apart than the spaces between columns. If you have only one row, consider abandoning spots altogether in favour of a tiny bar-graph sparkline. Gauging the relative value of circular spots is a problem, because it asks the reader to judge areas, which rank lower than lengths in Cleveland's hierarchy. Their symmetry is an advantage only when the table is two-way, where bars running along the rows would make the columns harder to read up and down.
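For the single-row case, a text sparkline built from the Unicode block elements encodes each value as a bar height, so the reader judges lengths rather than areas. This is a minimal sketch; `sparkline` and its fixed-scale parameters are hypothetical. Passing the same `lo` and `hi` to every row keeps the bars comparable across rows.

```python
from typing import Optional, Sequence

BARS = "▁▂▃▄▅▆▇█"  # eight bar heights, U+2581..U+2588


def sparkline(values: Sequence[float], lo: Optional[float] = None, hi: Optional[float] = None) -> str:
    """Render values as a one-line bar sparkline; bar height encodes the value."""
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    span = hi - lo
    out = []
    for v in values:
        # Scale to 0..7; a flat series gets the lowest bar throughout.
        idx = 0 if span == 0 else round((v - lo) / span * (len(BARS) - 1))
        idx = max(0, min(len(BARS) - 1, idx))  # clamp values outside lo..hi
        out.append(BARS[idx])
    return "".join(out)


# Scores on a fixed 1-10 scale, so different rows can be compared.
print(sparkline([1, 10, 5, 7], lo=1, hi=10))
```

Eight heights is more than the five spot levels, but because the reader compares lengths on a common baseline, the extra resolution is still usable.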