Saturday, 3 January 2009

Sorting and grouping

Visual tables
Visual intelligence is arranging a set of facts on a page or screen so their implications are comprehensible instead of incomprehensible, and sorting and grouping can help a lot. A common example in published stats is the American states: there are fifty of them (plus the District of Columbia), and that's a lot to take in in one go, so some sorting would help.

Unfortunately the most common sorting for states is the notorious "Alabama first" system of simply listing them in alphabetical order. This might sound like a good idea if you haven't got a better one, but the question is, why haven't you got a better one? There's almost always a pattern you can use, and if there isn't you should seriously ask yourself why you're bothering with a graphic at all.

Consider this visual table (or reorderable matrix, as Bertin calls them) showing the state results of US presidential elections. (You can click on all these tiny graphics to see a larger and, I hope, more readable version, though I'm trusting to Blogger's HTML settings on this occasion.)

You can vaguely see the ebb and flow of party wins over the whole country, but, sorted alphabetically by state abbreviation, it could be impossible to come to any more detailed conclusions about this data set. But sort it by most recent results... and a pattern begins to emerge, showing which states have been most steady (at the left and right sides of their regional boxes) or most "swingy" (vacillating in the middle). Now group it (even if naively) into three regions and some things become even easier to see. Notice the switchover between north and south in the sixties.
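As a rough sketch of the two steps just described, here is one way the sorting and grouping could be done in Python. The state codes, regions and results below are made up purely for illustration (they are not real election data), and the function names are my own assumptions rather than anything from Bertin.

```python
from itertools import groupby

# Hypothetical rows: (state code, region, results from oldest to newest).
# "D" and "R" mark the winning party. None of these rows are real results.
rows = [
    ("AA", "North", "DDRD"),
    ("BB", "South", "RRDR"),
    ("CC", "North", "RDDD"),
    ("DD", "South", "DDRR"),
    ("EE", "West", "RDRD"),
    ("FF", "West", "RRRR"),
]


def recency_key(results):
    """Compare by the most recent election first, then work backwards."""
    return results[::-1]


def sort_and_group(rows):
    """Group rows by region, then order each group by most recent results."""
    ordered = sorted(rows, key=lambda r: (r[1], recency_key(r[2])))
    return {
        region: [state for state, _, _ in group]
        for region, group in groupby(ordered, key=lambda r: r[1])
    }


grouped = sort_and_group(rows)
for region, states in grouped.items():
    print(region, states)
```

Within each region the steadily "D" states land at one end and the steadily "R" states at the other, which leaves the swingy ones in the middle.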

Pie Charts
Jorge Camoes has recently defended the pie chart, which has a bad reputation for looking like this: He points out that with a bit of sorting and grouping, it can look more like this:
Actually, the first example was already sorted and grouped, but you wouldn't know it from the chaos of colours, which tells us that grouping is no good unless it's properly depicted, using colour fields, dividers, and group labels. By sorting and grouping, and showing the grouping, you can turn a large number of confusing values into something people can make sense of. I can't say it exactly makes me like pie charts, but it makes them less horrible.

Thursday, 27 November 2008

British design stamps



What I love about these is that the design of the "design stamps" is itself so beautiful, in the most understated way: clean white background and black and grey sans serif text and icon, nothing else visible except the subject of the stamps. I like to use the trick of making headline text black and supplementary or optional text grey myself, especially where for one reason or another only one size of type is wanted.

The other designs include Issigonis' Mini, Quant's mini, Concorde, the Spitfire, and Penguin books. Notwithstanding the presence of Beck's Underground map in the set, I think a stamp each could have been devoted to the Underground roundel and the font it uses, Edward Johnston's Railway type.

Wednesday, 19 November 2008

Wednesday, 22 October 2008

Using colour for preattentive processing in stacked bar graphs

Earlier this month Robert Kosara at EagerEyes.org produced a visualisation of the difference, in historical US presidential elections, between the popular vote and the Electoral College vote, cast by the delegates that the state voters actually elect to vote on their behalf. The questions this visualisation might answer include:

Q1 How big were the popular and EC votes?
Q2 How big was the difference?
Q3 How often and when was the popular vote greater than the EC vote?
Q4 Was the EC vote over 50% (a "majority", which is what it takes to win outright in the Electoral College; a "plurality", i.e. more than anyone else, is not enough there)?
Q5 Was the popular vote over 50% (sometimes called a "mandate")?
Q6 Were they on opposite sides of the 50% line?
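To make the questions concrete, here is a minimal sketch that treats Q2 to Q6 as simple calculations on the two percentages (Q1 is just the two numbers themselves). The `Election` class, the `answers` function and the three sample rows are hypothetical, invented for illustration only; the percentages are not real election figures.

```python
from dataclasses import dataclass


@dataclass
class Election:
    label: str
    popular: float  # winner's share of the popular vote, in percent
    ec: float       # winner's share of the Electoral College vote, in percent


def answers(e):
    """Answer Q2 to Q6 for one election."""
    return {
        "Q2_difference": round(e.ec - e.popular, 1),
        "Q3_popular_greater": e.popular > e.ec,
        "Q4_ec_majority": e.ec > 50,
        "Q5_popular_majority": e.popular > 50,
        "Q6_opposite_sides": (e.ec > 50) != (e.popular > 50),
    }


# Hypothetical values, not real results.
sample = [
    Election("A", popular=55.0, ec=80.0),
    Election("B", popular=48.0, ec=52.0),
    Election("C", popular=51.0, ec=49.5),
]

for e in sample:
    print(e.label, answers(e))
```

Q3 and Q6 come out as yes/no flags, which is what makes them candidates for a colour change rather than a position the reader has to compare by eye.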

Robert used a stacked bar graph to show the answers to some of these questions. I'll use my own version of his graph for consistency, but the colours are the original ones:

I found Q3 hard to compare across the years using Robert's graph, because detecting the difference meant seeing the change in position between the green and blue areas, and I had to do it consciously, instead of relying on preattentive processing to bring the few instances to my attention.

Kelly O'Day suggested dot plots, with or without lines, but I found the differences in Q2 hard to compare across the years, and still the switch rounds in Q3 hard to detect. It seemed to me that the blue and green bars were interfering with each other, and strictly speaking were redundant anyway, so in comments I suggested removing them to make a "floating bar" graph.

(In my original comment I changed the colours from blue and green to purple and teal, in an attempt to bring the hues round the colour circle toward the classic red-blue combination, without actually using red and blue, which for obvious reasons would be confusing in this political context. But I've decided the difference in hue discrimination wasn't dramatic enough to be worth the extra change.)

Kelly liked it but said the scale didn't easily show the difference, which is true, but I was still trying to show the numbers in Q1 as well as the difference in Q2. That purpose hadn't changed from the original bar graph, and I wouldn't want to just have a graph of the differences aligned along a common scale, because that would lose the Q1 information. I had only removed what I thought was duplicated information from the graph.

As a compromise, I present a recoloured stacked bar graph.

Now it's not floating any more, and there's no danger of interpreting it as a graph for differences only, but the eye is still drawn to the difference bars, and to the (three) instances where the popular vote is less than the EC vote (Q3), and to the (seventeen) instances where the EC vote is a majority, but the popular vote isn't (Q6).
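For anyone who wants to try the same idea, here is a minimal sketch of how the segments of one such bar might be computed: a muted base up to the smaller of the two values, with the difference stacked on top in a more saturated colour whose hue depends on which value is larger. The colour names and the function are my own placeholders, not the colours used in the graph above, and the sample numbers are made up.

```python
# Hypothetical colour choices: one quiet colour for the shared base,
# two saturated ones so the difference segment stands out.
MUTED = "lightgrey"
EC_HIGHER = "darkorange"
POPULAR_HIGHER = "purple"


def stacked_segments(popular, ec):
    """Return (bottom, height, colour) segments for one stacked bar."""
    base = min(popular, ec)
    diff = abs(ec - popular)
    accent = EC_HIGHER if ec >= popular else POPULAR_HIGHER
    return [(0.0, base, MUTED), (base, diff, accent)]


# Made-up percentages for two bars.
print(stacked_segments(48.0, 52.0))
print(stacked_segments(51.0, 49.5))
```

Each tuple can be handed to whatever plotting tool is in use, for example matplotlib's `bar` with its `bottom` argument. Because the bar still starts at zero, the Q1 values stay readable while the accent colour carries Q2 and Q3.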

I've used this technique of more saturated colours to draw the eye, and lighter or less saturated ones to avoid distractions without removing information, in my blog post of a few months ago, "Always show the distribution if you can". There, I wanted to emphasise Pentagon-reported military fatalities attributed to terrorist attack (dark green) and hostile action (red), without concealing all the rest of the data. It's all there, nothing hidden, but it isn't overwhelming the eye.