Thursday, 4 September 2008

Reorderable Tables

I've been interested in the idea of reorderable tables since I read about Jacques Bertin coming up with the idea in his 1967 book Semiologie graphique. The idea is that if you don't have a preferred order for your rows and columns, why not order them so the values in the cells make a rough diagonal across the grid? This makes the patterns much more straightforward to detect.

But I didn't know how to implement it in a spreadsheet until I googled on the word "reorderable", which turns out to be used for Bertin's tables, and almost nothing else. Some of the hits describe something called the "barycenter heuristic", which is really quite simple. It's based on the idea that each row or column has a "centre of gravity" that you can calculate, and then sort by. Take for example this table from Juice Analytics review, 5 Options for Embedding Charts in a Web Page

I added formulae for calculating the barycentre of the rows and columns, of the following form:

=SUMPRODUCT(COLUMN(D5:F5),D5:F5)/SUM(D5:F5)
=SUMPRODUCT(ROW(D5:D9),D5:D9)/SUM(D5:D9)


Then, because Excel sort up/down is temperamental without well-defined headers, and left/right is even more so, I recorded macros like this:

Application.Goto Reference:="SORT_COLS"
Selection.Sort Key1:=Range("SORT_COLS"), Order1:=xlDescending, Header:=xlNo,
Orientation:=xlLeftToRight

Application.Goto Reference:="SORT_ROWS"
Selection.Sort Key1:=Range("SORT_ROWS"),
Order1:=xlAscending, Header:=xlNo,
Orientation:=xlTopToBottom


These use the pre-created named ranges SORT_ROWS and SORT_COLS shown in red and blue here:

Now when I run the two macros, the table sorts itself into this:

I linked the macros to the shorcut keys Ctrl-m and Ctrl-M, and depending on the structure of the data, they either converge quickly and stop responding to key presses, or sort the reverse of whatever they sorted the last time. This last behaviour is handy because it makes the table cycle though the four possible orientations. You may prefer the table in another orientation than the one it converges to initially.

This is a simple example, and could have been done by hand, but you can do it with more complicated tables too. This example is from the Junk Charts article Noisy subways:



It's not a perfect diagonal, and wouldn't be even if it was the optimal solution, and I'm not convinced this technique does find the optimal solution. But it's not bad for something so simple.

Friday, 15 August 2008

Clock this

Jon Peltier has written about graphs based on hours of the day (and chosen the perfect title, so I have to make do with second best), with examples attempting to show the day and night cycles together intuitively.

I think the whole day-night split thing is artificial: there's no connection between the times that are twelve hours apart and happen to have the same number under our old mediæval twelve hour day, twelve hour night scheme. I'm surprised how much less Americans use the 24 hour clock, and how they call it "military time" when they do. In Britain we associate it more with the coming of the railways in the nineteenth century, and the need for standard timetables (until trains made national time zones necessary, individual towns had their own time!) I've made a polar area graph on a 24 hour scheme here:
The polar area graph is one of my favourite neglected specialist graph types, but it really only has a chance of being the graph type of choice if

a) you don't mind reading stacked areas instead of position along a common scale (see Cleveland's Hierarchy),
b) the data set is truly cyclic, not linear, and
c) there are two series to compare, one much smaller than the other (here we lack a second, smaller series that could be plotted closer to the centre, to take advantage of the square root relationship between the area of the slices and their radius)

Saturday, 12 July 2008

Sloppy step charts in the media

Major news organizations seem so slick, that it surprises me when they fall down on the simplest of graphical tasks. On 10 July MSNBC ran a story, How to value life? EPA devalues its estimate about the decline in the dollar value of one life used to weigh the cost of pollution prevention (if lives were even static in value, you'd expect them to rise with inflation).  The graph they showed was this one from the AP



I doubt that the value fell at a constant rate during certain periods, but that's what the chart seems to show, even though all graph drawing software has the means to represent the actual situation-- immediate drops on certain dates. Jon Peltier has written about this, in Line Chart vs. Step Chart.

The Wall Street Journal got it right with Clinton's Road to Second Place on 4 June:

Sunday, 15 June 2008

Excel area chart with colour invert if negative

My intention is for this blog to be commentary on graphs and data, rather than instructions on drawing graphs using any particular program. But I can make an exception, and here's a technique I've worked out for making Area charts in Excel that change colour below the zero line. This is something that comes as a standard option in bar charts. consider the following table:



This makes a bar chart okay

But not such a good area chart

We can create a table that splits the positive and negative values



but the result is disappointing

The areas simply go to zero at the next value, which is not what we want. This happens even when the Excel Time-scale X axis type is selected by going to Chart..Chart Options..Axes and selecting "Time-scale". However, Time-scale has one feature that lets us easily fix the problem. Unlike the ordinary category axis, it does not present values in the same order as they appear in the table, but in strict time order. So if we make a new set of rows below our first set



where the formulae are (left to right, assuming the original table header started at cell A1)

=IF($A2*$A3>0,NA(),$A2)
=IF($A2*$A3>0,NA(),$B2+($B3-$B2)*$A2/($A2-$A3))
=IF($A2*$A3>0,NA(),0)
=IF($A2*$A3>0,NA(),0)


then the two areas should meet the zero line at the same interpolated date!

Remember, this only works if you've selected the "Time-scale" option in Chart Options. If you're looking for more Excel help, see the Jon Peltier and Andy Pope links on the right.