Thursday, 4 September 2008

Reorderable Tables

I've been interested in the idea of reorderable tables since I read about Jacques Bertin coming up with the idea in his 1967 book Semiologie graphique. The idea is that if you don't have a preferred order for your rows and columns, why not order them so the values in the cells make a rough diagonal across the grid? This makes the patterns much more straightforward to detect.

But I didn't know how to implement it in a spreadsheet until I googled on the word "reorderable", which turns out to be used for Bertin's tables, and almost nothing else. Some of the hits describe something called the "barycenter heuristic", which is really quite simple. It's based on the idea that each row or column has a "centre of gravity" that you can calculate, and then sort by. Take for example this table from Juice Analytics review, 5 Options for Embedding Charts in a Web Page

I added formulae for calculating the barycentre of the rows and columns, of the following form:


Then, because Excel sort up/down is temperamental without well-defined headers, and left/right is even more so, I recorded macros like this:

Application.Goto Reference:="SORT_COLS"
Selection.Sort Key1:=Range("SORT_COLS"), Order1:=xlDescending, Header:=xlNo,

Application.Goto Reference:="SORT_ROWS"
Selection.Sort Key1:=Range("SORT_ROWS"),
Order1:=xlAscending, Header:=xlNo,

These use the pre-created named ranges SORT_ROWS and SORT_COLS shown in red and blue here:

Now when I run the two macros, the table sorts itself into this:

I linked the macros to the shorcut keys Ctrl-m and Ctrl-M, and depending on the structure of the data, they either converge quickly and stop responding to key presses, or sort the reverse of whatever they sorted the last time. This last behaviour is handy because it makes the table cycle though the four possible orientations. You may prefer the table in another orientation than the one it converges to initially.

This is a simple example, and could have been done by hand, but you can do it with more complicated tables too. This example is from the Junk Charts article Noisy subways:

It's not a perfect diagonal, and wouldn't be even if it was the optimal solution, and I'm not convinced this technique does find the optimal solution. But it's not bad for something so simple.


Hadley Wickham said...

Another term worth searching for is "seriation", which gives rises to numerous other techniques for reordering tables.

derek said...

Thanks for the tip. I see that seriation is a word in use in archaeology, where they're not really dealing with a re-orderable set of categories, but searching for an underlying order as yet unknown. Under those circumstances, the diagonal might not be appropriate, you can get U's and V's and even S's :-)

Bertin shows a table similar to the ones in the Wikipedia article, taking a complex set of properties of Corinthian columns and sorting them by style in time.

The Wikipedia article also suggests the word "ordination", which is the word ecologists use for it. Which takes us back to Bertin and S. S. Stevens.