Transparent Wikipedia visualization

At the coffee machine the other day I was talking to John Winn about my forthcoming intern project with Linda Becker, and about the new word tree visualization I had found on Many Eyes. Fernanda Viégas and Martin Wattenberg gave a riveting PARC talk about Many Eyes, which I picked up from Andrew’s post on his Information Aesthetics blog. In it they mention the surprising (to them) number of text-based data sets (e.g. Shakespeare plays) that were uploaded to Many Eyes. But Many Eyes had only one simple text visualization, the tag cloud, so Fernanda and Martin locked themselves away for a week and brainstormed hundreds of text visualizations. Then their team implemented the best of them, the word tree. I do like the word tree; here’s how it is described on the Many Eyes site:

A word tree is a visual search tool for unstructured text, such as a book, article, speech or poem. It lets you pick a word or phrase and shows you all the different contexts in which it appears. The contexts are arranged in a tree-like branching structure to reveal recurrent themes and phrases.
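
To make that description concrete, here is a rough sketch of the data structure behind such a tree. It is not how Many Eyes implements it; the function name build_word_tree, the depth parameter and the nested-dictionary layout are all my own assumptions for illustration. For a chosen word, it collects the words that follow each occurrence and merges shared prefixes into branches, counting how often each branch occurs.

```python
import re


def build_word_tree(text, root, depth=3):
    """Return a nested dict of the words that follow each occurrence of `root`.

    Hypothetical helper for illustration only. Each node maps a word to
    {"count": occurrences of this branch, "children": the next level}.
    """
    # Crude tokenizer (an assumption): lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    tree = {}
    for i, word in enumerate(words):
        if word != root:
            continue
        node = tree
        # Walk the next `depth` words, sharing branches with earlier contexts.
        for follower in words[i + 1 : i + 1 + depth]:
            entry = node.setdefault(follower, {"count": 0, "children": {}})
            entry["count"] += 1
            node = entry["children"]
    return tree


sample = "I have a dream today. I have a plan. I have no fear."
tree = build_word_tree(sample, "have", depth=2)
```

In this sample "have" is followed by "a" twice and "no" once, so the "a" branch would be drawn larger and would itself split into "dream" and "plan". A real word tree also sizes and lays out the branches, which this sketch leaves out.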

Martin gives a number of examples of its use on Many Eyes, from political speeches, through literary texts, to a funny example of the text people use in lonely hearts ads:

Back to the subject of this post. John wasn’t familiar with Fernanda’s work visualizing the history of Wikipedia articles. So I explained History Flow, the 2003/2004 work that built visualizations like this one to show how different authors’ edits to a Wikipedia article build up over time. History Flow is written up in a brilliant CHI paper that shows just how much about Wikipedia behaviour can be gleaned from studying these diagrams.

But the diagram, the visualization, is separate from the page itself. One can’t look at the diagram and read the source article at the same time. It turns out that John had done his own visualization of Wikipedia pages. John reasoned that edits to a page can be thought of as a quality metric, i.e. a piece of text that survives multiple edits is likely to be of reasonable quality. Here’s that example of John’s idea again:

John describes the idea on his Wikipedia user page: the age of the text is reflected by its colour, so that text shown in the standard style is over two years old, whereas text that is only ten minutes old is rendered on a red background. I’m not sure this is the best way to do it, since the red colouring both draws attention to the new text and makes it harder to read, but there is something interesting about a data visualization that doesn’t obstruct one’s reading of the source article.
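
As a rough sketch of the age-to-colour idea, the function below maps the age of a piece of text to a background colour. Only the two endpoints come from John’s description (red at ten minutes, standard at two years); the logarithmic fade in between, the choice of white as the standard background and the name background_for_age are my own assumptions, not John’s implementation.

```python
import math

TEN_MINUTES = 10 * 60                  # seconds; newest text, full red
TWO_YEARS = 2 * 365 * 24 * 60 * 60     # seconds; oldest text, plain background


def background_for_age(age_seconds):
    """Return a CSS hex colour: red for ten-minute-old text, white for two years or more.

    Hypothetical helper: the log-scale interpolation is an assumption.
    """
    # Clamp so anything newer than ten minutes or older than two years
    # lands exactly on an endpoint.
    clamped = min(max(age_seconds, TEN_MINUTES), TWO_YEARS)
    # t runs from 0.0 (newest) to 1.0 (oldest) on a logarithmic scale.
    t = (math.log(clamped) - math.log(TEN_MINUTES)) / (
        math.log(TWO_YEARS) - math.log(TEN_MINUTES)
    )
    # Raise green and blue together so red fades towards white.
    fade = round(255 * t)
    return f"#ff{fade:02x}{fade:02x}"


newest = background_for_age(60)             # one minute old
oldest = background_for_age(3 * 365 * 86400)  # three years old
```

A logarithmic scale seems a natural fit because the interesting differences are between minutes, days and months rather than between one year and the next, but a linear fade or a handful of discrete bands would work just as well for a first experiment.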


