Not crazy about Wordle

You’re probably familiar with Wordle. It’s a neat application that picks up the most common words in a text and arranges them in a pretty word cloud. As a toy, it’s quite fun.

But there’s an idea seeping in among the blogs I read, suggesting that Wordle clouds are actually useful as a communication tool. For instance, here is a flyer produced for a talk by Deepak Singh. He liked it a lot:

They used my delicious feed to build a tag cloud using Wordle. The results are wonderful and give you a better insight into things that I get interested about than I could have told you in 5 minutes myself.

Michael Nielsen liked it too, wondering “if only we could replace all speaker bios this way…”

I, however, thought it showed much of what is wrong with our ultra-caffeinated Web 2.0. Looking at the flyer, I can guess that Deepak is interested in a number of topics: the semantic web, bioinformatics, programming, science, etc. And yet I can’t guess, at all, what his opinions are on any of these topics, nor how he relates them. What does he think of the state of the art of bioinformatics? Is he particularly interested in applying cloud computing to healthcare systems? I have no idea. It’s as if the word cloud processed Deepak’s ideas to take out all of their nutritional value and gave me just the pure sugar: exciting, yummy, and devoid of nourishment. Oddly, the title of his talk (“Science Big. Science Connected”) fits the flyer; I picture myself reading it as a caveman or as an ape, jumping all over at the sight of big, pretty, meaningless words, basking in the glory of scientific progress.

More recently, some students in my department have been producing Wordle clouds of their Master’s theses. Here’s Christian Muise’s, and here’s Neil Ernst’s, for instance. The result is just as problematic: I guess Christian studies something to do with implicants and Neil built something called “Jambalaya”, but I can’t go any further.

To compare, here are two summaries of my Master’s work. The first is a Wordle cloud of my 2005 thesis; the second is the abstract of the paper that goes along with it:

[Image: Wordle cloud of my 2005 Master’s thesis]

Anchoring and adjustment is a form of cognitive bias that affects judgments under uncertainty. If given an initial answer, the respondent seems to use this as an ‘anchor’, adjusting it to reach a more plausible answer, even if the anchor is obviously incorrect. The adjustment is frequently insufficient and so the final answer is biased. In this paper, we report a study to investigate the effects of this phenomenon on software estimation processes. The results show that anchoring and adjustment does occur in software estimation, and can significantly change the resulting estimates, no matter what estimation technique is used. The results also suggest that, considering the magnitude of this bias, software estimators tend to be too confident of their own estimations.

And a second comparison (which is a blatant plug for my recent work with Gina Venolia, to be presented at ICSE this May). Here is the Wordle cloud of “The Secret Life of Bugs”, and the abstract of the corresponding paper:

[Image: Wordle cloud of “The Secret Life of Bugs”]

Every bug has a story behind it. The people that discover and resolve it need to coordinate, to get information from documents, tools, or other people, and to navigate through issues of accountability, ownership, and organizational structure. This paper reports on a field study of coordination activities around bug fixing that used a combination of case study research and a survey of software professionals. Results show that the histories of even simple bugs are strongly dependent on social, organizational, and technical knowledge that cannot be solely extracted through automation of electronic repositories, and that such automation provides incomplete and often erroneous accounts of coordination. The paper uses rich bug histories and survey results to identify common bug fixing coordination patterns and to provide implications for tool designers and researchers of coordination in software development.

In both cases, the second version actually gives you some information beyond “estimation stuff” or “bugs stuff”; it’s not the full story but perhaps it’s enough to tell you whether you should read it or not.

There are some cases where playing around with Wordle may be a bit illuminating. I once heard that the most common words in Bernal Díaz del Castillo’s The Truthful History of the Conquest of New Spain were God, blood, and gold. This is probably false, though it is still a pretty good symbolic summary of the Conquest. But in general we shouldn’t fool ourselves into thinking that disconnected pretty words are a good substitute for actual sentences and ideas.

About Jorge Aranda

I'm currently a Postdoctoral Fellow at the SEGAL and CHISEL labs in the Department of Computer Science of the University of Victoria.
This entry was posted in Academia, Information visualization.

12 Responses to Not crazy about Wordle

  1. Neil says:

    Yeah, but the pretty colors, Jorge! Sounds like somebody has a case of the Mondays!

  2. Jorge says:

    I do have to admit the colors are pretty 🙂

  3. plagal says:

    It’s actually a bit funny that the Wordle for your 2005 thesis essentially boils down to

    ESTIMATION SOFTWARE

    It’s big, uppercase and makes the eye ignore everything else.

  4. Jorge says:

    Yes. And other prominent terms are “Estimate”, “estimates”, and “estimators”, which suggest the need for stemming the terms before forming the cloud.
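As a rough illustration of the stemming idea raised in this comment, the sketch below merges word variants before counting them. It is not how Wordle works, and `crude_stem`, `stemmed_counts`, and the suffix list are hypothetical names and choices made up for this example; a real stemmer (such as the Porter algorithm) handles far more cases.

```python
import re
from collections import Counter

# Hypothetical, deliberately crude suffix list (longest first).
# A real stemmer such as Porter's is much more careful than this.
SUFFIXES = ("ations", "ation", "ators", "ator", "ates", "ate", "ing", "es", "s")


def crude_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stemmed_counts(text):
    """Count words after lowercasing and crude stemming."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(crude_stem(w) for w in words)


sample = "Estimate estimates estimators estimation software"
counts = stemmed_counts(sample)
# All four "estim..." variants now share one entry in the counter.
```

With counts merged like this, a cloud would show one large term for the whole word family instead of several medium-sized near-duplicates.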

  5. You miss the point, Jorge. Visualization of any type has a specific use, and in this case it is /not/ to explain the research done, only to provide a glimpse into it.

    Use Case 1:
    My parents know nothing of boolean logic, let alone enough to understand my abstract, but the wordle gives them something to look at for a fair bit of time, see the buzz words of what I’m doing, etc

    Use Case 2:
    Self-analysis of your own text. Yes, I could glean the same information from a list of words sorted by frequency with the stat beside it, but when you generate a wordle you get the pop-factor (i.e. ‘ESTIMATION SOFTWARE’). In my case, an early wordle showed ‘c2d’ (the competition’s solver) was larger than ‘c2o’ (my solver); this led me to go back and rework how I described things to toot my own horn rather than others’.

    Use Case 3:
    It’s all about the ease and enjoyment of getting quick concept information, not explanation. Example:
    http://tinyurl.com/8wcz6c

    Moral: Don’t be hatin’ on the wordle 😉

  6. Jorge says:

    I actually liked seeing your (and Neil’s) word clouds. As I said, they’re fun. And I also enjoyed generating my own; there’s something about feeding your own words into Wordle and seeing what comes up that is very appealing. My point is that it only gives us the buzzwords; it doesn’t provide any substance about the underlying ideas.

    The example in your third use case is really interesting from this perspective: since the word clouds are almost empty of meaning, we can manipulate them to our advantage very easily. Look, I’ve been for Obama since I was born, but the Obama vs. Bush clouds are an inappropriate comparison: they used the “all horizontal” setting with Obama and the “half and half” with Bush, they used a smoother font and better colors for Obama’s speech, and they went with a particularly jarring cloud for Bush, with the word “freedom” completely unbalancing everything (Wordle does this sometimes, you just have to regenerate the cloud). Then you have people reading unwarranted things into the cloud (see for instance the 12th comment comparing which word appears in the centre). Those clouds are worth a chuckle, but we shouldn’t read anything in them beyond that.

  7. Deepak says:

    Didn’t realize Tim’s wordcloud on my delicious history would get this far. It depends on the goals and the audience. The reason it worked there was that the goal was to let people know about my interests, rather than what I was going to say about those interests. Given that there is a blog, anyone who wants to find anything out could easily go and find out what the views are.

    If you want to jump into things in depth, of course that’s not going to work, but it works to provide a visual snapshot. Does anything jump out?

  8. Jorge says:

    “The reason it worked there was that the goal was to let people know about my interests, rather than what I was going to say about those interests.”

    Thanks Deepak. Sure, if this is all you want, then a word cloud works quite well.

    We can imagine a continuum between a pure laundry list of topics and an in-depth exploration of those topics. What I was complaining about is how close these word clouds are to the first end of the continuum, compared to how far towards the second end we perceive them to be.

  9. Lorin says:

    Forget Wordle, it’s that ICSE paper that sounds really interesting to me!

  10. Jorge says:

    Lorin, you’re making me blush 🙂

    Will you go to ICSE too?

  11. Lorin says:

    I would love to go to ICSE. However, my wife’s expected due date for our second child is the first week of May, so no traveling for me. 🙂

    Maybe ESEM?

  12. Jorge says:

    That is fantastic news, Lorin! Congratulations! We’ll eventually catch up in person, I’m sure.
