The IROP paper

If you keep track of recent developments in empirical software engineering, you may have already heard of the fantastic IROP study. I was too busy writing a paper to blog about it when Andreas Zeller presented it at PROMISE 2011, but here I go, in case you haven’t read it.

Basically, Zeller, Thomas Zimmermann, and Christian Bird did what I’m afraid some researchers in our field do on a regular basis: take some mining tools and some data, and then go nuts with them, abusing them in the most absurd ways imaginable. Luckily, Zeller, Zimmermann, and Bird did it on purpose, as a parody.

Here’s what they did: take Eclipse data on code and errors, and correlate the two to find good predictors of bugs. Sounds sensible. But they did the correlation at the ASCII character level. It turns out that, for Eclipse 3.0, the characters most highly correlated with errors are the letters ‘i’, ‘r’, ‘o’, and ‘p’. What is a sensible researcher to do when facing these findings? Well, take those letters out of the keyboard, of course! Problem solved:

The IROP keyboard
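The character-level “analysis” is easy to reproduce in spirit. Here is a minimal Python sketch; the per-file `(source_text, bug_count)` input format and the use of Pearson correlation are my assumptions for illustration, not the paper’s exact setup:

```python
import statistics
import string
from collections import Counter

def char_bug_correlation(files):
    """Rank lowercase letters by how strongly their per-file relative
    frequency correlates with that file's bug count.

    `files` is a list of (source_text, bug_count) pairs, a stand-in for
    per-file code/defect data such as the Eclipse dataset.
    """
    chars = string.ascii_lowercase
    freqs, bugs = [], []
    for text, bug_count in files:
        # Relative frequency of each letter in this file.
        counts = Counter(c for c in text.lower() if c in chars)
        total = sum(counts.values()) or 1
        freqs.append({c: counts[c] / total for c in chars})
        bugs.append(bug_count)

    def pearson(xs, ys):
        # Plain Pearson correlation; 0.0 when a letter never varies.
        mx, my = statistics.mean(xs), statistics.mean(ys)
        num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        den = (sum((x - mx) ** 2 for x in xs) *
               sum((y - my) ** 2 for y in ys)) ** 0.5
        return num / den if den else 0.0

    # Most "bug-prone" letters first.
    return sorted(chars,
                  key=lambda c: pearson([f[c] for f in freqs], bugs),
                  reverse=True)
```

Feed it toy data in which the buggier files happen to be ‘i’-heavy and ‘i’ duly tops the ranking, which is exactly the kind of spurious “finding” the paper lampoons.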

They then go over a mock half-baked validation study with three interns, who reported great success in adapting to a life without ‘i’, ‘r’, ‘o’, and ‘p’ on their keyboards. Trial feedback:

We can shun these set majuscules, and the text stays just as swell as antecedently. Let us just ban them!

Near the end, the authors go over everything that’s wrong with their approach (lack of theoretical grounding, dishonest use of statistics, and a long et cetera). It’s a fun read, and instructive. Research, in general, needs more parodies. If you like this one, some of my other favourites are:


About Jorge Aranda

I'm currently a Postdoctoral Fellow at the SEGAL and CHISEL labs in the Department of Computer Science of the University of Victoria.
This entry was posted in Academia, Software development.

6 Responses to The IROP paper

  1. Anonymous says:

    Here’s a video of an earlier presentation of the paper:

  2. Or not so silly… I was taught in a physical anthropology class long ago that far more errors in complex patterned movement are made by the RIGHT hand for right-handers (especially upward movements, since those are more awkward and unusual). Our left hand is actually much better at performing difficult movements such as precise typing. That choice of “i” for a loop index isn’t the only reason for buffer overflows, but, who knows, it may not have helped!

    • Jorge Aranda says:

      Russell, that sounds like an interesting explanation, until you look at the data. From the paper:

      Note that while we pretend to look at multiple releases of the Eclipse project, our “IROP” finding is based on the 3.0 release alone. In Eclipse 2.0, the “IROP” principle becomes the “Namp” principle; in Eclipse 2.1, it becomes the “Nogl” principle.

      So the “I” character was only correlated with errors in one version of Eclipse.

  3. First, re the additional data provided:
    Looking at the temporal sequence, from Namp to Nogl to IROP, this simply says (so far) that the more data accumulate, the clearer the pattern my professor told me about years ago becomes. Namp and Nogl are hardly counterevidence, since they also heavily implicate the right hand; so from a purely statistical point of view this actually provides more evidence for the notion that physical anthropology, psychology, and physiology are having an influence large enough to be observed statistically.

    I’m sure we could use more data, but we also shouldn’t decide out of thin air, because it pleases us, that everything we know and have known for a long time about significant differences in accuracy between hands (and possibly fingers) should somehow vanish the moment a keyboard becomes involved. We may associate keyboards and programmers with logic and uniformity, but logical thoughts don’t type; fingers and highly biologically constrained brains do. Seriously, who wouldn’t expect that pinkies might be less accurate than middle fingers, for example?

    No doubt the data are sparse, but (relative) absence of evidence isn’t evidence of absence (of some correlation), especially when a lack of correlation would seem to contradict previous knowledge.

    Second, it’s not necessarily nonsensical to suppose that very small cognitive difficulties could accumulate as patterns of errors. As I’ll get to, we already know this happens. Brains aren’t magic; even tiny distractions can have an effect, whether or not subjects are conscious of those distractions. Numerous psychological experiments have shown this, it’s simply not in doubt.

    More precisely stated, it is not unreasonable to presume, from the wide (and pervasive) range of supporting data available, that the mechanism now commonly called “decision fatigue” is causally involved (especially in those cases where programming errors concern the words involved rather than obvious typos). One would certainly expect the difference in difficulty positioning various fingers to have only a slight effect in distracting a programmer, but slight yet persistent effects are precisely what statistics are employed to discern, aren’t they?

    Decision fatigue could be said to be subsumed by distraction effects – I take it that introducing the new term is justified by the (sometimes remarkable) lag involved.

    There’s a good popular article about decision fatigue at:
    Do You Suffer From Decision Fatigue?
    Published: August 17, 2011

    In summary, it would seem very possible that the authors of this joke study merely stumbled backwards upon one more of a legion of known cases of decision fatigue/unconscious distraction. Sounds like we don’t have enough evidence to say one way or another, of course, but since there is a LOT of evidence from other experiments that would predict an effect/correlation: surely it’s a bad idea to rule it out of court by imposing purely a priori reasoning (that ignores at least one well-established effect) and not even consider seriously the notion of a possible correlation.

    Maybe it’s awfully optimistic to suppose that you could reduce software bugs by, say, 1 tenth of 1% just by using left hand letters more often in general, or using them more for functions where significant logical errors such as buffer overflows commonly occur; but would this be such a bad thing if it were possible? This very preliminary finding may also suggest that employing shorter variable names where possible may reduce error, etc, etc – although more than half-a-century of psychology more than suggests this, too. No doubt it is reasonable to start involving psychologists in the design (and maybe redesign) of computer languages.

    Safety engineering in other areas of life does in fact take into account human cognitive asymmetry. For example, no nation now trying to decide whether we should drive on the left or the right of the road (as in Britain and Japan) would choose “left”. The experts would scream bloody murder, in that case, because we now know without any reasonable doubt that the convention of driving on the left causes more accidents and deaths. A slight effect, but real and important – measured in lives. Are software errors, in this day and age, and the era to come, really all that much less significant?

    People who use logic a lot often tend to unconsciously assume that logic has more influence upon our lives and decisions than it does, and biology, less. Ironically, it is not the logical nature of our brains that causes this error, but the robust and organic, and messy associational nature of the brain, which we cannot truly free ourselves from.

    • Jorge Aranda says:

      “Namp” and “Nogl” are counterevidence to the argument that “I” is associated with errors because it’s often used as a counter variable. They’re also counterevidence to the argument that one should look for software errors at the character level.

      Regarding the argument that we make more errors with the right hand than with the left hand—you seem to imply that software errors are mostly typos. But for lots of software development nowadays, typos are easily detected by IDEs. Rather, software errors tend to be caused by semantic problems. The study of cognition can help to figure out how to reduce our tendency to make semantic mistakes. But I’m afraid all you’d gain from telling developers—and the world at large—that they should stop using their right hand to type would be a big dose of derision and laughter (and perhaps the suggestion that you deliver your proposal in writing, at length, using only your left hand to type it, of course).

  4. You seem to be concentrating on my (hopefully) amusing illustration involving “i”, but, apt or not, it’s not the thesis, which stems from evidence about handedness. The illustration was intended to amuse, not distract, and I’m sorry if it’s done the latter.

    The core hypothesis is that the right hand has more difficulty with complex patterned movements, of which typing is obviously one. Namp and Nogl both contain three out of four letters typed with the right hand, so they are surely further evidence (however slight), not counterevidence, of the relevance of my physical anthropology professor’s factoid about the relative clumsiness of right hands.

    I think you’ll find in what I’ve written above that I specifically anticipated any possible misinterpretation of my remarks as being about typos. It may seem strange that puzzling out semantic problems would be affected by the problems of moving fingers accurately, but I don’t think any psychologist would expect anything else today. Even without mentioning decision fatigue, the evidence is very clear now that human beings are just very bad at multitasking of any kind, when they can do it at all; and of course, programmers are almost always trying to multitask: both imagining and typing up a program at the same time. Who knows, to reduce errors perhaps we should be training programmers not to think while they type or type while they think, but to alternate these two tasks if we want the fewest possible errors. (But that speculation would be another discussion; I don’t wish it to be a further distraction.) The mention of decision fatigue isn’t about typos, of course, but about two kinds of decisions interfering: logical/semantic tasks vs. the (largely unconscious) decisions that make precise finger movements possible.

    “all you’d gain from telling developers—and the world at large—that they should stop using their right hand to type” seems particularly unfair, just a blatant straw man argument, since what I’ve said instead is that using fewer right-handed letters, particularly where logic errors (not typos) are more likely, might be a small (but extremely cost-effective) help in reducing software bugs.

    Right now we (since Ritchie, anyway) seem to be deliberately overusing the right hand for such things as counters, presumably on the assumption that movements of the right hand are easier and therefore more accurate. We have (if my professor was right) very good evidence that this is false.

    I won’t be surprised if the effect is small or nonexistent. I am shocked and distressed that the possibility can be tossed out a priori as if we were still medieval monks, or, God Forbid, string theorists. Evidence isn’t a good thing only when it conforms to our expectations – it’s truly useful when and only when it violates our expectations. Otherwise our prejudices would suffice.
