Catenary

Entries categorized as ‘External cognition’

Offloading and evolution

November 15, 2006 · 2 Comments

I just finished reading Carl Zimmer’s very fine book Evolution: The Triumph of an Idea, which is a great introduction to the topic and covers a lot of ground, from Darwin’s life aboard the Beagle to host-bacteria arms races (stop taking antibiotics, people!) to human-caused mass extinctions to whales with legs to the invention of language.

After talking about cognitive offloading last week, and about how we’re as dumb as cavemen, I found it curious to stumble upon this text in the book:

The artifacts that [early] humans left behind speak to a profound shift in the way humans saw themselves and the world. And that shift may have given them a competitive edge. “Something happened about 50,000 years ago,” explains Klein. “It happened in Africa. These people who already looked quite modern became behaviorally modern. They developed new kinds of artifacts, new ways of hunting and gathering, that allowed them to support much larger populations.”

Researchers can only speculate for now about what brought the shift about. Some have proposed that the creative revolution was purely a matter of culture. Anatomically modern humans in Africa experienced some change – perhaps a population boom – that forced their society to cross some kind of threshold. Under these new conditions, people invented modern tools and art. “Cro-Magnons were perfectly capable of going to the moon neurologically, but they didn’t because they weren’t in a social context where the conditions were right,” says White. “There was no challenge to provoke that kind of invention.”

Whatever the reason for the appearance of modern behaviour, slightly after humans started tinkering with artifacts and tools, they spread out of Africa and replaced (or perhaps wiped out through competition or diseases) Neanderthals and Homo erectus around the world. “In an evolutionary flash, every major continent except for Antarctica was home to Homo sapiens. What had once been a minor subspecies of chimp, an exile from the forests, had taken over the world.”

(Note: If you’re thinking of buying the book, be aware that the 2006 paperback edition does not come with those lavish illustrations the Amazon reviews mention. It’s still worth it, but I can only imagine how much better the original -and sold out- 2001 edition is, since this is a subject that really benefits from images.)

Categories: Books · External cognition · XCog

Fun with representations VI – Sharing the load

November 7, 2006 · 3 Comments

In Cognition in the Wild, a book I’ll be coming back to later and often in this blog, Ed Hutchins expands on an observation by Herbert Simon, who said that the complicated movements and trajectories of an ant on the beach tell us more about the beach than about the ant.

Beach ant

Simon was emphasizing the importance of context in cognition, but Hutchins goes a step further: “Let us assume that we arrive just after a storm, when the beach is a tabula rasa for the ants. Generations of ants comb the beach. They leave behind them short-lived chemical trails, and where they go they inadvertently move grains of sand as they pass. Over months, paths to likely food sources develop as they are visited again and again by ants following first the short-lived chemical trails of their fellows and later the longer-lived roads produced by a history of heavy ant traffic. After months of watching, we decide to follow a particular ant on an outing. We may be impressed by how cleverly it visits every high-likelihood food location. This ant seems to work so much more efficiently than did its ancestors of weeks ago. Is this a smart ant? Is it perhaps smarter than its ancestors? No, it is just the same dumb sort of ant, reacting to its environment in the same ways its ancestors did. But the environment is not the same. It is a cultural environment. Generations of ants have left their marks on the beach, and now a dumb ant has been made to appear smart through its simple interaction with the residua of the history of its ancestor’s actions.”

Of course, the whole point of the story is what it implies about us: I am as dumb as a barbaric caveman, but I’ve got a better environment. We accumulate knowledge, we embed it in our surroundings, and the next generation will take for granted things that are unbelievably hard for ours to figure out. If humankind manages to stick around for a few thousand years more, our current intellectual and technological achievements will look as rudimentary to our descendants as the Bronze Age looks to us.

This, in a nutshell, is the reason why good representations of information, which I have been discussing over and over, are so powerful. They synthesize the data we gather about the world into a format that is easily accessible for other people -and for ourselves- any time we need them. An encyclopedia represents centuries of inquiry and discovery; a name tag helps us remember the name of that stranger we just met. Both of them and all other representations in between help make us smart by holding knowledge in our stead.

This quality of representations, which may be referred to as cognitive offloading (since it saves us cognitive effort), comes in two flavours. I’ll call them memory offloading and rules offloading here.

Memory offloading: This is the easy type to spot – anything that holds data for us counts. Books are an obvious example – instead of having their authors talk to us, we conveniently get access to their thoughts whenever and wherever we want. Phonebooks, websites, and dictionaries hold massive amounts of information that we’ll never need in full, but which we can query on demand. The keys on the computer keyboard in front of you have labels to indicate the characters to which they correspond. We take photographs because they hold much more detail, for a much longer time, than our natural memory can. And so on.

People who can perform at an expert level without the aid of these representations are pretty impressive. My favourite example is blind chess players. The point of blind chess is never looking at the board, holding the “image” of the match in your mind instead. It’s a very serious handicap if only one of the opponents is playing blind – most chess players can’t hold a dozen moves before their mental image crumbles. Still, my chess instructor fifteen years ago would consistently beat the crap out of me even with this advantage in my favour. How he did it, I have no idea. And grandmasters can win some half-dozen simultaneous matches while playing blind!

Anyway, my point is that most people depend on memory-offloading representations to do most of their tasks, and we find it extraordinary when someone seems not to. This dependency extends to the other type of offloading as well –although people have a harder time seeing it.

Rules offloading: Remember the nine numbers game I talked about a while back? The relationship between that game and tic-tac-toe is that of a representation with poor rules offloading versus one with rich rules offloading. When playing the nine numbers game, you need to keep track of several rules:
• Addition rules and additive properties
• That the purpose of the game is to add up to 15
• That you need to get to 15 with three numbers exactly

Instead, when playing tic-tac-toe, these rules get lumped into a visually intuitive one:
• That the purpose is to form a straight line from one end of the grid to its opposite end

As I’ve said before, this makes the game much easier, and accessible even to people with no arithmetic knowledge. Conversely, one can always make the nine numbers game more complex by adding rules that increase our cognitive load –for example, representing the numbers in the binary system (1, 10, 11, 100, …, 1001) instead of the decimal system (1, 2, 3, …, 9), which would force us to perform some extra, unusual calculations.
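As a rough sketch of why the two games can share a rulebook, the snippet below assumes the usual trick of laying the numbers 1 to 9 out as a 3-by-3 magic square (the specific layout is my assumption, not something spelled out here). It enumerates every set of three distinct numbers that adds up to 15 and checks that those sets are exactly the rows, columns and diagonals of the square.

```python
from itertools import combinations

# Assumed layout: the classic Lo Shu magic square, where every row,
# column and diagonal adds up to 15.
MAGIC_SQUARE = [
    [2, 7, 6],
    [9, 5, 1],
    [4, 3, 8],
]


def winning_triples():
    """All sets of three distinct numbers from 1..9 that add up to 15."""
    return {frozenset(t) for t in combinations(range(1, 10), 3) if sum(t) == 15}


def board_lines(square):
    """The rows, columns and two diagonals of a 3-by-3 grid, as sets."""
    rows = [frozenset(row) for row in square]
    cols = [frozenset(col) for col in zip(*square)]
    diagonals = [
        frozenset(square[i][i] for i in range(3)),
        frozenset(square[i][2 - i] for i in range(3)),
    ]
    return set(rows + cols + diagonals)


triples = winning_triples()
lines = board_lines(MAGIC_SQUARE)
print(len(triples), triples == lines)  # prints: 8 True
```

The arithmetic rules have not gone anywhere; they are simply baked into the layout of the grid, so the player only has to look for a straight line.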

Software tools are particularly helpful when it comes to offloading rules. In a way, this is precisely what they do. Spreadsheets simplify a multitude of tasks –for instance, preparing a detailed budget or calculating the standard deviation of a series of numbers. Email clients connect to servers using a complex protocol that it would be unreasonable for us to follow by hand. In general, computers have come a long way from the card systems of the past to the convenience of our monitors and connectivity, simply because so many of the rules necessary to operate them are embedded in the devices.

Offloading knowledge in representations is so powerful that I’m often surprised to find resistance to doing so. I guess this is what bothers me about developers who don’t comment their programs because “the code speaks for itself” – those who dismiss syntactic enhancements as sugar, who proclaim that Real Programmers code in assembly, or who brag that they don’t need and never will need debuggers. Maybe code speaks for itself, but it speaks slowly. Rejecting these advantages now and then, as a habit to train yourself, may be a good idea (just as it’s a good idea to be able to perform mathematical operations without a calculator); but rejecting them in your professional life is frankly foolish, as foolish as playing blind chess in a world-class tournament.

We are as smart as our environment allows; if we want to achieve greater goals we should free our minds by sharing as much of the work as possible with it.

(Beach ant photo by kitsu)

Categories: External cognition · XCog

Ben Shneiderman on Creativity and Visualization

October 13, 2006 · 2 Comments

Ben Shneiderman, a professor at the University of Maryland’s Human-Computer Interaction Lab and author of Leonardo’s Laptop, gave a talk at Ryerson University yesterday and at the University of Toronto today, on two different topics:

At Ryerson he talked about creativity support tools. I was a bit frustrated by his approach to the topic: the creativity segment of the talk was inconclusive, and although Shneiderman stressed the importance of careful empirical case studies to evaluate tools (with which I fully agree), he never really described those studies, or the way his team addressed the challenges of doing good empirical work in the area.

Treemap

The talk at the University of Toronto, on visualization of high-dimensional information (which I talked about last week), was more satisfying. The visualization tools Shneiderman presented (mostly the same ones as at Ryerson, but with more time -or so I felt) are really, really appealing – very polished in comparison to most academic prototypes. I’ll cover them in future posts, but if you want a glimpse, you can check out the already famous Treemaps (with applications in the stock market) and the Hierarchical Clustering Explorer, which is way more impressive than this link suggests.

The demos were fun, but since they took most of the talk’s time there was not a lot of room to discuss the principles behind them. Shneiderman didn’t really explain the theory that supports the tools, the criteria used to evaluate their effectiveness, or the limits of these approaches. I actually got the feeling that at least some of the tools were built partly on a hunch rather than on an articulated theory of information visualization, but I might be wrong. I’ll expand on this when I learn more.

Categories: External cognition · Information visualization

Fun with representations V – Maps of the abstract world

October 4, 2006 · 10 Comments

Representing information means mapping it into a particular medium –focusing on certain elements of the original data, ignoring the irrelevant ones, and, ideally, simplifying the process of understanding and using it. Unfortunately, our resulting information ‘maps’ are sometimes inappropriate: they may be ambiguous, unintuitive, or downright misleading. To illustrate what I mean, here are some examples of good and bad mappings:

Numbering systems: Our Arabic numeral system is tremendously effective. It’s not just that there are only ten digits to learn. Assessing magnitudes is easy: a simple glance at a number can tell us if we’re talking about a small (3) or a large (320150297) quantity. And since we have ten fingers on our hands, it’s natural for us to count in base 10.

In comparison, Roman numerals suck (is MXLI less than CCCXLXXI?). They’re only really useful to sound pretentious, as in my title above. And our standard scientific notation, although it uses Arabic numerals, can be very misleading to novices: 3.0×10^1 is humongous when compared to 3.0×10^-10, yet it’s difficult to conceptualize the difference in magnitudes.
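To make the comparison concrete, here is a small sketch of a Roman numeral reader. It is deliberately lenient (an assumption on my part): it applies the usual "a smaller symbol before a larger one is subtracted" rule and does not check whether the numeral is well formed, so it happily assigns a value to an irregular string like CCCXLXXI. The function name is made up for illustration.

```python
SYMBOL_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(numeral):
    """Lenient reading of a Roman numeral using the subtractive rule.

    A symbol is subtracted when the symbol right after it is larger,
    and added otherwise. No validation of the numeral is attempted.
    """
    total = 0
    for i, symbol in enumerate(numeral):
        value = SYMBOL_VALUES[symbol]
        next_is_larger = i + 1 < len(numeral) and SYMBOL_VALUES[numeral[i + 1]] > value
        total += -value if next_is_larger else value
    return total


for numeral in ("MXLI", "CCCXLXXI"):
    print(numeral, len(numeral), roman_to_int(numeral))
# MXLI 4 1041
# CCCXLXXI 8 361
```

With Arabic numerals the longer string of digits is always the larger number; here the numeral that is twice as long is worth roughly a third as much, which is exactly why a glance is not enough.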

Spatial, n-dimensional information: It’s much easier to represent two-dimensional information, such as the Physics problem below, in diagrammatic form rather than as sentences.
Pulley problem
The equivalent textual problem statement would have to be several lines long (“There are two masses hanging on a frictionless pulley located at the top of a…”), and it can’t convey the simplicity of the picture. The diagram gives an instantaneous overview of the problem’s information that is perhaps impossible to beat with text. However, things begin to break down at three dimensions. Is this structure
Cube
representing the left or the right cube below?
Cube 3 Cube 2

For complex three-dimensional structures, a two-dimensional representation is rarely satisfying. Incidentally, this three-dimensional ambiguity is the central motive in many optical illusions and Escher drawings such as this one, Belvedere:
M.C. Escher - Belvedere
Geographical maps are effective at conveying spatial information because, even though they are a two-dimensional representation of a three-dimensional world, data on the height of landmarks and streets is not normally necessary. But we sometimes require three-dimensional information to make sense of our environment, and maps won’t capture this effectively. For example, in Toronto, many people use the CN Tower as a compass of sorts (“the tower is to my right, so I’m facing East”), yet street maps don’t give it the relevance we do.

Can we represent more than three dimensions in a two-dimensional diagram? Not really –we can try, but the results (using colours, animations, etc.) are never entirely satisfactory, even for a small number (four, five) of dimensions. For all practical purposes, mapping high-dimensional structures onto a two-dimensional diagram is impossible.

Clef

Musical notation systems: We have a notation system to represent musical compositions that is quite effective for Western music. Since Western music is based on a 12-tone scale, with relatively rigid rhythms, the notation does a pretty good job of recording this information. Unfortunately, it’s useless for describing some kinds of more flexible, traditional Asian and African music. People have attempted to represent these in other systems, with varying degrees of success. For some types of music, repetition and imitation are still the most successful strategies to pass on musical knowledge.

Phonetic notations: One of my early embarrassments when learning English was finding out that recipe is not pronounced like recite. And I was already living in Toronto while still pronouncing wheat to rhyme with sweat rather than sweet (so asking for wheat bread, naturally, always led to puzzled looks). Written English is terrible as a phonetic notation. Spanish is much better, but it still doesn’t help me when I try to pronounce non-Spanish sounds, such as those in even the most basic Polish (czesc!) or Chinese (xie xie) words.

In comparison, the International Phonetic Alphabet is precise and comprehensive. Once you learn it (not an easy task!) you can use it to find out, exactly, how to pronounce words in practically any language.

I write about all of these examples because in my very own Software Engineering field, there is a strong community convinced that models and diagrams are the best way to represent software constructs. We have modelling languages to represent almost any software-related concept you can think of: objects, scenarios, states, classes, goals, beliefs, design rationales, threats, risks, you name it. What we don’t have is any real indication that our diagrams map satisfactorily to constructs in the world.

The problem is that we’re dealing with very abstract, very difficult to represent concepts, not with the two-dimensional structures of High School Physics problems. Where did we get the idea that use-case diagrams are an appropriate high-level mapping of human-computer system interactions? What leads us to believe that goal analysis diagrams are an accurate depiction of the real goals of stakeholders? Is a sequence diagram really better than pseudocode to represent the logic of a scenario? Simply put, there is no convincing evidence justifying any of these beliefs.

(Ontological analyses help us address these issues, by pointing out, for instance, that a weak point of entity-relationship diagrams is their difficulty at expressing entities with fuzzy boundaries and non-entities (such as fluids, thoughts and intentions). But ontological analyses do not answer the question of whether a representation appropriately conveys what it should convey to other humans –such as entities in the entity-relationship diagram case.)

I am not claiming that software engineering diagrams are inadequate mappings to the real world. I’m claiming we don’t know, that –considering we’re using these diagrams as communication artifacts– we should know, and that we’re giving too little thought to these matters in our community.

Categories: Cognition · External cognition · Software development · XCog

Syntax is not sugar

September 20, 2006 · 1 Comment

Let’s say we’re representing some information visually with a standard directed graph. We have four nodes (B, C, D, and E) all pointing to another one (A). We have several choices to display the graph. Here are two:

Nodes 1 Nodes 2

Are they equivalent?

In terms of explicit information being represented, they are. Both diagrams map to the same graph. Spatial placement of nodes is not relevant information in standard graphs (and in fact it is often performed by a visualization algorithm). However, a human will interpret them differently. Even at a glance, the first diagram represents centrality, the second one, hierarchy. More attention leads to more ambiguous and tacit meanings. In the diagram to the left, is node B conceptually “closer” to C than to E? Is it opposed to E? In the diagram to the right, is there a temporal reason for having first B, then C, D, and at last E?
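A minimal sketch of that distinction, with made-up coordinates standing in for the two drawings (the actual positions in the pictures are not recoverable here, so treat both layouts as hypothetical): the edge set is identical, yet a trivial geometric test already tells the layouts apart.

```python
# The graph itself: B, C, D and E all point to A.
EDGES = frozenset({("B", "A"), ("C", "A"), ("D", "A"), ("E", "A")})

# Two hypothetical layouts of the same graph (coordinates are illustrative).
radial_layout = {"A": (0, 0), "B": (-1, 1), "C": (1, 1), "D": (1, -1), "E": (-1, -1)}
layered_layout = {"A": (0, 1), "B": (-1.5, 0), "C": (-0.5, 0), "D": (0.5, 0), "E": (1.5, 0)}


def same_graph(edges_a, layout_a, edges_b, layout_b):
    """Formal equivalence: same nodes and same edges, positions ignored."""
    return edges_a == edges_b and set(layout_a) == set(layout_b)


def hub_is_surrounded(layout, hub):
    """True when the hub sits strictly inside the bounding box of the other nodes."""
    others = [position for node, position in layout.items() if node != hub]
    xs = [x for x, _ in others]
    ys = [y for _, y in others]
    hub_x, hub_y = layout[hub]
    return min(xs) < hub_x < max(xs) and min(ys) < hub_y < max(ys)


print(same_graph(EDGES, radial_layout, EDGES, layered_layout))  # True
print(hub_is_surrounded(radial_layout, "A"))   # True: reads as centrality
print(hub_is_surrounded(layered_layout, "A"))  # False: reads as hierarchy
```

The formal model throws the coordinates away, but they are precisely what a human reader picks up first.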

We can play with the spatial structure of graphs a bit more. Each of the following tells a different story:

Nodes 3    Nodes 4    Nodes 5

We humans use many hints and tricks, not always consciously, to communicate among ourselves. When communicating face to face, most of the content of the message is passed not through words, but through tone, volume, body language, and subtle moves, sounds and pauses. Face to face communication is very rich. When communicating with diagrams we also use plenty of extra “channels”. Spatial placement is one, but colours and weights, among others, help us to convey information that is impossible to capture formally:

Nodes 6 Nodes 7 Nodes 8

So we see diagrams convey information implicitly, and a bit ambiguously. People in software projects use them this way all the time. What about other ways to represent information in software development projects? What about code? You’d think software code is a rigid, formal communication tool. It’s not. When communicating through code (yes, we do), we use whitespace, comments, and variable names to convey non-explicit information. Anyone with at least basic code reading skills will be able to detect personal coding styles just as we detect handwriting. We can spot sloppy hack jobs and thoroughly thought out methods; delicate fragments and routine calls.
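As a small illustration (both functions are invented for this purpose), here are two versions of the same computation. To the interpreter they are interchangeable; to a human reader they say quite different things about the author's care and intent.

```python
def f(a):
    r=[]
    for x in a:
        if x%2==0:r.append(x*x)
    return r


def squares_of_even_numbers(numbers):
    """Return the square of every even number, preserving input order."""
    return [n * n for n in numbers if n % 2 == 0]


sample = [1, 2, 3, 4, 5, 6]
print(f(sample) == squares_of_even_numbers(sample))  # True
print(squares_of_even_numbers(sample))  # [4, 16, 36]
```

Nothing in the formal semantics distinguishes the two, yet only the second one tells you what it is for without making you trace the loop.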

This richness of communication is only possible thanks to some degree of syntactic flexibility in the languages we use. However, syntactic considerations have a bad reputation in many areas of academic Computer Science (perhaps in every area of Computer Science, except Human-Computer Interaction). Syntactic improvements to programming and modeling languages are second-class to the real stuff, the semantic formalisms and “expressive power”. The clearest indication that this is so is the term we use to refer to syntactic improvements: syntactic sugar.

It’s supposed to be a slightly derogatory term, a disincentive for discussing the topic. The sugar metaphor carries meanings such as the following:
• Sugar is tasty, but unnecessary. We can live without sugar. It adds calories but no nutrients.
• Kids love sugar, but us grownups have a developed palate and only enjoy it in moderate quantities.
• An excess of sugar leads to nasty health problems. Or as Alan Perlis said, “Syntactic sugar causes cancer of the semicolon”
• If a dish is not sweet enough, we can fix it easily: just add some more sugar! Everyone knows how to do it, and there’s no skill involved.

I think the term subsists because many computer scientists and programmers have an antiquated, simplistic model of coding. They believe the only communication act involved in programming is that of the developer formally describing to a computer what instructions to follow:

Naive view of communication

If this were the case, I would accept that the formal semantics of a language are the only thing that really matters. But in all but trivial cases, programming is not talking to a computer. There are many, many other communication uses of the same code in a development project –most of them among humans:

Communication as it happens

All of these interactions are vital in software projects. Software development, then, is not just talking to a machine -it’s talking to lots of other people through code and models. And since people will grasp meaning both from semantic and from syntactic cues in code and in models, both need our attention.

(Yes, I’m saying there’s semantics in syntax. Or rather, that our syntax is guided by informal semantics. Or, more eloquently and famously, that the medium is also the message.)

Computer Science loves formalism, but most of real software development consists of informal and implicit exchanges of information among people. Software engineering research must accommodate these informal exchanges. Syntax is not sugar –it is an elementary ingredient to make sense of our projects and activities. It should be studied as such.

Categories: Cognition · External cognition · Software development · XCog

Fun with representations IV – Chaotic libraries

September 17, 2006 · 1 Comment

Alright, moving on with the representation series! This time I’ll start with an old puzzle that I, by coincidence, got from Steve Easterbrook and, separately, from Angelika Mader in Dagstuhl, a couple of weeks apart.

We have an 8 by 8 grid such as the one in the picture. We also have rectangular tiles that cover two adjacent squares of the grid. First question: can we completely cover the grid using these tiles? It should be easy to see that yes, we can do that quite easily.

Grid 8 by 8    Two-square tiles

But let’s make the puzzle a bit more complicated: we’ll take away the squares at two opposite corners of the grid, leaving 62 squares instead of 64, as shown below. Second question: Can we completely cover this grid? If so, how? If not, why not? (No overlaps, and no putting tiles outside the grid!)

Grid with cut corners

The answer may not be obvious this time. We can try to work out a solution only to see it fail in the end:

Failed attempt

(I’ll go over the solution now, so stop here if you want to work it out yourself!)

Let’s substitute the 8 by 8 grid with (surprise!) a chessboard. If you take away two opposite corner squares, you’ll end up with a shape like this:

Chessboard with cut corners    Two-square tiles

Note how the two removed squares are of the same colour. In this case we have 32 remaining whites and 30 remaining blacks. And since each of our rectangular tiles covers one black and one white cell, we will only be able to cover, at most, 30 blacks and 30 whites. There will always be two squares of the same colour remaining, so there is no way to arrange our tiles in the grid.

The answer to this puzzle is quite elegant. It’s so nice that it works not only with 8 by 8 grids, but with square grids of any even side length (including, say, a million by million grid!).
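The counting argument is short enough to check mechanically. The sketch below colours square (row, column) black when row + column is even (an arbitrary convention I picked so that the removed corners come out black, matching the 32 whites and 30 blacks above), counts what is left, and compares the number of tiles needed with the number the colours allow.

```python
def colour_counts(n):
    """Count squares of each colour on an n-by-n board with two opposite corners removed."""
    removed = {(0, 0), (n - 1, n - 1)}
    counts = {"black": 0, "white": 0}
    for row in range(n):
        for col in range(n):
            if (row, col) in removed:
                continue
            colour = "black" if (row + col) % 2 == 0 else "white"
            counts[colour] += 1
    return counts


def tiles_needed(n):
    """Two-square tiles required to cover every remaining square."""
    return (n * n - 2) // 2


def tiles_possible_at_most(n):
    """Each tile uses one square of each colour, so the scarcer colour is the limit."""
    return min(colour_counts(n).values())


print(colour_counts(8))                             # {'black': 30, 'white': 32}
print(tiles_needed(8), tiles_possible_at_most(8))   # 31 30
```

For any even side length the same gap of one tile appears, without a single tile ever being placed.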

What is it about this second perspective on the problem that makes its solution so trivial? It avoids the issue with the first: problem size. With the first perspective (where we’re putting tiles in the grid to try and stumble upon a solution) there are so many permutations that it’s hard to see whether we’ve tried them all or, perhaps, the real solution is still hiding somewhere, and we just need one more try to find it.

Re-representing the problem in terms of number of blacks and number of whites avoids the size problem entirely. We don’t need to try out every permutation; we don’t need to try out even one.

I have mentioned before how defining whether a representation is useful or not depends on the tasks it is needed for, and on the context of its reader. Now I’m adding a third consideration: the usefulness of a representation depends on how it handles the size of the problem it is representing. Some scale up pretty well, some others don’t, and for large problems the difference is key.

Two more examples of problem-size issues before I’m gone. Both have to do with chaotic libraries.

In the book “How Would You Move Mount Fuji?”, William Poundstone describes several puzzles that high-tech companies, particularly Microsoft, like to pose to their job candidates. Apparently they like to ask, among others, the following: How would you locate a specific book in a big library? There’s no cataloguing system and no librarian to help you.

Poundstone explains: “Suppose the books are in random order, which they might be, for all you know. In that case, the best you can do is to scan the shelves methodically (…) On the average, you would expect to have to scan half the library to find a given book.” Ouch!

However, he continues, it is perfectly reasonable to expect that the library will have an order, any order, straightforward or bizarre. What we should do in such a situation, then, is to map out the library and try to detect patterns in the types of books on each shelf. “The best approach is to first try to learn the system, then use that system to direct your search for the book you want.” In other words, we need to transform our perspective: to go from a representation where size is all-important (needle in a haystack) to one where size becomes almost irrelevant. Puzzle-solving, in particular, frequently depends on this type of transformation.
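A rough sketch of the difference, under a big simplifying assumption: the "system" we have learned is plain sorted order, so a halving search stands in for whatever scheme the real library uses. The shelf of 1,000 numbered books and both function names are made up for illustration.

```python
def linear_scan_steps(shelf, wanted):
    """Books inspected when scanning the shelf from one end (no known order)."""
    for steps, book in enumerate(shelf, start=1):
        if book == wanted:
            return steps
    return len(shelf)


def ordered_search_steps(shelf, wanted):
    """Books inspected when the shelf is known to be sorted (halving search)."""
    low, high = 0, len(shelf) - 1
    steps = 0
    while low <= high:
        steps += 1
        middle = (low + high) // 2
        if shelf[middle] == wanted:
            return steps
        if shelf[middle] < wanted:
            low = middle + 1
        else:
            high = middle - 1
    return steps


shelf = list(range(1000))  # hypothetical library of 1,000 books
linear_avg = sum(linear_scan_steps(shelf, book) for book in shelf) / len(shelf)
binary_max = max(ordered_search_steps(shelf, book) for book in shelf)
print(linear_avg, binary_max)  # 500.5 10
```

Scanning costs about half the library on average, as Poundstone says; once an order is known, even the worst case for these 1,000 books is ten looks, and doubling the library adds only one more.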

The most deliciously complex library that I have ever read about is Jorge Luis Borges’s “Library of Babel”: an “indefinite, perhaps infinite” collection of rooms, shelves, and books. The Library has existed forever, and there is nothing outside of it. Each of its books has a seemingly random sequence of letters, and some of this Universe’s inhabitants have conjectured that there are books for every possible permutation of characters, meaningful or not –every story and non-story can be found in it. At one point, the narrator mentions a superstition some librarians have: that someone, somewhere, has deciphered the order of the Library, that “there must exist a book which is the formula and perfect compendium of all the rest: some librarian has gone through it and he is analogous to a god.” Heretics, on the other hand, “maintain that nonsense is normal in the Library and that the reasonable (and even humble and pure coherence) is an almost miraculous exception”.

I’d say you must read this short story if you haven’t. You can find its full text here.

Categories: Cognition · External cognition · XCog

Fun with representations III – Hidden in plain sight

August 29, 2006 · 8 Comments

A while back, as part of a series of fascinating studies of perception in chess, Simon and Chase showed a chessboard to people with varying degrees of chess expertise, for very brief moments, and asked them to reproduce the position of the pieces on the board they saw, using a second board and set of pieces.

For half of their runs, they used reasonable mid-game positions, such as the following:

Realistic Chessboard

For these cases, expert chess players were able to reproduce the position faster and more accurately than novice players, and they needed fewer ‘peeks’ at the original board too.

Now, for the second half, they used positions with about the same number of pieces as the first, but the pieces were placed on random squares of the board. Here’s an example of my own:

Random Chessboard

If you don’t know chess, this image will be just as cryptic as the previous one. You would probably take as much time to reproduce it, and make as many mistakes too.

If you’re competent at chess, however, the second image will feel ‘wrong’. It will make no sense to you. If you’re an expert, it may even look like an abomination. And if you were to try to reproduce the position, you would lose your advantage over novices -you would perform just as slowly and inaccurately as them, perhaps even worse.

That’s what Simon and Chase discovered. Furthermore, they found that experts tended to add clusters of pieces at once. They conjectured that, when looking at a game position, chess masters do not see the same things mere mortals see. Somehow, after years of training, they get used to identifying structural patterns and interactions between pieces. And they learn to exploit their ever-expanding knowledge base at will, almost unconsciously, so that when shown a ‘reasonable’ position they grasp it effortlessly, but when shown something that doesn’t make sense, their elaborate mental model is useless for understanding it. In my previous example, for instance, I (with a merely competent chess knowledge) can identify these clusters:

Chess clusters

Meanwhile, a novice does not have access to this wealth of information. They don’t see the clusters and structures experts see, and so they have to work out the position piece by piece, no matter the structure’s degree of normalcy.
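Here is a toy model of that idea, and nothing more than a toy: the pattern library, the positions and the "count the chunks" measure are all made up for illustration and are not taken from the studies. A viewer holds a set of known clusters; any cluster found whole in the position counts as a single item to remember, and every leftover piece counts as one item each.

```python
# Hypothetical clusters an experienced player is assumed to recognise as one unit.
KNOWN_PATTERNS = [
    frozenset({"Kg1", "Rf1", "Pf2", "Pg2", "Ph2"}),  # castled king behind its pawns
    frozenset({"Pc4", "Pd4", "Nc3"}),                # a familiar pawn-and-knight cluster
]


def count_chunks(position, patterns):
    """Items to remember: one per recognised cluster, one per leftover piece."""
    remaining = set(position)
    chunks = 0
    for pattern in patterns:
        if pattern <= remaining:
            remaining -= pattern
            chunks += 1
    return chunks + len(remaining)


# Nine pieces arranged sensibly, and the same nine pieces scattered.
structured = {"Kg1", "Rf1", "Pf2", "Pg2", "Ph2", "Pc4", "Pd4", "Nc3", "Qd1"}
scrambled = {"Ka5", "Rh7", "Pb3", "Pe6", "Pg4", "Pc7", "Pd2", "Nh1", "Qf8"}

expert_structured = count_chunks(structured, KNOWN_PATTERNS)
novice_structured = count_chunks(structured, [])
expert_scrambled = count_chunks(scrambled, KNOWN_PATTERNS)
print(expert_structured, novice_structured, expert_scrambled)  # 3 9 9
```

In this sketch the expert's advantage comes entirely from the pattern library, so it vanishes the moment the position stops matching anything in it.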

Knee radiography

This phenomenon happens all around us, in any domain where expertise plays a role. For example, I’ve always been confused when doctors point to abnormalities in X-rays that I simply do not see. On the other hand, expressions such as, say, a simple quadratic function, are instantly recognizable for me -I have a visual image that goes hand-in-hand with the expression, and I’ve seen and applied it enough that it probably means more to me than it does to someone with a high school level math education.

Once a community of experts starts to discuss their domain, they will inevitably create words, or assign new meanings to old words, to refer to concepts they use commonly and for which their natural language falls short. This development of terminology is a sign that the domain is becoming mature and well explored. To stick with chess, for example, players will talk about controlling the centre, forking, and open columns naturally. Game discussions frequently use these loaded terms, so representations that use them are economically convenient. However, this practice raises the entry barriers for newcomers (as anyone who has listened to doctors discuss would agree).

Incidentally, it is sometimes also the case that a novice sees things that an expert will not. The expert assumes things that, in strange cases, may not be true. For example, consider the following chess retro-puzzle from Raymond Smullyan:

Retro Puzzle

Black has made the last move. What was it, and what was White’s previous move?

The puzzle, as it stands, has two possible answers. Try to figure them out. I discuss them in the next two paragraphs, so skip them if you just don’t care!

For both answers, it should be obvious that Black’s last move was with the king -no other piece is available- and that it came from the square directly below its current one (it cannot have come from the right, because that would imply an impossible check from the white king). This means it escaped from a bishop check. But how did the bishop get to that seemingly unreachable position in the first place? The first answer is that the check was revealed by another piece -but it would have to be a piece that is no longer on the board. The only possibility, then, is that a white knight jumped (from b6) to the board’s corner, uncovering the check. The black king then escaped the check by capturing the knight, leading to the current position.

The second answer depends on realizing that, perhaps, we’re not looking at the board from the perspective of White, but of Black. If that’s the case, we can explain the bishop’s placement as a promoted pawn! A white pawn moves to the bottom row, gets promoted to a bishop, gives check, and the black king escapes to the corner of the board.

Some people have a hard time seeing the second answer. It runs against two standard assumptions of chess -that White’s side is displayed at the bottom, and that when we promote a pawn we promote it to a piece of greater power than a bishop. However, if you’re not familiar with these conventions, but know enough chess to understand how the pieces move, you may even outperform an expert chess player (being, in Dan Berry’s terms, a “smart ignoramus”: a person whose ignorance of a domain, paired with a sharp intelligence, leads him or her to ask valuable questions that experts would not think of asking).

I think I’ve abused chess long enough. In the end, what I want to do is remind us that people see, literally, different things in the same representation. Their understanding of it, and their potential with it, will depend on their domain and language expertise. Meaning is in the mind of the beholder. And so representations are not just useful or not -they are useful for a particular person, or type of person, to accomplish a particular task.

Next in the series: How to search for books in a library that has no index, and Borges’ Library of Babel.

Categories: Cognition · External cognition · XCog

Fun with representations II – Where is the train going?

August 23, 2006 · 4 Comments

Continuing with the last post’s discussion, right now we’re in the business of finding out why some representations are better than others. As a warm-up, then, let’s try to figure out the following:

Which of these representations of geographical data is better?

a) A map of the northeast of the American continent?

Ticket to Ride map

b) A city-to-city distance table of the area?

Distance Table

or c) A series of instructions to go from Pittsburgh to Montreal?

Pittsburgh to Toronto

Take the I-79 N, then the I-90 E, cross to Canada on the Peace Bridge, then take the QEW until you reach Toronto…

Toronto to Montreal

…then leave Toronto through HWY 401 E, take AUT 20, and finally AUT40. Montreal is right around the corner.

(Warning: Not necessarily the best route! According to Ticket to Ride you can also do Pittsburgh -> New York -> Montreal with the same number of train cars!)

So, which is better? The answer, of course, is that this is a silly question. How can we say which representation is better if we don’t know what it is used for?!

Each of the three choices is, for the right task, more useful than the others. The first gives you the overall picture of the terrain, an understanding of the geography of the area. The second gives you information that you might need to plan trips, but that would be hard to calculate by yourself. And the third gives specific details to help you get to a particular destination.

Defining the purpose of the representation, therefore, is an absolutely necessary first step. Our first rule for evaluating representations, then, is to ask ourselves: what is the representation supposed to be good for?

The nine numbers game, for example, might actually be a better representation than tic-tac-toe if what you want is to practice addition. For some equations, polar coordinates allow much simpler and more elegant expressions than their Cartesian counterparts. Academic journals, finally, may make for an extremely dry read, but this apparent obfuscation makes for content less prone to misinterpretation than, say, Jorge making his point with plastic trains.
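To make the polar coordinates point concrete, here is a small worked comparison. The two curves are my own picks for illustration (they are not from any of the sources above), with $a > 0$ an arbitrary constant and the usual conversion $x = r\cos\theta$, $y = r\sin\theta$ assumed:

```latex
% Circle of radius a centred at the origin
\text{Cartesian: } x^2 + y^2 = a^2
\qquad\qquad
\text{Polar: } r = a

% Cardioid
\text{Cartesian: } \left(x^2 + y^2 - a x\right)^2 = a^2 \left(x^2 + y^2\right)
\qquad
\text{Polar: } r = a\,(1 + \cos\theta)

% Why the two cardioid forms agree: multiply the polar form by r
r^2 = a r + a r\cos\theta
\;\Longrightarrow\;
x^2 + y^2 - a x = a r
\;\Longrightarrow\;
\left(x^2 + y^2 - a x\right)^2 = a^2 \left(x^2 + y^2\right)
```

Same curves, same information, but the polar forms are the ones you would want to have in front of you when sketching them.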

I’m not just stating the obvious. Disregarding the purpose of proposed representations turns out to be a depressingly common mistake, at least in the Software Engineering field:

  • Many proposals are offered as a one-size-fits-all. The most prominent example, perhaps, was Doug Ross (back when I was born!) presenting Structured Analysis as a “language for communicating ideas”, as generic as that sounds. Which types of ideas? All types, it seems. Now, SA (which describes processes, and is the grandpa of Data Flow Diagrams) is certainly useful for some tasks, but it’s crazy to suggest that it’s a good way to communicate the goals of stakeholders, power relations in an organization, or the stuff I’m writing down here.
  • Most language proposals do not consider their context of use. Let’s take UML, for example. Who is supposed to do the modelling? Who is supposed to have access to the models? What other documents are they supposed to substitute? Are the models used before coding -as an explanation tool-, during coding -as a reference-, instead of coding -as abstracted code-, or after coding -as maintenance docs? Or as all of the above? Evaluating UML for each of these possibilities leads to entirely different studies and considerations!
  • Some evaluations of representations have incorrect usage assumptions. A comprehension study by Snook and Harrison, for instance, compared a formal specification written in Z to its implementation in Java code. They found no significant difference in comprehension between the two. The problem with the study, of course, is that real software teams do not face the choice of “should we use Z or Java?” The correct question is “should we use Z or something else to specify the system that we will eventually build in Java?”

The list goes on, but I think I’ll stop here since the main idea is clear. Even though asking ourselves what a representation should be good for seems like an evident first step, it is still important to make a point of it to avoid these types of problems.

Next up: Chess studies and radiographies!

Categories: Cognition · External cognition · XCog

Fun with representations I – Nine numbers

August 19, 2006 · 6 Comments

Here’s a two-player game for you to try out:

You need nine cards, numbered 1 to 9. You and your opponent take turns picking cards -each card can only be picked once. The first player with three cards that add up to 15 wins the game.

Nine Numbers

Before you keep reading, it’s best if you give it a try, as a thought experiment. You’ll notice how, although the game is not really very complex, a winning strategy is a bit hard to find.

OK, moving on. Now imagine those same nine cards, but arranged in a 3×3 grid with a ‘magic square’ pattern (where each row, column, and diagonal adds up to the same number -in this case, 15). You get something like this:

MagicSquare

If you try it out, you’ll notice the game becomes much easier to grasp. Every possible three-number combination adding up to 15 is a row, column, or diagonal in this square! You’ll surely also notice that the game becomes strangely familiar:

Tic Tac Toe

So there you go -it turns out the nine numbers game is, in fact, exactly the same game as tic-tac-toe. The difference is that tic-tac-toe is a much easier version to play, but the two games map perfectly onto each other, however unintuitive that might be.
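If you don’t want to take the mapping on faith, it can be checked by brute force. What follows is only a minimal sketch: the square used is the classic Lo Shu arrangement, which I’m assuming here (the square in the figure may be a rotation or reflection of it), and the names `SQUARE`, `square_lines`, and `winning_triples` are made up for the example.

```python
from itertools import combinations

# Assumed layout: the classic Lo Shu magic square. The square pictured in the
# post may be a rotation or mirror image of this one.
SQUARE = [
    [2, 7, 6],
    [9, 5, 1],
    [4, 3, 8],
]


def square_lines(square):
    """Return the rows, columns, and both diagonals of a 3x3 grid as sets."""
    rows = [frozenset(row) for row in square]
    cols = [frozenset(col) for col in zip(*square)]
    diagonals = [
        frozenset(square[i][i] for i in range(3)),
        frozenset(square[i][2 - i] for i in range(3)),
    ]
    return set(rows + cols + diagonals)


def winning_triples(target=15):
    """Return every set of three distinct cards from 1..9 that adds up to target."""
    return {
        frozenset(triple)
        for triple in combinations(range(1, 10), 3)
        if sum(triple) == target
    }


lines = square_lines(SQUARE)
triples = winning_triples()
print(len(triples), triples == lines)  # prints: 8 True
```

There are exactly eight winning triples, and they are exactly the eight lines of the square, so winning one game means winning the other.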

Snap quiz: You’ve picked the numbers 3, 5, and 8. I’ve picked 1, 4, and 7. It’s your turn. What number should you pick? Well, mapping to tic-tac-toe makes it easy -number 2 of course!

Tic Tac 2
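For contrast, this is roughly the work the quiz demands if you stay in the numbers-only version of the game. It is a sketch under my own assumptions: `winning_picks` is a hypothetical helper, not something from Simon, and it only looks for an immediate win rather than playing a full strategy.

```python
from itertools import combinations


def winning_picks(my_cards, taken, target=15):
    """Return the free cards that would complete a three-card sum of `target`."""
    free = set(range(1, 10)) - set(taken)
    # Every sum that two of my current cards can already make.
    pair_sums = {a + b for a, b in combinations(my_cards, 2)}
    return sorted(card for card in free if target - card in pair_sums)


you = {3, 5, 8}
opponent = {1, 4, 7}
taken = you | opponent

print(winning_picks(you, taken))       # prints: [2]
print(winning_picks(opponent, taken))  # prints: []
```

The code has to try every pair of cards and every free card; on the tic-tac-toe board, the same answer is a line you simply see.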

The interesting point here is that problems can be represented in such ways that their solutions become clearer or more obscure. The two games above are isomorphic, but that doesn’t mean that they require the same amount of cognitive effort from their players. They don’t even require the same type of knowledge. The first game requires that we know how to add numbers, and forces us to perform additions constantly. A player without math skills would have a hard time with it. The second game, on the other hand, requires no math -only that we know how to detect lines visually, something that comes almost effortlessly to us.

The problem of adequate representations extends to many problems beyond tic-tac-toe. A diagram of pulleys in a physics problem is more helpful than its equivalent description in English. Arabic numerals are easier to handle than Roman numerals. Code in Python is clearer than its assembly counterpart -you get the idea. Herbert Simon (from whom I got the nine numbers game) used to say that problem solving consists of re-representing a problem until its solution becomes trivial.

The key, then, is to be able to explain, in a fairly predictable manner, why one representation is better than another. Finding the reason would allow us to design representations that make our cognitive tasks easier. However, this analysis turns out to be a little complicated: there are actually plenty of reasons that could make one representation preferable to another, and they are not always evident. Over the next few entries I will be covering some of them.

Categories: Cognition · External cognition · XCog