Why code is hard to read

We often feel that code is hard to read. The question is: why? The common complaints you will hear go something like this:

  1. “The code is not commented!”
  2. “The programmer didn’t bother organizing!”
  3. “The programmer used funny hacks!”

And the list goes on. You’ve heard them all before.

Now, these are all valid. If someone writes a convoluted mess of code that doesn’t work, fine, toss it.

But if someone writes a convoluted mess of code that works, and you now have to deal with it, understand it or, worse, maintain it, then you are most likely cursing the guy under your breath. Beyond all this, however, lies a deeper assumption that makes many of us find code difficult to read: we forget that code is really a language.

Languages are not trivial exercises

I think we can almost universally agree that learning a language is not a trivial exercise.

I’m speaking of any form of linguistic expression: human languages (English, German, French, Chinese), musical languages (notation, embellishments), and so on. Truly mastering, say, English takes years and years of study, constant use, immersion, and deliberate practice. How many people can truly claim to speak three or more languages fluently, and live up to the claim?

A good litmus test for such a person would be the ability to translate between those three languages with ease while preserving, as far as each language’s limitations allow, the tone, meaning, nuances and character of what is being translated. Sounds easy? Not at all.


However, when it comes to code, some sort of disconnect kicks in. We feel that just because code is used by a computer, it isn’t really a language. You know, it’s a “lesser” language. It’s just programming. Okay. In a certain sense, that argument has a little support: computers are not humans (hopefully!), so what they comprehend needs to be inherently simpler and more constrained.

If you’re a bit on the philosophical side, perhaps you’d argue that if humans are finite beings, then the things we create, computers and computer languages among them, must be more finite still. Either way, these arguments run in the same vein.

Code is a language!

But consider this:

Human languages are: created by humans, evolved by humans, documented by humans, used by humans.

Computer languages are also: created by humans, evolved by humans, documented by humans, used by humans. (and then translated and fed to a computer)

My point is this: computer languages are as much languages as human languages are. Learning, comprehending and mastering Assembly, Forth, C/C++, Python, Lisp, Haskell or Smalltalk is no less involved than learning French, German, Tagalog or Chinese. Yes, they may have simpler grammar, less syntax, fewer things to remember and fewer ways of expressing something, and hence take less time to learn, but they are absolutely languages, and non-trivial ones.

How can we reasonably expect to read a book on a new computer language, tinker around with a few hello worlds, maybe write a small project in it if we’re determined or enthusiastic enough, and then comprehend code written by an experienced programmer or guru in that language?

That’s akin to spending three months with a cassette (or CD) of conversational Greek, finding a couple of Greek friends to coach you through the hard pronunciations, maybe trying out a few smart phrases you picked up from them, and then grabbing a book written in Greek and expecting reasonable comprehension of it. Of course, I’m assuming you didn’t know Greek to start with.

And I’m not even talking about understanding the nuances, humor, wit and all the other subtleties that make up a true book-reading experience.

The day we stop thinking of computer languages as lesser languages is the day we in the computer field take a good step forward in comprehending code.

So, why is code hard to read?

Back to the point: Why is code hard to read?

Answer: Because you’re not fluent in the language.

Why is this book hard to read?

Answer: Because you’re not fluent in the language.

When you’re not fluent in a language, you spend a disproportionate amount of time fighting the language rather than looking for meaning. And code is all about meaning, about intentions. If you’re not reading for meaning, then you’re most likely reading for syntax, and that makes you a compiler (uh, of sorts), not a programmer understanding a piece of code.

So what aids meaning? If we treat code as a language, then we can apply the constructs of language to it. I’m no expert in (human) languages, and certainly no linguist. If you give me that litmus test I mentioned above, don’t waste your time. I’ll fail. Badly.

But let’s look at some common language constructs and see how closely they relate to code (from Assembly to Lisp):

Code has structure

When reading code, look for its structure, keeping in mind that every language structures code slightly differently. It is very unlikely that all you have is a single piece of code with everything meshed together. Programmers are human, and humans need structure because we can’t keep every single thing in our heads at once. Certainly not for most meaningful pieces of code.

We need to understand the structure of the programming language. This means that if you’re in Python, you need to understand packages and modules, and how they relate to directory structure, since Python uses directory structure as an implicit structuring system. You also need to understand what it means to import a package or module, and what that import affects. As an example:


Python Package (win32com) organized in a directory hierarchy

Here, you see all sorts of Python language elements. From the hierarchical directory structure that gives meaning to packages and modules, to the docstrings at the beginning of the classes/methods, to the way exceptions are handled and the style in which they are used, and so on.
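To make the directory-to-package mapping concrete, here is a minimal sketch. It builds a tiny throwaway package in a temporary directory and imports it; the package name `shapes`, the module `circle` and its `area()` function are hypothetical, invented purely for illustration, and have nothing to do with win32com.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical layout, created on the fly for illustration:
#   shapes/
#       __init__.py   -> marks "shapes" as a (regular) package
#       circle.py     -> importable as the module "shapes.circle"
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "shapes")
os.makedirs(pkg_dir)

with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write('"""Hypothetical geometry helpers."""\n')

with open(os.path.join(pkg_dir, "circle.py"), "w") as f:
    f.write(
        '"""Circle helpers."""\n'
        "import math\n"
        "\n"
        "def area(r):\n"
        '    """Return the area of a circle of radius r."""\n'
        "    return math.pi * r * r\n"
    )

# The directory *containing* shapes/ must be on the import path.
sys.path.insert(0, root)
importlib.invalidate_caches()  # source files were created after startup

circle = importlib.import_module("shapes.circle")

print(circle.__name__)  # the dotted name mirrors the path shapes/circle.py
print(circle.__doc__)   # the docstring at the top of circle.py
print(circle.area(2))   # relies on the module's own "import math"
```

The dotted name `shapes.circle` is essentially the path `shapes/circle.py` spelled with dots, which is exactly the kind of implicit structure a fluent Python reader takes for granted.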

Then you need to understand the internal structure. That means your Python classes, and so on. Once you know what structure you’re looking for, you can then immediately see where the structure fits, and where it doesn’t. If it doesn’t, you can then look for the “why”. Is it because of convenience? Laziness? Genius? Or perhaps your mental model of the structure was wrong? In any case, you’re now looking for meaning, for conventions and the breaking of conventions. You’re now looking at the programmer’s intentions.

Once all these things become natural (i.e. once you are fluent), your mind will be able to automatically abstract away all those details and let the meaning of the code shine through. Your eye will be naturally led to the substance of the code, not the syntactic wrappers needed to make the code work.

Similarly, if you’re in C, then you need to understand #includes, header files (.h), code files (.c) and their conventions. What are the conventions for what goes where? Again, an example:

Figure: C code structure

Code has vocabulary

This is the easy one, and unfortunately where we are most tempted to stop. Once we have the vocabulary of the language, we feel like we’ve grasped it. I’m very guilty of this myself. Vocabulary, of course, is very broad, and I’m specifically talking about syntax here: “Oh, this is what loops look like”, “Ah, this is how I create variable-argument functions”, “Man, this is the way to create a continuation”.

Nailing these things down is the first baby step in taking command of a language. It is nowhere near knowing the language. Again, at this point you’re reading for syntax, for keywords, for library functions that you recognize. If you’re writing code rather than reading it at this point, you’ll end up taking what you know from another language and applying it to this one, with the new syntax.

The obvious example: if you come from C, and you’re learning Python for the first time, you’ll probably open a file in Python this way:

f = open("myFile", "w")
f.write("I'm noob at Python!")
f.close()

Why would you do that? Because you just took the C way of handling files: fopen() became open(), fwrite() became write(), and fclose() became close(). You’re translating, you’re not expressing. Pythonically:

with open('myFile', 'w') as f:
  f.write('I rock at Python!')
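The two snippets differ in more than looks. Here is a small sketch, assuming a throwaway file in a temporary directory (the path is made up for the example), of what `with` actually buys: the file object’s context manager closes the file when the block exits, even when it exits because of an exception. In the translated version, an exception from `write()` would skip `f.close()` entirely.

```python
import os
import tempfile

# Hypothetical scratch location so the example does not touch real files.
path = os.path.join(tempfile.mkdtemp(), "myFile")

# The translated version: correct only if nothing goes wrong in between,
# since an exception from write() would skip f.close().
f = open(path, "w")
f.write("I'm noob at Python!")
f.close()

# The with version: the file is closed when the block exits, even when the
# block exits because of an exception.
try:
    with open(path, "w") as g:
        g.write("I rock at Python!")
        raise RuntimeError("simulated failure after the write")
except RuntimeError:
    pass

print(g.closed)  # the context manager closed the file despite the error
with open(path) as h:
    print(h.read())  # the buffered write was flushed on close
```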

The point is that vocabulary (syntax) is not enough. That’s your first step, nothing more. Which brings me to the next point.

Code has phrases and idioms

If you’re good at a human language and you pick up two books (assumed to be written by professional writers), you would most likely be able to tell instantly that they were written by different people. If you’re even better at that language, you may be able to tell apart two books by the same author written at different points in time. Writers are influenced by style all the time: consciously or unconsciously, they adopt bits and pieces of styles they admire and fine-tune their own. An obvious case in point is a writer whose style has improved over time.


I like to bring up the example of Suzanne Collins here, the masterful author of The Hunger Games trilogy. Like many people, I guess, I picked up those books after the movie sparked my interest. To me, there is a marked contrast between Collins’ first book, The Hunger Games, and her third, Mockingjay. Her style has improved greatly (to my eye). If you don’t like my saying it has “improved”, since style can be subjective, then I’d propose that it is at least different and distinguishable. To Suzanne Collins fans: I’m not saying the first book was bad. Far from it. I’m saying that there’s a great improvement, something wonderful.

Where does all this “style” and “beauty” come in? Phrasing, and language constructs. The way words are strung together into sentences, sentences into paragraphs, paragraphs into chapters and chapters into books all gives rise to a unique reading experience. And the way such stringing happens differs from language to language.

Same for code.

Code is strung together in certain ways that make it beautiful, elegant and comprehensible within that language. Some of those ideas transcend languages, and some are specific to a language because of that language’s capabilities and culture. If you are able to capitalize on those capabilities and that culture, you end up with powerful code that is concisely and elegantly expressed, and your code becomes easy to read.
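As a small illustration, assuming a made-up word-counting task, here are two Python versions that compute the same thing. The first is vocabulary carried over from another language; the second uses `collections.Counter` from the standard library, which a fluent Python reader recognizes at a glance.

```python
from collections import Counter

words = "the cat and the hat and the bat".split()

# Translated from another language: index and dictionary bookkeeping
# that the reader has to decode before the intent becomes visible.
counts = {}
for i in range(len(words)):
    w = words[i]
    if w in counts:
        counts[w] = counts[w] + 1
    else:
        counts[w] = 1

# Idiomatic Python: the intent ("count the words") is the whole line.
idiomatic = Counter(words)

print(counts == idiomatic)       # both approaches agree
print(idiomatic.most_common(1))  # the most frequent word and its count
```

Neither version is wrong, but only the second states its intent directly; in the first, the reader has to reconstruct “count the words” from the loop.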

When it comes to understanding code, recognizing those constructs, idioms and style lets you pattern-match against them, and hence lets you focus on the meaning of the code. The idioms you recognize immediately. The meaning is what changes between different pieces of code, and it is what you should be directing your attention at. If you do not recognize the idioms, you will be reading the code line by line, building a micro picture just to comprehend one segment of it. That doesn’t help your aim of understanding the code at all.

Here’s a very contrived example, in C:

while (*a++ = *b++);

If you recognize this because you’ve done enough C, then you’re immediately thinking at a higher level: null-terminated string copying. You’ll figure out what a and b mean, and skip everything because you already recognize it. You may even be thinking of all the ways that can go wrong, and making a note to check/fix them.

However, if you don’t recognize the “idiom”, then you’ll be thinking of pointers, pointer arithmetic, loops and termination conditions. Not so fun.

Big Caveat: Note the use of the word idiom in double quotation marks. I am not saying that piece of code is good; those are two separate matters. In fact, if you ask me, that’s a bad piece of code. However, if you do enough C, I guarantee you’ll see it, and hence it’s an “idiom”, just not a good one. And that means you recognize it, which is my point.
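For readers who don’t recognize the idiom, here is a rough Python sketch of what the C loop does, one step at a time. It assumes `a` is the destination and `b` the source, models memory as a `bytearray`, and the function name `c_strcpy` is hypothetical. The key detail is that the loop’s condition is the value just assigned, so the terminating NUL byte is copied before the loop stops.

```python
def c_strcpy(dst: bytearray, src: bytes) -> None:
    """Rough model of C's `while (*a++ = *b++);` with dst as a, src as b."""
    i = 0
    while True:
        dst[i] = src[i]  # *a = *b: copy one byte
        if dst[i] == 0:  # the condition is the value just copied...
            break        # ...so the NUL terminator is copied, then we stop
        i += 1           # a++, b++: advance both "pointers"


buf = bytearray(8)
c_strcpy(buf, b"hi\0junk")
print(bytes(buf))  # "junk" after the NUL is never copied

try:
    c_strcpy(bytearray(2), b"hello\0")  # destination too small
except IndexError as exc:
    print("overflow caught:", exc)  # C would write past the buffer instead
```

The sketch also exposes what the original never checks: that the destination is large enough, or that the source is terminated at all. Python raises `IndexError`; in C, the same loop silently runs past the end of the buffer, which is undefined behavior.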

If you only have time for this…

Really, my point is that I think the judgment that code is hard to read is often misplaced. Many people, I feel, judge from the assumption that once one has understood the syntax of a language, one should be able to interpret code (which, I reiterate, is a language, nothing less) as though everything will just make sense. When it doesn’t, code in general is judged to be hard to read.

However, that sentiment is often fueled by the fact that the people in question don’t yet have sufficient mastery of the language. If I were to take a crash course in German and then pick up a German novel, I would read for five minutes, put the book down and swear that German is hard to read. Fair? Absolutely not.

Now, I’m not saying that no code is hard to read once you’ve mastered the language. First of all, there is never complete mastery. More importantly, some code is just plain bad. Let’s take it to the extreme:

main(c,i,t,s,a) { 
  for(t=i=2,s=a=0;t>1|i;t=t?c>47?a*=10,a+=c-48,t:(i=(t<2?s+=a,i-1:a),a=0,1-(c==45)):!c) {
    c=getchar();
    c*=c>47&c<58|c==45;
  }
  printf("%d\n",s);
}

That’s hard to read. No matter what. And yes, I copied it from a code golf example; it’s only there to drive the point home.

I’d like to believe that I’m fairly strong at the English language, and yet there are some books that are very unforgiving to read, much less comprehend. When I force myself through those books, I feel like I’m fighting the language more than grasping meaning, and I’m in no way enjoying the book. The same goes for code.

Then there are “dialects”. Even if I’m decently strong at English, that does not mean I can comprehend Shakespeare the way I can consume a (modern) English novel. I cannot. It’s certainly much easier for me than it would be if I were weak at English, but it’s not the same. Similarly, reading Old English poses a far greater challenge than said novel.

For code, mastering Python makes Ruby that much easier to read, but reading Ruby is still not the same as reading Python. Mastering Lisp makes figuring out Haskell that much easier, but it’s not the same. And it should not be treated as the same.

Finally, code is unique in the sense that it has other elements that either aid or worsen comprehension. Syntax highlighting is one, though that concerns the display of code rather than the code itself. Indentation is another, and it makes a huge difference. Then there is variable naming. However, all these are elements of “good style”. Yes, they make reading easier, but they do not replace knowing the language and everything that makes that language, well, that language. In other words, they do not replace fluency.

In essence, code (and prose) becomes much easier to read once you reach a level of fluency in the language, because at that point you’re freed from wrestling with syntax and the language itself, and you can devote yourself to comprehending the meaning and intentions of the written code (or word).

So, want to grok code? Spend time and become fluent.

Want to reverse-engineer code? Spend even more time and become even more fluent.