PascalCase, camelCase and snake_case

Every programmer, or really anybody who writes code, has probably come across a myriad of coding styles – some pleasant, some not so pleasant. Within the whole of “coding style” sits a small but very significant segment: naming conventions. Coding style certainly covers a lot more than how you name your variables, functions, methods and classes, but it would be easy to argue that naming convention is one of the biggest influences, if not the biggest, on how a piece of code looks. After all, it’s the first thing your eyes will notice – it is the very look of the code.

Over time, two naming conventions have become dominant: camelCase (and its cousin PascalCase), and under_scores. There is an interesting article titled “CamelCase vs underscores: Scientific showdown” that does a sort of informal, semi-scientific study on which naming convention is superior.

I’m going to severely abuse Python here to illustrate the two conventions in the extreme:

<code class="python">class handle_email(object):
  def send_email(self, message):
    try:
      # smtp_lib and smtp_exception are deliberately renamed stand-ins
      # for smtplib and SMTPException, so that every name is under_scored
      smtp_obj = smtp_lib.SMTP('smtp.srv.org', 25)
      smtp_obj.send_mail('from@srv.org', 'to@srv.org', message)
      smtp_obj.quit()

    except smtp_exception:
      print("Error: unable to send email")

class HandleEmail(object):
  def sendEmail(self, message):
    try:
      # smtpLib and sendMail are likewise renamed stand-ins
      smtpObj = smtpLib.SMTP('smtp.srv.org', 25)
      smtpObj.sendMail('from@srv.org', 'to@srv.org', message)
      smtpObj.quit()

    except SMTPException:
      print("Error: unable to send email")</code>

In case you didn’t visit the Whatthecode article linked above, I’m going to take a leaf from the author’s book and ask you to choose, instinctively and before reading on: which do you like? It’s quite unlikely that you find both equally good or equally bad, since everybody feels differently. If you voted for the first, then you’re clearly an under_score kind of person. If not, you’re a camelCase person.

I boil down my choice of which is better to one aesthetic factor and three key factors. To be clear, this is my personal opinion, thought through over the years, along with my justification for it. I am not touting this as fact. With that said, I’d like to take a shot at:

  1. Which simply looks better?
  2. Which gives you the most information?
  3. Which is easier to code in?
  4. Which is better for comprehending code?

1. Which simply looks better?

This is perhaps the question least related to coding, as it deals with nothing more than aesthetics. It has more to do with what the eye (and brain) perceives as beauty than with actual issues such as comprehension and conciseness, which will be dealt with later. As such, the answer to this question is really just a matter of personal taste.

I personally will vote for the under_score convention here, as it sits well with my notion of what beautiful prose should look like – well-spaced, with key words that are easy to pick apart, more spread-out lines, and so on.

My opinion: under_score.

2. Which gives you the most information?

I’d argue that camelCase, together with PascalCase, has the highest fidelity of information. There are two reasons for that. Firstly, camelCase and PascalCase are distinctive, yet belong to the same naming convention. Hence, using camelCase to represent one set of things (say, function names, variable names), and PascalCase to represent another (say, class names, module names), gives immediate clues into the origin and purpose of a given “thing”. Once you’ve assimilated this and it has become second nature, you’ll be reading and comprehending code much faster, perhaps without even realizing it.
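As a rough illustration of that “immediate clue”, here is a minimal sketch that guesses what an identifier names purely from its casing. The function `guess_kind` is a hypothetical helper written for this illustration, and it assumes the convention described above (PascalCase for classes and modules, camelCase for functions and variables). It is not a real classifier, and it expects a non-empty name.

```python
def guess_kind(identifier):
    """Guess what an identifier names, using only its casing.

    Hypothetical helper: assumes PascalCase marks classes/modules and
    camelCase marks functions/variables, as described in the text.
    """
    if "_" in identifier:
        # under_score names are cased the same way for every kind of thing
        return "no hint from casing"
    if identifier[0].isupper():
        return "class or module"
    return "function or variable"


for name in ["HandleEmail", "sendEmail", "handle_email", "send_email"]:
    print(name, "->", guess_kind(name))
```

With the names from the earlier example, HandleEmail and sendEmail land in different buckets, while handle_email and send_email get the same answer.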

Secondly, the camelCase/PascalCase convention is the more condensed of the two conventions. numItems and MilitaryBoat are shorter than num_items and military_boat. On a single identifier, the length may not make much of a difference. However, code isn’t made up of one or two identifiers on a single line (with a few exceptions); it’s made up of many. I believe that the amount of -relevant- information in a given line translates directly into how easy the code is to comprehend. We, as humans, don’t have an infinite ability to keep everything in our heads, and a visual reference is very important in aiding understanding when we cannot memorize and connect everything mentally.

Hence, conciseness of representation has a huge bearing on my choice. As a small point, many underscores on a line are also... ugly. Again, that’s personal opinion.
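The length difference is easy to check. The sketch below converts the two under_score names mentioned above and prints the lengths side by side. `snake_to_camel` is a hypothetical helper written for this illustration, and it assumes the input is all-lowercase words joined by single underscores.

```python
def snake_to_camel(name, pascal=False):
    """Convert an under_score name to camelCase, or PascalCase if pascal=True.

    Hypothetical helper: assumes lowercase words joined by single underscores.
    """
    words = name.split("_")
    first = words[0].capitalize() if pascal else words[0]
    return first + "".join(word.capitalize() for word in words[1:])


for snake, pascal in [("num_items", False), ("military_boat", True)]:
    converted = snake_to_camel(snake, pascal)
    print(snake, len(snake), "->", converted, len(converted))

# num_items 9 -> numItems 8
# military_boat 13 -> MilitaryBoat 12
```

Each identifier saves exactly one character per underscore, which is small on its own but adds up across a line with several multi-word names.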

My opinion: camelCase/PascalCase.

3. Which is easier to code in?

Again, for this I’d go with camelCase/PascalCase. Hitting shift at the start of a new word is just easier (at least on QWERTY keyboards) than hitting the darn underscore key. Even for decent touch-typists, which I consider myself to be, the error rate on the underscore key is far higher than on the shift key, because you can hardly miss the shift key.

Hence, camelCase/PascalCase is often much, much easier to type quickly.

My opinion: camelCase/PascalCase.

4. Which is better for comprehending code?

There’s no real difference between reading other people’s code and reading your own code after some period of time. Both rely on your ability to comprehend the code that you see. Taking comments out of the picture, which should not be there to replace or fix badly named identifiers anyway, I’d say that camelCase/PascalCase is, again, better. camelCase/PascalCase accentuates the interactions between identifiers, which is the (more) important issue, rather than figuring out the name of the identifier.

Let me make the counter-argument first: under_scores closely mimic the way words are crafted into sentences in English, and hence are certainly easier to read.

I don’t agree. Yes, it is easier to read – as words. It is not easier to comprehend – as code. Code is not about one or two lines, it’s about blocks. This is akin to paragraphs in a language, not sentences. A single line of code hardly has much meaning in the grand scheme of things. The act of introducing blanks (represented by underscores) splits words up so it’s easy to read them quickly, but you lose the sense that you are looking at a single identifier. Taken from a “sentence” point of view, you lose the ability to distinguish the interactions of identifiers with each other. Taken from a “paragraph” point of view, you have a bunch of harder “sentences” to read.

Consider this:

remaining_chars = end_of_file_index - current_file_index

and this:

remainingChars = endOfFileIndex - currentFileIndex

The interactions between the three identifiers (in this case, variables) are extremely clear in the second case, but less so in the first case. Throw in your curly braces, indentations, other forms of “whitespace”, and things become less and less clear. The obvious counter-argument is that IDEs will beautifully color identifiers, making my point moot. I disagree that it’s moot though. Yes, syntax highlighting does resolve the issue to a large extent. However, my gripe is that firstly, not all syntax highlighting schemes are appropriately designed. Secondly, there are often times when you browse through code that is not syntax highlighted. Thirdly, I prefer having two forms of pattern matching for my brain to hook on to (shape and color), rather than one (just color).

Let’s try this again, with colors this time:

remaining_chars = end_of_file_index - current_file_index

and this:

remainingChars = endOfFileIndex - currentFileIndex

Different, yes. Sufficient, perhaps. But two levels of distinction are still better.

My opinion: camelCase/PascalCase.

My Personal Feel

My conclusion is this: camelCase/PascalCase has a good number of things going for it, sufficient to overcome the few issues that it poses.

I am not oblivious to some of the issues with this naming convention. I know that it sometimes doesn’t deal well with special words (e.g. URLCharacters), single letters (e.g. MyIPhone), and so on. However, I do feel that these can be worked around by choosing better names.
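To see why those two names are awkward, here is a small sketch that splits camelCase/PascalCase names into words using Python’s `re` module. Both functions are hypothetical helpers written for this illustration: the first treats every capital as a word boundary, and the second keeps a run of capitals together when it is followed by a capitalized word. Neither is a standard algorithm, just one plausible way to do it.

```python
import re


def naive_split(name):
    """Split at every capital letter (hypothetical helper)."""
    return re.findall(r"[A-Z][a-z]*|[a-z]+", name)


def acronym_aware_split(name):
    """Keep a run of capitals together when a capitalized word follows it
    (hypothetical helper)."""
    return re.findall(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+", name)


for name in ["remainingChars", "URLCharacters", "MyIPhone"]:
    print(name, naive_split(name), acronym_aware_split(name))

# remainingChars ['remaining', 'Chars'] ['remaining', 'Chars']
# URLCharacters ['U', 'R', 'L', 'Characters'] ['URL', 'Characters']
# MyIPhone ['My', 'I', 'Phone'] ['My', 'I', 'Phone']
```

The acronym survives the second splitter, but MyIPhone still comes out as “My I Phone” either way, which is the kind of case where picking a different name is the simpler fix.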

I also realize that with camelCase/PascalCase it takes some time to train your brain to identify capitals as word boundaries. However, once you’ve gotten that part down (which doesn’t take long, really), the benefits are apparent.

The human brain is an intricate and extremely powerful pattern-matching machine. Introducing a character that not only represents but also looks like the blank space character impedes a critical function of our pattern-matching abilities – the ability to know when we are looking at a single “thing”, an identifier, and when we are not (i.e. we are looking at interactions between identifiers).

In addition, camelCase/PascalCase naturally introduces greater variation in the way things are named, allowing for conscious or even subconscious deduction of the meaning of an identifier. I’m talking about the fact that when I see PascalCase, I immediately think class, and when I see camelCase, I immediately think function or variable (assuming that’s the convention of whatever you’re reading). Hence, when I read

Vehicle.getType()

I know what Vehicle and getType are. Immediately. No mousing over to check what the IDE tells me, no attempting to jump to function definitions, etc.

vehicle.get_type()

just doesn’t convey the same fidelity of information. IDE or not.