Code is Made of Words

August 27, 2011

I recently had to fish around in some old code and stumbled upon this:

categories.remove(0L); // remove 0 from categories

Wow, thanks dude. If you hadn’t told me, I would’ve spent hours trying to understand what this line does. With the comment, it’s all clearer. Right?

Alright, calm down. This is obviously the wrong kind of comment, and in fact, there shouldn’t be a comment here in the first place. But I’m getting ahead of myself. Let’s cool off a little with a flashback.

Flashback: Avish Learns to Code

When I was first taught how to code (not just hack, but seriously code, professionally), comments were an important part of the craft. We were taught to comment our code frequently, explaining what we were doing. People talked about the ratio of green to black (comments were green back then, at least in the IDEs we were using), and in exercises you could lose points for under-documented code.

That made some kind of sense, at the time. We were working in C and C++, with a huge gap between the mental model of what we were doing (“I’m looking for the next space in this string”) and what the code said (“advance this pointer-to-char until it points to 0x20”). Comments were our way to bridge this gap. When we wrote assembly code, we commented every single line: with assembly code the gap is just so wide that you need to maintain a constant bridge between the code and your thoughts.

The Problem with The Solution

But that was then, and now is now. Programming languages are immensely more expressive and much closer to natural language, or at least, to the natural language of whoever is reading the code. And yet, something from the old days remains in many of us. It manifests in pieces of code that look like that line at the top of this post:

categories.remove(0L); // remove 0 from categories

Even copy-and-pasting this here makes me uneasy. Brrr.

The problem with this comment, of course, is that it adds absolutely nothing to the code. It explains exactly what the code is doing, but the code itself provides the exact same information, even using the exact same words. What’s really missing here is a higher level of understanding.

Being somewhat familiar with the codebase, I can tell you the comment should probably have read:

categories.remove(0L); // remove the "no-category" ID

This comment adds information that wasn’t there before: 0 is used to mark the absence of a category, it says. We’re not removing an arbitrary number, we’re removing a special constant that means “no category”. That makes sense. Hopefully.

But if that’s what we wanted to say, there are far better ways to say it. We could have written this instead:

categories.remove(NoCategoryId);

That’s much better: the code explains itself without any help, and we made it a little bit better by introducing a constant in place of the arbitrary number literal. An even better solution requires a slightly different design:

categories.remove(null);

There is already a word in your language to indicate the absence of a value. Why not use it?
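To make the two alternatives concrete, here is a minimal Java sketch. Everything named here (CategoryCleanup, NO_CATEGORY_ID, removeNoCategoryMarker, removeMissingCategory) is made up for illustration, and the choice of a HashSet is an assumption; the original code doesn't show what kind of collection categories really is.

```java
import java.util.Set;

// Illustrative sketch only: the class, constant and method names are hypothetical.
final class CategoryCleanup {
    // Design 1: 0 is a magic ID meaning "no category", so give it a name.
    static final long NO_CATEGORY_ID = 0L;

    static void removeNoCategoryMarker(Set<Long> categories) {
        // The primitive is boxed to a Long, so this removes the element 0L.
        categories.remove(NO_CATEGORY_ID);
    }

    // Design 2: "no category" is stored as null instead of a magic number.
    // Assumes a Set implementation that permits null elements, such as HashSet.
    static void removeMissingCategory(Set<Long> categories) {
        categories.remove(null);
    }
}
```

Either way the call site says what it means without a comment. The second design only works if nothing else in the codebase relies on 0 being the marker, which is why it needs that slightly different design.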

We’re Better Off Without Them

Modern programming languages allow us to be incredibly expressive in our code. By introducing properly named pieces of code, we can state whatever we wanted to say using the name of a constant, method or variable, rather than a piece of static English text.

And that’s the point I’m trying to make here: that it is preferable to express yourself in code than it is in comments (this is far from being a new idea, but I felt a need to state my own version of it). In other words, code that needs commenting is inferior to code that doesn’t.

A direct corollary is that comments that explain what the code is doing are a code smell: They indicate that the code is not written in a way that allows it to be understood without external aid. These days, you can’t blame this on the language anymore; the fault lies with the code. Think about this the next time someone tells you they’re a great programmer because they comment all of their code.

The What, the Why and the Says Who

The way I see it, there are three types of comments worth talking about.

Wait, before that, let’s get something out of the way real quick and just forget about docstrings, XML documentation, JavaDocs or whatever they’re called in your language. These are nice, sweet pieces of IDE candy that provide a quick reference to the people using your code. They explain the framework, the design, the API, the little details of the can-I-pass-a-null-here-or-will-you-blow-up-on-me kind. I’m not talking about these. I’m talking about comments that try to explain the code itself.

As I was saying, there are three types of comments worth talking about: “what” comments, “why” comments, and “says who” comments.

“What” comments explain what the code is doing. We just covered these. They are used in places where someone reading the code might not understand what the code does. As I said above, comments of this type are a code smell. In any modern programming language, there’s simply no justification for writing code that can’t be understood simply by reading it.

“Why” comments explain why we’re doing something in a particular way — these are used in places where a potential reader would understand what the code does, but might wonder “why do it this way and not the other way?”. They might look like this:

// we build a hashset here so that we can easily find duplicates later with decent performance

You’d often see this kind of comment applied at the method or class level, rather than at the statement level. They should also be pretty rare, considering that any decent language or framework works very hard to ensure that, in the vast majority of cases, the straightforward way of doing something is also the most appropriate one (that’s a pretty fundamental design goal of anything that’s meant to be used by others).

If you find yourself writing too many comments of this kind, you might be using the wrong language or framework, or just not using it the right way. Here, too, a comment is an indication that the code is lacking in some way.
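For reference, here is roughly what a legitimate “why” comment looks like in context. This is a minimal Java sketch built around the hashset example above; DuplicateFinder and findDuplicates are hypothetical names, not code from any real project.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only: the class and method names are hypothetical.
final class DuplicateFinder {
    // Returns every value that was already seen earlier in the list,
    // once per repeated occurrence, in encounter order.
    static List<String> findDuplicates(List<String> names) {
        // Why a HashSet: membership checks take expected constant time, so the whole
        // pass stays roughly linear. Searching a list for each element would be quadratic.
        Set<String> seen = new HashSet<>();
        List<String> duplicates = new ArrayList<>();
        for (String name : names) {
            // Set.add returns false when the value is already present.
            if (!seen.add(name)) {
                duplicates.add(name);
            }
        }
        return duplicates;
    }
}
```

Calling findDuplicates(List.of("a", "b", "a")) returns a list containing just "a". Note that the comment says nothing about what the loop does; it only justifies the choice of data structure.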

“Says Who” comments explain the business requirement behind the code. This covers places where a reader understands what’s being done, but not why it should be done in the first place; the answer to that question is “because the business says so”. For example, you might see something like this:

if (user.IsGuest) continue; // guest users don't participate in user statistics.

The need for this type of comment can be reduced if the code is modeled in a way that captures these requirements clearly in the language of the product (I believe what Domain-Driven Design calls The Ubiquitous Language is an evolution of this concept). In this example, an easy solution is to introduce a method:

if (!ShouldParticipateInUserStats(user)) continue;

This piece of business knowledge, which we previously expressed in a comment, must now be written in code:

bool ShouldParticipateInUserStats(User user) {
    return !user.IsGuest;
}

“Who should participate in user stats? Everyone but guest users”. Code doesn’t get much clearer than this. Again, we transformed a comment into a piece of code, and improved the quality of our code: we captured a business requirement in its own little method, where it can easily be tracked down and changed when the need arises. Win!
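Here is the same idea as a self-contained Java sketch, so you can see the rule and the loop that uses it side by side. The User class, its isGuest field and the countParticipants method are assumptions made up for this example; only the shape of shouldParticipateInUserStats comes from the snippet above.

```java
import java.util.List;

// Illustrative sketch only: User, isGuest and countParticipants are hypothetical.
final class UserStats {
    static final class User {
        final String name;
        final boolean isGuest;

        User(String name, boolean isGuest) {
            this.name = name;
            this.isGuest = isGuest;
        }
    }

    // The business rule lives in one named place: guests are left out of user stats.
    static boolean shouldParticipateInUserStats(User user) {
        return !user.isGuest;
    }

    // Counts the users that the rule above admits into the statistics.
    static long countParticipants(List<User> users) {
        long count = 0;
        for (User user : users) {
            if (!shouldParticipateInUserStats(user)) continue;
            count++;
        }
        return count;
    }
}
```

If the business later decides that, say, suspended users shouldn't count either, there is exactly one method to change, and every loop that calls it picks up the new rule.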

The TL;DR

The next time you find yourself writing a comment, ask yourself which kind of comment you’re about to write, and remember that code is made of words, too.

10 Comments
  1. I actually figured out what categories.remove(0L); does before reaching your explanation. A small win for non-coder me!

    From a business analyst point of view, sometimes comments, or even captured methods for unique requirements are a must, as sometimes I find myself sitting down with one of our programmers trying to track down changes done in the code based on some “Emergency business need” that weren’t documented anywhere.

    Good post mate.

    • Avish permalink

      Actually, I got that one wrong. Someone who knows better alerted me to the fact that 0 actually means “no category” in our code, and is not the ID of the root category. Just goes to show how helpful that comment was.

      Anyway, I updated the post.

  2. Great summary avish and great set of patterns people can easily follow

  3. What’s your opinion on /// method comments? Not the public ones someone will read as an outside user where you should probably be verbose (though even there I’m not sure how much). I’m talking about internal private methods that only you or the people debugging/maintaining the code will read.

    I always find myself feeling silly overexplaining obvious things, but when I don’t do it, I feel guilty. After all “obvious” is so subjective.

    • Avish permalink

      Personally I use a convention where the XML docs (/javadocs/docstrings) serve as documentation that comes instead of reading the code. That is, if you read these comments you can skip reading the code in the method body, unless you’re interested in *how* it does what it does. I wouldn’t use it to document the “why” or the “says who” or the “how”, only the “what” (e.g. “Finds the least profitable client in a list” and not “Uses a heap to sort the list of clients and extract the least profitable”).

      The same advice for “what” comments applies here too: “what” can be better described by properly naming the method or modeling the code in an alternative way which makes things more clear.

      So no, XML docs definitely do not have to be there on every internal/private method. I’d feel guilty about writing too many of these, not about not writing them. Explaining obvious things is a waste of time, and most of the devs on your team should agree on what is considered “obvious”. If some things are obvious to you but not to others, teach the others and lift them to your level using code reviews, emails, wiki, whatever. Comments are counter-productive in this process since they mark the supposedly obvious thing as “requires constant explanation”. Conversely, if you write an email teaching your team about that clever trick you just did, it becomes part of their toolbox and doesn’t require commenting when used again.

  4. Right on.

    This reminds me of a case where someone created a method’s javadoc by copy-pasting all the arguments and method name, and putting some spaces around them. Like this:

    /**
    * Builds a person from the proto.
    *
    * @param age – the age
    * @param siblings – the number of siblings
    * @param name – the name
    *
    * @return the built person
    */
    Person buildPersonFromProto(int age, int numSiblings, String name) {

    • Avish permalink

      Unfortunately some places require documenting every method this way. Naturally, devs will find the path of least resistance (like the above) and, being devs, will eventually automate it with tools like GhostDoc. That’s a tool that specializes in translating code to English, in order to generate comments that by definition add nothing on top of the original code.

      In fact your example looks a lot like GhostDoc output; a human would have at least known that “proto” here means “prototype”. Or “protocol”, I guess.

      • ripper234 permalink

        proto = a proto buffer = an instance of a Protocol Buffer.

        I just made this code up, but the original was similar.

        • Avish permalink

          I figured that might be it but there was no mention of protobuf anywhere else. Doesn’t matter much, as long as we agree that these comments add no value.

  5. Two years of waiting for your next post and I get this?! Just kidding. Great post. Keep it coming!
