An Avish of Sorts

January 14, 2012

Everything Is Smelly

I keep seeing this kind of descriptions in code and in design reviews:

// Contains all the logic for handling Products public class ProductsService { ... }

To me, they looks like this:

Clean ALL the things!

That is, while these attempts are backed by good intentions, they are ultimately doomed to fail and get us farther from the stated goal of cleanliness.

My impression is that “all the things about X” classes are a misguided attempt at increasing cohesion: putting closely-related things together. Or put more formally:

In computer programming, cohesion is a measure of how strongly-related or focused the responsibilities of a single module are.

This Is Not Cohesion

Now, cohesion in itself is a worthy value in design, and is a major element in my “Shift-Deletable Feature” design philosophy (that’s another post for another day). And surely, “all the logic for handling products” is strongly-related: after all, it all deals with handling products. Right?

Well, not really. When we use “all”, and its close friend “everything”, in a class’s description, we usually imply a grouping together of different things, with a single common shared trait or relation between them. If that group of things was cohesive, there would be a better description for it that doesn’t use such foul language. Having to use “all” or “everything” has that familiar, sour smell of an SRP violation, which is closely related to (but not entirely the same as) lack of cohesion.

How Our Need of Structure Impairs Our Design

So why do we keep doing this? Sometimes we don’t know better (and it doesn’t help that new programmers are sometimes taught that this is a good way to structure code), but sometimes we’re misled to think it’s a good idea because it feels organized. Grouping together all the things that have to do with something, as broad as it may be, appeals to our inner nerd’s sense of organization and structure.

In real life, this behavior is justified: putting all those papers and bills and receipts and coconut-scented plastic trees together in an “all the things that relate to the car” folder makes perfect sense. This way, when we need something related to the car, we just have to look through that folder to find it.

But code doesn’t work that way. We don’t organize code just so it’ll be easy to find; we also try to make it easy to change, which is far harder. To achieve this, we need a Separation of Concerns approach, not a Grouping of Loosely-Related Stuff one. Uncle Bob’s alternative wording of the SRP comes in handy here: “A class should have one and only one reason to change”. If a class handles “all the things” that have to do with something, then by definition it has all those reasons to change.

There’s nothing inherently wrong with putting somewhat-related things in the same container; but if we really want to make it easy for us to find things and be able to change one concern without affecting others, then we need to keep on nesting smaller and smaller containers until we reach a single container per concern.

In code, those single-concern containers are classes. For everything else, we have namespaces, folders, projects, packages or whatever your language of choice provides to satisfy your inner organizational urge.

So, whenever I see “all” or “everything” used in a class’s description, I get suspicious and start asking some hard questions.

The comic behind this image is way better than this post, you should totally read it.

August 27, 2011

Code is Made of Words

I recently had to fish around in some old code and stumbled upon this:

categories.remove(0L); // remove 0 from categories

Wow, thanks dude. If you hadn’t told me, I would’ve spent hours trying to understand what this line does. With the comment, it’s all clearer. Right?

Alright, calm down. This is obviously the wrong kind of comment, and in fact, there shouldn’t be a comment here in the first place. But I’m getting ahead of myself. Let’s cool off a little with a flashback.

Flashback: Avish Learns to Code

When I was first taught how to code — not just hack, but seriously code, professionally — comments were an important part of the craft. We were taught to comment our code frequently, explaining what we were doing. People were talking about ratio of green-to-black (comments were green back then, at least in the IDEs we were using) and in exercises you could lose points in for under-documented code.

That made some kind of sense, at the time. We were working in C and C++, with a huge gap between the mental model of what we were doing (“I’m looking for the next space in this string”) and what the code said (“advance this pointer-to-char until it points to 0x20”). Comments were our way to bridge this gap. When we wrote assembly code, we commented every single line: with assembly code the gap is just so wide that you need to maintain a constant bridge between the code and your thoughts.

The Problem with The Solution

But that was then, and now is now. Programming languages are immensely more expressive and much closer to natural language, or at least, to the natural language of whoever is reading the code. And yet, something from the old days remains in many of us. It manifests in pieces of code that look like that line at the top of this post:

categories.remove(0L); // remove 0 from categories

Even copy-and-pasting this here makes me uneasy. Brrr.

The problem with this comment, of course, is that it adds absolutely nothing to the code. It explains exactly what the code is doing, but the code itself provides the exact same information, even using the exact same words. What’s really missing here is a higher level of understanding.

Being somewhat familiar with the codebase, I can tell you the comment should probably have read:

categories.remove(0L); // remove the "no-category" ID

This comment adds information that wasn’t there before: 0 is used to mark the absence of a category, it says. We’re not removing an arbitrary number, we’re removing a special constant that means “no category”. That makes sense. Hopefully.

But if that’s what we wanted to say, there are far better ways to say it. We could have written this instead:

categories.remove(NoCategoryId);

That’s much better: the code explains itself without any help, and we made it a little bit better by introducing a constant in place of the arbitrary number literal. An even better solution requires a slightly different design:

categories.remove(null);

There is already a word in your language to indicate the absence of a value. Why not use it?

We’re Better Off Without Them

Modern programming languages allow us to be incredibly expressive in our code. By introducing properly named pieces of code, we can state whatever we wanted to say using the name of a constant, method or variable, rather than a piece of static English text.

And that’s the point I’m trying to make here: that it is preferable to express yourself in code than it is in comments (this is far from being a new idea, but I felt a need to state my own version of it). In other words, code that needs commenting is inferior to code that doesn’t.

A direct corollary is that comments that explain what the code is doing are a code smell: They indicate that the code is not written in a way that allows it to be understood without external aid. These days, you can’t blame this on the language anymore; the fault lies with the code. Think about this the next time someone tells you they’re a great programmer because they comment all of their code.

The What, the Why and the Says Who

The way I see it, there are three types of comments worth talking about.

Wait, before that, let’s get something out of the way real quick and just forget about docstrings, XML documentation, JavaDocs or whatever they’re called in your language. These are nice, sweet pieces of IDE candy that provide a quick reference to the people using your code. They explain the framework, the design, the API, the little details of the can-I-pass-a-null-here-or-will-you-blow-up-on-me kind. I’m not talking about these. I’m talking about comments that try to explain the code itself.

As I was saying, there are three types of comments worth talking about: “what” comments, “why” comments, and “says who” comments.

“What” comments explain what the code is doing. We just covered these. They are used in places where someone reading the code might not understand what the code does. As I said above, comments of this type are a code smell. In any modern programming language, there’s simply no justification to writing code that can’t be understood simply by reading it.

“Why” comments explain why we’re doing something in a particular way — these are used in places where a potential reader would understand what the code does, but might wonder “why do it this way and not the other way?”. They might look like this:

// we build a hashset here so that we can easily find duplicates later with decent performance

You’d often see this kind of comments applied at the method or class level, rather than at the statement level. They should also be pretty rare, considering any decent language or framework works very hard to ensure that in the vast majority of cases the straightforward way of doing something is the most appropriate (that’s a pretty fundamental design goal of anything that’s meant to be used by others).

If you find yourself writing too many of this kind of comments, you might be using the wrong language or framework, or just not using it the right way. Here, too, a comment is an indication that the code is lacking in some way.

“Says Who” comments explain the business requirement behind the code. This covers places where a reader understands what’s being done, but not why it should be done in the first place; the answer to that question is “because the business says so”. For example, you might see something like this:

if (user.IsGuest) continue; // guest users don't participate in user statistics.

The need for this type of comments can be reduced if the code is modeled in a way that captures these requirements clearly in the language of the product (I believe what Domain-Driven Design calls The Ubiquitous Language is an evolution of this concept). In this example, an easy solution is to introduce a method:

if (!ShouldParticipateInUserStats(user)) continue;

This piece of business knowledge, which we previously expressed in a comment, must now be written in code:

bool ShouldParticipateInUserStats(User user) {
    return !user.IsGuest;
 }

“Who should participate in user stats? Everyone but guest users”. Code doesn’t get much clearer than this. Again, we transformed a comment into a piece of code, and improved the quality of our code: we captured a business requirement in its own little method, where it can easily be tracked down and changed when the need arises. Win!

The TL;DR

The next time you find yourself writing a comment, ask yourself which kind of comment you’re about to write, and remember that code is made of words, too.

September 23, 2009

Refactoring in the future is not an excuse for messy code in the right-now

– “Hey, what’s up with this messy code here?”
– “Oh yeah, don’t look at that, I’ll refactor that soon.”
BZZZT. Wrong answer.

Your code is here to stay — at least for a while, and probably longer than you think. Accept that.
Any piece of code you write is an integral part of the codebase from the second you’ve committed it to source control; it will be used in situations you did not expect when originally writing it, and it will mutate. Because that’s what happens to code in a team-owned codebase.

It’s true: your code will never enjoy the peaceful solitude in exile you intended for it. Any other developers who might come across your messy code will change it, aligning themselves with the messy style around then. They will split it to different parts during one of their own refactorings, spreading your messiness where it’ll be hard to find and remove. And worse of all, they will mimic it, making your messiness the common rule rather than the exception that should never have been there in the first place. All because they assume that anything they find in the codebase is up to quality standards, because otherwise you wouldn’t have committed it.

See, in a team environment, we all implicitly assume that any code in the codebase belongs there until proven otherwise. We assume nobody intentionally commits a broken window and leaves it there. Let me put it this way: your teammates think you’re a responsible developer. Congratulations.

The absolute minimum you can do, if you do find yourself committing sloppy code thinking “I really should refactor this soon”, is to break this assumption by clearly labeling your code as bad. Boldly declare your sloppiness in the comments. Make sure no one could ever mistake this code for good, usable code. Let people know you did a half-assed job, and that they shouldn’t assume this is the correct way to do things. The quality of the codebase comes before your personal dignity.

What’s that? You want to keep your dignity? Right. Well, luckily there’s another option: don’t commit sloppy code in the first place.

Alright, I know what you’re thinking, but stay with me here. I’m not saying everything you write must be flawless. Far from it. Your code doesn’t have to have the most elegant design or the most efficient algorithm, but once you’ve decided on a particular way to implement something, do it in the best way you can, according to your own personal quality standards (and those of your team, if they happen to be different from your own).
Extract common functionality where applicable. Separate concerns. Write automated tests for your code. Maintain naming and styling conventions, for crying out loud. These things aren’t “a nice finish” applied only to your best code. They’re the bare essentials of your work, even — and particularly — when you know it’s flawed. If anything, they will make your future refactoring easier (what do you know, maybe someone else will even do it for you when they see that piece of code, instead of going “WTF?!”). Even if they won’t, at least you’ve gotten into the habit of writing quality code all the time.

Developers around you assume they can count on you to keep the codebase nice and clean. Don’t let them down.

June 24, 2009

The Math of Traffic Jams

Whenever I’m caught in a traffic jam I do three things: I try to relax, I make sure there’s some good music playing in my car stereo, and I start contemplating the mathematics of traffic jams.

In my geeky mind I start thinking about how the jam accumulates when a critical mass of vehicles slows down at the same time, and then how the road can still be jammed long after the original cause has been cleared; or how a jam seems to always form in certain points on certain roads and then clear at another specific point down the road, despite neither of the two points being remarkably different that those around them.

You’d often think something like “oh sure, the jam forms around this interchange because everybody’s pouring in from the east trying to get to work, and so by that interchange it should clear because obviously everybody who joined earlier wanted to get to Route 4, right?”, but it’s rarely like that. It’s more like “the jam starts at this non-remarkable point and then suddenly stops at this other non-remarkable point, and it seems to have nothing to do with cars joining the route or leaving it via interstates”.

I often think about how interesting it might be to model the cars (or more correctly — their drivers) in a computer simulation, and then observe, from outside the system, how the jam is formed and then cleared. If only I knew how to model it correctly.

Why am I telling you all this? Because it turns out some other people had been sharing my thoughts on the subject, only they were knowledgeable enough to actually model everything and run a simulation:

Poor little fellas, stuck in a traffic jam on a road that doesn’t even lead anywhere, and it’s all for our sick entertainment.

Anyway, now they claim to know how to solve “traffic jam equations” and thus how to design roads to better avoid potential traffic jams. I’ll drink to that.

More info in this BoingBoing post which is in turn a summary of this article from Wired. Worth a read if you’re a traffic jam aficionado, like me.

June 20, 2009

Yo, world.

Thought I’d try this blog thing again. It’s a shame: Facebook and Twitter have stolen all of my thoughts. Nevertheless, here goes.