Joel Spolsky once made a famous observation:
There’s a subtle reason that programmers always want to throw away the code and start over. The reason is that they think the old code is a mess. And here is the interesting observation: they are probably wrong. The reason that they think the old code is a mess is because of a cardinal, fundamental law of programming:
It’s harder to read code than to write it.
This is why code reuse is so hard. This is why everybody on your team has a different function they like to use for splitting strings into arrays of strings. They write their own function because it’s easier and more fun than figuring out how the old function works.
It’s been more than 20 years since, so this is no longer considered a “subtle reason,” having firmly graduated into the sphere of common knowledge. But Joel never really explored in depth why reading code is more difficult than writing it. Many explanations have been offered over the years:
- Because when you are writing code, you often don’t immediately clutter it with all the exception handling and edge cases, but these are usually present by the time someone else reads it.
- Because programming languages attempt to balance readability with brevity.
- Because in most languages, small and seemingly insignificant notational changes can change the meaning of code.
- Because programming languages can’t be read aloud.
- Because when you are writing code you know the problem context, the mental model, and you know what the code is supposed to do.
- Because reading code requires working memory to remember and hold references to other places and things (functions, variable names, imported classes from other files, etc.), only some subset of which will turn out to be relevant. In contrast, when writing code, you need to remember only what you actually use.
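The point about notation is easy to see in practice. Here is a minimal sketch in Python (chosen only for illustration; the variable names are made up for this example) of two pairs of expressions that differ by a single character yet mean different things:

```python
# Each pair differs by one character, yet means something different.

grouped = (1)     # parentheses only group: this is the int 1
one_tuple = (1,)  # the trailing comma makes a one-element tuple

true_quotient = 7 / 2    # true division: 3.5
floor_quotient = 7 // 2  # floor division: 3

print(type(grouped).__name__, type(one_tuple).__name__)
print(true_quotient, floor_quotient)
```

A reader skimming either pair has to notice the extra comma or slash to know what the program does, which is a burden prose rarely imposes.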
Contrast this with, say, English prose: the meaning of the text does not generally depend on making references to other concepts defined only elsewhere in the text. It can be read aloud, in a linear order. And the grammar and syntax of English are looser, because ambiguous constructions are allowed (“John greeted my dog and then he left.”).
So it’s hard to argue that working memory requirements, references, and syntax don’t make code hard to read. They certainly do. But everything mentioned so far touches on peripheral and contributory issues without truly nailing the central reason: human language is designed exclusively to encode ideas. Its only purpose is to record and transmit meaning. Programming languages, by contrast, are designed to encode computer commands unambiguously; documenting the meaning of the problem to be solved is left to optional activities like choosing good variable and method names, adding code comments, and other things that have nothing to do with the correctness or successful execution of the program.
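To make this concrete, here is a small sketch in Python. Both functions are hypothetical and invented for this illustration; the idea that the data is a list of `(customer, total)` order pairs is likewise an assumption. The computer treats the two as equivalent, and only the second tells a human reader what problem is being solved:

```python
def f(a, b):
    return [x for x in a if x[1] > b]


def orders_above_threshold(orders, minimum_total):
    """Return the orders whose total exceeds minimum_total.

    Each order is assumed to be a (customer, total) pair.
    """
    return [
        (customer, total)
        for customer, total in orders
        if total > minimum_total
    ]


# Hypothetical sample data, used only to show the two behave the same.
sample_orders = [("ana", 120.0), ("bo", 35.5), ("cy", 100.0)]
print(f(sample_orders, 100) == orders_above_threshold(sample_orders, 100))
```

Nothing in the language requires the second version. The names and the docstring are there purely for the next reader, and the program runs just as well without them.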
So reading code is hard because we must decode the literal meaning of the instructions and, at the same time, whatever documentation exists about the problem they are designed to solve. And as most professional software developers come to learn, it takes great skill to write non-trivial programs correctly, but even more skill to do so in a way that also encodes the intent and meaning of the code.