Comment Driven Development for low-level design before coding

When trying to write code samples in front of someone else, for example during an interview, it's a good idea to start by writing down the steps required before converting those to code. Instead of writing them down just as a list, format them as comments with blank lines in between to make filling out the code easier.

int main(void)
{
    // fizz buzz example:
    // test integers in range 0 to 100

    // print fizzbuzz if divisible by 3 and 5

    // print fizz if divisible by 3 only

    // print buzz if divisible by 5 only

    // if none of the above conditions, just print the number
}
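Here is one way the skeleton above might be filled in, with each original comment kept directly above the code it describes. This is only a sketch. The names fizzbuzz_classify, fb_kind and fizzbuzz are my own. I've assumed that "0 to 100" is inclusive. The decision is kept separate from the printing so it can be checked on its own, and a main() would simply call fizzbuzz().

```c
#include <stdio.h>

// Hypothetical names: the original skeleton only contains comments.
enum fb_kind { FB_NUMBER, FB_FIZZ, FB_BUZZ, FB_FIZZBUZZ };

// Each original comment line now sits directly above the code it describes.
enum fb_kind fizzbuzz_classify(int n)
{
    // print fizzbuzz if divisible by 3 and 5
    // (this check must come first, or 15 would be reported as plain fizz)
    if (n % 3 == 0 && n % 5 == 0) {
        return FB_FIZZBUZZ;
    }

    // print fizz if divisible by 3 only
    if (n % 3 == 0) {
        return FB_FIZZ;
    }

    // print buzz if divisible by 5 only
    if (n % 5 == 0) {
        return FB_BUZZ;
    }

    // if none of the above conditions, just print the number
    return FB_NUMBER;
}

// fizz buzz example:
// test integers in range 0 to 100 (assumed inclusive)
void fizzbuzz(void)
{
    for (int i = 0; i <= 100; i++) {
        switch (fizzbuzz_classify(i)) {
        case FB_FIZZBUZZ: puts("fizzbuzz");   break;
        case FB_FIZZ:     puts("fizz");       break;
        case FB_BUZZ:     puts("buzz");       break;
        case FB_NUMBER:   printf("%d\n", i);  break;
        }
    }
}
```

The order of the comments already fixed the order of the checks. Because the "divisible by 3 and 5" step was written first, 15 can't fall through to plain fizz.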

It's much easier to debug the algorithm or design by rearranging and changing those comment lines than by changing the code once you're committed to all the details: variable names and types, function names and signatures all need to be kept in sync. It also lets you explain your thinking out loud, to others and to yourself, before you get into the implementation phase. Many practicing coders recognize that this is a good idea:

"I realized that's what I do in job interviews... I already do this but only in one weirdly specific environment" - Carter Morgan on BookOverflow

I've also found myself doing this when an algorithm takes more thought than I can manage by writing code straight away, or when whiteboarding a design with a colleague.

Comment First

John Ousterhout takes this further in Chapter 15 of his book "A Philosophy of Software Design", where he advises always writing the comments first: before the code and before the tests. Since we already recognize that this is at least sometimes a useful practice, it shouldn't be hard to accept, but there are nevertheless objections. I'll come to those, but first let's look at the arguments in favour.

Writing the comments is part of the design process

As he says,

"The best time to write comments is at the beginning of the process, as you write the code. Writing the comments first makes the documentation part of the design process. Not only does this produce better documentation, but it also produces better designs and it makes the process of writing documentation more enjoyable."

While writing the comment, you can judge whether the abstraction is right. An overly long explanation probably means that you've got more design thinking to do, and realising this at this stage avoids later refactoring. What are the inputs, outputs, side effects and exceptions inherent in your design, and can anything be simplified now? Have you considered the number of classes, and the trade-off between many small, simple classes with tiny amounts of functionality and fewer, deeper classes, which John advocates in Chapter 4? Have you considered making your classes more general-purpose, which he argues in Chapter 6 leads to simpler programs overall?

This method lets you focus on one aspect at a time. While you are writing the comments for the interfaces, you aren't yet thinking about the implementation, so you can't get distracted by it. Commenting the interface also prepares it for later users, who can find out what they need to provide without having to read the implementation. John puts it this way:

"if users must read the code of a method in order to use it, then there is no abstraction"

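As a sketch of what a comment-first interface might look like in C, the hypothetical function below was specified by its comment before the body was written. The comment covers the inputs, outputs, side effects and errors mentioned above, so a caller shouldn't need to read the implementation. The function name parse_port and the choice of return codes are assumptions made for illustration.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * parse_port: convert a decimal string such as "8080" into a TCP port number.
 *
 * Inputs:       text, a NUL-terminated string of digits with no sign or
 *               spaces; port_out must not be NULL.
 * Outputs:      on success *port_out receives a value in 1..65535.
 * Side effects: may set errno; *port_out is written only on success.
 * Errors:       returns -1 for empty input, non-digits, or out-of-range
 *               values; returns 0 on success.
 *
 * (Hypothetical example: this comment was written before the body below.)
 */
int parse_port(const char *text, unsigned *port_out)
{
    // reject NULL, empty strings, and leading signs or spaces,
    // which strtoul would otherwise accept or skip
    if (text == NULL || text[0] < '0' || text[0] > '9') {
        return -1;
    }

    errno = 0;
    char *end;
    unsigned long value = strtoul(text, &end, 10);

    // require the whole string to be digits and the value to fit a port
    if (errno != 0 || *end != '\0' || value == 0 || value > 65535) {
        return -1;
    }

    *port_out = (unsigned)value;
    return 0;
}
```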
Objections

We are just wasting time commenting code that hasn't been written yet; let's get it working then describe how it works.

This developer is explicitly opting for what John calls the tactical approach: write small pieces of functionality that each satisfy a requirement, hope that it all fits together in the end, and construct a post-hoc design rationale around the result. It's a surprisingly popular method, probably because of two influences: the beginner mindset and TDD.

Beginners struggle more with the low-level issues involved in getting anything working (syntax, the language, libraries, computing concepts, numerical representations and so on) than with software design. So it's an achievement to get something working at all, and choosing the best design seems like a luxury they don't have the capacity to think about at that stage. In any case, they can't yet judge which design is better until they have tried out some possibilities and learned over the longer term what makes for good design, although John's book should accelerate that.

Our code is self-documenting

Long method names are certainly better than cryptic short ones when it comes to a public interface, but they can't really substitute for a few lines of comments. If you claim they are equivalent, then you are asking every user of the method to type out the documentation every time they call it. There's a trade-off with developer efficiency here, but it shouldn't push you to the other extreme of very short method names at the expense of readability.

Long method names can still be cryptic, because you can't really capture all your conditions, assumptions, limits and criteria in the name. Look at this example from Clean Code:

isLeastRelevantMultipleOfLargerPrimeFactor(int candidate)

Whilst there are plenty of methods whose operation is so common and/or simple that they don't need a lot of description, for example, finding a matching item in a list, anything which is unusual, uncommon, custom to the application or non-obvious should be documented for maintenance reasons.

In "Top Developers Don't Need Code Comments", Trisha Gee explains that Martin Thompson at LMAX extracts methods named after the line-level comment each one replaces, and she shows the example of checkIfTheRestaurantExists(), a simple null check. But this doesn't work when you need a longer comment, and it means you have to jump to other methods before you can understand the one you're currently reading.
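To make the contrast concrete, here is a hypothetical C sketch of both styles. Only the name checkIfTheRestaurantExists() comes from Trisha Gee's example. The Restaurant type, its free_tables field and the can_book_* functions are invented for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

// Hypothetical type standing in for whatever the real code looks up.
typedef struct {
    const char *name;
    int free_tables;
} Restaurant;

// Style 1: a line-level comment above an inline check.
bool can_book_commented(const Restaurant *restaurant)
{
    // check if the restaurant exists
    if (restaurant == NULL) {
        return false;
    }
    return restaurant->free_tables > 0;
}

// Style 2: the comment becomes the name of an extracted function.
static bool checkIfTheRestaurantExists(const Restaurant *restaurant)
{
    return restaurant != NULL;
}

bool can_book_extracted(const Restaurant *restaurant)
{
    if (!checkIfTheRestaurantExists(restaurant)) {
        return false;
    }
    return restaurant->free_tables > 0;
}
```

In the extracted version the name reads well at the call site. However, a reader who wants to know what "exists" means has to jump to another function to discover that it is only a null check.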

Test-Driven Development is considered to be best practice

In his interview with Book Overflow, John recounts a meeting with Rob Pike, as respected a name in software as there is. He asked Rob, "What are your pet peeves?" and received this reply:

"Test-Driven Development is a total nightmare, it produces the worst software here at Google." - Rob Pike

John isn't against testing. In Chapter 19, after a section on unit tests and how important they are because they facilitate refactoring, he says:

"Although I am a strong advocate of unit testing, I am not a fan of test-driven development. The problem with test-driven development is that it focuses attention on getting specific features working, rather than finding the best design".

You might ask what is wrong with getting features working. Isn't that what we are here to do? Don't we get paid, and judged, for completing Jira tickets?

John's reasoning is that there are lots of ways to get working features into a system, but without time and thought spent on design, the system will get more and more complex, leading to bugs and making further features harder to add.

The main thrust of the book is that complexity is what kills software projects, more than anything else.

We need to find the time for design at the module level otherwise our tower of abstractions will collapse on top of us.

A tactical approach will work for a while, but it is not a good plan for any software intended to be developed for longer than, say, six months.

"TDD is debugging a system into existence rather than designing it." - John Ousterhout

In TDD, you are encouraged, or rather required, to start a piece of work by writing a test that will fail. Either the functionality isn't implemented yet, so the application and tests build but the test fails with an error message, or the code doesn't exist at all, so the build fails, which also counts as a failing test. Then you add only enough application code to pass the test. In other words, you remove the "bug" of a failing test by adding code.
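Here is a hypothetical illustration of that cycle in C, using a leap-year function of my own choosing rather than anything from the book. Each branch was added only to make the next failing test pass, so the full rule emerges from a sequence of failures instead of being written down first. The comments record which cycle added which line.

```c
#include <assert.h>
#include <stdbool.h>

// Hypothetical red-green example, not taken from the book.
//
// Cycle 1: the 2024 test was written first. With no is_leap_year() defined,
//          the build failed (a "failing test"); the least code that passed
//          was `return year % 4 == 0;`.
// Cycle 2: a new test for 1900 failed against that code, so the century
//          rule was added.
// Cycle 3: a test for 2000 then failed, so the 400-year rule followed.
// The function below is the state after all three cycles.
bool is_leap_year(int year)
{
    if (year % 400 == 0) {
        return true;           // added in cycle 3
    }
    if (year % 100 == 0) {
        return false;          // added in cycle 2
    }
    return year % 4 == 0;      // written in cycle 1
}

void leap_year_tests(void)
{
    assert(is_leap_year(2024));    // cycle 1
    assert(!is_leap_year(1900));   // cycle 2
    assert(is_leap_year(2000));    // cycle 3
}
```

A comment-first approach would instead begin with a one-line statement of the Gregorian rule (divisible by 4, except centuries, unless divisible by 400) and write the function to match it.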

Now we can see what he means when he says:

"the units of development should be abstractions, not features. Once you discover the need for an abstraction, don't create the abstraction in pieces over time; design it all at once (or at least enough to provide a reasonably comprehensive set of core functions). This is more likely to produce a clean design whose pieces fit together well."
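As a rough sketch of designing an abstraction all at once, here is a small fixed-capacity stack in C. Its whole interface, comments included, is written in one go rather than one operation per ticket. The IntStack type, the function names and the capacity of 16 are hypothetical choices for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * IntStack: a fixed-capacity stack of ints with no heap allocation.
 * (Hypothetical example: the interface and comments below were drafted
 * together before any function bodies, rather than push in one ticket and
 * pop in another.)
 */
#define INT_STACK_CAPACITY 16

typedef struct {
    int items[INT_STACK_CAPACITY];
    size_t count;
} IntStack;

// Make the stack empty. Must be called before any other operation.
void int_stack_init(IntStack *s) { s->count = 0; }

// Number of items currently stored (0..INT_STACK_CAPACITY).
size_t int_stack_size(const IntStack *s) { return s->count; }

// Push value on top. Returns false, leaving the stack unchanged, if full.
bool int_stack_push(IntStack *s, int value)
{
    if (s->count == INT_STACK_CAPACITY) {
        return false;
    }
    s->items[s->count++] = value;
    return true;
}

// Copy the top item to *out without removing it. Returns false if empty.
bool int_stack_peek(const IntStack *s, int *out)
{
    if (s->count == 0) {
        return false;
    }
    *out = s->items[s->count - 1];
    return true;
}

// Remove the top item, copying it to *out. Returns false if empty.
bool int_stack_pop(IntStack *s, int *out)
{
    if (!int_stack_peek(s, out)) {
        return false;
    }
    s->count--;
    return true;
}
```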

It follows that nearly all Jira tickets are written badly: they should describe abstractions, not features.

The caveat is that this applies to designing new functionality, not to fixing bugs. For bug fixes, John acknowledges that writing the failing test first ensures that a fix solves the problem it is intended to solve and that the bug stays fixed.

By using TDD, we are writing tests before code and these are a form of documentation, so we don't need to write comments first

Tests are a poor form of documentation. The noise around the actual test is distracting (fixtures, assertions for preconditions, setup and teardown), and you'd have to read several tests to gain the knowledge that could fit in a one-line comment. Examples of usage are always helpful, but they can't be as comprehensive as comments, and they are always distant from the code.
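To illustrate the difference, here is a hypothetical pricing rule in C, stated once in a comment and then again as a test. The Order type, the 5000-cent threshold and the 10% discount are invented for this sketch. The comment gives the threshold, the rate and the rounding in two lines. The test conveys the same facts only if you read all four cases and look past the setup.

```c
#include <assert.h>
#include <stdbool.h>

// Hypothetical order type and pricing rule, for illustration only.
typedef struct {
    long subtotal_cents;
    bool is_member;
} Order;

// Members get 10% off orders of 5000 cents or more, with the discount
// rounded down to a whole cent; everyone else pays the subtotal.
long order_total_cents(const Order *order)
{
    if (order->is_member && order->subtotal_cents >= 5000) {
        return order->subtotal_cents - order->subtotal_cents / 10;
    }
    return order->subtotal_cents;
}

// The same rule recovered from a test: the threshold, the rate and the
// rounding only become clear after comparing several cases.
void test_member_discount_threshold(void)
{
    // setup
    Order below = { .subtotal_cents = 4999, .is_member = true };
    Order at    = { .subtotal_cents = 5000, .is_member = true };
    Order guest = { .subtotal_cents = 9000, .is_member = false };
    Order odd   = { .subtotal_cents = 5005, .is_member = true };

    // precondition checks
    assert(below.is_member && at.is_member && !guest.is_member);

    assert(order_total_cents(&below) == 4999);   // just under the threshold
    assert(order_total_cents(&at) == 4500);      // 10% off at the threshold
    assert(order_total_cents(&guest) == 9000);   // non-members pay full price
    assert(order_total_cents(&odd) == 4505);     // 500.5 discount rounds down
}
```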

Trisha Gee claims that test names can justify and explain the code's functionality. She gives the example MakeSureClientAOrdersGoThroughThisWorkflowUnderTheseCircumstances(), a name that, she argues, removes the need for a comment explaining what the test is doing. But this misses the point that the test is far from the code it is testing, and that such a long name will become unwieldy once ThisWorkflow and TheseCircumstances are replaced with specific cases.

If you only write one test at a time before writing the code and running it, these tests-as-documentation are necessarily fragmented rather than a considered, coherent design.

Writing anything other than code or tests is too close to doing design up front

The TDD philosophy accepts that code will be reworked many times as the understanding of the problem improves, so there should be no problem with revising a written design in the same evolutionary way. But by restricting your thinking to code and tests, you avoid writing down a formula or a problem statement in prose, which would let you explore ideas in the problem domain rather than an implementation in the solution domain.

Zach Tellman gives a great explanation of where the current aversion to design came from. If you want to shake up an industry like software, taking existing approaches and turning all the knobs up to 10 will get you noticed and produce some interesting lessons, conference talks and books. But it rarely produces the best results, as any creative person knows.

Writing anything other than code or tests is too close to writing comprehensive documentation

The Agile Manifesto is necessarily terse, but by "documentation" it means user requirements, system specifications, manuals and so on. If your working software falls over when more people are added, when the project reaches its seventh month, or when developers leave, because you've relied only on face-to-face conversation, then the project has failed.

I'm not convinced you're arguing in good faith if you claim that comments scattered through the source code, helpful though they may be, are what the manifesto's authors meant by "comprehensive documentation". They had all been writing software long enough to know the difference, and expected their audience to know it too.

Comments are always out of date

This argument is about what happens to a codebase over the long term: the code changes as required, and your tests must change with it, but comments can easily be left out of date by these changes. However, you should be having code reviews and pull requests in which comments are checked as thoroughly as the code, so that the team can always rely on them being correct.

Let's instead focus on the usefulness of writing comments first: at the design stage, as an aid to getting the code written. In Better Embedded System Software, Phil Koopman says:

"an important point of doing design is to help the developer understand what is going on before the code is written." (page 107).

If the comments become useless later, they can always be removed once they have served their purpose. It is still worth writing them before the code, and the improved code that results doesn't deteriorate just because you delete the comments months later.

You could say the same thing about tests written before the code, but people don't. Tests that go out of date when the code changes tend to get commented out or #ifdef'd out until the team finds time to go back and rewrite them. People don't make this argument about tests because tests run, and failing tests can block the next stage: pushing, merging to main, releasing. Yet it takes discipline to keep adding tests to match new features, just as it takes discipline to check that the comments match the code. The fact that the previous set of tests still passes says nothing about missing tests for new functionality; that still has to be checked manually by reviewers.

Method names can also go out of date as more functionality is added. Keeping them up to date is easier nowadays because IDEs offer scope-aware renaming, but it is still more work than updating a comment in one place.

Writing the comments first is expensive because we'll have to keep modifying them as the code changes. It would be cheaper to do it once at the end.

John estimates that writing comments takes about 5% of total development time, which sounds about right to me, bearing in mind that deciding what to build and fighting all the libraries, layers and hardware issues take much longer than writing any of the software. He argues that delaying the comments until the end would save only a fraction of that 5%, which is insignificant. In my experience, projects always run out of time at the end, not the beginning, so moving work earlier is a good idea, because late work tends to get dropped in the rush to the finish line. Perhaps many developers are secretly hoping for exactly that, because they don't want to write the documentation anyway.

His other claim is that writing comments first makes the abstractions more stable before you start writing code, which saves time during coding. The time I've spent rewriting code because I changed my mind about the right abstraction can easily grow large, and that rework is where bugs creep in, as things start to work in a different way. Avoiding those bugs is definitely a time saver.

We don't care if there are no comments or documentation, we can read the code to see what it does.

This elite developer can not only quickly understand everyone else's code but has also memorised all the design decisions and reasoning that led to this code, and its interactions with the rest of the codebase. But for how long? What about other developers, or successors, or when the company decides to bring in contractors or share code with partners? Is this developer just making himself indispensable to the business?

If you find it hard to believe that anyone would actually make this argument, see this discussion, in which John says that Bob Martin claims 'when a group of developers is working on a body of code they can collectively keep the entire code "loaded" in their minds, so comments are unnecessary: if you have a question, just ask the person who is familiar with that code', and Bob seems to agree.

Trisha Gee also argues that the knowledge transfer involved in pair programming is a reason for having no comments in the code.

Of course, code should be readable by anyone familiar with the language, libraries, operating system and underlying hardware, and John devotes Chapter 14 to choosing names, Chapter 17 to consistency and Chapter 18, "Code Should be Obvious", to saying all this. But he still advocates comments, because they are not only easier to read than code but also carry information at a different level of detail from the code.

The reason for code change is in the commit log

When you're maintaining the code, you're looking at the code. You aren't looking at the commit log unless you've narrowed the problem down to a particular set of lines or files. So you're not going to read the commit log, even if your tools readily expose git blame while you browse, because your focus is on how the code works now, not its history. Only once you think you've found a root cause are you likely to use these tools to see how and why the bug came about, and who made the change. In any case, the commit log doesn't always travel with the code: files get copied into new projects, sent to other groups, or chopped around so much that the history isn't available.

Are the commits granular enough that each commit message is unmistakably tied to the lines that changed? Although we know we should commit small sets of changes that work together, developers are often tempted to delay committing until a larger set of changes is ready, which makes it harder to associate the commit message with particular lines.

By definition, the commit log is written after the code, so it can't provide the same value as design comments written prior to the code.

We'll auto-generate the comments later using Doxygen or AI, or a technical author will write them for us

Those comments will be superficial and of dubious benefit to later maintainers, but more importantly, they can't help the design process if they are generated after the implementation. If they are not written by the original programmer, they are more likely to be misleading, wrong or inadequate.

Doxygen tooling in IDEs automatically writes comments for methods by analysing the signature, saving the developer some typing. But I worry that it also saves the developer some thinking. Going back to Charles Petzold's 2005 essay about IntelliSense:

"In fact, IntelliSense has become the first warning sign that you haven’t properly included a DLL reference or a using directive at the top of your code. You start typing and IntelliSense comes up with nothing. You know immediately something is wrong.

And yet, IntelliSense is also dictating the way we program.

For example, for many years programmers have debated whether it’s best to code in a top-down manner, where you basically start with the overall structure of the program and then eventually code the more detailed routines at the bottom; or, alternatively, the bottom-up approach, where you start with the low-level functions and then proceed upwards. Some languages, such as classical Pascal, basically impose a bottom-up approach, but other languages do not.

Well, the debate is now over. In order to get IntelliSense to work correctly, bottom-up programming is best. IntelliSense wants every class, every method, every property, every field, every method parameter, every local variable properly defined before you refer to it. If that’s not the case, then IntelliSense will try to correct what you’re typing by using something that has been defined, and which is probably just plain wrong."

Similarly, the Doxygen comment generator pushes us to write the code first. But we can push back: write our own comment before we go into the implementation, and only later revisit the method to autogenerate the Doxygen lines, pasting our original comment in as the @brief. Let's not get bullied by our tools into working in a suboptimal order.
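Here is a minimal sketch of that order of work, using a hypothetical count_above() function. The plain comment on the prototype is the one written before the body. The Doxygen block on the definition is what you might add later, using the standard Doxygen commands @brief, @param, @p and @return, with the brief pasted from the original comment. In a real file the Doxygen block would replace the plain comment; both are kept here only to show the two stages.

```c
#include <stddef.h>

// Stage 1, written before the body existed (hypothetical function):
// Count how many of the first `count` entries of `values` are strictly
// greater than `threshold`. Returns 0 when count is 0; values may then be NULL.
size_t count_above(const int *values, size_t count, int threshold);

/**
 * @brief Count how many of the first @p count entries of @p values are
 *        strictly greater than @p threshold.
 *
 * Stage 2, added once the code settled: the generated @param/@return
 * skeleton, filled in, with the original comment pasted in as the brief.
 *
 * @param values    Array to scan; may be NULL only when @p count is 0.
 * @param count     Number of entries to examine.
 * @param threshold Entries must exceed this value to be counted.
 * @return The number of entries greater than @p threshold.
 */
size_t count_above(const int *values, size_t count, int threshold)
{
    size_t n = 0;
    for (size_t i = 0; i < count; i++) {
        if (values[i] > threshold) {
            n++;
        }
    }
    return n;
}
```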

This isn't a radical new idea

Right, it's been around for a while: here are a couple of other blog posts about it, and a Hacker News discussion.