Thursday, May 15, 2008

Design Principle: Don't Repeat Yourself

There's a design principle I neglected to mention in my initial list but which certainly merits attention.  That principle is this:  whenever possible, don't repeat yourself (DRY).  Put another way, do things one time, in one place rather than having the same or similar code scattered throughout your code base.

There are two main reasons to follow the DRY principle.  The first is that it makes change easier:  new behavior only has to be added in one place.  The second is that it helps substantially when it comes time for maintenance:  a bug fixed once is fixed for every caller.

I was once told by an instructor in a design patterns class that the Y2K problem wasn't caused by using 2 digits to represent a 4-digit year.  Instead, it was caused by doing so all over the place.  While the reality of the situation is a little more complicated than that, the principle holds.  If the date-handling code had all been in one place, a single change would have fixed the whole code base, and nobody would have had to pull tons of COBOL coders out of retirement to fix all the business applications.
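To make the point concrete, here is a minimal sketch of what "all in one place" could look like.  It is illustrative only:  the function names, the record layout, and the pivot value of 70 are assumptions of mine, not anything taken from a real Y2K fix.  The idea is that every caller asks one helper how to interpret a two-digit year, so changing the rule means editing a single function.

```python
# Hypothetical example: the one place that knows how a two-digit year
# is interpreted. The pivot value is an assumption for illustration.
PIVOT_YEAR = 70  # 00-69 are read as 20xx, 70-99 as 19xx


def expand_year(two_digit_year: int) -> int:
    """Convert a two-digit year to a four-digit year using the pivot."""
    if not 0 <= two_digit_year <= 99:
        raise ValueError("expected a year between 0 and 99")
    century = 1900 if two_digit_year >= PIVOT_YEAR else 2000
    return century + two_digit_year


# Two unrelated callers (both hypothetical). Neither contains any
# century logic of its own; they both defer to expand_year.
def invoice_year(record: dict) -> int:
    return expand_year(record["yy"])


def age_in_years(birth_yy: int, current_year: int) -> int:
    return current_year - expand_year(birth_yy)
```

If the interpretation rule ever needs to change, only expand_year is touched and both callers pick up the new behavior.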

When I first started working on DVD, I inherited several applications that had been written to test display cards.  They had started life as one large application which chose its behavior based on a command-line switch to test various aspects of DirectDraw that DVD decoders relied upon.  Some enterprising young coder had decided we should have separate executables for each of these tests, so he made 6 copies of the code and then modified the command-line parser in each to hard-code the test to be run.  The difficulty here is that we now had 6 copies of the code to maintain.  Every time we found a bug in one application, we would have to go make the same fix in the other 5.  It wasn't uncommon for a bug to be fixed in only 4 places.

Shalloway's Law:  “When N things need to change and N>1, Shalloway will find at most N-1 of these things.”

This principle applies to everything you write, not just to copying entire applications.  When you find yourself writing the same code (or substantially the same code) in 2 or more places, it is time to refactor.  Extract a method and put the duplicated code in that method.  When the code is used by more than one application, extract it into a function in a shared library.  This way, whenever you want to change something, a change in one place will enhance all callers.  Likewise, when something is broken, a single fix will automatically reach all callers.
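As a small sketch of the Extract Method step described above, assume two functions that each used to clean up a person's name inline in exactly the same way.  The names here (normalize_name, greeting, mailing_label) are made up for the example.  After the refactoring, the shared logic lives in one helper and both callers use it.

```python
def normalize_name(raw: str) -> str:
    """The extracted method: collapse runs of whitespace and title-case."""
    return " ".join(raw.split()).title()


def greeting(raw_name: str) -> str:
    # Previously this function cleaned up the name inline.
    return f"Hello, {normalize_name(raw_name)}!"


def mailing_label(raw_name: str, city: str) -> str:
    # Previously this function repeated the same cleanup code.
    return f"{normalize_name(raw_name)}\n{city}"
```

A fix to normalize_name, such as handling a new edge case, now applies to the greeting and the mailing label at once.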

Note that this is a principle and not a law.  There are times when substantially similar code is just different enough that it needs to be duplicated.  Consider the alternatives first though.  Can templates solve the problem?  Would a Template Method work?  If the answer is no to everything, then duplicate the code.  You'll pay the price in increased maintenance, but at least you'll be aware of what you are getting yourself into.  It is worth putting a comment in the code to let future maintainers know that there's similar code elsewhere that they should fix as well.
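For readers who haven't met the Template Method pattern, here is a rough sketch of the idea, loosely modeled on the display-card tests mentioned earlier.  The class names and log strings are hypothetical, and the sketch is in Python, which has no C++-style templates, so it only illustrates the Template Method alternative.  The base class owns the steps that every test shares, and each subclass supplies only the step that differs.

```python
from abc import ABC, abstractmethod
from typing import List


class DisplayTest(ABC):
    """Shared skeleton for a test; subclasses fill in one step."""

    def run(self) -> List[str]:
        # The template method: the order of steps is fixed here, once.
        log = ["set up display"]
        log.append(self.exercise())
        log.append("tear down display")
        return log

    @abstractmethod
    def exercise(self) -> str:
        """The one step that varies between tests."""


class OverlayTest(DisplayTest):
    def exercise(self) -> str:
        return "exercise overlay surface"


class FlipTest(DisplayTest):
    def exercise(self) -> str:
        return "exercise page flipping"
```

Setup and teardown are written once in DisplayTest.run, so a bug in either is fixed for every test rather than in N-1 of them.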


  1. I heard a very similar rule-of-thumb from my OO prof back in college. He worked at SGI and was just teaching as an adjunct, but I still have the phrase burned into my mind because I adhere to it to this day: "If you find yourself doing a copy/paste while you're coding, you're likely doing something wrong".

  2. Object-oriented design and design patterns can seem complex. There are a lot of ideas and cases to consider.

  3. This is my number one rule (its corollary is "don't hard-code constants").
    If there's anything that will make your successor want to hang you in effigy (thanks Raymond), it's copying and pasting code.

  4. No doubt I'm telling Grandma how to suck eggs here but..
    This reminds me very much of the traditional computer science arguments relating to cohesion and coupling.  If you are tempted to cut & paste a piece of code (functionality) from somewhere, with the intention of modifying it slightly, there is a good argument for generalising the existing functionality to handle the new requirements as this will lead to less code to maintain in future.  The counter argument is that by modifying the existing functionality you risk breaking systems that depend on that functionality, and require additional interfaces to that functionality.  Basically, do we increase cohesion (good), or minimise coupling (good), where the two are in opposition.  My OO approach is usually to add a new interface to the existing functionality, in order to minimise potential for breaking existing code, i.e. favouring strong external cohesion at the cost of some extra coupling.  Where I'm adding additional parameters to support new requirements, I base it on the nearest interface, and change the body of that interface to call the new interface with default values for new parameters.  This is an attempt to maximise internal cohesion.
    Interestingly ( or not :) ), I'm seeing quite a bit of discussion on cyclomatic complexity at the moment on another forum, and I find that changing older and legacy interfaces to call newer interfaces as a mechanism for maximising internal cohesion also reduces cyclomatic complexity.
    Really just a load of obscure technical jargon referring to the same thing, and nothing that new to experienced programmers.

  5. [...] a good programmer cultivates the virtue of laziness. (But not just any laziness: you must be aggressively, proactively lazy!) -- Chris Pine, dealing with the DRY rule