Monday, February 28, 2005

Why building software isn’t like building bridges

I was having a conversation with a friend the other night and we came across the age-old “software should be like building buildings” argument.  It goes something like this:  Software should be more like other forms of engineering, such as building bridges or buildings.  Those, it is argued, are more mature engineering practices.  If software engineering were more like them, programs would be more stable and projects would more often come in on time.  This analogy is flawed.

Before I begin, I must state that I’ve never engineered buildings or bridges before.  I’m sure I’ll make some statements that are incorrect.  Feel free to tell me so in the comments section.

First, making software, at least systems software, is nothing like making buildings.  Engineering a bridge does not involve reinventing the wheel each time.  While there may be some new usage of old principles, there isn’t a lot of research involved.  The problem space is well understood and the solutions are usually already known.  On the other hand, software engineering, by its very nature, is new every time.  If I want two bridges, I need to engineer and build two bridges.  If I want two copies of Windows XP, I only engineer and build it once.  I can then make infinite perfect copies.  Because of this, software engineering is more R&D than traditional engineering.  Research is expected to have false starts, to fail and backtrack.  Research cannot be put on a strict timeline.  We cannot know for certain that we’ll find the cure for cancer by March 18, 2005.

Second, the fault tolerances for buildings are higher than for software.  More often than not, placing one rivet or one brick a fraction off won’t cause the building to collapse.  On the other hand, a buffer overflow of even a single byte could allow a system to be exploited.  Buildings are not built flawlessly.  Not even small ones.  I have a friend who has a large brick fireplace inside their room rather than outside the house because the builders got it wrong when they built it.  In large buildings, there are often lots of small things wrong.  Wall panels don’t line up perfectly and are patched over, walls are not square to each other, etc.  These are acceptable problems.  Software is expected to be perfect.  In software, small errors are magnified.  It only takes one null pointer to crash a program or a small memory leak to bring a system to its knees.  In building skyscrapers, small errors are painted over.

Third, software engineering is incredibly complex, even compared to building bridges and skyscrapers.  The Linux kernel alone has 5.7 million lines of code.  Windows 98 had 18 million lines of code.  Windows XP reportedly has 40 million lines of code.  By contrast, the Chrysler Building has 391,881 rivets and 3.8 million bricks.

Finally, it is a myth that bridge and building engineering projects come in on time. One has to look no further than Boston's [thanks Mike] Big Dig project to see that.  Software development often takes longer and costs more than expected.  This is not a desirable situation and we, as software engineers, should do what we can to improve our track record.  The point is that we are not unique in this failing.

It is incorrect to compare software development to bridge building.  Bridge building is not as perfect as software engineers like to think it is and software development is not as simple as we might want it to be.  This isn’t to excuse the failings of software projects.  We can and must explore new approaches like unit tests, code reviews, threat models, and scrum (to name a few).  It is to say that we shouldn’t ever expect predictability from what is essentially an R&D process.  Software development is always doing that which has not been done before.  As such, it probably will never reliably be delivered on time, on budget, and defect free.  We must improve where we can but hold the bar at a realistic level so we know when we've succeeded.

18 comments:

  1. I actually agree with the meat of your post 100%, and don't care to argue those points at all. I would however like to point out that the Big Dig fiasco took place in the jolly old city of Boston, rather than in NJ.

  2. But it's easy to see why confusion might arise for an inexperienced person.


    Software often serves the same function as a bridge or building, so the metaphor comes to mind naturally. Expanding this metaphor to places it doesn't belong is the problem.

  3. Thanks for the correction Mike. I've made a change in the post to give credit to Boston for the Big Dig.

  4. Some new wheels were invented in the construction of the Millau Bridge. http://www.technologystudent.com/struct1/millau1.htm

  5. I agree that, for the most part, making software is not engineering. I've been struggling with the pros/cons of that lately. Ok - maybe I've just been inspired by the sudden wealth of new "process" we have for Longhorn. Or maybe inspired a bit by my father, who is an actual engineer and likes to poke fun at my job title.


    (IMHO) The real reason is because engineering involves building things whereas making software consists mostly of creating bad analogies. Watch me try my hand at it, proving I'm not an engineer. ;-)


    The reason working on Windows isn't really engineering (despite our job titles) is that people's lives don't depend on it. Why do those civil engineers building the bridges tend to use proven techniques/materials/etc.? Why do they have professional licensure? Because bridges can kill people. If (when?) Windows runs on a kidney dialysis machine, there will be a need for real engineers and not just engineers in name. Like those NASA folks James mentioned, for example.


    I'm not buying the "fault tolerances different" argument, either. Software can have bugs and still work. There can even be buffer overruns (gasp!). Sad as it may seem, people don't expect software to be perfect. Have any idea how commonly people reboot to solve their problem on a home computer? They probably complain about it in something like the way I complain about the poor design of the paper towel dispenser in the men's room at work. I try to pull one towel out and half the time the whole stack falls out and ... you get the idea. If the paper towel dispenser doesn't work as something analogous to a bug for you, I can similarly rant about how there's always at least one elevator broken in one building I go to or how the exit door nearest my office always seems to stick. My architectural UX sucks. Minor hassles. I can't fix them, so I just accept it and move on.

    Note: Those flaws in the building were not dire problems in the building's core architecture. If that were horribly flawed the building would be condemned. People's lives again. Not like software, really. I'd like to formally apologize to that tortured analogy.

    Also note: I'm not saying any of this is an excuse for being a bad tester and I'm not saying I want to ship bugs - just trying to add perspective.

  6. Interesting Discussion... I'm in the same boat as Drew. My father is a Civil Engineer and I catch hell sometimes for the "improper" use of the word engineer. Some of my thoughts can be found here:


    http://blogs.msdn.com/jeremyk/archive/2005/03/02/383759.aspx

  7. I agree with Drew. As long as computers are not central to life or livelihood, the systems that control them can be treated as "art." The artist says, "this is my statement, deal with it." Once they ARE central to life or livelihood -- necessary for our jobs, controlling our automobiles or home appliances, regulating medical equipment, etc. -- then they must leave the realm of art and enter the realm of true science and (civil) engineering.


    I think we are at, or very close to, the time when people are not just able, but REQUIRED by the realities of society and economy, to "live" in the infomatic "homes" our industry produces. So it might be worthwhile to recall the Code of Hammurabi:


    "If a builder build a house, and it collapse and kill the owner, the builder shall be put to death."


    How many people are trusting their jobs and their fortunes to Windows, or personal computers in general? Let's rephrase that: how many MILLIONS of people are doing so? How soon before lives are at stake? What if we had to live under the Code of Hammurabi, as applied to software "structures"?


    Back when I was writing software for a living, I always tried to pretend that I was subject to Hammurabi's laws. I got as close as I could to meeting that standard, but had Hammurabi's laws actually been enforced, I wouldn't be here now! Fortunately, putting the bar that high was considered a game of "overkill" in those days. Perhaps not so much anymore.

  8. I think that software development definitely belongs in an engineering class. Mainly because engineering is the process of solving difficult or new problems. I think the comparisons here in this post are apples and oranges but both require the same type of problem-solving skills. I mean this is why they made us take all of those silly 'weed-out' courses in college, right?

  9. Everyone, make sure to catch the parallel discussion going on over at JeremyK's blog as well (link above).


    I think there are probably 2 issues here:


    1) Is software development an "engineering" discipline?

    2) Can adopting more process from "real engineering" make software engineering better?


    To the first, I say it is largely a semantic argument. There are aspects of software development that are very much like engineering. If engineering is, as Jeremy says, "the application of known technology, materials, and knowledge to solve a problem" then software development is clearly engineering. It is, however, a different sort of engineering than civil engineering. Civil engineering, with so many lives on the line is, by its nature, very conservative. Software engineering can be this way too. I recall back in high school hearing a Boeing engineer explain what they had to do to develop software for airplanes. One thing that stuck out to me is that they don't ever use dynamic memory allocation. There are no news or deletes in their code. This is software engineering that is more analogous to civil engineering. It's also software engineering that will develop new capabilities very slowly. Most software development is not so conservative. It doesn't have to be. Does this make it not an engineering practice? Opinions will differ but I say no.


    To the second point, which is where the meat of my original essay was, I think the answer is not really. The answer to the first question, that software development and civil engineering have vastly different goals, dictates that the answer to the second has to be largely no. Unless software development wants to cut back vastly on what we try to accomplish, we cannot become as fault-tolerant as, say, a bridge. A good example is Trusted Solaris. I don't have any experience with this product lately, but 10 years ago it was much more secure and stable than the untrusted variant. It was also much slower and more feature-deprived. There is a tradeoff.


    The point here is that we must acknowledge that our craft is materially different from civil engineering and that there is no free lunch in becoming more like civil engineers. It would have a radical effect on what we can accomplish.

  10. I agree with Steve that there is "no free lunch" in becoming more like civil engineers, and also that doing so would have a radical effect on "what you can accomplish." Both are true, as far as they go.


    The questions that such statements dodge are, however, "What is 'accomplishment'?" And, "To what extent is a software package 'art and entertainment,' as opposed to practical, reliable tool?" When you ship software under the status quo, what utility and value have you provided the customer, exactly? If you ship something that has to be rebooted five or six times a day, occasionally eats data (inspiring a paranoid "save often" and "backup nightly" regimen), is subject to frequent re-installation or updating, and is susceptible to virus attacks or other security breaches, you have to count the user's frustration, not to mention time and data lost, against whatever "improvement" in his situation your software purports to provide.


    The challenge to produce trusted, reliable software on anything faster than a glacial timetable is, probably, not as sexy as the challenge to get the latest bell or whistle to market before the other guy. But it is a REAL engineering challenge. Software developers have to decide how much they want to be entertainers and dabblers, and how much they want to produce packages that "just work," conveniently, efficiently, and reliably. If they decide to go down the latter path, then their task will be to try to put as much well-characterized innovation into each successive product generation as possible, and to reduce the amount of time between generations. That will NOT be done by maintaining that software is essentially "R&D" (and more "R" than "D"). It WILL be done (I believe) by people who understand the different, yet complementary and interdependent natures of "R" and "D," and who structure their operations to optimize the execution of each development stage.


    Anyone who likes the "R&D," high-uncertainty approach to development should probably veer over into games and entertainment wares. I don't say this mockingly. The point is that, while art and toolmaking (or appliance making) can be blended, the expectations for art are entirely different than the expectations for tools and appliances: still, both are vitally needed in our society, and the practitioners in each area are to be respected for the different value they contribute.


    Just, please, don't treat an OS as if it were a one-of-a-kind work of art. It IS a building. Architects can make buildings beautiful, and that's to be encouraged. But at heart, they are to serve a purpose with convenience, safety and reliability.

  11. James, I'm afraid you're reading too much into what I'm saying. I am not trying to justify shoddy code, especially at the operating system level. We must, however, recognize that there are different standards for different levels. The scheduler and the virtual memory system have to be like bridges. In Windows NT/2000/XP, they are. Code changes are taken very seriously and a lot of effort is put into quality at that level. Because of this, you'll rarely see an XP machine blue screen except for a driver bug (which we don't write).


    That, however, is not really the point of my original essay. What I was attempting to do was to counteract the notion that if we just act more like civil engineers, we'll finish projects on time, on budget, and with no bugs. Civil engineering projects often run over their deadlines and over their budgets. They do ship with few (vital) bugs. However, they also ship with few features.


    The implications of this are clear. Treat the critical parts of an OS or system like civil engineering and make it stable first and add features later. However, at the periphery, this doesn't work. History is replete with examples of solid, featureless products that were defeated by flashier but less stable products. Consumers want features and price, not stability. Why else would they buy $30 no-name DVD players instead of something from Toshiba or Sony, which would probably last twice as long and have half as many bugs?


    We at Microsoft are trying, via the Trustworthy Computing initiative, to walk the line between features and stability on the periphery. How well are we doing that? This remains to be seen. Windows Server 2003 had a lot of features and was also very solid. We might be doing well. Watch and see.

  12. Software Development IS like bridge building when you have a bunch of software developers involved in a contest to make a bridge out of some random parts. See my post, "Bridge Building and Software Development" at http://blogs.msdn.com/dave_froslie/archive/2005/03/07/389113.aspx.



    One point that you can take out of my post is that I took most of the engineering out of bridge building. I didn't know the strength of my materials. I didn't try to calculate the forces applied to various components to see where my weak points were on the bridge. I didn't run a simulation to learn more about my design. I'm not a civil engineer, but I would guess that these would be parts of his approach to a typical design.


    I would agree with much of Steve's commentary, yet there are still some similarities between the disciplines in terms of challenging requirements, the use of design patterns, and project management challenges. As Steve concludes, there is no doubt that we have to continue to get better in our software development endeavors.

  13. I was listening to an interview with Alistair Cockburn tonight on my way home and thought he had some...

  14. Scott Rosenberg just published a new book called Dreaming in Code about a project to create a new personal

  15. PingBack from http://warrenseen.com/blog/2007/02/21/a-freelance-programmers-manifesto/

  16. Jt Gleason contends that building software is not like building bridges because of the halting problem.

  17. Attention Mapping: The 10-Point Exercise (tags: webdesign process) A Freelance Programmer’s Manifesto (tags: freelance business software) Why building software isn’t like building bridges (tags: software development)...
