Ruminations on Computing: November 2005

Wednesday, November 30, 2005

Language Inefficiencies

I spent some time yesterday trying to learn Perl. I had looked at it a few years back but never had a use for it. I now need to write a tool for our build environment and, given what is available there, I am required to use Perl. The first thing I noticed about Perl is that it is very powerful. The second thing I noticed is that it is really ugly. There are too many ways to accomplish the same result. Many newer languages suffer the same fate, only to a lesser extent.

What I want to talk about is keeping a language small and compact. If there is already a way to accomplish a task, please don't invent another one with slightly different syntax just to save a few keystrokes. C++ is pretty good about this. Other than the multiple ways to cast, there is not much redundancy in the language. The namespace is fairly unencumbered by keywords. Modern languages like C#, Perl, Ruby, and Python, however, seem to burn namespace like it is going out of style. They invent new keywords and operators that add nothing to the language.

One of my favorite examples comes from C#. The operator 'as' strikes me as wholly unnecessary. The following two snippets accomplish the same result:

CFoo cf = myObj as CFoo;

- and -

// Initialize to null so a failed check leaves cf null, as 'as' does
CFoo cf = null;
if (myObj is CFoo) {
    cf = (CFoo) myObj;
}

In both cases, we are checking if myObj is of type CFoo and if so, setting the variable cf to point to it. Why the need for as? What does it add to the language?
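Since C++ is held up above as the less redundant language, here is a rough sketch of how the same two forms might look there. The types CBase, CFoo, and CBar and the helper names as_foo and is_then_cast are hypothetical, invented only for this illustration; the one language feature relied on is that dynamic_cast on a pointer yields nullptr when the object is not of the requested type.

```cpp
// Hypothetical types for illustration only; they are not from the post.
struct CBase {
    virtual ~CBase() = default;  // a virtual member makes the type polymorphic
};
struct CFoo : CBase {
    int value = 42;
};
struct CBar : CBase {};

// One-step form, comparable to C#'s 'as':
// yields nullptr when obj does not point at a CFoo.
CFoo* as_foo(CBase* obj) {
    return dynamic_cast<CFoo*>(obj);
}

// Two-step form, comparable to 'is' followed by a cast.
CFoo* is_then_cast(CBase* obj) {
    CFoo* cf = nullptr;  // start as nullptr so a failed test matches the one-step form
    if (dynamic_cast<CFoo*>(obj) != nullptr) {  // the "is" test
        cf = static_cast<CFoo*>(obj);           // the cast
    }
    return cf;
}
```

Under these assumptions, both helpers return the same pointer for a CFoo object and nullptr for a CBar object or a null input.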

Perl is much, much worse. A glaring example is the 'unless' keyword. Instead of typing if(!foo), you can type unless(foo). Much better, right? No. There are many other redundancies. && and 'and' do the same thing, differing only in precedence. You can choose whether or not to put parentheses around your subroutine calls. You can put if before or after the code you want executed. The list could go on almost indefinitely. The worst part about Perl is that its advocates are proud of all this redundancy. Even newer, more compact languages such as Ruby have redundant operators.

Here is my rule for adding a new operator or keyword to a language: Does this operator give me the ability to do something I couldn't do before? If yes, consider adding it. If no, reject it.

I'm not here to bash Perl or any of the other languages. They all have their place and are powerful. They also all have a large following. However, it seems like they could be even better if their designers were a bit more careful. Making something that is already possible merely syntactically easier has the effect of making the language more complex. While making something a bit simpler to express, you have made the language harder to learn and to retain in memory. If only language authors would think twice before making an addition to their language. Having hundreds of keywords and multiple ways to do everything makes the language harder, not easier, to use.

Friday, November 4, 2005

Blog Meltdown?

I've noticed a trend recently. Or, at least I think I have. I have certainly noticed a few data points that look like a trend. Blog software is failing to scale. For all the penetration of blogs and the millions of people hosting them, blogs are starting to fail. Here are a few examples: Scoble's comments started failing. Wil Wheaton found his blog FUBARdN. Belmont Club got too big for blogspot. I'm sure there are lots of other examples, but those are the few that I've noticed in the past few months. In each case, the author of the blog had to move to a new location and start again. This is a painful process for everyone. Users need to remember the new URL, RSS readers need to be redirected, bookmarks need to be updated, and search engines need to realize that the old site isn't the best answer. As I write this, Scoble's new site is #2 on Google, #8 on MSN Search, and #13 on Yahoo.

Why are these sites failing? I think there are two contributing factors. First, the concept of a blog means that it is often run by a normal person. That means a person without an IT staff to back them up. Once a database starts corrupting itself, you have a lot of work ahead of you if you want to fix things. This might explain WWdN but not Scoble or Belmont Club, which both ran on large blogging sites that should, in theory, have had an IT staff behind them. Perhaps not an IT staff dedicated to that site, but at least one committed to the code in general. This brings me to my second contention: much of the blog code is poorly written. I suspect that much of this code was hacked together quickly and wasn't really built for, or tested to, high stability. It takes a different sort of code to merely get something working than it does to get something to scale really big. I think in many ways the web is just starting to learn this. Many successful websites are still overgrown proofs of concept. They haven't had the major shaking-out period that something of this scale normally has.

Are these three sites just anomalies or are they signs of things to come? I suspect that there are a lot more failing sites waiting in the wings.

Thursday, November 3, 2005

You Aren't Gonna Need It - Or Are You?

A coworker recently pointed me at this page from the C2 Extreme Programming Wiki. The basic theory is that if you find code that isn't being used, you should delete it. That is fair enough. The controversy comes when the code in question is potentially useful in the future. What do you do then? The XP advocates on C2 say you should delete that as well. I don't agree. If the code in question is potentially useful in the future, I suggest commenting it out (most likely using #if 0).

The arguments against leaving the code in generally revolve around confusing someone. They might run into the code and try to understand it, only to figure out later that it isn't called. They might be searching the code and run across it. There is also an argument that the code will affect compilation times. If you comment out the code, none of these apply.

The response is often: why not just delete the code, and if someone really needs it, they can go back into the source control system and retrieve it? In theory, they can. In practice, they can't. I haven't run across a source control system that allows for efficient temporal searching. Without that, the only way to know that the code exists is to remember that it does. Not only that, but you also have to remember when it existed. Somehow, I think that is unlikely to happen very often.

This leads us then to Steve's rules for retaining dead code. Retain dead code iff:

1) It does something that will potentially be useful in this area of the code later. An example might be a slow C-based algorithm for something that is now optimized in assembly. Someday you may have to port to a new system and you'll want that reference code.

2) It is not causing you problems. If this code starts causing problems, get rid of it.

3) The code works. If you know it is broken, get rid of it.

This also leads to a feature request for authors of code editors: Please color the code between #if 0 and #endif the same way you color a comment.
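As a minimal sketch of what rule 1 might look like in practice, the fragment below keeps a simple reference implementation next to a faster one by wrapping it in #if 0. The bit-counting task and the function names count_set_bits and count_set_bits_reference are hypothetical stand-ins, chosen only to keep the example short; the faster version here is plain C++ rather than assembly.

```cpp
#include <cstdint>

// Version currently in use (hypothetical example: counting set bits).
// Each loop iteration clears the lowest set bit, so the loop runs once
// per set bit rather than once per bit position.
int count_set_bits(std::uint32_t x) {
    int count = 0;
    while (x != 0) {
        x &= x - 1;
        ++count;
    }
    return count;
}

#if 0
// Reference implementation, retained per rule 1: slower, but easy to read
// and to port. The preprocessor discards everything up to #endif, so it
// adds nothing to the compiled program.
int count_set_bits_reference(std::uint32_t x) {
    int count = 0;
    for (int bit = 0; bit < 32; ++bit) {
        if (x & (1u << bit)) {
            ++count;
        }
    }
    return count;
}
#endif
```

Changing the 0 to a 1 brings the reference version back into the build, which is one way to check that it still compiles and works before relying on rule 3.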