Ruminations on Computing: January 2008

Tuesday, January 29, 2008

The Tipping Point: Not True?

There's a popular business book out right now called The Tipping Point by Malcolm Gladwell. In it he explains how big trends are started by a few people. He calls them connectors and mavens. These are the people who know everyone.

I read the book about a year ago and found myself skeptical. Some parts resonated with me, others rang hollow. For example, the author explains Paul Revere's successful ride and the failure of William Dawes (who made a similar ride with much less effect) by saying that Paul Revere was a connector while Dawes was not. Except, he never proves Revere was a connector except by way of results. Connectors are able to move the masses and we can see this because Paul Revere was a connector and moved the masses. How do we know he was a connector? Well, because he moved the masses. That's circular at best.

Fast Company examines the phenomenon described by Gladwell and finds it lacking. A very interesting read.

Sunday, January 27, 2008

Unboxing Drobo

My Drobo arrived today. It is a slick device. As one accustomed to the PC world, even the unboxing was a treat. The typical piece of PC hardware comes in utilitarian packaging. You're just going to throw it away anyway, so why put effort into it? Drobo takes a more Macintosh-like approach. Unboxing is an experience. The box is just brown cardboard like any other, but opening it one is greeted with the words, "Welcome to the world of..." on top. Lifting the smaller box out reveals the word "Drobo" on a cloth bag surrounding the unit. The inside of the main box is all colored black. Even the polystyrene protectors--typically white--are colored black.

The Drobo itself is solid. It is smaller than I imagined and looks slick with the glossy front panel and matte sides. The faceplate comes off to reveal the drive bays. The front panel appears to be held in place by magnets, making for an easy fit that won't break the way plastic connectors do when they become weak.

I bought Western Digital's new Green Power (GP) drives for it. These have variable spindle speeds and some other features to reduce the power consumption to about half of a typical drive. Along with this comes a reduction in sound and heat. They're a little slower than the top drives, but not too much. Inserting the drives is trivial. Just push them into place.

I'll post later on how well it actually works.

Tuesday, January 22, 2008

Trying Windows Home Server

Over the weekend I installed Windows Home Server on a spare box that I had. So far, I'm impressed. The interface is very slick. Installation of the client software called the "Connector" is easy. Go to a share on the server and install. All connections and setup are automated from there. The main purpose for the system, backup, is easy. By default all drives and everything but temporary files are backed up. You can configure the backup to exclude any drives or directories you want. Backup was quick over my gigabit network. Backups are scheduled each night between 12:00 and 6:00 am. The server will retain 3 backups by default but you can change this. I haven't yet tried restoration from any of the backups so I don't know how well that works. I'll need to try that before I'm fully comfortable with the system.

The installation takes the first 20 GB of your largest drive and installs the OS on it. It takes the rest of that drive for the drive pool. The first drive, also called the primary drive, is reserved for tombstone files which apparently mark the location on the other drives where each of the files resides. Reportedly if the primary drive fails, the tombstone files can be recreated from the additional drives. Backed up data is not stored here unless it is the only drive on the system. Initially it was for me but when I added a second drive, the server automatically rebalanced all of the files to the second drive.

There are two sorts of data that WHS handles. There are backup files and there are shared folders. Backup files are more or less hidden from you and are accessed via a special interface. Shared folders are network shares that can contain files, music, photos, etc. Each of the shared folders can be set to be duplicated across the drives or be left as a single instance. As near as I can tell, the backup files are not duplicated. If someone knows differently, please let me know.

When drives are added to the system, they can either be made part of the storage pool or kept separate. If separate, they act just like any drive on a Windows system. If made part of the storage pool, they are virtualized into one large drive. WHS will balance files across the storage pool and can be made to create redundant copies of any shared folders you designate. Drives can be removed from the pool if there is space to move the files to other drives.

The server is extensible via what are called add-ins. Installing add-ins is done by copying them to a particular shared folder. After that, they show up as available in the server console. Installing them is just a few clicks. I've found two that are very useful. Whiist allows you to create simple web sites. The Duplication Info add-in shows you which drives duplicated files are located on. It can also be used to see what sort of files are on each drive. By default WHS treats the pool as one large, opaque virtual drive. This lets you penetrate that barrier and see how the server is utilizing the space.

I have visions of making my storage pool be a Drobo. I have a Drobo on order. I'll see if it works at all in this role.

Overall I'm very impressed with Windows Home Server. It's not designed for those who want to be able to turn every knob, but for the fire-and-forget crowd, it's great.

Note: I'm not on the WHS team and don't have any inside information so don't take anything I say here as canon. These are just my observations.

Monday, January 21, 2008

Design to Interfaces

This is the 2nd article in the Design Principles to Live By series.

An interface is--for the purposes of this post at least--an abstract definition of the functionality of an object. It is the signature of a class divorced from its implementation. Think of it as the list of methods (and properties) that define a class. Concrete examples are COM interfaces, Interfaces in C# or Java, or even just classes with at least one (but preferably all) pure virtual function in C++. An interface need not be structural. In late-binding languages like Smalltalk, it can just be an agreed-upon list of methods that all classes of a certain type will implement. In the simplest form, it can just be a list of function prototypes in a C .h file. The important part of an interface is that it abstractly describes a unit of functionality. It forms the contract between the class implementing it and the class calling that implementation.
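As a minimal sketch of the C++ flavor described above, here is a class made up entirely of pure virtual functions, one implementation, and a client written only against the abstraction. The names IKeyValueStore, InMemoryStore, and LookupOrDefault are hypothetical and chosen just for illustration.

```cpp
#include <map>
#include <string>

// Hypothetical interface: nothing but pure virtual functions, so it states
// what a store can do without saying anything about how it does it.
class IKeyValueStore {
public:
    virtual ~IKeyValueStore() = default;
    virtual void Put(const std::string& key, const std::string& value) = 0;
    virtual bool TryGet(const std::string& key, std::string& valueOut) const = 0;
};

// One possible implementation of the contract, backed by std::map.
class InMemoryStore : public IKeyValueStore {
public:
    void Put(const std::string& key, const std::string& value) override {
        data_[key] = value;
    }
    bool TryGet(const std::string& key, std::string& valueOut) const override {
        auto it = data_.find(key);
        if (it == data_.end()) return false;
        valueOut = it->second;
        return true;
    }
private:
    std::map<std::string, std::string> data_;
};

// Client code: it sees only IKeyValueStore and never names InMemoryStore.
std::string LookupOrDefault(const IKeyValueStore& store,
                            const std::string& key,
                            const std::string& fallback) {
    std::string value;
    return store.TryGet(key, value) ? value : fallback;
}
```

Any other class that implements IKeyValueStore could be handed to LookupOrDefault without that function changing.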

The use of an interface allows client software to be written without a reliance upon or even knowledge of the underlying implementation. This yields two primary advantages. First, it allows for parallel development. Second, it provides for more flexibility in the design.

If a project is being worked on by more than one programmer, interfaces are essential. The work between multiple coders is usually partitioned into discrete chunks and each person is responsible for one or more of these. Each of these units will have to interact with the other units in some way. The wrong way to code is to wait for the other person to finish their work before beginning yours. If Programmer A is writing some functionality that Programmer B needs, B will have to wait for A to have at least a prototype ready before he can begin his work. However, if Programmer A designs an interface, he has created a contract for how his code will be interacted with. B can then immediately begin work. The two don't need to coordinate as closely and B is not gated on the work of A. When they combine their work, it will--in theory--just work. Of course, in practice there will be some kinks to work out but they should be minimal.

Let's take an example from my RoboRally project of last summer. There were four of us on the project. We were scattered across the country and couldn't closely collaborate. We partitioned the work into four parts and defined the interfaces that formed the seams between our work. I was coding the game logic. I was responsible for tracking robot locations, moving pieces around, and calculating the interactions between the robots. Another person was responsible for the UI that allowed players to choose their moves. We agreed upon an interface for the players. They would have a list of cards, and a robot. We agreed upon the interface for giving a list of robots to my game engine and for the UI to tell me to process a turn. With that, I was free to write my game engine code without having to worry about how the UI worked. The UI programmer was also freed up to write the turn-selection logic without needing to know how I would later process the turns. All he needed to do was to package the data into the agreed-upon format and I would do the rest.
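The seam described above might have looked roughly like the sketch below. This is not the project's actual code. Every name here (Card, Robot, Player, IGameEngine, StubGameEngine, SubmitTurn) is an assumption made for illustration. The point is that the UI author can code against a stub while the real engine is still being written.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical data agreed on by both sides: a player has cards and a robot.
enum class Card { MoveOne, MoveTwo, TurnLeft, TurnRight };

struct Robot {
    std::string name;
    int x = 0;
    int y = 0;
};

struct Player {
    Robot robot;
    std::vector<Card> cards;
};

// Hypothetical contract between the UI and the game engine.
class IGameEngine {
public:
    virtual ~IGameEngine() = default;
    virtual void SetPlayers(const std::vector<Player>& players) = 0;
    virtual void ProcessTurn() = 0;
};

// Stand-in engine that only records what it was asked to do. The UI can be
// developed and exercised against this before the real engine exists.
class StubGameEngine : public IGameEngine {
public:
    void SetPlayers(const std::vector<Player>& players) override {
        playerCount_ = players.size();
    }
    void ProcessTurn() override { ++turnsProcessed_; }

    std::size_t PlayerCount() const { return playerCount_; }
    int TurnsProcessed() const { return turnsProcessed_; }

private:
    std::size_t playerCount_ = 0;
    int turnsProcessed_ = 0;
};

// UI-side code: packages the data and hands it over through the interface.
void SubmitTurn(IGameEngine& engine, const std::vector<Player>& players) {
    engine.SetPlayers(players);
    engine.ProcessTurn();
}
```

When the real engine is ready, it implements IGameEngine and SubmitTurn keeps working unchanged.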

The second advantage of using interfaces is the flexibility they provide. This advantage is accrued even on a single-coder project. Because the various functional units in the program are written to interfaces and not to concrete classes or implementations, the details of the implementation can be varied without the client code having to change. This is the impetus behind the "encapsulate what varies" principle. It is impossible to predict when the need will arise to change a portion of the code. The more you use interfaces, the more easily the program will be modified. This is especially helpful if the variation turns out to be an addition instead of a replacement. Changing one image encoder for another can be done by changing the implementation of a concrete class, CJpegEncoder. Clients don't need to know anything changed. However, if client code is creating the image encoder class directly and statically linking to its functions, adding a second option for image encoding becomes hard. Each place where the client creates and interacts with the encoder needs to be modified. If, instead, the client code only uses IImageEncoder, the code doesn't need to care if it is interacting with CJpegEncoder or CPngEncoder. It makes the same calls.
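A minimal sketch of the encoder example follows. The class names come from the paragraph above, but the single FileExtension method and the OutputFileName helper are assumptions invented to keep the sketch short. A real encoder interface would also carry the encoding calls themselves.

```cpp
#include <string>

// Interface the client codes against. The one method here is a placeholder.
class IImageEncoder {
public:
    virtual ~IImageEncoder() = default;
    virtual std::string FileExtension() const = 0;
};

class CJpegEncoder : public IImageEncoder {
public:
    std::string FileExtension() const override { return ".jpg"; }
};

// Adding a second encoder is a new class, not an edit to every client.
class CPngEncoder : public IImageEncoder {
public:
    std::string FileExtension() const override { return ".png"; }
};

// Client code: makes the same call no matter which encoder it was given.
std::string OutputFileName(const IImageEncoder& encoder,
                           const std::string& baseName) {
    return baseName + encoder.FileExtension();
}
```

Only the code that decides which encoder to construct needs to know the concrete types. Everything downstream sees IImageEncoder.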

Another example. We have tests dating back to the time I began at Microsoft 10 years ago. A few years back a new testing system was rolled out across the team and our tests needed to conform to it. The primary thing that needed to change was the logging. Prior to this testing system, logging was a haphazard affair. Logs were consumed by humans so as long as the text clearly delineated a pass and a failure, everything was fine. The new system had a centralized reporting system including an automated log-parsing system. This required revamping the way each of our tests logged.

We had 3 main classes of tests. There was one class which used a common test harness. This harness could take DLLs containing a logger. The interface to the DLL was standardized. With one new DLL we were able to convert a whole class of applications. No code in the test applications needed to be modified. The second class did their own logging, but each contained a function, Log(string), that did the logging. Each of these applications had to be touched, but the modifications were simple. The Log function was replaced by one that called into the new logging system. The modifications were quick. The third class of tests was the minority of our tests, but these were written without any concept of interfaces. Each did logging in its own way, usually by passing around a file handle which the various test functions used fprintf to log to. This worked fine at the time, but the implementation was tightly bound to the client code. Each line that logged had to be modified. This took months. The moral of the story: Use interfaces. Even where you don't need them yet.
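To make the difference concrete, here is a sketch of a test that logs through an interface. It is not the real harness or logging system. ILogger, HumanLogger, ParsableLogger, the "RESULT|name|PASS" line format, and RunAdditionTest are all hypothetical.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical logging contract that every test writes to.
class ILogger {
public:
    virtual ~ILogger() = default;
    virtual void LogResult(const std::string& testName, bool passed) = 0;
};

// Old style: free-form text meant for a human reader.
class HumanLogger : public ILogger {
public:
    explicit HumanLogger(std::ostream& out) : out_(out) {}
    void LogResult(const std::string& testName, bool passed) override {
        out_ << testName << (passed ? " passed" : " FAILED") << "\n";
    }
private:
    std::ostream& out_;
};

// New style: a made-up, rigid line format that a log parser could consume.
class ParsableLogger : public ILogger {
public:
    explicit ParsableLogger(std::ostream& out) : out_(out) {}
    void LogResult(const std::string& testName, bool passed) override {
        out_ << "RESULT|" << testName << "|" << (passed ? "PASS" : "FAIL") << "\n";
    }
private:
    std::ostream& out_;
};

// The test itself never changes when the logging system does.
void RunAdditionTest(ILogger& logger) {
    logger.LogResult("addition", 2 + 2 == 4);
}
```

Switching every test to the new format means writing one new ILogger implementation. Tests that call fprintf on a shared file handle have no such single point of change.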

Friday, January 18, 2008

Prefer Loose Coupling

This is the 3rd post in the Design Principles to Live By series.

Coupling is the degree to which software components depend on each other. Tight coupling is a high degree of dependence. In practical terms, it means one component needs intimate knowledge of how another works in order to successfully interact with it. As much as possible, tight coupling should be avoided. Tight coupling has many negative consequences. It can make testing difficult, increase complexity and limit re-use. Code which is loosely coupled is code which does not need to understand the details of the other components in the system. Interaction should be through well-defined interfaces and semantics.

When one component is heavily reliant upon another, it cannot be tested except in combination with the other. Testing becomes a much more complex task. Not only does this make life difficult for the test team, but it deters the use of unit tests. Tight coupling can also reduce the testable surface. If ComponentA explicitly creates and then utilizes ComponentB and may even rely on side effects of ComponentB, that seam cannot be tested. ComponentB cannot be replaced which means any interaction with it cannot be tested. Less testing means less confidence in the product and ultimately a resistance to change later.
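One way to open up that seam is to hand ComponentA its collaborator through an interface instead of letting it construct ComponentB itself. The sketch below assumes a hypothetical IComponentB with a single FetchValue method. FakeComponentB and DoubledValue are also invented for illustration.

```cpp
// Hypothetical interface extracted from ComponentB.
class IComponentB {
public:
    virtual ~IComponentB() = default;
    virtual int FetchValue() = 0;
};

// ComponentA receives its collaborator rather than creating it, so a test
// can substitute something it controls.
class ComponentA {
public:
    explicit ComponentA(IComponentB& b) : b_(b) {}
    int DoubledValue() { return 2 * b_.FetchValue(); }
private:
    IComponentB& b_;
};

// Test double: returns a canned value and counts how often it was called.
class FakeComponentB : public IComponentB {
public:
    explicit FakeComponentB(int value) : value_(value) {}
    int FetchValue() override {
        ++calls_;
        return value_;
    }
    int Calls() const { return calls_; }
private:
    int value_;
    int calls_ = 0;
};
```

With the fake in place, ComponentA can be unit tested in isolation, and the test can also check how ComponentA used its collaborator.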

Each component that is tightly coupled to another component increases the complexity of the system. To understand the system requires understanding the details of each coupled component. Increased complexity in turn increases initial development time, difficulty of change, and thus the resistance to change. Complex units are harder to understand, design, and build. Loosely coupled components can be developed and changed independently of each other. Tightly coupled components must be created in a coordinated fashion. This implies a delay to one delays the other and a change to one changes the others. Change is similarly a cascaded endeavor. Changing ComponentB requires also changing ComponentA. Anyone relying on ComponentA may also need to be changed. What should be a simple fix turns into a long series of changes and trust in the system decays. Because testing cannot be done in isolation, testing the fix also becomes harder. Because of this, fixes that might be useful are rejected as too costly.

Finally, tightly coupled components are hard to re-use. Using one part requires using many others. The bigger the coupled system, the less likely it is to fit the need. Large systems have a greater likelihood of requiring unacceptable dependencies or use models. It is hard to understand the side-effects of utilizing a large, complex code base. A small, independent component is easier to trust.

Tight coupling is not just a lack of interfaces. It is requiring an understanding of the semantics of the interface. It is possible to have a tightly coupled set of components which all have well-defined interfaces. DirectX Media Objects (DMOs) were envisioned to be small, independent pieces of the DirectShow media pipeline. Unfortunately, each DMO had its own initialization interfaces. There was no standard way to set up the objects. Once set up, the data could flow in an abstracted manner, but setting up a pipeline required knowing what each DMO was so it could be configured properly. This is a tightly coupled system. Each application had to know specifically which DMOs it was using and couldn't use others without first being programmed to understand them.

Another example was a video player application we were working on at one point. It was supposed to have an abstracted pipeline so playback could be done via MediaFoundation or DirectShow or other pipelines. One proposed design utilized PropVariants heavily in its interfaces. Not only are they onerous to use from C++ but they make for terrible interfaces. They look flexible, but in reality cause tight coupling. Each user of the interface has to understand what values will be put into those PropVariants.
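The hidden coupling is easy to reproduce without the real PROPVARIANT type. The sketch below uses a plain string map as a stand-in for a loosely typed property bag. It is an analogy, not the design that was actually proposed, and every name in it (PropertyBag, SetVolumeFromBag, PlaybackSettings, EffectiveVolume) is hypothetical.

```cpp
#include <exception>
#include <map>
#include <string>

// Stand-in for a loosely typed bag of values. The type says nothing about
// which keys are required or how their values are encoded.
using PropertyBag = std::map<std::string, std::string>;

// Compiles for any bag, but only works if the caller happens to know the
// unwritten rules: the key is exactly "volume" and the value is integer text.
bool SetVolumeFromBag(const PropertyBag& props, int& volumeOut) {
    auto it = props.find("volume");
    if (it == props.end()) return false;
    try {
        volumeOut = std::stoi(it->second);
    } catch (const std::exception&) {
        return false;  // value was not parsable as an integer
    }
    return true;
}

// Typed alternative: the contract is visible in the signature, and a
// misspelled field is a compile error instead of a silent runtime failure.
struct PlaybackSettings {
    int volumePercent = 100;
    bool muted = false;
};

int EffectiveVolume(const PlaybackSettings& settings) {
    return settings.muted ? 0 : settings.volumePercent;
}
```

A caller who writes "Volume" instead of "volume" gets no help from the compiler in the first version. That shared private knowledge between caller and callee is exactly the tight coupling.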

Consider this rule of thumb. If you ever see code like this:

switch (type) {
    case typeA: doSomething(); break;
    case typeB: doSomethingElse(); break;
    default: error();
}

Stop! Your code is too tightly coupled. This isn't the only tight coupling pattern, but it is one of them. Another is reaching into the guts of an object. If you are accessing the data members of an object directly rather than through a method or property, you are too tightly coupled.
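A common remedy for the type switch is to move the per-type behavior behind a virtual function. The sketch below puts the two styles side by side. The shape example and all of its names (ShapeType, ShapeRecord, AreaBySwitch, IShape, Circle, Square, TotalArea) are hypothetical.

```cpp
#include <memory>
#include <stdexcept>
#include <vector>

// Coupled version: callers switch on a type tag, so this function must be
// edited every time a new ShapeType is added.
enum class ShapeType { Circle, Square };

struct ShapeRecord {
    ShapeType type;
    double size;  // radius for a circle, side length for a square
};

double AreaBySwitch(const ShapeRecord& shape) {
    switch (shape.type) {
        case ShapeType::Circle: return 3.14159265358979 * shape.size * shape.size;
        case ShapeType::Square: return shape.size * shape.size;
    }
    throw std::invalid_argument("unknown shape type");
}

// Decoupled version: each type carries its own behavior behind an interface.
class IShape {
public:
    virtual ~IShape() = default;
    virtual double Area() const = 0;
};

class Circle : public IShape {
public:
    explicit Circle(double radius) : radius_(radius) {}
    double Area() const override { return 3.14159265358979 * radius_ * radius_; }
private:
    double radius_;
};

class Square : public IShape {
public:
    explicit Square(double side) : side_(side) {}
    double Area() const override { return side_ * side_; }
private:
    double side_;
};

// Client code: no switch, and no edit needed when a new shape is added.
double TotalArea(const std::vector<std::unique_ptr<IShape>>& shapes) {
    double total = 0.0;
    for (const auto& shape : shapes) {
        total += shape->Area();
    }
    return total;
}
```

The data members in Circle and Square are private, so clients also cannot reach into the guts of the object. They go through Area() or not at all.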

Wednesday, January 9, 2008

Do We Need A New Kind of CS Degree?

Joel Spolsky suggests that we should have something called a BFA in Software Development. That is, a Bachelor of Fine Arts focused on creating software. I think he's onto something. I've called for something similar in the past. Presently there are two sorts of degrees that seem to be offered in the market. There are what I call IT degrees that are designed to help a person operate in the information technology field. They teach database programming, Visual Basic, web design, etc. They don't generally teach hard-core programming. There are also the traditional CS degrees which are geared toward doing research in computer science. Their roots in the math departments of old show and there is a strong emphasis on proofs and mathematical modeling. There is a need for a 3rd option. What Joel calls the BFA in Software Development and what I called Software Craftsmanship is a degree that teaches someone the art of programming. The graduate from a CS school knows how to prove the correctness of a for loop and can characterize an algorithm as NP-complete, but doesn't have much practical experience. There is a certain class of person that shies away from the math required for a CS degree today but who would make a great programmer. Most of the time these people take a few classes and then quit school to fend for themselves on the open market. If we offered a program that taught better programming techniques focusing on different sorts of languages, longer-term projects, more design patterns/OO, debugging, testing, etc., I think we would find the uptake to be solid. There are a lot of people that want to program but don't want to do research. They also don't want to be a DBA in the back end of some giant corporation. Today, there is no program at the college level for a person like this.

As a counterpoint, several professors are claiming that today's CS programs aren't mathematical enough. I see their point. There are places that need more formal methods. The lack of a BFA in Software Development is probably partially at fault here. CS programs are becoming watered down because students don't want the formal methods. Perhaps splitting the CS degree into CS and SD would help create a pipeline of those who are trained in formal methods and also create a pipeline of those who are trained in creating working software in an efficient manner.

The authors also attack the use of Java as the first programming language. Joel also wrote an excellent post about this a couple years back. This is something I can support. Java is too easy. It's a fine language for writing production code, but it doesn't force the programmer to understand the underlying computer. It hides too much of what is really going on. This means that when the abstraction leaks, those who have only learned Java don't know what to do. They don't understand pointers and memory. They haven't had to write low-level programs of their own and don't understand how those parts might be doing things. They don't interact closely with the OS and don't understand what it might be doing which is causing problems. Anyone who learns to program in C or C++ can learn Java (or C#). The reverse is not true.

Tuesday, January 1, 2008

Two Software Development Worlds

I was recently listening to an interview with Joel Spolsky. The main subject is interviewing and hiring, but in the course of the interview Joel touches on an interesting point. He says that there are two major types of software: Shrinkwrap and Custom (listen around the 40 minute mark). These have very different success metrics and thus should be created in different ways.

Custom

This might also be called internal or in-house software. It is software that is written with one customer in mind and will only be run on one system. This sort of software makes up, Joel claims, 70% of all software being written today. Think about the intranets, inventory management software, etc. that IT groups everywhere are creating.

Joel makes the point that because custom software is only ever going to be used for one purpose and in a restricted environment, there is a steep falloff in return on investment. After the software "works", there is very little advantage to making it better. It is at this point that development should stop.

Another important point is that there really isn't any competition in internal software. Companies won't generally fund two groups to write the same software. This has implications for the definition of success. For internal software, there will be a list of requirements. If those requirements are met, the project is a success. Being a little faster, slightly easier to use, or having one more feature doesn't make a project any more successful.

Shrinkwrap

Shrinkwrap software is software that is created for sale to others "in the wild." This is software that is written for a general user category. It might be shrink-wrapped software like Windows or Office or it might be web-based like Salesforce.com. The important point is that it will be used in many environments by many different people.

Unlike custom software, shrinkwrap software keeps paying a return on further investment. When the program is functional, it has only met the minimum bar for entry. It must be much better (more robust, more features, more usable) before it can successfully compete. Each feature, bug fix, etc. helps to increase market share.

These differences have implications for the type of person required, the sort of teams, and perhaps even the development methodologies that can be employed. I'll revisit some of these implications in a future post.

Joel talks about some of these implications here.

Welcome to 2008

Not a great start for 2008 out here in Seattle. We had a big fireworks display at midnight on the Space Needle. Unfortunately, the computer coordinating the fireworks had some "glitches" and eventually they had to set them off by hand.

Here's to a great new year. Time to wipe the slates clean, make a few resolutions, and try some new things.