Ruminations on Computing: April 2008

Monday, April 28, 2008

A Microsoft-Yahoo Takeover Primer

Marc Andreessen has a great blog post today laying out the possibilities in the Microsoft-Yahoo talks. Unlike most posts on the subject, this one isn't trying to guess what might happen. Instead, it lays out the options and the forces affecting those options. What is a proxy battle? How would it take place? Who are the investors we're talking about? What is a tender offer? How is it affected by a poison pill? If you are following the subject, check out his post. It's a good primer for the rest of the pontificating on the subject.

Prefer Composition Over Inheritance

It's probably about time to bring my "Design Principles To Live By" series to a close. This is the last scheduled topic although I have one or two more I may post.

Let's begin with some definitions:

Composition - Functionality of an object is made up of an aggregate of different classes. In practice, this means holding a pointer to another class to which work is deferred.

Inheritance - Functionality of an object is made up of its own functionality plus functionality from its parent classes.

For most non-trivial problems, there will be similar code needed by multiple classes. It is not a wise idea to put the same code in more than one place (a topic for another day). There are two strategies in object-oriented programming which attempt to solve the problem of duplicate code. The one most popular in the early days was inheritance. Shared functionality was implemented in a base class which allowed each child class to inherit that functionality. A child would just not implement foo() and the parent would do the work. This works, but it is not very flexible.

Suppose that the shared functionality is some kind of encryption algorithm. Each child class will only inherit from one base class. What if there is a need for different encryption algorithms? It would be possible to have multiple base classes, say AESEncryptionBase and DESEncryptionBase, but this necessitates multiple copies of the child classes--one for each base class. With more than two base classes, this becomes untenable. It also becomes very difficult to change out the encryption routine at runtime. Doing so means creating a new object and copying the contents of the old object to it.
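To make the problem concrete, here is a minimal sketch of the inheritance approach in Python. AESEncryptionBase and DESEncryptionBase come from the paragraph above. The music file classes, the save() method, and switch_to_des() are names I made up for illustration, and the "encryption" is a placeholder that just tags the data. The sketch sticks to single inheritance, as the discussion above does.

```python
class AESEncryptionBase:
    def encrypt(self, data: bytes) -> bytes:
        # Placeholder only: tags the data instead of really encrypting it.
        return b"AES:" + data


class DESEncryptionBase:
    def encrypt(self, data: bytes) -> bytes:
        # Placeholder only: tags the data instead of really encrypting it.
        return b"DES:" + data


class AESMusicFile(AESEncryptionBase):
    def __init__(self, samples: bytes):
        self.samples = samples

    def save(self) -> bytes:
        # encrypt() is inherited from the base class.
        return self.encrypt(self.samples)


class DESMusicFile(DESEncryptionBase):
    # An exact copy of AESMusicFile; only the parent class differs.
    def __init__(self, samples: bytes):
        self.samples = samples

    def save(self) -> bytes:
        return self.encrypt(self.samples)


def switch_to_des(old: AESMusicFile) -> DESMusicFile:
    # Changing the algorithm at runtime means building a new object
    # and copying the old object's contents into it.
    return DESMusicFile(old.samples)
```

Every additional algorithm needs another base class and another copy of each child class.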

Another difficulty is the distortion of otherwise clean class hierarchies. Each child should have an "is-a" relationship with its parent. Is a music file an AESEncryptionBase? No. Here is a particularly telling example from Smalltalk. In Squeak (the dominant open-source Smalltalk implementation), Semaphore inherits from LinkedList. Is Semaphore a linked list? No. A linked list is used in the implementation, but a semaphore is not a specialization of a linked list.

A better approach is to contain the new functionality via composition. A class should contain instances of the objects whose functionality it needs. In the music file case, it would have a pointer to an EncryptionImpl object which might be AES, DES, or ROT13. The class hierarchy will stay smaller and the music file implementation does not even need to be aware of which encryption method it is using. In the Semaphore case, Semaphore would contain a LinkedList object which it would use to do the work. Clients of Semaphore would not be expecting LinkedList functionality. Extraneous methods would not need to be disabled. Composition would also allow for more flexibility later. If an implementation based on a heap or a prioritized queue were found to be advantageous, it could be swapped in without clients of Semaphore knowing.
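Here is a rough sketch of both cases in Python. EncryptionImpl, the music file, and Semaphore come from the paragraph above; everything else (method names, the PlaceholderAESEncryption class, the string-based file contents) is my own invention for illustration. The AES class only tags the data, and the Semaphore is a single-threaded toy that models the bookkeeping, not a real synchronization primitive.

```python
import codecs
from collections import deque
from typing import Optional, Protocol


class EncryptionImpl(Protocol):
    def encrypt(self, data: str) -> str: ...


class Rot13Encryption:
    def encrypt(self, data: str) -> str:
        return codecs.encode(data, "rot_13")


class PlaceholderAESEncryption:
    def encrypt(self, data: str) -> str:
        # Stand-in only: a real implementation would call a crypto library.
        return "AES(" + data + ")"


class MusicFile:
    def __init__(self, contents: str, encryption: EncryptionImpl):
        self.contents = contents
        self.encryption = encryption  # the work is deferred to this object

    def save(self) -> str:
        # MusicFile neither knows nor cares which algorithm is in use.
        return self.encryption.encrypt(self.contents)


class Semaphore:
    def __init__(self, permits: int):
        self._permits = permits
        self._waiters = deque()  # private detail; could become a heap later

    def acquire(self, who: str) -> bool:
        """Take a permit if one is free; otherwise queue the caller."""
        if self._permits > 0:
            self._permits -= 1
            return True
        self._waiters.append(who)
        return False

    def release(self) -> Optional[str]:
        """Pass the permit to the longest waiter, or return it to the pool."""
        if self._waiters:
            return self._waiters.popleft()
        self._permits += 1
        return None


song = MusicFile("Hello", Rot13Encryption())
rot13_output = song.save()
song.encryption = PlaceholderAESEncryption()  # swapped at runtime, same object
aes_output = song.save()
```

Changing the algorithm is a single assignment, and Semaphore exposes only acquire() and release(), so none of the container's methods leak out to clients.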

Think twice before inheriting functionality. There are times when it is a good idea, such as when there is a logical default behavior and only some child classes need to override it, but if the intent is to utilize the functionality rather than expose it to child class callers, composition is almost always the right decision.

Friday, April 25, 2008

A History of Filesystems

Ars Technica has a very interesting article about the history of filesystems. They cover all the major systems including FAT (MS-DOS), HFS (Mac), NTFS (NT), Ext2/3 (Linux), and many others like the Amiga. They also cover upcoming systems like ZFS. If you have interest in the systems space, check it out.

Monday, April 21, 2008

Know That Which You Test

Someone recently related to me his experience using the new Microsoft Robotics Studio. He loaded it up and proceeded through one of the tutorials. To make sure he understood, he typed everything in instead of cutting and pasting the sample code. After doing so, he compiled and ran the results. It worked! It did exactly what it was supposed to. The only problem--he didn't understand anything he had typed. He went through the process of typing in the lines of code, but didn't understand what they really meant. Sometimes testers do the same thing. It is easy to "test" something without actually understanding it. Doing so is dangerous. It lulls us into a false sense of security. We think we've done a good job testing the product when in reality we've only scratched the surface.

Being a good tester requires understanding not just the language we're writing the tests in, but also what is going on under the covers. Black-box testing can be useful, but without a sense of what is happening inside, testing can only be very naive. Without breaking the surface, it is nearly impossible to understand what the equivalence classes are. It is hard to find the corner cases or the places where errors are most likely to happen. It's also very easy to miss a critical path because it wasn't apparent from the API.

There are three practices which help to remedy this. First, program in the same language as whatever is being tested. A person writing tests in C# against a COM interface will have a hard time beginning to understand the infrastructure beneath the interface. It can also be difficult to understand the frailties of a language different from the one the tests are written in. Each language has different weaknesses. Thinking about the weaknesses of C++ will blind a person to the weaknesses of Perl. Second, use code coverage data to help guide testing. Examining code coverage reports can help uncover places that have been missed. If possible, measure coverage against each test case. Validate that each new case adds to the coverage. If it doesn't, the case is probably covering the same equivalence class as another test. Third, and perhaps most importantly, become familiar with the code being tested. Read the code. Read the specs. Talk to the developers.
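The second practice, checking that each new case adds coverage, can be sketched in a few lines of Python. This assumes you already have per-test coverage data from whatever coverage tool you use. The function name, the test names, and the "parse:NN" line labels below are all made up for illustration.

```python
from typing import Dict, List, Set


def find_redundant_cases(coverage_by_case: Dict[str, Set[str]]) -> List[str]:
    """Return the cases that cover nothing the earlier cases missed."""
    covered: Set[str] = set()
    redundant: List[str] = []
    for name, lines in coverage_by_case.items():
        if not (lines - covered):
            # Nothing new was exercised, so this case is probably in the
            # same equivalence class as one that ran before it.
            redundant.append(name)
        covered |= lines
    return redundant


# Hypothetical data: the lines each test case touched.
runs = {
    "test_empty_string": {"parse:10", "parse:11"},
    "test_single_char": {"parse:10", "parse:12", "parse:13"},
    "test_two_chars": {"parse:10", "parse:12", "parse:13"},
}
suspects = find_redundant_cases(runs)
```

A flagged case is a prompt to go look, not proof. Two cases can execute the same lines and still check different values, which is one more reason to read the code.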

Friday, April 11, 2008

Slow blogging season

I apologize for the very light blogging of late. I've been busy working on the project for my latest class at the University of Illinois. CS classes really take a lot of time at the end of the semester. At the beginning you just have reading, homework, and lectures. At the end they pile a project on top of that. Depending on the class, that can mean a lot of work. This isn't the worst, but I'm adding code to a large codebase which means a lot of time spent understanding it and a little time coding. It's a lot simpler to write a project from scratch than to add functionality to something large. As I'm taking a 500-level OS class, we're modifying an OS (Windows CE 6.0) and thus the code base is pretty big.

It's coming together and I hope to get back to blogging more soon. I just took a Microsoft class for Senior SDETs and have a lot of interesting ideas to blog about...