Ruminations on Computing: June 2007

Saturday, June 30, 2007

Why Computers Can't Play Go

About a decade ago a computer finally beat the best chess mind on the planet. Computers can play nearly every game we've invented better than we can. However, one game stands out as unbeaten: the ancient Chinese game of Go. The Times of London has an interesting article on why the computer still can't beat an average human Go player.

The computer can play chess because it can calculate the results of all possible moves several turns ahead and then pick the best one. In chess, there are usually 25-35 possible moves at any point. In Go, there are around 250, and that difference compounds with every move you look ahead. This makes the brute-force strategy a lot harder. Almost impossibly hard. The thinking goes that playing Go well is all about recognizing patterns--something computers are notoriously bad at. If so, being able to play Go well would be a breakthrough affecting lots of other areas of computing from vision to AI.
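To put rough numbers on that, here is a small Python sketch. It assumes a full brute-force search where every position has the same number of moves, which is a simplification. The branching factors come from the paragraph above; the depth of 6 moves and the function name are arbitrary choices of mine for illustration.

```python
# Rough size of a brute-force search: branching_factor ** depth.
# Assumption: every position offers the same number of moves.

def positions_to_search(branching_factor: int, depth: int) -> int:
    """Count the move sequences of length `depth` in a full search."""
    return branching_factor ** depth

DEPTH = 6  # arbitrary look-ahead chosen for illustration

chess = positions_to_search(35, DEPTH)
go = positions_to_search(250, DEPTH)

print(f"chess: {chess:,} sequences")
print(f"go:    {go:,} sequences")
print(f"go is roughly {go // chess:,} times larger")
```

Even at this shallow depth the Go search is more than a hundred thousand times larger, and the gap keeps widening with each extra move of look-ahead.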

This reminds me, I still have to find the time to learn to play Go. It's one game I've never played.

When Should You Refactor?

Refactoring is the process of changing the structure of code without changing its behavior. It should be used to ease the addition of features. Because the outcome is code that "smells" better, sometimes people get confused and think that refactoring is an end in itself. I disagree with that sentiment. While we should strive to write code that smells good, we shouldn't spend extra effort beautifying code after it is finished. There is always the possibility that no one will need to touch that code again, in which case the beautifying was wasted effort. After all, the compiler doesn't care how pretty it is.

Remember this: refactoring is not an end in itself. Programmers are paid to add features or fix bugs, thus making features work better. Refactoring--by itself--accomplishes neither of these. If you are touching code just to refactor it, stop. Wait until you have a need to change something first.

Here are some rules of thumb for deciding when to refactor:

Even if code is ugly, if you don't have to change it, don't fix it. You just risk breaking things.
If you are adding some functionality and it is hard to implement, refactor.
If you have to change the same code a second time, assume you'll probably be back a third and fourth time and refactor.
If you don't have unit tests, think long and hard before refactoring. Without tests, you can't know you didn't break anything. Write unit tests first if necessary.
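As a minimal sketch of the last rule, here is what "write unit tests first" might look like in Python. The shipping_cost function and its numbers are hypothetical, made up purely for illustration. The checks record what the ugly code does today, and the same checks are then run against the refactored version.

```python
def shipping_cost(weight_kg, express):
    # Hypothetical "ugly" legacy code that we want to refactor.
    if express == True:
        if weight_kg > 10:
            return 20 + weight_kg * 2
        else:
            return 20
    else:
        if weight_kg > 10:
            return 5 + weight_kg * 2
        else:
            return 5


def check_shipping_cost(fn):
    """Pin down the current behavior before touching the code."""
    assert fn(2, False) == 5
    assert fn(2, True) == 20
    assert fn(12, False) == 29
    assert fn(12, True) == 44
    assert fn(10, False) == 5  # boundary: exactly 10 kg has no surcharge


def shipping_cost_refactored(weight_kg, express):
    # Same behavior, restructured so each rule appears once.
    base = 20 if express else 5
    surcharge = weight_kg * 2 if weight_kg > 10 else 0
    return base + surcharge


# Run the checks against the old code first, then against the new code.
check_shipping_cost(shipping_cost)
check_shipping_cost(shipping_cost_refactored)
```

Passing the function in as a parameter is just one way to reuse the checks; a real project would more likely use a test framework.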

The upshot of this is that, when making the first change to a piece of code, you should resist refactoring. If the change will be difficult without refactoring, then do it. However, if there is a "hack" you can do to fix the bug or add the feature, prefer that. Most of the time, ugly and quick is better than pretty and slow. If you come back to the same code again, then refactor.

Thursday, June 28, 2007

Avoiding Overdesign

I'm reading Dreaming In Code and I came across this really cool quote from Linus Torvalds:

Nobody should start to undertake a large project. You should start with a small trivial project, and you should never expect it to get large. If you do, you'll just overdesign and generally think it is more important than it likely is at that stage. Or, worse, you might be scared away by details. Don't think about some big picture fancy design. If it doesn't solve some fairly immediate need, it's almost certainly overdesigned.

He's totally correct. I've seen many projects fail (or at least go way past their ship dates) because they were worried about solving the next problem and not this one. Almost every time we plan for our perceived future needs, we plan wrong. By the time we get to the future, the needs have changed. Instead, it is better to design a flexible product that can be changed in many ways, rather than planning for the next big specific change.

Writing flexible code is possible by following basic OO design principles like writing to interfaces, keeping coupling to a minimum, keeping classes cohesive, etc. If you do that, you'll be able to easily refactor your way to success.
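Here is a minimal Python sketch of what "writing to interfaces" can look like. All of the names (Storage, MemoryStorage, Settings) are hypothetical and exist only for this illustration. The point is that Settings knows about the interface and nothing else, so the storage behind it can change later without Settings being touched.

```python
from typing import Protocol


class Storage(Protocol):
    """The interface: anything with these two methods qualifies."""

    def save(self, key: str, value: str) -> None: ...

    def load(self, key: str) -> str: ...


class MemoryStorage:
    """One possible implementation; a file or database could replace it."""

    def __init__(self) -> None:
        self._data = {}

    def save(self, key: str, value: str) -> None:
        self._data[key] = value

    def load(self, key: str) -> str:
        return self._data[key]


class Settings:
    """Depends only on the Storage interface, not on a concrete class."""

    def __init__(self, storage: Storage) -> None:
        self._storage = storage

    def set_theme(self, name: str) -> None:
        self._storage.save("theme", name)

    def theme(self) -> str:
        return self._storage.load("theme")


settings = Settings(MemoryStorage())
settings.set_theme("dark")
print(settings.theme())  # prints "dark"
```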

Wednesday, June 27, 2007

Trade Accuracy for Understanding

I found myself giving this advice to two people today. It came in the context of preparing a presentation for upper management. The desire was to communicate an understanding of what piece of technology we are creating and why. The difficulty was in trying to convey the information without overwhelming the audience. This can be tricky. Engineers are especially bad at this. Why?

Engineers know a lot about what they are working on. It's fair to say that they know more about what they work on than almost anyone else does. Engineers are also taught to be accurate. Being inaccurate gets you in trouble when you are designing something. You can't be "close enough" in the weight-bearing characteristics of a bridge. You can't be "close enough" to the specification when writing a class driver. You have to be accurate.

Managers of engineers, on the other hand, know less. It's not that we're dumber, it is that we are more diversified. Think of the mind as holding a finite amount of knowledge. It can either be filled with a lot of knowledge on a few topics or a little knowledge on a lot of topics. Engineers are the former, managers the latter.

Therein lies the rub. Engineers need to convey some subset of their knowledge to management. However, management does not have the same understanding. Management doesn't need it and probably can't afford it.

When you ask an engineer to summarize, he or she will try to be very succinct but not lose any information. This actually amplifies the problem. Now you have the same amount of technical detail but with less explanation. That's not a solution.

Instead, the solution must be lossy. You have to throw away information in order to convey the main point. Sometimes when you throw away that information, things get a little distorted. That is to say, inaccurate. This grates on most engineers. However, it is exactly what is needed.

Let's use an example from another realm of life. There is a story that George Washington cut down a cherry tree when he was young. He was so honest that he went and told his father what he had done. Is the story accurate? We don't know. Probably not fully. That's okay though, we're trying to teach a moral about honesty and to describe the character of the United States' first president. Those goals are both accomplished. Trying to explain how it probably wasn't a cherry tree or it wasn't his father's or how he took a month to tell or even how in other dealings in life he wasn't always honest may be accurate, but it distorts the true picture.

Now let's look at the real-world case. We're trying to convey why we should test audio systems for their output level. The engineer wants to say something like, "Full-Scale Output Level (or just Output Level for short) on a PC is the amplitude of the analog signal that comes out of the jack/speakers when a digital full-scale waveform is applied to the codec." There's a great detailed description here. Instead, to describe this quickly one might just say, "Output level is volume and if it isn't high enough, the volume won't be high enough." That's not fully accurate, but it conveys the necessary information better. If someone is really interested in the subject, there is plenty of time to go into detail.

The important thing is to convey a kernel of truth in a way someone can easily latch onto. Conveying something wholly false is bad, but so is conveying the truth in so much detail that it can't be grasped. Oftentimes it is necessary to trade accuracy for understanding.

Thursday, June 14, 2007

Some Thoughts on Smalltalk

As part of my OO class, I'm learning Smalltalk. More specifically, I'm learning Squeak, which is an open-source implementation of Smalltalk. What follows are some of my observations about the language. I'm assuming that most readers are unfamiliar with it as Smalltalk is not one of the popular languages anymore.

Smalltalk is a very small core language. The syntax describes classes, objects, variables, blocks (closures), and messages. That's about it. All method calling is via message passing.
Everything else is in the library. Even most control flow operations are part of the library, not language primitives.
The library is all open. Not only the library, but the entire environment. Everything has its source code open. Not just source code that is available for download and compiling. It's all in the environment. When you debug an error being thrown, you can debug through the error generation and handling code. Even the UI from which you called the offending code is at the top of the stack. This has the result that you can modify *everything* about the system should you want to.
Because everything is open, the library feels chaotic. Often the same functionality is implemented in different ways in different parts of the system.
Convention rules over language features. Whereas in most languages there is a syntax for pure virtual functions or interfaces, in Smalltalk, there isn't. When you want to declare a pure virtual function, you just have it call subclassResponsibility (or subclassResponsibilityMarker or fred). If you want to instantiate an abstract class, you can. There are no constructors. There are no truly private methods. Both are merely conventions.
There are no source files. Everything is done in the programming environment. There is a way to export .st files but a) they apparently aren't standard and b) they are only partially human readable. If you want to add a method, you click here. If you want to create a class, you click there. As someone who uses Vim to code, this is very strange indeed. This also has the effect of rendering standard tools like grep and regex useless.
Everything is dynamic. There are no types. There are no (or few) errors found by the compiler. If you want to pass an array to something that takes a string, everything will work until you try to use a string method, when it pukes. If there is a syntax bug in your code, you won't find it until you actually execute the method. No wonder these guys pioneered unit testing.
Precedence rules are strange. There are three kinds of messages (unary, binary, and keyword) and they are each executed in a specific order. Within each kind, the ordering is left to right. Standard math conventions are ignored. 8 + 3 * 6 is 66, not 26.
Polymorphism comes from having messages with the same name, not from an interface as modern languages define them. There is no inheritance involved. It's almost like calling methods inside template code or macros in C++.
All data is protected. The only way to access class/instance variables is through methods.
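The precedence point is easier to see with a small sketch. This is Python, not Smalltalk, and the eval_binary_messages helper is something I made up for illustration. It only mimics how a flat chain of binary messages is evaluated strictly left to right, with each operator sent to the running result.

```python
import operator

# Only a few arithmetic operators are supported in this sketch.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def eval_binary_messages(expression: str) -> int:
    """Evaluate a space-separated chain like '8 + 3 * 6' left to right."""
    tokens = expression.split()
    result = int(tokens[0])
    # Each (operator, operand) pair is applied to the running result in order.
    for op, operand in zip(tokens[1::2], tokens[2::2]):
        result = OPS[op](result, int(operand))
    return result


smalltalk_style = eval_binary_messages("8 + 3 * 6")  # (8 + 3) * 6
python_style = 8 + 3 * 6                             # 8 + (3 * 6)

print(smalltalk_style)  # prints 66
print(python_style)     # prints 26
```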

Smalltalk is a very interesting language, but it is old. Newer languages have refined the concepts and implemented them better. I don't think Smalltalk makes a great production language anymore. However, there is a lot that is useful here. Learning it can be beneficial.

Wednesday, June 13, 2007

Week-Long Management Class

Microsoft invests in its managers. This week it has sent me to a week-long residential management class. This is the second such class I've taken. In this one we learned some common management techniques and theories, but most of the time was experiential. You take away so much more when you actually experience things instead of just hearing or seeing them. It's amazing what you can accomplish with 40 Microsoft managers in a week away from their e-mail. I'll try to share some of my key takeaways with you in the coming days.

Monday, June 11, 2007

Coding for Humans

For the class I'm taking, I have to read Smalltalk Best Practice Patterns by Kent Beck. He has an interesting quote in the first chapter. He says, "[W]hen you program, you have to think about how someone will read your code, not just how a computer will interpret it." He's right. The biggest gap I see in new programmers (either new graduates or newly self-taught coders) is that they don't understand maintenance. In school you never have to revisit a program after you hand it in. In writing your own code, you don't often revisit it a long time later and you rarely change someone else's code. In industry, there's a lot of work on old code. For instance, we have a test shell that began life as a 16-bit application. The code originally ran on Win 3.1. That's old. It has undergone a lot of changes since then and is now Unicode-clean, 64-bit clean, and fully object oriented. However, some of the old code still survives. The initial code I wrote to test DVD playback 8 years ago is still used daily. If this code had been written without thought to the next person, it would have been replaced with code that was. It takes too long to read opaque code. It's too easy to make mistakes when maintaining it.

Always write your code in the most direct manner possible. If you can save a few processor cycles but will make the code harder to read, stop. Write the obvious code until/unless a profiler tells you that code path needs to be optimized. Even then, comment the code well. Especially then, comment the code well. The worst thing is to run into optimized code without comments. It's sometimes not even worth trying to understand.
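As a small illustration in Python, here are two versions of the same check. The example is hypothetical and of my own choosing. The first is the obvious code you should write by default. The second is the kind of trick you might reach for only after a profiler points at this function, and it shows how much the comment matters when you do.

```python
def is_power_of_two(n: int) -> bool:
    """Obvious version: keep halving while the number is even."""
    if n < 1:
        return False
    while n % 2 == 0:
        n //= 2
    return n == 1


def is_power_of_two_fast(n: int) -> bool:
    """Optimized version; use only if profiling shows it is needed."""
    # A power of two has exactly one bit set. Subtracting 1 flips that
    # bit and every bit below it, so n & (n - 1) clears the lowest set
    # bit and leaves zero only when there was a single bit to begin with.
    # Example: 8 is 0b1000, 7 is 0b0111, and 8 & 7 == 0.
    return n > 0 and (n & (n - 1)) == 0


# Both versions should agree everywhere before the fast one is trusted.
agree = all(
    is_power_of_two(n) == is_power_of_two_fast(n) for n in range(-4, 1025)
)
print(agree)  # prints True
```

Without the comment, the one-line return in the fast version is exactly the sort of code that makes the next person stop and puzzle.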