Thursday, November 27, 2008
Happy Thanksgiving 2008!
Wednesday, November 26, 2008
Using Perl for Mass In Place Editing
Have you ever wanted to update some text in a bunch of files all at once without a hassle? I had reason to do this recently and turned to Perl for my solution. Once I found the implementation quirks, it turned out to be quite easy. This functionality is pretty well documented, but not for Windows. There are some gotchas trying to do it there. What follows is what I found to work.
I had a set of files that were database queries for a particular milestone in our product release. Let's call it M1. I needed to update them all for a different milestone that we'll call M2. Each file was an XML version of a query, so text processing seemed like the logical course of action. Perl is good at text processing, so I went that route. First off, I had to install Perl (I used ActivePerl).
Perl can accept a single line of code at the command prompt, so there is no need to actually write a Perl script. The switch for this is -e (execute?). To edit a file in place, add the -i (in place) switch. The command looks something like this:
perl -pi.bak -e "s/M1/M2/g" file.txt
The -i.bak means to edit the files in place and keep a copy of each original with the .bak extension added. In theory -i alone will overwrite the originals without keeping a backup, but ActivePerl wouldn't accept this.
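The -p tells perl to wrap the command in a loop that reads every line of its input (here, every line of the file), runs the command on it, and then prints the result. Combined with -i, the printed lines end up back in the file.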
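For the curious, here is roughly what -p and -i.bak amount to when written out as a script. Treat it as a sketch based on the perlrun and perlvar documentation rather than the exact code Perl generates. The replace_in_files name and its arguments are made up for illustration, and unlike the one-liner it matches the search text literally.

```perl
use strict;
use warnings;

# Hypothetical helper: roughly what  perl -pi.bak -e "s/M1/M2/g" FILES  does.
sub replace_in_files {
    my ($from, $to, @files) = @_;
    return unless @files;        # with no files, <> would read standard input

    # $^I holds the in-place edit extension (the value given to -i);
    # @ARGV is the list of files that <> reads from.
    local ($^I, @ARGV) = ('.bak', @files);

    while (<>) {                 # -p: loop over every line of every file
        s/\Q$from\E/$to/g;       # the -e code (\Q...\E makes the match literal)
        print;                   # -p prints the line; -i sends it to the file
    }
    return;
}

# Treat any command-line arguments as the files to edit.
replace_in_files('M1', 'M2', @ARGV) if @ARGV;
```

Saved under a name of your choosing and run with a file name as its argument, this should leave the edited file in place with the original next to it under a .bak extension.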
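The "s/M1/M2/g" command is a substitution that replaces whatever matches the regular expression M1 with M2. The trailing g (global) makes it replace every occurrence on a line rather than just the first. The pattern could be any regular expression. Note that most examples of this online wrap the code in single quotes ( ' ), but this doesn't work on Windows, where the cmd prompt only treats double quotes ( " ) as quoting. One hint: if the command fails, try adding -w to the command line to generate warnings.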
The above command will change all instances of M1 to M2 in file.txt. What I wanted was to replace it in every file. Simple, I'll just change file.txt to *.*. Sorry, no dice. ActivePerl doesn't accept this, nor does it accept *. Unlike Unix shells, the cmd prompt passes wildcards through unexpanded, and Perl doesn't expand them on its own, so it goes looking for a file literally named *.*. Time for some more command-line action. The cmd prompt has a for command that fits the bill. Use it like so:
for %i in (*.txt) do perl -pi.bak -e "s/M1/M2/g" "%i"
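This command iterates over all the files matching *.txt and executes the command following the do once per file. You have to quote the trailing %i because filenames might contain spaces. If you put the line in a batch file rather than typing it at the prompt, write %%i instead of %i.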
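A possible alternative to the for loop, which I have not tried on ActivePerl, is to let Perl expand the wildcard itself. The sketch below assumes the core File::Glob module and its bsd_glob function. The expand_patterns helper and the subst.pl script name are made up for illustration.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);   # core module; bsd_glob does not split patterns on spaces

# Hypothetical helper: turn patterns such as "*.txt" into the matching
# file names, which is the step the cmd prompt leaves undone.
sub expand_patterns {
    my @patterns = @_;
    return map { bsd_glob($_) } @patterns;
}

# Usage sketch:  perl subst.pl *.txt
my @files = expand_patterns(@ARGV);
if (@files) {                                  # do nothing if no file matched
    local ($^I, @ARGV) = ('.bak', @files);     # same effect as -i.bak
    while (<>) {
        s/M1/M2/g;
        print;
    }
}
```

The same idea is often squeezed into a one-liner with a BEGIN block. Note that the built-in glob splits its argument on whitespace, so a pattern containing spaces would need bsd_glob as above:

perl -pi.bak -e "BEGIN { @ARGV = map { glob($_) } @ARGV } s/M1/M2/g" *.txt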
There you go, quick-and-dirty text replacement from the command line. Note that Perl regular expressions are capable of much more than simple search and replace. You can use this technique to accomplish anything they can.
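As a small taste of "much more", here is a sketch that bumps every milestone tag by one instead of replacing a fixed string. It is not something I needed for my query files. The bump_milestones name and the sample text are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: bump every milestone tag in a string (M1 -> M2, M2 -> M3, ...).
sub bump_milestones {
    my ($text) = @_;
    # \b keeps "M11" inside a word like "ARM11" from matching;
    # (\d+) captures the number; /e evaluates the replacement as Perl code;
    # /g repeats the substitution for every match.
    $text =~ s/\bM(\d+)\b/'M' . ($1 + 1)/ge;
    return $text;
}

print bump_milestones("Fix for M1, verify in M2, ignore ARM11\n");
# prints: Fix for M2, verify in M3, ignore ARM11
```

The same substitution should drop into the one-liner form, since the single quotes sit inside the double-quoted code:

perl -pi.bak -e "s/\bM(\d+)\b/'M' . ($1 + 1)/ge" file.txt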
Is there an even simpler way to do this? There probably is. If so, please let me know in the comments.
12/4/08 - Updated with -p thanks to Maurits.