Once you know how to use some of the Unix command line utilities, it is hard to go back to doing things the Microsoft way – where data manipulation seems to always be an awkward cut and paste/find and replace/filter with Excel.

I have found that the Native Win32 ports of some GNU utilities fill the gap nicely. The package is a collection of some of the more useful utilities generally found on Unix, and the 3 MB zip file fits easily on a USB key. It is ideal for the times when you just want to grep through a file and don’t want to (or are not permitted to) install a more complex Unix-like environment such as Cygwin.

Some of the utilities I have found useful are…

If I had two files that each contained a list of usernames and I wanted to remove any duplicates, I could use comm to create a list of usernames only in file1, usernames only in file2, and usernames in both files, with comm file1.txt file2.txt. Note that comm expects both files to be sorted.
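
As a rough sketch of the comm case above, assuming two small, already-sorted files of made-up usernames (the file contents and variable names here are hypothetical), the numeric options suppress the columns you do not want:

```shell
# Hypothetical sample data; comm expects both inputs to be sorted.
workdir=$(mktemp -d)
printf 'alice\nbob\ncarol\n' > "$workdir/file1.txt"
printf 'bob\ncarol\ndave\n'  > "$workdir/file2.txt"

# -23 hides columns 2 and 3, leaving usernames only in file1.
only_in_file1=$(comm -23 "$workdir/file1.txt" "$workdir/file2.txt")
# -13 hides columns 1 and 3, leaving usernames only in file2.
only_in_file2=$(comm -13 "$workdir/file1.txt" "$workdir/file2.txt")
# -12 hides columns 1 and 2, leaving usernames present in both files.
in_both=$(comm -12 "$workdir/file1.txt" "$workdir/file2.txt")

echo "only in file1: $only_in_file1"
echo "only in file2: $only_in_file2"
echo "in both:"
echo "$in_both"
```

Run without any options, comm prints all three lists at once as tab-indented columns.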

If I had a csv file containing columns for username, first name, surname, and password, I could easily create a new file with columns for just username and password with cut -d, -f1,4 userfile.txt.
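
A minimal sketch of that cut command, assuming a hypothetical userfile.txt with two made-up rows in the column order described above:

```shell
# Hypothetical CSV: username, first name, surname, password.
workdir=$(mktemp -d)
printf 'jsmith,John,Smith,secret1\nmjones,Mary,Jones,secret2\n' > "$workdir/userfile.txt"

# -d, sets the delimiter to a comma; -f1,4 keeps only fields 1 and 4.
result=$(cut -d, -f1,4 "$workdir/userfile.txt")
echo "$result"
```

Bear in mind that cut splits on every comma, so this only works for simple CSV files with no quoted fields containing commas.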

diff shows the difference between two files. diff3 shows the difference between three files. sdiff shows the differences side by side (kind of like windiff, but text-based).
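
To give a feel for the output, here is a small sketch of diff on two hypothetical files that differ on one line. The file names and contents are made up for illustration.

```shell
# Two hypothetical files that differ only on line 2.
workdir=$(mktemp -d)
printf 'alice\nbob\ncarol\n'   > "$workdir/old.txt"
printf 'alice\nbobby\ncarol\n' > "$workdir/new.txt"

# diff exits with status 1 when the files differ, so "|| true" keeps
# a script running under "set -e" from stopping here.
diff_output=$(diff "$workdir/old.txt" "$workdir/new.txt" || true)
echo "$diff_output"
```

In the default output format, a line such as 2c2 means that line 2 of the first file was changed into line 2 of the second; lines starting with < come from the first file and lines starting with > from the second.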

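GNU Awk (gawk) is a very powerful text manipulation tool. It reads its input line by line, splits each line into fields, and runs small pattern-and-action programs against them.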
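
As one small illustration of the kind of thing Awk is good at, this sketch rearranges the columns of a hypothetical user file (the file name and rows are made up). It calls the program as awk; depending on the installation, GNU Awk may be installed as gawk instead.

```shell
# Hypothetical CSV: username, first name, surname, password.
workdir=$(mktemp -d)
printf 'jsmith,John,Smith,secret1\nmjones,Mary,Jones,secret2\n' > "$workdir/userfile.txt"

# -F, splits each line on commas; $1, $2, $3 are the first three fields.
# The program prints "Surname, Firstname (username)" for every line.
report=$(awk -F, '{ print $3 ", " $2 " (" $1 ")" }' "$workdir/userfile.txt")
echo "$report"
```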

Searching through a file with grep and a regular expression is much more powerful than the Windows search facility.
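
A short sketch of a regular-expression search, assuming a hypothetical list of usernames where I only want the ones that end in digits:

```shell
# Hypothetical list of usernames, one per line.
workdir=$(mktemp -d)
printf 'jsmith\nmjones42\nadmin\ntest7\n' > "$workdir/users.txt"

# -E enables extended regular expressions.
# The pattern matches whole lines made of lowercase letters followed by digits.
matches=$(grep -E '^[a-z]+[0-9]+$' "$workdir/users.txt")
echo "$matches"
```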

If I had two csv files, one with columns username and password, and the other with columns username, first name, and surname, I could use join to achieve the same result as a join in SQL, adding first name and surname to the same line as password wherever the usernames matched. Both files need to be sorted on the join column. join can also output lines where there is no match in the other file (with the -a option); it could be considered the opposite operation to cut.
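
A minimal sketch of that join, assuming two hypothetical files (passwords.csv and names.csv, both made up here) that are already sorted by username:

```shell
# Hypothetical inputs, both sorted on the first column (username).
workdir=$(mktemp -d)
printf 'jsmith,secret1\nmjones,secret2\n'       > "$workdir/passwords.csv"
printf 'jsmith,John,Smith\nmjones,Mary,Jones\n' > "$workdir/names.csv"

# -t, uses a comma as the field separator; the join field defaults to
# the first column of each file.
joined=$(join -t, "$workdir/passwords.csv" "$workdir/names.csv")
echo "$joined"
```

The output has the join column first, then the remaining columns of the first file, then those of the second.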

md5sum generates an MD5 hash of the input file. This is useful for checking the integrity of some of the software you download. Once I had to create a large number of user accounts before running a load test. Rather than creating them through the application front-end by hand or with a LoadRunner script, I inserted them directly into the database. The password field was an MD5 hash of the user’s password.
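
For the password case, a sketch along these lines hashes a string rather than a file. The password value is made up, and whether an application hashes the bare password like this (with no salt or trailing newline) is an assumption to check against the real system.

```shell
# printf '%s' writes the string without a trailing newline, which would
# otherwise change the hash.
# md5sum prints "<hash>  -" for standard input; cut keeps only the hash.
password_hash=$(printf '%s' 'password' | md5sum | cut -d' ' -f1)
echo "$password_hash"
```

To check a download instead, run md5sum on the file and compare the result with the hash published by the site.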

sed, the stream editor, is generally used for simple one-line find and replace operations. I have never used it, as Awk seems to do much the same task. Note that LoadRunner uses sed to convert C-based web scripts to Java (look for the file web_to_java.sed in your LoadRunner installation directory).
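
A typical one-line find and replace might look like the sketch below. The config file and host names are hypothetical.

```shell
# Hypothetical config file that points at a staging server.
workdir=$(mktemp -d)
printf 'host=staging.example.com\nport=8080\n' > "$workdir/config.txt"

# s/old/new/ replaces the first match of "old" on each line;
# append g (s/old/new/g) to replace every match on the line.
updated=$(sed 's/staging/production/' "$workdir/config.txt")
echo "$updated"
```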

sort sorts a file. It can accept multiple files as input, so it is also useful if you want to join web server logs from multiple servers and sort them by time before analysing them.
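
Here is a sketch of the log-merging case. It assumes, for simplicity, that every log line starts with a timestamp that sorts correctly as plain text (such as 2005-01-08 10:00:01); real web server logs often use other date formats and would need sort keys or some preprocessing. The log lines are made up.

```shell
# Hypothetical logs from two servers; each line starts with a
# text-sortable timestamp.
workdir=$(mktemp -d)
printf '2005-01-08 10:00:01 GET /a\n2005-01-08 10:00:05 GET /c\n' > "$workdir/server1.log"
printf '2005-01-08 10:00:03 GET /b\n' > "$workdir/server2.log"

# Given several files, sort reads them all and writes one sorted stream.
sort "$workdir/server1.log" "$workdir/server2.log" > "$workdir/combined.log"
combined=$(cat "$workdir/combined.log")
echo "$combined"
```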

split splits a single file into multiple files. This can be based on the number of lines or the number of bytes. If you’ve ever needed to quickly divide a file of usernames between automated test scripts, this is much less painful than the old Excel cut-and-paste.
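
As a sketch of dividing usernames between test scripts, assume a hypothetical file of five usernames that should be handed out two at a time:

```shell
# Hypothetical list of five usernames.
workdir=$(mktemp -d)
printf 'user1\nuser2\nuser3\nuser4\nuser5\n' > "$workdir/users.txt"

# -l 2 puts at most two lines in each output file. The last argument is
# the output prefix; split appends the suffixes aa, ab, ac and so on.
split -l 2 "$workdir/users.txt" "$workdir/part_"

first_part=$(cat "$workdir/part_aa")
last_part=$(cat "$workdir/part_ac")
echo "first file: $first_part"
echo "last file: $last_part"
```

To split by size instead, use -b with a byte count in place of -l.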

uniq outputs the unique lines in the input file. This utility only compares adjacent lines, so it is usually necessary to sort the list first.
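
The adjacent-lines behaviour is easy to see with a small sketch. The list of names is hypothetical, and none of its duplicates sit next to each other.

```shell
# Hypothetical unsorted list with non-adjacent duplicates.
workdir=$(mktemp -d)
printf 'bob\nalice\nbob\ncarol\nalice\n' > "$workdir/names.txt"

# Without sorting, no duplicates are adjacent, so nothing is removed.
unsorted_count=$(uniq "$workdir/names.txt" | wc -l | tr -d ' ')

# Sorting first brings duplicates together so uniq can drop them.
deduped=$(sort "$workdir/names.txt" | uniq)

echo "lines without sorting: $unsorted_count"
echo "$deduped"
```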

wc counts the number of lines, words and characters in the input file.
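
A quick sketch of the three counts on a tiny, made-up file. Note that -c counts bytes, which equals the number of characters only for single-byte text such as this plain ASCII sample.

```shell
# Hypothetical two-line file.
workdir=$(mktemp -d)
printf 'alice bob\ncarol\n' > "$workdir/sample.txt"

# Reading from standard input keeps the file name out of the output;
# tr strips the padding spaces that some wc implementations add.
line_count=$(wc -l < "$workdir/sample.txt" | tr -d ' ')
word_count=$(wc -w < "$workdir/sample.txt" | tr -d ' ')
byte_count=$(wc -c < "$workdir/sample.txt" | tr -d ' ')

echo "lines=$line_count words=$word_count bytes=$byte_count"
```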

If you’ve ever wanted to download web content without having to sit in front of your computer while you do it, wget can be a very powerful tool. Making a copy of an entire website is as simple as wget --mirror --domains=www.myloadtest.com --wait=5 www.myloadtest.com


Published On: January 8, 2005

One Comment

  1. Anonymous June 19, 2005 at 7:53 pm

    See a good explanation of wget at everything2.
