Objects in the shell: why PowerShell’s design makes sense to me

A while back, a friend said “PowerShell is what happens when you ask a bunch of drunk lizards to make Bash shitty.” Another friend replied that his understanding was that PowerShell was driven by a desire to “modernize” the shell by piping objects instead of strings. Now, knowing that I got my start as a Unix and Linux sysadmin, you might expect me to take the “it’s Bash, except awful” side. But you’d be wrong.

Full disclosure: I have not used PowerShell in any meaningful sense. But from what I know about it, it represents a real improvement over the traditional Unix shell (of whatever flavor) for certain use cases. Some sysadmins lionize the shell script as the pinnacle of sysadminry. This is in part because it’s what we know and also because it’s hard. Oh sure, writing trivial scripts is easy, but writing good, robust scripts? That can be a challenge.

Shell scripts are a glue language, not a programming language (yes, you can write some really complicated stuff in shell scripts, but really what you’re doing is gluing together other commands). PowerShell, in my view, is closer to a programming language that you can script with. This fits well with the evolution in systems administration. Sysadmins in most environments are expected to be able to do some light programming at a minimum. We’re moving to a world where the API is king. Provisioning and configuring infrastructure is now a very code-heavy exercise.

The object focus of PowerShell is a truly compelling feature. I think about all the times I’ve had to use awk, sed, cut, and others to slice up the output of a command in order to feed selected parts into the next command or to re-order the output. A machine-parseable medium like JSON or XML makes that kind of programmatic gluing much easier.
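
To make that concrete, here’s a rough shell sketch of the two styles. The jobs.json file and its fields are invented for illustration; the point is that structured output is addressed by name rather than by column position.

# Positional slicing: pull the mount point and use% columns out of df output.
# The column numbers are load-bearing; a mount point with a space in it breaks the math.
df -P | awk 'NR > 1 {print $6 "," $5}'

# Structured output: if the producer emits JSON (jobs.json is hypothetical),
# jq selects and re-orders fields by name instead of by position.
jq -r '.jobs[] | [.id, .start_time, .host] | @csv' jobs.json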

When running interactively in the shell, strings are much easier for humans to deal with. In those cases, convert the objects to strings. But crippling machine output for the sake of humans doesn’t seem productive. At least not when both can get what they need.

If the Unix shell were being designed today, I think it would have some very PowerShell-like features. PowerShell has the advantage of being a latecomer. As such, it can learn from mistakes of the past without being constrained by legacy limitations. Microsoft is serious about making Windows a DevOps-ready platform, as Jeffrey Snover said at LISA 16. To do this requires a break from historical norms, and that’s not always bad.

Parsing SGE’s qacct dates

Recently I was trying to reconstruct a customer’s SGE job queue to understand why our cluster autoscaling wasn’t working quite right. The best way I found was to dump the output of qacct and grep for {qsub,start,end}_time. Several things made this unpleasant. First, the output is not de-duplicated on job ID: jobs that span multiple hosts get listed multiple times. Second, the dates are in a nearly-but-not-quite “normal” format. For example: “Tue Mar 18 13:00:08 2014”.
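
That dump-and-grep step amounts to something like this (a sketch; qacct -j with no job ID should print the full record for every job):

# Dump accounting records for every job, keeping only the timestamp lines.
qacct -j | grep -E '^(qsub|start|end)_time'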

What can you do with a date like that? Not a whole lot. It’s not a format that spreadsheets will readily treat as a date, so if you want to do spreadsheety things, you’re forced to either manually enter them or write a shell function to do it for you:

function qacct2excel { echo "=`date -f '%a %b %d %T %Y' -j \"$1\" +%s`/(60*60*24)+\"1/1/1970\""; }

The above works on OS X because it relies on the BSD (non-GNU) date command and its -j and input-format -f flags. On Linux, you’ll need a different set of arguments, which I haven’t bothered to figure out. It’s still not awesome, but it’s slightly less tedious this way. At some point, I might write a parser that does what I want qacct to do, instead of what it does.
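
If I had to guess, GNU date’s -d flag would parse that format without needing an explicit input format, so an untested Linux variant might look something like:

# Linux sketch: GNU date -d parses the qacct timestamp directly (untested).
function qacct2excel { echo "=`date -d \"$1\" +%s`/(60*60*24)+\"1/1/1970\""; }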

It’s entirely possible that there’s a better way to do this. The man page didn’t seem to have any helpful suggestions, though. I hate to say “SGE sucks” because I know very little about it. What I do know is that it’s hard to find good material for learning about SGE. At least HTCondor has thorough documentation and tutorials from HTCondor Week posted online. Perhaps one of these days I’ll learn more about SGE so I can determine whether it sucks or not.

Ugly shell commands

Log files can be incredibly helpful, but they can also be really ugly.  Pulling information out programmatically can be a real hassle.  When a program exists to extract useful information (see: logwatch), it’s cause for celebration.  The following is what can happen when a program doesn’t exist (and yes, this code actually worked).

The scenario here is that a user complained that Condor jobs were failing at a higher-than-normal rate.  Our suspicion, based on a quick look at his log files, was that a few nodes were eating most of his jobs.  But how to tell?  I wanted a spreadsheet with the job ID, the date, the time, and the last execute host for all of the failed jobs.  I could either task a student with manually pulling this information out of the log files, or pull it out myself with some shell magic.

The first step was to get the job ID, the date, and the time from the user’s log files:

grep -B 1 Abnormal ~user/condor/t?/log100 | grep "Job terminated" | awk '{print $2 "," $3 "," $4 }' | sed "s/[\(|\)]//g" | sort -n > failedjobs.csv

What this does is search the log files for the word “Abnormal”, printing one line before each match because that’s where the information we want lives.  To pull that line out, we search for “Job terminated” and then print the second, third, and fourth fields, stripping the parentheses off of the job ID, sorting numerically, and writing the result to failedjobs.csv.

The next step is to get the last execute node of the failed jobs from the system logs:

for x in `cat failedjobs.csv | awk -F, '{print $1}'`; do
    # Find the last "Job executing" entry for this job ID and keep what sits
    # inside the angle brackets, left of the colon: the execute host's IP.
    host=`grep "$x.*Job executing" /var/condor/log/EventLog* | tail -n 1 | sed -r "s/.*<(.*):.*/\1/g"`
    # Reverse-resolve the IP to a hostname and append it to the output file.
    echo "`host $host | awk '{print $5}'`" >> failedjobs-2.csv;
done

Wow.  This loop pulls the first field out of the CSV we made in the first step.  The IP address for each failed job is pulled from the Event Logs by searching for the “Job executing” string.  Since a job may execute on several different hosts in its lifetime, we want to only look at the last one (hence the tail command), and we pull out the contents of the angle brackets left of the colon.  This is the IP address of the execute host.

With that information, we use the host command to look up the hostname that corresponds to that IP address and write it to a file.  Now all that remains is to combine the two files and try to find something useful in the data.  And maybe to write a script to do this, so that it will be a little easier the next time around.
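
The combining step is just a column-wise merge. Since both files came from the same sorted job list, something like paste should do it (a sketch, assuming the two files are still in matching order):

# Glue the job/date/time columns and the hostname column back together,
# one failed job per row. Assumes both files are in the same order.
paste -d, failedjobs.csv failedjobs-2.csv > failedjobs-combined.csv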