How I broke KDE Plasma by changing my shell (and also writing a bad script)

My friends, I’d like to tell you the story of how I spent Monday morning. I had a one-on-one with my manager and a team coffee break to start the day. Since the weather was so nice, I thought I’d take my laptop and my coffee out to the deck. But when I tried to log in to my laptop, all I had was the mouse cursor. Oh no!

I did my meeting with my manager on my phone and then got to work trying to figure out what went wrong. I saw some errors in the journal, but it wasn’t clear to me what was wrong.

Aug 31 09:23:00 fpgm akonadi_control[5155]: org.kde.pim.akonadicontrol: ProcessControl: Application '/usr/bin/akonadi_googlecalendar_resource' returned with exit code 253 (Unknown error)
Aug 31 09:23:00 fpgm akonadi_googlecalendar_resource[6249]: QObject::connect: No such signal QDBusAbstractInterface::resumingFromSuspend()
Aug 31 09:23:00 fpgm akonadiserver[5159]: org.kde.pim.akonadiserver: New notification connection (registered as Akonadi::Server::NotificationSubscriber(0x7f4d9c010140) )
Aug 31 09:23:00 fpgm akonadi_googlecalendar_resource[6249]: Icon theme "breeze" not found.
Aug 31 09:23:00 fpgm akonadiserver[5159]: org.kde.pim.akonadiserver: Subscriber Akonadi::Server::NotificationSubscriber(0x7f4d9c010140) identified as "AgentBaseChangeRecorder - 94433180309520"
Aug 31 09:23:01 fpgm akonadi_googlecalendar_resource[6249]: kf5.kservice.services: KMimeTypeTrader: couldn't find service type "KParts/ReadOnlyPart"
                                                           Please ensure that the .desktop file for it is installed; then run kbuildsycoca5.

What broke

Before starting the weekend, I had updated all of the packages, as I normally do, but none of the updated packages seemed relevant, and I hadn’t done any weird customization. As I tried to work through it with “pino|work” on IRC, I remembered that I had added a startup script to set the XDG_DATA_DIRS environment variable in the hopes of getting installed flatpaks to show up in the menu. (Hold on to this thought; it becomes important again later.)

To get things cleaned up, I moved the script out of the way and removed the plasma-org.kde.plasma.desktop-appletsrc and plasmashellrc files. Looking at the script, I realized I had a syntax error: a stray single quote had ended up in the line that set XDG_DATA_DIRS. Yay! That’s easy enough to fix.
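
To give you an idea of how small the mistake was, the offending line was shaped something like this (a reconstruction, not the exact script; the flatpak export paths are the real ones):

# A stray single quote at the end of the line breaks the assignment
export XDG_DATA_DIRS="$XDG_DATA_DIRS:/var/lib/flatpak/exports/share:$HOME/.local/share/flatpak/exports/share'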

Why it broke

Except it was still broken. The script referenced XDG_DATA_DIRS, but the variable was undefined, so instead of appending the flatpak directories to the default search path, it replaced the whole path, and Plasma could no longer find its own data files. But why didn’t the script inherit the variable? Ohhhhh, because fish doesn’t use the /etc/profile.d directory.

So remember how I did this in order to get Flatpaks to show up in my start menu? I could have sworn they did at some point. It turns out that I was right. The flatpak package installs the scripts into /etc/profile.d, which fish doesn’t read. So when I switched my shell from Bash to fish a while ago, those scripts never ran at login.

How I “fixed” it

To fix my problem, I could have written scripts that work with fish. Instead, I decided to take the easy route and change my shell back to bash. But in order to keep using fish, I set Konsole to launch fish instead of bash. Since I only ever do a graphical login on my desktop, that’s no big deal, and it avoids a lot of headache.
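
For the record, the whole “fix” was two small changes (the Konsole menu path is from memory; adjust to taste):

# Change the login shell back to bash so /etc/profile.d runs at login
chsh -s /bin/bash
# Then, in Konsole: Settings > Edit Current Profile > Command: /usr/bin/fish

The fish-native route I skipped would have been a snippet along these lines in ~/.config/fish/conf.d/ (an untested sketch; the file name is made up):

# ~/.config/fish/conf.d/flatpak.fish
# Seed the XDG spec default first, since nothing else sets it under fish
set -q XDG_DATA_DIRS; or set -gx XDG_DATA_DIRS "/usr/local/share:/usr/share"
set -gx XDG_DATA_DIRS "$XDG_DATA_DIRS:/var/lib/flatpak/exports/share:$HOME/.local/share/flatpak/exports/share"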

The bummer of it all is that I lost some of the configuration I had in the files I deleted. Apparently the failed logins made it far enough to modify those files in a way that Plasma doesn’t like, so they had to go. At any rate, I didn’t do much customization, so I didn’t lose much either.

Objects in the shell: why PowerShell’s design makes sense to me

A while back, a friend said “PowerShell is what happens when you ask a bunch of drunk lizards to make Bash shitty.” Another friend replied that, as he understood it, PowerShell is driven by a desire to “modernize” the shell by piping objects instead of strings. Knowing that I got my start as a Unix and Linux sysadmin, you might expect me to take the “it’s Bash, except awful” side. But you’d be wrong.

Full disclosure: I have not used PowerShell in any meaningful sense. But from what I know of it, it represents a real improvement over the traditional Unix shell (of whatever flavor) for certain use cases. Some sysadmins lionize the shell script as the pinnacle of sysadminry, partly because it’s what we know and partly because it’s hard. Oh sure, writing trivial scripts is easy, but writing good, robust scripts? That can be a challenge.

Shell scripts are a glue language, not a programming language (yes, you can write some really complicated stuff in shell, but what you’re really doing is gluing together other commands). PowerShell, in my view, is closer to a programming language that you can script with. That fits the evolution of systems administration: sysadmins in most environments are expected to do at least some light programming, we’re moving to a world where the API is king, and provisioning and configuring infrastructure is now a very code-heavy exercise.

The object focus of PowerShell is a truly compelling feature. I think of all the times I’ve had to use awk, sed, cut, and friends to slice up the output of one command so I could feed selected parts into the next command, or to re-order the output. A machine-parseable medium like JSON or XML makes that kind of programmatic gluing much easier.
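
Unix is slowly growing a version of this idea, and it gives a taste of the difference. Compare slicing fields by position against asking for them by name, using iproute2’s JSON output and jq as a stand-in for the object pipeline (an analogy, not PowerShell itself):

# By position: grab field 2 of the "inet" lines and hope the layout never changes
ip addr show | awk '/inet /{print $2}'

# By name: ask the JSON output for the "local" address field
ip -j -4 addr show | jq -r '.[].addr_info[].local'

The second version keeps working even if the human-readable layout changes, which is the whole point of passing structure instead of strings.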

Strings are much easier for humans to deal with when you’re working interactively in the shell, and in those cases, converting the objects to strings makes sense. But crippling machine output for the sake of humans doesn’t seem productive. At least not when both can get what they need.

If the Unix shell were being designed today, I think it would have some very PowerShell-like features. PowerShell has the advantage of being a latecomer. As such, it can learn from mistakes of the past without being constrained by legacy limitations. Microsoft is serious about making Windows a DevOps-ready platform, as Jeffrey Snover said at LISA 16. To do this requires a break from historical norms, and that’s not always bad.

Ugly shell commands

Log files can be incredibly helpful, but they can also be really ugly.  Pulling information out programmatically can be a real hassle.  When a program exists to extract useful information (see: logwatch), it’s cause for celebration.  The following is what can happen when a program doesn’t exist (and yes, this code actually worked).

The scenario here is that a user complained that Condor jobs were failing at a higher-than-normal rate. Our suspicion, based on a quick look at his log files, was that a few nodes were eating most of his jobs. But how to tell? I wanted a spreadsheet with the job ID, the date, the time, and the last execute host for every failed job. I could either task a student with manually pulling this information out of the log files, or I could pull it out with some shell magic.

The first step was to get the job ID, the date, and the time from the user’s log files:

grep -B 1 Abnormal ~user/condor/t?/log100 | grep "Job terminated" | awk '{print $2 "," $3 "," $4}' | sed 's/[()]//g' | sort -n > failedjobs.csv

What this does is search the multiple log files for the word “Abnormal”, printing the line before each match as well, because that’s where the information we want lives. We filter those for “Job terminated”, pull out the second, third, and fourth fields, strip the parentheses off of the job ID, sort numerically, and write the result to failedjobs.csv.
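
For reference, the entries being matched look roughly like this in a Condor user log (an illustration from memory; the exact format varies by version):

005 (100.000.000) 08/31 09:23:00 Job terminated.
    (0) Abnormal termination (signal 11)

Fields two through four of the “Job terminated” line are the parenthesized job ID, the date, and the time.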

The next step was to get the last execute node for each failed job from the system logs:

for x in $(awk -F, '{print $1}' failedjobs.csv); do
    # Find the last "Job executing" event for this job; the execute host's
    # IP address sits inside the <...> to the left of the colon
    host=$(grep "$x.*Job executing" /var/condor/log/EventLog* | tail -n 1 | sed -r 's/.*<(.*):.*/\1/')
    # Reverse-resolve the IP address; field 5 of host's output is the hostname
    host "$host" | awk '{print $5}' >> failedjobs-2.csv
done

Wow. This loop pulls the first field out of the CSV we made in the first step. For each failed job, it finds the IP address of the execute host by searching the event logs for the “Job executing” string. Since a job may execute on several different hosts in its lifetime, we only look at the last match (hence the tail command), and we pull out the contents of the angle brackets to the left of the colon. That’s the IP address of the execute host.
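
The event being matched looks something like this (again, an approximation), with the execute host’s address inside the angle brackets:

001 (100.000.000) 08/31 09:24:13 Job executing on host: <192.168.10.42:9618>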

With that information, we use the host command to look up the hostname that corresponds to that IP address and write it to a file.  Now all that remains is to combine the two files and try to find something useful in the data.  And maybe to write a script to do this, so that it will be a little easier the next time around.
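
To close the loop: the fifth field of the host command’s output is the resolved name, which is why the awk in the loop prints $5 (the hostname below is made up):

host 192.168.10.42
42.10.168.192.in-addr.arpa domain name pointer node042.example.com.

And combining the two files is a one-liner as well, something like this (it assumes the rows in both files are still in the same order, which they are, since the second file was generated from the first):

# Stitch the two files together column-wise: jobid,date,time,hostname
paste -d, failedjobs.csv failedjobs-2.csv > failedjobs-report.csv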