Blog Fiasco

Where I work, we have about thirty networked printers that we manage with CUPS. When CUPS encounters something it doesn’t like, it has a tendency to shut down the affected queue. Sometimes this happens when the server can no longer communicate with the target printer. Sometimes, it happens when a job is submitted with a bad print driver. There have been a few occasions when I’m fairly certain that gremlins were to blame. Whatever the reason, when the queue gets disabled, people can still print to it, but the jobs just pile up.

You might expect that people would complain quickly when their print jobs don’t come out. That’s not always the case. Earlier this week, one of the most heavily used printers was disabled for about a day before anyone brought it to my attention. By that time, dozens of jobs were waiting, and it took over an hour (and a few paper refills) for them all to come out. A quick “cupsenable” is all it takes to fix the problem, so there’s no reason to wait for it to get worse.

Today, I wrote a script (below) to monitor the queues and send a notification to our ticket system if one is down. I considered having it run cupsenable first, but I decided it would be better to not mask the problem. My wife wisely pointed out that if one particular queue is failing regularly, it might be worth knowing about.

#!/bin/bash
# check_cups
#
# Ben Cotton - 2009
#
# Checks the status of the CUPS queues and notifies if one is down
#

# The queues we want to check
queues=`lpstat -a | awk '{print $1}'`

# The e-mail address to notify when a queue goes byebye
toEmail='help@example.com'

# The e-mail address to use as the 'from'
fromEmail='cups@example.com'

# The directory to keep the lock files
lockDir='/var/run'

# The host to say we're complaining about
hostName=`hostname | awk -F. '{print $1}'`

# Where is mail?
mail='/bin/mail -s'

# Check the queues
for queue in $queues; do
    lpq -P "$queue" | /usr/xpg4/bin/grep -q " not "
    if [ $? -eq 0 ]; then
        # Have we already complained about this queue?
        if [ -e "$lockDir/check_cups.$queue" ]; then
            true
        else
            touch "$lockDir/check_cups.$queue"
            $mail $toEmail <<EOF
To: $toEmail
From: $fromEmail
Subject: Print queue $queue on $hostName down

The print queue $queue on host $hostName appears to be down.
EOF
        fi
    else
        # The queue is up, check to see if it wasn't previously
        if [ -e "$lockDir/check_cups.$queue" ]; then
            rm "$lockDir/check_cups.$queue"
        fi
    fi
done

3 thoughts on “”

Ben,

Sounds like you need a notification system. I’m in the middle of my 3rd Nagios install/config, and I’d really recommend it. There’s also Zenoss, but from my experience, it’s much more complicated to configure, though it does do graphing inherently.

It would be really easy to convert the script you wrote into a Nagios plugin, btw. 🙂 I’d be happy to lend a hand if you wanted.

–Matt

Matt,

Actually, we’ve got a Nagios setup that we’re starting to use. I went the bash script route for this because I wanted it to behave exactly how I asked it to without much effort. It’s probably possible to get the same behavior out of a Nagios plugin, but right now all of our Nagios checks behave differently than this particular check does. So this is admittedly a quick fix until I’ve had time to get the base of our Nagios setup configured.

I’ll probably hit you up for some Nagios advice in the future. If you have any checks that monitor the status of NFS, CUPS, and Samba services, I’d certainly be interested in ~~stealing~~ seeing them.

Writing Nagios plugins is all about translating your real-world, manual checks into script-testable conditions.

At this point, 80% of my newest checks are done using SNMP on the Linux hosts. With SNMP, I can get a list of all processes running on the remote machines (with PIDs), free disk space, and things of that nature. The other 20% are shell scripts in which the Nagios host SSHes into the host that it’s checking and runs a command; the script then parses the output and determines whether the host is OK or not. The shell script then returns the proper output and return code to let Nagios know what happened.
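As a rough illustration of that last step, here is a minimal sketch of a Nagios-style wrapper around lpq output. It is not a plugin anyone in this thread actually runs: the function name classify_lpq_output and the message wording are made up for the example, and the " not " match is simply borrowed from the check_cups script in the post. The one fixed part is the plugin exit-code convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```shell
#!/bin/bash
# Hypothetical Nagios-style check for a CUPS queue (illustrative sketch only).
# Nagios plugin exit codes: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.

# Reads lpq output on stdin, prints a one-line status message, and
# returns the matching Nagios exit code.
# $1 is the queue name; it is used only in the message.
classify_lpq_output() {
    local queue="$1"
    local output
    output=$(cat)

    # No output at all usually means lpq could not be run or reached.
    if [ -z "$output" ]; then
        echo "UNKNOWN: no lpq output for queue $queue"
        return 3
    fi

    # Same heuristic as check_cups: " not " (as in "is not ready") is
    # treated as a down queue. Assumes a grep that supports -q.
    if printf '%s\n' "$output" | grep -q " not "; then
        echo "CRITICAL: print queue $queue appears to be down"
        return 2
    fi

    echo "OK: print queue $queue looks ready"
    return 0
}

# Example use as a plugin body (queue name taken from the first argument):
#   lpq -P "$1" 2>/dev/null | classify_lpq_output "$1"
#   exit $?
```

Keeping the parsing in a function that reads stdin means the same code works whether the lpq output comes from a local command or from one run over ssh on the remote host.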

SNMP is about 10 times faster than SSH, which is why I’m using that now, but for some things, you’ve got to get in there and run commands. If you’ve got these printers shared through some mechanism on the network, checking them may be as simple as setting them up on the Nagios server, then querying their status with lpq or something similar.

Anyway, keep updating us on how it goes 🙂 I’ll see what I can do about the various service checks, but mine are a combination of checking the open port and checking that the service is running. It would be better to actually attempt to use the service, but I was in a hurry when I was writing mine, as well.
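For what it's worth, a "port is open and process is running" check of the kind described above could look roughly like the sketch below. Everything in it is an assumption for illustration: the helper names (port_open, process_running, check_service) are invented, the process lookup uses a local pgrep rather than SNMP, and the port test relies on bash's /dev/tcp redirection plus the coreutils timeout command, neither of which is available everywhere.

```shell
#!/bin/bash
# Hypothetical service check: process present + TCP port open.
# Helper names and the 3-second timeout are assumptions, not from the thread.

# Succeeds if a TCP connection to HOST:PORT opens within 3 seconds.
# Needs a bash built with /dev/tcp support and the `timeout` command.
port_open() {
    local host="$1" port="$2"
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Succeeds if at least one local process has exactly this name.
process_running() {
    pgrep -x "$1" >/dev/null 2>&1
}

# Combines both checks and reports with Nagios-style exit codes
# (0 = OK, 2 = CRITICAL).
# Usage: check_service PROCESS_NAME HOST PORT
check_service() {
    local name="$1" host="$2" port="$3"

    if ! process_running "$name"; then
        echo "CRITICAL: $name process not found"
        return 2
    fi

    if ! port_open "$host" "$port"; then
        echo "CRITICAL: $name is running but $host:$port is not accepting connections"
        return 2
    fi

    echo "OK: $name is running and $host:$port is open"
    return 0
}

# Example: CUPS normally listens on port 631.
#   check_service cupsd localhost 631
```

As noted above, this only shows that something is listening; it does not prove the service can actually handle a request.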

Blog Fiasco

The world's only(?) FOSS/weather/sports/marketing/high-performance computing blog