Blog Fiasco

July 1, 2014

Samba configuration: the ultimate cargo cult

Filed under: Linux — Tags: — bcotton @ 4:45 pm

Samba is a magical tool that allows *nix and Windows machines to coexist in some forms of peace. It’s particularly helpful when you want to share files across platforms. I’ve maintained Samba servers at work and at home for nearly a decade now and I don’t pretend to understand it.

Over the years, I’ve come to view Samba as the poster child for cargo cult system administration. I suspect most people Google for their problem and apply whatever magic totem fixes it, without really understanding what’s actually going on. They share this knowledge and perpetuate the magical configuration. Allow me to do the same.

For one of the applications we support at my current job, our normal cluster configuration is a Linux file server with Windows execute nodes. The server provides anonymous read/write access to the execute nodes and forces the user server-side. (It’s a closed environment, so this is just a lot simpler.) During a recent project, we were doing a customer’s first foray into the cloud. We started from a configuration that we used for another customer running the same application. Oh, but this customer uses RHEL 6 servers, so we switched the setup from the RHEL 5 images we had been using.

Crap. That broke it. For some reason, the clients couldn’t write to the file server. After a late night of frantic effort (this was a project with a short timeline), we found we needed to add the following lines:

guest account = rap
map to guest = Bad User
valid users = rap, @rap
force group = rap
guest ok = yes

That seemed to solve the problem. Apparently there were some changes between the versions of Samba in RHEL 5 and 6. But then we discovered that hosts would start to write and then become unable to access the share. So we added the following:

writeable = yes
guest only = yes
acl check permissions = False

Oh, but then it turns out that sharing a directory over both Samba and NFS can cause weird timestamp issues. After some experimentation, we found it was necessary to stop using oplocks:

kernel oplocks = no
oplocks = no
level2 oplocks = no

So here’s our final, working config. Cargo cult away!

[global]
workgroup = WORKGROUP
netbios name = Samba
encrypt passwords = yes
security = share
log level = 2
socket options = TCP_NODELAY IPTOS_LOWDELAY SO_KEEPALIVE SO_RCVBUF=8192 SO_SNDBUF=8192
kernel oplocks = no
oplocks = no
level2 oplocks = no
max xmit = 65535
dead time = 15
getwd cache = yes
printcap name = /etc/printcap
use sendfile = yes
guest account = rap
map to guest = Bad User

[rap]
comment = File Share
path=/vol/smb/rap
force user = rap
valid users = rap, @rap
force group = rap
read only = no
writeable = yes
browseable = yes
public = yes
guest ok = yes
guest only = yes
acl check permissions = False
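If you find yourself cargo-culting this config, it's worth at least verifying the result. A quick smoke test, assuming a Linux client with the Samba utilities installed (the server name and test file are hypothetical):

```shell
# Validate the config syntax on the server first:
testparm -s /etc/samba/smb.conf

# Anonymous (guest) listing -- -N skips the password prompt:
smbclient -N //fileserver/rap -c 'ls'

# Confirm guest writes work, since that is what broke here:
echo test > /tmp/smoke.txt
smbclient -N //fileserver/rap -c 'put /tmp/smoke.txt smoke.txt; rm smoke.txt'
```

If the `put` fails while the `ls` succeeds, you're back in guest-mapping territory and the `map to guest` / `guest only` lines above are the place to look.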

April 19, 2014

The right way to do release notes

Filed under: Linux,Project Management,The Internet — Tags: — bcotton @ 8:52 pm

Forever ago (in Internet time), the developer(s?) of Pocket Casts released an update with some really humorous release notes:

Release notes for Pocket Casts 3.6.

As I do, I got thinking about how I felt about it. While my initial reaction was to be amused, I quickly turned to finding it unhelpful. In fact, most apps have awful release notes. My least favorite phrase, which seems to appear in the release notes of every updated app on my phone, is “and bug fixes.”

Despite the title of this post, there’s no one right way to write release notes. The “right” way depends on what you’re releasing, for one. In a Linux distribution like Fedora, release notes could be composed of the release notes for every component package. However, that would be monumentally unwieldy. Even the Fedora Technical Notes — which report only the changed packages, not the notes for those packages — are not likely to be read by many people. The Release Notes are a condensed view, which highlights prominent features. The Release Announcement is even further condensed, and is useful for media and public announcements. This hierarchy is a good example of the importance of knowing your audience.

I’ve seen arguments that release notes are unnecessary if the source code repository is accessible. Who needs release notes when you can just look at the commit log? This is a pretty lousy argument. A single change may be composed of many commits and a single commit may represent multiple changes (though it shouldn’t). Not to mention that commit messages are often poorly written. I’ve made far too many of those myself. Even if the commit log is a beautiful representation of what happened, it’s a lot to ask a consumer of your software to scour every commit since the last release.

My preference for release notes includes, in no particular order, a list of new features, bugs fixed, and known issues. The HTCondor team does a particularly good job in that regard. One thing I’d add to their release notes is an explicit listing of the subsystem(s) affected for each point. The exact format doesn’t particularly matter. All I’m looking for is an explanation as to why I should or should not care about a particular release. And “fixed some bugs” doesn’t tell me that.

January 9, 2014

Online learning: Codecademy

Filed under: Linux,mac,The Internet — Tags: , , , , , , — bcotton @ 9:05 pm

Last week, faced with a bit of a lull at work and a coming need to do some Python development, I decided to work through the Python lessons on Codecademy. Codecademy is a website that provides free instruction on a variety of programming languages by means of small interactive example exercises.

I had been intending to learn Python for several years. In the past few weeks, I’ve picked up bits and pieces by reading and bugfixing a project at work, but it was hardly enough to claim knowledge of the language.

Much like the “… for Dummies” books, the lessons were humorously written, simple, and practical. Unlike a book, the interactive nature provides immediate feedback and a platform for experimentation. The built-in Q&A forum allows learners to help each other. This was particularly helpful on a few of the exercises where the system itself was buggy.

The content suffered from the issue that plagues any introductory instruction: finding the right balance between too easy and too hard. Many of the exercises were obvious from previous experience. By and large, the content was well-paced and at a reasonable level. The big disappointment for me was the absence of explanation and best practices. I often found myself wondering if the way I solved the problem was the right way.

Still, I was able to apply my newly acquired knowledge right away. I now know enough to understand discussions of best practices, and I’ll be able to hone my skills through practice. That makes it worth the time I invested. Later on, I’ll work my way through the Ruby (to better work with our Chef cookbooks) and PHP (to do more with dynamic content on this site) modules.

August 7, 2013

When your HP PSC 1200 all-in-one won’t print

Filed under: Linux — Tags: , — bcotton @ 10:50 am

I don’t think I’ve made it any secret that I hate printing. It’s still an inescapable part of my life, though. Last week, I was printing some forms for an event my wife was running the following day. We had just purchased new ink, so of course that was the ideal time for the paper to completely stop feeding. The wheels sounded like they were turning, but the printer would not pull any paper in. If you find yourself in a similar situation, fear not! I can tell you how to fix it. The first step is to visit HP’s video on how to clean the rollers and whatnot:

http://www8.hp.com/h20621/video-gallery/us/en/customer-care/1245172367001/hp-psc-1200-not-pick-or-feed-paper/video/

Still here? That must mean you followed the steps in the video to no avail. It’s time to take the printer apart. If your printer is still under warranty or you’re skittish about doing this, then stop right here. Before you do any steps in the video above or my description below, make sure the printer is unplugged.

The first step is to remove the four screws at the top of the printer (one in each corner). You’ll need either a T10 Torx screwdriver or an appropriately-sized Allen wrench (I think 1/16″). Once those screws are loosened, remove the upper body of the printer as shown below. Lift the majority of the body, not just the very top part, or else you’ll just remove the scanner plate. Don’t be too alarmed if the ink access door comes off.

Separating the printer body for removal.

As you lift the body, carefully remove the two ribbons (shown below) by pulling them directly toward you.

The two ribbons to remove.

Give the white wheel on the left side a good shove inward. You may not feel it move, but this is the magic voodoo.

White wheel on the left of the paper roller.

Push really hard on this wheel.

Replace the ribbons by pushing them firmly back into their slots. Put the ink access door back in place and set the printer body atop the printer. Tighten the screws. Plug the printer in, turn it on, and “enjoy” printing once again.

April 23, 2013

Monitoring sucks, don’t make it worse

Filed under: HPC/HTC,Linux — Tags: , , — bcotton @ 10:10 pm

You don’t have to go too far to find someone who thinks monitoring sucks. It’s definitely true that monitoring can be big, ugly, and complicated. I’m convinced that many of the problems in monitoring are not technical, but policy issues. For the sake of clarity (and because I’m like that), let’s start with some definitions. These definitions may or may not have validity outside the scope of this post, but at least they will serve to clarify what I mean when I say things.

  • Monitoring – an automatic process to collect metrics on a system or service
  • Alerting – notification when a critical threshold has been reached

In the rest of this post, I will be throwing some former colleagues under the bus. It’s not personal, and I’m responsible for some of the problem as well. The group in question has a monitoring setup that is dysfunctional to the point of being worthless. Not all of the problems are policy-related, but enough are to prompt this post. It should be noted that I’m not an expert on this subject, just a guy with opinions and a blog.

Perhaps the most important thing that can be done when setting up a monitoring system is coming up with a plan. It sounds obvious, but if you don’t know what you’re monitoring, why you’re monitoring it, and how you’re monitoring it, you’re bound to get it wrong. This is my first rule: in monitoring, failing to plan is planning to not notice failure.

It’s important to distinguish between monitoring and alerting. You can’t alert on what you don’t monitor, but you don’t need to alert on everything you monitor. This is one area where it’s easy to shoot yourself in the foot, especially at a large scale. In the group I mentioned, many of the monitoring checks were added in reaction to something going wrong. As a result, Nagios ended up alerting for things like “a compute node has 95% memory utilization.” For servers, that’s important. For compute nodes, who cares? The point of those machines is to do computation. Sometimes that means chewing up memory.

Which brings me to rule number two: every alert should have a reaction. If you’re not going to do something about an alert, why have it in the first place? It’s okay to monitor without alerting — the information can be important in diagnosing problems or analyzing usage — but if an alert doesn’t result in a human or automated reaction, shut it off.
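As a concrete illustration of monitoring without alerting — sketched here as a hypothetical Nagios service definition, since Nagios is the tool this group used, with made-up host and command names — you can keep collecting a metric while never paging anyone about it:

```
define service {
    use                   generic-service
    host_name             compute-node-01       ; hypothetical host
    service_description   Memory utilization
    check_command         check_nrpe!check_mem  ; hypothetical NRPE command
    notifications_enabled 0   ; record the data, never alert on it
}
```

The check still runs and the history is still there for diagnosis and capacity planning; it just never wakes anyone up.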

Along that same line, alerts should be a little bit painful. Don’t punish yourself for something failing, but don’t make alerts painless either. Perhaps the biggest problem in the aforementioned group is that most of the admins filtered Nagios messages away. That immediately killed any incentive to improve the setup.

I took the alternate approach and weakly lobbied for all alerts to hit the pager. This probably falls into the “too painful” category. You should use multiple levels of alerts. An email or ticket is fine for something that needs to be acted on but can wait until business hours. A more obnoxious form of alert should be used for the Really Important Things[tm].

The great thing about having a little bit of pain associated with alerts is that it also acts as an incentive to fix false alarms. At one point, I wrote Nagios checks to monitor HTCondor daemons. Unfortunately, due to the load on the Nagios server, the checks would time out and produce alerts. The daemons were fine, and the condor_master process generally does a good job of keeping things under control. So I removed the checks.

The opposite problem is running checks outside the monitoring system. One colleague had a series of cron jobs that checked the batch scheduler. If the checks failed, he would email the group. Don’t work outside the system.

Finally, be sure to consider planned outages. If you can’t suppress alerts when things are broken intentionally, you’re going to have a bad time. As my friend tweeted: “Rough estimates indicate we sent something like 180,000 emails when our clusters went down for maintenance.”
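The maintenance-window problem has a mechanical fix in Nagios-style systems: schedule downtime before the outage starts. A minimal sketch, assuming Nagios Core with its default external command file (the host name, path, and window length are hypothetical):

```shell
# Schedule a fixed four-hour downtime window for a host so Nagios
# suppresses notifications for it during planned maintenance.
now=$(date +%s)
end=$((now + 4 * 3600))

# External command format:
# SCHEDULE_HOST_DOWNTIME;host;start;end;fixed;trigger_id;duration;author;comment
printf '[%s] SCHEDULE_HOST_DOWNTIME;node01;%s;%s;1;0;14400;admin;cluster maintenance\n' \
  "$now" "$now" "$end" >> /var/spool/nagios/cmd/nagios.cmd
```

Wrap that in a loop over your node list before a cluster-wide outage and you avoid the 180,000-email problem.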

March 14, 2013

So long, Google Reader

Filed under: Linux,The Internet — Tags: , , , , — bcotton @ 3:25 pm

In case you haven’t been paying attention in the past 24 hours, the Pope has killed Google Reader.

What? Oh! Okay, Google is killing Google Reader. On July 1, the best RSS client I’ve ever used will be no more. One of the more interesting aspects of the reaction is seeing how people have used it. I never really got into the sharing feature of Reader, so it didn’t bother me when it was discontinued in favor of Google Plus. For some people, that was apparently the main selling point.

My own use was generally selfish. I just wanted to know when something new was posted to a site. This is especially important for sites that don’t update regularly, as I’m not likely to keep checking a site every day on the off chance it’s been updated. I also don’t want to rely on social media to get updates. If I’ve been offline for a few days, I’m not going to catch up on all of the Twitter, Facebook, and Google+ posts I’ve missed. I will scroll through the entire collection of articles in Google Reader, reading those that seem interesting.

I can buy that RSS has seen a decline in usage (not in utility, but that’s a separate matter). I can understand that Google doesn’t find it worthwhile to keep Reader going. Like Casey Johnston, I suspect that it won’t go away entirely (as you may recall, the real-time editing technology in Google Wave made an excellent addition to Google Docs). But here’s the thing: I don’t really care.

Yes, I use Google Reader on a daily basis. I’m not tied to it, though. Reader doesn’t integrate with any other Google products in a way that’s meaningful for me. So while I have probably spent more time watching this woman’s face than my wife is comfortable with, I’ll make do without Google Reader. I don’t know what I’ll migrate to yet. NewsBlur has been brought up several times, although they currently aren’t allowing new free accounts (presumably due to being crushed by new users in the wake of yesterday’s announcement). I may also go the self-hosting route and set up tt-rss (which may also present an opportunity to run it as a paid service for those who can’t/won’t run it themselves). I still have a few months to figure it out.

February 13, 2013

How do you measure software quality?

Filed under: Linux — Tags: , , , , — bcotton @ 10:30 am

There are two major license types in the free/open source software world: copyleft (e.g. GPL) and permissive (e.g. BSD). Because of the different legal ramifications of the licenses, it’s possible to make theoretical arguments that either license would tend to produce higher quality software. For my master’s thesis, I would like to investigate the quality of projects licensed under these paradigms, and whether there’s a significant difference. In order to do this, I’ll need some objective mechanism for measuring some aspect(s) of software quality. This is where you come in: if you have any suggestions for measures to use, or tools to get these measures, please let me know. It will have to be language-independent and preferably not rely on bug reports or other similar data. Operating on source would be preferable, but I have no objections to building binaries if I have to.

The end goal (apart from graduating) is to provide guidance for license selection in open source projects when philosophical considerations are not a concern. I have no intention or desire to turn this into a philosophical debate on the merits of different license types.

January 15, 2013

Deploying Fedora 18 documentation: learning git the hard way

Filed under: Linux,The Internet — Tags: , , , , — bcotton @ 5:49 pm

If you haven’t heard, the Fedora team released Fedora 18 today. It’s the culmination of many months of effort, and some very frustrating schedule delays. I’m sure everyone was relieved to push it out the door, even as some contributors worked to make sure the mirrors were stable and to update translations. I remembered that I had forgotten to push the Fedora 18 versions of the Live Images Guide and the Burning ISOs Guide, so I quickly did that. Then I noticed that several of the documents that were on the site earlier weren’t anymore. Crap.

Here’s how the Fedora Documentation site works: contributors write guides in DocBook XML, build them with a tool called publican, and then check the built documents into a git repository. Once an hour, the web server clones the git repo to update the content on the site. Looking through the commits, it seemed like a few hours prior, someone had published a document without updating their local copy of the web repo first, which blew away previously-published Fedora 18 docs.

The fix seemed simple enough: I’d just revert to a few commits prior and then we could re-publish the most recent updates. So I did a `git reset --hard` and then tried to push. It was suggested that a `--force` might help, so I did. That’s when I learned that this basically sends the local git repo to the remote as if the remote were empty (someone who understands git better would undoubtedly correct this explanation), which makes sense. For many repos, this probably isn’t too big a deal. For the Docs web repo, which contains many images, PDFs, epubs, etc. and is roughly 8 GB on disk, this can be a slow process. On a residential cable internet connection which throttles uploads to about 250 KiB/s after the first minute, it’s a very slow process.

I sent a note to the docs mailing list letting people know I was cleaning up the repo and that they shouldn’t push any docs to the web. After an hour or so, the push finally finished. It was…a failure? Someone hadn’t seen my email and pushed a new guide shortly after I had started the push-of-doom. Fortunately I discovered the git revert command in the meantime. revert, instead of pretending like the past never happened, makes diffs to back out the commit(s). After reverting four commits and pushing, we were back to where we were when life was happy. It was simple to re-publish the docs after that, and a reminder was sent to the group to ensure the repo is up-to-date before pushing.
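For anyone following along at home, the two approaches can be sketched like this (branch and remote names hypothetical):

```shell
# What I tried first: rewrite history, then force-push. On an 8 GB repo this
# re-uploads a huge amount of data, and it clobbers anything pushed in the
# meantime -- which is exactly what bit me.
git reset --hard HEAD~4
git push --force origin master     # dangerous on a shared repo

# What worked: git revert creates new commits that undo the bad ones, so the
# push is an ordinary fast-forward and nobody else's work disappears.
git revert --no-edit HEAD~4..HEAD  # back out the last four commits
git push origin master
```

The revert approach is also self-documenting: the history shows both the mistake and the fix, instead of pretending the past never happened.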

The final result is that some documents were unavailable for a few hours. The good news is that I learned a little bit more about git today. The better news is that this should serve as additional motivation to move to Publican 3, which will allow us to publish guides via RPMs instead of an unwieldy git repo.

January 11, 2013

A wrinkle with writing your resume in Markdown

Filed under: Linux — Tags: , , , , — bcotton @ 11:59 pm

For Sysadvent 2011, Phil Hollenback wrote an excellent post called “Write Your Resume in Markdown Already!” Ever since I read it, I’ve been using a Markdown file as the source for my resume, which gets rendered to HTML and PDF as necessary. Recently, someone using Windows tried to open the PDF of my resume in Acrobat Reader. When she did, she got errors about missing fonts. “How odd,” I thought. In the course of several job applications over the past year-plus, I hadn’t heard of any problems. (It’s possible that nobody reviewing my resume used a Windows machine to do it.)

I fired up my Windows XP virtual machine that I keep around for playing Sim City 2000 and installed Adobe Reader. I was able to reproduce the problem, which is always comforting. It didn’t shed any light on the matter, though. I actively avoid doing anything “cute” so that my resume (and other documents) can easily be read by anyone on any platform. After examining the workflow, I figured that the problem had to be in the LaTeX template used to generate the PDF.

One of the features of the template is the conditional use of packages and font settings. Since the pandoc package in Fedora 17 (version 1.9.4.2) no longer includes the markdown2pdf command found in previous versions, the --xetex argument that Phil passed isn’t necessary. Removing the control structure resulted in a PDF that looked qualitatively the same, but would open on Windows. Phil’s instructions are still good overall, but they need some tweaks for newer versions of Pandoc.
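With newer pandoc, the equivalent of the old markdown2pdf invocation looks something like this (the template and file names are hypothetical, not Phil’s exact ones):

```shell
# pandoc 1.9+ dropped the markdown2pdf wrapper; pandoc produces the PDF itself.
# --latex-engine selects xelatex explicitly, replacing the old --xetex flag.
pandoc --latex-engine=xelatex --template=resume-template.tex \
  resume.md -o resume.pdf

# An HTML copy from the same source, for the web version of the resume:
pandoc -s resume.md -o resume.html
```

Note that later pandoc 2.x releases renamed the flag again, to --pdf-engine, so check your installed version.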

Here’s a diff, for those of you who are interested:
7,9d6
< $if(xetex)$
< \usepackage{ifxetex}
< \ifxetex
15d11
< \else
18,22d13
< \fi
< $else$
< \usepackage[mathletters]{ucs}
< \usepackage[utf8x]{inputenc}
< $endif$

December 7, 2012

Coming up: LISA ’12

Filed under: Funnel Fiasco,HPC/HTC,Linux — Tags: , , , — bcotton @ 2:40 pm

It may seem like I’ve not been writing much lately, but nothing could be further from the truth. It’s just that my writing has been for grad school instead of Blog Fiasco. But don’t worry, soon I’ll be blogging like a madman. That’s right: it’s time for LISA ’12. Once again, I have the privilege of being on the conference blog team and learning from some of the TopPeople[tm] in the field. Here’s a quick look at my schedule (subject to change based on level of alertness, addition of BoFs, etc.):

Sunday

Monday

Tuesday

Wednesday

Thursday

Friday

Now I just need to pack my bags and get started on the take-home final that’s due mid-week. Look for posts from me and my team members Matt Simmons and Greg Riedesel on the USENIX Blog.
