Ended, the clone wars have?

I have done my damnedest to avoid posting publicly about Red Hat’s decision to stop publishing RHEL srpms. For one, the Discourse around it has been largely stupid. I didn’t want any part of the mess. For another, I didn’t have anything particularly novel to add. I’m breaking my silence now because the dust seems to have settled in a very beneficial way that I haven’t seen widely discussed. (To be fair, since I’ve been trying to avoid the discussion, I probably just missed it.)

Full disclosure: as you may know, my role at Red Hat was eliminated earlier this year. This does not make me particularly inclined to give Red Hat as a company the benefit of the doubt, but I try to be fair. Also: during my time at Red Hat, I was the program manager for the creation of CentOS Stream. However, I did not make business decisions about it, nor did I have any say on the termination of CentOS Linux or the recent srpm change.

My take on the situation

I won’t get into the entire history of Red Hat Enterprise Linux, clones, or competitors here. Joe Brockmeier’s ongoing “Clone Wars” series covers the long-term history in detail. I do think it’s worth providing my take on the last few years, though, so you understand my take on the future.

First of all, I don’t think Red Hat (or IBM, if you’d rather) acted with evil intent. That doesn’t mean I think the decision was correct, but I do think it was a legitimate business choice. I disagree with the decision, but as much as they didn’t ask me before, they sure as hell don’t ask me now.

If RHEL development started out with a CentOS Stream model, I’m not sure CentOS Linux (and the other RHEL clones) would have existed in the first place. But we don’t live in that timeline, so RHEL clones exist.

There are plenty of valid reasons for wanting RHEL but not wanting to pay for the subscription. It’s not just that people are being cheap. Until 2018, users of Spot instances on Amazon Web Services couldn’t use RHEL. In a former role, we had RHEL customers who used CentOS Linux in AWS precisely because they wanted to use Spot instances. Others used CentOS Linux in AWS because they didn’t want to deal with subscription management for environments that might come and go. (I understand that subscription-manager is much easier to work with now.)

So while Red Hat may be right to say that RHEL clones don’t add value to Red Hat (and I disagree there, too), RHEL clones clearly add value for their users, which include Red Hat customers. It’s fair to say that, for some people, the perceived value of a RHEL subscription does not match what Red Hat charges for it. How to solve that mismatch is not a problem I’m concerned with.

So what now?

Two community-driven clones popped up in the immediate aftermath of the death of CentOS Linux: Rocky Linux and AlmaLinux. Both of these aimed to fill the role formerly held by CentOS Linux: a bug-for-bug clone of Red Hat Enterprise Linux. I never quite understood what differentiated them in practice.

But now duplicated effort becomes differentiated effort. Rocky Linux will continue to provide a bug-for-bug clone. AlmaLinux, meanwhile, will shift to making an ABI-compatible distribution — one where “software that runs on RHEL will run the same on AlmaLinux.” This differentiated effort allows those communities to serve different use cases. They now have their own niche to succeed or fail in.

Time will tell, but I think Alma’s approach is a better fit for most clone users. I suspect that most people don’t need bug-for-bug compatibility (except in the XKCD #1172 scenario). For many use cases, CentOS Stream is suitable. Of course, people make decisions based on what they think they need, not what they actually need. Third-party software vendors may end up being the deciding factor.

Given the different approaches Rocky and Alma are taking, I think Red Hat’s decision ended up being beneficial to the broader ecosystem. I don’t think it was done with that intent, and I am not arguing that the ends justify the means, but the practical result seems positive on the whole.

The tricky problem dilemma

A good sysadmin believes in treating the cause, not the symptom. Unfortunately, pragmatism sometimes gets in the way of that. A recent example: we just rolled out a kernel update to a few of our compute clusters. About 3% of the machines ended up in a troubled state. By troubled, I mean that the permissions on a few directories (/bin, /lib, /dev, /etc, /proc, and /sys) were set to 700, making the machine effectively unusable. For the most part, we didn’t notice this on the affected machines until after they did their post-upgrade reboot, but fortunately we were able to catch a few that hadn’t yet rebooted.

What we found was that / had a sysroot directory and an init file. These are created by the mkinitrd script, which is called by the new-kernel-pkg script, which is in turn called in the postinstall script of the kernel RPM. The relevant part of the mkinitrd script seems to be:

    TMPDIR=""
    for t in /tmp /var/tmp /root ${PWD}; do
        if [ ! -d $t ]; then continue; fi
        if ! access -w $t ; then continue; fi

        fs=$(df -T $t 2>/dev/null | awk '{line=$1;} END {printf $2;}')
        if [ "$fs" != "tmpfs" ]; then
            TMPDIR=$t
            break
        fi
    done

which picks a working directory, normally /tmp. In our case, something seemed to cause / to be used instead. Later in the script, several directories are created in $TMPDIR, and these correspond to the wrongly-permissioned directories. There’s no clear indication of why this happens, and if we clean up and reinstall the updated kernel package, it doesn’t necessarily repeat itself. After some soul-searching, we decided that it was more important to return the nodes to service than to try to track down an easily-correctable-but-difficult-to-solve problem. We’ll see if it happens again with the next kernel upgrade.
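To make the selection logic easier to poke at, here is a minimal sketch of the same loop as a standalone function. It is not the mkinitrd code. The function names are made up for illustration, `[ -w ]` stands in for the script's `access -w` helper, and `df -PT` assumes GNU df (the `-P` keeps each entry on one line).

```shell
#!/bin/bash
# Sketch only: a standalone rework of the directory-selection loop,
# not the code shipped in mkinitrd.

# Print the filesystem type of the directory given as $1.
# Assumes GNU df; -P keeps each entry on a single line.
fs_type() {
    df -PT "$1" 2>/dev/null | awk 'END { print $2 }'
}

# Print the first candidate that exists, is writable, and is not on tmpfs.
# Returns non-zero (and prints nothing) if no candidate qualifies.
pick_tmpdir() {
    local t
    for t in "$@"; do
        [ -d "$t" ] || continue
        [ -w "$t" ] || continue    # stand-in for the original "access -w"
        if [ "$(fs_type "$t")" != "tmpfs" ]; then
            printf '%s\n' "$t"
            return 0
        fi
    done
    return 1
}
```

Called as `pick_tmpdir /tmp /var/tmp /root "$PWD"`, it makes the fallback order visible: the caller's working directory is the last resort, so if the first three candidates are all skipped and the working directory is /, the function prints /. The sketch cannot confirm whether that is what happened on the affected machines.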

Sometimes, Windows wins

It should be clear by now that I am an advocate of free software.  I’m not reflexively against closed software, though; sometimes it’s the right tool for the job.  Use of Windows is not a reason for mockery.  In fact, I’ve found one situation where I like the way Windows works better.

As part of our efforts to use Condor for power saving, I thought it would be a great idea if we could calculate the power savings based on the actual power usage of the machines.  The plan was to have Cycle Server aggregate the time in hibernate state for each model and then multiply that by the power draw for the model.  Since Condor doesn’t note the hardware model, I needed to write a STARTD_CRON module to determine this.  The only limitations I had were that I couldn’t depend on root/administrator privileges or on particular software packages being installed. (The execute nodes are in departments across campus and mostly not under my control.)

Despite the lack of useful tools like grep, sed, and awk (there are equivalents for some of the taken-for-granted GNU tools, but they frankly aren’t very good), the plugin for Windows was very easy.  The systeminfo command gives all kinds of useful, parseable information about the system’s hardware and OS.  The only difficult part was chopping the blank spaces off the end of the output. I wanted to do this in Perl, but that’s not guaranteed to be installed on Windows machines, and I had some difficulty getting a standalone-compiled version working consistently.

On Linux, parsing the output is easy.  The hard part was getting the information at all.  dmidecode seems to be ubiquitous, but it requires root privileges to get any information.  I tried lshw, lshal, and the entire /proc tree.  /proc didn’t have the information I needed, and the two commands were not necessarily a part of the “base” install.  The solution seemed to be to require the addition of a package (or bundling a binary for lshw in our Condor distribution).
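For what it's worth, newer kernels expose some DMI fields under /sys/class/dmi/id/, and product_name there is typically readable without root. Whether that path exists on any given execute node is an assumption, so the sketch below is a rough idea of what such a module might look like rather than something known to work everywhere. The attribute name HardwareModel is made up for illustration. The function prints one ClassAd-style `Name = "value"` line.

```shell
#!/bin/sh
# Sketch only: print a ClassAd-style attribute naming the hardware model.
# $1: file holding the model string. Defaults to the sysfs DMI entry,
#     which is assumed (not guaranteed) to exist and be world-readable.
model_classad() {
    src=${1:-/sys/class/dmi/id/product_name}
    model=""
    if [ -r "$src" ]; then
        # Take the first line and strip trailing whitespace.
        model=$(head -n 1 "$src" | sed 's/[[:space:]]*$//')
    fi
    # HardwareModel is a hypothetical attribute name chosen for this sketch.
    # Falls back to "Unknown" when the file is missing, unreadable, or empty.
    printf 'HardwareModel = "%s"\n' "${model:-Unknown}"
}
```

Running `model_classad` with no argument reads the sysfs entry. The sketch does not escape double quotes inside the model string, which a real module would need to handle.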

Eventually, we decided that it was more effort than it was worth to come up with a reliable module.  While both platforms had problems, Linux was definitely the more difficult.  It’s a somewhat rare condition, but there are times when Windows wins.

Flavor of Love

One of the nice things about Linux is that there are so many different flavors to choose from.  Although you can customize it to meet your exact needs, there’s a good chance that someone has already made a flavor to suit your tastes.  Which flavor you choose is largely a matter of what you’re trying to do, and your favorite way to do it.  At my workplace, we’re a Red Hat shop.  I happen to be fond of the Red Hat products so that works well for me.  However, I find myself facing a bit of a decision.

In 2003 or 2004, whenever my predecessor set up our Linux environment, he put Fedora Core 1 on the workstations and Red Hat Enterprise Linux 3 and 4 on the servers and the larger desktops (the Dell Precision line can be rather finicky).  I took my job in September 2006, with things largely unchanged.  Since I work at a university, making major changes during the school year is considered bad form, so I had to wait until summer 2007 to begin doing upgrades.  The downside is that FC1 went out of support in the late winter of 2007, but the good news is that I got nearly a full year to re-build software packages and test configurations.  My fellow sysadmin and I, at the encouragement of my boss, decided to put RHEL4 on all of the machines to simplify support.

In the past year, RHEL has proven itself to be a very stable OS, and Red Hat has been quick to release security fixes.  However, there have been several occasions where an updated application has been needed, but it had dependencies that could not be met via up2date.  For example, the Java web plugin for the x64 architecture only works on Firefox 2+.  As of this writing, RHEL4 still uses Firefox 1.5.0.12 (with security patches worked in by Red Hat).  That, at least, was a simple matter of grabbing the RPM.  Of course, now we’re responsible for making sure the subsequent updates get installed by hand.  Even worse is when a package needs a newer glibc than what is provided.  Here’s a hint, friends:  if it requires a newer glibc than your distribution provides, don’t bother!

Next summer, I plan to upgrade again.  But what do I put on the workstations?  RHEL is a solid platform, and works exceptionally well in a server environment.  If all you want to do at your desk is check e-mail, surf the web, and type up TPS reports, RHEL provides a good experience to do that.  If you’re trying to run the latest version of your research applications, I’m not sold that it’s the best solution.  There are advantages and disadvantages to choosing RHEL vs Fedora for the desktop.

I run Fedora on my desktop/server at home, and it performs like a champ.  It’s not that Fedora crashes with any regularity, but it isn’t necessarily designed for stability.  RHEL is pretty thoroughly tested, so you can pretty much be guaranteed that when a package gets upgraded, it won’t break things.  Fedora gets you newer packages much quicker, but there are no promises that foo-3.7 won’t break bar-4.2.  Fedora also has new releases more frequently than RHEL, and has a much shorter support life (roughly 13 months versus 5 years), which forces you to update more often.  Of course, if your software’s dependencies necessitate an upgrade regularly, that’s a moot point.

There’s also the issue of package security.  With RHEL, you’re getting your packages from Red Hat’s servers.  With Fedora, you’re generally getting your packages from mirrors.  Generally, you can consider that to be safe.  However, a story featured on Slashdot today shows that it’s not a guarantee.  Is that a reason to forsake Fedora?  Unless your machines contain hyper-sensitive information, the answer is no.

Actually, the second sentence in the previous paragraph isn’t necessarily true (apart from the fact that you can set up your own proxy for the RHN servers).  Beginning in RHEL 5, the up2date package manager is gone, in favor of yum.  Personally, I think yum is better than up2date (although Debian’s apt may be the best), but that wasn’t the reason Red Hat made the switch.  What yum gives you, though, is the ability to add custom repositories.  Which means you can get outside packages easily, and keep them up to date without having to install the updates by hand every time.  It also means that you can set up your own repository for your local custom software.  You have no idea how excited I am about the idea of using rpms in a yum repository to install software on our machines instead of using rdist.
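As a rough illustration of the custom-repository idea, the sketch below writes a minimal .repo file of the kind yum reads from /etc/yum.repos.d/. The function name, repository id, URL, and key path are all placeholders invented for the example, not anything from a real environment.

```shell
#!/bin/sh
# Sketch only: write a minimal yum repository definition.
# $1: output file (for real use, something under /etc/yum.repos.d/)
# $2: repository id
# $3: base URL of the repository
# $4: URL of the GPG key used to sign the packages
write_repo_file() {
    cat > "$1" <<EOF
[$2]
name=Local packages ($2)
baseurl=$3
enabled=1
gpgcheck=1
gpgkey=$4
EOF
}

# Hypothetical usage (placeholder host and key path):
# write_repo_file /etc/yum.repos.d/local-custom.repo local-custom \
#     http://repo.example.edu/el5/x86_64 \
#     file:///etc/pki/rpm-gpg/RPM-GPG-KEY-local
```

The directory behind the base URL still needs repository metadata, which the createrepo tool generates from a directory of RPMs. Keeping gpgcheck enabled means locally built packages have to be signed, which seems a reasonable trade given the mirror concerns above.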

The differences in configuration between Fedora and RHEL are minor, but generally large enough that you’ll need separate configuration trees.  Does adding another OS to your environment cause you to reach for your Rolaids, or can you comfortably absorb it?  For my own workplace, the latter is the case.  So what have I decided?  I have nine more months until next summer, so I’ll punt for now. 🙂