Why HTCondor is a pretty awesome scheduler

In early March, The Next Platform published an article I wrote about cHPC, a container project aimed at HPC applications. But as I wrote it, I thought about how HTCondor has been addressing a lot of the concerns for a long time. Since I’m in Madison for HTCondor Week right now, I thought this was a good time to explain some of the ways this project is awesome.

No fixed walltime. This is a benefit or a detriment, depending on the circumstances, but most schedulers require the user to request a walltime at submission. If the job isn’t done at the end of that time, the scheduler kills it. Sorry about your results; get back in line and ask for more walltime. HTCondor’s flexible configuration allows administrators to enforce such a limit if desired, but by default, users are not forced to make a guess that they’re probably going to get wrong.
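To make the "optional, not mandatory" point concrete, here is a minimal Python sketch that renders a submit description with and without a runtime cap. The helper name `render_submit` and its parameters are hypothetical, invented for illustration. The cap uses a job-side `periodic_remove` expression, which is one common way to opt in; it is an assumption that this matches what a given site does, since administrators can instead enforce limits in the pool configuration.

```python
def render_submit(executable, cpus=1, memory_mb=1024, max_runtime_s=None):
    """Return the text of a minimal HTCondor submit description.

    Hypothetical helper for illustration only; it just builds a string.
    """
    lines = [
        f"executable = {executable}",
        f"request_cpus = {cpus}",
        # request_memory is interpreted in megabytes when no unit is given.
        f"request_memory = {memory_mb}",
    ]
    if max_runtime_s is not None:
        # Assumption: the user opts in to a cap. JobStatus == 2 means the
        # job is running; EnteredCurrentStatus is when it started running.
        lines.append(
            "periodic_remove = (JobStatus == 2) && "
            f"(time() - EnteredCurrentStatus > {int(max_runtime_s)})"
        )
    lines.append("queue")
    return "\n".join(lines) + "\n"


# Default: no walltime guess required.
uncapped = render_submit("sim.sh")

# Opt-in: remove the job if it has been running for more than an hour.
capped = render_submit("sim.sh", max_runtime_s=3600)
```

The design point is that the cap is an extra line, not a required field: leaving `max_runtime_s` unset produces a perfectly valid submission.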

Flexible requirements and resource monitoring. HTCondor natively supports user-requestable CPU, memory, and GPU resources. With partitionable slots, a machine’s resources can be carved up on the fly to fit each job’s request. And HTCondor has “concurrency limits”, which allow for customizable pool-wide resource constraints (e.g., software licenses or database connections).
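The two ideas above can be sketched as a toy model in Python. This is not HTCondor code and does not use any HTCondor API; the class names `PartitionableSlot` and `ConcurrencyLimits` are hypothetical, and the matching logic is deliberately simplified to show the concept: a machine-level pool of resources that shrinks as jobs claim pieces of it, and a pool-wide counter that caps how many jobs may hold a named resource at once.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionableSlot:
    """Toy model of one machine whose resources are carved up per job."""
    cpus: int
    memory_mb: int
    gpus: int = 0

    def carve(self, cpus, memory_mb, gpus=0):
        """Claim a piece of the slot; return True if the request fit."""
        if cpus > self.cpus or memory_mb > self.memory_mb or gpus > self.gpus:
            return False
        self.cpus -= cpus
        self.memory_mb -= memory_mb
        self.gpus -= gpus
        return True


@dataclass
class ConcurrencyLimits:
    """Toy model of pool-wide named limits, such as license counts."""
    limits: dict
    in_use: dict = field(default_factory=dict)

    def acquire(self, name, count=1):
        """Take `count` units of a named limit; return True on success."""
        used = self.in_use.get(name, 0)
        # Toy choice: an unknown limit name is treated as a limit of zero.
        if used + count > self.limits.get(name, 0):
            return False
        self.in_use[name] = used + count
        return True


# One 8-core, 16 GB, 1-GPU machine and two licenses for a hypothetical tool.
slot = PartitionableSlot(cpus=8, memory_mb=16384, gpus=1)
licenses = ConcurrencyLimits(limits={"matlab": 2})

slot.carve(cpus=4, memory_mb=8192, gpus=1)   # fits; leaves 4 cores, 8 GB, 0 GPUs
slot.carve(cpus=1, memory_mb=1024, gpus=1)   # rejected: no GPU left
licenses.acquire("matlab")                   # first license
```

The contrast with static slots is that nothing about the job sizes is decided ahead of time; the machine simply hands out whatever remains until a request no longer fits.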

So many platforms. Despite the snobbery of HPC sysadmins, people do real work on Windows. HTCondor has almost-full feature parity on Windows. It also has “universes” for Docker and virtual machines.

Federation. Want to overflow to your friend’s resource? You can do that! You can even submit jobs from HTCondor to other schedulers.

Support for disappearing resources. In the cloud, this is the best feature. HTCondor was designed for resource scavenging on desktops, and it still supports that as a first-class use case. That means machines can come and go without much hassle. Contrast this with other schedulers, where some explicit external action has to happen in order to add or remove a node.

Free as in freedom and free as in beer. Free beer is also the best way to get something from the HTCondor team. But HTCondor is licensed under the Apache 2.0 license, so anyone can use it for any purpose.

HTCondor isn’t perfect, and there are some use cases where it doesn’t make sense (e.g. low-latency), but it’s a pretty awesome project. And it’s been around for over three decades.

HTCondor Week 2015

There are many reasons I enjoy the annual gathering of HTCondor users, administrators, and developers. Some of those reasons involve food and alcohol, but mostly it’s about the networking and the knowledge sharing.

Unlike many other conferences, HTCondor Week is nearly devoid of vendors. I gave a presentation on behalf of my company, and AWS was present this year, but it wasn’t a sales pitch in either case. The focus is on how HTCondor enabled research. I credit the project’s academic roots.

Every year, themes seem to develop. This year, the themes were cloud and caching. Cloud offerings seem to really be ready to take off in this community, even though Miron would say that the cloud is just a different form of grid computing that’s been done for decades. The ability to scale well beyond internal resources quickly and cheaply has obvious appeal. The limiting factor currently seems to be that university funding rules make it slightly more difficult for academic researchers than just pulling out a credit card.

In the course of one session, three different caching mechanisms were discussed. This was interesting because it is not something that’s been discussed much in the past. It makes sense, though, that caching files common across multiple jobs on a node would be a big improvement in performance. I’m most partial to Zach Miller’s fledgling HTCache work, though the squid cache and CacheD presentations had their own appeal.

Todd Tannenbaum’s “Talk of Lies” spent a lot of time on the performance improvements made in the past year, but the developers really need to congratulate themselves more. I’ve seen big improvements from 8.0 to 8.2, and it looks like even more will land in 8.4. There’s some excellent work planned for the coming releases, and I hope it pans out.

After days of presentations and conversations, my brain is full of ideas for improving my company’s products. I’m really motivated to make contributions to HTCondor, too. I’m even considering carving out some time to work on that book I’ve been wanting to write for a few years. Now that would truly be a miracle.