A 650-node 10Gb computer cluster: easy peasy

Purdue has a long history as a leader in computing.  (After all, Ctrl+Alt+Del was invented by Purdue alum David Bradley.)  Since we’re a pretty geeky campus anyway, this is more than a matter of professional pride; there’s street cred on the line too.  After building a large compute cluster last year, the research computing group on campus decided it needed to one-up itself this year.

Once again, volunteers from around Purdue and a few other institutions gathered to set up the cluster in a single day.  Once again, we finished way ahead of schedule.  This year, approximately 650 nodes went from box to OS install in less than three hours.  Jobs were already running by lunchtime.

The process wasn’t entirely smooth, though.  For reasons not adequately explained to the volunteers, the vendor had not installed the 10 gigabit network cards (NICs).  That meant every machine had to be opened and fitted with a NIC before it could be installed.  That is what I did for two hours yesterday morning.

The NIC installation process wasn’t too difficult; there were only 4 screws to contend with.  The organizers had expected each person to install 15 NICs per shift.  I did 42 in my two-hour shift, and several others installed 50 or more.  At several points, they couldn’t get the machines unboxed and onto our tables fast enough.
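For the curious, here is a quick back-of-the-envelope sketch of what those numbers work out to.  It uses only the figures quoted in this post; the function names are my own, and the nodes-per-minute figure is a lower bound because the build took less than three hours.

```python
def minutes_per_unit(units: int, hours: float) -> float:
    """Average minutes spent on each unit over a period of the given length."""
    return hours * 60 / units


def units_per_minute(units: int, hours: float) -> float:
    """Average number of units completed per minute over the given period."""
    return units / (hours * 60)


# Figures quoted in the post.
expected_nics = 15   # organizers' estimate per person per shift
my_nics = 42         # what I installed
shift_hours = 2      # length of my shift

print(f"Minutes per NIC: {minutes_per_unit(my_nics, shift_hours):.1f}")
print(f"Ratio to the estimate: {my_nics / expected_nics:.1f}x")
# 650 nodes in under three hours, so this is a floor rather than an exact rate.
print(f"Nodes per minute, at least: {units_per_minute(650, 3):.1f}")
```

That comes to a little under three minutes per card, nearly three times the organizers' estimate, with the whole crew finishing more than three nodes a minute.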

Several hundred more nodes will be installed once the external funding is processed, and it is likely that the cluster, named Coates, will end up reaching its maximum capacity of just over 1200 nodes.  That would give it over 10,000 cores, all joined by 10 gigabit Ethernet connections.  This allows an obscene amount of data to be processed and transferred, which is very helpful in big-data fields like the atmospheric sciences.
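To put the 10 gigabit links in perspective, here is a rough sketch comparing ideal transfer times at 1 Gb and 10 Gb.  The 1 TB dataset is a made-up example, the function name is mine, and the math ignores protocol overhead, disk speed, and contention, so real transfers will be slower.

```python
def transfer_seconds(size_bytes: float, link_bits_per_second: float) -> float:
    """Ideal time to move size_bytes over a link, ignoring all overhead."""
    return size_bytes * 8 / link_bits_per_second


TERABYTE = 1e12   # decimal terabyte, in bytes
GIGABIT = 1e9     # bits per second

dataset = 1 * TERABYTE  # hypothetical dataset size, for illustration only

for label, rate in (("1 Gb", 1 * GIGABIT), ("10 Gb", 10 * GIGABIT)):
    minutes = transfer_seconds(dataset, rate) / 60
    print(f"{label} Ethernet: about {minutes:.0f} minutes")

# Cores per node implied by the round figures in the post (10,000 / 1,200).
print(f"Implied cores per node: roughly {10_000 / 1_200:.1f}")
```

Under those idealized assumptions, a transfer that takes over two hours on gigabit Ethernet drops to under a quarter of an hour on a single 10 gigabit link.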

Expectations are high for Coates.  Like Steele, last year’s cluster, it is the largest compute cluster in the Big Ten at the time it was built.  Coates is expected to rank in the top 50 internationally when the supercomputer rankings come out in November.  It is also expected to be the first academic cluster connected solely with 10Gb Ethernet that is big enough to achieve international ranking.  Perhaps most importantly, Purdue researchers expect Coates to facilitate some serious science.

Even though my contribution didn’t require much technical skill, I take pride in the fact that a whole rack of nodes can move data quickly because of the network cards that I installed.  This cluster is a big deal to those who care about clusters, and it is really nice to be a part of something so geekily awesome.  If you’re one of those people who care about clusters, the technical details are at http://www.rcac.purdue.edu/userinfo/resources/coates/