Book review: The Visible Ops Handbook

I first heard of The Visible Ops Handbook during Ben Rockwood’s LISA ’11 keynote. Since Ben seemed so excited about it, I added it to the list of books I should (but probably would never) read. Then Matt Simmons mentioned it in a brief blog post and I decided that if I was ever going to get around to reading it, I needed to stop putting it off. I bought it that afternoon, and a month later I’ve finally had a chance to read it and write a review. Given the short length and high quality of this book, it’s hard to justify such a delay.

Information Technology Infrastructure Library (ITIL) training has been a major push in my organization the past few years. ITIL is a formalized framework for IT service management, but it seems to be out of favor in the sysadmin community. After sitting through the foundational training, my opinion was of the “it sounds good, but…” variety. The problem with ITIL training and the official documentation is that you’re told what to do without ever being told how to do it. Kevin Behr, Gene Kim, and George Spafford solve that problem in fewer than 100 pages.

Based on observations and research of high-performing IT teams, The Visible Ops Handbook assumes that no ITIL practices are being followed. Implementation of the ITIL basics is broken down into four phases. Each phase includes real-world accounts, the benefits, and likely resistance points. This arms the reader with the tools necessary to sell the idea to management and sysadmins alike.

The introduction addresses a very important truism: “Something must need improvement, otherwise why read this?” The authors present a general recap of their findings, including these compelling statistics: 80% of outages are self-inflicted and 80% of mean time to repair (MTTR) is often wasted on non-productive activities (e.g. trying to figure out what changed).

Phase 1 focuses on “stabilizing the patient.” The goal is to reduce unplanned work from 80% of outage time to 25% or less. To do this, triage the most critical systems that generate the most unplanned work. Control when and how changes are made and fence off the systems to prevent unauthorized changes. While exceptions might be tempting, they should be avoided. The authors state that “all high performing IT organizations have only one acceptable number of unauthorized changes: zero.”

After reading Phase 1, I already had an idea to suggest. My group handles change management fairly well, but we don’t track requests for change (RFCs) well. Realizing how important that is, I convinced our group’s manager and our best developer that it was a key feature to add to our configuration management database (CMDB) system.

In Phase 2, the reader performs a catch & release program and finds “fragile artifacts.” Fragile artifacts are those systems or services with a low change success rate and high MTTR. After all systems have been “bagged and tagged”, it’s time to make a CMDB and a service catalog. This phase is the next place that my group needs to do work. We have a pretty nice CMDB that’s integrated with our monitoring systems and our job schedulers, but we lack a service catalog. Users can look at the website and see what we offer, but that’s only a subset of the services we run.
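
To make the “fragile artifact” idea a little more concrete, here is a minimal Python sketch of how a group might flag candidates from its own change records. The two measures (change success rate and MTTR) come from the description above; everything else is my assumption for illustration, including the record format, the sample data, the thresholds, and the simplification that MTTR is the mean repair time after a failed change. None of it is prescribed by the book.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ChangeRecord:
    """One change made to a system (hypothetical record format)."""
    system: str
    succeeded: bool
    repair_minutes: float = 0.0  # time to restore service after a failed change


def fragility_report(records, min_success_rate=0.9, max_mttr_minutes=60.0):
    """Return {system: (success_rate, mttr_minutes, is_fragile)}.

    The thresholds are illustrative assumptions, not values from the book.
    """
    # Group the change records by the system they touched.
    by_system = {}
    for rec in records:
        by_system.setdefault(rec.system, []).append(rec)

    report = {}
    for system, recs in by_system.items():
        success_rate = sum(r.succeeded for r in recs) / len(recs)
        repairs = [r.repair_minutes for r in recs if not r.succeeded]
        mttr = mean(repairs) if repairs else 0.0
        # Fragile means a low change success rate AND a high MTTR.
        fragile = success_rate < min_success_rate and mttr > max_mttr_minutes
        report[system] = (success_rate, mttr, fragile)
    return report


# Made-up sample data for two systems.
records = [
    ChangeRecord("mail", True),
    ChangeRecord("mail", False, 180.0),
    ChangeRecord("mail", False, 120.0),
    ChangeRecord("web", True),
    ChangeRecord("web", True),
    ChangeRecord("web", False, 15.0),
]

for system, (rate, mttr, fragile) in sorted(fragility_report(records).items()):
    print(f"{system}: success={rate:.0%} mttr={mttr:.0f}min fragile={fragile}")
```

With this sample data, only the mail system is flagged, since it both fails most of its changes and takes a long time to repair. In practice the input would come from whatever RFC or change log a group already keeps.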

Phase 3 focuses on creating a repeatable build library. The best IT organizations make infrastructure easier to build than repair. A definitive software library, containing master images for all software necessary to rebuild systems, is critical. For larger groups, forming a separate release management team to engineer repeatable builds for the different services is helpful. The release management team should be separate from the operational group and consist of generally senior staff.

The final phase discusses continual improvement. If everyone stopped at “best practices”, no one would have a competitive advantage. Suggested metrics for each key process area are listed and explained. After all, you can’t manage what you can’t measure. Finding out what areas are the worst makes it easier to decide what to improve upon.

The last third of the book consists of appendices that serve as useful references for the four phases. One of the appendices includes a suggested table layout for a CMDB system. The whole book is focused on the practical nature of ITIL implementation and guiding organizational learning. At times, it assumes a large staff (especially when discussing separation of duties), so some of the ideas will have to be adapted to meet the needs of smaller groups. Nonetheless, this book is an invaluable resource to anyone involved in IT operations.

Privacy in the 21st century (or at least this week)

Digital privacy has been in the news this week. The first story involves a judge ordering a woman to decrypt her laptop. There has been a lot of uninformed commentary surrounding this story, and I thought I’d add my own to the pile. My initial reaction was that it was a pretty blatant violation of the Fifth Amendment, but after further reflection, I’m not so sure. I still struggle to find the right parallel to the physical world.

I don’t believe that decrypting the data is self-incrimination, in and of itself. A person can’t avoid a search warrant by simply locking the door. On the other hand, the police already have the data (in some form) in their possession. There’s no requirement that the data be in a form that the state finds convenient.

Overall, I’m not that concerned with this decision. A valid warrant should be sufficient to require a person to turn over documents in an unencrypted form. Failure to comply is rightly contempt of court. The only problem is when a person legitimately forgets the key, because it is nearly impossible to determine if they have legitimately forgotten. Still, I’m not at all convinced that this ruling is a death knell for the Fifth Amendment.

The other story in the news came from Google, who announced that they are changing their privacy policy for accounts (this does not include search, Wallet, and Chrome). This story has caused no end of hand-wringing, but it seems to me like a severe overreaction. From what I can tell, interactions with third-party sites haven’t changed. The changes mostly make it easier for Google services to share data internally.

To me, that’s part of the appeal of using the variety of services Google offers. What’s the point of a single account if the services aren’t tightly integrated? The lack of an opt-out isn’t a compelling argument to me. Anyone who doesn’t like the privacy policy doesn’t have to use the service (though I’ll admit that if you just bought an Android phone, the cost of leaving can be prohibitive, assuming an early termination fee with the carrier). There’s an adage that states if you’re not paying, you’re the product. I’m fine with my data being more available across my Google services and hope the promised cool things come to pass. If it ever becomes unacceptable to use Google services, I’ll take my ball and go home.

Purdue’s trimester plan

The following is my opinion only. It does not represent the opinion of Purdue University, nor does it reflect any insider information (because I am the last to find out insider information).

Earlier today, Purdue University officially announced a plan to move to a trimester schedule. The summer session would be optional, but encouraged, with the intent of increasing summer enrollment from 6,000 to 20,000. Making this change, the administration argues, would save students money (because the summer session is cheaper) and allow them to graduate earlier. It would also benefit the University by allowing facilities to be more fully utilized.

In preparation for an upcoming column, Journal & Courier opinions editor Dave Bangert asked what the area might be like with so many extra students over the summer. Obviously, the addition of 14,000 students would have an impact. My friend Dave at the Silver Dipper might be the most pleased, as he depends on summer sales to support his business and his family year-round. Other local businesses and outdoor events would probably see additional traffic.

It wouldn’t necessarily be great for everyone, though. I can foresee rental properties having some difficulty. Some student-focused apartments offer 9-month leases. During the other three months, they do maintenance tasks that are difficult to do when the unit is occupied. Another group that would be negatively impacted is the IT staff in academic departments on campus. Having been in such a role, I know that summers are a critical time to work on large projects and upgrades that aren’t easy to get done during the school year. And families who like to spend time on campus might find a busier campus less inviting.

All of this assumes that the plan works and summer enrollment increases. This is by no means a given. Many obstacles will have to be overcome. According to Purdue’s Data Digest, the average salary for all faculty appointments is $93,200. Many faculty are on 10-month appointments, so asking them to teach summer classes would require a considerable increase in payroll. Some faculty may prefer to participate in summer field work instead of teaching classes, and it’s not clear what the plan is if the demand is higher than the available faculty.

The other financial concern is that students won’t be able to fund the summer session. Most financial aid awards are designed around a two-semester-with-summers-off schedule. Although Purdue has set aside several million dollars in financial aid, other funding sources will need to follow suit. Students who rely on summer jobs to save up money for the rest of the year will have to decide between skipping the summer term and taking on additional loan debt.

I’m not convinced that classes that upperclassmen and graduate students need will be any more available with a summer session. In the upper-division meteorology classes, we generally had about 12 students enrolled. This meant that each course was offered once per year. A summer session wouldn’t help with that. Graduate classes can be even more rare, sometimes offered only once every other year. Presumably, undergraduates can opt for summer sessions their first two years and return to a two-semester calendar when they get into more major-specific coursework.

Another issue left unaddressed, at least publicly, is the summer convention schedule. Purdue regularly hosts the state FFA convention, as well as other conferences and conventions. Hosting these events requires meeting space and space in residence halls. Will the campus still be able to support such events with extra students, and will event organizers continue to find Purdue an attractive option?

In the end, it doesn’t particularly matter what my cynical opinion is. Dr. Cordova has announced that the plan will begin this summer, with the intention of building to the 20,000 student goal over several years. I hope the plan works out for the benefit of the University’s students and budget, but I’m not yet convinced that it will.

CNET considered harmful

In my younger days, I made great use of CNET’s download.com website. It was an excellent tool for finding legal software. Apparently, it has also become an excellent tool for finding malware. An article posted to insecure.org describes how CNET has begun wrapping packages with an installer that bundles unwanted, potentially malicious software with the desired package.

This is terrible, and not just for the obvious reasons. It’s bad for the free software community because it makes us look untrustworthy. There’s a perception among some people (especially in the business world) that software can only be free if it’s no good. I suppose that’s one reason some in the community use “libre” to emphasize the free-as-in-freedom aspect. (Of course, not all free-as-in-beer software is free-as-in-freedom. That’s another reason the distinction can be important.)

When this conveniently-bundled malware causes problems for users, it’s not CNET who gets the blame. Users will unfairly blame the package developer, even though the developer had nothing to do with it. For well-established and well-respected packages like nmap, this reputation damage may not be that important. For a new project just getting started — or for the idea of free software in general — this can be devastating.