Blog Fiasco

April 15, 2014

Bug trackers and service desks

Filed under: Musings, Project Management — bcotton @ 5:58 pm

I have recently been evaluating options for our customer support work. For years, my company has used a bug tracker to handle both bugs and support requests. It has worked, mostly well enough, but there are some definite shortcomings. I can’t say that I’m an expert on all the offerings in the two spaces, but I’ve used quite a few over the years. My only conclusion is that there is no single product that does both well.

Much of the basic functionality is the same, but it’s the differences that are key. A truly excellent service desk system is aware of customer SLAs. Support tickets shouldn’t languish untouched for months or even years. But it’s perfectly normal for minor bugs to live indefinitely, especially if the “bug” is actually a planned enhancement. Service desks should present customers with a self-service portal, if only so they can see the current status of their tickets. Unfortunately, most bug trackers present too much information for a non-technical user (can you imagine having your CEO use Bugzilla to manage tickets?). While that sort of interface is great for managing bugs, it’s pretty lousy otherwise.

Of course, because they’re similar in many respects, the ideal solution has your service desk and your bug tracker interacting smoothly. Sometimes support requests are the result of a bug, and having a way to tie them together is very beneficial. How will your service desk agents know to follow up with the customer unless the bug tracker updates affected cases when a bug is resolved? How will your developers get the information they need if the service desk can’t update the bug tracker?

Many organizations, especially small businesses and non-profits, will probably use one or the other. Development-oriented organizations will lean toward bug trackers and others will favor service desk tools. In either case, they’ll make do with the limitations for the use case they didn’t favor. Still, it behooves IT leadership to consider separate-but-interconnected solutions in order to achieve the maximum benefit.

April 6, 2014

The life of a project

Filed under: Project Management — bcotton @ 8:01 pm

A picture on Twitter caught my eye earlier tonight. “The life of a project” captures a person’s mental state through the life of the project. It was particularly meaningful to me because it basically describes what I’ve gone through this past week at work. I’ve been working on a very high stakes proof of concept for a customer with a lot to lose, and it’s been pretty taxing. You can ask my family; I haven’t been much fun to be around the past few days. Fortunately, we’re on the upward slope now: lessons have been learned for phase 2 and the prototype cluster is basically working at this point.

It occurs to me that many projects I’ve been involved with or aware of follow this general pattern. I stopped to think about why that is. For this project, the business requirements weren’t incomplete, although that is often an issue for projects. The technical requirements were missing, though. As it turns out, a few extra DLLs (yes, this is a Windows execution environment) were needed. The software used for this project is pretty awful, and so the error messages weren’t particularly helpful. It took the better part of a day just to leap that particular hurdle.

Another contributing factor was not knowing where we were starting from. (As the old saying goes, “I don’t know where we are, but we’re making good time!”) I thought we were basically cloning a similar project we had done for another customer, and started my work based on that assumption. As it turns out, the environment we were starting from was not nearly as robust and automated as I had been led to believe. The result was two days of effort getting to where I thought we would be after a couple of hours. This wasn’t entirely bad. It gave me an opportunity to work with parts of our product stack that I haven’t worked with much, and it pointed out a lot of areas where future such projects could be done much better. But it was also a great way to add to the already high levels of stress.

Even in a small company, nobody is familiar with everything. Time spent re-explaining what I was trying to do when I had someone new helping me didn’t move the project forward. Likewise, having to hop from person to person in order to get the right expertise was a barrier to progress. I was fortunate that all of my coworkers were extremely willing to help, and I certainly can’t blame them for not knowing everything. Nonetheless, it was frustrating at times.

Certainly the long downward slide that characterizes most of the life of a project can be mitigated. Starting with a more complete understanding of requirements, knowing what you’re starting from, and having the right expertise available can reduce the surprises, frustration, and sadness of a project. Still, I think it’s human nature to experience this to some degree. We always start out confident in our abilities and blissfully ignorant of the traps that are waiting for us.

April 2, 2014

Amazon VPC: A great gotcha

Filed under: HPC/HTC, The Internet — bcotton @ 9:01 pm

If you’re not familiar with the Amazon Web Services offerings, one feature is the Virtual Private Cloud (VPC). VPC is effectively a way of walling yourself off from all or part of the world. If you’re running a public-facing web server, it might not be so important. If you’re running a compute cluster, it’s a no-brainer. Just be careful about that “no-brainer” part.

While working on a new cluster for a customer today, I was trying to figure out why the HTCondor scheduler wasn’t showing up to the collector. The daemons were all running. HTCondor security policies weren’t getting in the way. I could use condor_config_val from each host to query the other host. I brought in a colleague to double-check me. He couldn’t figure it out either.

After beating our heads against the wall for a while and finding nothing obviously wrong, I noticed one tiny detail in the logs. The schedd kept saying it was updating the collector, but the collector never seemed to notice. More specifically, the schedd kept saying it was updating the collector via UDP. How many times had I watched that line go by?

The last time, though, it clicked. And it clicked hard. I had set up a security group to allow all traffic within the VPC. Except I had set it for all TCP traffic, so the UDP packets were being silently dropped. As UDP packets are wont to do. When I changed the security group rule from TCP to all protocols, the scheduler magically appeared in the pool.
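For the record, the change was just widening that one rule from TCP to all protocols. If you manage security groups from the AWS CLI, a rough sketch of the equivalent rule looks like this (the security group ID and CIDR block are made-up placeholders, not values from this cluster):

  # Allow all protocols (TCP, UDP, ICMP, ...) from within the VPC's address range.
  # sg-12345678 and 10.0.0.0/16 are placeholders; use your own security group and VPC CIDR.
  aws ec2 authorize-security-group-ingress --group-id sg-12345678 --protocol -1 --cidr 10.0.0.0/16

In the console, the equivalent is editing the inbound rule’s type from a TCP rule to one that allows all traffic.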

Once again, the moral of the story is: don’t be stupid.

March 30, 2014

Parsing SGE’s qacct dates

Filed under: HPC/HTC, mac — bcotton @ 9:19 pm

Recently I was trying to reconstruct a customer’s SGE job queue to understand why our cluster autoscaling wasn’t working quite right. The best way I found was to dump the output of qacct and grep for {qsub,start,end}_time. Several things made this unpleasant. First, the output is not de-duplicated on job id: jobs that span multiple hosts get listed multiple times. Second, the dates are in a nearly-but-not-quite “normal” format. For example: “Tue Mar 18 13:00:08 2014”.
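In case it saves someone else a step, the dump amounted to roughly this one-liner (assuming, as I recall, that qacct -j with no job ID prints a record for every job):

  # Dump the accounting records (qacct -j with no argument should list every job)
  # and keep only the submit/start/end timestamps.
  qacct -j | grep -E '(qsub|start|end)_time' > job_times.txt

Each surviving line ends in a date like the one above.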

What can you do with that? Not a whole lot. It’s not a format that spreadsheets will readily treat as a date, so if you want to do spreadsheety things, you’re forced to either manually enter them or write a shell function to do it for you:

# Convert an SGE qacct timestamp (e.g. "Tue Mar 18 13:00:08 2014") into an Excel date formula. BSD/OS X date only.
function qacct2excel { echo "=`date -f '%a %b %d %T %Y' -j \"$1\" +%s`/(60*60*24)+\"1/1/1970\""; }

The above works on OS X because its date command is the BSD flavor rather than GNU. On Linux, you’ll need a different set of arguments, which I haven’t bothered to figure out. It’s still not awesome, but it’s slightly less tedious this way. At some point, I might write a parser that does what I want qacct to do, instead of what it does.
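For what it’s worth, GNU date’s -d flag should be able to parse that timestamp directly, so the Linux version is probably something like this (an untested sketch, not something I’ve verified):

  # Untested sketch for GNU date on Linux: -d parses the qacct timestamp, +%s prints epoch seconds.
  function qacct2excel { echo "=$(date -d "$1" +%s)/(60*60*24)+\"1/1/1970\""; }

  # Usage is the same either way:
  #   qacct2excel 'Tue Mar 18 13:00:08 2014'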

It’s entirely possible that there’s a better way to do this. The man page didn’t seem to have any helpful suggestions, though. I hate to say “SGE sucks” because I know very little about it. What I do know is that it’s hard to find good material for learning about SGE. At least HTCondor has thorough documentation and tutorials from HTCondor Week posted online. Perhaps one of these days I’ll learn more about SGE so I can determine whether it sucks or not.

March 27, 2014

Fun with birthdays

Filed under: Musings — bcotton @ 9:09 pm

Sometimes I get distracted by shiny trivia. Shortly before St. Patrick’s Day, I noticed that seven of my Facebook friends happened to celebrate their birth on that holiday. That seemed surprisingly high, so I went through and counted up the birthdays for all 459 of my Facebook friends who have their birthday listed. The results are interesting. I don’t know if they’re meaningful or not.

As you can see from the I-should-have-made-it-larger chart above, any given day is most likely to be the birthday of exactly one of my Facebook friends. It is slightly less likely to be the birthday of none of them. That was the most surprising result: I would never have expected that 108 days a year are empty when there are 459 birthdays to go around.

St. Patrick’s Day is the most frequently-birthed day with seven, although June 4 has six. According to the New York Times, those are the 134th and 146th most common birthdays. The most common birthday for those born between 1973 and 1999 is September 16, yet none of my Facebook friends claim that day.

May and December are the most common months for my friends, both with 52 birthdays. January is the least common with 25, though February and November each have 26. February gets some credit for being the shortest month, but it is still among the three months with less than one birthday per day. January does claim the longest stretch of birthday-less days, with eight.

How about days of the month? The 31st has the highest average, due to the 5s contributed by March and May (interestingly, these are the only two days with 5 birthdays). In second place is the 22nd, which has the highest total count at 24. The lowest is on the 20th, which only has 6 birthdays. Two days before and after are in the 20s, so it’s a notable dip.

The full spreadsheet is available in Google Drive if you want to make your own observations.

March 11, 2014

Considering Bloom’s taxonomy in staffing decisions

Filed under: Musings, Project Management — bcotton @ 8:45 pm

A while back, an exam question introduced me to a taxonomy developed by educational psychologist Benjamin Bloom. In researching this work, I was immediately struck by how useful it could be when making decisions about technical staff. Bloom’s taxonomy is composed of three domains. The cognitive domain includes six hierarchical levels (from lowest to highest):

  • Knowledge
  • Comprehension
  • Application
  • Analysis
  • Synthesis
  • Evaluation

Applying these levels can help guide the interview process and provide a measure of a candidate’s abilities. With many technical jobs, though, it’s preferable to ignore the knowledge level. “Knowledge” in this context refers to memorized facts. Some interviews, especially phone screens, tend to focus entirely on the knowledge level. Even interviews built around programming exercises can end up emphasizing recitation over application. It is far too easy for a nervous interviewee to underperform on memorized facts. In real-world tasks, references are available for facts.

Once a person is hired, they need to be assigned work. If tasks are rated at the level they require, they can be matched to people at the required level. Tracking a person’s task levels can be beneficial as well. Giving someone tasks lower than they’re capable of will erode morale over time (and is a waste of resources), but someone who never gets lower level tasks could probably use a break. By the same token, giving people the occasional higher-level task gives them growth opportunities but too many can cause undue stress. If employees largely self-select tasks, a drop in level can be a warning sign of wider problems.

Of course, such applications are not new. Bloom’s taxonomy, by its very inclusion in an IT project management exam, is clearly not newly applied. It’s just interesting to me that a taxonomy developed for education some 60 years ago could fit technology staffing so well. If it’s new to me, then it’s probably new to someone else, too.

March 10, 2014

Thoughts on the Weather Forecasting Improvement Act

Filed under: Weather — bcotton @ 9:24 am

Insurance Journal reported last week on a bill sponsored by Representative Jim Bridenstine (R-Oklahoma). In a fit of poor reporting, the author says the bill makes the “protection of people and property a priority.” Unfortunately, the National Weather Service mission statement has included “protection of life and property” for years. The bill itself contains no such insulting verbiage. On the surface, it’s actually a welcome relief: a Congressman looking to direct over half a billion dollars of new funding to scientific research and operations. In reality, it strikes me as more of a pipe dream.

The average tornado warning lead time is currently around 13 minutes. The goal of Bridenstine’s bill is a lead time of 60 minutes or more. Stretch goals are good, but a 4x increase is not, perhaps, the most appropriate goal for legislation. Even so, there’s a question of how valuable such an increase would really be. Increased protection of property is probably not going to be that dramatic with hour-long lead times. It’s not like people can move their houses and businesses out of the way. Some damage could be prevented by securing loose objects and boarding windows, but it’s not likely to be significant.

Protecting life is the more important aspect, but would a one-hour lead time help? I’ve argued for years that there’s definitely an upper bound to lead times after which the returns diminish. My suspicion is that as the lead time grows beyond that point, people become more and more complacent. This argument has been based on hunches and unsubstantiated reasoning. It turns out, there’s evidence that increased lead time has no impact on injuries from tornadoes.

Even if the benefits are minimal, the amount of learning that would have to take place to get lead times up to an hour would aid our understanding of severe weather. The improvements to observation networks and modeling would benefit all areas of weather forecasting. Even if tornado warning lead times remain unchanged, the scientific impact of this bill would be dramatic. I just worry that it’s setting the National Weather Service up for “failure”.

February 14, 2014

Thoughts on Comcast and Time Warner Cable

Filed under: The Internet — bcotton @ 9:32 am

When I wrote a review of Susan Crawford’s Captive Audience two months ago, I didn’t expect to be revisiting it so quickly. Then came the news that Comcast was planning to buy Time Warner Cable, gaining a few million more customers and several regional sports networks. With the acquisition of NBC, Comcast is clearly making a play to be in the content business. There’s not much growth potential left in being a service provider, so it makes sense that Comcast would want to hedge their bets. That’s why I suspect they’re more interested in acquiring regional sports nets (live sports being one of the main reasons people don’t cut the cord) than the few million subscribers they’d pick up if the deal is approved.

It’s not like Comcast and Time Warner were really competing, despite how “competitive” the FCC and Comcast claimed the industry to be a few years ago. The cable companies largely have agreed not to step on each others’ toes. In most places, customers have exactly one choice for cable TV provider. Individual consumers will see no difference in the competitive landscape, so it’s easy to dismiss this as a non-issue (as I initially did). Where this may get interesting is when it comes time for networks to renegotiate carriage agreements. Comcast would have greater leverage to low-ball content providers, potentially squeezing a few out of business. As long as other modes of TV exist (e.g. satellite, AT&T U-Verse), I expect Comcast will remain somewhat constrained in their ability to harm content providers, but they’ll continue to be able to prevent competition from sprouting up.

Of course, it’s not guaranteed that this buyout will occur. Despite the relative ease with which the FCC and the Department of Justice approved Comcast’s purchase of NBC, the landscape has changed somewhat. Denying AT&T’s purchase of T-Mobile was a surprisingly pro-consumer decision, and it’s possible that this deal is doomed as well. I don’t follow Washington closely enough to say what’s likely. All I know is that I can’t wait for Metronet to extend their fiber offering to my neighborhood. I’ve been told it may happen as early as next month.

February 13, 2014

The Sperry–Piltz Ice Accumulation (SPIA) Index

Filed under: Weather — bcotton @ 10:31 pm

Earlier this winter, Weather.com posted an article about a new index to rate ice storms. Setting aside the illiteracy of the author (the article talks about how the index was used experimentally in 2009), it’s a good introduction to a new-to-me index that can help meteorologists communicate impacts to the public. The Sperry–Piltz Ice Accumulation (SPIA) Index uses ice accumulation, wind speeds, and temperatures to predict the impact of winter storms on public utilities (particularly power lines). The algorithm appears to be protected by copyright, which is disappointing, since it limits the ability of the scientific community to evaluate the methodology.

Communicating impact is one of the major challenges in forecasting. Even when the forecast is technically precise, the general public often doesn’t know what to do with the information. Widespread use of the SPIA Index can help people and utility crews prepare. Unfortunately, the closed nature of the index may limit its adoption.

January 19, 2014

2013 severe weather watches

Filed under: Weather — bcotton @ 11:41 am

Greg Carbin, Warning Coordination Meteorologist at the Storm Prediction Center, recently updated his website to include maps of 2013 severe thunderstorm and tornado watches. I always like looking at these, because they highlight areas of increased and diminished severe weather threat. It’s important not to read too much into them, though. As with hurricanes, it’s not always the frequency of events that makes a year memorable. 2013 was a below- or near-normal year for watches in the areas of Illinois and Indiana that were hit by a major tornado outbreak on November 17.

Tornado (left) and severe thunderstorm (right) watch count (top) and difference from 20 year average (bottom) by county. Maps are by the NOAA Storm Prediction Center and in the public domain.

Speaking of hurricanes, the quietness of the 2013 Atlantic hurricane season is evident in the below-average tornado watch count along the entire Gulf coast. Landfalling hurricanes are a major source of tornado watches for coastal states, so an anomaly in watches is often reflective of an anomaly in tropical activity. Preliminary tornado counts for 2013 are the lowest (detrended) on record. It’s not surprising, then, that the combined severe thunderstorm and tornado watch counts are generally below normal.

Severe weather watches (left) and departure from normal (right) by county. Maps are by the NOAA Storm Prediction Center and are in the public domain.

As you’d expect, Oklahoma and Kansas had the largest number of watches. What’s really interesting about the above map is the anomalously large number of watches in western South Dakota, western Montana, and Maine. Indeed, western South Dakota counties are comparable to Kansas in terms of raw watch count. Of course, that doesn’t mean the watches verified, but it’s an interesting note. Looking back through past years, the last 4 years have been anomalously high in western South Dakota. Is this an indication of a population increase, forecaster bias, or a change in severe weather climatology?
