IT management isn’t a simple matter, and we’re not going to insult anyone by suggesting that any IT managers are lacking in knowledge or have easy jobs. Just the same, Twitter is probably more complicated than the average company, so IT specialists who want to learn how things are done there or put their problems in perspective may be interested in an official post about fail whales.
Ed Ceaser and Nick Kallen, two engineers at Twitter, wrote an essay titled “The Anatomy of a Whale.” In it, they discuss how they diagnosed and corrected a problem that cropped up a few weeks ago.
We’ll only try to describe the broad strokes here (the full post is 2,000 words long), but Ceaser and Kallen stated, “[W]e used a simple strategy that involves proceeding from the most aggregate measures of system as a whole and at each step getting more fine grained, looking at smaller and smaller parts.”
Then, with some rough spots and quirks identified, they focused on the biggest contributor to the problem and made the software that talks to Memcached more efficient with its requests, eliminating almost half of its Memcached calls.
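For readers wondering how a change like that can cut cache traffic so sharply, here is a minimal sketch of one common approach: remembering values already fetched during a single request so that repeated lookups of the same key never reach the cache server. This is our own illustration, not Twitter's code, and we are not claiming it is the fix Ceaser and Kallen made. The names `CountingBackend`, `RequestScopedCache`, and `render_timeline` are hypothetical, and the backend is an in-memory stand-in rather than a real Memcached client.

```python
from typing import Dict, Iterable, List, Optional


class CountingBackend:
    """In-memory stand-in for a remote cache (hypothetical, not a Memcached client).

    It counts every lookup so the effect of deduplication is visible.
    """

    def __init__(self, data: Dict[str, str]) -> None:
        self._data = dict(data)
        self.calls = 0

    def get(self, key: str) -> Optional[str]:
        self.calls += 1  # each call here represents one network round trip
        return self._data.get(key)


class RequestScopedCache:
    """Wraps a backend and remembers results for the lifetime of one request."""

    def __init__(self, backend: CountingBackend) -> None:
        self._backend = backend
        self._seen: Dict[str, Optional[str]] = {}

    def get(self, key: str) -> Optional[str]:
        # Only go to the backend the first time a key is requested.
        if key not in self._seen:
            self._seen[key] = self._backend.get(key)
        return self._seen[key]


def render_timeline(cache, user_ids: Iterable[int]) -> List[Optional[str]]:
    """Hypothetical request handler that looks up one user per timeline entry."""
    return [cache.get(f"user:{uid}") for uid in user_ids]


backend = CountingBackend({"user:1": "alice", "user:2": "bob"})
timeline = [1, 2, 1, 2, 1]  # the same two authors appear repeatedly

# Without deduplication, every entry costs a backend call.
direct = render_timeline(backend, timeline)
direct_calls = backend.calls

# With a request-scoped wrapper, each distinct key costs one call.
backend.calls = 0
memoized = render_timeline(RequestScopedCache(backend), timeline)
memoized_calls = backend.calls

print(direct_calls, memoized_calls)
```

The wrapper is created per request and then thrown away, so it never serves stale data across requests; the trade-off is a little extra memory while the request is being handled.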
Hopefully Ceaser and Kallen’s walkthrough will prove helpful, or at least interesting. It’s not every day that big companies allow outsiders to have a look at how they handle difficulties.


