A couple of hours ago, CrystalTech, the company that hosts the Coding Horror website and the Stack Overflow blog, had a technical problem that resulted in the sites' virtual machines (VMs) being completely deleted, apparently with no usable backups.

 

Coding Horror is a popular programming-oriented blog in which the author frequently criticizes other people's code or technology decisions. Stack Overflow, which was co-created by the same person, is a popular site for programming help.

 

This is breaking news, and the details aren't all out yet, but apparently the only backups the site owners had were also on VMs hosted by CrystalTech, and all of it was permanently deleted. The owners are currently looking through various caching services (such as Google Cache) to recover as much of the content as possible.

This incident teaches a few very important lessons:

First, it's always a good idea to keep a set of backups as separate from the source as possible. This means backups should be kept off-site, where whatever causes the data loss in the first place is unlikely to destroy the backups.
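As a rough illustration of that first lesson, here is a minimal Python sketch that copies a backup archive to a second location and checks that the copy is byte-identical. The function names (`sha256_of`, `copy_and_verify`) and the `offsite_dir` parameter are hypothetical, and the sketch assumes the off-site storage is reachable as an ordinary directory (for example, a mounted remote volume). It is not a description of how the affected sites were backed up.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source: Path, offsite_dir: Path) -> Path:
    """Copy source into offsite_dir and confirm the copy matches the original.

    offsite_dir is assumed (hypothetically) to live on storage that does not
    share a failure mode with the source, such as a different provider.
    """
    offsite_dir.mkdir(parents=True, exist_ok=True)
    target = offsite_dir / source.name
    shutil.copy2(source, target)  # copies contents and file metadata
    if sha256_of(source) != sha256_of(target):
        raise IOError(f"checksum mismatch for {target}")
    return target
```

The checksum comparison only proves that the copy succeeded. It does nothing for you if the destination sits on the same host or provider as the original, which is exactly what went wrong here.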

Second, hosting your content with what you believe to be a competent company that supposedly follows best practices is absolutely no guarantee against these kinds of disasters. Major hosting/data companies like Google and Amazon have had similar data loss incidents in the past. It happens, and it's a mistake to trust that they have recovery capabilities. It's important to always have your own means of recovering lost data.

Third, even the best, most talented developers sometimes get caught with their pants down. For people who make a living by poking fun at shoddy code/infrastructure and providing a service for giving programming advice, the owners of these sites are no doubt extremely embarrassed about this incident. The general sentiment among the visitors, though, is one of sympathy, because we all know what it's like to suddenly and completely lose something we've dedicated countless hours to. These kinds of failures tend to happen when we're least prepared, when we think, "Everything is fine. Backups aren't a big deal right now, because it's not like I'm making any big changes." We need to remind ourselves of that stomach-sinking frozen moment when we see our work, our investment, our piece of art vanish before our eyes, and we need to remind ourselves of the "I should haves" that we mournfully repeat in those moments.

When I started writing this, the affected sites consisted solely of an apology and a very brief explanation. By now, they have managed to restore parts of the sites, apparently thanks to the help of visitors who have contributed caches of the pages.