Microsoft's Dare Obasanjo, who is known for his scathing critiques of technology trends and products, including some of those promoted by Microsoft itself, has just written a blog post analyzing why OpenID, AtomPub, and XML on the Web (including XHTML and SOAP) have failed as technologies.

In his conclusion, he sees a common issue with all three: they were each designed to solve a specific niche problem, but they were promoted as if everyone should be using them. In the end, adapting those technologies for other uses proved too complex and didn't give the benefits they were supposed to deliver.

Finally, he mentions "NoSQL" (a buzzword for non-relational databases) as something that developers should analyze with these lessons in mind. Will NoSQL end up joining this list of failures? Let's take a look.

Modern non-relational database systems, such as Google's BigTable or Apache Cassandra (originally developed by and for Facebook), are designed to provide a level of scale that SQL-based databases can't provide. However, they aren't drop-in replacements, and there are several trade-offs to consider.

For example, most non-relational databases are basically key-value stores, where you aren't able to look up data based on any arbitrary criteria, but you instead have to look it up by specific key names that are chosen in advance and stored with the data. If you want to look up a person by his full name, then you'll need the person's information stored with the full name as the key. If you also want to look up the person by an ID number, then that's a separate record with duplicated data. Each time you add a new way to directly reference a piece of data, you'll need to store another copy of that data, and you'll need to keep it all in sync.
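To make the duplication problem concrete, here is a minimal Python sketch. It assumes a plain in-memory dict as a stand-in for a key-value store; the key prefixes ("name:", "id:"), the put_person helper, and the sample record are hypothetical and not taken from any particular product.

```python
# A plain dict stands in for a key-value store (in-memory, for illustration only).
store = {}

def put_person(person):
    """Write one full copy of the record under each key we want to look it up by."""
    store["name:" + person["full_name"]] = dict(person)
    store["id:" + str(person["id"])] = dict(person)

put_person({"id": 42, "full_name": "Ada Lovelace", "email": "ada@example.com"})

# Both lookups work, but only because the data is stored twice.
by_name = store["name:Ada Lovelace"]
by_id = store["id:42"]

# Updating one copy and forgetting the other leaves the two copies out of sync.
store["id:42"]["email"] = "ada@new.example.com"
in_sync = store["name:Ada Lovelace"]["email"] == store["id:42"]["email"]
print(in_sync)  # False
```

Every additional lookup key means another copy to write and another copy that can drift.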

Most of these issues have partial solutions. Some of these database products allow multiple keys to point to the same piece of data, which mostly solves the duplication issue. For those that don't, you could use an indirect approach: use the full name to look up an ID number, then use that ID number to look up the rest of the person's information, so that the information is stored in one place. Unfortunately, this roughly doubles the time it takes to look up data, since you're doing two lookups instead of one. It also means that, when you modify the full name value (or whatever value you're using as your key), you'll still have to update the affected keys, which, depending on the database product, can be nontrivial.
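The indirect approach can be sketched the same way. This is an illustration under assumptions, not how any specific database implements it: the dict again stands in for the store, and put_person, get_by_name, and rename are hypothetical helpers.

```python
# A plain dict stands in for a key-value store (in-memory, for illustration only).
store = {}

def put_person(person):
    """Store the full record once, under its ID, plus a small name -> ID pointer."""
    store["id:" + str(person["id"])] = dict(person)
    store["name:" + person["full_name"]] = person["id"]

def get_by_name(full_name):
    """Two lookups: name -> ID, then ID -> record."""
    person_id = store.get("name:" + full_name)   # lookup 1
    if person_id is None:
        return None
    return store.get("id:" + str(person_id))     # lookup 2

def rename(person_id, new_name):
    """Changing the name means deleting the old pointer and writing a new one."""
    record = store["id:" + str(person_id)]
    del store["name:" + record["full_name"]]
    record["full_name"] = new_name
    store["name:" + new_name] = person_id

put_person({"id": 42, "full_name": "Ada Lovelace", "email": "ada@example.com"})
rename(42, "Ada King")
print(get_by_name("Ada Lovelace"))           # None
print(get_by_name("Ada King")["email"])      # ada@example.com
```

In this sketch the three steps in rename happen one after another in a single process. In a distributed store they may be separate writes, which is part of why keeping the pointers correct can be nontrivial.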

Either way, these database implementations still don't support variable search criteria with anywhere near the flexibility that SQL databases do. The more flexibility you try to add into the system, the more you chip away at the performance benefits of a key-value store. Depending on the nature of how your application needs to access data, the non-relational approaches can actually end up less scalable than traditional SQL databases.
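As a rough illustration of the flexibility gap, consider a query on a field that was never chosen as a key. In the sketch below (a dict as the store again, with a hypothetical find_where helper and made-up records), the only way to answer it is to examine every record, whereas a SQL database could answer the equivalent WHERE clause through an index on that column.

```python
# A plain dict stands in for a key-value store keyed only by ID.
store = {
    "id:1": {"id": 1, "full_name": "Ada Lovelace", "city": "London"},
    "id:2": {"id": 2, "full_name": "Alan Turing", "city": "London"},
    "id:3": {"id": 3, "full_name": "Grace Hopper", "city": "New York"},
}

def find_where(predicate):
    """Answer an ad hoc query by scanning every record; cost grows with store size."""
    return [record for record in store.values() if predicate(record)]

# "city" was never set up as a key, so this is a full scan.
londoners = find_where(lambda record: record["city"] == "London")
names = sorted(record["full_name"] for record in londoners)
print(names)  # ['Ada Lovelace', 'Alan Turing']
```

Avoiding the scan means maintaining yet another set of keys for each field you want to search on, which brings back the duplication and synchronization costs described above.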

This is the kind of thing that Obasanjo is talking about: BigTable and Cassandra were created to solve specific scalability problems with certain types of data, accessed in certain ways, where SQL databases had too much unnecessary overhead. However, a lot of technology publications have been talking about "NoSQL" as the next big thing, giving people the impression that every application that requires a lot of scale needs to switch to NoSQL. This kind of overgeneralized hype tends to mislead organizations into spending lots of money trying to fit their square pegs into round holes, and the years of horror stories that follow can end up threatening a technology even in the areas where the technology really excels.

That said, I do see a major difference that sets NoSQL technologies apart from OpenID, AtomPub and XML on the Web: those three all involve interactions (negotiations, in a sense) between separate applications by separate organizations, while database choices tend to be much more contained within a single organization or product. If no one else were using BigTable, Google could still happily use it and continue to experience the benefits. The same can't be said about OpenID, AtomPub, or any XML data exchanged between two organizations. This might explain why XML is still doing well in non-network applications, and why I think NoSQL databases will always have their place, despite the misleading hype.