It’s no big secret that people have found better substitutes for the traditional relational (SQL) database for all kinds of use cases. My absolute favorite public example of this, based purely on the number of technologies involved, is Instagram’s infrastructure, which uses PostgreSQL on the back end but also employs Redis on the front end.
Most teams trying to build a scalable website today rely heavily on NoSQL databases such as MongoDB, Redis, Riak, Cassandra, and Amazon’s Dynamo, to name just a few of the most popular ones.
However, enterprise penetration has been limited. I want to talk about that.
The Benefits of NoSQL: Scale & Performance
The usage of NoSQL databases has exploded on the web because they execute some operations, such as atomic document lookups without joins, blazingly fast. Furthermore, many NoSQL databases attack the Big Data problem head-on by coming with out-of-the-box support for distributing a database across a cluster of machines (which is very tricky to accomplish with traditional relational databases).
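To make the "lookup without joins, spread across a cluster" idea concrete, here is a minimal sketch in Python. It is a toy, not how any particular NoSQL product works: the ShardedStore class, the in-memory "nodes", and the sample key are all assumptions made up for illustration, and real systems use more careful partitioning schemes than a simple modulo.

```python
import hashlib


class ShardedStore:
    """Toy key-value store that spreads documents over in-memory 'nodes'.

    Hypothetical example: each dict stands in for one machine in a cluster.
    """

    def __init__(self, node_count):
        self.nodes = [{} for _ in range(node_count)]

    def _node_for(self, key):
        # A stable hash means the same key always lands on the same node.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.nodes[int(digest, 16) % len(self.nodes)]

    def put(self, key, document):
        self._node_for(key)[key] = document

    def get(self, key):
        # One lookup on one node: no join across tables or machines.
        return self._node_for(key).get(key)


store = ShardedStore(node_count=3)
store.put("user:42", {"name": "Ada", "boards": ["recipes", "travel"]})
print(store.get("user:42"))
```

The whole document comes back from a single node in a single step, which is roughly why this style of access scales out so easily.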
For example, a website like Pinterest (yes, the obligatory Pinterest mention since it seems illegal not to mention it these days), serving 10,000,000 visits a day, with a catalog of data growing exponentially, simply cannot be successful on a traditional relational back end alone. They need layers of caching and persistence to ensure a reliably interactive user experience.
This kind of scale is very different from what you experience in the enterprise, where you have fewer users and those users have different speed expectations.
The Benefits of NoSQL: Flexibility
Aside from performance & scalability, the other major advantage of NoSQL systems is data flexibility.
SQL systems require that you create a schema before doing anything else. Want to build an application? First, build your model. Then start coding. Need to change your model? Good luck, since you have to change every single thing that ever might have depended on your first model.
NoSQL systems turn this on its head. When working with MongoDB, for example, you can start coding your app, storing things in the database as you learn you need to.
Lots of changes are so much easier. If you need a property to be multi-assigned, just do it! You don’t have to worry about creating entire link tables and adding joins and redoing your business logic all over the place just to make this change work.
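Here is a rough sketch of that multi-assignment change, done both ways. The document side uses a plain Python dict as a stand-in for a MongoDB-style document, and the relational side uses the sqlite3 module from the standard library. The article and author names, tables, and columns are hypothetical and exist only for this example.

```python
import sqlite3

# Document style: the property simply becomes a list.
article = {"_id": 1, "title": "Polyglot persistence", "author": "Ada"}
article["author"] = ["Ada", "Grace"]  # now multi-assigned, no schema change

# Relational style: the same change needs a link table and extra joins.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE article_author (
        article_id INTEGER REFERENCES article(id),
        author_id INTEGER REFERENCES author(id)
    );
    INSERT INTO article VALUES (1, 'Polyglot persistence');
    INSERT INTO author VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO article_author VALUES (1, 1), (1, 2);
""")

# Every query that used to read one author column now goes through the link table.
rows = db.execute("""
    SELECT author.name
    FROM article
    JOIN article_author ON article_author.article_id = article.id
    JOIN author ON author.id = article_author.author_id
    WHERE article.id = 1
    ORDER BY author.name
""").fetchall()
authors = [name for (name,) in rows]

print(article["author"], authors)
```

Both versions end up holding the same two authors. The difference is how much of the surrounding model and query code had to change to get there.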
No Enterprise for NoSQL
Despite these advantages, enterprise penetration of NoSQL databases has been pretty limited to date. Some technical reasons include:
- Poor support for ACID transactions.
- Loose guarantees of data consistency across a grid.
- Limited support for aggregations.
- Limited support for joins.
Basically, many of the guarantees needed to keep mission-critical data consistent are not provided by NoSQL databases.
Said another way, if you’re in IT and you’re trying to build a system in support of a mission-critical application, you have been trained to rely on these types of guarantees. It’s mentally unsettling to think about different, softer software guarantees such as the “eventual consistency” of some NoSQL databases.
So you have to think about when you don’t need the rock-solid ACID transactions of the relational world. And without the pressure of web-sized user scale, there is much less motivation to actually go through this exercise.
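If the eventual consistency idea feels abstract, this small sketch may help. It is a deliberately simplified model with one primary and one secondary replica held in memory. The class and method names are assumptions invented for this example and do not correspond to any real database's API.

```python
class EventuallyConsistentStore:
    """Toy two-replica store: writes hit the primary, replication runs later."""

    def __init__(self):
        self.primary = {}
        self.secondary = {}
        self.pending = []  # writes not yet copied to the secondary

    def write(self, key, value):
        # The write is acknowledged as soon as the primary has it.
        self.primary[key] = value
        self.pending.append((key, value))

    def read_from_secondary(self, key):
        # May return stale data until replicate() has run.
        return self.secondary.get(key)

    def replicate(self):
        # Apply queued writes in order, then clear the queue.
        for key, value in self.pending:
            self.secondary[key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("balance", 100)
stale = store.read_from_secondary("balance")  # replication has not run yet
store.replicate()
fresh = store.read_from_secondary("balance")  # replicas have converged

print(stale, fresh)
```

The window between the write and the replicate call is exactly the part that makes people trained on ACID guarantees uneasy, and it is also what lets such systems accept writes quickly across many machines.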
This leaves enterprises going with newer SQL technologies like Vertica or Attivio that scale very well and are less confusing than NoSQL systems. Or, if they’re really adventurous, using Hadoop for a specific Big Data problem.
Semantic Web Databases: Flexible NoSQL for the Enterprise
One kind of NoSQL system that has been seeing penetration in the enterprise is Semantic Web databases. They don’t offer the same kind of performance and scale that the web-based NoSQL variety does, but instead provide much more flexibility than traditional relational systems while maintaining security & transactional integrity.
Let’s get back to our IT guy. If you’re deciding between a relational database and a Semantic Web database, it no longer has to be about rock-solid data integrity and transactional guarantees, because both systems provide them. It becomes about flexibility and tooling compatibility, which are easier things to wrap your head around.
For example, if you’re dealing with lots of unstructured data, then use a Semantic Web database. If you’re dealing with a schema that you expect to change over time (to incorporate new information types or sources), then use a Semantic Web database.
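To give a feel for why a changing schema is less painful here, below is a toy triple store in Python. Real Semantic Web databases store RDF and are typically queried with SPARQL, so treat this as a sketch of the data model only. The add and match helpers and the sample facts about "acme" are hypothetical.

```python
# Every fact is a (subject, predicate, object) triple; there is no table schema.
triples = set()


def add(subject, predicate, obj):
    triples.add((subject, predicate, obj))


def match(subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


add("acme", "type", "Customer")
add("acme", "locatedIn", "Boston")

# Later, a new information type shows up: just add triples, no migration.
add("acme", "twitterHandle", "@acme")
# A property becoming multi-valued is also just one more triple.
add("acme", "locatedIn", "London")

print(match("acme", "locatedIn"))
```

New predicates and extra values sit alongside the existing data without touching anything that was already stored, which is the kind of flexibility the paragraph above is pointing at.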
Thus you can start having a reasonable conversation once you limit the number of differences between the two styles of system, instead of being overwhelmed by a class of systems that is somewhat alien. It’s easier to compare Semantic Web databases to relational databases than it is to compare something like Riak to a relational database. I believe this is one reason why Semantic Web databases have made progress in the enterprise where other NoSQL technologies have not.
So what it comes down to is that for decades we’ve had one standard way to store and query important data, and today there are new choices. As with any choice, there are tradeoffs, and for some applications NoSQL databases, including Semantic Web databases, can enable organizations to get more done in less time and with less hardware than relational databases. The trick is to know when and how to deploy these new tools.
Martin Fowler called this Polyglot Persistence (also the source for this post's image), which I think describes the future fantastically. Our job is made both easier and more difficult by the new world of database technology choices available to us.