MySQL, NoSQL, We All Scream for NewSQL
Databases are a cornerstone of any successful content-driven website, and these days there is no shortage of options when designing and configuring the data source of a website or web application. Most open-source projects for the web today integrate well with MySQL, which is perhaps the industry standard for relational databases. MySQL, PostgreSQL, and Microsoft SQL Server power millions of websites today, but the relational database format is not without its critics.
There is a growing trend in the industry towards NoSQL or NewSQL options for web development. So what’s the big difference? Well, relational databases are something of a vintage technology, but they have stood the test of time fairly well, as they are currently the data storage solution of choice for such popular software as WordPress, Drupal, Magento, and more. Relational databases, as you might guess, depend on relationships between types of data.
For example, a blog post might have an author and a category. At the database level, there would be three tables (organized somewhat like spreadsheets) for blogs, authors, and categories. These would link together by way of columns which contain numeric “keys” that correspond to columns from the other tables. While this is a sensible way to think about organizing data, it can also lead to bottlenecks in speed if enough requests are made to the database at one time, since it takes time for the database to link these “keys” together and pull data from three (or more) different tables.
The NoSQL movement has burst on the scene in recent years mainly as an answer for this sort of problem. Google’s “BigTable” database format was one of the main technologies that opened the eyes of many developers, causing them to rethink how they store data. I personally worked with BigTable when designing Klection as a side project a few years ago. The major difference with BigTable, as its name suggests, is that instead of using relationships between several tables, it prefers to store data in one giant table (which does mean some repetition of data) and create “indexes” based on the types of searches that are conducted by the website. Indexes are essentially alternate versions of the giant table that have been sorted in a particular way.
For our earlier blog example, we would now design one table with columns for blog title, blog text, author name, category name, and date posted. Then we might have several indexes where the data has been sorted by title, author, category, and date. These indexes are what allow all blog posts by a certain author to be found quickly, and it is done without having to find “keys” and join multiple tables.
Other NoSQL technologies such as CouchDB, MongoDB, and Cassandra are now in widespread use, thanks to this new way of thinking about data. But traditional relational databases are certainly not out of the picture yet. They have proven to be a reliable, sturdy technology despite some of the speed concerns. And those concerns have been addressed fairly well by such solutions as Memcached. Even Facebook, with its incredible amount of traffic, still uses MySQL for much of its data storage, and the site has an incredible record for uptime (certainly besting the likes of Twitter and Tumblr). Meanwhile, the “NewSQL” movement is essentially an attempt to preserve the relational concepts of SQL while emphasizing certain NoSQL concepts like access speed from the start, but these solutions are not yet in widespread use, and the jury is still out for most of them. But the early results seem very promising.
Bottom Line: Database design is as fundamental to content-driven websites as the plumbing system is to a new house. Careful consideration of the pros and cons of each data storage platform is critical.