A common thread in technology — or for that matter, just about anything humans attempt over time — is evolution. It's not unusual, then, to see things grow and adapt as the world around them changes. That very adaptation means that sometimes older variations of our favorite tools remain, while newer variations appear alongside them to better serve certain, specific purposes.
The database world is no exception. Since the days of E.F. Codd and his introduction of the relational data model, we've seen data stores and their requirements grow from purely transactional models doing ACID (Atomic, Consistent, Isolated, Durable), to requiring more flexibility, ability to scale (via distributed deployment), and speed (in terms of latency and throughput). These improvements may, in many cases, be at the very expense of what made RDBMS solutions preferable in the first place. But how to know?
In this piece, we'll refer to traditional SQL databases — relational database systems — as RDBMSs. Their counterpart, in our comparison, is the category of databases — known as NoSQL — which includes many subtypes.
Overview of SQL / RDBMS
RDBMSs have been the default choice for information storage in financial, manufacturing, logistical, personnel, and other systems since the 1980's. Since Oracle's first commercial offering, by what was then known as Relational Software, was introduced in 1979, the world saw a phasing out of their predecessors — legacy hierarchical and network databases — partly because of ease of use and administration. Examples of RDBMS databases available today include Oracle, MySQL, Microsoft SQL Server, and PostgreSQL.
The fault-tolerance, accuracy, and trustworthiness born of these attributes have made RDBMSs the go-to for transactional data. However, with the advent of the web in the 90's and "Big Data" in the mid-aughts, new requirements appeared.
To deal with the exponentially-growing, semi-structured flood of data that was, by then, becoming commonplace, admins asked questions. Could RDBMSs be distributed, or were they capacity-limited to the hardware they resided on? How did distribution affect throughput? Could RDBMSs support multi-tenancy, and did this threaten transaction isolation? Could relational systems support very wide columns, variable-length records, semi- or even unstructured data?
Often, the next evolutionary step is to tear down your existing assumptions and start over, with your new requirements in mind. Enter NoSQL.
Overview of NoSQL
Actually, a few "NoSQL" solutions had been around a while already, but hadn't gained much traction outside of specific domains. As early as 1966, the unfortunately-named MUMPS programming language supported a key-value database (with ACID transactions) for, ironically, the healthcare industry. Another early key-value store (for Unix) was Ken Thompson's dbm (of which a current successor is BerkeleyDB).
NoSQL is an umbrella term for any database that stores data in a way other than RDBMS's rigidly-typed, schema-immutable, relational tables. Today, there are several common types of NoSQL databases, best suited to different purposes. Among them:
- Key-Value stores: lightning-fast lookup, via a key-value pair, with caching capabilities.
- Document-oriented databases: databases that, like key-value databases, use a key to access an associated value; in a document-oriented database however, the data in the value is not opaque to the system and may be used to optimize the query. Document-oriented databases may store their data in a notation like JSON or XML.
- Distributed/Tunably Consistent databases: distributed databases that let the user configure the tradeoff between consistency, availability, and partition tolerance.
- Wide-Column stores: databases that use tables, rows, and columns, like an RDBMS, but where names and formats of columns can vary by row within the same table.
- Time-Series databases: database optimized for storing and manipulating time series data (where timestamps are best treated as discrete quantities rather than continuous mathematical dimensions).
- Text Search: these databases let you store data in unstructured format or even as plain text.
It's worth mentioning that many NoSQL databases out there today provide features from multiple NoSQL types. For example, both Apache Cassandra and ScyllaDB are wide-column stores and distributed, tunably-consistent databases.
Now that many NoSQL databases achieve their (com)promises, many, if not most, are going back to an SQL-like dialect, to grow their user base with users already familiar with SQL. CQL (Cassandra Query Language), Spark, Hive, or Presto are all notable examples. Elasticsearch has its own SQL variant. So does MongoDB. Even streaming platforms like Kafka are getting in on the SQL act!
Key differences between NoSQL and SQL / RDBMS
In addition to these SQL (-like) query languages, NoSQL and RDBMS databases share at least one more thing in common: the ability to store data. The difference is in how.
NoSQL databases make certain compromises to get a leg up on flexibility, ingestion speed and query throughput. As noted earlier, for example, a tunably-consistent database offers users an adjustable tradeoff on consistency if higher availability or partition tolerance is preferred.
Also, NoSQL databases may be missing or may compromise some of the features inherent to RDBMS, for example, primary keys, foreign keys, rigid schemas, and (hardcoded) data types.
Here's a look at some of the key differences between the RDBMS and NoSQL categories, generally:
|NoSQL||SQL / RDBMS|
|Many types are not table-based, if they are, schemas are not usually fixed||Fixed schema, table-based|
|If distributed, follows CAP theorem (user can adjust the tradeoff between consistency, availability, and partition-tolerance.)||Generally not distributed. Transactions adhere strictly to ACID (Atomic, Consistent, Isolated, Durable) properties.|
NoSQL database types: feature comparison and examples
To even better understand the distinction, it's useful also to look at the differences within the category of NoSQL databases:
|Key-value pair||Redis||Fast, in-memory database platform offering native support for a wide range of data structures. Often used also as part of a cache or message broker system.|
|Text search||Elasticsearch||Full text search and analytics engine; works in near real time, including indexing.|
|Document based||MongoDB||Document (JSON-like) based distributed data store. Currently MongoDB is the most popular NoSQL offering.|
|Time-series||InfluxDB||Simple DBMS for storing time series, events, and metrics.|
|Distributed wide column store||Apache Cassandra, ScyllaDB||Distributed, wide-column store best suited to multi-cloud or multi-data center environment.|
Which is best, when?
If you could answer the question "Why NoSQL?" with only a few words, they would probably boil down to scale (including flexibility of scale), flexibility (including of schema and type constraints), or latency and performance (including throughput). Secondary considerations may include the size of community (and the support and documentation you're likely to find) or the level of industry adoption (as a measure of popularity, and thus reliability and fitness for purpose) of a given solution.
Of course, there's more to it than just that. While NoSQL offerings are built around these concepts, RDBMS databases remain the preferred choice in industries and applications involving transactional data, where ACID-based durability is essential: user authentication and access-rights management, healthcare, banking, resource allocation and inventory management, and shopping carts, to name but a few. RDBMS databases come with the long-established promises of isolation, security, and referential integrity, and there's no seeing those requirements go away any time soon.
However, in applications involving high-volume, fast moving analytics data from sensors, applications, or complementary systems, a NoSQL solution with different features or attributes might be a better fit. What kind of needs do you have? A fast-lookup with caching features, like a key-value store? A more flexible document-store for searching through larger records, or even text search for unstructured data? Maybe a distributed wide-column store with tunable consistency, or a store specialized for time-series data?
NoSQL is an evolution from RDBMS databases to better serve specific needs. The "best" in any case is what suits your needs; remember to assess your existing infrastructure and team to find areas where gaps can become strengths.
Alooma's enterprise data pipeline platform, which supports both RDBMS and NoSQL databases, can help you leverage those strengths. Here at Alooma, we're ready to help — just ask us how.