Relational Databases

Known popularly as SQL databases, these exist since the 70s. PostgreSQL, MySQL and OracleDB are examples of relational databases.
On this article I will try to give a short summary on their most relevant characteristics.

Overview

SQL databases, as the name implies, use a Structured Query Language to interact with it for creating, updating, deleting and reading data. This makes the knowledge required to work with them transferable, bar some vendor specific features (MySQL for example did not support Common Table Expressions – CTEs – till version 8.0 while PostgreSQL had support for it before that). There are standards for compliance with SQL, but all major vendors don’t comply 100% with the definitions of it.

SQL databases store structured, rigid data. All data must adhere to the table definition for column names, types, and constraints. Any sort of deviation from that will result in errors while querying the DB. This structured data allows to create strong relations between different entities, which enables complex queries to be written to fetch data from a lot of tables at once. This is a very powerful feature, but also a common pit of despair for developers; writing and optimizing complex queries requires a lot of knowledge and a poorly optimized query can be the downfall of a system.

One big advantage of working with SQL databases is the guarantees provided out of the box for consistency via transactions by being ACID compliant (more on this later). This means that an operation when executed, if it encounters an error, can have its changes reverted (even for multiple tables).

Possibly the biggest disadvantage of SQL databases is that they were not thought to be from inception horizontally scalable. This means that it is not easy to spread your data – if we end up with lots of it – across different servers. This leads to possible bottlenecks on the infrastructure where you have a giant master database server that must consume all writes.

This scalability issue leads to a lot of false affirmations that SQL databases are slower and not viable at bigger scale.

ACID

ACID stands for Atomicity, Consistency, Isolation and Durability. These properties guarantee that a transaction works as intended.

Atomicity

This defines that all operations on a transaction are executed fully, or no operation is executed at all.

Consistency

This defines that the data is consistent before and after the transaction. Respecting all indexes and constraints.

Isolation

This defines that all transaction operations do not interfere with each other. Isolation can be configured to different levels though, from a more relaxed to a more strict level when it comes to locking access to data, which can lead to some data problems.

From lowest to highest these are the isolation levels:

  1. Read Uncommitted – allows transactions to read data from other uncommitted transactions.
  2. Read Committed – all data read is committed at the moment is read.
  3. Repeatable read – all data read and written is committed at the moment of execution.
  4. Serializable – this level ensures that queries executed concurrently, would have the same outcome as if they were executed sequentially.

The isolation levels are in place to avoid (mostly) the following 3 phenomena defined by the SQL standard (introduced in 1992 – https://en.wikipedia.org/wiki/SQL-92):

  1. Dirty read – a transaction reads data from another uncommitted transaction
  2. Non repeatable read – a transaction reads the same row twice, and it sees different values for each read.
  3. Phantom read – a query executed in a transaction that returns a set of rows (a count operation for example) returns a different set of rows when executed later on in the same transaction.

Below you can find a table showing which level prevents which phenomena:

An extensive explanation with examples for all of them (and more info) can be seen at the following talk:

Durability

This defines that after a transaction is committed, all the changes related to it are persisted to disk. So even if there is a system failure, we know they happened.

Done!

So there it is, a short summary on some characteristics of relational databases. Hopefully this will be useful to shed some light on doubts you might have 😊!
See you on the next post, where I will try to describe a bit NoSQL databases.

Leave a comment