(Slides) DIY – A distributed database cluster, or: MySQL Cluster

2013/06/05 by admin

Ever wondered how a database cluster ticks? Here’s how! During my talk at the International PHP Conference (#ipc13) I tried to build a cluster. I brought a soldering iron, sold, pink, orange and brown cables with me. Then, I tried to sold the thick (reliable, high throughput) brown cable at my company notebook (video coming). Eventually, I failed. Probably, I lacked the theoretical background?! Luckily, I got very theoretical slides with me…

DIY: A distributed database cluster, or: MySQL Cluster from Ulf Wendel

View & Download

No marketing fluff

The talk has 0% marketing fluff. 0% marketing fluff leaves little to say but theory. Thus, the slides explain:

The four kinds of database clusters
Concurrency control: how conflicts are detected
Concurrency control: how conflicts are resolved
Why atomic broadcast (Virtual Synchrony) is cool
How ROWA compares to Quorums
How to split brains
Three different architectures for DIY
How do MySQL Replication/Cluster solutions differ
Why partial replication (partitioning/sharding) is the only way to scale writes
Why MySQL Cluster hates PHP apps
Why MySQL Cluster is the ultimate NoSQL KVS

All this theory helps to create a mind map of clustering solutions available for MySQL. I am no support guy, I am no consulting guy but a curious developer. I cannot tell you how the systems perform in real life but I may help you understanding the basics and asking the real experts nasty questions.

How do MySQL Replication, Tungsten, Galera and Cluster compare?

There are two simple questions to categorize RDBMS replication solutions:

Where can any transaction (including updates) be run?
When will replicas be synchronized

This gives you a simple matrix with four squares:

	Where?
When?	Primary	Update Anywhere
Eager	n/a	MySQL Cluster, Galera
Lazy	MySQL Replication, Tungsten	MySQL Cluster with Replication

An eager replication system waits for all replicas to – at least – receive the transactions changes, the transaction can commit. Thus, the commit rate is determined by the network round trip time. In the worst case an eager cluster may expose a lower commit rate than a single database. WAN and eager usually do not go together well.

Is synchronous really synchronous…?

Eager or synchronous does not mean zero lag between replicas. How big the lag is depends on the algorithm used. For the average SQL user, Galera is more prone to lagging than MySQL Cluster. Galera executes a transaction on one replica, then sends out the updates and commits. Galera commits before the other replicas have caught up. MySQL Cluster executes transactions in parallel on multiple replicas and distributes pending updates to replicas before it commits.

A SQL user view on commit sequence
Galera	MySQL Cluster
Execute transaction	Execute transaction (in parallel)
Distribute updates
Commit	Wait for update of other replicas
(Other replicas eventually commit)	Commit

Is Galera cheating by hiding the time it takes other replicas to commit? No, it is all documented! BTW, the “Virtual” in Virtual Synchrony (their algorithm) does not refer to temporal aspects (see slides). MySQL Cluster can easily compete, if you like the Galera style: see the low level NDBAPI documentation.

A lazy replication system commits at the speed of your disks. Often your disks will have lower latency than the round trip time in your network. The downsides are widely known: copies (slaves) may lag behind, eventual consistency, what if the master crashes.

Three different architectures

On a high level there are three different designs for a distributed database cluster: in-core, middleware and hybrid. A cluster can be described using six building blocks. The first three are obvious: clients, loadbalancer, RDBMS. The next three are: reflector, replicator and GCS. The reflector intercepts transactions in the RDBMS and can control them. It sends all required information about ongoing transactions to the replicator. The (distributed) replicator is responsible for detecting conflicts. The reflector is using a group communication system (GCS) for network communication.

In an in-core design RDBMS, reflector and replicator go in one process. This is the best possible integration with the lowest overhead. MySQL Replication falls into this group. Same about Galera (they have to patch MySQL).

A middleware approach provides a virtual database. It intercepts all statements issued by clients and forwards them to the actual database. Reflector, relicator and GCS may be present but are not part of the database. The challenge is in parsing and handling non-deterministic SQL statements containing functions such as NOW() or TIME(). Also, whenever the database learns a new client protocol, the virtual database has to learn it too. Tungsten is an example of such a solution for MySQL.

A hybrid approach has a reflector plug-in to the RDBMS and a replicator running in its own process. That is pretty much the design of MySQL Cluster. It gets all information it needs from MySQL in an efficient manner.

Handling writes

In the course of the presentation you will learn that there is only one way of scaling writes in a distributed replicated database: partial replication. In a full (e.g. Galera) replication system the formula “more replicas, higher write throughput” is wrong. Every new replica adds load to the entire system as more update transactions can be started and each of them needs to be certified by all the others.

MySQL Cluster does partial replication: it is using partitioning/sharding. A write does not need to be certified by all other replicas but only by those holding partitions affected by the write. Thus, it can scale writes as well as it can scale reads. Also, multiple replicas can participate in the execution of a transaction and their computing power is combined. The big challenge with this approach is in executing those distributed queries in a way that network traffic is minimized. The slides give more insights.

If you have simple key-value style queries, short transactions and few joins, you should have a look at MySQL Cluster. MySQL Cluster is also well suited as a web session store. Thinking that Drupal or WordPress will run a magnitude faster on MySQL Cluster – oh, oh…

Happy hacking!

@Ulf_Wendel