Why do we have to bother about built-in GTID support in MySQL 5.6 at all? Sure, it is a tremendous step forward for a lazy primary copy system like MySQL Replication. Period. GTIDs make server-side failover easier (slides). And, load balancer, including PECL/mysqlnd_ms as an example of a driver integrated load balancer, can use them to provide session consistency. Please, see the slides. But…
… the primary remains a single point of failure. GTIDs can be described as cluster-wide transaction counters generated on the master. In case of a master failure, the slave that has replicated the highest transaction counter shall be promoted to become the master. Its the most current slave. Failover made easy – no doubt! Adequately deployed, you should reach very reasonable availability.
Know the limits of replicated systems
A multi-master (update anywhere) design does not have a single point of failure. But among the biggest is scaling a multi-master solution. Jim Gray and Pat Helland concluded 1996 in "The Dangers of Replication and a Solution": Update anywhere-anytime-anyway transactional replication has unstable behavior as the workload scales up: a ten-fold increase in nodes and traffic gives a thousand fold increase in deadlocks or reconciliations.. N^3 – buuuhhhh, anything worse than linear scale is not appreciated. Guess what: Microsoft SQL Azure is using primary copy combined with partitioning.
In practice things are not that bad, particulary not for a small number of nodes and recent algorithms. For example, MySQL Cluster (related webinar on March 29) is a true multi-master solution – even eager/synchronous. To overcome the write-scale limitations it has built-in partitioning (sharding). The two classical scale-out solutions – replication and partitioning – are combined in one product. If you want extreme performance and are ready to pay for the costs of partitioning… try it.
Anything to learn from the NoSQL kids on the block?
Some other kids offer relaxed eventual consistency just as MySQL Replication does. Sometimes the CAP theorem is cited as an excuse for it . Some leave conflict resolution, even conflict detection to the application developer . A massively scalabale, high available, synchronous update anywhere solution with built-in conflict resolution – the big thing we all dream of – is hard to create.
In the meanwhile… – maybe custer-aware APIs
While we all wait for the one-fits-all solution, there is something we can do. We can start to tell our load balancers precisely what we need and request no higher level of service than needed. Consistency – as in CAP – is one aspect of service quality. We should start to have cluster-aware APIs abstracting the details of replication architectures. Then, our load balancers, including PECL/mysqlnd_ms can hide everything that makes working with a cluster complicated (connection pooling, request splitting and redirection, failover, node selection, load distribution, …). Also, vendors can start to play with consistency to improve performance without messing up application logic.
Below is how you use the PECL/mysqlnd_ms 1.2+ function mysqlnd_ms_set_qos()
to switch between eventual consistency (stale data allowed) and session concistency (read-your-writes). MySQL Replication details hidden behind a function call.
$mysqli = new mysqli("myapp", "username", "password", "database");
if (!$mysqli)
/* Of course, your error handling is nicer... */
die(sprintf("[%d] %s\n", mysqli_connect_errno(), mysqli_connect_error()));
/* read-write splitting: master used */
if (!$mysqli->query("INSERT INTO orders(order_id, item) VALUES (1, 'christmas tree, 1.8m')")) {
/* Please use better error handling in your code */
die(sprintf("[%d] %s\n", $mysqli->errno, $mysqli->error));
}
/* Request session consistency: read your writes */
if (!mysqlnd_ms_set_qos($mysqli, MYSQLND_MS_QOS_CONSISTENCY_SESSION))
die(sprintf("[%d] %s\n", $mysqli->errno, $mysqli->error));
/* Plugin picks a node which has the changes, here: master */
if (!$res = $mysqli->query("SELECT item FROM orders WHERE order_id = 1"))
die(sprintf("[%d] %s\n", $mysqli->errno, $mysqli->error));
var_dump($res->fetch_assoc());
/* Back to eventual consistency: stale data allowed */
if (!mysqlnd_ms_set_qos($mysqli, MYSQLND_MS_QOS_CONSISTENCY_EVENTUAL))
die(sprintf("[%d] %s\n", $mysqli->errno, $mysqli->error));
/* Plugin picks any slave, stale data is allowed */
if (!$res = $mysqli->query("SELECT item, price FROM specials"))
die(sprintf("[%d] %s\n", $mysqli->errno, $mysqli->error));
GTID for clients? Buzz alarm!
PECL/mysqlnd_ms 1.3 does not bring any ground breaking changes with regards to consistency or GTIDs. It can now either use the driver built-in GTID emulation (1.2+) or the server-side GTID feature (1.3+, MySQL 5.6) for session consistency. That’s all. I confess, the slide title is pure buzz. But in every tale is some truth.
Cluster-aware APIs and better load balancer? Follow up!
I’m convinced that good load balancers can make application developers life much easier. Read-your-writes and session consistency is an example how new API calls may come handy. Transparently replacing remote slave accesses with client-side cache accesses (coming with 1.3) is an example how load balancers can optimize overall cluster performance.
Whoever designs a replication solution in 2012 should include the load balancer into his considerations… – even for multi-master.
Happy hacking!