Ulf Wendel

Slides: MySQL 5.6 Global Transaction Identifier and PECL/mysqlnd_ms for failover

The long lasting MySQL replication failover issue is cured. MySQL 5.6 makes master failover easy, PECL/mysqlnd_ms assists with the client/connection failover. Compared to the past this is a significant step towards improving MySQL replication cluster availability, eleminating the need to use 3rd party tools in many cases. The slides illustrate the basic idea, as blogged about before.

There is not much to say about the feature as such. Slave to master promotion works without hassles, finally. Regardless if you do failover because of an error of the current master or switchover because you want to change the master server, its easy now. Congratulations to the replication team!

Limitations of the current server implementation

The global transaction identifier implementation in MySQL 5.6 has a couple of limitations, though. Its not hard to guess that mixing transactional and non-transactional updates in one transaction can cause problems. Its pretty much the first pitfall I ran into when trying to setup a MySQL 5.6.5-m8 (not 5.6.4-m8…) slave using a mysqldump generated SQL dump. MySQL bailed at me and stopped me from failing.

Let a master run a transaction which first updates an InnoDB table. followed by an update of a MyISAM table, followed by another update: t(UInnoDB, UMyISAM, UX). Let the binary log settings be so that this transaction is written as one to the binary log (binlog_format=statement, binlog_direct_non_transactional_updates=0). It is then copied “as is” to the relay log of a slave. Assume that the slave runs with different binary log settings so that t(UInnoDB, UMyISAM, UX) is split up to t(UMyISAM), t(UInnoDB[, ...])and logged as distinct transactions in the slaves binary log.

Worst case: conflicting binary log settings

Master Slave
GTID=M:1 t(UInnoDB, UMyISAM, UX) GTID=M:1 t(UMyISAM)
GTID=M:1 t(UInnoDB[, ...])

Because slaves must preserve global transaction identifiers they got from their master, the two resulting transactions are given the same identifier. The transaction identifier in the slaves binary log is no longer unique, it now refers to two transactions not just one (issue #1). Any slave that would read from the binary log of the above slave may loose the InnoDB transaction because it may refuse to execute a transaction using an id that has been executed already (issue #2).

The workaround? Don’t mix InnoDB and MyISAM updates in one transaction. To me it does not sound that much of an issue in 2012. Please note, I’m describing my experience with MySQL 5.6.5-m8, which is a development version.

The load balancer update

MySQL Replication takes a primary copy approach to replication. A primary/master handles all the updates. Read-only replicas/slaves replicate from the primary. The primary is a single point of failure.

Writes Primary/master
Reads Slave   Slave

The failure of a slave is unproblematic. A client usually has plenty of other slaves to start reading from. If no slave is available, reads can even be forwarded to the master.

PHP
Load Balancer, e.g PECL/mysqlnd_ms

Read
Slave   Slave

All a PECL/mysqlnd_ms user has to do is check for an error after statement execution. If there’s one and the error code hints that the server has gone away, the user reruns the statement. The connection handle remains useable all the time. Upon rerun, PECL/mysqlnd_ms openes a new connection to another slave.

do {
  $res = $mysql->query("SELECT id, title FROM news");
} while (isset($connection_error_codes[$mysql->errno]));

if (!$res) {
  bail("SQL error", $mysql->errno, $mysql->error);
}

A master failure is much more problematic. There is no server to send a write to but the master. The master is a single point of failure. The new global transaction identifier help to reduce the time it takes to put a new master in place after a failure.

PHP
Load Balancer, e.g PECL/mysqlnd_ms

Write
Master

After a master failure some process needs to promote a former slave to the new master and, preferrable atomically, update all other slaves to start replicating from the new master. The below illustration is a bit confusing. It is intentionally. What happens during the simple to say “slave to master promotion” is a complete restructuring of the cluster.

Writes   Primary/master (gone)
Read Write Slave Master (promoted former slave)
Read Read Slave (no change)

Don’t forget to update the load balancer

After the cluster has been reorganized, the load balancer configurations must be updated. PECL/mysqlnd_ms happens to be a driver integrated load balancer. However, other than that, it is not different from a classical load balancer. Whatever process restructures the cluster it must take care of deploying the load balancer configurations afterwards.

Global transaction identifiers are a great help for the biggest part of the failover job – the server side. But, they are no swiss army knife. Don’t forget to update your load balancer configuration – the client side. No matter where it is. Whether it is part of the application code, driver integrated or you are using MySQL Proxy. As long as we are talking primary copy, a master failure will always be a major pain.

Happy hacking!

@Ulf_Wendel Follow me on Twitter

One Comment