Ulf Wendel

Slides: MySQL 5.6 Global Transaction Identifier and PECL/mysqlnd_ms for failover

The long lasting MySQL replication failover issue is cured. MySQL 5.6 makes master failover easy, PECL/mysqlnd_ms assists with the client/connection failover. Compared to the past this is a significant step towards improving MySQL replication cluster availability, eleminating the need to use 3rd party tools in many cases. The slides illustrate the basic idea, as blogged about before.

There is not much to say about the feature as such. Slave to master promotion works without hassles, finally. Regardless if you do failover because of an error of the current master or switchover because you want to change the master server, its easy now. Congratulations to the replication team!

Limitations of the current server implementation

The global transaction identifier implementation in MySQL 5.6 has a couple of limitations, though. Its not hard to guess that mixing transactional and non-transactional updates in one transaction can cause problems. Its pretty much the first pitfall I ran into when trying to setup a MySQL 5.6.5-m8 (not 5.6.4-m8…) slave using a mysqldump generated SQL dump. MySQL bailed at me and stopped me from failing.

Let a master run a transaction which first updates an InnoDB table. followed by an update of a MyISAM table, followed by another update: t(UInnoDB, UMyISAM, UX). Let the binary log settings be so that this transaction is written as one to the binary log (binlog_format=statement, binlog_direct_non_transactional_updates=0). It is then copied “as is” to the relay log of a slave. Assume that the slave runs with different binary log settings so that t(UInnoDB, UMyISAM, UX) is split up to t(UMyISAM), t(UInnoDB[, ...])and logged as distinct transactions in the slaves binary log.

Worst case: conflicting binary log settings
Master Slave
GTID=M:1 t(UInnoDB, UMyISAM, UX) GTID=M:1 t(UMyISAM)
GTID=M:1 t(UInnoDB[, ...])

Because slaves must preserve global transaction identifiers they got from their master, the two resulting transactions are given the same identifier. The transaction identifier in the slaves binary log is no longer unique, it now refers to two transactions not just one (issue #1). Any slave that would read from the binary log of the above slave may loose the InnoDB transaction because it may refuse to execute a transaction using an id that has been executed already (issue #2).

The workaround? Don’t mix InnoDB and MyISAM updates in one transaction. To me it does not sound that much of an issue in 2012. Please note, I’m describing my experience with MySQL 5.6.5-m8, which is a development version.

The load balancer update

MySQL Replication takes a primary copy approach to replication. A primary/master handles all the updates. Read-only replicas/slaves replicate from the primary. The primary is a single point of failure.

Writes Primary/master
Reads Slave   Slave

The failure of a slave is unproblematic. A client usually has plenty of other slaves to start reading from. If no slave is available, reads can even be forwarded to the master.

PHP
Load Balancer, e.g PECL/mysqlnd_ms
Read
Slave   Slave

All a PECL/mysqlnd_ms user has to do is check for an error after statement execution. If there’s one and the error code hints that the server has gone away, the user reruns the statement. The connection handle remains useable all the time. Upon rerun, PECL/mysqlnd_ms openes a new connection to another slave.

do {
  $res = $mysql->query("SELECT id, title FROM news");
} while (isset($connection_error_codes[$mysql->errno]));

if (!$res) {
  bail("SQL error", $mysql->errno, $mysql->error);
}

A master failure is much more problematic. There is no server to send a write to but the master. The master is a single point of failure. The new global transaction identifier help to reduce the time it takes to put a new master in place after a failure.

PHP
Load Balancer, e.g PECL/mysqlnd_ms
Write
Master

After a master failure some process needs to promote a former slave to the new master and, preferrable atomically, update all other slaves to start replicating from the new master. The below illustration is a bit confusing. It is intentionally. What happens during the simple to say “slave to master promotion” is a complete restructuring of the cluster.

Writes   Primary/master (gone)
Read Write Slave Master (promoted former slave)
Read Read Slave (no change)

Don’t forget to update the load balancer

After the cluster has been reorganized, the load balancer configurations must be updated. PECL/mysqlnd_ms happens to be a driver integrated load balancer. However, other than that, it is not different from a classical load balancer. Whatever process restructures the cluster it must take care of deploying the load balancer configurations afterwards.

Global transaction identifiers are a great help for the biggest part of the failover job – the server side. But, they are no swiss army knife. Don’t forget to update your load balancer configuration – the client side. No matter where it is. Whether it is part of the application code, driver integrated or you are using MySQL Proxy. As long as we are talking primary copy, a master failure will always be a major pain.

Happy hacking!

@Ulf_Wendel Follow me on Twitter

2 Comments

  1. How can you mix a MyISAM operation into a transaction?

  2. Henrik,

    that is a very good question! My answer is by try, not by insight knowledge.

    First, you found me using the term “transaction” in an ambitious way. In the flow of my blog post I use it in the way a user would do. In the description of the limitation the view of an implementor has been taken.

    This is my master. Its build from source (Launchpad) couple of days ago.

    mysql> SELECT VERSION();
    +————–+
    | VERSION() |
    +————–+
    | 5.6.5-m8-log |
    +————–+

    I should be running the master using default settings with the exception of:

    gtid-mode=on
    log-slave-updates=true
    disable-gtid-unsafe-statements=true

    In autocommit mode, I create an InnoDB table and a MyISAM table:

    mysql> USE test;
    mysql> CREATE TABLE t_myisam(id INT) ENGINE=myisam;
    mysql> CREATE TABLE t_innodb(id INT) ENGINE=innodb;

    I now leave the autocommit mode, start a transaction and try to update the MyISAM table.

    mysql> START TRANSACTION;
    mysql> INSERT INTO t_myisam(id) VALUES (1);
    ERROR 1784 (HY000): Updates to non-transactional tables are forbidden when DISABLE_GTID_UNSAFE_STATEMENTS = 1.

    I have provoked the situation refered to the in the blog post. MySQL 5.6 bails at me to prevent me from breaking replication.

    Let’s recap what I’m doing: within a “transaction” I attempt an update to a non-transactional table. How can we talk about transactions if there are statements that cannot be rolled back? Well, think of DDL and stuff… – that’s life with MySQL.

    To give more examples it is not sufficient to disable disable-gtid-unsafe-statements=true . Let’s see what happens if I try…

    gtid-mode=on
    log-slave-updates=true
    disable-gtid-unsafe-statements=false

    … MySQL really wants to make sure I don’t fool myself. MySQL 5.6 won’t start.

    120313 16:54:19 [ERROR] –gtid-mode=ON or UPGRADE_STEP_1 requires –disable-gtid-unsafe-statements
    120313 16:54:19 [ERROR] Aborting

    To continue with the binary log issue, I have to turn off the GTID feature on the master. Both master and my slave will bail at me that I’m breaking replication. Who cares… I’m after the binary log entry, nothing else.

    I log in to MySQL 5.6 to run my transaction which contians non-transactional table updates.

    mysql> START TRANSACTION;
    Query OK, 0 rows affected (0.00 sec)

    mysql> INSERT INTO t_myisam(id) VALUES (1);
    Query OK, 1 row affected (0.00 sec)

    mysql> INSERT INTO t_innodb(id) VALUES (1);
    Query OK, 1 row affected (0.00 sec)

    mysql> COMMIT;
    Query OK, 0 rows affected (0.00 sec)

    MySQL is happy. Let’s see what my binary log says:

    #120313 17:04:11 server id 111 end_log_pos 192 Query thread_id=2 exec_time=0 error_code=0
    SET TIMESTAMP=1331654651/*!*/;
    SET @@session.pseudo_thread_id=2/*!*/;
    SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
    SET @@session.sql_mode=0/*!*/;
    SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
    /*!\C utf8 *//*!*/;
    SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=8/*!*/;
    SET @@session.lc_time_names=0/*!*/;
    SET @@session.collation_database=DEFAULT/*!*/;
    BEGIN
    /*!*/;
    # at 192
    #120313 17:04:11 server id 111 end_log_pos 297 Query thread_id=2 exec_time=0 error_code=0
    use test/*!*/;
    SET TIMESTAMP=1331654651/*!*/;
    INSERT INTO t_myisam(id) VALUES (1)
    /*!*/;
    # at 297
    #120313 17:04:11 server id 111 end_log_pos 373 Query thread_id=2 exec_time=0 error_code=0
    SET TIMESTAMP=1331654651/*!*/;
    COMMIT
    /*!*/;
    # at 373
    #120313 17:04:26 server id 111 end_log_pos 448 Query thread_id=2 exec_time=0 error_code=0
    SET TIMESTAMP=1331654666/*!*/;
    BEGIN
    /*!*/;
    # at 448
    #120313 17:04:26 server id 111 end_log_pos 553 Query thread_id=2 exec_time=0 error_code=0
    SET TIMESTAMP=1331654666/*!*/;
    INSERT INTO t_innodb(id) VALUES (1)
    /*!*/;
    # at 553
    #120313 17:04:28 server id 111 end_log_pos 580 Xid = 26
    COMMIT/*!*/;
    DELIMITER ;
    # End of log file

    Assume this would make it into a slaves relay log. Furthermore, assume this “transaction” had a GTID. Both are purely hypothetical assumptions. MySQL forbids me getting here. Recall this:

    mysql> START TRANSACTION;
    mysql> INSERT INTO t_myisam(id) VALUES (1);
    ERROR 1784 (HY000): Updates to non-transactional tables are forbidden when DISABLE_GTID_UNSAFE_STATEMENTS = 1.

    However, its about explaining why MySQL forbids it. From here on you really need to ask a MySQL Replication expert not me. This is what I see and how I understand the GTID feature. I can’t continue using MySQL 5.6 here because I need binlog_direct_non_transactional_updated=1 about which the MySQL 5.6 reference manual reads: “In MySQL 5.6, this variable has no effect when the binary log format is ROW or MIXED. (Bug #51291)”, http://dev.mysql.com/doc/refman/5.6/en/replication-options-binary-log.html#sysvar_binlog_direct_non_transactional_updates

    As all this is already purely hypothetical (you can’t get here unless you hack), I dare to show a binary log excerpt from MySQL 5.5.16 . I set:

    mysql> SET GLOBAL binlog_direct_non_transactional_updated=1;
    mysql> SET SESSION binlog_direct_non_transactional_updated=1;

    .oO( Better set both, SESSION should be sufficient )

    Then, I run my usual transaction which contains updates to transactional and non-transactional tables.

    mysql> START TRANSACTION;
    mysql> INSERT INTO t_myisam(id) VALUES (1);
    mysql> INSERT INTO t_innodb(id) VALUES (1);
    mysql> COMMIT;

    And, here you go with the binary log. This binary log is quite remarable. What looked like one transaction to the client has been recorded as two. The original transaction contained updates to non-transactional tables, so, rollback…

    # at 127609
    #120313 17:34:14 server id 1 end_log_pos 127677 Query thread_id=730 exec_time=0 error_code=0
    SET TIMESTAMP=1331656454/*!*/;
    BEGIN
    /*!*/;
    # at 127677
    #120313 17:34:14 server id 1 end_log_pos 127774 Query thread_id=730 exec_time=0 error_code=0
    SET TIMESTAMP=1331656454/*!*/;
    INSERT INTO t_myisam(id) VALUES(2)
    /*!*/;
    # at 127774
    #120313 17:34:14 server id 1 end_log_pos 127843 Query thread_id=730 exec_time=0 error_code=0
    SET TIMESTAMP=1331656454/*!*/;
    COMMIT
    /*!*/;
    # at 127843
    #120313 17:34:25 server id 1 end_log_pos 127911 Query thread_id=730 exec_time=0 error_code=0
    SET TIMESTAMP=1331656465/*!*/;
    BEGIN
    /*!*/;
    # at 127911
    #120313 17:34:23 server id 1 end_log_pos 128008 Query thread_id=730 exec_time=0 error_code=0
    SET TIMESTAMP=1331656463/*!*/;
    INSERT INTO t_innodb(id) VALUES(2)
    /*!*/;
    # at 128008
    #120313 17:34:25 server id 1 end_log_pos 128035 Xid = 3588
    COMMIT/*!*/;
    DELIMITER ;
    # End of log file

    After all, what do we see here? We see that MySQL 5.6 has some safe guards that prevent you from fooling yourself if using GTIDs and t(U[InnoDB], U[MyISAM], U[X]) becomes t(U[MyISAM]), t(U[InnoDB], U[X]) you end up with two transactions using the same GTID…

    Before you ask about reordering:

    mysql> START TRANSACTION;
    mysql> INSERT INTO t_innodb(id) VALUES (3);
    mysql> INSERT INTO t_myisam(id) VALUES (2);
    mysql> INSERT INTO t_innodb(id) VALUES (4);
    mysql> COMMIT;

    Here’s the binary log:

    # at 128103
    #120313 17:56:18 server id 1 end_log_pos 128200 Query thread_id=732 exec_time=0 error_code=0
    SET TIMESTAMP=1331657778/*!*/;
    INSERT INTO t_myisam(id) VALUES(2)
    /*!*/;
    # at 128200
    #120313 17:56:18 server id 1 end_log_pos 128269 Query thread_id=732 exec_time=0 error_code=0
    SET TIMESTAMP=1331657778/*!*/;
    COMMIT
    /*!*/;
    # at 128269
    #120313 17:56:27 server id 1 end_log_pos 128337 Query thread_id=732 exec_time=0 error_code=0
    SET TIMESTAMP=1331657787/*!*/;
    BEGIN
    /*!*/;
    # at 128337
    #120313 17:56:12 server id 1 end_log_pos 128434 Query thread_id=732 exec_time=0 error_code=0
    SET TIMESTAMP=1331657772/*!*/;
    INSERT INTO t_innodb(id) VALUES(3)
    /*!*/;
    # at 128434
    #120313 17:56:24 server id 1 end_log_pos 128531 Query thread_id=732 exec_time=0 error_code=0
    SET TIMESTAMP=1331657784/*!*/;
    INSERT INTO t_innodb(id) VALUES(4)
    /*!*/;
    # at 128531
    #120313 17:56:27 server id 1 end_log_pos 128558 Xid = 3600
    COMMIT/*!*/;
    DELIMITER ;
    # End of log file

    Or, in short:

    INSERT INTO t_myisam(id) VALUES(2)
    INSERT INTO t_innodb(id) VALUES(3)
    INSERT INTO t_innodb(id) VALUES(4)

    … happy hacking seems to be an appropriate greeting!