Wonders of Global Transaction ID injection

2011/05/20 by admin

SQL injection is wonderful! MySQL Proxy can do it, mysqlnd plugins – even written in PHP (not Lua or C) – can do it. Global Transaction IDs are wonderful. A mashup of the PHP replication plugin and global transaction ID injection, makes your replication cluster fail-over much smoother and opens up an opportunity for an API to support consistent reads from slaves "immediately" after a write. Less hassle identifying and promoting a new master for fail-over, even better read load balancing – my last proposal for the future of the PHP replication plugin.

What?

Think of a global transaction ID as a unique identifier for a change set in a database cluster. Replicas in the cluster use the global transaction ID to track changes. Because global transaction IDs are unique cluster-wide, you can easily compare the replication progress among the replicas, in particular, if the global transaction ID contains a sequence number.

MySQL replication does not use cluster-wide global transaction IDs. Change sets are identified by a log file name and a log file position. Name and position are local to a machine. The tuple (file=001, position=1272) on slave A does not necessarily address the same change set from master B as the tuple (file=001, position=1272) on slave C also replicating from master B.

Use case: server fail-over

Let’s say you have MySQL replication cluster like A -> B -> C. A is the master. B is a first level slave, C is a second level slave. You write some data on the master. This creates a logical change set. The master records the change set in (file=001, position=100). B replicates it and puts the change set in (file=004, position=272). C reads the change set from B and remembers the position (file=004, position=272) to continue reading. B fails and you want C to continue reading changes from A. How do you find that B needs to continue reading from A at (file=001, position=100) ? It is not always an easy task…

One change set (INSERT), three identifiers (file, position)
Server A, master
file=001	position=100	INSERT …
Server B, first level slave
file=004	position=272	INSERT …
Server C, second level slave
file=002	position=161	INSERT …

Things would be much easier if there was a global transaction ID. The master (server A) would record a global_trx_id=18271 together with the change set. B would leave it untouched and add the global_trx_id=18271 to its change log. C would read the change set 18271 and remember the position. If B fails, C could continue reading changes from A immediately. C would know that it has to continue at 18271. This is the idea behind WorkLog #3584.

One change set (INSERT), one global transaction ID
Server A, master
file=001	position=100	global_trx_id=18271	INSERT INTO …
Server B, first level slave
file=004	position=272	global_trx_id=18271	INSERT …
Server C, second level slave
file=002	position=161	global_trx_id=18271	INSERT …

Jan shows in MySQL Proxy and a Global Transaction ID a wonderful, client-side, SQL injection based hack to maintain a global transaction ID. The PHP mysqlnd replication and load balancing plugin could learn to do the very same for you.

Create a MEMORY table which is replicated with a single UNSIGNED BIGINT and increment it at the end of each transaction.

CREATE TABLE trx (
trx_id BIGINT UNSIGNED NOT NULL
) ENGINE=memory;

INSERT INTO trx VALUES ( 0 );

When ever you commit a transaction UPDATE the trx_id field:

UPDATE trx SET trx_id = trx_id + 1

From: http://jan.kneschke.de/projects/mysql/mysql-proxy-and-a-global-transaction-id/

Dear MySQL database administrator, please, drop us a note, if you want this to be implemented in the PHP plugin! And now, to you, dear PHP application developer.

Use-case: consistent read from slaves

MySQL replication is asynchronous. Slaves can lag behind masters, slaves may not have replicated all writes performed on the masters. A read from a slave is eventual consistent. If you need consistent reads, you usually have to read from the master. The mysqlnd plugin has already a neat configuration option (master_on_write) to automatically read only from the master after the first write . However, this adds read load to the master and MySQL replication is about read scale out …

By help of a global transaction ID the client can even try to do a consistent read from (asynchronous) slaves after a write to the master. If the client knows the global transaction ID of the write, it can identify a slave for reading which has already replicated the write.


/* write to master */
$link->query("INSERT ...");
$global_trx_id = mysqlnd_ms_get_last_global_transaction_id();
  
/* 
  read from a slave which has replicated global transaction ID or, 
  if no slave found, use master for consistent read
*/
mysqlnd_ms_set_service_level($link, MIN_GLOBAL_TRX_ID, $global_trx_id);
$link->query("SELECT ...");
$link->query("SELECT ...");

Whether the plugins API should be build on top of the idea of a service level (pseudo-code above), the plugin should search a consistent slave before issuing the read or it is left to the application to try an optimistic read, is not nailed in stone. Blog comments are as welcome as are contributions to the wiki page https://wiki.php.net/pecl/mysqlnd_ms?&#raw_bin_ideas_rfcs…

Issues

Client-side SQL insertion is a hack. In an heterogenous environment, with clients of different kind, it is all to easy to forget to insert the necessary SQL.

	PHP 5.3+	PHP 5.2	MySQL promt
Global trx id insertion?	Automatic, Plugin	Manual	Manual

Which SQL exactly? Good question. Jan is giving an example with a MEMORY table in his blog post. A perfect choice to demo the idea, a perfect choice for a blog post! But good enough for production use? Giuseppe proposes hacking the hack – both the hack and his hack of the hack use different SQL than Jan does.

Choose yourself. My thinking is that the plugin should – at best – propose a generic approach in the documentation but otherwise provide hooks to let the application insert whatever SQL is best for the application.

End of brainstorming

All of the above is brainstorming. No promise it will ever materialize.

I don’t have any more "big" ideas or keywords in my notes for the PHP replication plugin. Its time for me to go back to the source editor for checking what Andrey did to prepare support for partitioned replication…