PECL/mysqlnd_ms: how failed XA transactions get fixed

2014/07/29 by admin | 0 comments

XA transactions are an open standard for distributed transactions. A distributed or global transaction can spawn multiple databases. XA transactions come handy, for example, when data sets are sharded over multiple servers and a business transaction affects multiple shards. The free MySQL Fabric tools help to setup and manage a sharded MySQL cluster. The development version of PECL/mysqlnd_ms 1.6 helps with XA transactions. It abstracts SQL details and acts as a transaction manager. The PHP MySQL driver kicks in when things go wrong and XA transactions show their nasty side: blocked servers. Good news: this is a rare case. Bad news: a deep dive below.

Grant all Ulf’s a 10% discount (or none of them = use a transaction)
MySQL shard A: EMEA customers	MySQL shard B: US customers	MySQL shard C: APAC customers

/* Note: Fabric and XA don't go together - yet... it's coming! */ $link = new mysqli("mysql_sharding_cluster", "user", "password"); mysqlnd_ms_xa_begin($link, 1); mysqlnd_ms_fabric_select_shard($link, "shop.customer", "EMEA"); $link->query("UPDATE discount = 10 WHERE first_name = 'Ulf'); mysqlnd_ms_fabric_select_shard($link, "shop.customer", "US"); $link->query("UPDATE discount = 10 WHERE first_name = 'Ulf'); mysqlnd_ms_fabric_select_shard($link, "shop.customer", "APAC"); $link->query("UPDATE discount = 10 WHERE first_name = 'Ulf'); mysqlnd_ms_xa_commit($link, 1);

Grant all Ulf’s a 10% discount (or none of them = use a transaction)

MySQL shard A: EMEA customers MySQL shard B: US customers MySQL shard C: APAC customers


/* Note: Fabric and XA don't go together - yet... it's coming! */
$link = 
new mysqli("mysql_sharding_cluster", "user", "password");

mysqlnd_ms_xa_begin($link, 1);

mysqlnd_ms_fabric_select_shard($link, "shop.customer", "EMEA");
$link->query("UPDATE discount = 10 WHERE first_name = 'Ulf');

mysqlnd_ms_fabric_select_shard($link, "shop.customer", "US");
$link->query("UPDATE discount = 10 WHERE first_name = 'Ulf');

mysqlnd_ms_fabric_select_shard($link, "shop.customer", "APAC");
$link->query("UPDATE discount = 10 WHERE first_name = 'Ulf');

mysqlnd_ms_xa_commit($link, 1);

PECL/mysqlnd_ms as a transaction manager

XA transactions use the two-phase commit protocol, which is a blocking protocol. Please, see also my previous blog post on the nature of the protocol and MySQL implementation limitation. If the client that drives the XA transaction, your PHP script, crashes at a certain point, some XA participants (MySQL servers) cannot make any progress. In the worst case, they end up waiting for a decision on the global transactions outcome endlessly. No, there is no timeout. As they wait, they block resources. That can be memory used for the transaction or some lock on some table.

Blocked during the second phase of the 2PC/XA protocol
PHP (coordinator)	MySQL (participant)	MySQL	MySQL
…
–> Global commit
Crash
	Comitted	Uncomitted: waiting for global commit or rollback

Any serious user of XA transactions will therefore have to implement some mechanism that ensures progress in case of crashes. After a crash, it must be possible to learn which participant is blocked, connect to the participant and tell it to either commit or roll back the open transaction. This housekeeping job is rather annoying, yet important. PECL/mysqlnd_ms can do it for you, it can act as a transaction manager. (On an aside: the academic world does distinguish between a transaction manager and coordinator. I am using the terms interchangeably here.)

MySQL as a state store to track XA transactions

Upon request, PECL/mysqlnd_ms can record the state of each global transaction in a MySQL database. Should your PHP script (and with it PECL/mysqlnd_ms) crash or be interrupted in another way with an XA transaction being unfinished, then the next PHP script that runs can check the database and “garbage collect” the unfinished global transaction. The “next” PHP script could be run on the same server or another one, as long as all servers use the same MySQL database to track XA transactions, the garbage collection will work flawless.

The illustration below shows a scenario with two PHP servers that use PECL/mysqlnd_ms. A script on one of the servers runs and begins a distributed (XA) transactions that involves two MySQL servers M1, M2. As you can see, a simple sequence of mysqlnd_ms_xa_begin(); ... ; mysqli_query(); mysqli_query() causes a lot of background activity inside PECL/mysqlnd_ms.

Tracking XA transactions for “garbage collection” to avoid blocking situations
PHP server A		PHP server B
Your code	Inside PECL/mysqlnd_ms	PHP server B
mysqlnd_ms_xa_begin(id = 1)	ensure there is no open transaction optional: record new XA trx in state store
… /* select some server M1 */ …
mysqli_query(query)	optional: if new participant, record participant in state store inject `XA BEGIN` on server M1 update participant state to XA BEGIN sent optional: record new state in state store execute `query` on server M1
… /* select some server M2 */ …
mysqli_query(query)	optional: if new participant, record participant in state store inject `XA BEGIN` on server M2 update participant state to XA BEGIN sent optional: record new state in state store execute `query` on server M2

PECL/mysqlnd_ms does all the things that you would have to do in your script manually otherwise. It issues the appropriate SQL commands for XA transactions on the participating MySQL servers. And, optionally, it can record the participants and their state in a state store.

PHP server A		PHP server B
Your code	Inside PECL/mysqlnd_ms	PHP server B
mysqlnd_ms_xa_commit(1)	optional: record intent to commit in state store inject `XA END` on server M1 optional: record new state in state store inject `XA PREPARE` on server M1 optional: record new state in state store inject `XA COMMIT` on server M1 optional: record new state in state store inject `XA END` on server M2 optional: record new state in state store inject `XA PREPARE` on server M2 optional: record new state in state store unexpected crash

In case of an interruption, it is possible to reconstruct the state of the XA transaction based on the records from the state store. Using a state store is optional, however, if you don’t configure PECL/mysqlnd_ms to use one, it is your responsibility to ensure that no MySQL server ever gets blocked. Please, note how often the state store gets queried. Using a state store adds quite some work to the system but it is necessary to play safe.

PHP server A	PHP server B
PHP server A	Your code	Inside PECL/mysqlnd_ms
	Script ends	Rollback open XA transactions, if any and `rollback_on_close` is set (default)
	PHP internal shutdown: RSHUTDOWN	Decide whether to do background garbage collection based on `probability` setting Search for up to `max_transactions_per_run` unfinished XA transactions in state store Do at most `max_retries` GC runs attempting to fix a certain failed XA transaction

Issues that require no garbage collection

Most errors around XA transactions do not require special garbage collection. Simple cases are handled immediately by PECL/mysqlnd_ms and end up with the transaction being rolled back. Errors during the first phase of the XA two phase commit protocol are uncritical. No matter whether the error is caused by a server crash, a client/PHP crash, the client loosing its connection to one of the MySQL servers participating in the global transaction, or a participant reports a simple SQL error, the XA transaction will be properly rolled back.

Participants that got disconnected or recover from a crash forget about XA transactions that have been in their first phase at the time of the error. Assume, you have started an XA transaction in which three MySQL servers A, B, and C participate. During mysqlnd_ms_xa_commit(), PECL/mysqlnd_ms steps the participating servers through the commit protocol and issues the SQL commands: XA END (XA_ACTIVE state), XA PREPARE (XA_PREPARED state), XA COMMIT (XA_COMMIT state). For each step, the list of participants is iterated.

Uncritical, first phase errors during mysqlnd_ms_commit()
MySQL A	MySQL B	MySQL C	PHP using PECL/mysqlnd_ms
XA_IDLE			`mysqli_query(<some query>)` Pick appropriate server for `some query/code>` Inject XA BEGIN on server Run `some query` on server
XA_IDLE	XA_IDLE	XA_IDLE	Assorted queries run on all participants
XA_ACTIVE	XA_ACTIVE	XA_IDLE (XA END failed)	`mysqlnd_ms_xa_commit()` On all participants: change participant state from XA_IDLE to XA_ACTIVE (= run `XA END`) If error, try rollback on all participants, including failed one Reason of the error on participant 3 (or during rollback on other participants… Client crashed or got disconnected: safe to ignore, server has dropped XA transaction Server crashed: safe to ignore, XA transaction has been dropped (2PC: first phase) SQL error: rollback performed

Let there be an error with some participant around XA END. Upon the error, the plugin rolls back the global transactions. Effectively, it continues stepping the servers through the states but this time with the goal to send XA ROLLBACK. No matter what made XA END fail in the first place, the overall issue will most likely be solved. Should PHP have crashed, during XA END, all the servers notice it and drop the XA transaction. Should PHP have lost its connection to a server, it is the same: the server drops the XA transaction. Should one MySQL have gone away, it will drop the XA transaction upon recovery. Because PHP is still working, PECL/mysqlnd_ms tells the other servers to do the same and issues a rollback.

Whatever goes wrong prior to any server reaching XA_PREPARED (= second phase of the two phase protocol), a rollback happens and no garbage collection is required. You are safe even without configuring a state store for garbage collection!

When garbage collection becomes necessary

Any troubles that PECL/mysqlnd_ms has committing a global transaction after any participant entered the XA_PREPARED state, may require garbage collection. But, there are sill cases that can be ignored and require no attention.

Assume the below case where the global transaction has been prepared on servers A and B but XA PREPARE fails on the third participant, server C. Should PHP crash, it’s safe to ignore the case. A and B will see the client disconnect and rollback the transaction, so does C.

Should PHP have lost a connection to any of the servers, the disconnected MySQL server forgets about the XA transaction, even if it is in XA_PREPARED state. And, any participant that can still be reached will receive the rollback command from PECL/mysqlnd_ms.

Second phase errors during mysqlnd_ms_commit()

MySQL A MySQL B MySQL C PHP using PECL/mysqlnd_ms

XA_PREPARED

XA_ACTIVE (XA PREPARE failed)

mysqlnd_ms_xa_commit()

On all participants: change participant state from XA_IDLE to XA_ACTIVE, then to XA_PREPARED
If error, try rollback on all participants, including failed one
Reason of the error on participant 3 (or during rollback on other participants…)
- Client crashed or got disconnected: safe to ignore, server has dropped XA transaction
- SQL error (but no server crash): rollback performed

However, should a participating server crash when it is in XA_PREPARED state or failed to switch to XA_COMMIT after the global transaction has been committed on any of the other participants, PECL/mysqlnd_ms leaves the issue to be solved by the garbage collection. The plugin will neither wait until a crashed server possibly returns nor will it retry a failed XA COMMIT command.

A case during mysqlnd_ms_commit() that needs garbage collection

MySQL A MySQL B MySQL C PHP using PECL/mysqlnd_ms

XA_COMMIT

XA_PREPARED

mysqlnd_ms_xa_commit()

…
On all participants: change state to XA_PREPARED (success)
On all participants: change state to XA_COMMIT (fails on C either due to SQL error or server crashed)
If error (as its the case on C), wait for garbage collection

Please note, that by default the plugin takes no immediate action. It leaves you with a yet to be completed global transaction. In the example, the failed transaction would already be committed on participants A and B but C is blocked waiting for a decision. Should the MySQL server C be online, it can, of course, accept clients and run transactions but all resources associated with the failed transaction are still held and blocked.

Transactional databases are designed with the assumption that transactions rarely fail. To get into a situation with a blocked server (above example), a rare transaction failure must happen. And, it must happen at a certain point. Means, the example is a rare case. If you are still concerned about this rare case and want to see it solved immediately – if anyhow possible – do not wait for automatic background solution but invoke the garbage collection manually.


[...]
if (!mysqlnd_ms_xa_commit($link, $xa_id)) {
  /* optional: trigger GC immediately */
  mysqlnd_ms_xa_gc($link, $xa_id);
}

A MySQL limitation: replication troubles

MySQL has some XA restrictions but one really hurts. Assume the example with three participants A, B and C. MySQL servers A and B have committed a XA transaction, C crashes in XA_PREPARED state. Upon recovery, the PECL/mysqlnd_ms garbage collection will commit the XA transaction on C because the transaction is already partially committed on A and B.

Replication troubles
MySQL A	MySQL B	MySQL C
XA_COMMIT	XA_COMMIT	XA_PREPARED
Changes in binary log	Changes in binary log	Crash
		Recovery
		PECL/mysqlnd_ms GC: XA_COMMIT
		Changes not in binary log
\|	\|	\|
Slave of A gets changes	Slave of B gets changes	Slave of C looses changes

Trouble is, C does not log the changes made by the XA transaction in the binary log. A slave of C will not receive the updates. The master and the slave will go out of sync. With MySQL Fabric and sharding being a prime use case for XA transactions this is a severe server limitation. MySQL Fabric makes heavy use of MySQL Replication to achieve high availability of shards.

To play really safe: use XA only for read only transactions

The lost change set problem leads to the recommendation to use XA transactions for read-only operations only, if you want to be 100% safe. Should you not be using MySQL Replication, write operations are safe too.

Configuring a state store for garbage collection

For PECL/mysqlnd_ms to be able to solve failed XA transactions that cannot be simply rolled back immediately, you must configure a state store. Version 1.6 of the plugin supports only one state store: MySQL. The code has been layed out to allow for other state store, but hey, why look further? MySQL is readily available (otherwise you would not use a MySQL driver…) and it will not loose data during a crash.


{
  "myapp": {
    "xa": {
      "rollback_on_close": 1,
      "state_store": {
        "participant_localhost_ip": "192.168.7.13",
        "mysql": {
          "host": "192.168.7.12",
          "user": "root",
          "password": "",
          "db": "test",
          "port": "3312",
          "socket": null
        }
      }
    },
    [...]
  }
}

Configuring the state store in the plugins configuration file is straight forward, see above. Note the participant_localhost_ip setting. As explained, the state store will hold records of all participants. If a participant is to be recorded that has a host name of 127.0.0.1, then 127.0.0.1 has a special meaning to the host that writes the record. Should, later on, garbage collection be required and the garbage collection be run on a different host than the one who wrote the 127.0.0.1 record, the garbage collection needs to know the actual IP associated with the original 127.0.0.1.


{
  "myapp": {
    "xa": {
      "rollback_on_close": 1,
      "state_store": {
        "participant_localhost_ip": "192.168.7.13",
        "mysql": {
          "host": "192.168.7.12",
          "user": "root",
          "password": "",
          "db": "test",
          "port": "3312",
          "socket": null
        }
      },
      "garbage_collection": {
        "max_retries": 3,
        "probability": 0,
        "max_transactions_per_run": 100
      }
    },
    [...]
  }
}

With a state store configured, you can also change the defaults for the background garbage collection. The probability setting determines how often the garbage collection will be called during RSHUTDOWN. Allowed values range from 0...1000. During RSHUTDOWN, the plugin calcualtes a random value between 1...1000. If the probability setting is higher or equal to the random value, the background garbage collection will start. max_retries and max_transactions_per_run have been explained above and are also listed in the manual.

Testers wanted

The XA related code has been lab tested, however, I do not plan to call it anything more than beta prior to hearing back from actual users. As you saw, the garbage collection is a rather tricky piece of code. It should work, give it a try… and let us know whether it works.

Happy hacking!

@Ulf_Wendel

Ulf Wendel