Ulf Wendel

PECL/mysqlnd_ms: summer time, (connection) pool time – Fabric support internals

The MySQL replication and load balancing plugin for PHP, PECL/mysqlnd_ms, aims to make using a cluster of MySQL servers instead of a single server as transparent as possible. Should you ever want to migrate your PHP app from a single MySQL server to any kind of MySQL cluster, install PECL/mysqlnd_ms and see how much breaks. A simple app, might need no code changes at all. Your APIs connect call remains unchanged, your query execution calls remain the same: one connection handle transparently switched to an appropriate cluster node (read-write splitting, weighted load balancing, …).

/* Or, use PDO_MySQL... One user handle for an entire cluster */
$link = new mysqli("mycluster", ...);  
/* Goes automatically to a slave, if read-write splitting is enabled */
$link->query("SELECT ...");
/* All actual connections to master or slave will use UTF8 from now on */
$link->set_charset("utf8");
/* With read-write splitting on, goes to master */
$link->query("INSERT ...");

Tell PECL/mysqlnd_ms that connect("mycluster", ...) means you want to connect to a cluster “mycluster” which has, for example, is a MySQL Replication cluster with master A and slaves B, C. Call $link->select_db("mydb"), $link->autocommit(false), $link->set_charset("utf8"), … and no matter to which node (A, B, C) you actually get automatically connected, the connection has those settings! Unless, you are using the sharding feature of our current, still very limited MySQL Fabric support… A new internal connection pool shall cure this and more. A tour through PECL/mysqlnd_ms internals to follow.

PECL/mysqlnd_ms example configuration for a simple MySQL Replication setup

{
    "myapp": {
        "master": {
            "master_A": {
                "host": "localhost",
                "socket": "/tmp/mysql.sock"
            }
        },
        "slave": {
            "slave_B": {
                "host": "192.168.2.27",
                "port": "3306"
            },
            "slave_C": {
                "host": "192.168.2.28",
                "port": "3306"
            }
        }
    }
}

How a single user handle stands for many actual connections

No matter whether you use PECL/mysqlnd_ms for a simple MySQL Replication cluster or a farm of MySQL servers that shard data one or the other way, it always confronts you with a massive change: a single user conection handle is mapped internally to many actual connections.

PHP script (*.phpt) Inside PHP (*.c)

$link = 
  mysqli_connect("mycluster", ...);

Object representing $link
List of masters Actual connection to master_A
List of slaves Actual connection to slave_B
Actual connection to slave_C
List of filters Load Balancer: Random

Furthermore, to keep the number of actual connections low, the PECL/mysqlnd_ms plugin for mysqlnd, will – by default – use lazy connections. A lazy connection to a node is not established before a query is to be run on the node. Assume you do $link = mysqli_connect("mycluster"); $link->set_charset("UTF8");. What happens inside PECL/mysqlnd_ms is very different from the normal case. Normally, without the plugin, $link would be one actual, established connection to a MySQL server and the server would immediately reply to your set_charset() command.

PHP script (*.phpt) Inside PHP (*.c)

$link->set_charset("utf8")

Loop over all actual master and slaves connections:

  • If lazy: recall setting for later connect()
  • If established: execute on server

With the plugin, we take the command and dispatch it to all masters and slaves. At this point the internal master and slave lists may hold already established or lazy connections. If its an established connection, the command is executed on the server. If its a lazy connection, the setting is remembered but the command execution itself is delayed until the connection is established. This is what I like to call semi-transparent. However, because we delay command execution in case of lazy connections, errors may also be delayed until the connection is established – be aware.

Still, this is about as transparent and comfy as things can get, if there is such a thing as a connection state (note that I’m ignoring the SQL side). An alternative had been to disable features and strip down connections to something without much of a state, however, different story…

MySQL Fabric: the auto-everything clustering vision

Probably a decade after we should have had a solution ready, PECL/mysqlnd_ms was ready. But MySQL Replication was lacking badly behind any of the better NoSQL clustering solutions. Automatic master failover? Considered a user task. Have clients discover the new master automatically? Unthinkable! Sharding for MySQL Replication? Considered a user task. MySQL Proxy? Still alpha. Sure, after all the years their had been third-party solutions for all this but the MySQL offering became less attractive.

MySQL Fabric is an administration tool for farms of MySQL servers. It is for the mass of the “all I need is a backup server…”-users that want nothing but automatic failover for MySQL Replication. And, it is for handful of “we host several thousand MySQL servers”-experts that go crazy and really need sharding.

MySQL Fabric – monitoring, administration -> Any MySQL Replication cluster
| Master (fails)
asks Fabric about nodes, e.g. master Slave Slave
|
^- Clients, e.g. PHP using PECL/mysqlnd_ms

The pure existance of an official administration tool is nice, but without a way for clients to ask about the current structure of the cluster it had been only yet another tool. The communication channel between clients and Fabric is what adds value. For example, should a master fail, Fabric promotes a new one – like any other tool. Then, the clients need to learn about the new master. This can finally be automated, be done out-of-the box: the clients simply ask Fabric for the new master.

The story is similar with sharding. You tell the driver (here: PECL/mysqlnd_ms) which shard key you are going to use, the driver asks Fabric for a list of appropriate shards. Usually, Fabric will reply with not only one shard but return a master and a couple of slaves because you want HA for your shards and use a MySQL Replication cluster to each and every shard. Then, PECL/mysqlnd_ms picks a shard for you.

$link = new mysqli("myfabric", ...);
mysqlnd_ms_fabric_select_shard($link, $table, $shard_key);
$link->query("SELECT ...");
$link->query("INSERT ...");

PECL/mysqlnd_ms matches all Fabric requirements but implementations clash

On the first look, this seemed peanuts to support. Johannes added the code for talking to Fabric and replaced the internal server lists. This has been our all idea. But, simple testing quickly revealed we had been wrong.

PHP script (*.phpt) Inside PHP (*.c)

mysqlnd_ms_fabric_select_shard(...);

Object representing $link
List of masters Load from Fabric
List of slaves Load from Fabric

When we have a sequence like $link->set_charset("utf8"); mysqlnd_ms_fabric_select_shard($link, ...), then the charset setting is forgotten. The charset was set on the actual connections. But, we have loaded a new server list from Fabric and with it we got new connections.

PHP script (*.phpt) Inside PHP (*.c)

mysqlnd_ms_fabric_select_shard($link, ...);

  • Load master list from Fabric
  • Load slave list from Fabric

$link->set_charset("utf8");

  • Set all master connections to utf8
  • Set all slave connections to utf8

mysqlnd_ms_fabric_select_shard($link, ...);

  • Load server lists from Fabric
  • utf8 charset setting lost

Furthermore, there was a problem with our load balancing filters. The load balancing filters (random, round-robin) support assigning a weight to a server. The higher the weight, the more requests we dispatch to the server at average. You can use this to optimize for proximity/latency or to assign load dependending on the size of your machines. For the weightes load balancing, the filters create cached information during startup phase based on the initial server lists. It is assumed that the server lists will not change until the script ends. However, Fabric does exactly that: replace server lists during runtime. Hence, this was broken too. The breakage is of little relevance as you would never combine the classic weighted load balancing with Fabric but still.

Last but not least, there is our new distriuted/XA transaction support. Imagine you need a transaction that spawns two shards. You start it on one shard and PECL/mysqlnd_ms does all the XA related SQL magic for you in the background. Then, you switch to another shard. All the connections get swapped out including the one on which you started the XA transaction. Upon swapping out the connections, they get closed and MySQL aborts the distributed transaction…

PHP script (*.phpt) Inside PHP (*.c)

mysqlnd_ms_xa_begin($link, ...);
$link->query(...);

  • Begin transaction on shard/server A

mysqlnd_ms_fabric_select_shard($link, ...);

  • Load server lists from Fabric
  • Connection to A closed, transaction aborted

SOLVED, well, on the way to be

There is now an internal connection pool abstraction layer that solves all these problems. Connections are added to the pool and removed from it through API calls. There is some reference counting that will prevent a connection from being close while it is still needed for an ongoing distributed transaction.

The pool gets to know all the API commands that have been dispatched to connections to align their state. At the time of writing we monitor change_user(), select_db(), set_charset(), set_server_option(), set_client_option(), set_autocommit and ssl_set(). When you switch from one shard to another using mysqlnd_ms_fabric_select_shard(), the pool can replay the commands. The knowledge about the last charset choosen is no longer bound to the connections. The pool knows the charset and can align the state. When done, the pool can inform interested parties, such as a filter, about the new list of active connections.

PHP script (*.phpt) Inside PHP (*.c)

$link->set_charset("utf8");

  • Inform the connection pool charset=utf8
  • Set all connections to utf8

mysqlnd_ms_fabric_select_shard($link, ...);

  • Load server lists from Fabric
  • Tell the pool to replace active connection list
  • Keep connections ope which are still referenced, e.g. for XA transactions
  • Ask the pool to replay setting, e.g. charset=utf8
  • Tell all listeners, e.g. filter about new connection list

In theory, we should now be able to start supporting Fabric properly in PECL/mysqlnd_ms. But, some details are still open and not finished. For example, the order in which the pool replays state alignment commands when the connection lists are replaced, is somewhat random. This is likely to cause issues when settings depend on each other and order of execution. In the worst case, we may be forced to offer a user API call to allow setting an order and burdening the developer with this task.

Looking forward: reduce connection overhead

A positive side effect of having a pool abstraction is that we may be able to reduce connection overhead. Assume, your application is constantly switching between two shards A and B. We load the server lists for A and B exactly once from Fabric, then (in theory, yet) we cache them. When you query shard A, we open at least one connection. If the shard itself is a MySQL Replication cluster, we may have to open one connection for reading and one for writing. The next minute you switch to shard group B. So far, we have closed the connections to shard group A. Assume you now go back to shard group A. We close the connections to B and reopen some to A… what a waste.

Shard can be MySQL Replication clusters for HA
Shard group A   Shard group B
Master   Standalone/Master
Slave Slave  

With the pool abstraction in place, we put all the connections we ever open in a pool, when you first use a shard group. Consider the case again where your application switches from shard group A to B and back to A. When you switch from one shard to another, we don’t necessarily close the current connections. We mark them inactive. Then, we check whether we already have connections to the new shard open. If so, we mark them active and use them. Otherwise, we open new connections. In a sequence of switching from A to B and back to A, the last switch will – likely, depending on load balancing rules – boil down to marking already open connections as active.

New internal connection pool
All connection list
Active (= currently used for load balancing)
Shard group A server Master connection
Shard group A server Slave connection
Inactive (= currently not used)
Shard group B server Connection to B

No pool management/administration possible yet

The new connection pool does not break any existing functionality. No surprise, because the classic functionality of PECL/mysqlnd_ms does not really need and use the new pool. It is the Fabric support for which we need the pool. And, the Fabric support by PECL/mysqlnd_ms is not much tested. Until this is testing is done, the implementation has been completed and we trust the new pool code, you should consider it internal. Because it is internal, there is no way for you to manage or administrate the pool. Not yet.

There are some new statistics that let you monitor pool activity only.

Pool monitoring statistics
pool_masters_total Number of master servers (connections) in the internal connection pool.
pool_slaves_total Number of slave servers (connections) in the internal connection pool.
pool_masters_active Number of master servers (connections) from the internal connection pool which are currently used for picking a connection.
pool_slaves_active Number of slave servers (connections) from the internal connection pool which are currently used for picking a connection.
pool_updates How often the active connection list has been replaced and a new set of master and slave servers had been installed.
pool_master_reactivated How often a master connection has been reused after being
flushed from the active list.
pool_slave_reactivated How often a slave connection has been reused after being
flushed from the active list.

Happy hacking!

@Ulf_Wendel Follow me on Twitter

Comments are closed.