Ulf Wendel

Global transaction ID support for PECL/mysqlnd_ms

The catchy theme/motto of the PECL/mysqlnd_ms 1.2 release will be Global Transaction ID support. Hidden behind the buzzword are two features. We will allow users to request a certain level of service from the replication cluster (keyword: consistency) and we will do basic global transaction ID injection to help with master failover. Failover refers to the procedure of electing a new master in case of a master failure.

Global Transaction ID support is the 1.2 motto/theme

The two features are somewhat related, thus the theme. In very basic words, the idea of a global transaction ID is to have a sequential number in a table on the master. Whenever a client inserts data, the ID/counter gets incremented. The table is replicated to the slaves. If the master fails, the database administrator checks the slaves to find the one with the highest global transaction ID. Please find details, for example, in Wonders of Global Transaction ID injection.
What the plugin will do is inject a user-provided SQL statement with every transaction to increment the global transaction counter.
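In the plugin's JSON configuration file this could look roughly as follows. Section and key names are a sketch of what we have in mind, not a committed 1.2 format, and the host values are placeholders:

```json
{
    "myapp": {
        "master": { "master_0": { "host": "localhost" } },
        "slave":  { "slave_0":  { "host": "192.168.2.27" } },
        "global_transaction_id_injection": {
            "on_commit": "UPDATE test.trx SET trx_id = trx_id + 1",
            "report_error": true
        }
    }
}
```

The on_commit value is plain user-provided SQL; the plugin does not parse it, it just runs it.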

However, there is also a client-side benefit to global transaction IDs. If you want to read-your-writes from a replication cluster, you usually query the master. You won’t go to the slaves, because you do not know whether they have replicated your writes already. If you need read-your-writes, set the master_on_write config setting in version 1.1. In version 1.2 we can offer more, if you want and need it. We can search for a slave that has replicated the global transaction ID of your write to reduce the read-your-writes load on the master. The keyword here is consistency and the background posting is Consistency, cloud and the PHP mysqnd replication plugin. However, consistency is not nearly as nice a motto as the catchy global transaction ID theme.

Of course, the day the MySQL Server has built-in Global Transaction IDs, we don’t need to do the injection any more. Meanwhile, we give it a try… a report from the hacks of the past two days. Feedback is most welcome.

Warning: this now becomes a posting for hackers, not users. If you are not after implementation details, stop reading. The big news is the theme, nothing else. If you don’t trust any software you have not developed yourself but you like the idea of a replication and load balancing plugin, continue reading.

First try: injection

… our first attempt at global transaction ID injection is straightforward. By default, injection is done only for queries that go to the master. By default, all PHP MySQL APIs use auto commit. In the most basic case we just inject SQL before the query from the user. Doing it first avoids hassle if the user's statement returns a result set. Injecting before the user's statement also means we increment regardless of whether the user's statement succeeds.

$mysqli->query("SELECT 1");
$mysqli->query("INSERT INTO test(id) VALUES (1)");


SELECT -> 
  slave -> 
    auto commit on -> 
      query(SELECT)

INSERT -> 
   master -> 
     auto commit on -> 
        query(INJECTED), query(INSERT);


Optionally, we allow doing the injection on slaves as well. It is configurable whether errors caused by injected SQL are ignored or reported, e.g. if the global transaction ID sequence table is unavailable.
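The decision logic can be sketched in plain PHP; this is illustration only, the real plugin does it in C inside mysqlnd, and all names here are made up:

```php
<?php
/* Sketch of the auto-commit injection decision. Illustration only:
   the real plugin works in C inside mysqlnd, all names are made up. */
function queries_to_send($user_sql, $runs_on_master, $inject_on_slave = false)
{
    /* The user-provided SQL that maintains the sequence table. */
    $injected = "UPDATE trx SET trx_id = trx_id + 1";
    $queries  = array();
    /* Inject first: doing it before the user's statement avoids hassle
       if that statement returns a result set, but it also means we
       increment regardless of the statement's success. */
    if ($runs_on_master || $inject_on_slave) {
        $queries[] = $injected;
    }
    $queries[] = $user_sql;
    return $queries;
}

/* SELECT balanced to a slave: no injection by default. */
print implode("; ", queries_to_send("SELECT 1", false)) . "\n";
/* INSERT on the master: the injected UPDATE runs first. */
print implode("; ", queries_to_send("INSERT INTO test(id) VALUES (1)", true)) . "\n";
```

Two distinct query() calls means two round-trips per write, which is the price of this basic approach.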

If not in auto commit mode, we do the injection when the user invokes the API's commit() function. This is possible as of PHP 5.4. We do not monitor all statements to catch query(COMMIT) calls. The same constraints as for 1.1's trx_stickiness config setting apply.
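For comparison, 1.1's trx_stickiness setting lives in the same JSON configuration file, roughly like this (host values are placeholders; check the documentation of your version for the exact format):

```json
{
    "myapp": {
        "master": { "master_0": { "host": "localhost" } },
        "slave":  { "slave_0":  { "host": "192.168.2.27" } },
        "trx_stickiness": "master"
    }
}
```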

Andrey, the king of mysqlnd, proposed to consider query(INSERT), ..., query(INJECTED). In this case we would not increment the global transaction ID if the user's INSERT fails. However, it's something for the king himself to evaluate. In other words: it's beyond my skill level to do within hours. I'm somewhat sceptical it's worth the effort.

We also started looking into using multi-statements. In this case, we prepend the user's statement with the SQL to maintain the global transaction ID and run the result as one multi-statement. Shown is a prefixing example. It's implemented as a hack for buffered non-prepared statements. We need to benchmark whether it's worth the complicated logic over the initial approach.

$mysqli->query("INSERT INTO test(id) VALUES (1)");


INSERT -> 
  master -> 
    set_server_option(MULTI_STATEMENT_ON) -> 
       query(INJECTED;  INSERT) -> 
         more_results ->
           next_results ->
             store_results ->
               set_server_option(MULTI_STATEMENT_OFF) 
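The string building itself is trivial; the complicated part is the result handling shown above. A pure-PHP sketch, with made-up names:

```php
<?php
/* Sketch of the multi-statement variant: glue the injected SQL and
   the user's statement into one COM_QUERY. Illustration only; after
   sending it, the plugin still has to loop over more_results()/
   next_result() to skip the injected statement's result. */
function build_multi_statement($injected_sql, $user_sql)
{
    return $injected_sql . "; " . $user_sql;
}

print build_multi_statement(
    "UPDATE trx SET trx_id = trx_id + 1",
    "INSERT INTO test(id) VALUES (1)"
) . "\n";
```

This saves one round-trip compared to two query() calls, at the price of toggling the multi-statement server option around every injected write.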

First try: service level

In the area of "consistency" and service level, our first approach looks promising. The time Andrey invested into the 1.1 release to implement the filter logic starts to pay off.

In short, filters mean that we have a sequence of independent tools to find a node for running a statement. A bit like Unix command line tools that are connected on the command line with a pipe.
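The pipe analogy can be sketched in plain PHP: every filter takes a list of candidate nodes and emits a (usually smaller) list. All names below are made up for illustration; the real filters live in C inside the plugin:

```php
<?php
/* Candidate nodes, as the plugin might see them. */
$nodes = array(
    array("name" => "master_0", "is_master" => true),
    array("name" => "slave_0",  "is_master" => false),
    array("name" => "slave_1",  "is_master" => false),
);

/* A filter that keeps the slaves for a SELECT... */
$keep_slaves = function (array $nodes) {
    return array_values(array_filter($nodes, function ($n) {
        return !$n["is_master"];
    }));
};
/* ...piped into a trivial load balancing filter picking one node. */
$pick_first = function (array $nodes) {
    return array_slice($nodes, 0, 1);
};

/* Run the chain: each filter's output is the next filter's input. */
$chain = array($keep_slaves, $pick_first);
foreach ($chain as $filter) {
    $nodes = $filter($nodes);
}
/* $nodes now holds the one node the SELECT is sent to. */
```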

query(SELECT) ->
  all masters, all slaves ->
    filter(LOADBALANCING)  ->
      certain slave

We now have a new quality-of-service or "consistency" (qos) filter. For background information on the idea, see Consistency, cloud and the PHP mysqlnd replication plugin.

If, for example, the quality-of-service (consistency level) you need from the cluster is read-your-writes, you can set it in the plugin configuration file and create a filter chain like this:

query(SELECT) ->
  all masters, all slaves ->
    filter(QOS, STRONG_CONSISTENCY) ->
      all masters, no slaves ->
        filter(LOADBALANCING)  ->
          certain master
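In the plugin configuration file, such a chain might be written like this. The filter name and its keys are a sketch, not a final 1.2 format:

```json
{
    "myapp": {
        "master": { "master_0": { "host": "localhost" } },
        "slave":  { "slave_0":  { "host": "192.168.2.27" } },
        "filters": {
            "quality_of_service": { "strong_consistency": 1 },
            "roundrobin": []
        }
    }
}
```

Filters are applied in the order they appear, just like in the chains shown above.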

This is not much of a win over the already existing master_on_write configuration setting. However, together with the new filter, we also introduced a new API call to change the filter chain at runtime. You don’t have to configure read-your-writes (master_on_write) when setting up the plugin, you can set it at runtime – on demand.

Let a filter chain like this be given:

query(SELECT) ->
  all masters, all slaves ->
    filter(LOADBALANCING)  ->
      certain slave


Then, when you place an order in your shop at run-time and need read-your-writes for a short period, you do:

mysqlnd_ms_set_qos(MYSQLND_MS_STRONG_CONSISTENCY);
$mysqli->query("SELECT id FROM orders");
/* ... do more queries that must not return stale data ... */


This will change the filter chain accordingly on-the-fly.

query(SELECT) ->
  all masters, all slaves ->
       filter(QOS, STRONG_CONSISTENCY) ->
         all masters, no slaves ->
           filter(LOADBALANCING)  ->
             certain master


Once you are done with the consistent reads, you can go back with one API call to eventual consistency (use masters and slaves, which may or may not serve current data).

That can save you a good number of SQL hints required in 1.1, if not using master_on_write.

… and back to the beginning

With the new filter and the new API call, we can also allow things like this:

$mysqli->query("INSERT INTO orders(...)");
$global_trx_id = mysqlnd_ms_get_global_trx_id($mysqli);
mysqlnd_ms_set_qos(MYSQLND_MS_SESSION_CONSISTENCY, $global_trx_id);
$mysqli->query("SELECT id FROM orders");
/* ... do more queries that must not return stale data for table orders... */


In this case we can read from any master and any slave which has replicated a certain global transaction ID. This is where we get back to the beginning. And, we open up for the future.
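Node selection for session consistency can be sketched like this, in plain PHP and with made-up data; in reality the plugin would have to ask each slave how far it has replicated:

```php
<?php
/* Sketch: session consistency via global transaction ID. Given the
   GTID of the client's last write, keep only nodes that have already
   replicated it. Illustration only, all names and data are made up. */
function nodes_for_gtid(array $nodes, $wanted_gtid)
{
    return array_values(array_filter($nodes, function ($n) use ($wanted_gtid) {
        /* A master always qualifies; a slave qualifies once its
           replicated GTID has reached the one we wait for. */
        return $n["is_master"] || $n["gtid"] >= $wanted_gtid;
    }));
}

$nodes = array(
    array("name" => "master_0", "is_master" => true,  "gtid" => 105),
    array("name" => "slave_0",  "is_master" => false, "gtid" => 105),
    array("name" => "slave_1",  "is_master" => false, "gtid" => 99),
);
/* The client's last write produced GTID 103. */
$candidates = nodes_for_gtid($nodes, 103);
/* master_0 and slave_0 qualify; slave_1 still lags behind. */
```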

Imagine, with a distant release (not 1.2!), you could ask for data that is no older than 2 seconds. The plugin would either read from the master, or a slave lagging no more than 2 seconds, or fetch the result from a local TTL cache, such as PECL/mysqlnd_qc, with a TTL of 2 seconds…

mysqlnd_ms_set_qos(MYSQLND_MS_EVENTUAL_CONSISTENCY, MAX_LAG, 2);
$mysqli->query("SELECT id FROM news");

Happy hacking to all of us…

@Ulf_Wendel

2 Comments

  1. My understanding is that the INJECTION is updating a table.
    What is not clear is the following:
    1) when and how and by whom the table is created;
2) what the binary log looks like after a couple of insert/delete/update statements with injections. Could you supply a snippet?

    Thanks

    Giuseppe

  2. Hi Giuseppe,

nothing fancy, fast or clever came to our mind. My imprecise wording is causing confusion.

    Ultimately, all my hope is on a server-side solution of the global transaction ID problem. Pure client-side approaches have too many shortcomings. I expect a proxy based solution – MySQL Proxy or Tungsten Proxy – to be more capable. Therefore, my current thinking is not to spend too much time on this, unless users love it.

(0) By query(INJECTION), query(ORIGINAL) I wanted to show two queries, two calls to the query() function, two distinct COM_QUERY commands. Before the user's query(ORIGINAL), we’ll transparently insert an extra query(INJECTION). This means we add extra round-trips.

INJECTION itself is some user-supplied SQL. Yes, it is an update statement such as, for example, UPDATE trx SET trx_id = trx_id + 1. I need to look up the SQL you proposed for this. It's been the best proposal I've found since May…

Because query(INJECTION) stands for a plain vanilla SQL statement, it should be just another plain vanilla statement in the binary log (2). Users should not update the trx table, ever. The plugin increments it for every statement in auto commit mode or, if not in auto commit mode, whenever the commit() function is called.

    Autocommit mode:

    User issues: query(INSERT INTO test … )
    We run: query(UPDATE trx …) , query (INSERT)

    (1) The global transaction ID (trx) table needs to be created manually by a database administrator, when setting up a master for the first time.

It is likely that we stick to this manual solution. If a node (master or slave) is added to the cluster, it is necessary to deploy the PHP clients' configuration file to inform them about the new node. Deploying the configuration file means that you have to manually update the PHP plugin configuration files. So, if cluster changes require PHP plugin config file updates whenever a master is added/removed/changed, why not make the creation of the trx table a manual process too?

When I started hacking, I added an option to my PHP plugin config for the CREATE. I dropped it as soon as I realized it would not be a matter of five minutes – maybe in a later release. I want to see and hear user reactions before we look into it. For example, if we removed one manual operation (creating the trx table on every new master), what about the other manual operations? Why not allow run-time manipulation of the plugin configuration to add/remove nodes on the fly (compiled out currently, half-way done, untested, …)? Why not read out MySQL status information, for example SHOW SLAVE HOSTS, to automatically detect new slaves (an Andrey proposal…)? No, not yet.

    Feedback first. Thanks for yours!