Replication plugin | filter | conquer = 1.1.0 coming

2011/09/28 by admin

The soon to be announced version 1.1.0-beta of the mysqlnd replication and load balancing plugin (PECL/mysqlnd_ms) for PHP introduces a new concept of filters to the plugin. Filters take a list of servers to pick one or more of it. Filters can be chained, similar command line tools. Imagine a future with a filter chain like: user_multi | roundrobin, table_partitioning | random_once, table_partitioning | adaptive_loadbalancer, … For example, user_multi | roundrobin will first invoke a callback and then apply static round robin load balancing to the servers returned by the callback set with user_multi . Or, table_partitioning | adaptive_loadbalancer would first apply replication partitioning rules (db.a -> server_a, db.b -> server_b…) and then a candicate is choosen for statement executions using some sort of adaptive load balancing implemented by the adaptive_loadbalancer…. future music, though!

The filters of 1.1.0

The filter concept is not fully developed in 1.1.0-beta. Foundations had to be laid first. Backward compatibility of the plugins configuration file format was broken between alpha and beta. As of 1.1.0, JavaScript Object Notation (JSON) format is used instead of PHP (php.ini) configuration file format. JSON allows defining hierarchical data structures and is easy to process, thanks to PHPs build in JSON support. Please, follow the links to the mysqlnd_ms manual for details.


{
    "myapp": {
        "master": {
            "master_0": {
                "host": "localhost"
            }
        },
        "slave": {
            "slave_0": {
                "host": "192.168.78.136",
                "port": "3306"
            },
            "slave_1": {
                "host": "192.168.78.137",
                "port": "3306"
            }
        },
        "filters": {
            "roundrobin": [

            ]
        }
    }
}

The following filters are included in 1.1.0-beta.

random
Replaces pick[]=random, pick[]=random_once, implements random and random once load balancing.
roundrobin
Replaces pick[]=roundrobin, implements roundrobin load balancing.
user
Replaces mysqlnd_ms_set_user_pick_server(), callback for user-defined server selection

How filters work

Whatever kind of MySQL replication cluster your PHP application may use, it will be faced with two tasks whenever a SQL statement is to be executed:

Identify the servers capable of executing a statement
- Read or write request?
- Accessing a database or table not replicated to all nodes?
Choose one of the candidates to execute the statement
- Random choice?
- Round robin load balancing?
- Load balancing adaptive to system load?

By answering those questions an application filters the list of configured replication nodes until one node is left. And this is, what filters are about. Filters stepwise reduce the list of candidates for statement execution. In pseudo-code this looks like:


foreach (filters as filter) {
   servers = filter->reduce(statement, servers);
   if (1 == count(servers))
     break;
}

Filtering is an iterative process. There can be more than one filter. In a way you can compare chaining filters to chaining small, easy to maintain small command line tools on a shell using the shell’s pipe operator. Instead of having one complex, hard to use super-duper tool you combine small tools to get the job done.


grep -R "filter" en/reference/mysqlnd_ms/ | sort | head -n 1

Like a command line tool accepts parameters, some of mysqlnd_ms 1.1.0-beta filters accept parameters as well. For example, the random filter has a "sticky" argument to enable random once load balancing.

Filter chains today

Plenty of code got re-factored between 1.0.1-alpha and 1.1.0-beta to make the plugins C code base more modular by introducing filters. Thus, we decided to act stepwise, offering no filters providing new major features in 1.1.0. None of the 1.1.0 filters is a "multi-filter".

There are two types of filters, lets call them "single-filter" and "multi-filter&;. All filters get the statement to be executed and a list of servers as an input. Some filters reduce the server list down to one server. Those filters shall be called "single-filters". The random, roundrobin and user filter are such single-filters. They are supposed to output exactly one server. In the future there may be multi-filters. A multi-filter outputs one or more servers. The output of a multi-filter requires further processing until one server has been picked for statement execution.


n >= 1 servers -> single-filter -> 1 server
n >= 1 servers -> multi-filter -> m <= n servers

Chaining the 1.1.0 single-filters makes little sense and is thus forbidden by the plugin. You get an error when you try to do something like random | roundrobin. Random always outputs only one server. Round robin with only one server…

A fictive table partitioning multi-filter

The full potential of the filter concept becomes visible with a fictive "table" filter. A table filter to support MySQL replication partitioning rules.
MySQL replication can be instructed to restrict replication to certain schema objects for certain replication nodes. Imagine you have an application using two databases "web" and "reports". The "web" database is very popular and you replicate it from the master to all your slaves. The "reports" database is not so and it is sufficient to have it available on the master and one of the slaves.


{
    "myapp": {
        "master": {
            "master_0": {
                "host": "localhost"
            }
        },
        "slave": {
            "slave_0": {
                "host": "192.168.2.27",
                "port": "3306"
            },
            "slave_1": {
                "host": "192.168.78.136",
                "port": "3306"
            },
            "slave_2": {
                "host": "192.168.78.137",
                "port": "3306"
            }
        },
        "filters": {
            "table": {
                "rules": {
                    "reports.%": {
                        "master": [
                            "master_0"
                        ],
                        "slave": [
                            "slave_0"
                        ]
                    },
                    "web.%": {
                        "master": [
                            "master_0"
                        ],
                        "slave": [
                            "slave_0",
                            "slave_1",
                            "slave_2"
                        ]
                    }
                }
            },
            "random": {
                "sticky": "1"
            }
        }
    }
}

Of course, you want the plugin to be aware of these partitioning rules. The fictive table filter will serve you. You add it to your filter chain and teach it about the data distribution. The table multi-filter analyzes the statement to be executed and outputs a list of one or more replication nodes capable of executing the statement. Then, you pass the output into the random single-filter to pick a candidate for query execution. Here is some pseudo-code to illustrate what happens:


servers = array(
 "master" => array("master_0"),
 "slave" => array("slave_0", "slave_1", "slave_2)
)
statement = "SELECT * FROM reports.daily";

servers = filters->table->reduce(statement, servers)
servers = filters->random->reduce(statement, servers)

execute(statement, servers[0])

And, no, in theory, there’s nothing that would stop us from supporting more than one master with such a fictive table filter…

However, it has been quite a bit of work to lay the foundations of the filter concept. 1.1.0 is about laying foundations. Whishes and requests for the future are much appreciated. Please, leave a comment or drop us an email.

Happy mysqlnd_ms 1.1.0-beta testing!

PS: I hope Andrey will release 1.1.0-beta tomorrow.