Ulf Wendel

PECL/mysqlnd_ms: let web requests be your M/S load balancer

Introducing the best load balancing policy for PECL/mysqlnd_ms: "random_once". A (PHP) web request is short-lived. A web server can handle multiple requests per second. There are many web servers running an application. For each web request choose a random slave to use for the rest of the request. Don’t do load balancing, don’t switch connections during a web request . Result: even load distribution, less connection overhead and less connections overall, cache loves you, less hassle switching stateful slave connections…

Load balancing policy random_once

Load balancing policies featured by PECL/mysqlnd_ms

As of today, PECL/mysqlnd_ms, our PHP mysqlnd toy plugin for replication load balancing, features four built-in load balancing policies:

  1. random: for each read query, choose a random slave
  2. round-robin: for each read query, choose the next slaves from the slave list in a round-robin fashion
  3. user defined: picking a server is left to the PHP script
  4. new random_once: for each web request choose a random slave for use with all read queries for the rest of the request.

The "random_once" is inspired by one of the most basic approaches to support MySQL replication in a PHP script, see the below snippet. At the beginning of the script run you open two connections: a master connection, a slave connection. The slave is picked in a random fashion. You send every write using $master connection handle to the master. And, of course, you detect and direct all reading queries to the slave via the $slave connection handle. In some cases, such as for critical write-read sequences if you must be able to read what you had just written, you skip the R/W split to run your critical write-read sequence on the master. In other words: you can overrule the default R/W split, if need be.

$master = new mysqli("master", "user", "password", "database", "port");

$slave_list = array("slave_a", "slave_b", "slave_c", "slave_d");
$slave_idx = mt_rand(0, (count($slave_list) - 1));
$slave = new mysqli($slave_list[$slave_idx], "user", "password", "database", "port");

/* For the rest of the script: manual R/W split */
$master->query("INSERT INTO table_a(col_a) VALUES (1)");
$slave->query("SELECT * FROM table_b");
$slave->query("SELECT * FROM table_a");

The plugin counterpart: C for comfortable

PECL/mysqlnd_ms implements R/W split detection in C. Automatic R/W split decisions can be overruled by the application using SQL hints. The new "random_once" load balancing policy implements the above logic for picking a slave. But because its a plugin and (semi-)transparent. You have only one connection handle to deal with in your PHP script. This makes it possible to use the plugin with existing applications without (significant) code changes. Just replace the host name with a config section name, that’s about it (full example).

/* automatic pooling */
$mysqli = new mysqli("mysqlnd_ms_config_section", "user", "password", "port");

/* automatic R/W split and balancing */
/* master */
$mysqli->query("INSERT INTO table_a(col_a) VALUES (1)");
/* slave x, x = random pick from S1, S2, S3 */
$mysqli->query("SELECT * FROM table_b");
/* slave x, no balancing, no switch to another slave */
$mysqli->query("SELECT * FROM table_a");

(config, see also full example)

[mysqlnd_ms_config_section]
master[] = master
slave[] = s1
slave[] = s2
slave[] = s3
pick[] = random_once

Less connections

The new load balancing policy significantly improves the plugins use of connections maintaining even load distribution. MySQL is certainly very fast establishing a connection, much faster than other databases. However, there is always a cost: the network connection itself and the server resources occupied by a server connection thread. The less connections a PHP web requests opens, the better.

This is where "random_once" shines. Let’s say your set up consists of one MySQL master and three MySQL slaves. To make HTML table hacking easier for me, let’s further assume you run only two web servers. Let’s take a snapshot of the system when both web servers run two PHP requests and each PHP script running has run one write query and four read queries through PECL/mysqlnd_ms.

Load balancing system snapshot: round robin policy
Web server A Web server B
Req A.1 Req A.2 Req B.1 Req B.2
Conn M Conn S.1 Conn S.2 Conn S.3
1 write 2 reads 1 read 1 read
  • 4 open connections to the master
  • 12 open connections to the slaves
    • Slave S.1: 4 connection, 8 reads
    • Slave S.2: 4 connections, 4 reads
    • Slave S.3: 4 connection, 4 reads

Compare the two system snapshot illustrations from above and the beginning of the blog ost. Using round robin policy to load balance queries during script run we see a total of 16 connections (4 master, 12 slave). With random-once it is only 8 connections (4 master, 4 slave). Query distribution is comparable in the example. In both cases two of the three slaves have run four reads and the third slave has run eigth reads. Over time, distribution will become more even with both random once and round robin. However, illustrating that is beyond my rusty HTML table hacking skills ;-).

Chance of better cache usage

If a PHP script runs many similar SELECT during a request, there is a chance that upon subsequent execution the query is served from the database server caches. But that hot cache chance is only given, if the load balancer does not switch connections in between two SELECT queries. The built-in random and round-robin load balancing policies do switch connections for every query. The new random-once policy picks one random slave when the PHP script starts and uses it until the script ends – together with the web request. This increases the chance of better cache usage.

Statefull vs. stateless slave connection

One big pitfall is system immanent to any driver redirecting queries over multiple connections: state. Connections have a state (see also "PHP: Master slave load balancing in mysqlnd – how transparent?") which may "forbid" switching. With "random_once" this is not solved but cured a bit because there is only one slave connection. It is still up to the application to hint the driver/plugin which connection (master or slave) to use for SQL units that rely on connection state. But there is only one slave connection, not many, making it a bit easier.

Comments welcome

Please recall that PECL/mysqlnd_ms is a proof-of-concept. Its far from stable, we are still trying to figure out how to do things. If you have any idea to share, … happy hacking!

4 Comments

  1. Pingback: abcphp.com

  2. one of my concerns is that the second you start writing to a master with reads from a slave you open yourself up to all sorts of issues with replication lag. therefore i prefer an approach where you mark certain requests as “(potentially) write” and then use the master and anything else just uses the slave:
    http://groups.google.com/group/symfony-devs/browse_thread/thread/5bceb9918e2b4947/2a250d0f4d9a5e79

    of course this has limits in scaling, but with ESI (edge side includes) you can keep requests smaller and more focused. plus it makes it easier to use multiple masters and sharding. need to finally work on the above and write a proper blog post.

  3. Lukas, thanks for the feedback. Interesting and valuable as usual… 🙂

    The “always use master after first write” is an interesting idea. I think its worth implementing in C. Clever. May not always work as you point out.

    The next idea of marking a (web) request read-only is even better. Thinking out loud: so, you suggest we allow not only per-query but also per-connection and per-request overruling of automatic decisions. Cool! We shall do that.

    Today you’d just disable mysqlnd_ms at all for a per-request setting. But that is not nice as you’d have to deploy the masters twice. Once in the mysqlnd_ms config and once in your application.

    I like that idea a lot. I’ll make it the last 1.0 feature addition.

    THANKS

    Ulf