Ulf Wendel

PHP: Transparent load balancing and sharding with mysqlnd

Psst, you want some client-side MySQL load balancing with and without sharding for your PHP application? PHP 5.3 has something to offer for you. It is free. It requires no to very little changes to your applications. It is open source but works with closed source applications as well. You won’t find this information in any book, because some book authors still do not cover "mysqlnd". Read on for 30+ lines of PHP to add round-robin connection load balancing and, 70+ lines of PHP to add MySQL master slave replication or sharding support to your application without changing the application. Black magic from Manmuckiel.

The goal

Round-robin connection load balancing

Client   Server
connect(‘Host A’) mysqlnd plugin
30+ lines PHP
-> MySQL on ‘Host B’

Load balancing and sharding by query redirection and rewriting

Client   Server
connect(‘Host A’)
query(‘Query A’);
mysqlnd plugin
70+ lines PHP
-> MySQL on ‘Host B’
runs query ‘Query B’

The PHP 5.3 source tree contains a PHP licensed full-fledged MySQL client library called "MySQL native driver for PHP" or short "mysqlnd". Any of the three existing PHP MySQL extensions (mysql, mysqli, PDO_MYSQL) can make use of it. As a new development the mysqlnd library has a modular design featuring a mighty plugin interface.

The mighty plugin interface has attracted the attention of Mayflower/thinkPHP and David Soria Parra leading into the development of PECL/mysqlnd_uh (uh = user handler). By help of PECL/mysqlnd_uh it is possible to write mysqlnd plugins, the magic back boxes above, using PHP instead of C. For prototyping and spreading ideas PHP is the best choice, for production use consider rewriting mysqlnd plugins prototyped in PHP using C for performance and feature-completeness.

Client stack
Any application (Oxid, WordPress, phpMyFAQ)
MySQL extension (mysql, mysqli, PDO_MYSQL)
mysqlnd library
mysqlnd plugin
PHP or C based
|
MySQL Server

Any mysqlnd plugin runs on the client-side of you web architecture. Unlike with MySQL Proxy there cannot be a single point of failure in your architecture. Nor is there the risk of an overloaded central bottle-neck as all logic is on the client. Scaling horizontally, scaling by adding machines to your setup is as easy as it can get if using mysqlnd plugins. The mysqlnd client-side query cache plugin PECL/mysqlnd_qc is the best example.

No or very little application changes are needed to use mysqlnd plugins. The mysqlnd library and its plugins operate on a layer beneath the application. Some plugin applications are transparent. The application does not need to know about the plugin and will not recognize it. For example, a simple round-robin connection load balancer can operate without any information from the application. On the contrary, the client-side cache plugin PECL/mysqlnd_qc and the sketched sharding load balancer work most efficiently, if the application hints them what to do, e.g. for how long a query result may be cached. Decoupling database library plugins and applications makes application updates very easy – even updates of closed source applications.

mc + ms + uh = black balsam from Manmuckiel

Words from Mats have put pieces in the brains and source code of Andrey Hristov (PECL/mysqlnd_mc, PECL/mysqlnd_ms), located in Mannheim , Johannes Schlüter based in Munich as well as the Mayflower (PECL/mysqlnd_uh) folks and myself who lives in Kiel together. In some ways the main contribution from Mats has been to remind me of the long requested load balancing blog posting… who cares.

30+ lines for round-robin connection load balancing are given below. Grab the latest PHP 5.3 source code, fetch the source of PECL/mysqlnd_uh and compile yourself a PHP binary configured with --with-mysql=mysqlnd --with-mysqli=mysqlnd --with-pdo-mysql=mysqlnd --enable-mysqlnd-uh. Fix the code from below, upate the host names, put the code into your auto_prepend_file and open a MySQL connection using mysqli or PDO_MYSQL but stay a way from the mysql extension because it is dated. It doesn’t matter which API you use in your application.


<?php
class load_balancer_rr extends MySQLndUhConnection {
	protected $servers;
	protected $server_count;
	public function __construct($servers, $verbose) {
		$this->servers = $servers;
		$this->server_count = count($servers);
	}

	public function pick_server($host) {
		if ('bypass_lb:' == substr($host, 0, 10)) {
			$server['host'] = substr($host, 11);
		} else {
			$rand = mt_rand(0, $this->server_count - 1);
			$server['host'] = $this->servers[$rand];
		}
		return $server;
	}

	public function connect($res, $host, $user, $passwd, $db, $port, $socket, $mysql_flags) {
		$host = $server = $this->pick_server($host, $user, $passwd, $db, $port, $socket, $mysql_flags);
                printf('Redirected to %s'\n', $host);
		return parent::connect($res, $host, $user, $passwd, $db, $port, $socket, $mysql_flags);
	}
}

$server_list = array('host_a', 'host_b');
$rr = new load_balancer_rr(array('host_a', 'host_b'));
mysqlnd_uh_set_connection_proxy($rr);
?>

The script hooks the connect() function of the mysqlnd database library. The mysql, mysqli and PDO_MYSQL extensions will use the mysqlnd database library if you compile your PHP using the configure line --with-mysql=mysqlnd --with-mysqli=mysqlnd --with-pdo-mysql=mysqlnd. Any connection attempt made by PHP using any of the extensions is forwarded to the mysqlnd library.

PECL/mysqlnd_uh installs a mysqlnd plugin which exports a set of classes which you can use to proxy mysqlnd database library calls. MySQLndUhConnection is such a class. The class minics the mysqlnd library internal connection class. The user-defined class load_balancer_rr from the script refines some methods of the build-in class.

class load_balancer_rr extends MySQLndUhConnection {
...
}
$rr = new load_balancer_rr(array('host_a', 'host_b'));
mysqlnd_uh_set_connection_proxy($rr);

An object of load_balancer_rr is created and passed to PECL/mysqlnd_uh by calling mysqlnd_uh_set_connection_proxy($rr);. Internally PECL/mysqlnd_uh has overwritten almost all mysqlnd library internal functions and replaced the, with its own implementation. Whenever the mysqlnd library is called by any of the extensions, the call ends up in the overwritten library function provided by PECL/mysqlnd_uh. PECL/mysqlnd_uh checks if the user has defined a callback for the function in question and if so, it calls the callback. Effectively, the load_balancer_rr object proxies mysqlnd internal calls.

The user defined callbacks are implemented in objects of certain type, such as objects of the type MySQLndUhConnection. The original mysqlnd function implementations are still available for calling. Within the class load_balancer_rr they can be called using the code>parent:: operator.

The user proxy, the user-defined callbacks are installed using the auto_prepend_file directive to make sure that they are in place before your application starts. Remember: we do not want to touch the application source code. Within the user-defined callbacks we manipulate the host parameter of the connect() function to implement transparent round-robin load balancing.

public function connect($res, $host, $user, $passwd, $db, $port, $socket, $mysql_flags) {
  $host = $server = $this->pick_server($host, $user, $passwd, $db, $port, $socket, $mysql_flags);
  printf('Redirected to %s'\n', $host);
  return parent::connect($res, $host, $user, $passwd, $db, $port, $socket, $mysql_flags);
}

Let's recap. The application uses any of the extensions (mysql, mysqli, PDO_MYSQL), which export an PHP API for using MySQL. The extensions in turn use the mysqlnd database library to talk to MySQL using the MySQL Client Server Protocol and doing the dirty work on the wire. The mysqlnd database library allows plugins to replace (C) functions it provides for use by the extensions. Think of mysqlnd as a collection of objects. Plugins can replace some of the objects to hook or proxy their methods. Those proxy functions need to be written in the programming language of the mysqlnd library, which is C. PECL/mysqlnd_uh is a mysqlnd plugin written in C. PECL/mysqlnd_uh does some magic to enables you to replace mysqlnd library internal functions with PHP functions, no C skills needed for writing a plugin.

Things that happen within the mysqlnd library or its plugin, such as manipulating the host parameter passed to of the connect() function or rewriting query strings, are transparent from an applications points of view.

Nasty? I would not say so. Mighty? Certainly! Fun? Absolutely! Security? Pfft. A new generation of mysqlnd plugins will address some security and stability concerns. The code is in trunk at php.net

<?php /* Any PHP MySQL application */?>
|   ^
v   |
mysql or mysqli or PDO_MYSQL   mysql or mysqli or PDO_MYSQL
> mysqlnd library <-> mysqlnd plugin
PHP or C
>
MySQL Server

Load balancing + sharding + rewriting = 100+ lines

Round-robin load balancing may serve you fine, if you have all data on all your database servers and, it does not matter which server the application connects to. But there are many situations when you want to connect to a certain server or a certain group of servers. For example, you may want to send read-only questions to slaves in a replication setup and write queries to the masters. Or, you want to pick a server located close to the client to minimize network latency. You may even be using a complex sharding logic bringing up the need to design and adapt your application to the data distribution logic you have choosen. BTW, if you are using sharding, check out the asnychronous query feature of mysqlnd.

Assuming you didn't have the possibility to design your application from day one to support data distribution and sharding or you do not want to modify your application, consider a mysqlnd plugin. A mysqlnd plugin can analyze the query that your application runs and redirect them for load-balancing or sharding purposes to other databases (schemata), database server (replication: master/slave) or even rewrite the querries without having to touch your application. This will work best, if the application can hint the plugin how to do the rewrite. A possible way to hint the plugin is to embed SQL hints into the query string. This is the way choosen by the PECL/mysqlnd_qc query cache plugin.

link = connect("Host A"
query(link, "Query A");
mysqlnd plugin
100+ lines PHP
MySQL on "Host A"
runs "Query A"
query(link, "Query B"); MySQL on "Host B"
runs "Query B"
query(link, "Query C"); MySQL on "Host A"
runs "Query C"

Andrey has presented an interesting approach in PECL/mysqlnd_ms (master slave slit for replication). The plugin parses the SQL query run by the application and tries to detect if it is a read-only query scanning for a SQL hint. If the query is not read-only then it is to be send to the master server in an MySQL replication setup. No matter which database server your application is connected to, the plugin makes the query and its reply end up on the server it should, for example on a replication master.

Application connects to "Host A"
Runs "Query A", " Query B", "Query C"
  |  
connection handle for "Host A"
  |  
PHP MySQL extension (mysql, mysqli, PDO_MYSQL)
  |  
connection handle for "Host A"
  |  
mysqlnd library
mysqlnd plugin
100+ lines of PHP
Org conn to "Host A"   New proxy conn to "Host B"
|   |
"Query A", "Query C"   "Query B"
|   |
MySQL on "Host A"   MySQL on "Host B"

Say the application creates a connection to MySQL on "Host A" and gets a gets a connection handle from any of the PHP MySQL extensions (mysql, mysqli, PDO_MYSQL) in reply. It then runs a query on the connection which shall end up on "Host B" on the one connection handle for "Host A". The application does that because it has not been prepared for MySQL replication or sharding. A mysqlnd plugin can audit all queries, catch queries that shall go to "Host B", establish an additional connection to "Host B" in the hidden and without the know of your application to send the queries in question to "Host B". For the duration of the query execution and fetching query results the plugin will send messages from the new connection to "Host B" to the application via the applications connection handle to "Host A".All the time the application is using one connection handle. Depending on the query and the plugin the application gets replies either from MySQL on "Host A" or "Host B", "Host C", "Host ...", "Host X" - it is transparent from an application point of view. You get MySQL master slave replication or sharding support in your application without touching the application.


<?php
class load_balancer_query_sharding extends MySQLndUhConnection {

	protected $connections;
	
	public function __construct($verbose) {
		$this->connections = array();
	}
	
	public function parse_sql_hints($query) {
		$hints = array();		
		if (preg_match("@^/\*(.*)\*/@ismU", $query, $matches)) {			
			$parts = explode(",", $matches[1]);
			foreach ($parts as $part) {
				$tmp = explode("=", $part);
				if (isset($tmp[0]))
					$hints[trim($tmp[0])] = trim($tmp[1]);
			}
		}		
		return $hints;
	}

	public function query($res, $query) {
	
		$thread_id = $this->getThreadId($res);
		$hints = $this->parse_sql_hints($query);
		if (isset($hints['db'])) {
			/* new proxy connection */
			$mysqli = new mysqli('localhost', 'root', 'root', $hints['db'], 3306, '/tmp/mysql.sock');			
			$mysqlnd_conn = mysqlnd_uh_convert_to_mysqlnd($mysqli);
			$proxy = new MySQLndUHConnection();
			$this->connections[$thread_id] = array(
				'mysqli' => $mysqli,  /* need to copy here to increase refcount and prevent close */
				'mysqlnd_conn' => $mysqlnd_conn, 
				'proxy' => $proxy,
				'active' => true,
			);
			$ret = $this->connections[$thread_id]['proxy']->query(
				$this->connections[$thread_id]['mysqlnd_conn'], $query);
		} else {
			$ret = parent::query($res, $query);
		}
		
		return $ret;
	}
	
	public function getFieldCount($res) {
		$thread_id = $this->getThreadId($res);
		if ( isset($this->connections[$thread_id]) && $this->connections[$thread_id]['active']) {
			return $this->connections[$thread_id]['proxy']->getFieldCount($this->connections[$thread_id]['mysqlnd_conn']);
		}
		return parent::getFieldCount($res);
	}
	
	public function storeResult($res) {
		$thread_id = $this->getThreadId($res);
		if ( isset($this->connections[$thread_id]) && $this->connections[$thread_id]['active']) {
			$this->connections[$thread_id]['active'] = false;
			return $this->connections[$thread_id]['proxy']->storeResult($this->connections[$thread_id]['mysqlnd_conn']);
		}
		return parent::storeResult($res);
	}
}

/* Create proxy object and pass list of participating servers to it */
$debug_info = true;
/* TODO: change connection parameter to match your setup! */
$proxy = new load_balancer_query_sharding($debug_info, 'localhost', 'root', 'root', 'test', 3306, '/tmp/mysql.sock');
mysqlnd_uh_set_connection_proxy($proxy);

70+ lines of black magic PHP: proof-of-concept of load balancing with master slave split/sharding query redirection and rewriting. The example code is a proof-of-concept. PECL/mysqlnd_ms (master slave) is a bit closer to a complete solution and written in C. The example PHP code does not proxy all mysqlnd functions that need to be overwritten. This blog tries to illustrate the idea behind but does not aim to give a full-fledged production ready working example.

For demonstration purpose the 70+ lines of black magic parse the query string for a SQL hint. The SQL hint controls the behaviour of the plugin. Using SQL hints is not a must, if you can make a decision based on the query string alone, but SQL hints can make your life easier. You can use the SQL hint to tell the plugin that a query is a master query in a replication setup, you can implement a SQL hint that turns off any manipulations done by the plugin, you can pass user information for picking a shard... - whatever you need. You can develop the proxy plugin in the language all of you speak: PHP!

The class load_balancer_query_shardingunderstands only one hint: /* db=name */. Whenever the proxy detects the hint it will open a new proxy connection to the database and run the next query on the database. Something as basic as that has been used to force focussing on the proxy code. Proxy decision logic can be made as complex as you like later on.

$mysqli = new mysqli('localhost', 'root', 'root', 'test', 3306, '/tmp/mysql.sock');
$res = $mysqli->query("SELECT DATABASE() AS _greeting FROM DUAL");
$row = $res->fetch_assoc();
printf("MySQL schema is '%s'\n", $row['_greeting']);

$res = $mysqli->query("/* db=phptest */SELECT DATABASE() AS _greeting FROM DUAL");
$row = $res->fetch_assoc();
printf("MySQL schema is '%s'\n", $row['_greeting']);

$res = $mysqli->query("SELECT DATABASE() AS _greeting FROM DUAL");
$row = $res->fetch_assoc();
printf("MySQL schema is '%s'\n", $row['_greeting']);


nixnutz@112hp2510p:~/src/php/php-src/branches/PHP_5_3> sapi/cli/php ext/mysqlnd_uh/examples/blog_shard.php
MySQL schema is 'test'
MySQL schema is 'phptest'
MySQL schema is 'test'

Try the load_balancer_query_sharding proxy using the mysqli extension: connect to a database, fetch the value of DATABASE(), set a hint to trigger the creation of a proxy connection to another database, fetch the value of DATABASE() again using the one and only connection handle that you have opened in your script and you will see that the database has changed. Your query has been run on the proxy connection.

The Beginning

The PHP code presented does not aim to be production ready. The code>load_balancer_query_sharding proxy does not overwrite all of mysqlnd library functions that need to be overwritten. It also ignores the fact that a connection has a state. Depending on your application it may or may not be possible to switch connections at an arbitrary point during the run of the application. A plugin will never be able to know such details. You will have to think about ways to hint the plugin, for example, by embedding SQL hints into query strings.

The round-robin connection load balancer is close to some real solution. One would need to check performance, make configuration of the pool of participating MySQL hosts a bit more sophisticated, say make it read server lists from a Memcache deamon and that is about it. It is good enough to try on development machines for deciding if the architecture is something you want or not. If you find performance to be poor, I have never benchmarked PECL/mysqlnd_uh, this may not be killer argument against using mysqlnd plugins. You can always rewrite in C.

The mysqlnd plugin API documentation is on php.net, there are various blog postings, proof-of-concept plugins on PECL (PECL/mysqlnd_ms, PECL/mysqlnd_mc, PECL/mysqlnd_sip), plugins in alpha/beta status (PECL/mysqlnd_uh) and the client-side query cache plugin which is production quality (PECL/mysqlnd_qc). Comic lovers go here to see presentations.

Code, documentation and ideas are around. Put them together in Manmuckiel or whereever you live.

2 Comments

  1. Your load_balancer_rr->pick_server() function is not round robin when your using a random number to pick a server from an array. round robin so around an ordered list, eg: 1->2->3->1->2->3… etc.

  2. Thanks, for pointing this out, Paul! Yes, I made a very basic (ashaming) mistake. Using round robin instead of random is not much of a change. One would need to hold the servers – as you say it – in an ordered list and, pick servers from the ordered list, for example. in FIFO order. That server list could come from an array and be configured upon instantiation of the proxy plugin class or be hold in a non-volatile, persistent place. The array is only accessible for the current PHP interpreter. Using APC/Shared Memory one could share the same list of servers among many PHP processes of the same machine, using Memcache one could share the list of servers among many machines running PHP worker processes. There is hardly a limit in making it more complex.

    However, thanks for pointing out that I am mixing up random and rount robin algorithms!