Ulf Wendel

2011/04/18
by admin
1 Comment

PHP replication and load balancing plugin: master_on_write, documentation

Hours ago, I have committed the first documentation draft for PECL/mysqlnd_ms. The mirrors should show it on Saturday. PECL/mysqlnd_ms is a replication and load balancing plugin for the mysqlnd library. Its latest feature, master_on_write, helps to work around the issue of replication lag. If master_on_write=1, the plugin stops load balancing statements after the first non read-only statement has been run by a PHP script. Instead of running read-only statements on the slaves, the plugin runs all statements on the master. This is done to ensure that the client will always read the latest data, even the data just written and not yet replicated to the slaves. Thank you, Lukas for the blog comment with the idea!

Thank you, Chris for this presentation!

master_on_write – always use master after first write

MySQL replication is asynchronous. Whenever you perform a write operation on a MySQL master server, it takes some time until it is replicated on all slaves. The write operation is not blocked until all of the slaves have replicated it. The write returns immediately after its been executed on the master. Slaves may fall behind the master and not show the latest data updates.

  Master Slave A Slave B
t1 INSERT    
t2   (replicated) INSERT (stale) SELECT
t3     (replicated) INSERT

For example, you could have a replication setup with one master and two slaves. You use PECL/mysqlnd_ms to direct read-only statements to the slaves and all other statements to the master. Your slaves are a bit behind the master, one a little less (slave A), one a little more (slave B). At the point in time t1, you insert some data. PECL/mysqlnd_ms sends the query to the master. Immediately after the insert operation, in the same script, at time t2, you read from the table you have modified. PECL/mysqlnd_ms picks slave B to execute the SELECT statement. Slave B has not catched up on the latest updates from the master and returns stale results.

Continue Reading →

2011/03/02
by admin
2 Comments

PECL/mysqlnd_ms: let web requests be your M/S load balancer

Introducing the best load balancing policy for PECL/mysqlnd_ms: "random_once". A (PHP) web request is short-lived. A web server can handle multiple requests per second. There are many web servers running an application. For each web request choose a random slave to use for the rest of the request. Don’t do load balancing, don’t switch connections during a web request . Result: even load distribution, less connection overhead and less connections overall, cache loves you, less hassle switching stateful slave connections…

Load balancing policy random_once

Load balancing policies featured by PECL/mysqlnd_ms

As of today, PECL/mysqlnd_ms, our PHP mysqlnd toy plugin for replication load balancing, features four built-in load balancing policies:

  1. random: for each read query, choose a random slave
  2. round-robin: for each read query, choose the next slaves from the slave list in a round-robin fashion
  3. user defined: picking a server is left to the PHP script
  4. new random_once: for each web request choose a random slave for use with all read queries for the rest of the request.

The "random_once" is inspired by one of the most basic approaches to support MySQL replication in a PHP script, see the below snippet. At the beginning of the script run you open two connections: a master connection, a slave connection. The slave is picked in a random fashion. You send every write using $master connection handle to the master. And, of course, you detect and direct all reading queries to the slave via the $slave connection handle. In some cases, such as for critical write-read sequences if you must be able to read what you had just written, you skip the R/W split to run your critical write-read sequence on the master. In other words: you can overrule the default R/W split, if need be.

$master = new mysqli("master", "user", "password", "database", "port");

$slave_list = array("slave_a", "slave_b", "slave_c", "slave_d");
$slave_idx = mt_rand(0, (count($slave_list) - 1));
$slave = new mysqli($slave_list[$slave_idx], "user", "password", "database", "port");

/* For the rest of the script: manual R/W split */
$master->query("INSERT INTO table_a(col_a) VALUES (1)");
$slave->query("SELECT * FROM table_b");
$slave->query("SELECT * FROM table_a");

The plugin counterpart: C for comfortable

PECL/mysqlnd_ms implements R/W split detection in C. Automatic R/W split decisions can be overruled by the application using SQL hints. The new "random_once" load balancing policy implements the above logic for picking a slave. But because its a plugin and (semi-)transparent. You have only one connection handle to deal with in your PHP script. This makes it possible to use the plugin with existing applications without (significant) code changes. Just replace the host name with a config section name, that’s about it (full example).

/* automatic pooling */
$mysqli = new mysqli("mysqlnd_ms_config_section", "user", "password", "port");

/* automatic R/W split and balancing */
/* master */
$mysqli->query("INSERT INTO table_a(col_a) VALUES (1)");
/* slave x, x = random pick from S1, S2, S3 */
$mysqli->query("SELECT * FROM table_b");
/* slave x, no balancing, no switch to another slave */
$mysqli->query("SELECT * FROM table_a");

(config, see also full example)

[mysqlnd_ms_config_section]
master[] = master
slave[] = s1
slave[] = s2
slave[] = s3
pick[] = random_once

Less connections

The new load balancing policy significantly improves the plugins use of connections maintaining even load distribution. MySQL is certainly very fast establishing a connection, much faster than other databases. However, there is always a cost: the network connection itself and the server resources occupied by a server connection thread. The less connections a PHP web requests opens, the better.

This is where "random_once" shines. Let’s say your set up consists of one MySQL master and three MySQL slaves. To make HTML table hacking easier for me, let’s further assume you run only two web servers. Let’s take a snapshot of the system when both web servers run two PHP requests and each PHP script running has run one write query and four read queries through PECL/mysqlnd_ms.

Load balancing system snapshot: round robin policy
Web server A Web server B
Req A.1 Req A.2 Req B.1 Req B.2
Conn M Conn S.1 Conn S.2 Conn S.3
1 write 2 reads 1 read 1 read
  • 4 open connections to the master
  • 12 open connections to the slaves
    • Slave S.1: 4 connection, 8 reads
    • Slave S.2: 4 connections, 4 reads
    • Slave S.3: 4 connection, 4 reads

Compare the two system snapshot illustrations from above and the beginning of the blog ost. Using round robin policy to load balance queries during script run we see a total of 16 connections (4 master, 12 slave). With random-once it is only 8 connections (4 master, 4 slave). Query distribution is comparable in the example. In both cases two of the three slaves have run four reads and the third slave has run eigth reads. Over time, distribution will become more even with both random once and round robin. However, illustrating that is beyond my rusty HTML table hacking skills ;-).

Chance of better cache usage

If a PHP script runs many similar SELECT during a request, there is a chance that upon subsequent execution the query is served from the database server caches. But that hot cache chance is only given, if the load balancer does not switch connections in between two SELECT queries. The built-in random and round-robin load balancing policies do switch connections for every query. The new random-once policy picks one random slave when the PHP script starts and uses it until the script ends – together with the web request. This increases the chance of better cache usage.

Statefull vs. stateless slave connection

One big pitfall is system immanent to any driver redirecting queries over multiple connections: state. Connections have a state (see also "PHP: Master slave load balancing in mysqlnd – how transparent?") which may "forbid" switching. With "random_once" this is not solved but cured a bit because there is only one slave connection. It is still up to the application to hint the driver/plugin which connection (master or slave) to use for SQL units that rely on connection state. But there is only one slave connection, not many, making it a bit easier.

Comments welcome

Please recall that PECL/mysqlnd_ms is a proof-of-concept. Its far from stable, we are still trying to figure out how to do things. If you have any idea to share, … happy hacking!

2011/03/01
by admin
2 Comments

A missed load balancing opportunity?

Write-read sequences can be tricky for users of MySQL replication, if what has been written needs to be available for reading immediately. Slaves could be behind. You have to run the sequence on the master. If MySQL semisynchronous replication would tell the client which slave is in sync, a client library could select that slave for load balancing.

The more we look into load balancing for PECL/mysqlnd_ms, the PHP mysqlnd master-slave replication load balancing proof-of-concept plugin, the more restrictions and issues come up. An obvious restriction is that connections have a state.

No solution: binlog monitoring

But not only state is a "trouble maker". Critical write-read sequences can be another one. If the client must be able to "see" the result of a write send to the master immediately, the safest choice is to run the next read on the master. The downside is what the master gets some read load. Alternatively, the client could try to monitor replication binlog or replication lag to choose a slave who has already replicated the write. If it hasn’t been replicated yet, the load balancer could block the client for a wait period until a slave catches up before the load balancer sends the read to the master. Uuups, conceptually that is what semisynchronous replication does. Of course, the server does it in a much more elegant way. And, if the server told PECL/mysqlnd_ms which slave has catched up…

No solution: global transaction ID

Another solution to the critical write-read sequence is using a global transaction ID as proposed by Jan in "MySQL Proxy and a Global Transaction ID". The idea is to maintain some sort of transaction ID or record version which is replicated by the server. The load balancer will pick a slave that, for example, has already replicated the critical write by checking the global transaction ID.

Shall PECL/mysqlnd_ms use a global transaction ID?

PECL/mysqlnd_ms could do the magic of maintaining a global transaction ID in C. Let us know if you want this. To me it does not really sound as it is a bit of a hack, but who cares about hacks, if it solves problems…

2011/02/24
by admin
Comments Off on What if PECL/mysqlnd_ms does not implement failover?

What if PECL/mysqlnd_ms does not implement failover?

Implicit and automatic connection failover done by the driver is broken by design, correct? Should PECL/mysqlnd_ms implement automatic failover? Today Johannes showed me an IRC snippet about TAF (Transparent Application Failover). Maybe, I fail miserably but I don’t see a need for automatic failover in a master slave replication loadbalancing plugin. Let the application do it. In the end, this is also what TAF does – via a callback.

Say you run a query. The master slave load balancing plugin recognizes that it is a read query, a SELECT. The plugin picks a server for query execution, for example, the first of the configured slaves. The plugin sends the query to the slave but the slave is not reachable. Today, PECL/mysqlnd_ms should report an error. The plugin will not try to failover to another server.

/* A function from your database abstraction layer */
function my_query($mysqli, $query, $retry_count = 0) {

 if ($retry_count > 3)
    return false; 

 if (!($ret = $mysqli->query($query))) {
   /* Error: 1053 SQLSTATE: 08S01 (ER_SERVER_SHUTDOWN)  */
   if (1053 == $mysqli->errno) {
     /* ouch, slave is not available, try next... */
     return my_query($mysqli, $query, ++$retry_count);    
   } 
 }

 return $ret;
}

/* myapp refers to a group of master/slave servers */
$mysqli = new mysqli("myapp", "user", "password", "database", "port");
my_query($mysqli, "SELECT * FROM table");


(Remember that with PECL/mysqlnd_ms $mysqli represents a connection pool)

With today’s proof-of-concept your application is responsible for handling the unavailability of a server. Your application must handle database errors anyway: even MySQL can fail, believe it or not. In the example, I catch error 1053 (ER_SERVER_SHUTDOWN) as a possible hint that a slave has become unavailable. If so, I’m running the same query again. The plugin will – depending on the configured load balacing mode – pick another server and, run the query on, say, the second configured slave. A clean, simple design, no spaghetti. The application knows what to do, application decides.

Again, how transparent can (shall) the plugin be?

Once could argue that the plugin is no longer truly transparent. True, the application must be aware of replication being used and the application must have code handling what to do if an automatically selected slave server becomes unavailable. But the plugin cannot be 100% transparent anyway, as explained earlier in PHP: Master slave load balancing in mysqlnd – how transparent?. For example, the plugin cannot track user-defined SQL variables and ensure that those SQL variables have the same value on all configures MySQL servers.

Instead of giving up early we could also implement the above PHP logic (or a similar one) in C and do automatic failover within the plugin. But then, what will happen with this code sequence if there is a need for implicit failover when running the third query? When running the third query it is not allowed to switch the server, one must not break apart the unit of work. Instead of a unit of work constructed with SQL hints , I could have also used a SQL transaction to show my point.

$mysqli->query(sprintf("/*%s*/SET @a=1", MYSQLND_MS_SLAVE_SWITCH));
$mysqli->query(sprintf("/*%s*/SET @a=@a+1", MYSQLND_MS_LAST_USED_SWITCH));
$mysqli->query(sprintf("/*%s*/SELECT @a as _a", MYSQLND_MS_LAST_USED_SWITCH));

Support loves you: a manual full of execeptions…

If automatic failover cannot be handle a situation in a transparent way, it needs to be documented. I fear the list of exceptions will be long. People will be trapped, support will be busy (good for the business!) and in the end the recommendation from support to the customer will be to turn off automatic failover.

Let’s not do it then. Let’s educate the users to do failover themselves. Let’s not have callbacks like with TAF that move error handling (= failover) from where it belongs (in your app, where you run the query) to a callback which you attach afterwards to your app because you do not want to refactor it for its new duties. Let’s just accept that some things must be handled by the application. Such as failover.

I’m curious to read the blog comments… maybe I’ll be called crazy, … maybe some say I’m that wrong its not worth commenting…

2011/02/23
by admin
Comments Off on PHP: Application calling plugin – on communication

PHP: Application calling plugin – on communication

Imagine PECL/mysqlnd_ms could keep certain reads on certain slaves in a MySQL master slave setup. You would have a good chance to profit from hot database caches. You would not only distribute read load, scale out horizonally and so forth due to using MySQL replication. You would also optimize your reads for hot caches.

PECL/mysqlnd_ms is our master slave replication load balancing proof-of-concept mysqlnd plugin playground (see also previous blog posts). It allows you to define a callback which is responsible for picking the server that shall run a query. That userspace callback could be doing the read redirection optimization for hot caches.

The eye-catcher: hot cache optimization

(pecl_mysqlnd_ms_config.ini)

[myapp]
master[] = master_host:port_or_socket
slave[] = slave_host:port_or_socket
pick[] = user
pick[] = random

(PHP configuration file, php.ini)

mysqlnd_ms.enable=1
mysqlnd_ms.ini_file=pecl_mysqlnd_ms_config.ini

(Your PHP application)

function pick_server($connected_host, $query, $master, $slaves, $last_used_connection) {
  if (stristr($query, "FROM table_a"))
    /* optimize for hot cache */
    return "slave_for_table_a";
  else 
    /* use built-in random load balancing policy */
     return NULL;
}
mysqlnd_ms_set_user_pick_server("pick_server");

Tempting, easy, enjoy. You got the eye-catcher, lets talk about how the callback knows where to send the query.

How app and callback/plugin can communicate

Your applications can consist of tens of thousands of lines of code. Rules for optimizing query distribution could be very basic or very complex ones. If basic and, not changing often, the decision making rules are probably easy to implement. If complex, the callback may be the wrong place to decide. Guess you had 50 more of those stristr() calls I used above. It would be better if the callback got a short message from the application where to send the query. Take a step back, replace the word “callback” with “plugin built-in distribution logic”: same story. For complex distribution rules, the application needs to tell the callback/plugin in a simple, easy to understand yet rich language what to do.

The proof-of-concept is using SQL hints to let the application direct queries to any of the configured master or slave servers. A SQL hint is a SQL comment. If a system does not understand it, it ignores it. The hint is harmless. The syntax is short, parsing is fast. The hint can transport a message without the need for additional API calls to send the message. And, the SQL hint can be part of a SQL dump. Any system importing the SQL dump does not need to know anything about the SQL hint. The only drawback I can think of is that the SQL hint blows up the SQL a bit – marginally longer parsing times, marginally bigger log files. However, if that really is an issue, one could filter out the SQL hint after reading it.


/* a sql hint */SELECT * FROM test
/* a sql hint */INSERT INTO test(col_a) VALUES (1)

Searching for the words…

If we look at the SQL hint as sort of a protocol, we find an envelope and a payload: /* payload */. The payload holds the words of the language spoken by application and callback/plugin. The built-in language of the plugin has only three words to make processing fast: use slave, use master, use last used.

The hot-cache-optimization idea, if we wanted to have it built-in, would need another word, for example “use server id=x”. Does that sound familiar to you, server id? Sharding…

My attempt to collect feedback via blog comments worked quite well so far – thank you! It encourages me to ask another question.
Q: Shall there be such a built-in word "use server id=x"?

Simplified Jsonize

You user callback, responsible for picking a server and redirecting queries, could speak simplified jsonize. Simplified jsonize is a language with an indefinite amount of words. I think, the built-in language should not be so rich. It is too complex. But you may need a complex language to keep the rule engine in the callback both short and powerful.

function pick_server($connected_host, $query, $master, $slaves, $last_used_connection) {
  /* default:  built-in random/round robin/... load balancing policy */
  $server_to_use = NULL;

  if (preg_match("@^/\*.+\*/@ismU" $query, $matches)) {
    /* a sql hint */
    if ($json = json_decode($matches[1])) {
       /* we have parsed our own language */
      
        /* ... do something meaningful */
        $server_to_use = $json->properly_used_not_like_in_demo;
    }
  }

   return $server_to_use;
}

Simplified jsonize is using the json format to encode arbitrary words of language in an easy for you to parse way.

2011/02/21
by admin
2 Comments

PHP: Master slave load balancing in mysqlnd – how transparent?

How transparent does master slave load balancing support for mysqlnd have to be? Transparency, when driven to the extremes, has three limiting factory: power users not needing it, increase of messages send to MySQL, time intensive SQL monitoring and parsing. The PECL/mysqlnd_ms proof-of-concept mysqlnd plugin is almost transparent to make it easy to use. Almost because there are some design limits. Please let us know how you rate the severity of these limits. For example, can we ignore transactions? Your feedback on the previous blog post has been most helpful – thanks!

Give me four paragraphs, one minute, for the introduction and background.

A bit transparent
A foggy day. Somewhat transparent.

Change: a connection handle represents a pool

By default PECL/mysqlnd_ms hides what it does from the application developer. PHP application open a connection using any of the existing PHP MySQL APIs – mysqli, mysql, PDO_MySQL – and get a connection handle. Traditionally the connection handle represents one physical network connection to one server (1:1 relationship). Every MySQL query executed with the connection handle ends up on the same server.

/* all of those will be load balanced */
$mysqli = new mysqli("second_best_forum", "user", "password");
$mysql = mysql_connect("second_best_forum", "user", "password");
$pdo = new PDO("mysql:host=second_best_forum", "user", "password");

You made 50% of the introduction….

With the master slave load balancing plugin a connection handle represents a pool of physical network connections to the master and slave servers (1:n relationship). Although semantics have changed, the API is still the same. A connection handle shall behave like ever to make using the master slave load balancing plugin as easy as possible.

´

Standard   PECL/mysqlnd_ms
$connection_handle   $connection_handle
1   1
|   | | |
1   n
MySQL Server   Master Slave 1 Slave n

Connections have a state

The trouble is that every connection in the pool has its own state. Whenever the state of a connection changes one needs to decide if the change shall be reflected in the other connections from the pool. For example, if you change the character set when connected to Slave 1, do you expect Slave n, which may be running your next load balanced query, to use the character set you have just set? When it comes to character sets there is not much of a discussion. Character set play into string escaping (e.g. mysqli_real_escape_string()) and thus the change of the character set must be dispatched to all connections in the pool, must it?

You made it….

We are now at the hearth of the discussion. You can change the character set either via API calls (e.g. mysqli_set_charset()) or using SQL statements such as SET NAMES. It is strongly recommended to use the API calls. The client does the string escaping and the client needs to know the current charset. If you use the SQL command, the client will not recognize the new charset … welcome my dear website hacker.

Ouch: transactions

The client cannot recognize the connection state change, if done trough SQL statements unless the client monitors all SQL. But parsing all SQL statements on the client is not much of an option for performance reasons. Unfortunately there is no MySQL C API and PHP mysql, mysqli extension API counterpart to a very common set of SQL statements changing the state of a connection: START TRANSACTION, COMMIT and ROLLBACK. Ouch… PECL/mysqlnd_ms cannot handle it.

Incomplete list of issues

SQL API transparent?
ext/mysqli PDO_MySQL
START TRANSACTION n/a implicit no
COMMIT n/a PDO::commit no
ROLLBACK n/a PDO::rollBack no
SET NAMES verbally: n/a verbally: n/a no
SET NAMES related: mysqli_set_charset() related: PDO_MYSQL DSN yes
SET @user_variable n/a n/a no
SET @@server_variable n/a n/a no
n/a mysqli_options partly yes, after connection is established
SET auto_commit verbally: n/a verbally: n/a no
SET auto_commit related: mysqli_autocommit() related: PDO::autocommit() yes
PREPARE stmt_name FROM preparable_stmt related: mysqli_stmt_prepare() related: PDO::prepare() no
PREPARE stmt_name FROM preparable_stmt verbally: n/a verbally: n/a no

Question: Connector/J is using autocommit setting to as kind of a switch disable/enable load balancing, you want the same? There are API calls in the mysqli and PDO_MySQL extension to monitor it. The old mysql extension, as usual, is out. No matching API call.

Workaround: let the application handle it!

Many of the limitations can be lifted, if the application hints PECL/mysqlnd_ms. In other words: the load balancer is no longer transparent :-/. The workaround is easy most of the time. The hinting can be done using SQL hints or introducing additional API calls provided by PECL/mysqlnd_ms. For example, if you want to run a transaction on a slave, here is how you do.

$mysqli = new mysqli("second_best_forum", "user", "password");
/* enforce use of slave */
$mysqli->query(sprintf("/*%s*/START TRANSACTION", MYSQLND_MS_SLAVE_SWITCH));
/* disable load balancing using SQL hint */
$mysqli->query(sprintf("/*%s*/SET @myvar=1", MYSQLND_MS_LAST_USED_SWITCH));
$mysqli->query(sprintf("/*%s*/COMMIT", MYSQLND_MS_SLAVE_SWITCH));

My personal take is that moving some responsibilities, such as taking care of SQL transactions, SQL user variables, SQL syntax of prepared statements, … is perfectly valid. To handle those things in a transparent – from an applications point of view – way, the mysqlnd plugin would have to do time consuming SQL parsing. Most of you, who are using replication today, are aware of these issues. Thus, no big deal?

Thanks again for you feedback!

2011/02/18
by admin
6 Comments

PHP: … want transparent master slave load balancing in mysqlnd?

Are you interested in a master slave load balancing plugin for mysqlnd? It could have two operational modes. The automatic mode would operate transparent from an applications point of view. The manual mode would give you full control over all decisions. Automatic mode is for install and forget when you start with MySQL replication. Manual mode is for growing as your application evolves.

Start small…

Let’s assume you have to run a forum for a customer. You have picked, more or less randomly, the second-best PHP forum application. The customer’s website becomes popular. The forum is read-mostly and qualifies for scaling through MySQL replication. You add two MySQL slaves to your web farm. Now, how do you get the second-best PHP forum application to direct all writes to the MySQL master? Of course, the second-best application was not designed to support MySQL replication and you do not want to spend a fortune to change the application. The free solution of your problem is five minutes away:

  1. Get the mysqlnd master slave load balancing plugin
  2. Install the plugin
  3. Write a config file describing your replication setup

Precondition: the plugin exists (read on…) and you are using PHP 5.3 compiled with mysqlnd support.

First you get the plugin. Then you edit your php.ini configuration file.

extension=mysqlnd_ms.so
mysqlnd_ms.enable=1
mysqlnd_ms.ini_file=mysqlnd_ms_repl_setup.ini

Well done, your PHP mysqlnd library has now built-in replication load balancing support. Now you need to tell the load balancer where to find the master and the slave server and how to distribute read requests over the slaves. You need to create the "mysqlnd_ms_repl_setup.ini" referenced by your php.ini file.

[second_best_forum]
master[] = master.mynetwork
slave[] = slave_1.mynetwork
slave[] = slave_2.mynetwork

Well done, replication load balancing has been configured. Read requests will be distributed over the slaves in a round robin fashion. All of the following connects will give you load balanced connections. Whatever PHP mysql extension/API the forum application is using, it will end up being load balanced.

/* all of those will be load balanced */
$mysqli = new mysqli("second_best_forum", "user", "password");
$mysql = mysql_connect("second_best_forum", "user", "password");
$pdo = new PDO("mysql:host=second_best_forum", "user", "password");

/* goes to the master */
$mysqli->query("INSERT INTO users(name) VALUES ('Andrey')");
printf("insert_id = %d\n", $mysqli->insert_id);

/* goes to the slave in roundrobin fashion */
$res = $mysqli->query("SELECT 'Hello '");
var_dump($res->fetch_assoc());

Didn’t I say it would be transparent? If so, why touch the second-best forum applications source code ?! No need to. The section name "[second_best_forum]" from the configuration file "mysqlnd_ms_repl_setup.ini" could also read " [" host_or_ip_used_so_far]" Let’s assume that your forum had been configured to connect to "192.168.2.12" before you started with replication. Try this config…

[192.168.2.12]
master[] = master.mynetwork
slave[] = slave_1.mynetwork
slave[] = slave_2.mynetwork

… and grow

Your customer loves you. You solved the load problem within minutes using MySQL replication and the magic mysqlnd plugin for PHP 5.3 hiding the little mistake you made when picking the second-best PHP forum application. No application developer was involved. Costs have been saved. Unfortunately, the customers forum becomes more and more popular. You reach a point where you have to introduce a second master. You need to start to plan your data distribution. You need more control over the query distribution.

Continue Reading →

2011/01/21
by admin
1 Comment

PHP: Transparent load balancing and sharding with mysqlnd

Psst, you want some client-side MySQL load balancing with and without sharding for your PHP application? PHP 5.3 has something to offer for you. It is free. It requires no to very little changes to your applications. It is open source but works with closed source applications as well. You won’t find this information in any book, because some book authors still do not cover "mysqlnd". Read on for 30+ lines of PHP to add round-robin connection load balancing and, 70+ lines of PHP to add MySQL master slave replication or sharding support to your application without changing the application. Black magic from Manmuckiel.

The goal

Round-robin connection load balancing

Client   Server
connect(‘Host A’) mysqlnd plugin
30+ lines PHP
-> MySQL on ‘Host B’

Load balancing and sharding by query redirection and rewriting

Client   Server
connect(‘Host A’)
query(‘Query A’);
mysqlnd plugin
70+ lines PHP
-> MySQL on ‘Host B’
runs query ‘Query B’

The PHP 5.3 source tree contains a PHP licensed full-fledged MySQL client library called "MySQL native driver for PHP" or short "mysqlnd". Any of the three existing PHP MySQL extensions (mysql, mysqli, PDO_MYSQL) can make use of it. As a new development the mysqlnd library has a modular design featuring a mighty plugin interface.

The mighty plugin interface has attracted the attention of Mayflower/thinkPHP and David Soria Parra leading into the development of PECL/mysqlnd_uh (uh = user handler). By help of PECL/mysqlnd_uh it is possible to write mysqlnd plugins, the magic back boxes above, using PHP instead of C. For prototyping and spreading ideas PHP is the best choice, for production use consider rewriting mysqlnd plugins prototyped in PHP using C for performance and feature-completeness.

Client stack
Any application (Oxid, WordPress, phpMyFAQ)
MySQL extension (mysql, mysqli, PDO_MYSQL)
mysqlnd library
mysqlnd plugin
PHP or C based
|
MySQL Server

Any mysqlnd plugin runs on the client-side of you web architecture. Unlike with MySQL Proxy there cannot be a single point of failure in your architecture. Nor is there the risk of an overloaded central bottle-neck as all logic is on the client. Scaling horizontally, scaling by adding machines to your setup is as easy as it can get if using mysqlnd plugins. The mysqlnd client-side query cache plugin PECL/mysqlnd_qc is the best example.

Continue Reading →

2010/10/28
by admin
1 Comment

PHP: the 38 characters of the mysqlnd profiler

Recently I was forced to benchmark a mysqlnd plugin on Windows. X-Debug was no help. It worked flawless on Windows but it can’t tell you what goes on inside a PHP user space API call, for example, you will not know how much time mysqli_query() spends reading a result set. Very Sleepy is nice and the latest Microsoft Visual Studio profiler are probably wonderful but any of those external profiling tools did give me too fine-grained information. Also, they are external profiler which means you have to install extra software.

The mysqlnd statistics didn’t help me either. I didn’t need any aggregated values, I was curios if a certain function was the bottleneck. The solution:

  • extra statistics for critical sections in the plugin using the mysqlnd statistics framework (C coding)
  • dump profiling information into mysqlnd debug log (thanks Andrey)
  • make mysqlnd debug log available on Windows (when using VC9 or newer, thanks Andrey)

Since today mysqlnd adds profiling information to its debug log. Note the "(total = n own= n in_calls=n)" behind the closing tag of a function call. The times are given in milliseconds. For example, "_mysqlnd_pecalloc (total=401 own=401 in_calls=0)" tells you that mysqlnd has spend 401 milliseconds to allocate some memory. Why is that so slow? Simple: I’m using a VM.

   0:>mysqlnd_init
   1:| info : persistent=0
   1:| >_mysqlnd_pecalloc
   2:| | info : file=mysqlnd.c       line=2322
   2:| | info : size=780 ptr=0x2e08d30
   1:| <_mysqlnd_pecalloc (total=401 own=401 in_calls=0)
   1:| >mysqlnd_conn::set_state
   2:| | info : New state=0
   1:| <mysqlnd_conn::set_state (total=247 own=247 in_calls=0)
   1:| >mysqlnd_conn::get_reference
   2:| | info : conn=0 new_refcount=1
   1:| <mysqlnd_conn::get_reference (total=195 own=195 in_calls=0)
   1:| >mysqlnd_conn::init
   2:| | >_mysqlnd_pecalloc
   3:| | | info : file=mysqlnd_net.c   line= 886
   3:| | | info : size=156 ptr=0x2e086f8
   2:| | <_mysqlnd_pecalloc (total=586 own=586 in_calls=0)
   2:| | >mysqlnd_net_init
   3:| | | info : persistent=0
   3:| | | >mysqlnd_net::set_client_option
   4:| | | | info : option=202
   4:| | | | info : MYSQLND_OPT_NET_CMD_BUFFER_SIZE
   4:| | | | info : new_length=4096
   4:| | | | >_mysqlnd_pemalloc
   5:| | | | | info : file=mysqlnd_net.c   line= 586
   5:| | | | | info : size=4096 ptr=0x2e09070 persistent=0
   4:| | | | <_mysqlnd_pemalloc (total=599 own=599 in_calls=0)
   3:| | | <mysqlnd_net::set_client_option (total=4764 own=4165 in_calls=599)
   2:| | <mysqlnd_net_init (total=5406 own=642 in_calls=4764)
   2:| | >_mysqlnd_pecalloc
   3:| | | info : file=mysqlnd_wireprotocol.c line=2095
   3:| | | info : size=72 ptr=0x2e06f78
   2:| | <_mysqlnd_pecalloc (total=442 own=442 in_calls=0)
   2:| | >mysqlnd_protocol_init
   3:| | | info : persistent=0
   2:| | <mysqlnd_protocol_init (total=207 own=207 in_calls=0)
   1:| <mysqlnd_conn::init (total=15320 own=8679 in_calls=6641)
   0:<mysqlnd_init (total=26276 own=10113 in_calls=16163)

Here is how you create such a debug log with profiling information:

  • Get the PHP 5.3.4-dev source, the fresh meat not the stinky one
  • Compile PHP with mysqlnd support and enable debugging, for example, ./configure --with-mysqli=mysqlnd --enable-debug on Linux or configure --with-mysqli --with-mysqlnd --enable-debug on Windows.
  • Set the PHP configuration (php.ini) setting: mysqlnd.debug, for example, mysqlnd.debug="d:t:O,/tmp/mysqlnd.log" (38 characters ;-)) on Linux or mysqlnd.debug="d:t:O,mysqlnd.log" (34 :-o).
  • Run your PHP MySQL script

Pitfall: the mysqlnd debug option parser does not support file names that contain ":" You cannot use something like "C:\tmp\mysqlnd.log". Its about debuging and profiling. We will probably not lift that limitation without a strong need. But, of course, mysqlnd is Open Source. If you need it urgently, feel free…

2010/10/14
by admin
1 Comment

PHP: 62 characters to see all MySQL queries

Did you ever want to see the database queries an arbitrary PHP MySQL application runs? It takes two free downloads and 62 characters to see them.

  • Get PHP 5.3.4-dev
  • Get PECL/mysqlnd_qc
  • Add 19 characters to your configure line: --enable-mysqlnd-qc
  • Add 43 characters at the end of applications source, e.g. using the auto_append_file configuration setting: var_dump(mysqlnd_qc_get_query_trace_log());
  • Start your query analysis


array(329) {
  [0]=>
  array(8) {
    ["query"]=>
    string(27) "SET @@session.sql_mode = """
    ["origin"]=>
    string(1330) "#0 /home/nixnutz/Downloads/oxid/core/adodblite/adodbSQL_drivers/mysql/mysql_driver.inc(352): mysql_query('SET @@session.s...', Resource id #26)
#1 /home/nixnutz/Downloads/oxid/core/adodblite/adodb.inc.php(316): mysql_driver_ADOConnection->do_query('SET @@session.s...', -1, -1, false)
#2 /home/nixnutz/Downloads/oxid/core/oxdb.php(216): ADOConnection->Execute('SET @@session.s...')
#3 /home/nixnutz/Downloads/oxid/core/oxconfig.php(479): oxDb::getDb()
#4 /home/nixnutz/Downloads/oxid/core/oxconfig.php(406): oxConfig->_loadVarsFromDb('oxbaseshop')
#5 /home/nixnutz/Downloads/oxid/core/oxconfig.php(448): oxConfig->init()
#6 /home/nixnutz/Downloads/oxid/core/oxsupercfg.php(115): oxConfig::getInstance()
#7 /home/nixnutz/Downloads/oxid/core/oxutilsobject.php(207): oxSuperCfg->getConfig()
#8 /home/nixnutz/Downloads/oxid/core/oxutilsobject.php(109): oxUtilsObject->getClassName('oxutilsobject')
#9 /home/nixnutz/Downloads/oxid/core/oxutilsobject.php(74): oxUtilsObject->oxNew('oxUtilsObject')
#10 /home/nixnutz/Downloads/oxid/core/oxfunctions.php(284): oxUtilsObject::getInstance()
#11 /home/nixnutz/Downloads/oxid/core/oxutils.php(101): oxNew('oxUtils')
#12 /home/nixnutz/Downloads/oxid/core/oxfunctions.php(448): oxUtils::getInstance()
#13 /home/nixnutz/Downloads/oxid/index.php(72): require_once('/home/nixnutz/D...')
#14 {main}"
    ["run_time"]=>
    int(0)
    ["store_time"]=>
    int(0)
    ["eligible_for_caching"]=>
    bool(false)
    ["no_table"]=>
    bool(false)
    ["was_added"]=>
    bool(false)
    ["was_already_in_cache"]=>
    bool(false)
  }
...
}

If you want to know more, check this presentation.