Ulf Wendel

Non-blocking INSERT with mysqlnd

An INSERT does not delay me much. At least, it does not necessarily block a PHP MySQL script immediately. The asynchronous query feature of the mysqlnd library helps out. A walkthrough what mysqlnd can do today and could do in theory.

Traditional synchronous API

$ret = $handle1->query("INSERT ...");
$ret = $handle1->query("INSERT ...");

By default, mysqli_query() is a blocking API call. It sends the query to MySQL and waits for MySQL to reply. Let’s see what happens in general when executing two INSERT statements. PHP offers no parallel processing language primitives such as threads and thus, the two INSERT statements run in a serial fashion. Execution times add up.

Synchronous, blocking API, single connection
Connect INSERT INSERT  
5ms 20ms 10ms 35ms

Please note, all times are fictional. They have been choosen to highligh the basic principles.

The non-blocking version

Running statements in parallel using multiple connections makes things faster from a client perspective. Assuming the server can handle the load, the clients wait time is roughly the execution time of the slowest query. As a cutting edge PHP MySQL developer, you know syntax for achieving this already.


$handle1->query("INSERT ...", MYSQLI_ASYNC);
$handle2->query("INSERT ...", MYSQLI_ASYNC);
$all_handles = array($handle1, $handle2);
$processed = 0;
do {
  $handles = $errors = $reject = array();
  foreach ($all_handles as $handle) {
    $handles[] = $errors[] = $reject[] = $handle;
  }
  if (!mysqli_poll($handles, $errors, $reject, 1)) {
    continue;
  }
  foreach ($handles as $handle) {
    if ($handle->reap_async_query()) {
      $processed++;
    }
  }
} while ($processed < count($all_handles));

Asynchronous, non-blocking API, two connections
Connect INSERT  
5ms 20ms 25ms
Connect INSERT  
5ms 10ms 15ms

Cool, the client’s wall clock wait time is down from fictional 35ms (5 + 10 + 20) to 25 ms (5 + max(10, 20) ) 30ms (5 + 5 + max(10, 20)) [good find, Thomas!]. If you are really concerned about performance you may want to try further reducing the connection times. Persistent connections come to mind. As an aside: in 2012, we should call them pooled connections because their state is reset before being reused. Internally, the MySQL C API call mysql_change_user() call is used to reset. As this is a rather expensive and slow call, I’ve lowered the connection time only from 5 to 3 in the below example.

Asynchronous, non-blocking API, persistent connections
Con. INSERT  
3ms 20ms 23ms
Con. INSERT  
3ms 10ms 13ms

Still not good? If you compile PHP yourself, you can disable the call of mysql_change_user() which gives you true persistent connections and gets the connection overhead (in case of reuse) near zero…

How about Multiplexing?

… in theory, given specificaly selected queries, multiplexing may further improve performance. Multiplexing will share a network connection to MySQL among multiple user handles. Please note, I am not talking about anything that exists and is available from MySQL.

  Traditional   Multiplexing
Connection handle Handle 1 Handle 2   Handle 1 Handle 2
  | |   \ /
Network Connection Conn 1 Conn 2   Connection 1
  \ /   |
  MySQL   MySQL

Multiplexing is usually done to lower connection overhead and reduce the number of openend connections. The latter is good for the server. Usually the server is the hardest part to scale in a client-server setup. Thus, any technique to offload the server should be welcome.

Asynchronous, non-blocking API, multiplexing
Connect INSERT  
Connect 20ms 5ms
INSERT  
10ms 10ms

This looks great but, hey, how is it different from mysqli_query(..., MYSQLI_ASYNC) using a single pooled connection? There is none: the two INSERT end up on the same line being serialized. Likely, the synchronization on the line would even have a negative impact on fire and forget sequences.

To discuss multiplexing advantages, one would have to look at a situation where multiple PHP scripts run in parallel. So, why am I telling? It can be worth looking into the PHP MySQL API at a finer level than the first, random “PHP MySQL tutorial” listed by a search engine suggest.

Happy hacking!

@Ulf_Wendel Follow me on Twitter

5 Comments

  1. Small question:
    > $handle1->query(“INSERT …”, MYSQLI_ASYNC);
    > $handle2->query(“INSERT …”, MYSQLI_ASYNC);

    I need open 2 connections, so the total time is 30 ms? (5 + 5 + max(10, 20))
    Thanks!

  2. Have you given any consideration to INSERT DELAYED?

  3. Async is great, but error handling is not documented: https://bugs.php.net/bug.php?id=62101

  4. @Justin Sure, INSERT DELAYED is wonderful if its restrictions work for you, http://dev.mysql.com/doc/refman/5.6/en/insert-delayed.html . But INSERT DELAYED is depreated in MySQL 5.6. It will eventually be removed in future versions.

  5. @ Thomas, yes our documentation of the feature is not nice. It needs to be improved.

    Yes, you found another flaw. Not only a bug on the documentation but another small one in my math. I show an example with two handles for running two asynchronous queries on two connections. The connect shall take, for example, 5ms. First you spend 5ms to open the connection for handle1, then another 5ms to open the connection for handle2. This gives you the (5 + 5) + (max(10, 20)).

    However, this two connections, two INSERT example is only there to illustrate the idea. Imagine you have 1,000 INSERTs but four connections…

    Please, like with so many performance tipps, be sceptical and run your own tests. Don’t blame me if you find this trick to be of little use in your case. INSERT performance is not only an API question.

    Async may be much more useful for long running SELECTs. But that’s the example from the manual and I wanted to give a different one :-) .