Ulf Wendel

PHP: parallel, background, asynchronous fetch

Given the very low number of issues reported to us, we can assume that the core features of mysqlnd have matured. Therefore, we spend some time discussing parallel, background and asynchronous fetch ideas for mysqlnd. All have been on our brainstorming list from the beginning but it was just too early to even discuss them before these days. None of the ideas is new in any way as a recent discussion on the PHP Internals list on parallel database queries has shown. However, people seem to expect different things from it and are not sure how to implement it. We trapped into this ourself when Andrey wrote a background fetch proof-of-concept patch recently. Back one step: what is this about, what do you want, please comment!

The status quo

PHP scripts are not threaded. A PHP scripts cannot follow two distinct execution paths at the same time, unless a PHP extension provides measures to do so. PHP itself does not support parallel activities. This implies that it is not possible to perform background operations. When issuing a database query PHP waits for the database server to execute the query. PHP is blocked. How long PHP is blocked depends on the database interface. When using MySQL, PHP is either inactive until the MySQL Server has calculated the query result (unbuffered results, Prepared Statements) or until the query result has been calculated and all query results have been transferred to PHP (buffered results). The illustration shows the most common case: mysqli_query() used with default settings which causes a buffered result set.

Buffered fetch in detail

The two main causes for wait situations in this setup are highlighted using a yellow star symbol.

  1. PHP is blocked during the query execution
  2. PHP is blocked until results have been fetched and decoded

Is fetching too fast?

Fetching and decoding results is a very fast operation most of the time. If not, your network connection is the bottleneck, but not PHP. Andrey has tried to optimize the fetch operation using a background thread. We tested a patched version of mysqlnd which features a dedicated IO thread to perform an asynchronous, non-blocking fetch. That means, even for buffered results, mysqli_query() returned immediately after the query execution but before fetching any data. Fetching data was performed by the IO thread in parallel to PHP script execution. Did it help? Not really. Often the IO thread was done before the Zend Engine was able to execute the statement which followed the mysqli_query() call. With other words: fetching is so fast that you will hardly be able to use the time frame of the fetch operation for parallel activities from a PHP programmers view.

Background fetch: too fast?

Judging from the very first attempts with background fetch: the benefit is that one might be able to close a result set on the server a little earlier. However, if its up to PHP to call mysqli_free_result() and the fetch has been performed even before PHP can call any of the mysqli_fetch_*() functions, you gain very little from it. Actually, so far we have not been able to find a benchmark which proves this technique to be a good one.

If you are curious to know how the API could be like, here you go. But as said, it seems not very promising on the first look.

$con = mysqli_connect(...);

// returns after query execution and before buffering
$res = mysqli_query($con, 'SELECT ...', MYSQLI_BG_STORE_RESULT);

// quite often fetching has been done before we reach this line...
print mysqli_fetch_row($res);

mysqli_free_result($res);
mysqli_close($con);

The fun is in the query runtime

A more promising approach is to concentrate on the query runtime. It can range from milliseconds to hours. But take something more typical, say your web site always runs three independent queries with execution times of 500ms, 1000ms and 1500ms each. Let it take 100ms each to process the database results. Using the current synchronous API you’re done after 3300ms and you have spend 3000ms waiting. Using an asynchronous API, you are done after 1600ms and you have spend 1300ms waiting. Put in different numbers and you have different relations between synchronous and asynchronous. However, the potential becomes quite obvious.

t (ms) Synchronous Asynchronous
0 Send query 1 (500ms) Send query 1 (500ms), 2 (1000ms), 3 (1500ms)
500 Process query 1 (500ms) Process query 1 (500ms)
600 Send query 2 (1000ms) Done with query 1 (500ms)
1000   Process query 2 (1000ms)
1100   Done with query 2 (1000ms)
1500   Process query 3 (1500ms)
1600 Process query 2 (1000ms) Finished
1700 Send query 3 (1500ms)  
3200 Process query 3 (1500ms)  
3300 Finished  

The need for distinct connections

The MySQL Client-Server Protocol has no build-in support for asynchronous query execution. The state transition is always SEND – RECEIVE. After sending a query you always have to fetch its result in the very next step. You can send multiple statements at once – mysqli_multi_query(). But this does not free you from having to do the fetch right after the send: SEND(q1, q2) – RECEIVE(r1, r2). Given this, you will understand that an asynchronous PHP API cannot be build around just one connection. Within one connection you simply cannot run queries in parallel. If you try so, you will always stack them up and you will end up with the already existing. You will end up with mysqli_multi_query().

No need for a background thread?

Assuming that the key is in reducing wait times for query results, judging from our first experiences with a background thread for fetching and keeping in mind that simple solutions are the best to start with, there is little need for a sophisticated background thread design. Power users might profit from it, but most users should be perfectly fine with a simple poll()-based approach. Even that might be of use for only few users.

Suggesting an asynchronous API

A possible API consists of a new flag for mysqli_query() and a new function: mixed mysqli_poll(array $result_handles, int $timeout [, array &$invalid_handles]). A new flag MYSQLI_ASYNC makes mysqli_query() return to the PHP script immediately after the query has been send to the MySQL Server. Like ever, mysqli_query() returns a result handle. Using mysqli_poll() the PHP developer can check if a result handle is ready to send data because the corresponding query has been executed on the server. The function takes a list of result handles and returns whichever handle is ready for processing. If no handle is ready after $timeout milliseconds, mysqli_poll() returns false. It might happen that a handle becomes invalid while you wait for the query execution. For example, the network connection might get closed. mysqli_poll() could, this is just a suggestion, indicate this through an optional third parameter which is passed by reference. Its a bit like exec() or system().

Here comes some pseudo-code to demonstrate the suggested API.

// for two parallel queries, we need two connections
$con1 = mysqli_connect(...);
$con2 = mysqli_connect(...);

$results = array();
// returns immediately
$results[] = mysqli_query($con1, 'SELECT ...', MYSQLI_ASYNC);
$results[] = mysqli_query($con2, 'SELECT ...', MYSQLI_ASYNC);

// housekeeping
$timeout = 100;
$invalid_handles = array();
$results_left = count($results);
$num_timeouts = 0;

do {
  // poll result handles for $timeout milliseconds
  // return as soon as a handle is ready for reading
  // report errors through third parameter - similar to exec()
  $res = mysqli_poll($results, $timeout, $invalid_handles);

  if (!empty($invalid_handles)) {
    // issues with one of the handles, e.g. connection down
    foreach ($invalid_handles as $key => $handle) {
      printf('Problem with handle %d - %s',
        $key, mysqli_error($handle));
      $results_left--;
    }
  }

  if (is_resource($res)) {
    // result handle $res has data
   while ($row = mysqli_fetch_assoc($res))
     print_r($row);
   mysqli_free_result($res);
   $results_left--;
 } else {
   // timeout
   $num_timeouts++;
   if ($num_timeouts > 3)
     break;
 }

} while ($results_left > 0);

Ok, where’s the code?

Sorry, no code. First, we are all busy with other assignments these days and it will be a major task to implement an asynchronous API. The blog describes a potential way for buffered “normal queries”. It does not touch unbuffered result sets and Prepared Statements or even multi query. We won’t be able to implement a good solution within days.

Second and foremost: what do you think about it? Please comment.

17 Comments