Ulf Wendel

PDO_MYSQLND: R[a|u]mbling and a breeze of progress

The modification of PDO_MYSQL to support the MySQL native driver for PHP (mysqlnd) is progressing. We are using the project title “PDO_MYSQLND” for the modification. The goal of PDO_MYSQLND is to provide a PDO driver for MySQL which can be compiled either against the MySQL Client Library or against the MySQL native driver for PHP. This is the same type of modification we did with ext/mysql and ext/mysqli already.

The use of any of the libraries is transparent for the PHP user. You may continue to use the MySQL Client Library, like you do today with PDO_MYSQL, or give mysqlnd a try. The MySQL native driver for PHP (mysqlnd) is easier to compile as its tightly integreated into the PHP internals and ships with PHP as of version PHP 5.3. In case of ext/mysql and ext/mysqli our benchmarks and first user feedback indicates that mysqlnd gives you at least the same performance of libmysql. And sometimes it is faster and more memory efficient. Read below and see my blog archive for more on mysqlnd.

PDO_MYSQLND test results

Finally, the PDO_MYSQLND test results are looking fine. On a 64bit Linux box PDO_MYSQLND compiled against mysqlnd passes 98.8% of the tests and the code coverage is above 90%. When compiled against libmysql, the results are a little worse: 97.7%. In case of mysqlnd, the remaining 1.2% test failures are PDO (not driver) bugs. We are making use of the new --XFAIL-- feature of the PHP test tool run-tests.php to mark them as expected failures as long as we do not have the time to fix PDO itself as we need to finish our driver itself.

In total we run about 180 tests. Some 100 out of them are generic tests from ext/pdo/tests and some 80 tests are MySQL specific and reside inside ext/pdo_mysqlnd/tests. PHP 5.3 comes only with about 50 generic tests. We have written some 50 new tests (= doubled the number of tests) to test and guess the expected behavior of PDO. Unfortunately the PDO documentation leaves out many details and PDO does not ship with a comprehensive set of tests. As a result database vendors can develop PDO drivers which do not follow any strict standard as they do not need to pass a rich set of compliance tests.

Every PDO driver behaves differently

New generic tests, like ours, are facing the challenge that every PDO driver behaves slightly different! The differences cannot be called bugs as no compliance test suite exists and its not defined what is “correct”. Although we have run our new generic tests against SQLite, PostgreSQL, DB2 Express and Oracle Express during the development of the tests, I do not dare to call the tests “correct”. Therefore I would never commit the tests into the PHP repository. However, if anybody is interested in the tests and wants to tweak them to be good enough to be checked in to the PHP repository, feel free to contact me. The tests are Open Source and we will share them.

Cross-Platform testing

Once a MySQL Connector has matured a bit (= preview level) we routinely start cross-platform testing on all build hosts of the MySQL Server. At the time of writing this means some 30 different platforms. On each platform we have the latest version of MySQL 4.1, 5.0 and 5.1 installed to run the tests against them. The PHP we use gets updated from CVS before every test run.

In case of PDO_MYSQLND we have done the cross-platform testing only once in the past. The results have been OK – not perfect – for an early version. After fixing one 32bit issue, we will focus on the cross-platform testing.

PDO does not support the development of optimized drivers

If you ever happen to have a deeper look into the PDO source code and its architecture, you will soon realize how difficult it must have been to design the PDO core. Every database vendor provides a different C-API. PDO has defined a driver API to overcome all the different APIs. The drivers need to map their proprietary C-APIs to the PDO interface. Sometimes this works fine, sometimes its inefficient or even impossible to map the internal API of a driver to the internal PDO driver interface. The 32bit bug I was talking about in the last paragraph is one such example.

For the future it should be considered to give up the core plus driver approach in favor of a design similar to JDBC. Database vendors can implement their JDBC drivers however they want as long as the driver passes the JDBC compliance critieria. This would give the vendors more room to optimize their drivers for PHP.

The internal PDO driver interface defines a method get_col(pdo_stmt_t *stmt, int colno, char **ptr, unsigned long *len, int *caller_frees TSRMLS_DC) to fetch results on a per column basis. However, lets see what happens if a PHP user does fetch a row consisting of 10 columns using $row = $stmt->fetch(PDO::FETCH_ASSOC). PDO will fetch all 10 columns of the row individually using the internal driver interface method get_col(). Then PDO will copy the data obtained from the driver into a zval. PHP variables are implemented as zvals on the C-level.

This does not always match the underlying C-API very well. For example, when using Prepared Statements, the typical C-API calls are prepare(), bind_result(<column>, <output>), execute(), while (MORE_RESULT == fetch()) { <use output> }. The typical fetch() call provides the data on a per-row basis. Check the Oracle Call Interface or the MySQL C-API documentation for examples. Although the user requests the data on a per-row basis using $row = $stmt->fetch(PDO::FETCH_ASSOC) as well, PDO does blow up the one fetch() call to n get_col() calls with n being the number of columns in the row.

$row = $stmt->fetch(PDO::FETCH_ASSOC); // 5 columns
Number of calls PDO today PDO future? C-API (Prepared Statements)
1 stmt_fetch() stmt_fetch() stmt_fetch()
2 stmt_get_col(1) stmt_get_row() (data fetched into output buffers)
3 stmt_get_col(2)    
4 stmt_get_col(3)    
5 stmt_get_col(4)    
6 stmt_get_col(5)    

Don’t get me wrong. I do not know of a better internal PDO API. The huge number of fancy fetch options available in PDO is probably one reason why PDO needs the get_col() call. I am wondering if idea of a core should be skipped for future versions of PDO.

Teaching PDO to do fewer memory copies

The 32bit mysqlnd issue is another example of how tricky the internal PDO API can be. One of the great benefits of mysqlnd is its memory management. Virtually all PHP database drivers are based on a C libraries provided by the database vendors. The C libraries implement the communication protocol used between the database servers and the C application clients such as PHP. The MySQL Client Library (AKA libmysql) is such a typical library. When fetching data from the MySQL Server it holds the data twice: once in the internal network communication buffers and once in the internal data buffers. PHP will retrieve the data from the internal data buffers and copy it into zval-structures (the PHP variables). The MySQL native driver for PHP is cleverer. As long as the internal buffers are available and you do not modify the PHP variables returned by mysqlnd, the PHP variables will point to the internal buffers. A copy operation can be omitted. This saves both memory and CPU cycles. You can save up to 40% memory with mysqlnd.

PDO does not fully support this new technology. PDO’s get_col() function will always copy column data into its own zval although mysqlnd could provide PDO with a zval. Both ext/mysql and ext/mysqli will at least delay and often even avoid this additional copy operation when used with mysqlnd. But PDO will not make use of this mysqlnd feature.

Therefore, Johannes is working on a patch against PDO to support drivers which want to manage zvals on their own. The patch is rather short but you need to know what you are doing when working with pointers in C ;-). As a side-effect this patch with solve the 32bit issue I was talking about earlier.

PDO, please don’t set valid data to NULL

The classical MySQL Client Server protocol sends all data as strings. This can cause many cast operations on the client if the client tries to convert the strings back into their native data types. The introduction of a new binary protocol used for all Prepared Statements has solved this issue. The MySQL native driver for PHP makes use of the new binary protocol and returns native PHP types whenever possible.

PDO leaves all SQL type to PHP type mapping to the driver. A driver might or might not return PHP integer variables for INTEGER columns. But if a driver reports an integer value to PDO through metadata obtained before the first fetch happens, PDO will only accept integers. Any other value will be ignored and NULL gets returned to the PHP user. But what about dynamic SQL to PHP type mappings which do not decide on the type before the data has been fetched and its value is known? As PDO does not define any SQL type to PHP type mapping rules such a dynamic mapping is perfectly valid. Dynamic type mapping is allowed but hardly possible.

PDO_MYSQLND does such a dynamic mapping when compiled against mysqlnd. Take the value range of MySQL’s INTEGER UNSIGNED: 0 … 4294967295. On a 32bit system 4294967295 is larger than PHP_INT_MAX (2147483647). This means that some INTEGER UNSIGNED values can be mapped to PHP’s integer type but others need to be returned as double values. Whereas on a 64bit system PHP’s integer data type can hold the entire value range. The decision what data type is “best” (integer or double) cannot be made before the fetch is done. This is too late for PDO. PDO asks for the type before the fetch is done, mysqlnd indicates an integer, performs the fetch and returns a double. At this point PDO detects an out-of-range value and return NULL to the user.

You can’t call it a bug in PDO nor can you call it a bug in mysqlnd as its not defined how the type mapping shall be done. It might be possible to hack around the issue and fake the type reported to PDO. Luckily this issue can be solved by making PDO accept zvals from the driver and not copy data if not needed. See above: Johannes is working on it.

mysqlnd solves 2050 – Row retrieval was canceled by mysql_stmt_close() call

The PDO API does not provide a call to close a prepared statement handle. You can free a result set associated with a prepared statement using PDOStatement::closeCursor() but there is no call to free the prepared statement itself. Its a good practice to free prepared statements as early as possible. Every prepared statement requires database server resources. To explicitly close a statement you can use unset(<PDOStatement>) to destroy the statement object. unset() will call the destructor of the statement object. The destructor has been registered by PDO and PDO will close the statement and free all resources associated with it.

Usually, you will not miss the extra API call to close a prepared statement and PDOs destructor approach will work fine. But if you use PDO_MYSQL or PDO_MYSQLND compiled against libmysql, you might miss it. Its many conditions that have to come together but you can provoke the MySQL error message 2050. PDO has no chance to work around it. The easiest work around is to use mysqlnd.

The issue can only be reproduced with libmysql and unbuffered result sets. Unbuffered result sets are not the PDO default. You have to explicitly enable unbuffered fetches. Under these circumstances the second call to PDO::query() will fail.

$db = new PDO("mysql:dbname=phptest;unix_socket=/tmp/mysql.sock", "user", "pass");
// turn off prepared statement emulation
$db->setAttribute(PDO::MYSQL_ATTR_DIRECT_QUERY, 0);
// turn on unbuffered result sets
$db->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);
$stmt = $db->query("SELECT 1");
var_dump($db->fetchAll(PDO::FETCH_ASSOC));
// 2050 - Row retrieval was canceled by mysql_stmt_close() call
$stmt = $db->query("SELECT 2");

Is is not a bug that you get an error. Its a race condition for the PDO destructor approach to close prepared statements. Lets analyze step by step what happens. The first query gets executed and all results will be fetched. A new prepared statement object gets created from the SQL command SELECT 2 using the PHP expression $stmt = $db->query("SELECT 2");. PHP will execute the right side of the expression first before it takes care of the left side of the expression. On the right side is $db->query("SELECT 2");. PDO will prepare and execute the SQL statement and return a new PDOStatement object. The execution continues on the left side of the expression by assigning the new PDOStatement object to the variable $stmt. At this point PHP detects that no references to the PDOStatement object stored currently stored in $stmt exist. PHP calls the destructor of the first PDOStatement object and assigns the new PDOStatement object to $stmt. The destructor tries to close the prepared statement handle. But that is not allowed at this point. MySQL has executed a prepared statement (right side of the expression) and expects you to fetch all data before sending new commands. But instead of fetching unbuffered data PDO sends a close command to the MySQL server: 2050. MySQL does its best to recover the line, cancels the fetch and bails out about the close command.

How to work around this? Use mysqlnd, explicitly close the first prepared statement using unset($stmt) before storing a new prepared statement in the same variable $stmt, use $stmt1 for the first prepared statement and $stmt2 for the second prepared statement, don’t use unbuffered fetch… – many possible solution but no automatic way.

A breeze of progress

You might have an idea now why we are progressing slowly. Anyway, the next alpha release is not far away now.