Ulf Wendel

PDO_MYSQLND: Prepared Statements, again

Server-side Prepared Statements are an outdated technology from ancient times, are they? Brian gives a long list of arguments in his blog posting “Prepared Statements, Musings” why one should think twice before using server-side prepared statements. PDO does enforce the use of prepared statements for all statements which return a result set. Good or bad?


What are (server-side) Prepared Statements?

Server-side Prepared Statement are SQL statements which get prepared once and can be executed multiple times after the preparation. During the prepare state the database server will validate the SQL syntax, create a prepared statement object at the server and return a prepared statement handle to the client. Different database servers will perform different actions as part of the prepared statement object creation. Typically the prepared statement object at the server contains at least the parse tree of the SQL statement. More things can happen at this stage. Your database server might reserve and/or allocate internal server resources and buffers or run SQL optimizations.

The client can execute the prepared statement as many times as he wants. Upon subsequent executions, there is no need to parse the SQL statement again. The parse tree is already associated with the prepared statement object stored at the server. As a side-effect there is no need to transfer the entire SQL statement over the wire upon each execution. This can result in CPU and network bandwidth savings.

Comment Client Server
Start prepare(“SELECT id FROM test WHERE name = ?”)  
Requires Server resources   Parse and validate SQL, create PS object, return handle
Bind example bind($stmt, $name)  
No new parse run, transfer bind data only execute($stmt) Insert bind value into parse tree, execute statement, compute result set
Unbuffered result set occupies server resources fetch($stmt) Keep result set until all rows are fetched or result set has been freed explicitly
Multiple execution possible execute($stmt) Insert bind value into parse tree, execute statement, compute result set

Prepared Statement are often considered as secure because of bound parameters. Application programmers can bind a variable (or a value) to placeholders inside of the prepared statement. For example you can bind the PHP variable $name to the prepared statement SELECT id FROM test WHERE name = ? during prepare. Whenever you execute the statement, the placeholder will be replaced with the current value of $name. Only the value of $name will be transferred to the server. The server reads the value and checks if it is a valid input. Note that I did not add quotes around the questionmark. The server knows from the parse tree what to expect and how to handle the input in a secure manner. You cannot fool the server with $name = '"' or similar.

Pro’s and Con’s

The above sounds great such as the PDO introduction on prepared statements at http://php.net/manual/en/pdo.prepared-statements.php does. But before you go out and wipe the prepared statement syntax onto the back of each of your developers, try to understand pro’s and con’s of the architecture.

Pro Con
  • Parse SQL only once and “cache” parse tree etc.
  • MySQL: Parsing can take up to ~25% of the run-time of a short query
  • A prepared statement and its parse tree occupies limited server resources
  • Multiply the number of concurrent connections by the number of prepared statements per connection to know why your server eats hardware – read Brian’s post
  • To to scale a “cache” move it from the server to the client. Scaling servers is tricky
  • For statements executed only once prepare() causes an unnecessary round-trip and “caching” work
  • MySQL: no “caching” of the execution plan, you won’t save the SQL optimization step
  • Secure bind parameters thanks to parse tree access
  • Preventing SQL injections should be an application task not a task of the (interchargable) storage layer
  • mysqli_real_escape_string() is available to build secure SQL statements
  • Input validation requires application logic not available in the database
  • SQL parsers could be made available at the client as well for the emulation of parameters
  • Bind parameters et. al. mean less network traffic
  • For statements which are run only once its not true due to the unnecessary prepare() round-trip
  • You could achieve the same with a clever emulated, client-side prepared statements friendly protocol
  • No portable standard syntax (ordinal vs. named parameter) exists
  • Prepared Statements and cursors go hand in hand: scrolling possible
  • Unbuffered result sets require and block server resources
  • Risk to forget to close as early as possible
  • For small result sets buffering and client-side scrolling is a good alternative
  • MySQL: cursors are materialized, they always go to disk

AFAIK, server-side prepared statements have been designed with clients in mind which remain connected to the server for a relatively long time. Think of good old centralized host applications. You power up 100 thin-clients in the morning hours which remain connected to the server all day. The clients run one application with very little dynamic SQL. “Caching” SQL statements, using server-side prepared statements makes perfectly sense in such an environment.

Compare this type of application with your PHP web application. Your PHP web application is request based and the life time of your scripts is very short. Prepared Statement handles do not survive a request. Go through your application and check how many statements you run more than once. If you rarely run the same query twice, don’t be surprised to see a performance penalty when using prepared statements.

I do not understand why PDO does enforce the use of prepared statements for all queries returning result sets (luckily you can work around this with MySQL). I do understand why JDBC is build around prepared statements. But PHP is not Java: no desktop application, no application server. PHP users should be given the choice to select the proper tool for the task.

MySQL myths

Its a myth that recent versions of the MySQL server still do not support elementary SQL statements to be prepared. The list of statements which can be prepared is not complete. But it covers all of the frequently used statements, see also http://dev.mysql.com/doc/refman/5.1/en/c-api-prepared-statements.html:

  • CREATE TABLE, DELETE, DO, INSERT, REPLACE, SELECT, SET, UPDATE, and most SHOW statements
  • ANALYZE TABLE, OPTIMIZE TABLE, REPAIR TABLE
  • CACHE INDEX, CHANGE MASTER, CHECKSUM {TABLE | TABLES}
  • {CREATE | RENAME | DROP} DATABASE, {CREATE | RENAME | DROP} USER
  • FLUSH {TABLE | TABLES | TABLES WITH READ LOCK | HOSTS | PRIVILEGES
    | LOGS | STATUS | MASTER | SLAVE | DES_KEY_FILE | USER_RESOURCES}
  • GRANT, REVOKE, KILL, LOAD INDEX INTO CACHE, RESET {MASTER | SLAVE | QUERY CACHE}
  • SHOW BINLOG EVENTS, SHOW CREATE {PROCEDURE | FUNCTION | EVENT | TABLE | VIEW}
  • SHOW {AUTHORS | CONTRIBUTORS | WARNINGS | ERRORS}, SHOW {MASTER | BINARY} LOGS
  • SHOW {MASTER | SLAVE} STATUS, SLAVE {START | STOP}, INSTALL PLUGIN, UNINSTALL PLUGIN

If you hit an unsupported statement its not a bit deal: catch the 1295, ER_UNSUPPORTED_PS error and fall back to the non-prepared statement API. PDO does that for you – its transparent for the PHP user. You don’t need to worry about it.

The only limitation you might hit is on stored procedures returning result sets. For example, CREATE PROCEDURE p() BEGIN SELECT id FROM test ORDER BY id ASC LIMIT 3; SELECT id, label FROM test WHERE id < 4 ORDER BY id DESC LIMIT 3; END; creates a procedure which returns more than one result set. When using PDO_MYSQL and the MySQL client library (C API, formerly known as libmysql) you will not be able to use bool PDOStatement::nextRowset (void ) to fetch all result sets. However, PDO_MYSQLND and mysqlnd will lift this limitation. The following code from the new unpublished test pdo_mysql_stmt_nextrowset.phpt does pass our internal testing of PDO_MYSQLND and mysqlnd.


$db->exec('DROP PROCEDURE IF EXISTS p');
$db->exec('CREATE PROCEDURE p() BEGIN 
SELECT id FROM test ORDER BY id ASC LIMIT 3; 
SELECT id, label FROM test WHERE id > 4 ORDER BY id DESC LIMIT 3; 
END;');
$stmt = $db->query('CALL p()');
do {
  var_dump($stmt->fetchAll(PDO::FETCH_ASSOC));
} while ($stmt->nextRowSet());
var_dump($stmt->nextRowSet());

MySQL specifics

What you should be aware of when using server-side prepared statements with MySQL is the syntax for placeholders and the different communication protocol used by MySQL.

The syntax used for placeholders varies among different database vendors. MySQL has chosen to support the ordinal parameter syntax (SELECT id FROM test WHERE id = ?, bind(1, $myvar)). The MySQL server does not support named parameters (SELECT id FROM test WHERE id = :myid, bind(“myid”, $myvar)). Named parameters are emulated by PDO. I do not know of any standard on prepared statement parameters. Other databases do not support ordinal parameters but do support named parameters. In case of those other systems, PDO will emulate the ordinal parameters.

MySQL will always use the “binary protocol” with prepared statements. Traditionally the MySQL server has converted all data into strings before sending them to the client. When not using prepared statements, the client will retrieve all results as strings and has to convert them into the “correct” datatype, if need be. The binary protocol removes this conversion overhead.

The ext/mysqli PHP extension supports the binary protocol idea and tries to return native PHP types whenever possible. For example, when fetching an INT column using ext/mysqli and prepared statements, ext/mysqli will return an PHP integer variable to the caller. PDO_MYSQL, on the other hand, does always convert a value from an INT column into a PHP string no matter whether you are using server-side prepared statements or the prepared statement emulation. PDO_MYSQLND does fix this shortcoming and tries to return a proper native type whenever possible. If you do not want PDO_MYSQLND to return “correct” types, you can use PDO::setAttribute(PDO::ATTR_STRINGIFY_FETCHES, true) to enforce the PDO_MYSQL behaviour.

What enforcing Prepared Statements means

Not every database system supports prepared statemetns. Therefore PDO includes a prepared statement emulation.

PDO consists of a core and database specific drivers. The drivers provide an interface which the core uses to execute your PHP function calls. The prepared statement emulation is part of the core. Consequently, every driver does use the same emulation.

PDO’s prepared statement emulation

The prepared statement emulation contains a simple SQL parser as it must be able to handle placeholder substitution. But there is a problem with this approach. PDO does not aim to provide any SQL abstraction and it has to support all the different SQL dialects of the database vendors. Search bugs.php.net/ and pecl.php.net/bugs/ to learn how tricky it is to support each and every SQL dialect:

The PDO prepared statement emulation is still not free of bugs after many years as it seems. I believe that it will never be and drivers should be allowed to use their own specialized and optimized SQL parsers. Here is a simple example of what I mean. The PDO documentation of PDO::prepare() is correct when it describes the return value but it does not mention a little pitfall of the PDO design:

If the database server successfully prepares the statement, 
PDO::prepare() returns a PDOStatement object. 
If the database server cannot successfully prepare the statement, 
PDO::prepare() returns FALSE.

What the documentation leaves out that the PDO prepared statement emulation can fool you. When using the emulation the prepare() call will not invoke the database server. The database server will not validate your SQL statement. The PDO SQL parser will do the validation and accept invalid SQL such as SELECT ice FROM arctica AND antarctica, see also bug #44169. Both PDO_MYSQL and PDO_PGSQL can fool you, if you use their default settings. The only solution for MySQL is to explicitly disable the PDO prepared statement emulation approach by setting the PDO attribute PDO::ATTR_EMULATE_PREPARES to false.

For a complete semantical validation of a SQL statement you will always have to do a round-trip to the server. Even specialized SQL parser inside the PDO drivers would not be able to detect missing tables or misspelled columns. However, they could at least do a proper static syntactical analysis of the statement and detect “gotcha’s” like SELECT ice FROM arctica AND antarctica.

One last example on the fun you can have with the PDO prepared statement emulation is this SQL statement:


nixnutz@ulflinux:~/php53> valgrind sapi/cli/php -r '
$p = new PDO("mysql:host=127.0.0.1;dbname=test", "e;root", "root"); 
$s = $p->prepare("SELECT \" FROM DUAL WHERE 'a' = :placeholder"); $s->
execute(array(":placeholder" => "row"));'
[...]
==19825== Conditional jump or move depends on uninitialised value(s)
==19825==    at 0x4FD162: scan (pdo_sql_parser.c:286)
==19825==    by 0x4FD34D: pdo_parse_params (pdo_sql_parser.re:97)
==19825==    by 0x4F51F9: zim_PDOStatement_execute (pdo_stmt.c:482)
==19825==    by 0x8104BC: zend_do_fcall_common_helper_SPEC (zend_vm_execute.h:313)
==19825==    by 0x811817: ZEND_DO_FCALL_BY_NAME_SPEC_HANDLER (zend_vm_execute.h:422)
==19825==    by 0x80F04B: execute (zend_vm_execute.h:104)
==19825==    by 0x7C9054: zend_eval_string (zend_execute_API.c:1310)
==19825==    by 0x7C9297: zend_eval_string_ex (zend_execute_API.c:1345)
==19825==    by 0x88C7A5: main (php_cli.c:1178)
[...]
==19825== Process terminating with default action of signal 11 (SIGSEGV)
[...]
Segfault

"Abusing" Prepared Statements – 10% performance loss

To demonstrate that it is not always desired to use prepared statements I have done a little benchmark with PDO_MYSQLND on my desktop system. I did on purpose use PDO in a sub-optimal way to show my point of “give the user the choice”. An unexperienced PDO user might not even be aware of the fact that all of the below will be executed as prepared statements. The way I use the PDO API does not make you think of prepared statements in the first line.


$label = str_repeat('a', 255);
for ($i = 0; $i < $num_rows; $i++) {
  $db->exec('INSERT INTO test(id, label) VALUES (' . $i . ', "' . $label . '")');
}

My little benchmark consists of the above INSERT and the below SELECT. If you happen to see the INSERT coded this way, introduce mandatory code-reviews… but read on!


for ($i = 0; $i < $num_runs; $i++) {
  $stmt = $db->query('SELECT id, label FROM test ORDER BY id ASC');
  $rows = count(($tmp = $stmt->fetchAll(PDO::FETCH_ASSOC)));
  if ($rows != $num_rows) {
     $errors[] = sprintf('Result seems wrong num_rows = %d, num_runs = %d, emulated = %d',
       $num_rows, $num_runs, $emulated);
     return false;
  }
}

The test table is CREATE TABLE test(id INT, label VARCHAR(255)) ENGINE=MyISAM. The micro-benchmark is run on an AMD Athlon 64 Dual Core 5000+ with 4GB of memory. The MySQL Server 5.1.25-rc runs on a Core2Duo E6750 @ 2.66GHz with 4GB of memory. The computers are connected via 100/1000 MBit – not very exiting hardware. Both computers run on OpenSuse 10.3.

  Without Prepared Statements Server-side Prepared Statements
  PDO::ATTR_EMULATE_PREPARES = true PDO::ATTR_EMULATE_PREPARES = false
rows = 50, runs = 5000
INSERT 0.0111s 0.0098s
SELECT 2.9959s 3.2884s
Total 3.0007s 3.2982s
rows = 500, runs = 5000
INSERT 0.0975s 0.0976s
SELECT 13.4670s 15.4043s
Total 13.5645s 15.5019s

If you turn on the emulation of prepared statements by setting the PDO attribute PDO::ATTR_EMULATE_PREPARES to true, PDO will parse the SQL itself for parameter substitution and use the non-prepared statement API of the underlying database system. In case of MySQL, PDO will fall back to the C-call mysql_real_query(). PDO will not use mysql_stmt_prepare(), mysql_stmt_execute() and other calls from the prepared statement API. So to say, this is a backdoor which allows you to disable the use of prepared statements and ignore the PDO principle of enforcing the use of server-side prepared statements. However, even if you disable the use of server-side prepared statements you end up using a client-side prepared statement emulation. In any case you will be using a prepared statement style API for fetching results.

How do the two ways of running the INSERT and SELECT compare? Server-side prepared statements are some 10% slower – however, note that I’m running the SELECT 5.000 times to make the effect visible.

Using a Prepared Statement for the INSERT – 10% gain

Let’s have a look at the INSERT statement. Its seems to qualify for a prepared statement as you can use bind parameters. Upon each execution you will not send the entire INSERT over the line but just the parameters. And you can prepare and parse it once and execute it many times. Maybe this is faster?


$stmt = $db->prepare('INSERT INTO test(id, label) VALUES (?, ?)');
$stmt->bindParam(1, $j);
$stmt->bindParam(2, $label);
for ($j = 0; $j > $num_rows; $j++) {
  $stmt->execute();
}

  Without Prepared Statements Server-side Prepared Statements
  PDO::ATTR_EMULATE_PREPARES = true PDO::ATTR_EMULATE_PREPARES = false
rows = 50, runs = 100
exec() / string concatenation 1.6871s 1.4904s
bindParam() 1.5168s 1.2624s
rows = 500, runs = 100
exec() / string concatenation 13.3663s 12.2627s
bindParam() 13.0785s 11.1666s

Server-side prepared statements and using bindParam() has the edge – its some 10% faster. However, do you want to know how to make it really fast? Try the MySQL-specific multi-insert syntax: INSERT INTO test(id, label) VALUES (1, "a"), (2, "b"). Its 12x faster for this micro-benchmark.

Note that I’m using MyISAM which is a non-transactional engine. With transactions and transaction logs you’ll have to use try-and-error to find the optimal size of a transaction. In the worst case yout multi-insert transaction does exceed internal buffers for transaction handling and waits will occur. Don’t forget the time required for a ROLLBACK of a large transaction either in case the INSERT operation fails.

  Without Prepared Statements Server-side Prepared Statements
  PDO::ATTR_EMULATE_PREPARES = true PDO::ATTR_EMULATE_PREPARES = false
rows = 500, runs = 100
bindParam() 13.0785s 11.1666s
multi-insert / string concatenation 0.8966s 0.9061s

Using a Prepared Statement for the SELECT

The SELECT statement from my example does not profit from being run as a prepared statement. It contains no parameters and its gets executed only once.: 10% performance loss, see above. The likely reason is the additional round-trip caused by preparing the statement. The performance-loss will vary with the time required for the round trip and its relation to the fetch time as well as the number of executions.

If the number of executions grows but only one row will be fetched and the fetch time remains very small, the picture shows that server-side prepared statements and the PDO emulation are about equal fast. Preparing SELECT label FROM test WHERE id = ? once and fetching 256 rows of n-bytes each from the table test gives the below results. Differences below 3%, like those for 256 bytes per row, are within the metering precision and are not significant.

  Without Prepared Statements Server-side Prepared Statements
  PDO::ATTR_EMULATE_PREPARES = true PDO::ATTR_EMULATE_PREPARES = false
rows = 50, runs = 256
20 bytes 3.0748s (100%) 3.2495s (106%)
255 bytes 3.3829s (100%) 3.4970s (103%)
2048 bytes 4.8837s (100%) 4.7942s (98%)
4096 bytes 6.0851s (100%) 5.7563s (94%)

The results seem to proof that the use of server-side prepared statements makes most sense for large result sets. Therefore, I have run another micro-benchmark which creates a table with many columns, prepares SELECT * FROM test and executes it many times to stabilize the results. All rows will be fetched after each execution.

  Without Prepared Statements Server-side Prepared Statements
  PDO::ATTR_EMULATE_PREPARES = true PDO::ATTR_EMULATE_PREPARES = false
rows = 2000, runs = 10
5 columns, 226 bytes 0.1119s (100%) 0.1005s (89%)
10 columns, 442 bytes 0.2661s (100%) 0.1773s (66%)
40 columns, 2493 bytes 1.0757s (100%) 0.9173s (85%)
100 columns, 7952 bytes 2.9114s (100%) 2.3648s (82%)
500 columns, 40813 bytes 15.9159s (100%) 11.7043s (73%)

It seems that server-side prepared statements clearly win this micro-benchmark. However, whenever you read benchmark results check how your system performs for the same benchmark. And try to understand what the benchmark does. For example, the above micro benchmark runs the same query 10 times. Will your application run the same SELECT statement 10 times? Does your application deal with result sets as large as 2000 rows of 500 columns with 40813 bytes per row (2000 x 40813 = 77MB result set)?

Prepared statements a win?

Looking at the above micro-benchmark results it needs further testing to say if server-side prepared statements will result in a performance win. For example, I did not extensively test BLOB columns for which – in case of MySQL – you can expect a performance benefit when using prepared statements and their “new” client-server protocol.

If performance is not the primary reason for using server-side prepared statements why do so many people advocate their use for PHP applications? I can only agree with Lukas, who has blogged about prepared statements when I already had started drafting this posting, that its about PHP users looking for a convenient way to secure their applications against SQL injections. However, I don’t think that prepared statements have been designed for this.

There seems to be a strong demand among the PHP user base for a fool-safe way to secure database calls against SQL injections. PHP application developers seem to want to move the application developers task of securing their application towards the storage layer. Strictly speaking mysqli_real_escape_string() and disabling MySQL’s multi-query (disabled by default with mysqlnd) is all a PHP developer needs to secure his database calls against SQL injections. But for convenience reasons the use of prepared statements has been announced as a best practice. If you ask me its not best practice. It would be better to continue to secure an application against attackers instead of (ab)using a side-effect of prepared statements.

Brian has given prepared statements a criticial look from the perspective of resources. Lukas has added the aspect of security to the discussion. Both presented dashboard ideas for a complete redesign. As long as a new solution is not available, I would appreciate if everybody recommending prepared statements as a best practice teaches the pros and cons of the technology in the context of PHP applications: neither does a portable standard definition for server-side prepared statements exist nor did PDO manage to provide a convincing solution for client-side (emulated) prepared statements. Consider that when talking about the best practice “prepared statements”…

9 Comments