Unimpressed by the ongoing PDO 2 debate, we have continued working on PDO/mysqlnd for MySQL. It turned out that the current status of PDO makes developing and testing a new driver a challenge. I spent quite a lot of time comparing the different behaviours of the various drivers in the hope I could find out how PDO drivers are supposed to work. The PDO documentation and the specification do not cover each and every detail. PDO really needs some love…
Why do people suggest to improve PDO before kicking-off a PDO 2 project?
The challenge of the day was to find out what PDO::FETCH_UNIQUE
is about. All what the first page of the PDO manual tells you is, that is is a predefined (class) constant of the type INT. But luckily, there is more to learn about it on the manual page of PDOStatement->fetchAll()
:
[…]
array fetchAll ([ int $fetch_style [, int $column_index [, array $ctor_args ]]] )
[…]
To fetch only the unique values of a single column from the result set, bitwise-OR PDO::FETCH_COLUMN
with PDO::FETCH_UNIQUE
.
[…]
That’s one sentence. And according to Google this is pretty much it what you can find about it on php.net. Hold on, there are some test which give usage examples. However, you never know if a test covers all aspects of a feature. What about the comment-files (ext/pdo/pdo_stmt.c, …) you ask? Well, that’s no good documentation for the end user and you never know if the implementation is wrong as long as there is no description of what the code shall accomplish. Its a valid approach in software development to refine a feature as you go. But in the end you need one document which describes how things are supposed to work. And that document is not the .c file – in particular not, if you want third-parties to contribute.
Given this, can you imagine why some say that PDO needs love before a PDO 2 project gets kicked-off? Anyway, let’s not complain at each other, let’s move forwards towards a better PDO or however you call a unified, C-based database access layer in PHP. The lack of documentation is not a PDO specific issue. I know too well how much the ext/mysqli documentation is behind the latest developments in mysqlnd and I’m urgently looking forward for Q2/2008 to arrive. The documentation team has promised to take care of the docs in Q2….
Too stupid to read the documentation!
I’m not a native english speaker and often I have to ask someone to explain something to me in simplified english. The one sentence which has the PDO documentation to offer on PDO::FETCH_UNIQUE
tells me that I may use the flag together with PDO::FETCH_COLUMN
and PDOStatement->fetchAll()
. I have no idea if I may combine the flag with others, for example PDO::FETCH_OBJ
or, if I may use it with PDOStatement->fetch()
, the documentation does not tell me.
What the documentation clearly states is what it does: fetch only the unique values of a single column from the result set…. Sounds a bit like a convenience feature to emulate DISTINCT
, GROUP BY
or UNIQUE
in my ears. But which one is the "single column" the sentence refers to?
Lets find out using a step-by-step approach. Lets see what PDO::FETCH_COLUMN
does. PDOStatement->fetchAll()
returns an array containing all of the result set rows. You can restrict the data returned for each row to individual columns using the fetch style PDO::FETCH_COLUMN
. For example, if your result set has been generated from SELECT id, grp FROM test
each row contains the data of two columns: “id” and “grp”. If you need an array containing only the values of the column “id”, you can make PDOStatement->fetchAll()
returning an array containing all of the result set rows but only the value of the column “id” for each row by using PDOStatement->fetchAll(FETCH_COLUMN, 0)
. The 0 spezifies which column you are interested in. “id” has the column index 0, because the column list is 0-based. If you want only the data of the column “grp”, you use the column index 1.
$db = new PDO("mysql:dbname=phptest;unix_socket=/tmp/mysql.sock", "user", "pass"); $db->exec("DROP TABLE test"); $db->exec("CREATE TABLE test (id INT, grp CHAR(1))"); $db->exec("INSERT INTO test(id, grp) VALUES (1, 'A'), (2, 'A')"); $stmt = $db->prepare("SELECT id, grp FROM test ORDER BY id"); $stmt->execute(); var_dump($stmt->fetchAll(PDO::FETCH_COLUMN, 0));
Result:
array(2) { [0]=> string(1) "1" [1]=> string(1) "2" }
In the following I’ll not show code examples any more. This makes the posting shorter. I’ll use tables to illustrate the results. In the first row of the table you will find the function call which was used. The second row shows the SQL query and the following rows illustrate the table data and which data PDO will return (grey). Sometimes, you’ll find a last row showing the actual array returned.
fetchAll(PDO::FETCH_COLUMN, 0) |
|
---|---|
SELECT id, grp FROM test |
|
id [= 0] | grp [= 1] |
1 | A |
2 | A |
fetchAll(PDO::FETCH_COLUMN, 1) |
|
---|---|
SELECT id, grp FROM test |
|
id [= 0] | grp [= 1] |
1 | A |
2 | A |
At this point it should be clear what PDO::FETCH_COLUMN
does. We can add complexity, move forward to the next step and see what a bitwise-OR PDO::FETCH_UNIQUE
will change.
fetchAll(PDO::FETCH_COLUMN|PDO::FETCH_UNIQUE, 0) |
|
---|---|
SELECT id, grp FROM test |
|
id [= 0] | grp [= 1] |
1 | A |
2 | A |
fetchAll(PDO::FETCH_COLUMN|PDO::FETCH_UNIQUE, 1) |
|
---|---|
SELECT id, grp FROM test |
|
id [= 0] | grp [= 1] |
1 | A |
2 | A |
What PDO::FETCH_UNIQUE does
On the first view there is no difference! And there seems to be no unique constraint of whatever kind applied! You can’t see the actual difference because my illustration tells you only half of the story. I play a trick on you to raise your attention. In order to get the full picture the underlying table data will be changed and I will show you the arrays returned by PDO. Note the different indicies of the returned arrays.
fetchAll(PDO::FETCH_COLUMN, 1) |
|
---|---|
SELECT id, grp FROM test |
|
id [= 0] | grp [= 1] |
5 | A |
6 | A |
|
fetchAll(PDO::FETCH_COLUMN|PDO::FETCH_UNIQUE, 1) |
|
---|---|
SELECT id, grp FROM test |
|
id [= 0] | grp [= 1] |
5 | A |
6 | A |
|
The values of the two arrays are identical but the indicies differ. When using PDO::FETCH_COLUMN
you get a 0-based array (list). The list holds only the values of the column you have requested by specifying a column index. When using PDO::FETCH_COLUMN|PDO::FETCH_UNIQUE
you do not get a 0-based array (a list). What you get is an array (a hash) indexed by the values of the first column of your result set. This in turn means that the “unique constraint” which PDO::FETCH_UNIQUE
implies is about the values of the first column of your result set.
In the example above the distinct “id”-column values become the index values of the returned hash. No “unique constraint” is applied on the values of the returned hash. Step by step… The result set consists of two columns: “id”, “grp”. The values of the first column are used as hash indicies. The values are 5 and 6. So, we get: array( 5 => ??, 6 => => ??). The values are determined by the specified column index. In the example the column index was 1 (= “grp”). Therefore the hash returned by PDO is: array( 5 => “A”, 6 => “A”).
Now you know the meaning of the one sentence that describes PDO::FETCH_COLUMN
. Maybe you did get the meaning after reading To fetch only the unique values of a single column from the result set, bitwise-OR PDO::FETCH_COLUMN
with PDO::FETCH_UNIQUE
.. I didn’t. And I prefer to have “ulf-compliant software and documentation”. My experience is that if I get it, most others get it. If I don’t get it and IRC #php.de does not get it, most others don’t get it either. PDO needs love.
If you want to specify the “group by” or “unique constraint” column, check out PDO::FETCH_COLUMN|PDO::FETCH_GROUP
. But keep away from PDO::FETCH_COLUMN|PDO::FETCH_UNIQUE
.
Broken by design I
It took me about 2 hours to get to this point. And I wasted roughly 1 hour of the rare development time of the pdo/mysqlnd developer asking him about edge-cases. He’s working part time and that means he’s effectively not working more than 12 hours/week on his main assignment – when not disturbed. As he s the Release Manager of PHP 5.3, I guess he knows a bit about PHP and the comment-files. So, I guess if he needs one hour, Joe Dolittle and Ulf need more time…
Here comes one edge case. Lets have three rows. Two of them have “id” = 5 and one has “id” = 6. The two “id” = 5 rows have different values for the “grp” column. You may get two different results for the examples. And any of the two is 100% correct. But your function is not deterministic!
fetchAll(PDO::FETCH_COLUMN|PDO::FETCH_UNIQUE, 1) |
|
---|---|
SELECT id, grp FROM test |
|
id [= 0] | grp [= 1] |
5 | A |
5 | B |
6 | B |
|
fetchAll(PDO::FETCH_COLUMN|PDO::FETCH_UNIQUE, 1) |
|
---|---|
SELECT id, grp FROM test |
|
id [= 0] | grp [= 1] |
5 | A |
5 | B |
6 | B |
|
The problem is in the SQL statement. SELECT
may return results in any order. The order may vary at any time and without prior warning. If you want the results to be ordered, use an ORDER BY
-clause. PDO has a deterministic rule of “overwrite values as you fetch” but the underlying SQL has not. PDO is not broken by design. But wouldn’t it be nice if the PHP manual would tell you about this pitfall? PDO needs love.
If you like to read about such pitfalls, check out http://bugs.php.net/bug.php?id=44190. Its one of about 15 PDO “bugs” which we have reported recently. Actually its not a bug in PDO. Every database driver that supports FETCH_BOTH
should suffer from the pitfall, including ext/mysqli. FETCH_BOTH
is broken by design. You can’t make it work right unless you want to burn CPU cycles by adding extra checks to your driver code.
Broken by design II
Edge-case number two is about NULL. PDO loves strings. Whenever you fetch rows, you get string values. Almost every data type can be casted to string and so its only natural that a unified database access layer has a strong preference for strings. However, if you limit yourself to strings, how do you know if a value is NULL or an empty string. Casting NULL to string gives an empty string. So, eventually – depending on the settings – PDO mapps NULL to NULL. And you know NULL is special, for example NULL != NULL. Now, how will PDO::FETCH_UNIQUE handle NULL.
Answer: it can’t handle it. Neither can you create a hash with the index NULL nor can you have a hash with multiple indicies of the value NULL. If you try to create a hash with the index NULL, PHP will cast NULL to an empty string. Consequently NULL and empty string as seen as the same by PDO. PDO has no chance to handle this but to burn CPU cylces and add an extra check for a rare edge-case.
BTW, ext/mysqli tries to use proper PHP types when using Prepared Statements. ext/mysqli tries to return INT if your SQL column definition is INT and the PHP INT type can hold the INT value.
fetchAll(PDO::FETCH_COLUMN, 0) |
|
---|---|
SELECT grp, id FROM test ORDER BY id |
|
id [= 0] | grp [= 1] |
5 | NULL |
6 | “” |
|
fetchAll(PDO::FETCH_COLUMN|PDO::FETCH_UNIQUE, 0) |
|
---|---|
SELECT grp, id FROM test ORDER BY id |
|
id [= 0] | grp [= 1] |
5 | NULL |
6 | “” |
|
The message is love
PDO needs love. If the PHP community wants a unified database access layer written in C, its time to start fixing PDO. PDO has many edges that need to be cut off. Although the different drivers behave very differently its not always a failure of the vendors. The PDO documentation and specification needs to be improved. No driver can be better than the framework allows. Let’s fix PDO – on php.net and together. Its a good thing for the PHP project! Please understand that we cannot cover each any every detail as part of the pdo/mysqlnd development. It requires a joint effort. And please don’t rush.
The posting is not about blaming PDO. If it sounds a bit like, its because I have now spend 5 hours on this topic and I have not finished a single test. Don’t get me started on the edge-cases in mysqlnd and libmysql! However mysqlnd was easier because we could decide on the edge-cases ourself. Its much more tricky with PDO.