How large can a column of the data type POINT
be: 10 bytes, 100 bytes, 1k, 10k? No… think Enterprise! PHP will allocate 4GB of RAM if you use the Prepared Statements of ext/mysqli and the MySQL Client Library (AKA libmysql) to fetch a POINT
column. Of course, the MySQL native driver for PHP (mysqlnd) does better!
To understand why ext/mysqli can allocate up to 4 GB of RAM to fetch a POINT
column we need to have a look at the MySQL C API. PHP itself is written in C and in a way ext/mysqli is just a plain vanilla C API client application. The C API is provided by the MySQL Client Library which has been called "libmysql" in the past. The MySQL Client Library implements the MySQL Client Server Protocol. The MySQL native driver for PHP (mysqlnd) is an alternative implementation of the MySQL Client Server Protocol and it is part of PHP source code as of PHP 5.3. If that is a new to you, please have a look at the PHP manual.
Prepared statements fill result buffers allocated by the client
On the level of the C API, prepared statements return data, for example from SELECT
statements, in result buffers. The result buffers need to be allocated and managed by the client.
Some clients know exactly what data to expect from a SQL query because they know the types of the columns and what data to expect. This knowledge help to allocate appropriate buffers. Other clients, like ext/mysqli, do not have this knowledge. PHP’s ext/mysqli will see that you prepare SELECT id FROM test
but it does not know what type column id
is or what values you store in it. Therefore, ext/mysqli uses some tricks to make an educated guess.
Note, that the guessing game will only happen if you use the MySQL Client Library. With mysqlnd things are different. All of the following applies to the MySQL Client Library, not to mysqlnd.
Metadata guessing game
The educates guess that is made by PHP’s ext/mysqli is based on inspecting the MySQL C API data structure MYSQL_STMT. A MYSQL_STMT structure represents a prepared statement. The MySQL manual notes: The MYSQL_STMT structure has no members that are intended for application use. . Pffft… every since ext/mysqli does access MYSQL_STMT structure members directly.
The extension checks if the MySQL Client Library knows anything about the results that can be expected from the prepared statements and does read out the fields
member, if available. fields
contains a list of columns, their data type and their lengths. Perfect for calculating the size of a result buffer!
But wait MYSQL_FIELD
has two length values. length and
max_length
. The MySQL nanual says about length
: The width of the field. This corresponds to the display length, in bytes. . And it continues to enlighten the reader by also explaining max_length
: The maximum width of the field for the result set (the length in bytes of the longest field value for the rows actually in the result set). If you are using prepared statements, max_length is not set by default because for the binary protocol the lengths of the values depend on the types of the values in the result set. If you want the max_length values anyway, enable the STMT_ATTR_UPDATE_MAX_LENGTH option with mysql_stmt_attr_set() and the lengths will be set when you call mysql_stmt_store_result()..
Enough on theory. Let's hack a short C program to see what length gets reported for a POINT
column. The below C program does pretty much the same C API calls will be made if you use ext/mysqli's prepared statements. The C program reports a length of 4294967295 = 4GB. This is the value that ext/mysqli uses for its educated guess! The MySQL Client Library has no hint for us how long column values will actually be. All it knows so far is that the column is of the type MYSQL_TYPE_GEOMETRY
and that a geometry data value can be a pretty large thing.
SELECT id FROM test... stmt->fields[0].type = Geometry stmt->fields[0].max_length = 0 stmt->fields[0].length = 4294967295 column = '' length = 25 stmt->fields[0].max_length = 0 column = '' length = 25 stmt->fields[0].max_length = 0 ... all rows fetched
#include "stdio.h" #include "stdlib.h" #include "string.h" #include <mysql.h> int exit_failure(const char *msg, MYSQL *mysql, MYSQL_RES *res, MYSQL_STMT *stmt); int main(void) { const char* sql_select = "SELECT id FROM test"; MYSQL *conn; MYSQL_STMT *stmt; MYSQL_BIND bind[1]; char column_value[255]; unsigned long length; my_bool is_null; my_bool error; my_bool attr_get; int ret; conn = mysql_init(NULL); if (conn == NULL) exit_failure("mysql_init() failed", NULL, NULL, NULL); if (!mysql_real_connect(conn, "localhost", "root", "root", "phptest", 0, "/tmp/mysql.sock", 0)) exit_failure("mysql_real_connect() failed", conn, NULL, NULL); if (mysql_query(conn, "DROP TABLE IF EXISTS test") || mysql_query(conn, "CREATE TABLE test(id POINT)") || mysql_query(conn, "INSERT INTO test(id) VALUES (GeomFromText('POINT(1 1)')),(GeomFromText('POINT(1 1)'))")) exit_failure("mysql_query() failed", conn, NULL, NULL); printf("%s...\n\n", sql_select); stmt = mysql_stmt_init(conn); if (mysql_stmt_prepare(stmt, sql_select, strlen(sql_select))) exit_failure("mysql_stmt_prepare() failed", conn, NULL, stmt); if (mysql_stmt_execute(stmt)) exit_failure("mysql_stmt_execute() failed", conn, NULL, stmt); attr_get = 1; if (mysql_stmt_attr_set(stmt, STMT_ATTR_UPDATE_MAX_LENGTH, &attr_get)) exit_failure("mysql_stmt_attr_get() failed", conn, NULL, stmt); printf("\t stmt->fields[0].type = %s\n", (stmt->fields[0].type == MYSQL_TYPE_GEOMETRY) ? "Geometry" : "Other"); printf("\t stmt->fields[0].max_length = %lu\n", stmt->fields[0].max_length); printf("\t stmt->fields[0].length = %lu\n", stmt->fields[0].length); printf("\n"); memset(bind, 0, sizeof(bind)); bind[0].buffer_type = MYSQL_TYPE_STRING; bind[0].buffer = (char*)column_value; bind[0].buffer_length = 255; bind[0].is_null = &is_null; bind[0].length = &length; bind[0].error = &error; if (mysql_stmt_bind_result(stmt, bind)) exit_failure("mysql_stmt_bind_result() failed", conn, NULL, stmt); while (MYSQL_NO_DATA != (ret = mysql_stmt_fetch(stmt))) { printf("\t column = '%s'\n", column_value); printf("\t length = %d\n", length); printf("\t stmt->fields[0].max_length = %lu\n", stmt->fields[0].max_length); if (ret == MYSQL_DATA_TRUNCATED) printf("\t NOTE: data has been truncated!\n"); printf("\n"); } if (MYSQL_NO_DATA == ret) printf("... all rows fetched\n"); else printf("... fetch has failed - [%u] %s\n", mysql_stmt_errno(stmt), mysql_stmt_error(stmt)); mysql_stmt_close(stmt); mysql_close(conn); return EXIT_SUCCESS; } int exit_failure(const char *msg, MYSQL *mysql, MYSQL_RES *res, MYSQL_STMT *stmt) { printf("ERROR: %s\n", msg); if (stmt) { printf("[%u] %s\n", mysql_stmt_errno(stmt), mysql_stmt_error(stmt)); mysql_stmt_close(stmt); } if (mysql) { if (mysql_errno(mysql)) printf("[%u] %s\n", mysql_errno(mysql), mysql_error(mysql)); if (res) mysql_free_result(res); mysql_close(mysql); } printf("\n"); exit(EXIT_FAILURE); }
Buffering for a better guess
Do you remember what the MySQL manual says about max_length
? It will be set if you use the C API call mysql_stmt_store_result()
(PHP API: mysqli_stmt_store_result() ... ). Once you add this call to the C program after mysql_stmt_attr_set
the value will indeed be set and tell you that 4GB is a bit too much. A result buffer of only 25 bytes can hold the data of the POINT
column in our case.
SELECT id FROM test... stmt->fields[0].type = Geometry stmt->fields[0].max_length = 25 stmt->fields[0].length = 4294967295 column = '' length = 25 stmt->fields[0].max_length = 25 column = '' length = 25 stmt->fields[0].max_length = 25 ... all rows fetched
Calling mysql_stmt_store_result()
makes the MySQL Client Library fetch all rows into a buffer on the client. During the fetch it is easy to calculate the maximum length. However, buffering is usually not what you want to do when using MySQL Prepared Statements. And buffering is not the default.
Give mysqlnd a try. It does not have this pitfall. It does better.