Ulf Wendel

2012/11/22
by admin
11 Comments

Supercharging PHP MySQL applications using the best API

PHP MySQL developers are confronted with three PHP MySQL APIs. That is confusing. Since mid 2012 the PHP manual suggests not to use the oldest and original PHP MySQL extension any more to cure the situation. Please, swap out ext/mysql in favour of either ext/mysqli or PDO_MySQL. Why to upgrade is below. Read on, if you are a PHP beginner, your PHP application is still using mysql_* functions of ext/mysql or you are a PHP trainer.

As a PHP consumer, tease your software vendors by asking whether they use the state of the art APIs! Here is a cut&paste test question to ask your vendor: do you support secure SSL connections to MySQL?

History and features

PHP and MySQL go hand in hand. The original MySQL API (ext/mysql) is around since PHP 2.0. Many web sites still show examples of the original MySQL API (ext/mysql). This is amazing considering the feature comparison of the three PHP MySQL APIs. By the way, do you know this Quickstart guide on ext/mysqli that is in the PHP manual?

API feature comparison
  ext/mysqli PDO_MySQL ext/mysql
PHP version introduced 5.0 5.1 2.0
Year introduced 2003 2005 Last century
Included with PHP 5.x Yes Yes Yes
Development status Active Active Maintenance only
Feature additions All new,
flagship
All new,
given they fit PDO
None since 2006
Lifecycle Active Active Long term deprecation announced
Recommended for new projects Yes Yes No
OOP Interface Yes Yes No
Procedural Interface Yes No Yes
API supports non-blocking, asynchronous queries with mysqlnd Yes No No
Persistent Connections Yes Yes Yes
"No pitfall" Persistent Connections Default, can be disabled No No
API supports Charsets Yes Yes Yes
API supports server-side Prepared Statements Yes Yes No
API supports client-side Prepared Statements No Yes No
API supports Stored Procedures Yes Yes No
API supports Multiple Statements Yes Most No
API supports Transactions Yes Yes No
Transactions can be controlled with SQL Yes Yes Yes
API supports SSL Yes Yes No
API supports Compression Yes Yes Yes
Supports all MySQL 5.1+ functionality Yes Most No

The above is almost identical to what you find in the PHP manual at http://www.php.net/manual/en/mysqlinfo.api.choosing.php with minor additions to elaborte and make the points. If you are using the original ext/mysql API, you find yourself in the red zone! Get rid of the weak points in your application.

Soft features

Humans have hard skills and soft skills. Let’s see if that works for APIs too – a personal list of soft features.

  ext/mysqli PDO_MySQL ext/mysql
Performance factor tendency above average average below average
Security factor good good below average
Pitfall factor low average above average
Cosiness factory average above average average

Performance considerations

All three PHP MySQL APIs are tiny C wrappers around the underlying C client-server library. It is the underlying C client-server library that does most of the work.

Programming language
PHP WordPress, Drupal, Oxid, phpMyAdmin, YourPHPWonder
C ext/mysqli PDO_MySQL ext/mysql
C Client-Server library

For the same command executed using the same feature you can expect all APIs to show the same performance. How you wrap the C library in a C PHP extension does not matter. It is the features exposed to the users that matter. Different features have different performance characteristics.

Before doing endless profiling sessions ask yourself how much time your web applications spends in the API itself. I am not talking about the accumulated wall clock time of query or connect functions. I ask you to measure pure API times without query and network times!

Performance relevant features
  ext/mysqli PDO_MySQL ext/mysql
API supports non-blocking, asynchronous queries with mysqlnd Yes No No
Persistent Connections Yes Yes Yes
"No pitfall" Persistent Connections Default, can be disabled No No
API supports server-side Prepared Statements Yes Yes No
API supports Multiple Statements Yes Most No
API supports Compression Yes Yes Yes

Features is what counts for performance. The matrix of performance relevant features makes ext/mysql look poor.

Performance: asynchronous, non-blocking

There is no way to run a non-blocking query using ext/mysql. You cannot fire off a slow running query at the beginning of your script, do some work and fetch results once the server has computed them. Instead you are forced to wait for the results. For example, you can query an ad server without having to wait for it. In 2008 a conference talk was given on: How mysqlnd async queries help you with sharding!. Here is the corresponding success story from 2012: Asyncronous Shard Queries in PHP using mysqlnd enabling the feed load 10 times faster.

Performance: servers-side prepared statements

A non-prepared statement is parsed and interpreted by the server every time you execute it. SQL and data is sent to the server in one chunk. A prepared statement is first parsed and prepared on the server before it can be executed multiple times with different parameters without being reparsed again. SQL and data sent to the server seperatedly. Upon repeated execution only the parameter value is transferred, thus bandwith is saved.

Whether using a prepared statement gives a performance penalty or performance win depends on the case. Below is an example that showed a 25% higher query rate using prepared statements on my aged test box. If you don’t know about this feature, which is unavailable with ext/mysql, read this quickstart. Performance is about features being available in the API and smart, trained use of the features.


$link = new mysqli("localhost", "root", "", "test", NULL, "/tmp/mysql57.sock");
$link->query("DROP TABLE IF EXISTS test");
$link->query("CREATE TABLE test(id INT AUTO_INCREMENT PRIMARY KEY, note VARCHAR(255))");

$time = 100;
$note = 'How fast are prepared statements?';

$num_queries = 0;
$start = microtime(true);
do {
  $num_queries++;
  $my_note = $link->real_escape_string(sprintf("%s %d", $note, $num_queries));
  $res = $link->query("INSERT INTO test(note) VALUES ('" . $my_note . "')");
  $runtime = microtime(true) - $start;
} while ($runtime < $time);

printf("Non-prepared: %d qps\n", $num_queries / $time);

$num_queries = 0;
$start = microtime(true);
$stmt = $link->prepare("INSERT INTO test(note) VALUES (?)");
$my_note = NULL;
$stmt->bind_param("s", $my_note);
do {
  $num_queries++;
  $my_note = sprintf("%s %d", $note, $num_queries);
  $stmt->execute();
  $runtime = microtime(true) - $start;
} while ($runtime < $time);

printf("Prepared: %d qps\n", $num_queries / $time);

Performance: multiple statements

What if you could fire multiple statements with just one question to reduce client-server round-trips? Using Multiple Statements it is possible, unless you are using ext/mysql, which cuts you off from this possible optimization.

$link = new mysqli("localhost", "root", "", "test", NULL, "/tmp/mysql57.sock");
$link->multi_query("
  DROP TABLE IF EXISTS test ;
  CREATE TABLE test(id INT AUTO_INCREMENT PRIMARY KEY, note VARCHAR(255)) ;
  INSERT INTO test(id) VALUES ('multi statement');
  SELECT * FROM test;
  SELECT id FROM test; ");

do {
  if ($res = $link->store_result()) {
    var_dump($res->fetch_all(MYSQLI_ASSOC));
  }  
} while ($link->more_results() && $link->next_result());

Performance: summary

API internals don’t matter much, they hardly contribute to overall run time. Reducing SQL runtimes is most important followed by querying MySQL in the most efficient manner. The different options for querying MySQL with different performance characteristics that can be used for optimizations on a case-by-case basis. A certain old API cuts you off from some potential performance gains…

Security related features

Security has a lot to do with how you use a given tool and which tools you are given.

Security related features
  ext/mysqli PDO_MySQL ext/mysql
"No pitfall" Persistent Connections Default, can be disabled No No
API supports server-side Prepared Statements Yes Yes No
API supports SSL Yes Yes No

Security: Encryption/SSL

MySQL supports secure connections between clients and MySQL clients and the server using the Secure Sockets Layer (SSL) protocol. Both ext/mysqli and PDO_MySQL feature SSL connections. Thus, PHP MySQL users can use SSL for their business.

But, ext/mysql will not let you set the configuration settings required for an SSL encrypted connection. Feel free to use "the world’s foremost network protocol analyzer" (Wireshark) to sniff on any ext/mysql connection. The graphical user interface of Wireshark is simple to use. Good luck stealing information.

Security: SQL injection

There are many ways to fight SQL injection. Please, consider watching the web presentation Top 10 MySQL Tips and Mistakes for PHP Developers if you are new to the topic.

It is certainly possible to develop a bullet-proof web application on top of ext/mysql but it requires skill. The below example is taken from the article Not only SQL injection: I don’t trust you!. Do you see the denial-of-service attack? The answer is in the link. It is not only about charsets. Do you notice the charset pitfall in the API?

$query = "SELECT name, rating FROM movies WHERE rating > " .
  mysql_escape_string($_GET['rating']);
mysql_query($query, $link);

For an untrained person, a beginner, the simplest solution to the problem would be to use (server-side) prepared statements. A feature available with all PHP MySQL APIs but ext/mysql.

Security: Persistent connections

Persistent connections have been reason for vivid discussions ever since. A persistent connection is not closed at the end of a script but kept open for reuse in a future script run. A persistent connection keeps its state between use, as the name says. This very property has caused lots of confusion ever since. People have been surprised to be confronted with a connection that has open transactions or SQL user variables set. Many considered this a pitfall.

Imagine the DBA notices a PHP MySQL account has been granted to many rights, or, in the worst case has been compromised. The DBA changes the MySQL account. Your web application has many persistent connections open. The open connections are unaffected from the account changes. PHP continues to issue queries that you tried to prevent until all persistent connections have been closed.

The ext/mysqli API is the only one that is not flawed by default. The mysqli extension does flush the state of a persistent connections and reauthenticate the user (COM_CHANGE_USER) before reusing the persistent connection. Thus, no pitfalls around state, and no delayed recognition of account changes. Strictly speaking ext/mysqli gives you no persistent connection but a pooled one. However, those who know what they do can go back to the old behaviour. The price paid for "no pitfall" persistent connections is performance (german speaking readers can find details here and here).

Cosiness and pitfall factor

Upon closer look it turned out that performance and security are more hard facts than soft ones. Even cosiness and pitfall factor have a touch of hard facts that cannot be ignored.

Undoubtfully the most powerful PHP MySQL API is ext/mysqli. But, it is also the most complex one, which gives a minus on the cosiness factor. When it was introduced it served as a pioneer for dual procedural and object-oriented interfaces. The procedural interface was preserved to make migrating from ext/mysql easy. PDO came a tad later and got a more comprehensive, though less powerful, pure OOP interface that many would rate higher for cosiness.

The original PHP MySQL API (ext/mysql) lacks on features and API calls – cosy! Seriously, it has some neat looking stuff in it but watch out for traps…

$res = mysql_query("SELECT 1"); 
var_dump(mysql_fetch_assoc($res));

This is a complete script to query MySQL. It is no mistake that you see no call to any kind of connect function. Here is the proof using PHP 5.6:

nixnutz@linux-0v4u:~/git/php-src> sapi/cli/php 
  -d mysql.default_host="localhost" 
  -d mysql.default_user="root" 
  -d mysql.default_socket="/tmp/mysql57.sock" 
  -r '
$res = mysql_query("SELECT 1"); var_dump(mysql_fetch_assoc($res));'

array(1) {
  [1]=>
  string(1) "1"
}

Great stuff, thumbs up? Next example. Let’s assume the implicit connect happens somewhere deep, deep in your startup routine. Then, in the main part of your script you explicitly open a connection with mysql_connect(). You get a resource identifier for your connection and you use it with mysql_query(). Guess what, you are using the connection that was opened somewhere deep, deep in your startup routine.

nixnutz@linux-0v4u:~/git/php-src> sapi/cli/php 
  -d mysql.default_host="localhost" 
  -d mysql.default_user="root" 
  -d mysql.default_socket="/tmp/mysql57.sock" 
  -r '
function startup() { 
  mysql_query("SET @a = 1"); 
} 
startup(); 

$link = mysql_connect(); 
$res = mysql_query("SELECT @a", $link); 
var_dump(mysql_fetch_assoc($res));'

array(1) {
  ["@a"]=>
  string(1) "1"
}

It is not mandatory to pass a connection identifier to mysql_query(). The function will use the last opened connection – cosy! Particularily older code makes use of this feature. Assume you modern() stuff to your old application. You believe to do everything right. You tell mysql_connect() to open a new connection as you do not want to be fooled. For this you pass true as the fourth argument to the function. Then, you ensure the connection identifier is used with mysql_query(). Everything is beautifully encapsulated in a function – what should go wrong?

nixnutz@linux-0v4u:~/git/php-src> sapi/cli/php 
  -d mysql.default_host="localhost" 
  -d mysql.default_user="root" 
  -d mysql.default_socket="/tmp/mysql57.sock" 
  -r '
function startup() { 
  mysql_query("SET @a = 1"); 
} 
function modern() { 
  $link = mysql_connect(NULL, NULL, NULL, true); 
  mysql_query("SET @a=2", $link); 
} 
startup(); 
$link = mysql_connect(); 
modern(); 
$res = mysql_query("SELECT @a"); 
var_dump(mysql_fetch_assoc($res));'

array(1) {
  ["@a"]=>
  string(1) "2"
}

Well, your function has side effects. It has accidently made the main routine use the connection established within modern(). No, closing the connection in modern() is no solution. Maybe some cosy looking features contribute to the pitfall factor…

You deserve the best

Do yourself a favour and choose the best possible API for your PHP MySQL applications. Either use ext/mysqli or PDO_MySQL. The original MySQL API will not allow you to go beyond the MySQL 4.1 feature set – quite limiting.

Happy hacking!

@Ulf_Wendel Follow me on Twitter

2012/11/08
by admin
Comments Off on MySQL Plugin development: public API?

MySQL Plugin development: public API?

The MySQL Plugin API is a neat way to add functionality to the MySQL server. Any user of MySQL is using it: see the storage engine plugins listed by SHOW PLUGINS. There are many types of plugins. For example, the do-whatever-you-want daemon plugins. Among the hottest things that appeared lately are plugins that add new interfaces to MySQL, such as the Memcache plugin of MySQL 5.6.

MySQL Client Server Protocol Memcache Protocol
Port 3306 Port 11211
| Daemon Plugin
MySQL

However, to me a PHP guy writing an extension, assimilating arbitrary libraries and glueing them together is reason enough to into plugin hacking It is tempting to a plugin developer to reuse code of the system being extended. MySQL 5.6.7-rc has more than 2 million lines of code. There must be hot stuff among it! But, is it "public", is there a public API?

A look at the daemon plugin example

Writing a daemon plugin is straight forward: grab the source, cd into the plugin directory and copy the daemon plugin example. Edit the structure that describes the plugin. The manual has more details.

mysql_declare_plugin(daemon_example)
{
  MYSQL_DAEMON_PLUGIN,
  &daemon_example_plugin,
  "daemon_example",
  "Brian Aker",
  "Daemon example, creates a heartbeat beat file in mysql-heartbeat.log",
  PLUGIN_LICENSE_GPL,
  daemon_example_plugin_init, /* Plugin Init */
  daemon_example_plugin_deinit, /* Plugin Deinit */
  0x0100 /* 1.0 */,
  NULL,                       /* status variables                */
  NULL,                       /* system variables                */
  NULL,                       /* config options                  */
  0,                          /* flags                           */
}
mysql_declare_plugin_end;

Among others, the structure lists an init (daemon_example_plugin_init) and a deinit (daemon_example_plugin_deinit) function of the plugin. The init function is called when the plugin is loaded into the server during startup or using INSTALL PLUGIN.

King Brian’s init function is below. It opens a log file and starts a heartbeat thread to fill the log.


static int daemon_example_plugin_init(void *p) {
  DBUG_ENTER("daemon_example_plugin_init");
  
  struct mysql_heartbeat_context *con;
  pthread_attr_t attr;          /* Thread attributes */
  char heartbeat_filename[FN_REFLEN];
  char buffer[HEART_STRING_BUFFER];
  time_t result= time(NULL);
  struct tm tm_tmp;

  struct st_plugin_int *plugin= (struct st_plugin_int *)p;
 
  con= (struct mysql_heartbeat_context *)
    my_malloc(sizeof(struct mysql_heartbeat_context), MYF(0)); 

  fn_format(heartbeat_filename, "mysql-heartbeat", "", ".log",
            MY_REPLACE_EXT | MY_UNPACK_FILENAME);
  unlink(heartbeat_filename);
  con->heartbeat_file= my_open(heartbeat_filename, O_CREAT|O_RDWR, MYF(0));

  /*
    No threads exist at this point in time, so this is thread safe.
  */
  localtime_r(&result, &tm_tmp);
  my_snprintf(buffer, sizeof(buffer),
              "Starting up at %02d%02d%02d %2d:%02d:%02d\n",
              tm_tmp.tm_year % 100, tm_tmp.tm_mon+1,
              tm_tmp.tm_mday, tm_tmp.tm_hour,
              tm_tmp.tm_min, tm_tmp.tm_sec);
  my_write(con->heartbeat_file, (uchar*) buffer, strlen(buffer), MYF(0));

  pthread_attr_init(&attr);
  pthread_attr_setdetachstate(&attr,
                              PTHREAD_CREATE_JOINABLE);

  /* now create the thread */
  if (pthread_create(&con->heartbeat_thread, &attr, mysql_heartbeat,
                     (void *)con) != 0) {
    fprintf(stderr,"Could not create heartbeat thread!\n");
    exit(0);
  }
  plugin->data= (void *)con;

  DBUG_RETURN(0);
}

I have tried to highlight some functions and macros used: DEBUG_ENTER, my_malloc, my_open, my_snprintf, my_write, DBUG_RETURN. Obviously, the daemon plugin has to include some header files to get access to them. And, in daemon_example.cc you find the corresponding include statements.

DBUG_ENTER, DBUG_RETURN include/my_dbug.h
my_malloc include/my_sys.h (mysys/my_malloc.c)
my_snprintf sql/sql_plugin_services.h
my_write include/my_sys.h (mysys/my_write.c)


#include <my_global.h>
#include <sql_priv.h>
#include <stdlib.h>
#include <ctype.h>
#include <mysql_version.h>
#include <mysql/plugin.h>
#include <my_dir.h>
#include "my_pthread.h"                         // pthread_handler_t
#include "my_sys.h"                             // my_write, my_malloc
#include "m_string.h"                           // strlen
#include "sql_plugin.h"                         // st_plugin_in

Cool, plugin developers can use all of the servers’ code?!

Looking at the above list of included files one may come to the conclusion a plugin developer is allowed use any server code. But, then there is sql/sql_plugin_services.h, which is included by sql_plugin.h. Is this the formal public API for accessing server code. Walking in the dark with no manual or book guidance…

In sql_plugin_services.h a couple of services are defined. The daemon example plugin is using the my_snprintf_service.

static struct st_service_ref list_of_services[]=
{
  { "my_snprintf_service", VERSION_my_snprintf, &my_snprintf_handler },
  { "thd_alloc_service",   VERSION_thd_alloc,   &thd_alloc_handler },
  { "thd_wait_service",    VERSION_thd_wait,    &thd_wait_handler },
  { "my_thread_scheduler_service",
    VERSION_my_thread_scheduler, &my_thread_scheduler_handler },
  { "my_plugin_log_service", VERSION_my_plugin_log, &my_plugin_log_handler },
  { "mysql_string_service",
    VERSION_mysql_string, &mysql_string_handler },
}

Not recalling a note about any services in the manual, the happy grepping through the code continues. An enlightening code comment can be found in sql/plugin.cc. Plugins are provided with "built-in functions".

  /* link the services in */
  for (i= 0; i < array_elements(list_of_services); i++)
  {
    if ((sym= dlsym(plugin_dl.handle, list_of_services[i].name)))
    {
      uint ver= (uint)(intptr)*(void**)sym;
      if (ver > list_of_services[i].version ||
        (ver >> 8) < (list_of_services[i].version >> 8))
      {
        char buf[MYSQL_ERRMSG_SIZE];
        my_snprintf(buf, sizeof(buf),
                    "service '%s' interface version mismatch",
                    list_of_services[i].name);
        report_error(report, ER_CANT_OPEN_LIBRARY, dlpath, 0, buf);
        DBUG_RETURN(0);
      }
      *(void**)sym= list_of_services[i].service;
    }
  }

Internal APIs vs. public stable APIs for plugin developers

The official daemon example plugin is accessing server functionality in two ways: using services and direclty including whatever header is needed.

The key idea behind the service is probably the versioning. Plugin developers should be allowed to assume that version 1 of service A acts in a defined way, no matter what the MySQL server version is. On the contrary, my_write could change its interface in the next server version and break a plugin without prior notice. However, there seems to be no registry for servies, no directory and not much of an API for a plugin developer to request the use of selected services of a certain version. Thus, this kind of service is nice but at the end of the day…?

The do-whatever-you-like style

The daemon example plugin is using a mixture of internal server APIs and plugin services. The few, internal my_sys.h functions used are peanuts compared to the use of internal MySQL APIs by storage engine plugins. Note, that an API includes functions and data types. Given that, plugin developers don’t have much of a choice but to use internal APIs that may change at any time without prior notice.

If I was to write a plugin, I would bite the bullet and use any internal API, possibly, through a lightweight wrapper to minimize dependencies.

Happy hacking!

@Ulf_Wendel Follow me on Twitter

2012/11/06
by admin
Comments Off on The 2012 edition of ulf-wendel.de

The 2012 edition of ulf-wendel.de

Dear Reader,

welcome to the 2012 edition of ulf-wendel.de.

Until 2003 I have been working as a web developer focussing on PHP. The 2004 edition of my home page and all its subdomains have reflected this. In the years thereafter I worked as a support manager, consultant, trainer and software engineer for MySQL/Sun/Oracle using MySQL, MaxDB, PHP, C, C++ and a bounch of other technologies.

There was an constantly increasing mismatch between the home page and the person. For example, I hardly use PHP any more but I extend it using C. Furthermore, old contents could not be distinguished easily from current ones. Thus, the 2004 edition is gone.

The 2012 edition is based on a blog system. It is easy to see the age of the materials. And, its easy to follow me through a feed reader.

Enjoy!

2012/11/06
by admin
Comments Off on MySQL 5.6: SHA256 secure password support for PHP mysqlnd

MySQL 5.6: SHA256 secure password support for PHP mysqlnd

Pluggable Authentication is one of the many new MySQL 5.6 features. Pluggable authentication adds a capability to use external authentication services (PAM, Windows login IDs, LDAP, Kerberos, …) but also introduces built-in strong SHA-256 hashing for passwords. The SHA-256 Authentication Plugin uses encryption to protect the password from being sniffed during authentication. Read a live report from teaching the PHP mysqlnd library the new secure authentication method.

Your choices

Early MySQL versions have had very poor password hashing. The PHP mysqlnd library has never supported the old, insecure password hashing that was used until MySQL 4.1. With MySQL 4.1 things got better. Since then MySQL has supported two password hashing algorithms, which is why there are the two MySQL SQL functions OLD_PASSWORD() and PASSWORD(). Oracle demanded MySQL 5.6 to offer even higher levels of security and thus there are even more choices. Those are relevant to you as a PHP developer:

Authentication Plugin Description mysqlnd support
mysql_old_password Insecure pre-MySQL 4.1 authentication no, insecure!
mysql_native_password MySQL 4.1+ default PHP 5.3+
mysql_clear_password MySQL 5.5.10+ optional, no hashing or encryption, use only with secure connection and external authentication service PHP 5.4+
sha256_password MySQL 5.6.6+ optional PHP 5.5+

The MySQL 5.6 SHA-256 Authentication Plugin ensures that passwords are never exposed as cleartext when connecting to the server. To protect the password either SSL connections or RSA encryption are used. If neither method is available, the connection attempt fails. This makes SHA-256 a very secure albeit not the fastest method to authenticate a MySQL user.

Continue Reading →

2012/09/25
by admin
4 Comments

Not only SQL injection: I don’t trust you!

Never trust user input! Injection is a threat . You are the new web developer, aren’t you?. Never trust user input is the first rule I had to learn as a web developer in anchient times. Injection can happen whenever user input is interpreted or used to compose new data. A quick recap of the #3 mistake from todays Top 10 MySQL Tips and Mistakes for PHP Developers web presentation. A webinar recording should be available in a couple of days.

Don’t!

Your probation period as a PHP developer may come to an immediate end if you write code like this:

mysqli_query(
  $link, "SELECT actor, rating FROM movies WHERE name = '" .
  $_GET['movie_name'] . "'"
);
mysql> SELECT actor, rating FROM movies 
    WHERE name = '
      Self-made Sauerkraut' UNION ALL 
           SELECT user, password FROM mysql.user WHERE '' = '';
+--------+--------+
| actor  | rating |
+--------+--------+
| Andrey | 8      |
| root   |        |
+--------+--------+
2 rows in set (0,01 sec)

It does not take an expert to exploit a web site running such code. Any search engine of your trust will list numerous tools to try SQL injection attacks against web sites. Those tools are not evil. They help you finding security issues. To keep your job, make yourself familiar with them. However, its even better to learn how to avoid injections in the first place.

Half of one way… still: don’t!

The above attack becomes more difficult when escaping input values before composing a SQL string dynamically out of the input. The input value of Self-made Sauerkraut' UNION ALL SELECT user, password FROM mysql.user WHERE '' = ' is escaped, meaning ' (and other characters) are replaced with \'. Injection fails.

mysqli_query(
  $link, "SELECT actor, rating FROM movies WHERE name = '" .
  mysqli_real_escape_string($_GET['movie_name'])  .  "'"
);
mysql> SELECT actor, rating FROM movies 
  WHERE name = '
   Self-made Sauerkraut\' UNION ALL 
   SELECT user, password FROM mysql.user WHERE \'\' = \'';
Empty set (0,00 sec)

Half way only thus far…: denial of service attack

Do not blindly escape everything and believe its safe. Injection is still possible like so. Escaping certain characters does not help if the input is used as a numeric constant in a SQL WHERE clause.

mysqli_query(
  $link, "SELECT name, rating FROM movies WHERE rating > " .
  mysqli_real_escape_string($_GET['rating'])
);
mysql> SELECT name, rating FROM movies WHERE rating > 3 AND 0 = IF(1, BENCHMARK(1000000, MD5(REPEAT(CONVERT(1, CHAR), 1000))), 0);
+----------------------+--------+
| name                 | rating |
+----------------------+--------+
| The north sea        |     10 |
| Self-made Sauerkraut |      8 |
| Bavarbike            |     10 |
+----------------------+--------+
3 rows in set (18,38 sec)

One way: do filter and escape

One way to keep your job and to save your company a fortune is to combine filtering and escaping: never trust and user input. Filter it, validate it, sanitize it before you use it. PHP 5.2 and newer comes with a handy set of built-in filter and validation functions. Use them.

$rating = filter_var($_GET['rating'], FILTER_VALIDATE_INT);
if (false === $rating) {
   die("Are you fooling me?");
}

$options = array(
  'options' => array(
     'min_range' => 0,
     'max_range' => 100,
     'default' => 5,
));
$rating = filter_var($_GET['rating'], FILTER_VALIDATE_INT, $options);
if (false === $rating) {
   die("Are you fooling me?");
}

Sorry, this is just a teaser…

… the rest is in the webinar. If you really want to keep your job, then this is not everything you need to know. At some point someone may say "use PDO – it is safe by default" to you. After listening to the webinar recording (available in a few days) you will be able to explain why this is a bit too much as a general statement.

BTW, don’t let your boss fool you with the question whether there is SQL injection with NoSQL stores. Just because something is named NoSQL does not mean injection as such is not possible…

Happy hacking!

@Ulf_Wendel Follow me on Twitter

2012/09/24
by admin
1 Comment

Webinar tomorrow: Top 10 MySQL Tips and Mistakes for PHP Developers

"But 63 slides is too much" has been the initial reply to the draft of tomorrows Top 10 MySQL Tips and Mistakes for PHP Developers web presentation (Tuesday, September 25, 2012: 10:00 CET). SQL injection, security and access control, monitoring for indexing and tuning, choosing adequate data types, character sets, how your web application speaks UTF8 correctly, types of connections and their performance properties as well as planning for tomorrow are too important to allow for less slides, Johannes Schlueter and I thought.

Beginners will be presented with a list of pitfalls to avoid. Advanced users profit from the presentation of the latest news put in context. And, of course, we will answer questions. Whoever of us is not speaking at a time is happy to answer your questions in the chat.

Will we make it to go through 62 slides in 60 minutes? Given that its not the first presentation for either of us I am optimistic.

2012/09/24
by admin
2 Comments

Searching data in NotOnlyMySQL databases: a rich query language?

A popular NoSQL store claims to have a rich query language. A powerful, portable query language is most important. If not powerful, you have to send multiple commands to fetch the desired data. In the worst case a program must be written which contraditcs the idea of a query language. If not portable, you are back to the 1960th and vendor lock-in. Developing portable applications is impossible. Stepping up from one database to another means rewriting your application. SQL does not have any of these issues. Plus, it is widely known which means low training costs and high understanding of its properties.

SQL – a query language for the relational model

It may seem odd to compare a query language for the relational model (SQL) with access methods of a document store. However, the NoSQL store compared with explains how to map their concepts to SQL in their manual! In the following, I’ll do the same. I’ll consider tables as collections and rows of a table as documents.

The SQL query language is build around the following operations:

  • Projection (filter columns)
  • Selection (filter rows)
  • Join (join tables)
  • Union (merge tables)

Some of those operations are not found in the query language of the NoSQL store. There is no join, there is no union. In a way it makes sense. The document stores’ data model can rather be described as a hierarchical than relational one.

  • Projection (filter columns)
  • Selection (filter rows)
  • Join (join tables)
  • Union (merge tables)

The projection

A projection is formulated at the beginning of a SELECT statement.

SELECT
  [DISTINCT]
  attribute |  arithmetric expression |  aggregation function


Let’s see how to query the below given collection of documents – step by step.

[
  {   "street_number" : 64,   "_id" : {   "$oid" : "50520155cc93742e0d0dac65"   },   "firstname" : "Ulf"   },
  {   "street_number" : 64,   "_id" : {   "$oid" : "50520263cc93742e0d0dac7e"   },   "firstname" : "hamster"   }
]

By default SELECT will allow duplicates. DISTINCT is used to filter duplicates. SELECT DISTINCT street_number FROM people maps to db.people.distinct("street_number").


[
{ "street_number" : 64 }
]

Limiting the column list: to fetch all firstnames with SQL one uses SELECT firstname FROM people. The NoSQL store counterpart is db.people.find({}, {firstname:1, _id:0}).


[
  {   "firstname" : "Ulf"   },
  {   "firstname" : "hamster"   }
]

SQL allows you to have an arithmetric expression as part of the projection such as SELECT street_number + 1 FROM people. There is no equivalent in the query language of the NoSQL store I am comparing with. The only option one has is to use MapReduce. MapReduce is not considered a query language in this comparison. A query language is no programming language, it must be more comprehensive . A query language let’s you describe what data you need. How to fetch is up to the database. With MapReduce you have to provide a function (a program) and you have to implement an access path instead of just saying what you need. Thus, MapReduce is beyond the topic of the blog post.

SQL supports the following aggregation functions: COUNT(*), SUM(column), MAX(column), MIN(column), AVG(column). db.people.count() is easy going. But then comes kind of MapReduce: db.people.group({key: {}, initial: {sum: 0}, reduce: function (doc, out) { out.sum += doc.street_number}}). The user has to provide a function.

This leaves me with the following intermediate comparison of SQL and another rich query language.

  • Projection (filter columns): SELECT [DISTINCT] attribute | arithmetric expression | aggregation function
    • DISTINCT
    • arithmetric expression
    • aggregation function
      • COUNT(*)
      • SUM(column) (user to provide function)
      • MAX(column) (user to provide function)
      • MIN(column) (user to provide function)
      • AVG(column) (user to provide function)
  • Selection (filter rows)
  • Join (join tables)
  • Union (merge tables)

A basic selection

The SQL WHERE clause describes the conditions of a selection.

 constant: attribute = | != | < | <= | > | >= constant
 attribute: attribute = | != | < | <= | > | >= attribute
 logical operators: ( attribute ... constant | attribute) and|or|not (...)
 uncertainty: attribute LIKE constant
 NULL: attribute IS NULL  


Simple constant based selection first:

SQL Other
SELECT * FROM people WHERE street_number = 64 db.people.find({"street_number": : 64})
SELECT * FROM people WHERE street_number > 1 db.people.find({"street_number" : {$gt:1}})

SELECT * FROM people WHERE street_number != first_name is an example of an attribute based selection. The NoSQL counterpart is db.people.find({$where : "this.street_number != this.first_name"}). It works but the manual of the NoSQL store notes that $where based comparisons tend to be slow.

SQL and the NoSQL stores query language are on par for the rest of basic selection. AND, OR, NOT map to $and, $or, $not. LIKE is not available but the counterpart to MySQLs’ RLIKE, which is regular expression based. Regular expressions are a superset of LIKE. Regarding NULL the NoSQL store can distinguish between value given but NULL, value given and not NULL and value/column not given at all. No negative surprises.

Selection: subqueries

Although the NoSQL store does support the predicates ALL, ANY, IN, EXISTS it does not feature what the predicates are actually intended for: subqueries.

WHERE
   ...
   ALL, ANY, IN, EXISTS


SELECT * FROM people WHERE street_number IN (1, 64) maps to db.people.find({"street_number" : { $in : [1, 64] }}). But that is a boring one as it could also be written as ((street_number = 1) OR (street_number = 64)).

A subquery such as SELECT * FROM people WHERE street_number IN (SELECT street_number FROM other_people) cannot be expressed with the query language of the NoSQL store. Remember: MapReduce is not considered a query language here.

Is this a rich query language?

The NoSQL store I picked for this comparison is MongoDB. The story may be a different one when comparing SQL against other NoSQL stores.

  • Projection (filter columns): SELECT [DISTINCT] attribute | arithmetric expression | aggregation function
    • DISTINCT
    • arithmetric expression
    • aggregation function
      • COUNT(*)
      • SUM(column) (user to provide function)
      • MAX(column) (user to provide function)
      • MIN(column) (user to provide function)
      • AVG(column) (user to provide function)
  • Selection (filter rows): WHERE … (see above)
    • constant: attribute = | != | < | <= | > | &gt= constant
    • attribute: attribute = | != | < | <= | > | &gt= attribute (see above)
    • logical operators: ( attribute … constant | attribute) and|or|not (…)
    • uncertainty: attribute LIKE constant (must be emulated with regular expressions)
    • NULL: attribute IS NULL
    • ALL, ANY, IN, EXISTS (no subquery)
  • Join (join tables)
  • Union (merge tables)

SQL is very competitive as a query language. It is powerful and lets you describe what data you want. You do not have to write a function or program which is describes how to fetch the data you want. SQL stands for standardized query language. Your applications can be run against a variety of SQL databases. No vendor lock-in.

Happy hacking!

@Ulf_Wendel Follow me on Twitter

2012/09/13
by admin
6 Comments

Eine reiche NoSQL-Anfragesprache?

Eine reiche Anfragesprache verspricht ein populäres NoSQL-System auf seiner Startseite. Reicher als SQL? Ist das auffinden von Informationen einfacher, kann ich besser suchen als mit SQL? Das System gehört zur Gruppe der dokumentenbasierten NoSQL-Stores und wirbt mit Schemafreiheit für sich. Es speichert BSON-Dokumente in Sammlungen. Das Dokument lässt sich grob mit einem Datensatz/Tupel aus einer Tabelle/Relation eines RDBMS vergleichen.

Etwas Relationenalgebra

SQL ist eine konkrete Anfragesprache für eine relationale Datenbank. Eine Anfragalgebra kann benutzt werden um eine Anfragesprache mathematisch exakt zu fassen. Das klingt gruselig für den praktisch orientierten Anwender. Es hilft aber, um sich der Frage zu nähern wie reich die eine oder andere Sprache ist und ist auch gleich vorbei. In der Relationenalgebra finden wir:

  • Projektion (Ausblenden von Spalten)
  • Selektion (Zeilen suchen)
  • Verbund (Tabellen verknüpfen)
  • Vereinigung (Tabellen vereinigen)
  • Differenz (Tabellen abziehen voneinander)
  • Spalten umbenennen (das ist nicht AS, brauchen wir in der Praxis nicht)

Continue Reading →

2012/09/06
by admin
3 Comments

Searching data in NotOnlyMySQL databases: two extremes

MySQL and NoSQL go together. In the data center, that’s a known. Inside MySQL, that’s lesser known. MySQL 5.6 is both a puristic key value store and a full-fledged relational database. MySQL could also be developed into a schema free document store supporting MapReduce as a search method. With regards to searching those are the extremes. What search methods should a NotOnlyMySQL offer? The extremes first.

Thanks, MySQL. Losing a key is no nightmare!

Memcached is a high performance key value store for volatile data. Memcached is part of a classic scaling story: first, we buy a bigger machine for MySQL, then we try replication, then we cache database results in Memcached.

Client
| |
MySQL Protocol Memcache Protocol
| |
MySQL
Complex queries, critical data
Memcached
Key lookup, volatile data

InnoDB and NDB Memcache API plugins turn MySQL 5.6 into a key value store. Adding Memcache protocol support to MySQL was a logical step not only because of the popularity of MySQL being used together with Memcache but also because the Memcache protocol is perfect match for exposing MySQL’s internal high performance storage APIs to the user.

Client
| |
MySQL Protocol Memcache Protocol
| |
MySQL
SQL Key lookup
Data

Data can be inserted into MySQL using both MySQL Protocol or Memcache protocol. Data can be searched using either SQL or doing a key lookup. There are two ways to write and read the same data.

Continue Reading →

2012/08/31
by admin
Comments Off on PHP MySQL persistent, cached, pooled connection – for how long?

PHP MySQL persistent, cached, pooled connection – for how long?

It is a common question: what is the lifespan of a persistent, pooled, cached, however-you-call-it PHP MySQL connection? What about CGI, FastCGI, web server module? The answer is always the same. Standard I/O file handles, including socket connections, are bound to processes. The lifespan of a persistent, pooled, cached, however-it-is-to-be-called-in-context-x connection is that of the PHP process. Depending on the web server deployment model, a PHP process handles one or multiple web requests. Read on only if puzzled.

A process

Do you have any colleguages around? Please, ask them for "APUX". In case of no reply, get a copy and study. To become a master PHP developer you will have to master todays frameworks (Zend Framework, Symfony, Dojo, …) and yesterdays. Yesterdays framework is still dominant. Yesterdays framework is your operating system. "APUX" stands for Advanced Programming in the UNIX Environment.

Building blocks of a process
  Task state and identifier PID = 112, parent, children, …
Credentials User-ID, group-ID, …
Signal handler
Standard I/O Handles (descriptors)
Memory Heap, stack, BSS, …

Standard I/O file handles, including socket connections, are bound to processes. The process may happen to execute the PHP interpreter (interpiler ;-)). PHP may open a socket connection to MySQL. If the process ends, the connection handle to MySQL goes away. Like the memory of a process is free’d, the connection handle is released as well.

A PHP process
 
Standard I/O Connection handle to MySQL

Webserver using CGI

If running PHP via CGI, the MySQL connection pooling/caching/persisting story pretty much ends here. For every (n = 1) web request the webserver starts and stops a process to handle the (n = 1) web request. Any process memory allocated is free’d at the end of the web request. Any standard I/O handle opened is closed. The lifespan of a MySQL connection is that of a single web request, a single script execution.

  1. Accept HTTP web request
    • Start worker process (PHP)
    • Execute *.php script through worker
    • Shutdown worker process
  2. Send worker’s reply to client

A webserver using FastCGI

Once I talk FastCGI, I get questions. I used grey to highlight the differences. The difference is minimal.

  1. Accept HTTP web request
    • Start worker process (PHP), if none cached
    • Execute *.php script through worker
    • Shutdown worker process, if work limit exceeded
  2. Send worker’s reply to client

For every n > 1 web requests the webserver starts and stops a process to handle the (n > 1) web requests. One PHP process handles n web requests. Process memory is available for the duration of n web requests. Standard I/O handles can be kept open for the duration of n web requests. A pooled, cached, persisted MySQL connection is such an I/O handle. It can be reused (at least) n – 1 times by a worker process. Given that every worker handles many thousand requests, it is often reused many thousand times. Things sum up quickly. The time for many thousand connection establishments is saved.

Compared to CGI there is no difference but the value of n. Simple as that.

Web server
FastCGI process FastCGI process
Standard I/O Handles (descriptors) Standard I/O Handles (descriptors)
Memory Heap, stack, BSS, … Memory Heap, stack, BSS, …

Whether you got 1 or 1,000 worker processes makes no difference: standard I/O file handles, including socket connections, are bound to processes. How many connection pools will you have with 1,000 worker processes…?

A (puristic) threaded web server

A threaded web server is a game changer. Threads are execution units within a process. Threads share the resources of a process, for example, process memory or open file handles. Let’s assume the web server operates like this.

  1. Load PHP into web server process
    • Accept HTTP web request
    • Execute *.php script using worker thread
    • Send worker’s reply to client
  2. Shutdown web server process

How many processes? One. How many MySQL connections pools would you have with such a web server? One. How many web requests will be served by a "worker" process? n > 1. How many times can a pooled, cached or persistent MySQL connection be used? At least as many times as web requests are served by the web server process (n>1). If you want to optimize for connection reuse, this is the best you can get.

Threads share the resources of a process, for example, process memory or open file handles? That is a case for APUX or any similar lecture…

Beyond CGI, FastCGI and plain threaded web servers

As an APUX reader you know tracks beyond the main roads. In case you do not plan to ever read APUX, stop here. Otherwise you get puzzled again.

Standard I/O file handles, including socket connections, are bound to processes. True, but child processes inherit descriptors. True, but what about send_fd() from APUX to send a descriptor from one process to another? True, but what about using a deamon? Yes, if we control the process creation, the process life and know what other processes there are, we can break out.

However, CGI and FastCGI isolate worker processes from each other. Process starting, stopping and caching is beyond the control of the worker processes. End of the road.

From within a PHP worker process we have no simple way of keeping a MySQL/DatabaseX connections open at the end of the process. Plus, we do not know what other worker processes there are. We cannot share anything easily. Not being able to do it easily usually means becoming slow or adding dependencies and requirements.

uWSGI

If, for example, you want to deploy an extra deamon process or load a web server module, then… – but that is beyond PHP itself. Outside of PHP there is no one standard web server to extend. If uWSGI was popular in the PHP world, one might want to have a closer look at it. Maybe one could come up with something around the master process or attach-daemon… But, again, that is external to the PHP MySQL extensions.

Any more questions on the Connection Multiplexing Plugin? I still have not given a direct reply to the one question in the announcement blog post. However, you can now answer it yourself, can’t you? Regarding keep alive – there is no ping, it is a prototype…

Happy hacking!

@Ulf_Wendel Follow me on Twitter